I work at a cinema ticket sales company called gi...
Created on: October 15, 2025
Answered using GPT-5 Thinking by Chat01
I work at a cinema ticket sales company called gisheh7. There is a large organization called Samfa that licenses companies like ours to sell tickets. Naturally, we have to exchange data with Samfa all the time, i.e. there are a number of APIs we have to call and send data to: which seats are taken or free, what was sold, and what was cancelled. We have to both send and receive all of this. For example, when another company such as CinemaTicket cancels a ticket sale, that cancellation has to reach us too, so that we know the seat has become free. I need to build a Grafana dashboard to monitor our Samfa operations. I don't know which sections it should have, what it should show, or which metrics are needed. This takes creativity and precision: tell me exactly which panels are needed, what to name them, and what exactly each panel should show so that it is good and useful.
These are my metrics:
Name | Type | Description
EXTERNAL_PAYMENT_COMPONENT_TOKEN_seconds_count | counter | Count of events that have been observed for the base metric
EXTERNAL_PAYMENT_COMPONENT_TOKEN_seconds_max | gauge | -
EXTERNAL_PAYMENT_COMPONENT_TOKEN_seconds_sum | counter | Total sum of all observed values for the base metric
MELLAT_REFUND_REPORT_RESPONSE_seconds_count | counter | Count of events that have been observed for the base metric
MELLAT_REFUND_REPORT_RESPONSE_seconds_max | gauge | -
MELLAT_REFUND_REPORT_RESPONSE_seconds_sum | counter | Total sum of all observed values for the base metric
application_ready_time_seconds | gauge | Time taken for the application to be ready to service requests
application_started_time_seconds | gauge | Time taken to start the application
discount_response_time_seconds_count | counter | Count of events that have been observed for the base metric
discount_response_time_seconds_max | gauge | -
discount_response_time_seconds_sum | counter | Total sum of all observed values for the base metric
discount_usage_counter_total | - | -
disk_free_bytes | gauge | Usable space for path
disk_total_bytes | gauge | Total space for path
executor_active_threads | gauge | The approximate number of threads that are actively executing tasks
executor_completed_tasks_total | - | -
executor_pool_core_threads | gauge | The core number of threads for the pool
executor_pool_max_threads | gauge | The maximum allowed number of threads in the pool
executor_pool_size_threads | gauge | The current number of threads in the pool
executor_queue_remaining_tasks | gauge | The number of additional elements that this queue can ideally accept without blocking
executor_queued_tasks | gauge | The approximate number of tasks that are queued for execution
external_component_seconds_count | counter | Count of events that have been observed for the base metric (Duration of repository invocations)
external_component_seconds_max | gauge | Duration of repository invocations
external_component_seconds_sum | counter | Total sum of all observed values for the base metric (Duration of repository invocations)
external_service_seconds_count | counter | Count of events that have been observed for the base metric
external_service_seconds_max | gauge | -
external_service_seconds_sum | counter | Total sum of all observed values for the base metric
get_daily_refunds_response_time_seconds_count | counter | Count of events that have been observed for the base metric
get_daily_refunds_response_time_seconds_max | gauge | -
get_daily_refunds_response_time_seconds_sum | counter | Total sum of all observed values for the base metric
hikaricp_connections | gauge | Total connections
hikaricp_connections_acquire_seconds_count | counter | Count of events that have been observed for the base metric (Connection acquire time)
hikaricp_connections_acquire_seconds_max | gauge | Connection acquire time
hikaricp_connections_acquire_seconds_sum | counter | Total sum of all observed values for the base metric (Connection acquire time)
hikaricp_connections_active | gauge | Active connections
hikaricp_connections_creation_seconds_count | counter | Count of events that have been observed for the base metric (Connection creation time)
hikaricp_connections_creation_seconds_max | gauge | Connection creation time
hikaricp_connections_creation_seconds_sum | counter | Total sum of all observed values for the base metric (Connection creation time)
hikaricp_connections_idle | gauge | Idle connections
hikaricp_connections_max | gauge | Max connections
hikaricp_connections_min | gauge | Min connections
hikaricp_connections_pending | gauge | Pending threads
hikaricp_connections_timeout_total | - | -
hikaricp_connections_usage_seconds_count | counter | Count of events that have been observed for the base metric (Connection usage time)
hikaricp_connections_usage_seconds_max | gauge | Connection usage time
hikaricp_connections_usage_seconds_sum | counter | Total sum of all observed values for the base metric (Connection usage time)
http_server_requests_active_seconds_active_count | - | -
http_server_requests_active_seconds_bucket | counter | Cumulative counters for the observation buckets
http_server_requests_active_seconds_duration_sum | - | -
http_server_requests_active_seconds_max | gauge | -
http_server_requests_seconds_bucket | counter | Cumulative counters for the observation buckets
http_server_requests_seconds_count | counter (histogram) | Count of events that have been observed for the histogram metric
http_server_requests_seconds_max | gauge | -
http_server_requests_seconds_sum | counter (histogram) | Total sum of all observed values for the histogram metric
invalid_refund_requests_total | - | -
jdbc_connections_active | gauge | Current number of active connections that have been allocated from the data source.
jdbc_connections_idle | gauge | Number of established but idle connections.
jdbc_connections_max | gauge | Maximum number of active connections that can be allocated at the same time.
jdbc_connections_min | gauge | Minimum number of idle connections in the pool.
jvm_buffer_count_buffers | gauge | An estimate of the number of buffers in the pool
jvm_buffer_memory_used_bytes | gauge | An estimate of the memory that the Java virtual machine is using for this buffer pool
jvm_buffer_total_capacity_bytes | gauge | An estimate of the total capacity of the buffers in this pool
jvm_classes_loaded_classes | gauge | The number of classes that are currently loaded in the Java virtual machine
jvm_classes_unloaded_classes_total | - | -
jvm_compilation_time_ms_total | - | -
jvm_gc_concurrent_phase_time_seconds_count | counter | Count of events that have been observed for the base metric (Time spent in concurrent phase)
jvm_gc_concurrent_phase_time_seconds_max | gauge | Time spent in concurrent phase
jvm_gc_concurrent_phase_time_seconds_sum | counter | Total sum of all observed values for the base metric (Time spent in concurrent phase)
jvm_gc_live_data_size_bytes | gauge | Size of long-lived heap memory pool after reclamation
jvm_gc_max_data_size_bytes | gauge | Max size of long-lived heap memory pool
jvm_gc_memory_allocated_bytes_total | - | -
jvm_gc_memory_promoted_bytes_total | - | -
jvm_gc_overhead | gauge | An approximation of the percent of CPU time used by GC activities over the last lookback period or since monitoring began, whichever is shorter, in the range [0..1]
jvm_gc_overhead_percent | gauge | An approximation of the percent of CPU time used by GC activities over the last lookback period or since monitoring began, whichever is shorter, in the range [0..1]
jvm_gc_pause_seconds_count | counter | Count of events that have been observed for the base metric (Time spent in GC pause)
jvm_gc_pause_seconds_max | gauge | Time spent in GC pause
jvm_gc_pause_seconds_sum | counter | Total sum of all observed values for the base metric (Time spent in GC pause)
jvm_info | gauge | JVM version info
jvm_memory_committed_bytes | gauge | The amount of memory in bytes that is committed for the Java virtual machine to use
jvm_memory_max_bytes | gauge | The maximum amount of memory in bytes that can be used for memory management
jvm_memory_usage_after_gc | gauge | The percentage of long-lived heap pool used after the last GC event, in the range [0..1]
jvm_memory_usage_after_gc_percent | gauge | The percentage of long-lived heap pool used after the last GC event, in the range [0..1]
jvm_memory_used_bytes | gauge | The amount of used memory
jvm_threads_daemon_threads | gauge | The current number of live daemon threads
jvm_threads_live_threads | gauge | The current number of live threads including both daemon and non-daemon threads
jvm_threads_peak_threads | gauge | The peak live thread count since the Java virtual machine started or peak was reset
jvm_threads_started_threads_total | - | -
jvm_threads_states_threads | gauge | The current number of threads
lettuce_command_completion_seconds_count | counter | Count of events that have been observed for the base metric (Latency between command send and command completion (complete response received))
lettuce_command_completion_seconds_max | gauge | Latency between command send and command completion (complete response received)
lettuce_command_completion_seconds_sum | counter | Total sum of all observed values for the base metric (Latency between command send and command completion (complete response received))
lettuce_command_firstresponse_seconds_count | counter | Count of events that have been observed for the base metric (Latency between command send and first response (first response received))
lettuce_command_firstresponse_seconds_max | gauge | Latency between command send and first response (first response received)
lettuce_command_firstresponse_seconds_sum | counter | Total sum of all observed values for the base metric (Latency between command send and first response (first response received))
logback_events_total | - | -
process_cpu_time_ns_total | - | -
process_cpu_usage | gauge | The "recent cpu usage" for the Java Virtual Machine process
process_files_max_files | gauge | The maximum file descriptor count
process_files_open_files | gauge | The open file descriptor count
process_start_time_seconds | gauge | Start time of the process since unix epoch.
process_uptime_seconds | gauge | The uptime of the Java virtual machine
refund_requests_total | - | -
refund_response_time_seconds_count | - | -
refund_response_time_seconds_max | - | -
refund_response_time_seconds_sum | - | -
reserve_ticket_response_time_seconds_count | counter | Count of events that have been observed for the base metric
reserve_ticket_response_time_seconds_max | gauge | -
reserve_ticket_response_time_seconds_sum | counter | Total sum of all observed values for the base metric
scrape_duration_seconds | - | -
scrape_samples_post_metric_relabeling | - | -
scrape_samples_scraped | - | -
scrape_series_added | - | -
signup_count_total | - | -
sms_kavenegar_total | - | -
sms_niksms_total | - | -
sms_sent_total | - | -
spring_data_repository_invocations_seconds_count | counter | Count of events that have been observed for the base metric (Duration of repository invocations)
spring_data_repository_invocations_seconds_max | gauge | Duration of repository invocations
spring_data_repository_invocations_seconds_sum | counter | Total sum of all observed values for the base metric (Duration of repository invocations)
spring_security_authorizations_active_seconds_active_count | - | -
spring_security_authorizations_active_seconds_duration_sum | - | -
spring_security_authorizations_active_seconds_max | gauge | -
spring_security_authorizations_seconds_count | counter | Count of events that have been observed for the base metric
spring_security_authorizations_seconds_max | gauge | -
spring_security_authorizations_seconds_sum | counter | Total sum of all observed values for the base metric
spring_security_filterchains_JwtAuthFilter_after_total | - | -
spring_security_filterchains_JwtAuthFilter_before_total | - | -
spring_security_filterchains_access_exceptions_after_total | - | -
spring_security_filterchains_access_exceptions_before_total | - | -
spring_security_filterchains_active_seconds_active_count | - | -
spring_security_filterchains_active_seconds_duration_sum | - | -
spring_security_filterchains_active_seconds_max | gauge | -
spring_security_filterchains_authentication_anonymous_after_total | - | -
spring_security_filterchains_authentication_anonymous_before_total | - | -
spring_security_filterchains_authorization_after_total | - | -
spring_security_filterchains_authorization_before_total | - | -
spring_security_filterchains_context_async_after_total | - | -
spring_security_filterchains_context_async_before_total | - | -
spring_security_filterchains_context_holder_after_total | - | -
spring_security_filterchains_context_holder_before_total | - | -
spring_security_filterchains_context_servlet_after_total | - | -
spring_security_filterchains_context_servlet_before_total | - | -
spring_security_filterchains_cors_after_total | - | -
spring_security_filterchains_cors_before_total | - | -
spring_security_filterchains_header_after_total | - | -
spring_security_filterchains_header_before_total | - | -
spring_security_filterchains_logout_after_total | - | -
spring_security_filterchains_logout_before_total | - | -
spring_security_filterchains_requestcache_after_total | - | -
spring_security_filterchains_requestcache_before_total | - | -
spring_security_filterchains_seconds_count | counter | Count of events that have been observed for the base metric
spring_security_filterchains_seconds_max | gauge | -
spring_security_filterchains_seconds_sum | counter | Total sum of all observed values for the base metric
spring_security_filterchains_session_management_after_total | - | -
spring_security_filterchains_session_management_before_total | - | -
spring_security_filterchains_session_urlencoding_after_total | - | -
spring_security_filterchains_session_urlencoding_before_total | - | -
spring_security_http_secured_requests_active_seconds_active_count | - | -
spring_security_http_secured_requests_active_seconds_duration_sum | - | -
spring_security_http_secured_requests_active_seconds_max | gauge | -
spring_security_http_secured_requests_seconds_count | counter | Count of events that have been observed for the base metric
spring_security_http_secured_requests_seconds_max | gauge | -
spring_security_http_secured_requests_seconds_sum | counter | Total sum of all observed values for the base metric
system_cpu_count | gauge | The number of processors available to the Java virtual machine
system_cpu_usage | gauge | The "recent cpu usage" of the system the application is running in
system_load_average_1m | gauge | The sum of the number of runnable entities queued to available processors and the number of runnable entities running on the available processors averaged over a period of time
tasks_scheduled_execution_active_seconds_active_count | - | -
tasks_scheduled_execution_active_seconds_duration_sum | - | -
tasks_scheduled_execution_active_seconds_max | gauge | -
tasks_scheduled_execution_seconds_count | counter | Count of events that have been observed for the base metric
tasks_scheduled_execution_seconds_max | gauge | -
tasks_scheduled_execution_seconds_sum | counter | Total sum of all observed values for the base metric
tickets_count_total | - | -
tomcat_sessions_active_current_sessions | gauge | -
tomcat_sessions_active_max_sessions | gauge | -
tomcat_sessions_alive_max_seconds | gauge | -
tomcat_sessions_created_sessions_total | - | -
tomcat_sessions_expired_sessions_total | - | -
tomcat_sessions_rejected_sessions_total | - | -
Tell me clearly, step by step, what to do: which panels my dashboard should have, what each panel shows, and which metric it uses. In other words, write out the query for each panel. It should be complete and comprehensive, and tell me precisely what to do, including the panel names and everything else.
Very well: let's build a "Samfa monitoring dashboard" that covers the technical state of your services, the health of the integration with Samfa, and the business KPIs (sales/cancellations). Below I have written everything out step by step, from the dashboard variables and panel layout to the exact query for each panel (PromQL). Where you do not have a required metric, I have put "suggested additional metrics" at the end for you to add so that the picture becomes more complete.
$job:
    label_values(process_uptime_seconds, job)
$instance:
    label_values(process_uptime_seconds{job="$job"}, instance)
$samfa_uri: the Samfa webhook/callback paths on your server (if any):
    label_values(http_server_requests_seconds_count{uri=~"/samfa/.+"}, uri)
$reserve_uri: the reservation path (assumed to be /api/v1/reserve; adjust it to your system):
    label_values(http_server_requests_seconds_count{uri=~".*reserve.*"}, uri)
$payment_gateway: only if you have such a label (e.g. gateway="mellat"); otherwise this variable is optional:
    label_values(EXTERNAL_PAYMENT_COMPONENT_TOKEN_seconds_count, gateway)
Note: all queries use the Grafana macros $__rate_interval, $__interval, and $__range so that they scale with the selected time range and resolution.
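To see why `$__rate_interval` is safer than a hand-picked window, here is a minimal Python sketch of how Grafana documents the macro: the larger of `$__interval` plus the scrape interval, and four scrape intervals. The function name and the 15-second scrape interval are assumptions for illustration only; use your data source's actual setting.

```python
def rate_interval(interval_s: float, scrape_interval_s: float = 15.0) -> float:
    """Approximate Grafana's $__rate_interval in seconds.

    interval_s is the panel's $__interval; scrape_interval_s is an assumed
    Prometheus scrape interval (15 s here, purely as an example).
    """
    return max(interval_s + scrape_interval_s, 4 * scrape_interval_s)


if __name__ == "__main__":
    print(rate_interval(10))   # zoomed in: the window never drops below 4 scrapes
    print(rate_interval(120))  # zoomed out: the window grows with the panel step
```

Because the window always covers several scrapes, `rate()` has enough samples to work with even when the dashboard is zoomed in tightly.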
Suggestion: one main dashboard named "Samfa: Integration and Sales Monitoring (GiSheh7)", with these sections:
Further down, for each section I give the panel name, its purpose, and the exact query.
Panel 1, overall request rate (req/s):
    sum by (job) (rate(http_server_requests_seconds_count{job="$job"}[$__rate_interval]))
Panel 2, average reservation latency (seconds):
    sum(rate(reserve_ticket_response_time_seconds_sum[$__rate_interval])) / sum(rate(reserve_ticket_response_time_seconds_count[$__rate_interval]))
Panel 3, p95 HTTP latency (seconds):
    histogram_quantile(0.95, sum by (le) (rate(http_server_requests_seconds_bucket{job="$job"}[$__rate_interval])))
Panel 4, HTTP 5xx error rate (%):
    100 * sum(rate(http_server_requests_seconds_count{job="$job", status=~"5.."}[$__rate_interval])) / sum(rate(http_server_requests_seconds_count{job="$job"}[$__rate_interval]))
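The average-latency and error-rate queries above both divide one counter's growth by another's, so the window length cancels out. A minimal Python sketch of that arithmetic, using made-up sample values and hypothetical function names (this is an illustration of the idea, not how Prometheus is implemented):

```python
def avg_latency(sum_start, sum_end, count_start, count_end):
    """Mean latency over a window: delta(_sum) / delta(_count)."""
    d_count = count_end - count_start
    if d_count <= 0:
        return None  # no requests in the window; PromQL would return NaN or no data
    return (sum_end - sum_start) / d_count


def error_rate_percent(errors_delta, total_delta):
    """Share of failed requests in percent, guarding against an idle window."""
    if total_delta <= 0:
        return None
    return 100.0 * errors_delta / total_delta


if __name__ == "__main__":
    # Illustrative numbers: 100 new requests that took 30 seconds in total.
    print(avg_latency(120.0, 150.0, 1000, 1100))
    # Illustrative numbers: 5 failures out of 200 requests.
    print(error_rate_percent(5, 200))
```

The `None` branches mirror the empty panels you will see during idle periods, which is worth remembering before alerting on these ratios.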
Idea: watch the flow "reservation → payment → ticket issuance → SMS → (on cancellation) refund".
Panel 5, reservation rate (req/s):
    sum(rate(reserve_ticket_response_time_seconds_count[$__rate_interval]))
Panel 6, average reservation latency for the selected URI (assumes the metric carries a uri label):
    sum by (uri) (rate(reserve_ticket_response_time_seconds_sum{uri="$reserve_uri"}[$__rate_interval])) / sum by (uri) (rate(reserve_ticket_response_time_seconds_count{uri="$reserve_uri"}[$__rate_interval]))
Panel 7, tickets issued in the selected range:
    increase(tickets_count_total[$__range])
If tickets_count_total is global, you can also group it by job/instance.
Panel 8, SMS messages sent in the selected range:
    increase(sms_sent_total[$__range])
Panel 9, refund requests in the selected range:
    increase(refund_requests_total[$__range])
Panel 10, invalid refund requests in the selected range:
    increase(invalid_refund_requests_total[$__range])
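The `increase()` panels above stay correct across application restarts because a drop in a counter is treated as a reset to zero. A simplified Python sketch of that idea follows; the function name and sample values are hypothetical, and real Prometheus additionally extrapolates to the window boundaries, so its results can be fractional.

```python
def counter_increase(samples):
    """Total growth of a counter across consecutive samples.

    A sample lower than its predecessor is treated as a counter reset
    (for example after a restart), so the new value counts in full.
    """
    total = 0.0
    for prev, cur in zip(samples, samples[1:]):
        total += (cur - prev) if cur >= prev else cur
    return total


if __name__ == "__main__":
    # The drop from 150 to 10 simulates a restart of the application.
    print(counter_increase([100, 130, 150, 10, 40]))
```

Subtracting the first sample from the last would give a negative number here, which is why raw counter values should not be plotted directly on these panels.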
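Combined funnel panel: four Stat panels side by side.
"Reservations" = panel 5 | "Tickets issued" = panel 7 | "SMS sent" = panel 8 | "Refund" = panel 9
For the four numbers to be comparable, show each one as a count over the same range, e.g. increase(reserve_ticket_response_time_seconds_count[$__range]) for reservations.
A large gap between reservations and tickets issued ⇒ a bottleneck in payment. A large gap between tickets issued and SMS sent ⇒ a problem in the SMS service.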
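The funnel reading described above comes down to comparing each stage with the one before it. A small Python sketch of that comparison, with invented stage counts and a hypothetical function name, might look like this:

```python
def funnel_conversion(stages):
    """Given ordered (name, count) pairs, return (from, to, percent) per step.

    percent is the share of the previous stage that reached the next one,
    or None when the previous stage is empty.
    """
    steps = []
    for (src, n_src), (dst, n_dst) in zip(stages, stages[1:]):
        pct = round(100.0 * n_dst / n_src, 1) if n_src else None
        steps.append((src, dst, pct))
    return steps


if __name__ == "__main__":
    # Made-up counts over one time range, purely for illustration.
    stages = [("reservations", 1000), ("tickets_issued", 820), ("sms_sent", 815)]
    for src, dst, pct in funnel_conversion(stages):
        print(f"{src} -> {dst}: {pct}%")
```

In this invented example the first step loses far more than the second, which would point at payment rather than at the SMS provider.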
We assume that outbound calls are recorded in external_service_seconds_* with a label such as service="samfa". If they are not, add such labels to these metrics.
Outbound call rate to Samfa (req/s):
    sum by (service) (rate(external_service_seconds_count{service="samfa"}[$__rate_interval]))
Average latency of outbound calls to Samfa (seconds):
    sum(rate(external_service_seconds_sum{service="samfa"}[$__rate_interval])) / sum(rate(external_service_seconds_count{service="samfa"}[$__rate_interval]))
Maximum latency of outbound calls to Samfa over 5 minutes:
    max_over_time(external_service_seconds_max{service="samfa"}[5m])
5xx error rate of the inbound Samfa endpoints (/samfa/*), in percent:
    100 * sum(rate(http_server_requests_seconds_count{uri=~"/samfa/.+", status=~"5.."}[$__rate_interval])) / sum(rate(http_server_requests_seconds_count{uri=~"/samfa/.+"}[$__rate_interval]))
p95 latency of the inbound Samfa endpoints (seconds):
    histogram_quantile(0.95, sum by (le) (rate(http_server_requests_seconds_bucket{uri=~"/samfa/.+"}[$__rate_interval])))
Tasks waiting in the executor queue, and the remaining queue capacity:
    sum(executor_queued_tasks)
    sum(executor_queue_remaining_tasks)
If you really have a dedicated job for this, add dedicated queue length/lag metrics (suggested at the end of this answer).
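The two executor gauges above can be combined into a single saturation figure, since queued plus remaining equals the queue's capacity. A minimal Python sketch with a hypothetical function name and example values:

```python
def queue_saturation(queued, remaining):
    """Fraction of the executor queue in use: queued / (queued + remaining)."""
    capacity = queued + remaining
    return queued / capacity if capacity else 0.0


if __name__ == "__main__":
    # Example: 30 tasks waiting in a queue that can take 70 more.
    print(queue_saturation(30, 70))
    # An unbounded queue reports a huge remaining capacity, so this stays near 0.
    print(queue_saturation(30, 2_147_483_647))
```

The second call shows the caveat: with an unbounded queue the ratio is not informative, and the absolute value of executor_queued_tasks is the better signal.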
Payment token request rate (req/s):
    sum(rate(EXTERNAL_PAYMENT_COMPONENT_TOKEN_seconds_count[$__rate_interval]))
Average payment token latency (seconds):
    sum(rate(EXTERNAL_PAYMENT_COMPONENT_TOKEN_seconds_sum[$__rate_interval])) / sum(rate(EXTERNAL_PAYMENT_COMPONENT_TOKEN_seconds_count[$__rate_interval]))
Maximum payment token latency over 5 minutes:
    max_over_time(EXTERNAL_PAYMENT_COMPONENT_TOKEN_seconds_max[5m])
Mellat refund report request rate (req/s):
    sum(rate(MELLAT_REFUND_REPORT_RESPONSE_seconds_count[$__rate_interval]))
Average Mellat refund report latency (seconds):
    sum(rate(MELLAT_REFUND_REPORT_RESPONSE_seconds_sum[$__rate_interval])) / sum(rate(MELLAT_REFUND_REPORT_RESPONSE_seconds_count[$__rate_interval]))
Average refund latency (seconds):
    sum(rate(refund_response_time_seconds_sum[$__rate_interval])) / sum(rate(refund_response_time_seconds_count[$__rate_interval]))
Request rate per endpoint (by uri):
    sum by (uri) (rate(http_server_requests_seconds_count{job="$job"}[$__rate_interval]))
HTTP latency p50:
    histogram_quantile(0.5, sum by (le) (rate(http_server_requests_seconds_bucket{job="$job"}[$__rate_interval])))
HTTP latency p90:
    histogram_quantile(0.9, sum by (le) (rate(http_server_requests_seconds_bucket{job="$job"}[$__rate_interval])))
HTTP latency p99:
    histogram_quantile(0.99, sum by (le) (rate(http_server_requests_seconds_bucket{job="$job"}[$__rate_interval])))
Share of all requests that failed, per 5xx status code:
    sum by (status) (rate(http_server_requests_seconds_count{status=~"5.."}[$__rate_interval])) / ignoring(status) group_left sum(rate(http_server_requests_seconds_count[$__rate_interval]))
In-flight HTTP requests:
    sum(http_server_requests_active_seconds_active_count)
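To make the percentile panels easier to interpret, here is a simplified Python sketch of the interpolation idea behind `histogram_quantile`: find the cumulative bucket that contains the target rank and interpolate linearly inside it. The bucket bounds and counts are invented, and this is an approximation of the idea rather than Prometheus's exact implementation.

```python
import math


def histogram_quantile(q, buckets):
    """Estimate a quantile from cumulative (upper_bound, count) buckets.

    buckets must be sorted by upper bound and end with the +Inf bucket.
    """
    total = buckets[-1][1]
    if total == 0:
        return math.nan
    rank = q * total
    prev_bound, prev_count = 0.0, 0.0
    for bound, count in buckets:
        if count >= rank:
            if math.isinf(bound) or count == prev_count:
                # Rank falls in the +Inf bucket (or an empty one): use the lower bound.
                return prev_bound
            # Linear interpolation inside the bucket that contains the rank.
            return prev_bound + (bound - prev_bound) * (rank - prev_count) / (count - prev_count)
        prev_bound, prev_count = bound, count
    return prev_bound


if __name__ == "__main__":
    # Invented cumulative counts for le=0.1, 0.5, 1.0 and +Inf.
    buckets = [(0.1, 50), (0.5, 90), (1.0, 98), (math.inf, 100)]
    print(round(histogram_quantile(0.95, buckets), 4))
```

The estimate can never be more precise than the bucket boundaries, so a p99 that sits in a wide top bucket should be read as a rough figure.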
Hikari active connections:
    avg(hikaricp_connections_active)
Hikari idle connections:
    avg(hikaricp_connections_idle)
Hikari maximum connections:
    avg(hikaricp_connections_max)
Hikari pending threads:
    avg(hikaricp_connections_pending)
Average connection acquire time (seconds):
    sum(rate(hikaricp_connections_acquire_seconds_sum[$__rate_interval])) / sum(rate(hikaricp_connections_acquire_seconds_count[$__rate_interval]))
Average Redis (Lettuce) command completion latency (seconds):
    sum(rate(lettuce_command_completion_seconds_sum[$__rate_interval])) / sum(rate(lettuce_command_completion_seconds_count[$__rate_interval]))
Maximum Redis (Lettuce) command completion latency over 5 minutes:
    max_over_time(lettuce_command_completion_seconds_max[5m])
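Executor active threads:
    sum(executor_active_threads)
Executor pool size:
    sum(executor_pool_size_threads)
Executor queued tasks:
    sum(executor_queued_tasks)
JVM heap used (bytes):
    sum(jvm_memory_used_bytes{area="heap"})
JVM heap maximum (bytes):
    sum(jvm_memory_max_bytes{area="heap"})
GC pause p95 (ms). This needs jvm_gc_pause_seconds_bucket, which is not in the metric list above, so it only returns data once histogram buckets are published for that timer:
    1000 * histogram_quantile(0.95, sum by (le) (rate(jvm_gc_pause_seconds_bucket[$__rate_interval])))
Until then, the average GC pause (ms) can be computed from the series that are listed:
    1000 * sum(rate(jvm_gc_pause_seconds_sum[$__rate_interval])) / sum(rate(jvm_gc_pause_seconds_count[$__rate_interval]))
Process CPU usage:
    avg(process_cpu_usage)
System load average (1m):
    avg(system_load_average_1m)
Disk free (bytes):
    sum(disk_free_bytes)
Disk total (bytes):
    sum(disk_total_bytes)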
Average authorization latency (seconds):
    sum(rate(spring_security_authorizations_seconds_sum[$__rate_interval])) / sum(rate(spring_security_authorizations_seconds_count[$__rate_interval]))
Maximum security filter chain latency over 5 minutes:
    max_over_time(spring_security_filterchains_seconds_max[5m])
Security filter chain throughput (req/s):
    sum(rate(spring_security_filterchains_seconds_count[$__rate_interval]))
Samfa webhook error ratio > 1% for 10 minutes:
    sum(rate(http_server_requests_seconds_count{uri=~"/samfa/.+", status=~"5.."}[10m])) / sum(rate(http_server_requests_seconds_count{uri=~"/samfa/.+"}[10m])) > 0.01
p95 latency on the Samfa endpoints (/samfa/*) > 2 seconds for 10 minutes:
    histogram_quantile(0.95, sum by (le) (rate(http_server_requests_seconds_bucket{uri=~"/samfa/.+"}[10m]))) > 2
Thread pool queue > 100 for 5 minutes:
    avg_over_time(executor_queued_tasks[5m]) > 100
Hikari pending > 0 for 5 minutes:
    avg_over_time(hikaricp_connections_pending[5m]) > 0
Overall error ratio > 2% for 10 minutes:
    sum(rate(http_server_requests_seconds_count{status=~"5.."}[10m])) / sum(rate(http_server_requests_seconds_count[10m])) > 0.02
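The "for N minutes" part of each alert means the condition has to hold on every evaluation in that period, not just once. A minimal Python sketch of that behaviour, assuming one evaluation per minute and using hypothetical names and sample values:

```python
def alert_fires(values, threshold, for_evals):
    """True if the last `for_evals` consecutive evaluations all exceed threshold.

    values holds one measurement per evaluation, oldest first.
    """
    if len(values) < for_evals:
        return False
    return all(v > threshold for v in values[-for_evals:])


if __name__ == "__main__":
    # Error ratios sampled once a minute (made-up data); threshold is 1%.
    spike = [0.002, 0.05, 0.003, 0.002, 0.001]
    sustained = [0.02, 0.03, 0.02, 0.04, 0.02]
    print(alert_fires(spike, 0.01, 5))      # a single spike does not fire
    print(alert_fires(sustained, 0.01, 5))  # a sustained breach does
```

This is why a short burst of 5xx responses during a deploy will not page anyone, while a persistent integration failure will.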
For averages we used sum(rate(_sum)) / sum(rate(_count)).
Where you have *_bucket series (such as http_server_requests_seconds_bucket), use histogram_quantile.
For the Samfa endpoints (/samfa/*) and reservation (/reserve*), set proper labels so that filtering is easy.
… (application_started_time_seconds) and infrastructure outages.
For the scenario "when CinemaTicket cancels, the seat becomes free for you", a few domain-specific metrics need to be added so that you can see the consistency of the seat inventory:
seat_sync_events_total{source="samfa"|"local", type="reserve"|"cancel"}
seat_sync_lag_seconds{flow="cancel"|"reserve"} (gauge)
seats_inconsistent_total (gauge)
payment_result_total{gateway="mellat", result="success"|"failed"|"timeout"}
webhook_delivery_total{endpoint="/samfa/...", result="2xx|5xx|timeout"}
With these metrics you can build a "Seat consistency" panel:
    max_over_time(seats_inconsistent_total[$__range])
p95 cancellation sync lag (only if you add buckets; otherwise use the average of the plain gauge):
    histogram_quantile(0.95, sum by (le) (rate(seat_sync_lag_seconds_bucket{flow="cancel"}[$__rate_interval])))
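One way the suggested seats_inconsistent_total gauge could be fed is by periodically comparing the free seats you hold locally with the ones Samfa reports. The Python sketch below shows only that comparison; the function name, the seat identifiers, and the idea of comparing sets of free seats are all assumptions for illustration, not part of any Samfa API.

```python
def inconsistent_seats(local_free, samfa_free):
    """Seats that are free on one side but not on the other."""
    return set(local_free) ^ set(samfa_free)


if __name__ == "__main__":
    # Hypothetical snapshot for one showtime.
    local_free = {"A1", "A2", "B5"}
    samfa_free = {"A1", "B5", "C3"}
    diff = inconsistent_seats(local_free, samfa_free)
    print(sorted(diff))  # seats to investigate
    print(len(diff))     # value that would be written to the gauge
```

A non-zero value that persists for several minutes suggests a missed cancellation or reservation event rather than ordinary propagation delay.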
If you like, name the panels exactly as follows (more readable for the operations team):
If I were to give a quick checklist:
Standardize the labels (uri, service, operation).
For each of your real endpoints (/samfa/..., /reserve...), substitute the real value in the queries, and if your labels differ, just rename the filters to match your own labels.