04 — Monitoring
Observability stack: Prometheus + Thanos for metrics, Grafana for dashboards, Alertmanager for email/Discord alerts, and a suite of custom Python exporters covering Supabase queues, slow queries, backup status, costs, and credential rotation age.
Stack runs on vps-i1 (217.154.82.162) via Docker Compose at /opt/p24-infra/monitoring/.
Key Documents
Component Map
| Component | Image | Port | Purpose |
|---|---|---|---|
| Prometheus | prom/prometheus | 127.0.0.1:9090 | Scrapes all targets, 15d local TSDB |
| Thanos sidecar | quay.io/thanos/thanos | internal | Uploads 2h blocks to Wasabi S3 |
| Thanos query | quay.io/thanos/thanos | internal | Unified PromQL over local + S3 |
| Alertmanager | prom/alertmanager | 127.0.0.1:9093 | Email via Mailgun EU |
| Grafana | grafana/grafana | 127.0.0.1:3000 | Dashboards (Thanos + Supabase PostgreSQL) |
| queue-exporter | custom Python | :9200 | Supabase queue depths |
| pg-stats-exporter | custom Python | :9201 | pg_stat_statements slow queries |
| backup-exporter | custom Python | :9220 | Wasabi backup status JSON |
| cost-exporter | custom Python | :9210 | Vercel/Supabase/Wasabi billing |
| vercel-exporter | custom Python | internal | Vercel deployment metrics |
| credential-exporter | custom Python | internal | Credential rotation age |
| grafana-image-renderer | grafana/grafana-image-renderer | internal | PNG screenshots for daily reports |
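For orientation, here is a minimal sketch of what a custom Python exporter for credential rotation age could look like. It is not the code of the actual credential-exporter: the metric name `credential_rotation_age_seconds`, the `LAST_ROTATED` data, and port 9299 are all hypothetical, and it uses only the standard library to emit the Prometheus text exposition format.

```python
import time
from http.server import BaseHTTPRequestHandler, HTTPServer

# Hypothetical data: credential name -> Unix time of its last rotation.
# A real exporter would read this from wherever rotations are recorded.
LAST_ROTATED = {
    "example_api_key": 1_700_000_000,
    "example_db_password": 1_705_000_000,
}


def render_metrics(last_rotated, now):
    """Render rotation ages in the Prometheus text exposition format.

    Assumes credential names contain no quotes or backslashes, so label
    values need no escaping.
    """
    lines = [
        "# HELP credential_rotation_age_seconds Seconds since last rotation.",
        "# TYPE credential_rotation_age_seconds gauge",
    ]
    for name in sorted(last_rotated):
        age = max(0, int(now - last_rotated[name]))
        lines.append(f'credential_rotation_age_seconds{{name="{name}"}} {age}')
    return "\n".join(lines) + "\n"


class MetricsHandler(BaseHTTPRequestHandler):
    """Serve the rendered metrics on /metrics and 404 everything else."""

    def do_GET(self):
        if self.path != "/metrics":
            self.send_error(404)
            return
        body = render_metrics(LAST_ROTATED, time.time()).encode("utf-8")
        self.send_response(200)
        self.send_header("Content-Type", "text/plain; version=0.0.4; charset=utf-8")
        self.send_header("Content-Length", str(len(body)))
        self.end_headers()
        self.wfile.write(body)


def serve(port=9299):
    # Port 9299 is an arbitrary placeholder, not a port from the table.
    HTTPServer(("127.0.0.1", port), MetricsHandler).serve_forever()
```

Calling `serve()` starts the listener; Prometheus would then need a scrape target pointing at that port. Keeping `render_metrics` separate from the HTTP handler makes the output easy to check without a running server.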
Public URLs
| URL | Service |
|---|---|
| grafana.vps-i1.infra.zintegrowana.online | Grafana (Grafana login) |
| prometheus.vps-i1.infra.zintegrowana.online | Prometheus (basic_auth) |
| alertmanager.vps-i1.infra.zintegrowana.online | Alertmanager (basic_auth) |
| infra.zintegrowana.online | Grafana public alias |
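A quick reachability check against the public hostnames might look like the sketch below. The hostnames come from the Public URLs list; the health paths are the standard endpoints each service ships with. HTTPS and the helper names `health_url` and `check` are assumptions. Because Prometheus and Alertmanager sit behind basic_auth, a 401 response here would still show that the proxy in front of them is answering.

```python
import urllib.error
import urllib.request

# Hostnames from the Public URLs list. The paths are the services' standard
# health endpoints; whether they are reachable without credentials on this
# deployment is an assumption.
HEALTH_PATHS = {
    "grafana.vps-i1.infra.zintegrowana.online": "/api/health",
    "prometheus.vps-i1.infra.zintegrowana.online": "/-/healthy",
    "alertmanager.vps-i1.infra.zintegrowana.online": "/-/healthy",
}


def health_url(host, path, scheme="https"):
    """Build the full health-check URL for one host."""
    return f"{scheme}://{host}{path}"


def check(url, timeout=5.0):
    """Return the HTTP status code, or None if the host is unreachable."""
    try:
        with urllib.request.urlopen(url, timeout=timeout) as resp:
            return resp.status
    except urllib.error.HTTPError as exc:
        # The server answered, just not with a 2xx (for example 401).
        return exc.code
    except (urllib.error.URLError, OSError):
        return None
```

Looping over `HEALTH_PATHS.items()` and printing `check(health_url(host, path))` gives a one-line status per service.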
Improvement Proposals
Cross-references
README — how to act on alerts fired by Alertmanager
README — Mailgun SMTP used for alert emails