Supabase slow-query monitoring
GitHub issue: radieu/p24-infra#38
Surfaces slow / hot / regressed queries on the production Supabase project (mwkqmgadqnkkihjdeqsi) by reading pg_stat_statements from two angles:
- Live exploration — Grafana panels query the view directly via the existing
SupabasePostgreSQL datasource (grafana_readonly). Real-time, no Prometheus dependency. - Long-term history + alerting —
pg-stats-exporter(custom Python,monitoring/exporters/pg-stats-exporter/) polls the view every 60s, publishes top-200 rows as Prometheus gauges on:9201, scraped by the monitoring Prometheus, stored long-term in Thanos/Wasabi.
One-time Supabase setup
pg_stat_statements is preloaded on every Supabase project. Two SQL statements are required, run once in the Supabase SQL editor with an admin role:
-- Records the extension in the schema if not already present.
CREATE EXTENSION IF NOT EXISTS pg_stat_statements;
-- Grants the grafana_readonly role permission to read the view + pg_stat_database.
GRANT pg_read_all_stats TO grafana_readonly;Verification:
SET ROLE grafana_readonly;
SELECT count(*) FROM pg_stat_statements;
RESET ROLE;Files in this repo
| Path | Role |
|---|---|
monitoring/exporters/pg-stats-exporter/app.py | Polls pg_stat_statements, exposes Prometheus metrics on :9201 |
monitoring/exporters/pg-stats-exporter/Dockerfile | Container image (python:3.11-slim + psycopg2-binary) |
monitoring/exporters/pg-stats-exporter/requirements.txt | Pinned Python deps |
monitoring/docker-compose.yml | Adds pg-stats-exporter service |
monitoring/prometheus/prometheus.yml | Adds pg_stats scrape job (60s) |
monitoring/prometheus/rules/supabase-queries.yml | Alert rules (slow, regressed, high-frequency, exporter down) |
monitoring/grafana/provisioning/dashboards/supabase-slow-queries.json | Dashboard supabase-slow-queries (uid) |
Exposed metrics
Labels: queryid (stable per normalised statement) + fingerprint (collapsed-whitespace query text, truncated to 80 chars).
| Metric | Type | Source column |
|---|---|---|
supabase_query_calls_total | Gauge | pg_stat_statements.calls |
supabase_query_total_time_seconds | Gauge | total_exec_time / 1000 |
supabase_query_mean_time_seconds | Gauge | mean_exec_time / 1000 |
supabase_query_rows_total | Gauge | rows |
supabase_query_shared_blks_hit_total | Gauge | shared_blks_hit |
supabase_query_shared_blks_read_total | Gauge | shared_blks_read |
supabase_pg_stats_exporter_last_scrape_timestamp_seconds | Gauge | Wall-clock of last good scrape |
supabase_pg_stats_exporter_last_scrape_rows | Gauge | Rows pulled last cycle |
supabase_pg_stats_exporter_errors_total | Counter | Scrape errors by error_type |
The counters from
pg_stat_statementsare exposed as Prometheus gauges, not counters, because the underlying view can be reset (pg_stat_statements_reset()) and individual rows can disappear when crowded out — neither is compatible with the Prometheus counter contract.rate()over these gauges is still a reasonable approximation of calls/sec over short windows.
Alert rules — supabase_queries group
| Alert | Expression | Severity |
|---|---|---|
SupabasePgStatsExporterDown | absent(up{job="pg_stats"} == 1) for 5m | warning |
SupabasePgStatsScrapeStale | last_scrape > 5m old | warning |
SupabaseSlowQueryMeanHigh | any statement mean >1s for 10m | warning |
SupabaseSlowQueryMeanCritical | any statement mean >5s for 5m | critical |
SupabaseQueryRegressed | mean ≥2x vs 24h ago AND >250ms, for 15m | warning |
SupabaseHighFrequencyQuery | rate(calls[5m]) > 50 for 15m | warning |
Tune thresholds in monitoring/prometheus/rules/supabase-queries.yml after baselining for a week.
Dashboard
URL once deployed: https://grafana.vps-i1.infra.zintegrowana.online/d/supabase-slow-queries
Top section uses the Supabase PG datasource (live pg_stat_statements); bottom section uses Prometheus (history + rates). Refreshes every 1 minute.
Deploy
On vps-i1 (IONOS) the repo lives at /opt/p24-infra and is auto-pulled by the Claude PreToolUse hook; explicit deploy:
ssh root@217.154.82.162
cd /opt/p24-infra && git pull
cd monitoring
docker compose build pg-stats-exporter
docker compose up -d pg-stats-exporter
# Reload Prometheus + Grafana picks up new dashboard/scrape job
curl -X POST http://localhost:9090/-/reload
docker compose restart grafanaSmoke tests
# Exporter metrics
docker exec monitoring-prometheus-1 wget -qO- http://pg-stats-exporter:9201/metrics | head -50
# Prometheus sees the target
curl -s -u admin:$GRAFANA_ADMIN_PASSWORD \
'https://prometheus.vps-i1.infra.zintegrowana.online/api/v1/targets?state=active' \
| jq '.data.activeTargets[] | select(.labels.job=="pg_stats")'
# Rules loaded
curl -s -u admin:$GRAFANA_ADMIN_PASSWORD \
'https://prometheus.vps-i1.infra.zintegrowana.online/api/v1/rules' \
| jq '.data.groups[] | select(.name=="supabase_queries")'
# Live view from Supabase
psql "host=aws-1-eu-central-1.pooler.supabase.com user=grafana_readonly.mwkqmgadqnkkihjdeqsi sslmode=require dbname=postgres" \
-c "SELECT count(*) FROM pg_stat_statements;"Troubleshooting
| Symptom | Likely cause |
|---|---|
error_type="missing_view" counter rises | pg_stat_statements extension not created — run the CREATE EXTENSION block above |
error_type="connection" rises, target down | Supabase pooler unreachable — check IP allowlist / DNS / SUPABASE_GRAFANA_PASSWORD env on vps-i1 |
| No data in dashboard top panels | grafana_readonly lacks pg_read_all_stats — re-run the GRANT |
| Same queries appear forever even after they slowed down | Expected: gauges reflect the top-N snapshot; values reset to 0 only when the row drops off the list. Use rate() for short-window trends |