Monitoring Exporters — Operations Workbook
Covers all Prometheus exporters feeding the monitoring stack. All custom exporters run on IONOS VPS (vps-i1) as Docker containers within the monitoring compose stack. node-exporter and cAdvisor run on both vps-i1 and vps-h1.
Workbook last reviewed: 2026-05-15
Summary Table
| Exporter | Host(s) | Port | Source dir | Credentials |
|---|---|---|---|---|
| node-exporter | vps-i1, vps-h1 | 9100 | system binary / Docker | none |
| cAdvisor | vps-i1, vps-h1 | 8080 | Docker image | none |
| pg-stats-exporter | vps-i1 | 9201 | monitoring/exporters/pg-stats-exporter/ | SUPABASE_DB_HOST, SUPABASE_GRAFANA_PASSWORD |
| backup-exporter | vps-i1 | 9220 | monitoring/exporters/backup-exporter/ | P24_INFRA_WASABI_ACCESS_KEY, P24_INFRA_WASABI_SECRET_KEY (reads from s3://p24-infra, eu-central-2) |
| cost-exporter | vps-i1 | 9210 | monitoring/exporters/cost-exporter/ | VERCEL_TOKEN, SUPABASE_ACCESS_TOKEN, WASABI keys, OVH / SoYouStart keys (optional) |
| vercel-exporter | vps-i1 | 9202 | monitoring/exporters/vercel-exporter/ | VERCEL_TOKEN |
| queue-exporter | vps-i1 | 9200 | monitoring/exporters/queue-exporter/ | SUPABASE_DB_HOST, SUPABASE_GRAFANA_PASSWORD |
| grafana-image-renderer | vps-i1 | 8081 | Docker image (sidecar) | none |
1. node-exporter
What it does
Exposes host OS metrics: CPU, memory, disk I/O, network, filesystem usage, load average, and hundreds of kernel-level counters. Used by Prometheus alert rules for HighCPU, HighMemory, LowDisk, ServerDown.
Install / config location
vps-i1 (IONOS): Part of the monitoring docker-compose stack.
# /opt/p24-infra/monitoring/docker-compose.yml
node-exporter:
  image: prom/node-exporter:latest
  network_mode: host
  pid: host
  volumes:
    - /:/host:ro,rslave
  command:
    - --path.rootfs=/host
  restart: unless-stopped
vps-h1 (Hostinger): Runs as a host-network container via /root/docker-compose.yml.
# Verify running
docker ps --filter name=node-exporter
Backup
Configuration is in the git repo (monitoring/docker-compose.yml, hostinger/docker-compose.yml). Stateless — no persistent data. Backup = config in git.
Restore
# vps-i1
cd /opt/p24-infra/monitoring
git pull
docker compose up -d node-exporter
# vps-h1
cd /root
# docker compose up takes the service name; root-node-exporter-1 is the resulting container name
docker compose up -d node-exporter
Monitoring
Prometheus scrapes :9100/metrics every 15s (default). Alert ServerDown fires if up{job="node-exporter"} == 0 for >2 minutes.
Healthcheck
# vps-i1
curl -s http://localhost:9100/metrics | head -5
# vps-h1
curl -s http://localhost:9100/metrics | head -5
Password rotation
None. node-exporter requires no credentials.
2. cAdvisor
What it does
Exposes per-container resource metrics: CPU throttling, memory usage, network I/O, filesystem reads/writes. Provides the container_* metric family used in Grafana container dashboards and ContainerCrashLooping alerts.
Install / config location
vps-i1: Part of the monitoring docker-compose stack.
# /opt/p24-infra/monitoring/docker-compose.yml
cadvisor:
  image: gcr.io/cadvisor/cadvisor:latest
  ports:
    - "8080:8080"
  volumes:
    - /:/rootfs:ro
    - /var/run:/var/run:ro
    - /sys:/sys:ro
    - /var/lib/docker/:/var/lib/docker:ro
  restart: unless-stopped
vps-h1: Runs via /root/docker-compose.yml as root-cadvisor-1.
Backup
Stateless. Config in git. Backup = config in git.
Restore
# vps-i1
cd /opt/p24-infra/monitoring
git pull
docker compose up -d cadvisor
# vps-h1
cd /root
# docker compose up takes the service name; root-cadvisor-1 is the resulting container name
docker compose up -d cadvisor
Monitoring
Prometheus scrapes :8080/metrics. Alert fires if up{job="cadvisor"} drops to 0 for >2 minutes.
Healthcheck
curl -s http://localhost:8080/metrics | grep 'container_cpu_usage_seconds_total' | head -3
Password rotation
None. cAdvisor requires no credentials.
3. pg-stats-exporter
What it does
Reads Supabase extensions.pg_stat_statements via the session pooler, computes per-query performance stats (mean execution time, call count, rows returned), and publishes them as Prometheus gauges on :9201/metrics. Powers the slow-query Grafana dashboard.
Install / config location
Host: vps-i1 only. Part of the monitoring docker-compose stack.
Source: /opt/p24-infra/monitoring/exporters/pg-stats-exporter/
Compose file: /opt/p24-infra/monitoring/docker-compose.yml
Port: 9201
Scrape interval: 60s (configured in prometheus.yml)
Key environment variables (from .env):
| Variable | Purpose |
|---|---|
| SUPABASE_DB_HOST | Session pooler host (aws-1-eu-central-1.pooler.supabase.com) |
| SUPABASE_DB_PORT | Pooler port (5432) |
| SUPABASE_DB_NAME | Database name |
| SUPABASE_GRAFANA_PASSWORD | Password for grafana_readonly role |
Backup
Source code in git (monitoring/exporters/pg-stats-exporter/). No persistent data. Backup = git.
Restore
cd /opt/p24-infra/monitoring
git pull
docker compose up -d --no-deps --build pg-stats-exporter
Monitoring
Prometheus scrapes :9201/metrics. If the exporter is down, slow-query metrics vanish from Grafana. No dedicated alert exists for the exporter itself — covered by general up == 0 target-down detection.
Healthcheck
curl -s http://localhost:9201/metrics | grep 'pg_stat_statements'
Password rotation
| Credential | Key | Rotation handling |
|---|---|---|
| Supabase DB password | SUPABASE_GRAFANA_PASSWORD | Update .env on vps-i1, restart exporter. Also update GH Secret and .env.local. Log in secrets-rotation-log.md. |
4. backup-exporter
What it does
Fetches backups/supabase/metrics/backup-status.prom from s3://p24-infra/ (Wasabi eu-central-2) on each Prometheus scrape and re-exposes the metrics on :9220/metrics. The .prom file is written by the supabase-backup GitHub Actions workflow after each successful DB backup. Publishes backup age and backup size metrics used by the BackupStale and BackupSizeRegression alert rules.
Updated in PR #440 (2026-06-12): Previously the exporter read a local file at /opt/backups/backup-status.prom. It now reads directly from Wasabi to eliminate the local file dependency and use the correct bucket (p24-infra, eu-central-2) with dedicated IAM credentials.
Install / config location
Host: vps-i1 only. Part of the monitoring docker-compose stack.
Source: /opt/p24-infra/monitoring/exporters/backup-exporter/
Compose file: /opt/p24-infra/monitoring/docker-compose.yml
Port: 9220
Data source: s3://p24-infra/backups/supabase/metrics/backup-status.prom (Wasabi eu-central-2)
Key environment variables (from .env):
| Variable | Purpose |
|---|---|
| P24_INFRA_WASABI_ACCESS_KEY | Wasabi IAM access key for the p24-infra user |
| P24_INFRA_WASABI_SECRET_KEY | Wasabi IAM secret key for the p24-infra user |
| P24_INFRA_WASABI_BUCKET | p24-infra |
| P24_INFRA_WASABI_REGION | eu-central-2 |
| P24_INFRA_WASABI_ENDPOINT | s3.eu-central-2.wasabisys.com |
Backup
Source code in git. The .prom file in Wasabi is ephemeral state — it is re-written after each backup. Backup = git.
Restore
cd /opt/p24-infra/monitoring
git pull
docker compose up -d --no-deps --build backup-exporter
# Verify credentials are set in .env and Wasabi bucket is accessible
docker compose logs --tail=20 backup-exporter
Monitoring
Prometheus scrapes :9220/metrics. BackupStale alert fires if the backup_last_success_timestamp_seconds metric shows age >26h. If the exporter cannot reach Wasabi (bad credentials, network issue), it returns no metrics and BackupStale fires.
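The staleness check above can be reproduced offline against the text the exporter serves, which is handy when deciding whether an alert is genuine. The sketch below is a simplified illustration: `parse_gauge` and `is_stale` are hypothetical helper names, the parser handles only plain `name value` and `name{labels} value` lines, and treating a missing metric as stale mirrors the behaviour described in the paragraph above rather than the actual alert rule.

```python
import time
from typing import Optional

STALE_AFTER_SECONDS = 26 * 3600  # threshold quoted for the BackupStale alert


def parse_gauge(prom_text: str, metric: str) -> Optional[float]:
    """Return the first sample value for `metric`, or None if it is absent."""
    for line in prom_text.splitlines():
        line = line.strip()
        if not line or line.startswith("#"):
            continue  # skip blank lines and HELP/TYPE comments
        name = line.split("{", 1)[0].split(" ", 1)[0]
        if name != metric:
            continue
        # The value is the first field after the name and optional label set.
        after = line.rsplit("}", 1)[-1] if "{" in line else line[len(name):]
        return float(after.split()[0])
    return None


def backup_age_seconds(prom_text: str, now: Optional[float] = None) -> Optional[float]:
    """Seconds since the last successful backup, or None if unknown."""
    ts = parse_gauge(prom_text, "backup_last_success_timestamp_seconds")
    if ts is None:
        return None
    return (time.time() if now is None else now) - ts


def is_stale(prom_text: str, now: Optional[float] = None) -> bool:
    """True when the backup is older than 26h or the metric is missing."""
    age = backup_age_seconds(prom_text, now)
    return age is None or age > STALE_AFTER_SECONDS
```

Feeding it the output of the healthcheck curl (or the raw .prom file) gives the same yes/no answer the alert is expected to reach.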
Healthcheck
curl -s http://localhost:9220/metrics | grep 'backup_'
If empty, check logs for Wasabi S3 auth errors:
docker compose logs --tail=30 backup-exporter
# Common errors: InvalidAccessKeyId, SignatureDoesNotMatch, NoSuchKey
Password rotation
| Credential | Key | Rotation frequency | Next due |
|---|---|---|---|
| Wasabi p24-infra IAM access key | P24_INFRA_WASABI_ACCESS_KEY | 90 days | 2026-09-12 |
| Wasabi p24-infra IAM secret key | P24_INFRA_WASABI_SECRET_KEY | 90 days | 2026-09-12 |
Rotation procedure: See docs/cloud-services-operations.md → Wasabi S3 → Password/Credential Rotation → P24_INFRA_WASABI_ACCESS_KEY.
After rotating: update .env on vps-i1, restart backup-exporter, update GH Secrets and .env.local, delete old key from Wasabi, log in docs/secrets-rotation-log.md, update dev_r_services.last_rotated.
5. cost-exporter
What it does
Polls external SaaS and VPS provider APIs once per day and publishes monthly cost and usage metrics:
- Vercel: bandwidth and function invocation usage vs. limits (via Vercel REST API)
- Supabase: database size and egress (via Supabase management API)
- Wasabi S3: storage usage and request counts (via S3 API)
- OVH / SoYouStart: monthly billing amount per server (via OVH API v1, HMAC-SHA1 signed — added 2026-06-15)
Powers the VercelApproachingFreeTier and SupabaseDbSizeApproachingPro alert rules and the Cost dashboard in Grafana.
Install / config location
Host: vps-i1 only. Part of the monitoring docker-compose stack.
Source: /opt/p24-infra/monitoring/exporters/cost-exporter/
Compose file: /opt/p24-infra/monitoring/docker-compose.yml
Port: 9210
Refresh interval: once per day (data is cached in memory between refreshes and lost on restart)
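A once-per-day refresh with an in-memory cache might look roughly like the sketch below. It is an assumption about the general pattern, not the exporter's real implementation: `DailyCache`, the injected `collect` callable, and the retry-on-next-scrape behaviour after a failed collection are all hypothetical.

```python
import time
from typing import Callable, Dict, Optional

REFRESH_SECONDS = 24 * 3600  # "once per day"


class DailyCache:
    """Serve cached values and re-collect at most once per refresh interval."""

    def __init__(self, collect: Callable[[], Dict[str, float]],
                 clock: Callable[[], float] = time.time) -> None:
        self._collect = collect
        self._clock = clock
        self._values: Dict[str, float] = {}
        self._fetched_at: Optional[float] = None
        self.errors_total = 0

    def get(self) -> Dict[str, float]:
        now = self._clock()
        due = self._fetched_at is None or now - self._fetched_at >= REFRESH_SECONDS
        if due:
            try:
                self._values = self._collect()
                self._fetched_at = now
            except Exception:
                # Keep serving the previous values; the next call retries.
                self.errors_total += 1
        return dict(self._values)
```

Because the cache lives only in process memory, a container restart always triggers a fresh collection on the first scrape.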
Key environment variables (from .env):
| Variable | Purpose |
|---|---|
| VERCEL_TOKEN | Vercel REST API bearer token |
| SUPABASE_ACCESS_TOKEN | Supabase management API token |
| WASABI_ACCESS_KEY | Wasabi S3 access key (monitoring bucket scope) |
| WASABI_SECRET_KEY | Wasabi S3 secret key |
| OVH_APP_KEY | OVH API application key (read-only billing scope) |
| OVH_APP_SECRET | OVH API application secret |
| OVH_CONSUMER_KEY | OVH API consumer key (validated) |
| SYS_APP_KEY | SoYouStart API application key |
| SYS_APP_SECRET | SoYouStart API application secret |
| SYS_CONSUMER_KEY | SoYouStart API consumer key (validated) |
OVH/SoYouStart keys are optional — exporter logs a warning and increments the error counter if not set, but continues serving other metrics.
OVH / SoYouStart billing collectors
Added 2026-06-15. Calls GET /me/bill + GET /me/bill/{id}/details on the OVH EU API and the SoYouStart EU endpoint. Publishes:
| Metric | Labels | Description |
|---|---|---|
| ovh_monthly_bill_eur | provider, bill_id, description | Monthly invoice total in EUR |
| cost_collector_last_success_timestamp | provider=ovh, provider=sys | Unix timestamp of last successful collection |
| cost_collector_errors_total | provider | Cumulative errors per provider |
Auth: HMAC-SHA1 signed (the standard OVH API pattern). If credentials are missing, the exporter skips that provider and increments the error counter instead of crashing.
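For reference when debugging signature errors, the request signing can be sketched as follows. This assumes the scheme from OVH's public API documentation: the `X-Ovh-Signature` header is `$1$` followed by the SHA-1 hex digest of the application secret, consumer key, HTTP method, full URL, body, and timestamp joined with `+`. That is a SHA-1 digest over a string containing the secret rather than the HMAC construction from Python's `hmac` module, so verify it against the exporter source. The function names are hypothetical.

```python
import hashlib
from typing import Dict


def ovh_signature(app_secret: str, consumer_key: str, method: str,
                  url: str, body: str, timestamp: int) -> str:
    """Build the X-Ovh-Signature value for one request."""
    payload = "+".join([app_secret, consumer_key, method.upper(), url, body, str(timestamp)])
    return "$1$" + hashlib.sha1(payload.encode("utf-8")).hexdigest()


def ovh_headers(app_key: str, app_secret: str, consumer_key: str, method: str,
                url: str, body: str = "", timestamp: int = 0) -> Dict[str, str]:
    """Assemble the authentication headers for a signed OVH API call."""
    return {
        "X-Ovh-Application": app_key,
        "X-Ovh-Consumer": consumer_key,
        "X-Ovh-Timestamp": str(timestamp),
        "X-Ovh-Signature": ovh_signature(app_secret, consumer_key, method, url, body, timestamp),
    }


if __name__ == "__main__":
    # Placeholder credentials; the timestamp would normally be the current Unix time.
    print(ovh_headers("app", "secret", "consumer", "GET",
                      "https://eu.api.ovh.com/1.0/me/bill", timestamp=1700000000))
```

The same function would serve both providers, with OVH_* or SYS_* values passed in and the base URL switched to the SoYouStart EU endpoint.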
Backup
Source code in git. No persistent data — metrics are refreshed from APIs on each cycle. Backup = git.
Restore
cd /opt/p24-infra/monitoring
git pull
docker compose up -d --no-deps --build cost-exporter
Monitoring
Prometheus scrapes :9210/metrics. Metrics have a 25h staleness window — if the exporter is down for >25h, cost metrics disappear and related alerts may fire unexpectedly.
Healthcheck
curl -s http://localhost:9210/metrics | grep 'cost_'
Password rotation
| Credential | Key | Rotation handling |
|---|---|---|
| Vercel API token | VERCEL_TOKEN | Update .env on vps-i1 + GH Secret + .env.local. Restart exporter. Log rotation. |
| Supabase management token | SUPABASE_ACCESS_TOKEN | Same procedure. |
| Wasabi access key | WASABI_ACCESS_KEY | Generate new key pair in Wasabi console. Update .env, GH Secret, .env.local. Restart exporter. Log rotation. |
| Wasabi secret key | WASABI_SECRET_KEY | Same as access key rotation. |
| OVH API keys | OVH_APP_KEY/SECRET/CONSUMER_KEY | Regenerate in OVH console (API → Tokens). Update .env on vps-i1 + GH Secret. Restart exporter. |
| SoYouStart keys | SYS_APP_KEY/SECRET/CONSUMER_KEY | Same procedure via SoYouStart API console. |
6. vercel-exporter
What it does
Polls the Vercel Deployments API every 5 minutes and upserts deployment state (status, project name, git ref, created-at) into Supabase. Also publishes deployment counts and failure counts as Prometheus gauges on :9202/metrics. Feeds the deployment version Grafana dashboard.
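The aggregation from deployment records to gauges can be illustrated with a small sketch. It assumes each record is a dict with `name` (project) and `state` fields and that a failed deployment has state `ERROR`; those field names, the metric names, and the `project` label are assumptions for illustration, not confirmed details of the exporter.

```python
from collections import Counter
from typing import Dict, Iterable, Tuple


def summarize(deployments: Iterable[Dict[str, str]]) -> Tuple[Counter, Counter]:
    """Count total and failed deployments per project."""
    totals: Counter = Counter()
    failures: Counter = Counter()
    for d in deployments:
        project = d.get("name", "unknown")
        totals[project] += 1
        if d.get("state") == "ERROR":
            failures[project] += 1
    return totals, failures


def render(deployments: Iterable[Dict[str, str]]) -> str:
    """Render both counts as Prometheus gauges (hypothetical metric names)."""
    totals, failures = summarize(deployments)
    lines = ["# TYPE vercel_deployments gauge"]
    for project in sorted(totals):
        lines.append(f'vercel_deployments{{project="{project}"}} {totals[project]}')
    lines.append("# TYPE vercel_deployment_failures gauge")
    for project in sorted(totals):
        # Counter returns 0 for projects without failures.
        lines.append(f'vercel_deployment_failures{{project="{project}"}} {failures[project]}')
    return "\n".join(lines) + "\n"
```

Emitting an explicit 0 for projects without failures keeps the failure series present, so dashboards do not show gaps for healthy projects.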
Install / config location
Host: vps-i1 only. Part of the monitoring docker-compose stack.
Source: /opt/p24-infra/monitoring/exporters/vercel-exporter/
Compose file: /opt/p24-infra/monitoring/docker-compose.yml
Port: 9202
Poll interval: 5 minutes
Key environment variables (from .env):
| Variable | Purpose |
|---|---|
| VERCEL_TOKEN | Vercel REST API bearer token |
| SUPABASE_URL | Supabase project URL |
| SUPABASE_SERVICE_KEY | Supabase service_role key (for upserts) |
Backup
Source code in git. Supabase holds the upserted deployment records — those are covered by Supabase’s own backup. Backup = git + Supabase data.
Restore
cd /opt/p24-infra/monitoring
git pull
docker compose up -d --no-deps --build vercel-exporter
# Historical deployment data persists in Supabase
Monitoring
Prometheus scrapes :9202/metrics. If the exporter fails, Vercel deployment data in Grafana goes stale. Staleness visible because latest deployment timestamp stops updating.
Healthcheck
curl -s http://localhost:9202/metrics | grep 'vercel_'
Password rotation
| Credential | Key | Rotation handling |
|---|---|---|
| Vercel API token | VERCEL_TOKEN | Update .env on vps-i1 + GH Secret + .env.local. Restart exporter. Log rotation. |
| Supabase service key | SUPABASE_SERVICE_KEY | Update .env on vps-i1 + GH Secret + .env.local. Restart exporter. Log rotation. |
7. queue-exporter
What it does
Connects to Supabase as grafana_readonly, reads the dev_r_exporters_queues registry table to discover which application tables to monitor, then for each active table runs a GROUP BY status query and publishes row counts as the queue_depth_by_status Prometheus gauge. Supports the TranscriptionQueueCritical alert and queue-depth Grafana panels.
Which tables are monitored is controlled entirely by the dev_r_exporters_queues SQL table — no code change needed to add or remove a queue.
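The registry-driven pattern can be demonstrated end to end with an in-memory SQLite database standing in for Supabase. Everything specific here is an assumption made for the demo: the registry columns `table_name` and `active`, the example queue tables, and the `table` / `status` label names. Only `dev_r_exporters_queues` and `queue_depth_by_status` come from the description above.

```python
import sqlite3
from typing import Dict, Tuple


def quote_ident(name: str) -> str:
    """Quote an SQL identifier so a registry value cannot inject SQL."""
    return '"' + name.replace('"', '""') + '"'


def collect_queue_depths(conn: sqlite3.Connection) -> Dict[Tuple[str, str], int]:
    """Return {(table, status): row count} for every active registry entry."""
    tables = [row[0] for row in conn.execute(
        "SELECT table_name FROM dev_r_exporters_queues WHERE active = 1 ORDER BY table_name")]
    depths: Dict[Tuple[str, str], int] = {}
    for table in tables:
        query = f"SELECT status, COUNT(*) FROM {quote_ident(table)} GROUP BY status"
        for status, count in conn.execute(query):
            depths[(table, status)] = count
    return depths


def render(depths: Dict[Tuple[str, str], int]) -> str:
    """Render the counts as the queue_depth_by_status gauge."""
    lines = ["# TYPE queue_depth_by_status gauge"]
    for (table, status), count in sorted(depths.items()):
        lines.append(f'queue_depth_by_status{{table="{table}",status="{status}"}} {count}')
    return "\n".join(lines) + "\n"


def demo_connection() -> sqlite3.Connection:
    """Build a throwaway database with one active and one inactive queue."""
    conn = sqlite3.connect(":memory:")
    conn.executescript("""
        CREATE TABLE dev_r_exporters_queues (table_name TEXT, active INTEGER);
        CREATE TABLE transcription_jobs (id INTEGER, status TEXT);
        CREATE TABLE email_outbox (id INTEGER, status TEXT);
        INSERT INTO dev_r_exporters_queues VALUES ('transcription_jobs', 1), ('email_outbox', 0);
        INSERT INTO transcription_jobs VALUES (1, 'pending'), (2, 'pending'), (3, 'failed');
        INSERT INTO email_outbox VALUES (1, 'pending');
    """)
    return conn


if __name__ == "__main__":
    print(render(collect_queue_depths(demo_connection())), end="")
```

Table names read from a registry cannot be passed as query parameters, so they are quoted as identifiers; with psycopg2 the equivalent would be `psycopg2.sql.Identifier`.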
Install / config location
Host: vps-i1 only. Part of the monitoring docker-compose stack.
Source: /opt/p24-infra/monitoring/exporters/queue-exporter/
Compose file: /opt/p24-infra/monitoring/docker-compose.yml
Port: 9200
Poll interval: 60s
Key environment variables (from .env):
| Variable | Purpose |
|---|---|
| SUPABASE_DB_HOST | Session pooler host |
| SUPABASE_DB_PORT | Pooler port |
| SUPABASE_DB_NAME | Database name |
| SUPABASE_GRAFANA_PASSWORD | Password for grafana_readonly role |
Backup
Source code in git. Monitoring configuration (which queues to scrape) is in Supabase dev_r_exporters_queues — covered by Supabase backup. Backup = git + Supabase.
Restore
cd /opt/p24-infra/monitoring
git pull
docker compose up -d --no-deps --build queue-exporter
If dev_r_exporters_queues rows are lost, re-insert them from monitoring-stack-operations.md, where the queue list is documented.
Monitoring
Prometheus scrapes :9200/metrics. If the exporter is down, queue metrics disappear. TranscriptionQueueCritical will not fire while the exporter is down — this is a monitoring gap during outages.
Healthcheck
curl -s http://localhost:9200/metrics | grep 'queue_depth'
If the output is empty (no queue_depth lines), the exporter has no active queues or the DB connection failed. Check logs:
cd /opt/p24-infra/monitoring
docker compose logs --tail=30 queue-exporter
Password rotation
| Credential | Key | Rotation handling |
|---|---|---|
| Supabase DB password | SUPABASE_GRAFANA_PASSWORD | Update .env on vps-i1 + GH Secret + .env.local. Restart exporter. Log rotation. |
8. grafana-image-renderer
What it does
Headless Chromium sidecar that renders Grafana panels as PNG images on demand. Used by n8n daily report workflows to embed Grafana charts in WhatsApp / email messages. Listens on :8081 (internal only, not exposed externally).
Install / config location
Host: vps-i1 only. Part of the monitoring docker-compose stack.
# /opt/p24-infra/monitoring/docker-compose.yml
grafana-image-renderer:
  image: grafana/grafana-image-renderer:latest
  ports:
    - "127.0.0.1:8081:8081"
  environment:
    ENABLE_METRICS: "true"
  restart: unless-stopped
Grafana is configured to use it via GF_RENDERING_SERVER_URL=http://grafana-image-renderer:8081/render.
Backup
Stateless. No persistent data. Config in git. Backup = git.
Restore
cd /opt/p24-infra/monitoring
docker compose up -d grafana-image-renderer
Grafana will automatically reconnect once the sidecar is up.
Monitoring
No dedicated Prometheus scrape (though it optionally exposes metrics on :8081/metrics when ENABLE_METRICS=true). If the renderer is down, Grafana shows a “rendering failed” error in panels and n8n report images break.
Healthcheck
curl -s http://localhost:8081/
# Expected: some response (not connection refused)
# Check Grafana can reach it
docker compose logs grafana | grep -i render | tail -10
Password rotation
None. grafana-image-renderer requires no external credentials.
Common Operations
Rebuild a custom exporter after a code change
cd /opt/p24-infra/monitoring
git pull
# Force a clean rebuild (always use --no-cache to catch dep changes):
# Replace <exporter> with: queue-exporter, pg-stats-exporter, cost-exporter, vercel-exporter, backup-exporter, credential-exporter
docker compose build --no-cache <exporter>
docker compose up -d <exporter>
docker compose logs --tail=20 <exporter>
Do not use the --build flag with docker compose up: it uses the layer cache and may silently skip apt-get / pip updates. Always run build --no-cache first, then up -d.
Container image build — known issues
FROM inline comments break the parser
FROM python:3.13-slim # comment is invalid Dockerfile syntax. Docker treats the # as an extra argument and aborts with FROM requires either one or three arguments. Never add inline comments to FROM lines.
psycopg2-binary requires Python 3.13 wheel (≥ 2.9.10)
psycopg2-binary==2.9.9 has no Python 3.13 wheel. On --no-cache builds pip falls back to compiling from source, which fails because libpq-dev is absent in slim images. Pin to ≥ 2.9.10, which ships a pre-built wheel.
python:3.13-slim has no wget or curl
All custom exporter Dockerfiles install wget via:
RUN apt-get update && apt-get install -y --no-install-recommends wget && rm -rf /var/lib/apt/lists/*
Healthchecks use wget -qO- http://localhost:<port>/metrics. If wget is missing from a container, the healthcheck reports executable file not found in $PATH; rebuild the image.
Alpine busybox wget resolves localhost as IPv6 (::1)
Caddy’s admin API binds to IPv4 only. The healthcheck must use http://127.0.0.1:2019/config/ not http://localhost:2019/config/ — Alpine’s musl libc/busybox prefers ::1 when both 127.0.0.1 and ::1 are in /etc/hosts. Use explicit IPv4 for any healthcheck in Alpine-based images.
Check all exporter targets in Prometheus
Open https://prometheus.vps-i1.infra.zintegrowana.online/targets and verify all exporter targets show UP.
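The same check can be scripted against the Prometheus HTTP API (`/api/v1/targets`), which returns each active target's `health`, `scrapeUrl`, and `lastError`. The sketch below assumes the endpoint is reachable without authentication from where the script runs; the helper names are hypothetical.

```python
import json
import urllib.request
from typing import List, Tuple


def fetch_targets(base_url: str, timeout: float = 10.0) -> dict:
    """Fetch the active targets from a Prometheus server."""
    with urllib.request.urlopen(f"{base_url}/api/v1/targets?state=active", timeout=timeout) as resp:
        return json.load(resp)


def down_targets(targets_response: dict) -> List[Tuple[str, str, str]]:
    """Return (job, scrapeUrl, lastError) for every target whose health is not 'up'."""
    active = targets_response.get("data", {}).get("activeTargets", [])
    return [
        (t.get("labels", {}).get("job", "?"), t.get("scrapeUrl", ""), t.get("lastError", ""))
        for t in active
        if t.get("health") != "up"
    ]


if __name__ == "__main__":
    response = fetch_targets("https://prometheus.vps-i1.infra.zintegrowana.online")
    for job, url, error in down_targets(response):
        print(f"DOWN {job} {url} {error}")
```

An empty result means every active target is UP; any printed line names the job to take through the recovery procedure.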
Exporter is DOWN — general recovery procedure
- Check container status:
docker compose ps
- Check logs:
docker compose logs --tail=50 <exporter>
- Verify .env has correct credentials
- Restart:
docker compose restart <exporter>
- Confirm metrics endpoint:
curl -s http://localhost:<port>/metrics | head -5
- If still failing, rebuild without the layer cache:
docker compose build --no-cache <exporter>
docker compose up -d <exporter>