Monitoring Exporters — Operations Workbook

Covers all Prometheus exporters feeding the monitoring stack. All custom exporters run on IONOS VPS (vps-i1) as Docker containers within the monitoring compose stack. node-exporter and cAdvisor run on both vps-i1 and vps-h1.

Workbook last reviewed: 2026-05-15

Summary Table

Exporter	Host(s)	Port	Source dir	Credentials
node-exporter	vps-i1, vps-h1	9100	system binary / Docker	none
cAdvisor	vps-i1, vps-h1	8080	Docker image	none
pg-stats-exporter	vps-i1	9201	`monitoring/exporters/pg-stats-exporter/`	SUPABASE_DB_HOST, SUPABASE_GRAFANA_PASSWORD
backup-exporter	vps-i1	9220	`monitoring/exporters/backup-exporter/`	P24_INFRA_WASABI_ACCESS_KEY, P24_INFRA_WASABI_SECRET_KEY (reads from s3://p24-infra, eu-central-2)
cost-exporter	vps-i1	9210	`monitoring/exporters/cost-exporter/`	VERCEL_TOKEN, SUPABASE_ACCESS_TOKEN, WASABI_ACCESS_KEY, WASABI_SECRET_KEY, optional OVH_* and SYS_* keys
vercel-exporter	vps-i1	9202	`monitoring/exporters/vercel-exporter/`	VERCEL_TOKEN, SUPABASE_SERVICE_KEY
queue-exporter	vps-i1	9200	`monitoring/exporters/queue-exporter/`	SUPABASE_DB_HOST, SUPABASE_GRAFANA_PASSWORD
grafana-image-renderer	vps-i1	8081	Docker image (sidecar)	none

1. node-exporter

What it does

Exposes host OS metrics: CPU, memory, disk I/O, network, filesystem usage, load average, and hundreds of kernel-level counters. Used by Prometheus alert rules for HighCPU, HighMemory, LowDisk, ServerDown.

Install / config location

vps-i1 (IONOS): Part of the monitoring docker-compose stack.

# /opt/p24-infra/monitoring/docker-compose.yml
node-exporter:
  image: prom/node-exporter:latest
  network_mode: host
  pid: host
  volumes:
    - /:/host:ro,rslave
  command:
    - --path.rootfs=/host
  restart: unless-stopped

vps-h1 (Hostinger): Runs as a host-network container via /root/docker-compose.yml.

# Verify running
docker ps --filter name=node-exporter

Backup

Configuration is in the git repo (monitoring/docker-compose.yml, hostinger/docker-compose.yml). Stateless — no persistent data. Backup = config in git.

Restore

# vps-i1
cd /opt/p24-infra/monitoring
git pull
docker compose up -d node-exporter
 
# vps-h1
cd /root
# docker compose up takes the service name; root-node-exporter-1 is the resulting container name
docker compose up -d node-exporter

Monitoring

Prometheus scrapes :9100/metrics every 15s (default). Alert ServerDown fires if up{job="node-exporter"} == 0 for >2 minutes.

Healthcheck

# vps-i1
curl -s http://localhost:9100/metrics | head -5
 
# vps-h1
curl -s http://localhost:9100/metrics | head -5

Password rotation

None. node-exporter requires no credentials.

2. cAdvisor

What it does

Exposes per-container resource metrics: CPU throttling, memory usage, network I/O, filesystem reads/writes. Provides the container_* metric family used in Grafana container dashboards and ContainerCrashLooping alerts.

Install / config location

vps-i1: Part of the monitoring docker-compose stack.

# /opt/p24-infra/monitoring/docker-compose.yml
cadvisor:
  image: gcr.io/cadvisor/cadvisor:latest
  ports:
    - "8080:8080"
  volumes:
    - /:/rootfs:ro
    - /var/run:/var/run:ro
    - /sys:/sys:ro
    - /var/lib/docker/:/var/lib/docker:ro
  restart: unless-stopped

vps-h1: Runs via /root/docker-compose.yml as root-cadvisor-1.

Backup

Stateless. Config in git. Backup = config in git.

Restore

# vps-i1
cd /opt/p24-infra/monitoring
git pull
docker compose up -d cadvisor
 
# vps-h1
cd /root
# docker compose up takes the service name; root-cadvisor-1 is the resulting container name
docker compose up -d cadvisor

Monitoring

Prometheus scrapes :8080/metrics. Alert fires if up{job="cadvisor"} drops to 0 for >2 minutes.

Healthcheck

curl -s http://localhost:8080/metrics | grep 'container_cpu_usage_seconds_total' | head -3

Password rotation

None. cAdvisor requires no credentials.

3. pg-stats-exporter

What it does

Reads Supabase extensions.pg_stat_statements via the session pooler, computes per-query performance stats (mean execution time, call count, rows returned), and publishes them as Prometheus gauges on :9201/metrics. Powers the slow-query Grafana dashboard.
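
A minimal sketch of the publishing step described above, assuming the rows have already been fetched from pg_stat_statements. The QueryStat type, the render_metrics helper, and the metric names are illustrative assumptions, not the exporter's actual code.

```python
from typing import Iterable, NamedTuple


class QueryStat(NamedTuple):
    """One pg_stat_statements row, reduced to the fields named in this workbook."""
    queryid: int
    calls: int
    mean_exec_time_ms: float
    rows: int


def render_metrics(stats: Iterable[QueryStat]) -> str:
    """Render rows as Prometheus text exposition output.

    Samples are grouped per metric family, with the TYPE line first,
    as the text format expects. Metric names here are hypothetical.
    """
    stats = list(stats)
    families = [
        ("pg_stat_statements_calls", lambda s: s.calls),
        ("pg_stat_statements_mean_exec_time_ms", lambda s: s.mean_exec_time_ms),
        ("pg_stat_statements_rows", lambda s: s.rows),
    ]
    lines = []
    for name, getter in families:
        lines.append(f"# TYPE {name} gauge")
        for s in stats:
            lines.append(f'{name}{{queryid="{s.queryid}"}} {getter(s)}')
    return "\n".join(lines) + "\n"


if __name__ == "__main__":
    print(render_metrics([QueryStat(42, 10, 1.5, 100)]), end="")
```

Labelling by queryid rather than by query text keeps label values short and avoids escaping SQL inside label strings.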

Install / config location

Host: vps-i1 only. Part of the monitoring docker-compose stack.

Source:       /opt/p24-infra/monitoring/exporters/pg-stats-exporter/
Compose file: /opt/p24-infra/monitoring/docker-compose.yml
Port:         9201
Scrape interval: 60s (configured in prometheus.yml)

Key environment variables (from .env):

Variable	Purpose
`SUPABASE_DB_HOST`	Session pooler host (aws-1-eu-central-1.pooler.supabase.com)
`SUPABASE_DB_PORT`	Pooler port (5432)
`SUPABASE_DB_NAME`	Database name
`SUPABASE_GRAFANA_PASSWORD`	Password for `grafana_readonly` role

Backup

Source code in git (monitoring/exporters/pg-stats-exporter/). No persistent data. Backup = git.

Restore

cd /opt/p24-infra/monitoring
git pull
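# Build without the layer cache first (see Common Operations), then recreate the container
docker compose build --no-cache pg-stats-exporter
docker compose up -d --no-deps pg-stats-exporter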

Monitoring

Prometheus scrapes :9201/metrics. If the exporter is down, slow-query metrics vanish from Grafana. No dedicated alert exists for the exporter itself — covered by general up == 0 target-down detection.

Healthcheck

curl -s http://localhost:9201/metrics | grep 'pg_stat_statements'

Password rotation

Credential	Key	Rotation handling
Supabase DB password	`SUPABASE_GRAFANA_PASSWORD`	Update `.env` on vps-i1, restart exporter. Also update GH Secret and `.env.local`. Log in `secrets-rotation-log.md`.

4. backup-exporter

What it does

Fetches backups/supabase/metrics/backup-status.prom from s3://p24-infra/ (Wasabi eu-central-2) on each Prometheus scrape and re-exposes the metrics on :9220/metrics. The .prom file is written by the supabase-backup GitHub Actions workflow after each successful DB backup. Publishes backup age and backup size metrics used by the BackupStale and BackupSizeRegression alert rules.

Updated in PR #440 (2026-06-12): Previously the exporter read a local file at /opt/backups/backup-status.prom. It now reads directly from Wasabi to eliminate the local file dependency and use the correct bucket (p24-infra, eu-central-2) with dedicated IAM credentials.

Install / config location

Host: vps-i1 only. Part of the monitoring docker-compose stack.

Source:       /opt/p24-infra/monitoring/exporters/backup-exporter/
Compose file: /opt/p24-infra/monitoring/docker-compose.yml
Port:         9220
Metrics file: s3://p24-infra/backups/supabase/metrics/backup-status.prom (Wasabi eu-central-2)

Key environment variables (from .env):

Variable	Purpose
`P24_INFRA_WASABI_ACCESS_KEY`	Wasabi IAM access key for the `p24-infra` user
`P24_INFRA_WASABI_SECRET_KEY`	Wasabi IAM secret key for the `p24-infra` user
`P24_INFRA_WASABI_BUCKET`	`p24-infra`
`P24_INFRA_WASABI_REGION`	`eu-central-2`
`P24_INFRA_WASABI_ENDPOINT`	`s3.eu-central-2.wasabisys.com`

Backup

Source code in git. The .prom file in Wasabi is ephemeral state — it is re-written after each backup. Backup = git.

Restore

cd /opt/p24-infra/monitoring
git pull
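# Build without the layer cache first (see Common Operations), then recreate the container
docker compose build --no-cache backup-exporter
docker compose up -d --no-deps backup-exporter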
# Verify credentials are set in .env and Wasabi bucket is accessible
docker compose logs --tail=20 backup-exporter

Monitoring

Prometheus scrapes :9220/metrics. BackupStale alert fires if the backup_last_success_timestamp_seconds metric shows age >26h. If the exporter cannot reach Wasabi (bad credentials, network issue), it returns no metrics and BackupStale fires.
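
The 26h staleness condition can be sketched against a .prom payload as follows. This illustrates the check only and is not the actual alert rule; the helper names are hypothetical, and treating a missing metric as stale is an assumption that mirrors the behaviour described above.

```python
import time
from typing import Optional

METRIC = "backup_last_success_timestamp_seconds"
STALE_AFTER_SECONDS = 26 * 3600  # 26h threshold quoted for BackupStale


def last_success_timestamp(prom_text: str) -> Optional[float]:
    """Return the metric's value, or None if the payload does not contain it.

    Assumes simple sample lines without spaces inside label values.
    """
    for line in prom_text.splitlines():
        line = line.strip()
        if not line or line.startswith("#"):
            continue
        parts = line.split()
        name = parts[0].split("{", 1)[0]
        if name == METRIC and len(parts) >= 2:
            return float(parts[1])
    return None


def is_stale(prom_text: str, now: Optional[float] = None) -> bool:
    """True if the last successful backup is older than 26h or unknown."""
    ts = last_success_timestamp(prom_text)
    if ts is None:
        return True  # assumption: no metric means the backup state is unknown
    now = time.time() if now is None else now
    return now - ts > STALE_AFTER_SECONDS
```

Passing now explicitly makes the check easy to exercise with fixed timestamps.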

Healthcheck

curl -s http://localhost:9220/metrics | grep 'backup_'

If empty, check logs for Wasabi S3 auth errors:

docker compose logs --tail=30 backup-exporter
# Common errors: InvalidAccessKeyId, SignatureDoesNotMatch, NoSuchKey

Password rotation

Credential	Key	Rotation frequency	Next due
Wasabi p24-infra IAM access key	`P24_INFRA_WASABI_ACCESS_KEY`	90 days	2026-09-12
Wasabi p24-infra IAM secret key	`P24_INFRA_WASABI_SECRET_KEY`	90 days	2026-09-12

Rotation procedure: See docs/cloud-services-operations.md → Wasabi S3 → Password/Credential Rotation → P24_INFRA_WASABI_ACCESS_KEY.

After rotating: update .env on vps-i1, restart backup-exporter, update GH Secrets and .env.local, delete old key from Wasabi, log in docs/secrets-rotation-log.md, update dev_r_services.last_rotated.

5. cost-exporter

What it does

Polls external SaaS and VPS provider APIs once per day and publishes monthly cost and usage metrics:

Vercel: bandwidth and function invocation usage vs. limits (via Vercel REST API)
Supabase: database size and egress (via Supabase management API)
Wasabi S3: storage usage and request counts (via S3 API)
OVH / SoYouStart: monthly billing amount per server (via OVH API v1, SHA-1 signed requests; added 2026-06-15)

Powers the VercelApproachingFreeTier and SupabaseDbSizeApproachingPro alert rules and the Cost dashboard in Grafana.

Install / config location

Host: vps-i1 only. Part of the monitoring docker-compose stack.

Source:       /opt/p24-infra/monitoring/exporters/cost-exporter/
Compose file: /opt/p24-infra/monitoring/docker-compose.yml
Port:         9210
Refresh interval: once per day (data cached in memory between refreshes; the cache is lost on restart)
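
A daily in-memory cache of this kind might look like the sketch below. DailyCache and its parameters are hypothetical names for illustration; the exporter's real refresh logic may be structured differently (for example, as a background thread).

```python
import time
from typing import Callable, Dict, Optional

REFRESH_SECONDS = 24 * 3600  # once per day


class DailyCache:
    """Serve cached values and call fetch() again only after the interval elapses."""

    def __init__(
        self,
        fetch: Callable[[], Dict[str, float]],
        clock: Callable[[], float] = time.monotonic,
    ) -> None:
        self._fetch = fetch
        self._clock = clock
        self._data: Dict[str, float] = {}
        self._fetched_at: Optional[float] = None  # None until the first fetch

    def get(self) -> Dict[str, float]:
        now = self._clock()
        if self._fetched_at is None or now - self._fetched_at >= REFRESH_SECONDS:
            self._data = self._fetch()
            self._fetched_at = now
        return self._data
```

Because the data lives only in process memory, a container restart triggers a fresh fetch on the first scrape.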

Key environment variables (from .env):

Variable	Purpose
`VERCEL_TOKEN`	Vercel REST API bearer token
`SUPABASE_ACCESS_TOKEN`	Supabase management API token
`WASABI_ACCESS_KEY`	Wasabi S3 access key (monitoring bucket scope)
`WASABI_SECRET_KEY`	Wasabi S3 secret key
`OVH_APP_KEY`	OVH API application key (read-only billing scope)
`OVH_APP_SECRET`	OVH API application secret
`OVH_CONSUMER_KEY`	OVH API consumer key (validated)
`SYS_APP_KEY`	SoYouStart API application key
`SYS_APP_SECRET`	SoYouStart API application secret
`SYS_CONSUMER_KEY`	SoYouStart API consumer key (validated)

OVH/SoYouStart keys are optional. If they are not set, the exporter logs a warning and increments the error counter for that provider, but continues serving other metrics.
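
The optional-credentials behaviour can be sketched as below. The environment variable names and the ovh / sys provider labels come from this workbook; the enabled_providers helper and the errors dictionary are hypothetical stand-ins for the exporter's own collector setup and error counter.

```python
import logging
from typing import Dict, List, Mapping

REQUIRED_KEYS = {
    "ovh": ("OVH_APP_KEY", "OVH_APP_SECRET", "OVH_CONSUMER_KEY"),
    "sys": ("SYS_APP_KEY", "SYS_APP_SECRET", "SYS_CONSUMER_KEY"),
}


def enabled_providers(env: Mapping[str, str], errors: Dict[str, int]) -> List[str]:
    """Return providers with complete credentials; count and log the rest."""
    enabled = []
    for provider, keys in REQUIRED_KEYS.items():
        missing = [key for key in keys if not env.get(key)]
        if missing:
            logging.warning(
                "skipping %s collector, missing: %s", provider, ", ".join(missing)
            )
            errors[provider] = errors.get(provider, 0) + 1
            continue  # skip this provider, keep serving the others
        enabled.append(provider)
    return enabled
```

In a real process, env would typically be os.environ and errors would back the cost_collector_errors_total counter.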

OVH / SoYouStart billing collectors

Added 2026-06-15. Calls GET /me/bill + GET /me/bill/{id}/details on the OVH EU API and the SoYouStart EU endpoint. Publishes:

Metric	Labels	Description
`ovh_monthly_bill_eur`	`provider`, `bill_id`, `description`	Monthly invoice total in EUR
`cost_collector_last_success_timestamp`	`provider=ovh`, `provider=sys`	Unix timestamp of last successful collection
`cost_collector_errors_total`	`provider`	Cumulative errors per provider

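Auth: requests are signed with the standard OVH API pattern (a SHA-1 digest over the application secret, consumer key, method, URL, body, and timestamp). If credentials are missing, the exporter skips that provider and increments the error counter (no crash).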
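
For orientation, the OVH API's publicly documented signing scheme is a SHA-1 hex digest over the application secret, consumer key, HTTP method, full URL, body, and timestamp, joined with + and prefixed with $1$. The sketch below follows that scheme; the function names are hypothetical and the exporter's own implementation may differ.

```python
import hashlib
from typing import Dict


def ovh_signature(
    app_secret: str,
    consumer_key: str,
    method: str,
    url: str,
    body: str,
    timestamp: int,
) -> str:
    """Build the X-Ovh-Signature value for one request."""
    to_sign = "+".join(
        [app_secret, consumer_key, method.upper(), url, body, str(timestamp)]
    )
    return "$1$" + hashlib.sha1(to_sign.encode("utf-8")).hexdigest()


def ovh_headers(
    app_key: str,
    app_secret: str,
    consumer_key: str,
    method: str,
    url: str,
    body: str,
    timestamp: int,
) -> Dict[str, str]:
    """Return the four authentication headers for a signed request."""
    return {
        "X-Ovh-Application": app_key,
        "X-Ovh-Consumer": consumer_key,
        "X-Ovh-Timestamp": str(timestamp),
        "X-Ovh-Signature": ovh_signature(
            app_secret, consumer_key, method, url, body, timestamp
        ),
    }
```

For a GET such as https://eu.api.ovh.com/1.0/me/bill the body is the empty string. The timestamp should come from the API's own clock (the /auth/time endpoint) rather than the local one, to avoid rejections caused by clock drift.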

Backup

Source code in git. No persistent data — metrics are refreshed from APIs on each cycle. Backup = git.

Restore

cd /opt/p24-infra/monitoring
git pull
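# Build without the layer cache first (see Common Operations), then recreate the container
docker compose build --no-cache cost-exporter
docker compose up -d --no-deps cost-exporter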

Monitoring

Prometheus scrapes :9210/metrics. Metrics have a 25h staleness window — if the exporter is down for >25h, cost metrics disappear and related alerts may fire unexpectedly.

Healthcheck

curl -s http://localhost:9210/metrics | grep 'cost_'

Password rotation

Credential	Key	Rotation handling
Vercel API token	`VERCEL_TOKEN`	Update `.env` on vps-i1 + GH Secret + `.env.local`. Restart exporter. Log rotation.
Supabase management token	`SUPABASE_ACCESS_TOKEN`	Same procedure.
Wasabi access key	`WASABI_ACCESS_KEY`	Generate new key pair in Wasabi console. Update `.env`, GH Secret, `.env.local`. Restart exporter. Log rotation.
Wasabi secret key	`WASABI_SECRET_KEY`	Same as access key rotation.
OVH API keys	`OVH_APP_KEY/SECRET/CONSUMER_KEY`	Regenerate in OVH console (API → Tokens). Update `.env` on vps-i1 + GH Secret. Restart exporter.
SoYouStart keys	`SYS_APP_KEY/SECRET/CONSUMER_KEY`	Same procedure via SoYouStart API console.

6. vercel-exporter

What it does

Polls the Vercel Deployments API every 5 minutes and upserts deployment state (status, project name, git ref, created-at) into Supabase. Also publishes deployment counts and failure counts as Prometheus gauges on :9202/metrics. Feeds the deployment version Grafana dashboard.
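
The counting step could look like the sketch below. It assumes each deployment record carries a state field with values such as READY or ERROR; that field name, the FAILED_STATES set, and the summarize helper are assumptions for illustration, so check them against the payload the exporter actually receives.

```python
from collections import Counter
from typing import Dict, Iterable, Mapping

FAILED_STATES = {"ERROR"}  # assumption: only ERROR counts as a failed deployment


def summarize(deployments: Iterable[Mapping[str, str]]) -> Dict[str, int]:
    """Return total and failed deployment counts for one polling cycle."""
    by_state = Counter(d.get("state", "UNKNOWN") for d in deployments)
    return {
        "total": sum(by_state.values()),
        "failed": sum(by_state[state] for state in FAILED_STATES),
    }


if __name__ == "__main__":
    sample = [{"state": "READY"}, {"state": "ERROR"}, {"state": "READY"}]
    print(summarize(sample))
```

Records without a state are counted in the total but never as failures, which keeps a malformed response from inflating the failure gauge.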

Install / config location

Host: vps-i1 only. Part of the monitoring docker-compose stack.

Source:       /opt/p24-infra/monitoring/exporters/vercel-exporter/
Compose file: /opt/p24-infra/monitoring/docker-compose.yml
Port:         9202
Poll interval: 5 minutes

Key environment variables (from .env):

Variable	Purpose
`VERCEL_TOKEN`	Vercel REST API bearer token
`SUPABASE_URL`	Supabase project URL
`SUPABASE_SERVICE_KEY`	Supabase service_role key (for upserts)

Backup

Source code in git. Supabase holds the upserted deployment records — those are covered by Supabase’s own backup. Backup = git + Supabase data.

Restore

cd /opt/p24-infra/monitoring
git pull
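# Build without the layer cache first (see Common Operations), then recreate the container
docker compose build --no-cache vercel-exporter
docker compose up -d --no-deps vercel-exporter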
# Historical deployment data persists in Supabase

Monitoring

Prometheus scrapes :9202/metrics. If the exporter fails, Vercel deployment data in Grafana goes stale. The staleness is visible because the latest deployment timestamp stops updating.

Healthcheck

curl -s http://localhost:9202/metrics | grep 'vercel_'

Password rotation

Credential	Key	Rotation handling
Vercel API token	`VERCEL_TOKEN`	Update `.env` on vps-i1 + GH Secret + `.env.local`. Restart exporter. Log rotation.
Supabase service key	`SUPABASE_SERVICE_KEY`	Update `.env` on vps-i1 + GH Secret + `.env.local`. Restart exporter. Log rotation.

7. queue-exporter

What it does

Connects to Supabase as grafana_readonly, reads the dev_r_exporters_queues registry table to discover which application tables to monitor, then for each active table runs a GROUP BY status query and publishes row counts as the queue_depth_by_status Prometheus gauge. Supports the TranscriptionQueueCritical alert and queue-depth Grafana panels.

Which tables are monitored is controlled entirely by the dev_r_exporters_queues SQL table — no code change needed to add or remove a queue.
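
The registry-driven discovery can be sketched as below, with sqlite3 standing in for the Supabase Postgres connection so the example is self-contained. The registry column names (table_name, active) and the collect_queue_depths helper are assumptions; only the dev_r_exporters_queues table name and the GROUP BY status approach come from this workbook.

```python
import re
import sqlite3
from typing import Dict, Tuple

# Table names are interpolated into SQL, so accept plain identifiers only.
IDENTIFIER = re.compile(r"^[A-Za-z_][A-Za-z0-9_]*$")


def collect_queue_depths(conn: sqlite3.Connection) -> Dict[Tuple[str, str], int]:
    """Return {(table, status): row_count} for every active registry entry."""
    tables = [
        row[0]
        for row in conn.execute(
            "SELECT table_name FROM dev_r_exporters_queues WHERE active = 1"
        )
    ]
    depths: Dict[Tuple[str, str], int] = {}
    for table in tables:
        if not IDENTIFIER.match(table):
            continue  # ignore registry rows that are not safe identifiers
        query = f'SELECT status, COUNT(*) FROM "{table}" GROUP BY status'
        for status, count in conn.execute(query):
            depths[(table, status)] = count
    return depths
```

Each (table, status) pair would map to one sample of the queue_depth_by_status gauge. Validating the identifier matters because the table name comes from data rather than code.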

Install / config location

Host: vps-i1 only. Part of the monitoring docker-compose stack.

Source:       /opt/p24-infra/monitoring/exporters/queue-exporter/
Compose file: /opt/p24-infra/monitoring/docker-compose.yml
Port:         9200
Poll interval: 60s

Key environment variables (from .env):

Variable	Purpose
`SUPABASE_DB_HOST`	Session pooler host
`SUPABASE_DB_PORT`	Pooler port
`SUPABASE_DB_NAME`	Database name
`SUPABASE_GRAFANA_PASSWORD`	Password for `grafana_readonly` role

Backup

Source code in git. Monitoring configuration (which queues to scrape) is in Supabase dev_r_exporters_queues — covered by Supabase backup. Backup = git + Supabase.

Restore

cd /opt/p24-infra/monitoring
git pull
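# Build without the layer cache first (see Common Operations), then recreate the container
docker compose build --no-cache queue-exporter
docker compose up -d --no-deps queue-exporter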

If dev_r_exporters_queues rows are lost, re-insert from monitoring-stack-operations.md — queue list is documented there.

Monitoring

Prometheus scrapes :9200/metrics. If the exporter is down, queue metrics disappear. TranscriptionQueueCritical will not fire while the exporter is down — this is a monitoring gap during outages.

Healthcheck

curl -s http://localhost:9200/metrics | grep 'queue_depth'

If the output is empty (no queue_depth lines), the exporter has no active queues or the DB connection failed. Check logs:

cd /opt/p24-infra/monitoring
docker compose logs --tail=30 queue-exporter

Password rotation

Credential	Key	Rotation handling
Supabase DB password	`SUPABASE_GRAFANA_PASSWORD`	Update `.env` on vps-i1 + GH Secret + `.env.local`. Restart exporter. Log rotation.

8. grafana-image-renderer

What it does

Headless Chromium sidecar that renders Grafana panels as PNG images on demand. Used by n8n daily report workflows to embed Grafana charts in WhatsApp / email messages. Listens on :8081 (internal only, not exposed externally).

Install / config location

Host: vps-i1 only. Part of the monitoring docker-compose stack.

# /opt/p24-infra/monitoring/docker-compose.yml
grafana-image-renderer:
  image: grafana/grafana-image-renderer:latest
  ports:
    - "127.0.0.1:8081:8081"
  environment:
    ENABLE_METRICS: "true"
  restart: unless-stopped

Grafana is configured to use it via GF_RENDERING_SERVER_URL=http://grafana-image-renderer:8081/render.

Backup

Stateless. No persistent data. Config in git. Backup = git.

Restore

cd /opt/p24-infra/monitoring
docker compose up -d grafana-image-renderer

Grafana will automatically reconnect once the sidecar is up.

Monitoring

No dedicated Prometheus scrape (though it optionally exposes metrics on :8081/metrics when ENABLE_METRICS=true). If the renderer is down, Grafana shows a “rendering failed” error in panels and n8n report images break.

Healthcheck

curl -s http://localhost:8081/
# Expected: some response (not connection refused)
 
# Check Grafana can reach it
docker compose logs grafana | grep -i render | tail -10

Password rotation

None. grafana-image-renderer requires no external credentials.

Common Operations

Rebuild a custom exporter after a code change

cd /opt/p24-infra/monitoring
git pull
# Force a clean rebuild (always use --no-cache to catch dep changes):
# Replace <exporter> with: queue-exporter, pg-stats-exporter, cost-exporter, vercel-exporter, backup-exporter, credential-exporter
docker compose build --no-cache <exporter>
docker compose up -d <exporter>
docker compose logs --tail=20 <exporter>

Do not use the --build flag with docker compose up: it reuses the layer cache and may silently skip apt-get / pip updates. Always run build --no-cache first, then up -d.

Container image build — known issues

FROM inline comments break the parser

FROM python:3.13-slim # comment is invalid Dockerfile syntax. Docker treats the # as an extra argument and aborts with FROM requires either one or three arguments. Never add inline comments to FROM lines.
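
A small pre-build check for this problem might look like the sketch below. The function name is hypothetical, and the check is a plain text scan rather than a full Dockerfile parser.

```python
from typing import List


def from_lines_with_inline_comments(dockerfile_text: str) -> List[int]:
    """Return 1-based line numbers of FROM instructions that contain a #.

    Whole-line comments start with # and are therefore never matched.
    """
    bad = []
    for number, line in enumerate(dockerfile_text.splitlines(), start=1):
        stripped = line.strip()
        if stripped.upper().startswith("FROM ") and "#" in stripped:
            bad.append(number)
    return bad


if __name__ == "__main__":
    example = "# base image\nFROM python:3.13-slim # pinned\nRUN echo ok\n"
    print(from_lines_with_inline_comments(example))
```

Running a check like this before docker compose build surfaces the offending line number instead of the less specific parser error.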

psycopg2-binary requires a Python 3.13 wheel (≥ 2.9.10)

psycopg2-binary==2.9.9 has no Python 3.13 wheel. On --no-cache builds pip tries to compile it from source and fails because libpq-dev is absent in slim images. Pin to ≥ 2.9.10, which ships a pre-built wheel.

python:3.13-slim has no wget or curl

All custom exporter Dockerfiles install wget via:

RUN apt-get update && apt-get install -y --no-install-recommends wget && rm -rf /var/lib/apt/lists/*

Healthchecks use wget -qO- http://localhost:<port>/metrics. If wget is missing from a container, the healthcheck will show executable file not found in $PATH — rebuild the image.

Alpine busybox wget resolves localhost as IPv6 (::1)

Caddy’s admin API binds to IPv4 only, so its healthcheck must use http://127.0.0.1:2019/config/ and not http://localhost:2019/config/. Alpine’s musl libc/busybox prefers ::1 when both 127.0.0.1 and ::1 are in /etc/hosts. Use explicit IPv4 for any healthcheck in Alpine-based images.

Check all exporter targets in Prometheus

Open https://prometheus.vps-i1.infra.zintegrowana.online/targets and verify all exporter targets show UP.

Exporter is DOWN — general recovery procedure

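1. Check container status: docker compose ps
2. Check logs: docker compose logs --tail=50 <exporter>
3. Verify .env has correct credentials
4. Restart: docker compose restart <exporter> (if .env was changed, run docker compose up -d <exporter> instead, because restart does not recreate the container with the new environment)
5. Confirm metrics endpoint: curl -s http://localhost:<port>/metrics | head -5
6. If still failing, rebuild without the layer cache: docker compose build --no-cache <exporter> && docker compose up -d --no-deps <exporter>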
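
The metrics-endpoint confirmation can also be scripted, for example when checking several exporters in a row. The sketch below is a rough Python equivalent of the curl | grep healthchecks in this workbook; sample_lines and probe are hypothetical helper names.

```python
import urllib.request
from typing import List


def sample_lines(exposition_text: str, prefix: str = "") -> List[str]:
    """Return non-comment sample lines whose metric name starts with prefix."""
    return [
        line
        for line in exposition_text.splitlines()
        if line and not line.startswith("#") and line.startswith(prefix)
    ]


def probe(port: int, prefix: str = "", timeout: float = 5.0) -> int:
    """Fetch /metrics from a local exporter and count matching samples.

    Uses 127.0.0.1 rather than localhost to avoid IPv6 resolution surprises.
    Raises URLError (an OSError subclass) if the endpoint is unreachable.
    """
    url = f"http://127.0.0.1:{port}/metrics"
    with urllib.request.urlopen(url, timeout=timeout) as response:
        text = response.read().decode("utf-8", errors="replace")
    return len(sample_lines(text, prefix))
```

For instance, probe(9200, "queue_depth") returning 0 corresponds to the empty grep output described for the queue-exporter.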
