Grafana — Operations Workbook
Grafana 11.3.0 running on IONOS VPS (vps-i1). Primary observability dashboard for infrastructure metrics and business KPIs.
Architecture
IONOS VPS (217.154.82.162)
├── Container: monitoring-grafana-1    image: grafana/grafana:11.3.0
│   ├── port 3000                      web UI (proxied via Caddy → HTTPS)
│   ├── volume: grafana_data           users, alert rules, playlists, annotations
│   └── mounts:
│       └── ./grafana/provisioning:/etc/grafana/provisioning (ro)
│           ├── dashboards/            JSON dashboard definitions (from repo)
│           └── datasources/           Prometheus (Thanos) + Supabase PostgreSQL
│
└── Container: monitoring-renderer-1   image: grafana/grafana-image-renderer:3.11.6
    └── port 8081                      PNG renderer for report emails

Public URL: https://grafana.vps-i1.infra.zintegrowana.online
Also: https://infra.zintegrowana.online (Cloudflare CNAME alias)
Compose file: /opt/p24-infra/monitoring/docker-compose.yml
Config Management
| File | Location | In repo? | Contains secrets? |
|---|---|---|---|
| docker-compose.yml | /opt/p24-infra/monitoring/ | ✅ | No |
| grafana/provisioning/ | /opt/p24-infra/monitoring/grafana/provisioning/ | ✅ | No |
| .env | /opt/p24-infra/monitoring/.env | ❌ .env.example only | Yes |
| grafana_data volume | Docker volume | ❌ | No (users, alerts; not secrets) |
Key environment variables
| Variable | Purpose |
|---|---|
| GRAFANA_ADMIN_PASSWORD | Admin login password |
| GF_RENDERING_RENDERER_TOKEN | Shared secret between Grafana and renderer |
| SUPABASE_GRAFANA_PASSWORD | grafana_readonly DB role password |
| SMTP_HOST / SMTP_USER / SMTP_PASSWORD | Alert email delivery via Mailgun |
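Before starting the stack, it can help to confirm that every variable in the table is actually set in .env. The following is a minimal sketch, not part of the repo: the function name check_env_file and the REQUIRED_KEYS list are illustrative, and the list mirrors only the table above (other keys in .env, such as SUPABASE_SERVICE_ROLE_KEY, would need to be added by hand). It prints key names only, never values.

```shell
# Hypothetical helper: report keys that are missing or empty in an env file.
# REQUIRED_KEYS mirrors the table of key environment variables; adjust as needed.
REQUIRED_KEYS="GRAFANA_ADMIN_PASSWORD GF_RENDERING_RENDERER_TOKEN SUPABASE_GRAFANA_PASSWORD SMTP_HOST SMTP_USER SMTP_PASSWORD"

check_env_file() {
    env_file=$1
    missing=0
    for key in $REQUIRED_KEYS; do
        # A key counts as set when a line starts with KEY= followed by at least one character.
        if ! grep -Eq "^${key}=.+" "$env_file"; then
            echo "missing or empty: $key"
            missing=1
        fi
    done
    return "$missing"
}

# Usage on vps-i1:
#   check_env_file /opt/p24-infra/monitoring/.env && echo "all required keys set"
```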
What’s provisioned vs. persisted
- Provisioned from repo (safe — always recoverable): all dashboard JSON, datasource definitions
- Persisted in the grafana_data volume (must be backed up): user accounts, alert rules created via UI, playlists, annotation data, API keys
Datasources
Provisioned from monitoring/grafana/provisioning/datasources/ (in repo, no secrets — keys are env-var references resolved from .env).
| Datasource | uid | Type | Auth | Use for |
|---|---|---|---|---|
| Prometheus (via Thanos Query) | prometheus | prometheus | none (internal) | Infrastructure metrics |
| Supabase | supabase | postgres (direct :5432) | grafana_readonly role (SUPABASE_GRAFANA_PASSWORD) | Business KPIs — RLS-respecting, read-only |
| Supabase-REST | supabase-rest | yesoreyeram-infinity-datasource | SUPABASE_SERVICE_ROLE_KEY | Querying Supabase over the REST/PostgREST API |
Supabase-REST (Infinity datasource)
File: datasources/supabase-infinity.yml. Requires the yesoreyeram-infinity-datasource plugin, installed via GF_INSTALL_PLUGINS in docker-compose.yml.
⚠️ The apikey header is mandatory. Supabase’s REST API runs PostgREST behind a Kong gateway that authenticates on the apikey header. A bearer token alone is not enough — Kong returns 401 "No API key found in request". The key must therefore be sent twice:
jsonData:
  auth_method: bearerToken
  httpHeaderName1: apikey # Kong gateway auth
secureJsonData:
  bearerToken: ${SUPABASE_SERVICE_ROLE_KEY} # PostgREST role (Authorization: Bearer)
  httpHeaderValue1: ${SUPABASE_SERVICE_ROLE_KEY} # apikey header value

⚠️ This datasource uses the service_role key, which bypasses RLS. For RLS-sensitive or least-privilege queries, prefer the direct-Postgres Supabase datasource (grafana_readonly). Reserve Supabase-REST for cases that genuinely need the REST API.
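The requirement that the key appears in both places can be checked against the provisioning file before Grafana is restarted. This is a rough sketch under stated assumptions: check_infinity_yaml is a hypothetical helper, and it uses plain-text grep, so it neither parses YAML nor validates indentation or nesting.

```shell
# Hypothetical helper: confirm an Infinity datasource file declares the bearer
# token and the apikey header (name and value). Text match only, no YAML parsing.
check_infinity_yaml() {
    yaml_file=$1
    status=0
    for pattern in \
        'auth_method:[[:space:]]*bearerToken' \
        'httpHeaderName1:[[:space:]]*apikey' \
        'httpHeaderValue1:[[:space:]]*[^[:space:]#]' \
        'bearerToken:[[:space:]]*[^[:space:]#]'
    do
        if ! grep -Eq "^[[:space:]]*${pattern}" "$yaml_file"; then
            # Strip everything from the first colon to print just the key name.
            echo "not found: ${pattern%%:*}"
            status=1
        fi
    done
    return "$status"
}

# Usage (path relative to /opt/p24-infra/monitoring):
#   check_infinity_yaml grafana/provisioning/datasources/supabase-infinity.yml
```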
Quick verify (from vps-i1) — 200 = working, 401 = apikey header missing:
KEY=$(grep -E '^SUPABASE_SERVICE_ROLE_KEY=' /opt/p24-infra/monitoring/.env | cut -d= -f2-)
curl -s -o /dev/null -w '%{http_code}\n' \
  -H "apikey: $KEY" -H "Authorization: Bearer $KEY" \
  "https://mwkqmgadqnkkihjdeqsi.supabase.co/rest/v1/dev_r_services?select=*&limit=1"

Registered in dev_r_services as grafana-supabase-rest-datasource (child of the grafana element).
Deployment
Fresh install
# On vps-i1 — assumes repo is already at /opt/p24-infra
cd /opt/p24-infra/monitoring
docker compose up -d grafana renderer
Update Grafana version
- Update the image tag in docker-compose.yml
- Commit and push to main
- On vps-i1:
  cd /opt/p24-infra/monitoring
  git pull
  docker compose pull grafana renderer
  docker compose up -d grafana renderer
- Verify: https://grafana.vps-i1.infra.zintegrowana.online → Help → About
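The Verify step can also be done from the shell by comparing the tag in docker-compose.yml with the version reported by /api/health. A minimal sketch follows; compose_grafana_tag and health_version are hypothetical helpers, and the sed patterns assume an unquoted image: line and a plain JSON response.

```shell
# Hypothetical helper: print the tag of the grafana/grafana image in a Compose file.
# The renderer image (grafana/grafana-image-renderer) does not match this pattern.
compose_grafana_tag() {
    sed -n 's/^[[:space:]]*image:[[:space:]]*grafana\/grafana:\([^[:space:]]*\).*$/\1/p' "$1" | head -n 1
}

# Hypothetical helper: read an /api/health JSON body on stdin, print its "version".
health_version() {
    sed -n 's/.*"version"[[:space:]]*:[[:space:]]*"\([^"]*\)".*/\1/p' | head -n 1
}

# Usage on vps-i1, from /opt/p24-infra/monitoring:
#   want=$(compose_grafana_tag docker-compose.yml)
#   have=$(curl -s http://localhost:3000/api/health | health_version)
#   [ "$want" = "$have" ] && echo "running $have as declared" || echo "declared $want, running $have"
```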
Add a new dashboard
- Create/edit JSON in monitoring/grafana/provisioning/dashboards/
- Commit and push
- On vps-i1: git pull. Grafana hot-reloads provisioned dashboards within ~60s; to force a reload, restart: docker compose restart grafana
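A malformed dashboard file is easier to catch before the commit than after the reload. The sketch below is one possible pre-commit check, not something the repo is known to contain: validate_dashboards is a hypothetical helper that relies on python3 and only verifies JSON syntax in the top level of the directory, not the Grafana dashboard schema.

```shell
# Hypothetical helper: validate every *.json file in a directory (non-recursive).
# Prints the files that fail to parse and returns non-zero if any do.
validate_dashboards() {
    dir=$1
    failed=0
    for f in "$dir"/*.json; do
        [ -e "$f" ] || continue   # the glob matched nothing
        if ! python3 -m json.tool "$f" > /dev/null 2>&1; then
            echo "invalid JSON: $f"
            failed=1
        fi
    done
    return "$failed"
}

# Usage from the repo root:
#   validate_dashboards monitoring/grafana/provisioning/dashboards && git commit
```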
Backup
What needs backing up
| Data | Storage | Method | Schedule |
|---|---|---|---|
| Dashboard JSON | Git repo | Provisioned from repo — no separate backup needed | Always current |
| Datasource config | Git repo | Provisioned from repo | Always current |
| grafana_data volume (users, alert rules) | Docker volume | grafana-backup.yml GH Action → Wasabi p24-infra/grafana/ | Nightly 03:15 UTC |
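A backup is only useful if it can be read back, so a downloaded archive can be spot-checked before it is needed. This is a sketch under two assumptions that should be confirmed locally: the nightly job produces the same tar.gz layout as the manual backup in this workbook, and Grafana uses its default SQLite backend, so grafana.db sits at the top level of the volume. verify_backup is a hypothetical helper.

```shell
# Hypothetical helper: check that an archive is readable and contains grafana.db.
# Assumes the default SQLite backend and an archive created with `-C /data .`.
verify_backup() {
    archive=$1
    listing=$(tar tzf "$archive" 2>/dev/null) || { echo "unreadable archive: $archive"; return 1; }
    if printf '%s\n' "$listing" | grep -Eq '^(\./)?grafana\.db$'; then
        echo "ok: $archive"
    else
        echo "grafana.db not found in: $archive"
        return 1
    fi
}

# Usage:
#   verify_backup /tmp/grafana-data-YYYY-MM-DD.tar.gz
```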
Manual backup (emergency)
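# On vps-i1. Check the volume name with "docker volume ls" first: Compose may prefix it
# with the project name (e.g. monitoring_grafana_data), and a wrong name silently mounts a new empty volume.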
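# Optional settings export. Confirm this subcommand exists in the running Grafana version;
# if it errors, skip it: the volume archive below is the actual backup.
docker exec monitoring-grafana-1 grafana-cli admin export-settings > /tmp/grafana-settings.json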
docker run --rm -v grafana_data:/data -v /tmp:/backup alpine \
  tar czf /backup/grafana-data-$(date +%F).tar.gz -C /data .
# Upload to Wasabi manually or via scripts/backup-ionos.sh
Restore
Target RTO: 30 minutes
Scenario 1: Container crash (most common)
cd /opt/p24-infra/monitoring
docker compose up -d grafana
# grafana_data volume is preserved, so no data is lost
Scenario 2: Restore from Wasabi backup
# 1. Download latest backup
aws s3 cp s3://p24-infra/grafana/grafana-data-YYYY-MM-DD.tar.gz /tmp/ \
--endpoint-url https://s3.eu-central-2.wasabisys.com
# 2. Stop Grafana
cd /opt/p24-infra/monitoring
docker compose stop grafana
# 3. Restore volume
docker run --rm -v grafana_data:/data -v /tmp:/backup alpine \
sh -c "rm -rf /data/* && tar xzf /backup/grafana-data-YYYY-MM-DD.tar.gz -C /data"
# 4. Start Grafana
docker compose up -d grafana
# 5. Verify login and dashboards at https://grafana.vps-i1.infra.zintegrowana.online
Scenario 3: Fresh VPS (disaster recovery)
Dashboards are in the git repo — they will be provisioned automatically on first start.
Restore grafana_data from Wasabi to recover user accounts and UI-created alert rules.
All Prometheus alert rules live in monitoring/prometheus/rules/ — not in Grafana.
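Both restore scenarios that pull from Wasabi need the date of the newest archive to fill in the YYYY-MM-DD placeholder. One way to find it, sketched here with a hypothetical latest_backup helper, is to filter a bucket listing; it assumes the archives follow the grafana-data-YYYY-MM-DD.tar.gz naming used in this workbook, which sorts correctly as plain text.

```shell
# Hypothetical helper: read a bucket listing on stdin (one object per line)
# and print the newest grafana-data-YYYY-MM-DD.tar.gz file name.
latest_backup() {
    grep -Eo 'grafana-data-[0-9]{4}-[0-9]{2}-[0-9]{2}\.tar\.gz' | sort | tail -n 1
}

# Usage:
#   aws s3 ls s3://p24-infra/grafana/ \
#     --endpoint-url https://s3.eu-central-2.wasabisys.com | latest_backup
```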
Healthcheck
Docker healthcheck: GET http://localhost:3000/api/health — defined in docker-compose.yml
External probe: blackbox-exporter via synthetic.yml Prometheus rules — EndpointDown alert fires within 5 minutes.
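For scripts and cron jobs, the health response can be reduced to an exit code. The following sketch assumes only that the response body contains a "database" field, as in the expected output documented in this section; grafana_db_ok is a hypothetical helper.

```shell
# Hypothetical helper: read an /api/health JSON body on stdin and
# succeed only if it reports "database": "ok".
grafana_db_ok() {
    grep -Eq '"database"[[:space:]]*:[[:space:]]*"ok"'
}

# Usage on vps-i1:
#   curl -s http://localhost:3000/api/health | grafana_db_ok && echo healthy || echo unhealthy
```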
Manual check:
curl -s http://localhost:3000/api/health | python3 -m json.tool
# Expected: {"commit":"...", "database":"ok", "version":"11.3.0"}
Password Rotation
GRAFANA_ADMIN_PASSWORD
# 1. Generate new password (32 chars)
NEW_PASS=$(openssl rand -base64 24)
# 2. Update .env on vps-i1 (| as the sed delimiter, because base64 output can contain /)
sed -i "s|^GRAFANA_ADMIN_PASSWORD=.*|GRAFANA_ADMIN_PASSWORD=${NEW_PASS}|" /opt/p24-infra/monitoring/.env
# 3. Update admin password via CLI (no restart needed)
docker exec monitoring-grafana-1 grafana-cli admin reset-admin-password "${NEW_PASS}"
# 4. Update GH Secret: gh secret set GRAFANA_ADMIN_PASSWORD -b "${NEW_PASS}" -R radieu/p24-infra
# 5. Update .env.local on local workstation
# 6. Log rotation in docs/secrets-rotation-log.md
GF_RENDERING_RENDERER_TOKEN
Update in .env, then recreate both grafana and renderer (docker compose restart does not pick up changed environment variables):
docker compose up -d grafana renderer
Troubleshooting
| Symptom | Cause | Fix |
|---|---|---|
| Dashboard shows “No data” | Prometheus/Thanos down | Check docker compose ps thanos-query prometheus |
| Supabase datasource error | grafana_readonly password wrong | Verify SUPABASE_GRAFANA_PASSWORD in .env |
| Supabase-REST returns 401 | apikey header missing | Ensure httpHeaderName1: apikey + httpHeaderValue1: ${SUPABASE_SERVICE_ROLE_KEY} in supabase-infinity.yml (see Datasources) |
| Renderer timeout | renderer container OOM | docker compose restart renderer |
| Cannot login after password rotation | Old password cached | Clear browser cookies |
| Provisioned dashboards disappeared | Volume mount issue | docker compose restart grafana then wait 60s |