Grafana — Operations Workbook

Grafana 11.3.0 running on IONOS VPS (vps-i1). Primary observability dashboard for infrastructure metrics and business KPIs.


Architecture

IONOS VPS (217.154.82.162)
├── Container: monitoring-grafana-1   image: grafana/grafana:11.3.0
│   ├── port 3000                    web UI (proxied via Caddy → HTTPS)
│   ├── volume: grafana_data          users, alert rules, playlists, annotations
│   └── mounts:
│       └── ./grafana/provisioning:/etc/grafana/provisioning (ro)
│           ├── dashboards/           JSON dashboard definitions (from repo)
│           └── datasources/          Prometheus (Thanos), Supabase PostgreSQL, Supabase-REST (Infinity)
│
└── Container: monitoring-renderer-1  image: grafana/grafana-image-renderer:3.11.6
    └── port 8081                    PNG renderer for report emails

Public URL: https://grafana.vps-i1.infra.zintegrowana.online
Also: https://infra.zintegrowana.online (Cloudflare CNAME alias)
Compose file: /opt/p24-infra/monitoring/docker-compose.yml


Config Management

| File | Location | In repo? | Contains secrets? |
| --- | --- | --- | --- |
| docker-compose.yml | /opt/p24-infra/monitoring/ | Yes | No |
| grafana/provisioning/ | /opt/p24-infra/monitoring/grafana/provisioning/ | Yes | No |
| .env | /opt/p24-infra/monitoring/.env | .env.example only | Yes |
| grafana_data volume | Docker volume | No | No (users, alerts; not secrets) |

Key environment variables

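| Variable | Purpose |
| --- | --- |
| GRAFANA_ADMIN_PASSWORD | Admin login password |
| GF_RENDERING_RENDERER_TOKEN | Shared secret between Grafana and renderer |
| SUPABASE_GRAFANA_PASSWORD | grafana_readonly DB role password |
| SUPABASE_SERVICE_ROLE_KEY | service_role key used by the Supabase-REST datasource (bypasses RLS) |
| SMTP_HOST / SMTP_USER / SMTP_PASSWORD | Alert email delivery via Mailgun |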
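
A pre-flight check can catch a missing or empty variable before the containers start. The sketch below assumes .env holds plain KEY=value lines; check_env_file is an illustrative helper name, not a script that exists in the repo.

```shell
# Sketch: report which required variables are missing or empty in an env file.
# check_env_file is a hypothetical helper, not an existing repo script.
check_env_file() {
  local env_file=$1; shift
  local missing=0 var
  for var in "$@"; do
    # Require at least one character after "VAR=".
    if ! grep -Eq "^${var}=.+" "$env_file"; then
      echo "missing: ${var}"
      missing=1
    fi
  done
  return "$missing"
}

# Example (on vps-i1):
# check_env_file /opt/p24-infra/monitoring/.env \
#   GRAFANA_ADMIN_PASSWORD GF_RENDERING_RENDERER_TOKEN SUPABASE_GRAFANA_PASSWORD \
#   SMTP_HOST SMTP_USER SMTP_PASSWORD
```

The function returns non-zero when anything is missing, so it can gate a docker compose up in a deploy script.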

What’s provisioned vs. persisted

  • Provisioned from repo (safe — always recoverable): all dashboard JSON, datasource definitions
  • Persisted in grafana_data volume (must be backed up): user accounts, alert rules created via UI, playlists, annotation data, API keys

Datasources

Provisioned from monitoring/grafana/provisioning/datasources/ (in repo, no secrets — keys are env-var references resolved from .env).

| Datasource | uid | Type | Auth | Use for |
| --- | --- | --- | --- | --- |
| Prometheus (via Thanos Query) | prometheus | prometheus | none (internal) | Infrastructure metrics |
| Supabase | supabase | postgres (direct :5432) | grafana_readonly role (SUPABASE_GRAFANA_PASSWORD) | Business KPIs (RLS-respecting, read-only) |
| Supabase-REST | supabase-rest | yesoreyeram-infinity-datasource | SUPABASE_SERVICE_ROLE_KEY | Querying Supabase over the REST/PostgREST API |

Supabase-REST (Infinity datasource)

File: datasources/supabase-infinity.yml. Requires the yesoreyeram-infinity-datasource plugin, installed via GF_INSTALL_PLUGINS in docker-compose.yml.

⚠️ The apikey header is mandatory. Supabase’s REST API runs PostgREST behind a Kong gateway that authenticates on the apikey header. A bearer token alone is not enough — Kong returns 401 "No API key found in request". The key must therefore be sent twice:

    jsonData:
      auth_method: bearerToken
      httpHeaderName1: apikey                       # Kong gateway auth
    secureJsonData:
      bearerToken: ${SUPABASE_SERVICE_ROLE_KEY}      # PostgREST role (Authorization: Bearer)
      httpHeaderValue1: ${SUPABASE_SERVICE_ROLE_KEY} # apikey header value

⚠️ This datasource uses the service_role key, which bypasses RLS. For RLS-sensitive or least-privilege queries, prefer the direct-Postgres Supabase datasource (grafana_readonly). Reserve Supabase-REST for cases that genuinely need the REST API.

Quick verify (from vps-i1) — 200 = working, 401 = apikey header missing:

KEY=$(grep -E '^SUPABASE_SERVICE_ROLE_KEY=' /opt/p24-infra/monitoring/.env | cut -d= -f2-)
curl -s -o /dev/null -w '%{http_code}\n' \
  -H "apikey: $KEY" -H "Authorization: Bearer $KEY" \
  "https://mwkqmgadqnkkihjdeqsi.supabase.co/rest/v1/dev_r_services?select=*&limit=1"

Registered in dev_r_services as grafana-supabase-rest-datasource (child of the grafana element).
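
When the quick verify runs from a script, the bare status code is easier to act on once it is turned into a verdict. A minimal sketch follows; explain_rest_status is a hypothetical helper, and the 000 case relies on curl printing 000 for %{http_code} when it receives no HTTP response.

```shell
# Sketch: turn the HTTP status from the quick-verify curl into a readable verdict.
# explain_rest_status is a hypothetical helper name.
explain_rest_status() {
  case "$1" in
    200) echo "ok: apikey and bearer token accepted" ;;
    401) echo "fail: 401, check the apikey header and the key value" ;;
    000) echo "fail: no HTTP response (DNS, TLS, or network problem)" ;;
    *)   echo "fail: unexpected status $1" ;;
  esac
  # Exit status mirrors the verdict so callers can branch on it.
  [ "$1" = "200" ]
}

# Example: explain_rest_status "$(curl -s -o /dev/null -w '%{http_code}' ...)"
```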


Deployment

Fresh install

# On vps-i1 — assumes repo is already at /opt/p24-infra
cd /opt/p24-infra/monitoring
docker compose up -d grafana renderer

Update Grafana version

  1. Update image tag in docker-compose.yml
  2. Commit and push to main
  3. On vps-i1:
    cd /opt/p24-infra/monitoring
    git pull
    docker compose pull grafana renderer
    docker compose up -d grafana renderer
  4. Verify: https://grafana.vps-i1.infra.zintegrowana.online → Help → About
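
The verification step can also be done from the shell by comparing the tag in docker-compose.yml with the version Grafana reports. The sketch below covers only the first half; image_tag is a hypothetical helper and assumes the Compose file uses an unquoted "image: repo:tag" line.

```shell
# Sketch: print the tag of an image referenced in a Compose file.
# image_tag is a hypothetical helper; it assumes a plain "image: repo:tag" line.
image_tag() {
  local compose_file=$1 repo=$2
  # The colon after ${repo} keeps grafana/grafana from matching
  # grafana/grafana-image-renderer.
  sed -n "s|^[[:space:]]*image:[[:space:]]*${repo}:\([^[:space:]\"']*\).*|\1|p" \
    "$compose_file" | head -n 1
}

# Example (on vps-i1):
# image_tag /opt/p24-infra/monitoring/docker-compose.yml grafana/grafana
```

The printed tag should match the "version" field returned by /api/health once the new container is up.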

Add a new dashboard

  1. Create/edit JSON in monitoring/grafana/provisioning/dashboards/
  2. Commit and push
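  3. On vps-i1: git pull. Grafana picks up changed dashboard files on its next provisioning scan (about 60s here; the interval is updateIntervalSeconds in the dashboard provider file). To apply the change immediately: docker compose restart grafana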
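
A dashboard file with a JSON syntax error fails to provision, so it is worth validating the directory before step 2. A minimal sketch, assuming python3 is installed on the machine doing the commit; validate_dashboards is a hypothetical helper name.

```shell
# Sketch: fail fast if any dashboard file is not valid JSON.
# validate_dashboards is a hypothetical helper; it relies on python3 being installed.
validate_dashboards() {
  local dir=$1 rc=0 f
  for f in "$dir"/*.json; do
    [ -e "$f" ] || continue   # directory without JSON files
    if ! python3 -m json.tool "$f" >/dev/null 2>&1; then
      echo "invalid JSON: $f"
      rc=1
    fi
  done
  return "$rc"
}

# Example: validate_dashboards monitoring/grafana/provisioning/dashboards
```

This checks syntax only. It does not verify that the JSON is a valid Grafana dashboard model.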

Backup

What needs backing up

| Data | Storage | Method | Schedule |
| --- | --- | --- | --- |
| Dashboard JSON | Git repo | Provisioned from repo, no separate backup needed | Always current |
| Datasource config | Git repo | Provisioned from repo | Always current |
| grafana_data volume (users, alert rules) | Docker volume | grafana-backup.yml GH Action → Wasabi p24-infra/grafana/ | Nightly 03:15 UTC |

Manual backup (emergency)

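# On vps-i1
# Dump effective settings via the Admin HTTP API (grafana-cli has no settings-export command).
# Assumes the admin login is "admin" with the password stored in .env.
ADMIN_PASS=$(grep -E '^GRAFANA_ADMIN_PASSWORD=' /opt/p24-infra/monitoring/.env | cut -d= -f2-)
curl -s -u "admin:${ADMIN_PASS}" http://localhost:3000/api/admin/settings > /tmp/grafana-settings.json
# Archive the data volume. Check the real volume name with `docker volume ls` first:
# Compose normally prefixes it with the project name (e.g. monitoring_grafana_data).
docker run --rm -v grafana_data:/data -v /tmp:/backup alpine \
  tar czf /backup/grafana-data-$(date +%F).tar.gz -C /data .
# Upload to Wasabi manually or via scripts/backup-ionos.sh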
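
Before uploading, it helps to confirm that the archive is readable and actually contains the Grafana database. The sketch below assumes the default SQLite backend, where the volume root holds grafana.db; verify_backup is a hypothetical helper name.

```shell
# Sketch: sanity-check a grafana_data archive before relying on it.
# verify_backup is a hypothetical helper. It assumes the default SQLite
# backend, where the volume root contains grafana.db.
verify_backup() {
  local archive=$1 listing
  [ -s "$archive" ] || { echo "empty or missing: $archive"; return 1; }
  # Listing the archive reads it end to end, so corrupt gzip data fails here.
  listing=$(tar tzf "$archive" 2>/dev/null) || { echo "unreadable archive: $archive"; return 1; }
  # "tar czf ... -C /data ." stores members with a leading "./".
  if printf '%s\n' "$listing" | grep -Fqx './grafana.db'; then
    echo "ok: $archive"
  else
    echo "no grafana.db in $archive"
    return 1
  fi
}

# Example: verify_backup "/tmp/grafana-data-$(date +%F).tar.gz"
```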

Restore

Target RTO: 30 minutes

Scenario 1: Container crash (most common)

cd /opt/p24-infra/monitoring
docker compose up -d grafana
# grafana_data volume is preserved — no data loss

Scenario 2: Restore from Wasabi backup

# 1. Download latest backup
aws s3 cp s3://p24-infra/grafana/grafana-data-YYYY-MM-DD.tar.gz /tmp/ \
  --endpoint-url https://s3.eu-central-2.wasabisys.com
 
# 2. Stop Grafana
cd /opt/p24-infra/monitoring
docker compose stop grafana
 
# 3. Restore volume
docker run --rm -v grafana_data:/data -v /tmp:/backup alpine \
  sh -c "rm -rf /data/* && tar xzf /backup/grafana-data-YYYY-MM-DD.tar.gz -C /data"
 
# 4. Start Grafana
docker compose up -d grafana
 
# 5. Verify login and dashboards at https://grafana.vps-i1.infra.zintegrowana.online
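
Step 1 needs the date of the newest backup. One way to find it is to filter a bucket listing, sketched below. It assumes the nightly job names its archives like the manual backup (grafana-data-YYYY-MM-DD.tar.gz), which this workbook does not confirm; latest_backup is a hypothetical helper name.

```shell
# Sketch: pick the newest grafana-data-YYYY-MM-DD.tar.gz from a listing on stdin.
# latest_backup is a hypothetical helper. ISO dates sort lexicographically,
# so a plain sort puts the newest name last.
latest_backup() {
  grep -Eo 'grafana-data-[0-9]{4}-[0-9]{2}-[0-9]{2}\.tar\.gz' | sort | tail -n 1
}

# Example (same bucket and endpoint as step 1):
# aws s3 ls s3://p24-infra/grafana/ \
#   --endpoint-url https://s3.eu-central-2.wasabisys.com | latest_backup
```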

Scenario 3: Fresh VPS (disaster recovery)

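Clone the repo to /opt/p24-infra and recreate .env from .env.example (the real .env is not in the repo), then follow Fresh install. Dashboards and datasources are provisioned from the repo automatically on first start. Restore grafana_data from Wasabi (Scenario 2) to recover user accounts and UI-created alert rules. Prometheus alert rules live in monitoring/prometheus/rules/, not in Grafana, so they return with the repo.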


Healthcheck

Docker healthcheck: GET http://localhost:3000/api/health — defined in docker-compose.yml

External probe: blackbox-exporter via synthetic.yml Prometheus rules — EndpointDown alert fires within 5 minutes.

Manual check:

curl -s http://localhost:3000/api/health | python3 -m json.tool
# Expected: {"commit":"...", "database":"ok", "version":"11.3.0"}
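
For use in scripts, the manual check can be reduced to an exit status. The sketch below reads the /api/health body on stdin and uses grep and sed so that it works without jq; health_ok is a hypothetical helper name.

```shell
# Sketch: read the /api/health JSON on stdin and succeed only if database is "ok".
# health_ok is a hypothetical helper; it uses grep/sed so it works without jq.
health_ok() {
  local body
  body=$(cat)
  if printf '%s\n' "$body" | grep -Eq '"database"[[:space:]]*:[[:space:]]*"ok"'; then
    # Print the reported version, if the field is present.
    printf '%s\n' "$body" |
      sed -n 's/.*"version"[[:space:]]*:[[:space:]]*"\([^"]*\)".*/healthy, version \1/p'
  else
    echo "unhealthy"
    return 1
  fi
}

# Example: curl -s http://localhost:3000/api/health | health_ok
```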

Password Rotation

GRAFANA_ADMIN_PASSWORD

# 1. Generate new password (32 chars)
NEW_PASS=$(openssl rand -base64 24)
 
# 2. Update .env on vps-i1 (base64 output can contain "/", so sed uses "|" as its delimiter)
sed -i "s|^GRAFANA_ADMIN_PASSWORD=.*|GRAFANA_ADMIN_PASSWORD=${NEW_PASS}|" /opt/p24-infra/monitoring/.env
 
# 3. Update admin password via CLI (no restart needed)
docker exec monitoring-grafana-1 grafana-cli admin reset-admin-password "${NEW_PASS}"
 
# 4. Update GH Secret: gh secret set GRAFANA_ADMIN_PASSWORD -b "${NEW_PASS}" -R radieu/p24-infra
 
# 5. Update .env.local on local workstation
 
# 6. Record the rotation in docs/secrets-rotation-log.md
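
An in-place sed substitution only works if the new value contains neither the sed delimiter nor "&". Where that cannot be guaranteed, the value can be passed through the environment instead, as in the sketch below. set_env_var is a hypothetical helper, not an existing repo script.

```shell
# Sketch: replace KEY=value in an env file without sed escaping problems.
# set_env_var is a hypothetical helper. The value travels through the
# environment, so characters such as / + = & need no escaping.
set_env_var() {
  local file=$1 key=$2 value=$3 tmp
  tmp=$(mktemp) || return 1
  KEY="$key" VALUE="$value" awk '
    BEGIN { key = ENVIRON["KEY"]; value = ENVIRON["VALUE"] }
    index($0, key "=") == 1 { print key "=" value; found = 1; next }
    { print }
    END { if (!found) print key "=" value }   # append when the key is absent
  ' "$file" > "$tmp" && cat "$tmp" > "$file"  # cat keeps the file mode and owner
  local rc=$?
  rm -f "$tmp"
  return "$rc"
}

# Example:
# set_env_var /opt/p24-infra/monitoring/.env GRAFANA_ADMIN_PASSWORD "${NEW_PASS}"
```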

GF_RENDERING_RENDERER_TOKEN

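Update the value in .env, then recreate both grafana and renderer. docker compose restart does not apply environment changes, so use up -d with --force-recreate:

docker compose up -d --force-recreate grafana renderer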

Troubleshooting

| Symptom | Cause | Fix |
| --- | --- | --- |
| Dashboard shows “No data” | Prometheus/Thanos down | Check docker compose ps thanos-query prometheus |
| Supabase datasource error | grafana_readonly password wrong | Verify SUPABASE_GRAFANA_PASSWORD in .env |
| Supabase-REST returns 401 | apikey header missing | Ensure httpHeaderName1: apikey + httpHeaderValue1: ${SUPABASE_SERVICE_ROLE_KEY} in supabase-infinity.yml (see Datasources) |
| Renderer timeout | renderer container OOM | docker compose restart renderer |
| Cannot log in after password rotation | Old password cached | Clear browser cookies |
| Provisioned dashboards disappeared | Volume mount issue | docker compose restart grafana, then wait 60s |