Spec 07 — Internal status page
Purpose
Grafana has all the data, but offers no single “is the platform up right now” page. When something feels slow, the first move shouldn’t be “log in to Grafana, find the right dashboard, parse 10 panels”. An always-on status page, viewable from a phone, answers that in 2 seconds.
Uptime Kuma is a single-container self-hosted status page. Same probes as spec 05, but presented as a public-ish “everything green?” board.
Rulebook
- Status page is read-only and public-ish. Hosted at status.infra.zintegrowana.online. No login required (it’s not sensitive: it shows up/down, not URLs that aren’t already public).
- Probes are duplicated by design. Kuma probes independently from Blackbox so a Prometheus outage doesn’t blank the status page.
- Maintenance windows are pre-declared. Use Kuma’s maintenance feature to suppress alerts during planned restarts.
Implementation plan
- Add uptime-kuma service to monitoring/docker-compose.yml, single container, 1 GB volume.
- Caddy route: status.infra.zintegrowana.online → uptime-kuma:3001.
- Manual setup via web UI (first-run only): create admin, add the 7 probe targets from spec 05.
- Export Kuma config to JSON, commit to monitoring/uptime-kuma/config.json for backup + restore (Kuma supports config import on bootstrap).
- Add Kuma config volume to spec 01 backup list.
Acceptance criteria
- https://status.infra.zintegrowana.online loads, shows 7 monitored services
- Stopping a target makes the corresponding tile turn red within 60 s
- Kuma data volume included in nightly backup (verify in spec 01 output)
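Whether the 60 s criterion can be met depends on the monitor settings chosen in Step 5 (heartbeat interval 60 s, retries 1). The sketch below is a rough upper bound only. It assumes a monitor is shown as down only after its retries are used up, and it takes the retry interval as a parameter because the spec does not state one. The function name is hypothetical.

```shell
#!/bin/sh
# Rough upper bound, in seconds, on how long a failure can go unnoticed.
# Assumption (not from the spec): a monitor is only shown as down after
# all retries are used up, and each retry waits retry_interval seconds.
# The probe's own request timeout is ignored here.
worst_case_detection() {
  interval=$1        # heartbeat interval, e.g. 60
  retries=$2         # retries before the monitor is marked down, e.g. 1
  retry_interval=$3  # wait between retries; assumed, not given in the spec
  echo $(( interval + retries * retry_interval ))
}

# Example: worst_case_detection 60 1 60   # prints 120
# Example: worst_case_detection 60 0 60   # prints 60
```

If the assumption holds, retries 1 with a 60 s retry interval could push detection past 60 s, so the criterion may need either retries 0 or a shorter interval.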
Cost impact
0 €. One small container.
Back-out plan
Remove the uptime-kuma service and its Caddy route. No DNS change is needed, since the wildcard record covers the hostname. Delete the uptime_kuma_data volume.
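A minimal sketch of the container and volume part of the back-out, assuming it runs from /opt/p24-infra/monitoring on the host while the service is still defined in docker-compose.yml. The helper names and the DRY_RUN switch are hypothetical. The real volume name may carry a compose project prefix, so it is worth checking `docker volume ls` first.

```shell
#!/bin/sh
# Back-out sketch. Prints the commands unless DRY_RUN=0 is set.
# Assumption: the uptime-kuma service is still present in the compose file,
# since compose can only address services it knows about.
run() {
  if [ "${DRY_RUN:-1}" = "1" ]; then
    echo "+ $*"
  else
    "$@"
  fi
}

back_out_kuma() {
  # Volume name is an assumption; pass the real one as the first argument.
  volume=${1:-uptime_kuma_data}
  run docker compose stop uptime-kuma
  run docker compose rm -f uptime-kuma
  run docker volume rm "$volume"
}

# Dry run:   back_out_kuma
# For real:  DRY_RUN=0 back_out_kuma
```

Once the Caddy route has been removed from the Caddyfile, the reload command from Step 2 applies the change.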
Risks / open questions
- Risk: Kuma stores its config in SQLite; corruption = lose probe history. Mitigation: nightly volume backup.
- Q: Why not Grafana’s built-in stat panels? A: A dedicated status page loads in 200 ms on a 3G phone; a Grafana dashboard is heavy.
Bootstrap (post-merge deployment)
The artifacts PR ships compose + Caddy + docs only. Booting the container and the initial monitor setup (admin user + 7 monitors via Kuma’s web UI) are covered by the manual checklist below. Kuma 1.x has no clean config-as-code import path, so first-run is web-UI driven. Estimated time: ~15 min.
Step 1 — Pull repo + start container on vps-i1
ssh root@217.154.82.162 'cd /opt/p24-infra && git pull --ff-only origin main'
ssh root@217.154.82.162 'cd /opt/p24-infra/monitoring && docker compose up -d uptime-kuma'
Verify the container is healthy:
ssh root@217.154.82.162 'cd /opt/p24-infra/monitoring && docker compose ps uptime-kuma'
ssh root@217.154.82.162 'cd /opt/p24-infra/monitoring && docker compose logs --tail=20 uptime-kuma'
Step 2 — Reload Caddy to pick up the new route
ssh root@217.154.82.162 \
  'cd /opt/p24-infra/monitoring && docker compose exec caddy caddy reload --config /etc/caddy/Caddyfile'
Step 3 — DNS
Wildcard *.vps-i1.infra.zintegrowana.online already resolves to 217.154.82.162, so status.vps-i1.infra.zintegrowana.online works immediately. No Cloudflare change needed.
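Before opening the first-run wizard, it can help to confirm that the hostname resolves to the expected address and that the route answers. The helpers below are a sketch: the function names, the 12 attempts and the 5 s delay are illustrative choices, and the usage lines assume dig and curl are available on the machine running the check.

```shell
#!/bin/sh
# retry ATTEMPTS DELAY CMD...: run CMD until it succeeds, at most ATTEMPTS
# times, sleeping DELAY seconds between tries.
retry() {
  attempts=$1
  delay=$2
  shift 2
  n=1
  while ! "$@"; do
    if [ "$n" -ge "$attempts" ]; then
      return 1
    fi
    n=$((n + 1))
    sleep "$delay"
  done
  return 0
}

# resolves_to IP: succeeds if IP appears as a whole line on stdin.
resolves_to() {
  grep -qxF "$1"
}

# Usage (needs network access):
#   dig +short status.vps-i1.infra.zintegrowana.online | resolves_to 217.154.82.162
#   retry 12 5 curl -fsS -o /dev/null https://status.vps-i1.infra.zintegrowana.online
```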
Step 4 — First-run admin
Visit https://status.vps-i1.infra.zintegrowana.online. Kuma’s first-boot wizard asks for an admin username and password — create one and save it to 1Password under p24-infra / Uptime Kuma / admin.
Step 5 — Create the 7 monitors
Per the list in monitoring/uptime-kuma/README.md: HTTP(s) monitor, heartbeat interval 60 s, retries 1. The 7 URLs are the same target list as spec 05’s Blackbox config — Kuma probes independently by design.
Create a public Status Page (Settings → Status Pages → Add) named p24-status and add all 7 monitors to it.
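Before typing the 7 URLs into the web UI, a quick pre-flight pass can catch a target that is already failing. This is a sketch, not part of Kuma: the function names and the one-URL-per-line targets file are hypothetical, and it assumes the monitors keep the default accepted status range of 2xx.

```shell
#!/bin/sh
# is_up CODE: succeeds for HTTP 2xx.
# Assumption: the monitors keep the default accepted status codes (200-299).
is_up() {
  case $1 in
    2[0-9][0-9]) return 0 ;;
    *) return 1 ;;
  esac
}

# preflight FILE: print one "UP|DOWN code url" line per URL in FILE.
# FILE is a hypothetical list with one URL per line.
preflight() {
  while IFS= read -r url; do
    [ -n "$url" ] || continue
    # -L follows redirects; curl reports 000 when the connection fails.
    code=$(curl -s -L -o /dev/null -m 10 -w '%{http_code}' "$url")
    if is_up "$code"; then
      echo "UP $code $url"
    else
      echo "DOWN $code $url"
    fi
  done < "$1"
}

# Usage: preflight targets.txt
```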
Step 6 — Export the config snapshot
Settings → Backup → Export → save the JSON. Commit it to monitoring/uptime-kuma/kuma-config.json in a follow-up PR — that file becomes the version-controlled snapshot we can diff over time and re-import on disaster recovery.
Step 7 — Verify acceptance criteria
- https://status.vps-i1.infra.zintegrowana.online loads and shows all 7 services as green
- Stop one container (e.g. docker compose stop pdf-service, or use a non-critical target) and watch the corresponding tile flip to red within 60 s; restart the service and confirm it recovers
- Run the nightly backup manually and confirm uptime_kuma_data appears in the resulting tarball listing (spec 01)
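The backup check can be scripted as a filter over the tarball listing. A minimal sketch: the function name is hypothetical, BACKUP stands in for wherever spec 01 writes the nightly tarball, and the listing is assumed to be gzip-compressed tar.

```shell
#!/bin/sh
# listing_has_volume NAME: succeeds if NAME appears anywhere in the
# listing read from stdin.
listing_has_volume() {
  grep -qF -- "$1"
}

# Usage, where BACKUP is a hypothetical path to the nightly tarball:
#   tar -tzf "$BACKUP" | listing_has_volume uptime_kuma_data \
#     && echo "backup includes Kuma data"
```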