Spec 07 — Internal status page

Purpose

Grafana has all the data, but no single “is the platform up right now?” page. When something feels slow, the first move shouldn’t be “log in to Grafana, find the right dashboard, parse 10 panels”. An always-on status page, viewable from a phone, answers that question in about 2 seconds.

Uptime Kuma is a single-container self-hosted status page. Same probes as spec 05, but presented as a public-ish “everything green?” board.

Rulebook

The status page is read-only and public-ish. It is hosted at status.vps-i1.infra.zintegrowana.online with no login required: it is not sensitive, since it shows only up/down state and no URLs that aren’t already public.
Probes are duplicated by design. Kuma probes independently of Blackbox, so a Prometheus outage doesn’t blank the status page.
Maintenance windows are pre-declared. Use Kuma’s maintenance feature to suppress alerts during planned restarts.

Implementation plan

Add an uptime-kuma service to monitoring/docker-compose.yml: a single container with a 1 GB data volume (uptime_kuma_data).
Caddy route: status.vps-i1.infra.zintegrowana.online → uptime-kuma:3001.
Manual setup via web UI (first-run only): create admin, add the 7 probe targets from spec 05.
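Export the Kuma config to JSON and commit it to monitoring/uptime-kuma/kuma-config.json for backup and restore. Kuma 1.x has no config import on bootstrap, so a restore means re-importing that file through the web UI (Settings → Backup).
Add the uptime_kuma_data volume to the spec 01 backup list.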
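Before deploying, the compose change can be checked by asking Compose which services and volumes it actually parsed. A minimal sketch, assuming Compose v2, that it is run from monitoring/, and that the volume is named uptime_kuma_data as in the back-out plan; check_kuma_compose is a hypothetical helper name.

```shell
# Hypothetical pre-deploy check, run from monitoring/.
# docker compose config --services / --volumes print the service and
# top-level volume names Compose parsed from docker-compose.yml, one per line.
check_kuma_compose() {
  local services volumes ok
  ok=0
  services="$(docker compose config --services)" || return 1
  volumes="$(docker compose config --volumes)" || return 1

  if printf '%s\n' "$services" | grep -qx 'uptime-kuma'; then
    echo "ok: service uptime-kuma declared"
  else
    echo "missing: service uptime-kuma" >&2
    ok=1
  fi

  # Assumed volume name (matches the back-out plan); adjust if the compose file differs.
  if printf '%s\n' "$volumes" | grep -qx 'uptime_kuma_data'; then
    echo "ok: volume uptime_kuma_data declared"
  else
    echo "missing: volume uptime_kuma_data" >&2
    ok=1
  fi
  return "$ok"
}
```

It exits non-zero if either name is missing, so it can also gate a CI job.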

Acceptance criteria

https://status.vps-i1.infra.zintegrowana.online loads and shows 7 monitored services
Stopping a target makes the corresponding tile turn red within 60 s
Kuma data volume included in nightly backup (verify in spec 01 output)
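The backup criterion can be checked without extracting anything, by listing the archive. A sketch assuming spec 01 produces gzip-compressed tarballs (adjust the tar flags otherwise); backup_has_kuma_data is a hypothetical helper, and the archive path is whatever spec 01 writes.

```shell
# Hypothetical acceptance check: succeed if a backup archive contains any
# member path mentioning uptime_kuma_data. Assumes a .tar.gz archive.
backup_has_kuma_data() {
  local archive
  archive="$1"
  [ -r "$archive" ] || { echo "cannot read $archive" >&2; return 2; }
  # tar -tzf lists members without extracting; grep -q stops at the first match.
  if tar -tzf "$archive" | grep -q 'uptime_kuma_data'; then
    echo "ok: uptime_kuma_data found in $archive"
  else
    echo "missing: uptime_kuma_data not in $archive" >&2
    return 1
  fi
}
```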

Cost impact

0 €. One small container.

Back-out plan

Remove the uptime-kuma service and the Caddy route, then delete the uptime_kuma_data volume. No DNS change is needed, since the wildcard record already covers the hostname.
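The container and volume removal, sketched for vps-i1 and run from /opt/p24-infra/monitoring while the service is still declared in docker-compose.yml (afterwards, remove the service and the Caddy route from the repo and reload Caddy). The volume name is an assumption: Compose normally prefixes volumes with the project name, which defaults to the directory name, so the real name is probably monitoring_uptime_kuma_data; confirm with docker volume ls first. kuma_backout is a hypothetical helper.

```shell
# Hypothetical back-out helper. Run BEFORE removing the service from
# docker-compose.yml, otherwise compose no longer knows about uptime-kuma.
kuma_backout() {
  local volume
  # Assumed project-prefixed volume name; pass the real one as $1 if it differs.
  volume="${1:-monitoring_uptime_kuma_data}"
  # --stop stops the container first; --force skips the confirmation prompt.
  docker compose rm --stop --force uptime-kuma || return 1
  docker volume rm "$volume"
}
```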

Risks / open questions

Risk: Kuma stores its monitor config and probe history in SQLite; corruption loses both. Mitigation: nightly volume backup.
Q: Why not Grafana’s built-in stat panels? A: A dedicated status page loads in 200 ms on a 3G phone; a Grafana dashboard is heavy.

Bootstrap (post-merge deployment)

The artifacts PR ships only the compose, Caddy and docs changes. Booting the container and the initial setup (admin user plus 7 monitors via Kuma’s web UI) are done by hand with the checklist below: Kuma 1.x has no clean config-as-code import path, so the first run is driven through the web UI. Estimated time: about 15 min.

Step 1 — Pull repo + start container on vps-i1

ssh root@217.154.82.162 'cd /opt/p24-infra && git pull --ff-only origin main'
ssh root@217.154.82.162 'cd /opt/p24-infra/monitoring && docker compose up -d uptime-kuma'

Verify the container is healthy:

ssh root@217.154.82.162 'cd /opt/p24-infra/monitoring && docker compose ps uptime-kuma'
ssh root@217.154.82.162 'cd /opt/p24-infra/monitoring && docker compose logs --tail=20 uptime-kuma'
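Rather than eyeballing docker compose ps, the container’s health can be polled until it settles. A minimal sketch reusing the host and path from Step 1; it assumes the Kuma image defines a Docker HEALTHCHECK and falls back to accepting a plain running container if it does not. The helper names and the 180 s default timeout are arbitrary, hypothetical choices.

```shell
# Hypothetical helpers; host and directory are the ones used in Step 1.
VPS="root@217.154.82.162"
COMPOSE_DIR="/opt/p24-infra/monitoring"

# Print the container's health status, or "no-healthcheck:<state>" if the
# image defines no HEALTHCHECK. \$(...) is escaped so it expands remotely.
kuma_status() {
  ssh "$VPS" "cd $COMPOSE_DIR && docker inspect --format '{{if .State.Health}}{{.State.Health.Status}}{{else}}no-healthcheck:{{.State.Status}}{{end}}' \$(docker compose ps -q uptime-kuma)"
}

# Poll every $2 seconds (default 5) for up to $1 seconds (default 180).
wait_kuma_healthy() {
  local timeout interval waited status
  timeout="${1:-180}"; interval="${2:-5}"; waited=0; status=""
  while [ "$waited" -lt "$timeout" ]; do
    status="$(kuma_status 2>/dev/null)"
    case "$status" in
      healthy|no-healthcheck:running) echo "uptime-kuma: $status"; return 0 ;;
      unhealthy) echo "uptime-kuma: unhealthy, check the logs" >&2; return 1 ;;
    esac
    sleep "$interval"
    waited=$((waited + interval))
  done
  echo "uptime-kuma: not healthy after ${timeout}s (last: ${status:-unknown})" >&2
  return 1
}
```

Usage: wait_kuma_healthy, or wait_kuma_healthy 300 10 for a slower first boot.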

Step 2 — Reload Caddy to pick up the new route

ssh root@217.154.82.162 \
  'cd /opt/p24-infra/monitoring && docker compose exec caddy caddy reload --config /etc/caddy/Caddyfile'
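Step 2 can be made a little more careful by running caddy validate (which loads and provisions the config without applying it) before the reload, and by confirming afterwards that the new hostname answers. A sketch using the host and paths from Steps 1 and 2; the helper names are hypothetical, and route_check treats any 2xx or 3xx response as success on the assumption that Kuma may answer the root path with a redirect.

```shell
# Hypothetical Step 2 wrapper; host and paths are the ones used in Steps 1-2.
VPS="root@217.154.82.162"
COMPOSE_DIR="/opt/p24-infra/monitoring"

caddy_validate_and_reload() {
  # Only reload if the Caddyfile validates inside the caddy container.
  if ! ssh "$VPS" "cd $COMPOSE_DIR && docker compose exec caddy caddy validate --config /etc/caddy/Caddyfile"; then
    echo "Caddyfile failed validation; not reloading" >&2
    return 1
  fi
  ssh "$VPS" "cd $COMPOSE_DIR && docker compose exec caddy caddy reload --config /etc/caddy/Caddyfile"
}

# Succeed if the status hostname answers with any 2xx or 3xx code.
route_check() {
  local url code
  url="${1:-https://status.vps-i1.infra.zintegrowana.online}"
  code="$(curl -sS -o /dev/null -w '%{http_code}' --max-time 10 "$url")" || return 1
  case "$code" in
    2??|3??) echo "ok: $url -> HTTP $code" ;;
    *) echo "unexpected HTTP $code from $url" >&2; return 1 ;;
  esac
}
```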

Step 3 — DNS

Wildcard *.vps-i1.infra.zintegrowana.online already resolves to 217.154.82.162, so status.vps-i1.infra.zintegrowana.online works immediately. No Cloudflare change needed.
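A quick way to confirm the wildcard really covers the new name, sketched with dig (dig +short prints one answer per line). resolves_to is a hypothetical helper; the expected address is the vps-i1 IP from Step 1.

```shell
# Hypothetical DNS check: succeed if any A answer for $1 equals $2 exactly.
# grep -F treats the dots in the IP literally; -x requires a whole-line match.
resolves_to() {
  local host want
  host="$1"; want="$2"
  dig +short A "$host" | grep -qxF "$want"
}
```

Usage: resolves_to status.vps-i1.infra.zintegrowana.online 217.154.82.162 && echo ok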

Step 4 — First-run admin

Visit https://status.vps-i1.infra.zintegrowana.online. Kuma’s first-boot wizard asks for an admin username and password — create one and save it to 1Password under p24-infra / Uptime Kuma / admin.

Step 5 — Create the 7 monitors

Create each monitor from the list in monitoring/uptime-kuma/README.md as an HTTP(s) monitor with a 60 s heartbeat interval and 1 retry. The 7 URLs are the same targets as spec 05’s Blackbox config; Kuma probes them independently by design.
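How quickly a tile turns red depends on these two settings. As we understand Kuma 1.x, with retries above 0 a failing check first shows the monitor as pending, and it is only marked down once the retries are used up, with the retry interval (which the UI defaults to the heartbeat interval) between attempts. A rough worst-case estimate, ignoring request timeouts, is one heartbeat interval before the first failing check plus retries × retry interval: about 120 s for 60 s and 1 retry, which is worth comparing with the 60 s acceptance criterion. The helper below is hypothetical and encodes only that assumption; confirm the behavior on the deployed version.

```shell
# Hypothetical estimate of worst-case seconds from a target going down to its
# monitor turning red, assuming failures show as "pending" until `retries`
# further checks (spaced by retry_interval) also fail. Timeouts are ignored.
worst_case_down_seconds() {
  local interval retries retry_interval
  interval="$1"; retries="$2"
  retry_interval="${3:-$1}"   # defaults to the heartbeat interval
  # Up to one full interval before the first failing check,
  # then one retry interval per configured retry.
  echo $(( interval + retries * retry_interval ))
}
```

Usage: worst_case_down_seconds 60 1 prints 120; worst_case_down_seconds 60 0 prints 60.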

Create a public Status Page (Settings → Status Pages → Add) named p24-status and add all 7 monitors to it.

Step 6 — Export the config snapshot

Settings → Backup → Export → save the JSON. Commit it to monitoring/uptime-kuma/kuma-config.json in a follow-up PR — that file becomes the version-controlled snapshot we can diff over time and re-import on disaster recovery.

Step 7 — Verify acceptance criteria

https://status.vps-i1.infra.zintegrowana.online loads and shows all 7 services as green
Stop one target container (e.g. docker compose stop pdf-service, or any other non-critical target) and watch the corresponding tile turn red within 60 s; then restart the service and confirm it recovers
Run the nightly backup manually and confirm uptime_kuma_data appears in the resulting tarball listing (spec 01)
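Timing the red-tile criterion by hand is error-prone, so a small poll loop can act as a stopwatch. A minimal sketch: poll_within is a hypothetical helper that re-runs any condition command every interval seconds until it succeeds or the deadline passes. The condition itself is left open; one option may be the JSON that Kuma’s public status page loads (/api/status-page/heartbeat/p24-status in the 1.x versions we know of), but confirm that endpoint on the deployed version before relying on it.

```shell
# Hypothetical helper: poll_within DEADLINE INTERVAL CMD [ARGS...]
# Runs CMD every INTERVAL seconds; succeeds as soon as CMD succeeds,
# fails once another attempt would overrun DEADLINE seconds.
poll_within() {
  local deadline interval start elapsed
  deadline="$1"; interval="$2"
  shift 2
  start=$(date +%s)
  while :; do
    if "$@"; then
      echo "condition met after $(( $(date +%s) - start ))s"
      return 0
    fi
    elapsed=$(( $(date +%s) - start ))
    if [ $(( elapsed + interval )) -gt "$deadline" ]; then
      echo "condition not met within ${deadline}s" >&2
      return 1
    fi
    sleep "$interval"
  done
}
```

Usage, after stopping the target: poll_within 60 5 tile_is_red, where tile_is_red is a check you supply.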
