05 — Backups & Disaster Recovery
Backup coverage, retention policies, and disaster-recovery procedures for all stateful systems: MongoDB rs0, Supabase PostgreSQL, Traccar MySQL, and Wasabi S3.
CRITICAL GAP: The last verified MongoDB backup restore test is more than 4 months old. The backup-exporter checks Wasabi for backup-status JSON files, but the absence of alerts does not prove the backups are healthy: if alerts are not firing, confirm that the files are actually current. A restore drill is overdue.
Key Documents
| Document | Description |
|---|---|
| 01-backups.md | Backup improvement plan — coverage gaps, recommendations |
Backup Coverage by System
| System | Backup method | Location | Frequency | Last verified |
|---|---|---|---|---|
| MongoDB rs0 | mongodump → Wasabi S3 | p24-infra bucket | nightly | >4 months ago — CRITICAL |
| Supabase PostgreSQL | Supabase managed backups (daily) + pg_dump | Supabase + Wasabi | daily | Unknown |
| Traccar MySQL | mysqldump → local + Wasabi | p24-infra bucket | nightly | Unknown |
| n8n PostgreSQL (bms-4) | pg_dump via cron | Wasabi | nightly | Unknown |
| Grafana config | Git (monitoring/) | GitHub | on commit | Current |
Wasabi S3 Bucket Layout
Bucket: p24-infra — region eu-central-2 — endpoint s3.eu-central-2.wasabisys.com
| Folder | Contents |
|---|---|
| thanos/ | Prometheus TSDB blocks (2h chunks, uploaded by Thanos sidecar) |
| pdfs/ | Generated fleet inspection PDFs |
| Bucket root | Backup status JSON files polled by backup-exporter |
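To make the layout concrete, the Python sketch below sorts object keys into the areas listed in the table. It is only an illustration, not the backup-exporter's actual logic: the function name `classify_key` and the example keys are hypothetical, and it assumes that status files are `.json` objects whose key contains no `/`.

```python
def classify_key(key: str) -> str:
    """Map an object key in the p24-infra bucket to an area of the layout table."""
    if key.startswith("thanos/"):
        return "thanos"
    if key.startswith("pdfs/"):
        return "pdfs"
    # Assumption: status files are .json objects directly at the bucket root.
    if "/" not in key and key.endswith(".json"):
        return "backup-status"
    return "other"


# Hypothetical keys, for illustration only.
example_keys = ["thanos/01ABC/meta.json", "pdfs/report.pdf", "mongodb-status.json"]
for key in example_keys:
    print(key, "->", classify_key(key))
```

Matching on the prefix first keeps JSON files that live inside `thanos/` from being mistaken for backup status files.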
Backup Exporter
The backup-exporter container on vps-i1 polls Wasabi for backup status JSON files and exposes metrics on :9220. If a status file is stale, Prometheus fires a BackupStale alert and Alertmanager sends an email.
See monitoring-exporters-operations.md for configuration.
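The staleness check described above can be sketched in Python as follows. This is a minimal sketch under stated assumptions, not the exporter's real code: the `finished_at` field, the metric names `backup_age_seconds` and `backup_stale`, and the 26-hour threshold are all hypothetical, since the status file schema and alert rule are not given here.

```python
import json
from datetime import datetime, timezone

# Assumption: a nightly backup counts as stale after 26 hours
# (one day plus some slack). The real threshold may differ.
MAX_AGE_SECONDS = 26 * 3600


def backup_age_seconds(status_json: str, now: datetime) -> float:
    """Return the age of a backup described by a status document.

    Assumes a hypothetical schema with a timezone-aware ISO 8601
    "finished_at" field.
    """
    status = json.loads(status_json)
    finished = datetime.fromisoformat(status["finished_at"])
    return (now - finished).total_seconds()


def render_metrics(system: str, status_json: str, now: datetime) -> str:
    """Render two Prometheus text-format gauges for one system."""
    age = backup_age_seconds(status_json, now)
    stale = 1 if age > MAX_AGE_SECONDS else 0
    return (
        f'backup_age_seconds{{system="{system}"}} {age:.0f}\n'
        f'backup_stale{{system="{system}"}} {stale}\n'
    )


# Hypothetical sample document and clock, for illustration only.
sample = '{"finished_at": "2024-01-01T02:00:00+00:00"}'
now = datetime(2024, 1, 3, 2, 0, tzinfo=timezone.utc)
print(render_metrics("mongodb", sample, now))
```

Passing `now` in explicitly keeps the check deterministic and easy to test. Note that a check of this kind only proves the status file is recent; it says nothing about whether the dump can actually be restored.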
Disaster Recovery
No dedicated DR runbook exists yet — creating one is tracked in README.
Interim priorities for a DR scenario:
- MongoDB rs0: bms-2 is PRIMARY, bms-3 is SECONDARY, bms-4 is ARBITER. If the PRIMARY fails, rs0 holds an election automatically; bms-3 is the only other data-bearing member, so it is the only candidate to become the new PRIMARY.
- Supabase: managed service; failover is Supabase’s responsibility.
- vps-i1 (monitoring): monitoring is non-critical for platform availability; restore from git + .env.bak.
- bms-1 (Pinbox24 production): no hot standby; restore from Docker image tags on ECR + last mongodump.
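During an incident, the first question for rs0 is whether a PRIMARY exists and whether a majority of members is still reachable. The Python sketch below answers that from a document shaped like the output of MongoDB's `replSetGetStatus` command, using its `members`, `name`, `stateStr`, and `health` fields. The function name `summarize_replica_set`, the port 27017, and the sample document are assumptions for illustration, and the majority check assumes every member has one vote, as in a three-member set.

```python
def summarize_replica_set(status: dict) -> dict:
    """Summarize a replSetGetStatus-style document."""
    members = status["members"]
    healthy = [m for m in members if m.get("health") == 1]
    primaries = [m["name"] for m in healthy if m["stateStr"] == "PRIMARY"]
    return {
        "primary": primaries[0] if primaries else None,
        # An election needs a strict majority of voting members.
        "has_majority": len(healthy) > len(members) // 2,
    }


# Hypothetical status after bms-2 has failed and bms-3 was elected.
after_failover = {
    "members": [
        {"name": "bms-2:27017", "health": 0, "stateStr": "(not reachable/healthy)"},
        {"name": "bms-3:27017", "health": 1, "stateStr": "PRIMARY"},
        {"name": "bms-4:27017", "health": 1, "stateStr": "ARBITER"},
    ]
}
print(summarize_replica_set(after_failover))
```

If a second member is lost, `has_majority` becomes false and no PRIMARY can be elected, which is the point at which restoring from the last mongodump becomes relevant.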