Pinbox24 Infrastructure Map & Disaster Recovery Audit
Created: 2026-06-14
Author: Claude Code (p24-infra admin)
Scope: Pinbox24 Angular production stack — bms-1, bms-3, MongoDB rs0, AWS ECR, supporting services
Sources: bms-1 ops workbook, bms-3 ops workbook, infrastructure-overview.md, elements.md, 01-backups.md, cloud-services-operations.md, backup-exporter/app.py
Table of Contents
- Part 1: Pinbox24 Infrastructure Map
- Part 2: Disaster Recovery Audit
- Part 3: Workbook Audit
- Part 4: Action Plan
Part 1: Pinbox24 Infrastructure Map
1. Architecture Overview
                        INTERNET
                            │
               ┌────────────┴────────────┐
               │                         │
        w3.pinbox24.com           w4.pinbox24.com
      api.w3.pinbox24.com       api.w4.pinbox24.com
    socket.w3.pinbox24.com    s3-api.w4.pinbox24.com
               │                         │
               ▼                         ▼
┌─────────────────────────────────────────────────────┐
│ bms-1 (94.23.26.113) │
│ OVH Kimsufi — Ubuntu 20.04.1 LTS EOL │
│ 8 vCPU · 32 GB RAM · 440 GB │
│ │
│ nginx-proxy (jwilder) │
│ nginx-proxy-letsencrypt (Let's Encrypt TLS) │
│ │
│ ┌── v4.2 Stack (Current Production) ─────────┐ │
│ │ v42-prod api.w4.pinbox24.com │ │
│ │ s3-v42-prod s3-api.w4.pinbox24.com │ │
│ │ s3-v2-v42-prod s3-v2-api.w4.pinbox24.com │ │
│ │ mailgun-v42-prod mailgun-api.w4.pinbox24.com│ │
│ │ v41-prod w4.pinbox24.com │ │
│ │ pdf-gen-v42-prod pdf-gen-api.w4.pinbox24.com│ │
│ │ v42-notify-prod api-notify.w4.pinbox24.com │ │
│ │ wkhtml-v42-prod (internal) │ │
│ │ git-deploy-v42-prod git-deploy-api.w4.p24. │ │
│ └─────────────────────────────────────────────┘ │
│ │
│ ┌── v3.2 Stack (Legacy) ────────────────────┐ │
│ │ v31-prod w3.pinbox24.com │ │
│ │ v32-prod api.w3.pinbox24.com │ │
│ │ v32-prod-socket socket.w3.pinbox24.com │ │
│ │ v32-prod-reso w3.reso-integration.p24. │ │
│ │ s3-v32-prod* (internal S3 microsvcs) │ │
│ │ cron-v32-prod* (cron jobs, 3 variants) │ │
│ └─────────────────────────────────────────────┘ │
│ │
│ Native: PM2 NodeChat:3001, Redis:6379, │
│ PostgreSQL:5432, GitLab runner, │
│ node_exporter:9100 │
└─────────────────────────────────────────────────────┘
│
│ MongoDB rs0 (port 27017)
▼
┌───────────────────────────────────────────────────────┐
│ MongoDB Replica Set (rs0) │
│ │
│ bms-3 (51.68.155.224) ──── PRIMARY or SECONDARY │
│ bms-2 (145.239.133.104) ──── SECONDARY (observer, │
│ non-voting, p0) │
│ bms-4 (54.36.123.110) ──── ARBITER (elections │
│ only, no data) │
└───────────────────────────────────────────────────────┘
External services wired into Pinbox24 production:
Pinbox24 backend ──► AWS ECR (563740926945.dkr.ecr.eu-central-1.amazonaws.com) [container registry]
                 ──► Convertio.ai [PDF conversion; scheduled for replacement]
                 ──► Mailgun (via mailgun-v42-prod container) [transactional email]
                 ──► GitLab (git-deploy-api.w4.pinbox24.com) [CI/CD webhooks]
                 ──► private-registry.dev.pinbox24.com [legacy v3.x images; status unknown]
2. Service Inventory
Production Server — bms-1 (94.23.26.113)
| Service | Container | Domain | Purpose | Image Source | Uptime |
|---|---|---|---|---|---|
| Frontend v4.1 | v41-prod | w4.pinbox24.com | Angular frontend | local image (no tag) | 14 months |
| Backend v4.2 | v42-prod | api.w4.pinbox24.com | Node.js main API | AWS ECR | 3 months |
| S3 microservice v4.2 | s3-v42-prod | s3-api.w4.pinbox24.com | File/document storage API | AWS ECR | 3 months |
| S3 v2 microservice | s3-v2-v42-prod | s3-v2-api.w4.pinbox24.com | S3 v2 storage API | AWS ECR | 3 months |
| Mailgun microservice | mailgun-v42-prod | mailgun-api.w4.pinbox24.com | Email sending | AWS ECR | 3 months |
| PDF generation | pdf-gen-v42-prod | pdf-gen-api.w4.pinbox24.com | PDF generation service | AWS ECR | 5 years |
| Push notifications | v42-notify-prod | api-notify.w4.pinbox24.com | Push notification service | AWS ECR | 5 years |
| HTML-to-PDF | wkhtml-v42-prod | internal | wkhtmltopdf wrapper | AWS ECR | 5 years |
| Deployment webhook | git-deploy-v42-prod | git-deploy-api.w4.pinbox24.com | GitLab CI integration | AWS ECR | 5 years |
| Frontend v3.1 (legacy) | v31-prod | w3.pinbox24.com | Angular frontend v3 | private-registry.dev.pinbox24.com | 4 years |
| Backend v3.2 (legacy) | v32-prod | api.w3.pinbox24.com | Node.js backend v3 | private-registry.dev.pinbox24.com | 4 months (restarted) |
| WebSocket backend v3.2 | v32-prod-socket | socket.w3.pinbox24.com | WebSocket server | untagged image | varies |
| RESO integration | v32-prod-reso | w3.reso-integration-addrecords.pinbox24.com | RESO API integration | untagged image | varies |
| S3 services v3.2 | s3-v32-prod* (4 containers) | internal | File storage microservices | private-registry.dev.pinbox24.com | varies |
| Cron jobs v3.2 | cron-v32-prod* (3 containers) | internal | Scheduled background tasks | private-registry.dev.pinbox24.com | varies |
| Reverse proxy | nginx-proxy | — | Auto-routing (jwilder) | Docker Hub | 6 months |
| TLS automation | nginx-proxy-letsencrypt | — | Let’s Encrypt cert renewal | Docker Hub | 6 months |
| Docker UI | portainer-pinbox24 | port 49154 | Portainer v1 (legacy) | Docker Hub | 5 years |
| Deprecated S3 | s3-v42-prod-02-25-old | internal | Superseded Feb 2025 | AWS ECR | 5 years |
| NodeChat | PM2 native | :3001 | Unknown chat service | local | 4 years |
| Redis | native | 172.17.0.1:6379 | In-memory cache/session | OS package | unknown |
| PostgreSQL | native | 127.0.0.1:5432 | Relational DB (unknown use) | OS package | unknown |
| GitLab runner | native | — | CI/CD pipeline runner | OS package | unknown |
Staging Server — bms-3 (51.68.155.224)
| Service | Container | Purpose | Image Source | Uptime |
|---|---|---|---|---|
| Backend v4.2 staging | v42-stage | Staging backend | AWS ECR | 3 months |
| S3 v4 staging | s3-v42-stage | Staging S3 | AWS ECR | 3 months |
| Frontend v4.1 staging | v41-stage | Staging frontend | AWS ECR | 6 months |
| Backend v3.2 staging | v32-stage | Legacy staging | AWS ECR | 4 months |
| S3 v3 staging | s3-v32-stage | Legacy S3 staging | AWS ECR | 4 months |
| Frontend v3.1 staging | v31-stage | Legacy frontend staging | AWS ECR | 5 months |
| GPS tracking | traccar | Traccar GPS server | Docker Hub | 4 months |
| Financial data | mt5 | MetaTrader 5 | local image | 5 months |
| Docker UI | portainer-pinbox24 | Portainer management | Docker Hub | 5 months |
| Reverse proxy | nginx-proxy | jwilder proxy | Docker Hub | 6 months |
| TLS automation | nginx-proxy-letsencrypt | Let’s Encrypt | Docker Hub | 6 months |
MongoDB Replica Set rs0
| Member | Server | IP | Role | RAM | Data Stored |
|---|---|---|---|---|---|
| bms-3 (ns3129867) | OVH Kimsufi | 51.68.155.224 | PRIMARY/SECONDARY | ~21.7 GB MongoDB | Full Pinbox24 data |
| bms-2 (ns3087638) | OVH Kimsufi | 145.239.133.104 | SECONDARY observer (non-voting, p0) | unknown | Full replica (cold) |
| bms-4 (ns3101999) | OVH Kimsufi | 54.36.123.110 | ARBITER | ~75 MB | No data |
Note: MongoDB admin credentials are NOT stored in .env.local or p24-infra secrets. They are believed to be managed via Pinbox24 app secrets / AWS Secrets Manager, but the exact location has not been confirmed. This is a documentation gap.
3. Version Matrix
Pinbox24 uses version numbers to segment clients, not for gradual rollout. Multiple versions run simultaneously on the same server because different client organizations are on different product versions.
| Version | Frontend Domain | API Domain | Status | Architecture | Image Source |
|---|---|---|---|---|---|
| v4.2 | w4.pinbox24.com | api.w4.pinbox24.com | Current production | Angular + Node.js v4.x | AWS ECR |
| v4.1 | w4.pinbox24.com | (served by v42-prod) | Frontend only (running from local image — no registry tag) | Angular | Local image — unrecoverable if stopped |
| v3.2 | w3.pinbox24.com | api.w3.pinbox24.com | Legacy production (4+ years running) | Angular + Node.js v3.x | private-registry (unreachable?) |
| v3.1 | w3.pinbox24.com | (served by v32-prod) | Legacy frontend | Angular | private-registry (4 years) |
Why v4.1 frontend serves from w4.pinbox24.com: The frontend container (v41-prod) hosts the Angular SPA that calls api.w4.pinbox24.com. The v4.2 backend serves multiple frontend versions. This is the current architecture for the w4 environment.
v3.x clients: The w3.pinbox24.com subdomain serves clients still on the legacy 3.x stack. These clients have not been migrated to v4. No sunset timeline is documented.
RESO integration: v32-prod-reso appears to be a Real Estate Standards Organization (RESO) data import variant — likely a special deployment for real estate clients.
4. Data Flows
User Request Flow (v4.2 current production)
User browser → Cloudflare DNS → w4.pinbox24.com
→ bms-1 port 443
→ nginx-proxy (TLS terminated)
→ v41-prod (Angular SPA, static files)
│
├── API calls → api.w4.pinbox24.com → v42-prod (Node.js)
│ │
│ ├── MongoDB queries → bms-3:27017 (rs0 PRIMARY)
│ │ └── replicated to bms-2 (observer)
│ │
│ ├── File uploads → s3-api.w4.pinbox24.com → s3-v42-prod
│ │ └── [UNKNOWN — where does s3-v42-prod store files?]
│ │ (likely AWS S3 or local volume — not documented)
│ │
│ ├── Email send → mailgun-api.w4.pinbox24.com → mailgun-v42-prod
│ │ └── → Mailgun EU API → user email
│ │
│ └── PDF generation → pdf-gen-api.w4.pinbox24.com → pdf-gen-v42-prod
│ └── → wkhtml-v42-prod (wkhtmltopdf internal)
│
└── Push notifications → api-notify.w4.pinbox24.com → v42-notify-prod
File Storage Architecture (UNKNOWN — critical gap)
The s3-v42-prod and s3-v32-prod containers are described as “S3 microservices.” Their actual storage backend is not documented:
- They may proxy to AWS S3 buckets (most likely given AWS ECR usage)
- They may use Wasabi S3
- They may store files on local bms-1 disk volumes
- If local disk: 440 GB disk at 85% used is a risk
This is a critical DR gap — file storage location is unknown.
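One way to narrow this down is to look at how the running container is actually configured. The following is a minimal sketch, assuming shell access to bms-1 with the Docker CLI available. The function names and the key patterns matched by `storage_hint` are illustrative guesses, not names taken from the Pinbox24 code base, and a match is only a hint, not proof of the storage backend.

```shell
#!/bin/sh
# Sketch: gather hints about where an S3 microservice container keeps files.
# Function names and matched key patterns are hypothetical.

# Print one line per mount: type, host source, container destination.
list_mounts() {
  docker inspect --format \
    '{{range .Mounts}}{{println .Type .Source .Destination}}{{end}}' "$1"
}

# Print the container's environment as KEY=VALUE lines (contains secrets).
container_env() {
  docker inspect --format '{{range .Config.Env}}{{println .}}{{end}}' "$1"
}

# Heuristic: read KEY=VALUE lines on stdin, look only at the key names,
# and report whether any resemble remote object-storage settings.
storage_hint() {
  if cut -d= -f1 | grep -Eiq 'S3|BUCKET|WASABI|AWS'; then
    echo "remote-object-storage-likely"
  else
    echo "no-remote-hint"
  fi
}
```

On bms-1 this could be used as `container_env s3-v42-prod | storage_hint`, followed by `list_mounts s3-v42-prod` to see whether any bind mounts or volumes point at local disk.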
MongoDB Write Path
Application write → v42-prod (Node.js) → bms-3:27017 PRIMARY
→ rs0 replication → bms-2:27017 SECONDARY (observer, non-voting)
(bms-4 arbiter receives no data, participates only in elections)
Deployment Flow
Developer push to GitLab/GitHub → GitLab CI pipeline
→ Build Docker image
→ Push to AWS ECR (563740926945.dkr.ecr.eu-central-1.amazonaws.com)
→ Webhook → git-deploy-api.w4.pinbox24.com → git-deploy-v42-prod
→ docker pull from AWS ECR → docker stop old → docker run new
(nginx-proxy auto-detects VIRTUAL_HOST env var → routes traffic)
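The last two steps of this flow can be sketched as a dry-run plan. The function below only prints commands and executes nothing. Its name is hypothetical, the `docker rm` step is an assumption (needed to reuse a container name, but not stated in the flow above), and real containers will need further options (volumes, networks, additional env vars) that are not documented.

```shell
#!/bin/sh
# Sketch of the pull / stop / run sequence as a dry-run plan.
# It only PRINTS commands; nothing is executed.

deploy_plan() {
  image="$1"   # full image reference: registry/repo:tag
  name="$2"    # container name, e.g. v42-prod
  vhost="$3"   # hostname nginx-proxy should route to this container
  echo "docker pull $image"
  echo "docker stop $name"
  echo "docker rm $name"
  # nginx-proxy discovers the container through its VIRTUAL_HOST env var.
  echo "docker run -d --name $name -e VIRTUAL_HOST=$vhost $image"
}
```

For example, `deploy_plan "563740926945.dkr.ecr.eu-central-1.amazonaws.com/<repo>:<tag>" v42-prod api.w4.pinbox24.com` prints the four commands for review; the repository and tag names are not documented and are left as placeholders.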
5. Deployment Pipeline
| Stage | Tool | Location | Auth |
|---|---|---|---|
| Source code | GitLab (repo location unknown) | Unknown — not documented | Unknown |
| CI build | GitLab CI runner | bms-1 (native GitLab runner) | GitLab token |
| Container registry | AWS ECR | 563740926945.dkr.ecr.eu-central-1.amazonaws.com | AWS IAM (ECR token, 12h TTL) |
| Deploy trigger | git-deploy-v42-prod webhook | bms-1 port 443 | Webhook secret (unknown) |
| Image pull | AWS ECR login | bms-1 + bms-3 | aws ecr get-login-password |
| Container start | docker (manual or git-deploy) | bms-1 / bms-3 | root / docker group |
Gap: The GitLab source repository location is not documented anywhere in p24-infra. The infrastructure overview mentions “GitLab runner active” on bms-1 but does not identify the GitLab instance (self-hosted? gitlab.com? what org?).
Gap: AWS ECR credentials (IAM user, access key, region) for bms-1 and bms-3 are not documented in p24-infra. The 12-hour token expiry means any long-running incident requiring fresh image pulls could fail without valid credentials.
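For reference, the login step and the 12-hour expiry can be sketched as follows. This assumes the AWS CLI and valid IAM credentials are present on the host, which is exactly what is undocumented. The function names are hypothetical, and recording the issue time of a token is an idea for illustration, not something the current setup is known to do.

```shell
#!/bin/sh
# Sketch: ECR login plus a 12-hour token age check.
# Function names are hypothetical.

ECR_TOKEN_TTL=43200   # 12 hours in seconds

# Build the registry hostname from an AWS account ID and region.
ecr_registry_host() {
  echo "$1.dkr.ecr.$2.amazonaws.com"
}

# Succeed (exit 0) if a token issued at $1 has expired by time $2.
# Both arguments are Unix epoch seconds.
ecr_token_expired() {
  [ $(( $2 - $1 )) -ge "$ECR_TOKEN_TTL" ]
}

# Log the local Docker daemon in to ECR.
ecr_login() {
  host="$(ecr_registry_host "$1" "$2")"
  aws ecr get-login-password --region "$2" |
    docker login --username AWS --password-stdin "$host"
}
```

Calling `ecr_login 563740926945 eu-central-1` before any `docker pull` during an incident avoids relying on a token that may have been issued more than 12 hours earlier.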
6. External Dependencies
| Service | Provider | Purpose | Plan | Failure Impact | Backup/Alternative |
|---|---|---|---|---|---|
| AWS ECR | Amazon AWS | Container registry — all Pinbox24 images (21+ repos) | Paid | Cannot pull images for deploys or restarts | None — single registry |
| MongoDB rs0 | Self-hosted (bms-2/3/4) | Primary Pinbox24 database | Self-hosted | No writes if PRIMARY lost; data loss risk | bms-2 observer copy, manual failover |
| Convertio.ai | Convertio | PDF conversion (v4.x clients) | Paid SaaS | PDF features fail for affected clients | Scheduled for replacement — no current fallback |
| Mailgun EU | Sinch Mailgun | Transactional email | Paid | All user email stops | None documented |
| Cloudflare DNS | Cloudflare | DNS for pinbox24.com subdomains | Free | All traffic stops if DNS fails | Records documented — quick re-add possible |
| GitLab | Unknown | Source control + CI | Unknown | Cannot build or deploy new versions | GitHub mirror? Not documented |
| private-registry.dev.pinbox24.com | Unknown (self-hosted?) | Legacy v3.x container images | Unknown | v3.x containers cannot restart if stopped | Unknown — registry may not be reachable |
| AWS (other) | Amazon | ECR + possibly S3 for file storage | Paid | File storage loss if on AWS S3 | Not documented |
Critical gap: private-registry.dev.pinbox24.com status is unknown. The v3.x containers on bms-1 were pulled from this registry years ago. If the registry is unreachable (likely — described as “unreachable?” in bms-1 workbook), the v3.x stack cannot be rebuilt from scratch.
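A non-destructive way to settle the registry question is to probe its HTTP API before attempting any pull. The sketch below assumes the registry speaks the standard Docker Registry HTTP API v2 over HTTPS, which is not confirmed for this host; the function names are hypothetical.

```shell
#!/bin/sh
# Sketch: probe a Docker registry's /v2/ endpoint and classify the result.

# Print the HTTP status code for the registry API root.
# curl prints "000" when the connection itself fails.
registry_http_code() {
  curl -s -o /dev/null -m 10 -w '%{http_code}' "https://$1/v2/"
}

# Map an HTTP status code to a coarse verdict.
registry_verdict() {
  case "$1" in
    200|401) echo "reachable" ;;      # 401 means up, but auth is required
    000)     echo "unreachable" ;;    # DNS, TCP or TLS failure
    *)       echo "unexpected-$1" ;;
  esac
}
```

Run from bms-1 as `registry_verdict "$(registry_http_code private-registry.dev.pinbox24.com)"`. A verdict of `reachable` only shows that something answers at that address; whether the v3.x images are still stored there needs a separate check.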
7. Staging vs Production
| Property | bms-1 (Production) | bms-3 (Staging) |
|---|---|---|
| IP | 94.23.26.113 | 51.68.155.224 |
| OS | Ubuntu 20.04.1 LTS (EOL) | Ubuntu 22.04.5 LTS |
| Hardware | 8 vCPU · 32 GB RAM · 440 GB | 8 vCPU · 32 GB RAM · 410 GB RAID1 NVMe |
| Disk usage | 85% (354/440 GB) | 44% (170/410 GB) |
| MongoDB role | None (client only) | PRIMARY/SECONDARY of rs0 |
| MongoDB RAM | — | 21.7 GB (leaves ~10 GB of the 32 GB for containers) |
| v4.2 | Running (3 months, ECR) | Staging version (3 months, ECR) |
| v4.1 frontend | Running (local image, 14 months) | Staging version (6 months, ECR) |
| v3.2 | Running (4 years, private-registry) | Staging version (4 months, ECR) |
| v3.1 | Running (4 years, private-registry) | Staging version (5 months, ECR) |
| Traccar | Not present | Running (4 months) |
| MetaTrader 5 (mt5) | Not present | Running (5 months) — purpose undocumented |
| Monitoring | node_exporter added 2026-06-14 | Not connected to Prometheus |
| Firewall | ufw inactive, iptables only | Unknown |
| Portainer version | v1 (legacy, 5 years) | Legacy version |
| Verified restore | Never | Never |
Key difference: bms-3 hosts the MongoDB PRIMARY. This means the staging server is also the most critical database server in the Pinbox24 infrastructure. A staging workload crash or disk fill on bms-3 could trigger a MongoDB PRIMARY election or OOM kill.
MetaTrader 5 (mt5): Running on bms-3 staging for 5 months. Purpose completely undocumented. Likely connects to MetaTrader forex data feed — possibly used for financial/forex integration features in Pinbox24. No documentation exists for what data it processes, where it stores it, or what breaks if it stops.
Part 2: Disaster Recovery Audit
8. DR Readiness Score
Overall DR Score: 2/10 — Critical
| Category | Score | Finding |
|---|---|---|
| Database backups | 3/10 | MongoDB dumps exist on bms-1 (/root) but are local-only, unautomated, and last verified in Feb 2026 |
| Application container recovery | 1/10 | v3.x containers run from unreachable/untagged images — cannot restart |
| File storage backup | 0/10 | Storage backend not documented; no backup confirmed |
| Configuration backup | 2/10 | docker-compose files not in version control; no documented location |
| Credentials/secrets backup | 2/10 | MongoDB admin credentials not in p24-infra — location unknown |
| Restore procedures | 0/10 | No documented restore procedure exists for any Pinbox24 service |
| Restore testing | 0/10 | No restore drill ever performed |
| OS/infrastructure | 1/10 | Ubuntu 20.04 EOL on bms-1; disk at 85%; no automated off-site backup |
Summary: Pinbox24 production (bms-1) has essentially zero disaster recovery capability. The MongoDB data exists in a replica set but with no automated backups to off-site storage. Critical v3.x containers run from images that may no longer be pullable. There are no documented restore procedures, no restore drills, and no automated backup pipeline.
9. Backup Coverage Table
| Component | Backup Type | Schedule | Location | Last Verified | Status |
|---|---|---|---|---|---|
| MongoDB data (Pinbox24 all versions) | Manual mongodump | None — manual only | /root/w3-2026-02-05 (25 GB), /root/w4-2026-02-23 (19 GB), /root/w4-2026-02-24 (16 GB) on bms-1 | Feb 2026 (4+ months ago) | CRITICAL: local-only, unautomated, stale |
| Docker container images — v4.x | AWS ECR | On every deploy | 563740926945.dkr.ecr.eu-central-1.amazonaws.com | Last deploy (3 months ago) | OK: AWS ECR durable |
| Docker container images — v3.x | private-registry.dev.pinbox24.com | On every deploy (historical) | Unknown registry | 4 years ago | CRITICAL: registry status unknown; images may be lost |
| Docker container images — v4.1 frontend | Local-only (no tag) | Never | Only on bms-1 disk | Never | CRITICAL: if container stops, cannot restart |
| Docker container images — v3.2 socket, reso | Untagged local image | Never | Only on bms-1 disk | Never | CRITICAL: if containers stop, cannot restart |
| Docker compose / container configs | Not version-controlled | None | Unknown | Never | CRITICAL: no config backup |
| nginx-proxy vhost configs | Unknown | None | Unknown | Never | Gap: not documented |
| SSL certificates (Let’s Encrypt) | Auto-renewed via nginx-proxy-letsencrypt | Continuous | Local bms-1 filesystem | Never (automated) | OK in normal ops; no off-site backup |
| Environment variables (.env per container) | None | None | Local bms-1 only | Never | CRITICAL: no backup |
| Redis data (bms-1 native) | None documented | None | Local disk only | Never | Unknown risk — data purpose unknown |
| PostgreSQL data (bms-1, native install) | None documented | None | Local disk only | Never | CRITICAL: unknown data — no backup |
| bms-1 disk volumes (Docker volumes) | None | None | Local disk only | Never | CRITICAL: no off-site backup |
| Pinbox24 file uploads (S3 microservice) | Unknown | Unknown | Unknown storage backend | Never | CRITICAL: storage backend not identified |
| GitLab source code | Unknown | Unknown | GitLab (location unknown) | Unknown | Gap: repo location not documented |
| MongoDB rs0 replication | Continuous replication | Real-time | bms-2 SECONDARY | Implicit (ongoing) | Partial: DR copy exists but no tested failover procedure |
| bms-3 disk (MongoDB data) | None | None | Local /var/lib/mongodb | Never | CRITICAL: primary DB data not backed up off-site |
| bms-3 Docker volumes | None | None | Local disk | Never | No backup |
| MetaTrader 5 data (bms-3) | None | None | Local disk | Never | Unknown risk |
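Staleness of the manual dumps can be checked mechanically. This sketch assumes dump directories are named with a prefix followed by an ISO date, as the /root entries in the table suggest (for example w4-2026-02-23). The function names and the choice of cutoff are hypothetical.

```shell
#!/bin/sh
# Sketch: flag mongodump directories whose embedded date is older than a
# cutoff date.

# Extract the ISO date from a dump directory name such as w4-2026-02-23.
dump_date() {
  basename "$1" | sed 's/^[^-]*-//'
}

# Succeed if the dump's date sorts before the cutoff (YYYY-MM-DD).
# ISO dates compare correctly as plain strings.
dump_is_stale() {
  [ "$(expr "$(dump_date "$1")" \< "$2")" = "1" ]
}
```

For example, `dump_is_stale /root/w4-2026-02-24 2026-06-07 && echo stale` reports the newest listed dump as stale against a one-week cutoff from the audit date.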
10. Restore Procedure Gaps
The following are required for a full Pinbox24 restore from zero — each is currently blocked:
- Pinbox24 MongoDB restore
  - Backups exist (Feb 2026 dumps on bms-1) but are local-only: a bms-1 failure would destroy the very dumps a restore depends on
  - No procedure written for which dump to use, or which database to restore to which mongod
  - No procedure for restoring into a fresh rs0 cluster
  - Estimated data loss: 4+ months (since last dump)
- v4.x container re-deployment
  - ECR images exist; this is the healthiest part of the stack
  - But: no documented docker-compose files or run commands stored in version control
  - AWS ECR credentials not documented in p24-infra
  - Without run commands/compose files, the ports, volumes, env vars, and networks to use are unknown
- v3.x container re-deployment
  - `private-registry.dev.pinbox24.com`: status unknown; likely inaccessible
  - `v32-prod-socket`, `v32-prod-reso`, `v41-prod` run from untagged local images; permanently lost if the containers stop and images are removed
  - No way to rebuild without source code + build pipeline
- nginx-proxy configuration
  - `jwilder/nginx-proxy` configures itself via container `VIRTUAL_HOST` env vars, so the config lives in container env vars
  - Those env vars are only known if docker-compose files exist or `docker inspect` is run against running containers
  - No static vhost config backup exists
- SSL certificates
  - Let’s Encrypt certs stored on bms-1 disk (nginx-proxy volumes)
  - Loss of bms-1 = loss of certs; new certs require domain reachability and are subject to Let’s Encrypt rate limits
  - No acme.json equivalent backed up to Wasabi (unlike vps-i1’s Caddy certs, which are backed up)
- Environment variables
  - Each container has environment variables (MongoDB URIs, API keys, feature flags)
  - No .env files backed up
  - Restoring would require hunting down all values from Pinbox24 app secrets / AWS Secrets Manager
  - p24-infra has no inventory of what keys are needed per container
- Native services (Redis, PostgreSQL, PM2 NodeChat)
  - Running outside Docker, not documented, not backed up
  - PostgreSQL data purpose unknown; if it is needed for Pinbox24 functionality, loss is permanent
  - Redis is likely session cache (recoverable) but not confirmed
- bms-3 MongoDB as staging + PRIMARY
  - bms-3 hosts staging containers AND MongoDB PRIMARY simultaneously
  - A staging disaster (disk fill, OOM) could corrupt or lose MongoDB data
  - MongoDB has no off-site backup despite being the only source of truth
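Several of these gaps (run commands, env vars, nginx-proxy routing) can be closed while the containers are still running by capturing their configuration. The following is a minimal sketch, assuming Docker CLI access on the host. The function names and output file are hypothetical. Note that full `docker inspect` output includes env var values, so the snapshot file should be handled as a secret, while the key-name listing is safe to share.

```shell
#!/bin/sh
# Sketch: snapshot container configuration so it survives the host.

# Write full `docker inspect` JSON for every container (running or not)
# to the file given as $1. The output contains env VALUES.
snapshot_containers() {
  ids="$(docker ps -aq)"
  [ -n "$ids" ] || { echo "no containers found" >&2; return 1; }
  # Word splitting of $ids is intentional: one argument per container ID.
  docker inspect $ids > "$1"
}

# Read KEY=VALUE lines on stdin; print "<label> KEY" per non-empty line,
# dropping the values.
strip_values() {
  while IFS= read -r line; do
    [ -n "$line" ] && echo "$1 ${line%%=*}"
  done
  return 0
}

# Print "<container> KEY" for each env var name of one container.
env_keys_for() {
  docker inspect --format '{{range .Config.Env}}{{println .}}{{end}}' "$1" |
    strip_values "$1"
}
```

Running `env_keys_for v42-prod` for each container would yield the raw material for a per-container key inventory without exposing any values.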
11. RTO/RPO Assessment
Estimated Recovery Time Objective (RTO): 72–168 hours (3–7 days), and only if MongoDB backups are fresh and sufficient human expertise is available. Without MongoDB admin credentials and container configs, recovery time is indefinite.
Estimated Recovery Point Objective (RPO): in the current state, 4+ months of data loss if recovery depends on the dumps (last MongoDB dump Feb 2026). If MongoDB rs0 itself survives, RPO is near-zero for data, but the cluster has no tested failover.
| Scenario | Estimated RTO | Data Loss (RPO) | Blocking Issue |
|---|---|---|---|
| bms-1 full loss (fire/disk failure) | 7+ days | Up to 4 months MongoDB + all files | No container configs, no env vars, local-only backups |
| bms-3 full loss | 3–5 days | Near-zero (rs0 failover to bms-2) but application offline until server rebuilt | No staging configs backed up; MongoDB rs0 needs manual PRIMARY promotion |
| MongoDB PRIMARY lost (bms-3 loss) | 4–8 hours (technical) + days (procedure) | Near-zero data if rs0 fails over | Manual failover procedure not documented; bms-2 is non-voting observer, cannot auto-elect |
| Single v4.x container crash | Minutes | Zero | ECR images available |
| Single v3.x container crash (socket/reso) | Permanent loss of that container version | Zero data but service offline indefinitely | Untagged image — no rebuild path |
| v41-prod (local image) container crash | Permanent until rebuilt from source | Zero data | Local image only |
| AWS ECR account lost | Days (rebuild all images) | Zero data | All image tags lost; rebuild from GitLab source required |
Critical finding: Because bms-2 (the replica set observer) is a non-voting member with priority: 0, it cannot automatically become PRIMARY. If bms-3 goes down:
- rs0 has 1 SECONDARY (bms-2, non-voting) + 1 ARBITER (bms-4)
- No election quorum is achievable — the replica set freezes in read-only state
- Manual reconfiguration is required: `rs.reconfig({...}, {force: true})` from bms-2
- No runbook for this scenario exists
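The election arithmetic behind this finding can be written out explicitly. The sketch below assumes the member roles described in this document (bms-3 voting and data-bearing, bms-2 non-voting with priority 0, bms-4 a voting arbiter); the live configuration should be confirmed with `rs.conf()`. The function names are hypothetical.

```shell
#!/bin/sh
# Sketch: replica set election arithmetic for the rs0 layout described above.

# Votes needed to elect a PRIMARY: a strict majority of voting members.
majority_needed() {
  echo $(( $1 / 2 + 1 ))
}

# can_elect <reachable_votes> <total_votes> <reachable_electable_members>
# An election needs a majority of votes AND at least one reachable member
# that is allowed to become PRIMARY (data-bearing, priority above 0).
can_elect() {
  if [ "$1" -ge "$(majority_needed "$2")" ] && [ "$3" -ge 1 ]; then
    echo "yes"
  else
    echo "no"
  fi
}

# rs0 has 2 voting members: bms-3 (data) and bms-4 (arbiter).
# With bms-3 down: 1 vote reachable (bms-4), 0 electable members reachable.
can_elect 1 2 0
```

The final call prints `no`: one of two votes is not a majority, and bms-2 is not electable in any case, which is why a forced reconfiguration is the only way forward.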
12. Missing DR Documentation
The following runbooks do not exist anywhere in the p24-infra documentation:
- Pinbox24-MongoDBFailover.md — Manual PRIMARY promotion when bms-3 is unavailable
- Pinbox24-ContainerRestore.md — How to re-deploy all containers on bms-1 from scratch
- Pinbox24-MongoDBRestore.md — How to restore MongoDB from a mongodump
- Pinbox24-NewServerMigration.md — How to migrate Pinbox24 production to a new server
- Pinbox24-ImageInventory.md — Which ECR repos correspond to which containers, with tagged versions
- Pinbox24-SecretsInventory.md — What env vars each container needs (key names, not values)
- Pinbox24-FileStorageInventory.md — Where user-uploaded files are stored (AWS S3 bucket names, etc.)
- bms-1-OS-Upgrade-Plan.md — How to upgrade Ubuntu 20.04 → 24.04 without data loss
- bms-3-MongoDB-Disk-Full.md — Emergency procedure if bms-3 disk fills (MongoDB + staging data compete)
- PrivateRegistry-Recovery.md — How to rebuild v3.x images from source if private-registry is lost
13. Immediate DR Actions
Listed in priority order — each addresses a specific imminent failure risk:
- Export all untagged/local container images from bms-1 (CRITICAL, do today)
  - `v41-prod`, `v32-prod-socket`, `v32-prod-reso`: if any of these containers stop, they are permanently unrecoverable
  - Action: `docker save <image_id> | gzip > /root/image-exports/<container>.tar.gz` and upload to Wasabi
  - Commands (run on bms-1): `docker inspect v41-prod --format='{{.Image}}'` + `docker save [IMAGE_ID] | gzip | aws s3 cp - s3://p24-infra/bms-1/images/v41-prod.tar.gz`
- Create MongoDB backup script and run it now (CRITICAL, do today)
  - Current dumps are from Feb 2026. MongoDB is the source of truth for all Pinbox24 data.
  - Action: Create `mongodump` script on bms-3, upload result to Wasabi `p24-infra/bms-3/mongodb/`
  - Schedule nightly via cron (same pattern as bms-1/vps-i1 backup spec)
- Document all container run commands / docker-compose configuration (CRITICAL, this week)
  - Run `docker inspect` on every container on bms-1 and bms-3 to capture Image, Env, Mounts, NetworkMode, Cmd
  - Store result in `bms-1/container-inventory.json` committed to repo or uploaded to Wasabi
- Identify and document Pinbox24 file storage (HIGH, this week)
  - SSH to bms-1, inspect s3-v42-prod container env vars: `docker inspect s3-v42-prod | grep -A 50 Env`
  - Identify whether files go to local volumes, AWS S3, Wasabi, or elsewhere
  - Document bucket names, credentials, and backup status
- Document MongoDB admin credentials location (CRITICAL, this week)
  - Identify where MongoDB admin password is stored (Pinbox24 AWS Secrets Manager? .env file on bms-3?)
  - Add a reference (NOT the value) to `docs/servers/p4-ovh-bms-3-ns3129867-operations.md`
  - Without this, any MongoDB restore or failover requires a human who knows the password
- Identify private-registry.dev.pinbox24.com (HIGH, this week)
  - Determine if this registry is still running (test: `docker pull private-registry.dev.pinbox24.com/v31:latest` from bms-1)
  - If unreachable, mark v3.x as “permanently frozen — do not restart” and document
  - If reachable, document where it runs and add it to p24-infra inventory
- Set up automated Wasabi backup for bms-3 MongoDB (P1 after above)
  - Add `backup-bms3.sh` to p24-infra scripts
  - Nightly `mongodump --gzip --archive` to Wasabi `p24-infra/bms-3/mongodb/YYYY-MM-DD.archive.gz`
  - Alert on failure via Discord webhook
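The image export and nightly dump actions above can be sketched as two streaming helpers. The bucket name and key layout follow the p24-infra paths given in the actions. Everything else is an assumption to be replaced with real values: the `S3_ENDPOINT` default (a Wasabi-style endpoint, since a plain `aws s3 cp` targets AWS S3), the `MONGO_URI` variable, and all function names.

```shell
#!/bin/sh
# Sketch: stream image exports and MongoDB dumps to object storage.

BUCKET="${BUCKET:-p24-infra}"
S3_ENDPOINT="${S3_ENDPOINT:-https://s3.wasabisys.com}"   # assumed endpoint

# Object key for an exported container image.
image_key() {
  echo "bms-1/images/$1.tar.gz"
}

# Object key for a nightly dump; $1 is a date in YYYY-MM-DD form.
dump_key() {
  echo "bms-3/mongodb/$1.archive.gz"
}

# Save the image behind a container and upload it as a stream, so no
# temporary file lands on the local disk.
export_image() {
  image_id="$(docker inspect --format '{{.Image}}' "$1")" || return 1
  docker save "$image_id" | gzip |
    aws s3 cp - "s3://$BUCKET/$(image_key "$1")" --endpoint-url "$S3_ENDPOINT"
}

# Dump all databases as one gzip-compressed archive on stdout and upload it.
# MONGO_URI must be supplied by the caller; its location is undocumented.
backup_mongo() {
  mongodump --uri="$MONGO_URI" --gzip --archive |
    aws s3 cp - "s3://$BUCKET/$(dump_key "$(date +%F)")" \
      --endpoint-url "$S3_ENDPOINT"
}
```

Two caveats apply. An image saved by ID is restored without a repository tag, so the mapping from archive to container name has to be recorded separately. In plain `sh` a pipeline reports only the exit status of its last command, so a failed `mongodump` could still upload a truncated archive; under bash, `set -o pipefail` closes that gap before any failure alerting is wired in.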
Part 3: Workbook Audit
14. Workbook Compliance Table
The following covers all registered Pinbox24 and shared infrastructure elements. Pinbox24-specific services (bms-1/bms-3 containers) are not in dev_r_services at all — a separate gap.
Registered in dev_r_services (docs/elements.md as of 2026-05-13)
| Service | Compliance Status | Workbook Location | Notes |
|---|---|---|---|
| traccar (vps-i1) | Full (partial rotation) | docs/traccar-operations.md | Solid workbook |
| monitoring-prometheus-1 | Partial | docs/monitoring-stack-operations.md | Missing healthcheck proc |
| monitoring-thanos-sidecar-1 | Partial | docs/monitoring-stack-operations.md | Missing healthcheck proc |
| monitoring-thanos-query-1 | Partial | docs/monitoring-stack-operations.md | Missing healthcheck proc |
| monitoring-grafana-1 | Partial | docs/grafana-operations.md | Backup added 2026-05-14 |
| monitoring-alertmanager-1 | Partial | docs/monitoring-stack-operations.md | Missing healthcheck |
| monitoring-renderer-1 | Low | docs/monitoring-stack-operations.md | No separate workbook section |
| monitoring-loki-1 | None | — | No workbook |
| monitoring-promtail-1 | None | — | No workbook |
| monitoring-blackbox-exporter-1 | Partial | — | No standalone workbook |
| monitoring-caddy-1 | Partial | — | No standalone workbook |
| monitoring-uptime-kuma-1 | None | — | No workbook, not in Prometheus |
| monitoring-queue-exporter-1 | None | docs/monitoring-exporters-operations.md | No dedicated section |
| monitoring-cost-exporter-1 | None | docs/monitoring-exporters-operations.md | No dedicated section |
| monitoring-pg-stats-exporter-1 | None | docs/monitoring-exporters-operations.md | No dedicated section |
| monitoring-backup-exporter-1 | None | docs/monitoring-exporters-operations.md | No dedicated section |
| monitoring-gotenberg-1 | None | — | No workbook |
| monitoring-pdf-service-1 | Partial | docs/pdf-service-operations.md | Exists |
| openclaw-openclaw-gateway-1 | None | docs/openclaw-operations.md | Workbook exists but no compliance flag |
| openclaw-openclaw-cli-1 | None | — | Exited(1), no workbook |
| root-traefik-1 (vps-h1) | None | docs/traefik-operations.md | Workbook exists |
| root-n8n-1 (vps-h1) | Partial | docs/n8n-operations.md | Workbook exists |
| waha (vps-h1) | Partial | docs/waha-operations.md | Workbook exists |
NOT registered in dev_r_services — Pinbox24 stack
All Pinbox24 production containers on bms-1 and bms-3 are ABSENT from dev_r_services. The elements.md was last updated 2026-05-13 — before bms-1/bms-3 were inventoried on 2026-06-14. The servers themselves appear in a legacy form (vps-p24dev label) but the services are not registered.
| Service Group | compliance_workbook | Notes |
|---|---|---|
| bms-1 server record | No entry in dev_r_services | Only referenced as vps-p24dev legacy entry in elements.md |
| bms-3 server record | No entry in dev_r_services | Not registered |
| All 24 bms-1 containers | No entry | Completely absent |
| All 11 bms-3 containers | No entry | Completely absent |
| MongoDB rs0 as a service | No entry | Not registered as a service |
| AWS ECR (21 repos) | No dedicated workbook | Only mentioned in infrastructure-overview.md |
| GitLab (Pinbox24 CI) | No entry | Not registered, location unknown |
| private-registry.dev.pinbox24.com | No entry | Not registered |
| MetaTrader 5 (mt5, bms-3) | No entry, no workbook | Purpose undocumented |
15. Missing Workbooks Priority List
Ordered by business criticality:
| Priority | Service | Why Critical | Estimated Effort |
|---|---|---|---|
| P1 | MongoDB rs0 operational workbook | PRIMARY database for all Pinbox24 production data; failover requires documented procedure | 1d |
| P1 | bms-1 container inventory + config snapshot | 24 containers, many with unrecoverable images; no runbook to rebuild | 0.5d |
| P1 | Pinbox24 file storage workbook | S3 microservice storage backend unknown; potential silent data loss | 0.5d |
| P1 | Pinbox24 secrets inventory (key names only) | Cannot restore containers without knowing what env vars they require | 0.5d |
| P2 | bms-1 production server full workbook | Ubuntu 20.04 EOL, disk 85%, PM2/Redis/PostgreSQL native services | 1d |
| P2 | bms-3 staging + MongoDB workbook expansion | bms-3 workbook exists but MongoDB section is thin; OOM risk needs procedure | 0.5d |
| P2 | AWS ECR workbook | Container registry for v4.x; auth expires every 12h; 21 repos | 0.5d |
| P2 | v4.x deployment pipeline workbook | GitLab CI → ECR → git-deploy flow not documented end-to-end | 1d |
| P3 | v3.x legacy stack workbook | Legacy clients still active; sunset plan needed | 0.5d |
| P3 | MetaTrader 5 workbook | Running 5 months on bms-3; purpose and data unknown | 0.5d |
| P3 | private-registry.dev.pinbox24.com workbook | Status unclear; v3.x depends on it | 0.5d |
| P3 | Pinbox24 Angular (radieu/fuse-angular) repo workbook | Source code — build and deploy process | 0.5d |
16. Workbook Quality Issues
docs/servers/p4-ovh-bms-1-ns367522-operations.md
- Created 2026-06-14 — good start
- Missing: container env var inventory, Docker volume sizes, docker-compose equivalent configs
- Missing: backup section (currently just “none”)
- Missing: restore procedure
- Missing: `compliance_workbook` update in `dev_r_services`
- `Open Tasks` section documents issues but no timelines or owners
docs/servers/p4-ovh-bms-3-ns3129867-operations.md
- Created 2026-06-14 — thin
- Incomplete: MongoDB section is present but has no backup, failover, or restore procedure
- Incomplete: bms-3 container list is documented, but without config/env details
- Missing: MetaTrader 5 purpose, data, and recovery
- Not yet connected to Prometheus — monitoring gap
docs/infrastructure-overview.md
- Section 2 documents bms-3 as “Pinbox24 Dev VPS” (legacy) — should be updated to reflect current role
- Section 11 (“Open / Unknown”) has accumulated items that have since been resolved or need resolution
- Secrets section (§9) still shows “ROTATION PENDING” entries from 2026-05-06 — status not updated after rotation
docs/elements.md
- Last updated 2026-05-13 — 1 month stale
- bms-1, bms-2, bms-3, bms-4 servers not in the Servers table
- All Pinbox24 containers absent
- vps-p24dev entry is a legacy stub that should be replaced with proper bms-1/bms-3 records
17. Workbook Creation Backlog
Ordered list of workbooks to write, by priority:
1. docs/pinbox24-mongodb-operations.md: rs0 administration (status, failover, restore, add/remove members, credential management)
2. docs/pinbox24-production-containers.md: all 24 bms-1 containers (run commands, volumes, env var keys, dependencies)
3. docs/pinbox24-file-storage.md: s3-v42-prod and s3-v32-prod (backend identification, bucket names, backup, restore)
4. docs/pinbox24-secrets-inventory.md: key names (NOT values) per container, where they are stored, rotation owners
5. Update docs/servers/p4-ovh-bms-1-ns367522-operations.md: add backup section, restore section, container config details
6. Update docs/servers/p4-ovh-bms-3-ns3129867-operations.md: add MongoDB backup/failover, MetaTrader 5 section, monitoring setup
7. docs/pinbox24-ecr-registry.md: 21 ECR repos, auth renewal, image lifecycle, which repos are active
8. docs/pinbox24-deployment-pipeline.md: GitLab → ECR → git-deploy → nginx-proxy flow
9. docs/pinbox24-v3x-legacy-workbook.md: v3.x stack (private registry status, image exports, sunset plan, active clients)
10. docs/pinbox24-metatrader5.md: mt5 purpose, data stored, dependencies, recovery
11. Update docs/elements.md: add all bms-1/bms-3 containers and servers
Part 4: Action Plan
18. P1 Actions (Immediate)
These actions address imminent data loss or unrecoverable failure risk. Execute within 24–72 hours.
| # | Action | Why | How | Server |
|---|---|---|---|---|
| P1-1 | Export untagged container images from bms-1 to Wasabi | v41-prod, v32-prod-socket, v32-prod-reso will be permanently lost if containers stop | docker save $(docker inspect --format='{{.Image}}' v41-prod) \| gzip \| aws s3 cp - s3://p24-infra/bms-1/images/v41-prod-$(date +%F).tar.gz | bms-1 |
| P1-2 | Run MongoDB dump on bms-3 and upload to Wasabi | Last dump Feb 2026; 4+ months of data at risk | mongodump --authenticationDatabase admin -u admin -p "$PW" --gzip --archive \| aws s3 cp - s3://p24-infra/bms-3/mongodb/$(date +%F).archive.gz | bms-3 |
| P1-3 | Document MongoDB admin credential location | Cannot failover or restore without this | SSH bms-3, check /root/.env, Pinbox24 AWS Secrets Manager, or ask Pinbox24 team | human action |
| P1-4 | Run docker inspect on all bms-1 containers and back up the output | Configuration backup before anything breaks | docker inspect $(docker ps -aq) > /root/container-snapshot-$(date +%F).json then upload to Wasabi; the output contains env var values, so commit only a redacted copy to bms-1/ in repo | bms-1 |
| P1-5 | Test private-registry.dev.pinbox24.com reachability | v3.x containers depend on it; status unknown | docker pull private-registry.dev.pinbox24.com/test 2>&1 | bms-1 |
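
The snapshot from P1-4 also feeds the secrets inventory, which records key names only. The sketch below is one possible way to reduce `docker inspect` JSON to a per-container list of env var keys with every value dropped. The function name `env_key_inventory` is hypothetical; the `Name` and `Config.Env` fields are the standard `docker inspect` output layout.

```python
import json


def env_key_inventory(inspect_json: str) -> dict[str, list[str]]:
    """Map container name -> sorted env var key names, discarding all values."""
    inventory: dict[str, list[str]] = {}
    for container in json.loads(inspect_json):
        # docker inspect reports names with a leading slash, e.g. "/v42-prod".
        name = container.get("Name", "").lstrip("/")
        # Config.Env is a list of "KEY=value" strings (or null when unset).
        env = (container.get("Config") or {}).get("Env") or []
        # Split on the first "=" only, so values containing "=" are handled.
        inventory[name] = sorted({entry.split("=", 1)[0] for entry in env})
    return inventory
```

To use it, read the snapshot file written in P1-4 and pass its contents to `env_key_inventory`. The result can be serialized with `json.dumps` and committed, since it contains no secret values.
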
19. P2 Actions (This Week)
These improve documentation coverage and reduce ongoing risk.
| # | Action | Effort | Output |
|---|---|---|---|
| P2-1 | Identify and document Pinbox24 file storage (S3 microservice) | 2h | Add section to bms-1 workbook + new docs/pinbox24-file-storage.md |
| P2-2 | Create automated MongoDB backup script (backup-bms3.sh) | 4h | Script + cron entry on bms-3 + Wasabi upload + Discord alert on failure |
| P2-3 | Register bms-1 and bms-3 servers + all containers in dev_r_services | 3h | Update docs/elements.md + Supabase dev_r_services table |
| P2-4 | Write docs/pinbox24-mongodb-operations.md | 4h | rs0 status, failover procedure, restore from dump, credentials reference |
| P2-5 | Investigate and document native services on bms-1 (PM2, Redis, PostgreSQL) | 2h | Add section to bms-1 workbook; determine if PostgreSQL data is critical |
| P2-6 | Identify GitLab instance (host/org) for Pinbox24 source code | 1h | Add to infrastructure-overview.md §5 and elements.md |
| P2-7 | Connect bms-3 to Prometheus (install node_exporter) | 1h | apt install prometheus-node-exporter + add to prometheus.yml + set disk/RAM alerts |
| P2-8 | Update docs/elements.md to reflect 2026-06-14 inventory | 2h | Add bms-1, bms-2, bms-3, bms-4 to Servers table; add Pinbox24 containers |
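
A minimal sketch of what the P2-2 backup job could look like follows, written in Python rather than as backup-bms3.sh so the pieces can be tested in isolation. It reuses the mongodump flags and bucket path from P1-2. The Wasabi endpoint, the webhook URL placeholder, the User-Agent string, and all function names are assumptions, not values taken from p24-infra.

```python
import datetime
import json
import subprocess
import urllib.request

BUCKET = "s3://p24-infra/bms-3/mongodb"       # path used in P1-2
WASABI_ENDPOINT = "https://s3.wasabisys.com"  # assumption: use the bucket's regional endpoint
DISCORD_WEBHOOK_URL = ""                      # hypothetical: set to the real webhook URL


def object_url(day: datetime.date) -> str:
    """Destination object for a given day's archive."""
    return f"{BUCKET}/{day.isoformat()}.archive.gz"


def dump_command(user: str, password: str) -> list[str]:
    """mongodump invocation that streams a gzipped archive to stdout."""
    return ["mongodump", "--authenticationDatabase", "admin",
            "-u", user, "-p", password, "--gzip", "--archive"]


def upload_command(day: datetime.date) -> list[str]:
    """aws CLI invocation that uploads stdin to the bucket."""
    return ["aws", "s3", "cp", "-", object_url(day),
            "--endpoint-url", WASABI_ENDPOINT]


def alert_payload(message: str) -> bytes:
    """JSON body for a Discord webhook message."""
    return json.dumps({"content": message}).encode("utf-8")


def send_alert(message: str) -> None:
    """POST a failure notice; an explicit User-Agent is set as a precaution."""
    request = urllib.request.Request(
        DISCORD_WEBHOOK_URL,
        data=alert_payload(message),
        headers={"Content-Type": "application/json",
                 "User-Agent": "backup-bms3/1.0"},
    )
    urllib.request.urlopen(request, timeout=10)


def run_backup(user: str, password: str) -> bool:
    """Pipe mongodump into the upload; True only if both processes succeed."""
    day = datetime.date.today()
    dump = subprocess.Popen(dump_command(user, password), stdout=subprocess.PIPE)
    upload = subprocess.run(upload_command(day), stdin=dump.stdout)
    dump.stdout.close()
    return dump.wait() == 0 and upload.returncode == 0
```

A cron wrapper would call `run_backup` and, on a `False` result or an exception, call `send_alert`. Passing the password with `-p` exposes it in the process list, so reading it from a config file may be preferable once the credential location (P1-3) is known.
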
20. P3 Actions (This Month)
These complete the documentation and improve the operational maturity of the Pinbox24 stack.
| # | Action | Effort | Output |
|---|---|---|---|
| P3-1 | Write full docs/pinbox24-production-containers.md | 1d | All 24 bms-1 containers: run commands, volumes, env var keys, network modes |
| P3-2 | Write docs/pinbox24-deployment-pipeline.md | 0.5d | GitLab → ECR → git-deploy → nginx-proxy documented end-to-end |
| P3-3 | Write docs/pinbox24-v3x-legacy-workbook.md | 0.5d | v3.x stack: image status, private registry, active clients, sunset plan |
| P3-4 | Write docs/pinbox24-metatrader5.md | 0.5d | mt5 purpose, data, broker/feed connections, recovery |
| P3-5 | Plan bms-1 OS upgrade from Ubuntu 20.04 → 24.04 | 1d | Migration plan + risk assessment + rollback strategy |
| P3-6 | Investigate and resolve disk pressure on bms-1 | 0.5d | Clean up ~100 GB of old backups in /root after verifying MongoDB dumps are on Wasabi |
| P3-7 | Perform first Pinbox24 MongoDB restore drill | 1d | Restore Feb 2026 dump into isolated container; verify data integrity |
| P3-8 | Consolidate container registry strategy | 0.5d | Migrate remaining containers from private-registry to AWS ECR; retire private-registry |
| P3-9 | Document Portainer v1 upgrade path on bms-1 and bms-3 | 0.5d | Upgrade plan to Portainer CE 2.x+ or Agent |
| P3-10 | Set up Prometheus alerts for bms-1 disk (alert at 90%) | 0.5h | Add DiskAlmostFull alert rule for bms-1; bms-1 is at 85% now |
| P3-11 | Implement firewall policy on bms-1 (ufw currently inactive) | 0.5d | Enable ufw, allow 22/80/443/49154/9100, deny all else |
| P3-12 | Evaluate and resolve OOM risk on bms-3 | 1d | MongoDB 21.7 GB RAM + staging containers sharing 32 GB total; plan dedicated MongoDB node or restrict container RAM |
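
The capacity figures behind P3-6, P3-10, and P3-12 can be checked with a few lines of arithmetic. The sketch below assumes the 440 GB disk from the architecture overview is the volume that sits at 85%. The 2 GB OS reserve used in the note after the code is a hypothetical placeholder, and the function names are illustrative.

```python
TOTAL_DISK_GB = 440     # bms-1 disk size from the architecture overview
USED_FRACTION = 0.85    # current bms-1 disk usage reported in this audit
ALERT_FRACTION = 0.90   # alert threshold proposed in P3-10
CLEANUP_GB = 100        # approximate size of old backups in /root (P3-6)


def headroom_before_alert_gb(total_gb: float, used: float, alert: float) -> float:
    """Disk space that can still be written before the alert fires."""
    return total_gb * (alert - used)


def usage_after_cleanup(total_gb: float, used: float, freed_gb: float) -> float:
    """Fraction of disk in use once freed_gb has been removed."""
    return (total_gb * used - freed_gb) / total_gb


def memory_left_for_containers_gb(total_ram_gb: float, mongodb_gb: float,
                                  os_reserve_gb: float) -> float:
    """RAM available to staging containers after MongoDB and an OS reserve."""
    return total_ram_gb - mongodb_gb - os_reserve_gb


if __name__ == "__main__":
    headroom = headroom_before_alert_gb(TOTAL_DISK_GB, USED_FRACTION, ALERT_FRACTION)
    after = usage_after_cleanup(TOTAL_DISK_GB, USED_FRACTION, CLEANUP_GB)
    print(f"Headroom before 90% alert: {headroom:.1f} GB")
    print(f"Usage after cleanup: {after:.1%}")
```

Under these assumptions, bms-1 has roughly 22 GB left before the 90% alert would fire, and removing about 100 GB of old backups would bring usage down to roughly 62%. For bms-3, `memory_left_for_containers_gb(32, 21.7, 2)` leaves about 8.3 GB to divide among the staging containers, which gives a starting point for per-container memory limits.
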
Appendix: Key Credentials to Locate
The following credentials are required for Pinbox24 DR, but their location is NOT documented in p24-infra. Finding and documenting them (key names and storage location only; never values in this file) is a prerequisite to any meaningful restore.
| Credential | Why Needed | Where to Look |
|---|---|---|
| MongoDB admin password (rs0) | failover, restore, addArb | bms-3 /root/.env? AWS Secrets Manager? Pinbox24 team? |
| MongoDB keyFile content | all rs0 members must have the same keyFile to join | /etc/mongodb-keyfile on bms-3 — already transferred to bms-4 |
| AWS ECR auth (bms-1 + bms-3) | pulling new container images | ~/.aws/credentials or IAM role on bms-1/bms-3; not in p24-infra secrets |
| Pinbox24 container env vars | all containers need their env to start | docker inspect (capture before containers stop) |
| git-deploy webhook secret | deployment automation | Container env on git-deploy-v42-prod; not in p24-infra |
| private-registry.dev.pinbox24.com auth | pulling v3.x images | Unknown — not in p24-infra |
| Pinbox24 S3/file storage bucket name + credentials | file storage for user uploads | s3-v42-prod container env; not in p24-infra |
Document generated from available p24-infra repository documentation. Information gaps identified above reflect what is NOT documented and should be treated as action items, not assumptions.