Pinbox24 Infrastructure Map & Disaster Recovery Audit

Created: 2026-06-14
Author: Claude Code (p24-infra admin)
Scope: Pinbox24 Angular production stack — bms-1, bms-3, MongoDB rs0, AWS ECR, supporting services
Sources: bms-1 ops workbook, bms-3 ops workbook, infrastructure-overview.md, elements.md, 01-backups.md, cloud-services-operations.md, backup-exporter/app.py



Part 1: Pinbox24 Infrastructure Map

1. Architecture Overview

                        INTERNET
                           │
              ┌────────────┴────────────┐
              │                         │
         w3.pinbox24.com          w4.pinbox24.com
       api.w3.pinbox24.com      api.w4.pinbox24.com
     socket.w3.pinbox24.com   s3-api.w4.pinbox24.com
              │                         │
              ▼                         ▼
     ┌─────────────────────────────────────────────────────┐
     │              bms-1 (94.23.26.113)                   │
     │         OVH Kimsufi — Ubuntu 20.04.1 LTS EOL        │
     │              8 vCPU · 32 GB RAM · 440 GB            │
     │                                                     │
     │  nginx-proxy (jwilder)                              │
     │  nginx-proxy-letsencrypt (Let's Encrypt TLS)        │
     │                                                     │
     │  ┌── v4.2 Stack (Current Production) ─────────┐    │
     │  │  v42-prod         api.w4.pinbox24.com       │    │
     │  │  s3-v42-prod      s3-api.w4.pinbox24.com    │    │
     │  │  s3-v2-v42-prod   s3-v2-api.w4.pinbox24.com │    │
     │  │  mailgun-v42-prod mailgun-api.w4.pinbox24.com│    │
     │  │  v41-prod         w4.pinbox24.com            │    │
     │  │  pdf-gen-v42-prod pdf-gen-api.w4.pinbox24.com│    │
     │  │  v42-notify-prod  api-notify.w4.pinbox24.com │    │
     │  │  wkhtml-v42-prod  (internal)                 │    │
     │  │  git-deploy-v42-prod git-deploy-api.w4.p24.  │    │
     │  └─────────────────────────────────────────────┘    │
     │                                                     │
     │  ┌── v3.2 Stack (Legacy) ────────────────────┐      │
     │  │  v31-prod         w3.pinbox24.com          │      │
     │  │  v32-prod         api.w3.pinbox24.com      │      │
     │  │  v32-prod-socket  socket.w3.pinbox24.com   │      │
     │  │  v32-prod-reso    w3.reso-integration.p24. │      │
     │  │  s3-v32-prod*     (internal S3 microsvcs)  │      │
     │  │  cron-v32-prod*   (cron jobs, 3 variants)  │      │
     │  └─────────────────────────────────────────────┘    │
     │                                                     │
     │  Native: PM2 NodeChat:3001, Redis:6379,             │
     │          PostgreSQL:5432, GitLab runner,            │
     │          node_exporter:9100                         │
     └─────────────────────────────────────────────────────┘
                           │
                           │ MongoDB rs0 (port 27017)
                           ▼
     ┌───────────────────────────────────────────────────────┐
     │          MongoDB Replica Set (rs0)                    │
     │                                                       │
     │  bms-3 (51.68.155.224)  ────  PRIMARY or SECONDARY   │
     │  bms-2 (145.239.133.104) ────  SECONDARY (observer,  │
     │                                 non-voting, p0)      │
     │  bms-4 (54.36.123.110)  ────  ARBITER (elections     │
     │                                 only, no data)       │
     └───────────────────────────────────────────────────────┘

External services wired into Pinbox24 production:

Pinbox24 backend ──► AWS ECR (563740926945.dkr.ecr.eu-central-1.amazonaws.com)  [container registry]
                 ──► Convertio.ai  [PDF conversion — scheduled for replacement]
                 ──► Mailgun (via mailgun-v42-prod container)  [transactional email]
                 ──► GitLab (git-deploy-api.w4.pinbox24.com)  [CI/CD webhooks]
                 ──► private-registry.dev.pinbox24.com  [legacy v3.x images — status unknown]

2. Service Inventory

Production Server — bms-1 (94.23.26.113)

| Service | Container | Domain | Purpose | Image Source | Uptime |
|---|---|---|---|---|---|
| Frontend v4.1 | v41-prod | w4.pinbox24.com | Angular frontend | local image (no tag) | 14 months |
| Backend v4.2 | v42-prod | api.w4.pinbox24.com | Node.js main API | AWS ECR | 3 months |
| S3 microservice v4.2 | s3-v42-prod | s3-api.w4.pinbox24.com | File/document storage API | AWS ECR | 3 months |
| S3 v2 microservice | s3-v2-v42-prod | s3-v2-api.w4.pinbox24.com | S3 v2 storage API | AWS ECR | 3 months |
| Mailgun microservice | mailgun-v42-prod | mailgun-api.w4.pinbox24.com | Email sending | AWS ECR | 3 months |
| PDF generation | pdf-gen-v42-prod | pdf-gen-api.w4.pinbox24.com | PDF generation service | AWS ECR | 5 years |
| Push notifications | v42-notify-prod | api-notify.w4.pinbox24.com | Push notification service | AWS ECR | 5 years |
| HTML-to-PDF | wkhtml-v42-prod | internal | wkhtmltopdf wrapper | AWS ECR | 5 years |
| Deployment webhook | git-deploy-v42-prod | git-deploy-api.w4.pinbox24.com | GitLab CI integration | AWS ECR | 5 years |
| Frontend v3.1 (legacy) | v31-prod | w3.pinbox24.com | Angular frontend v3 | private-registry.dev.pinbox24.com | 4 years |
| Backend v3.2 (legacy) | v32-prod | api.w3.pinbox24.com | Node.js backend v3 | private-registry.dev.pinbox24.com | 4 months (restarted) |
| WebSocket backend v3.2 | v32-prod-socket | socket.w3.pinbox24.com | WebSocket server | untagged image | varies |
| RESO integration | v32-prod-reso | w3.reso-integration-addrecords.pinbox24.com | RESO API integration | untagged image | varies |
| S3 services v3.2 | s3-v32-prod* (4 containers) | internal | File storage microservices | private-registry.dev.pinbox24.com | varies |
| Cron jobs v3.2 | cron-v32-prod* (3 containers) | internal | Scheduled background tasks | private-registry.dev.pinbox24.com | varies |
| Reverse proxy | nginx-proxy | - | Auto-routing (jwilder) | Docker Hub | 6 months |
| TLS automation | nginx-proxy-letsencrypt | - | Let’s Encrypt cert renewal | Docker Hub | 6 months |
| Docker UI | portainer-pinbox24 | port 49154 | Portainer v1 (legacy) | Docker Hub | 5 years |
| Deprecated S3 | s3-v42-prod-02-25-old | internal | Superseded Feb 2025 | AWS ECR | 5 years |
| NodeChat | PM2 native | :3001 | Unknown chat service | local | 4 years |
| Redis | native | 172.17.0.1:6379 | In-memory cache/session | OS package | unknown |
| PostgreSQL | native | 127.0.0.1:5432 | Relational DB (unknown use) | OS package | unknown |
| GitLab runner | native | - | CI/CD pipeline runner | OS package | unknown |

Staging Server — bms-3 (51.68.155.224)

| Service | Container | Purpose | Image Source | Uptime |
|---|---|---|---|---|
| Backend v4.2 staging | v42-stage | Staging backend | AWS ECR | 3 months |
| S3 v4 staging | s3-v42-stage | Staging S3 | AWS ECR | 3 months |
| Frontend v4.1 staging | v41-stage | Staging frontend | AWS ECR | 6 months |
| Backend v3.2 staging | v32-stage | Legacy staging | AWS ECR | 4 months |
| S3 v3 staging | s3-v32-stage | Legacy S3 staging | AWS ECR | 4 months |
| Frontend v3.1 staging | v31-stage | Legacy frontend staging | AWS ECR | 5 months |
| GPS tracking | traccar | Traccar GPS server | Docker Hub | 4 months |
| Financial data | mt5 | MetaTrader 5 | local image | 5 months |
| Docker UI | portainer-pinbox24 | Portainer management | Docker Hub | 5 months |
| Reverse proxy | nginx-proxy | jwilder proxy | Docker Hub | 6 months |
| TLS automation | nginx-proxy-letsencrypt | Let’s Encrypt | Docker Hub | 6 months |

MongoDB Replica Set rs0

| Member | Server | IP | Role | RAM | Data Stored |
|---|---|---|---|---|---|
| bms-3 (ns3129867) | OVH Kimsufi | 51.68.155.224 | PRIMARY/SECONDARY | ~21.7 GB MongoDB | Full Pinbox24 data |
| bms-2 (ns3087638) | OVH Kimsufi | 145.239.133.104 | SECONDARY observer (non-voting, p0) | unknown | Full replica (cold) |
| bms-4 (ns3101999) | OVH Kimsufi | 54.36.123.110 | ARBITER | ~75 MB | No data |

Note: MongoDB admin credentials are NOT stored in .env.local or p24-infra secrets. They are presumably managed via Pinbox24 app secrets or AWS Secrets Manager, but the exact location has not been confirmed or documented. This is a documentation gap.


3. Version Matrix

Pinbox24 uses version numbers to segment clients, not for gradual rollout. Multiple versions run simultaneously on the same server because different client organizations are on different product versions.

| Version | Frontend Domain | API Domain | Status | Architecture | Image Source |
|---|---|---|---|---|---|
| v4.2 | w4.pinbox24.com | api.w4.pinbox24.com | Current production | Angular + Node.js v4.x | AWS ECR |
| v4.1 | w4.pinbox24.com | (served by v42-prod) | Frontend only (running from local image, no registry tag) | Angular | Local image, unrecoverable if stopped |
| v3.2 | w3.pinbox24.com | api.w3.pinbox24.com | Legacy production (4+ years running) | Angular + Node.js v3.x | private-registry (unreachable?) |
| v3.1 | w3.pinbox24.com | (served by v32-prod) | Legacy frontend | Angular | private-registry (4 years) |

Why v4.1 frontend serves from w4.pinbox24.com: The frontend container (v41-prod) hosts the Angular SPA that calls api.w4.pinbox24.com. The v4.2 backend serves multiple frontend versions. This is the current architecture for the w4 environment.

v3.x clients: The w3.pinbox24.com subdomain serves clients still on the legacy 3.x stack. These clients have not been migrated to v4. No sunset timeline is documented.

RESO integration: v32-prod-reso appears to be a Real Estate Standards Organization (RESO) data import variant — likely a special deployment for real estate clients.


4. Data Flows

User Request Flow (v4.2 current production)

User browser → Cloudflare DNS → w4.pinbox24.com
    → bms-1 port 443
    → nginx-proxy (TLS terminated)
    → v41-prod (Angular SPA, static files)
        │
        ├── API calls → api.w4.pinbox24.com → v42-prod (Node.js)
        │       │
        │       ├── MongoDB queries → bms-3:27017 (rs0 PRIMARY)
        │       │       └── replicated to bms-2 (observer)
        │       │
        │       ├── File uploads → s3-api.w4.pinbox24.com → s3-v42-prod
        │       │       └── [UNKNOWN — where does s3-v42-prod store files?]
        │       │           (likely AWS S3 or local volume — not documented)
        │       │
        │       ├── Email send → mailgun-api.w4.pinbox24.com → mailgun-v42-prod
        │       │       └── → Mailgun EU API → user email
        │       │
        │       └── PDF generation → pdf-gen-api.w4.pinbox24.com → pdf-gen-v42-prod
        │               └── → wkhtml-v42-prod (wkhtmltopdf internal)
        │
        └── Push notifications → api-notify.w4.pinbox24.com → v42-notify-prod

File Storage Architecture (UNKNOWN — critical gap)

The s3-v42-prod and s3-v32-prod containers are described as “S3 microservices.” Their actual storage backend is not documented:

  • They may proxy to AWS S3 buckets (most likely given AWS ECR usage)
  • They may use Wasabi S3
  • They may store files on local bms-1 disk volumes
  • If local disk: 440 GB disk at 85% used is a risk

This is a critical DR gap — file storage location is unknown.

MongoDB Write Path

Application write → v42-prod (Node.js) → bms-3:27017 PRIMARY
    → rs0 replication → bms-2:27017 SECONDARY (observer, non-voting)
    (bms-4 arbiter receives no data, participates only in elections)

Deployment Flow

Developer push to GitLab/GitHub → GitLab CI pipeline
    → Build Docker image
    → Push to AWS ECR (563740926945.dkr.ecr.eu-central-1.amazonaws.com)
    → Webhook → git-deploy-api.w4.pinbox24.com → git-deploy-v42-prod
    → docker pull from AWS ECR → docker stop old → docker run new
    (nginx-proxy auto-detects VIRTUAL_HOST env var → routes traffic)

5. Deployment Pipeline

| Stage | Tool | Location | Auth |
|---|---|---|---|
| Source code | GitLab (repo location unknown) | Unknown, not documented | Unknown |
| CI build | GitLab CI runner | bms-1 (native GitLab runner) | GitLab token |
| Container registry | AWS ECR | 563740926945.dkr.ecr.eu-central-1.amazonaws.com | AWS IAM (ECR token, 12h TTL) |
| Deploy trigger | git-deploy-v42-prod webhook | bms-1 port 443 | Webhook secret (unknown) |
| Image pull | AWS ECR login | bms-1 + bms-3 | aws ecr get-login-password |
| Container start | docker (manual or git-deploy) | bms-1 / bms-3 | root / docker group |

Gap: The GitLab source repository location is not documented anywhere in p24-infra. The infrastructure overview mentions “GitLab runner active” on bms-1 but does not identify the GitLab instance (self-hosted? gitlab.com? what org?).

Gap: AWS ECR credentials (IAM user, access key, region) for bms-1 and bms-3 are not documented in p24-infra. The 12-hour token expiry means any long-running incident requiring fresh image pulls could fail without valid credentials.
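
The 12-hour expiry can be made explicit in tooling. The following is a minimal sketch, assuming some helper records when aws ecr get-login-password was last run; the names ECR_TOKEN_TTL, token_expires_at and needs_relogin, and the 30-minute safety margin, are illustrative and not part of p24-infra.

```python
from datetime import datetime, timedelta, timezone

# ECR authorization tokens are valid for 12 hours after they are issued.
ECR_TOKEN_TTL = timedelta(hours=12)


def token_expires_at(issued_at: datetime) -> datetime:
    """Return the moment a token issued at `issued_at` stops working."""
    return issued_at + ECR_TOKEN_TTL


def needs_relogin(issued_at: datetime, now: datetime,
                  margin: timedelta = timedelta(minutes=30)) -> bool:
    """True when the token is expired or within `margin` of expiring."""
    return now >= token_expires_at(issued_at) - margin


if __name__ == "__main__":
    # Example login time; any timezone-aware timestamp works.
    issued = datetime(2026, 6, 14, 8, 0, tzinfo=timezone.utc)
    for hours in (1, 11, 12, 13):
        now = issued + timedelta(hours=hours)
        print(f"{hours}h after login, relogin needed: {needs_relogin(issued, now)}")
```

A check like this only says when a fresh login is due. It still depends on the IAM credentials themselves being available on bms-1 and bms-3, which is the undocumented part.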


6. External Dependencies

| Service | Provider | Purpose | Plan | Failure Impact | Backup/Alternative |
|---|---|---|---|---|---|
| AWS ECR | Amazon AWS | Container registry for all Pinbox24 images (21+ repos) | Paid | Cannot pull images for deploys or restarts | None (single registry) |
| MongoDB rs0 | Self-hosted (bms-2/3/4) | Primary Pinbox24 database | Self-hosted | No writes if PRIMARY lost; data loss risk | bms-2 observer copy, manual failover |
| Convertio.ai | Convertio | PDF conversion (v4.x clients) | Paid SaaS | PDF features fail for affected clients | Scheduled for replacement; no current fallback |
| Mailgun EU | Sinch Mailgun | Transactional email | Paid | All user email stops | None documented |
| Cloudflare DNS | Cloudflare | DNS for pinbox24.com subdomains | Free | All traffic stops if DNS fails | Records documented; quick re-add possible |
| GitLab | Unknown | Source control + CI | Unknown | Cannot build or deploy new versions | GitHub mirror? Not documented |
| private-registry.dev.pinbox24.com | Unknown (self-hosted?) | Legacy v3.x container images | Unknown | v3.x containers cannot restart if stopped | Unknown; registry may not be reachable |
| AWS (other) | Amazon | ECR + possibly S3 for file storage | Paid | File storage loss if on AWS S3 | Not documented |

Critical gap: private-registry.dev.pinbox24.com status is unknown. The v3.x containers on bms-1 were pulled from this registry years ago. If the registry is unreachable (likely — described as “unreachable?” in bms-1 workbook), the v3.x stack cannot be rebuilt from scratch.


7. Staging vs Production

| Property | bms-1 (Production) | bms-3 (Staging) |
|---|---|---|
| IP | 94.23.26.113 | 51.68.155.224 |
| OS | Ubuntu 20.04.1 LTS (EOL) | Ubuntu 22.04.5 LTS |
| Hardware | 8 vCPU · 32 GB RAM · 440 GB | 8 vCPU · 32 GB RAM · 410 GB RAID1 NVMe |
| Disk usage | 85% (354/440 GB) | 44% (170/410 GB) |
| MongoDB role | None (client only) | PRIMARY/SECONDARY of rs0 |
| MongoDB RAM | - | 21.7 GB (leaves ~12 GB for containers) |
| v4.2 | Running (3 months, ECR) | Staging version (3 months, ECR) |
| v4.1 frontend | Running (local image, 14 months) | Staging version (6 months, ECR) |
| v3.2 | Running (4 years, private-registry) | Staging version (4 months, ECR) |
| v3.1 | Running (4 years, private-registry) | Staging version (5 months, ECR) |
| Traccar | Not present | Running (4 months) |
| MetaTrader 5 (mt5) | Not present | Running (5 months), purpose undocumented |
| Monitoring | node_exporter added 2026-06-14 | Not connected to Prometheus |
| Firewall | ufw inactive, iptables only | Unknown |
| Portainer version | v1 (legacy, 5 years) | Legacy version |
| Verified restore | Never | Never |

Key difference: bms-3 hosts the MongoDB PRIMARY. This means the staging server is also the most critical database server in the Pinbox24 infrastructure. A staging workload crash or disk fill on bms-3 could take the PRIMARY down (for example through an OOM kill), and because no other rs0 member can be elected in its place, writes would stop until bms-3 recovers.

MetaTrader 5 (mt5): Running on bms-3 staging for 5 months. Purpose completely undocumented. Likely connects to MetaTrader forex data feed — possibly used for financial/forex integration features in Pinbox24. No documentation exists for what data it processes, where it stores it, or what breaks if it stops.


Part 2: Disaster Recovery Audit

8. DR Readiness Score

Overall DR Score: 2/10 — Critical

| Category | Score | Finding |
|---|---|---|
| Database backups | 3/10 | MongoDB dumps exist on bms-1 (/root) but are local-only, unautomated, and last verified in Feb 2026 |
| Application container recovery | 1/10 | v3.x containers run from unreachable/untagged images; cannot restart |
| File storage backup | 0/10 | Storage backend not documented; no backup confirmed |
| Configuration backup | 2/10 | docker-compose files not in version control; no documented location |
| Credentials/secrets backup | 2/10 | MongoDB admin credentials not in p24-infra; location unknown |
| Restore procedures | 0/10 | No documented restore procedure exists for any Pinbox24 service |
| Restore testing | 0/10 | No restore drill ever performed |
| OS/infrastructure | 1/10 | Ubuntu 20.04 EOL on bms-1; disk at 85%; no automated off-site backup |

Summary: Pinbox24 production (bms-1) has essentially zero disaster recovery capability. The MongoDB data exists in a replica set but with no automated backups to off-site storage. Critical v3.x containers run from images that may no longer be pullable. There are no documented restore procedures, no restore drills, and no automated backup pipeline.


9. Backup Coverage Table

| Component | Backup Type | Schedule | Location | Last Verified | Status |
|---|---|---|---|---|---|
| MongoDB data (Pinbox24 all versions) | Manual mongodump | None (manual only) | /root/w3-2026-02-05 (25 GB), /root/w4-2026-02-23 (19 GB), /root/w4-2026-02-24 (16 GB) on bms-1 | Feb 2026 (4+ months ago) | CRITICAL: local-only, unautomated, stale |
| Docker container images, v4.x | AWS ECR | On every deploy | 563740926945.dkr.ecr.eu-central-1.amazonaws.com | Last deploy (3 months ago) | OK: AWS ECR durable |
| Docker container images, v3.x | private-registry.dev.pinbox24.com | On every deploy (historical) | Unknown registry | 4 years ago | CRITICAL: registry status unknown; images may be lost |
| Docker container images, v4.1 frontend | Local-only (no tag) | Never | Only on bms-1 disk | Never | CRITICAL: if container stops, cannot restart |
| Docker container images, v3.2 socket, reso | Untagged local image | Never | Only on bms-1 disk | Never | CRITICAL: if containers stop, cannot restart |
| Docker compose / container configs | Not version-controlled | None | Unknown | Never | CRITICAL: no config backup |
| nginx-proxy vhost configs | Unknown | None | Unknown | Never | Gap: not documented |
| SSL certificates (Let’s Encrypt) | Auto-renewed via nginx-proxy-letsencrypt | Continuous | Local bms-1 filesystem | Never (automated) | OK in normal ops; no off-site backup |
| Environment variables (.env per container) | None | None | Local bms-1 only | Never | CRITICAL: no backup |
| Redis data (bms-1 native) | None documented | None | Local disk only | Never | Unknown risk; data purpose unknown |
| PostgreSQL data (bms-1, native install) | None documented | None | Local disk only | Never | CRITICAL: unknown data, no backup |
| bms-1 disk volumes (Docker volumes) | None | None | Local disk only | Never | CRITICAL: no off-site backup |
| Pinbox24 file uploads (S3 microservice) | Unknown | Unknown | Unknown storage backend | Never | CRITICAL: storage backend not identified |
| GitLab source code | Unknown | Unknown | GitLab (location unknown) | Unknown | Gap: repo location not documented |
| MongoDB rs0 replication | Continuous replication | Real-time | bms-2 SECONDARY | Implicit (ongoing) | Partial: DR copy exists but no tested failover procedure |
| bms-3 disk (MongoDB data) | None | None | Local /var/lib/mongodb | Never | CRITICAL: primary DB data not backed up off-site |
| bms-3 Docker volumes | None | None | Local disk | Never | No backup |
| MetaTrader 5 data (bms-3) | None | None | Local disk | Never | Unknown risk |

10. Restore Procedure Gaps

The following are required for a full Pinbox24 restore from zero — each is currently blocked:

  1. Pinbox24 MongoDB restore

    • Backups exist (Feb 2026 dumps on bms-1) but only on bms-1's local disk, so a bms-1 failure destroys the dumps along with the server
    • No procedure written for which dump to use, which database to restore to which mongod
    • No procedure for restoring into a fresh rs0 cluster
    • Estimated data loss: roughly 4 months (everything written since the last dump)
  2. v4.x container re-deployment

    • ECR images exist — this is the healthiest part of the stack
    • But: no documented docker-compose files or run commands stored in version control
    • AWS ECR credentials not documented in p24-infra
    • Without run commands/compose files, which ports, volumes, env vars, networks to use is unknown
  3. v3.x container re-deployment

    • private-registry.dev.pinbox24.com — status unknown; likely inaccessible
    • v32-prod-socket, v32-prod-reso, v41-prod run from untagged local images — permanently lost if the containers stop and images are removed
    • No way to rebuild without source code + build pipeline
  4. nginx-proxy configuration

    • jwilder/nginx-proxy generates its routing from each container's VIRTUAL_HOST env var, so the proxy config effectively lives in those env vars
    • Those env vars are only known if docker-compose files exist or docker inspect is run against running containers
    • No static vhost config backup exists
  5. SSL certificates

    • Let’s Encrypt certs stored on bms-1 disk (nginx-proxy volumes)
    • Loss of bms-1 = loss of certs; issuing new certs requires the domains to resolve to a reachable server and is subject to Let’s Encrypt rate limits
    • No acme.json equivalent backed up to Wasabi (unlike vps-i1’s Caddy certs which are backed up)
  6. Environment variables

    • Each container has environment variables (MongoDB URIs, API keys, feature flags)
    • No .env files backed up
    • Restoring would require hunting down all values from Pinbox24 app secrets / AWS Secrets Manager
    • p24-infra has no inventory of what keys are needed per container
  7. Native services (Redis, PostgreSQL, PM2 NodeChat)

    • Running outside Docker, not documented, not backed up
    • PostgreSQL data purpose unknown — if it’s needed for Pinbox24 functionality, loss is permanent
    • Redis is likely session cache (recoverable) but not confirmed
  8. bms-3 MongoDB as staging + PRIMARY

    • bms-3 hosts staging containers AND MongoDB PRIMARY simultaneously
    • A staging disaster (disk fill, OOM) could corrupt or lose MongoDB data
    • MongoDB has no off-site backup despite being the only source of truth
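
Gaps 2, 4 and 6 all come down to the same missing artifact: a per-container record of image, env var names, mounts and network. The following is a minimal sketch of deriving it from docker inspect output, assuming the standard inspect JSON layout (Name, Image, Config.Env, Config.Cmd, HostConfig.NetworkMode, Mounts). The function names and the sample record are illustrative only. Env var values are deliberately dropped so that the result can be stored without exposing secrets.

```python
import json


def summarize_container(info: dict) -> dict:
    """Reduce one `docker inspect` record to what a rebuild needs.

    Env var VALUES are dropped on purpose; only the key names are kept.
    """
    config = info.get("Config") or {}
    host = info.get("HostConfig") or {}
    env = config.get("Env") or []
    env_map = dict(item.split("=", 1) for item in env if "=" in item)
    return {
        "name": info.get("Name", "").lstrip("/"),
        "image_ref": config.get("Image"),  # name/tag given at run time
        "image_id": info.get("Image"),     # sha256 ID of the local image
        "env_keys": sorted(env_map),
        # VIRTUAL_HOST is routing config rather than a secret, so keep it.
        "virtual_host": env_map.get("VIRTUAL_HOST"),
        "mounts": [
            {"source": m.get("Source"), "target": m.get("Destination")}
            for m in info.get("Mounts") or []
        ],
        "network_mode": host.get("NetworkMode"),
        "cmd": config.get("Cmd"),
    }


def summarize(inspect_json: str) -> list:
    """Parse the JSON array printed by `docker inspect` and summarize it."""
    return [summarize_container(c) for c in json.loads(inspect_json)]


if __name__ == "__main__":
    # Hand-written sample record; every value here is made up.
    sample = json.dumps([{
        "Name": "/s3-v42-prod",
        "Image": "sha256:0123abcd",
        "Config": {
            "Image": "example-registry/s3:latest",
            "Env": ["VIRTUAL_HOST=s3-api.w4.pinbox24.com",
                    "API_KEY=not-a-real-key"],
            "Cmd": ["node", "server.js"],
        },
        "HostConfig": {"NetworkMode": "bridge"},
        "Mounts": [{"Source": "/srv/data", "Destination": "/data"}],
    }])
    print(json.dumps(summarize(sample), indent=2))
```

The env_keys list doubles as raw material for a secrets inventory (key names, not values), and the virtual_host field captures the nginx-proxy routing that otherwise exists only in the running containers.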

11. RTO/RPO Assessment

Estimated Recovery Time Objective (RTO): 72–168 hours (3–7 days) — IF MongoDB backups are fresh and sufficient human expertise is available. Without MongoDB admin credentials and container configs, indefinite.

Estimated Recovery Point Objective (RPO): Current state — up to 4+ months of data loss (last MongoDB dump Feb 2026). If MongoDB rs0 itself survives, RPO is near-zero for data — but the cluster has no tested failover.
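
The dump-based RPO figure can be derived directly from the dump directory names. The following is a minimal sketch, assuming the date embedded in each directory name is the date the dump was taken and using this document's creation date as "today"; DUMPS, AUDIT_DATE and age_in_days are illustrative names.

```python
from datetime import date

# Dump dates are read off the directory names in the backup coverage
# table; the audit date is this document's creation date.
DUMPS = {
    "/root/w3-2026-02-05": date(2026, 2, 5),
    "/root/w4-2026-02-23": date(2026, 2, 23),
    "/root/w4-2026-02-24": date(2026, 2, 24),
}
AUDIT_DATE = date(2026, 6, 14)


def age_in_days(taken: date, today: date = AUDIT_DATE) -> int:
    """Days of writes that would be missing if this dump were restored today."""
    return (today - taken).days


if __name__ == "__main__":
    for path, taken in sorted(DUMPS.items()):
        print(f"{path}: {age_in_days(taken)} days old")
    # /root/w3-2026-02-05: 129 days old
    # /root/w4-2026-02-23: 111 days old
    # /root/w4-2026-02-24: 110 days old
```

On these assumptions the w3 dump is a little over four months old and the newest w4 dump a little under, so "about four months" is the working figure until a fresh dump exists.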

| Scenario | Estimated RTO | Data Loss (RPO) | Blocking Issue |
|---|---|---|---|
| bms-1 full loss (fire/disk failure) | 7+ days | Up to 4 months MongoDB + all files | No container configs, no env vars, local-only backups |
| bms-3 full loss | 3–5 days | Near-zero (rs0 failover to bms-2) but application offline until server rebuilt | No staging configs backed up; MongoDB rs0 needs manual PRIMARY promotion |
| MongoDB PRIMARY lost (bms-3 loss) | 4–8 hours (technical) + days (procedure) | Near-zero data if rs0 fails over | Manual failover procedure not documented; bms-2 is non-voting observer, cannot auto-elect |
| Single v4.x container crash | Minutes | Zero | ECR images available |
| Single v3.x container crash (socket/reso) | Permanent loss of that container version | Zero data but service offline indefinitely | Untagged image, no rebuild path |
| v41-prod (local image) container crash | Permanent until rebuilt from source | Zero data | Local image only |
| AWS ECR account lost | Days (rebuild all images) | Zero data | All image tags lost; rebuild from GitLab source required |

Critical finding: Because bms-2 (the replica set observer) is a non-voting member with priority: 0, it cannot automatically become PRIMARY. If bms-3 goes down:

  1. rs0 has 1 SECONDARY (bms-2, non-voting) + 1 ARBITER (bms-4)
  2. No election quorum is achievable — the replica set freezes in read-only state
  3. Manual reconfiguration is required: rs.reconfig({...}, {force: true}) from bms-2
  4. No runbook for this scenario exists
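
The steps above follow from MongoDB's majority rule: a PRIMARY needs votes from a strict majority of all voting members, and the candidate must have a non-zero priority. The following is a minimal sketch of that arithmetic, assuming bms-3 has one vote and the default priority of 1 (the audit does not state its settings). Member, RS0 and can_elect_primary are illustrative names, not part of any MongoDB driver.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Member:
    name: str
    votes: int        # 0 for a non-voting member
    priority: float   # 0 means the member can never become PRIMARY
    arbiter: bool = False


# rs0 as described in this audit. The bms-3 values are an assumption.
RS0 = [
    Member("bms-3", votes=1, priority=1),
    Member("bms-2", votes=0, priority=0),
    Member("bms-4", votes=1, priority=0, arbiter=True),
]


def can_elect_primary(members, down=frozenset()):
    """True if the reachable members can still elect (or keep) a PRIMARY."""
    total_votes = sum(m.votes for m in members)
    majority = total_votes // 2 + 1  # strict majority of ALL voting members
    up = [m for m in members if m.name not in down]
    votes_up = sum(m.votes for m in up)
    has_candidate = any(m.priority > 0 and not m.arbiter for m in up)
    return votes_up >= majority and has_candidate


if __name__ == "__main__":
    for lost in ([], ["bms-2"], ["bms-3"], ["bms-4"]):
        label = ", ".join(lost) or "nothing"
        print(f"lost {label}: PRIMARY possible = {can_elect_primary(RS0, set(lost))}")
```

Under these assumptions the model reports that rs0 keeps a PRIMARY with all members up or with only bms-2 down, and loses it when bms-3 is down. It reports the same when only the arbiter bms-4 is down, because one of two votes is not a majority; that case is worth confirming against the live rs.conf() output before relying on it.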

12. Missing DR Documentation

The following runbooks do not exist anywhere in the p24-infra documentation:

  1. Pinbox24-MongoDBFailover.md — Manual PRIMARY promotion when bms-3 is unavailable
  2. Pinbox24-ContainerRestore.md — How to re-deploy all containers on bms-1 from scratch
  3. Pinbox24-MongoDBRestore.md — How to restore MongoDB from a mongodump
  4. Pinbox24-NewServerMigration.md — How to migrate Pinbox24 production to a new server
  5. Pinbox24-ImageInventory.md — Which ECR repos correspond to which containers, with tagged versions
  6. Pinbox24-SecretsInventory.md — What env vars each container needs (key names, not values)
  7. Pinbox24-FileStorageInventory.md — Where user-uploaded files are stored (AWS S3 bucket names, etc.)
  8. bms-1-OS-Upgrade-Plan.md — How to upgrade Ubuntu 20.04 → 24.04 without data loss
  9. bms-3-MongoDB-Disk-Full.md — Emergency procedure if bms-3 disk fills (MongoDB + staging data compete)
  10. PrivateRegistry-Recovery.md — How to rebuild v3.x images from source if private-registry is lost

13. Immediate DR Actions

Listed in priority order — each addresses a specific imminent failure risk:

  1. Export all untagged/local container images from bms-1 (CRITICAL, do today)

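    • v41-prod, v32-prod-socket, v32-prod-reso: the images exist only on bms-1 disk. If any of these containers stops and its image is removed (for example by a prune or a disk failure), it is permanently unrecoverable
    • Action: docker save <image_id> | gzip > /root/image-exports/<container>.tar.gz and upload to Wasabi
    • Commands (run on bms-1): first docker inspect v41-prod --format='{{.Image}}' to get the image ID, then docker save [IMAGE_ID] | gzip | aws s3 cp - s3://p24-infra/bms-1/images/v41-prod.tar.gz. An image saved by ID loads back without a name, so record the ID alongside the archive or tag the image before saving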
  2. Create MongoDB backup script and run it now (CRITICAL, do today)

    • Current dumps are from Feb 2026. MongoDB is the source of truth for all Pinbox24 data.
    • Action: Create mongodump script on bms-3, upload result to Wasabi p24-infra/bms-3/mongodb/
    • Schedule nightly via cron (same pattern as bms-1/vps-i1 backup spec)
  3. Document all container run commands / docker-compose configuration (CRITICAL, this week)

    • Run docker inspect on every container on bms-1 and bms-3 to capture Image, Env, Mounts, NetworkMode, Cmd
    • Store the result as bms-1/container-inventory.json on Wasabi. The raw output contains env var values (secrets), so commit to the repo only a copy with those values removed
  4. Identify and document Pinbox24 file storage (HIGH, this week)

    • SSH to bms-1, inspect s3-v42-prod container env vars: docker inspect s3-v42-prod | grep -A 50 Env
    • Identify whether files go to local volumes, AWS S3, Wasabi, or elsewhere
    • Document bucket names, credentials, and backup status
  5. Document MongoDB admin credentials location (CRITICAL, this week)

    • Identify where MongoDB admin password is stored (Pinbox24 AWS Secrets Manager? .env file on bms-3?)
    • Add a reference (NOT the value) to docs/servers/p4-ovh-bms-3-ns3129867-operations.md
    • Without this, any MongoDB restore or failover requires a human who knows the password
  6. Identify private-registry.dev.pinbox24.com (HIGH, this week)

    • Determine if this registry is still running (test: docker pull private-registry.dev.pinbox24.com/v31:latest from bms-1)
    • If unreachable, mark v3.x as “permanently frozen — do not restart” and document
    • If reachable, document where it runs and add it to p24-infra inventory
  7. Set up automated Wasabi backup for bms-3 MongoDB (P1 after above)

    • Add backup-bms3.sh to p24-infra scripts
    • Nightly mongodump --gzip --archive to Wasabi p24-infra/bms-3/mongodb/YYYY-MM-DD.archive.gz
    • Alert on failure via Discord webhook
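
Actions 2 and 7 describe a streaming dump, an upload, and a failure alert. The following is a minimal sketch of how such a script could be wired together, using the same mongodump and aws s3 cp flags that appear in this audit. The function names, the optional endpoint_url parameter (S3-compatible stores other than AWS generally need one), and the User-Agent string are assumptions for illustration; this is not the existing backup-bms3.sh.

```python
import json
import subprocess
import urllib.request
from datetime import date

BUCKET = "p24-infra"  # bucket name as used elsewhere in this audit


def archive_url(day: date) -> str:
    """Object URL following the YYYY-MM-DD.archive.gz layout from action 7."""
    return f"s3://{BUCKET}/bms-3/mongodb/{day.isoformat()}.archive.gz"


def dump_cmd(user: str, password: str) -> list:
    # --archive with no file name makes mongodump write to stdout.
    # Note: a password passed on the command line is visible in `ps`.
    return ["mongodump", "--authenticationDatabase", "admin",
            "-u", user, "-p", password, "--gzip", "--archive"]


def upload_cmd(day: date, endpoint_url: str = "") -> list:
    # "-" tells `aws s3 cp` to read the object body from stdin.
    cmd = ["aws", "s3", "cp", "-", archive_url(day)]
    if endpoint_url:
        cmd += ["--endpoint-url", endpoint_url]
    return cmd


def run_backup(user: str, password: str, day: date,
               endpoint_url: str = "") -> bool:
    """Stream mongodump into the upload; succeed only if both exit with 0."""
    dump = subprocess.Popen(dump_cmd(user, password), stdout=subprocess.PIPE)
    upload = subprocess.Popen(upload_cmd(day, endpoint_url), stdin=dump.stdout)
    dump.stdout.close()  # so mongodump gets SIGPIPE if the upload dies
    upload_rc = upload.wait()
    dump_rc = dump.wait()
    return dump_rc == 0 and upload_rc == 0


def alert_discord(webhook_url: str, message: str) -> None:
    """Post a plain-text message to a Discord webhook."""
    body = json.dumps({"content": message}).encode("utf-8")
    req = urllib.request.Request(
        webhook_url, data=body,
        headers={"Content-Type": "application/json",
                 # hypothetical agent string; some endpoints reject the default
                 "User-Agent": "p24-backup/0.1"})
    urllib.request.urlopen(req, timeout=10)
```

Checking both exit codes matters here: a plain shell pipeline reports only the status of the last command unless pipefail is set, so a failed mongodump could otherwise upload a truncated archive and still look successful.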

Part 3: Workbook Audit

14. Workbook Compliance Table

The following covers all registered Pinbox24 and shared infrastructure elements. Pinbox24-specific services (bms-1/bms-3 containers) are not in dev_r_services at all — a separate gap.

Registered in dev_r_services (docs/elements.md as of 2026-05-13)

| Service | Compliance Status | Workbook Location | Notes |
|---|---|---|---|
| traccar (vps-i1) | Full (partial rotation) | docs/traccar-operations.md | Solid workbook |
| monitoring-prometheus-1 | Partial | docs/monitoring-stack-operations.md | Missing healthcheck proc |
| monitoring-thanos-sidecar-1 | Partial | docs/monitoring-stack-operations.md | Missing healthcheck proc |
| monitoring-thanos-query-1 | Partial | docs/monitoring-stack-operations.md | Missing healthcheck proc |
| monitoring-grafana-1 | Partial | docs/grafana-operations.md | Backup added 2026-05-14 |
| monitoring-alertmanager-1 | Partial | docs/monitoring-stack-operations.md | Missing healthcheck |
| monitoring-renderer-1 | Low | docs/monitoring-stack-operations.md | No separate workbook section |
| monitoring-loki-1 | None | - | No workbook |
| monitoring-promtail-1 | None | - | No workbook |
| monitoring-blackbox-exporter-1 | Partial | - | No standalone workbook |
| monitoring-caddy-1 | Partial | - | No standalone workbook |
| monitoring-uptime-kuma-1 | None | - | No workbook, not in Prometheus |
| monitoring-queue-exporter-1 | None | docs/monitoring-exporters-operations.md | No dedicated section |
| monitoring-cost-exporter-1 | None | docs/monitoring-exporters-operations.md | No dedicated section |
| monitoring-pg-stats-exporter-1 | None | docs/monitoring-exporters-operations.md | No dedicated section |
| monitoring-backup-exporter-1 | None | docs/monitoring-exporters-operations.md | No dedicated section |
| monitoring-gotenberg-1 | None | - | No workbook |
| monitoring-pdf-service-1 | Partial | docs/pdf-service-operations.md | Exists |
| openclaw-openclaw-gateway-1 | None | docs/openclaw-operations.md | Workbook exists but no compliance flag |
| openclaw-openclaw-cli-1 | None | - | Exited(1), no workbook |
| root-traefik-1 (vps-h1) | None | docs/traefik-operations.md | Workbook exists |
| root-n8n-1 (vps-h1) | Partial | docs/n8n-operations.md | Workbook exists |
| waha (vps-h1) | Partial | docs/waha-operations.md | Workbook exists |

NOT registered in dev_r_services — Pinbox24 stack

All Pinbox24 production containers on bms-1 and bms-3 are ABSENT from dev_r_services. The elements.md was last updated 2026-05-13 — before bms-1/bms-3 were inventoried on 2026-06-14. The servers themselves appear in a legacy form (vps-p24dev label) but the services are not registered.

| Service Group | compliance_workbook | Notes |
|---|---|---|
| bms-1 server record | No entry in dev_r_services | Only referenced as vps-p24dev legacy entry in elements.md |
| bms-3 server record | No entry in dev_r_services | Not registered |
| All 24 bms-1 containers | No entry | Completely absent |
| All 11 bms-3 containers | No entry | Completely absent |
| MongoDB rs0 as a service | No entry | Not registered as a service |
| AWS ECR (21 repos) | No dedicated workbook | Only mentioned in infrastructure-overview.md |
| GitLab (Pinbox24 CI) | No entry | Not registered, location unknown |
| private-registry.dev.pinbox24.com | No entry | Not registered |
| MetaTrader 5 (mt5, bms-3) | No entry, no workbook | Purpose undocumented |

15. Missing Workbooks Priority List

Ordered by business criticality:

| Priority | Service | Why Critical | Estimated Effort |
|---|---|---|---|
| P1 | MongoDB rs0 operational workbook | PRIMARY database for all Pinbox24 production data; failover requires documented procedure | 1d |
| P1 | bms-1 container inventory + config snapshot | 24 containers, many with unrecoverable images; no runbook to rebuild | 0.5d |
| P1 | Pinbox24 file storage workbook | S3 microservice storage backend unknown; potential silent data loss | 0.5d |
| P1 | Pinbox24 secrets inventory (key names only) | Cannot restore containers without knowing what env vars they require | 0.5d |
| P2 | bms-1 production server full workbook | Ubuntu 20.04 EOL, disk 85%, PM2/Redis/PostgreSQL native services | 1d |
| P2 | bms-3 staging + MongoDB workbook expansion | bms-3 workbook exists but MongoDB section is thin; OOM risk needs procedure | 0.5d |
| P2 | AWS ECR workbook | Container registry for v4.x; auth expires every 12h; 21 repos | 0.5d |
| P2 | v4.x deployment pipeline workbook | GitLab CI → ECR → git-deploy flow not documented end-to-end | 1d |
| P3 | v3.x legacy stack workbook | Legacy clients still active; sunset plan needed | 0.5d |
| P3 | MetaTrader 5 workbook | Running 5 months on bms-3; purpose and data unknown | 0.5d |
| P3 | private-registry.dev.pinbox24.com workbook | Status unclear; v3.x depends on it | 0.5d |
| P3 | Pinbox24 Angular (radieu/fuse-angular) repo workbook | Source code: build and deploy process | 0.5d |

16. Workbook Quality Issues

docs/servers/p4-ovh-bms-1-ns367522-operations.md

  • Created 2026-06-14 — good start
  • Missing: container env var inventory, Docker volume sizes, docker-compose equivalent configs
  • Missing: backup section (currently just “none”)
  • Missing: restore procedure
  • Missing: compliance_workbook update in dev_r_services
  • Open Tasks section documents issues but assigns no timelines or owners

docs/servers/p4-ovh-bms-3-ns3129867-operations.md

  • Created 2026-06-14 — thin
  • Missing: backup, failover, and restore procedures in the MongoDB section (the section itself exists)
  • Missing: config/env details for the bms-3 containers (the container list itself is documented)
  • Missing: MetaTrader 5 purpose, data, and recovery
  • Not yet connected to Prometheus — monitoring gap

docs/infrastructure-overview.md

  • Section 2 documents bms-3 as “Pinbox24 Dev VPS” (legacy) — should be updated to reflect current role
  • Section 11 (“Open / Unknown”) has accumulated items that have since been resolved or need resolution
  • Secrets section (§9) still shows “ROTATION PENDING” entries from 2026-05-06 — status not updated after rotation

docs/elements.md

  • Last updated 2026-05-13 — 1 month stale
  • bms-1, bms-2, bms-3, bms-4 servers not in the Servers table
  • All Pinbox24 containers absent
  • vps-p24dev entry is a legacy stub that should be replaced with proper bms-1/bms-3 records

17. Workbook Creation Backlog

Ordered list of workbooks to write, by priority:

  1. docs/pinbox24-mongodb-operations.md — rs0 administration: status, failover, restore, add/remove members, credential management
  2. docs/pinbox24-production-containers.md — all 24 bms-1 containers: run commands, volumes, env var keys, dependencies
  3. docs/pinbox24-file-storage.md — s3-v42-prod and s3-v32-prod: backend identification, bucket names, backup, restore
  4. docs/pinbox24-secrets-inventory.md — key names (NOT values) per container, where they are stored, rotation owners
  5. Update docs/servers/p4-ovh-bms-1-ns367522-operations.md — add backup section, restore section, container config details
  6. Update docs/servers/p4-ovh-bms-3-ns3129867-operations.md — add MongoDB backup/failover, MetaTrader 5 section, monitoring setup
  7. docs/pinbox24-ecr-registry.md — 21 ECR repos, auth renewal, image lifecycle, which repos are active
  8. docs/pinbox24-deployment-pipeline.md — GitLab → ECR → git-deploy → nginx-proxy flow
  9. docs/pinbox24-v3x-legacy-workbook.md — v3.x stack: private registry status, image exports, sunset plan, active clients
  10. docs/pinbox24-metatrader5.md — mt5 purpose, data stored, dependencies, recovery
  11. Update docs/elements.md — add all bms-1/bms-3 containers and servers

Part 4: Action Plan

18. P1 Actions (Immediate)

These actions address imminent data loss or unrecoverable failure risk. Execute within 24–72 hours.

P1-1 (bms-1): Export untagged container images from bms-1 to Wasabi
  Why: v41-prod, v32-prod-socket, v32-prod-reso will be permanently lost if containers stop
  How: docker save $(docker inspect --format='{{.Image}}' v41-prod) | gzip | aws s3 cp - s3://p24-infra/bms-1/images/v41-prod-$(date +%F).tar.gz (repeat for v32-prod-socket and v32-prod-reso)

P1-2 (bms-3): Run MongoDB dump on bms-3 and upload to Wasabi
  Why: Last dump was Feb 2026; 4+ months of data at risk
  How: mongodump --authenticationDatabase admin -u admin -p "$PW" --gzip --archive | aws s3 cp - s3://p24-infra/bms-3/mongodb/$(date +%F).archive.gz

P1-3 (human action): Document MongoDB admin credential location
  Why: Cannot failover or restore without this
  How: SSH bms-3, check /root/.env, Pinbox24 AWS Secrets Manager, or ask Pinbox24 team

P1-4 (bms-1): Run docker inspect on all bms-1 containers and commit the output to the repo
  Why: Configuration backup before anything breaks
  How: docker inspect $(docker ps -aq) > /root/container-snapshot-$(date +%F).json, then upload to Wasabi or commit to bms-1/ in repo. docker inspect emits JSON by default, and the output includes env var values, so redact secrets before committing.

P1-5 (bms-1): Test private-registry.dev.pinbox24.com reachability
  Why: v3.x containers depend on it; status unknown
  How: docker pull private-registry.dev.pinbox24.com/test 2>&1
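
P1-1 names three containers, while the command shown covers only v41-prod. The sketch below loops the same pipeline over all three. It is a minimal illustration, not a tested production script: the function names, the DRY_RUN switch, and the BUCKET_PREFIX variable are assumptions introduced here, and it presumes the aws CLI on bms-1 is already configured to reach the Wasabi bucket.

```shell
#!/usr/bin/env bash
# Sketch for P1-1: export the images behind the three untagged containers.
# DRY_RUN=1 (the default here) only prints each pipeline; DRY_RUN=0 runs it.

# Fail a pipeline when any stage fails, where the shell supports it.
if (set -o pipefail) 2>/dev/null; then set -o pipefail; fi

BUCKET_PREFIX="${BUCKET_PREFIX:-s3://p24-infra/bms-1/images}"  # path from P1-1
DRY_RUN="${DRY_RUN:-1}"

# Build the destination URL for one container's image archive.
archive_url() {
  printf '%s/%s-%s.tar.gz\n' "$BUCKET_PREFIX" "$1" "$2"
}

# Export one container's image: resolve the image ID, save, compress, upload.
export_image() {
  container="$1"
  day="${2:-$(date +%F)}"
  url="$(archive_url "$container" "$day")"
  if [ "$DRY_RUN" = "1" ]; then
    printf '%s\n' "docker save \$(docker inspect --format='{{.Image}}' $container) | gzip | aws s3 cp - $url"
    return 0
  fi
  image_id="$(docker inspect --format='{{.Image}}' "$container")" || return 1
  docker save "$image_id" | gzip | aws s3 cp - "$url"
}

# Export all three containers; keep going after a failure but report it.
export_all() {
  failed=0
  for c in v41-prod v32-prod-socket v32-prod-reso; do
    export_image "$c" "$@" || { printf 'export failed: %s\n' "$c" >&2; failed=1; }
  done
  return "$failed"
}
```

Calling export_all with no arguments prints the three pipelines for review; running it again with DRY_RUN=0 performs the uploads. An optional date argument (for example 2026-06-14) overrides the date used in the archive name.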

19. P2 Actions (This Week)

These improve documentation coverage and reduce ongoing risk.

| # | Action | Effort | Output |
| --- | --- | --- | --- |
| P2-1 | Identify and document Pinbox24 file storage (S3 microservice) | 2h | Add section to bms-1 workbook + new docs/pinbox24-file-storage.md |
| P2-2 | Create automated MongoDB backup script (backup-bms3.sh) | 4h | Script + cron entry on bms-3 + Wasabi upload + Discord alert on failure |
| P2-3 | Register bms-1 and bms-3 servers + all containers in dev_r_services | 3h | Update docs/elements.md + Supabase dev_r_services table |
| P2-4 | Write docs/pinbox24-mongodb-operations.md | 4h | rs0 status, failover procedure, restore from dump, credentials reference |
| P2-5 | Investigate and document native services on bms-1 (PM2, Redis, PostgreSQL) | 2h | Add section to bms-1 workbook; determine if PostgreSQL data is critical |
| P2-6 | Identify GitLab instance (host/org) for Pinbox24 source code | 1h | Add to infrastructure-overview.md §5 and elements.md |
| P2-7 | Connect bms-3 to Prometheus (install node_exporter) | 1h | apt install prometheus-node-exporter + add to prometheus.yml + set disk/RAM alerts |
| P2-8 | Update docs/elements.md to reflect 2026-06-14 inventory | 2h | Add bms-1, bms-2, bms-3, bms-4 to Servers table; add Pinbox24 containers |
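
For P2-2, one possible shape of backup-bms3.sh is sketched below. It reuses the mongodump pipeline from P1-2 and posts a message to a Discord webhook when the pipeline fails. This is an assumption-laden sketch rather than the actual script: MONGO_PW, DISCORD_WEBHOOK_URL, DUMP_PREFIX, DRY_RUN, the --run argument, and all function names are hypothetical, and how the password reaches the script depends on the outcome of P1-3.

```shell
#!/usr/bin/env bash
# Sketch of backup-bms3.sh (P2-2).
# Hypothetical inputs: MONGO_PW and DISCORD_WEBHOOK_URL environment variables.
# DRY_RUN=1 (the default here) prints the commands instead of running them.

# Without pipefail a failed mongodump would be masked by a successful upload.
if (set -o pipefail) 2>/dev/null; then set -o pipefail; fi

DUMP_PREFIX="${DUMP_PREFIX:-s3://p24-infra/bms-3/mongodb}"  # path from P1-2
DRY_RUN="${DRY_RUN:-1}"

# Destination URL for the dump taken on a given day.
dump_url() {
  printf '%s/%s.archive.gz\n' "$DUMP_PREFIX" "$1"
}

# Build the JSON body for a Discord webhook message, escaping \ and ".
discord_payload() {
  escaped="$(printf '%s' "$1" | sed 's/\\/\\\\/g; s/"/\\"/g')"
  printf '{"content":"%s"}\n' "$escaped"
}

# Send (or, in dry-run mode, print) a failure notification.
notify_failure() {
  payload="$(discord_payload "$1")"
  if [ "$DRY_RUN" = "1" ]; then
    printf 'would POST to Discord: %s\n' "$payload"
    return 0
  fi
  curl -fsS -H 'Content-Type: application/json' -d "$payload" "$DISCORD_WEBHOOK_URL"
}

# Dump rs0 and stream the archive to Wasabi; alert if any stage fails.
run_backup() {
  day="${1:-$(date +%F)}"
  url="$(dump_url "$day")"
  if [ "$DRY_RUN" = "1" ]; then
    printf '%s\n' "mongodump --authenticationDatabase admin -u admin -p \"\$MONGO_PW\" --gzip --archive | aws s3 cp - $url"
    return 0
  fi
  if ! mongodump --authenticationDatabase admin -u admin -p "$MONGO_PW" \
        --gzip --archive | aws s3 cp - "$url"; then
    notify_failure "bms-3 MongoDB backup failed on $day"
    return 1
  fi
}

# Run only when invoked as: backup-bms3.sh --run
if [ "${1:-}" = "--run" ]; then run_backup; fi
```

A cron entry would call the script with --run and DRY_RUN=0 once the dry-run output has been reviewed. The schedule and the install path are left open here because the source does not specify them.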

20. P3 Actions (This Month)

These complete the documentation and improve the operational maturity of the Pinbox24 stack.

| # | Action | Effort | Output |
| --- | --- | --- | --- |
| P3-1 | Write full docs/pinbox24-production-containers.md | 1d | All 24 bms-1 containers: run commands, volumes, env var keys, network modes |
| P3-2 | Write docs/pinbox24-deployment-pipeline.md | 0.5d | GitLab → ECR → git-deploy → nginx-proxy documented end-to-end |
| P3-3 | Write docs/pinbox24-v3x-legacy-workbook.md | 0.5d | v3.x stack: image status, private registry, active clients, sunset plan |
| P3-4 | Write docs/pinbox24-metatrader5.md | 0.5d | mt5 purpose, data, broker/feed connections, recovery |
| P3-5 | Plan bms-1 OS upgrade from Ubuntu 20.04 → 24.04 | 1d | Migration plan + risk assessment + rollback strategy |
| P3-6 | Investigate and resolve disk pressure on bms-1 | 0.5d | Clean up ~100 GB of old backups in /root after verifying MongoDB dumps are on Wasabi |
| P3-7 | Perform first Pinbox24 MongoDB restore drill | 1d | Restore Feb 2026 dump into isolated container; verify data integrity |
| P3-8 | Consolidate container registry strategy | 0.5d | Migrate remaining containers from private-registry to AWS ECR; retire private-registry |
| P3-9 | Document Portainer v1 upgrade path on bms-1 and bms-3 | 0.5d | Upgrade plan to Portainer CE 2.x+ or Agent |
| P3-10 | Set up Prometheus alerts for bms-1 disk (alert at 90%) | 0.5h | Add DiskAlmostFull alert rule for bms-1; bms-1 is at 85% now |
| P3-11 | Implement firewall policy on bms-1 (ufw currently inactive) | 0.5d | Enable ufw, allow 22/80/443/49154/9100, deny all else |
| P3-12 | Evaluate and resolve OOM risk on bms-3 | 1d | MongoDB 21.7 GB RAM + staging containers sharing 32 GB total; plan dedicated MongoDB node or restrict container RAM |
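
For P3-11, the order of operations matters: the allow rules (SSH on 22 in particular) need to be in place before ufw is enabled, otherwise the session applying them can be cut off. The sketch below generates the commands in that order so they can be reviewed before anything runs. The port list comes from the table above; restricting the rules to tcp, the DRY_RUN switch, and the function names are assumptions. Ports published by Docker are commonly reachable regardless of ufw rules because Docker manages its own iptables chains, so the effect on the container ports should be verified separately.

```shell
#!/usr/bin/env bash
# Sketch for P3-11. Port list is from the action table; tcp is an assumption.
PORTS="${PORTS:-22 80 443 49154 9100}"
DRY_RUN="${DRY_RUN:-1}"

# Print the ufw commands in the order they should run: default policies,
# then the allow rules (including SSH on 22), and only then "enable".
ufw_plan() {
  printf '%s\n' "ufw default deny incoming" "ufw default allow outgoing"
  for port in $PORTS; do
    printf 'ufw allow %s/tcp\n' "$port"
  done
  printf '%s\n' "ufw --force enable"
}

# Show the plan (dry run) or execute it, stopping at the first failure.
apply_plan() {
  if [ "$DRY_RUN" = "1" ]; then
    ufw_plan
    return 0
  fi
  sh -ec "$(ufw_plan)"
}
```

Running apply_plan with the default DRY_RUN=1 only prints the eight commands; setting DRY_RUN=0 as root applies them.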

Appendix: Key Credentials to Locate

The following credentials are required for Pinbox24 DR, but their location is NOT documented in p24-infra. Finding and documenting them (key names and storage location only; never values in this file) is a prerequisite to any meaningful restore.

| Credential | Why Needed | Where to Look |
| --- | --- | --- |
| MongoDB admin password (rs0) | failover, restore, addArb | bms-3 /root/.env? AWS Secrets Manager? Pinbox24 team? |
| MongoDB keyFile content | all rs0 members must have the same keyFile to join | /etc/mongodb-keyfile on bms-3 (already transferred to bms-4) |
| AWS ECR auth (bms-1 + bms-3) | pulling new container images | ~/.aws/credentials or IAM role on bms-1/bms-3; not in p24-infra secrets |
| Pinbox24 container env vars | all containers need their env to start | docker inspect (capture before containers stop) |
| git-deploy webhook secret | deployment automation | Container env on git-deploy-v42-prod; not in p24-infra |
| private-registry.dev.pinbox24.com auth | pulling v3.x images | Unknown; not in p24-infra |
| Pinbox24 S3/file storage bucket name + credentials | file storage for user uploads | s3-v42-prod container env; not in p24-infra |
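
Since only key names may be recorded, one way to inventory container env vars without copying any values is sketched below. It relies on docker inspect exposing each container's environment as KEY=VALUE strings under .Config.Env; the function names are hypothetical and the output format is only a suggestion.

```shell
#!/usr/bin/env bash
# Sketch: list env var KEY names per container without recording any values.

# Read KEY=VALUE lines on stdin; print the unique key names, sorted.
env_keys() {
  awk -F= 'NF && $1 != "" { print $1 }' | LC_ALL=C sort -u
}

# Key names for one container (requires docker on the host).
container_env_keys() {
  docker inspect --format '{{range .Config.Env}}{{println .}}{{end}}' "$1" | env_keys
}

# Key names for every container, grouped under the container name.
all_container_env_keys() {
  for name in $(docker ps -a --format '{{.Names}}'); do
    printf '[%s]\n' "$name"
    container_env_keys "$name"
  done
}
```

The output of all_container_env_keys contains no secret values, so it can be committed as input for a secrets inventory. The values themselves still need to be captured to a secure store before the containers stop.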

Document generated from available p24-infra repository documentation. Information gaps identified above reflect what is NOT documented and should be treated as action items, not assumptions.