Service Distribution Evaluation — p24-infra

Date: 2026-06-14
Scope: All servers, SaaS, and services in the p24-infra ecosystem
Status: Current state as of branch fix/n8n-2.26.3-compose-cleanup; n8n migration to bms-4 is pending.


1. Executive Summary

Key findings

  1. vps-h1 is critically overloaded for its size (2 vCPU / 7.8 GB RAM). With n8n capped at 1.5 CPU and Traefik, PostgreSQL, WAHA, cadvisor, and promtail all running alongside it, the host reaches ~90–100% of its CPU capacity during bursts and likely consumes 5–6 GB of RAM. A single traffic spike or scheduled workflow burst can cause container restarts or OOM events. This is the highest-priority risk in the entire infrastructure.

  2. bms-4 (54.36.123.110) has enormous headroom. 8 vCPU, 32 GB RAM, 1.8 TB disk — with only the MongoDB arbiter (~75 MB RAM) and node_exporter running. It is the correct destination for n8n and new services.

  3. pdf-service currently serves only internal tooling (Claude agents, audit-engine). It runs on vps-i1 and is not accessible to Pinbox24 production on bms-1. For Pinbox24 to use it, it must be deployed independently — bms-4 is the recommended location.

  4. Convertio.ai replacement (PDF-to-JPG) should co-locate with the PDF generation service on bms-4. Because Gotenberg does not rasterise PDFs to images, a thin microservice built on pdf2image (poppler) or ImageMagick, deployed next to the Gotenberg stack, is the lowest-friction option.

  5. bms-2 (AI-Dev-OV1) and bms-3 are both underutilised for Docker workloads but bms-3 is constrained by MongoDB RAM (21.7 GB). Neither should receive additional critical services at this time.

  6. WAHA is a critical service (WhatsApp incident management feed for the operations team). After n8n moves to bms-4, WAHA should also move to bms-4 — vps-h1 would then be idle and could be terminated to reduce cost.

Top recommendations (priority order)

| Priority | Action | Why |
| --- | --- | --- |
| P1 | Migrate n8n from vps-h1 → bms-4 | vps-h1 is overloaded; bms-4 compose file is ready |
| P2 | Migrate WAHA from vps-h1 → bms-4 | After n8n moves, WAHA is the only remaining critical service |
| P3 | Deploy PDF generation + PDF-to-JPG on bms-4 | Pinbox24 production needs it; bms-4 has capacity |
| P4 | Install AI-Dev-BMS4 Claude agent on bms-4 | Infrastructure management agent for the new server |
| P5 | Decommission vps-h1 | Once n8n and WAHA are off, no remaining workloads justify ~10€/month |
| P6 | Add node_exporter + Prometheus scrape for bms-2 and bms-3 | Monitoring gap: neither is currently scraped |

2. Complete Service Inventory

2.1 vps-i1 — IONOS (217.154.82.162) — AlmaLinux 9.7 — 6 vCPU / 7.4 GB RAM / 239 GB

| Service | Type | Image / Source | RAM (est.) | CPU (est.) | Port(s) | Criticality |
| --- | --- | --- | --- | --- | --- | --- |
| caddy | Docker | caddy:2.11.4-alpine | 50 MB | low | 80, 443 | Critical: TLS proxy for all infra services |
| prometheus | Docker | prom/prometheus:v3.12.0 | 300 MB | low | 127.0.0.1:9090 | High: metrics collection |
| thanos-sidecar | Docker | quay.io/thanos/thanos:v0.41.0 | 100 MB | low | 10901, 10902 | High: S3 block upload |
| thanos-query | Docker | quay.io/thanos/thanos:v0.41.0 | 100 MB | low | 127.0.0.1:10904 | Medium: unified PromQL |
| alertmanager | Docker | prom/alertmanager:v0.33.0 | 50 MB | minimal | 127.0.0.1:9093 | High: email alerts on failures |
| grafana | Docker | grafana/grafana:11.6.15 | 200 MB | low | 127.0.0.1:3000 | Medium: dashboards |
| renderer | Docker | grafana/grafana-image-renderer:v5.8.9 | 300 MB | burst | 127.0.0.1:8081 | Low: PNG for n8n daily report |
| queue-exporter | Docker (custom Python) | local build | 80 MB | minimal | 127.0.0.1:9200 | Medium: Supabase queue depths |
| cost-exporter | Docker (custom Python) | local build | 80 MB | minimal | 127.0.0.1:9210 | Low: billing tracking |
| backup-exporter | Docker (custom Python) | local build | 80 MB | minimal | 127.0.0.1:9220 | Low: backup status |
| pg-stats-exporter | Docker (custom Python) | local build | 80 MB | minimal | 127.0.0.1:9201 | Medium: slow query tracking |
| vercel-exporter | Docker (custom Python) | local build | 80 MB | minimal | 127.0.0.1:9202 | Low: Vercel metrics |
| n8n-cloud-exporter | Docker (custom Python) | local build | 80 MB | minimal | 127.0.0.1:9225 | Low: n8n.io cloud metrics |
| credential-exporter | Docker (custom Python) | local build | 80 MB | minimal | 127.0.0.1:9230 | Medium: rotation age tracking |
| blackbox-exporter | Docker | prom/blackbox-exporter:v0.28.0 | 50 MB | minimal | 127.0.0.1:9115 | Medium: endpoint probing |
| gotenberg | Docker | gotenberg/gotenberg:8.34.0 | 400 MB | burst | internal | High: PDF rendering engine |
| pdf-service | Docker (custom Python) | infra-src/gotenberg/pdf-service | 100 MB | low | 127.0.0.1:8100 | High: PDF API for agents/audit |
| p24-infra-mcp | Docker (custom Python) | infra-src/p24-infra-mcp | 80 MB | low | 127.0.0.1:8101 | High: MCP server for Claude agents |
| audit-engine | Docker (custom Python) | ../audit-engine | 150 MB | burst | 127.0.0.1:8200 | Medium: AI audit pipeline |
| uptime-kuma | Docker | louislam/uptime-kuma:1.23.16 | 200 MB | low | 127.0.0.1:3001 | Medium: endpoint uptime monitoring |
| loki | Docker | grafana/loki:3.7.2 | 200 MB | low | 127.0.0.1:3100 | Medium: log aggregation |
| promtail | Docker | grafana/promtail:3.6.11 | 50 MB | minimal | | Low: log shipping from vps-i1 |
| traccar | Docker | traccar/traccar:latest | 500 MB | low | 8082, 5027/UDP | High: GPS fleet tracking |
| traccar-db | Docker | MySQL 8.0 | 400 MB | low | internal | High: Traccar data persistence |
| openclaw-gateway | Docker | custom | 200 MB | low | 18789-18790 | High: WhatsApp issue intake |
| node_exporter | systemd | system package | 20 MB | minimal | 9100 | Medium |
| GitHub Actions runner (ionos) | systemd | /opt/actions-runner | 200 MB | burst | | High: CI/CD for et-operational-platform |
| GitHub Actions runner (KDP) | systemd | /opt/actions-runner-kdp | 200 MB | burst | | Medium: CI for amazon-kdp-tango |
| claude-proxy.py | systemd (python3) | claude-proxy.py | 50 MB | minimal | 8765 | Medium: OpenClaw Claude proxy |
| claude-runner (agent) | user process | /usr/bin/claude | 300 MB | burst | | Medium: autonomous agent (nightly) |

Estimated total: ~4.5–5 GB RAM, peaks to ~6 GB with Gotenberg/rendering bursts.
Disk: ~239 GB. Prometheus data and Docker images are the main consumers; needs monitoring.


2.2 vps-h1 — Hostinger (72.60.32.61) — Ubuntu 24.04 — 2 vCPU / 7.8 GB RAM / 96 GB

| Service | Type | Image | RAM (est.) | CPU limit | Port(s) | Criticality |
| --- | --- | --- | --- | --- | --- | --- |
| traefik | Docker | traefik:v3.7.5 | 80 MB | none | 80, 443 | Critical: TLS proxy |
| n8n-postgres | Docker | postgres:16.9-alpine | 300 MB | none | internal | Critical: n8n DB |
| n8n | Docker | docker.n8n.io/n8nio/n8n:2.26.3 | 1.2 GB | 1.5 CPU | 127.0.0.1:5678 | Critical: automation engine |
| waha | Docker | devlikeapro/waha:noweb-2026.5.1 | 800 MB | none | 127.0.0.1:13000 | Critical: WhatsApp incidents |
| node-exporter | Docker | prom/node-exporter:v1.11.1 | 20 MB | none | host:9100 | Low |
| cadvisor | Docker | ghcr.io/google/cadvisor:v0.57.0 | 100 MB | none | 0.0.0.0:8080 | Low |
| promtail | Docker | grafana/promtail:3.6.11 | 50 MB | none | 9080 | Low |
| claude-runner (agent) | user process | /usr/bin/claude | 300 MB | none | | Medium: nightly automation |
| GitHub Actions runner (hstgr) | systemd | /opt/actions-runner-hstgr | 200 MB | none | | Medium: CI for et-operational-platform |

Estimated total: ~3.1 GB RAM baseline, peaks to 5–6 GB during n8n workflow bursts. WAHA alone uses ~800 MB. With n8n at 1.2 GB, these two services consume 2 GB of the 7.8 GB available. CPU is the critical constraint: 1.5 CPU cap on n8n leaves only 0.5 CPU for everything else — traefik, waha, cadvisor, node-exporter, OS.
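The budget above can be re-derived in a few lines. This is only a sketch over the estimates in this section (promtail is taken as 50 MB, matching the post-migration table in §3); none of the figures are measurements.

```python
# RAM estimates in MB, copied from the vps-h1 inventory (estimates, not measurements).
RAM_MB = {
    "traefik": 80,
    "n8n-postgres": 300,
    "n8n": 1200,
    "waha": 800,
    "node-exporter": 20,
    "cadvisor": 100,
    "promtail": 50,
    "claude-runner": 300,
    "actions-runner-hstgr": 200,
}
TOTAL_RAM_GB = 7.8
TOTAL_VCPU = 2.0
N8N_CPU_CAP = 1.5  # the only CPU limit configured on this host

baseline_gb = sum(RAM_MB.values()) / 1000
n8n_plus_waha_gb = (RAM_MB["n8n"] + RAM_MB["waha"]) / 1000
cpu_left_for_others = TOTAL_VCPU - N8N_CPU_CAP

print(f"RAM baseline: {baseline_gb:.2f} GB of {TOTAL_RAM_GB} GB")
print(f"n8n + WAHA: {n8n_plus_waha_gb:.1f} GB")
print(f"CPU left when n8n sits at its cap: {cpu_left_for_others:.1f} vCPU")
```

RAM sits comfortably below the 7.8 GB ceiling at baseline, while the CPU figure shows why the host starves whenever n8n reaches its cap.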


2.3 bms-1 — OVH Kimsufi (94.23.26.113) — Ubuntu 20.04 EOL — 8 cores / 32 GB RAM

| Service | Type | RAM (est.) | Criticality |
| --- | --- | --- | --- |
| nginx-proxy + letsencrypt | Docker | 200 MB | Critical: TLS for all Pinbox24 services |
| portainer | Docker | 200 MB | Low: management UI |
| Pinbox24 v31-v42 (4 instances) | Docker (ECR) | ~4 GB total | Critical: production app |
| mailgun container | Docker | 100 MB | High: transactional email relay |
| pdf-gen (wkhtml) | Docker | 300 MB | High: internal PDF gen for Pinbox24 |
| git-deploy | Docker | 100 MB | Medium: deployment automation |

Known critical issue: disk is 100% full. Pinbox24 production may fail on the next container update.
OS: Ubuntu 20.04 LTS (EOL April 2025). Security risk. No migration plan yet.
No Prometheus monitoring.


2.4 bms-2 — OVH Kimsufi (145.239.133.104) — Ubuntu 24.04 — 8 vCPU / 32 GB RAM / 410 GB

| Service | Type | RAM (est.) | Criticality |
| --- | --- | --- | --- |
| mongod (rs0 observer, non-voting) | systemd | 2–4 GB | High: replica set read replica |
| claude-runner (AI-Dev-OV1 agent) | user process | 300 MB per agent | Medium: max 4 parallel |

Disk: 16% used (62/410 GB). Ample headroom.
No Docker services deployed.
No monitoring (node_exporter not installed).


2.5 bms-3 — OVH Kimsufi (51.68.155.224) — Ubuntu 22.04 — 8 vCPU / 32 GB RAM / 410 GB

| Service | Type | RAM (est.) | Criticality |
| --- | --- | --- | --- |
| mongod (rs0 PRIMARY) | systemd | 21.7 GB | Critical: Pinbox24 production DB |
| Pinbox24 v31-v42 staging (4 instances) | Docker (ECR) | ~3 GB | Medium: staging |
| traccar | Docker | 500 MB | Medium: GPS tracking (staging) |
| nginx-proxy + letsencrypt | Docker | 200 MB | Medium: TLS for staging |
| portainer-pinbox24 | Docker | 200 MB | Low |
| mt5 | Docker | 500 MB | Low |

WARNING: MongoDB at 21.7 GB RAM leaves only ~10 GB for all Docker workloads. At risk of OOM if staging load spikes.
Disk: 44% used (170/410 GB). Monitor it, because staging logs can fill quickly.
No Prometheus monitoring (node_exporter not installed).


2.6 bms-4 — OVH Kimsufi (54.36.123.110) — Ubuntu 22.04 — 8 vCPU / 32 GB RAM / 1.8 TB

| Service | Type | RAM (est.) | Status |
| --- | --- | --- | --- |
| mongod (rs0 arbiter) | systemd | ~75 MB | Active, awaiting rs.addArb() by human |
| prometheus-node-exporter | systemd | 20 MB | Active |
| traefik | Docker (planned) | 80 MB | In repo, not yet deployed |
| n8n-postgres | Docker (planned) | 300 MB | In repo, not yet deployed |
| n8n | Docker (planned) | 1.2 GB | Migration pending |
| node-exporter | Docker (planned) | 20 MB | In repo |
| cadvisor | Docker (planned) | 100 MB | In repo |

Disk: 1.7 TB free (8.3 GB / 1.8 TB used). No disk pressure expected from the planned workloads.
Free RAM after planned services: ~30 GB. Massive headroom for additional workloads.


2.7 SaaS Services

| Service | Provider | Plan | Role | Criticality |
| --- | --- | --- | --- | --- |
| Supabase | Supabase | Pro | Fleet management DB, audit engine, DevOps tables | Critical |
| Vercel | Vercel | Team | et-operational-platform, p24-nextjs-v2026, portal | Critical |
| Cloudflare | Cloudflare | Free/Pro | DNS (zintegrowana.online), WAF for Workers | Critical |
| Wasabi S3 | Wasabi | Pay-as-you-go | Thanos metrics, PDFs, backup status JSON | High |
| GitHub | GitHub | Pro | Source control, CI/CD, issue tracking | Critical |
| Mailgun EU | Mailgun | Pay-as-you-go | Alertmanager email, Pinbox24 transactional | High |
| n8n.io Cloud | n8n.io | Paid | Separate automation instance (backups etc.) | Medium |
| Convertio.ai | Convertio | SaaS | PDF-to-JPG for Pinbox24 Angular | Scheduled for replacement |

3. vps-h1 Load Analysis

Current state

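| Resource | Total | n8n | WAHA | Traefik+PG | Observability | OS+misc | Headroom |
| --- | --- | --- | --- | --- | --- | --- | --- |
| RAM | 7.8 GB | ~1.2 GB | ~800 MB | ~380 MB | ~170 MB | ~400 MB | ~2.8 GB |
| CPU | 2.0 vCPU | cap 1.5 | ~0.1–0.3 | ~0.1 | ~0.05 | ~0.1 | ~0 (negative on burst) |
| Disk | 96 GB | ~5–10 GB | ~2 GB | ~1 GB | minimal | OS | ~75 GB |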

Risk assessment

Severity: HIGH — near capacity on CPU, moderate on RAM

  1. CPU is the primary constraint. n8n has a hard cap at 1.5 CPU — this leaves only 0.5 CPU for the remaining 6 containers plus the OS kernel. When n8n executes multiple workflows in parallel (GPS sync, WhatsApp routing, daily report generation), it pegs the 1.5 CPU limit while everything else starves.

  2. WAHA is a critical single point of failure. The WhatsApp integration is the primary incident reporting channel. If vps-h1 crashes or OOMs, incident reports from WhatsApp stop flowing to n8n and Supabase. WAHA does not have automatic reconnect to WhatsApp Web; phone number re-pairing requires human action (scanning QR code).

  3. n8n-postgres has no CPU cap. PostgreSQL can occasionally spike CPU during autovacuum or complex queries from n8n’s execution history pruning. Combined with n8n at its cap, this can push total CPU demand above the 2.0 vCPU the host actually has.

  4. No redundancy. All critical automation runs on a single 2-vCPU machine with no failover.

  5. 96 GB disk is limited for a machine also running the GitHub Actions runner which builds et-operational-platform.

What should move first

n8n (with n8n-postgres) is the highest priority because:

  • The compose file for bms-4 is already written and committed
  • n8n consumes 1.5 CPU — removing it frees 75% of vps-h1’s CPU
  • Migration checklist is documented in docs/servers/p4-ovh-bms-4-ns3101999-operations.md

WAHA should move second because:

  • After n8n leaves, WAHA is the only remaining critical workload on vps-h1
  • WAHA’s webhook URL currently points to n8n.vps-h1.infra.zintegrowana.online — once n8n moves to bms-4, this config line must change anyway
  • bms-4 already has Traefik configured for TLS termination
  • Keeping WAHA on vps-h1 alone maintains a second monthly VPS cost for a single container

Post-n8n-migration state on vps-h1

After n8n and n8n-postgres move to bms-4:

| Remaining service | RAM | Critical? |
| --- | --- | --- |
| traefik | 80 MB | Yes (for WAHA TLS) |
| waha | 800 MB | Critical |
| node-exporter | 20 MB | No |
| cadvisor | 100 MB | No |
| promtail | 50 MB | No |
| claude-runner (nightly agent) | 300 MB | Medium |
| GitHub Actions runner (hstgr) | 200 MB | Medium |

Conclusion: with only ~1.55 GB of estimated RAM still in use, vps-h1 would have roughly 6 GB of RAM and ~1.9 vCPU free, which is massively underutilised for ~10€/month.
Recommendation: migrate WAHA to bms-4, reassign the GitHub Actions runner to bms-4, then decommission vps-h1.


4. bms-4 Expansion Plan

Current state (post-provisioning)

  • MongoDB arbiter: 75 MB RAM
  • node_exporter: 20 MB RAM
  • Docker CE: installed, no containers running
  • Free: ~31.9 GB RAM, 7.5 vCPU, 1.7 TB disk

Phase 1 — n8n migration (immediate, PENDING)

Deploy from bms-4/docker-compose.yml:

| Service | RAM (est.) | CPU | Notes |
| --- | --- | --- | --- |
| traefik | 80 MB | low | TLS for all bms-4 services |
| n8n-postgres | 300 MB | low | PostgreSQL 16 for n8n |
| n8n | 1.2 GB | cap 1.5 | Migrated from vps-h1 |
| node-exporter (Docker) | 20 MB | minimal | Duplicate of systemd exporter; needed for cadvisor compat |
| cadvisor | 100 MB | low | Container metrics |

After Phase 1: ~30.2 GB RAM free, ~6 vCPU free.

Phase 2 — WAHA migration (after n8n verified stable)

Add WAHA to bms-4 compose:

| Service | RAM (est.) | Notes |
| --- | --- | --- |
| waha | 800 MB | Requires updating WHATSAPP_HOOK_URL → n8n.bms-4.infra.zintegrowana.online |

After Phase 2: ~29.4 GB RAM free.
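The WHATSAPP_HOOK_URL change for Phase 2 amounts to swapping the host while leaving the rest of the URL intact. A minimal sketch of that rewrite follows; the webhook path used in the example is hypothetical, and the real value should be read from the current WAHA configuration.

```python
from urllib.parse import urlsplit, urlunsplit

OLD_HOST = "n8n.vps-h1.infra.zintegrowana.online"
NEW_HOST = "n8n.bms-4.infra.zintegrowana.online"


def retarget_hook(url: str) -> str:
    """Swap the n8n host in a webhook URL, keeping scheme, path and query."""
    parts = urlsplit(url)
    if parts.hostname != OLD_HOST:
        raise ValueError(f"unexpected webhook host: {parts.hostname!r}")
    # Preserve an explicit port if one was configured.
    netloc = NEW_HOST if parts.port is None else f"{NEW_HOST}:{parts.port}"
    return urlunsplit((parts.scheme, netloc, parts.path, parts.query, parts.fragment))


# Hypothetical webhook path, for illustration only.
print(retarget_hook(f"https://{OLD_HOST}/webhook/example"))
```

Refusing any URL whose host is not the old n8n host guards against rewriting a value that was already migrated or that points somewhere unexpected.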

Phase 3 — PDF services (new deployment, see §5)

| Service | RAM (est.) | Notes |
| --- | --- | --- |
| gotenberg | 400 MB | Chromium PDF renderer |
| pdf-service-p24 | 100 MB | New instance, production-facing |
| pdf-to-jpg | 200 MB | New service, replaces Convertio.ai |

After Phase 3: ~28.7 GB RAM free.

Phase 4 — AI-Dev-BMS4 agent (see §6)

| Service | RAM (est.) | Notes |
| --- | --- | --- |
| claude-runner (user process) | 300 MB per agent | Max 4 parallel agents |
| GitHub Actions runner | 200 MB | Relocate from vps-h1 |

After Phase 4: ~27.3 GB RAM free (4 agents running simultaneously at 300 MB each, plus the 200 MB runner).

bms-4 Resource Forecast (fully loaded)

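| Category | RAM | CPU | Disk |
| --- | --- | --- | --- |
| MongoDB arbiter | 75 MB | negligible | minimal |
| n8n stack | 1.6 GB | 1.5 cap + overhead | 10 GB |
| WAHA | 800 MB | 0.2 | 2 GB |
| PDF services | 700 MB | burst | 5 GB |
| Observability | 240 MB | minimal | 1 GB |
| AI agents (4x) | 1.2 GB | burst | |
| OS + kernel | ~500 MB | 0.5 | 8 GB |
| TOTAL | ~5.1 GB | ~5 vCPU peak | ~26 GB |
| Remaining headroom | ~26.9 GB | ~3 vCPU idle | ~1.77 TB |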

bms-4 can comfortably absorb all planned workloads while retaining over 80% RAM headroom.
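The per-phase "RAM free" figures are a running subtraction from the post-provisioning state. The sketch below reproduces that arithmetic from the phase estimates (all values are the document's estimates, and Phase 4 assumes four agents plus the relocated runner), so the numbers can be re-checked if a phase changes.

```python
FREE_AT_START_GB = 31.9  # 32 GB minus the arbiter (75 MB) and node_exporter (20 MB)

# RAM added per phase in MB, taken from the phase estimates above.
PHASES = [
    ("Phase 1: n8n stack", [80, 300, 1200, 20, 100]),
    ("Phase 2: WAHA", [800]),
    ("Phase 3: PDF services", [400, 100, 200]),
    ("Phase 4: 4 agents + runner", [4 * 300, 200]),
]


def headroom_after_each_phase(start_gb, phases):
    """Return [(phase name, GB free after that phase)], rounded to 0.1 GB."""
    free = start_gb
    result = []
    for name, services_mb in phases:
        free -= sum(services_mb) / 1000
        result.append((name, round(free, 1)))
    return result


for name, free_gb in headroom_after_each_phase(FREE_AT_START_GB, PHASES):
    print(f"{name}: ~{free_gb} GB free")
```

The forecast table additionally reserves ~500 MB for the OS and kernel, which is why its remaining-headroom row is slightly lower than the last phase figure.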


5. PDF Services Production Plan

Current state

The existing pdf-service (on vps-i1, port 8100) serves only:

  • Claude Code agents via p24-infra-mcp MCP server
  • audit-engine internal workflows

It is not accessible to Pinbox24 production on bms-1, and the existing instance should remain dedicated to internal tooling.

Requirements for production PDF services

  1. PDF generation — replace pdf-gen (wkhtml container on bms-1) with a modern Gotenberg-based service callable by Pinbox24 Angular
  2. PDF-to-JPG conversion — replace Convertio.ai (external SaaS); must accept PDF input, return JPG/PNG output; used for document thumbnails in Pinbox24

Option analysis

| Option | PDF Generation | PDF-to-JPG | Network path from bms-1 | Disk pressure on bms-1 |
| --- | --- | --- | --- | --- |
| A: Deploy on bms-4 | Gotenberg + pdf-service | pdf2image (poppler) or ImageMagick microservice | Cross-server HTTP (LAN-speed OVH internal network) | None |
| B: Deploy on bms-1 | Same | Same | localhost | CRITICAL: disk 100% full |
| C: Deploy on bms-3 | Same | Same | Cross-server HTTP | None |

Option B is eliminated: bms-1 disk is 100% full and OS is EOL. Adding containers there would immediately fail.

Option C is not recommended: bms-3 already uses ~25 GB of its 32 GB RAM. MongoDB alone takes 21.7 GB, which leaves only ~10 GB for all staging containers combined. Adding PDF conversion load to an already memory-constrained server is a stability risk for the MongoDB primary.

Recommendation: Option A — Deploy on bms-4.

Justification:

  • bms-4 keeps roughly 27 GB free RAM and 1.7 TB free disk after all planned services (see the resource forecast in §4)
  • Traefik is already being deployed on bms-4 — PDF services get TLS endpoints for free
  • OVH bare metal servers are on the same internal network — cross-server HTTP between bms-1 → bms-4 is low-latency
  • bms-4 will also run AI-Dev-BMS4 agent, which can monitor and restart PDF services autonomously
  • Separating PDF infrastructure from vps-i1 (monitoring stack) removes a cross-concern dependency
  • A dedicated pdf.bms-4.infra.zintegrowana.online endpoint can be secured with API key auth (same pattern as existing pdf-service)

Deployment plan for bms-4 PDF services

Service 1: pdf-service-p24 (PDF generation for Pinbox24)

Reuse existing infra-src/gotenberg/pdf-service/ codebase with a new Gotenberg instance.

# Add to bms-4/docker-compose.yml
 
  gotenberg-p24:
    image: gotenberg/gotenberg:8.34.0
    restart: unless-stopped
    command:
      - gotenberg
      - --chromium-disable-javascript=false   # Pinbox24 Angular uses JS-rendered pages
      - --api-timeout=60s
 
  pdf-service-p24:
    build: ../infra-src/gotenberg/pdf-service
    restart: unless-stopped
    environment:
      - PDF_SERVICE_API_KEY=${PDF_SERVICE_API_KEY}
      - GOTENBERG_URL=http://gotenberg-p24:3000
      - SUPABASE_URL=${SUPABASE_URL}
      - SUPABASE_SERVICE_KEY=${SUPABASE_SERVICE_KEY}
      - WASABI_ACCESS_KEY=${WASABI_ACCESS_KEY}
      - WASABI_SECRET_KEY=${WASABI_SECRET_KEY}
      - WASABI_BUCKET=p24-infra
      - WASABI_ENDPOINT=https://s3.eu-central-2.wasabisys.com
      - WASABI_REGION=eu-central-2
    labels:
      - traefik.enable=true
      - traefik.http.routers.pdf-p24.rule=Host(`pdf.bms-4.infra.zintegrowana.online`)
      - traefik.http.routers.pdf-p24.tls=true
      - traefik.http.routers.pdf-p24.entrypoints=websecure
      - traefik.http.routers.pdf-p24.tls.certresolver=mytlschallenge

Service 2: pdf-to-jpg (Convertio.ai replacement)

Gotenberg supports POST /forms/chromium/convert/url and POST /forms/libreoffice/convert but not direct PDF-to-image rasterisation. The correct approach is a thin Python microservice using pdf2image (wraps pdftoppm) or ImageMagick via Ghostscript.

Recommended implementation: pdf-to-jpg microservice using pdf2image Python library.

  • Input: PDF file (multipart upload)
  • Output: JPG bytes (single page or ZIP of all pages)
  • Auth: same PDF_SERVICE_API_KEY pattern
  • Base image: python:3.12-slim with poppler-utils installed
  • RAM: ~150–200 MB per conversion, ~50 MB idle

# Add to bms-4/docker-compose.yml

  pdf-to-jpg:
    build: ../infra-src/pdf-to-jpg   # new microservice to be created
    restart: unless-stopped
    environment:
      - PDF_SERVICE_API_KEY=${PDF_SERVICE_API_KEY}
      - DPI=${PDF_TO_JPG_DPI:-150}
      - MAX_FILE_MB=${PDF_TO_JPG_MAX_MB:-20}
    labels:
      - traefik.enable=true
      - traefik.http.routers.pdf-to-jpg.rule=Host(`pdf.bms-4.infra.zintegrowana.online`) && PathPrefix(`/v1/pdf-to-jpg`)
      - traefik.http.routers.pdf-to-jpg.tls=true
      - traefik.http.routers.pdf-to-jpg.entrypoints=websecure
      - traefik.http.routers.pdf-to-jpg.tls.certresolver=mytlschallenge

Pinbox24 migration path:

  1. Deploy pdf-service-p24 + pdf-to-jpg on bms-4
  2. Test endpoints from bms-1: curl -X POST https://pdf.bms-4.infra.zintegrowana.online/v1/md-render ...
  3. Update Pinbox24 Angular app config to use new endpoints (remove Convertio.ai API key)
  4. Remove Convertio.ai subscription once verified
  5. Remove pdf-gen + wkhtml containers from bms-1 (helps reclaim the critical disk space)
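Step 2 can also be scripted once the pdf-to-jpg endpoint exists. The following is a sketch using only the standard library; the form field name `file` and the `X-API-Key` header are assumptions (the document only says the service follows the PDF_SERVICE_API_KEY pattern), so both should be aligned with what the service actually expects.

```python
import urllib.request
import uuid

ENDPOINT = "https://pdf.bms-4.infra.zintegrowana.online/v1/pdf-to-jpg"


def build_multipart(field: str, filename: str, content: bytes):
    """Encode one file as multipart/form-data; returns (body, content_type)."""
    boundary = uuid.uuid4().hex
    head = (
        f"--{boundary}\r\n"
        f'Content-Disposition: form-data; name="{field}"; filename="{filename}"\r\n'
        "Content-Type: application/pdf\r\n\r\n"
    ).encode()
    tail = f"\r\n--{boundary}--\r\n".encode()
    return head + content + tail, f"multipart/form-data; boundary={boundary}"


def looks_like_jpeg(data: bytes) -> bool:
    """JPEG streams begin with the SOI marker FF D8, followed by FF."""
    return data[:3] == b"\xff\xd8\xff"


def convert(pdf_bytes: bytes, api_key: str) -> bytes:
    """POST a PDF to the endpoint and return the raw response body."""
    body, content_type = build_multipart("file", "sample.pdf", pdf_bytes)
    request = urllib.request.Request(
        ENDPOINT,
        data=body,
        method="POST",
        # Header name is an assumption; match it to the pdf-service auth scheme.
        headers={"Content-Type": content_type, "X-API-Key": api_key},
    )
    with urllib.request.urlopen(request, timeout=60) as response:
        return response.read()
```

A smoke test from bms-1 would call `convert()` with a small known PDF and check `looks_like_jpeg()` on the result for single-page output.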

Implementation note on pdf-to-jpg microservice: The infra-src/pdf-to-jpg/ directory needs to be created with:

  • app.py (FastAPI, similar pattern to pdf-service)
  • Dockerfile (python:3.12-slim, installs poppler-utils via apt, pdf2image via pip)
  • tests/ directory
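As a starting point for app.py, the conversion core might look like the sketch below. It calls poppler's pdftoppm directly rather than going through pdf2image, purely to keep the example free of third-party imports; the FastAPI routing and API-key check are left out, and all function names are hypothetical. The defaults mirror the DPI and MAX_FILE_MB settings in the compose fragment.

```python
import io
import subprocess
import tempfile
import zipfile
from pathlib import Path


def pdftoppm_command(pdf_path: str, out_prefix: str, dpi: int = 150) -> list[str]:
    """Argument list for poppler's pdftoppm: JPEG output at the given DPI."""
    return ["pdftoppm", "-jpeg", "-r", str(dpi), pdf_path, out_prefix]


def check_size(n_bytes: int, max_mb: int = 20) -> None:
    """Reject uploads above the configured limit before doing any work."""
    if n_bytes > max_mb * 1024 * 1024:
        raise ValueError(f"PDF exceeds {max_mb} MB limit")


def zip_pages(pages: list[bytes]) -> bytes:
    """Pack page images into an in-memory ZIP as page-1.jpg, page-2.jpg, ..."""
    buffer = io.BytesIO()
    with zipfile.ZipFile(buffer, "w", zipfile.ZIP_STORED) as archive:
        for number, data in enumerate(pages, start=1):
            archive.writestr(f"page-{number}.jpg", data)
    return buffer.getvalue()


def pdf_to_jpgs(pdf_bytes: bytes, dpi: int = 150, max_mb: int = 20) -> list[bytes]:
    """Rasterise every page to JPEG; requires poppler-utils on PATH."""
    check_size(len(pdf_bytes), max_mb)
    with tempfile.TemporaryDirectory() as tmp:
        source = Path(tmp) / "in.pdf"
        source.write_bytes(pdf_bytes)
        subprocess.run(
            pdftoppm_command(str(source), str(Path(tmp) / "page"), dpi),
            check=True, capture_output=True, timeout=120,
        )
        # Output files are named page-<n>.jpg; sort by the numeric suffix.
        outputs = sorted(
            Path(tmp).glob("page-*.jpg"),
            key=lambda p: int(p.stem.rsplit("-", 1)[1]),
        )
        return [p.read_bytes() for p in outputs]
```

An endpoint handler would return the single image when `pdf_to_jpgs()` yields one page and `zip_pages()` of the list otherwise, matching the output contract described above.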

6. AI-Dev-BMS4 Agent Setup Plan

Overview

bms-4 should have a Claude Code autonomous agent (AI-Dev-BMS4) for:

  • Local Docker operations on bms-4 itself
  • Managing n8n, WAHA, PDF services
  • Running scheduled tasks and issue implementation

User setup

Follow the same pattern as AI-Dev-OV1 on bms-2 and claude-admin on bms-3:

# Run as root on bms-4 (54.36.123.110)
 
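# 1. Create claude-runner user
useradd -m -s /bin/bash claude-runner
mkdir -p /home/claude-runner/workspace
# mkdir runs as root, so hand the directory over to the agent user
chown claude-runner:claude-runner /home/claude-runner/workspace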
 
# 2. Create claude-admin user for SSH access from GitHub Actions / remote ops
useradd -m -s /bin/bash claude-admin
mkdir -p /home/claude-admin/.ssh
 
# 3. Install VPS_SSH_PRIVATE_KEY public part for claude-admin
echo "<VPS_SSH_PRIVATE_KEY public part>" > /home/claude-admin/.ssh/authorized_keys
chmod 700 /home/claude-admin/.ssh && chmod 600 /home/claude-admin/.ssh/authorized_keys
chown -R claude-admin:claude-admin /home/claude-admin/.ssh
 
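# 4. Grant scoped sudo to claude-admin
echo "claude-admin ALL=(ALL) NOPASSWD: /usr/bin/docker, /bin/systemctl, /bin/mkdir, /bin/chown, /bin/cp, /usr/bin/tee" \
  > /etc/sudoers.d/claude-admin
# 0440 is the conventional mode for sudoers.d files; validate syntax before logging out
chmod 440 /etc/sudoers.d/claude-admin
visudo -cf /etc/sudoers.d/claude-admin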
 
# 5. Install Claude Code CLI
curl -fsSL https://raw.githubusercontent.com/anthropics/claude-code/refs/heads/main/install.sh | bash
# or: copy from bms-2 if network is slow
 
# 6. Copy OAuth credentials from local workstation (the target directory must exist before the scp)
# On bms-4 first:       mkdir -p /home/claude-runner/.claude
# From the workstation: scp C:\Users\konar\.claude\.credentials.json root@54.36.123.110:/home/claude-runner/.claude/
# On bms-4 afterwards:  chown -R claude-runner:claude-runner /home/claude-runner/.claude
 
# 7. Clone infra repo
git clone https://github.com/radieu/p24-infra /opt/p24-infra
chown -R claude-runner:claude-runner /opt/p24-infra
 
# 8. Set up .claude-env for secrets
# Copy from vps-h1 or create fresh — contains GITHUB_TOKEN, PDF_SERVICE_API_KEY, etc.

GitHub setup

  • Label: AI-Dev-BMS4
  • GitHub user: AI-Dev-BMS4 (to be created using ai-dev-bms4@zintegrowana.online)
  • Cloudflare email route: Add routing rule for ai-dev-bms4@zintegrowana.online → radieu@gmail.com
  • Repository access: Add as collaborator to radieu/p24-infra (write) and radieu/et-operational-platform (write)

Resource limits

| Resource | Limit | Rationale |
| --- | --- | --- |
| Parallel agents | max 4 | bms-4 has 8 vCPU, matches AI-Dev-OV1 on bms-2 |
| RAM per agent | ~300 MB (unbounded) | 32 GB RAM, so no pressure |
| Disk for workspace | 50 GB | /opt/ on 1.8 TB RAID1 |

Capabilities

  • Full docker access (via claude-admin scoped sudo)
  • systemctl for mongod management
  • SSH to other servers using VPS_SSH_PRIVATE_KEY (via GitHub Actions)
  • Access to p24-infra-mcp MCP server for PDF operations

7. Service Distribution Matrix

Security risk

| Service | SPoF Risk | Security Risk | Action |
| --- | --- | --- | --- |
| WAHA (vps-h1) | HIGH: only 1 node, no failover | Medium: exposed WhatsApp session | Migrate to bms-4; consider session backup |
| n8n (vps-h1) | HIGH: only 1 node | High: holds all automation secrets | Migrate to bms-4 (larger, more stable) |
| Supabase | Low: managed HA | Low: managed service | None |
| Prometheus (vps-i1) | Medium: single node | Low: internal only | Thanos provides remote backup |
| Traccar (vps-i1) | Medium: single node | Low: internal only | Low priority |
| mongod PRIMARY (bms-3) | HIGH: if bms-3 fails, writes stop | Medium: RS election needed | bms-4 arbiter provides election quorum |
| pdf-service (vps-i1) | Medium: mixed with monitoring stack | Low: API key protected | Deploy dedicated instance on bms-4 |
| Convertio.ai | Medium: external SaaS dependency | HIGH: sends Pinbox24 documents to 3rd party | Replace with self-hosted pdf-to-jpg ASAP |
| bms-1 disk | CRITICAL: 100% full | High: EOL OS | Emergency: prune Docker images, plan migration |
| bms-3 RAM | High: MongoDB at 21.7 GB | Low | Monitor for OOM |
| openclaw gateway (vps-i1) | Medium: no failover | Low: HMAC verified | Acceptable |

Single Point of Failure analysis

| Component | SPoF? | Mitigation |
| --- | --- | --- |
| n8n workflows | Yes: single instance | n8n volume backed by PostgreSQL; data survives container restart |
| WAHA WhatsApp session | Yes | Session stored in Docker volume (waha_sessions); requires QR re-pairing if session corrupted |
| MongoDB PRIMARY (bms-3) | Partial: arbiter on bms-4 allows election | bms-2 observer can be promoted; bms-4 arbiter provides quorum |
| Vercel (et-operational-platform) | No: Vercel HA | None needed |
| Supabase | No: managed HA | None needed |
| PDF generation (vps-i1) | Yes: Gotenberg is single instance | Acceptable for internal tooling; production instance on bms-4 adds redundancy |
| Prometheus metrics | Partial: 15d local + Wasabi via Thanos | Thanos provides durable history |
| GitHub Actions runner | Partial: only 1 ionos runner | hstgr runner on vps-h1 is backup |

Desired final state (after all migrations)

+─────────────────────────────────────────────────────────────────────────────+
|  COMPUTE INFRASTRUCTURE                                                     |
+─────────────────────────────────────────────────────────────────────────────+

vps-i1 (IONOS) — 6 vCPU / 7.4 GB — MONITORING HUB
  ├── Caddy (TLS)
  ├── Prometheus + Thanos + Alertmanager
  ├── Grafana + Image Renderer
  ├── Loki + Promtail
  ├── Uptime Kuma
  ├── All custom Python exporters (7x)
  ├── Blackbox exporter
  ├── pdf-service + p24-infra-mcp (internal tooling only)
  ├── audit-engine
  ├── Traccar + MySQL (GPS)
  ├── OpenClaw WhatsApp gateway
  ├── GitHub Actions runner (ionos) — CI/CD
  └── claude-runner autonomous agent (nightly)

bms-4 (OVH) — 8 vCPU / 32 GB / 1.8 TB — AUTOMATION + PDF HUB
  ├── MongoDB arbiter (rs0 quorum)
  ├── Traefik (TLS)
  ├── n8n + n8n-postgres (migrated from vps-h1)  [Phase 1]
  ├── WAHA WhatsApp gateway (migrated from vps-h1)  [Phase 2]
  ├── gotenberg-p24 + pdf-service-p24 (Pinbox24 PDF gen)  [Phase 3]
  ├── pdf-to-jpg microservice (Convertio.ai replacement)  [Phase 3]
  ├── node-exporter + cadvisor
  ├── AI-Dev-BMS4 Claude agent (max 4 parallel)  [Phase 4]
  └── GitHub Actions runner (relocate from vps-h1)  [Phase 4]

bms-2 (OVH) — 8 vCPU / 32 GB / 410 GB — CLAUDE DEV ENV
  ├── MongoDB observer (rs0 non-voting read replica)
  └── AI-Dev-OV1 Claude agent (max 4 parallel)

bms-3 (OVH) — 8 vCPU / 32 GB / 410 GB — MONGODB PRIMARY + STAGING
  ├── MongoDB 7.0 PRIMARY (rs0)
  ├── Pinbox24 staging (v31/v32/v41/v42)
  ├── Traccar (GPS staging)
  ├── nginx-proxy + Let's Encrypt
  └── mt5

bms-1 (OVH) — 8 cores / 32 GB — PINBOX24 PRODUCTION  [EOL — plan migration]
  ├── nginx-proxy + Let's Encrypt
  ├── Pinbox24 v31/v32/v41/v42 (production)
  ├── mailgun relay
  └── git-deploy
  NOTE: pdf-gen + wkhtml removable once bms-4 PDF services are live

vps-h1 (Hostinger) — DECOMMISSION after Phase 2
  └── (empty — all workloads migrated to bms-4)

+─────────────────────────────────────────────────────────────────────────────+
|  SAAS                                                                       |
+─────────────────────────────────────────────────────────────────────────────+

Supabase — fleet management DB, audit control plane, DevOps tables
Vercel — et-operational-platform (prod + staging), portal, et-lager
Cloudflare — DNS zintegrowana.online, WAF
Wasabi S3 — Thanos blocks, PDFs, backup status
GitHub — source control, CI/CD
Mailgun EU — alerts + transactional email
n8n.io Cloud — secondary automation instance

Migration sequence

Step 1 (IMMEDIATE): rs.addArb("54.36.123.110:27017") + rs.remove dead arbiter
         └── HUMAN ACTION — requires MongoDB admin password from bms-3
Step 2 (WEEK 1):    Migrate n8n from vps-h1 → bms-4 (compose file ready)
Step 3 (WEEK 1):    Update WAHA webhook URL → n8n.bms-4.infra.zintegrowana.online
Step 4 (WEEK 2):    Migrate WAHA from vps-h1 → bms-4
Step 5 (WEEK 2):    Decommission vps-h1 (cancel subscription) once the hstgr GitHub Actions runner has been reassigned
Step 6 (WEEK 3):    Create infra-src/pdf-to-jpg microservice
Step 7 (WEEK 3):    Deploy pdf-service-p24 + pdf-to-jpg on bms-4
Step 8 (WEEK 3):    Update Pinbox24 Angular to use new PDF endpoints
Step 9 (WEEK 4):    Remove Convertio.ai subscription
Step 10 (WEEK 4):   Install AI-Dev-BMS4 agent + GitHub user
Step 11 (ONGOING):  Monitor bms-1 disk (CRITICAL), plan OS migration Ubuntu 20.04 → 24.04
Step 12 (ONGOING):  Install node_exporter on bms-2 and bms-3 for Prometheus coverage
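For Step 12, once node_exporter is installed, the new hosts still have to be registered with Prometheus on vps-i1. One option, sketched here under the assumption that file-based service discovery is acceptable, is to generate a file_sd target list; the `server` label name is hypothetical, and 9100 is node_exporter's default port.

```python
import json

NODE_EXPORTER_PORT = 9100  # node_exporter default listen port

# Hosts currently missing from Prometheus, with addresses from the inventory.
HOSTS = {
    "bms-2": "145.239.133.104",
    "bms-3": "51.68.155.224",
}


def file_sd_targets(hosts: dict[str, str], port: int = NODE_EXPORTER_PORT) -> list[dict]:
    """One target group per host, in the Prometheus file_sd JSON shape."""
    return [
        {"targets": [f"{ip}:{port}"], "labels": {"server": name}}
        for name, ip in sorted(hosts.items())
    ]


print(json.dumps(file_sd_targets(HOSTS), indent=2))
```

The printed JSON would be written to a file referenced from a `file_sd_configs` entry in the Prometheus scrape configuration.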

Appendix: Open Issues

| Issue | Server | Severity | Action required |
| --- | --- | --- | --- |
| bms-1 disk 100% full | bms-1 | CRITICAL | docker system prune -f, identify largest directories, plan migration |
| bms-1 Ubuntu 20.04 EOL | bms-1 | HIGH | Plan in-place upgrade or migration to bms-4-like server |
| WAHA at risk on overloaded vps-h1 | vps-h1 | HIGH | Migrate to bms-4 (Phase 2) |
| MongoDB arbiter not yet added to rs0 | bms-4 | HIGH | Human action: rs.addArb() from bms-3 |
| n8n migration not yet executed | vps-h1/bms-4 | HIGH | Execute migration checklist |
| bms-2 + bms-3 not in Prometheus | bms-2, bms-3 | MEDIUM | Install node_exporter, add scrape targets |
| bms-3 MongoDB RAM at 21.7 GB | bms-3 | MEDIUM | Monitor; alert if container RAM pressure approaches 10 GB remaining |
| n8n bms-4 compose has deprecated N8N_RUNNERS_ENABLED=false | bms-4 | LOW | Remove from bms-4/docker-compose.yml (current branch fixes it on vps-h1) |
| Convertio.ai: external SaaS with document exposure risk | Pinbox24 | MEDIUM | Phase 3: deploy pdf-to-jpg on bms-4 |
| claude-admin not set up on bms-2 and bms-3 | bms-2, bms-3 | LOW | Follow setup instructions in ops workbooks |