Service Distribution Evaluation — p24-infra

Date: 2026-06-14
Scope: All servers, SaaS, and services in the p24-infra ecosystem
Status: Current state as of branch fix/n8n-2.26.3-compose-cleanup; n8n migration to bms-4 is pending.

1. Executive Summary

Key findings

vps-h1 is critically overloaded for its size (2 vCPU / 7.8 GB RAM). With n8n capped at 1.5 vCPU and Traefik, PostgreSQL, WAHA, cadvisor, and promtail running alongside it, the host reaches ~90–100% of its CPU during bursts, and RAM use likely peaks at 5–6 GB (~3.1 GB baseline). A single traffic spike or scheduled workflow burst can cause container restarts or OOM events. This is the highest-priority risk in the entire infrastructure.
bms-4 (54.36.123.110) has enormous headroom. 8 vCPU, 32 GB RAM, 1.8 TB disk — with only the MongoDB arbiter (~75 MB RAM) and node_exporter running. It is the correct destination for n8n and new services.
pdf-service currently serves only internal tooling (Claude agents, audit-engine). It runs on vps-i1 and is not accessible to Pinbox24 production on bms-1. For Pinbox24 to use it, it must be deployed independently — bms-4 is the recommended location.
Convertio.ai replacement (PDF-to-JPG) should co-locate with the PDF generation service on bms-4. Because Gotenberg does not rasterise PDFs to images, a thin Python microservice built on pdf2image (poppler) is the lowest-friction option, with ImageMagick as an alternative.
bms-2 (AI-Dev-OV1) and bms-3 are both underutilised for Docker workloads, but bms-3 is constrained by MongoDB RAM (21.7 GB). Neither should receive additional critical services at this time.
WAHA is a critical service (WhatsApp incident management feed for the operations team). After n8n moves to bms-4, WAHA should follow. Once WAHA and the hstgr GitHub Actions runner are relocated, vps-h1 would be idle and could be terminated to reduce cost.

Top recommendations (priority order)

Priority	Action	Why
P1	Migrate n8n from vps-h1 → bms-4	vps-h1 is overloaded; bms-4 compose file is ready
P2	Migrate WAHA from vps-h1 → bms-4	After n8n moves, WAHA is the only remaining critical service
P3	Deploy PDF generation + PDF-to-JPG on bms-4	Pinbox24 production needs it; bms-4 has capacity
P4	Install AI-Dev-BMS4 Claude agent on bms-4	Infrastructure management agent for the new server
P5	Decommission vps-h1	Once n8n and WAHA are off, no remaining workloads justify ~10€/month
P6	Add node_exporter + Prometheus scrape for bms-2 and bms-3	Monitoring gap — neither is currently scraped

2. Complete Service Inventory

2.1 vps-i1 — IONOS (217.154.82.162) — AlmaLinux 9.7 — 6 vCPU / 7.4 GB RAM / 239 GB

Service	Type	Image / Source	RAM (est.)	CPU (est.)	Port(s)	Criticality
caddy	Docker	`caddy:2.11.4-alpine`	50 MB	low	80, 443	Critical — TLS proxy for all infra services
prometheus	Docker	`prom/prometheus:v3.12.0`	300 MB	low	127.0.0.1:9090	High — metrics collection
thanos-sidecar	Docker	`quay.io/thanos/thanos:v0.41.0`	100 MB	low	10901, 10902	High — S3 block upload
thanos-query	Docker	`quay.io/thanos/thanos:v0.41.0`	100 MB	low	127.0.0.1:10904	Medium — unified PromQL
alertmanager	Docker	`prom/alertmanager:v0.33.0`	50 MB	minimal	127.0.0.1:9093	High — email alerts on failures
grafana	Docker	`grafana/grafana:11.6.15`	200 MB	low	127.0.0.1:3000	Medium — dashboards
renderer	Docker	`grafana/grafana-image-renderer:v5.8.9`	300 MB	burst	127.0.0.1:8081	Low — PNG for n8n daily report
queue-exporter	Docker (custom Python)	local build	80 MB	minimal	127.0.0.1:9200	Medium — Supabase queue depths
cost-exporter	Docker (custom Python)	local build	80 MB	minimal	127.0.0.1:9210	Low — billing tracking
backup-exporter	Docker (custom Python)	local build	80 MB	minimal	127.0.0.1:9220	Low — backup status
pg-stats-exporter	Docker (custom Python)	local build	80 MB	minimal	127.0.0.1:9201	Medium — slow query tracking
vercel-exporter	Docker (custom Python)	local build	80 MB	minimal	127.0.0.1:9202	Low — Vercel metrics
n8n-cloud-exporter	Docker (custom Python)	local build	80 MB	minimal	127.0.0.1:9225	Low — n8n.io cloud metrics
credential-exporter	Docker (custom Python)	local build	80 MB	minimal	127.0.0.1:9230	Medium — rotation age tracking
blackbox-exporter	Docker	`prom/blackbox-exporter:v0.28.0`	50 MB	minimal	127.0.0.1:9115	Medium — endpoint probing
gotenberg	Docker	`gotenberg/gotenberg:8.34.0`	400 MB	burst	internal	High — PDF rendering engine
pdf-service	Docker (custom Python)	`infra-src/gotenberg/pdf-service`	100 MB	low	127.0.0.1:8100	High — PDF API for agents/audit
p24-infra-mcp	Docker (custom Python)	`infra-src/p24-infra-mcp`	80 MB	low	127.0.0.1:8101	High — MCP server for Claude agents
audit-engine	Docker (custom Python)	`../audit-engine`	150 MB	burst	127.0.0.1:8200	Medium — AI audit pipeline
uptime-kuma	Docker	`louislam/uptime-kuma:1.23.16`	200 MB	low	127.0.0.1:3001	Medium — endpoint uptime monitoring
loki	Docker	`grafana/loki:3.7.2`	200 MB	low	127.0.0.1:3100	Medium — log aggregation
promtail	Docker	`grafana/promtail:3.6.11`	50 MB	minimal	—	Low — log shipping from vps-i1
traccar	Docker	`traccar/traccar:latest`	500 MB	low	8082, 5027/UDP	High — GPS fleet tracking
traccar-db	Docker	MySQL 8.0	400 MB	low	internal	High — Traccar data persistence
openclaw-gateway	Docker	custom	200 MB	low	18789-18790	High — WhatsApp issue intake
node_exporter	systemd	system package	20 MB	minimal	9100	Medium
GitHub Actions runner (ionos)	systemd	`/opt/actions-runner`	200 MB	burst	—	High — CI/CD for et-operational-platform
GitHub Actions runner (KDP)	systemd	`/opt/actions-runner-kdp`	200 MB	burst	—	Medium — CI for amazon-kdp-tango
claude-proxy.py	systemd (python3)	`claude-proxy.py`	50 MB	minimal	8765	Medium — OpenClaw Claude proxy
claude-runner (agent)	user process	`/usr/bin/claude`	300 MB	burst	—	Medium — autonomous agent (nightly)

Estimated total: ~4.5–5 GB RAM, peaking at ~6 GB (of 7.4 GB) during Gotenberg/rendering bursts. Disk: 239 GB capacity; Prometheus data and Docker images are the main consumers, so usage needs monitoring.

2.2 vps-h1 — Hostinger (72.60.32.61) — Ubuntu 24.04 — 2 vCPU / 7.8 GB RAM / 96 GB

Service	Type	Image	RAM (est.)	CPU limit	Port(s)	Criticality
traefik	Docker	`traefik:v3.7.5`	80 MB	none	80, 443	Critical — TLS proxy
n8n-postgres	Docker	`postgres:16.9-alpine`	300 MB	none	internal	Critical — n8n DB
n8n	Docker	`docker.n8n.io/n8nio/n8n:2.26.3`	1.2 GB	1.5 CPU	127.0.0.1:5678	Critical — automation engine
waha	Docker	`devlikeapro/waha:noweb-2026.5.1`	800 MB	none	127.0.0.1:13000	Critical — WhatsApp incidents
node-exporter	Docker	`prom/node-exporter:v1.11.1`	20 MB	none	host:9100	Low
cadvisor	Docker	`ghcr.io/google/cadvisor:v0.57.0`	100 MB	none	0.0.0.0:8080	Low
promtail	Docker	`grafana/promtail:3.6.11`	50 MB	none	9080	Low
claude-runner (agent)	user process	`/usr/bin/claude`	300 MB	none	—	Medium — nightly automation
GitHub Actions runner (hstgr)	systemd	`/opt/actions-runner-hstgr`	200 MB	none	—	Medium — CI for et-operational-platform

Estimated total: ~3.1 GB RAM baseline, peaking at 5–6 GB during n8n workflow bursts. WAHA alone uses ~800 MB; together with n8n (~1.2 GB), these two services account for ~2 GB of the 7.8 GB available. CPU is the critical constraint: when n8n runs at its 1.5 vCPU cap, only 0.5 vCPU remains for everything else (n8n-postgres, Traefik, WAHA, the observability containers, the Actions runner, the nightly agent, and the OS).

2.3 bms-1 — OVH Kimsufi (94.23.26.113) — Ubuntu 20.04 EOL — 8 cores / 32 GB RAM

Service	Type	RAM (est.)	Criticality
nginx-proxy + letsencrypt	Docker	200 MB	Critical — TLS for all Pinbox24 services
portainer	Docker	200 MB	Low — management UI
Pinbox24 v31/v32/v41/v42 (4 instances)	Docker (ECR)	~4 GB total	Critical — production app
mailgun container	Docker	100 MB	High — transactional email relay
pdf-gen (wkhtml)	Docker	300 MB	High — internal PDF gen for Pinbox24
git-deploy	Docker	100 MB	Medium — deployment automation

Known critical issues: the disk is 100% full, so Pinbox24 production may fail on the next container update. The OS is Ubuntu 20.04 LTS, which reached EOL in April 2025 and is now a security risk. There is no migration plan yet and no Prometheus monitoring.

2.4 bms-2 — OVH Kimsufi (145.239.133.104) — Ubuntu 24.04 — 8 vCPU / 32 GB RAM / 410 GB

Service	Type	RAM (est.)	Criticality
mongod (rs0 observer, non-voting)	systemd	2–4 GB	High — replica set read replica
claude-runner (AI-Dev-OV1 agent)	user process	300 MB per agent	Medium — max 4 parallel

Disk: 16% used (62/410 GB). Ample headroom. No Docker services deployed. No monitoring (node_exporter not installed).

2.5 bms-3 — OVH Kimsufi (51.68.155.224) — Ubuntu 22.04 — 8 vCPU / 32 GB RAM / 410 GB

Service	Type	RAM (est.)	Criticality
mongod (rs0 PRIMARY)	systemd	21.7 GB	Critical — Pinbox24 production DB
Pinbox24 v31/v32/v41/v42 staging (4 instances)	Docker (ECR)	~3 GB	Medium — staging
traccar	Docker	500 MB	Medium — GPS tracking (staging)
nginx-proxy + letsencrypt	Docker	200 MB	Medium — TLS for staging
portainer-pinbox24	Docker	200 MB	Low
mt5	Docker	500 MB	Low

WARNING: MongoDB at 21.7 GB RAM leaves only ~10 GB for the OS and all Docker workloads, so the host is at risk of OOM if staging load spikes. Disk: 44% used (170/410 GB); monitor it, since staging logs can fill quickly. No Prometheus monitoring (node_exporter not installed).

2.6 bms-4 — OVH Kimsufi (54.36.123.110) — Ubuntu 22.04 — 8 vCPU / 32 GB RAM / 1.8 TB

Service	Type	RAM (est.)	Status
mongod (rs0 arbiter)	systemd	~75 MB	Active — awaiting `rs.addArb()` by human
prometheus-node-exporter	systemd	20 MB	Active
traefik	Docker (planned)	80 MB	In repo, not yet deployed
n8n-postgres	Docker (planned)	300 MB	In repo, not yet deployed
n8n	Docker (planned)	1.2 GB	Migration pending
node-exporter	Docker (planned)	20 MB	In repo
cadvisor	Docker (planned)	100 MB	In repo

Disk: 8.3 GB of 1.8 TB used (~1.7 TB free); no disk pressure is expected. Free RAM after the planned services above: ~30 GB, leaving ample headroom for additional workloads.

2.7 SaaS Services

Service	Provider	Plan	Role	Criticality
Supabase	Supabase	Pro	Fleet management DB, audit engine, DevOps tables	Critical
Vercel	Vercel	Team	et-operational-platform, p24-nextjs-v2026, portal	Critical
Cloudflare	Cloudflare	Free/Pro	DNS (zintegrowana.online), WAF for Workers	Critical
Wasabi S3	Wasabi	Pay-as-you-go	Thanos metrics, PDFs, backup status JSON	High
GitHub	GitHub	Pro	Source control, CI/CD, issue tracking	Critical
Mailgun EU	Mailgun	Pay-as-you-go	Alertmanager email, Pinbox24 transactional	High
n8n.io Cloud	n8n.io	Paid	Separate automation instance (backups etc.)	Medium
Convertio.ai	Convertio	SaaS	PDF-to-JPG for Pinbox24 Angular	Scheduled for replacement

3. vps-h1 Load Analysis

Current state

Resource	Total	n8n	WAHA	Traefik+PG	Observability	OS+misc	Headroom
RAM	7.8 GB	~1.2 GB	~800 MB	~380 MB	~170 MB	~400 MB	~4.8 GB baseline (~2–3 GB at burst peaks)
CPU	2.0 vCPU	cap 1.5	~0.1–0.3	~0.1	~0.05	~0.1	~0 (negative on burst)
Disk	96 GB	~5–10 GB	~2 GB	~1 GB	minimal	OS	~75 GB
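The headroom figures follow from plain subtraction. A minimal sketch, assuming the per-group estimates in the table above are accurate; CPU uses n8n at its 1.5 vCPU cap and WAHA at the top of its 0.1–0.3 range, and the group names are only labels:

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Usage:
    ram_gb: float  # estimated resident memory
    cpu: float     # estimated vCPU during a burst


# vps-h1 capacity and the per-group estimates from the "Current state" table.
VPS_H1_RAM_GB = 7.8
VPS_H1_VCPU = 2.0
VPS_H1_LOAD = {
    "n8n": Usage(1.2, 1.5),                # n8n pinned at its 1.5 vCPU cap
    "waha": Usage(0.8, 0.3),               # top of the 0.1-0.3 range
    "traefik+postgres": Usage(0.38, 0.1),
    "observability": Usage(0.17, 0.05),
    "os+misc": Usage(0.4, 0.1),
}


def headroom(load, ram_total=VPS_H1_RAM_GB, cpu_total=VPS_H1_VCPU):
    """Return (free RAM in GB, free vCPU) after subtracting every listed group."""
    ram_free = ram_total - sum(u.ram_gb for u in load.values())
    cpu_free = cpu_total - sum(u.cpu for u in load.values())
    return round(ram_free, 2), round(cpu_free, 2)


if __name__ == "__main__":
    print("current:", headroom(VPS_H1_LOAD))  # RAM ~4.85 GB free, CPU slightly negative
    without_n8n = {k: v for k, v in VPS_H1_LOAD.items() if k != "n8n"}
    print("without n8n:", headroom(without_n8n))
```

RAM has room to spare at baseline; CPU is the resource that goes negative, which is why removing n8n (and with it 1.5 vCPU of potential demand) is the first move.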

Risk assessment

Severity: HIGH — near capacity on CPU, moderate on RAM

CPU is the primary constraint. n8n has a hard cap of 1.5 vCPU, so at full load only 0.5 vCPU remains for the other 6 containers, the Actions runner, the nightly agent, and the OS kernel. When n8n executes multiple workflows in parallel (GPS sync, WhatsApp routing, daily report generation), it hits its 1.5 vCPU limit and everything else competes for the remainder.
WAHA is a critical single point of failure. The WhatsApp integration is the primary incident reporting channel: if vps-h1 crashes or OOMs, incident reports from WhatsApp stop flowing to n8n and Supabase. If the WhatsApp session is lost or corrupted, re-pairing the phone number requires human action (scanning a QR code).
n8n-postgres has no CPU cap. PostgreSQL can occasionally spike CPU during autovacuum or complex queries from n8n's execution history pruning, pushing total CPU demand beyond the 2 vCPU available.
No redundancy. All critical automation runs on a single 2-vCPU machine with no failover.
96 GB of disk is tight for a machine that also runs the GitHub Actions runner, which builds et-operational-platform.
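Because a lost session needs someone with the phone, monitoring should distinguish "needs a QR scan" from transient states. A minimal sketch, assuming WAHA's session endpoint (for example `GET /api/sessions/default`) returns JSON with a `status` field using WAHA's documented values `STOPPED`, `STARTING`, `SCAN_QR_CODE`, `WORKING` and `FAILED`; verify these against the deployed `noweb-2026.5.1` image. The action names are hypothetical:

```python
import json

# Assumed WAHA session states mapped to hypothetical monitoring actions.
ACTIONS = {
    "WORKING": "ok",               # session connected to WhatsApp
    "STARTING": "wait",            # transient; re-check on the next poll
    "SCAN_QR_CODE": "page-human",  # phone must re-pair by scanning a QR code
    "STOPPED": "restart",          # try restarting the session before paging anyone
    "FAILED": "restart",
}


def classify_session(payload: str) -> str:
    """Map a WAHA session JSON document to a monitoring action."""
    status = json.loads(payload).get("status", "UNKNOWN")
    # Unknown states are escalated to a human rather than silently ignored.
    return ACTIONS.get(status, "page-human")
```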

What should move first

n8n (with n8n-postgres) is the highest priority because:

The compose file for bms-4 is already written and committed
n8n can consume up to 1.5 vCPU (its cap); removing it frees up to 75% of vps-h1's CPU
Migration checklist is documented in docs/servers/p4-ovh-bms-4-ns3101999-operations.md

WAHA should move second because:

After n8n leaves, WAHA is the only remaining critical workload on vps-h1
WAHA’s webhook URL currently points to n8n.vps-h1.infra.zintegrowana.online — once n8n moves to bms-4, this config line must change anyway
bms-4's compose file already defines Traefik for TLS termination (deployed in Phase 1)
Keeping WAHA on vps-h1 alone maintains a second monthly VPS cost for a single container

Post-n8n-migration state on vps-h1

After n8n and n8n-postgres move to bms-4:

Remaining service	RAM	Critical?
traefik	80 MB	Yes (for WAHA TLS)
waha	800 MB	Critical
node-exporter	20 MB	No
cadvisor	100 MB	No
promtail	50 MB	No
claude-runner (nightly agent)	300 MB	Medium
GitHub Actions runner (hstgr)	200 MB	Medium

Conclusion: vps-h1 would have ~6 GB free RAM and ~1.5–1.8 vCPU idle, which is badly underutilised for ~10€/month. Recommendation: migrate WAHA to bms-4, reassign the GitHub Actions runner to bms-4, and decommission vps-h1.

4. bms-4 Expansion Plan

Current state (post-provisioning)

MongoDB arbiter: 75 MB RAM
node_exporter: 20 MB RAM
Docker CE: installed, no containers running
Free: ~31.9 GB RAM, 7.5 vCPU, 1.7 TB disk

Phase 1 — n8n migration (immediate, PENDING)

Deploy from bms-4/docker-compose.yml:

Service	RAM (est.)	CPU	Notes
traefik	80 MB	low	TLS for all bms-4 services
n8n-postgres	300 MB	low	PostgreSQL 16 for n8n
n8n	1.2 GB	cap 1.5	Migrated from vps-h1
node-exporter (Docker)	20 MB	minimal	Duplicates the systemd exporter; both cannot listen on host port 9100, so drop one or remap it
cadvisor	100 MB	low	Container metrics

After Phase 1: ~30.2 GB RAM free, ~6 vCPU free.

Phase 2 — WAHA migration (after n8n verified stable)

Add WAHA to bms-4 compose:

Service	RAM (est.)	Notes
waha	800 MB	Requires updating WHATSAPP_HOOK_URL → `n8n.bms-4.infra.zintegrowana.online`

After Phase 2: ~29.4 GB RAM free.
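The webhook change is a host swap. A minimal sketch for rewriting the setting in an env file or a compose `environment:` list, assuming the n8n webhook path stays identical after the workflows are migrated; the `/webhook/waha` path in the example is hypothetical:

```python
from urllib.parse import urlsplit, urlunsplit

OLD_HOST = "n8n.vps-h1.infra.zintegrowana.online"
NEW_HOST = "n8n.bms-4.infra.zintegrowana.online"


def rewrite_hook_line(line: str) -> str:
    """Point WHATSAPP_HOOK_URL at bms-4, leaving every other line untouched."""
    key, sep, value = line.partition("=")
    # Accept both "KEY=value" (env file) and "- KEY=value" (compose list).
    name = key.strip().removeprefix("- ").strip()
    if not sep or name != "WHATSAPP_HOOK_URL":
        return line
    parts = urlsplit(value.strip())
    if parts.hostname != OLD_HOST:
        return line
    # Only the host changes; scheme, path and query are kept.
    # (An explicit port or credentials in the old URL would be dropped.)
    return f"{key}={urlunsplit(parts._replace(netloc=NEW_HOST))}"


def rewrite_env(text: str) -> str:
    """Apply rewrite_hook_line to every line of a config file's contents."""
    return "\n".join(rewrite_hook_line(ln) for ln in text.splitlines())


if __name__ == "__main__":
    print(rewrite_hook_line(f"WHATSAPP_HOOK_URL=https://{OLD_HOST}/webhook/waha"))
```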

Phase 3 — PDF services (new deployment, see §5)

Service	RAM (est.)	Notes
gotenberg	400 MB	Chromium PDF renderer
pdf-service-pinbox	100 MB	New instance — production-facing
pdf-to-jpg	200 MB	New service — replaces Convertio.ai

After Phase 3: ~28.7 GB RAM free.

Phase 4 — AI-Dev-BMS4 agent (see §6)

Service	RAM (est.)	Notes
claude-runner (user process)	300 MB per agent	Max 4 parallel agents
GitHub Actions runner	200 MB	Relocate from vps-h1

After Phase 4: ~27.3 GB RAM free (4 agents running simultaneously).

bms-4 Resource Forecast (fully loaded)

Category	RAM	CPU	Disk
MongoDB arbiter	75 MB	negligible	minimal
n8n stack	1.6 GB	1.5 cap + overhead	10 GB
WAHA	800 MB	0.2	2 GB
PDF services	700 MB	burst	5 GB
Observability	240 MB	minimal	1 GB
AI agents (4x) + Actions runner	1.4 GB	burst	—
OS + kernel	~500 MB	0.5	8 GB
TOTAL	~5.3 GB	~5 vCPU peak	~26 GB
Remaining headroom	~26.7 GB	~3 vCPU idle	~1.77 TB

bms-4 can comfortably absorb all planned workloads while retaining over 80% RAM headroom.
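The "RAM free" figure after each phase is a running subtraction from 32 GB. A minimal sketch using the per-service estimates from the phase tables above; unlike the forecast table, it does not subtract OS overhead, so its final figure comes out higher:

```python
TOTAL_RAM_GB = 32.0

# (phase, GB added), using the per-service estimates from the phase tables.
PHASES = [
    ("current: arbiter + node_exporter", 0.075 + 0.02),
    ("phase 1: traefik, n8n-postgres, n8n, node-exporter, cadvisor",
     0.08 + 0.3 + 1.2 + 0.02 + 0.1),
    ("phase 2: waha", 0.8),
    ("phase 3: gotenberg, pdf-service-p24, pdf-to-jpg", 0.4 + 0.1 + 0.2),
    ("phase 4: 4 agents x 300 MB + Actions runner", 4 * 0.3 + 0.2),
]


def free_after_each(phases, total=TOTAL_RAM_GB):
    """Yield (phase, GB of RAM still free) after each phase is deployed."""
    free = total
    for name, added_gb in phases:
        free -= added_gb
        yield name, round(free, 1)


if __name__ == "__main__":
    for name, free in free_after_each(PHASES):
        print(f"{free:5.1f} GB free after {name}")
```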

5. PDF Services Production Plan

Current state

The existing pdf-service (on vps-i1, port 8100) serves only:

Claude Code agents via p24-infra-mcp MCP server
audit-engine internal workflows

It is not accessible to Pinbox24 production on bms-1, and the existing instance should remain dedicated to internal tooling.

Requirements for production PDF services

PDF generation — replace pdf-gen (wkhtml container on bms-1) with a modern Gotenberg-based service callable by Pinbox24 Angular
PDF-to-JPG conversion — replace Convertio.ai (external SaaS); must accept PDF input, return JPG/PNG output; used for document thumbnails in Pinbox24

Option analysis

Option	PDF Generation	PDF-to-JPG	Network path from bms-1	Disk pressure on bms-1
A: Deploy on bms-4	Gotenberg + pdf-service	pdf2image (poppler) microservice or ImageMagick	Cross-server HTTPS to a public endpoint	None
B: Deploy on bms-1	Same	Same	localhost	CRITICAL — disk 100% full
C: Deploy on bms-3	Same	Same	Cross-server HTTP	None

Option B is eliminated: bms-1 disk is 100% full and OS is EOL. Adding containers there would immediately fail.

Option C is not recommended: bms-3 already uses ~25 of its 32 GB RAM (MongoDB alone takes 21.7 GB, leaving ~10 GB for the OS and all staging containers). Adding PDF conversion load to an already memory-constrained server is a stability risk for the MongoDB primary.

Recommendation: Option A — Deploy on bms-4.

Justification:

bms-4 keeps ~27 GB free RAM and ~1.7 TB free disk even after all planned services
Traefik is already being deployed on bms-4 — PDF services get TLS endpoints for free
bms-1 and bms-4 are both OVH servers, so bms-1 → bms-4 calls should be low-latency; they still target a public endpoint, so TLS and API-key auth remain necessary
bms-4 will also run AI-Dev-BMS4 agent, which can monitor and restart PDF services autonomously
Separating PDF infrastructure from vps-i1 (monitoring stack) removes a cross-concern dependency
A dedicated pdf.bms-4.infra.zintegrowana.online endpoint can be secured with API key auth (same pattern as existing pdf-service)
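The API-key check should compare in constant time so a response does not reveal how much of a guessed key matched. A minimal sketch; the `X-API-Key` header name is an assumption (the existing pdf-service's header is not stated here), and `authorized` is a hypothetical helper, not pdf-service's actual code:

```python
import hmac
from collections.abc import Mapping


def authorized(headers: Mapping[str, str], expected_key: str,
               header_name: str = "X-API-Key") -> bool:
    """Return True only if the request carries the configured API key."""
    if not expected_key:
        # Fail closed when PDF_SERVICE_API_KEY is unset or empty.
        return False
    wanted = header_name.lower()
    # HTTP header names are case-insensitive.
    supplied = next((v for k, v in headers.items() if k.lower() == wanted), "")
    # compare_digest avoids timing differences between partial and full mismatches.
    return hmac.compare_digest(supplied.encode(), expected_key.encode())
```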

Deployment plan for bms-4 PDF services

Service 1: pdf-service-p24 (PDF generation for Pinbox24)

Reuse existing infra-src/gotenberg/pdf-service/ codebase with a new Gotenberg instance.

```yaml
# Add to bms-4/docker-compose.yml (under services:)

  gotenberg-p24:
    image: gotenberg/gotenberg:8.34.0
    restart: unless-stopped
    command:
      - gotenberg
      - --chromium-disable-javascript=false   # Pinbox24 Angular uses JS-rendered pages
      - --api-timeout=60s

  pdf-service-p24:
    build: ../infra-src/gotenberg/pdf-service
    restart: unless-stopped
    depends_on:
      - gotenberg-p24
    environment:
      - PDF_SERVICE_API_KEY=${PDF_SERVICE_API_KEY}
      - GOTENBERG_URL=http://gotenberg-p24:3000
      - SUPABASE_URL=${SUPABASE_URL}
      - SUPABASE_SERVICE_KEY=${SUPABASE_SERVICE_KEY}
      - WASABI_ACCESS_KEY=${WASABI_ACCESS_KEY}
      - WASABI_SECRET_KEY=${WASABI_SECRET_KEY}
      - WASABI_BUCKET=p24-infra
      - WASABI_ENDPOINT=https://s3.eu-central-2.wasabisys.com
      - WASABI_REGION=eu-central-2
    labels:
      - traefik.enable=true
      - traefik.http.routers.pdf-p24.rule=Host(`pdf.bms-4.infra.zintegrowana.online`)
      - traefik.http.routers.pdf-p24.tls=true
      - traefik.http.routers.pdf-p24.entrypoints=websecure
      - traefik.http.routers.pdf-p24.tls.certresolver=mytlschallenge
      # If the image does not EXPOSE exactly one port, Traefik also needs
      # traefik.http.services.pdf-p24.loadbalancer.server.port=<container port>
```

Service 2: pdf-to-jpg (Convertio.ai replacement)

Gotenberg supports POST /forms/chromium/convert/url and POST /forms/libreoffice/convert but not direct PDF-to-image rasterisation. The correct approach is a thin Python microservice using pdf2image (wraps pdftoppm) or ImageMagick via Ghostscript.

Recommended implementation: pdf-to-jpg microservice using pdf2image Python library.

Input: PDF file (multipart upload)
Output: JPG bytes (single page or ZIP of all pages)
Auth: same PDF_SERVICE_API_KEY pattern
Base image: python:3.12-slim with poppler-utils installed
RAM: ~150–200 MB per conversion, ~50 MB idle

```yaml
  pdf-to-jpg:
    build: ../infra-src/pdf-to-jpg   # new microservice to be created
    restart: unless-stopped
    environment:
      - PDF_SERVICE_API_KEY=${PDF_SERVICE_API_KEY}
      - DPI=${PDF_TO_JPG_DPI:-150}
      - MAX_FILE_MB=${PDF_TO_JPG_MAX_MB:-20}
    labels:
      - traefik.enable=true
      - traefik.http.routers.pdf-to-jpg.rule=Host(`pdf.bms-4.infra.zintegrowana.online`) && PathPrefix(`/v1/pdf-to-jpg`)
      - traefik.http.routers.pdf-to-jpg.tls=true
      - traefik.http.routers.pdf-to-jpg.entrypoints=websecure
      - traefik.http.routers.pdf-to-jpg.tls.certresolver=mytlschallenge
```
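Inside the container, the `DPI` and `MAX_FILE_MB` variables set above can be parsed once at startup and used to reject oversized uploads before any conversion work starts. A minimal sketch; the allowed DPI range (72–600) is an assumption, not a stated requirement, and the function names are hypothetical:

```python
import os
from collections.abc import Mapping
from dataclasses import dataclass


@dataclass(frozen=True)
class Settings:
    dpi: int
    max_bytes: int


def load_settings(env: Mapping[str, str] = os.environ) -> Settings:
    """Read DPI and MAX_FILE_MB (same defaults as the compose file) and validate them."""
    dpi = int(env.get("DPI", "150"))
    max_mb = int(env.get("MAX_FILE_MB", "20"))
    if not 72 <= dpi <= 600:  # assumed sane range for thumbnails
        raise ValueError(f"DPI out of range: {dpi}")
    if max_mb <= 0:
        raise ValueError(f"MAX_FILE_MB must be positive: {max_mb}")
    return Settings(dpi=dpi, max_bytes=max_mb * 1024 * 1024)


def accept_upload(size_bytes: int, settings: Settings) -> bool:
    """Reject empty uploads and anything larger than MAX_FILE_MB."""
    return 0 < size_bytes <= settings.max_bytes
```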

Pinbox24 migration path:

Deploy pdf-service-p24 + pdf-to-jpg on bms-4
Test endpoints from bms-1: curl -X POST https://pdf.bms-4.infra.zintegrowana.online/v1/md-render ...
Update Pinbox24 Angular app config to use new endpoints (remove Convertio.ai API key)
Remove Convertio.ai subscription once verified
Remove pdf-gen + wkhtml containers from bms-1 (helps reclaim the critical disk space)
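The endpoint test from bms-1 can also be scripted without extra dependencies. A minimal sketch using only the Python standard library to build a multipart upload for the pdf-to-jpg route; the `file` form-field name and the `X-API-Key` header are assumptions about the microservice that has not been written yet:

```python
import urllib.request
import uuid

PDF_TO_JPG_URL = "https://pdf.bms-4.infra.zintegrowana.online/v1/pdf-to-jpg"


def build_upload(url: str, api_key: str, pdf_bytes: bytes,
                 filename: str = "test.pdf", field: str = "file") -> urllib.request.Request:
    """Build (but do not send) a multipart/form-data POST carrying one PDF."""
    boundary = uuid.uuid4().hex
    head = (
        f"--{boundary}\r\n"
        f'Content-Disposition: form-data; name="{field}"; filename="{filename}"\r\n'
        "Content-Type: application/pdf\r\n\r\n"
    ).encode()
    tail = f"\r\n--{boundary}--\r\n".encode()
    req = urllib.request.Request(url, data=head + pdf_bytes + tail, method="POST")
    req.add_header("Content-Type", f"multipart/form-data; boundary={boundary}")
    req.add_header("X-API-Key", api_key)  # assumed header name
    return req
```

Sending it is then `urllib.request.urlopen(build_upload(PDF_TO_JPG_URL, key, pdf_bytes), timeout=60)`; if the service behaves as specified, the response body is JPG bytes.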

Implementation note on pdf-to-jpg microservice: The infra-src/pdf-to-jpg/ directory needs to be created with:

app.py (FastAPI, similar pattern to pdf-service)
Dockerfile (python:3.12-slim, installs poppler-utils via apt, pdf2image via pip)
tests/ directory

6. AI-Dev-BMS4 Agent Setup Plan

Overview

bms-4 should have a Claude Code autonomous agent (AI-Dev-BMS4) for:

Local Docker operations on bms-4 itself
Managing n8n, WAHA, PDF services
Running scheduled tasks and issue implementation

User setup

Follow the same pattern as AI-Dev-OV1 on bms-2 and claude-admin on bms-3:

```bash
# Run as root on bms-4 (54.36.123.110)

# 1. Create claude-runner user
useradd -m -s /bin/bash claude-runner
mkdir -p /home/claude-runner/workspace
chown claude-runner:claude-runner /home/claude-runner/workspace

# 2. Create claude-admin user for SSH access from GitHub Actions / remote ops
useradd -m -s /bin/bash claude-admin
mkdir -p /home/claude-admin/.ssh

# 3. Install VPS_SSH_PRIVATE_KEY public part for claude-admin
echo "<VPS_SSH_PRIVATE_KEY public part>" > /home/claude-admin/.ssh/authorized_keys
chmod 700 /home/claude-admin/.ssh && chmod 600 /home/claude-admin/.ssh/authorized_keys
chown -R claude-admin:claude-admin /home/claude-admin/.ssh

# 4. Grant sudo to claude-admin for the listed commands only.
#    Note: sudo access to docker, tee or cp is effectively root-equivalent.
echo "claude-admin ALL=(ALL) NOPASSWD: /usr/bin/docker, /bin/systemctl, /bin/mkdir, /bin/chown, /bin/cp, /usr/bin/tee" \
  > /etc/sudoers.d/claude-admin
chmod 0440 /etc/sudoers.d/claude-admin
visudo -cf /etc/sudoers.d/claude-admin   # syntax check; delete the file if this fails

# 5. Install Claude Code CLI as claude-runner (native installer; verify the URL against Anthropic's docs)
sudo -u claude-runner -i bash -c 'curl -fsSL https://claude.ai/install.sh | bash'
# or: copy from bms-2 if network is slow

# 6. Copy OAuth credentials from local workstation (the scp line runs on the workstation)
# mkdir -p /home/claude-runner/.claude
# scp C:\Users\konar\.claude\.credentials.json root@54.36.123.110:/home/claude-runner/.claude/
# chown -R claude-runner:claude-runner /home/claude-runner/.claude

# 7. Clone infra repo
git clone https://github.com/radieu/p24-infra /opt/p24-infra
chown -R claude-runner:claude-runner /opt/p24-infra

# 8. Set up .claude-env for secrets
# Copy from vps-h1 or create fresh; it contains GITHUB_TOKEN, PDF_SERVICE_API_KEY, etc.
```

GitHub setup

Label: AI-Dev-BMS4
GitHub user: AI-Dev-BMS4 (to be created using ai-dev-bms4@zintegrowana.online)
Cloudflare email route: Add routing rule for ai-dev-bms4@zintegrowana.online → radieu@gmail.com
Repository access: Add as collaborator to radieu/p24-infra (write) and radieu/et-operational-platform (write)

Resource limits

Resource	Limit	Rationale
Parallel agents	max 4	bms-4 has 8 vCPU, matches AI-Dev-OV1 on bms-2
RAM per agent	~300 MB (unbounded)	32 GB RAM — no pressure
Disk for workspace	50 GB	/opt/ on 1.8 TB RAID1
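This plan does not say how the four-agent cap is enforced. One simple approach is a bounded worker pool that runs at most four agent sessions at once. A minimal sketch; each task is any callable (in practice a hypothetical wrapper that launches one `claude` session as claude-runner), and the peak counter exists only to demonstrate the bound:

```python
import threading
from collections.abc import Callable, Iterable
from concurrent.futures import ThreadPoolExecutor
from typing import TypeVar

T = TypeVar("T")
MAX_PARALLEL_AGENTS = 4  # from the resource-limits table


def run_agents(tasks: Iterable[Callable[[], T]],
               max_parallel: int = MAX_PARALLEL_AGENTS) -> tuple[list[T], int]:
    """Run tasks with at most max_parallel in flight; return results and peak concurrency."""
    lock = threading.Lock()
    running = 0
    peak = 0

    def tracked(task: Callable[[], T]) -> T:
        nonlocal running, peak
        with lock:
            running += 1
            peak = max(peak, running)
        try:
            return task()
        finally:
            with lock:
                running -= 1

    # The pool size is the actual limit; extra tasks wait in the executor's queue.
    with ThreadPoolExecutor(max_workers=max_parallel) as pool:
        results = list(pool.map(tracked, tasks))
    return results, peak
```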

Capabilities

Full docker access (via claude-admin scoped sudo)
systemctl for mongod management
SSH to other servers using VPS_SSH_PRIVATE_KEY (via GitHub Actions)
Access to p24-infra-mcp MCP server for PDF operations

7. Service Distribution Matrix

Security risk

Service	SPoF Risk	Security Risk	Action
WAHA (vps-h1)	HIGH — only 1 node, no failover	Medium — exposed WhatsApp session	Migrate to bms-4; consider session backup
n8n (vps-h1)	HIGH — only 1 node	High — holds all automation secrets	Migrate to bms-4 (larger, more stable)
Supabase	Low — managed HA	Low — managed service	None
Prometheus (vps-i1)	Medium — single node	Low — internal only	Thanos provides remote backup
Traccar (vps-i1)	Medium — single node	Low — internal only	Low priority
mongod PRIMARY (bms-3)	HIGH — if bms-3 fails, writes stop	Medium — RS election needed	bms-4 arbiter restores quorum; automatic failover also needs an electable secondary (e.g. bms-2 reconfigured as voting)
pdf-service (vps-i1)	Medium — mixed with monitoring stack	Low — API key protected	Deploy dedicated instance on bms-4
Convertio.ai	Medium — external SaaS dependency	HIGH — sends Pinbox24 documents to 3rd party	Replace with self-hosted pdf-to-jpg ASAP
bms-1 disk	CRITICAL — 100% full	High — EOL OS	Emergency: prune Docker images, plan migration
bms-3 RAM	High — MongoDB at 21.7 GB	Low	Monitor for OOM
openclaw gateway (vps-i1)	Medium — no failover	Low — HMAC verified	Acceptable

Single Point of Failure analysis

Component	SPoF?	Mitigation
n8n workflows	Yes — single instance	n8n volume backed by PostgreSQL; data survives container restart
WAHA WhatsApp session	Yes	Session stored in Docker volume (`waha_sessions`); requires QR re-pairing if session corrupted
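MongoDB PRIMARY (bms-3)	Partial: the bms-4 arbiter restores quorum but holds no data and cannot become primary	bms-2 can take over only after being reconfigured as a voting member with priority > 0 (non-voting members must have priority 0)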
Vercel (et-operational-platform)	No — Vercel HA	None needed
Supabase	No — managed HA	None needed
PDF generation (vps-i1)	Yes — Gotenberg is single instance	Acceptable for internal tooling; production instance on bms-4 adds redundancy
Prometheus metrics	Partial — 15d local + Wasabi via Thanos	Thanos provides durable history
GitHub Actions runner	Partial — only 1 ionos runner	hstgr runner on vps-h1 is backup
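Whether rs0 can fail over follows from two MongoDB rules: a primary needs votes from a strict majority of voting members, and only a reachable, data-bearing member with priority > 0 can be elected (arbiters vote but hold no data). A minimal sketch; the three-member topology is an assumption drawn from this document, since the full rs0 member list is not reproduced here:

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Member:
    host: str
    votes: int            # 0 or 1
    priority: float       # 0 means never electable
    arbiter: bool = False
    up: bool = True


def can_elect_primary(members: list[Member]) -> bool:
    """True if the reachable members can still elect (or keep) a primary."""
    total_votes = sum(m.votes for m in members)
    up_votes = sum(m.votes for m in members if m.up)
    has_majority = 2 * up_votes > total_votes  # strict majority of voting members
    has_candidate = any(m.up and not m.arbiter and m.priority > 0 for m in members)
    return has_majority and has_candidate


def rs0(bms3_up: bool = True, bms2_voting: bool = False) -> list[Member]:
    """Assumed rs0 layout: bms-3 primary, bms-4 arbiter, bms-2 non-voting."""
    return [
        Member("bms-3", votes=1, priority=1, up=bms3_up),
        Member("bms-4", votes=1, priority=0, arbiter=True),
        Member("bms-2", votes=int(bms2_voting), priority=1 if bms2_voting else 0),
    ]
```

Under this layout, losing bms-3 leaves one vote out of two and no electable member, so writes stop despite the arbiter; making bms-2 a voting, electable member (`rs0(bms3_up=False, bms2_voting=True)`) restores automatic failover.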

8. Recommended Target Architecture

Desired final state (after all migrations)

+─────────────────────────────────────────────────────────────────────────────+
|  COMPUTE INFRASTRUCTURE                                                     |
+─────────────────────────────────────────────────────────────────────────────+

vps-i1 (IONOS) — 6 vCPU / 7.4 GB — MONITORING HUB
  ├── Caddy (TLS)
  ├── Prometheus + Thanos + Alertmanager
  ├── Grafana + Image Renderer
  ├── Loki + Promtail
  ├── Uptime Kuma
  ├── All custom Python exporters (7x)
  ├── Blackbox exporter
  ├── pdf-service + p24-infra-mcp (internal tooling only)
  ├── audit-engine
  ├── Traccar + MySQL (GPS)
  ├── OpenClaw WhatsApp gateway
  ├── GitHub Actions runner (ionos) — CI/CD
  └── claude-runner autonomous agent (nightly)

bms-4 (OVH) — 8 vCPU / 32 GB / 1.8 TB — AUTOMATION + PDF HUB
  ├── MongoDB arbiter (rs0 quorum)
  ├── Traefik (TLS)
  ├── n8n + n8n-postgres (migrated from vps-h1)  [Phase 1]
  ├── WAHA WhatsApp gateway (migrated from vps-h1)  [Phase 2]
  ├── gotenberg-p24 + pdf-service-p24 (Pinbox24 PDF gen)  [Phase 3]
  ├── pdf-to-jpg microservice (Convertio.ai replacement)  [Phase 3]
  ├── node-exporter + cadvisor
  ├── AI-Dev-BMS4 Claude agent (max 4 parallel)  [Phase 4]
  └── GitHub Actions runner (relocate from vps-h1)  [Phase 4]

bms-2 (OVH) — 8 vCPU / 32 GB / 410 GB — CLAUDE DEV ENV
  ├── MongoDB observer (rs0 non-voting read replica)
  └── AI-Dev-OV1 Claude agent (max 4 parallel)

bms-3 (OVH) — 8 vCPU / 32 GB / 410 GB — MONGODB PRIMARY + STAGING
  ├── MongoDB 7.0 PRIMARY (rs0)
  ├── Pinbox24 staging (v31/v32/v41/v42)
  ├── Traccar (GPS staging)
  ├── nginx-proxy + Let's Encrypt
  └── mt5

bms-1 (OVH) — 8 cores / 32 GB — PINBOX24 PRODUCTION  [EOL — plan migration]
  ├── nginx-proxy + Let's Encrypt
  ├── Pinbox24 v31/v32/v41/v42 (production)
  ├── mailgun relay
  └── git-deploy
  NOTE: pdf-gen + wkhtml removable once bms-4 PDF services are live

vps-h1 (Hostinger) — DECOMMISSION after Phase 2
  └── (empty — all workloads migrated to bms-4)

+─────────────────────────────────────────────────────────────────────────────+
|  SAAS                                                                       |
+─────────────────────────────────────────────────────────────────────────────+

Supabase — fleet management DB, audit control plane, DevOps tables
Vercel — et-operational-platform (prod + staging), portal, et-lager
Cloudflare — DNS zintegrowana.online, WAF
Wasabi S3 — Thanos blocks, PDFs, backup status
GitHub — source control, CI/CD
Mailgun EU — alerts + transactional email
n8n.io Cloud — secondary automation instance

Migration sequence

Step 1 (IMMEDIATE): rs.remove() the dead arbiter, then rs.addArb("54.36.123.110:27017"); MongoDB 5.3+ rejects a second arbiter by default
         └── HUMAN ACTION — requires MongoDB admin password from bms-3
Step 2 (WEEK 1):    Migrate n8n from vps-h1 → bms-4 (compose file ready)
Step 3 (WEEK 1):    Update WAHA webhook URL → n8n.bms-4.infra.zintegrowana.online
Step 4 (WEEK 2):    Migrate WAHA from vps-h1 → bms-4
Step 5 (WEEK 2):    Decommission vps-h1 (cancel subscription)
Step 6 (WEEK 3):    Create infra-src/pdf-to-jpg microservice
Step 7 (WEEK 3):    Deploy pdf-service-p24 + pdf-to-jpg on bms-4
Step 8 (WEEK 3):    Update Pinbox24 Angular to use new PDF endpoints
Step 9 (WEEK 4):    Remove Convertio.ai subscription
Step 10 (WEEK 4):   Install AI-Dev-BMS4 agent + GitHub user
Step 11 (ONGOING):  Monitor bms-1 disk (CRITICAL), plan OS migration Ubuntu 20.04 → 24.04
Step 12 (ONGOING):  Install node_exporter on bms-2 and bms-3 for Prometheus coverage

Appendix: Open Issues

Issue	Server	Severity	Action required
bms-1 disk 100% full	bms-1	CRITICAL	`docker system prune -f`, identify largest directories, plan migration
bms-1 Ubuntu 20.04 EOL	bms-1	HIGH	Plan in-place upgrade or migration to bms-4-like server
WAHA at risk on overloaded vps-h1	vps-h1	HIGH	Migrate to bms-4 (Phase 2)
MongoDB arbiter not yet added to rs0	bms-4	HIGH	Human action: `rs.addArb()` from bms-3
n8n migration not yet executed	vps-h1/bms-4	HIGH	Execute migration checklist
bms-2 + bms-3 not in Prometheus	bms-2, bms-3	MEDIUM	Install node_exporter, add scrape targets
bms-3 MongoDB RAM at 21.7 GB	bms-3	MEDIUM	Monitor; alert when container RAM use approaches the ~10 GB left after MongoDB
n8n bms-4 compose has deprecated `N8N_RUNNERS_ENABLED=false`	bms-4	LOW	Remove from `bms-4/docker-compose.yml` (current branch fixes it on vps-h1)
Convertio.ai — external SaaS with document exposure risk	Pinbox24	MEDIUM	Phase 3: deploy pdf-to-jpg on bms-4
claude-admin not set up on bms-2 and bms-3	bms-2, bms-3	LOW	Follow setup instructions in ops workbooks
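For the bms-1 disk item, `du -xh --max-depth=1 / | sort -h` is the usual first look. A dependency-free Python equivalent is sketched below, assuming apparent file size (rather than the allocated blocks that `du` reports) is close enough to rank directories; like `du -x` it stays on one filesystem, and it skips unreadable paths:

```python
import os


def _tree_size(path: str, dev: int) -> int:
    """Sum apparent file sizes under path, staying on filesystem dev."""
    total = 0
    for dirpath, dirnames, filenames in os.walk(path, onerror=lambda err: None):
        # Prune subdirectories that live on another filesystem (mount points).
        kept = []
        for d in dirnames:
            try:
                if os.lstat(os.path.join(dirpath, d)).st_dev == dev:
                    kept.append(d)
            except OSError:
                pass
        dirnames[:] = kept
        for name in filenames:
            try:
                total += os.lstat(os.path.join(dirpath, name)).st_size
            except OSError:
                pass  # file vanished or is unreadable
    return total


def largest_children(root: str, top_n: int = 10) -> list[tuple[str, int]]:
    """Return the top_n direct children of root by apparent size in bytes."""
    dev = os.lstat(root).st_dev
    sizes = []
    with os.scandir(root) as entries:
        for entry in entries:
            try:
                st = entry.stat(follow_symlinks=False)
            except OSError:
                continue
            if st.st_dev != dev:
                continue
            if entry.is_dir(follow_symlinks=False):
                sizes.append((entry.path, _tree_size(entry.path, dev)))
            elif entry.is_file(follow_symlinks=False):
                sizes.append((entry.path, st.st_size))
    return sorted(sizes, key=lambda item: item[1], reverse=True)[:top_n]
```

Run `largest_children("/")` as root, then repeat on the largest entry to drill down.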
