p4-ovh-bms-1-ns367522 — Operations Workbook

Label: p4-ovh-bms-1-ns367522
Host: 94.23.26.113
Hostname: ns367522
Provider: OVH / Kimsufi (ns367522.ip-94-23-26.eu)
Hardware: Intel Core (8 logical CPUs) · 32 GB RAM · 440 GB RAID (/dev/md127)
OS: Ubuntu 20.04.1 LTS ⚠️ EOL April 2025
Role: Pinbox24 SaaS production backend (multi-version)
Inventoried: 2026-06-14


CRITICAL RISKS

These risks require immediate attention. Do not deprioritise.

Risk | Severity | Status
Ubuntu 20.04 EOL: no security patches since April 2025. Unpatched CVEs accumulate, and any exploited vulnerability exposes every production service on this host. | P1 CRITICAL | Unresolved
Disk 85% full: 354 GB used of 440 GB. As the remaining space is consumed, Docker image pulls will fail, container log writes may fail, and container restarts may abort. Further growth without cleanup means an outage. | P1 CRITICAL | Unresolved
No automated backup: MongoDB data, PostgreSQL, Redis and container volumes are not backed up off-server. The most recent dumps are manual, local-only and months old. Loss of the server or its RAID array means permanent production data loss. | P1 CRITICAL | Unresolved
Untagged Docker images: v32-prod-socket, v32-prod-reso and v41-prod run from local-only image IDs. A stopped container still restarts from its local image, but if one of these containers is removed, its image pruned, or the host lost, it cannot be recreated: there is no image to pull. | P1 CRITICAL | Unresolved

Server Role

Primary production server for the Pinbox24 SaaS platform. Runs multiple application versions simultaneously, because different clients are on different versions (v3.x legacy, v4.x current). Incoming HTTP(S) traffic is routed by jwilder/nginx-proxy, with Let’s Encrypt TLS certificates issued automatically by the nginx-proxy-letsencrypt companion container.
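nginx-proxy decides where to send a request from environment variables on each application container. By the usual nginx-proxy convention these are VIRTUAL_HOST (plus an optional VIRTUAL_PORT), and the Let's Encrypt companion reads LETSENCRYPT_HOST. A minimal sketch, assuming the containers on this host follow that convention, that prints the routing variables of every running container:

```shell
# Keep only the nginx-proxy / Let's Encrypt routing variables from KEY=VALUE lines on stdin.
routing_vars() {
  awk -F= '$1 == "VIRTUAL_HOST" || $1 == "VIRTUAL_PORT" || $1 == "LETSENCRYPT_HOST"'
}

# Print the routing variables of every running container (run on the host as root).
show_proxy_routing() {
  local name
  for name in $(docker ps --format '{{.Names}}'); do
    echo "== ${name}"
    docker inspect --format '{{range .Config.Env}}{{println .}}{{end}}' "$name" | routing_vars
  done
}
```

A container that prints no VIRTUAL_HOST line is not published through the proxy, which is a quick way to cross-check the Domain columns in the stack tables.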


SSH Access

Method | Command / access
Human (radieu) | ssh -i ~/.ssh/id_ed25519 root@94.23.26.113
Claude agent | ssh root@94.23.26.113 with VPS_SSH_PRIVATE_KEY (its public key is in /root/.ssh/authorized_keys)
OVH panel | IPMI/KVM via OVH manager, server ID 1823494
Portainer | http://94.23.26.113:49154 (Portainer v1, legacy; plain HTTP)

Running Application Stacks

v4.2 — Current Production (AWS ECR, ~3 months uptime)

Container | Domain | Role
v42-prod | api.w4.pinbox24.com | Main backend (Node.js)
s3-v42-prod | s3-api.w4.pinbox24.com | S3 microservice
s3-v2-v42-prod | s3-v2-api.w4.pinbox24.com | S3 v2 microservice
mailgun-v42-prod | mailgun-api.w4.pinbox24.com | Email microservice
v41-prod | w4.pinbox24.com | Frontend (local image ⚠️, 14 months uptime)

Registry: 563740926945.dkr.ecr.eu-central-1.amazonaws.com

v3.2 — Legacy (private registry / untagged images, 4–5 years uptime)

Container | Domain | Role
v31-prod | w3.pinbox24.com | Frontend (4 years)
v32-prod | api.w3.pinbox24.com | Backend (restarted 4 months ago)
v32-prod-socket | socket.w3.pinbox24.com | WebSocket backend (untagged image ⚠️)
v32-prod-reso | w3.reso-integration-addrecords.pinbox24.com | RESO integration (untagged image ⚠️)
s3-v32-prod | (none) | S3 microservice
s3-v32-prod-renamed | (none) | S3 alias
s3-v32-prod-socket | (none) | S3 for socket variant
s3-v32-prod-reso | (none) | S3 for RESO variant
cron-v32-prod | (none) | Cron jobs
cron-v32-prod-socket | (none) | Cron for socket variant
cron-v32-prod-reso | (none) | Cron for RESO variant

v4.2 Support Microservices (~5 years uptime)

Container | Domain | Role
pdf-gen-v42-prod | pdf-gen-api.w4.pinbox24.com | PDF generation
v42-notify-prod | api-notify.w4.pinbox24.com | Push notifications
wkhtml-v42-prod | (none) | wkhtmltopdf-as-a-service
git-deploy-v42-prod | git-deploy-api.w4.pinbox24.com | GitLab CI webhook / auto-deploy

Infrastructure Containers

Container | Port | Role
nginx-proxy | 80, 443 | Auto-routing reverse proxy
nginx-proxy-letsencrypt | (none) | TLS certificate automation
portainer-pinbox24 | 49154→9000 | Docker UI (Portainer v1, legacy ⚠️)

Deprecated (still running)

Container | Note
s3-v42-prod-02-25-old | Superseded Feb 2025, still running; safe to stop

Host-Native Services (outside Docker)

Service | Detail
PM2 v5.1.0 | NodeChat v1.0.0 at /temp/p24-v-3.2, port :3001; 4 years uptime, unclear if serving live traffic
Redis | 172.17.0.1:6379; Docker bridge only, not exposed publicly
PostgreSQL | 127.0.0.1:5432; local only; unknown data; not backed up
node_exporter | :9100; Prometheus metrics (added 2026-06-14)
Netdata v1.19.0 | 127.0.0.1:19999; local only, not integrated with Prometheus
GitLab runner | Active; CI/CD pipelines deploy via this runner
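For the PostgreSQL and Redis audit, the commands below give a first look at what each service holds and whether Redis persists anything to disk. A sketch, assuming redis-cli and psql are installed on the host, that Redis needs no password (add REDISCLI_AUTH or -a if it does), and that PostgreSQL has the default postgres superuser with peer authentication:

```shell
# Read one field from Redis INFO output on stdin (lines are "key:value\r\n").
info_field() {
  tr -d '\r' | awk -F: -v key="$1" '$1 == key { print $2 }'
}

# Redis: key counts per database and the state of RDB snapshots / AOF.
redis_audit() {
  local info
  redis-cli -h 172.17.0.1 -p 6379 INFO keyspace
  info=$(redis-cli -h 172.17.0.1 -p 6379 INFO persistence)
  echo "last RDB save (unix time): $(printf '%s\n' "$info" | info_field rdb_last_save_time)"
  echo "changes since last save:   $(printf '%s\n' "$info" | info_field rdb_changes_since_last_save)"
  echo "AOF enabled:               $(printf '%s\n' "$info" | info_field aof_enabled)"
}

# PostgreSQL: every non-template database with its on-disk size.
postgres_audit() {
  sudo -u postgres psql -Atc \
    "SELECT datname, pg_size_pretty(pg_database_size(datname)) FROM pg_database WHERE NOT datistemplate;"
}
```

An old rdb_last_save_time with aof_enabled:0 would mean Redis data exists only in memory, which changes how urgent its backup is.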

Port Map

Port | Service
22 | SSH
80 / 443 | nginx-proxy
3001 | PM2 NodeChat (host-native)
8081 | Unknown Node.js process
9100 | node_exporter
127.0.0.1:19999 | Netdata (local)
49154 | Portainer
172.17.0.1:6379 | Redis (Docker bridge)
127.0.0.1:5432 | PostgreSQL (local)

Image Registries

Registry | Used by
563740926945.dkr.ecr.eu-central-1.amazonaws.com | v4.2 production stack
private-registry.dev.pinbox24.com | v3.2 legacy stack (⚠️ reachability unknown)
Local image IDs (no registry) | v41-prod, v32-prod-socket, v32-prod-reso

To authenticate with AWS ECR:

aws ecr get-login-password --region eu-central-1 | \
  docker login --username AWS --password-stdin \
  563740926945.dkr.ecr.eu-central-1.amazonaws.com

AWS credentials for ECR are stored in Infisical bms-servers project. Never hardcode them.


Day-to-Day Operations

Check container status

docker ps
docker stats --no-stream
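docker ps only lists what is running, so a container that has quietly stopped is easy to miss. A minimal sketch that compares the running set with the containers listed in this workbook (the deprecated s3-v42-prod-02-25-old is left out on purpose); the expected list is copied from the stack tables and has to be kept in sync by hand:

```shell
# Containers this workbook expects to be running (kept in sync by hand).
EXPECTED_CONTAINERS="v42-prod s3-v42-prod s3-v2-v42-prod mailgun-v42-prod v41-prod
v31-prod v32-prod v32-prod-socket v32-prod-reso s3-v32-prod s3-v32-prod-renamed
s3-v32-prod-socket s3-v32-prod-reso cron-v32-prod cron-v32-prod-socket cron-v32-prod-reso
pdf-gen-v42-prod v42-notify-prod wkhtml-v42-prod git-deploy-v42-prod
nginx-proxy nginx-proxy-letsencrypt portainer-pinbox24"

# Print every expected name absent from the running list ($1: newline-separated names).
missing_containers() {
  local running=$1 name
  for name in $EXPECTED_CONTAINERS; do
    printf '%s\n' "$running" | grep -qxF "$name" || echo "$name"
  done
}
```

On the host: missing_containers "$(docker ps --format '{{.Names}}')". Empty output means everything listed here is up.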

Check disk usage

df -h
du -sh /root/* 2>/dev/null | sort -rh | head -20
docker system df
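To turn the df check into a yes/no signal (for cron or a quick shell check), a sketch that reads the Use% column for one mount point. It assumes the RAID array is mounted at / and uses a hypothetical 85% threshold. Note that df's Use% is used / (used + available) and leaves out root-reserved blocks, so it can read higher than used / size (354 of 440 GB is about 80%).

```shell
# Extract the Use% number from "df -P" output on stdin (second line, fifth column).
use_pct_from_df() {
  awk 'NR == 2 { sub(/%$/, "", $5); print $5 }'
}

# Print a status line; return non-zero when the mount is at or above the threshold.
check_disk() {
  local mount=${1:-/} threshold=${2:-85} pct
  pct=$(df -P "$mount" | use_pct_from_df)
  if [ "$pct" -ge "$threshold" ]; then
    echo "WARNING: $mount at ${pct}% (threshold ${threshold}%)"
    return 1
  fi
  echo "OK: $mount at ${pct}%"
}
```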

View container logs

docker logs --tail 100 <container-name>
docker logs --since 1h <container-name>
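docker logs replays the container's stderr on stderr, so it has to be redirected (2>&1) before piping into grep, otherwise most error output bypasses the filter. A sketch that counts error-looking lines in the recent logs of each running container; the "error|exception" pattern is a rough assumption and will over- or under-count depending on each service's log format.

```shell
# Count lines on stdin that look like errors (case-insensitive "error" or "exception").
count_error_lines() {
  grep -ciE 'error|exception'
}

# Print "<count> <container>" for each running container, highest count first.
recent_errors() {
  local since=${1:-1h} name
  for name in $(docker ps --format '{{.Names}}'); do
    echo "$(docker logs --since "$since" "$name" 2>&1 | count_error_lines) $name"
  done | sort -rn
}
```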

Restart a container

docker restart <container-name>

Start/stop a specific Pinbox24 version

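# Stop part of the v3.2 stack (example; the full stack also includes the
# -socket, -reso, s3-* and cron-* variant containers)
docker stop v31-prod v32-prod cron-v32-prod

# Start an individual container
docker start v42-prod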
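Stopping a container is safe even when its image is untagged, because docker start reuses the local image. The danger comes when a container is removed and recreated, or its image pruned. A sketch to run before any rm/recreate: it checks whether the container's configured image name still resolves locally to the exact image it runs. "ok" only means the name resolves on this host; whether it can be re-pulled from its registry (e.g. private-registry.dev.pinbox24.com) is a separate question.

```shell
# Classify a container's image as "ok" (its configured name still resolves to the
# image it runs) or "at-risk" (no name points at that image any more).
# $1: image ID the container runs, $2: configured image reference,
# $3: image ID that reference resolves to locally (empty if it resolves to nothing).
image_risk() {
  local running_id=$1 ref=$2 ref_id=$3
  case "$ref" in
    sha256:*) echo "at-risk"; return ;;   # created from a bare image ID
    *[!0-9a-f]*) ;;                       # contains non-hex characters: a real name
    *) echo "at-risk"; return ;;          # all-hex short ID (or empty)
  esac
  if [ -n "$ref_id" ] && [ "$running_id" = "$ref_id" ]; then
    echo "ok"
  else
    echo "at-risk"
  fi
}

# Report one container as: name, configured reference, verdict.
check_container_image() {
  local name=$1 running ref ref_id
  running=$(docker inspect --format '{{.Image}}' "$name") || return 1
  ref=$(docker inspect --format '{{.Config.Image}}' "$name")
  ref_id=$(docker image inspect --format '{{.Id}}' "$ref" 2>/dev/null)
  printf '%s\t%s\t%s\n' "$name" "$ref" "$(image_risk "$running" "$ref" "$ref_id")"
}
```

Usage: for c in v41-prod v32-prod-socket v32-prod-reso; do check_container_image "$c"; done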

Disk Management (CRITICAL)

Current state (2026-06-14): 354 GB used of 440 GB (85%). Large consumers in /root:

Path | Size | Notes
w3-2026-02-05 | ~25 GB | MongoDB dump; keep until an off-server backup exists
w4-2026-02-23 | ~19 GB | MongoDB dump; keep until an off-server backup exists
w4-2026-02-24 | ~16 GB | MongoDB dump; keep until an off-server backup exists
backup_eat1 | ~32 GB | TBD; review before deleting
pinbox24-production | ~25 GB | TBD; review before deleting
Various old backups | ~100 GB | Candidate for Wasabi offload or deletion

Emergency disk cleanup steps

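# Step 1: Remove only dangling images (untagged AND unused by any container: safe)
docker image prune
# Review stopped containers before removing anything: some may be parked versions
# whose images exist nowhere else. Do not run "docker image prune -a",
# "docker container prune" or "docker volume prune" here; together they can delete
# the only copy of an untagged image or of a container's data volume.
docker ps -a --filter status=exited --format '{{.Names}}\t{{.Image}}\t{{.Status}}'

# Step 2: Delete rotated and compressed logs in /var/log (active log files are kept)
find /var/log -type f -name "*.gz" -delete
find /var/log -type f -name "*.1" -delete
journalctl --vacuum-size=500M

# Step 3: Truncate oversized Docker container logs (last resort: loses historical logs)
find /var/lib/docker/containers -name "*-json.log" -size +500M -exec ls -lh {} \;
find /var/lib/docker/containers -name "*-json.log" -size +500M -exec truncate -s 0 {} \;

# Step 4: Move MongoDB dumps to Wasabi S3 before deleting them
# Use aws s3 cp or rclone; authenticate with the Infisical bms-servers credentials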

Warning: Do not delete the MongoDB dumps in /root until they are confirmed uploaded to Wasabi S3 and verified. These are the only existing off-container backups.
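A minimal sketch of "upload, then verify" for one dump directory, assuming the AWS CLI is installed and the Wasabi keys from Infisical are exported as AWS_ACCESS_KEY_ID / AWS_SECRET_ACCESS_KEY. The bucket name and Wasabi region below are hypothetical placeholders. Matching total byte counts is a coarse check, not a checksum, but it catches missing or truncated files before anything is deleted locally.

```shell
# Hypothetical values: replace with the real bucket and Wasabi region.
WASABI_ENDPOINT="https://s3.eu-central-1.wasabisys.com"
WASABI_BUCKET="s3://example-bms1-offload"

# Total size in bytes of all regular files under a local directory (GNU find).
local_total_bytes() {
  find "$1" -type f -printf '%s\n' | awk '{ s += $1 } END { print s + 0 }'
}

# Extract "Total Size" from "aws s3 ls --recursive --summarize" output on stdin.
remote_total_bytes() {
  awk '/Total Size:/ { print $3 }'
}

# Upload one dump directory, then compare byte totals before anything is deleted.
offload_dir() {
  local dir=${1%/} name local_bytes remote_bytes
  name=$(basename "$dir")
  aws s3 cp "$dir" "$WASABI_BUCKET/$name/" --recursive --endpoint-url "$WASABI_ENDPOINT" || return 1
  local_bytes=$(local_total_bytes "$dir")
  remote_bytes=$(aws s3 ls "$WASABI_BUCKET/$name/" --recursive --summarize \
    --endpoint-url "$WASABI_ENDPOINT" | remote_total_bytes)
  if [ "$local_bytes" = "$remote_bytes" ]; then
    echo "OK: $name ($local_bytes bytes) matches remote"
  else
    echo "MISMATCH: $name local=$local_bytes remote=${remote_bytes:-none}"
    return 1
  fi
}
```

Usage: offload_dir /root/w3-2026-02-05, and delete the local copy only after it prints OK.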


Deploy Procedure

Deployments are triggered via the git-deploy-v42-prod container, which receives GitLab CI webhooks. New image versions are pushed to AWS ECR by the CI pipeline, then pulled here.

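# Manual image pull (if auto-deploy fails):
docker pull 563740926945.dkr.ecr.eu-central-1.amazonaws.com/<image>:<tag>
# Record the running container's full config (env vars, ports, volumes) before replacing it
docker inspect v42-prod > /root/v42-prod-inspect-$(date +%F).json
docker stop v42-prod
# The name must be free before docker run; rename instead of rm to keep a rollback copy
docker rename v42-prod v42-prod-previous
docker run -d --name v42-prod ... <new-image>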

Full deploy config lives in the GitLab pipeline. Contact Pinbox24 dev team for CI details.
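The "..." in the manual docker run stands for the original container's options, which are not recorded in this workbook. One way to avoid retyping environment variables is to copy them from the previous container into an --env-file. A sketch, assuming the previous container still exists (stopped or renamed); the list of image-provided variables to drop is an assumption for Node.js base images, and ports, volumes and networks still have to be copied from docker inspect by hand.

```shell
# Drop variables that normally come from the image itself, so the new image's own
# values win. The list is an assumption; extend it for other base images.
strip_image_env() {
  grep -vE '^(PATH|HOME|HOSTNAME|NODE_VERSION|YARN_VERSION)='
}

# Write a container's environment to a file usable with "docker run --env-file".
# Env-files hold one KEY=VALUE per line, so multi-line values are not preserved.
dump_env_file() {
  local container=$1 out=$2
  (
    umask 077   # the file may contain secrets
    docker inspect --format '{{range .Config.Env}}{{println .}}{{end}}' "$container" \
      | grep -v '^$' | strip_image_env > "$out"
  )
}
```

Usage: dump_env_file <old-container> /root/v42-prod.env, then docker run -d --name v42-prod --env-file /root/v42-prod.env ... <new-image>.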


Monitoring

Component | Detail
node_exporter (:9100) | Scraped by Prometheus on vps-i1 (added 2026-06-14)
Grafana dashboard | Servers Overview: disk %, RAM, CPU, uptime
Netdata (127.0.0.1:19999) | Local only; not integrated with Prometheus
Uptime | Tracked in Grafana via the node_exporter uptime metric
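The same disk data Prometheus scrapes can be read straight from node_exporter on the host, which helps when checking whether a Grafana disk panel matches reality. A sketch, assuming node_exporter exposes its standard node_filesystem_avail_bytes metric and listens on 127.0.0.1:9100 as well as the public address:

```shell
# Print free bytes for one mount point from node_exporter metrics text on stdin.
avail_bytes_for_mount() {
  awk -v want="mountpoint=\"$1\"" '
    index($1, "node_filesystem_avail_bytes{") == 1 && index($1, want) > 0 {
      printf "%.0f\n", $NF
    }'
}

# Query the local exporter directly (default mount point: /).
local_avail_bytes() {
  curl -s http://127.0.0.1:9100/metrics | avail_bytes_for_mount "${1:-/}"
}
```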

Backup Gap

No automated backup exists for any data on bms-1. This includes:

  • MongoDB databases (v3 and v4 production data) — local dumps in /root only
  • PostgreSQL (host-native) — no backup at all
  • Redis (host-native) — no backup at all
  • Docker container volumes — no backup at all

Priority action: set up automated Wasabi offloads. See 01-backups.md.
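A minimal sketch of a daily offload that streams dumps straight to Wasabi, so nothing extra lands on the nearly full disk. Assumptions: mongodump, pg_dump and the AWS CLI are installed; where MongoDB runs and its connection URIs are not documented here, so they are passed in as arguments; the bucket, region and object-key layout are hypothetical. Redis and Docker volumes are not covered.

```shell
# Object key for one dump: <engine>/<name>/<YYYY-MM-DD>.<ext> (layout is an assumption).
backup_key() {
  printf '%s/%s/%s.%s\n' "$1" "$2" "$3" "$4"
}

# Hypothetical values: real bucket, region and connection strings come from Infisical.
WASABI_ENDPOINT="https://s3.eu-central-1.wasabisys.com"
WASABI_BUCKET="s3://example-bms1-backups"

# Stream a gzipped mongodump archive to Wasabi ($1: short name, $2: MongoDB URI).
# A failed dump can still leave a truncated object behind, so check the exit status.
backup_mongo() (
  set -o pipefail   # report failure if mongodump fails, not only if the upload does
  name=$1 uri=$2
  day=$(date +%F)
  mongodump --uri="$uri" --archive --gzip \
    | aws s3 cp - "$WASABI_BUCKET/$(backup_key mongodb "$name" "$day" archive.gz)" \
        --endpoint-url "$WASABI_ENDPOINT"
)

# Same idea for one database on the host PostgreSQL (custom-format dump).
backup_postgres() (
  set -o pipefail
  db=$1
  day=$(date +%F)
  sudo -u postgres pg_dump -Fc "$db" \
    | aws s3 cp - "$WASABI_BUCKET/$(backup_key postgres "$db" "$day" dump)" \
        --endpoint-url "$WASABI_ENDPOINT"
)
```

Run from cron once a day. For streams larger than 50 GB, aws s3 cp needs --expected-size; retention on the bucket side is a separate decision.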


Open Tasks

Critical

  • Untagged image protection — Export v41-prod, v32-prod-socket, v32-prod-reso images to Wasabi S3 or rebuild them. A stopped container restarts from its local image, but if these containers are removed or their images pruned, they cannot be recreated.
  • Disk cleanup — Offload 100+ GB of old backups in /root to Wasabi S3, verify, then delete locally.
  • Automated backup — Implement daily MongoDB dumps → Wasabi S3. Extend to PostgreSQL and Redis.
  • OS upgrade — Ubuntu 20.04 EOL. Migrate to 24.04; an in-place upgrade has to go through 22.04 first. Requires a coordinated maintenance window.
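For the untagged-image task, giving each image a name and saving it to a tarball removes the "no image to pull" problem without touching the running containers. A sketch; the pinbox24-local/ repository prefix and the /root/image-snapshots directory are hypothetical. The tarballs land on the nearly full disk, so check image sizes (docker image ls) first, or stream the docker save output to Wasabi instead.

```shell
# Hypothetical naming scheme for rescue tags: pinbox24-local/<container>:snapshot-<YYYYMMDD>.
snapshot_ref() {
  printf 'pinbox24-local/%s:snapshot-%s\n' "$1" "$2"
}

# Tag the image a container runs, then save it as a gzipped tarball.
snapshot_image() (
  set -o pipefail   # fail if docker save fails, not only if gzip does
  name=$1
  outdir=${2:-/root/image-snapshots}
  id=$(docker inspect --format '{{.Image}}' "$name") || exit 1
  ref=$(snapshot_ref "$name" "$(date +%Y%m%d)")
  docker tag "${id#sha256:}" "$ref" || exit 1
  mkdir -p "$outdir"
  docker save "$ref" | gzip > "$outdir/$name.tar.gz"
)
```

Usage: for c in v41-prod v32-prod-socket v32-prod-reso; do snapshot_image "$c"; done. Restore on any Docker host with gunzip -c /root/image-snapshots/v41-prod.tar.gz | docker load.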

High

  • PM2 investigation — Clarify if NodeChat v1.0.0 on :3001 serves live traffic.
  • PostgreSQL + Redis audit — Identify what data they hold, who uses them, implement backup.
  • Port :8081 investigation — Unknown Node.js process. Identify and document or remove.
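For the PM2 and :8081 investigations, a sketch that shows which process owns a listening port, where it runs from, and how many client connections it holds right now. Run it as root, otherwise ss cannot show process IDs owned by other users. A single established-connection count is only a snapshot; repeating it over a day, or checking nginx-proxy access logs for the matching upstream, gives a better answer to "does it serve live traffic".

```shell
# Extract the first pid=NNN from "ss -p" output on stdin.
pid_from_ss() {
  grep -o 'pid=[0-9]*' | head -n 1 | cut -d= -f2
}

# Show the process listening on a TCP port, its working directory and current load.
inspect_port() {
  local port=$1 pid
  pid=$(ss -Hltnp "sport = :$port" | pid_from_ss)
  if [ -z "$pid" ]; then
    echo "nothing listening on :$port"
    return 1
  fi
  ps -o pid,user,lstart,args -p "$pid"
  echo "cwd: $(readlink "/proc/$pid/cwd")"
  echo "established connections: $(ss -Htn state established "sport = :$port" | wc -l)"
}
```

Usage: inspect_port 8081 and inspect_port 3001.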

Medium

  • Deprecated container removal — Stop and remove s3-v42-prod-02-25-old.
  • Portainer v1 upgrade — Running Portainer v1 (5 years). Upgrade to CE v2+.
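  • Firewall policy — ufw is inactive; only an iptables rule for node_exporter has been added. Implement a full policy. Ports published by Docker (such as 49154 for Portainer) bypass ufw rules, so restrictions on them belong in the DOCKER-USER iptables chain.
  • Log rotation — /var/log is at ~16 GB. Configure logrotate, and cap Docker container logs with the json-file max-size / max-file options.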
  • GitLab runner review — Clarify if git-deploy and GitLab runner are running in parallel.
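Ports published by Docker are forwarded through Docker's own iptables chains, so ufw (which filters the INPUT chain) never sees them; Docker provides the DOCKER-USER chain for user rules. A sketch that prints, rather than applies, a rule restricting a published port to one source address, so it can be reviewed first. The interface name and admin address are hypothetical (203.0.113.10 is a documentation address). The rule matches --ctorigdstport because DOCKER-USER sees packets after Docker's DNAT, when the destination port is already the container's (9000 for Portainer), not 49154.

```shell
# Print an iptables rule that drops traffic arriving on $1 for Docker-published port $2
# unless it comes from $3. Only forwarded (container) traffic is affected; SSH to the
# host goes through the INPUT chain and is untouched.
docker_user_drop_rule() {
  local iface=$1 port=$2 allowed=$3
  printf 'iptables -I DOCKER-USER -i %s -p tcp -m conntrack --ctorigdstport %s ! -s %s -j DROP\n' \
    "$iface" "$port" "$allowed"
}
```

Rules added this way do not survive a reboot unless they are persisted (for example with iptables-persistent).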

Low / Future

  • v3.x sunset planning — Identify clients on v3.x, plan migration to v4.x.
  • Registry consolidation — Clarify private-registry.dev.pinbox24.com reachability and strategy.
  • Netdata integration — Expose Netdata to Prometheus or replace with node_exporter dashboards.
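For the registry question, the Docker Registry HTTP API v2 answers GET /v2/ with 200 (open) or 401 (authentication required) when the registry is up, so one request separates "reachable" from "gone". A sketch, assuming the registry is served over HTTPS on the default port; a reachable registry can still be missing the v3.2 images, which only a docker pull with credentials confirms.

```shell
# Interpret the HTTP status returned by a registry's /v2/ endpoint.
registry_status() {
  case "$1" in
    200) echo "reachable (no auth)" ;;
    401) echo "reachable (auth required)" ;;
    000) echo "unreachable (DNS, TCP or TLS failure)" ;;
    *)   echo "unexpected HTTP $1" ;;
  esac
}

# Probe a registry host over HTTPS with a 10-second timeout.
check_registry() {
  local host=$1 code
  code=$(curl -s -o /dev/null -m 10 -w '%{http_code}' "https://$host/v2/")
  echo "$host: $(registry_status "$code")"
}
```

Usage: check_registry private-registry.dev.pinbox24.com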

Known Limitations

  • No IaC — server was provisioned manually. No Ansible playbook exists to reproduce it.
  • Long-lived containers — some containers have never been recreated (4–5 years). Restarting them is risky if their image is untagged.
  • EOL OS — no kernel or package security patches are available without an Ubuntu Pro (ESM) subscription. Any CVE that needs a patch stays unmitigated until the OS is upgraded or ESM is enabled.
  • No DR plan — if this server is destroyed, recovery is partial at best: it depends on which MongoDB dumps have been copied off-server, and the current dumps are local only.
  • GitLab dependency — deploy automation depends on GitLab CI. GitLab account access is required for deployments.