p4-ovh-bms-1-ns367522 — Operations Workbook
- Label: p4-ovh-bms-1-ns367522
- Host: 94.23.26.113
- Hostname: ns367522
- Provider: OVH / Kimsufi (ns367522.ip-94-23-26.eu)
- Hardware: Intel Core (8 vCPU) · 32 GB RAM · 440 GB RAID (/dev/md127)
- OS: Ubuntu 20.04.1 LTS ⚠️ EOL April 2025
- Role: Pinbox24 SaaS production backend (multi-version)
- Inventoried: 2026-06-14
CRITICAL RISKS
These risks require immediate attention. Do not deprioritise.
| Risk | Severity | Status |
|---|---|---|
| Ubuntu 20.04 EOL — No security patches since April 2025. CVEs accumulate daily. Any exploited vulnerability = total production compromise. | P1 CRITICAL | Unresolved |
| Disk 85% full — 354 GB used of 440 GB. Cannot pull new Docker images, container log writes may fail, and container restarts may abort. Further growth without cleanup = outage. | P1 CRITICAL | Unresolved |
| No automated backup — MongoDB data, PostgreSQL, Redis, container volumes — none are backed up off-server. The most recent dumps are manual, local-only, and months old. A disk failure = permanent production data loss. | P1 CRITICAL | Unresolved |
| Untagged Docker images — v32-prod-socket, v32-prod-reso, v41-prod run from local-only image IDs. If one of these containers or its image is removed (for example by a prune), it cannot be recreated: there is no registry copy to pull. | P1 CRITICAL | Unresolved |
Server Role
Primary production server for the Pinbox24 SaaS platform. Runs multiple application versions
simultaneously — different clients are on different versions (v3.x legacy, v4.x current).
Managed via jwilder/nginx-proxy with automatic Let’s Encrypt TLS.
SSH Access
| Method | Command |
|---|---|
| Human (radieu) | ssh root@94.23.26.113 — key ~/.ssh/id_ed25519 |
| Claude agent | ssh root@94.23.26.113 — key from secret VPS_SSH_PRIVATE_KEY; the matching public key is in /root/.ssh/authorized_keys |
| OVH panel | IPMI/KVM via OVH manager — server ID 1823494 |
| Portainer | http://94.23.26.113:49154 (Portainer v1 legacy) |
Running Application Stacks
v4.2 — Current Production (AWS ECR, ~3 months uptime)
| Container | Domain | Role |
|---|---|---|
| v42-prod | api.w4.pinbox24.com | Main backend (Node.js) |
| s3-v42-prod | s3-api.w4.pinbox24.com | S3 microservice |
| s3-v2-v42-prod | s3-v2-api.w4.pinbox24.com | S3 v2 microservice |
| mailgun-v42-prod | mailgun-api.w4.pinbox24.com | Email microservice |
| v41-prod | w4.pinbox24.com | Frontend (local image ⚠️, 14 months uptime) |
Registry: 563740926945.dkr.ecr.eu-central-1.amazonaws.com
v3.2 — Legacy (private registry / untagged images, 4–5 years uptime)
| Container | Domain | Role |
|---|---|---|
| v31-prod | w3.pinbox24.com | Frontend (4 years) |
| v32-prod | api.w3.pinbox24.com | Backend (restarted 4 months ago) |
| v32-prod-socket | socket.w3.pinbox24.com | WebSocket backend (untagged image ⚠️) |
| v32-prod-reso | w3.reso-integration-addrecords.pinbox24.com | RESO integration (untagged image ⚠️) |
| s3-v32-prod | — | S3 microservice |
| s3-v32-prod-renamed | — | S3 alias |
| s3-v32-prod-socket | — | S3 for socket variant |
| s3-v32-prod-reso | — | S3 for RESO variant |
| cron-v32-prod | — | Cron jobs |
| cron-v32-prod-socket | — | Cron for socket variant |
| cron-v32-prod-reso | — | Cron for RESO variant |
v4.2 Support Microservices (~5 years uptime)
| Container | Domain | Role |
|---|---|---|
| pdf-gen-v42-prod | pdf-gen-api.w4.pinbox24.com | PDF generation |
| v42-notify-prod | api-notify.w4.pinbox24.com | Push notifications |
| wkhtml-v42-prod | — | wkhtmltopdf-as-a-service |
| git-deploy-v42-prod | git-deploy-api.w4.pinbox24.com | GitLab CI webhook / auto-deploy |
Infrastructure Containers
| Container | Port | Role |
|---|---|---|
| nginx-proxy | 80, 443 | Auto-routing reverse proxy |
| nginx-proxy-letsencrypt | — | TLS cert automation |
| portainer-pinbox24 | 49154→9000 | Docker UI (Portainer v1, legacy ⚠️) |
Deprecated (still running)
| Container | Note |
|---|---|
| s3-v42-prod-02-25-old | Superseded Feb 2025, still running — safe to stop |
Host-Native Services (outside Docker)
| Service | Detail |
|---|---|
| PM2 v5.1.0 | NodeChat v1.0.0 at /temp/p24-v-3.2, port :3001 — 4 years uptime, unclear if serving live traffic |
| Redis | 172.17.0.1:6379 — Docker bridge only, not exposed publicly |
| PostgreSQL | 127.0.0.1:5432 — local only; unknown data; not backed up |
| node_exporter | :9100 — Prometheus metrics (added 2026-06-14) |
| Netdata v1.19.0 | 127.0.0.1:19999 — local only, not integrated with Prometheus |
| GitLab runner | Active — CI/CD pipelines deploy via this runner |
Port Map
| Port | Service |
|---|---|
| 22 | SSH |
| 80 / 443 | nginx-proxy |
| 3001 | PM2 NodeChat (host-native) |
| 8081 | Unknown Node.js process |
| 9100 | node_exporter |
| 19999 | Netdata |
| 49154 | Portainer |
| 172.17.0.1:6379 | Redis (Docker bridge) |
| 127.0.0.1:5432 | PostgreSQL (local) |
Image Registries
| Registry | Used by |
|---|---|
| 563740926945.dkr.ecr.eu-central-1.amazonaws.com | v4.2 production stack |
| private-registry.dev.pinbox24.com | v3.2 legacy stack (⚠️ reachability unknown) |
| Local image IDs (no registry) | v41-prod, v32-prod-socket, v32-prod-reso |
To authenticate with AWS ECR:
aws ecr get-login-password --region eu-central-1 | \
docker login --username AWS --password-stdin \
563740926945.dkr.ecr.eu-central-1.amazonaws.com
AWS credentials for ECR are stored in the Infisical bms-servers project. Never hardcode them.
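The login step can be wrapped in a small helper so the registry host is built in one place. This is a minimal sketch: the function names ecr_registry_host and ecr_login are hypothetical, and it assumes the AWS credentials are already present in the environment (for example injected from Infisical) rather than stored in the script. ECR login tokens expire after 12 hours, so the login has to be repeated before later pulls.

```shell
# ecr_registry_host ACCOUNT_ID REGION: print the ECR registry hostname.
ecr_registry_host() {
  printf '%s.dkr.ecr.%s.amazonaws.com\n' "$1" "$2"
}

# ecr_login ACCOUNT_ID REGION: pipe a short-lived token into docker login.
# No credential is written into this script; the aws CLI reads them from
# the environment.
ecr_login() {
  aws ecr get-login-password --region "$2" |
    docker login --username AWS --password-stdin "$(ecr_registry_host "$1" "$2")"
}

# Usage:
#   ecr_login 563740926945 eu-central-1
```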
Day-to-Day Operations
Check container status
docker ps
docker stats --no-stream
Check disk usage
df -h
du -sh /root/* 2>/dev/null | sort -rh | head -20
docker system df
View container logs
docker logs --tail 100 <container-name>
docker logs --since 1h <container-name>
Restart a container
docker restart <container-name>
Start/stop a specific Pinbox24 version
# Stop the core v3.2 containers (example; the socket, RESO, and S3 variants are separate containers)
docker stop v31-prod v32-prod cron-v32-prod
# Start individual container
docker start v42-prod
Disk Management (CRITICAL)
Current state (2026-06-14): 354 GB used of 440 GB (85%). Large consumers in /root:
| Path | Size | Notes |
|---|---|---|
| w3-2026-02-05 | ~25 GB | MongoDB dump — keep until off-server backup exists |
| w4-2026-02-23 | ~19 GB | MongoDB dump — keep until off-server backup exists |
| w4-2026-02-24 | ~16 GB | MongoDB dump — keep until off-server backup exists |
| backup_eat1 | ~32 GB | TBD — review before deleting |
| pinbox24-production | ~25 GB | TBD — review before deleting |
| Various old backups | ~100 GB | Candidate for Wasabi offload or deletion |
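A small check like the following could run from cron to flag the disk before it fills. It is a sketch, not a script that exists on the server: the function names and the limit passed to check_disk are assumptions, and it reads the Capacity column that df -P prints.

```shell
# parse_df_pct: read `df -P` output on stdin and print the Capacity value of
# the first filesystem line as a bare integer (for example 85).
parse_df_pct() {
  awk 'NR == 2 { sub(/%/, "", $5); print $5 }'
}

# disk_use_pct PATH: usage percentage of the filesystem that holds PATH.
disk_use_pct() {
  df -P "$1" | parse_df_pct
}

# check_disk PATH LIMIT: print a status line; return 1 at or above LIMIT percent.
check_disk() {
  cd_pct=$(disk_use_pct "$1") || return 2
  if [ "$cd_pct" -ge "$2" ]; then
    echo "WARN: $1 at ${cd_pct}% (limit $2%)"
    return 1
  fi
  echo "OK: $1 at ${cd_pct}% (limit $2%)"
}

# Usage:
#   check_disk / 80
```

The non-zero return code lets a cron wrapper or alerting hook react without parsing the message.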
Emergency disk cleanup steps
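# Step 1: Remove dangling images (safe). Avoid -a here: it also deletes every image
# that no existing container uses, and local-only images cannot be pulled again.
docker image prune
# Caution: the next two commands delete stopped containers and unused volumes.
# Nothing on this server is backed up, so review docker ps -a and docker volume ls first.
docker container prune
docker volume prune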
# Step 2: Clear large log files in /var/log
find /var/log -type f -name "*.gz" -delete
find /var/log -type f -name "*.1" -delete
# Step 3: List Docker container log files over 500 MB (this command only lists them)
find /var/lib/docker/containers -name "*.log" -size +500M
# Last resort, loses historical logs: truncate -s 0 <file>
# Step 4: Move MongoDB dumps to Wasabi S3 before deleting
# Use aws s3 cp or rclone — authenticate via Infisical bms-servers credentials
Warning: Do not delete the MongoDB dumps in /root until they are confirmed uploaded to Wasabi S3 and verified. These are the only existing off-container backups.
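One way to make "verified" concrete is to record checksums before the upload and re-check them against a restored copy. The sketch below assumes each dump is a directory of files; make_manifest and verify_manifest are hypothetical helper names, and BACKUP_BUCKET and WASABI_ENDPOINT are placeholders for values held in Infisical.

```shell
# make_manifest DIR MANIFEST: record a SHA-256 checksum for every file under DIR.
make_manifest() {
  ( cd "$1" && find . -type f -exec sha256sum {} + ) > "$2"
}

# verify_manifest DIR MANIFEST: re-check the files under DIR against MANIFEST.
# MANIFEST must be an absolute path because the check runs from inside DIR.
verify_manifest() {
  ( cd "$1" && sha256sum --check --quiet "$2" )
}

# Usage sketch (bucket and endpoint are placeholders):
#   make_manifest /root/w4-2026-02-24 /root/w4-2026-02-24.sha256
#   aws s3 sync /root/w4-2026-02-24 "s3://$BACKUP_BUCKET/w4-2026-02-24/" \
#     --endpoint-url "$WASABI_ENDPOINT"
#   aws s3 cp /root/w4-2026-02-24.sha256 "s3://$BACKUP_BUCKET/" \
#     --endpoint-url "$WASABI_ENDPOINT"
```

Because this disk is nearly full, the re-check is better run on another machine: download the uploaded copy there and call verify_manifest against the same manifest before deleting anything locally.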
Deploy Procedure
Deployments are triggered via the git-deploy-v42-prod container, which receives GitLab CI
webhooks. New image versions are pushed to AWS ECR by the CI pipeline, then pulled here.
# Manual image pull (if auto-deploy fails):
docker pull 563740926945.dkr.ecr.eu-central-1.amazonaws.com/<image>:<tag>
docker stop v42-prod
# The stopped container still holds the name v42-prod; rename it to keep a rollback copy
docker rename v42-prod v42-prod-previous
docker run -d --name v42-prod ... <new-image>
Full deploy config lives in the GitLab pipeline. Contact the Pinbox24 dev team for CI details.
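For a manual deploy, the sequence can be wrapped so that a failed start puts the previous container back. This is only a sketch under assumptions: redeploy is a hypothetical helper, the -previous suffix is an arbitrary naming choice, and the real docker run arguments must still be taken from the GitLab pipeline because they are not recorded here.

```shell
# redeploy NAME IMAGE [extra docker run arguments...]
# Keeps the old container as NAME-previous so a failed start can be rolled back.
redeploy() {
  rd_name=$1
  rd_image=$2
  shift 2

  docker pull "$rd_image" || return 1
  docker stop "$rd_name" || return 1

  # Free the name; if that fails, bring the old container straight back.
  if ! docker rename "$rd_name" "$rd_name-previous"; then
    docker start "$rd_name"
    return 1
  fi

  if ! docker run -d --name "$rd_name" "$@" "$rd_image"; then
    # Roll back: drop the half-created container, then restore the old one.
    docker rm -f "$rd_name" >/dev/null 2>&1
    docker rename "$rd_name-previous" "$rd_name" && docker start "$rd_name"
    return 1
  fi
}

# Usage (run arguments are whatever the pipeline normally passes):
#   redeploy v42-prod 563740926945.dkr.ecr.eu-central-1.amazonaws.com/<image>:<tag> <run args>
# Once the new container is confirmed healthy:
#   docker rm v42-prod-previous
```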
Monitoring
| Component | Detail |
|---|---|
| node_exporter :9100 | Prometheus scrape from vps-i1 (added 2026-06-14) |
| Grafana dashboard | Servers Overview — disk %, RAM, CPU, uptime |
| Netdata 127.0.0.1:19999 | Local only — not integrated with Prometheus |
| Uptime | Tracked via Grafana node_exporter uptime metric |
Backup Gap
No automated backup exists for any data on bms-1. This includes:
- MongoDB databases (v3 and v4 production data) — local dumps in /root only
- PostgreSQL (host-native) — no backup at all
- Redis (host-native) — no backup at all
- Docker container volumes — no backup at all
Priority action: set up automated Wasabi offloads. See 01-backups.md.
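As a starting point for the MongoDB part, a daily job might look like the sketch below. Everything environment-specific is an assumption: MONGO_URI, BACKUP_BUCKET, and WASABI_ENDPOINT are placeholder variables expected from Infisical, the helper names are hypothetical, and the inventory does not record where MongoDB listens. The local archive is removed only after the upload succeeds, which keeps the job from adding to the disk pressure.

```shell
# dump_name LABEL DATE: file name for one dump,
# for example mongo-w4-2026-06-14.archive.gz
dump_name() {
  printf 'mongo-%s-%s.archive.gz\n' "$1" "$2"
}

# backup_mongo LABEL OUTDIR: dump, upload, and only then remove the local copy.
# Reads MONGO_URI, BACKUP_BUCKET and WASABI_ENDPOINT from the environment.
backup_mongo() {
  bm_file="$2/$(dump_name "$1" "$(date +%F)")"
  mongodump --uri="$MONGO_URI" --archive="$bm_file" --gzip || return 1
  aws s3 cp "$bm_file" "s3://$BACKUP_BUCKET/mongo/" \
    --endpoint-url "$WASABI_ENDPOINT" || return 1
  rm -f "$bm_file"
}

# Example cron entry (hypothetical wrapper script path), daily at 02:30:
#   30 2 * * * /usr/local/bin/backup-mongo.sh w4 /var/tmp
```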
Open Tasks
Critical
- Untagged image protection — Export the v41-prod, v32-prod-socket, and v32-prod-reso images to Wasabi S3 or rebuild them. If one of these containers or its image is removed, it cannot be recreated.
- Disk cleanup — Offload 100+ GB of old backups in /root to Wasabi S3, verify, then delete locally.
- Automated backup — Implement daily MongoDB dumps → Wasabi S3. Extend to PostgreSQL and Redis.
- OS upgrade — Ubuntu 20.04 EOL. Migrate to 24.04. Requires coordinated maintenance window.
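For the untagged images, a first protective step is to give each image a name and save it to a tar file that can be offloaded. The sketch below is one possible approach: rescue_image is a hypothetical helper, the local-rescue/ repository name and :inventory tag are placeholders, and the output directory should sit on storage with enough free space. Tagging an image does not touch the running container.

```shell
# container_image_id NAME: image ID (sha256:...) the container was created from.
container_image_id() {
  docker inspect --format '{{.Image}}' "$1"
}

# rescue_image NAME OUTDIR: tag the container's image and save it to a tar file.
# The local-rescue/ repository name and :inventory tag are placeholders.
rescue_image() {
  ri_id=$(container_image_id "$1") || return 1
  ri_ref="local-rescue/$1:inventory"
  docker tag "$ri_id" "$ri_ref" || return 1
  docker save -o "$2/$1-image.tar" "$ri_ref"
}

# Usage, writing to a path with enough free space (hypothetical mount point):
#   for c in v41-prod v32-prod-socket v32-prod-reso; do
#     rescue_image "$c" /mnt/offload || break
#   done
# Restore on another host with: docker load -i v41-prod-image.tar
```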
High
- PM2 investigation — Clarify if NodeChat v1.0.0 on :3001 serves live traffic.
- PostgreSQL + Redis audit — Identify what data they hold, who uses them, implement backup.
- Port :8081 investigation — Unknown Node.js process. Identify and document or remove.
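The port investigations can start from the listening socket and work back to the owning process. A minimal sketch, assuming it runs as root on the host (ss only shows process details for other users' sockets with sufficient privileges); pids_from_ss and who_listens are hypothetical helper names.

```shell
# pids_from_ss: extract the unique PIDs from `ss -ltnp` output on stdin.
pids_from_ss() {
  grep -o 'pid=[0-9]*' | cut -d= -f2 | sort -un
}

# who_listens PORT: for each process listening on PORT, show its owner,
# start time, command line, and working directory.
who_listens() {
  ss -H -ltnp "sport = :$1" | pids_from_ss | while read -r wl_pid; do
    ps -o pid=,user=,lstart=,args= -p "$wl_pid"
    readlink "/proc/$wl_pid/cwd"
  done
}

# Usage:
#   who_listens 8081
#   who_listens 3001
```

The working directory usually points at the checkout the process was started from, which helps decide whether it belongs to a known Pinbox24 version.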
Medium
- Deprecated container removal — Stop and remove s3-v42-prod-02-25-old.
- Portainer v1 upgrade — Running Portainer v1 (5 years). Upgrade to CE v2+.
- Firewall policy — ufw inactive. Only iptables rule added for node_exporter. Implement full policy.
- Log rotation — /var/log at ~16 GB. Configure logrotate.
- GitLab runner review — Clarify if git-deploy and GitLab runner are running in parallel.
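For the firewall item, a draft rule set can be generated and reviewed before anything is applied. The sketch below rests on several assumptions: the Prometheus source IP is a placeholder argument, ports 3001 and 8081 stay closed until their investigations finish, and containers reach the host Redis over the docker0 bridge. Ports that Docker publishes (80, 443, 49154) are handled by Docker's own iptables rules, which ufw does not filter by default, so restricting Portainer needs a separate measure.

```shell
# ufw_rules PROMETHEUS_IP: print a draft ufw policy, one command per line.
# Nothing is applied; the output is meant to be reviewed first.
ufw_rules() {
  cat <<EOF
ufw default deny incoming
ufw default allow outgoing
ufw allow 22/tcp
ufw allow 80/tcp
ufw allow 443/tcp
ufw allow from $1 to any port 9100 proto tcp
ufw allow in on docker0 to any port 6379 proto tcp
EOF
}

# Usage (203.0.113.10 is a documentation address, not the real vps-i1 IP):
#   ufw_rules 203.0.113.10            # review the draft
#   ufw_rules 203.0.113.10 | sh       # apply the rules
#   ufw enable                        # keep the current SSH session open while testing
```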
Low / Future
- v3.x sunset planning — Identify clients on v3.x, plan migration to v4.x.
- Registry consolidation — Clarify private-registry.dev.pinbox24.com reachability and strategy.
- Netdata integration — Expose Netdata to Prometheus or replace with node_exporter dashboards.
Known Limitations
- No IaC — server was provisioned manually. No Ansible playbook exists to reproduce it.
- Long-lived containers — some containers have never been recreated (4–5 years). Recreating them is risky if their image is untagged, because no registry copy exists to pull.
- EOL OS — no kernel or package security patches available. Any patch-requiring CVE is permanently unmitigated until OS upgrade.
- No DR plan — if this server is destroyed, recovery is partial at best (depends on state of MongoDB dumps).
- GitLab dependency — deploy automation depends on GitLab CI. GitLab account access is required for deployments.