vps-i1 — Operations Workbook

IONOS VPS (vps-i1). Primary infrastructure host: monitoring stack, Traccar GPS, OpenClaw WhatsApp gateway, GitHub Actions runner.


Architecture

IONOS VPS (217.154.82.162)   AlmaLinux 9.7
  CPU: AMD EPYC-Milan, 6 vCPUs
  RAM: 8 GB
  Role: monitoring host, GPS tracking, WhatsApp gateway, GH Actions runner
 
  Caddy (443/80) ─── TLS termination for all public endpoints

  ├── monitoring-prometheus-1       :9090  (127.0.0.1 only)
  ├── monitoring-thanos-sidecar     :10901 (uploads blocks → Wasabi)
  ├── monitoring-thanos-query       :10904 (unified PromQL)
  ├── monitoring-alertmanager-1     :9093  (127.0.0.1 only)
  ├── monitoring-loki-1             :3100  (127.0.0.1 only)
  ├── monitoring-promtail-1         (log shipper, no public port)
  ├── monitoring-grafana-1          :3000  (127.0.0.1 only)
  ├── monitoring-blackbox-exporter  :9115
  ├── monitoring-queue-exporter     :9200
  ├── monitoring-pg-stats-exporter  :9201
  ├── monitoring-cost-exporter      :9210
  ├── monitoring-vercel-exporter    :9202
  ├── monitoring-backup-exporter    :9220
  ├── node_exporter                 :9100  (host network)
  ├── openclaw-openclaw-gateway-1   :18789-18790 (proxied via Caddy)
  ├── traccar                       :8082  (web), 5027/UDP (GPS)
  ├── traccar-db                    (MySQL, internal)
  └── status.vps-i1 (Uptime Kuma)  (proxied via Caddy)

Compose files:

| Stack | File on server | File in repo |
|---|---|---|
| Monitoring | /opt/p24-infra/monitoring/docker-compose.yml | monitoring/docker-compose.yml |
| OpenClaw | /opt/p24-infra/openclaw/docker-compose.yml | openclaw/docker-compose.yml |
| Traccar | /opt/traccar/docker-compose.yml | not tracked |

Public URLs:

| Service | URL | Auth |
|---|---|---|
| Grafana | https://grafana.vps-i1.infra.zintegrowana.online | Grafana login |
| Prometheus | https://prometheus.vps-i1.infra.zintegrowana.online | basic_auth (admin / GRAFANA_ADMIN_PASSWORD) |
| Alertmanager | https://alertmanager.vps-i1.infra.zintegrowana.online | basic_auth (admin / GRAFANA_ADMIN_PASSWORD) |
| OpenClaw | https://openclaw.vps-i1.infra.zintegrowana.online | API key |
| Traccar | https://traccar.vps-i1.infra.zintegrowana.online | Traccar login |
| Status | https://status.vps-i1.infra.zintegrowana.online | Kuma login |
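
All six hostnames follow the same pattern, so a reachability sweep is easy to script. The following is a minimal sketch, assuming that any HTTP response (including a 401 from the basic_auth endpoints) counts as "reachable"; the helper names `public_url` and `probe` are hypothetical and not part of the repo.

```python
import urllib.error
import urllib.request

# Subdomains taken from the Public URLs table.
SERVICES = ["grafana", "prometheus", "alertmanager", "openclaw", "traccar", "status"]
BASE_DOMAIN = "vps-i1.infra.zintegrowana.online"


def public_url(service: str) -> str:
    """Build the public HTTPS URL for a service subdomain."""
    return f"https://{service}.{BASE_DOMAIN}"


def probe(url: str, timeout: float = 10.0) -> int:
    """Return the HTTP status code, or 0 if the endpoint is unreachable.

    Assumption: a 401/403 still proves that Caddy and the upstream answered,
    so HTTP error statuses are returned rather than raised.
    """
    try:
        with urllib.request.urlopen(url, timeout=timeout) as resp:
            return resp.status
    except urllib.error.HTTPError as exc:  # must precede URLError (its parent class)
        return exc.code
    except (urllib.error.URLError, OSError):
        return 0


# Example (performs real network requests):
# for name in SERVICES:
#     print(name, probe(public_url(name)))
```

A 502 in this sweep would point at a stopped upstream container rather than at Caddy itself.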

SSH Access

| User | Key | Scope |
|---|---|---|
| root | C:\Users\konar\.ssh\id_ed25519 (local workstation) | Full admin |
| claude-admin | GH Secret VPS_SSH_PRIVATE_KEY (ed25519) | Passwordless sudo: docker, systemctl, mkdir, chown, cp, tee |

# Python paramiko: non-interactive SSH from Windows
import paramiko

client = paramiko.SSHClient()
# AutoAddPolicy accepts an unknown host key on first connect; use only on trusted networks.
client.set_missing_host_key_policy(paramiko.AutoAddPolicy())
client.connect("217.154.82.162", port=22, username="root",
               key_filename=r"C:\Users\konar\.ssh\id_ed25519", timeout=15)
try:
    stdin, stdout, stderr = client.exec_command(
        "docker compose -f /opt/p24-infra/monitoring/docker-compose.yml ps")
    print(stdout.read().decode())
finally:
    client.close()  # release the connection even if the command fails

# Direct SSH
ssh -i C:\Users\konar\.ssh\id_ed25519 root@217.154.82.162
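
Where paramiko is not installed, the same non-interactive call can go through the system OpenSSH client. This is a sketch only: `ssh_argv` and `run_remote` are hypothetical helper names, and the `BatchMode` and `ConnectTimeout` options are standard OpenSSH client options chosen here as an assumption about what "non-interactive" should mean.

```python
import subprocess
from typing import List

HOST = "217.154.82.162"
KEY_PATH = r"C:\Users\konar\.ssh\id_ed25519"


def ssh_argv(remote_cmd: str, user: str = "root", host: str = HOST,
             key_path: str = KEY_PATH) -> List[str]:
    """Build an OpenSSH command line that never prompts interactively."""
    return [
        "ssh",
        "-i", key_path,
        "-o", "BatchMode=yes",      # fail instead of asking for a password
        "-o", "ConnectTimeout=15",  # same timeout as the paramiko example
        f"{user}@{host}",
        remote_cmd,
    ]


def run_remote(remote_cmd: str) -> str:
    """Run a command on vps-i1 and return its stdout; raises on non-zero exit."""
    result = subprocess.run(ssh_argv(remote_cmd), capture_output=True,
                            text=True, check=True)
    return result.stdout


# Example (opens a real SSH connection):
# print(run_remote("docker compose -f /opt/p24-infra/monitoring/docker-compose.yml ps"))
```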

Container Overview

| Container | Purpose | Ports | Stack |
|---|---|---|---|
| monitoring-caddy-1 | TLS reverse proxy | 80, 443 | monitoring |
| monitoring-grafana-1 | Dashboards | 127.0.0.1:3000 | monitoring |
| monitoring-prometheus-1 | Metrics collection, 15d TSDB | 127.0.0.1:9090 | monitoring |
| monitoring-thanos-sidecar | Uploads TSDB blocks → Wasabi | 10901 | monitoring |
| monitoring-thanos-query | Unified PromQL (local + Wasabi) | 10904 | monitoring |
| monitoring-alertmanager-1 | Alert routing → email | 127.0.0.1:9093 | monitoring |
| monitoring-loki-1 | Log aggregation (14d retention) | 127.0.0.1:3100 | monitoring |
| monitoring-promtail-1 | Ships Docker logs → Loki | - | monitoring |
| monitoring-blackbox-exporter | HTTP/S probes | 9115 | monitoring |
| monitoring-queue-exporter | Supabase queue depths → Prometheus | 9200 | monitoring |
| monitoring-pg-stats-exporter | Supabase slow-query metrics → Prometheus | 9201 | monitoring |
| monitoring-cost-exporter | Vercel/Supabase/Wasabi spend → Prometheus | 9210 | monitoring |
| monitoring-vercel-exporter | Vercel deployment state → Prometheus | 9202 | monitoring |
| monitoring-backup-exporter | Backup freshness → Prometheus | 9220 | monitoring |
| node_exporter | Host OS metrics | 9100 (host net) | systemd |
| openclaw-openclaw-gateway-1 | WhatsApp gateway | 18789-18790 | openclaw |
| traccar | GPS tracking web UI | 8082 | traccar |
| traccar-db | MySQL for Traccar | internal | traccar |

Config Management

| File / Directory | In repo? | Notes |
|---|---|---|
| monitoring/docker-compose.yml | Yes | Full stack definition |
| monitoring/prometheus/ | Yes | Scrape config + alert rules |
| monitoring/alertmanager/alertmanager.yml | Yes | Alert routing |
| monitoring/Caddyfile | Yes | Reverse proxy + TLS |
| monitoring/.env | No (.env.example) | Secrets, on server only |
| openclaw/docker-compose.yml | Yes | OpenClaw stack |
| ansible/playbooks/provision-new-vps.yml | Yes | Full server provisioning |

Apply config change

# On vps-i1 — pull latest and reload affected service
cd /opt/p24-infra
git pull
 
# Hot-reload Prometheus rules (no restart)
curl -X POST http://localhost:9090/-/reload
 
# Hot-reload Alertmanager
curl -X POST http://localhost:9093/-/reload
 
# Caddy config reload (no restart)
docker compose -f monitoring/docker-compose.yml exec caddy caddy reload --config /etc/caddy/Caddyfile
 
# Full restart of a service
cd /opt/p24-infra/monitoring
docker compose restart <service>
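
The two hot-reload calls are plain empty-body POSTs, so they can be scripted and their status checked. A minimal sketch follows, to be run on vps-i1 itself because both ports are bound to 127.0.0.1; `reload_request` and `reload` are hypothetical helper names. Note that Prometheus only honours `/-/reload` when started with `--web.enable-lifecycle`; whether the compose file sets that flag is not shown in this workbook.

```python
import urllib.request

# Reload endpoints from the commands above.
RELOAD_TARGETS = {
    "prometheus": "http://localhost:9090/-/reload",
    "alertmanager": "http://localhost:9093/-/reload",
}


def reload_request(service: str) -> urllib.request.Request:
    """Build the empty-body POST that triggers a config reload."""
    return urllib.request.Request(RELOAD_TARGETS[service], data=b"", method="POST")


def reload(service: str, timeout: float = 10.0) -> int:
    """Send the reload and return the HTTP status.

    urlopen raises urllib.error.HTTPError on a 4xx/5xx answer, which is
    what a rejected (invalid) configuration is expected to produce.
    """
    with urllib.request.urlopen(reload_request(service), timeout=timeout) as resp:
        return resp.status


# Example (sends real requests to the local services):
# for name in RELOAD_TARGETS:
#     print(name, reload(name))
```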

Backup

| Data | Method | Schedule | Destination / Notes |
|---|---|---|---|
| Prometheus TSDB | Thanos sidecar continuous upload | Every 2h (block upload) | s3://ecotrans-monitoring/ (Wasabi eu-central-1) |
| Config + rules | Git push | On every commit | GitHub radieu/p24-infra |
| SSH root key | GH Secret VPS_ROOT_SSH_KEY (base64-encoded) | Manual, on rotation | GitHub Secrets |
| Traccar DB | Backup script (if configured) | See traccar-operations.md | Wasabi |
| OS-level config | Not backed up; rebuild from Ansible | - | Ansible playbook in repo |
| Docker volumes (stateless) | N/A, ephemeral by design | - | - |
| Caddy TLS certs (caddy_data) | Not backed up; auto-renewed | - | Re-provisioned on restart |
| Alertmanager silences | Not backed up | - | Ephemeral; acceptable gap |

Server rebuild source of truth: ansible/playbooks/provision-new-vps.yml — provisions full OS baseline, installs Docker, sets up claude-admin, deploys systemd units.


Restore

Scenario 1: Service crash (server intact)

# Monitoring stack
cd /opt/p24-infra/monitoring
docker compose up -d
 
# OpenClaw
cd /opt/p24-infra/openclaw
docker compose up -d

Scenario 2: Full server rebuild

# 1. Provision new IONOS VPS (same or replacement IP)
# 2. Run Ansible playbook from local workstation
ansible-playbook ansible/playbooks/provision-new-vps.yml -i <new-ip>,
 
# 3. Clone repo on new server
ssh root@<new-ip> "git clone https://github.com/radieu/p24-infra /opt/p24-infra"
 
# 4. Restore .env files from local .env.local
scp -i C:\Users\konar\.ssh\id_ed25519 monitoring/.env root@<new-ip>:/opt/p24-infra/monitoring/.env
 
# 5. Start stacks
ssh root@<new-ip> "cd /opt/p24-infra/monitoring && docker compose up -d"
ssh root@<new-ip> "cd /opt/p24-infra/openclaw && docker compose up -d"
 
# 6. Restore Prometheus TSDB from Wasabi (if needed — see monitoring-stack-operations.md)

Estimated RTO: ~30 minutes for a full service restore (Ansible ~10 min, stacks up ~5 min; the Prometheus data pull is optional).
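
Steps 2 to 5 of the rebuild are mechanical once the new IP is known. The sketch below only assembles the commands in order; it is an illustration, not a tested runbook. `rebuild_plan` is a hypothetical name, and the sketch assumes the default SSH identity (or an agent) is used instead of an explicit `-i` key path.

```python
from typing import List

REPO_URL = "https://github.com/radieu/p24-infra"
STACKS = ["monitoring", "openclaw"]


def rebuild_plan(new_ip: str) -> List[List[str]]:
    """Return the rebuild commands in execution order as argv lists."""
    target = f"root@{new_ip}"
    plan = [
        # The trailing comma makes Ansible treat the value as an inline host list.
        ["ansible-playbook", "ansible/playbooks/provision-new-vps.yml", "-i", f"{new_ip},"],
        ["ssh", target, f"git clone {REPO_URL} /opt/p24-infra"],
        ["scp", "monitoring/.env", f"{target}:/opt/p24-infra/monitoring/.env"],
    ]
    for stack in STACKS:
        plan.append(["ssh", target, f"cd /opt/p24-infra/{stack} && docker compose up -d"])
    return plan


# Example (would execute the rebuild for real):
# import subprocess
# for argv in rebuild_plan("<new-ip>"):
#     subprocess.run(argv, check=True)
```

Running with `check=True` stops at the first failing step, which avoids starting the stacks on a half-provisioned host.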

Scenario 3: Prometheus data loss only

See docs/monitoring-stack-operations.md — Restore from Wasabi section.


Healthcheck / Monitoring

| Check | Method | Alert |
|---|---|---|
| Host reachability | Prometheus job node scrapes 217.154.82.162:9100 | ServerDown fires after 5m |
| Container health | Docker healthcheck: directives on all monitoring containers | ContainerCrashLooping rule |
| Disk usage | node_exporter, LowDisk rule (< 10% free) | LowDisk fires |
| Memory usage | node_exporter, HighMemory rule (> 90%) | HighMemory fires |
| CPU usage | node_exporter, HighCPU rule (> 80% 5m avg) | HighCPU fires |
| SSH auth failures | /var/log/secure via node_exporter | SSHAuthFailures rule |
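
The three resource thresholds in the table can be read as simple comparisons. The sketch below restates them in Python purely as an illustration; the actual rules are PromQL expressions under monitoring/prometheus/ and may carry `for:` durations or label filters not captured here. `firing_alerts` is a hypothetical name.

```python
from typing import List


def firing_alerts(disk_free_pct: float, mem_used_pct: float,
                  cpu_5m_avg_pct: float) -> List[str]:
    """Return the host alerts whose threshold from the table is crossed."""
    alerts = []
    if disk_free_pct < 10:   # LowDisk: less than 10% free
        alerts.append("LowDisk")
    if mem_used_pct > 90:    # HighMemory: more than 90% used
        alerts.append("HighMemory")
    if cpu_5m_avg_pct > 80:  # HighCPU: more than 80% over a 5m average
        alerts.append("HighCPU")
    return alerts
```

Values sitting exactly on a threshold (for example 10% free disk) do not fire, because the table gives strict inequalities.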

Manual check:

# Container status
ssh -i C:\Users\konar\.ssh\id_ed25519 root@217.154.82.162 "cd /opt/p24-infra/monitoring && docker compose ps"
 
# node_exporter up?
curl http://217.154.82.162:9100/metrics | head -5
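
To go beyond eyeballing the first five lines, a single value can be pulled out of the `/metrics` text. This is a minimal sketch that handles only unlabelled samples in the Prometheus text exposition format; `metric_value` is a hypothetical helper, and `node_load1` is used as the example metric.

```python
from typing import Optional


def metric_value(exposition: str, name: str) -> Optional[float]:
    """Return the first sample of an unlabelled metric from /metrics text."""
    for line in exposition.splitlines():
        if line.startswith("#") or not line.strip():
            continue  # skip HELP/TYPE comments and blank lines
        parts = line.split()
        if parts[0] == name and len(parts) >= 2:
            return float(parts[1])
    return None


# Example (fetches from the live exporter):
# import urllib.request
# with urllib.request.urlopen("http://217.154.82.162:9100/metrics", timeout=10) as resp:
#     print(metric_value(resp.read().decode(), "node_load1"))
```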

Password Rotation

SSH key rotation (root + claude-admin)

Rotation frequency: 365 days. Last rotated: see docs/secrets-rotation-log.md.

# Run all steps from PowerShell on the local workstation.
# 1. Generate new key pair
ssh-keygen -t ed25519 -f C:\Users\konar\.ssh\id_ed25519_new -C "vps-i1-root-$(Get-Date -Format yyyy-MM-dd)"

# 2. Add new public key to authorized_keys on vps-i1
ssh -i C:\Users\konar\.ssh\id_ed25519 root@217.154.82.162 "echo '<new-public-key>' >> /root/.ssh/authorized_keys"

# 3. Verify new key works
ssh -i C:\Users\konar\.ssh\id_ed25519_new root@217.154.82.162 "hostname"

# 4. Remove old key. authorized_keys stores the public key itself, not its fingerprint,
#    so match on the old key's comment or a unique part of its base64 blob.
ssh -i C:\Users\konar\.ssh\id_ed25519_new root@217.154.82.162 "sed -i '\|<old-key-comment-or-blob>|d' /root/.ssh/authorized_keys"

# 5. Replace local key (-Force overwrites the old files)
mv -Force C:\Users\konar\.ssh\id_ed25519_new C:\Users\konar\.ssh\id_ed25519
mv -Force C:\Users\konar\.ssh\id_ed25519_new.pub C:\Users\konar\.ssh\id_ed25519.pub

# 6. Update GH Secret VPS_ROOT_SSH_KEY (base64)
$key = [Convert]::ToBase64String([IO.File]::ReadAllBytes("C:\Users\konar\.ssh\id_ed25519"))
gh secret set VPS_ROOT_SSH_KEY -b $key -R radieu/p24-infra

# 7. Update VPS_SSH_PRIVATE_KEY for claude-admin separately if it uses a different key
# 8. Log the rotation in docs/secrets-rotation-log.md
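
Two parts of the rotation are easy to get wrong by hand: removing exactly the old entry from authorized_keys, and base64-encoding the key for the GitHub secret. The sketch below shows both as pure functions, under the assumption that the old key is identified by its base64 blob (each authorized_keys entry has the form `<type> <base64-blob> [comment]`). `drop_key` and `encode_secret` are hypothetical names.

```python
import base64


def drop_key(authorized_keys: str, old_blob: str) -> str:
    """Remove every authorized_keys entry whose base64 key field equals old_blob."""
    kept = []
    for line in authorized_keys.splitlines():
        fields = line.split()
        # Whole-field comparison avoids deleting keys that merely share a prefix.
        if old_blob in fields:
            continue
        kept.append(line)
    return ("\n".join(kept) + "\n") if kept else ""


def encode_secret(private_key_bytes: bytes) -> str:
    """Base64-encode key material, as done for the VPS_ROOT_SSH_KEY secret."""
    return base64.b64encode(private_key_bytes).decode("ascii")
```

Matching on the whole blob field is stricter than a substring `sed` delete, so it cannot remove a second key by accident.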

Grafana admin / Prometheus basic_auth password

See docs/grafana-operations.md and docs/monitoring-stack-operations.md.


Troubleshooting

| Symptom | Likely cause | Fix |
|---|---|---|
| docker compose ps shows container Exit 1 | Bad config or missing env var | docker compose logs <container> |
| Prometheus targets all DOWN | Prometheus itself restarted | docker compose restart prometheus |
| Caddy 502 | Upstream container not running | docker compose up -d <service> |
| SSH connection refused | sshd crashed or firewall changed | Console login via IONOS panel → systemctl restart sshd |
| Disk full | Log accumulation or Prometheus TSDB | docker system prune -f; extend volume if needed |
| node_exporter unreachable | Systemd service stopped | systemctl restart node_exporter |