n8n — Operations Workbook
n8n workflow automation running on OVH bms-4 (n8n.bms-4.infra.zintegrowana.online). All workflows migrated from vps-h1 on 2026-06-15; vps-h1 n8n decommissioned same day.
Migration (2026-06-15): All workflows moved from vps-h1 → bms-4 (PostgreSQL backend, Traefik TLS). vps-h1 n8n containers stopped and removed — vps-h1 now runs only WAHA + Traefik + monitoring exporters.
Architecture
OVH bms-4 (54.36.123.110)
├── Container: n8n image: docker.n8n.io/n8nio/n8n:2.26.3
│ ├── port 5678 web UI + webhook receiver (internal only)
│ ├── EXECUTIONS_MODE=queue enqueues jobs to Redis; does NOT execute them
│ ├── DB: n8n-postgres container
│ └── Redis: redis container (job broker)
│
├── Container: n8n-worker-1 image: docker.n8n.io/n8nio/n8n:2.26.3
│ ├── port 5679 Prometheus scrape endpoint
│ ├── cpus: 1.5 · N8N_RUNNERS_MAX_CONCURRENCY=3
│ └── dequeues jobs from Redis, executes, writes results to PostgreSQL
│
├── Container: n8n-worker-2 image: docker.n8n.io/n8nio/n8n:2.26.3
│ ├── port 5680 Prometheus scrape endpoint
│ ├── cpus: 1.5 · N8N_RUNNERS_MAX_CONCURRENCY=3
│ └── same as n8n-worker-1
│
├── Container: redis image: redis:7-alpine
│ ├── AOF persistence (appendonly yes)
│ ├── maxmemory 256mb (noeviction policy)
│ └── password-protected (REDIS_PASSWORD)
│
├── Container: redis-exporter image: oliver006/redis_exporter:v1.67.0
│ └── port 9121 Prometheus scrape (vps-i1 → 54.36.123.110:9121)
│
└── Container: traefik (bms-4) TLS termination → n8n:5678
Public URL: https://n8n.bms-4.infra.zintegrowana.online
Compose file on server: /opt/bms4-services/docker-compose.yml
Compose file in repo: bms-4/docker-compose.yml (canonical — keep in sync)
Active Workflows
| Workflow | Trigger | Purpose |
|---|---|---|
| ATRAX GPS Sync | Cron every 15min | Fetches vehicle positions from ATRAX API → Supabase fleet_positions |
| wa-router | Webhook (WAHA) | Routes WhatsApp messages to AI / Supabase |
| wa-ai-to-inbox | Webhook | WhatsApp AI reply generation |
| wa-processing-watchdog | Cron every 5min | Alerts if WhatsApp processing stalls |
| GPS Sync Watchdog | Cron every 30min | Alerts if last GPS sync >30min ago |
| Daily Fleet Report | Cron 07:00 UTC | Generates and emails daily fleet status report |
Config Management
| File | Location | In repo? | Contains secrets? |
|---|---|---|---|
| docker-compose.yml | /opt/bms4-services/docker-compose.yml | ✅ bms-4/docker-compose.yml | No |
| PostgreSQL data | Docker volume n8n_postgres_data on bms-4 | ❌ | Yes (credential vault) |
| n8n workflow exports | infra-src/n8n-workflows/*.json | ✅ | No (credentials stripped) |
Secrets injected at runtime
n8n credentials are stored encrypted in PostgreSQL (AES-256 via N8N_ENCRYPTION_KEY).
Key env vars in /opt/bms4-services/.env on bms-4:
| Variable | Purpose |
|---|---|
| N8N_ENCRYPTION_KEY | Master encryption key for credential vault. Must be set in shell env on bms-4, not just in docker-compose.yml, so workers can decrypt credentials at startup |
| BMS4_N8N_API_KEY | n8n REST API key (external access) |
| N8N_DB_PASSWORD | PostgreSQL password for n8n database |
| REDIS_PASSWORD | Redis auth password (shared by main + workers + redis-exporter) |
| QUEUE_BULL_REDIS_HOST | Redis hostname: redis (Docker network) |
| QUEUE_BULL_REDIS_PORT | Redis port: 6379 |
| QUEUE_BULL_REDIS_PASSWORD | Same value as REDIS_PASSWORD |
| QUEUE_BULL_REDIS_DB | Redis DB index: 0 |
N8N_ENCRYPTION_KEY advisory: Workers read this key directly from the shell environment when they start. If it is only declared inside the docker-compose.yml environment: block and not exported in the host shell, workers may fail to decrypt credentials and executions will error. Always ensure it is present in /opt/bms4-services/.env and exported in the session before running docker compose up.
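The advisory above lends itself to a pre-flight check before docker compose up. The following is a minimal sketch, assuming the .env file uses plain KEY=value lines; the helper name missing_env_keys is hypothetical and not part of the repo.

```shell
#!/bin/sh
# Hypothetical helper: print every required key that is absent or empty
# in an env file made of plain KEY=value lines.
# Returns non-zero if at least one key is missing.
missing_env_keys() {
    env_file=$1
    shift
    missing=0
    for key in "$@"; do
        # Match "KEY=<at least one character>" at the start of a line.
        if ! grep -Eq "^${key}=.+" "$env_file"; then
            echo "$key"
            missing=1
        fi
    done
    return "$missing"
}

# Usage on bms-4 (path and variable names taken from the table above):
# missing_env_keys /opt/bms4-services/.env \
#     N8N_ENCRYPTION_KEY N8N_DB_PASSWORD REDIS_PASSWORD QUEUE_BULL_REDIS_PASSWORD
```

The check only covers the file; whether the key is also exported in the current session still has to be confirmed separately.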
Queue Mode
Added 2026-06-15. n8n runs in queue mode (EXECUTIONS_MODE=queue) with Redis as the job broker.
Why queue mode
Single-container n8n was blocking on long-running workflows (Atrax GPS sync, WhatsApp AI, daily report). Queue mode offloads execution to dedicated workers, keeping the main container responsive to webhooks.
Capacity
| Component | Count | CPU limit | Max concurrent jobs |
|---|---|---|---|
| n8n main | 1 | 0.5 | 0 (webhook + scheduling only) |
| n8n-worker-1 | 1 | 1.5 | 3 (N8N_RUNNERS_MAX_CONCURRENCY=3) |
| n8n-worker-2 | 1 | 1.5 | 3 |
| Total | — | — | 6 concurrent executions |
Redis configuration
| Setting | Value | Reason |
|---|---|---|
| Image | redis:7-alpine | Stable LTS |
| Persistence | AOF (appendonly yes) | Job queue survives container restart |
| Max memory | 256mb (noeviction) | Prevents OOM; noeviction keeps queue intact |
| Auth | requirepass ${REDIS_PASSWORD} | Required — all clients pass password |
Scaling workers
To add a third worker, clone the n8n-worker-2 block in bms-4/docker-compose.yml as n8n-worker-3, assign it a unique host port (e.g. 5681:5678), and run docker compose up -d n8n-worker-3.
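The port numbering used so far (worker-1 on 5679, worker-2 on 5680, 5681 suggested for worker-3) can be sketched as a small helper. The function name is hypothetical, and the rule "host port = 5678 + worker number" is inferred from those three values rather than stated anywhere in the compose file.

```shell
#!/bin/sh
# Hypothetical helper: derive the host:container port mapping for worker N,
# assuming the pattern worker-1 -> 5679, worker-2 -> 5680 continues.
worker_port_mapping() {
    n=$1
    echo "$(( 5678 + n )):5678"
}

worker_port_mapping 3   # prints 5681:5678
```

A third worker with the same concurrency of 3 would raise the total from 6 to 9 concurrent executions.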
Checking worker health
ssh ubuntu@54.36.123.110
cd /opt/bms4-services
# All containers running?
docker compose ps
# Worker logs
docker compose logs --tail=50 n8n-worker-1
docker compose logs --tail=50 n8n-worker-2
# Redis queue depth
docker compose exec redis redis-cli -a "${REDIS_PASSWORD}" llen bull:jobs:wait
Troubleshooting queue mode
| Symptom | Cause | Fix |
|---|---|---|
| Workers exit immediately | N8N_ENCRYPTION_KEY not in shell env | Export key in host shell; docker compose up -d again |
| Executions stuck in “running” | Worker crashed mid-job | Restart workers; Redis retains job state |
| Redis OOM | Queue flooded | Check for runaway workflow; docker compose restart redis (jobs lost) |
| Worker can’t connect to Redis | REDIS_PASSWORD mismatch | Verify .env has matching REDIS_PASSWORD; restart all |
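To make the queue depth from "Checking worker health" easier to interpret, the number can be compared against the available execution slots. This is a rough sketch: the helper name, the labels, and the idea of using the 6-slot total from the Capacity table as the threshold are all assumptions, not an existing alert rule.

```shell
#!/bin/sh
# Hypothetical helper: classify a queue depth (as returned by LLEN on
# bull:jobs:wait) against the number of execution slots.
# The default of 6 mirrors the Capacity table; the labels are illustrative.
queue_status() {
    depth=$1
    slots=${2:-6}
    if [ "$depth" -eq 0 ]; then
        echo "idle"
    elif [ "$depth" -le "$slots" ]; then
        echo "ok"
    else
        echo "backlog"
    fi
}

queue_status 0    # prints idle
queue_status 4    # prints ok
queue_status 25   # prints backlog

# Usage on bms-4, from /opt/bms4-services with REDIS_PASSWORD set in the shell:
# depth=$(docker compose exec -T redis redis-cli --no-auth-warning \
#     -a "${REDIS_PASSWORD}" llen bull:jobs:wait)
# queue_status "$depth"
```

A sustained "backlog" reading would point at the "Executions stuck" or "Redis OOM" rows above rather than at a transient burst.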
Deployment
Fresh install
# On bms-4 — assumes docker-compose.yml and .env are already deployed
ssh ubuntu@54.36.123.110
docker compose -f /opt/bms4-services/docker-compose.yml up -d n8n n8n-postgres
Update n8n version
- Update image tag in bms-4/docker-compose.yml
- Commit and push
- On bms-4:
ssh ubuntu@54.36.123.110
cd /opt/bms4-services
docker compose pull n8n
docker compose up -d n8n
- Verify:
curl -s https://n8n.bms-4.infra.zintegrowana.online/healthz
Import a workflow
# Via REST API (no UI needed)
curl -X POST https://n8n.bms-4.infra.zintegrowana.online/api/v1/workflows \
-H "X-N8N-API-KEY: ${BMS4_N8N_API_KEY}" \
-H "Content-Type: application/json" \
-d @infra-src/n8n-workflows/wa-router.json
Backup
What needs backing up
| Data | Method | Schedule | Destination |
|---|---|---|---|
| Workflow definitions | n8n-backup.yml GH Action → Wasabi | Nightly 03:00 UTC | s3://p24-infra/n8n/workflows-YYYY-MM-DD.json |
| n8n credentials (encrypted) | n8n-backup.yml → Wasabi | Nightly | s3://p24-infra/n8n/credentials-YYYY-MM-DD.json |
| PostgreSQL DB (n8n_postgres_data) | Not directly backed up | n/a | Gap: execution history only; workflows/creds covered by API export |
Note: Workflows and credentials are exported via REST API nightly. If the PostgreSQL volume is lost, workflows can be re-imported but execution history will be gone. Credentials must be re-entered manually (values are never exported).
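The dated object names in the table above follow one pattern, which the sketch below reproduces. The helper name is hypothetical, and defaulting to the current UTC date is an assumption based on the 03:00 UTC schedule; the actual n8n-backup.yml action may build the name differently.

```shell
#!/bin/sh
# Hypothetical helper: build the dated object key used for the nightly
# exports, e.g. s3://p24-infra/n8n/workflows-2026-06-15.json
backup_key() {
    kind=$1                          # "workflows" or "credentials"
    day=${2:-$(date -u +%Y-%m-%d)}   # assumption: defaults to today's UTC date
    echo "s3://p24-infra/n8n/${kind}-${day}.json"
}

backup_key workflows 2026-06-15
# prints s3://p24-infra/n8n/workflows-2026-06-15.json

# Usage with the Wasabi endpoint from the Restore section:
# aws s3 cp "$(backup_key workflows 2026-06-15)" /tmp/ \
#     --endpoint-url https://s3.eu-central-2.wasabisys.com
```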
Manual backup
# Export workflows via API
curl -H "X-N8N-API-KEY: ${BMS4_N8N_API_KEY}" \
https://n8n.bms-4.infra.zintegrowana.online/api/v1/workflows > /tmp/n8n-workflows.json
# Export credentials (encrypted, no values)
curl -H "X-N8N-API-KEY: ${BMS4_N8N_API_KEY}" \
https://n8n.bms-4.infra.zintegrowana.online/api/v1/credentials > /tmp/n8n-credentials.json
Restore
Target RTO: 45 minutes (includes workflow re-activation)
Scenario 1: Container crash
ssh ubuntu@54.36.123.110
docker compose -f /opt/bms4-services/docker-compose.yml up -d n8n
# n8n_postgres_data volume preserved, so no data loss
Scenario 2: Fresh install (n8n_postgres_data volume lost)
# 1. Download latest workflow backup from Wasabi
aws s3 cp s3://p24-infra/n8n/workflows-YYYY-MM-DD.json /tmp/ \
--endpoint-url https://s3.eu-central-2.wasabisys.com
# 2. Start fresh n8n (PostgreSQL will be initialized on first start)
ssh ubuntu@54.36.123.110
docker compose -f /opt/bms4-services/docker-compose.yml up -d n8n-postgres n8n
# 3. Wait for n8n to be ready
curl -s https://n8n.bms-4.infra.zintegrowana.online/healthz
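# 4. Import workflows via API, one workflow JSON per request (same endpoint as "Import a workflow" above)
#    If the backup file holds the full list export, split it into single-workflow files first
curl -X POST https://n8n.bms-4.infra.zintegrowana.online/api/v1/workflows \
-H "X-N8N-API-KEY: ${BMS4_N8N_API_KEY}" \
-H "Content-Type: application/json" \
-d @/tmp/workflows-YYYY-MM-DD.json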
# 5. Re-enter credentials manually in the n8n UI (cannot be restored from export)
# 6. Activate workflows: set active=true for each
# 7. Verify ATRAX sync: check Supabase fleet_positions for new entries
Note: Credentials (API keys, passwords stored in n8n) must be re-entered manually after a fresh restore. Keep the credential list in docs/elements.md § Credentials Index.
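Step 3 of Scenario 2 ("Wait for n8n to be ready") is a single curl call, which may fail while the container is still starting. A retry loop is one way to handle that. The sketch below is a generic helper, not something present in the repo; the name retry_until and the attempt count and delay in the usage line are assumptions.

```shell
#!/bin/sh
# Hypothetical helper: run a command until it succeeds, at most
# max_attempts times, sleeping delay seconds between attempts.
# Returns 0 on the first success, 1 if every attempt failed.
retry_until() {
    max_attempts=$1
    delay=$2
    shift 2
    attempt=1
    while [ "$attempt" -le "$max_attempts" ]; do
        if "$@"; then
            return 0
        fi
        attempt=$(( attempt + 1 ))
        # Only sleep when another attempt will follow.
        if [ "$attempt" -le "$max_attempts" ]; then
            sleep "$delay"
        fi
    done
    return 1
}

# Usage for step 3 (30 attempts, 5 s apart; both values are assumptions):
# retry_until 30 5 curl -fsS https://n8n.bms-4.infra.zintegrowana.online/healthz
```

With curl -f, an HTTP error status counts as a failed attempt, so the loop keeps waiting until /healthz actually answers successfully.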
Healthcheck
Docker healthcheck: wget http://localhost:5678/healthz — defined in bms-4/docker-compose.yml
External probe: blackbox-exporter via https://n8n.bms-4.infra.zintegrowana.online/healthz — EndpointDown alert.
Manual check:
curl -s https://n8n.bms-4.infra.zintegrowana.online/healthz
# Expected: {"status":"ok"}
Password Rotation
BMS4_N8N_API_KEY
# 1. Generate new key in n8n UI: Settings → API → Create API Key
# 2. Update /opt/bms4-services/.env on bms-4:
ssh ubuntu@54.36.123.110 "sed -i 's/BMS4_N8N_API_KEY=.*/BMS4_N8N_API_KEY=<new-key>/' /opt/bms4-services/.env"
# 3. Restart n8n to pick up new key:
ssh ubuntu@54.36.123.110 "docker compose -f /opt/bms4-services/docker-compose.yml restart n8n"
# 4. Update GH Secret (BMS4_N8N_API_KEY) and .env.local
# 5. Log rotation in docs/secrets-rotation-log.md
GPS Sync Workflow: atrax, kravag-scheduled-fleet-updates (CCx9UMdphmGficDX)
Production host: bms-4 (n8n.bms-4.infra.zintegrowana.online) — migrated from vps-h1 on 2026-06-15.
DB: PostgreSQL on bms-4 (not SQLite — see docs/n8n-postgresql-operations.md).
Status: Active, production, criticality = high.
Triggers
| Name | Type | Schedule / Path | Notes |
|---|---|---|---|
| 3 minutes1 | Schedule | Every 3 min | Primary trigger, main fleet sync |
| Webhook atrax sync | Webhook | /webhook/atrax-sync (prod) / /webhook-test/atrax-sync (test) | On-demand sync |
| 5 morning | Schedule | 05:00 UTC daily | Disabled 2026-06-15 (was causing double executions) |
| When clicking 'Execute workflow' | Manual | n/a | Dev/test only |
What it writes
- p24_gps_current_state: vehicle GPS positions (via upsert_gps_current_state node)
- p24_l_cars_atrax: vehicle metadata (via update-atrax-pojazdy-w car-atrax node)
- p24_l_cars: mileage via rpc/update_atrax_rt_batch; driver info via rpc/update_atrax_drivers_batch (both single batch calls per cycle; see issue #629)
Freshness is monitored by .github/workflows/atrax-data-freshness.yml (runs every 10 min).
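The core of a freshness check is a comparison between the age of the newest row and a threshold. The sketch below illustrates that comparison only; it is not the content of atrax-data-freshness.yml, the helper name is hypothetical, and the 600-second default is an assumption taken from the ">10min" wording in the Troubleshooting table.

```shell
#!/bin/sh
# Hypothetical helper: succeed (exit 0) when the newest row is older than
# max_age seconds. Both timestamps are Unix epoch seconds.
is_stale() {
    last_update=$1
    now=$2
    max_age=${3:-600}   # assumption: 600 s = the 10-minute threshold
    [ $(( now - last_update )) -gt "$max_age" ]
}

if is_stale 1000 1700; then echo "stale"; else echo "fresh"; fi   # prints stale
if is_stale 1000 1500; then echo "stale"; else echo "fresh"; fi   # prints fresh
```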
Batch update pattern (IMPORTANT — do not revert)
All writes to p24_l_cars MUST use batch RPCs, never individual row upserts via REST API.
Individual POST /rest/v1/p24_l_cars?on_conflict=... calls (one per vehicle) caused AccessExclusiveLock contention blocking all reads (issue #629, 2026-06-17).
| Node | RPC called | Fields written |
|---|---|---|
| update atraxId na rej floty by plate1 | rpc/update_atrax_rt_batch | current_mileage, km_upd_date |
| batch_drivers_rpc | rpc/update_atrax_drivers_batch | atrax_driver1, atrax_driver1_name, atrax_driver2, atrax_driver2_name |
The aggregate_driver_updates node (Aggregate type) collapses all per-vehicle items into a single item before batch_drivers_rpc runs, ensuring exactly one HTTP call per sync cycle regardless of fleet size.
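The aggregation step described above can be illustrated outside n8n. The sketch below collapses per-vehicle JSON objects into one payload so that a single request carries the whole fleet. It is a shell analogue of the Aggregate node, not the workflow's actual code: the helper name, the "vehicles" wrapper key, and the "plate" field are placeholders, since the real RPC parameter names are not documented here.

```shell
#!/bin/sh
# Collapse per-vehicle JSON objects (one per line on stdin) into a single
# payload, so one HTTP call carries every vehicle.
# The "vehicles" key is a placeholder, not the RPC's confirmed parameter name.
build_batch_payload() {
    items=""
    while IFS= read -r line; do
        # Skip blank lines so they do not produce empty array elements.
        if [ -z "$line" ]; then
            continue
        fi
        if [ -z "$items" ]; then
            items=$line
        else
            items="$items,$line"
        fi
    done
    printf '{"vehicles":[%s]}\n' "$items"
}

printf '%s\n' '{"plate":"AB123","current_mileage":1000}' \
              '{"plate":"CD456","current_mileage":2000}' \
    | build_batch_payload
# prints {"vehicles":[{"plate":"AB123","current_mileage":1000},{"plate":"CD456","current_mileage":2000}]}
```

The number of requests stays at one whether the input has two lines or two hundred, which is the property that avoids the per-row lock contention from issue #629.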
Known failure modes (discovered 2026-05-19, issue #201)
| Failure | Symptom | Fix |
|---|---|---|
| Workflow deactivated in n8n | active=0 in SQLite, no executions | See “Re-activating a workflow” below |
| activeVersionId NULL in SQLite | Workflow shows active=1 but never appears in startup activation log | See “Re-activating a workflow” below |
| Missing gps_atrax_installed column on p24_l_cars | supabase upsert query node fails, GPS branch never runs | ALTER TABLE p24_l_cars ADD COLUMN IF NOT EXISTS gps_atrax_installed boolean DEFAULT false |
| Ambiguous update_atrax_rt_batch function overload | PGRST203 error on update atraxId na rej floty by plate1 node | Drop old single-arg overload: DROP FUNCTION public.update_atrax_rt_batch(vehicles jsonb) |
| Missing status_atrax column on p24_l_cars | upsert_gps_current_state fails because trigger sync_gps_state_to_cars references it | ALTER TABLE p24_l_cars ADD COLUMN IF NOT EXISTS status_atrax text |
Re-activating a workflow after deactivation
If n8n was restarted and the startup log does NOT show Activated workflow "atrax, kravag-scheduled-fleet-updates":
# SSH to bms-4 as ubuntu (n8n now on bms-4, using PostgreSQL — not SQLite)
ssh ubuntu@54.36.123.110
# Check active state via n8n API
curl -s -H "X-N8N-API-KEY: ${BMS4_N8N_API_KEY}" \
https://n8n.bms-4.infra.zintegrowana.online/api/v1/workflows/CCx9UMdphmGficDX \
| jq '{id,name,active}'
# Re-activate via API (the public API exposes activation as a dedicated endpoint)
curl -X POST \
-H "X-N8N-API-KEY: ${BMS4_N8N_API_KEY}" \
https://n8n.bms-4.infra.zintegrowana.online/api/v1/workflows/CCx9UMdphmGficDX/activate
# Or restart the n8n container on bms-4
docker compose -f /opt/bms4-services/docker-compose.yml restart n8n
# Verify: check executions appear
curl -s -H "X-N8N-API-KEY: ${BMS4_N8N_API_KEY}" \
"https://n8n.bms-4.infra.zintegrowana.online/api/v1/executions?workflowId=CCx9UMdphmGficDX&limit=3" \
| jq '[.data[] | {id,status,startedAt}]'
n8n 2.x Upgrade (2026-05-24, issue #220)
What changed
n8n was upgraded from 1.123.44 to 2.22.0 to fix two critical CVEs:
| CVE | CVSS | Description | Fixed in |
|---|---|---|---|
| CVE-2025-68668 | 9.9 | Python Code Node RCE via Pyodide | n8n 2.0.0 |
| CVE-2026-25115 | 10.0 | Python sandbox escape | n8n 2.4.8 |
Both CVEs required the Python Code Node (Pyodide-based). None of our workflows use Python — all Code nodes use JavaScript — so the real-world exposure was zero, but the CVSS scores demanded the upgrade.
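When checking a deployed tag against the "Fixed in" column, a plain string comparison is misleading: "2.26.3" sorts before "2.4.8" character by character even though it is the newer release. The sketch below compares the numeric components instead. The helper name is hypothetical and it assumes plain major.minor.patch tags without suffixes.

```shell
#!/bin/sh
# Hypothetical helper: succeed when dotted version $1 is >= $2.
# Assumes both arguments are plain numeric major.minor.patch strings.
version_ge() {
    # Split both versions on dots into six positional parameters:
    # $1.$2.$3 is the first version, $4.$5.$6 the second.
    old_ifs=$IFS
    IFS=.
    set -- $1 $2
    IFS=$old_ifs
    if [ "$1" -ne "$4" ]; then
        [ "$1" -gt "$4" ]
    elif [ "$2" -ne "$5" ]; then
        [ "$2" -gt "$5" ]
    else
        [ "$3" -ge "$6" ]
    fi
}

# Check the image tag from the Architecture section against both fixes.
for fixed in 2.0.0 2.4.8; do
    if version_ge 2.26.3 "$fixed"; then
        echo "2.26.3 includes the fix shipped in $fixed"
    else
        echo "2.26.3 predates $fixed"
    fi
done
```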
Workflow compatibility audit
All workflows were audited before upgrade. All Code nodes use JavaScript (not Python):
| Workflow | Code nodes | Language | 2.x compatible |
|---|---|---|---|
| wa-router | 2 | JavaScript | Yes |
| wa-ai-to-inbox | 3 | JavaScript | Yes |
| wa-group-sync | 1 | JavaScript | Yes |
| wa-processing-watchdog | 1 | JavaScript | Yes |
| fleet-update-v2-batch | 4 | JavaScript | Yes |
| alertmanager-to-incidents | 0 | — | Yes |
| sentry-to-github | 2 | JavaScript | Yes |
| github-auto-trigger | 1 | JavaScript | Yes |
| hu-sp-report-email | 1 | JavaScript | Yes |
n8n 2.x breaking changes (1.x → 2.x)
Key breaking changes in n8n 2.0.0 relevant to our stack:
- Python Code Node removed: Pyodide sandbox dropped entirely. Not used in our workflows.
- N8N_BASIC_AUTH_ACTIVE deprecated: use N8N_COMMUNITY_PACKAGES_ALLOW_TOOL_USAGE for community nodes. Basic auth still works for API key auth (our setup uses X-N8N-API-KEY).
- SQLite schema migration: n8n 2.x auto-migrates the SQLite DB on first start. Expect ~10–30s extra startup time. Migration is non-destructive.
- Execution data schema: executionData column type changed. Old executions remain readable but storage format differs.
- Webhook node v2: our workflows already use typeVersion: 2 (confirmed in audit above).
- httpRequest node v4.2: our workflows use this version (confirmed in audit above). No changes needed.
- Code node v2: our Code nodes use typeVersion: 2. Compatible with 2.x.
Deployment procedure
n8n 2.x upgrade is performed via a standard image tag bump + rolling restart:
# On bms-4 (PostgreSQL backend — no SQLite migration needed)
ssh ubuntu@54.36.123.110
cd /opt/bms4-services
docker compose pull n8n
docker compose up -d n8n
# Monitor startup logs for workflow activation
docker compose logs -f n8n | grep -E "Activated|error|ERROR"
# Verify healthz
curl -s http://localhost:5678/healthz
# Expected: {"status":"ok"}
# Verify all workflows are active
curl -s -H "X-N8N-API-KEY: ${BMS4_N8N_API_KEY}" \
http://localhost:5678/api/v1/workflows | jq '[.data[] | {name,active}]'
Post-upgrade checklist
- n8n container starts and /healthz returns {"status":"ok"}
- All workflows show active: true in API response
- wa-router webhook receives test event from WAHA
- fleet-update-v2-batch (ATRAX sync) runs on schedule; check fleet_positions freshness
- Trivy scan passes clean (CVE-2025-68668 and CVE-2026-25115 no longer reported)
Rollback procedure
If issues arise, roll back to previous version:
# On bms-4
ssh ubuntu@54.36.123.110
# Edit bms-4/docker-compose.yml image tag back to previous version
cd /opt/bms4-services
docker compose up -d n8n
# If rollback causes DB errors, restore from last Wasabi backup:
# aws s3 cp s3://p24-infra/n8n/workflows-YYYY-MM-DD.json /tmp/ \
# --endpoint-url https://s3.eu-central-2.wasabisys.com
Troubleshooting
| Symptom | Cause | Fix |
|---|---|---|
| GPS positions stale (>10min) | ATRAX workflow deactivated or DB schema mismatch | See “GPS Sync Workflow — Known failure modes” above; check n8n startup log |
| GPS positions stale (>10min) | ATRAX API unreachable | Check get_token node output in latest execution; test Atrax API from bms-4 |
| WhatsApp messages not routing | wa-router webhook timeout | Check WAHA logs on vps-h1; verify webhook URL points to n8n.bms-4 |
| n8n UI unreachable | Traefik or container down | ssh ubuntu@54.36.123.110 "docker compose -f /opt/bms4-services/docker-compose.yml ps" |
| N8N_ENCRYPTION_KEY changed | Stored credentials can no longer be decrypted | Restore the previous key if it is still available; otherwise re-enter all credentials manually |
| n8n crashes on startup | N8N_DB_PASSWORD not set | Verify N8N_DB_PASSWORD in /opt/bms4-services/.env on bms-4 |