n8n — Operations Workbook

n8n workflow automation running on OVH bms-4 (n8n.bms-4.infra.zintegrowana.online). All workflows migrated from vps-h1 on 2026-06-15; vps-h1 n8n decommissioned same day.

Migration (2026-06-15): All workflows moved from vps-h1 → bms-4 (PostgreSQL backend, Traefik TLS). vps-h1 n8n containers stopped and removed — vps-h1 now runs only WAHA + Traefik + monitoring exporters.


Architecture

OVH bms-4 (54.36.123.110)
├── Container: n8n                    image: docker.n8n.io/n8nio/n8n:2.26.3
│   ├── port 5678                    web UI + webhook receiver (internal only)
│   ├── EXECUTIONS_MODE=queue        enqueues jobs to Redis; does NOT execute them
│   ├── DB: n8n-postgres container
│   └── Redis: redis container (job broker)

├── Container: n8n-worker-1           image: docker.n8n.io/n8nio/n8n:2.26.3
│   ├── port 5679                    Prometheus scrape endpoint
│   ├── cpus: 1.5 · N8N_RUNNERS_MAX_CONCURRENCY=3
│   └── dequeues jobs from Redis, executes, writes results to PostgreSQL

├── Container: n8n-worker-2           image: docker.n8n.io/n8nio/n8n:2.26.3
│   ├── port 5680                    Prometheus scrape endpoint
│   ├── cpus: 1.5 · N8N_RUNNERS_MAX_CONCURRENCY=3
│   └── same as n8n-worker-1

├── Container: redis                  image: redis:7-alpine
│   ├── AOF persistence (appendonly yes)
│   ├── maxmemory 256mb (noeviction policy)
│   └── password-protected (REDIS_PASSWORD)

├── Container: redis-exporter         image: oliver006/redis_exporter:v1.67.0
│   └── port 9121                    Prometheus scrape (vps-i1 → 54.36.123.110:9121)

└── Container: traefik (bms-4)        TLS termination → n8n:5678

Public URL: https://n8n.bms-4.infra.zintegrowana.online
Compose file on server: /opt/bms4-services/docker-compose.yml
Compose file in repo: bms-4/docker-compose.yml (canonical — keep in sync)


Active Workflows

Workflow | Trigger | Purpose
ATRAX GPS Sync | Cron every 15min | Fetches vehicle positions from ATRAX API → Supabase fleet_positions
wa-router | Webhook (WAHA) | Routes WhatsApp messages to AI / Supabase
wa-ai-to-inbox | Webhook | WhatsApp AI reply generation
wa-processing-watchdog | Cron every 5min | Alerts if WhatsApp processing stalls
GPS Sync Watchdog | Cron every 30min | Alerts if last GPS sync >30min ago
Daily Fleet Report | Cron 07:00 UTC | Generates and emails daily fleet status report

Config Management

File | Location | In repo? | Contains secrets?
docker-compose.yml | /opt/bms4-services/docker-compose.yml | bms-4/docker-compose.yml | No
PostgreSQL data | Docker volume n8n_postgres_data on bms-4 | - | Yes (credential vault)
n8n workflow exports | infra-src/n8n-workflows/*.json | - | No (credentials stripped)

Secrets injected at runtime

n8n credentials are stored encrypted in PostgreSQL (AES-256 via N8N_ENCRYPTION_KEY).
Key env vars in /opt/bms4-services/.env on bms-4:

Variable | Purpose
N8N_ENCRYPTION_KEY | Master encryption key for credential vault; must be set in shell env on bms-4, not just in docker-compose.yml, so workers can decrypt credentials at startup
BMS4_N8N_API_KEY | n8n REST API key (external access)
N8N_DB_PASSWORD | PostgreSQL password for n8n database
REDIS_PASSWORD | Redis auth password (shared by main + workers + redis-exporter)
QUEUE_BULL_REDIS_HOST | Redis hostname: redis (Docker network)
QUEUE_BULL_REDIS_PORT | Redis port: 6379
QUEUE_BULL_REDIS_PASSWORD | Same value as REDIS_PASSWORD
QUEUE_BULL_REDIS_DB | Redis DB index: 0

N8N_ENCRYPTION_KEY advisory: Workers read this key from the shell environment when they start. If it is declared only in the environment: section of docker-compose.yml and is not exported in the host shell, workers may fail to decrypt credentials and executions will error. Always ensure it is present in /opt/bms4-services/.env and exported in the session before running docker compose up.
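
A pre-flight check along these lines can catch a missing key before the stack starts. This is a minimal sketch: missing_env_vars is a hypothetical helper, and it only checks that each variable has a non-empty KEY=value line in the given file; it does not validate the values or check the host shell.

```shell
# Print every required variable that has no non-empty KEY=value line in the
# given env file. Prints nothing and returns 0 when all are present.
# missing_env_vars is a hypothetical helper, not part of n8n or Docker.
missing_env_vars() {
  env_file=$1
  shift
  missing=0
  for var in "$@"; do
    # Match "VAR=" at the start of a line, followed by at least one character.
    if ! grep -Eq "^${var}=.+" "$env_file"; then
      echo "$var"
      missing=1
    fi
  done
  return "$missing"
}

# Example (run on bms-4 before docker compose up):
# missing_env_vars /opt/bms4-services/.env \
#   N8N_ENCRYPTION_KEY N8N_DB_PASSWORD REDIS_PASSWORD QUEUE_BULL_REDIS_PASSWORD
```

A non-zero exit status means at least one variable is missing, so the check can gate a deploy script.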


Queue Mode

Added 2026-06-15. n8n runs in queue mode (EXECUTIONS_MODE=queue) with Redis as the job broker.

Why queue mode

Single-container n8n was blocking on long-running workflows (Atrax GPS sync, WhatsApp AI, daily report). Queue mode offloads execution to dedicated workers, keeping the main container responsive to webhooks.

Capacity

Component | Count | CPU limit | Max concurrent jobs
n8n main | 1 | 0.5 | 0 (webhook + scheduling only)
n8n-worker-1 | 1 | 1.5 | 3 (N8N_RUNNERS_MAX_CONCURRENCY=3)
n8n-worker-2 | 1 | 1.5 | 3
Total | - | - | 6 concurrent executions

Redis configuration

Setting | Value | Reason
Image | redis:7-alpine | Stable LTS
Persistence | AOF (appendonly yes) | Job queue survives container restart
Max memory | 256mb (noeviction) | Prevents OOM; noeviction keeps queue intact
Auth | requirepass ${REDIS_PASSWORD} | Required; all clients pass password

Scaling workers

To add a third worker, clone the n8n-worker-2 block in bms-4/docker-compose.yml as n8n-worker-3, assign a unique host port (e.g. 5681:5678), sync the file to /opt/bms4-services/docker-compose.yml, and run docker compose up -d n8n-worker-3.
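
The port pattern and the capacity arithmetic can be sketched as follows. Both helper names are hypothetical, and the port formula (5678 + worker number) is an assumption inferred from workers 1 and 2 using 5679 and 5680.

```shell
# Host port for worker N, assuming the 5678 + N pattern seen for
# n8n-worker-1 (5679) and n8n-worker-2 (5680). Hypothetical helper.
worker_port() {
  echo $((5678 + $1))
}

# Total concurrent executions = number of workers x per-worker concurrency.
total_concurrency() {
  echo $(($1 * $2))
}

worker_port 3          # prints 5681, the host port for n8n-worker-3
total_concurrency 3 3  # prints 9, the capacity after adding a third worker
```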

Checking worker health

ssh ubuntu@54.36.123.110
cd /opt/bms4-services

# All containers running?
docker compose ps

# Worker logs
docker compose logs --tail=50 n8n-worker-1
docker compose logs --tail=50 n8n-worker-2

# Redis queue depth (REDIS_PASSWORD must be set in this shell, e.g. loaded from .env)
docker compose exec redis redis-cli -a "${REDIS_PASSWORD}" llen bull:jobs:wait
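
The raw llen number is easier to act on when compared with worker capacity. A minimal sketch, assuming a backlog is worth flagging once waiting jobs exceed the 6 concurrent slots from the Capacity table; queue_status is a hypothetical helper and the threshold is illustrative, not an n8n convention.

```shell
# Classify a queue depth against the number of concurrent execution slots.
# queue_status is a hypothetical helper; the threshold is an assumption.
queue_status() {
  depth=$1
  slots=${2:-6}   # 2 workers x 3 jobs each
  if [ "$depth" -eq 0 ]; then
    echo "idle"
  elif [ "$depth" -le "$slots" ]; then
    echo "ok"
  else
    echo "backlog"
  fi
}

# Example, run from /opt/bms4-services with REDIS_PASSWORD set
# (-T disables the pseudo-TTY so redis-cli prints a bare number):
# depth=$(docker compose exec -T redis redis-cli -a "$REDIS_PASSWORD" --no-auth-warning llen bull:jobs:wait)
# queue_status "$depth"
```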

Troubleshooting queue mode

Symptom | Cause | Fix
Workers exit immediately | N8N_ENCRYPTION_KEY not in shell env | Export key in host shell; docker compose up -d again
Executions stuck in “running” | Worker crashed mid-job | Restart workers; Redis retains job state
Redis OOM | Queue flooded | Check for runaway workflow; docker compose restart redis (jobs lost)
Worker can’t connect to Redis | REDIS_PASSWORD mismatch | Verify .env has matching REDIS_PASSWORD; restart all

Deployment

Fresh install

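# On bms-4 (assumes docker-compose.yml and .env are already deployed)
ssh ubuntu@54.36.123.110
# Queue mode needs PostgreSQL, Redis, the main instance, and both workers
docker compose -f /opt/bms4-services/docker-compose.yml up -d n8n-postgres redis n8n n8n-worker-1 n8n-worker-2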

Update n8n version

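  1. Update image tag in bms-4/docker-compose.yml (main and both workers must run the same n8n version)
  2. Commit and push, then sync the file to /opt/bms4-services/docker-compose.yml
  3. On bms-4:
    ssh ubuntu@54.36.123.110
    cd /opt/bms4-services
    docker compose pull n8n n8n-worker-1 n8n-worker-2
    docker compose up -d n8n n8n-worker-1 n8n-worker-2
  4. Verify: curl -s https://n8n.bms-4.infra.zintegrowana.online/healthz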
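
Right after a restart the endpoint may not answer yet, so a single curl can give a false negative. A minimal retry sketch; wait_until is a hypothetical helper, and the attempt count and delay in the example are arbitrary.

```shell
# Retry a command until it succeeds or the attempts run out.
# Usage: wait_until ATTEMPTS DELAY_SECONDS COMMAND [ARGS...]
# wait_until is a hypothetical helper; tune attempts and delay as needed.
wait_until() {
  attempts=$1
  delay=$2
  shift 2
  i=1
  while [ "$i" -le "$attempts" ]; do
    if "$@"; then
      return 0
    fi
    i=$((i + 1))
    sleep "$delay"
  done
  return 1
}

# Example: poll /healthz for up to about 60 s (curl -f fails on HTTP errors).
# wait_until 12 5 curl -fsS -o /dev/null https://n8n.bms-4.infra.zintegrowana.online/healthz
```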

Import a workflow

# Via REST API (no UI needed)
curl -X POST https://n8n.bms-4.infra.zintegrowana.online/api/v1/workflows \
  -H "X-N8N-API-KEY: ${BMS4_N8N_API_KEY}" \
  -H "Content-Type: application/json" \
  -d @infra-src/n8n-workflows/wa-router.json

Backup

What needs backing up

Data | Method | Schedule | Destination
Workflow definitions | n8n-backup.yml GH Action → Wasabi | Nightly 03:00 UTC | s3://p24-infra/n8n/workflows-YYYY-MM-DD.json
n8n credentials (encrypted) | n8n-backup.yml → Wasabi | Nightly | s3://p24-infra/n8n/credentials-YYYY-MM-DD.json
PostgreSQL DB (n8n_postgres_data) | Not directly backed up | - | Gap: execution history only; workflows/creds covered by API export

Note: Workflows and credential records (without secret values) are exported via REST API nightly. If the PostgreSQL volume is lost, workflows can be re-imported but execution history will be gone. Credentials must be re-entered manually (values are never exported).
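
Because the object names embed an ISO date, the newest backup is the last name in sorted order. A minimal sketch for resolving the YYYY-MM-DD placeholder; latest_backup is a hypothetical helper that expects one object name per line on stdin.

```shell
# Print the newest "<kind>-YYYY-MM-DD.json" name from a list on stdin.
# ISO dates sort lexicographically, so a plain sort is enough.
# latest_backup is a hypothetical helper.
latest_backup() {
  kind=$1
  grep -E "^${kind}-[0-9]{4}-[0-9]{2}-[0-9]{2}\.json$" | sort | tail -n 1
}

# Example (the awk step assumes the object name is the 4th column of
# `aws s3 ls` output):
# aws s3 ls s3://p24-infra/n8n/ --endpoint-url https://s3.eu-central-2.wasabisys.com \
#   | awk '{print $4}' | latest_backup workflows
```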

Manual backup

# Export workflows via API
curl -H "X-N8N-API-KEY: ${BMS4_N8N_API_KEY}" \
  https://n8n.bms-4.infra.zintegrowana.online/api/v1/workflows > /tmp/n8n-workflows.json
 
# Export credentials (encrypted, no values)
curl -H "X-N8N-API-KEY: ${BMS4_N8N_API_KEY}" \
  https://n8n.bms-4.infra.zintegrowana.online/api/v1/credentials > /tmp/n8n-credentials.json

Restore

Target RTO: 45 minutes (includes workflow re-activation)

Scenario 1: Container crash

ssh ubuntu@54.36.123.110
docker compose -f /opt/bms4-services/docker-compose.yml up -d n8n
# n8n_postgres_data volume preserved — no data loss

Scenario 2: Fresh install (n8n_postgres_data volume lost)

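# 1. Download latest workflow backup from Wasabi
aws s3 cp s3://p24-infra/n8n/workflows-YYYY-MM-DD.json /tmp/n8n-workflows.json \
  --endpoint-url https://s3.eu-central-2.wasabisys.com

# 2. Start fresh n8n stack (PostgreSQL will be initialized on first start)
ssh ubuntu@54.36.123.110 "docker compose -f /opt/bms4-services/docker-compose.yml up -d n8n-postgres redis n8n n8n-worker-1 n8n-worker-2"

# 3. Wait for n8n to be ready
curl -s https://n8n.bms-4.infra.zintegrowana.online/healthz

# 4. Import workflows via API. The API creates one workflow per POST, so loop
#    over the export (assumes the backup has the {"data":[...]} shape returned
#    by GET /api/v1/workflows). A fresh database has no API key: complete the
#    owner setup in the UI and create a new key first.
jq -c '.data[] | {name, nodes, connections, settings}' /tmp/n8n-workflows.json |
while IFS= read -r wf; do
  printf '%s' "$wf" | curl -sS -X POST https://n8n.bms-4.infra.zintegrowana.online/api/v1/workflows \
    -H "X-N8N-API-KEY: ${BMS4_N8N_API_KEY}" \
    -H "Content-Type: application/json" \
    --data-binary @-
done

# 5. Re-enter credentials manually in the n8n UI (cannot be restored from export)

# 6. Activate workflows: POST /api/v1/workflows/<id>/activate for each

# 7. Verify ATRAX sync: check Supabase fleet_positions for new entries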

Note: Credentials (API keys, passwords stored in n8n) must be re-entered manually after a fresh restore. Keep the credential list in docs/elements.md § Credentials Index.


Healthcheck

Docker healthcheck: wget http://localhost:5678/healthz — defined in bms-4/docker-compose.yml

External probe: blackbox-exporter via https://n8n.bms-4.infra.zintegrowana.online/healthz → EndpointDown alert.

Manual check:

curl -s https://n8n.bms-4.infra.zintegrowana.online/healthz
# Expected: {"status":"ok"}

Password Rotation

BMS4_N8N_API_KEY

# 1. Generate new key in n8n UI: Settings → API → Create API Key

# 2. Update /opt/bms4-services/.env on bms-4:
ssh ubuntu@54.36.123.110 "sed -i 's/^BMS4_N8N_API_KEY=.*/BMS4_N8N_API_KEY=<new-key>/' /opt/bms4-services/.env"

# 3. Recreate n8n to pick up new key (a plain restart does not reload .env):
ssh ubuntu@54.36.123.110 "docker compose -f /opt/bms4-services/docker-compose.yml up -d n8n"

# 4. Update GH Secret (BMS4_N8N_API_KEY) and .env.local

# 5. Record the rotation in docs/secrets-rotation-log.md
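
The sed one-liner misbehaves if the new key contains characters that sed treats specially (/, &, or a backslash). A sketch of a delimiter-safe alternative, assuming a POSIX awk is available on the host; set_env_var is a hypothetical helper.

```shell
# Replace (or append) KEY=VALUE in an env file without interpreting VALUE.
# The value travels through the environment, so no sed/awk escaping applies.
# set_env_var is a hypothetical helper.
set_env_var() {
  file=$1
  tmp=$(mktemp) || return 1
  ENV_KEY=$2 ENV_VALUE=$3 awk '
    BEGIN { k = ENVIRON["ENV_KEY"]; v = ENVIRON["ENV_VALUE"] }
    index($0, k "=") == 1 { print k "=" v; found = 1; next }
    { print }
    END { if (!found) print k "=" v }
  ' "$file" > "$tmp" && cat "$tmp" > "$file"   # cat keeps the file's owner and mode
  status=$?
  rm -f "$tmp"
  return "$status"
}

# Example (NEW_KEY holds the key generated in step 1):
# set_env_var /opt/bms4-services/.env BMS4_N8N_API_KEY "$NEW_KEY"
```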

GPS Sync Workflow — atrax, kravag-scheduled-fleet-updates (CCx9UMdphmGficDX)

Production host: bms-4 (n8n.bms-4.infra.zintegrowana.online) — migrated from vps-h1 on 2026-06-15.
DB: PostgreSQL on bms-4 (not SQLite — see docs/n8n-postgresql-operations.md).
Status: Active, production, criticality = high.

Triggers

Name | Type | Schedule / Path | Notes
3 minutes1 | Schedule | Every 3 min | Primary trigger: main fleet sync
Webhook atrax sync | Webhook | /webhook/atrax-sync (prod) / /webhook-test/atrax-sync (test) | On-demand sync
5 morning | Schedule | 05:00 UTC daily | Disabled 2026-06-15; was causing double executions
When clicking 'Execute workflow' | Manual | - | Dev/test only

What it writes

  • p24_gps_current_state — vehicle GPS positions (via upsert_gps_current_state node)
  • p24_l_cars_atrax — vehicle metadata (via update-atrax-pojazdy-w car-atrax node)
  • p24_l_cars — mileage via rpc/update_atrax_rt_batch; driver info via rpc/update_atrax_drivers_batch (both single batch calls per cycle — see issue #629)

Freshness is monitored by .github/workflows/atrax-data-freshness.yml (runs every 10 min).
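
The freshness rule reduces to a timestamp comparison. A minimal sketch, assuming the 10-minute threshold used in the Troubleshooting table; is_stale is a hypothetical helper and says nothing about how atrax-data-freshness.yml is actually implemented.

```shell
# Succeed (exit 0) when the last update is older than max_age seconds.
# Arguments are Unix epoch seconds; is_stale is a hypothetical helper.
is_stale() {
  last_update=$1
  now=$2
  max_age=${3:-600}   # 10 minutes
  [ $((now - last_update)) -gt "$max_age" ]
}

# Example: a position written 15 minutes ago is stale.
now=$(date +%s)
if is_stale $((now - 900)) "$now"; then
  echo "GPS data stale"
fi
```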

Batch update pattern (IMPORTANT — do not revert)

All writes to p24_l_cars MUST use batch RPCs, never individual row upserts via REST API. Individual POST /rest/v1/p24_l_cars?on_conflict=... calls (one per vehicle) caused AccessExclusiveLock contention blocking all reads (issue #629, 2026-06-17).

Node | RPC called | Fields written
update atraxId na rej floty by plate1 | rpc/update_atrax_rt_batch | current_mileage, km_upd_date
batch_drivers_rpc | rpc/update_atrax_drivers_batch | atrax_driver1, atrax_driver1_name, atrax_driver2, atrax_driver2_name

The aggregate_driver_updates node (Aggregate type) collapses all per-vehicle items into a single item before batch_drivers_rpc runs, ensuring exactly one HTTP call per sync cycle regardless of fleet size.
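
The aggregation step can be illustrated outside n8n: many per-vehicle objects go in, one request body comes out. A minimal sketch in plain shell; to_batch_payload is a hypothetical helper, the wrapper key vehicles is an assumption borrowed from the old function signature listed under Known failure modes, and the plate field in the example is illustrative only.

```shell
# Collapse newline-delimited JSON objects on stdin into a single
# {"<key>":[...]} body, so one HTTP call can carry the whole fleet.
# to_batch_payload is a hypothetical helper; it does not validate JSON.
to_batch_payload() {
  key=$1
  items=$(paste -sd, -)   # join all input lines with commas
  printf '{"%s":[%s]}\n' "$key" "$items"
}

printf '%s\n' \
  '{"plate":"AB123","current_mileage":120500}' \
  '{"plate":"CD456","current_mileage":98010}' \
  | to_batch_payload vehicles
# prints {"vehicles":[{"plate":"AB123","current_mileage":120500},{"plate":"CD456","current_mileage":98010}]}
```

The request count stays at one regardless of how many lines are fed in, which is the property the batch RPC pattern relies on.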

Known failure modes (discovered 2026-05-19, issue #201)

Failure | Symptom | Fix
Workflow deactivated in n8n | active=0 in the n8n database (SQLite at the time; now PostgreSQL), no executions | See “Re-activating a workflow” below
activeVersionId NULL in the n8n database (SQLite at the time; now PostgreSQL) | Workflow shows active=1 but never appears in startup activation log | See “Re-activating a workflow” below
Missing gps_atrax_installed column on p24_l_cars | supabase upsert query node fails, GPS branch never runs | ALTER TABLE p24_l_cars ADD COLUMN IF NOT EXISTS gps_atrax_installed boolean DEFAULT false
Ambiguous update_atrax_rt_batch function overload | PGRST203 error on update atraxId na rej floty by plate1 node | Drop old single-arg overload: DROP FUNCTION public.update_atrax_rt_batch(vehicles jsonb)
Missing status_atrax column on p24_l_cars | upsert_gps_current_state fails; trigger sync_gps_state_to_cars references it | ALTER TABLE p24_l_cars ADD COLUMN IF NOT EXISTS status_atrax text

Re-activating a workflow after deactivation

If n8n was restarted and the startup log does NOT show Activated workflow "atrax, kravag-scheduled-fleet-updates":

# SSH to bms-4 as ubuntu (n8n now on bms-4, using PostgreSQL — not SQLite)
ssh ubuntu@54.36.123.110
 
# Check active state via n8n API
curl -s -H "X-N8N-API-KEY: ${BMS4_N8N_API_KEY}" \
  https://n8n.bms-4.infra.zintegrowana.online/api/v1/workflows/CCx9UMdphmGficDX \
  | jq '{id,name,active}'
 
# Re-activate via API (the public API uses a dedicated activate endpoint)
curl -X POST \
  -H "X-N8N-API-KEY: ${BMS4_N8N_API_KEY}" \
  https://n8n.bms-4.infra.zintegrowana.online/api/v1/workflows/CCx9UMdphmGficDX/activate
 
# Or restart the n8n container on bms-4
docker compose -f /opt/bms4-services/docker-compose.yml restart n8n
 
# Verify: check executions appear
curl -s -H "X-N8N-API-KEY: ${BMS4_N8N_API_KEY}" \
  "https://n8n.bms-4.infra.zintegrowana.online/api/v1/executions?workflowId=CCx9UMdphmGficDX&limit=3" \
  | jq '[.data[] | {id,status,startedAt}]'

n8n 2.x Upgrade (2026-05-24, issue #220)

What changed

n8n was upgraded from 1.123.44 to 2.22.0 to fix two critical CVEs:

CVE | CVSS | Description | Fixed in
CVE-2025-68668 | 9.9 | Python Code Node RCE via Pyodide | n8n 2.0.0
CVE-2026-25115 | 10.0 | Python sandbox escape | n8n 2.4.8

Both CVEs required the Python Code Node (Pyodide-based). None of our workflows use Python — all Code nodes use JavaScript — so the real-world exposure was zero, but the CVSS scores demanded the upgrade.
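
Checking a deployed tag against a "Fixed in" version needs a numeric comparison, because a plain string comparison would rank 2.4.8 above 2.22.0. A minimal sketch for dotted numeric versions without pre-release suffixes; version_ge is a hypothetical helper.

```shell
# Succeed (exit 0) when version $1 >= version $2.
# Compares dot-separated fields numerically, left to right; missing
# fields count as 0. version_ge is a hypothetical helper.
version_ge() {
  a=$1
  b=$2
  while [ -n "$a" ] || [ -n "$b" ]; do
    x=${a%%.*}
    y=${b%%.*}
    # Drop the field just read; clear the string when it was the last one.
    [ "$a" = "$x" ] && a= || a=${a#*.}
    [ "$b" = "$y" ] && b= || b=${b#*.}
    x=${x:-0}
    y=${y:-0}
    [ "$x" -gt "$y" ] && return 0
    [ "$x" -lt "$y" ] && return 1
  done
  return 0
}

version_ge 2.22.0 2.4.8 && echo "2.22.0 includes the 2.4.8 fix"
```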

Workflow compatibility audit

All workflows were audited before upgrade. All Code nodes use JavaScript (not Python):

Workflow | Code nodes | Language | 2.x compatible
wa-router | 2 | JavaScript | Yes
wa-ai-to-inbox | 3 | JavaScript | Yes
wa-group-sync | 1 | JavaScript | Yes
wa-processing-watchdog | 1 | JavaScript | Yes
fleet-update-v2-batch | 4 | JavaScript | Yes
alertmanager-to-incidents | 0 | - | Yes
sentry-to-github | 2 | JavaScript | Yes
github-auto-trigger | 1 | JavaScript | Yes
hu-sp-report-email | 1 | JavaScript | Yes

n8n 2.x breaking changes (1.x → 2.x)

Key breaking changes in n8n 2.0.0 relevant to our stack:

  • Python Code Node removed — Pyodide sandbox dropped entirely. Not used in our workflows.
  • N8N_BASIC_AUTH_ACTIVE deprecated: no impact on our setup, which authenticates API calls with the X-N8N-API-KEY header rather than basic auth. N8N_COMMUNITY_PACKAGES_ALLOW_TOOL_USAGE is a separate setting for community nodes, not a replacement for it.
  • SQLite schema migration — n8n 2.x auto-migrates the SQLite DB on first start. Expect ~10–30s extra startup time. Migration is non-destructive.
  • Execution data schema: executionData column type changed. Old executions remain readable but storage format differs.
  • Webhook node v2 — our workflows already use typeVersion: 2 (confirmed in audit above).
  • httpRequest node v4.2 — our workflows use this version (confirmed in audit above). No changes needed.
  • Code node v2 — our Code nodes use typeVersion: 2. Compatible with 2.x.

Deployment procedure

n8n 2.x upgrade is performed via a standard image tag bump + rolling restart:

# On bms-4 (PostgreSQL backend, so no SQLite migration is needed)
ssh ubuntu@54.36.123.110
cd /opt/bms4-services
docker compose pull n8n n8n-worker-1 n8n-worker-2
docker compose up -d n8n n8n-worker-1 n8n-worker-2

# Monitor startup logs for workflow activation (Ctrl-C to stop following)
docker compose logs -f n8n | grep -E "Activated|error|ERROR"

# Verify healthz (port 5678 is internal only, so go through the public URL)
curl -s https://n8n.bms-4.infra.zintegrowana.online/healthz
# Expected: {"status":"ok"}

# Verify all workflows are active
curl -s -H "X-N8N-API-KEY: ${BMS4_N8N_API_KEY}" \
  https://n8n.bms-4.infra.zintegrowana.online/api/v1/workflows | jq '[.data[] | {name,active}]'

Post-upgrade checklist

  • n8n container starts and /healthz returns {"status":"ok"}
  • All workflows show active: true in API response
  • wa-router webhook receives test event from WAHA
  • fleet-update-v2-batch (ATRAX sync) runs on schedule — check fleet_positions freshness
  • Trivy scan passes clean (CVE-2025-68668 and CVE-2026-25115 no longer reported)

Rollback procedure

If issues arise, roll back to previous version:

# On bms-4
ssh ubuntu@54.36.123.110
# Set the image tag back to the previous version in /opt/bms4-services/docker-compose.yml
# (and revert bms-4/docker-compose.yml in the repo so the two stay in sync)
cd /opt/bms4-services
docker compose up -d n8n n8n-worker-1 n8n-worker-2

# If rollback causes DB errors, restore from last Wasabi backup:
# aws s3 cp s3://p24-infra/n8n/workflows-YYYY-MM-DD.json /tmp/ \
#   --endpoint-url https://s3.eu-central-2.wasabisys.com

Troubleshooting

Symptom | Cause | Fix
GPS positions stale (>10min) | ATRAX workflow deactivated or DB schema mismatch | See “Known failure modes” under GPS Sync Workflow above; check n8n startup log
GPS positions stale (>10min) | ATRAX API unreachable | Check get_token node output in latest execution; test ATRAX API from bms-4
WhatsApp messages not routing | wa-router webhook timeout | Check WAHA logs on vps-h1; verify webhook URL points to n8n.bms-4
n8n UI unreachable | Traefik or container down | ssh ubuntu@54.36.123.110 "docker compose -f /opt/bms4-services/docker-compose.yml ps"
N8N_ENCRYPTION_KEY changed | Stored credentials can no longer be decrypted | Restore the original key; if it is lost, re-enter all credentials manually
n8n crashes on startup | N8N_DB_PASSWORD not set | Verify N8N_DB_PASSWORD in /opt/bms4-services/.env on bms-4