n8n — Operations Workbook

n8n workflow automation running on OVH bms-4 (n8n.bms-4.infra.zintegrowana.online).

Migration (2026-06-15): All workflows moved from vps-h1 → bms-4 (PostgreSQL backend, Traefik TLS). vps-h1 n8n containers stopped and removed — vps-h1 now runs only WAHA + Traefik + monitoring exporters.

Architecture

OVH bms-4 (54.36.123.110)
├── Container: n8n                    image: docker.n8n.io/n8nio/n8n:2.26.3
│   ├── port 5678                    web UI + webhook receiver (internal only)
│   ├── EXECUTIONS_MODE=queue        enqueues jobs to Redis; does NOT execute them
│   ├── DB: n8n-postgres container
│   └── Redis: redis container (job broker)
│
├── Container: n8n-worker-1           image: docker.n8n.io/n8nio/n8n:2.26.3
│   ├── port 5679                    Prometheus scrape endpoint
│   ├── cpus: 1.5 · N8N_RUNNERS_MAX_CONCURRENCY=3
│   └── dequeues jobs from Redis, executes, writes results to PostgreSQL
│
├── Container: n8n-worker-2           image: docker.n8n.io/n8nio/n8n:2.26.3
│   ├── port 5680                    Prometheus scrape endpoint
│   ├── cpus: 1.5 · N8N_RUNNERS_MAX_CONCURRENCY=3
│   └── same as n8n-worker-1
│
├── Container: redis                  image: redis:7-alpine
│   ├── AOF persistence (appendonly yes)
│   ├── maxmemory 256mb (noeviction policy)
│   └── password-protected (REDIS_PASSWORD)
│
├── Container: redis-exporter         image: oliver006/redis_exporter:v1.67.0
│   └── port 9121                    Prometheus scrape (vps-i1 → 54.36.123.110:9121)
│
└── Container: traefik (bms-4)        TLS termination → n8n:5678

Public URL: https://n8n.bms-4.infra.zintegrowana.online
Compose file on server: /opt/bms4-services/docker-compose.yml
Compose file in repo: bms-4/docker-compose.yml (canonical — keep in sync)

Active Workflows

Workflow	Trigger	Purpose
ATRAX GPS Sync	Cron every 15min	Fetches vehicle positions from ATRAX API → Supabase `fleet_positions`
wa-router	Webhook (WAHA)	Routes WhatsApp messages to AI / Supabase
wa-ai-to-inbox	Webhook	WhatsApp AI reply generation
wa-processing-watchdog	Cron every 5min	Alerts if WhatsApp processing stalls
GPS Sync Watchdog	Cron every 30min	Alerts if last GPS sync >30min ago
Daily Fleet Report	Cron 07:00 UTC	Generates and emails daily fleet status report

Config Management

File	Location	In repo?	Contains secrets?
`docker-compose.yml`	`/opt/bms4-services/docker-compose.yml`	✅ `bms-4/docker-compose.yml`	No
PostgreSQL data	Docker volume `n8n_postgres_data` on bms-4	❌	Yes (credential vault)
n8n workflow exports	`infra-src/n8n-workflows/*.json`	✅	No (credentials stripped)

Secrets injected at runtime

n8n credentials are stored encrypted in PostgreSQL (AES-256 via N8N_ENCRYPTION_KEY).
Key env vars in /opt/bms4-services/.env on bms-4:

Variable	Purpose
`N8N_ENCRYPTION_KEY`	Master encryption key for credential vault — must be set in shell env on bms-4, not just in docker-compose.yml, so workers can decrypt credentials at startup
`BMS4_N8N_API_KEY`	n8n REST API key (external access)
`N8N_DB_PASSWORD`	PostgreSQL password for n8n database
`REDIS_PASSWORD`	Redis auth password (shared by main + workers + redis-exporter)
`QUEUE_BULL_REDIS_HOST`	Redis hostname — `redis` (Docker network)
`QUEUE_BULL_REDIS_PORT`	Redis port — `6379`
`QUEUE_BULL_REDIS_PASSWORD`	Same value as `REDIS_PASSWORD`
`QUEUE_BULL_REDIS_DB`	Redis DB index — `0`

N8N_ENCRYPTION_KEY advisory: Workers read this key directly from the shell environment when they start. If it is only declared inside docker-compose.yml environment: and not exported in the host shell, workers may fail to decrypt credentials and executions will error. Always ensure it is present in /opt/bms4-services/.env and exported in the session before running docker compose up.
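
A preflight check along these lines could catch a missing key before the workers start. This is a minimal sketch: `require_env_vars` is a hypothetical helper (not part of n8n or Docker), and it only confirms that each variable has a non-empty assignment in the file, not that the value is correct or exported.

```shell
# Sketch: fail early if required variables are missing from an env file.
# require_env_vars is a hypothetical helper name.
require_env_vars() {
  local env_file="$1"; shift
  local name missing=0
  for name in "$@"; do
    # Match NAME=<at least one character> at the start of a line.
    if ! grep -Eq "^${name}=.+" "$env_file"; then
      echo "missing or empty: ${name}" >&2
      missing=1
    fi
  done
  return "$missing"
}

# Example (on bms-4):
#   require_env_vars /opt/bms4-services/.env \
#     N8N_ENCRYPTION_KEY N8N_DB_PASSWORD REDIS_PASSWORD \
#     && docker compose up -d
```

Chaining the check with `&&` means `docker compose up -d` only runs when every listed variable is present.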

Queue Mode

Added 2026-06-15. n8n runs in queue mode (EXECUTIONS_MODE=queue) with Redis as the job broker.

Why queue mode

Single-container n8n was blocking on long-running workflows (Atrax GPS sync, WhatsApp AI, daily report). Queue mode offloads execution to dedicated workers, keeping the main container responsive to webhooks.

Capacity

Component	Count	CPU limit	Max concurrent jobs
n8n main	1	0.5	0 (webhook + scheduling only)
n8n-worker-1	1	1.5	3 (`N8N_RUNNERS_MAX_CONCURRENCY=3`)
n8n-worker-2	1	1.5	3
Total	—	—	6 concurrent executions

Redis configuration

Setting	Value	Reason
Image	`redis:7-alpine`	Stable 7.x release line
Persistence	AOF (`appendonly yes`)	Job queue survives container restart
Max memory	`256mb` (noeviction)	Prevents OOM; noeviction keeps queue intact
Auth	`requirepass ${REDIS_PASSWORD}`	Required — all clients pass password

Scaling workers

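To add a third worker, clone the n8n-worker-2 block in bms-4/docker-compose.yml, rename the service to n8n-worker-3, assign a unique host port (e.g. 5681:5678), sync the file to /opt/bms4-services/, and run docker compose up -d n8n-worker-3. With the same concurrency of 3, total capacity rises to 9 concurrent executions.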

Checking worker health

ssh ubuntu@54.36.123.110
cd /opt/bms4-services
 
# All containers running?
docker compose ps
 
# Worker logs
docker compose logs --tail=50 n8n-worker-1
docker compose logs --tail=50 n8n-worker-2
 
# Redis queue depth
docker compose exec redis redis-cli -a "${REDIS_PASSWORD}" llen bull:jobs:wait
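
To put a queue depth in context, it can be compared against the capacity table. The sketch below is an illustration only: `drain_rounds` is a hypothetical helper, and its defaults (2 workers, 3 jobs each) are taken from the capacity table above.

```shell
# Sketch: how many full "rounds" of worker slots a backlog represents.
# drain_rounds is a hypothetical helper; defaults mirror the capacity
# table (2 workers x 3 concurrent jobs = 6 slots).
drain_rounds() {
  local depth="$1" workers="${2:-2}" per_worker="${3:-3}"
  local slots=$(( workers * per_worker ))
  # Ceiling division: a partly filled round still occupies the workers.
  echo $(( (depth + slots - 1) / slots ))
}

# Example (on bms-4, REDIS_PASSWORD exported in the shell):
#   depth="$(docker compose exec -T redis redis-cli -a "${REDIS_PASSWORD}" \
#     --no-auth-warning llen bull:jobs:wait)"
#   echo "backlog: ${depth} jobs, about $(drain_rounds "${depth}") rounds"
```

A backlog that stays above one or two rounds for long is a hint that a third worker may be worth adding.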

Troubleshooting queue mode

Symptom	Cause	Fix
Workers exit immediately	`N8N_ENCRYPTION_KEY` not in shell env	Export key in host shell; `docker compose up -d` again
Executions stuck in “running”	Worker crashed mid-job	Restart workers; Redis retains job state
Redis OOM	Queue flooded	Deactivate the runaway workflow first; `docker compose restart redis` alone does not empty the queue, because AOF reloads it on restart
Worker can’t connect to Redis	`REDIS_PASSWORD` mismatch	Verify `.env` has matching `REDIS_PASSWORD`; restart all

Deployment

Fresh install

# On bms-4 — assumes docker-compose.yml and .env are already deployed
ssh ubuntu@54.36.123.110
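# Queue mode needs Redis and the workers as well as the main container
docker compose -f /opt/bms4-services/docker-compose.yml up -d n8n-postgres redis n8n n8n-worker-1 n8n-worker-2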

Update n8n version

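Update the image tag in bms-4/docker-compose.yml (n8n, n8n-worker-1, and n8n-worker-2 must all use the same tag)
Commit and push

On bms-4:

ssh ubuntu@54.36.123.110
cd /opt/bms4-services
docker compose pull n8n n8n-worker-1 n8n-worker-2
docker compose up -d n8n n8n-worker-1 n8n-worker-2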

Verify: curl -s https://n8n.bms-4.infra.zintegrowana.online/healthz
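
Because the main container and both workers run the same image in the architecture above, a quick consistency check on the compose file can help after a tag bump. This is a sketch under assumptions: `n8n_image_tags` and `n8n_tags_consistent` are hypothetical helpers, and they assume each image line has the form `image: docker.n8n.io/n8nio/n8n:<tag>`.

```shell
# Sketch: list the distinct n8n image tags found in a compose file.
# Both function names are hypothetical helpers.
n8n_image_tags() {
  grep -E '^[[:space:]]*image:[[:space:]]*docker\.n8n\.io/n8nio/n8n:' "$1" \
    | sed -E 's#.*n8nio/n8n:([^[:space:]"]+).*#\1#' \
    | sort -u
}

# Succeeds only when every n8n service uses exactly one shared tag.
n8n_tags_consistent() {
  [ "$(n8n_image_tags "$1" | wc -l)" -eq 1 ]
}

# Example:
#   n8n_tags_consistent bms-4/docker-compose.yml || echo "n8n tags differ" >&2
```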

Import a workflow

# Via REST API (no UI needed)
curl -X POST https://n8n.bms-4.infra.zintegrowana.online/api/v1/workflows \
  -H "X-N8N-API-KEY: ${BMS4_N8N_API_KEY}" \
  -H "Content-Type: application/json" \
  -d @infra-src/n8n-workflows/wa-router.json

Backup

What needs backing up

Data	Method	Schedule	Destination
Workflow definitions	`n8n-backup.yml` GH Action → Wasabi	Nightly 03:00 UTC	`s3://p24-infra/n8n/workflows-YYYY-MM-DD.json`
n8n credentials (encrypted)	`n8n-backup.yml` → Wasabi	Nightly	`s3://p24-infra/n8n/credentials-YYYY-MM-DD.json`
PostgreSQL DB (`n8n_postgres_data`)	Not directly backed up	—	Gap: execution history and credential values are lost with the volume; workflow definitions are covered by the API export

Note: Workflows and credentials are exported via REST API nightly. If the PostgreSQL volume is lost, workflows can be re-imported but execution history will be gone. Credentials must be re-entered manually (values are never exported).

Manual backup

# Export workflows via API
curl -H "X-N8N-API-KEY: ${BMS4_N8N_API_KEY}" \
  https://n8n.bms-4.infra.zintegrowana.online/api/v1/workflows > /tmp/n8n-workflows.json
 
# Export credentials (encrypted, no values)
curl -H "X-N8N-API-KEY: ${BMS4_N8N_API_KEY}" \
  https://n8n.bms-4.infra.zintegrowana.online/api/v1/credentials > /tmp/n8n-credentials.json
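
If a manual export should follow the same naming as the nightly job, the object key can be built as below. A minimal sketch: `backup_key` is a hypothetical helper, and the bucket and file pattern are copied from the backup table above rather than from `n8n-backup.yml` itself.

```shell
# Sketch: build the dated object key used for backups in Wasabi.
# backup_key is a hypothetical helper name.
backup_key() {
  local kind="$1"                    # "workflows" or "credentials"
  local day="${2:-$(date -u +%F)}"   # defaults to today's date in UTC
  printf 's3://p24-infra/n8n/%s-%s.json\n' "$kind" "$day"
}

# Example:
#   aws s3 cp /tmp/n8n-workflows.json "$(backup_key workflows)" \
#     --endpoint-url https://s3.eu-central-2.wasabisys.com
```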

Restore

Target RTO: 45 minutes (includes workflow re-activation)

Scenario 1: Container crash

ssh ubuntu@54.36.123.110
docker compose -f /opt/bms4-services/docker-compose.yml up -d n8n
# n8n_postgres_data volume preserved — no data loss

Scenario 2: Fresh install (n8n_postgres_data volume lost)

# 1. Download latest workflow backup from Wasabi
aws s3 cp s3://p24-infra/n8n/workflows-YYYY-MM-DD.json /tmp/ \
  --endpoint-url https://s3.eu-central-2.wasabisys.com
 
# 2. Start fresh n8n (PostgreSQL will be initialized on first start)
ssh ubuntu@54.36.123.110
docker compose -f /opt/bms4-services/docker-compose.yml up -d n8n-postgres redis n8n n8n-worker-1 n8n-worker-2
 
# 3. Wait for n8n to be ready
curl -s https://n8n.bms-4.infra.zintegrowana.online/healthz
 
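# 4. Import workflows via API (same endpoint as "Import a workflow" above).
#    A fresh database has no API keys: create the owner account and a new
#    API key in the UI first. The endpoint accepts one workflow per request,
#    so split a multi-workflow export before posting.
curl -X POST https://n8n.bms-4.infra.zintegrowana.online/api/v1/workflows \
  -H "X-N8N-API-KEY: ${BMS4_N8N_API_KEY}" \
  -H "Content-Type: application/json" \
  -d @/tmp/workflows-YYYY-MM-DD.json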
 
# 5. Re-enter credentials manually in the n8n UI (cannot be restored from export)
 
# 6. Activate each workflow (UI toggle, or POST /api/v1/workflows/<id>/activate)
 
# 7. Verify ATRAX sync: check Supabase fleet_positions for new entries

Note: Credentials (API keys, passwords stored in n8n) must be re-entered manually after a fresh restore. Keep the credential list in docs/elements.md § Credentials Index.
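
Step 3 of the fresh-install restore issues a single request, which can fail while n8n is still starting. A polling loop along these lines may be more robust. It is a generic sketch: `wait_until` is a hypothetical helper, and the attempt count and delay in the example are arbitrary choices.

```shell
# Sketch: retry a command until it succeeds or the attempts run out.
# wait_until is a hypothetical helper name.
wait_until() {
  local attempts="$1" delay="$2"; shift 2
  local i=1
  while [ "$i" -le "$attempts" ]; do
    if "$@"; then
      return 0
    fi
    sleep "$delay"
    i=$(( i + 1 ))
  done
  return 1
}

# Example: poll /healthz for up to 5 minutes (30 tries, 10 s apart).
#   wait_until 30 10 curl -fsS -o /dev/null \
#     https://n8n.bms-4.infra.zintegrowana.online/healthz
```

The `-f` flag makes curl return a non-zero status on HTTP errors, so a 502 from Traefik during startup counts as "not ready".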

Healthcheck

Docker healthcheck: wget http://localhost:5678/healthz — defined in bms-4/docker-compose.yml

External probe: blackbox-exporter via https://n8n.bms-4.infra.zintegrowana.online/healthz — EndpointDown alert.

Manual check:

curl -s https://n8n.bms-4.infra.zintegrowana.online/healthz
# Expected: {"status":"ok"}
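
For scripted checks, the response body can be compared against the expected value rather than read by eye. A small sketch, assuming the body is exactly the JSON shown above apart from whitespace; `healthz_ok` is a hypothetical helper.

```shell
# Sketch: succeed only if a /healthz response body reports status "ok".
# healthz_ok is a hypothetical helper name.
healthz_ok() {
  local body
  # Strip all whitespace so formatting differences do not matter.
  body="$(printf '%s' "$1" | tr -d '[:space:]')"
  [ "$body" = '{"status":"ok"}' ]
}

# Example:
#   healthz_ok "$(curl -s https://n8n.bms-4.infra.zintegrowana.online/healthz)" \
#     && echo "n8n healthy"
```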

Password Rotation

BMS4_N8N_API_KEY

# 1. Generate new key in n8n UI: Settings → API → Create API Key
 
# 2. Update /opt/bms4-services/.env on bms-4:
ssh ubuntu@54.36.123.110 "sed -i 's/BMS4_N8N_API_KEY=.*/BMS4_N8N_API_KEY=<new-key>/' /opt/bms4-services/.env"
 
# 3. Recreate n8n so it picks up the updated .env (restart alone does not re-read it):
ssh ubuntu@54.36.123.110 "docker compose -f /opt/bms4-services/docker-compose.yml up -d n8n"
 
# 4. Update GH Secret (BMS4_N8N_API_KEY) and .env.local
 
# 5. Record the rotation in docs/secrets-rotation-log.md
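
The `sed` substitution in step 2 needs care if the new value ever contains `/` or `&`, which `sed` treats specially. One alternative is to rewrite the line without `sed`. This is a sketch only: `set_env_var` is a hypothetical helper, and it moves the variable to the end of the file.

```shell
# Sketch: replace or append NAME=value in an env file without sed, so
# characters such as "/" or "&" in the value need no escaping.
# set_env_var is a hypothetical helper name.
set_env_var() {
  local env_file="$1" name="$2" value="$3" tmp
  tmp="$(mktemp)" || return 1
  # Keep every line except an existing assignment of NAME.
  grep -v "^${name}=" "$env_file" > "$tmp" || true
  printf '%s=%s\n' "$name" "$value" >> "$tmp"
  # Overwrite in place so the file keeps its owner and permissions.
  cat "$tmp" > "$env_file" && rm -f "$tmp"
}

# Example (on bms-4):
#   set_env_var /opt/bms4-services/.env BMS4_N8N_API_KEY '<new-key>'
```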

GPS Sync Workflow — atrax, kravag-scheduled-fleet-updates (CCx9UMdphmGficDX)

Production host: bms-4 (n8n.bms-4.infra.zintegrowana.online) — migrated from vps-h1 on 2026-06-15.
DB: PostgreSQL on bms-4 (not SQLite — see docs/n8n-postgresql-operations.md).
Status: Active, production, criticality = high.

Triggers

Name	Type	Schedule / Path	Notes
`3 minutes1`	Schedule	Every 3 min	Primary trigger — main fleet sync
`Webhook atrax sync`	Webhook	`/webhook/atrax-sync` (prod) / `/webhook-test/atrax-sync` (test)	On-demand sync
`5 morning`	Schedule	05:00 UTC daily	Disabled 2026-06-15 — was causing double executions
`When clicking 'Execute workflow'`	Manual	—	Dev/test only

What it writes

p24_gps_current_state — vehicle GPS positions (via upsert_gps_current_state node)
p24_l_cars_atrax — vehicle metadata (via update-atrax-pojazdy-w car-atrax node)
p24_l_cars — mileage via rpc/update_atrax_rt_batch; driver info via rpc/update_atrax_drivers_batch (both single batch calls per cycle — see issue #629)

Freshness is monitored by .github/workflows/atrax-data-freshness.yml (runs every 10 min).
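
For a manual spot check, the age of the latest sync can be compared against a threshold. The sketch below is not the logic of atrax-data-freshness.yml; it is an illustration that assumes a UTC ISO 8601 timestamp ending in `Z` (for example an execution's `startedAt`), a 10-minute default threshold, and GNU `date`. `is_stale` is a hypothetical helper.

```shell
# Sketch: succeed when a timestamp is older than a threshold in minutes.
# is_stale is a hypothetical helper; it relies on GNU date (-d).
is_stale() {
  local ts="$1" max_minutes="${2:-10}" now="${3:-$(date -u +%s)}"
  local then_s
  # Drop fractional seconds ("...:00.123Z" -> "...:00Z") before parsing.
  case "$ts" in
    *.*) ts="${ts%%.*}Z" ;;
  esac
  then_s="$(date -u -d "$ts" +%s)" || return 2
  [ $(( now - then_s )) -gt $(( max_minutes * 60 )) ]
}

# Example:
#   is_stale "2026-06-17T10:00:00.000Z" && echo "GPS data is stale" >&2
```

The optional third argument overrides "now" with an epoch value, which makes the check reproducible in tests.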

Batch update pattern (IMPORTANT — do not revert)

All writes to p24_l_cars MUST use batch RPCs, never individual row upserts via REST API. Individual POST /rest/v1/p24_l_cars?on_conflict=... calls (one per vehicle) caused AccessExclusiveLock contention blocking all reads (issue #629, 2026-06-17).

Node	RPC called	Fields written
`update atraxId na rej floty by plate1`	`rpc/update_atrax_rt_batch`	`current_mileage`, `km_upd_date`
`batch_drivers_rpc`	`rpc/update_atrax_drivers_batch`	`atrax_driver1`, `atrax_driver1_name`, `atrax_driver2`, `atrax_driver2_name`

The aggregate_driver_updates node (Aggregate type) collapses all per-vehicle items into a single item before batch_drivers_rpc runs, ensuring exactly one HTTP call per sync cycle regardless of fleet size.
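
The same collapse-then-send idea can be illustrated outside n8n. This is a sketch, not the workflow's actual payload: `aggregate_items` is a hypothetical helper, the `items` key is an assumption (the real RPC parameter name is not shown here), and each input line is assumed to be one complete JSON object.

```shell
# Sketch: collapse many per-vehicle JSON objects (one per line on stdin)
# into a single payload, so only one HTTP request is needed per cycle.
# aggregate_items and the "items" key are hypothetical.
aggregate_items() {
  local line joined=""
  while IFS= read -r line || [ -n "$line" ]; do
    [ -n "$line" ] || continue
    joined="${joined:+$joined,}$line"
  done
  printf '{"items":[%s]}\n' "$joined"
}

# Example: three vehicles in, one request body out.
#   printf '%s\n' '{"plate":"A"}' '{"plate":"B"}' '{"plate":"C"}' | aggregate_items
```

However many lines go in, exactly one body comes out, which is the property the batch RPCs rely on.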

Known failure modes (discovered 2026-05-19, issue #201)

Failure	Symptom	Fix
Workflow deactivated in n8n	Workflow inactive in the n8n DB (`active=0` under SQLite, `false` under PostgreSQL), no executions	See “Re-activating a workflow” below
`activeVersionId` NULL in the n8n DB	Workflow shows as active but never appears in startup activation log	See “Re-activating a workflow” below
Missing `gps_atrax_installed` column on `p24_l_cars`	`supabase upsert query` node fails, GPS branch never runs	`ALTER TABLE p24_l_cars ADD COLUMN IF NOT EXISTS gps_atrax_installed boolean DEFAULT false`
Ambiguous `update_atrax_rt_batch` function overload	PGRST203 error on `update atraxId na rej floty by plate1` node	Drop old single-arg overload: `DROP FUNCTION public.update_atrax_rt_batch(vehicles jsonb)`
Missing `status_atrax` column on `p24_l_cars`	`upsert_gps_current_state` fails — trigger `sync_gps_state_to_cars` references it	`ALTER TABLE p24_l_cars ADD COLUMN IF NOT EXISTS status_atrax text`

Re-activating a workflow after deactivation

If n8n was restarted and the startup log does NOT show Activated workflow "atrax, kravag-scheduled-fleet-updates":

# SSH to bms-4 as ubuntu (n8n now on bms-4, using PostgreSQL — not SQLite)
ssh ubuntu@54.36.123.110
 
# Check active state via n8n API
curl -s -H "X-N8N-API-KEY: ${BMS4_N8N_API_KEY}" \
  https://n8n.bms-4.infra.zintegrowana.online/api/v1/workflows/CCx9UMdphmGficDX \
  | jq '{id,name,active}'
 
# Re-activate via API (the public API exposes a dedicated activate endpoint)
curl -X POST \
  -H "X-N8N-API-KEY: ${BMS4_N8N_API_KEY}" \
  https://n8n.bms-4.infra.zintegrowana.online/api/v1/workflows/CCx9UMdphmGficDX/activate
 
# Or restart the n8n container on bms-4
docker compose -f /opt/bms4-services/docker-compose.yml restart n8n
 
# Verify: check executions appear
curl -s -H "X-N8N-API-KEY: ${BMS4_N8N_API_KEY}" \
  "https://n8n.bms-4.infra.zintegrowana.online/api/v1/executions?workflowId=CCx9UMdphmGficDX&limit=3" \
  | jq '[.data[] | {id,status,startedAt}]'

n8n 2.x Upgrade (2026-05-24, issue #220)

What changed

n8n was upgraded from 1.123.44 to 2.22.0 to fix two critical CVEs:

CVE	CVSS	Description	Fixed in
CVE-2025-68668	9.9	Python Code Node RCE via Pyodide	n8n 2.0.0
CVE-2026-25115	10.0	Python sandbox escape	n8n 2.4.8

Both CVEs required the Python Code Node (Pyodide-based). None of our workflows use Python — all Code nodes use JavaScript — so the real-world exposure was zero, but the CVSS scores demanded the upgrade.
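
When checking a running version against the "Fixed in" column, a plain string comparison is misleading: "2.22.0" sorts before "2.4.8" as text. A version-aware comparison avoids that. This is a minimal sketch that assumes GNU `sort -V`; `version_ge` is a hypothetical helper.

```shell
# Sketch: succeed when version $1 is greater than or equal to version $2.
# version_ge is a hypothetical helper; it relies on GNU sort -V.
version_ge() {
  [ "$(printf '%s\n%s\n' "$1" "$2" | sort -V | tail -n 1)" = "$1" ]
}

# Example: is the deployed version past both fixes?
#   version_ge 2.22.0 2.0.0 && version_ge 2.22.0 2.4.8 && echo "patched"
```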

Workflow compatibility audit

All workflows were audited before upgrade. All Code nodes use JavaScript (not Python):

Workflow	Code nodes	Language	2.x compatible
wa-router	2	JavaScript	Yes
wa-ai-to-inbox	3	JavaScript	Yes
wa-group-sync	1	JavaScript	Yes
wa-processing-watchdog	1	JavaScript	Yes
fleet-update-v2-batch	4	JavaScript	Yes
alertmanager-to-incidents	0	—	Yes
sentry-to-github	2	JavaScript	Yes
github-auto-trigger	1	JavaScript	Yes
hu-sp-report-email	1	JavaScript	Yes

n8n 2.x breaking changes (1.x → 2.x)

Key breaking changes in n8n 2.0.0 relevant to our stack:

Pyodide-based Python Code Node removed — the Pyodide sandbox was dropped entirely. Not used in our workflows.
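N8N_BASIC_AUTH_ACTIVE deprecated — the variable has no effect in 2.x and can be removed from the environment. N8N_COMMUNITY_PACKAGES_ALLOW_TOOL_USAGE is an unrelated community-node setting, not a replacement. API access is unchanged: our setup authenticates with the X-N8N-API-KEY header.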
SQLite schema migration — n8n 2.x auto-migrates the SQLite DB on first start. Expect ~10–30s extra startup time. Migration is non-destructive.
Execution data schema — executionData column type changed. Old executions remain readable but storage format differs.
Webhook node v2 — our workflows already use typeVersion: 2 (confirmed in audit above).
httpRequest node v4.2 — our workflows use this version (confirmed in audit above). No changes needed.
Code node v2 — our Code nodes use typeVersion: 2. Compatible with 2.x.

Deployment procedure

n8n 2.x upgrade is performed via a standard image tag bump + rolling restart:

# On bms-4 (PostgreSQL backend — no SQLite migration needed)
ssh ubuntu@54.36.123.110
cd /opt/bms4-services
# Main and workers must run the same version
docker compose pull n8n n8n-worker-1 n8n-worker-2
docker compose up -d n8n n8n-worker-1 n8n-worker-2
 
# Monitor startup logs for workflow activation
docker compose logs -f n8n | grep -E "Activated|error|ERROR"
 
# Verify healthz (port 5678 is internal only, so go through Traefik)
curl -s https://n8n.bms-4.infra.zintegrowana.online/healthz
# Expected: {"status":"ok"}
 
# Verify all workflows are active
curl -s -H "X-N8N-API-KEY: ${BMS4_N8N_API_KEY}" \
  https://n8n.bms-4.infra.zintegrowana.online/api/v1/workflows | jq '[.data[] | {name,active}]'

Post-upgrade checklist

n8n container starts and /healthz returns {"status":"ok"}
All workflows show active: true in API response
wa-router webhook receives test event from WAHA
fleet-update-v2-batch (ATRAX sync) runs on schedule — check fleet_positions freshness
Trivy scan passes clean (CVE-2025-68668 and CVE-2026-25115 no longer reported)

Rollback procedure

If issues arise, roll back to previous version:

# On bms-4
ssh ubuntu@54.36.123.110
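# Set the image tag in /opt/bms4-services/docker-compose.yml back to the previous
# version (and revert bms-4/docker-compose.yml in the repo to match)
cd /opt/bms4-services
docker compose up -d n8n n8n-worker-1 n8n-worker-2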
 
# If rollback causes DB errors, restore from last Wasabi backup:
# aws s3 cp s3://p24-infra/n8n/workflows-YYYY-MM-DD.json /tmp/ \
#   --endpoint-url https://s3.eu-central-2.wasabisys.com

Troubleshooting

Symptom	Cause	Fix
GPS positions stale (>10min)	ATRAX workflow deactivated or DB schema mismatch	See “GPS Sync Workflow — Known failure modes” above; check n8n startup log
GPS positions stale (>10min)	ATRAX API unreachable	Check `get_token` node output in latest execution; test Atrax API from bms-4
WhatsApp messages not routing	wa-router webhook timeout	Check WAHA logs on vps-h1; verify webhook URL points to `n8n.bms-4`
n8n UI unreachable	Traefik or container down	`ssh ubuntu@54.36.123.110 "docker compose -f /opt/bms4-services/docker-compose.yml ps"`
Credentials fail to decrypt	`N8N_ENCRYPTION_KEY` changed	Restore the original key in `/opt/bms4-services/.env`; if it is lost, re-enter all credentials manually
n8n crashes on startup	`N8N_DB_PASSWORD` not set	Verify `N8N_DB_PASSWORD` in `/opt/bms4-services/.env` on bms-4
