Spec 02 — Centralized logs (Loki + Promtail)
Purpose
Today, debugging means SSHing to a VPS and running docker logs <container> --tail 200. There is no cross-container search, no retention beyond the per-container json-file rotation (max-size: 50m, max-file: 3, i.e. at most ~150 MB of history per container), and no way to correlate a Prometheus alert with the log lines that caused it.
Loki is “Prometheus for logs”: label-indexed streams queried with LogQL, with native integration into the existing Grafana. Adding it costs one Loki container on the monitoring host, one Promtail container per VPS, and one new Grafana datasource.
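The ~150 MB figure is just max-size × max-file per container. A hypothetical helper (not in the repo) makes that arithmetic explicit for other settings; it treats k/m/g as binary units, which we believe matches Docker's parsing of max-size but have not verified for every version.

```shell
# Hypothetical helper: worst-case json-file log history kept per container, in MiB.
# Usage: log_cap_mib <max-size, e.g. 50m> <max-file, e.g. 3>
# Only k/m/g suffixes are accepted, treated as binary units (assumption).
log_cap_mib() {
  local size=$1 files=$2 num unit kib
  num=${size%[kKmMgG]}      # strip a single unit suffix, if present
  unit=${size#"$num"}       # the suffix that was stripped (may be empty)
  case "$num" in
    ''|*[!0-9]*) echo "log_cap_mib: bad size '$size'" >&2; return 1 ;;
  esac
  case "$files" in
    ''|*[!0-9]*) echo "log_cap_mib: bad file count '$files'" >&2; return 1 ;;
  esac
  case "$unit" in
    k|K) kib=$num ;;
    m|M) kib=$((num * 1024)) ;;
    g|G) kib=$((num * 1024 * 1024)) ;;
    *) echo "log_cap_mib: size '$size' needs a k/m/g suffix" >&2; return 1 ;;
  esac
  echo $((kib * files / 1024))
}
```

With the current settings, log_cap_mib 50m 3 prints 150.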
Rulebook (operating rules)
- Logs are not for secrets. Configure each service to never log secrets. Promtail can only drop lines that match known patterns and cannot reliably recognize secret values, so the right fix is at the source.
- Retention: 14 days hot, 30 days cold. Hot = local volume; cold = Wasabi via Loki’s S3 backend.
- Labels are an immutable schema. Don't add high-cardinality labels (user IDs or request IDs as labels): every distinct label combination becomes its own stream, and cardinality explosions kill Loki. Reserve labels for low-cardinality values such as container_name, service, severity.
- One Loki, multiple Promtails. A single Loki on vps-i1 (the monitoring host); a Promtail agent on every VPS that has containers.
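A minimal sketch of what hostinger/promtail/promtail-remote.yml could contain under these rules, written as a hypothetical shell generator so it can be diffed against the real file. Assumptions not stated in this spec: Docker service discovery through a mounted /var/run/docker.sock, Promtail started with -config.expand-env=true so ${LOKI_PROMTAIL_PASSWORD} is substituted at startup, and the host label set through the client's external_labels. Only low-cardinality values (container_name, service, stream, host) become labels; everything else stays in the log line.

```shell
# Hypothetical generator for a remote Promtail config; the real file is
# hostinger/promtail/promtail-remote.yml and may differ.
# Usage: write_promtail_config <output-file> <host-label>
write_promtail_config() {
  local out=$1 host=$2
  # Unquoted heredoc: $host is expanded now, while \${LOKI_PROMTAIL_PASSWORD}
  # is written literally for Promtail to expand (needs -config.expand-env=true).
  cat > "$out" <<EOF
server:
  http_listen_port: 9080
  grpc_listen_port: 0

positions:
  # Lost if the container is recreated unless /tmp is on a volume.
  filename: /tmp/positions.yaml

clients:
  - url: https://loki.vps-i1.infra.zintegrowana.online/loki/api/v1/push
    basic_auth:
      username: promtail
      password: \${LOKI_PROMTAIL_PASSWORD}
    external_labels:
      host: $host

scrape_configs:
  - job_name: docker
    docker_sd_configs:
      - host: unix:///var/run/docker.sock
        refresh_interval: 5s
    relabel_configs:
      # "/root-n8n-1" becomes container_name="root-n8n-1"
      - source_labels: ['__meta_docker_container_name']
        regex: '/(.*)'
        target_label: container_name
      # Compose service name, e.g. "n8n"
      - source_labels: ['__meta_docker_container_label_com_docker_compose_service']
        target_label: service
      # stdout or stderr
      - source_labels: ['__meta_docker_container_log_stream']
        target_label: stream
EOF
}
```

For example: write_promtail_config /tmp/promtail-remote.yml vps-h1 && diff /tmp/promtail-remote.yml /root/promtail/promtail-remote.yml.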
Architecture
vps-i1 containers ─► Promtail (vps-i1) ──┐
vps-h1 containers ─► Promtail (vps-h1) ──┤
                                         ▼
                                      ┌──────┐
                                      │ Loki │ ← vps-i1
                                      └──┬───┘
                                         │
                                  backend storage
                                         │
                           ┌─────────────┴──────────────┐
                    local FS (14d)                Wasabi (30d)
Grafana datasource (reads from Loki): "Explore" + correlation with Prometheus metrics
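Each Promtail arrow in the diagram is an HTTPS POST to Loki's push endpoint (/loki/api/v1/push) through the Caddy route. A hand-built push is a quick way to exercise that path without Promtail. The sketch assumes Loki's JSON push format (timestamps as strings of Unix nanoseconds) and uses hypothetical host="smoke-test" and service="smoke-test" labels, kept constant so repeated tests add a single stream.

```shell
# Hypothetical helper: build a Loki JSON push payload for one log line.
# Usage: loki_push_payload <host-label> <message>
loki_push_payload() {
  local host=$1 msg=$2 ts
  # Keep the JSON valid without an escaping tool: single-line text only,
  # and no double quotes or backslashes.
  case "$host$msg" in
    *'"'*|*'\'*) echo "loki_push_payload: quotes and backslashes are not supported" >&2; return 1 ;;
  esac
  # Loki expects the timestamp as a string of Unix nanoseconds.
  ts="$(date +%s)000000000"
  printf '{"streams":[{"stream":{"host":"%s","service":"smoke-test"},"values":[["%s","%s"]]}]}\n' \
    "$host" "$ts" "$msg"
}

# Send one line through the Caddy ingress (needs LOKI_PROMTAIL_PASSWORD set).
loki_push_smoke_test() {
  loki_push_payload smoke-test "manual push $(date -u +%Y-%m-%dT%H:%M:%SZ)" |
    curl -fsS -u "promtail:$LOKI_PROMTAIL_PASSWORD" \
      -H 'Content-Type: application/json' \
      --data-binary @- \
      https://loki.vps-i1.infra.zintegrowana.online/loki/api/v1/push
}
```

Loki answers a successful push with HTTP 204 and an empty body, so loki_push_smoke_test printing nothing and exiting 0 means the line was accepted; {service="smoke-test"} in Explore should then show it.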
Implementation plan
- Add a loki service to monitoring/docker-compose.yml (single-binary mode, filesystem + S3 backend).
- Add promtail-local to monitoring/docker-compose.yml (scrapes the local Docker socket).
- On vps-h1, add promtail to hostinger/docker-compose.yml (ships to loki.vps-i1.infra.zintegrowana.online).
- Caddy: add a loki.vps-i1.infra.zintegrowana.online route protected by basic_auth (user promtail, password LOKI_PROMTAIL_PASSWORD).
- Grafana provisioning: add the Loki datasource (monitoring/grafana/provisioning/datasources/loki.yml).
- Provision a dashboard: “Container logs by service” with a severity filter and free-text search.
- Add alert: LokiIngestionStopped, firing when no logs are received from a known service for 10 min.
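Two of the provisioning items above could look like the following sketch, again as a hypothetical generator. Assumptions: Grafana reaches Loki over the shared Docker network at http://loki:3100 (Loki's default HTTP port), and LokiIngestionStopped is evaluated by the Loki ruler using LogQL's absent_over_time, with one rule per host rather than per service. The ruler's rules directory and Alertmanager wiring are not shown.

```shell
# Hypothetical generator for the Grafana datasource and the Loki ruler alert.
# Usage: write_loki_provisioning <output-dir>
write_loki_provisioning() {
  local dir=$1
  mkdir -p "$dir" || return 1

  # Grafana datasource, i.e. monitoring/grafana/provisioning/datasources/loki.yml
  cat > "$dir/loki.yml" <<'EOF'
apiVersion: 1
datasources:
  - name: Loki
    type: loki
    access: proxy
    url: http://loki:3100
EOF

  # Loki ruler rules: absent_over_time returns a series only when the selector
  # matched no log lines in the window, i.e. the host was silent for 10 minutes.
  cat > "$dir/loki-rules.yml" <<'EOF'
groups:
  - name: loki-ingestion
    rules:
      - alert: LokiIngestionStopped
        expr: absent_over_time({host="vps-i1"}[10m])
        labels:
          severity: warning
        annotations:
          summary: No logs received from vps-i1 for 10 minutes
      - alert: LokiIngestionStopped
        expr: absent_over_time({host="vps-h1"}[10m])
        labels:
          severity: warning
        annotations:
          summary: No logs received from vps-h1 for 10 minutes
EOF
}
```

A 10-minute absent_over_time window plus the ruler's evaluation interval should stay inside the 15-minute acceptance bound. A per-service variant would swap the selector, e.g. {service="n8n"}.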
Acceptance criteria
- Grafana → Explore → Loki datasource returns logs for both vps-i1 and vps-h1 containers.
- {container_name="root-n8n-1"} |= "error" returns matches when n8n logs an error.
- Stopping Promtail on vps-h1 triggers the LokiIngestionStopped alert within 15 min.
- Disk usage of the Loki volume is <2 GB after 14 days (verifies that retention rolls over correctly).
- docs/runbook.md includes a “How to grep logs” recipe pointing to Grafana.
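The disk-usage criterion can be checked by a script rather than by eye. A hypothetical sketch: dir_under_mib compares du output (allocated size, in KiB) with a budget, and check_loki_volume resolves a named volume's host path with docker volume inspect. The volume name is an assumption: Compose usually prefixes volumes with the project name, so loki_data may appear as monitoring_loki_data in docker volume ls.

```shell
# Hypothetical check: is a directory's allocated size under a budget (MiB)?
# Usage: dir_under_mib <dir> <max-mib>
dir_under_mib() {
  local dir=$1 max_mib=$2 used_kib
  # du -sk works on GNU and BSD and reports KiB; keep only the number.
  used_kib=$(du -sk "$dir") || return 2
  used_kib=${used_kib%%[[:space:]]*}
  echo "$dir: $((used_kib / 1024)) MiB used, budget $max_mib MiB"
  [ "$used_kib" -lt $((max_mib * 1024)) ]
}

# Resolve a Docker volume's host path and apply the 2 GB budget (run as root).
check_loki_volume() {
  local vol=${1:?usage: check_loki_volume <volume-name>} path
  path=$(docker volume inspect --format '{{ .Mountpoint }}' "$vol") || return 2
  dir_under_mib "$path" 2048
}
```

On vps-i1, e.g. check_loki_volume monitoring_loki_data prints the usage and exits non-zero when the volume is at or over 2048 MiB.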
Cost impact
Wasabi cold storage for 30-day overflow: ~1–2 GB/month → 0.01 €/month. Functionally free.
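The estimate is GB stored × price per GB-month. One caveat, as we understand Wasabi's terms: objects deleted before a 90-day minimum storage duration are still billed for 90 days, so under 30-day retention the billed volume is roughly 3× the live volume (and any account-level minimum charge would dominate if this were the account's only bucket). The hypothetical helper takes the price as a parameter; 0.007 €/GB-month in the examples is a placeholder, not a quoted price.

```shell
# Hypothetical helper: rough monthly object-storage cost.
# Usage: storage_cost <gb-stored> <price-per-gb-month> [billing-multiplier]
# billing-multiplier models a minimum storage duration, e.g. 3 for 30-day
# retention under a 90-day minimum (90 / 30); defaults to 1.
storage_cost() {
  awk -v gb="$1" -v price="$2" -v mult="${3:-1}" \
    'BEGIN { printf "%.3f\n", gb * price * mult }'
}
```

storage_cost 2 0.007 prints 0.014 and storage_cost 2 0.007 3 prints 0.042: cents per month either way.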
Back-out plan
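Remove the loki and promtail-local services from monitoring/docker-compose.yml and the promtail service from the vps-h1 compose file; remove the loki.vps-i1.infra.zintegrowana.online Caddy route and the Loki datasource from Grafana; delete the loki_data volume. No service downtime, no data loss elsewhere.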
Risks / open questions
- Risk: a Promtail misconfiguration could ship secrets to Loki. Mitigation: review each service's log output in the PR; add pipeline_stages that drop lines matching secret patterns as defense-in-depth.
- Q: Vector vs Promtail? A: Promtail. It comes from the same vendor as Loki, is simpler, and is sufficient for our log volume (<1 GB/day).
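For the PR review, a crude scan of each service's recent output can flag obvious leaks before anything reaches Loki. The pattern is a hypothetical starting list (key=value or key: value pairs for password, secret, token, api key, plus bearer Authorization headers), not an exhaustive detector, and it will produce false positives. A similar expression could back a Promtail drop stage, but Promtail uses Go RE2 syntax, so it would need translating from this POSIX ERE.

```shell
# Hypothetical secret-pattern scan for log text on stdin (POSIX ERE, case-insensitive).
# Prints matching lines with line numbers; exit status 0 if anything matched.
SECRET_ERE='(pass(word|wd)?|secret|token|api[_-]?key)[[:space:]]*[=:]|authorization:[[:space:]]*bearer'

scan_for_secrets() {
  grep -Ein -- "$SECRET_ERE"
}
```

For example: docker logs --tail 1000 root-n8n-1 2>&1 | scan_for_secrets (2>&1 because docker logs replays the container's stderr on stderr).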
Bootstrap
Deployment is manual and happens after the PR containing the artifacts is merged. Run the steps in order, and deploy each VPS only once the previous one is healthy.
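“Healthy” can be made concrete with a retry loop. A hypothetical sketch: wait_until retries any command, and loki_ready polls Loki's /ready endpoint through the Caddy route, which assumes the route proxies every path and not just the push API.

```shell
# Hypothetical retry helper: run a command until it succeeds or attempts run out.
# Usage: wait_until <attempts> <delay-seconds> <command> [args...]
wait_until() {
  local attempts=$1 delay=$2 i=1
  shift 2
  while [ "$i" -le "$attempts" ]; do
    if "$@"; then
      return 0
    fi
    echo "wait_until: attempt $i/$attempts failed: $*" >&2
    if [ "$i" -lt "$attempts" ]; then
      sleep "$delay"
    fi
    i=$((i + 1))
  done
  return 1
}

# Loki's readiness endpoint, reached through the Caddy route (assumption).
loki_ready() {
  curl -fsS -u "promtail:$LOKI_PROMTAIL_PASSWORD" \
    https://loki.vps-i1.infra.zintegrowana.online/ready
}
```

For example, run wait_until 30 5 loki_ready after Step 3 and before starting Step 4.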
Step 1 — Generate the shared Promtail password
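On any machine. Keep the value: Steps 3 and 4 append it to both .env files, so export LOKI_PROMTAIL_PASSWORD with this value in each SSH session before running them.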
LOKI_PROMTAIL_PASSWORD=$(openssl rand -hex 24)
echo "$LOKI_PROMTAIL_PASSWORD"
Step 2 — Bcrypt-hash the password for Caddy basic_auth
docker run --rm caddy:2.8-alpine caddy hash-password --plaintext "$LOKI_PROMTAIL_PASSWORD"
Open monitoring/Caddyfile and replace the placeholder {bcrypt-hash-of-LOKI_PROMTAIL_PASSWORD} (one line, inside the loki.vps-i1.infra.zintegrowana.online block) with the bcrypt output. Commit that one-line change to main (or as a follow-up PR).
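Steps 3 and 4 append the password with a grep-or-echo one-liner. A slightly stricter variant, sketched as a hypothetical helper, matches only a line that starts with KEY=, refuses an empty value (so a forgotten export cannot write LOKI_PROMTAIL_PASSWORD= with nothing after it), and adds a newline first if the file does not end with one.

```shell
# Hypothetical helper: append KEY=VALUE to an env file unless KEY is already set.
# Usage: ensure_env_var <file> <key> <value>
ensure_env_var() {
  local file=$1 key=$2 value=$3
  if [ -z "$value" ]; then
    echo "ensure_env_var: refusing empty value for $key" >&2
    return 1
  fi
  touch "$file" || return 1
  # Anchor on ^KEY= so commented-out lines or similarly named keys don't count.
  if grep -q "^$key=" "$file"; then
    echo "$key already set in $file, leaving it unchanged"
    return 0
  fi
  # If the file lacks a trailing newline, add one so the new line stays separate.
  if [ -s "$file" ] && [ -n "$(tail -c 1 "$file")" ]; then
    echo >> "$file"
  fi
  printf '%s=%s\n' "$key" "$value" >> "$file"
}
```

On vps-h1, for example: ensure_env_var /root/.env LOKI_PROMTAIL_PASSWORD "$LOKI_PROMTAIL_PASSWORD".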
Step 3 — Deploy on vps-i1 (IONOS) — Loki + local Promtail
ssh root@217.154.82.162
cd /opt/p24-infra
git pull
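# Add password to .env (first time only); export LOKI_PROMTAIL_PASSWORD (Step 1 value) in this shell first
grep -q '^LOKI_PROMTAIL_PASSWORD=' monitoring/.env || echo "LOKI_PROMTAIL_PASSWORD=${LOKI_PROMTAIL_PASSWORD:?export the Step 1 value first}" >> monitoring/.env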
cd monitoring
docker compose up -d loki promtail-local
# Reload Caddy to pick up the new vhost
docker compose exec caddy caddy reload --config /etc/caddy/Caddyfile
docker compose ps loki promtail-local
Step 4 — Deploy on vps-h1 (Hostinger) — remote Promtail
ssh root@72.60.32.61
cd /opt/p24-infra
git pull
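# Add password to /root/.env (used by the root-level compose); export LOKI_PROMTAIL_PASSWORD (Step 1 value) in this shell first
grep -q '^LOKI_PROMTAIL_PASSWORD=' /root/.env || echo "LOKI_PROMTAIL_PASSWORD=${LOKI_PROMTAIL_PASSWORD:?export the Step 1 value first}" >> /root/.env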
# The hostinger compose mounts ./promtail relative to /root — copy the config in
mkdir -p /root/promtail
cp /opt/p24-infra/hostinger/promtail/promtail-remote.yml /root/promtail/promtail-remote.yml
cd /root
docker compose up -d promtail
docker compose logs --tail 30 promtail
Step 5 — Verify ingestion via Caddy ingress
curl -G -fsS "https://loki.vps-i1.infra.zintegrowana.online/loki/api/v1/labels" \
  -u "promtail:$LOKI_PROMTAIL_PASSWORD"
Expect a JSON response whose data array includes container_name, host, service and stream.
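To make this check scriptable, the response can be tested for each expected label. A hypothetical sketch using plain pattern matching on the JSON, assuming label names appear as quoted strings in the data array; jq would be more robust if it is installed.

```shell
# Hypothetical check: does a Loki /labels response contain every expected label?
# Usage: has_labels <json> <label>...
has_labels() {
  local json=$1 missing=0 label
  shift
  for label in "$@"; do
    case "$json" in
      *"\"$label\""*) ;;
      *) echo "missing label: $label" >&2; missing=1 ;;
    esac
  done
  return "$missing"
}

# Query through the Caddy ingress and check the four labels from Step 5.
check_loki_labels() {
  local resp
  resp=$(curl -G -fsS -u "promtail:$LOKI_PROMTAIL_PASSWORD" \
    https://loki.vps-i1.infra.zintegrowana.online/loki/api/v1/labels) || return 2
  has_labels "$resp" container_name host service stream
}
```

check_loki_labels returns 0 when all four labels are present and names any missing one on stderr.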
Step 6 — Verify in Grafana
- Open https://grafana.vps-i1.infra.zintegrowana.online.
- Explore → datasource Loki → run {host="vps-i1"} → logs appear.
- Re-run with {host="vps-h1"} → logs from Hostinger containers appear.
- Open the “Container logs” dashboard and filter by host/container/search.
If either host returns nothing, check Promtail logs on the silent VPS first — see docs/runbook.md → Alert: LokiIngestionStopped.