Audit Engine — Operations Workbook

The audit engine is a FastAPI service running on Hostinger vps-h1 (:8200). It executes scheduled audit actions stored in audit.actions, records results in audit.runs, and exposes Prometheus metrics scraped by the IONOS monitoring stack.

Repo path: audit-engine/
Compose file: hostinger/docker-compose.yml (audit-engine service)
Prometheus target: 72.60.32.61:8200 (job: audit_engine)


Architecture

Supabase (audit schema)
  audit.projects   — project registry (one row per repo/product)
  audit.actions    — scheduled actions (cron, action_type, status)
  audit.runs       — execution history (pass/fail/error + output)
  audit.workbooks  — AI-designed workbook specs (for ai_workbook actions)
 
Hostinger vps-h1
  audit-engine container
    scheduler.py   — APScheduler, syncs jobs from audit.actions every 5 min
    actions/       — one .py file per action_type
    db.py          — Supabase client (service_role key)
    metrics.py     — Prometheus counters/gauges

Action Types

| action_type | File | What it does |
|---|---|---|
| ping_check | actions/ping_check.py | HTTP GET a URL, pass=200, fail=other |
| ai_workbook | actions/ai_workbook.py | Executes an AI-designed connector workbook |
| infra_docs_check | actions/infra_docs_check.py | Flags active p24-infra elements missing workbooks |
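
The ping_check rule (pass on HTTP 200, fail on anything else) can be sketched as follows. This is a minimal illustration, not the contents of actions/ping_check.py: the names fetch_status and ping_check are hypothetical, and mapping network failures to "error" is an assumption based on the pass/fail/error states listed for audit.runs.

```python
import urllib.error
import urllib.request
from typing import Callable


def fetch_status(url: str, timeout: float = 10.0) -> int:
    """Return the HTTP status code for a GET request to `url`."""
    try:
        with urllib.request.urlopen(url, timeout=timeout) as resp:
            return resp.status
    except urllib.error.HTTPError as exc:
        # 4xx/5xx responses raise HTTPError but still carry a status code.
        return exc.code


def ping_check(url: str, fetch: Callable[[str], int] = fetch_status) -> str:
    """Classify a URL as 'pass' (HTTP 200), 'fail' (other status), or 'error'."""
    try:
        status = fetch(url)
    except OSError:
        # URLError, timeouts, and connection failures are all OSError subclasses.
        # Treating them as 'error' is an assumption, not confirmed behaviour.
        return "error"
    return "pass" if status == 200 else "fail"
```

The fetch parameter exists only so the classification can be exercised without network access, for example ping_check(url, fetch=lambda u: 503) returns "fail".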

Managing Actions

Actions are rows in audit.actions. The scheduler re-syncs every 5 minutes — no restart needed.
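
One way to picture the periodic re-sync is as a diff between the rows in audit.actions and the jobs currently registered. The sketch below is an assumption about the approach, not the code in scheduler.py; plan_sync and its return shape are hypothetical, and it assumes that only rows with status = 'active' and a non-empty schedule are scheduled.

```python
def plan_sync(action_rows: list[dict], scheduled: dict[str, str]) -> dict[str, list[str]]:
    """Compare audit.actions rows with the currently scheduled jobs.

    `scheduled` maps job id -> cron expression currently registered.
    Returns the job ids to add, update (cron changed), and remove.
    """
    # Only active rows with a schedule should have a job (assumption).
    desired = {
        str(row["id"]): row["schedule"]
        for row in action_rows
        if row.get("status") == "active" and row.get("schedule")
    }
    plan = {"add": [], "update": [], "remove": []}
    for job_id, cron in desired.items():
        if job_id not in scheduled:
            plan["add"].append(job_id)
        elif scheduled[job_id] != cron:
            plan["update"].append(job_id)
    # Jobs whose row was paused, deleted, or lost its schedule get removed.
    plan["remove"] = [job_id for job_id in scheduled if job_id not in desired]
    return plan
```

Under this model, pausing an action removes its job on the next sync, and resuming it adds the job back, which is why no restart is needed.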

View all actions

SELECT id, name, schedule, status, description FROM audit.actions ORDER BY name;

Add a new action

INSERT INTO audit.actions (project_id, name, schedule, description, status)
SELECT id, 'my-check', '0 10 * * *', 'What it does', 'active'
FROM audit.projects WHERE name = 'p24-infra';

Cron syntax: standard 5-field (minute hour dom month dow). Examples:

  • 0 9 * * * — daily 09:00 UTC
  • */15 * * * * — every 15 minutes
  • 0 9 * * 1 — every Monday 09:00
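
Before inserting a row, it can help to sanity-check the schedule string. The following is a rough standalone validator for numeric 5-field expressions; is_valid_cron is a hypothetical helper, not part of the audit engine, and it deliberately ignores names such as MON or JAN. The day-of-week range 0-7 follows common cron usage and may not match how the scheduler itself interprets that field.

```python
# Allowed numeric range for each field: minute, hour, dom, month, dow.
FIELD_RANGES = [(0, 59), (0, 23), (1, 31), (1, 12), (0, 7)]


def _valid_part(part: str, lo: int, hi: int) -> bool:
    """Validate one comma-separated part: '*', 'n', or 'a-b', optionally with '/step'."""
    base, _, step = part.partition("/")
    if "/" in part and not (step.isdecimal() and int(step) > 0):
        return False
    if base == "*":
        return True
    bounds = base.split("-")
    if len(bounds) > 2 or not all(b.isdecimal() for b in bounds):
        return False
    nums = [int(b) for b in bounds]
    # Every number must be in range, and a range must not be inverted.
    return all(lo <= n <= hi for n in nums) and nums == sorted(nums)


def is_valid_cron(expr: str) -> bool:
    """Return True if `expr` is a syntactically valid numeric 5-field cron string."""
    fields = expr.split()
    if len(fields) != 5:
        return False
    return all(
        _valid_part(part, lo, hi)
        for field, (lo, hi) in zip(fields, FIELD_RANGES)
        for part in field.split(",")
    )
```

All three examples above pass this check, while a 4-field string such as "0 9 * *" or an out-of-range minute such as "60 9 * * *" is rejected.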

Pause / resume an action

UPDATE audit.actions SET status = 'paused'  WHERE name = 'my-check';
UPDATE audit.actions SET status = 'active'  WHERE name = 'my-check';

Add a new action_type

  1. Create audit-engine/actions/<type>.py with a run(action: dict) -> None function
  2. Add the handler to ACTION_HANDLERS in audit-engine/scheduler.py
  3. SFTP both files to /opt/p24-infra/audit-engine/ on vps-h1 and rebuild:
    cd /root && docker compose up -d --no-deps --build audit-engine
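
Steps 1 and 2 might look roughly like the sketch below, shown in one file for brevity. Only the run(action: dict) -> None signature and the ACTION_HANDLERS name come from this workbook; the "my_type" key, the dispatch function, and the logger name are hypothetical, and the real scheduler.py may be organized differently.

```python
import logging
from typing import Callable

logger = logging.getLogger("audit-engine")


def run(action: dict) -> None:
    """Hypothetical handler body, as it might appear in actions/my_type.py."""
    logger.info("running action %s", action.get("name"))


# Registry keyed by action_type; "my_type" is a placeholder name.
ACTION_HANDLERS: dict[str, Callable[[dict], None]] = {
    "my_type": run,
}


def dispatch(action: dict) -> bool:
    """Call the handler registered for an action row; return False if there is none."""
    action_type = action.get("action_type")
    handler = ACTION_HANDLERS.get(action_type)
    if handler is None:
        # An unregistered type is logged rather than raised (assumption).
        logger.error("No handler for action_type=%s", action_type)
        return False
    handler(action)
    return True
```

A dictionary lookup keeps registration to a single line per action_type, and an unregistered type surfaces as a log message instead of a crash.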

infra_docs_check — Documentation Standard Enforcement

Runs daily at 09:00 UTC. Queries dev_r_services for active p24-infra elements where compliance_workbook != 'yes' and reports gaps.

Pass: all active elements have a documented workbook
Fail: lists the undocumented elements by name, type, and host
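
The pass/fail decision can be sketched as a filter over the rows returned from dev_r_services. This is an illustration only: find_undocumented and summarize are hypothetical names, and the service_type and host keys are assumed column names for the "type" and "host" values mentioned above.

```python
def find_undocumented(rows: list[dict]) -> list[dict]:
    """Return rows whose compliance_workbook is anything other than 'yes'."""
    # A missing or None value is also treated as undocumented here.
    return [row for row in rows if row.get("compliance_workbook") != "yes"]


def summarize(rows: list[dict]) -> tuple[str, str]:
    """Return (status, output) in the pass/fail style described above."""
    gaps = find_undocumented(rows)
    if not gaps:
        return "pass", "all active elements have a documented workbook"
    lines = [
        f"{g['service_name']} ({g.get('service_type', '?')}, {g.get('host', '?')})"
        for g in gaps
    ]
    return "fail", "undocumented: " + "; ".join(lines)
```

Note that in SQL a plain != comparison does not match NULL values, whereas this sketch counts a missing value as a gap; whether the real query handles NULL is not stated here.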

What to do on a fail

For each element in the fail output:

  1. Write or update the ops doc (e.g. docs/<service>-operations.md or add a section to docs/monitoring-stack-operations.md)
  2. Update dev_r_services:
    UPDATE dev_r_services
    SET compliance_workbook = 'yes',
        workbook_url = 'docs/<service>-operations.md',
        compliance_notes = 'brief notes',
        updated_at = now()
    WHERE service_name = '<name>' AND project_id = 'p24-infra';
  3. The next daily run will pass for that element

How the standard is enforced (three layers)

| Layer | Mechanism | Where |
|---|---|---|
| Agent rule | "## Documentation Standard — MANDATORY" in CLAUDE.md | Applies to all Claude agents |
| PR gate | Checklist in .github/pull_request_template.md | Applies to all PRs |
| Automated audit | This action, daily 09:00 | Catches anything that slipped through |

Deployment

# Rebuild after a code change: upload the changed files via SFTP,
# then run the following on vps-h1 (directly or over SSH):
cd /root
docker compose up -d --no-deps --build audit-engine
docker logs --tail=30 root-audit-engine-1

Healthcheck

The container has a /ping healthcheck (30s interval). Prometheus scrapes /metrics every 60s.

curl http://72.60.32.61:8200/ping      # → {"status":"ok"}
curl http://72.60.32.61:8200/metrics   # → Prometheus text
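
The same check can be scripted. The sketch below assumes only what is shown above, namely that /ping returns a JSON body with status "ok"; is_healthy and probe are hypothetical helper names.

```python
import json
import urllib.request


def is_healthy(body: bytes) -> bool:
    """Return True if a /ping response body is JSON with status == 'ok'."""
    try:
        payload = json.loads(body)
    except ValueError:
        # Covers malformed JSON and undecodable bytes.
        return False
    return isinstance(payload, dict) and payload.get("status") == "ok"


def probe(base_url: str, timeout: float = 5.0) -> bool:
    """GET <base_url>/ping and check the body; network errors count as unhealthy."""
    try:
        with urllib.request.urlopen(f"{base_url}/ping", timeout=timeout) as resp:
            return resp.status == 200 and is_healthy(resp.read())
    except OSError:
        return False
```

Calling probe with the base URL from the curl examples returns True when the service answers as expected and False on any connection or parsing problem.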

Troubleshooting

| Symptom | Cause | Fix |
|---|---|---|
| Action never runs | schedule column empty or action status != active | Check audit.actions; verify cron is valid 5-field |
| "No handler for action_type=X" in logs | Handler not registered in scheduler.py | Add to ACTION_HANDLERS dict and redeploy |
| infra_docs_check always fails | Elements genuinely undocumented | Update dev_r_services and add ops docs |
| Startup crash: "column actions.enabled does not exist" | Old code reading enabled column | Fixed 2026-05-14: filter is now status = active |