Audit Engine — Operations Workbook
The audit engine is a FastAPI service running on Hostinger vps-h1 (:8200). It executes scheduled audit actions stored in audit.actions, records results in audit.runs, and exposes Prometheus metrics scraped by the IONOS monitoring stack.
Repo path: audit-engine/
Compose file: hostinger/docker-compose.yml → audit-engine service
Prometheus target: 72.60.32.61:8200 (job: audit_engine)
Architecture
Supabase (audit schema)
audit.projects — project registry (one row per repo/product)
audit.actions — scheduled actions (cron, action_type, status)
audit.runs — execution history (pass/fail/error + output)
audit.workbooks — AI-designed workbook specs (for ai_workbook actions)
Hostinger vps-h1
audit-engine container
scheduler.py — APScheduler, syncs jobs from audit.actions every 5 min
actions/ — one .py file per action_type
db.py — Supabase client (service_role key)
metrics.py — Prometheus counters/gauges
Action Types
| action_type | File | What it does |
|---|---|---|
| ping_check | actions/ping_check.py | HTTP GET a URL, pass=200, fail=other |
| ai_workbook | actions/ai_workbook.py | Executes an AI-designed connector workbook |
| infra_docs_check | actions/infra_docs_check.py | Flags active p24-infra elements missing workbooks |
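The ping_check rule in the table (pass on 200, fail on anything else) can be sketched in a few lines of standard-library Python. This is an illustration only, not the contents of actions/ping_check.py: the function names `classify_status` and `ping`, the 10-second timeout, and the choice to report network failures as "error" (one of the result values listed for audit.runs) are all assumptions.

```python
import urllib.error
import urllib.request


def classify_status(status_code: int) -> str:
    """Map an HTTP status code to the pass/fail rule from the table."""
    return "pass" if status_code == 200 else "fail"


def ping(url: str, timeout: float = 10.0) -> str:
    """GET the URL and classify the response (hypothetical helper)."""
    try:
        with urllib.request.urlopen(url, timeout=timeout) as resp:
            return classify_status(resp.status)
    except urllib.error.HTTPError as exc:
        # urlopen raises for 4xx/5xx, but the code is still a real HTTP status.
        return classify_status(exc.code)
    except (urllib.error.URLError, OSError):
        # DNS failure, refused connection, timeout: no HTTP status was received.
        return "error"
```

Note that `urlopen` follows redirects, so a URL that redirects to a page returning 200 would count as a pass in this sketch.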
Managing Actions
Actions are rows in audit.actions. The scheduler re-syncs every 5 minutes — no restart needed.
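As a rough sketch of what a re-sync pass has to decide, the function below compares the job ids currently scheduled with the rows read from audit.actions. It is not the code in scheduler.py: the name `plan_sync` and the dict-shaped rows are assumptions, and the eligibility rule (status is active and schedule is non-empty) is inferred from the troubleshooting table.

```python
def plan_sync(scheduled_ids: set, actions: list) -> tuple:
    """Return (to_add, to_remove) as sorted lists of action ids."""
    # An action is eligible only if it is active and has a non-empty schedule.
    wanted = {
        a["id"]
        for a in actions
        if a.get("status") == "active" and (a.get("schedule") or "").strip()
    }
    to_add = sorted(wanted - scheduled_ids)
    to_remove = sorted(scheduled_ids - wanted)
    return to_add, to_remove
```

This sketch only handles additions and removals. A real sync would also need to notice when the schedule of an already-scheduled action changes.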
View all actions
SELECT id, name, schedule, status, description FROM audit.actions ORDER BY name;
Add a new action
INSERT INTO audit.actions (project_id, name, schedule, description, status)
SELECT id, 'my-check', '0 10 * * *', 'What it does', 'active'
FROM audit.projects WHERE name = 'p24-infra';
Cron syntax: standard 5-field (minute hour dom month dow). Examples:
- 0 9 * * * — daily 09:00 UTC
- */15 * * * * — every 15 minutes
- 0 9 * * 1 — every Monday 09:00
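Before inserting a row, it can help to confirm that a schedule string really has five fields. The helper below is a minimal sketch with a hypothetical name (`split_cron`). It checks the field count only, not whether each field holds a legal value, and it is not part of the audit engine.

```python
CRON_FIELDS = ("minute", "hour", "dom", "month", "dow")


def split_cron(expr: str) -> dict:
    """Split a 5-field cron expression into named fields, or raise ValueError."""
    parts = expr.split()
    if len(parts) != len(CRON_FIELDS):
        raise ValueError(f"expected 5 cron fields, got {len(parts)}: {expr!r}")
    return dict(zip(CRON_FIELDS, parts))
```

For example, `split_cron("*/15 * * * *")["minute"]` gives `"*/15"`, while a four-field string such as `"0 9 * *"` raises `ValueError`.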
Pause / resume an action
UPDATE audit.actions SET status = 'paused' WHERE name = 'my-check';
UPDATE audit.actions SET status = 'active' WHERE name = 'my-check';
Add a new action_type
- Create audit-engine/actions/<type>.py with a run(action: dict) -> None function
- Add the handler to ACTION_HANDLERS in audit-engine/scheduler.py
- SFTP both files to /opt/p24-infra/audit-engine/ on vps-h1 and rebuild: cd /root && docker compose up -d --no-deps --build audit-engine
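The first two steps above can be sketched as follows. The `run(action: dict) -> None` signature and the ACTION_HANDLERS name come from this document. Everything else is assumed for illustration: the action type `my_type`, the `dispatch` helper, and the logger name. The real scheduler.py may look up and call handlers differently.

```python
import logging

logger = logging.getLogger("audit_engine.sketch")


def run(action: dict) -> None:
    """Hypothetical handler body for a new action_type called 'my_type'."""
    logger.info("running action %s", action.get("name"))


# Maps action_type -> handler function; 'my_type' is a made-up example key.
ACTION_HANDLERS = {"my_type": run}


def dispatch(action: dict) -> bool:
    """Call the registered handler; return False when none is registered."""
    handler = ACTION_HANDLERS.get(action.get("action_type"))
    if handler is None:
        logger.warning("No handler for action_type=%s", action.get("action_type"))
        return False
    handler(action)
    return True
```

In this sketch, an unregistered type produces the same kind of "No handler for action_type=X" log line that the troubleshooting table describes.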
infra_docs_check — Documentation Standard Enforcement
Runs daily at 09:00 UTC. Queries dev_r_services for active p24-infra elements where compliance_workbook != 'yes' and reports gaps.
Pass: all active elements have a documented workbook
Fail: lists the undocumented elements by name, type, and host
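The pass/fail rule can be sketched as a pure function over rows already fetched from dev_r_services. The columns service_name and compliance_workbook appear in this document. The keys `service_type` and `host`, the function name `evaluate`, and the output format are assumptions, not the actual implementation in actions/infra_docs_check.py.

```python
def evaluate(rows: list) -> tuple:
    """Return ('pass', []) or ('fail', [description, ...]) for active elements."""
    # Any row whose compliance_workbook is not exactly 'yes' counts as a gap.
    gaps = [
        f"{r['service_name']} ({r.get('service_type', '?')}) on {r.get('host', '?')}"
        for r in rows
        if r.get("compliance_workbook") != "yes"
    ]
    return ("fail", gaps) if gaps else ("pass", [])
```

One difference to keep in mind: this Python comparison treats a missing value as a gap, whereas a SQL `!=` comparison does not match rows where the column is NULL.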
What to do on a fail
For each element in the fail output:
- Write or update the ops doc (e.g. docs/<service>-operations.md or add a section to docs/monitoring-stack-operations.md)
- Update dev_r_services: UPDATE dev_r_services SET compliance_workbook = 'yes', workbook_url = 'docs/<service>-operations.md', compliance_notes = 'brief notes', updated_at = now() WHERE service_name = '<name>' AND project_id = 'p24-infra';
- The next daily run will pass for that element
How the standard is enforced (three layers)
| Layer | Mechanism | Where |
|---|---|---|
| Agent rule | ## Documentation Standard — MANDATORY in CLAUDE.md | Applies to all Claude agents |
| PR gate | Checklist in .github/pull_request_template.md | Applies to all PRs |
| Automated audit | This action, daily 09:00 | Catches anything that slipped through |
Deployment
# Rebuild after code change (from local, via SFTP + SSH)
# or on vps-h1:
cd /root
docker compose up -d --no-deps --build audit-engine
docker logs --tail=30 root-audit-engine-1
Healthcheck
The container has a /ping healthcheck (30s interval). Prometheus scrapes /metrics every 60s.
curl https://72.60.32.61:8200/ping # → {"status":"ok"}
curl https://72.60.32.61:8200/metrics # → Prometheus text
Troubleshooting
| Symptom | Cause | Fix |
|---|---|---|
| Action never runs | schedule column empty or action status != active | Check audit.actions; verify cron is valid 5-field |
| No handler for action_type=X in logs | Handler not registered in scheduler.py | Add to ACTION_HANDLERS dict and redeploy |
| infra_docs_check always fails | Elements genuinely undocumented | Update dev_r_services and add ops docs |
| Startup crash column actions.enabled does not exist | Old code reading enabled column | Fixed 2026-05-14: filter is now status = active |