Spec 11 — Cost tracking dashboard

Purpose

We pay: IONOS (~6€), Hostinger (~5€), OVH bms-1–4 (~250€), Cloudflare (0€), Wasabi (≤1€), Vercel Hobby (0€), Supabase Pro (25€), Anthropic Claude Max 20× (~185€). No dashboard, no alerts, no historical trend. A surprise bill (Vercel function spike, Supabase row count, Wasabi explosion) takes 30+ days to discover via email.

A tiny Python exporter pulls billing/usage APIs nightly → Prometheus → Grafana panel + alerts on budget thresholds.


Rulebook

  1. Daily snapshot, not real-time. Most billing APIs rate-limit and don’t update by the minute. Once a day is fine.
  2. Alert thresholds are absolute, not relative. “Vercel monthly invocations > 80% of free tier” beats “spiked 200% over yesterday” (yesterday could have been zero).
  3. No credentials with billing-write scope. Read-only API keys only.
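
Rule 2 can be sketched in a few lines of Python. In the planned setup the comparison would live in the Prometheus rule file rather than in Python, so this only shows the logic. The names Budget, over_absolute_threshold and relative_spike are illustrative assumptions, not part of the exporter; the 0.8 ratio matches the 80% figure above.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Budget:
    """Monthly cap for one metered resource (hypothetical helper type)."""
    name: str
    monthly_cap: float        # e.g. a free-tier allowance; the value is a placeholder
    warn_ratio: float = 0.8   # rule 2: warn at 80% of the cap


def over_absolute_threshold(usage: float, budget: Budget) -> bool:
    """Absolute check: compare month-to-date usage against a fixed cap."""
    return usage > budget.warn_ratio * budget.monthly_cap


def relative_spike(today: float, yesterday: float, factor: float = 3.0) -> bool:
    """Relative check, shown only for contrast: it has no baseline when yesterday is 0."""
    if yesterday == 0:
        raise ZeroDivisionError("no baseline: yesterday's usage was zero")
    return today / yesterday >= factor
```

With a made-up cap of 1000 invocations, over_absolute_threshold(801, Budget("vercel_invocations", 1000)) is true on any day of the month, whereas relative_spike cannot give an answer on the first day after a zero.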

Architecture

cost-exporter (Python)
  ├── Vercel API     → invocations, bandwidth, function GB-h
  ├── Supabase API   → DB size, row counts, edge function invocations
  ├── Wasabi API     → bucket bytes, request counts
  └── Anthropic API  → token spend (if exposed; otherwise manual monthly entry)
        │
        ▼
   Prometheus scrape (port 9210)
        │
        ▼
   Grafana dashboard "Costs"
   Alert: BudgetWarning (>80% of monthly cap on free-tier services)

Implementation plan

  1. Scaffold monitoring/exporters/cost-exporter/ (Python + FastAPI + prometheus_client), modelled on queue-exporter.
  2. Implement one collector per provider (Vercel, Supabase, Wasabi). Defer the Anthropic collector until a usage API is available; until then, enter that figure manually each month.
  3. Add to monitoring/docker-compose.yml.
  4. Add Prometheus scrape config.
  5. Create dashboard monitoring/grafana/provisioning/dashboards/costs.json.
  6. Define alerts in monitoring/prometheus/rules/costs.yml.

Acceptance criteria

  • Grafana “Costs” dashboard renders with current-month numbers for Vercel, Supabase, Wasabi
  • Setting the Vercel free-tier threshold to 1 and exceeding it triggers the BudgetWarning alert
  • The exporter handles API errors gracefully: it does not crash and increments cost_collector_errors_total instead
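
The error-handling criterion could be met with a loop along these lines. This is a minimal sketch under assumptions: run_collectors, last_good and error_counts are hypothetical names, and a plain dict stands in for the cost_collector_errors_total counter that prometheus_client would expose in the real exporter.

```python
from typing import Callable, Dict

# A collector returns {resource_name: value} for one provider (assumed shape).
Collector = Callable[[], Dict[str, float]]


def run_collectors(
    collectors: Dict[str, Collector],
    last_good: Dict[str, Dict[str, float]],
    error_counts: Dict[str, int],
) -> None:
    """Run every collector; a failure in one must not stop the others."""
    for provider, collect in collectors.items():
        try:
            last_good[provider] = collect()
        except Exception:
            # Broad on purpose: any provider failure is counted, never raised.
            # The previous values for this provider are left untouched.
            error_counts[provider] = error_counts.get(provider, 0) + 1
```

Keeping the last good values means a single failed nightly run leaves the dashboard on slightly stale numbers instead of a gap, while the error count makes the failure visible.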

Cost impact

0 €.

Back-out plan

Remove the exporter from compose, drop the dashboard JSON, remove the scrape job and alert rules, then reload Prometheus.

Risks / open questions

  • Q: Can we hit per-resource granularity (per-project Vercel costs)? A: Yes — Vercel’s API gives per-project breakdowns. Add later.

Bootstrap

Once this PR is merged, deploy on vps-i1 with the following steps. The artifact PR ships configs only — no live API tokens are committed.

  1. Create scoped, read-only API tokens:
    • Vercel: https://vercel.com/account/tokens → scope: read. Name it cost-exporter-vps-i1.
    • Supabase: Dashboard → Account → Access Tokens → new readonly token. Name it cost-exporter-vps-i1.
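    • Wasabi: IAM → create user cost-exporter-readonly with a policy granting only s3:ListBucket and s3:GetBucketLocation on arn:aws:s3:::ecotrans-monitoring and arn:aws:s3:::ecotrans-backups. The S3 policy model has no separate s3:HeadObject action (HEAD requests are authorized by s3:GetObject), and bucket listings already return object sizes, so grant s3:GetObject on the /* child paths only if a collector really needs per-object HEAD requests.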
  2. Add to monitoring/.env on vps-i1 (do not commit):
    VERCEL_API_TOKEN=...
    SUPABASE_ACCESS_TOKEN=...
    SUPABASE_PROJECT_REF=mwkqmgadqnkkihjdeqsi
    WASABI_ACCESS_KEY=...
    WASABI_SECRET_KEY=...
  3. Pull the new compose config and bring the service up:
    cd /opt/p24-infra && git pull
    cd monitoring && docker compose up -d cost-exporter
  4. Reload Prometheus so the new scrape job is picked up (the endpoint only responds if Prometheus runs with --web.enable-lifecycle):
    curl -X POST http://localhost:9090/-/reload
  5. Verify Prometheus targets page shows cost-exporter as UP: https://prometheus.vps-i1.infra.zintegrowana.online/targets
  6. The first collection runs at container startup; you can also force one manually:
    curl -X POST http://localhost:9210/refresh
    After that, the Costs dashboard in Grafana populates (default cadence is daily; lower COST_REFRESH_INTERVAL_S if you want faster refreshes during initial tuning).
  7. If you’re not on free tiers, tune the alert thresholds in monitoring/prometheus/rules/costs.yml to your actual monthly caps, then reload Prometheus again with curl -X POST http://localhost:9090/-/reload.
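
The variables from step 2 could be validated once at container startup, so a missing token fails fast instead of surfacing later as collector errors. The sketch below is an assumption about how that might look: load_config and ExporterConfig are hypothetical names, and the 86400-second default is inferred from the "daily" cadence mentioned in step 6.

```python
import os
from dataclasses import dataclass
from typing import Mapping

REQUIRED_VARS = (
    "VERCEL_API_TOKEN",
    "SUPABASE_ACCESS_TOKEN",
    "SUPABASE_PROJECT_REF",
    "WASABI_ACCESS_KEY",
    "WASABI_SECRET_KEY",
)
DAILY_S = 24 * 60 * 60  # assumed default for COST_REFRESH_INTERVAL_S


@dataclass(frozen=True)
class ExporterConfig:
    secrets: Mapping[str, str]
    refresh_interval_s: int


def load_config(env: Mapping[str, str] = os.environ) -> ExporterConfig:
    """Read settings from the environment and fail fast on anything missing."""
    missing = [name for name in REQUIRED_VARS if not env.get(name)]
    if missing:
        raise RuntimeError("missing environment variables: " + ", ".join(missing))
    interval = int(env.get("COST_REFRESH_INTERVAL_S", DAILY_S))
    if interval <= 0:
        raise ValueError("COST_REFRESH_INTERVAL_S must be positive")
    return ExporterConfig({name: env[name] for name in REQUIRED_VARS}, interval)
```

Passing the environment in as a mapping keeps the function testable without touching the real monitoring/.env values.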

Rollback

cd /opt/p24-infra/monitoring && docker compose stop cost-exporter && docker compose rm -f cost-exporter
# Remove or comment out the cost-exporter scrape job and the costs.yml alert rules, then reload Prometheus.