Spec 11 — Cost tracking dashboard

Purpose

We currently pay for: IONOS (~6€), Hostinger (~5€), OVH bms-1–4 (~250€), Cloudflare (0€), Wasabi (≤1€), Vercel Hobby (0€), Supabase Pro (25€), Anthropic Claude Max 20× (~185€). There is no dashboard, no alerting, and no historical trend, so a surprise bill (Vercel function spike, Supabase row count, Wasabi explosion) takes 30+ days to surface, and only via email.

A small Python exporter polls the billing/usage APIs nightly and exposes the results to Prometheus, which feeds a Grafana panel and budget alerts.

Rulebook

Daily snapshot, not real-time. Most billing APIs rate-limit and don’t update by the minute. Once a day is fine.
Alert thresholds are measured against a fixed cap, not against day-over-day change. “Vercel monthly invocations > 80% of free tier” is more reliable than “spiked 200% over yesterday”, because yesterday's value could have been zero.
No credentials with billing-write scope. Read-only API keys only.
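
A minimal sketch of the threshold rule, assuming two hypothetical helpers (`over_budget`, `relative_spike`) and made-up numbers; the cap of 1,000,000 is illustrative and not Vercel's actual limit. It shows why a fixed cap stays meaningful when the previous day's value is zero, while a ratio does not.

```python
from typing import Optional


def over_budget(used: float, cap: float, fraction: float = 0.8) -> bool:
    """Return True when usage exceeds `fraction` of a fixed monthly cap."""
    if cap <= 0:
        raise ValueError("cap must be positive")
    return used > fraction * cap


def relative_spike(today: float, yesterday: float, factor: float = 2.0) -> Optional[bool]:
    """Day-over-day check; returns None when there is no baseline to compare to."""
    if yesterday == 0:
        return None  # the ratio is undefined, so no verdict is possible
    return today / yesterday >= factor


# Illustrative numbers only: the cap and usage values are assumptions.
CAP = 1_000_000
print(over_budget(850_000, CAP))   # True
print(relative_spike(850_000, 0))  # None
```

The same comparison would normally live in the Prometheus alert rule; the Python form is only meant to make the reasoning concrete.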

Architecture

cost-exporter (Python)
  ├── Vercel API     → invocations, bandwidth, function GB-h
  ├── Supabase API   → DB size, row counts, edge function invocations
  ├── Wasabi API     → bucket bytes, request counts
  └── Anthropic API  → token spend (if exposed; otherwise manual monthly entry)
        │
        ▼
   Prometheus scrape (port 9210)
        │
        ▼
   Grafana dashboard "Costs"
   Alert: BudgetWarning (>80% of monthly cap on free-tier services)
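
To make the scrape step concrete, here is a stdlib-only sketch of what the exporter could return on port 9210. The spec plans to use prometheus_client, which would generate this output itself; the hand-rolled renderer below is a swapped-in illustration of the Prometheus text exposition format, and the metric name `cost_usage_bytes` is a hypothetical example.

```python
from typing import Dict, Iterable, Tuple

Sample = Tuple[str, Dict[str, str], float]  # (metric name, labels, value)


def _escape_label(value: str) -> str:
    # Label values escape backslash, double quote, and newline.
    return value.replace("\\", "\\\\").replace('"', '\\"').replace("\n", "\\n")


def render_metrics(samples: Iterable[Sample]) -> str:
    """Render samples as Prometheus text exposition lines."""
    lines = []
    for name, labels, value in samples:
        if labels:
            body = ",".join(
                f'{key}="{_escape_label(val)}"' for key, val in sorted(labels.items())
            )
            lines.append(f"{name}{{{body}}} {value}")
        else:
            lines.append(f"{name} {value}")
    return "\n".join(lines) + "\n"


# Hypothetical metric name and value, for illustration only.
print(render_metrics([("cost_usage_bytes", {"provider": "wasabi"}, 1024.0)]), end="")
```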

Implementation plan

Scaffold monitoring/exporters/cost-exporter/ (Python + FastAPI + prometheus_client), modelled on queue-exporter.
Implement one collector per provider (Vercel, Supabase, Wasabi). Defer Anthropic until they expose a usage API.
Add to monitoring/docker-compose.yml.
Add Prometheus scrape config.
Create dashboard monitoring/grafana/provisioning/dashboards/costs.json.
Define alerts in monitoring/prometheus/rules/costs.yml.

Acceptance criteria

Grafana “Costs” dashboard renders with current-month numbers for Vercel, Supabase, Wasabi
Setting Vercel free-tier threshold to 1 and exceeding it triggers BudgetWarning alert
Exporter handles API errors gracefully (no crashes; it increments cost_collector_errors_total instead)
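
The error-handling criterion could be met along these lines. This is a sketch under assumptions: `run_collectors` and the two stand-in collectors are hypothetical, a plain `Counter` stands in for the cost_collector_errors_total metric, and the sample metric name is invented for the example.

```python
from collections import Counter
from typing import Callable, Dict, List, Tuple

Sample = Tuple[str, Dict[str, str], float]
Collector = Callable[[], List[Sample]]

# Stand-in for the cost_collector_errors_total counter, keyed by provider.
collector_errors: Counter = Counter()


def run_collectors(collectors: Dict[str, Collector]) -> List[Sample]:
    """Run every collector; a failing provider is counted, not fatal."""
    samples: List[Sample] = []
    for provider, collect in collectors.items():
        try:
            samples.extend(collect())
        except Exception:  # broad on purpose: one bad API must not stop the rest
            collector_errors[provider] += 1
    return samples


def _failing() -> List[Sample]:
    raise TimeoutError("simulated provider outage")


def _healthy() -> List[Sample]:
    return [("cost_usage_bytes", {"provider": "wasabi"}, 1024.0)]


result = run_collectors({"vercel": _failing, "wasabi": _healthy})
```

Here the simulated Vercel outage leaves the Wasabi sample intact and only raises the per-provider error count, so the dashboard keeps rendering the providers that did respond.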

Cost impact

0 €.

Back-out plan

Remove exporter from compose, drop dashboard JSON, remove alert rules.

Risks / open questions

Q: Can we hit per-resource granularity (per-project Vercel costs)? A: Yes — Vercel’s API gives per-project breakdowns. Add later.

Bootstrap

Once this PR is merged, deploy on vps-i1 with the following steps. The artifact PR ships configs only — no live API tokens are committed.

Create scoped, read-only API tokens:
- Vercel: https://vercel.com/account/tokens → scope: read. Name it cost-exporter-vps-i1.
- Supabase: Dashboard → Account → Access Tokens → new readonly token. Name it cost-exporter-vps-i1.
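- Wasabi: IAM → create user cost-exporter-readonly with a policy granting only s3:ListBucket and s3:GetBucketLocation on arn:aws:s3:::ecotrans-monitoring and arn:aws:s3:::ecotrans-backups. In the AWS-style policy language there is no separate s3:HeadObject action; HEAD requests on objects are authorized by s3:GetObject, so add that on the /* child paths only if the collector actually needs them.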
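
Bucket byte counts can be derived from object listings alone, which is what s3:ListBucket permits. The sketch below assumes Wasabi's S3-compatible API returns pages shaped like an S3 ListObjectsV2 response; `sum_bucket_bytes` is a hypothetical helper, and the pages are hand-written rather than fetched.

```python
from typing import Any, Dict, Iterable


def sum_bucket_bytes(pages: Iterable[Dict[str, Any]]) -> int:
    """Sum object sizes across ListObjectsV2-style response pages."""
    total = 0
    for page in pages:
        # "Contents" is absent when a page (or the whole bucket) is empty.
        for obj in page.get("Contents", []):
            total += obj["Size"]
    return total


# Hand-written pages in the shape of an S3 ListObjectsV2 response.
fake_pages = [
    {"Contents": [{"Key": "a.tar", "Size": 100}, {"Key": "b.tar", "Size": 250}]},
    {"Contents": [{"Key": "c.tar", "Size": 50}]},
    {},
]
print(sum_bucket_bytes(fake_pages))  # 400
```

With boto3, the pages would typically come from a `list_objects_v2` paginator pointed at the Wasabi endpoint; that wiring is left out here to keep the example free of credentials.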

Add to monitoring/.env on vps-i1 (do not commit):

VERCEL_API_TOKEN=...
SUPABASE_ACCESS_TOKEN=...
SUPABASE_PROJECT_REF=mwkqmgadqnkkihjdeqsi
WASABI_ACCESS_KEY=...
WASABI_SECRET_KEY=...
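
On the exporter side, these variables could be validated once at startup so that a missing token fails loudly instead of surfacing later as an API error. A minimal sketch, assuming a hypothetical `Config` dataclass and `load_config` helper; only the variable names come from the list above.

```python
from dataclasses import dataclass
from typing import Mapping

REQUIRED = (
    "VERCEL_API_TOKEN",
    "SUPABASE_ACCESS_TOKEN",
    "SUPABASE_PROJECT_REF",
    "WASABI_ACCESS_KEY",
    "WASABI_SECRET_KEY",
)


@dataclass(frozen=True)
class Config:
    vercel_api_token: str
    supabase_access_token: str
    supabase_project_ref: str
    wasabi_access_key: str
    wasabi_secret_key: str


def load_config(env: Mapping[str, str]) -> Config:
    """Build Config from env, naming every missing or empty variable."""
    missing = [name for name in REQUIRED if not env.get(name)]
    if missing:
        raise RuntimeError("missing environment variables: " + ", ".join(missing))
    # Field names are the lower-cased variable names.
    return Config(**{name.lower(): env[name] for name in REQUIRED})
```

In the container this would be called as `load_config(os.environ)`; passing the mapping explicitly keeps the function easy to test.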

Pull the new compose config and bring the service up:

cd /opt/p24-infra && git pull
cd monitoring && docker compose up -d cost-exporter

Reload Prometheus so the new scrape job is picked up:
```
curl -X POST http://localhost:9090/-/reload
```
Verify Prometheus targets page shows cost-exporter as UP: https://prometheus.vps-i1.infra.zintegrowana.online/targets
The first collection runs at container startup; you can also force one manually:
```
curl -X POST http://localhost:9210/refresh
```
After that, the Costs dashboard in Grafana populates. The default cadence is daily; lower COST_REFRESH_INTERVAL_S if you want faster refreshes during initial tuning.
Tune the alert thresholds in monitoring/prometheus/rules/costs.yml to your actual monthly caps if you’re not on free tiers, then reload Prometheus again with curl -X POST http://localhost:9090/-/reload.
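
The target check in the steps above can also be scripted against the Prometheus HTTP API instead of eyeballing the targets page. The sketch below parses a response from /api/v1/targets; the helper `target_is_up` is hypothetical, the sample payload is trimmed and hand-written, and the job name cost-exporter is an assumption about how the scrape job is named.

```python
from typing import Any, Dict


def target_is_up(payload: Dict[str, Any], job: str) -> bool:
    """True if `job` has at least one active target and all of them report "up"."""
    targets = [
        t for t in payload.get("data", {}).get("activeTargets", [])
        if t.get("labels", {}).get("job") == job
    ]
    return bool(targets) and all(t.get("health") == "up" for t in targets)


# Trimmed, hand-written example of a /api/v1/targets response.
sample = {
    "status": "success",
    "data": {
        "activeTargets": [
            {"labels": {"job": "cost-exporter"}, "health": "up"},
            {"labels": {"job": "queue-exporter"}, "health": "down"},
        ]
    },
}
print(target_is_up(sample, "cost-exporter"))  # True
```

On vps-i1 the payload could be fetched with `urllib.request.urlopen("http://localhost:9090/api/v1/targets")` and decoded with `json.load` before being passed to the helper.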

Rollback

cd /opt/p24-infra/monitoring && docker compose stop cost-exporter && docker compose rm -f cost-exporter
# Remove or comment the cost-exporter scrape job + costs.yml alert rule, then reload.
