Report Scheduler — Operations Workbook

Covers: report_scheduler.py cron script, Prometheus metrics pipeline, and the “Report Scheduler” Grafana dashboard.

Workbook last reviewed: 2026-06-13

Overview

The report scheduler generates automated vehicle inspection reports (HU/SP/UVV daily, Tacho/Agregat weekly), uploads them to Wasabi S3, and emails them to recipients. After each run it writes Prometheus textfile metrics, which power the Grafana dashboard and Alertmanager rules.

Property	Value
Script	`infra-src/report-scheduler/report_scheduler.py`
Cron file (repo)	`monitoring/cron/report-scheduler.cron`
Cron installed at	`/etc/cron.d/report-scheduler` on vps-i1
Config files (on server)	`/opt/p24-infra/reports/configs/{daily,weekly}/*.json`
Grafana dashboard	`monitoring/grafana/provisioning/dashboards/report-scheduler.json`
Dashboard UID	`report-scheduler-v1`
Alert rules	`monitoring/prometheus/rules/reports.yml`
Log file	`/var/log/report-scheduler.log` on vps-i1

Architecture

vps-i1 cron (/etc/cron.d/report-scheduler)
    │
    ├── 02:00 UTC daily   → report_scheduler.py --config przeglady-hu-sp-uvv.json
    └── 05:00 UTC Sunday  → report_scheduler.py --config przeglady-tacho-agregat.json
              │
              ├── Supabase → fetch vehicle inspection records
              ├── pdf-service (:8100) → render Markdown → PDF
              ├── Wasabi S3 (ecotrans-monitoring) → upload PDF
              ├── Mailgun EU SMTP → email PDF to recipients
              └── /var/lib/node_exporter/textfile_collector/
                      └── report_<name>.prom  ← atomic write (tmp → rename)
                              │
                              node-exporter (:9100) textfile collector
                              │
                              Prometheus scrapes vps-i1:9100 (job: node, every 15s)
                              │
                              Thanos Query (http://thanos-query:10904)
                              │
                              Grafana → dashboard report-scheduler-v1
                                      └── Alertmanager → email on failure

Reports in scope

Report name	Inspection types	Schedule	Config file
`przeglady-hu-sp-uvv`	HU, SP, UVV	Daily 02:00 UTC	`daily/przeglady-hu-sp-uvv.json`
`przeglady-tacho-agregat`	Tacho, Agregat	Sunday 05:00 UTC	`weekly/przeglady-tacho-agregat.json`

Prometheus Metrics

Metrics are written by write_metrics() (lines 672–706 of report_scheduler.py) whether the run succeeds or fails, because the call sits in a finally block. The write is atomic: the script writes a .prom.tmp file and then renames it to .prom, so node-exporter never reads a partially written file.

Metric	Type	Labels	Meaning
`report_last_run_timestamp_seconds`	gauge	`report_name`	Unix epoch of last run (success or fail)
`report_last_run_status`	gauge	`report_name`	`1` = success, `0` = failure
`report_generation_duration_seconds`	gauge	`report_name`	Wall-clock seconds to generate the report

Textfile location: /var/lib/node_exporter/textfile_collector/
File names: report_przeglady_hu_sp_uvv.prom, report_przeglady_tacho_agregat.prom
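
The atomic write can be sketched as follows. This is a minimal illustration, not the code at lines 672–706 of report_scheduler.py: the function name write_metrics_sketch, its parameters, and the exact line layout are assumptions. Only the metric names, the report_name label, the file naming, and the .prom.tmp-then-rename step come from this workbook. The sketch also assumes the label keeps the hyphenated report name.

```python
import os
import time


def write_metrics_sketch(metrics_dir, report_name, success, duration_seconds, now=None):
    """Write the three report gauges to <metrics_dir>/report_<name>.prom atomically.

    Hypothetical helper for illustration; the real write_metrics() may differ.
    """
    now = time.time() if now is None else now
    # Hyphens become underscores in the file name, e.g. report_przeglady_hu_sp_uvv.prom
    final_path = os.path.join(
        metrics_dir, "report_" + report_name.replace("-", "_") + ".prom"
    )
    tmp_path = final_path + ".tmp"
    # Assumption: the label value is the hyphenated report name.
    label = '{report_name="%s"}' % report_name
    lines = [
        "# TYPE report_last_run_timestamp_seconds gauge",
        "report_last_run_timestamp_seconds%s %d" % (label, int(now)),
        "# TYPE report_last_run_status gauge",
        "report_last_run_status%s %d" % (label, 1 if success else 0),
        "# TYPE report_generation_duration_seconds gauge",
        "report_generation_duration_seconds%s %.3f" % (label, duration_seconds),
    ]
    with open(tmp_path, "w", encoding="utf-8") as fh:
        fh.write("\n".join(lines) + "\n")
    # os.replace is an atomic rename on POSIX when both paths share a filesystem.
    os.replace(tmp_path, final_path)
    return final_path
```

Calling a helper like this from a finally block refreshes the file after failed runs as well. The node-exporter textfile collector only reads files ending in .prom, so the temporary file is never exposed.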

Scrape path: Prometheus job node (defined in prometheus/prometheus.yml) scrapes node-exporter at 217.154.82.162:9100

Grafana Dashboard

Dashboard file: report-scheduler.json

Dashboard URL: https://grafana.vps-i1.infra.zintegrowana.online/d/report-scheduler-v1/report-scheduler

Panels

Panel	Type	Query	What to look for
Status ostatniego uruchomienia (last run status)	Stat	`report_last_run_status`	Green (OK) = last run succeeded; Red (FAIL) = failure, check logs immediately
Czas od ostatniego uruchomienia (time since last run)	Stat	`time() - report_last_run_timestamp_seconds`	Green < 24h; Yellow ≥ 24h; Red ≥ 26h. If red while status is still OK, the metric is stale; see “Time since last run is red” under Diagnostics
Czas generowania raportu (report generation time)	Timeseries	`report_generation_duration_seconds`	Baseline ~1.5 s; spikes > 5 s suggest pdf-service slowness or Supabase latency
Historia uruchomień (run history)	Table	3 metrics merged	Snapshot of last known state per report; useful after an alert to confirm timestamps
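
As a rough illustration of how the time-since-last-run panel maps age to colour, the sketch below reads the thresholds as green below 24 h, yellow from 24 h, and red from 26 h. The function name and signature are hypothetical; the actual mapping lives in the dashboard JSON.

```python
def last_run_colour(last_run_timestamp, now):
    """Return the panel colour for a report, given Unix timestamps in seconds.

    Mirrors the query time() - report_last_run_timestamp_seconds.
    Thresholds are an assumed reading of the panel table: 24 h and 26 h.
    """
    age_hours = (now - last_run_timestamp) / 3600
    if age_hours >= 26:
        return "red"
    if age_hours >= 24:
        return "yellow"
    return "green"
```

For example, a daily report whose last write was 25 hours ago would show yellow: late, but not yet past the 26 h mark used by the ReportNotGenerated alert.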

Dashboard variable

DS_PROMETHEUS — auto-selects the default Prometheus datasource (Thanos Query at http://thanos-query:10904). No manual selection needed; it resolves on dashboard load.

Time range & refresh

Default view: Last 24 hours, refresh every 5 minutes. Extend to 7 days to inspect the timeseries panel trend.

Alert Rules

File: reports.yml

Alert	Fires when	Grace	Severity
`ReportNotGenerated`	`przeglady-hu-sp-uvv` not run in 26 h (or metric absent)	`for: 1h`	warning
`WeeklyReportNotGenerated`	`przeglady-tacho-agregat` not run in 8 days (or metric absent)	`for: 1h`	warning
`ReportRunFailed`	Any report `status == 0`	`for: 5m`	warning

All three alerts point to /var/log/report-scheduler.log on vps-i1 for investigation.
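
To make the three conditions concrete, here is a small sketch that evaluates them against a snapshot of the metrics. It is not the content of reports.yml: the function name and the shape of the samples dictionary are assumptions, and the `for:` grace periods are deliberately ignored, so it reports conditions that are true right now rather than alerts that have actually fired.

```python
HOUR = 3600
DAY = 24 * HOUR


def conditions_met(samples, now):
    """Return names of alerts whose condition currently holds.

    samples: dict mapping report_name -> {"timestamp": float, "status": int}
             (hypothetical structure; a missing key models an absent metric).
    now:     current Unix time in seconds.
    """
    met = []

    daily = samples.get("przeglady-hu-sp-uvv")
    if daily is None or now - daily["timestamp"] > 26 * HOUR:
        met.append("ReportNotGenerated")

    weekly = samples.get("przeglady-tacho-agregat")
    if weekly is None or now - weekly["timestamp"] > 8 * DAY:
        met.append("WeeklyReportNotGenerated")

    # ReportRunFailed applies to any report, not only the two named above.
    if any(sample["status"] == 0 for sample in samples.values()):
        met.append("ReportRunFailed")

    return met
```

The staleness alerts would only fire after their condition has held for 1 h, and ReportRunFailed after 5 min.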

Config Management

File	Location on server	In repo?	Contains secrets?
`report_scheduler.py`	`/opt/p24-infra/infra-src/report-scheduler/report_scheduler.py`	✅	No
`requirements.txt`	`/opt/p24-infra/infra-src/report-scheduler/requirements.txt`	✅	No
`report-scheduler.cron`	`/etc/cron.d/report-scheduler` (installed from repo)	✅ `monitoring/cron/`	No
`przeglady-hu-sp-uvv.json`	`/opt/p24-infra/reports/configs/daily/`	❌ server-only	No (but contains email addresses)
`przeglady-tacho-agregat.json`	`/opt/p24-infra/reports/configs/weekly/`	❌ server-only	No
`.env`	`/opt/p24-infra/monitoring/.env`	❌	Yes — all secrets

Required environment variables (key names only)

Key	Purpose
`SUPABASE_URL`	Supabase project URL
`SUPABASE_SERVICE_KEY`	Service-role key for inspection data queries
`WASABI_ACCESS_KEY` / `WASABI_SECRET_KEY`	Upload PDFs to `ecotrans-monitoring` bucket
`PDF_SERVICE_URL` / `REPORT_PDF_API_KEY`	pdf-service at `:8100`
`EMAIL_SENDER_URL` / `EMAIL_SENDER_API_KEY`	Mailgun EU relay
`METRICS_DIR`	Override textfile path (default: `/var/lib/node_exporter/textfile_collector`)

Secret injection: sourced directly by cron via set -a && source /opt/p24-infra/monitoring/.env && set +a.
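
Before a manual run it can help to confirm that the sourced environment is complete. The following is a hedged sketch of such a check; the helper name is hypothetical and the script itself may validate its environment differently. METRICS_DIR is left out because it has a default.

```python
import os

# Key names taken from the table above; METRICS_DIR is optional and omitted.
REQUIRED_KEYS = (
    "SUPABASE_URL",
    "SUPABASE_SERVICE_KEY",
    "WASABI_ACCESS_KEY",
    "WASABI_SECRET_KEY",
    "PDF_SERVICE_URL",
    "REPORT_PDF_API_KEY",
    "EMAIL_SENDER_URL",
    "EMAIL_SENDER_API_KEY",
)


def missing_env_keys(environ=None):
    """Return the required keys that are unset or empty in the environment."""
    environ = os.environ if environ is None else environ
    return [key for key in REQUIRED_KEYS if not environ.get(key)]


if __name__ == "__main__":
    missing = missing_env_keys()
    if missing:
        # Print key names only; never print secret values.
        raise SystemExit("Missing environment variables: " + ", ".join(missing))
    print("All required environment variables are set.")
```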

Deployment

Install / reinstall cron

ssh root@217.154.82.162
cp /opt/p24-infra/monitoring/cron/report-scheduler.cron /etc/cron.d/report-scheduler
chmod 644 /etc/cron.d/report-scheduler
# Verify the file is in place (crond picks up /etc/cron.d entries automatically):
ls -l /etc/cron.d/report-scheduler   # note: cron.d entries do not appear in crontab -l

Update script only

ssh root@217.154.82.162 "cd /opt/p24-infra && git pull"
# No restart needed — cron launches a fresh process each run

Add or change a report config

Create or edit the JSON config on vps-i1 under /opt/p24-infra/reports/configs/{daily|weekly}/
If the report is new, add a cron entry to monitoring/cron/report-scheduler.cron (in repo), then reinstall the cron file on the server
Update the table in the Reports in scope section above
A new report_name appears in the Grafana panels automatically as a new label value; no dashboard JSON change is required

Run manually

ssh root@217.154.82.162
set -a && source /opt/p24-infra/monitoring/.env && set +a
python3 /opt/p24-infra/infra-src/report-scheduler/report_scheduler.py \
  --config /opt/p24-infra/reports/configs/daily/przeglady-hu-sp-uvv.json
# Add --date YYYY-MM-DD to backfill a specific date

Diagnostics

Dashboard shows FAIL

ssh root@217.154.82.162
tail -100 /var/log/report-scheduler.log
# Look for: ERROR, exception tracebacks, "status=0"

Dashboard shows metric absent / no data

# Check textfile exists and is recent:
ls -la /var/lib/node_exporter/textfile_collector/report_*.prom
cat /var/lib/node_exporter/textfile_collector/report_przeglady_hu_sp_uvv.prom
 
# Check node-exporter is serving it:
curl -s http://localhost:9100/metrics | grep report_last_run
 
# Check Prometheus scraped it:
# Open https://prometheus.vps-i1.infra.zintegrowana.online
# Query: report_last_run_status
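
The same check can be scripted against the Prometheus HTTP API instead of the web UI. This is a sketch under assumptions: it uses http://localhost:9090 (the address used for the reload call elsewhere in this workbook) as the base URL, and the function names are hypothetical. The /api/v1/query endpoint and its instant-vector response shape are standard Prometheus.

```python
import json
import urllib.parse
import urllib.request


def parse_instant_vector(payload):
    """Map report_name -> float value from a /api/v1/query JSON response."""
    if payload.get("status") != "success":
        raise ValueError("Prometheus query failed: %r" % payload.get("error"))
    values = {}
    for series in payload["data"]["result"]:
        name = series["metric"].get("report_name", "")
        # Each instant-vector sample is [unix_timestamp, "value-as-string"].
        values[name] = float(series["value"][1])
    return values


def query_report_status(base_url="http://localhost:9090"):
    """Fetch report_last_run_status for all reports (base_url is an assumption)."""
    query = urllib.parse.urlencode({"query": "report_last_run_status"})
    with urllib.request.urlopen(base_url + "/api/v1/query?" + query, timeout=10) as resp:
        return parse_instant_vector(json.load(resp))


if __name__ == "__main__":
    for report, status in sorted(query_report_status().items()):
        print(report, "OK" if status == 1 else "FAIL")
```

An empty result means Prometheus has no current sample for the metric, which points back to the textfile or node-exporter checks above.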

Time since last run is red (> 26h) but status is green

The metric was last written more than 26 h ago. Possible causes:

The cron job didn’t run (check grep CRON /var/log/cron)
The textfile was deleted or permissions changed
node-exporter stopped reading the textfile directory

grep CRON /var/log/cron | grep report-scheduler | tail -20
systemctl status node_exporter
ls -la /var/lib/node_exporter/textfile_collector/

Report generation time spike

# Check pdf-service health:
curl -s http://localhost:8100/health
 
# Check Supabase latency:
tail -50 /var/log/report-scheduler.log | grep "Supabase\|duration"

Upgrade

The script has no Docker container — it runs directly under Python 3 on vps-i1.

Update Python dependencies

ssh root@217.154.82.162
cd /opt/p24-infra && git pull
pip3 install -r infra-src/report-scheduler/requirements.txt --upgrade
# Test:
python3 infra-src/report-scheduler/report_scheduler.py --help

Modify a panel query or add a panel

Edit monitoring/grafana/provisioning/dashboards/report-scheduler.json locally
Commit and push to dev → merge to main
On vps-i1: git pull — Grafana reloads provisioned dashboards automatically (no restart)

Add a new alert rule

Edit monitoring/prometheus/rules/reports.yml
Commit and push
On vps-i1: git pull && curl -X POST http://localhost:9090/-/reload

Backup

No stateful data belongs to this component — all output is in Wasabi S3 (bucket ecotrans-monitoring, path reports/) and email inboxes. The .prom textfile is ephemeral (regenerated each run).

Config JSON files on vps-i1 are not in the repo; back them up manually if changed:

scp -r root@217.154.82.162:/opt/p24-infra/reports/configs ./reports-configs-backup-$(date +%Y%m%d)

Monitoring & Alerts

What to watch	Where	Threshold
Last run status panel	Grafana dashboard	Red = immediate action
Time since last run panel	Grafana dashboard	Yellow ≥ 24h, Red ≥ 26h
`ReportRunFailed` alert	Alertmanager → email	Act within 1h
`ReportNotGenerated` alert	Alertmanager → email	Act within 1h
Cron log	`/var/log/report-scheduler.log` on vps-i1	Tail after any alert

Known Limitations

Requirement	Status	Reason	Compensating control
Report config files in repo	No	Contains recipient email lists managed outside git	Manual backup via `scp` documented above
Retry on transient failure	No	Script exits on first error; cron does not retry	`ReportRunFailed` fires after 5 min; on-call re-runs the report manually
