Report Scheduler — Operations Workbook
Covers: report_scheduler.py cron script, Prometheus metrics pipeline, and the “Report Scheduler” Grafana dashboard.
Workbook last reviewed: 2026-06-13
Overview
The report scheduler generates automated vehicle inspection reports (HU/SP/UVV daily, Tacho/Agregat weekly), uploads them to Wasabi S3, and emails them to recipients. After each run it writes Prometheus textfile metrics, which power the Grafana dashboard and Alertmanager rules.
| Property | Value |
|---|---|
| Script | infra-src/report-scheduler/report_scheduler.py |
| Cron file (repo) | monitoring/cron/report-scheduler.cron |
| Cron installed at | /etc/cron.d/report-scheduler on vps-i1 |
| Config files (on server) | /opt/p24-infra/reports/configs/{daily,weekly}/*.json |
| Grafana dashboard | monitoring/grafana/provisioning/dashboards/report-scheduler.json |
| Dashboard UID | report-scheduler-v1 |
| Alert rules | monitoring/prometheus/rules/reports.yml |
| Log file | /var/log/report-scheduler.log on vps-i1 |
Architecture
vps-i1 cron (/etc/cron.d/report-scheduler)
│
├── 02:00 UTC daily → report_scheduler.py --config przeglady-hu-sp-uvv.json
└── 05:00 UTC Sunday → report_scheduler.py --config przeglady-tacho-agregat.json
    │
    ├── Supabase → fetch vehicle inspection records
    ├── pdf-service (:8100) → render Markdown → PDF
    ├── Wasabi S3 (ecotrans-monitoring) → upload PDF
    ├── Mailgun EU SMTP → email PDF to recipients
    └── /var/lib/node_exporter/textfile_collector/
        └── report_<name>.prom ← atomic write (tmp → rename)
            │
            node-exporter (:9100) textfile collector
            │
            Prometheus scrapes vps-i1:9100 (job: node, every 15s)
            │
            Thanos Query (http://thanos-query:10904)
            │
            Grafana → dashboard report-scheduler-v1
            └── Alertmanager → email on failure
Reports in scope
| Report name | Inspection types | Schedule | Config file |
|---|---|---|---|
| przeglady-hu-sp-uvv | HU, SP, UVV | Daily 02:00 UTC | daily/przeglady-hu-sp-uvv.json |
| przeglady-tacho-agregat | Tacho, Agregat | Sunday 05:00 UTC | weekly/przeglady-tacho-agregat.json |
Prometheus Metrics
Metrics are written by write_metrics() (lines 672–706 of report_scheduler.py) whether the run succeeds or fails, because the call sits in a finally block. The write is atomic: the content goes to a .prom.tmp file first, which is then renamed to the final .prom file.
| Metric | Type | Labels | Meaning |
|---|---|---|---|
| report_last_run_timestamp_seconds | gauge | report_name | Unix epoch of last run (success or fail) |
| report_last_run_status | gauge | report_name | 1 = success, 0 = failure |
| report_generation_duration_seconds | gauge | report_name | Wall-clock seconds to generate the report |
Textfile location: /var/lib/node_exporter/textfile_collector/
File names: report_przeglady_hu_sp_uvv.prom, report_przeglady_tacho_agregat.prom
Scrape path: node-exporter job node in prometheus/prometheus.yml → 217.154.82.162:9100
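The atomic write described above can be sketched as follows. This is a minimal illustration, not the body of write_metrics(): the function names, the signature, and the assumption that the report_name label carries the report name exactly as configured are all hypothetical. The hyphen-to-underscore file naming is inferred from the file names listed above.

```python
import os
from pathlib import Path


def metrics_path(metrics_dir: str, report_name: str) -> Path:
    # Naming inferred from the listed file names: hyphens become underscores.
    return Path(metrics_dir) / f"report_{report_name.replace('-', '_')}.prom"


def write_metrics_sketch(metrics_dir: str, report_name: str, status: int,
                         duration_s: float, now_s: float) -> Path:
    """Hypothetical stand-in for write_metrics(): write to a tmp file, then rename."""
    target = metrics_path(metrics_dir, report_name)
    tmp = target.with_name(target.name + ".tmp")  # report_<name>.prom.tmp
    # Assumption: the label value is the report name exactly as configured.
    label = f'{{report_name="{report_name}"}}'
    lines = [
        "# TYPE report_last_run_timestamp_seconds gauge",
        f"report_last_run_timestamp_seconds{label} {now_s}",
        "# TYPE report_last_run_status gauge",
        f"report_last_run_status{label} {status}",
        "# TYPE report_generation_duration_seconds gauge",
        f"report_generation_duration_seconds{label} {duration_s}",
    ]
    tmp.write_text("\n".join(lines) + "\n", encoding="utf-8")
    # os.replace is atomic when tmp and target are on the same filesystem,
    # so a scrape sees either the old file or the complete new one.
    os.replace(tmp, target)
    return target
```

Because the textfile collector only reads files ending in .prom, the temporary file is never exposed while it is being written.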
Grafana Dashboard
Dashboard file: report-scheduler.json
Dashboard URL: https://grafana.vps-i1.infra.zintegrowana.online/d/report-scheduler-v1/report-scheduler
Panels
| Panel | Type | Query | What to look for |
|---|---|---|---|
| Status ostatniego uruchomienia | Stat | report_last_run_status | Green (OK) = last run succeeded; Red (FAIL) = failure — check logs immediately |
| Czas od ostatniego uruchomienia | Stat | time() - report_last_run_timestamp_seconds | Green < 24h; Yellow ≥ 24h; Red ≥ 26h. If red while status is OK, the metric has not been rewritten since the last successful run (cron did not run, or the textfile is stale) |
| Czas generowania raportu | Timeseries | report_generation_duration_seconds | Baseline ~1.5 s; spikes > 5 s suggest pdf-service slowness or Supabase latency |
| Historia uruchomień | Table | 3 metrics merged | Snapshot of last known state per report — useful after an alert to confirm timestamps |
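The colour steps of the "Czas od ostatniego uruchomienia" panel amount to a step function over time() - report_last_run_timestamp_seconds. A minimal sketch, assuming yellow starts at 24 h and red at 26 h; the function name is hypothetical and the authoritative thresholds live in the dashboard JSON.

```python
HOUR = 3600  # seconds


def age_colour(age_seconds: float) -> str:
    """Map seconds since the last run to the panel colour (assumed thresholds)."""
    if age_seconds >= 26 * HOUR:
        return "red"
    if age_seconds >= 24 * HOUR:
        return "yellow"
    return "green"
```

With a daily schedule a healthy report never leaves green, so yellow already means a run is overdue.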
Dashboard variable
DS_PROMETHEUS — auto-selects the default Prometheus datasource (Thanos Query at http://thanos-query:10904). No manual selection needed; it resolves on dashboard load.
Time range & refresh
Default view: Last 24 hours, refresh every 5 minutes. Extend to 7 days to inspect the timeseries panel trend.
Alert Rules
File: reports.yml
| Alert | Fires when | Grace | Severity |
|---|---|---|---|
| ReportNotGenerated | przeglady-hu-sp-uvv not run in 26 h (or metric absent) | for: 1h | warning |
| WeeklyReportNotGenerated | przeglady-tacho-agregat not run in 8 days (or metric absent) | for: 1h | warning |
| ReportRunFailed | Any report status == 0 | for: 5m | warning |
All three alerts point to /var/log/report-scheduler.log on vps-i1 for investigation.
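To reason about which conditions a given textfile would trigger, the table above can be approximated offline. The sketch below parses textfile content and lists the alert conditions that currently hold. It deliberately ignores the for: grace periods and does not reproduce the actual expressions in reports.yml; the helper names and the assumption that the report_name label equals the report name are hypothetical.

```python
import re

# Matches lines such as: report_last_run_status{report_name="x"} 1
SAMPLE_RE = re.compile(r'^(\w+)\{report_name="([^"]+)"\}\s+(\S+)$')

# Staleness limits taken from the alert table (seconds).
STALE_ALERTS = {
    "przeglady-hu-sp-uvv": ("ReportNotGenerated", 26 * 3600),
    "przeglady-tacho-agregat": ("WeeklyReportNotGenerated", 8 * 24 * 3600),
}


def parse_samples(text: str) -> dict:
    """Return {(metric, report_name): value}; comment lines are skipped."""
    samples = {}
    for line in text.splitlines():
        match = SAMPLE_RE.match(line.strip())
        if match:
            metric, report, value = match.groups()
            samples[(metric, report)] = float(value)
    return samples


def matching_alerts(samples: dict, now_s: float) -> list:
    """List (alert, report_name) pairs whose condition holds right now."""
    alerts = []
    for report, (alert, max_age_s) in STALE_ALERTS.items():
        ts = samples.get(("report_last_run_timestamp_seconds", report))
        if ts is None or now_s - ts > max_age_s:  # absent or too old
            alerts.append((alert, report))
    for (metric, report), value in samples.items():
        if metric == "report_last_run_status" and value == 0:
            alerts.append(("ReportRunFailed", report))
    return alerts
```

Feeding it the concatenated contents of both .prom files together with time.time() gives a quick cross-check against what Alertmanager reports.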
Config Management
| File | Location on server | In repo? | Contains secrets? |
|---|---|---|---|
| report_scheduler.py | /opt/p24-infra/infra-src/report-scheduler/report_scheduler.py | ✅ | No |
| requirements.txt | /opt/p24-infra/infra-src/report-scheduler/requirements.txt | ✅ | No |
| report-scheduler.cron | /etc/cron.d/report-scheduler (installed from repo) | ✅ monitoring/cron/ | No |
| przeglady-hu-sp-uvv.json | /opt/p24-infra/reports/configs/daily/ | ❌ server-only | No (but contains email addresses) |
| przeglady-tacho-agregat.json | /opt/p24-infra/reports/configs/weekly/ | ❌ server-only | No |
| .env | /opt/p24-infra/monitoring/.env | ❌ | Yes (all secrets) |
Required environment variables (key names only)
| Key | Purpose |
|---|---|
| SUPABASE_URL | Supabase project URL |
| SUPABASE_SERVICE_KEY | Service-role key for inspection data queries |
| WASABI_ACCESS_KEY / WASABI_SECRET_KEY | Upload PDFs to ecotrans-monitoring bucket |
| PDF_SERVICE_URL / REPORT_PDF_API_KEY | pdf-service at :8100 |
| EMAIL_SENDER_URL / EMAIL_SENDER_API_KEY | Mailgun EU relay |
| METRICS_DIR | Override textfile path (default: /var/lib/node_exporter/textfile_collector) |
Secret injection: sourced directly by cron via set -a && source /opt/p24-infra/monitoring/.env && set +a.
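Before a manual run it can help to confirm that the sourced environment actually contains every key from the table. A minimal sketch of such a pre-flight check follows; the helper is hypothetical and not part of report_scheduler.py. METRICS_DIR is left out because it has a default.

```python
import os

# Key names from the table above; METRICS_DIR is optional and omitted.
REQUIRED_KEYS = [
    "SUPABASE_URL",
    "SUPABASE_SERVICE_KEY",
    "WASABI_ACCESS_KEY",
    "WASABI_SECRET_KEY",
    "PDF_SERVICE_URL",
    "REPORT_PDF_API_KEY",
    "EMAIL_SENDER_URL",
    "EMAIL_SENDER_API_KEY",
]


def missing_keys(env=None) -> list:
    """Return the required keys that are unset or empty in env."""
    env = os.environ if env is None else env
    return [key for key in REQUIRED_KEYS if not env.get(key)]


if __name__ == "__main__":
    missing = missing_keys()
    # Only key names are printed, never values.
    print("missing:", ", ".join(missing) if missing else "none")
```

Run it in the same shell after sourcing .env so that it sees the variables the script would see.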
Deployment
Install / reinstall cron
ssh root@217.154.82.162
cp /opt/p24-infra/monitoring/cron/report-scheduler.cron /etc/cron.d/report-scheduler
chmod 644 /etc/cron.d/report-scheduler
# Verify the file is installed. crond picks up /etc/cron.d entries automatically;
# they are not listed by crontab -l.
ls -l /etc/cron.d/report-scheduler
Update script only
ssh root@217.154.82.162 "cd /opt/p24-infra && git pull"
# No restart needed: cron launches a fresh process each run
Add or change a report config
- Create/edit the JSON config on vps-i1 under /opt/p24-infra/reports/configs/{daily|weekly}/
- Add a cron entry to monitoring/cron/report-scheduler.cron (in repo), then reinstall on server
- Update the list in the Reports in scope section above
- If this is a new report_name, the dashboard panels display the new label automatically; no dashboard JSON change is required
Run manually
ssh root@217.154.82.162
set -a && source /opt/p24-infra/monitoring/.env && set +a
python3 /opt/p24-infra/infra-src/report-scheduler/report_scheduler.py \
--config /opt/p24-infra/reports/configs/daily/przeglady-hu-sp-uvv.json
# Add --date YYYY-MM-DD to backfill a specific date
Diagnostics
Dashboard shows FAIL
ssh root@217.154.82.162
tail -100 /var/log/report-scheduler.log
# Look for: ERROR, exception tracebacks, "status=0"
Dashboard shows metric absent / no data
# Check textfile exists and is recent:
ls -la /var/lib/node_exporter/textfile_collector/report_*.prom
cat /var/lib/node_exporter/textfile_collector/report_przeglady_hu_sp_uvv.prom
# Check node-exporter is serving it:
curl -s http://localhost:9100/metrics | grep report_last_run
# Check Prometheus scraped it:
# Open https://prometheus.vps-i1.infra.zintegrowana.online
# Query: report_last_run_status
Time since last run is red (> 26h) but status is green
The metric was last written more than 26 h ago. Either:
- The cron job didn’t run (check grep CRON /var/log/cron)
- The textfile was deleted or permissions changed
- node-exporter stopped reading the textfile directory
grep CRON /var/log/cron | grep report-scheduler | tail -20
systemctl status node_exporter
ls -la /var/lib/node_exporter/textfile_collector/
Report generation time spike
# Check pdf-service health:
curl -s http://localhost:8100/health
# Check Supabase latency:
tail -50 /var/log/report-scheduler.log | grep "Supabase\|duration"
Upgrade
The script has no Docker container — it runs directly under Python 3 on vps-i1.
Update Python dependencies
ssh root@217.154.82.162
cd /opt/p24-infra && git pull
pip3 install -r infra-src/report-scheduler/requirements.txt --upgrade
# Test:
python3 infra-src/report-scheduler/report_scheduler.py --help
Modify a panel query or add a panel
- Edit monitoring/grafana/provisioning/dashboards/report-scheduler.json locally
- Commit and push to dev → merge to main
- On vps-i1: git pull; Grafana reloads provisioned dashboards automatically (no restart)
Add a new alert rule
- Edit monitoring/prometheus/rules/reports.yml
- Commit and push
- On vps-i1: git pull && curl -X POST http://localhost:9090/-/reload
Backup
No stateful data belongs to this component — all output is in Wasabi S3 (bucket ecotrans-monitoring, path reports/) and email inboxes. The .prom textfile is ephemeral (regenerated each run).
Config JSON files on vps-i1 are not in the repo; back them up manually if changed:
scp -r root@217.154.82.162:/opt/p24-infra/reports/configs ./reports-configs-backup-$(date +%Y%m%d)
Monitoring & Alerts
| What to watch | Where | Threshold |
|---|---|---|
| Last run status panel | Grafana dashboard | Red = immediate action |
| Time since last run panel | Grafana dashboard | Yellow > 24h, Red > 26h |
| ReportRunFailed alert | Alertmanager → email | Act within 1h |
| ReportNotGenerated alert | Alertmanager → email | Act within 1h |
| Cron log | /var/log/report-scheduler.log on vps-i1 | Tail after any alert |
Known Limitations
| Requirement | Status | Reason | Compensating control |
|---|---|---|---|
| Report config files in repo | No | Contains recipient email lists managed outside git | Manual backup via scp documented above |
| Retry on transient failure | No | Script exits on first error; cron does not retry | Alertmanager fires after 5 min; on-call re-runs manually |