Report Scheduler — Operations Workbook

Covers: report_scheduler.py cron script, Prometheus metrics pipeline, and the “Report Scheduler” Grafana dashboard.

Workbook last reviewed: 2026-06-13


Overview

The report scheduler generates automated vehicle inspection reports (HU/SP/UVV daily, Tacho/Agregat weekly), uploads them to Wasabi S3, and emails them to recipients. After each run it writes Prometheus textfile metrics, which power the Grafana dashboard and Alertmanager rules.

Property | Value
Script | infra-src/report-scheduler/report_scheduler.py
Cron file (repo) | monitoring/cron/report-scheduler.cron
Cron installed at | /etc/cron.d/report-scheduler on vps-i1
Config files (on server) | /opt/p24-infra/reports/configs/{daily,weekly}/*.json
Grafana dashboard | monitoring/grafana/provisioning/dashboards/report-scheduler.json
Dashboard UID | report-scheduler-v1
Alert rules | monitoring/prometheus/rules/reports.yml
Log file | /var/log/report-scheduler.log on vps-i1

Architecture

vps-i1 cron (/etc/cron.d/report-scheduler)
    │
    ├── 02:00 UTC daily   → report_scheduler.py --config przeglady-hu-sp-uvv.json
    └── 05:00 UTC Sunday  → report_scheduler.py --config przeglady-tacho-agregat.json
              │
              ├── Supabase → fetch vehicle inspection records
              ├── pdf-service (:8100) → render Markdown → PDF
              ├── Wasabi S3 (ecotrans-monitoring) → upload PDF
              ├── Mailgun EU SMTP → email PDF to recipients
              └── /var/lib/node_exporter/textfile_collector/
                      └── report_<name>.prom  ← atomic write (tmp → rename)
                              ↓
                              node-exporter (:9100) textfile collector
                              ↓
                              Prometheus scrapes vps-i1:9100 (job: node, every 15s)
                              ↓
                              Thanos Query (http://thanos-query:10904)
                              ↓
                              Grafana → dashboard report-scheduler-v1
                                      └── Alertmanager → email on failure

Reports in scope

Report name | Inspection types | Schedule | Config file
przeglady-hu-sp-uvv | HU, SP, UVV | Daily 02:00 UTC | daily/przeglady-hu-sp-uvv.json
przeglady-tacho-agregat | Tacho, Agregat | Sunday 05:00 UTC | weekly/przeglady-tacho-agregat.json

Prometheus Metrics

Metrics are written by write_metrics() (lines 672–706 of report_scheduler.py) on every run, whether it succeeds or fails, because the call sits in a finally block. The file is written atomically: the content goes to a .prom.tmp file, which is then renamed to the final .prom name.

Metric | Type | Labels | Meaning
report_last_run_timestamp_seconds | gauge | report_name | Unix epoch of last run (success or fail)
report_last_run_status | gauge | report_name | 1 = success, 0 = failure
report_generation_duration_seconds | gauge | report_name | Wall-clock seconds to generate the report

Textfile location: /var/lib/node_exporter/textfile_collector/
File names: report_przeglady_hu_sp_uvv.prom, report_przeglady_tacho_agregat.prom
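
The atomic write described above can be illustrated with a minimal sketch. This is not the actual write_metrics() from report_scheduler.py; the function names prom_file_name and write_metrics_sketch are hypothetical, and both the hyphen-to-underscore file naming and the use of the hyphenated report name as the label value are assumptions inferred from the file names and alert descriptions in this workbook.

```python
import os
import time


def prom_file_name(report_name: str) -> str:
    # Assumed mapping: hyphens become underscores, matching the two file names listed above.
    return f"report_{report_name.replace('-', '_')}.prom"


def write_metrics_sketch(metrics_dir: str, report_name: str,
                         success: bool, duration_s: float) -> str:
    """Write the three gauges to <metrics_dir>/report_<name>.prom via tmp + rename."""
    label = f'{{report_name="{report_name}"}}'
    lines = [
        "# TYPE report_last_run_timestamp_seconds gauge",
        f"report_last_run_timestamp_seconds{label} {int(time.time())}",
        "# TYPE report_last_run_status gauge",
        f"report_last_run_status{label} {1 if success else 0}",
        "# TYPE report_generation_duration_seconds gauge",
        f"report_generation_duration_seconds{label} {duration_s:.3f}",
    ]
    final_path = os.path.join(metrics_dir, prom_file_name(report_name))
    tmp_path = final_path + ".tmp"  # ends in .prom.tmp, so the collector's *.prom glob skips it
    with open(tmp_path, "w", encoding="utf-8") as fh:
        fh.write("\n".join(lines) + "\n")
    # os.replace renames atomically when both paths are on the same filesystem,
    # so node-exporter never reads a half-written file.
    os.replace(tmp_path, final_path)
    return final_path
```

Calling such a function from a finally block is what makes the status gauge reflect failures as well as successes.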

Scrape path: node-exporter job node in prometheus/prometheus.yml → 217.154.82.162:9100


Grafana Dashboard

Dashboard file: report-scheduler.json

Dashboard URL: https://grafana.vps-i1.infra.zintegrowana.online/d/report-scheduler-v1/report-scheduler

Panels

Panel | Type | Query | What to look for
Status ostatniego uruchomienia (Last run status) | Stat | report_last_run_status | Green (OK) = last run succeeded; Red (FAIL) = failure, check logs immediately
Czas od ostatniego uruchomienia (Time since last run) | Stat | time() - report_last_run_timestamp_seconds | Green < 24h; Yellow ≥ 24h; Red ≥ 26h. If red while status is OK, the last recorded run succeeded but the metric has not been rewritten since (see Diagnostics)
Czas generowania raportu (Report generation time) | Timeseries | report_generation_duration_seconds | Baseline ~1.5 s; spikes > 5 s suggest pdf-service slowness or Supabase latency
Historia uruchomień (Run history) | Table | 3 metrics merged | Snapshot of last known state per report; useful after an alert to confirm timestamps
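
The colour thresholds of the "time since last run" panel can be expressed as a small function. This is only a sketch for reasoning about the panel, assuming Green below 24 h, Yellow from 24 h and Red from 26 h; the name age_colour is hypothetical and nothing here is read from the dashboard JSON.

```python
import time
from typing import Optional

YELLOW_AFTER_S = 24 * 3600  # 24 h
RED_AFTER_S = 26 * 3600     # 26 h


def age_colour(last_run_ts: float, now: Optional[float] = None) -> str:
    """Map the age of report_last_run_timestamp_seconds to a panel colour."""
    now = time.time() if now is None else now
    age = now - last_run_ts  # same arithmetic as time() - report_last_run_timestamp_seconds
    if age >= RED_AFTER_S:
        return "red"
    if age >= YELLOW_AFTER_S:
        return "yellow"
    return "green"
```

For example, a run recorded 25 hours ago falls in the yellow band: late relative to the daily schedule, but not yet past the 26 h mark.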

Dashboard variable

DS_PROMETHEUS — auto-selects the default Prometheus datasource (Thanos Query at http://thanos-query:10904). No manual selection needed; it resolves on dashboard load.

Time range & refresh

Default view: Last 24 hours, refresh every 5 minutes. Extend to 7 days to inspect the timeseries panel trend.


Alert Rules

File: reports.yml

Alert | Fires when | Grace | Severity
ReportNotGenerated | przeglady-hu-sp-uvv not run in 26 h (or metric absent) | for: 1h | warning
WeeklyReportNotGenerated | przeglady-tacho-agregat not run in 8 days (or metric absent) | for: 1h | warning
ReportRunFailed | Any report status == 0 | for: 5m | warning

All three alerts point to /var/log/report-scheduler.log on vps-i1 for investigation.
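
To make the three conditions concrete, the sketch below evaluates them against a snapshot of metric values. It is an illustration only: the real expressions live in reports.yml and are not reproduced here, the for: grace periods are not modelled, and the function name pending_alerts and its input shape are hypothetical.

```python
from typing import Dict, List

DAILY = "przeglady-hu-sp-uvv"
WEEKLY = "przeglady-tacho-agregat"


def pending_alerts(last_run_ts: Dict[str, float],
                   status: Dict[str, int],
                   now: float) -> List[str]:
    """Return the alert names whose condition holds for the given snapshot."""

    def stale(name: str, max_age_s: float) -> bool:
        ts = last_run_ts.get(name)
        # An absent metric counts as stale, as the table above describes.
        return ts is None or now - ts > max_age_s

    alerts: List[str] = []
    if stale(DAILY, 26 * 3600):
        alerts.append("ReportNotGenerated")
    if stale(WEEKLY, 8 * 24 * 3600):
        alerts.append("WeeklyReportNotGenerated")
    if any(value == 0 for value in status.values()):
        alerts.append("ReportRunFailed")
    return alerts
```

In Prometheus a condition must additionally hold for the whole grace period before the alert fires, so a brief gap in the metric does not page anyone.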


Config Management

File | Location on server | In repo? | Contains secrets?
report_scheduler.py | /opt/p24-infra/infra-src/report-scheduler/report_scheduler.py | Yes | No
requirements.txt | /opt/p24-infra/infra-src/report-scheduler/requirements.txt | Yes | No
report-scheduler.cron | /etc/cron.d/report-scheduler (installed from repo) | Yes, monitoring/cron/ | No
przeglady-hu-sp-uvv.json | /opt/p24-infra/reports/configs/daily/ | No (server-only) | No (but contains email addresses)
przeglady-tacho-agregat.json | /opt/p24-infra/reports/configs/weekly/ | No (server-only) | No
.env | /opt/p24-infra/monitoring/.env | (not stated) | Yes (all secrets)

Required environment variables (key names only)

Key | Purpose
SUPABASE_URL | Supabase project URL
SUPABASE_SERVICE_KEY | Service-role key for inspection data queries
WASABI_ACCESS_KEY / WASABI_SECRET_KEY | Upload PDFs to ecotrans-monitoring bucket
PDF_SERVICE_URL / REPORT_PDF_API_KEY | pdf-service at :8100
EMAIL_SENDER_URL / EMAIL_SENDER_API_KEY | Mailgun EU relay
METRICS_DIR | Override textfile path (default: /var/lib/node_exporter/textfile_collector)

Secret injection: sourced directly by cron via set -a && source /opt/p24-infra/monitoring/.env && set +a.
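
Before a manual run it can help to confirm that the sourced .env actually exported every required key. The following is a minimal sketch of such a check, not part of report_scheduler.py; the helper name missing_env is hypothetical, and METRICS_DIR is treated as optional because it has a default.

```python
import os
from typing import List, Mapping

# Key names taken from the table above; METRICS_DIR is omitted because it has a default.
REQUIRED_KEYS = (
    "SUPABASE_URL",
    "SUPABASE_SERVICE_KEY",
    "WASABI_ACCESS_KEY",
    "WASABI_SECRET_KEY",
    "PDF_SERVICE_URL",
    "REPORT_PDF_API_KEY",
    "EMAIL_SENDER_URL",
    "EMAIL_SENDER_API_KEY",
)


def missing_env(env: Mapping[str, str] = os.environ) -> List[str]:
    """Return required keys that are unset or empty; values are never printed."""
    return [key for key in REQUIRED_KEYS if not env.get(key)]
```

Run in the same shell after the set -a / source / set +a line, an empty list means every key listed above is present. Only key names are reported, so the output is safe to paste into a ticket.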


Deployment

Install / reinstall cron

ssh root@217.154.82.162
cp /opt/p24-infra/monitoring/cron/report-scheduler.cron /etc/cron.d/report-scheduler
chmod 644 /etc/cron.d/report-scheduler
# Verify the file is in place. crond picks up /etc/cron.d entries automatically;
# they are not listed by `crontab -l`, which only shows per-user crontabs.
ls -l /etc/cron.d/report-scheduler

Update script only

ssh root@217.154.82.162 "cd /opt/p24-infra && git pull"
# No restart needed — cron launches a fresh process each run

Add or change a report config

  1. Create/edit the JSON config on vps-i1 under /opt/p24-infra/reports/configs/{daily|weekly}/
  2. Add a cron entry to monitoring/cron/report-scheduler.cron (in repo), then reinstall on server
  3. Update the table in the “Reports in scope” section above
  4. If this is a new report_name, the Grafana panels display the new label automatically; no dashboard JSON change is required

Run manually

ssh root@217.154.82.162
set -a && source /opt/p24-infra/monitoring/.env && set +a
python3 /opt/p24-infra/infra-src/report-scheduler/report_scheduler.py \
  --config /opt/p24-infra/reports/configs/daily/przeglady-hu-sp-uvv.json
# Add --date YYYY-MM-DD to backfill a specific date
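
When several days need backfilling, the --date flag has to be passed once per day. The sketch below only builds the command lines for a date range, leaving the decision to run them with the operator. The helper name backfill_commands is hypothetical. Whether a backfill run re-sends emails or overwrites existing uploads is not stated in this workbook, so check before executing anything.

```python
from datetime import date, timedelta
from typing import List

SCRIPT = "/opt/p24-infra/infra-src/report-scheduler/report_scheduler.py"


def backfill_commands(config: str, start: date, end: date) -> List[List[str]]:
    """Build one argv list per day from start to end, inclusive."""
    days = (end - start).days
    return [
        ["python3", SCRIPT, "--config", config,
         "--date", (start + timedelta(days=offset)).isoformat()]
        for offset in range(days + 1)  # empty when end is before start
    ]


if __name__ == "__main__":
    daily = "/opt/p24-infra/reports/configs/daily/przeglady-hu-sp-uvv.json"
    for argv in backfill_commands(daily, date(2026, 6, 1), date(2026, 6, 3)):
        print(" ".join(argv))  # print only; run manually after review
```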

Diagnostics

Dashboard shows FAIL

ssh root@217.154.82.162
tail -100 /var/log/report-scheduler.log
# Look for: ERROR, exception tracebacks, "status=0"

Dashboard shows metric absent / no data

# Check textfile exists and is recent:
ls -la /var/lib/node_exporter/textfile_collector/report_*.prom
cat /var/lib/node_exporter/textfile_collector/report_przeglady_hu_sp_uvv.prom
 
# Check node-exporter is serving it:
curl -s http://localhost:9100/metrics | grep report_last_run
 
# Check Prometheus scraped it:
# Open https://prometheus.vps-i1.infra.zintegrowana.online
# Query: report_last_run_status
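
The output of the cat and curl checks above is plain Prometheus exposition text, so it can be parsed to compare what the textfile contains with what node-exporter serves. The parser below is a minimal sketch that handles only the single-label report_* samples described in this workbook; parse_report_metrics is a hypothetical helper name.

```python
import re
from typing import Dict, Tuple

# Matches lines such as: report_last_run_status{report_name="x"} 1
_SAMPLE = re.compile(r'^(report_\w+)\{report_name="([^"]*)"\}\s+(\S+)$')


def parse_report_metrics(text: str) -> Dict[Tuple[str, str], float]:
    """Return {(metric, report_name): value} for report_* samples in exposition text."""
    samples: Dict[Tuple[str, str], float] = {}
    for line in text.splitlines():
        match = _SAMPLE.match(line.strip())
        if match:  # comment lines (# HELP / # TYPE) and other metrics are skipped
            samples[(match.group(1), match.group(2))] = float(match.group(3))
    return samples
```

If the textfile parses to three samples but the node-exporter output yields none, the problem sits between the file and the collector (directory path or file permissions) rather than in the script.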

Time since last run is red (> 26h) but status is green

The metric was last written more than 26 h ago. Either:

  • The cron job didn’t run (check grep CRON /var/log/cron)
  • The textfile was deleted or permissions changed
  • node-exporter stopped reading the textfile directory

grep CRON /var/log/cron | grep report-scheduler | tail -20
systemctl status node_exporter
ls -la /var/lib/node_exporter/textfile_collector/

Report generation time spike

# Check pdf-service health:
curl -s http://localhost:8100/health
 
# Check Supabase latency:
tail -50 /var/log/report-scheduler.log | grep "Supabase\|duration"

Upgrade

The script has no Docker container — it runs directly under Python 3 on vps-i1.

Update Python dependencies

ssh root@217.154.82.162
cd /opt/p24-infra && git pull
pip3 install -r infra-src/report-scheduler/requirements.txt --upgrade
# Test:
python3 infra-src/report-scheduler/report_scheduler.py --help

Modify a panel query or add a panel

  1. Edit monitoring/grafana/provisioning/dashboards/report-scheduler.json locally
  2. Commit and push to dev → merge to main
  3. On vps-i1: git pull — Grafana reloads provisioned dashboards automatically (no restart)

Add a new alert rule

  1. Edit monitoring/prometheus/rules/reports.yml
  2. Commit and push
  3. On vps-i1: git pull && curl -X POST http://localhost:9090/-/reload

Backup

No stateful data belongs to this component — all output is in Wasabi S3 (bucket ecotrans-monitoring, path reports/) and email inboxes. The .prom textfile is ephemeral (regenerated each run).

Config JSON files on vps-i1 are not in the repo; back them up manually if changed:

scp -r root@217.154.82.162:/opt/p24-infra/reports/configs ./reports-configs-backup-$(date +%Y%m%d)

Monitoring & Alerts

What to watch | Where | Threshold
Last run status panel | Grafana dashboard | Red = immediate action
Time since last run panel | Grafana dashboard | Yellow > 24h, Red > 26h
ReportRunFailed alert | Alertmanager → email | Act within 1h
ReportNotGenerated alert | Alertmanager → email | Act within 1h
Cron log | /var/log/report-scheduler.log on vps-i1 | Tail after any alert

Known Limitations

Requirement | Status | Reason | Compensating control
Report config files in repo | No | Contains recipient email lists managed outside git | Manual backup via scp documented above
Retry on transient failure | No | Script exits on first error; cron does not retry | Alertmanager fires after 5 min; on-call re-runs manually