p24-infra Project Standards

Every infrastructure element added to p24-infra MUST follow this standard. This document is authoritative. When standards conflict with convenience, standards win.


1. Core Principle

Every element of our infrastructure must be:

  • Documented — anyone can understand what it is and why it exists
  • Observable — we know when it fails before users do
  • Recoverable — we can restore it from zero within 30 minutes
  • Auditable — credentials are tracked, never exposed, rotated on schedule

2. The Six Requirements

The six requirements are defined in detail in infrastructure-standard.md. Every element in elements.md must satisfy all six or have a documented exception.

| # | Requirement | Minimum bar |
|---|---|---|
| 1 | Workbook | docs/{service}-operations.md using workbook-template.md |
| 2 | Monitoring | {Service}Down alert + blackbox probe (if public URL exists) |
| 3 | Backup | Automated, off-server (Wasabi p24-infra bucket), daily minimum |
| 4 | Restore | Written procedure, tested at least once, documented result |
| 5 | Healthcheck | Docker healthcheck or external probe; no silent failures |
| 6 | Password Rotation | All credentials listed (names only), rotation frequency defined |

3. Element Spec Requirement

Every infrastructure element MUST have an entry in docs/elements.md. Complex services additionally get an individual spec file using element-spec-template.md.

The element entry must be updated immediately when:

  • A service is added or removed
  • A service moves to a different host
  • A service’s compliance status changes
  • A known limitation or exception is discovered

4. Credential Protection Rule

CRITICAL — non-negotiable: Passwords, API keys, tokens, and secrets MUST NEVER appear in:

  • Documentation files
  • Git commits
  • Chat messages
  • Issue descriptions
  • PR descriptions

Use key names and placeholder syntax only.

Allowed:

  • WASABI_ACCESS_KEY=<from .env.local>
  • database.password = [see /root/traccar/.env on vps-i1]

Not allowed (a literal value after the key name):

  • WASABI_ACCESS_KEY=<actual key value>

When working with Claude Code or any AI assistant: read env files internally for programmatic use, but never display or quote credential values in responses.
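The rule above lends itself to an automated check before text is committed or posted. The following is a minimal sketch, not part of the p24-infra tooling: the function names, the list of key-name suffixes, and the idea that `<...>` and `[...]` values count as placeholders are all assumptions made for illustration.

```python
import re

# Assumed heuristic: a line of the form KEY=value or KEY = value, optionally
# preceded by a bullet. The key-name suffix list below is illustrative only.
ASSIGNMENT = re.compile(
    r"^\s*(?:[•*-]\s*)?(?P<key>[A-Za-z_][A-Za-z0-9_.]*)\s*=\s*(?P<value>.+?)\s*$"
)
SECRET_SUFFIXES = ("key", "password", "token", "secret")


def is_placeholder(value: str) -> bool:
    """Treat <...> and [...] values as placeholders (hypothetical convention)."""
    return (value.startswith("<") and value.endswith(">")) or (
        value.startswith("[") and value.endswith("]")
    )


def find_exposed_credentials(text: str) -> list[tuple[int, str]]:
    """Return (line_number, key_name) for secret-like keys with a literal value.

    Only the key name is reported, so the finding itself never repeats the value.
    """
    findings = []
    for number, line in enumerate(text.splitlines(), start=1):
        match = ASSIGNMENT.match(line)
        if not match:
            continue
        key = match.group("key")
        if not key.lower().endswith(SECRET_SUFFIXES):
            continue
        if not is_placeholder(match.group("value")):
            findings.append((number, key))
    return findings
```

A check like this catches only the simplest case (a literal value on the same line as a recognisable key name), so it complements rather than replaces review.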


5. Exception Documentation Rule

If an element cannot meet a standard requirement, this must be documented in two places:

  1. services/compliance-matrix.yml — set the field to "partial" or "no" with an explanation in notes
  2. The element’s spec file — add an Exceptions section explaining:
    • Which requirement cannot be met
    • Why (technical limitation, cost, external dependency)
    • What compensating control exists (if any)
    • Target date for resolution (if planned)

6. Infrastructure Review Process

When to review

A review is triggered by any of the following:

  • Adding a new service to production
  • Migrating a service to a different host
  • A security incident affecting any element
  • Quarterly scheduled review (every 3 months)
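The quarterly trigger can be computed from the date of the last review. The sketch below assumes "every 3 months" means three calendar months after the previous review, with the day clamped to the end of shorter months; the function name is hypothetical.

```python
import calendar
from datetime import date


def next_quarterly_review(last_review: date) -> date:
    """Return the date three calendar months after last_review."""
    month_index = last_review.month - 1 + 3          # zero-based month, shifted by 3
    year = last_review.year + month_index // 12      # roll over into the next year
    month = month_index % 12 + 1
    # Clamp the day so that e.g. 30 November maps to the last day of February.
    last_day = calendar.monthrange(year, month)[1]
    return date(year, month, min(last_review.day, last_day))
```

For example, a review held on 2024-01-15 would make the next scheduled review due on 2024-04-15 under this assumption.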

Review checklist

For each service under review:

  • Element spec is up to date in docs/elements.md
  • All six standard requirements checked — gaps documented as exceptions
  • Compliance matrix updated in services/compliance-matrix.yml
  • Standard extension question: Does this service expose a gap in the standard itself? If yes, open a GitHub issue proposing a standard update before closing the review.
  • Credential list verified — all credentials named in spec, values NOT in spec
  • Monitoring alert fired and resolved at least once (or acknowledged as untested)
  • Restore procedure last tested date recorded

Standard extension process

If during a review you find a pattern that the standard doesn’t cover (e.g., rate-limiting, multi-region failover, log retention), open a GitHub issue:

  • Label: enhancement, standards
  • Title: [Standard] Add requirement: {name}
  • Body: what the requirement is, why it matters, what “done” looks like

Approved standard extensions are added to infrastructure-standard.md and the compliance matrix gains a new column for all services.


7. Compliance Matrix Maintenance

services/compliance-matrix.yml is the source of truth for compliance status.

  • Update it in the same PR as the infrastructure change — not afterwards
  • The p24-infra Health dashboard reads it live from GitHub — changes appear within 5 minutes
  • "yes" = fully implemented and tested
  • "partial" = partially implemented — notes field is mandatory
  • "no" = not implemented — acceptable only if documented as an exception
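The status rules above can be enforced with a small check run in the same PR. This is a sketch under stated assumptions: the layout of services/compliance-matrix.yml is not shown in this document, so the field names below (one key per requirement plus a `notes` key) are hypothetical, and the function works on an already-parsed dictionary rather than reading the file.

```python
# Hypothetical field names, one per requirement in section 2.
REQUIREMENTS = (
    "workbook",
    "monitoring",
    "backup",
    "restore",
    "healthcheck",
    "password_rotation",
)
ALLOWED = {"yes", "partial", "no"}


def validate_service(name: str, entry: dict) -> list[str]:
    """Return a list of human-readable problems for one service entry."""
    problems = []
    notes = str(entry.get("notes", "")).strip()
    for field in REQUIREMENTS:
        status = entry.get(field)
        if status not in ALLOWED:
            problems.append(
                f"{name}: {field} must be 'yes', 'partial' or 'no', got {status!r}"
            )
        elif status in {"partial", "no"} and not notes:
            # "partial" and "no" both need an explanation in the notes field.
            problems.append(f"{name}: {field} is {status!r} but notes is empty")
    return problems
```

One practical detail if the file is loaded with a YAML 1.1 parser such as PyYAML: unquoted yes and no load as booleans rather than strings, so this check would reject them. The quoted forms used in the list above load as strings.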

8. Naming Conventions

| Artifact | Pattern | Example |
|---|---|---|
| Workbook | docs/{service-name}-operations.md | docs/traccar-operations.md |
| Element spec (complex) | docs/specs/{service-name}.md | docs/specs/grafana.md |
| Compose file | services/{service-name}/docker-compose.yml | services/traccar/docker-compose.yml |
| Backup script | services/{service-name}/scripts/backup.py | services/traccar/scripts/backup.py |
| Wasabi backup prefix | {service-name}/ in bucket p24-infra | traccar-server/ |
| Prometheus alert | {ServiceName}Down, {ServiceName}HighRestarts | TraccarDown |
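The naming patterns can be derived mechanically from a service name. The sketch below is illustrative only: the function name and dictionary keys are hypothetical, and it assumes that {ServiceName} is the hyphenated service name converted to PascalCase, which matches the TraccarDown example but is not stated explicitly.

```python
def artifact_names(service_name: str) -> dict[str, str]:
    """Build the expected artifact names for a hyphenated, lower-case service name."""
    # Assumption: "traccar-server" becomes "TraccarServer" for alert names.
    pascal = "".join(part.capitalize() for part in service_name.split("-"))
    return {
        "workbook": f"docs/{service_name}-operations.md",
        "element_spec": f"docs/specs/{service_name}.md",
        "compose_file": f"services/{service_name}/docker-compose.yml",
        "backup_script": f"services/{service_name}/scripts/backup.py",
        "wasabi_prefix": f"{service_name}/",  # prefix inside bucket p24-infra
        "down_alert": f"{pascal}Down",
        "restarts_alert": f"{pascal}HighRestarts",
    }
```

Calling `artifact_names("traccar")` yields `docs/traccar-operations.md` for the workbook and `TraccarDown` for the alert, so a reviewer can compare the generated names against what exists in the repository.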

9. Document Ownership

| Document | Owner | Update trigger |
|---|---|---|
| docs/standards/project-standards.md | This file; update via PR + review | Standard change |
| docs/infrastructure-standard.md | Six requirements detail | Requirement change |
| docs/elements.md | Full element registry | Any infra change |
| services/compliance-matrix.yml | Compliance status | Any compliance change |
| docs/{service}-operations.md | Per-service workbook | Service change |