p24-infra Project Standards
Every infrastructure element added to p24-infra MUST follow this standard. This document is authoritative. When standards conflict with convenience, standards win.
1. Core Principle
Every element of our infrastructure must be:
- Documented — anyone can understand what it is and why it exists
- Observable — we know when it fails before users do
- Recoverable — we can restore it from zero within 30 minutes
- Auditable — credentials are tracked, never exposed, rotated on schedule
2. The Six Requirements
The six requirements are defined in detail in docs/infrastructure-standard.md. Every element in docs/elements.md must satisfy all six or have a documented exception.
| # | Requirement | Minimum bar |
|---|---|---|
| 1 | Workbook | docs/{service}-operations.md using workbook-template.md |
| 2 | Monitoring | {Service}Down alert + blackbox probe (if a public URL exists) |
| 3 | Backup | Automated, off-server (Wasabi p24-infra bucket), daily minimum |
| 4 | Restore | Written procedure, tested at least once, documented result |
| 5 | Healthcheck | Docker healthcheck or external probe — no silent failures |
| 6 | Password Rotation | All credentials listed (names only), rotation frequency defined |
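Requirement 3 sets a daily minimum for off-server backups, and that can be checked mechanically. The following is a minimal sketch of a freshness check. It assumes the timestamps of a service's backup objects in the p24-infra bucket have already been listed; how they are listed is out of scope. The function name `backup_is_fresh` and the reading of "daily minimum" as "newest backup no older than 24 hours" are illustrative assumptions.

```python
from __future__ import annotations

from datetime import datetime, timedelta, timezone

# "Daily minimum" from requirement 3, read as a maximum backup age (assumption).
MAX_BACKUP_AGE = timedelta(hours=24)


def backup_is_fresh(backup_times: list[datetime], now: datetime | None = None) -> bool:
    """Return True if the newest backup is no older than MAX_BACKUP_AGE.

    backup_times: timezone-aware timestamps of the backup objects stored under
    one service's prefix (hypothetical input; listing them is not shown).
    """
    if not backup_times:
        return False  # no off-server backup at all fails requirement 3
    now = now or datetime.now(timezone.utc)
    return now - max(backup_times) <= MAX_BACKUP_AGE


now = datetime(2024, 5, 2, 12, 0, tzinfo=timezone.utc)
recent = [
    datetime(2024, 5, 1, 3, 0, tzinfo=timezone.utc),
    datetime(2024, 5, 2, 3, 0, tzinfo=timezone.utc),
]
print(backup_is_fresh(recent, now))  # True: newest backup is 9 hours old
```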
3. Element Spec Requirement
Every infrastructure element MUST have an entry in docs/elements.md.
Complex services additionally get an individual spec file using
element-spec-template.md.
The element entry must be updated immediately when:
- A service is added or removed
- A service moves to a different host
- A service’s compliance status changes
- A known limitation or exception is discovered
4. Credential Protection Rule
CRITICAL — non-negotiable: Passwords, API keys, tokens, and secrets MUST NEVER appear in:
- Documentation files
- Git commits
- Chat messages
- Issue descriptions
- PR descriptions
Use key names and placeholder syntax only:
- ✅ `WASABI_ACCESS_KEY=<from .env.local>`
- ✅ `database.password = [see /root/traccar/.env on vps-i1]`
- ❌ `WASABI_ACCESS_KEY=ABCDEFGHIJ0123456789` (a literal value)
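The placeholder rule is regular enough to check automatically, for example in a pre-commit hook. The following is a minimal sketch that makes two assumptions. First, secret-bearing keys contain KEY, SECRET, TOKEN, or PASSWORD in their names. Second, placeholders are wrapped in `<...>` or `[...]` as in the examples above. The function names are hypothetical, and a pattern check like this is a backstop, not a replacement for review.

```python
import re

# Key names that suggest a secret (an assumption; extend to match real variable names).
SECRET_NAME = re.compile(r"KEY|SECRET|TOKEN|PASSWORD", re.IGNORECASE)
# Matches "NAME=value" and "dotted.name = value" assignments.
ASSIGNMENT = re.compile(r"^\s*([A-Za-z_][\w.]*)\s*=\s*(\S.*?)\s*$")


def is_placeholder(value: str) -> bool:
    """Placeholders are wrapped in <...> or [...], as in the examples above."""
    return (value.startswith("<") and value.endswith(">")) or (
        value.startswith("[") and value.endswith("]")
    )


def find_literal_secrets(text: str) -> list[int]:
    """Return 1-based line numbers where a secret-like key has a literal value."""
    hits = []
    for lineno, line in enumerate(text.splitlines(), start=1):
        match = ASSIGNMENT.match(line)
        if match is None:
            continue
        name, value = match.groups()
        if SECRET_NAME.search(name) and not is_placeholder(value):
            hits.append(lineno)
    return hits


sample = """WASABI_ACCESS_KEY=<from .env.local>
database.password = [see /root/traccar/.env on vps-i1]
WASABI_ACCESS_KEY=ABCDEFGHIJ0123456789
LOG_LEVEL=debug"""
print(find_literal_secrets(sample))  # [3]
```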
When working with Claude Code or any AI assistant, env files may be read internally for programmatic use, but credential values must never be displayed or quoted in responses.
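A tool can read an env file for programmatic use and still expose only the key names, which is also what the credential list for requirement 6 needs. The following is a minimal sketch. It assumes simple `NAME=value` lines with optional `export` prefixes and `#` comments. `env_key_names` is a hypothetical helper, and it takes text rather than a path so that file access stays with the caller.

```python
def env_key_names(text: str) -> list[str]:
    """Return variable names from .env-style text; values are skipped, never returned."""
    names = []
    for raw in text.splitlines():
        line = raw.strip()
        if not line or line.startswith("#") or "=" not in line:
            continue  # skip blank lines, comments, and anything that is not NAME=value
        name = line.split("=", 1)[0].strip()
        if name.startswith("export "):
            name = name[len("export "):].strip()
        names.append(name)
    return names


sample = "# Wasabi\nexport WASABI_ACCESS_KEY=not-shown\nDB_PASSWORD=not-shown\n"
print(env_key_names(sample))  # ['WASABI_ACCESS_KEY', 'DB_PASSWORD']
```

On a host, the caller might pass in something like `Path("/root/traccar/.env").read_text()`. Only the returned names should ever reach logs, chat, or a spec file.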
5. Exception Documentation Rule
If an element cannot meet a standard requirement, this must be documented in two places:
- services/compliance-matrix.yml: set the field to "partial" or "no" with an explanation in the notes field
- The element’s spec file: add an Exceptions section explaining:
  - Which requirement cannot be met
  - Why (technical limitation, cost, external dependency)
  - What compensating control exists (if any)
  - Target date for resolution (if planned)
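The four points of an Exceptions section map naturally onto a small record type with two required fields and two optional ones. The following is a minimal sketch. The `ExceptionRecord` class and the identifier spellings in `REQUIREMENTS` are hypothetical and do not describe an existing schema in the repository.

```python
from __future__ import annotations

from dataclasses import dataclass
from datetime import date

# Identifiers for the six requirements in section 2 (the spellings are assumptions).
REQUIREMENTS = {
    "workbook",
    "monitoring",
    "backup",
    "restore",
    "healthcheck",
    "password_rotation",
}


@dataclass
class ExceptionRecord:
    """One entry in an element spec's Exceptions section (hypothetical structure)."""

    requirement: str  # which requirement cannot be met
    reason: str  # technical limitation, cost, or external dependency
    compensating_control: str | None = None  # optional
    target_date: date | None = None  # optional; only if a fix is planned

    def problems(self) -> list[str]:
        """Return validation problems; an empty list means the record is complete."""
        issues = []
        if self.requirement not in REQUIREMENTS:
            issues.append(f"unknown requirement: {self.requirement!r}")
        if not self.reason.strip():
            issues.append("reason is required")
        return issues


ok = ExceptionRecord("backup", "external dependency", target_date=date(2025, 1, 31))
print(ok.problems())  # []
```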
6. Infrastructure Review Process
When to review
A review is triggered by any of the following:
- Adding a new service to production
- Migrating a service to a different host
- A security incident affecting any element
- Quarterly scheduled review (every 3 months)
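The quarterly trigger is the only one driven by the calendar rather than by an event. The following is a minimal sketch of computing when the next scheduled review is due. It assumes the date of the last completed review is recorded somewhere, but the standard does not say where. The helper names are hypothetical.

```python
import calendar
from datetime import date


def add_months(d: date, months: int) -> date:
    """Shift a date by whole months, clamping to the last day of shorter months."""
    month_index = d.month - 1 + months
    year = d.year + month_index // 12
    month = month_index % 12 + 1
    last_day = calendar.monthrange(year, month)[1]
    return date(year, month, min(d.day, last_day))


def next_quarterly_review(last_review: date) -> date:
    """A scheduled review is due 3 months after the previous one."""
    return add_months(last_review, 3)


def review_overdue(last_review: date, today: date) -> bool:
    """True once today is past the due date of the next scheduled review."""
    return today > next_quarterly_review(last_review)


print(next_quarterly_review(date(2024, 11, 30)))  # 2025-02-28
```

The schedule only sets a floor. The event triggers above still start a review immediately, whatever the due date says.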
Review checklist
For each service under review:
- Element spec is up to date in docs/elements.md
- All six standard requirements checked; gaps documented as exceptions
- Compliance matrix updated in services/compliance-matrix.yml
- Standard extension question: does this service expose a gap in the standard itself? If yes, open a GitHub issue proposing a standard update before closing the review.
- Credential list verified — all credentials named in spec, values NOT in spec
- Monitoring alert fired and resolved at least once (or acknowledged as untested)
- Restore procedure last tested date recorded
Standard extension process
If during a review you find a pattern that the standard doesn’t cover (e.g., rate-limiting, multi-region failover, log retention), open a GitHub issue:
- Label: enhancement, standards
- Title: [Standard] Add requirement: {name}
- Body: what the requirement is, why it matters, and what “done” looks like
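The issue conventions can be captured in one helper so that titles and labels stay uniform. The following is a minimal sketch that only builds the request body. It assumes the issue would be sent to GitHub's REST "create an issue" endpoint, which accepts `title`, `body`, and `labels`. Sending the request and authenticating are left out. The helper name and the sample requirement text are hypothetical.

```python
import json


def standard_extension_issue(name: str, requirement: str, why: str, done: str) -> dict:
    """Build an issue body following the title and label conventions above."""
    return {
        "title": f"[Standard] Add requirement: {name}",
        "labels": ["enhancement", "standards"],
        "body": "\n\n".join(
            [
                f"What: {requirement}",
                f"Why it matters: {why}",
                f"Done looks like: {done}",
            ]
        ),
    }


payload = standard_extension_issue(
    "log retention",
    "Each service states how long its logs are kept.",
    "Unbounded logs fill disks; short retention loses incident evidence.",
    "Retention period recorded in the workbook and enforced in config.",
)
print(json.dumps(payload, indent=2))
```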
Approved standard extensions are added to infrastructure-standard.md and the compliance matrix gains a new column for all services.
7. Compliance Matrix Maintenance
services/compliance-matrix.yml is the source of truth for compliance status.
- Update it in the same PR as the infrastructure change — not afterwards
- The p24-infra Health dashboard reads it live from GitHub; changes appear within 5 minutes

Status values:
- "yes": fully implemented and tested
- "partial": partially implemented; the notes field is mandatory
- "no": not implemented; acceptable only if documented as an exception
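These status rules, together with the notes requirement from section 5, can be checked before a PR merges. The following is a minimal sketch. It assumes the YAML has already been parsed into nested dicts, for example with PyYAML's `yaml.safe_load`, and that the layout is service → requirement → `{status, notes}`. The real file's layout may differ. The validator and the sample service name are hypothetical.

```python
from __future__ import annotations

ALLOWED_STATUSES = {"yes", "partial", "no"}


def validate_matrix(matrix: dict, exceptions: set[tuple[str, str]]) -> list[str]:
    """Check compliance-matrix entries against the status rules above.

    matrix: {service: {requirement: {"status": str, "notes": str}}} (assumed layout)
    exceptions: (service, requirement) pairs documented in a spec's Exceptions section
    """
    errors = []
    for service, requirements in matrix.items():
        for requirement, entry in requirements.items():
            where = f"{service}.{requirement}"
            status = entry.get("status")
            notes = (entry.get("notes") or "").strip()
            if status not in ALLOWED_STATUSES:
                errors.append(f"{where}: invalid status {status!r}")
                continue
            if status in ("partial", "no") and not notes:
                errors.append(f"{where}: {status!r} requires notes")
            if status == "no" and (service, requirement) not in exceptions:
                errors.append(f"{where}: 'no' without a documented exception")
    return errors


# Hypothetical sample data, not the real matrix.
matrix = {
    "example-service": {
        "backup": {"status": "yes"},
        "restore": {"status": "partial", "notes": ""},
        "monitoring": {"status": "no", "notes": "not yet configured"},
    },
}
for error in validate_matrix(matrix, exceptions=set()):
    print(error)
```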
8. Naming Conventions
| Artifact | Pattern | Example |
|---|---|---|
| Workbook | docs/{service-name}-operations.md | docs/traccar-operations.md |
| Element spec (complex) | docs/specs/{service-name}.md | docs/specs/grafana.md |
| Compose file | services/{service-name}/docker-compose.yml | services/traccar/docker-compose.yml |
| Backup script | services/{service-name}/scripts/backup.py | services/traccar/scripts/backup.py |
| Wasabi backup prefix | {service-name}/ in bucket p24-infra | traccar-server/ |
| Prometheus alert | {ServiceName}Down, {ServiceName}HighRestarts | TraccarDown |
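Because every pattern in the table is keyed on the service name, the conventional names can be derived rather than typed by hand. The following is a minimal sketch that assumes kebab-case service names. The helper names are hypothetical.

```python
def pascal_case(service: str) -> str:
    """traccar -> Traccar, uptime-kuma -> UptimeKuma."""
    return "".join(part.capitalize() for part in service.split("-"))


def artifact_paths(service: str) -> dict[str, str]:
    """Conventional paths for a kebab-case service name, per the table above."""
    return {
        "workbook": f"docs/{service}-operations.md",
        "spec": f"docs/specs/{service}.md",
        "compose": f"services/{service}/docker-compose.yml",
        "backup_script": f"services/{service}/scripts/backup.py",
        "wasabi_prefix": f"{service}/",
    }


def alert_names(service: str) -> list[str]:
    """Prometheus alert names in the {ServiceName}Down / HighRestarts pattern."""
    name = pascal_case(service)
    return [f"{name}Down", f"{name}HighRestarts"]


print(alert_names("traccar"))  # ['TraccarDown', 'TraccarHighRestarts']
print(artifact_paths("traccar")["compose"])  # services/traccar/docker-compose.yml
```

The table's own Wasabi example (traccar-server/) does not match the bare service name. Check derived prefixes against the bucket before relying on them.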
9. Document Ownership
| Document | Purpose | Update trigger |
|---|---|---|
| docs/standards/project-standards.md | This file (update via PR + review) | Standard change |
| docs/infrastructure-standard.md | Six requirements detail | Requirement change |
| docs/elements.md | Full element registry | Any infra change |
| services/compliance-matrix.yml | Compliance status | Any compliance change |
| docs/{service}-operations.md | Per-service workbook | Service change |