Spec 03 — Secrets management

Purpose

Today the same secret is typically stored in 4 places: .env.local (dev machine), monitoring/.env (VPS), GitHub Secrets, Vercel env. There is no source of truth, no rotation audit log, and docs/infrastructure-overview.md explicitly flags 6 credentials that need rotating after a 2026-05-06 LLM session. Manual rotation across 4 places is error-prone — secrets get rotated in one place and forgotten in another, which has caused at least two prod incidents this year.

This spec eliminates manual fan-out for every secret, and eliminates manual rotation entirely for secrets that support short-lived or auto-rotating credentials.


Answer: how to organise key management to minimise manual rotation

The strategy is a three-tier hierarchy, ordered by preference:

Tier 1: OIDC / workload identity      ← ZERO manual rotation
Tier 2: Auto-rotating tokens          ← manual ONLY when subscription/account changes
Tier 3: sops-encrypted static secrets ← manual rotation, but ONE place, audited via git

Rule of thumb: Move every secret as high up this hierarchy as the upstream system supports.

Tier 1 — OIDC (no secret stored anywhere)

GitHub Actions → cloud provider auth via short-lived OIDC tokens. We already use this implicitly for Vercel (GitHub → Vercel integration). Extend to:

| Use case | Replaces | Mechanism |
|---|---|---|
| GH Actions → Vercel deploys | VERCEL_TOKEN | Vercel GitHub integration (already in place); verify |
| GH Actions → Cloudflare DNS edits | CLOUDFLARE_TOKEN_ZINTEGROWANA (future) | Cloudflare’s GitHub OIDC provider |
| GH Actions → Wasabi (backups, Thanos) | WASABI_* | Wasabi supports AWS-compatible STS with OIDC; investigate |
| GH Actions → Supabase | SUPABASE_ACCESS_TOKEN | Supabase doesn’t support OIDC yet; keep in Tier 3 |

Manual rotation needed: Never. Tokens are issued per-run, valid for minutes.
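To check the "valid for minutes" claim on a real run, the per-run token's claims can be inspected. A minimal sketch, assuming a job with `permissions: id-token: write` (which exposes ACTIONS_ID_TOKEN_REQUEST_URL and ACTIONS_ID_TOKEN_REQUEST_TOKEN), GNU `base64`, and a hypothetical helper name `jwt_payload`; the audience value is a placeholder:

```shell
# Decode the payload (second dot-separated segment) of a JWT to JSON.
# This only reads claims such as exp/iat/sub; it does NOT verify the signature.
jwt_payload() {
  seg=$(printf '%s' "$1" | cut -d. -f2 | tr '_-' '/+')
  # base64url drops '=' padding; restore it so `base64 -d` accepts the input
  while [ $(( ${#seg} % 4 )) -ne 0 ]; do seg="$seg="; done
  printf '%s' "$seg" | base64 -d
}

# Inside a workflow step (not run here; audience is a placeholder):
#   token=$(curl -sS -H "Authorization: bearer $ACTIONS_ID_TOKEN_REQUEST_TOKEN" \
#     "$ACTIONS_ID_TOKEN_REQUEST_URL&audience=sts.example" | jq -r .value)
#   jwt_payload "$token"    # compare the exp and iat claims
```

The difference between the exp and iat claims gives the token lifetime.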

Tier 2 — Auto-rotating credentials

| Secret | Auto-rotation mechanism |
|---|---|
| Claude Code OAuth (accessToken) | Already auto-refreshes silently every 8–12 h. Refresh token persists for months. No action needed. |
| GitHub fine-grained PAT for claude-runner | Use a GitHub App installation token instead: actions/create-github-app-token mints a fresh token per job, and each token expires after 1 hour. Replaces P24_INFRA_GH_TOKEN. |
| n8n internal credentials (Supabase, Trello, Gmail) | n8n stores these encrypted with N8N_ENCRYPTION_KEY. Workflows back up via spec 14 (encrypted JSON in git). The encryption key itself is Tier 3. |
| SSH keys (claude-admin, root) | Manual but rare. Rotation = regenerate keypair + update GH Secret + push to VPS via Ansible (spec 04). |

Manual rotation needed: Only when an account is compromised or for quarterly hygiene rotation. The work drops from “4 places” to “1 place + redeploy”.

Tier 3 — sops-encrypted static secrets

For the remaining secrets that have no auto-rotation upstream, we use Mozilla sops + age:

  • Secrets live in secrets/*.sops.yaml files inside the git repo, encrypted.
  • Each VPS holds an age private key at /root/.age/secrets.key (mode 600). Only the recipients listed for a file can decrypt it.
  • Developer machines hold a personal age key in 1Password. Multiple recipients (VPS keys + developer keys) per file.
  • Rotation = update the plaintext, re-encrypt, commit, redeploy. One place. One PR. Audited.
secrets/
├── shared.sops.yaml           # Discord webhooks, SMTP, Wasabi
├── vps-i1.sops.yaml           # IONOS-specific (Grafana admin pw, Supabase grafana pw)
├── vps-h1.sops.yaml           # Hostinger-specific (n8n encryption key, WAHA api key)
└── github-actions.sops.yaml   # secrets sync'd into GitHub via gh CLI
.sops.yaml                     # recipient rules per file (repo root)

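A pre-commit hook rejects any staged file matching *.env* or secrets/*.yaml that is not sops-encrypted, so a plaintext secret is hard to commit by accident. The hook only runs where it is installed and can be skipped with git commit --no-verify, which is why rule 1 of the rulebook pairs it with GitHub push protection.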
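The check such a hook performs could look roughly like this. It is a sketch and not the hook shipped in the repo: the function names are hypothetical, and the encryption test is a heuristic for sops YAML files (a top-level sops: key plus at least one ENC[...] value) applied to the working-tree copy of each staged path.

```shell
# Return 0 if the file looks sops-encrypted (heuristic, YAML only).
looks_sops_encrypted() {
  grep -q '^sops:' "$1" && grep -q 'ENC\[' "$1"
}

# Return 0 if the path is one the hook must guard (*.env* or secrets/*.yaml).
needs_guard() {
  case "$1" in
    *.env*|secrets/*.yaml) return 0 ;;
    *) return 1 ;;
  esac
}

# Read staged paths on stdin; report offenders and fail if any is plaintext.
check_staged() {
  rc=0
  while IFS= read -r path; do
    needs_guard "$path" || continue
    [ -f "$path" ] || continue          # skip paths that no longer exist
    if ! looks_sops_encrypted "$path"; then
      echo "refusing plaintext secret: $path" >&2
      rc=1
    fi
  done
  return $rc
}

# In the hook itself (not run here):
#   git diff --cached --name-only --diff-filter=ACM | check_staged || exit 1
```

Because the heuristic only recognises sops YAML, any staged *.env* file is rejected outright, which matches the intent that .env files are derived and never committed.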

What this looks like in practice

Adding a new secret (e.g. PDF service API key):

sops secrets/shared.sops.yaml          # opens $EDITOR with decrypted content
# add: pdf_service_api_key: pdfk_abc123
# save, exit — sops re-encrypts before write
git add secrets/shared.sops.yaml
git commit -m "feat(secrets): add pdf_service_api_key"
# next deploy picks it up automatically

Rotating a secret (e.g. Grafana admin password):

sops secrets/vps-i1.sops.yaml          # change the value
git add secrets/vps-i1.sops.yaml
git commit -m "fix(secrets): rotate grafana_admin_password"
git push                               # the sync runs once the change reaches main
# CI workflow `secrets-sync.yml` then automatically:
#   1. SSHes to vps-i1 and regenerates monitoring/.env from the decrypted yaml
#   2. runs docker compose up -d (picks up the new env)
#   3. comments on the commit with the rotation timestamp
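Regenerating monitoring/.env from the decrypted yaml might be done along these lines. This is a sketch under assumptions, not the workflow's actual script: it assumes the sops file is a flat map of scalar values, that env names are the upper-cased yaml keys, and the helper name `yaml_to_dotenv` is hypothetical. sops can also emit dotenv directly via `--output-type dotenv`, which keeps the key names unchanged.

```shell
# Convert flat "key: value" YAML on stdin into KEY=value dotenv lines.
# Assumes one scalar per line: no nesting, lists, or multi-line values.
yaml_to_dotenv() {
  awk '
    /^[ \t]*(#|$)/ { next }               # skip comments and blank lines
    {
      i = index($0, ":")                  # split on the FIRST colon only
      if (i == 0) next
      key = toupper(substr($0, 1, i - 1))
      val = substr($0, i + 1)
      sub(/^[ \t]+/, "", val)             # drop the space after the colon
      gsub(/^"|"$/, "", val)              # drop surrounding double quotes
      print key "=" val
    }'
}

# On the VPS (not run here); write to a temp file first so a failed
# decrypt never truncates the live .env:
#   umask 077
#   sops -d secrets/vps-i1.sops.yaml | yaml_to_dotenv > monitoring/.env.tmp \
#     && mv monitoring/.env.tmp monitoring/.env
```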

Onboarding a new dev to decrypt secrets:

  1. Dev generates age-keygen → public key.
  2. Add public key to .sops.yaml recipients list.
  3. Run sops updatekeys on each secrets/*.sops.yaml file to re-encrypt it for the new recipient.
  4. Dev pulls and can now decrypt locally.
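The re-encryption step can be scripted as a loop, one file per invocation, which works whether or not the installed sops version accepts several paths at once. A minimal sketch: the function name `rekey_all` and the SOPS override variable are hypothetical, and `--yes` skips the interactive confirmation.

```shell
# Re-encrypt every sops file for the recipients currently listed in .sops.yaml.
# SOPS can be overridden (e.g. SOPS=echo for a dry run); defaults to `sops`.
rekey_all() {
  dir=${1:-secrets}
  for f in "$dir"/*.sops.yaml; do
    [ -e "$f" ] || continue             # the glob matched nothing
    "${SOPS:-sops}" updatekeys --yes "$f" || return 1
  done
}

# Usage (not run here):
#   SOPS=echo rekey_all    # dry run: print the commands
#   rekey_all              # re-encrypt, then commit the changed files
```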

What we don’t try to automate

  • Anthropic / Claude Max subscription renewal — billing event, human-only.
  • Supabase service role key rotation — Supabase doesn’t expose this via API. Quarterly manual rotation, tracked in docs/secrets-rotation-log.md.
  • Wasabi account credentials — root keys never used; we use scoped IAM users created once.

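These 3 stay manual (quarterly or on-incident where rotation applies). Everything else is eliminated (Tier 1), auto-rotated (Tier 2), or reduced to one sops edit with automated fan-out (Tier 3).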


Rulebook (operating rules)

  1. Plaintext secrets never enter git. A pre-commit hook plus GitHub push protection enforce this (push protection on private repos may depend on the GitHub plan; verify before relying on it). Violation = revert + rotate.
  2. One secret, one source. sops files are the source of truth. GH Secrets and VPS .env files are derived, regenerated by CI. Never edit the derived form directly.
  3. .env.local is for personal dev overrides only. Anything sharable belongs in secrets/*.sops.yaml.
  4. Rotation log. Every rotation appends a row to docs/secrets-rotation-log.md (date, secret name, reason, rotator). Audit trail for compliance + incident response.
  5. Treat LLM exposure as breach. Any secret echoed in a Claude Code session (even truncated) gets rotated within 24 h, no exceptions.
  6. No secret in CI logs. set +x before commands handling secrets. Echo only sha256sum | head -c 8 for diagnostics.
  7. Age private keys are physical artefacts. Stored in 1Password (primary), VPS (/root/.age/), and on the developer’s encrypted drive. If two of three are lost, regenerate and re-encrypt all sops files.
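Rules 4 and 6 lend themselves to small helpers. The sketch below assumes GNU coreutils (`sha256sum`) and a pipe-delimited row format for docs/secrets-rotation-log.md; both function names and the row layout are hypothetical, since the log's real format is not specified here.

```shell
# Rule 6: short, non-reversible fingerprint that is safe to echo in CI logs.
secret_fingerprint() {
  printf '%s' "$1" | sha256sum | head -c 8
}

# Rule 4: append one row (date, secret name, reason, rotator) to the log.
log_rotation() {
  # $1 = log file, $2 = secret name, $3 = reason, $4 = rotator
  printf '| %s | %s | %s | %s |\n' "$(date -u +%F)" "$2" "$3" "$4" >> "$1"
}

# Usage (not run here):
#   set +x
#   echo "grafana pw fingerprint: $(secret_fingerprint "$GRAFANA_ADMIN_PASSWORD")"
#   log_rotation docs/secrets-rotation-log.md grafana_admin_password "quarterly hygiene" radieu
```

Comparing fingerprints before and after a sync confirms that the derived copy changed without ever printing the value.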

Architecture

                ┌──────────────────────────────────────┐
                │  secrets/*.sops.yaml  (in git repo)  │
                │  encrypted with age, multi-recipient │
                └──────────────────────────────────────┘
                              │  (anyone can pull)
                              │
        ┌─────────────────────┼─────────────────────┐
        │                     │                     │
        ▼                     ▼                     ▼
  ┌──────────┐         ┌──────────────┐      ┌──────────────┐
  │ vps-i1   │         │ vps-h1       │      │ GitHub       │
  │ /root/   │         │ /root/       │      │ Actions      │
  │ .age/key │         │ .age/key     │      │ + AGE_KEY    │
  └────┬─────┘         └──────┬───────┘      │   secret     │
       │                      │              └──────┬───────┘
       ▼                      ▼                     ▼
  monitoring/.env       hostinger/.env       gh secret set ...
       │                      │                     │
       ▼                      ▼                     ▼
  docker compose         docker compose       Vercel/n8n/etc.
       │                      │
   (auto sync via cron+git pull or PreToolUse hook)

Tier 1 (OIDC) flows bypass this entirely:

GitHub Actions  ──(short-lived JWT)──►  Vercel / Cloudflare / Wasabi-STS
                  (no secret stored anywhere — Tier 1)

Implementation plan

Phase 1 — inventory + rotation (0.5 d)

  1. Audit every value currently in .env.local, monitoring/.env, GH Secrets, Vercel env. Produce a sheet: secret name → current locations → which tier it belongs in.
  2. Rotate the 6 already-flagged secrets from infrastructure-overview.md §9. Record in new docs/secrets-rotation-log.md.

Phase 2 — sops bootstrap (0.5 d)

  1. Generate age keypairs:
    • age-keygen -o ~/.age/personal.key on dev machine → 1Password
    • age-keygen -o /root/.age/secrets.key on IONOS (vps-i1)
    • age-keygen -o /root/.age/secrets.key on Hostinger (vps-h1)
  2. Create .sops.yaml at repo root with recipient rules per file.
  3. Create initial secrets/shared.sops.yaml, secrets/vps-i1.sops.yaml, secrets/vps-h1.sops.yaml populated from Phase 1 inventory.
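  4. Add pre-commit hook (scripts/git-hooks/pre-commit) that checks each staged match for sops encryption (for example via sops filestatus on sops 3.9+, or by looking for the top-level sops: metadata key) and rejects unencrypted matches.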
  5. Add .github/workflows/secrets-sync.yml: on push to main touching secrets/, SSH to relevant VPS, regenerate .env, docker compose up -d.
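For item 2, the recipient rules could be laid out as below. This is a hypothetical layout, not the file shipped with the framework. In particular, which key may read which file is an assumption: the developer key reads everything, each VPS reads only its own file plus the shared one, and the GitHub Actions runner key (introduced in the Bootstrap section) reads github-actions and shared. The helper name `write_sops_config` is also invented for illustration.

```shell
# Write a .sops.yaml with one creation rule per secrets file.
# $1 = output path; age public keys: $2 dev, $3 vps-i1, $4 vps-h1, $5 GHA runner
write_sops_config() {
  cat > "$1" <<EOF
creation_rules:
  - path_regex: secrets/shared\\.sops\\.yaml\$
    age: "$2,$3,$4,$5"
  - path_regex: secrets/vps-i1\\.sops\\.yaml\$
    age: "$2,$3"
  - path_regex: secrets/vps-h1\\.sops\\.yaml\$
    age: "$2,$4"
  - path_regex: secrets/github-actions\\.sops\\.yaml\$
    age: "$2,$5"
EOF
}

# Usage (not run here), passing the four real age1... public keys:
#   write_sops_config .sops.yaml "$DEV_PUB" "$VPS_I1_PUB" "$VPS_H1_PUB" "$GHA_PUB"
```

sops applies the first rule whose path_regex matches, and the age field takes a comma-separated list of recipients.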

Phase 3 — Tier 1 / Tier 2 migrations (0.5 d)

  1. Replace P24_INFRA_GH_TOKEN PAT with a GitHub App (p24-infra-bot). Install on radieu/p24-infra and radieu/et-operational-platform. Update workflows to use actions/create-github-app-token@v2.
  2. Verify Vercel uses GitHub OIDC (no VERCEL_TOKEN in workflows). If still in use, migrate.
  3. Investigate Wasabi STS for OIDC (timebox 1 h — if not straightforward, defer and keep in Tier 3).

Phase 4 — docs + drill (0.5 d)

  1. Write docs/secrets-management.md — operator guide (how to add, rotate, audit).
  2. Write docs/secrets-rotation-log.md — template + Phase 1 entries.
  3. Update docs/runbook.md with new alerts:
    • SecretsSyncFailed — secrets-sync workflow failed
    • AgeKeyMissing — VPS reports missing decryption key
  4. Run a dry-run rotation drill: rotate one low-stakes secret (e.g. SMTP password) end-to-end. Time it (target: <5 min from sops edit to deployed).

Acceptance criteria

  • All 6 flagged secrets have new values in production (check rotation log)
  • secrets/*.sops.yaml files exist; running sops -d with the dev key produces valid yaml
  • Pre-commit hook rejects git commit -m "x" -- secrets/test.sops.yaml if file is unencrypted
  • Rotating SMTP password via sops edit + commit deploys to both VPSes within 5 minutes (CI run timestamp - commit timestamp < 5 min)
  • monitoring/.env and hostinger/.env are listed in .gitignore
  • No file matching *token*, *key*, *password* in git history except secrets/*.sops.yaml (audited via git log -p)
  • GH App p24-infra-bot installed; workflows use installation tokens (verify in workflow run logs: token starts with ghs_, not ghp_ or github_pat_)
  • docs/secrets-management.md published; docs/secrets-rotation-log.md shows ≥7 entries (6 rotations + drill)
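Three of these criteria can be checked mechanically. The helpers below are a sketch with hypothetical names: `deployed_within` covers the 5-minute target, `token_kind` the token-prefix check, and `suspicious_paths` the git-history audit (fed with path names, since file names are what the criterion matches on).

```shell
# True when the deploy finished within the limit after the commit.
# $1 = commit epoch, $2 = deploy epoch, $3 = limit in seconds (default 300)
deployed_within() {
  [ $(( $2 - $1 )) -lt "${3:-300}" ]
}

# Classify a GitHub token by prefix without printing the token itself.
# ghs_ marks server-to-server tokens (App installations, and also GITHUB_TOKEN).
token_kind() {
  case "$1" in
    ghs_*)        echo app-installation ;;
    ghp_*)        echo classic-pat ;;
    github_pat_*) echo fine-grained-pat ;;
    *)            echo unknown ;;
  esac
}

# Filter path names on stdin down to suspicious ones outside secrets/*.sops.yaml.
suspicious_paths() {
  grep -Ei 'token|key|password' | grep -Ev '^secrets/[^/]+\.sops\.yaml$'
}

# Usage (not run here):
#   git log --all --name-only --pretty=format: | sort -u | suspicious_paths
#   deployed_within "$(git log -1 --format=%ct)" "$(date +%s)" && echo "within target"
```

An empty result from the history audit means the criterion passes; any printed path needs a manual look.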

Cost impact

| Item | Cost |
|---|---|
| sops, age | free, OSS |
| GitHub App | free |
| GitHub push protection (private repos) | free |
| Additional GH Actions minutes for sync | ~2 min/push, well under free tier |

Total: 0 €/month.


Back-out plan

  1. Revert the secrets-sync workflow PR.
  2. Manually populate monitoring/.env and hostinger/.env on each VPS from the decrypted yaml (one last sops -d).
  3. Delete secrets/ directory and .sops.yaml.
  4. Re-add direct GH Secret usage in workflows that switched to the GH App.

Data loss risk: None — sops files are decryptable as long as one age key survives.

Operational risk during back-out: Services keep their current .env files — they don’t re-read on revert, so no immediate downtime. The next manual .env edit reverts to old workflow.


Risks / open questions

  • Q: sops vs Doppler/Infisical? A: sops because it’s git-native (audit trail = git log, no extra dashboard, no per-seat fees, no vendor lock-in). Doppler is nicer UX but starts at $7/user/month and we don’t have a multi-user problem.
  • Q: What about Vercel preview deployments? A: Vercel project-level env vars stay as-is; we don’t sync sops → Vercel in this spec. Add later if needed.
  • Risk: A misconfigured .sops.yaml recipient rule could leave files only decryptable by the VPS, locking out the developer. Mitigation: every PR touching .sops.yaml requires sops -d to succeed locally as a manual check.
  • Risk: Migration period — secrets exist in both old GH Secrets and new sops files. Schedule: one week of dual-source, then delete GH Secrets. Track in rotation log.
  • Q: Should we use Sealed Secrets or similar K8s-native tools? A: No — we don’t run Kubernetes and won’t for this scale.

Bootstrap

The framework lands in code via PR #58. The framework is inert until the human runs the following bootstrap once. After bootstrap, all rotation is a routine sops edit + commit.

Step 1 - Generate developer age key (local)

# Windows
mkdir C:\Users\konar\.age -ErrorAction SilentlyContinue
age-keygen -o C:\Users\konar\.age\personal.key
# Save the AGE-SECRET-KEY-1... line to 1Password under "p24-infra age - personal"
# Note the "Public key: age1..." line - needed in step 5.

Step 2 - Generate vps-i1 (IONOS) age key

ssh root@217.154.82.162 "mkdir -p /root/.age && age-keygen -o /root/.age/secrets.key && chmod 600 /root/.age/secrets.key && grep -i 'public key' /root/.age/secrets.key"
# Save the AGE-SECRET-KEY-1... contents to 1Password under "p24-infra age - vps-i1"
# (read with: ssh root@217.154.82.162 'cat /root/.age/secrets.key')

Step 3 - Generate vps-h1 (Hostinger) age key

ssh root@72.60.32.61 "mkdir -p /root/.age && age-keygen -o /root/.age/secrets.key && chmod 600 /root/.age/secrets.key && grep -i 'public key' /root/.age/secrets.key"
# Save private key contents to 1Password under "p24-infra age - vps-h1"

Step 4 - Generate GHA-runner age key (local, will go to GH Secret)

age-keygen -o C:\Users\konar\.age\gha.key
# Save AGE-SECRET-KEY-1... to 1Password as "p24-infra age - gha-runner"
# Note the public key.

Step 5 - Wire real public keys into .sops.yaml

Replace the four age1placeholder... strings in .sops.yaml with the real public keys from steps 1-4 (dev_radieu, vps_i1, vps_h1, gha_runner).
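Before moving on to step 6, it is worth confirming that no placeholder recipient survived the edit, because a file encrypted to a placeholder key cannot be decrypted by anyone. A small sketch; the function name `no_placeholders_left` is hypothetical and the match relies only on the age1placeholder prefix mentioned above.

```shell
# Fail (and show the offending lines) if any placeholder recipient remains.
no_placeholders_left() {
  [ -f "$1" ] || { echo "missing file: $1" >&2; return 2; }
  if grep -n 'age1placeholder' "$1"; then
    echo "replace the placeholder recipients above before encrypting" >&2
    return 1
  fi
}

# Usage (not run here):
#   no_placeholders_left .sops.yaml && echo "recipients look real"
```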

Step 6 - First-time encryption of template files

Each of the four secrets/*.sops.yaml files starts as a plaintext template (header # TEMPLATE - placeholder...). After step 5 they encrypt cleanly:

export SOPS_AGE_KEY_FILE='C:\Users\konar\.age\personal.key'   # quoted so bash keeps the backslashes; PowerShell: $env:SOPS_AGE_KEY_FILE = ...
for f in secrets/shared.sops.yaml secrets/vps-i1.sops.yaml secrets/vps-h1.sops.yaml secrets/github-actions.sops.yaml; do
  # In-place encrypt with current recipients from .sops.yaml
  sops -e -i "$f"
  # Then open and replace placeholders one by one
  sops edit "$f"
done

After this step the # TEMPLATE header is gone and a sops: block is present at the bottom - pre-commit accepts the file as encrypted.

Step 7 - Create the GitHub App p24-infra-bot

  1. Open https://github.com/settings/apps/new
  2. Name: p24-infra-bot
  3. Homepage URL: https://github.com/radieu/p24-infra
  4. Webhook: disable
  5. Permissions:
    • Repository -> Issues: Read & write
    • Repository -> Pull requests: Read & write
    • Repository -> Contents: Read & write
    • Repository -> Actions: Read
    • Repository -> Metadata: Read
  6. Where can this be installed: “Only on this account”
  7. Create - note the App ID (numeric, e.g. 123456)
  8. Generate a private key - download the .pem file. Store the full contents (including BEGIN/END lines) in 1Password as “p24-infra-bot - private key”.

Step 8 - Install GitHub App on both repos

From the App page -> Install App -> choose:

  • radieu/p24-infra
  • radieu/et-operational-platform

Verify under https://github.com/settings/installations.

Step 9 - Add GitHub Secrets

gh secret set P24_BOT_APP_ID --repo radieu/p24-infra --body "<APP_ID>"
gh secret set P24_BOT_PRIVATE_KEY --repo radieu/p24-infra --body "$(cat path/to/p24-infra-bot.pem)"
gh secret set AGE_KEY_GHA --repo radieu/p24-infra --body "$(cat 'C:\Users\konar\.age\gha.key')"   # path quoted so bash keeps the backslashes

Step 10 - Migrate workflow consumers from PAT to App token

The PR already added # TODO: replace with steps.app-token.outputs.token comments at every secrets.GH_PAT / secrets.GH_TOKEN consumer. For each, insert before the consuming step:

      - uses: actions/create-github-app-token@v2
        id: app-token
        with:
          app-id: ${{ secrets.P24_BOT_APP_ID }}
          private-key: ${{ secrets.P24_BOT_PRIVATE_KEY }}
          owner: radieu
          repositories: p24-infra,et-operational-platform

…then change ${{ secrets.GH_PAT }} / ${{ secrets.GH_TOKEN }} (where those refer to the PAT, not the auto-provided secrets.GITHUB_TOKEN) to ${{ steps.app-token.outputs.token }} and remove the TODO comment. Files touched by this PR with TODOs:

  • .github/workflows/health-check.yml (3 occurrences)
  • .github/workflows/provision-new-vps.yml (1 occurrence)
  • .github/workflows/update-claude-env.yml (1 occurrence)
  • .github/workflows/secrets-sync.yml (1 occurrence - sync-github-secrets job)

Validate by checking a workflow-run log - App installation tokens start with ghs_, classic PATs with ghp_, and fine-grained PATs with github_pat_.

Step 11 - Rotate the 6 flagged secrets

For each entry in docs/secrets-rotation-log.md marked “PENDING - rotate during bootstrap”:

  1. Revoke at source (provider dashboard).
  2. Generate new value.
  3. sops edit secrets/github-actions.sops.yaml (or shared for Supabase keys) -> paste new value.
  4. Commit + push.
  5. Append a new row to docs/secrets-rotation-log.md with today’s date and Confirmed in sync: yes once the workflow runs green.

Step 12 - Trigger secrets-sync manually

gh workflow run secrets-sync.yml --repo radieu/p24-infra
gh run watch

Expected: sync-vps-i1 green, sync-vps-h1 green (or still TODO depending on Hostinger user prep - see open question below), sync-github-secrets green.

Step 13 - After 24h of stable operation, delete deprecated GH Secrets

For each secret now owned by sops (SMTP_*, WASABI_*, GRAFANA_ADMIN_PASSWORD, SUPABASE_*, PDF_SERVICE_API_KEY, etc.) - open https://github.com/radieu/p24-infra/settings/secrets/actions, delete the secret, and add a row to docs/secrets-rotation-log.md noting the deletion.

Do not delete: VPS_SSH_PRIVATE_KEY, VPS_ROOT_SSH_KEY, AGE_KEY_GHA, P24_BOT_APP_ID, P24_BOT_PRIVATE_KEY - these are the roots of trust, they cannot themselves live in sops.
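If the deletions are scripted rather than clicked through, a guard against the roots of trust is cheap insurance. A sketch, assuming the gh CLI is authenticated for radieu/p24-infra; the function names and the GH override variable are hypothetical, and the list of names to delete must come from a manual review of `gh secret list`.

```shell
# Return 0 for secrets that must never be deleted (the roots of trust).
is_root_of_trust() {
  case "$1" in
    VPS_SSH_PRIVATE_KEY|VPS_ROOT_SSH_KEY|AGE_KEY_GHA|P24_BOT_APP_ID|P24_BOT_PRIVATE_KEY) return 0 ;;
    *) return 1 ;;
  esac
}

# Read secret names on stdin (one per line) and delete each unless protected.
# GH can be overridden (e.g. GH=echo for a dry run); defaults to `gh`.
delete_deprecated() {
  while IFS= read -r name; do
    [ -n "$name" ] || continue
    if is_root_of_trust "$name"; then
      echo "skip (root of trust): $name" >&2
      continue
    fi
    "${GH:-gh}" secret delete "$name" --repo radieu/p24-infra </dev/null
  done
}

# Usage (not run here):
#   gh secret list --repo radieu/p24-infra            # review first
#   printf '%s\n' GRAFANA_ADMIN_PASSWORD PDF_SERVICE_API_KEY | GH=echo delete_deprecated
```

Each deletion still needs its row in docs/secrets-rotation-log.md.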

Open questions for the human at bootstrap time

  1. vps-h1 sync user - the secrets-sync.yml sync-vps-h1 job is a TODO. Decide: (a) create a claude-admin user on vps-h1 mirroring vps-i1’s setup and use VPS_SSH_PRIVATE_KEY, or (b) use root with VPS_ROOT_SSH_KEY (simpler, less secure). Pick (a) for parity. Track in a follow-up issue.
  2. secrets-sync gh-secrets bootstrap chicken-and-egg - sync-github-secrets currently uses secrets.P24_INFRA_GH_TOKEN to call gh secret set. After step 10 it should use the App installation token. Until step 10 lands you may need a one-time manual push of the github-actions sops decryption to GH Secrets via gh secret set from your laptop.
  3. Wasabi OIDC - spec 03 mentions investigating Wasabi STS as a Tier-1 candidate. Time-box 1h; if not straightforward, leave Wasabi keys in secrets/shared.sops.yaml as Tier 3.