Spec 03 — Secrets management
Purpose
Today the same secret is typically stored in 4 places: .env.local (dev machine), monitoring/.env (VPS), GitHub Secrets, Vercel env. There is no source of truth, no rotation audit log, and docs/infrastructure-overview.md explicitly flags 6 credentials that need rotating after a 2026-05-06 LLM session. Manual rotation across 4 places is error-prone — secrets get rotated in one place and forgotten in another, which has caused at least two prod incidents this year.
This spec eliminates manual fan-out and eliminates manual rotation entirely for secrets that support short-lived/auto-rotating credentials.
Answer: how to organise key management to minimise manual rotation
The strategy is a three-tier hierarchy, ordered by preference:
Tier 1: OIDC / workload identity ← ZERO manual rotation
Tier 2: Auto-rotating tokens ← manual ONLY when subscription/account changes
Tier 3: sops-encrypted static secrets ← manual rotation, but ONE place, audited via git
Rule of thumb: Move every secret as high up this hierarchy as the upstream system supports.
Tier 1 — OIDC (no secret stored anywhere)
GitHub Actions → cloud provider auth via short-lived OIDC tokens. We already use this implicitly for Vercel (GitHub → Vercel integration). Extend to:
| Use case | Replaces | Mechanism |
|---|---|---|
| GH Actions → Vercel deploys | VERCEL_TOKEN | Vercel GitHub integration (already in place) — verify |
| GH Actions → Cloudflare DNS edits | CLOUDFLARE_TOKEN_ZINTEGROWANA (future) | Cloudflare’s GitHub OIDC provider |
| GH Actions → Wasabi (backups, Thanos) | WASABI_* | Wasabi supports AWS-compatible STS with OIDC — investigate |
| GH Actions → Supabase | SUPABASE_ACCESS_TOKEN | Supabase doesn’t support OIDC yet — keep in Tier 3 |
Manual rotation needed: Never. Tokens are issued per-run, valid for minutes.
Tier 2 — Auto-rotating credentials
| Secret | Auto-rotation mechanism |
|---|---|
| Claude Code OAuth (accessToken) | Already auto-refreshes silently every 8–12 h. Refresh token persists for months. No action needed. |
| GitHub fine-grained PAT for claude-runner | Use GitHub App installation token instead, auto-rotated hourly via actions/create-github-app-token. Replaces P24_INFRA_GH_TOKEN. |
| n8n internal credentials (Supabase, Trello, Gmail) | n8n stores these encrypted with N8N_ENCRYPTION_KEY. Workflows back up via spec 14 (encrypted JSON in git). The encryption key itself is Tier 3. |
| SSH keys (claude-admin, root) | Manual but rare. Rotation = regenerate keypair + update GH Secret + push to VPS via Ansible (spec 04). |
Manual rotation needed: Only when an account is compromised, or for quarterly hygiene rotation. Drops from “4 places” to “1 place + redeploy”.
Tier 3 — sops-encrypted static secrets
For the remaining secrets that have no auto-rotation upstream, we use Mozilla sops + age:
- Secrets live in secrets/*.sops.yaml files inside the git repo, encrypted.
- Each VPS holds an age private key at /root/.age/secrets.key (mode 600). Only holders of a key listed as a recipient can decrypt.
- Developer machines hold a personal age key, kept in 1Password. Each file has multiple recipients (VPS keys + developer keys).
- Rotation = update the plaintext, re-encrypt, commit, redeploy. One place. One PR. Audited.
secrets/
├── shared.sops.yaml # Discord webhooks, SMTP, Wasabi
├── vps-i1.sops.yaml # IONOS-specific (Grafana admin pw, Supabase grafana pw)
├── vps-h1.sops.yaml # Hostinger-specific (n8n encryption key, WAHA api key)
├── github-actions.sops.yaml # secrets sync'd into GitHub via gh CLI
└── .sops.yaml # recipient rules per file
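A pre-commit hook rejects any file matching *.env* or secrets/*.yaml without sops encryption, so a plaintext secret cannot be committed by accident. The hook runs client-side and can be skipped with git commit --no-verify, so it guards against mistakes, not against deliberate bypass.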
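The check described above could be sketched roughly as follows. This is a minimal illustration, not the hook shipped in the repo: the function names are hypothetical, and the encryption test assumes that a sops-encrypted YAML file carries a top-level `sops:` key (and that dotenv output carries `sops_version=`).

```shell
#!/usr/bin/env bash
# Sketch of a pre-commit check; all function names are hypothetical.

# True when the path is one the hook guards: *.env* anywhere, or secrets/*.yaml.
is_guarded_path() {
  case "$1" in
    *.env*|secrets/*.yaml) return 0 ;;
    *) return 1 ;;
  esac
}

# True when the file carries sops metadata.
# Assumption: "sops:" at top level in YAML, "sops_version=" in dotenv files.
is_sops_encrypted() {
  grep -Eq '^sops(:|_version=)' "$1"
}

# Reads staged paths on stdin, reports offenders on stderr,
# and returns 1 if any guarded file is not encrypted.
# Note: this inspects the working-tree copy, not the staged blob.
check_staged_paths() {
  local path status=0
  while IFS= read -r path; do
    if is_guarded_path "$path" && ! is_sops_encrypted "$path"; then
      echo "plaintext secret file staged: $path" >&2
      status=1
    fi
  done
  return "$status"
}

# In the hook itself the list would come from git, for example:
#   git diff --cached --name-only --diff-filter=ACM | check_staged_paths
```

Matching on file paths rather than file contents keeps the check fast, at the cost of missing secrets pasted into unrelated files.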
What this looks like in practice
Adding a new secret (e.g. PDF service API key):
sops edit secrets/shared.sops.yaml # opens $EDITOR with decrypted content
# add: pdf_service_api_key: pdfk_abc123
# save, exit: sops re-encrypts before write
git add secrets/shared.sops.yaml
git commit -m "feat(secrets): add pdf_service_api_key"
# next deploy picks it up automatically
Rotating a secret (e.g. Grafana admin password):
sops edit secrets/vps-i1.sops.yaml # change the value
git add secrets/vps-i1.sops.yaml
git commit -m "fix(secrets): rotate grafana_admin_password"
# once the commit reaches main, CI workflow `secrets-sync.yml` automatically:
# 1. ssh to vps-i1, regenerates monitoring/.env from decrypted yaml
# 2. docker compose up -d (picks up new env)
# 3. comments on the commit with rotation timestamp
Onboarding a new dev to decrypt secrets:
- Dev generates a keypair with age-keygen → public key.
- Add public key to .sops.yaml recipients list.
- sops updatekeys secrets/*.sops.yaml re-encrypts all files for the new recipient.
- Dev pulls and can now decrypt locally.
What we don’t try to automate
- Anthropic / Claude Max subscription renewal — billing event, human-only.
- Supabase service role key rotation: Supabase doesn’t expose this via API. Quarterly manual rotation, tracked in docs/secrets-rotation-log.md.
- Wasabi account credentials: root keys never used; we use scoped IAM users created once.
These 3 are quarterly + on-incident. Everything else is automated or eliminated.
Rulebook (operating rules)
- Plaintext secrets never enter git. The pre-commit hook enforces this locally; GitHub push protection adds a server-side check where the plan includes it (verify: on private repos it normally requires a paid GitHub Advanced Security / Secret Protection licence). Violation = revert + rotate.
- One secret, one source. sops files are the source of truth. GH Secrets and VPS .env files are derived, regenerated by CI. Never edit the derived form directly.
- .env.local is for personal dev overrides only. Anything sharable belongs in secrets/*.sops.yaml.
- Rotation log. Every rotation appends a row to docs/secrets-rotation-log.md (date, secret name, reason, rotator). Audit trail for compliance + incident response.
- Treat LLM exposure as breach. Any secret echoed in a Claude Code session (even truncated) gets rotated within 24 h, no exceptions.
- No secret in CI logs. set +x before commands handling secrets. Echo only sha256sum | head -c 8 for diagnostics.
- Age private keys are physical artefacts. Stored in 1Password (primary), VPS (/root/.age/), and on the developer’s encrypted drive. If two of three are lost, regenerate and re-encrypt all sops files.
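The "no secret in CI logs" rule can be illustrated with a small helper. This is a sketch only: the function name and the placeholder value are hypothetical, and it assumes GNU coreutils (`sha256sum`) on the runner.

```shell
#!/usr/bin/env bash
# Print the first 8 hex characters of a secret's SHA-256 digest.
# Enough to tell "did the value change?" without revealing the value.
secret_fingerprint() {
  printf '%s' "$1" | sha256sum | head -c 8
}

set +x   # make sure command tracing is off before the secret is touched
SMTP_PASSWORD="example-not-a-real-secret"   # placeholder for illustration
echo "SMTP_PASSWORD fingerprint: $(secret_fingerprint "$SMTP_PASSWORD")"
```

Comparing the fingerprint before and after a rotation confirms that the deployed value changed without the value itself ever appearing in a log.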
Architecture
         ┌──────────────────────────────────────┐
         │ secrets/*.sops.yaml (in git repo)    │
         │ encrypted with age, multi-recipient  │
         └──────────────────────────────────────┘
                             │ (anyone can pull)
                             │
       ┌─────────────────────┼─────────────────────┐
       │                     │                     │
       ▼                     ▼                     ▼
  ┌──────────┐        ┌──────────────┐      ┌──────────────┐
  │ vps-i1   │        │ vps-h1       │      │ GitHub       │
  │ /root/   │        │ /root/       │      │ Actions      │
  │ .age/key │        │ .age/key     │      │ + AGE_KEY    │
  └────┬─────┘        └──────┬───────┘      │ secret       │
       │                     │              └──────┬───────┘
       ▼                     ▼                     ▼
monitoring/.env       hostinger/.env       gh secret set ...
       │                     │                     │
       ▼                     ▼                     ▼
docker compose        docker compose        Vercel/n8n/etc.
       │                     │
(auto sync via cron+git pull or PreToolUse hook)
Tier 1 (OIDC) flows bypass this entirely:
GitHub Actions ──(short-lived JWT)──► Vercel / Cloudflare / Wasabi-STS
(no secret stored anywhere — Tier 1)
Implementation plan
Phase 1 — inventory + rotation (0.5 d)
- Audit every value currently in .env.local, monitoring/.env, GH Secrets, Vercel env. Produce a sheet: secret name → current locations → which tier it belongs in.
- Rotate the 6 already-flagged secrets from infrastructure-overview.md §9. Record in new docs/secrets-rotation-log.md.
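For the file-based locations, the audit could start from a helper that lists secret names without ever printing values. The sketch below is an assumption about how one might do it; `env_keys` and `inventory` are hypothetical names. Names held in GH Secrets would have to be collected separately, for example with `gh secret list`.

```shell
#!/usr/bin/env bash
# Print the variable names (never the values) defined in a dotenv-style file.
env_keys() {
  sed -n -E 's/^[[:space:]]*(export[[:space:]]+)?([A-Za-z_][A-Za-z0-9_]*)=.*/\2/p' "$1" | sort -u
}

# Print "NAME<TAB>file" for every name found in each file given,
# sorted so that duplicates across files sit next to each other.
inventory() {
  local file key
  for file in "$@"; do
    env_keys "$file" | while IFS= read -r key; do
      printf '%s\t%s\n' "$key" "$file"
    done
  done | sort
}

# Example (paths as named in this spec):
#   inventory .env.local monitoring/.env
```

The tab-separated output pastes directly into a spreadsheet, where the tier column can then be filled in by hand.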
Phase 2 — sops bootstrap (0.5 d)
- Generate age keypairs:
  - age-keygen -o ~/.age/personal.key on dev machine → 1Password
  - age-keygen -o /root/.age/secrets.key on IONOS (vps-i1)
  - age-keygen -o /root/.age/secrets.key on Hostinger (vps-h1)
- Create .sops.yaml at repo root with recipient rules per file.
- Create initial secrets/shared.sops.yaml, secrets/vps-i1.sops.yaml, secrets/vps-h1.sops.yaml populated from Phase 1 inventory.
- Add pre-commit hook (scripts/git-hooks/pre-commit) that checks staged matches for the sops: metadata block and rejects unencrypted ones.
- Add .github/workflows/secrets-sync.yml: on push to main touching secrets/, SSH to relevant VPS, regenerate .env, docker compose up -d.
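The "regenerate .env" step could look roughly like the sketch below. It assumes the decrypted sops files are flat `key: value` maps and that environment variable names are the upper-cased YAML keys; both are assumptions, and `yaml_to_env` is a hypothetical helper. sops can also emit dotenv directly via `--output-type dotenv`, which may make a converter unnecessary.

```shell
#!/usr/bin/env bash
# Convert flat "key: value" YAML on stdin to KEY=value lines on stdout.
# Comments, blank lines, and nested or indented entries are skipped.
yaml_to_env() {
  awk '
    /^[ \t]*#/ { next }
    /^[ \t]*$/ { next }
    /^[A-Za-z_][A-Za-z0-9_]*:[ \t]/ {
      key = toupper(substr($0, 1, index($0, ":") - 1))
      val = substr($0, index($0, ":") + 1)
      sub(/^[ \t]+/, "", val)
      sub(/[ \t]+$/, "", val)
      # strip one pair of surrounding double quotes, if present
      if (val ~ /^".*"$/) val = substr(val, 2, length(val) - 2)
      print key "=" val
    }
  '
}

# On the VPS the sync job might then run something like:
#   sops -d secrets/vps-i1.sops.yaml | yaml_to_env > monitoring/.env
```

Writing to a temporary file and moving it into place would avoid leaving a half-written .env behind if decryption fails partway.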
Phase 3 — Tier 1 / Tier 2 migrations (0.5 d)
- Replace P24_INFRA_GH_TOKEN PAT with a GitHub App (p24-infra-bot). Install on radieu/p24-infra and radieu/et-operational-platform. Update workflows to use actions/create-github-app-token@v2.
- Verify Vercel uses GitHub OIDC (no VERCEL_TOKEN in workflows). If still in use, migrate.
- Investigate Wasabi STS for OIDC (timebox 1 h; if not straightforward, defer and keep in Tier 3).
Phase 4 — docs + drill (0.5 d)
- Write docs/secrets-management.md: operator guide (how to add, rotate, audit).
- Write docs/secrets-rotation-log.md: template + Phase 1 entries.
- Update docs/runbook.md with new alerts:
  - SecretsSyncFailed: secrets-sync workflow failed
  - AgeKeyMissing: VPS reports missing decryption key
- Run a dry-run rotation drill: rotate one low-stakes secret (e.g. SMTP password) end-to-end. Time it (target: <5 min from sops edit to deployed).
Acceptance criteria
- All 6 flagged secrets have new values in production (check rotation log)
- secrets/*.sops.yaml files exist; running sops -d with the dev key produces valid yaml
- Pre-commit hook rejects git commit -m "x" -- secrets/test.sops.yaml if file is unencrypted
- Rotating SMTP password via sops edit + commit deploys to both VPSes within 5 minutes (CI run timestamp - commit timestamp < 5 min)
- monitoring/.env and hostinger/.env are listed in .gitignore
- No file matching *token*, *key*, *password* in git history except secrets/*.sops.yaml (audited via git log -p)
- GH App p24-infra-bot installed; workflows use installation tokens (verify in workflow run logs: token starts with ghs_ not ghp_)
- docs/secrets-management.md published; docs/secrets-rotation-log.md shows ≥7 entries (6 rotations + drill)
Cost impact
| Item | Cost |
|---|---|
| sops, age | free, OSS |
| GitHub App | free |
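| GitHub push protection (private repos) | verify: free on public repos; private repos normally need a paid GitHub Advanced Security / Secret Protection licence |
| Additional GH Actions minutes for sync | ~2 min/push, well under free tier |
Total: 0 €/month, provided push protection is either already covered by the plan or left out; the pre-commit hook does not depend on it.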
Back-out plan
- Revert the secrets-sync workflow PR.
- Manually populate monitoring/.env and hostinger/.env on each VPS from the decrypted yaml (one last sops -d).
- Delete secrets/ directory and .sops.yaml.
- Re-add direct GH Secret usage in workflows that switched to the GH App.
Data loss risk: None — sops files are decryptable as long as one age key survives.
Operational risk during back-out: Services keep their current .env files — they don’t re-read on revert, so no immediate downtime. The next manual .env edit reverts to old workflow.
Risks / open questions
- Q: sops vs Doppler/Infisical? A: sops because it’s git-native (audit trail = git log, no extra dashboard, no per-seat fees, no vendor lock-in). Doppler is nicer UX but starts at $7/user/month and we don’t have a multi-user problem.
- Q: What about Vercel preview deployments? A: Vercel project-level env vars stay as-is; we don’t sync sops → Vercel in this spec. Add later if needed.
- Risk: A misconfigured .sops.yaml recipient rule could leave files only decryptable by the VPS, locking out the developer. Mitigation: every PR touching .sops.yaml requires sops -d to succeed locally as a manual check.
- Risk: During the migration period, secrets exist in both old GH Secrets and new sops files. Schedule: one week of dual-source, then delete GH Secrets. Track in rotation log.
- Q: Should we use Sealed Secrets or similar K8s-native tools? A: No — we don’t run Kubernetes and won’t for this scale.
Bootstrap
The framework lands in code via PR #58. The framework is inert until the human runs the following bootstrap once. After bootstrap, all rotation is a routine sops edit + commit.
Step 1 - Generate developer age key (local)
# Windows
mkdir C:\Users\konar\.age -ErrorAction SilentlyContinue
age-keygen -o C:\Users\konar\.age\personal.key
# Save the AGE-SECRET-KEY-1... line to 1Password under "p24-infra age - personal"
# Note the "Public key: age1..." line - needed in step 5.
Step 2 - Generate vps-i1 (IONOS) age key
ssh root@217.154.82.162 "mkdir -p /root/.age && age-keygen -o /root/.age/secrets.key && chmod 600 /root/.age/secrets.key && grep -i 'public key' /root/.age/secrets.key"
# Save the AGE-SECRET-KEY-1... contents to 1Password under "p24-infra age - vps-i1"
# (read with: ssh root@217.154.82.162 'cat /root/.age/secrets.key')
Step 3 - Generate vps-h1 (Hostinger) age key
ssh root@72.60.32.61 "mkdir -p /root/.age && age-keygen -o /root/.age/secrets.key && chmod 600 /root/.age/secrets.key && grep -i 'public key' /root/.age/secrets.key"
# Save private key contents to 1Password under "p24-infra age - vps-h1"
Step 4 - Generate GHA-runner age key (local, will go to GH Secret)
age-keygen -o C:\Users\konar\.age\gha.key
# Save AGE-SECRET-KEY-1... to 1Password as "p24-infra age - gha-runner"
# Note the public key.
Step 5 - Wire real public keys into .sops.yaml
Replace the four age1placeholder... strings in .sops.yaml with the real public keys from steps 1-4 (dev_radieu, vps_i1, vps_h1, gha_runner).
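If a public key from steps 1-4 was not noted down, it can be read back from the key file. The sketch below relies on the `# public key:` comment line that `age-keygen -o` writes into the file; `age_public_key` is a hypothetical helper name. `age-keygen -y <file>` derives the same value from the private key and is the more robust option when the comment line is missing.

```shell
#!/usr/bin/env bash
# Print the public key recorded in an age identity file.
# Assumes the file was produced by `age-keygen -o`, which stores the
# public key in a "# public key: age1..." comment line.
age_public_key() {
  sed -n 's/^# public key: //p' "$1"
}

# Example (path from step 1; quote Windows paths in bash):
#   age_public_key 'C:\Users\konar\.age\personal.key'
```

Only the public key is printed, so the output is safe to paste into .sops.yaml or a PR description.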
Step 6 - First-time encryption of template files
Each of the four secrets/*.sops.yaml files starts as a plaintext template (header # TEMPLATE - placeholder...). After step 5 they encrypt cleanly:
export SOPS_AGE_KEY_FILE='C:\Users\konar\.age\personal.key' # quoted so bash keeps the backslashes; PowerShell: $env:SOPS_AGE_KEY_FILE = ...
for f in secrets/shared.sops.yaml secrets/vps-i1.sops.yaml secrets/vps-h1.sops.yaml secrets/github-actions.sops.yaml; do
  # In-place encrypt with current recipients from .sops.yaml
  sops -e -i "$f"
  # Then open and replace placeholders one by one
  sops edit "$f"
done
After this step the # TEMPLATE header is gone and a sops: block is present at the bottom - pre-commit accepts the file as encrypted.
Step 7 - Create the GitHub App p24-infra-bot
- Open https://github.com/settings/apps/new
- Name: p24-infra-bot
- Homepage URL: https://github.com/radieu/p24-infra
- Webhook: disable
- Permissions:
  - Repository -> Issues: Read & write
  - Repository -> Pull requests: Read & write
  - Repository -> Contents: Read & write
  - Repository -> Actions: Read
  - Repository -> Metadata: Read
  - Repository -> Secrets: Read & write (needed only if sync-github-secrets is to call gh secret set with the App token)
- Where can this be installed: “Only on this account”
- Create - note the App ID (numeric, e.g. 123456)
- Generate a private key - download the .pem file. Store the full contents (including BEGIN/END lines) in 1Password as “p24-infra-bot - private key”.
Step 8 - Install GitHub App on both repos
From the App page -> Install App -> choose:
radieu/p24-infraradieu/et-operational-platform
Verify under https://github.com/settings/installations.
Step 9 - Add GitHub Secrets
gh secret set P24_BOT_APP_ID --repo radieu/p24-infra --body "<APP_ID>"
gh secret set P24_BOT_PRIVATE_KEY --repo radieu/p24-infra --body "$(cat path/to/p24-infra-bot.pem)"
gh secret set AGE_KEY_GHA --repo radieu/p24-infra --body "$(cat 'C:\Users\konar\.age\gha.key')"
Step 10 - Migrate workflow consumers from PAT to App token
The PR already added # TODO: replace with steps.app-token.outputs.token comments at every secrets.GH_PAT / secrets.GH_TOKEN consumer. For each, insert before the consuming step:
- uses: actions/create-github-app-token@v2
  id: app-token
  with:
    app-id: ${{ secrets.P24_BOT_APP_ID }}
    private-key: ${{ secrets.P24_BOT_PRIVATE_KEY }}
    owner: radieu
    repositories: p24-infra,et-operational-platform
…then change ${{ secrets.GH_PAT }} / ${{ secrets.GH_TOKEN }} (where those refer to the PAT, not the auto-provided secrets.GITHUB_TOKEN) to ${{ steps.app-token.outputs.token }} and remove the TODO comment. Files touched by this PR with TODOs:
- .github/workflows/health-check.yml (3 occurrences)
- .github/workflows/provision-new-vps.yml (1 occurrence)
- .github/workflows/update-claude-env.yml (1 occurrence)
- .github/workflows/secrets-sync.yml (1 occurrence - sync-github-secrets job)
Validate by checking a workflow-run log - App tokens start with ghs_, PATs with ghp_.
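The prefix check can be written as a small classifier. This is an illustrative sketch with a hypothetical function name. Two points the prefixes alone do not settle: fine-grained PATs start with `github_pat_` rather than `ghp_`, and the auto-provided GITHUB_TOKEN is itself an installation token, so it also starts with `ghs_`.

```shell
#!/usr/bin/env bash
# Classify a GitHub token by its prefix. Pass only the prefix or a
# truncated token when calling this from CI; never log the full value.
token_kind() {
  case "$1" in
    ghs_*)        echo "app-installation-token" ;;
    ghp_*)        echo "classic-pat" ;;
    github_pat_*) echo "fine-grained-pat" ;;
    *)            echo "unknown" ;;
  esac
}

token_kind "ghs_"          # prints app-installation-token
token_kind "github_pat_"   # prints fine-grained-pat
```

Because GITHUB_TOKEN shares the `ghs_` prefix, a `ghs_` result shows that no PAT is in use, not that the App token specifically was used.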
Step 11 - Rotate the 6 flagged secrets
For each entry in docs/secrets-rotation-log.md marked “PENDING - rotate during bootstrap”:
- Revoke at source (provider dashboard).
- Generate new value.
- sops edit secrets/github-actions.sops.yaml (or shared for Supabase keys) -> paste new value.
- Commit + push.
- Append a new row to docs/secrets-rotation-log.md with today’s date and Confirmed in sync: yes once the workflow runs green.
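Appending the log row can be scripted so that every entry has the same shape. The layout below (a Markdown table with date, secret name, reason, rotator, and a "confirmed in sync" column) is an assumption about docs/secrets-rotation-log.md, not its confirmed format, and `rotation_log_row` is a hypothetical helper.

```shell
#!/usr/bin/env bash
# Print one Markdown table row for the rotation log.
# Usage: rotation_log_row NAME REASON ROTATOR [CONFIRMED] [DATE]
# DATE defaults to today's UTC date (YYYY-MM-DD); CONFIRMED defaults to "no".
rotation_log_row() {
  local name="$1" reason="$2" rotator="$3"
  local confirmed="${4:-no}"
  local date="${5:-$(date -u +%F)}"
  printf '| %s | %s | %s | %s | %s |\n' "$date" "$name" "$reason" "$rotator" "$confirmed"
}

# Example, once the sync workflow has run green:
#   rotation_log_row grafana_admin_password "flagged after LLM session" radieu yes \
#     >> docs/secrets-rotation-log.md
```

Only the secret's name goes into the log; the value itself must never be passed to this function.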
Step 12 - Trigger secrets-sync manually
gh workflow run secrets-sync.yml --repo radieu/p24-infra
gh run watch
Expected: sync-vps-i1 green, sync-vps-h1 green (or still TODO depending on Hostinger user prep - see open question below), sync-github-secrets green.
Step 13 - After 24h of stable operation, delete deprecated GH Secrets
For each secret now owned by sops (SMTP_*, WASABI_*, GRAFANA_ADMIN_PASSWORD, SUPABASE_*, PDF_SERVICE_API_KEY, etc.) - open https://github.com/radieu/p24-infra/settings/secrets/actions, delete the secret, and add a row to docs/secrets-rotation-log.md noting the deletion.
Do not delete: VPS_SSH_PRIVATE_KEY, VPS_ROOT_SSH_KEY, AGE_KEY_GHA, P24_BOT_APP_ID, P24_BOT_PRIVATE_KEY - these are the roots of trust, they cannot themselves live in sops.
Open questions for the human at bootstrap time
- vps-h1 sync user - the secrets-sync.yml sync-vps-h1 job is a TODO. Decide: (a) create a claude-admin user on vps-h1 mirroring vps-i1’s setup and use VPS_SSH_PRIVATE_KEY, or (b) use root with VPS_ROOT_SSH_KEY (simpler, less secure). Pick (a) for parity. Track in a follow-up issue.
- secrets-sync gh-secrets bootstrap chicken-and-egg - sync-github-secrets currently uses secrets.P24_INFRA_GH_TOKEN to call gh secret set. After step 10 it should use the App installation token. Until step 10 lands you may need a one-time manual push of the github-actions sops decryption to GH Secrets via gh secret set from your laptop.
- Wasabi OIDC - spec 03 mentions investigating Wasabi STS as a Tier-1 candidate. Time-box 1h; if not straightforward, leave Wasabi keys in secrets/shared.sops.yaml as Tier 3.