Spec 09 — SSH hardening

Purpose

  • Root login is enabled on both VPSes with password fallback allowed (we use keys, but the door is left ajar).
  • UFW is inactive on Hostinger.
  • No fail2ban anywhere.
  • SSH ports are open to the internet 24/7 on both VPSes.

These are not screaming dangers — we use ed25519 keys, so brute-force is impractical — but each one is a free hardening win.

Depends on spec 04 (Ansible) so the changes are declared in code, not hand-applied (otherwise this spec creates drift the moment it lands).


Rulebook

  1. No password SSH auth. Anywhere. Keys only.
  2. No direct root SSH after this lands. Root access goes through claude-admin + sudo on IONOS; an equivalent claude-admin user must be created on Hostinger.
  3. fail2ban with conservative thresholds. 5 attempts → 1 h ban. Whitelist developer IP via Cloudflare-known-IP list (regenerated weekly).
  4. CF Access for SSH (stretch). Cloudflare Tunnel + Access policy means SSH port can be firewalled off entirely. Phase 2 — not blocking this spec.
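
Rule 3 maps onto a handful of fail2ban jail settings. The sketch below is only an illustration, not the role's actual template: write_sshd_jail is a hypothetical helper, the target path is whatever the caller passes, and findtime = 10m is an assumed value that the spec does not state.

```shell
# write_sshd_jail FILE [IGNORE_IP...]
# Writes a minimal sshd jail to FILE: 5 attempts -> 1 h ban (rule 3).
# findtime is an assumption; the spec only fixes maxretry and bantime.
write_sshd_jail() {
    jail_file=$1; shift
    cat > "$jail_file" <<EOF
[sshd]
enabled  = true
maxretry = 5
bantime  = 1h
findtime = 10m
ignoreip = 127.0.0.1/8 ::1 $*
EOF
}

# Example (203.0.113.7 is a documentation-range placeholder, not a real developer IP):
#   write_sshd_jail /etc/fail2ban/jail.d/sshd.local 203.0.113.7
```

In the real deployment these values would live in the Ansible role so that they stay declared in code.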

Implementation plan

Phase 1 — quick wins (0.5 d)

  1. Ansible role common:
    • PasswordAuthentication no in /etc/ssh/sshd_config
    • PermitRootLogin prohibit-password
    • Firewall enabled (UFW on vps-h1, firewalld on vps-i1) with allows for 22, 80, 443; deny all other inbound traffic
    • fail2ban with sshd jail
  2. Create claude-admin user on Hostinger (currently only on IONOS).
  3. PermitRootLogin no on Hostinger only after verifying claude-admin works.
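
One way to confirm that the sshd settings above actually took effect is to inspect the effective configuration that sshd -T prints (lowercase keywords, one per line). The helper below is a hypothetical sketch, not part of the role. It accepts without-password as well, because some OpenSSH versions print that legacy spelling for prohibit-password.

```shell
# sshd_is_hardened: reads `sshd -T`-style output on stdin.
# Returns 0 only if password auth is off and root login is "no" or key-only.
sshd_is_hardened() {
    awk '
        $1 == "passwordauthentication" { pw = $2 }
        $1 == "permitrootlogin"        { root = $2 }
        END {
            ok = (pw == "no") && (root == "no" || root == "prohibit-password" || root == "without-password")
            exit (ok ? 0 : 1)
        }
    '
}

# Example, run on the VPS:
#   sudo sshd -T | sshd_is_hardened && echo "sshd hardened"
```

A missing keyword counts as a failure, so truncated or empty input never reports success.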

Phase 2 — CF Access SSH (0.5 d, optional)

  1. Cloudflare Tunnel container on each VPS with ssh route.
  2. Cloudflare Access policy: only radieu@gmail.com can hit ssh.vps-h1.infra.zintegrowana.online.
  3. UFW: drop port 22 from public ingress.
  4. Document cloudflared access ssh --hostname ssh.vps-h1.infra... in runbook.

Acceptance criteria

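  • ssh -o PubkeyAuthentication=no -o PreferredAuthentications=password root@vps is rejected with “Permission denied (publickey)” without ever prompting for a password (the method list may include other non-password methods)
  • ufw status reports Status: active on vps-h1; firewall-cmd --state reports running on vps-i1
  • fail2ban-client status lists at least the sshd jail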
  • After 5 failed SSH attempts from a non-whitelisted IP, that IP is banned
  • Hostinger claude-admin can sudo docker ps
  • (Phase 2) nmap -p 22 217.154.82.162 from outside reports filtered (or closed), never open

Cost impact

0 €. Cloudflare Tunnel + Access are free for up to 50 users.

Back-out plan

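Revert the Ansible role change, then re-run ansible-playbook with --tags ssh-hardening,firewall,fail2ban to restore the previous settings. UFW can be switched off with sudo ufw disable. The CF Access policy can be deleted from the dashboard.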

Risk during back-out: If CF Access SSH is the only way in and we revert it wrong, we’re locked out. Mitigation: keep IONOS Cloud Console password handy; never deploy Phase 2 to both VPSes simultaneously.

Risks / open questions

  • Risk: Locking ourselves out during Phase 1. Mitigation: keep an active SSH session open during the change; verify new session works before closing the old one.
  • Q: Should we also rotate the existing SSH keys? A: Yes, but separate spec — out of scope for hardening config.

Bootstrap (post-merge deployment) — READ FULLY BEFORE STARTING

LOCKOUT RISK. If sshd config is bad and the daemon restarts, you cannot SSH back in. Mitigations are mandatory:

  1. Keep ONE active SSH session open in a separate terminal for the entire duration. Do NOT close it until you have verified a NEW session works.
  2. Always use --check --diff first. Read the diff line-by-line.
  3. Apply to vps-i1 (which has claude-admin) BEFORE vps-h1 (which doesn’t yet).

The Ansible role uses validate: 'sshd -t -f %s' on every lineinfile task, so /etc/ssh/sshd_config is never written with a config that sshd -t would reject. The restart sshd handler is still the lockout-risk step: a syntactically valid config can still lock out new logins that depended on a setting we just disabled (e.g. password auth). Sessions that are already established normally survive the restart, which is why the safety session matters.
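
The validate-before-write idea can be sketched in plain shell. This is an illustration of the pattern only, not how Ansible implements it: install_if_valid is a hypothetical helper and the backup suffix is an arbitrary choice.

```shell
# install_if_valid CANDIDATE TARGET VALIDATOR...
# Runs VALIDATOR with CANDIDATE appended as the last argument. Only if it
# succeeds is TARGET backed up and replaced; otherwise TARGET is untouched.
install_if_valid() {
    candidate=$1; target=$2; shift 2
    if ! "$@" "$candidate"; then
        echo "validation failed; $target left untouched" >&2
        return 1
    fi
    if [ -f "$target" ]; then
        cp -p "$target" "$target.$(date +%Y-%m-%d@%H:%M:%S)~"
    fi
    mv "$candidate" "$target"
}

# Example, as root on the VPS:
#   install_if_valid /tmp/sshd_config.new /etc/ssh/sshd_config sshd -t -f
```

Note that this catches only what sshd -t can detect. It says nothing about whether the new settings still let you log in.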

Step 1: vps-i1 dry-run

cd ansible
ansible-playbook playbooks/vps-i1.yml --tags ssh-hardening,firewall,fail2ban --check --diff

Read the diff. Pay special attention to:

  • The sshd_config proposed lines — does sshd -t validate them (Ansible will fail the task pre-write if not)?
  • The firewalld/UFW rules — does ssh (22/tcp) stay open?

Step 2: vps-i1 apply (with safety line)

  • Open a second SSH session as claude-admin@217.154.82.162 and leave it idle in another terminal. Confirm sudo whoami works.
  • In a third terminal, run:
ansible-playbook playbooks/vps-i1.yml --tags ssh-hardening,firewall,fail2ban --diff
  • Wait for completion. Without closing the idle safety session, open a new terminal and try a fresh SSH connection. If it works, you’re safe; close the safety session.
  • If the new connection fails: use the safety session to revert with sudo cp -p /etc/ssh/sshd_config.<pid>.<timestamp>~ /etc/ssh/sshd_config && sudo sshd -t && sudo systemctl restart sshd (the role’s backup: true keeps a timestamped backup of every change, which Ansible names <file>.<pid>.<timestamp>~).
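
Under pressure it helps not to have to guess the backup filename. The hypothetical helper below picks the most recently modified backup, assuming Ansible's usual naming where backup files sit next to the original and end in "~".

```shell
# latest_backup FILE
# Prints the most recently modified backup of FILE (FILE.*~),
# or returns 1 if no backup exists.
latest_backup() {
    newest=$(ls -t "$1".*~ 2>/dev/null | head -n 1)
    [ -n "$newest" ] && printf '%s\n' "$newest"
}

# Example, from the safety session:
#   bak=$(sudo sh -c 'ls -t /etc/ssh/sshd_config.*~ | head -n 1')
#   or, if the directory is readable: bak=$(latest_backup /etc/ssh/sshd_config)
#   sudo cp -p "$bak" /etc/ssh/sshd_config && sudo sshd -t && sudo systemctl restart sshd
```

The newest backup is the state just before the last task that touched the file. If several tasks ran, an older backup may be the one you want, so check the timestamps before copying.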

Step 3: vps-i1 → flip root login off

Only after step 2 succeeds and claude-admin is confirmed working. Set ssh_permit_root_login: "no" in ansible/host_vars/vps-i1.yml. Re-run with --check --diff, then apply with --diff, then verify that a fresh claude-admin SSH session still works. Root SSH logins will now be refused.

Step 4: vps-h1 — create claude-admin first

The claude-admin user does not exist yet on vps-h1. The claude-admin-user role provisions it. First flip claude_admin_user_enabled: true in ansible/host_vars/vps-h1.yml, then:

ansible-playbook playbooks/vps-h1.yml --tags claude-admin-user --check --diff
ansible-playbook playbooks/vps-h1.yml --tags claude-admin-user --diff

Verify: ssh claude-admin@72.60.32.61 works with the same key currently used for root.

Step 5: vps-h1 SSH hardening

Same flow as steps 1–2:

ansible-playbook playbooks/vps-h1.yml --tags ssh-hardening,firewall,fail2ban --check --diff
# safety session open in another terminal...
ansible-playbook playbooks/vps-h1.yml --tags ssh-hardening,firewall,fail2ban --diff

Step 6: vps-h1 → flip root login off

Same as step 3, but in ansible/host_vars/vps-h1.yml.

Step 7: verify fail2ban

On each VPS: sudo fail2ban-client status should list at least the sshd jail, and sudo fail2ban-client status sshd should show its failure and ban counters.

On AlmaLinux, fail2ban requires EPEL. The role degrades gracefully if it’s not installable (the install task is failed_when: false; dependent config tasks skip). If fail2ban is missing on vps-i1, enable EPEL then re-run:

ssh -t claude-admin@217.154.82.162 'sudo dnf install -y epel-release'   # root SSH is already disabled after step 3
ansible-playbook playbooks/vps-i1.yml --tags fail2ban --diff

Step 8: verify UFW / firewalld

  • vps-h1: sudo ufw status reports Status: active, with ALLOW rules for 22/80/443
  • vps-i1: sudo firewall-cmd --list-all — public zone with ssh/http/https
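
For vps-h1 the check can be scripted by parsing the ufw status text. The helper below is a hypothetical sketch and assumes rules are listed by port number (22 or 22/tcp); rules added by application name, such as OpenSSH, would not be recognised.

```shell
# ufw_allows PORT...
# Reads `ufw status` output on stdin. Returns 0 only if the firewall is
# active and every PORT has an ALLOW rule (listed as PORT or PORT/tcp).
ufw_allows() {
    awk -v ports="$*" '
        /^Status: active/ { active = 1 }
        $2 == "ALLOW"     { sub(/\/tcp$/, "", $1); allowed[$1] = 1 }
        END {
            if (!active) exit 1
            n = split(ports, want, " ")
            for (i = 1; i <= n; i++) if (!(want[i] in allowed)) exit 1
        }
    '
}

# Example, run on vps-h1:
#   sudo ufw status | ufw_allows 22 80 443 && echo "firewall OK"
```

It checks only that the expected ports are allowed, not that everything else is denied; the default policy still needs a look via sudo ufw status verbose.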

Step 9: deliberate brute-force test (optional)

From a different, non-whitelisted IP, make 6 failing attempts with ssh -o PreferredAuthentications=password -o PubkeyAuthentication=no root@<vps>. The ban triggers after the 5th failure, so the 6th connection should be refused or time out. Then sudo fail2ban-client status sshd should show the IP in the Banned IP list. Remove the ban afterwards: sudo fail2ban-client unban <ip>.
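
A small loop makes the test repeatable. The sketch below is only an illustration: try_logins is a hypothetical helper, and the ConnectTimeout and BatchMode options in the usage comment are suggestions rather than part of this spec. Whether every rejected attempt counts toward the threshold depends on how the fail2ban sshd filter is configured, so treat the attempt count as approximate.

```shell
# try_logins COUNT CMD...
# Runs CMD COUNT times with stdin detached and output discarded,
# printing the exit status of each attempt.
try_logins() {
    count=$1; shift
    i=1
    while [ "$i" -le "$count" ]; do
        status=0
        "$@" </dev/null >/dev/null 2>&1 || status=$?
        echo "attempt $i: exit $status"
        i=$((i + 1))
    done
}

# Example, from the test machine (ssh exits 255 on a failed connection or login):
#   try_logins 6 ssh -o PreferredAuthentications=password -o PubkeyAuthentication=no \
#       -o BatchMode=yes -o ConnectTimeout=5 root@<vps>
```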

Phase 2 (CF Access SSH tunnel) — separate spec follow-up

Out of scope for #64. To track: open a new issue “spec 09 Phase 2 — CF Access SSH tunnel for both VPSes” once Phase 1 is verified stable in production.