MongoDB rs0 — Operations Guide
Replica set: rs0
Members: 3 (bms-2 PRIMARY · bms-3 SECONDARY · bms-4 ARBITER)
Version: MongoDB 7.0
Data: Pinbox24 SaaS production + staging databases
Last verified: 2026-06-17
CRITICAL — Backup Gap
No automated backup exists for MongoDB rs0. The last known manual dumps were taken on bms-1 roughly four months ago (w3-2026-02-05 and w4-2026-02-23/24, in /root on bms-1). This is the highest-priority infrastructure gap in the entire platform. Replication does not cover it: an accidental drop reaches bms-3 within seconds. Until automated backups are implemented, any data-loss event (disk failure, accidental drop, ransomware) will result in irreversible production data loss.
Action required: see 01-backups.md for the planned solution.
Architecture
┌──────────────────────────────────────────────────┐
│ rs0 replica set                                  │
├───────────┬──────────────────────────────────────┤
│ PRIMARY   │ bms-2 (ns3087638)                    │
│           │ 145.239.133.104:27017                │
│           │ priority=1, votes=1                  │
│           │ Ubuntu 24.04 · 32 GB RAM             │
├───────────┼──────────────────────────────────────┤
│ SECONDARY │ bms-3 (ns3129867)                    │
│           │ 51.68.155.224:27017                  │
│           │ priority=1, votes=1                  │
│           │ Ubuntu 22.04 · 32 GB RAM             │
│           │ ⚠️ ~21.7 GB RAM used by MongoDB      │
├───────────┼──────────────────────────────────────┤
│ ARBITER   │ bms-4 (ns3101999)                    │
│           │ 54.36.123.110:27017                  │
│           │ priority=0, votes=1                  │
│           │ Ubuntu 22.04 · no data stored        │
└───────────┴──────────────────────────────────────┘
Election quorum = 2 votes (out of 3)
bms-4 arbiter added 2026-06-10 (replaced dead arbiter at 51.83.132.99)
Key properties:
- bms-4 stores no data — it participates in elections only
- bms-2 and bms-3 both carry full data copies
- If bms-2 fails, bms-3 (+ bms-4 arbiter vote) has quorum to elect a new primary automatically
- If bms-4 fails, bms-2 + bms-3 still have 2 votes and remain operational
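The quorum rules above reduce to a majority count over all configured votes. The sketch below is a simplified model under stated assumptions, not MongoDB's election algorithm. It ignores terms and optimes, and the member list and the function names majority and canHavePrimary are illustrative only.

```javascript
// Hypothetical model of rs0's voting arithmetic; not MongoDB's election implementation.
// Members mirror the Architecture section (priority 0 = cannot become PRIMARY).
const rs0 = [
  { host: "bms-2", votes: 1, priority: 1 },
  { host: "bms-3", votes: 1, priority: 1 },
  { host: "bms-4", votes: 1, priority: 0 }, // arbiter: votes but stores no data
];

// Strict majority of all configured votes, reachable or not.
function majority(members) {
  const total = members.reduce((sum, m) => sum + m.votes, 0);
  return Math.floor(total / 2) + 1;
}

// A PRIMARY can exist only if the reachable members hold a voting majority
// and at least one of them is electable (priority > 0).
function canHavePrimary(members, upHosts) {
  const up = members.filter(m => upHosts.includes(m.host));
  const upVotes = up.reduce((sum, m) => sum + m.votes, 0);
  return upVotes >= majority(members) && up.some(m => m.priority > 0);
}

console.log(majority(rs0));                           // 2
console.log(canHavePrimary(rs0, ["bms-3", "bms-4"])); // true  (bms-2 down)
console.log(canHavePrimary(rs0, ["bms-2", "bms-4"])); // true  (bms-3 down)
console.log(canHavePrimary(rs0, ["bms-2", "bms-3"])); // true  (bms-4 down)
console.log(canHavePrimary(rs0, ["bms-3"]));          // false (bms-2 and bms-4 down)
```

The arbiter alone can never produce a PRIMARY, because its priority is 0 even though its vote counts toward the majority.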
SSH Access
| Server | Role | SSH Command |
|---|---|---|
| bms-2 | PRIMARY | ssh ubuntu@145.239.133.104 |
| bms-3 | SECONDARY | ssh ubuntu@51.68.155.224 |
| bms-4 | ARBITER | ssh root@54.36.123.110 |
SSH key: C:\Users\konar\.ssh\id_ed25519 (local) — installed on all three servers.
Claude agent on bms-2 and bms-3: ssh claude-admin@<host> (uses VPS_SSH_PRIVATE_KEY).
Day-to-Day Operations
Connect to MongoDB shell
# On any data-bearing member (bms-2 or bms-3):
mongosh --port 27017
# With authentication:
mongosh --port 27017 -u <admin-user> -p --authenticationDatabase admin
Credentials: Infisical CE → bms-servers project. Never commit passwords to git.
Check replica set status
// Quick status
rs.status()
// Compact member summary
db.adminCommand({ replSetGetStatus: 1 }).members.forEach(m =>
  print(m.name, m.stateStr, "optime:", m.optimeDate))
// Check replication lag (on PRIMARY: shows how far behind each SECONDARY is)
rs.printSecondaryReplicationInfo()
Verify which node is PRIMARY
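// From any member:
db.hello().primary   // host:port of the current PRIMARY (hello replaces the deprecated isMaster)
// or, as a raw command:
db.adminCommand({ hello: 1 })
Check replica lag manually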
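// On any member: lag = PRIMARY optime minus each SECONDARY's optime (arbiters have no optime)
var s = rs.status()
var p = s.members.find(m => m.stateStr === "PRIMARY")
s.members.filter(m => m.stateStr === "SECONDARY").forEach(m =>
  print(m.name, "lag (s):", (p.optimeDate - m.optimeDate) / 1000, "lastHeartbeatMessage:", m.lastHeartbeatMessage))
Force step-down of PRIMARY (planned maintenance)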
// Connect to current PRIMARY (bms-2 normally), then:
rs.stepDown(60)   // bms-2 is ineligible for re-election for 60 s; it first waits up to 10 s (default secondaryCatchUpPeriodSecs) for an electable secondary to catch up
After stepDown, bms-3 will be elected PRIMARY. Verify with rs.status().
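Before a planned step-down, you can confirm that bms-3 is close to the PRIMARY's optime. rs.stepDown() fails if no electable secondary catches up within its catch-up window. The following is a minimal sketch that assumes an rs.status()-shaped input. lagReport, stepDownLikelyOk and the sample values are hypothetical, and the threshold is a heuristic rather than the server's exact check.

```javascript
// Hypothetical helpers; input is an rs.status()-shaped object (only the fields used here).
// Pasted into mongosh, they should work as lagReport(rs.status()).
function lagReport(status) {
  const primary = status.members.find(m => m.stateStr === "PRIMARY");
  if (!primary) throw new Error("no PRIMARY in replica set status");
  // Only SECONDARYs are measured: arbiters have no optime, unreachable members a stale one
  return status.members
    .filter(m => m.stateStr === "SECONDARY")
    .map(m => ({ name: m.name, lagSecs: (primary.optimeDate - m.optimeDate) / 1000 }));
}

// Heuristic: the step-down is likely to succeed if some SECONDARY's optime is within
// the catch-up window (rs.stepDown's secondaryCatchUpPeriodSecs, default 10).
function stepDownLikelyOk(status, catchUpSecs = 10) {
  return lagReport(status).some(r => r.lagSecs <= catchUpSecs);
}

// Made-up sample values, trimmed to the fields the helpers read
const sample = {
  members: [
    { name: "145.239.133.104:27017", stateStr: "PRIMARY", optimeDate: new Date("2026-06-17T10:00:05Z") },
    { name: "51.68.155.224:27017", stateStr: "SECONDARY", optimeDate: new Date("2026-06-17T10:00:03Z") },
    { name: "54.36.123.110:27017", stateStr: "ARBITER" },
  ],
};
console.log(lagReport(sample));        // bms-3 is 2 s behind
console.log(stepDownLikelyOk(sample)); // true
```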
Replica Set Member Management
Add a new full voting member
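// On current PRIMARY:
rs.add({ host: "<ip>:27017", priority: 1, votes: 1 })
Add an arbiter
// On current PRIMARY:
rs.addArb("<ip>:27017")
Remove a member
// On current PRIMARY:
rs.remove("<ip>:27017")
Re-add bms-4 arbiter after outage
A member that was only down stays in the replica set config and rejoins automatically once its mongod is running again. rs.addArb() is needed only if bms-4 was removed with rs.remove().
// After bms-4 is restored and mongod is running, on PRIMARY (bms-2):
rs.addArb("54.36.123.110:27017")
Failover Scenarios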
Scenario 1: bms-2 (PRIMARY) fails
- bms-3 detects heartbeat loss after ~10s
- bms-3 requests election — bms-4 arbiter provides the second vote (quorum = 2)
- bms-3 becomes PRIMARY automatically
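- Application connections fail over to bms-3, provided the connection string lists both bms-2 and bms-3 and sets replicaSet=rs0
- Verify: ssh ubuntu@51.68.155.224, then mongosh --eval "rs.status()"
- When bms-2 comes back, it rejoins as SECONDARY automatically and syncs from the new PRIMARY. Any writes it had accepted that never reached bms-3 are rolled back and saved as rollback files under its dbPath.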
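A client can only fail over if it knows about more than one data-bearing member. The following minimal sketch builds such a connection string. It assumes admin as the authentication database, as in the mongosh login above. buildRs0Uri and the example credentials are hypothetical, and the real password should be fetched from Infisical at runtime.

```javascript
// Hypothetical helper: an rs0 URI that seeds both data-bearing members.
function buildRs0Uri(user, password, dbName = "admin") {
  // bms-2 and bms-3 only: the arbiter (bms-4) holds no data and serves no queries
  const hosts = ["145.239.133.104:27017", "51.68.155.224:27017"];
  const params = new URLSearchParams({ replicaSet: "rs0", authSource: "admin" });
  // Percent-encode credentials: characters such as @ : / % would otherwise break the URI
  const creds = `${encodeURIComponent(user)}:${encodeURIComponent(password)}`;
  return `mongodb://${creds}@${hosts.join(",")}/${dbName}?${params}`;
}

// Usage (never log the result, it contains the password):
//   const uri = buildRs0Uri("app-user", passwordFromInfisical);
```

Seeding only bms-2 and bms-3 is a choice, not a requirement. With replicaSet set, the driver discovers the rest of the set from whichever seed host answers.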
Scenario 2: bms-3 (SECONDARY) fails
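- rs0 continues operating: bms-2 (PRIMARY) + bms-4 (arbiter) = 2 votes, quorum maintained
- No automatic promotion needed
- Writes that request w: "majority" cannot be acknowledged while bms-3 is down, because the arbiter stores no data. They block until their wtimeout, or indefinitely without one.
- Monitor: when bms-3 recovers, it catches up from bms-2's oplog. If it was down longer than the oplog window (shown by rs.printReplicationInfo() on bms-2), it needs a full initial sync instead.
- Verify: run rs.status() on bms-2. The bms-3 entry shows health: 0 and stateStr "(not reachable/healthy)".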
Scenario 3: bms-4 (ARBITER) fails
- rs0 still has 2 votes: bms-2 (1) + bms-3 (1) = quorum maintained
- No immediate impact on reads or writes
- Risk: if bms-2 also fails while bms-4 is down, bms-3 cannot reach quorum (1 vote) and rs0 goes read-only
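- Restore bms-4 promptly. It stays in the replica set config, so it rejoins on its own once mongod is running. Only if it was removed with rs.remove(), re-add it on the PRIMARY:
rs.addArb("54.36.123.110:27017")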
Scenario 4: Split network / no quorum
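If fewer than 2 votes are reachable, rs0 goes read-only: no PRIMARY can be elected, so every reachable data-bearing member stays SECONDARY.
Recovery requires restoring the network. As a last resort, when the missing members cannot come back, reconfigure rs0 from a surviving member with rs.reconfig(..., {force: true}).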
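A forced reconfiguration usually starts from the current config with the unreachable members dropped. The sketch below shows only that filtering step and assumes an rs.conf()-shaped object. survivorsConfig is a hypothetical helper, and forcing a reconfig remains a manual, last-resort decision.

```javascript
// Hypothetical helper: keep only the members you can still reach, producing the
// config object to pass to rs.reconfig(cfg, { force: true }) on a surviving member.
function survivorsConfig(cfg, reachableHosts) {
  const members = cfg.members.filter(m => reachableHosts.includes(m.host));
  if (members.length === 0) throw new Error("no reachable members left in the config");
  // Shallow copy: the input config object (e.g. from rs.conf()) is not modified
  return { ...cfg, members };
}

// Example in mongosh, with only bms-3 reachable:
//   var cfg = survivorsConfig(rs.conf(), ["51.68.155.224:27017"])
//   rs.reconfig(cfg, { force: true })
```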
Maintenance — Rolling Restart
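A rolling restart applies MongoDB upgrades or config changes one member at a time. Writes pause only for the few seconds of election that follow the step-down in step 2. For a version upgrade, also restart bms-4 (arbiter) on the new binary before step 2, while bms-2 and bms-3 are both healthy.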
1. Restart SECONDARY (bms-3) first:
ssh ubuntu@51.68.155.224
sudo systemctl restart mongod
# Wait for state to return to SECONDARY: rs.status()
2. Step down PRIMARY (bms-2):
# On bms-2 primary:
mongosh --eval "rs.stepDown(60)"
# bms-3 becomes PRIMARY
3. Restart old primary (now SECONDARY, bms-2):
sudo systemctl restart mongod
# Verify it rejoins as SECONDARY
4. (Optional) Transfer primary back to bms-2:
# On bms-3 (current primary):
rs.stepDown(60)
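The ordering works because only one voting member is ever down at a time, so 2 of 3 votes stay reachable. Below is a hypothetical checker for that property. It assumes one vote per member, as in the Architecture section. checkRollingPlan and its [action, host] step format are illustrative, and the step-down itself is not modeled.

```javascript
// Hypothetical checker: one vote per member, as in the Architecture section.
const votes = { "bms-2": 1, "bms-3": 1, "bms-4": 1 };

// Walks [action, host] steps and reports the first one that leaves fewer than a
// majority of votes up (for 3 votes, at most 1 member may be down at any moment).
function checkRollingPlan(steps) {
  const total = Object.values(votes).reduce((a, b) => a + b, 0);
  const maxDown = total - (Math.floor(total / 2) + 1);
  const down = new Set();
  for (const [action, host] of steps) {
    if (action === "stop") down.add(host);
    if (action === "start") down.delete(host);
    const downVotes = [...down].reduce((sum, h) => sum + votes[h], 0);
    if (downVotes > maxDown) return `unsafe at ${action} ${host}`;
  }
  return "ok";
}

// The procedure above: bms-3 first, then bms-2 once bms-3 is back
console.log(checkRollingPlan([
  ["stop", "bms-3"], ["start", "bms-3"], ["stop", "bms-2"], ["start", "bms-2"],
])); // ok
// Overlapping restarts would lose the majority
console.log(checkRollingPlan([["stop", "bms-3"], ["stop", "bms-2"]])); // unsafe at stop bms-2
```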
KeyFile Rotation
The replica set uses a shared keyFile for inter-member authentication.
The keyFile is stored at /etc/mongodb-keyfile on each member.
Never commit the keyFile content to git. Store the value in the Infisical bms-servers project.
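Rolling rotation procedure (two passes). Replacing the key outright on one member would leave that member unable to authenticate to peers that still hold the old key:
- Generate new keyFile content (random bytes, base64-encoded)
- Store it in Infisical bms-servers → MONGO_REPLICA_SET_KEY
- Pass 1: on each member, write /etc/mongodb-keyfile as a YAML list holding the old key and the new key (supported since MongoDB 4.2). Restart mongod in this order and verify that each member rejoins before moving on: bms-3 (SECONDARY), bms-4 (ARBITER), bms-2 (PRIMARY, step down first).
- Pass 2: repeat the same order with a keyFile that holds only the new key
- Update the rotation log in docs/secrets-rotation-log.md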
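New key material can be generated with Node's built-in crypto module, matching the openssl rand -base64 756 recipe from the MongoDB documentation. During rotation, MongoDB 4.2+ lets a keyFile hold both the old and the new key as a YAML list, so members that already have the new key can still authenticate peers that have only the old one. The sketch below uses hypothetical function names. The resulting file must be owned by the mongod user and readable only by it (e.g. chmod 400).

```javascript
const crypto = require("crypto");

// Equivalent of `openssl rand -base64 756`: 756 random bytes encode to 1008 base64
// characters, inside MongoDB's 6 to 1024 character keyFile limit.
function newKey() {
  return crypto.randomBytes(756).toString("base64");
}

// Multi-key keyFile: a YAML list with one "- <key>" line per key.
function keyFileYaml(keys) {
  return keys.map(k => `- ${k}`).join("\n") + "\n";
}

// Transition file: keyFileYaml([oldKey, nextKey]); final file: keyFileYaml([nextKey]).
// oldKey and nextKey come from Infisical; never print or commit them.
const nextKey = newKey();
console.log(nextKey.length); // 1008
```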
Monitoring
| Exporter | Host | Port | Scrape target |
|---|---|---|---|
| node_exporter | bms-2 | 9100 | Host CPU/RAM/disk metrics |
| node_exporter | bms-3 | 9100 | Host CPU/RAM/disk metrics |
| mongodb_exporter | bms-2 | 9216 | MongoDB internals (opcounters, connections, replication lag) |
| mongodb_exporter | bms-3 | 9217 | MongoDB internals |
Prometheus on vps-i1 scrapes all four exporters. Grafana dashboard: MongoDB rs0 Overview.
Alerts configured:
- MongoDBReplicationLag > 30s → warning
- MongoDBMemberDown → critical
- MongoDBNoPrimary → critical (P1)
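The three alert conditions can also be expressed against rs.status() output for a manual spot check. This is a sketch only. rs0Alerts is hypothetical and mirrors the thresholds listed above rather than the actual Prometheus rules. It does not use mongodb_exporter metric names.

```javascript
// Hypothetical spot check mirroring the alert thresholds above; input is rs.status()-shaped.
function rs0Alerts(status, maxLagSecs = 30) {
  const alerts = [];
  const primary = status.members.find(m => m.stateStr === "PRIMARY");
  if (!primary) alerts.push({ name: "MongoDBNoPrimary", severity: "critical" });
  for (const m of status.members) {
    // health is 1 for reachable members and 0 for unreachable ones
    if (m.health !== 1) {
      alerts.push({ name: "MongoDBMemberDown", severity: "critical", member: m.name });
    } else if (primary && m.stateStr === "SECONDARY") {
      const lagSecs = (primary.optimeDate - m.optimeDate) / 1000;
      if (lagSecs > maxLagSecs) {
        alerts.push({ name: "MongoDBReplicationLag", severity: "warning", member: m.name, lagSecs });
      }
    }
  }
  return alerts;
}
```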
Password and Secret Management
| Secret | Location |
|---|---|
| MongoDB admin password | Infisical bms-servers project → MONGO_ADMIN_PASSWORD |
| MongoDB keyFile | Infisical bms-servers project → MONGO_REPLICA_SET_KEY |
| bms-2 root/ubuntu password | Infisical bms-servers project |
| bms-3 root/ubuntu password | Infisical bms-servers project |
Never hardcode or display these values. Fetch from Infisical at runtime.