Restore from Backup — Standard Operation¶
Runbook for restoring a single namespace or the full bench from backup. Pairs with ADR 0017.
Goal¶
Restore bench state to a previous known-good point. Two scenarios:
- Partial: single namespace (e.g. web-agents, validator)
- Full: entire bench (DR-grade — use the DR drill runbook)
Prerequisites¶
- Admin role
- Recent backup available (
velero backup getshows recentCompleted) - Maintenance window appropriate to restore scope
Partial restore (single namespace)¶
Step 1 — List available backups for the namespace¶
velero backup get --output json | \
jq '.items[] | select(.spec.includedNamespaces | contains(["TARGET_NS"])) | {name: .metadata.name, started: .status.startTimestamp}' | \
head -10
Step 2 — Pick a backup + preview¶
velero backup describe BACKUP_NAME --details | head -50
Step 3 — Stop active workloads in the target namespace¶
kubectl scale --all deployment -n TARGET_NS --replicas=0
Step 4 — Restore¶
velero restore create restore-$(date +%Y%m%d-%H%M) \
--from-backup BACKUP_NAME \
--include-namespaces TARGET_NS \
--wait
Step 5 — Verify + scale back up¶
kubectl get pods -n TARGET_NS
kubectl scale --all deployment -n TARGET_NS --replicas=1
Full bench restore¶
See DR drill runbook — uses the same procedure but in a controlled maintenance window with operator notification + DR drill audit trail.
PostgreSQL point-in-time recovery (PITR)¶
For database-specific restore (e.g. operator dropped a table):
kubectl exec -it postgres-0 -n platform -- bash -c "
pg_ctl stop -m fast
rm -rf /var/lib/postgresql/data/*
wal-g backup-fetch /var/lib/postgresql/data \$(wal-g backup-list | tail -1 | awk '{print \$1}')
cat > /var/lib/postgresql/data/recovery.signal <<EOF
recovery_target_time = '2026-05-11 12:00:00'
recovery_target_action = 'pause'
EOF
pg_ctl start
"
After verifying the data state, promote:
kubectl exec -it postgres-0 -n platform -- pg_ctl promote
Rollback¶
If the restore turns out incorrect: 1. Take a snapshot of the post-restore state (for forensics) 2. Restore from a different backup (an earlier point)
Never kubectl apply over a partially-restored namespace —
namespaces should be wiped between restore attempts to avoid
state mixing.
Success criteria¶
- Target namespace pods running
- Smoke tests pass
- Audit log captures restore event
- No alert flap