Chaos — OBP Session Loss Mid-Operation¶
Failure injection runbook: simulate operator notebook WiFi drop mid-CLONER refresh + validate graceful error handling.
Goal¶
Confirm that an OBP session loss during a CLONER fn fetch (Internet egress) surfaces a clean error to the operator without corrupting bench state.
Prerequisites¶
- Test bench (not production)
- Operator notebook with OBP installed + authorized
- An active CLONER fn running (e.g. catalog refresh fn #4)
- Operator role with chaos permissions
Procedure¶
- Authorize an OBP session via dashboard
/admin/obp/authorize - Start a CLONER catalog refresh:
curl -X POST https://dashboard.test-bench/api/cloner/refresh-catalog -
Mid-fetch (within 10s), kill the OBP daemon on the notebook:
# macOS killall obp-daemon # Linux sudo systemctl stop obp -
Watch for bench reaction (target: < 5s detection):
watch -n 1 'curl -fsS https://dashboard.test-bench/api/cloner/status' -
Expected behavior:
- CLONER egress request returns clean error (no hang)
- Dashboard shows "OBP session lost" with retry option
- No partial catalog write to local cache
- Audit log captures the truncated session
- Alert fires (
OBPSessionLoss)
Rollback¶
-
Restart OBP daemon:
# macOS open /Applications/OBP\ Daemon.app # Linux sudo systemctl start obp -
Re-authorize via dashboard
- Retry the CLONER operation (idempotent)
Success criteria¶
- OBP loss detected within 5s
- Clean error returned to operator
- No partial state written
- Alert fired + audit logged
- Retry after restart works idempotently