Chaos: BGP Withdraw Mid-Test
Failure-injection runbook: simulate a BGP session reset on one peer during a live test and validate how the bench reacts.
Goal
Confirm that an unexpected BGP withdraw during a running BGP saturation test doesn't crash the bench or leak partial routes.
Prerequisites
- Test bench (not production)
- BGP saturation test running (any `route_count_mode`)
- Operator role with chaos permissions (`team-lead` or `admin`)
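The first prerequisite can be enforced with a small guard run before any injection. This is a minimal sketch assuming bench kubectl contexts never contain `prod` (a hypothetical naming rule; adapt the pattern to your clusters). `is_bench_context` and `preflight` are illustrative names; `kubectl config current-context` is a standard kubectl command.

```bash
#!/usr/bin/env bash
# Pre-flight guard for the chaos runbook (illustrative sketch).
# ASSUMPTION (hypothetical naming rule): production contexts contain "prod",
# bench contexts do not. Adjust the pattern to match your clusters.

# is_bench_context CONTEXT: exit 0 if CONTEXT looks like a bench, 1 otherwise.
is_bench_context() {
  ctx="$1"
  # An empty context means kubectl is not configured; treat it as unsafe.
  [ -n "$ctx" ] || return 1
  case "$ctx" in
    *[Pp][Rr][Oo][Dd]*) return 1 ;;  # any casing of "prod" is refused
    *) return 0 ;;
  esac
}

# preflight: check the live kubectl context before injecting anything.
preflight() {
  current="$(kubectl config current-context 2>/dev/null)" || current=""
  if ! is_bench_context "$current"; then
    echo "refusing to run: context '${current:-<none>}' does not look like the test bench" >&2
    return 1
  fi
  echo "preflight ok: context '$current'"
}
```

Call `preflight || exit 1` at the top of any script that wraps this procedure. It only checks the context name; it does not verify the operator's chaos role.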
Procedure
1. Note the current test state:

   ```bash
   curl -fsS https://dashboard.test-bench/api/bgp-saturation/status
   ```

2. Inject a withdraw on peer 1 only:

   ```bash
   kubectl exec -it vyos-bgp-peer-1 -n web-agents -- \
     vtysh -c "configure terminal" -c "router bgp 64501" \
       -c "neighbor 200.130.0.13 shutdown"
   ```

3. Watch the bench's reaction (target: detection in under 10s):

   ```bash
   watch -n 1 'curl -fsS https://dashboard.test-bench/api/bgp-saturation/status'
   ```

4. Expected behavior:
   - The DUT FIB drops peer 1's prefixes within 30s. Because the shutdown sends a BGP Cease NOTIFICATION, the DUT should close the session right away instead of waiting for hold-timer expiry.
   - The bench dashboard shows peer 1 as `down`.
   - Alertmanager fires `BGPSessionFlap`.
   - The PR's running session is marked degraded, and the report annex shows the event.
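The injection and watch steps above can be combined so the detection delay is measured instead of eyeballed from `watch`. A minimal sketch: `wait_for` re-runs a check once per interval and prints the elapsed seconds. The status JSON shape used in `peer1_down` (a `.peers[]` array with `name` and `state`) is a hypothetical schema, so replace the `jq` filter with whatever the endpoint actually returns; `jq` must be installed. `-it` is dropped from `kubectl exec` because no interactive terminal is needed. All function names are illustrative.

```bash
#!/usr/bin/env bash
# Measure time-to-detection for the withdraw (illustrative sketch).

STATUS_URL="https://dashboard.test-bench/api/bgp-saturation/status"

# wait_for TIMEOUT_S INTERVAL_S CMD [ARGS...]
# Re-runs CMD every INTERVAL_S seconds until it exits 0 or TIMEOUT_S passes.
# Prints elapsed whole seconds; returns 0 on success, 1 on timeout.
wait_for() {
  timeout_s="$1"; interval_s="$2"; shift 2
  start="$(date +%s)"
  while :; do
    if "$@" >/dev/null 2>&1; then
      echo $(( $(date +%s) - start ))
      return 0
    fi
    elapsed=$(( $(date +%s) - start ))
    if [ "$elapsed" -ge "$timeout_s" ]; then
      echo "$elapsed"
      return 1
    fi
    sleep "$interval_s"
  done
}

# peer1_down: succeeds once the status API reports peer 1 as down.
# ASSUMPTION (hypothetical schema): {"peers": [{"name": ..., "state": ...}]}.
# jq -e exits non-zero on false/null or when no peer matches.
peer1_down() {
  curl -fsS "$STATUS_URL" \
    | jq -e '.peers[] | select(.name == "vyos-bgp-peer-1") | .state == "down"' >/dev/null
}

# inject_withdraw: same command as the injection step, without -it.
inject_withdraw() {
  kubectl exec vyos-bgp-peer-1 -n web-agents -- \
    vtysh -c "configure terminal" -c "router bgp 64501" \
      -c "neighbor 200.130.0.13 shutdown"
}
```

Usage: `inject_withdraw && t=$(wait_for 30 1 peer1_down) && echo "detected in ${t}s (target: under 10s)"`. The 30s timeout mirrors the success criterion, while the printed value can be compared against the 10s target.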
Rollback
```bash
kubectl exec -it vyos-bgp-peer-1 -n web-agents -- \
  vtysh -c "configure terminal" -c "router bgp 64501" \
    -c "no neighbor 200.130.0.13 shutdown"
```
The peer should re-establish within 60s. Verify that the DUT FIB route count returns to the baseline noted before the injection.
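The rollback check can be scripted as well. A minimal sketch, assuming `jq` is installed locally and that FRR's `show bgp neighbors <ip> json` output on peer 1 is keyed by the neighbor address and carries a `bgpState` field (worth confirming on your FRR/VyOS version). How the DUT's FIB count is read is not specified in this runbook, so `route_count_ok` only compares two numbers you supply; a tolerance of 0 requires an exact match with the baseline. Function names are illustrative.

```bash
#!/usr/bin/env bash
# Post-rollback checks (illustrative sketch). Requires jq locally.

PEER_POD="vyos-bgp-peer-1"
NAMESPACE="web-agents"
NEIGHBOR="200.130.0.13"

# peer_state: print FRR's "bgpState" for NEIGHBOR as seen from peer 1.
# ASSUMPTION: the JSON is keyed by the neighbor address; verify on your FRR version.
peer_state() {
  kubectl exec "$PEER_POD" -n "$NAMESPACE" -- \
    vtysh -c "show bgp neighbors $NEIGHBOR json" \
    | jq -r --arg ip "$NEIGHBOR" '.[$ip].bgpState // "unknown"'
}

# wait_established TIMEOUT_S: poll about once per second until the session is
# Established. Prints elapsed seconds; returns 1 if TIMEOUT_S passes first.
wait_established() {
  limit="$1"
  start="$(date +%s)"
  while :; do
    if [ "$(peer_state 2>/dev/null)" = "Established" ]; then
      echo $(( $(date +%s) - start ))
      return 0
    fi
    elapsed=$(( $(date +%s) - start ))
    if [ "$elapsed" -ge "$limit" ]; then
      echo "$elapsed"
      return 1
    fi
    sleep 1
  done
}

# route_count_ok BASELINE CURRENT TOLERANCE_PCT
# Succeeds if CURRENT is within TOLERANCE_PCT percent of BASELINE.
# Integer math only: |CURRENT - BASELINE| * 100 <= BASELINE * TOLERANCE_PCT.
route_count_ok() {
  base="$1"; cur="$2"; tol="$3"
  diff=$(( cur > base ? cur - base : base - cur ))
  [ $(( diff * 100 )) -le $(( base * tol )) ]
}
```

Usage: `t=$(wait_established 60) && echo "re-established in ${t}s"`, then `route_count_ok "$baseline" "$current" 0` with the two FIB counts.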
Success criteria
- Withdraw detected within 30s
- No bench crash and no leaked routes
- Alert fired and audit log entry recorded
- Recovery within 60s of the rollback (`no neighbor 200.130.0.13 shutdown`)
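Once the measurements are in hand, the criteria above can be checked mechanically. A minimal sketch: `evaluate` takes the measured values as positional arguments (an illustrative interface; nothing in the bench is assumed to produce them in this form), prints PASS or FAIL per criterion, and returns non-zero if any criterion fails.

```bash
#!/usr/bin/env bash
# Evaluate the runbook's success criteria from operator-supplied measurements
# (illustrative sketch; the argument order is hypothetical).

# check LABEL TEST_CMD...: run TEST_CMD, print PASS/FAIL with LABEL, return its status.
check() {
  label="$1"; shift
  if "$@"; then
    echo "PASS  $label"
    return 0
  fi
  echo "FAIL  $label"
  return 1
}

# evaluate DETECT_S RECOVER_S CRASHED LEAKED ALERT_FIRED AUDIT_LOGGED
#   DETECT_S      seconds from shutdown until the withdraw was detected
#   RECOVER_S     seconds from the rollback until the session re-established
#   CRASHED       1 if the bench crashed, else 0
#   LEAKED        peer 1 prefixes still in the DUT FIB after the withdraw
#   ALERT_FIRED   1 if BGPSessionFlap fired, else 0
#   AUDIT_LOGGED  1 if the event was audit-logged, else 0
evaluate() {
  detect_s="$1"; recover_s="$2"; crashed="$3"
  leaked="$4"; alert="$5"; audit="$6"
  fails=0
  check "withdraw detected within 30s (${detect_s}s)" [ "$detect_s" -le 30 ] || fails=$((fails + 1))
  check "no bench crash" [ "$crashed" -eq 0 ] || fails=$((fails + 1))
  check "no leaked routes ($leaked)" [ "$leaked" -eq 0 ] || fails=$((fails + 1))
  check "alert fired" [ "$alert" -eq 1 ] || fails=$((fails + 1))
  check "audit logged" [ "$audit" -eq 1 ] || fails=$((fails + 1))
  check "recovery within 60s (${recover_s}s)" [ "$recover_s" -le 60 ] || fails=$((fails + 1))
  [ "$fails" -eq 0 ]
}
```

For example, `evaluate 7 42 0 0 1 1` prints six PASS lines and returns 0, so it can gate a CI step or a report annex entry.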