Chaos — BGP Withdraw Mid-Test

Failure-injection runbook: simulate a BGP session reset during a live test and validate the bench's behavior.

Goal

Confirm that an unexpected BGP withdraw during a running BGP saturation test doesn't crash the bench or leak partial routes.

Prerequisites

  • Test bench (not production)
  • BGP saturation test running (any route_count_mode)
  • Operator role with chaos permissions (team-lead or admin)

Procedure

  1. Note current test state:

    curl -fsS https://dashboard.test-bench/api/bgp-saturation/status
    

  2. Inject the withdraw on peer 1 only, by administratively shutting down its session to neighbor 200.130.0.13:

    kubectl exec -it vyos-bgp-peer-1 -n web-agents -- \
      vtysh -c "configure terminal" -c "router bgp 64501" \
      -c "neighbor 200.130.0.13 shutdown"
    

  3. Watch for the bench's reaction (target: < 10s detection):

    watch -n 1 'curl -fsS https://dashboard.test-bench/api/bgp-saturation/status'
    

  4. Expected behavior:

     • DUT FIB drops peer 1's prefixes within 30s (default hold timer)
     • Bench dashboard shows peer 1 as down
     • Alertmanager fires BGPSessionFlap
     • PR's running session marked degraded; report annex shows event
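
The detection time in step 3 can be measured instead of read off `watch`. The sketch below polls the status endpoint and prints how many seconds passed before peer 1 was reported down. The response schema of the status API is not documented in this runbook, so `DOWN_PATTERN` is a placeholder assumption, and `poll_until` and `peer_reported_down` are hypothetical helper names. Adjust the pattern to the real payload before relying on the result.

```shell
# Sketch: time how long the bench takes to report peer 1 as down.
# ASSUMPTION: the status body contains text matching DOWN_PATTERN once
# peer 1 is down. The default "down" is a guess, not the documented schema.
STATUS_URL="${STATUS_URL:-https://dashboard.test-bench/api/bgp-saturation/status}"
DOWN_PATTERN="${DOWN_PATTERN:-down}"

# poll_until TIMEOUT INTERVAL CMD...
# Re-runs CMD every INTERVAL seconds until it exits 0 or TIMEOUT seconds pass.
# Prints the elapsed whole seconds on success; returns 1 on timeout.
poll_until() {
  pu_timeout=$1
  pu_interval=$2
  shift 2
  pu_start=$(date +%s)
  while :; do
    if "$@"; then
      echo $(( $(date +%s) - pu_start ))
      return 0
    fi
    if [ $(( $(date +%s) - pu_start )) -ge "$pu_timeout" ]; then
      return 1
    fi
    sleep "$pu_interval"
  done
}

# Succeeds when the status body matches DOWN_PATTERN.
peer_reported_down() {
  curl -fsS "$STATUS_URL" | grep -q "$DOWN_PATTERN"
}
```

Run it immediately after the shutdown command in step 2, for example `poll_until 30 1 peer_reported_down`. A printed value under 10 meets the detection target, and a non-zero exit means the bench did not report the peer as down within 30s.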

Rollback

    kubectl exec -it vyos-bgp-peer-1 -n web-agents -- \
      vtysh -c "configure terminal" -c "router bgp 64501" \
      -c "no neighbor 200.130.0.13 shutdown"

The peer should re-establish within 60s. Verify that the FIB count returns to its pre-injection baseline.
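
The 60s re-establishment window can be checked with a polling loop rather than by hand. This is a minimal sketch, assuming the `vtysh` in the peer pod is FRR's and supports `show bgp summary json`, and that each neighbor appears there as a flat object with a `state` field. `neighbor_established` and `wait_for_reestablish` are hypothetical helper names. The FIB-count check on the DUT is not covered, since the DUT's CLI is not specified here.

```shell
# Sketch: wait for the session to 200.130.0.13 to return to Established.
# ASSUMPTION: "show bgp summary json" is available and lists each neighbor
# as a flat JSON object containing a "state" field. grep is used instead of
# jq to avoid an extra dependency.
NEIGHBOR="${NEIGHBOR:-200.130.0.13}"

# Reads summary JSON on stdin; succeeds if NEIGHBOR's state is Established.
neighbor_established() {
  tr -d ' \n\t' | grep -q "\"$NEIGHBOR\":{[^}]*\"state\":\"Established\""
}

# wait_for_reestablish [TIMEOUT]
# Polls the summary every 2s for up to TIMEOUT seconds (default 60).
# Returns 0 once the neighbor is Established, 1 on timeout.
wait_for_reestablish() {
  wr_deadline=$(( $(date +%s) + ${1:-60} ))
  while :; do
    if kubectl exec vyos-bgp-peer-1 -n web-agents -- \
         vtysh -c "show bgp summary json" | neighbor_established; then
      return 0
    fi
    [ "$(date +%s)" -lt "$wr_deadline" ] || return 1
    sleep 2
  done
}
```

Run `wait_for_reestablish 60` right after the rollback command. A non-zero exit means the session did not come back within the window.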

Success criteria

  • Withdraw detected within 30s (target: under 10s)
  • No bench crash and no leaked routes
  • Alert fired and event audit logged
  • Recovery within 60s of removing the shutdown