L2 BPDU isolation — operator methodology¶
Read in your language: English · Português · Español
Scope status (post-Scope-Freeze 2026-05-10) — L2 BPDU isolation per ADR 0009 is layer 1 of the 5-layer defense stack. Subsequent layers added post-2026-05: - L2: VLAN segregation per 3-plane classification (ARCHITECTURE.md) - L3: OOBI VXLAN immutable fabric (ADR 0019) - L4: RELAY.Art trust-zone bridge (ADR 0020) - L5: DOM production blocking — DDPB 7-layer chain (ADR 0014)
TL;DR¶
The TLSStress.Art test bench bridges multiple L2 segments. Without explicit configuration, our Linux bridges and the upstream Nexus 9000 trunk would propagate (or generate) Spanning-Tree Protocol (STP) BPDU frames. Two failure modes:
- A BPDU leaking onto a customer-facing uplink can corrupt the customer's STP topology, triggering a ~30s network outage at the customer site.
- A BPDU loop inside the lab itself saturates bridge CPU and kills the stack.
This document explains how the lab prevents both failures (3-layer defense) and how the run report attests the isolation (Annex G layer 5).
For the full design rationale, see ADR 0009.
The threat is bidirectional¶
| Direction | Source | Why we care |
|---|---|---|
| Inbound (customer → lab) | Customer's own switches | A BPDU entering our bridges and propagating to other ports = topology corruption |
| Outbound (lab → customer) | Nexus 9000 generates BPDU every 2s; NGFW transparent mode forwards them; Linux bridges with STP on generate them | A BPDU leaving the lab = customer outage caused by us |
The defense addresses both directions at every boundary.
3-layer defense¶
Layer 1 — Linux bridges¶
Every Linux bridge in the test bench (Multus, macvlan parents, Docker bridges in dev) MUST have:
stp_state = 0on the bridge itself (we do not participate in STP)bpdu_guard onon every port (drop incoming BPDUs)root_block onon every port (refuse to accept root-bridge candidacy)
Applied automatically by the privileged DaemonSet
k8s/dut/47-bpdu-guard-daemonset.yaml at node-bring-up time. The
DaemonSet is idempotent and re-runs on bridge creation events.
Verify manually:
ip link show type bridge | grep stp_state
bridge -s link show
Layer 2 — Nexus 9000 trunk¶
Two port classes:
| Class | Configuration | Rationale |
|---|---|---|
| Lab-internal (Nexus → Linux bridges) | port type edge + bpduguard enable |
Detect inconsistencies if our internal config drifts (canary) |
| Customer-facing uplink (Nexus → customer switch) | bpdufilter enable + bpduguard enable |
Bidirectional silent BPDU drop, AND err-disable if anything arrives |
The bpdufilter enable command (Cisco NX-OS) is the critical one for
the uplink: it stops Nexus from sending Hello on this port AND ignores
any BPDU received on it. Different from bpduguard, which only catches
incoming and err-disables.
Applied via scripts/nexus/01-apply-tuning.nxos. Verify:
N9K# show running-config interface ethernet 1/49
N9K# show spanning-tree summary
Layer 3 — VyOS pod¶
set interfaces bridge br0 stp false
set interfaces bridge br0 member interface eth1 bpdu-guard
set interfaces bridge br0 member interface eth1 root-guard
Applied via k8s/dut/45-vyos-isp-router.yaml ConfigMap. Verify inside
the VyOS pod:
kubectl exec -n web-agents vyos-isp-router-0 -- vyos-show-config | grep -A5 bridge
Annex G — 5th layer attestation¶
The run report's Annex G ("air-gap attestation") attests 5 layers of isolation now (was 4 before v4.3.1):
- Physical (interfaces listed, no unexpected cabling)
- VyOS BGP (default route is blackhole, 0 external peers)
- NetworkPolicy (every persona ns has default-deny + country-allow)
- DNS NXDOMAIN (public resolvers return NXDOMAIN for persona FQDNs)
- L2 BPDU isolation (this layer)
Layer 5 is verified by scripts/airgap-l2-verify.sh, which:
- Audits every Linux bridge's
stp_stateand per-portbpdu_guard - Audits every Nexus port's
port type+bpduguard+bpdufilter - Runs a 60-second
tcpdumpcapture onanyinterface filtering for BPDU frames (ether proto 0x0026 or stp) - Emits a JSON document consumed by
dashboard/src/lib/preflight/airgap-checks.ts
Zero BPDU packets observed in 60 seconds = layer 5 PASS. Any non-zero = FAIL.
Operator workflow¶
After every cabling change¶
scripts/airgap-l2-verify.sh
# Expected output:
# Layer 5 — L2 BPDU isolation: PASS
# - Bridges audited: 7 (all stp_state=0, all ports bpdu_guard on)
# - Nexus ports audited: 49 (47 internal, 2 uplink, all configured)
# - 60s BPDU capture: 0 packets observed
If FAIL, the script prints exactly which check failed (which bridge port has STP on, which Nexus port lacks bpdufilter, or which BPDUs were captured and where). Fix and re-run.
Before every test run¶
The dashboard's preflight check runs the same script automatically. In
production environments (AIRGAP_GATING=production, default), a layer-5
FAIL blocks the test run from starting. In dev environments
(AIRGAP_GATING=observational), it logs a warning and proceeds.
Operator escalation¶
If you see frequent layer-5 failures with no obvious cause, the most common explanations are:
- A new lab cable was added without classifying the Nexus port
(still
port type edgeon what is now an uplink — needs to be reclassified as uplink withbpdufilter). - The customer has changed their switch and the new switch generates
more BPDUs than before. Filter still works, but if
bpduguardis active on the uplink you'll see err-disable events. Check Nexusshow interface ethernet 1/49 status err-disabled. - A new Linux bridge was created (e.g. a new persona namespace
with macvlan) and the DaemonSet hasn't yet applied BPDU guard to
it. Force a re-run:
kubectl rollout restart daemonset/bpdu-guard -n web-agents.
When to bypass (emergency only)¶
If you absolutely must run a test before BPDU isolation is verified
(disaster scenario, customer demo in 5 minutes), set
AIRGAP_GATING=observational in the operator UI before launching the
plan. The test will run and the Annex G report will mark layer 5 as
SKIPPED with a red flag. Do not use this in front of customers.
References¶
- ADR 0009 — L2 BPDU isolation (
docs/ADR/0009-l2-bpdu-isolation.md) - Annex G template —
dashboard/templates/annex-g-airgap-attestation.md.tmpl - Verifier script —
scripts/airgap-l2-verify.sh - DaemonSet —
k8s/dut/47-bpdu-guard-daemonset.yaml - Nexus tuning —
scripts/nexus/01-apply-tuning.nxos - VyOS config —
k8s/dut/45-vyos-isp-router.yaml