L2 BPDU isolation — operator methodology

Scope status (post-Scope-Freeze 2026-05-10): L2 BPDU isolation per ADR 0009 is layer 1 of the 5-layer defense stack. Subsequent layers added post-2026-05:

  • L2: VLAN segregation per 3-plane classification (ARCHITECTURE.md)
  • L3: OOBI VXLAN immutable fabric (ADR 0019)
  • L4: RELAY.Art trust-zone bridge (ADR 0020)
  • L5: DOM production blocking, DDPB 7-layer chain (ADR 0014)

TL;DR

The TLSStress.Art test bench bridges multiple L2 segments. Without explicit configuration, our Linux bridges and the upstream Nexus 9000 trunk would propagate (or generate) Spanning-Tree Protocol (STP) BPDU frames. Two failure modes:

  1. A BPDU leaking onto a customer-facing uplink can corrupt the customer's STP topology, triggering a ~30s network outage at the customer site.
  2. A BPDU loop inside the lab itself saturates bridge CPU and kills the stack.

This document explains how the lab prevents both failures (3-layer defense) and how the run report attests the isolation (Annex G layer 5).

For the full design rationale, see ADR 0009.

The threat is bidirectional

Direction | Source | Why we care
Inbound (customer → lab) | Customer's own switches | A BPDU entering our bridges and propagating to other ports = topology corruption
Outbound (lab → customer) | Nexus 9000 generates BPDU every 2s; NGFW transparent mode forwards them; Linux bridges with STP on generate them | A BPDU leaving the lab = customer outage caused by us

The defense addresses both directions at every boundary.

3-layer defense

Layer 1 — Linux bridges

Every Linux bridge in the test bench (Multus, macvlan parents, Docker bridges in dev) MUST have:

  • stp_state = 0 on the bridge itself (we do not participate in STP)
  • bpdu_guard on on every port (drop incoming BPDUs)
  • root_block on on every port (refuse to accept root-bridge candidacy)

These settings are applied automatically by the privileged DaemonSet k8s/dut/47-bpdu-guard-daemonset.yaml at node bring-up. The DaemonSet is idempotent and re-runs on bridge creation events.
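As a rough illustration of what such a hardening pass involves, the sketch below prints the iproute2 commands for one bridge instead of running them. It is not the DaemonSet's actual code: the function name bpdu_harden_cmds and the SYSFS_NET override are hypothetical, introduced only so the logic can be dry-run against a fake tree. Note that iproute2 spells the per-port BPDU-guard flag guard, while sysfs exposes it as bpdu_guard.

```shell
# Sketch only: print the iproute2 commands that would harden one bridge.
# SYSFS_NET is a hypothetical override for dry runs; on a real node the
# function reads /sys/class/net.
bpdu_harden_cmds() {
  local br="$1" root="${SYSFS_NET:-/sys/class/net}" port
  # Bridge level: do not participate in STP.
  echo "ip link set dev ${br} type bridge stp_state 0"
  # Port level: every interface attached to the bridge appears under brif/.
  for port in "${root}/${br}/brif"/*; do
    [ -e "$port" ] || continue   # bridge without ports: the glob did not match
    echo "bridge link set dev ${port##*/} guard on root_block on"
  done
}
```

To apply the commands rather than print them, the output could be piped to a root shell, for example bpdu_harden_cmds br0 | sudo sh.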

Verify manually:

ip -d link show type bridge | grep stp_state
bridge -d link show
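The same three settings can also be read straight from sysfs, which is easier to script than parsing command output. The following is a minimal sketch under that assumption; audit_bridges and the SYSFS_NET override are hypothetical names, and the real verifier may work differently.

```shell
# Sketch only: report every bridge or bridge port that violates Layer 1.
# Prints one FAIL line per violation and returns non-zero if any was found.
audit_bridges() {
  local root="${SYSFS_NET:-/sys/class/net}" brdir br name port flag rc=0
  for brdir in "${root}"/*/bridge; do
    [ -d "$brdir" ] || continue          # no bridges on this host
    br="${brdir%/bridge}"
    name="${br##*/}"
    if [ "$(cat "${brdir}/stp_state")" != "0" ]; then
      echo "FAIL ${name}: stp_state is not 0"
      rc=1
    fi
    for port in "${br}/brif"/*; do
      [ -e "$port" ] || continue         # bridge without ports
      for flag in bpdu_guard root_block; do
        if [ "$(cat "${port}/${flag}")" != "1" ]; then
          echo "FAIL ${name}/${port##*/}: ${flag} is off"
          rc=1
        fi
      done
    done
  done
  return "$rc"
}
```

Run as audit_bridges && echo "Layer 1 OK" on a node; an empty output with exit status 0 means every bridge and port matched.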

Layer 2 — Nexus 9000 trunk

Two port classes:

Class | Configuration | Rationale
Lab-internal (Nexus → Linux bridges) | port type edge + bpduguard enable | Detect inconsistencies if our internal config drifts (canary)
Customer-facing uplink (Nexus → customer switch) | bpdufilter enable + bpduguard enable | Bidirectional silent BPDU drop, AND err-disable if anything arrives

The bpdufilter enable command (Cisco NX-OS) is the critical one for the uplink: it stops the Nexus from sending BPDUs on this port and makes it ignore any BPDU received on it. This differs from bpduguard, which acts only on incoming BPDUs and err-disables the port when one arrives.

Applied via scripts/nexus/01-apply-tuning.nxos. Verify:

N9K# show running-config interface ethernet 1/49
N9K# show spanning-tree summary
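When checking many ports, the running-config output can be compared against the expected lines for each port class. The sketch below assumes the configuration text has been saved or piped from the switch; check_nexus_port and the class names internal and uplink are hypothetical, chosen to mirror the two classes above.

```shell
# Sketch only: read one interface's running-config on stdin and report
# which required spanning-tree lines are missing for the given port class.
check_nexus_port() {
  local class="$1" cfg line rc=0
  cfg="$(cat)"
  case "$class" in
    internal) set -- "spanning-tree port type edge" "spanning-tree bpduguard enable" ;;
    uplink)   set -- "spanning-tree bpdufilter enable" "spanning-tree bpduguard enable" ;;
    *) echo "unknown port class: $class" >&2; return 2 ;;
  esac
  # "$@" now holds the required lines for this class.
  for line in "$@"; do
    if ! printf '%s\n' "$cfg" | grep -qF -- "$line"; then
      echo "MISSING: $line"
      rc=1
    fi
  done
  return "$rc"
}
```

For example, with the output of show running-config interface ethernet 1/49 saved to a file named eth1-49.cfg (a hypothetical name): check_nexus_port uplink < eth1-49.cfg.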

Layer 3 — VyOS pod

set interfaces bridge br0 stp false
set interfaces bridge br0 member interface eth1 bpdu-guard
set interfaces bridge br0 member interface eth1 root-guard

Applied via k8s/dut/45-vyos-isp-router.yaml ConfigMap. Verify inside the VyOS pod:

kubectl exec -n web-agents vyos-isp-router-0 -- vyos-show-config | grep -A5 bridge

Annex G — 5th layer attestation

The run report's Annex G ("air-gap attestation") now attests 5 layers of isolation (4 before v4.3.1):

  1. Physical (interfaces listed, no unexpected cabling)
  2. VyOS BGP (default route is blackhole, 0 external peers)
  3. NetworkPolicy (every persona ns has default-deny + country-allow)
  4. DNS NXDOMAIN (public resolvers return NXDOMAIN for persona FQDNs)
  5. L2 BPDU isolation (this layer)

Layer 5 is verified by scripts/airgap-l2-verify.sh, which:

  1. Audits every Linux bridge's stp_state and per-port bpdu_guard
  2. Audits every Nexus port's port type + bpduguard + bpdufilter
  3. Runs a 60-second tcpdump capture on any interface filtering for BPDU frames (ether proto 0x0026 or stp)
  4. Emits a JSON document consumed by dashboard/src/lib/preflight/airgap-checks.ts

Zero BPDU packets observed in 60 seconds = layer 5 PASS. Any non-zero = FAIL.
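The capture-and-verdict step can be sketched as follows. This is an illustration, not the contents of scripts/airgap-l2-verify.sh: the function names and the JSON field names are assumptions, and the capture uses only the pcap-filter stp primitive.

```shell
# Sketch only: count BPDUs seen during a capture window (needs root).
count_bpdus() {
  local seconds="${1:-60}"
  # -l line-buffers stdout so each captured frame is one countable line;
  # timeout stops tcpdump when the window ends.
  timeout "$seconds" tcpdump -i any -nn -l stp 2>/dev/null | wc -l
}

# Turn a BPDU count into a PASS/FAIL JSON record.
# The field names are hypothetical, not the dashboard's actual schema.
layer5_verdict() {
  local bpdus="$1" status="PASS"
  [ "$bpdus" -eq 0 ] || status="FAIL"
  printf '{"layer":5,"status":"%s","bpdus_observed":%d}\n' "$status" "$bpdus"
}
```

A full check would then be layer5_verdict "$(count_bpdus 60)", run as root on the node under test.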

Operator workflow

After every cabling change

scripts/airgap-l2-verify.sh
# Expected output:
#   Layer 5 — L2 BPDU isolation: PASS
#   - Bridges audited: 7 (all stp_state=0, all ports bpdu_guard on)
#   - Nexus ports audited: 49 (47 internal, 2 uplink, all configured)
#   - 60s BPDU capture: 0 packets observed

If FAIL, the script prints exactly which check failed (which bridge port has STP on, which Nexus port lacks bpdufilter, or which BPDUs were captured and where). Fix and re-run.

Before every test run

The dashboard's preflight check runs the same script automatically. In production environments (AIRGAP_GATING=production, default), a layer-5 FAIL blocks the test run from starting. In dev environments (AIRGAP_GATING=observational), it logs a warning and proceeds.
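The gating rule reduces to a small decision function. The sketch below restates it in shell for illustration only; preflight_gate is a hypothetical name, and the dashboard's actual implementation is not shown in this document.

```shell
# Sketch only: decide whether a test run may start, given the layer-5 status.
# Returns 0 to proceed, 1 to block. AIRGAP_GATING defaults to production.
preflight_gate() {
  local status="$1" mode="${AIRGAP_GATING:-production}"
  if [ "$status" = "PASS" ]; then
    return 0
  fi
  case "$mode" in
    observational)
      echo "WARN: layer 5 ${status}, proceeding (observational)" >&2
      return 0
      ;;
    *)
      # production, or any unrecognised value, blocks the run
      echo "BLOCKED: layer 5 ${status}" >&2
      return 1
      ;;
  esac
}
```

Treating unrecognised values like production is a design choice of this sketch: a typo in the variable then fails closed rather than open.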

Operator escalation

If you see frequent layer-5 failures with no obvious cause, the most common explanations are:

  1. A new lab cable was added without classifying the Nexus port (still port type edge on what is now an uplink — needs to be reclassified as uplink with bpdufilter).
  2. The customer has replaced their switch and the new switch generates more BPDUs than before. The filter still works, but if bpduguard is active on the uplink you'll see err-disable events. On the Nexus, check with show interface ethernet 1/49 status err-disabled.
  3. A new Linux bridge was created (e.g. a new persona namespace with macvlan) and the DaemonSet hasn't yet applied BPDU guard to it. Force a re-run: kubectl rollout restart daemonset/bpdu-guard -n web-agents.

When to bypass (emergency only)

If you absolutely must run a test before BPDU isolation is verified (disaster scenario, customer demo in 5 minutes), set AIRGAP_GATING=observational in the operator UI before launching the plan. The test will run and the Annex G report will mark layer 5 as SKIPPED with a red flag. Do not use this in front of customers.

References

  • ADR 0009 — L2 BPDU isolation (docs/ADR/0009-l2-bpdu-isolation.md)
  • Annex G template — dashboard/templates/annex-g-airgap-attestation.md.tmpl
  • Verifier script — scripts/airgap-l2-verify.sh
  • DaemonSet — k8s/dut/47-bpdu-guard-daemonset.yaml
  • Nexus tuning — scripts/nexus/01-apply-tuning.nxos
  • VyOS config — k8s/dut/45-vyos-isp-router.yaml