# ADR 0011 — Topology axes: deployment nodes × L2 fabric × DUT type
- Status: Accepted
- Date: 2026-05-08
- Deciders: TLSStress.Art project
- Targets: v4.5 (this work — formalizes axes + makes Nexus 9000 optional)
## Context
Until v4.4, the test bench's deployment configuration mixed three independent concepts into a single "deployment mode" enum:
- `single-node` (1 UCS, all roles colocated)
- `dual-node` (2 UCS, agents vs personas split)
- `tri-node` (3 UCS, Playwright vs k6 vs personas split)
- `multi-node` (4+ UCS, full role separation)
Implicit assumptions baked into that enum:
- L2 fabric is always Cisco Nexus 9000. Every deployment script,
  preflight check, runbook, and report annex assumes the Nexus is
  present: `scripts/nexus/01-apply-tuning.nxos` is referenced from
  `scripts/k8s-dut-up.sh`, Annex G layer 5 attests Nexus BPDU isolation,
  and the dashboard preflight check `airgapL2BpduIsolation` queries Nexus
  state via SSH.
- DUT is always a Cisco FTD. v4.5 introduces the Inspection Profile
  methodology, which re-uses this assumption (FMC + FDM apply scripts),
  while v4.6 will add Cisco Catalyst Secure Router (Catalyst SD-WAN
  Manager, vManage REST API).
Two real-world deployment shapes break those assumptions:
- Single-node lab without external switch. This is the smallest production deployment: 1 UCS, 1 NGFW, no Nexus. Because the Nexus does not exist, any layer 5 BPDU attestation that requires it must skip cleanly. The operator wires the UCS NICs directly to the NGFW interfaces. This is the user's actual setup as of 2026-05-08.
- Multi-node deployments where the operator owns a different L2 switch. The Nexus is one option among many; a dual-node deployment in a customer rack might use a Catalyst 9300, an Arista 7050, or a small-form-factor switch. The bench should not require a Nexus, only the network properties a Nexus provides (VLAN trunking, BPDU isolation at the boundary, MACsec on uplinks, etc.), which other modern switches can also provide.
Coupling all of this into a single deployment-mode enum makes the
configuration brittle: every new permutation invents a new enum value,
the matrix of supported combinations is implicit, and the Nexus-specific
optimizations (scripts/nexus/01-apply-tuning.nxos) cannot be turned off
without forking the install script.
## Decision
Decompose deployment configuration into three independent axes:
| Axis | Values | What it controls |
|---|---|---|
| `deployment_nodes` | `single` · `dual` · `tri` · `multi` | Number of UCS hosts and how roles are distributed (existing) |
| `l2_fabric` | `nexus` · `none` · (future: `arista`, `catalyst`, `generic`) | What L2 switch (if any) carries inter-host traffic |
| `dut_type` | `cisco-ftd` · `cisco-secure-router` · (future: `palo-alto`, `fortinet`, ...) | Which security DUT is under test; gates which apply/verify scripts run |
Each axis is independent: any value of any axis composes with any value of the others. The matrix is explicit (see §"Permitted combinations" below), so unsupported permutations fail loudly at config-load time rather than silently producing broken state.
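Because each axis is declared once, the full matrix can be derived from the axes rather than maintained by hand. The following is a minimal TypeScript sketch of that idea. It uses only the values the table lists as current and omits the future ones. The names `DEPLOYMENT_NODES`, `L2_FABRICS`, `DUT_TYPES`, `Topology` and `allTriples` are hypothetical and are not taken from `loader.ts`.

```typescript
// Current values of each axis, as listed in the Decision table.
// Hypothetical layout: the real loader may declare these differently.
const DEPLOYMENT_NODES = ["single", "dual", "tri", "multi"] as const;
const L2_FABRICS = ["nexus", "none"] as const;
const DUT_TYPES = ["cisco-ftd", "cisco-secure-router"] as const;

type DeploymentNodes = (typeof DEPLOYMENT_NODES)[number];
type L2Fabric = (typeof L2_FABRICS)[number];
type DutType = (typeof DUT_TYPES)[number];

// One point in the configuration matrix: exactly one value per axis.
interface Topology {
  deploymentNodes: DeploymentNodes;
  l2Fabric: L2Fabric;
  dutType: DutType;
}

// Derive every (nodes, fabric, dut) triple from the axis declarations.
function allTriples(): Topology[] {
  const out: Topology[] = [];
  for (const deploymentNodes of DEPLOYMENT_NODES) {
    for (const l2Fabric of L2_FABRICS) {
      for (const dutType of DUT_TYPES) {
        out.push({ deploymentNodes, l2Fabric, dutType });
      }
    }
  }
  return out;
}

console.log(allTriples().length); // 16 (4 x 2 x 2)
```

Adding `arista` to `L2_FABRICS` is a one-line change. After that edit, `allTriples()` returns 24 triples with no other changes.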
### Single source of truth: `platform/topology.yaml`
```yaml
# platform/topology.yaml — TLSStress.Art deployment topology declaration.
# Single source of truth for the three axes; consumed at every layer
# (install scripts, dashboard preflight, report cover, runbooks).
version: 1

deployment_nodes: single   # single | dual | tri | multi
l2_fabric: none            # nexus | none (future: arista | catalyst | generic)
dut_type: cisco-ftd        # cisco-ftd | cisco-secure-router (future: more)

# Per-axis details (only the relevant fields are read for the chosen value)
single_node:
  cabling: multi-nic-trunk   # multi-nic-trunk | single-nic-trunk
  # When `multi-nic-trunk`: each physical NIC on the UCS carries an
  # 802.1q-tagged trunk to a different NGFW interface. This distributes
  # bandwidth across NICs and isolates traffic classes (agents vs
  # personas) onto different physical links. Default and recommended.

l2_fabric_nexus:
  # Only consulted when l2_fabric == nexus.
  apply_tuning_script: scripts/nexus/01-apply-tuning.nxos
  verify_script: scripts/nexus/02-verify.nxos
  bpdu_classification: scripts/nexus/PORT_CLASSIFICATION.md

dut_cisco_ftd:
  # Only consulted when dut_type == cisco-ftd. (FMC vs FDM credentials
  # belong here, populated by the operator at install time.)
  manager: fmc   # fmc | fdm

dut_cisco_secure_router:
  # Only consulted when dut_type == cisco-secure-router.
  manager: catalyst-sd-wan-manager   # always vManage; field is reserved
```
The file is loaded at dashboard startup by
`dashboard/src/lib/topology/loader.ts`, validated with Zod, and
materialized into a Kubernetes ConfigMap (`topology` in the
`web-agents` namespace) so that cluster-side scripts and DaemonSets read
the same values. Operators never edit the ConfigMap directly; they
edit `topology.yaml` and re-run the install.
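The validation step can be pictured with the dependency-free sketch below. It is a hand-rolled stand-in for the Zod schema in `loader.ts` and is not that schema. The sketch assumes the YAML has already been parsed into a plain object. It applies the single-node default for `l2_fabric` described under Consequences. For other node counts it treats a missing `l2_fabric` as an error, which is an assumption the ADR does not state. `AXES`, `TopologyError`, `pickAxis` and `loadTopology` are hypothetical names. Each error names the offending field, matching the requirement that the dashboard report which axis is the problem.

```typescript
// Hand-rolled stand-in for the Zod schema in dashboard/src/lib/topology/loader.ts.
// Assumption: field names mirror topology.yaml; the real schema may differ.
const AXES = {
  deployment_nodes: ["single", "dual", "tri", "multi"],
  l2_fabric: ["nexus", "none"],
  dut_type: ["cisco-ftd", "cisco-secure-router"],
} as const;

type Axis = keyof typeof AXES;
type AxisValue<A extends Axis> = (typeof AXES)[A][number];

interface LoadedTopology {
  deployment_nodes: AxisValue<"deployment_nodes">;
  l2_fabric: AxisValue<"l2_fabric">;
  dut_type: AxisValue<"dut_type">;
}

// Error that names the offending field, so startup output points at the bad axis.
class TopologyError extends Error {
  readonly field: string;
  constructor(field: string, detail: string) {
    super(`topology.yaml: ${field}: ${detail}`);
    this.field = field;
  }
}

// Read one axis and reject anything outside its declared values.
function pickAxis<A extends Axis>(doc: Record<string, unknown>, axis: A): AxisValue<A> {
  const allowed: readonly string[] = AXES[axis];
  const value = doc[axis];
  if (typeof value !== "string" || !allowed.includes(value)) {
    throw new TopologyError(axis, `got ${JSON.stringify(value)}, expected one of ${allowed.join(" | ")}`);
  }
  return value as AxisValue<A>;
}

// `parsed` is the document after YAML parsing (parsing itself is out of scope).
function loadTopology(parsed: unknown): LoadedTopology {
  if (typeof parsed !== "object" || parsed === null) {
    throw new TopologyError("(document)", "expected a mapping");
  }
  const doc = parsed as Record<string, unknown>;
  if (doc.version !== 1) {
    throw new TopologyError("version", `unsupported version ${JSON.stringify(doc.version)}`);
  }
  const deployment_nodes = pickAxis(doc, "deployment_nodes");
  // Single-node infers l2_fabric: none when omitted; other node counts
  // must set it explicitly (assumption of this sketch).
  const fabricDoc: Record<string, unknown> =
    doc.l2_fabric === undefined && deployment_nodes === "single" ? { ...doc, l2_fabric: "none" } : doc;
  return {
    deployment_nodes,
    l2_fabric: pickAxis(fabricDoc, "l2_fabric"),
    dut_type: pickAxis(doc, "dut_type"),
  };
}

console.log(loadTopology({ version: 1, deployment_nodes: "single", dut_type: "cisco-ftd" }).l2_fabric);
// none
try {
  loadTopology({ version: 1, deployment_nodes: "dual", l2_fabric: "arista", dut_type: "cisco-ftd" });
} catch (e) {
  console.log((e as Error).message);
  // topology.yaml: l2_fabric: got "arista", expected one of nexus | none
}
```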
### Permitted combinations
Any (deployment_nodes, l2_fabric, dut_type) triple is permitted
unless explicitly forbidden below. The dashboard refuses to start with
an unsupported combination and prints which axis is the problem.
| `deployment_nodes` | `l2_fabric` | OK? | Notes |
|---|---|---|---|
| `single` | `none` | ✓ | Default. UCS NICs cabled directly to NGFW. |
| `single` | `nexus` | ⚠ | Permitted but unusual: the operator chose to put a Nexus in front of a single-host lab. The bench configures Nexus tuning + BPDU isolation as if it were a multi-node deployment. No technical block; flag for operator review. |
| `dual` / `tri` / `multi` | `nexus` | ✓ | Standard production shape. |
| `dual` / `tri` / `multi` | `none` | ✓ | The operator wires inter-host traffic via direct cabling or via a switch the bench does not configure. The bench skips Nexus tuning + Nexus-side BPDU attestation; Linux-bridge BPDU attestation continues to run. |
dut_type is independent of all of the above.
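The table above can be encoded as a small classification function. The sketch below does that with hypothetical names (`checkCombination`, `Verdict`). The table forbids no cell today, so the sketch returns only two outcomes: `ok`, and `warn` (permitted but flagged). A forbidden cell would be a third outcome that the loader turns into a startup error. `dut_type` is deliberately not a parameter, because it is independent of the other two axes.

```typescript
// Sketch of the "Permitted combinations" table; names are hypothetical.
type DeploymentNodes = "single" | "dual" | "tri" | "multi";
type L2Fabric = "nexus" | "none";

interface Verdict {
  status: "ok" | "warn"; // "warn" = permitted, but flagged for operator review
  note: string;
}

// dut_type is not a parameter: it composes with every (nodes, fabric) pair.
function checkCombination(nodes: DeploymentNodes, fabric: L2Fabric): Verdict {
  if (nodes === "single") {
    return fabric === "none"
      ? { status: "ok", note: "Default. UCS NICs cabled directly to NGFW." }
      : { status: "warn", note: "Nexus in front of a single-host lab; configured as if multi-node." };
  }
  return fabric === "nexus"
    ? { status: "ok", note: "Standard production shape." }
    : { status: "ok", note: "Skip Nexus tuning and Nexus-side BPDU attestation." };
}

for (const nodes of ["single", "dual"] as const) {
  for (const fabric of ["none", "nexus"] as const) {
    const verdict = checkCombination(nodes, fabric);
    console.log(`${nodes} + ${fabric}: ${verdict.status} (${verdict.note})`);
  }
}
```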
### Gating behavior per layer
The single-node-without-Nexus case (the user's current setup) is the
exemplar. For every layer that previously assumed Nexus, this ADR
defines what happens when l2_fabric == none:
- `scripts/k8s-install.sh`: skips the Nexus tuning step. The install log announces the step as `skipped — l2_fabric=none`.
- `scripts/airgap-l2-verify.sh`: accepts a `--no-nexus` flag (and also honors the environment variable `L2_FABRIC=none`, so the install harness can set it once and forget it). Layer 5 of the report's airgap attestation gets `nexus.skipped: true` with `reason: "no L2 fabric in this topology"`. Linux-bridge BPDU attestation still runs (the bridges live inside the UCS and exist regardless of any external switch).
- `dashboard/src/lib/preflight/airgap-checks.ts`: `airgapL2BpduIsolation` receives `l2FabricPresent: boolean` via the `AirgapCheckContext`. When it is false, the Nexus subcheck is never dispatched (no SSH attempt, no error logged); the result carries `nexus.skipped: true` with the by-design reason.
- `dashboard/templates/annex-g-airgap-attestation.md.tmpl`: already has a `{{#nexus.skipped}}` Mustache section. The reason text uses tonally neutral phrasing when the cause is "by design", and alarm-toned phrasing for causes such as "credentials missing" or "SSH timed out".
- Run report cover page: adds a single line, "Topology: single-node · L2 fabric: none", so that the customer reading the report knows which shape was attested.
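The preflight gating can be illustrated with a sketch of `airgapL2BpduIsolation`. The check function, the `AirgapCheckContext` type, the `l2FabricPresent` flag and `nexus.skipped` come from this ADR. Everything else is hypothetical: the injected probe callbacks, which let the example run without SSH or a cluster, and the other result fields. The skip reason reuses the string quoted for `airgap-l2-verify.sh`. The key property is that, with no fabric present, the Nexus probe is never called at all, while the Linux-bridge probe always runs.

```typescript
// Sketch only: airgapL2BpduIsolation, AirgapCheckContext, l2FabricPresent and
// nexus.skipped come from this ADR; probe callbacks and other fields are hypothetical.
type NexusResult =
  | { skipped: true; reason: string }
  | { skipped: false; bpduIsolated: boolean };

interface AirgapCheckContext {
  l2FabricPresent: boolean;
  checkLinuxBridges: () => Promise<boolean>;
  checkNexusOverSsh: () => Promise<boolean>;
}

interface BpduIsolationResult {
  linuxBridge: { bpduIsolated: boolean };
  nexus: NexusResult;
}

async function airgapL2BpduIsolation(ctx: AirgapCheckContext): Promise<BpduIsolationResult> {
  // Linux bridges live on the UCS, so this sub-check always runs.
  const linuxBridge = { bpduIsolated: await ctx.checkLinuxBridges() };
  if (!ctx.l2FabricPresent) {
    // No fabric by design: the Nexus probe is never dispatched (no SSH, no error).
    return { linuxBridge, nexus: { skipped: true, reason: "no L2 fabric in this topology" } };
  }
  return { linuxBridge, nexus: { skipped: false, bpduIsolated: await ctx.checkNexusOverSsh() } };
}

let sshAttempts = 0;
airgapL2BpduIsolation({
  l2FabricPresent: false,
  checkLinuxBridges: async () => true,
  checkNexusOverSsh: async () => {
    sshAttempts += 1; // stays 0 in this run
    return true;
  },
}).then((result) => console.log(result.nexus, "ssh attempts:", sshAttempts));
```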
### What does NOT change
- Linux-bridge BPDU isolation (`k8s/dut/48-bpdu-guard-daemonset.yaml`) continues to run unconditionally on every DUT data-plane node. The bridges are inside the UCS regardless of whether an external switch is present.
- The 3-layer L2 BPDU defense methodology (ADR 0009) stays valid; layer 2 (Nexus) simply becomes conditional. When it is absent, the defense collapses to 2 layers (Linux bridges + live capture), which is the correct posture for a deployment with no external switch to defend against.
- The 5 air-gap layers (ADR 0007 §G + ADR 0009) are unchanged in number; layer 5 simply attests fewer sub-sources when no fabric is present.
### Single-node cabling: `multi-nic-trunk`
The user's current setup uses multiple physical NICs on the UCS,
each carrying an 802.1q-tagged trunk to a different NGFW interface.
This is the recommended pattern and the default value of
`single_node.cabling`:
- Each NIC is a separate physical link: `enp1s0`, `enp1s1`, etc.
- Each NIC connects to a different NGFW interface (subscriber-side, server-side, management, etc.).
- Each NIC carries the appropriate VLANs as 802.1q sub-interfaces (`enp1s0.20`, `enp1s0.30`, etc.).
- Linux bridges + Multus NADs handle VLAN demux internally; from the pod's point of view the topology looks identical to the multi-node case with a Nexus.
Why multi-NIC-trunk over single-NIC-trunk:
- Bandwidth distribution: a single 10G link to the NGFW would saturate before reaching the agent fleet's combined throughput. Multiple 10G NICs scale aggregate bandwidth roughly linearly with the number of NICs.
- Traffic-class isolation: agents-class traffic and persona-class traffic on different NICs do not interfere. A burst of TLS handshakes from agents does not back up persona-bound traffic.
- Failure isolation: a NIC failure takes down one traffic class rather than the entire bench.
The single-nic-trunk alternative is supported (one NIC carrying every
VLAN) for environments where physical NICs are scarce, but is not the
default.
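The difference between the two cabling modes is where each VLAN's sub-interface lands. The sketch below derives the 802.1q sub-interface names for both modes. It assumes the Linux `<nic>.<vlan>` naming shown above. It reuses VLAN IDs 20 and 30 from the examples, but their assignment to the agents and personas traffic classes is hypothetical, as are `planSubInterfaces` and `CLASS_VLANS`. In `multi-nic-trunk` mode each traffic class gets its own NIC. In `single-nic-trunk` mode every VLAN rides on the first NIC.

```typescript
// Sketch: derive 802.1q sub-interface names for single-node cabling.
// Assumptions: VLAN IDs reuse the examples above; the class -> VLAN mapping is hypothetical.
type Cabling = "multi-nic-trunk" | "single-nic-trunk";

const CLASS_VLANS: Record<string, number[]> = {
  agents: [20],
  personas: [30],
};

function planSubInterfaces(cabling: Cabling, nics: string[], classVlans: Record<string, number[]>): string[] {
  const classes = Object.keys(classVlans);
  // multi-nic-trunk needs one NIC per traffic class; single-nic-trunk needs one NIC total.
  const needed = cabling === "multi-nic-trunk" ? classes.length : 1;
  if (nics.length < needed) {
    throw new Error(`${cabling} needs ${needed} NIC(s), got ${nics.length}`);
  }
  return classes.flatMap((cls, i) => {
    const nic = cabling === "multi-nic-trunk" ? nics[i] : nics[0];
    return classVlans[cls].map((vlan) => `${nic}.${vlan}`);
  });
}

console.log(planSubInterfaces("multi-nic-trunk", ["enp1s0", "enp1s1"], CLASS_VLANS));
// [ 'enp1s0.20', 'enp1s1.30' ]
console.log(planSubInterfaces("single-nic-trunk", ["enp1s0"], CLASS_VLANS));
// [ 'enp1s0.20', 'enp1s0.30' ]
```

The traffic-class and failure-isolation arguments above follow directly from this mapping. In the multi-NIC plan, losing `enp1s1` removes only the personas VLAN. In the single-NIC plan, losing `enp1s0` removes both VLANs.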
## Consequences
### Positive
- Single-node lab without Nexus becomes a first-class deployment, matching the user's actual setup. No more silent assumption that Nexus is present.
- The Nexus is no longer a hard prerequisite. Operators with
  Catalyst 9300, Arista, etc. can run the bench by setting
  `l2_fabric: none` (the bench does not configure their switch; they
  configure it manually), and the bench attests what it can.
- The configuration matrix is explicit. New deployment shapes (e.g. cloud-native single-node on AWS) are just new axis values, not new enum constants spread across N scripts.
- The DUT type axis decouples NGFW vendor from topology. v4.6 (Cisco
  SR), v4.7 (PAN), and v4.8 (Fortinet) all add `dut_type` values without
  touching any topology or fabric code paths.
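One way to keep vendor additions out of topology code is a registry keyed by `dut_type`. The sketch below is hypothetical in all its names (`DUT_DRIVERS`, `DutDriver`, `selectDutDriver`, `displayName`). Its accepted `manager` values are copied from the `topology.yaml` comments. Typing the registry as `Record<DutType, DutDriver>` makes the TypeScript compiler reject a new `dut_type` value that has no entry, so adding a vendor is a change to this table alone.

```typescript
// Hypothetical registry keyed by dut_type. Record<DutType, ...> makes the
// compiler reject a new dut_type value that has no entry here.
type DutType = "cisco-ftd" | "cisco-secure-router";

interface DutDriver {
  displayName: string;
  managers: readonly string[]; // accepted values of the per-DUT `manager` field
}

const DUT_DRIVERS: Record<DutType, DutDriver> = {
  "cisco-ftd": { displayName: "Cisco FTD", managers: ["fmc", "fdm"] },
  "cisco-secure-router": { displayName: "Cisco Secure Router", managers: ["catalyst-sd-wan-manager"] },
};

// Pick the driver for the configured DUT and check its manager setting.
function selectDutDriver(dutType: DutType, manager: string): DutDriver {
  const driver = DUT_DRIVERS[dutType];
  if (!driver.managers.includes(manager)) {
    throw new Error(`dut_type ${dutType}: manager must be one of ${driver.managers.join(" | ")}, got ${manager}`);
  }
  return driver;
}

console.log(selectDutDriver("cisco-ftd", "fdm").displayName); // Cisco FTD
```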
### Negative
- One more configuration file.
  `topology.yaml` joins `personas.yaml`, `platform/test-plans/catalog.yaml`, etc. as repo-versioned config the operator must understand. Mitigated by making the defaults sensible (single-node + `l2_fabric=none` + `dut_type=cisco-ftd` is the dev-setup default).
- More code paths to test. Every place that touches L2 attestation now has two branches (with-Nexus / without-Nexus). Mitigated by funneling all "is Nexus present?" decisions through one helper that reads the loaded topology, rather than scattered if-checks.
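The "one helper" mitigation can look like the sketch below, under assumptions. The two predicates `l2FabricPresent` and `nexusPresent` and their consumers are hypothetical, and so are the install step names. The cover-line format is the one quoted under "Gating behavior per layer". With only `nexus` and `none` defined today the two predicates agree, but keeping them separate lets a future `arista` value count as "fabric present" without triggering the `.nxos` scripts.

```typescript
// Hypothetical helper module: the single place that answers fabric questions.
interface TopologyAxes {
  deployment_nodes: "single" | "dual" | "tri" | "multi";
  l2_fabric: "nexus" | "none";
  dut_type: "cisco-ftd" | "cisco-secure-router";
}

// Any bench-visible L2 fabric? Would feed AirgapCheckContext.l2FabricPresent.
function l2FabricPresent(t: TopologyAxes): boolean {
  return t.l2_fabric !== "none";
}

// Specifically a Nexus? Gates the .nxos tuning and verify scripts.
function nexusPresent(t: TopologyAxes): boolean {
  return t.l2_fabric === "nexus";
}

// Consumers ask the helpers instead of re-reading l2_fabric. Step names are hypothetical.
function installSteps(t: TopologyAxes): string[] {
  const steps = ["cluster-base", "linux-bridge-bpdu-guard"];
  if (nexusPresent(t)) steps.push("nexus-apply-tuning", "nexus-verify");
  return steps;
}

// Report cover-page line, in the format quoted in "Gating behavior per layer".
function coverLine(t: TopologyAxes): string {
  return `Topology: ${t.deployment_nodes}-node · L2 fabric: ${t.l2_fabric}`;
}

const lab: TopologyAxes = { deployment_nodes: "single", l2_fabric: "none", dut_type: "cisco-ftd" };
console.log(installSteps(lab)); // [ 'cluster-base', 'linux-bridge-bpdu-guard' ]
console.log(coverLine(lab)); // Topology: single-node · L2 fabric: none
console.log(l2FabricPresent(lab)); // false
```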
### Neutral
- Existing single-node operators (the user) get the right behavior
  out of the box without any config change:
  `l2_fabric: none` is the inferred default when `deployment_nodes: single`.
## Alternatives considered
### Alternative A — Keep the single deployment-mode enum, add `single-node-no-nexus` as a 5th value
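Reject. The enum explodes combinatorially the moment a 6th case appears (e.g. "dual-node-arista" or "tri-node-no-nexus"). With independent axes, a new value is one entry on one axis, so the declared values grow additively; a single enum needs one constant per combination, so its size is the product of the axis sizes.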
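To put numbers on this, count only the values named in the Decision table and treat the enum as if it folded all three axes together. That gives 4 node counts; 2 fabric values today and 5 with the listed future ones; and 2 DUT types today and 4 once `palo-alto` and `fortinet` are added:

$$
\text{today:}\quad 4 + 2 + 2 = 8 \ \text{axis values} \qquad\text{vs.}\qquad 4 \times 2 \times 2 = 16 \ \text{enum constants}
$$

$$
\text{with listed future values:}\quad 4 + 5 + 4 = 13 \qquad\text{vs.}\qquad 4 \times 5 \times 4 = 80
$$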
### Alternative B — Detect L2 fabric presence at runtime (probe SSH to the configured Nexus address)
Reject. This adds a probe step to every install and preflight, and it is fundamentally guessing: a Nexus that is up but unreachable at this moment looks identical to "no Nexus exists". The operator's intent should be declared, not inferred.
### Alternative C — Make L2 fabric optional at the script level, leave dashboard / report templates assuming Nexus
Reject. The annex G template would print misleading "Nexus check failed" entries on every report from a single-node deployment. Operator-facing surface (cover page, dashboard cards) needs to reflect the topology the run actually attested — anything else damages report defensibility.
### Alternative D — Drop the `dut_type` axis (treat each release as a fork)
Reject. v4.6 needs to coexist with v4.5 for customers that have FTD in production today and Cisco SR coming next year. A single bench running both is the realistic scenario; two parallel forks is operationally untenable.
## Implementation references
- `platform/topology.yaml`: single source of truth (this PR)
- `dashboard/src/lib/topology/loader.ts`: Zod schema + loader + cache (this PR)
- `dashboard/src/lib/preflight/airgap-checks.ts`: `airgapL2BpduIsolation` accepts `l2FabricPresent` via context (this PR)
- `dashboard/src/lib/preflight/environmental.ts`: composer reads the topology cache and forwards `l2FabricPresent` (this PR)
- `dashboard/templates/annex-g-airgap-attestation.md.tmpl`: already has a `nexus.skipped` Mustache section, so no template change is needed; the preflight check supplies a tonally neutral reason string
- `scripts/k8s-install.sh`: gates Nexus tuning by topology; deferred to a follow-up PR (operator-facing scripts touch real machines and warrant a separate review)
- `scripts/airgap-l2-verify.sh`: accepts the `--no-nexus` flag; deferred to the same follow-up PR
- `docs/L2_ISOLATION.md` (+ `.pt-BR.md`, `.es.md`): adds a "When no L2 fabric exists" section; deferred to the same follow-up PR
- `CLAUDE.md`: deployment table + stack table (this PR)
## References
- ADR 0007 — Public-Internet Realism (5-layer airgap; layer 5, whose Nexus sub-source this ADR makes conditional)
- ADR 0009 — L2 BPDU isolation (3-layer defense; layer 2 of that defense becomes conditional)
- ADR 0010 — Inspection Profile (the per-vendor `dut_type` axis is exercised here for the first time, with `cisco-ftd` as the v4.5 MVP and `cisco-secure-router` as v4.6)
- IEEE 802.1Q — VLAN trunking (the multi-NIC-trunk single-node pattern)