ADR 0012 — BGP Routing Table Saturation Test¶
- Status: Proposed
- Date: 2026-05-08
- Deciders: TLSStress.Art project
- Targets: v4.8 (this ADR is the design lock; implementation follows in PR-2..PR-12 of this feature, ~5-6 week sprint after the dual-stack + real-Internet-snapshot + capacity-aware refinements landed 2026-05-08)
IP migration v4.3 carve-out: this ADR documents a deliberate design choice to use RFC 5737 TEST-NET-1 (192.0.2.252/30) for the BGP peer link itself (NOT for user-facing explanations). Per `project_ip_addressing_v43`, the testbed otherwise uses real RNP public IPs (200.130.0.0/30). Future ADRs may revise the peer-link choice; this one preserves the historical audit trail.
Context¶
The bench's existing test types (Branch Office, Inspection Profile, SDWAN and Cloud On-Ramp, etc.) all stress the data plane: TLS inspection throughput, latency, concurrent sessions, application-mix workload. Routing has been static throughout: the NGFW (DUT) sees a small handful of pre-defined routes (one per persona VLAN, plus the agent VLANs).
Real-world enterprise NGFWs sit at the boundary of networks that consume sizeable routing tables:
- A branch office typically learns 50–500 routes via dynamic routing from the WAN edge (BGP, OSPF, or both).
- A mid-size enterprise edge carries 10,000–50,000 routes — partial Internet view via BGP from one or more upstream ISPs.
- A service-provider edge or datacenter perimeter can carry the full Internet table — about 950,000 IPv4 routes as of early 2026, growing ~40K/year.
The NGFW's behaviour under that route load is rarely characterised in vendor datasheets. Vendors publish FIB and RIB capacity numbers (usually generous round numbers), but they do not publish:
- Convergence time as a function of route count
- CPU + memory utilisation during the advertise burst
- Behaviour at the limit (graceful degradation with a `maximum-prefix` policy? slowdown? hard crash? silent drop of new routes? route flap?)
- Stability under churn (advertise → withdraw → re-advertise)
- Interaction with TLS inspection (does the inspection pipeline stall or drop sessions while the routing daemon is processing 10K updates/s?)
Operators procuring NGFWs for high-route-count edges are flying blind on these dimensions. A test methodology that can demonstrably populate the FIB to a target depth, measure convergence time, and expose vendor-specific failure modes would be a strong differentiator versus throughput-only competitors.
Decision¶
Add a dedicated control-plane stress stack to the bench that:
- Stands up a new VyOS appliance ("Router Peer BGP") on a new VLAN 40, sharing only an L2 BGP-peering link with the NGFW.
- Establishes dual-stack eBGP (IPv4 + IPv6 simultaneously) with the NGFW using ASN documentation values (RFC 5398).
- Advertises one of:
  - A real Internet routing-table snapshot sourced from RouteViews / RIPE RIS (default mode — ~950K IPv4 prefixes + ~190K IPv6 prefixes as of early 2026), OR
  - N synthesised prefixes where `N ∈ {100, 1K, 10K, 100K, 1M}` (fallback mode for airgap / unreachable setups), OR
  - A capacity-fitted subset auto-sized from the NGFW catalog's `routing_table_capacity_*` fields so a small SKU (e.g. FortiGate 40F) gets a meaningful test, not a futile 1M-route attempt.
- Carries zero data-plane traffic on this link — it exists purely to populate the NGFW RIB and FIB.
- Captures convergence time, NGFW resource usage, and final RIB/FIB depth into a new report annex (Annex L — Routing Table Stress).
The new stack is opt-in via topology.yaml (`bgp_stress.enabled: true`) and orthogonal to every other test type. It
can run standalone, or layered on top of a Branch Office /
Inspection Profile / SDWAN-OnRamp test for full-plane stress.
Operator UX — three-axis selection¶
Per the user's refinement on 2026-05-08, the dashboard test setup exposes the BGP stress test through three independent axes:
| Axis | Options |
|---|---|
| BGP enabled? | yes / no (default no) |
| AFI stack? | IPv4-only / IPv6-only / dual-stack (default dual-stack) |
| Routes to inject? | 100 / 1K / 10K / 100K / 1M / real Internet snapshot / fit to DUT capacity |
The dashboard pre-validates the (SKU, route count) combination
against the DUT catalog's routing_table_capacity_* fields and
warns/blocks any combination that is known to be infeasible
(e.g. selecting "real Internet snapshot" with a FortiGate 40F as
DUT triggers a hard warning + suggested fitted alternative).
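The (SKU, route count) pre-validation above can be sketched as follows. This is illustrative only — the function name and the 80% warn threshold are assumptions, not actual dashboard code:

```python
def capacity_check(requested_routes, rib_max):
    """Classify a (route count, catalog capacity) combination.

    rib_max is the SKU's routing_table_capacity value from the DUT
    catalog, or None when the vendor datasheet publishes no limit.
    """
    if rib_max is None:
        return "unknown"   # no catalog data -> cannot warn or block
    if requested_routes > rib_max:
        return "block"     # known infeasible, e.g. 1M routes on a 10K-RIB SKU
    if requested_routes > 0.8 * rib_max:
        return "warn"      # near the limit -> suggest capacity-fitted instead
    return "ok"
```

A "block" result is where the dashboard offers the capacity-fitted alternative described below.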
Static-route precedence — load-bearing safety¶
The bench's persona forwarding plane uses static routes to reach the persona VLANs (10.1.x.0/27 + 10.2.x.0/27, configured directly on the NGFW). When the BGP peer advertises an overlapping prefix — intentionally or by accident from a real Internet snapshot — the static route always wins by route preference (administrative distance 1 vs eBGP's 20 in Cisco terms, equivalent semantics in every other vendor):
Source Default admin distance
────────────────────────────── ────────────────────────
Connected interface 0
Static route 1 ← persona forwarding wins
eBGP                           20     ← BGP-injected routes lose
OSPF 110
RIP 120
This means persona forwarding is never impacted by BGP route flap, route load, or even a deliberately overlapping Internet prefix. Operators should not second-guess the test when overlap appears in the Annex L report.
This safety is documented in three operator-facing surfaces:
- Dashboard tooltip on the BGP stress section
- Annex L preface (every report includes the rule)
- Help Center entry "BGP stress and static-route precedence"
Value narrative — control plane vs data plane¶
This feature exists because the bottleneck of an NGFW is often control plane (routing memory) before it is data plane (throughput or TLS inspection capacity). Procurement teams sizing NGFWs by throughput alone routinely ship under-provisioned devices that crash, drop sessions, or refuse new routes the moment a real Internet table arrives via BGP. A FortiGate 40F has perfectly adequate throughput for a 100-user branch but 4 GB of total RAM — trying to carry the full Internet table on it is futile.
The marketing narrative for this feature: throughput-only benchmarks (Spirent, Ixia, Keysight) test how fast it forwards but not when it falls over. Customers procuring NGFWs for high-route-count edges (datacenter, transit, multi-homed enterprise edge) deserve defensible evidence of routing capacity headroom, not just inspection bandwidth. Use this exact framing in marketing site, sales decks, customer-facing reports, and the Help Center primer for this feature.
Architecture¶
Topology (delta vs current)¶
Existing data-plane stack (unchanged):
Agents ─→ NGFW ─→ Personas (synthetic + cloned)
(static routes — admin distance 1, always win)
New control-plane sidecar (added by bgp_stress.enabled):
┌──────────────────────┐ VLAN 40 (NEW) ┌────────────┐
│ Router Peer BGP │═══ DUAL-STACK BGP peering ══│ NGFW DUT │
│ (VyOS rolling, FRR) │ ⚡ no data traffic ⚡ │ │
│ AS 64496 │ IPv4: 192.0.2.252/30 │ AS 64497 │
│ 192.0.2.254/30 │ IPv6: 2001:db8:0:40::/126 │ │
│ 2001:db8:0:40::2/126│ AFIs: v4 + v6 unicast │ │
└──────────────────────┘ eBGP MP-BGP capability └────────────┘
│ advertises EITHER: │
│ • real Internet snapshot │
│ • synthetic prefixes │
│ • capacity-fitted subset │
│ eth0 (mgmt OOBI — Prometheus scrape) │
▼ ▼
k3s control plane k3s data plane
(unchanged from base topology)
VLAN 40 — dual-stack peering link¶
| Field | Value | Rationale |
|---|---|---|
| VLAN ID | 40 | Reserved gap between VLAN 30 (K6 agents) and VLAN 99 (SNMP mgmt). Easy to remember; no clash with existing 20/30/99/101-120/200-209. |
| IPv4 subnet | 192.0.2.252/30 | Last /30 of RFC 5737 TEST-NET-1. Documentation range, IANA-reserved, never appears in real Internet routing. |
| IPv4 NGFW | 192.0.2.253 | Lower /30 host |
| IPv4 VyOS | 192.0.2.254 | Upper /30 host |
| IPv6 subnet | 2001:db8:0:40::/126 | RFC 3849 documentation prefix; :0:40::/126 mirrors VLAN 40 in the network ID for human readability. |
| IPv6 NGFW | 2001:db8:0:40::1 | Lower /126 host |
| IPv6 VyOS | 2001:db8:0:40::2 | Upper /126 host |
| MTU | 1500 | Control plane only — no jumbo benefit |
| BPDU | guard + filter (per ADR 0009) | Same isolation rules as every other VLAN |
The user proposed /32 for the peer link. We choose /30 (IPv4) and
/126 (IPv6) instead for broadest vendor compatibility — every NGFW
we plan to support handles /30 and /126 natively, while /32 and
/128 interface configurations require special "unnumbered" or "host
route" handling that varies by vendor. Operators who insist on the
narrower variants can override the topology.yaml values, but /30
and /126 are the documented defaults.
Session model (per-AFI by default): VyOS / FRR establishes one BGP TCP
session per AFI by default (one over IPv4 transport carrying IPv4
unicast NLRI; one over IPv6 transport carrying IPv6 unicast NLRI).
This is the most vendor-portable behaviour. RFC 4760 MP-BGP allows
carrying both AFIs over a single session; bgp_stress.session_mode
in topology.yaml exposes the choice (per-afi default, multi-afi
opt-in for vendors that support it cleanly).
AS numbers¶
| Side | ASN | Source |
|---|---|---|
| VyOS Router Peer BGP | AS 64496 | RFC 5398 documentation (16-bit) |
| NGFW DUT | AS 64497 | RFC 5398 documentation (16-bit) |
eBGP (between two different ASes) was chosen over iBGP so the BGP best-path algorithm exercises full AS_PATH processing, which is more representative of real edge deployments. The iBGP variant remains available through a topology.yaml override for operators who want to characterise route-reflector workloads.
Route advertisement modes¶
The Router Peer BGP can advertise prefixes in three modes, chosen by the operator at test setup:
Mode 1 — Real Internet snapshot (default when reachable)¶
Source the routing table from a public BGP collector and replay it. This is the most realistic mode — the NGFW sees what an actual Internet edge router sees:
| Source | What it provides | Refresh cadence |
|---|---|---|
| University of Oregon RouteViews (archive.routeviews.org) | Daily MRT dumps from 30+ collectors; IPv4 + IPv6 | Daily 00:00 / 06:00 / 12:00 / 18:00 UTC |
| RIPE RIS (data.ris.ripe.net) | Real-time + archived; IPv4 + IPv6 | Every 8 hours archived; live BMP feed |
| bgp.tools community feed | Curated full-table snapshots | On request |
Approximate sizes (early 2026):
- IPv4 full table: ~950,000 prefixes
- IPv6 full table: ~190,000 prefixes
- Combined: ~1.14 M prefixes
The bench ships with a bundled snapshot (timestamped, e.g.
2026-04-01_routeviews2_full_v4.mrt, _v6.mrt) so airgap labs work
out of the box; operators with Internet connectivity from the
control plane can opt into a fresh weekly download. Snapshot files
are processed by bgpdump or bgpscanner into a deterministic
prefix list that the FRR config generator turns into BGP UPDATE
messages.
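A minimal sketch of the MRT-to-prefix-list step, assuming bgpdump's pipe-separated one-line (`-m`) output where the prefix is the sixth field of a TABLE_DUMP2 entry (the sample lines below are fabricated for illustration):

```python
def prefixes_from_bgpdump(lines):
    """Extract a de-duplicated, sorted prefix list from bgpdump -m
    output. Sorting makes the FRR config generation deterministic."""
    prefixes = set()
    for line in lines:
        fields = line.split("|")
        # TABLE_DUMP2|timestamp|B|peer_ip|peer_as|prefix|as_path|...
        if len(fields) > 5 and fields[0].startswith("TABLE_DUMP"):
            prefixes.add(fields[5])
    return sorted(prefixes)

sample = [
    "TABLE_DUMP2|1743465600|B|192.0.2.1|64500|203.0.113.0/24|64500 64501|IGP",
    "TABLE_DUMP2|1743465600|B|192.0.2.1|64500|198.51.100.0/24|64500|IGP",
    "TABLE_DUMP2|1743465600|B|192.0.2.9|64502|203.0.113.0/24|64502 64501|IGP",
]
```

Note that multiple collector peers announce the same prefix, so de-duplication is what brings the raw dump down to the ~950K unique IPv4 prefixes quoted above.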
Mode 2 — Synthetic prefixes (fallback for offline / count-tuned tests)¶
When the operator wants exact control over count or has no snapshot
available, the bench generates N deterministic non-overlapping
prefixes:
| Operator selection | Pool(s) — IPv4 | Pool(s) — IPv6 | Prefix size |
|---|---|---|---|
| 100 | 198.51.100.0/24 (RFC 5737) | 2001:db8:100::/40 (RFC 3849) | /27, /48 |
| 1,000 | 198.18.0.0/15 (RFC 2544) | 2001:db8:1000::/36 | /24, /48 |
| 10,000 | 198.18.0.0/15 | 2001:db8:1::/32 carved into /48s | /24, /48 |
| 100,000 | 100.64.0.0/10 (RFC 6598 CGNAT) | 2001:db8::/32 split | /24, /48 |
| 1,000,000 | 100.64.0.0/10 + 240.0.0.0/4 fallback | 2001:db8::/32 + 2001:db9::/32 (lab-only ext) | /24, /48 |
All IPv4 pools are syntactically public IPv4 addresses drawn from IANA-reserved ranges (documentation, benchmarking, CGNAT, "Class E" future use). All IPv6 pools draw from the RFC 3849 documentation prefix. None of these prefixes appears in real Internet routing — safe for the lab while satisfying the "use public-looking IPs" requirement.
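Deterministic generation can be sketched with the standard `ipaddress` module. The pool chaining shown here (spilling into 240.0.0.0/4 once the CGNAT range is exhausted, per the 1M row above) and the function name are illustrative, not the actual generator:

```python
import itertools
import ipaddress

# Ordered pools; a later pool is drawn from only when an earlier one
# is exhausted. 100.64.0.0/10 yields 16,384 /24s; 240.0.0.0/4 adds
# ~1M more, covering the 1M-route selection.
V4_POOLS = [("100.64.0.0/10", 24), ("240.0.0.0/4", 24)]

def synthetic_v4(n):
    """Return the first n deterministic, non-overlapping /24 prefixes."""
    def all_subnets():
        for pool, new_prefix in V4_POOLS:
            yield from ipaddress.ip_network(pool).subnets(new_prefix=new_prefix)
    return [str(net) for net in itertools.islice(all_subnets(), n)]
```

Because the pools and their order are fixed, the same N always produces the same prefix list — runs are comparable test-to-test.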
Mode 3 — Capacity-fitted (auto-sized to the DUT)¶
When the operator has selected an NGFW SKU as DUT, the dashboard
queries the catalog's routing_table_capacity_* fields and computes
a meaningful test scale:
fitted_v4_count = min(catalog.ipv4_rib_max_routes * 0.80, 1_000_000)
fitted_v6_count = min(catalog.ipv6_rib_max_routes * 0.80, 200_000)
The 80% factor leaves headroom so the test characterises behaviour near the limit without immediately overflowing — the failure-mode characterisation is a separate "stress to overflow" follow-up phase in the test workflow.
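The fitting rule, including the null fallback described under the catalog schema dependency below, can be sketched as (function name and dict shape are illustrative):

```python
def fitted_counts(catalog, headroom=0.80):
    """Compute capacity-fitted route counts from catalog fields.

    Returns None for an AFI whose catalog field is null (vendor
    datasheet publishes no limit) -> dashboard falls back to
    synthetic mode for that AFI.
    """
    def fit(rib_max, cap):
        if rib_max is None:
            return None
        return int(min(rib_max * headroom, cap))

    return (fit(catalog.get("ipv4_rib_max_routes"), 1_000_000),
            fit(catalog.get("ipv6_rib_max_routes"), 200_000))
```

For example, a 1.5M-route RIB hits the 1M cap, while a 10K-route SKU gets the 8,000-route test shown in the table below.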
Concrete examples:
| DUT SKU | Catalog ipv4_rib_max | fitted_v4_count | Outcome |
|---|---|---|---|
| FortiGate 40F | 10,000 | 8,000 | Tests SOHO branch realistic load |
| Cisco FTD CSF1230 | 50,000 | 40,000 | Tests mid-tier branch realistic load |
| Cisco FTD CSF6160 | 1,500,000 | 1,000,000 (cap) | Tests near full Internet table |
| Huawei USG12000-NF | 2,500,000 | 1,000,000 (cap) | Tests near full Internet table |
When the operator picks "real Internet snapshot" with a small SKU (e.g. FortiGate 40F), the dashboard warns and offers capacity-fitted as the recommended alternative.
Catalog schema dependency¶
This mode requires the NGFW catalog (platform/dut-catalog/*.yaml)
to carry routing-capacity fields. Schema extension proposed in a
catalog refresh PR (PR-A2 follow-up):
capabilities:
routing_table_capacity:
ipv4_rib_max_routes: 1500000 # software RIB limit (control plane)
ipv4_fib_max_routes: 800000 # hardware FIB limit (data plane TCAM)
ipv6_rib_max_routes: 250000
ipv6_fib_max_routes: 100000
Where vendor datasheets do not publish these numbers (common — most
publish only "thousands of routes" without specifying), the catalog
field uses null and the dashboard falls back to synthetic mode
with a warning that capacity-fitted is unavailable for this SKU.
Operator UX (Dashboard)¶
Test setup wizard gains a new section "Routing table stress". The user's three-axis selection becomes the top of the section; advanced options collapse below:
┌─ Routing table stress ──────────────────────────────────────────┐
│ │
│ ☑ Enable BGP route injection │
│ │
│ ── Axis 1 — AFI stack ─────────────────────────────────────────│
│ ◯ IPv4-only │
│ ◯ IPv6-only │
│ ● Dual-stack (RECOMMENDED — modern Internet edges carry both) │
│ │
│ ── Axis 2 — Route source ─────────────────────────────────────│
│ ● Real Internet snapshot (RouteViews / RIPE RIS) │
│ ↳ snapshot date: 2026-04-01 (latest bundled) [refresh ↻] │
│ ↳ ~950K IPv4 + ~190K IPv6 = ~1.14M prefixes │
│ ◯ Capacity-fitted (auto-size to selected DUT SKU) │
│ ↳ Selected DUT: Cisco FTD CSF6160 → fitted: 1M v4 + 200K v6 │
│ ◯ Synthetic (manual count) │
│ ↳ Count: ◯ 100 ◯ 1K ◯ 10K ◯ 100K ◯ 1M │
│ ↳ Prefix size: ◯ /24 v4 ⊕ /48 v6 ◯ /22 v4 ◯ /20 v4 │
│ │
│ ⚠ Capacity warning (auto-detect from selected DUT): │
│ "Cisco FTD CSF1230 cannot hold 1M routes (RIB max 50K). │
│ Recommended: 'Capacity-fitted' or synthetic 100/1K/10K." │
│ │
│ ▼ Advanced (click to expand) │
│ AS numbers: VyOS AS 64496 ↔ NGFW AS 64497 [edit] │
│ BGP session: ◯ per-AFI (default) ◯ multi-AFI (RFC 4760) │
│ Convergence: ☑ measure (default ON) │
│ Withdraw: ☐ flap routes mid-test (advanced) │
│ Auth: ☐ MD5 ☐ TCP-AO │
│ Graceful Rst: ☐ enable (off in MVP) │
└─────────────────────────────────────────────────────────────────┘
Topology.yaml schema extension¶
bgp_stress:
enabled: false # default off — opt-in feature
router: vyos # MVP vendor (FRR + BIRD planned)
vlan: 40
# Dual-stack peer link (IPv4 + IPv6 both required when enabled)
peer_link_v4_cidr: 192.0.2.252/30
ngfw_v4: 192.0.2.253
router_v4: 192.0.2.254
peer_link_v6_cidr: 2001:db8:0:40::/126
ngfw_v6: 2001:db8:0:40::1
router_v6: 2001:db8:0:40::2
# AS numbers (RFC 5398 documentation range)
router_asn: 64496
ngfw_asn: 64497
# AFI selection — defaults to dual-stack
afi_stack: dual # ipv4 | ipv6 | dual
session_mode: per-afi # per-afi (default) | multi-afi (RFC 4760)
# Route source — operator picks at test time, default below
route_source: real-internet # real-internet | capacity-fitted | synthetic
default_route_count: 10000 # only used when route_source == synthetic
snapshot_path: /opt/bgp-stress/snapshots/2026-04-01_routeviews2.mrt
snapshot_refresh: weekly # never | weekly | daily (requires Internet from CP)
The YAML loader (per ADR 0011) refuses combinations where:
- `bgp_stress.enabled: true` but `router_v4` and `ngfw_v4` are not in the same `/30` (or `/31` if explicitly chosen)
- `bgp_stress.enabled: true` and `afi_stack ∈ {ipv6, dual}` but `router_v6` and `ngfw_v6` are not in the same `/126` (or `/127`)
- `vlan` collides with any VLAN already used by the topology
- `router_asn == ngfw_asn` (eBGP only in MVP; iBGP is opt-in via a separate flag in a follow-up amendment)
- `route_source: real-internet` but `snapshot_path` does not exist on disk and `snapshot_refresh: never`
- `route_source: capacity-fitted` but no DUT SKU is selected for the test plan
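The peer-link subnet checks can be sketched with the standard `ipaddress` module (function name and error messages are illustrative, not the actual loader):

```python
import ipaddress

def validate_peer_link(cfg):
    """Reject a bgp_stress block whose peer addresses do not share the
    declared subnet, or whose ASNs would form iBGP. Raises ValueError
    on the first violation; returns silently on success."""
    if not cfg.get("enabled"):
        return
    v4 = ipaddress.ip_network(cfg["peer_link_v4_cidr"], strict=False)
    if not all(ipaddress.ip_address(cfg[k]) in v4 for k in ("ngfw_v4", "router_v4")):
        raise ValueError("ngfw_v4/router_v4 not inside peer_link_v4_cidr")
    if cfg.get("afi_stack", "dual") in ("ipv6", "dual"):
        v6 = ipaddress.ip_network(cfg["peer_link_v6_cidr"], strict=False)
        if not all(ipaddress.ip_address(cfg[k]) in v6 for k in ("ngfw_v6", "router_v6")):
            raise ValueError("ngfw_v6/router_v6 not inside peer_link_v6_cidr")
    if cfg["router_asn"] == cfg["ngfw_asn"]:
        raise ValueError("eBGP requires distinct ASNs in MVP")
```

The VLAN-collision and snapshot-path rules need the full topology and filesystem context, so they live in the loader proper rather than this self-contained check.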
Metrics + Annex L¶
From the NGFW (via SNMP / vendor MIB / API):
- `bgp.peer.state` (Idle / Connect / Active / Established)
- `bgp.peer.received_prefix_count`
- `bgp.peer.uptime_seconds`
- `routing.rib_entries_total`
- `routing.fib_entries_total`
- `cpu.utilization_percent` (during advertisement burst)
- `memory.used_mb` (idem)
- `bgp.convergence_time_ms` (advertise complete → all routes installed)
From the VyOS Router Peer BGP (via FRR vtysh JSON output):
- `advertised_prefixes_count`
- `peer_received_count` (sanity check both sides agree)
- `flap_count` (during withdraw-stress phase)
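One plausible derivation of `bgp.convergence_time_ms` from the polled prefix counter, assuming periodic SNMP/API samples (the sample data and function name are illustrative):

```python
def convergence_time_ms(samples, target):
    """Time from the first post-advertisement sample until
    received_prefix_count first reaches the expected total.

    samples: list of (timestamp_ms, received_prefix_count) tuples,
    oldest first. Returns None if the target was never reached
    (feeds the 'Failure point' section of Annex L).
    """
    t0 = samples[0][0]
    for ts, count in samples:
        if count >= target:
            return ts - t0
    return None
```

Polling granularity bounds the measurement error, so the scrape interval should be recorded alongside the metric.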
Annex L — Routing Table Stress layout:
- Settings — route count, prefix size, AS numbers, peer link
- Convergence — graph of `received_prefix_count` over time
- NGFW resources — CPU + memory timeline during burst
- Failure point — if route advertisement stalled or peer dropped
- Datasheet comparison — tested capacity vs vendor-claimed capacity (sourced from `dut-catalog/<vendor>.yaml`)
Three-language template (en, pt-BR, es) in line with every
other annex.
IBO integration¶
Per discuss_intelligent_bench_orchestrator_2026_05_08:
- S1 (Lab Resource Inventory) gains a new resource kind: `bgp-peer-router` (vendor: vyos, model: vyos-rolling, ASN, peer link)
- S2 (DUT Envelope Discovery) gains a new capability check: `routing_table_capacity_routes` (queried via SNMP `bgp4-mib` or the vendor-specific MIB; populated into the DUT catalog entry)
- S4 (Tier Mixer) treats BGP stress as orthogonal to the data-plane test types — it does not consume agent or persona bandwidth, so it can run in parallel with any other test type without contention. This is unique among test types and changes the scheduling model accordingly.
- S7 (BoM Generator) lists the Router Peer BGP appliance in the bill of materials when `bgp_stress.enabled: true`.
- S8 (Physical Layer Validator) validates the new VLAN 40 link.
Consequences¶
Positive¶
- Demonstrable differentiator: throughput-only competitors cannot reproduce this characterisation. Marketing narrative locked: "we don't just measure how fast it forwards — we measure when it falls over and whether the bottleneck is data plane or control plane."
- Procurement-changing: this test surfaces a hidden truth — procurement teams sizing NGFWs by throughput alone routinely ship under-provisioned devices that crash under real route load. The feature shifts purchasing decisions toward correctly-sized models for routing capacity in addition to forwarding capacity. This is the strongest commercial signal of any test type the bench ships.
- Real-world realism: the real-Internet-snapshot mode replays what an actual edge router sees. No other vendor (Spirent, Ixia, Keysight, NetSecOPEN) currently ships this in their DUT characterisation suites.
- Vendor-neutral methodology: works on every NGFW that supports BGP (every vendor in the locked DUT catalog scope of 10).
- Capacity-aware test fit: small SKUs (FortiGate 40F class) get a meaningful test rather than a pointless 1M-route attempt. Auto-fitting from the catalog removes guesswork.
- Static route precedence guarantee: persona forwarding is never impacted by BGP route flap, route load, or even a deliberately overlapping Internet prefix. Operators can run BGP stress alongside data-plane experiments and trust both results.
- Stack isolation keeps the new control-plane stack from perturbing data-plane experiments. CPU/memory graph timestamps line up against route-count graph for attribution.
- Aligns with the NetSecOPEN / RFC 9411 spirit of multi-dimensional benchmarking even though RFC 9411 doesn't itself prescribe routing-table tests (per `discuss_netsecopen_rfc9411`).
Negative / costs¶
- New stack to maintain: VyOS image build, FRR config templates, dual-stack manifest set, Annex L template in three languages, dashboard wizard section, per-vendor BGP config snippets (10 vendors × 3 management planes × IPv4 + IPv6).
- NGFW-side configuration is manual: `integration_tier: catalog-only` means we provide copy-pasteable BGP configs but the operator applies them on the NGFW. Operators less familiar with BGP need clear runbooks (covered by Help Center entry per `discuss_help_center_2026_05_08`).
- Memory footprint at full Internet table is significant: VyOS + FRR with 950K IPv4 + 190K IPv6 routes consumes roughly 3–5 GB RAM. The Router Peer BGP pod's resource requests must reflect this when `route_source: real-internet` or `default_route_count >= 1_000_000`. Smaller counts (≤100K) fit comfortably in 1 GB.
- Test duration grows with count: 1M+ routes convergence on a modest NGFW can take 5–10 minutes (dual-stack adds ~30%). The dashboard must surface expected duration estimates and not time out the test prematurely.
- Snapshot data freshness: the bundled MRT snapshot ages between refreshes. Operators on airgap labs run against a snapshot that may be 1–4 weeks old. Annex L always records snapshot timestamp for transparent reporting.
- Catalog re-scrape pass needed (PR-4): 54 NGFW SKUs across 4 current vendors plus the 6 vendors still to land (Fortinet, Palo Alto, Check Point, Sophos, Forcepoint, WatchGuard) need routing-capacity values. Many vendor datasheets do not publish these — see open question #8.
- MRT processing pipeline complexity: parsing RouteViews dumps into FRR-loadable config requires `bgpdump` or `bgpscanner` (third-party tools). Adds container build complexity.
Neutral¶
- VLAN 40 is a previously unused VLAN ID. No collision with the existing reserved bands (20/30/99/101-120/200-209). Future test types that need a new VLAN should pick from the 41-49 range to preserve the BGP-stress band.
- ASN choice (64496/64497) is arbitrary within RFC 5398. Operators in a lab where those exact ASNs already exist for other reasons can override via topology.yaml.
- Class E (240.0.0.0/4) is dropped by default by some vendor software. When the operator selects 1M routes, the dashboard warns: "1M routes pulls from RFC 1112 Class E reserved space. Some NGFWs reject Class E by default — see vendor config snippet."
Implementation roadmap¶
This ADR is PR-1 of a now-12-PR feature (expanded from the original 7 by the dual-stack + real-Internet-snapshot + capacity-aware refinements):
| PR | Scope | Estimate |
|---|---|---|
| PR-1 (this) | ADR + project memo + topology.yaml schema | ~400 LoC docs (with refinements) |
| PR-2 | VyOS container image + cloud-init template + BGP config generator | ~350 LoC |
| PR-3 | NetworkAttachmentDefinition VLAN 40 + manifests + namespace (dual-stack: v4 + v6) | ~250 LoC YAML |
| PR-4 | NGFW catalog schema extension — routing_table_capacity_* fields per SKU (54 NGFW SKUs need a re-scrape pass) | ~150 LoC schema + ~500 LoC YAML data |
| PR-5 | RouteViews / RIPE RIS snapshot ingestion pipeline (download / verify / convert MRT → FRR config) | ~300 LoC |
| PR-6 | Bundled snapshot artefact build pipeline + CI job (refresh weekly) | ~150 LoC + GitHub Actions workflow |
| PR-7 | Dashboard UI — three-axis "Routing table stress" section + capacity warning logic | ~400 LoC TSX |
| PR-8 | Metrics scraping (BGP4-MIB + vendor MIBs) + Annex L report template (3 languages) | ~300 LoC |
| PR-9 | NGFW-side BGP config snippets per vendor (10 vendors × 3 planes × IPv4/IPv6) | ~600 LoC docs |
| PR-10 | E2E integration test using BIRD or ExaBGP as a mock NGFW peer (dual-stack) | ~200 LoC |
| PR-11 | Help Center entry "BGP routing table stress test" (3 langs) + 2-min video tutorial | ~250 LoC docs + Mux video pipeline |
| PR-12 | Marketing copy update (per project_marketing_site_obligation) — control-plane-vs-data-plane narrative on tlsstress-art.com (5 langs) | ~200 LoC docs |
| Total | ~4,000 LoC across 12 PRs | ~5-6 week sprint |
Suggested target: v4.8 still feasible if the schema work (PR-4) parallelises with snapshot pipeline (PR-5/6); these have no runtime dependency on each other.
Open questions¶
These do not block ADR acceptance — captured for later resolution during PR-2..PR-12.
- `/30` vs `/31` peer link — `/31` (RFC 3021) is more efficient but historically less universally supported by NGFW vendors. Default `/30`; track per-vendor `/31` support in the DUT catalog via a follow-up `bgp_p2p_31_supported` capability flag. Same logic for `/126` vs `/127` IPv6.
- Graceful restart / route refresh — enabled by default would be more realistic but masks raw advertisement-burst behaviour. Default OFF; enable via dashboard advanced options.
- MD5 / TCP-AO authentication — production realism feature. Default OFF; enable via dashboard advanced options.
- BGP communities — additional control-plane stress (each community attribute is extra processing per route). Real Internet snapshots include extensive community attributes (ISP-tagged, regional, etc.) — preserve them on replay or strip? Default PRESERVE (more realistic stress); strip option in advanced.
- Snapshot freshness vs reproducibility — daily refresh is technically feasible but breaks test-to-test comparison (different prefixes between runs). Default to bundled weekly-refreshed snapshot with operator opt-in for "always latest". Mark Annex L results with the snapshot timestamp so comparisons stay meaningful.
- Snapshot source provenance — RouteViews, RIPE RIS, and bgp.tools each have slightly different views (different peer sets, different propagation delays). MVP defaults to the RouteViews `route-views2` collector for IPv4 + IPv6 — single, well-known source. Multi-source diff is a follow-up.
- Route flap intensity in withdraw-stress mode — default "1% of routes flapped every 30s" was chosen as a mild starting point. Configurable via dashboard advanced options.
- Catalog routing-capacity field provenance — most vendor datasheets do not publish FIB / RIB hard limits. Per-SKU values need to come from: (a) datasheet where available, (b) vendor documentation portals (admin guides), (c) test bench measurement (the bench can characterise the limit empirically as a side effect of running this test, then back-port that finding to the catalog YAML).
- Capacity-fitted 80% headroom — pulled from a reasonable default; measure-driven tuning may show that 70% (more conservative) or 90% (more aggressive) is a better operating point per vendor family. Track via Annex L results.
References¶
- RFC 4760 — Multiprotocol Extensions for BGP-4 (MP-BGP — IPv4 + IPv6 on a single session, used for `session_mode: multi-afi`)
- RFC 5398 — Autonomous System (AS) Number Reservation for Documentation Use (defines AS 64496-64511 and AS 65536-65551)
- RFC 5737 — IPv4 Address Blocks Reserved for Documentation (192.0.2.0/24, 198.51.100.0/24, 203.0.113.0/24)
- RFC 6598 — IANA-Reserved IPv4 Prefix for Shared Address Space (100.64.0.0/10, the CGNAT range)
- RFC 3849 — IPv6 Address Prefix Reserved for Documentation (2001:db8::/32 — used for synthetic IPv6 prefixes + IPv6 peer link)
- RFC 2544 — Benchmarking Methodology for Network Interconnect Devices (198.18.0.0/15)
- RFC 1112 — Host Extensions for IP Multicasting (defines Class E 240.0.0.0/4 as "reserved for future use")
- RFC 9411 — Benchmarking Methodology for Network Security Device Performance (NetSecOPEN; vocabulary alignment for the Annex L report)
- RFC 6396 — Multi-Threaded Routing Toolkit (MRT) Routing Information Export Format (snapshot file format consumed by the ingestion pipeline)
- University of Oregon RouteViews Project — http://archive.routeviews.org/ — public BGP snapshots, default source for the real-Internet-snapshot mode
- RIPE RIS — https://www.ripe.net/analyse/internet-measurements/routing-information-service-ris — alternative public BGP snapshot source
- ADR 0009 — L2 BPDU isolation (BPDU rules apply to VLAN 40 too)
- ADR 0011 — Topology axes (this ADR adds an orthogonal axis)
- `project_bgp_routing_table_saturation_2026_05_08` (memory) — including the four 2026-05-08 follow-up refinements
- `discuss_intelligent_bench_orchestrator_2026_05_08` (memory) — IBO integration points
- `discuss_vyos_observability` (memory) — observability pattern re-used for the new VyOS appliance
- `discuss_ipv6_dualstack_testing` (memory) — Phase 1 of dual-stack rollout (this ADR gets dual-stack support promoted to MVP)
- `discuss_vpn_ipsec_simulation` (memory) — iPerf3 agent deployment (still pending; tracked separately, this ADR does not depend on it)
Amendment 2 — VLAN 40 conflict; switch to VLAN 2809¶
- Date: 2026-05-09
- Status: Accepted (correction)
- Supersedes: every reference to "VLAN 40" in the original ADR text and Amendment 1 above
What went wrong¶
The original ADR specified VLAN 40 for the BGP peer link, with a justification that read "[VLAN 40 is a previously unused VLAN ID. No collision with the existing reserved bands ...]". That claim was wrong. VLAN 40 was already in PRODUCTION use by the Cloner ISP egress at the time the ADR was written:
- `k8s/80-cloner-nad.yaml` declares `master: eth1.40` (Cloner ISP NetworkAttachmentDefinition)
- `k8s/81-cloner-deployment.yaml` references the cloner-isp NAD on its `net1` macvlan attachment
- The CHANGELOG and README document VLAN 40 as the Cloner ISP egress (Cloner pod → eth1.40 → Nexus 9000 → upstream router → Internet)
Putting BGP peering and Cloner ISP egress on the same VLAN would be a security boundary violation (two unrelated control flows sharing a broadcast domain) and a debugging nightmare. The mistake was caught during PR-3 (K8s manifests) review by the project owner, after PR-3 had already merged into main.
Correction¶
The BGP peer link uses VLAN 2809 (new, dedicated, never shared). Rationale for choosing 2809:
- Far outside any current reservation (1, 10, 20, 30, 40, 99, 101–120, 200–209)
- Far outside any reasonable future reservation (the project grows VLAN IDs in <= 3-digit ranges; 2809 is in the 4-digit range that nothing else in the project will plausibly enter)
- 4-digit IDs are valid 802.1Q (range 1..4094)
- Memorable + project-owner-chosen
Affected files¶
The following files use VLAN 2809 (not VLAN 40):
- `k8s/dut/41-vlan2809-bgp-peer-nad.yaml` (renamed from the previous file with VLAN 40 in its filename)
- `k8s/23-bgp-router-peer-deployment.yaml` (Multus annotation references `bgp-peer-vlan2809`)
- `k8s/33-bgp-router-peer-network-policy.yaml` (comments)
The IPv6 prefix is also updated to mirror the new VLAN ID:
- IPv6 subnet: `2001:db8:0:2809::/126` (was `2001:db8:0:40::/126`)
- NGFW IPv6: `2001:db8:0:2809::1`
- Router IPv6: `2001:db8:0:2809::2`
The IPv4 subnet stays as 192.0.2.252/30 (RFC 5737 documentation
prefix; not VLAN-tied).
Lesson learned¶
Any future ADR that proposes a VLAN ID MUST include an explicit audit pass against:
- `git grep -E "vlan[_ -]?<id>|eth[0-9]+\.<id>"` across the whole repo
- The project's VLAN-allocation table in CLAUDE.md and docs/ARCHITECTURE.md
before the ADR is accepted. The original ADR (and its Amendment 1) performed neither check.
Original ADR text — historical context preserved¶
The body of this ADR (and Amendment 1) above intentionally retains
the "VLAN 40" wording as written, so anyone reading this document
in the future can see what was originally proposed. All
runtime/manifest/code references to VLAN 40 in the BGP feature
have been corrected to VLAN 2809. The project memo
project_bgp_routing_table_saturation_2026_05_08 and any future
documentation must use VLAN 2809.