
ADR 0012 — BGP Routing Table Saturation Test

  • Status: Proposed
  • Date: 2026-05-08
  • Deciders: TLSStress.Art project
  • Targets: v4.8 (this ADR is the design lock; implementation follows in PR-2..PR-12 of this feature, a ~5-6 week sprint, after the dual-stack + real-Internet-snapshot + capacity-aware refinements landed on 2026-05-08)

IP migration v4.3 carve-out: this ADR documents a deliberate design choice to use RFC 5737 TEST-NET-1 (192.0.2.252/30) for the BGP peer link itself (NOT for user-facing explanations). Per project_ip_addressing_v43, the testbed otherwise uses real RNP public IPs (200.130.0.0/30). Future ADRs may revise the peer-link choice; this one preserves the historical audit trail.

Context

The bench's existing test types (Branch Office, Inspection Profile, SDWAN and Cloud On-Ramp, etc.) all stress the data plane: TLS inspection throughput, latency, concurrent sessions, application-mix workload. Routing has been static throughout: the NGFW (DUT) sees a small handful of pre-defined routes (one per persona VLAN, plus the agent VLANs).

Real-world enterprise NGFWs sit at the boundary of networks that consume sizeable routing tables:

  • A branch office typically learns 50–500 routes via dynamic routing from the WAN edge (BGP, OSPF, or both).
  • A mid-size enterprise edge carries 10,000–50,000 routes — partial Internet view via BGP from one or more upstream ISPs.
  • A service-provider edge or datacenter perimeter can carry the full Internet table — about 950,000 IPv4 routes as of early 2026, growing ~40K/year.

The NGFW's behaviour under that route load is rarely characterised in vendor datasheets. Vendors publish FIB and RIB capacity numbers (usually generous round numbers), but they do not publish:

  • Convergence time as a function of route count
  • CPU + memory utilisation during the advertise burst
  • Behaviour at the limit (graceful degradation with a maximum-prefix policy? slowdown? hard crash? silent drop of new routes? route flap?)
  • Stability under churn (advertise → withdraw → re-advertise)
  • Interaction with TLS inspection (does the inspection pipeline stall or drop sessions while the routing daemon is processing 10K updates/s?)

Operators procuring NGFWs for high-route-count edges are flying blind on these dimensions. A test methodology that can demonstrably populate the FIB to a target depth, measure convergence time, and expose vendor-specific failure modes would be a strong differentiator versus throughput-only competitors.

Decision

Add a dedicated control-plane stress stack to the bench that:

  1. Stands up a new VyOS appliance ("Router Peer BGP") on a new VLAN 40, sharing only an L2 BGP-peering link with the NGFW.
  2. Establishes dual-stack eBGP (IPv4 + IPv6 simultaneously) with the NGFW using documentation ASNs (RFC 5398).
  3. Advertises one of:
       • a real Internet routing-table snapshot sourced from RouteViews / RIPE RIS (default mode — ~950K IPv4 prefixes + ~190K IPv6 prefixes as of early 2026),
       • N synthesised prefixes where N ∈ {100, 1K, 10K, 100K, 1M} (fallback mode for airgap / unreachable setups), or
       • a capacity-fitted subset auto-sized from the NGFW catalog's routing_table_capacity_* fields, so a small SKU (e.g. FortiGate 40F) gets a meaningful test rather than a futile 1M-route attempt.
  4. Carries zero data-plane traffic on this link — it exists purely to populate the NGFW RIB and FIB.
  5. Captures convergence time, NGFW resource usage, and final RIB/FIB depth into a new report annex (Annex L — Routing Table Stress).

The new stack is opt-in via topology.yaml (bgp_stress.enabled: true) and orthogonal to every other test type. It can run standalone, or layered on top of a Branch Office / Inspection Profile / SDWAN-OnRamp test for full-plane stress.

Operator UX — three-axis selection

Per the user's refinement on 2026-05-08, the dashboard test setup exposes the BGP stress test through three independent axes:

Axis                Options
─────────────────   ──────────────────────────────────────────────────────────────────────────
BGP enabled?        yes / no (default no)
AFI stack?          IPv4-only / IPv6-only / dual-stack (default dual-stack)
Routes to inject?   100 / 1K / 10K / 100K / 1M / real Internet snapshot / fit to DUT capacity

The dashboard pre-validates the (SKU, route count) combination against the DUT catalog's routing_table_capacity_* fields and warns/blocks any combination that is known to be infeasible (e.g. selecting "real Internet snapshot" with a FortiGate 40F as DUT triggers a hard warning + suggested fitted alternative).
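For concreteness, a minimal sketch of that pre-validation in Python. The field names follow the routing_table_capacity schema proposed under Mode 3 below; the warn/block thresholds are assumptions, not locked values.

from dataclasses import dataclass
from typing import Optional

@dataclass
class RoutingCapacity:
    ipv4_rib_max_routes: Optional[int]    # None when the vendor datasheet is silent

REAL_SNAPSHOT_V4 = 950_000                # ~early-2026 IPv4 full table

def validate_route_selection(cap: RoutingCapacity, requested_v4: int) -> str:
    """Return 'ok', 'warn', or 'block' for the dashboard wizard (illustrative)."""
    if cap.ipv4_rib_max_routes is None:
        return "warn"                     # unknown capacity: suggest synthetic mode
    if requested_v4 > cap.ipv4_rib_max_routes:
        return "block"                    # infeasible: offer the capacity-fitted alternative
    if requested_v4 > 0.80 * cap.ipv4_rib_max_routes:
        return "warn"                     # inside the headroom band near the limit
    return "ok"

# A FortiGate 40F (RIB max 10K) asked for the real Internet snapshot:
print(validate_route_selection(RoutingCapacity(10_000), REAL_SNAPSHOT_V4))   # -> block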

Static-route precedence — load-bearing safety

The bench's persona forwarding plane uses static routes to reach the persona VLANs (10.1.x.0/27 + 10.2.x.0/27, configured directly on the NGFW). When the BGP peer advertises an overlapping prefix — intentionally or by accident from a real Internet snapshot — the static route always wins by route preference (administrative distance 1 vs eBGP's 20 in Cisco terms; other vendors implement equivalent semantics):

Source                           Default admin distance
──────────────────────────────   ────────────────────────
Connected interface              0
Static route                     1     ← persona forwarding wins
eBGP                            20     ← BGP-injected routes lose
OSPF                           110
RIP                            120

This means persona forwarding is never impacted by BGP route flap, route load, or even a deliberately overlapping Internet prefix. Operators should not second-guess the test when overlap appears in the Annex L report.

This safety is documented in three operator-facing surfaces:

  1. Dashboard tooltip on the BGP stress section
  2. Annex L preface (every report includes the rule)
  3. Help Center entry "BGP stress and static-route precedence"

Value narrative — control plane vs data plane

This feature exists because the bottleneck of an NGFW is often the control plane (routing memory) before it is the data plane (throughput or TLS inspection capacity). Procurement teams sizing NGFWs by throughput alone routinely ship under-provisioned devices that crash, drop sessions, or refuse new routes the moment a real Internet table arrives via BGP. A FortiGate 40F has perfectly adequate throughput for a 100-user branch but only 4 GB of total RAM — trying to carry the full Internet table on it is futile.

The marketing narrative for this feature: throughput-only benchmarks (Spirent, Ixia, Keysight) test how fast it forwards, not when it falls over. Customers procuring NGFWs for high-route-count edges (datacenter, transit, multi-homed enterprise edge) deserve defensible evidence of routing-capacity headroom, not just inspection bandwidth. Use this exact framing on the marketing site, in sales decks, customer-facing reports, and the Help Center primer for this feature.

Architecture

Topology (delta vs current)

Existing data-plane stack (unchanged):
  Agents ─→ NGFW ─→ Personas (synthetic + cloned)
          (static routes — admin distance 1, always win)

New control-plane sidecar (added by bgp_stress.enabled):
  ┌──────────────────────┐  VLAN 40 (NEW)              ┌────────────┐
  │  Router Peer BGP     │═══ DUAL-STACK BGP peering ══│  NGFW DUT  │
  │  (VyOS rolling, FRR) │   ⚡ no data traffic ⚡     │            │
  │  AS 64496            │   IPv4: 192.0.2.252/30      │  AS 64497  │
  │  192.0.2.254/30      │   IPv6: 2001:db8:0:40::/126 │            │
  │  2001:db8:0:40::2/126│   AFIs: v4 + v6 unicast     │            │
  └──────────────────────┘   eBGP MP-BGP capability    └────────────┘
       │                     advertises EITHER:                │
       │                     • real Internet snapshot          │
       │                     • synthetic prefixes              │
       │                     • capacity-fitted subset          │
       │ eth0 (mgmt OOBI — Prometheus scrape)                  │
       ▼                                                       ▼
  k3s control plane                                  k3s data plane
                                          (unchanged from base topology)
  • VLAN ID: 40. Reserved gap between VLAN 30 (K6 agents) and VLAN 99 (SNMP mgmt); easy to remember; no clash with existing 20/30/99/101-120/200-209.
  • IPv4 subnet: 192.0.2.252/30. Last /30 of RFC 5737 TEST-NET-1; documentation range, IANA-reserved, never appears in real Internet routing.
  • IPv4 NGFW: 192.0.2.253 (lower /30 host)
  • IPv4 VyOS: 192.0.2.254 (upper /30 host)
  • IPv6 subnet: 2001:db8:0:40::/126. RFC 3849 documentation prefix; :0:40::/126 mirrors VLAN 40 in the network ID for human readability.
  • IPv6 NGFW: 2001:db8:0:40::1 (lower /126 host)
  • IPv6 VyOS: 2001:db8:0:40::2 (upper /126 host)
  • MTU: 1500. Control plane only — no jumbo benefit.
  • BPDU: guard + filter (per ADR 0009). Same isolation rules as every other VLAN.

The user proposed /32 for the peer link. We choose /30 (IPv4) and /126 (IPv6) instead for broadest vendor compatibility — every NGFW we plan to support handles /30 and /126 natively, while /32 and /128 interface configurations require special "unnumbered" or "host route" handling that varies by vendor. Operators who insist on the narrower variants can override the topology.yaml values, but /30 and /126 are the documented defaults.

Session model (per-AFI vs multi-AFI): VyOS / FRR establishes one BGP TCP session per AFI by default (one over IPv4 transport carrying IPv4 unicast NLRI; one over IPv6 transport carrying IPv6 unicast NLRI). This is the most vendor-portable behaviour. RFC 4760 MP-BGP also allows carrying both AFIs over a single session; bgp_stress.session_mode in topology.yaml exposes the choice (per-afi default, multi-afi opt-in for vendors that support it cleanly).
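For illustration, a minimal sketch of how a config generator could emit the two modes. The emitted statements are standard FRR BGP syntax; the generator function itself, its hard-coded peer addresses, and its defaults are placeholders, not the bench's actual generator.

# Illustrative generator for the two session modes (not the bench's real code).
def frr_bgp_config(session_mode: str = "per-afi") -> str:
    lines = ["router bgp 64496",
             " neighbor 192.0.2.253 remote-as 64497"]
    if session_mode == "per-afi":
        # Second TCP session over IPv6 transport for the IPv6 AFI.
        lines += [" neighbor 2001:db8:0:40::1 remote-as 64497",
                  " address-family ipv4 unicast",
                  "  neighbor 192.0.2.253 activate",
                  " exit-address-family",
                  " address-family ipv6 unicast",
                  "  neighbor 2001:db8:0:40::1 activate",
                  " exit-address-family"]
    else:  # multi-afi: both AFIs on the single IPv4-transport session (RFC 4760)
        lines += [" address-family ipv4 unicast",
                  "  neighbor 192.0.2.253 activate",
                  " exit-address-family",
                  " address-family ipv6 unicast",
                  "  neighbor 192.0.2.253 activate",
                  " exit-address-family"]
    return "\n".join(lines)

print(frr_bgp_config("multi-afi"))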

AS numbers

Side                   ASN        Source
────────────────────   ────────   ────────────────────────────────
VyOS Router Peer BGP   AS 64496   RFC 5398 documentation (16-bit)
NGFW DUT               AS 64497   RFC 5398 documentation (16-bit)

eBGP (between two different ASes) was chosen over iBGP so the BGP best-path algorithm exercises full AS_PATH processing, which is more representative of real edge deployments. An iBGP variant remains available through a topology.yaml override for operators who want to characterise route-reflector workloads.

Route advertisement modes

The Router Peer BGP can advertise prefixes in three modes, chosen by the operator at test setup:

Mode 1 — Real Internet snapshot (default when reachable)

Source the routing table from a public BGP collector and replay it. This is the most realistic mode — the NGFW sees what an actual Internet edge router sees:

  • University of Oregon RouteViews (archive.routeviews.org): daily MRT dumps from 30+ collectors, IPv4 + IPv6; refreshed 00:00 / 06:00 / 12:00 / 18:00 UTC.
  • RIPE RIS (data.ris.ripe.net): real-time + archived views, IPv4 + IPv6; archives every 8 hours, plus a live BMP feed.
  • bgp.tools community feed: curated full-table snapshots; on request.

Approximate sizes (early 2026):

  • IPv4 full table: ~950,000 prefixes
  • IPv6 full table: ~190,000 prefixes
  • Combined: ~1.14 M prefixes

The bench ships with a bundled snapshot (timestamped, e.g. 2026-04-01_routeviews2_full_v4.mrt, _v6.mrt) so airgap labs work out of the box; operators with Internet connectivity from the control plane can opt into a fresh weekly download. Snapshot files are processed by bgpdump or bgpscanner into a deterministic prefix list that the FRR config generator turns into BGP UPDATE messages.
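A sketch of that snapshot-to-prefix-list step, assuming bgpdump's machine-readable output (bgpdump -m prints one pipe-delimited line per RIB entry, with the prefix in field 5); the path and dedup policy are illustrative.

import subprocess

def prefixes_from_mrt(mrt_path: str) -> list[str]:
    """Parse an MRT RIB dump into a deterministic, de-duplicated prefix list."""
    out = subprocess.run(["bgpdump", "-m", mrt_path],
                         capture_output=True, text=True, check=True).stdout
    seen: set[str] = set()
    for line in out.splitlines():
        fields = line.split("|")
        if len(fields) > 5:
            seen.add(fields[5])           # the prefix column, e.g. "203.0.113.0/24"
    return sorted(seen)                   # stable order keeps generated configs reproducible

# prefixes = prefixes_from_mrt("/opt/bgp-stress/snapshots/2026-04-01_routeviews2_full_v4.mrt")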

Mode 2 — Synthetic prefixes (fallback for offline / count-tuned tests)

When the operator wants exact control over count or has no snapshot available, the bench generates N deterministic non-overlapping prefixes:

Selection   IPv4 pool(s)                            IPv6 pool(s)                                   Prefix sizes (v4, v6)
─────────   ─────────────────────────────────────   ────────────────────────────────────────────   ─────────────────────
100         198.51.100.0/24 (RFC 5737)              2001:db8:100::/40 (RFC 3849)                   /27, /48
1,000       198.18.0.0/15 (RFC 2544)                2001:db8:1000::/36                             /24, /48
10,000      198.18.0.0/15                           2001:db8:1::/32 carved into /48s               /24, /48
100,000     100.64.0.0/10 (RFC 6598 CGNAT)          2001:db8::/32 split                            /24, /48
1,000,000   100.64.0.0/10 + 240.0.0.0/4 fallback    2001:db8::/32 + 2001:db9::/32 (lab-only ext)   /24, /48

All IPv4 pools are public-looking addresses drawn from IANA-reserved ranges (documentation, benchmarking, CGNAT, "Class E future use"). All IPv6 pools draw from the RFC 3849 documentation prefix, plus the lab-only 2001:db9::/32 extension for the 1M case. None of these prefixes can clash with a real Internet route, so the pools are safe for the lab while still satisfying the "use public-looking IPs" requirement.
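A minimal sketch of the deterministic generator, using the stdlib ipaddress module; the pool ordering and the uniform /24 carve are assumptions for illustration, not the locked algorithm (the table above uses /27s for the 100-route case).

import ipaddress
from itertools import chain, islice

V4_POOLS = ["198.18.0.0/15", "100.64.0.0/10", "240.0.0.0/4"]  # RFC 2544, RFC 6598, Class E

def synthetic_v4(n: int, new_prefix: int = 24) -> list[str]:
    """Carve n non-overlapping prefixes out of the pools, in a fixed order."""
    subnets = chain.from_iterable(
        ipaddress.ip_network(p).subnets(new_prefix=new_prefix) for p in V4_POOLS)
    out = [str(s) for s in islice(subnets, n)]
    if len(out) < n:
        raise ValueError(f"pools exhausted at {len(out)} prefixes")
    return out

print(synthetic_v4(3))  # ['198.18.0.0/24', '198.18.1.0/24', '198.18.2.0/24']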

Mode 3 — Capacity-fitted (auto-sized to the DUT)

When the operator has selected an NGFW SKU as DUT, the dashboard queries the catalog's routing_table_capacity_* fields and computes a meaningful test scale:

fitted_v4_count = min(catalog.ipv4_rib_max_routes * 0.80, 1_000_000)
fitted_v6_count = min(catalog.ipv6_rib_max_routes * 0.80, 200_000)

The 80% factor leaves headroom so the test characterises behaviour near the limit without immediately overflowing — the failure-mode characterisation is a separate "stress to overflow" follow-up phase in the test workflow.

Concrete examples:

DUT SKU              Catalog ipv4_rib_max   fitted_v4_count   Outcome
──────────────────   ────────────────────   ───────────────   ──────────────────────────────
FortiGate 40F        10,000                 8,000             realistic SOHO-branch load
Cisco FTD CSF1230    50,000                 40,000            realistic mid-tier branch load
Cisco FTD CSF6160    1,500,000              1,000,000 (cap)   near the full Internet table
Huawei USG12000-NF   2,500,000              1,000,000 (cap)   near the full Internet table

When the operator picks "real Internet snapshot" with a small SKU (e.g. FortiGate 40F), the dashboard warns and offers capacity-fitted as the recommended alternative.

Catalog schema dependency

This mode requires the NGFW catalog (platform/dut-catalog/*.yaml) to carry routing-capacity fields. Schema extension proposed in a catalog refresh PR (PR-A2 follow-up):

capabilities:
  routing_table_capacity:
    ipv4_rib_max_routes: 1500000   # software RIB limit (control plane)
    ipv4_fib_max_routes: 800000    # hardware FIB limit (data plane TCAM)
    ipv6_rib_max_routes: 250000
    ipv6_fib_max_routes: 100000

Where vendor datasheets do not publish these numbers (common — most publish only "thousands of routes" without specifying), the catalog field uses null and the dashboard falls back to synthetic mode with a warning that capacity-fitted is unavailable for this SKU.
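A sketch of the fitted-count computation including that null fallback. Field names follow the proposed schema; the IPv6 example value for the FortiGate 40F class is hypothetical.

from typing import Optional

def fitted_counts(ipv4_rib_max: Optional[int], ipv6_rib_max: Optional[int],
                  headroom: float = 0.80) -> Optional[tuple[int, int]]:
    """Return (fitted_v4, fitted_v6), or None to fall back to synthetic mode."""
    if ipv4_rib_max is None or ipv6_rib_max is None:
        return None                                  # datasheet silent: warn + synthetic
    return (min(int(ipv4_rib_max * headroom), 1_000_000),
            min(int(ipv6_rib_max * headroom), 200_000))

print(fitted_counts(10_000, 2_000))        # FortiGate 40F class -> (8000, 1600)
print(fitted_counts(1_500_000, 300_000))   # CSF6160 class       -> (1000000, 200000)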

Operator UX (Dashboard)

Test setup wizard gains a new section "Routing table stress". The user's three-axis selection becomes the top of the section; advanced options collapse below:

┌─ Routing table stress ──────────────────────────────────────────┐
│                                                                 │
│ ☑ Enable BGP route injection                                    │
│                                                                 │
│ ── Axis 1 — AFI stack ─────────────────────────────────────────│
│ ◯ IPv4-only                                                    │
│ ◯ IPv6-only                                                    │
│ ● Dual-stack  (RECOMMENDED — modern Internet edges carry both) │
│                                                                 │
│ ── Axis 2 — Route source ─────────────────────────────────────│
│ ● Real Internet snapshot   (RouteViews / RIPE RIS)              │
│   ↳ snapshot date: 2026-04-01 (latest bundled)  [refresh ↻]    │
│   ↳ ~950K IPv4 + ~190K IPv6 = ~1.14M prefixes                  │
│ ◯ Capacity-fitted          (auto-size to selected DUT SKU)      │
│   ↳ Selected DUT: Cisco FTD CSF6160 → fitted: 1M v4 + 200K v6  │
│ ◯ Synthetic                (manual count)                       │
│   ↳ Count: ◯ 100  ◯ 1K  ◯ 10K  ◯ 100K  ◯ 1M                   │
│   ↳ Prefix size: ◯ /24 v4 ⊕ /48 v6  ◯ /22 v4  ◯ /20 v4         │
│                                                                 │
│ ⚠ Capacity warning (auto-detect from selected DUT):            │
│   "Cisco FTD CSF1230 cannot hold 1M routes (RIB max 50K).      │
│    Recommended: 'Capacity-fitted' or synthetic 100/1K/10K."     │
│                                                                 │
│ ▼ Advanced (click to expand)                                    │
│   AS numbers:    VyOS AS 64496  ↔  NGFW AS 64497  [edit]       │
│   BGP session:   ◯ per-AFI (default)  ◯ multi-AFI (RFC 4760)   │
│   Convergence:   ☑ measure (default ON)                        │
│   Withdraw:      ☐ flap routes mid-test (advanced)              │
│   Auth:          ☐ MD5    ☐ TCP-AO                              │
│   Graceful Rst:  ☐ enable (off in MVP)                          │
└─────────────────────────────────────────────────────────────────┘

Topology.yaml schema extension

bgp_stress:
  enabled: false                       # default off — opt-in feature
  router: vyos                         # MVP vendor (FRR + BIRD planned)
  vlan: 40

  # Dual-stack peer link (IPv4 + IPv6 both required when enabled)
  peer_link_v4_cidr: 192.0.2.252/30
  ngfw_v4: 192.0.2.253
  router_v4: 192.0.2.254
  peer_link_v6_cidr: 2001:db8:0:40::/126
  ngfw_v6: 2001:db8:0:40::1
  router_v6: 2001:db8:0:40::2

  # AS numbers (RFC 5398 documentation range)
  router_asn: 64496
  ngfw_asn: 64497

  # AFI selection — defaults to dual-stack
  afi_stack: dual                      # ipv4 | ipv6 | dual
  session_mode: per-afi                # per-afi (default) | multi-afi (RFC 4760)

  # Route source — operator picks at test time, default below
  route_source: real-internet          # real-internet | capacity-fitted | synthetic
  default_route_count: 10000           # only used when route_source == synthetic
  snapshot_path: /opt/bgp-stress/snapshots/2026-04-01_routeviews2.mrt
  snapshot_refresh: weekly             # never | weekly | daily (requires Internet from CP)

The YAML loader (per ADR 0011) refuses combinations where:

  • bgp_stress.enabled: true but router_v4 and ngfw_v4 are not in the same /30 (or /31 if explicitly chosen)
  • bgp_stress.enabled: true and afi_stack ∈ {ipv6, dual} but router_v6 and ngfw_v6 are not in the same /126 (or /127)
  • vlan collides with any VLAN already used by the topology
  • router_asn == ngfw_asn (eBGP only in MVP; iBGP is opt-in via separate flag in a follow-up amendment)
  • route_source: real-internet but snapshot_path does not exist on disk and snapshot_refresh: never
  • route_source: capacity-fitted but no DUT SKU is selected for the test plan
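A minimal sketch of the first two subnet checks plus the ASN check, using the stdlib ipaddress module; the function shape and error strings are illustrative, not the loader's actual API.

import ipaddress

def check_peer_link(cfg: dict) -> list[str]:
    """Return a list of validation errors (empty when the config is acceptable)."""
    errors = []
    v4_net = ipaddress.ip_network(cfg["peer_link_v4_cidr"])
    for key in ("router_v4", "ngfw_v4"):
        if ipaddress.ip_address(cfg[key]) not in v4_net:
            errors.append(f"{key} not inside {v4_net}")
    if cfg.get("afi_stack", "dual") in ("ipv6", "dual"):
        v6_net = ipaddress.ip_network(cfg["peer_link_v6_cidr"])
        for key in ("router_v6", "ngfw_v6"):
            if ipaddress.ip_address(cfg[key]) not in v6_net:
                errors.append(f"{key} not inside {v6_net}")
    if cfg["router_asn"] == cfg["ngfw_asn"]:
        errors.append("router_asn == ngfw_asn (eBGP only in MVP)")
    return errors

cfg = {"peer_link_v4_cidr": "192.0.2.252/30", "router_v4": "192.0.2.254",
       "ngfw_v4": "192.0.2.253", "peer_link_v6_cidr": "2001:db8:0:40::/126",
       "router_v6": "2001:db8:0:40::2", "ngfw_v6": "2001:db8:0:40::1",
       "router_asn": 64496, "ngfw_asn": 64497}
print(check_peer_link(cfg))  # []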

Metrics + Annex L

From the NGFW (via SNMP / vendor MIB / API):

  • bgp.peer.state (Idle / Connect / Active / Established)
  • bgp.peer.received_prefix_count
  • bgp.peer.uptime_seconds
  • routing.rib_entries_total
  • routing.fib_entries_total
  • cpu.utilization_percent (during advertisement burst)
  • memory.used_mb (during the same burst)
  • bgp.convergence_time_ms (advertise complete → all routes installed)

From the VyOS Router Peer BGP (via FRR vtysh JSON output):

  • advertised_prefixes_count
  • peer_received_count (sanity check both sides agree)
  • flap_count (during withdraw-stress phase)
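A sketch of one plausible way to derive bgp.convergence_time_ms from those counters: poll the NGFW's received-prefix count from the start of the advertisement burst until it reaches the peer's advertised total and stays there for a settle window. The poll callable is a hypothetical stand-in for the SNMP / vtysh scrapers; the settle window and timeout are assumptions.

import time

def measure_convergence_ms(poll_received, advertised_total: int,
                           settle_s: float = 5.0, timeout_s: float = 900.0) -> float:
    """Milliseconds from advertisement kick-off until all routes are installed."""
    start = time.monotonic()
    reached = None                         # moment the counter first hit the target
    while time.monotonic() - start < timeout_s:
        if poll_received() >= advertised_total:
            reached = reached or time.monotonic()
            if time.monotonic() - reached >= settle_s:
                return (reached - start) * 1000.0
        else:
            reached = None                 # counter regressed (flap): restart the clock
        time.sleep(1.0)
    raise TimeoutError("never converged; recorded as the failure point in Annex L")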

Annex L — Routing Table Stress layout:

  1. Settings — route count, prefix size, AS numbers, peer link
  2. Convergence — graph of received_prefix_count over time
  3. NGFW resources — CPU + memory timeline during burst
  4. Failure point — if route advertisement stalled or peer dropped
  5. Datasheet comparison — tested capacity vs vendor-claimed capacity (sourced from dut-catalog/<vendor>.yaml)

Three-language template (en, pt-BR, es) in line with every other annex.

IBO integration

Per discuss_intelligent_bench_orchestrator_2026_05_08:

  • S1 (Lab Resource Inventory) gains a new resource kind: bgp-peer-router (vendor: vyos, model: vyos-rolling, ASN, peer link)
  • S2 (DUT Envelope Discovery) gains a new capability check: routing_table_capacity_routes (queried via the SNMP BGP4-MIB or a vendor-specific MIB; populated into the DUT catalog entry)
  • S4 (Tier Mixer) treats BGP stress as orthogonal to the data-plane test types — it does not consume agent or persona bandwidth, so it can run in parallel with any other test type without contention. This is unique among test types and changes the scheduling model accordingly.
  • S7 (BoM Generator) lists the Router Peer BGP appliance in the bill of materials when bgp_stress.enabled: true.
  • S8 (Physical Layer Validator) validates the new VLAN 40 link.

Consequences

Positive

  • Demonstrable differentiator: throughput-only competitors cannot reproduce this characterisation. Marketing narrative locked: "we don't just measure how fast it forwards — we measure when it falls over and whether the bottleneck is data plane or control plane."
  • Procurement-changing: this test surfaces a hidden truth — procurement teams sizing NGFWs by throughput alone routinely ship under-provisioned devices that crash under real route load. The feature shifts purchasing decisions toward correctly-sized models for routing capacity in addition to forwarding capacity. This is the strongest commercial signal of any test type the bench ships.
  • Real-world realism: the real-Internet-snapshot mode replays what an actual edge router sees. No other vendor (Spirent, Ixia, Keysight, NetSecOPEN) currently ships this in their DUT characterisation suites.
  • Vendor-neutral methodology: works on every NGFW that supports BGP (every vendor in the locked DUT catalog scope of 10).
  • Capacity-aware test fit: small SKUs (FortiGate 40F class) get a meaningful test rather than a pointless 1M-route attempt. Auto-fitting from the catalog removes guesswork.
  • Static route precedence guarantee: persona forwarding is never impacted by BGP route flap, route load, or even a deliberately overlapping Internet prefix. Operators can run BGP stress alongside data-plane experiments and trust both results.
  • Stack isolation keeps the new control-plane stack from perturbing data-plane experiments. CPU/memory graph timestamps line up against route-count graph for attribution.
  • Aligns with NetSecOPEN / RFC 9411 spirit of multi-dimensional benchmarking even though RFC 9411 doesn't itself prescribe routing-table tests (per discuss_netsecopen_rfc9411).

Negative / costs

  • New stack to maintain: VyOS image build, FRR config templates, dual-stack manifest set, Annex L template in three languages, dashboard wizard section, per-vendor BGP config snippets (10 vendors × 3 management planes × IPv4 + IPv6).
  • NGFW-side configuration is manual: integration_tier: catalog-only means we provide copy-pasteable BGP configs but the operator applies them on the NGFW. Operators less familiar with BGP need clear runbooks (covered by Help Center entry per discuss_help_center_2026_05_08).
  • Memory footprint at full Internet table is significant: VyOS + FRR with 950K IPv4 + 190K IPv6 routes consumes roughly 3–5 GB RAM. The Router Peer BGP pod's resource requests must reflect this when route_source: real-internet or default_route_count >= 1_000_000. Smaller counts (≤100K) fit comfortably in 1 GB.
  • Test duration grows with count: 1M+ routes convergence on a modest NGFW can take 5–10 minutes (dual-stack adds ~30%). The dashboard must surface expected duration estimates and not time out the test prematurely.
  • Snapshot data freshness: the bundled MRT snapshot ages between refreshes. Operators on airgap labs run against a snapshot that may be 1–4 weeks old. Annex L always records snapshot timestamp for transparent reporting.
  • Catalog re-scrape pass needed (PR-4): 54 NGFW SKUs across 4 current vendors plus the 6 vendors still to land (Fortinet, Palo Alto, Check Point, Sophos, Forcepoint, WatchGuard) need routing-capacity values. Many vendor datasheets do not publish these — see open question #8.
  • MRT processing pipeline complexity: parsing RouteViews dumps into FRR-loadable config requires bgpdump or bgpscanner (third-party tools). Adds container build complexity.

Neutral

  • VLAN 40 is a previously unused VLAN ID. No collision with the existing reserved bands (20/30/99/101-120/200-209). Future test types that need a new VLAN should pick from the 41-49 range to preserve the BGP-stress band.
  • ASN choice (64496/64497) is arbitrary within RFC 5398. Operators in a lab where those exact ASNs already exist for other reasons can override via topology.yaml.
  • Class E (240.0.0.0/4) is dropped by default by some vendor software. When the operator selects 1M routes, the dashboard warns: "1M routes pulls from RFC 1112 Class E reserved space. Some NGFWs reject Class E by default — see vendor config snippet."

Implementation roadmap

This ADR is PR-1 of a now-12-PR feature (expanded from the original 7 by the dual-stack + real-Internet-snapshot + capacity-aware refinements):

  • PR-1 (this): ADR + project memo + topology.yaml schema. ~400 LoC docs (with refinements)
  • PR-2: VyOS container image + cloud-init template + BGP config generator. ~350 LoC
  • PR-3: NetworkAttachmentDefinition VLAN 40 + manifests + namespace (dual-stack: v4 + v6). ~250 LoC YAML
  • PR-4: NGFW catalog schema extension — routing_table_capacity_* fields per SKU (54 NGFW SKUs need a re-scrape pass). ~150 LoC schema + ~500 LoC YAML data
  • PR-5: RouteViews / RIPE RIS snapshot ingestion pipeline (download / verify / convert MRT → FRR config). ~300 LoC
  • PR-6: Bundled snapshot artefact build pipeline + CI job (refresh weekly). ~150 LoC + GitHub Actions workflow
  • PR-7: Dashboard UI — three-axis "Routing table stress" section + capacity warning logic. ~400 LoC TSX
  • PR-8: Metrics scraping (BGP4-MIB + vendor MIBs) + Annex L report template (3 languages). ~300 LoC
  • PR-9: NGFW-side BGP config snippets per vendor (10 vendors × 3 planes × IPv4/IPv6). ~600 LoC docs
  • PR-10: E2E integration test using BIRD or ExaBGP as a mock NGFW peer (dual-stack). ~200 LoC
  • PR-11: Help Center entry "BGP routing table stress test" (3 langs) + 2-min video tutorial. ~250 LoC docs + Mux video pipeline
  • PR-12: Marketing copy update (per project_marketing_site_obligation) — control-plane-vs-data-plane narrative on tlsstress-art.com (5 langs). ~200 LoC docs
  • Total: ~4,000 LoC across 12 PRs; ~5-6 week sprint

Suggested target: v4.8 remains feasible if the schema work (PR-4) parallelises with the snapshot pipeline (PR-5/6); the two have no runtime dependency on each other.

Open questions

These do not block ADR acceptance — captured for later resolution during PR-2..PR-12.

  1. /30 vs /31 peer link — /31 (RFC 3021) is more efficient but historically less universally supported by NGFW vendors. Default /30; track per-vendor /31 support in the DUT catalog via a follow-up bgp_p2p_31_supported capability flag. Same logic for /126 vs /127 IPv6.
  2. Graceful restart / route refresh — enabled by default would be more realistic but masks raw advertisement-burst behaviour. Default OFF; enable via dashboard advanced options.
  3. MD5 / TCP-AO authentication — production realism feature. Default OFF; enable via dashboard advanced options.
  4. BGP communities — additional control-plane stress (each community attribute is extra processing per route). Real Internet snapshots include extensive community attributes (ISP-tagged, regional, etc.) — preserve them on replay or strip? Default PRESERVE (more realistic stress); strip option in advanced.
  5. Snapshot freshness vs reproducibility — daily refresh is technically feasible but breaks test-to-test comparison (different prefixes between runs). Default to bundled weekly-refreshed snapshot with operator opt-in for "always latest". Mark Annex L results with the snapshot timestamp so comparisons stay meaningful.
  6. Snapshot source provenance — RouteViews, RIPE RIS, and bgp.tools each have slightly different views (different peer sets, different propagation delays). MVP defaults to RouteViews route-views2 collector for IPv4 + IPv6 — single, well-known source. Multi-source diff is a follow-up.
  7. Route flap intensity in withdraw-stress mode — default "1% of routes flapped every 30s" was chosen as a mild starting point. Configurable via dashboard advanced options.
  8. Catalog routing-capacity field provenance — most vendor datasheets do not publish FIB / RIB hard limits. Per-SKU values need to come from: (a) datasheet where available, (b) vendor documentation portals (admin guides), (c) test bench measurement (the bench can characterise the limit empirically as a side effect of running this test, then back-port that finding to the catalog YAML).
  9. Capacity-fitted 80% headroom — chosen as a reasonable default; measure-driven tuning may show that 70% (more conservative) or 90% (more aggressive) is a better operating point per vendor family. Track via Annex L results.

References

  • RFC 4760 — Multiprotocol Extensions for BGP-4 (MP-BGP — IPv4 + IPv6 on a single session, used for session_mode: multi-afi)
  • RFC 5398 — Autonomous System (AS) Number Reservation for Documentation Use (defines AS 64496-64511 and AS 65536-65551)
  • RFC 5737 — IPv4 Address Blocks Reserved for Documentation (192.0.2.0/24, 198.51.100.0/24, 203.0.113.0/24)
  • RFC 6598 — IANA-Reserved IPv4 Prefix for Shared Address Space (100.64.0.0/10, the CGNAT range)
  • RFC 3849 — IPv6 Address Prefix Reserved for Documentation (2001:db8::/32 — used for synthetic IPv6 prefixes + IPv6 peer link)
  • RFC 2544 — Benchmarking Methodology for Network Interconnect Devices (198.18.0.0/15)
  • RFC 1112 — Host Extensions for IP Multicasting (defines Class E 240.0.0.0/4 as "reserved for future use")
  • RFC 9411 — Benchmarking Methodology for Network Security Device Performance (NetSecOPEN; vocabulary alignment for the Annex L report)
  • RFC 6396 — Multi-Threaded Routing Toolkit (MRT) Routing Information Export Format (snapshot file format consumed by the ingestion pipeline)
  • University of Oregon RouteViews Project — http://archive.routeviews.org/ — public BGP snapshots, default source for the real-Internet-snapshot mode
  • RIPE RIS — https://www.ripe.net/analyse/internet-measurements/routing-information-service-ris — alternative public BGP snapshot source
  • ADR 0009 — L2 BPDU isolation (BPDU rules apply to VLAN 40 too)
  • ADR 0011 — Topology axes (this ADR adds an orthogonal axis)
  • project_bgp_routing_table_saturation_2026_05_08 (memory) — including the four 2026-05-08 follow-up refinements
  • discuss_intelligent_bench_orchestrator_2026_05_08 (memory) — IBO integration points
  • discuss_vyos_observability (memory) — observability pattern re-used for the new VyOS appliance
  • discuss_ipv6_dualstack_testing (memory) — Phase 1 of dual-stack rollout (this ADR gets dual-stack support promoted to MVP)
  • discuss_vpn_ipsec_simulation (memory) — iPerf3 agent deployment (still pending; tracked separately, this ADR does not depend on it)

Amendment 2 — VLAN 40 conflict; switch to VLAN 2809

  • Date: 2026-05-09
  • Status: Accepted (correction)
  • Supersedes: every reference to "VLAN 40" in the original ADR text and Amendment 1 above

What went wrong

The original ADR specified VLAN 40 for the BGP peer link, with a justification that read "[VLAN 40 is a previously unused VLAN ID. No collision with the existing reserved bands ...]". That claim was wrong. VLAN 40 was already in PRODUCTION use by the Cloner ISP egress at the time the ADR was written:

  • k8s/80-cloner-nad.yaml declares master: eth1.40 (Cloner ISP NetworkAttachmentDefinition)
  • k8s/81-cloner-deployment.yaml references the cloner-isp NAD on its net1 macvlan attachment
  • The CHANGELOG and README document VLAN 40 as the Cloner ISP egress (Cloner pod → eth1.40 → Nexus 9000 → upstream router → Internet)

Putting BGP peering and Cloner ISP egress on the same VLAN would be a security boundary violation (two unrelated control flows sharing a broadcast domain) and a debugging nightmare. The mistake was caught during PR-3 (K8s manifests) review by the project owner, after PR-3 had already merged into main.

Correction

The BGP peer link uses VLAN 2809 (new, dedicated, never shared). Rationale for choosing 2809:

  • Far outside any current reservation (1, 10, 20, 30, 40, 99, 101–120, 200–209)
  • Far outside any reasonable future reservation (the project grows VLAN IDs in <= 3-digit ranges; 2809 is in the 4-digit range that nothing else in the project will plausibly enter)
  • 4-digit IDs are valid 802.1Q (range 1..4094)
  • Memorable + project-owner-chosen

Affected files

The following files use VLAN 2809 (not VLAN 40):

  • k8s/dut/41-vlan2809-bgp-peer-nad.yaml (renamed from the previous file with VLAN 40 in its filename)
  • k8s/23-bgp-router-peer-deployment.yaml (Multus annotation references bgp-peer-vlan2809)
  • k8s/33-bgp-router-peer-network-policy.yaml (comments)

The IPv6 prefix is also updated to mirror the new VLAN ID:

  • IPv6 subnet: 2001:db8:0:2809::/126 (was 2001:db8:0:40::/126)
  • NGFW IPv6: 2001:db8:0:2809::1
  • Router IPv6: 2001:db8:0:2809::2

The IPv4 subnet stays as 192.0.2.252/30 (RFC 5737 documentation prefix; not VLAN-tied).

Lesson learned

Any future ADR that proposes a VLAN ID MUST include an explicit audit pass against:

  • git grep -E "vlan[_ -]?<id>|eth[0-9]+\.<id>" across the whole repo
  • The project's VLAN-allocation table in CLAUDE.md and docs/ARCHITECTURE.md

before the ADR is accepted. The original ADR (and its Amendment 1) performed neither check.

Original ADR text — historical context preserved

The body of this ADR (and Amendment 1) above intentionally retains the "VLAN 40" wording as written, so anyone reading this document in the future can see what was originally proposed. All runtime/manifest/code references to VLAN 40 in the BGP feature have been corrected to VLAN 2809. The project memo project_bgp_routing_table_saturation_2026_05_08 and any future documentation must use VLAN 2809.