
Branch Office Simulation — operator runbook


Scope status (post-Scope-Freeze 2026-05-10) — Branch Office is Test Kind #2 in the canonical 7-Test-Kind set. Combinatorial Test Plan integration available via dashboard/src/lib/test-plan/schema.ts. Sister test kinds: SDWAN/CoR (#4) and PURE (#7). See ARCHITECTURE.md for the canonical Test Kind table.

TL;DR

Branch Office Simulation (BO) lets you run the existing Q1-Q4 test plans under realistic WAN bandwidth constraints — 100 Mbps cable, 50/10 DSL, 1 Gbps fiber, whatever your customer has. The shaper sits on the VyOS-ISP router and applies bidirectional, asymmetric rate limits at the simulated carrier edge, exposing the NGFW's behavior under real-world ISP conditions.

For the design rationale, see ADR 0008.

What it does

When BO is enabled on a test plan launch, the test-run engine:

  1. Calls scripts/bo-shape.sh --down-mbps N --up-mbps M BEFORE agents start
  2. The script kubectl execs into the VyOS pod and applies VyOS QoS:
    • Egress shaper on eth1 capping outbound at --down-mbps
    • Ingress shaper on eth1 (via ifb0) capping inbound at --up-mbps
  3. Runs scripts/bo-verify.sh to confirm the shaper is in place AND measure actual vs requested capacity (5s iperf3 baseline)
  4. Launches the test agents (browser-engine/synthetic-load) — they hit the bandwidth ceiling
  5. At run cleanup, calls scripts/bo-unshape.sh to remove all shaping
  6. Annex J in the run report attests the requested vs measured rate + per-direction byte/drop counters
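The egress/ingress shaper pair applied in step 2 corresponds to standard Linux HTB shaping plus an ifb redirect for the ingress direction. The following is an illustrative raw-tc sketch of what that amounts to (values from a 100/30 Mbps profile) — the actual scripts/bo-shape.sh may drive this through VyOS's own QoS configuration instead:

```shell
# Illustrative sketch only -- the real scripts/bo-shape.sh may use VyOS's
# native QoS config rather than raw tc. Values match a 100/30 Mbps profile.

# Download direction: egress HTB on eth1 with a plain bfifo leaf (no AQM)
tc qdisc add dev eth1 root handle 1: htb default 10
tc class add dev eth1 parent 1: classid 1:10 htb rate 100mbit ceil 100mbit
tc qdisc add dev eth1 parent 1:10 handle 10: bfifo limit 256k

# Upload direction: redirect eth1 ingress to ifb0, then shape ifb0 egress
ip link add ifb0 type ifb 2>/dev/null || true
ip link set ifb0 up
tc qdisc add dev eth1 handle ffff: ingress
tc filter add dev eth1 parent ffff: matchall action mirred egress redirect dev ifb0
tc qdisc add dev ifb0 root handle 1: htb default 10
tc class add dev ifb0 parent 1: classid 1:10 htb rate 30mbit ceil 30mbit
tc qdisc add dev ifb0 parent 1:10 handle 10: bfifo limit 64k
```

Shaping ingress via ifb0 is the standard workaround for tc only being able to queue on egress: inbound packets are mirrored to the ifb device, where they can be shaped as if outbound.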

When to use it

  • Pre-sales: customer says "my branches are on 100 Mbps fiber, will the NGFW behave?"
  • Capacity planning: prove the NGFW works at the customer's actual link speed, not the catalog max
  • Regression: catch buffer/queueing bugs that only appear under bandwidth pressure (NGFW vendors sometimes ship cipher fast-path optimizations that work great at line rate but break down under shaping)
  • Methodology rigor: customer demands RFC-2544/6349-style baseline + load testing under their ISP's actual bandwidth profile

Bandwidth input

Free-form — no fixed presets. Operator types the exact value the customer's ISP delivers. Constraints:

  • Down: integer in Mbps OR Gbps, range 1 Mbps to 100 Gbps
  • Up: integer in Mbps OR Gbps, range 1 Mbps to 100 Gbps
  • Asymmetric: down and up are independent; symmetric is the special case where both are equal (e.g. dedicated fiber)

Common values you'll see in customer environments:

Scenario                    Down      Up
Rural DSL / remote worker   25 Mbps   5 Mbps
SMB cable                   100 Mbps  30 Mbps
Mid-market MPLS             200 Mbps  200 Mbps
Branch fiber                500 Mbps  500 Mbps
Regional HQ dedicated       1 Gbps    1 Gbps
Datacenter peering          10 Gbps   10 Gbps

Just type what the customer has — no need to map to a preset.
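If you script launches, the range check is easy to replicate. A minimal sketch — the function name and unit handling here are illustrative, not the actual validator in dashboard/src/lib/test-plan/schema.ts:

```shell
# Hypothetical helper mirroring the input constraints above
# (integer value, Mbps or Gbps, 1 Mbps .. 100 Gbps). Illustrative only.
normalize_mbps() {
  value="$1"; unit="$2"
  case "$unit" in
    Mbps) mbps="$value" ;;
    Gbps) mbps=$((value * 1000)) ;;
    *) echo "unknown unit: $unit" >&2; return 1 ;;
  esac
  if [ "$mbps" -lt 1 ] || [ "$mbps" -gt 100000 ]; then
    echo "out of range: ${mbps} Mbps" >&2; return 1
  fi
  echo "$mbps"
}

normalize_mbps 100 Mbps   # prints 100
normalize_mbps 1 Gbps     # prints 1000
```

Everything is normalized to Mbps internally, so asymmetric pairs like 100/30 need no special casing.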

How to use it from the dashboard

  1. Pick a quadrant as usual (Q1 baseline, Q2 decrypt-only, Q3 NAT-only, Q4 production)
  2. In the new Environment picker:
    Environment: ○ Datacenter (no shaping, current default)
                 ● Branch Office Simulation
                     Down: [_____] [Mbps▾]
                     Up:   [_____] [Mbps▾]
    
  3. Enter the down + up values
  4. Click Run Test Plan

The dashboard validates the values (range, type), passes them to the test-run engine, and the engine handles the rest.

How to use it from the CLI

# Apply shaping manually (rare — usually the test-run engine does this):
scripts/bo-shape.sh --down-mbps 100 --up-mbps 30

# Verify the shaper is correctly configured:
scripts/bo-verify.sh
# Output:
#   Layer J — Branch Office Simulation: PASS
#   Configured: down=100Mbps up=30Mbps
#   Measured  : down=98.7Mbps up=29.4Mbps (delta within 10%)
#   Drops     : down=0 up=0

# Remove shaping after a one-off test:
scripts/bo-unshape.sh
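For one-off CLI runs it's easy to forget bo-unshape.sh when the test aborts partway. A shell trap makes the cleanup unconditional — sketched here with stub functions standing in for the real scripts so the pattern is self-contained:

```shell
# Pattern sketch: stubs stand in for scripts/bo-shape.sh / bo-unshape.sh
bo_shape()   { echo "shape applied"; }
bo_unshape() { echo "shape removed"; }

run_shaped() {
  bo_shape
  trap 'bo_unshape' EXIT   # fires whether the test succeeds or fails
  "$@"                     # the actual one-off test command
}

( run_shaped echo "test ran" )   # subshell so the EXIT trap fires here
```

In a real wrapper, replace the stubs with the scripts/bo-*.sh calls shown above; the trap guarantees the shaper never outlives the run.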

What gets measured

When BO is in place during a test run, the existing dashboard metrics remain (throughput, sessions/sec, latency P50/P95/P99, cert validation rate, etc.). Annex J adds:

  • Configured rate (operator input): down/up Mbps
  • Measured rate (verifier output before agents launched): down/up Mbps
  • Delta percentage: anything more than 10% off blocks the run in production gating mode
  • Per-direction byte counters (during the run): bytes shaped vs bytes attempted; gives you a utilization figure for the shaped link
  • Per-direction drop counters (during the run): packets dropped by the HTB shaper. High drops = the test is hitting the ceiling and the NGFW behavior under rejection is what you're measuring
  • Buffer occupancy estimate (from tc -s qdisc show): peak queue depth on the shaper, useful to compare with NGFW buffer behavior
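The byte and drop counters come straight out of tc's statistics output. An illustrative parse — the sample line below mimics a typical `tc -s qdisc show` "Sent" line; the real Annex J collector may extract these differently:

```shell
# Illustrative: extract bytes and drops from tc's "Sent ..." statistics line.
sample=' Sent 152816516301 bytes 101234567 pkt (dropped 23, overlimits 420 requeues 0)'

echo "$sample" | awk '/Sent/ {
  gsub(/[(,]/, "")                 # strip "(" and "," so fields re-split cleanly
  print "bytes=" $2, "drops=" $7
}'
# -> bytes=152816516301 drops=23
```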

What is NOT included

By design, BO does bandwidth shaping only. NOT included in v4.4:

  • Latency injection — would mask NGFW latency under queueing pressure with artificial latency from netem. Future work.
  • Packet loss injection — would conflate ISP-side loss with NGFW buffer overruns. Future work.
  • Jitter — same rationale as latency. Future work.
  • Per-flow shaping — single global shaper per direction (matches how real ISPs deliver service to a single branch).
  • AQM (fq_codel, cake) — uses plain HTB + bfifo to expose NGFW buffer behavior, not VyOS queue smarts. ADR 0008 §"AQM-friendly defaults" explains why.

Operator workflow

After every test plan with BO

  1. Run completes successfully → Annex J populated, no action required.
  2. Run aborts on verifier failure → check the dashboard preflight panel. Common causes:
    • VyOS pod not reachable (check kubectl get pod -n web-agents -l app=vyos-isp-router)
    • Configured rate way above ISP capability (rare in lab, but tc will silently cap to physical NIC speed)
    • scripts/bo-verify.sh measured >10% off requested rate (production gating blocks; observational mode warns)
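The >10% gate is plain arithmetic on the requested vs measured rates. A sketch of the check (function name illustrative — not the real scripts/bo-verify.sh logic):

```shell
# Hypothetical helper mirroring the verifier's 10% delta gate (illustrative).
within_gate() {
  awk -v req="$1" -v meas="$2" 'BEGIN {
    d = (meas - req) / req * 100
    if (d < 0) d = -d                 # absolute delta
    printf "delta=%.1f%%\n", d
    exit (d <= 10) ? 0 : 1            # non-zero exit => production gating blocks
  }'
}

within_gate 100 98.7                    # prints delta=1.3%
within_gate 100 85.0 || echo "blocked"  # prints delta=15.0% then "blocked"
```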

Operator escalation

If the verifier consistently reports "delta > 10%" with no obvious cause:

  • Check the VyOS pod's CPU — if CPU-bound, tc can't push the configured rate. Solution: increase pod CPU request/limit in k8s/dut/45-vyos-isp-router.yaml
  • Check the underlying physical NIC — tc can't shape above the wire speed. Verify ethtool eth1 | grep Speed on the K8s node.
  • Check for competing traffic on the K8s node — other pods on the same macvlan parent eat bandwidth.
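The three checks above can be run back to back. This triage sequence assumes the pod label and node interface named above, and `kubectl top` assumes metrics-server is installed in the cluster:

```shell
# Triage the escalation causes above (run from an admin host / the K8s node)
kubectl get pod -n web-agents -l app=vyos-isp-router   # pod up and Ready?
kubectl top pod -n web-agents -l app=vyos-isp-router   # CPU pinned at its limit?
ethtool eth1 | grep Speed                              # on the node: wire speed >= configured rate?
```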

Bypass / emergency disable

If BO is misbehaving and blocking test runs you urgently need:

# Disable BO for the next runs without removing the dashboard option:
export BO_GATING=observational
# Or in the dashboard UI: select "Datacenter (no shaping)" instead.

The default gating is production (block runs on verifier failure). Override only when you understand WHY the verifier failed.

What's in the report (Annex J preview)

## J — Branch Office Simulation

Operator-requested bandwidth
  Down: 100 Mbps
  Up  :  30 Mbps  (asymmetric)

Pre-test verification (5s iperf3 baseline)
  Down: 98.7 Mbps  (delta -1.3%)  ✓ within 10%
  Up  : 29.4 Mbps  (delta -2.0%)  ✓ within 10%

During-test counters (VyOS HTB shaper)
  Down direction:
    bytes_shaped: 142.3 GB
    drops       : 0
    peak_qlen   : 134 KB
  Up direction:
    bytes_shaped: 18.7 GB
    drops       : 23
    peak_qlen   : 8 KB

Methodology
  Shaper      : VyOS QoS (Linux HTB + ifb for ingress)
  AQM         : NONE (plain bfifo default queue)
  Latency     : NOT injected (only bandwidth shaped)
  Verifier    : RFC 2544 / RFC 6349 reduced sample (5s)

Verdict: PASS — shaping was correctly applied and measured throughout
the run. Drops in the up direction (23) indicate the NGFW upload path
hit the cap; check NGFW buffer behavior in the per-quadrant section.

How this composes with the rest of the test bench

  • Q1-Q4 quadrants — orthogonal. You can run any quadrant under any bandwidth profile. Pick both at run launch.
  • Inspection Profile (v4.5) — orthogonal again. Q4 + balanced inspection + 100/30 Mbps is a perfectly valid combination.
  • Annex G layer 5 (L2 BPDU isolation) — independent. BO touches VyOS interface state but the BPDU guard DaemonSet runs every 60s on the K8s nodes regardless.
  • NetFlow / IPFIX validation (v4.7) — composes. BO + NetFlow shows whether the NGFW correctly emits flow records under bandwidth constraint.
  • SDWAN and Cloud On-Ramp test (v4.8) — composes deeply. The SD-WAN tunnel goes over the same shaped WAN link, which is exactly the realistic scenario.

References

  • ADR 0008 — docs/ADR/0008-branch-office-simulation.md
  • VyOS QoS docs — https://docs.vyos.io/en/latest/configuration/trafficpolicy/
  • Linux tc-htb(8) — kernel-level mechanics under VyOS abstraction
  • RFC 2544 — Benchmarking Methodology for Network Interconnect Devices
  • RFC 6349 — Framework for TCP Throughput Testing
  • Apply script — scripts/bo-shape.sh (PR-B)
  • Verify script — scripts/bo-verify.sh (PR-B)
  • Unshape script — scripts/bo-unshape.sh (PR-B)
  • Annex J template — dashboard/templates/annex-j-branch-office.md.tmpl (PR-D)