Physical NGFW test-bed — operations guide¶
Companion to docs/SPLIT_STACKS.md and docs/ARCHITECTURE.md.

Scope status (post-Scope-Freeze 2026-05-10) — the DUT test-bed now integrates with the DOM (DUT Operating Mode) discriminator (5 modes per ADR 0014), the RELAY.Art trust-zone bridge for customer-side MGMT (ADR 0020), and the OOBI VXLAN immutable fabric (ADR 0019). Operator-facing primers: DOM primer · RELAY primer.
Overview¶
The DUT (Device Under Test) test-bed places a physical NGFW in the data path of the entire agent fleet. Every HTTP/2 and HTTP/3 session between the browser-engine/synthetic-load agents and the Caddy persona webservers traverses the NGFW, generating realistic TLS decryption load. The NGFW is monitored via SNMP on a dedicated management VLAN.
💡 Beyond the L7 webserver workload — the same test-bed also runs a catalog of orthogonal stress engines that exercise the NGFW's control plane (BGP / OSPF), L2 tables (MAC / ARP saturation), VPN / SD-WAN tunnels, VXLAN overlay, HAR replay at scale, and DPDK line-rate stateful generation. See docs/STRESS_ENGINES_CATALOG.md for the per-engine inspection-surface mapping, ADR references, enable instructions, and verification commands.
This guide covers the Docker Compose DUT mode (docker-compose.dut.*.yml).
For Kubernetes DUT mode, see scripts/k8s-dut-up.sh and the
NGFW Configuration Reference.
Ubuntu host (Cisco UCS) — Docker Compose mode
│
├─ eth1 (802.1q trunk to Cisco Nexus 9000)
│ ├─ eth1.20 → ai_forse_dut_pw (172.16.0.0/16) browser-engine agents
│ ├─ eth1.30 → ai_forse_dut_k6 (172.17.0.0/16) synthetic-load agents
│ └─ eth1.99 → ai_forse_mgmt (10.254.254.0/24) Management (SNMP only)
│
└─ ai_forse_oobi (--internal bridge)
├─ Prometheus ←→ snmp_exporter (dual-homed: OOBI + MGMT)
├─ Grafana
├─ postgres + dashboard
└─ (agent control plane API)
Physical network (Docker mode):
Ubuntu eth1 → Nexus 9000 (trunk port, VLANs 20/30/99)
Nexus 9000 → NGFW DUT (access ports per VLAN OR sub-interfaces)
NGFW DUT → L3 gateway .1 on each data VLAN
Nexus MGMT0 → VLAN 99 (access port)
NGFW MGMT → VLAN 99 (access port)
Kubernetes mode (production) hosts the webservers as 20 Synthetic Personas
on VLANs 101-120 + 10 Cloned Personas on VLANs 200-209. Docker mode (dev only)
runs browser-engine/synthetic-load agents on VLANs 20/30 pointing at the K8s persona ingress —
the legacy Docker-mode VLAN 10 webserver layer was removed in v3.7.0
(K8s personas are the single source of truth for DUT webservers).
Supported NGFW hardware¶
| Vendor | Model examples | SNMP module | Notes |
|---|---|---|---|
| Cisco | Firepower FTD 7.x | cisco_ftd | FIREWALL-MIB + CRYPTO-ACCEL-MIB |
| Cisco | C8475-G2 IOS-XE | cisco_iosxe | No FIREWALL-MIB; uses PROCESS-MIB |
| Cisco | Meraki MX450 | cisco_meraki | IF-MIB only; Dashboard API for sessions/CPU |
| Fortinet | FortiGate FG200F / FG600F | fortinet_fortigate | fgSysCpu/Mem/SesCount |
| Palo Alto | PA-series / VM-series | palo_alto | PAN-COMMON-MIB |
| Check Point | Quantum / Gaia R81.x | checkpoint | CHECKPOINT-MIB + SVN-FOUNDATION-MIB |
| Huawei | USG / NGFW (VRP) | huawei_ngfw | HUAWEI-ENTITY-EXTENT-MIB |
| Any | Generic / unknown | generic_ngfw | IF-MIB + HOST-RESOURCES-MIB |
The Cisco Nexus 9000 switch is always monitored with the cisco_nexus module
regardless of which NGFW is in use.
Prerequisites¶
Hardware¶
- Ubuntu Linux host (Cisco UCS recommended) with a dedicated NIC for the trunk.
- Cisco Nexus 9000 switch with 802.1q trunk port toward the Ubuntu host.
- Nexus MGMT0 connected to a switch access port on VLAN 99.
- NGFW DUT with interfaces in VLANs 20, 30 (data) and one on VLAN 99 (management). In Kubernetes mode (production), also configure VLANs 101–120 (Synthetic Personas) and 200–209 (Cloned Personas).
- NGFW configured as the L3 default gateway on each data VLAN:
  172.16.0.1 (VLAN 20 / browser engine) and 172.17.0.1 (VLAN 30 / synthetic-load engine).
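Once the macvlan networks exist (they are created in the one-time host setup below), gateway reachability can be sanity-checked from a throwaway container — a sketch, assuming the NGFW permits ICMP to its gateway interfaces and that a busybox image is available (both are assumptions, not requirements of the test-bed):

```shell
# Ping each data-VLAN gateway from a container attached to the matching macvlan network
docker run --rm --network ai_forse_dut_pw busybox ping -c 2 172.16.0.1
docker run --rm --network ai_forse_dut_k6 busybox ping -c 2 172.17.0.1
```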
Software¶
- Docker Engine 24+ (not Docker Desktop — macvlan requires the Linux kernel).
- scripts/stack-up.sh at current version.
- Standard stacks already running (scripts/stack-up.sh up).
SNMP configuration on devices¶
Cisco Nexus 9000 (NX-OS):
snmp-server community public ro
snmp-server host 10.254.254.X traps version 2c public
Cisco Firepower FTD 7.x: Enable SNMP under Platform Settings → SNMP → SNMPv2c community. Ensure the snmp_exporter IP (the MGMT macvlan IP) is in the allowed hosts list.
Fortinet FortiGate:
config system snmp community
edit 1
set name public
config hosts
edit 1
set ip 10.254.254.0 255.255.255.0
end
end
next
end
Palo Alto: Device > Setup > Operations > SNMP Setup. Add the snmp_exporter MGMT IP as a permitted SNMP manager.
Check Point:
Run cpconfig → SNMP Extension, or configure via SmartConsole > Gateway > SNMP.
Huawei:
snmp-agent sys-info version v2c
snmp-agent community read public
snmp-agent target-host trap-hostname prom-mgmt address 10.254.254.X udp-port 162
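After enabling SNMP on each device, reachability and the community string can be verified directly from the Ubuntu host — a sketch assuming the net-snmp CLI tools are installed (not a documented prerequisite of this guide); the numeric OID is sysDescr.0:

```shell
# Query sysDescr (1.3.6.1.2.1.1.1.0) over SNMPv2c to confirm community + reachability
snmpget -v2c -c public 10.254.254.2 1.3.6.1.2.1.1.1.0   # Nexus 9000 MGMT0
snmpget -v2c -c public 10.254.254.3 1.3.6.1.2.1.1.1.0   # NGFW management interface
```

A timeout here usually means the exporter will also fail, so this isolates device-side SNMP problems from snmp_exporter configuration problems.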
One-time host setup (Docker Compose mode)¶
# 1. Set the trunk NIC name and bring up all macvlan networks + subinterfaces
sudo DUT_DATA_IFACE=eth1 scripts/netsetup-dut.sh setup
# 2. Verify all networks are present
scripts/netsetup-dut.sh status
# Expected output (Docker mode):
# ✓ eth1.20 (172.16.0.0/16) → ai_forse_dut_pw
# ✓ eth1.30 (172.17.0.0/16) → ai_forse_dut_k6
# ✓ eth1.99 (10.254.254.0/24) → ai_forse_mgmt
# For Kubernetes mode: also run netsetup-personas.sh to create
# VLANs 101-120 (Synthetic Personas) and 200-209 (Cloned Personas):
# sudo scripts/netsetup-personas.sh setup
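For orientation, the effect of the setup step is roughly equivalent to the following per-VLAN commands — a sketch assuming standard 802.1q sub-interfaces and the Docker macvlan driver; the script itself is authoritative:

```shell
# Sketch for VLAN 20 only; the script repeats the pattern for VLANs 30 and 99
ip link add link eth1 name eth1.20 type vlan id 20   # 802.1q sub-interface on the trunk NIC
ip link set eth1.20 up
docker network create -d macvlan \
  --subnet=172.16.0.0/16 \
  --gateway=172.16.0.1 \
  -o parent=eth1.20 \
  ai_forse_dut_pw
```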
Environment variables¶
Add to .env before running up-dut:
# DUT physical devices (VLAN 99 / MGMT IPs)
SNMP_NEXUS_HOST=10.254.254.2 # Nexus 9000 MGMT0 IP
SNMP_NGFW_HOST=10.254.254.3 # NGFW management interface IP
SNMP_COMMUNITY=public # SNMPv2c community
# SNMP module for the NGFW DUT
# Options: cisco_ftd | cisco_iosxe | cisco_meraki | fortinet_fortigate |
# palo_alto | checkpoint | huawei_ngfw | generic_ngfw
SNMP_DUT_MODULE=cisco_ftd
# Optional DUT network overrides for Docker Compose mode (only needed if defaults clash)
# DUT_DATA_IFACE=eth1
# DUT_VLAN_PW=20 DUT_SUBNET_PW=172.16.0.0/16 DUT_GW_PW=172.16.0.1
# DUT_VLAN_K6=30 DUT_SUBNET_K6=172.17.0.0/16 DUT_GW_K6=172.17.0.1
# DUT_VLAN_MGMT=99 DUT_SUBNET_MGMT=10.254.254.0/24
Bring up / operate the DUT stack¶
# Full DUT stack (webservers + browser-engine + synthetic-load + observability overlay)
scripts/stack-up.sh up-dut
# Optionally scale synthetic-load engine for load testing
scripts/stack-up.sh scale-k6 10
# Status of all DUT services
scripts/stack-up.sh ps-dut
# Tear down (stops containers; keeps volumes and macvlan networks)
scripts/stack-up.sh down-dut
# Full teardown including macvlan networks
scripts/stack-up.sh destroy-dut
sudo DUT_DATA_IFACE=eth1 scripts/netsetup-dut.sh teardown
Swapping NGFW vendor¶
To switch from one NGFW vendor to another:
1. Change SNMP_DUT_MODULE in .env to the new module name.
2. Restart only the observability stack:
   scripts/stack-up.sh down-dut
   scripts/stack-up.sh up-dut
3. Verify in Grafana → NGFW DUT dashboard that new metrics appear within one scrape interval (15 s for the NGFW job).
Data-plane configuration (agent VLANs) doesn't change — the NGFW still needs to be the L3 gateway on VLANs 20/30.
Ubuntu host monitoring (Cisco UCS)¶
Physical Ubuntu hosts on the MGMT network are monitored via node_exporter.
Deploy on each Ubuntu/UCS host:
# Run as root on the Ubuntu host
docker run -d --name node_exporter \
--net host --pid host \
--restart unless-stopped \
-v /:/host:ro,rslave \
prom/node-exporter:v1.8.2 \
--path.rootfs=/host \
--collector.systemd \
--collector.processes
Register the host with Prometheus:
Edit observability/prometheus/targets/ubuntu-hosts.yml:
- targets:
- 10.254.254.10:9100 # ucs-host-01
- 10.254.254.11:9100 # ucs-host-02
labels:
role: ubuntu_host
rack: ucs
Prometheus polls the file every 30 s — no restart needed.
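The 30 s polling comes from Prometheus file-based service discovery. A minimal sketch of what such a job could look like — the job name and refresh interval are illustrative assumptions, not a copy of the repo's actual config:

```yaml
# Illustrative file_sd job for the Ubuntu/UCS hosts — names are assumptions
scrape_configs:
  - job_name: ubuntu_hosts
    file_sd_configs:
      - files:
          - /etc/prometheus/targets/ubuntu-hosts.yml
        refresh_interval: 30s
```

Because file_sd re-reads the target file on its refresh interval, edits to ubuntu-hosts.yml take effect without restarting the Prometheus container.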
UCS chassis-level hardware health (CIMC):
The node_exporter covers OS-level metrics. For chassis sensors (temperature, fans, PSU state), enable SNMP on each server's CIMC:
Cisco IMC > Admin > Communication Services > SNMP > Enable
Community: public
Then add a second scrape job in prometheus.dut.yml using the generic_ngfw module, targeting the CIMC management IP, or use the Cisco UCS Grafana plugin for richer UCS Manager integration.
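A hedged sketch of such a CIMC scrape job — the job name and target IP are placeholders; the relabeling follows the usual snmp_exporter pattern of moving the device address into the `target` URL parameter:

```yaml
# Illustrative CIMC SNMP job for prometheus.dut.yml — target IP is a placeholder
  - job_name: ucs_cimc
    metrics_path: /snmp
    params:
      module: [generic_ngfw]
    static_configs:
      - targets: ['10.254.254.20']   # CIMC management IP (example)
    relabel_configs:
      - source_labels: [__address__]
        target_label: __param_target   # device the exporter should walk
      - source_labels: [__param_target]
        target_label: instance         # keep the device IP as the instance label
      - target_label: __address__
        replacement: snmp_exporter:9116   # scrape the exporter, not the device
```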
Grafana dashboards¶
Three dashboards are provisioned automatically:
Nexus 9000 (dut-nexus9000)¶
| Row | Panels |
|---|---|
| Overview | CPU max %, memory %, max temp, temp state, fan state, PSU state, uptime, total CRC/s |
| CPU | Per-core 1-min + 5-min timeseries, gauge bar |
| Memory | Used / free stacked, largest free block (fragmentation) |
| Temperature | Per-sensor timeseries + shutdown threshold line, state gauge |
| Fans & PSU | Color-mapped state gauges (1=green…5=gray) |
| Interface Throughput | TX↑/RX↓ bps + pps (active interfaces only) |
| CRC / FCS | rate(dot3StatsFCSErrors) + alignment errors + MAC internal errors |
| Queue Drops | rate(cieIfInputQueueDrops) + output queue drops |
| Interface Table | Summary table: operStatus, speed, tx/rx bps, CRC/s |
NGFW DUT (dut-ngfw)¶
| Row | Panels |
|---|---|
| Overview | CPU%, mem%, temp, active sessions, CPS, total DUT throughput, uptime |
| TLS Load | Per-VLAN TX↑/RX↓ (VLAN 20/30 agent-side, VLAN 101-120/200-209 persona-side labeled), aggregate bidir bps |
| Sessions / CPS | Concurrent sessions + rate(cfwConnectionStatTotal) for CPS |
| HW Crypto Engine | Decrypted/encrypted pps, active requests, dropped/s |
| CPU / Memory | Per-core timeseries, gauge, used/free stacked |
| Temperature | Timeseries + shutdown threshold, state gauge |
| Fans & PSU | Color-mapped state gauges |
| CRC / Drops | FCS + alignment errors; queue drops (test validity indicator) |
Note: HW Crypto Engine panels are Cisco-FTD/ASA-specific. Other vendors will show "No data" on those panels — this is expected.
Ubuntu Hosts / UCS (dut-ubuntu-hosts)¶
| Row | Panels |
|---|---|
| Overview | CPU%, mem%, disk%, load1, uptime, kernel |
| CPU | Mode breakdown timeseries (idle hidden), load avg bar gauge |
| Memory | Breakdown stacked (used/buffers/cached/free), swap |
| Disk I/O | Throughput bps + IOPS (read↑/write↓), disk template var |
| Filesystem | Used% bar gauge, free space timeseries |
| Network | bps + pps (RX↑/TX↓), errors + drops, interface template var |
| System | Running/blocked processes, open FDs, context switches/forks |
| Docker Engine | Container state counts (if Docker running on host) |
Troubleshooting¶
snmp_exporter shows no metrics for a device:
1. Verify SNMP is enabled on the device and the community matches SNMP_COMMUNITY.
2. Check that the snmp_exporter can reach the device IP from the MGMT macvlan:
docker exec -it <snmp_exporter_container> wget -qO- \
"http://localhost:9116/snmp?module=cisco_nexus&target=10.254.254.2" | head -20
3. Check that the ai_forse_mgmt macvlan is attached to the snmp_exporter container:
docker inspect <snmp_exporter_container> | jq '.[0].NetworkSettings.Networks | keys'
Prometheus shows ${SNMP_DUT_MODULE} as a literal string in targets:
The --config.expand-environment-variables flag must be present in the
prometheus container command. Verify that "--config.expand-environment-variables"
appears in the output of:
docker inspect <prometheus_container> | jq '.[0].Config.Cmd'
Containers on DUT macvlan can't reach the NGFW (no route to host):
1. Confirm the NGFW is configured as the L3 gateway (.1) on each VLAN.
2. Check the macvlan parent interface is correct (e.g., ip link show eth1.20 for Docker-mode browser-engine agents, or ip link show eth1.101 for K8s mode Synthetic Personas).
3. macvlan containers cannot communicate with the host via the same parent
interface — the snmp_exporter uses the ai_forse_mgmt macvlan (VLAN 99)
which is separate from the data VLANs.
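If the host itself ever needs to reach containers on a macvlan network (e.g. for ad-hoc debugging), the usual workaround is a host-side macvlan "shim" interface in bridge mode — a sketch, with the interface name and the .250 address chosen illustratively:

```shell
# Host-side macvlan shim on the MGMT VLAN; name and address are examples only
ip link add mgmt-shim link eth1.99 type macvlan mode bridge
ip addr add 10.254.254.250/24 dev mgmt-shim
ip link set mgmt-shim up
```

Traffic from the host then enters the macvlan segment via the shim instead of the parent interface, sidestepping the kernel's host-to-macvlan isolation.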
node_exporter not scraping:
1. Check the host IP and port in ubuntu-hosts.yml are correct.
2. Verify node_exporter is running: docker ps | grep node_exporter on the host.
3. Test from the Prometheus container: wget -qO- http://10.254.254.10:9100/metrics | head -5.
See also¶
- docs/SPLIT_STACKS.md — overall topology guide
- docs/ARCHITECTURE.md — DUT topology diagram
- scripts/netsetup-dut.sh — network setup script
- observability/snmp/snmp.yml — SNMP module definitions
- observability/prometheus/targets/ubuntu-hosts.yml — host list