Physical NGFW test-bed — operations guide

Companion to docs/SPLIT_STACKS.md and docs/ARCHITECTURE.md.

Scope status (post-Scope-Freeze 2026-05-10) — DUT test-bed now integrates with the DOM (DUT Operating Mode) discriminator (5 modes per ADR 0014), RELAY.Art trust-zone bridge for customer-side MGMT (ADR 0020), and the OOBI VXLAN immutable fabric (ADR 0019). Operator-facing primers: DOM primer · RELAY primer.

Overview

The DUT (Device Under Test) test-bed places a physical NGFW in the data path of the entire agent fleet. Every HTTP/2 and HTTP/3 session between the browser-engine/synthetic-load agents and the Caddy persona webservers traverses the NGFW, generating realistic TLS decryption load. The NGFW is monitored via SNMP on a dedicated management VLAN.

💡 Beyond the L7 webserver workload — the same test-bed also runs a catalog of orthogonal stress engines that exercise the NGFW's control plane (BGP / OSPF), L2 tables (MAC / ARP saturation), VPN / SDWAN tunnels, VXLAN overlay, HAR replay at scale, and DPDK line-rate stateful generation. See docs/STRESS_ENGINES_CATALOG.md for the per-engine inspection-surface mapping, ADR references, enable instructions, and verification commands.

This guide covers the Docker Compose DUT mode (docker-compose.dut.*.yml). For Kubernetes DUT mode, see scripts/k8s-dut-up.sh and the NGFW Configuration Reference.

Ubuntu host (Cisco UCS) — Docker Compose mode
│
├─ eth1 (802.1q trunk to Cisco Nexus 9000)
│   ├─ eth1.20  →  ai_forse_dut_pw   (172.16.0.0/16)  browser-engine agents
│   ├─ eth1.30  →  ai_forse_dut_k6   (172.17.0.0/16)  synthetic-load agents
│   └─ eth1.99  →  ai_forse_mgmt     (10.254.254.0/24)  Management (SNMP only)
│
└─ ai_forse_oobi (--internal bridge)
    ├─ Prometheus  ←→  snmp_exporter (dual-homed: OOBI + MGMT)
    ├─ Grafana
    ├─ postgres + dashboard
    └─ (agent control plane API)

Physical network (Docker mode):
  Ubuntu eth1 → Nexus 9000 (trunk port, VLANs 20/30/99)
  Nexus 9000  → NGFW DUT (access ports per VLAN OR sub-interfaces)
  NGFW DUT    → L3 gateway .1 on each data VLAN
  Nexus MGMT0 → VLAN 99 (access port)
  NGFW MGMT   → VLAN 99 (access port)
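
The macvlan wiring above is what scripts/netsetup-dut.sh automates. Conceptually, each data VLAN maps to an 802.1q subinterface used as a macvlan parent, roughly like this (a minimal sketch for VLAN 20 only; the script is the supported path and also handles the other VLANs, idempotency, and teardown):

```shell
# Sketch only -- run via scripts/netsetup-dut.sh in practice.
# Create the 802.1q subinterface on the trunk NIC and bring it up:
ip link add link eth1 name eth1.20 type vlan id 20
ip link set eth1.20 up

# Attach a Docker macvlan network to it; the NGFW (.1) is the L3 gateway:
docker network create -d macvlan \
  --subnet 172.16.0.0/16 --gateway 172.16.0.1 \
  -o parent=eth1.20 ai_forse_dut_pw
```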

Kubernetes mode (production) hosts the webservers as 20 Synthetic Personas
on VLANs 101-120 + 10 Cloned Personas on VLANs 200-209. Docker mode (dev only)
runs browser-engine/synthetic-load agents on VLANs 20/30 pointing at the K8s persona ingress —
the legacy Docker-mode VLAN 10 webserver layer was removed in v3.7.0
(K8s personas are the single source of truth for DUT webservers).

Supported NGFW hardware

| Vendor | Model examples | SNMP module | Notes |
|---|---|---|---|
| Cisco | Firepower FTD 7.x | cisco_ftd | FIREWALL-MIB + CRYPTO-ACCEL-MIB |
| Cisco | C8475-G2 IOS-XE | cisco_iosxe | No FIREWALL-MIB; uses PROCESS-MIB |
| Cisco Meraki | MX450 | cisco_meraki | IF-MIB only; Dashboard API for sessions/CPU |
| Fortinet | FortiGate FG200F / FG600F | fortinet_fortigate | fgSysCpu/Mem/SesCount |
| Palo Alto | PA-series / VM-series | palo_alto | PAN-COMMON-MIB |
| Check Point | Quantum / Gaia R81.x | checkpoint | CHECKPOINT-MIB + SVN-FOUNDATION-MIB |
| Huawei | USG / NGFW (VRP) | huawei_ngfw | HUAWEI-ENTITY-EXTENT-MIB |
| Any | Generic / unknown | generic_ngfw | IF-MIB + HOST-RESOURCES-MIB |

The Cisco Nexus 9000 switch is always monitored with the cisco_nexus module regardless of which NGFW is in use.

Prerequisites

Hardware

  • Ubuntu Linux host (Cisco UCS recommended) with a dedicated NIC for the trunk.
  • Cisco Nexus 9000 switch with 802.1q trunk port toward the Ubuntu host.
  • Nexus MGMT0 connected to a switch access port on VLAN 99.
  • NGFW DUT with interfaces in VLANs 20, 30 (data) and one on VLAN 99 (management). In Kubernetes mode (production), also configure VLANs 101–120 (Synthetic Personas) and 200–209 (Cloned Personas).
  • NGFW configured as the L3 default gateway on each data VLAN: 172.16.0.1 (VLAN 20 / browser engine), 172.17.0.1 (VLAN 30 / synthetic-load engine).
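
For reference, the Nexus-side trunk toward the Ubuntu host looks roughly like the following (an illustrative NX-OS fragment; the port number and description are placeholders for your cabling):

```
vlan 20,30,99

interface Ethernet1/1
  description trunk to Ubuntu UCS host (DUT test-bed)
  switchport mode trunk
  switchport trunk allowed vlan 20,30,99
  no shutdown
```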

Software

  • Docker Engine 24+ (not Docker Desktop — macvlan requires the Linux kernel).
  • scripts/stack-up.sh at current version.
  • Standard stacks already running (scripts/stack-up.sh up).

SNMP configuration on devices

Cisco Nexus 9000 (NX-OS):

snmp-server community public ro
snmp-server host 10.254.254.X traps version 2c public

Cisco Firepower FTD 7.x: Enable SNMP under Platform Settings → SNMP → SNMPv2c community. Ensure the snmp_exporter IP (the MGMT macvlan IP) is in the allowed hosts list.

Fortinet FortiGate:

config system snmp community
  edit 1
    set name public
    config hosts
      edit 1
        set ip 10.254.254.0 255.255.255.0
      end
    end
  next
end

Palo Alto: Device > Setup > Operations > SNMP Setup. Add the snmp_exporter MGMT IP as a permitted SNMP manager.

Check Point: Run cpconfig → SNMP Extension, or configure via SmartConsole > Gateway > SNMP.

Huawei:

snmp-agent sys-info version v2c
snmp-agent community read public
snmp-agent target-host trap-hostname prom-mgmt address 10.254.254.X udp-port 162

One-time host setup (Docker Compose mode)

# 1. Set the trunk NIC name and bring up all macvlan networks + subinterfaces
sudo DUT_DATA_IFACE=eth1 scripts/netsetup-dut.sh setup

# 2. Verify all networks are present
scripts/netsetup-dut.sh status

# Expected output (Docker mode):
#   ✓ eth1.20 (172.16.0.0/16) → ai_forse_dut_pw
#   ✓ eth1.30 (172.17.0.0/16) → ai_forse_dut_k6
#   ✓ eth1.99 (10.254.254.0/24) → ai_forse_mgmt

# For Kubernetes mode: also run netsetup-personas.sh to create
# VLANs 101-120 (Synthetic Personas) and 200-209 (Cloned Personas):
#   sudo scripts/netsetup-personas.sh setup

Environment variables

Add to .env before running up-dut:

# DUT physical devices (VLAN 99 / MGMT IPs)
SNMP_NEXUS_HOST=10.254.254.2       # Nexus 9000 MGMT0 IP
SNMP_NGFW_HOST=10.254.254.3        # NGFW management interface IP
SNMP_COMMUNITY=public            # SNMPv2c community

# SNMP module for the NGFW DUT
# Options: cisco_ftd | cisco_iosxe | cisco_meraki | fortinet_fortigate |
#          palo_alto | checkpoint | huawei_ngfw | generic_ngfw
SNMP_DUT_MODULE=cisco_ftd

# Optional DUT network overrides for Docker Compose mode (only needed if defaults clash)
# DUT_DATA_IFACE=eth1
# DUT_VLAN_PW=20      DUT_SUBNET_PW=172.16.0.0/16    DUT_GW_PW=172.16.0.1
# DUT_VLAN_K6=30      DUT_SUBNET_K6=172.17.0.0/16    DUT_GW_K6=172.17.0.1
# DUT_VLAN_MGMT=99    DUT_SUBNET_MGMT=10.254.254.0/24

Bring up / operate the DUT stack

# Full DUT stack (webservers + browser-engine + synthetic-load + observability overlay)
scripts/stack-up.sh up-dut

# Optionally scale synthetic-load engine for load testing
scripts/stack-up.sh scale-k6 10

# Status of all DUT services
scripts/stack-up.sh ps-dut

# Tear down (stops containers; keeps volumes and macvlan networks)
scripts/stack-up.sh down-dut

# Full teardown including macvlan networks
scripts/stack-up.sh destroy-dut
sudo DUT_DATA_IFACE=eth1 scripts/netsetup-dut.sh teardown

Swapping NGFW vendor

To switch from one NGFW vendor to another:

  1. Change SNMP_DUT_MODULE in .env to the new module name.
  2. Restart the DUT stack so the observability overlay picks up the new module:
    scripts/stack-up.sh down-dut
    scripts/stack-up.sh up-dut

  3. Verify in Grafana → NGFW DUT dashboard that new metrics appear within one scrape interval (15 s for the NGFW job).

Data-plane configuration (agent VLANs) doesn't change; the NGFW still needs to be the L3 gateway on VLANs 20/30.

Ubuntu host monitoring (Cisco UCS)

Physical Ubuntu hosts on the MGMT network are monitored via node_exporter.

Deploy on each Ubuntu/UCS host:

# Run as root on the Ubuntu host
docker run -d --name node_exporter \
  --net host --pid host \
  --restart unless-stopped \
  -v /:/host:ro,rslave \
  prom/node-exporter:v1.8.2 \
  --path.rootfs=/host \
  --collector.systemd \
  --collector.processes

Register the host with Prometheus:

Edit observability/prometheus/targets/ubuntu-hosts.yml:

- targets:
    - 10.254.254.10:9100   # ucs-host-01
    - 10.254.254.11:9100   # ucs-host-02
  labels:
    role: ubuntu_host
    rack: ucs

Prometheus polls the file every 30 s — no restart needed.
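
That polling corresponds to a file-based service-discovery block in the Prometheus scrape config along these lines (a sketch; the job name and mount path are assumptions based on the repo layout, not copied from prometheus.dut.yml):

```yaml
- job_name: ubuntu_hosts
  file_sd_configs:
    - files:
        - /etc/prometheus/targets/ubuntu-hosts.yml
      refresh_interval: 30s
```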

UCS chassis-level hardware health (CIMC):

The node_exporter covers OS-level metrics. For chassis sensors (temperature, fans, PSU state), enable SNMP on each server's CIMC:

Cisco IMC > Admin > Communication Services > SNMP > Enable
Community: public

Then add a second scrape job in prometheus.dut.yml using generic_ngfw module targeting the CIMC management IP, or use the Cisco UCS Grafana plugin for richer UCS Manager integration.
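
A minimal CIMC scrape job would follow the usual snmp_exporter indirection pattern, something like this hypothetical fragment (the CIMC IP, job name, and exporter address are placeholders):

```yaml
- job_name: ucs_cimc
  metrics_path: /snmp
  params:
    module: [generic_ngfw]
  static_configs:
    - targets: ['10.254.254.20']   # CIMC management IP (placeholder)
  relabel_configs:
    - source_labels: [__address__]
      target_label: __param_target
    - source_labels: [__param_target]
      target_label: instance
    - target_label: __address__
      replacement: snmp_exporter:9116   # exporter host:port (placeholder)
```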

Grafana dashboards

Three dashboards are provisioned automatically:

Nexus 9000 (dut-nexus9000)

| Row | Panels |
|---|---|
| Overview | CPU max %, memory %, max temp, temp state, fan state, PSU state, uptime, total CRC/s |
| CPU | Per-core 1-min + 5-min timeseries, gauge bar |
| Memory | Used / free stacked, largest free block (fragmentation) |
| Temperature | Per-sensor timeseries + shutdown threshold line, state gauge |
| Fans & PSU | Color-mapped state gauges (1=green…5=gray) |
| Interface Throughput | TX↑/RX↓ bps + pps (active interfaces only) |
| CRC / FCS | rate(dot3StatsFCSErrors) + alignment errors + MAC internal errors |
| Queue Drops | rate(cieIfInputQueueDrops) + output queue drops |
| Interface Table | Summary table: operStatus, speed, tx/rx bps, CRC/s |

NGFW DUT (dut-ngfw)

| Row | Panels |
|---|---|
| Overview | CPU%, mem%, temp, active sessions, CPS, total DUT throughput, uptime |
| TLS Load | Per-VLAN TX↑/RX↓ (VLAN 20/30 agent-side, VLAN 101-120/200-209 persona-side labeled), aggregate bidir bps |
| Sessions / CPS | Concurrent sessions + rate(cfwConnectionStatTotal) for CPS |
| HW Crypto Engine | Decrypted/encrypted pps, active requests, dropped/s |
| CPU / Memory | Per-core timeseries, gauge, used/free stacked |
| Temperature | Timeseries + shutdown threshold, state gauge |
| Fans & PSU | Color-mapped state gauges |
| CRC / Drops | FCS + alignment errors; queue drops (test validity indicator) |

Note: HW Crypto Engine panels are Cisco-FTD/ASA-specific. Other vendors will show "No data" on those panels — this is expected.

Ubuntu Hosts / UCS (dut-ubuntu-hosts)

| Row | Panels |
|---|---|
| Overview | CPU%, mem%, disk%, load1, uptime, kernel |
| CPU | Mode breakdown timeseries (idle hidden), load avg bar gauge |
| Memory | Breakdown stacked (used/buffers/cached/free), swap |
| Disk I/O | Throughput bps + IOPS (read↑/write↓), disk template var |
| Filesystem | Used% bar gauge, free space timeseries |
| Network | bps + pps (RX↑/TX↓), errors + drops, interface template var |
| System | Running/blocked processes, open FDs, context switches/forks |
| Docker Engine | Container state counts (if Docker running on host) |

Troubleshooting

snmp_exporter shows no metrics for a device:

  1. Verify SNMP is enabled on the device and the community matches SNMP_COMMUNITY.
  2. Check that the snmp_exporter can reach the device IP from the MGMT macvlan:

    docker exec -it <snmp_exporter_container> wget -qO- \
      "http://localhost:9116/snmp?module=cisco_nexus&target=10.254.254.2" | head -20

  3. Confirm the ai_forse_mgmt macvlan is attached to snmp_exporter:

    docker inspect <snmp_exporter_container> | jq '.[0].NetworkSettings.Networks | keys'

Prometheus shows ${SNMP_DUT_MODULE} as a literal string in targets: the --config.expand-environment-variables flag must be present in the Prometheus container command. Verify:

docker inspect <prometheus_container> | jq '.[0].Config.Cmd'

The output should include "--config.expand-environment-variables".
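
If the flag is missing, add it to the Prometheus service's command list in the DUT compose file; an illustrative fragment (the config file path is an assumption):

```yaml
prometheus:
  command:
    - '--config.file=/etc/prometheus/prometheus.dut.yml'
    - '--config.expand-environment-variables'
```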

Containers on a DUT macvlan can't reach the NGFW (no route to host):

  1. Confirm the NGFW is configured as the L3 gateway (.1) on each data VLAN.
  2. Check that the macvlan parent interface is correct (e.g., ip link show eth1.20 for Docker-mode browser-engine agents, or ip link show eth1.101 for K8s-mode Synthetic Personas).
  3. Remember that macvlan containers cannot communicate with the host via the same parent interface — the snmp_exporter therefore uses the ai_forse_mgmt macvlan (VLAN 99), which is separate from the data VLANs.

node_exporter not scraping:

  1. Check that the host IP and port in ubuntu-hosts.yml are correct.
  2. Verify node_exporter is running on the host: docker ps | grep node_exporter.
  3. Test from the Prometheus container: wget -qO- http://10.254.254.10:9100/metrics | head -5

See also