TLSStress.Art — Dual-Node Deployment on 2 UCS Servers¶
Deployment mode: dual-node — UCS-1 runs the agent fleet (browser-engine + synthetic-load) and UCS-2 runs everything else (personas, services, observability). For a single-server setup see UBUNTU_K3S_SINGLENODE_QUICKSTART_DEPLOY.en.md; for the four-server distributed topology see UBUNTU_K3S_MULTINODE_QUICKSTART_DEPLOY.en.md.
Last verified against shipping code: v3.7.0 (2026-05-12) — see ARCHITECTURE.md for the canonical 37 MÓDULOs + 7 Test Kinds + DOM/CPOS/PIE-PA safety architecture + the ZTP-prem 12/12-layer insider-operator posture (25 patent claims, Tier A/B partition, Confidential Computing, sealed audit hash-chain, K8s admission webhook, TPM 2.0 measured-boot, DLP egress monitor, behavioural anomaly detector). ADRs 0014 and 0019–0025 cover post-Freeze additions.
Goal: Deploy TLSStress.Art across exactly two Cisco UCS servers — one dedicated to load generation, the other to webservers + services + observability — to provide an intermediate scale point between single-node and four-server multi-node.
Who this is for: Network and systems engineers running an NGFW test-bed who have two UCS chassis available and want hardware isolation between the load generators and the webservers being measured.
Time estimate: 60–90 minutes for initial cluster setup; 15 minutes for teardown + redeploy.
Automated installation (recommended)¶
Run the install script once per server in this exact order (UCS-2 first because it hosts the k3s control plane):
# Clone the repository on each server
git clone https://github.com/nollagluiz/AI_forSE.git && cd AI_forSE
# UCS-2 — k3s server + personas + services + observability
sudo ./scripts/k8s-install.sh --mode=dual-server --data-iface=eth1
# ↑ Prints JOIN_TOKEN and SERVER_IP at the end — save them for UCS-1.
# UCS-1 — browser-engine + synthetic-load agents
sudo ./scripts/k8s-install.sh --mode=dual-agent \
--server-ip=<UCS-2-OOBI-IP> --token=<JOIN_TOKEN> --data-iface=eth1
# Back on UCS-2 — apply all Kubernetes manifests
sudo ./scripts/k8s-install.sh --mode=dual-apply
Dry run: append --dry-run to any command to preview without making changes.
The rest of this guide explains each step in detail and is useful for customising the setup or troubleshooting the automated installation.
Table of contents¶
- Why dual-node
- Architecture overview
- OOBI network — mandatory on both UCS
- Prerequisites per server
- Step 1 — Host preparation (both UCS)
- Step 2 — k3s cluster bootstrap
- Step 3 — VLAN setup per node
- Step 4 — Apply manifests
- Step 5 — Verify the deployment
- Observability — Grafana and Prometheus in dual-node
- Troubleshooting
- Reference — overlay file
1. Why dual-node¶
Single-node is the simplest layout but every workload competes for the same CPU, memory and NIC queues — under load you cannot tell whether the bottleneck is the NGFW or the test bed itself. Multi-node fixes that completely but requires four UCS chassis.
Dual-node is the middle ground:
- Hardware isolation: load generators (browser-engine + synthetic-load) and webservers (personas + cloned-personas) run on different physical UCS chassis, so the only contention path between them is the NGFW under test — exactly the variable being measured.
- Two roles instead of four: half the rack space, half the cabling, half the operational overhead of multi-node.
- Same observability: identical Grafana dashboards and Prometheus alerts as single- and multi-node — node_exporter runs on both UCS via DaemonSet, and the test-bed-bottleneck alert auto-adapts to the two-node topology.
Pick dual-node when you have two chassis and want clean NGFW measurements without standing up a four-server cluster.
2. Architecture overview¶
Physical topology¶
┌──────────────────────────────────────────────────┐
│ OOBI network — eth0 on BOTH UCS │
│ 10.0.0.0/24 (example) │
│ k3s API :6443 · kubelet · Prometheus scrape │
└────────┬──────────────────────┬───────────────────┘
│ │
┌────────┴────────┐ ┌────────┴────────────────────┐
│ UCS-1 │ │ UCS-2 │
│ role=agents │ │ role=ngfw-dut │
│ │ │ │
│ browser engine │ │ 20 Caddy persona pods │
│ synthetic-load engine │ │ 10 cloned-persona slots │
│ │ │ Dashboard · Postgres │
│ │ │ PgBouncer · Cloner · NFS │
│ │ │ Grafana · Prometheus │
│ │ │ SNMP exporter │
│ eth0 (OOBI) │ │ eth0 (OOBI) │
│ eth1 (trunk) │ │ eth1 (trunk) │
└────────┬─────────┘ └────────────┬─────────────────┘
│ │
VLAN 20 (172.16/16) VLAN 99 mgmt (192.168.90/24)
VLAN 30 (172.17/16) VLAN 40 ISP (DHCP via macvlan)
VLANs 101–120 (10.1.x/27)
VLANs 200–209 (10.2.x/27)
│ │
└─────────────┬─────────────┘
│
┌────────────┴────────────┐
│ Cisco Nexus 9000 │
│ VLAN trunk · ECMP │
│ DSCP AF41 · MTU 9216 │
└────────────┬────────────┘
│
┌────────────┴────────────┐
│ NGFW (Device Under Test)│
│ TLS leg-1: agents→NGFW │
│ TLS leg-2: NGFW→Caddy │
└─────────────────────────┘
Workload placement¶
| Workload | UCS-1 (role=agents) | UCS-2 (role=ngfw-dut) |
|---|---|---|
| browser-engine agents (web-agent) | ✓ | — |
| synthetic-load agents (k6-agent) | ✓ | — |
| Synthetic personas (20× Caddy) | — | ✓ |
| Cloned-persona slots (10× Caddy) | — | ✓ |
| Dashboard, Postgres, PgBouncer | — | ✓ |
| Cloner | — | ✓ |
| NFS server (cloned-sites) | — | ✓ |
| SNMP exporter | — | ✓ |
| Prometheus + Grafana | — | ✓ |
| node_exporter (per-host metrics) | ✓ DaemonSet | ✓ DaemonSet |
| node-tuning (sysctl + BBR + CPU governor) | ✓ DaemonSet (dut-data-plane=true) | ✓ DaemonSet (dut-data-plane=true) |
| cni-dhcp (cloner ISP DHCP) | — | ✓ DaemonSet |
Why role=ngfw-dut on UCS-2 (instead of role=infra)¶
The synthetic personas (personas/_generated/*/deployment.yaml), cloned-personas (k8s/clone-personas/20-slots.yaml) and SNMP exporter (k8s/dut/60-snmp-exporter.yaml) all hardcode nodeSelector: role=ngfw-dut. Reusing that label on UCS-2 avoids touching twenty-something manifests. The dual-node overlay then patches the additional service workloads (Dashboard, Postgres, PgBouncer, Cloner) — which the multi-node overlay sends to role=infra — onto the same role=ngfw-dut node.
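To see the scope of that label reuse on your checkout, a quick grep lists every manifest that pins itself to the role (the exact YAML form of the selector shown here is an assumption — adjust the pattern if your manifests differ):
grep -rl "role: ngfw-dut" personas/_generated k8s/clone-personas k8s/dut
# Every file listed must land on UCS-2 — which is why UCS-2 carries the ngfw-dut label.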
3. OOBI network — mandatory on both UCS¶
The OOBI (out-of-band infrastructure) network is the dedicated control-plane segment carried over eth0 on every UCS, regardless of deployment mode.
What runs on OOBI¶
- k3s API server (:6443) — UCS-1 reaches it on UCS-2 to join the cluster
- kubelet (:10250) — UCS-2 scrapes UCS-1 for pod status
- flannel (CNI) — pod-to-pod traffic across nodes uses VXLAN over eth0
- Prometheus scrape — node_exporter, cAdvisor and kube-state-metrics are all reached over eth0
- cert-manager renewals — webhook + ACME challenge traffic
- Dashboard / kubectl — operator access
What does NOT run on OOBI¶
- NGFW test traffic — that is on the eth1 VLAN trunk (data plane)
- SNMP polling — VLAN 99 (192.168.90.0/24) on eth1.99
- Cloner internet egress — VLAN 40 on eth1.40
Requirements per UCS¶
| Requirement | UCS-1 (agents) | UCS-2 (services) |
|---|---|---|
| eth0 IP | static, OOBI subnet | static, OOBI subnet |
| Reachable from peer eth0 | ✓ (kubelet ↔ control plane) | ✓ |
| MTU | 1500 (default) | 1500 (default) |
| Default route via OOBI | ✓ — internet egress for image pulls and k3s install | ✓ |
If eth0 is missing on either UCS, the k3s control plane and the flannel overlay cannot come up, UCS-1 cannot join the cluster, and Prometheus on UCS-2 cannot scrape host metrics from UCS-1. The pre-flight check in k8s-install.sh rejects both dual-server and dual-agent if eth0 is absent.
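A quick manual sanity check before running the installer (the addresses are whatever you assigned on your OOBI subnet):
# On each UCS — eth0 must already carry its static OOBI address
ip -br addr show eth0
# From UCS-1 — the future control plane must be reachable over OOBI
ping -c 3 <UCS-2-OOBI-IP>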
4. Prerequisites per server¶
Hardware¶
| Resource | UCS-1 | UCS-2 |
|---|---|---|
| CPU | ≥ 16 physical cores | ≥ 32 physical cores |
| RAM | ≥ 64 GB | ≥ 128 GB |
| NICs | eth0 (OOBI) + eth1 (trunk to Nexus) | eth0 (OOBI) + eth1 (trunk to Nexus) |
| Disk | ≥ 200 GB free on / | ≥ 500 GB free on / (NFS + Postgres + Prometheus retention) |
UCS-2 is the heavier node — it runs personas, all services, and the observability stack. Plan for 2× the resources of UCS-1.
Operating system¶
- Ubuntu 22.04 LTS or 24.04 LTS, amd64
- root access (or sudo)
- internet access via OOBI default route during install (k3s + Helm + container images)
Network — Nexus 9000¶
VLANs that must be configured on the trunk to both UCS:
| VLAN | UCS-1 | UCS-2 | Subnet | Use |
|---|---|---|---|---|
| 20 | ✓ | — | 172.16.0.0/16 | browser-engine agents |
| 30 | ✓ | — | 172.17.0.0/16 | synthetic-load agents |
| 40 | — | ✓ | DHCP from ISP | Cloner internet egress (macvlan) |
| 99 | — | ✓ | 192.168.90.0/24 | SNMP polling — Nexus (.2), NGFW (.3) |
| 101–120 | — | ✓ | 10.1.{1..20}.0/27 | Synthetic persona webservers |
| 200–209 | — | ✓ | 10.2.{1..10}.0/27 | Cloned persona slots |
Strictly, only VLANs 20 and 30 need to reach UCS-1 and only VLANs 40, 99, 101–120 and 200–209 need to reach UCS-2 — but the Nexus trunk can simply carry all of them on both ports and let each side ignore the ones it does not consume, so the trunk configuration is identical for the two ports (see the sketch below).
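For reference, a minimal NX-OS trunk sketch for the two UCS-facing ports — the interface IDs are placeholders for your own cabling, and the VLAN list matches the table above:
vlan 20,30,40,99,101-120,200-209
interface Ethernet1/1
  description trunk to UCS-1 eth1
  switchport
  switchport mode trunk
  switchport trunk allowed vlan 20,30,40,99,101-120,200-209
  mtu 9216
interface Ethernet1/2
  description trunk to UCS-2 eth1
  switchport
  switchport mode trunk
  switchport trunk allowed vlan 20,30,40,99,101-120,200-209
  mtu 9216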
Step 1 — Host preparation (both UCS)¶
Identical preparation on UCS-1 and UCS-2:
sudo apt-get update
sudo apt-get install -y --no-install-recommends \
curl git jq openssl ca-certificates gnupg lsb-release \
vlan iproute2 ethtool net-tools
# Verify both NICs
ip link show eth0
ip link show eth1
Then run the project's host-tuning script on both UCS — sysctls for high-fan-out QUIC + TCP, BBR + FQ, CPU governor, conntrack, ports:
sudo scripts/host-tuning.sh apply
sudo scripts/host-tuning.sh status
The apply step is idempotent and persists the settings via systemd-sysctl drop-ins. During every measurement run, the dashboard's TestBedSysctlMissing alert fires if either UCS has regressed to kernel defaults.
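A quick spot check that the tuning took effect — BBR and FQ are stated targets of the script; the performance governor shown here is an assumption, so treat the status output as authoritative:
sysctl net.ipv4.tcp_congestion_control        # expect: bbr
sysctl net.core.default_qdisc                 # expect: fq
cat /sys/devices/system/cpu/cpu0/cpufreq/scaling_governor   # e.g. performance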
Step 2 — k3s cluster bootstrap¶
UCS-2 first (it is the k3s server)¶
sudo ./scripts/k8s-install.sh --mode=dual-server --data-iface=eth1
The script:
- Pre-flights eth0 + eth1 + ISP interface
- Creates VLAN subinterfaces 40, 99, 101–120, 200–209 on eth1
- Brings up the eth1.40 ISP subinterface and persists it via netplan
- Installs k3s server, binds the API to eth0, disables Traefik and ServiceLB
- Installs Helm, cert-manager, Multus
- Labels the node role=ngfw-dut + dut-data-plane=true
- Prints the JOIN_TOKEN and SERVER_IP
Save the printed credentials — UCS-1 needs them.
UCS-1 (agents)¶
sudo ./scripts/k8s-install.sh --mode=dual-agent \
--server-ip=<UCS-2-OOBI-IP> --token=<JOIN_TOKEN> --data-iface=eth1
The script:
- Pre-flights eth0 + eth1 (both mandatory)
- Creates VLAN subinterfaces 20, 30 on eth1
- Joins k3s as agent over eth0 → UCS-2:6443
- Labels itself role=agents + dut-data-plane=true
Verify from UCS-2:
kubectl get nodes -o wide
# Expect (abridged):
# NAME    STATUS   ROLES                  AGE   VERSION
# ucs-1   Ready    <none>                 1m    v1.31.x
# ucs-2   Ready    control-plane,master   3m    v1.31.x
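-o wide does not print labels; to confirm the role pinning the install script applied on both nodes:
kubectl get nodes -L role,dut-data-plane
# Expect role=agents on ucs-1, role=ngfw-dut on ucs-2, and dut-data-plane=true on both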
Step 3 — VLAN setup per node¶
The install script performs this automatically. To verify:
UCS-1:
ip -br link | grep eth1\\.
# eth1.20 UP (172.16.0.1/16)
# eth1.30 UP (172.17.0.1/16)
UCS-2:
ip -br link | grep eth1\\.
# eth1.40 UP (no IP — DHCP via cloner pod macvlan)
# eth1.99 UP (192.168.90.1/24)
# eth1.101 UP (10.1.1.1/27)
# … through eth1.120 (10.1.20.1/27)
# eth1.200 UP (10.2.1.1/27)
# … through eth1.209 (10.2.10.1/27)
VLAN persistence across reboots is handled by netplan (script writes /etc/netplan/99-*.yaml).
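To see exactly what was persisted, read the generated file; the stanza sketched in the comments is only an illustration of the shape a netplan VLAN entry takes — the file the script actually writes is authoritative:
cat /etc/netplan/99-*.yaml
# Expect a vlans: section roughly of this shape (illustrative, UCS-1 VLAN 20 shown):
#   vlans:
#     eth1.20:
#       id: 20
#       link: eth1
#       addresses: [172.16.0.1/16]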
Step 4 — Apply manifests¶
From UCS-2:
# Create the NGFW CA configmap (once)
kubectl create configmap ngfw-ca -n web-agents --from-file=ngfw-ca.crt=<path-to-cert>
# Create application secrets from .env
kubectl create secret generic web-agent-secrets -n web-agents --from-env-file=.env
# Apply everything
sudo ./scripts/k8s-install.sh --mode=dual-apply
This applies:
- kubectl apply -k overlays/dual-node/ — base k8s manifests with browser-engine/synthetic-load pinned to role=agents and Dashboard/Postgres/PgBouncer/Cloner pinned to role=ngfw-dut
- kubectl apply -k k8s/dut/ — DUT overlay (NADs, SNMP probes, NFS server, node-tuning DaemonSet, cAdvisor ServiceMonitor, infra Prometheus rules)
- Patches the node-tuning DaemonSet nodeSelector to dut-data-plane=true (runs on both UCS)
- Patches browser-engine + synthetic-load deployments via 40-playwright-patch.yaml / 50-k6-patch.yaml — Multus net1 macvlan annotation, NGFW CA trust, REJECT_INVALID_CERTS=true
- Re-pins browser-engine + synthetic-load to role=agents (the patch files default to role=ngfw-dut for single-node compatibility — dual-node overrides)
- Waits for all pods to become Ready
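To preview what that apply will do, you can render the overlay locally and confirm the pinning first — the grep patterns assume the selectors render as plain role: values in the output:
kubectl kustomize overlays/dual-node/ | grep -B4 "role: agents"     # web-agent, k6-agent
kubectl kustomize overlays/dual-node/ | grep -B4 "role: ngfw-dut"   # dashboard, postgres, pgbouncer, cloner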
Step 5 — Verify the deployment¶
# Pods on UCS-1 (agents)
kubectl get pods -n web-agents -o wide --field-selector spec.nodeName=ucs-1
# Expect: web-agent-* and k6-agent-* only
# Pods on UCS-2 (services + personas)
kubectl get pods -n web-agents -o wide --field-selector spec.nodeName=ucs-2
# Expect: dashboard, postgres, pgbouncer, cloner, nfs-server,
# persona-shop-*, persona-news-*, …, clone-persona-1-*, …
# Synthetic + cloned personas in their own namespaces
kubectl get pods --all-namespaces -o wide | grep -E "persona-|clone-persona-"
# Multus net1 attached on agents
kubectl exec -n web-agents deploy/web-agent -- ip -br addr | grep net1
# Expect a 172.16.x.x address
# Check NFS server is reachable from cloner
kubectl exec -n web-agents deploy/cloner -- mount | grep nfs
# End-to-end smoke test — agents hit a persona via the NGFW
kubectl exec -n web-agents deploy/web-agent -- \
curl -sf https://shop.persona.local/ | head -1
6. Observability — Grafana and Prometheus in dual-node¶
Observability runs entirely on UCS-2 (role=ngfw-dut):
- Prometheus — scrapes both UCS over OOBI (eth0): port :9100 for node_exporter, kubelet :10250 for cAdvisor, and :8080 for kube-state-metrics
- Grafana — the Test-Bed Infrastructure Health dashboard auto-adapts to the two-node topology via per-host filters and count(node_exporter) recording rules
- Alerts — the composite TestBedInfrastructureBottleneck, host-level (HostUDPBufferOverflow, HostConntrackNearFull, etc.) and pod-level (PodCPUThrottled, OOMKilled) alerts all work out of the box. See MONITORING_TEST_VALIDITY.md for the full alert catalogue
The node_exporter DaemonSet has no nodeSelector and a blanket operator: Exists toleration, so it covers both UCS without further configuration. The NodeExporterCoverageIncomplete alert fires automatically if either node stops reporting host metrics.
To open Grafana from your operator workstation (assuming you have kubectl access to UCS-2):
kubectl port-forward -n web-agents svc/grafana 3000:3000
# then visit http://localhost:3000
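Prometheus can be reached the same way; the service name below is assumed to follow the same convention as Grafana's — adjust it to whatever kubectl get svc -n web-agents reports:
kubectl port-forward -n web-agents svc/prometheus 9090:9090
# then visit http://localhost:9090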
7. Troubleshooting¶
UCS-1 fails to join the cluster¶
Symptoms: dual-agent install hangs at Joining k3s cluster as agent.
# On UCS-1 — check OOBI reachability to UCS-2
ping <UCS-2-OOBI-IP>
nc -vz <UCS-2-OOBI-IP> 6443
# On UCS-1 — check k3s-agent service status and logs
sudo systemctl status k3s-agent
sudo journalctl -u k3s-agent -n 100
Most common causes: OOBI subnet mismatch, firewall blocking 6443, or wrong JOIN_TOKEN.
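Two quick checks for the last two causes — re-read the token from the standard k3s server path, and (only if ufw is active on UCS-2) open the API port to UCS-1:
# On UCS-2 — the join token k3s generated at install time
sudo cat /var/lib/rancher/k3s/server/node-token
# On UCS-2 — only if ufw is enabled
sudo ufw allow from <UCS-1-OOBI-IP> to any port 6443 proto tcp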
browser-engine pods stay in Pending¶
kubectl describe pod -n web-agents -l app=web-agent | grep -A4 Events
If you see node(s) didn't match Pod's node affinity/selector, the agent re-pinning step did not run. Apply it manually:
kubectl patch deployment web-agent -n web-agents --type=strategic \
  -p '{"spec":{"template":{"spec":{"nodeSelector":{"role":"agents"}}}}}'
kubectl patch deployment k6-agent -n web-agents --type=strategic \
  -p '{"spec":{"template":{"spec":{"nodeSelector":{"role":"agents"}}}}}'
Personas stay in Pending¶
kubectl describe pods -n persona-shop | grep -A4 Events
If node(s) didn't match again, UCS-2 is not labeled role=ngfw-dut. Re-label:
kubectl label node ucs-2 role=ngfw-dut dut-data-plane=true --overwrite
Cloner cannot reach the internet¶
Cloner uses eth1.40 macvlan with DHCP. Verify on UCS-2:
ip link show eth1.40
sudo dhclient -v eth1.40 # only as a manual diagnostic — pod has its own DHCP
If eth1.40 is missing, re-run setup_isp_iface (it is part of dual-server).
node_exporter only reports one node¶
kubectl get pods -n web-agents -l app.kubernetes.io/name=node-exporter -o wide
Both ucs-1 and ucs-2 should appear. If not, check:
kubectl describe daemonset node-exporter -n web-agents | grep -A3 Tolerations
Tolerations should be operator: Exists (matches every taint). If a node has a custom taint blocking it, add a toleration or remove the taint.
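For example, a blanket toleration can be restored with a merge patch (this replaces the DaemonSet's toleration list wholesale, which is the desired end state here):
kubectl patch daemonset node-exporter -n web-agents --type=merge \
  -p '{"spec":{"template":{"spec":{"tolerations":[{"operator":"Exists"}]}}}}'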
8. Reference — overlay file¶
The dual-node overlay lives in overlays/dual-node/kustomization.yaml. It inherits from k8s/ and adds six strategic-merge patches:
patches:
# browser-engine + K6 → UCS-1
- target: { kind: Deployment, name: web-agent } # role=agents
- target: { kind: Deployment, name: k6-agent } # role=agents
# Services → UCS-2 (reuses role=ngfw-dut)
- target: { kind: Deployment, name: dashboard } # role=ngfw-dut
- target: { kind: StatefulSet, name: postgres } # role=ngfw-dut
- target: { kind: Deployment, name: pgbouncer } # role=ngfw-dut
- target: { kind: Deployment, name: cloner } # role=ngfw-dut
To apply manually (without the install script):
kubectl apply -k overlays/dual-node/
kubectl apply -k k8s/dut/
kubectl patch daemonset node-tuning -n web-agents \
  --type=strategic \
  -p '{"spec":{"template":{"spec":{"nodeSelector":{"dut-data-plane":"true"}}}}}'
See also¶
- UBUNTU_K3S_SINGLENODE_QUICKSTART_DEPLOY.en.md — single-server alternative
- UBUNTU_K3S_MULTINODE_QUICKSTART_DEPLOY.en.md — four-server alternative
- MONITORING_TEST_VALIDITY.md — observability + test-validity alerts
- PERFORMANCE_TUNING_HOST.md — host kernel tuning
- SYSTEM_OVERVIEW.md — architecture reference for all three deployment modes