TLSStress.Art — Tri-Node Deployment on 3 UCS Servers¶
Deployment mode: tri-node — UCS-1 runs browser-engine agents only, UCS-2 runs synthetic-load agents only, and UCS-3 runs personas + services + observability. For a single-server setup see
UBUNTU_K3S_SINGLENODE_QUICKSTART_DEPLOY.en.md; for the two-server topology see UBUNTU_K3S_DUALNODE_QUICKSTART_DEPLOY.en.md; for the four-server distributed layout see UBUNTU_K3S_MULTINODE_QUICKSTART_DEPLOY.en.md.
Goal: Deploy TLSStress.Art across exactly three Cisco UCS servers — one dedicated to browser-engine load generation, one dedicated to synthetic-load-engine load generation, and one dedicated to webservers + services + observability — to provide a middle scale point between dual-node and four-server multi-node and to isolate the two agent runtimes from each other.
Last verified against shipping code: v3.7.0 (2026-05-12) — See ARCHITECTURE.md for the canonical 37 modules + 7 Test Kinds + DOM/CPOS/PIE-PA safety architecture + the ZTP-prem 12/12-layer insider-operator posture (25 patent claims, Tier A/B partition, Confidential Computing, sealed audit hash-chain, K8s admission webhook, TPM 2.0 measured-boot, DLP egress monitor, behavioural anomaly detector). ADRs 0014 and 0019-0025 cover post-Freeze additions.
Who this is for: Network and systems engineers running an NGFW test-bed who have three UCS chassis available and want hardware isolation between browser-engine agents, synthetic-load agents, and webservers without standing up a fourth chassis for services.
Time estimate: 75–105 minutes for initial cluster setup; 15 minutes for teardown + redeploy.
Automated installation (recommended)¶
Run the install script once per server in this exact order (UCS-3 first because it hosts the k3s control plane):
# Clone the repository on each server
git clone https://github.com/nollagluiz/AI_forSE.git && cd AI_forSE
# UCS-3 — k3s server + personas + services + observability
sudo ./scripts/k8s-install.sh --mode=tri-server --data-iface=eth1
# ↑ Prints JOIN_TOKEN and SERVER_IP at the end — save them.
# UCS-1 — browser-engine agents only
sudo ./scripts/k8s-install.sh --mode=tri-agent --role=playwright \
--server-ip=<UCS-3-OOBI-IP> --token=<JOIN_TOKEN> --data-iface=eth1
# UCS-2 — synthetic-load agents only
sudo ./scripts/k8s-install.sh --mode=tri-agent --role=k6 \
--server-ip=<UCS-3-OOBI-IP> --token=<JOIN_TOKEN> --data-iface=eth1
# Back on UCS-3 — apply all Kubernetes manifests
sudo ./scripts/k8s-install.sh --mode=tri-apply
Dry run: append --dry-run to any command to preview without making changes.
The rest of this guide explains each step in detail and is useful for customising the setup or troubleshooting the automated installation.
Table of contents¶
- Why tri-node
- Architecture overview
- OOBI network — mandatory on all three UCS
- Prerequisites per server
- Step 1 — Host preparation (all three UCS)
- Step 2 — k3s cluster bootstrap
- Step 3 — VLAN setup per node
- Step 4 — Apply manifests
- Step 5 — Verify the deployment
- Observability — Grafana and Prometheus in tri-node
- Troubleshooting
- Reference — overlay file
1. Why tri-node¶
Choose tri-node when:
- You have three UCS chassis available and want hardware isolation between browser-engine agents, synthetic-load agents, and webservers.
- The two agent runtimes shouldn't compete for resources. The browser engine spawns headless Chromium processes (memory- and IO-heavy); the synthetic-load engine runs a single Go process emitting tens of thousands of HTTP/2 + HTTP/3 requests (CPU- and connection-heavy). Sharing one host means whichever runtime spikes first throttles the other, and your NGFW measurements become noisy.
- You don't have a fourth chassis for the services tier (which would be the multi-node layout).
Compared to the other modes:
| | Single | Dual | Tri | Multi |
|---|---|---|---|---|
| UCS count | 1 | 2 | 3 | 4 |
| browser-engine + synthetic-load share a host | yes | yes (both on UCS-1) | no — separate UCS | no |
| Services on the agents host | yes | no | no | no |
| Hardware isolation between browser-engine and synthetic-load | — | — | ✓ | ✓ |
| Hardware isolation between agents and webservers | — | ✓ | ✓ | ✓ |
| Dedicated services chassis | — | — | — | ✓ |
2. Architecture overview¶
Physical topology¶
┌─────────────────────────────────────────────────────────────────────┐
│                OOBI network — eth0 on ALL three UCS                 │
│                        10.0.0.0/24 (example)                        │
│             k3s API :6443 · kubelet · Prometheus scrape             │
└────────┬──────────────────────┬─────────────────────────┬───────────┘
         │                      │                         │
┌────────┴────────┐    ┌────────┴────────┐    ┌───────────┴───────────┐
│      UCS-1      │    │      UCS-2      │    │         UCS-3         │
│ role=playwright │    │     role=k6     │    │     role=ngfw-dut     │
│                 │    │                 │    │                       │
│ browser-engine  │    │ synthetic-load  │    │ 20 Caddy persona pods │
│ agents only     │    │ agents only     │    │ 10 cloned-persona     │
│                 │    │                 │    │   slots               │
│                 │    │                 │    │ Dashboard · Postgres  │
│                 │    │                 │    │ PgBouncer · Cloner    │
│                 │    │                 │    │ NFS · Grafana         │
│                 │    │                 │    │ Prometheus            │
│                 │    │                 │    │ SNMP exporter         │
│ eth0 (OOBI)     │    │ eth0 (OOBI)     │    │ eth0 (OOBI)           │
│ eth1 (trunk)    │    │ eth1 (trunk)    │    │ eth1 (trunk)          │
└────────┬────────┘    └────────┬────────┘    └───────────┬───────────┘
         │                      │                         │
VLAN 20 (172.16/16)    VLAN 30 (172.17/16)  VLAN 99 mgmt (192.168.90/24)
                                            VLAN 40 ISP (DHCP via macvlan)
                                            VLANs 101–120 (10.1.x/27)
                                            VLANs 200–209 (10.2.x/27)
         │                      │                         │
         └──────────────────────┴────────────┬────────────┘
                                             │
                                ┌────────────┴────────────┐
                                │    Cisco Nexus 9000     │
                                │    VLAN trunk · ECMP    │
                                │  DSCP AF41 · MTU 9216   │
                                └────────────┬────────────┘
                                             │
                                ┌────────────┴────────────┐
                                │ NGFW (Device Under Test)│
                                │ TLS leg-1: agents→NGFW  │
                                │ TLS leg-2: NGFW→Caddy   │
                                └─────────────────────────┘
Workload placement¶
| Workload | UCS-1 (role=playwright) | UCS-2 (role=k6) | UCS-3 (role=ngfw-dut) |
|---|---|---|---|
| browser-engine agents (web-agent) | ✓ | — | — |
| synthetic-load agents (k6-agent) | — | ✓ | — |
| Synthetic personas (20× Caddy) | — | — | ✓ |
| Cloned-persona slots (10× Caddy) | — | — | ✓ |
| Dashboard, Postgres, PgBouncer | — | — | ✓ |
| Cloner | — | — | ✓ |
| NFS server (cloned-sites) | — | — | ✓ |
| SNMP exporter | — | — | ✓ |
| Prometheus + Grafana | — | — | ✓ |
| node_exporter (per-host metrics) | ✓ DaemonSet | ✓ DaemonSet | ✓ DaemonSet |
| node-tuning (sysctl + BBR + CPU governor) | ✓ DaemonSet (dut-data-plane=true) | ✓ DaemonSet (dut-data-plane=true) | ✓ DaemonSet (dut-data-plane=true) |
| cni-dhcp (cloner ISP DHCP) | — | — | ✓ DaemonSet |
Why role=ngfw-dut on UCS-3 (instead of role=infra)¶
The synthetic personas (personas/_generated/*/deployment.yaml), cloned-personas (k8s/clone-personas/20-slots.yaml) and SNMP exporter (k8s/dut/60-snmp-exporter.yaml) all hardcode nodeSelector: role=ngfw-dut. Reusing that label on UCS-3 avoids touching twenty-something manifests. The tri-node overlay then patches the additional service workloads (Dashboard, Postgres, PgBouncer, Cloner) — which the multi-node overlay sends to role=infra — onto the same role=ngfw-dut node. This is exactly the same scheme the dual-node overlay uses for UCS-2.
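For reference, the hardcoded pin in each generated persona deployment has the standard Kubernetes shape — an illustrative excerpt with surrounding fields omitted:
spec:
  template:
    spec:
      nodeSelector:
        role: ngfw-dut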
3. OOBI network — mandatory on all three UCS¶
The OOBI (out-of-band infrastructure) network is the dedicated control-plane segment carried over eth0 on every UCS, regardless of deployment mode.
What runs on OOBI¶
- k3s API server (:6443) — UCS-1 and UCS-2 reach it on UCS-3 to join the cluster
- kubelet (:10250) — UCS-3 scrapes UCS-1 and UCS-2 for pod status
- flannel (CNI) — pod-to-pod traffic across nodes uses VXLAN over eth0
- Prometheus scrape — node_exporter, cAdvisor, kube-state-metrics are all reached over eth0
- cert-manager renewals — webhook + ACME challenge traffic
- Dashboard / kubectl — operator access
What does NOT run on OOBI¶
- NGFW test traffic — that is on the eth1 VLAN trunk (data plane)
- SNMP polling — VLAN 99 (192.168.90.0/24) on eth1.99
- Cloner internet egress — VLAN 40 on eth1.40
Requirements per UCS¶
| | UCS-1 (browser engine) | UCS-2 (synthetic-load engine) | UCS-3 (services) |
|---|---|---|---|
| eth0 IP | static, OOBI subnet | static, OOBI subnet | static, OOBI subnet |
| Reachable from peer eth0 | ✓ | ✓ | ✓ |
| MTU | 1500 (default) | 1500 (default) | 1500 (default) |
| Default route via OOBI | ✓ — internet egress for image pulls and k3s install | ✓ | ✓ |
If eth0 is missing on any UCS, k3s flannel cannot establish the control plane, the affected agent node cannot join the cluster, and Prometheus on UCS-3 cannot scrape host metrics from the missing node. The pre-flight check in k8s-install.sh rejects all three modes (tri-server, tri-agent, and the per-node validation in tri-apply) if eth0 is absent.
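A minimal netplan drop-in satisfying the static-eth0 requirement might look like this — the file name, addresses and gateway below are placeholders, not values the installer writes; adapt them to your OOBI subnet:
# /etc/netplan/01-oobi.yaml (example only)
network:
  version: 2
  ethernets:
    eth0:
      addresses: [10.0.0.3/24]      # e.g. UCS-3 in the example OOBI subnet
      routes:
        - to: default
          via: 10.0.0.1             # OOBI gateway — internet egress for pulls
      nameservers:
        addresses: [10.0.0.1]
Apply with sudo netplan apply, then confirm the peers answer ping over eth0 before starting the install.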
4. Prerequisites per server¶
Hardware¶
| | UCS-1 (browser engine) | UCS-2 (synthetic-load engine) | UCS-3 (services) |
|---|---|---|---|
| CPU | ≥ 16 physical cores | ≥ 16 physical cores | ≥ 32 physical cores |
| RAM | ≥ 64 GB | ≥ 32 GB | ≥ 128 GB |
| NICs | eth0 (OOBI) + eth1 (trunk) | eth0 (OOBI) + eth1 (trunk) | eth0 (OOBI) + eth1 (trunk) |
| Disk | ≥ 100 GB free on / | ≥ 100 GB free on / | ≥ 500 GB free on / (NFS + Postgres + Prometheus retention) |
UCS-3 is the heaviest node. Of the two agent hosts, the browser-engine node (UCS-1) generally needs more headroom than the synthetic-load node (UCS-2) because each browser-engine agent spawns its own headless Chromium.
Operating system¶
- Ubuntu 22.04 LTS or 24.04 LTS, amd64
- root access (or sudo)
- internet access via OOBI default route during install (k3s + Helm + container images)
Network — Nexus 9000¶
VLANs that must be configured on the trunk to all three UCS:
| VLAN | UCS-1 | UCS-2 | UCS-3 | Subnet | Use |
|---|---|---|---|---|---|
| 20 | ✓ | — | — | 172.16.0.0/16 | browser-engine agents |
| 30 | — | ✓ | — | 172.17.0.0/16 | synthetic-load agents |
| 40 | — | — | ✓ | DHCP from ISP | Cloner internet egress (macvlan) |
| 99 | — | — | ✓ | 192.168.90.0/24 | SNMP polling — Nexus (.2), NGFW (.3) |
| 101–120 | — | — | ✓ | 10.1.{1..20}.0/27 | Synthetic persona webservers |
| 200–209 | — | — | ✓ | 10.2.{1..10}.0/27 | Cloned persona slots |
Strictly, VLAN 20 is consumed only by UCS-1, VLAN 30 only by UCS-2, and VLANs 40, 99, 101–120 and 200–209 only by UCS-3. In practice the Nexus trunk can carry all of them to all three ports and let each host ignore the VLANs it does not consume — the trunk configuration is then identical for the three ports.
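As a sketch, that identical trunk configuration on the three Nexus ports could look like this (NX-OS syntax; the port range is a placeholder for your actual cabling):
interface Ethernet1/1-3
  switchport mode trunk
  switchport trunk allowed vlan 20,30,40,99,101-120,200-209
  mtu 9216
  no shutdown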
Step 1 — Host preparation (all three UCS)¶
Identical preparation on UCS-1, UCS-2 and UCS-3:
sudo apt-get update
sudo apt-get install -y --no-install-recommends \
curl git jq openssl ca-certificates gnupg lsb-release \
vlan iproute2 ethtool net-tools
# Verify both NICs
ip link show eth0
ip link show eth1
Then run the project's host-tuning script on every UCS — sysctls for high-fan-out QUIC + TCP, BBR + FQ, CPU governor, conntrack, ports:
sudo scripts/host-tuning.sh apply
sudo scripts/host-tuning.sh status
The apply step is idempotent and persists via systemd-sysctl drop-ins. During every measurement run, the dashboard's TestBedSysctlMissing alert fires if any UCS has regressed to kernel defaults.
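To spot-check the tuned values manually on any UCS (these are the standard kernel keys behind the BBR + FQ tuning named above):
sysctl net.ipv4.tcp_congestion_control   # expect: bbr
sysctl net.core.default_qdisc            # expect: fq
sysctl net.netfilter.nf_conntrack_max    # expect: raised well above the default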
Step 2 — k3s cluster bootstrap¶
UCS-3 first (it is the k3s server)¶
sudo ./scripts/k8s-install.sh --mode=tri-server --data-iface=eth1
The script:
- Pre-flights eth0 + eth1 + the ISP interface
- Creates VLAN subinterfaces 40, 99, 101–120 and 200–209 on eth1
- Brings up the eth1.40 ISP subinterface and persists it via netplan
- Installs k3s server, binds the API to eth0, disables Traefik and ServiceLB
- Installs Helm, cert-manager and Multus
- Labels the node role=ngfw-dut + dut-data-plane=true
- Prints the JOIN_TOKEN and SERVER_IP
Save the printed credentials — UCS-1 and UCS-2 need them.
UCS-1 (browser engine)¶
sudo ./scripts/k8s-install.sh --mode=tri-agent --role=playwright \
--server-ip=<UCS-3-OOBI-IP> --token=<JOIN_TOKEN> --data-iface=eth1
The script:
- Pre-flights eth0 + eth1
- Creates VLAN subinterface 20 on eth1 (172.16.0.1/16)
- Joins k3s as an agent over eth0 → UCS-3:6443
- Labels itself role=playwright + dut-data-plane=true
UCS-2 (synthetic-load engine)¶
sudo ./scripts/k8s-install.sh --mode=tri-agent --role=k6 \
--server-ip=<UCS-3-OOBI-IP> --token=<JOIN_TOKEN> --data-iface=eth1
Same as UCS-1 but creates VLAN 30 (172.17.0.1/16) and labels itself role=k6.
Verify all three nodes joined¶
From UCS-3:
kubectl get nodes -o wide
# Expect:
# NAME STATUS ROLES AGE VERSION LABELS
# ucs-1 Ready agent 2m v1.31.x role=playwright,dut-data-plane=true,...
# ucs-2 Ready agent 1m v1.31.x role=k6,dut-data-plane=true,...
# ucs-3 Ready master 5m v1.31.x role=ngfw-dut,dut-data-plane=true,...
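To confirm just the labels the overlays rely on, without wading through the full label list, kubectl's label-column flag helps (output abbreviated):
kubectl get nodes -L role,dut-data-plane
# NAME    STATUS  ...  ROLE         DUT-DATA-PLANE
# ucs-1   Ready   ...  playwright   true
# ucs-2   Ready   ...  k6           true
# ucs-3   Ready   ...  ngfw-dut     true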
Step 3 — VLAN setup per node¶
The install script performs this automatically. To verify:
UCS-1 (browser engine):
ip -br link | grep 'eth1\.'
# eth1.20 UP (172.16.0.1/16)
UCS-2 (synthetic-load engine):
ip -br link | grep 'eth1\.'
# eth1.30 UP (172.17.0.1/16)
UCS-3 (services):
ip -br link | grep 'eth1\.'
# eth1.40 UP (no IP — DHCP via cloner pod macvlan)
# eth1.99 UP (192.168.90.1/24)
# eth1.101 UP (10.1.1.1/27)
# … through eth1.120 (10.1.20.1/27)
# eth1.200 UP (10.2.1.1/27)
# … through eth1.209 (10.2.10.1/27)
VLAN persistence across reboots is handled by netplan (script writes /etc/netplan/99-*.yaml).
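The generated files follow the standard netplan VLAN shape — an illustrative excerpt for UCS-1 (the script writes the real content; do not hand-edit it):
# shape of the script-written /etc/netplan/99-*.yaml on UCS-1
network:
  version: 2
  vlans:
    eth1.20:
      id: 20
      link: eth1
      addresses: [172.16.0.1/16]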
Step 4 — Apply manifests¶
From UCS-3:
# Create the NGFW CA configmap (once)
kubectl create configmap ngfw-ca -n web-agents --from-file=ngfw-ca.crt=<path-to-cert>
# Create application secrets from .env
kubectl create secret generic web-agent-secrets -n web-agents --from-env-file=.env
# Apply everything
sudo ./scripts/k8s-install.sh --mode=tri-apply
This applies:
- kubectl apply -k overlays/tri-node/ — base k8s manifests with the browser engine pinned to role=playwright, the synthetic-load engine pinned to role=k6, and Dashboard/Postgres/PgBouncer/Cloner pinned to role=ngfw-dut
- kubectl apply -k k8s/dut/ — DUT overlay (NADs, SNMP probes, NFS server, node-tuning DaemonSet, cAdvisor ServiceMonitor, infra Prometheus rules)
- Patches the node-tuning DaemonSet nodeSelector to dut-data-plane=true (runs on all three UCS)
- Patches the browser-engine + synthetic-load deployments via 40-playwright-patch.yaml / 50-k6-patch.yaml — Multus net1 macvlan annotation, NGFW CA trust, REJECT_INVALID_CERTS=true
- Re-pins the browser engine to role=playwright and the synthetic-load engine to role=k6 (the patch files default to role=ngfw-dut for single-node compatibility — tri-node overrides per runtime)
- Waits for all pods to become Ready
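To preview what the overlay will apply before (or instead of) running tri-apply, render it locally and inspect the pins — a quick sanity check, with the exact grep hits depending on manifest layout:
kubectl kustomize overlays/tri-node/ > /tmp/tri-node-rendered.yaml
grep -n 'role:' /tmp/tri-node-rendered.yaml
# expect playwright, k6 and ngfw-dut pins — nothing else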
Step 5 — Verify the deployment¶
# Pods on UCS-1 (browser engine)
kubectl get pods -n web-agents -o wide --field-selector spec.nodeName=ucs-1
# Expect: web-agent-* only
# Pods on UCS-2 (synthetic-load engine)
kubectl get pods -n web-agents -o wide --field-selector spec.nodeName=ucs-2
# Expect: k6-agent-* only
# Pods on UCS-3 (services + personas)
kubectl get pods -n web-agents -o wide --field-selector spec.nodeName=ucs-3
# Expect: dashboard, postgres, pgbouncer, cloner, nfs-server,
# persona-shop-*, persona-news-*, …, clone-persona-1-*, …
# Synthetic + cloned personas in their own namespaces
kubectl get pods --all-namespaces -o wide | grep -E "persona-|clone-persona-"
# Multus net1 attached on browser engine
kubectl exec -n web-agents deploy/web-agent -- ip -br addr | grep net1
# Expect a 172.16.x.x address
# Multus net1 attached on synthetic-load engine
kubectl exec -n web-agents deploy/k6-agent -- ip -br addr | grep net1
# Expect a 172.17.x.x address
# End-to-end smoke test — browser engine → NGFW → persona
kubectl exec -n web-agents deploy/web-agent -- \
curl -sf https://shop.persona.local/ | head -1
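A quick negative check catches cross-scheduling — both commands should print nothing:
kubectl get pods -n web-agents -o wide --field-selector spec.nodeName=ucs-2 | grep web-agent- || true
kubectl get pods -n web-agents -o wide --field-selector spec.nodeName=ucs-1 | grep k6-agent- || true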
6. Observability — Grafana and Prometheus in tri-node¶
Observability runs entirely on UCS-3 (role=ngfw-dut):
- Prometheus — scrapes all three UCS over OOBI (eth0): :9100 for node_exporter, kubelet :10250 for cAdvisor, and :8080 for kube-state-metrics
- Grafana — the Test-Bed Infrastructure Health dashboard auto-adapts to the three-node topology via per-host filters and count(node_exporter) recording rules
- Alerts — the composite TestBedInfrastructureBottleneck, the host-level alerts (HostUDPBufferOverflow, HostConntrackNearFull, etc.) and the pod-level alerts (PodCPUThrottled, OOMKilled) all work out of the box. See MONITORING_TEST_VALIDITY.md for the full alert catalogue
The node_exporter DaemonSet has no nodeSelector and tolerations: operator: Exists, so it covers all three UCS without further configuration. The NodeExporterCoverageIncomplete alert fires automatically if any node stops reporting host metrics.
In tri-node specifically, Grafana panels naturally separate browser-engine (UCS-1) from synthetic-load (UCS-2) host metrics via the node label dimension — you can tell at a glance whether one runtime is the bottleneck while the other sits idle, which a dual-node layout with a single agents host cannot show you.
To open Grafana from your operator workstation (assuming you have kubectl access to UCS-3):
kubectl port-forward -n web-agents svc/grafana 3000:3000
# then visit http://localhost:3000
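Scrape coverage can also be verified directly against Prometheus. The service name (svc/prometheus) and the node-exporter job label below are assumptions — adjust them to your actual deployment:
kubectl port-forward -n web-agents svc/prometheus 9090:9090 &
curl -s 'http://localhost:9090/api/v1/query?query=count(up{job="node-exporter"})'
# expect a result value of "3" — one node_exporter target per UCS
kill %1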
7. Troubleshooting¶
UCS-1 or UCS-2 fails to join the cluster¶
Symptoms: tri-agent install hangs at Joining k3s cluster as agent.
# On the failing UCS — check OOBI reachability to UCS-3
ping <UCS-3-OOBI-IP>
nc -vz <UCS-3-OOBI-IP> 6443
# On the failing UCS — check k3s-agent service status and logs
sudo systemctl status k3s-agent
sudo journalctl -u k3s-agent -n 100
Most common causes: OOBI subnet mismatch, firewall blocking 6443, or wrong JOIN_TOKEN.
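If the JOIN_TOKEN was lost, it can be re-read on UCS-3 from the standard k3s server location:
sudo cat /var/lib/rancher/k3s/server/node-token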
browser-engine pods stay in Pending on UCS-1¶
kubectl describe pod -n web-agents -l app=web-agent | grep -A4 Events
If you see node(s) didn't match Pod's node affinity/selector, the agent re-pinning step did not run. Apply it manually:
kubectl patch deployment web-agent -n web-agents --type=strategic \
-p '{"spec":{"template":{"spec":{"nodeSelector":{"role":"playwright"}}}}}'
synthetic-load pods stay in Pending on UCS-2¶
kubectl patch deployment k6-agent -n web-agents --type=strategic \
-p '{"spec":{"template":{"spec":{"nodeSelector":{"role":"k6"}}}}}'
Personas stay in Pending¶
kubectl describe pod -n persona-shop persona-shop-* | grep -A4 Events
If node(s) didn't match again, UCS-3 is not labeled role=ngfw-dut. Re-label:
kubectl label node ucs-3 role=ngfw-dut dut-data-plane=true --overwrite
Cloner cannot reach the internet¶
Cloner uses eth1.40 macvlan with DHCP. Verify on UCS-3:
ip link show eth1.40
sudo dhclient -v eth1.40 # only as a manual diagnostic — pod has its own DHCP
If eth1.40 is missing, re-run setup_isp_iface (it is part of tri-server).
node_exporter only reports two of three nodes¶
kubectl get pods -n web-agents -l app.kubernetes.io/name=node-exporter -o wide
All three (ucs-1, ucs-2, ucs-3) should appear. If not, check:
kubectl describe daemonset node-exporter -n web-agents | grep -A3 Tolerations
Tolerations should be operator: Exists (matches every taint). If a node has a custom taint blocking it, add a toleration or remove the taint.
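To list every node's taints in one pass (standard kubectl, no project-specific assumptions):
kubectl get nodes -o custom-columns=NAME:.metadata.name,TAINTS:.spec.taints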
8. Reference — overlay file¶
The tri-node overlay lives in overlays/tri-node/kustomization.yaml. It inherits from k8s/ and adds six strategic-merge patches:
patches:
# browser-engine → UCS-1
- target: { kind: Deployment, name: web-agent } # role=playwright
# synthetic-load → UCS-2
- target: { kind: Deployment, name: k6-agent } # role=k6
# Services → UCS-3 (reuses role=ngfw-dut)
- target: { kind: Deployment, name: dashboard } # role=ngfw-dut
- target: { kind: StatefulSet, name: postgres } # role=ngfw-dut
- target: { kind: Deployment, name: pgbouncer } # role=ngfw-dut
- target: { kind: Deployment, name: cloner } # role=ngfw-dut
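The shape of one such patch, using web-agent as the example — illustrative only; the actual overlay file may differ:
patches:
  - target: { kind: Deployment, name: web-agent }
    patch: |-
      apiVersion: apps/v1
      kind: Deployment
      metadata:
        name: web-agent
      spec:
        template:
          spec:
            nodeSelector:
              role: playwright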
To apply manually (without the install script):
kubectl apply -k overlays/tri-node/
kubectl apply -k k8s/dut/
kubectl patch daemonset node-tuning -n web-agents \
--type=strategic \
-p '{"spec":{"template":{"spec":{"nodeSelector":{"dut-data-plane":"true"}}}}}'
See also¶
- UBUNTU_K3S_SINGLENODE_QUICKSTART_DEPLOY.en.md — single-server alternative
- UBUNTU_K3S_DUALNODE_QUICKSTART_DEPLOY.en.md — two-server alternative
- UBUNTU_K3S_MULTINODE_QUICKSTART_DEPLOY.en.md — four-server alternative
- MONITORING_TEST_VALIDITY.md — observability + test-validity alerts
- PERFORMANCE_TUNING_HOST.md — host kernel tuning
- SYSTEM_OVERVIEW.md — architecture reference for all four deployment modes
UBUNTU_K3S_SINGLENODE_QUICKSTART_DEPLOY.en.md— single-server alternativeUBUNTU_K3S_DUALNODE_QUICKSTART_DEPLOY.en.md— two-server alternativeUBUNTU_K3S_MULTINODE_QUICKSTART_DEPLOY.en.md— four-server alternativeMONITORING_TEST_VALIDITY.md— observability + test-validity alertsPERFORMANCE_TUNING_HOST.md— host kernel tuningSYSTEM_OVERVIEW.md— architecture reference for all four deployment modes