TLSStress.Art — Tri-Node Deployment on 3 UCS Servers¶
Deployment mode: tri-node — UCS-1 runs browser-engine agents only, UCS-2 runs synthetic-load agents only, and UCS-3 runs personas + services + observability. For a single-server setup see
UBUNTU_K3S_SINGLENODE_QUICKSTART_DEPLOY.en.md; for the two-server topology see UBUNTU_K3S_DUALNODE_QUICKSTART_DEPLOY.en.md; for the four-server distributed layout see UBUNTU_K3S_MULTINODE_QUICKSTART_DEPLOY.en.md.
Goal: Deploy TLSStress.Art across exactly three Cisco UCS servers — one dedicated to browser-engine load generation, one dedicated to synthetic-load-engine load generation, and one dedicated to webservers + services + observability — to provide a middle scale point between dual-node and four-server multi-node and to isolate the two agent runtimes from each other.
Last verified against shipping code: v3.7.0 (2026-05-12) — See ARCHITECTURE.md for the canonical 37 modules + 7 Test Kinds + DOM/CPOS/PIE-PA safety architecture + the ZTP-prem 12/12-layer insider-operator posture (25 patent claims, Tier A/B partition, Confidential Computing, sealed audit hash-chain, K8s admission webhook, TPM 2.0 measured-boot, DLP egress monitor, behavioural anomaly detector). ADRs 0014 and 0019-0025 cover post-Freeze additions.
Who this is for: Network and systems engineers running an NGFW test-bed who have three UCS chassis available and want hardware isolation between browser-engine agents, synthetic-load agents, and webservers without standing up a fourth chassis for services.
Time estimate: 75–105 minutes for initial cluster setup; 15 minutes for teardown + redeploy.
Automated installation (recommended)¶
Run the install script once per server in this exact order (UCS-3 first because it hosts the k3s control plane):
# Clone the repository on each server
git clone https://github.com/nollagluiz/AI_forSE.git && cd AI_forSE
# UCS-3 — k3s server + personas + services + observability
sudo ./scripts/k8s-install.sh --mode=tri-server --data-iface=eth1
# ↑ Prints JOIN_TOKEN and SERVER_IP at the end — save them.
# UCS-1 — browser-engine agents only
sudo ./scripts/k8s-install.sh --mode=tri-agent --role=playwright \
--server-ip=<UCS-3-OOBI-IP> --token=<JOIN_TOKEN> --data-iface=eth1
# UCS-2 — synthetic-load agents only
sudo ./scripts/k8s-install.sh --mode=tri-agent --role=k6 \
--server-ip=<UCS-3-OOBI-IP> --token=<JOIN_TOKEN> --data-iface=eth1
# Back on UCS-3 — apply all Kubernetes manifests
sudo ./scripts/k8s-install.sh --mode=tri-apply
Dry run: append --dry-run to any command to preview without making changes.
The rest of this guide explains each step in detail and is useful for customising the setup or troubleshooting the automated installation.
Table of contents¶
- Why tri-node
- Architecture overview
- OOBI network — mandatory on all three UCS
- Prerequisites per server
- Step 1 — Host preparation (all three UCS)
- Step 2 — k3s cluster bootstrap
- Step 3 — VLAN setup per node
- Step 4 — Apply manifests
- Step 5 — Verify the deployment
- Observability — Grafana and Prometheus in tri-node
- Troubleshooting
- Reference — overlay file
1. Why tri-node¶
Choose tri-node when:
- You have three UCS chassis available and want hardware isolation between browser-engine agents, synthetic-load agents, and webservers.
- The two agent runtimes shouldn't compete for resources. The browser engine spawns headless Chromium processes (memory- and IO-heavy); the synthetic-load engine runs a single Go process emitting tens of thousands of HTTP/2 + HTTP/3 requests (CPU- and connection-heavy). Sharing one host means whichever runtime spikes first throttles the other, and your NGFW measurements become noisy.
- You don't have a fourth chassis for the services tier (which would be the multi-node layout).
Compared to the other modes:
| | Single | Dual | Tri | Multi |
|---|---|---|---|---|
| UCS count | 1 | 2 | 3 | 4 |
| browser-engine + synthetic-load share a host | yes | yes (both on UCS-1) | no — separate UCS | no |
| Services on the agents host | yes | no | no | no |
| Hardware isolation between browser-engine and synthetic-load | — | — | ✓ | ✓ |
| Hardware isolation between agents and webservers | — | ✓ | ✓ | ✓ |
| Dedicated services chassis | — | — | — | ✓ |
2. Architecture overview¶
Physical topology¶
┌─────────────────────────────────────────────────────────────────────┐
│                OOBI network — eth0 on ALL three UCS                 │
│                        10.0.0.0/24 (example)                        │
│             k3s API :6443 · kubelet · Prometheus scrape             │
└────────┬──────────────────────┬─────────────────────────┬───────────┘
         │                      │                         │
┌────────┴────────┐    ┌────────┴────────┐    ┌───────────┴───────────┐
│      UCS-1      │    │      UCS-2      │    │         UCS-3         │
│ role=playwright │    │     role=k6     │    │     role=ngfw-dut     │
│                 │    │                 │    │                       │
│ browser-engine  │    │ synthetic-load  │    │ 20 Caddy persona pods │
│ agents only     │    │ agents only     │    │ 10 cloned-persona     │
│                 │    │                 │    │   slots               │
│                 │    │                 │    │ Dashboard · Postgres  │
│                 │    │                 │    │ PgBouncer · Cloner    │
│                 │    │                 │    │ NFS · Grafana         │
│                 │    │                 │    │ Prometheus            │
│                 │    │                 │    │ SNMP exporter         │
│ eth0 (OOBI)     │    │ eth0 (OOBI)     │    │ eth0 (OOBI)           │
│ eth1 (trunk)    │    │ eth1 (trunk)    │    │ eth1 (trunk)          │
└────────┬────────┘    └────────┬────────┘    └───────────┬───────────┘
         │                      │                         │
VLAN 20 (172.16/16)    VLAN 30 (172.17/16)  VLAN 99 mgmt (192.168.90/24)
                                            VLAN 40 ISP (DHCP via macvlan)
                                            VLANs 101–120 (10.1.x/27)
                                            VLANs 200–209 (10.2.x/27)
         │                      │                         │
         └──────────────────────┴────────────┬────────────┘
                                             │
                                ┌────────────┴────────────┐
                                │    Cisco Nexus 9000     │
                                │    VLAN trunk · ECMP    │
                                │  DSCP AF41 · MTU 9216   │
                                └────────────┬────────────┘
                                             │
                                ┌────────────┴────────────┐
                                │ NGFW (Device Under Test)│
                                │ TLS leg-1: agents→NGFW  │
                                │ TLS leg-2: NGFW→Caddy   │
                                └─────────────────────────┘
Workload placement¶
| Workload | UCS-1 (role=playwright) | UCS-2 (role=k6) | UCS-3 (role=ngfw-dut) |
|---|---|---|---|
| browser-engine agents (web-agent) | ✓ | — | — |
| synthetic-load agents (k6-agent) | — | ✓ | — |
| Synthetic personas (20× Caddy) | — | — | ✓ |
| Cloned-persona slots (10× Caddy) | — | — | ✓ |
| Dashboard, Postgres, PgBouncer | — | — | ✓ |
| Cloner | — | — | ✓ |
| NFS server (cloned-sites) | — | — | ✓ |
| SNMP exporter | — | — | ✓ |
| Prometheus + Grafana | — | — | ✓ |
| node_exporter (per-host metrics) | ✓ DaemonSet | ✓ DaemonSet | ✓ DaemonSet |
| node-tuning (sysctl + BBR + CPU governor) | ✓ DaemonSet (dut-data-plane=true) | ✓ DaemonSet (dut-data-plane=true) | ✓ DaemonSet (dut-data-plane=true) |
| cni-dhcp (cloner ISP DHCP) | — | — | ✓ DaemonSet |
Why role=ngfw-dut on UCS-3 (instead of role=infra)¶
The synthetic personas (personas/_generated/*/deployment.yaml), cloned-personas (k8s/clone-personas/20-slots.yaml) and SNMP exporter (k8s/dut/60-snmp-exporter.yaml) all hardcode nodeSelector: role=ngfw-dut. Reusing that label on UCS-3 avoids touching twenty-something manifests. The tri-node overlay then patches the additional service workloads (Dashboard, Postgres, PgBouncer, Cloner) — which the multi-node overlay sends to role=infra — onto the same role=ngfw-dut node. This is exactly the same scheme the dual-node overlay uses for UCS-2.
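For reference, the hardcoded pin in each generated persona deployment has the standard Kubernetes shape — an illustrative excerpt with surrounding fields omitted:
spec:
  template:
    spec:
      nodeSelector:
        role: ngfw-dut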
3. OOBI network — mandatory on all three UCS¶
The OOBI (out-of-band infrastructure) network is the dedicated control-plane segment carried over eth0 on every UCS, regardless of deployment mode.
What runs on OOBI¶
- k3s API server (:6443) — UCS-1 and UCS-2 reach it on UCS-3 to join the cluster
- kubelet (:10250) — UCS-3 scrapes UCS-1 and UCS-2 for pod status
- flannel (CNI) — pod-to-pod traffic across nodes uses VXLAN over eth0
- Prometheus scrape — node_exporter, cAdvisor, kube-state-metrics are all reached over eth0
- cert-manager renewals — webhook + ACME challenge traffic
- Dashboard / kubectl — operator access
What does NOT run on OOBI¶
- NGFW test traffic — that is on the eth1 VLAN trunk (data plane)
- SNMP polling — VLAN 99 (192.168.90.0/24) on eth1.99
- Cloner internet egress — VLAN 40 on eth1.40
Requirements per UCS¶
| | UCS-1 (browser engine) | UCS-2 (synthetic-load engine) | UCS-3 (services) |
|---|---|---|---|
| eth0 IP | static, OOBI subnet | static, OOBI subnet | static, OOBI subnet |
| Reachable from peer eth0 | ✓ | ✓ | ✓ |
| MTU | 1500 (default) | 1500 (default) | 1500 (default) |
| Default route via OOBI | ✓ — internet egress for image pulls and k3s install | ✓ | ✓ |
If eth0 is missing on any UCS, k3s flannel cannot establish the control plane, the affected agent node cannot join the cluster, and Prometheus on UCS-3 cannot scrape host metrics from the missing node. The pre-flight check in k8s-install.sh rejects all three modes (tri-server, tri-agent, and the per-node validation in tri-apply) if eth0 is absent.
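A minimal netplan drop-in satisfying the static-eth0 requirement might look like this — the file name, addresses and gateway below are placeholders, not values the installer writes; adapt them to your OOBI subnet:
# /etc/netplan/01-oobi.yaml (example only)
network:
  version: 2
  ethernets:
    eth0:
      addresses: [10.0.0.3/24]      # e.g. UCS-3 in the example OOBI subnet
      routes:
        - to: default
          via: 10.0.0.1             # OOBI gateway — internet egress for pulls
      nameservers:
        addresses: [10.0.0.1]
Apply with sudo netplan apply, then confirm the peers answer ping over eth0 before starting the install.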
4. Prerequisites per server¶
Hardware¶
| | UCS-1 (browser engine) | UCS-2 (synthetic-load engine) | UCS-3 (services) |
|---|---|---|---|
| CPU | ≥ 16 physical cores | ≥ 16 physical cores | ≥ 32 physical cores |
| RAM | ≥ 64 GB | ≥ 32 GB | ≥ 128 GB |
| NICs | eth0 (OOBI) + eth1 (trunk) | eth0 (OOBI) + eth1 (trunk) | eth0 (OOBI) + eth1 (trunk) |
| Disk | ≥ 100 GB free on / | ≥ 100 GB free on / | ≥ 500 GB free on / (NFS + Postgres + Prometheus retention) |
UCS-3 is the heaviest node. Of the two agent hosts, the browser-engine node (UCS-1) generally needs more headroom than the synthetic-load node (UCS-2) because each browser-engine agent spawns its own headless Chromium.
Operating system¶
- Ubuntu 22.04 LTS or 24.04 LTS, amd64
- root access (or sudo)
- internet access via OOBI default route during install (k3s + Helm + container images)
Network — Nexus 9000¶
VLANs that must be configured on the trunk to all three UCS:
| VLAN | UCS-1 | UCS-2 | UCS-3 | Subnet | Use |
|---|---|---|---|---|---|
| 20 | ✓ | — | — | 172.16.0.0/16 | browser-engine agents |
| 30 | — | ✓ | — | 172.17.0.0/16 | synthetic-load agents |
| 40 | — | — | ✓ | DHCP from ISP | Cloner internet egress (macvlan) |
| 99 | — | — | ✓ | 192.168.90.0/24 | SNMP polling — Nexus (.2), NGFW (.3) |
| 101–120 | — | — | ✓ | 10.1.{1..20}.0/27 | Synthetic persona webservers |
| 200–209 | — | — | ✓ | 10.2.{1..10}.0/27 | Cloned persona slots |
Strictly, VLAN 20 is consumed only by UCS-1, VLAN 30 only by UCS-2, and VLANs 40, 99, 101–120 and 200–209 only by UCS-3. In practice the Nexus trunk can carry all of them to all three ports and let each host ignore the VLANs it does not consume — the trunk configuration is then identical for the three ports.
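As a sketch, that identical trunk configuration on the three Nexus ports could look like this (NX-OS syntax; the port range is a placeholder for your actual cabling):
interface Ethernet1/1-3
  switchport mode trunk
  switchport trunk allowed vlan 20,30,40,99,101-120,200-209
  mtu 9216
  no shutdown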
Step 1 — Host preparation (all three UCS)¶
Identical preparation on UCS-1, UCS-2 and UCS-3:
sudo apt-get update
sudo apt-get install -y --no-install-recommends \
curl git jq openssl ca-certificates gnupg lsb-release \
vlan iproute2 ethtool net-tools
# Verify both NICs
ip link show eth0
ip link show eth1
Then run the project's host-tuning script on every UCS — sysctls for high-fan-out QUIC + TCP, BBR + FQ, CPU governor, conntrack, ports:
sudo scripts/host-tuning.sh apply
sudo scripts/host-tuning.sh status
The apply step is idempotent and persists via systemd-sysctl drop-ins. During every measurement run, the dashboard's TestBedSysctlMissing alert fires if any UCS has regressed to kernel defaults.
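To spot-check the tuned values manually on any UCS (these are the standard kernel keys behind the BBR + FQ tuning named above):
sysctl net.ipv4.tcp_congestion_control   # expect: bbr
sysctl net.core.default_qdisc            # expect: fq
sysctl net.netfilter.nf_conntrack_max    # expect: raised well above the default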
Step 2 — k3s cluster bootstrap¶
UCS-3 first (it is the k3s server)¶
sudo ./scripts/k8s-install.sh --mode=tri-server --data-iface=eth1
The script:
- Pre-flights eth0 + eth1 + the ISP interface
- Creates VLAN subinterfaces 40, 99, 101–120 and 200–209 on eth1
- Brings up the eth1.40 ISP subinterface and persists it via netplan
- Installs k3s server, binds the API to eth0, disables Traefik and ServiceLB
- Installs Helm, cert-manager and Multus
- Labels the node role=ngfw-dut + dut-data-plane=true
- Prints the JOIN_TOKEN and SERVER_IP
Save the printed credentials — UCS-1 and UCS-2 need them.
UCS-1 (browser engine)¶
sudo ./scripts/k8s-install.sh --mode=tri-agent --role=playwright \
--server-ip=<UCS-3-OOBI-IP> --token=<JOIN_TOKEN> --data-iface=eth1
The script:
- Pre-flights eth0 + eth1
- Creates VLAN subinterface 20 on eth1 (172.16.0.1/16)
- Joins k3s as an agent over eth0 → UCS-3:6443
- Labels itself role=playwright + dut-data-plane=true
UCS-2 (synthetic-load engine)¶
sudo ./scripts/k8s-install.sh --mode=tri-agent --role=k6 \
--server-ip=<UCS-3-OOBI-IP> --token=<JOIN_TOKEN> --data-iface=eth1
Same as UCS-1 but creates VLAN 30 (172.17.0.1/16) and labels itself role=k6.
Verify all three nodes joined¶
From UCS-3:
kubectl get nodes -o wide
# Expect:
# NAME STATUS ROLES AGE VERSION LABELS
# ucs-1 Ready agent 2m v1.31.x role=playwright,dut-data-plane=true,...
# ucs-2 Ready agent 1m v1.31.x role=k6,dut-data-plane=true,...
# ucs-3 Ready master 5m v1.31.x role=ngfw-dut,dut-data-plane=true,...
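To confirm just the labels the overlays rely on, without wading through the full label list, kubectl's label-column flag helps (output abbreviated):
kubectl get nodes -L role,dut-data-plane
# NAME    STATUS  ...  ROLE         DUT-DATA-PLANE
# ucs-1   Ready   ...  playwright   true
# ucs-2   Ready   ...  k6           true
# ucs-3   Ready   ...  ngfw-dut     true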
Step 3 — VLAN setup per node¶
The install script performs this automatically. To verify:
UCS-1 (browser engine):
ip -br link | grep 'eth1\.'
# eth1.20 UP (172.16.0.1/16)
UCS-2 (synthetic-load engine):
ip -br link | grep 'eth1\.'
# eth1.30 UP (172.17.0.1/16)
UCS-3 (services):
ip -br link | grep 'eth1\.'
# eth1.40 UP (no IP — DHCP via cloner pod macvlan)
# eth1.99 UP (192.168.90.1/24)
# eth1.101 UP (10.1.1.1/27)
# … through eth1.120 (10.1.20.1/27)
# eth1.200 UP (10.2.1.1/27)
# … through eth1.209 (10.2.10.1/27)
VLAN persistence across reboots is handled by netplan (script writes /etc/netplan/99-*.yaml).
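The generated files follow the standard netplan VLAN shape — an illustrative excerpt for UCS-1 (the script writes the real content; do not hand-edit it):
# shape of the script-written /etc/netplan/99-*.yaml on UCS-1
network:
  version: 2
  vlans:
    eth1.20:
      id: 20
      link: eth1
      addresses: [172.16.0.1/16]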
Step 4 — Apply manifests¶
From UCS-3:
# Create the NGFW CA configmap (once)
kubectl create configmap ngfw-ca -n web-agents --from-file=ngfw-ca.crt=<path-to-cert>
# Create application secrets from .env
kubectl create secret generic web-agent-secrets -n web-agents --from-env-file=.env
# Apply everything
sudo ./scripts/k8s-install.sh --mode=tri-apply
This applies:
- kubectl apply -k overlays/tri-node/ — base k8s manifests with the browser engine pinned to role=playwright, the synthetic-load engine pinned to role=k6, and Dashboard/Postgres/PgBouncer/Cloner pinned to role=ngfw-dut
- kubectl apply -k k8s/dut/ — DUT overlay (NADs, SNMP probes, NFS server, node-tuning DaemonSet, cAdvisor ServiceMonitor, infra Prometheus rules)
- Patches the node-tuning DaemonSet nodeSelector to dut-data-plane=true (runs on all three UCS)
- Patches the browser-engine + synthetic-load deployments via 40-playwright-patch.yaml / 50-k6-patch.yaml — Multus net1 macvlan annotation, NGFW CA trust, REJECT_INVALID_CERTS=true
- Re-pins the browser engine to role=playwright and the synthetic-load engine to role=k6 (the patch files default to role=ngfw-dut for single-node compatibility — tri-node overrides per runtime)
- Waits for all pods to become Ready
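To preview what the overlay will apply before (or instead of) running tri-apply, render it locally and inspect the pins — a quick sanity check, with the exact grep hits depending on manifest layout:
kubectl kustomize overlays/tri-node/ > /tmp/tri-node-rendered.yaml
grep -n 'role:' /tmp/tri-node-rendered.yaml
# expect playwright, k6 and ngfw-dut pins — nothing else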
Step 5 — Verify the deployment¶
# Pods on UCS-1 (browser engine)
kubectl get pods -n web-agents -o wide --field-selector spec.nodeName=ucs-1
# Expect: web-agent-* only
# Pods on UCS-2 (synthetic-load engine)
kubectl get pods -n web-agents -o wide --field-selector spec.nodeName=ucs-2
# Expect: k6-agent-* only
# Pods on UCS-3 (services + personas)
kubectl get pods -n web-agents -o wide --field-selector spec.nodeName=ucs-3
# Expect: dashboard, postgres, pgbouncer, cloner, nfs-server,
# persona-shop-*, persona-news-*, …, clone-persona-1-*, …
# Synthetic + cloned personas in their own namespaces
kubectl get pods --all-namespaces -o wide | grep -E "persona-|clone-persona-"
# Multus net1 attached on browser engine
kubectl exec -n web-agents deploy/web-agent -- ip -br addr | grep net1
# Expect a 172.16.x.x address
# Multus net1 attached on synthetic-load engine
kubectl exec -n web-agents deploy/k6-agent -- ip -br addr | grep net1
# Expect a 172.17.x.x address
# End-to-end smoke test — browser engine → NGFW → persona
kubectl exec -n web-agents deploy/web-agent -- \
curl -sf https://shop.persona.local/ | head -1
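A quick negative check catches cross-scheduling — both commands should print nothing:
kubectl get pods -n web-agents -o wide --field-selector spec.nodeName=ucs-2 | grep web-agent- || true
kubectl get pods -n web-agents -o wide --field-selector spec.nodeName=ucs-1 | grep k6-agent- || true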
6. Observability — Grafana and Prometheus in tri-node¶
Observability runs entirely on UCS-3 (role=ngfw-dut):
- Prometheus — scrapes all three UCS over OOBI (eth0): :9100 for node_exporter, kubelet :10250 for cAdvisor, and :8080 for kube-state-metrics
- Grafana — the Test-Bed Infrastructure Health dashboard auto-adapts to the three-node topology via per-host filters and count(node_exporter) recording rules
- Alerts — the composite TestBedInfrastructureBottleneck, the host-level alerts (HostUDPBufferOverflow, HostConntrackNearFull, etc.) and the pod-level alerts (PodCPUThrottled, OOMKilled) all work out of the box. See MONITORING_TEST_VALIDITY.md for the full alert catalogue
The node_exporter DaemonSet has no nodeSelector and tolerations: operator: Exists, so it covers all three UCS without further configuration. The NodeExporterCoverageIncomplete alert fires automatically if any node stops reporting host metrics.
In tri-node specifically, Grafana panels naturally separate browser-engine (UCS-1) from synthetic-load (UCS-2) host metrics via the node label dimension — you can tell at a glance whether one runtime is the bottleneck while the other sits idle, which a dual-node layout with a single agents host cannot show you.
To open Grafana from your operator workstation (assuming you have kubectl access to UCS-3):
kubectl port-forward -n web-agents svc/grafana 3000:3000
# then visit http://localhost:3000
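Scrape coverage can also be verified directly against Prometheus. The service name (svc/prometheus) and the node-exporter job label below are assumptions — adjust them to your actual deployment:
kubectl port-forward -n web-agents svc/prometheus 9090:9090 &
curl -s 'http://localhost:9090/api/v1/query?query=count(up{job="node-exporter"})'
# expect a result value of "3" — one node_exporter target per UCS
kill %1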
7. Troubleshooting¶
UCS-1 or UCS-2 fails to join the cluster¶
Symptoms: tri-agent install hangs at Joining k3s cluster as agent.
# On the failing UCS — check OOBI reachability to UCS-3
ping <UCS-3-OOBI-IP>
nc -vz <UCS-3-OOBI-IP> 6443
# On the failing UCS — check k3s-agent service status and logs
sudo systemctl status k3s-agent
sudo journalctl -u k3s-agent -n 100
Most common causes: OOBI subnet mismatch, firewall blocking 6443, or wrong JOIN_TOKEN.
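If the JOIN_TOKEN was lost, it can be re-read on UCS-3 from the standard k3s server location:
sudo cat /var/lib/rancher/k3s/server/node-token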
browser-engine pods stay in Pending on UCS-1¶
kubectl describe pod -n web-agents -l app=web-agent | grep -A4 Events
If you see node(s) didn't match Pod's node affinity/selector, the agent re-pinning step did not run. Apply it manually:
kubectl patch deployment web-agent -n web-agents --type=strategic \
-p '{"spec":{"template":{"spec":{"nodeSelector":{"role":"playwright"}}}}}'
synthetic-load pods stay in Pending on UCS-2¶
kubectl patch deployment k6-agent -n web-agents --type=strategic \
-p '{"spec":{"template":{"spec":{"nodeSelector":{"role":"k6"}}}}}'
Personas stay in Pending¶
kubectl describe pod -n persona-shop persona-shop-* | grep -A4 Events
If node(s) didn't match again, UCS-3 is not labeled role=ngfw-dut. Re-label:
kubectl label node ucs-3 role=ngfw-dut dut-data-plane=true --overwrite
Cloner cannot reach the internet¶
Cloner uses eth1.40 macvlan with DHCP. Verify on UCS-3:
ip link show eth1.40
sudo dhclient -v eth1.40 # only as a manual diagnostic — pod has its own DHCP
If eth1.40 is missing, re-run setup_isp_iface (it is part of tri-server).
node_exporter only reports two of three nodes¶
kubectl get pods -n web-agents -l app.kubernetes.io/name=node-exporter -o wide
All three (ucs-1, ucs-2, ucs-3) should appear. If not, check:
kubectl describe daemonset node-exporter -n web-agents | grep -A3 Tolerations
Tolerations should be operator: Exists (matches every taint). If a node has a custom taint blocking it, add a toleration or remove the taint.
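To list every node's taints in one pass (standard kubectl, no project-specific assumptions):
kubectl get nodes -o custom-columns=NAME:.metadata.name,TAINTS:.spec.taints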
8. Reference — overlay file¶
The tri-node overlay lives in overlays/tri-node/kustomization.yaml. It inherits from k8s/ and adds six strategic-merge patches:
patches:
# browser-engine → UCS-1
- target: { kind: Deployment, name: web-agent } # role=playwright
# synthetic-load → UCS-2
- target: { kind: Deployment, name: k6-agent } # role=k6
# Services → UCS-3 (reuses role=ngfw-dut)
- target: { kind: Deployment, name: dashboard } # role=ngfw-dut
- target: { kind: StatefulSet, name: postgres } # role=ngfw-dut
- target: { kind: Deployment, name: pgbouncer } # role=ngfw-dut
- target: { kind: Deployment, name: cloner } # role=ngfw-dut
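The shape of one such patch, using web-agent as the example — illustrative only; the actual overlay file may differ:
patches:
  - target: { kind: Deployment, name: web-agent }
    patch: |-
      apiVersion: apps/v1
      kind: Deployment
      metadata:
        name: web-agent
      spec:
        template:
          spec:
            nodeSelector:
              role: playwright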
To apply manually (without the install script):
kubectl apply -k overlays/tri-node/
kubectl apply -k k8s/dut/
kubectl patch daemonset node-tuning -n web-agents \
--type=strategic \
-p '{"spec":{"template":{"spec":{"nodeSelector":{"dut-data-plane":"true"}}}}}'
See also¶
- UBUNTU_K3S_SINGLENODE_QUICKSTART_DEPLOY.en.md — single-server alternative
- UBUNTU_K3S_DUALNODE_QUICKSTART_DEPLOY.en.md — two-server alternative
- UBUNTU_K3S_MULTINODE_QUICKSTART_DEPLOY.en.md — four-server alternative
- MONITORING_TEST_VALIDITY.md — observability + test-validity alerts
- PERFORMANCE_TUNING_HOST.md — host kernel tuning
- SYSTEM_OVERVIEW.md — architecture reference for all four deployment modes
UBUNTU_K3S_SINGLENODE_QUICKSTART_DEPLOY.en.md— single-server alternativeUBUNTU_K3S_DUALNODE_QUICKSTART_DEPLOY.en.md— two-server alternativeUBUNTU_K3S_MULTINODE_QUICKSTART_DEPLOY.en.md— four-server alternativeMONITORING_TEST_VALIDITY.md— observability + test-validity alertsPERFORMANCE_TUNING_HOST.md— host kernel tuningSYSTEM_OVERVIEW.md— architecture reference for all four deployment modes