
TLSStress.Art — Single-Node Deployment on Ubuntu with k3s

Deployment mode: single-node — all components (agents, webservers, dashboard, observability) run on one Ubuntu server. This is the quickest way to get started. For a multi-server deployment that scales each tier to a dedicated machine, see UBUNTU_K3S_MULTINODE_QUICKSTART_DEPLOY.en.md.

Last verified against shipping code: v3.7.0 (2026-05-12) — See ARCHITECTURE.md for the canonical 37 MÓDULOs + 7 Test Kinds + DOM/CPOS/PIE-PA safety architecture + the ZTP-prem 12/12-layer insider-operator posture (25 patent claims, Tier A/B partition, Confidential Computing, sealed audit hash-chain, K8s admission webhook, TPM 2.0 measured-boot, DLP egress monitor, behavioural anomaly detector). ADRs 0014, 0019-0025 cover post-Freeze additions.

Goal: install the full TLSStress.Art stack on a single Ubuntu server using k3s — including Dashboard, Postgres, browser-engine fleet (up to 300 agents), synthetic-load fleet (up to 1,000 agents), 30 TLS persona webservers (20 Synthetic + 10 Cloned slots), Grafana and Prometheus — and optionally place a physical NGFW in the traffic path to measure TLS inspection performance (DUT mode).

Who this is for: network and systems engineers comfortable with a Linux terminal. No prior Kubernetes experience required.

Time estimate: Simple mode: 30–45 min | Full DUT mode: 90–120 min.

Author: André Luiz Gallon — agallon@Cisco.com


If you just want to get started quickly, use the automated install script. It installs k3s, Helm, cert-manager and Multus CNI, configures the VLAN interfaces, and applies all Kubernetes manifests in the correct order — no Kubernetes knowledge required:

# Clone the repository
git clone https://github.com/nollagluiz/AI_forSE.git
cd AI_forSE

# Single-node: everything on one server
sudo ./scripts/k8s-install.sh --mode=single --data-iface=eth1

The script will:

1. Validate prerequisites (OS, RAM, disk, network interfaces)
2. Install k3s, Helm, cert-manager, and Multus CNI
3. Configure VLAN subinterfaces on eth1 (VLANs 20, 30, 99, 101–120)
4. Apply all Kubernetes manifests in the correct order
5. Wait for pods to become Ready and print the Dashboard URL

Options: run sudo ./scripts/k8s-install.sh --help for all flags, including --dry-run to preview what would be executed.
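For example, to preview the single-node installation without changing anything on the host (same flags as above; the exact output depends on your environment):

sudo ./scripts/k8s-install.sh --mode=single --data-iface=eth1 --dry-run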

The rest of this guide explains each step in detail, which is useful for customising the setup or troubleshooting the automated installation.


Choose your installation mode

| | Simple mode | DUT mode |
|---|---|---|
| Use case | Basic lab, reachability and throughput testing against real internet sites | NGFW test-bed: measures the real cost of TLS inspection with HTTP/2 and HTTP/3 |
| browser-engine fleet | ✅ | ✅ |
| synthetic-load fleet | ✅ | ✅ |
| 30 persona webservers (20 Synthetic + 10 Cloned slots) | — | ✅ |
| Physical NGFW (DUT) in path | — | ✅ |
| Grafana + Prometheus | Optional | ✅ |
| cert-manager + internal PKI | ✅ (required) | ✅ (required) |
| Multus CNI + macvlan | — | ✅ |
| NICs required | 1 | 2 (eth0 management + eth1 VLAN trunk) |
| Orchestration script | Manual steps | scripts/k8s-dut-up.sh up |

Prerequisites

| Item | Simple mode | DUT mode |
|---|---|---|
| OS | Ubuntu 22.04 LTS or 24.04 LTS | same |
| Architecture | x86_64 or arm64 | x86_64 (bare-metal/UCS recommended) |
| RAM | 16 GB (up to ~50 agents) | 64 GB+ |
| vCPUs | 4 | 16+ |
| Disk | 60 GB free | 200 GB+ (personas + logs + metrics) |
| NICs | 1 (eth0) | 2 (eth0 = k8s management, eth1 = 802.1q trunk) |
| Switch | — | Nexus 9000 with VLAN trunk (VLANs 20, 30, 99, 101–120) |
| NGFW | — | Cisco FTD/IOS-XE, FortiGate, Palo Alto, Check Point, Huawei, etc. |
| Host access | user with sudo | same |
| Internet (bootstrap) | Yes (k3s, GHCR images) | Yes, or offline — see Images section |

Headless server? Use ssh -L 3000:localhost:3000 user@server to reach the UI from your laptop.


Part A — Host setup (both modes)

Step 0 — Base packages

sudo apt update && sudo apt install -y \
  curl ca-certificates iptables openssl jq git python3 iproute2 vlan

Step 1 — Install k3s

Simple mode:

curl -sfL https://get.k3s.io | \
  INSTALL_K3S_EXEC="--disable=traefik --write-kubeconfig-mode=644" sh -

DUT mode (includes Multus CNI for macvlan):

curl -sfL https://get.k3s.io | \
  INSTALL_K3S_EXEC="--disable=traefik --write-kubeconfig-mode=644 \
    --flannel-iface=eth0" sh -

# Multus CNI — required for the net1 macvlan VLAN data-plane interface
kubectl apply -f https://raw.githubusercontent.com/k8snetworkplumbingwg/multus-cni/master/deployments/multus-daemonset-thick.yml

kubectl -n kube-system wait --for=condition=ready \
  pod -l app=multus --timeout=120s

Verify the node is up:

sudo systemctl status k3s --no-pager | head -5
sudo k3s kubectl get nodes

Expected: STATUS=Ready, ROLES=control-plane,master.

Configure kubectl without sudo:

mkdir -p ~/.kube
sudo cp /etc/rancher/k3s/k3s.yaml ~/.kube/config
sudo chown $(id -u):$(id -g) ~/.kube/config
chmod 600 ~/.kube/config
echo 'export KUBECONFIG=$HOME/.kube/config' >> ~/.bashrc
source ~/.bashrc
kubectl get nodes

Step 2 — Install Nginx Ingress Controller

kubectl apply -f https://raw.githubusercontent.com/kubernetes/ingress-nginx/controller-v1.11.3/deploy/static/provider/cloud/deploy.yaml

kubectl -n ingress-nginx wait --for=condition=available \
  deployment/ingress-nginx-controller --timeout=180s

On a single-node host the EXTERNAL-IP may stay <pending> — that's fine; the dashboard is reached via kubectl port-forward.


Step 3 — Install metrics-server

The HPA needs CPU/memory metrics to autoscale the agent fleets.

kubectl apply -f https://github.com/kubernetes-sigs/metrics-server/releases/latest/download/components.yaml

# k3s uses a self-signed kubelet cert; metrics-server needs this flag:
kubectl -n kube-system patch deployment metrics-server --type='json' \
  -p='[{"op":"add","path":"/spec/template/spec/containers/0/args/-","value":"--kubelet-insecure-tls"}]'

kubectl -n kube-system wait --for=condition=available \
  deployment/metrics-server --timeout=180s

# Sanity check — should print CPU/MEM within seconds:
kubectl top nodes

Step 4 — Install cert-manager

Required in both modes. In Simple mode it manages internal TLS; in DUT mode it issues certificates for all 30 persona webservers (20 Synthetic + 10 Cloned slots) via an internal CA chain.

kubectl apply -f https://github.com/cert-manager/cert-manager/releases/download/v1.16.0/cert-manager.yaml

kubectl -n cert-manager wait --for=condition=available \
  deployment --all --timeout=180s

# Confirm all 3 cert-manager pods are Running:
kubectl -n cert-manager get pods

Step 5 — Clone the repository

git clone https://github.com/nollagluiz/AI_forSE.git
cd AI_forSE
# Always use the main branch

Part B — Simple mode (agents vs. internet sites)

browser-engine and synthetic-load agents navigate real internet sites. No personas, no NGFW, no macvlan. Good for quickly validating the agent/dashboard/synthetic-load engine stack.


Step 6B — Create secrets

# Create namespace
kubectl create namespace web-agents

# Postgres credentials
DB_PASS=$(openssl rand -hex 24)
kubectl -n web-agents create secret generic postgres-credentials \
  --from-literal=POSTGRES_USER=agent_dashboard \
  --from-literal=POSTGRES_PASSWORD="$DB_PASS" \
  --from-literal=POSTGRES_DB=agent_dashboard

# Shared token between dashboard and browser-engine agents
TOKEN=$(openssl rand -hex 32)
kubectl -n web-agents create secret generic web-agent-secrets \
  --from-literal=CONTROLLER_TOKEN="$TOKEN"

# synthetic-load agent token
K6_SECRET=$(openssl rand -hex 32)
kubectl -n web-agents create secret generic k6-agent-secrets \
  --from-literal=DASHBOARD_SECRET="$K6_SECRET"

# Dashboard secrets
ADMIN_PASS=$(openssl rand -hex 16)
SESSION=$(openssl rand -hex 32)
kubectl -n web-agents create secret generic dashboard-secrets \
  --from-literal=DATABASE_URL="postgresql://agent_dashboard:${DB_PASS}@postgres.web-agents.svc.cluster.local:5432/agent_dashboard?sslmode=disable" \
  --from-literal=AGENT_API_TOKEN="$TOKEN" \
  --from-literal=ADMIN_BASIC_AUTH="admin:$ADMIN_PASS" \
  --from-literal=SESSION_SECRET="$SESSION"

echo "================================================"
echo "  Dashboard admin login:"
echo "    User:     admin"
echo "    Password: $ADMIN_PASS"
echo "================================================"

Save the password — you'll need it on /login.
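If you lose it later, recover it from the secret (the troubleshooting table uses the same command):

kubectl -n web-agents get secret dashboard-secrets \
  -o jsonpath='{.data.ADMIN_BASIC_AUTH}' | base64 -d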


Step 7B — Apply base manifests

Images are already correct in the manifests (ghcr.io/nollagluiz/). No local build required — k3s pulls them from GHCR automatically.
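If you want to confirm which tags will be pulled before applying (file names taken from the list below; the exact field layout may differ between releases):

grep -h 'image:' k8s/20-agent-deployment.yaml k8s/21-k6-agent-deployment.yaml k8s/50-dashboard.yaml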

for f in k8s/00-namespace.yaml \
          k8s/05-resource-quota.yaml \
          k8s/10-agent-config.yaml \
          k8s/11-k6-agent-config.yaml \
          k8s/20-agent-deployment.yaml \
          k8s/21-k6-agent-deployment.yaml \
          k8s/40-postgres.yaml \
          k8s/42-pgbouncer.yaml \
          k8s/50-dashboard.yaml \
          k8s/52-dashboard-persona-rbac.yaml \
          k8s/55-dashboard-pdb.yaml; do
  kubectl apply -f "$f" --server-side 2>/dev/null || kubectl apply -f "$f"
done

Step 8B — Wait for pods to become ready

kubectl -n web-agents wait --for=condition=ready \
  pod -l app.kubernetes.io/name=postgres --timeout=180s

kubectl -n web-agents wait --for=condition=available \
  deployment/dashboard --timeout=240s

kubectl -n web-agents get pods

Expected: postgres-0, dashboard-* and web-agent-* all Running.


Step 9B — Apply database migrations

PG_POD=$(kubectl -n web-agents get pod \
  -l app.kubernetes.io/name=postgres -o jsonpath='{.items[0].metadata.name}')

kubectl -n web-agents exec -i "$PG_POD" -- \
  psql -U agent_dashboard -d agent_dashboard \
  -c 'CREATE EXTENSION IF NOT EXISTS pgcrypto;'

for f in dashboard/src/db/migrations/*.sql; do
  echo "==> applying $f"
  kubectl -n web-agents exec -i "$PG_POD" -- \
    psql -U agent_dashboard -d agent_dashboard -v ON_ERROR_STOP=1 < "$f"
done

Normal output: ALTER TABLE, CREATE INDEX. Seeing "already exists, skipping" is fine — migrations are idempotent.
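To double-check that the schema landed, list the tables (table names vary by release):

kubectl -n web-agents exec -i "$PG_POD" -- \
  psql -U agent_dashboard -d agent_dashboard -c '\dt'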


Step 10B — Open the dashboard

In a second terminal:

kubectl -n web-agents port-forward svc/dashboard 3000:3000
  • Local desktop: http://localhost:3000/
  • Remote server: ssh -L 3000:localhost:3000 user@your-ubuntu-server then open http://localhost:3000/ on your laptop.

Login: admin + the password from Step 6B.


Step 11B — Add target sites and start the fleet

ADMIN_PASS=$(kubectl -n web-agents get secret dashboard-secrets \
  -o jsonpath='{.data.ADMIN_BASIC_AUTH}' | base64 -d | cut -d: -f2-)
ADMIN="admin:$ADMIN_PASS"

# Add sample target sites:
for url in https://www.g1.globo.com https://www.uol.com.br https://www.nasa.gov; do
  curl -s -u "$ADMIN" -H 'content-type: application/json' \
    -d "{\"url\":\"$url\",\"weight\":1}" \
    http://localhost:3000/api/admin/targets; echo
done

# Enable the browser-engine fleet with 5 agents:
curl -s -u "$ADMIN" -X PUT -H 'content-type: application/json' \
  -d '{"enabled":true,"cycleIntervalMs":30000,"desiredAgentCount":5}' \
  http://localhost:3000/api/admin/config
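# Optional sanity check — read back what the dashboard now has configured.
# This assumes the admin API also answers GET on the same endpoint; adjust if your release differs:
curl -s -u "$ADMIN" http://localhost:3000/api/admin/targets | python3 -m json.tool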

Step 12B — Scale up (up to 300 browser-engine agents, 1,000 synthetic-load agents)

# Scale browser-engine fleet manually:
kubectl -n web-agents scale deployment/web-agent --replicas=50

# Scale synthetic-load fleet manually:
kubectl -n web-agents scale deployment/k6-agent --replicas=100

# Or let the HPA decide based on CPU (already configured in manifests):
kubectl -n web-agents get hpa -w

Each browser-engine agent reserves 512 MB RAM. Each synthetic-load agent reserves only 128 MB (Go runtime, no browser). Size accordingly.
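A quick worked budget (reservations only; actual usage varies): 50 browser-engine agents × 512 MB ≈ 25.6 GB plus 100 synthetic-load agents × 128 MB ≈ 12.8 GB, i.e. roughly 38 GB reserved — comfortable within the 64 GB DUT-mode sizing, but well beyond the 16 GB Simple-mode minimum.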


Part C — DUT mode (full NGFW test-bed)

This mode places a physical NGFW between the agents and the 30 persona webservers (20 Synthetic + 10 Cloned slots). All HTTPS traffic traverses the firewall, which decrypts, inspects and re-encrypts every connection. The stack measures the real cost of that inspection.

Architecture summary:

Agents (browser-engine VLAN 20 / synthetic-load VLAN 30)
      │  TLS leg 1: agents → NGFW (NGFW presents its own cert)
      ▼
  NGFW (Device Under Test)
      │  TLS leg 2: NGFW → Caddy persona (cert issued by cert-manager)
      ▼
20 Synthetic Personas (VLANs 101–120, subnets 10.1.x.0/27)
10 Cloned Persona slots (VLANs 200–209, subnets 10.2.{1..10}.0/27)


Step 5C — Extra dependencies for DUT mode

Multus was already installed in Step 1. Install Helm and the observability stack (Prometheus + Grafana) now:

# Helm
curl https://raw.githubusercontent.com/helm/helm/main/scripts/get-helm-3 | bash

# Prometheus + Grafana (kube-prometheus-stack)
helm repo add prometheus-community https://prometheus-community.github.io/helm-charts
helm repo update

kubectl create namespace monitoring

helm install kube-prometheus prometheus-community/kube-prometheus-stack \
  --namespace monitoring \
  --set grafana.adminPassword=prom-operator \
  --set prometheus.prometheusSpec.serviceMonitorSelectorNilUsesHelmValues=false \
  --timeout 10m

kubectl -n monitoring wait --for=condition=available deployment --all --timeout=300s

Step 6C — Configure the NGFW CA

Agents must trust the certificates presented by the NGFW during TLS interception. Export the NGFW's CA bundle and paste it into the ConfigMap:

# Export the CA from your NGFW in PEM format — process varies by vendor:
#   Cisco FTD:   Devices → Certificates → Internal CA → Export Chain
#   FortiGate:   System → Certificates → Export
#   Palo Alto:   Device → Certificate Management → Certificates → Export
#   Check Point: SmartConsole → Gateways → Edit → HTTPS Inspection → Export CA

# Paste the PEM content (including -----BEGIN CERTIFICATE-----)
# into k8s/dut/10-ngfw-ca.yaml under data.ca.crt:
${EDITOR:-nano} k8s/dut/10-ngfw-ca.yaml
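Before pasting, you can sanity-check the exported bundle (assuming you saved it locally as ngfw-ca.pem — this prints the first certificate's subject, issuer and validity dates):

openssl x509 -in ngfw-ca.pem -noout -subject -issuer -dates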

Step 7C — Create all secrets

# Namespace (may already exist if you did Simple mode first)
kubectl create namespace web-agents --dry-run=client -o yaml | kubectl apply -f -

# Postgres
DB_PASS=$(openssl rand -hex 24)
kubectl -n web-agents create secret generic postgres-credentials \
  --from-literal=POSTGRES_USER=agent_dashboard \
  --from-literal=POSTGRES_PASSWORD="$DB_PASS" \
  --from-literal=POSTGRES_DB=agent_dashboard \
  --dry-run=client -o yaml | kubectl apply -f -

# browser-engine agent token
TOKEN=$(openssl rand -hex 32)
kubectl -n web-agents create secret generic web-agent-secrets \
  --from-literal=CONTROLLER_TOKEN="$TOKEN" \
  --dry-run=client -o yaml | kubectl apply -f -

# synthetic-load agent token
K6_SECRET=$(openssl rand -hex 32)
kubectl -n web-agents create secret generic k6-agent-secrets \
  --from-literal=DASHBOARD_SECRET="$K6_SECRET" \
  --dry-run=client -o yaml | kubectl apply -f -

# Dashboard
ADMIN_PASS=$(openssl rand -hex 16)
SESSION=$(openssl rand -hex 32)
kubectl -n web-agents create secret generic dashboard-secrets \
  --from-literal=DATABASE_URL="postgresql://agent_dashboard:${DB_PASS}@postgres.web-agents.svc.cluster.local:5432/agent_dashboard?sslmode=disable" \
  --from-literal=AGENT_API_TOKEN="$TOKEN" \
  --from-literal=ADMIN_BASIC_AUTH="admin:$ADMIN_PASS" \
  --from-literal=SESSION_SECRET="$SESSION" \
  --dry-run=client -o yaml | kubectl apply -f -

# SNMP community for the NGFW (default "public" in most lab setups)
SNMP_COMMUNITY="public"   # Change to match your NGFW SNMP config
kubectl -n web-agents create secret generic dut-snmp-secrets \
  --from-literal=SNMP_COMMUNITY="$SNMP_COMMUNITY" \
  --dry-run=client -o yaml | kubectl apply -f -

echo "================================================"
echo "  Dashboard admin login:"
echo "    User:     admin"
echo "    Password: $ADMIN_PASS"
echo "================================================"

Step 8C — Label the DUT node

The role: ngfw-dut nodeSelector ensures personas and the DUT webserver run on the node with the trunk NIC:

NODE=$(kubectl get nodes -o jsonpath='{.items[0].metadata.name}')
kubectl label node "$NODE" role=ngfw-dut
kubectl get node "$NODE" --show-labels | grep ngfw

Step 9C — Bring up the full stack with k8s-dut-up.sh

The script orchestrates all 4 phases idempotently:

# Recommended: run preflight checks first
bash scripts/k8s-dut-up.sh preflight

# Bring up the complete stack (10–20 min on first run):
bash scripts/k8s-dut-up.sh up

What the script does:

| Phase | What it applies |
|---|---|
| Phase 1 | kubectl apply -f k8s/ — namespace, Postgres, Dashboard, browser-engine and synthetic-load agents, PgBouncer, Prometheus rules |
| Phase 2 | kubectl apply -k k8s/dut/ — NGFW CA, macvlan NetworkAttachmentDefinitions, DUT Caddy webserver, SNMP exporter, DUT NetworkPolicy, TLS PKI, DUT Caddyfile, node-tuning DaemonSet, Stakater Reloader; strategic merge patches for browser-engine and synthetic-load |
| Phase 3 | kubectl apply -k platform/ — persona PKI ClusterIssuer, cert-manager issues certs for all 30 personas (20 Synthetic + 10 Cloned slots); CoreDNS patch for the persona.internal zone; Grafana dashboards. Waits for the persona-ca-bundle Secret |
| Phase 4 | kubectl apply -k personas/ (20 Synthetic Persona namespaces) and kubectl apply -k k8s/clone-personas/ (10 Cloned Persona slot namespaces) — Caddy deployments, NetworkAttachmentDefinitions, TLS Certificates; scripts/netsetup-personas.sh setup creates VLAN subinterfaces 101–120 + 200–209 and routing on the host |

Step 10C — Verify the persona fleet

# Check persona pods — this lists any pod NOT yet Running (expect empty output after ~5 min):
kubectl get pods -A -l app.kubernetes.io/part-of=web-agent-cluster \
  --field-selector=status.phase!=Running 2>/dev/null | head -20

# TLS certificates not yet Ready (expect no rows once cert-manager has issued them all):
kubectl get certificate -A | grep -v "True" | head -20

# Test DNS resolution for personas:
kubectl run -it --rm dns-test --image=busybox --restart=Never \
  -- nslookup shop.persona.internal
# Expected: Address: 10.1.1.2

# List macvlan interfaces created on the host:
ip link show | grep -E "vlan|macvlan" | head -20

Step 11C — Create the Saleor secret (persona shop)

The shop persona runs the Saleor e-commerce app, which needs a dedicated Postgres database. Create the Secret after Phase 4 (the persona-shop namespace only exists then):

# Create the Saleor database using the cluster Postgres:
PG_POD=$(kubectl -n web-agents get pod \
  -l app.kubernetes.io/name=postgres -o jsonpath='{.items[0].metadata.name}')

kubectl -n web-agents exec -i "$PG_POD" -- \
  psql -U agent_dashboard -d agent_dashboard <<SQL
CREATE DATABASE saleor;
CREATE USER saleor WITH PASSWORD 'saleor_dev_pass';
GRANT ALL PRIVILEGES ON DATABASE saleor TO saleor;
SQL

kubectl -n persona-shop create secret generic persona-shop-db \
  --from-literal=url="postgres://saleor:saleor_dev_pass@postgres.web-agents.svc.cluster.local:5432/saleor"

# Restart the shop pod to pick up the Secret:
kubectl -n persona-shop rollout restart deployment/caddy
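# Optional: confirm the database and role exist before restarting (psql meta-commands, pattern-matched):
kubectl -n web-agents exec -i "$PG_POD" -- \
  psql -U agent_dashboard -d agent_dashboard -c '\l saleor' -c '\du saleor'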

Step 12C — Access Grafana and the Dashboard

# Grafana (TLS throughput, per-persona latency, NGFW CPU via SNMP):
kubectl -n monitoring port-forward svc/kube-prometheus-grafana 3001:80

# Web Agent Dashboard:
kubectl -n web-agents port-forward svc/dashboard 3000:3000
  • Grafana: http://localhost:3001 — login admin / prom-operator
  • "Persona Fleet Overview" dashboard — throughput per persona
  • "DUT NGFW" dashboard — CPU/memory via SNMP
  • Dashboard: http://localhost:3000 — login admin / password from Step 7C

Step 13C — Configure the switch (Nexus 9000)

# Ready-made NX-OS scripts are in scripts/nexus/:
#   scripts/nexus/01-apply-tuning.nxos   — QoS, MTU 9216, EEE off, ECMP hash
#   scripts/nexus/02-verify.nxos         — verify configuration
#   scripts/nexus/03-rollback.nxos       — roll back if needed

# Apply via SSH:
ssh admin@nexus-9000 < scripts/nexus/01-apply-tuning.nxos
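
# Then run the companion verification script the same way:
ssh admin@nexus-9000 < scripts/nexus/02-verify.nxos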

See docs/DUT_TESTBED.md for the complete hardware setup guide including cabling, VLAN trunk design and NGFW policy configuration.


Step 14C — Start the fleet and run tests

ADMIN_PASS=$(kubectl -n web-agents get secret dashboard-secrets \
  -o jsonpath='{.data.ADMIN_BASIC_AUTH}' | base64 -d | cut -d: -f2-)
ADMIN="admin:$ADMIN_PASS"

# Enable browser-engine fleet pointing at internal personas:
curl -s -u "$ADMIN" -X PUT -H 'content-type: application/json' \
  -d '{"enabled":true,"cycleIntervalMs":15000,"desiredAgentCount":50}' \
  http://localhost:3000/api/admin/config

# Start synthetic-load engine load test (e.g., 200 agents for 10 minutes):
kubectl -n web-agents scale deployment/k6-agent --replicas=200

Step 15C — Full DUT mode validation

# Persona pods that are NOT Running (expect only the header line):
kubectl get pods -A -l app.kubernetes.io/part-of=web-agent-cluster | grep -v Running

# browser-engine agents completing cycles:
kubectl -n web-agents logs -l app.kubernetes.io/name=web-agent \
  --tail=5 --prefix 2>/dev/null | grep -i "cycle\|success\|http"

# synthetic-load agents running tests:
kubectl -n web-agents logs -l app.kubernetes.io/name=k6-agent \
  --tail=5 --prefix 2>/dev/null | grep -i "run\|result\|http"

# Throughput in the last 5 minutes:
PG_POD=$(kubectl -n web-agents get pod \
  -l app.kubernetes.io/name=postgres -o jsonpath='{.items[0].metadata.name}')
kubectl -n web-agents exec -i "$PG_POD" -- \
  psql -U agent_dashboard -d agent_dashboard -c \
  "SELECT count(*) AS runs_5m,
          round(avg(total_duration_ms))::int AS avg_ms,
          (sum(transferred_bytes)/1024/1024)::int AS mb
     FROM runs WHERE started_at > now() - interval '5 minutes';"

Container images

All images are public on ghcr.io/nollagluiz/ and support amd64 and arm64. No local build is required.

# Pull manually (optional — k3s pulls automatically from GHCR):
docker pull ghcr.io/nollagluiz/web-agent-agent:v3.7.0
docker pull ghcr.io/nollagluiz/web-agent-dashboard:v3.7.0
docker pull ghcr.io/nollagluiz/web-agent-k6agent:v3.7.0
docker pull ghcr.io/nollagluiz/web-agent-webserver:v3.7.0

Air-gapped / offline k3s:

# On a machine with internet access:
for img in \
  ghcr.io/nollagluiz/web-agent-agent:v3.7.0 \
  ghcr.io/nollagluiz/web-agent-dashboard:v3.7.0 \
  ghcr.io/nollagluiz/web-agent-k6agent:v3.7.0 \
  ghcr.io/nollagluiz/web-agent-webserver:v3.7.0; do
  docker pull "$img"
  docker save "$img" -o "$(basename ${img%:*}).tar"
done

# Copy the .tar files to the server and import into k3s containerd:
sudo k3s ctr images import web-agent-agent.tar
sudo k3s ctr images import web-agent-dashboard.tar
sudo k3s ctr images import web-agent-k6agent.tar
sudo k3s ctr images import web-agent-webserver.tar

Day-to-day commands

# ── Overall status ────────────────────────────────────────────────────────
kubectl -n web-agents get pods
kubectl get pods -A -l app.kubernetes.io/part-of=web-agent-cluster

# ── browser-engine fleet ──────────────────────────────────────────────────────
kubectl -n web-agents scale deployment/web-agent --replicas=N
kubectl -n web-agents rollout restart deployment/web-agent
kubectl -n web-agents logs -l app.kubernetes.io/name=web-agent -f --max-log-requests 50

# ── synthetic-load fleet ──────────────────────────────────────────────────────────────
kubectl -n web-agents scale deployment/k6-agent --replicas=N
kubectl -n web-agents logs -l app.kubernetes.io/name=k6-agent -f --max-log-requests 50

# ── Dashboard ─────────────────────────────────────────────────────────────
kubectl -n web-agents port-forward svc/dashboard 3000:3000
kubectl -n web-agents rollout restart deployment/dashboard
kubectl -n web-agents logs deploy/dashboard -f

# ── Grafana (DUT mode) ────────────────────────────────────────────────────
kubectl -n monitoring port-forward svc/kube-prometheus-grafana 3001:80

# ── Personas ──────────────────────────────────────────────────────────────
# Status of all persona pods:
kubectl get pods -A -l persona --no-headers | sort -k1

# Restart a specific persona:
kubectl -n persona-shop rollout restart deployment/caddy

# View a persona's TLS certificate:
kubectl -n persona-shop get certificate persona-shop-tls

# ── HPA (autoscaler) ──────────────────────────────────────────────────────
kubectl -n web-agents get hpa -w

# ── Postgres ──────────────────────────────────────────────────────────────
PG=$(kubectl -n web-agents get pod -l app.kubernetes.io/name=postgres \
  -o jsonpath='{.items[0].metadata.name}')
kubectl -n web-agents exec -it "$PG" -- psql -U agent_dashboard -d agent_dashboard

# ── Resource usage ────────────────────────────────────────────────────────
kubectl top nodes
kubectl -n web-agents top pods

Troubleshooting

| Symptom | Likely cause | Fix |
|---|---|---|
| Pods in ImagePullBackOff | GHCR unreachable or wrong tag | Check image tag; use offline import (see Images section) |
| kubectl top nodes shows "metrics not available" | metrics-server missing --kubelet-insecure-tls | Repeat the kubectl patch from Step 3 and wait 60 s |
| Pods stuck in Pending | Insufficient CPU/RAM on the node | Lower --replicas or reduce requests in k8s/20-agent-deployment.yaml |
| Dashboard HTTP 500 on API calls | Migrations not applied | Repeat Step 9B |
| Agents register but no cycles | Fleet not enabled or no targets | Repeat Step 11B or 14C |
| Admin login refused | Wrong password | kubectl -n web-agents get secret dashboard-secrets -o jsonpath='{.data.ADMIN_BASIC_AUTH}' \| base64 -d |
| nslookup shop.persona.internal fails | CoreDNS not patched | bash platform/dns/patch-coredns.sh |
| Persona pods in CrashLoopBackOff | TLS cert not yet issued | kubectl get certificate -A; wait ~60 s for cert-manager |
| persona-shop in CreateContainerConfigError | persona-shop-db Secret missing | Create the Secret (Step 11C) |
| node-tuning DaemonSet rejected by PSA | PSA label missing on namespace | kubectl apply -k k8s/dut/ (includes 00-namespace-psa.yaml) |
| macvlan interfaces not created | netsetup-personas.sh not run | bash scripts/netsetup-personas.sh setup |
| Traffic not passing through the NGFW | Wrong VLAN routing on Nexus | Check 802.1q trunk and routes — see scripts/nexus/02-verify.nxos |
| SNMP metrics missing in Grafana | Wrong SNMP community | Update dut-snmp-secrets and restart snmp-exporter |
| Caddy pods serving expired cert after renewal | Stakater Reloader not installed | Check kubectl -n web-agents get deploy/reloader; re-apply k8s/dut/ |

Cleanup

# Pause agents (keeps everything installed, zero CPU/RAM usage):
kubectl -n web-agents scale deployment/web-agent --replicas=0
kubectl -n web-agents scale deployment/k6-agent  --replicas=0
kubectl -n web-agents scale deployment/dashboard  --replicas=0

# Remove personas (DUT mode):
kubectl delete -k personas/

# Remove platform (PKI, observability):
kubectl delete -k platform/ 2>/dev/null || true

# Remove DUT overlay:
kubectl delete -k k8s/dut/ 2>/dev/null || true

# Remove base stack:
kubectl delete namespace web-agents

# Remove macvlan interfaces from the host:
bash scripts/netsetup-personas.sh teardown

# Uninstall k3s entirely:
sudo /usr/local/bin/k3s-uninstall.sh

Repository references

| Path | Contents |
|---|---|
| docs/DUT_TESTBED.md | Full physical NGFW test-bed setup guide |
| docs/ARCHITECTURE.md | Component diagram and topology |
| docs/K6_FLEET.md | synthetic-load fleet operations guide |
| docs/what-it-does.en.md | What this project does |
| k8s/ | Base manifests (namespace, agents, dashboard, Postgres) |
| k8s/dut/ | DUT overlay (macvlan, PKI, SNMP, tuning, Reloader) |
| platform/ | Persona PKI, CoreDNS, Grafana dashboards |
| personas/_generated/ | 20 generated persona webservers |
| scripts/k8s-dut-up.sh | 4-phase orchestration script |
| scripts/netsetup-personas.sh | Creates/removes VLAN subinterfaces on the host |
| scripts/nexus/ | NX-OS scripts for Nexus 9000 configuration |
| dashboard/src/db/migrations/ | PostgreSQL migration files |
| scripts/host-tuning.sh | One-shot host tuning: sysctls + modules + CPU governor + (optional) cpuManagerPolicy: static |

Host tuning — REQUIRED for the persona stacks

The Synthetic Persona and Cloned Persona Caddy webservers REQUIRE host-side tuning to deliver their target throughput. Without it, kernel UDP buffers cap QUIC at ~30 Mbps per replica and the kernel resets cwnd on every HTTP/2 idle window, adding ~1 ms RTT to every cycle.

The in-cluster node-tuning DaemonSet applies the same kernel knobs at pod start, but after a host reboot those settings are absent until the pod comes back up. The scripts/host-tuning.sh script writes the values into /etc/sysctl.d/, installs a systemd unit for the CPU governor, and (optionally) flips on cpuManagerPolicy: static for exclusive-core allocation — making the DaemonSet a belt-and-braces re-applier.
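The two symptoms above map onto well-known kernel knobs — listed here only as an illustration; the authoritative list and exact values are in PERFORMANCE_TUNING_HOST.md:

# Larger socket buffers lift the per-replica QUIC/UDP ceiling:
#   net.core.rmem_max, net.core.wmem_max
# Keeping cwnd across idle HTTP/2 windows removes the per-cycle slow-start penalty:
#   net.ipv4.tcp_slow_start_after_idle = 0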

In single-node mode every workload (personas, slots, agents, dashboard, Cloner, NFS server) lives on the only node — so this script is run once, on that node:

# Apply sysctls + modules + CPU governor + cpuManagerPolicy: static
sudo scripts/host-tuning.sh apply --enable-cpu-pinning

# Verify (coloured report of every key value vs expected)
sudo scripts/host-tuning.sh status

# Undo (also reverts cpuManagerPolicy with --enable-cpu-pinning)
sudo scripts/host-tuning.sh remove --enable-cpu-pinning

--enable-cpu-pinning toggles cpuManagerPolicy: static on the kubelet (vanilla and k3s both supported, auto-detected) and restarts the kubelet, so plan a brief maintenance window. Without the flag the script still applies sysctls, modules, governor, and THP — pinning is the only step that touches running workloads.

For the multi-node deploy guide (UBUNTU_K3S_MULTINODE_QUICKSTART_DEPLOY.en.md) the script must run on every UCS host. Full reference (and the values it writes) in PERFORMANCE_TUNING_HOST.md.


Support

If a step gets stuck, open an issue on the repository or reach out:

André Luiz Gallon — agallon@Cisco.com


© 2026 André Luiz Gallon — Distributed under PolyForm Noncommercial 1.0.0 with Additional Use Restrictions (Appendix A).


Optional: Public Website Cloner

Goal

The cloner downloads real public sites via VLAN 40 (direct internet access, bypassing the NGFW) and serves the cloned content locally. This lets the agent fleet exercise the NGFW with real-world content without depending on external reachability during the test runs.

Storage — single-node

In single-node mode, every pod (cloner + 10 cloned-persona slots + NFS server) lands on the same node. The shared cloned-sites volume is still served by the in-cluster NFS server (k8s/dut/35-nfs-server.yaml) — but because client and server are co-located the NFS round-trip is loopback and overhead is negligible. The hostPath /var/lib/agent-cluster/cloned-sites underneath the NFS server holds the durable copy.
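On the host you can inspect the durable copy directly (path as above; the directory is created by the NFS server pod and fills after the first clone job completes):

ls -lh /var/lib/agent-cluster/cloned-sites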

The same architecture trivially extends to multi-node — see UBUNTU_K3S_MULTINODE_QUICKSTART_DEPLOY.en.md for the multi-node-specific notes (NFS over OOBI, cloner pinned to UCS-4).

Network — VLAN 40 (ISP)

| Interface | VLAN | Subnet | Use |
|---|---|---|---|
| eth1.40 | 40 | DHCP (ISP) | Cloner — internet egress |

k8s-install.sh creates eth1.40 automatically. The Nexus 9000 must allow VLAN 40 on the trunk to UCS-1 (single-node):

# On the Nexus 9000
conf t
  vlan 40
    name cloner-isp-egress
  interface Ethernet1/<N>       ! trunk to the server
    switchport trunk allowed vlan add 40

HTTPS — no additional configuration

The cloner reaches public HTTPS sites directly. Certificates are validated against Debian's Mozilla/NSS bundle (ca-certificates). No configuration is needed for sites with public certificates (Let's Encrypt, DigiCert, etc.). The NGFW CA is not injected into the cloner.

Deploy

k8s-install.sh has already created eth1.40 and secrets-init.sh has already created cloner-secrets. Verify:

# VLAN 40 interface
ip link show eth1.40   # must be UP

# Secret
kubectl get secret cloner-secrets -n web-agents

# Cloner pod
kubectl get pod -n web-agents -l app.kubernetes.io/name=cloner
kubectl logs -n web-agents -l app.kubernetes.io/name=cloner -f
# Wait for: "[health] ping 8.8.8.8: ok X.Xms"

Create a clone job

curl -s -X POST http://localhost:3000/api/clone/jobs \
  -H "Authorization: Basic $(echo -n 'admin:<ADMIN_PASS>' | base64)" \
  -H "Content-Type: application/json" \
  -d '{"url":"https://example.com","personaName":"shop"}'

# Track the job status
curl -s http://localhost:3000/api/clone/jobs \
  -H "Authorization: Basic $(echo -n 'admin:<ADMIN_PASS>' | base64)" | \
  python3 -m json.tool

Monitor internet health

Open the TLSStress.Art dashboard in Grafana → the Cloner — Internet Health & Jobs row.

kubectl port-forward -n web-agents svc/clone-serve 8081:8081
curl http://localhost:8081/metrics | grep cloner_

Expected output:

cloner_internet_up{target="8.8.8.8"} 1
cloner_internet_up{target="1.1.1.1"} 1
cloner_internet_any_up 1
cloner_gateway_up{gateway="<IP-DHCP>"} 1
cloner_ping_rtt_ms{target="8.8.8.8"} 4.2

Quick troubleshooting

| Symptom | Check |
|---|---|
| Pod does not start | kubectl describe pod -n web-agents -l app.kubernetes.io/name=cloner |
| net1 has no IP | kubectl exec -n web-agents deploy/cloner -- ip addr show net1 |
| VLAN 40 missing | ip link show eth1.40 on the host; VLAN 40 allowed on the Nexus trunk |
| Internet shown red in Grafana | kubectl exec -n web-agents deploy/cloner -- ping -c 3 -I net1 8.8.8.8 |
| HTTPS certificate error | Check that the site uses a public CA; a private CA requires a ConfigMap |
| Job stuck | kubectl logs -n web-agents deploy/cloner -f |

Full guide: docs/CLONER_OPERATIONS.md