TLSStress.Art — Single-Node Deployment on Ubuntu with k3s¶
Deployment mode: single-node — all components (agents, webservers, dashboard, observability) run on one Ubuntu server. This is the quickest way to get started. For a multi-server deployment that scales each tier to a dedicated machine, see
UBUNTU_K3S_MULTINODE_QUICKSTART_DEPLOY.en.md.

Last verified against shipping code: v3.7.0 (2026-05-12) — See ARCHITECTURE.md for the canonical 37 MÓDULOs + 7 Test Kinds + DOM/CPOS/PIE-PA safety architecture and the ZTP-prem 12/12-layer insider-operator posture (25 patent claims, Tier A/B partition, Confidential Computing, sealed audit hash-chain, K8s admission webhook, TPM 2.0 measured-boot, DLP egress monitor, behavioural anomaly detector). ADRs 0014 and 0019–0025 cover post-Freeze additions.
Goal: install the full TLSStress.Art stack on a single Ubuntu server using k3s — including Dashboard, Postgres, browser-engine fleet (up to 300 agents), synthetic-load fleet (up to 1,000 agents), 30 TLS persona webservers (20 Synthetic + 10 Cloned slots), Grafana and Prometheus — and optionally place a physical NGFW in the traffic path to measure TLS inspection performance (DUT mode).
Who this is for: network and systems engineers comfortable with a Linux terminal. No prior Kubernetes experience required.
Time estimate: Simple mode: 30–45 min | Full DUT mode: 90–120 min.
Author: André Luiz Gallon — agallon@Cisco.com
Automated installation (recommended)¶
If you just want to get started quickly, use the automated install script. It installs k3s, Helm, cert-manager, Multus, configures VLAN interfaces and applies all Kubernetes manifests in the correct order — no Kubernetes knowledge required:
# Clone the repository
git clone https://github.com/nollagluiz/AI_forSE.git
cd AI_forSE
# Single-node: everything on one server
sudo ./scripts/k8s-install.sh --mode=single --data-iface=eth1
The script will:
1. Validate prerequisites (OS, RAM, disk, network interfaces)
2. Install k3s, Helm, cert-manager, and Multus CNI
3. Configure VLAN subinterfaces on eth1 (VLANs 20, 30, 99, 101–120)
4. Apply all Kubernetes manifests in the correct order
5. Wait for pods to become Ready and print the Dashboard URL
Options: run `sudo ./scripts/k8s-install.sh --help` for all flags, including `--dry-run` to preview what would be executed.
The rest of this guide explains each step in detail, which is useful for customising the setup or troubleshooting the automated installation.
Choose your installation mode¶
| | Simple mode | DUT mode |
|---|---|---|
| Use case | Basic lab, reachability and throughput testing against real internet sites | NGFW test-bed: measures the real cost of TLS inspection with HTTP/2 and HTTP/3 |
| browser-engine fleet | ✅ | ✅ |
| synthetic-load fleet | ✅ | ✅ |
| 30 persona webservers (20 Synthetic + 10 Cloned slots) | ❌ | ✅ |
| Physical NGFW (DUT) in path | ❌ | ✅ |
| Grafana + Prometheus | Optional | ✅ |
| cert-manager + internal PKI | ✅ (required) | ✅ |
| Multus CNI + macvlan | ❌ | ✅ |
| NICs required | 1 | 2 (eth0 management + eth1 VLAN trunk) |
| Orchestration script | Manual steps | scripts/k8s-dut-up.sh up |
Prerequisites¶
| Item | Simple mode | DUT mode |
|---|---|---|
| OS | Ubuntu 22.04 LTS or 24.04 LTS | same |
| Architecture | x86_64 or arm64 | x86_64 (bare-metal/UCS recommended) |
| RAM | 16 GB (up to ~50 agents) | 64 GB+ |
| vCPUs | 4 | 16+ |
| Disk | 60 GB free | 200 GB+ (personas + logs + metrics) |
| NICs | 1 (eth0) | 2 (eth0=k8s management, eth1=802.1q trunk) |
| Switch | — | Nexus 9000 with VLAN trunk (VLANs 20, 30, 99, 101–120) |
| NGFW | — | Cisco FTD/IOS-XE, FortiGate, Palo Alto, Check Point, Huawei, etc. |
| Host access | user with `sudo` | same |
| Internet (bootstrap) | Yes (k3s, GHCR images) | Yes, or offline — see Images section |
Headless server? Use `ssh -L 3000:localhost:3000 user@server` to reach the UI from your laptop.
Part A — Host setup (both modes)¶
Step 0 — Base packages¶
sudo apt update && sudo apt install -y \
curl ca-certificates iptables openssl jq git python3 iproute2 vlan
Step 1 — Install k3s¶
Simple mode:
curl -sfL https://get.k3s.io | \
INSTALL_K3S_EXEC="--disable=traefik --write-kubeconfig-mode=644" sh -
DUT mode (includes Multus CNI for macvlan):
curl -sfL https://get.k3s.io | \
INSTALL_K3S_EXEC="--disable=traefik --write-kubeconfig-mode=644 \
--flannel-iface=eth0" sh -
# Multus CNI — required for the net1 macvlan VLAN data-plane interface
kubectl apply -f https://raw.githubusercontent.com/k8snetworkplumbingwg/multus-cni/master/deployments/multus-daemonset-thick.yml
kubectl -n kube-system wait --for=condition=ready \
pod -l app=multus --timeout=120s
Verify the node is up:
sudo systemctl status k3s --no-pager | head -5
sudo k3s kubectl get nodes
Expected: STATUS=Ready, ROLES=control-plane,master.
Configure kubectl without sudo:
mkdir -p ~/.kube
sudo cp /etc/rancher/k3s/k3s.yaml ~/.kube/config
sudo chown $(id -u):$(id -g) ~/.kube/config
chmod 600 ~/.kube/config
echo 'export KUBECONFIG=$HOME/.kube/config' >> ~/.bashrc
source ~/.bashrc
kubectl get nodes
Step 2 — Install Nginx Ingress Controller¶
kubectl apply -f https://raw.githubusercontent.com/kubernetes/ingress-nginx/controller-v1.11.3/deploy/static/provider/cloud/deploy.yaml
kubectl -n ingress-nginx wait --for=condition=available \
deployment/ingress-nginx-controller --timeout=180s
On a single-node host the `EXTERNAL-IP` may stay `<pending>` — that's fine; the dashboard is reached via `kubectl port-forward`.
Step 3 — Install metrics-server¶
The HPA needs CPU/memory metrics to autoscale the agent fleets.
kubectl apply -f https://github.com/kubernetes-sigs/metrics-server/releases/latest/download/components.yaml
# k3s uses a self-signed kubelet cert; metrics-server needs this flag:
kubectl -n kube-system patch deployment metrics-server --type='json' \
-p='[{"op":"add","path":"/spec/template/spec/containers/0/args/-","value":"--kubelet-insecure-tls"}]'
kubectl -n kube-system wait --for=condition=available \
deployment/metrics-server --timeout=180s
# Sanity check — should print CPU/MEM within seconds:
kubectl top nodes
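The repo manifests already define the HPAs; purely as an illustration of what metrics-server enables, a CPU-based autoscaler for the browser-engine fleet would look roughly like this (the thresholds here are assumptions, not the shipped values):

```yaml
apiVersion: autoscaling/v2
kind: HorizontalPodAutoscaler
metadata:
  name: web-agent          # illustrative; the real HPA ships in k8s/
  namespace: web-agents
spec:
  scaleTargetRef:
    apiVersion: apps/v1
    kind: Deployment
    name: web-agent
  minReplicas: 5
  maxReplicas: 300         # matches the fleet ceiling stated in this guide
  metrics:
  - type: Resource
    resource:
      name: cpu
      target:
        type: Utilization
        averageUtilization: 70   # assumed threshold, for illustration
```

Without the metrics-server patch from this step, the HPA controller sees no CPU samples and never scales.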
Step 4 — Install cert-manager¶
Required in both modes. In Simple mode it manages internal TLS; in DUT mode it issues certificates for all 30 persona webservers (20 Synthetic + 10 Cloned slots) via an internal CA chain.
kubectl apply -f https://github.com/cert-manager/cert-manager/releases/download/v1.16.0/cert-manager.yaml
kubectl -n cert-manager wait --for=condition=available \
deployment --all --timeout=180s
# Confirm all 3 cert-manager pods are Running:
kubectl -n cert-manager get pods
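For orientation, a cert-manager CA issuer is a small object: it points at a Secret holding the CA key pair and signs Certificate requests against it. The sketch below is illustrative only; the real issuer ships in the platform/ overlay (applied in Step 9C), and both names here are assumptions:

```yaml
apiVersion: cert-manager.io/v1
kind: ClusterIssuer
metadata:
  name: persona-ca-issuer        # illustrative name
spec:
  ca:
    # Secret (illustrative name) holding tls.crt/tls.key of the internal root CA;
    # every persona Certificate referencing this issuer gets signed by that root.
    secretName: persona-root-ca
```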
Step 5 — Clone the repository¶
git clone https://github.com/nollagluiz/AI_forSE.git
cd AI_forSE
# Always use the main branch
Part B — Simple mode (agents vs. internet sites)¶
browser-engine and synthetic-load agents navigate real internet sites. No personas, no NGFW, no macvlan. Good for quickly validating the agent/dashboard/synthetic-load engine stack.
Step 6B — Create secrets¶
# Create namespace
kubectl create namespace web-agents
# Postgres credentials
DB_PASS=$(openssl rand -hex 24)
kubectl -n web-agents create secret generic postgres-credentials \
--from-literal=POSTGRES_USER=agent_dashboard \
--from-literal=POSTGRES_PASSWORD="$DB_PASS" \
--from-literal=POSTGRES_DB=agent_dashboard
# Shared token between dashboard and browser-engine agents
TOKEN=$(openssl rand -hex 32)
kubectl -n web-agents create secret generic web-agent-secrets \
--from-literal=CONTROLLER_TOKEN="$TOKEN"
# synthetic-load agent token
K6_SECRET=$(openssl rand -hex 32)
kubectl -n web-agents create secret generic k6-agent-secrets \
--from-literal=DASHBOARD_SECRET="$K6_SECRET"
# Dashboard secrets
ADMIN_PASS=$(openssl rand -hex 16)
SESSION=$(openssl rand -hex 32)
kubectl -n web-agents create secret generic dashboard-secrets \
--from-literal=DATABASE_URL="postgresql://agent_dashboard:${DB_PASS}@postgres.web-agents.svc.cluster.local:5432/agent_dashboard?sslmode=disable" \
--from-literal=AGENT_API_TOKEN="$TOKEN" \
--from-literal=ADMIN_BASIC_AUTH="admin:$ADMIN_PASS" \
--from-literal=SESSION_SECRET="$SESSION"
echo "================================================"
echo " Dashboard admin login:"
echo " User: admin"
echo " Password: $ADMIN_PASS"
echo "================================================"
Save the password — you'll need it on `/login`.
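For reference, the `ADMIN_BASIC_AUTH` value maps directly onto a standard HTTP Basic `Authorization` header, which `curl -u` builds for you. An illustration with a throwaway password (not your real one):

```shell
# HTTP Basic auth header = "Basic " + base64("user:password")
CRED='admin:s3cret'                                  # throwaway example credential
HEADER="Authorization: Basic $(printf '%s' "$CRED" | base64)"
echo "$HEADER"
# → Authorization: Basic YWRtaW46czNjcmV0
# curl -u "$CRED" http://localhost:3000/api/... sends exactly this header.
```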
Step 7B — Apply base manifests¶
Images are already correct in the manifests (ghcr.io/nollagluiz/). No local
build required — k3s pulls them from GHCR automatically.
for f in k8s/00-namespace.yaml \
k8s/05-resource-quota.yaml \
k8s/10-agent-config.yaml \
k8s/11-k6-agent-config.yaml \
k8s/20-agent-deployment.yaml \
k8s/21-k6-agent-deployment.yaml \
k8s/40-postgres.yaml \
k8s/42-pgbouncer.yaml \
k8s/50-dashboard.yaml \
k8s/52-dashboard-persona-rbac.yaml \
k8s/55-dashboard-pdb.yaml; do
kubectl apply -f "$f" --server-side 2>/dev/null || kubectl apply -f "$f"
done
Step 8B — Wait for pods to become ready¶
kubectl -n web-agents wait --for=condition=ready \
pod -l app.kubernetes.io/name=postgres --timeout=180s
kubectl -n web-agents wait --for=condition=available \
deployment/dashboard --timeout=240s
kubectl -n web-agents get pods
Expected: postgres-0, dashboard-* and web-agent-* all Running.
Step 9B — Apply database migrations¶
PG_POD=$(kubectl -n web-agents get pod \
-l app.kubernetes.io/name=postgres -o jsonpath='{.items[0].metadata.name}')
kubectl -n web-agents exec -i "$PG_POD" -- \
psql -U agent_dashboard -d agent_dashboard \
-c 'CREATE EXTENSION IF NOT EXISTS pgcrypto;'
for f in dashboard/src/db/migrations/*.sql; do
echo "==> applying $f"
kubectl -n web-agents exec -i "$PG_POD" -- \
psql -U agent_dashboard -d agent_dashboard -v ON_ERROR_STOP=1 < "$f"
done
Normal output: `ALTER TABLE`, `CREATE INDEX`. Seeing `already exists, skipping` is fine — migrations are idempotent.
Step 10B — Open the dashboard¶
In a second terminal:
kubectl -n web-agents port-forward svc/dashboard 3000:3000
- Local desktop: http://localhost:3000/
- Remote server: `ssh -L 3000:localhost:3000 user@your-ubuntu-server`, then open http://localhost:3000/ on your laptop.
Login: admin + the password from Step 6B.
Step 11B — Add target sites and start the fleet¶
ADMIN_PASS=$(kubectl -n web-agents get secret dashboard-secrets \
-o jsonpath='{.data.ADMIN_BASIC_AUTH}' | base64 -d | cut -d: -f2-)
ADMIN="admin:$ADMIN_PASS"
# Add sample target sites:
for url in https://www.g1.globo.com https://www.uol.com.br https://www.nasa.gov; do
curl -s -u "$ADMIN" -H 'content-type: application/json' \
-d "{\"url\":\"$url\",\"weight\":1}" \
http://localhost:3000/api/admin/targets; echo
done
# Enable the browser-engine fleet with 5 agents:
curl -s -u "$ADMIN" -X PUT -H 'content-type: application/json' \
-d '{"enabled":true,"cycleIntervalMs":30000,"desiredAgentCount":5}' \
http://localhost:3000/api/admin/config
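When scripting target creation, building the JSON body with `jq -n` (installed in Step 0) avoids quoting bugs for URLs containing `&`, quotes, or spaces. A small sketch:

```shell
# Safely construct the {"url":..., "weight":...} payload with jq instead of
# string interpolation; jq handles all JSON escaping.
url='https://www.example.com/path?a=1&b=2'
payload=$(jq -cn --arg url "$url" --argjson weight 1 '{url: $url, weight: $weight}')
echo "$payload"
# → {"url":"https://www.example.com/path?a=1&b=2","weight":1}

# Then post it as in the loop above:
#   curl -s -u "$ADMIN" -H 'content-type: application/json' \
#     -d "$payload" http://localhost:3000/api/admin/targets
```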
Step 12B — Scale up (up to 300 browser-engine agents, 1,000 synthetic-load agents)¶
# Scale browser-engine fleet manually:
kubectl -n web-agents scale deployment/web-agent --replicas=50
# Scale synthetic-load fleet manually:
kubectl -n web-agents scale deployment/k6-agent --replicas=100
# Or let the HPA decide based on CPU (already configured in manifests):
kubectl -n web-agents get hpa -w
Each browser-engine agent reserves 512 MB RAM. Each synthetic-load agent reserves only 128 MB (Go runtime, no browser). Size accordingly.
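As a quick sizing check (using the reservations stated above; host OS, k3s control plane, and Postgres overhead not included):

```shell
# Back-of-envelope RAM reservation for a mixed fleet:
# 512 MB per browser-engine agent, 128 MB per synthetic-load agent.
browser=50; synthetic=100
need_mb=$(( browser * 512 + synthetic * 128 ))
echo "Fleet memory reservation: ${need_mb} MB (~$(( need_mb / 1024 )) GB)"
# 50 browser + 100 synthetic agents reserve 38400 MB, so a 64 GB DUT-mode
# host fits comfortably; a 16 GB Simple-mode host does not.
```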
Part C — DUT mode (full NGFW test-bed)¶
This mode places a physical NGFW between the agents and the 30 persona webservers (20 Synthetic + 10 Cloned slots). All HTTPS traffic traverses the firewall, which decrypts, inspects and re-encrypts every connection. The stack measures the real cost of that inspection.
Architecture summary:
Agents (browser-engine VLAN 20 / synthetic-load VLAN 30)
  │ TLS leg 1: agents → NGFW (NGFW presents its own cert)
  ▼
NGFW (Device Under Test)
  │ TLS leg 2: NGFW → Caddy persona (cert issued by cert-manager)
  ▼
20 Synthetic Personas (VLANs 101–120, subnets 10.1.x.0/27)
10 Cloned Persona slots (VLANs 200–209, subnets 10.2.{1..10}.0/27)
Step 5C — Extra dependencies for DUT mode¶
Multus was already installed in Step 1. Install Helm and the observability stack (Prometheus + Grafana) now:
# Helm
curl https://raw.githubusercontent.com/helm/helm/main/scripts/get-helm-3 | bash
# Prometheus + Grafana (kube-prometheus-stack)
helm repo add prometheus-community https://prometheus-community.github.io/helm-charts
helm repo update
kubectl create namespace monitoring
helm install kube-prometheus prometheus-community/kube-prometheus-stack \
--namespace monitoring \
--set grafana.adminPassword=prom-operator \
--set prometheus.prometheusSpec.serviceMonitorSelectorNilUsesHelmValues=false \
--timeout 10m
kubectl -n monitoring wait --for=condition=available deployment --all --timeout=300s
Step 6C — Configure the NGFW CA¶
Agents must trust the certificates presented by the NGFW during TLS interception. Export the NGFW's CA bundle and paste it into the ConfigMap:
# Export the CA from your NGFW in PEM format — process varies by vendor:
# Cisco FTD: Devices → Certificates → Internal CA → Export Chain
# FortiGate: System → Certificates → Export
# Palo Alto: Device → Certificate Management → Certificates → Export
# Check Point: SmartConsole → Gateways → Edit → HTTPS Inspection → Export CA
# Paste the PEM content (including -----BEGIN CERTIFICATE-----)
# into k8s/dut/10-ngfw-ca.yaml under data.ca.crt:
${EDITOR:-nano} k8s/dut/10-ngfw-ca.yaml
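Before pasting, it can be worth sanity-checking that the export really is a PEM CA certificate. A sketch using a locally generated throwaway CA as a stand-in for your vendor export (the path /tmp/ngfw-ca.pem is illustrative; requires OpenSSL 1.1.1+ for -addext):

```shell
# Stand-in for the vendor export: a throwaway self-signed CA.
openssl req -x509 -newkey rsa:2048 -nodes -days 1 \
  -subj "/CN=lab-ngfw-ca" -addext "basicConstraints=critical,CA:TRUE" \
  -keyout /tmp/ca.key -out /tmp/ngfw-ca.pem 2>/dev/null

# 1) Does it parse as a PEM certificate at all?
openssl x509 -in /tmp/ngfw-ca.pem -noout -subject -enddate
# 2) Is it actually a CA? Interception certs chain to it only if CA:TRUE.
openssl x509 -in /tmp/ngfw-ca.pem -noout -text | grep "CA:TRUE"
```

If the second check prints nothing, you likely exported a leaf certificate instead of the CA chain.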
Step 7C — Create all secrets¶
# Namespace (may already exist if you did Simple mode first)
kubectl create namespace web-agents --dry-run=client -o yaml | kubectl apply -f -
# Postgres
DB_PASS=$(openssl rand -hex 24)
kubectl -n web-agents create secret generic postgres-credentials \
--from-literal=POSTGRES_USER=agent_dashboard \
--from-literal=POSTGRES_PASSWORD="$DB_PASS" \
--from-literal=POSTGRES_DB=agent_dashboard \
--dry-run=client -o yaml | kubectl apply -f -
# browser-engine agent token
TOKEN=$(openssl rand -hex 32)
kubectl -n web-agents create secret generic web-agent-secrets \
--from-literal=CONTROLLER_TOKEN="$TOKEN" \
--dry-run=client -o yaml | kubectl apply -f -
# synthetic-load agent token
K6_SECRET=$(openssl rand -hex 32)
kubectl -n web-agents create secret generic k6-agent-secrets \
--from-literal=DASHBOARD_SECRET="$K6_SECRET" \
--dry-run=client -o yaml | kubectl apply -f -
# Dashboard
ADMIN_PASS=$(openssl rand -hex 16)
SESSION=$(openssl rand -hex 32)
kubectl -n web-agents create secret generic dashboard-secrets \
--from-literal=DATABASE_URL="postgresql://agent_dashboard:${DB_PASS}@postgres.web-agents.svc.cluster.local:5432/agent_dashboard?sslmode=disable" \
--from-literal=AGENT_API_TOKEN="$TOKEN" \
--from-literal=ADMIN_BASIC_AUTH="admin:$ADMIN_PASS" \
--from-literal=SESSION_SECRET="$SESSION" \
--dry-run=client -o yaml | kubectl apply -f -
# SNMP community for the NGFW (default "public" in most lab setups)
SNMP_COMMUNITY="public" # Change to match your NGFW SNMP config
kubectl -n web-agents create secret generic dut-snmp-secrets \
--from-literal=SNMP_COMMUNITY="$SNMP_COMMUNITY" \
--dry-run=client -o yaml | kubectl apply -f -
echo "================================================"
echo " Dashboard admin login:"
echo " User: admin"
echo " Password: $ADMIN_PASS"
echo "================================================"
Step 8C — Label the DUT node¶
The `role: ngfw-dut` nodeSelector ensures personas and the DUT webserver run on the node with the trunk NIC:
NODE=$(kubectl get nodes -o jsonpath='{.items[0].metadata.name}')
kubectl label node "$NODE" role=ngfw-dut
kubectl get node "$NODE" --show-labels | grep ngfw
Step 9C — Bring up the full stack with k8s-dut-up.sh¶
The script orchestrates all 4 phases idempotently:
# Recommended: run preflight checks first
bash scripts/k8s-dut-up.sh preflight
# Bring up the complete stack (10–20 min on first run):
bash scripts/k8s-dut-up.sh up
What the script does:
| Phase | What it applies |
|---|---|
| Phase 1 | kubectl apply -f k8s/ — namespace, Postgres, Dashboard, browser-engine and synthetic-load agents, PgBouncer, Prometheus rules |
| Phase 2 | kubectl apply -k k8s/dut/ — NGFW CA, macvlan NetworkAttachmentDefinitions, DUT Caddy webserver, SNMP exporter, DUT NetworkPolicy, TLS PKI, DUT Caddyfile, node tuning DaemonSet, Stakater Reloader; strategic merge patches for browser-engine and synthetic-load |
| Phase 3 | kubectl apply -k platform/ — persona PKI ClusterIssuer, cert-manager issues certs for all 30 personas (20 Synthetic + 10 Cloned slots); CoreDNS patch for persona.internal zone; Grafana dashboards. Waits for persona-ca-bundle Secret |
| Phase 4 | kubectl apply -k personas/ (20 Synthetic Persona namespaces) and kubectl apply -k k8s/clone-personas/ (10 Cloned Persona slot namespaces) — Caddy deployments, NetworkAttachmentDefinitions, TLS Certificates; scripts/netsetup-personas.sh setup — creates VLAN subinterfaces 101–120 + 200–209 and routing on the host |
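The real NetworkAttachmentDefinitions are applied from k8s/dut/ in Phase 2; purely as a hedged illustration, a macvlan NAD bound to one of the VLAN subinterfaces generally has this shape (name, namespace, and addresses below are assumptions, not the shipped values):

```yaml
apiVersion: "k8s.cni.cncf.io/v1"
kind: NetworkAttachmentDefinition
metadata:
  name: persona-vlan101          # illustrative name
  namespace: persona-shop        # illustrative namespace
spec:
  config: '{
    "cniVersion": "0.3.1",
    "type": "macvlan",
    "master": "eth1.101",
    "mode": "bridge",
    "ipam": {
      "type": "static",
      "addresses": [ { "address": "10.1.1.2/27", "gateway": "10.1.1.1" } ]
    }
  }'
```

A pod referencing this NAD via the `k8s.v1.cni.cncf.io/networks` annotation gets a `net1` interface directly on the VLAN 101 data plane, which is how persona traffic bypasses the cluster CNI and traverses the NGFW.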
Step 10C — Verify the persona fleet¶
# Check persona pods — all should be Running after ~5 min:
kubectl get pods -A -l app.kubernetes.io/part-of=web-agent-cluster \
--field-selector=status.phase!=Running 2>/dev/null | head -20
# Check TLS certificates issued by cert-manager:
kubectl get certificate -A | grep -v "True" | head -20
# Test DNS resolution for personas:
kubectl run -it --rm dns-test --image=busybox --restart=Never \
-- nslookup shop.persona.internal
# Expected: Address: 10.1.1.2
# List macvlan interfaces created on the host:
ip link show | grep -E "vlan|macvlan" | head -20
Step 11C — Create the Saleor secret (persona shop)¶
The shop persona runs the Saleor e-commerce app, which needs a dedicated
Postgres database. Create the Secret after Phase 4 (the persona-shop
namespace only exists then):
# Create the Saleor database using the cluster Postgres:
PG_POD=$(kubectl -n web-agents get pod \
-l app.kubernetes.io/name=postgres -o jsonpath='{.items[0].metadata.name}')
kubectl -n web-agents exec -i "$PG_POD" -- \
psql -U agent_dashboard -d agent_dashboard <<SQL
CREATE DATABASE saleor;
CREATE USER saleor WITH PASSWORD 'saleor_dev_pass';
GRANT ALL PRIVILEGES ON DATABASE saleor TO saleor;
SQL
kubectl -n persona-shop create secret generic persona-shop-db \
--from-literal=url="postgres://saleor:saleor_dev_pass@postgres.web-agents.svc.cluster.local:5432/saleor"
# Restart the shop pod to pick up the Secret:
kubectl -n persona-shop rollout restart deployment/caddy
Step 12C — Access Grafana and the Dashboard¶
# Grafana (TLS throughput, per-persona latency, NGFW CPU via SNMP):
kubectl -n monitoring port-forward svc/kube-prometheus-grafana 3001:80
# Web Agent Dashboard:
kubectl -n web-agents port-forward svc/dashboard 3000:3000
- Grafana: http://localhost:3001 — login `admin` / `prom-operator`
  - "Persona Fleet Overview" dashboard — throughput per persona
  - "DUT NGFW" dashboard — CPU/memory via SNMP
- Dashboard: http://localhost:3000 — login `admin` / password from Step 7C
Step 13C — Configure the switch (Nexus 9000)¶
# Ready-made NX-OS scripts are in scripts/nexus/:
# scripts/nexus/01-apply-tuning.nxos — QoS, MTU 9216, EEE off, ECMP hash
# scripts/nexus/02-verify.nxos — verify configuration
# scripts/nexus/03-rollback.nxos — roll back if needed
# Apply via SSH:
ssh admin@nexus-9000 < scripts/nexus/01-apply-tuning.nxos
See docs/DUT_TESTBED.md for the complete hardware setup
guide including cabling, VLAN trunk design and NGFW policy configuration.
Step 14C — Start the fleet and run tests¶
ADMIN_PASS=$(kubectl -n web-agents get secret dashboard-secrets \
-o jsonpath='{.data.ADMIN_BASIC_AUTH}' | base64 -d | cut -d: -f2-)
ADMIN="admin:$ADMIN_PASS"
# Enable browser-engine fleet pointing at internal personas:
curl -s -u "$ADMIN" -X PUT -H 'content-type: application/json' \
-d '{"enabled":true,"cycleIntervalMs":15000,"desiredAgentCount":50}' \
http://localhost:3000/api/admin/config
# Start synthetic-load engine load test (e.g., 200 agents for 10 minutes):
kubectl -n web-agents scale deployment/k6-agent --replicas=200
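A rough offered-load estimate under the settings above; this counts page cycles started, not individual HTTP requests (each cycle fetches a page plus its subresources):

```shell
# Each of the 50 browser-engine agents starts one cycle every 15 s
# (cycleIntervalMs=15000), so cycles/min = agents * 60000 / interval_ms.
agents=50; interval_ms=15000
cycles_per_min=$(( agents * 60000 / interval_ms ))
echo "~${cycles_per_min} page cycles/min offered through the NGFW"
# → ~200 page cycles/min offered through the NGFW
```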
Step 15C — Full DUT mode validation¶
# Confirm all persona pods are Running:
kubectl get pods -A -l app.kubernetes.io/part-of=web-agent-cluster | grep -v Running
# browser-engine agents completing cycles:
kubectl -n web-agents logs -l app.kubernetes.io/name=web-agent \
--tail=5 --prefix 2>/dev/null | grep -i "cycle\|success\|http"
# synthetic-load agents running tests:
kubectl -n web-agents logs -l app.kubernetes.io/name=k6-agent \
--tail=5 --prefix 2>/dev/null | grep -i "run\|result\|http"
# Throughput in the last 5 minutes:
PG_POD=$(kubectl -n web-agents get pod \
-l app.kubernetes.io/name=postgres -o jsonpath='{.items[0].metadata.name}')
kubectl -n web-agents exec -i "$PG_POD" -- \
psql -U agent_dashboard -d agent_dashboard -c \
"SELECT count(*) AS runs_5m,
round(avg(total_duration_ms))::int AS avg_ms,
(sum(transferred_bytes)/1024/1024)::int AS mb
FROM runs WHERE started_at > now() - interval '5 minutes';"
Container images¶
All images are public on ghcr.io/nollagluiz/ and support amd64 and
arm64. No local build is required.
# Pull manually (optional — k3s pulls automatically from GHCR):
docker pull ghcr.io/nollagluiz/web-agent-agent:v3.7.0
docker pull ghcr.io/nollagluiz/web-agent-dashboard:v3.7.0
docker pull ghcr.io/nollagluiz/web-agent-k6agent:v3.7.0
docker pull ghcr.io/nollagluiz/web-agent-webserver:v3.7.0
Air-gapped / offline k3s:
# On a machine with internet access:
for img in \
ghcr.io/nollagluiz/web-agent-agent:v3.7.0 \
ghcr.io/nollagluiz/web-agent-dashboard:v3.7.0 \
ghcr.io/nollagluiz/web-agent-k6agent:v3.7.0 \
ghcr.io/nollagluiz/web-agent-webserver:v3.7.0; do
docker pull "$img"
docker save "$img" -o "$(basename ${img%:*}).tar"
done
# Copy the .tar files to the server and import into k3s containerd:
sudo k3s ctr images import web-agent-agent.tar
sudo k3s ctr images import web-agent-dashboard.tar
sudo k3s ctr images import web-agent-k6agent.tar
sudo k3s ctr images import web-agent-webserver.tar
Day-to-day commands¶
# ── Overall status ────────────────────────────────────────────────────────
kubectl -n web-agents get pods
kubectl get pods -A -l app.kubernetes.io/part-of=web-agent-cluster
# ── browser-engine fleet ──────────────────────────────────────────────────────
kubectl -n web-agents scale deployment/web-agent --replicas=N
kubectl -n web-agents rollout restart deployment/web-agent
kubectl -n web-agents logs -l app.kubernetes.io/name=web-agent -f --max-log-requests 50
# ── synthetic-load fleet ──────────────────────────────────────────────────────────────
kubectl -n web-agents scale deployment/k6-agent --replicas=N
kubectl -n web-agents logs -l app.kubernetes.io/name=k6-agent -f --max-log-requests 50
# ── Dashboard ─────────────────────────────────────────────────────────────
kubectl -n web-agents port-forward svc/dashboard 3000:3000
kubectl -n web-agents rollout restart deployment/dashboard
kubectl -n web-agents logs deploy/dashboard -f
# ── Grafana (DUT mode) ────────────────────────────────────────────────────
kubectl -n monitoring port-forward svc/kube-prometheus-grafana 3001:80
# ── Personas ──────────────────────────────────────────────────────────────
# Status of all persona pods:
kubectl get pods -A -l persona --no-headers | sort -k1
# Restart a specific persona:
kubectl -n persona-shop rollout restart deployment/caddy
# View a persona's TLS certificate:
kubectl -n persona-shop get certificate persona-shop-tls
# ── HPA (autoscaler) ──────────────────────────────────────────────────────
kubectl -n web-agents get hpa -w
# ── Postgres ──────────────────────────────────────────────────────────────
PG=$(kubectl -n web-agents get pod -l app.kubernetes.io/name=postgres \
-o jsonpath='{.items[0].metadata.name}')
kubectl -n web-agents exec -it "$PG" -- psql -U agent_dashboard -d agent_dashboard
# ── Resource usage ────────────────────────────────────────────────────────
kubectl top nodes
kubectl -n web-agents top pods
Troubleshooting¶
| Symptom | Likely cause | Fix |
|---|---|---|
| Pods in `ImagePullBackOff` | GHCR unreachable or wrong tag | Check image tag; use offline import (see Images section) |
| `kubectl top nodes` shows "metrics not available" | metrics-server missing `--kubelet-insecure-tls` | Repeat the `kubectl patch` from Step 3 and wait 60 s |
| Pods stuck in `Pending` | Insufficient CPU/RAM on the node | Lower `--replicas` or reduce requests in `k8s/20-agent-deployment.yaml` |
| Dashboard HTTP 500 on API calls | Migrations not applied | Repeat Step 9B |
| Agents register but no cycles | Fleet not enabled or no targets | Repeat Step 11B or 14C |
| Admin login refused | Wrong password | `kubectl -n web-agents get secret dashboard-secrets -o jsonpath='{.data.ADMIN_BASIC_AUTH}' \| base64 -d` |
| `nslookup shop.persona.internal` fails | CoreDNS not patched | `bash platform/dns/patch-coredns.sh` |
| Persona pods in `CrashLoopBackOff` | TLS cert not yet issued | `kubectl get certificate -A`; wait ~60 s for cert-manager |
| persona-shop in `CreateContainerConfigError` | `persona-shop-db` Secret missing | Create the Secret (Step 11C) |
| node-tuning DaemonSet rejected by PSA | PSA label missing on namespace | Apply `kubectl apply -k k8s/dut/` (includes `00-namespace-psa.yaml`) |
| macvlan interfaces not created | `netsetup-personas.sh` not run | `bash scripts/netsetup-personas.sh setup` |
| Traffic not passing through the NGFW | Wrong VLAN routing on Nexus | Check 802.1q trunk and routes — see `scripts/nexus/02-verify.nxos` |
| SNMP metrics missing in Grafana | Wrong SNMP community | Update `dut-snmp-secrets` and restart snmp-exporter |
| Caddy pods serving expired cert after renewal | Stakater Reloader not installed | Check `kubectl -n web-agents get deploy/reloader`; re-apply `k8s/dut/` |
Cleanup¶
# Pause agents (keeps everything installed, zero CPU/RAM usage):
kubectl -n web-agents scale deployment/web-agent --replicas=0
kubectl -n web-agents scale deployment/k6-agent --replicas=0
kubectl -n web-agents scale deployment/dashboard --replicas=0
# Remove personas (DUT mode):
kubectl delete -k personas/
# Remove platform (PKI, observability):
kubectl delete -k platform/ 2>/dev/null || true
# Remove DUT overlay:
kubectl delete -k k8s/dut/ 2>/dev/null || true
# Remove base stack:
kubectl delete namespace web-agents
# Remove macvlan interfaces from the host:
bash scripts/netsetup-personas.sh teardown
# Uninstall k3s entirely:
sudo /usr/local/bin/k3s-uninstall.sh
Repository references¶
| Path | Contents |
|---|---|
| `docs/DUT_TESTBED.md` | Full physical NGFW test-bed setup guide |
| `docs/ARCHITECTURE.md` | Component diagram and topology |
| `docs/K6_FLEET.md` | synthetic-load fleet operations guide |
| `docs/what-it-does.en.md` | What this project does |
| `k8s/` | Base manifests (namespace, agents, dashboard, Postgres) |
| `k8s/dut/` | DUT overlay (macvlan, PKI, SNMP, tuning, Reloader) |
| `platform/` | Persona PKI, CoreDNS, Grafana dashboards |
| `personas/_generated/` | 20 generated persona webservers |
| `scripts/k8s-dut-up.sh` | 4-phase orchestration script |
| `scripts/netsetup-personas.sh` | Creates/removes VLAN subinterfaces on the host |
| `scripts/nexus/` | NX-OS scripts for Nexus 9000 configuration |
| `dashboard/src/db/migrations/` | PostgreSQL migration files |
| `scripts/host-tuning.sh` | One-shot host tuning: sysctls + modules + CPU governor + (optional) `cpuManagerPolicy: static` |
Host tuning — REQUIRED for the persona stacks¶
The Synthetic Persona and Cloned Persona Caddy webservers REQUIRE host-side tuning to deliver their target throughput. Without it, kernel UDP buffers cap QUIC at ~30 Mbps per replica and the kernel resets cwnd on every HTTP/2 idle window, adding ~1 ms RTT to every cycle.
The in-cluster node-tuning DaemonSet applies the same kernel knobs at pod start, but does not survive a host reboot until the pod comes back up. The scripts/host-tuning.sh script writes the values into /etc/sysctl.d/, installs a systemd unit for the CPU governor, and (optionally) flips on cpuManagerPolicy: static for exclusive-core allocation — making the DaemonSet a belt-and-braces re-applier.
In single-node mode every workload (personas, slots, agents, dashboard, Cloner, NFS server) lives on the only node — so this script is run once, on that node:
# Apply sysctls + modules + CPU governor + cpuManagerPolicy: static
sudo scripts/host-tuning.sh apply --enable-cpu-pinning
# Verify (coloured report of every key value vs expected)
sudo scripts/host-tuning.sh status
# Undo (also reverts cpuManagerPolicy with --enable-cpu-pinning)
sudo scripts/host-tuning.sh remove --enable-cpu-pinning
--enable-cpu-pinning toggles cpuManagerPolicy: static on the kubelet (vanilla and k3s both supported, auto-detected) and restarts the kubelet, so plan a brief maintenance window. Without the flag the script still applies sysctls, modules, governor, and THP — pinning is the only step that touches running workloads.
For the multi-node deploy guide (UBUNTU_K3S_MULTINODE_QUICKSTART_DEPLOY.en.md) the script must run on every UCS host. Full reference (and the values it writes) in PERFORMANCE_TUNING_HOST.md.
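As an illustration of the kind of comparison `host-tuning.sh status` performs, here is a minimal checker for one kernel value. The target number below is an assumption for illustration; the authoritative values live in PERFORMANCE_TUNING_HOST.md:

```shell
# Compare a current kernel value against a minimum target and report it.
check() {  # usage: check <name> <current> <minimum>
  if [ "$2" -ge "$3" ]; then
    echo "OK   $1=$2 (>= $3)"
  else
    echo "LOW  $1=$2 (< $3)"
  fi
}

# Ubuntu's default net.core.rmem_max (212992 bytes) vs an assumed
# QUIC-friendly target; small UDP buffers are what caps QUIC throughput.
check net.core.rmem_max 212992 25165824
# → LOW  net.core.rmem_max=212992 (< 25165824)

# On a live host, feed the real value instead:
#   check net.core.rmem_max "$(sysctl -n net.core.rmem_max)" 25165824
```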
Support¶
If a step gets stuck, open an issue on the repository or reach out:
André Luiz Gallon — agallon@Cisco.com
© 2026 André Luiz Gallon — Distributed under PolyForm Noncommercial 1.0.0 with Additional Use Restrictions (Appendix A).
Optional: Public Website Cloner¶
Goal¶
The cloner downloads real public sites via VLAN 40 (direct internet access, bypassing the NGFW) and serves the cloned content locally. This lets the agent fleet exercise the NGFW with real-world content without depending on external reachability during the test runs.
Storage — single-node¶
In single-node mode, every pod (cloner + 10 cloned-persona slots + NFS server) lands on the same node. The shared cloned-sites volume is still served by the in-cluster NFS server (k8s/dut/35-nfs-server.yaml) — but because client and server are co-located the NFS round-trip is loopback and overhead is negligible. The hostPath /var/lib/agent-cluster/cloned-sites underneath the NFS server holds the durable copy.
The same architecture trivially extends to multi-node — see UBUNTU_K3S_MULTINODE_QUICKSTART_DEPLOY.en.md for the multi-node-specific notes (NFS over OOBI, cloner pinned to UCS-4).
Network — VLAN 40 (ISP)¶
| Interface | VLAN | Subnet | Use |
|---|---|---|---|
| eth1.40 | 40 | DHCP (ISP) | Cloner — internet egress |
k8s-install.sh creates eth1.40 automatically. The Nexus 9000 must allow VLAN 40 on the trunk to UCS-1 (single-node):
# On the Nexus 9000
conf t
vlan 40
name cloner-isp-egress
interface Ethernet1/<N> ! trunk to the server
switchport trunk allowed vlan add 40
HTTPS — no extra configuration¶
The cloner reaches public HTTPS sites directly. Certificates are validated against Debian's Mozilla/NSS bundle (ca-certificates). No configuration is needed for sites with public certificates (Let's Encrypt, DigiCert, etc.). The NGFW CA is not injected into the cloner.
Deploy¶
k8s-install.sh has already created eth1.40 and secrets-init.sh has already created cloner-secrets. Verify:
# VLAN 40 interface
ip link show eth1.40 # must be UP
# Secret
kubectl get secret cloner-secrets -n web-agents
# Cloner pod
kubectl get pod -n web-agents -l app.kubernetes.io/name=cloner
kubectl logs -n web-agents -l app.kubernetes.io/name=cloner -f
# Wait for: "[health] ping 8.8.8.8: ok X.Xms"
Create a clone job¶
curl -s -X POST http://localhost:3000/api/clone/jobs \
-H "Authorization: Basic $(echo -n 'admin:<ADMIN_PASS>' | base64)" \
-H "Content-Type: application/json" \
-d '{"url":"https://example.com","personaName":"shop"}'
# Check job status
curl -s http://localhost:3000/api/clone/jobs \
-H "Authorization: Basic $(echo -n 'admin:<ADMIN_PASS>' | base64)" | \
python3 -m json.tool
Monitor internet health¶
Open the TLSStress.Art dashboard in Grafana → Cloner — Internet Health & Jobs row.
kubectl port-forward -n web-agents svc/clone-serve 8081:8081
curl http://localhost:8081/metrics | grep cloner_
Expected output:
cloner_internet_up{target="8.8.8.8"} 1
cloner_internet_up{target="1.1.1.1"} 1
cloner_internet_any_up 1
cloner_gateway_up{gateway="<IP-DHCP>"} 1
cloner_ping_rtt_ms{target="8.8.8.8"} 4.2
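Without Grafana, the same metrics can be checked from a shell. A sketch that flags any down probe target, shown against a captured sample (on a live system, pipe `curl -s localhost:8081/metrics` in instead):

```shell
# Prometheus text format: "<metric>{labels} <value>" per line.
# Flag every cloner_internet_up series whose value is 0.
sample='cloner_internet_up{target="8.8.8.8"} 1
cloner_internet_up{target="1.1.1.1"} 0
cloner_internet_any_up 1'

echo "$sample" | awk '$1 ~ /^cloner_internet_up/ && $2 == 0 { print "DOWN:", $1 }'
# → DOWN: cloner_internet_up{target="1.1.1.1"}

# Live equivalent:
#   curl -s localhost:8081/metrics | awk '$1 ~ /^cloner_internet_up/ && $2 == 0 { print "DOWN:", $1 }'
```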
Quick troubleshooting¶
| Symptom | Check |
|---|---|
| Pod does not start | `kubectl describe pod -n web-agents -l app.kubernetes.io/name=cloner` |
| `net1` has no IP | `kubectl exec -n web-agents deploy/cloner -- ip addr show net1` |
| VLAN 40 missing | `ip link show eth1.40` on the host; VLAN 40 on the Nexus trunk |
| Internet red in Grafana | `kubectl exec -n web-agents deploy/cloner -- ping -c 3 -I net1 8.8.8.8` |
| HTTPS certificate error | Verify the site uses a public CA; a private CA requires a ConfigMap |
| Stuck job | `kubectl logs -n web-agents deploy/cloner -f` |
Full guide: docs/CLONER_OPERATIONS.md