
TLSStress.Art — System Overview

Scope status (post-Scope-Freeze 2026-05-10) — See ARCHITECTURE.md for the canonical 37 modules + 7 Test Kinds + DOM/CPOS/PIE-PA safety architecture. ADRs 0014, 0019-0025 cover post-Freeze additions.

A comprehensive technical reference for engineers, operators, and new team members. This document explains what the system is, why it exists, and how every part fits together.

Table of Contents

  1. Project Purpose
  2. Two-Leg TLS Architecture
  3. Physical / Logical Topology
  4. browser engine Agents
  5. synthetic-load engine Agents
  6. Synthetic Personas (20 webservers)
  7. Cloned Personas (10 slots)
  8. The Cloner
  9. Certificate Orchestration (PKI)
  10. Dashboard (Orchestration)
  11. Observability
  12. Performance Tuning
  13. Operational Subsystems (DUT API, Test Plans, Reports, Pre-flight, Time-sync, Syslog)
  14. Onboarding & IP Protection
  15. Production Deployment Steps
  16. Key File Reference Table

1. Project Purpose

TLSStress.Art is a lab test-bed designed to measure the performance of a Next-Generation Firewall (NGFW, the Device Under Test — DUT) when performing TLS decryption and inspection at scale.

  • Primary protocol under test: HTTP/3 over QUIC/UDP 443.
  • Secondary protocol: HTTP/2 over TCP 443.
  • Workload mix: real browser traffic (browser engine + headless Chromium) plus synthetic high-volume HTTP load (synthetic-load engine).
  • Scope: lab only. No production data, no public Internet exposure for the agent fleet, no security requirements beyond what is needed for the test to be representative.

The system intentionally exercises the most CPU-expensive code paths inside the NGFW — full TLS handshake on every connection, no session resumption, modern AEAD ciphers — so that performance measurements reflect worst-case inspection cost.


2. Two-Leg TLS Architecture

Every request is encrypted twice on its way through the NGFW. The NGFW terminates the inbound TLS session, inspects the cleartext, and re-originates a new TLS session toward the webserver. This is the workload that consumes NGFW CPU and is what we want to measure.

+------------------+   TLS Leg 1    +--------------+   TLS Leg 2    +-------------------+
|  Agent           |  agent-side    |     NGFW     |  server-side   |  Persona Caddy    |
|  (browser /      | ------------>  |    (DUT)     | ------------>  |  (webserver)      |
|  synthetic-load) |                |   decrypt,   |                |                   |
|                  |                |   inspect,   |                |                   |
|  trusts: ngfw-ca |                |  re-encrypt  |                | cert: persona-ca  |
+------------------+                +--------------+                +-------------------+
                                        trusts:
                                       persona-ca

Leg     Direction           Server cert issued by   Client trusts
Leg 1   Agent → NGFW        ngfw-ca                 ngfw-ca ConfigMap (mounted in agent)
Leg 2   NGFW → Webserver    persona-ca-issuer       persona-ca (imported into the NGFW)

Why two legs: the NGFW must decrypt Leg 1, parse and inspect the cleartext payload, then re-encrypt onto Leg 2. The cost of this decrypt-inspect-encrypt cycle, multiplied by the request rate, is what the test measures.


3. Physical / Logical Topology

3.1 VLAN Map

VLAN Purpose Subnet Gateway
20 browser-engine agents 172.16.0.0/16 172.16.0.1 (NGFW)
30 synthetic-load agents 172.17.0.0/16 172.17.0.1 (NGFW)
40 Cloner ISP egress DHCP from upstream upstream router
99 SNMP management 192.168.90.0/24 management gateway
101–120 Synthetic Personas (20) 10.1.{1..20}.0/27 NGFW (10.1.x.1)
200–209 Cloned Personas (10 slots) 10.2.{1..10}.0/27 NGFW (10.2.x.1)

3.2 Agent Routing

Defined in k8s/dut/20-network-attachments.yaml:

  • eth0 — Kubernetes OOBI (Out-Of-Band Infrastructure). Default route. Used to reach Dashboard, Prometheus, CoreDNS, Postgres. Never routed through the NGFW.
  • net1 — macvlan attachment on the Persona VLANs. Carries traffic to:
  • 10.1.0.0/16 via the NGFW (Synthetic Personas)
  • 10.2.0.0/16 via the NGFW (Cloned Personas)
                  +------------------------+
                  |    K8s OOBI fabric     |
                  | Dashboard, Prometheus, |
                  |   CoreDNS, Postgres    |
                  +-----------+------------+
                              | eth0 (default)
                              |
              +---------------+---------------+
              |           Agent pod           |
              |    (browser engine or         |
              |     synthetic-load engine)    |
              +---------------+---------------+
                              | net1 (macvlan)
                              |
                         VLAN 20 / 30
                              |
                      +-------+-------+
                      |     NGFW      |
                      |     (DUT)     |
                      +-------+-------+
                              |
           +------------------+------------------+
           |                                     |
      VLANs 101-120                         VLANs 200-209
      Synthetic Personas                    Cloned Personas

3.3 Deployment Topologies

Four Linux+Kubernetes layouts are supported. All four are wired by scripts/k8s-install.sh (see the dedicated quickstart guide for each mode for the full procedure; every guide ships in EN / ES / PT-BR).

  • Single-node: 1 UCS, OOBI (eth0) on it. browser engine, synthetic-load engine, and the personas + services + observability stack all run on the same UCS. Overlay: none (base k8s/ + k8s/dut/). Guide: UBUNTU_K3S_SINGLENODE_QUICKSTART_DEPLOY.{en,es,pt-BR}.md
  • Dual-node: 2 UCS, OOBI (eth0) on UCS-1 + UCS-2. browser engine and synthetic-load engine on UCS-1 (role=agents); personas + services + observability on UCS-2 (role=ngfw-dut). Overlay: overlays/dual-node/. Guide: UBUNTU_K3S_DUALNODE_QUICKSTART_DEPLOY.{en,es,pt-BR}.md
  • Tri-node: 3 UCS, OOBI (eth0) on UCS-1..3. browser engine on UCS-1 (role=playwright); synthetic-load engine on UCS-2 (role=k6); personas + services + observability on UCS-3 (role=ngfw-dut). Overlay: overlays/tri-node/. Guide: UBUNTU_K3S_TRINODE_QUICKSTART_DEPLOY.{en,es,pt-BR}.md
  • Multi-node: 4 UCS, OOBI (eth0) on UCS-1..4. browser engine on UCS-2 (role=playwright); synthetic-load engine on UCS-3 (role=k6); personas on UCS-1 (role=ngfw-dut); services on UCS-4 (role=infra). Overlay: overlays/multi-node/. Guide: UBUNTU_K3S_MULTINODE_QUICKSTART_DEPLOY.{en,es,pt-BR}.md

Key invariants across modes:

  • OOBI (eth0) is mandatory on every UCS. It carries the k3s API, kubelet, flannel CNI, and Prometheus scrape. Without it the node cannot join the cluster.
  • The node_exporter DaemonSet has no nodeSelector and runs on every node — host metrics coverage is identical in all four modes.
  • The node-tuning DaemonSet is patched to dut-data-plane=true after apply — every UCS that owns a data-plane VLAN receives sysctls + BBR + CPU governor + THP tuning.
  • Personas, cloned personas, and the SNMP exporter all hardcode nodeSelector: role=ngfw-dut. In single-node, dual-node, and tri-node modes the corresponding UCS carries that label; in multi-node it is UCS-1.

4. browser engine Agents

Real-browser traffic generators based on browser engine + headless Chromium.

  • Manifest: k8s/20-agent-deployment.yaml
  • Base replicas: 10
  • HPA range: 1 → 300 replicas
  • Controller URL: https://dashboard.web-agents.svc.cluster.local

Lifecycle

register -> poll for URL -> navigate (Chromium) -> collect timing -> report -> wait -> repeat

Timing metrics include DNS, TCP/QUIC connect, TLS handshake, time to first byte, and full page load. The agent reports back to the Dashboard after each cycle.
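
The cycle's timing breakdown can be sketched as deltas over absolute navigation-timing marks. The field and mark names below are illustrative assumptions, not the actual report schema:

```python
from dataclasses import dataclass

@dataclass
class CycleTiming:
    # Millisecond durations for one agent cycle (illustrative field names).
    dns_ms: float
    connect_ms: float       # TCP or QUIC connect
    tls_ms: float           # TLS handshake (always full: no resumption)
    ttfb_ms: float          # time to first byte
    page_load_ms: float     # full page load, end to end

def from_marks(marks: dict) -> CycleTiming:
    """Derive durations from absolute navigation-timing marks (ms epochs)."""
    return CycleTiming(
        dns_ms=marks["dns_end"] - marks["dns_start"],
        connect_ms=marks["connect_end"] - marks["dns_end"],
        tls_ms=marks["tls_end"] - marks["connect_end"],
        ttfb_ms=marks["first_byte"] - marks["tls_end"],
        page_load_ms=marks["load_event"] - marks["nav_start"],
    )
```

In DUT mode, the TLS-handshake column is typically the duration most directly inflated by the NGFW's decrypt-inspect-re-encrypt work.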

DUT-mode patch

k8s/dut/40-playwright-patch.yaml overlays:

  • Adds a net1 macvlan attachment on VLAN 20.
  • Mounts the ngfw-ca ConfigMap as NODE_EXTRA_CA_CERTS.
  • Sets REJECT_INVALID_CERTS=true and REQUIRE_TLS13=true to fail loudly on misconfigured trust.
Variable Base mode (no NGFW) DUT mode
NODE_EXTRA_CA_CERTS persona-ca-bundle ngfw-ca ConfigMap
REJECT_INVALID_CERTS (default) true
REQUIRE_TLS13 (default) true

5. synthetic-load engine Agents

High-volume HTTP load generators, complementary to the browser engine fleet. The synthetic-load engine produces traffic without the overhead of a full browser, allowing far higher request rates per pod.

DUT-mode patch

k8s/dut/50-k6-patch.yaml:

  • Adds net1 macvlan attachment on VLAN 30.
  • Sets both SSL_CERT_FILE and NODE_EXTRA_CA_CERTS to the ngfw-ca ConfigMap.

synthetic-load engine and browser engine share the same Dashboard control plane and the same Persona endpoints. The two fleets together produce the request mix used to stress the NGFW.


6. Synthetic Personas (20 webservers)

A fleet of 20 distinct webservers that mimic the kinds of sites a real user fleet would hit during a typical browsing day. Each persona has its own namespace, its own VLAN, its own /27 subnet, and its own gateway on the NGFW.

  • Source of truth: personas.yaml.
  • Generated: personas/_generated/ (do not hand-edit).
  • Per-persona namespace: persona-{name}.
  • Webserver: Caddy (HTTP/1.1 + HTTP/2 + HTTP/3), with optional sidecar backend.
  • Certificate: issued by persona-ca-issuer (cluster-wide).

Archetypes

Archetype Letter Backend Personas
skin C Caddy file_server over content seeded by initContainer blog, docs, gallery, stream, download, edu, gov, cdn
real-app A / B Caddy reverse_proxy to a real-app sidecar shop (Saleor + Redis + Postgres), news (Ghost)
mock D Caddy reverse_proxy to a Go mock-engine sidecar api-rest, api-graphql, chat, webhook, telemetry, ads
har-replay F Caddy reverse_proxy to a Go HAR-replay engine sidecar har-saas, har-social, har-webmail, har-media

Full VLAN map

Persona VLAN Subnet Gateway Archetype
shop 101 10.1.1.0/27 10.1.1.1 real-app A
news 102 10.1.2.0/27 10.1.2.1 real-app B
blog 103 10.1.3.0/27 10.1.3.1 skin
docs 104 10.1.4.0/27 10.1.4.1 skin
gallery 105 10.1.5.0/27 10.1.5.1 skin
stream 106 10.1.6.0/27 10.1.6.1 skin
download 107 10.1.7.0/27 10.1.7.1 skin
edu 108 10.1.8.0/27 10.1.8.1 skin
gov 109 10.1.9.0/27 10.1.9.1 skin
cdn 110 10.1.10.0/27 10.1.10.1 skin
api-rest 111 10.1.11.0/27 10.1.11.1 mock
api-graphql 112 10.1.12.0/27 10.1.12.1 mock
chat 113 10.1.13.0/27 10.1.13.1 mock
webhook 114 10.1.14.0/27 10.1.14.1 mock
telemetry 115 10.1.15.0/27 10.1.15.1 mock
ads 116 10.1.16.0/27 10.1.16.1 mock
har-saas 117 10.1.17.0/27 10.1.17.1 har-replay
har-social 118 10.1.18.0/27 10.1.18.1 har-replay
har-webmail 119 10.1.19.0/27 10.1.19.1 har-replay
har-media 120 10.1.20.0/27 10.1.20.1 har-replay
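
The table above is fully regular: persona N sits on VLAN 100+N, subnet 10.1.N.0/27, gateway .1. It can therefore be derived mechanically; a sketch, with the persona ordering assumed to match personas.yaml:

```python
import ipaddress

PERSONAS = ["shop", "news", "blog", "docs", "gallery", "stream", "download",
            "edu", "gov", "cdn", "api-rest", "api-graphql", "chat", "webhook",
            "telemetry", "ads", "har-saas", "har-social", "har-webmail", "har-media"]

def persona_network(index: int) -> dict:
    """VLAN/subnet/gateway for persona N (1-based), matching the map above."""
    net = ipaddress.ip_network(f"10.1.{index}.0/27")
    return {
        "name": PERSONAS[index - 1],
        "vlan": 100 + index,
        "subnet": str(net),
        "gateway": str(net.network_address + 1),  # NGFW holds .1 in each /27
    }

for i in range(1, 21):
    row = persona_network(i)
    print(f'{row["name"]:13} {row["vlan"]} {row["subnet"]:15} {row["gateway"]}')
```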

7. Cloned Personas (10 slots)

Pre-provisioned slots for serving content cloned from real public sites, used to extend the fleet beyond the 20 hand-curated Synthetic Personas.

  • Manifests: k8s/clone-personas/.
  • Slot count: 10, each pre-provisioned at 0 replicas.
  • Per-slot namespace: clone-persona-N (1 ≤ N ≤ 10).
  • Webserver: Caddy file_server serving /mnt/cloned/{env.SITE_NAME} from a read-only PVC. No reverse proxy — content is static, exactly as captured by the Cloner.
  • Orchestration: the Dashboard exposes PATCH /api/clone/persona-slots/{n} which patches the slot's ConfigMap (changing SITE_NAME) and scales the Deployment.
  • Restart trigger: Stakater Reloader restarts the pod whenever SITE_NAME changes or the cert Secret is renewed by cert-manager.

Slot VLAN map

Slot VLAN Subnet Gateway
1 200 10.2.1.0/27 10.2.1.1
2 201 10.2.2.0/27 10.2.2.1
3 202 10.2.3.0/27 10.2.3.1
4 203 10.2.4.0/27 10.2.4.1
5 204 10.2.5.0/27 10.2.5.1
6 205 10.2.6.0/27 10.2.6.1
7 206 10.2.7.0/27 10.2.7.1
8 207 10.2.8.0/27 10.2.8.1
9 208 10.2.9.0/27 10.2.9.1
10 209 10.2.10.0/27 10.2.10.1

8. The Cloner

The Cloner is the only component allowed to reach the public Internet. It downloads complete sites and stores them on a shared PVC for the Cloned Persona slots to serve.

Network interfaces

Interface Attachment Purpose
eth0 OOBI Talk to Dashboard, CoreDNS, Postgres
net1 VLAN 40 macvlan (DHCP) Public Internet via the upstream ISP gateway

DNS

DNS is forced (dnsPolicy: None) to a fixed list:

  • 8.8.8.8 (Google)
  • 208.67.222.222 (OpenDNS)
  • 10.96.0.10 (CoreDNS, in-cluster)

Policy routing

iptables marks any packet with a non-RFC1918 destination using fwmark. A separate routing table (table 100) routes marked packets out via the ISP gateway on net1. RFC1918 traffic continues to use the OOBI default route on eth0.
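
The marking decision reduces to one question per destination: is it RFC1918? A minimal sketch of that classifier (the function name is ours; the real rule lives in iptables/fwmark, not userspace):

```python
import ipaddress

RFC1918 = [ipaddress.ip_network(n) for n in
           ("10.0.0.0/8", "172.16.0.0/12", "192.168.0.0/16")]

def egress_interface(dst: str) -> str:
    """Mirror the Cloner's policy routing: non-RFC1918 destinations are
    marked and routed via table 100 out net1 (ISP); RFC1918 destinations
    stay on the OOBI default route on eth0."""
    addr = ipaddress.ip_address(dst)
    private = any(addr in net for net in RFC1918)
    return "eth0" if private else "net1"
```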

Lifecycle

poll Dashboard for pending clone jobs
    -> download site via browser engine-extra (stealth mode) on net1
    -> store under /mnt/cloned/{site}/ on PVC `cloned-sites`
    -> report job completion to Dashboard

Health monitor

Exposes Prometheus metrics on :8081/metrics:

Metric Meaning
cloner_internet_up Gauge — Internet reachability (0/1)
cloner_ping_rtt_ms Gauge — round-trip time to Internet probes (ms)
cloner_gateway_up Gauge — local ISP gateway reachability (0/1)
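
For reference, these gauges appear on :8081/metrics in the standard Prometheus text exposition format. A sketch that renders them (metric names from the table above; values illustrative, and the real exporter lives in the Cloner, not this snippet):

```python
def render_cloner_metrics(internet_up: bool, rtt_ms: float, gateway_up: bool) -> str:
    """Render the Cloner health gauges in Prometheus text exposition format."""
    lines = [
        "# TYPE cloner_internet_up gauge",
        f"cloner_internet_up {int(internet_up)}",
        "# TYPE cloner_ping_rtt_ms gauge",
        f"cloner_ping_rtt_ms {rtt_ms}",
        "# TYPE cloner_gateway_up gauge",
        f"cloner_gateway_up {int(gateway_up)}",
    ]
    return "\n".join(lines) + "\n"
```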

For deep operational details, see docs/CLONER.md and docs/CLONER_OPERATIONS.md.


9. Certificate Orchestration (PKI)

A single internal CA covers every persona — Synthetic and Cloned — so the NGFW operator only ever has to import one CA certificate.

Trust chain

persona-selfsigned     (ClusterIssuer)
        |
        v
persona-root-ca        (Certificate, 10-year, in cert-manager namespace)
        |
        v
persona-ca-issuer      (ClusterIssuer)
        |
        +-> persona-shop      (Certificate, 1 yr)
        +-> persona-news      (Certificate, 1 yr)
        +-> ...                (20 Synthetic Persona certs)
        +-> clone-persona-1   (Certificate, 1 yr)
        +-> ...                (10 Cloned Persona slot certs)
  • Total persona certs: 30 (20 Synthetic + 10 Cloned), all issued by the same persona-ca-issuer.
  • Single import: export persona-ca-bundle ca.crt once and import it into the NGFW as a trusted server CA.

Agent trust (base mode)

A persona-ca-bundle Secret is published in the web-agents namespace; agents mount ca.crt and pass it via NODE_EXTRA_CA_CERTS.

Auto-restart on rotation

k8s/87-stakater-reloader.yaml deploys Stakater Reloader with watches across persona-* and clone-persona-* namespaces. When cert-manager rotates a cert Secret, the corresponding Deployment is rolling-restarted so the new key/cert is picked up without operator action.

TLS posture (every Persona Caddy)

Setting Value
Protocols h1 h2 h3
Minimum TLS version TLS 1.2 (QUIC/h3 requires TLS 1.3)
Cipher suites ECDHE+AEAD only — 6 suites, ECDSA listed first
Session tickets disabled

Disabling session tickets is intentional: every connection performs a full TLS handshake. This maximizes the CPU load on the NGFW per request, which is exactly what the test measures.
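
From the agent side, the same posture can be approximated with a strict client context. A sketch using Python's ssl module, purely as an analogue of the Caddy settings above (the actual agents configure trust via NODE_EXTRA_CA_CERTS / SSL_CERT_FILE, not this code):

```python
import ssl

def strict_client_context(ca_file=None) -> ssl.SSLContext:
    """Client context mirroring the persona TLS posture: TLS 1.2 floor and
    TLS 1.2 session tickets declined. ca_file would point at the lab CA
    bundle (persona-ca or ngfw-ca, depending on the leg)."""
    ctx = ssl.create_default_context(cafile=ca_file)
    ctx.minimum_version = ssl.TLSVersion.TLSv1_2
    ctx.options |= ssl.OP_NO_TICKET  # decline TLS 1.2 session tickets
    return ctx
```

A fresh context with no saved session performs a full handshake anyway; OP_NO_TICKET additionally declines TLS 1.2 tickets, matching the server-side "session tickets disabled" row.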


10. Dashboard (Orchestration)

The control plane for the entire fleet — agents register here, personas are scaled here, clone jobs are submitted here.

Key APIs

Method Path Purpose
POST /api/agents/register Agent registration on startup
GET /api/agents/poll Agent fetches the next URL to hit
POST /api/agents/result Agent reports timing + outcome of a cycle
PATCH /api/personas/{name} Start/stop/scale a Synthetic Persona
POST /api/clone/jobs Create a clone job for the Cloner to consume
GET /api/clone/persona-slots List Cloned Persona slot status
GET /api/clone/persona-slots/{n} Slot detail
PATCH /api/clone/persona-slots/{n} Bind a slot to a cloned site name + scale replicas
GET / POST /api/admin/dut-api/devices DUT API device registry (CRUD, encrypted credentials)
POST /api/admin/dut-api/devices/{id}/test Live connectivity + auth test against a registered DUT
POST /api/admin/dut-api/devices/{id}/snapshot On-demand sanitized config snapshot
GET /api/admin/time-sync/status NTP relay reachability + per-host clock skew
POST /api/admin/time-sync/verify Force a fresh time-sync verification cycle
GET /api/admin/audit?actor=&from=&to= Audit log query (admin mutations, login attempts)
GET /api/test-plans List the 15-plan catalog (capacity, soak, decrypt …)
POST /api/test-runs/start Start a run from a plan (snapshots plan + DUT inventory)
POST /api/test-runs/{id}/preflight Run the 5-check preflight gate
GET /api/test-runs/{id}/report.json Canonical ReportData (Phase 1)
GET /runs/{id}/report Print-styled HTML report (page-level Phase 1 surface)

RBAC

The dashboard ServiceAccount holds a ClusterRole granting get/list/watch/patch on Deployments and ConfigMaps, scoped across all persona-* and clone-persona-* namespaces. This is what lets the Dashboard scale personas and rebind clone slots at runtime.


11. Observability

Prometheus ServiceMonitors (Kubernetes mode, Prometheus Operator)

ServiceMonitor Namespace Notes
dashboard web-agents
pgbouncer web-agents
cloner web-agents
persona-{name} × 20 each persona-* Relabel: persona, persona_archetype, persona_vlan
clone-persona-{n} × 10 each clone-persona-* Relabel: cloned_persona, cloned_persona_vlan

DUT SNMP scraping

Defined in observability/prometheus/prometheus.dut.yml:

Job Source
snmp_nexus9000 Cisco Nexus 9000 switch metrics
snmp_ngfw_dut NGFW physical metrics — module configurable: cisco_ftd, fortinet_fortigate, palo_alto, checkpoint, etc.
node_exporter Ubuntu Linux host metrics, file-based service discovery

Grafana

The default deployment ships seven dashboards:

Dashboard UID What it covers
Agent fleet web-agents Agents alive, runs/min, cycle duration, throughput, error rate, webservers active, time series + logs panel
Cloner health cloner ISP up, ping 8.8.8.8 / 1.1.1.1, gateway ping, RTT timeseries
Cloned Personas cloned-personas Active slots, requests/s, p99 latency, bytes/s
Nexus 9000 dut-nexus9000 Switch CPU/mem/temp/fans/PSU, CRC/FCS, queue drops, optical sensors
NGFW DUT dut-ngfw Throughput per VLAN, sessions, CPS, HW crypto engine, CPU/mem
Ubuntu Hosts (UCS) dut-ubuntu-hosts OS-level: CPU, load, memory, disk I/O, filesystem, network, processes
TLS Decrypt Mode decrypt-mode Issuer-cert ground-truth probe (ACTIVE/BYPASS), TLSDecryptModeChanged alert state

Two more dashboards are bundled with the UCS CIMC adapter (Cisco UCS hardware monitoring, see §13 — Operational Subsystems).

Syslog correlation (OOBI-only policy)

Promtail listens on UDP/TCP :514 bound exclusively to the OOBI subnet (192.168.90.0/24). NGFW + Nexus syslog enters Loki via this OOBI listener; data-plane VLANs (browser engine, synthetic-load engine, Personas, Cloner) are deliberately not allowed to ingest log entries — preventing log injection from the test-traffic path. Test-run timelines join Loki entries to runs by timestamp + DUT identity (DUT API snapshot), giving operators a single Grafana panel per run with NGFW errors, drops, and policy hits inline. See docs/SYSLOG_CORRELATION.md and docs/SYSLOG_OPERATIONS.md.
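
The timestamp + DUT-identity join can be sketched as a window filter. The entry shape and field names below are illustrative, not the actual Loki schema:

```python
def entries_for_run(entries, run_start, run_end, dut_host):
    """Join log entries to a test run: keep lines whose timestamp falls
    inside the run window and whose source host matches the DUT identity
    recorded by the DUT API snapshot."""
    return [e for e in entries
            if run_start <= e["ts"] <= run_end and e["host"] == dut_host]
```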


12. Performance Tuning

Node tuning DaemonSet

k8s/dut/85-node-tuning.yaml applies the following on every node:

Setting Value Reason
net.core.rmem_max / wmem_max 67108864 (64 MB) Required headroom for QUIC/UDP buffers
net.ipv4.tcp_congestion_control bbr Better throughput on the lossy DUT path
net.core.default_qdisc fq Required by BBR
CPU governor performance Avoid frequency-scaling jitter during measurement
Transparent Huge Pages defer+madvise Reduce stalls without giving up THP entirely

Caddy pods (Go runtime)

Variable Value
GOMAXPROCS from resourceFieldRef (limits.cpu)
GOMEMLIMIT 90% of the pod memory limit
GOGC 200
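
The GOMEMLIMIT row means an absolute byte value is computed from the pod's memory limit. A sketch of that derivation (binary suffixes only; the real chart logic may differ):

```python
UNITS = {"Ki": 1024, "Mi": 1024**2, "Gi": 1024**3}

def gomemlimit(pod_limit: str, fraction: float = 0.9) -> str:
    """Derive a GOMEMLIMIT byte value as a fraction of the pod memory
    limit, e.g. '2Gi' -> 90% of 2 GiB. Only Ki/Mi/Gi are handled here."""
    for suffix, mult in UNITS.items():
        if pod_limit.endswith(suffix):
            return str(int(float(pod_limit[:-2]) * mult * fraction))
    return str(int(float(pod_limit) * fraction))  # plain bytes

print(gomemlimit("2Gi"))
```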

Nexus 9000 switch

Setting Value
EEE off
Flow control off
MTU 9216 (jumbo)
QoS DSCP AF41
ECMP hash includes UDP source port (so QUIC flows distribute evenly)
ARP timeout 300 s

13. Operational Subsystems

The following subsystems shipped in v3.7+/v4.0.0 and form the operator-facing day-2 surface. Each is a thin, well-bounded module — not a heavyweight framework — so the engineering surface stays small.

13.1 DUT API Integration

Vendor adapters in dashboard/src/lib/dut-api/ capture sanitized config + identity from the Device Under Test on every test-run start (or on demand from /admin/dut-api). Four adapters ship today:

Adapter Transport What it captures
cisco-ftd.ts REST (cdFMC / FMC) Model, serial, version, decrypt-policy state, sanitized running-config
cisco-nexus.ts NX-API (HTTPS) Model, serial, version, interface counters, port-channel state
cisco-ucs-cimc.ts Redfish (HTTPS) Chassis/blade hardware inventory, PSU, fans, temp, BIOS
fortinet-fortigate.ts REST (FortiOS) Model, serial, version, sanitized config, session table summary

Credentials live encrypted-at-rest (encryption.ts); snapshots are hashed (SHA-256) and embedded in the Test Run Report Annex B/C/D. A polling worker (poller.ts) refreshes liveness every 60 s; failures show on the /admin/dut-api UI. Full reference: docs/DUT_API_INTEGRATION.md · docs/DUT_API_OPERATIONS.md.

13.2 Test Plans

platform/test-plans/catalog.yaml ships 15 vendored, git-versioned plans (capacity-find-knee, soak-30m, decrypt-on-vs-off, vendor-compare, …). Each plan declares phases[] with VU/duration/target-mix and requirements{} (ngfw_state_required, personas_min, decrypt_mode_required). When a run starts, the plan is frozen and its planSnapshotSha256 is recorded, so post-run review can prove the parameters were not edited. Full reference: docs/TEST_PLANS.md.
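
A plausible shape for the snapshot digest is a SHA-256 over the plan serialized canonically, so identical parameters always hash identically. A sketch, not the shipped implementation:

```python
import hashlib
import json

def plan_snapshot_sha256(plan: dict) -> str:
    """Hash a frozen plan as canonical JSON (sorted keys, tight separators)
    so the digest is stable regardless of key order in the source YAML."""
    canon = json.dumps(plan, sort_keys=True, separators=(",", ":"))
    return hashlib.sha256(canon.encode()).hexdigest()
```

Because keys are sorted before hashing, re-serializing the same plan later reproduces the digest, which is what lets a reviewer verify the run used unedited parameters.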

13.3 Test Run Reports

Phase 1 ships a print-styled HTML report at /runs/{executionId}/report with the canonical ReportData JSON at /api/test-runs/{executionId}/report.json. The cover prints reportSha256 + planSnapshotSha256 + license badge + SLO pass/fail + budget burn. Phases 2–5 add server-rendered PDF (Puppeteer), DUT inventory annexes (Nexus / NGFW / UCS), Cosign signature + Rekor entry, and N-run comparison. Full reference: docs/REPORTS.md.

13.4 Pre-flight Check Engine

dashboard/src/lib/preflight/ runs a 5-check catalog before every test-run start; failures block. Catalog: subnet conflict (OOBI ↔ ISP ↔ persona VLANs), NGFW reachability + decrypt policy state matches plan, persona PKI freshness, NTP relay clock skew within tolerance, DUT API auth + snapshot succeeds. Operators bypass in non-strict mode but the bypass is logged in audit_log and printed on the report cover. Full reference: docs/PREFLIGHT_CHECKS.md.
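
The first check (subnet conflict) is essentially a pairwise-overlap test across the configured CIDRs. A sketch under that assumption:

```python
import ipaddress
from itertools import combinations

def subnet_conflicts(named_subnets: dict) -> list:
    """Report every pair of configured subnets (OOBI, ISP, persona VLANs)
    that overlap; an empty list means the check passes."""
    nets = {name: ipaddress.ip_network(cidr)
            for name, cidr in named_subnets.items()}
    return [(a, b) for (a, na), (b, nb) in combinations(nets.items(), 2)
            if na.overlaps(nb)]
```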

13.5 Time-sync Layer

The OOBI subnet runs an in-cluster chronyd relay. Every UCS host syncs to the relay; the relay in turn disciplines itself against the upstream stratum-1 source (the cluster operator chooses public NTP, GPS, or a PTP grandmaster). On every test-run start, each agent + DUT records its current epoch and computed skew vs. the relay; reports flag any window that drifted beyond tolerance. For air-gapped labs, a browser-clock fallback button on /admin/time-sync lets the operator pin reference time from their own laptop. Surfaces at /admin/time-sync, with alerts via the OOBI Prometheus.
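
The per-host skew record can be sketched as a comparison of sampled epochs against the relay's (the 0.5 s default tolerance here is illustrative, not the shipped value):

```python
def skew_report(relay_epoch: float, host_epochs: dict, tolerance_s: float = 0.5) -> dict:
    """Compute per-host clock skew vs. the chrony relay and flag any host
    outside tolerance, as recorded at test-run start."""
    return {host: {"skew_s": round(epoch - relay_epoch, 3),
                   "within_tolerance": abs(epoch - relay_epoch) <= tolerance_s}
            for host, epoch in host_epochs.items()}
```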

13.6 Syslog Correlation

Already documented in §11 (Observability). Summary: Promtail bound to OOBI-only on UDP/TCP :514; NGFW + Nexus emit syslog → Loki; reports embed correlated entries by timestamp + DUT identity.


14. Onboarding & IP Protection

14.1 Onboarding chain

The path from "can I see this project?" to "I have a running install" is a 3-stage gated flow; each stage has its own doc in three languages (EN / PT-BR / ES):

Step Doc Audience What happens
1 ACCESS_REQUEST.md External pre/post-sales engineer Submit Cisco Access Broker request; broker auto-recognizes Cisco/partner domains and routes to the project owner
2 CLONE_FOR_INSTALL.md Approved engineer Clone the (private) repo against your Cisco-issued GitHub identity; license acceptance modal records the session
3 RUNBOOK_FIRST_INSTALL.md Newly-onboarded engineer Walk through first install (single/dual/tri/multi-node); bottom-of-doc breadcrumb back to BRAND, DUT_TESTBED, REPORTS

Disconnected environments use AIRGAP_INSTALL.md as a step-3 substitute. Cross-references between the four docs form a chain so an engineer landing on any one of them can navigate forward and backward without leaving the doc set.

14.2 IP protection

All forensic identifiers — cert fingerprints, deployment hashes, asset signatures, TLS Decrypt-Mode snapshots — live in a separate private/forensic repo with the project owner as sole collaborator. The public repo never contains identity-binding data. This separation is policy-enforced because GitHub does not support per-branch ACLs: any collaborator with pull on the public repo would otherwise see every branch including a hypothetical in-tree forensic branch. Full policy: docs/IP_PROTECTION.md. Setup procedure for the private repo: docs/PRIVATE_REPO_SETUP.md.


15. Production Deployment Steps

The canonical bring-up is scripts/k8s-install.sh, which handles k3s, VLAN setup, cert-manager, Multus CNI, manifest application, and overlays in one step. It supports all four topologies via the --mode flag (single, dual, tri, or the multi-server / multi-agent / multi-apply sequence shown below):

# Single-node (one UCS, all workloads):
sudo ./scripts/k8s-install.sh --mode=single --data-iface=eth1

# Multi-node (UCS-1 personas, UCS-2 browser engine, UCS-3 synthetic-load engine, UCS-4 services):
sudo ./scripts/k8s-install.sh --mode=multi-server --role=ngfw-dut --data-iface=eth1
# … then on each agent UCS:
sudo ./scripts/k8s-install.sh --mode=multi-agent  --role=playwright --data-iface=eth1
sudo ./scripts/k8s-install.sh --mode=multi-agent  --role=k6         --data-iface=eth1
# … and finally on the services UCS:
sudo ./scripts/k8s-install.sh --mode=multi-apply

For operators who prefer to apply manifests by hand, the equivalent ordered procedure is:

# 1. Base stack: namespaces, RBAC, Dashboard, agents, OOBI infra
kubectl apply -f k8s/

# 2. Platform services: PKI, DNS, observability, test-plans
kubectl apply -k platform/

# 3. Synthetic Personas (20 webservers, generated from personas.yaml)
kubectl apply -k personas/

# 4. Cloned Personas (10 pre-provisioned slots, initially scaled to 0)
kubectl apply -k k8s/clone-personas/

# 5. DUT overlay: macvlan attachments, node tuning, NGFW trust
kubectl apply -k k8s/dut/

# 6. Patch agents into DUT mode (adds net1, ngfw-ca trust, reject-on-bad-cert)
kubectl patch deployment playwright-agent -n web-agents --patch-file k8s/dut/40-playwright-patch.yaml
kubectl patch deployment k6-agent -n web-agents --patch-file k8s/dut/50-k6-patch.yaml

# 7. Export the persona CA and import into the NGFW as trusted server CA
kubectl get secret persona-ca-bundle -n web-agents -o jsonpath='{.data.ca\.crt}' | base64 -d > persona-ca.crt
# Then import persona-ca.crt into the NGFW's trusted CA store —
# or use scripts/inject-ngfw-ca.sh for a Cisco FTD / Nexus-vendor automation.

After step 7 the NGFW will trust every persona server cert and the Two-Leg TLS architecture is fully wired.


16. Key File Reference Table

Path Purpose
personas.yaml Source of truth for all 20 Synthetic Personas
personas/_generated/ Generated per-persona manifests — do not hand-edit
k8s/20-agent-deployment.yaml browser-engine agents (10 base, HPA 1–300)
k8s/21-k6-agent-deployment.yaml synthetic-load agents (1 base, HPA 0–1000)
k8s/50-dashboard.yaml Dashboard (Next.js 15, 2 replicas)
k8s/81-cloner-deployment.yaml Cloner pod (single replica, dual NIC)
k8s/87-stakater-reloader.yaml Reloader for cert rotation + slot rebinding
k8s/dut/20-network-attachments.yaml macvlan NetworkAttachmentDefinitions for the agent fleet
k8s/dut/40-playwright-patch.yaml Patches browser-engine agents into DUT mode (VLAN 20, ngfw-ca)
k8s/dut/50-k6-patch.yaml Patches synthetic-load agents into DUT mode (VLAN 30, ngfw-ca)
k8s/dut/85-node-tuning.yaml Sysctls + CPU governor + THP DaemonSet
k8s/clone-personas/ 10 pre-provisioned Cloned Persona slot manifests
platform/pki/ persona-selfsigned, persona-root-ca, persona-ca-issuer
platform/dns/ CoreDNS configuration
platform/observability/ Prometheus Operator + Grafana
observability/prometheus/prometheus.yml Base Prometheus configuration
observability/prometheus/prometheus.dut.yml DUT-mode Prometheus: SNMP + node_exporter scrape jobs
dashboard/src/lib/dut-api/ 4 vendor adapters (Cisco FTD, Nexus, UCS CIMC Redfish, FortiGate)
dashboard/src/lib/preflight/ Pre-flight check engine (5-check catalog, blocking gate)
dashboard/src/app/admin/dut-api/ DUT API admin UI: register, test, snapshot, preflight
dashboard/src/app/admin/time-sync/ Time-sync admin UI: NTP relay status, browser-clock fallback
dashboard/src/app/admin/audit/ Audit log viewer (admin mutations, login attempts)
dashboard/src/app/api/test-runs/ Test-run lifecycle (start, preflight, report.json)
platform/test-plans/catalog.yaml The 15-plan catalog (capacity, soak, decrypt, vendor-compare, …)
scripts/k8s-install.sh One-shot k3s + manifests installer (single/dual/tri/multi-node)
scripts/host-tuning.sh Sysctls + CPU governor + THP + (optional) cpuManagerPolicy=static
scripts/inject-ngfw-ca.sh Push NGFW CA bundle into agent ConfigMap
scripts/secrets-init.sh Bootstrap k8s Secrets (controller token, postgres, dashboard auth)
docs/CLONER.md Cloner: architecture, behavior, troubleshooting
docs/CLONER_OPERATIONS.md Cloner: day-2 operations and runbooks
docs/DUT_API_INTEGRATION.md DUT API: vendor adapters, snapshot, polling
docs/DUT_API_OPERATIONS.md DUT API ops: registration, snapshot, troubleshooting
docs/PREFLIGHT_CHECKS.md 5-check catalog (subnet, reach, PKI, NTP, DUT API)
docs/SYSLOG_CORRELATION.md Syslog OOBI-only correlation policy
docs/SYSLOG_OPERATIONS.md Syslog ops: Promtail :514, Loki, Grafana correlation
docs/TEST_PLANS.md The 15 catalog plans + plan-snapshot semantics
docs/REPORTS.md Test Run Reports (Phase 1 shipped, Phases 2–5 roadmap)
docs/MONITORING_TEST_VALIDITY.md Alerts that prove the test-bed itself was healthy during a run
docs/TLS_DECRYPT_MODE_VERIFICATION.en.md Independent issuer-cert probe (decrypt ACTIVE/BYPASS, alert)
docs/ACCESS_REQUEST.md Onboarding step 1: lab access via Cisco Access Broker
docs/CLONE_FOR_INSTALL.md Onboarding step 2: clone repo for first install
docs/RUNBOOK_FIRST_INSTALL.md Onboarding step 3: first-install runbook
docs/AIRGAP_INSTALL.md Onboarding addendum: air-gapped install procedure
docs/IP_PROTECTION.md IP protection policy (private/forensic separation)
docs/PRIVATE_REPO_SETUP.md Setup procedure for the private companion repo
docker-compose.cloner.yml Cloner stack for Docker-based dev (split topology)

End of System Overview. For component deep-dives, see the per-topic documents in docs/.