ADR 0006 — K6 load-test fleet as a second agent tier

  • Status: Accepted
  • Date: 2026-05-03
  • Deciders: André Luiz Gallon

Context

The Playwright fleet (ADR 0003) proves that a real Chromium browser can reach a target site and load the full page — including third-party JS, fonts, and TLS negotiation. However, it answers the question "does the site work correctly for a single user?", not "how does the site behave under load?".

A separate load-test signal was needed:

  • Scale: Playwright agents are bounded by Chromium RAM (~300–500 MB each); the practical ceiling on a 16 GB laptop is ~10–15 agents. A dedicated HTTP-only tool can reach 1,000 agents on the same host.
  • Percentile latency: Playwright captures per-resource timing but does not aggregate p95/p99 across many users. k6 natively computes histograms over N virtual users.
  • Error rate: The Playwright fleet is designed to abort on TLS or navigation errors; k6 is designed to count and tolerate them as a percentage of total requests — the right model for load tests.
  • Protocol coverage: k6 supports HTTP/1.1, HTTP/2 and HTTP/3 (QUIC) via its own Go HTTP stack, independently of the browser.

Decision

Deploy Grafana k6 as a second, complementary agent type alongside the existing Playwright fleet. The K6 fleet:

  1. Runs as a separate Compose project (ai_forse_k6) and Kubernetes Deployment, using the same dual-network topology (OOBI control + PROD data plane) as the Playwright fleet (ADR 0004 pattern).
  2. Each agent is a TypeScript + Node.js process that wraps the official grafana/k6 binary (v0.56.0). The binary is copied from the grafana/k6 Docker image in a multi-stage build — no Alpine package dependency, version pinned reproducibly.
  3. The k6 binary writes a per-run JSON summary (--summary-export) to /tmp; the Node.js wrapper parses it and POSTs structured metrics to the dashboard. This approach avoids streaming k6 output and keeps the metrics pipeline consistent with the Playwright schema.
  4. The dashboard exposes a dedicated API surface (/api/k6/*) and a three-tab UI at /agents/k6 — tabs Visão geral (Overview), Sites, and Execuções (Runs) — plus a slider that scales the fleet from 0 to 1,000 agents.
  5. K8s manifests use readOnlyRootFilesystem: true with emptyDir: medium: Memory volumes for /tmp and /home/k6agent, and the Downward API for agent identity (AGENT_NAME).
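The hardening in item 5 corresponds roughly to the following Deployment fragment. Container and image names are placeholders, not the lab's actual manifests; only the security context, in-memory volumes, and Downward API wiring reflect the decision.

```yaml
# Illustrative fragment for item 5 — names and image are placeholders.
containers:
  - name: k6-agent
    image: example/k6-agent:latest        # placeholder image reference
    env:
      - name: AGENT_NAME                  # agent identity via the Downward API
        valueFrom:
          fieldRef:
            fieldPath: metadata.name
    securityContext:
      readOnlyRootFilesystem: true        # no writable container filesystem
    volumeMounts:
      - name: tmp
        mountPath: /tmp                   # k6 writes its summary JSON here
      - name: home
        mountPath: /home/k6agent
volumes:
  - name: tmp
    emptyDir:
      medium: Memory                      # tmpfs, wiped on pod restart
  - name: home
    emptyDir:
      medium: Memory
```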
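The summary-parsing step in item 3 can be sketched roughly as below. The field names follow k6's `--summary-export` JSON shape, but `summarize`, the sample values, and the returned field names are illustrative assumptions, not the actual wrapper code or dashboard schema.

```typescript
// Sketch of item 3: parse a k6 --summary-export JSON string and reduce it
// to the load-test signals the dashboard cares about. The helper name and
// output shape are illustrative, not the real wrapper contract.

interface TrendMetric {
  avg: number; min: number; med: number; max: number;
  "p(90)": number; "p(95)": number;
}

interface K6Summary {
  metrics: {
    http_req_duration: TrendMetric;               // latency trend
    http_req_failed: { value: number };           // fraction of failed requests
    http_reqs: { count: number; rate: number };   // totals and RPS
  };
}

function summarize(raw: string) {
  const s: K6Summary = JSON.parse(raw);
  return {
    p95_ms: s.metrics.http_req_duration["p(95)"],
    error_rate: s.metrics.http_req_failed.value,
    rps: s.metrics.http_reqs.rate,
  };
}

// Trimmed example input in the --summary-export shape:
const sample = JSON.stringify({
  metrics: {
    http_req_duration: { avg: 120.4, min: 80.1, med: 110.2, max: 950.7, "p(90)": 210.3, "p(95)": 310.9 },
    http_req_failed: { value: 0.02 },
    http_reqs: { count: 1000, rate: 33.3 },
  },
});

console.log(summarize(sample)); // { p95_ms: 310.9, error_rate: 0.02, rps: 33.3 }
```

In the real wrapper this object would then be POSTed to the dashboard; keeping the reduction in one pure function makes it easy to align with the Playwright metrics schema.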

Consequences

  • ✅ Scales to 1,000 concurrent agents (~128 MB RAM each vs. 300–500 MB for Playwright).
  • ✅ Native p50/p95/p99 latency, error rate, RPS and data-received metrics — first-class load-test signals.
  • ✅ HPA can scale to zero (with KEDA or Kubernetes ≥ 1.32) — the K6 fleet does not consume resources when idle.
  • ✅ Both fleets share the same dashboard, PostgreSQL backend, and observability stack — one operational surface.
  • ⚠️ K6 does not execute a page's JavaScript or render CSS — it measures HTTP transport, not user experience. Playwright remains the source of truth for real-browser behaviour.
  • ⚠️ The default k6 script is a simple GET with configurable VUs and duration. Custom JavaScript test scripts require operator injection via a ConfigMap or init-container.
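The default script described above corresponds roughly to a k6 script like the following. The target URL, threshold values, and environment-variable names are illustrative, not the fleet's actual defaults:

```javascript
// Illustrative k6 script: a plain GET with operator-configurable VUs and
// duration. URL, env-var names, and thresholds are example values.
import http from "k6/http";
import { check, sleep } from "k6";

export const options = {
  vus: Number(__ENV.VUS || 10),          // virtual users
  duration: __ENV.DURATION || "30s",     // test length
  thresholds: {
    http_req_duration: ["p(95)<500"],    // fail the run if p95 exceeds 500 ms
    http_req_failed: ["rate<0.01"],      // tolerate under 1% errors
  },
};

export default function () {
  const res = http.get(__ENV.TARGET_URL || "https://example.com/");
  check(res, { "status is 200": (r) => r.status === 200 });
  sleep(1);
}
```

Run with `k6 run --summary-export /tmp/summary.json script.js`; replacing this file via a ConfigMap or init-container is the injection path the bullet refers to.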

Alternatives considered

  • Add VUs to the existing Playwright agent — Playwright can run parallel contexts (CYCLE_CONCURRENCY) but each context carries a full browser overhead. Impractical at 1,000 VUs.
  • Artillery / Locust / JMeter — capable tools, but k6 has a smaller runtime (~50 MB binary), native Go concurrency, first-class --summary-export JSON output, and the official grafana/k6 image is well-maintained and multi-arch.
  • k6 Operator for Kubernetes — provides TestRun CRDs and distributed execution, but introduces a CRD dependency and a separate control plane. The agent-poll model we already use for Playwright is simpler and sufficient for the lab scope.
  • Expose k6 metrics directly to Prometheus — k6 has a built-in OTLP exporter, but routing metrics through the dashboard keeps a single source of truth and reuses the existing idempotency + audit-log infrastructure.