# ADR 0006 — K6 load-test fleet as a second agent tier
- Status: Accepted
- Date: 2026-05-03
- Deciders: André Luiz Gallon
## Context
The Playwright fleet (ADR 0003) proves that a real Chromium browser can reach a target site and load the full page — including third-party JS, fonts, and TLS negotiation. However, it answers the question "does the site work correctly for a single user?", not "how does the site behave under load?".
A separate load-test signal was needed:
- Scale: Playwright agents are bounded by Chromium RAM (~300–500 MB each); the practical ceiling on a 16 GB laptop is ~10–15 agents. A dedicated HTTP-only tool can reach 1,000 agents on the same host.
- Percentile latency: Playwright captures per-resource timing but does not aggregate p95/p99 across many users. k6 natively computes histograms over N virtual users.
- Error rate: The Playwright fleet is designed to abort on TLS or navigation errors; k6 is designed to count and tolerate them as a percentage of total requests — the right model for load tests.
- Protocol coverage: k6 supports HTTP/1.1, HTTP/2 and HTTP/3 (QUIC) via its own Go HTTP stack, independently of the browser.
## Decision
Deploy Grafana k6 as a second, complementary agent type alongside the existing Playwright fleet. The K6 fleet:
- Runs as a separate Compose project (`ai_forse_k6`) and Kubernetes Deployment, using the same dual-network topology (OOBI control + PROD data plane) as the Playwright fleet (ADR 0004 pattern).
- Each agent is a TypeScript + Node.js process that wraps the official `grafana/k6` binary (v0.56.0). The binary is copied from the `grafana/k6` Docker image in a multi-stage build — no Alpine package dependency, version pinned reproducibly.
- The k6 binary writes a per-run JSON summary (`--summary-export`) to `/tmp`; the Node.js wrapper parses it and POSTs structured metrics to the dashboard. This approach avoids streaming k6 output and keeps the metrics pipeline consistent with the Playwright schema.
- The dashboard exposes a dedicated API surface (`/api/k6/*`) and a three-tab UI (`/agents/k6`: Visão geral, Sites, Execuções — Overview, Sites, Runs) with a slider that scales the fleet from 0 to 1,000 agents.
- K8s manifests use `readOnlyRootFilesystem: true` with `emptyDir` (`medium: Memory`) volumes for `/tmp` and `/home/k6agent`, and the Downward API for agent identity (`AGENT_NAME`).
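The run-then-parse flow above can be sketched as follows. This is a minimal illustration, not the actual wrapper: the summary field names follow k6's `--summary-export` JSON format, but `runK6`, `summaryToMetrics`, the `/scripts/default.js` path, and the payload shape sent to the dashboard are hypothetical.

```typescript
import { execFileSync } from "node:child_process";
import { readFileSync } from "node:fs";

// /tmp is emptyDir-backed, so it stays writable under readOnlyRootFilesystem.
const SUMMARY_PATH = "/tmp/k6-summary.json";

// Invoke the pinned k6 binary; --summary-export writes an end-of-run JSON summary.
function runK6(vus: string, duration: string): string {
  execFileSync("k6", [
    "run",
    "--vus", vus,
    "--duration", duration,
    "--summary-export", SUMMARY_PATH,
    "/scripts/default.js", // hypothetical location of the default GET script
  ]);
  return readFileSync(SUMMARY_PATH, "utf8");
}

// Flatten the k6 summary into the record the dashboard ingests
// (the output field names here are assumptions for illustration).
function summaryToMetrics(raw: string, agent: string) {
  const metrics = JSON.parse(raw).metrics ?? {};
  const dur = metrics["http_req_duration"] ?? {};
  return {
    agent,                                          // AGENT_NAME via the Downward API
    p50: dur["med"] ?? null,
    p95: dur["p(95)"] ?? null,
    p99: dur["p(99)"] ?? null,
    errorRate: metrics["http_req_failed"]?.value ?? 0,
    requests: metrics["http_reqs"]?.count ?? 0,
  };
}
```

Parsing the end-of-run summary (rather than streaming k6's live output) keeps the wrapper stateless: one run, one JSON file, one POST — the same shape as a Playwright agent reporting a completed cycle.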
## Consequences
- ✅ Scales to 1,000 concurrent agents (~128 MB RAM each vs. 300–500 MB for Playwright).
- ✅ Native p50/p95/p99 latency, error rate, RPS and data-received metrics — first-class load-test signals.
- ✅ HPA can scale to zero (with KEDA or Kubernetes ≥ 1.32) — the K6 fleet does not consume resources when idle.
- ✅ Both fleets share the same dashboard, PostgreSQL backend, and observability stack — one operational surface.
- ⚠️ k6 does not execute JavaScript or render CSS — it measures HTTP transport, not user experience. Playwright remains the source of truth for real-browser behaviour.
- ⚠️ The default k6 script is a simple GET with configurable VUs and duration. Custom JavaScript test scripts require operator injection via a ConfigMap or init-container.
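One possible shape for the ConfigMap injection path is sketched below. The ConfigMap name, mount path, and script body are assumptions for illustration, not part of this ADR; only the k6 script API (`k6/http`, `options`, `__ENV`) is standard k6.

```yaml
# Hypothetical ConfigMap carrying a custom k6 test script.
apiVersion: v1
kind: ConfigMap
metadata:
  name: k6-custom-script
data:
  default.js: |
    import http from "k6/http";
    export const options = { vus: 50, duration: "1m" };
    export default function () {
      http.get(__ENV.TARGET_URL);
    }
# In the agent Deployment, mount it over the script directory (path assumed):
#   volumes:
#     - name: k6-script
#       configMap: { name: k6-custom-script }
#   containers[].volumeMounts:
#     - { name: k6-script, mountPath: /scripts, readOnly: true }
```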
## Alternatives considered
- Add VUs to the existing Playwright agent — Playwright can run parallel contexts (`CYCLE_CONCURRENCY`), but each context carries full browser overhead. Impractical at 1,000 VUs.
- Artillery / Locust / JMeter — capable tools, but k6 has a smaller runtime (~50 MB binary), native Go concurrency, first-class `--summary-export` JSON output, and the official `grafana/k6` image is well-maintained and multi-arch.
- k6 Operator for Kubernetes — provides `TestRun` CRDs and distributed execution, but introduces a CRD dependency and a separate control plane. The agent-poll model we already use for Playwright is simpler and sufficient for the lab scope.
- Expose k6 metrics directly to Prometheus — k6 has a built-in OTLP exporter, but routing metrics through the dashboard keeps a single source of truth and reuses the existing idempotency and audit-log infrastructure.