ADR 0001: Mandatory TLS 1.3 with strict certificate validation (per-target overrides since 2026-05)

Status: Accepted (amended 2026-05-02 with per-target exceptions)
Date: 2026-04-26 (initial), 2026-05-02 (amendment)
Deciders: André Luiz Gallon

Context

The agent loads real public sites (g1, NASA, Cisco, …) on behalf of operators who may use the data for performance baselines. Anything below TLS 1.3 is increasingly considered weak (RFC 9325 recommends against TLS 1.2 key-exchange modes that lack forward secrecy, such as static RSA), and silently ignoring certificate errors is a footgun that hides MITM attacks.

Decision

The agent's Chromium is launched with --ssl-version-min=tls1.3.
The navigation response is inspected; if securityDetails().protocol is not TLS 1.3, the cycle is aborted with error_message recorded.
ignoreHTTPSErrors is pinned to false (this is also Playwright's default; it is set explicitly so the policy does not depend on that default).
The dashboard's httpsOnly flag rejects non-HTTPS targets at the validator (only https:// URLs are accepted on POST /api/admin/targets).
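
The two checks above can be sketched as pure functions. This is a minimal illustration, not the project's code: the helper names isHttpsTarget and checkNegotiatedProtocol and the GateResult type are hypothetical, and the sketch assumes the protocol string is the one reported by securityDetails().protocol (for example "TLS 1.3").

```typescript
// Hypothetical helpers. Only the https-only rule and the "TLS 1.3"
// requirement come from the decision above.
type GateResult = { ok: true } | { ok: false; errorMessage: string };

// Validator side: accept only absolute https:// URLs.
function isHttpsTarget(rawUrl: string): boolean {
  try {
    return new URL(rawUrl).protocol === "https:";
  } catch {
    return false; // not a parseable absolute URL
  }
}

// Agent side: `protocol` is the value read from the navigation response's
// security details, or undefined when none are available.
function checkNegotiatedProtocol(protocol: string | undefined): GateResult {
  if (protocol === "TLS 1.3") {
    return { ok: true };
  }
  return {
    ok: false,
    errorMessage: `TLS 1.3 required, negotiated ${protocol ?? "unknown"}`,
  };
}
```

In this sketch a failed gate returns the text that would be stored in error_message, so the caller can abort the cycle without parsing an exception.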

Consequences

✅ The cluster cannot be used to bypass certificate hygiene.
✅ Cycles that previously "succeeded" against a broken TLS chain now fail loudly, surfacing real production risk earlier.
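⚠️ A handful of legacy sites that still negotiate TLS 1.2 cannot be monitored. Under this original decision, operators who need such sites can opt out with REQUIRE_TLS13=false, an option documented as an explicit security trade-off; the 2026-05-02 amendment narrows what that variable does (see Migration impact).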

Alternatives considered

Allow TLS 1.2 but block weak ciphers: rejected to keep the policy simple. TLS 1.3 support is near-universal in 2026.
Disable the check via a per-target UI toggle: initially rejected for fear of false signals; later adopted as a controlled exception (see Amendment below) once the internal benchmark fleet (web-agent-webserver / Caddy tls internal) made the blanket policy impractical for lab use.

Amendment (2026-05-02): per-target TLS overrides

Context

The introduction of the web-agent-webserver Caddy fleet (HTTP/2 + HTTP/3 benchmark targets, scaled 1..20 from the dashboard) made the original blanket policy untenable for lab use:

Caddy's tls internal directive issues per-host certificates from a locally generated CA that nothing outside the fleet trusts. Every cycle against the internal fleet would fail with net::ERR_CERT_AUTHORITY_INVALID.
The internal fleet uses Docker DNS aliases like webserver and *.ai_forse.local, which the dashboard's PRIVATE_HOST_RE SSRF safety net already blocks at create-time on POST /api/admin/targets.
Both restrictions are correct for the public-monitoring use case (g1, NASA, Cisco, …), but they become a hard wall for the lab use case that the same UI is also meant to support.

Using a global env-var kill-switch (REJECT_INVALID_CERTS=false) was explicitly rejected because it would degrade posture for every target, not just the lab ones. We needed per-row granularity that the operator can see in the UI and that auditors can review.

Decision

Add two per-target columns (migration 0012_target_tls_overrides.sql):

allow_insecure_tls boolean NOT NULL DEFAULT false — when true, the agent passes ignoreHTTPSErrors: true to Playwright for this target only. The cluster-wide REJECT_INVALID_CERTS env still defaults to true and still applies to every target whose per-row flag is false.
tls_min_version text NOT NULL DEFAULT 'tls1.3' CHECK (… IN ('tls1.2','tls1.3','any')) — per-target floor on the negotiated TLS version.
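
On the agent side, the two columns map naturally onto a small typed record with the same strict defaults. The sketch below is a hand-rolled stand-in for the Zod schema mentioned under Plumbing, written without dependencies so it runs on its own; the names TargetTlsPolicy and parseTargetTlsPolicy are hypothetical, and the camelCase field names are assumed from the API fields described in this ADR.

```typescript
// The three values allowed by the CHECK constraint on tls_min_version.
type TlsMinVersion = "tls1.2" | "tls1.3" | "any";

interface TargetTlsPolicy {
  allowInsecureTls: boolean;
  tlsMinVersion: TlsMinVersion;
}

function isTlsMinVersion(value: unknown): value is TlsMinVersion {
  return value === "tls1.2" || value === "tls1.3" || value === "any";
}

// Hypothetical parser: missing fields fall back to the column defaults
// (false, 'tls1.3'); anything outside the allowed values is rejected.
function parseTargetTlsPolicy(row: {
  allowInsecureTls?: unknown;
  tlsMinVersion?: unknown;
}): TargetTlsPolicy {
  const allowInsecureTls = row.allowInsecureTls ?? false;
  if (typeof allowInsecureTls !== "boolean") {
    throw new TypeError("allowInsecureTls must be a boolean");
  }
  const tlsMinVersion = row.tlsMinVersion ?? "tls1.3";
  if (!isTlsMinVersion(tlsMinVersion)) {
    throw new RangeError("tlsMinVersion must be 'tls1.2', 'tls1.3' or 'any'");
  }
  return { allowInsecureTls, tlsMinVersion };
}
```

A payload that omits both fields therefore parses to the strict posture, which is what keeps callers unaware of the new columns on the original behaviour.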

Plumbing:

POST /api/admin/targets accepts both new fields. The private-hostname validator is only bypassed when the request also sets allowInsecureTls=true (see code comments for the SSRF rationale).
The same logic applies to PATCH /api/admin/targets/[id].
The agent's Target Zod schema gains the two fields with safe defaults. The runner combines target.tlsMinVersion (authoritative) with the legacy REQUIRE_TLS13 env (now a paranoid safety net that silently upgrades 'any' → 'tls1.2').
Every cycle that runs with the relaxed posture emits a structured warn log (allowInsecureTls: true, reason: 'per-target opt-in (ADR-0001 exception)') so the SIEM / dashboard can list every target running outside default policy.
The dashboard UI shows two badges (⚠ TLS lab and TLS 1.2+ / TLS qualquer, i.e. "any TLS") on any target with non-default values, plus controls in the create and edit forms with red-tinted styling and explicit copy ("Lab interno apenas", i.e. "internal lab only").
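
The create-time bypass and the per-cycle warn record described above can be sketched as follows. Everything named here is illustrative: PRIVATE_HOST_SKETCH is a hypothetical stand-in for PRIVATE_HOST_RE (the real pattern is not shown in this ADR), validateTargetHost and cyclePosture are invented helper names, and targetUrl is an extra log field added only for the example. The contextOptions object is meant for something like Playwright's browser.newContext, which accepts ignoreHTTPSErrors.

```typescript
// Hypothetical stand-in for PRIVATE_HOST_RE; covers only a few obvious names.
const PRIVATE_HOST_SKETCH = /^(localhost|webserver|.+\.local)$/i;

interface TargetInput {
  url: string; // assumed to have passed the https-only validator already
  allowInsecureTls?: boolean;
}

// Create/patch-time rule: a private hostname is rejected unless the same
// request also sets allowInsecureTls=true. Returns an error string or null.
function validateTargetHost(input: TargetInput): string | null {
  const host = new URL(input.url).hostname;
  if (PRIVATE_HOST_SKETCH.test(host) && input.allowInsecureTls !== true) {
    return `private host ${host} requires allowInsecureTls=true`;
  }
  return null;
}

interface RelaxedPostureWarning {
  level: "warn";
  allowInsecureTls: true;
  reason: string;
  targetUrl: string; // hypothetical extra field, not named in the ADR
}

// Per-cycle rule: strict by default; relaxed only for opted-in targets, and
// every relaxed cycle yields a warn record for the log pipeline.
function cyclePosture(target: { url: string; allowInsecureTls: boolean }): {
  contextOptions: { ignoreHTTPSErrors: boolean };
  warning: RelaxedPostureWarning | null;
} {
  if (!target.allowInsecureTls) {
    return { contextOptions: { ignoreHTTPSErrors: false }, warning: null };
  }
  return {
    contextOptions: { ignoreHTTPSErrors: true },
    warning: {
      level: "warn",
      allowInsecureTls: true,
      reason: "per-target opt-in (ADR-0001 exception)",
      targetUrl: target.url,
    },
  };
}
```

Returning the warning alongside the options, rather than logging inside the helper, keeps the relaxed path and its audit record in one place so a cycle cannot take one without the other.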

Consequences

✅ Lab fleets with self-signed CAs are usable from the same UI that drives public-site monitoring, without weakening the default posture for anything the operator hasn't explicitly opted in.
✅ Every exception is visible in the database (queryable), in the UI (badged on the row), and in the audit log (structured warn per cycle).
⚠️ A target mistakenly marked allow_insecure_tls=true against a public site silently degrades that target's TLS validation; the badge is the only signal the operator sees in the UI, although the per-cycle warn log still records it. Mitigated by the red colouring and the explicit "Lab interno apenas" ("internal lab only") copy in the form.

Migration impact

Existing rows get the strict defaults (false, 'tls1.3'), byte-for-byte preserving the original ADR-0001 behaviour.
No code change required for callers that ignore the new fields — the agent's Zod schema makes both optional with defaults.
The legacy REQUIRE_TLS13 env is kept for backwards compatibility but its meaning is narrowed: it now only acts as a floor on per-target tlsMinVersion='any' (silently upgrading it to 'tls1.2'). Setting it to false no longer disables the per-target check.
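
The narrowed role of REQUIRE_TLS13 can be sketched as a small resolution step in the runner. This is an illustration under stated assumptions: the function names are hypothetical, the env value is assumed to count as enabled unless it is the literal string "false", and protocol strings are assumed to look like "TLS 1.2" / "TLS 1.3".

```typescript
type TlsMinVersion = "tls1.2" | "tls1.3" | "any";

// The per-target value is authoritative. The env only lifts 'any' to
// 'tls1.2'; it never lowers or disables a per-target floor.
function effectiveTlsFloor(
  tlsMinVersion: TlsMinVersion,
  requireTls13Env: string | undefined,
): TlsMinVersion {
  const envEnabled = requireTls13Env !== "false"; // assumption: unset means enabled
  if (tlsMinVersion === "any" && envEnabled) {
    return "tls1.2";
  }
  return tlsMinVersion;
}

// Checks a negotiated protocol string against the resolved floor.
function meetsFloor(protocol: string | undefined, floor: TlsMinVersion): boolean {
  if (floor === "any") return true;
  if (protocol === "TLS 1.3") return true;
  return floor === "tls1.2" && protocol === "TLS 1.2";
}
```

With these rules, a target left at the default 'tls1.3' is still held to TLS 1.3 when REQUIRE_TLS13=false, which matches the statement that the variable no longer disables the per-target check.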