
ADR 0030 — K8s ValidatingAdmissionWebhook with audit / enforce / break-glass modes

  • Status: Accepted (formalized 2026-05-12 with v3.7.0 — Wave 12 shipping)
  • Date: 2026-05-11 (locked), 2026-05-12 (formalized)
  • Deciders: TLSStress.Art project
  • Targets: v3.7.0 (audit + enforce + break-glass + cross-correlation)
  • Patent claim family: claim #21 — Tier A/B admission classification at admission time, with break-glass + cross-correlation
  • Umbrella ADR: 0026

Context

The bench's K8s API server is the natural choke point for "what pods are allowed to run in this cluster". The Tier A/B partition (ADR 0028) only matters if every Pod is classified — and the only honest classification timestamp is admission time. Anything later is a race condition.

We need a webhook that:

  1. Refuses to fail open — when the webhook crashes, every Pod is denied (not allowed)
  2. Doesn't break customer deployments on day one — the default mode has to be audit, not enforce, because the moment a real Fortune-500 ops team hits a hard-deny they fork the project
  3. Has a break-glass — auditors and incident response WILL need to bypass enforcement under controlled circumstances, and we'd rather observe the bypass than have ops disable the webhook entirely
  4. Hard-couples to the sealed audit log (ADR 0029) so the decision history is tamper-evident

Decision

Ship pkg/ztp-prem-admission/ as a Go ValidatingAdmissionWebhook with three modes (env-driven, not flag-driven, so K8s config controls it):

  • audit (default): every CREATE/UPDATE Pod admission is classified by ztp-prem.tlsstress.art/tier label, the decision is always Allow, and a structured entry lands in the in-memory ring buffer + optional JSONL sink.
  • enforce: missing or invalid tier label → Deny.
  • break-glass: per-Pod opt-in via label ztp-prem.tlsstress.art/break-glass=<ticket-id> — admission is Allowed even in enforce mode, but the audit entry is marked with the ticket so the trail is intact.
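The three modes above reduce to a small decision function. The following is a minimal sketch, not the shipping code in pkg/ztp-prem-admission/: the environment variable name ZTP_PREM_ADMISSION_MODE, the Decision struct, and the assumption that the only valid tier values are "A" and "B" are all hypothetical, and an empty break-glass ticket is assumed not to count as an override.

```go
package main

import "os"

// Mode is the webhook-wide operating mode.
type Mode string

const (
	ModeAudit   Mode = "audit"
	ModeEnforce Mode = "enforce"
)

// Label keys as named in this ADR.
const (
	tierLabel       = "ztp-prem.tlsstress.art/tier"
	breakGlassLabel = "ztp-prem.tlsstress.art/break-glass"
)

// Decision is a hypothetical shape for what gets recorded per admission.
type Decision struct {
	Allowed    bool
	Tier       string // empty when the label is missing or invalid
	BreakGlass string // ticket ID from the break-glass label, if any
	Reason     string
}

// modeFromEnv reads the mode from a hypothetical env var.
// Anything other than "enforce" falls back to the audit default.
func modeFromEnv() Mode {
	if Mode(os.Getenv("ZTP_PREM_ADMISSION_MODE")) == ModeEnforce {
		return ModeEnforce
	}
	return ModeAudit
}

// validTier assumes the Tier A/B partition maps to label values "A" and "B".
func validTier(t string) bool { return t == "A" || t == "B" }

// decide classifies a Pod by its labels and applies the mode rules.
func decide(mode Mode, labels map[string]string) Decision {
	d := Decision{Allowed: true, BreakGlass: labels[breakGlassLabel]}
	if tier := labels[tierLabel]; validTier(tier) {
		d.Tier = tier
		d.Reason = "valid tier label"
		return d
	}
	switch {
	case mode != ModeEnforce:
		d.Reason = "audit mode: missing or invalid tier label recorded, not denied"
	case d.BreakGlass != "":
		d.Reason = "break-glass override of enforce-mode deny"
	default:
		d.Allowed = false
		d.Reason = "enforce mode: missing or invalid tier label"
	}
	return d
}
```

In this sketch the ticket is copied into the decision whenever the label is present, so a break-glass Pod stays visible in the trail even when the tier label alone would have allowed it.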

Operationally:

  • /validate is the AdmissionReview v1 endpoint (mTLS, K8s certificate auth)
  • /healthz is the unauthenticated liveness probe
  • /audit is a Bearer-token-gated read endpoint for the dashboard reader (dashboard/src/lib/ztp-prem/admission-audit.ts)

Each decision additionally emits an event into the sealed audit log (ADR 0029) with a sequence anchor; the dashboard's admission-correlate.ts reader cross-references the two and alerts on drift rather than silently reconciling.
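The cross-reference can be pictured as a join on the sequence anchor that reports every disagreement instead of repairing it. The sketch below expresses that idea in Go for consistency with the webhook; it is not the TypeScript reader in admission-correlate.ts, and the Record and Drift types and their fields are assumptions.

```go
package main

import "sort"

// Record is a hypothetical minimal view of one decision as seen in either
// the admission audit or the sealed audit log.
type Record struct {
	Seq     uint64 // sequence anchor assumed to be shared by both logs
	Allowed bool
}

// Drift describes one disagreement between the two logs.
type Drift struct {
	Seq    uint64
	Reason string
}

// correlate returns every anchor on which the two logs disagree, ordered
// by sequence. It never reconciles: the caller is expected to alert.
func correlate(admission, sealed []Record) []Drift {
	sealedBySeq := make(map[uint64]Record, len(sealed))
	for _, s := range sealed {
		sealedBySeq[s.Seq] = s
	}
	seen := make(map[uint64]bool, len(admission))
	var drifts []Drift
	for _, a := range admission {
		seen[a.Seq] = true
		s, ok := sealedBySeq[a.Seq]
		switch {
		case !ok:
			drifts = append(drifts, Drift{Seq: a.Seq, Reason: "missing from sealed log"})
		case s.Allowed != a.Allowed:
			drifts = append(drifts, Drift{Seq: a.Seq, Reason: "decision mismatch"})
		}
	}
	for _, s := range sealed {
		if !seen[s.Seq] {
			drifts = append(drifts, Drift{Seq: s.Seq, Reason: "missing from admission audit"})
		}
	}
	sort.Slice(drifts, func(i, j int) bool { return drifts[i].Seq < drifts[j].Seq })
	return drifts
}
```

Because the admission side may only retain a bounded window, a caller of this sketch would restrict both inputs to the same sequence range before comparing; otherwise older sealed entries would show up as false drift.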

K8s manifest at k8s/ztp-prem/admission-webhook.yaml ships a 4-step canary rollout guide embedded in comments (single-replica audit → multi-replica audit → single-replica enforce → full enforce).

Consequences

Pros

  • Day-one deployable in audit mode: zero workload disruption while the customer learns the rules
  • Enforce mode is a config flip, not a redeploy
  • Break-glass exists and is logged, so incident response uses it instead of disabling the webhook
  • Cross-correlation with the sealed audit log makes "I just edited the admission audit JSONL" a detectable forgery

Cons

  • Failing the webhook hard-denies everything, so operational visibility is critical, hence the dashboard surface
  • Ring buffer is 1000 entries, so high-churn clusters need the JSONL sink to retain history
  • Break-glass label puts trust in the operator who set it; we cope by logging + alerting, not by gating
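The ring buffer trade-off listed under Cons is easy to see in a fixed-capacity buffer that overwrites its oldest entry. This is a sketch under assumptions: the Entry fields and the Ring API are hypothetical, and the shipping buffer may be structured differently.

```go
package main

import "sync"

// Entry is a hypothetical audit record; the real entry carries more fields.
type Entry struct {
	Seq     uint64
	Pod     string
	Allowed bool
}

// Ring is a fixed-capacity buffer that overwrites the oldest entry when full.
type Ring struct {
	mu   sync.Mutex
	buf  []Entry
	next int  // index the next Add writes to
	full bool // true once the buffer has wrapped at least once
}

// NewRing returns a ring holding at most capacity entries (minimum 1).
func NewRing(capacity int) *Ring {
	if capacity < 1 {
		capacity = 1
	}
	return &Ring{buf: make([]Entry, capacity)}
}

// Add stores e, evicting the oldest entry if the ring is full.
func (r *Ring) Add(e Entry) {
	r.mu.Lock()
	defer r.mu.Unlock()
	r.buf[r.next] = e
	r.next = (r.next + 1) % len(r.buf)
	if r.next == 0 {
		r.full = true
	}
}

// Snapshot returns a copy of the retained entries, oldest first.
func (r *Ring) Snapshot() []Entry {
	r.mu.Lock()
	defer r.mu.Unlock()
	if !r.full {
		return append([]Entry(nil), r.buf[:r.next]...)
	}
	out := make([]Entry, 0, len(r.buf))
	out = append(out, r.buf[r.next:]...)
	out = append(out, r.buf[:r.next]...)
	return out
}
```

With NewRing(1000), the 1001st admission silently evicts the first, which is why a high-churn cluster needs the JSONL sink for anything beyond the most recent window.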

Reversibility: high. Disabling the webhook is a single kubectl delete operation. The audit history persists in the sealed chain regardless. We don't depend on the webhook to prevent anything (that's a Tier B layer concern); we depend on it to classify and record every admission.


Last verified against shipping code: v3.7.0 (2026-05-12).