ADR 0030 — K8s ValidatingAdmissionWebhook with audit / enforce / break-glass modes
- Status: Accepted (formalized 2026-05-12 with v3.7.0 — Wave 12 shipping)
- Date: 2026-05-11 (locked), 2026-05-12 (formalized)
- Deciders: TLSStress.Art project
- Targets: v3.7.0 (audit + enforce + break-glass + cross-correlation)
- Patent claim family: claim #21 — Tier A/B admission classification at admission time, with break-glass + cross-correlation
- Umbrella ADR: 0026
Context
The bench's K8s API server is the natural choke point for "which Pods are allowed to run in this cluster". The Tier A/B partition (ADR 0028) only matters if every Pod is classified, and the only honest point to classify a Pod is admission time. Anything later is a race condition: the Pod can already be scheduled and running before it has a tier.
We need a webhook that:
- Refuses to fail open: when the webhook crashes, every Pod is denied rather than allowed
- Doesn't break customer deployments on day one: the default mode has to be audit, not enforce, because the moment a real Fortune-500 ops team hits a hard deny, they fork the project
- Has a break-glass path: auditors and incident responders WILL need to bypass enforcement under controlled circumstances, and we'd rather observe the bypass than have ops disable the webhook entirely
- Hard-couples to the sealed audit log (ADR 0029) so the decision history is tamper-evident
Decision
Ship pkg/ztp-prem-admission/ as a Go ValidatingAdmissionWebhook
with three modes (env-driven, not flag-driven, so K8s config
controls it):
- audit (default): every CREATE/UPDATE Pod admission is classified by the ztp-prem.tlsstress.art/tier label, the decision is always Allow, and a structured entry lands in the in-memory ring buffer plus an optional JSONL sink.
- enforce: a missing or invalid tier label → Deny.
- break-glass: per-Pod opt-in via the label ztp-prem.tlsstress.art/break-glass=<ticket-id>. Admission is allowed even in enforce mode, but the audit entry is marked with the ticket so the trail stays intact.
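The mode logic above can be sketched in Go as a pure function from (mode, Pod labels) to a decision record. This is a minimal sketch under stated assumptions, not the code in pkg/ztp-prem-admission/: the environment variable name ZTP_PREM_ADMISSION_MODE, the Decision type, and the valid tier values "A" and "B" are hypothetical (this ADR names only the two label keys and the three behaviors). It also shows one design choice that fits the fail-closed stance: an unrecognized mode string is a startup error instead of a silent fallback to audit.

```go
package main

import (
	"fmt"
	"os"
)

// Label keys named in this ADR.
const (
	tierLabel       = "ztp-prem.tlsstress.art/tier"
	breakGlassLabel = "ztp-prem.tlsstress.art/break-glass"
)

// Mode is the webhook's operating mode, read from the environment.
type Mode string

const (
	ModeAudit   Mode = "audit"
	ModeEnforce Mode = "enforce"
)

// parseMode treats an empty value as audit (the default) and rejects anything
// unrecognized, so a typo cannot silently change the webhook's behavior.
func parseMode(v string) (Mode, error) {
	switch Mode(v) {
	case "", ModeAudit:
		return ModeAudit, nil
	case ModeEnforce:
		return ModeEnforce, nil
	}
	return "", fmt.Errorf("unknown admission mode %q", v)
}

// validTiers is an ASSUMPTION: ADR 0028 defines Tier A and Tier B, and this
// sketch assumes the label values are literally "A" and "B".
var validTiers = map[string]bool{"A": true, "B": true}

// Decision is a hypothetical shape for the structured audit entry.
type Decision struct {
	Mode       Mode
	Tier       string // "" when the tier label is missing
	BreakGlass string // ticket ID from the break-glass label, "" when absent
	Allowed    bool
	Reason     string
}

// decide classifies one Pod admission from its labels.
func decide(mode Mode, labels map[string]string) Decision {
	d := Decision{Mode: mode, Tier: labels[tierLabel], BreakGlass: labels[breakGlassLabel]}
	switch {
	case validTiers[d.Tier]:
		d.Allowed, d.Reason = true, "valid tier label"
	case d.BreakGlass != "":
		// Allowed even in enforce mode; the ticket stays on the record.
		d.Allowed, d.Reason = true, "break-glass ticket "+d.BreakGlass
	case mode == ModeAudit:
		// Audit never denies, but records what enforce would have done.
		d.Allowed, d.Reason = true, "missing or invalid tier label (would deny in enforce)"
	default:
		d.Allowed, d.Reason = false, "missing or invalid tier label"
	}
	return d
}

func main() {
	// ZTP_PREM_ADMISSION_MODE is a HYPOTHETICAL variable name.
	mode, err := parseMode(os.Getenv("ZTP_PREM_ADMISSION_MODE"))
	if err != nil {
		fmt.Fprintln(os.Stderr, err)
		os.Exit(1)
	}
	fmt.Printf("%+v\n", decide(mode, map[string]string{breakGlassLabel: "INC-1234"}))
}
```

Keeping decide free of I/O means the ring buffer, the JSONL sink and the sealed-log emitter can all consume the same Decision value, and flipping audit to enforce changes exactly one branch.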
Operationally:
- /validate is the AdmissionReview v1 endpoint (mTLS, K8s certificate auth)
- /healthz is the unauthenticated liveness probe
- /audit is a Bearer-token-gated read endpoint for the dashboard reader (dashboard/src/lib/ztp-prem/admission-audit.ts)
Each decision additionally emits an event into the sealed audit
log (ADR 0029) with a sequence anchor; the dashboard's
admission-correlate.ts reader cross-references the two and
alerts on drift rather than silently reconciling.
K8s manifest at k8s/ztp-prem/admission-webhook.yaml ships a
4-step canary rollout guide embedded in comments (single-replica
audit → multi-replica audit → single-replica enforce → full
enforce).
Consequences
Pros
- Day-one deployable in audit mode: zero workload disruption while the customer learns the rules
- Enforce mode is a config flip, not a redeploy
- Break-glass exists and is logged, so incident response uses it instead of disabling the webhook
- Cross-correlation with the sealed audit log makes "I just edited the admission audit JSONL" a detectable forgery
Cons
- If the webhook fails, every Pod admission is hard-denied, so operational visibility is critical; hence the dashboard surface
- The ring buffer holds 1000 entries; high-churn clusters need the JSONL sink to retain history
- The break-glass label puts trust in the operator who set it; we cope by logging and alerting, not by gating
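A fixed-size ring buffer like the 1000-entry one mentioned above can be sketched generically in Go: it overwrites its oldest entry once full, and a mutex guards it because admission requests arrive concurrently. Ring, NewRing, Push and Snapshot are hypothetical names for illustration, not the package's API.

```go
package main

import (
	"fmt"
	"sync"
)

// Ring is a fixed-capacity, concurrency-safe buffer of the most recent entries.
type Ring[T any] struct {
	mu   sync.Mutex
	buf  []T
	next int  // slot the next Push writes to
	full bool // true once the buffer has wrapped at least once
}

func NewRing[T any](capacity int) *Ring[T] {
	if capacity <= 0 {
		panic("ring capacity must be positive")
	}
	return &Ring[T]{buf: make([]T, capacity)}
}

// Push stores v, overwriting the oldest entry once the buffer is full.
func (r *Ring[T]) Push(v T) {
	r.mu.Lock()
	defer r.mu.Unlock()
	r.buf[r.next] = v
	r.next = (r.next + 1) % len(r.buf)
	if r.next == 0 {
		r.full = true
	}
}

// Snapshot returns a copy of the retained entries, oldest first.
func (r *Ring[T]) Snapshot() []T {
	r.mu.Lock()
	defer r.mu.Unlock()
	if !r.full {
		return append([]T(nil), r.buf[:r.next]...)
	}
	out := make([]T, 0, len(r.buf))
	out = append(out, r.buf[r.next:]...)
	return append(out, r.buf[:r.next]...)
}

func main() {
	r := NewRing[int](1000) // capacity stated in this ADR
	for i := 1; i <= 1005; i++ {
		r.Push(i)
	}
	s := r.Snapshot()
	fmt.Println(len(s), s[0], s[len(s)-1])
}
```

With more than 1000 decisions between two dashboard reads, the earliest ones are already gone from Snapshot, which is the retention gap the JSONL sink covers on high-churn clusters.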
Reversibility: high. Disabling the webhook is a single kubectl delete
of its ValidatingWebhookConfiguration. The audit history persists in
the sealed chain regardless. We don't depend on the webhook to
prevent anything (that's a Tier B layer concern); we depend on
it to classify and record every admission.
Related
- pkg/ztp-prem-admission/: webhook implementation
- k8s/ztp-prem/admission-webhook.yaml: K8s manifest + canary guide
- dashboard/src/lib/ztp-prem/admission-audit.ts: dashboard reader
- dashboard/src/lib/ztp-prem/admission-correlate.ts: cross-correlation bridge
- ADR 0026: ZTP-prem umbrella
- ADR 0028: Tier A/B partition (classification source)
- ADR 0029: sealed audit hash-chain (correlation target)
Last verified against shipping code: v3.7.0 (2026-05-12).