ADR 0031 — TPM 2.0 measured-boot + Confidential Computing detection

  • Status: Accepted (formalized 2026-05-12 with v3.7.0 — both probes shipping; PCR-signing planned v6.0+)
  • Date: 2026-05-11 (locked), 2026-05-12 (formalized)
  • Deciders: TLSStress.Art project
  • Targets: v3.7.0 (probes); v6.0+ (mandatory CC + sealed-key release against PCR quote)
  • Patent claim family: claim #22, covering the measured-boot anchor and CC-gated workload admission as ZTP-prem camadas (layers) 2 and 3
  • Umbrella ADR: 0026

Context

The Tier A/B partition (ADR 0028) closes the source-side gap, and the admission webhook (ADR 0030) closes the workload-scheduling gap. Both, however, assume that the running host kernel is the one we expect to have booted, and an insider operator with sufficient access can boot a different kernel underneath the bench.

Two hardware-rooted detections close that gap, and they sit on the same hosts the bench already runs on:

  • TPM 2.0 measured-boot — boot-time PCR measurements anchor the kernel + bootloader chain; a swapped kernel changes PCRs
  • Confidential Computing — SEV-SNP / TDX / Arm CCA push the trust root from the BIOS to the silicon; with attested launch + encrypted memory, even DC-side root cannot dump our process

In v3.7.0 we ship detection only. We want the camada 2 + 3 surface, the operator dashboard readiness card, and the API contract, but not workload-gating enforcement: no Fortune-500 host fleet has 100% CC coverage today, and gating prematurely would make the bench undeployable.

v6.0+ flips both probes from advisory to mandatory. By that point we expect CC support to be a baseline offering among DC operators.

Decision

Ship two per-node Go probes as separate Tier A modules, each running via a K8s DaemonSet, each emitting a JSON status doc and updating a per-node ConfigMap consumed by the dashboard:

  • pkg/ztp-prem-detect/ (camada 2) — CC detection. Heuristics: /sys/module/sev_guest, /dev/sev-guest, cpuid SEV bits, equivalent TDX and Arm CCA signals.
  • pkg/ztp-prem-tpm/ (camada 3) — TPM 2.0 surface detection. Heuristics: /dev/tpmrm0, /dev/tpm0, /sys/class/tpm/tpm0, family classification (tpm2-rm, tpm2-raw, tpm1.2, absent, unknown).

Both probes emit:

  • Boolean signals (kernel module loaded? device node present? sysfs node present?)
  • A heuristic mode classification
  • Generation timestamp + wave anchor

Privacy: neither probe emits PCR values, device identifiers, or vendor secrets to stdout, ConfigMap, or sealed audit log. PCR-hash inclusion in the audit chain (ADR 0029) is Wave 11-B work, planned v6.0+ — explicitly deferred.

Dashboard: dashboard/src/components/CCStatusCard.tsx + the planned TPM readiness card render fleet-wide state on /admin/ztp-prem so the operator can see CC + TPM coverage without SSH'ing into nodes.

Future (v6.0+, Wave 11-B):

  • TPM PCR0..PCR7 hash inclusion in sealed audit (boot-time anchor)
  • Sealed-key release against PCR quote (TPM unseal API)
  • Admission webhook (ADR 0030) refusing to schedule Tier B Pods on non-CC nodes
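
The difference between today's advisory behaviour and the planned mandatory gate can be illustrated with a small decision function. This is a sketch under stated assumptions, not the ADR 0030 webhook: the Mode and Decision types, the decide function, and the "B" tier string are hypothetical, and a real webhook would read the node's CC state from the probe output rather than take it as a boolean.

```go
package main

import "fmt"

// Mode selects how probe results are used. Names are hypothetical.
type Mode int

const (
	Advisory  Mode = iota // v3.7.0 behaviour: report only, never refuse
	Mandatory             // planned v6.0+ behaviour: refuse on failure
)

// Decision is the outcome of one admission check.
type Decision struct {
	Allowed bool
	Reason  string // empty when there is nothing to report
}

// decide sketches the planned gate: a Tier B Pod on a non-CC node is
// refused in Mandatory mode and allowed with a warning in Advisory mode.
func decide(mode Mode, tier string, nodeHasCC bool) Decision {
	if tier != "B" || nodeHasCC {
		return Decision{Allowed: true}
	}
	reason := "Tier B Pod targeted at a node without Confidential Computing"
	if mode == Mandatory {
		return Decision{Allowed: false, Reason: reason}
	}
	return Decision{Allowed: true, Reason: reason}
}

func main() {
	fmt.Printf("%+v\n", decide(Advisory, "B", false))
	fmt.Printf("%+v\n", decide(Mandatory, "B", false))
}
```

In this sketch the two modes share one code path and differ only in the final branch, which is the sense in which the enforcement upgrade could reduce to a configuration change.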

Consequences

Pros:

  • Camada 2 + camada 3 ship in v3.7.0 as probes: the surface exists today, and the enforcement upgrade in v6.0+ is purely a config flip
  • Probes are Tier A: open, distroless, customer-auditable
  • Detection separation means a host without TPM still passes CC detection (and vice versa), which gives useful diagnostic granularity
  • v6.0+ enforcement gives a hard reason to bump the major (clean story for the release note)

Cons:

  • Detection ≠ attestation; a malicious host can fake the sysfs signal. v3.7.0 explicitly does not promise to defeat that; it promises only to detect honest hosts. v6.0+ closes the gap with real PCR quoting.
  • The v6.0+ migration will break legacy deploys without CC-capable hosts. We commit to this break in the umbrella ADR (0026).
  • Arm CCA hosts are rare today; the heuristic is best-effort.

Reversibility: high for v3.7.0 (probes are advisory only). Low for v6.0+ once we ship mandatory enforcement.


Last verified against shipping code: v3.7.0 (2026-05-12).