Runbook: SPKI pin rotation (bootstrap-controller)
The on-prem bootstrap-controller pins the SHA-256 hashes of the SPKI
(SubjectPublicKeyInfo) of app.tlsstress.art's certificate chain (leaf, Google
Trust Services intermediate, and root) to defeat a man-in-the-middle attack on
the /api/install/manifest supply-chain response via a rogue CA, DNS hijack, or
BGP hijack. Pinning is fail-closed: if no presented certificate's SPKI hash
matches a pin, the TLS handshake aborts and the controller cannot reach the
cloud.
The pins are baked into the controller image (DefaultSPKIPins in
pkg/bootstrap-controller/internal/cloudclient/cloudclient.go). A legitimate
CA rotation by Google Trust Services (a new intermediate or, rarely, a new root)
will therefore break the handshake for every deployed controller once none of
the baked-in pins appears in the presented chain, and it stays broken until the
pins are updated. This runbook covers both the fast operational fix and the
permanent fix.
Symptom
bootstrap-controller logs:
tls: no presented certificate SPKI matched the N pinned key(s) for app.tlsstress.art — possible MITM/rogue-CA
and heartbeat, manifest, and usage-report calls fail with a TLS error.
First confirm that this is a genuine rotation and not an attack. From a trusted host, run:
echo | openssl s_client -connect app.tlsstress.art:443 -servername app.tlsstress.art 2>/dev/null \
| openssl x509 -noout -issuer -subject -dates
and verify that the issuer is still Google Trust Services, that the notBefore/notAfter
dates cover the current time, and that the certificate validates independently
(for example in a browser, or with curl -v). If anything looks off, treat it as a
security incident: do NOT widen the pins.
Compute the new pins
First save the chain the server presents (leaf first; the root appears only if the server sends it):
openssl s_client -connect app.tlsstress.art:443 -servername app.tlsstress.art -showcerts </dev/null 2>/dev/null > chain.txt
Split chain.txt into one PEM file per certificate, then compute the pin for each one (leaf, intermediate, root):
openssl x509 -in <cert.pem> -pubkey -noout \
| openssl pkey -pubin -outform der \
| openssl dgst -sha256 -binary \
| openssl enc -base64
Fast operational fix (no rebuild): TLSSTRESS_SPKI_PINS
Set the environment variable on the controller to a comma-separated list of base64 SPKI SHA-256 pins. It REPLACES the baked-in set, so include every level you want to trust (typically leaf, intermediate, and root):
# systemd drop-in or K8s Deployment env:
TLSSTRESS_SPKI_PINS="<leaf-b64>,<intermediate-b64>,<root-b64>"
Restart the controller. On startup it logs:
SPKI pins overridden from TLSSTRESS_SPKI_PINS (N pin(s))
This is the recommended immediate remediation: it ships as configuration rather
than a new image, so the whole fleet recovers in a single rollout.
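Tip: pinning only the root (GTS Root R4) is the most rotation-resilient option, because Google rotates leaves (roughly every 90 days) and intermediates far more often than the root. Because only presented certificates are checked, a root pin can match only if the server actually sends the root (or a cross-signed copy with the same key), so confirm that in the -showcerts output first. Including all three levels keeps defence in depth while the leaf and intermediate churn.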
Permanent fix (image rebuild)
Update DefaultSPKIPins in
pkg/bootstrap-controller/internal/cloudclient/cloudclient.go with the new pins
(and the capture date in the comment), rebuild + re-sign the controller image,
and roll it out. Once every controller runs the new image, the
TLSSTRESS_SPKI_PINS override can be removed.
Last resort (dev / local mirror only)
--insecure-skip-verify (or TLSSTRESS_* equivalent) disables SPKI pinning AND
cosign image verification. Never use it in production: it removes the
supply-chain protection entirely. It exists only for the local-mirror dev path.