Runbook — SPKI pin rotation (bootstrap-controller)

The on-prem bootstrap-controller pins the SHA-256 hash of the SPKI (SubjectPublicKeyInfo) of each certificate in app.tlsstress.art's chain (leaf + Google Trust Services intermediate + root) to defeat a rogue-CA / DNS / BGP MITM of the /api/install/manifest supply-chain response. Pinning is fail-closed: if no presented certificate's SPKI matches a pin, the TLS handshake aborts and the controller cannot reach the cloud.

The pins are baked into the controller image (DefaultSPKIPins in pkg/bootstrap-controller/internal/cloudclient/cloudclient.go). A legitimate CA rotation by Google Trust Services (new intermediate, or — rarely — a new root) will therefore break the handshake for every deployed controller until the pins are updated. This runbook covers both the fast operational fix and the permanent fix.
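
For orientation, a fail-closed SPKI check of this kind can be sketched in Go roughly as follows. This is a minimal illustration, not the code in cloudclient.go: the names spkiPin, matchesPin, checkChain and pinnedTLSConfig are hypothetical, and the error text only loosely mirrors the log line quoted in this runbook. It assumes the check runs in tls.Config.VerifyConnection, on top of the standard chain verification rather than instead of it.

```go
package main

import (
	"crypto/sha256"
	"crypto/tls"
	"crypto/x509"
	"encoding/base64"
	"fmt"
)

// spkiPin returns base64(SHA-256(spkiDER)), where spkiDER is the DER-encoded
// SubjectPublicKeyInfo of a certificate.
func spkiPin(spkiDER []byte) string {
	sum := sha256.Sum256(spkiDER)
	return base64.StdEncoding.EncodeToString(sum[:])
}

// matchesPin reports whether at least one certificate's SPKI pin is in pins.
func matchesPin(certs []*x509.Certificate, pins map[string]bool) bool {
	for _, c := range certs {
		if pins[spkiPin(c.RawSubjectPublicKeyInfo)] {
			return true
		}
	}
	return false
}

// checkChain fails closed: no match (including an empty chain) is an error.
func checkChain(certs []*x509.Certificate, pins map[string]bool, host string) error {
	if !matchesPin(certs, pins) {
		return fmt.Errorf("no presented certificate SPKI matched the %d pinned key(s) for %s",
			len(pins), host)
	}
	return nil
}

// pinnedTLSConfig (hypothetical helper) keeps normal certificate verification
// enabled and adds the pin check, so a handshake needs to pass both.
func pinnedTLSConfig(host string, pins map[string]bool) *tls.Config {
	return &tls.Config{
		ServerName: host,
		MinVersion: tls.VersionTLS12,
		VerifyConnection: func(cs tls.ConnectionState) error {
			return checkChain(cs.PeerCertificates, pins, host)
		},
	}
}

func main() {
	// "<root-b64>" is a placeholder, as elsewhere in this runbook.
	cfg := pinnedTLSConfig("app.tlsstress.art", map[string]bool{"<root-b64>": true})
	// An empty chain can never match a pin, so the check returns an error.
	fmt.Println(cfg.VerifyConnection(tls.ConnectionState{}))
}
```

Because any single match is enough, a rotation only breaks the handshake once none of the pinned keys appears in the chain the server presents.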

Symptom

bootstrap-controller logs:

tls: no presented certificate SPKI matched the N pinned key(s) for app.tlsstress.art — possible MITM/rogue-CA

and heartbeat / manifest / usage-report calls fail with a TLS error. First confirm it is a real rotation, not an attack: from a trusted host,

echo | openssl s_client -connect app.tlsstress.art:443 -servername app.tlsstress.art 2>/dev/null \
  | openssl x509 -noout -issuer -subject

and verify the issuer is still Google Trust Services and the cert is valid (e.g. in an independent browser / curl -v). If anything looks off, treat it as a security incident — do NOT widen the pins.

Compute the new pins

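For each cert in the presented chain (leaf, intermediate, root), save it to its own PEM file (openssl x509 reads only the first certificate in a file) and run:

# <cert.pem> holds exactly one certificate from the chain: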
openssl x509 -in <cert.pem> -pubkey -noout \
  | openssl pkey -pubin -outform der \
  | openssl dgst -sha256 -binary \
  | openssl enc -base64

You can pull the full chain with:

openssl s_client -connect app.tlsstress.art:443 -servername app.tlsstress.art -showcerts </dev/null 2>/dev/null
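
As an alternative to splitting the chain into files by hand, the per-cert computation can be sketched as a small Go helper that reads the s_client -showcerts output on stdin and prints one pin per certificate. This is an illustrative tool, not something shipped with the controller; the names pinnedCert and chainPins and the file name pins.go are made up for this example. The pin is the same value the openssl pipeline above produces: base64 of the SHA-256 of the DER-encoded SubjectPublicKeyInfo.

```go
package main

import (
	"crypto/sha256"
	"crypto/x509"
	"encoding/base64"
	"encoding/pem"
	"fmt"
	"io"
	"os"
)

// pinnedCert pairs a certificate subject with its SPKI SHA-256 pin.
type pinnedCert struct {
	Subject string
	Pin     string
}

// chainPins parses every CERTIFICATE block in data and returns its pin.
// Text between PEM blocks (s_client diagnostics) and other block types
// are skipped.
func chainPins(data []byte) ([]pinnedCert, error) {
	var out []pinnedCert
	for {
		var block *pem.Block
		block, data = pem.Decode(data)
		if block == nil {
			break // no further PEM blocks
		}
		if block.Type != "CERTIFICATE" {
			continue
		}
		cert, err := x509.ParseCertificate(block.Bytes)
		if err != nil {
			return nil, fmt.Errorf("parse certificate %d: %w", len(out)+1, err)
		}
		sum := sha256.Sum256(cert.RawSubjectPublicKeyInfo)
		out = append(out, pinnedCert{
			Subject: cert.Subject.String(),
			Pin:     base64.StdEncoding.EncodeToString(sum[:]),
		})
	}
	return out, nil
}

func main() {
	data, err := io.ReadAll(os.Stdin)
	if err != nil {
		fmt.Fprintln(os.Stderr, "read stdin:", err)
		os.Exit(1)
	}
	certs, err := chainPins(data)
	if err != nil {
		fmt.Fprintln(os.Stderr, err)
		os.Exit(1)
	}
	for _, c := range certs {
		fmt.Printf("%s  %s\n", c.Pin, c.Subject)
	}
}
```

Assuming the file is saved as pins.go, pipe the chain into it from the same trusted host:

openssl s_client -connect app.tlsstress.art:443 -servername app.tlsstress.art -showcerts </dev/null 2>/dev/null | go run pins.go

Spot-check at least one line against the openssl pipeline before trusting the output.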

Fast operational fix (no rebuild) — TLSSTRESS_SPKI_PINS

Set the env var on the controller to a comma-separated list of base64 SPKI SHA-256 pins. It REPLACES the baked-in set (so include every level you want to trust — typically leaf + intermediate + root):

# systemd drop-in or K8s Deployment env:
TLSSTRESS_SPKI_PINS="<leaf-b64>,<intermediate-b64>,<root-b64>"

Restart the controller. It logs SPKI pins overridden from TLSSTRESS_SPKI_PINS (N pin(s)). This is the recommended immediate remediation — it ships via config, not a new image, so the whole fleet recovers in one rollout.
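
Because the override replaces the baked-in set and pinning is fail-closed, a malformed value can leave the controller unable to connect. Before rolling the variable out, the value can be sanity-checked with a sketch like the one below. It is a hypothetical pre-flight check, not the controller's own parser: parsePins is an invented name, and trimming whitespace and rejecting empty entries are assumptions of this sketch. The one firm rule it relies on is that a base64 SHA-256 pin decodes to exactly 32 bytes.

```go
package main

import (
	"crypto/sha256"
	"encoding/base64"
	"fmt"
	"os"
	"strings"
)

// parsePins splits a comma-separated pin list and checks that every entry is
// standard base64 decoding to a SHA-256-sized digest (32 bytes).
func parsePins(raw string) ([]string, error) {
	var pins []string
	for i, field := range strings.Split(raw, ",") {
		pin := strings.TrimSpace(field) // assumption: surrounding spaces are tolerated
		if pin == "" {
			return nil, fmt.Errorf("entry %d is empty", i+1)
		}
		digest, err := base64.StdEncoding.DecodeString(pin)
		if err != nil {
			return nil, fmt.Errorf("entry %d is not valid base64: %w", i+1, err)
		}
		if len(digest) != sha256.Size {
			return nil, fmt.Errorf("entry %d decodes to %d bytes, want %d",
				i+1, len(digest), sha256.Size)
		}
		pins = append(pins, pin)
	}
	return pins, nil
}

func main() {
	pins, err := parsePins(os.Getenv("TLSSTRESS_SPKI_PINS"))
	if err != nil {
		fmt.Fprintln(os.Stderr, "TLSSTRESS_SPKI_PINS:", err)
		os.Exit(1)
	}
	fmt.Printf("%d pin(s) look well-formed\n", len(pins))
}
```

This catches unreplaced placeholders such as <leaf-b64>, truncated copy-pastes and stray trailing commas. It does not prove that the pins match the live chain, so still compare them against freshly computed values.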

Tip: pinning the root (GTS Root R4) alone is the most rotation-resilient option, since Google rotates leaves (~90 days) and intermediates far more often than the root. Including all three keeps defence-in-depth as the leaf and intermediate churn.

Permanent fix (image rebuild)

Update DefaultSPKIPins in pkg/bootstrap-controller/internal/cloudclient/cloudclient.go with the new pins (and the capture date in the comment), rebuild + re-sign the controller image, and roll it out. Once every controller runs the new image, the TLSSTRESS_SPKI_PINS override can be removed.

Last-resort (dev / local mirror only)

--insecure-skip-verify (or TLSSTRESS_* equivalent) disables SPKI pinning AND cosign image verification. Never use in production — it removes the supply-chain protection entirely. It exists only for the local-mirror dev path.