CLONER.Art Egress Troubleshooting¶
Runbook for diagnosing CLONER egress failures (catalog refresh, Tranco fetch, vendor docsite scrape).
Goal¶
Quickly isolate where the egress path broke and restore it.
Symptoms¶
- /admin/cloner shows last refresh failed
- Alert CLONERFnFailure fired
- Operator sees stale catalog data
Decision tree¶
                Is the bench air-gapped?
                            │
          ┌─────────────────┴─────────────────┐
          │ Yes (typical)                     │ No
          ▼                                   ▼
Is OBP session active?                  Direct egress:
          │                           check Internet path
 ┌────────┴────────┐
 │ No              │ Yes
 ▼                 ▼
→ Authorize    Is allowlist
  OBP session  blocking the
               destination?
                     │
             ┌───────┴───────┐
             │ Yes           │ No
             ▼               ▼
           → Block       Check CLONER
             expected;   pod logs for
             operator    actual error
             blocked
             this dest
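The tree above can also be written as a small shell function, which is handy when scripting triage. This is a minimal sketch, not part of CLONER or OBP: next_step is a hypothetical helper name, and its three yes/no arguments stand for the answers to the tree's questions, top to bottom.

```shell
# Hypothetical helper mirroring the decision tree; not shipped with CLONER.
# Usage: next_step <air_gapped> <obp_session_active> <allowlist_blocks_dest>
# Each argument is "yes" or "no"; anything else is treated as "no".
next_step() {
  air_gapped=${1:-no}
  obp_active=${2:-no}
  allowlist_blocks=${3:-no}

  if [ "$air_gapped" != "yes" ]; then
    # Bench has direct egress, so OBP is not involved.
    echo "check the direct Internet path"
  elif [ "$obp_active" != "yes" ]; then
    echo "authorize an OBP session"
  elif [ "$allowlist_blocks" = "yes" ]; then
    echo "block is expected: destination is not in the allowlist"
  else
    echo "check CLONER pod logs for the actual error"
  fi
}

# Example: air-gapped bench, session active, destination allowed
# next_step yes yes no
```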
Diagnostic commands¶
Check egress preference order¶
curl -fsS https://dashboard.tlsstress.art/api/cloner/egress-status
Expected output:
{
  "direct_internet_available": false,
  "obp_session_active": true,
  "obp_session_expires_at": "2026-05-11T14:23:00Z",
  "fallback": "obp"
}
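To turn that payload into a one-word answer, something like the following may help. It is a rough sketch under assumptions: egress_path is a hypothetical helper, the preference order (direct first, OBP as fallback) is inferred from the "fallback" field, and the grep matching only works for the flat field layout shown in the expected output. A real JSON parser such as jq would be more robust.

```shell
# Hypothetical helper: reads the egress-status JSON on stdin and prints
# "direct", "obp", or "none". Assumes direct egress is preferred over OBP.
egress_path() {
  status=$(cat)
  if printf '%s\n' "$status" |
      grep -Eq '"direct_internet_available"[[:space:]]*:[[:space:]]*true'; then
    echo "direct"
  elif printf '%s\n' "$status" |
      grep -Eq '"obp_session_active"[[:space:]]*:[[:space:]]*true'; then
    echo "obp"
  else
    # Neither path is usable; authorizing an OBP session is the usual next step.
    echo "none"
  fi
}

# Example:
# curl -fsS https://dashboard.tlsstress.art/api/cloner/egress-status | egress_path
```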
Check CLONER pod logs¶
kubectl logs -n clone-serve cloner-0 --tail=200
Look for:
- egress refused: not in allowlist → allowlist gap
- connection refused → upstream destination down
- cert verify failed → cert chain issue, possibly OBP-side
- dial tcp: i/o timeout → network path broken
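The pattern-to-cause mapping above can be applied mechanically to the log tail. A minimal sketch, assuming the messages appear verbatim somewhere in each log line; classify_log_line is a hypothetical helper, not a CLONER tool.

```shell
# Hypothetical helper: maps one log line to the likely cause from the list above.
classify_log_line() {
  case "$1" in
    *"egress refused: not in allowlist"*) echo "allowlist gap" ;;
    *"connection refused"*)               echo "upstream destination down" ;;
    *"cert verify failed"*)               echo "cert chain issue, possibly OBP-side" ;;
    *"dial tcp: i/o timeout"*)            echo "network path broken" ;;
    *)                                    echo "unclassified" ;;
  esac
}

# Example: summarize the causes seen in the last 200 lines
# kubectl logs -n clone-serve cloner-0 --tail=200 |
#   while IFS= read -r line; do classify_log_line "$line"; done |
#   sort | uniq -c
```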
Check OBP session¶
curl -fsS https://dashboard.tlsstress.art/api/obp/status \
  -H "Authorization: Bearer $TOKEN"
Check allowlist¶
The allowlist is hard-coded in the OBP binary and signed at build time. To view the shipped allowlist:
gh release view obp-latest --json assets \
  -q '.assets[] | select(.name == "obp-allowlist.json") | .url' \
  | xargs curl -fsSL
Common fixes¶
Fix 1 — OBP session not authorized¶
Operator opens dashboard → /admin/obp/authorize → click "Authorize 30-min session".
Fix 2 — Destination not in allowlist¶
The allowlist is intentionally restricted. If you need a new destination:
1. File an issue in the OBP repo with the destination + reason
2. Wait for next OBP release (signed allowlist update)
3. Update OBP daemon on operator notebook
No runtime workaround — by design.
Fix 3 — Stale catalog cache¶
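# Run rm through a shell in the pod so the * glob expands there, not locally (assumes the image ships sh)
kubectl exec -it cloner-0 -n clone-serve -- sh -c 'rm -rf /cache/catalog/*'
# The pod name cloner-0 suggests a StatefulSet; if so, restart statefulset cloner instead
kubectl rollout restart deployment cloner -n clone-serve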
Fix 4 — Cert chain issue post-OBP¶
Most likely cause: cert-manager rotated bench-side mTLS cert; OBP's copy is stale. Force CA.Art to push fresh cert to OBP:
curl -X POST "https://dashboard.tlsstress.art/api/ca/rotate-mtls-cert?target=obp" \
  -H "Authorization: Bearer $TOKEN"
Escalation¶
If none of the fixes resolve the failure:
1. Capture diagnostic bundle: bench-support diag --include-cloner --include-obp
2. File P2 issue with the bundle
3. Page the bench on-call rotation