CLONER.Art Egress Troubleshooting

Runbook for diagnosing CLONER egress failures (catalog refresh, Tranco fetch, vendor docsite scrape).

Goal

Quickly isolate where the egress path broke and restore it.

Symptoms

  • /admin/cloner shows last refresh failed
  • Alert CLONERFnFailure fired
  • Operator sees stale catalog data

Decision tree

                       Is the bench air-gapped?
                              │
            ┌─────────────────┴─────────────────┐
            │ Yes (typical)                     │ No
            ▼                                   ▼
   Is OBP session active?              Direct egress —
            │                          check Internet path
   ┌────────┴────────┐
   │ No              │ Yes
   ▼                 ▼
 → Authorize       Is allowlist
   OBP session     blocking the
                   destination?
                        │
                ┌───────┴───────┐
                │ Yes           │ No
                ▼               ▼
              → Block         Check CLONER
                expected;       pod logs for
                operator        actual error
                blocked
                this dest

Diagnostic commands

Check egress preference order

curl -fsS https://dashboard.tlsstress.art/api/cloner/egress-status

Expected output:

{
  "direct_internet_available": false,
  "obp_session_active": true,
  "obp_session_expires_at": "2026-05-11T14:23:00Z",
  "fallback": "obp"
}
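
The first two branches of the decision tree can be read straight off this response. The following is a minimal sketch, assuming that direct_internet_available being false corresponds to an air-gapped bench; the function name next_step is hypothetical and not part of any CLONER tooling.

```shell
# next_step DIRECT_AVAILABLE OBP_ACTIVE
# Maps the two booleans from egress-status to the next runbook step.
# Assumption: direct_internet_available=false means the bench is air-gapped.
next_step() {
  direct="$1"
  obp_active="$2"
  if [ "$direct" = "true" ]; then
    echo "Direct egress: check Internet path"
  elif [ "$obp_active" != "true" ]; then
    echo "Authorize OBP session"
  else
    echo "Check allowlist, then CLONER pod logs"
  fi
}

# Example with the values from the expected output above
next_step false true
```

Against the live endpoint, the two arguments could be pulled from the response with jq -r .direct_internet_available and jq -r .obp_session_active.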

Check CLONER pod logs

kubectl logs -n clone-serve cloner-0 --tail=200

Look for:

  • egress refused: not in allowlist → allowlist gap
  • connection refused → upstream destination down
  • cert verify failed → cert chain issue, possibly OBP-side
  • dial tcp: i/o timeout → network path broken
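
The four log signatures above can be matched mechanically. This is a rough sketch, assuming the messages appear verbatim as substrings of the log lines; the function name classify_cloner_log is hypothetical.

```shell
# classify_cloner_log: reads log lines on stdin and prints the likely cause
# for each line that matches one of the four signatures listed above.
# Assumption: the messages appear verbatim somewhere in the log line.
classify_cloner_log() {
  while IFS= read -r line; do
    case "$line" in
      *"egress refused: not in allowlist"*) echo "allowlist gap" ;;
      *"connection refused"*)               echo "upstream destination down" ;;
      *"cert verify failed"*)               echo "cert chain issue, possibly OBP-side" ;;
      *"dial tcp"*"i/o timeout"*)           echo "network path broken" ;;
    esac
  done
}

# Example on a made-up log line
printf 'fetch failed: connection refused\n' | classify_cloner_log
```

To summarize a real pod, pipe the kubectl logs command above into classify_cloner_log | sort | uniq -c.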

Check OBP session

curl -fsS https://dashboard.tlsstress.art/api/obp/status \
  -H "Authorization: Bearer $TOKEN"
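
An active session may still be close to expiry. The response shape of /api/obp/status is not documented in this runbook, so the sketch below works from the obp_session_expires_at timestamp shown in the egress-status output instead. It assumes GNU date (the -d flag); the function name seconds_until and the 5-minute threshold are illustrative only.

```shell
# seconds_until EXPIRES_AT [NOW_EPOCH]
# Prints seconds remaining until an ISO 8601 UTC timestamp (negative if past).
# Assumption: GNU date; BSD/macOS date needs different flags.
seconds_until() {
  expires_epoch=$(date -u -d "$1" +%s) || return 1
  now_epoch="${2:-$(date -u +%s)}"
  echo $((expires_epoch - now_epoch))
}

# Example: warn when fewer than 5 minutes remain on the sample session
if remaining=$(seconds_until "2026-05-11T14:23:00Z") && [ "$remaining" -lt 300 ]; then
  echo "OBP session expired or expiring soon; re-authorize"
fi
```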

Check allowlist

The allowlist is hard-coded in the OBP binary, which is signed at build time. To view the shipped allowlist:

gh release view obp-latest --json assets \
  -q '.assets[] | select(.name == "obp-allowlist.json") | .url' \
  | xargs curl -fsSL
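
Once the allowlist is saved locally, a destination can be checked against it. This is a sketch only: the schema of obp-allowlist.json is not documented here, so it assumes destinations appear as plain hostnames in quoted JSON strings. The function name in_allowlist is hypothetical.

```shell
# in_allowlist FILE HOST
# Succeeds if HOST appears as a complete quoted JSON string anywhere in FILE.
# Assumption: the allowlist stores destinations as plain hostname strings;
# adjust the match if the real schema uses URLs or patterns.
in_allowlist() {
  grep -qF "\"$2\"" "$1"
}
```

For example, in_allowlist obp-allowlist.json example.com && echo allowed || echo "not in allowlist", where example.com stands in for the destination under test.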

Common fixes

Fix 1 — OBP session not authorized

The operator opens the dashboard, goes to /admin/obp/authorize, and clicks "Authorize 30-min session".

Fix 2 — Destination not in allowlist

The allowlist is intentionally restricted. If you need a new destination:

  1. File an issue in the OBP repo with the destination + reason
  2. Wait for next OBP release (signed allowlist update)
  3. Update OBP daemon on operator notebook

No runtime workaround — by design.

Fix 3 — Stale catalog cache

# Quote the glob so it expands inside the pod, not in the local shell
kubectl exec cloner-0 -n clone-serve -- sh -c 'rm -rf /cache/catalog/*'
kubectl rollout restart deployment cloner -n clone-serve

Fix 4 — Cert chain issue post-OBP

Most likely cause: cert-manager rotated the bench-side mTLS cert, so OBP's copy is stale. Force CA.Art to push a fresh cert to OBP:

curl -fsS -X POST "https://dashboard.tlsstress.art/api/ca/rotate-mtls-cert?target=obp" \
  -H "Authorization: Bearer $TOKEN"

Escalation

If none of the fixes resolves the failure:

  1. Capture diagnostic bundle: bench-support diag --include-cloner --include-obp
  2. File P2 issue with the bundle
  3. Page the bench on-call rotation