VALIDATOR.Art Rebuild

Runbook for rebuilding the VALIDATOR.Art MÓDULO from scratch (corrupted ML cortex state, schema migration, full reset).

Goal

Restore VALIDATOR.Art to a known-good state while preserving fleet enrollment data. Use this runbook when the ML cortex state is corrupted or a major schema migration ships.

Prerequisites

  • Admin role
  • Recent backup confirmed (velero backup for validator namespace)
  • Maintenance window (30 min)
  • Notify operators: fleet view and ML predictions are unavailable during the rebuild

Procedure

Step 1 — Backup current state

velero backup create validator-pre-rebuild-$(date +%Y%m%d) \
  --include-namespaces=validator --wait
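
Before any destructive step, it may be worth confirming that the backup actually reached the Completed phase. The helper below is a minimal sketch: the function name backup_completed is hypothetical, and it assumes that `velero backup describe` prints a "Phase:" line, which may vary between Velero versions.

```shell
# Sketch: succeed only if the named Velero backup reports Phase: Completed.
# Assumes `velero backup describe` prints a line such as "Phase:  Completed".
backup_completed() {
  # $1: backup name, e.g. the name created in Step 1
  velero backup describe "$1" |
    grep -Eq 'Phase:[[:space:]]+Completed[[:space:]]*$'
}
```

Example use: backup_completed "validator-pre-rebuild-$(date +%Y%m%d)" || echo "backup not complete, stop here" >&2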

Step 2 — Export enrollment table (critical preservation)

# No -it here: a TTY would add carriage returns to the redirected dump.
kubectl exec validator-api -n validator -- \
  pg_dump -t fleet_enrollment > enrollment-$(date +%Y%m%d).sql
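
Step 6 compares against the row count "before rebuild", so that number has to be captured at this point. A minimal sketch of one way to record it follows. The function names and the BASELINE_FILE path are hypothetical, and the psql connection defaults are assumed to be the same ones the other steps rely on.

```shell
# Sketch: record the pre-rebuild row count so Step 6 has a baseline.
# BASELINE_FILE is a hypothetical local path, not part of the runbook.
BASELINE_FILE="${BASELINE_FILE:-enrollment-count-before.txt}"

enrollment_count() {
  # -A (unaligned) and -t (tuples only) make psql print just the number.
  kubectl exec validator-api -n validator -- \
    psql -At -c "SELECT COUNT(*) FROM fleet_enrollment;"
}

record_baseline() {
  count=$(enrollment_count) || return 1
  # Refuse to record anything that is not a plain non-negative integer.
  case "$count" in
    ''|*[!0-9]*) echo "unexpected count: '$count'" >&2; return 1 ;;
  esac
  echo "$count" > "$BASELINE_FILE"
  echo "baseline fleet_enrollment rows: $count"
}
```

Running record_baseline once, before Step 4, leaves the count in the file for later comparison.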

Step 3 — Stop the ML cortex sidecar

kubectl scale deployment validator-ml-cortex -n validator --replicas=0
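
Scaling to zero returns before the pods have actually terminated, so a cortex pod could in principle still be writing when Step 4 runs. The sketch below polls until no matching pods remain. The function name is hypothetical, and the label selector is the one Step 5 uses, assumed to match every cortex pod.

```shell
# Sketch: block until no ML cortex pods remain in the validator namespace.
wait_for_cortex_stopped() {
  # $1: max seconds to wait (default 120); $2: poll interval (default 5)
  timeout="${1:-120}"
  interval="${2:-5}"
  waited=0
  while :; do
    # -o name prints one "pod/<name>" line per pod and nothing when none match.
    pods=$(kubectl get pods -n validator \
      -l app.kubernetes.io/name=validator-ml-cortex -o name 2>/dev/null) || {
      echo "kubectl get pods failed" >&2
      return 1
    }
    if [ -z "$pods" ]; then
      return 0
    fi
    if [ "$waited" -ge "$timeout" ]; then
      echo "cortex pods still present after ${waited}s: $pods" >&2
      return 1
    fi
    sleep "$interval"
    waited=$((waited + interval))
  done
}
```

Example use: wait_for_cortex_stopped 120 5 || echo "do not truncate yet" >&2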

Step 4 — Wipe ML state but preserve enrollment
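
In PostgreSQL, TRUNCATE ... CASCADE also empties every table that has a foreign key referencing one of the named tables. If fleet_enrollment references ml_predictions, ml_models, or drift_signals, the command in this step would wipe it too. The pre-check below is a sketch under that concern: the function names are hypothetical, and it only inspects direct references, not longer foreign-key chains.

```shell
# Sketch: list tables with a foreign key pointing at the tables to be truncated.
cascade_targets() {
  kubectl exec validator-api -n validator -- psql -At -c "
    SELECT DISTINCT conrelid::regclass
    FROM pg_constraint
    WHERE contype = 'f'
      AND confrelid IN ('ml_predictions'::regclass,
                        'ml_models'::regclass,
                        'drift_signals'::regclass);"
}

enrollment_safe_from_cascade() {
  targets=$(cascade_targets) || return 1
  # Fail if fleet_enrollment (optionally schema-qualified) is among them.
  if printf '%s\n' "$targets" | grep -Eq '(^|\.)fleet_enrollment$'; then
    echo "fleet_enrollment references an ML table; CASCADE would empty it" >&2
    return 1
  fi
}
```

Example use: enrollment_safe_from_cascade || echo "stop: review the schema before truncating" >&2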

kubectl exec -it validator-api -n validator -- psql -c "
  TRUNCATE TABLE ml_predictions, ml_models, drift_signals CASCADE;
  -- enrollment table NOT truncated
"

Step 5 — Restart ML cortex (cold start)

kubectl scale deployment validator-ml-cortex -n validator --replicas=1
kubectl wait --for=condition=ready pod \
  -l app.kubernetes.io/name=validator-ml-cortex \
  -n validator --timeout=300s
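
If kubectl wait runs before the new pod object exists, it can exit immediately with a "no matching resources found" error. Waiting on the Deployment instead avoids that race. The following is a sketch of that alternative, with a hypothetical function name; it uses the same Deployment and the same 300 s timeout as the commands above.

```shell
# Sketch: scale the cortex back up and wait on the Deployment, not on pods.
restart_cortex() {
  kubectl scale deployment validator-ml-cortex -n validator --replicas=1 &&
    kubectl rollout status deployment/validator-ml-cortex \
      -n validator --timeout=300s
}
```

rollout status exits non-zero if the Deployment does not become available within the timeout, so restart_cortex can gate the following steps.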

Step 6 — Verify enrollment intact

kubectl exec -it validator-api -n validator -- psql -c "
  SELECT COUNT(*) FROM fleet_enrollment;
"
# Expected: the same count as before Step 4. Run this query once before the wipe to record it.
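
The comparison can also be scripted so that a mismatch fails loudly instead of relying on a visual check. This is a minimal sketch with a hypothetical function name; the expected count is passed in as an argument and is assumed to have been noted before the wipe.

```shell
# Sketch: compare the current fleet_enrollment row count with a known baseline.
verify_enrollment_count() {
  # $1: row count noted before Step 4
  expected="$1"
  actual=$(kubectl exec validator-api -n validator -- \
    psql -At -c "SELECT COUNT(*) FROM fleet_enrollment;") || return 1
  if [ "$actual" = "$expected" ]; then
    echo "OK: fleet_enrollment has $actual rows"
  else
    echo "MISMATCH: expected $expected rows, found $actual" >&2
    return 1
  fi
}
```

A non-zero exit from verify_enrollment_count is a signal to go to the Rollback section rather than continue.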

Step 7 — Re-trigger discovery probe

The ML cortex rebuilds the fleet topology graph over the following 24 hours. Operators see a partial graph during this period, and the HID dashboard flags it as such.

Rollback

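If the rebuild fails, restore from the backup taken in Step 1:

# Use the exact name created in Step 1; adjust it if the date has changed since then.
BACKUP_NAME=validator-pre-rebuild-$(date +%Y%m%d)
velero restore create validator-rollback-$(date +%Y%m%d) \
  --from-backup "$BACKUP_NAME" --wait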

Success criteria

  • Enrollment count preserved
  • ML cortex pod healthy
  • HID dashboard shows "ML rebuilding" status
  • Discovery probe re-engaged
  • No alert flap during rebuild