VALIDATOR.Art Rebuild
Runbook for rebuilding the VALIDATOR.Art module from scratch (corrupted ML cortex state, schema migration, full reset).
Goal
Restore VALIDATOR.Art to a known-good state while preserving fleet enrollment data. Used when the ML cortex state corrupts or a major schema migration ships.
Prerequisites
- Admin role
- Recent backup confirmed (velero backup for the validator namespace)
- Maintenance window (30 min)
- Notify operators — fleet view + ML predictions unavailable during rebuild
Procedure
Step 1 — Backup current state
velero backup create validator-pre-rebuild-$(date +%Y%m%d) \
--include-namespaces=validator --wait
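The --wait flag blocks until the backup finishes, but a finished backup may still end in a phase other than Completed, so it can be worth checking the phase before moving on. A minimal sketch, assuming `velero backup describe` prints a `Phase:` line; `backup_completed` is a hypothetical helper name, not part of Velero.

```shell
# Hypothetical helper (not part of Velero): succeeds only when the named
# backup reports "Phase: Completed" in `velero backup describe` output.
backup_completed() {
  velero backup describe "$1" | grep -Eq '^Phase:[[:space:]]+Completed'
}

# Usage (run after Step 1):
#   backup_completed "validator-pre-rebuild-$(date +%Y%m%d)" \
#     || echo "backup not Completed; do not proceed" >&2
```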
Step 2 — Export enrollment table (critical preservation)
# No -it here: a TTY can inject carriage returns into the redirected dump.
kubectl exec validator-api -n validator -- \
pg_dump -t fleet_enrollment > enrollment-$(date +%Y%m%d).sql
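Because the redirect is performed by the local shell, the .sql file is created even when pg_dump fails inside the pod, so an empty file is possible. A rough sanity check before continuing, where `dump_looks_sane` is a hypothetical helper name and the check is deliberately shallow (non-empty file that mentions the table), not a full validation of the dump:

```shell
# Hypothetical helper: succeeds only if the dump file is non-empty and
# mentions the fleet_enrollment table at least once.
dump_looks_sane() {
  [ -s "$1" ] && grep -q 'fleet_enrollment' "$1"
}

# Usage (run after Step 2):
#   dump_looks_sane "enrollment-$(date +%Y%m%d).sql" \
#     || echo "enrollment dump is empty or incomplete" >&2
```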
Step 3 — Stop the ML cortex sidecar
kubectl scale deployment validator-ml-cortex -n validator --replicas=0
Step 4 — Wipe ML state but preserve enrollment
kubectl exec -it validator-api -n validator -- psql -c "
TRUNCATE TABLE ml_predictions, ml_models, drift_signals CASCADE;
-- enrollment table NOT truncated
"
Step 5 — Restart ML cortex (cold start)
kubectl scale deployment validator-ml-cortex -n validator --replicas=1
kubectl wait --for=condition=ready pod \
-l app.kubernetes.io/name=validator-ml-cortex \
-n validator --timeout=300s
Step 6 — Verify enrollment intact
kubectl exec -it validator-api -n validator -- psql -c "
SELECT COUNT(*) FROM fleet_enrollment;
"
# expected: same count as before rebuild
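The comparison only works if the count was captured before the wipe in Step 4. One possible way to make that explicit is sketched below; `enrollment_count` and `counts_match` are hypothetical helper names, and the sketch assumes psql inside the pod connects with the same defaults as the commands above.

```shell
# Hypothetical helper: print the current fleet_enrollment row count.
# psql -A (unaligned) and -t (tuples only) reduce the output to the number.
enrollment_count() {
  kubectl exec validator-api -n validator -- \
    psql -At -c "SELECT COUNT(*) FROM fleet_enrollment;"
}

# Hypothetical helper: succeeds only if both counts are non-empty and equal.
counts_match() {
  [ -n "$1" ] && [ "$1" = "$2" ]
}

# Usage:
#   before=$(enrollment_count)   # run before Step 4
#   after=$(enrollment_count)    # run at Step 6
#   counts_match "$before" "$after" || echo "enrollment count changed" >&2
```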
Step 7 — Re-trigger discovery probe
The ML cortex rebuilds the fleet topology graph over 24 hours. Operators see a partial graph during this period, and the HID dashboard flags it as such.
Rollback
If rebuild fails:
# --from-backup must name the Step 1 backup; adjust the date if it was taken on an earlier day.
velero restore create validator-rollback-$(date +%Y%m%d) \
--from-backup validator-pre-rebuild-$(date +%Y%m%d) --wait
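Both names above are derived from today's date, so a rollback that runs after midnight or on a later day would point at a backup that does not exist. A small sketch of one way to pin the name to an explicit date; `pre_rebuild_backup_name` is a hypothetical helper, and the date in the usage line is only an example value.

```shell
# Hypothetical helper: build the Step 1 backup name for a given date
# (YYYYMMDD). Falls back to today's date when no argument is passed.
pre_rebuild_backup_name() {
  echo "validator-pre-rebuild-${1:-$(date +%Y%m%d)}"
}

# Usage (example date, replace with the day Step 1 actually ran):
#   velero restore create "validator-rollback-$(date +%Y%m%d)" \
#     --from-backup "$(pre_rebuild_backup_name 20250131)" --wait
```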
Success criteria
- Enrollment count preserved
- ML cortex pod healthy
- HID dashboard shows "ML rebuilding" status
- Discovery probe re-engaged
- No alert flap during rebuild