Pre-flight checks: validating the lab before runs
Scope status (post-Scope-Freeze 2026-05-10): see ARCHITECTURE.md for the canonical 37 MÓDULOs + 7 Test Kinds + DOM/CPOS/PIE-PA safety architecture. ADRs 0014, 0019-0025 cover post-Freeze additions.
The pre-flight check engine is a read-only validator that runs against registered DUT API devices before the operator triggers a Test Plan run. It catches "lab not in expected state" conditions early, so runs do not produce forensically worthless data.
The principle: garbage in, garbage out. If the NGFW has pending deploy changes, or the decrypt policy is off when the plan demands decrypt-on, the resulting p99 numbers tell you nothing useful. A failing pre-flight flags this before the run starts, so the operator can fix the lab instead of collecting misleading data.
How it fits in the operator workflow
Operator picks plan → Runs preflight → Reviews failures → Fixes lab state →
Triggers snapshot → Re-runs preflight → All green → Starts the actual test run
Pre-flight is manually invoked today (POST endpoint). In a future PR (PR-D), the Test Plan engine will gate run-start on a passing preflight automatically.
API
POST /api/test-runs/preflight
curl -X POST "https://dashboard.example/api/test-runs/preflight" \
  -H "Authorization: Bearer $ADMIN_TOKEN" \
  -H "content-type: application/json" \
  -d '{"planIdentifier": "BASELINE-SLO-30M"}'
Response (success — 200):
{
"planIdentifier": "BASELINE-SLO-30M",
"ranAt": "2026-05-06T14:35:00.000Z",
"checksRun": 8,
"checksPassed": 8,
"checksFailed": 0,
"checksSkipped": 0,
"pass": true,
"checks": [
{
"checkId": "ngfw-deploy-clean",
"description": "NGFW deploy state is DEPLOYED — no pending config changes",
"deviceHostname": "ftd-1.lab.example.com",
"vendor": "cisco-ftd",
"pass": true,
"detail": "state=DEPLOYED — no pending changes",
"evidence": {
"snapshotId": "...",
"payloadSha256": "abc123...",
"collectedAt": "2026-05-06T14:34:42.000Z"
}
},
...
],
"summary": "8/8 checks passed — lab is ready for plan BASELINE-SLO-30M"
}
Response (any check failed — 422):
{
"planIdentifier": "BASELINE-SLO-30M",
"ranAt": "2026-05-06T14:35:00.000Z",
"checksRun": 8,
"checksPassed": 6,
"checksFailed": 1,
"checksSkipped": 1,
"pass": false,
"checks": [
...
{
"checkId": "ngfw-decrypt-state-matches-plan",
"deviceHostname": "ftd-1.lab.example.com",
"vendor": "cisco-ftd",
"pass": false,
"detail": "plan requires decrypt-on but no decrypt rules configured",
"evidence": { ... }
}
],
"summary": "1 check(s) failed, 1 skipped (missing snapshots) — review details"
}
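Both example responses share one structure, so a client can type the payload once and branch on the HTTP status. The sketch below is illustrative only: the type names (`PreflightReport`, `PreflightCheck`, `PreflightEvidence`) and the helper `interpretPreflight` are hypothetical, the field list is inferred from the example payloads rather than from a published schema, and treating any status other than 200 or 422 as an error is an assumption.

```typescript
// Field names mirror the example payloads; the type names are hypothetical.
interface PreflightEvidence {
  snapshotId: string;
  payloadSha256: string;
  collectedAt: string; // ISO 8601 timestamp
}

interface PreflightCheck {
  checkId: string;
  description?: string; // absent from the 422 example, so treated as optional
  deviceHostname: string;
  vendor: string;
  pass: boolean;
  detail: string;
  evidence: PreflightEvidence | null; // null when the check was skipped
}

interface PreflightReport {
  planIdentifier: string;
  ranAt: string;
  checksRun: number;
  checksPassed: number;
  checksFailed: number;
  checksSkipped: number;
  pass: boolean;
  checks: PreflightCheck[];
  summary: string;
}

type PreflightOutcome =
  | { ready: true; report: PreflightReport }
  | { ready: false; report: PreflightReport; blocking: PreflightCheck[] };

// Assumption: 200 means every check passed, 422 means at least one did not,
// and anything else is a transport or auth problem rather than a lab verdict.
function interpretPreflight(status: number, body: PreflightReport): PreflightOutcome {
  if (status === 200 && body.pass) {
    return { ready: true, report: body };
  }
  if (status === 422) {
    // Failed and skipped checks both carry pass: false, so both block the run.
    const blocking = body.checks.filter((check) => !check.pass);
    return { ready: false, report: body, blocking };
  }
  throw new Error(`unexpected preflight response: HTTP ${status}`);
}
```

A caller would start the run only when `ready` is true, and otherwise show each `blocking[].detail` to the operator.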
Check catalog (current)
| Check ID | Applies to | Endpoint label | What it validates |
|---|---|---|---|
| ngfw-deploy-clean | Cisco FTD | deploy_status | state == 'DEPLOYED' (no pending changes) |
| ngfw-decrypt-state-matches-plan | Cisco FTD + Fortinet | decrypt_policy | If plan demands decrypt-on, at least one rule configured; if decrypt-off, zero rules |
| ntp-source-configured | All vendors | ntp_config | Device has at least one NTP server configured |
| ngfw-ha-state-sane | Cisco FTD | ha_status | HA state is one of ACTIVE / STANDBY_READY / NEGOTIATION / JOIN |
| snapshot-fresh | All vendors | system_info | Latest snapshot is < 60 min old |
The catalog is extensible: to add a check, append an entry to lib/preflight/checks.ts. No engine changes are needed.
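To make two of the catalog rules concrete, here is a minimal sketch of how snapshot-fresh and ngfw-decrypt-state-matches-plan could be evaluated. It is not the code in lib/preflight/checks.ts. The `Snapshot` and `Plan` shapes, the `decrypt` plan field, the `rules` array inside the decrypt_policy payload, and the explicit `now` parameter are all assumptions made for illustration.

```typescript
// Hypothetical shapes: the real snapshot and plan types are not shown in this document.
interface Snapshot {
  collectedAt: string; // ISO 8601 timestamp
  payloadJson: unknown;
}
interface Plan {
  decrypt: "on" | "off"; // assumed plan field
}
interface CheckResult {
  pass: boolean;
  detail: string;
}

const MAX_SNAPSHOT_AGE_MIN = 60;

// snapshot-fresh: pass only when the snapshot is strictly under 60 minutes old.
// `now` is passed in so the rule can be exercised with fixed timestamps.
function evaluateSnapshotFresh(snapshot: Snapshot, now: Date): CheckResult {
  const ageMin = (now.getTime() - Date.parse(snapshot.collectedAt)) / 60000;
  if (Number.isNaN(ageMin)) {
    return { pass: false, detail: "collectedAt could not be parsed" };
  }
  return {
    pass: ageMin < MAX_SNAPSHOT_AGE_MIN,
    detail: `snapshot age=${Math.floor(ageMin)} min (limit ${MAX_SNAPSHOT_AGE_MIN})`,
  };
}

// Assumption: the decrypt_policy payload exposes its rules as a `rules` array.
function countDecryptRules(payload: unknown): number {
  if (typeof payload !== "object" || payload === null) return 0;
  const rules = (payload as { rules?: unknown }).rules;
  return Array.isArray(rules) ? rules.length : 0;
}

// ngfw-decrypt-state-matches-plan: decrypt-on needs at least one rule,
// decrypt-off needs exactly zero.
function evaluateDecryptState(snapshot: Snapshot, plan: Plan): CheckResult {
  const ruleCount = countDecryptRules(snapshot.payloadJson);
  if (plan.decrypt === "on") {
    return ruleCount > 0
      ? { pass: true, detail: `decrypt-on with ${ruleCount} rule(s) configured` }
      : { pass: false, detail: "plan requires decrypt-on but no decrypt rules configured" };
  }
  return ruleCount === 0
    ? { pass: true, detail: "decrypt-off and no decrypt rules configured" }
    : { pass: false, detail: `plan requires decrypt-off but ${ruleCount} rule(s) configured` };
}
```

Keeping each rule a pure function of the snapshot and the plan lets it be unit-tested against recorded payloads without touching a device.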
What checks return
Each check returns one of three states:
| Result | Meaning | Operator action |
|---|---|---|
| pass: true + evidence | Check evaluated against a snapshot, all good | None — proceed |
| pass: false + evidence | Check evaluated against a snapshot, FAIL | Fix the device state, trigger a manual snapshot, re-run |
| pass: false + evidence: null | Skipped — no snapshot exists for this device + label | Trigger a manual snapshot first |
Evidence is the most important field: it cites the exact snapshot SHA-256 (payloadSha256) and collection time (collectedAt). The same SHA-256 will appear in the Test Run Report annexes when PR-D ships, so the chain of custody stays unbroken.
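The three states in the table can be derived from `pass` and `evidence` alone. The sketch below shows one way to classify and tally them; the names `classifyCheck` and `tallyChecks` are hypothetical. The table does not define a check with `pass: true` and no evidence, so counting that case as skipped is a conservative assumption, not documented behavior.

```typescript
type CheckState = "passed" | "failed" | "skipped";

// Only the two fields that determine the state are modelled here.
interface CheckOutcome {
  pass: boolean;
  evidence: object | null;
}

function classifyCheck(check: CheckOutcome): CheckState {
  // No evidence means no snapshot was evaluated, so the result cannot be trusted.
  // Assumption: this also covers the undocumented pass: true + evidence: null case.
  if (check.evidence === null) return "skipped";
  return check.pass ? "passed" : "failed";
}

function tallyChecks(checks: CheckOutcome[]): Record<CheckState, number> {
  const tally: Record<CheckState, number> = { passed: 0, failed: 0, skipped: 0 };
  for (const check of checks) {
    tally[classifyCheck(check)] += 1;
  }
  return tally;
}
```

Applied to the 422 example, eight checks with six passes, one failure with evidence, and one result without evidence would tally to 6 passed, 1 failed, 1 skipped.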
Why pre-flight matters
Without pre-flight, this is a typical scenario:
Operator triggers BASELINE-SLO-30M expecting decrypt-on. The NGFW had its decrypt policy disabled by another engineer 30 minutes ago. The 30-minute run completes; p99 looks suspiciously low. Operator notices something off only when comparing with last week's run. Run is invalid. Engagement loses 30 min + the credibility of the report.
With pre-flight:
Operator runs preflight first. Check ngfw-decrypt-state-matches-plan fails: "plan requires decrypt-on but no decrypt rules configured". Operator opens the NGFW console (or triggers a write-op via the future API), enables decrypt, triggers a snapshot, and re-runs preflight. All checks are green, so the operator starts the run. The 30 minutes are spent on a valid run.
Operator workflow: full sequence
# 1. Confirm devices are registered
curl -H "Authorization: Bearer $ADMIN_TOKEN" \
  "https://dashboard.example/api/admin/dut/devices"
# 2. Trigger fresh snapshots (so preflight sees current state)
for id in $(curl -s -H "Authorization: Bearer $ADMIN_TOKEN" \
    "https://dashboard.example/api/admin/dut/devices" \
    | jq -r '.devices[].id'); do
  curl -X POST -H "Authorization: Bearer $ADMIN_TOKEN" \
    "https://dashboard.example/api/admin/dut/devices/$id/snapshot"
done
# 3. Run pre-flight
curl -X POST "https://dashboard.example/api/test-runs/preflight" \
  -H "Authorization: Bearer $ADMIN_TOKEN" \
  -H "content-type: application/json" \
  -d '{"planIdentifier": "BASELINE-SLO-30M"}'
# 4. If pass: true → proceed with run trigger (existing flow)
# 5. If pass: false → review checks[].detail, fix the lab, go to step 2
Adding a new check
The check catalog at lib/preflight/checks.ts is data-driven. To add a check:
{
  id: 'my-new-check',
  description: 'What this check validates in operator-friendly prose',
  endpointLabel: 'system_info', // which snapshot label the check needs
  appliesTo: (device, plan) => {
    // Return true if this check is relevant for this device + plan combo
    return device.vendor === 'cisco-ftd' && plan.someField === 'someValue';
  },
  evaluate: (snapshot, plan) => {
    // Inspect snapshot.payloadJson and decide pass/fail
    const fieldValue = (snapshot.payloadJson as any)?.someField;
    if (fieldValue === 'expected') {
      return { pass: true, detail: 'value matches expectation' };
    }
    return { pass: false, detail: `value=${fieldValue}, expected 'expected'` };
  }
}
No engine changes needed. The runner discovers the new check automatically.
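To illustrate why a data-driven catalog needs no engine changes, here is a rough sketch of how a runner could consume such entries. The actual engine is not shown in this document, so everything here is an assumption: the `Device`, `Plan`, `Snapshot`, and `CheckReport` shapes, the `SnapshotLookup` callback, and the function name `runPreflight` are hypothetical. Only the five entry fields (`id`, `description`, `endpointLabel`, `appliesTo`, `evaluate`) come from the example above.

```typescript
// Hypothetical shapes; only the CheckDefinition fields come from the catalog entry above.
interface Device { id: string; hostname: string; vendor: string; }
interface Plan { identifier: string; [field: string]: unknown; }
interface Snapshot { id: string; payloadSha256: string; collectedAt: string; payloadJson: unknown; }

interface CheckDefinition {
  id: string;
  description: string;
  endpointLabel: string;
  appliesTo: (device: Device, plan: Plan) => boolean;
  evaluate: (snapshot: Snapshot, plan: Plan) => { pass: boolean; detail: string };
}

interface CheckReport {
  checkId: string;
  deviceHostname: string;
  vendor: string;
  pass: boolean;
  detail: string;
  evidence: { snapshotId: string; payloadSha256: string; collectedAt: string } | null;
}

// Assumed read-only accessor: returns the latest stored snapshot, or undefined if none exists.
type SnapshotLookup = (deviceId: string, endpointLabel: string) => Snapshot | undefined;

function runPreflight(
  catalog: CheckDefinition[],
  devices: Device[],
  plan: Plan,
  latestSnapshot: SnapshotLookup,
): CheckReport[] {
  const reports: CheckReport[] = [];
  for (const device of devices) {
    for (const check of catalog) {
      if (!check.appliesTo(device, plan)) continue; // not relevant for this device + plan
      const base = { checkId: check.id, deviceHostname: device.hostname, vendor: device.vendor };
      const snapshot = latestSnapshot(device.id, check.endpointLabel);
      if (!snapshot) {
        // Skipped: reported as pass: false with evidence: null, never evaluated.
        reports.push({ ...base, pass: false, detail: `no snapshot for label ${check.endpointLabel}`, evidence: null });
        continue;
      }
      const { pass, detail } = check.evaluate(snapshot, plan);
      reports.push({
        ...base,
        pass,
        detail,
        evidence: { snapshotId: snapshot.id, payloadSha256: snapshot.payloadSha256, collectedAt: snapshot.collectedAt },
      });
    }
  }
  return reports;
}
```

Because the loop only reads the catalog array, a new entry takes effect as soon as it is appended. The runner itself never writes to a device, which matches the read-only scope described for pre-flight.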
Limitations
Honest scoping:
- Read-only: pre-flight does NOT trigger snapshots itself. Operator triggers them via the existing POST /api/admin/dut/devices/{id}/snapshot endpoint
- Stale snapshot tolerated up to 60 min: the snapshot-fresh check fails if older. Adjust by editing the check, OR run a fresh snapshot before preflight
- No automatic run-blocking yet: PR-D will integrate preflight into the Test Plan run-start flow (refuse to start if preflight fails)
- Cisco UCS checks not yet wired: UCS adapter is in PR #199 queue. When merged, UCS-specific checks (no critical faults, thermal sane) get added
- No write/remediation: pre-flight reports state but does not fix it. F-1 / F-2 (write ops) in the API_FEATURE_CATALOG.md cover that future capability
What pre-flight does NOT replace
- Operator judgment for non-checkable concerns (cable connections, physical layer, vendor support contracts)
- The TLS Decrypt Mode Probe (which is independent of API state — the probe could detect "decrypt is configured but somehow not actually decrypting traffic", which API-only checks cannot)
- Time-sync verification (separate check-time-sync.sh script; pre-flight will eventually call it as a check)
Related
- DUT_API_INTEGRATION.md: what API integration the checks consume
- DUT_API_OPERATIONS.md: how to register devices that pre-flight inspects
- API_FEATURE_CATALOG.md: pre-flight checks correspond to category A items A-1 through A-7
- TEST_PLANS.md: the plans pre-flight validates against
- TIME_SYNC.md: separate time-sync gate; pre-flight will integrate it in a follow-up