Test Run Reports
Scope status (post-Scope-Freeze 2026-05-10) — See ARCHITECTURE.md for the canonical 37 MÓDULOs + 7 Test Kinds + DOM/CPOS/PIE-PA safety architecture. ADRs 0014, 0019-0025 cover post-Freeze additions.
Status: Phase 1 shipped (print-styled HTML report with structured JSON). Phases 2–5 add server-rendered PDF, DUT inventory annexes, Cosign signature, and N-run comparison. See platform/test-plans/catalog.yaml for the test plan side.
Why Reports
Spirent CyberFlood and Ixia BreakingPoint both ship proprietary reports — black-box PDFs that operators trust because they came from a paid vendor. This project ships an open, deterministic report system designed to be demonstrably stronger than those:
| Capability | Spirent / Ixia | This project |
|---|---|---|
| Cover page with run identity | ✅ | ✅ |
| Stable plan identifier across engagements | ✅ (vendor-locked) | ✅ (CAP-FIND-KNEE-30M, git-versioned) |
| Plan-snapshot hash to prove parameters were not edited post-run | ❌ | ✅ |
| Report-content hash for forensic chain-of-custody | ❌ | ✅ (Phase 1) |
| DUT (NGFW + switch) inventory + sanitized config as annex | ❌ | 🟡 Phase 3 — Nexus 9000 + NGFW |
| Independent TLS-decrypt mode evidence (issuer cert) | ❌ | ✅ (probe in Phase 3 wiring) |
| Cryptographic signature on the PDF | ❌ | 🟡 Phase 4 — Cosign |
| N-run comparison report (last 5 runs side by side) | ⚠️ paid add-on | 🟡 Phase 5 |
| Reproducibility — replay a published report against your own NGFW | ❌ | 🟡 Phase 5 — replay mode |
| Licensing notice on every page | ❌ | ✅ Phase 1 |
Eleven forensic differentiators
The comparison table above covers the foundation. Beyond that, the report system adds eleven concrete capabilities that aim higher than the closed-source paid alternatives. Two have shipped; nine are scoped into Phases 2–5.
| # | Capability | Status |
|---|---|---|
| 1 | Causal analysis automation — topology-aware correlation produces "fact → consequence → recommendation" sentences | ✅ shipped in #177 |
| 2 | TLS Decrypt Mode timeline: independent issuer-cert ground-truth probe; the TLSDecryptModeChanged alert auto-invalidates results that span a state flip | ✅ shipped in #180 |
| 3 | Test-bed validity proof — per-window verdict (e.g. "results in window 14:00–14:23 are clean; 14:23–14:31 tainted by UCS-2 saturation") | 🟡 Phase 5 |
| 4 | Per-hop latency breakdown — agent→NGFW handshake / NGFW→persona handshake / TTFB / object load in separate columns, not just end-to-end | 🟡 Phase 5 |
| 5 | Per-archetype analysis — summary separated by skin / mock / har-replay / real-app — each archetype stresses a different NGFW path | 🟡 Phase 5 |
| 6 | Reproducibility manifest — git SHA, image digests SHA-256, applied sysctls per UCS, deployment mode, NGFW config fingerprint — the full recipe | 🟡 Phase 4 |
| 7 | Cryptographically signed PDF — Cosign keyless via the cluster signing key; optional Rekor/Sigstore transparency entry | 🟡 Phase 4 |
| 8 | Per-section confidence intervals — every metric ships with a CI based on sample size, not just an average | 🟡 Phase 5 |
| 9 | Detailed failure attribution — e.g. "47 of 12,847 errors were NGFW timeout, 12 were TLS handshake fail, 8 were conntrack overflow on the test-bed" | 🟡 Phase 5 |
| 10 | Replay snapshot — "open in viewer" link launches Grafana dashboards as they were at run time (Prometheus state frozen) | 🟡 Phase 5 |
| 11 | Forensic-grade prose narrative — executive summary in prose that walks the results with confidence levels — not just a table dump | 🟡 Phase 5 |
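Capability 8 (per-section confidence intervals) can be illustrated with a minimal sketch. It assumes a normal approximation for the mean of a latency sample; the helper name `mean_with_ci` and the 95% level are illustrative choices, not part of the shipped report code. Tail percentiles such as p99 would likely need a different estimator, for example a bootstrap.

```python
import math
import statistics


def mean_with_ci(samples_ms, z=1.96):
    """Return (mean, low, high) for a latency sample in milliseconds.

    Hypothetical helper: normal-approximation interval,
    mean +/- z * s / sqrt(n), where s is the sample standard deviation.
    """
    n = len(samples_ms)
    if n < 2:
        raise ValueError("need at least two samples to estimate an interval")
    mean = statistics.fmean(samples_ms)
    half_width = z * statistics.stdev(samples_ms) / math.sqrt(n)
    return mean, mean - half_width, mean + half_width
```

The interval narrows as the sample count grows, which is the point of reporting it next to the average: a p50 backed by 20 requests and one backed by 20,000 should not look equally trustworthy.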
Phase 1: what shipped
The data API
GET /api/test-runs/{executionId}/report.json
Returns the canonical ReportData shape:
{
"version": 1,
"generatedAt": "2026-05-06T14:35:00.000Z",
"reportSha256": "<64-char hex>",
"meta": {
"runId": "...",
"executionId": "...",
"planIdentifier": "CAP-FIND-KNEE-30M",
"planDisplayName": "Capacity — Find the knee (30 min)",
"planCatalogVersion": 1,
"planSnapshotSha256": "<64-char hex>",
"durationS": 1800,
"startedAt": "...",
"endedAt": "...",
"outcome": "..."
},
"license": { "id": "LicenseRef-PolyForm-Noncommercial-1.0.0-with-Appendix-A", ... },
"plan": { "identifier": "...", "phases": [...], ... },
"topology": { "deploymentMode": "tri-node", "ucsCount": 3, ... },
"tlsDecrypt": { "activeAtStart": "on", "activeAtEnd": "on", ... },
"results": { "aggregate": { "p50_ms": 142, "p95_ms": 380, "p99_ms": 487, ... } },
"slo": { "targetP99Ms": 500, "observedP99Ms": 487, "pass": true, ... },
"annexes": [{ "id": "annex-b-nexus", "title": "...", "sha256": "...", "body": "..." }, ...]
}
X-Report-Sha256 and X-License headers carry the same hash + license-id for downstream tooling.
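As a rough sketch of how downstream tooling might consume the endpoint, the snippet below fetches report.json and checks that the two headers agree with the body. The base URL, the absence of authentication, and both function names are assumptions made for illustration; only the path, the header names, and the JSON field names come from this page.

```python
import json
import urllib.request


def fetch_report(base_url, execution_id):
    """Fetch report.json and return (lower-cased headers, parsed body).

    base_url is a placeholder for wherever the platform is deployed.
    """
    url = f"{base_url}/api/test-runs/{execution_id}/report.json"
    with urllib.request.urlopen(url) as resp:
        headers = {name.lower(): value for name, value in resp.headers.items()}
        body = json.loads(resp.read().decode("utf-8"))
    return headers, body


def headers_match_body(headers, body):
    """True when X-Report-Sha256 and X-License repeat the values in the body."""
    same_hash = headers.get("x-report-sha256") == body.get("reportSha256")
    same_license = headers.get("x-license") == body.get("license", {}).get("id")
    return same_hash and same_license
```

A caller would run `headers, body = fetch_report(base_url, execution_id)` and reject the download if `headers_match_body(headers, body)` is false.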
The print page
GET /runs/{executionId}/report
A server-rendered, print-styled HTML page with:
- Cover — run id, plan, dates, SLO badge, license badge, report SHA-256 + plan-snapshot SHA-256
- Licensing & Use Restrictions — full audience + field-of-use text in EN / PT-BR / ES on the page immediately after the cover
- Executive Summary — aggregate KPIs, SLO pass/fail with budget-burn percentage
- Test Plan Configuration — parameters table + phases timeline
- Annexes — placeholders for Phase 3 (Nexus + NGFW inventory)
- License footer — pinned to every printed page (the operator cannot strip it without re-rendering)
The page renders with @page A4 portrait margins and page-counter footers. Browsers can "Save as PDF" today; Phase 2 swaps in Puppeteer for deterministic server-side rendering.
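This page does not define how the budget-burn percentage in the Executive Summary is computed. One plausible reading, assumed here purely for illustration, is the observed p99 as a share of the target p99, using the `slo` fields from the JSON shape. The function name is hypothetical.

```python
def slo_summary(slo):
    """Summarize an `slo` object with targetP99Ms and observedP99Ms.

    ASSUMPTION: budget burn = observed / target * 100. The shipped report
    may use a different definition.
    """
    target = slo["targetP99Ms"]
    observed = slo["observedP99Ms"]
    return {
        "pass": observed <= target,
        "budgetBurnPct": round(100.0 * observed / target, 1),
    }
```

With the sample values shown earlier (target 500 ms, observed 487 ms), this reading gives a pass at 97.4% of the budget.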
Forensic hashes already present in Phase 1
| Hash | What it commits to |
|---|---|
| reportSha256 | The full canonical JSON payload; proves the report data was not tampered with |
| planSnapshotSha256 | The plan parameters frozen at run start; proves the plan was not edited mid-run |
| annex.sha256 (per annex) | Each annex body; Phase 3 uses this for Nexus/NGFW config attestation |
A reviewer who suspects PDF tampering can:
- Download the original ReportData JSON via the API
- Recompute SHA-256 of the canonical JSON
- Compare against the value printed on the cover
If the two values match, the report is authentic. Phase 4 adds Cosign on top to make this trivially verifiable.
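The reviewer steps above could be scripted roughly as follows. The canonicalization rule is not spelled out on this page, so the sketch assumes sorted keys, compact separators, UTF-8 encoding, and the top-level reportSha256 field excluded from the hashed payload (a hash cannot cover its own value). It also assumes annex.sha256 is taken over the UTF-8 bytes of the annex body. All three function names are hypothetical; confirm the real rules against the server code before relying on this.

```python
import hashlib
import json


def canonical_json_bytes(report):
    # ASSUMPTION: canonical form = sorted keys, no insignificant whitespace,
    # UTF-8, with the top-level reportSha256 field left out.
    payload = {k: v for k, v in report.items() if k != "reportSha256"}
    text = json.dumps(payload, sort_keys=True, separators=(",", ":"),
                      ensure_ascii=False)
    return text.encode("utf-8")


def report_matches_cover(report, cover_hash):
    """Recompute SHA-256 of the JSON and compare with the hash on the cover."""
    recomputed = hashlib.sha256(canonical_json_bytes(report)).hexdigest()
    return recomputed == cover_hash.strip().lower()


def tampered_annexes(report):
    """Return ids of annexes whose body no longer matches annex.sha256."""
    # ASSUMPTION: annex.sha256 is SHA-256 over the UTF-8 bytes of annex.body.
    bad = []
    for annex in report.get("annexes", []):
        digest = hashlib.sha256(annex["body"].encode("utf-8")).hexdigest()
        if digest != annex["sha256"]:
            bad.append(annex["id"])
    return bad
```

Sorting keys makes the digest independent of the order in which a JSON parser happens to return fields, which is why some canonical form is needed before hashing at all.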
What Phases 2–5 add
| Phase | Adds |
|---|---|
| 2 | Puppeteer renders this page server-side → /api/test-runs/{id}/report.pdf returns a real PDF |
| 3 | DUT Inventory Probe populates Annex B (Nexus 9000) + Annex C (NGFW) + Annex D (UCS chassis) with real model, S/N, sanitized running-config. Partially shipped: the DUT API foundation + 4 vendor adapters (Cisco FTD, Nexus, UCS CIMC Redfish, FortiGate) and snapshot/hash infrastructure have been live since v4.0.0 (#199, #210); wiring the snapshots into the report Annex B/C/D is the remaining piece (PR-D in the v4.1 roadmap) |
| 4 | Cosign keyless OIDC signature on the PDF + Grafana state snapshot embed (the actual dashboards as they were during the run) |
| 5 | N-run comparison report (last 5 runs of the same plan, p50/p95/p99 side-by-side) + replay mode (download the catalog version + plan snapshot to reproduce the load shape elsewhere) |
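Until Phase 5 ships, a side-by-side view can be approximated client-side from several report.json payloads. The sketch below is not the planned implementation; it assumes every `endedAt` value is an ISO-8601 timestamp in the same timezone (so string order equals time order), and both function names are illustrative.

```python
def compare_runs(reports, plan_identifier, last_n=5):
    """Pick the last_n most recent runs of one plan and extract p50/p95/p99."""
    same_plan = [r for r in reports
                 if r["meta"]["planIdentifier"] == plan_identifier]
    # ASSUMPTION: endedAt strings sort chronologically (same ISO-8601 format).
    same_plan.sort(key=lambda r: r["meta"]["endedAt"])
    rows = []
    for r in same_plan[-last_n:]:
        agg = r["results"]["aggregate"]
        rows.append({
            "executionId": r["meta"]["executionId"],
            "endedAt": r["meta"]["endedAt"],
            "p50_ms": agg["p50_ms"],
            "p95_ms": agg["p95_ms"],
            "p99_ms": agg["p99_ms"],
        })
    return rows


def render_table(rows):
    """Render the comparison rows as a pipe table, oldest run first."""
    lines = ["| Execution | Ended at | p50 (ms) | p95 (ms) | p99 (ms) |",
             "|---|---|---|---|---|"]
    for row in rows:
        lines.append("| {executionId} | {endedAt} | {p50_ms} | {p95_ms} | {p99_ms} |"
                     .format(**row))
    return "\n".join(lines)
```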
Operator workflow
- Pick a plan from the catalog and kick off a run via the Dashboard
- The run completes: the test_run_executions row gets endedAt + outcome
- Open /runs/{executionId}/report in a browser
- Hit Print → Save as PDF (Phase 1), or wait for Phase 2 and download the server-rendered PDF directly (signed once Phase 4 lands)
- Distribute the PDF to authorized parties; the licensing footer + cover page remind them of the audience policy
Compared to commercial alternatives
The Ixia BreakingPoint report is a closed-format PDF rendered by a closed-source engine; the operator must trust the vendor that the numbers haven't been massaged. This system inverts that:
- The data is a versioned JSON shape (ReportData): anyone can re-parse it
- The PDF is a print rendering of an HTML page: anyone can re-render it
- The plan is a git-versioned YAML: anyone can confirm what was supposed to run
- The actual config of the gear under test is embedded as a hashed annex (Phase 3)
- The PDF will be cryptographically signed (Phase 4): anyone can verify provenance
All while staying inside the audience policy of the license.
Related
- TEST_PLANS.md: the 15 catalog plans whose runs feed this report
- TLS_DECRYPT_MODE_VERIFICATION.en.md: independent issuer probe used by Annex B/C
- MONITORING_TEST_VALIDITY.md: alerts that say "the test bed itself was healthy"
- USAGE_POLICY.md: audience restrictions explained