
Test Run Reports

Read in your language: English · Português · Español

Scope status (post-Scope-Freeze 2026-05-10) — see ARCHITECTURE.md for the canonical 37 MÓDULOs + 7 Test Kinds + DOM/CPOS/PIE-PA safety architecture. ADRs 0014 and 0019–0025 cover post-Freeze additions.

Status: Phase 1 shipped — print-styled HTML report with structured JSON. Phases 2–5 add server-rendered PDF, DUT inventory annexes, Cosign signature, and N-run comparison. See platform/test-plans/catalog.yaml for the test-plan side.

Why Reports

Spirent CyberFlood and Ixia BreakingPoint both ship proprietary reports — black-box PDFs that operators trust because they came from a paid vendor. This project ships an open, deterministic report system designed to be demonstrably stronger than those:

| Capability | Spirent / Ixia | This project |
|---|---|---|
| Cover page with run identity | | |
| Stable plan identifier across engagements | ✅ (vendor-locked) | ✅ (CAP-FIND-KNEE-30M, git-versioned) |
| Plan-snapshot hash to prove parameters were not edited post-run | | |
| Report-content hash for forensic chain-of-custody | | ✅ (Phase 1) |
| DUT (NGFW + switch) inventory + sanitized config as annex | | 🟡 Phase 3 — Nexus 9000 + NGFW |
| Independent TLS-decrypt mode evidence (issuer cert) | | ✅ (probe in Phase 3 wiring) |
| Cryptographic signature on the PDF | | 🟡 Phase 4 — Cosign |
| N-run comparison report (last 5 runs side by side) | ⚠️ paid add-on | 🟡 Phase 5 |
| Reproducibility — replay a published report against your own NGFW | | 🟡 Phase 5 — replay mode |
| Licensing notice on every page | | ✅ Phase 1 |

Eleven forensic differentiators

The comparison table above covers the foundation. Beyond that, the report system adds eleven concrete capabilities that aim higher than the closed-source paid alternatives. Two have shipped; nine are scoped into Phases 2–5.

| # | Capability | Status |
|---|---|---|
| 1 | Causal analysis automation — topology-aware correlation produces "fact → consequence → recommendation" sentences | ✅ shipped in #177 |
| 2 | TLS Decrypt Mode timeline — independent issuer-cert ground-truth probe; a TLSDecryptModeChanged alert auto-invalidates results that span a state flip | ✅ shipped in #180 |
| 3 | Test-bed validity proof — per-window verdict (e.g. "results in window 14:00–14:23 are clean; 14:23–14:31 tainted by UCS-2 saturation") | 🟡 Phase 5 |
| 4 | Per-hop latency breakdown — agent→NGFW handshake / NGFW→persona handshake / TTFB / object load in separate columns, not just end-to-end | 🟡 Phase 5 |
| 5 | Per-archetype analysis — summary separated by skin / mock / har-replay / real-app; each archetype stresses a different NGFW path | 🟡 Phase 5 |
| 6 | Reproducibility manifest — git SHA, SHA-256 image digests, applied sysctls per UCS, deployment mode, NGFW config fingerprint: the full recipe | 🟡 Phase 4 |
| 7 | Cryptographically signed PDF — Cosign keyless via the cluster signing key; optional Rekor/Sigstore transparency entry | 🟡 Phase 4 |
| 8 | Per-section confidence intervals — every metric ships with a CI based on sample size, not just an average | 🟡 Phase 5 |
| 9 | Detailed failure attribution — e.g. "47 of 12,847 errors were NGFW timeout, 12 were TLS handshake fail, 8 were conntrack overflow on the test-bed" | 🟡 Phase 5 |
| 10 | Replay snapshot — an "open in viewer" link launches Grafana dashboards as they were at run time (Prometheus state frozen) | 🟡 Phase 5 |
| 11 | Forensic-grade prose narrative — an executive summary that walks the results with confidence levels, not just a table dump | 🟡 Phase 5 |

Phase 1 — what shipped

The data API

GET /api/test-runs/{executionId}/report.json

Returns the canonical ReportData shape:

```json
{
  "version": 1,
  "generatedAt": "2026-05-06T14:35:00.000Z",
  "reportSha256": "<64-char hex>",
  "meta": {
    "runId": "...",
    "executionId": "...",
    "planIdentifier": "CAP-FIND-KNEE-30M",
    "planDisplayName": "Capacity — Find the knee (30 min)",
    "planCatalogVersion": 1,
    "planSnapshotSha256": "<64-char hex>",
    "durationS": 1800,
    "startedAt": "...",
    "endedAt": "...",
    "outcome": "..."
  },
  "license": { "id": "LicenseRef-PolyForm-Noncommercial-1.0.0-with-Appendix-A", ... },
  "plan":     { "identifier": "...", "phases": [...], ... },
  "topology": { "deploymentMode": "tri-node", "ucsCount": 3, ... },
  "tlsDecrypt": { "activeAtStart": "on", "activeAtEnd": "on", ... },
  "results":  { "aggregate": { "p50_ms": 142, "p95_ms": 380, "p99_ms": 487, ... } },
  "slo":      { "targetP99Ms": 500, "observedP99Ms": 487, "pass": true, ... },
  "annexes":  [{ "id": "annex-b-nexus", "title": "...", "sha256": "...", "body": "..." }, ...]
}
```
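For illustration, a minimal consumer of this shape might look like the sketch below. The field names come from the example above; the values are toy data, not a real run:

```python
import json

# Toy payload using a subset of the ReportData fields shown above.
payload = json.loads("""
{
  "version": 1,
  "meta": {"planIdentifier": "CAP-FIND-KNEE-30M", "durationS": 1800},
  "slo": {"targetP99Ms": 500, "observedP99Ms": 487, "pass": true}
}
""")

# Summarize the SLO verdict the way the report cover does.
verdict = "PASS" if payload["slo"]["pass"] else "FAIL"
print(f"{payload['meta']['planIdentifier']}: "
      f"p99 {payload['slo']['observedP99Ms']} ms "
      f"vs target {payload['slo']['targetP99Ms']} ms -> {verdict}")
# -> CAP-FIND-KNEE-30M: p99 487 ms vs target 500 ms -> PASS
```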

X-Report-Sha256 and X-License headers carry the same hash + license-id for downstream tooling.
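As a sketch, downstream tooling could cross-check that header against the bytes it received. This assumes the header commits to the exact response body; if the server hashes a canonicalized JSON form instead, canonicalize before hashing:

```python
import hashlib

def header_matches_body(headers: dict, body: bytes) -> bool:
    """Compare the X-Report-Sha256 header against a SHA-256 of the body.

    Assumption: the header commits to the raw bytes served; confirm the
    server's hashing convention before relying on this check.
    """
    claimed = headers.get("X-Report-Sha256", "").lower()
    return hashlib.sha256(body).hexdigest() == claimed

# Toy response standing in for a real GET of report.json
body = b'{"version":1}'
headers = {"X-Report-Sha256": hashlib.sha256(body).hexdigest()}
print(header_matches_body(headers, body))  # True
```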

The print page

GET /runs/{executionId}/report

A server-rendered, print-styled HTML page with:

  • Cover — run id, plan, dates, SLO badge, license badge, report SHA-256 + plan-snapshot SHA-256
  • Licensing & Use Restrictions — full audience + field-of-use text in EN / PT-BR / ES on the page immediately after the cover
  • Executive Summary — aggregate KPIs, SLO pass/fail with budget-burn percentage
  • Test Plan Configuration — parameters table + phases timeline
  • Annexes — placeholders for Phase 3 (Nexus + NGFW inventory)
  • License footer — pinned to every printed page (the operator cannot strip it without re-rendering)

The page renders with @page A4 portrait margins and page-counter footers. Browsers can "Save as PDF" today; Phase 2 swaps in Puppeteer for deterministic server-side rendering.
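A print layout of this kind can be expressed with CSS Paged Media rules along these lines. The selectors and margins below are hypothetical, not the project's actual stylesheet, and margin boxes such as @bottom-center are specified in CSS Paged Media but unevenly supported by browsers, which is part of why Phase 2 moves rendering server-side:

```css
/* Hypothetical sketch, not the project's actual stylesheet. */
@page {
  size: A4 portrait;
  margin: 20mm 15mm 25mm 15mm;  /* extra bottom margin for the footer */
  @bottom-center {
    /* Page-counter footer; supported by print engines, not all browsers */
    content: counter(page) " / " counter(pages);
  }
}
```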

Forensic hashes already present in Phase 1

| Hash | What it commits to |
|---|---|
| reportSha256 | The full canonical JSON payload — proves the report data was not tampered with |
| planSnapshotSha256 | The plan parameters frozen at run start — proves the plan was not edited mid-run |
| annex.sha256 (per annex) | Each annex body — Phase 3 uses this for Nexus/NGFW config attestation |

A reviewer who suspects PDF tampering can:

  1. Download the original ReportData JSON via the API
  2. Recompute SHA-256 of the canonical JSON
  3. Compare against the value printed on the cover

If they match — the report is authentic. Phase 4 adds Cosign on top to make this trivially verifiable.
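The recompute step might look like the sketch below. The canonicalization details are assumptions here (sorted keys, no extra whitespace, reportSha256 excluded from its own hash); confirm them against the server's implementation before relying on this:

```python
import hashlib
import json

def canonical_sha256(report: dict) -> str:
    """Recompute the SHA-256 a reviewer compares against the cover page.

    Assumptions: the canonical form excludes the reportSha256 field itself
    and serializes the rest with sorted keys and no extra whitespace.
    """
    body = {k: v for k, v in report.items() if k != "reportSha256"}
    canonical = json.dumps(body, separators=(",", ":"), sort_keys=True).encode()
    return hashlib.sha256(canonical).hexdigest()

# Toy report standing in for the downloaded ReportData JSON
report = {"version": 1, "meta": {"planIdentifier": "CAP-FIND-KNEE-30M"}}
report["reportSha256"] = canonical_sha256(report)

print(canonical_sha256(report) == report["reportSha256"])  # True when untampered
```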

What Phases 2–5 add

| Phase | Adds |
|---|---|
| 2 | Puppeteer renders this page server-side → /api/test-runs/{id}/report.pdf returns a real PDF |
| 3 | DUT Inventory Probe populates Annex B (Nexus 9000) + Annex C (NGFW) + Annex D (UCS chassis) with real model, S/N, and sanitized running-config. Partially shipped: the DUT API foundation, 4 vendor adapters (Cisco FTD, Nexus, UCS CIMC Redfish, FortiGate), and the snapshot/hash infrastructure have been live since v4.0.0 (#199, #210); wiring the snapshots into Annexes B/C/D is the remaining piece (PR-D in the v4.1 roadmap) |
| 4 | Cosign keyless OIDC signature on the PDF + Grafana state-snapshot embed (the actual dashboards as they were during the run) |
| 5 | N-run comparison report (last 5 runs of the same plan, p50/p95/p99 side by side) + replay mode (download the catalog version + plan snapshot to reproduce the load shape elsewhere) |

Operator workflow

  1. Pick a plan from the catalog, kick off a run via the Dashboard
  2. Run completes — test_run_executions row gets endedAt + outcome
  3. Open /runs/{executionId}/report in a browser
  4. Hit Print → Save as PDF (Phase 1) — or wait for Phase 2 and download the signed PDF directly
  5. Distribute the PDF to authorized parties; the licensing footer + cover page remind them of the audience policy

Compared to commercial alternatives

The Ixia BreakingPoint report is a closed-format PDF rendered by a closed-source engine; the operator must trust the vendor that the numbers haven't been massaged. This system inverts that:

  • The data is a versioned JSON shape (ReportData) — anyone can re-parse it
  • The PDF is a print rendering of an HTML page — anyone can re-render it
  • The plan is a git-versioned YAML — anyone can confirm what was supposed to run
  • The actual config of the gear under test is embedded as a hashed annex (Phase 3)
  • The PDF will be cryptographically signed (Phase 4) — anyone can verify provenance

All while staying inside the audience policy of the license.