
Test Run Reports

Read in your language: English · Português · Español

Scope status (post-Scope-Freeze 2026-05-10) — see ARCHITECTURE.md for the canonical 37 MÓDULOs + 7 Test Kinds + DOM/CPOS/PIE-PA safety architecture. ADRs 0014 and 0019–0025 cover post-Freeze additions.

Status: Phase 1 shipped — print-styled HTML report with structured JSON. Phases 2–5 add server-rendered PDF, DUT inventory annexes, Cosign signature, and N-run comparison. See platform/test-plans/catalog.yaml for the test-plan side.

Why Reports

Spirent CyberFlood and Ixia BreakingPoint both ship proprietary reports — black-box PDFs that operators trust because they came from a paid vendor. This project ships an open, deterministic report system designed to be demonstrably stronger than those:

| Capability | Spirent / Ixia | This project |
|---|---|---|
| Cover page with run identity | | |
| Stable plan identifier across engagements | ✅ (vendor-locked) | ✅ (CAP-FIND-KNEE-30M, git-versioned) |
| Plan-snapshot hash to prove parameters were not edited post-run | | |
| Report-content hash for forensic chain-of-custody | | ✅ (Phase 1) |
| DUT (NGFW + switch) inventory + sanitized config as annex | | 🟡 Phase 3 — Nexus 9000 + NGFW |
| Independent TLS-decrypt mode evidence (issuer cert) | | ✅ (probe in Phase 3 wiring) |
| Cryptographic signature on the PDF | | 🟡 Phase 4 — Cosign |
| N-run comparison report (last 5 runs side by side) | ⚠️ paid add-on | 🟡 Phase 5 |
| Reproducibility — replay a published report against your own NGFW | | 🟡 Phase 5 — replay mode |
| Licensing notice on every page | | ✅ Phase 1 |

Eleven forensic differentiators

The comparison table above covers the foundation. Beyond that, the report system adds eleven concrete capabilities that aim higher than the closed-source paid alternatives. Two have shipped; nine are scoped into Phases 2–5.

| # | Capability | Status |
|---|---|---|
| 1 | Causal analysis automation — topology-aware correlation produces "fact → consequence → recommendation" sentences | ✅ shipped in #177 |
| 2 | TLS Decrypt Mode timeline — independent issuer-cert ground-truth probe; a TLSDecryptModeChanged alert auto-invalidates results that span a state flip | ✅ shipped in #180 |
| 3 | Test-bed validity proof — per-window verdict (e.g. "results in window 14:00–14:23 are clean; 14:23–14:31 tainted by UCS-2 saturation") | 🟡 Phase 5 |
| 4 | Per-hop latency breakdown — agent→NGFW handshake / NGFW→persona handshake / TTFB / object load in separate columns, not just end-to-end | 🟡 Phase 5 |
| 5 | Per-archetype analysis — summary separated by skin / mock / har-replay / real-app; each archetype stresses a different NGFW path | 🟡 Phase 5 |
| 6 | Reproducibility manifest — git SHA, SHA-256 image digests, applied sysctls per UCS, deployment mode, NGFW config fingerprint: the full recipe | 🟡 Phase 4 |
| 7 | Cryptographically signed PDF — Cosign keyless via the cluster signing key; optional Rekor/Sigstore transparency entry | 🟡 Phase 4 |
| 8 | Per-section confidence intervals — every metric ships with a CI based on sample size, not just an average | 🟡 Phase 5 |
| 9 | Detailed failure attribution — e.g. "47 of 12,847 errors were NGFW timeout, 12 were TLS handshake fail, 8 were conntrack overflow on the test-bed" | 🟡 Phase 5 |
| 10 | Replay snapshot — an "open in viewer" link launches Grafana dashboards as they were at run time (Prometheus state frozen) | 🟡 Phase 5 |
| 11 | Forensic-grade prose narrative — an executive summary that walks the results with confidence levels, not just a table dump | 🟡 Phase 5 |

Phase 1 — what shipped

The data API

GET /api/test-runs/{executionId}/report.json

Returns the canonical ReportData shape:

```json
{
  "version": 1,
  "generatedAt": "2026-05-06T14:35:00.000Z",
  "reportSha256": "<64-char hex>",
  "meta": {
    "runId": "...",
    "executionId": "...",
    "planIdentifier": "CAP-FIND-KNEE-30M",
    "planDisplayName": "Capacity — Find the knee (30 min)",
    "planCatalogVersion": 1,
    "planSnapshotSha256": "<64-char hex>",
    "durationS": 1800,
    "startedAt": "...",
    "endedAt": "...",
    "outcome": "..."
  },
  "license": { "id": "LicenseRef-PolyForm-Noncommercial-1.0.0-with-Appendix-A", ... },
  "plan":     { "identifier": "...", "phases": [...], ... },
  "topology": { "deploymentMode": "tri-node", "ucsCount": 3, ... },
  "tlsDecrypt": { "activeAtStart": "on", "activeAtEnd": "on", ... },
  "results":  { "aggregate": { "p50_ms": 142, "p95_ms": 380, "p99_ms": 487, ... } },
  "slo":      { "targetP99Ms": 500, "observedP99Ms": 487, "pass": true, ... },
  "annexes":  [{ "id": "annex-b-nexus", "title": "...", "sha256": "...", "body": "..." }, ...]
}
```
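For illustration, a minimal consumer of this shape might look like the sketch below. The field names come from the example above; the values are toy data, not a real run:

```python
import json

# Toy payload using a subset of the ReportData fields shown above.
payload = json.loads("""
{
  "version": 1,
  "meta": {"planIdentifier": "CAP-FIND-KNEE-30M", "durationS": 1800},
  "slo": {"targetP99Ms": 500, "observedP99Ms": 487, "pass": true}
}
""")

# Summarize the SLO verdict the way the report cover does.
verdict = "PASS" if payload["slo"]["pass"] else "FAIL"
print(f"{payload['meta']['planIdentifier']}: "
      f"p99 {payload['slo']['observedP99Ms']} ms "
      f"vs target {payload['slo']['targetP99Ms']} ms -> {verdict}")
# -> CAP-FIND-KNEE-30M: p99 487 ms vs target 500 ms -> PASS
```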

X-Report-Sha256 and X-License headers carry the same hash + license-id for downstream tooling.
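As a sketch, downstream tooling could cross-check that header against the bytes it received. This assumes the header commits to the exact response body; if the server hashes a canonicalized JSON form instead, canonicalize before hashing:

```python
import hashlib

def header_matches_body(headers: dict, body: bytes) -> bool:
    """Compare the X-Report-Sha256 header against a SHA-256 of the body.

    Assumption: the header commits to the raw bytes served; confirm the
    server's hashing convention before relying on this check.
    """
    claimed = headers.get("X-Report-Sha256", "").lower()
    return hashlib.sha256(body).hexdigest() == claimed

# Toy response standing in for a real GET of report.json
body = b'{"version":1}'
headers = {"X-Report-Sha256": hashlib.sha256(body).hexdigest()}
print(header_matches_body(headers, body))  # True
```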

The print page

GET /runs/{executionId}/report

A server-rendered, print-styled HTML page with:

  • Cover — run id, plan, dates, SLO badge, license badge, report SHA-256 + plan-snapshot SHA-256
  • Licensing & Use Restrictions — full audience + field-of-use text in EN / PT-BR / ES on the page immediately after the cover
  • Executive Summary — aggregate KPIs, SLO pass/fail with budget-burn percentage
  • Test Plan Configuration — parameters table + phases timeline
  • Annexes — placeholders for Phase 3 (Nexus + NGFW inventory)
  • License footer — pinned to every printed page (the operator cannot strip it without re-rendering)

The page renders with @page A4 portrait margins and page-counter footers. Browsers can "Save as PDF" today; Phase 2 swaps in Puppeteer for deterministic server-side rendering.
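A print layout of this kind can be expressed with CSS Paged Media rules along these lines. The selectors and margins below are hypothetical, not the project's actual stylesheet, and margin boxes such as @bottom-center are specified in CSS Paged Media but unevenly supported by browsers, which is part of why Phase 2 moves rendering server-side:

```css
/* Hypothetical sketch, not the project's actual stylesheet. */
@page {
  size: A4 portrait;
  margin: 20mm 15mm 25mm 15mm;  /* extra bottom margin for the footer */
  @bottom-center {
    /* Page-counter footer; supported by print engines, not all browsers */
    content: counter(page) " / " counter(pages);
  }
}
```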

Forensic hashes already present in Phase 1

| Hash | What it commits to |
|---|---|
| reportSha256 | The full canonical JSON payload — proves the report data was not tampered with |
| planSnapshotSha256 | The plan parameters frozen at run start — proves the plan was not edited mid-run |
| annex.sha256 (per annex) | Each annex body — Phase 3 uses this for Nexus/NGFW config attestation |

A reviewer who suspects PDF tampering can:

  1. Download the original ReportData JSON via the API
  2. Recompute SHA-256 of the canonical JSON
  3. Compare against the value printed on the cover

If they match — the report is authentic. Phase 4 adds Cosign on top to make this trivially verifiable.
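The recompute step might look like the sketch below. The canonicalization details are assumptions here (sorted keys, no extra whitespace, reportSha256 excluded from its own hash); confirm them against the server's implementation before relying on this:

```python
import hashlib
import json

def canonical_sha256(report: dict) -> str:
    """Recompute the SHA-256 a reviewer compares against the cover page.

    Assumptions: the canonical form excludes the reportSha256 field itself
    and serializes the rest with sorted keys and no extra whitespace.
    """
    body = {k: v for k, v in report.items() if k != "reportSha256"}
    canonical = json.dumps(body, separators=(",", ":"), sort_keys=True).encode()
    return hashlib.sha256(canonical).hexdigest()

# Toy report standing in for the downloaded ReportData JSON
report = {"version": 1, "meta": {"planIdentifier": "CAP-FIND-KNEE-30M"}}
report["reportSha256"] = canonical_sha256(report)

print(canonical_sha256(report) == report["reportSha256"])  # True when untampered
```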

What Phases 2–5 add

| Phase | Adds |
|---|---|
| 2 | Puppeteer renders this page server-side → /api/test-runs/{id}/report.pdf returns a real PDF |
| 3 | DUT Inventory Probe populates Annex B (Nexus 9000) + Annex C (NGFW) + Annex D (UCS chassis) with real model, S/N, and sanitized running-config. Partially shipped: the DUT API foundation, 4 vendor adapters (Cisco FTD, Nexus, UCS CIMC Redfish, FortiGate), and the snapshot/hash infrastructure have been live since v4.0.0 (#199, #210); wiring the snapshots into Annexes B/C/D is the remaining piece (PR-D in the v4.1 roadmap) |
| 4 | Cosign keyless OIDC signature on the PDF + Grafana state-snapshot embed (the actual dashboards as they were during the run) |
| 5 | N-run comparison report (last 5 runs of the same plan, p50/p95/p99 side by side) + replay mode (download the catalog version + plan snapshot to reproduce the load shape elsewhere) |

Operator workflow

  1. Pick a plan from the catalog, kick off a run via the Dashboard
  2. Run completes — test_run_executions row gets endedAt + outcome
  3. Open /runs/{executionId}/report in a browser
  4. Hit Print → Save as PDF (Phase 1) — or wait for Phase 2 and download the signed PDF directly
  5. Distribute the PDF to authorized parties; the licensing footer + cover page remind them of the audience policy

Compared to commercial alternatives

The Ixia BreakingPoint report is a closed-format PDF rendered by a closed-source engine; the operator must trust the vendor that the numbers haven't been massaged. This system inverts that:

  • The data is a versioned JSON shape (ReportData) — anyone can re-parse it
  • The PDF is a print rendering of an HTML page — anyone can re-render it
  • The plan is a git-versioned YAML — anyone can confirm what was supposed to run
  • The actual config of the gear under test is embedded as a hashed annex (Phase 3)
  • The PDF will be cryptographically signed (Phase 4) — anyone can verify provenance

All while staying inside the audience policy of the license.