ADR 0021: PURE (Production URL Replay Engine) + Discovery Hub + PVI/PVP + PIE-PA
- Status: Accepted (formalized 2026-05-12 with v3.7.0; TS scaffolds Wave 4-5; Go enforcement layer shipping 2026-05-13 Tier-3 batch F at pkg/pie-pa-executor/; patent claims #11/#12/#13 reduced-to-practice)
- Date: 2026-05-10
- Deciders: TLSStress.Art project
- Targets: v5.x (Phase 1 Materialization scaffolds: PURE-1..10 already merged Wave 4-5)
- Patent claim family: claims #6..#13 (PURE + Discovery Hub + PVI/PVP + PIE-PA)
Context
The bench's Test Kinds 1-6 cover synthetic + lab scenarios. Customers asked: "Can you replay MY production URLs through MY DUT and validate that nothing breaks?"
Two pieces were missing:
1. Discovery — sourcing the URL set from the customer's real
environment (Syslog / PCAP / HAR / API / Curated lists)
2. Production-safety — replaying real URLs against a production
DUT carries MITM risk because bench personas host real public IPs
(200.130.x.x range per project_ip_addressing_v43)
Decision
Introduce PURE (Production URL Replay Engine) as Test Kind #7, plus the Discovery Hub, PVI/PVP validators, and the PIE-PA 3-layer defense.
Discovery Hub: 8 sources
| # | Source | Origin |
|---|---|---|
| A | Syslog | NGFW syslog ingestion (Cisco FTD URL field parser) |
| B | Vendor API | PAN Cortex + Cisco SCC direct REST query |
| C | PCAP | gopacket-based URL extraction |
| D | HAR | browser-recorded session replay |
| E | Curated | pre-curated Tranco/Umbrella/Majestic monthly snapshot + cuts |
| F | SPAN | live mirror via MÓDULO SPAN.Art (richest source) |
| G | Cloud-derived | from operator OBP cloud-egress observations |
| H | KALI nmap import | operator-supplied scan results |
PVI: Pre-flight Validation, Ingestion-time
Every URL ingested into the Discovery Hub passes through PVI before it reaches the test plan generator. CLONER fn #9 runs PVI via ephemeral K6/Playwright probe pods, in a 3-stage cascade:
Stage 1 — HEAD probe (cheap, ~50ms)
is the URL alive at all?
internet-direct via CLONER egress (NOT bench data plane)
drop URL if 4xx/5xx
Stage 2 — TLS handshake (~200ms)
cert valid? SNI match? TLS version negotiated?
drop URL if cert chain rejected
Stage 3 — Full HTTP fetch (~500ms-3s)
navigate via PW; capture protocol, response size, CSP
drop URL if heuristic fails (e.g. requires login wall)
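The cascade's short-circuit behavior can be sketched as follows. This is a minimal illustration, not the CLONER fn #9 implementation: the type names, `runCascade`, and the `check` helper are hypothetical, and the probes are synchronous stubs fed from fake data, whereas real K6/Playwright probes would be asynchronous network calls.

```typescript
// Hypothetical types for illustration; none of these names come from the PURE scaffolds.
type StageResult = { ok: true } | { ok: false; reason: string };
type Stage = { name: string; run: (url: string) => StageResult };
type Verdict =
  | { url: string; kept: true }
  | { url: string; kept: false; droppedAt: string; reason: string };

// Run the stages in order and stop at the first failure, so the costly
// full fetch only runs for URLs that passed the cheaper probes.
function runCascade(url: string, stages: Stage[]): Verdict {
  for (const stage of stages) {
    const result = stage.run(url);
    if (!result.ok) {
      return { url, kept: false, droppedAt: stage.name, reason: result.reason };
    }
  }
  return { url, kept: true };
}

// Build a stage from a check that returns a drop reason, or null to pass.
function check(name: string, dropReason: (url: string) => string | null): Stage {
  return {
    name,
    run: (url) => {
      const reason = dropReason(url);
      return reason === null ? { ok: true } : { ok: false, reason };
    },
  };
}

// Stub probe data standing in for real network results.
const headStatus = new Map<string, number>([
  ["https://a.example/", 200],
  ["https://b.example/", 404],
  ["https://c.example/", 200],
]);
const rejectedCerts = new Set<string>(["https://c.example/"]);

const pviStages: Stage[] = [
  // Stage 1: drop on 4xx/5xx (an unknown URL is treated as unreachable).
  check("head", (url) => {
    const status = headStatus.get(url);
    if (status === undefined) return "unreachable";
    return status >= 400 ? `HTTP ${status}` : null;
  }),
  // Stage 2: drop when the certificate chain is rejected.
  check("tls", (url) => (rejectedCerts.has(url) ? "cert chain rejected" : null)),
  // Stage 3: heuristic check after a full fetch; this stub always passes.
  check("fetch", () => null),
];

const verdicts = ["https://a.example/", "https://b.example/", "https://c.example/"]
  .map((url) => runCascade(url, pviStages));
// a is kept, b is dropped at "head" (HTTP 404), c is dropped at "tls".
```

Ordering the stages from cheapest to most expensive means a dead URL costs one HEAD probe rather than a full Playwright navigation.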
K-anonymity ≥ 10 is enforced across the PVI batch: URLs with fewer than 10 distinct customer Syslog mentions are dropped (privacy protection).
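A minimal sketch of the k-anonymity filter, under one stated assumption: the ADR does not define what makes two mentions "distinct", so the hypothetical `sourceId` field below stands in for that key. `Mention` and `kAnonymousUrls` are illustrative names, not part of the PURE scaffolds.

```typescript
// One observation of a URL in ingested logs. `sourceId` is a hypothetical
// field standing in for whatever makes two mentions "distinct".
type Mention = { url: string; sourceId: string };

// Keep only URLs seen from at least k distinct sources.
function kAnonymousUrls(mentions: Mention[], k: number = 10): string[] {
  const sourcesByUrl = new Map<string, Set<string>>();
  for (const mention of mentions) {
    let sources = sourcesByUrl.get(mention.url);
    if (sources === undefined) {
      sources = new Set<string>();
      sourcesByUrl.set(mention.url, sources);
    }
    sources.add(mention.sourceId);
  }
  const result: string[] = [];
  sourcesByUrl.forEach((sources, url) => {
    if (sources.size >= k) result.push(url);
  });
  return result;
}

// 10 mentions from 10 distinct sources: kept.
// 25 mentions from only 3 distinct sources: dropped.
const sample: Mention[] = [];
for (let i = 0; i < 10; i++) sample.push({ url: "https://popular.example/", sourceId: `src-${i}` });
for (let i = 0; i < 25; i++) sample.push({ url: "https://rare.example/", sourceId: `src-${i % 3}` });

const keptUrls = kAnonymousUrls(sample);
// keptUrls contains only "https://popular.example/"
```

Counting distinct sources rather than raw mentions matters here: a URL hit many times by a handful of sources is exactly the kind of identifying entry the filter is meant to drop.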
PVP: Pre-flight Validation, Pre-test
Immediately before each PURE run, PVP checks the DUT-delta scope: which URLs the DUT can plausibly inspect, as opposed to URLs already cached by the customer's CDN, which the DUT will never see.
PIE-PA: 3-layer defense for PURE in production
MANDATORY in production DOM mode (per ADR 0014). PURE replay against
production DUTs would otherwise risk MITM because personas serve
real public IPs:
Layer 1 — Pod scale-to-0
Bench persona pods (172.19.0.0/16 + 10.1.0.0/16 + 10.2.0.0/16)
scaled to 0 replicas. No bench artifact answering on real public IPs.
Layer 2 — BGP withdraw
Bench BGP advertisements (synthetic prefixes from MÓDULO BGP-1..4)
withdrawn from upstream peers. No path to bench from real Internet.
Layer 3 — DNS sanity check
External resolver (8.8.8.8) queried for each PURE URL.
Result MUST resolve to real-world IP, NOT bench (200.130.x.x).
If ANY URL resolves to bench → abort production PURE run.
DDPB chain layer 5 (RELAY pre-flight) gates production PURE runs on all 3 layers being green.
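The Layer 3 check and the all-green gate can be sketched as below. This is an illustration under stated assumptions, not the code in pkg/pie-pa-executor/: the layer names and function names are hypothetical, "200.130.x.x" is read as 200.130.0.0/16, only IPv4 is handled, a host that fails to resolve is treated as red, and the resolver is injected as a stub instead of querying 8.8.8.8.

```typescript
type LayerStatus = { layer: string; green: boolean; detail: string };

// Bench range as written in this ADR (200.130.x.x), read here as 200.130.0.0/16.
function isBenchAddress(ip: string): boolean {
  const octets = ip.split(".");
  return octets.length === 4 && octets[0] === "200" && octets[1] === "130";
}

// Layer 3: every PURE host must resolve, and never to a bench address.
// `resolve` is injected; in practice it would query an external resolver.
function dnsSanity(hosts: string[], resolve: (host: string) => string[]): LayerStatus {
  for (const host of hosts) {
    const ips = resolve(host);
    if (ips.length === 0) {
      return { layer: "dns-sanity", green: false, detail: `${host} did not resolve` };
    }
    const bench = ips.filter(isBenchAddress);
    if (bench.length > 0) {
      return { layer: "dns-sanity", green: false, detail: `${host} resolves to bench address ${bench[0]}` };
    }
  }
  return { layer: "dns-sanity", green: true, detail: "all hosts resolve off-bench" };
}

// Hypothetical layer names. The gate fails closed: a layer with no green
// report counts as red, so a missing report can never open the gate.
const REQUIRED_LAYERS = ["pod-scale-to-0", "bgp-withdraw", "dns-sanity"];

function pieGate(reports: LayerStatus[]): { proceed: boolean; failed: string[] } {
  const failed = REQUIRED_LAYERS.filter(
    (name) => !reports.some((r) => r.layer === name && r.green),
  );
  return { proceed: failed.length === 0, failed };
}

// Stub resolver: one host has been pointed at the bench range.
const fakeDns = (host: string): string[] =>
  host === "hijacked.example" ? ["200.130.4.7"] : ["203.0.113.10"];

const layer3 = dnsSanity(["shop.example", "hijacked.example"], fakeDns);
const decision = pieGate([
  { layer: "pod-scale-to-0", green: true, detail: "0 replicas" },
  { layer: "bgp-withdraw", green: true, detail: "prefixes withdrawn" },
  layer3,
]);
// decision.proceed is false and decision.failed is ["dns-sanity"]
```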
CLONER fn count: 9 (was 7)
PURE introduces 2 new CLONER functions:
- fn #8: Cloud proxy (used by OBP for semi-air-gap operator Internet egress)
- fn #9: PVI orchestrator (spins up K6/PW probe pods, drives the 3-stage validation cascade)
Updated CLONER catalog (per discuss_cloner_platform_2026_05_08):
1. Web clone
2. NTP server
3. Feedback channel
4. Catalog refresh
5. API discovery
6. Patch + TopURL fetch
7. Upgrade channel poll
8. Cloud proxy (NEW)
9. PVI orchestrator (NEW)
Architecture
Discovery → PVI → Test Plan flow
operator → Discovery Hub UI
→ picks source(s) A/B/C/D/E/F/G/H
→ batch URL list emitted
↓
CLONER fn #9 — PVI orchestrator
├ Stage 1 HEAD probe
├ Stage 2 TLS handshake
└ Stage 3 full HTTP fetch
↓
k-anonymity ≥ 10 filter
↓
Test Plan generator (test_kind=pure)
PURE run → PIE-PA gate (production mode)
operator → "run PURE" → DOM-aware check
→ if production:
├ PIE-PA Layer 1 (pod scale-to-0)
├ PIE-PA Layer 2 (BGP withdraw)
└ PIE-PA Layer 3 (DNS sanity)
all green? → run
any red? → abort + audit
→ PVP DUT-delta scope check
→ execute (replay URLs via PW/K6 agents → DUT)
→ restore (pods up, BGP re-advertise)
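The run flow above can be sketched as a single function with injected dependencies. All names here (`RunDeps`, `runPure`, and its fields) are hypothetical. One assumption goes beyond the diagram: the sketch restores the bench in a `finally` block, so pods and BGP come back even when the gate aborts or the replay throws, whereas the ADR only shows the restore step after a successful execute.

```typescript
// Hypothetical dependency bundle; each field stands in for a real subsystem.
type RunDeps = {
  isProduction: boolean;
  applyPiePa: () => boolean; // scale pods to 0, withdraw BGP, check DNS; true when all green
  restoreBench: () => void; // pods up, BGP re-advertise
  pvpScope: (urls: string[]) => string[]; // DUT-delta scope check
  replay: (urls: string[]) => void; // PW/K6 agents replay URLs through the DUT
  audit: (event: string) => void;
};

function runPure(urls: string[], deps: RunDeps): "completed" | "aborted" {
  if (!deps.isProduction) {
    // Outside production mode the PIE-PA gate does not apply.
    deps.replay(deps.pvpScope(urls));
    return "completed";
  }
  try {
    if (!deps.applyPiePa()) {
      deps.audit("pie-pa gate red: run aborted");
      return "aborted";
    }
    deps.replay(deps.pvpScope(urls));
    return "completed";
  } finally {
    // Assumption: always restore, including on abort or on a replay error.
    deps.restoreBench();
  }
}

// Simulate a production run where one PIE-PA layer is red.
const events: string[] = [];
const outcome = runPure(["https://shop.example/"], {
  isProduction: true,
  applyPiePa: () => { events.push("pie-pa"); return false; },
  restoreBench: () => { events.push("restore"); },
  pvpScope: (urls) => urls,
  replay: () => { events.push("replay"); },
  audit: (event) => { events.push("audit: " + event); },
});
// outcome is "aborted"; events records pie-pa, the audit entry, then restore, and never replay.
```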
Consequences
Pros
- Test Kind #7 answers the customer request for real production URL replay
- 8-source Discovery Hub = differentiator vs single-source competitors
- PIE-PA = production-safe (patent moat)
- Cross-vendor URL parsing (Forti / PAN / CKP / Sophos) = breadth
- Patent claims #6..#13 cover the PURE family
Cons / risks
- PIE-PA layer 1 (pod scale-to-0) means bench is unavailable for other tests during a production PURE run (acceptable trade-off)
- Curated list refresh requires Internet (CLONER fn #4) — air-gap benches fall back to bundled Tranco snapshot
- K-anonymity ≥ 10 may eliminate small-customer-specific URLs (by design — privacy over completeness)
Compatibility
- Air-gap deployments: source E is limited to the bundled curated snapshot; sources A/B/F/G remain available with operator-supplied data
- Pre-Wave-4 deployments: Discovery Hub unavailable; manual URL list upload supported as fallback
References
- Memory: discuss_pure_real_url_replay_2026_05_10.md
- Memory: discuss_oobi_immutable_gateway_art_2026_05_10.md (3 trust zones interplay with PVI internet egress)
- Code: dashboard/src/lib/pure/ (PURE-1..10 scaffolds, Wave 4-5)
- ADR cross-ref: 0014 (DOM modes: PIE-PA gates), 0019 (OOBI trust zones: CLONER egress trust), 0018 (Compliance: k-anonymity)
- Patent claims: #6..#13