PURE — Production URL Replay Engine — primer
Help Center primer for the Production URL Replay Engine (Test Kind #7). Pairs with ADR 0021.
What it does
Replays real production URLs from your environment through the device under test (DUT), so you can validate that the DUT correctly inspects, classifies, and forwards the traffic patterns your real users actually generate.
Synthetic test kinds test what we think your traffic looks like; PURE tests what your traffic actually looks like.
Discovery Hub — where the URLs come from
PURE has 8 sources for URL ingestion:
| # | Source | Best when |
|---|---|---|
| A | Syslog | NGFW already in production with verbose URL logging |
| B | Vendor API | PAN Cortex / Cisco SCC available |
| C | PCAP | offline session captures (lab forensics) |
| D | HAR | browser-recorded user session |
| E | Curated | air-gap deployments — pre-bundled Tranco / Umbrella / Majestic |
| F | SPAN | SPAN.Art live mirror — richest source |
| G | Cloud-derived | OBP cloud-egress observations |
| H | KALI nmap import | operator scan results |
You can pick one source or combine many. Each ingested URL goes through PVI before reaching the test plan.
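As an illustration of combining sources, the sketch below pulls request URLs out of a HAR capture (Source D) and merges them with a second batch while recording which source reported each URL. The HAR layout (`log.entries[].request.url`) follows the public HAR format; the function names, the `merge_sources` structure, and the sample data are hypothetical and are not part of PURE's actual ingestion API.

```python
import json
from collections import defaultdict


def urls_from_har(har_text: str) -> list[str]:
    """Return the request URLs found in a HAR document."""
    har = json.loads(har_text)
    return [entry["request"]["url"] for entry in har["log"]["entries"]]


def merge_sources(batches: dict[str, list[str]]) -> dict[str, set[str]]:
    """Map each URL to the set of source letters that reported it."""
    provenance: dict[str, set[str]] = defaultdict(set)
    for source, urls in batches.items():
        for url in urls:
            provenance[url].add(source)
    return dict(provenance)


# Hypothetical sample: a two-entry HAR capture plus one URL seen in Syslog.
sample_har = json.dumps({
    "log": {"entries": [
        {"request": {"method": "GET", "url": "https://example.com/a"}},
        {"request": {"method": "GET", "url": "https://example.com/b"}},
    ]}
})
merged = merge_sources({
    "D": urls_from_har(sample_har),
    "A": ["https://example.com/a"],  # stand-in for URLs parsed from Syslog
})
print(sorted(merged["https://example.com/a"]))  # ['A', 'D']
```

Keeping the provenance set per URL makes it easy to see which sources overlap before the merged set is handed to PVI.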
PVI — Pre-flight Validation, Ingestion-time
Every ingested URL gets validated by CLONER fn #9 in 3 stages before it enters the test plan:
Stage 1 — HEAD probe (50ms) ← drops dead URLs (4xx/5xx)
Stage 2 — TLS handshake (200ms) ← drops cert errors / SNI mismatch
Stage 3 — Full HTTP fetch (500-3000ms) ← drops login-walls, etc.
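The staged drop logic above can be sketched as a short-circuiting pipeline: each URL moves to the next stage only if the previous one passed. This is a minimal illustration, not CLONER's implementation. The `StageSpec` and `run_pvi` names are hypothetical, the stage checks are injected as plain functions so the sketch runs without network access, and treating the listed durations as per-stage timeouts is an assumption.

```python
from dataclasses import dataclass
from typing import Callable, Iterable

# A stage check takes a URL and a timeout in seconds; True means "keep".
Check = Callable[[str, float], bool]


@dataclass
class StageSpec:
    name: str
    timeout_s: float
    check: Check


def run_pvi(urls: Iterable[str], stages: list[StageSpec]) -> tuple[list[str], dict[str, str]]:
    """Run every URL through the stages in order, stopping at the first failure."""
    kept: list[str] = []
    dropped: dict[str, str] = {}  # url -> name of the stage that dropped it
    for url in urls:
        for stage in stages:
            try:
                ok = stage.check(url, stage.timeout_s)
            except Exception:
                ok = False  # an error such as a timeout counts as a drop
            if not ok:
                dropped[url] = stage.name
                break
        else:
            kept.append(url)  # no stage rejected the URL
    return kept, dropped


# Fake checks standing in for the real probes (hypothetical test data).
dead = {"https://dead.example/"}
bad_tls = {"https://badcert.example/"}
login_wall = {"https://login.example/"}
stages = [
    StageSpec("head_probe", 0.05, lambda url, t: url not in dead),
    StageSpec("tls_handshake", 0.2, lambda url, t: url not in bad_tls),
    StageSpec("full_fetch", 3.0, lambda url, t: url not in login_wall),
]
kept, dropped = run_pvi(
    ["https://ok.example/", "https://dead.example/",
     "https://badcert.example/", "https://login.example/"],
    stages,
)
print(kept)     # ['https://ok.example/']
print(dropped)
```

Ordering the stages from cheapest to most expensive means most dead URLs never reach the full fetch.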
K-anonymity ≥ 10 is enforced: URLs mentioned in fewer than 10 distinct customer Syslog records are dropped. This is a privacy protection; single-customer URL leakage is unacceptable.
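A minimal sketch of the threshold rule, assuming each observation is a `(url, record_id)` pair and that "distinct" means distinct record identifiers. If the intended unit is distinct customers, the same function applies with a customer ID in place of the record ID. The function name and data shape are hypothetical.

```python
from collections import defaultdict
from typing import Iterable


def k_anonymous(observations: Iterable[tuple[str, str]], k: int = 10) -> set[str]:
    """Return the URLs seen in at least k distinct records."""
    seen: dict[str, set[str]] = defaultdict(set)
    for url, record_id in observations:
        seen[url].add(record_id)
    return {url for url, ids in seen.items() if len(ids) >= k}


# Hypothetical data: one URL seen in 10 distinct records, one in only 9.
observations = [("https://shared.example/", f"rec-{i}") for i in range(10)]
observations += [("https://rare.example/", f"rec-{i}") for i in range(9)]
observations.append(("https://rare.example/", "rec-0"))  # repeat, not counted twice
surviving = k_anonymous(observations)
print(surviving)  # {'https://shared.example/'}
```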
PVP — Pre-flight Validation, Pre-test
Right before each PURE run, PVP scopes the URL set to what the DUT can plausibly inspect, excluding URLs that the customer's CDN has already cached and that the DUT therefore never sees.
PIE-PA — production safety (MANDATORY)
If the DUT mode is production (per DOM ADR 0014), PIE-PA's 3-layer defense MUST pass before PURE will run:
- Pod scale-to-0 — bench persona pods (172.19.0.0/16 + 10.1.0.0/16 + 10.2.0.0/16) scaled to 0 replicas. No bench artifact answering on real public IPs.
- BGP withdraw — bench BGP advertisements (synthetic prefixes from MÓDULO BGP-1..4) withdrawn from upstream peers. No path to bench from real Internet.
- DNS sanity — external resolver (8.8.8.8) queried for each PURE URL. Result MUST resolve to real-world IP, NOT bench (200.130.x.x). If ANY URL resolves to bench → abort + audit.
Why? Bench personas serve real public IPs (200.130.x.x range). Without PIE-PA, replaying real customer URLs to the bench while the DUT inspects them creates a MITM-class risk: real users could hit bench artifacts thinking they're production.
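The DNS sanity layer can be sketched as follows. This is an illustration under stated assumptions: "200.130.x.x" is read as `200.130.0.0/16`, the resolver is injected as a function (a real check would query the external resolver rather than the fake table used here), an empty answer is treated as a failure, and all names are hypothetical.

```python
import ipaddress
from typing import Callable, Iterable
from urllib.parse import urlsplit

BENCH_NET = ipaddress.ip_network("200.130.0.0/16")  # assumed reading of 200.130.x.x

Resolver = Callable[[str], list[str]]  # hostname -> list of IP address strings


class DnsSanityError(Exception):
    """Raised when a URL fails the DNS sanity layer."""


def dns_sanity(urls: Iterable[str], resolve: Resolver) -> None:
    """Abort on the first URL that resolves to the bench or not at all."""
    for url in urls:
        host = urlsplit(url).hostname
        if host is None:
            raise DnsSanityError(f"no hostname in {url}")
        answers = resolve(host)
        if not answers:
            raise DnsSanityError(f"{host} did not resolve")
        for ip in answers:
            if ipaddress.ip_address(ip) in BENCH_NET:
                raise DnsSanityError(f"{url} resolves to bench address {ip}")


# Fake resolver table (hypothetical); 203.0.113.0/24 is a documentation range.
table = {"good.example": ["203.0.113.10"], "leaky.example": ["200.130.4.7"]}


def fake_resolve(host: str) -> list[str]:
    return table.get(host, [])


dns_sanity(["https://good.example/page"], fake_resolve)  # passes silently
try:
    dns_sanity(["https://good.example/", "https://leaky.example/x"], fake_resolve)
    leak_detected = False
except DnsSanityError as err:
    leak_detected = True
    print("abort:", err)
```

Raising on the first offending URL mirrors the "if ANY URL resolves to bench, abort" rule; the audit step is left out of the sketch.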
Common workflows
Routine pre-deployment validation
- Ingest URLs from SYSLOG.Art Source A
- Wait for PVI to settle (a few minutes for ~10K URLs)
- Switch DUT mode to staging
- Run PURE
- Review report — any URL the DUT classified differently than production reference flags here
Air-gap monthly check
- Ingest URLs from Source E (bundled Tranco snapshot)
- PVI runs against the bundled set (no Internet needed)
- Run PURE in lab mode
- Review
Production validation (advanced)
- Confirm DUT mode is production
- Click "Run PURE"; the system enforces the PIE-PA gate
- Wait for all 3 PIE-PA layers to go green (bench pods scale-to-0, BGP withdraw, DNS sanity)
- PURE runs through bench → DUT → real-world destinations
- Restore: pods scale up, BGP re-advertise
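The production sequence above can be sketched as a gate followed by a run, with the restore step in a `finally` block so it happens even if a layer fails or the run raises. Running restore on every exit path is a design assumption of this sketch, not documented PURE behavior, and every name below is hypothetical.

```python
from typing import Callable

Check = Callable[[], bool]


def run_production_pure(layers: list[tuple[str, Check]],
                        run_pure: Callable[[], str],
                        restore: Callable[[], None]) -> str:
    """Run PURE only if every gate layer passes; always restore afterwards."""
    try:
        for name, check in layers:
            if not check():
                raise RuntimeError(f"PIE-PA layer not green: {name}")
        return run_pure()
    finally:
        restore()  # pods scale up, BGP re-advertise


events: list[str] = []  # records the order in which steps ran


def layer(name: str, ok: bool = True) -> tuple[str, Check]:
    """Build a fake gate layer that logs its name and returns a fixed result."""
    def check() -> bool:
        events.append(name)
        return ok
    return name, check


def do_run() -> str:
    events.append("pure_run")
    return "report"


def do_restore() -> None:
    events.append("restore")


result = run_production_pure(
    [layer("pod_scale_to_0"), layer("bgp_withdraw"), layer("dns_sanity")],
    run_pure=do_run,
    restore=do_restore,
)
print(events)
```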
Common questions
Why 8 sources instead of 1? Different sources expose different URL spaces. Syslog catches what the DUT already saw; SPAN catches what's on the wire (richer). Combining gives the most complete picture.
Can I disable PIE-PA in production? No. The 3 layers are hard-coded as the production gate. If you genuinely need to bypass, switch the DUT to prod-partition mode and use the explicit unlock window, but every byte will be audit-logged.
My PVI batch dropped 80% of URLs. Is that normal? PVI is intentionally aggressive. Real production URL sets are noisy (bots, CDN-cached responses, ad networks). An 80% drop rate on a fresh ingestion is within the expected range; if you want broader coverage, lower the k-anonymity threshold (default 10).
How big can a URL set be? Practical max ~100K URLs per run. Beyond that PVI becomes the bottleneck (each URL takes ~500ms-3s in stage 3).
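A back-of-the-envelope estimate shows why stage 3 dominates at this scale. The sketch assumes a fixed number of parallel workers and a uniform per-URL latency. PVI's real concurrency is not stated in this primer, so the worker counts below are purely illustrative.

```python
import math


def stage3_wall_clock_s(n_urls: int, per_url_s: float, concurrency: int) -> float:
    """Wall-clock seconds for stage 3 with `concurrency` equal workers."""
    return math.ceil(n_urls / concurrency) * per_url_s


for workers in (1, 100):
    low = stage3_wall_clock_s(100_000, 0.5, workers)
    high = stage3_wall_clock_s(100_000, 3.0, workers)
    print(f"concurrency={workers}: {low / 3600:.1f} h to {high / 3600:.1f} h")
# concurrency=1: 13.9 h to 83.3 h
# concurrency=100: 0.1 h to 0.8 h
```

Run sequentially, 100K URLs would take between roughly 14 and 83 hours in stage 3 alone, so any practical run depends on substantial parallelism.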
Related
- ADR 0021 — design lock
- DOM modes primer
- SPAN.Art primer — Discovery Hub Source F