
PURE — Production URL Replay Engine — primer

Help Center primer for the Production URL Replay Engine (Test Kind #7). Pairs with ADR 0021.

What it does

Replays real production URLs from your environment through the DUT (device under test), so you can validate that the DUT correctly inspects, classifies, and forwards the traffic patterns your real users actually generate.

Unlike the synthetic test kinds, which test what we think your traffic looks like, PURE tests what your traffic actually looks like.

Discovery Hub — where the URLs come from

PURE has 8 sources for URL ingestion:

#  Source            Best when
A  Syslog            NGFW already in production with verbose URL logging
B  Vendor API        PAN Cortex / Cisco SCC available
C  PCAP              Offline session captures (lab forensics)
D  HAR               Browser-recorded user session
E  Curated           Air-gap deployments (pre-bundled Tranco / Umbrella / Majestic)
F  SPAN              SPAN.Art live mirror (richest source)
G  Cloud-derived     OBP cloud-egress observations
H  KALI nmap import  Operator scan results

You can pick one source or combine many. Each ingested URL goes through PVI before reaching the test plan.
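Combining sources means the same URL will often arrive from several of them. A minimal sketch of that merge step, assuming each source delivers a plain list of URL strings; the normalization rules and the names `normalize` and `merge_sources` are illustrative, not PURE's actual ingestion code:

```python
from collections import defaultdict
from urllib.parse import urlsplit, urlunsplit

def normalize(url: str) -> str:
    """Canonicalize a URL so trivial variants dedupe to one entry.

    Lower-cases the scheme and authority, defaults an empty path to "/",
    and drops the fragment (fragments are never sent to the server).
    """
    parts = urlsplit(url.strip())
    return urlunsplit((parts.scheme.lower(), parts.netloc.lower(),
                       parts.path or "/", parts.query, ""))

def merge_sources(batches: dict[str, list[str]]) -> dict[str, set[str]]:
    """Merge per-source URL lists keyed by source letter (A-H).

    Returns normalized URL -> set of sources that reported it, so a URL seen
    by both Syslog (A) and SPAN (F) is replayed once but keeps its provenance.
    """
    merged: dict[str, set[str]] = defaultdict(set)
    for source, urls in batches.items():
        for url in urls:
            merged[normalize(url)].add(source)
    return dict(merged)

merged = merge_sources({
    "A": ["https://Example.com/login", "https://example.com/login#top"],
    "F": ["https://example.com/login", "https://cdn.example.net/app.js"],
})
# merged == {"https://example.com/login": {"A", "F"},
#            "https://cdn.example.net/app.js": {"F"}}
```

Keeping the source set per URL lets a report show which discovery path surfaced a URL without replaying it twice.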

PVI — Pre-flight Validation, Ingestion-time

Every ingested URL gets validated by CLONER fn #9 in 3 stages before it enters the test plan:

Stage 1 — HEAD probe (50ms)        ← drops dead URLs (4xx/5xx)
Stage 2 — TLS handshake (200ms)    ← drops cert errors / SNI mismatch
Stage 3 — Full HTTP fetch (500-3000ms) ← drops login-walls, etc.
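The stages are ordered from cheapest to most expensive, which suggests each URL stops at its first failure. A minimal sketch of that short-circuit pipeline under that assumption; the stage functions here are fakes, and `run_pvi` and `PviResult` are hypothetical names, not CLONER's interface:

```python
from dataclasses import dataclass, field
from typing import Callable, Optional

# A stage inspects one URL and returns None to pass it, or a drop reason.
Stage = Callable[[str], Optional[str]]

@dataclass
class PviResult:
    kept: list[str] = field(default_factory=list)
    dropped: dict[str, str] = field(default_factory=dict)  # url -> "STAGE: reason"

def run_pvi(urls: list[str], stages: list[tuple[str, Stage]]) -> PviResult:
    """Run stages in order; a URL stops at its first failure, so the
    expensive full fetch is spent only on URLs that passed HEAD and TLS."""
    result = PviResult()
    for url in urls:
        for name, stage in stages:
            reason = stage(url)
            if reason is not None:
                result.dropped[url] = f"{name}: {reason}"
                break
        else:  # no stage dropped the URL
            result.kept.append(url)
    return result

# Fake stages for illustration; real ones would issue a HEAD request, a TLS
# handshake with SNI, and a full GET within the per-stage time budgets.
status = {"https://a.example/": 200, "https://gone.example/": 404,
          "https://login.example/": 200}
stages = [
    ("HEAD", lambda u: None if status[u] < 400 else f"HTTP {status[u]}"),
    ("TLS", lambda u: None),
    ("FETCH", lambda u: "login wall" if "login" in u else None),
]
res = run_pvi(list(status), stages)
# res.kept    == ["https://a.example/"]
# res.dropped == {"https://gone.example/": "HEAD: HTTP 404",
#                 "https://login.example/": "FETCH: login wall"}
```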

K-anonymity ≥ 10 is enforced: URLs that appear in Syslog records from fewer than 10 distinct customers are dropped. This is a privacy protection; leaking a URL specific to a single customer is unacceptable.
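A minimal sketch of the k-anonymity filter, assuming "distinct" counts customers rather than individual log lines, and that Syslog has already been parsed into `(customer_id, url)` pairs; `k_anonymous_urls` is a hypothetical helper:

```python
from collections import defaultdict
from typing import Iterable

K_MIN = 10  # default threshold from this primer

def k_anonymous_urls(records: Iterable[tuple[str, str]], k: int = K_MIN) -> set[str]:
    """Keep URLs observed for at least k distinct customers.

    Repeated hits from one customer count once, so with k = 10 a URL logged
    by only one customer never passes, however often it was logged there.
    """
    customers_per_url: dict[str, set[str]] = defaultdict(set)
    for customer_id, url in records:
        customers_per_url[url].add(customer_id)
    return {url for url, custs in customers_per_url.items() if len(custs) >= k}

records = [(f"cust{i}", "https://shared.example/") for i in range(12)]
records += [("cust0", "https://private.example/")] * 50
kept = k_anonymous_urls(records)
# kept == {"https://shared.example/"}
```

Counting customers with a set, rather than counting records, is what makes the 50 repeated hits from `cust0` irrelevant.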

PVP — Pre-flight Validation, Pre-test

Immediately before each PURE run, PVP narrows the URL set to URLs the DUT can plausibly inspect, excluding URLs the customer's CDN has already cached (the DUT never sees those requests).

PIE-PA — production safety (MANDATORY)

If the DUT mode is production (per DOM ADR 0014), PIE-PA's 3-layer defense MUST pass before PURE will run:

  1. Pod scale-to-0: bench persona pods (172.19.0.0/16, 10.1.0.0/16, 10.2.0.0/16) are scaled to 0 replicas, so no bench artifact answers on real public IPs.
  2. BGP withdraw: bench BGP advertisements (synthetic prefixes from MÓDULO BGP-1..4) are withdrawn from upstream peers, so there is no path to the bench from the real Internet.
  3. DNS sanity: each PURE URL is resolved through an external resolver (8.8.8.8). It MUST resolve to a real-world IP, NOT a bench IP (200.130.x.x). If ANY URL resolves to the bench, the run is aborted and audited.
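A minimal sketch of the layer 3 check, assuming "200.130.x.x" means 200.130.0.0/16 and conservatively also treating the layer 1 pod ranges as bench. Resolution is injected as a function because Python's `socket.getaddrinfo` uses the host's configured resolver, not necessarily 8.8.8.8; `dns_sanity` and `is_bench` are hypothetical names:

```python
import ipaddress
from typing import Callable, Iterable
from urllib.parse import urlsplit

# Bench ranges quoted in this primer; 200.130.x.x is read as a /16.
BENCH_NETWORKS = [ipaddress.ip_network(n) for n in (
    "200.130.0.0/16", "172.19.0.0/16", "10.1.0.0/16", "10.2.0.0/16")]

def is_bench(ip: str) -> bool:
    addr = ipaddress.ip_address(ip)
    # An IPv6 address is never "in" an IPv4 network, so AAAA answers are not flagged.
    return any(addr in net for net in BENCH_NETWORKS)

def dns_sanity(urls: Iterable[str],
               resolve: Callable[[str], list[str]]) -> list[tuple[str, str]]:
    """Return (host, ip) pairs that resolved into a bench range.

    An empty list means layer 3 passed; any entry means abort and audit.
    `resolve` should query an external resolver such as 8.8.8.8.
    """
    hosts = sorted({h for h in (urlsplit(u).hostname for u in urls) if h})
    return [(h, ip) for h in hosts for ip in resolve(h) if is_bench(ip)]

# Fake resolver answers for illustration (203.0.113.0/24 is a documentation range).
answers = {"www.example.com": ["203.0.113.10"], "shop.example.org": ["200.130.4.7"]}
bad = dns_sanity(["https://www.example.com/", "https://shop.example.org/cart"],
                 lambda host: answers[host])
# bad == [("shop.example.org", "200.130.4.7")]  -> abort + audit
```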

Why? Bench personas serve real public IPs (the 200.130.x.x range). Without PIE-PA, replaying real customer URLs through a production DUT while bench personas are still live creates a MITM-class risk: real users could reach bench artifacts and mistake them for production services.

Common workflows

Routine pre-deployment validation

  1. Ingest URLs from Source A (SYSLOG.Art)
  2. Wait for PVI to settle (a few minutes for ~10K URLs)
  3. Switch DUT mode to staging
  4. Run PURE
  5. Review the report: any URL the DUT classified differently from the production reference is flagged here
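The review step boils down to comparing two classifications per URL. A minimal sketch, assuming both the production reference and the DUT results are available as URL-to-category dictionaries; the `Mismatch` type, the `"<missing>"` marker, and the category strings are hypothetical:

```python
from dataclasses import dataclass

@dataclass(frozen=True)
class Mismatch:
    url: str
    production: str  # category from the production reference
    dut: str         # category the DUT under test assigned

def diff_classifications(reference: dict[str, str],
                         dut: dict[str, str]) -> list[Mismatch]:
    """Flag every URL whose DUT category differs from the reference.

    URLs the DUT never classified are reported with dut="<missing>".
    """
    flagged = []
    for url in sorted(reference):
        got = dut.get(url, "<missing>")
        if got != reference[url]:
            flagged.append(Mismatch(url, reference[url], got))
    return flagged

ref = {"https://a.example/": "news", "https://b.example/": "gambling",
       "https://c.example/": "social"}
got = {"https://a.example/": "news", "https://b.example/": "games"}
flags = diff_classifications(ref, got)
# flags == [Mismatch("https://b.example/", "gambling", "games"),
#           Mismatch("https://c.example/", "social", "<missing>")]
```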

Air-gap monthly check

  1. Ingest URLs from Source E (bundled Tranco snapshot)
  2. PVI runs against the bundled set (no Internet needed)
  3. Run PURE in lab mode
  4. Review
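Step 1 depends on the format of the bundled snapshot. A minimal sketch, assuming the snapshot is a two-column `rank,domain` CSV like the public Tranco list and that each domain is replayed as an `https://` root URL (both are assumptions; `urls_from_tranco` is a hypothetical helper):

```python
import csv
import io
from typing import Optional

def urls_from_tranco(csv_text: str, limit: Optional[int] = None) -> list[str]:
    """Turn "rank,domain" rows into https:// root URLs, in file order."""
    urls = []
    for row in csv.reader(io.StringIO(csv_text)):
        if len(row) < 2 or not row[0].strip().isdigit():
            continue  # skip blank lines and any header row
        urls.append(f"https://{row[1].strip()}/")
        if limit is not None and len(urls) >= limit:
            break
    return urls

snapshot = "1,a.example\n2,b.example\n\n3,c.example\n"
urls = urls_from_tranco(snapshot, limit=2)
# urls == ["https://a.example/", "https://b.example/"]
```

No network access is needed to build the list, which matches the air-gap constraint; PVI then validates it against the bundled set.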

Production validation (advanced)

  1. Confirm DUT mode is production
  2. Click "Run PURE"; the system enforces the PIE-PA gate
  3. Wait for all 3 PIE-PA layers to turn green (bench pods scaled to 0, BGP withdrawn, DNS sanity passed)
  4. PURE runs through bench → DUT → real-world destinations
  5. Restore: bench pods scale back up and bench BGP prefixes are re-advertised
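The important property of this workflow is that restore happens even when the gate aborts. A minimal sketch of that control flow using `try`/`finally`; every callable passed in is a hypothetical stand-in for the real pod, BGP, and DNS operations, and `scale_bench_pods` is assumed to block until the replica count is reached:

```python
from typing import Callable

class PiePaAbort(RuntimeError):
    """A PIE-PA layer failed, so PURE must not run."""

def run_pure_in_production(
    scale_bench_pods: Callable[[int], None],  # hypothetical: blocks until done
    set_bench_bgp: Callable[[bool], None],    # hypothetical: True = advertise
    dns_sanity_ok: Callable[[], bool],        # hypothetical: layer 3 result
    run_pure: Callable[[], None],
    audit: Callable[[str], None],
    bench_replicas: int,
) -> None:
    try:
        scale_bench_pods(0)        # layer 1: pod scale-to-0
        set_bench_bgp(False)       # layer 2: BGP withdraw
        if not dns_sanity_ok():    # layer 3: DNS sanity
            audit("PIE-PA layer 3 failed: a PURE URL resolved to the bench")
            raise PiePaAbort("DNS sanity failed")
        run_pure()                 # bench -> DUT -> real-world destinations
    finally:
        # Restore in the order the workflow lists, even after an abort.
        scale_bench_pods(bench_replicas)
        set_bench_bgp(True)
```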

Common questions

Why 8 sources instead of 1? Different sources expose different URL spaces. Syslog catches what the DUT already saw; SPAN catches what's on the wire (richer). Combining gives the most complete picture.

Can I disable PIE-PA in production? No. The 3 layers are hard-coded as the production gate. If you genuinely need to bypass it, switch the DUT to prod-partition mode and use the explicit unlock window; every byte will be audit-logged.

My PVI batch dropped 80% of URLs. Is that normal? Yes. PVI is intentionally aggressive, and real production URL sets are noisy (bots, CDN-cached responses, ad networks), so an 80% drop rate on a fresh ingestion is within the expected range. If you want broader coverage, lower the k-anonymity threshold (default 10).

How big can a URL set be? The practical maximum is about 100K URLs per run. Beyond that, PVI becomes the bottleneck: stage 3 alone takes ~500 ms to 3 s per URL.
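The arithmetic behind that limit is straightforward. A rough estimate, assuming every URL reaches stage 3 and that fetches parallelize perfectly; the concurrency value is hypothetical, since this primer does not state how many fetches PVI runs at once:

```python
def stage3_hours(n_urls: int, seconds_per_url: float, concurrency: int = 1) -> float:
    """Rough wall-clock hours for PVI stage 3.

    Counts only stage 3 and assumes every URL reaches it (URLs dropped by
    stages 1-2 never do), with perfectly parallel fetches.
    """
    if concurrency < 1:
        raise ValueError("concurrency must be >= 1")
    return n_urls * seconds_per_url / concurrency / 3600

# 100K URLs at the two ends of the stage-3 range, one fetch at a time:
low, high = stage3_hours(100_000, 0.5), stage3_hours(100_000, 3.0)  # ~13.9 h, ~83.3 h
# The same worst case with a hypothetical 100 concurrent fetches:
parallel = stage3_hours(100_000, 3.0, concurrency=100)              # ~0.83 h (50 min)
```

Time grows linearly with URL count, so doubling the set doubles stage-3 time at any fixed concurrency.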