ADR 0014 — DOM (DUT Operating Mode) Production-Safe Family
- Status: Accepted (formalized 2026-05-12 with v3.7.0 — scaffolds shipping in v3.7.0)
- Date: 2026-05-10
- Deciders: TLSStress.Art project
- Targets: v5.0 (Phase 1 Materialization scaffolds: DOM-1..10 already merged in Wave 5)
- Patent claim family: claims #1..#12 (DOM/OOBI/GATEWAY/RELAY)
Context
Through 2026-05 the bench grew from a lab-only TLS decryption tester to a multi-mode platform that customers want to point at production NGFWs. Production traffic is unforgiving: a misclick on a write-mode test can drop live customer flows.
Test types accumulated without a unifying gate:
- Branch Office (data plane)
- Inspection Profile (5 named profiles + custom 2^10)
- SDWAN/CoR (DIA + IPSec)
- BGP Saturation (control plane)
- MAC/ARP stress (L2 capacity)
- PURE Production URL Replay (post-Scope-Freeze)
Each has a different production blast radius. Today's safety is ad hoc ("the operator should know better"), which is insufficient for pilot customers.
Decision
Add a DOM (DUT Operating Mode) discriminator that gates every state-changing operation against the operator's declared mode for that DUT. 5 modes; behavior matrix below.
The 5 modes
| Mode | Description | Destructive ops? | DDPB chain enforced? |
|---|---|---|---|
| greenfield | New DUT, no production traffic | yes | no |
| staging | Pre-prod mirror | yes (warn-only) | partial (warn-only) |
| lab | Dedicated lab DUT | yes (warn-only) | partial |
| production | Live customer DUT | NO (all blocked) | YES (all 7 layers) |
| prod-partition | Operator-quarantined slice | partial (explicit unlock) | yes (bypass requires audit reason) |
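A minimal sketch of how CAE (DOM-2) might hold the "Destructive ops?" column of this table as data. The names `Mode`, `Decision`, and `decide` are hypothetical and not taken from the dashboard/src/lib/dom/ scaffolds. Treating non-destructive operations as always allowed, and mapping "warn-only" to a `WARN` decision, are assumptions the table does not spell out.

```python
from enum import Enum


class Mode(Enum):
    GREENFIELD = "greenfield"
    STAGING = "staging"
    LAB = "lab"
    PRODUCTION = "production"
    PROD_PARTITION = "prod-partition"


class Decision(Enum):
    ALLOW = "allow"
    WARN = "warn"                      # proceeds, but the operator is warned
    REQUIRE_UNLOCK = "require_unlock"  # blocked until an explicit unlock
    BLOCK = "block"                    # hard-block, no override


# "Destructive ops?" column of the mode table, keyed by mode.
DESTRUCTIVE_POLICY = {
    Mode.GREENFIELD: Decision.ALLOW,
    Mode.STAGING: Decision.WARN,
    Mode.LAB: Decision.WARN,
    Mode.PRODUCTION: Decision.BLOCK,
    Mode.PROD_PARTITION: Decision.REQUIRE_UNLOCK,
}


def decide(mode: Mode, destructive: bool, unlocked: bool = False) -> Decision:
    """Return the CAE decision for one operation on a DUT in `mode`."""
    if not destructive:
        return Decision.ALLOW  # assumption: the table only constrains destructive ops
    decision = DESTRUCTIVE_POLICY[mode]
    if decision is Decision.REQUIRE_UNLOCK and unlocked:
        return Decision.ALLOW  # prod-partition: explicit unlock lets the op through
    return decision
```

Keeping the matrix as a lookup table means adding a mode or tightening a column is a data change rather than new branching logic. Note that `unlocked` has no effect in production, which stays a hard-block.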
DOM family components
- DOM-1 — Discriminator: classifies the DUT into one of the 5 modes based on operator declaration + observed signals (live BGP sessions, syslog rate, persona test traffic absence).
- DOM-2 — CAE (Conditional Automation Engine): per-mode policy loader. Decides which operations are auto-approved vs require unlock vs hard-blocked.
- DOM-3 — PDD (Production Drift Detection): monitors for mode drift (e.g. the operator labeled the DUT lab but BGP sessions appear) and alerts.
- DOM-4 — CPOS schema: customizable per-test parameter overrides.
- DOM-5 — CPOS atomic 2-phase commit: cross-tier coordinated config push (TierAdapter interface: Personas, DUT, and Agents commit together or all roll back). 60s self-healing rollback.
- DOM-6 — Profile Templates v2.0: full scenario state (config + expectations + thresholds) shareable across teams.
- DOM-7 — PIE family: RSM (Replay Session Mode) + IR (Idempotent Restore) + HID (Health Indicator Dashboard) + AAE (Auto-Approve Engine).
- DOM-8 — DUT Permission Indicators: Grafana dashboards visualizing per-DUT mode + active locks + recent CPOS commits.
- DOM-9 — SPP (Smart Profile Predictor): ML cortex sidecar recommends profile adjustments based on past run telemetry.
- DOM-10 — DDPB (Defense-in-Depth Production Blocking): 7-layer chain enforcing the production-mode hard-blocks (UI gate + API middleware + DB constraint + K8s admission webhook + RELAY pre-flight + DUT-side BTO check + audit trail before-and-after).
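DDPB (DOM-10) can be read as a fail-closed chain: an operation proceeds only if every layer passes, and the first refusing layer stops it. The sketch below assumes each layer can be reduced to a boolean predicate over the operation, which simplifies things (the real audit layer records state before and after, and the layers live in different components). It also reports which layer blocked, one way to keep a misconfigured layer from producing a silent block. `Layer`, `ChainResult`, `run_ddpb_chain`, and `block_destructive_in_production` are hypothetical names.

```python
from dataclasses import dataclass
from typing import Callable, Dict, Optional, Sequence


@dataclass(frozen=True)
class Layer:
    name: str
    check: Callable[[Dict], bool]  # True when the operation may pass this layer


@dataclass(frozen=True)
class ChainResult:
    allowed: bool
    blocked_by: Optional[str] = None  # name of the first layer that refused
    reason: str = ""


# The seven DDPB layers, in the order the ADR lists them.
DDPB_LAYER_NAMES = (
    "ui_gate",
    "api_middleware",
    "db_constraint",
    "k8s_admission_webhook",
    "relay_preflight",
    "dut_bto_check",
    "audit_trail",
)


def run_ddpb_chain(layers: Sequence[Layer], op: Dict) -> ChainResult:
    """Evaluate layers in order; fail closed on a False result or an exception."""
    for layer in layers:
        try:
            passed = layer.check(op)
        except Exception as exc:  # a crashing layer blocks instead of being skipped
            return ChainResult(False, layer.name, f"layer raised: {exc!r}")
        if not passed:
            return ChainResult(False, layer.name, "layer refused the operation")
    return ChainResult(True)


def block_destructive_in_production(op: Dict) -> bool:
    # Hypothetical predicate: production mode refuses every destructive op.
    return not (op.get("mode") == "production" and op.get("destructive", False))


# Usage: a toy chain where every layer applies the same production predicate.
chain = [Layer(name, block_destructive_in_production) for name in DDPB_LAYER_NAMES]
```

Because every layer in the toy chain refuses the same way, a blocked production op reports `ui_gate`; in a real deployment the later layers matter precisely when an earlier one is bypassed or misconfigured.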
PIE-PA — special case for PURE in production
Per ADR 0021 (PURE), PIE-PA is the 3-layer defense MANDATORY in production mode, mitigating the MITM risk from personas hosting real public IPs (200.130.x.x):
- Pod scale-to-0 (bench personas down)
- BGP withdraw (bench prefixes withdrawn from upstream)
- DNS sanity check (real DNS resolves to real IPs, not bench)
DDPB chain layer 5 (RELAY pre-flight) gates PURE production runs on all 3 PIE-PA layers being green.
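The layer-5 gate can be sketched as a pure function over the three PIE-PA signals. This sketch assumes each layer's status has already been collected elsewhere. The DNS check reads 200.130.x.x as a single bench prefix 200.130.0.0/16, which is an interpretation of the ADR's wildcard rather than a confirmed value, and letting non-production runs skip PIE-PA is likewise an assumption. All names are hypothetical.

```python
import ipaddress
from dataclasses import dataclass
from typing import Iterable, List, Sequence

# Assumption: "200.130.x.x" read as one /16 bench prefix.
BENCH_PREFIXES = (ipaddress.ip_network("200.130.0.0/16"),)


def dns_is_sane(resolved_ips: Iterable[str],
                bench_prefixes: Sequence = BENCH_PREFIXES) -> bool:
    """Layer 3: real DNS must return at least one IP, and none inside a bench prefix."""
    addrs = [ipaddress.ip_address(ip) for ip in resolved_ips]
    if not addrs:
        return False
    return not any(addr in net for addr in addrs for net in bench_prefixes)


@dataclass(frozen=True)
class PiePaStatus:
    personas_scaled_to_zero: bool   # layer 1: bench persona pods at 0 replicas
    bench_prefixes_withdrawn: bool  # layer 2: BGP withdraw confirmed upstream
    dns_sane: bool                  # layer 3: e.g. the result of dns_is_sane()


def pie_pa_failures(status: PiePaStatus) -> List[str]:
    """Names of PIE-PA layers that are not green."""
    checks = [
        ("pod scale-to-0", status.personas_scaled_to_zero),
        ("BGP withdraw", status.bench_prefixes_withdrawn),
        ("DNS sanity check", status.dns_sane),
    ]
    return [name for name, ok in checks if not ok]


def relay_preflight_allows_pure(mode: str, status: PiePaStatus) -> bool:
    if mode != "production":
        return True  # assumption: PIE-PA is only mandatory in production mode
    return not pie_pa_failures(status)  # all 3 layers must be green
```

Returning the list of failing layers, rather than a bare boolean, gives the RELAY pre-flight something concrete to show the operator when a PURE run is refused.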
Architecture
Mode flow
operator → declares mode for DUT
   ↓
DOM-1 Discriminator
   ↓
CAE (DOM-2) loads policy for that mode
   ↓
every state-changing op → check policy
                        → maybe require unlock
                        → maybe block
                        → always emit audit event
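The flow above amounts to a default-deny gate that audits every decision. A minimal sketch, assuming the operator's declared mode is taken at face value (DOM-1's signal-based classification is left out) and using hypothetical operation names `push_config` and `reboot_dut`. `ModeGate`, `POLICIES`, and `AuditEvent` are illustrative names, not the scaffold API.

```python
import time
from dataclasses import dataclass, field
from typing import Dict, List

# Hypothetical CAE (DOM-2) policies: mode -> operation -> "allow" | "unlock" | "block".
POLICIES: Dict[str, Dict[str, str]] = {
    "greenfield": {"push_config": "allow", "reboot_dut": "allow"},
    "prod-partition": {"push_config": "unlock", "reboot_dut": "block"},
    "production": {"push_config": "block", "reboot_dut": "block"},
}


@dataclass(frozen=True)
class AuditEvent:
    dut: str
    mode: str
    op: str
    outcome: str
    ts: float


@dataclass
class ModeGate:
    declared_modes: Dict[str, str]  # DUT id -> mode declared by the operator
    audit_log: List[AuditEvent] = field(default_factory=list)

    def check(self, dut: str, op: str, unlocked: bool = False) -> str:
        mode = self.declared_modes.get(dut, "undeclared")
        # Default-deny: an undeclared DUT or an unlisted op resolves to "block".
        rule = POLICIES.get(mode, {}).get(op, "block")
        if rule == "allow":
            outcome = "allowed"
        elif rule == "unlock":
            outcome = "allowed" if unlocked else "needs_unlock"
        else:
            outcome = "blocked"
        # Every decision is audited, including blocks.
        self.audit_log.append(AuditEvent(dut, mode, op, outcome, time.time()))
        return outcome
```

Appending the audit event on every branch, before returning, is what makes "always emit audit event" hold even for blocked calls.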
CPOS 2-phase commit (DOM-5)
phase 1 — PREPARE
   TierAdapter[Personas].prepare(payload) → ack/nack
   TierAdapter[DUT].prepare(payload) → ack/nack
   TierAdapter[Agents].prepare(payload) → ack/nack
   if any nack → abort, no state change
phase 2 — COMMIT
   TierAdapter[Personas].commit() → snapshot
   TierAdapter[DUT].commit() → snapshot
   TierAdapter[Agents].commit() → snapshot
   start 60s self-healing watchdog
watchdog — 60s
   if any tier reports unhealthy → rollback all snapshots
   if all healthy → finalize
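The PREPARE, COMMIT, and watchdog sequence can be sketched in Python as follows. Only `prepare` and `commit` appear in the ADR's TierAdapter; `abort`, `rollback`, and `healthy` are assumed methods, the 5-second polling interval is an arbitrary choice, and `sleep` is injectable so the 60s window can be exercised without waiting. `FakeTier` is a hypothetical in-memory tier that exists only to show usage.

```python
import time
from typing import Any, Callable, Dict, List, Protocol, Sequence, Tuple


class TierAdapter(Protocol):
    name: str

    def prepare(self, payload: Dict[str, Any]) -> bool: ...  # True = ack, False = nack
    def commit(self) -> Any: ...  # apply staged payload, return pre-commit snapshot
    def abort(self) -> None: ...  # assumed: discard staged payload
    def rollback(self, snapshot: Any) -> None: ...  # assumed: restore a snapshot
    def healthy(self) -> bool: ...  # assumed: health probe


def watchdog(tiers: Sequence[TierAdapter], window_s: float = 60.0,
             interval_s: float = 5.0,
             sleep: Callable[[float], None] = time.sleep) -> bool:
    """Poll all tiers for window_s seconds; False on the first unhealthy report."""
    elapsed = 0.0
    while elapsed < window_s:
        if not all(t.healthy() for t in tiers):
            return False
        sleep(interval_s)
        elapsed += interval_s
    return all(t.healthy() for t in tiers)


def _rollback(committed: List[Tuple[TierAdapter, Any]]) -> None:
    for tier, snapshot in reversed(committed):  # undo in reverse commit order
        tier.rollback(snapshot)


def cpos_commit(tiers: Sequence[TierAdapter], payload: Dict[str, Any],
                sleep: Callable[[float], None] = time.sleep) -> str:
    # Phase 1: PREPARE. Any nack aborts before any tier changes state.
    acks = [t.prepare(payload) for t in tiers]
    if not all(acks):
        for t in tiers:
            t.abort()
        return "aborted"

    # Phase 2: COMMIT. Keep each snapshot so a later failure can undo it.
    committed: List[Tuple[TierAdapter, Any]] = []
    try:
        for t in tiers:
            committed.append((t, t.commit()))
    except Exception:
        for t in tiers[len(committed):]:
            t.abort()  # the failing tier and any not yet committed
        _rollback(committed)
        return "rolled_back"

    # Watchdog: any unhealthy tier inside the window rolls back every tier.
    if not watchdog(tiers, sleep=sleep):
        _rollback(committed)
        return "rolled_back"
    return "finalized"


class FakeTier:
    """In-memory tier for illustration only."""

    def __init__(self, name: str, ack: bool = True, healthy: bool = True):
        self.name, self._ack, self._healthy = name, ack, healthy
        self.state: Any = "old"
        self._staged: Any = None

    def prepare(self, payload: Dict[str, Any]) -> bool:
        self._staged = payload.get("value")
        return self._ack

    def commit(self) -> Any:
        snapshot, self.state = self.state, self._staged
        return snapshot

    def abort(self) -> None:
        self._staged = None

    def rollback(self, snapshot: Any) -> None:
        self.state = snapshot

    def healthy(self) -> bool:
        return self._healthy
```

With three `FakeTier`s and `sleep=lambda s: None`, a single tier constructed with `healthy=False` is enough to return every tier to its `"old"` state, which is the all-or-nothing behavior the ADR describes.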
Consequences
Pros
- Production safety: hard-block prevents misclicks on live DUTs
- Atomic cross-tier config: no partial state across Personas/DUT/Agents
- Audit trail: every mode transition + override logged for compliance
- ML cortex (SPP): operators get profile recommendations from past runs
- Patent moat: DOM/CPOS/PIE family = 12 of 17 claims
Cons / risks
- DDPB's 7 layers add surface area: a misconfigured layer can silently block legitimate operations
- The 60s rollback window means a transient health failure can roll back a config change that would have recovered on its own
- ML cortex (SPP) is opt-in; if it is disabled, operators lose profile recommendations
Migration
- Default DUT mode for existing benches: lab
- Operators can re-classify each DUT in /admin/dut/…/mode
- DDPB chain rolled out behind a feature flag for the first 2 releases
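Since existing benches default to lab, DOM-3 PDD is what would catch a bench that is in fact carrying production traffic. A minimal sketch, assuming the DOM-1 signals (live BGP sessions, syslog rate, persona test traffic) arrive as a small record. The 50 events/s syslog threshold is a made-up placeholder, and counting absent persona traffic only as corroborating evidence (so an idle lab DUT does not alert) is a design assumption, not something the ADR specifies. All names are hypothetical.

```python
from dataclasses import dataclass
from typing import List

SYSLOG_ALERT_THRESHOLD = 50.0  # hypothetical placeholder, events per second
NON_PRODUCTION_MODES = {"greenfield", "staging", "lab"}


@dataclass(frozen=True)
class ObservedSignals:
    live_bgp_sessions: int
    syslog_events_per_s: float
    persona_traffic_seen: bool


def drift_reasons(declared_mode: str, s: ObservedSignals) -> List[str]:
    """Reasons a DUT declared non-production looks like production; empty = no alert."""
    if declared_mode not in NON_PRODUCTION_MODES:
        return []  # already treated as production-class, nothing to flag
    reasons = []
    if s.live_bgp_sessions > 0:
        reasons.append("live BGP sessions")
    if s.syslog_events_per_s >= SYSLOG_ALERT_THRESHOLD:
        reasons.append("high syslog rate")
    if reasons and not s.persona_traffic_seen:
        reasons.append("no persona test traffic")  # corroborating only
    return reasons
```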
References
- Memory: discuss_dom_dut_operating_mode_2026_05_10.md
- Memory: project_pending_master_inventory_2026_05_10.md
- Code: dashboard/src/lib/dom/ (DOM-1..10 scaffolds, Wave 5)
- Patent claims: #1..#12 (DOM/OOBI/GATEWAY/RELAY family)