ADR 0014 — DOM (DUT Operating Mode) Production-Safe Family¶

Status: Accepted (formalized 2026-05-12 with v3.7.0 — scaffolds shipping in v3.7.0)
Date: 2026-05-10
Deciders: TLSStress.Art project
Targets: v5.0 (Phase 1 Materialization scaffolds: DOM-1..10 already merged Wave 5)
Patent claim family: claims #1..#12 (DOM/OOBI/GATEWAY/RELAY)

Context¶

Through 2026-05 the bench grew from a lab-only TLS decryption tester to a multi-mode platform that customers want to point at production NGFWs. Production traffic is unforgiving: a misclick on a write-mode test can drop live customer flows.

Test types accumulated without a unifying gate:

- Branch Office (data plane)
- Inspection Profile (5 named profiles + custom 2^10)
- SDWAN/CoR (DIA + IPSec)
- BGP Saturation (control plane)
- MAC/ARP stress (L2 capacity)
- PURE Production URL Replay (post-Scope-Freeze)

Each has a different production blast radius. The current ad-hoc safety model is "operator should know better", which is insufficient for pilot customers.

Decision¶

Add a DOM (DUT Operating Mode) discriminator that gates every state-changing operation against the operator's declared mode for that DUT. 5 modes; behavior matrix below.

The 5 modes¶

Mode	Description	Destructive ops?	DDPB chain enforced?
`greenfield`	New DUT, no production traffic	yes	no
`staging`	Pre-prod mirror	yes (warn-only)	partial (warn-only)
`lab`	Dedicated lab DUT	yes (warn-only)	partial
`production`	Live customer DUT	NO — all blocked	YES — all 7 layers
`prod-partition`	Operator-quarantined slice	partial — explicit unlock	yes — bypass requires audit reason
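
The matrix above could be encoded as a typed lookup table so that the compiler rejects any mode without an entry. This is a minimal sketch, not the shipped schema: the names `DutMode`, `ModeBehavior`, `MODE_MATRIX`, `isDestructiveBlocked`, and the string values chosen for each cell are assumptions made for illustration.

```typescript
// Hypothetical encoding of the 5-mode behavior matrix (all names are illustrative).
type DutMode = "greenfield" | "staging" | "lab" | "production" | "prod-partition";

// How destructive operations are treated in a mode.
type DestructiveOps = "allowed" | "warn-only" | "explicit-unlock" | "blocked";

// How much of the DDPB chain is enforced in a mode.
type DdpbEnforcement =
  | "none"
  | "partial-warn-only"
  | "partial"
  | "all-7-layers"
  | "enforced-bypass-needs-audit-reason";

interface ModeBehavior {
  description: string;
  destructiveOps: DestructiveOps;
  ddpb: DdpbEnforcement;
}

// Record<DutMode, ...> forces one entry per mode at compile time.
const MODE_MATRIX: Record<DutMode, ModeBehavior> = {
  greenfield: { description: "New DUT, no production traffic", destructiveOps: "allowed", ddpb: "none" },
  staging: { description: "Pre-prod mirror", destructiveOps: "warn-only", ddpb: "partial-warn-only" },
  lab: { description: "Dedicated lab DUT", destructiveOps: "warn-only", ddpb: "partial" },
  production: { description: "Live customer DUT", destructiveOps: "blocked", ddpb: "all-7-layers" },
  "prod-partition": {
    description: "Operator-quarantined slice",
    destructiveOps: "explicit-unlock",
    ddpb: "enforced-bypass-needs-audit-reason",
  },
};

// True only for modes where destructive operations are hard-blocked.
function isDestructiveBlocked(mode: DutMode): boolean {
  return MODE_MATRIX[mode].destructiveOps === "blocked";
}
```

Keeping the matrix as data rather than scattered conditionals means a new mode cannot be added without deciding both columns for it.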

DOM family components¶

DOM-1 — Discriminator: classifies the DUT into one of the 5 modes based on the operator's declaration plus observed signals (live BGP sessions, syslog rate, absence of persona test traffic).
DOM-2 — CAE (Conditional Automation Engine): per-mode policy loader. Decides which operations are auto-approved vs require unlock vs hard-blocked.
DOM-3 — PDD (Production Drift Detection): monitors for mode drift (e.g. operator labeled lab but BGP sessions appear) and alerts.
DOM-4 — CPOS schema: customizable per-test parameter overrides.
DOM-5 — CPOS atomic 2-phase commit: cross-tier coordinated config push (TierAdapter interface: Personas, DUT, and Agents commit together or all roll back). 60s self-healing rollback.
DOM-6 — Profile Templates v2.0: full scenario state (config + expectations + thresholds) shareable across teams.
DOM-7 — PIE family: RSM (Replay Session Mode) + IR (Idempotent Restore) + HID (Health Indicator Dashboard) + AAE (Auto-Approve Engine).
DOM-8 — DUT Permission Indicators: Grafana dashboards visualizing per-DUT mode + active locks + recent CPOS commits.
DOM-9 — SPP (Smart Profile Predictor): ML cortex sidecar recommends profile adjustments based on past run telemetry.
DOM-10 — DDPB (Defense-in-Depth Production Blocking): 7-layer chain enforcing the production-mode hard-blocks (UI gate + API middleware + DB constraint + K8s admission webhook + RELAY pre-flight + DUT-side BTO check + audit trail before-and-after).
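
The DDPB chain in DOM-10 can be pictured as an ordered list of checks where the first rejecting layer stops the operation. The sketch below is an assumption about shape only: `Layer`, `evaluateChain`, `ChainResult`, and the layer identifiers are hypothetical names, and treating every layer as a plain boolean gate is a simplification (the audit-trail layer records before and after rather than gating).

```typescript
// Hypothetical sketch of evaluating the 7-layer DDPB chain in order.
type LayerName =
  | "ui-gate"
  | "api-middleware"
  | "db-constraint"
  | "k8s-admission-webhook"
  | "relay-preflight"
  | "dut-bto-check"
  | "audit-trail";

interface Operation {
  name: string;
  destructive: boolean;
}

interface Layer {
  name: LayerName;
  // Returns true when this layer lets the operation through.
  allows(op: Operation): boolean;
}

interface ChainResult {
  allowed: boolean;
  blockedBy?: LayerName; // set only when a layer rejected the operation
  layersChecked: number;
}

// Walks the layers in order and stops at the first one that rejects.
function evaluateChain(layers: Layer[], op: Operation): ChainResult {
  let checked = 0;
  for (const layer of layers) {
    checked += 1;
    if (!layer.allows(op)) {
      return { allowed: false, blockedBy: layer.name, layersChecked: checked };
    }
  }
  return { allowed: true, layersChecked: checked };
}

// Layer order as listed in DOM-10.
const LAYER_ORDER: LayerName[] = [
  "ui-gate",
  "api-middleware",
  "db-constraint",
  "k8s-admission-webhook",
  "relay-preflight",
  "dut-bto-check",
  "audit-trail",
];

// Illustrative production-mode chain: every layer rejects destructive operations.
const productionChain: Layer[] = LAYER_ORDER.map((name) => ({
  name,
  allows: (op: Operation) => !op.destructive,
}));
```

Returning `blockedBy` is a deliberate choice in this sketch: naming the rejecting layer makes a misconfigured layer visible instead of leaving the operator with an unexplained block.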

PIE-PA — special case for PURE in production¶

Per ADR 0021 (PURE), PIE-PA is a 3-layer defense that is mandatory in production mode. It mitigates the MITM risk created by personas hosting real public IPs (200.130.x.x):

Pod scale-to-0 (bench personas down)
BGP withdraw (bench prefixes withdrawn from upstream)
DNS sanity check (real DNS resolves to real IPs, not bench)

DDPB chain layer 5 (RELAY pre-flight) gates PURE production runs on all 3 PIE-PA layers being green.
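
As a rough illustration of that gate, the pre-flight can be reduced to a check that all three PIE-PA signals are green, reporting which ones are not. The field and function names (`PiePaStatus`, `piePaPreflight`, and the layer labels) are hypothetical; how each signal is actually collected is outside this sketch.

```typescript
// Hypothetical sketch of the PIE-PA gate evaluated by DDPB layer 5 (RELAY pre-flight).
interface PiePaStatus {
  podsScaledToZero: boolean; // bench personas are down
  bgpWithdrawn: boolean; // bench prefixes withdrawn from upstream
  dnsSane: boolean; // real DNS resolves to real IPs, not bench
}

interface PreflightResult {
  green: boolean;
  failing: string[]; // labels of the PIE-PA layers that are not green
}

// A PURE production run is allowed only when no layer is failing.
function piePaPreflight(status: PiePaStatus): PreflightResult {
  const failing: string[] = [];
  if (!status.podsScaledToZero) failing.push("pod-scale-to-0");
  if (!status.bgpWithdrawn) failing.push("bgp-withdraw");
  if (!status.dnsSane) failing.push("dns-sanity");
  return { green: failing.length === 0, failing };
}
```

For example, a status with pods scaled down and DNS sane but bench prefixes still announced would return `green: false` with `bgp-withdraw` as the only failing layer.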

Architecture¶

Mode flow¶

operator → declares mode for DUT
        ↓
  DOM-1 Discriminator
        ↓
  CAE (DOM-2) loads policy for that mode
        ↓
  every state-changing op → check policy
                          → maybe require unlock
                          → maybe block
                          → always emit audit event
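
The flow above could be sketched as a single check function that looks up a verdict for the DUT's mode, applies any unlock, and records an audit event on every path. Everything named here (`Verdict`, `AuditEvent`, `DESTRUCTIVE_POLICY`, `checkOperation`) is an assumption for illustration, and the policy table covers destructive operations only, loosely following the mode matrix.

```typescript
// Hypothetical sketch of the CAE (DOM-2) check applied to a state-changing operation.
type DutMode = "greenfield" | "staging" | "lab" | "production" | "prod-partition";
type Verdict = "auto-approve" | "require-unlock" | "block";

interface AuditEvent {
  dutId: string;
  mode: DutMode;
  op: string;
  verdict: Verdict;
  permitted: boolean;
}

// Illustrative policy for destructive operations; the real per-mode policy is not specified here.
const DESTRUCTIVE_POLICY: Record<DutMode, Verdict> = {
  greenfield: "auto-approve",
  staging: "auto-approve", // warn-only in the mode matrix
  lab: "auto-approve", // warn-only in the mode matrix
  production: "block",
  "prod-partition": "require-unlock",
};

// Returns whether the operation may proceed; always appends exactly one audit event.
function checkOperation(
  dutId: string,
  mode: DutMode,
  op: string,
  unlocked: boolean,
  audit: AuditEvent[],
): boolean {
  const verdict = DESTRUCTIVE_POLICY[mode];
  const permitted =
    verdict === "auto-approve" || (verdict === "require-unlock" && unlocked);
  audit.push({ dutId, mode, op, verdict, permitted });
  return permitted;
}
```

Emitting the audit event before returning, rather than in each branch, keeps the "always emit" guarantee independent of how many verdicts exist.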

CPOS 2-phase commit (DOM-5)¶

phase 1 — PREPARE
  TierAdapter[Personas].prepare(payload) → ack/nack
  TierAdapter[DUT].prepare(payload)      → ack/nack
  TierAdapter[Agents].prepare(payload)   → ack/nack
  if any nack → abort, no state change

phase 2 — COMMIT
  TierAdapter[Personas].commit() → snapshot
  TierAdapter[DUT].commit()      → snapshot
  TierAdapter[Agents].commit()   → snapshot
  start 60s self-healing watchdog

watchdog — 60s
  if any tier reports unhealthy → rollback all snapshots
  if all healthy → finalize
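
A compact way to read the three stages above is as one coordinator function over a list of tier adapters. The source names the TierAdapter interface but not its signature, so the method names below (`prepare`, `commit`, `rollback`, `healthy`), the `cposCommit` function, and the `fakeTier` helper are all assumptions. The sketch is synchronous and collapses the 60s watchdog into a single health poll; a real coordinator would presumably be asynchronous and poll for the full window.

```typescript
// Hypothetical TierAdapter shape; the source does not specify its signature.
interface TierAdapter {
  name: string;
  prepare(payload: unknown): boolean; // true = ack, false = nack
  commit(): string; // returns a snapshot id usable for rollback
  rollback(snapshot: string): void;
  healthy(): boolean;
}

type Outcome = "aborted" | "rolled-back" | "finalized";

function cposCommit(tiers: TierAdapter[], payload: unknown): Outcome {
  // Phase 1 (PREPARE): any nack aborts before any tier has committed.
  if (!tiers.every((t) => t.prepare(payload))) return "aborted";

  // Phase 2 (COMMIT): commit every tier and keep its snapshot.
  const snapshots = tiers.map((t) => ({ tier: t, snapshot: t.commit() }));

  // Watchdog: simplified to one health poll instead of a 60s window.
  if (snapshots.some(({ tier }) => !tier.healthy())) {
    for (const { tier, snapshot } of snapshots) tier.rollback(snapshot);
    return "rolled-back";
  }
  return "finalized";
}

// Test double that records commit and rollback calls in a shared log.
function fakeTier(name: string, ack: boolean, isHealthy: boolean, log: string[]): TierAdapter {
  return {
    name,
    prepare: () => ack,
    commit: () => {
      log.push(`${name}:commit`);
      return `${name}-snap`;
    },
    rollback: (snapshot) => {
      log.push(`${name}:rollback:${snapshot}`);
    },
    healthy: () => isHealthy,
  };
}
```

With three fake tiers for Personas, DUT, and Agents, a single nack leaves the log empty, while one unhealthy tier after commit produces three commit entries followed by three rollback entries.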

Consequences¶

Pros¶

Production safety: hard-block prevents misclicks on live DUTs
Atomic cross-tier config: no partial state across Personas/DUT/Agents
Audit trail: every mode transition + override logged for compliance
ML cortex (SPP): operators get profile recommendations from past runs
Patent moat: DOM/CPOS/PIE family = 12 of 17 claims

Cons / risks¶

DDPB's 7 layers add surface area; a single misconfigured layer can block operations silently
The 60s rollback window means a transient failure may roll back a config change that would have recovered on its own
ML cortex (SPP) is opt-in; if it is disabled, operators lose the recommendation feature

Migration¶

Default DUT mode for existing benches: lab
Operators can re-classify per-DUT in /admin/dut//mode
DDPB chain is rolled out behind a feature flag for the first 2 releases

References¶

Memory: discuss_dom_dut_operating_mode_2026_05_10.md
Memory: project_pending_master_inventory_2026_05_10.md
Code: dashboard/src/lib/dom/ (DOM-1..10 scaffolds, Wave 5)
Patent claims: #1..#12 (DOM/OOBI/GATEWAY/RELAY family)