DOM (DUT Operating Mode): primer

Help Center primer for the DOM family, the safety system that gates operations which could disrupt production. Pairs with ADR 0014.

What it does

Every DUT in the bench is classified into one of 5 operating modes, each with its own safety profile. In `production` mode, any state-changing operation that could disturb live customer traffic is hard-blocked.

This is the safety net that makes pointing the bench at a real customer NGFW a sane thing to do.

The 5 modes

Mode	When to use	Destructive ops?
`greenfield`	Brand-new DUT, no production traffic, free to break	yes
`staging`	Pre-prod mirror — high-realism but can recover	yes (warn-only)
`lab`	Dedicated lab DUT (your bench's home turf)	yes (warn-only)
`production`	Live customer DUT carrying real flows	NO — blocked
`prod-partition`	Production DUT, but operator carved a quarantined slice	partial — explicit unlock per op

Default for new DUTs added to inventory: `lab`.
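As a rough illustration, the table and the default can be written down as a lookup. This is a minimal sketch, not the bench's actual data model: the `DomMode` enum, the policy labels (`allow`, `warn`, `block`, `unlock-per-op`), and the `destructive_policy` helper are all hypothetical names chosen for this example.

```python
from enum import Enum


class DomMode(Enum):
    GREENFIELD = "greenfield"
    STAGING = "staging"
    LAB = "lab"
    PRODUCTION = "production"
    PROD_PARTITION = "prod-partition"


# Destructive-op policy per mode, transcribed from the table above.
# The label strings are illustrative, not identifiers used by the bench.
DESTRUCTIVE_POLICY = {
    DomMode.GREENFIELD: "allow",
    DomMode.STAGING: "warn",
    DomMode.LAB: "warn",
    DomMode.PRODUCTION: "block",
    DomMode.PROD_PARTITION: "unlock-per-op",
}

# Default for DUTs newly added to inventory.
DEFAULT_MODE = DomMode.LAB


def destructive_policy(mode_name=None):
    """Return the policy label for a mode string; None falls back to the default mode."""
    mode = DEFAULT_MODE if mode_name is None else DomMode(mode_name)
    return DESTRUCTIVE_POLICY[mode]
```

Looking a mode up by its string value means an unknown mode name raises `ValueError` instead of silently receiving a permissive policy.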

How to set / change a DUT's mode

Dashboard: /admin/dut/<id>/mode → dropdown picker. Mode changes are audit-logged.

CLI: `kubectl annotate dut <name> tlsstress.art/dom-mode=staging` (emergency only; the UI is preferred because it leaves a richer audit trail). If the annotation is already set, kubectl requires `--overwrite` to change it.

What gets blocked in `production` mode

The DDPB chain (Defense in Depth Production Blocking) enforces 7 layers:

1. UI gate: destructive buttons are greyed out
2. API middleware: POST/PUT/DELETE requests rejected with 403
3. DB constraint: write attempts to test-plan tables rejected
4. K8s admission webhook: pod scaling / config-map writes blocked
5. RELAY pre-flight: RELAY.Art refuses to forward write commands
6. DUT-side BTO check: bidirectional trust orchestration confirms no overlapping intent
7. Audit trail before-and-after: every blocked attempt logged

Layers 1-4 fail fast, close to the operator's keyboard; layers 5-7 catch edge cases that the operator-side layers missed.
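To make one layer concrete, here is a minimal sketch of the API-middleware check alone. It assumes the middleware sees the HTTP method and the DUT's mode as plain strings. The function name `api_gate` and the shape of the audit record are hypothetical, and the other six layers are not modelled.

```python
WRITE_METHODS = frozenset({"POST", "PUT", "DELETE"})


def api_gate(method, dom_mode, audit_log):
    """Model of the API-middleware layer only.

    Returns 403 when a write method targets a `production` DUT and records
    the blocked attempt; returns None to let the request continue.
    """
    if dom_mode == "production" and method.upper() in WRITE_METHODS:
        # Hypothetical audit record; the real audit schema is not documented here.
        audit_log.append({"method": method.upper(), "mode": dom_mode, "result": "blocked"})
        return 403
    return None


audit = []
print(api_gate("GET", "production", audit))     # read-only request passes through
print(api_gate("DELETE", "production", audit))  # write request is rejected
```

Because each layer makes its own decision, a request that slips past one check (for example, a direct API call that never touches the UI) is still stopped by the next.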

What still works in `production`

Allowed:
All read-only operations (show-version, telemetry, observation)
PURE Test Kind (Production URL Replay), provided the PIE-PA gate is green

Not allowed:
KALI / DoYour offensive tools: explicitly hard-blocked
BGP saturation against the production DUT: blocked unless explicitly unlocked

Drift detection (PDD)

PDD watches for mode drift: if you labeled the DUT `lab` but it shows live BGP sessions or production-rate syslog, PDD raises a warning and offers to reclassify it. It never reclassifies on its own.
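The drift check can be pictured roughly as follows. This is a sketch under stated assumptions rather than PDD's real logic: the `Observation` fields, the `PROD_SYSLOG_RATE` threshold, and the choice of `production` as the suggested mode are all hypothetical.

```python
from dataclasses import dataclass


@dataclass
class Observation:
    live_bgp_sessions: int    # BGP sessions currently established on the DUT
    syslog_rate_per_s: float  # observed syslog message rate


# Hypothetical threshold for "production-rate" syslog; the real value is not documented here.
PROD_SYSLOG_RATE = 100.0

PRODUCTION_LIKE = {"production", "prod-partition"}


def suggest_reclassification(labeled_mode, obs):
    """Return a suggested mode if the label looks wrong, else None.

    This only produces a suggestion; applying it is left to the operator.
    """
    looks_live = obs.live_bgp_sessions > 0 or obs.syslog_rate_per_s >= PROD_SYSLOG_RATE
    if labeled_mode not in PRODUCTION_LIKE and looks_live:
        return "production"
    return None
```

The function returns a suggestion and changes nothing, which mirrors the rule that an operator must confirm any reclassification.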

CPOS: atomic 2-phase commit

When you push a config change that touches multiple tiers (Personas + DUT + Agents), CPOS coordinates:

PREPARE phase
  every tier acks "can do" → green light
  any nack → abort, no state change

COMMIT phase
  every tier snapshots + applies
  60s self-healing watchdog
  any tier unhealthy → rollback all snapshots

This means a partial config push cannot leave the bench in a half-applied state.
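The two phases can be sketched with a toy model. Everything here is illustrative: the `Tier` class and `push` function are hypothetical names, and the 60s watchdog is reduced to a single health check so the example stays short.

```python
class Tier:
    """Toy tier holding a config dict; `accepts` and `healthy_after` simulate outcomes."""

    def __init__(self, name, config, accepts=True, healthy_after=True):
        self.name = name
        self.config = dict(config)
        self.accepts = accepts
        self.healthy_after = healthy_after
        self._snapshot = None

    def prepare(self, change):
        # Ack ("can do") or nack, without touching any state.
        return self.accepts

    def commit(self, change):
        self._snapshot = dict(self.config)  # snapshot before applying
        self.config.update(change)

    def healthy(self):
        return self.healthy_after

    def rollback(self):
        if self._snapshot is not None:
            self.config = self._snapshot
            self._snapshot = None


def push(tiers, change):
    # PREPARE: any nack aborts before any state changes.
    if not all(t.prepare(change) for t in tiers):
        return "aborted"
    # COMMIT: every tier snapshots, then applies.
    for t in tiers:
        t.commit(change)
    # Stand-in for the 60s watchdog: one health check instead of a timed loop.
    if not all(t.healthy() for t in tiers):
        for t in tiers:
            t.rollback()
        return "rolled-back"
    return "committed"
```

In this model, a tier that turns unhealthy after commit causes every tier, including the healthy ones, to return to its snapshot.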

PIE family

Auxiliary subsystems that work alongside DOM:

RSM (Replay Session Mode): replay a past test exactly
IR (Idempotent Restore): restore a checkpoint without side effects
HID (Health Indicator Dashboard): the operator's at-a-glance view
AAE (Auto-Approve Engine): pre-approves low-risk operations according to per-mode policy

Common questions

Can I disable DDPB to get something through? No. The chain is designed to be tamper-resistant: every layer logs to a separate audit channel. If you legitimately need to run a blocked operation, switch the DUT to `prod-partition` and use the explicit unlock window.

Does PDD ever auto-reclassify? No — it only suggests. Operators must confirm.

My team has 50 DUTs. Do I need to set the mode 50 times? No. Modes can be inherited from a template (Profile Templates v2.0, DOM-6).