Test Plans: pre-configured load patterns

Scope status (post-Scope-Freeze 2026-05-10) — See ARCHITECTURE.md for the canonical 37 MÓDULOs + 7 Test Kinds + DOM/CPOS/PIE-PA safety architecture. ADRs 0014, 0019-0025 cover post-Freeze additions.

Status: 15 plans shipped. Catalog at platform/test-plans/catalog.yaml.

Why Test Plans

Every test run on this cluster needs three things to be comparable across engagements:

- A defined start and end: duration in minutes or hours, not "until the engineer says stop"
- A ramp-up methodology: agents do not hit full load instantly; they ramp over a defined timeline so the NGFW gets a chance to warm up
- A shared identifier: two engineers in different geographies can say "I ran CAP-FIND-KNEE-30M against this FTD, p99 inflected at 720 concurrent agents" and the comparison is unambiguous

Commercial appliances (Spirent CyberFlood, Ixia BreakingPoint) offer the same concepts under names like "test profile", "load specification", or "phase configuration". This project ships them as YAML in git: versionable, reviewable as code, and replayable byte-for-byte across runs.

The 15 plans

The catalog is grouped into 8 categories. Each plan has a stable identifier that appears verbatim in the test report.

Category	#	Identifier	Use when
Baseline	1	`BASELINE-SMOKE-5M`	First run after any cluster change
	2	`BASELINE-SLO-30M`	Establish the SLO reference baseline before any other run
Capacity	3	`CAP-FIND-KNEE-30M`	Find the agent count where p99 inflects (the NGFW knee)
	4	`CAP-MAX-1H`	Sustain at 90% of the discovered knee for an hour
Stress	5	`STR-OVERLOAD-15M`	Push to 2× capacity and observe failure modes
	6	`STR-CONNECTION-FLOOD-10M`	Maximum new-TLS-handshake rate (no session reuse)
Soak	7	`SOAK-ENDURANCE-24H`	24 h sustained at 70% capacity — detects leaks
	8	`SOAK-WEEKEND-72H`	72 h with diurnal pattern — peak/trough cycles
Spike	9	`SPIKE-RECOVERY-10X-2M`	10× burst with measured recovery time
	10	`SPIKE-FLASH-CROWD-5M`	Asymmetric flash-crowd pattern (Black Friday style)
Mix	11	`MIX-FULL-ARCHETYPES-1H`	Equal weight across skin/mock/HAR/real-app
	12	`MIX-CLONED-HEAVY-30M`	70% to Cloned slots — realistic enterprise mix
Protocol	13	`H3-ONLY-SESSION-PRESSURE-30M`	100% HTTP/3, fresh session per request
	14	`H2-LONGLIVED-CONN-30M`	100% HTTP/2 with long-lived multiplexed connections
Realistic	15	`REPLAY-REAL-CUSTOMER-1H`	HAR-replay of a real customer trace

Anatomy of a plan

- identifier: CAP-FIND-KNEE-30M
  display_name: "Capacity — Find the knee (30 min)"
  description: "Step ramp from 50 → 1000 agents in 5 stages of 4 min each (8 min at the final stage)."
  duration_total: 30m
  phases:
    - { name: warm,   duration: 2m, pattern: linear, target: { playwright: 50,   k6: 100 } }
    - { name: s100,   duration: 4m, pattern: step,   target: { playwright: 100,  k6: 500 } }
    - { name: s250,   duration: 4m, pattern: step,   target: { playwright: 250,  k6: 1000 } }
    - { name: s500,   duration: 4m, pattern: step,   target: { playwright: 500,  k6: 2500 } }
    - { name: s750,   duration: 4m, pattern: step,   target: { playwright: 750,  k6: 5000 } }
    - { name: s1000,  duration: 8m, pattern: step,   target: { playwright: 1000, k6: 7500 } }
    - { name: rampdown, duration: 4m, pattern: linear, target: { playwright: 0, k6: 0 } }
  agent_target_strategy: round_robin_all
  persona_mix:
    synthetic_archetypes: { skin: 0.40, mock: 0.30, har: 0.20, real_app: 0.10 }
    synthetic_weight: 1.0
    cloned_weight: 0.0
  protocol_mix: { h2: 0.30, h3: 0.70 }
  cycle_interval_seconds: 3
  ngfw_state_required: decrypt-on
  slo_target_p99_seconds: 0.5
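
The entry above can be sanity-checked before it is committed. The sketch below is a minimal illustration, not the engine's own validator. It assumes the plan has already been loaded into a Python dict (for example by a YAML parser), and the helper names `parse_duration` and `validate_plan` are hypothetical. It checks two rules this document states: the phases must not exceed `duration_total`, and each weight group must sum to 1.0.

```python
import math
import re

# Minutes and hours appear in the catalog excerpt; seconds are an assumption.
UNIT_SECONDS = {"s": 1, "m": 60, "h": 3600}


def parse_duration(text: str) -> int:
    """Convert a duration such as '30m' or '1h' into seconds."""
    match = re.fullmatch(r"(\d+)([smh])", text.strip())
    if match is None:
        raise ValueError(f"unsupported duration: {text!r}")
    return int(match.group(1)) * UNIT_SECONDS[match.group(2)]


def validate_plan(plan: dict) -> list:
    """Return a list of problems; an empty list means every check passed."""
    problems = []

    total = parse_duration(plan["duration_total"])
    phase_sum = sum(parse_duration(p["duration"]) for p in plan["phases"])
    if phase_sum > total:
        problems.append(f"phases last {phase_sum}s, duration_total is {total}s")

    mix = plan["persona_mix"]
    weight_groups = {
        "synthetic_archetypes": list(mix["synthetic_archetypes"].values()),
        "synthetic_weight + cloned_weight": [mix["synthetic_weight"], mix["cloned_weight"]],
        "protocol_mix": list(plan["protocol_mix"].values()),
    }
    for name, weights in weight_groups.items():
        # isclose tolerates the usual floating-point rounding in 0.4 + 0.3 + ...
        if not math.isclose(sum(weights), 1.0, abs_tol=1e-9):
            problems.append(f"{name} sums to {sum(weights)}, expected 1.0")
    return problems


# Durations and weights copied from the CAP-FIND-KNEE-30M entry.
plan = {
    "duration_total": "30m",
    "phases": [{"duration": d} for d in ["2m", "4m", "4m", "4m", "4m", "8m", "4m"]],
    "persona_mix": {
        "synthetic_archetypes": {"skin": 0.40, "mock": 0.30, "har": 0.20, "real_app": 0.10},
        "synthetic_weight": 1.0,
        "cloned_weight": 0.0,
    },
    "protocol_mix": {"h2": 0.30, "h3": 0.70},
}
print(validate_plan(plan))  # [] because the phases add up to exactly 30 minutes
```

Returning a list of problems rather than raising on the first one lets a reviewer see every violation in a single pass.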

Fields explained

Field	What it controls
`identifier`	The contract. Stable forever. Cited in reports. Format: `CATEGORY-CHARACTERISTIC-DURATION`
`display_name`	Human-friendly version for UI dropdowns
`description`	Why an operator would pick this plan. Read by the picker UI
`duration_total`	The total duration the operator promises to allocate. The engine refuses to schedule a plan whose phases exceed this
`phases[]`	Ordered list of timeline phases
`phases[].pattern`	`linear` (smooth ramp), `exponential` (fast accelerate), `exponential_decay` (slow decay), `step` (instant jump), `hold` (steady), `sinusoidal` (diurnal)
`phases[].target`	Target agent counts at the END of the phase
`phases[].target_min` / `target_max`	Used only with `sinusoidal` to define the oscillation envelope
`agent_target_strategy`	How agents pick which persona to hit: `round_robin_all` / `round_robin_first_5` / `random_each_request` / `weighted_top10` / `weighted_cloned_70pct` / `pinned_first_5` / `har_replay_only`
`persona_mix.synthetic_archetypes`	Weights for the 4 archetypes (must sum to 1.0)
`persona_mix.synthetic_weight` + `cloned_weight`	Split between the 20 Synthetic and the 10 Cloned slots (must sum to 1.0)
`protocol_mix`	H2 vs H3 weights (must sum to 1.0)
`cycle_interval_seconds`	Seconds between requests for each agent (0 = paced by the HAR trace)
`tls_session_reuse`	If false, every cycle performs a fresh TLS handshake, which pressures the NGFW handshake engine
`ngfw_state_required`	`any` / `decrypt-on` / `decrypt-off`. The engine refuses to start the run if the TLS Decrypt Mode Probe reports a different state
`slo_target_p99_seconds`	The expected p99 for THIS plan. Used to compute pass/fail in the report
`prerequisite`	Human-readable preconditions (e.g., "5 cloned slots must have a SITE_NAME assigned")
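
To make `phases[].pattern` and `phases[].target` concrete, here is a minimal sketch of how an agent count could be derived at a point inside a phase. It covers only `linear`, `step`, and `hold`, whose meaning follows fairly directly from the table. The curves for `exponential`, `exponential_decay`, and `sinusoidal` are not specified in this document, so the sketch leaves them out. The function name `agents_at` and its signature are hypothetical, and treating `hold` as "stay at the target" is an assumption.

```python
def agents_at(pattern: str, start: int, target: int, elapsed: float, duration: float) -> int:
    """Agent count `elapsed` seconds into a phase lasting `duration` seconds.

    `start` is the count at the end of the previous phase and `target` is the
    count this phase must reach at its END.
    """
    if duration <= 0:
        raise ValueError("duration must be positive")
    if not 0 <= elapsed <= duration:
        raise ValueError("elapsed must lie within the phase")

    if pattern == "step":
        return target  # instant jump as soon as the phase begins
    if pattern == "hold":
        return target  # assumption: a hold phase stays at its target throughout
    if pattern == "linear":
        fraction = elapsed / duration
        return round(start + (target - start) * fraction)
    # The exact curves for the remaining patterns are not given in this document.
    raise NotImplementedError(f"pattern {pattern!r} is not covered by this sketch")


# Halfway through the 2-minute linear warm phase (0 -> 50 playwright agents).
print(agents_at("linear", start=0, target=50, elapsed=60, duration=120))  # 25
```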

How to choose a plan

A typical engagement runs 3–5 plans in this order:

1. BASELINE-SMOKE-5M: confirm the test bed is healthy
2. BASELINE-SLO-30M: record the SLO reference numbers
3. CAP-FIND-KNEE-30M: find the agent count where p99 inflects (the NGFW knee)
4. CAP-MAX-1H: confirm the NGFW sustains 90% of the discovered knee for an hour
5. (Optional, if the engagement allows time) SOAK-ENDURANCE-24H for leak detection, or STR-OVERLOAD-15M for failure-mode discovery

For demos / PoVs you may stop at #3 — the knee number is the headline. For acceptance testing or capacity planning, run all five.
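
The hand-off from CAP-FIND-KNEE-30M to CAP-MAX-1H can be sketched in a few lines. This is only an illustration under stated assumptions: the per-stage figures are made up, the function names are hypothetical, and "first stage whose p99 exceeds the SLO target" is a crude stand-in for however the report actually locates the inflection point.

```python
def first_slo_breach(stages, slo_p99_seconds):
    """Return the agent count of the first stage whose p99 exceeds the SLO.

    `stages` is a list of (agent_count, p99_seconds) pairs. Returns None when
    no stage breaches the target, meaning the knee lies above the tested range.
    """
    for agents, p99 in sorted(stages):
        if p99 > slo_p99_seconds:
            return agents
    return None


def cap_max_target(knee_agents: int, percent: int = 90) -> int:
    """Agent count for a sustained run at `percent` of the knee."""
    return knee_agents * percent // 100  # integer math avoids float rounding


# Hypothetical stage results, for illustration only.
stages = [(100, 0.12), (250, 0.15), (500, 0.21), (750, 0.62), (1000, 1.40)]
knee = first_slo_breach(stages, slo_p99_seconds=0.5)
print(knee, cap_max_target(knee))  # 750 675
```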

Cross-engagement comparison

The identifier is the cross-engagement contract. Two SEs in different countries running the same plan against different NGFWs MUST be comparing the same load shape. To make this safe:

- Identifiers are immutable. A plan never changes its load shape after publication. To change a load shape, publish a new plan with a new identifier (e.g., CAP-FIND-KNEE-30M-v2)
- The catalog is git-versioned. Every change is reviewable as code; a CI job checks identifiers stay stable
- Reports cite the catalog version + the identifier: "Run executed CAP-FIND-KNEE-30M from catalog v1, plan-snapshot-hash sha256:abc.... NGFW = Cisco FTD 7.4. Result: knee at 720 concurrent agents, p99 = 487 ms"
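
One way a plan-snapshot hash like the one cited in reports could be produced is to hash a canonical serialization of the plan. The document does not say how the real hash is computed, so the canonical form below (JSON with sorted keys and no extra whitespace) and the function name `plan_snapshot_hash` are assumptions.

```python
import hashlib
import json


def plan_snapshot_hash(plan: dict) -> str:
    """Hash a plan so that key order and whitespace do not affect the result."""
    canonical = json.dumps(plan, sort_keys=True, separators=(",", ":"), ensure_ascii=False)
    digest = hashlib.sha256(canonical.encode("utf-8")).hexdigest()
    return "sha256:" + digest


plan_a = {"identifier": "CAP-FIND-KNEE-30M", "duration_total": "30m"}
plan_b = {"duration_total": "30m", "identifier": "CAP-FIND-KNEE-30M"}  # same data, other order
plan_c = {"identifier": "CAP-FIND-KNEE-30M", "duration_total": "31m"}  # load shape changed

print(plan_snapshot_hash(plan_a) == plan_snapshot_hash(plan_b))  # True
print(plan_snapshot_hash(plan_a) == plan_snapshot_hash(plan_c))  # False
```

Because any change to the load shape changes the digest, two reports that cite the same identifier and the same hash were generated from the same plan definition.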

API

The Dashboard syncs the catalog into Postgres at startup and exposes:

- `GET /api/test-plans/catalog` returns the full list
- `GET /api/test-plans/{identifier}` returns a single plan with its phases

Both return JSON. No auth required for read; the catalog is public information about which plans exist (not what a run did).
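
A minimal client for the two endpoints, using only the Python standard library, might look like the sketch below. The base URL `https://dashboard.example` is a placeholder, the helper names are hypothetical, and the shape of the returned JSON is not documented here, so the code returns whatever the server sends without interpreting it.

```python
import json
import urllib.parse
import urllib.request


def catalog_url(base_url: str) -> str:
    """URL of the full catalog listing."""
    return f"{base_url.rstrip('/')}/api/test-plans/catalog"


def plan_url(base_url: str, identifier: str) -> str:
    """URL of a single plan; the identifier is percent-encoded defensively."""
    return f"{base_url.rstrip('/')}/api/test-plans/{urllib.parse.quote(identifier, safe='')}"


def fetch_json(url: str, timeout: float = 10.0):
    """GET a URL and decode the JSON body. No auth header is sent for reads."""
    with urllib.request.urlopen(url, timeout=timeout) as response:
        return json.load(response)


# Placeholder host; replace with the real Dashboard address before calling fetch_json.
print(plan_url("https://dashboard.example/", "CAP-FIND-KNEE-30M"))
# https://dashboard.example/api/test-plans/CAP-FIND-KNEE-30M
```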

Adding a new plan

1. Append the new entry to platform/test-plans/catalog.yaml with a fresh identifier that follows the naming convention
2. Bump the `version:` field at the top of the file (currently 1; it is the catalog-version metadata)
3. Open a PR; reviewers verify that the identifier doesn't clash and the load shape is sensible
4. On merge, every Dashboard pod re-syncs at its next restart
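
The two checks a reviewer performs on the identifier (naming convention and no clash) are easy to automate. The sketch below is an assumption-laden illustration: the regular expression is inferred from the 15 shipped identifiers plus the `-v2` example, not taken from the project's CI job, and `check_new_identifier` is a hypothetical name.

```python
import re

# Inferred shape: CATEGORY-CHARACTERISTIC[-...]-DURATION, optional -vN suffix.
# DURATION is digits followed by M (minutes) or H (hours).
IDENTIFIER_RE = re.compile(r"[A-Z0-9]+(?:-[A-Z0-9]+)+-\d+[MH](?:-v\d+)?")


def check_new_identifier(candidate: str, existing: set) -> list:
    """Return a list of problems with a proposed identifier (empty means OK)."""
    problems = []
    if IDENTIFIER_RE.fullmatch(candidate) is None:
        problems.append(f"{candidate!r} does not follow CATEGORY-CHARACTERISTIC-DURATION")
    if candidate in existing:
        problems.append(f"{candidate!r} clashes with an existing plan")
    return problems


existing = {"BASELINE-SMOKE-5M", "CAP-FIND-KNEE-30M", "CAP-MAX-1H"}
print(check_new_identifier("CAP-FIND-KNEE-30M-v2", existing))  # []
print(len(check_new_identifier("CAP-FIND-KNEE-30M", existing)))  # 1, the clash
```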

Deprecating a plan

Set deprecated: true and (optionally) replaced_by: NEW-IDENTIFIER. The catalog API still returns the plan (so historical reports keep working), but the Dashboard picker hides it from "available plans" and surfaces it only as "view legacy plans".
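
The picker behaviour described above can be sketched as a simple filter plus a lookup that follows `replaced_by`. The function names are hypothetical, and the sample data is illustrative only: nothing in the catalog says CAP-FIND-KNEE-30M is actually deprecated.

```python
def split_for_picker(plans):
    """Separate plans into (available, legacy) lists, preserving catalog order."""
    available = [p for p in plans if not p.get("deprecated", False)]
    legacy = [p for p in plans if p.get("deprecated", False)]
    return available, legacy


def resolve_replacement(identifier, plans):
    """Follow replaced_by links until reaching a plan that names no successor."""
    by_id = {p["identifier"]: p for p in plans}
    seen = set()
    current = identifier
    while True:
        if current in seen:
            raise ValueError(f"replaced_by cycle at {current}")
        seen.add(current)
        successor = by_id[current].get("replaced_by")
        if successor is None:
            return current
        current = successor


# Illustrative data only; the deprecation shown here is hypothetical.
plans = [
    {"identifier": "CAP-FIND-KNEE-30M", "deprecated": True,
     "replaced_by": "CAP-FIND-KNEE-30M-v2"},
    {"identifier": "CAP-FIND-KNEE-30M-v2"},
    {"identifier": "BASELINE-SMOKE-5M"},
]
available, legacy = split_for_picker(plans)
print([p["identifier"] for p in available])  # ['CAP-FIND-KNEE-30M-v2', 'BASELINE-SMOKE-5M']
print(resolve_replacement("CAP-FIND-KNEE-30M", plans))  # CAP-FIND-KNEE-30M-v2
```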

Related documents:

- platform/test-plans/catalog.yaml: the catalog source of truth
- MONITORING_TEST_VALIDITY.md: alerts that confirm the test bed itself is not the bottleneck
- TLS_DECRYPT_MODE_VERIFICATION.en.md: ngfw_state_required enforcement
- TRACING.md: distributed tracing during runs