Test Plans: pre-configured load patterns
Scope status (post-Scope-Freeze 2026-05-10) — See ARCHITECTURE.md for the canonical 37 MÓDULOs + 7 Test Kinds + DOM/CPOS/PIE-PA safety architecture. ADRs 0014, 0019-0025 cover post-Freeze additions.
Status: 15 plans shipped. Catalog at platform/test-plans/catalog.yaml.
Why Test Plans
Every test run on this cluster needs three things to be comparable across engagements:
- A defined start and end: duration in minutes or hours, not "until the engineer says stop"
- A ramp-up methodology: agents do not hit full load instantly; they ramp over a defined timeline so the NGFW gets a chance to warm up
- A shared identifier: two engineers in different geographies can say "I ran CAP-FIND-KNEE-30M against this FTD, p99 inflected at 720 RPS" and the comparison is unambiguous
Commercial appliances (Spirent CyberFlood, Ixia BreakingPoint) have these concepts under names like "test profile" / "load specification" / "phase configuration". This project ships them as YAML in git — versionable, reviewable as code, replay-able byte-for-byte across runs.
The 15 plans
The catalog is grouped into 8 categories. Each plan has a stable identifier that appears verbatim in the test report.
| Category | # | Identifier | Use when |
|---|---|---|---|
| Baseline | 1 | BASELINE-SMOKE-5M | First run after any cluster change |
| | 2 | BASELINE-SLO-30M | Establish the SLO reference baseline before any other run |
| Capacity | 3 | CAP-FIND-KNEE-30M | Find the agent count where p99 inflects (the NGFW knee) |
| | 4 | CAP-MAX-1H | Sustain at 90% of the discovered knee for an hour |
| Stress | 5 | STR-OVERLOAD-15M | Push to 2× capacity and observe failure modes |
| | 6 | STR-CONNECTION-FLOOD-10M | Maximum new-TLS-handshake rate (no session reuse) |
| Soak | 7 | SOAK-ENDURANCE-24H | 24 h sustained at 70% capacity; detects leaks |
| | 8 | SOAK-WEEKEND-72H | 72 h with diurnal pattern (peak/trough cycles) |
| Spike | 9 | SPIKE-RECOVERY-10X-2M | 10× burst with measured recovery time |
| | 10 | SPIKE-FLASH-CROWD-5M | Asymmetric flash-crowd pattern (Black Friday style) |
| Mix | 11 | MIX-FULL-ARCHETYPES-1H | Equal weight across skin/mock/HAR/real-app |
| | 12 | MIX-CLONED-HEAVY-30M | 70% to Cloned slots; realistic enterprise mix |
| Protocol | 13 | H3-ONLY-SESSION-PRESSURE-30M | 100% HTTP/3, fresh session per request |
| | 14 | H2-LONGLIVED-CONN-30M | 100% HTTP/2 with long-lived multiplexed connections |
| Realistic | 15 | REPLAY-REAL-CUSTOMER-1H | HAR-replay of a real customer trace |
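
Several plans in the table are defined relative to the knee found by CAP-FIND-KNEE-30M. The arithmetic can be sketched as below, assuming that "capacity" means the discovered knee and that fractional agents round down. The function name, the percentage table, and the rounding rule are illustrative assumptions, not part of the catalog.

```python
# Percentages read from the "Use when" column. Treating "capacity" as the
# discovered knee is an assumption made for this sketch.
KNEE_PERCENT = {
    "CAP-MAX-1H": 90,          # sustain at 90% of the knee
    "SOAK-ENDURANCE-24H": 70,  # sustain at 70% capacity
    "STR-OVERLOAD-15M": 200,   # push to 2x capacity
}


def derived_agent_target(identifier: str, knee_agents: int) -> int:
    """Return the agent count a knee-relative plan would aim for.

    Integer arithmetic avoids floating-point surprises; the result is
    rounded down (an assumption, the real engine may round differently).
    """
    return knee_agents * KNEE_PERCENT[identifier] // 100


if __name__ == "__main__":
    for plan_id in KNEE_PERCENT:
        print(plan_id, derived_agent_target(plan_id, 720))
```

With a knee of 720 agents this gives 648 for CAP-MAX-1H, 504 for SOAK-ENDURANCE-24H, and 1440 for STR-OVERLOAD-15M.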
Anatomy of a plan
```yaml
- identifier: CAP-FIND-KNEE-30M
  display_name: "Capacity — Find the knee (30 min)"
  description: "Step ramp from 50 → 1000 agents in 5 stages of 2 min each."
  duration_total: 30m
  phases:
    - { name: warm, duration: 2m, pattern: linear, target: { playwright: 50, k6: 100 } }
    - { name: s100, duration: 4m, pattern: step, target: { playwright: 100, k6: 500 } }
    - { name: s250, duration: 4m, pattern: step, target: { playwright: 250, k6: 1000 } }
    - { name: s500, duration: 4m, pattern: step, target: { playwright: 500, k6: 2500 } }
    - { name: s750, duration: 4m, pattern: step, target: { playwright: 750, k6: 5000 } }
    - { name: s1000, duration: 8m, pattern: step, target: { playwright: 1000, k6: 7500 } }
    - { name: rampdown, duration: 4m, pattern: linear, target: { playwright: 0, k6: 0 } }
  agent_target_strategy: round_robin_all
  persona_mix:
    synthetic_archetypes: { skin: 0.40, mock: 0.30, har: 0.20, real_app: 0.10 }
    synthetic_weight: 1.0
    cloned_weight: 0.0
  protocol_mix: { h2: 0.30, h3: 0.70 }
  cycle_interval_seconds: 3
  ngfw_state_required: decrypt-on
  slo_target_p99_seconds: 0.5
```
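
To make the phase timeline concrete, here is a minimal sketch of how the phases above could be turned into an agent count at a given moment of the run. It covers only the linear and step patterns and only the playwright targets. It assumes a run starts at zero agents, that a step phase jumps to its target as soon as the phase begins, and that a linear phase interpolates from the previous phase's end value; the engine's real behaviour may differ, and `parse_duration` and `agents_at` are hypothetical names.

```python
import re

UNIT_SECONDS = {"m": 60, "h": 3600}


def parse_duration(text: str) -> int:
    """Convert a duration such as '2m', '30m' or '1h' into seconds."""
    match = re.fullmatch(r"(\d+)([mh])", text)
    if match is None:
        raise ValueError(f"unsupported duration: {text!r}")
    return int(match.group(1)) * UNIT_SECONDS[match.group(2)]


# Phases of CAP-FIND-KNEE-30M, reduced to the playwright target for brevity.
PHASES = [
    {"name": "warm", "duration": "2m", "pattern": "linear", "target": 50},
    {"name": "s100", "duration": "4m", "pattern": "step", "target": 100},
    {"name": "s250", "duration": "4m", "pattern": "step", "target": 250},
    {"name": "s500", "duration": "4m", "pattern": "step", "target": 500},
    {"name": "s750", "duration": "4m", "pattern": "step", "target": 750},
    {"name": "s1000", "duration": "8m", "pattern": "step", "target": 1000},
    {"name": "rampdown", "duration": "4m", "pattern": "linear", "target": 0},
]


def agents_at(phases: list, t_seconds: int, start_agents: int = 0) -> int:
    """Return the target agent count t_seconds after the run starts."""
    previous = start_agents  # agent count at the end of the previous phase
    elapsed = 0              # seconds consumed by earlier phases
    for phase in phases:
        length = parse_duration(phase["duration"])
        if t_seconds < elapsed + length:
            if phase["pattern"] == "step":
                return phase["target"]
            if phase["pattern"] == "linear":
                fraction = (t_seconds - elapsed) / length
                return round(previous + (phase["target"] - previous) * fraction)
            raise ValueError(f"pattern not covered by this sketch: {phase['pattern']}")
        previous = phase["target"]
        elapsed += length
    return previous  # past the last phase: hold the final target


if __name__ == "__main__":
    total = sum(parse_duration(p["duration"]) for p in PHASES)
    print(total == parse_duration("30m"))  # phases fill duration_total exactly
    print(agents_at(PHASES, 60), agents_at(PHASES, 1440), agents_at(PHASES, 1680))
```

Under these assumptions the run sits at 25 agents one minute in, 1000 agents at minute 24, and 500 agents halfway through the ramp-down.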
Fields explained
| Field | What it controls |
|---|---|
| identifier | The contract. Stable forever. Cited in reports. Format: CATEGORY-CHARACTERISTIC-DURATION |
| display_name | Human-friendly version for UI dropdowns |
| description | Why an operator would pick this plan. Read by the picker UI |
| duration_total | The total duration the operator promises to allocate. The engine refuses to schedule a plan whose phases exceed this |
| phases[] | Ordered list of timeline phases |
| phases[].pattern | linear (smooth ramp), exponential (fast accelerate), exponential_decay (slow decay), step (instant jump), hold (steady), sinusoidal (diurnal) |
| phases[].target | Target agent counts at the END of the phase |
| phases[].target_min / target_max | Used only with sinusoidal to define the oscillation envelope |
| agent_target_strategy | How agents pick which persona to hit: round_robin_all / round_robin_first_5 / random_each_request / weighted_top10 / weighted_cloned_70pct / pinned_first_5 / har_replay_only |
| persona_mix.synthetic_archetypes | Weights for the 4 archetypes (must sum to 1.0) |
| persona_mix.synthetic_weight + cloned_weight | Split between the 20 Synthetic and the 10 Cloned slots (must sum to 1.0) |
| protocol_mix | H2 vs H3 weights (must sum to 1.0) |
| cycle_interval_seconds | How often each agent makes a request (0 = paced by HAR trace) |
| tls_session_reuse | If false, every cycle creates a fresh TLS handshake, which pressures the NGFW handshake engine |
| ngfw_state_required | any / decrypt-on / decrypt-off. The engine refuses to start the run if the TLS Decrypt Mode Probe reports a different state |
| slo_target_p99_seconds | The expected p99 for THIS plan. Used to compute pass/fail in the report |
| prerequisite | Human-readable preconditions (e.g., "5 cloned slots must have a SITE_NAME assigned") |
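
The constraints in the table (weights that must sum to 1.0, phases that must fit inside duration_total, and the ngfw_state_required gate) can be expressed as a small pre-flight check. The sketch below works on a plan already loaded into a Python dict; `validate_plan` and `can_start` are hypothetical helpers written for illustration, not the engine's actual validation code.

```python
import math
import re

UNIT_SECONDS = {"m": 60, "h": 3600}
NGFW_STATES = {"any", "decrypt-on", "decrypt-off"}


def to_seconds(text: str) -> int:
    """Convert '4m' or '1h' into seconds."""
    number, unit = re.fullmatch(r"(\d+)([mh])", text).groups()
    return int(number) * UNIT_SECONDS[unit]


def validate_plan(plan: dict) -> list:
    """Return a list of human-readable problems; an empty list means valid."""
    errors = []
    phase_total = sum(to_seconds(p["duration"]) for p in plan["phases"])
    if phase_total > to_seconds(plan["duration_total"]):
        errors.append("phases exceed duration_total")
    mix = plan["persona_mix"]
    sums = {
        "persona_mix.synthetic_archetypes": sum(mix["synthetic_archetypes"].values()),
        "synthetic_weight + cloned_weight": mix["synthetic_weight"] + mix["cloned_weight"],
        "protocol_mix": sum(plan["protocol_mix"].values()),
    }
    for label, total in sums.items():
        # isclose tolerates float rounding, e.g. 0.4 + 0.3 + 0.2 + 0.1
        if not math.isclose(total, 1.0, abs_tol=1e-9):
            errors.append(f"{label} sums to {total:g}, expected 1.0")
    if plan["ngfw_state_required"] not in NGFW_STATES:
        errors.append("ngfw_state_required must be any, decrypt-on or decrypt-off")
    return errors


def can_start(plan: dict, probed_state: str) -> bool:
    """Mirror the start gate: 'any' always passes, otherwise states must match."""
    required = plan["ngfw_state_required"]
    return required == "any" or required == probed_state


PLAN = {
    "identifier": "CAP-FIND-KNEE-30M",
    "duration_total": "30m",
    "phases": [{"duration": d} for d in ("2m", "4m", "4m", "4m", "4m", "8m", "4m")],
    "persona_mix": {
        "synthetic_archetypes": {"skin": 0.40, "mock": 0.30, "har": 0.20, "real_app": 0.10},
        "synthetic_weight": 1.0,
        "cloned_weight": 0.0,
    },
    "protocol_mix": {"h2": 0.30, "h3": 0.70},
    "ngfw_state_required": "decrypt-on",
}

if __name__ == "__main__":
    print(validate_plan(PLAN))
    print(can_start(PLAN, "decrypt-off"))
```

Returning a list of problems rather than raising on the first one lets a reviewer see every issue in a single pass.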
How to choose a plan
A typical engagement runs 3–5 plans in this order:
1. BASELINE-SMOKE-5M: confirm the test bed is healthy
2. BASELINE-SLO-30M: record the SLO reference numbers
3. CAP-FIND-KNEE-30M: find the agent count where the NGFW knees over
4. CAP-MAX-1H: confirm the discovered knee holds for an hour
5. (Optional, if engagement allows time) SOAK-ENDURANCE-24H for leak detection, OR STR-OVERLOAD-15M for failure-mode discovery
For demos / PoVs you may stop at #3 — the knee number is the headline. For acceptance testing or capacity planning, run all five.
Cross-engagement comparison
The identifier is the cross-engagement contract. Two SEs in different countries running the same plan against different NGFWs MUST be comparing the same load shape. To make this safe:
- Identifiers are immutable. A plan never changes its load shape after publication. To change a load shape, publish a new plan with a new identifier (e.g., CAP-FIND-KNEE-30M-v2)
- The catalog is git-versioned. Every change is reviewable as code; a CI job checks identifiers stay stable
- Reports cite the catalog version + the identifier: "Run executed CAP-FIND-KNEE-30M from catalog v1, plan-snapshot-hash sha256:abc.... NGFW = Cisco FTD 7.4. Result: knee at 720 concurrent agents, p99 = 487 ms"
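
One way a CI job could enforce the immutability rule is to hash each plan's load-shape fields and compare the old and new catalog. The sketch below is an assumption about how such a check might work, not the project's actual CI job or its plan-snapshot-hash algorithm: the canonical-JSON serialization, the list of metadata fields excluded from the hash, and the function names are all illustrative.

```python
import hashlib
import json

# Fields assumed NOT to be part of the load shape, so that editing a
# description or deprecating a plan does not count as a change.
METADATA_FIELDS = {"display_name", "description", "prerequisite",
                   "deprecated", "replaced_by"}


def load_shape_hash(plan: dict) -> str:
    """Hash the load-shape fields of a plan in a key-order-independent way."""
    shape = {k: v for k, v in plan.items() if k not in METADATA_FIELDS}
    canonical = json.dumps(shape, sort_keys=True, separators=(",", ":"))
    return "sha256:" + hashlib.sha256(canonical.encode("utf-8")).hexdigest()


def changed_identifiers(old_catalog: list, new_catalog: list) -> list:
    """Return identifiers that were removed or whose load shape changed."""
    new_hashes = {p["identifier"]: load_shape_hash(p) for p in new_catalog}
    violations = []
    for plan in old_catalog:
        if new_hashes.get(plan["identifier"]) != load_shape_hash(plan):
            violations.append(plan["identifier"])
    return violations


if __name__ == "__main__":
    old = [{"identifier": "CAP-FIND-KNEE-30M", "duration_total": "30m"}]
    edited = [{"identifier": "CAP-FIND-KNEE-30M", "duration_total": "45m"}]
    print(changed_identifiers(old, edited))  # the edited plan is reported
```

A CI step would fail the build when `changed_identifiers` returns a non-empty list; new plans added to the catalog pass because only identifiers present in the old catalog are checked.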
API
The Dashboard syncs the catalog into Postgres at startup and exposes:
- GET /api/test-plans/catalog: full list
- GET /api/test-plans/{identifier}: single plan with phases
Both return JSON. No auth required for read; the catalog is public information about which plans exist (not what a run did).
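
A minimal client sketch for the two read endpoints, using only the Python standard library. The base URL is a placeholder (the Dashboard's real address is not given here), and because the response schema is not documented beyond "JSON", the code returns the decoded body without assuming any keys.

```python
import json
from urllib.parse import quote
from urllib.request import urlopen

# Hypothetical Dashboard address; replace with the real host for your cluster.
BASE_URL = "http://dashboard.example.internal"


def catalog_url(base_url: str = BASE_URL) -> str:
    """URL of the full catalog listing."""
    return f"{base_url.rstrip('/')}/api/test-plans/catalog"


def plan_url(identifier: str, base_url: str = BASE_URL) -> str:
    """URL of a single plan; the identifier is percent-encoded defensively."""
    return f"{base_url.rstrip('/')}/api/test-plans/{quote(identifier, safe='')}"


def fetch_json(url: str, timeout: float = 10.0):
    """GET a URL and decode the JSON body (reads need no auth header)."""
    with urlopen(url, timeout=timeout) as response:
        return json.load(response)


if __name__ == "__main__":
    print(plan_url("CAP-FIND-KNEE-30M"))
    # Requires a reachable Dashboard:
    # plan = fetch_json(plan_url("CAP-FIND-KNEE-30M"))
```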
Adding a new plan
- Append the new entry to platform/test-plans/catalog.yaml with a fresh identifier following the naming convention
- Bump the version: 1 at the top of the file (it's the catalog-version metadata)
- Open a PR; reviewers verify the identifier doesn't clash and the load shape is sensible
- On merge, every Dashboard pod re-syncs at next restart
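
The identifier checks a reviewer performs in the steps above can be partly automated. The sketch below assumes one reading of the CATEGORY-CHARACTERISTIC-DURATION convention, inferred from the 15 shipped identifiers: upper-case alphanumeric segments, a final duration in minutes or hours, and an optional -vN suffix. The regular expression and the helper name are illustrative, not a published rule.

```python
import re

# Inferred pattern (an assumption): CATEGORY, one or more CHARACTERISTIC
# segments, a DURATION such as 30M or 24H, and an optional -v2 style suffix.
IDENTIFIER_RE = re.compile(r"[A-Z0-9]+(?:-[A-Z0-9]+)+-\d+[MH](?:-v\d+)?")


def check_new_identifier(identifier: str, existing: set) -> list:
    """Return the problems found with a proposed identifier (empty if none)."""
    problems = []
    if IDENTIFIER_RE.fullmatch(identifier) is None:
        problems.append("does not follow CATEGORY-CHARACTERISTIC-DURATION")
    if identifier in existing:
        problems.append("clashes with an existing identifier")
    return problems


if __name__ == "__main__":
    shipped = {"CAP-FIND-KNEE-30M", "CAP-MAX-1H"}
    print(check_new_identifier("CAP-FIND-KNEE-30M-v2", shipped))  # no problems
    print(check_new_identifier("CAP-MAX-1H", shipped))            # clash
```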
Deprecating a plan
Set deprecated: true and (optionally) replaced_by: NEW-IDENTIFIER. The catalog API still returns the plan (so historical reports keep working), but the Dashboard picker hides it from "available plans" and surfaces it only as "view legacy plans".
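
The picker behaviour described above amounts to a filter on the deprecated flag. A minimal sketch follows, operating on catalog entries as Python dicts. Following replaced_by links through several generations is an extra assumption added here for illustration; the text only says the field is optional.

```python
def available_plans(catalog: list) -> list:
    """Plans shown by default: everything not marked deprecated."""
    return [p for p in catalog if not p.get("deprecated", False)]


def legacy_plans(catalog: list) -> list:
    """Plans shown only under 'view legacy plans'."""
    return [p for p in catalog if p.get("deprecated", False)]


def resolve_replacement(catalog: list, identifier: str) -> str:
    """Follow replaced_by links until reaching a plan with no replacement."""
    by_id = {p["identifier"]: p for p in catalog}
    seen = set()
    current = identifier
    while current in by_id and by_id[current].get("replaced_by"):
        if current in seen:
            raise ValueError(f"replaced_by cycle at {current}")
        seen.add(current)
        current = by_id[current]["replaced_by"]
    return current


if __name__ == "__main__":
    catalog = [
        {"identifier": "CAP-FIND-KNEE-30M", "deprecated": True,
         "replaced_by": "CAP-FIND-KNEE-30M-v2"},
        {"identifier": "CAP-FIND-KNEE-30M-v2"},
    ]
    print([p["identifier"] for p in available_plans(catalog)])
    print(resolve_replacement(catalog, "CAP-FIND-KNEE-30M"))
```

Neither filter removes anything from the catalog itself, which matches the requirement that historical reports can still look up a deprecated plan.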
Related
- platform/test-plans/catalog.yaml: the catalog source of truth
- MONITORING_TEST_VALIDITY.md: alerts that confirm the test bed itself is not the bottleneck
- TLS_DECRYPT_MODE_VERIFICATION.en.md: ngfw_state_required enforcement
- TRACING.md: distributed tracing during runs