Test Plans — pre-configured load patterns

Scope status (post-Scope-Freeze 2026-05-10) — See ARCHITECTURE.md for the canonical 37 MÓDULOs + 7 Test Kinds + DOM/CPOS/PIE-PA safety architecture. ADRs 0014, 0019-0025 cover post-Freeze additions.

Status: 15 plans shipped. Catalog at platform/test-plans/catalog.yaml.

Why Test Plans

Every test run on this cluster needs three things to be comparable across engagements:

  1. A defined start and end — duration in minutes / hours, not "until the engineer says stop"
  2. A ramp-up methodology — agents do not hit full load instantly; they ramp over a defined timeline so the NGFW gets a chance to warm up
  3. A shared identifier: two engineers in different geographies can say "I ran CAP-FIND-KNEE-30M against this FTD, p99 inflected at 720 concurrent agents" and the comparison is unambiguous

Commercial appliances (Spirent CyberFlood, Ixia BreakingPoint) have these concepts under names like "test profile" / "load specification" / "phase configuration". This project ships them as YAML in git: versionable, reviewable as code, and replayable byte-for-byte across runs.

The 15 plans

The catalog is grouped into 8 categories. Each plan has a stable identifier that appears verbatim in the test report.

| Category | # | Identifier | Use when |
|---|---|---|---|
| Baseline | 1 | BASELINE-SMOKE-5M | First run after any cluster change |
| Baseline | 2 | BASELINE-SLO-30M | Establish the SLO reference baseline before any other run |
| Capacity | 3 | CAP-FIND-KNEE-30M | Find the agent count where p99 inflects (the NGFW knee) |
| Capacity | 4 | CAP-MAX-1H | Sustain at 90% of the discovered knee for an hour |
| Stress | 5 | STR-OVERLOAD-15M | Push to 2× capacity and observe failure modes |
| Stress | 6 | STR-CONNECTION-FLOOD-10M | Maximum new-TLS-handshake rate (no session reuse) |
| Soak | 7 | SOAK-ENDURANCE-24H | 24 h sustained at 70% capacity; detects leaks |
| Soak | 8 | SOAK-WEEKEND-72H | 72 h with diurnal pattern; peak/trough cycles |
| Spike | 9 | SPIKE-RECOVERY-10X-2M | 10× burst with measured recovery time |
| Spike | 10 | SPIKE-FLASH-CROWD-5M | Asymmetric flash-crowd pattern (Black Friday style) |
| Mix | 11 | MIX-FULL-ARCHETYPES-1H | Equal weight across skin/mock/HAR/real-app |
| Mix | 12 | MIX-CLONED-HEAVY-30M | 70% to Cloned slots; realistic enterprise mix |
| Protocol | 13 | H3-ONLY-SESSION-PRESSURE-30M | 100% HTTP/3, fresh session per request |
| Protocol | 14 | H2-LONGLIVED-CONN-30M | 100% HTTP/2 with long-lived multiplexed connections |
| Realistic | 15 | REPLAY-REAL-CUSTOMER-1H | HAR-replay of a real customer trace |

Anatomy of a plan

- identifier: CAP-FIND-KNEE-30M
  display_name: "Capacity - Find the knee (30 min)"
  description: "Step ramp from 50 → 1000 agents in 5 stages of 2 min each."
  duration_total: 30m
  phases:
    - { name: warm,   duration: 2m, pattern: linear, target: { playwright: 50,   k6: 100 } }
    - { name: s100,   duration: 4m, pattern: step,   target: { playwright: 100,  k6: 500 } }
    - { name: s250,   duration: 4m, pattern: step,   target: { playwright: 250,  k6: 1000 } }
    - { name: s500,   duration: 4m, pattern: step,   target: { playwright: 500,  k6: 2500 } }
    - { name: s750,   duration: 4m, pattern: step,   target: { playwright: 750,  k6: 5000 } }
    - { name: s1000,  duration: 8m, pattern: step,   target: { playwright: 1000, k6: 7500 } }
    - { name: rampdown, duration: 4m, pattern: linear, target: { playwright: 0, k6: 0 } }
  agent_target_strategy: round_robin_all
  persona_mix:
    synthetic_archetypes: { skin: 0.40, mock: 0.30, har: 0.20, real_app: 0.10 }
    synthetic_weight: 1.0
    cloned_weight: 0.0
  protocol_mix: { h2: 0.30, h3: 0.70 }
  cycle_interval_seconds: 3
  ngfw_state_required: decrypt-on
  slo_target_p99_seconds: 0.5
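
The phase durations in this example add up to the declared duration_total. A minimal sketch of that arithmetic follows, assuming durations only ever use an `m` or `h` suffix; the helper names `to_minutes` and `timeline` are illustrative and not part of the project.

```python
import re

# Phase names and durations copied from the CAP-FIND-KNEE-30M example above.
PHASES = [
    ("warm", "2m"), ("s100", "4m"), ("s250", "4m"), ("s500", "4m"),
    ("s750", "4m"), ("s1000", "8m"), ("rampdown", "4m"),
]
DURATION_TOTAL = "30m"


def to_minutes(text):
    """Convert a duration such as '30m' or '1h' to minutes (assumed grammar)."""
    match = re.fullmatch(r"(\d+)([mh])", text)
    if match is None:
        raise ValueError(f"unsupported duration: {text!r}")
    value, unit = int(match.group(1)), match.group(2)
    return value * 60 if unit == "h" else value


def timeline(phases):
    """Return (name, start_minute, end_minute) for each phase, in order."""
    rows, clock = [], 0
    for name, duration in phases:
        end = clock + to_minutes(duration)
        rows.append((name, clock, end))
        clock = end
    return rows


rows = timeline(PHASES)
for name, start, end in rows:
    print(f"{name:<9} {start:>2}m to {end:>2}m")

# The last phase's end minute is the summed length of all phases.
print("fits duration_total:", rows[-1][2] <= to_minutes(DURATION_TOTAL))
```

Laying the phases out as start and end offsets also makes it easy to see which stage was active at a given minute when reading a report.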

Fields explained

| Field | What it controls |
|---|---|
| identifier | The contract. Stable forever. Cited in reports. Format: CATEGORY-CHARACTERISTIC-DURATION |
| display_name | Human-friendly version for UI dropdowns |
| description | Why an operator would pick this plan. Read by the picker UI |
| duration_total | The total duration the operator promises to allocate. The engine refuses to schedule a plan whose phases exceed this |
| phases[] | Ordered list of timeline phases |
| phases[].pattern | linear (smooth ramp), exponential (fast accelerate), exponential_decay (slow decay), step (instant jump), hold (steady), sinusoidal (diurnal) |
| phases[].target | Target agent counts at the END of the phase |
| phases[].target_min / target_max | Used only with sinusoidal to define the oscillation envelope |
| agent_target_strategy | How agents pick which persona to hit: round_robin_all / round_robin_first_5 / random_each_request / weighted_top10 / weighted_cloned_70pct / pinned_first_5 / har_replay_only |
| persona_mix.synthetic_archetypes | Weights for the 4 archetypes (must sum to 1.0) |
| persona_mix.synthetic_weight + cloned_weight | Split between the 20 Synthetic and the 10 Cloned slots (must sum to 1.0) |
| protocol_mix | H2 vs H3 weights (must sum to 1.0) |
| cycle_interval_seconds | How often each agent makes a request (0 = paced by HAR trace) |
| tls_session_reuse | If false, every cycle creates a fresh TLS handshake, which pressures the NGFW handshake engine |
| ngfw_state_required | any / decrypt-on / decrypt-off. The engine refuses to start the run if the TLS Decrypt Mode Probe reports a different state |
| slo_target_p99_seconds | The expected p99 for THIS plan. Used to compute pass/fail in the report |
| prerequisite | Human-readable preconditions (e.g., "5 cloned slots must have a SITE_NAME assigned") |
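
Several of these fields carry constraints: three weight groups must each sum to 1.0, and ngfw_state_required gates the start of a run. The sketch below shows one way such checks could look. It is not the engine's actual validator; `check_plan` and `may_start` are hypothetical names, and treating `any` as "accept whatever the probe reports" is an assumption.

```python
import math

# Values copied from the CAP-FIND-KNEE-30M example; only the checked fields are shown.
PLAN = {
    "persona_mix": {
        "synthetic_archetypes": {"skin": 0.40, "mock": 0.30, "har": 0.20, "real_app": 0.10},
        "synthetic_weight": 1.0,
        "cloned_weight": 0.0,
    },
    "protocol_mix": {"h2": 0.30, "h3": 0.70},
    "ngfw_state_required": "decrypt-on",
}

NGFW_STATES = {"any", "decrypt-on", "decrypt-off"}


def check_plan(plan):
    """Return a list of human-readable problems; an empty list means none found."""
    problems = []
    mix = plan["persona_mix"]
    sums = {
        "persona_mix.synthetic_archetypes": sum(mix["synthetic_archetypes"].values()),
        "synthetic_weight + cloned_weight": mix["synthetic_weight"] + mix["cloned_weight"],
        "protocol_mix": sum(plan["protocol_mix"].values()),
    }
    for label, total in sums.items():
        # Compare with a tolerance: decimal weights rarely add up exactly in floating point.
        if not math.isclose(total, 1.0, abs_tol=1e-9):
            problems.append(f"{label} sums to {total}, expected 1.0")
    if plan["ngfw_state_required"] not in NGFW_STATES:
        problems.append(f"unknown ngfw_state_required: {plan['ngfw_state_required']!r}")
    return problems


def may_start(required_state, probed_state):
    """Assumed gate: 'any' accepts every probed state, otherwise the states must match."""
    return required_state == "any" or required_state == probed_state


print(check_plan(PLAN))
print(may_start(PLAN["ngfw_state_required"], "decrypt-off"))
```

Returning a list of problems instead of raising on the first one lets a reviewer see every violation in a single pass.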

How to choose a plan

A typical engagement runs 3–5 plans in this order:

  1. BASELINE-SMOKE-5M — confirm the test bed is healthy
  2. BASELINE-SLO-30M — record the SLO reference numbers
  3. CAP-FIND-KNEE-30M — find the agent count where the NGFW knees over
  4. CAP-MAX-1H — confirm the discovered knee holds for an hour
  5. (Optional, if engagement allows time) SOAK-ENDURANCE-24H for leak detection, OR STR-OVERLOAD-15M for failure-mode discovery

For demos / PoVs you may stop at #3 — the knee number is the headline. For acceptance testing or capacity planning, run all five.

Cross-engagement comparison

The identifier is the cross-engagement contract. Two SEs in different countries running the same plan against different NGFWs MUST be comparing the same load shape. To make this safe:

  • Identifiers are immutable. A plan never changes its load shape after publication. To change a load shape, publish a new plan with a new identifier (e.g., CAP-FIND-KNEE-30M-v2)
  • The catalog is git-versioned. Every change is reviewable as code; a CI job checks identifiers stay stable
  • Reports cite the catalog version + the identifier: "Run executed CAP-FIND-KNEE-30M from catalog v1, plan-snapshot-hash sha256:abc.... NGFW = Cisco FTD 7.4. Result: knee at 720 concurrent agents, p99 = 487 ms"
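
A plan-snapshot hash of the kind cited in reports can be derived from a canonical serialization of the plan. The following is a sketch under stated assumptions: the source does not say which bytes the project actually hashes, so the JSON canonicalization here (sorted keys, compact separators) and the function name `plan_snapshot_hash` are illustrative only.

```python
import hashlib
import json


def plan_snapshot_hash(plan):
    """Hash a plan dict so that key order does not affect the result.

    Assumption: sorted-key, compact JSON stands in for whatever canonical
    form the real catalog tooling uses.
    """
    canonical = json.dumps(plan, sort_keys=True, separators=(",", ":"), ensure_ascii=False)
    return "sha256:" + hashlib.sha256(canonical.encode("utf-8")).hexdigest()


plan_a = {"identifier": "CAP-FIND-KNEE-30M", "phases": [{"name": "warm", "duration": "2m"}]}
plan_b = {"phases": [{"duration": "2m", "name": "warm"}], "identifier": "CAP-FIND-KNEE-30M"}
plan_c = {"identifier": "CAP-FIND-KNEE-30M", "phases": [{"name": "warm", "duration": "3m"}]}

# Same content in a different key order hashes identically; a changed load shape does not.
print(plan_snapshot_hash(plan_a) == plan_snapshot_hash(plan_b))
print(plan_snapshot_hash(plan_a) == plan_snapshot_hash(plan_c))
```

A check like this is one way a CI job could detect that a published identifier's load shape was edited in place.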

API

The Dashboard syncs the catalog into Postgres at startup and exposes:

  • GET /api/test-plans/catalog — full list
  • GET /api/test-plans/{identifier} — single plan with phases

Both return JSON. No auth required for read; the catalog is public information about which plans exist (not what a run did).
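
A small client for the two read endpoints might look like the sketch below. The base URL is a placeholder assumption, and the shape of the returned JSON is not documented here, so the code returns the parsed body as-is without interpreting it.

```python
import json
import urllib.parse
import urllib.request

BASE_URL = "http://localhost:8080"  # assumption: replace with your Dashboard address


def plan_url(identifier=None, base_url=BASE_URL):
    """Build the catalog URL, or the single-plan URL when an identifier is given."""
    if identifier is None:
        return f"{base_url}/api/test-plans/catalog"
    return f"{base_url}/api/test-plans/{urllib.parse.quote(identifier, safe='')}"


def fetch_json(url, timeout=10):
    """GET a URL and parse the JSON body; no auth header, since reads need none."""
    with urllib.request.urlopen(url, timeout=timeout) as response:
        return json.load(response)


print(plan_url())
print(plan_url("CAP-FIND-KNEE-30M"))
```

Against a running Dashboard, `fetch_json(plan_url("CAP-FIND-KNEE-30M"))` would return the single plan with its phases.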

Adding a new plan

  1. Append the new entry to platform/test-plans/catalog.yaml with a fresh identifier following the naming convention
  2. Bump the version field at the top of the file (currently version: 1); it is the catalog-version metadata
  3. Open a PR — reviewers verify the identifier doesn't clash and the load shape is sensible
  4. On merge, every Dashboard pod re-syncs at next restart
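
Before opening the PR, the naming convention and the clash check from step 3 can be tested locally. This is a minimal sketch: the regular expression is inferred from the 15 shipped identifiers (uppercase segments ending in a duration such as 30M or 1H, with an optional -vN suffix) and is not the project's official rule.

```python
import re

# Inferred pattern: CATEGORY-CHARACTERISTIC(-MORE)-DURATION, optionally followed by -vN.
IDENTIFIER_RE = re.compile(r"[A-Z0-9]+(?:-[A-Z0-9]+)+-\d+[MH](?:-v\d+)?")

EXISTING = [
    "BASELINE-SMOKE-5M", "BASELINE-SLO-30M", "CAP-FIND-KNEE-30M", "CAP-MAX-1H",
    "STR-OVERLOAD-15M", "STR-CONNECTION-FLOOD-10M", "SOAK-ENDURANCE-24H",
    "SOAK-WEEKEND-72H", "SPIKE-RECOVERY-10X-2M", "SPIKE-FLASH-CROWD-5M",
    "MIX-FULL-ARCHETYPES-1H", "MIX-CLONED-HEAVY-30M",
    "H3-ONLY-SESSION-PRESSURE-30M", "H2-LONGLIVED-CONN-30M",
    "REPLAY-REAL-CUSTOMER-1H",
]


def check_new_identifier(candidate, existing=EXISTING):
    """Return a list of problems with a proposed identifier; empty means acceptable."""
    problems = []
    if IDENTIFIER_RE.fullmatch(candidate) is None:
        problems.append("does not follow CATEGORY-CHARACTERISTIC-DURATION")
    if candidate in existing:
        problems.append("clashes with an existing identifier")
    return problems


print(check_new_identifier("CAP-FIND-KNEE-30M-v2"))
print(check_new_identifier("CAP-FIND-KNEE-30M"))
print(check_new_identifier("cap-knee"))
```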

Deprecating a plan

Set deprecated: true and (optionally) replaced_by: NEW-IDENTIFIER. The catalog API still returns the plan (so historical reports keep working), but the Dashboard picker hides it from "available plans" and surfaces it only as "view legacy plans".
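
The picker behaviour described above amounts to partitioning the catalog on the deprecated flag. The sketch below illustrates that split; the sample entries and the function name `split_for_picker` are made up for illustration, and the real Dashboard code may differ.

```python
def split_for_picker(plans):
    """Separate plans into (available, legacy) based on the deprecated flag.

    A missing flag is treated as not deprecated.
    """
    available = [p for p in plans if not p.get("deprecated", False)]
    legacy = [p for p in plans if p.get("deprecated", False)]
    return available, legacy


# Illustrative catalog entries only; no plan in the shipped catalog is known to be deprecated.
catalog = [
    {"identifier": "CAP-FIND-KNEE-30M", "deprecated": True, "replaced_by": "CAP-FIND-KNEE-30M-v2"},
    {"identifier": "CAP-FIND-KNEE-30M-v2"},
    {"identifier": "BASELINE-SMOKE-5M"},
]

available, legacy = split_for_picker(catalog)
print([p["identifier"] for p in available])
for plan in legacy:
    # replaced_by is optional, so fall back to None when it is absent.
    print(plan["identifier"], "replaced by", plan.get("replaced_by"))
```

Because nothing is deleted, a lookup by identifier still resolves deprecated plans, which is what keeps historical reports readable.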