
Paid Customer Journey — Low-Level Design (LLD)

Scope. The exact end-to-end journey of a paying tlsstress.art customer: signup → verify-email → sign-in/MFA → onboarding-mint → Stripe checkout → webhook → onboarding saga → callback → active → TBI download → on-prem install → metering. Every step below cites the file that implements it. No invented endpoints, flags, or behaviours — read the cited code alongside this document.

Authoritative ADRs. ADR 0056 (Octopus Customer Auto-Provisioning), ADR 0099 (Token Economy v5 / UTXO ledger), ADR 0104 (Usage Attestation / anti-fraud).


1. Components

| Component | Path | Role |
| --- | --- | --- |
| customer-app | pkg/octopus/customer-app | Next.js (App Router, runtime=nodejs). Public API: auth, onboarding, Stripe webhook, provisioning callback, license/usage, cron reconcilers. |
| provisioning-orchestrator | pkg/octopus/provisioning-orchestrator | Go + Temporal worker. Runs OnboardingWorkflow (the 12-activity saga) and exposes the HMAC-gated POST /trigger/onboarding. |
| bootstrap-controller | pkg/bootstrap-controller | On-prem Go agent. First boot fetches the install manifest, then loops hourly: heartbeat + usage report. Closes the SaaS↔on-prem metering loop. |
| UTXO ledger | pkg/octopus/customer-app/src/lib/utxo | Single source of truth for TSU balance (ADR 0099). Mints/spends are idempotent. |

TSU — the unit of value

The product meters in TSU (TLS-Stress Units). TSU are minted into a UTXO ledger (mintUTXO) at signup-promo, onboarding completion, subscription refill, and boost purchase; they are spent (spendTSU) when the on-prem deployment reports usage. The ledger is the only authoritative balance (src/lib/utxo/balance.ts).


2. End-to-end sequence

sequenceDiagram
    autonumber
    actor U as Customer
    participant CA as customer-app (Next.js)
    participant DB as Postgres (ledger + accounts)
    participant ST as Stripe
    participant PO as provisioning-orchestrator (Temporal)
    participant BC as bootstrap-controller (on-prem)

    Note over U,CA: A. Account creation
    U->>CA: POST /api/v1/auth/signup
    CA->>CA: Turnstile, OFAC, email pipeline, HIBP, Argon2id
    CA->>DB: createCustomerAndUser (status=signup→email_pending)
    CA-->>U: 201 pending_verification (neutral)
    CA->>U: verify-email link (SES)

    Note over U,CA: B. Verify + sign-in + MFA
    U->>CA: GET /api/auth/verify?token=…
    CA->>DB: consumeVerificationToken + markEmailVerified (→ kyc_pending)
    CA-->>U: 302 /onboarding?step=2fa
    U->>CA: POST /api/v1/auth/mfa/enroll
    CA->>DB: store totpSecret, totpEnabled=false, backupCodes
    U->>CA: POST /api/v1/auth/signin (email+password)
    CA-->>U: 200 mfa_required (partial ACCESS_COOKIE, tfv=false)
    U->>CA: POST /api/v1/auth/signin/mfa (TOTP / backup code)
    CA-->>U: 200 ok (full session, tfv=true)

    Note over U,CA: C. Onboarding mint (free path → active)
    U->>CA: POST /api/v1/onboarding/profile (tier1/2/3)
    CA->>CA: MFA gate + KYC gate
    CA->>DB: mintUTXO (TSU) + activateIfKycPending (kyc_pending→active)

    Note over U,ST: D. Paid checkout
    U->>ST: Stripe Checkout (hosted)
    ST->>CA: POST /api/stripe/webhook (checkout.session.completed)
    CA->>CA: verify signature, claim idempotency, complianceBlock()
    CA->>DB: applyTierFromStripe + recordProvisioningJob
    CA->>PO: enqueueOnboarding → POST /trigger/onboarding (HMAC)

    Note over PO,DB: E. Onboarding saga (12 activities + LIFO compensation)
    PO->>PO: OnboardingWorkflow (KYC…MintDeploymentID…ProvisionTenant…)
    PO->>CA: Activity 9b LinkCustomerAccount → POST /api/internal/provisioning/callback (HMAC)
    CA->>DB: linkProvisionedAccount (dpl_ id + cell, status→active)
    CA-->>PO: 200 provisioned
    PO->>U: SendWelcomeEmail (dashboard URL + onboarding JWT)

    Note over U,BC: F. On-prem install + metering
    U->>CA: GET /api/account/download/[downloadKey] (presigned S3)
    U->>CA: POST /api/license/issue (license JWT)
    BC->>CA: POST /api/install/manifest (bind fingerprint, get modules)
    loop hourly
        BC->>CA: POST /api/license/heartbeat
        BC->>CA: POST /api/usage/report (spendTSU)
    end

Two HMAC legs cross the SaaS↔orchestrator boundary. Both are keyed from the same shared-secret family, and both carry the signature in the X-Provisioning-Signature header, computed as hex(HMAC-SHA256(rawBody)):

  • Outbound — customer-app enqueueOnboarding → orchestrator POST /trigger/onboarding.
  • Echo-back — orchestrator LinkCustomerAccount / ReportProvisioningFailed → customer-app POST /api/internal/provisioning/callback.
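
The signing scheme shared by both legs can be sketched as follows. This is a minimal illustration, not the code in enqueue.ts or the callback route: the helper names signBody and verifyBody, the placeholder secret, and the sample payload are assumptions. Only the header name and the hex(HMAC-SHA256(rawBody)) construction come from the text above.

```typescript
import { Buffer } from "node:buffer";
import { createHmac, timingSafeEqual } from "node:crypto";

// Header name as documented; everything else here is illustrative.
const SIGNATURE_HEADER = "X-Provisioning-Signature";

/** Sign the exact bytes that go on the wire (hypothetical helper). */
function signBody(rawBody: string, secret: string): string {
  return createHmac("sha256", secret).update(rawBody, "utf8").digest("hex");
}

/** Verify a received hex signature in constant time (hypothetical helper). */
function verifyBody(rawBody: string, secret: string, signatureHex: string): boolean {
  const expected = Buffer.from(signBody(rawBody, secret), "hex");
  const received = Buffer.from(signatureHex, "hex");
  // timingSafeEqual throws on a length mismatch, so check lengths first.
  if (received.length !== expected.length) return false;
  return timingSafeEqual(expected, received);
}

const secret = "example-shared-secret"; // placeholder, not a real secret
const body = JSON.stringify({ stripe_session_id: "cs_test_123", email: "a@example.com" });
const sig = signBody(body, secret);

console.log(SIGNATURE_HEADER, sig.length);        // 64 hex characters
console.log(verifyBody(body, secret, sig));       // true
console.log(verifyBody(body + " ", secret, sig)); // false: body was altered
```

Signing the raw body rather than a re-serialised object matters: any whitespace or key-order change between sender and receiver would otherwise invalidate the signature.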

3. Stage-by-stage detail

3.1 Signup — POST /api/v1/auth/signup

File: pkg/octopus/customer-app/src/app/api/v1/auth/signup/route.ts

The handler is an anti-enumeration pipeline: every "won't accept" path returns the identical 201 pending_verification body and is padded to MIN_RESPONSE_MS = 600 ms (padTiming) so an attacker cannot time-distinguish outcomes.
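
A minimal sketch of the timing pad, assuming a wrapper-style helper. The name padTiming and the 600 ms value come from the text; the signature, the remainingPadMs helper, and the use of Date.now() are assumptions rather than the actual implementation.

```typescript
const MIN_RESPONSE_MS = 600; // value quoted in the text

/** Milliseconds still to wait so total handling time reaches minMs (hypothetical helper). */
function remainingPadMs(startedAtMs: number, nowMs: number, minMs: number = MIN_RESPONSE_MS): number {
  return Math.max(0, minMs - (nowMs - startedAtMs));
}

/**
 * Run `work`, then sleep until at least minMs have elapsed since it started.
 * The pad is applied on both the success and the error path.
 */
async function padTiming<T>(work: () => Promise<T>, minMs: number = MIN_RESPONSE_MS): Promise<T> {
  const startedAt = Date.now();
  try {
    return await work();
  } finally {
    const wait = remainingPadMs(startedAt, Date.now(), minMs);
    if (wait > 0) await new Promise<void>((resolve) => setTimeout(resolve, wait));
  }
}

console.log(remainingPadMs(1000, 1150)); // 450: fast path is padded
console.log(remainingPadMs(1000, 1700)); // 0: slow path is not delayed further
```

Padding only raises the floor; a path that already takes longer than the minimum is still distinguishable, which is why the budget has to exceed the slowest neutral path.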

| # | Gate | Behaviour on fail |
| --- | --- | --- |
| 1 | Rate limit signup:{ip} + signup-day:{ip} (failClosed: true) | 429 rate_limited (timing-padded) |
| 2 | Zod SignupSchema (email, password ≥12, acceptedTos: literal(true), turnstileToken) | 400 invalid_payload |
| 3 | isSanctionedCountry(countryCode) (OFAC) | neutral 201 + SIEM log signup.sanctioned_country |
| 4 | verifyTurnstile | neutral 201 |
| 5 | validateEmail (RFC 5322, +tag strip, domain blocklist, MX/A/AAAA) | neutral 201 |
| 6 | findUserByEmail (existing user) | neutral 201 + log signup.email_already_exists |
| 7 | checkPasswordHIBP (k-anonymity, fail-open if HIBP unreachable) | 400 breached_password (the one non-neutral fail; the signal helps the user) |

On success: hashPasswordPeppered (Argon2id + current pepper version) → createCustomerAndUser → stamp pepperVersion + passwordChangedAt → createVerificationToken(purpose='email_verify', ttlHours=0.5) → SES verify-email. Email send failure does not block signup (the user can use /api/v1/auth/resend-verification).

Anti-Sybil (TOK-01). Referral bonuses are not minted at signup; the referral relationship is persisted and credited only when this account verifies its email (creditReferralOnVerify), gated and capped per referrer. A promoCode of kind tsu_grant is redeemed at signup inside a single transaction (usedCount increment, mintUTXO, promoRedemptions insert all share the tx).

Account status after signup: signup → email_pending (the createCustomerAndUser default is signup; issuing the verification email moves the account to email_pending).

3.2 Verify email — GET /api/auth/verify?token=…

File: pkg/octopus/customer-app/src/app/api/auth/verify/route.ts

consumeVerificationToken(token, 'email_verify') does a single-shot SELECT-then-UPDATE (used_at = now()) on a row matching the SHA-256 of the token with purpose='email_verify' AND expires_at > now() AND used_at IS NULL. On a forged/expired token: 302 /verify-error?code=invalid_or_expired. On success: markEmailVerified advances the account to kyc_pending, fires creditReferralOnVerify (non-fatal), and redirects 302 /onboarding?step=2fa.

3.3 MFA enrollment — POST /api/v1/auth/mfa/enroll

File: pkg/octopus/customer-app/src/app/api/v1/auth/mfa/enroll/route.ts

Requires an authenticated session but not yet TOTP-verified — the exact state after verify-email. Returns secret (base32), qrCodeDataUrl, and 10 single-use backupCodes. Server-side: one atomic UPDATE writes totpSecret (encrypted), totpEnabled=false, and the hashed backupCodes. The flag flips to true only after /api/v1/auth/mfa/verify proves possession.

Re-enrollment hardening (CRIT-4). If TOTP is already enabled, only a fully 2FA-verified session may overwrite the secret; a weaker session (e.g. magic link) gets 403 reenrollment_requires_2fa and must use account recovery.

3.4 Sign-in — POST /api/v1/auth/signin

File: pkg/octopus/customer-app/src/app/api/v1/auth/signin/route.ts

Timing-uniform (MIN_RESPONSE_MS = 700 ms); always runs Argon2 even for an unknown user (against a constant dummy hash) so "unknown user" cannot be time-distinguished from "known user, bad password".

  • Rate limit signin-ip:{ip} (30/5 min) + per-user progressive lockout: after 5 failures, lockoutUntil = now + min(30 min, 2^count s) (atomic failedLoginCount increment).
  • Account-state gate: suspended/deleted → 401 account_unavailable; email_pending/signup → 403 email_verification_required.
  • Transparent rehash if pepper/Argon2 params are outdated.
  • Risk engine (H-6): assessRisk (impossible travel, CF bot/threat score, IP reputation). A block verdict denies even a correct password (403 risk_blocked); fail-open to the mandatory-MFA path on engine error.
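
The progressive lockout formula above works out as follows. This sketch assumes the lockout starts at the fifth failure and that count in 2^count is the current failedLoginCount; the function names are hypothetical.

```typescript
const LOCKOUT_THRESHOLD = 5;         // "after 5 failures" (assumed inclusive)
const MAX_LOCKOUT_SECONDS = 30 * 60; // 30 min cap

/** Lockout duration in seconds: min(30 min, 2^count s), or 0 below the threshold. */
function lockoutSeconds(failedLoginCount: number): number {
  if (failedLoginCount < LOCKOUT_THRESHOLD) return 0;
  return Math.min(MAX_LOCKOUT_SECONDS, 2 ** failedLoginCount);
}

/** Absolute lockout deadline in epoch milliseconds, or null when not locked. */
function lockoutUntil(nowMs: number, failedLoginCount: number): number | null {
  const seconds = lockoutSeconds(failedLoginCount);
  return seconds === 0 ? null : nowMs + seconds * 1000;
}

for (const count of [4, 5, 10, 11]) {
  console.log(count, lockoutSeconds(count));
}
// 4 → 0, 5 → 32, 10 → 1024, 11 → 1800 (2^11 = 2048 s exceeds the cap)
```

Under these assumptions the cap is reached at the eleventh consecutive failure, so the exponential part only governs counts 5 through 10.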

Response:

  • totpEnabled=true → 200 mfa_required with a partial ACCESS_COOKIE (tfv=false); audit mfa_challenge.
  • totpEnabled=false (narrow first-login window) → 200 mfa_enrollment_required with partial access + refresh cookies so the user can reach /api/v1/auth/mfa/enroll.

3.5 MFA verify (step 2) — POST /api/v1/auth/signin/mfa

File: pkg/octopus/customer-app/src/app/api/v1/auth/signin/mfa/route.ts

Reads the partial ACCESS_COOKIE (must have tfv=false), rate-limits signin-mfa:{sub} (10/5 min), and accepts either:

  • a 6-digit TOTP — verified by verifyTOTPStored; replay-protected by claimTotpStep(userId, step) (AUTH-3): a cryptographically-valid code is single-use per 30 s window; a replay logs mfa.totp_replay_blocked and is rejected.
  • a 12-hex backup code: the pattern ^\d{6}$|^[0-9A-Fa-f][0-9A-Fa-f \-]{11,17}$ admits dashes, spaces, and mixed case, which are normalised away before comparison; the select-verify-burn cycle runs in a transaction with FOR UPDATE so two racing requests carrying the same code can't both pass (audit MFA-BACKUP-1: the old [A-Z0-9]{8} shape never matched a real code, so backup codes were structurally unredeemable).
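
To make the two accepted shapes concrete, here is a small sketch of input classification. The regex is the one quoted above; the parseMfaCode function, the lowercase canonical form, and the final 12-hex length check are assumptions for illustration.

```typescript
// Pattern quoted in the text: 6 digits, or a hex code with optional spaces/dashes.
const CODE_SHAPE = /^\d{6}$|^[0-9A-Fa-f][0-9A-Fa-f \-]{11,17}$/;

type ParsedCode =
  | { kind: "totp"; code: string }
  | { kind: "backup"; code: string }
  | { kind: "invalid" };

/** Classify and normalise user input (hypothetical helper). */
function parseMfaCode(input: string): ParsedCode {
  const trimmed = input.trim();
  if (!CODE_SHAPE.test(trimmed)) return { kind: "invalid" };
  if (/^\d{6}$/.test(trimmed)) return { kind: "totp", code: trimmed };
  // Strip separators and fold case; lowercase is an assumed canonical form.
  const normalised = trimmed.replace(/[ \-]/g, "").toLowerCase();
  // The shape check allows separators, so confirm 12 hex digits remain.
  return /^[0-9a-f]{12}$/.test(normalised)
    ? { kind: "backup", code: normalised }
    : { kind: "invalid" };
}

console.log(parseMfaCode("123456"));         // { kind: "totp", code: "123456" }
console.log(parseMfaCode("ABCD-EF12-3456")); // { kind: "backup", code: "abcdef123456" }
console.log(parseMfaCode("ABCD1234"));       // { kind: "invalid" } (old 8-char shape)
```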

On success: issue a full access token (tfv=true) + refresh token, persist the refresh row, audit login_success, return 200 { ok, next: '/account' }. Legacy-plaintext TOTP secrets are re-encrypted via KMS fire-and-forget.

3.6 Onboarding mint — POST /api/v1/onboarding/profile

File: pkg/octopus/customer-app/src/app/api/v1/onboarding/profile/route.ts

This is the free-tier activation + TSU mint endpoint. Three pre-mint gates:

  1. Rate limit onboarding-profile:{ip} (30/h, failClosed) — anti-farming (ONB-1).
  2. MFA gate (AUTH-01): !session.twoFactorVerified → 403 mfa_required.
  3. KYC gate: evaluateKycGateForAccount (src/lib/kyc/gate.ts); a suspended account or rejected KYC → 403 compliance_hold.

| Tier (action) | Mint (TSU) | sourceRef (idempotency) | Pre-req |
| --- | --- | --- | --- |
| tier1 (segment + use case) | 500,000 | onboarding:tier1:{accountId} | |
| tier2 (frequency + destination + volume + cicd/white-label) | 300,000 | onboarding:tier2:{accountId} | tier1 |
| tier3 (pain points, 1–3) | 200,000 | onboarding:tier3:{accountId} | tier1 + tier2 |
| completion bonus (all 3) | 100,000 | onboarding:bonus:{accountId} | tiers 1+2+3 |
| dismiss | 0 | | |

Each tier is idempotent (re-submitting a completed tier returns the existing row, mints nothing). totalTokensEarned is incremented with a SQL expression to avoid a TOCTOU race. After tier1, activateIfKycPending promotes a kyc_pending account to active (paid accounts are already active via the Stripe webhook; the WHERE status='kyc_pending' clause is the safety filter).
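
The sourceRef idempotency and the tier prerequisites can be modelled with an in-memory ledger. This is a sketch under assumptions, not the UTXO implementation: the Ledger class, submitTier, and the choice to mint the completion bonus in the same call that completes the third tier are illustrative. The amounts and sourceRef formats are the ones listed above.

```typescript
type Tier = "tier1" | "tier2" | "tier3";

const TIER_MINT: Record<Tier, number> = { tier1: 500_000, tier2: 300_000, tier3: 200_000 };
const COMPLETION_BONUS = 100_000;
const PREREQS: Record<Tier, Tier[]> = { tier1: [], tier2: ["tier1"], tier3: ["tier1", "tier2"] };

/** In-memory stand-in for the ledger; one entry per sourceRef. */
class Ledger {
  private readonly mints = new Map<string, number>();

  /** Returns true only when a new entry was minted for this sourceRef. */
  mintUTXO(sourceRef: string, amount: number): boolean {
    if (this.mints.has(sourceRef)) return false;
    this.mints.set(sourceRef, amount);
    return true;
  }

  has(sourceRef: string): boolean {
    return this.mints.has(sourceRef);
  }

  balance(): number {
    let total = 0;
    this.mints.forEach((amount) => { total += amount; });
    return total;
  }
}

/** Submit a tier; returns the TSU minted by this call (0 on a re-submit). */
function submitTier(ledger: Ledger, accountId: string, tier: Tier): number {
  for (const prereq of PREREQS[tier]) {
    if (!ledger.has(`onboarding:${prereq}:${accountId}`)) {
      throw new Error(`missing prerequisite ${prereq}`);
    }
  }
  let minted = 0;
  if (ledger.mintUTXO(`onboarding:${tier}:${accountId}`, TIER_MINT[tier])) {
    minted += TIER_MINT[tier];
  }
  const tiers: Tier[] = ["tier1", "tier2", "tier3"];
  const allDone = tiers.every((t) => ledger.has(`onboarding:${t}:${accountId}`));
  if (allDone && ledger.mintUTXO(`onboarding:bonus:${accountId}`, COMPLETION_BONUS)) {
    minted += COMPLETION_BONUS;
  }
  return minted;
}

const ledger = new Ledger();
console.log(submitTier(ledger, "acct_1", "tier1")); // 500000
console.log(submitTier(ledger, "acct_1", "tier1")); // 0 (idempotent re-submit)
console.log(submitTier(ledger, "acct_1", "tier2")); // 300000
console.log(submitTier(ledger, "acct_1", "tier3")); // 300000 (200000 + bonus 100000)
console.log(ledger.balance());                      // 1100000
```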

3.7 Stripe checkout + webhook — POST /api/stripe/webhook

File: pkg/octopus/customer-app/src/app/api/stripe/webhook/route.ts

The webhook is the paid path's spine. Envelope handling:

  1. Require stripe-signature; 503 if STRIPE_WEBHOOK_SECRET unset.
  2. stripe.webhooks.constructEvent(..., SIGNING_TOLERANCE_SECS=300); bad sig → 400.
  3. Idempotency claim claimStripeWebhookEvent (at-least-once delivery): a duplicate returns 200 { duplicate: true }; a non-finalized prior attempt is reprocessed (isRetry).
  4. dispatch(event) → finalizeStripeWebhookEvent (outcome applied|unhandled|failed|ignored). A handler throw persists failed and returns 500 so Stripe re-delivers with backoff; this is safe because every handler is idempotent.
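
The claim/finalize envelope can be modelled as a small state machine. This sketch is in-memory and makes two assumptions that the text does not confirm: a persisted failed outcome is treated as reprocessable on re-delivery, and the class and function names (WebhookEventStore, processEvent) are hypothetical.

```typescript
type Outcome = "applied" | "unhandled" | "failed" | "ignored";
type Claim = "new" | "retry" | "duplicate";

interface Result {
  status: 200 | 500;
  duplicate?: true;
  outcome?: Outcome;
}

/** In-memory stand-in for the idempotency table keyed by Stripe event id. */
class WebhookEventStore {
  // null means claimed but not yet finalized.
  private readonly rows = new Map<string, Outcome | null>();

  claim(eventId: string): Claim {
    const row = this.rows.get(eventId);
    if (row === undefined) {
      this.rows.set(eventId, null);
      return "new";
    }
    // Assumption: a failed attempt stays eligible for reprocessing.
    return row === null || row === "failed" ? "retry" : "duplicate";
  }

  finalize(eventId: string, outcome: Outcome): void {
    this.rows.set(eventId, outcome);
  }
}

function processEvent(store: WebhookEventStore, eventId: string, handler: () => Outcome): Result {
  if (store.claim(eventId) === "duplicate") return { status: 200, duplicate: true };
  try {
    const outcome = handler();
    store.finalize(eventId, outcome);
    return { status: 200, outcome };
  } catch {
    store.finalize(eventId, "failed");
    return { status: 500, outcome: "failed" }; // Stripe re-delivers
  }
}

const store = new WebhookEventStore();
const boom = (): Outcome => { throw new Error("transient"); };
console.log(processEvent(store, "evt_1", boom).status);            // 500
console.log(processEvent(store, "evt_1", () => "applied").status); // 200 (retry applied)
console.log(processEvent(store, "evt_1", () => "applied"));        // { status: 200, duplicate: true }
```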

Handled event types → handlers:

| Event | Handler |
| --- | --- |
| checkout.session.completed / checkout.session.async_payment_succeeded | handleCheckoutCompleted (or handleBoostCheckoutCompleted if metadata.sku starts boost-) |
| customer.subscription.created / .updated | handleSubscriptionChange |
| customer.subscription.deleted | handleSubscriptionDeleted |
| invoice.paid | handleInvoicePaid |
| invoice.payment_failed | handleInvoicePaymentFailed |
| charge.refunded | handleChargeRefunded (clawbackChargeTokens) |
| charge.dispute.created | handleDisputeCreated (suspend-first, then clawback) |

complianceBlock — the OFAC/KYC gate on every paid path

complianceBlock(account, countryFromSession) returns a block reason when the country (session-derived, falling back to the account's stored country) is sanctioned, or when evaluateKycGate({status, kycStatus}) denies (suspended / KYC-rejected). It is wired into every money-bearing handler:

  • handleCheckoutCompleted — on block: suspendAccount(compliance_hold:{reason}), finalize without enqueueing onboarding, return ignored (never 5xx, so Stripe doesn't retry-loop a permanent hold).
  • handleSubscriptionChange — re-screens; without it a subscription event would flip a just-held account back to active (audit V2).
  • handleInvoicePaid — skip the recurring mint for a now-sanctioned customer.
  • handleBoostCheckoutCompleted — the boost (one-time TSU) path; the last paid path that lacked the gate (audit STRIPE-DEEP-01).
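
The decision order of complianceBlock, and what a handler does with a block, can be sketched as below. The real signatures, status values, and reason strings are not given in this document, so the types, the reason strings, and the placeholder country code "ZZ" are all assumptions.

```typescript
interface Account {
  status: "email_pending" | "kyc_pending" | "active" | "suspended";
  kycStatus: "pending" | "approved" | "rejected"; // assumed value set
  country?: string;
}

// Placeholder list for illustration only; not a real sanctions list.
const SANCTIONED_COUNTRIES = new Set<string>(["ZZ"]);

function isSanctionedCountry(code: string | undefined): boolean {
  return code !== undefined && SANCTIONED_COUNTRIES.has(code.toUpperCase());
}

/** Returns a deny reason or null (reason strings are hypothetical). */
function evaluateKycGate(a: Pick<Account, "status" | "kycStatus">): string | null {
  if (a.status === "suspended") return "account_suspended";
  if (a.kycStatus === "rejected") return "kyc_rejected";
  return null;
}

/** Session country wins; the stored account country is the fallback. */
function complianceBlock(account: Account, countryFromSession?: string): string | null {
  const country = countryFromSession ?? account.country;
  if (isSanctionedCountry(country)) return "sanctioned_country";
  return evaluateKycGate(account);
}

/** Handler-side use: a block suspends, skips the saga, and is never a 5xx. */
function onCheckoutCompleted(account: Account, sessionCountry?: string) {
  const reason = complianceBlock(account, sessionCountry);
  if (reason !== null) {
    const held: Account = { ...account, status: "suspended" };
    return { outcome: "ignored" as const, enqueued: false, account: held, hold: `compliance_hold:${reason}` };
  }
  return { outcome: "applied" as const, enqueued: true, account, hold: null };
}

const acct: Account = { status: "active", kycStatus: "approved", country: "DE" };
console.log(onCheckoutCompleted(acct, "ZZ").hold); // compliance_hold:sanctioned_country
console.log(onCheckoutCompleted(acct).outcome);    // applied
```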

handleCheckoutCompleted — the happy path

  1. Ignore payment_status='unpaid' (async methods credit later on async_payment_succeeded).
  2. Three-step account matching: stripe_customer_id → customer_email → auto-provision (autoCreateAccountFromStripe, paid-signup-first; creates an email_pending shell and fires a magic-link via sendOnboardingMagicLink, 15-min TTL).
  3. complianceBlock (above).
  4. Resolve tier from the first line item's price (priceIdToTier). An unmapped price (resolves to free/0) is not applied to a paying checkout — CRITICAL log, leave for manual mapping (audit STRIPE-1); the saga still provisions.
  5. applyTierFromStripe (tier + monthly token quota).
  6. enqueueOnboarding (the saga hand-off, §3.8).
  7. recordProvisioningJob — durably persist the hand-off in provisioning_jobs with the exact onboardingInput (so the reconciler can replay it). Best-effort: a bookkeeping failure never turns the webhook into a 5xx (launch blocker A).

3.8 Enqueue hand-off — enqueueOnboarding

File: pkg/octopus/customer-app/src/lib/provisioning/enqueue.ts

Bridges a settled checkout into the Go orchestrator. Never throws — a hand-off failure must not turn the Stripe webhook into a 5xx.

  • If PROVISIONING_TRIGGER_URL or PROVISIONING_TRIGGER_SECRET is unset → degrade safe: emit customer.onboarding.enqueue {transport:"log", reason:"trigger_not_configured"} and return { enqueued: false }. The reconciler (or the cross-cloud orchestrator) picks it up.
  • Otherwise: POST the snake_case wireBody to the trigger URL with X-Provisioning-Signature = hex(HMAC-SHA256(body)), 10 s timeout. Non-2xx → { enqueued: false, reason: 'http_<status>' }; network error → { enqueued: false, reason: 'fetch_error' }.

The orchestrator's POST /trigger/onboarding (provisioning-orchestrator/cmd/provisioner/main.go::onboardingTriggerHandler) verifies the HMAC (constant-time, 64 KB body cap), unmarshals OnboardingInput, requires stripe_session_id + email, and calls StartOnboarding. The WorkflowID is "onboarding-" + StripeSessionID (worker.go), so a replayed webhook dedupes to the same Temporal execution.

3.9 Onboarding saga — OnboardingWorkflow

File: pkg/octopus/provisioning-orchestrator/internal/workflows/onboarding_temporal.go (activities: internal/activities/activities.go, internal/activities/real_provider_callback.go)

The real Temporal workflow. Activity options: StartToCloseTimeout=30 s, retry with backoff (×2, cap 20 s). The whole saga is bounded by WorkflowTimeout = 15 min (onboarding_run.go) — large enough to clear every activity's retry budget plus the compensation chain so a server-side timeout never fires mid-saga and skips the deferred rollback.

12 activities (steps 1–12; step 9b is the echo-back):

| # | Activity | Compensation pushed (LIFO) |
| --- | --- | --- |
| 1 | KYCCheck (clean result; a non-pass is a workflow decision, ErrKYCFailed, not retried) | none |
| 2 | MintDeploymentID (ULID dpl_) | MarkDeploymentIDRolledBack |
| 3 | AllocateCell (GeoIP countryCode → nearest cell) | ReleaseCellAllocation |
| 4 | IssueClientCert (Vault PKI per-cell CA) | RevokeClientCert |
| 5 | ProvisionTenant (Postgres RLS + Redis namespace) | MarkTenantRollbackPending |
| 6 | AllocateTokenQuota (tokensForSlug(packageSlug)) | BurnTokenQuota |
| 7 | ReserveConnectArtSlot | ReleaseConnectArtSlot |
| 8 | ReserveStunCoordSlot | ReleaseStunCoordSlot |
| 9 | GenerateOnboardingJWT (→ https://dashboard.tlsstress.art/onboarding?token=…) | none |
| 9b | LinkCustomerAccount (echo-back): POST the dpl_ id + cell + quota to customer-app; last fallible step | none (clears the chain on success) |
| 10 | SendWelcomeEmail (post no-return) | none |
| 11 | AppendAuditChain (best-effort) | none |
| 12 | NotifyAdminHighValue (AmountCents ≥ HighValueAmountCentsThreshold = $500, best-effort) | none |

tokensForSlug (onboarding_run.go): free-trial→1,000, pro-monthly→100,000, enterprise-monthly→1,000,000, defense-monthly→10,000,000, default→0.

The LinkCustomerAccount echo-back (Activity 9b)

File: internal/activities/real_provider_callback.go

It POSTs {stripe_session_id, status:"provisioned", deployment_id, cell_id, tokens_quota} to customer-app's POST /api/internal/provisioning/callback, HMAC-signed. Fail-closed when the callback URL/secret is unwired (ErrNotProvisioned). A 404 (customer-app hasn't yet recorded the provisioning_jobs row — webhook race) is ErrTransient so Temporal retries; the row appears within seconds. Only a 200 confirms the link.

It runs as the last fallible step, right before the no-return welcome email, precisely so that if it fails the deferred saga unwinds with the account never flipped active (the provisioning_jobs row stays unprovisioned and the reconciler re-drives) — instead of a permanent half-state.

The no-return point + ReportProvisioningFailed

linked := true flips after step 9b. Then rollback = nil clears the compensation chain before the welcome email (audit V1): otherwise a transient SendWelcomeEmail error (ErrTransient on a Postmark blip) would unwind the whole saga — revoking the cert, zeroing the quota, releasing slots — on a customer who is already active in customer-app, with nothing to repair it (the provisioning_jobs row is already provisioned, excluded from the reconciler).

On any terminal failure the deferred function runs the LIFO compensation stack on a disconnected context (so workflow cancellation can't abort the rollback), then — only if !linked — fires ReportProvisioningFailed (status failed callback → markProvisioningJobFailed) and SendProvisioningFailedEmail with a generic reason (never leaking KYC/sanctions detail). Both were previously dead code with zero call sites.

flowchart TD
    Start([Workflow start]) --> A1[1. KYCCheck]
    A1 -->|!Passed| KFAIL[/return ErrKYCFailed/]
    A1 -->|Passed| A2[2. MintDeploymentID]
    A2 --> A3[3. AllocateCell]
    A3 --> A4[4. IssueClientCert]
    A4 --> A5[5. ProvisionTenant]
    A5 --> A6[6. AllocateTokenQuota]
    A6 --> A7[7. ReserveConnectArtSlot]
    A7 --> A8[8. ReserveStunCoordSlot]
    A8 --> A9[9. GenerateOnboardingJWT]
    A9 --> A9b{9b. LinkCustomerAccount}
    A9b -->|200| LINKED["linked = true<br/>rollback = nil<br/>NO-RETURN POINT"]
    A9b -->|error| FAIL
    LINKED --> A10[10. SendWelcomeEmail]
    A10 -->|error| PROVISIONED_NOEMAIL["return error<br/>(provisioned; manual resend)<br/>NO unwind"]
    A10 -->|ok| A11[11. AppendAuditChain best-effort]
    A11 --> A12[12. NotifyAdminHighValue if ≥ $500]
    A12 --> Done([OnboardingResult])

    KFAIL --> FAIL
    FAIL{{"deferred: err != nil"}}
    FAIL --> COMP["Disconnected ctx<br/>run rollback stack LIFO:<br/>ReleaseStun/ConnectArt →<br/>BurnTokenQuota →<br/>MarkTenantRollbackPending →<br/>RevokeClientCert →<br/>ReleaseCellAllocation →<br/>MarkDeploymentIDRolledBack"]
    COMP --> NL{linked?}
    NL -->|false| REPORT["ReportProvisioningFailed<br/>(status:failed callback)<br/>+ SendProvisioningFailedEmail (generic)"]
    NL -->|true| NOOP["no failure notice<br/>(customer IS provisioned)"]
    REPORT --> End([terminal])
    NOOP --> End
    PROVISIONED_NOEMAIL --> End
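
The control flow of the saga (push a compensation after each successful step, unwind in reverse on failure, clear the stack at the no-return point) can be modelled in a few lines. The real workflow is Go on Temporal; this TypeScript model is only an illustration, with a hypothetical runSaga function and a failAt parameter to inject a failure. The step and compensation names follow the activity table.

```typescript
interface Step {
  name: string;
  compensation?: string;
}

const STEPS: Step[] = [
  { name: "KYCCheck" },
  { name: "MintDeploymentID", compensation: "MarkDeploymentIDRolledBack" },
  { name: "AllocateCell", compensation: "ReleaseCellAllocation" },
  { name: "IssueClientCert", compensation: "RevokeClientCert" },
  { name: "ProvisionTenant", compensation: "MarkTenantRollbackPending" },
  { name: "AllocateTokenQuota", compensation: "BurnTokenQuota" },
  { name: "ReserveConnectArtSlot", compensation: "ReleaseConnectArtSlot" },
  { name: "ReserveStunCoordSlot", compensation: "ReleaseStunCoordSlot" },
  { name: "GenerateOnboardingJWT" },
  { name: "LinkCustomerAccount" },
  { name: "SendWelcomeEmail" },
];

/** Run the saga, failing at `failAt` if given; returns the failure-path actions taken. */
function runSaga(failAt: string | null): { ok: boolean; linked: boolean; actions: string[] } {
  let rollback: string[] = [];
  let linked = false;
  const actions: string[] = [];
  try {
    for (const step of STEPS) {
      if (step.name === failAt) throw new Error(`${step.name} failed`);
      if (step.compensation !== undefined) rollback.push(step.compensation);
      if (step.name === "LinkCustomerAccount") {
        linked = true;
        rollback = []; // no-return point: nothing may be unwound after this
      }
    }
    return { ok: true, linked, actions };
  } catch {
    // pop() yields the most recently pushed compensation first (LIFO).
    for (let undo = rollback.pop(); undo !== undefined; undo = rollback.pop()) {
      actions.push(undo);
    }
    if (!linked) actions.push("ReportProvisioningFailed", "SendProvisioningFailedEmail");
    return { ok: false, linked, actions };
  }
}

console.log(runSaga("LinkCustomerAccount").actions);
console.log(runSaga("SendWelcomeEmail")); // { ok: false, linked: true, actions: [] }
```

A failure in SendWelcomeEmail leaves the actions list empty: the customer is already linked, so nothing is unwound and no failure notice is sent.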

3.10 Provisioning callback — POST /api/internal/provisioning/callback

File: pkg/octopus/customer-app/src/app/api/internal/provisioning/callback/route.ts

The customer-app receiver of the echo-back. Auth: HMAC-SHA256 over the raw body (PROVISIONING_CALLBACK_SECRET preferred, falling back to PROVISIONING_TRIGGER_SECRET), constant-time compared. Fails closed: no secret → 503; bad sig → 401.

  • status: "failed" → markProvisioningJobFailed → 200.
  • status: "provisioned" → requires deployment_id + cell_id (else 400) → linkProvisionedAccount({stripeSessionId, deploymentId, cellId, tokens}) which links the real dpl_ id to the account (matched on stripe_session_id) and flips it active. If no job matches the session → 404 so the orchestrator retries/escalates rather than assuming success.
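
The status codes described for the callback can be collected into one decision function. This is a sketch of the ordering implied by the text, not the route handler: the callbackStatus function and its input shape are hypothetical, and the signature check and job lookup are reduced to booleans.

```typescript
interface CallbackBody {
  stripe_session_id: string;
  status: "provisioned" | "failed";
  deployment_id?: string;
  cell_id?: string;
  tokens_quota?: number;
}

interface CallbackInput {
  secretConfigured: boolean; // PROVISIONING_CALLBACK_SECRET or the trigger-secret fallback is set
  signatureValid: boolean;   // HMAC over the raw body matched
  jobExists: boolean;        // a provisioning_jobs row matches stripe_session_id
  body: CallbackBody;
}

/** HTTP status the callback would return for a given request (illustrative). */
function callbackStatus(input: CallbackInput): 200 | 400 | 401 | 404 | 503 {
  if (!input.secretConfigured) return 503; // fail closed
  if (!input.signatureValid) return 401;
  if (input.body.status === "failed") return 200; // markProvisioningJobFailed
  if (!input.body.deployment_id || !input.body.cell_id) return 400;
  if (!input.jobExists) return 404; // orchestrator treats this as transient and retries
  return 200; // linkProvisionedAccount: account flips active
}

const base: CallbackInput = {
  secretConfigured: true,
  signatureValid: true,
  jobExists: true,
  body: { stripe_session_id: "cs_test_123", status: "provisioned", deployment_id: "dpl_example", cell_id: "cell-example" },
};
console.log(callbackStatus(base));                        // 200
console.log(callbackStatus({ ...base, jobExists: false })); // 404
```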

This is the step that retires the pending- placeholder deployment_id: before the callback existed, the account's deployment_id stayed pending- forever and activateAccount() was dead code (launch blocker — Cluster A).

3.11 Reconciler watchdog — POST /api/cron/reconcile-provisioning

File: pkg/octopus/customer-app/src/app/api/cron/reconcile-provisioning/route.ts

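Closes the durability gap: a hand-off can get stuck (trigger unreachable / cell down / non-2xx / saga failed mid-flight). Auth: x-cron-secret (constant-time, SHA-256). A run always returns 200 (it reports outcomes rather than failing); the exception is an unset CRON_SECRET, which yields 503 cron_not_configured (§6).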

  • getStaleProvisioningJobs({staleBefore: now − 5 min, maxAttempts: 8, limit: 25}) — the STALE_AFTER_MS window is long enough to let the orchestrator's own retries + callback land first.
  • For each job with a persisted triggerPayload, replay it through enqueueOnboarding (a row with no payload — pre-migration 0039 — is skipped, never re-minted from a guess), then recordProvisioningJob bumps attempts. A job past MAX_ATTEMPTS=8 stops being retried but stays visible (the /api/metrics stuck gauge + AlertManager page on it). A job linked by the callback meanwhile is provisioned and drops out of the scan.
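
The selection rules above can be sketched as a pure filter over job rows. The constants are the ones quoted; the row shape, the planReconcile helper, and the reading of "past MAX_ATTEMPTS" as attempts >= 8 are assumptions.

```typescript
interface ProvisioningJob {
  id: string;
  status: "pending" | "failed" | "provisioned"; // assumed value set
  updatedAtMs: number;
  attempts: number;
  triggerPayload: Record<string, unknown> | null; // null for pre-migration rows
}

const STALE_AFTER_MS = 5 * 60 * 1000;
const MAX_ATTEMPTS = 8;
const LIMIT = 25;

/** Jobs that are unprovisioned, older than the stale window, and under the attempt cap. */
function getStaleProvisioningJobs(jobs: ProvisioningJob[], nowMs: number): ProvisioningJob[] {
  const staleBefore = nowMs - STALE_AFTER_MS;
  return jobs
    .filter((j) => j.status !== "provisioned" && j.updatedAtMs < staleBefore && j.attempts < MAX_ATTEMPTS)
    .slice(0, LIMIT);
}

/** Split stale jobs into those replayable from a stored payload and those skipped. */
function planReconcile(jobs: ProvisioningJob[], nowMs: number): { replay: string[]; skipped: string[] } {
  const replay: string[] = [];
  const skipped: string[] = [];
  for (const job of getStaleProvisioningJobs(jobs, nowMs)) {
    if (job.triggerPayload !== null) replay.push(job.id);
    else skipped.push(job.id); // never re-minted from a guess
  }
  return { replay, skipped };
}

const now = 10 * 60 * 1000; // minute 10 on an arbitrary clock
const jobs: ProvisioningJob[] = [
  { id: "stale", status: "pending", updatedAtMs: 0, attempts: 1, triggerPayload: { stripe_session_id: "cs_1" } },
  { id: "fresh", status: "pending", updatedAtMs: now - 60_000, attempts: 0, triggerPayload: {} },
  { id: "done", status: "provisioned", updatedAtMs: 0, attempts: 1, triggerPayload: {} },
  { id: "exhausted", status: "failed", updatedAtMs: 0, attempts: 8, triggerPayload: {} },
  { id: "legacy", status: "pending", updatedAtMs: 0, attempts: 0, triggerPayload: null },
];
console.log(planReconcile(jobs, now)); // { replay: ["stale"], skipped: ["legacy"] }
```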

3.12 TBI download + license issue

  • TBI download (GET /api/account/download/[downloadKey], src/app/api/account/download/[downloadKey]/route.ts): authenticated session, active (non-suspended/deleted) account, and OFAC/embargo country check (isSanctionedCountry → generic 403 unavailable). Mints a 5-minute presigned S3 URL for the published bootstrap image (the bucket is private) and 302-redirects. Rate-limited 60/h per account.
  • License issue (POST /api/license/issue, src/app/api/license/issue/route.ts): authenticated + MFA-verified (!tfv → 403 mfa_required), rate-limited 10/h/account. signLicense({accountId, tier, validitySec}) (default 365 d, max 730 d); persists a licenses row storing only jti + kid (never the token). The JWT is shown once for the operator to paste into the on-prem box.

3.13 On-prem install — bootstrap-controller first boot

File: pkg/bootstrap-controller/cmd/bootstrap-controller/main.go

First boot (default state dir /var/lib/tlsstress/bootstrap):

  1. Read the license JWT from …/bootstrap/license.jwt (chmod 0600); empty/missing → Fatalf with a paste hint.
  2. Compute the hardware fingerprint (internal/fingerprint): SHA-256 over the first available of /etc/machine-id, then /sys/class/dmi/id/product_uuid, then hostname + the first stable (non-virtual, non-docker/cni/veth) MAC, joined with |. Non-Linux dev hosts (no machine-id/DMI) fall back to a simulated fingerprint that the cloud binds identically.
  3. POST /api/install/manifest (Bearer license JWT) — verify the JWT, cross-check licenses for revocation + account mismatch, bind to fingerprint on first call (atomic UPDATE … WHERE bound_fingerprint IS NULL so a stolen-JWT race can't double-bind), optionally enroll the L1 usage pubkey, and return the per-tier module image refs + cloud URLs + heartbeatIntervalSec=3600 + gracePeriodSec=86400. Manifests are written under /var/lib/tlsstress/manifests/ (an external systemd unit applies them).
  4. Initial heartbeat (stamps the grace window) + initial usage drain (no-op on first boot).
  5. Enter the scheduled loop: hourly heartbeat + hourly usage report.

An L1 attestation signer (host-held Ed25519, ADR 0104) is loaded/created; an optional L2 lease broker can gate runs against the cloud once the heartbeat goes stale past the grace window.

3.14 Metering loop — heartbeat + usage report

  • Heartbeat (POST /api/license/heartbeat, src/app/api/license/heartbeat/route.ts): Bearer license JWT; updates licenses.last_heartbeat_at + appends a license_heartbeats row; enforces the fingerprint binding and revocation; records L3 TPM posture/quote (ADR 0104). Returns graceExpiresAt = now + 24 h. 24 h without a heartbeat → modules go read-only.
  • Usage report (POST /api/usage/report, src/app/api/usage/report/route.ts): Bearer license JWT; append-only TSU consumption from the on-prem spend reporter.
  • License checks: jti exists, accountId matches claims.sub (defense-in-depth), not revoked, fingerprint-bound (an unbound license can never report — license_not_bound), fingerprint matches.
  • L1 attestation (ADR 0104): when a usagePubkey is enrolled and mode ≠ off, every report should carry a signed envelope (license, fingerprint, seq range, nonce, events digest). envelope_missing is advisory-tolerant; a seq regression, invalid signature, or replayed nonce on a signed envelope is rejected hard even in advisory mode.
  • Atomic insert + debit (M1/A6): usage rows and the TSU debit (spendTSU(... , tx)) commit in one transaction, de-duped by (license_id, module, client_seq). InsufficientBalanceError rolls the whole unit back and returns 402 quotaExceeded (controller pauses + prompts top-up); AccountSuspendedError → 403 account_suspended (recognised by the controller's IsLicenseRejected, which stops the loop). Remaining balance is read from the UTXO ledger on every path (single source of truth, C1).
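
The all-or-nothing behaviour of the insert + debit unit can be illustrated with an in-memory model. This is not the transactional implementation: the UsageLedger class and its result shape are hypothetical, and the database transaction is simulated by staging the batch before committing anything. The de-dup key (license_id, module, client_seq) and the 402 outcome follow the text.

```typescript
interface UsageEvent {
  licenseId: string;
  module: string;
  clientSeq: number;
  tsu: number;
}

interface ReportResult {
  status: 200 | 402;
  inserted: number;
  debited: number;
  remaining: number;
}

class UsageLedger {
  private readonly seen = new Set<string>();
  private balance: number;

  constructor(initialBalance: number) {
    this.balance = initialBalance;
  }

  report(events: UsageEvent[]): ReportResult {
    // Stage: keep only rows not seen before (and not repeated within the batch).
    const staged = new Map<string, number>();
    for (const e of events) {
      const key = `${e.licenseId}|${e.module}|${e.clientSeq}`;
      if (!this.seen.has(key) && !staged.has(key)) staged.set(key, e.tsu);
    }
    let debit = 0;
    staged.forEach((tsu) => { debit += tsu; });

    // Insufficient balance: nothing is committed, so a later retry can succeed.
    if (debit > this.balance) {
      return { status: 402, inserted: 0, debited: 0, remaining: this.balance };
    }

    // Commit rows and debit together.
    staged.forEach((_tsu, key) => this.seen.add(key));
    this.balance -= debit;
    return { status: 200, inserted: staged.size, debited: debit, remaining: this.balance };
  }
}

const usage = new UsageLedger(100);
const batch: UsageEvent[] = [
  { licenseId: "lic_1", module: "core", clientSeq: 1, tsu: 30 },
  { licenseId: "lic_1", module: "core", clientSeq: 2, tsu: 30 },
];
console.log(usage.report(batch)); // { status: 200, inserted: 2, debited: 60, remaining: 40 }
console.log(usage.report(batch)); // { status: 200, inserted: 0, debited: 0, remaining: 40 }
```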

4. Account status state machine

stateDiagram-v2
    [*] --> signup: createCustomerAndUser
    signup --> email_pending: verification email sent
    email_pending --> kyc_pending: GET /api/auth/verify (markEmailVerified)
    kyc_pending --> active: onboarding tier1 (activateIfKycPending) — FREE path
    kyc_pending --> active: provisioning callback linkProvisionedAccount — PAID path
    email_pending --> active: Stripe webhook applyTierFromStripe (paid-signup-first)
    active --> suspended: complianceBlock / refund / dispute / past_due
    suspended --> active: ops review (manual)
    active --> churned: subscription deleted
    active --> deletion_pending: DSAR (GDPR Art.17)
    deletion_pending --> deleted: process-deletions cron
    suspended --> [*]
    deleted --> [*]

| Transition | Trigger | Code |
| --- | --- | --- |
| signup → email_pending | verification email issued | signup/route.ts |
| email_pending → kyc_pending | email verified | auth/verify/route.ts::markEmailVerified |
| kyc_pending → active (free) | onboarding tier1 | onboarding/profile/route.ts::activateIfKycPending |
| kyc_pending/email_pending → active (paid) | webhook tier apply / saga callback | stripe/webhook::applyTierFromStripe, provisioning/callback::linkProvisionedAccount |
| * → suspended | compliance / refund / dispute / dunning | stripe/webhook::complianceBlock, handleChargeRefunded, handleDisputeCreated |
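
The state machine can be expressed as a transition map with a guard function. This is a sketch, not code from the repository: the edges follow the diagram, while the canTransition name and the reading of the wildcard "* → suspended" row as "any status except suspended and deleted" are assumptions.

```typescript
type Status =
  | "signup" | "email_pending" | "kyc_pending" | "active"
  | "suspended" | "churned" | "deletion_pending" | "deleted";

// Edges taken from the state diagram.
const TRANSITIONS: Record<Status, Status[]> = {
  signup: ["email_pending"],
  email_pending: ["kyc_pending", "active"], // active via the paid-signup-first path
  kyc_pending: ["active"],
  active: ["suspended", "churned", "deletion_pending"],
  suspended: ["active"],                    // manual ops review
  churned: [],
  deletion_pending: ["deleted"],
  deleted: [],
};

/** True if the move is an edge in the diagram or covered by the wildcard suspend rule. */
function canTransition(from: Status, to: Status): boolean {
  // Assumed reading of "* → suspended": allowed from any live, non-suspended status.
  if (to === "suspended" && from !== "suspended" && from !== "deleted") return true;
  return TRANSITIONS[from].includes(to);
}

console.log(canTransition("signup", "email_pending"));  // true
console.log(canTransition("signup", "active"));         // false
console.log(canTransition("kyc_pending", "suspended")); // true (wildcard row)
```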

5. Idempotency, compensation, and failure-mode matrix

| Stage | Idempotency key | Failure mode → behaviour |
| --- | --- | --- |
| Signup | email uniqueness (DB) | conflict race → neutral 201 |
| Verify | single-shot used_at UPDATE | reused/expired token → 302 /verify-error |
| MFA verify (backup) | FOR UPDATE burn in tx | racing same code → only one burns |
| Onboarding mint | sourceRef per tier | re-submit completed tier → mint nothing |
| Stripe webhook | claimStripeWebhookEvent (Stripe event id) | handler throw → 500, Stripe re-delivers; duplicate → 200 duplicate |
| Subscription refill mint | stripeInvoiceId (per-invoice) | retry → no double credit |
| Enqueue | WorkflowID onboarding-{sessionId} | trigger down → enqueued:false, reconciler re-drives |
| Saga activities | IdempotencyKey(wfID, name, attempt) | activity error → Temporal retry 5×; terminal → LIFO compensation |
| Callback | matched on stripe_session_id | no job → 404, orchestrator retries |
| Usage report | (license_id, module, client_seq) | duplicate batch → 0 rows, 0 debit |

6. Configuration (environment)

| Variable | Used by | Effect when unset |
| --- | --- | --- |
| STRIPE_WEBHOOK_SECRET | webhook | 503 webhook_not_configured |
| PROVISIONING_TRIGGER_URL / PROVISIONING_TRIGGER_SECRET | enqueueOnboarding + orchestrator trigger | degrade safe → log-only enqueue; reconciler re-drives |
| PROVISIONING_CALLBACK_SECRET (falls back to …TRIGGER_SECRET) | provisioning callback | 503 (fail closed) |
| CRON_SECRET | reconcile cron | 503 cron_not_configured |
| KMS_TOTP_KEY_ARN | MFA verify | skip legacy-secret re-encryption |
| USAGE_ATTESTATION_MODE | usage report | advisory semantics (vs enforce) |
| ECR_REGISTRY | install manifest | no registryAuth (non-fatal) |
| APP_BASE_URL / PUBLIC_APP_URL | verify/magic links | defaults to https://app.tlsstress.art |

7. Cross-references

  • ADR 0056 — docs/ADR/0056-octopus-customer-auto-provisioning.md
  • ADR 0099 — Token Economy v5 (UTXO ledger as single source of truth)
  • ADR 0104 — docs/ADR/0104-usage-attestation-anti-fraud.md
  • Saga shape (pure, unit-testable) — provisioning-orchestrator/internal/workflows/onboarding_run.go
  • KYC/compliance gate — customer-app/src/lib/kyc/gate.ts