Paid Customer Journey: Low-Level Design (LLD)
Scope. The exact end-to-end journey of a paying tlsstress.art customer:
signup → verify-email → sign-in/MFA → onboarding-mint → Stripe checkout → webhook → onboarding saga → callback → active → TBI download → on-prem install → metering. Every step below cites the file that implements it. No invented endpoints, flags, or behaviours; read the cited code alongside this document.
Authoritative ADRs. ADR 0056 (Octopus Customer Auto-Provisioning), ADR 0099 (Token Economy v5 / UTXO ledger), ADR 0104 (Usage Attestation / anti-fraud).
1. Components
| Component | Path | Role |
|---|---|---|
| customer-app | pkg/octopus/customer-app | Next.js (App Router, runtime=nodejs). Public API: auth, onboarding, Stripe webhook, provisioning callback, license/usage, cron reconcilers. |
| provisioning-orchestrator | pkg/octopus/provisioning-orchestrator | Go + Temporal worker. Runs OnboardingWorkflow (the 12-activity saga) and exposes the HMAC-gated POST /trigger/onboarding. |
| bootstrap-controller | pkg/bootstrap-controller | On-prem Go agent. First boot fetches the install manifest, then loops hourly: heartbeat + usage report. Closes the SaaS↔on-prem metering loop. |
| UTXO ledger | pkg/octopus/customer-app/src/lib/utxo | Single source of truth for TSU balance (ADR 0099). Mints/spends are idempotent. |
TSU: the unit of value
The product meters in TSU (TLS-Stress Units). TSU are minted into a UTXO
ledger (mintUTXO) at signup-promo, onboarding completion, subscription refill,
and boost purchase; they are spent (spendTSU) when the on-prem deployment
reports usage. The ledger is the only authoritative balance
(src/lib/utxo/balance.ts).
2. End-to-end sequence
sequenceDiagram
autonumber
actor U as Customer
participant CA as customer-app (Next.js)
participant DB as Postgres (ledger + accounts)
participant ST as Stripe
participant PO as provisioning-orchestrator (Temporal)
participant BC as bootstrap-controller (on-prem)
Note over U,CA: A. Account creation
U->>CA: POST /api/v1/auth/signup
CA->>CA: Turnstile, OFAC, email pipeline, HIBP, Argon2id
CA->>DB: createCustomerAndUser (status=signup→email_pending)
CA-->>U: 201 pending_verification (neutral)
CA->>U: verify-email link (SES)
Note over U,CA: B. Verify + sign-in + MFA
U->>CA: GET /api/auth/verify?token=…
CA->>DB: consumeVerificationToken + markEmailVerified (→ kyc_pending)
CA-->>U: 302 /onboarding?step=2fa
U->>CA: POST /api/v1/auth/mfa/enroll
CA->>DB: store totpSecret, totpEnabled=false, backupCodes
U->>CA: POST /api/v1/auth/signin (email+password)
CA-->>U: 200 mfa_required (partial ACCESS_COOKIE, tfv=false)
U->>CA: POST /api/v1/auth/signin/mfa (TOTP / backup code)
CA-->>U: 200 ok (full session, tfv=true)
Note over U,CA: C. Onboarding mint (free path → active)
U->>CA: POST /api/v1/onboarding/profile (tier1/2/3)
CA->>CA: MFA gate + KYC gate
CA->>DB: mintUTXO (TSU) + activateIfKycPending (kyc_pending→active)
Note over U,ST: D. Paid checkout
U->>ST: Stripe Checkout (hosted)
ST->>CA: POST /api/stripe/webhook (checkout.session.completed)
CA->>CA: verify signature, claim idempotency, complianceBlock()
CA->>DB: applyTierFromStripe + recordProvisioningJob
CA->>PO: enqueueOnboarding → POST /trigger/onboarding (HMAC)
Note over PO,DB: E. Onboarding saga (12 activities + LIFO compensation)
PO->>PO: OnboardingWorkflow (KYC…MintDeploymentID…ProvisionTenant…)
PO->>CA: Activity 9b LinkCustomerAccount → POST /api/internal/provisioning/callback (HMAC)
CA->>DB: linkProvisionedAccount (dpl_ id + cell, status→active)
CA-->>PO: 200 provisioned
PO->>U: SendWelcomeEmail (dashboard URL + onboarding JWT)
Note over U,BC: F. On-prem install + metering
U->>CA: GET /api/account/download/[downloadKey] (presigned S3)
U->>CA: POST /api/license/issue (license JWT)
BC->>CA: POST /api/install/manifest (bind fingerprint, get modules)
loop hourly
BC->>CA: POST /api/license/heartbeat
BC->>CA: POST /api/usage/report (spendTSU)
end
Two HMAC legs cross the SaaS↔orchestrator boundary, both keyed on the same shared
secret family and both carrying X-Provisioning-Signature: hex(HMAC-SHA256(rawBody)):
- Outbound: customer-app enqueueOnboarding → orchestrator POST /trigger/onboarding.
- Echo-back: orchestrator LinkCustomerAccount/ReportProvisioningFailed → customer-app POST /api/internal/provisioning/callback.
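The signature scheme used on both legs can be sketched as follows. This is a minimal illustration built on Node's crypto module; the names signBody and verifySignature are hypothetical and are not taken from enqueue.ts or the callback route.

```typescript
import { createHmac, timingSafeEqual } from "node:crypto";

// Hex-encoded HMAC-SHA256 over the exact raw request body (hypothetical helper).
export function signBody(secret: string, rawBody: string): string {
  return createHmac("sha256", secret).update(rawBody, "utf8").digest("hex");
}

// Constant-time check of a received X-Provisioning-Signature value.
export function verifySignature(secret: string, rawBody: string, received: string): boolean {
  const expected = Buffer.from(signBody(secret, rawBody), "utf8");
  const actual = Buffer.from(received, "utf8");
  // timingSafeEqual throws on unequal lengths, so compare lengths first.
  return expected.length === actual.length && timingSafeEqual(expected, actual);
}
```

The signature must be computed over the raw bytes as received; re-serialising parsed JSON before verifying can change the bytes and break the comparison.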
3. Stage-by-stage detail
3.1 Signup: POST /api/v1/auth/signup
File: pkg/octopus/customer-app/src/app/api/v1/auth/signup/route.ts
The handler is an anti-enumeration pipeline: every "won't accept" path that could
reveal account or screening state (gates 3-6 in the table below) returns the
identical 201 pending_verification body, and responses are padded to
MIN_RESPONSE_MS = 600 ms (padTiming) so an attacker cannot time-distinguish outcomes.
| # | Gate | Behaviour on fail |
|---|---|---|
| 1 | Rate limit signup:{ip} + signup-day:{ip} (failClosed: true) | 429 rate_limited (timing-padded) |
| 2 | Zod SignupSchema (email, password ≥12, acceptedTos: literal(true), turnstileToken) | 400 invalid_payload |
| 3 | isSanctionedCountry(countryCode) (OFAC) | neutral 201 + SIEM log signup.sanctioned_country |
| 4 | verifyTurnstile | neutral 201 |
| 5 | validateEmail (RFC 5322, +tag strip, domain blocklist, MX/A/AAAA) | neutral 201 |
| 6 | findUserByEmail (existing user) | neutral 201 + log signup.email_already_exists |
| 7 | checkPasswordHIBP (k-anonymity, fail-open if HIBP unreachable) | 400 breached_password (the one non-neutral fail; the signal helps the user) |
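The timing padding can be sketched as below. This is an illustration only: the source names padTiming and MIN_RESPONSE_MS but does not show their implementation, so remainingPadMs and withPaddedTiming are hypothetical names and the wrapper shape is an assumption.

```typescript
// How long to keep waiting so the response is never faster than the floor.
export function remainingPadMs(startedAtMs: number, nowMs: number, minResponseMs: number): number {
  return Math.max(0, minResponseMs - (nowMs - startedAtMs));
}

// Hypothetical wrapper: run the handler, then wait out the rest of the floor
// on both the success and the error path.
export async function withPaddedTiming<T>(minResponseMs: number, handler: () => Promise<T>): Promise<T> {
  const startedAt = Date.now();
  try {
    return await handler();
  } finally {
    const waitMs = remainingPadMs(startedAt, Date.now(), minResponseMs);
    if (waitMs > 0) {
      await new Promise<void>((resolve) => setTimeout(resolve, waitMs));
    }
  }
}
```

Padding of this kind only enforces a floor: a path that takes longer than the floor is still slower than the others, so the floor has to sit above the slowest expected path.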
On success: hashPasswordPeppered (Argon2id + current pepper version) →
createCustomerAndUser → stamp pepperVersion + passwordChangedAt →
createVerificationToken(purpose='email_verify', ttlHours=0.5) → SES
verify-email. Email send failure does not block signup (the user can use
/api/v1/auth/resend-verification).
Anti-Sybil (TOK-01). Referral bonuses are not minted at signup; the
referral relationship is persisted and credited only when this account verifies
its email (creditReferralOnVerify), gated and capped per referrer. A promoCode
of kind tsu_grant is redeemed at signup inside a single transaction
(usedCount increment, mintUTXO, promoRedemptions insert all share the tx).
Account status after signup: signup → email_pending (the createCustomerAndUser
default is signup; issuing the verification email advances it to email_pending).
3.2 Verify email: GET /api/auth/verify?token=…
File: pkg/octopus/customer-app/src/app/api/auth/verify/route.ts
consumeVerificationToken(token, 'email_verify') does a single-shot
SELECT-then-UPDATE (used_at = now()) on a row matching the SHA-256 of the token
with purpose='email_verify' AND expires_at > now() AND used_at IS NULL. On a
forged/expired token: 302 /verify-error?code=invalid_or_expired. On success:
markEmailVerified advances the account to kyc_pending, fires
creditReferralOnVerify (non-fatal), and redirects 302 /onboarding?step=2fa.
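The single-shot consume can be sketched in memory as follows. This is a simplified model, not the real query: the TokenRow shape and the function names are hypothetical, and the array stands in for the database table.

```typescript
import { createHash } from "node:crypto";

export interface TokenRow {
  tokenHash: string;      // SHA-256 hex of the token; the plaintext is never stored
  purpose: string;
  expiresAtMs: number;
  usedAtMs: number | null;
  userId: string;
}

export function hashToken(token: string): string {
  return createHash("sha256").update(token, "utf8").digest("hex");
}

// Returns the user id on first valid use, null for forged, expired, or reused tokens.
export function consumeToken(rows: TokenRow[], token: string, purpose: string, nowMs: number): string | null {
  const wanted = hashToken(token);
  for (const row of rows) {
    const live = row.expiresAtMs > nowMs && row.usedAtMs === null;
    if (row.tokenHash === wanted && row.purpose === purpose && live) {
      row.usedAtMs = nowMs; // the "UPDATE used_at = now()" half of the single shot
      return row.userId;
    }
  }
  return null;
}
```

In a real database the select and the update have to be atomic (for example one conditional UPDATE); the in-memory loop gets that for free because it runs on a single thread.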
3.3 MFA enrollment: POST /api/v1/auth/mfa/enroll
File: pkg/octopus/customer-app/src/app/api/v1/auth/mfa/enroll/route.ts
Requires an authenticated session but not yet TOTP-verified — the exact state
after verify-email. Returns secret (base32), qrCodeDataUrl, and 10 single-use
backupCodes. Server-side: one atomic UPDATE writes totpSecret (encrypted),
totpEnabled=false, and the hashed backupCodes. The flag flips to true only
after /api/v1/auth/mfa/verify proves possession.
Re-enrollment hardening (CRIT-4). If TOTP is already enabled, only a fully
2FA-verified session may overwrite the secret; a weaker session (e.g. magic link)
gets 403 reenrollment_requires_2fa and must use account recovery.
3.4 Sign-in: POST /api/v1/auth/signin
File: pkg/octopus/customer-app/src/app/api/v1/auth/signin/route.ts
Timing-uniform (MIN_RESPONSE_MS = 700 ms); always runs Argon2 even for an
unknown user (against a constant dummy hash) so "unknown user" cannot be
time-distinguished from "known user, bad password".
- Rate limit signin-ip:{ip} (30/5 min) + per-user progressive lockout: after 5 failures, lockoutUntil = now + min(30 min, 2^count s) (atomic failedLoginCount increment).
- Account-state gate: suspended/deleted → 401 account_unavailable; email_pending/signup → 403 email_verification_required.
- Transparent rehash if pepper/Argon2 params are outdated.
- Risk engine (H-6): assessRisk (impossible travel, CF bot/threat score, IP reputation). A block verdict denies even a correct password (403 risk_blocked); fail-open to the mandatory-MFA path on engine error.
Response:
* totpEnabled=true → 200 mfa_required with a partial ACCESS_COOKIE
(tfv=false); audit mfa_challenge.
* totpEnabled=false (narrow first-login window) → 200 mfa_enrollment_required
with partial access + refresh cookies so the user can reach /api/v1/auth/mfa/enroll.
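The progressive lockout formula from the rate-limit rule in this section, min(30 min, 2^count s), works out as in this sketch. It assumes the lockout starts at exactly the fifth failure; the function and constant names are hypothetical.

```typescript
const LOCKOUT_THRESHOLD = 5;          // assumption: lockout applies from the 5th failure onward
const MAX_LOCKOUT_SECONDS = 30 * 60;  // the 30 min cap

// Seconds to lock the account for, given the failedLoginCount after the latest failure.
export function lockoutSeconds(failedLoginCount: number): number {
  if (failedLoginCount < LOCKOUT_THRESHOLD) return 0;
  return Math.min(MAX_LOCKOUT_SECONDS, Math.pow(2, failedLoginCount));
}

// lockoutUntil expressed as epoch milliseconds.
export function lockoutUntilMs(nowMs: number, failedLoginCount: number): number {
  return nowMs + lockoutSeconds(failedLoginCount) * 1000;
}
```

Under these assumptions the fifth failure locks for 32 s, the tenth for 1,024 s, and the eleventh onward hits the 1,800 s cap (2^11 = 2,048 s exceeds 30 min).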
3.5 MFA verify (step 2): POST /api/v1/auth/signin/mfa
File: pkg/octopus/customer-app/src/app/api/v1/auth/signin/mfa/route.ts
Reads the partial ACCESS_COOKIE (must have tfv=false), rate-limits
signin-mfa:{sub} (10/5 min), and accepts either:
- a 6-digit TOTP, verified by verifyTOTPStored and replay-protected by claimTotpStep(userId, step) (AUTH-3): a cryptographically-valid code is single-use per 30 s window; a replay logs mfa.totp_replay_blocked and is rejected.
- a 12-hex backup code: the regex ^\d{6}$|^[0-9A-Fa-f][0-9A-Fa-f \-]{11,17}$ admits dashes, spaces, and either case, which are then normalised away; the select-verify-burn cycle runs in a transaction with FOR UPDATE so two racing requests carrying the same code can't both pass (audit MFA-BACKUP-1: the old [A-Z0-9]{8} shape never matched a real code, so backup codes were structurally unredeemable).
On success: issue a full access token (tfv=true) + refresh token, persist
the refresh row, audit login_success, return 200 { ok, next: '/account' }.
Legacy-plaintext TOTP secrets are re-encrypted via KMS fire-and-forget.
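How the input-shape regex separates a TOTP from a backup code can be sketched as below. The regex is the one quoted in this section; the classifyMfaCode helper, the uppercase canonical form, and the final 12-hex check are assumptions made for illustration.

```typescript
// The input-shape regex quoted in this section.
const MFA_CODE_SHAPE = /^\d{6}$|^[0-9A-Fa-f][0-9A-Fa-f \-]{11,17}$/;

export type MfaCode =
  | { kind: "totp"; code: string }
  | { kind: "backup"; code: string };

// Hypothetical helper: returns null when the input has neither shape.
export function classifyMfaCode(input: string): MfaCode | null {
  const trimmed = input.trim();
  if (!MFA_CODE_SHAPE.test(trimmed)) return null;
  if (/^\d{6}$/.test(trimmed)) return { kind: "totp", code: trimmed };
  // Strip separators and fold case (uppercase is an assumed canonical form).
  const normalised = trimmed.replace(/[ \-]/g, "").toUpperCase();
  // After normalisation a backup code must be exactly 12 hex characters.
  return /^[0-9A-F]{12}$/.test(normalised) ? { kind: "backup", code: normalised } : null;
}
```

For example, a1b2-c3d4-e5f6 and A1B2C3D4E5F6 both normalise to the same 12-hex value, while an 8-character code in the old [A-Z0-9]{8} shape matches neither branch.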
3.6 Onboarding mint: POST /api/v1/onboarding/profile
File: pkg/octopus/customer-app/src/app/api/v1/onboarding/profile/route.ts
This is the free-tier activation + TSU mint endpoint. Three pre-mint gates:
- Rate limit onboarding-profile:{ip} (30/h, failClosed) for anti-farming (ONB-1).
- MFA gate (AUTH-01): !session.twoFactorVerified → 403 mfa_required.
- KYC gate: evaluateKycGateForAccount (src/lib/kyc/gate.ts); a suspended account or rejected KYC → 403 compliance_hold.
| Tier (action) | Mint (TSU) | sourceRef (idempotency) | Pre-req |
|---|---|---|---|
| tier1 (segment + use case) | 500,000 | onboarding:tier1:{accountId} | - |
| tier2 (frequency + destination + volume + cicd/white-label) | 300,000 | onboarding:tier2:{accountId} | tier1 |
| tier3 (pain points, 1–3) | 200,000 | onboarding:tier3:{accountId} | tier1 + tier2 |
| completion bonus (all 3) | 100,000 | onboarding:bonus:{accountId} | tiers 1+2+3 |
| dismiss | 0 | - | - |
Each tier is idempotent (re-submitting a completed tier returns the existing
row, mints nothing). totalTokensEarned is incremented with a SQL expression to
avoid a TOCTOU race. After tier1, activateIfKycPending promotes a
kyc_pending account to active (paid accounts are already active via the
Stripe webhook; the WHERE status='kyc_pending' clause is the safety filter).
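The sourceRef idempotency described in this section can be sketched with an in-memory ledger. The amounts and sourceRef format come from the table above; SketchLedger and mintOnboardingStep are hypothetical stand-ins for the real mintUTXO path, and the tier prerequisites are left out for brevity.

```typescript
type OnboardingStep = "tier1" | "tier2" | "tier3" | "bonus";

// Mint amounts in TSU, per the table above.
const MINT_TSU: Record<OnboardingStep, number> = {
  tier1: 500000,
  tier2: 300000,
  tier3: 200000,
  bonus: 100000,
};

export class SketchLedger {
  private readonly mintedBySourceRef = new Map<string, number>();

  // Mints at most once per sourceRef; a repeat is a no-op that returns 0.
  mintOnce(sourceRef: string, amount: number): number {
    if (this.mintedBySourceRef.has(sourceRef)) return 0;
    this.mintedBySourceRef.set(sourceRef, amount);
    return amount;
  }

  balance(): number {
    let total = 0;
    this.mintedBySourceRef.forEach((amount) => { total += amount; });
    return total;
  }
}

export function mintOnboardingStep(ledger: SketchLedger, accountId: string, step: OnboardingStep): number {
  return ledger.mintOnce(`onboarding:${step}:${accountId}`, MINT_TSU[step]);
}
```

Completing all three tiers plus the bonus yields 500,000 + 300,000 + 200,000 + 100,000 = 1,100,000 TSU, however many times each tier is re-submitted.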
3.7 Stripe checkout + webhook: POST /api/stripe/webhook
File: pkg/octopus/customer-app/src/app/api/stripe/webhook/route.ts
The webhook is the paid path's spine. Envelope handling:
- Require stripe-signature; 503 if STRIPE_WEBHOOK_SECRET unset.
- stripe.webhooks.constructEvent(..., SIGNING_TOLERANCE_SECS=300); bad sig → 400.
- Idempotency claim claimStripeWebhookEvent (at-least-once delivery): a duplicate returns 200 { duplicate: true }; a non-finalized prior attempt is reprocessed (isRetry).
- dispatch(event) → finalizeStripeWebhookEvent (outcome applied|unhandled|failed|ignored). A handler throw persists failed and returns 500 so Stripe re-delivers with backoff; every handler is idempotent.
Handled event types → handlers:
| Event | Handler |
|---|---|
| checkout.session.completed / checkout.session.async_payment_succeeded | handleCheckoutCompleted (or handleBoostCheckoutCompleted if metadata.sku starts with boost-) |
| customer.subscription.created / .updated | handleSubscriptionChange |
| customer.subscription.deleted | handleSubscriptionDeleted |
| invoice.paid | handleInvoicePaid |
| invoice.payment_failed | handleInvoicePaymentFailed |
| charge.refunded | handleChargeRefunded (clawbackChargeTokens) |
| charge.dispute.created | handleDisputeCreated (suspend-first, then clawback) |
complianceBlock: the OFAC/KYC gate on every paid path
complianceBlock(account, countryFromSession) returns a block reason when the
country (session-derived, falling back to the account's stored country) is
sanctioned, or when evaluateKycGate({status, kycStatus}) denies (suspended /
KYC-rejected). It is wired into every money-bearing handler:
- handleCheckoutCompleted: on block, suspendAccount(compliance_hold:{reason}), finalize without enqueueing onboarding, return ignored (never 5xx, so Stripe doesn't retry-loop a permanent hold).
- handleSubscriptionChange: re-screens; without it a subscription event would flip a just-held account back to active (audit V2).
- handleInvoicePaid: skip the recurring mint for a now-sanctioned customer.
- handleBoostCheckoutCompleted: the boost (one-time TSU) path; the last paid path that lacked the gate (audit STRIPE-DEEP-01).
handleCheckoutCompleted: the happy path
- Ignore payment_status='unpaid' (async methods credit later on async_payment_succeeded).
- Three-step account matching: stripe_customer_id → customer_email → auto-provision (autoCreateAccountFromStripe, paid-signup-first; creates an email_pending shell and fires a magic link via sendOnboardingMagicLink, 15-min TTL).
- complianceBlock (above).
- Resolve tier from the first line item's price (priceIdToTier). An unmapped price (resolves to free/0) is not applied to a paying checkout: CRITICAL log, leave for manual mapping (audit STRIPE-1); the saga still provisions.
- applyTierFromStripe (tier + monthly token quota).
- enqueueOnboarding (the saga hand-off, §3.8).
- recordProvisioningJob: durably persist the hand-off in provisioning_jobs with the exact onboardingInput (so the reconciler can replay it). Best-effort: a bookkeeping failure never turns the webhook into a 5xx (launch blocker A).
3.8 Enqueue hand-off: enqueueOnboarding
File: pkg/octopus/customer-app/src/lib/provisioning/enqueue.ts
Bridges a settled checkout into the Go orchestrator. Never throws — a hand-off
failure must not turn the Stripe webhook into a 5xx.
- If PROVISIONING_TRIGGER_URL or PROVISIONING_TRIGGER_SECRET is unset → degrade safe: emit customer.onboarding.enqueue {transport:"log", reason:"trigger_not_configured"} and return { enqueued: false }. The reconciler (or the cross-cloud orchestrator) picks it up.
- Otherwise: POST the snake_case wireBody to the trigger URL with X-Provisioning-Signature = hex(HMAC-SHA256(body)), 10 s timeout. Non-2xx → { enqueued: false, reason: 'http_<status>' }; network error → { enqueued: false, reason: 'fetch_error' }.
The orchestrator's POST /trigger/onboarding
(provisioning-orchestrator/cmd/provisioner/main.go::onboardingTriggerHandler)
verifies the HMAC (constant-time, 64 KB body cap), unmarshals OnboardingInput,
requires stripe_session_id + email, and calls StartOnboarding. The
WorkflowID is "onboarding-" + StripeSessionID (worker.go), so a replayed
webhook dedupes to the same Temporal execution.
3.9 Onboarding saga: OnboardingWorkflow
File: pkg/octopus/provisioning-orchestrator/internal/workflows/onboarding_temporal.go
(activities: internal/activities/activities.go,
internal/activities/real_provider_callback.go)
The real Temporal workflow. Activity options:
StartToCloseTimeout=30 s, retry 5× with backoff (×2, cap 20 s). The whole saga
is bounded by WorkflowTimeout = 15 min (onboarding_run.go) — large enough to
clear every activity's retry budget plus the compensation chain so a
server-side timeout never fires mid-saga and skips the deferred rollback.
12 activities (steps 1–12; step 9b is the echo-back):
| # | Activity | Compensation pushed (LIFO) |
|---|---|---|
| 1 | KYCCheck (clean result; a non-pass is a workflow decision, ErrKYCFailed, not retried) | - |
| 2 | MintDeploymentID (ULID dpl_) | MarkDeploymentIDRolledBack |
| 3 | AllocateCell (GeoIP countryCode → nearest cell) | ReleaseCellAllocation |
| 4 | IssueClientCert (Vault PKI per-cell CA) | RevokeClientCert |
| 5 | ProvisionTenant (Postgres RLS + Redis namespace) | MarkTenantRollbackPending |
| 6 | AllocateTokenQuota (tokensForSlug(packageSlug)) | BurnTokenQuota |
| 7 | ReserveConnectArtSlot | ReleaseConnectArtSlot |
| 8 | ReserveStunCoordSlot | ReleaseStunCoordSlot |
| 9 | GenerateOnboardingJWT (→ https://dashboard.tlsstress.art/onboarding?token=…) | - |
| 9b | LinkCustomerAccount (echo-back): POST the dpl_ id + cell + quota to customer-app; last fallible step | - (clears the chain on success) |
| 10 | SendWelcomeEmail (after the no-return point) | - |
| 11 | AppendAuditChain (best-effort) | - |
| 12 | NotifyAdminHighValue (AmountCents ≥ HighValueAmountCentsThreshold = $500, best-effort) | - |
tokensForSlug (onboarding_run.go): free-trial→1,000, pro-monthly→100,000,
enterprise-monthly→1,000,000, defense-monthly→10,000,000, default→0.
The LinkCustomerAccount echo-back (Activity 9b)
File: internal/activities/real_provider_callback.go
It POSTs {stripe_session_id, status:"provisioned", deployment_id, cell_id,
tokens_quota} to customer-app's POST /api/internal/provisioning/callback,
HMAC-signed. Fail-closed when the callback URL/secret is unwired
(ErrNotProvisioned). A 404 (customer-app hasn't yet recorded the
provisioning_jobs row — webhook race) is ErrTransient so Temporal retries; the
row appears within seconds. Only a 200 confirms the link.
It runs as the last fallible step, right before the no-return welcome email,
precisely so that if it fails the deferred saga unwinds with the account never
flipped active (the provisioning_jobs row stays unprovisioned and the reconciler
re-drives) — instead of a permanent half-state.
The no-return point + ReportProvisioningFailed
linked := true flips after step 9b. Then rollback = nil clears the
compensation chain before the welcome email (audit V1): otherwise a
transient SendWelcomeEmail error (ErrTransient on a Postmark blip) would
unwind the whole saga — revoking the cert, zeroing the quota, releasing slots —
on a customer who is already active in customer-app, with nothing to repair it
(the provisioning_jobs row is already provisioned, excluded from the
reconciler).
On any terminal failure the deferred function runs the LIFO compensation stack
on a disconnected context (so workflow cancellation can't abort the rollback),
then — only if !linked — fires ReportProvisioningFailed (status failed
callback → markProvisioningJobFailed) and SendProvisioningFailedEmail with a
generic reason (never leaking KYC/sanctions detail). Both were previously dead
code with zero call sites.
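The LIFO compensation stack and the no-return point can be sketched as a small synchronous runner. The real workflow is Go on Temporal; this TypeScript version illustrates the control flow only, and SagaStep, runSaga, and the assumption that a compensation is pushed only after its activity succeeds are all hypothetical.

```typescript
export interface SagaStep {
  name: string;
  run: () => void;          // throws on terminal failure
  compensate?: () => void;  // undo action, pushed only after run() succeeds
}

export interface SagaResult {
  ok: boolean;
  linked: boolean;
}

export function runSaga(steps: SagaStep[], linkStep: string, reportFailed: () => void): SagaResult {
  let rollback: Array<() => void> = [];
  let linked = false;
  try {
    for (const step of steps) {
      step.run();
      if (step.compensate) rollback.push(step.compensate);
      if (step.name === linkStep) {
        linked = true;  // the customer is now active on the SaaS side
        rollback = [];  // no-return point: nothing after this may unwind
      }
    }
    return { ok: true, linked };
  } catch (_err) {
    // Unwind in reverse order of registration (LIFO).
    rollback.slice().reverse().forEach((undo) => undo());
    // The failure notice is sent only if the account was never linked.
    if (!linked) reportFailed();
    return { ok: false, linked };
  }
}
```

A failure before the link step unwinds every registered compensation and reports the failure; a failure after it returns an error but leaves the provisioned resources alone.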
flowchart TD
Start([Workflow start]) --> A1[1. KYCCheck]
A1 -->|!Passed| KFAIL[/return ErrKYCFailed/]
A1 -->|Passed| A2[2. MintDeploymentID]
A2 --> A3[3. AllocateCell]
A3 --> A4[4. IssueClientCert]
A4 --> A5[5. ProvisionTenant]
A5 --> A6[6. AllocateTokenQuota]
A6 --> A7[7. ReserveConnectArtSlot]
A7 --> A8[8. ReserveStunCoordSlot]
A8 --> A9[9. GenerateOnboardingJWT]
A9 --> A9b{9b. LinkCustomerAccount}
A9b -->|2xx| LINKED["linked = true<br/>rollback = nil<br/>NO-RETURN POINT"]
A9b -->|error| FAIL
LINKED --> A10[10. SendWelcomeEmail]
A10 -->|error| PROVISIONED_NOEMAIL["return error<br/>(provisioned; manual resend)<br/>NO unwind"]
A10 -->|ok| A11[11. AppendAuditChain best-effort]
A11 --> A12[12. NotifyAdminHighValue if ≥ $500]
A12 --> Done([OnboardingResult])
KFAIL --> FAIL
FAIL{{"deferred: err != nil"}}
FAIL --> COMP["Disconnected ctx<br/>run rollback stack LIFO:<br/>RevokeClientCert →<br/>ReleaseStun/ConnectArt →<br/>BurnTokenQuota →<br/>MarkTenantRollbackPending →<br/>ReleaseCellAllocation →<br/>MarkDeploymentIDRolledBack"]
COMP --> NL{linked?}
NL -->|false| REPORT["ReportProvisioningFailed<br/>(status:failed callback)<br/>+ SendProvisioningFailedEmail (generic)"]
NL -->|true| NOOP["no failure notice<br/>(customer IS provisioned)"]
REPORT --> End([terminal])
NOOP --> End
PROVISIONED_NOEMAIL --> End
3.10 Provisioning callback: POST /api/internal/provisioning/callback
File: pkg/octopus/customer-app/src/app/api/internal/provisioning/callback/route.ts
The customer-app receiver of the echo-back. Auth: HMAC-SHA256 over the raw body
(PROVISIONING_CALLBACK_SECRET preferred, falling back to
PROVISIONING_TRIGGER_SECRET), constant-time compared. Fails closed: no
secret → 503; bad sig → 401.
- status: "failed" → markProvisioningJobFailed → 200.
- status: "provisioned" → requires deployment_id + cell_id (else 400) → linkProvisionedAccount({stripeSessionId, deploymentId, cellId, tokens}), which links the real dpl_ id to the account (matched on stripe_session_id) and flips it active. If no job matches the session → 404 so the orchestrator retries/escalates rather than assuming success.
This is the step that retires the pending- placeholder deployment_id: before
the callback existed, the account's deployment_id stayed pending- forever and
activateAccount() was dead code (launch blocker — Cluster A).
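The response codes described in this section can be summarised as a pure decision function. This is a sketch: the order of the checks and the CallbackBody shape are assumptions, and callbackStatus is a hypothetical name rather than a function in route.ts.

```typescript
export interface CallbackBody {
  stripe_session_id: string;
  status: "provisioned" | "failed";
  deployment_id?: string;
  cell_id?: string;
}

// Maps the callback's preconditions to the HTTP status described in this section.
export function callbackStatus(
  secretConfigured: boolean,
  signatureValid: boolean,
  body: CallbackBody,
  jobExists: boolean,
): number {
  if (!secretConfigured) return 503;                       // fail closed: no secret wired
  if (!signatureValid) return 401;                         // bad HMAC
  if (body.status === "failed") return 200;                // markProvisioningJobFailed
  if (!body.deployment_id || !body.cell_id) return 400;    // provisioned needs both ids
  return jobExists ? 200 : 404;                            // 404 lets the orchestrator retry
}
```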
3.11 Reconciler watchdog: POST /api/cron/reconcile-provisioning
File: pkg/octopus/customer-app/src/app/api/cron/reconcile-provisioning/route.ts
Closes the durability gap: a hand-off can get stuck (trigger unreachable / cell
down / non-2xx / saga failed mid-flight). Auth: x-cron-secret (constant-time,
SHA-256). Once authenticated, it always returns 200: it reports stuck jobs rather than failing.
- getStaleProvisioningJobs({staleBefore: now − 5 min, maxAttempts: 8, limit: 25}): the STALE_AFTER_MS window is long enough to let the orchestrator's own retries + callback land first.
- For each job with a persisted triggerPayload, replay it through enqueueOnboarding (a row with no payload, i.e. pre-migration 0039, is skipped, never re-minted from a guess), then recordProvisioningJob bumps attempts. A job past MAX_ATTEMPTS=8 stops being retried but stays visible (the /api/metrics stuck gauge + AlertManager page on it). A job linked by the callback meanwhile is provisioned and drops out of the scan.
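The scan-and-replay selection can be sketched as a pure filter. The three constants come from this section; the JobRow shape, the function names, and the choice to treat attempts = 8 as already exhausted are assumptions.

```typescript
export interface JobRow {
  id: string;
  provisioned: boolean;
  lastAttemptAtMs: number;
  attempts: number;
  triggerPayload: string | null; // null models a row with no persisted payload
}

const STALE_AFTER_MS = 5 * 60 * 1000;
const MAX_ATTEMPTS = 8;
const SCAN_LIMIT = 25;

// Unprovisioned jobs whose last attempt is older than the stale window and
// that still have retry budget, capped at the scan limit.
export function selectStaleJobs(jobs: JobRow[], nowMs: number): JobRow[] {
  const staleBefore = nowMs - STALE_AFTER_MS;
  return jobs
    .filter((job) => !job.provisioned && job.lastAttemptAtMs < staleBefore && job.attempts < MAX_ATTEMPTS)
    .slice(0, SCAN_LIMIT);
}

// Splits stale jobs into those that can be replayed and those skipped for lack of a payload.
export function planReplay(jobs: JobRow[], nowMs: number): { replay: string[]; skipped: string[] } {
  const replay: string[] = [];
  const skipped: string[] = [];
  for (const job of selectStaleJobs(jobs, nowMs)) {
    (job.triggerPayload !== null ? replay : skipped).push(job.id);
  }
  return { replay, skipped };
}
```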
3.12 TBI download + license issue
- TBI download, GET /api/account/download/[downloadKey] (src/app/api/account/download/[downloadKey]/route.ts): authenticated session, active (non-suspended/deleted) account, and OFAC/embargo country check (isSanctionedCountry → generic 403 unavailable). Mints a 5-minute presigned S3 URL for the published bootstrap image (the bucket is private) and 302-redirects. Rate-limited 60/h per account.
- License issue, POST /api/license/issue (src/app/api/license/issue/route.ts): authenticated + MFA-verified (!tfv → 403 mfa_required), rate-limited 10/h/account. signLicense({accountId, tier, validitySec}) (default 365 d, max 730 d); persists a licenses row storing only jti + kid (never the token). The JWT is shown once for the operator to paste into the on-prem box.
3.13 On-prem install: bootstrap-controller first boot
File: pkg/bootstrap-controller/cmd/bootstrap-controller/main.go
First boot (default state dir /var/lib/tlsstress/bootstrap):
- Read the license JWT from …/bootstrap/license.jwt (chmod 0600); empty/missing → Fatalf with a paste hint.
- Compute the hardware fingerprint (internal/fingerprint): SHA-256 over the first available of /etc/machine-id, then /sys/class/dmi/id/product_uuid, then hostname + the first stable (non-virtual, non-docker/cni/veth) MAC, joined with |. Non-Linux dev hosts (no machine-id/DMI) fall back to a simulated fingerprint that the cloud binds identically.
- POST /api/install/manifest (Bearer license JWT): verify the JWT, cross-check licenses for revocation + account mismatch, bind to fingerprint on first call (atomic UPDATE … WHERE bound_fingerprint IS NULL so a stolen-JWT race can't double-bind), optionally enroll the L1 usage pubkey, and return the per-tier module image refs + cloud URLs + heartbeatIntervalSec=3600 + gracePeriodSec=86400. Manifests are written under /var/lib/tlsstress/manifests/ (an external systemd unit applies them).
- Initial heartbeat (stamps the grace window) + initial usage drain (no-op on first boot).
- Enter the scheduled loop: hourly heartbeat + hourly usage report.
An L1 attestation signer (host-held Ed25519, ADR 0104) is loaded/created; an optional L2 lease broker can gate runs against the cloud once the heartbeat goes stale past the grace window.
3.14 Metering loop: heartbeat + usage report
- Heartbeat, POST /api/license/heartbeat (src/app/api/license/heartbeat/route.ts): Bearer license JWT; updates licenses.last_heartbeat_at + appends a license_heartbeats row; enforces the fingerprint binding and revocation; records L3 TPM posture/quote (ADR 0104). Returns graceExpiresAt = now + 24 h. 24 h without a heartbeat → modules go read-only.
- Usage report, POST /api/usage/report (src/app/api/usage/report/route.ts): Bearer license JWT; append-only TSU consumption from the on-prem spend reporter.
  - License checks: jti exists, accountId matches claims.sub (defense-in-depth), not revoked, fingerprint-bound (an unbound license can never report: license_not_bound), fingerprint matches.
  - L1 attestation (ADR 0104): when a usagePubkey is enrolled and mode ≠ off, every report should carry a signed envelope (license, fingerprint, seq range, nonce, events digest). envelope_missing is advisory-tolerant; a seq regression, invalid signature, or replayed nonce on a signed envelope is rejected hard even in advisory mode.
  - Atomic insert + debit (M1/A6): usage rows and the TSU debit (spendTSU(... , tx)) commit in one transaction, de-duped by (license_id, module, client_seq). InsufficientBalanceError rolls the whole unit back and returns 402 quotaExceeded (controller pauses + prompts top-up); AccountSuspendedError → 403 account_suspended (recognised by the controller's IsLicenseRejected, which stops the loop). Remaining balance is read from the UTXO ledger on every path (single source of truth, C1).
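The de-dupe plus all-or-nothing debit can be sketched in memory as follows. The dedupe key and the 402 quotaExceeded outcome come from this section; MeterState, applyUsageReport, and modelling the transaction by staging changes before committing are assumptions made for illustration.

```typescript
export interface UsageEvent {
  licenseId: string;
  module: string;
  clientSeq: number;
  tsu: number;
}

export interface MeterState {
  balance: number;
  seen: Set<string>; // committed (license_id, module, client_seq) keys
}

export type ReportResult =
  | { status: 200; inserted: number; debited: number; remaining: number }
  | { status: 402; error: "quotaExceeded"; remaining: number };

export function applyUsageReport(state: MeterState, events: UsageEvent[]): ReportResult {
  const freshKeys: string[] = [];
  const staged = new Set<string>();
  let debit = 0;
  for (const event of events) {
    const key = `${event.licenseId}|${event.module}|${event.clientSeq}`;
    if (state.seen.has(key) || staged.has(key)) continue; // duplicate: no row, no debit
    staged.add(key);
    freshKeys.push(key);
    debit += event.tsu;
  }
  // Nothing has been written yet, so a shortfall "rolls back" by simply not committing.
  if (debit > state.balance) {
    return { status: 402, error: "quotaExceeded", remaining: state.balance };
  }
  freshKeys.forEach((key) => { state.seen.add(key); });
  state.balance -= debit;
  return { status: 200, inserted: freshKeys.length, debited: debit, remaining: state.balance };
}
```

Replaying an already-committed batch inserts 0 rows and debits 0 TSU, which matches the usage-report row of the failure-mode matrix in section 5.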
4. Account status state machine
stateDiagram-v2
[*] --> signup: createCustomerAndUser
signup --> email_pending: verification email sent
email_pending --> kyc_pending: GET /api/auth/verify (markEmailVerified)
kyc_pending --> active: onboarding tier1 (activateIfKycPending) — FREE path
kyc_pending --> active: provisioning callback linkProvisionedAccount — PAID path
email_pending --> active: Stripe webhook applyTierFromStripe (paid-signup-first)
active --> suspended: complianceBlock / refund / dispute / past_due
suspended --> active: ops review (manual)
active --> churned: subscription deleted
active --> deletion_pending: DSAR (GDPR Art.17)
deletion_pending --> deleted: process-deletions cron
suspended --> [*]
deleted --> [*]
| Transition | Trigger | Code |
|---|---|---|
| signup → email_pending | verification email issued | signup/route.ts |
| email_pending → kyc_pending | email verified | auth/verify/route.ts::markEmailVerified |
| kyc_pending → active (free) | onboarding tier1 | onboarding/profile/route.ts::activateIfKycPending |
| kyc_pending/email_pending → active (paid) | webhook tier apply / saga callback | stripe/webhook::applyTierFromStripe, provisioning/callback::linkProvisionedAccount |
| * → suspended | compliance / refund / dispute / dunning | stripe/webhook::complianceBlock, handleChargeRefunded, handleDisputeCreated |
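The state machine can also be expressed as an allowed-transition map. This sketch follows the edges drawn in the state diagram (where only active leads to suspended, which is narrower than the "* → suspended" table row); the ALLOWED_NEXT map and canTransition are hypothetical names, not identifiers from the codebase.

```typescript
export type AccountStatus =
  | "signup"
  | "email_pending"
  | "kyc_pending"
  | "active"
  | "suspended"
  | "churned"
  | "deletion_pending"
  | "deleted";

// Edges as drawn in the state diagram of this section.
const ALLOWED_NEXT: Record<AccountStatus, AccountStatus[]> = {
  signup: ["email_pending"],
  email_pending: ["kyc_pending", "active"], // active via the paid-signup-first path
  kyc_pending: ["active"],
  active: ["suspended", "churned", "deletion_pending"],
  suspended: ["active"],                    // manual ops review
  churned: [],
  deletion_pending: ["deleted"],
  deleted: [],
};

export function canTransition(from: AccountStatus, to: AccountStatus): boolean {
  return ALLOWED_NEXT[from].indexOf(to) !== -1;
}
```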
5. Idempotency, compensation, and failure-mode matrix
| Stage | Idempotency key | Failure mode → behaviour |
|---|---|---|
| Signup | email uniqueness (DB) | conflict race → neutral 201 |
| Verify | single-shot used_at UPDATE | reused/expired token → 302 /verify-error |
| MFA verify (backup) | FOR UPDATE burn in tx | racing same code → only one burns |
| Onboarding mint | sourceRef per tier | re-submit completed tier → mint nothing |
| Stripe webhook | claimStripeWebhookEvent (Stripe event id) | handler throw → 500, Stripe re-delivers; duplicate → 200 duplicate |
| Subscription refill mint | stripeInvoiceId (per-invoice) | retry → no double credit |
| Enqueue | WorkflowID onboarding-{sessionId} | trigger down → enqueued:false, reconciler re-drives |
| Saga activities | IdempotencyKey(wfID, name, attempt) | activity error → Temporal retry 5×; terminal → LIFO compensation |
| Callback | matched on stripe_session_id | no job → 404, orchestrator retries |
| Usage report | (license_id, module, client_seq) | duplicate batch → 0 rows, 0 debit |
6. Configuration (environment)
| Variable | Used by | Effect when unset |
|---|---|---|
| STRIPE_WEBHOOK_SECRET | webhook | 503 webhook_not_configured |
| PROVISIONING_TRIGGER_URL / PROVISIONING_TRIGGER_SECRET | enqueueOnboarding + orchestrator trigger | degrade safe → log-only enqueue; reconciler re-drives |
| PROVISIONING_CALLBACK_SECRET (falls back to …TRIGGER_SECRET) | provisioning callback | 503 (fail closed) |
| CRON_SECRET | reconcile cron | 503 cron_not_configured |
| KMS_TOTP_KEY_ARN | MFA verify | skip legacy-secret re-encryption |
| USAGE_ATTESTATION_MODE | usage report | advisory semantics (vs enforce) |
| ECR_REGISTRY | install manifest | no registryAuth (non-fatal) |
| APP_BASE_URL / PUBLIC_APP_URL | verify/magic links | defaults to https://app.tlsstress.art |
7. Cross-references
- ADR 0056: docs/ADR/0056-octopus-customer-auto-provisioning.md
- ADR 0099: Token Economy v5 (UTXO ledger as single source of truth)
- ADR 0104: docs/ADR/0104-usage-attestation-anti-fraud.md
- Saga shape (pure, unit-testable): provisioning-orchestrator/internal/workflows/onboarding_run.go
- KYC/compliance gate: customer-app/src/lib/kyc/gate.ts