Time-sync fallbacks — when the lab cannot reach a public NTP source¶
Scope status (post-Scope-Freeze 2026-05-10) — See ARCHITECTURE.md for the canonical 37 MÓDULOs + 7 Test Kinds + DOM/CPOS/PIE-PA safety architecture. ADRs 0014, 0019-0025 cover post-Freeze additions.

This document complements TIME_SYNC.md. Read the main doc first — that is what every operator needs. This file covers the harder cases:
- The lab is air-gapped at the UCS / Nexus / NGFW level
- But ONE element (typically the Cloner) has a path to the public internet through an ISP link or proxy
- OR no element at all has internet, and the operator wants to use the laptop accessing the Dashboard as the time source
Three options, ranked by trustworthiness¶
| # | Option | Trustworthy? | When to use |
|---|---|---|---|
| 1 | GPS-disciplined stratum-1 appliance + lab stratum-2 | ✅ forensic-grade | Classified facilities, long-running engagements, regulatory compliance |
| 2 | TLSStress.Art NTP relay running on a node with internet (e.g. Cloner host) | ✅ good for most labs | Air-gap at UCS level, but at least one node can reach public NTP via an ISP link |
| 3 | Browser-clock fallback (operator's laptop) | 🟡 NOT forensic | Last resort. Run-relative timing only; do not cite resulting timestamps in reports |
Option 1 is the gold standard and is documented in TIME_SYNC.md. This file focuses on Options 2 and 3.
Option 2 — TLSStress.Art NTP relay¶
A small Kubernetes deployment that runs chronyd on a chosen node, syncs to public NTP via that node's internet path, and serves time on UDP/123 to all other lab elements (UCS hosts, Nexus 9000, NGFW DUT) over the management network.
When this fits¶
- The Cloner sits on a node with an ISP path (because the Cloner needs to clone real public sites)
- The other UCS hosts, Nexus, and NGFW are all isolated from the public internet
- You want them to inherit a reasonable clock without buying GPS hardware
Architecture¶
```
Public NTP servers
(time.cloudflare.com,
 time.google.com,
 pool.ntp.org)
        │
        ▼
[ Node X — has ISP path ]
  chronyd (relay)
  Pod hostNetwork=true
  Listens on UDP/123 of Node X's host IP
        │
        ▼
┌────────────────────────────┐
│  Lab management network    │
│      192.168.90.0/24       │
└────────────────────────────┘
   │            │           │
   ▼            ▼           ▼
UCS host(s)  Nexus 9000   NGFW DUT
 (chrony)   (NTP client) (NTP client)
```
The relay node is the "stratum 2 anchor" of the lab. Every other element points at it.
Setting it up¶
Step 1 — choose and label the relay node.
```
# The node must already have a route to public NTP via its primary interface.
# Verify before applying:
ssh <relay-node>
# One-shot query that prints the offset without touching the clock
# (chronyc -h targets a local chronyd, so use chronyd -Q instead):
chronyd -Q 'server time.cloudflare.com iburst'   # if chrony is preinstalled
# or, as a weaker check (nc cannot prove a UDP port is open, only that it
# was not actively rejected):
nc -uvz time.cloudflare.com 123
# expect "succeeded" / "open"
# Then label the node:
kubectl label node <relay-node-name> tlsstress.art/ntp-relay=true
```
Step 2 — apply the relay manifest.
```
kubectl apply -f k8s/optional/ntp-relay.yaml
```
This creates:
- A namespace tlsstress-ntp-relay (PSA: privileged — required for SYS_TIME capability)
- A ConfigMap with chrony.conf
- A Deployment that runs on the labelled node with hostNetwork: true and exposes UDP/123 on the host's primary IP
- A headless Service for in-cluster discovery
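For orientation, the manifest has roughly this shape. This is a hedged sketch, not the shipped file: the image, resource names, and chrony.conf details are illustrative, and k8s/optional/ntp-relay.yaml is authoritative.

```
apiVersion: v1
kind: Namespace
metadata:
  name: tlsstress-ntp-relay
  labels:
    pod-security.kubernetes.io/enforce: privileged   # PSA level from the list above
---
apiVersion: v1
kind: ConfigMap
metadata:
  name: chrony-relay-conf
  namespace: tlsstress-ntp-relay
data:
  chrony.conf: |
    server time.cloudflare.com iburst
    server time.google.com iburst
    pool pool.ntp.org iburst
    allow 192.168.90.0/24        # serve the lab management network
    local stratum 10             # keep answering (degraded) if upstreams vanish
    makestep 1.0 3
    rtcsync
---
apiVersion: apps/v1
kind: Deployment
metadata:
  name: ntp-relay
  namespace: tlsstress-ntp-relay
spec:
  replicas: 1
  selector:
    matchLabels:
      app: ntp-relay
  template:
    metadata:
      labels:
        app: ntp-relay
    spec:
      nodeSelector:
        tlsstress.art/ntp-relay: "true"   # the label from Step 1
      hostNetwork: true                   # expose UDP/123 on the node's host IP
      containers:
        - name: chronyd
          image: <chrony-image>           # any image that ships chronyd
          command: ["chronyd", "-d", "-f", "/etc/chrony/chrony.conf"]
          securityContext:
            capabilities:
              add: ["SYS_TIME"]           # required to discipline the clock
          volumeMounts:
            - name: conf
              mountPath: /etc/chrony
      volumes:
        - name: conf
          configMap:
            name: chrony-relay-conf
# (the headless Service for in-cluster discovery is omitted here)
```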
Step 3 — point lab elements at the relay.
On every other UCS host (/etc/chrony/chrony.conf):
```
server <relay-node-host-ip> iburst prefer minpoll 4 maxpoll 6
makestep 1.0 3
rtcsync
```

Then restart and wait for sync:

```
sudo systemctl restart chronyd
sudo chronyc waitsync 30 0.05
```
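To confirm by hand that the relay was selected (standard chronyc commands; the `^*` marker denotes the chosen source):

```
chronyc sources -v    # expect '^*' next to the relay's IP
chronyc tracking      # "Leap status : Normal" and a small "System time" offset
```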
On the Cisco Nexus 9000:
```
configure terminal
  ntp server <relay-node-host-ip> prefer
  ntp source-interface mgmt0
end
copy running-config startup-config
show ntp peer-status
```
On the NGFW DUT — vendor-specific. For Cisco ASA (on FTD, NTP is configured through FMC/FDM platform settings rather than this CLI):

```
configure terminal
ntp server <relay-node-host-ip>
write memory
show ntp associations
```
For Palo Alto:
```
set deviceconfig system ntp-servers primary-ntp-server-address <relay-node-host-ip>
commit
```
For Fortinet:
```
config system ntp
    set ntpsync enable
    set type custom
    config ntpserver
        edit 1
            set server <relay-node-host-ip>
        next
    end
end
```
Step 4 — verify.
```
# In Grafana, open: TLSStress.Art — Time Sync Status
# Confirm all lab hosts show green (drift < 100 ms)
# From any UCS:
./scripts/check-time-sync.sh --strict
# expect exit 0
```
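If you want to reproduce the gate yourself, the essence of a strict check is a single chronyc call. A minimal sketch, assuming chrony on the host; the shipped script may check more:

```
#!/usr/bin/env bash
set -euo pipefail
# waitsync <max-tries> <max-correction-seconds>: exits non-zero unless the
# clock is synchronised to within 0.1 s on the first check
chronyc waitsync 1 0.1 >/dev/null || { echo "time-sync FAIL" >&2; exit 1; }
echo "time-sync OK"
```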
Trade-offs¶
✅ Pros:
- No additional hardware
- Reuses an already-internet-connected node (the Cloner host) — no extra attack surface
- Automatic recovery: if the relay node reboots, it resyncs and the lab follows
- Forensically defensible — every host's NTP source is recorded, and the audit log shows the chain
⚠️ Cons:
- Single point of failure — if the relay node loses internet, lab clocks slowly drift (chrony holds previous offset for hours, not days)
- Requires NTP authentication keys for production-grade trust (without keys, anyone on the management network can answer NTP queries; this is a low-effort attack)
- The relay node now runs a privileged pod (SYS_TIME capability) — slightly elevated risk
Hardening — add NTP authentication (optional but recommended):
Edit the chrony.conf ConfigMap to add a shared key that lab clients also use:
```
keyfile /etc/chrony/keys
authselectmode require
```
Then mount the key file via a Secret. Full documentation is planned for docs/TIME_SYNC.md#authentication (future enhancement).
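A minimal sketch of the shared-key setup, assuming symmetric keys distributed by hand (the key value is a placeholder; generate your own):

```
# Generate a key line once (chronyc keygen <id> <type> <bits>):
chronyc keygen 1 SHA256 256
# Put the output in /etc/chrony/keys on the relay AND every client, e.g.:
#   1 SHA256 HEX:<generated-hex>    (placeholder; never reuse a published key)
# Clients then pin the relay to that key in chrony.conf:
server <relay-node-host-ip> iburst prefer minpoll 4 maxpoll 6 key 1
```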
Option 3 — Browser-clock fallback (NOT forensic)¶
⚠️ Read this carefully. This option is documented because operators ask for it, but it carries serious caveats. The Dashboard does not ship with this enabled by default.
What it would do¶
The operator opens the Dashboard from their laptop. The Dashboard reads Date.now() from the browser, posts it to a privileged backend endpoint, which then runs a privileged DaemonSet across all nodes to step their clocks to the browser's time. This makes the whole lab "synced to the operator's laptop".
Why this is risky¶
| Risk | Explanation |
|---|---|
| Browser clocks are unreliable | The operator's laptop may be sync'd to NTP, or it may be off by minutes (corporate proxy intercepting NTP, VM with paused virtual clock, mobile hotspot, jet-lagged hardware). The Dashboard cannot tell. |
| Different operators set different times | If two engineers access the lab on different days from different laptops, the lab's clock walks. Cross-engagement comparison breaks silently. |
| Privileged attack surface | A POST endpoint that sets system time on every node is a high-value target. A compromised browser session = compromised time on all hosts. |
| No forensic value | Timestamps written under this scheme cannot be cited in legal proceedings, regulatory filings, or vendor benchmark comparisons. They are "relative time only". |
| Hides the real problem | If the lab cannot reach NTP, the operator should fix that root cause (Option 2 or GPS), not paper over it. |
When it is acceptable¶
Only when ALL of the following are true:
1. No node in the lab has any path to a public or private NTP source
2. The engagement is internal-only — not customer-facing, not for any report that will leave the room
3. The operator explicitly accepts the "non-forensic" caveat and signs the audit log
4. Results from this run will be marked in the database as time_source = browser_fallback and filtered out of cross-engagement comparison queries
What we have built (and what we have NOT)¶
We have built a guarded version of the request side of this flow:
- The Dashboard exposes `POST /api/time-sync/set-from-browser` (admin-only). It accepts `{ browserTimestampMs, browserTimezone, browserOffsetMinutes, acknowledgement: "NOT_FORENSIC" }` and refuses if the browser's claimed time is more than ±24 h off the server's current best guess.
- It does NOT auto-execute a clock change. Instead it writes an audit row (with `forensic_grade: false`) and returns the exact `kubectl exec ... chronyc settime` command for the operator to run on the cluster control plane.
- The admin UI at `/admin/time-sync` builds the request, requires an explicit "I acknowledge this is NOT forensic" checkbox, and surfaces the manual command. See `dashboard/src/app/api/time-sync/set-from-browser/route.ts`.
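For illustration, the request can be exercised from the command line. The endpoint and payload shape come from the description above; the host name and session-cookie handling are assumptions:

```
# Does NOT change any clock. The response carries the audit-row reference
# and the manual chronyc command to copy-paste.
curl -sS -X POST "https://<dashboard-host>/api/time-sync/set-from-browser" \
  -H "Content-Type: application/json" \
  -b "session=<admin-session-cookie>" \
  -d '{
        "browserTimestampMs": 1778078100000,
        "browserTimezone": "UTC",
        "browserOffsetMinutes": 0,
        "acknowledgement": "NOT_FORENSIC"
      }'
# 1778078100000 ms ≈ 2026-05-06 14:35:00 UTC
```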
We have not (and will not) build the auto-execute path:
- The implementation cost is high (privileged DaemonSet + audit-log enforcement on the apply step)
- The operational risk is real (foot-gun for engineers in a hurry)
- The actual demand is low (when would you have a Dashboard-accessible laptop but not even one node with internet for the relay? Almost never in practice)
- The current request-only path is sufficient: the operator copy-pastes one `kubectl` command and owns the apply
If you have a strong use case for the full automated path, open an issue using the access-request template. We will assess case-by-case and may build it as an opt-in module with hard-coded "non-forensic" markers on resulting runs.
What you can do today instead¶
- Use the admin UI flow above — open `/admin/time-sync`, accept the NOT_FORENSIC caveat, copy the returned `kubectl ... chronyc settime` command, run it on the control plane. The audit row is automatically written with `forensic_grade: false` and `time_source = browser_fallback`.
- Set time manually on each host when first installing, using your laptop as the reference but documenting it:

  ```
  # On the laptop:
  date -u                                              # note the UTC time
  # On each UCS:
  sudo timedatectl set-time '2026-05-06 14:35:00 UTC'  # within ~1 s of the laptop's note
  # Then:
  sudo systemctl stop chronyd                          # disable NTP attempts that will fail
  ```

  This is honest about being non-forensic: the operator records the manual set in the engagement notes.
- Use Option 2 with a USB-tethered phone as the relay node's internet source if even ISP routing is forbidden. A phone hotspot to one node = one path = enough.
How the Grafana panel surfaces this¶
The dashboard TLSStress.Art — Time Sync Status (loaded automatically when the kustomization is applied) shows:
- Stat panel — clock skew per host: green < 100 ms · yellow < 1 s · orange < 5 s · red > 5 s
- Stat panel — sync status per host: ✓ synced / ✗ NOT synced
- Stat panel — multi-host divergence: max(offset) − min(offset) across the cluster
- Time-series — drift over time per host: catches sustained trends or run-correlated spikes
- Stat panel — minutes since last sync update per host: green < 5 min · red > 30 min
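If you are rebuilding these panels, queries of roughly this shape would drive them. This assumes node_exporter's timex collector is scraped on every lab host; the metric name is standard node_exporter, and the thresholds come from the list above:

```
# Per-host clock skew (stat panel, seconds):
abs(node_timex_offset_seconds)
# Multi-host divergence: max(offset) - min(offset) across the cluster:
max(node_timex_offset_seconds) - min(node_timex_offset_seconds)
# The drift time-series is the same per-host skew expression graphed over time.
```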
Open this dashboard before starting any test plan. If anything is not green, fix the time-sync first — do not run.
Related¶
- `TIME_SYNC.md` — the basics every operator needs to know
- `AIRGAP_INSTALL.md` — air-gap install, which is the parent scenario for Option 2
- `MONITORING_TEST_VALIDITY.md` — the broader validity-alert framework that time-sync alerts plug into
- `USAGE_POLICY.md` — license restrictions apply equally to runs done under any time-sync option