
Time-sync fallbacks — when the lab cannot reach a public NTP source


Scope status (post-Scope-Freeze 2026-05-10) — See ARCHITECTURE.md for the canonical 37 MÓDULOs + 7 Test Kinds + DOM/CPOS/PIE-PA safety architecture. ADRs 0014, 0019-0025 cover post-Freeze additions. This document complements TIME_SYNC.md. Read the main doc first — that is what every operator needs. This file covers the harder cases:

  • The lab is air-gapped at the UCS / Nexus / NGFW level
  • But ONE element (typically the Cloner) has a path to the public internet through an ISP link or proxy
  • OR no element at all has internet, and the operator wants to use the laptop accessing the Dashboard as the time source

Three options, ranked by trustworthiness

| # | Option | Trustworthy? | When to use |
|---|--------|--------------|-------------|
| 1 | GPS-disciplined stratum-1 appliance + lab stratum-2 | ✅ forensic-grade | Classified facilities, long-running engagements, regulatory compliance |
| 2 | TLSStress.Art NTP relay running on a node with internet (e.g. Cloner host) | ✅ good for most labs | Air-gap at UCS level, but at least one node can reach public NTP via an ISP link |
| 3 | Browser-clock fallback (operator's laptop) | 🟡 NOT forensic | Last resort. Run-relative timing only; do not cite resulting timestamps in reports |

Option 1 is the gold standard and is documented in TIME_SYNC.md. This file focuses on Options 2 and 3.


Option 2 — TLSStress.Art NTP relay

A small Kubernetes deployment that runs chronyd on a chosen node, syncs to public NTP via that node's internet path, and serves time on UDP/123 to all other lab elements (UCS hosts, Nexus 9000, NGFW DUT) over the management network.

When this fits

  • The Cloner sits on a node with an ISP path (because the Cloner needs to clone real public sites)
  • The other UCS hosts, Nexus, and NGFW are all isolated from the public internet
  • You want them to inherit a reasonable clock without buying GPS hardware

Architecture

Public NTP servers
(time.cloudflare.com,
 time.google.com,
 pool.ntp.org)
        │
        ▼
[ Node X — has ISP path ]
   chronyd (relay)
   Pod hostNetwork=true
   Listens on UDP/123 of Node X's host IP
        │
        ▼
   ┌────────────────────────────┐
   │  Lab management network    │
   │  192.168.90.0/24           │
   └────────────────────────────┘
        │              │            │
        ▼              ▼            ▼
   UCS host(s)    Nexus 9000   NGFW DUT
   (chrony)       (NTP client)  (NTP client)

The relay node is the "stratum 2 anchor" of the lab. Every other element points at it.

Setting it up

Step 1 — choose and label the relay node.

# The node must already have a route to public NTP via its primary interface.
# Verify before applying:
ssh <relay-node>
chronyd -Q 'server time.cloudflare.com iburst'   # one-shot query; does not set the clock
# or, to check UDP/123 routing only (nc cannot confirm an NTP reply came back):
nc -uvz time.cloudflare.com 123
# expect "succeeded" / "open"

# Then label the node:
kubectl label node <relay-node-name> tlsstress.art/ntp-relay=true

Step 2 — apply the relay manifest.

kubectl apply -f k8s/optional/ntp-relay.yaml

This creates:

  • A namespace tlsstress-ntp-relay (PSA: privileged — required for the SYS_TIME capability)
  • A ConfigMap with chrony.conf
  • A Deployment that runs on the labelled node with hostNetwork: true and exposes UDP/123 on the host's primary IP
  • A headless Service for in-cluster discovery
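In broad strokes, the ConfigMap's chrony.conf needs three things: upstream sources, an allow rule for the management subnet, and a local fallback stratum. The sketch below is illustrative only — the directives are standard chrony, but the exact values are assumptions and the shipped k8s/optional/ntp-relay.yaml is authoritative:

```
# chrony.conf (relay) — illustrative sketch; see the shipped manifest for the real file
pool time.cloudflare.com iburst
pool time.google.com iburst
pool pool.ntp.org iburst
allow 192.168.90.0/24          # serve time to the lab management network
local stratum 10               # keep answering (at high stratum) if internet drops
makestep 1.0 3
driftfile /var/lib/chrony/drift
```

The local directive trades accuracy for availability: clients stay loosely coherent with each other during an upstream outage, at the cost of serving undisciplined time.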

Step 3 — point lab elements at the relay.

On every other UCS host, add to /etc/chrony/chrony.conf:

server <relay-node-host-ip> iburst prefer minpoll 4 maxpoll 6
makestep 1.0 3
rtcsync

Then restart and wait for initial sync:

sudo systemctl restart chronyd
sudo chronyc waitsync 30 0.05

On the Cisco Nexus 9000:

configure terminal
ntp server <relay-node-host-ip> prefer
ntp source-interface mgmt0
end
write memory
show ntp peer-status

On the NGFW DUT — vendor-specific. For Cisco ASA (on FTD, NTP is normally set through the manager's platform settings rather than this CLI):

configure terminal
ntp server <relay-node-host-ip>
write memory
show ntp associations

For Palo Alto:

set deviceconfig system ntp-servers primary-ntp-server-address <relay-node-host-ip>
commit

For Fortinet:

config system ntp
  set ntpsync enable
  set type custom
  config ntpserver
    edit 1
      set server <relay-node-host-ip>
    end
  end
end

Step 4 — verify.

# In Grafana, open: TLSStress.Art — Time Sync Status
# Confirm all lab hosts show green (drift < 100 ms)

# From any UCS:
./scripts/check-time-sync.sh --strict
# expect exit 0
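The internals of check-time-sync.sh are not documented here, but the kind of check it performs can be sketched as a small parser over `chronyc tracking` output. This is a hypothetical illustration — the real script is the authority, and a canned sample line stands in for live chronyc output so the sketch is self-contained:

```shell
# Hypothetical sketch of a strict drift check; the shipped script is authoritative.
# A canned `chronyc tracking` line stands in for live output.
sample='System time     : 0.000041 seconds slow of NTP time'

# chronyc prints the offset magnitude plus a fast/slow word, so field 4 is |offset|.
offset=$(printf '%s\n' "$sample" | awk '/System time/ {print $4}')

# Fail once the offset reaches 0.1 s (the dashboard's green threshold).
if awk -v o="$offset" 'BEGIN { exit (o < 0.1) ? 0 : 1 }'; then
  echo "OK: drift ${offset}s is within 100 ms"
else
  echo "FAIL: drift ${offset}s exceeds 100 ms"
fi
```

On a real host the sample assignment would be replaced by `chronyc tracking | grep 'System time'`.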

Trade-offs

Pros:

  • No additional hardware
  • Reuses an already-internet-connected node (the Cloner host) — no extra attack surface
  • Automatic recovery: if the relay node reboots, it resyncs and the lab follows
  • Forensically defensible — every host's NTP source is recorded, and the audit log shows the chain

⚠️ Cons:

  • Single point of failure — if the relay node loses internet, lab clocks slowly drift (chrony holds the previous offset for hours, not days)
  • Requires NTP authentication keys for production-grade trust (without keys, anyone on the management network can answer NTP queries; this is a low-effort attack)
  • The relay node now runs a privileged Deployment (SYS_TIME capability) — slightly elevated risk

Hardening — add NTP authentication (optional but recommended):

Edit the chrony.conf ConfigMap to add a shared key that lab clients also use:

keyfile /etc/chrony/keys
authselectmode require

Then mount the key file via a Secret. Full documentation is planned for docs/TIME_SYNC.md#authentication (future enhancement).
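As an illustration of the key format (the key ID, hash algorithm, and placeholder value below are assumptions — generate your own key and keep it in the Secret), the shared keys file and the matching client directive could look like:

```
# /etc/chrony/keys — same file on relay and clients, distributed via the Secret
1 SHA256 HEX:<64-hex-char-key>

# Client chrony.conf then references the key ID on its server line:
server <relay-node-host-ip> iburst prefer key 1
```

With authselectmode require set, chrony rejects any server response that is not authenticated with a configured key.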


Option 3 — Browser-clock fallback (NOT forensic)

⚠️ Read this carefully. This option is documented because operators ask for it, but it carries serious caveats. The Dashboard does not ship with this enabled by default.

What it would do

The operator opens the Dashboard from their laptop. The Dashboard reads Date.now() from the browser, posts it to a privileged backend endpoint, which then runs a privileged DaemonSet across all nodes to step their clocks to the browser's time. This makes the whole lab "synced to the operator's laptop".

Why this is risky

| Risk | Explanation |
|------|-------------|
| Browser clocks are unreliable | The operator's laptop may be synced to NTP, or it may be off by minutes (corporate proxy intercepting NTP, VM with a paused virtual clock, mobile hotspot, jet-lagged hardware). The Dashboard cannot tell. |
| Different operators set different times | If two engineers access the lab on different days from different laptops, the lab's clock walks. Cross-engagement comparison breaks silently. |
| Privileged attack surface | A POST endpoint that sets system time on every node is a high-value target. A compromised browser session = compromised time on all hosts. |
| No forensic value | Timestamps written under this scheme cannot be cited in legal proceedings, regulatory filings, or vendor benchmark comparisons. They are "relative time only". |
| Hides the real problem | If the lab cannot reach NTP, the operator should fix the root cause (Option 2 or GPS), not paper over it. |

When it is acceptable

Only when ALL of the following are true:

  1. No node in the lab has any path to a public or private NTP source
  2. The engagement is internal-only — not customer-facing, not for any report that will leave the room
  3. The operator explicitly accepts the "non-forensic" caveat and signs the audit log
  4. Results from this run will be marked in the database as time_source = browser_fallback and filtered out of cross-engagement comparison queries

What we have built (and what we have NOT)

We have built a guarded version of the request side of this flow:

  • The Dashboard exposes POST /api/time-sync/set-from-browser (admin-only). It accepts { browserTimestampMs, browserTimezone, browserOffsetMinutes, acknowledgement: "NOT_FORENSIC" } and refuses if the browser's claimed time is more than ±24 h off the server's current best guess.
  • It does NOT auto-execute a clock change. Instead it writes an audit row (with forensic_grade: false) and returns the exact kubectl exec ... chronyc settime command for the operator to run on the cluster control plane.
  • The admin UI at /admin/time-sync builds the request, requires an explicit "I acknowledge this is NOT forensic" checkbox, and surfaces the manual command. See dashboard/src/app/api/time-sync/set-from-browser/route.ts.
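For reference, an exchange with the endpoint might look like this. The request field names come from the description above; the response shape is an assumption — dashboard/src/app/api/time-sync/set-from-browser/route.ts is authoritative:

```
POST /api/time-sync/set-from-browser        (admin-only)
{
  "browserTimestampMs": 1778436900000,
  "browserTimezone": "America/Sao_Paulo",
  "browserOffsetMinutes": 180,
  "acknowledgement": "NOT_FORENSIC"
}

200 OK                                       (hypothetical response shape)
{
  "forensic_grade": false,
  "command": "kubectl exec ... chronyc settime ..."
}
```

A request whose claimed time is more than ±24 h off the server's best guess, or whose acknowledgement field is missing, is refused.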

We have not (and will not) build the auto-execute path:

  • The implementation cost is high (privileged DaemonSet + audit-log enforcement on the apply step)
  • The operational risk is real (foot-gun for engineers in a hurry)
  • The actual demand is low (when would you have a Dashboard-accessible laptop but not even one node with internet for the relay? Almost never in practice)
  • The current request-only path is sufficient: the operator copy-pastes one kubectl command and owns the apply

If you have a strong use case for the full automated path, open an issue using the access-request template. We will assess case-by-case and may build it as an opt-in module with hard-coded "non-forensic" markers on resulting runs.

What you can do today instead

  • Use the admin UI flow above — open /admin/time-sync, accept the NOT_FORENSIC caveat, copy the returned kubectl chronyc settime command, run it on the control plane. The audit row is automatically written with forensic_grade: false and time_source = browser_fallback.
  • Set time manually on each host when first installing, using your laptop as the reference but documenting it:

    # On the laptop:
    date -u  # note the UTC time
    
    # On each UCS:
    sudo timedatectl set-time '2026-05-06 14:35:00 UTC'  # within ~1s of the laptop's note
    
    # Then:
    sudo systemctl disable --now chronyd  # stop and disable NTP attempts that will keep failing
    
    This is honest about being non-forensic — the operator records the manual set in the engagement notes.

  • Use Option 2 with a USB-tethered phone as the relay node's internet source if even ISP routing is forbidden. A phone hotspot to one node = one path = enough.


How the Grafana panel surfaces this

The dashboard TLSStress.Art — Time Sync Status (loaded automatically when the kustomization is applied) shows:

  1. Stat panel — clock skew per host: green < 100 ms · yellow < 1 s · orange < 5 s · red > 5 s
  2. Stat panel — sync status per host: ✓ synced / ✗ NOT synced
  3. Stat panel — multi-host divergence: max(offset) − min(offset) across the cluster
  4. Time-series — drift over time per host: catches sustained trends or run-correlated spikes
  5. Stat panel — minutes since last sync update per host: green < 5 min · red > 30 min
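If the lab hosts export clock metrics via node_exporter's timex collector (an assumption — the shipped dashboard's actual queries may differ), panel 3's divergence reduces to a single PromQL expression:

```
# Assumes node_exporter timex metrics; the shipped dashboard JSON is authoritative.
max(node_timex_offset_seconds) - min(node_timex_offset_seconds)
```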

Open this dashboard before starting any test plan. If anything is not green, fix the time-sync first — do not run.