Syslog correlation — TLSStress.Art
Scope status (post-Scope-Freeze 2026-05-10): see ARCHITECTURE.md for the canonical 37 MÓDULOs + 7 Test Kinds + DOM/CPOS/PIE-PA safety architecture. ADRs 0014, 0019-0025 cover post-Freeze additions.
This is the operator's view of the lab from the lab's own perspective: what the Cisco Nexus 9000, the NGFW DUT, and the UCS hosts say happened during a run, correlated with the metrics the Dashboard already shows.
Without it: you see a p99 spike at 14:32 and have to SSH into each device to find out why. With it: you open one Grafana dashboard, see the spike, and the matching log events from all three devices appear side-by-side.
What this is NOT
It is not a SIEM in the security-monitoring sense. It is a test-run forensics layer focused on operational correlation. The log retention is short (15 days default), the queries are run-specific, and the ACL is the same Dashboard ACL — not a hardened security operations workflow.
For actual security monitoring, your organization should run a separate SIEM (Splunk, Elastic, Cisco SecureX, etc.). This stack does not replace that.
Architecture
┌──────────────┐       UDP/TCP 514       ┌─────────────┐
│ Cisco        │────────────────────────→│             │
│ Nexus 9000   │     RFC 5424 syslog     │             │
└──────────────┘                         │             │
                                         │             │
┌──────────────┐       UDP/TCP 514       │  Promtail   │   ┌──────┐
│ NGFW DUT     │────────────────────────→│  syslog     │──→│ Loki │
│ FTD/ASA/     │     RFC 3164 + 5424     │  receiver   │   └──────┘
│ Palo/Forti   │                         │  Deployment │       │
└──────────────┘                         │  in obs ns  │       │
                                         │             │       ▼
┌──────────────┐       UDP/TCP 514       │             │   ┌─────────┐
│ UCS host     │────────────────────────→│             │   │ Grafana │
│ rsyslog      │                         │             │   │ Explore │
│ (Ubuntu)     │                         └─────────────┘   └─────────┘
└──────────────┘                                                │
                                                                ▼
                                                      ┌────────────────────┐
                                                      │ TLSStress.Art —    │
                                                      │ Syslog Correlation │
                                                      │ dashboard          │
                                                      └────────────────────┘
Loki labels emitted on every event:
| Label | Source | Example values |
|---|---|---|
| app | Promtail external_label | tlsstress (always) |
| cluster | Promtail external_label | web-agent-cluster |
| device_type | Hostname regex | nexus, ngfw, ucs |
| device_role | Derived from device_type | switch, firewall, host |
| device_hostname | RFC 5424 hostname field | nexus-1.lab.example.com |
| device_app | RFC 5424 APP-NAME field | mts, eth_port_chan_mgr, kernel, decryptd |
| severity | RFC 5424 severity | info, notice, warning, err, crit, alert, emerg |
| facility | RFC 5424 facility | local0, daemon, kern, etc. |
| channel | Promtail listener | syslog-udp, syslog-tcp |
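The device_type and device_role rows above are derived rather than read from the message. A minimal sketch of that derivation follows, assuming hostnames start with nexus, ngfw, or ucs. The real patterns live in the Promtail configuration and may differ; the function name and the "unknown" fallback are hypothetical.

```shell
# Sketch only: maps a hostname to the device_type / device_role label pair.
# The prefixes below are assumptions; the real regexes are defined in the
# Promtail config, which this page does not show.
classify_device() {
  case "$1" in
    nexus*) echo "device_type=nexus device_role=switch" ;;
    ngfw*)  echo "device_type=ngfw device_role=firewall" ;;
    ucs*)   echo "device_type=ucs device_role=host" ;;
    *)      echo "device_type=unknown device_role=unknown" ;;  # hypothetical fallback
  esac
}

classify_device nexus-1.lab.example.com
```

A hostname that matches none of the patterns is worth catching early, because the dashboard filters on device_type and would otherwise hide that device's events.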
Pointing each device at the syslog endpoint
⚠️ Mandatory: syslog must travel over OOBI only, never over the data plane. This is enforced by two NetworkPolicy resources at the cluster level (syslog-oobi-only allows ingress from 192.168.90.0/24; syslog-deny-data-plane denies ingress from persona/agent CIDRs). Sending syslog over a data-plane VLAN would let it be measured BY the test bed as if it were test traffic, contaminating per-cycle metrics. Full rationale and operator obligations are in SYSLOG_OPERATIONS.md → "Mandatory prerequisite — syslog over OOBI only".
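Before configuring a device, it can help to confirm that its syslog source address falls inside the OOBI range the policy admits. The sketch below is a plain POSIX shell CIDR check; the helper names ip_to_int and in_cidr are hypothetical, and 192.168.90.0/24 is the range quoted above.

```shell
# Convert a dotted-quad IPv4 address to a 32-bit integer.
ip_to_int() {
  old_ifs=$IFS
  IFS=.
  set -- $1            # split the address into its four octets
  IFS=$old_ifs
  echo $(( ($1 << 24) + ($2 << 16) + ($3 << 8) + $4 ))
}

# Return 0 if address $1 lies inside CIDR $2 (for example 192.168.90.0/24).
in_cidr() {
  addr=$(ip_to_int "$1")
  net=$(ip_to_int "${2%/*}")
  bits=${2#*/}
  mask=$(( (4294967295 << (32 - bits)) & 4294967295 ))
  [ $(( addr & mask )) -eq $(( net & mask )) ]
}

if in_cidr 192.168.90.17 192.168.90.0/24; then
  echo "source is on OOBI"
else
  echo "source is NOT on OOBI; syslog would be dropped"
fi
```

This only checks the address arithmetic. Whether the device actually sources syslog from that interface still has to be verified on the device itself.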
The Promtail Service exposes NodePort 30514 (mapped to the well-known syslog port 514 internally). Devices send to <any-cluster-node-ip>:30514 for UDP or TCP. Pick the OOBI management IP of one of the UCS hosts.
The standard syslog port is 514. We use NodePort 30514 because Kubernetes does not normally allow privileged ports below 1024. If your operator policy prefers port 514 from the device side, set up a firewall NAT rule on the UCS host:
<ucs-oobi-ip>:514 → <ucs-oobi-ip>:30514.
Cisco Nexus 9000 (NX-OS)
configure terminal
! Use RFC 5424 — modern, structured, easier for Promtail to parse
logging timestamp microseconds
logging origin-id hostname
! Forward to the Promtail receiver
logging server <ucs-mgmt-ip> 5 use-vrf management transport udp port 30514
! 5 = severity threshold (notifications and worse). Adjust to taste:
! 0 emergencies | 1 alerts | 2 critical | 3 errors
! 4 warnings | 5 notifications | 6 informational | 7 debugging
! Logging buffer (so the Nexus retains last events even if syslog is down)
logging logfile messages 6 size 1048576
! Important categories for an NGFW test bed:
logging level mts 5
logging level monitor 5
logging level eth_port_channel_mgr 5
logging level pixm 5 ! port index manager
logging level fcoe_mgr 4
end
write memory
show logging server
Verify on the Nexus that recent events are landing in the local log file:
show logging last 20
Cisco FTD / ASA / Firepower
For ASA syntax (FTD via FlexConfig works similarly):
configure terminal
logging enable
logging timestamp
logging buffer-size 1048576
logging buffered notifications
! Forward
logging trap notifications
logging host <mgmt-iface> <ucs-mgmt-ip> 17/30514
! 17 = UDP. Use 6/30514 for TCP.
! For decryption diagnostics specifically (the events we care about
! most for THIS test bed):
logging class crypto 5
logging class ssl 5
logging class connection 4
end
write memory
show logging
For FTD via FMC: navigate to Devices → Platform Settings → Syslog, set:
- Server: <ucs-mgmt-ip> UDP/30514
- Severity: notifications
- Categories: connection, crypto, SSL, intrusion
Palo Alto (PAN-OS)
! Enter configure mode
configure
! Define syslog server profile
set shared log-settings syslog tlsstress server tlsstress \
transport UDP port 30514 server <ucs-mgmt-ip> facility LOG_LOCAL3 format BSD
! Forward system log (device events, not traffic)
set shared log-settings system match-list tlsstress-system filter "All Logs" \
send-syslog [ tlsstress ]
! Forward decryption / SSL log specifically
set shared log-settings system match-list tlsstress-decrypt filter "(subtype eq decryption)" \
send-syslog [ tlsstress ]
commit
Fortinet (FortiOS)
config log syslogd setting
set status enable
set server "<ucs-mgmt-ip>"
set port 30514
set mode udp
set facility local3
set source-ip "<ngfw-mgmt-ip>"
set format default
end
config log syslogd filter
set severity notification
set ssl enable
end
UCS host (Ubuntu rsyslog)
/etc/rsyslog.d/50-tlsstress.conf:
# Forward auth + kernel + system messages to the TLSStress.Art syslog endpoint
*.notice;auth.info @<ucs-mgmt-ip>:30514
# Use double-@ for TCP delivery (more reliable for high-volume logs):
# *.notice;auth.info @@<ucs-mgmt-ip>:30514
Then:
sudo systemctl restart rsyslog
sudo logger -t tlsstress-test "syslog forwarding test from $(hostname)"
Open Grafana Explore, query {device_hostname="<your-host>"} — you should see the test message within 5 seconds.
Verifying the pipeline end-to-end
After configuring at least one device:
# 1. Confirm the Promtail pod is up:
kubectl get pods -n observability -l app=promtail-syslog
# expect: 1 Running
# 2. Confirm the Service exists and check its NodePort:
kubectl get svc -n observability promtail-syslog
# note the NodePort — should be 30514
# 3. From the device (or a UCS host), send a test message:
logger -n <ucs-mgmt-ip> -P 30514 -d -t tlsstress-test "smoke from $(hostname)"
# 4. Tail Loki via the dashboard or via Grafana Explore:
# Query: {app="tlsstress", device_hostname=~".+"}
# expect: the test message appears within 5 seconds.
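To see exactly what a test message looks like on the wire, or to send one from a host without logger, the RFC 5424 line can be built by hand. The sketch below assumes facility 19 (local3, matching the PAN-OS and FortiOS examples) and severity 5 (notice). The function names, the hostname ucs-1, and the fixed timestamp are illustrative only.

```shell
# PRI = facility * 8 + severity (RFC 5424, section 6.2.1).
syslog_pri() {
  echo $(( $1 * 8 + $2 ))
}

# Build one RFC 5424 line:
#   <PRI>1 TIMESTAMP HOSTNAME APP-NAME PROCID MSGID STRUCTURED-DATA MSG
# Args: facility severity timestamp hostname app-name message
# PROCID, MSGID and STRUCTURED-DATA are left as the nil value "-".
rfc5424_frame() {
  printf '<%s>1 %s %s %s - - - %s\n' \
    "$(syslog_pri "$1" "$2")" "$3" "$4" "$5" "$6"
}

rfc5424_frame 19 5 2026-05-10T14:32:00Z ucs-1 tlsstress-test "smoke test"
# prints: <157>1 2026-05-10T14:32:00Z ucs-1 tlsstress-test - - - smoke test
```

Piping the output into `nc -u -w1 <ucs-mgmt-ip> 30514` would send it over UDP, assuming a netcat build that supports `-u` and `-w` is installed on the host.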
The Grafana dashboard
TLSStress.Art — Syslog Correlation (Lab Elements) ships with the kustomize overlay. It has:
- Top-level summary — events per minute by device type; log volume per host; severity distribution (warning+ only)
- Side-by-side metrics + logs — agent p99 latency line chart on the left, live syslog stream (warning+) on the right. Spikes in latency line up with bursts in the log stream.
- Per-device deep-dives — separate panels for Nexus events (warning+), NGFW events (info+ to catch decrypt-rule matches and connection deny logs), UCS kernel/systemd (warning+).
Use the device_type and severity filters at the top of the dashboard to narrow the focus during a run.
Common patterns to watch during a run
| Symptom | What to search for in the logs |
|---|---|
| TLS handshakes intermittently failing | NGFW decryption events with severity notice+; look for "no decryption profile matched" |
| Persona pod CrashLoopBackOff | UCS device_app=kernel + severity=err+; look for OOM-killer, NIC reset, segfault |
| p99 spike with no agent error | Nexus device_app=eth_port_chan_mgr warnings (port flap, MAC move); Nexus QoS queue-depth alerts |
| TLS Decrypt Probe says "off" | NGFW device_app=decryptd (or vendor equivalent); look for "decryption disabled on interface" |
| Run aborts with fleet readiness alert | Cross-check NGFW connection-deny logs at the same timestamp |
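The same searches can be run outside Grafana by querying Loki's HTTP API for a window around the spike. The sketch below assumes Loki is reachable at LOKI_URL (for example through a port-forward; the address is an assumption) and uses the label values from the table earlier on this page. The function names are hypothetical. Loki's query_range endpoint accepts start and end as Unix epoch nanoseconds.

```shell
# Assumption: Loki reachable here, e.g. via kubectl port-forward.
LOKI_URL=${LOKI_URL:-http://localhost:3100}

# Print "<start_ns> <end_ns>" for a window of +/- $2 seconds around epoch second $1.
spike_window_ns() {
  echo "$(( ($1 - $2) * 1000000000 )) $(( ($1 + $2) * 1000000000 ))"
}

# Fetch Nexus events of severity warning or worse around a spike.
# Args: spike epoch second, half-window in seconds.
query_nexus_around() {
  set -- $(spike_window_ns "$1" "$2")   # $1 = start_ns, $2 = end_ns
  curl -sG "$LOKI_URL/loki/api/v1/query_range" \
    --data-urlencode 'query={app="tlsstress", device_type="nexus", severity=~"warning|err|crit|alert|emerg"}' \
    --data-urlencode "start=$1" \
    --data-urlencode "end=$2" \
    --data-urlencode "limit=200"
}

spike_window_ns 1700000000 60
# prints: 1699999940000000000 1700000060000000000
```

Calling `query_nexus_around <spike-epoch> 60` would return the matching events as JSON. Swapping device_type to ngfw or ucs covers the other rows of the table.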
Retention and storage
Loki default retention in this stack is 15 days. Run-specific evidence that must outlive that window has to be exported to separate audit storage. The Test Run Report (planned for Phase 4) will embed selected log excerpts in the signed PDF as a permanent forensic record.
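One way to handle the export is a small wrapper around logcli, Loki's command-line client. The sketch below assumes logcli is installed and that Loki is reachable at LOKI_ADDR. The function names, the address, and the output path are hypothetical, and the deadline is only approximate because the exact moment Loki deletes expired data depends on its configuration.

```shell
# Approximate epoch second after which a run's logs may no longer be in Loki.
# Args: run start (epoch seconds), retention in days.
retention_deadline() {
  echo $(( $1 + $2 * 86400 ))
}

# Export all tlsstress events between two RFC 3339 timestamps as JSON lines.
# Args: from, to, output file. Requires logcli; not invoked by this sketch.
export_run_logs() {
  logcli query --addr="${LOKI_ADDR:-http://localhost:3100}" \
    --from="$1" --to="$2" --limit=100000 --output=jsonl \
    '{app="tlsstress"}' > "$3"
}

retention_deadline 1700000000 15
# prints: 1701296000
```

A call such as `export_run_logs 2026-05-10T14:00:00Z 2026-05-10T15:00:00Z run.jsonl` would write the run's events to a file that can then be moved to audit storage.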
Air-gap behavior
In an air-gapped install, syslog forwarding works entirely inside the lab: Promtail, Loki, and Grafana all run in the cluster, with no external dependency. The operator only has to verify that no internal firewall ACL blocks device-to-Promtail traffic on UDP/TCP 30514 (or 514 if the NAT redirect is in use).
If the lab has a Cisco firewall between the management network and the UCS hosts, ensure its ACL allows traffic to <ucs-mgmt-ip>:30514 (or 514 if using the NAT redirect).
Related
- MONITORING_TEST_VALIDITY.md: the broader validity-alert framework that consumes these logs
- TIME_SYNC.md: without accurate clocks, log correlation is meaningless
- AIRGAP_INSTALL.md: the parent scenario where syslog is the only diagnostics channel
- USAGE_POLICY.md: license restrictions apply to the logs collected here as well