Syslog correlation - TLSStress.Art


Scope status (post-Scope-Freeze 2026-05-10): see ARCHITECTURE.md for the canonical 37 MÓDULOs + 7 Test Kinds + DOM/CPOS/PIE-PA safety architecture. ADRs 0014 and 0019-0025 cover post-Freeze additions.

This is the operator's view of the lab from the lab's own perspective: what the Cisco Nexus 9000, the NGFW DUT, and the UCS hosts say happened during a run, correlated with the metrics the Dashboard already shows.

Without it: you see a p99 spike at 14:32 and have to SSH into each device to find out why. With it: you open one Grafana dashboard, see the spike, and the matching log events from all three devices appear side-by-side.

What this is NOT

It is not a SIEM in the security-monitoring sense. It is a test-run forensics layer focused on operational correlation. The log retention is short (15 days default), the queries are run-specific, and the ACL is the same Dashboard ACL — not a hardened security operations workflow.

For actual security monitoring, your organization should run a separate SIEM (Splunk, Elastic, Cisco SecureX, etc.). This stack does not replace that.

Architecture

   ┌──────────────┐      UDP/TCP 514        ┌─────────────┐
   │  Cisco       │────────────────────────→│             │
   │  Nexus 9000  │  RFC 5424 syslog        │             │
   └──────────────┘                          │             │
                                             │             │
   ┌──────────────┐      UDP/TCP 514        │  Promtail   │   ┌──────┐
   │  NGFW DUT    │────────────────────────→│  syslog     │──→│ Loki │
   │  FTD/ASA/    │  RFC 3164 + 5424        │  receiver   │   └──────┘
   │  Palo/Forti  │                          │  Deployment │      │
   └──────────────┘                          │  in obs ns  │      │
                                             │             │      ▼
   ┌──────────────┐      UDP/TCP 514        │             │  ┌─────────┐
   │  UCS host    │────────────────────────→│             │  │ Grafana │
   │  rsyslog     │                          │             │  │ Explore │
   │  (Ubuntu)    │                          └─────────────┘  └─────────┘
   └──────────────┘                                                 │
                                                                    ▼
                                                          ┌────────────────────┐
                                                          │ TLSStress.Art —    │
                                                          │ Syslog Correlation │
                                                          │ dashboard          │
                                                          └────────────────────┘

Loki labels emitted on every event:

| Label | Source | Example values |
|---|---|---|
| `app` | Promtail external_label | `tlsstress` (always) |
| `cluster` | Promtail external_label | `web-agent-cluster` |
| `device_type` | Hostname regex | `nexus`, `ngfw`, `ucs` |
| `device_role` | Derived from `device_type` | `switch`, `firewall`, `host` |
| `device_hostname` | RFC 5424 hostname field | `nexus-1.lab.example.com` |
| `device_app` | RFC 5424 APP-NAME field | `mts`, `eth_port_chan_mgr`, `kernel`, `decryptd` |
| `severity` | RFC 5424 severity | `info`, `notice`, `warning`, `err`, `crit`, `alert`, `emerg` |
| `facility` | RFC 5424 facility | `local0`, `daemon`, `kern`, etc. |
| `channel` | Promtail listener | `syslog-udp`, `syslog-tcp` |
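
As a rough illustration of how the `device_type` and `device_role` rows relate, the sketch below classifies a hostname with shell patterns. The patterns (`nexus*`, `ngfw*`, `ucs*`), the `unknown` fallback, and the function name are assumptions made for this example; the real regexes live in the Promtail configuration and may differ.

```shell
#!/bin/sh
# Illustrative only: the hostname patterns and the "unknown" fallback are
# assumptions, not the regexes shipped in the Promtail config.
classify_device() {
  case "$1" in
    nexus*) dev_type=nexus;   dev_role=switch ;;
    ngfw*)  dev_type=ngfw;    dev_role=firewall ;;
    ucs*)   dev_type=ucs;     dev_role=host ;;
    *)      dev_type=unknown; dev_role=unknown ;;
  esac
  printf 'device_type=%s device_role=%s\n' "$dev_type" "$dev_role"
}

classify_device "nexus-1.lab.example.com"
```

A hostname that matches none of the patterns is the case worth checking on a real install, since such events would be hard to find with the dashboard's `device_type` filter.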

Pointing each device at the syslog endpoint

⚠️ Mandatory: syslog must travel over OOBI only — never over the data plane. This is enforced by two NetworkPolicy resources at the cluster level (syslog-oobi-only allows ingress from 192.168.90.0/24; syslog-deny-data-plane denies ingress from persona/agent CIDRs). Sending syslog over a data-plane VLAN would let it be measured BY the test bed as if it were test traffic, contaminating per-cycle metrics. Full rationale and operator obligations are in SYSLOG_OPERATIONS.md → "Mandatory prerequisite — syslog over OOBI only".
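
Before pointing a device at the receiver, it can help to confirm that its syslog source address falls inside the OOBI range that `syslog-oobi-only` admits. The following is a minimal sketch of that membership test in POSIX shell; the helper names and the sample address `10.20.0.5` (standing in for a data-plane host) are made up for illustration.

```shell
#!/bin/sh
# Convert a dotted-quad IPv4 address to a 32-bit integer.
ip_to_int() (
  IFS=.
  set -- $1
  echo $(( ($1 << 24) + ($2 << 16) + ($3 << 8) + $4 ))
)

# Succeed if IPv4 address $1 lies inside CIDR block $2 (e.g. 192.168.90.0/24).
in_cidr() (
  net=${2%/*}
  bits=${2#*/}
  mask=$(( (0xFFFFFFFF << (32 - bits)) & 0xFFFFFFFF ))
  [ $(( $(ip_to_int "$1") & mask )) -eq $(( $(ip_to_int "$net") & mask )) ]
)

# 10.20.0.5 is a hypothetical data-plane address used only as a counterexample.
for ip in 192.168.90.17 10.20.0.5; do
  if in_cidr "$ip" 192.168.90.0/24; then
    echo "$ip: inside the OOBI range"
  else
    echo "$ip: outside the OOBI range"
  fi
done
```

This only checks the address arithmetic; whether traffic is actually admitted still depends on the NetworkPolicy resources and the cluster's network plugin.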

The Promtail Service exposes NodePort 30514, mapped internally to the well-known syslog port 514. Devices send to <any-cluster-node-ip>:30514 over UDP or TCP; use the OOBI management IP of one of the UCS hosts as the node address.

The standard syslog port is 514. We use NodePort 30514 because the default Kubernetes NodePort range is 30000-32767, so a Service cannot expose port 514 on the node directly. If your site policy requires devices to send to port 514, set up a firewall NAT rule on the UCS host: <ucs-oobi-ip>:514 → <ucs-oobi-ip>:30514.

Cisco Nexus 9000 (NX-OS)

configure terminal

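! Use RFC 5424 (modern, structured, easier for Promtail to parse).
! Assumes an NX-OS release that supports "logging rfc-strict 5424"; verify on yours.
logging rfc-strict 5424
logging timestamp microseconds
logging origin-id hostname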

! Forward to the Promtail receiver
logging server <ucs-mgmt-ip> 5 use-vrf management transport udp port 30514
! 5 = severity threshold (notifications and worse). Adjust to taste:
!   0 emergencies | 1 alerts | 2 critical | 3 errors
!   4 warnings   | 5 notifications | 6 informational | 7 debugging

! Logging buffer (so the Nexus retains last events even if syslog is down)
logging logfile messages 6 size 1048576

! Important categories for an NGFW test bed:
logging level mts 5
logging level monitor 5
! port channel manager
logging level eth_port_channel_mgr 5
! port index manager
logging level pixm 5
logging level fcoe_mgr 4

end
copy running-config startup-config
show logging server

Verify on the Nexus that recent events are being recorded in the local log file:

show logging last 20
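
The severity threshold on the `logging server` line is numeric: a message is forwarded when its level is less than or equal to the threshold. A small sketch of that rule, using the level keywords from the comments in the configuration above (the function names are invented for this example):

```shell
#!/bin/sh
# Map a syslog severity number (0-7) to the NX-OS keyword listed above.
severity_name() {
  case "$1" in
    0) echo emergencies ;;   1) echo alerts ;;
    2) echo critical ;;      3) echo errors ;;
    4) echo warnings ;;      5) echo notifications ;;
    6) echo informational ;; 7) echo debugging ;;
    *) echo "invalid severity: $1" >&2; return 1 ;;
  esac
}

# Succeed if a message of severity $1 is forwarded under threshold $2.
# Lower numbers are more severe, so a threshold of 5 forwards levels 0 through 5.
is_forwarded() {
  [ "$1" -le "$2" ]
}

for sev in 3 5 6; do
  if is_forwarded "$sev" 5; then verdict=forwarded; else verdict=suppressed; fi
  echo "$sev ($(severity_name "$sev")): $verdict"
done
```

With the threshold of 5 used above, informational (6) and debugging (7) messages never leave the switch, which keeps log volume down during a run.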

Cisco FTD / ASA / Firepower

For ASA syntax (FTD via FlexConfig works similarly):

configure terminal
logging enable
logging timestamp
logging buffer-size 1048576
logging buffered notifications

! Forward
logging trap notifications
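! Use the name of the OOBI/management interface here; syslog must never
! leave through a data-plane interface.
logging host <mgmt-iface> <ucs-mgmt-ip> 17/30514
! 17 = UDP. Use 6/30514 for TCP.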

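! For decryption diagnostics specifically (the events we care about
! most for THIS test bed). The destination keyword (trap = syslog server)
! is required; available class names vary by release, so check
! "logging class ?" on your device.
logging class crypto trap 5
logging class ssl trap 5
logging class connection trap 4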

end
write memory
show logging

For FTD via FMC: navigate to Devices → Platform Settings → Syslog, and set:

- Server: <ucs-mgmt-ip> UDP/30514
- Severity: notifications
- Categories: connection, crypto, SSL, intrusion

Palo Alto (PAN-OS)

! Enter configure mode
configure

! Define syslog server profile
set shared log-settings syslog tlsstress server tlsstress \
    transport UDP port 30514 server <ucs-mgmt-ip> facility LOG_LOCAL3 format BSD

! Forward system log (device events, not traffic)
set shared log-settings system match-list tlsstress-system filter "All Logs" \
    send-syslog [ tlsstress ]

! Forward decryption / SSL log specifically
set shared log-settings system match-list tlsstress-decrypt filter "(subtype eq decryption)" \
    send-syslog [ tlsstress ]

commit

Fortinet (FortiOS)

config log syslogd setting
    set status enable
    set server "<ucs-mgmt-ip>"
    set port 30514
    set mode udp
    set facility local3
    set source-ip "<ngfw-mgmt-ip>"
    set format default
end

config log syslogd filter
    set severity notification
    set ssl enable
end

UCS host (Ubuntu rsyslog)

/etc/rsyslog.d/50-tlsstress.conf:

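# Forward all facilities at notice and above, plus auth at info and above,
# to the TLSStress.Art syslog endpoint. The RSYSLOG_SyslogProtocol23Format
# template makes rsyslog emit RFC 5424 instead of its RFC 3164 default.
*.notice;auth.info @<ucs-mgmt-ip>:30514;RSYSLOG_SyslogProtocol23Format
# Use double-@ for TCP delivery (more reliable for high-volume logs):
# *.notice;auth.info @@<ucs-mgmt-ip>:30514;RSYSLOG_SyslogProtocol23Format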

Then:

sudo systemctl restart rsyslog
sudo logger -t tlsstress-test "syslog forwarding test from $(hostname)"

Open Grafana Explore, query {device_hostname="<your-host>"} — you should see the test message within 5 seconds.

Verifying the pipeline end-to-end

After configuring at least one device:

# 1. Confirm the Promtail pod is up:
kubectl get pods -n observability -l app=promtail-syslog
# expect: 1 Running

# 2. Confirm the Service exists and exposes the expected NodePort:
kubectl get svc -n observability promtail-syslog
# note the NodePort; it should be 30514

# 3. From the device (or a UCS host), send a test message:
logger -n <ucs-mgmt-ip> -P 30514 -d -t tlsstress-test "smoke from $(hostname)"

# 4. Tail Loki via the dashboard or via Grafana Explore:
#    Query: {app="tlsstress", device_hostname=~".+"}
# expect: the test message appears within 5 seconds.
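
If the local `logger` does not support `-n`, a test message can be assembled by hand. The sketch below builds one RFC 5424 line; the PRI value is facility * 8 + severity, so local3 (19) at notice (5) gives 157. The function names and the use of bash's `/dev/udp` pseudo-device are choices made for this example, and local3 is borrowed from the NGFW examples rather than required by the receiver.

```shell
#!/usr/bin/env bash
# PRI = facility * 8 + severity, as defined by RFC 5424.
syslog_pri() {
  echo $(( $1 * 8 + $2 ))
}

# Build one RFC 5424 line:
#   <PRI>VERSION TIMESTAMP HOSTNAME APP-NAME PROCID MSGID STRUCTURED-DATA MSG
# PROCID, MSGID and STRUCTURED-DATA are left as "-" (the RFC's nil value).
rfc5424_line() {
  # $1 facility  $2 severity  $3 timestamp  $4 hostname  $5 app-name  $6 message
  printf '<%s>1 %s %s %s - - - %s' "$(syslog_pri "$1" "$2")" "$3" "$4" "$5" "$6"
}

# Send one datagram. Needs bash: /dev/udp is a bash feature, not a real device.
send_udp() {
  # $1 host  $2 port  $3 payload
  printf '%s' "$3" > "/dev/udp/$1/$2"
}

line=$(rfc5424_line 19 5 "$(date -u +%Y-%m-%dT%H:%M:%SZ)" "$(uname -n)" tlsstress-test "smoke test")
echo "$line"
# To transmit it, replace the placeholder with a node's OOBI address:
# send_udp <ucs-mgmt-ip> 30514 "$line"
```

The HOSTNAME and APP-NAME positions in this line are the fields that end up as the `device_hostname` and `device_app` labels, which makes a hand-built message useful for checking label extraction.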

The Grafana dashboard

TLSStress.Art — Syslog Correlation (Lab Elements) ships with the kustomize overlay. It has:

- Top-level summary: events per minute by device type, log volume per host, and severity distribution (warning+ only).
- Side-by-side metrics + logs: agent p99 latency line chart on the left, live syslog stream (warning+) on the right, so latency spikes line up with bursts in the log stream.
- Per-device deep-dives: separate panels for Nexus events (warning+), NGFW events (info+, to catch decrypt-rule matches and connection-deny logs), and UCS kernel/systemd events (warning+).

Use the device_type and severity filters at the top of the dashboard to narrow the focus during a run.
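
To reproduce the side-by-side view outside the dashboard, the same correlation can be expressed as a LogQL stream selector plus a time window around the spike. The sketch below only prints the selector and the window; the helper names, the two-minute window, and the dependency on GNU `date` are assumptions of this example, while the label names and values come from the Loki label table earlier on this page.

```shell
#!/bin/sh
# Print "START END" (RFC 3339, UTC) for a window of +/- $2 seconds around $1.
# Requires GNU date for the -d option.
spike_window() {
  center=$(date -u -d "$1" +%s) || return 1
  start=$(date -u -d "@$((center - $2))" +%Y-%m-%dT%H:%M:%SZ)
  end=$(date -u -d "@$((center + $2))" +%Y-%m-%dT%H:%M:%SZ)
  echo "$start $end"
}

# Build a LogQL stream selector for warning-and-worse events of one device type.
warning_selector() {
  printf '{app="tlsstress", device_type="%s", severity=~"warning|err|crit|alert|emerg"}\n' "$1"
}

warning_selector ngfw
spike_window "2026-05-10T14:32:00Z" 120
```

The printed selector can be pasted into Grafana Explore with the time picker set to the printed window, or passed as the `query`, `start`, and `end` parameters of Loki's `query_range` HTTP endpoint if Loki is reachable from where you run it.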

Common patterns to watch during a run

| Symptom | What to search for in the logs |
|---|---|
| TLS handshakes intermittently failing | NGFW `decryption` events with severity `notice`+; look for "no decryption profile matched" |
| Persona pod CrashLoopBackOff | UCS `device_app=kernel` + `severity=err`+; look for OOM-killer, NIC reset, segfault |
| p99 spike with no agent error | Nexus `device_app=eth_port_chan_mgr` warnings (port flap, MAC move); Nexus QoS queue-depth alerts |
| TLS Decrypt Probe says "off" | NGFW `device_app=decryptd` (or vendor equivalent); look for "decryption disabled on interface" |
| Run aborts with fleet readiness alert | Cross-check NGFW connection-deny logs at the same timestamp |

Retention and storage

Loki's default retention in this stack is 15 days. Run-specific evidence that must be kept longer has to be exported to separate audit storage before it ages out. The Test Run Report (planned for Phase 4) will embed selected log excerpts in the signed PDF as a permanent forensic record.
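
A quick way to see how long a given run's logs stay queryable is to add the retention period to the run's end time. The sketch assumes GNU `date` and takes the 15-day default from the paragraph above; `retention_deadline` is a name invented for this example.

```shell
#!/bin/sh
# Print the UTC time after which logs written at $1 may be deleted,
# given a retention of $2 days (defaults to this stack's 15).
# Requires GNU date for the -d option.
retention_deadline() {
  written=$(date -u -d "$1" +%s) || return 1
  date -u -d "@$((written + ${2:-15} * 86400))" +%Y-%m-%dT%H:%M:%SZ
}

retention_deadline "2026-05-10T14:32:00Z"
```

Treat the result as the latest safe export time rather than an exact deletion time, since the moment old data is actually removed depends on how Loki is configured.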

Air-gap behavior

In an air-gapped install, syslog forwarding stays internal to the lab: Promtail, Loki, and Grafana all run inside the cluster, with no external dependency. The only thing the operator must verify is that device-to-Promtail traffic on UDP/TCP 30514 (or 514 if using the NAT redirect) is not blocked by an internal firewall ACL.

If the lab has a Cisco firewall between the management network and the UCS hosts, ensure its ACL allows traffic to <ucs-mgmt-ip>:30514 (or 514 if using the NAT redirect).

See also

- MONITORING_TEST_VALIDITY.md: the broader validity-alert framework that consumes these logs
- TIME_SYNC.md: without accurate clocks, log correlation is meaningless
- AIRGAP_INSTALL.md: the parent scenario where syslog is the only diagnostics channel
- USAGE_POLICY.md: license restrictions apply to the logs collected here as well