gNMI Roadmap — TLSStress.Art¶
Scope status (post-Scope-Freeze 2026-05-10) — gNMI integration sits inside MÓDULO API INFRA.Art (MGMT-light plane). Production-mode writes proxy through RELAY.Art per ADR 0020. Real implementation pending — this remains a roadmap doc; the slot for landing is dashboard/src/lib/dut-api/gnmi-*.ts.
This document captures the design + decision rationale for adding gNMI (gRPC Network Management Interface) support to TLSStress.Art. It is a roadmap document — no code is shipped under this filename in the repository today. When implementation lands, this document is amended with the actual paths and operator guidance.
The audience: operators, partners, customers, and Cisco Legal stakeholders who ask "do you support gNMI?" and need a definitive answer. The answer today: yes, it is on the roadmap with a deliberate phased plan; no, it is not running in production yet.
What gNMI is¶
gNMI is a network management protocol developed by the OpenConfig consortium (Google + major network vendors). It is the modern successor to SNMP for streaming telemetry and configuration management.
Key characteristics:
| Aspect | Value |
|---|---|
| Transport | gRPC over HTTP/2, mandatory TLS |
| Authentication | TLS certificates (mTLS preferred) or username/password |
| Data model | YANG-defined, vendor-neutral via OpenConfig |
| Operations | Get, Set, Subscribe, Capabilities |
| Subscription modes | STREAM SAMPLE (push periodic), STREAM ON_CHANGE (push on value change), POLL, ONCE |
| Encodings | JSON-IETF, PROTO, JSON, BYTES |
Why gNMI matters for TLSStress.Art¶
| Aspect | SNMP (today) | gNMI (future) |
|---|---|---|
| Transport | UDP (lossy, no auth integrity) | gRPC + TLS (reliable, encrypted, authenticated) |
| Polling model | Client polls every interval | Server streams to client |
| Latency | 30–60 s polling intervals typical | sub-second STREAM SAMPLE |
| Bandwidth efficiency | Repeated full polls | ON_CHANGE deltas only |
| Schema | MIBs (vendor-specific, often outdated) | YANG OpenConfig (vendor-neutral) |
| Auth | Community strings (weak) | mTLS + cert validation |
| Cisco Nexus polling load | 5–10% CPU on the switch | <1% CPU |
Vendor support — honest reality matrix¶
A common misperception is that "all modern devices support gNMI". The truth is more nuanced. As of 2026:
| Vendor / Device | gNMI native? | Coverage notes |
|---|---|---|
| Cisco Nexus 9000 (NX-OS 9.2+) | ✅ Yes | OpenConfig partial + Cisco-specific YANG; feature grpc to enable |
| Cisco IOS-XR / IOS-XE | ✅ Yes | Routers — out of test-bed scope today, but supported |
| Juniper Junos | ✅ Yes | Strong OpenConfig coverage |
| Arista EOS | ✅ Yes | Native, full OpenConfig + EOS extensions |
| Cisco FTD / Firepower (cdFMC or on-prem FMC managed) | ✅ Yes | OpenConfig streaming telemetry via gNMI is natively supported when the FTD is managed by Cloud-Delivered FMC (cdFMC) or on-prem FMC with the appropriate health policy. Activated via FTD health policy deployment; FTD opens a gNMI server on TCP 50051 (DIAL-IN mode) OR establishes outbound gRPC tunnel to a collector (DIAL-OUT mode for restricted-network FTDs). Subscribe modes: Once, Sampled (≥1-min interval), On-change. mTLS authentication required. Reference: https://docs.manage.security.cisco.com/cdfmc/c_openconfig_streaming_telemetry.html |
| Cisco FTD / Firepower (FDM-managed standalone) | ⚠️ Indirect | The standalone FDM-only deployment that TLSStress.Art's REST adapter (PR #198) targets today uses the FDM REST API. gNMI is exposed via the cdFMC / on-prem FMC management plane, not directly through FDM. Operators wanting native gNMI on FTD should manage the device via cdFMC or on-prem FMC. |
| Cisco UCS Manager | ❌ No | XML API + Redfish; gNMI not on Cisco's UCS roadmap |
| Cisco UCS C-series CIMC | ❌ No | Redfish only |
| Palo Alto PAN-OS | ⚠️ Limited | Primary API is XML/REST; gNMI exposure is partial and undocumented for core decryption metrics |
| Fortinet FortiGate (FortiOS 7.0+) | ⚠️ Partial | gRPC streaming for select monitoring fields; not full gNMI compliance for config |
Implication for TLSStress.Art: gNMI is a complement to the existing REST API + SNMP + syslog stack. It is genuinely transformative for the switch tier (Cisco Nexus + future Juniper/Arista support) AND for cdFMC / FMC-managed Cisco FTD (sub-minute streaming of operational state, on-change events for interface flaps, etc.). It adds nothing for UCS hardware health — UCS doesn't speak gNMI — and the standalone FDM-managed FTD that our REST adapter targets today still relies on FDM REST.
For operators on cdFMC, the gNMI integration becomes the lowest-latency pillar for FTD operational telemetry — far ahead of waiting for SNMP polls or REST snapshots.
Where gNMI fits in the existing pillar architecture¶
The current TLSStress.Art telemetry stack has 4 pillars:
1. Metrics (numerical) — SNMP, node-exporter, kubelet
2. Events (reactive) — Syslog → Loki
3. API (config + state) — FTD / Nexus / UCS / Fortinet REST adapters (PR #198/#199/#200/#201)
4. Probes (independent) — TLS Decrypt Mode Probe (issuer cert detection)
gNMI becomes the 5th pillar:
5. Streaming telemetry — gNMI Subscribe streams from compliant switches (sub-second SAMPLE; ON_CHANGE for state transitions)
Critically, none of the existing pillars are removed. SNMP remains for UCS + NGFW counters. Syslog remains for events. REST API remains for config + decrypt-policy state on NGFW + UCS hardware health.
What we gain by adding gNMI¶
| Capability | Without gNMI (today) | With gNMI |
|---|---|---|
| Per-interface counters (Nexus) | SNMP ifTable polled every 30–60 s | sub-second STREAM SAMPLE |
| Queue depth + drops (Nexus QoS) | SNMP, often missing for newer ASIC features | OpenConfig qos:queues model |
| Optical levels on transceivers | SNMP entSensorTable partial | OpenConfig terminal-device:optical-channel |
| TCAM utilization (NGFW filter, Nexus ACL) | Partial SNMP, vendor-specific | OpenConfig + Cisco YANG, sub-second |
| State transitions (interface up/down) | Polled at SNMP cadence — miss <1 s flaps | STREAM ON_CHANGE catches every transition |
| Multi-vendor switch support | per-vendor MIB mapping in code | Same Subscribe paths work on Cisco/Juniper/Arista |
The headline win: when a Nexus port flaps for 800 ms during a run, SNMP misses it; gNMI catches it. This is the difference between "p99 spike with no apparent cause" and "p99 spike correlated with port flap at 14:32:01.812".
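The flap-detection claim can be sketched in a few lines: given a stream of ON_CHANGE updates for a port's oper-status, every transition is visible with its device timestamp. The types and helper below are hypothetical illustrations, not shipped code:

```typescript
// Sketch (hypothetical names): turning ON_CHANGE oper-status updates into
// timestamped flap events that can be correlated with a latency spike.

interface GnmiUpdate {
  timestampNs: bigint; // device timestamp from the gNMI Notification
  path: string;        // flattened xpath for the leaf
  value: string;       // e.g. "UP" / "DOWN"
}

interface FlapEvent { port: string; from: string; to: string; atNs: bigint }

function detectFlaps(updates: GnmiUpdate[]): FlapEvent[] {
  const last = new Map<string, string>();
  const flaps: FlapEvent[] = [];
  for (const u of updates) {
    const prev = last.get(u.path);
    if (prev !== undefined && prev !== u.value) {
      // extract the interface name from the [name=...] path key
      const port = u.path.match(/\[name=([^\]]+)\]/)?.[1] ?? u.path;
      flaps.push({ port, from: prev, to: u.value, atNs: u.timestampNs });
    }
    last.set(u.path, u.value);
  }
  return flaps;
}
```

With SNMP at a 30 s cadence, an 800 ms DOWN/UP pair falling between two polls produces zero observations; with ON_CHANGE both transitions arrive as distinct timestamped updates.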
What we DO NOT gain¶
To be precise about scope:
- ❌ NGFW decrypt policy state — no OpenConfig YANG model exists for vendor-specific decrypt rules; REST API (FDM) or cdFMC/FMC config-side API is the path
- ❌ UCS thermal / power / DIMM ECC — UCS doesn't speak gNMI
- ❌ FTD deploy state (the FDM "DEPLOYED / PENDING" semantic) — FDM REST API only; cdFMC's gNMI exposes operational state but not the deploy queue
- ❌ Replacement for SNMP on UCS — UCS depends on SNMP/Redfish for hardware
What we DO gain on the cdFMC-managed FTD:
- ✅ Interface counters (per-port bytes, errors, drops) at ≥1-min cadence — better than the 5-min REST default we ship today
- ✅ ON_CHANGE notifications for interface state transitions
- ✅ System-level OpenConfig models (CPU, memory, processes)
- ✅ Multi-vendor query consistency — same Subscribe path works on Cisco FTD + Cisco Nexus + Juniper + Arista when those are added
If you read marketing claiming "gNMI replaces everything", that claim is still wrong — but for cdFMC users it replaces a meaningful subset of what we currently poll via REST + SNMP.
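Because the FTD and Nexus tiers accept different minimum sampling intervals (at least 1 minute for cdFMC-managed FTD versus sub-second on Nexus, per the matrix above), any subscription builder needs a per-device clamp. A minimal sketch; the device-class labels and the Nexus floor value are our own assumptions:

```typescript
// Sketch: clamp a requested sample interval to what each device class
// accepts. "ftd-cdfmc" / "nexus" are our labels, not vendor terminology,
// and the Nexus floor of 100 ms is an illustrative assumption.

type DeviceClass = "ftd-cdfmc" | "nexus";

const MIN_SAMPLE_MS: Record<DeviceClass, number> = {
  "ftd-cdfmc": 60_000, // cdFMC FTD: Sampled mode, >= 1-minute interval
  "nexus": 100,        // Nexus: sub-second sampling supported
};

function clampSampleInterval(device: DeviceClass, requestedMs: number): number {
  return Math.max(requestedMs, MIN_SAMPLE_MS[device]);
}
```

A dashboard asking for 1 s sampling would silently get 60 s on the FTD and the requested 1 s on the Nexus, which keeps one code path for both tiers.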
Three architectural options (with trade-offs)¶
Option A — Native gNMI client inside the dashboard pod (Node.js)¶
Implement the gNMI client in TypeScript inside the Next.js dashboard process.
| Pro | Con |
|---|---|
| Zero new containers | Node ecosystem gNMI libraries are immature (best clients are Go + Python) |
| Single deployment | Long-running gRPC subscriptions inside Next.js process risks memory issues |
| Reuses existing Postgres + admin auth | gRPC keepalive + reconnect logic in JS is fragile |
Verdict: 🚫 Not recommended.
Option B — Dedicated Go microservice¶
Build a small Go service (~500 LoC) using github.com/openconfig/gnmi. Reads device list from a dut_gnmi_devices Postgres table, establishes Subscribe streams, translates updates into Prometheus metrics OR writes to a dut_gnmi_samples table.
| Pro | Con |
|---|---|
| Go has mature gNMI libraries | New container image — bloats air-gap bundle |
| Native gRPC + HTTP/2 + TLS handling | New language for the team to maintain |
| Long-running subscriptions are Go's strength | More complex deployment surface |
| Separates failure domain from dashboard | |
Verdict: ✅ Right answer when gNMI is a product priority (not just exploration).
Option C — gnmic sidecar (recommended for first iteration)¶
gnmic is the open-source CLI tool that is the de facto gNMI client in network observability. Run it as a Kubernetes Deployment with configuration via ConfigMap. It has built-in output to Prometheus exporter, Loki, InfluxDB, NATS.
| Pro | Con |
|---|---|
| Zero gNMI code we maintain — gnmic is mature, open-source | Two configuration systems (REST API in Postgres, gNMI in ConfigMap YAML) |
| Native Prometheus exporter mode fits our stack | Less control over data shape |
| ~2 days of operator + ops work, no new code | gnmic image to bundle in air-gap |
| Validates use case quickly with low investment | Operator manages gnmic config separately from dashboard config |
Verdict: ✅ Recommended for Phase 1.
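To show how thin the Phase 1 glue is, here is a TypeScript sketch assembling the subscription section of a gnmic configuration. The key names mirror gnmic's documented YAML layout (named subscriptions with paths, mode, stream-mode, sample-interval); the subscription names and path choices are illustrative, not a committed design:

```typescript
// Sketch: build the "subscriptions" section of a gnmic config as an object
// (serialized to YAML for the ConfigMap). Subscription names are illustrative.

interface GnmicSubscription {
  paths: string[];
  mode: "stream";
  "stream-mode": "sample" | "on-change";
  "sample-interval"?: string; // e.g. "1s"; omitted for on-change
}

function nexusSubscriptions(sampleInterval: string): Record<string, GnmicSubscription> {
  return {
    "nexus-interfaces": {
      paths: ["/interfaces/interface/state/counters"],
      mode: "stream",
      "stream-mode": "sample",
      "sample-interval": sampleInterval,
    },
    "nexus-oper-status": {
      paths: ["/interfaces/interface/state/oper-status"],
      mode: "stream",
      "stream-mode": "on-change",
    },
  };
}
```

The point of the sketch: the entire Phase 1 "code" surface is configuration of this shape, which is why the estimate is days rather than weeks.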
Phased plan¶
Phase 0 — This document (NOW)¶
- ✅ Roadmap document published in 3 languages
- ✅ Vendor support reality matrix
- ✅ Architectural options analyzed
- ✅ Decision rationale documented for future operators / customers / Cisco Legal
No code shipped. This phase is about institutional knowledge — when someone asks "do you support gNMI?", we point at this document.
Phase 1 — gnmic sidecar (after v1.0 production validation)¶
Trigger: TLSStress.Art has been deployed in at least one real lab, the existing 4-pillar telemetry has been validated end-to-end, and there is operator demand for sub-second switch telemetry.
Scope (~2 days work):
1. k8s/optional/gnmic-sidecar.yaml — Deployment + ConfigMap with gnmic configuration
2. Per-Nexus device gNMI subscription paths (interfaces, qos, system)
3. Prometheus exporter mode emitting series under namespace tlsstress_gnmi_*
4. Grafana dashboard "TLSStress.Art — Nexus Sub-Second Telemetry" reading from those metrics
5. docs/GNMI_OPERATIONS.{md,pt-BR.md,es.md} — operator guide for enabling feature grpc on the Nexus + registering paths in gnmic config
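For item 3 above, a naming convention is needed to map gNMI leaf paths onto the tlsstress_gnmi_* Prometheus namespace. The helper below is a hypothetical sketch of one such convention (path keys become labels, not name parts); it is our assumption, not a gnmic default:

```typescript
// Sketch (hypothetical helper): map a gNMI leaf path to a Prometheus
// metric name under the tlsstress_gnmi_* namespace. The convention here
// is an assumption for illustration, not a shipped or gnmic-default scheme.

function toMetricName(xpath: string): string {
  return (
    "tlsstress_gnmi_" +
    xpath
      .replace(/\[[^\]]*\]/g, "") // path keys become labels, not name parts
      .split("/")
      .filter(Boolean)
      .join("_")
      .replace(/-/g, "_")         // Prometheus names cannot contain hyphens
  );
}

// toMetricName("/interfaces/interface[name=Ethernet1/1]/state/counters/in-octets")
//   → "tlsstress_gnmi_interfaces_interface_state_counters_in_octets"
```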
Out of scope for Phase 1:
- Multi-vendor support (only Cisco Nexus initially)
- Set operations (read-only)
- Loki integration for ON_CHANGE (Phase 2)
- gNMI metadata in DUT API snapshots tables (Phase 2)
Phase 2 — Dedicated Go microservice (v2.0)¶
Trigger: Phase 1 has proven the gNMI use case in at least one real lab AND there is demand for either (a) write operations, (b) more vendors, or (c) deeper integration with the existing snapshot / report flow.
Scope (~1–2 weeks work):
1. services/gnmi-collector/ — Go microservice, ~500-1000 LoC
2. dut_gnmi_devices table — registers gNMI-speaking devices (cert-based auth)
3. dut_gnmi_samples table — long-term store for high-value samples
4. Loki integration for ON_CHANGE events
5. Multi-vendor support: Cisco Nexus + Juniper Junos + Arista EOS
6. Test Run Report Annex E "gNMI streaming evidence" with sub-second timing alongside the run window
7. Optional Set operation support with same operator-confirmation flow as REST API writes
Phase 3 — Replace gnmic sidecar with the Go service (v2.x)¶
If Phase 2 ships and is stable, the gnmic sidecar becomes redundant. We retire it. Operators with Phase 1 deployments are migrated by deploying the Go service alongside, dual-running for one release, then turning off gnmic.
Why we are NOT shipping Phase 1 in this iteration¶
Honest reasoning:
- Existing PRs in queue need to land first — adding gNMI on top of an unsettled queue compounds merge complexity
- No end-to-end test in real hardware yet — adding more telemetry pillars before validating existing ones is premature optimization
- gNMI helps switches predominantly — we have one switch (Cisco Nexus). 1 hour of gnmic config will eventually give us what we need, no new microservice required
- Spirent / Ixia don't have gNMI either — adding it positions us for the future, but does not put us "ahead" of competitors today; we are already ahead via the multi-vendor adapter pattern
- Operator complexity — every new pillar = another dashboard the operator must learn. Until existing dashboards prove their value in production, adding more is overhead
The "ahead of competitors" move at this moment is operationalizing what we have: pre-flight checks (PR-C), Test Run Report Phase 3 wiring (PR-D), real-lab validation. Not adding more telemetry sources.
When we WILL ship Phase 1¶
Trigger conditions, all of which must be true:
- At least one real-lab deployment with the existing 4 pillars exercising end-to-end
- Test Run Report Phase 3 (Annexes B/C/D) shipped and validated
- PR-C (pre-flight checks) shipped and exercised in a real run
- An operator has explicitly asked for sub-second switch telemetry
- Cisco Nexus 9000 is on a customer's lab + we have credentials
When all five boxes are checked, Phase 1 is ~2 days of work.
Capability comparison vs commercial alternatives¶
| Feature | Spirent CyberFlood | Ixia BreakingPoint | TLSStress.Art (with gNMI Phase 1) |
|---|---|---|---|
| gNMI subscription support | ❌ | ❌ | ✅ (Cisco Nexus initially) |
| OpenConfig YANG models | ❌ | ❌ | ✅ |
| Multi-vendor switch via gNMI | ❌ | ❌ | ✅ (Phase 2) |
| Sub-second port telemetry | ⚠️ vendor-locked SNMP-derived | ⚠️ vendor-locked | ✅ |
| Open architecture | ❌ | ❌ | ✅ |
The structural differentiator is open multi-vendor architecture. gNMI reinforces it; it is not the source of it.
References¶
- gNMI specification — https://github.com/openconfig/gnmi
- OpenConfig YANG models — https://github.com/openconfig/public
- gnmic (CLI tool we plan to use in Phase 1) — https://gnmic.openconfig.net/
- Cisco Nexus 9000 NX-OS gNMI guide — https://www.cisco.com/c/en/us/td/docs/switches/datacenter/nexus9000/sw/93x/programmability/guide/b-cisco-nexus-9000-series-nx-os-programmability-guide-93x.html
- Juniper Junos gNMI — https://www.juniper.net/documentation/us/en/software/junos/network-mgmt/topics/topic-map/grpc-overview.html
- Arista EOS gNMI — https://www.arista.com/en/support/toi/eos-4-31-0f/16002-streaming-telemetry-with-gnmi
Related¶
- DUT_API_INTEGRATION.md — REST API integration (the "third pillar" gNMI complements)
- API_FEATURE_CATALOG.md — feature roadmap including gNMI placeholders (category N "long-tail")
- SYSLOG_OPERATIONS.md — second pillar
- MONITORING_TEST_VALIDITY.md — first pillar metrics framework