Forenly/splunk-agentic-ops

Cadence Cockpit • Closed-Loop Fleet Observability for Field Robots

Track: Observability · Splunk MCP Server · Gemini (Vertex AI) agent · Splunk HEC + REST

Forenly AI Platform's Entry for the Splunk Agentic Ops Hackathon (2026)

A Physical AI observability platform that maintains a real-time digital twin of a deployed field-robot fleet — closing the loop between on-site telemetry and autonomous corrective action through Splunk Cloud HEC, the Splunk MCP Server, and GCP Gemini multi-agent orchestration.


Community

Building this in the open — join the team chat on Discord: https://discord.gg/7JXRwTy2EE


Where this fits in the Forenly lifecycle

Forenly builds one continuous life-cycle for autonomous field robots (robotic mowing / field service), split across three hackathon entries on the same customer and the same fleet:

| Stage | Project | What it does |
| --- | --- | --- |
| 1. Acquire | Lawn Advisor (Forenly/gcp) | Generates leads, recommends the right robot for a site, builds a deployment plan, sends the proposal → customer onboards and robots get installed on site. |
| 2. Operate | FleetMind / Gemini XPRIZE (Forenly/gemini-xprize) | Runs the deployed fleet day-to-day: intake → schedule → dispatch → verify → invoice. |
| 3. Observe & Sustain | Cadence Cockpit (this repo) | Watches the running fleet's telemetry in Splunk, detects faults and wear before they cause downtime, and closes the loop with autonomous corrective action, feeding repairs and schedule changes back to the Operate layer. |

This repo is stage 3: once robots are mowing real sites, how do you keep the whole fleet healthy, on-SLA, and running with as little human intervention as possible?


The Core Idea: from alerts to a closed loop

Most observability stacks stop at the alert — a human still has to read the dashboard, diagnose, and act. Cadence Cockpit treats the deployed fleet as a live digital twin and closes the loop: every robot's telemetry flows into Splunk, a Gemini multi-agent orchestrator triages anomalies through the Splunk MCP Server (a guided query chain to root cause), and instead of just paging someone it executes the corrective action and writes the outcome back to the twin.

Cadence Cockpit System Architecture

                  ┌─────────────────────────────────────────┐
                  │     Splunk Cloud HEC — Fleet Telemetry   │
                  │  (battery · motors · GPS/boundary ·      │
                  │   coverage · dock cycles · faults)       │
                  └────────────────────┬────────────────────┘
                                       │
                                       ▼
                  ┌─────────────────────────────────────────┐
                  │   Gemini Orchestrator  +  Splunk MCP     │
                  │   (root-cause query chain → triage)      │
                  └────────────────────┬────────────────────┘
                                       │
                       [ Multi-Dimensional Fleet Triage ]
                                       │
         ┌─────────────────────────────┼─────────────────────────────┐
         ▼                             ▼                             ▼
┌──────────────────┐          ┌──────────────────┐          ┌──────────────────┐
│   Operational    │          │   Predictive     │          │   SLA & Coverage │
│   Health         │          │   Maintenance    │          │                  │
├──────────────────┤          ├──────────────────┤          ├──────────────────┤
│ • Stuck / tipped │          │ • Battery decay  │          │ • Coverage % vs  │
│ • Slope torque   │          │ • Blade wear     │          │   contracted SLA │
│   loss           │          │ • Motor temp     │          │ • Missed windows │
│ • Boundary/GPS   │          │   trend          │          │ • Fleet balance  │
│   drift          │          │                  │          │   across sites   │
└──────────────────┘          └──────────────────┘          └──────────────────┘

The 3 Simulation Channels

Cadence Cockpit programmatically demonstrates autonomous triage and resolution of fleet anomalies across three channels. Each one follows the same shape: Anomaly → Triage → Closed-loop Execution.

1. 🛠️ Operational Health Loop (real-time faults)

  • The Anomaly: Robot MOW-07 reports sustained torque loss on a slope plus boundary-GPS drift, and its telemetry in Splunk shows it has stopped covering Zone B.
  • The Triage: The Gemini agent runs a Splunk MCP query chain (recent faults, last-good run, neighboring robots) and classifies it as a localized, recoverable fault — not a hardware failure.
  • The Execution: The system re-issues a safe return-to-dock + re-route command, reassigns Zone B to an idle robot on the same site, and posts a status update — no human paged unless the retry fails.
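
The triage-then-act shape of this loop can be sketched roughly as follows. This is a minimal illustration, not the repository's code: `FaultContext`, `classify_fault`, `plan_actions`, the label names, and the 24-hour threshold are all hypothetical, and in the real system the classification comes from Gemini reasoning over the Splunk MCP results rather than from fixed rules.

```python
from dataclasses import dataclass


@dataclass
class FaultContext:
    """Hypothetical summary of what the query chain returned for one robot."""
    hardware_fault_codes: int        # hardware-level fault events in the window
    neighbors_affected: int          # other robots on the site with the same symptom
    last_good_run_hours_ago: float   # hours since the last clean run


def classify_fault(ctx: FaultContext) -> str:
    """Return a coarse triage label; thresholds are illustrative only."""
    if ctx.hardware_fault_codes > 0:
        return "hardware_failure"
    if ctx.neighbors_affected > 0:
        return "site_wide_issue"
    if ctx.last_good_run_hours_ago <= 24:
        return "localized_recoverable"
    return "needs_human_review"


def plan_actions(label: str, robot_id: str, zone: str) -> list:
    """Map a triage label to an ordered list of (hypothetical) action names."""
    if label == "localized_recoverable":
        return [
            f"return_to_dock:{robot_id}",
            f"reassign_zone:{zone}",
            "post_status_update",
        ]
    # Anything else goes to a human instead of being retried automatically.
    return [f"escalate:{robot_id}"]
```

For the MOW-07 scenario above, a context with no hardware fault codes, no affected neighbors, and a recent clean run would classify as `localized_recoverable` and yield the dock, reassign, and status-update steps.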

2. 🔋 Predictive Maintenance Loop (wear before failure)

  • The Anomaly: Battery charge-cycle data and blade-runtime counters trend toward end-of-life across several robots; one motor's temperature curve is climbing run-over-run.
  • The Triage: The Gemini agent projects a failure window from the Splunk time-series and flags which robots will breach threshold first.
  • The Execution: Instead of waiting for a breakdown, the agent schedules a preventive part swap during a non-mow window, raises the maintenance ticket, and notes the part needed — turning unplanned downtime into planned service.
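
One simple way to project a failure window from a run-over-run trend is a least-squares line extrapolated to a threshold. The sketch below assumes that approach purely for illustration; the source does not say which projection method the agent uses, and `runs_until_threshold`, `first_to_breach`, and the sample values are hypothetical.

```python
def runs_until_threshold(readings, threshold):
    """Estimate how many more runs until a rising metric crosses `threshold`.

    `readings` holds one value per run, oldest first. Returns None when there
    are too few points or the fitted trend is flat or falling.
    """
    n = len(readings)
    if n < 2:
        return None
    mean_x = (n - 1) / 2
    mean_y = sum(readings) / n
    sxx = sum((x - mean_x) ** 2 for x in range(n))
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in enumerate(readings))
    slope = sxy / sxx
    if slope <= 0:
        return None
    intercept = mean_y - slope * mean_x
    crossing_run = (threshold - intercept) / slope
    # Runs remaining after the most recent reading (index n - 1).
    return max(0.0, crossing_run - (n - 1))


def first_to_breach(fleet_readings, threshold):
    """Return robot ids ordered by how soon they are projected to breach."""
    projected = {
        robot: runs_until_threshold(values, threshold)
        for robot, values in fleet_readings.items()
    }
    rising = {robot: runs for robot, runs in projected.items() if runs is not None}
    return sorted(rising, key=rising.get)
```

A motor whose temperature reads 60, 62, 64, 66 across four runs fits a slope of 2 per run, so with a threshold of 70 the projection is two more runs before breach.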

3. 📊 SLA & Coverage Loop (service-level assurance)

  • The Anomaly: A site's measured coverage drops below its contracted SLA after two missed mow windows, while another site's fleet sits underutilized.
  • The Triage: The Gemini agent flags an SLA-breach risk and identifies the imbalance across sites.
  • The Execution: It rebalances scheduling/dispatch, feeds the change back to the Operate layer (FleetMind), and surfaces the SLA status so billing/credits stay accurate — keeping the customer commitment intact.
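
The imbalance check in this loop can be illustrated with a small sketch. The site names, field names (`coverage_pct`, `sla_pct`, `idle_robots`), and the one-robot-per-site rule are assumptions made for the example; the actual rebalancing policy lives in the agent and the Operate layer.

```python
def propose_moves(sites):
    """Suggest (donor_site, needy_site) robot moves.

    `sites` maps a site name to a dict with `coverage_pct`, `sla_pct`, and
    `idle_robots`. A site below its SLA receives at most one idle robot from
    a site that is currently meeting its own SLA.
    """
    idle = {name: info["idle_robots"] for name, info in sites.items()}
    moves = []
    for needy in sorted(sites):
        info = sites[needy]
        if info["coverage_pct"] >= info["sla_pct"]:
            continue  # this site is meeting its commitment
        for donor in sorted(sites):
            donor_info = sites[donor]
            meets_sla = donor_info["coverage_pct"] >= donor_info["sla_pct"]
            if donor != needy and meets_sla and idle[donor] > 0:
                idle[donor] -= 1
                moves.append((donor, needy))
                break
    return moves
```

With one site at 82% coverage against a 90% SLA and another at 97% with two idle robots, the sketch proposes a single move from the healthy site to the at-risk one.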

Technology Stack

The platform demonstrates a cohesive cyber-physical loop built from:

  1. Splunk Cloud (HEC): Ingests high-volume telemetry from the deployed fleet: robot logs, battery/motor metrics, GPS/boundary events, coverage, and dock cycles.
  2. Splunk MCP Server: The guided query layer the agent uses to walk from symptom to root cause (the hackathon's MCP bonus track).
  3. GCP Gemini Orchestrator: The "core mind" that triages fleet anomalies and decides the corrective action.
  4. Real-time Digital Twin: A live model of every robot and site, updated from telemetry and used to validate actions before they execute.
  5. Slack (Socket Mode): Interactive escalation/approval interface for field-ops when an action needs a human in the loop.
  6. FleetMind / Gemini XPRIZE node: The Operate layer this stage feeds corrective actions and schedule changes back into.
  7. Lawn Advisor node: The Acquire layer that originates each deployment in the first place.
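
As a rough illustration of how a digital twin can validate an action before it executes, the sketch below checks a zone reassignment against the twin's view of the target robot. The `RobotTwin` fields, the status values, and the 30% battery floor are hypothetical and are not taken from the repository.

```python
from dataclasses import dataclass


@dataclass
class RobotTwin:
    """Hypothetical twin record, kept current from telemetry."""
    robot_id: str
    site: str
    status: str          # e.g. "idle", "mowing", "docked", "fault"
    battery_pct: float


def validate_reassignment(twins, target_id, site, min_battery_pct=30.0):
    """Return (ok, reason) for handing a zone on `site` to `target_id`."""
    twin = twins.get(target_id)
    if twin is None:
        return False, "unknown robot"
    if twin.site != site:
        return False, "robot is on a different site"
    if twin.status != "idle":
        return False, f"robot is {twin.status}, not idle"
    if twin.battery_pct < min_battery_pct:
        return False, "battery too low"
    return True, "ok"
```

An action that fails validation would be a natural candidate for the Slack escalation path rather than automatic execution.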

🧩 Repository layout

| Path | What it is |
| --- | --- |
| server.py | Lightweight stdlib HTTP server: serves the Cockpit UI, bridges events into Splunk HEC (/api/trigger), and runs the real closed loop (/api/triage). |
| agent/app.py | Gemini Orchestrator: pulls context through the MCP layer, reasons with Gemini, emits a closed-loop action plan. |
| agent/splunk_mcp_server.py | Splunk MCP layer: guided query interface the agent uses to walk from symptom → root cause (live Splunk REST search + a high-fidelity digital-twin fallback). |
| dashboard.html | The Cadence Cockpit control room (the three closed-loop channels). |
| fleetmng.html | Interactive fleet digital-twin monitor grid. |
| scripts/seed_telemetry.py | Pushes sample anomaly events into Splunk so the agent's live search has data. |
| architecture_diagram.png | System architecture (Splunk ⇄ AI agent ⇄ data flow). |

🚀 Setup & Run Instructions

1. Prerequisites

Python 3.8+. No third-party pip packages are required: server.py, the orchestrator, the MCP layer, and the seed script use only the standard library (http.server, urllib, json, ssl).

A Splunk endpoint with HTTP Event Collector enabled. Either works:

  • Splunk Cloud Platform free trial (web-hosted, HTTPS), or
  • a local Splunk Enterprise instance (https://localhost:8088).

2. Configure environment

cp .env.example .env
# then edit .env with your Splunk HEC + (optional) Gemini settings

Key variables (see .env.example for the full list):

PORT=8050
SPLUNK_HEC_URL=https://<stack>.splunkcloud.com:8088/services/collector
SPLUNK_HEC_TOKEN=<your-hec-token>
SPLUNK_REST_URL=https://<stack-or-localhost>:8089   # used by the MCP layer
SPLUNK_USER=<user>
SPLUNK_PASSWORD=<password>
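
To show how the two HEC variables are typically used, here is a minimal stdlib sketch that builds a request for Splunk's HTTP Event Collector. The `Authorization: Splunk <token>` header and the `{"event": ...}` JSON envelope follow the standard HEC protocol; the function name, the default sourcetype, and the sample event fields are illustrative assumptions, not the repository's actual code.

```python
import json
import os
import urllib.request


def build_hec_request(event, sourcetype="_json"):
    """Build (but do not send) a POST request for the HEC endpoint.

    Reads SPLUNK_HEC_URL and SPLUNK_HEC_TOKEN from the environment.
    """
    url = os.environ["SPLUNK_HEC_URL"]
    token = os.environ["SPLUNK_HEC_TOKEN"]
    body = json.dumps({"event": event, "sourcetype": sourcetype}).encode("utf-8")
    return urllib.request.Request(
        url,
        data=body,
        method="POST",
        headers={
            "Authorization": f"Splunk {token}",
            "Content-Type": "application/json",
        },
    )
```

Sending is then a matter of `urllib.request.urlopen(build_hec_request({"robot": "MOW-07", "fault": "torque_loss"}), timeout=10)`. A local Splunk Enterprise instance with a self-signed certificate would likely also need an `ssl` context passed to `urlopen`.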

Gemini reasoning is resolved automatically, in order:

  1. GEMINI_API_KEY → Google AI Studio, else
  2. a GCP service-account token → Vertex AI (gemini-2.5-flash, no key needed), else
  3. deterministic sandbox cognition so the demo always runs offline.
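
The resolution order above amounts to a simple fallback chain, sketched here. The function name, the returned labels, and the `vertex_token` parameter are hypothetical; how the repository actually obtains a service-account token is not shown in this README.

```python
import os


def resolve_reasoning_backend(env=None, vertex_token=None):
    """Pick a reasoning backend in the documented priority order.

    `env` defaults to os.environ; `vertex_token` stands in for a GCP
    service-account token obtained elsewhere (an assumption of this sketch).
    """
    env = os.environ if env is None else env
    if env.get("GEMINI_API_KEY"):
        return "ai_studio"   # 1. explicit API key wins
    if vertex_token:
        return "vertex_ai"   # 2. fall back to Vertex AI credentials
    return "sandbox"         # 3. offline deterministic mode
```

Putting the offline mode last means a missing key or token degrades the demo to canned reasoning instead of failing at startup.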

3. (Optional) seed Splunk with sample telemetry

python3 scripts/seed_telemetry.py

This ingests one anomaly per channel so the agent's live MCP search returns real events. Verify in Splunk: index=main | head 10.
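
The same verification can be run over the Splunk REST API instead of the web UI. The sketch below builds a request for Splunk's `search/jobs/export` endpoint with HTTP basic auth. Whether the repository's MCP layer uses this particular endpoint is an assumption, and `build_search_request` is a hypothetical helper.

```python
import base64
import os
import urllib.parse
import urllib.request


def build_search_request(spl, base_url=None, user=None, password=None):
    """Build (but do not send) a streaming search request.

    Falls back to SPLUNK_REST_URL, SPLUNK_USER, and SPLUNK_PASSWORD when the
    corresponding argument is not given.
    """
    base_url = (base_url or os.environ["SPLUNK_REST_URL"]).rstrip("/")
    user = user or os.environ["SPLUNK_USER"]
    password = password or os.environ["SPLUNK_PASSWORD"]
    # REST searches must start with a command such as `search` or a pipe.
    if not spl.lstrip().startswith(("search ", "|")):
        spl = "search " + spl
    body = urllib.parse.urlencode({"search": spl, "output_mode": "json"}).encode("utf-8")
    credentials = base64.b64encode(f"{user}:{password}".encode("utf-8")).decode("ascii")
    return urllib.request.Request(
        base_url + "/services/search/jobs/export",
        data=body,
        method="POST",
        headers={"Authorization": "Basic " + credentials},
    )
```

Passing the result of `build_search_request("index=main | head 10")` to `urllib.request.urlopen` should stream back the seeded events as JSON, one result object per line.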

4. Start the Cockpit

python3 server.py          # -> "Forenly Splunk-Agentic-Ops Dashboard serving on port 8050"

Open http://localhost:8050 and click one of the three channel buttons. Each click: streams a telemetry event into Splunk HEC, then calls /api/triage, which runs the real loop — Splunk MCP search → Gemini root-cause + action plan — and renders the live result in the console.

5. (Optional) run the agent headless

cd agent && python3 app.py   # runs all three closed loops on the CLI

What's real in this demo: events are really ingested into Splunk over HEC; the agent really queries Splunk over the REST API through the MCP layer; and Gemini really produces the root-cause analysis and action plan. The robot fleet itself is a digital-twin simulation — Cadence Cockpit is the observability + agentic-control layer that would sit on top of a live fleet.


⚖️ License

This project is licensed under the Apache License 2.0 - see the LICENSE file for details.


👥 Team

Forenly AI Platform · github.com/Forenly

About

Closed-loop digital twin of an autonomous robot fleet — Physical AI observability & agentic ops on Splunk (Hackathon 2026)
