Track: Observability · Splunk MCP Server · Gemini (Vertex AI) agent · Splunk HEC + REST
Forenly AI Platform's Entry for the Splunk Agentic Ops Hackathon (2026)
A Physical AI observability platform that maintains a real-time digital twin of a deployed field-robot fleet — closing the loop between on-site telemetry and autonomous corrective action through Splunk Cloud HEC, the Splunk MCP Server, and GCP Gemini multi-agent orchestration.
Building this in the open — join the team chat on Discord: https://discord.gg/7JXRwTy2EE
Forenly builds one continuous life-cycle for autonomous field robots (robotic mowing / field service), split across three hackathon entries on the same customer and the same fleet:
| Stage | Project | What it does |
|---|---|---|
| 1. Acquire | Lawn Advisor (Forenly/gcp) | Generates leads, recommends the right robot for a site, builds a deployment plan, sends the proposal → customer onboards and robots get installed on site. |
| 2. Operate | FleetMind / Gemini XPRIZE (Forenly/gemini-xprize) | Runs the deployed fleet day-to-day: intake → schedule → dispatch → verify → invoice. |
| 3. Observe & Sustain | Cadence Cockpit (this repo) | Watches the running fleet's telemetry in Splunk, detects faults and wear before they cause downtime, and closes the loop with autonomous corrective action — feeding repairs and schedule changes back to the Operate layer. |
This repo is stage 3: once robots are mowing real sites, how do you keep the whole fleet healthy, on-SLA, and running with as little human intervention as possible?
Most observability stacks stop at the alert — a human still has to read the dashboard, diagnose, and act. Cadence Cockpit treats the deployed fleet as a live digital twin and closes the loop: every robot's telemetry flows into Splunk, a Gemini multi-agent orchestrator triages anomalies through the Splunk MCP Server (a guided query chain to root cause), and instead of just paging someone it executes the corrective action and writes the outcome back to the twin.

                       ┌─────────────────────────────────────────┐
                       │    Splunk Cloud HEC: Fleet Telemetry    │
                       │   (battery · motors · GPS/boundary ·    │
                       │    coverage · dock cycles · faults)     │
                       └────────────────────┬────────────────────┘
                                            │
                                            ▼
                       ┌─────────────────────────────────────────┐
                       │    Gemini Orchestrator + Splunk MCP     │
                       │    (root-cause query chain → triage)    │
                       └────────────────────┬────────────────────┘
                                            │
                           [ Multi-Dimensional Fleet Triage ]
                                            │
              ┌─────────────────────────────┼─────────────────────────────┐
              ▼                             ▼                             ▼
    ┌──────────────────┐          ┌──────────────────┐          ┌──────────────────┐
    │ Operational      │          │ Predictive       │          │ SLA & Coverage   │
    │ Health           │          │ Maintenance      │          │                  │
    ├──────────────────┤          ├──────────────────┤          ├──────────────────┤
    │ • Stuck / tipped │          │ • Battery decay  │          │ • Coverage % vs  │
    │ • Slope torque   │          │ • Blade wear     │          │   contracted SLA │
    │   loss           │          │ • Motor temp     │          │ • Missed windows │
    │ • Boundary/GPS   │          │   trend          │          │ • Fleet balance  │
    │   drift          │          │                  │          │   across sites   │
    └──────────────────┘          └──────────────────┘          └──────────────────┘

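The three branches of the diagram can be read as a dispatch on anomaly type. The following is only a sketch of that idea: the fault labels (`stuck`, `slope_torque_loss`, and so on), the `Channel` enum, and the `route` helper are hypothetical names chosen for illustration and are not taken from the repo.

```python
from enum import Enum


class Channel(Enum):
    OPERATIONAL_HEALTH = "operational_health"
    PREDICTIVE_MAINTENANCE = "predictive_maintenance"
    SLA_COVERAGE = "sla_coverage"


# Hypothetical fault labels, grouped by the three boxes in the diagram.
FAULT_TO_CHANNEL = {
    "stuck": Channel.OPERATIONAL_HEALTH,
    "tipped": Channel.OPERATIONAL_HEALTH,
    "slope_torque_loss": Channel.OPERATIONAL_HEALTH,
    "boundary_gps_drift": Channel.OPERATIONAL_HEALTH,
    "battery_decay": Channel.PREDICTIVE_MAINTENANCE,
    "blade_wear": Channel.PREDICTIVE_MAINTENANCE,
    "motor_temp_trend": Channel.PREDICTIVE_MAINTENANCE,
    "coverage_below_sla": Channel.SLA_COVERAGE,
    "missed_window": Channel.SLA_COVERAGE,
    "fleet_imbalance": Channel.SLA_COVERAGE,
}


def route(event: dict) -> Channel:
    """Pick the triage channel for one telemetry event."""
    fault = event.get("fault")
    if fault not in FAULT_TO_CHANNEL:
        # Unknown labels are rejected rather than silently guessed at.
        raise ValueError(f"unknown fault label: {fault!r}")
    return FAULT_TO_CHANNEL[fault]
```

In a real deployment the classification would presumably come from the Gemini agent rather than a static table; the table only makes the three-way split concrete.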
Cadence Cockpit programmatically demonstrates autonomous triage and resolution of fleet anomalies across three channels. Each one follows the same shape: Anomaly → Triage → Closed-loop Execution.
Operational Health
- The Anomaly: Robot `MOW-07` reports sustained torque loss on a slope plus a boundary-GPS drift; telemetry into Splunk shows it has stopped covering Zone B.
- The Triage: The Gemini agent runs a Splunk MCP query chain (recent faults, last-good run, neighboring robots) and classifies it as a localized, recoverable fault, not a hardware failure.
- The Execution: The system re-issues a safe return-to-dock + re-route command, reassigns Zone B to an idle robot on the same site, and posts a status update — no human paged unless the retry fails.
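The execution step for this channel can be sketched as a small planner. This is an illustration under stated assumptions, not the repo's implementation: the `Robot` dataclass, the state names, and the action labels (`return_to_dock`, `assign_zone`, `page_human`) are all hypothetical.

```python
from dataclasses import dataclass
from typing import List


@dataclass
class Robot:
    robot_id: str
    site: str
    state: str  # hypothetical states: "mowing", "idle", "faulted", "docked"


def plan_recovery(faulted: Robot, zone: str, fleet: List[Robot],
                  retry_failed: bool = False) -> List[dict]:
    """Build the ordered action plan for a localized, recoverable fault."""
    if retry_failed:
        # A human is paged only after the automatic retry has failed.
        return [{"action": "page_human", "robot": faulted.robot_id, "zone": zone}]

    plan = [
        {"action": "return_to_dock", "robot": faulted.robot_id},
        {"action": "reroute", "robot": faulted.robot_id},
    ]
    # Hand the uncovered zone to the first idle robot on the same site, if any.
    spare = next((r for r in fleet
                  if r.site == faulted.site
                  and r.state == "idle"
                  and r.robot_id != faulted.robot_id), None)
    if spare is not None:
        plan.append({"action": "assign_zone", "robot": spare.robot_id, "zone": zone})
    plan.append({"action": "post_status", "robot": faulted.robot_id, "zone": zone})
    return plan
```

Returning a plan as data, rather than executing commands directly, keeps the decision inspectable before anything is sent to a robot.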
Predictive Maintenance
- The Anomaly: Battery charge-cycle data and blade-runtime counters trend toward end-of-life across several robots; one motor's temperature curve is climbing run-over-run.
- The Triage: The Gemini agent projects a failure window from the Splunk time-series and flags which robots will breach threshold first.
- The Execution: Instead of waiting for a breakdown, the agent schedules a preventive part swap during a non-mow window, raises the maintenance ticket, and notes the part needed — turning unplanned downtime into planned service.
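One simple way to project a failure window from a time series is a straight-line fit extrapolated to the threshold. The sketch below assumes that approach purely for illustration; the source does not say which model the agent uses, and the function names and the sample format (charge cycle, capacity in percent) are hypothetical.

```python
from typing import List, Optional, Tuple


def linear_fit(points: List[Tuple[float, float]]) -> Tuple[float, float]:
    """Ordinary least-squares fit y = slope * x + intercept."""
    n = len(points)
    if n < 2:
        raise ValueError("need at least two samples")
    mean_x = sum(x for x, _ in points) / n
    mean_y = sum(y for _, y in points) / n
    sxx = sum((x - mean_x) ** 2 for x, _ in points)
    if sxx == 0:
        raise ValueError("all samples share the same x value")
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in points)
    slope = sxy / sxx
    return slope, mean_y - slope * mean_x


def cycles_until_threshold(points: List[Tuple[float, float]],
                           threshold: float) -> Optional[float]:
    """Cycles left before the fitted line crosses the threshold.

    Returns None when the metric is not declining.
    """
    slope, intercept = linear_fit(points)
    if slope >= 0:
        return None
    last_x = max(x for x, _ in points)
    crossing_x = (threshold - intercept) / slope
    return max(0.0, crossing_x - last_x)


def first_to_breach(fleet: dict, threshold: float) -> List[str]:
    """Robot ids with a projected breach, soonest first."""
    remaining = {rid: cycles_until_threshold(pts, threshold) for rid, pts in fleet.items()}
    return sorted((rid for rid, left in remaining.items() if left is not None),
                  key=lambda rid: remaining[rid])
```

For example, samples of 95%, 90%, and 85% capacity at cycles 100, 200, and 300 lose 5 points per 100 cycles, so an 80% threshold is projected roughly 100 cycles after the last sample.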
SLA & Coverage
- The Anomaly: A site's measured coverage drops below its contracted SLA after two missed mow windows, while another site's fleet sits underutilized.
- The Triage: The Gemini agent flags an SLA-breach risk and identifies the imbalance across sites.
- The Execution: It rebalances scheduling/dispatch, feeds the change back to the Operate layer (FleetMind), and surfaces the SLA status so billing/credits stay accurate — keeping the customer commitment intact.
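The imbalance check in this channel can be sketched as pairing sites that are under their SLA with sites that have spare capacity. The `SiteStatus` fields, the `idle_below` cutoff, and the shape of the returned moves are assumptions made for this example, not details from the repo.

```python
from dataclasses import dataclass
from typing import List


@dataclass
class SiteStatus:
    site: str
    coverage_pct: float  # measured coverage for the current period
    sla_pct: float       # contracted coverage
    utilization: float   # fraction of available robot-hours in use, 0..1


def plan_rebalance(sites: List[SiteStatus], idle_below: float = 0.5) -> List[dict]:
    """Pair SLA-breaching sites with underutilized sites that meet their SLA."""
    # Worst shortfall first.
    breaching = sorted((s for s in sites if s.coverage_pct < s.sla_pct),
                       key=lambda s: s.coverage_pct - s.sla_pct)
    # Least-utilized donors first; a donor must itself be within SLA.
    donors = sorted((s for s in sites
                     if s.coverage_pct >= s.sla_pct and s.utilization < idle_below),
                    key=lambda s: s.utilization)
    return [
        {
            "from_site": donor.site,
            "to_site": needy.site,
            "shortfall_pct": round(needy.sla_pct - needy.coverage_pct, 2),
        }
        for needy, donor in zip(breaching, donors)
    ]
```

A breaching site with no available donor simply produces no move here; in practice that case is a natural candidate for human escalation.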
The platform demonstrates a cohesive cyber-physical loop built from:
- Splunk Cloud (HEC): Ingests high-volume telemetry from the deployed fleet: robot logs, battery/motor metrics, GPS/boundary events, coverage, and dock cycles.
- Splunk MCP Server: The guided query layer the agent uses to walk from symptom to root cause (the hackathon's MCP bonus track).
- GCP Gemini Orchestrator: The "core mind" that triages fleet anomalies and decides the corrective action.
- Real-time Digital Twin: A live model of every robot and site, updated from telemetry and used to validate actions before they execute.
- Slack (Socket Mode): Interactive escalation/approval interface for field-ops when an action needs a human in the loop.
- FleetMind / Gemini XPRIZE node: The Operate layer this stage feeds corrective actions and schedule changes back into.
- Lawn Advisor node: The Acquire layer that originates each deployment in the first place.
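To make the "validate actions before they execute" role of the digital twin concrete, here is a minimal in-memory sketch. The `FleetTwin` class, its method names, the action format, and the 30% battery floor are all hypothetical; the repo's twin may be structured quite differently.

```python
import copy


class FleetTwin:
    """In-memory stand-in for the digital twin: robot_id -> state dict."""

    MIN_BATTERY_PCT = 30  # hypothetical floor for accepting new work

    def __init__(self, robots):
        self._robots = copy.deepcopy(robots)

    def apply_telemetry(self, robot_id, fields):
        """Merge the latest telemetry fields into the robot's state."""
        self._robots.setdefault(robot_id, {}).update(fields)

    def validate(self, action):
        """Return (ok, reason) without changing the twin."""
        robot = self._robots.get(action["robot"])
        if robot is None:
            return False, "unknown robot"
        if action["type"] == "assign_zone":
            if robot.get("state") != "idle":
                return False, "robot is not idle"
            if robot.get("battery_pct", 0) < self.MIN_BATTERY_PCT:
                return False, "battery too low"
        return True, "ok"

    def execute(self, action):
        """Apply the action only if it validates, and report the outcome."""
        ok, reason = self.validate(action)
        if ok and action["type"] == "assign_zone":
            self._robots[action["robot"]].update(state="mowing", zone=action["zone"])
        return {"action": action, "applied": ok, "reason": reason}

    def snapshot(self, robot_id):
        return dict(self._robots[robot_id])
```

The outcome record returned by `execute` is the kind of thing that could be written back to Splunk so the loop's result is searchable alongside the original anomaly.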
| Path | What it is |
|---|---|
| `server.py` | Lightweight stdlib HTTP server: serves the Cockpit UI, bridges events into Splunk HEC (`/api/trigger`), and runs the real closed loop (`/api/triage`). |
| `agent/app.py` | Gemini Orchestrator: pulls context through the MCP layer, reasons with Gemini, emits a closed-loop action plan. |
| `agent/splunk_mcp_server.py` | Splunk MCP layer: guided query interface the agent uses to walk from symptom → root cause (live Splunk REST search + a high-fidelity digital-twin fallback). |
| `dashboard.html` | The Cadence Cockpit control room (the three closed-loop channels). |
| `fleetmng.html` | Interactive fleet digital-twin monitor grid. |
| `scripts/seed_telemetry.py` | Pushes sample anomaly events into Splunk so the agent's live search has data. |
| `architecture_diagram.png` | System architecture (Splunk ⇄ AI agent ⇄ data flow). |
Python 3.8+. No third-party pip packages are required — server.py, the
orchestrator, the MCP layer, and the seed script use only the standard library
(http.server, urllib, json, ssl).
A Splunk endpoint with HTTP Event Collector enabled. Either works:
- Splunk Cloud Platform free trial (web-hosted, HTTPS), or
- a local Splunk Enterprise instance (`https://localhost:8088`).

Copy the example environment file:

    cp .env.example .env
    # then edit .env with your Splunk HEC + (optional) Gemini settings

Key variables (see `.env.example` for the full list):

    PORT=8050
    SPLUNK_HEC_URL=https://<stack>.splunkcloud.com:8088/services/collector
    SPLUNK_HEC_TOKEN=<your-hec-token>
    SPLUNK_REST_URL=https://<stack-or-localhost>:8089   # used by the MCP layer
    SPLUNK_USER=<user>
    SPLUNK_PASSWORD=<password>

Gemini reasoning is resolved automatically, in order:

1. `GEMINI_API_KEY` → Google AI Studio, else
2. a GCP service-account token → Vertex AI (`gemini-2.5-flash`, no key needed), else
3. deterministic sandbox cognition so the demo always runs offline.
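The fallback order above amounts to a first-match chain. A minimal sketch, assuming the service-account token has already been obtained elsewhere and is passed in; the function name and the returned labels (`ai_studio`, `vertex_ai`, `sandbox`) are hypothetical, and only `GEMINI_API_KEY` comes from the configuration described here.

```python
from typing import Mapping, Optional


def resolve_gemini_backend(env: Mapping[str, str],
                           service_account_token: Optional[str] = None) -> str:
    """Return which reasoning backend to use, following the documented order."""
    if env.get("GEMINI_API_KEY"):
        return "ai_studio"   # 1. explicit API key wins
    if service_account_token:
        return "vertex_ai"   # 2. otherwise use a GCP service-account token
    return "sandbox"         # 3. otherwise fall back to the offline path
```

Passing the environment in as a mapping (for example `os.environ`) rather than reading it inside the function keeps the order easy to check in isolation.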

Seed Splunk with sample telemetry:

    python3 scripts/seed_telemetry.py

This ingests one anomaly per channel so the agent's live MCP search returns real events. Verify in Splunk: `index=main | head 10`.
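For readers unfamiliar with HEC, sending one event comes down to a JSON POST with a `Splunk <token>` authorization header. The sketch below uses only the standard library and is not the contents of `scripts/seed_telemetry.py`; the helper names, the event fields, and the `_json` sourcetype default are assumptions for illustration.

```python
import json
import urllib.request


def build_hec_request(hec_url, token, event, sourcetype="_json", index="main"):
    """Build (but do not send) an HTTP Event Collector request for one event."""
    payload = {"event": event, "sourcetype": sourcetype, "index": index}
    return urllib.request.Request(
        hec_url,
        data=json.dumps(payload).encode("utf-8"),
        headers={
            "Authorization": f"Splunk {token}",
            "Content-Type": "application/json",
        },
        method="POST",
    )


def send(request, timeout=10):
    """Send the request and return Splunk's decoded JSON reply."""
    with urllib.request.urlopen(request, timeout=timeout) as resp:
        return json.loads(resp.read().decode("utf-8"))


# Hypothetical anomaly event for the Operational Health channel.
sample_event = {
    "robot": "MOW-07",
    "fault": "slope_torque_loss",
    "zone": "B",
}
```

Calling `send(build_hec_request(SPLUNK_HEC_URL, SPLUNK_HEC_TOKEN, sample_event))` would post the event. A local Splunk Enterprise instance with a self-signed certificate would likely also need a custom `ssl` context passed to `urlopen`.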

Then start the server:

    python3 server.py   # -> "Forenly Splunk-Agentic-Ops Dashboard serving on port 8050"

Open http://localhost:8050 and click one of the three channel buttons. Each click streams a telemetry event into Splunk HEC, then calls `/api/triage`, which runs the real loop (Splunk MCP search → Gemini root-cause + action plan) and renders the live result in the console.
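The search half of that loop goes through the Splunk REST API on the management port. As a rough sketch of what such a call can look like with only the standard library, the helper below builds a request against Splunk's `search/jobs/export` endpoint with basic auth. Whether the repo's MCP layer uses this endpoint or another one is an assumption, and the function name is hypothetical.

```python
import base64
import urllib.parse
import urllib.request


def build_search_request(rest_url, user, password, spl):
    """Build (but do not send) a streaming export search request."""
    # Plain searches must start with the `search` command; generating
    # commands that start with a pipe are passed through unchanged.
    query = spl if spl.lstrip().startswith(("search ", "|")) else f"search {spl}"
    body = urllib.parse.urlencode(
        {"search": query, "output_mode": "json"}
    ).encode("ascii")
    credentials = base64.b64encode(f"{user}:{password}".encode("utf-8")).decode("ascii")
    return urllib.request.Request(
        f"{rest_url.rstrip('/')}/services/search/jobs/export",
        data=body,
        headers={"Authorization": f"Basic {credentials}"},
        method="POST",
    )
```

For instance, `build_search_request(SPLUNK_REST_URL, SPLUNK_USER, SPLUNK_PASSWORD, "index=main | head 10")` prepares the same check as the manual verification step; the request would then be sent with `urllib.request.urlopen`.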

To run the loops without the UI:

    cd agent && python3 app.py   # runs all three closed loops on the CLI

What's real in this demo: events are really ingested into Splunk over HEC; the agent really queries Splunk over the REST API through the MCP layer; and Gemini really produces the root-cause analysis and action plan. The robot fleet itself is a digital-twin simulation: Cadence Cockpit is the observability + agentic-control layer that would sit on top of a live fleet.
This project is licensed under the Apache License 2.0 - see the LICENSE file for details.
Forenly AI Platform · github.com/Forenly
