Merged
32 changes: 16 additions & 16 deletions AGENTS.md
@@ -22,7 +22,7 @@ A jambonz application controls phone calls by returning **arrays of verbs** —
- **Webhook (HTTP)**: Your server receives POST requests and returns JSON verb arrays. Stateless and simple.
- **WebSocket**: Persistent bidirectional connection. Required for real-time LLM agents, audio streaming, and TTS token streaming.

**IMPORTANT**: Any application that uses a speech-to-speech verb (`openai_s2s`, `google_s2s`, `deepgram_s2s`, `ultravox_s2s`, `elevenlabs_s2s`, `s2s`, or `pipeline`) MUST use WebSocket transport.
**IMPORTANT**: Any application that uses a speech-to-speech verb (`openai_s2s`, `google_s2s`, `deepgram_s2s`, `ultravox_s2s`, `elevenlabs_s2s`, `s2s`, or `agent`) MUST use WebSocket transport.

## Core Verbs

@@ -34,7 +34,7 @@ A jambonz application controls phone calls by returning **arrays of verbs** —
### AI & Real-time
- **openai_s2s** / **google_s2s** / **deepgram_s2s** / **ultravox_s2s** / **elevenlabs_s2s** — Vendor-specific LLM voice conversation.
- **s2s** — Generic LLM voice conversation (use when vendor is determined at runtime).
- **pipeline** — Higher-level voice AI pipeline with integrated turn detection.
- **agent** — Higher-level voice AI agent with integrated turn detection.
- **dialogflow** — Google Dialogflow agent.
- **stream** — Stream raw audio to a websocket endpoint.
- **transcribe** — Real-time call transcription.
@@ -258,7 +258,7 @@ await client.calls.whisper(call_sid, {"verb": "say", "text": "Hello"})
await client.calls.mute(call_sid, "mute")
await client.calls.redirect(call_sid, "https://example.com/new")
await client.calls.update(call_sid, {"call_status": "completed"})
await client.calls.update_pipeline(call_sid, {"type": "update_instructions", "instructions": "New prompt"})
await client.calls.update_agent(call_sid, {"type": "update_instructions", "instructions": "New prompt"})
```

## TTS Token Streaming
@@ -273,18 +273,18 @@ await session.flush_tts_tokens()
await session.clear_tts_tokens()
```

## Pipeline Updates
## Agent Updates

Update a running pipeline mid-conversation:
Update a running agent mid-conversation:

```python
await session.update_pipeline({"type": "update_instructions", "instructions": "Now help with billing."})
await session.update_pipeline({"type": "inject_context", "messages": [{"role": "system", "content": "Customer is Gold tier."}]})
await session.update_pipeline({"type": "update_tools", "tools": [...]})
await session.update_pipeline({"type": "generate_reply", "user_input": "Override", "interrupt": True})
await session.update_agent({"type": "update_instructions", "instructions": "Now help with billing."})
await session.update_agent({"type": "inject_context", "messages": [{"role": "system", "content": "Customer is Gold tier."}]})
await session.update_agent({"type": "update_tools", "tools": [...]})
await session.update_agent({"type": "generate_reply", "user_input": "Override", "interrupt": True})
```

## Tool Output (Pipeline)
## Tool Output (Agent)

When the LLM requests a tool call, return the result:

@@ -348,7 +348,7 @@ audio_svc.on("connection", on_audio_connection)
| `llm:tool-output` | Tool call result (`tool_output()`) |
| `tts:tokens` | Stream TTS text (`send_tts_tokens()`) |
| `tts:flush` | End TTS stream (`flush_tts_tokens()`) |
| `pipeline:update` | Pipeline update (`update_pipeline()`) |
| `agent:update` | Agent update (`update_agent()`) |

## Common Patterns

@@ -361,17 +361,17 @@ jambonz.say(text="Welcome.").gather(
).say(text="No input. Goodbye.").hangup()
```

### Voice Agent (Pipeline)
### Voice Agent
```python
session.pipeline(
session.agent(
stt={"vendor": "deepgram", "language": "en-US"},
tts={"vendor": "cartesia", "voice": "sonic-english"},
llm={"vendor": "openai", "model": "gpt-4o", "llmOptions": {
"messages": [{"role": "system", "content": "You are a helpful assistant."}]
}},
turnDetection="krisp",
bargeIn={"enable": True},
actionHook="/pipeline-done",
actionHook="/agent-done",
eventHook="/events",
toolHook="/tools",
)
@@ -410,9 +410,9 @@ jambonz.say(text="Connecting you now.").dial(

## SDK Architecture

The SDK auto-generates verb methods from `specs.json` (from `@jambonz/verb-specifications`). When the spec changes, the SDK automatically picks up new parameters:
The SDK auto-generates verb methods from JSON Schema files (from `@jambonz/schema`). When the schema changes, the SDK automatically picks up new parameters:

1. `specs.json` — bundled verb/component specifications (synced from upstream)
1. `schema/verbs/*.schema.json` — bundled verb schemas (synced from upstream)
2. `verb_registry.py` — maps spec entries to Python methods + synonyms
3. `verb_builder.py` — generates methods at import time from specs + registry
4. `WebhookResponse` and `Session` both extend `VerbBuilder`
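
The import-time generation described in the steps above can be sketched in a few lines. This is an illustration only, not the SDK's actual `verb_builder.py`: `TinyBuilder`, `_make_verb_method`, and the inline `SCHEMAS` dict are hypothetical stand-ins for the bundled schema files, and rejecting unknown parameters is an assumption made here to show how the schema's `properties` could be used.

```python
from typing import Any

# Hypothetical stand-in for the bundled schema/verbs/*.schema.json files.
SCHEMAS: dict[str, dict[str, Any]] = {
    "say": {"properties": {"text": {"type": "string"}, "loop": {"type": "number"}}},
    "hangup": {"properties": {}},
}


class TinyBuilder:
    """Illustrative builder; not the SDK's VerbBuilder."""

    def __init__(self) -> None:
        self.verbs: list[dict[str, Any]] = []


def _make_verb_method(verb: str, allowed: set[str]):
    """Build a chainable method that appends one verb dict."""

    def method(self: TinyBuilder, **kwargs: Any) -> TinyBuilder:
        unknown = set(kwargs) - allowed
        if unknown:
            # Assumption: this sketch validates; the real SDK may pass values through.
            raise TypeError(f"{verb}() got unexpected parameters: {sorted(unknown)}")
        self.verbs.append({"verb": verb, **kwargs})
        return self  # returning self keeps the API chainable

    method.__name__ = verb
    return method


# Attach one method per schema when the module is imported.
for _verb, _schema in SCHEMAS.items():
    setattr(TinyBuilder, _verb, _make_verb_method(_verb, set(_schema["properties"])))

app = TinyBuilder().say(text="Welcome.").hangup()
print(app.verbs)
```

Because the methods are attached in a loop over the schemas, adding a property to a schema widens the accepted keyword arguments without touching the builder class.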
14 changes: 7 additions & 7 deletions CLAUDE.md
@@ -23,7 +23,7 @@ src/jambonz_sdk/
│ ├── verbs.py # All 26+ verb TypedDicts
│ ├── rest.py # REST API request/response types
│ └── session.py # Call session & WebSocket message types
├── verb_builder.py # VerbBuilder — methods auto-generated from specs.json
├── verb_builder.py # VerbBuilder — methods auto-generated from JSON Schema
├── verb_registry.py # Verb definitions: maps spec entries → Python methods
├── webhook/
│ ├── __init__.py
@@ -48,12 +48,12 @@ src/jambonz_sdk/
- **Transport-agnostic verb building**: Same verb methods on both `WebhookResponse` and `Session`
- **Fluent/chainable API**: All verb methods return `self` for method chaining
- **TypedDict for verb schemas**: Type-safe verb construction matching JSON schemas exactly
- **Auto-generated verb methods**: VerbBuilder methods are generated at import time from `specs.json` + `verb_registry.py` — when the spec changes, the SDK automatically picks up new parameters
- **Auto-generated verb methods**: VerbBuilder methods are generated at import time from JSON Schema files (`@jambonz/schema`) + `verb_registry.py` — when the schema changes, the SDK automatically picks up new parameters
- **aiohttp for both HTTP and WebSocket**: Single dependency for REST client and WS transport

## Verb System

The SDK supports all 26+ jambonz verbs. Verb methods on VerbBuilder are **auto-generated** from the shared `specs.json` (in `/Users/xhoaluu/jambonz/verb-specifications/specs.json`).
The SDK supports all 26+ jambonz verbs. Verb methods on VerbBuilder are **auto-generated** from JSON Schema files bundled from [`@jambonz/schema`](https://github.com/jambonz/schema).

### How verb generation works

@@ -65,7 +65,7 @@ The SDK supports all 26+ jambonz verbs. Verb methods on VerbBuilder are **auto-g
### Verb List

Audio/Speech: `say`, `play`, `gather`
AI/S2S: `openai_s2s`, `google_s2s`, `deepgram_s2s`, `elevenlabs_s2s`, `ultravox_s2s`, `s2s`, `llm`, `dialogflow`, `pipeline`
AI/S2S: `openai_s2s`, `google_s2s`, `deepgram_s2s`, `elevenlabs_s2s`, `ultravox_s2s`, `s2s`, `llm`, `dialogflow`, `agent`
Call Control: `dial`, `conference`, `enqueue`, `dequeue`, `hangup`, `redirect`, `pause`
Audio Streaming: `listen`, `stream`, `transcribe`
SIP: `sip_decline`, `sip_request`, `sip_refer`
@@ -105,7 +105,7 @@ Source: https://github.com/jambonz/schema

`AGENTS.md` is the comprehensive developer guide for AI agents working with this SDK.
It covers: verb system, webhook/WebSocket patterns, REST API, env vars, mid-call control,
TTS streaming, pipeline updates, audio streaming, and common application patterns.
TTS streaming, agent updates, audio streaming, and common application patterns.
AI coding agents should read AGENTS.md before generating jambonz Python application code.

### MCP Server
@@ -171,11 +171,11 @@ pytest # All 279 tests
### Unit tests (`tests/unit/`)
- `test_verb_builder.py` — Parametrized across all 31 verb defs: method existence, correct verb name, all spec properties pass through
- `test_webhook.py` — Webhook contract, HMAC-SHA256 signature protocol, env vars OPTIONS format
- `test_session.py` — WebSocket protocol messages: ack, command, tts:tokens, llm:tool-output, pipeline:update
- `test_session.py` — WebSocket protocol messages: ack, command, tts:tokens, llm:tool-output, agent:update
- `test_ws_client.py` — Message routing: session:new, verb:hook dispatch, auto-reply, binary/JSON robustness
- `test_rest_client.py` — REST API contract: URL construction, HTTP methods, request bodies
- `test_audio_stream.py` — Audio protocol: raw PCM, playAudio JSON, marks, control commands

### Integration tests (`tests/integration/`)
- `test_webhook.py` — Real aiohttp server with IVR menu, actionHook routing, env vars discovery
- `test_websocket.py` — Real WebSocket connections: full protocol compliance, multi-step conversations, inject commands, TTS streaming, pipeline updates
- `test_websocket.py` — Real WebSocket connections: full protocol compliance, multi-step conversations, inject commands, TTS streaming, agent updates
8 changes: 4 additions & 4 deletions README.md
@@ -72,7 +72,7 @@ async with JambonzClient(

### Spec-driven verb generation

The SDK does **not** hardcode verb method signatures. Instead, verb methods (`.say()`, `.gather()`, `.dial()`, `.pipeline()`, etc.) are **auto-generated at import time** from [JSON Schema](https://github.com/jambonz/schema) files — the same schemas used by the Node.js SDK and the jambonz server.
The SDK does **not** hardcode verb method signatures. Instead, verb methods (`.say()`, `.gather()`, `.dial()`, `.agent()`, etc.) are **auto-generated at import time** from [JSON Schema](https://github.com/jambonz/schema) files — the same schemas used by the Node.js SDK and the jambonz server.

**What this means:**

@@ -98,15 +98,15 @@ VerbDef("new_verb", "new_verb", doc="Description.")

## Features

- **All 31 jambonz verbs**: say, play, gather, dial, conference, enqueue/dequeue, hangup, pause, redirect, config, tag, dtmf, dub, message, alert, answer, leave, listen/stream, transcribe, openai_s2s, google_s2s, deepgram_s2s, elevenlabs_s2s, ultravox_s2s, s2s, llm, dialogflow, pipeline, sip_decline, sip_request, sip_refer
- **All 31 jambonz verbs**: say, play, gather, dial, conference, enqueue/dequeue, hangup, pause, redirect, config, tag, dtmf, dub, message, alert, answer, leave, listen/stream, transcribe, openai_s2s, google_s2s, deepgram_s2s, elevenlabs_s2s, ultravox_s2s, s2s, llm, dialogflow, agent, sip_decline, sip_request, sip_refer
- **Fluent chainable API**: `.say(...).gather(...).hangup()`
- **Webhook transport**: `WebhookResponse` for HTTP apps (works with aiohttp, FastAPI, Flask, etc.)
- **WebSocket transport**: `create_endpoint` with `Session`, event handling, `send()`/`reply()`
- **REST client**: `JambonzClient` with calls, conferences, queues, mid-call control
- **Audio streaming**: Bidirectional audio via `AudioStream`
- **Mid-call control**: inject commands (mute, whisper, record, DTMF, tag)
- **TTS token streaming**: `send_tts_tokens()` / `flush_tts_tokens()`
- **Pipeline updates**: `update_pipeline()` for mid-conversation LLM changes
- **Agent updates**: `update_agent()` for mid-conversation LLM changes
- **Signature verification**: HMAC-SHA256 webhook signature validation
- **Env vars**: Portal discovery via OPTIONS + runtime reading

@@ -119,7 +119,7 @@ See the [`examples/`](examples/) directory:
| hello-world | [webhook](examples/hello-world/webhook_app.py) | [websocket](examples/hello-world/websocket_app.py) | Minimal greeting |
| echo | [webhook](examples/echo/webhook_app.py) | [websocket](examples/echo/websocket_app.py) | Speech echo with gather |
| ivr-menu | [webhook](examples/ivr-menu/webhook_app.py) | — | IVR menu with speech + DTMF |
| voice-agent | [webhook](examples/voice-agent/webhook_app.py) | [websocket](examples/voice-agent/websocket_app.py) | LLM pipeline with tool calls |
| voice-agent | [webhook](examples/voice-agent/webhook_app.py) | [websocket](examples/voice-agent/websocket_app.py) | LLM agent with tool calls |
| dial | [webhook](examples/dial/webhook_app.py) | — | Outbound dial with fallback |
| listen-record | [webhook](examples/listen-record/webhook_app.py) | [websocket](examples/listen-record/websocket_app.py) | Audio recording |
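
For applications that still build verb arrays as plain dicts, the `pipeline` to `agent` rename in this change can be applied mechanically. The sketch below is an illustration, not part of the SDK: `migrate_verbs` and `RENAMED_VERBS` are hypothetical names, and it assumes the `agent` verb accepts the same parameters `pipeline` did, which this diff suggests but does not confirm.

```python
from typing import Any

# Old verb name -> new verb name, as renamed in this change.
RENAMED_VERBS = {"pipeline": "agent"}


def migrate_verbs(verbs: list[dict[str, Any]]) -> list[dict[str, Any]]:
    """Return a copy of a verb array with renamed verbs updated."""
    migrated: list[dict[str, Any]] = []
    for verb in verbs:
        name = verb.get("verb")
        if name in RENAMED_VERBS:
            # Overwrite only the verb name; every other key is copied unchanged.
            migrated.append({**verb, "verb": RENAMED_VERBS[name]})
        else:
            migrated.append(dict(verb))
    return migrated


legacy = [
    {"verb": "say", "text": "Welcome."},
    {"verb": "pipeline", "actionHook": "/pipeline-done"},
]
print(migrate_verbs(legacy))
```

Hook paths such as `/pipeline-done` are application-defined routes, so the sketch leaves them untouched.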

22 changes: 11 additions & 11 deletions examples/voice-agent/websocket_app.py
@@ -1,7 +1,7 @@
"""Voice Agent - WebSocket example.

LLM-powered voice agent using the pipeline verb with tool calling.
Demonstrates pipeline configuration, eventHook handling, and toolHook handling.
LLM-powered voice agent using the agent verb with tool calling.
Demonstrates agent configuration, eventHook handling, and toolHook handling.

Usage:
python websocket_app.py
@@ -40,7 +40,7 @@ async def handle_session(session):

print(f"New call: {session.call_sid} from {session.from_}")

# Handle pipeline events
# Handle agent events
async def on_event(evt):
event_type = evt.get("type", "")
if event_type == "turn_end":
@@ -51,7 +51,7 @@ async def on_event(evt):
)
await session.reply()

session.on("/pipeline-event", on_event)
session.on("/agent-event", on_event)

# Handle tool calls
async def on_tool(evt):
@@ -71,16 +71,16 @@ async def on_tool(evt):

session.on("/tool-call", on_tool)

# Handle pipeline completion
# Handle agent completion
async def on_complete(evt):
print(f"Pipeline complete: {evt.get('completion_reason', 'unknown')}")
print(f"Agent complete: {evt.get('completion_reason', 'unknown')}")
session.hangup()
await session.reply()

session.on("/pipeline-complete", on_complete)
session.on("/agent-complete", on_complete)

# Start the pipeline
session.pipeline(
# Start the agent
session.agent(
stt={
"vendor": "deepgram",
"language": "en-US",
@@ -133,9 +133,9 @@ async def on_complete(evt):
turnDetection="krisp",
earlyGeneration=True,
bargeIn={"enable": True, "minSpeechDuration": 0.3},
eventHook="/pipeline-event",
eventHook="/agent-event",
toolHook="/tool-call",
actionHook="/pipeline-complete",
actionHook="/agent-complete",
)
await session.send()

2 changes: 1 addition & 1 deletion pyproject.toml
@@ -4,7 +4,7 @@ build-backend = "hatchling.build"

[project]
name = "jambonz-python-sdk"
version = "0.2.0"
version = "0.3.0"
description = "Python SDK for jambonz CPaaS platform"
readme = "README.md"
requires-python = ">=3.10"
43 changes: 35 additions & 8 deletions scripts/generate_stubs.py
@@ -1,9 +1,9 @@
#!/usr/bin/env python3
"""Generate verb_builder.pyi stub file from specs.json + verb_registry.
"""Generate verb_builder.pyi stub file from JSON Schema + verb_registry.

This creates a .pyi type stub that IDEs (VS Code Pylance, PyCharm, mypy)
read for static type checking and autocomplete. Run this after syncing
specs.json or updating verb_registry.py.
the schema or updating verb_registry.py.

Usage:
python scripts/generate_stubs.py
@@ -19,10 +19,10 @@

from jambonz_sdk.verb_registry import VERB_DEFS

SPECS_PATH = SRC_DIR / "jambonz_sdk" / "specs.json"
SCHEMA_DIR = SRC_DIR / "jambonz_sdk" / "schema" / "verbs"
STUB_PATH = SRC_DIR / "jambonz_sdk" / "verb_builder.pyi"

# Maps specs.json type strings to Python type annotation strings for .pyi
# Maps JSON Schema type strings to Python type annotation strings for .pyi
TYPE_MAP = {
"string": "str",
"number": "int | float",
@@ -33,7 +33,7 @@


def resolve_type(spec_type) -> str:
"""Convert a specs.json type descriptor to a .pyi type string."""
"""Convert a JSON Schema type descriptor to a .pyi type string."""
if isinstance(spec_type, str):
if spec_type.startswith("#"):
return "dict[str, Any]"
@@ -61,9 +61,37 @@ def resolve_type(spec_type) -> str:
return "Any"


def _load_schemas() -> dict:
    """Load verb JSON Schemas from the bundled schema directory."""
    schemas: dict = {}
    for schema_file in sorted(SCHEMA_DIR.glob("*.schema.json")):
        with schema_file.open() as f:
            schema = json.load(f)
        schema_id = schema.get("$id", "")
        if schema_id:
            spec_name = schema_id.rsplit("/", 1)[-1]
        else:
            spec_name = schema_file.stem.replace(".schema", "")
        properties = {}
        for prop_name, prop_def in schema.get("properties", {}).items():
            if prop_name == "verb":
                continue
            properties[prop_name] = prop_def
        for entry in schema.get("allOf", []):
            if "properties" in entry:
                for prop_name, prop_def in entry["properties"].items():
                    if prop_name == "verb":
                        continue
                    properties[prop_name] = prop_def
        schemas[spec_name] = {
            "properties": properties,
            "required": schema.get("required", []),
        }
    return schemas


def generate() -> str:
with SPECS_PATH.open() as f:
specs = json.load(f)
specs = _load_schemas()

lines = [
'"""Auto-generated type stubs for VerbBuilder.',
@@ -75,7 +103,6 @@ def generate() -> str:
"",
"from jambonz_sdk.types.verbs import AnyVerb",
"",
"",
"class VerbBuilder:",
" _verbs: list[AnyVerb]",
"",
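
To make the merge behaviour of `_load_schemas` concrete, the sketch below applies the same rules to an in-memory schema instead of files on disk. The helper name `collect_properties`, the sample schema, and its `$id` URL are invented for illustration; the real schemas in `@jambonz/schema` may be shaped differently.

```python
from typing import Any


def collect_properties(schema: dict[str, Any]) -> dict[str, Any]:
    """Merge top-level and allOf properties, skipping the 'verb' discriminator."""
    merged: dict[str, Any] = {}
    # Top-level properties come first, so allOf entries win on name collisions.
    sources = [schema, *schema.get("allOf", [])]
    for source in sources:
        for name, definition in source.get("properties", {}).items():
            if name != "verb":
                merged[name] = definition
    return merged


# Hypothetical schema used only to exercise the merge.
sample = {
    "$id": "https://jambonz.org/schema/verbs/example",
    "properties": {"verb": {"const": "example"}, "text": {"type": "string"}},
    "allOf": [{"properties": {"text": {"type": "array"}, "loop": {"type": "number"}}}],
    "required": ["text"],
}

# The spec name is the last path segment of $id, as in _load_schemas.
spec_name = sample["$id"].rsplit("/", 1)[-1]
print(spec_name, collect_properties(sample))
```

The point to notice is the ordering: a property defined both at the top level and inside an `allOf` entry ends up with the `allOf` definition.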
2 changes: 1 addition & 1 deletion scripts/sync_schema.py
@@ -22,7 +22,7 @@
from pathlib import Path

# ── Pin the schema version here ──────────────────────────────────────
SCHEMA_VERSION = "v0.1.1"
SCHEMA_VERSION = "v0.2.1"
# ────────────────────────────────────────────────────────────────────

DEST = Path(__file__).resolve().parent.parent / "src" / "jambonz_sdk" / "schema"
8 changes: 4 additions & 4 deletions src/jambonz_sdk/client/api.py
@@ -100,16 +100,16 @@ async def mute(self, call_sid: str, status: str) -> dict[str, Any]:
"""
        return await self.update(call_sid, {"mute_status": status})

    async def update_pipeline(
    async def update_agent(
        self, call_sid: str, data: dict[str, Any]
    ) -> dict[str, Any]:
        """Send a mid-conversation pipeline update.
        """Send a mid-conversation agent update.

        Args:
            call_sid: The call to update.
            data: Pipeline update payload.
            data: Agent update payload.
        """
        return await self.update(call_sid, {"pipeline_update": data})
        return await self.update(call_sid, {"agent_update": data})

    async def noise_isolation(
        self, call_sid: str, status: str, opts: dict[str, Any] | None = None
Original file line number Diff line number Diff line change
@@ -1,8 +1,8 @@
{
"$schema": "https://json-schema.org/draft/2020-12/schema",
"$id": "https://jambonz.org/schema/callbacks/pipeline-turn",
"title": "Pipeline EventHook Events",
"description": "Events sent to the pipeline verb's eventHook during a conversation. These are sent as 'pipeline:event' messages over the WebSocket connection.",
"$id": "https://jambonz.org/schema/callbacks/agent-turn",
"title": "Agent EventHook Events",
"description": "Events sent to the agent verb's eventHook during a conversation. These are sent as 'agent:event' messages over the WebSocket connection.",
"type": "object",
"oneOf": [
{
@@ -84,7 +84,7 @@
{
"properties": {
"type": {
"const": "agent_response",
"const": "llm_response",
"description": "Sent when the LLM has finished generating its response for the current turn. Contains the complete response text."
},
"response": {
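
Because the event type is renamed from `agent_response` to `llm_response`, an eventHook handler that matches on the old name could stop seeing response events. A hedged sketch of a helper that accepts both names during a transition follows; `response_text` and `RESPONSE_EVENT_TYPES` are hypothetical names, and the sketch assumes the `response` field carries the text as a string, as the schema description implies.

```python
from typing import Any, Optional

# New name first; the old name is kept only for servers that predate the rename.
RESPONSE_EVENT_TYPES = {"llm_response", "agent_response"}


def response_text(evt: dict[str, Any]) -> Optional[str]:
    """Return the LLM's response text for a response event, else None."""
    if evt.get("type") not in RESPONSE_EVENT_TYPES:
        return None
    response = evt.get("response")
    # Assumption: the response is a plain string; anything else is ignored.
    return response if isinstance(response, str) else None


print(response_text({"type": "llm_response", "response": "Your balance is $42."}))
```

Once every server in a deployment emits `llm_response`, the old name can be dropped from the set.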