diff --git a/api-reference/server/services/llm/together.mdx b/api-reference/server/services/llm/together.mdx
index e062ca9b..aedcc829 100644
--- a/api-reference/server/services/llm/together.mdx
+++ b/api-reference/server/services/llm/together.mdx
@@ -95,7 +95,7 @@ from pipecat.services.together import TogetherLLMService
 
 llm = TogetherLLMService(
     api_key=os.getenv("TOGETHER_API_KEY"),
-    model="meta-llama/Meta-Llama-3.1-8B-Instruct-Turbo",
+    model="zai-org/GLM-5.1",
 )
 ```
 
@@ -107,7 +107,7 @@ from pipecat.services.together import TogetherLLMService
 llm = TogetherLLMService(
     api_key=os.getenv("TOGETHER_API_KEY"),
     settings=TogetherLLMService.Settings(
-        model="meta-llama/Meta-Llama-3.1-8B-Instruct-Turbo",
+        model="zai-org/GLM-5.1",
         temperature=0.7,
         top_p=0.9,
         max_completion_tokens=1024,
diff --git a/api-reference/server/services/stt/together.mdx b/api-reference/server/services/stt/together.mdx
new file mode 100644
index 00000000..1cb72f88
--- /dev/null
+++ b/api-reference/server/services/stt/together.mdx
@@ -0,0 +1,150 @@
+---
+title: "Together AI"
+description: "Speech-to-text service using Together AI's real-time transcription API"
+---
+
+## Overview
+
+`TogetherSTTService` provides real-time speech recognition using Together AI's WebSocket API with OpenAI-compatible speech-to-text endpoints. It supports streaming transcription with interim results and automatic reconnection.
+
+  Pipecat's API methods for Together AI STT
+
+  Complete transcription example
+
+  Official Together AI Realtime API documentation
+
+  Access models and manage API keys
+
+## Installation
+
+To use Together AI STT services, install the required dependencies:
+
+```bash
+uv add "pipecat-ai[together]"
+```
+
+## Prerequisites
+
+### Together AI Account Setup
+
+Before using Together AI STT services, you need:
+
+1. **Together AI Account**: Sign up at [Together AI](https://together.ai/)
+2. **API Key**: Generate an API key from your account dashboard
+3. **Model Selection**: Choose from available transcription models
+
+### Required Environment Variables
+
+- `TOGETHER_API_KEY`: Your Together AI API key for authentication
+
+## Configuration
+
+  Together AI API key for authentication.
+
+  Audio sample rate in Hz. When `None`, uses the pipeline's configured sample
+  rate.
+
+  WebSocket base URL for the Together AI API.
+
+  Runtime-configurable settings. See [Settings](#settings) below.
+
+  P99 latency from speech end to final transcript in seconds. Override for your
+  deployment. See
+  [https://github.com/pipecat-ai/stt-benchmark](https://github.com/pipecat-ai/stt-benchmark).
+
+### Settings
+
+Runtime-configurable settings passed via the `settings` constructor argument using `TogetherSTTService.Settings(...)`. These can be updated mid-conversation with `STTUpdateSettingsFrame`. See [Service Settings](/pipecat/fundamentals/service-settings) for details.
+
+| Parameter  | Type              | Default                     | Description                                |
+| ---------- | ----------------- | --------------------------- | ------------------------------------------ |
+| `model`    | `str`             | `"openai/whisper-large-v3"` | Model identifier. _(Inherited.)_           |
+| `language` | `Language \| str` | `Language.EN`               | Language for transcription. _(Inherited.)_ |
+
+## Usage
+
+### Basic Setup
+
+```python
+import os
+from pipecat.services.together import TogetherSTTService
+
+stt = TogetherSTTService(
+    api_key=os.getenv("TOGETHER_API_KEY"),
+)
+```
+
+### With Custom Settings
+
+```python
+import os
+from pipecat.services.together import TogetherSTTService
+from pipecat.transcriptions.language import Language
+
+stt = TogetherSTTService(
+    api_key=os.getenv("TOGETHER_API_KEY"),
+    settings=TogetherSTTService.Settings(
+        model="openai/whisper-large-v3",
+        language=Language.EN,
+    ),
+)
+```
+
+### In a Voice Pipeline
+
+```python
+import os
+from pipecat.audio.vad.silero import SileroVADAnalyzer
+from pipecat.pipeline.pipeline import Pipeline
+from pipecat.processors.audio.vad_processor import VADProcessor
+from pipecat.services.together import TogetherSTTService
+
+stt = TogetherSTTService(api_key=os.getenv("TOGETHER_API_KEY"))
+vad_processor = VADProcessor(vad_analyzer=SileroVADAnalyzer())
+
+pipeline = Pipeline([
+    transport.input(),
+    vad_processor,
+    stt,
+    # ... rest of pipeline
+])
+```
+
+## Notes
+
+- Together AI's STT service uses an OpenAI-compatible WebSocket protocol for real-time transcription.
+- The service automatically handles reconnection on connection errors.
+- Transcription is committed when `VADUserStoppedSpeakingFrame` is received.
diff --git a/api-reference/server/services/supported-services.mdx b/api-reference/server/services/supported-services.mdx
index 54fc8f3d..f5c670dc 100644
--- a/api-reference/server/services/supported-services.mdx
+++ b/api-reference/server/services/supported-services.mdx
@@ -155,6 +155,7 @@ Speech-to-Text services receive and audio input and output transcriptions.
 | [Smallest](/api-reference/server/services/stt/smallest) | `uv add "pipecat-ai[smallest]"` | Pipecat |
 | [Soniox](/api-reference/server/services/stt/soniox) | `uv add "pipecat-ai[soniox]"` | Pipecat |
 | [Speechmatics](/api-reference/server/services/stt/speechmatics) | `uv add "pipecat-ai[speechmatics]"` | Pipecat |
+| [Together AI](/api-reference/server/services/stt/together) | `uv add "pipecat-ai[together]"` | Pipecat |
 | [Uplift AI](/api-reference/server/services/stt/upliftai) | `uv pip install git+https://github.com/havkerboi123/pipecat-upliftai-stt.git` | Community |
 | [Whisper](/api-reference/server/services/stt/whisper) | `uv add "pipecat-ai[whisper]"` | Pipecat |
 | [xAI](/api-reference/server/services/stt/xai) | `uv add "pipecat-ai[xai]"` | Pipecat |
@@ -201,6 +202,7 @@ Text-to-Speech services receive text input and output audio streams or chunks.
 | [Soniox](/api-reference/server/services/tts/soniox) | `uv add "pipecat-ai[soniox]"` | Pipecat |
 | [Speechmatics](/api-reference/server/services/tts/speechmatics) | `uv add "pipecat-ai[speechmatics]"` | Pipecat |
 | [Supertonic](/api-reference/server/services/tts/supertonic) | `uv add pipecat-supertonic` | Community |
+| [Together AI](/api-reference/server/services/tts/together) | `uv add "pipecat-ai[together]"` | Pipecat |
 | [Typecast](/api-reference/server/services/tts/typecast) | `uv add pipecat-ai-typecast` | Community |
 | [Uplift AI](/api-reference/server/services/tts/upliftai) | `uv pip install git+https://github.com/havkerboi123/pipecat-upliftai-tts.git` | Community |
 | [Voice.ai](/api-reference/server/services/tts/voiceai) | `uv pip install git+https://github.com/voice-ai/voice-ai-pipecat-tts.git` | Community |
diff --git a/api-reference/server/services/tts/together.mdx b/api-reference/server/services/tts/together.mdx
new file mode 100644
index 00000000..eac7fbe6
--- /dev/null
+++ b/api-reference/server/services/tts/together.mdx
@@ -0,0 +1,154 @@
+---
+title: "Together AI"
+description: "Text-to-speech service using Together AI's real-time WebSocket API"
+---
+
+## Overview
+
+`TogetherTTSService` provides real-time text-to-speech using Together AI's WebSocket API. It supports streaming synthesis with configurable voice and model options, interruption handling, and automatic reconnection.
+
+  Pipecat's API methods for Together AI TTS
+
+  Complete voice bot example
+
+  Official Together AI TTS WebSocket API documentation
+
+  Access models and manage API keys
+
+## Installation
+
+To use Together AI TTS services, install the required dependencies:
+
+```bash
+uv add "pipecat-ai[together]"
+```
+
+## Prerequisites
+
+### Together AI Account Setup
+
+Before using Together AI TTS services, you need:
+
+1. **Together AI Account**: Sign up at [Together AI](https://together.ai/)
+2. **API Key**: Generate an API key from your account dashboard
+3. **Model Selection**: Choose from available TTS models and voices
+
+### Required Environment Variables
+
+- `TOGETHER_API_KEY`: Your Together AI API key for authentication
+
+## Configuration
+
+  Together AI API key for authentication.
+
+  WebSocket URL for the Together AI TTS API.
+
+  Output sample rate for emitted PCM frames. Together AI streams at 24 kHz and
+  does not support other rates.
+
+  Runtime-configurable settings. See [Settings](#settings) below.
+
+### Settings
+
+Runtime-configurable settings passed via the `settings` constructor argument using `TogetherTTSService.Settings(...)`. These can be updated mid-conversation with `TTSUpdateSettingsFrame`. See [Service Settings](/pipecat/fundamentals/service-settings) for details.
+
+| Parameter            | Type              | Default                | Description                                                   |
+| -------------------- | ----------------- | ---------------------- | ------------------------------------------------------------- |
+| `model`              | `str`             | `"hexgrad/Kokoro-82M"` | Model identifier. _(Inherited.)_                              |
+| `voice`              | `str`             | `"af_heart"`           | Voice identifier. _(Inherited.)_                              |
+| `language`           | `Language \| str` | `Language.EN`          | Language for synthesis. _(Inherited.)_                        |
+| `max_partial_length` | `int \| None`     | `None`                 | Maximum partial text length for streaming. `None` for no cap. |
+
+## Usage
+
+### Basic Setup
+
+```python
+import os
+from pipecat.services.together import TogetherTTSService
+
+tts = TogetherTTSService(
+    api_key=os.getenv("TOGETHER_API_KEY"),
+)
+```
+
+### With Custom Settings
+
+```python
+import os
+from pipecat.services.together import TogetherTTSService
+from pipecat.transcriptions.language import Language
+
+tts = TogetherTTSService(
+    api_key=os.getenv("TOGETHER_API_KEY"),
+    settings=TogetherTTSService.Settings(
+        model="hexgrad/Kokoro-82M",
+        voice="af_heart",
+        language=Language.EN,
+    ),
+)
+```
+
+### In a Voice Pipeline
+
+```python
+import os
+from pipecat.pipeline.pipeline import Pipeline
+from pipecat.services.together import TogetherTTSService
+
+tts = TogetherTTSService(
+    api_key=os.getenv("TOGETHER_API_KEY"),
+    settings=TogetherTTSService.Settings(
+        voice="af_heart",
+        model="hexgrad/Kokoro-82M",
+    ),
+)
+
+pipeline = Pipeline([
+    # ... upstream processors
+    llm,
+    tts,
+    transport.output(),
+])
+```
+
+## Notes
+
+- Together AI TTS streams audio at 24 kHz. The service outputs 24 kHz signed 16-bit mono PCM; the transport layer resamples to the pipeline's configured rate if needed.
+- The service supports interruption handling and automatically clears the text buffer when interrupted.
+- Audio is streamed incrementally via WebSocket deltas for low-latency synthesis.
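The TTS notes above state that the service emits 24 kHz, signed 16-bit, mono PCM and leaves resampling to the transport. The sketch below works through what that format means for buffer sizes and durations. It is a standalone illustration, not Pipecat code: the constant and helper names (`TOGETHER_TTS_SAMPLE_RATE`, `bytes_per_second`, `pcm_duration_seconds`, `resampled_sample_count`) are hypothetical, and the only facts taken from the page are the rate, sample width, and channel count.

```python
# Hypothetical helpers (not part of Pipecat): arithmetic for the PCM format
# described in the Together AI TTS notes (24 kHz, signed 16-bit, mono).
TOGETHER_TTS_SAMPLE_RATE = 24_000  # Hz, per the notes
BYTES_PER_SAMPLE = 2               # signed 16-bit
CHANNELS = 1                       # mono


def bytes_per_second(sample_rate: int = TOGETHER_TTS_SAMPLE_RATE,
                     channels: int = CHANNELS) -> int:
    """Bytes of raw PCM produced per second of audio."""
    return sample_rate * channels * BYTES_PER_SAMPLE


def pcm_duration_seconds(num_bytes: int,
                         sample_rate: int = TOGETHER_TTS_SAMPLE_RATE,
                         channels: int = CHANNELS) -> float:
    """Duration of a PCM chunk; rejects byte counts that split a sample frame."""
    frame_size = channels * BYTES_PER_SAMPLE
    if num_bytes % frame_size:
        raise ValueError(f"{num_bytes} bytes is not a whole number of {frame_size}-byte frames")
    return num_bytes / bytes_per_second(sample_rate, channels)


def resampled_sample_count(num_samples: int, src_rate: int, dst_rate: int) -> int:
    """Samples per channel after converting src_rate -> dst_rate.

    Floors the result; a real streaming resampler may round differently or
    carry fractional state between chunks.
    """
    return num_samples * dst_rate // src_rate


if __name__ == "__main__":
    print(bytes_per_second())                            # 48000
    print(pcm_duration_seconds(960))                     # 0.02 (480 samples)
    print(resampled_sample_count(480, 24_000, 16_000))   # 320
```

For example, 20 ms of audio is 480 samples (960 bytes) at 24 kHz and 320 samples after resampling for a 16 kHz pipeline.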
diff --git a/docs.json b/docs.json
index 1a0a1104..f22e5bb4 100644
--- a/docs.json
+++ b/docs.json
@@ -374,6 +374,7 @@
 "api-reference/server/services/stt/smallest",
 "api-reference/server/services/stt/soniox",
 "api-reference/server/services/stt/speechmatics",
+"api-reference/server/services/stt/together",
 "api-reference/server/services/stt/upliftai",
 "api-reference/server/services/stt/whisper",
 "api-reference/server/services/stt/xai"
@@ -448,6 +449,7 @@
 "api-reference/server/services/tts/soniox",
 "api-reference/server/services/tts/speechmatics",
 "api-reference/server/services/tts/supertonic",
+"api-reference/server/services/tts/together",
 "api-reference/server/services/tts/tts-cache",
 "api-reference/server/services/tts/typecast",
 "api-reference/server/services/tts/upliftai",
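The STT page in this diff says `TogetherSTTService` talks to Together AI over an OpenAI-compatible WebSocket protocol and commits transcription when `VADUserStoppedSpeakingFrame` is received. Below is a minimal sketch of that append-then-commit flow. It assumes that Together AI accepts the OpenAI Realtime client events `input_audio_buffer.append` (base64-encoded audio) and `input_audio_buffer.commit`. This is an assumption, so confirm the exact event names against Together AI's Realtime API documentation. The `AudioCommitter` class is hypothetical and uses only the standard library. It collects JSON messages in a list instead of opening a real socket, so it says nothing about how `TogetherSTTService` is implemented.

```python
import base64
import json


class AudioCommitter:
    """Hypothetical append-then-commit client sketch (not Pipecat code).

    Event names follow the OpenAI Realtime convention; treating them as
    Together AI's wire format is an assumption.
    """

    def __init__(self) -> None:
        self.outbox: list[str] = []  # JSON messages that would be sent over the WebSocket
        self._pending_bytes = 0      # audio appended since the last commit

    def on_audio(self, pcm: bytes) -> None:
        # Stream each chunk as it arrives, base64-encoded inside a JSON event.
        self.outbox.append(json.dumps({
            "type": "input_audio_buffer.append",
            "audio": base64.b64encode(pcm).decode("ascii"),
        }))
        self._pending_bytes += len(pcm)

    def on_user_stopped_speaking(self) -> None:
        # Plays the role of handling VADUserStoppedSpeakingFrame: ask the server
        # to finalize a transcript for everything appended since the last commit.
        if self._pending_bytes == 0:
            return  # nothing new buffered, so skip an empty commit
        self.outbox.append(json.dumps({"type": "input_audio_buffer.commit"}))
        self._pending_bytes = 0


if __name__ == "__main__":
    client = AudioCommitter()
    client.on_audio(b"\x00\x01" * 160)
    client.on_audio(b"\x02\x03" * 160)
    client.on_user_stopped_speaking()
    client.on_user_stopped_speaking()  # no new audio, so no second commit
    # Prints "input_audio_buffer.append" twice, then "input_audio_buffer.commit" once.
    for message in client.outbox:
        print(json.loads(message)["type"])
```

Skipping a commit when nothing has been appended since the previous one is a choice made for this sketch. It keeps a VAD event that fires without new audio from requesting an empty transcription.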