[MLE-5159] docs(audio-ws): correct response format from WAV to Raw PCM + fix sample voice #252
Merged
rishabh-bhargava merged 2 commits into main on Apr 27, 2026
Conversation
… PCM" The Together WS endpoint streams raw PCM s16le samples with no RIFF/WAVE header, base64-wrapped per audio_output.delta event. The previous "WAV (PCM s16le)" claim led developers to write the bytes to a .wav file and find that no player accepts them (afplay, QuickTime, VLC all reject the file because there is no WAV magic). Updates the audio format description and the two code samples (Python, Node.js) to save to .pcm rather than .wav, matching the actual on-the-wire format. Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
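Since each `audio_output.delta` event carries base64-wrapped raw PCM, the client's job is just decode-and-append. A minimal sketch of that loop body (the event field names `type` and `delta` are assumptions for illustration; consult the Together WS reference for the exact schema):

```python
import base64

def append_pcm(event: dict, out_path: str = "output.pcm") -> int:
    """Decode one base64-wrapped delta event and append the raw s16le bytes.

    Returns the number of PCM bytes written (0 for non-audio events).
    Note: the file gets a .pcm extension because the stream has no
    RIFF/WAVE header -- saving it as .wav produces an unplayable file.
    """
    if event.get("type") != "audio_output.delta":
        return 0
    chunk = base64.b64decode(event["delta"])
    with open(out_path, "ab") as f:
        f.write(chunk)
    return len(chunk)
```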
…d event The voice 'tara' belongs to Orpheus, not Kokoro. Kokoro's default voice 'af_heart' is the popular choice and exists in the catalog. Running the sample as written produced an immediate conversation.item.tts.failed (Voice 'tara' is not available for model 'hexgrad/Kokoro-82M'). The Python sample compounded that with an unconditional session_data['session']['id'] access on the first message — when the first message is tts.failed instead of session.created, that crashes with KeyError before any code can react. Added a guard so the sample fails gracefully with the actual error message. JS sample already gated on message.type === 'session.created' so no event-handling change is needed there. Verified end-to-end: with the fixes applied, the sample now writes 257012 bytes (≈ 5.35 s of raw PCM s16le @ 24 kHz mono) to output.pcm. ffmpeg wraps it cleanly and afplay plays it. Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
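The guard described above can be sketched as follows (names are illustrative, not the exact docs sample): instead of reading `session_data['session']['id']` unconditionally, check the message type first so a `tts.failed` first message surfaces the server's error rather than a `KeyError`.

```python
import json

def handle_first_message(raw: str) -> str:
    """Return the session id, or fail loudly with the server's own error.

    Hypothetical sketch: when the first event is e.g.
    conversation.item.tts.failed (invalid voice), the original sample
    crashed with KeyError: 'session' before the error was visible.
    """
    data = json.loads(raw)
    if data.get("type") != "session.created":
        raise RuntimeError(f"expected session.created, got: {data}")
    return data["session"]["id"]
```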
✱ Stainless preview builds — this PR will update the go, openapi, python, terraform, and typescript builds
zainhas approved these changes Apr 27, 2026
Summary
The WS docs at https://docs.together.ai/reference/audio-speech-websocket are misleading on three independent points; this PR fixes all three.
- **Format:** the docs said `Format: WAV (PCM s16le)`, but the WS streams raw PCM s16le bytes with no RIFF/WAVE header. A developer who saves the bytes with a `.wav` extension gets a file that no standard player will open (`afplay` returns `Error: AudioFileOpen failed ('typ?')`). Updated to `Format: Raw PCM (s16le, mono)`.
- **Output filename:** the code samples saved to `output.wav`. Updated to `output.pcm` (and the print/console messages match).
- **Voice:** the samples used `voice=tara`, which belongs to Orpheus, not Kokoro. Running the docs sample literally returns immediately with `Voice 'tara' is not available for model 'hexgrad/Kokoro-82M'. Available voices: af_heart, ...`. Updated to `voice=af_heart`. Also added a `session.created` guard in the Python sample so a future failure-on-first-event doesn't crash the script with `KeyError: 'session'` before the user can see what went wrong.

Linear: MLE-5159
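Because the stream is headerless s16le, making it playable only requires prepending a WAV header. A minimal stdlib sketch, assuming 24 kHz mono to match the format described above (equivalent in spirit to the `ffmpeg -f s16le` wrap used in the test plan):

```python
import wave

def pcm_to_wav(pcm_path: str, wav_path: str,
               rate: int = 24000, channels: int = 1) -> None:
    """Wrap raw s16le PCM in a RIFF/WAVE header so standard players accept it.

    Comparable to: ffmpeg -f s16le -ar 24000 -ac 1 -i output.pcm output.wav
    """
    with open(pcm_path, "rb") as f:
        frames = f.read()
    with wave.open(wav_path, "wb") as w:
        w.setnchannels(channels)
        w.setsampwidth(2)   # s16le = 2 bytes per sample
        w.setframerate(rate)
        w.writeframes(frames)
```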
Test Plan
- Before the fix: the sample crashes with `KeyError: 'session'` because the first server event is `tts.failed`. No output file is produced.
- After the fix: the sample writes `output.pcm` for the three example sentences (≈ 5.35 s of audio at 24 kHz s16le mono).
- `ffmpeg -f s16le -ar 24000 -ac 1 -i output.pcm output.wav` → plays cleanly via `afplay` (exit 0). Confirms the fix matches reality.

Deploy chain
After this PR merges, the existing `sync-openapi-spec-to-docs.yml` workflow auto-opens a sync PR against `togethercomputer/mintlify-docs`. Once that secondary PR is approved and merged, Mintlify rebuilds and the changes go live at docs.together.ai.

🤖 Generated with Claude Code