diff --git a/docs/codedocs/api-reference/agent.md b/docs/codedocs/api-reference/agent.md new file mode 100644 index 00000000..f2aed0da --- /dev/null +++ b/docs/codedocs/api-reference/agent.md @@ -0,0 +1,72 @@ +--- +title: "Agent" +description: "Reference for the realtime agent websocket and think-model discovery APIs." +--- + +The Agent domain is the runtime surface for realtime conversational sessions. + +## Imports + +```python +from deepgram import DeepgramClient +from deepgram.agent.v1.types import AgentV1Settings +``` + +Source files: + +- `src/deepgram/agent/v1/client.py` +- `src/deepgram/agent/v1/socket_client.py` +- `src/deepgram/agent/v1/settings/think/models/client.py` + +## `V1Client.connect` + +Import path: `client.agent.v1` + +```python +connect( + *, + authorization: str | None = None, + request_options: RequestOptions | None = None, +) -> Iterator[V1SocketClient] +``` + +## `V1SocketClient` Methods + +Source: `src/deepgram/agent/v1/socket_client.py` + +- `start_listening()` +- `send_settings(message: AgentV1Settings) -> None` +- `send_update_speak(message: AgentV1UpdateSpeak) -> None` +- `send_inject_user_message(message: AgentV1InjectUserMessage) -> None` +- `send_inject_agent_message(message: AgentV1InjectAgentMessage) -> None` +- `send_function_call_response(message: AgentV1SendFunctionCallResponse) -> None` +- `send_keep_alive(message: AgentV1KeepAlive | None = None) -> None` +- `send_update_prompt(message: AgentV1UpdatePrompt) -> None` +- `send_update_think(message: AgentV1UpdateThink) -> None` +- `send_media(message: bytes) -> None` +- `recv() -> V1SocketClientResponse` + +`V1SocketClientResponse` is a union of typed agent events such as `AgentV1ConversationText`, `AgentV1AgentThinking`, `AgentV1FunctionCallRequest`, `AgentV1Warning`, `AgentV1History`, plus raw `bytes` audio. 
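Because `recv()` can yield either raw audio bytes or a typed event model, receive loops usually branch on the payload type first. A minimal dispatch sketch of that pattern (the `route_agent_event` helper and the `FakeConversationText` stand-in are illustrative, not part of the SDK):

```python
def route_agent_event(message):
    """Classify one V1SocketClientResponse payload.

    Raw bytes frames carry agent audio; every other payload is a typed
    event model such as AgentV1ConversationText or AgentV1Warning.
    """
    if isinstance(message, (bytes, bytearray)):
        return ("audio", len(message))
    return ("event", type(message).__name__)


class FakeConversationText:
    """Stand-in for a typed agent event, for illustration only."""


print(route_agent_event(b"\x00\x01\x02"))         # a raw audio frame
print(route_agent_event(FakeConversationText()))  # a typed event
```

With a live session, the same branch would sit inside a loop that handles each payload from the socket client until the stream closes.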
+ +## Think Model Discovery + +Import path: `client.agent.v1.settings.think.models` + +```python +list(*, request_options: RequestOptions | None = None) -> AgentThinkModelsV1Response +``` + +This endpoint comes from `src/deepgram/agent/v1/settings/think/models/client.py` and lets you inspect supported think providers before composing agent settings. + +## Example + +```python +with client.agent.v1.connect() as agent: + agent.send_settings(settings) + agent.send_media(b"...audio chunk...") + agent.start_listening() +``` + +## Implementation Note + +Before sending a Pydantic model, the socket client runs `_sanitize_numeric_types(...)`. That helper converts values such as `24000.0` into `24000` so integer-only API fields survive serialization correctly. diff --git a/docs/codedocs/api-reference/auth.md b/docs/codedocs/api-reference/auth.md new file mode 100644 index 00000000..f52cede3 --- /dev/null +++ b/docs/codedocs/api-reference/auth.md @@ -0,0 +1,49 @@ +--- +title: "Auth" +description: "Reference for token generation in the Deepgram Python SDK." +--- + +The Auth domain is intentionally small. Its main job is to mint temporary access tokens from an API key. + +## Imports + +```python +from deepgram import DeepgramClient +``` + +Source files: + +- `src/deepgram/auth/client.py` +- `src/deepgram/auth/v1/client.py` +- `src/deepgram/auth/v1/tokens/client.py` + +## `TokensClient.grant` + +Import path: `client.auth.v1.tokens` + +```python +grant( + *, + ttl_seconds: float | None = OMIT, + request_options: RequestOptions | None = None, +) -> GrantV1Response +``` + +| Parameter | Type | Default | Description | +|-----------|------|---------|-------------| +| `ttl_seconds` | `float \| None` | API default, documented as 30 seconds | Lifetime for the generated JWT. | +| `request_options` | `RequestOptions \| None` | `None` | Per-request headers, retries, and timeout. 
| + +## Example + +```python +issuer = DeepgramClient(api_key="dg_server_key") +token = issuer.auth.v1.tokens.grant(ttl_seconds=60) + +client = DeepgramClient(access_token=token.access_token) +``` + +## Notes + +- The generated docstring in `src/deepgram/auth/v1/tokens/client.py` states that the token carries `usage::write` permission for core voice APIs. +- Tokens created here do not replace the Manage APIs for project administration. diff --git a/docs/codedocs/api-reference/deepgram-client.md b/docs/codedocs/api-reference/deepgram-client.md new file mode 100644 index 00000000..888d90f2 --- /dev/null +++ b/docs/codedocs/api-reference/deepgram-client.md @@ -0,0 +1,137 @@ +--- +title: "Deepgram Client" +description: "Reference for the root sync and async clients, shared configuration, and request options." +--- + +The root client layer lives in `src/deepgram/client.py` and `src/deepgram/base_client.py`. Import these classes from `deepgram`. + +## Imports + +```python +from deepgram import DeepgramClient, AsyncDeepgramClient, DeepgramClientEnvironment +from deepgram.core.request_options import RequestOptions +``` + +## Classes + +### `DeepgramClient` + +Source: `src/deepgram/client.py` + +```python +DeepgramClient( + *, + environment: DeepgramClientEnvironment = DeepgramClientEnvironment.PRODUCTION, + api_key: str | None = os.getenv("DEEPGRAM_API_KEY"), + headers: dict[str, str] | None = None, + timeout: float | None = None, + follow_redirects: bool | None = True, + httpx_client: httpx.Client | None = None, + logging: LogConfig | Logger | None = None, + access_token: str | None = None, + session_id: str | None = None, + transport_factory: callable | None = None, +) +``` + +### `AsyncDeepgramClient` + +Source: `src/deepgram/client.py` + +```python +AsyncDeepgramClient( + *, + environment: DeepgramClientEnvironment = DeepgramClientEnvironment.PRODUCTION, + api_key: str | None = os.getenv("DEEPGRAM_API_KEY"), + headers: dict[str, str] | None = None, + timeout: float | 
None = None, + follow_redirects: bool | None = True, + httpx_client: httpx.AsyncClient | None = None, + logging: LogConfig | Logger | None = None, + access_token: str | None = None, + session_id: str | None = None, + transport_factory: callable | None = None, +) +``` + +## Constructor Options + +| Parameter | Type | Default | Description | +|-----------|------|---------|-------------| +| `environment` | `DeepgramClientEnvironment` | `DeepgramClientEnvironment.PRODUCTION` | Chooses HTTPS and WebSocket base URLs. | +| `api_key` | `str \| None` | `DEEPGRAM_API_KEY` | Standard token-style credential for generated clients. | +| `access_token` | `str \| None` | `None` | Forces bearer auth through the hand-written override layer. | +| `session_id` | `str \| None` | auto-generated UUID | Sent as `x-deepgram-session-id` on REST and WebSocket calls. | +| `headers` | `dict[str, str] \| None` | `None` | Extra shared headers. | +| `timeout` | `float \| None` | `60` when using default clients | Base timeout in seconds. | +| `follow_redirects` | `bool \| None` | `True` | Redirect handling for default `httpx` clients. | +| `httpx_client` | `httpx.Client \| httpx.AsyncClient \| None` | generated default | Bring-your-own HTTP transport, pools, or proxies. | +| `logging` | `LogConfig \| Logger \| None` | `None` | Shared SDK logging config. | +| `transport_factory` | callable \| `None` | `None` | Global override for generated websocket transport calls. | + +## Domain Properties + +These properties are created lazily in `src/deepgram/base_client.py`. 
| Property | Return type | Import path | +|----------|-------------|-------------| +| `agent` | `AgentClient` / `AsyncAgentClient` | `deepgram.agent` | +| `auth` | `AuthClient` / `AsyncAuthClient` | `deepgram.auth` | +| `listen` | `ListenClient` / `AsyncListenClient` | `deepgram.listen` | +| `manage` | `ManageClient` / `AsyncManageClient` | `deepgram.manage` | +| `read` | `ReadClient` / `AsyncReadClient` | `deepgram.read` | +| `self_hosted` | `SelfHostedClient` / `AsyncSelfHostedClient` | `deepgram.self_hosted` | +| `speak` | `SpeakClient` / `AsyncSpeakClient` | `deepgram.speak` | +| `voice_agent` | `VoiceAgentClient` / `AsyncVoiceAgentClient` | `deepgram.voice_agent` | + +## Supporting Types + +### `DeepgramClientEnvironment` + +Source: `src/deepgram/environment.py` + +```python +class DeepgramClientEnvironment: + PRODUCTION: DeepgramClientEnvironment + AGENT: DeepgramClientEnvironment + + def __init__(self, *, base: str, agent: str, production: str) +``` + +### `RequestOptions` + +Source: `src/deepgram/core/request_options.py` + +```python +class RequestOptions(TypedDict, total=False): + timeout_in_seconds: int + max_retries: int + additional_headers: dict[str, Any] + additional_query_parameters: dict[str, Any] + additional_body_parameters: dict[str, Any] + chunk_size: int +``` + +## Example + +```python +from deepgram import DeepgramClient + +client = DeepgramClient( + access_token="TEMP_TOKEN", + session_id="call-17", + headers={"X-App-Name": "triage-worker"}, +) + +response = client.read.v1.text.analyze( + request={"text": "Customer wants to upgrade the subscription."}, + intents=True, + request_options={"timeout_in_seconds": 15}, +) +``` + +## Related Pages + +- `/docs/api-reference/listen` +- `/docs/api-reference/speak` +- `/docs/client-lifecycle` diff --git a/docs/codedocs/api-reference/helpers.md b/docs/codedocs/api-reference/helpers.md new file mode 100644 index 00000000..eee6b774 --- /dev/null +++ b/docs/codedocs/api-reference/helpers.md @@ -0,0 +1,110
@@ +--- +title: "Helpers" +description: "Reference for helper utilities such as TextBuilder and custom transport protocols." +--- + +This SDK ships a small but important helper surface: text assembly for TTS and transport protocols for custom WebSocket integrations. + +## Imports + +```python +from deepgram.helpers import TextBuilder +from deepgram.transport import install_transport, restore_transport +from deepgram.transport_interface import SyncTransport, AsyncTransport +``` + +## `TextBuilder` + +Source: `src/deepgram/helpers/text_builder.py` + +```python +class TextBuilder: + def text(self, content: str) -> TextBuilder + def pronunciation(self, word: str, ipa: str) -> TextBuilder + def pause(self, duration_ms: int) -> TextBuilder + def from_ssml(self, ssml_text: str) -> TextBuilder + def build(self) -> str +``` + +Related free functions: + +```python +add_pronunciation(text: str, word: str, ipa: str) -> str +ssml_to_deepgram(ssml_text: str) -> str +``` + +### Example + +```python +from deepgram.helpers import TextBuilder + +text = ( + TextBuilder() + .text("Take ") + .pronunciation("methotrexate", "mɛθəˈtrɛkseɪt") + .text(" weekly.") + .build() +) +``` + +## Transport Utilities + +Source files: + +- `src/deepgram/transport.py` +- `src/deepgram/transport_interface.py` + +### Signatures + +```python +install_transport(*, sync_factory: callable | None = None, async_factory: callable | None = None) -> None +restore_transport() -> None +``` + +### Protocols + +```python +class SyncTransport(Protocol): + def send(self, data: Any) -> None + def recv(self) -> Any + def __iter__(self) -> Iterator + def close(self) -> None + +class AsyncTransport(Protocol): + async def send(self, data: Any) -> None + async def recv(self) -> Any + def __aiter__(self) -> Any + async def close(self) -> None +``` + +### Example + +```python +from deepgram import DeepgramClient + + +class MyTransport: + def __init__(self, url: str, headers: dict[str, str]): + self.url = url + self.headers = 
headers + + def send(self, data): + ... + + def recv(self): + ... + + def __iter__(self): + yield from () + + def close(self): + pass + + +client = DeepgramClient(api_key="dg_key", transport_factory=MyTransport) +``` + +## When To Use These Helpers + +- Use `TextBuilder` when pronunciation correctness matters. +- Use transport helpers when you need to integrate the SDK with a non-default websocket layer or a controlled test environment. diff --git a/docs/codedocs/api-reference/listen.md b/docs/codedocs/api-reference/listen.md new file mode 100644 index 00000000..d5d2304e --- /dev/null +++ b/docs/codedocs/api-reference/listen.md @@ -0,0 +1,245 @@ +--- +title: "Listen" +description: "Reference for batch and realtime speech-to-text clients in the Deepgram Python SDK." +--- + +The Listen domain combines REST transcription and realtime speech recognition. Root import path: `client.listen`. + +## Imports + +```python +from deepgram import DeepgramClient +from deepgram.core.events import EventType +from deepgram.listen.v2.types import ListenV2CloseStream +``` + +## Module Hierarchy + +```python +client.listen.v1.media +client.listen.v1.connect(...) +client.listen.v2.connect(...) 
+``` + +Source files: + +- `src/deepgram/listen/client.py` +- `src/deepgram/listen/v1/client.py` +- `src/deepgram/listen/v1/media/client.py` +- `src/deepgram/listen/v1/socket_client.py` +- `src/deepgram/listen/v2/client.py` +- `src/deepgram/listen/v2/socket_client.py` + +## `MediaClient` + +Import path: `client.listen.v1.media` + +### Signatures + +```python +transcribe_url( + *, + url: str, + callback: str | None = None, + callback_method: MediaTranscribeRequestCallbackMethod | None = None, + extra: str | Sequence[str] | None = None, + sentiment: bool | None = None, + summarize: MediaTranscribeRequestSummarize | None = None, + tag: str | Sequence[str] | None = None, + topics: bool | None = None, + custom_topic: str | Sequence[str] | None = None, + custom_topic_mode: MediaTranscribeRequestCustomTopicMode | None = None, + intents: bool | None = None, + custom_intent: str | Sequence[str] | None = None, + custom_intent_mode: MediaTranscribeRequestCustomIntentMode | None = None, + detect_entities: bool | None = None, + detect_language: bool | None = None, + diarize: bool | None = None, + dictation: bool | None = None, + encoding: MediaTranscribeRequestEncoding | None = None, + filler_words: bool | None = None, + keyterm: str | Sequence[str] | None = None, + keywords: str | Sequence[str] | None = None, + language: str | None = None, + measurements: bool | None = None, + model: MediaTranscribeRequestModel | None = None, + multichannel: bool | None = None, + numerals: bool | None = None, + paragraphs: bool | None = None, + profanity_filter: bool | None = None, + punctuate: bool | None = None, + redact: str | None = None, + replace: str | Sequence[str] | None = None, + search: str | Sequence[str] | None = None, + smart_format: bool | None = None, + utterances: bool | None = None, + utt_split: float | None = None, + version: MediaTranscribeRequestVersion | None = None, + mip_opt_out: bool | None = None, + request_options: RequestOptions | None = None, +) -> 
MediaTranscribeResponse + +transcribe_file( + *, + request: bytes | Iterator[bytes] | AsyncIterator[bytes], + callback: str | None = None, + callback_method: MediaTranscribeRequestCallbackMethod | None = None, + extra: str | Sequence[str] | None = None, + sentiment: bool | None = None, + summarize: MediaTranscribeRequestSummarize | None = None, + tag: str | Sequence[str] | None = None, + topics: bool | None = None, + custom_topic: str | Sequence[str] | None = None, + custom_topic_mode: MediaTranscribeRequestCustomTopicMode | None = None, + intents: bool | None = None, + custom_intent: str | Sequence[str] | None = None, + custom_intent_mode: MediaTranscribeRequestCustomIntentMode | None = None, + detect_entities: bool | None = None, + detect_language: bool | None = None, + diarize: bool | None = None, + dictation: bool | None = None, + encoding: MediaTranscribeRequestEncoding | None = None, + filler_words: bool | None = None, + keyterm: str | Sequence[str] | None = None, + keywords: str | Sequence[str] | None = None, + language: str | None = None, + measurements: bool | None = None, + model: MediaTranscribeRequestModel | None = None, + multichannel: bool | None = None, + numerals: bool | None = None, + paragraphs: bool | None = None, + profanity_filter: bool | None = None, + punctuate: bool | None = None, + redact: str | None = None, + replace: str | Sequence[str] | None = None, + search: str | Sequence[str] | None = None, + smart_format: bool | None = None, + utterances: bool | None = None, + utt_split: float | None = None, + version: MediaTranscribeRequestVersion | None = None, + mip_opt_out: bool | None = None, + request_options: RequestOptions | None = None, +) -> MediaTranscribeResponse +``` + +### Parameters + +| Parameter | Type | Default | Description | +|-----------|------|---------|-------------| +| `url` | `str` | — | Hosted media URL for `transcribe_url(...)`. 
| +| `request` | `bytes \| Iterator[bytes] \| AsyncIterator[bytes]` | — | Raw file content or chunk iterator for `transcribe_file(...)`. | +| `model` | `MediaTranscribeRequestModel \| None` | `None` | Speech model such as `nova-3`. | +| `callback` | `str \| None` | `None` | Callback URL for asynchronous completion. | +| `callback_method` | enum \| `None` | `None` | HTTP method for the callback. | +| `language` | `str \| None` | `None` | BCP-47 language hint. | +| `encoding` | enum \| `None` | `None` | Audio encoding when Deepgram cannot infer it. | +| `detect_entities` | `bool \| None` | `None` | Entity extraction. | +| `detect_language` | `bool \| None` | `None` | Dominant-language detection. | +| `diarize` | `bool \| None` | `None` | Speaker diarization. | +| `dictation` | `bool \| None` | `None` | Dictation-specific formatting. | +| `keyterm`, `keywords`, `search`, `replace` | string or sequence | `None` | Prompting, search, and replacement controls. | +| `topics`, `intents`, `sentiment`, `summarize` | feature flags | `None` | Additional language-intelligence outputs. | +| `paragraphs`, `utterances`, `utt_split` | layout options | `None` | Segmentation controls for readability and utterance boundaries. | +| `smart_format`, `punctuate`, `numerals`, `measurements`, `profanity_filter`, `redact` | formatting controls | `None` | Output cleanup and normalization features. | +| `request_options` | `RequestOptions \| None` | `None` | Per-request headers, query params, retries, and timeout overrides. 
| + +### Example + +```python +response = client.listen.v1.media.transcribe_file( + request=open("audio.wav", "rb").read(), + model="nova-3", + smart_format=True, + diarize=True, +) +``` + +## `V1Client.connect` + +Import path: `client.listen.v1` + +```python +connect( + *, + model: ListenV1Model, + callback: ListenV1Callback | None = None, + callback_method: ListenV1CallbackMethod | None = None, + channels: ListenV1Channels | None = None, + detect_entities: ListenV1DetectEntities | None = None, + diarize: ListenV1Diarize | None = None, + dictation: ListenV1Dictation | None = None, + encoding: ListenV1Encoding | None = None, + endpointing: ListenV1Endpointing | None = None, + extra: ListenV1Extra | None = None, + interim_results: ListenV1InterimResults | None = None, + keyterm: ListenV1Keyterm | None = None, + keywords: ListenV1Keywords | None = None, + language: ListenV1Language | None = None, + mip_opt_out: ListenV1MipOptOut | None = None, + multichannel: ListenV1Multichannel | None = None, + numerals: ListenV1Numerals | None = None, + profanity_filter: ListenV1ProfanityFilter | None = None, + punctuate: ListenV1Punctuate | None = None, + redact: ListenV1Redact | None = None, + replace: ListenV1Replace | None = None, + sample_rate: ListenV1SampleRate | None = None, + search: ListenV1Search | None = None, + smart_format: ListenV1SmartFormat | None = None, + tag: ListenV1Tag | None = None, + utterance_end_ms: ListenV1UtteranceEndMs | None = None, + vad_events: ListenV1VadEvents | None = None, + version: ListenV1Version | None = None, + authorization: str | None = None, + request_options: RequestOptions | None = None, +) -> Iterator[V1SocketClient] +``` + +## `V2Client.connect` + +Import path: `client.listen.v2` + +```python +connect( + *, + model: ListenV2Model, + encoding: ListenV2Encoding | None = None, + sample_rate: ListenV2SampleRate | None = None, + eager_eot_threshold: ListenV2EagerEotThreshold | None = None, + eot_threshold: ListenV2EotThreshold | None = 
None, + eot_timeout_ms: ListenV2EotTimeoutMs | None = None, + keyterm: ListenV2KeytermParams | None = None, + mip_opt_out: ListenV2MipOptOut | None = None, + tag: ListenV2Tag | None = None, + authorization: str | None = None, + request_options: RequestOptions | None = None, +) -> Iterator[V2SocketClient] +``` + +## Socket Client Methods + +### `V1SocketClient` + +- `start_listening()` +- `send_media(message: bytes) -> None` +- `send_finalize(message: ListenV1Finalize | None = None) -> None` +- `send_close_stream(message: ListenV1CloseStream | None = None) -> None` +- `send_keep_alive(message: ListenV1KeepAlive | None = None) -> None` +- `recv() -> V1SocketClientResponse` + +### `V2SocketClient` + +- `start_listening()` +- `send_media(message: bytes) -> None` +- `send_close_stream(message: ListenV2CloseStream | None = None) -> None` +- `send_configure(message: Any) -> None` +- `recv() -> V2SocketClientResponse` + +### Streaming Example + +```python +with client.listen.v2.connect(model="flux-general-en", encoding="linear16", sample_rate=16000) as connection: + connection.on(EventType.MESSAGE, lambda message: print(message)) + connection.send_media(b"...pcm bytes...") + connection.send_close_stream(ListenV2CloseStream(type="CloseStream")) + connection.start_listening() +``` diff --git a/docs/codedocs/api-reference/manage.md b/docs/codedocs/api-reference/manage.md new file mode 100644 index 00000000..90b1a7f4 --- /dev/null +++ b/docs/codedocs/api-reference/manage.md @@ -0,0 +1,156 @@ +--- +title: "Manage" +description: "Reference for project, key, model, usage, and billing administration APIs." +--- + +The Manage domain is the administrative control plane for Deepgram projects. It is broad, so this page groups methods by subclient.
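Several subclients below (usage, billing breakdowns) filter by `start`/`end` ISO date strings, and a small helper keeps those windows consistent across calls. A sketch assuming calendar-month reporting (the `month_window` helper is illustrative, not part of the SDK):

```python
from datetime import date, timedelta


def month_window(day: date) -> tuple[str, str]:
    """Return (start, end) ISO date strings covering day's calendar month."""
    start = day.replace(day=1)
    # Jump past the end of the month, snap to the 1st, then step back one day.
    next_first = (start + timedelta(days=32)).replace(day=1)
    return (start.isoformat(), (next_first - timedelta(days=1)).isoformat())


start, end = month_window(date(2026, 4, 15))
print(start, end)  # 2026-04-01 2026-04-30
```

The resulting strings drop straight into calls such as `usage.get(project_id=..., start=start, end=end)`.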
+ +## Imports + +```python +from deepgram import DeepgramClient +``` + +Source files include: + +- `src/deepgram/manage/v1/projects/client.py` +- `src/deepgram/manage/v1/projects/keys/client.py` +- `src/deepgram/manage/v1/projects/usage/client.py` +- `src/deepgram/manage/v1/models/client.py` + +## `ProjectsClient` + +Import path: `client.manage.v1.projects` + +```python +list(*, request_options: RequestOptions | None = None) -> ListProjectsV1Response +get(project_id: str, *, limit: float | None = None, page: float | None = None, request_options: RequestOptions | None = None) -> GetProjectV1Response +delete(project_id: str, *, request_options: RequestOptions | None = None) -> DeleteProjectV1Response +update(project_id: str, *, name: str | None = OMIT, request_options: RequestOptions | None = None) -> UpdateProjectV1Response +leave(project_id: str, *, request_options: RequestOptions | None = None) -> LeaveProjectV1Response +``` + +| Method | Description | +|--------|-------------| +| `list()` | Return projects visible to the API key. | +| `get()` | Fetch one project, optionally paginating embedded results. | +| `update()` | Rename or update project properties. | +| `delete()` | Delete a project. | +| `leave()` | Remove the authenticated member from a project. 
| + +## `KeysClient` + +Import path: `client.manage.v1.projects.keys` + +```python +list(project_id: str, *, status: KeysListRequestStatus | None = None, request_options: RequestOptions | None = None) -> ListProjectKeysV1Response +create(project_id: str, *, request: CreateKeyV1RequestOne, request_options: RequestOptions | None = None) -> CreateKeyV1Response +get(project_id: str, key_id: str, *, request_options: RequestOptions | None = None) -> GetProjectKeyV1Response +delete(project_id: str, key_id: str, *, request_options: RequestOptions | None = None) -> DeleteProjectKeyV1Response +``` + +## `ModelsClient` + +Import paths: + +- `client.manage.v1.models` +- `client.manage.v1.projects.models` + +```python +list(*, request_options: RequestOptions | None = None) -> ListModelsV1Response +get(model_id: str, *, request_options: RequestOptions | None = None) -> GetModelV1Response +``` + +## `UsageClient` + +Import path: `client.manage.v1.projects.usage` + +```python +get( + project_id: str, + *, + start: str | None = None, + end: str | None = None, + accessor: str | None = None, + alternatives: bool | None = None, + callback_method: bool | None = None, + callback: bool | None = None, + channels: bool | None = None, + custom_intent_mode: bool | None = None, + custom_intent: bool | None = None, + custom_topic_mode: bool | None = None, + custom_topic: bool | None = None, + deployment: UsageGetRequestDeployment | None = None, + detect_entities: bool | None = None, + detect_language: bool | None = None, + diarize: bool | None = None, + dictation: bool | None = None, + encoding: bool | None = None, + endpoint: UsageGetRequestEndpoint | None = None, + extra: bool | None = None, + filler_words: bool | None = None, + intents: bool | None = None, + keyterm: bool | None = None, + keywords: bool | None = None, + language: bool | None = None, + measurements: bool | None = None, + method: UsageGetRequestMethod | None = None, + model: str | None = None, + multichannel: bool | None = 
None, + numerals: bool | None = None, + paragraphs: bool | None = None, + profanity_filter: bool | None = None, + punctuate: bool | None = None, + redact: bool | None = None, + replace: bool | None = None, + sample_rate: bool | None = None, + search: bool | None = None, + sentiment: bool | None = None, + smart_format: bool | None = None, + summarize: bool | None = None, + tag: str | None = None, + topics: bool | None = None, + utt_split: bool | None = None, + utterances: bool | None = None, + version: bool | None = None, + request_options: RequestOptions | None = None, +) -> UsageV1Response +``` + +The `get(...)` signature is intentionally large because it can filter by endpoint shape and nearly every major request feature used across the API surface. + +Additional usage subclients: + +- `client.manage.v1.projects.usage.breakdown.get(...)` +- `client.manage.v1.projects.usage.fields.list(...)` + +## Billing Subclients + +Import path: `client.manage.v1.projects.billing` + +- `balances.list(project_id, ...)` +- `balances.get(project_id, balance_id, ...)` +- `breakdown.list(project_id, ...)` +- `fields.list(project_id, ...)` +- `purchases.list(project_id, ...)` + +## Example + +```python +client = DeepgramClient() +project_id = "PROJECT_ID" + +projects = client.manage.v1.projects.list() +keys = client.manage.v1.projects.keys.list(project_id=project_id, status="active") +usage = client.manage.v1.projects.usage.get( + project_id=project_id, + start="2026-04-01", + end="2026-04-30", + model="nova-3", +) +``` + +## Related Modules + +- `/docs/api-reference/voice-agent-self-hosted` +- `/docs/guides/management-and-automation` diff --git a/docs/codedocs/api-reference/read.md b/docs/codedocs/api-reference/read.md new file mode 100644 index 00000000..f37e30ee --- /dev/null +++ b/docs/codedocs/api-reference/read.md @@ -0,0 +1,75 @@ +--- +title: "Read" +description: "Reference for text analysis methods in the Deepgram Python SDK."
+--- + +The Read domain is the text intelligence surface. It is smaller than Listen or Speak, but it is the main place to request summaries, topics, intents, and sentiment over plain text. + +## Imports + +```python +from deepgram import DeepgramClient +``` + +Source files: + +- `src/deepgram/read/client.py` +- `src/deepgram/read/v1/client.py` +- `src/deepgram/read/v1/text/client.py` + +## `TextClient.analyze` + +Import path: `client.read.v1.text` + +```python +analyze( + *, + request: ReadV1RequestParams, + callback: str | None = None, + callback_method: TextAnalyzeRequestCallbackMethod | None = None, + sentiment: bool | None = None, + summarize: TextAnalyzeRequestSummarize | None = None, + tag: str | Sequence[str] | None = None, + topics: bool | None = None, + custom_topic: str | Sequence[str] | None = None, + custom_topic_mode: TextAnalyzeRequestCustomTopicMode | None = None, + intents: bool | None = None, + custom_intent: str | Sequence[str] | None = None, + custom_intent_mode: TextAnalyzeRequestCustomIntentMode | None = None, + language: str | None = None, + request_options: RequestOptions | None = None, +) -> ReadV1Response +``` + +| Parameter | Type | Default | Description | +|-----------|------|---------|-------------| +| `request` | `ReadV1RequestParams` | — | Text payload, commonly `{"text": "..."}` or a URL-backed request. | +| `callback` | `str \| None` | `None` | Optional callback URL. | +| `callback_method` | enum \| `None` | `None` | HTTP method for the callback. | +| `sentiment` | `bool \| None` | `None` | Return sentiment analysis. | +| `summarize` | enum \| `None` | `None` | Return summary output. | +| `topics` | `bool \| None` | `None` | Return topic detection. | +| `custom_topic`, `custom_topic_mode` | string or sequence + enum | `None` | Constrain or extend topic matching. | +| `intents` | `bool \| None` | `None` | Return detected intents. 
| +| `custom_intent`, `custom_intent_mode` | string or sequence + enum | `None` | Constrain or extend intent matching. | +| `language` | `str \| None` | `None` | Primary language hint. | +| `tag` | `str \| Sequence[str] \| None` | `None` | Usage tag metadata. | +| `request_options` | `RequestOptions \| None` | `None` | Per-request headers, timeout, and retry settings. | + +## Example + +```python +response = client.read.v1.text.analyze( + request={"text": "The customer wants a refund and sounds frustrated."}, + language="en", + sentiment=True, + intents=True, + summarize=True, +) + +print(response.results.summary.text) +``` + +## Return Type + +`ReadV1Response` contains a `results` object with optional sentiment, summary, topic, and intent sections. The shape is model-based rather than plain dictionaries, so you access fields via attributes such as `response.results.summary.text`. diff --git a/docs/codedocs/api-reference/speak.md b/docs/codedocs/api-reference/speak.md new file mode 100644 index 00000000..647b5199 --- /dev/null +++ b/docs/codedocs/api-reference/speak.md @@ -0,0 +1,109 @@ +--- +title: "Speak" +description: "Reference for text-to-speech REST and WebSocket clients." +--- + +The Speak domain exposes both a REST API that yields audio bytes and a WebSocket API for interactive streaming TTS. 
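The REST path hands back an iterator of audio chunks rather than a single payload, so callers typically drain it straight to a file or playback buffer. A sketch of that consumption pattern (the `write_chunks` helper is illustrative, not part of the SDK):

```python
from typing import Iterable


def write_chunks(chunks: Iterable[bytes], path: str) -> int:
    """Drain an audio-chunk iterator to disk; return total bytes written."""
    total = 0
    with open(path, "wb") as out:
        for chunk in chunks:
            out.write(chunk)
            total += len(chunk)
    return total


# With a live client this would be, for example:
# write_chunks(client.speak.v1.audio.generate(text="Hi"), "out.wav")
print(write_chunks([b"RIFF", b"fake"], "out.raw"))  # 8
```

Streaming chunk-by-chunk keeps memory flat even for long synthesized passages, which is why the REST client exposes an iterator instead of one large bytes object.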
+ +## Imports + +```python +from deepgram import DeepgramClient +from deepgram.speak.v1.types import SpeakV1Text +``` + +Source files: + +- `src/deepgram/speak/client.py` +- `src/deepgram/speak/v1/client.py` +- `src/deepgram/speak/v1/audio/client.py` +- `src/deepgram/speak/v1/socket_client.py` + +## `AudioClient.generate` + +Import path: `client.speak.v1.audio` + +```python +generate( + *, + text: str, + callback: str | None = None, + callback_method: AudioGenerateRequestCallbackMethod | None = None, + mip_opt_out: bool | None = None, + tag: str | Sequence[str] | None = None, + bit_rate: float | None = None, + container: AudioGenerateRequestContainer | None = None, + encoding: AudioGenerateRequestEncoding | None = None, + model: AudioGenerateRequestModel | None = None, + sample_rate: float | None = None, + speed: float | None = None, + request_options: RequestOptions | None = None, +) -> Iterator[bytes] +``` + +| Parameter | Type | Default | Description | +|-----------|------|---------|-------------| +| `text` | `str` | — | Text to synthesize. | +| `callback` | `str \| None` | `None` | Optional callback URL. | +| `callback_method` | enum \| `None` | `None` | HTTP method for the callback. | +| `model` | enum \| `None` | `None` | Voice model, commonly an Aura model. | +| `encoding` | enum \| `None` | `None` | Output audio encoding. | +| `container` | enum \| `None` | `None` | Output file wrapper. | +| `sample_rate` | `float \| None` | `None` | Output sample rate. | +| `bit_rate` | `float \| None` | `None` | Output bitrate. | +| `speed` | `float \| None` | `None` | Speaking rate multiplier. | +| `tag` | `str \| Sequence[str] \| None` | `None` | Usage tag metadata. | +| `request_options` | `RequestOptions \| None` | `None` | Request timeout and `chunk_size` overrides. 
| +### Example + +```python +audio_chunks = client.speak.v1.audio.generate( + text="Hello from Deepgram.", + model="aura-2-asteria-en", +) +``` + +## `V1Client.connect` + +Import path: `client.speak.v1` + +```python +connect( + *, + encoding: SpeakV1Encoding | None = None, + mip_opt_out: SpeakV1MipOptOut | None = None, + model: SpeakV1Model | None = None, + sample_rate: SpeakV1SampleRate | None = None, + speed: SpeakV1Speed | None = None, + authorization: str | None = None, + request_options: RequestOptions | None = None, +) -> Iterator[V1SocketClient] +``` + +## `V1SocketClient` Methods + +Source: `src/deepgram/speak/v1/socket_client.py` + +- `start_listening()` +- `send_text(message: SpeakV1Text) -> None` +- `send_flush(message: SpeakV1Flush | None = None) -> None` +- `send_clear(message: SpeakV1Clear | None = None) -> None` +- `send_close(message: SpeakV1Close | None = None) -> None` +- `recv() -> V1SocketClientResponse` + +The response union includes binary audio plus metadata types such as `SpeakV1Metadata`, `SpeakV1Flushed`, `SpeakV1Cleared`, and `SpeakV1Warning`. + +### Example + +```python +with client.speak.v1.connect(model="aura-2-asteria-en", encoding="linear16", sample_rate=24000) as connection: + connection.send_text(SpeakV1Text(text="Hello from the websocket API.")) + connection.send_flush() + connection.send_close() + connection.start_listening() +``` + +## Related Helper + +If you need pronunciation and pause control, build the `text` value with `deepgram.helpers.TextBuilder`. See `/docs/api-reference/helpers` and `/docs/text-builder`. diff --git a/docs/codedocs/api-reference/voice-agent-self-hosted.md b/docs/codedocs/api-reference/voice-agent-self-hosted.md new file mode 100644 index 00000000..69fffe06 --- /dev/null +++ b/docs/codedocs/api-reference/voice-agent-self-hosted.md @@ -0,0 +1,81 @@ +--- +title: "Voice Agent And Self-Hosted" +description: "Reference for reusable voice-agent assets and self-hosted distribution credentials."
+--- + +This page covers the two control-plane domains that sit outside the core runtime websocket: reusable voice-agent assets and self-hosted distribution credentials. + +## Imports + +```python +from deepgram import DeepgramClient +``` + +## `voice_agent.configurations` + +Source: `src/deepgram/voice_agent/configurations/client.py` + +```python +list(project_id: str, *, request_options: RequestOptions | None = None) -> ListAgentConfigurationsV1Response +create(project_id: str, *, config: str, metadata: dict[str, str] | None = OMIT, api_version: int | None = OMIT, request_options: RequestOptions | None = None) -> CreateAgentConfigurationV1Response +get(project_id: str, agent_id: str, *, request_options: RequestOptions | None = None) -> AgentConfigurationV1 +update(project_id: str, agent_id: str, *, metadata: dict[str, str], request_options: RequestOptions | None = None) -> AgentConfigurationV1 +delete(project_id: str, agent_id: str, *, request_options: RequestOptions | None = None) -> DeleteAgentConfigurationV1Response +``` + +| Parameter | Type | Default | Description | +|-----------|------|---------|-------------| +| `project_id` | `str` | — | Project that owns the config. | +| `config` | `str` | — | JSON string representing the `agent` block from a settings message. | +| `metadata` | `dict[str, str] \| None` | omitted | Labels for organization or deployment metadata. | +| `api_version` | `int \| None` | omitted | Configuration API version. | +| `agent_id` | `str` | — | Stored configuration UUID. 
| + +## `voice_agent.variables` + +Source: `src/deepgram/voice_agent/variables/client.py` + +```python +list(project_id: str, *, request_options: RequestOptions | None = None) -> ListAgentVariablesV1Response +create(project_id: str, *, key: str, value: Any, api_version: int | None = OMIT, request_options: RequestOptions | None = None) -> AgentVariableV1 +get(project_id: str, variable_id: str, *, request_options: RequestOptions | None = None) -> AgentVariableV1 +delete(project_id: str, variable_id: str, *, request_options: RequestOptions | None = None) -> DeleteAgentVariableV1Response +update(project_id: str, variable_id: str, *, value: Any, request_options: RequestOptions | None = None) -> AgentVariableV1 +``` + +Variables follow the `DG_` style naming noted in the generated docstring and can substitute any JSON value into stored agent configurations. + +## `self_hosted.v1.distribution_credentials` + +Source: `src/deepgram/self_hosted/v1/distribution_credentials/client.py` + +```python +list(project_id: str, *, request_options: RequestOptions | None = None) -> ListProjectDistributionCredentialsV1Response +create( + project_id: str, + *, + scopes: DistributionCredentialsCreateRequestScopesItem | Sequence[DistributionCredentialsCreateRequestScopesItem] | None = None, + provider: Literal["quay"] | None = None, + comment: str | None = OMIT, + request_options: RequestOptions | None = None, +) -> CreateProjectDistributionCredentialsV1Response +get(project_id: str, distribution_credentials_id: str, *, request_options: RequestOptions | None = None) -> GetProjectDistributionCredentialsV1Response +delete(project_id: str, distribution_credentials_id: str, *, request_options: RequestOptions | None = None) -> GetProjectDistributionCredentialsV1Response +``` + +## Example + +```python +config = client.voice_agent.configurations.create( + project_id="PROJECT_ID", + config='{"listen":{"provider":{"type":"deepgram","model":"nova-3"}}}', + metadata={"team": "ops"}, +) + +credential 
= client.self_hosted.v1.distribution_credentials.create(
    project_id="PROJECT_ID",
    scopes=["self-hosted:products"],
    provider="quay",
    comment="CI pull credential",
)
```
diff --git a/docs/codedocs/architecture.md b/docs/codedocs/architecture.md new file mode 100644 index 00000000..022f02b6 --- /dev/null +++ b/docs/codedocs/architecture.md @@ -0,0 +1,89 @@ +--- +title: "Architecture" +description: "Understand how the Deepgram Python SDK is assembled from generated clients and hand-written extensions." +---

The SDK is mostly generated from Deepgram's API definition, but the entry layer and transport hooks are hand-written. That split is important because it explains why the public surface feels consistent while still supporting custom authentication and WebSocket transport overrides.

```mermaid
graph TD
    A[DeepgramClient / AsyncDeepgramClient<br/>src/deepgram/client.py] --> B[BaseClient / AsyncBaseClient<br/>src/deepgram/base_client.py]
    B --> C[SyncClientWrapper / AsyncClientWrapper<br/>src/deepgram/core/client_wrapper.py]
    C --> D[HttpClient / AsyncHttpClient<br/>src/deepgram/core/http_client.py]
    B --> E[listen]
    B --> F[speak]
    B --> G[read]
    B --> H[manage]
    B --> I[auth]
    B --> J[agent]
    B --> K[voice_agent]
    B --> L[self_hosted]
    E --> M[REST media client]
    E --> N[Listen socket clients]
    F --> O[Audio REST client]
    F --> P[Speak socket clients]
    J --> Q[Agent socket clients]
    A --> R["transport.install_transport()"]
    R --> N
    R --> P
    R --> Q
```

## Key Design Decisions

### 1. A hand-written root client wraps generated code

`src/deepgram/client.py` subclasses the generated `BaseClient` and `AsyncBaseClient` from `src/deepgram/base_client.py`. That extra layer exists so the SDK can accept `access_token`, `session_id`, and `transport_factory`, even though the generated base client only knows about `api_key`, headers, timeouts, and the underlying `httpx` client.

This design keeps the generated domain clients stable while letting Deepgram patch real-world integration issues in a small, readable entry point. It also means auth precedence is handled once, near client construction, instead of in every domain method.

### 2. Domain clients are loaded lazily

The generated `BaseClient` stores `_listen`, `_speak`, `_read`, `_manage`, `_auth`, `_agent`, `_voice_agent`, and `_self_hosted` as optional attributes. Each property creates the corresponding client only on first access. That keeps the root client light, avoids import churn at startup, and mirrors the shape of the API surface without forcing every project to import every submodule.

### 3. REST and WebSocket flows share one wrapper

`src/deepgram/core/client_wrapper.py` centralizes environment selection, default headers, timeout lookup, and logging configuration. REST methods call into `HttpClient` or `AsyncHttpClient`, while WebSocket clients call `client_wrapper.get_headers()` and `client_wrapper.get_environment()`. 
That shared wrapper is why a `session_id` header or authorization override applies consistently to both HTTP requests and realtime connections. + +### 4. Retry behavior is centralized in the HTTP layer + +`src/deepgram/core/http_client.py` handles retry timing from `Retry-After`, `retry-after-ms`, and `x-ratelimit-reset`, then falls back to exponential backoff with jitter. The generated service clients stay focused on endpoint parameters and response parsing; they do not duplicate retry logic. This is the right split because retry policy is infrastructure, not domain behavior. + +### 5. Custom transport support patches generated WebSocket clients globally + +`src/deepgram/transport.py` monkey-patches the generated Listen, Speak, and Agent WebSocket modules listed in `_TARGET_MODULES`. That is a pragmatic choice: the generated clients already call `websockets.connect`, so the SDK intercepts those module-level references instead of forking all generated socket code. The trade-off is that transport overrides are process-global, which the docs call out explicitly in the client lifecycle concept page. 
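The retry-delay policy described in decision 4 can be sketched in plain Python. This is an illustration of the behavior the paragraph describes, not the SDK's actual code; the header names follow the ones listed above, and the base, cap, and jitter values are assumptions.

```python
import random


def retry_delay_sketch(headers: dict, attempt: int, base: float = 0.5, cap: float = 30.0) -> float:
    """Pick a retry delay: prefer an explicit server hint, else backoff with jitter."""
    # Server-provided hints win. Real code would also parse HTTP-date forms of
    # Retry-After and the x-ratelimit-reset header mentioned in the text above.
    if "retry-after-ms" in headers:
        return int(headers["retry-after-ms"]) / 1000.0
    if "Retry-After" in headers:
        return float(headers["Retry-After"])
    # Fallback: capped exponential backoff with multiplicative jitter.
    return min(cap, base * (2 ** attempt)) * random.uniform(0.5, 1.0)
```

The point of the split is visible here: callers (the generated service clients) never see this logic, because delay selection is infrastructure owned by the HTTP layer.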
+ +## Request And Data Lifecycle + +```mermaid +sequenceDiagram + participant App + participant Root as DeepgramClient + participant Wrapper as ClientWrapper + participant Domain as Domain Client + participant HTTP as HttpClient / WebSocket + participant API as Deepgram API + + App->>Root: construct with api_key or access_token + Root->>Wrapper: install headers, session_id, environment + App->>Domain: call listen/speak/read/manage method + Domain->>Wrapper: resolve headers and environment + Domain->>HTTP: send REST request or open websocket + HTTP->>API: request with auth + custom headers + API-->>HTTP: response or stream event + HTTP-->>Domain: parsed body / typed socket payload + Domain-->>App: model object, iterator, or event callback +``` + +For REST calls, a generated client such as `src/deepgram/listen/v1/media/client.py` converts parameters into request bodies or query strings, then returns parsed Pydantic models. For streaming calls, modules such as `src/deepgram/listen/v2/client.py` build a WebSocket URL, merge `RequestOptions`, and return a socket client whose `start_listening()` loop emits `EventType.OPEN`, `EventType.MESSAGE`, `EventType.ERROR`, and `EventType.CLOSE` from `src/deepgram/core/events.py`. + +Incoming WebSocket payloads are parsed in the socket client modules using `construct_type(...)` from the internal unchecked model helper. Unknown message variants are skipped with a warning instead of crashing the stream. That makes the SDK more resilient to protocol evolution, especially for realtime endpoints where the server can add new event shapes over time. + +## How The Pieces Fit Together + +- Use the root client when you want stable authentication, shared headers, and one place to configure timeouts. +- Use generated domain clients when you need endpoint coverage; their methods match Deepgram API resources closely. 
+- Use socket clients when the endpoint requires bidirectional, realtime traffic; they layer event emission on top of the raw websocket. +- Use `RequestOptions` when a single request needs custom headers, query parameters, retries, or chunk sizing without changing global client configuration. + +The net result is a hybrid SDK: generated enough to keep endpoint coverage broad, but with hand-written seams where Python developers usually need control. diff --git a/docs/codedocs/client-lifecycle.md b/docs/codedocs/client-lifecycle.md new file mode 100644 index 00000000..0f351ec8 --- /dev/null +++ b/docs/codedocs/client-lifecycle.md @@ -0,0 +1,99 @@ +--- +title: "Client Lifecycle" +description: "Learn how DeepgramClient and AsyncDeepgramClient handle auth, configuration, sessions, and per-request overrides." +--- + +The core abstraction in this SDK is the root client. Everything else hangs off `DeepgramClient` or `AsyncDeepgramClient`, so understanding client construction explains most of the SDK's behavior. + +## What It Is + +`deepgram.DeepgramClient` and `deepgram.AsyncDeepgramClient` are hand-written subclasses in `src/deepgram/client.py` that extend the generated `BaseClient` classes from `src/deepgram/base_client.py`. They solve three practical problems that the generated code alone does not solve well: + +- choosing between API-key auth and bearer-token auth, +- attaching a stable session identifier to every request and websocket, +- swapping out the default WebSocket transport when you need a proxy, test double, or alternative runtime. + +These root clients relate directly to `DeepgramClientEnvironment`, `RequestOptions`, and the domain clients under `listen`, `speak`, `read`, `manage`, `auth`, `agent`, `voice_agent`, and `self_hosted`. + +## How It Works Internally + +When you instantiate a client, `src/deepgram/client.py` pulls `access_token`, `session_id`, and `transport_factory` out of `**kwargs` before delegating to the generated base client. 
It then: + +- creates or reuses an `x-deepgram-session-id` header, +- inserts a placeholder `api_key="token"` if you only passed an access token, +- overrides `client_wrapper.get_headers()` so REST and WebSocket calls both send `Authorization: bearer `, +- optionally patches generated websocket modules via `install_transport(...)`. + +The generated base classes in `src/deepgram/base_client.py` then create a shared `SyncClientWrapper` or `AsyncClientWrapper`. That wrapper stores the environment, base headers, timeout, and logging config, then exposes lazy properties for each domain client. + +## Basic Usage + +```python +from deepgram import DeepgramClient + +client = DeepgramClient( + api_key="YOUR_API_KEY", + timeout=30, + headers={"X-App-Name": "support-bot"}, +) + +response = client.read.v1.text.analyze( + request={"text": "The shipment arrived late, but support fixed it quickly."}, + language="en", + sentiment=True, + summarize=True, +) + +print(response.results.summary.text) +``` + +## Advanced Usage + +```python +from deepgram import AsyncDeepgramClient +from deepgram.core.request_options import RequestOptions + +client = AsyncDeepgramClient( + access_token="TEMPORARY_ACCESS_TOKEN", + session_id="call-42", +) + +request_options: RequestOptions = { + "timeout_in_seconds": 10, + "additional_headers": {"X-Trace-Id": "trace-42"}, + "additional_query_parameters": {"detect_language": ["en", "es"]}, +} + +# Any domain method can receive request_options without mutating client-wide config. +``` + +If you pass both `api_key` and `access_token`, the SDK will still force bearer-token authorization. That behavior comes from `_apply_bearer_authorization_override(...)` in `src/deepgram/client.py`, so debugging auth issues should start there rather than in the generated REST clients. 
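The precedence just described can be modeled as a tiny pure function. This is illustrative only — the real logic lives in `src/deepgram/client.py` — but it captures the documented rules: an access token always wins, an API key is used otherwise, and a session ID header is always present.

```python
import uuid


def resolve_auth_headers_sketch(api_key=None, access_token=None, session_id=None) -> dict:
    """Model the documented header precedence; not the SDK's actual implementation."""
    headers = {"x-deepgram-session-id": session_id or str(uuid.uuid4())}
    if access_token is not None:
        # Bearer auth wins even when an API key is also supplied.
        headers["Authorization"] = f"bearer {access_token}"
    elif api_key is not None:
        headers["Authorization"] = f"token {api_key}"
    return headers
```

Because both REST and WebSocket paths read headers from the same wrapper, a single function like this (conceptually) decides auth for the entire client.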
+ +## Constructor Options + +| Option | Type | Default | What it does | +|--------|------|---------|--------------| +| `environment` | `DeepgramClientEnvironment` | `DeepgramClientEnvironment.PRODUCTION` | Chooses the base HTTPS and WebSocket endpoints. | +| `api_key` | `str \| None` | `DEEPGRAM_API_KEY` | Server-side auth credential used by generated clients. | +| `access_token` | `str \| None` | `None` | Hand-written override that sends bearer auth instead of token auth. | +| `session_id` | `str \| None` | auto-generated UUID | Added as `x-deepgram-session-id` on every request and websocket. | +| `headers` | `dict[str, str] \| None` | `None` | Extra headers merged into the base wrapper headers. | +| `timeout` | `float \| None` | `60` unless a custom client is passed | Request timeout in seconds. | +| `follow_redirects` | `bool \| None` | `True` | Controls redirect behavior for the default `httpx` client. | +| `httpx_client` | `httpx.Client` or `httpx.AsyncClient` | generated default | Lets you bring your own transport, pools, proxies, or TLS settings. | +| `logging` | `LogConfig \| Logger \| None` | `None` | Enables SDK logging in the shared wrapper. | +| `transport_factory` | callable \| `None` | `None` | Globally replaces generated websocket connector calls. | + +## Trade-Offs + + + +The sync and async clients expose the same domain hierarchy, but their resource behavior is different. `DeepgramClient` is simpler for scripts, cron jobs, and request-at-a-time server handlers because it uses blocking `httpx.Client` and blocking websocket loops. `AsyncDeepgramClient` is a better fit when you already run an event loop and want to overlap many REST calls or streaming sessions without threads. The trade-off is operational: async gives you more concurrency, but you have to manage application shutdown carefully so pending streams and the async HTTP client do not outlive the loop. 
+ + +API keys are the natural server-side default because the generated base clients already expect them and because they work across the full SDK surface, including manage APIs. Access tokens are better when you need temporary credentials or want tighter scoping, but they are injected through a hand-written override layer rather than the generated model. That makes them convenient, but also means debugging should focus on `src/deepgram/client.py` and the wrapper headers, not just your endpoint code. A practical rule is simple: use API keys for backend services, and mint access tokens with `client.auth.v1.tokens.grant()` when a short-lived credential is the safer boundary. + + +Client constructor options are the right place for values that should apply everywhere, such as environment, shared headers, or a standard timeout policy. `RequestOptions` is better for one-off exceptions like a larger `chunk_size`, extra query parameters, or a custom timeout for a single transcription. The advantage is isolation: you can make one noisy or slow request more permissive without weakening the rest of the application. The downside is that per-request overrides are easier to hide inside call sites, so teams should standardize which options belong at client construction and which belong at the edge of an individual request. + + diff --git a/docs/codedocs/guides/authentication-and-clients.md b/docs/codedocs/guides/authentication-and-clients.md new file mode 100644 index 00000000..2e1638a3 --- /dev/null +++ b/docs/codedocs/guides/authentication-and-clients.md @@ -0,0 +1,114 @@ +--- +title: "Authentication And Clients" +description: "Set up the Deepgram Python SDK with API keys, temporary tokens, and sync or async clients." +--- + +This guide shows the recommended way to bootstrap the SDK, choose the right client class, and switch from long-lived API keys to temporary access tokens when needed. 
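Because minted tokens expire after their TTL, applications often cache a token and re-mint shortly before expiry. The sketch below is a generic caching pattern, not an SDK feature; the `mint` callable stands in for a call such as `issuer.auth.v1.tokens.grant(...)`, and the margin value is an assumption.

```python
import time


class TokenCacheSketch:
    """Re-mint a short-lived credential when its remaining lifetime gets thin."""

    def __init__(self, mint, ttl_seconds: float = 30.0, margin: float = 5.0):
        self._mint = mint            # callable returning a fresh token string
        self._ttl = ttl_seconds
        self._margin = margin
        self._token = None
        self._expires_at = 0.0

    def get(self) -> str:
        # Refresh when no token exists or expiry (minus a safety margin) is near.
        if self._token is None or time.monotonic() >= self._expires_at - self._margin:
            self._token = self._mint()
            self._expires_at = time.monotonic() + self._ttl
        return self._token
```

This keeps token minting out of request paths while still bounding how stale a cached credential can get.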
+ + + + +### Configure the environment + +For server-side applications, set `DEEPGRAM_API_KEY` and let the SDK discover it automatically. This works because `BaseClient` in `src/deepgram/base_client.py` defaults `api_key` from the environment. + +```bash +export DEEPGRAM_API_KEY="dg_your_api_key" +``` + + + + +### Create the root client + +Use the sync client for scripts or one-request-at-a-time applications. + +```python +from deepgram import DeepgramClient + +client = DeepgramClient( + timeout=30, + headers={"X-App-Name": "docs-example"}, +) +``` + +If your app already runs on `asyncio`, use the async client instead. + +```python +from deepgram import AsyncDeepgramClient + +client = AsyncDeepgramClient(timeout=30) +``` + + + + +### Mint a temporary token when you need one + +Tokens come from `client.auth.v1.tokens.grant()` in `src/deepgram/auth/v1/tokens/client.py`. Use this when you want a short-lived credential for a downstream client. + +```python +from deepgram import DeepgramClient + +issuer = DeepgramClient(api_key="dg_server_key") +token = issuer.auth.v1.tokens.grant(ttl_seconds=60) + +browser_or_edge_client = DeepgramClient( + access_token=token.access_token, + session_id="session-123", +) +``` + + + + +### Make a quick verification call + +```python +response = client.read.v1.text.analyze( + request={"text": "The package arrived on time and the customer is happy."}, + language="en", + sentiment=True, +) + +print(response.results.sentiments.average) +``` + + + + +## Full Example + +```python +import asyncio +from deepgram import AsyncDeepgramClient + + +async def main() -> None: + client = AsyncDeepgramClient( + access_token="TEMPORARY_TOKEN", + session_id="support-call-9001", + ) + + response = await client.read.v1.text.analyze( + request={"text": "Please refund the extra shipping charge."}, + language="en", + intents=True, + sentiment=True, + request_options={ + "additional_headers": {"X-Trace-Id": "trace-9001"}, + "timeout_in_seconds": 10, + }, + ) + + 
print(response.results.sentiments.average)


asyncio.run(main())
```

## Why This Pattern Works

- Auth stays at the root client, so domain modules inherit the same headers and session ID.
- Temporary tokens do not require you to rework the rest of the API surface.
- `RequestOptions` lets you tune one call without mutating the shared client object. diff --git a/docs/codedocs/guides/live-transcription.md b/docs/codedocs/guides/live-transcription.md new file mode 100644 index 00000000..e7357fc4 --- /dev/null +++ b/docs/codedocs/guides/live-transcription.md @@ -0,0 +1,120 @@ +--- +title: "Live Transcription" +description: "Build a realtime transcription pipeline with Listen v1 or Listen v2." +---

Use this guide when audio arrives live from a microphone, a telephony provider, or a media stream and you need incremental results instead of waiting for a whole file to finish.

### Choose the realtime endpoint

Use Listen v1 if you need the older websocket surface and its broader batch-style query options. Use Listen v2 if your application is conversational and benefits from turn-aware events.

" "Listen v2"]}>

```python
with client.listen.v1.connect(model="nova-3", encoding="linear16", sample_rate=16000) as connection:
    ...
```

```python
with client.listen.v2.connect(model="flux-general-en", encoding="linear16", sample_rate=16000) as connection:
    ...
```

### Register event handlers before starting the loop

```python
from deepgram.core.events import EventType

connection.on(EventType.OPEN, lambda _: print("connected"))
connection.on(EventType.CLOSE, lambda _: print("closed"))
connection.on(EventType.ERROR, lambda error: print(f"error: {error}"))
connection.on(EventType.MESSAGE, lambda message: print(message))
```

### Stream audio in chunks

```python
def send_audio(path: str) -> None:
    with open(path, "rb") as handle:
        while True:
            chunk = handle.read(4096)
            if not chunk:
                break
            connection.send_media(chunk)
```

### Close the stream cleanly

For Listen v2, send an explicit close message after the final audio chunk so the server can finalize the turn.

```python
from deepgram.listen.v2.types import ListenV2CloseStream

connection.send_close_stream(ListenV2CloseStream(type="CloseStream"))
connection.start_listening()
```

## Complete Example

```python
import threading
import time
from deepgram import DeepgramClient
from deepgram.core.events import EventType
from deepgram.listen.v2.types import ListenV2CloseStream, ListenV2TurnInfo

client = DeepgramClient()

with client.listen.v2.connect(
    model="flux-general-en",
    encoding="linear16",
    sample_rate=16000,
) as connection:
    def on_message(message):
        if isinstance(message, ListenV2TurnInfo):
            print(f"turn={message.turn_index} event={message.event} text={message.transcript}")

    connection.on(EventType.MESSAGE, on_message)

    def producer():
        with open("audio.wav", "rb") as handle:
            while True:
                chunk = handle.read(4096)
                if not chunk:
                    break
                connection.send_media(chunk)
                time.sleep(0.01)
        connection.send_close_stream(ListenV2CloseStream(type="CloseStream"))

    threading.Thread(target=producer, daemon=True).start()
    connection.start_listening()
```

## Operational Notes

- The socket loop is blocking, so send audio from another thread or task if you also need to 
consume events continuously. +- Event handlers are the cleanest place to route transcripts into your UI or persistence layer. +- If your environment cannot use the default websocket stack, configure `transport_factory` on the root client rather than reimplementing the endpoint logic yourself. diff --git a/docs/codedocs/guides/management-and-automation.md b/docs/codedocs/guides/management-and-automation.md new file mode 100644 index 00000000..e8bbfa89 --- /dev/null +++ b/docs/codedocs/guides/management-and-automation.md @@ -0,0 +1,107 @@ +--- +title: "Management And Automation" +description: "Automate projects, keys, usage reporting, agent assets, and self-hosted credentials from one client." +--- + +The SDK is not only for speech and agent runtime traffic. It also includes administrative APIs for projects, keys, usage, billing, reusable voice-agent assets, and self-hosted distribution credentials. + + + + +### Start from the management domains + +Administrative features are split across three major roots: + +- `client.manage.v1` for project, key, usage, billing, and model APIs, +- `client.voice_agent` for reusable agent configs and variables, +- `client.self_hosted.v1` for distribution credentials. 
+ + + + +### List projects and inspect one project + +```python +from deepgram import DeepgramClient + +client = DeepgramClient() + +projects = client.manage.v1.projects.list() +project = client.manage.v1.projects.get(project_id=projects.projects[0].project_id) +print(project.name) +``` + + + + +### Create operational assets + +```python +project_id = "PROJECT_ID" + +key = client.manage.v1.projects.keys.create( + project_id=project_id, + request={"comment": "automation key", "scopes": ["usage:read"]}, +) + +variable = client.voice_agent.variables.create( + project_id=project_id, + key="DG_SUPPORT_QUEUE", + value="priority-escalation", +) +``` + + + + +### Pull usage and self-hosted data + +```python +usage = client.manage.v1.projects.usage.get( + project_id=project_id, + start="2026-04-01", + end="2026-04-30", + model="nova-3", +) + +credentials = client.self_hosted.v1.distribution_credentials.list(project_id=project_id) +print(usage.resolution) +print(credentials.distribution_credentials) +``` + + + + +## Complete Example + +```python +from deepgram import DeepgramClient + +client = DeepgramClient() +project_id = "PROJECT_ID" + +projects = client.manage.v1.projects.list() +print(f"project count={len(projects.projects)}") + +usage = client.manage.v1.projects.usage.get( + project_id=project_id, + start="2026-04-01", + end="2026-04-30", + tag="support", +) + +config = client.voice_agent.configurations.create( + project_id=project_id, + config='{"listen":{"provider":{"type":"deepgram","model":"nova-3"}}}', + metadata={"owner": "ops"}, +) + +print(usage.resolution) +print(config.agent_id) +``` + +## Recommended Pattern + +- Keep runtime and control-plane credentials separate when possible. +- Use project-level tagging on speech requests so `usage.get(...)` becomes useful later. +- Treat `voice_agent.configurations` and `self_hosted.v1.distribution_credentials` as deployment assets, not throwaway runtime calls. 
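The usage examples above hard-code their reporting window. In automation you usually derive it instead; a small helper like this is independent of the SDK and uses only the standard library:

```python
import calendar
from datetime import date


def month_window(year: int, month: int) -> tuple[str, str]:
    """Return (start, end) ISO dates covering one calendar month."""
    # monthrange gives (weekday_of_first_day, number_of_days_in_month).
    last_day = calendar.monthrange(year, month)[1]
    return date(year, month, 1).isoformat(), date(year, month, last_day).isoformat()


start, end = month_window(2026, 4)
# start="2026-04-01", end="2026-04-30" — the same window the usage example uses.
```

The resulting strings can be passed straight to the `start` and `end` parameters shown above.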
diff --git a/docs/codedocs/guides/text-to-speech-workflows.md b/docs/codedocs/guides/text-to-speech-workflows.md new file mode 100644 index 00000000..761090c7 --- /dev/null +++ b/docs/codedocs/guides/text-to-speech-workflows.md @@ -0,0 +1,117 @@ +--- +title: "Text-To-Speech Workflows" +description: "Generate audio with REST or stream it with WebSockets, including pronunciation control with TextBuilder." +---

This guide covers the two TTS workflows the SDK supports: one-shot REST generation and realtime streaming synthesis.

### Choose REST or streaming

Use REST when you want a finite audio artifact and can consume bytes as they stream in. Use the websocket when you want to send multiple text messages over one connection or coordinate generation interactively.

### Build the text payload

Use plain text for simple prompts, or use `TextBuilder` when pronunciation and pauses matter.

```python
from deepgram.helpers import TextBuilder

text = (
    TextBuilder()
    .text("Take ")
    .pronunciation("adalimumab", "ˌædəˈljuːməb")
    .text(" once weekly.")
    .pause(500)
    .text(" Contact your clinician if symptoms worsen.")
    .build()
)
```

### Generate audio over REST

```python
from deepgram import DeepgramClient

client = DeepgramClient()

chunks = client.speak.v1.audio.generate(
    text=text,
    model="aura-2-asteria-en",
    request_options={"chunk_size": 8192},
)

with open("output.mp3", "wb") as handle:
    for chunk in chunks:
        handle.write(chunk)
```

### Stream audio over WebSocket

```python
from deepgram.core.events import EventType
from deepgram.speak.v1.types import SpeakV1Text

with client.speak.v1.connect(model="aura-2-asteria-en", encoding="linear16", sample_rate=24000) as connection:
    connection.on(EventType.MESSAGE, lambda message: print(type(message).__name__))
    connection.send_text(SpeakV1Text(text=text))
    connection.send_flush()
    connection.send_close()
    connection.start_listening()
```
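The streaming path above yields raw `linear16` PCM with no container, so most players cannot open the saved bytes directly. The standard-library `wave` module can add the WAV header; this sketch assumes the mono, 16-bit, 24000 Hz settings used in the `connect(...)` call:

```python
import wave


def write_wav(path: str, pcm_chunks: list[bytes], sample_rate: int = 24000) -> None:
    """Wrap raw linear16 PCM chunks in a WAV container so normal players can open them."""
    with wave.open(path, "wb") as handle:
        handle.setnchannels(1)    # assuming single-channel output
        handle.setsampwidth(2)    # linear16 = 16-bit samples = 2 bytes each
        handle.setframerate(sample_rate)
        for chunk in pcm_chunks:
            handle.writeframes(chunk)
```

Collect the binary websocket messages into a list and pass them here instead of writing a headerless `.raw` file.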
## Complete Example

```python
from deepgram import DeepgramClient
from deepgram.core.events import EventType
from deepgram.helpers import TextBuilder
from deepgram.speak.v1.types import SpeakV1Text

client = DeepgramClient()

script = (
    TextBuilder()
    .text("Welcome back.")
    .pause(700)
    .text("Your dosage reminder is ready.")
    .build()
)

with client.speak.v1.connect(model="aura-2-asteria-en", encoding="linear16", sample_rate=24000) as connection:
    audio = []

    def on_message(message):
        if isinstance(message, bytes):
            audio.append(message)

    connection.on(EventType.MESSAGE, on_message)
    connection.send_text(SpeakV1Text(text=script))
    connection.send_flush()
    connection.send_close()
    connection.start_listening()

with open("reminder.raw", "wb") as handle:
    for chunk in audio:
        handle.write(chunk)
```

## When To Prefer Each Path

- Prefer REST when the application wants a file-like result and no bidirectional session logic.
- Prefer streaming when you want multiple prompts on one connection or need low-latency playback.
- Prefer `TextBuilder` whenever pronunciation correctness matters more than keeping the text as a single literal string. diff --git a/docs/codedocs/index.md b/docs/codedocs/index.md new file mode 100644 index 00000000..62e25d10 --- /dev/null +++ b/docs/codedocs/index.md @@ -0,0 +1,116 @@ +--- +title: "Getting Started" +description: "Start using the Deepgram Python SDK for speech, text, and agent workflows." +---

The Deepgram Python SDK is the official Python client for Deepgram speech-to-text, text-to-speech, text analysis, management, and voice agent APIs.

## The Problem

- Speech applications usually need separate code paths for batch transcription, live streaming, synthesis, and admin APIs.
- Realtime voice workflows are hard to wire correctly because WebSocket setup, event handling, and message formats differ by endpoint. 
+- Authentication changes between server-side API keys, temporary access tokens, and custom transport requirements can leak into application code. +- Advanced options such as diarization, summaries, request tags, usage filters, and agent settings are easy to miss when you stay at raw HTTP level. + +## The Solution + +The SDK wraps Deepgram's API surface behind one root client, exposes sync and async variants, lazily loads domain clients such as `listen`, `speak`, `read`, `manage`, and `agent`, and turns streaming connections into typed socket clients with event callbacks. The hand-written `src/deepgram/client.py` layer adds `access_token`, `session_id`, and `transport_factory` support on top of the generated REST and WebSocket clients. + +```python +from deepgram import DeepgramClient + +client = DeepgramClient() + +with open("audio.wav", "rb") as audio_file: + response = client.listen.v1.media.transcribe_file( + request=audio_file.read(), + model="nova-3", + smart_format=True, + diarize=True, + ) + +print(response.results.channels[0].alternatives[0].transcript) +``` + +This is a Python SDK, so the install commands below use Python package managers instead of JavaScript package managers. 
+ +## Installation + +" "pipx"]}> + + +```bash +pip install deepgram-sdk +``` + + + + +```bash +uv add deepgram-sdk +``` + + + + +```bash +poetry add deepgram-sdk +``` + + + + +```bash +pipx install deepgram-sdk +``` + + + + +If you want the optional `aiohttp`-backed async transport, install the extra: + +```bash +pip install "deepgram-sdk[aiohttp]" +``` + +## Quick Start + +Set `DEEPGRAM_API_KEY` in your environment, then run the smallest useful transcription example: + +```python +from deepgram import DeepgramClient + +client = DeepgramClient() + +response = client.listen.v1.media.transcribe_url( + url="https://dpgr.am/spacewalk.wav", + model="nova-3", + smart_format=True, +) + +print(response.results.channels[0].alternatives[0].transcript) +``` + +Expected output is a transcript string from the first channel and first alternative, for example: + +```text +Yeah, as I say, this mission is a very important step for us. +``` + +## Key Features + +- Sync and async root clients: `DeepgramClient` and `AsyncDeepgramClient`. +- Batch speech-to-text via `client.listen.v1.media.transcribe_url()` and `transcribe_file()`. +- Realtime streaming over typed socket clients for Listen v1, Listen v2, Speak v1, and Agent v1. +- Text-to-speech over both REST (`speak.v1.audio.generate`) and WebSocket (`speak.v1.connect`). +- Text intelligence via `read.v1.text.analyze` for sentiment, summaries, topics, and intents. +- Project, key, usage, billing, self-hosted, and reusable voice-agent configuration APIs. +- Helper utilities such as `deepgram.helpers.TextBuilder` and per-request overrides through `RequestOptions`. + +## Where To Go Next + + + See how the root client, wrappers, transports, and domain modules fit together. + Understand authentication, batch requests, streaming sockets, and agent workflows. + Jump to the exact imports, signatures, and constructor options. 
+ diff --git a/docs/codedocs/prerecorded-transcription.md b/docs/codedocs/prerecorded-transcription.md new file mode 100644 index 00000000..9b08a648 --- /dev/null +++ b/docs/codedocs/prerecorded-transcription.md @@ -0,0 +1,100 @@ +--- +title: "Prerecorded Transcription" +description: "Use Listen v1 media methods for URL and file transcription with advanced options and callback support." +--- + +Batch transcription in this SDK lives under `client.listen.v1.media`. It is the simplest path when you have a complete file or a hosted recording and want a single response model instead of a live event stream. + +## What It Is + +`MediaClient` in `src/deepgram/listen/v1/media/client.py` exposes two primary methods: + +- `transcribe_url(...)` for media hosted at an accessible URL, +- `transcribe_file(...)` for raw bytes or iterators of bytes. + +Both methods solve the same problem: send prerecorded audio or video to Deepgram's speech-to-text REST API and get back a `MediaTranscribeResponse`. They also accept a large set of analysis options including summaries, topics, intents, speaker diarization, punctuation, and request tagging. + +This concept connects directly to `RequestOptions`, Deepgram models such as `nova-3`, and downstream consumers like `read.v1.text.analyze` if you want to run text analysis after transcription. + +## How It Works Internally + +The generated `MediaClient` converts your keyword arguments into a request body or query string, then delegates to a raw client that performs the HTTP request. The important implementation detail is that `transcribe_file(...)` accepts `bytes`, `Iterator[bytes]`, or `AsyncIterator[bytes]`, so you can stream large local files without loading them fully into memory. + +Internally, the shared HTTP stack from `src/deepgram/core/http_client.py` handles timeout resolution, request-body shaping, and retry behavior. 
If you pass a callback URL, the API may return a request identifier rather than final transcript text, so your application logic should branch on that response mode instead of assuming the transcript is already present. + +```mermaid +flowchart TD + A[Audio source] --> B{URL or file?} + B -->|URL| C[transcribe_url] + B -->|bytes or iterator| D[transcribe_file] + C --> E[Raw media client] + D --> E + E --> F[HttpClient retry + timeout layer] + F --> G[Deepgram REST API] + G --> H[MediaTranscribeResponse] +``` + +## Basic Usage + +```python +from deepgram import DeepgramClient + +client = DeepgramClient() + +response = client.listen.v1.media.transcribe_url( + url="https://dpgr.am/spacewalk.wav", + model="nova-3", + smart_format=True, + punctuate=True, +) + +print(response.results.channels[0].alternatives[0].transcript) +``` + +## Advanced Usage + +```python +from deepgram import DeepgramClient + +client = DeepgramClient() + +def read_file_in_chunks(path: str, chunk_size: int = 8192): + with open(path, "rb") as handle: + while True: + chunk = handle.read(chunk_size) + if not chunk: + break + yield chunk + +response = client.listen.v1.media.transcribe_file( + request=read_file_in_chunks("support-call.wav"), + model="nova-3", + diarize=True, + paragraphs=True, + utterances=True, + detect_entities=True, + summarize="v2", + tag=["support", "priority-high"], + request_options={ + "timeout_in_seconds": 120, + "additional_query_parameters": {"detect_language": ["en", "es"]}, + }, +) +``` + +`transcribe_file(...)` supports generators specifically so you do not have to `read()` large media files into memory first. If you already know a request will complete asynchronously via `callback`, do not write client code that assumes `response.results.channels` is always present. + +## Choosing URL vs File Input + +`transcribe_url(...)` is ideal when the media already lives in cloud storage or a public asset bucket. 
It minimizes upload time on your side and keeps the application code small. `transcribe_file(...)` is better when the recording lives on local disk, comes from another service in memory, or must not be re-hosted to an external URL. + +## Trade-Offs + + + +Hosted URLs reduce the amount of data your application has to move because the SDK sends a small JSON request instead of streaming the whole asset through your process. That usually means simpler code and better behavior in serverless environments where upload time matters. Direct file upload is still the better choice when the media is private, generated locally, or too sensitive to copy into a separate storage layer just to call the API. In practice, use URLs for durable shared assets and file upload for just-in-time or local recordings. + + +Returning the transcript inline is easier because the application can continue in one request-response flow. Callback mode is better for large files, offline pipelines, or workloads where you do not want a worker blocked waiting for the response body. The cost is complexity: you need an endpoint that can receive the callback and reconcile it to the original job, and your code must treat the initial API response as job submission rather than final transcript data. That trade-off is worth it when throughput matters more than single-call simplicity. + + diff --git a/docs/codedocs/realtime-streaming.md b/docs/codedocs/realtime-streaming.md new file mode 100644 index 00000000..2dde24b0 --- /dev/null +++ b/docs/codedocs/realtime-streaming.md @@ -0,0 +1,99 @@ +--- +title: "Realtime Streaming" +description: "Work with typed WebSocket clients for live speech recognition and streaming text-to-speech." +--- + +Realtime work in this SDK is built around socket clients returned by `connect(...)` methods. These sockets are still generated from the API definition, but the developer experience is shaped by `EventEmitterMixin` and typed send helpers. 
+ +## What It Is + +The main streaming entry points are: + +- `client.listen.v1.connect(...)` in `src/deepgram/listen/v1/client.py`, +- `client.listen.v2.connect(...)` in `src/deepgram/listen/v2/client.py`, +- `client.speak.v1.connect(...)` in `src/deepgram/speak/v1/client.py`. + +Each returns a context-managed socket client that exposes `start_listening()`, `recv()`, iteration, and endpoint-specific send helpers such as `send_media()`, `send_text()`, `send_flush()`, or `send_close_stream()`. + +These socket clients relate directly to `EventType` in `src/deepgram/core/events.py` and to the optional transport override layer in `src/deepgram/transport.py`. + +## How It Works Internally + +The `connect(...)` methods build a WebSocket URL from the active environment, encode query parameters, merge authorization and `RequestOptions`, and then call `websockets.connect`. Once connected, the socket client loops over incoming frames, parses JSON messages into typed models with `construct_type(...)`, and emits lifecycle events through `EventEmitterMixin`. + +Listen v2 is optimized for conversational turn detection and exposes events such as `ListenV2TurnInfo`. Speak v1 returns a mixed stream of binary audio chunks and metadata events. Unknown JSON messages are skipped with a warning instead of tearing down the entire connection, which is why long-lived streams can survive protocol additions. + +```mermaid +sequenceDiagram + participant App + participant Connect as connect(...) + participant Socket as Socket Client + participant API as Deepgram WebSocket API + + App->>Connect: with client.listen.v2.connect(...) + Connect->>API: open websocket with headers + query params + Connect-->>App: V2SocketClient + App->>Socket: on(EventType.MESSAGE, handler) + App->>Socket: send_media(...) 
+ API-->>Socket: Connected / TurnInfo / Errors + Socket-->>App: MESSAGE callbacks + App->>Socket: send_close_stream() + Socket->>App: CLOSE callback +``` + +## Basic Usage + +```python +from deepgram import DeepgramClient +from deepgram.core.events import EventType + +client = DeepgramClient() + +with client.listen.v2.connect( + model="flux-general-en", + encoding="linear16", + sample_rate=16000, +) as connection: + connection.on(EventType.MESSAGE, lambda message: print(getattr(message, "type", type(message).__name__))) + connection.start_listening() +``` + +## Advanced Usage + +```python +from deepgram import DeepgramClient +from deepgram.core.events import EventType +from deepgram.speak.v1.types import SpeakV1Text + +client = DeepgramClient() + +with client.speak.v1.connect(model="aura-2-asteria-en", encoding="linear16", sample_rate=24000) as connection: + audio_chunks = [] + + def on_message(message): + if isinstance(message, bytes): + audio_chunks.append(message) + + connection.on(EventType.MESSAGE, on_message) + connection.send_text(SpeakV1Text(text="Hello from the streaming TTS API.")) + connection.send_flush() + connection.send_close() + connection.start_listening() +``` + +Register handlers before calling `start_listening()`. The listening loop emits `OPEN`, then streams messages immediately, and it blocks the current thread until the websocket closes. + +## Listen v1 vs Listen v2 + +Use Listen v1 when you want the older speech-to-text websocket with a wider set of query parameters that mirror batch transcription. Use Listen v2 when your application is conversational and you care about turn segmentation, end-of-turn thresholds, or the `flux-general-en` style realtime model behavior shown in the examples. + +## Trade-Offs + + + +Callback-driven code is the better default because `start_listening()` already emits `OPEN`, `MESSAGE`, `ERROR`, and `CLOSE` in a consistent order.
That keeps business logic close to the event type and makes it easy to pipe transcripts or audio into another service. Manual `recv()` is still useful in controlled loops, tests, or when you want pull-based backpressure rather than a push callback style. The trade-off is complexity: callback code is easier to start, while explicit `recv()` logic is easier to reason about in finite-state workflows. + + +The built-in transport is the simplest option and keeps you aligned with the generated code paths the SDK expects. `transport_factory` is valuable when your environment needs a proxy-aware transport, a mocked websocket in tests, or an alternate runtime integration, but `src/deepgram/transport.py` applies that patch globally to the process. That means it is powerful and low-friction, but not scoped to a single client instance in the way an `httpx_client` override is. Use it deliberately, especially in applications that create many clients with different transport assumptions. + + diff --git a/docs/codedocs/text-builder.md b/docs/codedocs/text-builder.md new file mode 100644 index 00000000..f837016d --- /dev/null +++ b/docs/codedocs/text-builder.md @@ -0,0 +1,86 @@ +--- +title: "Text Builder" +description: "Use the TextBuilder helper to assemble Deepgram TTS markup with pronunciations, pauses, and SSML conversion." +--- + +`TextBuilder` is the main hand-written helper outside the root client layer. It exists because Deepgram's text-to-speech APIs support inline pronunciation and pause controls, but writing those control markers by hand is error-prone. + +## What It Is + +The helper lives in `src/deepgram/helpers/text_builder.py` and is exported through `deepgram.helpers`. It solves three problems: + +- building inline pronunciation JSON snippets without manual string escaping, +- validating pause durations and IPA values before you hit the API, +- converting a small SSML subset into Deepgram's inline TTS format. 
+ +It relates directly to `client.speak.v1.audio.generate(...)` and `client.speak.v1.connect(...)`, because those methods ultimately need a text string that may contain pronunciation and pause controls. + +## How It Works Internally + +`TextBuilder` keeps an internal `_parts` list plus counters for pronunciations, pauses, and effective character count. `pronunciation(...)` validates IPA with `validate_ipa(...)`, enforces a 500-pronunciation limit, JSON-encodes the control block, and increments the logical character count by the original word length. `pause(...)` validates the range and increment rules, enforces a 50-pause limit, and appends a `{pause:duration}` marker. + +`from_ssml(...)` calls `ssml_to_deepgram(...)`, which strips an outer `<speak>` tag, converts IPA `<phoneme>` tags into inline JSON markers, and maps `<break>` tags into pause markers. `build()` finally joins the parts and raises if the effective text exceeds 2000 characters. + +```mermaid +flowchart TD + A[TextBuilder] --> B["text()"] + A --> C["pronunciation()"] + A --> D["pause()"] + A --> E["from_ssml()"] + B --> F[_parts] + C --> F + D --> F + E --> F + F --> G["build()"] + G --> H[Deepgram TTS-ready string] +``` + +## Basic Usage + +```python +from deepgram.helpers import TextBuilder + +text = ( + TextBuilder() + .text("Take ") + .pronunciation("azathioprine", "ˌæzəˈθaɪəpriːn") + .text(" twice daily.") + .pause(500) + .text(" Do not exceed the prescribed dosage.") + .build() +) +``` + +## Advanced Usage + +```python +from deepgram.helpers import TextBuilder + +ssml = """ +<speak> + Welcome back. + <break time="1000ms"/> + The drug name is <phoneme alphabet="ipa" ph="ˌædəˈlɪmjuːmæb">adalimumab</phoneme>. +</speak> +""" + +text = TextBuilder().from_ssml(ssml).build() +print(text) +``` + +The helper enforces limits in user space before you call the API: maximum 500 pronunciations, maximum 50 pauses, and maximum 2000 effective characters. Pause duration must stay between 500 and 5000 milliseconds in 100-millisecond increments, so `pause(750)` is invalid even though it looks reasonable.
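The range-and-increment rule above can be sketched as a standalone check (a hypothetical helper for illustration, not part of the SDK):

```python
def is_valid_pause(ms: int) -> bool:
    # 500-5000 ms inclusive, in 100 ms increments
    return 500 <= ms <= 5000 and ms % 100 == 0

print(is_valid_pause(500))  # True: valid lower bound
print(is_valid_pause(750))  # False: not a 100 ms increment
```

Validating in user space like this surfaces bad input before the request ever reaches the API.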
+ +## When To Use It + +Use `TextBuilder` any time your application needs stable pronunciation for medical, legal, or brand terminology. It is also the best path when your source material begins as SSML but your delivery target is Deepgram's inline TTS format. If you only ever send plain text, the helper is optional. + +## Trade-Offs + + + +Manual inline markers give you total control over the final string, but they are also easy to break because you have to keep JSON escaping, pause syntax, and character limits straight yourself. `TextBuilder` trades some raw flexibility for safer composition and better validation, which is almost always the right trade in production code. It also makes intent obvious when another engineer reads the code, because `pronunciation(...)` communicates meaning more clearly than a pasted JSON fragment inside a long string. Manual formatting is still acceptable for tiny one-off demos, but it scales poorly once you have many terms to control. + + +SSML conversion is useful when content already comes from another speech system or a CMS that stores SSML fragments. It lets you preserve upstream authoring patterns while still targeting Deepgram's TTS API, and the conversion step is intentionally narrow so the supported tags are predictable. Authoring the Deepgram format directly is simpler when your application owns the content end to end and only needs pronunciations and pauses. In that case, `text()`, `pronunciation()`, and `pause()` are usually clearer than maintaining a parallel SSML representation. + + diff --git a/docs/codedocs/types.md b/docs/codedocs/types.md new file mode 100644 index 00000000..c3420ef3 --- /dev/null +++ b/docs/codedocs/types.md @@ -0,0 +1,123 @@ +--- +title: "Types" +description: "Important exported Python types, enums, and protocols used across the Deepgram Python SDK." 
+--- + +This project does not export TypeScript interfaces because it is a Python SDK, but it does export a number of Python model types, enums, and protocols that matter when you build typed application code. + +## Root Configuration Types + +### `DeepgramClientEnvironment` + +Source: `src/deepgram/environment.py` + +```python +class DeepgramClientEnvironment: + PRODUCTION: DeepgramClientEnvironment + AGENT: DeepgramClientEnvironment + + def __init__(self, *, base: str, agent: str, production: str) +``` + +Use this when you need to point the SDK at a different Deepgram environment. + +### `RequestOptions` + +Source: `src/deepgram/core/request_options.py` + +```python +class RequestOptions(TypedDict, total=False): + timeout_in_seconds: int + max_retries: int + additional_headers: dict[str, Any] + additional_query_parameters: dict[str, Any] + additional_body_parameters: dict[str, Any] + chunk_size: int +``` + +Use this as the final keyword argument on service methods when one call needs different headers, timeout, retries, or response chunk sizing. + +## Event Types + +### `EventType` + +Source: `src/deepgram/core/events.py` + +```python +class EventType(str, Enum): + OPEN = "open" + MESSAGE = "message" + ERROR = "error" + CLOSE = "close" +``` + +This enum is shared by Listen, Speak, and Agent socket clients. The emitted order comes from the socket clients' `start_listening()` implementations. + +## Transport Protocols + +### `SyncTransport` and `AsyncTransport` + +Source: `src/deepgram/transport_interface.py` + +```python +class SyncTransport(Protocol): + def send(self, data: Any) -> None + def recv(self) -> Any + def __iter__(self) -> Iterator + def close(self) -> None + +class AsyncTransport(Protocol): + async def send(self, data: Any) -> None + async def recv(self) -> Any + def __aiter__(self) -> Any + async def close(self) -> None +``` + +These are the contracts your custom transport must satisfy if you pass `transport_factory` to the root client.
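Because these are `Protocol` classes, any object with matching methods satisfies them structurally; a minimal in-memory stand-in (a test double for illustration, not a usable network transport) might look like:

```python
from typing import Any, Iterator


class QueueTransport:
    """In-memory stand-in that structurally satisfies SyncTransport."""

    def __init__(self) -> None:
        self._queue: list[Any] = []

    def send(self, data: Any) -> None:
        # Queue outbound frames instead of writing to a socket
        self._queue.append(data)

    def recv(self) -> Any:
        # Pop frames in FIFO order
        return self._queue.pop(0)

    def __iter__(self) -> Iterator:
        return iter(list(self._queue))

    def close(self) -> None:
        self._queue.clear()
```

A stand-in like this is mainly useful in unit tests, where you want to assert on the frames your code would have sent without opening a real websocket.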
+ +## High-Value Exported Models + +The root package dynamically exports a very large generated model surface through `src/deepgram/__init__.py`. The most important ones for application code are usually the request and settings models rather than every response schema. + +### Agent settings models + +```python +from deepgram.agent.v1.types import ( + AgentV1Settings, + AgentV1SettingsAgent, + AgentV1SettingsAgentListen, + AgentV1SettingsAgentListenProvider_V1, + AgentV1SettingsAudio, + AgentV1SettingsAudioInput, +) +from deepgram.types.think_settings_v1 import ThinkSettingsV1 +from deepgram.types.think_settings_v1provider import ThinkSettingsV1Provider_OpenAi +from deepgram.types.speak_settings_v1 import SpeakSettingsV1 +from deepgram.types.speak_settings_v1provider import SpeakSettingsV1Provider_Deepgram +``` + +Use these when building realtime voice-agent settings payloads. + +### Realtime message models + +```python +from deepgram.listen.v2.types import ListenV2CloseStream, ListenV2Connected, ListenV2TurnInfo +from deepgram.speak.v1.types import SpeakV1Text, SpeakV1Flush, SpeakV1Close +``` + +Use these for websocket send and receive flows. + +### Text-analysis and transcription request models + +```python +from deepgram.requests.read_v1request import ReadV1RequestParams +from deepgram.listen.v1.media.types.media_transcribe_response import MediaTranscribeResponse +``` + +Use these when you want clearer local typing around request and response payloads. + +## Practical Guidance + +- Reach for the root client classes and `RequestOptions` first; they are the types that shape every call site. +- Use agent and websocket models when the method signature expects a model instance, not a plain dictionary. +- Treat the many generated response models as discoverable types that refine downstream code once you know which endpoint your application truly depends on. 
diff --git a/docs/codedocs/voice-agents.md b/docs/codedocs/voice-agents.md new file mode 100644 index 00000000..09c61c79 --- /dev/null +++ b/docs/codedocs/voice-agents.md @@ -0,0 +1,111 @@ +--- +title: "Voice Agents" +description: "Combine the realtime Agent v1 websocket with reusable voice-agent configurations and variables." +--- + +Deepgram exposes two related but distinct agent surfaces in this SDK. The realtime path lives under `client.agent.v1`, while the reusable configuration and variable APIs live under `client.voice_agent`. + +## What It Is + +The realtime `agent.v1` websocket builds an active conversational session. You send settings, stream user audio, receive transcripts and agent responses, and can update prompt, think, or speak settings during the session. The REST-style `voice_agent` clients manage reusable configuration assets so you do not have to send the full agent configuration inline every time. + +This split exists because a live conversation and a reusable agent definition solve different problems. The websocket is session state. The `voice_agent.configurations` and `voice_agent.variables` modules are deployment-time configuration APIs. + +## How It Works Internally + +`src/deepgram/agent/v1/client.py` opens a websocket against the environment's `agent` URL and returns `V1SocketClient`. That socket exposes methods such as `send_settings`, `send_update_prompt`, `send_update_think`, `send_update_speak`, `send_media`, and `send_function_call_response`. + +Inside `src/deepgram/agent/v1/socket_client.py`, outbound models are serialized with `_sanitize_numeric_types(...)` before sending. That helper exists because some integer-like fields in generated models are typed as floats, and the API rejects JSON such as `44100.0` for integer fields like `sample_rate`. On the REST side, `src/deepgram/voice_agent/configurations/client.py` and `src/deepgram/voice_agent/variables/client.py` manage stored agent definitions and template variables by project. 
+ +```mermaid +graph TD + A[voice_agent.configurations.create] --> B[Stored agent config] + C[voice_agent.variables.create] --> D[Template variables] + B --> E[agent.v1 websocket session] + D --> E + E --> F[send_settings] + E --> G[send_media] + E --> H[ConversationText / Thinking / Audio events] +``` + +## Basic Usage + +```python +from deepgram import DeepgramClient +from deepgram.agent.v1.types import ( + AgentV1Settings, + AgentV1SettingsAgent, + AgentV1SettingsAgentListen, + AgentV1SettingsAgentListenProvider_V1, + AgentV1SettingsAudio, + AgentV1SettingsAudioInput, +) +from deepgram.types.speak_settings_v1 import SpeakSettingsV1 +from deepgram.types.speak_settings_v1provider import SpeakSettingsV1Provider_Deepgram +from deepgram.types.think_settings_v1 import ThinkSettingsV1 +from deepgram.types.think_settings_v1provider import ThinkSettingsV1Provider_OpenAi + +client = DeepgramClient() + +settings = AgentV1Settings( + audio=AgentV1SettingsAudio( + input=AgentV1SettingsAudioInput(encoding="linear16", sample_rate=24000) + ), + agent=AgentV1SettingsAgent( + listen=AgentV1SettingsAgentListen( + provider=AgentV1SettingsAgentListenProvider_V1(type="deepgram", model="nova-3") + ), + think=ThinkSettingsV1( + provider=ThinkSettingsV1Provider_OpenAi(type="open_ai", model="gpt-4o-mini"), + prompt="Keep answers brief.", + ), + speak=SpeakSettingsV1( + provider=SpeakSettingsV1Provider_Deepgram(type="deepgram", model="aura-2-asteria-en") + ), + ), +) +``` + +## Advanced Usage + +```python +from deepgram import DeepgramClient + +client = DeepgramClient() +project_id = "PROJECT_ID" + +config = client.voice_agent.configurations.create( + project_id=project_id, + config='{"listen":{"provider":{"type":"deepgram","model":"nova-3"}}}', + metadata={"team": "support", "tier": "prod"}, +) + +client.voice_agent.variables.create( + project_id=project_id, + key="DG_BRAND_NAME", + value="Acme Health", +) + +think_models = client.agent.v1.settings.think.models.list()
+print(think_models.models[0]) +``` + +Deleting a voice-agent configuration is not just housekeeping. The generated `ConfigurationsClient.delete(...)` docstring explicitly warns that removing a configuration UUID that production traffic still references can cause an outage. + +## How The Two Surfaces Relate + +- `agent.v1` is for active sessions. +- `voice_agent.configurations` is for reusable agent definitions. +- `voice_agent.variables` is for template substitution data. +- `agent.v1.settings.think.models.list()` is a discovery endpoint that helps you choose supported think models before composing settings. + +## Trade-Offs + + + +Inline settings are ideal during prototyping because everything needed to start the session lives in the same Python process and the same source file. That makes debugging fast and keeps configuration changes close to the code sending audio into the session. Stored configurations are better for teams that need reviewable assets, shared configuration UUIDs, or environment-specific promotion workflows. The trade-off is governance versus agility: inline settings move faster, while stored configs reduce duplication and make rollouts more deliberate. + + +The websocket is the live execution surface, so use it when the conversation is happening now and timing matters. The REST management APIs are not a replacement for that session; they are control-plane endpoints used to create, read, update metadata, and delete reusable assets by project. Treating them as separate layers produces cleaner systems: build configurations ahead of time, then reference them from the realtime path. Trying to do both jobs in one layer usually creates brittle startup logic and makes rollbacks harder. + +