feat: Audio (Voice) mode + model routing & selection redesign #409
Open
alichherawalla wants to merge 138 commits into
Implements on-device text-to-speech using OuteTTS 0.3 (454 MB) + WavTokenizer (73 MB) via llama.rn, with react-native-audio-api for playback.
Two interface modes (user-switchable from Settings):
- Chat Mode: play/stop TTSButton on each assistant message bubble
- Audio Mode: waveform bubbles with auto-TTS after streaming, transcript expand, speed cycling, and PCM audio persisted to disk per message for repeat playback
New files:
- src/constants/ttsModels.ts: model URLs, RAM thresholds, cache config
- src/services/ttsService.ts: download, load, generate, persist, play
- src/stores/ttsStore.ts: Zustand store with Chat + Audio Mode actions
- src/hooks/useTTS.ts: convenience hook with RAM gate and weighted progress
- src/components/TTSButton/index.tsx: Chat Mode play/stop per message
- src/components/AudioMessageBubble/index.tsx: waveform bubble component
- src/screens/TTSSettingsScreen/index.tsx: download, mode, speed, cache
Modified:
- Message type: audioPath, waveformData, audioDurationSeconds, isGeneratingAudio
- ChatMessage: Audio Mode branch + TTSButton in meta row
- SettingsScreen: Text to Speech nav row
- Navigation: TTSSettings route
- stores/index.ts, services/index.ts: exports
Tests: 42 unit + integration tests covering service, store, and full flows
Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
- Revert ChatMessage to main (avoids pre-existing complexity lint failure when the file enters the push-range diff)
- Add Audio Mode + TTSButton to MessageRenderer instead: clean, under limit
- Move audioPath/waveformData/audioDurationSeconds/isGeneratingAudio fields from types/index.ts to types/tts.ts via module augmentation (keeps index.ts under the 350-line max)
- Add react-native-audio-api global mock to jest.setup.ts so all test suites that transitively import ttsService can resolve the native module
Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
In finalizeStreamingMessage, after addMessage() saves the assistant reply, check if Audio Mode is active and model is loaded — if so, fire useTTSStore.generateAndSave() in the background so the waveform bubble auto-generates instead of spinning indefinitely. Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
…, TTSButton placement
Critical fixes for TTS Audio Mode:
- Add updateMessageAudio() to chatStore: writes audioPath, waveformData, audioDurationSeconds, isGeneratingAudio back to the conversation message (without this, the waveform bubble spun forever after generation)
- Wire auto-TTS trigger in useChatScreen via useEffect on isStreamingForThisConversation: detects streaming → stopped, checks Audio Mode + model loaded, calls triggerAudioModeGeneration() which sets isGeneratingAudio:true, fires generateAndSave, then writes audio fields or clears the flag on error
- Fix isGenerating logic: show spinner only when isGeneratingAudio===true, not for every assistant message missing audioPath (which made all old messages spin forever in Audio Mode)
- Fix TTSButton placement: add metaExtra prop to ChatMessage/MessageMetaRow so TTSButton renders inline in the timestamp row rather than below the bubble
Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
Adds a Voice row (volume icon + Chat/Audio/N/A badge) to the quick settings popover in the chat input. Tapping it: - Toggles between Chat and Audio mode when models are downloaded - Auto-loads/unloads the TTS model on switch - Navigates to TTSSettings when models are not yet downloaded This makes Audio Mode accessible without leaving the chat screen. Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
The ChatInput test mock for src/stores was missing useTTSStore, causing Popovers.tsx (which now uses useTTSStore) to throw on render. Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
1. checkDownloadStatus() was never called on TTSSettingsScreen mount, so the store always showed models as not downloaded after a fresh app start
2. speak() race condition: stop() during generation didn't prevent playback. Fix: set isSpeakingFlag=true before generate(), check it after, use finally
3. RNFS.stat() on a directory reports block size (~0), not total file size. Fix: replaced with a recursive readDir() sum of individual .pcm file sizes
4. Historical messages without audio showed a broken play button in Audio Mode. Fix: AudioMessageBubble is only rendered when msg.audioPath || msg.isGeneratingAudio
Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
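The cache-size fix in item 3 amounts to walking the cache directory and adding up file sizes, rather than asking for the size of the directory itself. A minimal sketch follows, assuming an in-memory `FsEntry` tree as a stand-in for directory listings; `FsEntry` and `totalPcmBytes` are hypothetical names, and the real `RNFS.readDir()` is asynchronous.

```typescript
// Hypothetical stand-in for directory listings. The real RNFS.readDir() is async;
// this sketch uses a plain tree so the recursion is easy to follow.
type FsEntry =
  | { kind: "file"; name: string; size: number }
  | { kind: "dir"; name: string; children: FsEntry[] };

// Sum the sizes of all .pcm files, descending into subdirectories.
function totalPcmBytes(entries: FsEntry[]): number {
  let total = 0;
  for (const entry of entries) {
    if (entry.kind === "dir") {
      total += totalPcmBytes(entry.children);
    } else if (entry.name.endsWith(".pcm")) {
      total += entry.size;
    }
  }
  return total;
}
```

Non-audio files are skipped on purpose so that the reported figure matches what clearing the audio cache would actually free.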
Replaced stat() mock with readDir() mocks matching the new recursive file-size summation approach. Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
…nto feat/tts-implementation
Replaces slider controls with a [–] value [+] stepper row for precise numeric input in settings screens. Supports min/max/step, optional decimal formatting, and testID for E2E automation. Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
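The stepper's arithmetic can be sketched as a clamp plus rounding to the display precision. This is an illustration only: `StepperConfig`, `stepValue`, and `formatValue` are hypothetical names, and the component's actual internals are not shown in this PR.

```typescript
// Hypothetical config mirroring the min/max/step/decimal options named above.
interface StepperConfig {
  min: number;
  max: number;
  step: number;
  decimals?: number; // digits shown after the decimal point
}

// Move `value` one step up (+1) or down (-1), clamped to [min, max].
function stepValue(value: number, direction: 1 | -1, cfg: StepperConfig): number {
  const raw = value + direction * cfg.step;
  const clamped = Math.min(cfg.max, Math.max(cfg.min, raw));
  // Round to the configured precision so floating-point drift never reaches the UI.
  const factor = Math.pow(10, cfg.decimals ?? 0);
  return Math.round(clamped * factor) / factor;
}

function formatValue(value: number, cfg: StepperConfig): string {
  return value.toFixed(cfg.decimals ?? 0);
}

const temperature: StepperConfig = { min: 0, max: 2, step: 0.1, decimals: 1 };
console.log(formatValue(stepValue(0.7, 1, temperature), temperature)); // "0.8"
```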
Removes @react-native-community/slider from GenerationSettingsModal, ModelSettingsScreen, and TTSSettingsScreen. Every numeric control (temperature, top-p, GPU layers, speed, etc.) now uses the stepper for touch-friendly precise adjustment. Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
- MediaAttachment gains audioFormat and audioDurationSeconds fields
- audioRecorderService.stopRecording() now returns { path, durationSeconds }
instead of just the path, enabling accurate audio bubble scrubbing
- ChatInput/Attachments.addAudioAttachment stores the duration
Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
…send
In Audio Mode, user voice recordings now appear as right-aligned audio bubbles instead of text messages, making both sides of the conversation audio-native.
- Voice.ts: adds file-based transcription path (audioRecorderService + whisperService.transcribeFile) and onAutoSend callback for atomic send with audio attachment. Multimodal models skip transcription entirely.
- ChatInput: passes onAutoSend in Audio Mode; builds MediaAttachment inline to avoid async state-update race; uses attachmentsRef for sync reads.
- AudioMessageBubble: adds isUser prop for right-aligned primary-tinted style.
- MessageRenderer: renders user audio attachments as AudioMessageBubble before the normal message path.
Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
The streaming-complete useEffect only listed isStreamingForThisConversation in its deps, so activeConversation was captured stale. When streaming ended, the last message was always the old value — TTS generation was never triggered. Fix: read conversation and last message directly from useChatStore.getState() inside the effect instead of relying on the closed-over activeConversation. Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
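The stale-closure problem and its fix can be reproduced without React. The sketch below uses a hand-rolled `createStore` as a stand-in for a Zustand-style store (an assumption for illustration; `makeStaleHandler` and `makeFreshHandler` are hypothetical names). One handler closes over a snapshot, the other reads `getState()` at call time.

```typescript
type Message = { id: string; role: "user" | "assistant"; text: string };
interface ChatState { messages: Message[] }

// Minimal stand-in for a Zustand-style store (not the real library).
function createStore(initial: ChatState) {
  let state = initial;
  return {
    getState: (): ChatState => state,
    setState: (next: ChatState): void => { state = next; },
  };
}

const store = createStore({ messages: [] });

// Stale: the snapshot is captured once, when the handler is created.
function makeStaleHandler(): () => Message | undefined {
  const snapshot = store.getState();
  return () => snapshot.messages[snapshot.messages.length - 1];
}

// Fresh: state is read at call time, so later updates are visible.
function makeFreshHandler(): () => Message | undefined {
  return () => {
    const { messages } = store.getState();
    return messages[messages.length - 1];
  };
}

const stale = makeStaleHandler();
const fresh = makeFreshHandler();
store.setState({ messages: [{ id: "m1", role: "assistant", text: "Hello" }] });
```

After the update, `stale()` still sees the empty list, while `fresh()` returns the new assistant message, which is the behaviour the effect needs when streaming ends.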
When no Whisper model is installed and the user taps the mic, show a CustomAlert offering to download Whisper Small (466 MB) immediately, rather than navigating away to VoiceSettings. UnavailableButton also now shows a download icon + percentage while the model is being fetched, so feedback is in-place. Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
Adds a TEXT TO SPEECH section alongside IMAGE GENERATION and TEXT GENERATION in the chat settings modal. Shows mode toggle (chat/audio), enable switch, speed stepper, and auto-play toggle. Deep-links to TTSSettingsScreen for full configuration. Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
WHISPER_MODELS grows from 5 to 10 entries covering English-only and Multilingual variants for tiny/base/small/medium, plus Large v3 Turbo and Large v3. whisperService.downloadFromUrl(url, modelId) downloads any ggml .bin file from an arbitrary URL — enables installing community models from HuggingFace. whisperStore exposes it as downloadFromUrl action. Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
Rewrites the voice settings screen with three sections: - Active model card with inline download progress and remove action - Curated models grouped by English-only / Multilingual (all sizes, tiny → large-v3) - Live HuggingFace search bar (500 ms debounce) that queries ASR repos; tap a repo to expand and browse its ggml .bin files; tap a file to confirm and download via downloadFromUrl huggingFaceService gains searchWhisperRepos() and getWhisperFiles() to power the HF search without coupling to the LLM model browser. Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
llmMessages builds an input_audio content block from audio attachments when the active model reports audio support, bypassing Whisper entirely. llm.ts exposes getMultimodalSupport() so the voice layer can detect this. Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
- ttsStore: adds interfaceMode, speed, autoPlay, enabled settings; generateAndSave flow for Audio Mode; updateMessageAudio
- ttsService: OuteTTS generate+save path for AI audio bubbles
- TTSButton: play/stop per-message with generation spinner
- KokoroTTSManager + kokoroModels: scaffold for Tier 1 Kokoro TTS (not yet wired to react-native-executorch, marked not started)
- App.tsx: mounts KokoroTTSManager near root
- packages: react-native-executorch, background-downloader, dr.pogodin/react-native-fs
Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
- ChatMessage: long-press action sheet gains Speak option (delegates to ttsStore) - ModelSettingsScreen: suppress pre-existing exhaustive-deps lint warning - Tests: update GenerationSettingsModal and ModelSettingsScreen tests for NumericStepper (gpu-layers-stepper-increment) replacing slider testIDs - TTS_IMPLEMENTATION_PLAN: rewritten to reflect Audio Mode bidirectional voice conversation, stale closure fix, and implementation status Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
…sages
Two bugs causing broken Audio Mode:
1. AudioRecorder was recording at the system default rate (~44.1 kHz),
producing WAV that Whisper interprets as static ('TV static' / [SOUND]).
Fix: pass a preset with sampleRate:16000, BitDepth.Bit16 so the file
is Whisper-compatible 16 kHz mono int16 PCM from the start.
2. buildOAIMessages was always including audio attachments as input_audio
content blocks, even for models that don't support audio input (e.g.
remote Qwen 3.5 2B / Gemma 42B). Those models replied 'I cannot hear
audio'. Fix: buildOAIMessages now accepts supportsAudio flag (default
false) and only emits input_audio parts when the model declares audio
support. llm.ts passes multimodalSupport.audio when calling it.
Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
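The second fix is a capability gate on message construction. A minimal sketch, assuming simplified message types: `buildMessagesSketch`, `ChatTurn`, and `AudioAttachment` are hypothetical stand-ins and not the PR's `buildOAIMessages` signature. Only the `input_audio` content-part shape follows the OpenAI-style chat format.

```typescript
type ContentPart =
  | { type: "text"; text: string }
  | { type: "input_audio"; input_audio: { data: string; format: "wav" | "mp3" } };

// Hypothetical, simplified shapes for illustration.
interface AudioAttachment { base64: string; format: "wav" | "mp3" }
interface ChatTurn { role: "user" | "assistant"; text: string; audio?: AudioAttachment[] }
interface OAIMessage { role: "user" | "assistant"; content: ContentPart[] }

// Audio parts are emitted only when the caller says the model supports audio.
// The flag defaults to false, so text-only models never receive input_audio.
function buildMessagesSketch(turns: ChatTurn[], supportsAudio = false): OAIMessage[] {
  return turns.map((turn) => {
    const content: ContentPart[] = [];
    if (turn.text) content.push({ type: "text", text: turn.text });
    if (supportsAudio) {
      for (const a of turn.audio ?? []) {
        content.push({ type: "input_audio", input_audio: { data: a.base64, format: a.format } });
      }
    }
    return { role: turn.role, content };
  });
}
```

Defaulting the flag to `false` makes the safe path the one that callers get when they forget to pass it.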
playFromFile was treating WAV bytes as raw Float32 PCM — designed for OuteTTS output only. WAV files have a 44-byte RIFF header plus int16 samples; reinterpreting them as Float32 produces pure static. Fix: use AudioContext.decodeAudioData(filePath) which properly parses the WAV header and decodes samples. The file:// prefix is added if missing. MessageRenderer now wraps user and assistant audio bubbles in a container View with paddingHorizontal:16 and marginVertical:8, matching the ChatMessage container layout so bubbles align correctly with the chat edges instead of touching screen borders. Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
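To see why reinterpreting the bytes fails, it helps to look at what a correct manual decode would involve. The sketch below is not the PR's approach (the fix delegates to `decodeAudioData`); it assumes a canonical 44-byte header and 16-bit little-endian samples, and `wavInt16ToFloat32` is a hypothetical helper.

```typescript
// Assumption: canonical PCM WAV with a 44-byte RIFF header and int16 LE samples.
// Real files may carry extra chunks, which is one reason to prefer a proper decoder.
const WAV_HEADER_BYTES = 44;

function wavInt16ToFloat32(wav: ArrayBuffer): Float32Array {
  const view = new DataView(wav);
  const sampleCount = Math.max(0, Math.floor((wav.byteLength - WAV_HEADER_BYTES) / 2));
  const out = new Float32Array(sampleCount);
  for (let i = 0; i < sampleCount; i++) {
    // Skip the header, read each 2-byte sample, scale to [-1, 1).
    out[i] = view.getInt16(WAV_HEADER_BYTES + i * 2, true) / 32768;
  }
  return out;
}
```

Treating the same buffer as `Float32` instead reads the header as audio and consumes four bytes per sample, so every value is wrong, which matches the static described above.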
Audio type attachments were falling through to the FadeInImage branch, causing Image to try to load the WAV file path — resulting in a broken image placeholder that stretched the user bubble very wide (the 'super long' bubble issue). Audio attachments now render as a compact mic icon + 'Voice message' badge (matching the document badge style), keeping the bubble compact. In Audio Mode they never reach this code — they render as AudioMessageBubble. Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
Add isAudioModeMessage to Message type and updateMessageAudio signature. Set flag in triggerAudioModeGeneration so mode switches don't reformat old text messages. MessageRenderer now checks msg.isAudioModeMessage instead of global ttsMode for assistant audio bubbles. Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
Bug 2: handlePlayPause calls speak() for AI bubbles (empty audioPath) instead of playMessage with empty string. Remove isGenerating spinner.
Bug 3: WaveformBars gets flex:1 + overflow:hidden, WAVEFORM_BARS 40→28, bubble overflow:hidden, maxWidth 80%→88%.
Bug 4: user bubble flips play row order (speed+duration left, play right).
Bug 5: voice cycling chip on AI bubbles reads/writes kokoroVoiceId.
Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
- Fix guard: was checking isModelLoaded (OuteTTS, always false) instead of kokoroReady, so isAudioModeMessage was never stamped and all AI messages rendered as text in audio mode
- Add sentence-level streaming TTS: Kokoro now starts speaking each sentence as soon as LLM finishes generating it, instead of waiting for the full response
- Fix waveform invisible in idle state: min bar height 3→6px and empty waveform now renders a sine-wave placeholder instead of nearly-invisible flat bars
Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
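Sentence-level streaming needs a buffer that accumulates tokens and releases text only at sentence boundaries. A minimal sketch, assuming a simple punctuation-plus-whitespace boundary rule; `SentenceBuffer` is a hypothetical name and the PR's actual splitter is not shown.

```typescript
class SentenceBuffer {
  private pending = "";

  // Feed one streamed token; returns the sentences it completed (possibly none).
  push(token: string): string[] {
    this.pending += token;
    const done: string[] = [];
    // Assumed boundary rule: ., ! or ? followed by whitespace.
    const boundary = /[.!?]+\s+/;
    let match = boundary.exec(this.pending);
    while (match !== null) {
      const end = match.index + match[0].length;
      const sentence = this.pending.slice(0, end).trim();
      if (sentence) done.push(sentence);
      this.pending = this.pending.slice(end);
      match = boundary.exec(this.pending);
    }
    return done;
  }

  // Call when streaming ends to release the trailing partial sentence.
  flush(): string[] {
    const rest = this.pending.trim();
    this.pending = "";
    return rest ? [rest] : [];
  }
}
```

Each returned sentence would be handed to the TTS engine as soon as it appears. The boundary rule is deliberately naive: abbreviations such as "e.g. " would split early, so a production splitter likely needs more care.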
Adds memory-rag capability and conversationRagService spec so Jarvis can retrieve relevant context from past conversations and inject it into the system prompt — giving it cross-chat intelligence without requiring the user to repeat themselves. Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
- Stamp isAudioModeMessage BEFORE checking TTS engine readiness, so AI messages always render as audio bubbles even when Kokoro hasn't downloaded yet
- Add minWidth: 220 to audio bubble so flex:1 waveform container has space to expand (previously collapsed to 0 since bubble shrinks to content in flex-end alignment)
- Audio mode input: hide text pill, show centered VoiceRecordButton with 'Hold to speak' / 'Release to send' hint, which clearly communicates the interface mode
- User voice recordings now render as AudioMessageBubble in BOTH chat and audio mode; tap play to hear your recording back regardless of which interface is active
Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
- MessageRenderer now renders ALL assistant messages as audio bubbles when interfaceMode=audio (not just isAudioModeMessage-stamped ones), fixing old messages showing as text after enabling audio mode
- Removed voiceChip from play row; added dedicated voice row below controls with mic icon + voice name + chevron-right to cycle voices
- AudioMessageBubble: streaming-only messages (no audioPath) correctly fall through to speak(transcript) for on-demand playback
- ChatInput audio mode: added +/settings buttons back on left side so users can attach photos and configure tools while in audio mode
Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
Type the whisper HF jest.fn mocks with rest params so spreading args into them type-checks under tsc. Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>
Add OuteTTSEngine download tests: routes through the shared background download engine, falls back to RNFS when unavailable, treats truncated on-disk files as not-downloaded, and rejects/cleans up incomplete downloads. Bumps the pro submodule to the rerouted engines. Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>
The Transcription tab had a custom search bar (icon + border) that looked different from the Text/Image tabs. Reuse the Models screen's shared searchContainer/searchInput styles (and deviceBanner) so the search field is identical across all model tabs. Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>
- CLAUDE.md: require searching for and reusing existing components, styles, hooks, and services before building new ones (prevents UI/logic drift like the divergent search field). - docs/design/MODEL_ROUTING.md: design plan for dynamic model routing/orchestration — text-vs-image classification, load-on-demand with memory-budget eviction, STT/TTS as I/O modalities, and a phased rollout grounded in the existing intentClassifier + activeModelService. Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>
- §5.3: default classifier is SmolLM2-135M-Instruct (~100MB, runs on the existing llama.rn runtime, kept pinned/reserved in the budget); heuristics-first; all-MiniLM-L6-v2 (embeddings) noted as the better-per-MB upgrade if a small embeddings runtime is ever added. - §6: routing + the classifier only run when 2+ generation models are available; a single model is used directly with zero overhead. Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>
Core of model routing's memory guarantee (docs/design/MODEL_ROUTING.md
§5.1-5.2): keep only what's needed resident in RAM.
- policy.ts (pure): computeBudgetMB derives a RAM budget from device memory
(min of 60% and total-minus-1.5GB headroom); planEviction picks victims so
an incoming model fits - generation models (text/image) are mutually
exclusive, pinned models (the ~100MB SMOL classifier) are never evicted,
otherwise LRU.
- index.ts: ModelResidencyManager.ensureResident(spec, {load, unload}) runs
the plan, unloads victims, then loads the target; register() accounts for
already-loaded/pinned models; evictAll() for memory warnings. Load/unload
are injected, so it's decoupled from the text/image/whisper/tts services
and unit-testable.
Not yet wired into the live send path (next step). 11 unit tests cover the
budget math, mutual exclusion, pinned protection, LRU, and the manager.
Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>
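The budget rule and eviction order described in this commit can be sketched as two pure functions. The function names come from the commit message, but the bodies, the `ResidentModel`/`IncomingModel` shapes, and the return type are assumptions made for illustration, not the contents of policy.ts.

```typescript
type ModelKind = "text" | "image" | "classifier" | "stt" | "tts";
interface ResidentModel { id: string; kind: ModelKind; sizeMB: number; pinned: boolean; lastUsed: number }
interface IncomingModel { id: string; kind: ModelKind; sizeMB: number }

const HEADROOM_MB = 1536; // 1.5 GB left for the OS and the app itself

// Budget = min(60% of device RAM, device RAM minus headroom), never negative.
function computeBudgetMB(totalMB: number): number {
  return Math.max(0, Math.min(totalMB * 0.6, totalMB - HEADROOM_MB));
}

const isGeneration = (k: ModelKind): boolean => k === "text" || k === "image";

function planEviction(
  resident: ResidentModel[],
  incoming: IncomingModel,
  budgetMB: number,
): { victims: string[]; fits: boolean } {
  const victims: ResidentModel[] = [];
  let keep = resident.filter((m) => m.id !== incoming.id);

  // Generation models are mutually exclusive: an incoming one displaces the other.
  if (isGeneration(incoming.kind)) {
    for (const m of keep) if (isGeneration(m.kind) && !m.pinned) victims.push(m);
    keep = keep.filter((m) => !victims.includes(m));
  }

  const used = (): number => keep.reduce((sum, m) => sum + m.sizeMB, 0);
  // Remaining candidates: unpinned models, least recently used first.
  const lru = keep.filter((m) => !m.pinned).sort((a, b) => a.lastUsed - b.lastUsed);
  while (used() + incoming.sizeMB > budgetMB && lru.length > 0) {
    const victim = lru.shift()!;
    victims.push(victim);
    keep = keep.filter((m) => m !== victim);
  }
  return { victims: victims.map((v) => v.id), fits: used() + incoming.sizeMB <= budgetMB };
}
```

With these constants the 60% term is the smaller one only above 3840 MB of RAM; at or below that, the 1.5 GB headroom term sets the budget. A 2 GB device therefore gets a 512 MB budget.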
"Enhance Image Prompts" runs the prompt through a text model, so it can't work without one. Both toggles (Model Settings + the generation settings modal) are now disabled and dimmed with a "Download a text model to enable" hint when no text model is downloaded. The generation service already skips enhancement when no text model is loaded, so this is the matching UI gate. Tests updated to seed a text model where the enabled state is asserted, plus a new test for the disabled-without-text-model case. Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>
Route text + image model loads through modelResidencyManager so memory is managed by a single budget (docs/design/MODEL_ROUTING.md), overriding the old hardcoded "<=4GB unloads text" logic:
- activeModelService load paths call makeRoomFor() to evict (LRU, by estimated runtime RAM) and fit the device budget before loading, register the model on load, and release on unload.
- The old per-load critical-memory gate is replaced by the residency budget: a model that can't fit even after eviction is blocked.
- Policy is budget-driven, not hard mutual-exclusion: a high-RAM device can keep a text + image model co-resident if they fit; a constrained device evicts to make room. makeRoomFor reports whether the model fits.
Tests updated for the budget-driven behaviour (co-resident when it fits, evict/block when it doesn't); residency manager beforeEach reset added.
Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>
…tting
The fast/optimised swap-strategy setting is obsolete now that the residency manager owns model swapping (docs/design/MODEL_ROUTING.md). Remove it everywhere:
- appStore: drop the setting + default; rehydrate strips it from old persisted state.
- ModelLoadingStrategy type removed.
- intentClassifier: always restore the original text model after classifying (the residency manager fits it back into memory); the "memory mode keeps classifier loaded" branch is gone.
- Remove both UI toggles (Model Settings + generation settings modal) and the chat preload's strategy gate.
Tests updated/removed accordingly.
Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>
- appStore.lastTextModelId: persisted preference set when the user picks a text model (in useModelLoading). Unlike activeModelId it is not cleared when the residency manager evicts the model, so routing can reload it on demand. - ChatScreen preloads lastTextModelId in the background on open (when no generation model is already loaded), so the user can start typing while it loads. Foundation for on-demand text-model routing (next: classify + load/select). Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>
When the chat has only an image model (or none) loaded and the user sends a message, it now classifies the request and routes correctly instead of always generating an image: - shouldRouteToImageGenerationFn classifies by fast heuristics when no text model is loaded (a chat request returns false instead of forcing image). - handleSendFn: for a chat request with no text model, ensureTextModelForChat loads the last-selected text model (residency evicts the image model to fit) or opens the model selector when none was ever chosen. - startGenerationFn routes by live model state (llmService.isModelLoaded), so a model loaded mid-send generates text rather than mis-routing to image. - ChatMessageArea shows a "Loading <model>" bar above the input while the model loads (also covers the chat-open background preload). Tests cover heuristic routing with no text model and the load-or-select branch. Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>
When no text model is loaded, route text-vs-image with the configured classifier model (SmolLM2) via the LLM intent classifier for real intelligence, instead of keyword heuristics. Falls back to heuristics only when no classifier model is downloaded. Shows the "Understanding your request..." bar while classifying. The classifier loads through activeModelService, so the residency manager accounts for it; with llama.rn's single context it can't be pinned separately from the main text model, so it stays loaded until a text model is needed. Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>
So LLM routing works out of the box: when image-only routing needs a classifier but none is configured, download SmolLM2-135M-Instruct (~100-145MB) in the background via the normal text-model path (visible in the Download Manager) and select it as settings.classifierModelId on completion. Fetches the GGUF from HuggingFace dynamically (prefers Q8_0) so it's robust to exact filenames. Heuristics handle that first turn; subsequent turns use the SMOL model. Guarded against duplicate downloads and no-ops once a classifier is set. Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>
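Preferring a quantization without hard-coding a filename could look like the sketch below. `pickClassifierFile` is a hypothetical helper, and the filenames in the usage line are made up for illustration; the PR does not show how the repo listing is fetched or filtered.

```typescript
// Pick a GGUF from a repo file listing: prefer Q8_0, else fall back to the first GGUF.
function pickClassifierFile(filenames: string[]): string | undefined {
  const ggufs = filenames.filter((f) => f.toLowerCase().endsWith(".gguf"));
  return ggufs.find((f) => /q8_0/i.test(f)) ?? ggufs[0];
}

// Illustrative listing only; real repo contents may differ.
const chosen = pickClassifierFile(["README.md", "model-Q4_K_M.gguf", "model-Q8_0.gguf"]);
```

Matching on the quantization tag rather than the full name is what keeps the lookup working if the repository renames its files.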
DEV_UNLOCK_PRO = __DEV__, which is true in jest, so loadProFeatures always activated Pro and the "no activation without entitlement" assertions failed. Set __DEV__ = false in these suites (restored after) so they exercise the production gating they're meant to verify. Full suite now green. Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>
The chat showed a full-screen LoadingScreen whenever a model was loading, replacing the conversation. Remove it so the chat stays visible and model loading is shown inline via the "Loading model" bar above the input (the image model already loaded inline). Bumps pro (voice-model spinner). Fixes a test that passed vacuously because the full-screen loading hid the chat body (its loadedSettings was incomplete, so hasPendingSettings was actually true) — now loadedSettings matches the full settings. Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>
…STT)
Kill the cold-start wait: on launch, preload the user's selected models in priority order, in the background, sequentially (one native load at a time so the UI stays responsive).
- modelResidency.canLoadWithoutEviction: a model is preloaded only if it fits the RAM budget without evicting a higher-priority one already warmed, so it self-limits on small devices (text always wins; the rest fill the remainder).
- modelPreloader walks text/image/STT (and TTS via the audio.preload hook), loading each only if available + fits + not already loaded.
- whisperStore registers the STT model with the residency manager on load so the budget accounts for it.
- App boot fires preloadSelectedModels() after the UI is shown (fire-and-forget).
Bumps pro (audio.preload hook). Tests cover the fits check, priority ordering, skip-on-no-fit, run-once, and the empty case.
Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>
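The preload rule (priority order, load only what fits without eviction) can be sketched as a planning function. `canLoadWithoutEviction` is named in the commit, but its signature here is assumed, and `planPreload` and `PreloadCandidate` are hypothetical. The real preloader also performs the loads sequentially, which this sketch leaves out.

```typescript
type PreloadKind = "text" | "image" | "stt" | "tts";
interface PreloadCandidate {
  id: string;
  kind: PreloadKind;
  sizeMB: number;
  available: boolean; // downloaded and selected by the user
  loaded: boolean;    // already resident in memory
}

const PRIORITY: PreloadKind[] = ["text", "image", "stt", "tts"];

// A model may be preloaded only if it fits next to everything already resident.
function canLoadWithoutEviction(usedMB: number, sizeMB: number, budgetMB: number): boolean {
  return usedMB + sizeMB <= budgetMB;
}

// Returns the ids to preload, in the order they should be loaded.
function planPreload(candidates: PreloadCandidate[], budgetMB: number): string[] {
  let usedMB = candidates.filter((c) => c.loaded).reduce((sum, c) => sum + c.sizeMB, 0);
  const ordered = [...candidates].sort(
    (a, b) => PRIORITY.indexOf(a.kind) - PRIORITY.indexOf(b.kind),
  );
  const toLoad: string[] = [];
  for (const c of ordered) {
    if (!c.available || c.loaded) continue;
    if (!canLoadWithoutEviction(usedMB, c.sizeMB, budgetMB)) continue; // skip, never evict
    toLoad.push(c.id);
    usedMB += c.sizeMB;
  }
  return toLoad;
}
```

Because a model that does not fit is skipped rather than swapped in, a small model later in the order can still be warmed after a larger one is passed over.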
When a chat request arrived with no text model, routing opened the model selector but dropped the message (the input had already cleared), forcing a retype. Now the message is stashed when the selector opens and replayed automatically once the user picks a text model. - handleSendFn calls setPendingMessage(text, attachments) before aborting. - useChatScreen.handleModelSelect replays the stashed message after the model loads, then clears it. Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>
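The stash-and-replay flow reduces to a single-slot holder that is filled when sending is aborted and drained once a model is chosen. A minimal sketch under assumed names: `PendingMessageSlot`, `handleSend`, and `handleModelSelected` are hypothetical and simplified, and attachments are reduced to strings.

```typescript
interface PendingMessage { text: string; attachments: string[] }
type Send = (msg: PendingMessage) => void;

class PendingMessageSlot {
  private pending: PendingMessage | null = null;

  stash(text: string, attachments: string[]): void {
    this.pending = { text, attachments };
  }

  // Return the stashed message and clear the slot so it is replayed at most once.
  take(): PendingMessage | null {
    const msg = this.pending;
    this.pending = null;
    return msg;
  }
}

// Send immediately when a text model exists; otherwise stash before aborting.
function handleSend(
  msg: PendingMessage,
  hasTextModel: boolean,
  slot: PendingMessageSlot,
  send: Send,
): "sent" | "awaiting-model" {
  if (hasTextModel) {
    send(msg);
    return "sent";
  }
  slot.stash(msg.text, msg.attachments);
  return "awaiting-model";
}

// Called after the user picks a model and it has finished loading.
function handleModelSelected(slot: PendingMessageSlot, send: Send): void {
  const msg = slot.take();
  if (msg) send(msg);
}
```

Clearing inside `take()` rather than after sending means a second model selection cannot replay the same message twice.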
jest.fn mocks were called with args via spread; type them with rest params so tsc passes (babel ran them fine, but the type-check failed). Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>
- new audio.onStreamingToken hook fired from chatStore streaming sink; pro consumes it for real-time sentence-by-sentence TTS. - chat-mode "Select text" action: long-press menu opens a selectable sheet for partial copy (gated to chat mode; audio bubbles unaffected). - bump pro submodule (streaming TTS, unified playback, live speed, no-text audio mode, select-text-free transcript handling). - accumulated session work: model routing/residency, warm preload, hardware SoC/NPU detection, generation service, RAG embedding. Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>
…ntences Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>
Replace the two bulky model cards with a single compact Models control: a labelled strip of four type icons (Text/Image/Voice/Speech), emerald when that type has an active model. Tap → a manager bottom sheet with drill-in rows: - Text/Image reuse the existing model picker - Speech → Whisper picker (single active STT model, download+select) - Voice → the pro voice picker (Kokoro voices) Adds a reactive voiceSummary to the core ui-mode store (mirrored from pro) so the Voice icon reflects voice state without core importing pro. Bumps pro (single-Kokoro voice picker). Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>
Replace the header's text-model name + image badge with a compact "Models ▾" affordance that opens the same Models bottom sheet used on home. Text/Image rows open the existing chat model selector; Speech/Voice open the Whisper/Kokoro pickers. Decouple ModelsManagerSheet from the home-only LoadingState type so it's screen-agnostic (chat cross-imports the sheets for now; shared extraction is a follow-up). Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>
…screens Voice and transcription are now managed entirely via the Models flow (Models tab + home/chat Models sheet), so both standalone settings screens are removed: - delete the two Settings rows, the VoiceSettings/TTSSettings routes, and VoiceSettingsScreen (+ its test) - clean the paths that opened them: chat generation-settings link, and the audio-mode toggle now routes to Models → Voice - bump pro (drops the TTS settings screen registration) Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>
Whisper now tracks every model present on disk (presentModelIds), not just the single active one. The Transcription tab + the Speech picker show each on-disk model as downloaded with the active one checked; tapping a downloaded-but- inactive model selects it (no re-download), download is per-model, and delete is per-model. Adds selectModel / deleteModelById / refreshPresentModels. Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>
…hook; bump pro ActiveModelsSection was orphaned when the home cards became the Models strip. Remove it (+ its test mock), the audioSummaryLabel HOOKS entry, and fix the autoPlay-referencing tests. Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>
…uting - waveform helpers (meanAbsAmplitude / buildWaveformEnvelope / waveformFromText) - stream playback clock - streamingSpeech coordinator: gating (voice mode + engine ready), thinking is never spoken, queue drains through the engine, trailing-partial flush, reset - whisperStore multi-model: refreshPresentModels / selectModel (no re-download) / deleteModelById (active + non-active) - ttsStore: play→synthesize routing, seek no-op for streaming, setEngine fallback to default, live engine speed on updateSettings Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>
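For orientation, the two sample-based waveform helpers might work roughly as below. The names come from the commit message, but the signatures and bodies are assumptions made for illustration, not the tested implementation.

```typescript
// Mean absolute amplitude of samples in [start, end).
function meanAbsAmplitude(samples: Float32Array, start = 0, end = samples.length): number {
  if (end <= start) return 0;
  let sum = 0;
  for (let i = start; i < end; i++) sum += Math.abs(samples[i]);
  return sum / (end - start);
}

// Reduce a PCM buffer to `bars` values in [0, 1], one per waveform bar.
function buildWaveformEnvelope(samples: Float32Array, bars: number): number[] {
  if (bars <= 0) return [];
  const chunk = samples.length / bars;
  const envelope: number[] = [];
  for (let b = 0; b < bars; b++) {
    envelope.push(meanAbsAmplitude(samples, Math.floor(b * chunk), Math.floor((b + 1) * chunk)));
  }
  // Normalise against the loudest bar so quiet recordings still draw visibly.
  const peak = Math.max(0, ...envelope);
  return peak > 0 ? envelope.map((v) => v / peak) : envelope;
}
```

A silent buffer yields an all-zero envelope, which is the case where the sine-wave placeholder mentioned earlier in this PR would take over.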
… home Models card Relocate ModelsSummaryRow/ModelsManagerSheet/VoiceModelsSheet/WhisperPickerSheet from src/screens/HomeScreen/components to src/components/models so the home and chat screens consume one shared implementation instead of two parallel copies. Give the collapsed home Models card the surface+shadows.small treatment to match the other home cards. Add RNTL coverage for ModelsSummaryRow and ModelsManagerSheet, and rewrite the VoiceModelsPanel test for the voice-picker behaviour. Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>
Add knip.json scoped to the core app (src/**, with App.tsx + tests + the pro loader as entries; pro/ is a separate repo with its own usage and stays out of the project graph).
Remove the dead code knip found and grep confirmed unused across both src and pro:
- delete unused LoadingOverlay screen component
- drop the legacy ChatToolbar alias (only QueueRow is consumed)
- remove the dead LoadingScreen chat component and its orphaned import
- remove unused exports: items extractQuantization (huggingface has its own), getHook, showLoadingAlert, processSSELines, parseSSEFromText (+ barrel re-export), PILL_ICONS_WIDTH, SYSTEM_PROMPT_RESERVE, CONTEXT_SAFETY_MARGIN, PRO_URL
- drop redundant default exports duplicating named ones (DebugLogsScreen, RemoteServerModal, RemoteServersScreen)
Kept intentional surfaces that knip flags but that are real: the _clear*ForTesting seams, the provider ModelLoadState/ProviderFactory types, ThemeMode, and stripMarkdownForSpeech (consumed by pro, invisible to a src-scoped graph).
Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>
The earlier home/chat/settings redesign left several RNTL suites asserting UI
that no longer exists. Update the tests (no app code) to assert the current
behaviour:
- HomeScreen / HomeScreenSpotlight: assert the collapsed ModelsSummaryRow
(Models label + Text/Image/Voice/Speech captions) and that per-model details,
load/unload, eject, and "browse more" now live inside the manager + picker
sheets opened from it; drop the deleted LoadingOverlay mock.
- ChatScreen: header now shows a generic "Models" selector that opens the
manager sheet (then the per-type picker), not the model name inline; reset
lastTextModelId between tests for isolation.
- SettingsScreen: the Voice Transcription and Text to Speech rows were removed;
assert their absence and cover the remaining rows.
- ChatInputModeToggle: not-ready tap navigates to ModelsTab { initialTab:
'voice' }.
- TranscriptionModelsTab: cover the multi-model store API (presentModelIds,
selectModel without re-download, per-model deleteModelById).
Full suite: 5629 passing, tsc + eslint clean.
Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>
Advance the pro pointer to the conflict-resolved merge of feat/email-calendar-tools into fix/kokoro-install-status, so this branch references a pro commit that exists on origin and the pro PR merges cleanly. Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>
Summary
This branch builds Audio (Voice) mode end to end and reworks how models are routed, loaded, and selected across the app. The audio feature lives entirely in the private `pro` submodule behind a slot/hook seam, so free builds keep their default behaviour and never link pro code.
It is the full `feat/audio-mode-pro` line of work, branched off `feat/pro-feature-registry` (137 commits).
What's in it
Voice mode (pro submodule, behind the slot/hook seam)
Model routing & residency
Models UX redesign
Housekeeping
src/components/models/ so home and chat use one implementation.
Pro stays private
`pro` is a git submodule (offgrid-pro). This repo only tracks its commit pointer; no pro source is committed here. None of the commits in this PR modify the `pro` gitlink or `.gitmodules`.
Verification
`tsc --noEmit` clean, eslint clean.
`ggml-hexagon/*.so` binaries are intentionally left unstaged.