feat(voice): add voice input with local whisper transcription and claude cleanup #25

Open
tyulyukov wants to merge 1 commit into main from marcode/voice-prompting-flow

@tyulyukov
Owner

Summary

  • Server-side voice transcription: Added Effect-based TranscriptionService with local Whisper model support via HuggingFace transformers
  • Child process architecture: Whisper inference runs in a forked child process to avoid blocking the main thread during model loading/inference
  • Model management: Automatic installation of Whisper models (tiny/base/small/medium) on first use, cached in ~/.marcode/whisper-models/
  • LLM cleanup: Optional Claude-powered cleanup of raw transcription (removes filler words, fixes grammar while preserving technical terms)
  • Voice settings: New VoiceSettingsSection in settings for model selection, language choice, and cleanup toggle
  • Web integration: Voice recording hook (useVoiceRecording), mic button in composer, keybinding mod+shift+v to toggle recording
  • WebSocket API: New methods for transcription, cleanup, model install/delete; push channel for download progress
  • Persistent context: The additional directories picker has moved out of the attachments popover and is now a DirectoryPickerPopover in the composer footer
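
To make the model management bullet concrete, here is a minimal sketch of how a cache location under ~/.marcode/whisper-models/ could be resolved for the four listed model sizes. The names `WHISPER_MODELS`, `modelCacheDir`, `isWhisperModel`, and `isModelInstalled` are hypothetical, and the one-subdirectory-per-model layout is an assumption; the actual service in this PR may organize its cache differently.

```typescript
import { existsSync } from "node:fs";
import { homedir } from "node:os";
import { join } from "node:path";

// Model sizes named in the summary above.
export const WHISPER_MODELS = ["tiny", "base", "small", "medium"] as const;
export type WhisperModel = (typeof WHISPER_MODELS)[number];

// Assumption: each model lives in its own subdirectory of
// ~/.marcode/whisper-models/. The `home` parameter exists only to make
// the function easy to exercise without touching the real home directory.
export function modelCacheDir(model: WhisperModel, home: string = homedir()): string {
  return join(home, ".marcode", "whisper-models", model);
}

// Narrow an arbitrary string (for example, a settings value) to a known model.
export function isWhisperModel(value: string): value is WhisperModel {
  return (WHISPER_MODELS as readonly string[]).includes(value);
}

// Hypothetical check: treat a model as installed when its directory exists.
// A real implementation would probably also verify the downloaded files.
export function isModelInstalled(model: WhisperModel, home: string = homedir()): boolean {
  return existsSync(modelCacheDir(model, home));
}
```

A caller could run `isModelInstalled(model)` before transcribing and trigger the install flow on first use when it returns false.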

Testing

  • Run bun fmt, bun lint, bun typecheck to verify code quality
  • Toggle voice recording via the mod+shift+v keybinding
  • Test transcription with audio input in different languages (if enabled)
  • Verify model installation flow and progress updates
  • Test cleanup toggle with and without Claude cleanup enabled
  • Verify additional directories picker still works in composer footer
  • Check that WebSocket test expectations are updated for the new whisper field in the server config

- Server: Add LocalWhisperTranscriptionService with Xenova/whisper model support, model caching, and child process management
- Web: Add VoiceMicButton and useVoiceRecording hook with real-time frequency visualization during recording
- Contracts: Add transcription schemas and WebSocket methods for transcribe/cleanup/install/delete
- Settings: Add voice language, model selection, and optional LLM cleanup toggle (Claude polishes raw transcription)
- Keybindings: Add mod+shift+v to toggle voice recording
- Audio: Add WAV-to-PCM decoding for Whisper inference
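
As an illustration of the WAV-to-PCM step, the following is a minimal sketch of decoding a 16-bit integer PCM WAV file into mono Float32 samples. `decodeWav16` and `DecodedAudio` are hypothetical names, not taken from this PR, and the sketch assumes canonical little-endian RIFF input. Whisper models operate on 16 kHz mono audio, so a real decoder would likely also resample when the source rate differs; that step is omitted here.

```typescript
// Sketch only: decodes 16-bit PCM WAV into mono Float32 samples in [-1, 1).
// decodeWav16 and DecodedAudio are hypothetical names for illustration.
export interface DecodedAudio {
  sampleRate: number;
  samples: Float32Array;
}

// Read a four-character chunk tag such as "RIFF" or "data".
function readTag(view: DataView, offset: number): string {
  return String.fromCharCode(
    view.getUint8(offset),
    view.getUint8(offset + 1),
    view.getUint8(offset + 2),
    view.getUint8(offset + 3),
  );
}

export function decodeWav16(bytes: Uint8Array): DecodedAudio {
  const view = new DataView(bytes.buffer, bytes.byteOffset, bytes.byteLength);
  if (bytes.byteLength < 12 || readTag(view, 0) !== "RIFF" || readTag(view, 8) !== "WAVE") {
    throw new Error("not a RIFF/WAVE file");
  }

  let channels = 0;
  let sampleRate = 0;
  let offset = 12;

  // Walk the chunk list: "fmt " describes the audio, "data" holds the samples.
  while (offset + 8 <= view.byteLength) {
    const id = readTag(view, offset);
    const size = view.getUint32(offset + 4, true);
    const body = offset + 8;

    if (id === "fmt ") {
      const format = view.getUint16(body, true);
      channels = view.getUint16(body + 2, true);
      sampleRate = view.getUint32(body + 4, true);
      const bitsPerSample = view.getUint16(body + 14, true);
      if (format !== 1 || bitsPerSample !== 16) {
        throw new Error("only 16-bit integer PCM is handled in this sketch");
      }
    } else if (id === "data") {
      if (channels === 0) throw new Error("data chunk before fmt chunk");
      // Tolerate a declared size that runs past the end of the buffer.
      const available = Math.min(size, view.byteLength - body);
      const frames = Math.floor(available / (2 * channels));
      const samples = new Float32Array(frames);
      for (let i = 0; i < frames; i++) {
        let sum = 0;
        for (let c = 0; c < channels; c++) {
          sum += view.getInt16(body + (i * channels + c) * 2, true);
        }
        // Average the channels down to mono, then scale to [-1, 1).
        samples[i] = sum / channels / 32768;
      }
      return { sampleRate, samples };
    }

    // Chunks are padded to an even number of bytes.
    offset = body + size + (size % 2);
  }
  throw new Error("no data chunk found");
}
```

Returning the sample rate alongside the samples lets the caller decide whether resampling is needed before handing the audio to the model.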
