feat(audio): add AudioModule for issue #1932 by GuoZhuoRan · Pull Request #2507 · dimensionalOS/dimos

GuoZhuoRan · 2026-06-16T12:47:50Z

Adds mic audio capture and chunked publishing as AudioStamped on an Out stream, mirroring CameraModule. Validated on macOS Apple Silicon at 50 Hz / 20 ms frames with both synthetic (sine tone) and real mic sources.

dimos/msgs/audio_msgs/AudioStamped.py: Python overlay wrapping foxglove_msgs.RawAudio for LCM encode/decode, with from_pcm() and to_numpy() helpers. Flags that builtin_interfaces.Time (not std_msgs.Header) is the wire type, so frame_id is not preserved.
dimos/hardware/sensors/audio/module.py: AudioModule(Module) with AudioConfig(ModuleConfig), async def main() lifecycle, @rpc start/stop, @skill record_clip.
examples/audio/validate_audio_module.py: LCM round-trip assert + live stream rate/timestamp validation.
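
To make the framing concrete, here is a minimal sketch of the arithmetic behind 20 ms frames at a 50 Hz publish rate, plus a round trip of a synthetic sine tone through S16LE bytes. The 16 kHz mono configuration and the helper names (`frame_bytes`, `sine_frame`, `pcm_to_samples`) are assumptions for illustration; they are not the PR's API and do not reproduce `AudioStamped.from_pcm()` or `to_numpy()`.

```python
import math
import struct

SAMPLE_RATE = 16_000   # Hz (assumed; the description does not state the rate)
CHANNELS = 1           # mono (assumed)
FRAME_MS = 20          # 20 ms frames, i.e. 50 frames per second
BYTES_PER_SAMPLE = 2   # S16LE is two bytes per sample


def frame_bytes(sample_rate: int, channels: int, frame_ms: int) -> int:
    """Bytes in one interleaved S16LE frame of the given duration."""
    samples_per_channel = sample_rate * frame_ms // 1000
    return samples_per_channel * channels * BYTES_PER_SAMPLE


def sine_frame(freq_hz: float, sample_rate: int, frame_ms: int) -> bytes:
    """One mono frame of a half-scale sine tone, packed as little-endian int16."""
    n = sample_rate * frame_ms // 1000
    samples = [
        int(round(0.5 * 32767 * math.sin(2 * math.pi * freq_hz * i / sample_rate)))
        for i in range(n)
    ]
    return struct.pack(f"<{n}h", *samples)


def pcm_to_samples(data: bytes) -> list:
    """Decode little-endian int16 PCM bytes back into integer samples."""
    n = len(data) // BYTES_PER_SAMPLE
    return list(struct.unpack(f"<{n}h", data[: n * BYTES_PER_SAMPLE]))


frame = sine_frame(440.0, SAMPLE_RATE, FRAME_MS)
samples = pcm_to_samples(frame)
publish_rate_hz = 1000 / FRAME_MS
```

Under these assumptions one frame is 320 samples (640 bytes), and re-packing `samples` yields the original bytes, which is the same property the LCM round-trip assert in the validation script is described as checking.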

Problem

Closes DIM-XXX

Solution

How to Test

Contributor License Agreement

I have read and approved the CLA.

Adds mic audio capture and chunked publishing as AudioStamped on an Out stream, mirroring CameraModule. Validated on macOS Apple Silicon at 50 Hz / 20 ms frames with both synthetic (sine tone) and real mic sources.

- dimos/msgs/audio_msgs/AudioStamped.py: Python overlay wrapping foxglove_msgs.RawAudio for LCM encode/decode, with from_pcm() and to_numpy() helpers. Flags that builtin_interfaces.Time (not std_msgs.Header) is the wire type, so frame_id is not preserved.
- dimos/hardware/sensors/audio/module.py: AudioModule(Module) with AudioConfig(ModuleConfig), async def main() lifecycle, @rpc start/stop, @skill record_clip.
- examples/audio/validate_audio_module.py: LCM round-trip assert + live stream rate/timestamp validation.

Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>

greptile-apps · 2026-06-16T12:53:02Z

Greptile Summary

This PR adds a full audio pipeline to dimos, centered on AudioModule (mic capture via PortAudio or synthetic sine tone, chunked as AudioStamped on an Out stream). In addition to what the PR description highlights, the single new file module.py also ships SpeakerModule, SpeechToTextModule, TextToSpeechModule, and FunVoiceEffectsModule, plus an audio_speech_loopback blueprint — a significantly larger surface than the description implies.

AudioStamped wraps foxglove_msgs.RawAudio for LCM transport with from_pcm() / to_numpy() helpers and acknowledged limitations (no frame_id on the wire).
AudioModule mirrors CameraModule's lifecycle pattern (async def main with yield, @rpc start/stop, @skill record_clip) and supports both real PortAudio capture and a synthetic sine-tone fallback.
The file also introduces SpeakerModule, SpeechToTextModule (whisper.cpp / faster-whisper / openai-whisper with VAD and AEC), TextToSpeechModule (pyttsx3 / macOS say / OpenAI), and FunVoiceEffectsModule (pitch shift, ring mod, bitcrush, echo), all wired together through an autoconnect blueprint.

Confidence Score: 5/5

Safe to merge with the caveat that all new audio modules share some class-level mutable defaults that would corrupt multi-instance deployments.

All new findings are non-blocking quality issues. The two most notable are as follows. First, FunVoiceEffectsModule._processor and SpeakerModule._stream_lock are class-level objects rather than per-instance attributes, which is innocuous in the current single-instance blueprints but a latent correctness hazard if these modules are ever instantiated more than once. Second, the validate script explicitly passes time.monotonic() timestamps, making its timestamp-monotonicity assertion vacuously true.

dimos/hardware/sensors/audio/module.py deserves a second pass on the class-level attribute declarations for SpeakerModule and FunVoiceEffectsModule.
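
The class-level attribute hazard can be illustrated with a small standalone sketch. `SharedLockSpeaker` and `PerInstanceSpeaker` are hypothetical stand-ins, not the PR's classes, and the real Module base class may construct attributes differently; the point is only how Python treats an object assigned in the class body.

```python
import threading


class SharedLockSpeaker:
    """Hypothetical stand-in: the lock is created once, at class definition."""

    _stream_lock = threading.Lock()  # one object shared by every instance


class PerInstanceSpeaker:
    """Hypothetical stand-in: each instance creates its own lock."""

    def __init__(self) -> None:
        self._stream_lock = threading.Lock()


a, b = SharedLockSpeaker(), SharedLockSpeaker()
shared = a._stream_lock is b._stream_lock

c, d = PerInstanceSpeaker(), PerInstanceSpeaker()
separate = c._stream_lock is not d._stream_lock

# With the shared lock, holding it through `a` also blocks `b`.
with a._stream_lock:
    b_blocked = not b._stream_lock.acquire(blocking=False)
```

Moving the assignment into `__init__` (or whatever per-instance setup hook the framework provides) is the usual way to avoid two instances serializing on, or mutating, the same object.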

Important Files Changed

Filename	Overview
dimos/hardware/sensors/audio/module.py	1701-line new file adding AudioModule, SpeakerModule, SpeechToTextModule, TextToSpeechModule, and FunVoiceEffectsModule, plus autoconnect blueprints. Contains a shared class-level `_FunVoiceProcessor` instance and a shared `threading.Lock` in SpeakerModule that would corrupt multi-instance usage, and record_clip's format documentation is misleading.
dimos/msgs/audio_msgs/AudioStamped.py	New LCM-serializable AudioStamped wrapper with from_pcm factory, to_numpy helper, and lcm_encode/decode. The from_pcm fallback timestamp still uses time.monotonic() (flagged in prior review rounds); otherwise the encode/decode logic is correct.
examples/audio/validate_audio_module.py	Validation script for LCM round-trip and live stream rate. Explicitly passes time.monotonic() to from_pcm() — inconsistent with the production module — making the timestamp monotonicity check trivially vacuous.
dimos/robot/all_blueprints.py	Auto-generated blueprint registry updated to include audio-speech-loopback and demo-audio entries pointing to the new module.
dimos/hardware/sensors/audio/__init__.py	New empty package init with license header only.
dimos/msgs/audio_msgs/__init__.py	New empty package init with license header only.

Sequence Diagram

%%{init: {'theme': 'neutral'}}%%
sequenceDiagram
    participant MIC as Microphone / Synthetic
    participant AM as AudioModule
    participant STT as SpeechToTextModule
    participant TTS as TextToSpeechModule
    participant FVE as FunVoiceEffectsModule
    participant SPK as SpeakerModule

    MIC->>AM: PCM frames (PortAudio callback / asyncio loop)
    AM->>AM: "wrap → AudioStamped (ts=time.time())"
    AM-->>STT: mic_audio Out stream
    STT->>STT: VAD + AEC filter
    STT->>STT: whisper transcription
    STT-->>TTS: speech_text Out stream
    TTS->>TTS: pyttsx3 / macos-say / OpenAI TTS
    TTS-->>FVE: tts_audio_raw Out stream
    TTS-->>STT: tts_reference_audio (AEC ref)
    FVE->>FVE: pitch / ringmod / bitcrush / echo
    FVE-->>SPK: tts_audio Out stream
    SPK->>SPK: sd.OutputStream.write()

Reviews (8): Last reviewed commit: "fix: use wall-clock audio timestamps and..."

greptile-apps · 2026-06-16T12:53:09Z

+    @skill
+    def record_clip(self, seconds: float = 1.0) -> bytes:
+        """Record and return a clip of raw PCM audio.
+
+        Collects frames from the live audio stream for `seconds` seconds and
+        returns them concatenated as raw S16LE PCM bytes.
+        """
+        import threading
+
+        buf: list[bytes] = []
+        done = threading.Event()
+        collected = [0.0]
+
+        def on_frame(msg: AudioStamped) -> None:
+            buf.append(msg.data)
+            collected[0] += self.config.frame_ms / 1000.0
+            if collected[0] >= seconds:
+                done.set()
+
+        unsub = self.audio.subscribe(on_frame)
+        done.wait(timeout=seconds + 2.0)
+        unsub()
+        return b"".join(buf)


record_clip silently returns empty bytes if the module is not running

If record_clip is called before start() or after stop(), no frames will ever arrive, done.wait will time out after seconds + 2.0 seconds, and the method returns b"" with no error or log message. Callers have no way to distinguish a successful empty recording from a misconfigured call. At minimum, a log warning on timeout (or a raised exception) would surface the problem.
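
One way to surface the problem is sketched below, under stated assumptions: `collect_clip` is a hypothetical free function, and its `subscribe` argument stands in for `self.audio.subscribe`, assumed to take a callback and return an unsubscribe callable as in the snippet above. It raises when no frames arrive and logs a warning when the clip is short. `live_stream` and `dead_stream` are fake sources used only to exercise it.

```python
import logging
import threading
from typing import Callable

log = logging.getLogger(__name__)

Unsubscribe = Callable[[], None]
FrameCallback = Callable[[bytes], None]


def collect_clip(
    subscribe: Callable[[FrameCallback], Unsubscribe],
    frame_ms: float,
    seconds: float,
    grace_s: float = 2.0,
) -> bytes:
    """Collect about `seconds` of PCM; raise if the stream stays silent."""
    buf = []
    done = threading.Event()
    lock = threading.Lock()  # callbacks may run on another thread
    target_frames = max(1, round(seconds * 1000.0 / frame_ms))

    def on_frame(data: bytes) -> None:
        with lock:
            if len(buf) < target_frames:
                buf.append(data)
            if len(buf) >= target_frames:
                done.set()

    unsub = subscribe(on_frame)
    try:
        finished = done.wait(timeout=seconds + grace_s)
    finally:
        unsub()  # always detach, even if wait() is interrupted

    with lock:
        if not buf:
            raise TimeoutError("no audio frames arrived; is the module running?")
        if not finished:
            log.warning("clip is short: got %d of %d frames", len(buf), target_frames)
        return b"".join(buf)


def live_stream(callback: FrameCallback) -> Unsubscribe:
    """Fake source: delivers five 20 ms mono S16LE frames immediately."""
    for _ in range(5):
        callback(b"\x00\x00" * 320)
    return lambda: None


def dead_stream(callback: FrameCallback) -> Unsubscribe:
    """Fake source: never delivers anything, like a module that is not started."""
    return lambda: None


clip = collect_clip(live_stream, frame_ms=20, seconds=0.1)
```

Whether an exception or a logged warning is the better contract for a skill is a design choice for the maintainers; the sketch only shows that the two cases (no frames, too few frames) can be told apart.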

greptile-apps · 2026-06-16T12:53:10Z

+    def __repr__(self) -> str:
+        n_samples = len(self.data) // (2 if "16" in self.sample_format else 4)
+        return (
+            f"AudioStamped(rate={self.sample_rate}, ch={self.channels}, "
+            f"fmt={self.sample_format}, samples={n_samples}, ts={self.ts:.6f})"
+        )


The n_samples heuristic does not divide by self.channels, so for multi-channel audio the repr reports total interleaved samples (e.g. 640 for 20 ms of stereo 16 kHz) rather than samples per channel (320). The existing byte-width check ("16" in self.sample_format) also silently falls back to 4 bytes/sample for any unknown format string, which could produce a nonsensical count.

Suggested change

-    def __repr__(self) -> str:
-        n_samples = len(self.data) // (2 if "16" in self.sample_format else 4)
-        return (
-            f"AudioStamped(rate={self.sample_rate}, ch={self.channels}, "
-            f"fmt={self.sample_format}, samples={n_samples}, ts={self.ts:.6f})"
-        )
+    def __repr__(self) -> str:
+        bytes_per_sample = 2 if "16" in self.sample_format else 4
+        n_frames = len(self.data) // (bytes_per_sample * self.channels)
+        return (
+            f"AudioStamped(rate={self.sample_rate}, ch={self.channels}, "
+            f"fmt={self.sample_format}, frames={n_frames}, ts={self.ts:.6f})"
+        )
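
The suggestion still falls back to 4 bytes for unknown formats. A stricter variant is sketched here with an explicit width table; the format names in `BYTES_PER_SAMPLE` and the helper `frames_per_channel` are assumptions for illustration, not identifiers from the PR.

```python
# Assumed format names; the PR may use different strings.
BYTES_PER_SAMPLE = {"S16LE": 2, "S32LE": 4, "F32LE": 4}


def frames_per_channel(n_bytes: int, sample_format: str, channels: int) -> int:
    """Frames per channel in an interleaved PCM buffer of n_bytes."""
    try:
        width = BYTES_PER_SAMPLE[sample_format]
    except KeyError:
        raise ValueError(f"unknown sample format: {sample_format!r}") from None
    if channels < 1:
        raise ValueError("channels must be >= 1")
    return n_bytes // (width * channels)


# 20 ms at 16 kHz is 16,000 * 0.020 = 320 frames per channel.
# Stereo S16LE therefore occupies 320 * 2 channels * 2 bytes = 1280 bytes.
stereo_frames = frames_per_channel(1280, "S16LE", 2)
```

Raising inside `__repr__` itself is usually undesirable, so in practice a repr would more likely catch the error and print a placeholder such as `frames=?`.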

leshy · 2026-06-16T14:58:02Z

+    audio metadata.  Serialises to/from foxglove_msgs.RawAudio on the wire.
+    """
+
+    msg_name = "foxglove_msgs.RawAudio"  # wire type used for LCM


we don't use foxglove, where does this come from?

Foxglove is not a new dependency; it's already mirrored into dimos_lcm, and RawAudio is the only audio type in there, so I reused it. Left a note that it's a stand-in pending a native Header-bearing type.

leshy · 2026-06-16T14:59:01Z

+    def lcm_encode(self) -> bytes:
+        """Encode to foxglove_msgs.RawAudio wire bytes.
+
+        NOTE: frame_id and seq from self.header are NOT preserved (the wire


ros2 header has no seq

why not preserve frame_id?

My comment is wrong; I will fix it. frame_id does exist in the header, but RawAudio carries only a timestamp, so there is no frame_id field on the wire type to put it in. Preserving frame_id would mean adding a Header-bearing audio type to dimos-lcm; we can discuss it today.

- Remove all mentions of `seq` (ROS2 std_msgs/Header has no seq field)
- Reword frame_id note: dropped because RawAudio has no frame_id field on the wire, not by design choice
- Mark foxglove_msgs.RawAudio as a temporary stand-in pending team decision on a native Header-bearing LCM type

Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>

greptile-apps · 2026-06-17T02:38:27Z

+        ts: float | None = None,
+    ) -> AudioStamped:
+        """Construct from raw PCM bytes."""
+        t = ts if ts is not None else time.monotonic()


The from_pcm factory's fallback timestamp uses time.monotonic(), which returns an opaque system-relative counter (typically seconds since boot) rather than a Unix wall-clock time. Any caller that omits the ts argument, including external consumers of this public API, will create an AudioStamped whose ts field is near 0–86400 rather than near the current Unix time (~1.7 × 10⁹). This makes Timestamped.dt() return a date in 1970 and breaks cross-stream alignment with any module that uses time.time().

Suggested change

-        t = ts if ts is not None else time.monotonic()
+        t = ts if ts is not None else time.time()
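
The difference between the two clocks can be seen in a short standalone sketch. `default_stamp` is a hypothetical helper mirroring only the fallback line above, not the PR's `from_pcm`.

```python
import time
from datetime import datetime, timezone
from typing import Optional


def default_stamp(ts: Optional[float] = None) -> float:
    """Fall back to Unix wall-clock seconds when no timestamp is supplied."""
    return ts if ts is not None else time.time()


wall = default_stamp()

# A wall-clock stamp converts to a present-day date. A time.monotonic()
# reading has an unspecified reference point, so interpreting it as Unix
# time gives a meaningless date.
wall_year = datetime.fromtimestamp(wall, tz=timezone.utc).year

# time.monotonic() remains the right tool for measuring elapsed intervals,
# because it never jumps backwards when the system clock is adjusted.
t0 = time.monotonic()
elapsed = time.monotonic() - t0
```

In short, wall-clock time suits timestamps that must line up across modules, while the monotonic clock suits durations and timeouts.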

greptile-apps · 2026-06-17T08:33:39Z

+        (TextToSpeechModule, "text", "speech_text"),
+        (TextToSpeechModule, "audio", "tts_audio"),
+        (SpeakerModule, "audio", "tts_audio"),
+    ]
+)
+
+
+demo_audio = autoconnect(
+    AudioModule.blueprint(),


TextToSpeechModule publishes frames with time.monotonic() timestamps

Every chunk published from _worker_loop uses time.monotonic() as its timestamp. time.monotonic() returns a system-relative counter (seconds since boot), not a Unix wall-clock time. Downstream consumers calling Timestamped.dt() will get dates in 1970, and cross-stream alignment with any module that uses time.time() (e.g., CameraModule) will fail. Replace with time.time() to match the rest of the stack.

GuoZhuoRan requested review from leshy, mustafab0, paul-nechifor and spomichter as code owners June 16, 2026 12:47

greptile-apps Bot reviewed Jun 16, 2026

leshy reviewed Jun 16, 2026

greptile-apps Bot reviewed Jun 17, 2026

GuoZhuoRan and others added 2 commits June 17, 2026 14:01

feat(audio): register AudioModule in dimos CLI

459ab1f

Add demo_audio blueprint to module.py and regenerate all_blueprints.py so AudioModule is accessible via:
  dimos run demo-audio (blueprint)
  dimos run audio-module (standalone module)

Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>

feat(audio): add audio speech loopback pipeline

cbe8411

greptile-apps Bot reviewed Jun 17, 2026

GuoZhuoRan added 4 commits June 17, 2026 16:42

feat(audio): add fun voice effects chain

732e60c

fix(audio): add OpenAI TTS timeout + fail-fast

68858a4

fix: reduce audio loopback echo and STT backlog

49391ee

fix: use wall-clock audio timestamps and clean subscriptions

f36bb77

GuoZhuoRan wants to merge 8 commits into dimensionalOS:main from GuoZhuoRan:feat/audio-module-1932
