-
Notifications
You must be signed in to change notification settings - Fork 705
h264 transport with memory2 support #2472
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
base: main
Are you sure you want to change the base?
Changes from all commits
fe7a7de
bccc44c
4e01f9a
683e6a9
adaf929
c540b21
a19705c
dc578e3
32fea13
91ebd3b
f02984a
0588e34
12a4239
e98edac
ee9a851
File filter
Filter by extension
Conversations
Jump to
Diff view
Diff view
There are no files selected for viewing
| Original file line number | Diff line number | Diff line change |
|---|---|---|
|
|
@@ -299,7 +299,7 @@ def _patrol_step(self) -> None: | |
| ) | ||
|
|
||
| annotated = draw_bounding_box( | ||
| image.data.copy(), | ||
| image.require_raw("SecurityModule._detection_step").copy(), | ||
|
Member
There was a problem hiding this comment. Choose a reason for hiding this commentThe reason will be displayed to describe this comment to others. Learn more. why change image type? doesn't this break now if using diff transport? you should transport the actual type modules require, feel free to use some shim that reconstructs the image on .data access
Contributor
There was a problem hiding this comment. Choose a reason for hiding this commentThe reason will be displayed to describe this comment to others. Learn more. @TomCC7 I think it's better to leave Image as is. If you want to have an encoded image message, can't you create a new message type? I think this change complicates the current Image class too much. We had this in the past as well where Image supported CUDA storage and the Image was overly complex.
Member
Author
There was a problem hiding this comment. Choose a reason for hiding this commentThe reason will be displayed to describe this comment to others. Learn more. For module-to-module transport cases, since all encode-decode are handled within transport layer, the image shouldn't require an overload. This addition was required to store encoded videoframe directly into recorder so that we enjoy smaller file size. And since we are doing encoding in transport layer, the message type can't be changed in the current design. The other possible choice is to add a dedicated module to convert type, or directly modify recorder's storage process to encode directly. But then we'll need to duplicate the encode logic into separate places.
Member
Author
There was a problem hiding this comment. Choose a reason for hiding this commentThe reason will be displayed to describe this comment to others. Learn more. @paul-nechifor @leshy created a new pr that patches to this pr with a new design. Can check if that's the right direction: #2504. It basically mimic the current way of handling jpeg for transport/storage. Generally I would say the core conflict here is that we are handling two usecases - a new transport that leverage video as a more efficient encoding during transportation and a new storage format for recorder - both need compression but they are in different layer (so we have to somewhat duplicate the encode/decode logic in two places). One possible future cleanup I mentioned to @leshy before would be to reposition the recorder closer to the transport/import layer, so it can naturally record the already-encoded transport payload instead of receiving decoded module messages and encoding again for storage. That would look more like: That may be a cleaner long-term architecture because transport and recording could share the same encoded packet path directly. But it is a larger architectural change. This patch keeps the existing module/recorder model and follows the repo’s current JPEG-style layering while avoiding encoded Image semantics. |
||
| list(best.bbox), | ||
| label=best.name, | ||
| confidence=best.confidence, | ||
|
|
@@ -340,7 +340,7 @@ def _follow_step(self) -> None: | |
| twist = self._visual_servo.compute_twist(best.bbox, latest_image.width) | ||
| self.cmd_vel.publish(twist) | ||
|
|
||
| overlay = latest_image.data.copy() | ||
| overlay = latest_image.require_raw("SecurityModule._follow_step").copy() | ||
| if hasattr(best, "mask") and best.mask is not None: | ||
| mask_bool = best.mask > 0 | ||
| green = np.zeros_like(overlay) | ||
|
|
||
| Original file line number | Diff line number | Diff line change |
|---|---|---|
| @@ -0,0 +1,51 @@ | ||
| # Copyright 2026 Dimensional Inc. | ||
| # | ||
| # Licensed under the Apache License, Version 2.0 (the "License"); | ||
| # you may not use this file except in compliance with the License. | ||
| # You may obtain a copy of the License at | ||
| # | ||
| # http://www.apache.org/licenses/LICENSE-2.0 | ||
| # | ||
| # Unless required by applicable law or agreed to in writing, software | ||
| # distributed under the License is distributed on an "AS IS" BASIS, | ||
| # WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied. | ||
| # See the License for the specific language governing permissions and | ||
| # limitations under the License. | ||
|
|
||
| from __future__ import annotations | ||
|
|
||
| from dimos.msgs.sensor_msgs.Image import H264_IMAGE_ENCODING, Image | ||
|
|
||
|
|
||
| class H264ImageCodec: | ||
| """memory2 codec for already-H.264 encoded Image payloads. | ||
|
|
||
| This codec deliberately does not decode pixels. It persists an ``Image`` whose | ||
| ``encoding`` is ``"h264"`` and restores the same encoded image on read. A | ||
| separate H.264 decode session turns the encoded stream back into raw Images | ||
| for visualization or module consumption. | ||
| """ | ||
|
|
||
| CODEC_ID = "h264" | ||
|
|
||
| def encode(self, value: Image) -> bytes: | ||
| if value.encoding != H264_IMAGE_ENCODING: | ||
| raise ValueError( | ||
| f"H264ImageCodec stores encoded Images; got encoding={value.encoding!r}" | ||
| ) | ||
| return value.lcm_encode() | ||
|
|
||
| def decode(self, data: bytes) -> Image: | ||
| image = Image.lcm_decode(data) | ||
| if image.encoding != H264_IMAGE_ENCODING: | ||
| raise ValueError( | ||
| f"H264ImageCodec expected encoded Image; got encoding={image.encoding!r}" | ||
| ) | ||
| return image | ||
|
|
||
|
|
||
| def is_h264_image(image: Image) -> bool: | ||
| return image.encoding == H264_IMAGE_ENCODING | ||
|
|
||
|
|
||
| __all__ = ["H264ImageCodec", "is_h264_image"] |
| Original file line number | Diff line number | Diff line change |
|---|---|---|
| @@ -0,0 +1,138 @@ | ||
| # Copyright 2026 Dimensional Inc. | ||
| # | ||
| # Licensed under the Apache License, Version 2.0 (the "License"); | ||
| # you may not use this file except in compliance with the License. | ||
| # You may obtain a copy of the License at | ||
| # | ||
| # http://www.apache.org/licenses/LICENSE-2.0 | ||
| # | ||
| # Unless required by applicable law or agreed to in writing, software | ||
| # distributed under the License is distributed on an "AS IS" BASIS, | ||
| # WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied. | ||
| # See the License for the specific language governing permissions and | ||
| # limitations under the License. | ||
|
|
||
| from __future__ import annotations | ||
|
|
||
| from pathlib import Path | ||
| import platform | ||
|
|
||
| import numpy as np | ||
| import pytest | ||
|
|
||
| from dimos.memory2.backend import Backend | ||
| from dimos.memory2.codecs.base import codec_from_id, codec_id | ||
| from dimos.memory2.codecs.jpeg import JpegCodec | ||
| from dimos.memory2.store.sqlite import SqliteStore | ||
| from dimos.memory2.video.h264 import H264ImageCodec | ||
| from dimos.msgs.sensor_msgs.Image import H264_IMAGE_ENCODING, Image, ImageFormat | ||
|
|
||
| _SKIP_SQLITE_VEC = platform.machine() == "aarch64" or platform.system() == "Darwin" | ||
|
|
||
|
|
||
| def _raw_image(seq: int, fmt: ImageFormat = ImageFormat.RGB) -> Image: | ||
| data = np.full((2, 2, 3), seq, dtype=np.uint8) | ||
| if fmt == ImageFormat.GRAY: | ||
| data = np.full((2, 2), seq, dtype=np.uint8) | ||
| return Image.from_numpy(data, format=fmt, frame_id="cam", ts=float(seq)) | ||
|
|
||
|
|
||
| def _encoded_image(seq: int, *, key: bool = True) -> Image: | ||
| return Image.encoded( | ||
| data=b"\x00\x00\x00\x01\x65" + bytes([seq]), | ||
| encoding=H264_IMAGE_ENCODING, | ||
| format=ImageFormat.RGB, | ||
| frame_id="cam", | ||
| ts=float(seq), | ||
| codec_metadata={ | ||
| "seq": seq, | ||
| "codec": "h264", | ||
| "bitstream": "annex_b", | ||
| "is_keyframe": key, | ||
| "keyframe_seq": seq if key else 0, | ||
| "pts": seq * 90, | ||
| "width": 2, | ||
| "height": 2, | ||
| "channels": 3, | ||
| "dtype": "uint8", | ||
| }, | ||
| ) | ||
|
|
||
|
|
||
| def test_h264_image_codec_roundtrips_encoded_image() -> None: | ||
| codec = H264ImageCodec() | ||
| image = _encoded_image(1) | ||
|
|
||
| decoded = codec.decode(codec.encode(image)) | ||
|
|
||
| assert decoded == image | ||
| assert decoded.encoding == H264_IMAGE_ENCODING | ||
| assert decoded.codec_metadata["seq"] == 1 | ||
| assert decoded.width == 2 | ||
| assert decoded.height == 2 | ||
|
|
||
|
|
||
| def test_h264_image_codec_rejects_raw_images() -> None: | ||
| codec = H264ImageCodec() | ||
|
|
||
| with pytest.raises(ValueError, match="encoded Images"): | ||
| codec.encode(_raw_image(1)) | ||
|
|
||
|
|
||
| def test_codec_id_and_factory_support_h264_for_image() -> None: | ||
| codec = H264ImageCodec() | ||
|
|
||
| assert codec_id(codec) == "h264" | ||
| assert isinstance(codec_from_id("h264", "dimos.msgs.sensor_msgs.Image.Image"), H264ImageCodec) | ||
|
|
||
|
|
||
| def test_h264_stream_stores_encoded_images_with_normal_backend(tmp_path: Path) -> None: | ||
| if _SKIP_SQLITE_VEC: | ||
| pytest.skip("sqlite-vec extension not loadable here") | ||
| db = tmp_path / "h264.db" | ||
| with SqliteStore(path=str(db)) as store: | ||
| stream = store.stream("cam", Image, codec="h264") | ||
| stored = stream.append(_encoded_image(1), ts=1.0) | ||
| assert stored.data.encoding == H264_IMAGE_ENCODING | ||
| assert stored.data.codec_metadata["seq"] == 1 | ||
|
|
||
| with SqliteStore(path=str(db), must_exist=True) as reopened: | ||
| stream = reopened.stream("cam", Image) | ||
| obs = stream.first() | ||
| assert obs.data.encoding == H264_IMAGE_ENCODING | ||
| assert obs.data.codec_metadata["seq"] == 1 | ||
| assert obs.data.width == 2 | ||
|
|
||
|
|
||
| def test_h264_replay_emits_encoded_images(tmp_path: Path) -> None: | ||
| if _SKIP_SQLITE_VEC: | ||
| pytest.skip("sqlite-vec extension not loadable here") | ||
| store = SqliteStore(path=str(tmp_path / "replay.db")) | ||
| stream = store.stream("cam", Image, codec="h264") | ||
| stream.append(_encoded_image(1), ts=1.0) | ||
| stream.append(_encoded_image(2, key=False), ts=2.0) | ||
|
|
||
| replayed = list(store.replay().streams.cam.iterate()) | ||
|
|
||
| assert [image.encoding for image in replayed] == [H264_IMAGE_ENCODING, H264_IMAGE_ENCODING] | ||
| assert [image.codec_metadata["seq"] for image in replayed] == [1, 2] | ||
|
|
||
|
|
||
| def test_default_image_stream_still_uses_jpeg_codec(tmp_path: Path) -> None: | ||
| if _SKIP_SQLITE_VEC: | ||
| pytest.skip("sqlite-vec extension not loadable here") | ||
| store = SqliteStore(path=str(tmp_path / "jpeg.db")) | ||
| stream = store.stream("rgb", Image) | ||
|
|
||
| source = stream._source | ||
| assert isinstance(source, Backend) | ||
| assert isinstance(source.codec, JpegCodec) | ||
|
|
||
|
|
||
| def test_encoded_images_reject_pixel_operations() -> None: | ||
| image = _encoded_image(1) | ||
|
|
||
| with pytest.raises(ValueError, match="requires raw Image data"): | ||
| image.to_rgb() | ||
| with pytest.raises(ValueError, match="requires raw Image data"): | ||
| image.as_numpy() |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
The current transports have
type: ignorefor legacy reasons. You shouldn't copy that. Use proper types.