This repository contains the implementation of an open-vocabulary semantic mapping pipeline based on TALOS and Voxeland. By leveraging large vision-language models (LVLMs) and category-agnostic segmentation models, TALOS produces rich semantic descriptions of individual images, unconstrained by the categories present in any particular training dataset. These open-vocabulary object detections are integrated over time into a persistent, coherent map in Voxeland, which uses a probabilistic voxel-based representation to combine the individual detections and represent both the geometry and the semantic categories present in the scene.
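To make the idea of combining detections in a probabilistic voxel map more concrete, here is a minimal illustrative sketch in Python. It is not the Voxeland or TALOS code: the class `SemanticVoxelMap`, its methods, the `voxel_size` parameter, and the simple "sum the confidences, then normalize" fusion rule are all assumptions chosen for illustration. The actual update rule and data structures are described in the Voxeland repository.

```python
import math
from collections import defaultdict


class SemanticVoxelMap:
    """Toy voxel map (hypothetical, for illustration only).

    Each voxel accumulates evidence per free-form category label, so the
    vocabulary is never fixed in advance.
    """

    def __init__(self, voxel_size=0.05):
        self.voxel_size = voxel_size
        # voxel index (i, j, k) -> {label: accumulated evidence}
        self.evidence = defaultdict(lambda: defaultdict(float))

    def key(self, point):
        """Map a 3D point in metres to its integer voxel index."""
        return tuple(math.floor(c / self.voxel_size) for c in point)

    def integrate(self, points, label, confidence):
        """Fuse one detection: add its confidence to every voxel it touches."""
        # Deduplicate so a voxel hit by many points is updated only once.
        for voxel in {self.key(p) for p in points}:
            self.evidence[voxel][label] += confidence

    def distribution(self, point):
        """Return the normalized category distribution at a point."""
        scores = self.evidence.get(self.key(point))
        if not scores:
            return {}  # unobserved voxel
        total = sum(scores.values())
        return {label: s / total for label, s in scores.items()}


# Two detections from different frames that overlap the same voxel.
voxel_map = SemanticVoxelMap(voxel_size=0.1)
voxel_map.integrate([(0.01, 0.02, 0.03)], "mug", 0.75)
voxel_map.integrate([(0.04, 0.05, 0.06)], "cup", 0.25)
fused = voxel_map.distribution((0.02, 0.02, 0.02))
```

In this sketch, disagreeing labels for the same voxel are kept side by side as a distribution rather than overwritten, which is the general property a probabilistic representation provides. Any new label string simply becomes a new entry in the per-voxel dictionary.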
Please see the individual repos for TALOS and Voxeland for detailed instructions.