Realtime Gaussian-splat inspector with CUDA/CPU rendering, SigLIP2/CLIP text querying, DINOv2 image-feature querying, queried-feature bounding boxes, and Nerfstudio-style camera-path video export.
Mini Viewer loads Gaussian scenes from .ply files or NumPy folders, overlays bbox scripts through viser_bbox, visualizes aligned feature tensors, queries those tensors with text, image prompts, or precomputed vectors, exports query bounding boxes, and renders camera paths to MP4.
- Scene loading: Inria/3DGS-style `.ply` files or NumPy folders with `coord.npy`, `quat.npy`, `scale.npy`, `opacity.npy`, and `color.npy`.
- CUDA renderer: `gsplat` backend with PyTorch CUDA 12.4 wheels.
- CPU renderer: torch/numpy fallback backend for CPU-only runs, forced CPU rendering, or retrying failed CUDA renders.
- Feature tensors: aligned `.npy`, `.npz`, `.pt`, or `.pth` tensors for SigLIP, CLIP, DINO, or custom embeddings.
- SigLIP2 text queries: default text encoder is `google/siglip2-so400m-patch16-512`.
- CLIP text queries: optional OpenCLIP path through `--feature-type clip`.
- DINOv2 image queries: default visual encoder is `facebook/dinov2-base`; queries use an image path or a precomputed vector.
- Feature maps: PCA/preview recoloring of high-dimensional feature tensors.
- Query bbox: threshold the query score, toggle bbox visibility, and export `outputs/query_bbox.json`.
- Camera paths: place cameras, export a Nerfstudio-style `camera_path.json`, and render MP4 video.
- Release checks: static validation script, pytest smoke test, GitHub Actions CI, `.gitattributes`, and `.gitignore`.
- `README.md`: main setup and usage guide.
- `env.yml`: full Conda environment (CUDA 12.4, gsplat, viewer, query encoders).
- `requirements.txt`: pip equivalent of `env.yml`.
- `pyproject.toml`: project metadata, console script, Ruff, and pytest config.
- `CHANGELOG.md`: release notes.
- `RELEASE_CHECKLIST.md`: manual release checklist.
- `run_viewer.py`: main viewer CLI.
- `core/splat.py`: PLY/NumPy/aligned-feature loading.
- `core/renderer.py`: CUDA/torch/CPU-fallback rendering.
- `core/viewer.py`: nerfview/viser integration.
- `actions/language_feature.py`: generic SigLIP/CLIP/DINO/query-vector feature UI.
- `actions/camera_path.py`: camera placement, export, and GUI video rendering.
- `models/clip_query.py`: lazy SigLIP2, CLIP, and DINOv2 query encoders.
- `scripts/download_siglip2.py`: SigLIP2 cache helper.
- `scripts/download_dino.py`: DINOv2 cache helper.
- `scripts/render_camera_path.py`: headless camera-path renderer.
- `scripts/smoke_test_loaders.py`: data-loader smoke test.
- `scripts/validate_release.py`: static release validator.
Old split files such as `requirements-common.txt`, `requirements-cpu.txt`, `requirements-cuda124.txt`, `requirements-language.txt`, `environment-mini-viewer-*.yml`, and `README_patch.md` are obsolete.
Use this path for a full CUDA-capable workstation/server environment. The same environment can still run CPU mode.
```bash
git clone git@github.com:RunyiYang/Mini_Viewer.git
cd Mini_Viewer
conda env remove -n mini-viewer -y || true
conda env create -f env.yml
conda activate mini-viewer
pip install -e .
pip install -e ./viser_bbox
```

Validate the installation:
```bash
python - <<'PY'
import torch
print('torch:', torch.__version__)
print('torch cuda build:', torch.version.cuda)
print('cuda available:', torch.cuda.is_available())
try:
    import gsplat
    print('gsplat: OK')
except Exception as exc:
    print('gsplat import failed:', repr(exc))
try:
    import transformers
    import open_clip
    print('query encoder deps: OK')
except Exception as exc:
    print('query encoder deps failed:', repr(exc))
PY
```

Conda is preferred because it installs ffmpeg, but a pip-only setup is available:
```bash
python -m venv .venv
source .venv/bin/activate
python -m pip install --upgrade pip
python -m pip install -r requirements.txt
python -m pip install -e .
python -m pip install -e ./viser_bbox
```

`env.yml` and `requirements.txt` use the CUDA 12.4 PyTorch wheel as the superset setup. CPU rendering still works from that environment:
```bash
python run_viewer.py \
  --ply /path/to/scene.ply \
  --device cpu \
  --backend torch
```

On a machine where the CUDA gsplat wheel cannot be installed, remove or comment out this line in `requirements.txt` or `env.yml`:

```
gsplat==1.5.3+pt24cu124
```

Then run only with:

```
--device cpu --backend torch
```

Required files:
- `coord.npy`: float array, shape `(N, 3)`
- `quat.npy`: float array, shape `(N, 4)`
- `scale.npy`: float array, shape `(N, 3)`
- `opacity.npy`: float array, shape `(N,)` or `(N, 1)`
- `color.npy`: RGB/color array, float `[0, 1]` or uint8 `[0, 255]`
Optional files:

- `normal.npy`
- `valid_feat_mask.npy`
- `features.npy` / `features.npz` / `features.pt` / `features.pth`
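For a quick smoke test of this layout, a tiny random scene can be written with plain NumPy. `write_demo_scene` is a hypothetical helper, not part of the repo; the array shapes follow the table above:

```python
import tempfile
from pathlib import Path

import numpy as np

def write_demo_scene(folder, n=100, seed=0):
    """Write a minimal random splat folder in the required layout above."""
    rng = np.random.default_rng(seed)
    folder = Path(folder)
    folder.mkdir(parents=True, exist_ok=True)
    np.save(folder / "coord.npy", rng.normal(size=(n, 3)).astype(np.float32))
    quat = rng.normal(size=(n, 4)).astype(np.float32)
    quat /= np.linalg.norm(quat, axis=1, keepdims=True)  # unit quaternions
    np.save(folder / "quat.npy", quat)
    np.save(folder / "scale.npy", rng.uniform(0.01, 0.1, (n, 3)).astype(np.float32))
    np.save(folder / "opacity.npy", rng.uniform(size=(n, 1)).astype(np.float32))
    np.save(folder / "color.npy", rng.uniform(size=(n, 3)).astype(np.float32))  # float [0, 1]
    return folder

demo = write_demo_scene(tempfile.mkdtemp(), n=50)
```

A folder written this way should then be loadable with `--folder-npy <folder> --device cpu --backend torch`.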
Use `--npy-scale-log` if the NumPy `scale.npy` values are log-scales and should be exponentiated.
The PLY loader expects Inria/3DGS-style vertex properties such as:

```
x, y, z
scale_0, scale_1, scale_2
rot_0, rot_1, rot_2, rot_3
f_dc_0, f_dc_1, f_dc_2
opacity
```
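Before loading a scene, it can help to confirm a `.ply` actually declares these property names. The following is a minimal stdlib-only header scan, written purely as an illustration rather than the loader's actual logic:

```python
import tempfile

def ply_vertex_properties(path):
    """List the vertex property names declared in a PLY header."""
    props, in_vertex = [], False
    with open(path, "rb") as f:
        for raw in f:  # the header is line-oriented even in binary PLYs
            line = raw.decode("ascii", errors="replace").strip()
            if line.startswith("element"):
                in_vertex = line.split()[1] == "vertex"
            elif line.startswith("property") and in_vertex:
                props.append(line.split()[-1])  # property name is the last token
            elif line == "end_header":
                break
    return props

# Demo: a tiny synthetic header with a few 3DGS-style properties
HEADER = (b"ply\nformat binary_little_endian 1.0\nelement vertex 2\n"
          b"property float x\nproperty float y\nproperty float z\n"
          b"property float opacity\nend_header\n")
tmp = tempfile.NamedTemporaryFile(suffix=".ply", delete=False)
tmp.write(HEADER)
tmp.close()
props = ply_vertex_properties(tmp.name)
```

If `scale_*`, `rot_*`, or `f_dc_*` are missing from the output, the file is probably not an Inria/3DGS export.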
Use `--feature-file`, `--language-feature`, or `--dino-feature` to load an aligned feature tensor. The loader accepts `.npy`, `.npz`, `.pt`, and `.pth` and will search common keys such as:

```
features, feature, embeddings, language_feature, clip_features,
siglip_features, dino_features, dinov2_features, image_features, visual_features
```
The first dimension must match the loaded splat count, the original splat count before masks/downsampling, or the `valid_feat_mask.npy` count.
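A sketch of that acceptance rule, with `check_feature_alignment` a hypothetical helper (the loader's actual logic in `core/splat.py` may differ):

```python
import numpy as np

def check_feature_alignment(feat, n_loaded, n_original=None, mask=None):
    """Classify which accepted alignment the feature tensor's first dim matches."""
    n = feat.shape[0]
    if n == n_loaded:
        return "loaded"      # rows align with splats after masks/downsampling
    if n_original is not None and n == n_original:
        return "original"    # rows align with the pre-mask splat count
    if mask is not None and n == int(mask.sum()):
        return "mask"        # rows align with valid_feat_mask.npy
    raise ValueError(f"feature rows ({n}) match no accepted splat count")

# e.g. 100 original splats, features stored only for the 80 mask-valid ones
mask = np.zeros(100, dtype=bool)
mask[:80] = True
mode = check_feature_alignment(np.zeros((80, 512)), n_loaded=100,
                               n_original=100, mask=mask)
```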
```bash
python run_viewer.py \
  --ply /path/to/scene.ply \
  --device auto \
  --backend auto \
  --port 8080
```

Open:
http://localhost:8080
```bash
python run_viewer.py \
  --folder-npy /path/to/scene_folder \
  --device cuda \
  --backend gsplat \
  --port 8080
```

Load aligned SigLIP2 features for text querying:

```bash
python run_viewer.py \
  --folder-npy /work/runyi_yang/Worldcept/example/scannetpp_v2_mcmc_3dgs_lang_large/val/09c1414f1b \
  --feature-file /path/to/siglip2_features.npy \
  --feature-type siglip2 \
  --siglip-model google/siglip2-so400m-patch16-512 \
  --hf-cache-dir /work/runyi_yang/hf_cache \
  --device cuda \
  --backend gsplat
```

The backward-compatible alias also works:
```
--language-feature /path/to/siglip2_features.npy
```

DINOv2 is visual-only: query it with an image path or a precomputed query vector, not text.
```bash
python run_viewer.py \
  --folder-npy /path/to/scene_folder \
  --dino-feature /path/to/dino_features.npy \
  --feature-type dinov2 \
  --query-image /path/to/query_crop.png \
  --dino-model facebook/dinov2-base \
  --hf-cache-dir /work/runyi_yang/hf_cache \
  --device cuda \
  --backend gsplat
```

Inside the viewer, paste another image path into Text prompt / DINO image path and press Query text / image / feature.
This works for SigLIP2, CLIP, DINOv2, or any custom aligned feature space:

```bash
python run_viewer.py \
  --folder-npy /path/to/scene_folder \
  --feature-file /path/to/point_features.npy \
  --query-feature /path/to/query_vector.npy \
  --feature-type dinov2 \
  --device cpu \
  --backend torch
```

Force CPU rendering with a downsample cap:

```bash
python run_viewer.py \
  --folder-npy /path/to/scene_folder \
  --feature-file /path/to/features.npy \
  --device cuda \
  --backend gsplat \
  --force-cpu-render \
  --cpu-fallback-splats 80000
```

CPU fallback is enabled by default. Disable it only when CUDA errors should fail loudly:
```bash
python run_viewer.py \
  --folder-npy /path/to/scene_folder \
  --device cuda \
  --backend gsplat \
  --no-cpu-render-fallback
```

Pre-download SigLIP2:

```bash
python scripts/download_siglip2.py \
  --cache-dir /work/runyi_yang/hf_cache
```

Pre-download DINOv2:

```bash
python scripts/download_dino.py \
  --model facebook/dinov2-base \
  --cache-dir /work/runyi_yang/hf_cache
```

CPU model-backed queries are disabled by default because they are slow. Enable them only for debugging:
```
--enable-feature-model-on-cpu
```

The Feature Query folder exposes:

- Feature Map: PCA/preview recoloring of loaded aligned features.
- Reset RGB: return to RGB splat colors.
- Normal Map: visualize normals.
- Text prompt / DINO image path: text for SigLIP/CLIP, image path for DINOv2.
- Threshold: score cutoff for recolor/prune/bbox.
- Query text / image / feature: run the cosine query.
- Prune by Query: show only the selected splats.
- Show query bbox: draw a bbox over the selected splats.
- Export query bbox: write `outputs/query_bbox.json`.
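A simplified stand-in for the cosine query behind Query text / image / feature and the Threshold cutoff (the actual implementation lives in `actions/language_feature.py`; all names here are illustrative):

```python
import numpy as np

def cosine_query(point_feats, query_vec, threshold):
    """Score each splat's feature against the query; select above-threshold splats."""
    f = point_feats / (np.linalg.norm(point_feats, axis=1, keepdims=True) + 1e-8)
    q = query_vec / (np.linalg.norm(query_vec) + 1e-8)
    scores = f @ q                      # cosine similarity per splat
    return scores, scores >= threshold  # selection mask drives recolor/prune/bbox

rng = np.random.default_rng(0)
feats = rng.normal(size=(1000, 64)).astype(np.float32)
query = feats[0].copy()                 # a query perfectly aligned with splat 0
scores, selected = cosine_query(feats, query, threshold=0.99)
```

The boolean mask is what recoloring, Prune by Query, and the query bbox would consume.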
In the viewer:
- Move the camera to the desired pose.
- Press Add Camera.
- Repeat for all keyframes.
- Press Export Cameras.
- Press Render Video.
Default outputs:

```
outputs/camera_path.json
outputs/render.mp4
```
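For scripting against the export, a Nerfstudio-style path can also be assembled by hand. The field names below follow the public Nerfstudio camera-path convention and are assumptions about this project's exact schema; compare with a real exported `outputs/camera_path.json` before relying on them:

```python
import json

import numpy as np

def make_camera_path(c2w_list, width=1280, height=720, fps=30, fov_deg=60.0):
    """Build a Nerfstudio-style camera-path dict (field names are assumptions)."""
    frames = [{
        "camera_to_world": np.asarray(c2w, dtype=float).reshape(16).tolist(),
        "fov": fov_deg,
        "aspect": width / height,
    } for c2w in c2w_list]
    return {
        "camera_type": "perspective",
        "render_width": width,
        "render_height": height,
        "fps": fps,
        "seconds": len(frames) / fps,
        "camera_path": frames,
    }

# Two identity poses, serialized the way the renderer scripts consume JSON
path = make_camera_path([np.eye(4), np.eye(4)])
blob = json.dumps(path)
```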
Headless render command:

```bash
python scripts/render_camera_path.py \
  --ply /path/to/scene.ply \
  --camera-path outputs/camera_path.json \
  --output outputs/render.mp4 \
  --device cuda \
  --backend gsplat
```

CPU fallback video rendering:

```bash
python scripts/render_camera_path.py \
  --folder-npy /path/to/scene_folder \
  --camera-path outputs/camera_path.json \
  --output outputs/render_cpu.mp4 \
  --device cpu \
  --backend torch
```

Install once:
```bash
pip install -e ./viser_bbox
```

Run with a bbox script:

```bash
python run_viewer.py \
  --folder-npy /path/to/scene_folder \
  --bbox-script docs/bboxes/demo.txt \
  --device cuda
```

Example script syntax:

```
wall0 = Wall(-2, -2, 0, 2, -2, 0, 3.0, 0.18)
door0 = Door(wall0, 0.0, -2.0, 1.0, 0.9, 2.1)
bbox0 = Bbox(Sofa, 0.8, 0.3, 0.7, 0.0, 1.5, 0.9, 1.0)
```
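The semantics of this syntax belong to `viser_bbox`; purely as an illustration, lines of the form `name = Kind(args...)` can be tokenized with a small regex (a hypothetical parser, not the one the viewer uses):

```python
import re

LINE = re.compile(r"^\s*(\w+)\s*=\s*(\w+)\((.*)\)\s*$")

def parse_bbox_script(text):
    """Tokenize `name = Kind(arg, ...)` lines into (name, kind, args) tuples."""
    entries = []
    for line in text.splitlines():
        if not line.strip():
            continue  # skip blank lines
        m = LINE.match(line)
        if not m:
            raise ValueError(f"unparsable line: {line!r}")
        name, kind, args = m.groups()
        entries.append((name, kind, [a.strip() for a in args.split(",")]))
    return entries

script = """\
wall0 = Wall(-2, -2, 0, 2, -2, 0, 3.0, 0.18)
bbox0 = Bbox(Sofa, 0.8, 0.3, 0.7, 0.0, 1.5, 0.9, 1.0)
"""
entries = parse_bbox_script(script)
```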
```
--ply PATH                      Load a Gaussian PLY.
--folder-npy PATH               Load NumPy splat arrays.
--feature-file PATH             Load an aligned feature tensor.
--language-feature PATH         Backward-compatible alias for --feature-file.
--dino-feature PATH             Backward-compatible DINO alias for --feature-file.
--feature-type {siglip2,siglip,clip,dino,dinov2,dino2}
--query-feature PATH            Load a precomputed query embedding.
--query-image PATH              Initial DINO/DINOv2 image query.
--siglip-model MODEL_ID         Default: google/siglip2-so400m-patch16-512.
--dino-model MODEL_ID           Default: facebook/dinov2-base.
--hf-cache-dir PATH             Hugging Face cache path.
--device {auto,cuda,cpu}
--backend {auto,gsplat,torch}
--cpu-render-fallback           Retry failed CUDA frames on CPU; enabled by default.
--no-cpu-render-fallback        Disable the automatic CPU fallback.
--cpu-fallback-splats INT       Downsample cap for CPU fallback frames.
--force-cpu-render              Render all viewer frames through the CPU torch backend.
--max-cpu-splats INT            CPU renderer downsample cap.
--enable-feature-model-on-cpu   Allow SigLIP/CLIP/DINO encoders on CPU.
--camera-path PATH              Camera-path JSON output/input path.
--video-output PATH             GUI video output path.
--render-width INT
--render-height INT
--render-fps INT
--render-seconds FLOAT
--bbox-script PATH
--npy-scale-log
--port INT
```

Both hyphen and underscore aliases are accepted for patched options, for example `--folder-npy` and `--folder_npy`.
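The two downsample caps above (`--cpu-fallback-splats`, `--max-cpu-splats`) amount to randomly keeping a subset of splats. A minimal sketch of that idea, assuming uniform sampling without replacement (the actual strategy in `core/renderer.py` may differ):

```python
import numpy as np

def cap_splats(coords, colors, max_splats, seed=0):
    """Randomly keep at most max_splats points, preserving coord/color pairing."""
    n = coords.shape[0]
    if n <= max_splats:
        return coords, colors  # already under the cap
    idx = np.random.default_rng(seed).choice(n, size=max_splats, replace=False)
    return coords[idx], colors[idx]

coords = np.zeros((200_000, 3), dtype=np.float32)
colors = np.zeros((200_000, 3), dtype=np.float32)
small_c, small_rgb = cap_splats(coords, colors, 80_000)
```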
Pass a valid feature file:

```
--feature-file /path/to/features.npy
```

The feature tensor must align with the loaded splats after masks/downsampling.
DINOv2 is not a text model. Use one of these:

```
--query-image /path/to/query_crop.png
```

or:

```
--query-feature /path/to/query_vector.npy
```

The query vector and the loaded point features must share the same final dimension. For example, facebook/dinov2-base image queries are normally 768-D, so aligned DINO splat features should also be 768-D.
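A quick pre-flight check of that dimension rule can save a failed session; `check_query_dims` is a hypothetical helper, not a repo API:

```python
import numpy as np

def check_query_dims(point_feats, query_vec):
    """Fail early when the query embedding and splat features disagree in dimension."""
    d_points = point_feats.shape[-1]
    d_query = np.asarray(query_vec).reshape(-1).shape[0]
    if d_points != d_query:
        raise ValueError(
            f"query vector is {d_query}-D but point features are {d_points}-D; "
            "re-export one of them from the same encoder")
    return d_points

# dinov2-base embeddings are 768-D, so both sides must be 768-D
dim = check_query_dims(np.zeros((10, 768), dtype=np.float32), np.zeros(768))
```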
Use the CPU fallback controls:

```
--cpu-render-fallback --cpu-fallback-splats 80000
```

or force CPU rendering:

```
--force-cpu-render --cpu-fallback-splats 80000
```

Pre-download the model while online, then run with `--hf-cache-dir /path/to/hf_cache`.
Run the static release check:

```bash
python scripts/validate_release.py
```

Run tests:

```bash
PYTEST_DISABLE_PLUGIN_AUTOLOAD=1 pytest
```

Run syntax-focused Ruff checks:

```bash
ruff check . --select E9,F63,F7,F82
```

Before publishing a GitHub release, run at least one CPU command and one CUDA command on a real scene:

```bash
python scripts/smoke_test_loaders.py \
  --folder-npy /path/to/scene_folder \
  --feature-file /path/to/features.npy \
  --device cpu
```

```bash
python run_viewer.py \
  --folder-npy /path/to/scene_folder \
  --feature-file /path/to/features.npy \
  --feature-type siglip2 \
  --device cuda \
  --backend gsplat
```

Also test one DINO path if DINO features are part of the release:
```bash
python run_viewer.py \
  --folder-npy /path/to/scene_folder \
  --dino-feature /path/to/dino_features.npy \
  --feature-type dinov2 \
  --query-image /path/to/query_crop.png \
  --device cuda \
  --backend gsplat
```

Add an explicit LICENSE file before publishing a public release. The project owner should choose the license; this patch does not assign legal terms automatically.
- Nerfview for the interactive rendering scaffold.
- Viser for the WebGL viewer frontend.
- GSplat for CUDA Gaussian splat rasterization.
- Hugging Face Transformers for SigLIP2 and DINOv2 model loading.
If you use Mini Viewer in research, please consider citing:
```bibtex
@article{wu2023mars,
  author  = {Wu, Zirui and Liu, Tianyu and Luo, Liyi and Zhong, Zhide and Chen, Jianteng and Xiao, Hongmin and Hou, Chao and Lou, Haozhe and Chen, Yuantao and Yang, Runyi and Huang, Yuxin and Ye, Xiaoyu and Yan, Zike and Shi, Yongliang and Liao, Yiyi and Zhao, Hao},
  title   = {MARS: An Instance-aware, Modular and Realistic Simulator for Autonomous Driving},
  journal = {CICAI},
  year    = {2023}
}

@misc{yang2024spectrally,
  title         = {Spectrally Pruned Gaussian Fields with Neural Compensation},
  author        = {Runyi Yang and Zhenxin Zhu and Zhou Jiang and Baijun Ye and Xiaoxue Chen and Yifei Zhang and Yuantao Chen and Jian Zhao and Hao Zhao},
  year          = {2024},
  eprint        = {2405.00676},
  archivePrefix = {arXiv},
  primaryClass  = {cs.CV}
}

@article{zheng2024gaussiangrasper,
  title     = {GaussianGrasper: 3D Language Gaussian Splatting for Open-Vocabulary Robotic Grasping},
  author    = {Zheng, Yuhang and Chen, Xiangyu and Zheng, Yupeng and Gu, Songen and Yang, Runyi and Jin, Bu and Li, Pengfei and Zhong, Chengliang and Wang, Zengmao and Liu, Lina and others},
  journal   = {IEEE Robotics and Automation Letters},
  year      = {2024},
  publisher = {IEEE}
}

@article{li2025scenesplat,
  title   = {SceneSplat: Gaussian Splatting-based Scene Understanding with Vision-Language Pretraining},
  author  = {Li, Yue and Ma, Qi and Yang, Runyi and Li, Huapeng and Ma, Mengjiao and Ren, Bin and Popovic, Nikola and Sebe, Nicu and Konukoglu, Ender and Gevers, Theo and others},
  journal = {arXiv preprint arXiv:2503.18052},
  year    = {2025}
}
```