Full codebase review: Stage 1 preprocessing + 4DGS pipeline + interactive viewer #2
adityasingh2400 wants to merge 63 commits into
Scaffolds the full 4-stage pipeline so all 4 team members can clone and build in parallel against well-defined interface contracts.

What's included:
- config.yaml: centralized paths shared by all stages
- scripts/utils.py: shared config loader (DRY across stages)
- Stage stubs with function signatures + docstrings:
  - stage1_sync.py (Arshia): audio sync + frame extraction
  - stage2_colmap.py (Divij): COLMAP pose recovery + LLFF export
  - stage3_4dgs.py (Aditya): 4DGS training with auto format detection
  - stage4_viewer.py (Mia): viewer HTTP server
  - stage5/6: post-MVP stubs for gap repair + temporal polish
- validate_contracts.py: checks Contracts A, B, C with colored output
- download_demo_scene.py: pre-baked .splat for viewer dev from minute 0
- server/gemini_proxy.py: WebSocket proxy with all 7 Gemini Live tools
- viewer/: HTML + JS with orbit/zoom/time controls + Gemini Live client
- Makefile: make sync, colmap, train, view, proxy, demo, validate
- .gitignore, .env.example, requirements.txt

Dual-path contracts (COLMAP binary + LLFF poses_bounds.npy) so 4DGS has two input format options. Stage 3 auto-detects which is available.
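Stage 3's auto-detection between the two contract formats could look roughly like the sketch below. It assumes the COLMAP output lives in <scene>/sparse/0/ as cameras.bin, images.bin and points3D.bin, and the LLFF output is <scene>/poses_bounds.npy. The function name detect_input_format and the preference for COLMAP when both exist are hypothetical, not taken from stage3_4dgs.py.

```python
from pathlib import Path
from typing import Optional

# Files a complete COLMAP binary model is expected to contain (assumed layout: sparse/0/)
COLMAP_FILES = ("cameras.bin", "images.bin", "points3D.bin")


def detect_input_format(scene_dir: str) -> Optional[str]:
    """Return 'colmap', 'llff', or None depending on which contract output exists."""
    root = Path(scene_dir)
    sparse = root / "sparse" / "0"
    # Contract path 1: COLMAP binary model (checked first in this sketch)
    if all((sparse / name).is_file() for name in COLMAP_FILES):
        return "colmap"
    # Contract path 2: LLFF poses_bounds.npy
    if (root / "poses_bounds.npy").is_file():
        return "llff"
    return None
```

Returning None rather than raising lets a caller such as validate_contracts.py print a readable failure instead of a traceback.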
Vite-based web viewer that loads Gaussian Splat .ply/.splat files with orbit/zoom controls, time scrubbing, playback speed control, camera presets, keyboard shortcuts, and Director Mode for automated cinematic camera paths. Currently loads a demo splat; will switch to multi-frame mode when 4DGS training output is available via manifest.json.
Single-command script that validates data, trains 4DGS, exports per-frame .ply files, and generates the viewer manifest. Supports --fast mode for quick iteration and --export-only to re-export from existing checkpoints. Also sets the temporal grid resolution to 40 (half of the 80 actual frames).
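The manifest format is not spelled out in this PR, so the sketch below assumes a hypothetical layout: version, fps, frame_count, and an ordered frames list of per-frame .ply paths relative to the viewer's data directory. Every field name here is an assumption to be replaced by whatever run_training.py actually writes.

```python
import json
from pathlib import Path
from typing import Dict, List


def build_manifest(ply_paths: List[str], fps: float, base_dir: str) -> Dict:
    """Hypothetical manifest.json layout: ordered per-frame .ply paths relative to base_dir."""
    base = Path(base_dir)
    # Lexicographic sort assumes zero-padded frame numbers in the file names
    frames = [Path(p).relative_to(base).as_posix() for p in sorted(ply_paths)]
    return {"version": 1, "fps": fps, "frame_count": len(frames), "frames": frames}


def write_manifest(manifest: Dict, out_path: str) -> None:
    """Serialize the manifest as pretty-printed JSON for the viewer to fetch."""
    Path(out_path).write_text(json.dumps(manifest, indent=2))
```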
- Implement stage2_colmap.py: feature extraction, exhaustive matching, sparse mapper, LLFF export, dense reconstruction (cloud only)
- Add --sparse-only and --strategy flags for local vs cloud runs
- Add cloud_setup.sh for one-command cloud box provisioning
- Update requirements.txt with pycolmap, config.yaml with cloud host

Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
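The sparse and dense stages map onto standard COLMAP subcommands. This is a minimal sketch of the command sequence, assuming a scene layout of database.db, images/, sparse/ and dense/ under one root (an assumption, not necessarily the layout stage2_colmap.py uses). The SiftExtraction.use_gpu / SiftMatching.use_gpu option names match COLMAP 3.x releases; newer builds may name them differently, so confirm with colmap feature_extractor -h.

```python
from pathlib import Path
from typing import List


def colmap_commands(scene: str, use_gpu: bool = False, sparse_only: bool = True,
                    colmap: str = "colmap") -> List[List[str]]:
    """Build (but do not run) the COLMAP CLI calls for sparse and optional dense reconstruction."""
    root = Path(scene)
    db, images = root / "database.db", root / "images"
    sparse, dense = root / "sparse", root / "dense"
    gpu = "1" if use_gpu else "0"
    cmds = [
        [colmap, "feature_extractor", "--database_path", str(db),
         "--image_path", str(images), "--SiftExtraction.use_gpu", gpu],
        [colmap, "exhaustive_matcher", "--database_path", str(db),
         "--SiftMatching.use_gpu", gpu],
        # mapper writes numbered models (sparse/0, sparse/1, ...); sparse/ must already exist
        [colmap, "mapper", "--database_path", str(db), "--image_path", str(images),
         "--output_path", str(sparse)],
    ]
    if not sparse_only:
        cmds += [
            [colmap, "image_undistorter", "--image_path", str(images),
             "--input_path", str(sparse / "0"), "--output_path", str(dense),
             "--output_type", "COLMAP"],
            [colmap, "patch_match_stereo", "--workspace_path", str(dense)],
            [colmap, "stereo_fusion", "--workspace_path", str(dense),
             "--output_path", str(dense / "fused.ply")],
        ]
    return cmds
```

Running them is then a loop of subprocess.run(cmd, check=True) after creating sparse/; keeping construction separate from execution makes the --sparse-only and local/cloud variants easy to inspect.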
pycolmap.match_exhaustive aborts with a fatal error when writing matches for large image sets (320 images). Shell out to colmap exhaustive_matcher binary instead, with pycolmap as fallback if binary not on PATH. Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
In nohup environments where PATH is stripped, shutil.which() returns None, forcing the fallback to pycolmap, which aborts on large datasets. Now calls the colmap binary directly for extraction, matching, and mapping. pycolmap is kept only for Reconstruction loading and LLFF conversion (read-only, no crash risk).

Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
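One way to make binary lookup survive a stripped PATH is to check an explicit override and known absolute locations in addition to shutil.which. A minimal sketch, assuming a hypothetical COLMAP_BIN environment variable and two common install paths (both illustrative, not taken from stage2_colmap.py):

```python
import os
import shutil
from typing import Iterable, Optional

# Hypothetical fallback locations; adjust for the actual install (apt, conda, source build)
DEFAULT_CANDIDATES = ("/usr/local/bin/colmap", "/usr/bin/colmap")


def find_colmap(env_var: str = "COLMAP_BIN",
                candidates: Iterable[str] = DEFAULT_CANDIDATES) -> Optional[str]:
    """Locate the colmap executable even when PATH is empty (e.g. under nohup)."""
    # 1. Explicit override always wins
    override = os.environ.get(env_var)
    if override and os.access(override, os.X_OK):
        return override
    # 2. Normal PATH lookup (returns None when PATH is empty)
    found = shutil.which("colmap")
    if found:
        return found
    # 3. Absolute paths that do not depend on PATH at all
    for path in candidates:
        if os.path.isfile(path) and os.access(path, os.X_OK):
            return path
    return None
```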
- Set QT_QPA_PLATFORM=offscreen so the colmap binary works on headless servers
- Support .jpg frames in addition to .png (cloud box has .jpg files)

Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
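Both fixes are small: pass a copy of the environment with the Qt platform forced to offscreen, and accept either extension when collecting frames. A sketch with hypothetical helper names:

```python
import os
from pathlib import Path
from typing import Dict, List, Optional

FRAME_SUFFIXES = {".png", ".jpg"}


def headless_env(base: Optional[Dict[str, str]] = None) -> Dict[str, str]:
    """Copy of the environment with Qt forced to the offscreen platform plugin."""
    env = dict(os.environ if base is None else base)
    env["QT_QPA_PLATFORM"] = "offscreen"
    return env


def list_frames(frame_dir: str) -> List[Path]:
    """All .png/.jpg frames in a directory (suffix compared case-insensitively), sorted by name."""
    return sorted(p for p in Path(frame_dir).iterdir()
                  if p.is_file() and p.suffix.lower() in FRAME_SUFFIXES)
```

Usage: subprocess.run(cmd, env=headless_env(), check=True). Copying the environment instead of mutating os.environ keeps the override scoped to the COLMAP subprocess.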
GPU SIFT extraction/matching requires OpenGL, which isn't available on the RunPod headless instance. Force CPU mode.

Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
Matching uses CUDA directly, so no OpenGL is needed on the headless server. CPU matching would take hours; on an A100 GPU it takes ~5 minutes.

Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
apt COLMAP has no CUDA, so 'all' (320 imgs) takes hours on CPU. n_per_cam=5 gives 20 images (5 from each of the 4 cameras), i.e. 190 exhaustive pairs, and CPU matching finishes in ~5 min. That is enough frames to get cross-camera overlap without the exhaustive cost.

Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
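The cost difference comes from exhaustive matching scaling with the number of unordered pairs, n(n-1)/2: 20 images give 190 pairs while 320 images give 51,040. A sketch of the arithmetic plus an evenly spaced per-camera selection; the selection rule and function names are assumptions, not the actual n_per_cam implementation:

```python
from typing import Dict, List


def pair_count(n_images: int) -> int:
    """Exhaustive matching compares every unordered pair: n * (n - 1) / 2."""
    return n_images * (n_images - 1) // 2


def subsample_frames(frames_per_cam: Dict[str, List[str]], n_per_cam: int) -> List[str]:
    """Pick n_per_cam evenly spaced frames from each camera (hypothetical selection rule)."""
    if n_per_cam <= 0:
        raise ValueError("n_per_cam must be positive")
    picked: List[str] = []
    for cam in sorted(frames_per_cam):
        frames = frames_per_cam[cam]
        if n_per_cam >= len(frames):
            picked.extend(frames)  # fewer frames than requested: keep them all
            continue
        step = len(frames) / n_per_cam
        picked.extend(frames[int(i * step)] for i in range(n_per_cam))
    return picked
```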
apt COLMAP binary always initializes OpenGL for the matcher regardless of the use_gpu flag. pycolmap with device=cpu bypasses OpenGL entirely.

Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
Wide-angle multi-camera sports setups have limited cross-camera overlap. Relax init_min_num_inliers, init_min_tri_angle, and abs_pose thresholds so the mapper can seed a reconstruction from sparse cross-cam matches. Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
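On the colmap mapper CLI these thresholds are passed as --Mapper.<option> flags. The numbers below are illustrative only, since the commit does not record the values actually used; the helper just flattens a dict into CLI arguments:

```python
from typing import Dict, List, Union

# Illustrative relaxed values (assumptions): the actual numbers are not in the commit message
RELAXED_MAPPER_OPTIONS: Dict[str, Union[int, float]] = {
    "Mapper.init_min_num_inliers": 50,
    "Mapper.init_min_tri_angle": 4.0,
    "Mapper.abs_pose_min_num_inliers": 15,
    "Mapper.abs_pose_min_inlier_ratio": 0.1,
}


def mapper_args(options: Dict[str, Union[int, float]]) -> List[str]:
    """Flatten {'Mapper.x': v} into ['--Mapper.x', 'v', ...] for the colmap mapper CLI."""
    args: List[str] = []
    for key in sorted(options):
        args += [f"--{key}", str(options[key])]
    return args
```

Lowering the initialization thresholds lets the mapper seed from a weaker image pair, at the cost of a higher risk of a bad initial model, so it is worth checking the registered-image count after each run.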
- image.cam_from_world() replaces image.rotation_matrix() + image.tvec
- cam.focal_length is a direct attribute in pycolmap 4.x
- pts_world dtype explicit float64

Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
Critical fixes:
- Output to sparse_/ (not sparse/0/), matching the 4DGS loader
- Generate points3D_multipleview.ply (triggers MultipleView detection)
- Generate poses_bounds_multipleview.npy (correct filename)
- Flatten images as imageN.jpg for 4DGS name extraction compatibility
- Add point cloud downsampling to <40k points via open3d
- Add sparse-to-PLY fallback when dense reconstruction is skipped
- GPU SIFT with high-quality settings matching 4DGS multipleviewprogress.sh
- Near depth clamped to 0.01 minimum to prevent rendering artifacts
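Each row of an LLFF-style poses_bounds file holds 17 floats per image: a flattened 3x5 matrix (camera-to-world rotation and translation plus a [height, width, focal] column) followed by near and far depth bounds. The pure-Python sketch below follows the convention of LLFF's own COLMAP conversion, which reorders the rotation columns from COLMAP's [right, down, forward] to [down, right, backward] and takes near/far from the 0.1 and 99.9 depth percentiles. The 0.01 near clamp comes from the commit above. The function names are hypothetical, and nearest-rank percentiles stand in for numpy's interpolated ones.

```python
from typing import List, Sequence

NEAR_MIN = 0.01  # clamp from the commit message


def _percentile(values: Sequence[float], q: float) -> float:
    """Nearest-rank percentile (q in [0, 100]); numpy would interpolate instead."""
    ordered = sorted(values)
    return ordered[int(round(q / 100.0 * (len(ordered) - 1)))]


def poses_bounds_row(c2w: Sequence[Sequence[float]], height: float, width: float,
                     focal: float, depths: Sequence[float]) -> List[float]:
    """One 17-value row: 3x5 pose matrix (row-major) + [near, far]."""
    hwf = (height, width, focal)
    rows = []
    for r in range(3):
        right, down, fwd, t = c2w[r]  # c2w row r: COLMAP axes [right, down, forward] + translation
        rows.append([down, right, -fwd, t, hwf[r]])  # LLFF column order
    flat = [v for row in rows for v in row]  # 15 values
    near = max(_percentile(depths, 0.1), NEAR_MIN)  # avoid near ~ 0 artifacts
    far = _percentile(depths, 99.9)
    return flat + [near, far]
```

Stacking one row per image gives an (N, 17) array, which is what would be saved (e.g. with numpy.save) as poses_bounds_multipleview.npy.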
Detect CUDA support from colmap -h output ("without CUDA" string).
Skip patch_match_stereo with clear message instead of hard-failing.
4DGS can train from the sparse point cloud (points3D.bin) alone; dense
reconstruction only adds extra initialization density.
Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
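The detection is a substring test on the help banner, which COLMAP builds print as "... with CUDA" or "... without CUDA". A sketch with hypothetical helper names, where the dense step is passed in as a callback:

```python
import subprocess
from typing import Callable


def colmap_has_cuda(help_text: str) -> bool:
    """COLMAP builds without CUDA report 'without CUDA' in their help banner."""
    return "without cuda" not in help_text.lower()


def read_help(colmap: str = "colmap") -> str:
    """Capture `colmap -h` output; check=False because only the text matters here."""
    result = subprocess.run([colmap, "-h"], capture_output=True, text=True, check=False)
    return result.stdout + result.stderr


def maybe_run_dense(help_text: str, run_dense: Callable[[], None]) -> bool:
    """Run dense reconstruction only on CUDA builds; otherwise skip with a message."""
    if not colmap_has_cuda(help_text):
        print("Skipping patch_match_stereo: this COLMAP build has no CUDA; "
              "4DGS will initialize from the sparse points3D.bin.")
        return False
    run_dense()
    return True
```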
Stop ignoring scene/sparse/0/*.bin, poses_bounds.npy, and metadata.json in .gitignore so Aditya can pull COLMAP outputs directly from the repo. Also ignore .jpg frames (previously only .png was excluded).

Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
# Conflicts:
#	scripts/stage2_colmap.py
- restructure_for_4dgs.py: rewrites COLMAP images.bin/cameras.bin for 4DGS MultipleView loader (imageN.jpg naming, sequential camera IDs), converts points3D.bin → PLY, handles 3-camera setup (cam01 unregistered)
- runpod_setup.sh: full pipeline from git clone to exported per-frame PLYs
- configs/: A100-optimized training configs (batch=4, fast=5k/quality=10k iters)
- HANDOFF.md: current project state for agent continuity
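Renumbering cameras when one of them (cam01 here) fails to register amounts to enumerating only the registered folders. A sketch under loud assumptions: the starting ID and the exact imageN.jpg scheme are hypothetical and should be matched to what the 4DGS MultipleView loader really parses.

```python
from typing import Dict, Sequence


def assign_sequential_ids(all_cams: Sequence[str], registered: Sequence[str],
                          start: int = 0) -> Dict[str, int]:
    """Map registered camera folders to consecutive IDs in sorted order, dropping unregistered ones."""
    reg = set(registered)
    kept = [cam for cam in sorted(all_cams) if cam in reg]
    return {cam: idx for idx, cam in enumerate(kept, start=start)}


def flat_image_names(ids: Dict[str, int]) -> Dict[str, str]:
    """Hypothetical imageN.jpg naming keyed by the sequential camera ID."""
    return {cam: f"image{idx}.jpg" for cam, idx in ids.items()}
```

Consecutive IDs with no holes matter because a loader that derives the camera index from the file name would otherwise look for a missing image0 or image1.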
Fast config reduced to 3k-iter smoke test (data validation only). Quality config back to full 14k iterations with batch=2 to leave VRAM for denser Gaussian populations during densification.
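For orientation, the two variants could be expressed in the style of the module-level dict configs under 4DGaussians/arguments/. Only iterations, batch_size, and the temporal resolution of 40 come from the commit messages (the fast batch size of 4 is assumed to carry over from the A100 configs); the k-planes keys and spatial 64s are illustrative. Both files are represented as dicts keyed by their module-level names so they fit in one runnable snippet.

```python
# Hypothetical excerpt; verify field names against 4DGaussians/arguments/__init__.py
TEMPORAL_RES = 40  # temporal grid resolution set for the 80-frame capture


def kplanes(temporal_res: int = TEMPORAL_RES) -> dict:
    """k-planes grid config: three spatial resolutions plus one temporal resolution."""
    return {
        "grid_dimensions": 2,
        "input_coordinate_dim": 4,  # x, y, z, t
        "output_coordinate_dim": 16,
        "resolution": [64, 64, 64, temporal_res],
    }


# replay_fast.py: 3k-iteration smoke test (data validation only)
FAST = {
    "ModelHiddenParams": dict(kplanes_config=kplanes()),
    "OptimizationParams": dict(iterations=3000, batch_size=4),
}

# replay.py: full 14k iterations; batch 2 leaves VRAM headroom during densification
QUALITY = {
    "ModelHiddenParams": dict(kplanes_config=kplanes()),
    "OptimizationParams": dict(iterations=14000, batch_size=2),
}
```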
- Added loading and error screens with improved accessibility features.
- Introduced scene name display in the header.
- Enhanced director mode button functionality and styling.
- Updated loading progress display and error handling.
- Refined CSS styles for better visual consistency and usability.
- Adjusted frame counter formatting for improved readability.
- Added event listeners for playback controls and director mode toggling.
… animation, removing unnecessary frequency control checks for camera movement.
…ap filling

Replaces GPU-trained 4D reconstruction with instant bullet-time for a single user-selected moment. Gemini analyzes synced multi-camera video to find key moments via natural language, then Nano Banana Pro generates synthetic views between cameras using recursive edge-inward filling with up to 14 reference images. Viewer gets drag-to-rotate image strip mode with real/AI source badges.

- bullet_time/ package: schemas, moment_detector, gap_filler, pipeline CLI
- viewer: ImageStripPlayer with drag-to-rotate, bullet-time UI mode
- server: Gemini Live proxy with find_moment/build_strip/show_strip tools
- Removes scanline texture overlay, adds real vs AI-generated frame badges

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
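One reading of "recursive edge-inward filling" is: within each gap between two real cameras, generate the slot next to the left camera, then the one next to the right camera, then recurse on the remaining interior, so every new view has a close neighbor to condition on. The sketch plans that order and picks up to 14 nearest available frames as references; it is an interpretation of the commit text, not gap_filler's actual code, and image generation itself is left out.

```python
from typing import Dict, List, Sequence

MAX_REFERENCES = 14  # "up to 14 reference images" per the commit message


def edge_inward_order(lo: int, hi: int) -> List[int]:
    """Order slots lo..hi so each step fills the outermost remaining slot."""
    if lo > hi:
        return []
    if lo == hi:
        return [lo]
    return [lo, hi] + edge_inward_order(lo + 1, hi - 1)


def pick_references(target: int, known: Sequence[int],
                    max_refs: int = MAX_REFERENCES) -> List[int]:
    """Nearest already-available frames to the target slot (ties go to the lower index)."""
    return sorted(known, key=lambda k: (abs(k - target), k))[:max_refs]


def plan_gap_fill(real_slots: Sequence[int]) -> List[Dict]:
    """Plan generation order for every empty slot between consecutive real cameras."""
    real = sorted(real_slots)
    known = list(real)
    plan = []
    for left, right in zip(real, real[1:]):
        for slot in edge_inward_order(left + 1, right - 1):
            plan.append({"slot": slot, "refs": pick_references(slot, known)})
            known.append(slot)  # a generated view becomes a reference for later slots
    return plan
```

For a gap between real slots 0 and 4, the plan fills 1, then 3, then 2, and the last slot is conditioned on both of its freshly generated neighbors.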
The model was generating identical views because the prompt described
abstract geometry ("25% along the arc") instead of visual effects.
New prompt specifies exact degree rotation per step, clockwise subject
rotation, leftward background shift, and frozen pose. Also upgrades
to gemini-3-pro-image-preview for highest quality output.
Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
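Giving the model a concrete rotation per step means dividing each camera gap evenly among the synthetic views. A sketch assuming equal spacing; the 28-degree gap matches the 5-camera rig described later in this PR, and the prompt wording is a hypothetical paraphrase of the commit message, not the prompt actually sent to gemini-3-pro-image-preview.

```python
from typing import List


def step_degrees(gap_deg: float, n_synthetic: int) -> float:
    """Equal angular step when n_synthetic views are inserted between two real cameras."""
    return gap_deg / (n_synthetic + 1)


def view_prompts(gap_deg: float, n_synthetic: int) -> List[str]:
    """One prompt per synthetic view, stating an exact cumulative rotation (hypothetical wording)."""
    step = step_degrees(gap_deg, n_synthetic)
    prompts = []
    for i in range(1, n_synthetic + 1):
        angle = step * i
        prompts.append(
            f"Rotate the camera {angle:g} degrees around the subject from the left reference. "
            f"The subject appears rotated {angle:g} degrees clockwise, the background shifts left, "
            "and the subject's pose stays exactly frozen."
        )
    return prompts
```

With a 28-degree gap and 3 synthetic views, each step is 7 degrees, so the prompts ask for 7, 14 and 21 degrees.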
Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
Replace the cold cyberpunk palette (cyan/orange on blue-black) with ReRoute's warm premium aesthetic: rose/amber accents on maroon-tinted dark backgrounds, cream text, DM Sans + Outfit + JetBrains Mono fonts, generous border radii.

Viewer: warm dark backgrounds, rose (#D44060) for spatial, amber (#D4956A) for temporal, hover lifts, warm glows, custom scrollbar, gradient buttons.

About page: full ReRoute light theme with a cream (#FAF6F1) background, maroon (#7A1B2D) accents, and white cards with warm shadows and hover effects.

DESIGN.md updated to document the new direction.

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
Adopt ReRoute warm aesthetic for viewer + marketing page
- Landing page: FREEZEFRAME wordmark + logo + upload zone, nothing else
- Fake processing screen: 4 sequential animated steps
- Viewer: fullscreen strip, minimal wordmark, 5-bar listening indicator
- Theme: cream background, solid maroon accents, no transparency
- Sizes scaled up throughout

Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
…imation

Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
- Upload → 5 thumbnail circles appear staggered around central agent circle
- Agent circle holds the listening bars, pulses on listening/speaking states
- Dashed orbit ring connects thumbnails visually
- Thumbnails float independently with subtle animation
- On click/voice trigger: thumbnails merge into center with scale+fade
- Agent circle absorbs with a brief pulse
- triggerMerge() exposed for Phase 3 voice integration

Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
- server/gemini_proxy.py: WebSocket proxy bridging browser ↔ Gemini Live with sportscaster personality, 3 voice tools (describe/explain/navigate), moment catalog loaded on startup, audio/transcript relay
- viewer/src/gemini_live.js: mic capture via AudioWorklet, PCM streaming, 24kHz audio playback queue, navigate → boomerang animation, overlay text
- viewer/public/pcm-processor.js: Float32→Int16 AudioWorklet processor
- viewer/index.html: add #viewer-overlay-text div
- viewer/src/styles.css: overlay text styles (output/input/navigate/error)
- viewer/src/main.js: import and call connectVoice() after viewer init

Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
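The Float32→Int16 step is the usual clamp-and-scale conversion from Web Audio's [-1, 1] floats to 16-bit PCM, which is the sample format Gemini Live takes for audio input. pcm-processor.js does this in JavaScript inside the AudioWorklet; the same logic is shown here in Python for illustration, with the asymmetric scaling and little-endian output being choices of this sketch rather than details confirmed from the file.

```python
import struct
import sys
from array import array
from typing import Sequence


def float32_to_int16_pcm(samples: Sequence[float]) -> bytes:
    """Clamp to [-1, 1] and scale asymmetrically so -1.0 -> -32768 and 1.0 -> 32767."""
    out = array("h")  # signed 16-bit integers
    for s in samples:
        s = max(-1.0, min(1.0, s))  # out-of-range input would otherwise wrap around
        out.append(int(s * 0x8000) if s < 0 else int(s * 0x7FFF))
    if sys.byteorder == "big":
        out.byteswap()  # emit little-endian PCM regardless of host byte order
    return out.tobytes()
```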
- Geometry-first pipeline with fg/bg separation and ghost prevention
- Precomputed 4 key moments: Keanu dodge, Kobe fadeaway, roundhouse kick, water throw
- 5-camera support with 28-degree gaps
- Extreme black/white clamping for smoother boomerang playback
- Concurrent Nano Banana polish calls
- Depth Anything V2 on MPS for fast local depth estimation

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
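Concurrent polish calls are typically bounded so a strip of frames does not hit the image API all at once. A minimal asyncio sketch, assuming polish is an async function wrapping one Nano Banana request (stubbed here); the concurrency limit of 4 is an arbitrary default, not a value from the pipeline.

```python
import asyncio
from typing import Awaitable, Callable, List, Sequence, TypeVar

T = TypeVar("T")
R = TypeVar("R")


async def polish_all(frames: Sequence[T], polish: Callable[[T], Awaitable[R]],
                     max_in_flight: int = 4) -> List[R]:
    """Run polish(frame) concurrently, at most max_in_flight at a time, preserving input order."""
    gate = asyncio.Semaphore(max_in_flight)

    async def run_one(frame: T) -> R:
        async with gate:  # blocks while max_in_flight calls are already running
            return await polish(frame)

    # gather returns results in argument order, not completion order
    return list(await asyncio.gather(*(run_one(f) for f in frames)))
```

If the client SDK is synchronous, the stub can wrap it with asyncio.to_thread so the blocking call runs in a worker thread.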
Keanu: 101/90/100/100/100
Fadeaway: 437/420/429/444/448
Kick: 733/718/736/739/744
Water: 1132/1123/1137/1139/1142

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
- Rewrote viewer UI with new styles, controls, splat player, and image-strip player
- Updated Gemini Live voice integration and proxy server
- Added bullet-time pipeline catalog and VGGT pipeline scripts
- Reorganized docs into docs/ directory
- Updated .gitignore to exclude large generated directories
- Cleaned up deprecated stage1 preprocessing and empty gitkeep dirs
The logo, wordmark, and upload zone were sitting slightly too low on the viewport. Adds margin-top: -60px to #landing-inner to shift the centered content group upward, improving the visual weight distribution on the landing screen.
* Clean up
* fix(vercel): add vercel.json to set rootDirectory=viewer
* fix(vercel): remove invalid rootDirectory, scope commands to viewer/
* fix(viewer): remove broken raw_videos symlink

  Vite's prepareOutDir followed the public/raw_videos symlink (pointing to gitignored ../../raw_videos training data) and failed with ENOENT on Vercel. The symlink is unreferenced in viewer code, so it is safe to drop.
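A small pre-build check could catch this class of breakage before Vercel does. This is a hypothetical helper (risky_symlinks is not in the repo) that walks a directory such as viewer/public without following links and flags symlinks that are dangling or resolve outside the directory.

```python
import os
from pathlib import Path
from typing import List


def risky_symlinks(public_dir: str) -> List[Path]:
    """Symlinks under public_dir that are dangling or point outside public_dir."""
    root = Path(public_dir).resolve()
    bad: List[Path] = []
    # os.walk does not follow symlinked directories by default (followlinks=False)
    for dirpath, dirnames, filenames in os.walk(root):
        for name in dirnames + filenames:
            p = Path(dirpath) / name
            if not p.is_symlink():
                continue
            target = p.resolve()  # non-strict: still returns a path if the target is missing
            if not target.exists() or root not in target.parents:
                bad.append(p)
    return sorted(bad)
```

Run before vite build (for example from a prebuild script), a non-empty result would fail fast with the offending path instead of an ENOENT deep inside the build.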
Summary
Full codebase review of all work so far:
- Stage 1 (Arshia): Audio sync + frame extraction pipeline (stage1/preprocess.py). Takes 4 raw phone videos, detects the clap sync point via librosa onset detection, and extracts time-aligned frames as 1-indexed JPGs in cam01/ through cam04/ folders.
- Stage 3 pipeline (run_training.py): End-to-end script that validates data, trains 4D Gaussian Splatting, exports per-frame .ply files, and generates the viewer manifest. Supports --fast and --export-only modes.
- Interactive viewer (viewer/): Vite + Three.js + Spark.js web app with orbit/zoom controls, timeline scrubbing, playback speed control, camera presets with smooth animation, keyboard shortcuts, and Director Mode for automated cinematic camera paths.
- 4DGS training configs (4DGaussians/arguments/multipleview/replay.py and replay_fast.py): Tuned for 4 cameras, 80 frames, 720x1280 vertical video.

@greptileai
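Once the clap has been located in each video, Stage 1's alignment reduces to a per-camera frame offset. A sketch, assuming the clap time in each video (in seconds) has already been found, for example with librosa.onset.onset_detect(..., units="time"). The helper names, the fps parameter, and the zero-padded file names are hypothetical rather than taken from stage1/preprocess.py.

```python
from typing import Dict


def aligned_start_frames(clap_times: Dict[str, float], fps: float) -> Dict[str, int]:
    """0-based frame in each camera's own video where the shared timeline begins.

    The camera whose clap occurs earliest in its own recording starts at frame 0;
    every other camera started recording earlier, so it skips (t_cam - t_min) seconds.
    """
    earliest = min(clap_times.values())
    return {cam: round((t - earliest) * fps) for cam, t in clap_times.items()}


def output_name(cam: str, k: int) -> str:
    """k-th aligned frame (0-based) saved as a 1-indexed JPG; 4-digit padding is hypothetical."""
    return f"{cam}/{k + 1:04d}.jpg"
```

Aligned frame k of a camera is then source frame start[cam] + k, written to output_name(cam, k).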