
Full codebase review — Stage 1 preprocessing + 4DGS pipeline + interactive viewer #2

Open
adityasingh2400 wants to merge 63 commits into greptile-base from main

@adityasingh2400

Owner

Summary

Full codebase review of all work so far:

  • Stage 1 (Arshia): Audio sync + frame extraction pipeline (stage1/preprocess.py). Takes 4 raw phone videos, detects the clap sync point via librosa onset detection, and extracts time-aligned frames as 1-indexed JPGs into folders cam01/ through cam04/.

  • Stage 3 pipeline (run_training.py): End-to-end script that validates data, trains 4D Gaussian Splatting, exports per-frame .ply files, and generates the viewer manifest. Supports --fast and --export-only modes.

  • Interactive viewer (viewer/): Vite + Three.js + Spark.js web app with orbit/zoom controls, timeline scrubbing, playback speed control, camera presets with smooth animation, keyboard shortcuts, and Director Mode for automated cinematic camera paths.

  • 4DGS training configs (4DGaussians/arguments/multipleview/replay.py and replay_fast.py): Tuned for 4 cameras, 80 frames, 720x1280 vertical video.
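Once a clap time is known for each camera, the Stage 1 alignment reduces to frame arithmetic. The following is a minimal sketch, assuming the clap times (in seconds) have already been detected, for example by librosa onset detection. The function name, the 30 fps value, and the timestamps are hypothetical and are not taken from stage1/preprocess.py.

```python
def source_frame(clap_time_s, fps, aligned_index):
    """Map a 1-indexed aligned frame number to a 0-indexed frame in the raw video.

    Aligned frame 1 is the raw frame closest to the clap, so every camera's
    aligned frame k shows (approximately) the same instant.
    """
    if aligned_index < 1:
        raise ValueError("aligned frames are 1-indexed")
    return round(clap_time_s * fps) + (aligned_index - 1)


# Hypothetical clap times per camera, in seconds from the start of each clip.
clap_times = {"cam01": 1.2, "cam02": 0.9, "cam03": 2.1, "cam04": 1.5}
FPS = 30  # assumed capture rate

# Raw frame index that becomes aligned frame 1 in each camera folder.
first_frames = {cam: source_frame(t, FPS, 1) for cam, t in clap_times.items()}
```

Rounding to the nearest frame bounds the residual sync error at half a frame interval per camera, under the assumption that all cameras record at the same constant frame rate.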

@greptileai

adityasingh2400 and others added 30 commits March 27, 2026 18:06
Scaffolds the full 4-stage pipeline so all 4 team members can
clone and build in parallel against well-defined interface contracts.

What's included:
- config.yaml: centralized paths shared by all stages
- scripts/utils.py: shared config loader (DRY across stages)
- Stage stubs with function signatures + docstrings:
  - stage1_sync.py (Arshia): audio sync + frame extraction
  - stage2_colmap.py (Divij): COLMAP pose recovery + LLFF export
  - stage3_4dgs.py (Aditya): 4DGS training with auto format detection
  - stage4_viewer.py (Mia): viewer HTTP server
  - stage5/6: post-MVP stubs for gap repair + temporal polish
- validate_contracts.py: checks Contracts A, B, C with colored output
- download_demo_scene.py: pre-baked .splat for viewer dev from minute 0
- server/gemini_proxy.py: WebSocket proxy with all 7 Gemini Live tools
- viewer/: HTML + JS with orbit/zoom/time controls + Gemini Live client
- Makefile: make sync, colmap, train, view, proxy, demo, validate
- .gitignore, .env.example, requirements.txt

Dual-path contracts (COLMAP binary + LLFF poses_bounds.npy) so 4DGS
has two input format options. Stage 3 auto-detects which is available.
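The auto-detection described above could be sketched as a simple file-presence check. This is an illustration only: the function name, the directory layout (sparse/0/ for COLMAP binaries, poses_bounds.npy at the scene root), and the preference for COLMAP over LLFF are assumptions, not the logic in stage3_4dgs.py.

```python
from pathlib import Path

# Files a COLMAP binary sparse model is expected to contain.
COLMAP_FILES = ("cameras.bin", "images.bin", "points3D.bin")


def detect_pose_format(scene_dir):
    """Return "colmap" or "llff" depending on which pose files exist.

    Assumed layout: <scene>/sparse/0/*.bin for COLMAP and
    <scene>/poses_bounds.npy for LLFF. COLMAP wins if both are present.
    """
    scene = Path(scene_dir)
    colmap_dir = scene / "sparse" / "0"
    if all((colmap_dir / name).is_file() for name in COLMAP_FILES):
        return "colmap"
    if (scene / "poses_bounds.npy").is_file():
        return "llff"
    raise FileNotFoundError(f"no COLMAP or LLFF poses found under {scene}")
```

Raising when neither format is present keeps a missing Stage 2 output from turning into a confusing failure later in training.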
Vite-based web viewer that loads Gaussian Splat .ply/.splat files
with orbit/zoom controls, time scrubbing, playback speed control,
camera presets, keyboard shortcuts, and Director Mode for automated
cinematic camera paths. Currently loads a demo splat; will switch to
multi-frame mode when 4DGS training output is available via manifest.json.

Single-command script that validates data, trains 4DGS, exports
per-frame .ply files, and generates the viewer manifest. Supports
--fast mode for quick iteration and --export-only to re-export
from existing checkpoints. Also updates temporal grid resolution
to 40 (matching 80 actual frames).
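As a rough illustration of how the two modes might shape the run, here is a small argparse sketch. Only the flag names --fast and --export-only come from the commit message; the step names, the helper functions, and the choice to keep validation in export-only mode are assumptions rather than the behavior of run_training.py.

```python
import argparse


def build_parser():
    """Build a CLI parser exposing the two documented mode flags."""
    parser = argparse.ArgumentParser(prog="run_training.py")
    parser.add_argument("--fast", action="store_true",
                        help="use the quick-iteration training config")
    parser.add_argument("--export-only", action="store_true",
                        help="skip training and re-export from existing checkpoints")
    return parser


def plan_steps(args):
    """Return the ordered pipeline steps for the parsed arguments.

    Assumption: --export-only drops only the training step; --fast changes
    the training config but not the list of steps.
    """
    steps = ["validate", "train", "export", "manifest"]
    if args.export_only:
        steps.remove("train")
    return steps
```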
- Implement stage2_colmap.py: feature extraction, exhaustive matching,
  sparse mapper, LLFF export, dense reconstruction (cloud only)
- Add --sparse-only and --strategy flags for local vs cloud runs
- Add cloud_setup.sh for one-command cloud box provisioning
- Update requirements.txt with pycolmap, config.yaml with cloud host

Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
pycolmap.match_exhaustive aborts with a fatal error when writing matches
for large image sets (320 images). Shell out to colmap exhaustive_matcher
binary instead, with pycolmap as fallback if binary not on PATH.

Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
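The binary-first strategy with a pycolmap fallback can be sketched as below. The function name and the injectable `which` and `run` parameters are hypothetical conveniences for testing; the sketch passes only `--database_path` and leaves out any other options the real script sets.

```python
import shutil
import subprocess


def exhaustive_match(database_path, which=shutil.which, run=subprocess.run):
    """Run exhaustive matching, preferring the colmap binary over pycolmap.

    Returns "binary" or "pycolmap" to report which path was taken.
    """
    binary = which("colmap")
    if binary:
        # Shell out so a crash in the matcher cannot abort this Python process.
        run([binary, "exhaustive_matcher", "--database_path", str(database_path)],
            check=True)
        return "binary"
    # Fallback only when the binary is not on PATH; imported lazily so the
    # binary path works even where pycolmap is not installed.
    import pycolmap
    pycolmap.match_exhaustive(str(database_path))
    return "pycolmap"
```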
In nohup environments PATH is stripped, so shutil.which() returns None and the
script falls back to pycolmap, which aborts on large datasets. The script now
calls the colmap binary directly for extraction, matching, and mapping.
pycolmap is kept only for Reconstruction loading and LLFF conversion
(read-only, no crash risk).

Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
- Set QT_QPA_PLATFORM=offscreen so colmap binary works on headless servers
- Support .jpg frames in addition to .png (cloud box has .jpg files)

Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
GPU SIFT extraction/matching requires OpenGL which isn't available
on the RunPod headless instance. Force CPU mode.

Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
Matching uses CUDA directly — no OpenGL needed on headless server.
CPU matching would take hours; A100 GPU takes ~5 minutes.

Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
The apt build of COLMAP has no CUDA, so 'all' (320 imgs) takes hours on CPU.
n_per_cam=5 gives 20 images and 190 pairs, which CPU matching handles in ~5 min.
That is enough frames to get cross-camera overlap without the exhaustive cost.

Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
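The pair counts behind this trade-off follow from the fact that exhaustive matching compares every unordered pair of images, i.e. n choose 2. A short check, assuming 4 cameras (the helper name is hypothetical):

```python
from math import comb


def exhaustive_pairs(n_cams, n_per_cam):
    """Number of image pairs exhaustive matching must compare."""
    return comb(n_cams * n_per_cam, 2)


subset_pairs = exhaustive_pairs(4, 5)    # 20 images
full_pairs = exhaustive_pairs(4, 80)     # 320 images
```

Because the count grows quadratically, the 320-image run needs 51,040 pairs against 190 for the 20-image subset, which is consistent with the jump from minutes to hours on CPU.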
apt COLMAP binary always initializes OpenGL for the matcher regardless
of use_gpu flag. pycolmap with device=cpu bypasses OpenGL entirely.

Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
Wide-angle multi-camera sports setups have limited cross-camera overlap.
Relax init_min_num_inliers, init_min_tri_angle, and abs_pose thresholds
so the mapper can seed a reconstruction from sparse cross-cam matches.

Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
- image.cam_from_world() replaces image.rotation_matrix() + image.tvec
- cam.focal_length is a direct attribute in pycolmap 4.x
- pts_world dtype explicit float64

Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
Critical fixes:
- Output to sparse_/ (not sparse/0/) matching 4DGS loader
- Generate points3D_multipleview.ply (triggers MultipleView detection)
- Generate poses_bounds_multipleview.npy (correct filename)
- Flatten images as imageN.jpg for 4DGS name extraction compatibility
- Add point cloud downsampling to <40k points via open3d
- Add sparse-to-PLY fallback when dense reconstruction is skipped
- GPU SIFT with high-quality settings matching 4DGS multipleviewprogress.sh
- Near depth clamped to 0.01 minimum to prevent rendering artifacts

Detect CUDA support from colmap -h output ("without CUDA" string).
Skip patch_match_stereo with clear message instead of hard-failing.
4DGS can train from sparse point cloud (points3D.bin) alone — dense
is optional extra initialization density.

Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
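The CUDA check described above might look like the following sketch. The "without CUDA" marker is taken from the commit message; the function names are hypothetical, and whether the banner is printed to stdout or stderr is not assumed, so both are searched.

```python
import subprocess


def help_reports_cuda(help_text):
    """Return True unless the colmap help banner says it was built without CUDA."""
    return "without CUDA" not in help_text


def colmap_has_cuda(colmap_bin="colmap"):
    """Run `colmap -h` and inspect its output for the CUDA marker."""
    result = subprocess.run([colmap_bin, "-h"], capture_output=True,
                            text=True, check=False)
    return help_reports_cuda(result.stdout + result.stderr)


def should_run_dense(has_cuda):
    """Dense stereo is skipped, with a message, when CUDA is unavailable."""
    if not has_cuda:
        print("Skipping patch_match_stereo: this COLMAP build has no CUDA.")
    return has_cuda
```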
Untrack scene/sparse/0/*.bin, poses_bounds.npy, metadata.json so
Aditya can pull COLMAP outputs directly from the repo.
Also ignore .jpg frames (previously only .png was excluded).

Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
# Conflicts:
#	scripts/stage2_colmap.py
- restructure_for_4dgs.py: rewrites COLMAP images.bin/cameras.bin for
  4DGS MultipleView loader (imageN.jpg naming, sequential camera IDs),
  converts points3D.bin → PLY, handles 3-camera setup (cam01 unregistered)
- runpod_setup.sh: full pipeline from git clone to exported per-frame PLYs
- configs/: A100-optimized training configs (batch=4, fast=5k/quality=10k iters)
- HANDOFF.md: current project state for agent continuity

Fast config reduced to 3k-iter smoke test (data validation only).
Quality config back to full 14k iterations with batch=2 to leave
VRAM for denser Gaussian populations during densification.

- Added loading and error screens with improved accessibility features.
- Introduced scene name display in the header.
- Enhanced director mode button functionality and styling.
- Updated loading progress display and error handling.
- Refined CSS styles for better visual consistency and usability.
- Adjusted frame counter formatting for improved readability.
- Added event listeners for playback controls and director mode toggling.
… animation, removing unnecessary frequency control checks for camera movement.
mia373 and others added 25 commits March 27, 2026 23:07
…ap filling

Replaces GPU-trained 4D reconstruction with instant bullet-time for a single
user-selected moment. Gemini analyzes synced multi-camera video to find key
moments via natural language, then Nano Banana Pro generates synthetic views
between cameras using recursive edge-inward filling with up to 14 reference
images. Viewer gets drag-to-rotate image strip mode with real/AI source badges.

- bullet_time/ package: schemas, moment_detector, gap_filler, pipeline CLI
- viewer: ImageStripPlayer with drag-to-rotate, bullet-time UI mode
- server: Gemini Live proxy with find_moment/build_strip/show_strip tools
- Removes scanline texture overlay, adds real vs AI-generated frame badges

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
The model was generating identical views because the prompt described
abstract geometry ("25% along the arc") instead of visual effects.
New prompt specifies exact degree rotation per step, clockwise subject
rotation, leftward background shift, and frozen pose. Also upgrades
to gemini-3-pro-image-preview for highest quality output.

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
Replace the cold cyberpunk palette (cyan/orange on blue-black) with ReRoute's
warm premium aesthetic: rose/amber accents on maroon-tinted dark backgrounds,
cream text, DM Sans + Outfit + JetBrains Mono fonts, generous border radii.

Viewer: warm dark backgrounds, rose (#D44060) for spatial, amber (#D4956A) for
temporal, hover lifts, warm glows, custom scrollbar, gradient buttons.

About page: full ReRoute light theme — cream (#FAF6F1) background, maroon
(#7A1B2D) accents, white cards with warm shadows and hover effects.

DESIGN.md updated to document the new direction.

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
Adopt ReRoute warm aesthetic for viewer + marketing page

- Landing page: FREEZEFRAME wordmark + logo + upload zone, nothing else
- Fake processing screen: 4 sequential animated steps
- Viewer: fullscreen strip, minimal wordmark, 5-bar listening indicator
- Theme: cream background, solid maroon accents, no transparency
- Sizes scaled up throughout

Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
…imation

Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
- Upload → 5 thumbnail circles appear staggered around central agent circle
- Agent circle holds the listening bars, pulses on listening/speaking states
- Dashed orbit ring connects thumbnails visually
- Thumbnails float independently with subtle animation
- On click/voice trigger: thumbnails merge into center with scale+fade
- Agent circle absorbs with a brief pulse
- triggerMerge() exposed for Phase 3 voice integration

Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
- server/gemini_proxy.py: WebSocket proxy bridging browser ↔ Gemini Live
  with sportscaster personality, 3 voice tools (describe/explain/navigate),
  moment catalog loaded on startup, audio/transcript relay
- viewer/src/gemini_live.js: mic capture via AudioWorklet, PCM streaming,
  24kHz audio playback queue, navigate → boomerang animation, overlay text
- viewer/public/pcm-processor.js: Float32→Int16 AudioWorklet processor
- viewer/index.html: add #viewer-overlay-text div
- viewer/src/styles.css: overlay text styles (output/input/navigate/error)
- viewer/src/main.js: import and call connectVoice() after viewer init

Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
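The Float32→Int16 step performed by the AudioWorklet processor is a clamp-and-scale conversion. The sketch below shows the same idea in Python for reference; it is not the contents of pcm-processor.js, and the asymmetric scaling (32768 for negative samples, 32767 for positive) and little-endian output are assumptions based on common practice.

```python
import struct


def float_to_int16_pcm(samples):
    """Convert floats in [-1.0, 1.0] to little-endian 16-bit PCM bytes."""
    ints = []
    for sample in samples:
        clamped = max(-1.0, min(1.0, sample))  # guard against overshoot
        # Negative range reaches -32768, positive range only 32767.
        scale = 32768 if clamped < 0 else 32767
        ints.append(int(clamped * scale))
    return struct.pack(f"<{len(ints)}h", *ints)
```

Clamping before scaling matters because out-of-range floats would otherwise overflow the 16-bit range and either raise an error or wrap into loud clicks.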
- Geometry-first pipeline with fg/bg separation and ghost prevention
- Precomputed 4 key moments: Keanu dodge, Kobe fadeaway, roundhouse kick, water throw
- 5-camera support with 28-degree gaps
- Extreme black/white clamping for smoother boomerang playback
- Concurrent Nano Banana polish calls
- Depth Anything V2 on MPS for fast local depth estimation

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
Keanu: 101/90/100/100/100, Fadeaway: 437/420/429/444/448,
Kick: 733/718/736/739/744, Water: 1132/1123/1137/1139/1142

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
- Rewrote viewer UI with new styles, controls, splat player, and image-strip player
- Updated Gemini Live voice integration and proxy server
- Added bullet-time pipeline catalog and VGGT pipeline scripts
- Reorganized docs into docs/ directory
- Updated .gitignore to exclude large generated directories
- Cleaned up deprecated stage1 preprocessing and empty gitkeep dirs

The logo, wordmark, and upload zone were sitting slightly too low
on the viewport. Adds margin-top: -60px to #landing-inner to shift
the centered content group upward, improving the visual weight
distribution on the landing screen.
@vercel

vercel Bot commented May 26, 2026


The latest updates on your projects. Learn more about Vercel for GitHub.

| Project | Deployment | Actions | Updated (UTC) |
| freezeframe | Ready | Preview, Comment | May 26, 2026 4:10pm |

* Clean up

* fix(vercel): add vercel.json to set rootDirectory=viewer

* fix(vercel): remove invalid rootDirectory, scope commands to viewer/

* fix(viewer): remove broken raw_videos symlink

Vite's prepareOutDir followed the public/raw_videos symlink (pointing
to gitignored ../../raw_videos training data) and failed with ENOENT
on Vercel. The symlink is unreferenced in viewer code; safe to drop.
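A dangling symlink like this one can be caught before deployment with a small scan. The sketch below is a hypothetical pre-build check, not part of the repository; it lists symlinks under a directory whose targets do not exist.

```python
import os
from pathlib import Path


def dangling_symlinks(root):
    """Return symlinks under root whose targets are missing, sorted by path."""
    broken = []
    # followlinks=False keeps the walk from descending through symlinked dirs.
    for dirpath, dirnames, filenames in os.walk(root, followlinks=False):
        for name in dirnames + filenames:
            path = Path(dirpath) / name
            # exists() follows the link, so it is False for a missing target.
            if path.is_symlink() and not path.exists():
                broken.append(path)
    return sorted(broken)
```

Running this against viewer/public in CI would flag a link such as raw_videos on any machine where the gitignored target is absent.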
@coderabbitai

coderabbitai Bot commented May 26, 2026


Important

Review skipped

Auto reviews are disabled on base/target branches other than the default branch.

Please check the settings in the CodeRabbit UI or the .coderabbit.yaml file in this repository. To trigger a single review, invoke the @coderabbitai review command.

⚙️ Run configuration

Configuration used: defaults

Review profile: CHILL

Plan: Pro Plus

Run ID: c3de8738-8db9-43a6-be70-ed0a93b74598

You can disable this status message by setting the reviews.review_status to false in the CodeRabbit configuration file.



4 participants