I'm working toward a single long-term research goal: generative models that construct and reason about persistent, interactive 3D worlds. I'm building the foundation systematically — one layer at a time, from diffusion internals to spatial intelligence.
The arc:
- 🔬 Now — Diffusion Architecture: Replacing U-Nets with scratch-built Diffusion Transformers (DiT). Mapping GFLOPs scaling behavior across resolutions. Understanding the math before touching the API.
- 🎬 Next — Video Generation: Cross-frame attention for temporal consistency. Fighting flickering directly. Evaluating with FVD, not vibes.
- 🧊 Then — 3D Spatial Generation: Score Distillation Sampling over NeRF / 3D Gaussian Splatting. Documenting gradient flow rigorously.
- 🌍 Goal — Large World Models: Action-conditioned generative systems that build coherent, navigable 3D environments.
Every project ships with a technical report. Results are reported honestly, including failures.
Core ML
Tooling
