Deferred perf ticket — see docs/perf/009-gpu-driven-rendering.md.
Summary
Replace the scene graph's per-mesh CPU draw loop (one set_bind_group + draw_indexed per mesh, ~340 calls/frame on Sponza across shadow + main + depth-prepass passes) with a single draw_indexed_indirect_count call backed by a GPU-side frustum-cull compute pass. Collapses to one draw per render pass, regardless of mesh count.
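As a rough illustration of what the cull pass would produce, here is a CPU reference sketch in plain Rust of the logic a compute shader could implement: test each mesh's bounding sphere against six frustum planes, append one indirect draw record per survivor, and report the survivor count. `MeshDescriptor`, `Plane`, `sphere_in_frustum`, and `cull_and_compact` are hypothetical names, not taken from the renderer. The sketch also assumes bounding-sphere culling and assumes `first_instance` carries the descriptor index. The record uses the standard five-field, 20-byte indexed indirect layout.

```rust
/// Indexed indirect draw record: five 32-bit fields, 20 bytes total.
#[repr(C)]
#[derive(Clone, Copy, Debug, PartialEq)]
pub struct DrawIndexedIndirectArgs {
    pub index_count: u32,
    pub instance_count: u32,
    pub first_index: u32,
    pub base_vertex: i32,
    pub first_instance: u32,
}

/// Hypothetical per-mesh descriptor living in the shared descriptor buffer.
#[derive(Clone, Copy, Debug)]
pub struct MeshDescriptor {
    pub center: [f32; 3], // bounding-sphere center, world space
    pub radius: f32,      // bounding-sphere radius
    pub index_count: u32, // number of indices in the shared index buffer
    pub first_index: u32, // offset into the shared index buffer
    pub base_vertex: i32, // offset into the shared vertex buffer
}

/// Plane as (nx, ny, nz, d); a point p is inside when dot(n, p) + d >= 0.
pub type Plane = [f32; 4];

/// True if the sphere is at least partly on the inner side of every plane.
pub fn sphere_in_frustum(planes: &[Plane; 6], center: [f32; 3], radius: f32) -> bool {
    planes.iter().all(|p| {
        p[0] * center[0] + p[1] * center[1] + p[2] * center[2] + p[3] >= -radius
    })
}

/// Returns the compacted indirect records plus the draw count. On the GPU
/// these would be the indirect buffer and the count buffer respectively.
pub fn cull_and_compact(
    planes: &[Plane; 6],
    meshes: &[MeshDescriptor],
) -> (Vec<DrawIndexedIndirectArgs>, u32) {
    let mut args = Vec::with_capacity(meshes.len());
    for (i, m) in meshes.iter().enumerate() {
        if sphere_in_frustum(planes, m.center, m.radius) {
            args.push(DrawIndexedIndirectArgs {
                index_count: m.index_count,
                instance_count: 1,
                first_index: m.first_index,
                base_vertex: m.base_vertex,
                // Assumption: the vertex shader reads the instance index
                // to look up this mesh's descriptor.
                first_instance: i as u32,
            });
        }
    }
    let count = args.len() as u32;
    (args, count)
}
```

On the GPU the append would be an atomic increment on the count buffer, so the order of surviving records is not guaranteed the way it is in this sequential version.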
Why deferred
Pure CPU-side optimization on a GPU-bound benchmark. The perf README's own rule of thumb: "Sponza is GPU-bound, not CPU-bound. Don't chase CPU micro-optimizations expecting FPS improvement." Render-total CPU is already ~4 ms against a 16.7 ms vsync budget after the landed 001-017 wins (uniform pool, frustum cull, matrix-inverse cache, shadow cascade cache). Shaving another ~600 µs won't move FPS on Sponza — we'd be optimizing a resource we already have in surplus.
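The headroom argument can be checked with a few lines of arithmetic. This is a minimal sketch with hypothetical helper names (`frame_budget_ms`, `cpu_budget_fraction`), using the figures quoted above and assuming a 60 Hz refresh (which is where the 16.7 ms budget comes from).

```rust
/// Frame budget in milliseconds for a given refresh rate.
pub fn frame_budget_ms(refresh_hz: f64) -> f64 {
    1000.0 / refresh_hz
}

/// Fraction of the frame budget consumed by render-total CPU time.
pub fn cpu_budget_fraction(render_total_ms: f64, refresh_hz: f64) -> f64 {
    render_total_ms / frame_budget_ms(refresh_hz)
}
```

With the ticket's numbers, `cpu_budget_fraction(4.0, 60.0)` comes to about 0.24 and `cpu_budget_fraction(3.4, 60.0)` to about 0.20. Both sit far below 1.0, so on this scene the CPU is not the side holding the frame back either way.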
Reopen criteria
- A CPU-bound scene arrives: 10 000+ meshes, many small static props, or CPU-expensive per-frame state updates pushing render_total CPU past the vsync budget.
- Ticket 008 (visibility buffer) reopens. 008's shading pass needs a shared vertex/index buffer + per-mesh descriptor buffer — exactly what this ticket builds. Hard prerequisite in that direction.
- Bindless texture support lands in wgpu. The current "one set_bind_group per draw" pattern is partly about per-material texture binds. Bindless makes indirect multi-draw a straightforward win without the material-binding workarounds the ticket's notes describe.
Effort
~1 week for the baseline draw_indexed_indirect_count path with GPU frustum cull. Material indirection still requires either bindless (not widely supported in wgpu 29) or a texture-array trick — that's where the risk sits, and why it's scoped at "week" not "days."
Files
native/shared/src/renderer/mod.rs — shared VB/IB, descriptor buffer, GPU cull compute shader, render pass using draw_indexed_indirect_count.
native/shared/src/scene.rs — reworking of per-node GPU resources.