
perf ticket 009: GPU-driven rendering — indirect multi-draw + GPU cull #28

@proggeramlug

Deferred perf ticket — see docs/perf/009-gpu-driven-rendering.md.

Summary

Replace the scene graph's per-mesh CPU draw loop (one set_bind_group + draw_indexed per mesh, ~340 calls/frame on Sponza across shadow + main + depth-prepass passes) with a single draw_indexed_indirect_count call backed by a GPU-side frustum-cull compute pass. This collapses the loop to one draw call per render pass, regardless of mesh count.
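
The data flow described above can be sketched on the CPU as a reference for what the cull compute pass would write. This is a minimal sketch under stated assumptions, not the ticket's design: `MeshDescriptor`, `Plane`, `sphere_visible`, and `cull_to_indirect` are hypothetical names, the bounding volume is assumed to be a world-space sphere, and the mesh index is assumed to travel in `first_instance`. The argument struct mirrors the five-field, 20-byte layout that indexed indirect draws read. In wgpu the corresponding render-pass call appears as `multi_draw_indexed_indirect_count`, and a non-zero `first_instance` in indirect draws may require the `INDIRECT_FIRST_INSTANCE` feature.

```rust
/// Five 32-bit fields (20 bytes), the layout indexed indirect draws read.
#[repr(C)]
#[derive(Clone, Copy, Debug, Default, PartialEq)]
pub struct DrawIndexedIndirectArgs {
    pub index_count: u32,
    pub instance_count: u32,
    pub first_index: u32,
    pub base_vertex: i32,
    pub first_instance: u32,
}

/// Hypothetical per-mesh record: location in the shared buffers plus a
/// world-space bounding sphere (an assumption; the ticket names no bound type).
#[derive(Clone, Copy, Debug)]
pub struct MeshDescriptor {
    pub index_count: u32,
    pub first_index: u32,
    pub base_vertex: i32,
    pub center: [f32; 3],
    pub radius: f32,
}

/// Plane as (nx, ny, nz, d) with a unit normal pointing into the frustum.
/// A point p is on the inside when dot(n, p) + d >= 0.
pub type Plane = [f32; 4];

/// A sphere is kept unless it lies entirely behind at least one plane.
pub fn sphere_visible(planes: &[Plane; 6], c: [f32; 3], r: f32) -> bool {
    planes
        .iter()
        .all(|p| p[0] * c[0] + p[1] * c[1] + p[2] * c[2] + p[3] >= -r)
}

/// CPU reference for the cull pass: one iteration stands in for one GPU
/// thread. Visible meshes append their draw args; the returned count is the
/// value the GPU version would leave in the count buffer.
pub fn cull_to_indirect(
    planes: &[Plane; 6],
    meshes: &[MeshDescriptor],
) -> (Vec<DrawIndexedIndirectArgs>, u32) {
    let mut args = Vec::with_capacity(meshes.len());
    for (mesh_id, m) in meshes.iter().enumerate() {
        if sphere_visible(planes, m.center, m.radius) {
            args.push(DrawIndexedIndirectArgs {
                index_count: m.index_count,
                instance_count: 1,
                first_index: m.first_index,
                base_vertex: m.base_vertex,
                // Assumption: the shader uses this to index the descriptor buffer.
                first_instance: mesh_id as u32,
            });
        }
    }
    let count = args.len() as u32;
    (args, count)
}
```

On the GPU the append would typically be an atomic increment on the count buffer, so the order of surviving draws is not guaranteed to match mesh order the way it does in this sequential reference.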

Why deferred

Pure CPU-side optimization on a GPU-bound benchmark. The perf README's own rule of thumb: "Sponza is GPU-bound, not CPU-bound. Don't chase CPU micro-optimizations expecting FPS improvement." Render-total CPU is already ~4 ms against a 16.7 ms vsync budget after the landed 001-017 wins (uniform pool, frustum cull, matrix-inverse cache, shadow cascade cache). Shaving another ~600 µs won't move FPS on Sponza — we'd be optimizing a resource we already have in surplus.
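
The argument above can be made concrete with a deliberately simplified model. The sketch assumes CPU and GPU work overlap fully, so a frame is paced by whichever side is slower. `frame_ms` is a hypothetical helper, and any GPU time passed to it is a placeholder, since the ticket does not state one.

```rust
/// Simplified pacing model (an assumption): with CPU and GPU work
/// overlapping, the frame takes as long as the slower of the two.
pub fn frame_ms(cpu_ms: f64, gpu_ms: f64) -> f64 {
    cpu_ms.max(gpu_ms)
}

/// CPU time left under a frame budget, e.g. 16.7 ms at 60 Hz vsync.
pub fn cpu_headroom_ms(cpu_ms: f64, budget_ms: f64) -> f64 {
    budget_ms - cpu_ms
}
```

Under this model, `frame_ms(4.0, g)` and `frame_ms(3.4, g)` are identical for any GPU time `g` of at least 4 ms, which is the GPU-bound case on Sponza. The ~600 µs saving only shows up once `cpu_ms` exceeds `gpu_ms`, which is the situation the reopen criteria describe.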

Reopen criteria

  • A CPU-bound scene arrives — 10 000+ mesh count, many small static props, or CPU-expensive per-frame state updates pushing render_total CPU past the vsync budget.
  • Ticket 008 (visibility buffer) reopens. 008's shading pass needs a shared vertex/index buffer plus a per-mesh descriptor buffer, which is exactly what this ticket builds, so this ticket is a hard prerequisite for 008.
  • Bindless texture support lands in wgpu. The current "one set_bind_group per draw" pattern is partly about per-material texture binds. Bindless makes indirect multi-draw a straightforward win without the material-binding workarounds the ticket's notes describe.

Effort

~1 week for the baseline draw_indexed_indirect_count path with GPU frustum cull. Material indirection still requires either bindless (not widely supported in wgpu 29) or a texture-array trick — that's where the risk sits, and why it's scoped at "week" not "days."
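
The ticket does not spell out the texture-array trick. One common reading is to pack material textures into texture arrays and store a layer index per mesh, so a draw no longer needs its own bind group. The sketch below illustrates that reading only. `TextureKey`, `MaterialSlot`, and `assign_layers` are hypothetical names, and the `format` field is a stand-in id rather than a real wgpu type. It relies on one known constraint: all layers of a texture array share the same dimensions and format, so textures have to be bucketed by those properties first.

```rust
use std::collections::HashMap;

/// Properties that must match for textures to share one texture array.
#[derive(Clone, Copy, Debug, PartialEq, Eq, Hash)]
pub struct TextureKey {
    pub width: u32,
    pub height: u32,
    pub format: u32, // hypothetical format id, not a wgpu type
}

/// Where a material's texture ends up: which array, and which layer in it.
#[derive(Clone, Copy, Debug, PartialEq, Eq)]
pub struct MaterialSlot {
    pub array_id: u32,
    pub layer: u32,
}

/// Buckets textures by (size, format) and hands out layers in input order.
/// Returns one slot per input texture and the layer count of each array.
pub fn assign_layers(textures: &[TextureKey]) -> (Vec<MaterialSlot>, Vec<u32>) {
    let mut array_of: HashMap<TextureKey, u32> = HashMap::new();
    let mut layer_counts: Vec<u32> = Vec::new();
    let mut slots = Vec::with_capacity(textures.len());
    for key in textures {
        let next_id = layer_counts.len() as u32;
        let array_id = *array_of.entry(*key).or_insert(next_id);
        if array_id == next_id {
            // First texture with this size/format: open a new array.
            layer_counts.push(0);
        }
        let layer = layer_counts[array_id as usize];
        layer_counts[array_id as usize] += 1;
        slots.push(MaterialSlot { array_id, layer });
    }
    (slots, layer_counts)
}
```

The length of the returned `layer_counts` vector is the number of distinct arrays. If a scene's textures fall into many size or format buckets, a single draw per pass may no longer be enough without resizing or transcoding textures, which is one plausible source of the risk noted above.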

Files

  • native/shared/src/renderer/mod.rs — shared VB/IB, descriptor buffer, GPU cull compute shader, render pass using draw_indexed_indirect_count.
  • native/shared/src/scene.rs — reworking of per-node GPU resources.
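
As a rough illustration of the shared VB/IB item in the list above, the sketch below appends meshes to one vertex array and one index array and records the range each mesh occupies. `Vertex`, `MeshRange`, and `SharedGeometry` are hypothetical names, not types from the repository, and the vertex layout is reduced to a position for brevity.

```rust
#[derive(Clone, Copy, Debug, PartialEq)]
pub struct Vertex {
    pub position: [f32; 3],
}

/// Where one mesh lives inside the shared buffers. These three values are
/// what an indexed indirect draw needs to address the mesh.
#[derive(Clone, Copy, Debug, PartialEq, Eq)]
pub struct MeshRange {
    pub first_index: u32,
    pub index_count: u32,
    pub base_vertex: i32,
}

/// CPU-side staging for the shared vertex and index buffers.
#[derive(Default)]
pub struct SharedGeometry {
    pub vertices: Vec<Vertex>,
    pub indices: Vec<u32>,
    pub ranges: Vec<MeshRange>,
}

impl SharedGeometry {
    /// Appends a mesh and returns its id (its position in `ranges`).
    pub fn push_mesh(&mut self, vertices: &[Vertex], indices: &[u32]) -> usize {
        let range = MeshRange {
            first_index: self.indices.len() as u32,
            index_count: indices.len() as u32,
            base_vertex: self.vertices.len() as i32,
        };
        self.vertices.extend_from_slice(vertices);
        // Indices stay mesh-local; base_vertex rebases them at draw time.
        self.indices.extend_from_slice(indices);
        self.ranges.push(range);
        self.ranges.len() - 1
    }
}
```

Keeping indices mesh-local and relying on `base_vertex` means source index data can be copied unchanged. The alternative of pre-offsetting every index would also work, at the cost of rewriting each mesh's indices on upload.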
