
Lumen follow-up: benchmark harness for SSGI quality / convergence #24

@proggeramlug

Description

Across 014 and 016, the Metal drawable-stall flake on macOS made steady-state FPS comparisons unreliable: every version change meant waiting 5-10 minutes between runs for the OS to release its hold on the swapchain. That made it impossible to attach solid numbers to the V1-V4 perf claims (e.g. "32 rays/probe + importance sampling matches 32 rays/probe uniform"), so the visual captures (docs/perf/ticket-*-after.png) remain the de facto acceptance signal.

A proper harness would let us actually measure:

  • Temporal convergence rate (how many frames to hit some noise threshold)
  • Equal-quality-at-lower-rays comparison
  • SSIM vs. reference on a fixed camera pose
  • FPS under consistent Metal state
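
For the convergence metric, a minimal sketch of the counting logic (this assumes the harness already produces a per-frame noise estimate, e.g. mean absolute difference against the converged reference frame; the function name and the `stable_frames` parameter are illustrative, not from the codebase):

```python
def frames_to_converge(noise_per_frame, threshold, stable_frames=5):
    """Return the first frame index from which the noise estimate stays
    below `threshold` for `stable_frames` consecutive frames, or None
    if the sequence never converges."""
    run = 0
    for i, noise in enumerate(noise_per_frame):
        run = run + 1 if noise < threshold else 0
        if run >= stable_frames:
            return i - stable_frames + 1
    return None
```

Example: `frames_to_converge([0.9, 0.5, 0.2, 0.05, 0.04, 0.03], 0.1, stable_frames=3)` returns 3, the first frame of the stable run. Requiring a run of consecutive below-threshold frames avoids declaring convergence on a single lucky frame.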

Scope

  • A dedicated benchmark binary (can reuse the intel-sponza example) that:
    • Forces a clean Metal state on start (workaround or explicit wait)
    • Runs a warmup window (~60 frames) before measurements
    • Captures N frames with timestamp queries + readback
    • Writes results to JSON for regression comparison
  • Python SSIM/PSNR harness comparing output frames vs. a baseline capture
  • CI-friendly output format (assertion-style thresholds per metric)
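
The CI-friendly output could be as simple as the JSON metrics file plus a threshold check; a sketch under an assumed schema (the metric names and floor values here are placeholders, not agreed numbers):

```python
import json

# Hypothetical per-metric floors; real values would come from the baseline capture.
THRESHOLDS = {"ssim": 0.97, "psnr_db": 35.0, "fps_mean": 40.0}

def check_results(path):
    """Return a list of human-readable failures; an empty list means the run passes."""
    with open(path) as f:
        metrics = json.load(f)["metrics"]  # e.g. {"metrics": {"ssim": 0.982, ...}}
    return [
        f"{name}={metrics[name]:.3f} below floor {floor}"
        for name, floor in THRESHOLDS.items()
        if metrics[name] < floor
    ]
```

CI can then fail the job whenever the returned list is non-empty, which keeps the output assertion-style per metric.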

Stretch

  • Multi-pose sweep: stand in 4-6 fixed camera positions, capture at each
  • Multi-config sweep: toggle SSGI on/off, HW/SW path, quality preset 0-4
  • Result CSV for quick before/after comparison during GI work
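
The multi-pose × multi-config sweep with CSV output can be sketched in a few lines; the pose names, config labels, and the `measure` callback are all hypothetical placeholders for whatever the benchmark binary exposes:

```python
import csv
import itertools

POSES = ["pose_a", "pose_b", "pose_c", "pose_d"]    # hypothetical fixed camera positions
CONFIGS = ["ssgi_off", "ssgi_sw_q2", "ssgi_hw_q4"]  # hypothetical SSGI configurations

def write_sweep_csv(path, measure):
    """`measure(pose, config)` returns a dict of metric name -> value,
    e.g. {"fps_mean": 47.3, "ssim": 0.982}. Writes one CSV row per
    (pose, config) combination, with a header derived from the first row."""
    with open(path, "w", newline="") as f:
        writer = None
        for pose, config in itertools.product(POSES, CONFIGS):
            row = {"pose": pose, "config": config, **measure(pose, config)}
            if writer is None:
                writer = csv.DictWriter(f, fieldnames=list(row))
                writer.writeheader()
            writer.writerow(row)
```

A flat CSV like this diffs cleanly between before/after runs, which is the point of the stretch goal.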

Context

The flake is documented only informally, in the V5-V14 commit messages; the 014 closure already noted that "FPS ≥ 40 fps ... was flake-bound throughout V8-V14 testing". A repeatable harness would close that loop.
