feat: DLSS Frame Generation with Multi-Frame Generation and Dynamic mode by Gabrieli2806 · Pull Request #7 · Minecraft-Radiance/MCVR

Gabrieli2806 · 2026-04-12T07:18:57Z

Summary

Adds DLSS Frame Generation (DLSS-G) support with Multi-Frame Generation (MFG) and Dynamic mode.

Note: This PR builds on top of #6 (DLSS Ultra Performance mode). The diff will shrink to just the FG commits once #6 is merged.

Features

DLSS Frame Generation via NGX SDK (`dlssg_wrapper`, `NGX_VK_CREATE_DLSSG` / `NGX_VK_EVALUATE_DLSSG`)
Multi-Frame Generation multipliers: x2, x3, x4, x5, x6 (1 to 5 interpolated frames per real frame)
Dynamic mode: per-frame `multiFrameCount = ceil(monitorHz / baseFps) - 1`, which automatically scales output toward the monitor refresh rate
Same-frame presentation: interpolated frames are presented immediately after the real frame in the same `present()` call (fixes DLSS-G indicator flickering)
GLFW bindings for monitor refresh rate detection (`glfwGetPrimaryMonitor` / `glfwGetVideoMode`)
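
As a rough illustration of the Dynamic mode formula above, here is a minimal C++ sketch. The helper name `dynamicMultiFrameCount`, its signature, and the behavior when the base frame rate already meets the refresh rate (returning 0) are assumptions made for this example and are not taken from the PR; the clamp to a hardware maximum follows the commit notes.

```cpp
#include <algorithm>
#include <cmath>
#include <cstdint>

// Hypothetical helper (name and signature are illustrative, not from the PR).
// Returns how many interpolated frames to generate per real frame so that
// (1 real + N interpolated) * baseFps reaches at least monitorHz.
inline uint32_t dynamicMultiFrameCount(double monitorHz, double baseFps,
                                       uint32_t hardwareMax) {
    if (monitorHz <= 0.0 || baseFps <= 0.0 || hardwareMax == 0) {
        return 0;  // no usable target or no FG capability: generate nothing
    }
    // Total frames (real + interpolated) needed per real frame.
    const double total = std::ceil(monitorHz / baseFps);
    if (total <= 1.0) {
        return 0;  // assumption: base rate already covers the refresh rate
    }
    // multiFrameCount = ceil(monitorHz / baseFps) - 1, clamped to hardware max.
    return static_cast<uint32_t>(std::min<double>(total - 1.0, hardwareMax));
}
```

For example, a 240 Hz monitor with a 60 fps base rate gives `ceil(240 / 60) - 1 = 3` interpolated frames, while a 144 Hz monitor at 60 fps gives `ceil(2.4) - 1 = 2`.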

Java counterpart

Radiance mod changes: Gabrieli2806/Radiance feat/dlss-frame-gen

- Add dlssg_wrapper.hpp/cpp with DlssFG class wrapping NGX Frame Gen
- Extend NgxContext with queryFrameGenAvailable() and initFrameGen()
- Add FG attribute handling and initialization in DLSSModule
- Integrate FG evaluation in render framework (double-present for interp frames)
- Create interpolated frame images and blit pipeline

- dlssg_wrapper: evaluate() now accepts multiFrameCount and multiFrameIndex params
- dlss_wrapper: added queryMaxMultiFrameCount() capability query
- dlss_module: changed from bool to uint32_t frameGenMultiFrameCount_, 2D interp images
- dlss_module: parse off/x2/x3/x4 enum values, clamp to hardware max
- render_framework: multi-frame evaluate loop and multi-present in present()

…nous approach

- Remove interpPresentThreadFunc() async thread and all threading infrastructure (mutexes, condition variables, atomics, dedicated command pool)
- Implement pipelined synchronous approach: store interp frames from frame N, present them at the START of frame N+1's present() when GPU fence is already signaled
- Add PendingInterpPresent struct and presentPendingInterpolatedFrames() method
- Fix crash on world entry with Frame Generation enabled (ACCESS_VIOLATION)
- Fix watchdog timeout from render thread stuck in present()
- Near-zero wait for interp frame fences since a full render frame elapses

- Parse x5 (multiFrameCount=4) and x6 (multiFrameCount=5) attribute values
- Auto mode uses UINT32_MAX sentinel, resolved to hardware max in build()
- Logs auto mode selection with resolved count
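
The mapping from multiplier to `multiFrameCount` and the `UINT32_MAX` auto sentinel could look roughly like the sketch below. The function names, the out-parameter style, and the literal `"auto"` attribute string are assumptions for illustration only; the x5 -> 4 and x6 -> 5 mapping, the `off` value, and the clamp to hardware max come from the commit notes.

```cpp
#include <cstdint>
#include <limits>
#include <string>

// Sentinel for auto mode, following the UINT32_MAX convention in the commit.
constexpr uint32_t kAutoMultiFrameCount = std::numeric_limits<uint32_t>::max();

// Hypothetical parser: "off" -> 0, "x2" -> 1, ..., "x6" -> 5, "auto" -> sentinel.
// Returns false for unrecognized values and leaves `out` untouched.
inline bool parseFrameGenMode(const std::string& value, uint32_t& out) {
    if (value == "off") { out = 0; return true; }
    if (value == "auto") { out = kAutoMultiFrameCount; return true; }  // assumed spelling
    if (value.size() == 2 && value[0] == 'x' && value[1] >= '2' && value[1] <= '6') {
        // xN means N total frames per real frame, i.e. N - 1 interpolated frames.
        out = static_cast<uint32_t>(value[1] - '0') - 1u;
        return true;
    }
    return false;
}

// Resolve the sentinel and clamp explicit requests to what the hardware reports.
inline uint32_t resolveMultiFrameCount(uint32_t requested, uint32_t hardwareMax) {
    if (requested == kAutoMultiFrameCount) {
        return hardwareMax;
    }
    return requested < hardwareMax ? requested : hardwareMax;
}
```

Keeping the sentinel unresolved until the hardware maximum is known means the parsing step does not need access to the NGX capability query.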

- Add glfwGetPrimaryMonitor/glfwGetVideoMode GLFW bindings
- Query monitor refresh rate at init for dynamic FG target
- Dynamic mode: per-frame multiFrameCount = ceil(targetHz / baseFps) - 1
- Allocate interp images to hardware max, use only what's needed
- Reset frameGenDynamic_ when switching to non-dynamic modes

- Replace pipelined presentation (deferred to next frame) with same-frame
- Present interpolated frames immediately after real frame in present()
- Remove PendingInterpPresent struct and related state
- Rename presentPendingInterpolatedFrames -> presentInterpolatedFrames

Turns the V2 render graph from 'tracing into empty TLAS' into 'tracing real chunk geometry' by plumbing Minecraft's chunk mesh output through the V2 scene services. Also fixes risks #1 (use-after-free on chunk re-upload), #4 (chunk origin TODO), and #7 (garbage energy LUT) from the original 4-pass analysis.

The big shift: chunks.cpp now tees the mesh worker output straight into CmdChunkSubmit after BlockMesher::mesh() returns. This path catches ~100% of in-game chunk uploads (the rebuildSingle fallback tee is still fixed but only fires on the slow path).

ChunkId now 3D (x, sectionY, z):
* scene_types.hpp: added sectionY field and updated std::hash
* Was a latent design bug - 24 column sections collided on a single {x, z} key, so 23/24 of every column's data was silently dropped
* replay_recorder.{hpp,cpp}: bumped binary version 1 -> 2 and added sectionY to on-disk record; older replays invalid (intentional)

Deferred-delete via ResourceGC (risk #1 fix):
* blas_service.{hpp,cpp}: init() now takes ResourceGC*; retireBlas() wraps VkAccelerationStructureKHR + vk2::Buffer in shared_ptr and defers destruction via gc->defer(lambda). Chunks re-meshed rapidly in the same pose will no longer free in-flight BLAS data.
* tlas_service.{hpp,cpp}: same pattern for old TLAS buffers on rebuild, plus for the scratch buffer if it's ever replaced mid-flight.
* gpu_upload_service.{hpp,cpp}: same pattern for vertex/index buffers on chunk re-upload and removeChunk.
* ResourceGC has a 32-frame ring, framesInFlight=2, so deferred destructors stay alive for at least 31 frames - plenty of slack.

Chunk origin baked into TLAS instance transform (risk #4 fix):
* blas_service.hpp: BlasData now caches originX/Y/Z as float (copied from GpuChunkData on build).
* tlas_service.cpp: inst.transform.matrix now writes translation (originX, originY, originZ) in the column-3 slot instead of an identity matrix. The closest-hit shader gl_ObjectToWorldEXT automatically reflects this - no rchit changes needed.
* Dropped chunkOriginBuffer_ and chunkOriginArraySize_ - origin is in the transform, no parallel SSBO needed. Resolves the tlas_service.cpp:73 TODO from the original session summary.

Energy LUT initialized to 1.0 (risk #7 fix):
* scene_resource_service.{hpp,cpp}: added pendingEnergyLutStaging_ member. init() creates a host-visible staging buffer filled with half-float 1.0 (0x3C00) covering all 64x64x4 channels.
* New runDeferredInit(cmd) records vkCmdCopyBufferToImage on first call, with proper layout transitions UNDEFINED -> TRANSFER_DST -> SHADER_READ_ONLY. Staging buffer freed after copy.
* Called from engine_app.cpp processScene() first frame only.

Production chunk tee (chunks.cpp, +88 lines):
* After BlockMesher::mesh() returns the meshOutput, copy solid + cutout + translucent PBR triangles into a merged vertex array with index re-offsetting for V2.
* Post as CmdChunkSubmit via EngineServices::bridge().
* Only fires when useV2Engine is true and the bridge has been initialized (EngineApp::instance() != nullptr check).
* This is the hot path - the rebuildSingle fallback path tee in ChunkProxy.cpp is kept working (with sectionY) for completeness.

ChunkProxy.cpp fix:
* The existing rebuildSingle tee was using {cx, cz} as the key with no Y component - all 24 column sections silently collided. Now derives sectionY from cmd.originY >> 4 and uses a proper 3D key.

CmdChunkSubmit/CmdChunkRemove schema:
* bridge_service.hpp: added int32 sectionY field to both command structs, placed between chunkZ and existing origin fields for locality.
* engine_app.cpp: handleCommand uses the 3D key throughout.

engine_app.cpp:
* Scene service init() calls now pass &services_->frame().gc() so the deferred-delete wiring is functional.
* processScene() first-frame path calls sceneRes().runDeferredInit(cmd) after scene UBO update but before RT adapter reads.
* Chunk command handlers use ChunkId{x, sectionY, z} throughout.
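
To make the 3D key change concrete, the following is a minimal sketch of a `ChunkId` with a `std::hash` specialization and the `originY >> 4` section derivation. Only the field order `{x, sectionY, z}` and the shift come from the notes above; the field types, the hash-combine formula, and the helper names `sectionYFromOrigin` and `distinctKeysForColumn` are assumptions for illustration.

```cpp
#include <cstddef>
#include <cstdint>
#include <functional>
#include <unordered_set>

// Assumed layout: field order follows ChunkId{x, sectionY, z} from the notes.
struct ChunkId {
    int32_t x;
    int32_t sectionY;
    int32_t z;
    bool operator==(const ChunkId& o) const {
        return x == o.x && sectionY == o.sectionY && z == o.z;
    }
};

namespace std {
template <>
struct hash<ChunkId> {
    size_t operator()(const ChunkId& id) const noexcept {
        // Boost-style hash combine; the actual formula in scene_types.hpp is unknown.
        size_t h = std::hash<int32_t>{}(id.x);
        h ^= std::hash<int32_t>{}(id.sectionY) + 0x9e3779b9u + (h << 6) + (h >> 2);
        h ^= std::hash<int32_t>{}(id.z) + 0x9e3779b9u + (h << 6) + (h >> 2);
        return h;
    }
};
}  // namespace std

// Sections are 16 blocks tall, so sectionY = originY >> 4. Right-shifting a
// negative value is an arithmetic shift on mainstream compilers (and is
// guaranteed from C++20 onward).
inline int32_t sectionYFromOrigin(int32_t originY) { return originY >> 4; }

// Hypothetical check: counts distinct keys for one column's sections.
inline std::size_t distinctKeysForColumn(int32_t x, int32_t z, int32_t sections) {
    std::unordered_set<ChunkId> keys;
    for (int32_t s = 0; s < sections; ++s) {
        keys.insert(ChunkId{x, s, z});
    }
    return keys.size();
}
```

With a 2D `{x, z}` key the same loop would leave a single entry per column, which is the collision described above; with the 3D key all 24 sections stay distinct.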

Smoke test results (in-game, 90 seconds of V2 gameplay):
* Pre-fix baseline (empty TLAS): 1.65 ms avg, 651 fps, sky-only
* After-fix with real chunks: 4.17 ms avg, 240 fps (vsync capped)
* Chunks flowed: 30323 -> 31007 (684 new chunk meshes through tee)
* Engine log: 99832 bytes, 0 errors, 0 warnings, 0 validation issues
* Deferred-delete: confirmed by service init log 'deferred-delete: on' on all three scene services.
* Process exit: clean in 1s, no deadlock

The +2.5 ms per frame is real BLAS build + TLAS rebuild + closest-hit shader work replacing the previous sky-gradient miss-only path. The V2 graph has real headroom - it's hitting 240 Hz vsync, not the GPU ceiling.

Closes risks 1, 4, 7 from the original 4-pass analysis. Unblocks PR37 (denoiser), PR39 (real materials), PR40 (DLSS-RR) which all need real RT output to validate against.
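
The deferred-delete pattern mentioned in this commit (queue a destructor lambda, run it only after enough frames have passed) can be sketched as a small frame ring. This is not the project's `ResourceGC`; the class name `DeferredDeleteRing`, its interface, and the exact frame at which callbacks fire are assumptions for illustration.

```cpp
#include <cstddef>
#include <functional>
#include <utility>
#include <vector>

// Hypothetical stand-in for a frame-ring garbage collector.
class DeferredDeleteRing {
public:
    explicit DeferredDeleteRing(std::size_t ringSize)
        : slots_(ringSize == 0 ? 1 : ringSize) {}

    // Queue a destructor callback in the slot of the current frame.
    void defer(std::function<void()> fn) {
        slots_[current_].push_back(std::move(fn));
    }

    // Call once per frame. Moves to the next slot and runs whatever was
    // queued there one full lap of the ring ago.
    void advanceFrame() {
        current_ = (current_ + 1) % slots_.size();
        std::vector<std::function<void()>> pending = std::move(slots_[current_]);
        slots_[current_].clear();  // moved-from vector: reset to a known empty state
        for (auto& fn : pending) {
            fn();
        }
    }

private:
    std::vector<std::vector<std::function<void()>>> slots_;
    std::size_t current_ = 0;
};
```

In this sketch a callback queued during one frame runs on the ring-size-th later call to `advanceFrame()`, so a ring much larger than the number of frames in flight keeps GPU resources alive well past their last use. A caller would typically capture a `shared_ptr` to the retired buffer in the lambda, as the commit describes for `retireBlas()`.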

Gabrieli2806 added 7 commits April 11, 2026 01:36

feat(dlss): add Ultra Performance mode support

06ecbc9

feat: add x5, x6, and auto FG multiplier modes in DLSS module

0f7d1f6

- Parse x5 (multiFrameCount=4) and x6 (multiFrameCount=5) attribute values
- Auto mode uses UINT32_MAX sentinel, resolved to hardware max in build()
- Logs auto mode selection with resolved count

Gabrieli2806 mentioned this pull request Apr 12, 2026

feat: DLSS Frame Generation with Multi-Frame Generation and Dynamic mode Minecraft-Radiance/Radiance#223

Open

Delete .github/copilot-instructions.md

cf494d8

Gabrieli2806 wants to merge 8 commits into Minecraft-Radiance:main from Gabrieli2806:feat/dlss-frame-gen
