feat: DLSS Frame Generation with Multi-Frame Generation and Dynamic mode #7
Open
Gabrieli2806 wants to merge 8 commits into
added 7 commits
April 11, 2026 01:36
- Add dlssg_wrapper.hpp/cpp with DlssFG class wrapping NGX Frame Gen
- Extend NgxContext with queryFrameGenAvailable() and initFrameGen()
- Add FG attribute handling and initialization in DLSSModule
- Integrate FG evaluation in render framework (double-present for interp frames)
- Create interpolated frame images and blit pipeline
- dlssg_wrapper: evaluate() now accepts multiFrameCount and multiFrameIndex params
- dlss_wrapper: added queryMaxMultiFrameCount() capability query
- dlss_module: changed from bool to uint32_t frameGenMultiFrameCount_, 2D interp images
- dlss_module: parse off/x2/x3/x4 enum values, clamp to hardware max
- render_framework: multi-frame evaluate loop and multi-present in present()
…nous approach
- Remove interpPresentThreadFunc() async thread and all threading infrastructure (mutexes, condition variables, atomics, dedicated command pool)
- Implement pipelined synchronous approach: store interp frames from frame N, present them at the START of frame N+1's present() when GPU fence is already signaled
- Add PendingInterpPresent struct and presentPendingInterpolatedFrames() method
- Fix crash on world entry with Frame Generation enabled (ACCESS_VIOLATION)
- Fix watchdog timeout from render thread stuck in present()
- Near-zero wait for interp frame fences since a full render frame elapses
- Parse x5 (multiFrameCount=4) and x6 (multiFrameCount=5) attribute values
- Auto mode uses UINT32_MAX sentinel, resolved to hardware max in build()
- Logs auto mode selection with resolved count
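The attribute parsing and sentinel resolution described in these commits could look roughly like the sketch below. It is illustrative only: the function names, the `"auto"` string, and the treatment of unknown values as off are assumptions, not code from this PR. The x2..x4 mapping is inferred from the stated x5 = 4 and x6 = 5.

```cpp
#include <algorithm>
#include <cstdint>
#include <limits>
#include <string_view>

// Sentinel for auto mode; the commit says UINT32_MAX is used for this.
constexpr std::uint32_t kAutoMultiFrameCount =
    std::numeric_limits<std::uint32_t>::max();

// Hypothetical parser: maps an attribute string to a generated-frame count.
// "xN" means N presented frames per rendered frame, i.e. N - 1 generated.
inline std::uint32_t parseFrameGenMode(std::string_view value) {
    if (value == "auto") {  // assumed spelling of the auto value
        return kAutoMultiFrameCount;
    }
    if (value.size() == 2 && value[0] == 'x' &&
        value[1] >= '2' && value[1] <= '6') {
        return static_cast<std::uint32_t>(value[1] - '1');  // x2 -> 1 ... x6 -> 5
    }
    return 0;  // "off" and, by assumption, anything unrecognized
}

// Hypothetical resolver: turns the sentinel into the hardware maximum and
// clamps explicit requests to what the hardware reports.
inline std::uint32_t resolveMultiFrameCount(std::uint32_t requested,
                                            std::uint32_t hardwareMax) {
    if (requested == kAutoMultiFrameCount) {
        return hardwareMax;
    }
    return std::min(requested, hardwareMax);
}
```

For example, `resolveMultiFrameCount(parseFrameGenMode("x6"), 3)` yields 3 on hardware whose reported maximum is 3, and the same call with `"auto"` also yields 3.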
- Add glfwGetPrimaryMonitor/glfwGetVideoMode GLFW bindings
- Query monitor refresh rate at init for dynamic FG target
- Dynamic mode: per-frame multiFrameCount = ceil(targetHz / baseFps) - 1
- Allocate interp images to hardware max, use only what's needed
- Reset frameGenDynamic_ when switching to non-dynamic modes
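A minimal sketch of the dynamic-mode formula above, assuming the result is clamped to the hardware maximum (the interp images are allocated to that size) and that a non-positive frame rate disables generation for that frame. The function name and both guard behaviors are assumptions, not taken from the PR.

```cpp
#include <algorithm>
#include <cassert>
#include <cmath>
#include <cstdint>

// Hypothetical helper: generated frames needed per rendered frame so that
// the presented rate reaches the monitor refresh rate.
//   multiFrameCount = ceil(targetHz / baseFps) - 1
inline std::uint32_t dynamicMultiFrameCount(double targetHz, double baseFps,
                                            std::uint32_t hardwareMax) {
    assert(hardwareMax >= 1 && "frame generation must be available");
    if (!(targetHz > 0.0) || !(baseFps > 0.0)) {
        return 0;  // assumed: no valid measurement, generate nothing
    }
    const double wanted = std::ceil(targetHz / baseFps) - 1.0;
    if (wanted <= 0.0) {
        return 0;  // base rate already meets or exceeds the target
    }
    // Clamp in floating point first so a very large ratio cannot overflow.
    return static_cast<std::uint32_t>(
        std::min(wanted, static_cast<double>(hardwareMax)));
}
```

With a 144 Hz target and a 60 fps base rate this gives ceil(2.4) - 1 = 2 generated frames; with a 240 Hz target and a 30 fps base rate the raw value of 7 is clamped to the hardware maximum.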
- Replace pipelined presentation (deferred to next frame) with same-frame
- Present interpolated frames immediately after real frame in present()
- Remove PendingInterpPresent struct and related state
- Rename presentPendingInterpolatedFrames -> presentInterpolatedFrames
PEQHUB referenced this pull request in PEQHUB/MCVR Jun 9, 2026
Turns the V2 render graph from 'tracing into empty TLAS' into 'tracing real chunk geometry' by plumbing Minecraft's chunk mesh output through the V2 scene services. Also fixes risks #1 (use-after-free on chunk re-upload), #4 (chunk origin TODO), and #7 (garbage energy LUT) from the original 4-pass analysis.

The big shift: chunks.cpp now tees the mesh worker output straight into CmdChunkSubmit after BlockMesher::mesh() returns. This path catches ~100% of in-game chunk uploads (the rebuildSingle fallback tee is still fixed but only fires on the slow path).

ChunkId now 3D (x, sectionY, z):
* scene_types.hpp: added sectionY field and updated std::hash
* Was a latent design bug - 24 column sections collided on a single {x, z} key, so 23/24 of every column's data was silently dropped
* replay_recorder.{hpp,cpp}: bumped binary version 1 -> 2 and added sectionY to on-disk record; older replays invalid (intentional)

Deferred-delete via ResourceGC (risk #1 fix):
* blas_service.{hpp,cpp}: init() now takes ResourceGC*; retireBlas() wraps VkAccelerationStructureKHR + vk2::Buffer in shared_ptr and defers destruction via gc->defer(lambda). Chunks re-meshed rapidly in the same pose will no longer free in-flight BLAS data.
* tlas_service.{hpp,cpp}: same pattern for old TLAS buffers on rebuild, plus for the scratch buffer if it's ever replaced mid-flight.
* gpu_upload_service.{hpp,cpp}: same pattern for vertex/index buffers on chunk re-upload and removeChunk.
* ResourceGC has a 32-frame ring, framesInFlight=2, so deferred destructors stay alive for at least 31 frames - plenty of slack.

Chunk origin baked into TLAS instance transform (risk #4 fix):
* blas_service.hpp: BlasData now caches originX/Y/Z as float (copied from GpuChunkData on build).
* tlas_service.cpp: inst.transform.matrix now writes translation (originX, originY, originZ) in the column-3 slot instead of an identity matrix. The closest-hit shader gl_ObjectToWorldEXT automatically reflects this - no rchit changes needed.
* Dropped chunkOriginBuffer_ and chunkOriginArraySize_ - origin is in the transform, no parallel SSBO needed. Resolves the tlas_service.cpp:73 TODO from the original session summary.

Energy LUT initialized to 1.0 (risk #7 fix):
* scene_resource_service.{hpp,cpp}: added pendingEnergyLutStaging_ member. init() creates a host-visible staging buffer filled with half-float 1.0 (0x3C00) covering all 64x64x4 channels.
* New runDeferredInit(cmd) records vkCmdCopyBufferToImage on first call, with proper layout transitions UNDEFINED -> TRANSFER_DST -> SHADER_READ_ONLY. Staging buffer freed after copy.
* Called from engine_app.cpp processScene() first frame only.

Production chunk tee (chunks.cpp, +88 lines):
* After BlockMesher::mesh() returns the meshOutput, copy solid + cutout + translucent PBR triangles into a merged vertex array with index re-offsetting for V2.
* Post as CmdChunkSubmit via EngineServices::bridge().
* Only fires when useV2Engine is true and the bridge has been initialized (EngineApp::instance() != nullptr check).
* This is the hot path - the rebuildSingle fallback path tee in ChunkProxy.cpp is kept working (with sectionY) for completeness.

ChunkProxy.cpp fix:
* The existing rebuildSingle tee was using {cx, cz} as the key with no Y component - all 24 column sections silently collided. Now derives sectionY from cmd.originY >> 4 and uses a proper 3D key.

CmdChunkSubmit/CmdChunkRemove schema:
* bridge_service.hpp: added int32 sectionY field to both command structs, placed between chunkZ and existing origin fields for locality.
* engine_app.cpp: handleCommand uses the 3D key throughout.

engine_app.cpp:
* Scene service init() calls now pass &services_->frame().gc() so the deferred-delete wiring is functional.
* processScene() first-frame path calls sceneRes().runDeferredInit(cmd) after scene UBO update but before RT adapter reads.
* Chunk command handlers use ChunkId{x, sectionY, z} throughout.
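The 3D chunk key described above can be sketched as follows. The field types, the hash-combine constants, and the helper names are assumptions for illustration and are not the contents of scene_types.hpp. The example column spans world Y -64 to 319, which gives the 24 sections the commit mentions.

```cpp
#include <cstddef>
#include <cstdint>
#include <functional>
#include <unordered_set>

// Hypothetical 3D key; the real struct lives in scene_types.hpp.
struct ChunkId {
    std::int32_t x;
    std::int32_t sectionY;
    std::int32_t z;
    bool operator==(const ChunkId& o) const {
        return x == o.x && sectionY == o.sectionY && z == o.z;
    }
};

namespace std {
template <>
struct hash<ChunkId> {
    size_t operator()(const ChunkId& id) const noexcept {
        // Boost-style hash combine over all three components.
        size_t h = hash<std::int32_t>{}(id.x);
        h ^= hash<std::int32_t>{}(id.sectionY) + 0x9e3779b9u + (h << 6) + (h >> 2);
        h ^= hash<std::int32_t>{}(id.z) + 0x9e3779b9u + (h << 6) + (h >> 2);
        return h;
    }
};
}  // namespace std

// Section index from a block-space origin, as in "cmd.originY >> 4".
// Relies on arithmetic right shift for negative values (guaranteed since
// C++20, and what mainstream compilers did before that).
inline std::int32_t sectionYFromOriginY(std::int32_t originY) {
    return originY >> 4;
}

// Counts how many distinct keys one column produces when every 16-block
// section origin between minOriginY and maxOriginY is submitted.
inline std::size_t distinctSectionsInColumn(std::int32_t x, std::int32_t z,
                                            std::int32_t minOriginY,
                                            std::int32_t maxOriginY) {
    std::unordered_set<ChunkId> seen;
    for (std::int32_t y = minOriginY; y <= maxOriginY; y += 16) {
        seen.insert(ChunkId{x, sectionYFromOriginY(y), z});
    }
    return seen.size();
}
```

With section origins from -64 to 304, `distinctSectionsInColumn` returns 24, whereas a key made of only {x, z} would collapse the same column to a single entry.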
Smoke test results (in-game, 90 seconds of V2 gameplay):
* Pre-fix baseline (empty TLAS): 1.65 ms avg, 651 fps, sky-only
* After-fix with real chunks: 4.17 ms avg, 240 fps (vsync capped)
* Chunks flowed: 30323 -> 31007 (684 new chunk meshes through tee)
* Engine log: 99832 bytes, 0 errors, 0 warnings, 0 validation issues
* Deferred-delete: confirmed by service init log 'deferred-delete: on' on all three scene services.
* Process exit: clean in 1s, no deadlock

The +2.5 ms per frame is real BLAS build + TLAS rebuild + closest-hit shader work replacing the previous sky-gradient miss-only path. The V2 graph has real headroom - it's hitting 240 Hz vsync, not the GPU ceiling.

Closes risks 1, 4, 7 from the original 4-pass analysis. Unblocks PR37 (denoiser), PR39 (real materials), PR40 (DLSS-RR) which all need real RT output to validate against.
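The deferred-delete pattern from the risk #1 fix can be illustrated with a small frame ring. This is a hypothetical stand-in, not the actual ResourceGC: the class name, the `advanceFrame()` method, and the rule that a slot is flushed when the ring wraps back to it are all assumptions. Under those assumptions a destructor deferred in one frame survives 31 frame advances and runs on the 32nd.

```cpp
#include <array>
#include <cstddef>
#include <functional>
#include <utility>
#include <vector>

// Hypothetical ring of per-frame destructor lists.
template <std::size_t RingSize>
class DeferredDeleteRing {
public:
    // Queue a destructor in the slot of the frame being recorded.
    void defer(std::function<void()> destroy) {
        slots_[current_].push_back(std::move(destroy));
    }

    // Move to the next frame and flush whatever was queued the last time
    // this slot was current, i.e. RingSize frames ago.
    void advanceFrame() {
        current_ = (current_ + 1) % RingSize;
        // Take the list first so a destructor may safely call defer().
        auto pending = std::move(slots_[current_]);
        slots_[current_].clear();
        for (auto& destroy : pending) {
            destroy();
        }
    }

private:
    std::array<std::vector<std::function<void()>>, RingSize> slots_{};
    std::size_t current_ = 0;
};

// Demo helper: defers one destructor, advances the given number of frames,
// and reports how many destructors have run. Destructors still pending when
// the ring goes out of scope are simply dropped in this sketch.
inline int destroyedAfterFrames(int frames) {
    DeferredDeleteRing<32> gc;
    int destroyed = 0;
    gc.defer([&destroyed] { ++destroyed; });
    for (int i = 0; i < frames; ++i) {
        gc.advanceFrame();
    }
    return destroyed;
}
```

In the commit's wording, the retired Vulkan handles are wrapped in a shared_ptr captured by the lambda, so releasing the lambda is what finally frees them. With framesInFlight=2, a 32-slot ring leaves a wide margin before any GPU work could still reference the old buffers.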
Summary
Adds DLSS Frame Generation (DLSS-G) support with Multi-Frame Generation (MFG) and Dynamic mode.
Features
Java counterpart