feat(rdma): add dmabuf-first MR registration with MORI_DISABLE_DMABUF_REG #355

Open
jhchouuu wants to merge 3 commits into main from feat/rdma-dmabuf-reg

@jhchouuu (Collaborator) commented Jun 3, 2026

Summary

Introduce RdmaDeviceContext::RegisterRdmaMemoryRegionAuto, a dmabuf-first MR registration path that falls back to classic ibv_reg_mr when the HIP driver can't export a dmabuf fd or ibv_reg_dmabuf_mr fails. The non-VMM symmetric memory allocator switches to this wrapper.

Why

  • Classic ibv_reg_mr on large GPU heaps occasionally hits EINVAL on hosts where the peermem kernel module is missing or misbehaving, or where RLIMIT_MEMLOCK is too low. The dmabuf path bypasses both: the kernel pins the dmabuf object, not user pages.
  • This is the same direction NVSHMEM took (transport_ib_common.cpp tries ibv_reg_dmabuf_mr first, falls back silently). rocSHMEM still uses classic ibv_reg_mr only, so adopting dmabuf moves mori ahead on the AMD side.

Behavior

| Path | Before | After |
| --- | --- | --- |
| Non-VMM symmetric memory MR | ibv_reg_mr (pinned) | dmabuf-first, fallback to ibv_reg_mr |
| VMM RegisterRdmaChunks | dmabuf (existing) | unchanged |
| MORI_DISABLE_DMABUF_REG=1/true/on/yes | n/a | forces classic ibv_reg_mr |
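
The truthy spellings in the last row could be recognized with a small helper along these lines. This is only a sketch: `IsTruthyEnvValue` is a hypothetical name, and case-insensitive matching is an assumption, since the PR description does not say how the value is compared.

```cpp
#include <algorithm>
#include <cctype>
#include <string>

// Hypothetical helper (not taken from the PR): returns true when the raw
// environment value is one of the spellings listed in the table
// (1 / true / on / yes). Assumption: comparison is case-insensitive.
inline bool IsTruthyEnvValue(const char* raw) {
  if (raw == nullptr) return false;  // variable not set: dmabuf stays enabled
  std::string v(raw);
  std::transform(v.begin(), v.end(), v.begin(), [](unsigned char c) {
    return static_cast<char>(std::tolower(c));
  });
  return v == "1" || v == "true" || v == "on" || v == "yes";
}
```

A caller would pass the result of `std::getenv("MORI_DISABLE_DMABUF_REG")`; any other value, including an empty string, leaves the dmabuf-first path active in this sketch.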

Design notes

  • The env var is snapshotted into an RdmaDeviceContext member at construction (mirrors the Context::sdmaEnabled/p2pDisabled cache). All MRs on the same context pick the same path, avoiding hard-to-debug mixed-mode states from late env mutations.
  • On ibv_reg_dmabuf_mr failure we log a MORI_APP_WARN for that call and fall back to ibv_reg_mr without surfacing an error to the caller, the same policy as NVSHMEM.
  • The dmabuf fd is closed immediately after registration (kernel holds its own ref via the MR).
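
The control flow described in these notes can be sketched roughly as follows. This is not the PR's code: `SketchDeviceContext` and `RegistrationHooks` are hypothetical names, and the hooks are injectable stand-ins so the example compiles without HIP or libibverbs. In the real implementation their roles would be played by hipMemGetHandleForAddressRange, ibv_reg_dmabuf_mr, ibv_reg_mr and close().

```cpp
#include <cstddef>
#include <functional>
#include <utility>

// Hypothetical stand-ins for the driver and verbs calls (assumption: the
// real signatures differ; only the success/failure shape is modeled here).
struct RegistrationHooks {
  std::function<int(void*, std::size_t)> exportDmabufFd;    // fd, or -1 on failure
  std::function<void*(int, std::size_t)> regDmabufMr;       // MR, or nullptr
  std::function<void*(void*, std::size_t)> regClassicMr;    // MR, or nullptr
  std::function<void(int)> closeFd;
};

class SketchDeviceContext {
 public:
  // The disable flag is captured once at construction, so every MR
  // registered through this context takes the same path.
  SketchDeviceContext(bool dmabufDisabled, RegistrationHooks hooks)
      : dmabufDisabled_(dmabufDisabled), hooks_(std::move(hooks)) {}

  void* RegisterAuto(void* addr, std::size_t len) const {
    if (!dmabufDisabled_) {
      const int fd = hooks_.exportDmabufFd(addr, len);
      if (fd >= 0) {
        void* mr = hooks_.regDmabufMr(fd, len);
        // Close the fd right after registration, whether or not it
        // succeeded; a successful MR keeps its own kernel reference.
        hooks_.closeFd(fd);
        if (mr != nullptr) return mr;
        // A warning would be logged here before falling back.
      }
    }
    return hooks_.regClassicMr(addr, len);  // classic fallback path
  }

 private:
  const bool dmabufDisabled_;
  RegistrationHooks hooks_;
};
```

Because the flag is a const member, changing the environment after the context exists has no effect on later registrations, which is the mixed-mode hazard the first note is guarding against.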

Test plan

  • Existing CI green (build + smoke)
  • Multi-node RDMA dispatch/combine still passes with default (dmabuf) path
  • MORI_DISABLE_DMABUF_REG=1 forces classic path and still passes
  • On a host without dmabuf support, init falls back to ibv_reg_mr without aborting

jhchouuu added 3 commits June 3, 2026 21:15
…_REG

- Add RdmaDeviceContext::RegisterRdmaMemoryRegionAuto which tries
  ibv_reg_dmabuf_mr first (export dmabuf fd via
  hipMemGetHandleForAddressRange) and falls back to classic ibv_reg_mr
  on any failure. Avoids RLIMIT_MEMLOCK on large GPU heaps and the
  EINVAL we hit on hosts where the peermem path is unhappy.
- Snapshot MORI_DISABLE_DMABUF_REG into a RdmaDeviceContext member at
  construction (matching the Context::sdmaEnabled / p2pDisabled cache
  pattern) so all MRs on the same context pick the same path.
- Route the symmetric_memory non-VMM RDMA registration through the new
  Auto wrapper. The VMM path already has its own dmabuf fd and keeps
  calling RegisterRdmaMemoryRegionDmabuf directly.