GPU NMS kernel and refactor of NMS operator#4893
Conversation
There was a problem hiding this comment.
Remaining comments which cannot be posted as a review comment to avoid GitHub Rate Limit
format.py
[format.py] reported by reviewdog 🐶
[format.py] reported by reviewdog 🐶
[format.py] reported by reviewdog 🐶
[format.py] reported by reviewdog 🐶
[format.py] reported by reviewdog 🐶
[format.py] reported by reviewdog 🐶
[format.py] reported by reviewdog 🐶
[format.py] reported by reviewdog 🐶
[format.py] reported by reviewdog 🐶
[format.py] reported by reviewdog 🐶
AMDMIGraphX/src/targets/gpu/prepare_nonmaxsuppression.cpp
Lines 170 to 187 in 5ca611f
[format.py] reported by reviewdog 🐶
AMDMIGraphX/src/targets/gpu/prepare_nonmaxsuppression.cpp
Lines 191 to 192 in 5ca611f
[format.py] reported by reviewdog 🐶
AMDMIGraphX/test/verify/test_nms.cpp
Lines 54 to 57 in 5ca611f
[format.py] reported by reviewdog 🐶
AMDMIGraphX/test/verify/test_nms.cpp
Line 201 in 5ca611f
beacuse of operator::flatten same name clashing.
There was a problem hiding this comment.
Remaining comments which cannot be posted as a review comment to avoid GitHub Rate Limit
format.py
[format.py] reported by reviewdog 🐶
AMDMIGraphX/test/gpu/nonmaxsuppression.cpp
Line 220 in c2ddb73
[format.py] reported by reviewdog 🐶
AMDMIGraphX/test/gpu/nonmaxsuppression.cpp
Lines 299 to 302 in c2ddb73
[format.py] reported by reviewdog 🐶
AMDMIGraphX/test/gpu/nonmaxsuppression.cpp
Lines 346 to 348 in c2ddb73
[format.py] reported by reviewdog 🐶
AMDMIGraphX/test/gpu/nonmaxsuppression.cpp
Lines 378 to 384 in c2ddb73
[format.py] reported by reviewdog 🐶
AMDMIGraphX/test/gpu/nonmaxsuppression.cpp
Lines 388 to 397 in c2ddb73
[format.py] reported by reviewdog 🐶
AMDMIGraphX/test/gpu/nonmaxsuppression.cpp
Line 405 in c2ddb73
[format.py] reported by reviewdog 🐶
AMDMIGraphX/test/gpu/nonmaxsuppression.cpp
Lines 433 to 449 in c2ddb73
[format.py] reported by reviewdog 🐶
AMDMIGraphX/test/gpu/nonmaxsuppression.cpp
Lines 451 to 455 in c2ddb73
[format.py] reported by reviewdog 🐶
AMDMIGraphX/test/gpu/nonmaxsuppression.cpp
Line 463 in c2ddb73
[format.py] reported by reviewdog 🐶
AMDMIGraphX/test/gpu/nonmaxsuppression.cpp
Lines 481 to 487 in c2ddb73
[format.py] reported by reviewdog 🐶
AMDMIGraphX/test/gpu/nonmaxsuppression.cpp
Lines 491 to 557 in c2ddb73
[format.py] reported by reviewdog 🐶
AMDMIGraphX/test/gpu/nonmaxsuppression.cpp
Lines 559 to 575 in c2ddb73
[format.py] reported by reviewdog 🐶
AMDMIGraphX/test/gpu/nonmaxsuppression.cpp
Line 583 in c2ddb73
[format.py] reported by reviewdog 🐶
AMDMIGraphX/test/gpu/nonmaxsuppression.cpp
Lines 601 to 607 in c2ddb73
[format.py] reported by reviewdog 🐶
AMDMIGraphX/test/gpu/nonmaxsuppression.cpp
Lines 611 to 620 in c2ddb73
[format.py] reported by reviewdog 🐶
AMDMIGraphX/test/gpu/nonmaxsuppression.cpp
Lines 622 to 629 in c2ddb73
[format.py] reported by reviewdog 🐶
AMDMIGraphX/test/gpu/nonmaxsuppression.cpp
Line 637 in c2ddb73
[format.py] reported by reviewdog 🐶
AMDMIGraphX/test/gpu/nonmaxsuppression.cpp
Lines 655 to 661 in c2ddb73
[format.py] reported by reviewdog 🐶
AMDMIGraphX/test/gpu/nonmaxsuppression.cpp
Lines 665 to 798 in c2ddb73
[format.py] reported by reviewdog 🐶
AMDMIGraphX/test/gpu/nonmaxsuppression.cpp
Lines 800 to 866 in c2ddb73
[format.py] reported by reviewdog 🐶
AMDMIGraphX/test/gpu/nonmaxsuppression.cpp
Line 874 in c2ddb73
[format.py] reported by reviewdog 🐶
AMDMIGraphX/test/gpu/nonmaxsuppression.cpp
Line 879 in c2ddb73
There was a problem hiding this comment.
Remaining comments which cannot be posted as a review comment to avoid GitHub Rate Limit
format.py
[format.py] reported by reviewdog 🐶
[format.py] reported by reviewdog 🐶
[format.py] reported by reviewdog 🐶
[format.py] reported by reviewdog 🐶
[format.py] reported by reviewdog 🐶
[format.py] reported by reviewdog 🐶
[format.py] reported by reviewdog 🐶
[format.py] reported by reviewdog 🐶
[format.py] reported by reviewdog 🐶
[format.py] reported by reviewdog 🐶
[format.py] reported by reviewdog 🐶
[format.py] reported by reviewdog 🐶
[format.py] reported by reviewdog 🐶
[format.py] reported by reviewdog 🐶
[format.py] reported by reviewdog 🐶
[format.py] reported by reviewdog 🐶
[format.py] reported by reviewdog 🐶
AMDMIGraphX/src/targets/gpu/prepare_nonmaxsuppression.cpp
Lines 168 to 182 in b5c1e77
[format.py] reported by reviewdog 🐶
AMDMIGraphX/test/gpu/nonmaxsuppression.cpp
Line 220 in b5c1e77
[format.py] reported by reviewdog 🐶
AMDMIGraphX/test/gpu/nonmaxsuppression.cpp
Lines 299 to 302 in b5c1e77
[format.py] reported by reviewdog 🐶
AMDMIGraphX/test/gpu/nonmaxsuppression.cpp
Lines 346 to 348 in b5c1e77
[format.py] reported by reviewdog 🐶
AMDMIGraphX/test/gpu/nonmaxsuppression.cpp
Lines 379 to 385 in b5c1e77
[format.py] reported by reviewdog 🐶
AMDMIGraphX/test/gpu/nonmaxsuppression.cpp
Lines 389 to 398 in b5c1e77
[format.py] reported by reviewdog 🐶
AMDMIGraphX/test/gpu/nonmaxsuppression.cpp
Line 406 in b5c1e77
[format.py] reported by reviewdog 🐶
AMDMIGraphX/test/gpu/nonmaxsuppression.cpp
Lines 435 to 451 in b5c1e77
[format.py] reported by reviewdog 🐶
AMDMIGraphX/test/gpu/nonmaxsuppression.cpp
Lines 453 to 457 in b5c1e77
[format.py] reported by reviewdog 🐶
AMDMIGraphX/test/gpu/nonmaxsuppression.cpp
Line 465 in b5c1e77
[format.py] reported by reviewdog 🐶
AMDMIGraphX/test/gpu/nonmaxsuppression.cpp
Lines 484 to 490 in b5c1e77
[format.py] reported by reviewdog 🐶
AMDMIGraphX/test/gpu/nonmaxsuppression.cpp
Lines 494 to 560 in b5c1e77
[format.py] reported by reviewdog 🐶
AMDMIGraphX/test/gpu/nonmaxsuppression.cpp
Lines 562 to 578 in b5c1e77
[format.py] reported by reviewdog 🐶
AMDMIGraphX/test/gpu/nonmaxsuppression.cpp
Line 586 in b5c1e77
[format.py] reported by reviewdog 🐶
AMDMIGraphX/test/gpu/nonmaxsuppression.cpp
Lines 605 to 611 in b5c1e77
[format.py] reported by reviewdog 🐶
AMDMIGraphX/test/gpu/nonmaxsuppression.cpp
Lines 615 to 624 in b5c1e77
[format.py] reported by reviewdog 🐶
AMDMIGraphX/test/gpu/nonmaxsuppression.cpp
Lines 626 to 633 in b5c1e77
[format.py] reported by reviewdog 🐶
AMDMIGraphX/test/gpu/nonmaxsuppression.cpp
Line 641 in b5c1e77
[format.py] reported by reviewdog 🐶
AMDMIGraphX/test/gpu/nonmaxsuppression.cpp
Lines 660 to 666 in b5c1e77
[format.py] reported by reviewdog 🐶
AMDMIGraphX/test/gpu/nonmaxsuppression.cpp
Lines 670 to 803 in b5c1e77
[format.py] reported by reviewdog 🐶
AMDMIGraphX/test/gpu/nonmaxsuppression.cpp
Lines 805 to 871 in b5c1e77
[format.py] reported by reviewdog 🐶
AMDMIGraphX/test/gpu/nonmaxsuppression.cpp
Line 879 in b5c1e77
There was a problem hiding this comment.
Remaining comments which cannot be posted as a review comment to avoid GitHub Rate Limit
format.py
[format.py] reported by reviewdog 🐶
[format.py] reported by reviewdog 🐶
[format.py] reported by reviewdog 🐶
[format.py] reported by reviewdog 🐶
[format.py] reported by reviewdog 🐶
AMDMIGraphX/src/targets/gpu/prepare_nonmaxsuppression.cpp
Lines 168 to 182 in 1011256
[format.py] reported by reviewdog 🐶
AMDMIGraphX/test/gpu/nonmaxsuppression.cpp
Line 220 in 1011256
[format.py] reported by reviewdog 🐶
AMDMIGraphX/test/gpu/nonmaxsuppression.cpp
Lines 299 to 302 in 1011256
[format.py] reported by reviewdog 🐶
AMDMIGraphX/test/gpu/nonmaxsuppression.cpp
Lines 346 to 348 in 1011256
[format.py] reported by reviewdog 🐶
AMDMIGraphX/test/gpu/nonmaxsuppression.cpp
Lines 379 to 385 in 1011256
[format.py] reported by reviewdog 🐶
AMDMIGraphX/test/gpu/nonmaxsuppression.cpp
Lines 389 to 398 in 1011256
[format.py] reported by reviewdog 🐶
AMDMIGraphX/test/gpu/nonmaxsuppression.cpp
Line 406 in 1011256
[format.py] reported by reviewdog 🐶
AMDMIGraphX/test/gpu/nonmaxsuppression.cpp
Lines 435 to 451 in 1011256
[format.py] reported by reviewdog 🐶
AMDMIGraphX/test/gpu/nonmaxsuppression.cpp
Lines 453 to 457 in 1011256
[format.py] reported by reviewdog 🐶
AMDMIGraphX/test/gpu/nonmaxsuppression.cpp
Line 465 in 1011256
[format.py] reported by reviewdog 🐶
AMDMIGraphX/test/gpu/nonmaxsuppression.cpp
Lines 484 to 490 in 1011256
[format.py] reported by reviewdog 🐶
AMDMIGraphX/test/gpu/nonmaxsuppression.cpp
Lines 494 to 560 in 1011256
[format.py] reported by reviewdog 🐶
AMDMIGraphX/test/gpu/nonmaxsuppression.cpp
Lines 562 to 578 in 1011256
[format.py] reported by reviewdog 🐶
AMDMIGraphX/test/gpu/nonmaxsuppression.cpp
Line 586 in 1011256
[format.py] reported by reviewdog 🐶
AMDMIGraphX/test/gpu/nonmaxsuppression.cpp
Lines 605 to 611 in 1011256
[format.py] reported by reviewdog 🐶
AMDMIGraphX/test/gpu/nonmaxsuppression.cpp
Lines 615 to 624 in 1011256
[format.py] reported by reviewdog 🐶
AMDMIGraphX/test/gpu/nonmaxsuppression.cpp
Lines 626 to 633 in 1011256
[format.py] reported by reviewdog 🐶
AMDMIGraphX/test/gpu/nonmaxsuppression.cpp
Line 641 in 1011256
[format.py] reported by reviewdog 🐶
AMDMIGraphX/test/gpu/nonmaxsuppression.cpp
Lines 660 to 666 in 1011256
[format.py] reported by reviewdog 🐶
AMDMIGraphX/test/gpu/nonmaxsuppression.cpp
Lines 670 to 803 in 1011256
[format.py] reported by reviewdog 🐶
AMDMIGraphX/test/gpu/nonmaxsuppression.cpp
Lines 805 to 871 in 1011256
[format.py] reported by reviewdog 🐶
AMDMIGraphX/test/gpu/nonmaxsuppression.cpp
Line 879 in 1011256
There was a problem hiding this comment.
Remaining comments which cannot be posted as a review comment to avoid GitHub Rate Limit
format.py
[format.py] reported by reviewdog 🐶
AMDMIGraphX/test/gpu/nonmaxsuppression.cpp
Lines 453 to 457 in 32c779d
[format.py] reported by reviewdog 🐶
AMDMIGraphX/test/gpu/nonmaxsuppression.cpp
Line 465 in 32c779d
[format.py] reported by reviewdog 🐶
AMDMIGraphX/test/gpu/nonmaxsuppression.cpp
Lines 484 to 490 in 32c779d
[format.py] reported by reviewdog 🐶
AMDMIGraphX/test/gpu/nonmaxsuppression.cpp
Lines 494 to 560 in 32c779d
[format.py] reported by reviewdog 🐶
AMDMIGraphX/test/gpu/nonmaxsuppression.cpp
Lines 562 to 578 in 32c779d
[format.py] reported by reviewdog 🐶
AMDMIGraphX/test/gpu/nonmaxsuppression.cpp
Line 586 in 32c779d
[format.py] reported by reviewdog 🐶
AMDMIGraphX/test/gpu/nonmaxsuppression.cpp
Lines 605 to 611 in 32c779d
[format.py] reported by reviewdog 🐶
AMDMIGraphX/test/gpu/nonmaxsuppression.cpp
Lines 615 to 624 in 32c779d
[format.py] reported by reviewdog 🐶
AMDMIGraphX/test/gpu/nonmaxsuppression.cpp
Lines 626 to 633 in 32c779d
[format.py] reported by reviewdog 🐶
AMDMIGraphX/test/gpu/nonmaxsuppression.cpp
Line 641 in 32c779d
[format.py] reported by reviewdog 🐶
AMDMIGraphX/test/gpu/nonmaxsuppression.cpp
Lines 660 to 666 in 32c779d
[format.py] reported by reviewdog 🐶
AMDMIGraphX/test/gpu/nonmaxsuppression.cpp
Lines 670 to 803 in 32c779d
[format.py] reported by reviewdog 🐶
AMDMIGraphX/test/gpu/nonmaxsuppression.cpp
Lines 805 to 871 in 32c779d
[format.py] reported by reviewdog 🐶
AMDMIGraphX/test/gpu/nonmaxsuppression.cpp
Line 879 in 32c779d
Codecov Report❌ Patch coverage is
Additional details and impacted files@@ Coverage Diff @@
## develop #4893 +/- ##
===========================================
+ Coverage 92.88% 92.89% +0.01%
===========================================
Files 587 587
Lines 30348 30350 +2
===========================================
+ Hits 28187 28193 +6
+ Misses 2161 2157 -4
🚀 New features to boost your workflow:
|
There was a problem hiding this comment.
Pull request overview
Adds a GPU implementation of ONNX NonMaxSuppression (NMS) and refactors the core nonmaxsuppression op to better represent “dynamic” output sizing in MIGraphX IR by returning a tuple {indices, num_selected} (with optional slicing applied by the ONNX parser).
Changes:
- Introduces a 3-stage GPU NMS pipeline (
gpu::nms_sort→gpu::nms_filter→gpu::nms_compact) with new kernels and JIT compilers, and updates GPU lowering to select the pipeline for static shapes (ref fallback for dynamic inputs). - Refactors
op::nonmaxsuppressionto always return a tuple (raw indices + selected count) and updates ONNX parsing/tests to extract (and optionally slice) the indices output. - Renames shape tuple flattening helper to
flatten_shapesand updates GPU JIT paths to use it; adds/updates documentation and tests (including new GPU NMS tests).
Reviewed changes
Copilot reviewed 33 out of 35 changed files in this pull request and generated 6 comments.
Show a summary per file
| File | Description |
|---|---|
| test/verify/test_nms.cpp | Removes legacy verify-based NMS tests. |
| test/ref/nonmaxsuppression.cpp | Updates ref tests to slice indices using the tuple’s num_selected. |
| test/op_shape_test.cpp | Updates NMS shape expectations to a tuple {indices, num_selected}. |
| test/onnx/parse/nms_use_dyn_output_false_test.cpp | Removes test tied to removed use_dyn_output option/behavior. |
| test/onnx/parse/nms_test.cpp | Updates ONNX parse test to extract tuple elem 0 (indices). |
| test/onnx/parse/nms_dynamic_classes_test.cpp | Updates dynamic-classes parse test for tuple-return NMS. |
| test/onnx/parse/nms_dynamic_boxes_test.cpp | Updates dynamic-boxes parse test for tuple-return NMS. |
| test/onnx/parse/nms_dynamic_batch_test.cpp | Updates dynamic-batch parse test for tuple-return NMS. |
| test/onnx/gen_onnx.py | Removes custom use_dyn_output attribute generation in NMS ONNX test models. |
| test/multi_target/multitarget_test.cpp | Updates multitarget test to slice indices using tuple count. |
| test/gpu/nonmaxsuppression.cpp | Adds extensive GPU NMS functional tests (plus a dynamic-shape fallback test). |
| src/targets/gpu/nms_ops.cpp | Adds new internal GPU ops (gpu::nms_sort/filter/compact) and their shapes. |
| src/targets/gpu/lowering.cpp | Lowers nonmaxsuppression to the GPU pipeline for static shapes; ref fallback for dynamic. |
| src/targets/gpu/kernels/include/migraphx/kernels/sort.hpp | Adds block-wide bitonic sort helpers used by NMS kernels. |
| src/targets/gpu/kernels/include/migraphx/kernels/nonmaxsuppression.hpp | Adds GPU kernel implementations for sort/filter/compact stages. |
| src/targets/gpu/jit/topk.cpp | Switches to flatten_shapes when building HIP compile inputs. |
| src/targets/gpu/jit/reduce.cpp | Switches to flatten_shapes when building HIP compile inputs. |
| src/targets/gpu/jit/pointwise.cpp | Switches to flatten_shapes when building HIP compile inputs. |
| src/targets/gpu/jit/nonmaxsuppression.cpp | Adds JIT compilers for gpu::nms_sort/filter/compact kernels. |
| src/targets/gpu/include/migraphx/gpu/compile_hip_code_object.hpp | Clarifies meaning of global/local fields in compile options. |
| src/targets/gpu/device/include/migraphx/gpu/device/scan.hpp | Adds/clarifies block scan documentation/overload. |
| src/targets/gpu/compile_hip_code_object.cpp | Documents compute_block_size intent. |
| src/targets/gpu/code_object_op.cpp | Uses flatten_shapes for shape compatibility checks with tuple inputs. |
| src/targets/gpu/CMakeLists.txt | Adds nms_ops.cpp to GPU target build. |
| src/shape.cpp | Renames shape flattening helper to flatten_shapes. |
| src/onnx/parse_nonmaxsuppression.cpp | Parses NMS as tuple-return op and optionally slices indices under MIGRAPHX_USE_DYNAMIC_NMS. |
| src/onnx/onnx.cpp | Removes propagation of removed use_dyn_output option into parser. |
| src/onnx/include/migraphx/onnx/onnx_parser.hpp | Removes onnx_parser::use_dyn_output member. |
| src/include/migraphx/shape.hpp | Renames public API flatten → flatten_shapes. |
| src/include/migraphx/op/nonmaxsuppression.hpp | Refactors NMS op to return tuple {indices, num_selected} and adds runtime shape validation. |
| src/include/migraphx/onnx.hpp | Removes public onnx_options::use_dyn_output. |
| docs/reference/MIGraphX-dev-env-vars.rst | Documents MIGRAPHX_USE_DYNAMIC_NMS env var behavior. |
| docs/dev/onnx_operators.rst | Updates operator matrix entry for NonMaxSuppression. |
| const auto iou_packed = num_boxes * (num_boxes - 1) / 2; | ||
|
|
| // num_batches/num_classes/num_boxes are kept as op attributes because the filter inputs | ||
| // is a scratch buffer from which these can't be recovered. | ||
| // inputs = {sorted_boxes, sorted_scores, sorted_box_indices, output_indices, output_bc_counts} |
| MIGRAPHX_REGISTER_OP(nms_filter); | ||
|
|
||
| // TODO: This should work in-place, saving memory. Need to update IR to handle it. | ||
| // Needs a make_tuple type of operator that reuses the indicies input. |
| | NonMaxSuppression | ✅ | FP8, FP16, | | | ||
| | | | FP32, FP64 | | | ||
| | | | | | | ||
| | | | | | |
| /// Flatten subshapes to a single vector of non-tuple type of shapes | ||
| MIGRAPHX_EXPORT std::vector<shape> flatten(const std::vector<shape>& shapes); | ||
| MIGRAPHX_EXPORT std::vector<shape> flatten_shapes(const std::vector<shape>& shapes); | ||
|
|
| @@ -56,8 +56,6 @@ struct onnx_options | |||
| /// Since loop will become a tensor of max iter size a huge number can cause overflow during | |||
| /// shape computations. | |||
| int64_t limit_max_iterations = std::numeric_limits<uint16_t>::max(); | |||
| const auto num_batches = boxes_s.lens()[0]; | ||
| const auto num_boxes = boxes_s.lens()[1]; | ||
| const auto num_classes = scores_s.lens()[1]; | ||
| const auto aligned_b = bit_ceil(static_cast<std::uint64_t>(num_boxes)); |
There was a problem hiding this comment.
[format.py] reported by reviewdog 🐶
| const auto aligned_b = bit_ceil(static_cast<std::uint64_t>(num_boxes)); | |
| const auto aligned_b = bit_ceil(static_cast<std::uint64_t>(num_boxes)); |
Motivation
Technical Details
Changelog Category
Add a
CHANGELOG.mdentry for any option other thanNot Applicable