[Cadence: Vision] ResNet18 & ResNet50: Optimized, DMA-enabled, functional #19111
cad-rlc wants to merge 48 commits into pytorch:main from
Conversation
Differential Revision: [D60101911](https://our.internmc.facebook.com/intern/diff/D60101911)
This requires moving from create_runtime API v2 to v4. The change should be backwards compatible (i.e., old PTE files should still load), and it should also work with slightly older versions of the XNNPACK third-party library, since v4 was introduced two years ago. This patch adds a new workspace pointer member to the XnnpackBackend instance. The same should eventually be done for the weight cache, which is left as a TODO here for now.
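For context, a minimal sketch of the v4 call with a workspace. It assumes the public XNNPACK C API (`xnn_create_workspace` / `xnn_create_runtime_v4`); the wrapper function and the idea of storing the workspace on the backend instance follow the description above, but the exact names in the patch may differ.

```c
#include <stddef.h>
#include <xnnpack.h>

/* Sketch only: create one shared workspace per backend instance and pass
 * it to xnn_create_runtime_v4. The weights-cache argument stays NULL,
 * matching the TODO above; `subgraph` is an already-built subgraph. */
enum xnn_status create_runtime_with_workspace(
    xnn_subgraph_t subgraph,
    xnn_workspace_t* workspace_out, /* to be stored on XnnpackBackend */
    xnn_runtime_t* runtime_out) {
  enum xnn_status st = xnn_create_workspace(workspace_out);
  if (st != xnn_status_success) {
    return st;
  }
  return xnn_create_runtime_v4(
      subgraph,
      /*weights_cache=*/NULL, /* TODO: share the weight cache as well */
      *workspace_out,
      /*threadpool=*/NULL,
      /*flags=*/0,
      runtime_out);
}
```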
Resolving functional errors
Minor code modification in the ping-pong process
Correcting the MIN_FLT32 value and adding MIN_ABS_FLT32.
…into stable-branch
# Conflicts:
# Makefile
# backends/cadence/aot/ref_implementations.py
# backends/cadence/generic/operators/CMakeLists.txt
# backends/cadence/generic/operators/op_dequantize_per_tensor.cpp
# backends/cadence/generic/operators/op_im2row.cpp
# backends/cadence/generic/operators/op_quantize_per_tensor.cpp
# backends/cadence/generic/operators/op_quantized_layer_norm.cpp
# backends/cadence/generic/operators/op_requantize.cpp
# backends/cadence/generic/operators/quantized_add_out.cpp
# backends/cadence/generic/operators/quantized_conv2d_nchw_out.cpp
# backends/cadence/generic/operators/quantized_conv2d_nhwc_out.cpp
# backends/cadence/generic/operators/quantized_fully_connected_out.cpp
# backends/cadence/generic/operators/quantized_linear_out.cpp
# backends/cadence/generic/operators/quantized_matmul_out.cpp
# backends/cadence/generic/operators/quantized_relu_out.cpp
# backends/cadence/runtime/TARGETS
# backends/cadence/utils/runtime/BUCK
# backends/cadence/utils/runtime/TARGETS
# backends/cadence/vision/kernels/kernels.cpp
# backends/cadence/vision/kernels/targets.bzl
# backends/cadence/vision/operators/operators.h
# backends/cadence/vision/operators/targets.bzl
# backends/cadence/vision/third-party/targets.bzl
# install_requirements.py
…rlap fix
Summary:
All 20 conv layers + 1 maxpool layer now run via DMA-tiled kernels.
ResNet18 int8 quantized 64x64: 47.2M cycles, 57.6x speedup over generic.
Config generator:
- generate_combined_configs.py: extracts conv2d + maxpool from PTE files
into a single combined header layer_configs.h
- generate_layer_configs.py: kernel names use _dma/_no_dma suffixes
- resnet18_layers.json: extracted layer params for ResNet18
Operators:
- layer_configs.h: combined header with 29 conv + 1 maxpool configs (an illustrative entry format is sketched after this list)
- conv_kernel_dispatcher.c: _dma/_no_dma kernel name suffixes,
CONV_DISPATCH printf for all branches
- All includes migrated from separate conv/maxpool headers to
combined operators/layer_configs.h
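An illustrative entry format for such a combined header is sketched below; the field names and example values are assumptions for illustration, not the PR's actual layout.

```c
/* Hypothetical shape of a combined layer_configs.h entry. Each record
 * carries the shape parameters the dispatcher needs to choose between
 * the _dma and _no_dma kernel variants. */
typedef struct {
  const char* kernel; /* e.g. "conv2d_asym8s_dma" or "maxpool_mxnj2_dma" */
  int in_c, in_h, in_w; /* input dims, NCHW with N = 1 */
  int out_c;            /* output channels (== in_c for maxpool) */
  int k_h, k_w;         /* kernel size */
  int stride_h, stride_w;
  int pad_h, pad_w;
} layer_config_t;

static const layer_config_t kLayerConfigs[] = {
    /* kernel                Cin  H   W  Cout kh kw sh sw ph pw */
    {"conv2d_asym8s_dma",     3, 64, 64,  64,  7, 7, 2, 2, 3, 3},
    {"maxpool_mxnj2_dma",    64, 32, 32,  64,  3, 3, 2, 2, 1, 1},
    /* ... remaining conv configs ... */
};
```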
Maxpool DMA executor:
- maxpool_exec_2x2j2.c: DMA-tiled executor with ping-pong buffers
- Supports arbitrary kernel sizes with overlap handling:
per-tile source row computed from output_rows * stride_h - pad_h,
MIN_FLT32 fill provides top/left/bottom/right padding (see the sketch after this list)
- op_max_pool2d_with_indices.cpp: DMA path via config lookup
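A minimal sketch of that overlap and padding logic follows. The helper names (`fill_row_f32`, `dma_load_row`) and the `MIN_FLT32` definition are illustrative assumptions; only the row arithmetic comes from the description above.

```c
#include <float.h>

#define MIN_FLT32 (-FLT_MAX) /* assumed definition: smallest finite float */

/* Illustrative helpers standing in for the executor's real routines. */
void fill_row_f32(float* dst, int count, float value);
void dma_load_row(float* dst, const float* input, int src_row);

/* Load the source rows one tile needs, including the kernel overlap.
 * out_row0 is the tile's first output row; rows falling outside
 * [0, input_h) are filled with MIN_FLT32 so padded positions can never
 * win the max reduction. */
void load_tile_rows(float* tile_buf, const float* input, int input_h,
                    int row_floats, int out_row0, int tile_out_rows,
                    int kernel_h, int stride_h, int pad_h) {
  int src_row0 = out_row0 * stride_h - pad_h;               /* may be < 0 */
  int src_rows = (tile_out_rows - 1) * stride_h + kernel_h; /* with overlap */

  for (int r = 0; r < src_rows; ++r) {
    int src_r = src_row0 + r;
    if (src_r < 0 || src_r >= input_h) {
      fill_row_f32(&tile_buf[r * row_floats], row_floats, MIN_FLT32);
    } else {
      dma_load_row(&tile_buf[r * row_floats], input, src_r);
    }
  }
}
```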
Logs:
- resnet18_all_dma.log: inference log, 57.6x speedup, Top-1 class 111
- resnet18_all_dma_vs_generic.txt: per-op performance comparison
- Rename maxpool executor: maxpool_exec_2x2j2 -> maxpool_exec_mxnj2 (arbitrary kernel size, stride-2)
- Add mean_exec_dma.c and mean_executors.h for SIMD-optimized mean operator
- Remove CADENCE_CONV2D_GENERIC macro and all debug printf from vision/operators
- Add DMA buffer config headers for multiple DRAM sizes (4k/8k/16k/24k/32k/61k); a sketch of such a header follows below
- Reorganize logs: remove old scattered logs, add structured per-model DRAM-sweep logs
- Add layerwise performance reports for ResNet18 (cache + no-cache cores)
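As a rough illustration, one of those per-DRAM-size headers might look like the following; the macro names are assumptions, and only the size sweep itself comes from the change above.

```c
/* Hypothetical 16k variant of a DMA buffer config header. */
#ifndef DMA_BUF_CONFIG_H
#define DMA_BUF_CONFIG_H

#define DMA_DRAM_BYTES     (16 * 1024) /* local data-RAM budget */
#define DMA_PING_PONG_BUFS 2           /* double buffering */
#define DMA_TILE_BYTES     (DMA_DRAM_BYTES / DMA_PING_PONG_BUFS)

#endif /* DMA_BUF_CONFIG_H */
```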
# Conflicts:
# backends/cadence/vision/operators/op_quantized_conv_out.cpp
…ntized_conv_out.cpp
- Merge quantized_conv2d_nchw_out_per_tensor.cpp into op_quantized_conv_out.cpp
- Add DMA-optimized NCHW conv path with XAI kernel dispatch
- Add 6 specialized typed variants (asym8s, asym8u, dilated, depthwise); a dispatch sketch follows below
- Add 4 conv1d variants (ncl, nlc) in generic::native namespace
- Remove old quantized_conv2d_nchw_out_per_tensor.cpp
- Update CMakeLists.txt to remove old file reference
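The variant selection could look roughly like the sketch below; the function names are hypothetical stand-ins for the six typed variants, and the selection criteria (signedness, dilation, depthwise grouping) are inferred from the variant names above.

```c
/* Hypothetical kernel entry points for the six typed conv2d variants. */
void conv2d_nchw_asym8s(void);
void conv2d_nchw_asym8u(void);
void conv2d_nchw_asym8s_dilated(void);
void conv2d_nchw_asym8u_dilated(void);
void conv2d_nchw_asym8s_depthwise(void);
void conv2d_nchw_asym8u_depthwise(void);

typedef void (*conv_kernel_fn)(void);

/* Pick a specialized kernel from the layer's properties. */
static conv_kernel_fn pick_conv_kernel(int is_signed, int dilated,
                                       int depthwise) {
  if (depthwise) {
    return is_signed ? conv2d_nchw_asym8s_depthwise
                     : conv2d_nchw_asym8u_depthwise;
  }
  if (dilated) {
    return is_signed ? conv2d_nchw_asym8s_dilated
                     : conv2d_nchw_asym8u_dilated;
  }
  return is_signed ? conv2d_nchw_asym8s : conv2d_nchw_asym8u;
}
```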
Summary
Optimized Cadence Vision DSP operators for ResNet18 and ResNet50 inference. All operators are DMA-enabled with ping-pong tiling and functionally verified (int8 quantized, NCHW layout).
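For readers unfamiliar with the pattern, a minimal ping-pong tiling loop looks roughly like this; `dma_load_async`, `dma_wait`, and `compute_tile` are hypothetical stand-ins for the platform's DMA primitives and the per-tile kernel.

```c
#include <stdint.h>

#define TILE_BYTES 4096 /* assumed tile size for illustration */

/* Hypothetical platform primitives. */
void dma_load_async(void* dst, const void* src, int bytes);
void dma_wait(const void* buf);
void compute_tile(const int8_t* in, int8_t* out);

static int8_t tile_buf[2][TILE_BYTES]; /* ping-pong pair in local memory */

void process_tiled(const int8_t* src, int8_t* dst, int num_tiles) {
  dma_load_async(tile_buf[0], src, TILE_BYTES); /* prefetch first tile */
  for (int t = 0; t < num_tiles; ++t) {
    int cur = t & 1;
    if (t + 1 < num_tiles) { /* start next transfer before computing */
      dma_load_async(tile_buf[(t + 1) & 1],
                     src + (t + 1) * TILE_BYTES, TILE_BYTES);
    }
    dma_wait(tile_buf[cur]); /* block until the current tile has landed */
    compute_tile(tile_buf[cur], dst + t * TILE_BYTES); /* overlaps DMA */
  }
}
```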
Operators
- Conv2d (`quantized_conv2d_nchw`)
- MaxPool2d (`maxpool_exec_mxnj2`)
- Mean / AdaptiveAvgPool (`mean_exec_dma`)
- Quantize / Dequantize (`quantize_per_tensor`, `dequantize_per_tensor`); reference math sketched after this list
- Quantized ReLU (`quantized_relu`)
- Quantized Linear (`quantized_linear_out`)
- Add (`op_add`)
- Softmax (`op_softmax`)
Build Configuration
cc @mcremon-meta @hsharma35 @zonglinpengmeta