Fix ConvertMmToBmmPass for quantized (int8/int16) mm ops (#18974)
apullin wants to merge 1 commit into pytorch:main
Conversation
digantdesai left a comment:
Review automatically exported from Phabricator review in Meta.
This PR needs a
Summary:
This diff is experimental, but appears to address incomplete support for INT pathways for BMM. TBD.
The pass converts rank-2 mm to rank-3 bmm (required by TOSA spec) via
unsqueeze/bmm/squeeze. Previously it called super().call() to re-trace
the graph on FakeTensors for shape propagation, but aten.bmm rejects
int8/int16 FakeTensors, causing failures for any quantized mm ops.
Since mm→bmm is a pure shape transformation (adding a batch dim of 1),
we can set the output metadata directly: unsqueeze the mm's FakeTensor
for the bmm node, and use the original for the squeeze. No need to
re-execute the op.
Reviewed By: digantdesai
Differential Revision: D99857137
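
To make the approach in the summary concrete, here is a minimal, self-contained sketch of the same idea written directly against torch.fx rather than the ExecuTorch ExportPass infrastructure. The function name `convert_mm_to_bmm` and its overall structure are illustrative assumptions, not the code in this PR; it only shows how the unsqueeze/bmm/squeeze rewrite can fill in `val` metadata by hand so aten.bmm is never evaluated on an int8/int16 FakeTensor.

```python
# Sketch only (not the actual ExecuTorch pass): rewrite rank-2 aten.mm into
# unsqueeze -> bmm -> squeeze on a torch.fx graph, propagating shape metadata
# by hand instead of re-tracing, so quantized FakeTensors never reach aten.bmm.
import torch
from torch.fx import GraphModule


def convert_mm_to_bmm(gm: GraphModule) -> GraphModule:
    graph = gm.graph
    for node in list(graph.nodes):
        if node.op != "call_function" or node.target != torch.ops.aten.mm.default:
            continue
        lhs, rhs = node.args
        mm_val = node.meta.get("val")  # FakeTensor describing the rank-2 mm output

        with graph.inserting_before(node):
            lhs3 = graph.call_function(torch.ops.aten.unsqueeze.default, (lhs, 0))
            rhs3 = graph.call_function(torch.ops.aten.unsqueeze.default, (rhs, 0))
            bmm = graph.call_function(torch.ops.aten.bmm.default, (lhs3, rhs3))
            out = graph.call_function(torch.ops.aten.squeeze.dim, (bmm, 0))

        # mm -> bmm only adds a leading batch dim of 1, so the bmm output is the
        # mm output unsqueezed, and the squeeze output reuses the original mm
        # metadata unchanged. No op is re-executed on the FakeTensors.
        if mm_val is not None:
            bmm.meta["val"] = mm_val.unsqueeze(0)
            out.meta["val"] = mm_val

        node.replace_all_uses_with(out)
        graph.erase_node(node)

    graph.lint()
    gm.recompile()
    return gm
```

The essential point is the metadata assignment: because the transformation is purely a shape change, the bmm node's FakeTensor can be derived with a plain `unsqueeze(0)` of the existing mm metadata, avoiding the dtype check that rejects int8/int16 inputs to aten.bmm.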