
Fix ConvertMmToBmmPass for quantized (int8/int16) mm ops (#18974)#18974

Open
apullin wants to merge 1 commit into pytorch:main from apullin:export-D99857137

Conversation

Contributor

@apullin apullin commented Apr 17, 2026

Summary:

This diff is experimental, but it appears to address incomplete support for integer (int8/int16) pathways for BMM. TBD.

The pass converts rank-2 mm to rank-3 bmm (required by TOSA spec) via
unsqueeze/bmm/squeeze. Previously it called super().call() to re-trace
the graph on FakeTensors for shape propagation, but aten.bmm rejects
int8/int16 FakeTensors, causing failures for any quantized mm ops.
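For illustration, here is a minimal sketch of the rewrite described above, written against the standard torch.fx graph API. The function name `convert_mm_to_bmm` and the exact op overloads are assumptions for the example, not the actual pass implementation:

```python
import torch
from torch.fx import GraphModule


def convert_mm_to_bmm(gm: GraphModule) -> GraphModule:
    """Illustrative sketch: rewrite rank-2 mm as unsqueeze/bmm/squeeze."""
    graph = gm.graph
    for node in list(graph.nodes):
        if node.target != torch.ops.aten.mm.default:
            continue
        lhs, rhs = node.args
        with graph.inserting_before(node):
            # Add a batch dim of 1 to both rank-2 operands.
            lhs3 = graph.call_function(torch.ops.aten.unsqueeze.default, (lhs, 0))
            rhs3 = graph.call_function(torch.ops.aten.unsqueeze.default, (rhs, 0))
            # Rank-3 batched matmul, as required by the TOSA spec.
            bmm = graph.call_function(torch.ops.aten.bmm.default, (lhs3, rhs3))
            # Drop the batch dim again so downstream users see the original shape.
            squeeze = graph.call_function(torch.ops.aten.squeeze.dim, (bmm, 0))
        node.replace_all_uses_with(squeeze)
        graph.erase_node(node)
    graph.lint()
    gm.recompile()
    return gm
```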

Since mm→bmm is a pure shape transformation (adding a batch dim of 1),
we can set the output metadata directly: unsqueeze the mm's FakeTensor
for the bmm node, and use the original for the squeeze. No need to
re-execute the op.
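Continuing the sketch above, setting the output metadata directly amounts to hand-propagating the FakeTensor shapes instead of re-running the ops. The node names (`lhs3`, `rhs3`, `bmm`, `squeeze`) come from the illustrative rewrite, not the real pass:

```python
# Sketch only: propagate shapes by hand rather than re-tracing with
# super().call(), which would execute aten.bmm on int8/int16 FakeTensors.
mm_val = node.meta.get("val")  # FakeTensor recorded for the original rank-2 mm
if mm_val is not None:
    # unsqueeze is a pure view op, so it is safe on integer FakeTensors.
    lhs3.meta["val"] = lhs.meta["val"].unsqueeze(0)
    rhs3.meta["val"] = rhs.meta["val"].unsqueeze(0)
    # The bmm output is the mm output with a leading batch dim of 1.
    bmm.meta["val"] = mm_val.unsqueeze(0)
    # The squeeze restores the original rank-2 result, so reuse it as-is.
    squeeze.meta["val"] = mm_val
```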

Reviewed By: digantdesai

Differential Revision: D99857137

@apullin apullin requested a review from digantdesai as a code owner April 17, 2026 14:47

pytorch-bot Bot commented Apr 17, 2026

🔗 Helpful Links

🧪 See artifacts and rendered test results at hud.pytorch.org/pr/pytorch/executorch/18974

Note: Links to docs will display an error until the docs builds have been completed.

❗ 1 Active SEVs

There is 1 currently active SEV. If your PR is affected, please view it below:

❌ 1 Awaiting Approval, 1 New Failure, 2 Cancelled Jobs, 3 Unrelated Failures

As of commit a28c6cc with merge base bf64fa1:

AWAITING APPROVAL - The following workflow needs approval before CI can run:

NEW FAILURE - The following job has failed:

CANCELLED JOBS - The following jobs were cancelled. Please retry:

BROKEN TRUNK - The following jobs failed but were present on the merge base:

👉 Rebase onto the `viable/strict` branch to avoid these failures

This comment was automatically generated by Dr. CI and updates every 15 minutes.

@meta-cla meta-cla Bot added the CLA Signed label (This label is managed by the Facebook bot. Authors need to sign the CLA before a PR can be reviewed.) Apr 17, 2026
Contributor

meta-codesync Bot commented Apr 17, 2026

@apullin has exported this pull request. If you are a Meta employee, you can view the originating Diff in D99857137.

Contributor

@digantdesai digantdesai left a comment


Review automatically exported from Phabricator review in Meta.

@github-actions

This PR needs a release notes: label

If your change should be included in the release notes (i.e. would users of this library care about this change?), please use a label starting with release notes:. This helps us keep track of your important work and include it in the next release notes.

To add a label, you can comment to pytorchbot, for example
@pytorchbot label "release notes: none"

For more information, see
https://github.com/pytorch/pytorch/wiki/PyTorch-AutoLabel-Bot#why-categorize-for-release-notes-and-how-does-it-work.

@meta-codesync meta-codesync Bot changed the title Fix ConvertMmToBmmPass for quantized (int8/int16) mm ops Fix ConvertMmToBmmPass for quantized (int8/int16) mm ops (#18974) Apr 20, 2026
apullin pushed a commit to apullin/executorch that referenced this pull request Apr 20, 2026
apullin pushed a commit to apullin/executorch that referenced this pull request Apr 20, 2026
@apullin apullin force-pushed the export-D99857137 branch 2 times, most recently from b4a1625 to 5439a12, on April 21, 2026 at 16:57
apullin pushed a commit to apullin/executorch that referenced this pull request Apr 21, 2026
@github-actions github-actions Bot added the ciflow/trunk and module: arm (Issues related to arm backend) labels Apr 21, 2026

pytorch-bot Bot commented Apr 21, 2026

Workflows were awaiting approval. CI has now been triggered for the ciflow labels on this PR.


Labels

ciflow/trunk
CLA Signed (This label is managed by the Facebook bot. Authors need to sign the CLA before a PR can be reviewed.)
fb-exported
meta-exported
module: arm (Issues related to arm backend)

Projects

None yet

Development

Successfully merging this pull request may close these issues.

2 participants