Add rlformers forward-pass features to ExecuTorch backbone for on-device export parity (#19096) by ifed-ucsd · Pull Request #19096 · pytorch/executorch

ifed-ucsd · 2026-04-23T21:53:03Z

Summary:

The 730M dense model checkpoint uses several rlformers features that the ExecuTorch XNNPACK export path did not implement. Without these, the exported model produces numerically incorrect output.

This diff adds support for 8 missing features:

1. `normalize_tok_embeddings`: scaleless RMSNorm after embedding lookup
2. `qk_norm_before_rope`: conversion from GenAI args (attention code already supported it)
3. `scale_query_by`: custom scalar multiplier on Q after QK norm
4. `use_attn_o_gate`: sigmoid gate on attention output using a learned linear projection of the layer input
5. `use_attn_o_norm`: scaleless per-head RMSNorm on attention output (applied before o_gate)
6. `use_residual_gate`: NormPreservingResidualConnection with learned per-dim gates for both attention and FFN residual connections
7. `use_ffn_learnable_scales`: RMSNormWithInputScale replacing standard post-FFN norm, computing `rms_norm(gamma * x)` instead of `gamma * rms_norm(x)`
8. `output_soft_cap_temp`: `tanh(logits/temp) * temp` soft capping on output logits
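
For readers unfamiliar with these operations, here is a minimal pure-Python sketch of four of them on plain lists of floats. It is not the code from this PR. The `eps` default, the function names, the gate shape (one gate value per attention-output element), and the point at which the gate is applied relative to the output projection are all assumptions made for illustration.

```python
import math


def rms_norm(x, eps=1e-6):
    """Scaleless RMSNorm: divide by the root-mean-square, no learned weight.

    eps=1e-6 is an assumed default, not taken from the PR.
    """
    mean_sq = sum(v * v for v in x) / len(x)
    inv_rms = 1.0 / math.sqrt(mean_sq + eps)
    return [v * inv_rms for v in x]


def norm_with_output_scale(x, gamma, eps=1e-6):
    """Standard post-norm scaling: gamma * rms_norm(x)."""
    return [g * v for g, v in zip(gamma, rms_norm(x, eps))]


def norm_with_input_scale(x, gamma, eps=1e-6):
    """Form described for use_ffn_learnable_scales: rms_norm(gamma * x)."""
    return rms_norm([g * v for g, v in zip(gamma, x)], eps)


def soft_cap(logits, temp):
    """Form described for output_soft_cap_temp: tanh(logits / temp) * temp."""
    return [math.tanh(v / temp) * temp for v in logits]


def sigmoid(v):
    return 1.0 / (1.0 + math.exp(-v))


def gated_attn_output(heads, layer_input, gate_weight):
    """Per-head scaleless RMSNorm, then a sigmoid gate computed from the layer input.

    heads:       list of per-head output vectors
    layer_input: the vector that entered the layer
    gate_weight: one row per flattened output element (assumed shape)
    """
    normed = [v for head in heads for v in rms_norm(head)]  # o_norm first
    gate_logits = [sum(w * x for w, x in zip(row, layer_input)) for row in gate_weight]
    return [v * sigmoid(g) for v, g in zip(normed, gate_logits)]  # then o_gate


x = [1.0, 2.0, 3.0, 4.0]
gamma = [2.0, 1.0, 1.0, 1.0]
out_scaled = norm_with_output_scale(x, gamma)
in_scaled = norm_with_input_scale(x, gamma)
capped = soft_cap([100.0, -100.0, 0.1], temp=30.0)
gated = gated_attn_output([[3.0, 4.0]], [1.0, 1.0], [[0.0, 0.0], [0.0, 0.0]])
```

With a non-uniform `gamma`, `in_scaled` and `out_scaled` differ, which is consistent with the PR's statement that omitting these features changes the numerical output. `soft_cap` keeps every logit strictly inside `(-temp, temp)` while leaving small values almost unchanged.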

All features are off by default (backward compatible). They activate when the corresponding fields are set in the checkpoint's params.json and propagated through model_args_conversion.
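
The off-by-default behaviour can be sketched as follows. `FeatureFlags` and `flags_from_params` are hypothetical names for this illustration and are not the ExecuTorch model-args class or the `model_args_conversion` code. The field names come from the list above; the field types, the defaults, and the sample JSON values are assumptions.

```python
import json
from dataclasses import dataclass, fields
from typing import Optional


@dataclass
class FeatureFlags:
    """Hypothetical container: every feature is disabled unless explicitly set."""

    normalize_tok_embeddings: bool = False
    qk_norm_before_rope: bool = False
    scale_query_by: Optional[float] = None  # assumed: None means "no extra scaling"
    use_attn_o_gate: bool = False
    use_attn_o_norm: bool = False
    use_residual_gate: bool = False
    use_ffn_learnable_scales: bool = False
    output_soft_cap_temp: Optional[float] = None  # assumed: None means "no soft cap"


def flags_from_params(params: dict) -> FeatureFlags:
    """Copy only the known feature fields; anything absent keeps its default."""
    known = {f.name for f in fields(FeatureFlags)}
    return FeatureFlags(**{k: v for k, v in params.items() if k in known})


# Illustrative params.json content; the values are made up for this sketch.
params = json.loads('{"dim": 1024, "use_attn_o_gate": true, "output_soft_cap_temp": 30.0}')
flags = flags_from_params(params)
```

A checkpoint whose `params.json` sets none of these fields yields a `FeatureFlags` equal to the defaults, which is the backward-compatible case.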

Weight key mappings added for: `attention.og.weight`, `add_attn.gate`, `add_ffn.gate`, `post_ffn_norm.weight`.
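
A weight key mapping of this kind is typically a rename applied to checkpoint keys before loading. The sketch below shows the general shape only. The source-side suffixes are the four listed above; the target-side names, the `layers.0.` prefix, and the `remap_keys` helper are placeholders invented for this example and do not reflect the names used in the PR.

```python
# Source suffixes come from the PR description.
# Target names are PLACEHOLDERS for illustration, not the real ExecuTorch keys.
SUFFIX_MAP = {
    "attention.og.weight": "attention.o_gate.weight",
    "add_attn.gate": "attn_residual_gate",
    "add_ffn.gate": "ffn_residual_gate",
    "post_ffn_norm.weight": "post_ffn_norm.weight",
}


def remap_keys(state_dict):
    """Return a new dict with matching key suffixes renamed; other keys pass through."""
    out = {}
    for key, value in state_dict.items():
        new_key = key
        for src, dst in SUFFIX_MAP.items():
            # Match the whole key or a dotted suffix, never a partial name.
            if key == src or key.endswith("." + src):
                new_key = key[: len(key) - len(src)] + dst
                break
        out[new_key] = value
    return out


checkpoint = {
    "layers.0.attention.og.weight": "og-tensor",
    "layers.0.add_ffn.gate": "gate-tensor",
    "some.other.weight": "untouched",
}
remapped = remap_keys(checkpoint)
```

Keys that match none of the suffixes are copied unchanged, so checkpoints without the new features load as before.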

Differential Revision: D102030169

pytorch-bot · 2026-04-23T21:53:07Z

🔗 Helpful Links

🧪 See artifacts and rendered test results at hud.pytorch.org/pr/pytorch/executorch/19096

📄 Preview Python docs built from this PR

Note: Links to docs will display an error until the docs builds have been completed.

❗ 1 Active SEVs

There is 1 currently active SEV. If your PR is affected, please view it below:

Rolling out OSDC (ARC) runners on pull & trunk workflows in PyTorch main

❌ 2 New Failures, 4 Unrelated Failures

As of commit 6d2a84a with merge base f9f29e7:

NEW FAILURES - The following jobs have failed:

FLAKY - The following jobs failed but were likely due to flakiness present on trunk:

periodic / test-models-linux (buck2, mv3, xnnpack-quantization-delegation, linux.2xlarge, 90) / linux-job (gh) (detected as infra flaky with no log or failing log classifier)
pull / test-models-linux (emformer_transcribe, portable, linux.2xlarge) / linux-job (gh) (detected as infra flaky with no log or failing log classifier)

BROKEN TRUNK - The following jobs failed but were present on the merge base:

👉 Rebase onto the `viable/strict` branch to avoid these failures

pull / unittest / windows / windows-job (gh) (trunk failure)
##[error]The operation was canceled.
pull / unittest-editable / windows / windows-job (gh) (trunk failure)
##[error]The operation was canceled.

This comment was automatically generated by Dr. CI and updates every 15 minutes.

meta-codesync · 2026-04-23T21:53:12Z

@ifed-ucsd has exported this pull request. If you are a Meta employee, you can view the originating Diff in D102030169.

github-actions · 2026-04-23T21:53:58Z

This PR needs a `release notes:` label

If your change should be included in the release notes (i.e. would users of this library care about this change?), please use a label starting with `release notes:`. This helps us keep track and include your important work in the next release notes.

To add a label, you can comment to pytorchbot, for example
@pytorchbot label "release notes: none"

For more information, see
https://github.com/pytorch/pytorch/wiki/PyTorch-AutoLabel-Bot#why-categorize-for-release-notes-and-how-does-it-work.

ifed-ucsd requested a review from lucylq as a code owner April 23, 2026 21:53

meta-cla Bot added the CLA Signed label Apr 23, 2026

meta-codesync Bot added the fb-exported and meta-exported labels Apr 23, 2026

meta-codesync Bot changed the title ~~Add rlformers forward-pass features to ExecuTorch backbone for on-device export parity~~ Add rlformers forward-pass features to ExecuTorch backbone for on-device export parity (#19096) Apr 23, 2026

ifed-ucsd force-pushed the export-D102030169 branch from 7feccab to 9fbe028 Compare April 23, 2026 21:58

ifed-ucsd force-pushed the export-D102030169 branch from 9fbe028 to 8b2da23 Compare April 23, 2026 22:17

ifed-ucsd force-pushed the export-D102030169 branch from 8b2da23 to 62852da Compare April 23, 2026 22:21

ifed-ucsd force-pushed the export-D102030169 branch from 62852da to 6d2a84a Compare April 23, 2026 23:26

ifed-ucsd wants to merge 1 commit into pytorch:main from ifed-ucsd:export-D102030169
