RL: running normalizations for states and rewards/returns by azrael417 · Pull Request #104 · NVIDIA/TorchFort

azrael417 · 2026-04-20T08:47:43Z

This MR does the following:

- Adds a running normalization mode for states, as well as for rewards and returns.
- Fixes advantage normalization: it was erroneously applied per training step, but it should be applied once, after the advantages have been computed for the full rollout.
- Adds tests for the running normalizer.
- Catches stack traces from tests that are expected to fail, to avoid cluttering test reports.
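
For readers unfamiliar with running normalization, here is a minimal, self-contained sketch of the general idea using Welford's online algorithm. It is not TorchFort's implementation: the class name `RunningNormalizer`, its methods, and the `eps` default are all hypothetical, and the real code operates on tensors rather than `std::vector<double>`.

```cpp
#include <cmath>
#include <cstddef>
#include <vector>

// Hypothetical illustration, not the TorchFort class: keeps a running
// per-feature mean and variance using Welford's online algorithm.
class RunningNormalizer {
public:
  explicit RunningNormalizer(std::size_t dim, double eps = 1e-8)
      : mean_(dim, 0.0), m2_(dim, 0.0), eps_(eps) {}

  // Fold one observation (of length dim) into the running statistics.
  void update(const std::vector<double>& x) {
    ++count_;
    for (std::size_t i = 0; i < mean_.size(); ++i) {
      const double delta = x[i] - mean_[i];
      mean_[i] += delta / static_cast<double>(count_);
      // Uses the already updated mean, as Welford's update requires.
      m2_[i] += delta * (x[i] - mean_[i]);
    }
  }

  double mean(std::size_t i) const { return mean_[i]; }

  // Population variance of everything seen so far (0 before any update).
  double variance(std::size_t i) const {
    return count_ > 0 ? m2_[i] / static_cast<double>(count_) : 0.0;
  }

  // Standardize x with the current statistics; eps guards against
  // division by zero for constant features.
  std::vector<double> normalize(const std::vector<double>& x) const {
    std::vector<double> out(x.size());
    for (std::size_t i = 0; i < x.size(); ++i) {
      out[i] = (x[i] - mean_[i]) / std::sqrt(variance(i) + eps_);
    }
    return out;
  }

private:
  std::vector<double> mean_;
  std::vector<double> m2_;  // running sum of squared deviations
  double eps_;
  std::size_t count_ = 0;
};
```

In use, one would call `update` on each new state (or reward/return) as it arrives and `normalize` before feeding the value to the networks. Because the statistics change over time, they need to be saved and restored along with the rest of the checkpoint.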

This one should be merged after #103. I will rebase accordingly before merging.
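
The advantage-normalization fix listed above can be illustrated with a small sketch: standardize the advantages once over the whole rollout, and only then split them into minibatches. The function names `normalize_advantages` and `make_minibatches` are hypothetical, and the assumption that normalization means standardizing to zero mean and unit variance is mine; the actual TorchFort code works on its rollout buffer tensors.

```cpp
#include <algorithm>
#include <cmath>
#include <cstddef>
#include <vector>

// Hypothetical helper: standardize a full rollout of advantages in place
// (zero mean, unit variance). eps avoids division by zero.
void normalize_advantages(std::vector<double>& adv, double eps = 1e-8) {
  if (adv.empty()) return;
  const double n = static_cast<double>(adv.size());
  double mean = 0.0;
  for (double a : adv) mean += a;
  mean /= n;
  double var = 0.0;
  for (double a : adv) var += (a - mean) * (a - mean);
  var /= n;
  const double inv_std = 1.0 / std::sqrt(var + eps);
  for (double& a : adv) a = (a - mean) * inv_std;
}

// Hypothetical helper: split an already normalized rollout into
// consecutive minibatches; the last one may be shorter.
std::vector<std::vector<double>> make_minibatches(
    const std::vector<double>& adv, std::size_t batch_size) {
  std::vector<std::vector<double>> out;
  if (batch_size == 0) return out;
  for (std::size_t start = 0; start < adv.size(); start += batch_size) {
    const std::size_t stop = std::min(start + batch_size, adv.size());
    out.emplace_back(adv.begin() + static_cast<std::ptrdiff_t>(start),
                     adv.begin() + static_cast<std::ptrdiff_t>(stop));
  }
  return out;
}
```

With rollout advantages `{1, 2, 3, 4}` and a batch size of 2, normalizing once keeps both entries of the first minibatch negative and both entries of the second positive. Normalizing each minibatch separately would instead map every batch to roughly `{-1, +1}`, discarding the information that the second half of the rollout was better than the first.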

azrael417 · 2026-04-21T05:42:06Z

/build_and_test

github-actions · 2026-04-21T05:42:13Z

🚀 Build workflow triggered! View run

github-actions · 2026-04-21T05:55:09Z

✅ Build workflow passed! View run


azrael417 · 2026-04-23T05:22:35Z

/build_and_test

github-actions · 2026-04-23T05:22:44Z

🚀 Build workflow triggered! View run

github-actions · 2026-04-23T05:37:14Z

✅ Build workflow passed! View run

romerojosh

Changes LGTM! Merge when ready.

romerojosh · 2026-04-27T16:51:20Z

+  // state normalizer
+  if (state_normalizer_) {
+    auto normalizer_path = root_dir / "state_normalizer.pt";
+    state_normalizer_->save(normalizer_path.native());


Not necessarily a blocker, but I wonder whether we can merge some of these separate .pt files into fewer files in the checkpoint directories to make things easier to manage. Something to think about generally, not just in this PR. We can address it in a follow-up.

azrael417 · 2026-04-28

Yes, I agree. I know how to do that in Python, but I need to check whether libtorch offers similar features. Generally, you can create a dict, store all sorts of tensors in it, and later retrieve them by key. We can do that here too, in a follow-up cleanup.

azrael417 requested a review from romerojosh April 20, 2026 08:47

azrael417 self-assigned this Apr 20, 2026

azrael417 force-pushed the tkurth/rl-running-reward-normalization branch from 5e2ea04 to 8285d4b Compare April 20, 2026 08:48

azrael417 changed the title ~~R: running state normalization~~ RL: running state normalization Apr 20, 2026

azrael417 changed the title ~~RL: running state normalization~~ RL: running normalizations for states and rewards/returns Apr 20, 2026

azrael417 force-pushed the tkurth/rl-running-reward-normalization branch 2 times, most recently from 36f8b62 to 837d637 Compare April 20, 2026 17:19

azrael417 marked this pull request as ready for review April 21, 2026 05:41

azrael417 added 15 commits April 22, 2026 22:21

Adding Running Stats Normalizer for states/observations for all algorithms.

3cf8fd1

Signed-off-by: Thorsten Kurth <tkurth@nvidia.com>

fixing advantage buffer normalization by calling once for the whole history instead of calling it per each batch (which was technically wrong)

b043d87

Signed-off-by: Thorsten Kurth <tkurth@nvidia.com>

fixing formatting

c054a67

Signed-off-by: Thorsten Kurth <tkurth@nvidia.com>

Adding missing files for normalizer

68f7aab

Signed-off-by: Thorsten Kurth <tkurth@nvidia.com>

updating license header in test

cd6a458

Signed-off-by: Thorsten Kurth <tkurth@nvidia.com>

fixing code formatting

0e3684d

Signed-off-by: Thorsten Kurth <tkurth@nvidia.com>

adding rewards and return normalization

4e31cbe

Signed-off-by: Thorsten Kurth <tkurth@nvidia.com>

adding tests for returns and rewards normalization

7f26036

Signed-off-by: Thorsten Kurth <tkurth@nvidia.com>

fixing indentation

917fbdc

Signed-off-by: Thorsten Kurth <tkurth@nvidia.com>

updating documentation

42e2b31

Signed-off-by: Thorsten Kurth <tkurth@nvidia.com>

adding normalizeReturns and normalizeAdvantages to virtual base class

3f1fc21

Signed-off-by: Thorsten Kurth <tkurth@nvidia.com>

suppress stack trace printing for tests which are supposed to fail.

0533e91

Signed-off-by: Thorsten Kurth <tkurth@nvidia.com>

adding normalization of stored q values

03fe1b3

Signed-off-by: Thorsten Kurth <tkurth@nvidia.com>

adding back DDPG action state env check

647e69d

Signed-off-by: Thorsten Kurth <tkurth@nvidia.com>

adding back DDPG action state env check

db92165

Signed-off-by: Thorsten Kurth <tkurth@nvidia.com>

azrael417 force-pushed the tkurth/rl-running-reward-normalization branch from 9b68689 to db92165 Compare April 23, 2026 05:22

romerojosh approved these changes Apr 27, 2026


azrael417 merged commit 0a6e96c into master Apr 28, 2026
4 checks passed

azrael417 merged 15 commits into master from tkurth/rl-running-reward-normalization
