RL: running normalizations for states and rewards/returns #104

Merged: azrael417 merged 15 commits into master from tkurth/rl-running-reward-normalization on Apr 28, 2026

@azrael417 azrael417 commented Apr 20, 2026

This PR does the following:

  • Adds a running normalization mode for states as well as for rewards and returns.
  • Fixes the advantage normalization: it was erroneously applied per training step, but it should be applied once, after the advantages have been computed for the full rollout.
  • Adds tests for the running normalizer.
  • Catches stack traces of tests that are expected to fail, to avoid cluttering the test reports.
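
For readers unfamiliar with the idea, here is a minimal sketch of what a running normalizer can look like. It is not the class added in this PR: the name `RunningNormalizer`, its methods, and the use of `std::vector<double>` instead of libtorch tensors are assumptions made for illustration. It tracks per-dimension mean and variance with Welford's online update and uses them to standardize new samples.

```cpp
#include <cmath>
#include <cstddef>
#include <vector>

// Hypothetical illustration, not the normalizer implemented in this PR.
// Tracks a running mean and (population) variance per dimension using
// Welford's online algorithm.
class RunningNormalizer {
public:
  explicit RunningNormalizer(std::size_t dim, double eps = 1e-8)
      : mean_(dim, 0.0), m2_(dim, 0.0), eps_(eps) {}

  // Fold one sample (e.g. a state vector or a scalar reward) into the statistics.
  void update(const std::vector<double>& x) {
    ++count_;
    for (std::size_t i = 0; i < mean_.size(); ++i) {
      const double delta = x[i] - mean_[i];
      mean_[i] += delta / static_cast<double>(count_);
      m2_[i] += delta * (x[i] - mean_[i]);  // uses the already updated mean
    }
  }

  // Population variance; falls back to 1 until at least two samples were seen.
  double variance(std::size_t i) const {
    return count_ > 1 ? m2_[i] / static_cast<double>(count_) : 1.0;
  }

  const std::vector<double>& mean() const { return mean_; }

  // Standardize a sample with the current running statistics.
  std::vector<double> normalize(const std::vector<double>& x) const {
    std::vector<double> out(x.size());
    for (std::size_t i = 0; i < x.size(); ++i) {
      out[i] = (x[i] - mean_[i]) / std::sqrt(variance(i) + eps_);
    }
    return out;
  }

private:
  std::vector<double> mean_;
  std::vector<double> m2_;  // running sum of squared deviations from the mean
  double eps_;
  std::size_t count_ = 0;
};
```

In a training loop one would typically call `update` on every observed sample and `normalize` before feeding data to the networks; whether statistics are frozen during evaluation is a separate design choice not covered by this sketch.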

This PR should be merged after #103. I will rebase accordingly before merging.
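
To illustrate the advantage normalization fix listed above, here is a rough sketch under stated assumptions: the function names `normalize_advantages` and `minibatch` are hypothetical, and plain `std::vector<double>` stands in for the tensors used in the actual code. The point is only the ordering: normalize once over the full rollout, then slice minibatches from the already normalized buffer.

```cpp
#include <cmath>
#include <cstddef>
#include <numeric>
#include <vector>

// Hypothetical illustration of the ordering described in this PR.
// Standardizes all advantages of a rollout in place, using the mean and
// standard deviation of the whole rollout.
void normalize_advantages(std::vector<double>& adv, double eps = 1e-8) {
  if (adv.empty()) return;
  const double n = static_cast<double>(adv.size());
  const double mean = std::accumulate(adv.begin(), adv.end(), 0.0) / n;
  double sq = 0.0;
  for (double a : adv) sq += (a - mean) * (a - mean);
  const double stddev = std::sqrt(sq / n);
  for (double& a : adv) a = (a - mean) / (stddev + eps);
}

// Copies the half-open range [begin, end) out of the normalized rollout.
// Training steps consume these slices without normalizing again.
std::vector<double> minibatch(const std::vector<double>& adv,
                              std::size_t begin, std::size_t end) {
  return std::vector<double>(adv.begin() + static_cast<std::ptrdiff_t>(begin),
                             adv.begin() + static_cast<std::ptrdiff_t>(end));
}
```

As a small worked case, take rollout advantages `[1, 2, 3, 4]`. Normalizing the halves `[1, 2]` and `[3, 4]` separately maps both to roughly `[-1, 1]`, which erases the fact that the second half was better than the first. Normalizing the full rollout first keeps that ordering intact across minibatches.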

@azrael417 azrael417 requested a review from romerojosh April 20, 2026 08:47
@azrael417 azrael417 self-assigned this Apr 20, 2026
@azrael417 azrael417 force-pushed the tkurth/rl-running-reward-normalization branch from 5e2ea04 to 8285d4b Compare April 20, 2026 08:48
@azrael417 azrael417 changed the title R: running state normalization RL: running state normalization Apr 20, 2026
@azrael417 azrael417 changed the title RL: running state normalization RL: running normalizations for states and rewards/returns Apr 20, 2026
@azrael417 azrael417 force-pushed the tkurth/rl-running-reward-normalization branch 2 times, most recently from 36f8b62 to 837d637 Compare April 20, 2026 17:19
@azrael417 azrael417 marked this pull request as ready for review April 21, 2026 05:41
@azrael417
Collaborator Author

/build_and_test

@github-actions

🚀 Build workflow triggered! View run

@github-actions

✅ Build workflow passed! View run

…rithms.

Signed-off-by: Thorsten Kurth <tkurth@nvidia.com>
…istory instead of calling it per each batch (which was technically wrong)

Signed-off-by: Thorsten Kurth <tkurth@nvidia.com>
Signed-off-by: Thorsten Kurth <tkurth@nvidia.com>
Signed-off-by: Thorsten Kurth <tkurth@nvidia.com>
Signed-off-by: Thorsten Kurth <tkurth@nvidia.com>
Signed-off-by: Thorsten Kurth <tkurth@nvidia.com>
Signed-off-by: Thorsten Kurth <tkurth@nvidia.com>
Signed-off-by: Thorsten Kurth <tkurth@nvidia.com>
Signed-off-by: Thorsten Kurth <tkurth@nvidia.com>
Signed-off-by: Thorsten Kurth <tkurth@nvidia.com>
Signed-off-by: Thorsten Kurth <tkurth@nvidia.com>
Signed-off-by: Thorsten Kurth <tkurth@nvidia.com>
Signed-off-by: Thorsten Kurth <tkurth@nvidia.com>
Signed-off-by: Thorsten Kurth <tkurth@nvidia.com>
Signed-off-by: Thorsten Kurth <tkurth@nvidia.com>
@azrael417 azrael417 force-pushed the tkurth/rl-running-reward-normalization branch from 9b68689 to db92165 Compare April 23, 2026 05:22
@azrael417
Collaborator Author

/build_and_test

@github-actions

🚀 Build workflow triggered! View run

@github-actions

✅ Build workflow passed! View run

Collaborator

@romerojosh romerojosh left a comment

Changes LGTM! Merge when ready.

// state normalizer
if (state_normalizer_) {
  auto normalizer_path = root_dir / "state_normalizer.pt";
  state_normalizer_->save(normalizer_path.native());
Collaborator

Not necessarily a blocker, but I wonder whether we can merge some of these separate .pt files into fewer files in the checkpoint directories to make things easier to manage. This is something to think about generally, not just in this PR. We can address it in a follow-up.

Collaborator Author

Yes, I agree. I know how to do that in Python but need to check whether libtorch offers similar features. Generally, you can create a dict, store all sorts of tensors in it, and later retrieve them by key. We can do that here too, in a follow-up cleanup.

@azrael417 azrael417 merged commit 0a6e96c into master Apr 28, 2026
4 checks passed

2 participants