RL: running normalizations for states and rewards/returns #104
…rithms. Signed-off-by: Thorsten Kurth <tkurth@nvidia.com>
…istory instead of calling it per each batch (which was technically wrong) Signed-off-by: Thorsten Kurth <tkurth@nvidia.com>
Signed-off-by: Thorsten Kurth <tkurth@nvidia.com>
romerojosh left a comment
Changes LGTM! Merge when ready.
// state normalizer
if (state_normalizer_) {
  auto normalizer_path = root_dir / "state_normalizer.pt";
  state_normalizer_->save(normalizer_path.native());
Not necessarily a blocker, but I wonder whether we can merge some of these separate .pt files into fewer files in the checkpoint directories to make them easier to manage. This is something to think about generally, not just in this PR; we can address it in a follow-up.
Yes, I agree. I know how to do that in Python but need to check whether libtorch offers similar features. Generally you can create a dict, store all sorts of tensors in it, and later retrieve them by key. We can do that here too in a follow-up cleanup.
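To make the "dict of tensors in one file" idea concrete, here is a minimal sketch that deliberately avoids libtorch, since whether libtorch offers an equivalent is still an open question in this thread. Everything in it is an assumption for illustration: `Buffer` is a plain float vector standing in for a tensor, and `Checkpoint`, `save_checkpoint`, `load_checkpoint`, and the on-disk layout are hypothetical names and choices, not part of this PR or of any library.

```cpp
#include <cstdint>
#include <fstream>
#include <map>
#include <stdexcept>
#include <string>
#include <utility>
#include <vector>

// Hypothetical stand-in for a tensor: a flat buffer of floats.
using Buffer = std::vector<float>;
// Hypothetical checkpoint type: named buffers that are looked up by key.
using Checkpoint = std::map<std::string, Buffer>;

// Write every entry into ONE file. Layout (an assumption of this sketch):
// entry count, then per entry: key length, key bytes, element count, raw floats.
inline void save_checkpoint(const Checkpoint& ckpt, const std::string& path) {
  std::ofstream out(path, std::ios::binary);
  if (!out) throw std::runtime_error("cannot open " + path + " for writing");
  const std::uint64_t n = ckpt.size();
  out.write(reinterpret_cast<const char*>(&n), sizeof(n));
  for (const auto& entry : ckpt) {
    const std::uint64_t klen = entry.first.size();
    const std::uint64_t blen = entry.second.size();
    out.write(reinterpret_cast<const char*>(&klen), sizeof(klen));
    out.write(entry.first.data(), static_cast<std::streamsize>(klen));
    out.write(reinterpret_cast<const char*>(&blen), sizeof(blen));
    out.write(reinterpret_cast<const char*>(entry.second.data()),
              static_cast<std::streamsize>(blen * sizeof(float)));
  }
  if (!out) throw std::runtime_error("failed while writing " + path);
}

// Read the file back into a key -> buffer map.
inline Checkpoint load_checkpoint(const std::string& path) {
  std::ifstream in(path, std::ios::binary);
  if (!in) throw std::runtime_error("cannot open " + path + " for reading");
  std::uint64_t n = 0;
  in.read(reinterpret_cast<char*>(&n), sizeof(n));
  if (!in) throw std::runtime_error("truncated checkpoint " + path);
  Checkpoint ckpt;
  for (std::uint64_t i = 0; i < n; ++i) {
    std::uint64_t klen = 0;
    in.read(reinterpret_cast<char*>(&klen), sizeof(klen));
    if (!in) throw std::runtime_error("truncated checkpoint " + path);
    std::string key(klen, '\0');
    if (klen > 0) in.read(&key[0], static_cast<std::streamsize>(klen));
    std::uint64_t blen = 0;
    in.read(reinterpret_cast<char*>(&blen), sizeof(blen));
    if (!in) throw std::runtime_error("truncated checkpoint " + path);
    Buffer buf(blen);
    in.read(reinterpret_cast<char*>(buf.data()),
            static_cast<std::streamsize>(blen * sizeof(float)));
    if (!in) throw std::runtime_error("truncated checkpoint " + path);
    ckpt.emplace(std::move(key), std::move(buf));
  }
  return ckpt;
}
```

With something along these lines, the state normalizer statistics could live under keys such as `"state_normalizer.mean"` (a made-up key name) next to the other checkpoint entries, instead of in a separate `state_normalizer.pt`. The format is not portable across machines with different endianness; a real follow-up would more likely rely on whatever serialization libtorch itself provides.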
This MR does the following:
This one should be merged after #103. I will rebase accordingly before merging.