Optimize dataloaders for better GPU utilization by ComDec · Pull Request #1 · ComDec/ChemGFN

ComDec · 2025-09-11T15:52:47Z

Summary

Pre-tokenize prompts in dataset classes to avoid repeated CPU work
Enable DataLoader persistent workers and prefetching for smoother GPU feeding
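A minimal sketch of the loader settings described in the second bullet, assuming PyTorch's `torch.utils.data.DataLoader`. The helper name `build_loader_kwargs` and its default values are hypothetical and not taken from this PR. The helper only builds keyword arguments, so it runs without torch installed. The guard matters because `DataLoader` raises `ValueError` for `persistent_workers=True` when `num_workers == 0`, and PyTorch 2.x also rejects an explicit `prefetch_factor` in that case.

```python
from typing import Any, Dict


def build_loader_kwargs(
    batch_size: int,
    num_workers: int,
    shuffle: bool = False,
    prefetch_factor: int = 4,
    pin_memory: bool = True,
) -> Dict[str, Any]:
    """Return DataLoader keyword arguments (hypothetical helper, not from the PR).

    persistent_workers and prefetch_factor only apply when worker
    processes exist, so they are added only for num_workers > 0.
    """
    if num_workers < 0:
        raise ValueError("num_workers must be >= 0")
    kwargs: Dict[str, Any] = {
        "batch_size": batch_size,
        "shuffle": shuffle,
        "num_workers": num_workers,
        # Page-locked host memory allows faster host-to-GPU copies.
        "pin_memory": pin_memory,
    }
    if num_workers > 0:
        # Keep worker processes alive between epochs instead of restarting them.
        kwargs["persistent_workers"] = True
        # Each worker keeps this many batches loaded ahead of the consumer.
        kwargs["prefetch_factor"] = prefetch_factor
    return kwargs
```

Usage would look like `DataLoader(dataset, **build_loader_kwargs(64, num_workers=4, shuffle=True))`. Keeping the worker-only options behind one check avoids having a separate code path for single-process debugging runs.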

Testing

pytest -q (fails: ModuleNotFoundError: No module named 'chemgfn.data.ChemGFN_datamodule')
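The failure is an import error, not an assertion failure. This suggests the suite could not import the datamodule, so the dataset changes in this PR were likely not exercised by it. One quick way to check whether the module path resolves in the current environment is `importlib.util.find_spec`. The helper name `module_available` is hypothetical. Note that `find_spec` does not execute the target module, but it does import its parent packages.

```python
import importlib.util


def module_available(name: str) -> bool:
    """Return True if the dotted module `name` can be located on the import path."""
    try:
        return importlib.util.find_spec(name) is not None
    except ModuleNotFoundError:
        # find_spec imports parent packages first; a missing parent raises here.
        return False
```

For example, `module_available("chemgfn.data.ChemGFN_datamodule")` returning `False` would point to an environment or naming problem, such as the package not being installed or the module living under a different name, rather than to the dataloader changes themselves.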

https://chatgpt.com/codex/tasks/task_e_68c2de9672d483288c6490407cf805ac

chatgpt-codex-connector

Codex Review: Here are some suggestions.

Reply with @codex fix comments to fix any unresolved comments.

About Codex in GitHub

Your team has set up Codex to review pull requests in this repo. Reviews are triggered when you open a pull request for review, mark a draft as ready, or comment "@codex review". If Codex has suggestions, it will comment; otherwise it will react with 👍.

Codex can also answer questions or update the PR. Try commenting "@codex fix this CI failure" or "@codex address that feedback".

chatgpt-codex-connector · 2025-09-11T15:54:54Z

    def __len__(self):
        return len(self.prompts)


[P0] Buffer dataset length uses undefined attribute

BufferDataPipe.__len__ now returns len(self.prompts) even though __init__ no longer assigns self.prompts after replacing it with pre-tokenized self.encoded_prompts. Any call to len(dataset) (for example when initializing DataLoader) will raise AttributeError: 'BufferDataPipe' object has no attribute 'prompts', breaking training and evaluation regardless of inputs. Consider basing the length on self.encoded_prompts or self.total_size.
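A minimal sketch of the suggested fix, assuming the pre-tokenized prompts are stored as one entry per prompt in `self.encoded_prompts`. The constructor signature and the `tokenize` callable are hypothetical, and the real class's base class and other fields (including `self.total_size`) are omitted so the snippet runs standalone.

```python
from typing import Callable, List, Sequence


class BufferDataPipe:
    """Illustrative sketch only; the real constructor and fields are not shown in this PR."""

    def __init__(self, prompts: Sequence[str], tokenize: Callable[[str], List[int]]):
        # Tokenize once up front so __getitem__ only indexes into cached lists.
        self.encoded_prompts: List[List[int]] = [tokenize(p) for p in prompts]

    def __len__(self) -> int:
        # Derive the length from the attribute that __init__ actually sets.
        return len(self.encoded_prompts)

    def __getitem__(self, idx: int) -> List[int]:
        return self.encoded_prompts[idx]
```

With a toy tokenizer such as `lambda s: [ord(c) for c in s]`, `len(BufferDataPipe(["CCO", "c1ccccc1"], tok))` evaluates to `2` instead of raising `AttributeError`.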

Useful? React with 👍 / 👎.

chatgpt-codex-connector · 2025-09-11T15:54:54Z

    def __len__(self):
        return len(self.prompts)


[P0] Molecule dataset length uses undefined attribute

MolOptDataPipe.__init__ also drops the self.prompts assignment when pre-tokenizing, but __len__ still returns len(self.prompts). Instantiating the dataset and querying its length will immediately raise AttributeError, so neither training nor validation loaders can function. The length should be derived from self.encoded_prompts/self.molecules instead.
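A similar sketch for `MolOptDataPipe`, assuming that prompts and molecules are paired one to one. The review's suggestion that the length can come from either `self.encoded_prompts` or `self.molecules` implies this pairing, but the real constructor is not shown. The constructor signature and `tokenize` callable are hypothetical. Checking the lengths once in `__init__` makes it safe to derive `__len__` from either list.

```python
from typing import Callable, List, Sequence, Tuple


class MolOptDataPipe:
    """Illustrative sketch only; assumes prompts[i] pairs with molecules[i]."""

    def __init__(
        self,
        prompts: Sequence[str],
        molecules: Sequence[str],
        tokenize: Callable[[str], List[str]],
    ):
        if len(prompts) != len(molecules):
            raise ValueError("prompts and molecules must have the same length")
        # Pre-tokenize once; self.prompts is intentionally not kept.
        self.encoded_prompts: List[List[str]] = [tokenize(p) for p in prompts]
        self.molecules: List[str] = list(molecules)

    def __len__(self) -> int:
        # Both lists have the same length (checked above), so either works.
        return len(self.encoded_prompts)

    def __getitem__(self, idx: int) -> Tuple[List[str], str]:
        return self.encoded_prompts[idx], self.molecules[idx]
```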


Pre-tokenize prompts and tune dataloaders for better GPU utilization

66fa594

ComDec added the codex label Sep 11, 2025 — with ChatGPT Codex Connector

chatgpt-codex-connector Bot reviewed Sep 11, 2025


ComDec added 2 commits September 11, 2025 12:46

Improve model and training efficiency

d2e7051

Optimize gfn utilities for faster generation

a457e28

ComDec wants to merge 3 commits into dev from codex/improve-code-execution-efficiency

1 participant