ChemGFN is a minimal, camera-ready codebase for training GFlowNets with LLMs on two tasks:
- SMILES optimization with grammar-constrained generation
- VarExpr24 arithmetic generation (variable-length expressions)

Features:
- Hydra-based configuration for training and evaluation
- Grammar-constrained sampling for SMILES
- Reproducible evaluation scripts for paper configs

Repository layout:
- chemgfn/: core models, data modules, and utilities
- configs/: Hydra configs for data, models, experiments, and trainers
- scripts/: batch evaluation helpers
- tests/: unit and integration tests
- data/: expected data locations (user provided)

Requirements:
- Python 3.10
- PyTorch 2.0+
- CUDA (optional, for GPU training and evaluation)
Conda (recommended):
conda env create -f environment.yaml
conda activate chemgfn
pip install -e .

Pip:
pip install -r requirements.txt
pip install -e .

Default configs expect the following files:
- SMILES: data/SMILES/sidechain_prompts_sa.json
- VarExpr24: data/24_points/prompts.txt
- VarExpr24 buffer: data/24_points/buffer_24_non_zero.pt
If your data lives elsewhere, update the paths under configs/data/.
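
Before launching a run, it can help to confirm that the default data files are in place. The following is a minimal sketch, not part of the repository: the helper name `find_missing` and the `EXPECTED_FILES` table are hypothetical, and the paths are simply the defaults listed above, resolved relative to the repository root.

```python
from pathlib import Path

# Default data locations listed above, relative to the repository root.
# If you changed the paths under configs/data/, edit this table to match.
EXPECTED_FILES = {
    "SMILES": "data/SMILES/sidechain_prompts_sa.json",
    "VarExpr24": "data/24_points/prompts.txt",
    "VarExpr24 buffer": "data/24_points/buffer_24_non_zero.pt",
}


def find_missing(root="."):
    """Return {label: relative_path} for every expected file not found under root."""
    root = Path(root)
    return {
        label: rel
        for label, rel in EXPECTED_FILES.items()
        if not (root / rel).is_file()
    }


if __name__ == "__main__":
    missing = find_missing()
    for label, rel in missing.items():
        print(f"missing {label}: {rel}")
    if not missing:
        print("all default data files found")
```

Run it from the repository root; any file it reports as missing must either be placed at that location or have its path updated in the data configs.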
SMILES optimization (TB baseline):
python chemgfn/train.py experiment=SMILES_basic/SMILES_cfg_TB

VarExpr24 (TB baseline):
python chemgfn/train.py experiment=VarExpr24/VarExpr24_TB_no_data_buffer_hit

Common overrides:
python chemgfn/train.py \
experiment=SMILES_basic/SMILES_cfg_TB \
trainer.devices=1 \
trainer.max_steps=5000

Single run:
python chemgfn/eval.py \
experiment=SMILES_basic/SMILES_cfg_TB \
ckpt_path="/path/to/checkpoint.ckpt"

Batch evaluation (paper configs):
- scripts/run_eval_all.sh for SMILES tasks
- scripts/run_eval_expr24_all.sh for VarExpr24 tasks
Update the ckpt_path entries in those scripts to match your local checkpoints and adjust
the GPU list if needed.
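
For readers who prefer to drive batch evaluation from Python, the sketch below shows one way to pair experiments with checkpoints and spread them over a GPU list. It is an illustration only and does not reproduce the contents of the shell scripts: the `CHECKPOINTS` mapping, the `GPUS` list, and the `build_eval_jobs` helper are hypothetical, and the checkpoint paths are placeholders. It only assumes the single-run `chemgfn/eval.py` invocation shown earlier and the standard `CUDA_VISIBLE_DEVICES` environment variable.

```python
import shlex

# Hypothetical mapping from experiment name to a local checkpoint path.
# Replace the placeholder paths with your own checkpoints.
CHECKPOINTS = {
    "SMILES_basic/SMILES_cfg_TB": "/path/to/checkpoint.ckpt",
    "SMILES_basic/SMILES_cfg_subTB": "/path/to/checkpoint.ckpt",
}
GPUS = [0, 1]  # assumed GPU ids; adjust to your machine


def build_eval_jobs(checkpoints, gpus):
    """Return (gpu, command) pairs, assigning GPUs round-robin."""
    jobs = []
    for i, (experiment, ckpt) in enumerate(checkpoints.items()):
        gpu = gpus[i % len(gpus)]
        cmd = [
            "python",
            "chemgfn/eval.py",
            f"experiment={experiment}",
            f"ckpt_path={ckpt}",
        ]
        jobs.append((gpu, cmd))
    return jobs


# Print the commands instead of running them, so they can be reviewed first.
for gpu, cmd in build_eval_jobs(CHECKPOINTS, GPUS):
    print(f"CUDA_VISIBLE_DEVICES={gpu} {shlex.join(cmd)}")
```

To actually launch a job, each command list can be passed to `subprocess.run` with `CUDA_VISIBLE_DEVICES` set in the `env` argument.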
The following configs reproduce the reported results.
SMILES (baseline and ablations):
- configs/experiment/SMILES_basic/SMILES_cfg_TB.yaml
- configs/experiment/SMILES_basic/SMILES_cfg_no_TB.yaml
- configs/experiment/SMILES_basic/SMILES_cfg_subTB.yaml
- configs/experiment/SMILES_basic/SMILES_cfg_TB_wo_ref.yaml
- configs/experiment/SMILES_SubM/SMILES_cfg_TB_subM_replay_add_len_func.yaml
- configs/experiment/SMILES_SubM/SMILES_cfg_SubTB_subM_full.yaml
- configs/experiment/SMILES_RapTB/SMILES_cfg_RapTB_v2_kmin_5_to_2_mix_fix.yaml
- configs/experiment/SMILES_RapTB/SMILES_cfg_RapTB_v2_kmin_5_to_2_mix_fix_subM.yaml
- configs/experiment/SMILES_RapTB/SMILES_cfg_RapTB_v2_kmin_5_to_2_max_only.yaml
- configs/experiment/SMILES_RapTB/SMILES_cfg_RapTB_v2_kmin_5_to_2_soft_only.yaml
SMILES length-15:
- configs/experiment/SMILES_Length/SMILES_cfg_TB_len_15.yaml
- configs/experiment/SMILES_Length/SMILES_cfg_subTB_len_15.yaml
- configs/experiment/SMILES_Length/SMILES_cfg_RapTB_v2_kmin_12_to_8_mix_fix_len15.yaml
- configs/experiment/SMILES_Length/SMILES_cfg_RapTB_v2_kmin_12_to_8_mix_fix_len15_subM.yaml
VarExpr24:
- configs/experiment/VarExpr24/VarExpr24_TB_no_data_buffer_hit.yaml
- configs/experiment/VarExpr24/VarExpr24_SubTB_no_data_buffer_hit.yaml
- configs/experiment/VarExpr24/VarExpr24_RapTB_kmin_7_to_3_mix_wo_dbuff_hit_tune.yaml
- configs/experiment/VarExpr24/VarExpr24_TB_no_data_buffer_hit_subM_div_on_valid.yaml
- configs/experiment/VarExpr24/VarExpr24_SubTB_no_data_buffer_hit_subM_div_on_valid.yaml
- configs/experiment/VarExpr24/VarExpr24_RapTB_kmin_7_to_3_mix_wo_dbuff_hit_tune_subM_div_on_valid.yaml
- configs/experiment/VarExpr24/VarExpr24_TB_no_data_buffer_hit_oracle.yaml
- configs/experiment/VarExpr24/VarExpr24_SubTB_no_data_buffer_hit_oracle.yaml
- configs/experiment/VarExpr24/VarExpr24_RapTB_kmin_7_to_3_mix_wo_dbuff_hit_tune_oracle.yaml
- configs/experiment/VarExpr24/VarExpr24_RootSubTBLogZ_no_data_buffer_hit_dense.yaml
- configs/experiment/VarExpr24/VarExpr24_RootSubTBLogZ_no_data_buffer_hit_dense_oracle.yaml
- configs/experiment/VarExpr24/VarExpr24_TB_no_data_buffer_hit_PRT.yaml
- configs/experiment/VarExpr24/VarExpr24_SubTB_no_data_buffer_hit_PRT.yaml
- configs/experiment/VarExpr24/VarExpr24_RapTB_kmin_7_to_3_mix_wo_dbuff_hit_tune_PRT.yaml
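
The training commands earlier pass `experiment=SMILES_basic/SMILES_cfg_TB` for the file `configs/experiment/SMILES_basic/SMILES_cfg_TB.yaml`, which suggests that the `experiment=` value is the config path with the `configs/experiment/` prefix and the `.yaml` suffix removed. Assuming that convention holds for every config listed above, the conversion can be sketched as follows; `experiment_name` is a hypothetical helper, not a function in the repository.

```python
from pathlib import PurePosixPath

# Directory that holds the experiment configs listed above.
CONFIG_ROOT = PurePosixPath("configs/experiment")


def experiment_name(config_path):
    """Map a config file path to the value passed as experiment=... on the command line."""
    rel = PurePosixPath(config_path).relative_to(CONFIG_ROOT)
    return str(rel.with_suffix(""))  # drop the .yaml extension


print(experiment_name("configs/experiment/SMILES_basic/SMILES_cfg_TB.yaml"))
# SMILES_basic/SMILES_cfg_TB
```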

Run the test suite:
pytest tests -v

See tests/README_TESTS.md for more detail.