By David Li*, Nikita Gushchin*, Dmitry Abulkhanov, Eric Moulines, Ivan Oseledets, Maxim Panov, Alexander Korotin
Diffusion Language Models can generate high-quality text, but their iterative reverse-diffusion sampling makes inference slow. IDLM speeds them up by distilling a pretrained many-step diffusion language model into a few-step generator.
Instead of simply matching every teacher step, IDLM uses an Inverse Distillation view for discrete token spaces. The paper reports 4×–64× fewer inference steps while preserving the teacher model’s generation quality.
IDLM/
├── configs/ # Hydra configs: data, model, algo, strategy, callbacks, etc.
│ ├── algo/ # ar, mdlm, duo, duo_base, d3pm, sedd
│ ├── data/ # OpenWebText and TinyGSM/GSM8K configs
│ │ ├── tiny-gsm.yaml # TinyGSM training/evaluation data config
│ │ └── gsm8k-test.yaml # GSM8K/TinyGSM benchmark evaluation config
│ ├── model/ # tiny / small / medium model configs
│ ├── noise/ # diffusion noise schedules
│ └── config.yaml # main experiment config
├── integral/ # precomputed tokenizer / integration assets, including SmolLM TinyGSM support
├── models/ # DiT backbone, EMA utilities, attention tests
├── scripts/ # training and generation recipes
├── algo.py # model families and IDLM distillation logic
├── dataloader.py # tokenizers, OpenWebText/TinyGSM datasets, dataloaders
├── main.py # Hydra + Lightning entry point
├── metrics.py # perplexity, entropy, BPD, NLL metrics
├── trainer_base.py # shared training / sampling base classes
├── utils.py # logging and helper utilities
├── requirements.txt # environment note / dependency list
└── LICENSE
git clone https://github.com/David-cripto/IDLM.git
cd IDLM
To get started, create a conda environment containing the required dependencies.
conda create -n idlm python=3.12
conda activate idlm
conda install nvidia/label/cuda-12.4.0::cuda-toolkit
pip install -r requirements.txt
pip install flash_attn==2.7.4.post1
- IDLM-MDLM. Trained on OpenWebText:
- IDLM-MDLM. Trained on TinyGSM:
- IDLM-Duo. Trained on OpenWebText:
- IDLM-Duo. Trained on TinyGSM:
- IDLM-DCD. Trained on OpenWebText:
This section provides reference training entry points for unconditional OpenWebText distillation and conditional TinyGSM distillation. The scripts are written as Hydra override recipes; update the dataset cache and checkpoint paths before launching a run.
The OpenWebText recipes train IDLM students for unconditional language generation. Before executing the scripts, configure the cache_dir parameter in configs/data/openwebtext-split.yaml to specify the desired output path.
bash scripts/train_idlm_mdlm.sh
bash scripts/train_idlm_duo.sh
bash scripts/train_idlm_dcd.sh
The TinyGSM recipes train IDLM students for conditional mathematical reasoning on TinyGSM-style question-answer examples. Before executing the scripts, configure cache_dir in configs/data/tiny-gsm.yaml and replace the training.finetune_path placeholder in each script with the corresponding pretrained teacher checkpoint. Our TinyGSM distillation runs use the teacher checkpoints from the S-FLM repository: MDLM for IDLM-MDLM and Duo for IDLM-Duo.
bash scripts/train_idlm_mdlm_tynigsm.sh
bash scripts/train_idlm_duo_tynigsm.sh
This section separates unconditional OpenWebText generation from conditional TinyGSM benchmark evaluation.
The generation scripts sweep over 4, 8, 16, and 32 sampling steps.
Before running them, set eval.generated_samples_path to a real JSON output path.
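A sweep of this kind can also be driven from Python. The sketch below is illustrative only: `build_sweep_commands` is a hypothetical helper that is not part of the repository, the overrides are copied from the IDLM-MDLM sampling command in this section, and the per-step output file name is an assumed naming pattern. It only builds the command lines; running them is left to the caller.

```python
import shlex

# Overrides copied from the IDLM-MDLM sampling command in this README.
BASE_OVERRIDES = [
    "mode=sample_eval",
    "loader.batch_size=2",
    "loader.eval_batch_size=8",
    "data=openwebtext-split",
    "algo=mdlm",
    "algo.backbone=hf_dit",
    "eval.checkpoint_path=kekchpek/idlm-mdlm",
    "sampling.num_sample_batches=10",
    "sampling.predictor=ancestral_cache",
    "sampling.noise_removal=ancestral",
    "+wandb.offline=true",
]


def build_sweep_commands(step_counts=(4, 8, 16, 32), out_dir="samples"):
    """Return one argv list per step count (hypothetical helper)."""
    commands = []
    for steps in step_counts:
        # Assumed naming pattern: one JSON file per step count.
        out_path = f"{out_dir}/idlm_mdlm_{steps}steps.json"
        commands.append(
            [
                "python", "-m", "main",
                *BASE_OVERRIDES,
                f"sampling.steps={steps}",
                f"eval.generated_samples_path={out_path}",
            ]
        )
    return commands


if __name__ == "__main__":
    # Print the commands instead of running them, so they can be reviewed first.
    for argv in build_sweep_commands():
        print(shlex.join(argv))
```

Each returned list can be passed to `subprocess.run(argv, check=True)` from the repository root once the output directory exists.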
mkdir -p samples
python -m main \
mode=sample_eval \
loader.batch_size=2 \
loader.eval_batch_size=8 \
data=openwebtext-split \
algo=mdlm \
algo.backbone=hf_dit \
eval.checkpoint_path=kekchpek/idlm-mdlm \
sampling.steps=16 \
sampling.num_sample_batches=10 \
sampling.predictor=ancestral_cache \
sampling.noise_removal=ancestral \
+wandb.offline=true \
eval.generated_samples_path=samples/idlm_mdlm_16steps.json
mkdir -p samples
python -m main \
mode=sample_eval \
loader.batch_size=2 \
loader.eval_batch_size=8 \
data=openwebtext-split \
algo=duo \
algo.backbone=hf_dit \
eval.checkpoint_path=kekchpek/idlm-duo \
sampling.steps=16 \
sampling.num_sample_batches=10 \
sampling.noise_removal=greedy \
+wandb.offline=true \
eval.generated_samples_path=samples/idlm_duo_16steps.json
mkdir -p samples
python -m main \
mode=sample_eval \
loader.batch_size=2 \
loader.eval_batch_size=8 \
data=openwebtext-split \
algo=duo \
algo.backbone=hf_dit \
eval.checkpoint_path=kekchpek/idlm-dcd \
sampling.steps=4 \
sampling.num_sample_batches=10 \
sampling.noise_removal=greedy \
+wandb.offline=true \
eval.generated_samples_path=samples/idlm_duo_4steps.json
bash scripts/generation_idlm_mdlm.sh
bash scripts/generation_idlm_duo.sh
bash scripts/generation_idlm_dcd.sh
Generated sample files contain:
{
"generative_ppl": 0.0,
"entropy": 0.0,
"generated_seqs": []
}
We release the TinyGSM IDLM checkpoints on Hugging Face: IDLM-MDLM TinyGSM and IDLM-Duo TinyGSM. To evaluate these conditional models on the TinyGSM benchmark, use the .ckpt files from the Hugging Face repositories together with the TinyGSM evaluation code from the S-FLM repository.
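For the OpenWebText sample files described above, a small reader can help when comparing runs. This is a minimal sketch, assuming the file holds exactly the three documented fields (`generative_ppl`, `entropy`, `generated_seqs`) and that `generated_seqs` is a list; `summarize_samples` is a hypothetical helper, not a function from the repository.

```python
import json
from pathlib import Path


def summarize_samples(path):
    """Load a generated-samples JSON file and return its headline numbers."""
    with Path(path).open(encoding="utf-8") as f:
        data = json.load(f)
    return {
        "generative_ppl": float(data["generative_ppl"]),
        "entropy": float(data["entropy"]),
        # Assumption: generated_seqs is a list with one entry per sample.
        "num_sequences": len(data["generated_seqs"]),
    }
```

For example, `summarize_samples("samples/idlm_mdlm_16steps.json")` would return the metrics for the 16-step IDLM-MDLM run once that file has been written.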
By default, Hydra writes experiment outputs under:
outputs/<dataset>/<date>/<time>/
TensorBoard logs are written under:
tb_logs/
Checkpoints are written according to the checkpointing config in configs/config.yaml.
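To locate the most recent run under this layout, something like the following can be used. It is a sketch under stated assumptions: `latest_run_dir` is a hypothetical helper, the directory depth follows the `outputs/<dataset>/<date>/<time>/` pattern above, and "most recent" is judged by directory modification time rather than by parsing the date and time names, whose exact format depends on the Hydra config.

```python
from pathlib import Path


def latest_run_dir(outputs_root="outputs"):
    """Return the most recently modified <dataset>/<date>/<time> directory, or None."""
    root = Path(outputs_root)
    # Run directories sit three levels below the root: <dataset>/<date>/<time>.
    runs = [p for p in root.glob("*/*/*") if p.is_dir()]
    if not runs:
        return None
    # Compare modification times instead of parsing the directory names.
    return max(runs, key=lambda p: p.stat().st_mtime)
```

Calling `latest_run_dir()` from the repository root returns a `pathlib.Path`, or `None` if no run has been written yet.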
If you find this repository useful, please cite:
@article{li2026idlm,
title={IDLM: Inverse-distilled Diffusion Language Models},
author={Li, David and Gushchin, Nikita and Abulkhanov, Dmitry and Moulines, Eric and Oseledets, Ivan and Panov, Maxim and Korotin, Alexander},
journal={arXiv preprint arXiv:2602.19066},
year={2026}
}
Our codebase is inspired by recent discrete diffusion model projects, namely MDLM, Duo, and S-FLM.
This project is released under the MIT License. See LICENSE for details.
