PriFT: Prior-Support Guided Supervised Fine-Tuning

This is the official repository for "PriFT: Prior-Support Guided Supervised Fine-Tuning" by Ke Wang*, Shuangqi Li*, Mathieu Salzmann, and Pascal Frossard.

PriFT is a token-reweighted supervised fine-tuning framework that derives token weights from a frozen pretrained reference model instead of the online model being optimized. The pretrained reference gives a prior-support signal that is decoupled from the optimization trajectory.

This repository is based on DFT.

Methods

The training loss is selected through the loss.method config field:

| `loss.method` | Per-token weight | Source |
| --- | --- | --- |
| `sft` | `1` | n/a |
| `dft` | `sg(p_online(y_t))` | online model |
| `prift_prob` | `p_ref(y_t)` | frozen pretrained reference |
| `prift_mass` | `1[ u_t >= 0.5 ]` | frozen pretrained reference |

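where sg(·) denotes stop-gradient and, under the frozen reference distribution p_ref(· | x, y_<t),

u_t = sum_{v : p_ref(v) <= p_ref(y_t)} p_ref(v)

That is, u_t is the total reference probability of all tokens that are no more likely than y_t, so prift_mass keeps a token only when this mass is at least 0.5.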
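To make the weighting rules concrete, the sketch below evaluates them at a single position over a toy four-token vocabulary. It is plain Python for illustration, not the repository's implementation: the helper names `cumulative_mass` and `token_weight` and both toy distributions are assumptions, and the stop-gradient in `dft` appears only as a comment because plain floats carry no gradient.

```python
def cumulative_mass(p_ref, y_t):
    """u_t: total reference probability of tokens no more likely than y_t."""
    threshold = p_ref[y_t]
    return sum(p for p in p_ref if p <= threshold)


def token_weight(method, p_ref, p_online, y_t):
    """Per-token weight for one position (hypothetical helper mirroring the table)."""
    if method == "sft":
        return 1.0
    if method == "dft":
        # In training this value would be detached (stop-gradient) from the online model.
        return p_online[y_t]
    if method == "prift_prob":
        return p_ref[y_t]
    if method == "prift_mass":
        return 1.0 if cumulative_mass(p_ref, y_t) >= 0.5 else 0.0
    raise ValueError(f"unknown loss.method: {method}")


# Toy next-token distributions over a 4-token vocabulary (made-up numbers).
p_ref = [0.5, 0.25, 0.125, 0.125]
p_online = [0.25, 0.5, 0.125, 0.125]

for y_t in range(len(p_ref)):
    weights = {
        method: token_weight(method, p_ref, p_online, y_t)
        for method in ("sft", "dft", "prift_prob", "prift_mass")
    }
    print(y_t, cumulative_mass(p_ref, y_t), weights)
```

With these numbers, tokens 0 and 1 have u_t of 1.0 and 0.5 and are kept by `prift_mass`, while tokens 2 and 3 have u_t of 0.25 and receive weight 0.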

Configuration

New fields in verl/trainer/config/sft_trainer.yaml:

loss:
  method: dft                 # {sft, dft, prift_prob, prift_mass}
  reference_model_path: null  # defaults to model.partial_pretrain when null (the pretrained model in our paper)

(You do not need to set these fields manually if you use the scripts below.)
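The fallback described in the `reference_model_path` comment can be sketched as follows. This is a minimal illustration that assumes a plain nested dict stands in for the trainer's config object; the function name `resolve_reference_path` is hypothetical and is not taken from the codebase.

```python
def resolve_reference_path(config):
    """Return the reference model path, falling back to model.partial_pretrain."""
    ref = config.get("loss", {}).get("reference_model_path")
    if ref is None:
        # A null reference_model_path means: use the pretrained starting model.
        return config["model"]["partial_pretrain"]
    return ref


# Example config as a plain dict (a stand-in for the YAML above).
config = {
    "model": {"partial_pretrain": "Qwen/Qwen2.5-Math-7B"},
    "loss": {"method": "prift_prob", "reference_model_path": None},
}
print(resolve_reference_path(config))
```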

Environments

cd Path-to-Project
bash envs/install_prift_envs.sh

This creates the three pinned reproduction environments:

prift-train-qwen (training env, based on suggested setup from DFT repo)
prift-eval-qwen25 (eval env, based on suggested setup from Qwen2.5-math repo)
prift-eval-qwen3 (eval env, based on suggested setup from Qwen3 repo)

Training uses verl. Please use prift-eval-qwen25 for evaluating Qwen2.5 models and prift-eval-qwen3 for evaluating Qwen3 models.

Getting started

Step 1: Prepare data

conda activate prift-train-qwen
bash scripts/prepare_data.sh 100000

This generates:

verl/data/numina_cot/train.parquet
verl/data/math500/test.parquet

Step 2: Launch training

Please use the prift-train-qwen environment for training. The METHOD argument selects the token weighting rule:

sft: standard supervised fine-tuning
dft: online-model probability weighting
prift_prob: pretrained-reference probability weighting
prift_mass: pretrained-reference cumulative-mass thresholding

conda activate prift-train-qwen

NPROC=4
MICRO_BSZ=4
MODEL=Qwen/Qwen2.5-Math-7B
METHOD=prift_prob
REFERENCE_MODEL=Qwen/Qwen2.5-Math-7B

bash scripts/train.sh $MODEL $METHOD $NPROC $MICRO_BSZ $REFERENCE_MODEL

Checkpoints are written under verl/checkpoints/numina-cot-${METHOD}-$(basename ${MODEL}).
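For running all four methods in sequence, the positional arguments of `scripts/train.sh` and the checkpoint naming pattern above can be assembled programmatically. The sketch below only prints the commands and the expected checkpoint directories; it does not launch training. The helper names are hypothetical, and passing the reference model for `sft` and `dft` as well is an assumption about what the script accepts.

```python
import posixpath
import shlex


def build_train_command(model, method, nproc, micro_bsz, reference_model):
    """Argument list in the same positional order as the command shown above."""
    return ["bash", "scripts/train.sh", model, method,
            str(nproc), str(micro_bsz), reference_model]


def checkpoint_dir(model, method):
    """Mirror of verl/checkpoints/numina-cot-${METHOD}-$(basename ${MODEL})."""
    return f"verl/checkpoints/numina-cot-{method}-{posixpath.basename(model.rstrip('/'))}"


MODEL = "Qwen/Qwen2.5-Math-7B"
for method in ("sft", "dft", "prift_prob", "prift_mass"):
    cmd = build_train_command(MODEL, method, nproc=4, micro_bsz=4, reference_model=MODEL)
    print(shlex.join(cmd), "->", checkpoint_dir(MODEL, method))
```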

Step 3: Evaluation

Please use prift-eval-qwen25 for Qwen2.5 models and prift-eval-qwen3 for Qwen3-8B-Base. The evaluation script resolves the latest global_step_* checkpoint automatically and writes summary_metrics.json to the output directory.

conda activate prift-eval-qwen25

PROMPT_TYPE="qwen-boxed"
CUDA_VISIBLE_DEVICES=0
N_SAMPLING=16
TEMPERATURE=1
MODEL_NAME_OR_PATH=verl/checkpoints/numina-cot-prift_prob-Qwen2.5-Math-7B
OUTPUT_DIR=outputs/prift_prob-Qwen2.5-Math-7B

bash scripts/eval_math.sh \
  --model $MODEL_NAME_OR_PATH \
  --output-dir $OUTPUT_DIR \
  --prompt-type $PROMPT_TYPE \
  --n-sampling $N_SAMPLING \
  --temperature $TEMPERATURE \
  --cuda-visible-devices $CUDA_VISIBLE_DEVICES
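To check which checkpoint an evaluation run would pick up, the `global_step_*` lookup can be approximated as below. How `scripts/eval_math.sh` resolves the checkpoint internally is not shown here, so this sketch assumes that "latest" means the highest step number; the function name `latest_checkpoint` is hypothetical.

```python
import re
import tempfile
from pathlib import Path

_STEP_RE = re.compile(r"^global_step_(\d+)$")


def latest_checkpoint(run_dir):
    """Return the global_step_* subdirectory with the highest step, or None."""
    best_step, best_path = -1, None
    for child in Path(run_dir).iterdir():
        match = _STEP_RE.match(child.name)
        if child.is_dir() and match and int(match.group(1)) > best_step:
            best_step, best_path = int(match.group(1)), child
    return best_path


# Demo on a throwaway directory with three fake checkpoints.
with tempfile.TemporaryDirectory() as tmp:
    for step in (9, 50, 100):
        (Path(tmp) / f"global_step_{step}").mkdir()
    print(latest_checkpoint(tmp).name)  # global_step_100
```

Steps are compared as integers rather than strings, because a lexicographic sort would rank `global_step_9` above `global_step_100`.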

We also provide wrapper scripts to reproduce the PriFT runs used in the paper. By default these wrappers run prift_prob and prift_mass:

bash scripts/reproduce_qwen25_math_1p5b.sh 4
bash scripts/reproduce_qwen25_math_7b.sh 4
bash scripts/reproduce_qwen3_8b_base.sh 4

Acknowledgements

This codebase builds directly on DFT. We thank the authors for providing the repo.

Citation

@article{wang2026prift,
  title={PriFT: Prior-Support Guided Supervised Fine-Tuning},
  author={Wang, Ke and Li, Shuangqi and Salzmann, Mathieu and Frossard, Pascal},
  journal={arXiv preprint arXiv:2606.09396},
  year={2026}
}
