Official implementation of the paper
"GPart: End-to-End Isometric Fine-Tuning via Global Parameter Partitioning"
Authors: Paolo Mandica, Michał Brzozowski, Zuzanna Dubanowska, Neo Christopher Chung
Samsung AI Center, Warsaw, Poland
GPart is implemented following the standard interface of the 🤗 Hugging Face Parameter-Efficient Fine-Tuning (PEFT) library and is fully compatible with PEFT.
GPart is a parameter-efficient fine-tuning method that removes the low-rank bottleneck entirely.
Instead of factorizing updates as in LoRA-style approaches, GPart optimizes a single low-dimensional trainable vector.
This yields a fine-tuning pipeline with:
- End-to-end isometry in the trainable subspace.
- A single clean capacity hyperparameter: `d`.
- Minimal storage cost: the trainable vector plus one seed.
GPart is a parameter-efficient fine-tuning (PEFT) method introduced in the paper
“GPart: End-to-End Isometric Fine-Tuning via Global Parameter Partitioning.”
The method is built on a simple idea: instead of constraining updates through a low-rank matrix parameterization, optimize a low-dimensional vector.
The paper motivates this formulation by arguing that low-rank adapters distort geometry through bilinear reconstruction, while GPart preserves distances in the trainable subspace and offers a cleaner parameterization for PEFT.
Compared with low-rank PEFT methods, GPart is designed to be structurally simpler and more direct.
- No low-rank bottleneck: updates are not reconstructed through a bilinear factorization.
- End-to-end isometric mapping: the trainable subspace preserves Euclidean geometry.
- Minimal state: the adapter can be reconstructed from the trainable vector and a random seed.
- One main capacity knob: `d` controls the size of the trainable subspace.
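To picture how a trainable vector plus a seed could define an isometric update, consider the toy sketch below. It is an illustration built on assumptions, not the repository's implementation: the function name `expand_update`, the balanced shuffled partition, and the 1/sqrt(group size) scaling are hypothetical choices made here so that the map from the `d`-dimensional vector to the full update preserves the Euclidean norm.

```python
import math
import random


def expand_update(theta, num_params, seed):
    """Expand a d-dimensional vector into a num_params-dimensional update.

    Hypothetical illustration: each full parameter is assigned to exactly one
    of d groups by a seeded shuffle, and every entry of theta is shared across
    its group with a 1/sqrt(group_size) scale.
    """
    d = len(theta)
    if num_params < d:
        raise ValueError("need at least one parameter per group")

    # Seeded, balanced assignment: group sizes differ by at most one.
    order = list(range(num_params))
    random.Random(seed).shuffle(order)
    group_of = [0] * num_params
    for rank, param_index in enumerate(order):
        group_of[param_index] = rank % d

    sizes = [0] * d
    for g in group_of:
        sizes[g] += 1

    # Disjoint groups with unit-norm columns give an orthonormal map.
    return [theta[g] / math.sqrt(sizes[g]) for g in group_of]


def l2_norm(values):
    return math.sqrt(sum(v * v for v in values))


theta = [0.5, -1.0, 2.0]
delta = expand_update(theta, num_params=10, seed=42)

# The update has the same Euclidean norm as the trainable vector.
print(abs(l2_norm(delta) - l2_norm(theta)) < 1e-9)
# The same seed rebuilds the same update, so only theta and the seed are stored.
print(delta == expand_update(theta, num_params=10, seed=42))
```

In this sketch the stored state is just `theta` and `seed`; everything else is recomputed on demand, which is the property the list above describes.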
This repository contains the code used to evaluate GPart on:
- Natural language understanding with RoBERTa on GLUE.
- Computer vision with ViT on multiple image classification benchmarks.
- Mathematical reasoning with decoder-only LLMs fine-tuned on MetaMathQA and evaluated on GSM8K and MATH.
This repository uses uv for dependency and environment management.
uv sync
source .venv/bin/activate

# RoBERTa-base with GPart
python src/scripts/glue/finetune_roberta_glue.py --adapter_type gpart
# RoBERTa-large with GPart
python src/scripts/glue/finetune_roberta_glue.py --adapter_type gpart --model_size large
# Selected tasks with a fixed seed
python src/scripts/glue/finetune_roberta_glue.py \
--adapter_type gpart \
--tasks sst2 qnli \
--seed 123
# Parameter count only
python src/scripts/glue/finetune_roberta_glue.py \
--adapter_type gpart \
--compute_params_only

Command-Line Overrides
Override default hyperparameters directly from the command line. Arguments after the main flags are captured as key-value pairs:
python src/scripts/glue/finetune_roberta_glue.py \
--adapter_type gpart \
adapter.d 16384 \
adapter.isometric False \
training.lr 0.001 \
training.head_lr 0.002 \
training.batch_size 16

Aggregate Results
python src/scripts/glue/collect_results_glue.py logs/roberta_glue_gpart

This project uses a dataclass-based configuration system with no YAML files. All configs are Python dataclasses, which gives type safety, IDE autocomplete, and a single source of truth. Adding a new field to any config class automatically propagates everywhere without manual updates.
Values are resolved with the following precedence (later overrides earlier):
1. Dataclass defaults ← Python default values in the dataclass definition
2. Adapter-specific configs ← Pre-defined instances per adapter type (e.g., GPART_BASE_CONFIG)
3. Task-specific configs ← Per-adapter, per-task overrides (e.g., epochs=60 for SST2)
4. Model-size configs ← Large-model variants when --model_size large (e.g., GPART_LARGE_CONFIG)
5. CLI overrides ← Key-value pairs after the main flags
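The precedence order above, together with the key-value CLI overrides, can be sketched as follows. This is a minimal illustration under stated assumptions rather than the repository's resolver: `TrainingSketch`, `parse_cli_pairs`, `coerce`, and `resolve` are hypothetical names, and the layer values are made up for the example (only `epochs=60` echoes the SST2 example in the list).

```python
from dataclasses import dataclass, fields, replace


@dataclass
class TrainingSketch:
    # Hypothetical stand-in for the repository's TrainingConfig.
    lr: float = 1e-3
    batch_size: int = 32
    epochs: int = 10


def parse_cli_pairs(tokens):
    """Turn ['training.lr', '0.001', ...] into {'training': {'lr': '0.001'}}."""
    if len(tokens) % 2 != 0:
        raise ValueError("overrides must come in key-value pairs")
    parsed = {}
    for key, raw in zip(tokens[0::2], tokens[1::2]):
        section, _, name = key.partition(".")
        parsed.setdefault(section, {})[name] = raw
    return parsed


def coerce(config, name, raw):
    """Cast a raw CLI string to the type of the field's current value."""
    current = getattr(config, name)
    if isinstance(current, bool):
        return raw.lower() in ("true", "1", "yes")
    return type(current)(raw)


def resolve(defaults, *layers, cli=None):
    """Apply override layers in order; later layers win, CLI is last."""
    known = {f.name for f in fields(defaults)}
    config = defaults
    for layer in layers:
        config = replace(config, **{k: v for k, v in layer.items() if k in known})
    for name, raw in (cli or {}).items():
        config = replace(config, **{name: coerce(config, name, raw)})
    return config


adapter_layer = {"lr": 5e-3}   # adapter-specific defaults (made-up value)
task_layer = {"epochs": 60}    # per-task override
size_layer = {"lr": 1e-3}      # large-model variant (made-up value)
cli = parse_cli_pairs(["training.lr", "0.002", "training.batch_size", "16"])

final = resolve(TrainingSketch(), adapter_layer, task_layer, size_layer,
                cli=cli.get("training"))
print(final)
```

Here the CLI value for `lr` wins over both the adapter and model-size layers, while `epochs` keeps the task-level value because no later layer touches it.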
The central object is ExperimentConfig, which composes three sub-configs:
@dataclass
class ExperimentConfig:
    adapter: AdapterConfig    # Adapter hyperparameters (d, r, dropout, etc.)
    training: TrainingConfig  # Training hyperparameters (lr, batch_size, etc.)
    task_metadata: dict       # Dataset info, metrics, num_labels per task
    task_configs: dict        # Task-specific overrides per adapter

TrainingConfig controls the training loop:
| Field | Default | Description |
|---|---|---|
| batch_size | 32 | Training batch size |
| max_seq_length | 512 | Maximum sequence length |
| weight_decay | 0.1 | Weight decay for regularization |
| warmup_ratio | 0.06 | Fraction of steps for LR warmup |
| model_selection | "best" | Model selection strategy (see below) |
| lr | 1e-3 | Base learning rate (overridden by task configs) |
| head_lr | 1e-3 | Learning rate for classifier head |
AdapterConfig is the base class extended by each adapter type. Each subclass adds its own fields (e.g., d for GPart, r and alpha for LoRA). Fields are automatically included in logging and serialization — no manual listing needed.
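One common way to get this "no manual listing" behavior with dataclasses is to serialize through `dataclasses.asdict`, which walks every field, including those a subclass adds. The sketch below assumes that approach; `AdapterSketch`, `GPartSketch`, and `to_log_record` are hypothetical stand-ins, not the repository's classes.

```python
import json
from dataclasses import asdict, dataclass


@dataclass
class AdapterSketch:
    # Hypothetical base class standing in for AdapterConfig.
    type: str = "base"
    dropout: float = 0.0


@dataclass
class GPartSketch(AdapterSketch):
    # Hypothetical subclass adding the capacity field d.
    type: str = "gpart"
    d: int = 16384


def to_log_record(config):
    """Serialize every dataclass field, including ones added by subclasses."""
    return json.dumps(asdict(config), sort_keys=True)


print(to_log_record(GPartSketch()))
```

Adding another field to `GPartSketch` would make it show up in the record without touching `to_log_record`.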
TaskConfig provides per-task overrides that take precedence over training defaults:
| Field | Description |
|---|---|
| epochs | Number of training epochs for this task |
| lr | Task-specific base learning rate |
| head_lr | Task-specific head learning rate |
| batch_size | Task-specific batch size |
When you pass --model_size large, the system selects:
- Large adapter instance: e.g., `GPART_LARGE_CONFIG` instead of `GPART_BASE_CONFIG`
- Large task configs: e.g., `GPART_LARGE_TASK_CONFIGS` with different epochs/lrs (if defined)
1. Create a config file in `src/configs/adapter_configs/`:

# src/configs/adapter_configs/my_adapter.py
from dataclasses import dataclass, field
from typing import Dict, List

from configs.base_config import AdapterConfig, TaskConfig


@dataclass
class MyAdapterConfig(AdapterConfig):
    type: str = "my_adapter"
    my_param: int = 42


MY_ADAPTER_BASE_CONFIG = MyAdapterConfig()
MY_ADAPTER_LARGE_CONFIG = MyAdapterConfig(my_param=84)
MY_ADAPTER_TASK_CONFIGS: Dict[str, TaskConfig] = { ... }

2. Register it in `src/configs/adapter_configs/__init__.py`:

from .my_adapter import MyAdapterConfig, MY_ADAPTER_BASE_CONFIG, ...

ADAPTER_CONFIG_REGISTRY["my_adapter"] = {
    "config_class": MyAdapterConfig,
    "base": MY_ADAPTER_BASE_CONFIG,
    "large": MY_ADAPTER_LARGE_CONFIG,
    "task_configs": MY_ADAPTER_TASK_CONFIGS,
}

3. It's ready: `my_adapter` automatically appears in `--adapter_type` choices and `ALLOWED_ADAPTERS`.
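A registry can drive the CLI choices in roughly the following way. This is a sketch under assumptions: the registry contents below are placeholder dictionaries (the real entries hold config objects), and `build_parser` is a hypothetical helper, not a function from this repository.

```python
import argparse

# Hypothetical registry contents; the real entries hold config objects.
ADAPTER_CONFIG_REGISTRY = {
    "gpart": {"base": {"d": 16384}},
    "my_adapter": {"base": {"my_param": 42}},
}


def build_parser(registry):
    """Derive --adapter_type choices from whatever is registered."""
    parser = argparse.ArgumentParser()
    parser.add_argument("--adapter_type", choices=sorted(registry), required=True)
    return parser


args = build_parser(ADAPTER_CONFIG_REGISTRY).parse_args(
    ["--adapter_type", "my_adapter"]
)
print(args.adapter_type, ADAPTER_CONFIG_REGISTRY[args.adapter_type]["base"])
```

Because the choices are read from the registry keys, registering a new adapter is enough to make it selectable; an unregistered name is rejected by `argparse` before any training code runs.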
The system has two separate configuration layers:
| Layer | Class | Purpose |
|---|---|---|
| Experiment config | GPARTConfig(AdapterConfig) | What experiment to run (defaults, task overrides) |
| PEFT config | GPartConfig(PeftConfig) | How to construct the adapter model |
The get_peft_config() function in src/utils/adapter_utils.py bridges them — it renames fields (e.g., dropout → gpart_dropout), adds PEFT-specific fields, and constructs the GPartConfig object that get_peft_model() expects. This separation keeps the experiment system decoupled from PEFT library internals.
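The bridging step can be pictured as a small translation function. The sketch below is an assumption-laden illustration, not the code in `src/utils/adapter_utils.py`: `ExperimentAdapterSketch`, `FIELD_RENAMES`, and `to_peft_kwargs` are hypothetical, and the only rename taken from the text is `dropout` to `gpart_dropout`. It returns plain keyword arguments so it runs without PEFT installed.

```python
from dataclasses import asdict, dataclass

# Hypothetical rename table; the text only names dropout -> gpart_dropout.
FIELD_RENAMES = {"dropout": "gpart_dropout"}


@dataclass
class ExperimentAdapterSketch:
    # Stand-in for the experiment-side adapter config.
    type: str = "gpart"
    d: int = 16384
    dropout: float = 0.05


def to_peft_kwargs(experiment_config, target_modules, task_type):
    """Translate experiment fields into keyword arguments for a PEFT config."""
    kwargs = {}
    for name, value in asdict(experiment_config).items():
        if name == "type":
            continue  # selects the adapter class; not a PEFT field in this sketch
        kwargs[FIELD_RENAMES.get(name, name)] = value
    # PEFT-specific fields that the experiment config does not carry.
    kwargs["target_modules"] = list(target_modules)
    kwargs["task_type"] = task_type
    return kwargs


print(to_peft_kwargs(ExperimentAdapterSketch(), ["q_proj", "v_proj"], "CAUSAL_LM"))
```

Keeping the rename table in one place means the experiment-side field names can stay short while the PEFT-side names follow that library's conventions.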
Supported datasets:
- cifar10
- cifar100
- fgvc
- flowers102
- eurosat
- resisc45
- oxfordpets
- standfordcars
- dtd
All datasets are downloaded automatically by the finetuning script, except dtd, which must be manually downloaded from the DTD website.
After downloading, extract the archive into the data/ directory. The expected structure is:
data/
└── dtd/
    ├── images/
    ├── imdb/
    └── labels/
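Before launching a DTD run, it can help to confirm that the manual download was extracted into the layout shown above. The helper below is a hypothetical convenience check, not part of the repository; it only tests that the three expected subdirectories exist.

```python
from pathlib import Path

EXPECTED_DTD_SUBDIRS = ("images", "imdb", "labels")


def missing_dtd_dirs(data_root="data"):
    """Return the expected DTD subdirectories that are not present."""
    dtd_root = Path(data_root) / "dtd"
    return [name for name in EXPECTED_DTD_SUBDIRS if not (dtd_root / name).is_dir()]


missing = missing_dtd_dirs("data")
if missing:
    print("DTD is incomplete, missing:", ", ".join(missing))
else:
    print("DTD layout looks complete.")
```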
# ViT-Base on FGVC Aircraft
python src/scripts/vision/finetune_ViT.py --dataset fgvc --model_size base
# ViT-Large on CIFAR-100
python src/scripts/vision/finetune_ViT.py --dataset cifar100 --model_size large
# Custom optimization settings
python src/scripts/vision/finetune_ViT.py \
--dataset flowers102 \
--model_size base \
--head_lr 5e-3 \
--base_lr 6e-3 \
--num_train_epochs 30

python src/scripts/vision/collect_results_ViT.py

Supported base models include:
- google/gemma-7b
- Qwen/Qwen2.5-0.5B
- Qwen/Qwen2.5-3B
- Qwen/Qwen2.5-7B
- meta-llama/Llama-3.1-8B
# Qwen2.5-0.5B with GPart
python src/scripts/math/finetune_metamath.py \
--model_name Qwen/Qwen2.5-0.5B \
--adapter_type gpart \
--d 131072
# Qwen2.5-7B with GPart
python src/scripts/math/finetune_metamath.py \
--model_name Qwen/Qwen2.5-7B \
--adapter_type gpart \
--d 524288
# With custom training settings
python src/scripts/math/finetune_metamath.py \
--model_name Qwen/Qwen2.5-0.5B \
--adapter_type gpart \
--d 131072 \
--per_device_train_batch_size 2 \
--gradient_accumulation_steps 8 \
--learning_rate 2e-4

# Base model
python src/scripts/math/eval_math.py \
--model_path Qwen/Qwen2.5-0.5B \
--dataset gsm8k
# GPart fine-tuned model
python src/scripts/math/eval_math.py \
--model_path Qwen/Qwen2.5-0.5B \
--adapter_path logs/metamath_qwen-qwen2.5-0.5b_gpart_d131072_drop0.05_lr0.0002_bs4_ga4_ep2_seq2048_nosysprompt_seed42_131k \
--dataset math

To use GPart in your own repository, follow these steps:
Copy the peft folder from this repository into your project:
# From your project root
cp -r /path/to/GPart/peft .

If you're using uv for dependency management, you can configure it to use the local PEFT copy instead of downloading from PyPI. Add the following to your pyproject.toml:
# Add peft to the dependencies
dependencies = [
    "peft",
]

# Add the peft local path as source
[tool.uv.sources]
peft = { path = "peft", editable = true }

This tells uv to use the local peft package from the specified path.
Once the PEFT folder is in your project, you can use GPart just like any other PEFT adapter:
import torch
from transformers import AutoModelForCausalLM
from peft import TaskType, get_peft_model
from peft.tuners.gpart import GPartConfig

# Example values; replace with your own model and precision
model_name = "Qwen/Qwen2.5-0.5B"
torch_dtype = torch.bfloat16

# 1. Define the GPart adapter configuration
adapter_config = GPartConfig(
    d=131072,  # Capacity parameter (adjust for your use case)
    target_modules=["q_proj", "v_proj"],  # Modules to adapt
    task_type=TaskType.CAUSAL_LM,  # Task type (CAUSAL_LM, SEQ_CLS, etc.)
)

# 2. Load your base model
model = AutoModelForCausalLM.from_pretrained(
    model_name,
    trust_remote_code=True,
    torch_dtype=torch_dtype,
)

# 3. Wrap the model with GPart adapter
model = get_peft_model(model, adapter_config)

# 4. Train as usual with your preferred training loop
# The model now has GPart adapters injected and ready for training

For reproducible results:
- Run multiple seeds for each setting.
- Track the model checkpoint, task, and `d`.
- Preserve the random seed used for partition generation.
- Use the provided result collection scripts for final aggregation.
Because the GPart adapter is reconstructed from the trainable vector and the partition seed, the seed is part of the effective model state.
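A lightweight way to follow this checklist is to store one record per run and summarize across seeds afterwards. The sketch below is only an illustration: `run_record` and `summarize` are hypothetical helpers, the metric values are placeholders rather than results, and the repository's own collection scripts remain the reference for final aggregation.

```python
import json
import statistics


def run_record(checkpoint, task, d, partition_seed, metric):
    """Bundle everything needed to identify and rebuild one run."""
    return {
        "checkpoint": checkpoint,
        "task": task,
        "d": d,
        "partition_seed": partition_seed,
        "metric": metric,
    }


def summarize(records):
    """Mean and sample standard deviation of the metric across seeds."""
    values = [r["metric"] for r in records]
    spread = statistics.stdev(values) if len(values) > 1 else 0.0
    return {"runs": len(values), "mean": statistics.mean(values), "std": spread}


# Placeholder metric values, not results from the paper.
records = [
    run_record("roberta-base", "sst2", 16384, seed, metric)
    for seed, metric in [(1, 0.5), (2, 0.6), (3, 0.7)]
]
print(json.dumps(records[0], sort_keys=True))
print(summarize(records))
```

Storing `partition_seed` next to `d` and the checkpoint name keeps the record sufficient to rebuild the adapter from its trainable vector.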
We welcome contributions! This repository uses a fork-based workflow — fork the repo, create a branch, and submit a pull request.
Quick summary:
- Fork the repository
- Create a branch in your fork for each feature/experiment
- Format your code with Black before submitting
- Submit a Pull Request when you're ready to merge into `main`
- PR review required: at least one approval before merging
See CONTRIBUTING.md for the complete guide including setup instructions, branch naming conventions, and PR templates.
The main branch is protected:
- ✅ No direct pushes — all changes via pull requests only
- ✅ At least 1 approving review required
- ✅ Branch must be up to date before merging
- ✅ No force pushes allowed
If you use this repository in academic work, please cite:
@misc{mandica2026gpart,
  title={GPart: End-to-End Isometric Fine-Tuning via Global Parameter Partitioning},
  author={Paolo Mandica and Michał Brzozowski and Zuzanna Dubanowska and Neo Christopher Chung},
  year={2026},
  eprint={2605.14841},
  archivePrefix={arXiv},
  primaryClass={cs.LG},
  url={https://arxiv.org/abs/2605.14841},
}

This project is licensed under the Apache-2.0 License.

