Vocabulary Pruning Engine for Multilingual GLiNER Models by ALI-AL-MARJANI · Pull Request #366 · urchade/GLiNER

ALI-AL-MARJANI · 2026-06-03T14:20:46Z

Problem

Multilingual GLiNER models built on mDeBERTa-v3 carry an embedding matrix of roughly 250k tokens.
In a single-language deployment most of those rows are never looked up (63.7% for English in the benchmark below).
The unused rows add memory footprint and cold-start load time on edge/CPU deployments without contributing to predictions.

Solution

scripts/prune_gliner_vocab.py tokenises a target-language corpus, collects the
vocabulary entries that actually occur in it (the active tokens), slices
word_embeddings.weight down to those rows, rebuilds tokenizer.json to match, and
saves a fully self-contained pruned model loadable via GLiNER.from_pretrained().
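
The core of the approach can be illustrated without any ML dependencies. The following is a minimal sketch, not the code in scripts/prune_gliner_vocab.py: `prune_vocab`, `remap`, and the `always_keep` parameter are hypothetical names, the embedding matrix is a plain list of rows, and how the real script handles special tokens is an assumption here.

```python
from typing import Dict, List, Sequence, Tuple


def prune_vocab(
    embeddings: Sequence[Sequence[float]],
    corpus_token_ids: Sequence[Sequence[int]],
    always_keep: Sequence[int] = (),
) -> Tuple[List[List[float]], Dict[int, int]]:
    """Keep only the embedding rows whose token id occurs in the corpus.

    `always_keep` is a hypothetical hook for ids that must survive even if
    the corpus never uses them (for example special tokens).
    """
    active = set(always_keep)
    for ids in corpus_token_ids:
        active.update(ids)

    # Sort so surviving tokens keep their original relative order.
    kept = sorted(i for i in active if 0 <= i < len(embeddings))
    old_to_new = {old: new for new, old in enumerate(kept)}
    pruned = [list(embeddings[old]) for old in kept]
    return pruned, old_to_new


def remap(ids: Sequence[int], old_to_new: Dict[int, int]) -> List[int]:
    """Translate original token ids into ids of the pruned vocabulary."""
    return [old_to_new[i] for i in ids]


# Toy data: 8 tokens with 2-dimensional embeddings.
toy_embeddings = [[float(i), float(i) + 0.5] for i in range(8)]
toy_corpus = [[0, 3, 5], [5, 7, 3]]

pruned_rows, id_map = prune_vocab(toy_embeddings, toy_corpus, always_keep=(0, 1))
remapped = remap([3, 5, 7], id_map)
```

The id map is the piece that ties the sliced matrix to the rebuilt tokenizer: every surviving token must point at its new row, otherwise lookups silently return the wrong embedding.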

Benchmark — `urchade/gliner_multi-v2.1` (English Wikipedia, 100k articles)

| Metric | Original | Pruned | Change |
|---|---|---|---|
| Vocabulary | 250,105 | 90,840 | −63.7% |
| Model size | 1,155.8 MB | 666.5 MB | −42.3% (−489 MB) |
| Entity correctness | — | 6/6 PASS ✓ | Lossless |
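
As a sanity check, the percentages in the table can be recomputed from the raw numbers. The sketch below also estimates how much of the size reduction the removed embedding rows would explain. That estimate rests on three assumptions not stated in this PR: a hidden size of 768, fp32 weights (4 bytes per parameter), and "MB" meaning 10^6 bytes.

```python
# Figures taken from the benchmark table.
ORIG_VOCAB, PRUNED_VOCAB = 250_105, 90_840
ORIG_MB, PRUNED_MB = 1155.8, 666.5

# Assumptions (not stated in the PR): hidden size and fp32 storage.
HIDDEN_SIZE = 768
BYTES_PER_PARAM = 4


def pct_reduction(before: float, after: float) -> float:
    """Relative reduction from `before` to `after`, in percent."""
    return 100.0 * (before - after) / before


removed_rows = ORIG_VOCAB - PRUNED_VOCAB
vocab_cut = pct_reduction(ORIG_VOCAB, PRUNED_VOCAB)
size_cut = pct_reduction(ORIG_MB, PRUNED_MB)

# Size of the removed embedding rows under the assumptions above.
expected_saving_mb = removed_rows * HIDDEN_SIZE * BYTES_PER_PARAM / 1e6
reported_saving_mb = ORIG_MB - PRUNED_MB

print(f"vocabulary reduction: {vocab_cut:.1f}%")
print(f"model size reduction: {size_cut:.1f}%")
print(f"removed rows account for ~{expected_saving_mb:.1f} MB "
      f"of the reported {reported_saving_mb:.1f} MB")
```

Under those assumptions the removed rows account for essentially the whole reported saving, which is what one would expect if the embedding slice is the only tensor that changes size.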

Files changed

- scripts/prune_gliner_vocab.py: pruning engine (new)
- scripts/validate_pruned_model.py: 3-tier correctness validator (new)
- docs/vocab_pruning.md: documentation with benchmarks (new)
- docs/index.md: added vocab_pruning to the toctree
- gliner/modeling/encoder.py: bugfix, the token_lengths kwarg leaked into the BiEncoder labels encoder forward pass
- gliner/model.py: bugfix, the bare `import onnxruntime` is now wrapped in try/except
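
The onnxruntime fix follows the usual optional-dependency pattern. Here is a minimal sketch of that pattern, not the actual change in gliner/model.py; `optional_import` and `require_onnxruntime` are hypothetical helper names.

```python
import importlib
from types import ModuleType
from typing import Optional


def optional_import(name: str) -> Optional[ModuleType]:
    """Return the imported module, or None if it cannot be imported."""
    try:
        return importlib.import_module(name)
    except ImportError:
        return None


# Resolved once at import time; None when onnxruntime is unavailable.
ort = optional_import("onnxruntime")


def require_onnxruntime() -> ModuleType:
    """Fail only when the ONNX code path is actually requested."""
    if ort is None:
        raise ImportError(
            "onnxruntime is required for ONNX inference but could not be imported"
        )
    return ort
```

Deferring the error to the point of use keeps the rest of the package importable on platforms where onnxruntime cannot be installed.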

Usage

python scripts/prune_gliner_vocab.py \
    --model_id urchade/gliner_multi-v2.1 \
    --dataset_for_vocab wikipedia \
    --output_dir ./pruned_en \
    --lang en
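
Once the script has written ./pruned_en, the result should load like any other GLiNER checkpoint. A minimal sketch, assuming the gliner package is installed and the output directory from the command above exists; `load_and_predict` is a hypothetical helper and the example text and labels are illustrative only.

```python
from typing import Dict, List


def load_and_predict(
    model_dir: str,
    text: str,
    labels: List[str],
    threshold: float = 0.5,
) -> List[Dict]:
    """Load a (pruned) GLiNER model from disk and run entity prediction."""
    # Imported lazily so this module can be imported without gliner installed.
    from gliner import GLiNER

    model = GLiNER.from_pretrained(model_dir)
    return model.predict_entities(text, labels, threshold=threshold)


if __name__ == "__main__":
    entities = load_and_predict(
        "./pruned_en",
        "Ada Lovelace worked with Charles Babbage in London.",
        ["person", "location"],
    )
    for ent in entities:
        print(ent["text"], ent["label"], round(ent["score"], 3))
```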

feat: vocabulary pruning engine with lossless English prune + 3-tier validator

- scripts/prune_gliner_vocab.py: prune multilingual GLiNER vocab to target language
- scripts/validate_pruned_model.py: 3-tier validator (PASS/SCORE_DRIFT/ENTITY_FAIL), --score_tol flag
- docs/vocab_pruning.md: full documentation page with benchmark results
- docs/index.md: add vocab_pruning to toctree
- gliner/modeling/encoder.py: fix token_lengths kwarg leak in BiEncoder.encode_labels
- gliner/model.py: wrap onnxruntime import in try/except for ARM compatibility

Benchmarked on urchade/gliner_multi-v2.1 (mDeBERTa-v3, 250k vocab):
English Wikipedia corpus -> 250,105 -> 90,840 tokens (63.7% reduction)
Model size: 1155.8 -> 666.5 MB (42.3% smaller), ALL PASS entity correctness
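
The three validator outcomes suggest a comparison along the following lines. This is a sketch under assumptions rather than the logic of scripts/validate_pruned_model.py: it assumes ENTITY_FAIL means the two models disagree on spans or labels, SCORE_DRIFT means they agree on entities but some score differs by more than the tolerance, and PASS covers everything else. `compare_predictions` is a hypothetical name and the default tolerance is illustrative.

```python
from typing import Dict, List, Tuple

PASS, SCORE_DRIFT, ENTITY_FAIL = "PASS", "SCORE_DRIFT", "ENTITY_FAIL"


def _key(entity: Dict) -> Tuple[int, int, str]:
    """Identify an entity by its character span and label."""
    return (entity["start"], entity["end"], entity["label"])


def compare_predictions(
    original: List[Dict],
    pruned: List[Dict],
    score_tol: float = 1e-4,  # illustrative default, not taken from the PR
) -> str:
    """Classify how the pruned model's output differs from the original's."""
    orig_scores = {_key(e): e["score"] for e in original}
    pruned_scores = {_key(e): e["score"] for e in pruned}

    # Tier 3: the set of predicted entities itself changed.
    if orig_scores.keys() != pruned_scores.keys():
        return ENTITY_FAIL

    # Tier 2: same entities, but confidence moved beyond the tolerance.
    drift = max(
        (abs(orig_scores[k] - pruned_scores[k]) for k in orig_scores),
        default=0.0,
    )
    return SCORE_DRIFT if drift > score_tol else PASS


base = [{"start": 0, "end": 12, "label": "person", "score": 0.91}]
same = [{"start": 0, "end": 12, "label": "person", "score": 0.91}]
drifted = [{"start": 0, "end": 12, "label": "person", "score": 0.86}]
relabelled = [{"start": 0, "end": 12, "label": "location", "score": 0.91}]

results = [
    compare_predictions(base, same),
    compare_predictions(base, drifted),
    compare_predictions(base, relabelled),
]
```

Separating score drift from entity mismatch lets a caller loosen the numeric tolerance without masking a change in the predicted entities.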


Ali322O added 11 commits May 18, 2026 15:47

- 787edb1 WIP 1
- 3412f49 WIP3
- c38107f WIP3
- 87c9f11 feat: vocabulary pruning engine with lossless English prune + 3-tier validator
- e0e0ba0 docs: update README with vocabulary pruning results and structure
- 6f9e90a feat: FlashDeBERTa integration via use_flash_attention config + flash_attention
- 7597602 feat: entity type description conditioning
- ca3e14f sliding-window long-document inference
- 42ac490 feat: hard negative sampling for NER training
- 7ae3b29 feat: joint NER+RE training script

ALI-AL-MARJANI wants to merge 11 commits into urchade:main from ALI-AL-MARJANI:feature/vocab-pruning-engine
