
Vocabulary Pruning Engine for Multilingual GLiNER Models #366

Open
ALI-AL-MARJANI wants to merge 11 commits into urchade:main from ALI-AL-MARJANI:feature/vocab-pruning-engine


@ALI-AL-MARJANI


Problem

Multilingual GLiNER models built on mDeBERTa-v3 carry an embedding matrix with one row per token of a ~250k-token vocabulary.
For single-language deployments, more than 60% of these embedding rows are never accessed.
These unused rows inflate memory use and cold-start time, which is a bottleneck for edge and CPU deployments.

Solution

scripts/prune_gliner_vocab.py tokenises a target-language corpus, identifies the set of token IDs
that actually occur in it (the active tokens), keeps only those rows of word_embeddings.weight,
rebuilds tokenizer.json to match the reduced vocabulary, and saves a fully self-contained pruned
model that loads with GLiNER.from_pretrained().
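The core of these steps can be sketched in plain Python. This is a minimal, hypothetical sketch, not the code in scripts/prune_gliner_vocab.py: it assumes the corpus has already been tokenised into lists of token IDs, that special tokens must always survive pruning, and that tokenizer.json holds a SentencePiece Unigram "model" section (a "vocab" list of [piece, score] pairs plus an "unk_id"), which is the usual layout for SentencePiece-based tokenizers such as mDeBERTa-v3's. All function names are illustrative.

```python
from typing import Dict, Iterable, List, Sequence


def collect_active_ids(
    tokenized_corpus: Iterable[Sequence[int]],
    special_ids: Iterable[int],
) -> List[int]:
    """Return the sorted token IDs seen in the corpus, plus the special tokens."""
    active = set(special_ids)  # assumption: special tokens are always kept
    for ids in tokenized_corpus:
        active.update(ids)
    return sorted(active)


def build_id_map(kept_ids: Sequence[int]) -> Dict[int, int]:
    """Map each surviving old token ID to its new, compact ID."""
    return {old: new for new, old in enumerate(kept_ids)}


def slice_rows(matrix: Sequence[Sequence[float]], kept_ids: Sequence[int]) -> List[List[float]]:
    """Keep only the embedding rows of surviving tokens, in new-ID order."""
    return [list(matrix[old]) for old in kept_ids]


def prune_unigram_vocab(model_section: dict, kept_ids: Sequence[int]) -> dict:
    """Rewrite a Unigram 'model' section of tokenizer.json for the kept IDs.

    The unk token must be among kept_ids, otherwise the lookup below fails.
    """
    id_map = build_id_map(kept_ids)
    pruned = dict(model_section)
    pruned["vocab"] = [model_section["vocab"][old] for old in kept_ids]
    if model_section.get("unk_id") is not None:
        pruned["unk_id"] = id_map[model_section["unk_id"]]
    return pruned
```

In a real model the row slice would be done on the tensor directly (for example by indexing word_embeddings.weight with the kept IDs), and any added_tokens entries in tokenizer.json would also need their id fields remapped; both are left out of this sketch.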

Benchmark: urchade/gliner_multi-v2.1 (English Wikipedia, 100k articles)

Metric                 Original      Pruned       Change
Vocabulary (tokens)    250,105       90,840       −63.7%
Model size             1,155.8 MB    666.5 MB     −42.3% (−489 MB)
Entity correctness                   6/6 PASS ✓   Lossless
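The size drop can be cross-checked against the embedding slice alone. The sketch below is an arithmetic check under assumptions the PR does not state: fp32 weights, a hidden size of 768 (mDeBERTa-v3-base), and 1 MB = 10^6 bytes.

```python
# Sanity check: does removing embedding rows alone explain the size drop?
# Assumptions (not stated in the PR): fp32 weights, hidden size 768
# (mDeBERTa-v3-base), and 1 MB = 10**6 bytes.
HIDDEN_SIZE = 768
BYTES_PER_PARAM = 4


def embedding_bytes(vocab_size: int, hidden: int = HIDDEN_SIZE,
                    bytes_per_param: int = BYTES_PER_PARAM) -> int:
    """Size in bytes of a vocab_size x hidden embedding matrix."""
    return vocab_size * hidden * bytes_per_param


removed_mb = (embedding_bytes(250_105) - embedding_bytes(90_840)) / 1e6
vocab_cut_pct = 100 * (1 - 90_840 / 250_105)
size_cut_pct = 100 * (1 - 666.5 / 1155.8)

print(f"embedding rows removed: {250_105 - 90_840:,}")
print(f"bytes saved by slicing: {removed_mb:.1f} MB (reported: {1155.8 - 666.5:.1f} MB)")
print(f"vocab reduction: {vocab_cut_pct:.1f}%, size reduction: {size_cut_pct:.1f}%")
```

Under these assumptions the 159,265 removed rows account for about 489.3 MB, in line with the reported −489 MB, which suggests the saving comes almost entirely from the embedding slice.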

Files changed

  • scripts/prune_gliner_vocab.py: pruning engine (new)
  • scripts/validate_pruned_model.py: 3-tier correctness validator (new)
  • docs/vocab_pruning.md: documentation with benchmarks (new)
  • docs/index.md: adds vocab_pruning to the toctree
  • gliner/modeling/encoder.py: bugfix for the token_lengths kwarg leaking into the forward
    pass of the BiEncoder labels encoder
  • gliner/model.py: bugfix replacing the bare import onnxruntime with a try/except import,
    for ARM compatibility
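The encoder.py fix is only described in one line. As an illustration of this class of bug (not the actual diff), the sketch below uses a hypothetical ToyLabelsEncoder whose forward() accepts only input_ids and attention_mask: forwarding token_lengths to it unchanged raises TypeError, while filtering the kwarg out first avoids that. The names encode_labels, strip_unsupported_kwargs, and ToyLabelsEncoder are hypothetical.

```python
from typing import Any, Dict, Iterable


def strip_unsupported_kwargs(kwargs: Dict[str, Any],
                             unsupported: Iterable[str] = ("token_lengths",)) -> Dict[str, Any]:
    """Return a copy of kwargs without keys the target forward() cannot accept."""
    blocked = set(unsupported)
    return {k: v for k, v in kwargs.items() if k not in blocked}


class ToyLabelsEncoder:
    """Hypothetical stand-in for a labels encoder with a fixed forward signature."""

    def forward(self, input_ids, attention_mask):
        return len(input_ids)


def encode_labels(encoder, input_ids, attention_mask, **kwargs):
    # token_lengths may arrive from a shared call path, but this encoder's
    # forward() has no such parameter, so it is dropped before the call.
    return encoder.forward(input_ids=input_ids, attention_mask=attention_mask,
                           **strip_unsupported_kwargs(kwargs))
```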

Usage

python scripts/prune_gliner_vocab.py \
    --model_id urchade/gliner_multi-v2.1 \
    --dataset_for_vocab wikipedia \
    --output_dir ./pruned_en \
    --lang en

Ali322O added 11 commits May 18, 2026 15:47
  - scripts/prune_gliner_vocab.py: prune multilingual GLiNER vocab to target language
  - scripts/validate_pruned_model.py: 3-tier validator (PASS/SCORE_DRIFT/ENTITY_FAIL) with a --score_tol flag
  - docs/vocab_pruning.md: full documentation page with benchmark results
  - docs/index.md: add vocab_pruning to toctree
  - gliner/modeling/encoder.py: fix token_lengths kwarg leak in BiEncoder.encode_labels
  - gliner/model.py: wrap onnxruntime import in try/except for ARM compatibility

  Benchmarked on urchade/gliner_multi-v2.1 (mDeBERTa-v3, 250k vocab):
    English Wikipedia corpus -> 250,105 -> 90,840 tokens (63.7% reduction)
    Model size: 1155.8 -> 666.5 MB (42.3% smaller); entity correctness: ALL PASS
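The commit message names three validator outcomes and a --score_tol flag but not the comparison rule. A minimal sketch of how such tiers could be assigned, assuming each prediction is a (start, end, label, score) record: differing span/label sets give ENTITY_FAIL, identical sets whose scores differ by more than the tolerance give SCORE_DRIFT, and everything else is PASS. The Entity record, the classify function, and the 1e-3 default tolerance are hypothetical, not taken from scripts/validate_pruned_model.py.

```python
from dataclasses import dataclass
from typing import List, Tuple


@dataclass(frozen=True)
class Entity:
    start: int
    end: int
    label: str
    score: float


def _key(e: Entity) -> Tuple[int, int, str]:
    """Identity of a prediction, ignoring its confidence score."""
    return (e.start, e.end, e.label)


def classify(original: List[Entity], pruned: List[Entity], score_tol: float = 1e-3) -> str:
    """Compare predictions of the original and pruned models on one input."""
    orig_sorted = sorted(original, key=_key)
    pruned_sorted = sorted(pruned, key=_key)
    # Tier 3: the pruned model found different spans or labels.
    if [_key(e) for e in orig_sorted] != [_key(e) for e in pruned_sorted]:
        return "ENTITY_FAIL"
    # Tier 2: same entities, but some confidence moved by more than score_tol.
    max_drift = max(
        (abs(a.score - b.score) for a, b in zip(orig_sorted, pruned_sorted)),
        default=0.0,
    )
    return "SCORE_DRIFT" if max_drift > score_tol else "PASS"
```

Keeping span identity and score drift as separate tiers lets a small numeric difference be reported without being counted as a correctness failure.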