modelopt
Here are 5 public repositories matching this topic...
Guide to serving a fine-tuned Qwen3.5-27B (dense, NVFP4) on DGX Spark via native vLLM. Includes critical config fixes for modelopt's export_hf_checkpoint() that prevent silent FP32 dequantization.
Updated Apr 21, 2026 - Python
Lna-Lab production pipeline: GGUF -> modelopt-format NVFP4 + working MTP head for vLLM on RTX PRO 6000 Blackwell (SM120). Stages 2 (NVFP4) and 3 (MTP graft) are Lna-Lab originals; stage 1 (GGUF->bf16) reuses li-yifei/gguf-to-nvfp4.
Updated Apr 27, 2026 - Python
Updated May 1, 2026 - Python