Ruixiang Wang (ruixiang63)

Passion “ψ(｀∇´)ψ

NVIDIA
https://ruixiang63.github.io/

Hi, I'm Ruixiang 👋

I am a Senior DevTech Engineer at NVIDIA.

🚀 Recent Open Source Contributions

llama.cpp

#23869 — Speed-bench: standardized speculative decoding performance evaluation benchmark
#18039 — Eagle3 speculative decoding: 1.2–3.28× speedup across many model families
#22105 — DFlash speculative decoding: up to 8× speedup on Qwen3 models

HuggingFace Transformers

#45665 — Performance fix: eliminated implicit H2D copies in Gated DeltaNet

Unsloth

This NVIDIA-Unsloth blog explains the following optimizations in detail.
#534 — Double-buffered checkpoint reload via CUDA streams and events: +8.4% fine-tuning speedup on 8B and +6.7% on 14B models
#4173 — Packed-sequence metadata caching: +14.3% fine-tuning speedup on Qwen3-14B QLoRA SFT
#535 — GPT-OSS MoE expert routing optimization: ~10–15% fine-tuning speedup on GPT-OSS models

✍️ Technical Writing — NVIDIA Developer Blog

Model Quantization Series:

Pinned

Research-Project-Title-Embedding

This project aims to improve the quality of eBay product title embeddings. Here are the slides and my master's thesis. The source code is in the company's repository and cannot be released at this time.

1 star

microgpt-cpp

C++ version of MicroGPT with GPU acceleration

C++

llama.cpp

Forked from ggml-org/llama.cpp

LLM inference in C/C++

C++ · 4 stars · 2 forks

ggml-org/llama.cpp

LLM inference in C/C++

C++ · 116k stars · 19.4k forks

unslothai/unsloth

Unsloth Studio is a web UI for training and running open models such as Gemma 4, Qwen3.6, DeepSeek, and gpt-oss locally.

Python · 66.1k stars · 5.9k forks

unslothai/unsloth-zoo

Utils for Unsloth https://github.com/unslothai/unsloth

Python · 273 stars · 266 forks