A web UI for fast, simple RVC inference, with a model downloader, voice splitter, batch inference, training pipeline, real-time conversion, and a full CLI.
Note
v2.2.0 ships a major security hardening pass, a ~3× training speedup bundle (--fast_train + --bf16_adamw), and Applio-parity accuracy patches for small (10-minute) datasets. See the Changelog below for the full breakdown. Pull requests are still welcome and will be reviewed.
Note
To use the old version, switch to the `v1` branch.
- Voice Inference — Single & batch conversion, TTS, pitch shifting, formant shifting, audio cleaning, Whisper transcription
- Real-Time Conversion — Live mic voice conversion with VAD and low-latency processing
- 30+ F0 Methods — rmvpe, crepe, fcpe, harvest, hybrid, and many more
- F0 Autotune — Automatic pitch correction with configurable strength
- Audio Cleaning — Built-in denoising for cleaner output
- Audio Separation — Vocal/instrumental isolation (MDX-Net, Roformer, BS-Roformer), karaoke, reverb removal, denoising
- Auto Pretrained Download — Automatically downloads pretrained models from HuggingFace
- End-to-End Training — Dataset creation → preprocessing → feature extraction → training → model export
- 4 Vocoders — HiFi-GAN NSF (Default), BigVGAN, MRF-HiFi-GAN, RefineGAN
- 5 Optimizers — AdamW, RAdam, AnyPrecisionAdamW, AdaBelief, AdaBeliefV2
- Training Quality Improvements — Multi-scale mel spectrogram loss (8 scales), scaled v3 discriminator loss, proper feature loss gradient flow, cuDNN benchmark
- Advanced Options — Gradient accumulation, torch.compile(), 8-bit Adam, cosine annealing LR, overtraining detection
- Architecture Support — RVC and SVC (from Vietnamese-RVC)
- Embedder Mix — Layer-wise embedding mixing with configurable ratios (from Vietnamese-RVC)
- 🚀 3× Faster Training: `--fast_train` flag bundles TF32 matmul + cuDNN benchmark + torch.compile + expandable_segments allocator. Vocal-quality-safe (no loss/numerics changes). See Training Boost.
- 🚀 bf16 Auto-Mode: `--bf16_adamw` flag (Applio-parity shortcut) forces AnyPrecisionAdamW + bf16 autocast. Recommended on Ampere+ GPUs (A100/H100/RTX 30xx+/40xx+).
- 🎯 Applio-Parity Accuracy: `per_preprocess=3.0s` (was 3.7s) yields ~26% more training chunks on small datasets. `--chunk_len`/`--overlap_len` now apply to Automatic cut mode. Saved `.pth` embeds `embedder_model` + `dataset_length` + `overtrain_info` provenance fields.
- Safe Deserialization: All `torch.load()` calls route through `safe_torch_load` (forces `weights_only=True`). Restricted `pickle.Unpickler` whitelist blocks every known RCE gadget. See Security Patches.
- Path Traversal Guards: `validate_path_within()` wired into 20+ `os.path.join` sites in inference + training. Blocks `../../etc/cron.d/evil` style escapes from GUI/CLI inputs.
- Hardened Downloaders: All 5 downloaders (HuggingFace, Google Drive, Mega, MediaFire, PixelDrain) enforce: 8 GB size cap, extension whitelist, filename sanitization, `timeout=300s` on every network call. Fixed `tempfile.mktemp` TOCTOU race in `gdown.py`.
- No Silent Failures: Bare `except:` clauses in checkpoint-load (was silently restarting training from epoch 1) and ONNX export replaced with typed exceptions. MEGA nonce migrated from `random.randint` → `secrets.randbits(32)`.
- CLI: Full command-line interface via `rvc-cli`
- ZLUDA Support: Full AMD GPU support via ZLUDA
- XPU Support — Intel GPU support via XPU backend
- Push to Hub — Upload trained models directly to HuggingFace Hub
- 44 Languages — Full UI translation support
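The Safe Deserialization item above mentions a restricted `pickle.Unpickler` whitelist. As a rough illustration of that general technique (not the project's actual code), the sketch below overrides `find_class` so that only an allowlisted set of globals can be resolved. The names `SAFE_GLOBALS`, `RestrictedUnpickler`, and `restricted_loads` are hypothetical, and the allowlist contents are an assumption.

```python
import io
import os
import pickle

# Hypothetical allowlist: only these (module, name) pairs may be resolved.
SAFE_GLOBALS = {
    ("builtins", "dict"),
    ("builtins", "list"),
    ("builtins", "set"),
    ("builtins", "tuple"),
    ("collections", "OrderedDict"),
}


class RestrictedUnpickler(pickle.Unpickler):
    """Unpickler that refuses any global outside SAFE_GLOBALS."""

    def find_class(self, module, name):
        if (module, name) in SAFE_GLOBALS:
            return super().find_class(module, name)
        raise pickle.UnpicklingError(f"blocked global: {module}.{name}")


def restricted_loads(data: bytes):
    """Deserialize bytes through the restricted unpickler."""
    return RestrictedUnpickler(io.BytesIO(data)).load()


class Gadget:
    """Stand-in for a malicious payload: asks pickle to call a function on load."""

    def __reduce__(self):
        # A harmless callable is used here; a real attack would use os.system.
        return (os.getcwd, ())


# Plain containers and primitives need no global lookup, so they load fine.
safe = restricted_loads(pickle.dumps({"epoch": 3, "lr": 1e-4}))

# The gadget references a function by name, which find_class rejects.
try:
    restricted_loads(pickle.dumps(Gadget()))
    blocked = False
except pickle.UnpicklingError:
    blocked = True
```

With a plain `pickle.loads`, the second payload would call the referenced function during loading; here it is refused before anything runs.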
Advanced RVC Inference supports the same vocoders as Vietnamese-RVC:
| Vocoder | Description | Pitch Required |
|---|---|---|
| Default (HiFi-GAN NSF) | HiFi-GAN with Neural Sine Filter. Adds harmonic sine wave injection for improved pitch accuracy. Recommended for best compatibility. | Yes |
| BigVGAN | Snake activations with Anti-Aliasing (SnakeBeta + AMP blocks). State-of-the-art audio quality. | Yes |
| MRF-HiFi-GAN | HiFi-GAN with Multi-Receptive Field fusion. Richer feature extraction with MRF blocks. | Yes |
| RefineGAN | U-Net based vocoder with parallel residual blocks and anti-aliased resampling. High-fidelity spectral detail. | Yes |
When training without pitch guidance (pitch_guidance=False), a plain HiFi-GAN generator (no NSF) is used automatically regardless of the selected vocoder.
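The fallback rule above can be written as a small selection function. This is only a sketch of the stated behavior: `resolve_generator` and the returned labels are hypothetical names, not the project's API.

```python
# Vocoder names as listed in the table above.
VOCODERS = ("Default", "BigVGAN", "MRF-HiFi-GAN", "RefineGAN")


def resolve_generator(vocoder: str, pitch_guidance: bool) -> str:
    """Return a label for the generator that would be used (hypothetical helper)."""
    if vocoder not in VOCODERS:
        raise ValueError(f"unknown vocoder: {vocoder!r}")
    if not pitch_guidance:
        # Without pitch guidance, the selected vocoder is ignored.
        return "HiFi-GAN"
    return "HiFi-GAN NSF" if vocoder == "Default" else vocoder


with_pitch = resolve_generator("BigVGAN", pitch_guidance=True)
without_pitch = resolve_generator("BigVGAN", pitch_guidance=False)
```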
Advanced RVC Inference provides 5 optimizers for model training:
| Optimizer | Category | Rating | Best For |
|---|---|---|---|
| AdamW | PyTorch Built-in | ⭐⭐⭐⭐⭐ | General-purpose, most reliable (default) |
| RAdam | PyTorch Built-in | ⭐⭐⭐⭐ | Warmup-free training, short training runs |
| AnyPrecisionAdamW | Mixed-Precision | ⭐⭐⭐⭐ | Bfloat16 training, long runs with Kahan summation |
| AdaBelief | Belief-Based | ⭐⭐⭐ | Better conditioned adaptive learning rates |
| AdaBeliefV2 | Belief-Based | ⭐⭐⭐ | Stable deep training with AMSGrad + InverseSqrt scheduler |
See the Optimizer Reference Guide for detailed descriptions, hyperparameters, and recommendations.
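The table mentions Kahan summation for AnyPrecisionAdamW. The idea is to keep a running compensation term so that small updates are not lost when added to a much larger value. The sketch below shows the technique in plain Python floats; it is an illustration of the summation trick only, not the optimizer's code, and the function names are made up for this example.

```python
def naive_sum(values):
    """Plain left-to-right accumulation."""
    total = 0.0
    for v in values:
        total += v
    return total


def kahan_sum(values):
    """Compensated (Kahan) summation: carries the rounding error forward."""
    total = 0.0
    compensation = 0.0  # running estimate of the low-order bits lost so far
    for v in values:
        y = v - compensation
        t = total + y
        compensation = (t - total) - y  # what was lost when forming t
        total = t
    return total


# One large value followed by many updates too small to register on their own.
values = [1.0] + [1e-16] * 100_000

naive = naive_sum(values)   # every tiny update is rounded away
kahan = kahan_sum(values)   # the tiny updates accumulate correctly
```

In low-precision formats such as bf16 the same effect appears with much larger updates, which is why compensated accumulation matters for long runs.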
git clone https://github.com/ArkanDash/Advanced-RVC-Inference.git
cd Advanced-RVC-Inference
pip install -r requirements.txt
Or install with pip directly from GitHub:
pip install git+https://github.com/ArkanDash/Advanced-RVC-Inference.git
GPU Support (CUDA)
pip install git+https://github.com/ArkanDash/Advanced-RVC-Inference.git
pip install onnxruntime-gpu
ZLUDA (AMD GPU)
ZLUDA allows CUDA applications to run on AMD GPUs. Just install PyTorch with ZLUDA support — Advanced RVC will auto-detect and configure itself.
# Follow the ZLUDA installation guide for your AMD GPU
# Then install Advanced RVC normally — ZLUDA is auto-detected
pip install git+https://github.com/ArkanDash/Advanced-RVC-Inference.git
# Launch the web UI
rvc-gui
# Or via Python module
python -m arvc.app.gui
# With a public share link
python -m arvc.app.gui --share
The interface will be available at http://localhost:7860.
# Voice conversion
rvc-cli infer -m model.pth -i input.wav -o output.wav
# Audio separation
rvc-cli uvr -i song.mp3
# Show all commands
rvc-cli --help
# ~3× faster training, vocal-quality-safe (no loss/numerics changed)
rvc-cli train my_model --fast_train true --epochs 200 --batch_size 4
# Additional ~1.5–2× speedup on Ampere+ GPUs (A100/H100/RTX 30xx+/40xx+)
# Skip on Colab T4 (Turing) — bf16 is emulated there.
rvc-cli train my_model --fast_train true --bf16_adamw true --epochs 200 --batch_size 8
# 10-minute dataset recipe (max accuracy)
rvc-cli preprocess my_model --sample_rate 48000 \
--cut_method Automatic --chunk_len 3.0 --overlap_len 0.5 \
--process_effects --normalization post
rvc-cli extract my_model --sample_rate 48000 --f0_method rmvpe
rvc-cli create-index my_model --version v2 --algorithm Auto
rvc-cli train my_model --fast_train true --bf16_adamw true \
--epochs 300 --batch_size 4 --multiscale_loss --cosine_lr \
--overtrain_detect --overtrain_threshold 50
See docs/TRAINING_BOOST.md for the full optimization breakdown and docs/SECURITY_PATCHES.md for the complete security audit trail.
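The 10-minute recipe above passes `--overtrain_detect --overtrain_threshold 50`. One common way to implement this kind of check is patience-based: stop once the monitored loss has not improved for a set number of consecutive epochs. The sketch below assumes that interpretation of the threshold; the function name and the exact stopping rule are illustrative, not taken from the project.

```python
def overtrain_epoch(losses, threshold):
    """Return the 1-based epoch at which training would stop, or None.

    Assumes `threshold` means "consecutive epochs without a new best loss".
    """
    best = float("inf")
    epochs_without_improvement = 0
    for epoch, loss in enumerate(losses, start=1):
        if loss < best:
            best = loss
            epochs_without_improvement = 0
        else:
            epochs_without_improvement += 1
            if epochs_without_improvement >= threshold:
                return epoch
    return None


# Loss improves for three epochs, then drifts upward.
history = [1.0, 0.8, 0.7, 0.71, 0.72, 0.73]

stop_at = overtrain_epoch(history, threshold=3)    # three stale epochs: 4, 5, 6
no_stop = overtrain_epoch(history, threshold=50)   # patience never exhausted
```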
| Notebook | Description |
|---|---|
| Full Web UI | |
| CLI only | Lightweight headless mode |
arvc/
├── app/ # Gradio web UI (tabs, pages, layouts)
│ ├── tabs/ # inference, training, downloads, realtime, extra
├── engine/ # Core logic (no UI dependency)
│ ├── inference/ # Voice conversion pipeline, TTS
│ ├── training/ # preprocess, extract, train, export
│ │ ├── preprocess/ # Audio slicing & normalization
│ │ ├── extract/ # Embedding & F0 extraction
│ │ └── runner/ # Training loop, losses, data loading
│ ├── uvr/ # Audio separation (UVR5)
│ ├── realtime/ # Live mic conversion
│ └── models/ # Model loading, generators, optimizers, embedders
│ ├── generators/ # HiFi-GAN NSF, BigVGAN, MRF-HiFi-GAN, RefineGAN
│ ├── optimizers/ # AdamW, RAdam, AnyPrecisionAdamW, AdaBelief, AdaBeliefV2
│ ├── embedders/ # Hubert, ContentVec embedders
│ ├── predictors/ # F0 predictors (RMVPE, Crepe, FCPE, etc.)
│ └── backends/ # CUDA, DirectML, OpenCL, XPU, ZLUDA
├── services/ # Business logic layer (bridges UI <-> engine)
├── ui/ # UI helpers (feedback, dropdown updates, formatting)
├── utils/ # Shared utilities (variables, download helpers)
├── configs/ # Configuration files (training configs, model templates)
│ ├── v1/ # V1 model configs (32k, 40k, 48k)
│ ├── v2/ # V2 model configs (24k, 32k, 40k, 48k)
│ ├── ringformer_v2/ # RingFormer V2 configs
│ └── pcph_gan/ # PCPH-GAN configs
├── datasets/ # Training datasets (organized per model)
├── assets/ # Runtime assets
│ ├── models/ # Pretrained models, embedders, predictors, UVR5
│ │ ├── pretrained_v1/ # V1 pretrained G/D weights
│ │ ├── pretrained_v2/ # V2 pretrained G/D weights
│ │ ├── pretrained_custom/ # Custom pretrained weights
│ │ ├── embedders/ # Hubert/ContentVec models
│ │ ├── predictors/ # F0 predictor models
│ │ └── uvr5/ # UVR5 separation models
│ ├── logs/ # Training logs, checkpoints, extracted features, indexes
│ ├── audios/ # Audio files (input, output, TTS, UVR results)
│ ├── f0/ # F0 cache files
│ ├── binary/ # Binary resources
│ ├── languages/ # 44 translation JSON files
│ └── presets/ # Inference presets
└── _version.py # Version management
Key rule: engine/ should never import from app/ or services/. Keep the core independent.
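A rule like this is easy to break by accident, so it can help to check it mechanically. The sketch below parses Python source with the standard `ast` module and reports absolute imports of `arvc.app` or `arvc.services`. The helper name `forbidden_imports` is hypothetical, and relative imports are deliberately not handled in this minimal version.

```python
import ast

# Packages that engine/ code must not depend on.
FORBIDDEN_PREFIXES = ("arvc.app", "arvc.services")


def forbidden_imports(source: str) -> list[str]:
    """Return dotted names imported in `source` that fall under a forbidden package."""
    found = []
    for node in ast.walk(ast.parse(source)):
        if isinstance(node, ast.Import):
            names = [alias.name for alias in node.names]
        elif isinstance(node, ast.ImportFrom) and node.level == 0 and node.module:
            # Join module and imported name so `from arvc import app` is caught too.
            names = [f"{node.module}.{alias.name}" for alias in node.names]
        else:
            continue
        for name in names:
            if any(name == p or name.startswith(p + ".") for p in FORBIDDEN_PREFIXES):
                found.append(name)
    return found


sample = (
    "import os\n"
    "from arvc.services import training\n"
    "import arvc.engine.models\n"
    "import arvc.app.gui\n"
)
violations = forbidden_imports(sample)
```

Running a check like this over every file under `engine/` in CI would flag a violation before it is merged.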
The use of the converted voice for the following purposes is strictly prohibited:
- Criticizing or attacking individuals
- Advocating for or opposing specific political positions, religions, or ideologies
- Publicly displaying strongly stimulating expressions without proper zoning
- Selling of voice models and generated voice clips
- Impersonation of the original owner of the voice with malicious intentions
- Fraudulent purposes that lead to identity theft or fraudulent phone calls
🔒 Security Hardening (defense-in-depth — no numerics changed)
- Routed 16 `torch.load()` calls through `safe_torch_load` (forces `weights_only=True`) across all predictors (PESTO, PENN, RMVPE, CREPE, DJCM, FCPE), training (train.py ×3, data_utils, utils), whisper, onnx_export, vr_separator, fairseq
- Restricted `pickle.Unpickler` whitelist (primitive + numpy types only): blocks every known pickle RCE gadget (`os.system`, `subprocess.Popen`, `builtins.eval`)
- Wired `validate_path_within()` into 20+ `os.path.join` sites in `inference.py` and `services/training.py`: blocks `../../etc/cron.d/evil` path traversal from GUI/CLI inputs
- All 5 downloaders (HuggingFace, Google Drive, Mega, MediaFire, PixelDrain) now enforce: 8 GB size cap, extension whitelist, filename sanitization, `timeout=300s` on every network call
- Fixed `tempfile.mktemp` → `mkstemp` TOCTOU race in `gdown.py`
- MEGA nonce migrated from `random.randint` → `secrets.randbits(32)`
- Bare `except:` clauses in checkpoint-load (was silently restarting training from epoch 1, a silent data loss) and ONNX export replaced with typed exceptions
- Added `timeout=30s` to `urllib.request.urlopen` Google Sheets fetch at app startup
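To make the path traversal item concrete, here is a minimal sketch of how a guard of this kind can work. Only the name `validate_path_within` comes from the list above; the signature, the error type, and the implementation are assumptions made for illustration.

```python
import os
import tempfile


def validate_path_within(base_dir: str, user_path: str) -> str:
    """Resolve user_path under base_dir and reject anything that escapes it.

    Assumed signature; returns the resolved absolute path on success.
    """
    base = os.path.realpath(base_dir)
    # realpath collapses ".." segments and follows symlinks before the check.
    candidate = os.path.realpath(os.path.join(base, user_path))
    if os.path.commonpath([base, candidate]) != base:
        raise ValueError(f"path escapes {base_dir!r}: {user_path!r}")
    return candidate


with tempfile.TemporaryDirectory() as root:
    # A normal relative path stays inside the base directory.
    ok = validate_path_within(root, "models/voice.pth")

    # The traversal example from the changelog is rejected.
    try:
        validate_path_within(root, "../../etc/cron.d/evil")
        escaped = True
    except ValueError:
        escaped = False
```

Resolving the path before comparing is the important step: a plain string prefix check on the unresolved input can be bypassed with `..` segments or symlinks.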
🚀 Training Speedup (~3× faster, vocal-quality-safe)
- New `--fast_train` flag bundles: TF32 matmul + cuDNN TF32 + cuDNN benchmark + `torch.compile(mode="reduce-overhead")` on both G and D + `PYTORCH_CUDA_ALLOC_CONF=expandable_segments:True` + 8 dataloader workers + prefetch_factor=16 + pin_memory + persistent_workers
- New `--bf16_adamw` flag (Applio-parity shortcut): forces `optimizer=AnyPrecisionAdamW` and `brain=True` (bf16 autocast). Recommended on Ampere+ GPUs. Skip on T4/Turing (bf16 emulated → slower).
- No loss function, gradient path, or weight is touched. TF32 and bf16 do lower arithmetic precision, so results are not strictly bit-for-bit identical to full-FP32 training; the speedups are intended to leave vocal fidelity unaffected.
🎯 Applio-Parity Accuracy for 10-minute datasets
- `per_preprocess` 3.7s → 3.0s (matches Applio's `PERCENTAGE=3.0`): produces ~26% more training chunks for the same audio (largest single fix for small-data accuracy)
- `--chunk_len`/`--overlap_len` CLI flags now apply to Automatic cut mode (was Simple-only). Users can boost `--overlap_len=0.5` for ~17% more chunks on small datasets
- `preprocess.py` now writes `total_dataset_duration` + `total_seconds` to `model_info.json`
- `extract_model()` embeds `embedder_model`, `dataset_length`, `overtrain_info` into the saved `.pth` as provenance metadata, which lets inference auto-select the matching embedder
- Preprocess now fails fast with a clear error if the dataset path is missing or empty (was silent walk + cryptic downstream crash)
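To see why a shorter chunk length yields more training chunks, it helps to count sliding windows. The sketch below assumes fixed-length windows that advance by `chunk_len - overlap_len` and drop any incomplete tail. Both that windowing model and the 0.3 s overlap are assumptions for illustration, not values stated in this README. Under them, 10 minutes of audio gives roughly the quoted gain.

```python
import math


def count_chunks(duration_s: float, chunk_len: float, overlap_len: float) -> int:
    """Number of full windows of chunk_len seconds, stepping by chunk_len - overlap_len."""
    step = chunk_len - overlap_len
    if step <= 0:
        raise ValueError("overlap_len must be smaller than chunk_len")
    if duration_s < chunk_len:
        return 0
    return math.floor((duration_s - chunk_len) / step) + 1


# 10 minutes of audio, assumed 0.3 s overlap (hypothetical default).
old = count_chunks(600.0, 3.7, 0.3)  # 176
new = count_chunks(600.0, 3.0, 0.3)  # 222
gain = new / old - 1                 # about 0.26

print(f"{old} -> {new} chunks (+{gain:.1%})")
```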
📚 Docs & Colab
- New `docs/SECURITY_PATCHES.md`: full audit trail of every patch with verification commands
- New `docs/TRAINING_BOOST.md`: usage guide with recommended recipes for T4 / A100 / RTX 30xx+
- `colab-noui.ipynb` updated: added `bf16_adamw` parameter, always passes `--chunk_len`/`--overlap_len`, rewrote Security+Speedup+Accuracy notes section
- VRVC Training Integration: Cloned Vietnamese-RVC training pipeline including architecture selector (RVC/SVC), embedder mix, include mutes, nprobe, alpha, and F0 autotune with configurable strength
- Training Quality Fixes: Multi-scale mel spectrogram loss (8 scales with dynamic windows from PolTrain), proper feature loss gradient flow (removed `.detach()` from Applio), scaled v3 discriminator loss for BigVGAN/RefineGAN, cuDNN benchmark enabled by default
- Optimizer Cleanup: Reduced from 43 optimizers to 5 proven choices (AdamW, RAdam, AnyPrecisionAdamW, AdaBelief, AdaBeliefV2)
- Directory Structure: Cleaned up assets; datasets moved to `arvc/datasets/`, weights merged into `arvc/assets/logs/`
- EasyGUI Removed: Deleted `easy_gui.py` and all references; the Web UI is now the only graphical interface
- Bug Fixes: Fixed robotic chirping (#69), `get_gpu_info()` unpack error, faiss AVX512/AVX2 import crash, missing `--predictor_onnx` argument, synced all training params across UI → service → subprocess
- Colab Updates: Removed EasyGUI toggle and CLI Usage section from main notebook
This project builds upon the work of many open-source projects and contributors. We gratefully acknowledge the following:
| Project | Author |
|---|---|
| RVC | RVC Project |
| Vietnamese-RVC | Phạm Huỳnh Anh |
| Mangio-Kalo-Tweaks | kalomaze |
| Project | Author |
|---|---|
| PolTrain | Politrees |
| Applio | IAHispano |
| Project | Author |
|---|---|
| python-audio-separator | Nomad Karaoke |
| whisper | OpenAI |
| BigVGAN | Nvidia |
| Ultimate-RVC-Models | R-Kentaren |
| Project | Author |
|---|---|
| ZLUDA | vlsid |
| bitsandbytes | Tim Dettmers |
| Collaborator | Role |
|---|---|
| ArkanDash | Creator & Maintainer |
| BF667 | Collaborator |
This project is licensed under the MIT License — see the LICENSE file for details.