Tech

Retrieval Based Voice Conversion WebUI v2 with RMVPE Colab Notebook

Easy RVC training and inference

Henry

Jan 2, 2024 • 8 min read

As of today (20240102) The official Retrieval-based-Voice-Conversion-WebUI has not updated v2 Colab Notebook with RMVPE yet. I just post it here for your convenience.
Note: Please check the RVC repo before use. Use the official one if it’s been updated.

Retrieval_Based_Voice_Conversion_WebUI_v2_RMVPE_20240102.ipynb

For comprehensive guide, please refer to:

Content Preview:

View graphics card

!nvidia-smi

Install dependencies

# It might have some errors while running, and you can just skip them
!apt-get -y install build-essential python3-dev ffmpeg
!pip3 install --upgrade setuptools wheel
!pip3 install --upgrade pip
!pip3 install faiss-cpu==1.7.2 fairseq gradio==3.14.0 ffmpeg ffmpeg-python praat-parselmouth pyworld numpy==1.23.5 numba==0.56.4 librosa==0.9.2 torchcrepe

Clone repository

!mkdir Retrieval-based-Voice-Conversion-WebUI
%cd /content/Retrieval-based-Voice-Conversion-WebUI
!git init
!git remote add origin https://github.com/RVC-Project/Retrieval-based-Voice-Conversion-WebUI.git
!git fetch origin 56917dbeca8d268ce9d23af90b0ee66fc8761988 --depth=1
!git reset --hard FETCH_HEAD

Update the warehouse (generally no need to execute)

!git pull

Install Aria2

!apt -y install -qq aria2

Download the base model

# v1
# !aria2c --console-log-level=error -c -x 16 -s 16 -k 1M https://huggingface.co/lj1995/VoiceConversionWebUI/resolve/main/pretrained/D32k.pth -d /content/Retrieval-based-Voice-Conversion-WebUI/pretrained -o D32k.pth
!aria2c --console-log-level=error -c -x 16 -s 16 -k 1M https://huggingface.co/lj1995/VoiceConversionWebUI/resolve/main/pretrained/D40k.pth -d /content/Retrieval-based-Voice-Conversion-WebUI/pretrained -o D40k.pth
!aria2c --console-log-level=error -c -x 16 -s 16 -k 1M https://huggingface.co/lj1995/VoiceConversionWebUI/resolve/main/pretrained/D48k.pth -d /content/Retrieval-based-Voice-Conversion-WebUI/pretrained -o D48k.pth
# !aria2c --console-log-level=error -c -x 16 -s 16 -k 1M https://huggingface.co/lj1995/VoiceConversionWebUI/resolve/main/pretrained/G32k.pth -d /content/Retrieval-based-Voice-Conversion-WebUI/pretrained -o G32k.pth
!aria2c --console-log-level=error -c -x 16 -s 16 -k 1M https://huggingface.co/lj1995/VoiceConversionWebUI/resolve/main/pretrained/G40k.pth -d /content/Retrieval-based-Voice-Conversion-WebUI/pretrained -o G40k.pth
!aria2c --console-log-level=error -c -x 16 -s 16 -k 1M https://huggingface.co/lj1995/VoiceConversionWebUI/resolve/main/pretrained/G48k.pth -d /content/Retrieval-based-Voice-Conversion-WebUI/pretrained -o G48k.pth
# !aria2c --console-log-level=error -c -x 16 -s 16 -k 1M https://huggingface.co/lj1995/VoiceConversionWebUI/resolve/main/pretrained/f0D32k.pth -d /content/Retrieval-based-Voice-Conversion-WebUI/pretrained -o f0D32k.pth
!aria2c --console-log-level=error -c -x 16 -s 16 -k 1M https://huggingface.co/lj1995/VoiceConversionWebUI/resolve/main/pretrained/f0D40k.pth -d /content/Retrieval-based-Voice-Conversion-WebUI/pretrained -o f0D40k.pth
!aria2c --console-log-level=error -c -x 16 -s 16 -k 1M https://huggingface.co/lj1995/VoiceConversionWebUI/resolve/main/pretrained/f0D48k.pth -d /content/Retrieval-based-Voice-Conversion-WebUI/pretrained -o f0D48k.pth
# !aria2c --console-log-level=error -c -x 16 -s 16 -k 1M https://huggingface.co/lj1995/VoiceConversionWebUI/resolve/main/pretrained/f0G32k.pth -d /content/Retrieval-based-Voice-Conversion-WebUI/pretrained -o f0G32k.pth
!aria2c --console-log-level=error -c -x 16 -s 16 -k 1M https://huggingface.co/lj1995/VoiceConversionWebUI/resolve/main/pretrained/f0G40k.pth -d /content/Retrieval-based-Voice-Conversion-WebUI/pretrained -o f0G40k.pth
!aria2c --console-log-level=error -c -x 16 -s 16 -k 1M https://huggingface.co/lj1995/VoiceConversionWebUI/resolve/main/pretrained/f0G48k.pth -d /content/Retrieval-based-Voice-Conversion-WebUI/pretrained -o f0G48k.pth

# v2
# !aria2c --console-log-level=error -c -x 16 -s 16 -k 1M https://huggingface.co/lj1995/VoiceConversionWebUI/resolve/main/pretrained_v2/D32k.pth -d /content/Retrieval-based-Voice-Conversion-WebUI/pretrained_v2 -o D32k.pth
!aria2c --console-log-level=error -c -x 16 -s 16 -k 1M https://huggingface.co/lj1995/VoiceConversionWebUI/resolve/main/pretrained_v2/D40k.pth -d /content/Retrieval-based-Voice-Conversion-WebUI/pretrained_v2 -o D40k.pth
!aria2c --console-log-level=error -c -x 16 -s 16 -k 1M https://huggingface.co/lj1995/VoiceConversionWebUI/resolve/main/pretrained_v2/D48k.pth -d /content/Retrieval-based-Voice-Conversion-WebUI/pretrained_v2 -o D48k.pth
# !aria2c --console-log-level=error -c -x 16 -s 16 -k 1M https://huggingface.co/lj1995/VoiceConversionWebUI/resolve/main/pretrained_v2/G32k.pth -d /content/Retrieval-based-Voice-Conversion-WebUI/pretrained_v2 -o G32k.pth
!aria2c --console-log-level=error -c -x 16 -s 16 -k 1M https://huggingface.co/lj1995/VoiceConversionWebUI/resolve/main/pretrained_v2/G40k.pth -d /content/Retrieval-based-Voice-Conversion-WebUI/pretrained_v2 -o G40k.pth
!aria2c --console-log-level=error -c -x 16 -s 16 -k 1M https://huggingface.co/lj1995/VoiceConversionWebUI/resolve/main/pretrained_v2/G48k.pth -d /content/Retrieval-based-Voice-Conversion-WebUI/pretrained_v2 -o G48k.pth
# !aria2c --console-log-level=error -c -x 16 -s 16 -k 1M https://huggingface.co/lj1995/VoiceConversionWebUI/resolve/main/pretrained_v2/f0D32k.pth -d /content/Retrieval-based-Voice-Conversion-WebUI/pretrained_v2 -o f0D32k.pth
!aria2c --console-log-level=error -c -x 16 -s 16 -k 1M https://huggingface.co/lj1995/VoiceConversionWebUI/resolve/main/pretrained_v2/f0D40k.pth -d /content/Retrieval-based-Voice-Conversion-WebUI/pretrained_v2 -o f0D40k.pth
!aria2c --console-log-level=error -c -x 16 -s 16 -k 1M https://huggingface.co/lj1995/VoiceConversionWebUI/resolve/main/pretrained_v2/f0D48k.pth -d /content/Retrieval-based-Voice-Conversion-WebUI/pretrained_v2 -o f0D48k.pth
# !aria2c --console-log-level=error -c -x 16 -s 16 -k 1M https://huggingface.co/lj1995/VoiceConversionWebUI/resolve/main/pretrained_v2/f0G32k.pth -d /content/Retrieval-based-Voice-Conversion-WebUI/pretrained_v2 -o f0G32k.pth
!aria2c --console-log-level=error -c -x 16 -s 16 -k 1M https://huggingface.co/lj1995/VoiceConversionWebUI/resolve/main/pretrained_v2/f0G40k.pth -d /content/Retrieval-based-Voice-Conversion-WebUI/pretrained_v2 -o f0G40k.pth
!aria2c --console-log-level=error -c -x 16 -s 16 -k 1M https://huggingface.co/lj1995/VoiceConversionWebUI/resolve/main/pretrained_v2/f0G48k.pth -d /content/Retrieval-based-Voice-Conversion-WebUI/pretrained_v2 -o f0G48k.pth

# RMVPE
!aria2c --console-log-level=error -c -x 16 -s 16 -k 1M https://huggingface.co/lj1995/VoiceConversionWebUI/resolve/main/rmvpe.pt -d /content/Retrieval-based-Voice-Conversion-WebUI -o rmvpe.pt

Download vocal separation model

!aria2c --console-log-level=error -c -x 16 -s 16 -k 1M https://huggingface.co/lj1995/VoiceConversionWebUI/resolve/main/uvr5_weights/HP2-人声vocals+非人声instrumentals.pth -d /content/Retrieval-based-Voice-Conversion-WebUI/uvr5_weights -o HP2-人声vocals+非人声instrumentals.pth
!aria2c --console-log-level=error -c -x 16 -s 16 -k 1M https://huggingface.co/lj1995/VoiceConversionWebUI/resolve/main/uvr5_weights/HP5-主旋律人声vocals+其他instrumentals.pth -d /content/Retrieval-based-Voice-Conversion-WebUI/uvr5_weights -o HP5-主旋律人声vocals+其他instrumentals.pth

Download hubert_base

!aria2c --console-log-level=error -c -x 16 -s 16 -k 1M https://huggingface.co/lj1995/VoiceConversionWebUI/resolve/main/hubert_base.pt -d /content/Retrieval-based-Voice-Conversion-WebUI -o hubert_base.pt

Mount Google Drive

from google.colab import drive

drive.mount("/content/drive")

Load the packaged dataset from Google Drive to /content/dataset

# @markdown Dataset location
DATASET = (
    "/content/drive/MyDrive/dataset/example.zip"  # @param {type:"string"}
)

!mkdir -p /content/dataset
!unzip -d /content/dataset -B {DATASET}

Rename files with the same name in the dataset

!ls -a /content/dataset/
!rename 's/(\w+)\.(\w+)~(\d*)/$1_$3.$2/' /content/dataset/*.*~*

Start Web-UI

%cd /content/Retrieval-based-Voice-Conversion-WebUI
# %load_ext tensorboard
# %tensorboard --logdir /content/Retrieval-based-Voice-Conversion-WebUI/logs
!python3 infer-web.py --colab --pycmd python3

Manually back up the trained model files to Google Drive

# @markdown You need to check the file name of the model in the logs folder yourself, and manually modify the file name at the end of the command below. If you encounter any error, just download manually.

# @markdown Model name
MODELNAME = "demo_voice"  # @param {type:"string"}
# @markdown Model epoch
MODELEPOCH = 150  # @param {type:"integer"}

!cp /content/Retrieval-based-Voice-Conversion-WebUI/logs/{MODELNAME}/G_{MODELEPOCH}.pth /content/drive/MyDrive/{MODELNAME}_D_{MODELEPOCH}.pth
!cp /content/Retrieval-based-Voice-Conversion-WebUI/logs/{MODELNAME}/D_{MODELEPOCH}.pth /content/drive/MyDrive/{MODELNAME}_G_{MODELEPOCH}.pth
!cp /content/Retrieval-based-Voice-Conversion-WebUI/logs/{MODELNAME}/added_*.index /content/drive/MyDrive/
!cp /content/Retrieval-based-Voice-Conversion-WebUI/logs/{MODELNAME}/total_*.npy /content/drive/MyDrive/

!cp /content/Retrieval-based-Voice-Conversion-WebUI/weights/{MODELNAME}.pth /content/drive/MyDrive/{MODELNAME}{MODELEPOCH}.pth

Restore pth from Google Drive

# @markdown You need to check the file name of the model in the logs folder yourself, and manually modify the file name at the end of the command below.

# @markdown Model name
MODELNAME = "demo_voice"  # @param {type:"string"}
# @markdown Model epoch
MODELEPOCH = 150  # @param {type:"integer"}

!mkdir -p /content/Retrieval-based-Voice-Conversion-WebUI/logs/{MODELNAME}

!cp /content/drive/MyDrive/{MODELNAME}_D_{MODELEPOCH}.pth /content/Retrieval-based-Voice-Conversion-WebUI/logs/{MODELNAME}/G_{MODELEPOCH}.pth
!cp /content/drive/MyDrive/{MODELNAME}_G_{MODELEPOCH}.pth /content/Retrieval-based-Voice-Conversion-WebUI/logs/{MODELNAME}/D_{MODELEPOCH}.pth
!cp /content/drive/MyDrive/*.index /content/
!cp /content/drive/MyDrive/*.npy /content/
!cp /content/drive/MyDrive/{MODELNAME}{MODELEPOCH}.pth /content/Retrieval-based-Voice-Conversion-WebUI/weights/{MODELNAME}.pth

Manual preprocessing (not recommended）

# @markdown Model name
MODELNAME = "demo_voice"  # @param {type:"string"}
# @markdown Sampling Rate
BITRATE = 48000  # @param {type:"integer"}
# @markdown Number of processes used
THREADCOUNT = 8  # @param {type:"integer"}

!python3 trainset_preprocess_pipeline_print.py /content/dataset {BITRATE} {THREADCOUNT} logs/{MODELNAME} True

Manual feature extraction (not recommended)

# @markdown Model name
MODELNAME = "demo_voice"  # @param {type:"string"}
# @markdown Number of processes used
THREADCOUNT = 8  # @param {type:"integer"}
# @markdown Pitch extraction algorithm
ALGO = "harvest"  # @param {type:"string"}

!python3 extract_f0_print.py logs/{MODELNAME} {THREADCOUNT} {ALGO}

!python3 extract_feature_print.py cpu 1 0 0 logs/{MODELNAME}

Manual training (not recommended)

# @markdown Model name
MODELNAME = "demo_voice"  # @param {type:"string"}
# @markdown GPU used
USEGPU = "0"  # @param {type:"string"}
# @markdown Batch size
BATCHSIZE = 32  # @param {type:"integer"}
# @markdown Stopped epoch
MODELEPOCH = 3200  # @param {type:"integer"}
# @markdown Save epoch interval
EPOCHSAVE = 100  # @param {type:"integer"}
# @markdown Sampling Rate
MODELSAMPLE = "48k"  # @param {type:"string"}
# @markdown Whether to cache the training set
CACHEDATA = 1  # @param {type:"integer"}
# @markdown Whether to save only the latest ckpt file
ONLYLATEST = 0  # @param {type:"integer"}

!python3 train_nsf_sim_cache_sid_load_pretrain.py -e lulu -sr {MODELSAMPLE} -f0 1 -bs {BATCHSIZE} -g {USEGPU} -te {MODELEPOCH} -se {EPOCHSAVE} -pg pretrained/f0G{MODELSAMPLE}.pth -pd pretrained/f0D{MODELSAMPLE}.pth -l {ONLYLATEST} -c {CACHEDATA}

Delete other pth and leave only the selected ones (be careful and read the code carefully)

# @markdown Model name
MODELNAME = "demo_voice"  # @param {type:"string"}
# @markdown Select model epoch
MODELEPOCH = 9600  # @param {type:"integer"}

!echo "Back up the selected model..."
!cp /content/Retrieval-based-Voice-Conversion-WebUI/logs/{MODELNAME}/G_{MODELEPOCH}.pth /content/{MODELNAME}_D_{MODELEPOCH}.pth
!cp /content/Retrieval-based-Voice-Conversion-WebUI/logs/{MODELNAME}/D_{MODELEPOCH}.pth /content/{MODELNAME}_G_{MODELEPOCH}.pth

!echo "Deleting..."
!ls /content/Retrieval-based-Voice-Conversion-WebUI/logs/{MODELNAME}
!rm /content/Retrieval-based-Voice-Conversion-WebUI/logs/{MODELNAME}/*.pth

!echo "Restore the selected model..."
!mv /content/{MODELNAME}_D_{MODELEPOCH}.pth /content/Retrieval-based-Voice-Conversion-WebUI/logs/{MODELNAME}/G_{MODELEPOCH}.pth
!mv /content/{MODELNAME}_G_{MODELEPOCH}.pth /content/Retrieval-based-Voice-Conversion-WebUI/logs/{MODELNAME}/D_{MODELEPOCH}.pth

!echo "Deletion completed"
!ls /content/Retrieval-based-Voice-Conversion-WebUI/logs/{MODELNAME}

Clear all files under the project, leaving only the selected model (be careful and read the code carefully)

# @markdown Model name
MODELNAME = "demo_voice"  # @param {type:"string"}
# @markdown Select model epoch
MODELEPOCH = 9600  # @param {type:"integer"}

!echo "Back up the selected model..."
!cp /content/Retrieval-based-Voice-Conversion-WebUI/logs/{MODELNAME}/G_{MODELEPOCH}.pth /content/{MODELNAME}_D_{MODELEPOCH}.pth
!cp /content/Retrieval-based-Voice-Conversion-WebUI/logs/{MODELNAME}/D_{MODELEPOCH}.pth /content/{MODELNAME}_G_{MODELEPOCH}.pth

!echo "Deleting..."
!ls /content/Retrieval-based-Voice-Conversion-WebUI/logs/{MODELNAME}
!rm -rf /content/Retrieval-based-Voice-Conversion-WebUI/logs/{MODELNAME}/*

!echo "Restore the selected model..."
!mv /content/{MODELNAME}_D_{MODELEPOCH}.pth /content/Retrieval-based-Voice-Conversion-WebUI/logs/{MODELNAME}/G_{MODELEPOCH}.pth
!mv /content/{MODELNAME}_G_{MODELEPOCH}.pth /content/Retrieval-based-Voice-Conversion-WebUI/logs/{MODELNAME}/D_{MODELEPOCH}.pth

!echo "Deletion completed"
!ls /content/Retrieval-based-Voice-Conversion-WebUI/logs/{MODELNAME}

Copyright statement: Unless otherwise stated, all articles on this blog adopt the CC BY-NC-SA 4.0 license agreement. For non-commercial reprints and citations, please indicate the author: Henry, and original article URL. For commercial reprints, please contact the author for authorization.