SPCell

Jul 19 2025 23:24

My Experience with the RVC Neural Network

Hello, readers! I’m SPCell, and I’d like to share my experience working with the RVC neural network, which I used to create “AI covers” by transferring a song’s vocal performance onto another person’s voice. I began experimenting in summer 2023, had settled on optimal parameters by early 2024, and have since returned from time to time to fine‑tune the quality.

At the heart of it all is the training process: you must choose argument values so that the output voice sounds as natural as possible and is free of artifacts. If the model is under‑trained, the voice sounds robotic; if it’s over‑trained, the pitch begins to “jump,” although the voice itself still sounds acceptable, so over‑training is the lesser evil compared with audible glitches. Initially, harvest was considered the best pitch extraction method, but after rmvpe appeared I switched to it, and when rmvpe+ came out later I adopted that as well, since it gave a modest but noticeable improvement over the version without the “+.”

Other training arguments I tweaked included:

*bitrate (in practice the target sample rate setting; it should match the sample rate of your dataset files),

*hop length (controls how strictly the pitch matches the original; lower values force a tighter match, higher values allow more flexibility),

*thread count (as far as I can tell, the number of CPU processes used for preprocessing and pitch extraction, so it affects speed rather than the result),

*batch size (simultaneous file processing to speed up training; I set it to the maximum my GPU could handle),

*the total number of epochs (and saving checkpoints at intervals),

*the number of GPUs used.
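To make the settings above more concrete, here is a minimal Python sketch. It is not RVC's actual configuration format: the class `TrainingSettings`, its field names, and every number in it are hypothetical placeholders, not the settings from this post. It also assumes that hop length is counted in samples at the dataset's sample rate, which may differ between RVC forks. The sketch shows two small calculations: how much time one hop covers, and at which epochs checkpoints would be saved.

```python
from dataclasses import dataclass


@dataclass
class TrainingSettings:
    """Hypothetical container for the settings listed above (not RVC's real config)."""
    sample_rate: int = 40000   # Hz, should match the dataset files
    hop_length: int = 128      # samples between pitch estimates (assumed unit)
    batch_size: int = 8        # items processed per training step
    total_epochs: int = 300    # full passes over the dataset
    save_every: int = 50       # checkpoint interval, in epochs
    gpu_count: int = 1         # single GPU, as preferred in the post

    def hop_milliseconds(self) -> float:
        """Time between consecutive pitch estimates, in milliseconds."""
        return 1000.0 * self.hop_length / self.sample_rate

    def checkpoint_epochs(self) -> list[int]:
        """Epochs at which a checkpoint is saved, always including the last one."""
        epochs = list(range(self.save_every, self.total_epochs + 1, self.save_every))
        if not epochs or epochs[-1] != self.total_epochs:
            epochs.append(self.total_epochs)
        return epochs


settings = TrainingSettings()
print(settings.hop_milliseconds())   # 3.2
print(settings.checkpoint_epochs())  # [50, 100, 150, 200, 250, 300]
```

Under these assumptions, halving the hop length halves the time between pitch estimates, which is one way to read "lower values force a tighter match." Saving checkpoints at intervals also makes it possible to go back to an earlier epoch if a later one turns out to be over‑trained.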

I trained my models on Kaggle, where I could use two GPUs, but found that training on a single GPU gave a cleaner final voice. To separate vocals from instrumentals I used Ultimate Vocal Remover, then cleaned up any remaining artifacts in Adobe Audition, RX Pro Audio Editor, and SpectraLayers.
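Since the bitrate setting depends on the sample rate of the dataset files, it can be worth checking that the cleaned files all share one rate before training. The sketch below is only an illustration using Python's standard `wave` module; the function name `sample_rates` and the folder name in the usage note are hypothetical, and `wave` reads plain PCM WAV files only, so files exported in other formats would need a different reader.

```python
import wave
from collections import Counter
from pathlib import Path


def sample_rates(folder: str) -> Counter:
    """Count how many PCM WAV files in `folder` use each sample rate."""
    rates = Counter()
    for path in sorted(Path(folder).glob("*.wav")):
        # wave.open raises wave.Error for non-PCM files (for example 32-bit float)
        with wave.open(str(path), "rb") as wav_file:
            rates[wav_file.getframerate()] += 1
    return rates
```

Calling `sample_rates("dataset")` on a folder of cleaned vocals returns a counter keyed by sample rate. More than one key means the files are mixed and should probably be resampled to a single rate first.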

When it came time to generate covers, I always specified rmvpe or rmvpe+ as the pitch extraction argument and tested pitch adjustments separately so that the voice would match my dataset. In songs where the original singer performed at unusually high or low pitches, I’d raise or lower the generation pitch relative to the song’s normal sections (where the singer stays on a single tone) to keep the character of the voice aligned with the dataset.
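The pitch adjustment described above can be estimated with a little arithmetic. The sketch below assumes the generation pitch is set in whole semitones and that you already have rough pitch values (in Hz) for a normal section of the song and for the dataset. The function names and the sample frequencies are hypothetical, and in practice the final value was still chosen by ear. One semitone corresponds to a frequency ratio of 2^(1/12), so the shift is 12 · log2(dataset pitch / song pitch).

```python
import math
from statistics import median


def semitone_shift(song_hz: list[float], dataset_hz: list[float]) -> int:
    """Whole-semitone shift that moves the song's median pitch onto the dataset's."""
    ratio = median(dataset_hz) / median(song_hz)
    return round(12 * math.log2(ratio))


def shifted_frequency(frequency_hz: float, semitones: int) -> float:
    """Frequency after transposing by the given number of semitones."""
    return frequency_hz * 2 ** (semitones / 12)


# Illustrative values only: a song centered near 220 Hz, a dataset near 330 Hz.
song_section = [196.0, 220.0, 247.0]
dataset_voice = [294.0, 330.0, 370.0]
print(semitone_shift(song_section, dataset_voice))  # 7
print(shifted_frequency(220.0, 12))                 # 440.0
```

Using the median rather than the mean keeps a few very high or very low notes from skewing the estimate, which matches the idea of measuring against the song's normal sections.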

Tags: neural network, cover, rvc, machine learning

Jul 19 2025 15:26

My content plan draft

Hello, SPCell here. Posting my content plan draft here.

Posts will come in three sizes: small, medium, and large. English-speaking readers can find them on Reddit (in specific subreddits), Telegram, Threads, and here on Boosty. In the future I plan to make videos for both my AI content and my personal blog. This content plan is not final yet, so some changes will most likely be made.

In the neuro-blog, all posts will be published in both Russian and English. The personal blog is split into Russian and English segments, since a number of posts will concern life in Russia and probably won't mean much to the average English-speaking reader.

Tags: content plan
