fish-speech-Derur
fish-speech-Derur is a fork of fishaudio/fish-speech in which I added extra functionality to tools/vqgan/inference.py and tools/llama/generate.py.
Instructions for working with these files and my additions:
1. First, install all dependencies:
python -m pip install -r requirements.txt
python -m pip install -U torch torchvision torchaudio --index-url https://download.pytorch.org/whl/cu124
2. Download the model:
git clone https://huggingface.co/fishaudio/fish-speech-1.5 checkpoints/fish-speech-1.5
3. Instructions from the original repository (inference.ipynb):
WebUI:
python tools/run_webui.py \
--llama-checkpoint-path checkpoints/fish-speech-1.5 \
--decoder-checkpoint-path checkpoints/fish-speech-1.5/firefly-gan-vq-fsq-8x1024-21hz-generator.pth \
# --compile
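Before launching the WebUI or the CLI steps, it can help to confirm that the paths used in the commands actually exist. The following is a minimal sketch and not part of the fork: `missing_checkpoint_files` is a hypothetical helper, and it only checks the checkpoint directory and the decoder file named in this README, not every file the model may need.

```python
from pathlib import Path

# File name of the decoder weights, as used in the commands in this README.
DECODER_NAME = "firefly-gan-vq-fsq-8x1024-21hz-generator.pth"


def missing_checkpoint_files(checkpoint_dir="checkpoints/fish-speech-1.5"):
    """Return the expected paths that do not exist (hypothetical helper)."""
    root = Path(checkpoint_dir)
    # Only the two paths referenced by the README commands are checked.
    expected = [root, root / DECODER_NAME]
    return [str(p) for p in expected if not p.exists()]


if __name__ == "__main__":
    missing = missing_checkpoint_files()
    if missing:
        print("Missing:", ", ".join(missing))
    else:
        print("Checkpoint directory and decoder file found.")
```

If the script reports missing paths, the model was probably cloned into a different directory than the one the commands expect.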
CLI:
1. Encode reference audio:
# Set src_audio to the path of your reference audio file (notebook syntax; in a plain shell, pass the path to -i directly)
src_audio = r"D:\PythonProject\vo_hutao_draw_appear.wav"
python tools/vqgan/inference.py -i {src_audio} --checkpoint-path "checkpoints/fish-speech-1.5/firefly-gan-vq-fsq-8x1024-21hz-generator.pth" \
--logger-offer 0 # 1 = on, 0 = off (added in this fork)
2. Generate semantic tokens from text:
# --output-path and --logger-offer (1 = on, 0 = off) are additions from this fork
python tools/llama/generate.py \
--text "hello world" \
--prompt-text "The text corresponding to reference audio" \
--prompt-tokens "fake.npy" \
--checkpoint-path "checkpoints/fish-speech-1.5" \
--num-samples 2 \
--output-path "out/" \
--logger-offer 0
# --compile
3. Generate speech from semantic tokens:
# -o and --logger-offer (1 = on, 0 = off) are additions from this fork
python tools/vqgan/inference.py \
-i "codes_0.npy" \
--checkpoint-path "checkpoints/fish-speech-1.5/firefly-gan-vq-fsq-8x1024-21hz-generator.pth" \
-o "fake.wav" \
--logger-offer 0
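The three CLI steps above can be chained from a single script. The following is a minimal sketch, not part of the fork: `build_commands` and `run_pipeline` are hypothetical helper names, and the sketch assumes that step 1 writes `fake.npy` to the working directory and that `codes_0.npy` is located where step 3 can find it, as the commands above suggest. Only flags that appear in this README are used.

```python
import subprocess
import sys

# Decoder checkpoint file name taken from the commands in this README.
DECODER_NAME = "firefly-gan-vq-fsq-8x1024-21hz-generator.pth"


def build_commands(ref_audio, text, prompt_text,
                   checkpoint_dir="checkpoints/fish-speech-1.5",
                   codes_file="codes_0.npy", out_wav="fake.wav",
                   out_dir="out/", logger=0):
    """Return the three CLI steps as argument lists (hypothetical helper)."""
    decoder = f"{checkpoint_dir}/{DECODER_NAME}"
    py = sys.executable
    # Step 1: encode the reference audio.
    encode = [py, "tools/vqgan/inference.py", "-i", str(ref_audio),
              "--checkpoint-path", decoder, "--logger-offer", str(logger)]
    # Step 2: generate semantic tokens from text.
    generate = [py, "tools/llama/generate.py", "--text", text,
                "--prompt-text", prompt_text,
                # Assumption: step 1 wrote fake.npy to the working directory.
                "--prompt-tokens", "fake.npy",
                "--checkpoint-path", checkpoint_dir,
                "--num-samples", "2",
                "--output-path", out_dir, "--logger-offer", str(logger)]
    # Step 3: decode the semantic tokens into speech.
    decode = [py, "tools/vqgan/inference.py", "-i", codes_file,
              "--checkpoint-path", decoder, "-o", out_wav,
              "--logger-offer", str(logger)]
    return [encode, generate, decode]


def run_pipeline(commands, repo_root="."):
    """Run each step in order, stopping at the first failure."""
    for cmd in commands:
        subprocess.run(cmd, cwd=repo_root, check=True)
```

For example, `run_pipeline(build_commands("ref.wav", "hello world", "reference transcript"))` would run the steps in order from the repository root. If `generate.py` writes its output into `out/`, pass a matching `codes_file` value.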