fish-speech-Derur

fish-speech-Derur is a fork of fishaudio/fish-speech in which I added more functionality to the files tools/vqgan/inference.py and tools/llama/generate.py.

Instructions for working with these files and my additions:

1. First, install all dependencies:

python -m pip install -r requirements.txt

python -m pip install -U torch torchvision torchaudio --index-url https://download.pytorch.org/whl/cu124
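After installing, it can be useful to confirm which PyTorch build ended up in the environment, since the second command targets the CUDA 12.4 wheels. The following is a minimal sketch, not part of this repository; the function name `describe_torch_install` is hypothetical, and it only uses standard PyTorch attributes.

```python
import importlib.util


def describe_torch_install() -> str:
    """Return a one-line summary of the local PyTorch installation."""
    # find_spec returns None when the package cannot be imported
    if importlib.util.find_spec("torch") is None:
        return "torch is not installed"

    import torch  # imported lazily so the script still runs without it

    # torch.version.cuda is None for CPU-only builds
    return (
        f"torch {torch.__version__}, "
        f"built for CUDA {torch.version.cuda}, "
        f"GPU available: {torch.cuda.is_available()}"
    )


if __name__ == "__main__":
    print(describe_torch_install())
```

If the summary reports a CPU-only build or no available GPU, inference may still run, but probably much more slowly.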

2. Download the model:

git clone https://huggingface.co/fishaudio/fish-speech-1.5 checkpoints/fish-speech-1.5
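The commands further down expect the model under checkpoints/fish-speech-1.5, including the decoder file firefly-gan-vq-fsq-8x1024-21hz-generator.pth. A small check like the one below can catch a wrong clone location before running inference. This is a sketch under that assumption; `missing_checkpoint_files` is a hypothetical helper, not something this fork provides.

```python
from pathlib import Path

# Decoder file name as used in the commands of this README
DECODER_FILE = "firefly-gan-vq-fsq-8x1024-21hz-generator.pth"


def missing_checkpoint_files(checkpoint_dir="checkpoints/fish-speech-1.5"):
    """Return the expected paths that do not exist (empty list means all found)."""
    root = Path(checkpoint_dir)
    expected = [root, root / DECODER_FILE]
    return [path for path in expected if not path.exists()]


if __name__ == "__main__":
    missing = missing_checkpoint_files()
    if missing:
        print("Missing:", ", ".join(str(path) for path in missing))
    else:
        print("Checkpoint directory looks complete.")
```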

3. Instructions from the original repository:

Notebook: inference.ipynb

WebUI:

python tools/run_webui.py \
    --llama-checkpoint-path checkpoints/fish-speech-1.5 \
    --decoder-checkpoint-path checkpoints/fish-speech-1.5/firefly-gan-vq-fsq-8x1024-21hz-generator.pth \
    # --compile

CLI:

1. Encode reference audio:

# Set the path to the reference audio file, then substitute it for {src_audio} in the command below
src_audio = r"D:\PythonProject\vo_hutao_draw_appear.wav"

# --logger-offer (1 = on, 0 = off) is an option added in this fork
python tools/vqgan/inference.py -i {src_audio} \
    --checkpoint-path "checkpoints/fish-speech-1.5/firefly-gan-vq-fsq-8x1024-21hz-generator.pth" \
    --logger-offer 0

2. Generate semantic tokens from text:

# --output-path and --logger-offer (1 = on, 0 = off) are options added in this fork
python tools/llama/generate.py \
    --text "hello world" \
    --prompt-text "The text corresponding to reference audio" \
    --prompt-tokens "fake.npy" \
    --checkpoint-path "checkpoints/fish-speech-1.5" \
    --num-samples 2 \
    --output-path "out/" \
    --logger-offer 0
# --compile
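With --num-samples 2, more than one token file is produced, and the next step takes one of them as input (codes_0.npy in the example). The sketch below lists such files in numeric order. It assumes the files are named codes_<N>.npy and are written into the directory given by --output-path; both points are assumptions inferred from the commands here, and `find_code_files` is a hypothetical helper.

```python
import re
from pathlib import Path

# Assumed naming scheme: codes_0.npy, codes_1.npy, ...
CODE_FILE_PATTERN = re.compile(r"codes_(\d+)\.npy")


def find_code_files(output_dir="out/"):
    """Return codes_<N>.npy files from output_dir, sorted by sample index N."""
    root = Path(output_dir)
    if not root.is_dir():
        return []
    indexed = []
    for path in root.iterdir():
        match = CODE_FILE_PATTERN.fullmatch(path.name)
        if match:
            indexed.append((int(match.group(1)), path))
    # Sort numerically so codes_10.npy comes after codes_2.npy
    return [path for _, path in sorted(indexed)]


if __name__ == "__main__":
    for code_file in find_code_files():
        print(code_file)
```

Each path returned can then be passed to the -i option of step 3.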

3. Generate speech from semantic tokens:

# -o and --logger-offer (1 = on, 0 = off) are options added in this fork
python tools/vqgan/inference.py \
    -i "codes_0.npy" \
    --checkpoint-path "checkpoints/fish-speech-1.5/firefly-gan-vq-fsq-8x1024-21hz-generator.pth" \
    -o "fake.wav" \
    --logger-offer 0
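The three CLI steps can also be chained from one Python script. The sketch below only assembles the same commands as argument lists and runs them in order. It is an illustration, not part of this fork: `build_pipeline` and `run_pipeline` are hypothetical names, it assumes the checkpoints/fish-speech-1.5 directory for every step, and it assumes step 1 writes fake.npy and step 2 produces the file passed as `codes_file`.

```python
import subprocess
import sys

CHECKPOINT_DIR = "checkpoints/fish-speech-1.5"
DECODER = f"{CHECKPOINT_DIR}/firefly-gan-vq-fsq-8x1024-21hz-generator.pth"


def build_pipeline(reference_wav, prompt_text, text,
                   codes_file="codes_0.npy", output_wav="fake.wav",
                   num_samples=2, logger=0):
    """Return the three CLI commands from this README as argument lists."""
    py = sys.executable  # run the scripts with the current interpreter
    encode = [
        py, "tools/vqgan/inference.py",
        "-i", reference_wav,
        "--checkpoint-path", DECODER,
        "--logger-offer", str(logger),
    ]
    generate = [
        py, "tools/llama/generate.py",
        "--text", text,
        "--prompt-text", prompt_text,
        "--prompt-tokens", "fake.npy",  # assumed output of the encode step
        "--checkpoint-path", CHECKPOINT_DIR,
        "--num-samples", str(num_samples),
        "--output-path", "out/",
        "--logger-offer", str(logger),
    ]
    decode = [
        py, "tools/vqgan/inference.py",
        "-i", codes_file,  # assumed location of the generated tokens
        "--checkpoint-path", DECODER,
        "-o", output_wav,
        "--logger-offer", str(logger),
    ]
    return [encode, generate, decode]


def run_pipeline(commands):
    """Run each command in order and stop at the first failure."""
    for command in commands:
        subprocess.run(command, check=True)


if __name__ == "__main__":
    steps = build_pipeline(
        reference_wav=r"D:\PythonProject\vo_hutao_draw_appear.wav",
        prompt_text="The text corresponding to reference audio",
        text="hello world",
    )
    for step in steps:
        print(" ".join(step))
    # run_pipeline(steps)  # uncomment to actually execute the three steps
```

Passing arguments as lists avoids shell quoting problems, which matters for Windows paths containing backslashes.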

GitHub: Derur