Mirror of https://github.com/babysor/Realtime-Voice-Clone-Chinese.git (synced 2026-02-03 02:23:47 +08:00)
Web server: Add latest changes (#96)
* Init App
* Init server.py (#93)
* Update requirements.txt: add requirement
* Run web.py!
* Restructure readme and add instructions to use the web server
* Fix training preprocess of vocoder

Co-authored-by: balala <Ozgay@users.noreply.github.com>
Co-authored-by: auau <auau@test.com>
Co-authored-by: babysor00 <babysor00@gmail.com>
@@ -53,7 +53,9 @@
#### 2.3 Train vocoder (Optional)

The vocoder has little effect on output quality, and 3 pretrained ones are already bundled; if you want to train your own, refer to the following commands.

* Preprocess the data:

`python vocoder_preprocess.py <datasets_root>`

`python vocoder_preprocess.py <datasets_root> -m <synthesizer_model_path>`

> Replace `<datasets_root>` with your dataset directory and `<synthesizer_model_path>` with the directory of your best synthesizer model, e.g. *synthesizer\saved_models\xxx*


* Train the wavernn vocoder:

`python vocoder_train.py <trainid> <datasets_root>`
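If you prefer to script these two steps, here is a minimal sketch, assuming it runs from the repository root; the wrapper function and its placeholder arguments are illustrative, not code from this commit:

```python
# Hypothetical helper that chains the two commands above; names and paths
# are placeholders, not part of the repository.
import subprocess
import sys
from typing import Optional

def build_vocoder(datasets_root: str, train_id: str,
                  synth_model_dir: Optional[str] = None) -> None:
    """Preprocess mels with the synthesizer, then train the wavernn vocoder."""
    preprocess = [sys.executable, "vocoder_preprocess.py", datasets_root]
    if synth_model_dir is not None:
        # Optional: point at your best synthesizer checkpoint directory.
        preprocess += ["-m", synth_model_dir]
    subprocess.run(preprocess, check=True)
    subprocess.run([sys.executable, "vocoder_train.py", train_id, datasets_root],
                   check=True)

if __name__ == "__main__":
    build_vocoder("my_datasets_root", "my_run")  # placeholder arguments
```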
@@ -70,7 +72,6 @@
`python web.py`

Once it is running, open the address in your browser; the default is `http://localhost:8080`
<img width="578" alt="bd64cd80385754afa599e3840504f45" src="https://user-images.githubusercontent.com/7423248/134275205-c95e6bd8-4f41-4eb5-9143-0390627baee1.png">
> Note: the UI is currently a bit buggy.
> * After clicking `Record` for the first time, wait a few seconds for the browser to start recording properly; otherwise the take will contain doubled audio.
> * When you finish recording, click `Stop` rather than clicking `Record` again.
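To confirm the server actually came up before opening the browser, a quick liveness check like this works, assuming the default address above:

```python
# Minimal check that web.py is serving; assumes the default port 8080.
from urllib.request import urlopen

with urlopen("http://localhost:8080", timeout=5) as resp:
    print(resp.status)  # 200 means the server is up
```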
@@ -80,6 +81,7 @@
### 3.2 Launch the toolbox:
`python demo_toolbox.py -d <datasets_root>`
> Specify a valid dataset directory. Any supported dataset found there is loaded automatically for debugging, and the directory also serves as storage for manually recorded audio.
<img width="1042" alt="d48ea37adf3660e657cfb047c10edbc" src="https://user-images.githubusercontent.com/7423248/134275227-c1ddf154-f118-4b77-8949-8c4c7daf25f0.png">
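For example, with an aidatatang_200zh corpus extracted under a hypothetical `D:\data` directory, the call would be `python demo_toolbox.py -d D:\data`.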
## Release Note
@@ -168,4 +170,6 @@ voc_pad =2
#### 7. When is training considered done?
The attention alignment must appear first; after that, the loss needs to drop low enough, which depends on your hardware and dataset. For reference, my attention appeared after 18k steps, and the loss fell below 0.4 after 50k steps.


@@ -54,7 +54,8 @@ Allowing parameter `--dataset {dataset}` to support aidatatang_200zh, magicdata,
#### 2.3 Train vocoder (Optional)

> Note: the vocoder makes little difference to output quality, so you may not need to train a new one.

* Preprocess the data:

`python vocoder_preprocess.py <datasets_root>`

`python vocoder_preprocess.py <datasets_root> -m <synthesizer_model_path>`

> Replace `<datasets_root>` with your dataset root and `<synthesizer_model_path>` with the directory of your best trained synthesizer model, e.g. *synthesizer\saved_models\xxx*

* Train the wavernn vocoder:

`python vocoder_train.py mandarin <datasets_root>`
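Here `mandarin` plays the role of `<trainid>` from the Chinese section above: it names the training run that the vocoder checkpoints are saved under.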
@@ -41,7 +41,7 @@ hparams = HParams(
     tts_lstm_dims = 1024,
     tts_postnet_K = 5,
     tts_num_highways = 4,
-    tts_dropout = 0.5,
+    tts_dropout = 0.2,
     tts_cleaner_names = ["basic_cleaners"],
     tts_stop_threshold = -3.4,  # Value below which audio generation ends.
                                 # For example, for a range of [-4, 4], this
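The truncated comment above refers to normalized mel values. As an illustrative sketch, assuming frames normalized to roughly [-4, 4] as the comment suggests (this is not the repository's actual generation loop), the threshold ends synthesis once a predicted frame is effectively silent:

```python
# Illustrative only: how a stop threshold of -3.4 could terminate synthesis,
# assuming mel frames normalized to roughly [-4, 4].
import numpy as np

TTS_STOP_THRESHOLD = -3.4

def should_stop(mel_frame: np.ndarray) -> bool:
    """True when every mel bin of the frame has fallen below the threshold."""
    return bool((mel_frame < TTS_STOP_THRESHOLD).all())

print(should_stop(np.full(80, -3.9)))  # True: near-silent frame
print(should_stop(np.zeros(80)))       # False: frame still carries energy
```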
@@ -16,8 +16,8 @@ if __name__ == "__main__":
     parser.add_argument("datasets_root", type=str, help=\
         "Path to the directory containing your SV2TTS directory. If you specify both --in_dir and "
         "--out_dir, this argument won't be used.")
-    parser.add_argument("--model_dir", type=str,
-                        default="synthesizer/saved_models/train3/", help=\
+    parser.add_argument("-m", "--model_dir", type=str,
+                        default="synthesizer/saved_models/mandarin/", help=\
         "Path to the pretrained model directory.")
     parser.add_argument("-i", "--in_dir", type=str, default=argparse.SUPPRESS, help= \
         "Path to the synthesizer directory that contains the mel spectrograms, the wavs and the "
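With this change, the short form `-m` and the long form `--model_dir` are interchangeable, and the default checkpoint directory moves from `synthesizer/saved_models/train3/` to `synthesizer/saved_models/mandarin/`, matching the `-m <synthesizer_model_path>` usage documented in the README hunks above.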