15 Commits

Author SHA1 Message Date
Vega
724194a4de Add code to control finetune layers (#154) 2021-10-23 10:25:43 +08:00
babysor00
31bc6656c3 Fix bug of importing GST and add more parameters in toolbox 2021-10-21 00:40:00 +08:00
洛竹
aa35fb3139 docs: this repo -> 本代码库 (#157)
Co-authored-by: 洛竹 <youngjuning@aliyun.com>
2021-10-20 22:54:31 +08:00
babysor00
727eafc51b Merge branch 'main' of https://github.com/babysor/Realtime-Voice-Clone-Chinese 2021-10-20 00:27:19 +08:00
babysor00
d328ecba81 Reconstruct UI of toolbox 2021-10-20 00:27:13 +08:00
Vega
fad574118c Update README-CN.md 2021-10-18 13:50:19 +08:00
babysor00
b0c156a537 Add new dataset support to preprocess parameter 2021-10-17 17:21:49 +08:00
Vega
724809abf4 Update README.md 2021-10-15 14:34:29 +08:00
Vega
05cd1a54ea Add new pretrain model with gst 2021-10-14 01:26:23 +08:00
李子
245099c740 支持data_aishell(SLR33)数据集 (#141)
* 支持data_aishell(SLR33)数据集

* 更新readme
2021-10-12 23:40:27 +08:00
babysor00
6dd2af49fe Merge branch 'main' of https://github.com/babysor/Realtime-Voice-Clone-Chinese 2021-10-12 20:02:05 +08:00
babysor00
8b43ec9a64 Fix bug pre-processing magicdata 2021-10-12 20:01:37 +08:00
Vega
2a99f0ff05 Add gst (#137)
* Commit with working GST

* Make it backward compatible

* Add readme
2021-10-12 19:43:29 +08:00
babysor00
a824b54122 补充预处理文档 2021-10-12 09:22:10 +08:00
weida wang
81befb91b0 Update ui.py (#136)
Add minimize and maximize button of window
2021-10-11 17:17:36 +08:00
12 changed files with 156 additions and 86 deletions

View File

@@ -5,10 +5,10 @@
### [English](README.md) | 中文
### [DEMO VIDEO](https://www.bilibili.com/video/BV17Q4y1B7mY/)
### [DEMO VIDEO](https://www.bilibili.com/video/BV17Q4y1B7mY/) | [Wiki教程](https://github.com/babysor/MockingBird/wiki/Quick-Start-(Newbie)) [训练教程](https://vaj2fgg8yn.feishu.cn/docs/doccn7kAbr3SJz0KM0SIDJ0Xnhd)
## 特性
🌍 **中文** 支持普通话并使用多种中文数据集进行测试aidatatang_200zh, magicdata, aishell3 biaobeiMozillaCommonVoice 等
🌍 **中文** 支持普通话并使用多种中文数据集进行测试aidatatang_200zh, magicdata, aishell3, biaobei, MozillaCommonVoice, data_aishell
🤩 **PyTorch** 适用于 pytorch已在 1.9.0 版本(最新于 2021 年 8 月中测试GPU Tesla T4 和 GTX 2060
@@ -18,6 +18,7 @@
🌍 **Webserver Ready** 可伺服你的训练结果,供远程调用
## 开始
### 1. 安装要求
> 按照原始存储库测试您是否已准备好所有环境。
**Python 3.7 或更高版本** 需要运行工具箱。
@@ -34,8 +35,10 @@
#### 2.1 使用数据集自己训练合成器模型与2.2二选一)
* 下载 数据集并解压:确保您可以访问 *train* 文件夹中的所有音频文件(如.wav
* 进行音频和梅尔频谱图预处理:
`python pre.py <datasets_root>`
传入参数 --dataset `{dataset}` 支持 aidatatang_200zh, magicdata, aishell3
`python pre.py <datasets_root> -d {dataset} -n {number}`
可传入参数
* -d`{dataset}` 指定数据集,支持 aidatatang_200zh, magicdata, aishell3, data_aishell, 不传默认为aidatatang_200zh
* -n `{number}` 指定并行数CPU 11770k + 32GB实测10没有问题
> 假如你下载的 `aidatatang_200zh`文件放在D盘`train`文件路径为 `D:\data\aidatatang_200zh\corpus\train` , 你的`datasets_root`就是 `D:\data\`
* 训练合成器:
@@ -48,7 +51,7 @@
| 作者 | 下载链接 | 效果预览 | 信息 |
| --- | ----------- | ----- | ----- |
| 作者 | https://pan.baidu.com/s/1VHSKIbxXQejtxi2at9IrpA [百度盘链接](https://pan.baidu.com/s/1VHSKIbxXQejtxi2at9IrpA ) 提取码:i183 | | 200k steps 只用aidatatang_200zh
| 作者 | https://pan.baidu.com/s/11FrUYBmLrSs_cQ7s3JTlPQ [百度盘链接](https://pan.baidu.com/s/11FrUYBmLrSs_cQ7s3JTlPQ) 提取码:gdn5 | | 25k steps 用3个开源数据集混合训练
|@FawenYo | https://drive.google.com/file/d/1H-YGOUHpmqKxJ9FRc6vAjPuqQki24UbC/view?usp=sharing [百度盘链接](https://pan.baidu.com/s/1vSYXO4wsLyjnF3Unl-Xoxg) 提取码1024 | [input](https://github.com/babysor/MockingBird/wiki/audio/self_test.mp3) [output](https://github.com/babysor/MockingBird/wiki/audio/export.wav) | 200k steps 台湾口音
|@miven| https://pan.baidu.com/s/1PI-hM3sn5wbeChRryX-RCQ 提取码2021 | https://www.bilibili.com/video/BV1uh411B7AD/ | 150k steps 旧版需根据[issue](https://github.com/babysor/MockingBird/issues/37)修复
@@ -121,7 +124,7 @@
| --- | ----------- | ----- | --------------------- |
| [1803.09017](https://arxiv.org/abs/1803.09017) | GlobalStyleToken (synthesizer)| Style Tokens: Unsupervised Style Modeling, Control and Transfer in End-to-End Speech Synthesis | 本代码库 |
| [2010.05646](https://arxiv.org/abs/2010.05646) | HiFi-GAN (vocoder)| Generative Adversarial Networks for Efficient and High Fidelity Speech Synthesis | 本代码库 |
|[**1806.04558**](https://arxiv.org/pdf/1806.04558.pdf) | SV2TTS | Transfer Learning from Speaker Verification to Multispeaker Text-To-Speech Synthesis | This repo |
|[**1806.04558**](https://arxiv.org/pdf/1806.04558.pdf) | SV2TTS | Transfer Learning from Speaker Verification to Multispeaker Text-To-Speech Synthesis | 本代码库 |
|[1802.08435](https://arxiv.org/pdf/1802.08435.pdf) | WaveRNN (vocoder) | Efficient Neural Audio Synthesis | [fatchord/WaveRNN](https://github.com/fatchord/WaveRNN) |
|[1703.10135](https://arxiv.org/pdf/1703.10135.pdf) | Tacotron (synthesizer) | Tacotron: Towards End-to-End Speech Synthesis | [fatchord/WaveRNN](https://github.com/fatchord/WaveRNN)
|[1710.10467](https://arxiv.org/pdf/1710.10467.pdf) | GE2E (encoder)| Generalized End-To-End Loss for Speaker Verification | 本代码库 |
@@ -133,6 +136,7 @@
| aidatatang_200zh | [OpenSLR](http://www.openslr.org/62/) | [Google Drive](https://drive.google.com/file/d/110A11KZoVe7vy6kXlLb6zVPLb_J91I_t/view?usp=sharing) |
| magicdata | [OpenSLR](http://www.openslr.org/68/) | [Google Drive (Dev set)](https://drive.google.com/file/d/1g5bWRUSNH68ycC6eNvtwh07nX3QhOOlo/view?usp=sharing) |
| aishell3 | [OpenSLR](https://www.openslr.org/93/) | [Google Drive](https://drive.google.com/file/d/1shYp_o4Z0X0cZSKQDtFirct2luFUwKzZ/view?usp=sharing) |
| data_aishell | [OpenSLR](https://www.openslr.org/33/) | |
> 解壓 aidatatang_200zh 後,還需將 `aidatatang_200zh\corpus\train`下的檔案全選解壓縮
#### 2.`<datasets_root>`是什麼意思?

View File

@@ -6,7 +6,7 @@
> English | [中文](README-CN.md)
## Features
🌍 **Chinese** supported mandarin and tested with multiple datasets: aidatatang_200zh, magicdata, aishell3, and etc.
🌍 **Chinese** supported mandarin and tested with multiple datasets: aidatatang_200zh, magicdata, aishell3, data_aishell, and etc.
🤩 **PyTorch** worked for pytorch, tested in version of 1.9.0(latest in August 2021), with GPU Tesla T4 and GTX 2060
@@ -36,7 +36,7 @@ You can either train your models or use existing ones:
* Download dataset and unzip: make sure you can access all .wav in folder
* Preprocess with the audios and the mel spectrograms:
`python pre.py <datasets_root>`
Allowing parameter `--dataset {dataset}` to support aidatatang_200zh, magicdata, aishell3, etc.
Allowing parameter `--dataset {dataset}` to support aidatatang_200zh, magicdata, aishell3, data_aishell, etc.If this parameter is not passed, the default dataset will be aidatatang_200zh.
* Train the synthesizer:
`python synthesizer_train.py mandarin <datasets_root>/SV2TTS/synthesizer`
@@ -49,7 +49,7 @@ Allowing parameter `--dataset {dataset}` to support aidatatang_200zh, magicdata,
| author | Download link | Preview Video | Info |
| --- | ----------- | ----- |----- |
| @myself | https://pan.baidu.com/s/1VHSKIbxXQejtxi2at9IrpA [Baidu](https://pan.baidu.com/s/1VHSKIbxXQejtxi2at9IrpA ) codei183 | | 200k steps only trained by aidatatang_200zh
|@FawenYo | https://drive.google.com/file/d/1H-YGOUHpmqKxJ9FRc6vAjPuqQki24UbC/view?usp=sharing [Baidu Pan](https://pan.baidu.com/s/1vSYXO4wsLyjnF3Unl-Xoxg) Code1024 | [input](https://github.com/babysor/MockingBird/wiki/audio/self_test.mp3) [output](https://github.com/babysor/MockingBird/wiki/audio/export.wav) | 200k steps with local accent of Taiwan
|@FawenYo | https://drive.google.com/file/d/1H-YGOUHpmqKxJ9FRc6vAjPuqQki24UbC/view?usp=sharing https://u.teknik.io/AYxWf.pt | [input](https://github.com/babysor/MockingBird/wiki/audio/self_test.mp3) [output](https://github.com/babysor/MockingBird/wiki/audio/export.wav) | 200k steps with local accent of Taiwan
|@miven| https://pan.baidu.com/s/1PI-hM3sn5wbeChRryX-RCQ code2021 | https://www.bilibili.com/video/BV1uh411B7AD/
#### 2.3 Train vocoder (Optional)
@@ -91,6 +91,7 @@ You can then try the toolbox:
| aidatatang_200zh | [OpenSLR](http://www.openslr.org/62/) | [Google Drive](https://drive.google.com/file/d/110A11KZoVe7vy6kXlLb6zVPLb_J91I_t/view?usp=sharing) |
| magicdata | [OpenSLR](http://www.openslr.org/68/) | [Google Drive (Dev set)](https://drive.google.com/file/d/1g5bWRUSNH68ycC6eNvtwh07nX3QhOOlo/view?usp=sharing) |
| aishell3 | [OpenSLR](https://www.openslr.org/93/) | [Google Drive](https://drive.google.com/file/d/1shYp_o4Z0X0cZSKQDtFirct2luFUwKzZ/view?usp=sharing) |
| data_aishell | [OpenSLR](https://www.openslr.org/33/) | |
> After unzip aidatatang_200zh, you need to unzip all the files under `aidatatang_200zh\corpus\train`
#### 2.What is`<datasets_root>`?

5
pre.py
View File

@@ -12,7 +12,8 @@ import argparse
recognized_datasets = [
"aidatatang_200zh",
"magicdata",
"aishell3"
"aishell3",
"data_aishell"
]
if __name__ == "__main__":
@@ -40,7 +41,7 @@ if __name__ == "__main__":
"Use this option when dataset does not include alignments\
(these are used to split long audio files into sub-utterances.)")
parser.add_argument("-d", "--dataset", type=str, default="aidatatang_200zh", help=\
"Name of the dataset to process, allowing values: magicdata, aidatatang_200zh, aishell3.")
"Name of the dataset to process, allowing values: magicdata, aidatatang_200zh, aishell3, data_aishell.")
parser.add_argument("-e", "--encoder_model_fpath", type=Path, default="encoder/saved_models/pretrained.pt", help=\
"Path your trained encoder model.")
parser.add_argument("-ne", "--n_processes_embed", type=int, default=1, help=\

View File

@@ -62,9 +62,11 @@ hparams = HParams(
tts_clip_grad_norm = 1.0, # clips the gradient norm to prevent explosion - set to None if not needed
tts_eval_interval = 500, # Number of steps between model evaluation (sample generation)
# Set to -1 to generate after completing epoch, or 0 to disable
tts_eval_num_samples = 1, # Makes this number of samples
## For finetune usage, if set, only selected layers will be trained, available: encoder,encoder_proj,gst,decoder,postnet,post_proj
tts_finetune_layers = [],
### Data Preprocessing
max_mel_frames = 900,
rescale = True,

View File

@@ -70,7 +70,7 @@ class Synthesizer:
def synthesize_spectrograms(self, texts: List[str],
embeddings: Union[np.ndarray, List[np.ndarray]],
return_alignments=False, style_idx=0):
return_alignments=False, style_idx=0, min_stop_token=5):
"""
Synthesizes mel spectrograms from texts and speaker embeddings.
@@ -125,7 +125,7 @@ class Synthesizer:
speaker_embeddings = torch.tensor(speaker_embeds).float().to(self.device)
# Inference
_, mels, alignments = self._model.generate(chars, speaker_embeddings, style_idx=style_idx)
_, mels, alignments = self._model.generate(chars, speaker_embeddings, style_idx=style_idx, min_stop_token=min_stop_token)
mels = mels.detach().cpu().numpy()
for m in mels:
# Trim silence from end of each spectrogram

View File

@@ -419,7 +419,7 @@ class Tacotron(nn.Module):
return mel_outputs, linear, attn_scores, stop_outputs
def generate(self, x, speaker_embedding=None, steps=200, style_idx=0):
def generate(self, x, speaker_embedding=None, steps=200, style_idx=0, min_stop_token=5):
self.eval()
device = next(self.parameters()).device # use same device as parameters
@@ -454,9 +454,9 @@ class Tacotron(nn.Module):
scale[:] = 0.3
speaker_embedding = (gst_embed[style_idx] * scale).astype(np.float32)
speaker_embedding = torch.from_numpy(np.tile(speaker_embedding, (x.shape[0], 1))).to(device)
style_embed = self.gst(speaker_embedding)
style_embed = style_embed.expand_as(encoder_seq)
encoder_seq = encoder_seq + style_embed
style_embed = self.gst(speaker_embedding)
style_embed = style_embed.expand_as(encoder_seq)
encoder_seq = encoder_seq + style_embed
encoder_seq_proj = self.encoder_proj(encoder_seq)
# Need a couple of lists for outputs
@@ -472,7 +472,7 @@ class Tacotron(nn.Module):
attn_scores.append(scores)
stop_outputs.extend([stop_tokens] * self.r)
# Stop the loop when all stop tokens in batch exceed threshold
if (stop_tokens > 0.5).all() and t > 10: break
if (stop_tokens * 10 > min_stop_token).all() and t > 10: break
# Concat the mel outputs into sequence
mel_outputs = torch.cat(mel_outputs, dim=2)
@@ -496,6 +496,15 @@ class Tacotron(nn.Module):
for p in self.parameters():
if p.dim() > 1: nn.init.xavier_uniform_(p)
def finetune_partial(self, whitelist_layers):
self.zero_grad()
for name, child in self.named_children():
if name in whitelist_layers:
print("Trainable Layer: %s" % name)
print("Trainable Parameters: %.3f" % sum([np.prod(p.size()) for p in child.parameters()]))
for param in child.parameters():
param.requires_grad = False
def get_step(self):
return self.step.data.item()

View File

@@ -7,7 +7,7 @@ from tqdm import tqdm
import numpy as np
from encoder import inference as encoder
from synthesizer.preprocess_speaker import preprocess_speaker_general
from synthesizer.preprocess_transcript import preprocess_transcript_aishell3
from synthesizer.preprocess_transcript import preprocess_transcript_aishell3, preprocess_transcript_magicdata
data_info = {
"aidatatang_200zh": {
@@ -18,13 +18,19 @@ data_info = {
"magicdata": {
"subfolders": ["train"],
"trans_filepath": "train/TRANS.txt",
"speak_func": preprocess_speaker_general
"speak_func": preprocess_speaker_general,
"transcript_func": preprocess_transcript_magicdata,
},
"aishell3":{
"subfolders": ["train/wav"],
"trans_filepath": "train/content.txt",
"speak_func": preprocess_speaker_general,
"transcript_func": preprocess_transcript_aishell3,
},
"data_aishell":{
"subfolders": ["wav/train"],
"trans_filepath": "transcript/aishell_transcript_v0.8.txt",
"speak_func": preprocess_speaker_general
}
}

View File

@@ -6,4 +6,13 @@ def preprocess_transcript_aishell3(dict_info, dict_transcript):
transList = []
for i in range(2, len(v), 2):
transList.append(v[i])
dict_info[v[0]] = " ".join(transList)
dict_info[v[0]] = " ".join(transList)
def preprocess_transcript_magicdata(dict_info, dict_transcript):
for v in dict_transcript:
if not v:
continue
v = v.strip().replace("\n","").replace("\t"," ").split(" ")
dict_info[v[0]] = " ".join(v[2:])

View File

@@ -93,7 +93,7 @@ def train(run_id: str, syn_dir: str, models_dir: str, save_every: int,
speaker_embedding_size=hparams.speaker_embedding_size).to(device)
# Initialize the optimizer
optimizer = optim.Adam(model.parameters())
optimizer = optim.Adam(model.parameters(), amsgrad=True)
# Load the weights
if force_restart or not weights_fpath.exists():
@@ -146,7 +146,6 @@ def train(run_id: str, syn_dir: str, models_dir: str, save_every: int,
continue
model.r = r
# Begin the training
simple_table([(f"Steps with r={r}", str(training_steps // 1000) + "k Steps"),
("Batch Size", batch_size),
@@ -155,6 +154,8 @@ def train(run_id: str, syn_dir: str, models_dir: str, save_every: int,
for p in optimizer.param_groups:
p["lr"] = lr
if hparams.tts_finetune_layers is not None and len(hparams.tts_finetune_layers) > 0:
model.finetune_partial(hparams.tts_finetune_layers)
data_loader = DataLoader(dataset,
collate_fn=collate_synthesizer,

View File

@@ -234,7 +234,8 @@ class Toolbox:
texts = processed_texts
embed = self.ui.selected_utterance.embed
embeds = [embed] * len(texts)
specs = self.synthesizer.synthesize_spectrograms(texts, embeds, style_idx=int(self.ui.style_idx_textbox.text()))
min_token = int(self.ui.token_slider.value())
specs = self.synthesizer.synthesize_spectrograms(texts, embeds, style_idx=int(self.ui.style_slider.value()), min_stop_token=min_token)
breaks = [spec.shape[1] for spec in specs]
spec = np.concatenate(specs, axis=1)

BIN
toolbox/assets/mb.png Normal file

Binary file not shown.

After

Width:  |  Height:  |  Size: 5.6 KiB

View File

@@ -2,6 +2,7 @@ import matplotlib.pyplot as plt
from matplotlib.backends.backend_qt5agg import FigureCanvasQTAgg as FigureCanvas
from matplotlib.figure import Figure
from PyQt5.QtCore import Qt, QStringListModel
from PyQt5 import QtGui
from PyQt5.QtWidgets import *
from encoder.inference import plot_embedding_as_heatmap
from toolbox.utterance import Utterance
@@ -420,7 +421,10 @@ class UI(QDialog):
## Initialize the application
self.app = QApplication(sys.argv)
super().__init__(None)
self.setWindowTitle("SV2TTS toolbox")
self.setWindowTitle("MockingBird GUI")
self.setWindowIcon(QtGui.QIcon('toolbox\\assets\\mb.png'))
self.setWindowFlag(Qt.WindowMinimizeButtonHint, True)
self.setWindowFlag(Qt.WindowMaximizeButtonHint, True)
## Main layouts
@@ -430,21 +434,24 @@ class UI(QDialog):
# Browser
browser_layout = QGridLayout()
root_layout.addLayout(browser_layout, 0, 0, 1, 2)
root_layout.addLayout(browser_layout, 0, 0, 1, 8)
# Generation
gen_layout = QVBoxLayout()
root_layout.addLayout(gen_layout, 0, 2, 1, 2)
# Projections
self.projections_layout = QVBoxLayout()
root_layout.addLayout(self.projections_layout, 1, 0, 1, 1)
root_layout.addLayout(gen_layout, 0, 8)
# Visualizations
vis_layout = QVBoxLayout()
root_layout.addLayout(vis_layout, 1, 1, 1, 3)
root_layout.addLayout(vis_layout, 1, 0, 2, 8)
# Output
output_layout = QGridLayout()
vis_layout.addLayout(output_layout, 0)
# Projections
self.projections_layout = QVBoxLayout()
root_layout.addLayout(self.projections_layout, 1, 8, 2, 2)
## Projections
# UMap
fig, self.umap_ax = plt.subplots(figsize=(3, 3), facecolor="#F0F0F0")
@@ -458,80 +465,88 @@ class UI(QDialog):
## Browser
# Dataset, speaker and utterance selection
i = 0
self.dataset_box = QComboBox()
browser_layout.addWidget(QLabel("<b>Dataset</b>"), i, 0)
browser_layout.addWidget(self.dataset_box, i + 1, 0)
self.speaker_box = QComboBox()
browser_layout.addWidget(QLabel("<b>Speaker</b>"), i, 1)
browser_layout.addWidget(self.speaker_box, i + 1, 1)
self.utterance_box = QComboBox()
browser_layout.addWidget(QLabel("<b>Utterance</b>"), i, 2)
browser_layout.addWidget(self.utterance_box, i + 1, 2)
self.browser_load_button = QPushButton("Load")
browser_layout.addWidget(self.browser_load_button, i + 1, 3)
i += 2
# Random buttons
source_groupbox = QGroupBox('Source(源音频)')
source_layout = QGridLayout()
source_groupbox.setLayout(source_layout)
browser_layout.addWidget(source_groupbox, i, 0, 1, 4)
self.dataset_box = QComboBox()
source_layout.addWidget(QLabel("Dataset(数据集):"), i, 0)
source_layout.addWidget(self.dataset_box, i, 1)
self.random_dataset_button = QPushButton("Random")
browser_layout.addWidget(self.random_dataset_button, i, 0)
source_layout.addWidget(self.random_dataset_button, i, 2)
i += 1
self.speaker_box = QComboBox()
source_layout.addWidget(QLabel("Speaker(说话者)"), i, 0)
source_layout.addWidget(self.speaker_box, i, 1)
self.random_speaker_button = QPushButton("Random")
browser_layout.addWidget(self.random_speaker_button, i, 1)
source_layout.addWidget(self.random_speaker_button, i, 2)
i += 1
self.utterance_box = QComboBox()
source_layout.addWidget(QLabel("Utterance(音频):"), i, 0)
source_layout.addWidget(self.utterance_box, i, 1)
self.random_utterance_button = QPushButton("Random")
browser_layout.addWidget(self.random_utterance_button, i, 2)
source_layout.addWidget(self.random_utterance_button, i, 2)
i += 1
source_layout.addWidget(QLabel("<b>Use(使用):</b>"), i, 0)
self.browser_load_button = QPushButton("Load Above(加载上面)")
source_layout.addWidget(self.browser_load_button, i, 1, 1, 2)
self.auto_next_checkbox = QCheckBox("Auto select next")
self.auto_next_checkbox.setChecked(True)
browser_layout.addWidget(self.auto_next_checkbox, i, 3)
i += 1
source_layout.addWidget(self.auto_next_checkbox, i+1, 1)
self.browser_browse_button = QPushButton("Browse(打开本地)")
source_layout.addWidget(self.browser_browse_button, i, 3)
self.record_button = QPushButton("Record(录音)")
source_layout.addWidget(self.record_button, i+1, 3)
i += 2
# Utterance box
browser_layout.addWidget(QLabel("<b>Use embedding from:</b>"), i, 0)
browser_layout.addWidget(QLabel("<b>Current(当前):</b>"), i, 0)
self.utterance_history = QComboBox()
browser_layout.addWidget(self.utterance_history, i, 1, 1, 3)
i += 1
# Random & next utterance buttons
self.browser_browse_button = QPushButton("Browse")
browser_layout.addWidget(self.browser_browse_button, i, 0)
self.record_button = QPushButton("Record")
browser_layout.addWidget(self.record_button, i, 1)
self.play_button = QPushButton("Play")
browser_layout.addWidget(self.utterance_history, i, 1)
self.play_button = QPushButton("Play(播放)")
browser_layout.addWidget(self.play_button, i, 2)
self.stop_button = QPushButton("Stop")
self.stop_button = QPushButton("Stop(暂停)")
browser_layout.addWidget(self.stop_button, i, 3)
i += 1
i += 1
model_groupbox = QGroupBox('Models(模型选择)')
model_layout = QHBoxLayout()
model_groupbox.setLayout(model_layout)
browser_layout.addWidget(model_groupbox, i, 0, 1, 4)
# Model and audio output selection
self.encoder_box = QComboBox()
browser_layout.addWidget(QLabel("<b>Encoder</b>"), i, 0)
browser_layout.addWidget(self.encoder_box, i + 1, 0)
model_layout.addWidget(QLabel("Encoder:"))
model_layout.addWidget(self.encoder_box)
self.synthesizer_box = QComboBox()
browser_layout.addWidget(QLabel("<b>Synthesizer</b>"), i, 1)
browser_layout.addWidget(self.synthesizer_box, i + 1, 1)
model_layout.addWidget(QLabel("Synthesizer:"))
model_layout.addWidget(self.synthesizer_box)
self.vocoder_box = QComboBox()
browser_layout.addWidget(QLabel("<b>Vocoder</b>"), i, 2)
browser_layout.addWidget(self.vocoder_box, i + 1, 2)
model_layout.addWidget(QLabel("Vocoder:"))
model_layout.addWidget(self.vocoder_box)
self.audio_out_devices_cb=QComboBox()
browser_layout.addWidget(QLabel("<b>Audio Output</b>"), i, 3)
browser_layout.addWidget(self.audio_out_devices_cb, i + 1, 3)
i += 2
#Replay & Save Audio
browser_layout.addWidget(QLabel("<b>Toolbox Output:</b>"), i, 0)
i = 0
output_layout.addWidget(QLabel("<b>Toolbox Output:</b>"), i, 0)
self.waves_cb = QComboBox()
self.waves_cb_model = QStringListModel()
self.waves_cb.setModel(self.waves_cb_model)
self.waves_cb.setToolTip("Select one of the last generated waves in this section for replaying or exporting")
browser_layout.addWidget(self.waves_cb, i, 1)
output_layout.addWidget(self.waves_cb, i, 1)
self.replay_wav_button = QPushButton("Replay")
self.replay_wav_button.setToolTip("Replay last generated vocoder")
browser_layout.addWidget(self.replay_wav_button, i, 2)
output_layout.addWidget(self.replay_wav_button, i, 2)
self.export_wav_button = QPushButton("Export")
self.export_wav_button.setToolTip("Save last generated vocoder audio in filesystem as a wav file")
browser_layout.addWidget(self.export_wav_button, i, 3)
output_layout.addWidget(self.export_wav_button, i, 3)
self.audio_out_devices_cb=QComboBox()
i += 1
output_layout.addWidget(QLabel("<b>Audio Output</b>"), i, 0)
output_layout.addWidget(self.audio_out_devices_cb, i, 1)
## Embed & spectrograms
vis_layout.addStretch()
@@ -552,7 +567,6 @@ class UI(QDialog):
for side in ["top", "right", "bottom", "left"]:
ax.spines[side].set_visible(False)
## Generation
self.text_prompt = QPlainTextEdit(default_text)
gen_layout.addWidget(self.text_prompt, stretch=1)
@@ -574,14 +588,36 @@ class UI(QDialog):
self.seed_textbox = QLineEdit()
self.seed_textbox.setMaximumWidth(80)
layout_seed.addWidget(self.seed_textbox, 0, 1)
layout_seed.addWidget(QLabel("Style#:(0~9)"), 0, 2)
self.style_idx_textbox = QLineEdit("-1")
self.style_idx_textbox.setMaximumWidth(80)
layout_seed.addWidget(self.style_idx_textbox, 0, 3)
self.trim_silences_checkbox = QCheckBox("Enhance vocoder output")
self.trim_silences_checkbox.setToolTip("When checked, trims excess silence in vocoder output."
" This feature requires `webrtcvad` to be installed.")
layout_seed.addWidget(self.trim_silences_checkbox, 0, 4, 1, 2)
layout_seed.addWidget(self.trim_silences_checkbox, 0, 2, 1, 2)
self.style_slider = QSlider(Qt.Horizontal)
self.style_slider.setTickInterval(1)
self.style_slider.setFocusPolicy(Qt.NoFocus)
self.style_slider.setSingleStep(1)
self.style_slider.setRange(-1, 9)
self.style_value_label = QLabel("-1")
self.style_slider.setValue(-1)
layout_seed.addWidget(QLabel("Style:"), 1, 0)
self.style_slider.valueChanged.connect(lambda s: self.style_value_label.setNum(s))
layout_seed.addWidget(self.style_value_label, 1, 1)
layout_seed.addWidget(self.style_slider, 1, 3)
self.token_slider = QSlider(Qt.Horizontal)
self.token_slider.setTickInterval(1)
self.token_slider.setFocusPolicy(Qt.NoFocus)
self.token_slider.setSingleStep(1)
self.token_slider.setRange(3, 9)
self.token_value_label = QLabel("5")
self.token_slider.setValue(4)
layout_seed.addWidget(QLabel("Accuracy(精度):"), 2, 0)
self.token_slider.valueChanged.connect(lambda s: self.token_value_label.setNum(s))
layout_seed.addWidget(self.token_value_label, 2, 1)
layout_seed.addWidget(self.token_slider, 2, 3)
gen_layout.addLayout(layout_seed)
self.loading_bar = QProgressBar()
@@ -595,7 +631,7 @@ class UI(QDialog):
## Set the size of the window and of the elements
max_size = QDesktopWidget().availableGeometry(self).size() * 0.8
max_size = QDesktopWidget().availableGeometry(self).size() * 0.5
self.resize(max_size)
## Finalize the display