11 Commits

Author SHA1 Message Date
babysor00
178787887b Add readme 2021-10-12 19:38:53 +08:00
babysor00
43c86eb411 Make it backward compatible 2021-10-12 09:12:58 +08:00
babysor00
37f11ab9ce Commit with working GST 2021-10-11 21:52:15 +08:00
babysor00
e2017d0314 Merge branch 'main' of https://github.com/babysor/Realtime-Voice-Clone-Chinese into main 2021-10-05 10:48:58 +08:00
babysor00
547ac816df Update demo and training param 2021-10-05 10:48:54 +08:00
Ji Zhang
6b4ab39601 add alternative download source for dataset (google drive) (#112) 2021-10-03 10:10:40 +08:00
babysor00
b46e7a7866 New web with selecting wav files 2021-10-01 22:13:39 +08:00
babysor00
8a384a1191 Merge branch 'main' of https://github.com/babysor/Realtime-Voice-Clone-Chinese into main 2021-10-01 09:33:31 +08:00
Nemo
11154783d8 web tool box update UI (#111)
* web tool box update UI

* update img
2021-10-01 00:32:29 +08:00
AkifSaeed20
d52db0444e Update launch.json (#109) 2021-10-01 00:22:43 +08:00
babysor00
790d11a58b Allow to train encoder 2021-10-01 00:01:33 +08:00
23 changed files with 439 additions and 100 deletions

.gitignore (2 changes)

@@ -17,5 +17,7 @@
 *.sh
 synthesizer/saved_models/*
 vocoder/saved_models/*
+encoder/saved_models/*
 cp_hifigan/*
 !vocoder/saved_models/pretrained/*
+!encoder/saved_models/pretrained.pt

.vscode/launch.json (18 changes)

@@ -17,7 +17,7 @@
             "request": "launch",
             "program": "vocoder_preprocess.py",
             "console": "integratedTerminal",
-            "args": ["..\\..\\chs1"]
+            "args": ["..\\audiodata"]
         },
         {
             "name": "Python: Vocoder Train",
@@ -25,15 +25,23 @@
             "request": "launch",
             "program": "vocoder_train.py",
             "console": "integratedTerminal",
-            "args": ["dev", "..\\..\\chs1"]
+            "args": ["dev", "..\\audiodata"]
         },
         {
-            "name": "Python: demo box",
+            "name": "Python: Demo Box",
             "type": "python",
            "request": "launch",
             "program": "demo_toolbox.py",
             "console": "integratedTerminal",
-            "args": ["-d", "..\\..\\chs"]
-        }
+            "args": ["-d","..\\audiodata"]
+        },
+        {
+            "name": "Python: Synth Train",
+            "type": "python",
+            "request": "launch",
+            "program": "synthesizer_train.py",
+            "console": "integratedTerminal",
+            "args": ["my_run", "..\\"]
+        },
     ]
 }

(Chinese README)

@@ -5,7 +5,7 @@
 ### [English](README.md) | 中文

-### [DEMO VIDEO](https://www.bilibili.com/video/BV1sA411P7wM/)
+### [DEMO VIDEO](https://www.bilibili.com/video/BV17Q4y1B7mY/)

 ## 特性
 🌍 **中文** 支持普通话并使用多种中文数据集进行测试：aidatatang_200zh, magicdata, aishell3, biaobei, MozillaCommonVoice 等
@@ -73,7 +73,7 @@
 ### 3.1 启动Web程序
 `python web.py`
 运行成功后在浏览器打开地址, 默认为 `http://localhost:8080`
-<img width="578" alt="bd64cd80385754afa599e3840504f45" src="https://user-images.githubusercontent.com/7423248/134275205-c95e6bd8-4f41-4eb5-9143-0390627baee1.png">
+![123](https://user-images.githubusercontent.com/12797292/135494044-ae59181c-fe3a-406f-9c7d-d21d12fdb4cb.png)
 > 目前界面比较buggy,
 > * 第一次点击`录制`要等待几秒浏览器正常启动录音,否则会有重音
 > * 录制结束不要再点`录制`而是`停止`
@@ -119,15 +119,20 @@
 | URL | Designation | 标题 | 实现源码 |
 | --- | ----------- | ----- | --------------------- |
+| [1803.09017](https://arxiv.org/abs/1803.09017) | GlobalStyleToken (synthesizer)| Style Tokens: Unsupervised Style Modeling, Control and Transfer in End-to-End Speech Synthesis | 本代码库 |
 | [2010.05646](https://arxiv.org/abs/2010.05646) | HiFi-GAN (vocoder)| Generative Adversarial Networks for Efficient and High Fidelity Speech Synthesis | 本代码库 |
-|[**1806.04558**](https://arxiv.org/pdf/1806.04558.pdf) | **SV2TTS** | **Transfer Learning from Speaker Verification to Multispeaker Text-To-Speech Synthesis** | This repo |
+|[**1806.04558**](https://arxiv.org/pdf/1806.04558.pdf) | SV2TTS | Transfer Learning from Speaker Verification to Multispeaker Text-To-Speech Synthesis | This repo |
 |[1802.08435](https://arxiv.org/pdf/1802.08435.pdf) | WaveRNN (vocoder) | Efficient Neural Audio Synthesis | [fatchord/WaveRNN](https://github.com/fatchord/WaveRNN) |
 |[1703.10135](https://arxiv.org/pdf/1703.10135.pdf) | Tacotron (synthesizer) | Tacotron: Towards End-to-End Speech Synthesis | [fatchord/WaveRNN](https://github.com/fatchord/WaveRNN)
 |[1710.10467](https://arxiv.org/pdf/1710.10467.pdf) | GE2E (encoder)| Generalized End-To-End Loss for Speaker Verification | 本代码库 |
 ## 常見問題(FQ&A)
 #### 1.數據集哪裡下載?
-[aidatatang_200zh](http://www.openslr.org/62/)、[magicdata](http://www.openslr.org/68/)、[aishell3](http://www.openslr.org/93/)
+| 数据集 | OpenSLR地址 | 其他源 (Google Drive, Baidu网盘等) |
+| --- | ----------- | ---------------|
+| aidatatang_200zh | [OpenSLR](http://www.openslr.org/62/) | [Google Drive](https://drive.google.com/file/d/110A11KZoVe7vy6kXlLb6zVPLb_J91I_t/view?usp=sharing) |
+| magicdata | [OpenSLR](http://www.openslr.org/68/) | [Google Drive (Dev set)](https://drive.google.com/file/d/1g5bWRUSNH68ycC6eNvtwh07nX3QhOOlo/view?usp=sharing) |
+| aishell3 | [OpenSLR](https://www.openslr.org/93/) | [Google Drive](https://drive.google.com/file/d/1shYp_o4Z0X0cZSKQDtFirct2luFUwKzZ/view?usp=sharing) |
 > 解壓 aidatatang_200zh 後，還需將 `aidatatang_200zh\corpus\train`下的檔案全選解壓縮
 #### 2.`<datasets_root>`是什麼意思?

README.md

@@ -16,7 +16,7 @@
 🌍 **Webserver Ready** to serve your result with remote calling

-### [DEMO VIDEO](https://www.bilibili.com/video/BV1sA411P7wM/)
+### [DEMO VIDEO](https://www.bilibili.com/video/BV17Q4y1B7mY/)

 ## Quick Start
@@ -77,6 +77,7 @@ You can then try the toolbox:
 | URL | Designation | Title | Implementation source |
 | --- | ----------- | ----- | --------------------- |
+| [1803.09017](https://arxiv.org/abs/1803.09017) | GlobalStyleToken (synthesizer)| Style Tokens: Unsupervised Style Modeling, Control and Transfer in End-to-End Speech Synthesis | This repo |
 | [2010.05646](https://arxiv.org/abs/2010.05646) | HiFi-GAN (vocoder)| Generative Adversarial Networks for Efficient and High Fidelity Speech Synthesis | This repo |
 |[**1806.04558**](https://arxiv.org/pdf/1806.04558.pdf) | **SV2TTS** | **Transfer Learning from Speaker Verification to Multispeaker Text-To-Speech Synthesis** | This repo |
 |[1802.08435](https://arxiv.org/pdf/1802.08435.pdf) | WaveRNN (vocoder) | Efficient Neural Audio Synthesis | [fatchord/WaveRNN](https://github.com/fatchord/WaveRNN) |
@@ -85,7 +86,11 @@ You can then try the toolbox:
 ## F Q&A
 #### 1.Where can I download the dataset?
-[aidatatang_200zh](http://www.openslr.org/62/)、[magicdata](http://www.openslr.org/68/)、[aishell3](http://www.openslr.org/93/)
+| Dataset | Original Source | Alternative Sources |
+| --- | ----------- | ---------------|
+| aidatatang_200zh | [OpenSLR](http://www.openslr.org/62/) | [Google Drive](https://drive.google.com/file/d/110A11KZoVe7vy6kXlLb6zVPLb_J91I_t/view?usp=sharing) |
+| magicdata | [OpenSLR](http://www.openslr.org/68/) | [Google Drive (Dev set)](https://drive.google.com/file/d/1g5bWRUSNH68ycC6eNvtwh07nX3QhOOlo/view?usp=sharing) |
+| aishell3 | [OpenSLR](https://www.openslr.org/93/) | [Google Drive](https://drive.google.com/file/d/1shYp_o4Z0X0cZSKQDtFirct2luFUwKzZ/view?usp=sharing) |
 > After unzip aidatatang_200zh, you need to unzip all the files under `aidatatang_200zh\corpus\train`
 #### 2.What is`<datasets_root>`?

encoder/preprocess.py

@@ -117,6 +117,15 @@ def _preprocess_speaker_dirs(speaker_dirs, dataset_name, datasets_root, out_dir,
     logger.finalize()
     print("Done preprocessing %s.\n" % dataset_name)

+def preprocess_aidatatang_200zh(datasets_root: Path, out_dir: Path, skip_existing=False):
+    dataset_name = "aidatatang_200zh"
+    dataset_root, logger = _init_preprocess_dataset(dataset_name, datasets_root, out_dir)
+    if not dataset_root:
+        return
+    # Preprocess all speakers
+    speaker_dirs = list(dataset_root.joinpath("corpus", "train").glob("*"))
+    _preprocess_speaker_dirs(speaker_dirs, dataset_name, datasets_root, out_dir, "wav",
+                             skip_existing, logger)
+
 def preprocess_librispeech(datasets_root: Path, out_dir: Path, skip_existing=False):
     for dataset_name in librispeech_datasets["train"]["other"]:

Binary file not shown.

encoder_preprocess.py

@@ -1,4 +1,4 @@
-from encoder.preprocess import preprocess_librispeech, preprocess_voxceleb1, preprocess_voxceleb2
+from encoder.preprocess import preprocess_librispeech, preprocess_voxceleb1, preprocess_voxceleb2, preprocess_aidatatang_200zh
 from utils.argutils import print_args
 from pathlib import Path
 import argparse
@@ -10,17 +10,7 @@ if __name__ == "__main__":
     parser = argparse.ArgumentParser(
         description="Preprocesses audio files from datasets, encodes them as mel spectrograms and "
                     "writes them to the disk. This will allow you to train the encoder. The "
-                    "datasets required are at least one of VoxCeleb1, VoxCeleb2 and LibriSpeech. "
-                    "Ideally, you should have all three. You should extract them as they are "
-                    "after having downloaded them and put them in a same directory, e.g.:\n"
-                    "-[datasets_root]\n"
-                    "  -LibriSpeech\n"
-                    "    -train-other-500\n"
-                    "  -VoxCeleb1\n"
-                    "    -wav\n"
-                    "    -vox1_meta.csv\n"
-                    "  -VoxCeleb2\n"
-                    "    -dev",
+                    "datasets required are at least one of LibriSpeech, VoxCeleb1, VoxCeleb2, aidatatang_200zh. ",
         formatter_class=MyFormatter
     )
     parser.add_argument("datasets_root", type=Path, help=\
@@ -29,7 +19,7 @@ if __name__ == "__main__":
         "Path to the output directory that will contain the mel spectrograms. If left out, "
         "defaults to <datasets_root>/SV2TTS/encoder/")
     parser.add_argument("-d", "--datasets", type=str,
-                        default="librispeech_other,voxceleb1,voxceleb2", help=\
+                        default="librispeech_other,voxceleb1,aidatatang_200zh", help=\
         "Comma-separated list of the name of the datasets you want to preprocess. Only the train "
         "set of these datasets will be used. Possible names: librispeech_other, voxceleb1, "
         "voxceleb2.")
@@ -63,6 +53,7 @@ if __name__ == "__main__":
         "librispeech_other": preprocess_librispeech,
         "voxceleb1": preprocess_voxceleb1,
         "voxceleb2": preprocess_voxceleb2,
+        "aidatatang_200zh": preprocess_aidatatang_200zh,
     }
     args = vars(args)
     for dataset in args.pop("datasets"):
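The new dataset plugs into the existing dataset-to-function map, so `python encoder_preprocess.py <datasets_root> -d aidatatang_200zh` drives it from the command line. A minimal sketch of the same call from Python, assuming `aidatatang_200zh` has already been extracted under a datasets root (the path below is hypothetical):

```python
from pathlib import Path

from encoder.preprocess import preprocess_aidatatang_200zh

# Hypothetical layout: <datasets_root>/aidatatang_200zh/corpus/train/<speaker folders>
datasets_root = Path("../audiodata")
# Same default output location the script documents: <datasets_root>/SV2TTS/encoder/
out_dir = datasets_root.joinpath("SV2TTS", "encoder")

preprocess_aidatatang_200zh(datasets_root, out_dir, skip_existing=True)
```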

requirements.txt

@@ -20,3 +20,4 @@ flask_wtf
 flask_cors
 gevent==21.8.0
 flask_restx
+tensorboard

synthesizer/gst_hyperparameters.py (new file)

@@ -0,0 +1,13 @@
class GSTHyperparameters():
E = 512
# reference encoder
ref_enc_filters = [32, 32, 64, 64, 128, 128]
# style token layer
token_num = 10
# token_emb_size = 256
num_heads = 8
n_mels = 256 # Number of Mel banks to generate

synthesizer/hparams.py

@@ -49,12 +49,15 @@ hparams = HParams(
                                                  # frame that has all values < -3.4

         ### Tacotron Training
-        tts_schedule = [(2, 1e-3, 20_000, 24),   # Progressive training schedule
-                        (2, 5e-4, 40_000, 24),   # (r, lr, step, batch_size)
-                        (2, 2e-4, 80_000, 24),   #
-                        (2, 1e-4, 160_000, 24),  # r = reduction factor (# of mel frames
-                        (2, 3e-5, 320_000, 24),  #     synthesized for each decoder iteration)
-                        (2, 1e-5, 640_000, 24)], # lr = learning rate
+        tts_schedule = [(2, 1e-3, 10_000, 12),   # Progressive training schedule
+                        (2, 5e-4, 15_000, 12),   # (r, lr, step, batch_size)
+                        (2, 2e-4, 20_000, 12),   # (r, lr, step, batch_size)
+                        (2, 1e-4, 30_000, 12),   #
+                        (2, 5e-5, 40_000, 12),   #
+                        (2, 1e-5, 60_000, 12),   #
+                        (2, 5e-6, 160_000, 12),  # r = reduction factor (# of mel frames
+                        (2, 3e-6, 320_000, 12),  #     synthesized for each decoder iteration)
+                        (2, 1e-6, 640_000, 12)], # lr = learning rate

         tts_clip_grad_norm = 1.0,                # clips the gradient norm to prevent explosion - set to None if not needed
         tts_eval_interval = 500,                 # Number of steps between model evaluation (sample generation)
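For reference, each tuple is one training stage: training runs with that reduction factor, learning rate, and batch size until the global step reaches the stage's step limit, then moves to the next entry. A rough sketch of how such a schedule is typically consumed (not the actual training loop in this repo):

```python
# Hedged sketch: pick the active stage for a given global step.
tts_schedule = [(2, 1e-3, 10_000, 12),
                (2, 5e-4, 15_000, 12),
                (2, 2e-4, 20_000, 12)]

def current_stage(step, schedule):
    for r, lr, max_step, batch_size in schedule:
        if step < max_step:
            return r, lr, batch_size          # reduction factor, learning rate, batch size
    return schedule[-1][0], schedule[-1][1], schedule[-1][3]

print(current_stage(12_000, tts_schedule))    # -> (2, 0.0005, 12)
```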

synthesizer/inference.py

@@ -70,7 +70,7 @@ class Synthesizer:
     def synthesize_spectrograms(self, texts: List[str],
                                 embeddings: Union[np.ndarray, List[np.ndarray]],
-                                return_alignments=False):
+                                return_alignments=False, style_idx=0):
         """
         Synthesizes mel spectrograms from texts and speaker embeddings.
@@ -125,7 +125,7 @@
             speaker_embeddings = torch.tensor(speaker_embeds).float().to(self.device)

             # Inference
-            _, mels, alignments = self._model.generate(chars, speaker_embeddings)
+            _, mels, alignments = self._model.generate(chars, speaker_embeddings, style_idx=style_idx)
             mels = mels.detach().cpu().numpy()
             for m in mels:
                 # Trim silence from end of each spectrogram
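With the extra keyword argument, callers can pick one of the ten GST tokens at synthesis time, while `style_idx=-1` (the toolbox default) skips the GST override branch inside `generate()` entirely. A hedged sketch of the call; the checkpoint path and the zero speaker embedding are placeholders:

```python
from pathlib import Path

import numpy as np

from synthesizer.inference import Synthesizer

# Placeholder checkpoint path and speaker embedding, purely for illustration.
synth = Synthesizer(Path("synthesizer/saved_models/my_run/my_run.pt"))
texts = ["欢迎使用拟声鸟工具箱"]
embeds = [np.zeros(256, dtype=np.float32)]

specs = synth.synthesize_spectrograms(texts, embeds, style_idx=3)           # inject GST token #3
specs_default = synth.synthesize_spectrograms(texts, embeds, style_idx=-1)  # no GST override
```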

synthesizer/models/global_style_token.py (new file)

@@ -0,0 +1,135 @@
import torch
import torch.nn as nn
import torch.nn.init as init
import torch.nn.functional as tFunctional
from synthesizer.gst_hyperparameters import GSTHyperparameters as hp
class GlobalStyleToken(nn.Module):
def __init__(self):
super().__init__()
self.encoder = ReferenceEncoder()
self.stl = STL()
def forward(self, inputs):
enc_out = self.encoder(inputs)
style_embed = self.stl(enc_out)
return style_embed
class ReferenceEncoder(nn.Module):
'''
inputs --- [N, Ty/r, n_mels*r] mels
outputs --- [N, ref_enc_gru_size]
'''
def __init__(self):
super().__init__()
K = len(hp.ref_enc_filters)
filters = [1] + hp.ref_enc_filters
convs = [nn.Conv2d(in_channels=filters[i],
out_channels=filters[i + 1],
kernel_size=(3, 3),
stride=(2, 2),
padding=(1, 1)) for i in range(K)]
self.convs = nn.ModuleList(convs)
self.bns = nn.ModuleList([nn.BatchNorm2d(num_features=hp.ref_enc_filters[i]) for i in range(K)])
out_channels = self.calculate_channels(hp.n_mels, 3, 2, 1, K)
self.gru = nn.GRU(input_size=hp.ref_enc_filters[-1] * out_channels,
hidden_size=hp.E // 2,
batch_first=True)
def forward(self, inputs):
N = inputs.size(0)
out = inputs.view(N, 1, -1, hp.n_mels) # [N, 1, Ty, n_mels]
for conv, bn in zip(self.convs, self.bns):
out = conv(out)
out = bn(out)
out = tFunctional.relu(out) # [N, 128, Ty//2^K, n_mels//2^K]
out = out.transpose(1, 2) # [N, Ty//2^K, 128, n_mels//2^K]
T = out.size(1)
N = out.size(0)
out = out.contiguous().view(N, T, -1) # [N, Ty//2^K, 128*n_mels//2^K]
self.gru.flatten_parameters()
memory, out = self.gru(out) # out --- [1, N, E//2]
return out.squeeze(0)
def calculate_channels(self, L, kernel_size, stride, pad, n_convs):
for i in range(n_convs):
L = (L - kernel_size + 2 * pad) // stride + 1
return L
class STL(nn.Module):
'''
inputs --- [N, E//2]
'''
def __init__(self):
super().__init__()
self.embed = nn.Parameter(torch.FloatTensor(hp.token_num, hp.E // hp.num_heads))
d_q = hp.E // 2
d_k = hp.E // hp.num_heads
# self.attention = MultiHeadAttention(hp.num_heads, d_model, d_q, d_v)
self.attention = MultiHeadAttention(query_dim=d_q, key_dim=d_k, num_units=hp.E, num_heads=hp.num_heads)
init.normal_(self.embed, mean=0, std=0.5)
def forward(self, inputs):
N = inputs.size(0)
query = inputs.unsqueeze(1) # [N, 1, E//2]
keys = tFunctional.tanh(self.embed).unsqueeze(0).expand(N, -1, -1) # [N, token_num, E // num_heads]
style_embed = self.attention(query, keys)
return style_embed
class MultiHeadAttention(nn.Module):
'''
input:
query --- [N, T_q, query_dim]
key --- [N, T_k, key_dim]
output:
out --- [N, T_q, num_units]
'''
def __init__(self, query_dim, key_dim, num_units, num_heads):
super().__init__()
self.num_units = num_units
self.num_heads = num_heads
self.key_dim = key_dim
self.W_query = nn.Linear(in_features=query_dim, out_features=num_units, bias=False)
self.W_key = nn.Linear(in_features=key_dim, out_features=num_units, bias=False)
self.W_value = nn.Linear(in_features=key_dim, out_features=num_units, bias=False)
def forward(self, query, key):
querys = self.W_query(query) # [N, T_q, num_units]
keys = self.W_key(key) # [N, T_k, num_units]
values = self.W_value(key)
split_size = self.num_units // self.num_heads
querys = torch.stack(torch.split(querys, split_size, dim=2), dim=0) # [h, N, T_q, num_units/h]
keys = torch.stack(torch.split(keys, split_size, dim=2), dim=0) # [h, N, T_k, num_units/h]
values = torch.stack(torch.split(values, split_size, dim=2), dim=0) # [h, N, T_k, num_units/h]
# score = softmax(QK^T / (d_k ** 0.5))
scores = torch.matmul(querys, keys.transpose(2, 3)) # [h, N, T_q, T_k]
scores = scores / (self.key_dim ** 0.5)
scores = tFunctional.softmax(scores, dim=3)
# out = score * V
out = torch.matmul(scores, values) # [h, N, T_q, num_units/h]
out = torch.cat(torch.split(out, 1, dim=0), dim=3).squeeze(0) # [N, T_q, num_units]
return out
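For orientation, the module can be exercised on its own. A minimal sketch with a dummy batch of reference mels; shapes follow the docstrings above and `gst_hyperparameters.py` (`E = 512`, `n_mels = 256`, 10 tokens):

```python
import torch

from synthesizer.models.global_style_token import GlobalStyleToken

gst = GlobalStyleToken()
ref = torch.randn(4, 200, 256)   # [N, Ty, n_mels] dummy reference input
style_embed = gst(ref)           # ReferenceEncoder -> STL attention over the 10 style tokens
print(style_embed.shape)         # torch.Size([4, 1, 512]), i.e. [N, 1, E]
```

Note that in the Tacotron changes below, the same module is fed the speaker embedding rather than a reference mel, and the resulting [N, 1, E] vector is expanded onto the encoder outputs.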

synthesizer/models/tacotron.py

@@ -3,8 +3,7 @@ import numpy as np
 import torch
 import torch.nn as nn
 import torch.nn.functional as F
-from pathlib import Path
-from typing import Union
+from synthesizer.models.global_style_token import GlobalStyleToken


 class HighwayNetwork(nn.Module):
@@ -338,6 +337,7 @@
         self.encoder = Encoder(embed_dims, num_chars, encoder_dims,
                                encoder_K, num_highways, dropout)
         self.encoder_proj = nn.Linear(encoder_dims + speaker_embedding_size, decoder_dims, bias=False)
+        self.gst = GlobalStyleToken()
         self.decoder = Decoder(n_mels, encoder_dims, decoder_dims, lstm_dims,
                                dropout, speaker_embedding_size)
         self.postnet = CBHG(postnet_K, n_mels, postnet_dims,
@@ -358,11 +358,11 @@
     def r(self, value):
         self.decoder.r = self.decoder.r.new_tensor(value, requires_grad=False)

-    def forward(self, x, m, speaker_embedding):
+    def forward(self, texts, mels, speaker_embedding):
         device = next(self.parameters()).device  # use same device as parameters

         self.step += 1
-        batch_size, _, steps = m.size()
+        batch_size, _, steps = mels.size()

         # Initialise all hidden states and pack into tuple
         attn_hidden = torch.zeros(batch_size, self.decoder_dims, device=device)
@@ -383,7 +383,12 @@
         # SV2TTS: Run the encoder with the speaker embedding
         # The projection avoids unnecessary matmuls in the decoder loop
-        encoder_seq = self.encoder(x, speaker_embedding)
+        encoder_seq = self.encoder(texts, speaker_embedding)
+        # put after encoder
+        if self.gst is not None:
+            style_embed = self.gst(speaker_embedding)
+            style_embed = style_embed.expand_as(encoder_seq)
+            encoder_seq = encoder_seq + style_embed
         encoder_seq_proj = self.encoder_proj(encoder_seq)

         # Need a couple of lists for outputs
@@ -391,10 +396,10 @@
         # Run the decoder loop
         for t in range(0, steps, self.r):
-            prenet_in = m[:, :, t - 1] if t > 0 else go_frame
+            prenet_in = mels[:, :, t - 1] if t > 0 else go_frame
             mel_frames, scores, hidden_states, cell_states, context_vec, stop_tokens = \
                 self.decoder(encoder_seq, encoder_seq_proj, prenet_in,
-                             hidden_states, cell_states, context_vec, t, x)
+                             hidden_states, cell_states, context_vec, t, texts)
             mel_outputs.append(mel_frames)
             attn_scores.append(scores)
             stop_outputs.extend([stop_tokens] * self.r)
@@ -414,7 +419,7 @@
         return mel_outputs, linear, attn_scores, stop_outputs

-    def generate(self, x, speaker_embedding=None, steps=2000):
+    def generate(self, x, speaker_embedding=None, steps=200, style_idx=0):
         self.eval()
         device = next(self.parameters()).device  # use same device as parameters
@@ -440,6 +445,18 @@
         # SV2TTS: Run the encoder with the speaker embedding
         # The projection avoids unnecessary matmuls in the decoder loop
         encoder_seq = self.encoder(x, speaker_embedding)
+        # put after encoder
+        if self.gst is not None and style_idx >= 0 and style_idx < 10:
+            gst_embed = self.gst.stl.embed.cpu().data.numpy()  # [0, number_token]
+            gst_embed = np.tile(gst_embed, (1, 8))
+            scale = np.zeros(512)
+            scale[:] = 0.3
+            speaker_embedding = (gst_embed[style_idx] * scale).astype(np.float32)
+            speaker_embedding = torch.from_numpy(np.tile(speaker_embedding, (x.shape[0], 1))).to(device)
+            style_embed = self.gst(speaker_embedding)
+            style_embed = style_embed.expand_as(encoder_seq)
+            encoder_seq = encoder_seq + style_embed
         encoder_seq_proj = self.encoder_proj(encoder_seq)

         # Need a couple of lists for outputs
@@ -494,7 +511,7 @@
         # Use device of model params as location for loaded state
         device = next(self.parameters()).device
         checkpoint = torch.load(str(path), map_location=device)
-        self.load_state_dict(checkpoint["model_state"])
+        self.load_state_dict(checkpoint["model_state"], strict=False)

         if "optimizer_state" in checkpoint and optimizer is not None:
             optimizer.load_state_dict(checkpoint["optimizer_state"])
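The numpy manipulation inside `generate()` is easier to see in isolation: the chosen style-token row (shape `[token_num, E // num_heads]`) is tiled up to the 512-dim embedding width, scaled by 0.3, and substituted for the speaker embedding before the GST module is applied. A standalone sketch, with random numbers standing in for the learned token table:

```python
import numpy as np

token_table = np.random.randn(10, 64).astype(np.float32)  # stand-in for gst.stl.embed: [token_num, E // num_heads]
style_idx = 3

gst_embed = np.tile(token_table, (1, 8))                   # [10, 512]: repeat to the full embedding width
pseudo_speaker_embedding = gst_embed[style_idx] * 0.3      # same role as the scale[:] = 0.3 array above
batch = np.tile(pseudo_speaker_embedding, (2, 1))          # repeat for a batch of 2 input texts
print(batch.shape)                                         # (2, 512)
```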

toolbox/__init__.py

@@ -71,6 +71,7 @@ class Toolbox:
         # Initialize the events and the interface
         self.ui = UI()
+        self.style_idx = 0
         self.reset_ui(enc_models_dir, syn_models_dir, voc_models_dir, seed)
         self.setup_events()
         self.ui.start()
@@ -233,7 +234,7 @@
             texts = processed_texts
         embed = self.ui.selected_utterance.embed
         embeds = [embed] * len(texts)
-        specs = self.synthesizer.synthesize_spectrograms(texts, embeds)
+        specs = self.synthesizer.synthesize_spectrograms(texts, embeds, style_idx=int(self.ui.style_idx_textbox.text()))
         breaks = [spec.shape[1] for spec in specs]
         spec = np.concatenate(specs, axis=1)

toolbox/ui.py

@@ -574,10 +574,14 @@
         self.seed_textbox = QLineEdit()
         self.seed_textbox.setMaximumWidth(80)
         layout_seed.addWidget(self.seed_textbox, 0, 1)
+        layout_seed.addWidget(QLabel("Style#:(0~9)"), 0, 2)
+        self.style_idx_textbox = QLineEdit("-1")
+        self.style_idx_textbox.setMaximumWidth(80)
+        layout_seed.addWidget(self.style_idx_textbox, 0, 3)
         self.trim_silences_checkbox = QCheckBox("Enhance vocoder output")
         self.trim_silences_checkbox.setToolTip("When checked, trims excess silence in vocoder output."
             " This feature requires `webrtcvad` to be installed.")
-        layout_seed.addWidget(self.trim_silences_checkbox, 0, 2, 1, 2)
+        layout_seed.addWidget(self.trim_silences_checkbox, 0, 4, 1, 2)
         gen_layout.addLayout(layout_seed)

         self.loading_bar = QProgressBar()

(utils: check_model_paths)

@@ -11,7 +11,6 @@ def check_model_paths(encoder_path: Path, synthesizer_path: Path, vocoder_path:
     # If none of the paths exist, remind the user to download models if needed
     print("********************************************************************************")
-    print("Error: Model files not found. Follow these instructions to get and install the models:")
-    print("https://github.com/CorentinJ/Real-Time-Voice-Cloning/wiki/Pretrained-models")
+    print("Error: Model files not found. Please download the models")
     print("********************************************************************************\n")
     quit(-1)

(web server: webApp)

@@ -9,10 +9,12 @@ from vocoder.wavernn import inference as rnn_vocoder
 import numpy as np
 import re
 from scipy.io.wavfile import write
+import librosa
 import io
 import base64
 from flask_cors import CORS
 from flask_wtf import CSRFProtect
+import webbrowser

 def webApp():
     # Init and load config
@@ -29,6 +31,7 @@ def webApp():
     synthesizers = list(Path(syn_models_dirt).glob("**/*.pt"))
     synthesizers_cache = {}
     encoder.load_model(Path("encoder/saved_models/pretrained.pt"))
+    # rnn_vocoder.load_model(Path("vocoder/saved_models/pretrained/pretrained.pt"))
     gan_vocoder.load_model(Path("vocoder/saved_models/pretrained/g_hifigan.pt"))

     def pcm2float(sig, dtype='float32'):
@@ -65,7 +68,6 @@ def webApp():
     @app.route("/api/synthesize", methods=["POST"])
     def synthesize():
         # TODO Implementation with json to support more platform
-
         # Load synthesizer
         if "synt_path" in request.form:
             synt_path = request.form["synt_path"]
@@ -79,10 +81,16 @@ def webApp():
             current_synt = synthesizers_cache[synt_path]
         print("using synthesizer model: " + str(synt_path))
         # Load input wav
-        wav_base64 = request.form["upfile_b64"]
-        wav = base64.b64decode(bytes(wav_base64, 'utf-8'))
-        wav = pcm2float(np.frombuffer(wav, dtype=np.int16), dtype=np.float32)
-        encoder_wav = encoder.preprocess_wav(wav, 16000)
+        if "upfile_b64" in request.form:
+            wav_base64 = request.form["upfile_b64"]
+            wav = base64.b64decode(bytes(wav_base64, 'utf-8'))
+            wav = pcm2float(np.frombuffer(wav, dtype=np.int16), dtype=np.float32)
+            sample_rate = Synthesizer.sample_rate
+        else:
+            wav, sample_rate, = librosa.load(request.files['file'])
+            write("temp.wav", sample_rate, wav) # Make sure we get the correct wav
+        encoder_wav = encoder.preprocess_wav(wav, sample_rate)
         embed, _, _ = encoder.embed_utterance(encoder_wav, return_partials=True)

         # Load input text
@@ -99,6 +107,7 @@ def webApp():
         embeds = [embed] * len(texts)
         specs = current_synt.synthesize_spectrograms(texts, embeds)
         spec = np.concatenate(specs, axis=1)
+        # wav = rnn_vocoder.infer_waveform(spec)
         wav = gan_vocoder.infer_waveform(spec)

         # Return cooked wav
@@ -112,10 +121,11 @@ def webApp():
     host = app.config.get("HOST")
     port = app.config.get("PORT")
-    print(f"Web server: http://{host}:{port}")
+    web_address = 'http://{}:{}'.format(host, port)
+    print(f"Web server:" + web_address)
+    webbrowser.open(web_address)
     server = wsgi.WSGIServer((host, port), app)
     server.serve_forever()
     return app

 if __name__ == "__main__":
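With the new branch, `/api/synthesize` accepts either the original base64 PCM field (`upfile_b64`) or a multipart file upload plus a `text` field and an optional `synt_path`, which is how the updated page submits its FormData. A hedged sketch of calling the upload path from Python; the host/port, the CSRF handling, and the response being raw wav bytes are assumptions based on this diff:

```python
import requests

# Assumption: CSRFProtect is enabled, so a real client may need to fetch the
# page first and send the token back in an X-CSRFToken header.
resp = requests.post(
    "http://localhost:8080/api/synthesize",
    data={
        "text": "你好，欢迎使用拟声鸟",                                    # text to synthesize
        "synt_path": "synthesizer/saved_models/my_run/my_run.pt",       # optional, hypothetical path
    },
    files={"file": open("reference.wav", "rb")},                         # reference voice upload
)
with open("result.wav", "wb") as f:
    f.write(resp.content)
```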

(web server config)

@@ -5,3 +5,4 @@ PORT = 8080
 MAX_CONTENT_PATH =1024 * 1024 * 4  # mp3文件大小限定不能超过4M
 SECRET_KEY = "mockingbird_key"
 WTF_CSRF_SECRET_KEY = "mockingbird_key"
+TEMPLATES_AUTO_RELOAD = True

web/static/img/bird-sm.png (new binary file, 40 KiB)
web/static/img/bird.png (new binary file, 39 KiB)
(a third new binary image, 89 KiB; filename not shown)

(web UI HTML template)

@@ -4,8 +4,7 @@
 <head>
     <meta http-equiv="Content-Type" content="text/html; charset=utf-8" />
     <meta name="viewport" content="width=device-width, initial-scale=1.0, maximum-scale=1.0, user-scalable=0">
-    <link rel="shortcut icon" type="image/png"
-        href="https://cdn.jsdelivr.net/gh/xiangyuecn/Recorder@latest/assets/icon.png">
+    <link rel="shortcut icon" type="image/png" href="../static/img/bird-sm.png">

     <title>MockingBird Web Server</title>
@@ -24,50 +23,105 @@
     <div class="main">
         <div class="mainBox">
-            <div class="pd btns">
+            <div class="title" >
+                <div style="width: 15%;float: left;margin-left: 5%;">
+                    <img src="../static/img/bird.png" style="width: 100%;border-radius:50%;"></img>
+                </div>
+                <div style="width: 80% ;height: 15%;; margin-left: 15%;overflow: hidden;">
+                    <div style="margin-left: 5%;margin-top: 15px;font-size: xx-large;font-weight: bolder;">
+                        拟声鸟工具箱
+                    </div>
+                    <div style="margin-left: 5%;margin-top: 3px;font-size: large;">
+                        <a href="https://github.com/babysor/MockingBird" target="_blank">https://github.com/babysor/MockingBird</a>
+                    </div>
+                </div>
+            </div>
+            <div style="margin-left: 5%;margin-top: 50px;width: 90%;">
+                <div style="font-size: larger;font-weight: bolder;">1. 请输入中文</div>
+                <textarea id="user_input_text"
+                    style="border:1px solid #ccc; width: 100%; height: 100px; font-size: 15px; margin-top: 10px;"></textarea>
+            </div>
+            <div class="pd btns" style="margin-left: 5%;margin-top: 20px;width: 90%; ">
                 <!-- <div>
                     <button onclick="recOpen()" style="margin-right:10px">打开录音,请求权限</button>
                     <button onclick="recClose()" style="margin-right:0">关闭录音,释放资源</button>
                 </div> -->
-                <button onclick="recStart()" style="margin-left:100px">录制</button>
-                <button onclick="recStop()" style="margin-left:100px">停止</button>
-                <button onclick="recPlay()" style="margin-left:100px">播放</button>
+                <div style="font-size: larger;font-weight: bolder;">2. 请直接录音，点击停止结束</div>
+                <button onclick="recStart()" >录制</button>
+                <button onclick="recStop()">停止</button>
+                <button onclick="recPlay()" >播放</button>
+            </div>
+            <div class="pd btns" style="margin-left: 5%;margin-top: 20px;width: 90%; ">
+                <div style="font-size: larger;font-weight: bolder;">或上传音频</div>
+                <input type="file" id="fileInput" accept=".wav" />
+                <label for="fileInput">选择音频</label>
+                <div id="audio1"></div>
+            </div>
+            <div class="pd btns" style="margin-left: 5%;margin-top: 20px;width: 90%; ">
+                <div style="font-size: larger;font-weight: bolder;">3. 选择Synthesizer模型</div>
+                <span class="box">
+                    <select id="select">
+                    </select>
+                </span>
+            </div>
+            <div class="pd btns" style="margin-left: 5%;margin-top: 20px;width: 90%; text-align:right;">
+                <button id="upload" onclick="recUpload()">上传合成</button>
             </div>
             <!-- 波形绘制区域 -->
-            <div class="pd recpower">
+            <!-- <div class="pd recpower">
                 <div style="height:40px;width:100%;background:#fff;position:relative;">
                     <div class="recpowerx" style="height:40px;background:#ff3295;position:absolute;"></div>
                     <div class="recpowert" style="padding-left:50px; line-height:40px; position: relative;"></div>
                 </div>
-            </div>
-            <div class="pd waveBox" style="height:100px;">
+            </div> -->
+            <!-- <div class="pd waveBox" style="height:100px;">
                 <div style="border:1px solid #ccc;display:inline-block; width: 100%; height: 100px;">
-                    <div style="height:100px; width: 100%; background-color: #FE76B8; position: relative;left: 0px;top: 0px;z-index: 10;"
+                    <div style="height:100px; width: 100%; background-color: #5da1f5; position: relative;left: 0px;top: 0px;z-index: 10;"
                         class="recwave"></div>
                     <div
                         style="background-color: transparent;position: relative;top: -80px;left: 30%;z-index: 20;font-size: 48px;color: #fff;">
                         音频预览</div>
                 </div>
-            </div>
-            <div>
-                <div>请输入文本:</div>
-                <input type="text" id="user_input_text"
-                    style="border:1px solid #ccc; width: 100%; height: 20px; font-size: 18px;" />
-            </div>
-            <div class="pd btns">
-                <button onclick="recUpload()" style="margin-left: 300px; margin-top: 15px;">上传</button>
-            </div>
-        </div>
-        <!-- 日志输出区域 -->
-        <div class="mainBox">
-            <div class="reclog"></div>
+            </div> -->
+            <div class="reclog" style="margin-left: 5%;margin-top: 20px;width: 90%;"></div>
         </div>
     </div>

     <script>
+        $("#fileInput").change(function(){
+            var file = $("#fileInput").get(0).files;
+            if (file.length > 0) {
+                var path = URL.createObjectURL(file[0]);
+                var audio = document.createElement('audio');
+                audio.src = path;
+                audio.controls = true;
+                $('#audio1').empty().append(audio);
+            }
+        });
+        fetch("/api/synthesizers", {
+            method: 'get',
+            headers: {
+                "X-CSRFToken": "{{ csrf_token() }}"
+            }
+        }).then(function (res) {
+            if (!res.ok) throw Error(res.statusText);
+            return res.json();
+        }).then(function (data) {
+            for (var synt of data) {
+                var option = document.createElement('option');
+                option.text = synt.name
+                option.value = synt.path
+                $("#select").append(option);
+            }
+        }).catch(function (err) {
+            console.log('Error: ' + err.message);
+        })
+
         var rec, wave, recBlob;
         /**调用open打开录音请求好录音权限**/
         var recOpen = function () {//一般在显示出录音按钮或相关的录音界面时进行此方法调用,后面用户点击开始录音时就能畅通无阻了
@@ -78,11 +132,11 @@
                 type: "wav", bitRate: 16, sampleRate: 16000
                 , onProcess: function (buffers, powerLevel, bufferDuration, bufferSampleRate, newBufferIdx, asyncEnd) {
                     //录音实时回调大约1秒调用12次本回调
-                    document.querySelector(".recpowerx").style.width = powerLevel + "%";
-                    document.querySelector(".recpowert").innerText = bufferDuration + " / " + powerLevel;
+                    // document.querySelector(".recpowerx").style.width = powerLevel + "%";
+                    // document.querySelector(".recpowert").innerText = bufferDuration + " / " + powerLevel;
                     //可视化图形绘制
-                    wave.input(buffers[buffers.length - 1], powerLevel, bufferSampleRate);
+                    // wave.input(buffers[buffers.length - 1], powerLevel, bufferSampleRate);
                 }
             });
@@ -93,7 +147,7 @@
             rec = newRec;

             //此处创建这些音频可视化图形绘制浏览器支持妥妥的
-            wave = Recorder.FrequencyHistogramView({ elem: ".recwave" });
+            // wave = Recorder.FrequencyHistogramView({ elem: ".recwave" });

             reclog("已打开录音,可以点击录制开始录音了", 2);
         }, function (msg, isUserNotAllow) {//用户拒绝未授权或不支持
@@ -186,15 +240,21 @@
        /**上传**/
        function recUpload() {
-            var blob = recBlob;
+            var blob
+            var loadedAudios = $("#fileInput").get(0).files
+            if (loadedAudios.length > 0) {
+                blob = loadedAudios[0];
+            } else {
+                blob = recBlob;
+            }
             if (!blob) {
-                reclog("请先录音,然后停止后再上传", 1);
+                reclog("请先录音或选择音频,然后停止后再上传", 1);
                 return;
             };

             //本例子假设使用原始XMLHttpRequest请求方式实际使用中自行调整为自己的请求方式
             //录音结束时拿到了blob文件对象可以用FileReader读取出内容或者用FormData上传
-            var api = "http://127.0.0.1:8080/api/synthesize";
+            var api = "/api/synthesize";
             reclog("开始上传到" + api + ",请求稍后...");
@@ -203,15 +263,18 @@
                 var csrftoken = "{{ csrf_token() }}";
                 var user_input_text = document.getElementById("user_input_text");
                 var input_text = user_input_text.value;
-                var postData = "";
-                postData += "mime=" + encodeURIComponent(blob.type);//告诉后端这个录音是什么格式的可能前后端都固定的mp3可以不用写
-                postData += "&upfile_b64=" + encodeURIComponent((/.+;\s*base64\s*,\s*(.+)$/i.exec(reader.result) || [])[1]) //录音文件内容后端进行base64解码成二进制
-                postData += "&text=" + encodeURIComponent(input_text);
+                var postData = new FormData();
+                postData.append("text", input_text)
+                postData.append("file", blob)
+                var sel = document.getElementById("select");
+                var path = sel.options[sel.selectedIndex].value;
+                if (!!path) {
+                    postData.append("synt_path", path);
+                }
                 fetch(api, {
                     method: 'post',
                     headers: {
-                        "Content-type": "application/x-www-form-urlencoded; charset=UTF-8",
                         "X-CSRFToken": csrftoken
                     },
                     body: postData
@@ -277,7 +340,7 @@
             var div = document.createElement("div");
             var elem = document.querySelector(".reclog");
             elem.insertBefore(div, elem.firstChild);
-            div.innerHTML = '<div style="color:' + (!color ? "" : color == 1 ? "red" : color == 2 ? "#FE76B8" : color) + '">[' + t + ']' + s + '</div>';
+            div.innerHTML = '<div style="color:' + (!color ? "" : color == 1 ? "#327de8" : color == 2 ? "#5da1f5" : color) + '">[' + t + ']' + s + '</div>';
         };
         window.onerror = function (message, url, lineNo, columnNo, error) {
             reclog('<span style="color:red">【Uncaught Error】' + message + '<pre>' + "at:" + lineNo + ":" + columnNo + " url:" + url + "\n" + (error && error.stack || "不能获得错误堆栈") + '</pre></span>');
@@ -312,11 +375,11 @@
         a {
             text-decoration: none;
-            color: #FE76B8;
+            color: #327de8;
         }

         a:hover {
-            color: #f00;
+            color: #5da1f5;
         }

         .main {
@@ -330,7 +393,6 @@
             padding: 12px;
             border-radius: 6px;
             background: #fff;
-            --border: 1px solid #f60;
             box-shadow: 2px 2px 3px #aaa;
         }
@@ -340,20 +402,31 @@
             cursor: pointer;
             border: none;
             border-radius: 3px;
-            background: #FE76B8;
+            background: #5698c3;
             color: #fff;
             padding: 0 15px;
-            margin: 3px 20px 3px 0;
+            margin: 3px 10px 3px 0;
+            width: 70px;
             line-height: 36px;
             height: 36px;
             overflow: hidden;
             vertical-align: middle;
         }

-        .btns button:active {
-            background: #fd54a6
+        .btns #upload {
+            background: #5698c3;
+            color: #fff;
+            width: 100px;
+            height: 42px;
         }

+        .btns button:active {
+            background: #5da1f5
+        }
+
+        .btns button:hover {
+            background: #5da1f5
+        }
+
         .pd {
             padding: 0 0 6px 0;
         }
@@ -361,12 +434,74 @@
         .lb {
             display: inline-block;
             vertical-align: middle;
-            background: #ff3d9b;
+            background: #327de8;
             color: #fff;
             font-size: 14px;
             padding: 2px 8px;
             border-radius: 99px;
         }
+
+        #fileInput {
+            width: 0.1px;
+            height: 0.1px;
+            opacity: 0;
+            overflow: hidden;
+            position: absolute;
+            z-index: -1;
+        }
+
+        #fileInput + label {
+            padding: 0 15px;
+            border-radius: 4px;
+            color: white;
+            background-color: #5698c3;
+            display: inline-block;
+            width: 70px;
+            line-height: 36px;
+            height: 36px;
+        }
+
+        #fileInput + label {
+            cursor: pointer; /* "hand" cursor */
+        }
+
+        #fileInput:focus + label,
+        #fileInput + label:hover {
+            background-color: #5da1f5;
+        }
+
+        .box select {
+            background-color: #5698c3;
+            color: white;
+            padding: 8px;
+            width: 120px;
+            border: none;
+            border-radius: 4px;
+            font-size: 0.5em;
+            outline: none;
+            margin: 3px 10px 3px 0;
+        }
+
+        .box::before {
+            content: "\f13a";
+            position: absolute;
+            top: 0;
+            right: 0;
+            width: 20%;
+            height: 100%;
+            text-align: center;
+            font-size: 28px;
+            line-height: 45px;
+            color: rgba(255, 255, 255, 0.5);
+            background-color: rgba(255, 255, 255, 0.1);
+            pointer-events: none;
+        }
+
+        .box:hover::before {
+            color: rgba(255, 255, 255, 0.6);
+            background-color: rgba(255, 255, 255, 0.2);
+        }
+
+        .box select option {
+            padding: 30px;
+        }
     </style>
 </body>