add F Q&A in README、README-CN (#84)

* add FQA
pull/93/head
XiuChen-Liu 3 years ago committed by GitHub
parent 32b9755cbe
commit 222e302274
No known key found for this signature in database
GPG Key ID: 4AEE18F83AFDEB23

@ -8,7 +8,7 @@
### [DEMO VIDEO](https://www.bilibili.com/video/BV1sA411P7wM/)
## 特性
🌍 **中文** 支持普通话并使用多种中文数据集进行测试adatatang_200zh, magicdata, aishell3 biaobeiMozillaCommonVoice 等
🌍 **中文** 支持普通话并使用多种中文数据集进行测试aidatatang_200zh, magicdata, aishell3 biaobeiMozillaCommonVoice 等
🤩 **PyTorch** 适用于 pytorch已在 1.9.0 版本(最新于 2021 年 8 月中测试GPU Tesla T4 和 GTX 2060
@ -33,7 +33,7 @@
* 下载 数据集并解压:确保您可以访问 *train* 文件夹中的所有音频文件(如.wav
* 进行音频和梅尔频谱图预处理:
`python pre.py <datasets_root>`
可以传入参数 --dataset `{dataset}` 支持 adatatang_200zh, magicdata, aishell3
可以传入参数 --dataset `{dataset}` 支持 aidatatang_200zh, magicdata, aishell3
> 假如你下载的 `aidatatang_200zh`文件放在D盘`train`文件路径为 `D:\data\aidatatang_200zh\corpus\train` , 你的`datasets_root`就是 `D:\data\`
>假如發生 `頁面文件太小,無法完成操作`,請參考這篇[文章](https://blog.csdn.net/qq_17755303/article/details/112564030)將虛擬內存更改為100G(102400),例如:档案放置D槽就更改D槽的虚拟内存
@ -49,8 +49,8 @@
### 2.2 使用预先训练好的合成器
> 实在没有设备或者不想慢慢调试,可以使用网友贡献的模型(欢迎持续分享):
| 作者 | 下载链接 | 效果预览 | 信息 |
| --- | ----------- | ----- | ----- |
| 作者 | 下载链接 | 效果预览 | 信息 |
| --- | ----------- | ----- | ----- |
|@FawenYo | https://drive.google.com/file/d/1H-YGOUHpmqKxJ9FRc6vAjPuqQki24UbC/view?usp=sharing [百度盘链接](https://pan.baidu.com/s/1vSYXO4wsLyjnF3Unl-Xoxg) 提取码1024 | [input](https://github.com/babysor/MockingBird/wiki/audio/self_test.mp3) [output](https://github.com/babysor/MockingBird/wiki/audio/export.wav) | 200k steps 台湾口音
|@miven| https://pan.baidu.com/s/1PI-hM3sn5wbeChRryX-RCQ 提取码2021 | https://www.bilibili.com/video/BV1uh411B7AD/ | 150k steps 旧版需根据[issue](https://github.com/babysor/MockingBird/issues/37)修复
@ -79,4 +79,70 @@
|[**1806.04558**](https://arxiv.org/pdf/1806.04558.pdf) | **SV2TTS** | **Transfer Learning from Speaker Verification to Multispeaker Text-To-Speech Synthesis** | This repo |
|[1802.08435](https://arxiv.org/pdf/1802.08435.pdf) | WaveRNN (vocoder) | Efficient Neural Audio Synthesis | [fatchord/WaveRNN](https://github.com/fatchord/WaveRNN) |
|[1703.10135](https://arxiv.org/pdf/1703.10135.pdf) | Tacotron (synthesizer) | Tacotron: Towards End-to-End Speech Synthesis | [fatchord/WaveRNN](https://github.com/fatchord/WaveRNN)
|[1710.10467](https://arxiv.org/pdf/1710.10467.pdf) | GE2E (encoder)| Generalized End-To-End Loss for Speaker Verification | 本代码库 |
|[1710.10467](https://arxiv.org/pdf/1710.10467.pdf) | GE2E (encoder)| Generalized End-To-End Loss for Speaker Verification | 本代码库 |
## 常見問題(FQ&A)
#### 1.數據集哪裡下載?
[aidatatang_200zh](http://www.openslr.org/62/)、[magicdata](http://www.openslr.org/68/)、[aishell3](http://www.openslr.org/93/)
> 解壓 aidatatang_200zh 後,還需將 `aidatatang_200zh\corpus\train`下的檔案全選解壓縮
#### 2.`<datasets_root>`是什麼意思?
假如數據集路徑為 `D:\data\aidatatang_200zh`,那麼 `<datasets_root>`就是 `D:\data`
#### 3.訓練模型顯存不足
訓練合成器時:將 `synthesizer/hparams.py`中的batch_size參數調小
```
//調整前
tts_schedule = [(2, 1e-3, 20_000, 12), # Progressive training schedule
(2, 5e-4, 40_000, 12), # (r, lr, step, batch_size)
(2, 2e-4, 80_000, 12), #
(2, 1e-4, 160_000, 12), # r = reduction factor (# of mel frames
(2, 3e-5, 320_000, 12), # synthesized for each decoder iteration)
(2, 1e-5, 640_000, 12)], # lr = learning rate
//調整後
tts_schedule = [(2, 1e-3, 20_000, 8), # Progressive training schedule
(2, 5e-4, 40_000, 8), # (r, lr, step, batch_size)
(2, 2e-4, 80_000, 8), #
(2, 1e-4, 160_000, 8), # r = reduction factor (# of mel frames
(2, 3e-5, 320_000, 8), # synthesized for each decoder iteration)
(2, 1e-5, 640_000, 8)], # lr = learning rate
```
聲碼器-預處理數據集時:將 `synthesizer/hparams.py`中的batch_size參數調小
```
//調整前
### Data Preprocessing
max_mel_frames = 900,
rescale = True,
rescaling_max = 0.9,
synthesis_batch_size = 16, # For vocoder preprocessing and inference.
//調整後
### Data Preprocessing
max_mel_frames = 900,
rescale = True,
rescaling_max = 0.9,
synthesis_batch_size = 8, # For vocoder preprocessing and inference.
```
聲碼器-訓練聲碼器時:將 `vocoder/wavernn/hparams.py`中的batch_size參數調小
```
//調整前
# Training
voc_batch_size = 100
voc_lr = 1e-4
voc_gen_at_checkpoint = 5
voc_pad = 2
//調整後
# Training
voc_batch_size = 6
voc_lr = 1e-4
voc_gen_at_checkpoint = 5
voc_pad =2
```
#### 4.碰到`RuntimeError: Error(s) in loading state_dict for Tacotron: size mismatch for encoder.embedding.weight: copying a param with shape torch.Size([70, 512]) from checkpoint, the shape in current model is torch.Size([75, 512]).`
請參照 issue [#37](https://github.com/babysor/MockingBird/issues/37)
#### 5.如何改善CPU、GPU佔用率?
適情況調整batch_size參數來改善

@ -33,7 +33,7 @@
* Download aidatatang_200zh or other dataset and unzip: make sure you can access all .wav in *train* folder
* Preprocess with the audios and the mel spectrograms:
`python pre.py <datasets_root>`
Allow parameter `--dataset {dataset}` to support adatatang_200zh, magicdata, aishell3
Allow parameter `--dataset {dataset}` to support aidatatang_200zh, magicdata, aishell3
>If it happens `the page file is too small to complete the operation`, please refer to this [video](https://www.youtube.com/watch?v=Oh6dga-Oy10&ab_channel=CodeProf) and change the virtual memory to 100G (102400), for example : When the file is placed in the D disk, the virtual memory of the D disk is changed.
@ -50,7 +50,7 @@ Allow parameter `--dataset {dataset}` to support adatatang_200zh, magicdata, ais
> Thanks to the community, some models will be shared:
| author | Download link | Preview Video | Info |
| --- | ----------- | ----- |----- |
| --- | ----------- | ----- |----- |
|@FawenYo | https://drive.google.com/file/d/1H-YGOUHpmqKxJ9FRc6vAjPuqQki24UbC/view?usp=sharing [Baidu Pan](https://pan.baidu.com/s/1vSYXO4wsLyjnF3Unl-Xoxg) Code1024 | [input](https://github.com/babysor/MockingBird/wiki/audio/self_test.mp3) [output](https://github.com/babysor/MockingBird/wiki/audio/export.wav) | 200k steps with local accent of Taiwan
|@miven| https://pan.baidu.com/s/1PI-hM3sn5wbeChRryX-RCQ code2021 | https://www.bilibili.com/video/BV1uh411B7AD/
@ -83,4 +83,70 @@ or
|[**1806.04558**](https://arxiv.org/pdf/1806.04558.pdf) | **SV2TTS** | **Transfer Learning from Speaker Verification to Multispeaker Text-To-Speech Synthesis** | This repo |
|[1802.08435](https://arxiv.org/pdf/1802.08435.pdf) | WaveRNN (vocoder) | Efficient Neural Audio Synthesis | [fatchord/WaveRNN](https://github.com/fatchord/WaveRNN) |
|[1703.10135](https://arxiv.org/pdf/1703.10135.pdf) | Tacotron (synthesizer) | Tacotron: Towards End-to-End Speech Synthesis | [fatchord/WaveRNN](https://github.com/fatchord/WaveRNN)
|[1710.10467](https://arxiv.org/pdf/1710.10467.pdf) | GE2E (encoder)| Generalized End-To-End Loss for Speaker Verification | This repo |
|[1710.10467](https://arxiv.org/pdf/1710.10467.pdf) | GE2E (encoder)| Generalized End-To-End Loss for Speaker Verification | This repo |
## F Q&A
#### 1.Where can I download the dataset?
[aidatatang_200zh](http://www.openslr.org/62/)、[magicdata](http://www.openslr.org/68/)、[aishell3](http://www.openslr.org/93/)
> After unzip aidatatang_200zh, you need to unzip all the files under `aidatatang_200zh\corpus\train`
#### 2.What is`<datasets_root>`?
If the dataset path is `D:\data\aidatatang_200zh`,then `<datasets_root>` is`D:\data`
#### 3.Not enough VRAM
Train the synthesizeradjust the batch_size in `synthesizer/hparams.py`
```
//Before
tts_schedule = [(2, 1e-3, 20_000, 12), # Progressive training schedule
(2, 5e-4, 40_000, 12), # (r, lr, step, batch_size)
(2, 2e-4, 80_000, 12), #
(2, 1e-4, 160_000, 12), # r = reduction factor (# of mel frames
(2, 3e-5, 320_000, 12), # synthesized for each decoder iteration)
(2, 1e-5, 640_000, 12)], # lr = learning rate
//After
tts_schedule = [(2, 1e-3, 20_000, 8), # Progressive training schedule
(2, 5e-4, 40_000, 8), # (r, lr, step, batch_size)
(2, 2e-4, 80_000, 8), #
(2, 1e-4, 160_000, 8), # r = reduction factor (# of mel frames
(2, 3e-5, 320_000, 8), # synthesized for each decoder iteration)
(2, 1e-5, 640_000, 8)], # lr = learning rate
```
Train Vocoder-Preprocess the dataadjust the batch_size in `synthesizer/hparams.py`
```
//Before
### Data Preprocessing
max_mel_frames = 900,
rescale = True,
rescaling_max = 0.9,
synthesis_batch_size = 16, # For vocoder preprocessing and inference.
//After
### Data Preprocessing
max_mel_frames = 900,
rescale = True,
rescaling_max = 0.9,
synthesis_batch_size = 8, # For vocoder preprocessing and inference.
```
Train Vocoder-Train the vocoderadjust the batch_size in `vocoder/wavernn/hparams.py`
```
//Before
# Training
voc_batch_size = 100
voc_lr = 1e-4
voc_gen_at_checkpoint = 5
voc_pad = 2
//After
# Training
voc_batch_size = 6
voc_lr = 1e-4
voc_gen_at_checkpoint = 5
voc_pad =2
```
#### 4.If it happens `RuntimeError: Error(s) in loading state_dict for Tacotron: size mismatch for encoder.embedding.weight: copying a param with shape torch.Size([70, 512]) from checkpoint, the shape in current model is torch.Size([75, 512]).`
Please refer to issue [#37](https://github.com/babysor/MockingBird/issues/37)
#### 5. How to improve CPU and GPU occupancy rate?
Adjust the batch_size as appropriate to improve
Loading…
Cancel
Save