Add FAQ to README and README-CN (#84)

* Add FAQ
3 years ago · 222e302274
parent 32b9755cbe
commit 222e302274
2 changed files with 140 additions and 8 deletions
--- a/README-CN.md
+++ b/README-CN.md
@@ -8,7 +8,7 @@
 ### [DEMO VIDEO](https://www.bilibili.com/video/BV1sA411P7wM/)

 ## Features
-🌍 **Chinese** Supports Mandarin and has been tested with multiple Chinese datasets: adatatang_200zh, magicdata, aishell3, biaobei, MozillaCommonVoice, etc.
+🌍 **Chinese** Supports Mandarin and has been tested with multiple Chinese datasets: aidatatang_200zh, magicdata, aishell3, biaobei, MozillaCommonVoice, etc.

 🤩 **PyTorch** Works with PyTorch; tested on version 1.9.0 (the latest as of August 2021) with Tesla T4 and GTX 2060 GPUs

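@@ -33,7 +33,7 @@
 * Download a dataset and unzip it: make sure you can access all audio files (e.g. .wav) in the *train* folder
 * Preprocess the audio and the mel spectrograms:
 `python pre.py <datasets_root>`
-The `--dataset {dataset}` parameter can be passed; supported values: adatatang_200zh, magicdata, aishell3
+The `--dataset {dataset}` parameter can be passed; supported values: aidatatang_200zh, magicdata, aishell3
 > If the `aidatatang_200zh` files you downloaded are on drive D and the `train` folder is at `D:\data\aidatatang_200zh\corpus\train`, your `datasets_root` is `D:\data\`

 >If you see the error `the page file is too small to complete the operation`, refer to this [article](https://blog.csdn.net/qq_17755303/article/details/112564030) and change the virtual memory to 100G (102400); for example, if the files are on drive D, change the virtual memory of drive D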
@@ -49,8 +49,8 @@
 ### 2.2 Use a pretrained synthesizer
 > If you have no suitable hardware or do not want to spend time tuning, you can use models contributed by the community (more contributions are welcome):

-| Author | Download link | Preview | Info | 
-| --- | ----------- | ----- | ----- | 
+| Author | Download link | Preview | Info |
+| --- | ----------- | ----- | ----- |
 |@FawenYo | https://drive.google.com/file/d/1H-YGOUHpmqKxJ9FRc6vAjPuqQki24UbC/view?usp=sharing [Baidu Pan link](https://pan.baidu.com/s/1vSYXO4wsLyjnF3Unl-Xoxg) code: 1024  | [input](https://github.com/babysor/MockingBird/wiki/audio/self_test.mp3) [output](https://github.com/babysor/MockingBird/wiki/audio/export.wav) | 200k steps, Taiwanese accent
 |@miven| https://pan.baidu.com/s/1PI-hM3sn5wbeChRryX-RCQ code: 2021 | https://www.bilibili.com/video/BV1uh411B7AD/ | 150k steps; older versions need the fix described in this [issue](https://github.com/babysor/MockingBird/issues/37)

@@ -79,4 +79,70 @@
 |[**1806.04558**](https://arxiv.org/pdf/1806.04558.pdf) | **SV2TTS** | **Transfer Learning from Speaker Verification to Multispeaker Text-To-Speech Synthesis** | This repo |
 |[1802.08435](https://arxiv.org/pdf/1802.08435.pdf) | WaveRNN (vocoder) | Efficient Neural Audio Synthesis | [fatchord/WaveRNN](https://github.com/fatchord/WaveRNN) |
 |[1703.10135](https://arxiv.org/pdf/1703.10135.pdf) | Tacotron (synthesizer) | Tacotron: Towards End-to-End Speech Synthesis | [fatchord/WaveRNN](https://github.com/fatchord/WaveRNN)
-|[1710.10467](https://arxiv.org/pdf/1710.10467.pdf) | GE2E (encoder)| Generalized End-To-End Loss for Speaker Verification | This repo |
+|[1710.10467](https://arxiv.org/pdf/1710.10467.pdf) | GE2E (encoder)| Generalized End-To-End Loss for Speaker Verification | This repo |
+
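+## FAQ
+#### 1. Where can I download the datasets?
+[aidatatang_200zh](http://www.openslr.org/62/), [magicdata](http://www.openslr.org/68/), [aishell3](http://www.openslr.org/93/)
+> After unzipping aidatatang_200zh, you also need to select and unzip all the archives under `aidatatang_200zh\corpus\train`
+
+#### 2. What does `<datasets_root>` mean?
+If the dataset path is `D:\data\aidatatang_200zh`, then `<datasets_root>` is `D:\data`
+
+#### 3. Not enough VRAM when training a model
+When training the synthesizer: reduce the batch_size (the last value in each `tts_schedule` tuple) in `synthesizer/hparams.py`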
+```
+# Before
+tts_schedule = [(2,  1e-3,  20_000,  12),   # Progressive training schedule
+                (2,  5e-4,  40_000,  12),   # (r, lr, step, batch_size)
+                (2,  2e-4,  80_000,  12),   #
+                (2,  1e-4, 160_000,  12),   # r = reduction factor (# of mel frames
+                (2,  3e-5, 320_000,  12),   #     synthesized for each decoder iteration)
+                (2,  1e-5, 640_000,  12)],  # lr = learning rate
+# After
+tts_schedule = [(2,  1e-3,  20_000,  8),   # Progressive training schedule
+                (2,  5e-4,  40_000,  8),   # (r, lr, step, batch_size)
+                (2,  2e-4,  80_000,  8),   #
+                (2,  1e-4, 160_000,  8),   # r = reduction factor (# of mel frames
+                (2,  3e-5, 320_000,  8),   #     synthesized for each decoder iteration)
+                (2,  1e-5, 640_000,  8)],  # lr = learning rate
+```
+
+When preprocessing data for the vocoder: reduce `synthesis_batch_size` in `synthesizer/hparams.py`
+```
+# Before
+### Data Preprocessing
+        max_mel_frames = 900,
+        rescale = True,
+        rescaling_max = 0.9,
+        synthesis_batch_size = 16,                  # For vocoder preprocessing and inference.
+# After
+### Data Preprocessing
+        max_mel_frames = 900,
+        rescale = True,
+        rescaling_max = 0.9,
+        synthesis_batch_size = 8,                  # For vocoder preprocessing and inference.
+```
+
+When training the vocoder: reduce `voc_batch_size` in `vocoder/wavernn/hparams.py`
+```
+# Before
+# Training
+voc_batch_size = 100
+voc_lr = 1e-4
+voc_gen_at_checkpoint = 5
+voc_pad = 2
+
+# After
+# Training
+voc_batch_size = 6
+voc_lr = 1e-4
+voc_gen_at_checkpoint = 5
+voc_pad = 2
+```
+
+#### 4. I get `RuntimeError: Error(s) in loading state_dict for Tacotron: size mismatch for encoder.embedding.weight: copying a param with shape torch.Size([70, 512]) from checkpoint, the shape in current model is torch.Size([75, 512]).`
+See issue [#37](https://github.com/babysor/MockingBird/issues/37)
+
+#### 5. How can I improve CPU and GPU utilization?
+Adjust the batch_size parameter to suit your hardware: a larger batch generally keeps the GPU busier but needs more VRAM
--- a/README.md
+++ b/README.md
@@ -33,7 +33,7 @@
 * Download aidatatang_200zh or other dataset and unzip: make sure you can access all .wav in *train* folder
 * Preprocess with the audios and the mel spectrograms:
 `python pre.py <datasets_root>`
-Allow parameter `--dataset {dataset}` to support adatatang_200zh, magicdata, aishell3
+Allow parameter `--dataset {dataset}` to support aidatatang_200zh, magicdata, aishell3

 >If you see the error `the page file is too small to complete the operation`, refer to this [video](https://www.youtube.com/watch?v=Oh6dga-Oy10&ab_channel=CodeProf) and change the virtual memory to 100G (102400). For example, if the files are on drive D, change the virtual memory of drive D.

@@ -50,7 +50,7 @@ Allow parameter `--dataset {dataset}` to support adatatang_200zh, magicdata, ais
 > Thanks to the community, some models will be shared:

 | author | Download link | Preview Video | Info |
-| --- | ----------- | ----- |----- | 
+| --- | ----------- | ----- |----- |
 |@FawenYo | https://drive.google.com/file/d/1H-YGOUHpmqKxJ9FRc6vAjPuqQki24UbC/view?usp=sharing [Baidu Pan](https://pan.baidu.com/s/1vSYXO4wsLyjnF3Unl-Xoxg) Code：1024  | [input](https://github.com/babysor/MockingBird/wiki/audio/self_test.mp3) [output](https://github.com/babysor/MockingBird/wiki/audio/export.wav) | 200k steps with local accent of Taiwan
 |@miven| https://pan.baidu.com/s/1PI-hM3sn5wbeChRryX-RCQ code：2021 | https://www.bilibili.com/video/BV1uh411B7AD/

@@ -83,4 +83,70 @@ or
 |[**1806.04558**](https://arxiv.org/pdf/1806.04558.pdf) | **SV2TTS** | **Transfer Learning from Speaker Verification to Multispeaker Text-To-Speech Synthesis** | This repo |
 |[1802.08435](https://arxiv.org/pdf/1802.08435.pdf) | WaveRNN (vocoder) | Efficient Neural Audio Synthesis | [fatchord/WaveRNN](https://github.com/fatchord/WaveRNN) |
 |[1703.10135](https://arxiv.org/pdf/1703.10135.pdf) | Tacotron (synthesizer) | Tacotron: Towards End-to-End Speech Synthesis | [fatchord/WaveRNN](https://github.com/fatchord/WaveRNN)
-|[1710.10467](https://arxiv.org/pdf/1710.10467.pdf) | GE2E (encoder)| Generalized End-To-End Loss for Speaker Verification | This repo |
+|[1710.10467](https://arxiv.org/pdf/1710.10467.pdf) | GE2E (encoder)| Generalized End-To-End Loss for Speaker Verification | This repo |
+
+## FAQ
+#### 1. Where can I download the datasets?
+[aidatatang_200zh](http://www.openslr.org/62/), [magicdata](http://www.openslr.org/68/), [aishell3](http://www.openslr.org/93/)
+> After unzipping aidatatang_200zh, you also need to unzip all the archives under `aidatatang_200zh\corpus\train`
+
+#### 2. What is `<datasets_root>`?
+If the dataset path is `D:\data\aidatatang_200zh`, then `<datasets_root>` is `D:\data`
+
+#### 3. Not enough VRAM
+When training the synthesizer: reduce the batch_size (the last value in each `tts_schedule` tuple) in `synthesizer/hparams.py`
+```
+# Before
+tts_schedule = [(2,  1e-3,  20_000,  12),   # Progressive training schedule
+                (2,  5e-4,  40_000,  12),   # (r, lr, step, batch_size)
+                (2,  2e-4,  80_000,  12),   #
+                (2,  1e-4, 160_000,  12),   # r = reduction factor (# of mel frames
+                (2,  3e-5, 320_000,  12),   #     synthesized for each decoder iteration)
+                (2,  1e-5, 640_000,  12)],  # lr = learning rate
+# After
+tts_schedule = [(2,  1e-3,  20_000,  8),   # Progressive training schedule
+                (2,  5e-4,  40_000,  8),   # (r, lr, step, batch_size)
+                (2,  2e-4,  80_000,  8),   #
+                (2,  1e-4, 160_000,  8),   # r = reduction factor (# of mel frames
+                (2,  3e-5, 320_000,  8),   #     synthesized for each decoder iteration)
+                (2,  1e-5, 640_000,  8)],  # lr = learning rate
+```
+
+When preprocessing data for the vocoder: reduce `synthesis_batch_size` in `synthesizer/hparams.py`
+```
+# Before
+### Data Preprocessing
+        max_mel_frames = 900,
+        rescale = True,
+        rescaling_max = 0.9,
+        synthesis_batch_size = 16,                  # For vocoder preprocessing and inference.
+# After
+### Data Preprocessing
+        max_mel_frames = 900,
+        rescale = True,
+        rescaling_max = 0.9,
+        synthesis_batch_size = 8,                  # For vocoder preprocessing and inference.
+```
+
+When training the vocoder: reduce `voc_batch_size` in `vocoder/wavernn/hparams.py`
+```
+# Before
+# Training
+voc_batch_size = 100
+voc_lr = 1e-4
+voc_gen_at_checkpoint = 5
+voc_pad = 2
+
+# After
+# Training
+voc_batch_size = 6
+voc_lr = 1e-4
+voc_gen_at_checkpoint = 5
+voc_pad = 2
+```
+
+#### 4. I get `RuntimeError: Error(s) in loading state_dict for Tacotron: size mismatch for encoder.embedding.weight: copying a param with shape torch.Size([70, 512]) from checkpoint, the shape in current model is torch.Size([75, 512]).`
+Please refer to issue [#37](https://github.com/babysor/MockingBird/issues/37)
+
+#### 5. How can I improve CPU and GPU utilization?
+Adjust the batch_size to suit your hardware: a larger batch generally keeps the GPU busier but needs more VRAM
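Note on question 3 in the hunk above: every entry in `tts_schedule` is a `(r, lr, step, batch_size)` tuple, so lowering the synthesizer batch size means changing the last value of each tuple. The sketch below shows that edit as a small function. `with_batch_size` is a hypothetical helper, not something the repository provides, and the schedule values are copied from the README snippet.

```python
def with_batch_size(schedule, batch_size):
    """Hypothetical helper: copy a (r, lr, step, batch_size) schedule,
    replacing only the batch size in every stage."""
    return [(r, lr, step, batch_size) for (r, lr, step, _old) in schedule]


# Schedule as shown in the README's "Before" snippet.
tts_schedule = [
    (2, 1e-3, 20_000, 12),
    (2, 5e-4, 40_000, 12),
    (2, 2e-4, 80_000, 12),
    (2, 1e-4, 160_000, 12),
    (2, 3e-5, 320_000, 12),
    (2, 1e-5, 640_000, 12),
]

# Same stages with batch size 8, matching the "After" snippet.
smaller_schedule = with_batch_size(tts_schedule, 8)
for stage in smaller_schedule:
    print(stage)
```

The reduction factor, learning rate, and step count of each stage are left untouched; only the memory-relevant value changes.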