From 222e3022743236ba0d3c5e22d8b945e8d2aa95f8 Mon Sep 17 00:00:00 2001
From: XiuChen-Liu <55180847+XiuChen-Liu@users.noreply.github.com>
Date: Tue, 14 Sep 2021 13:31:30 +0800
Subject: [PATCH] add FAQ in README、README-CN (#84)
MIME-Version: 1.0
Content-Type: text/plain; charset=UTF-8
Content-Transfer-Encoding: 8bit

* add FAQ
---
 README-CN.md | 76 ++++++++++++++++++++++++++++++++++++++++++++++++----
 README.md    | 72 ++++++++++++++++++++++++++++++++++++++++++++++---
 2 files changed, 140 insertions(+), 8 deletions(-)

diff --git a/README-CN.md b/README-CN.md
index 85ae1fe..b741ab1 100644
--- a/README-CN.md
+++ b/README-CN.md
@@ -8,7 +8,7 @@
 ### [DEMO VIDEO](https://www.bilibili.com/video/BV1sA411P7wM/)
 
 ## 特性
-🌍 **中文** 支持普通话并使用多种中文数据集进行测试:adatatang_200zh, magicdata, aishell3, biaobei, MozillaCommonVoice 等
+🌍 **中文** 支持普通话并使用多种中文数据集进行测试:aidatatang_200zh, magicdata, aishell3, biaobei, MozillaCommonVoice 等
 
 🤩 **PyTorch** 适用于 pytorch,已在 1.9.0 版本(最新于 2021 年 8 月)中测试,GPU Tesla T4 和 GTX 2060
@@ -33,7 +33,7 @@
 * 下载数据集并解压:确保您可以访问 *train* 文件夹中的所有音频文件(如 .wav)
 * 进行音频和梅尔频谱图预处理:`python pre.py <datasets_root>`
-可以传入参数 --dataset `{dataset}`,支持 adatatang_200zh, magicdata, aishell3
+可以传入参数 --dataset `{dataset}`,支持 aidatatang_200zh, magicdata, aishell3
 > 假如你下载的 `aidatatang_200zh` 文件放在 D 盘,`train` 文件路径为 `D:\data\aidatatang_200zh\corpus\train`,你的 `datasets_root` 就是 `D:\data\`
 
 > 假如發生「頁面文件太小,無法完成操作」,請參考這篇[文章](https://blog.csdn.net/qq_17755303/article/details/112564030),將虛擬內存更改為 100G(102400)。例如:檔案放置 D 槽就更改 D 槽的虛擬內存
@@ -49,8 +49,8 @@
 ### 2.2 使用预先训练好的合成器
 > 实在没有设备或者不想慢慢调试,可以使用网友贡献的模型(欢迎持续分享):
-| 作者 | 下载链接 | 效果预览 | 信息 |
-| --- | ----------- | ----- | ----- |
+| 作者 | 下载链接 | 效果预览 | 信息 |
+| --- | ----------- | ----- | ----- |
 |@FawenYo | https://drive.google.com/file/d/1H-YGOUHpmqKxJ9FRc6vAjPuqQki24UbC/view?usp=sharing [百度盘链接](https://pan.baidu.com/s/1vSYXO4wsLyjnF3Unl-Xoxg) 提取码:1024 | [input](https://github.com/babysor/MockingBird/wiki/audio/self_test.mp3) [output](https://github.com/babysor/MockingBird/wiki/audio/export.wav) | 200k steps 台湾口音 |
 |@miven| https://pan.baidu.com/s/1PI-hM3sn5wbeChRryX-RCQ 提取码:2021 | https://www.bilibili.com/video/BV1uh411B7AD/ | 150k steps 旧版需根据 [issue](https://github.com/babysor/MockingBird/issues/37) 修复 |
@@ -79,4 +79,70 @@
 |[**1806.04558**](https://arxiv.org/pdf/1806.04558.pdf) | **SV2TTS** | **Transfer Learning from Speaker Verification to Multispeaker Text-To-Speech Synthesis** | This repo |
 |[1802.08435](https://arxiv.org/pdf/1802.08435.pdf) | WaveRNN (vocoder) | Efficient Neural Audio Synthesis | [fatchord/WaveRNN](https://github.com/fatchord/WaveRNN) |
 |[1703.10135](https://arxiv.org/pdf/1703.10135.pdf) | Tacotron (synthesizer) | Tacotron: Towards End-to-End Speech Synthesis | [fatchord/WaveRNN](https://github.com/fatchord/WaveRNN) |
-|[1710.10467](https://arxiv.org/pdf/1710.10467.pdf) | GE2E (encoder) | Generalized End-To-End Loss for Speaker Verification | 本代码库 |
\ No newline at end of file
+|[1710.10467](https://arxiv.org/pdf/1710.10467.pdf) | GE2E (encoder) | Generalized End-To-End Loss for Speaker Verification | 本代码库 |
+
+## 常見問題(FAQ)
+#### 1. 數據集哪裡下載?
+[aidatatang_200zh](http://www.openslr.org/62/)、[magicdata](http://www.openslr.org/68/)、[aishell3](http://www.openslr.org/93/)
+> 解壓 aidatatang_200zh 後,還需將 `aidatatang_200zh\corpus\train` 下的檔案全選解壓縮
+
+#### 2. `<datasets_root>` 是什麼意思?
+假如數據集路徑為 `D:\data\aidatatang_200zh`,那麼 `<datasets_root>` 就是 `D:\data`
+
+#### 3. 訓練模型顯存不足
+訓練合成器時:將 `synthesizer/hparams.py` 中 tts_schedule 的 batch_size 參數調小
+```
+# 調整前
+tts_schedule = [(2, 1e-3, 20_000, 12),  # Progressive training schedule
+                (2, 5e-4, 40_000, 12),  # (r, lr, step, batch_size)
+                (2, 2e-4, 80_000, 12),  #
+                (2, 1e-4, 160_000, 12), # r = reduction factor (# of mel frames
+                (2, 3e-5, 320_000, 12), #  synthesized for each decoder iteration)
+                (2, 1e-5, 640_000, 12)], # lr = learning rate
+# 調整後
+tts_schedule = [(2, 1e-3, 20_000, 8),  # Progressive training schedule
+                (2, 5e-4, 40_000, 8),  # (r, lr, step, batch_size)
+                (2, 2e-4, 80_000, 8),  #
+                (2, 1e-4, 160_000, 8), # r = reduction factor (# of mel frames
+                (2, 3e-5, 320_000, 8), #  synthesized for each decoder iteration)
+                (2, 1e-5, 640_000, 8)], # lr = learning rate
+```
+
+聲碼器-預處理數據集時:將 `synthesizer/hparams.py` 中的 synthesis_batch_size 參數調小
+```
+# 調整前
+### Data Preprocessing
+        max_mel_frames = 900,
+        rescale = True,
+        rescaling_max = 0.9,
+        synthesis_batch_size = 16,   # For vocoder preprocessing and inference.
+# 調整後
+### Data Preprocessing
+        max_mel_frames = 900,
+        rescale = True,
+        rescaling_max = 0.9,
+        synthesis_batch_size = 8,    # For vocoder preprocessing and inference.
+```
+
+聲碼器-訓練聲碼器時:將 `vocoder/wavernn/hparams.py` 中的 voc_batch_size 參數調小
+```
+# 調整前
+# Training
+voc_batch_size = 100
+voc_lr = 1e-4
+voc_gen_at_checkpoint = 5
+voc_pad = 2
+
+# 調整後
+# Training
+voc_batch_size = 6
+voc_lr = 1e-4
+voc_gen_at_checkpoint = 5
+voc_pad = 2
+```
+
+#### 4. 碰到 `RuntimeError: Error(s) in loading state_dict for Tacotron: size mismatch for encoder.embedding.weight: copying a param with shape torch.Size([70, 512]) from checkpoint, the shape in current model is torch.Size([75, 512]).`
+請參照 issue [#37](https://github.com/babysor/MockingBird/issues/37)
+
+#### 5. 如何改善 CPU、GPU 佔用率?
+視情況調整 batch_size 參數來改善佔用率
\ No newline at end of file
diff --git a/README.md b/README.md
index ca612f7..36678d1 100644
--- a/README.md
+++ b/README.md
@@ -33,7 +33,7 @@
 * Download aidatatang_200zh or other dataset and unzip: make sure you can access all .wav in *train* folder
 * Preprocess with the audios and the mel spectrograms: `python pre.py <datasets_root>`
-Allow parameter `--dataset {dataset}` to support adatatang_200zh, magicdata, aishell3
+Allow parameter `--dataset {dataset}` to support aidatatang_200zh, magicdata, aishell3
 
 >If it happens `the page file is too small to complete the operation`, please refer to this [video](https://www.youtube.com/watch?v=Oh6dga-Oy10&ab_channel=CodeProf) and change the virtual memory to 100G (102400). For example: if the file is placed on drive D, change the virtual memory of drive D.
@@ -50,7 +50,7 @@
 > Thanks to the community, some models will be shared:
 | author | Download link | Preview Video | Info |
-| --- | ----------- | ----- |----- |
+| --- | ----------- | ----- | ----- |
 |@FawenYo | https://drive.google.com/file/d/1H-YGOUHpmqKxJ9FRc6vAjPuqQki24UbC/view?usp=sharing [Baidu Pan](https://pan.baidu.com/s/1vSYXO4wsLyjnF3Unl-Xoxg) Code:1024 | [input](https://github.com/babysor/MockingBird/wiki/audio/self_test.mp3) [output](https://github.com/babysor/MockingBird/wiki/audio/export.wav) | 200k steps with local accent of Taiwan |
 |@miven| https://pan.baidu.com/s/1PI-hM3sn5wbeChRryX-RCQ code:2021 | https://www.bilibili.com/video/BV1uh411B7AD/ | 150k steps |
@@ -83,4 +83,70 @@
 or
 |[**1806.04558**](https://arxiv.org/pdf/1806.04558.pdf) | **SV2TTS** | **Transfer Learning from Speaker Verification to Multispeaker Text-To-Speech Synthesis** | This repo |
 |[1802.08435](https://arxiv.org/pdf/1802.08435.pdf) | WaveRNN (vocoder) | Efficient Neural Audio Synthesis | [fatchord/WaveRNN](https://github.com/fatchord/WaveRNN) |
 |[1703.10135](https://arxiv.org/pdf/1703.10135.pdf) | Tacotron (synthesizer) | Tacotron: Towards End-to-End Speech Synthesis | [fatchord/WaveRNN](https://github.com/fatchord/WaveRNN) |
-|[1710.10467](https://arxiv.org/pdf/1710.10467.pdf) | GE2E (encoder) | Generalized End-To-End Loss for Speaker Verification | This repo |
\ No newline at end of file
+|[1710.10467](https://arxiv.org/pdf/1710.10467.pdf) | GE2E (encoder) | Generalized End-To-End Loss for Speaker Verification | This repo |
+
+## FAQ
+#### 1. Where can I download the dataset?
+[aidatatang_200zh](http://www.openslr.org/62/), [magicdata](http://www.openslr.org/68/), [aishell3](http://www.openslr.org/93/)
+> After unzipping aidatatang_200zh, you still need to unzip all the archives under `aidatatang_200zh\corpus\train`
+
+#### 2. What is `<datasets_root>`?
+If the dataset path is `D:\data\aidatatang_200zh`, then `<datasets_root>` is `D:\data`
+
+#### 3. Not enough VRAM
+When training the synthesizer: reduce the batch_size in the tts_schedule in `synthesizer/hparams.py`
+```
+# Before
+tts_schedule = [(2, 1e-3, 20_000, 12),  # Progressive training schedule
+                (2, 5e-4, 40_000, 12),  # (r, lr, step, batch_size)
+                (2, 2e-4, 80_000, 12),  #
+                (2, 1e-4, 160_000, 12), # r = reduction factor (# of mel frames
+                (2, 3e-5, 320_000, 12), #  synthesized for each decoder iteration)
+                (2, 1e-5, 640_000, 12)], # lr = learning rate
+# After
+tts_schedule = [(2, 1e-3, 20_000, 8),  # Progressive training schedule
+                (2, 5e-4, 40_000, 8),  # (r, lr, step, batch_size)
+                (2, 2e-4, 80_000, 8),  #
+                (2, 1e-4, 160_000, 8), # r = reduction factor (# of mel frames
+                (2, 3e-5, 320_000, 8), #  synthesized for each decoder iteration)
+                (2, 1e-5, 640_000, 8)], # lr = learning rate
+```
+
+When preprocessing data for the vocoder: reduce synthesis_batch_size in `synthesizer/hparams.py`
+```
+# Before
+### Data Preprocessing
+        max_mel_frames = 900,
+        rescale = True,
+        rescaling_max = 0.9,
+        synthesis_batch_size = 16,   # For vocoder preprocessing and inference.
+# After
+### Data Preprocessing
+        max_mel_frames = 900,
+        rescale = True,
+        rescaling_max = 0.9,
+        synthesis_batch_size = 8,    # For vocoder preprocessing and inference.
+```
+
+When training the vocoder: reduce voc_batch_size in `vocoder/wavernn/hparams.py`
+```
+# Before
+# Training
+voc_batch_size = 100
+voc_lr = 1e-4
+voc_gen_at_checkpoint = 5
+voc_pad = 2
+
+# After
+# Training
+voc_batch_size = 6
+voc_lr = 1e-4
+voc_gen_at_checkpoint = 5
+voc_pad = 2
+```
+
+#### 4. If you hit `RuntimeError: Error(s) in loading state_dict for Tacotron: size mismatch for encoder.embedding.weight: copying a param with shape torch.Size([70, 512]) from checkpoint, the shape in current model is torch.Size([75, 512]).`
+Please refer to issue [#37](https://github.com/babysor/MockingBird/issues/37)
+
+#### 5. How to improve CPU and GPU utilization?
+Adjust the batch_size as appropriate to improve utilization
\ No newline at end of file
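The three "not enough VRAM" answers in the patch all make the same hand edit: shrink the batch-size field of a Tacotron-style progressive training schedule while leaving the other fields alone. A minimal Python sketch of that edit as a helper (hypothetical, not part of the MockingBird repo; the schedule values are the ones quoted in the FAQ):

```python
# Hypothetical helper, not part of the repo: apply the FAQ's manual
# batch_size reduction to a progressive training schedule.
def shrink_batch_size(schedule, new_batch_size):
    """Each entry is (r, lr, step, batch_size); only batch_size changes."""
    return [(r, lr, step, new_batch_size) for (r, lr, step, _) in schedule]

# Values from synthesizer/hparams.py as quoted in the FAQ ("Before").
tts_schedule = [(2, 1e-3, 20_000, 12),
                (2, 5e-4, 40_000, 12),
                (2, 2e-4, 80_000, 12),
                (2, 1e-4, 160_000, 12),
                (2, 3e-5, 320_000, 12),
                (2, 1e-5, 640_000, 12)]

reduced = shrink_batch_size(tts_schedule, 8)
# The (r, lr, step) columns are untouched; only batch_size drops to 8.
assert all(new[:3] == old[:3] for new, old in zip(reduced, tts_schedule))
assert all(new[3] == 8 for new in reduced)
```

The same one-liner applies to any of the three files: only the final field of each tuple (or the single `voc_batch_size` / `synthesis_batch_size` value) needs to change, which is why the FAQ's before/after listings differ in just that column.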