WaveNet

Abstract

This is a clone of Chainer-Examples-WaveNet and an experiment run on Google Colaboratory.

GitHub repository

Details of Operation

Please see the document "Synthesize_Human_Speech_with_WaveNet" in the docs folder. It is a soft copy of the chainer-colab-notebook WaveNet web page listed in the references.

Experiment on Google Colaboratory

The experiment follows chainer-colab-notebook's "Synthesize Human Speech with WaveNet", using the CSTR VCTK Corpus.
Below are loss and accuracy vs. iteration, after around 20 hours of computation (12 epochs).
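For context on what these curves measure: WaveNet trains a softmax over mu-law quantized amplitude classes (256 classes in the usual setup), so the accuracy below is classification accuracy over those classes. A minimal sketch of mu-law companding, assuming the standard mu = 255 formulation (not code from this repository):

```python
import numpy as np

def mulaw_encode(x, mu=255):
    """Map audio in [-1, 1] to integer classes 0..mu via mu-law companding."""
    y = np.sign(x) * np.log1p(mu * np.abs(x)) / np.log1p(mu)  # compress to [-1, 1]
    return ((y + 1) / 2 * mu + 0.5).astype(np.int64)          # quantize to 0..mu

def mulaw_decode(q, mu=255):
    """Invert the quantization back to approximate audio in [-1, 1]."""
    y = 2 * (q.astype(np.float64) / mu) - 1
    return np.sign(y) * ((1 + mu) ** np.abs(y) - 1) / mu

x = np.linspace(-1, 1, 5)        # toy "audio" samples
q = mulaw_encode(x)              # the 256-way class targets the softmax predicts
x_hat = mulaw_decode(q)          # reconstruction is close but not exact
```

The network never regresses raw amplitudes; it predicts one of these 256 classes per sample, which is why cross-entropy loss and classification accuracy are the natural training metrics.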

loss

The unusual dip around iteration 57000 is caused by a resume/restart, due to the Colaboratory time limit.

accuracy

Accuracy is still too low (loss is still high); without conditioning, waveform generation would fail.
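Conditioning here means feeding extra information (such as a speaker embedding) into each gated activation unit so that it biases both the filter and the gate. A minimal sketch of a WaveNet-style gated unit with a conditioning term, where all names and dimensions are illustrative assumptions rather than this repository's code:

```python
import numpy as np

rng = np.random.default_rng(0)

def gated_activation(x, h, Wf, Wg, Vf, Vg):
    """Gated unit z = tanh(Wf x + Vf h) * sigmoid(Wg x + Vg h).
    h is the conditioning vector (e.g. a speaker embedding); it shifts
    both the filter and the gate, steering what gets generated."""
    sigmoid = lambda a: 1.0 / (1.0 + np.exp(-a))
    return np.tanh(Wf @ x + Vf @ h) * sigmoid(Wg @ x + Vg @ h)

channels, cond_dim = 4, 3
x = rng.standard_normal(channels)   # hidden activations at one time step
h = rng.standard_normal(cond_dim)   # conditioning vector
Wf = rng.standard_normal((channels, channels))
Wg = rng.standard_normal((channels, channels))
Vf = rng.standard_normal((channels, cond_dim))
Vg = rng.standard_normal((channels, cond_dim))
z = gated_activation(x, h, Wf, Wg, Vf, Vg)
```

Even with a weak model, the conditioning terms Vf h and Vg h pull the output toward the conditioned utterance, which is consistent with the observation below that traces of the target digit are audible early in training.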

Samples

The samples folder contains original and generated WAV files, plus a model file (a snapshot). The suffix of snapshot_iter indicates the iteration number. The original WAV files are English digits from Pannous (see the reference link).
When generating with this model file, you should pass --n_loop 2 as a generate.py argument, because the chainer-colab-notebook WaveNet uses n_loop = 2.
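n_loop controls how many times the dilated-convolution stack is repeated, which changes both the model's parameter shapes (so a snapshot only loads into a model built with the same n_loop) and its receptive field. A rough illustration, assuming kernel size 2 and the common n_layer = 10 dilation pattern 1, 2, 4, ..., 512 (assumptions, not values read from this repository's config):

```python
def receptive_field(n_loop, n_layer, kernel_size=2):
    """Receptive field (in samples) of stacked dilated causal convolutions:
    each loop repeats the dilations 1, 2, 4, ..., 2**(n_layer - 1)."""
    dilations = [2 ** i for i in range(n_layer)] * n_loop
    return sum((kernel_size - 1) * d for d in dilations) + 1

rf = receptive_field(n_loop=2, n_layer=10)  # 2 * 1023 + 1 = 2047 samples
```

With n_loop = 2 the receptive field roughly doubles compared to a single stack, so mismatching n_loop at generation time would change the architecture and the snapshot would fail to load.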

zero

At iteration 57000 (7 epochs), the generated waveform is still noisier than the original human speech waveform.

nine

Although this waveform was generated from a snapshot at only iteration 8000 (1 epoch), some reverberation of the utterance /nine/ (the digit 9) can be heard.
This may be the effect of the conditioning trick.

Reference

License

Regarding Chainer, please see LICENSE-chainer.txt in the docs folder.