Multi-Language Datasets for Speech Recognition Based on the End-to-End Framework

Author: Liang, Sendong
Contributor: Yan, Wei Qi
Publisher: Auckland University of Technology
Year: 2021
Date: 2021-07-01
Type: Thesis
Language: en
Keywords: Speech recognition; End-to-end; Attention model; CTC model
URL: https://hdl.handle.net/10292/14318
Access: OpenAccess

Abstract: This thesis examines the end-to-end framework for speech recognition using multi-language datasets. Our objective is to improve the performance of the hybrid CTC/Attention model. To compare speech recognition performance across languages, we designed and built three small datasets: Chinese, English, and Code-Switch. We then evaluate the hybrid CTC/Attention model in this multi-language setting. Our experiments indicate that the end-to-end CTC/Attention model achieves performance similar to or better than that of the HMM-DNN model in both single-language and Code-Switch speaking environments. The thesis also compares speech recognition results across the different languages.