Mispronunciation Detection for Spoken Isolated Words using Segmentation and Classification under Low Resource Conditions for Kannada Language
Savitha Murthy1, Pragnya Suresh2, Preet Shah3, Dinkar Sitaram4
1Savitha Murthy*, Department of CSE, PES University, Bangalore, India.
2Pragnya Suresh, Department of CSE, PES University, Bangalore, India.
3Preet Shah, Department of CSE, PES University, Bangalore, India.
4Dinkar Sitaram, Department of CSE, PES University, Bangalore, India.

Manuscript received on November 17, 2019. | Revised Manuscript received on November 24, 2019. | Manuscript published on November 30, 2019. | PP: 11874-11882 | Volume-8 Issue-4, November 2019. | Retrieval Number: D9112118419/2019©BEIESP | DOI: 10.35940/ijrte.D9589.118419

© The Authors. Blue Eyes Intelligence Engineering and Sciences Publication (BEIESP). This is an open access article under the CC-BY-NC-ND license (http://creativecommons.org/licenses/by-nc-nd/4.0/)

Abstract: People who relocate need to learn local pronunciations correctly, and mobile phones can make such language learning easy and flexible. Our research involves Kannada Kali, a mobile and cloud-based application being developed to help people learn the correct pronunciation of Kannada (a language spoken in India). Automatic Speech Recognition systems used to aid pronunciation training must be trained on a sufficient amount of spoken target-language data. Since collecting such data is not easy, the objective of our research is to detect mispronounced segments of words with minimal data. When data is scarce, a comparative approach, in which spoken word segments are compared with the canonical pronunciations, is more effective for detecting pronunciation anomalies. Since syllables are basic independent units of pronunciation, the spoken words are segmented into syllables for effective comparison and feedback. We propose an unsupervised segmentation method called Spectrogram Formant Contour Analysis that detects syllable boundaries by analysing changes in the formant contours of spoken-word spectrograms. Mispronunciation detection is more effective when the application can identify the syllable actually pronounced and communicate the correct pronunciation to the user. For syllable classification, our method employs a novel approach in which a model is trained on phonemes and then given syllables as input for identification. Our study compares the performance of three machine learning algorithms, namely Convolutional Neural Network, Support Vector Machine and K-Nearest Neighbours (KNN), on the task of identifying phonemes when trained on minimal data. KNN achieved the best phoneme classification accuracy, with 80% on clean data and 60% on noisy data. In our initial results on syllable classification for Kannada Kali, Support Vector Machines gave the highest accuracy, at almost 30%.
Keywords: Mispronunciation Detection, Kannada Isolated Words, Syllable Segmentation, Spectrogram Formant Contour Analysis.
Scope of the Article: Classification.
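
Illustrative sketch: the abstract reports that K-Nearest Neighbours performed best at phoneme classification when trained on minimal data. The Python snippet below is a minimal sketch of how a KNN classifier labels an input by majority vote among its nearest training examples. It is not the authors' implementation. The function name `knn_predict`, the two-dimensional feature vectors, the phoneme labels and the choice of k = 3 are all hypothetical, since the abstract does not state the acoustic features or hyperparameters used.

```python
import math
from collections import Counter


def knn_predict(train, query, k=3):
    """Return the majority label among the k training points nearest to query.

    train: list of (feature_vector, label) pairs
    query: feature vector with the same length as the training vectors
    """
    # Rank training examples by Euclidean distance to the query and keep k.
    nearest = sorted(train, key=lambda item: math.dist(item[0], query))[:k]
    # Majority vote over the labels of the k nearest neighbours.
    votes = Counter(label for _, label in nearest)
    return votes.most_common(1)[0][0]


# Toy, made-up 2-D features (assumption: NOT real acoustic data).
train = [
    ((1.0, 1.0), "a"),
    ((1.2, 0.9), "a"),
    ((0.9, 1.1), "a"),
    ((5.0, 5.0), "i"),
    ((5.1, 4.8), "i"),
    ((4.9, 5.2), "i"),
]

print(knn_predict(train, (1.1, 1.0)))
print(knn_predict(train, (5.0, 5.1)))
```

KNN has no training phase beyond storing labelled examples, so it can be applied to very small datasets, which is the low-resource setting the abstract describes.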