Modification of Prosody for Emotion Conversion using Gaussian Regression Model
Geethashree A1, D J Ravi2
1Geethashree A, Department of Educational Credential Evaluators, Visvesvaraya Technological University, Vidyavardhaka College of Engineering, Mysuru, Karnataka, India.
2D J Ravi, Department of Educational Credential Evaluators, Visvesvaraya Technological University, Vidyavardhaka College of Engineering, Mysuru, Karnataka, India.
Manuscript received on 01 March 2019 | Revised Manuscript received on 05 March 2019 | Manuscript published on 30 July 2019 | PP: 3745-3752 | Volume-8 Issue-2, July 2019 | Retrieval Number: B3318078219/19©BEIESP | DOI: 10.35940/ijrte.B3318.078219
© The Authors. Blue Eyes Intelligence Engineering and Sciences Publication (BEIESP). This is an open access article under the CC-BY-NC-ND license (http://creativecommons.org/licenses/by-nc-nd/4.0/)
Abstract: Emotion conversion is an active research area in emotional speech synthesis. This work converts a neutral speech sentence into a target emotional speech sentence using signal processing techniques. The parameters used for emotion conversion are the pitch contour, the intensity, and the duration of the sentence. A Kannada Emotional Speech (KES) database was created and used for analysis. It contains four emotions (sadness, happy, anger, and fear) in addition to neutral speech. The pitch contours of the different emotional sentences are analyzed, and a Gaussian Regression Model (GRM) is proposed for predicting the target pitch contour. The proposed method is evaluated using objective and subjective tests. The objective test uses the mean pitch, the standard deviation of pitch, the mean intensity, and the duration of the sentences. The subjective test computes the Emotion Recognition Rate (ERR) from a confusion matrix and collects Mean Opinion Score (MOS) ratings of the conversion system on a scale of 1 to 5. The subjective test results indicate that the effectiveness and discernibility of the emotion improve when GRM is used for pitch contour modification together with intensity and duration modification. The most recognized emotion was sadness, with an MOS of 3.52 and an ERR of 83%, and the least recognized was anger, with an MOS of 1.74 and an ERR of 66%. The results of the subjective and objective tests show that the converted sadness, happy and fear speech is very close to natural sadness, anger and fear emotion.
Index Terms: Emotion Conversion, Gaussian Regression Model, Kannada Emotional Speech Database.
Scope of the Article: Regression and Prediction
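
The abstract names two subjective measures, ERR taken from a confusion matrix and MOS on a 1 to 5 scale, without giving formulas. The following is a minimal sketch of one common way to compute them, assuming ERR is the percentage of listener responses on the diagonal of each confusion-matrix row and MOS is the arithmetic mean of listener ratings. The function names and all counts and ratings below are hypothetical illustrations, not the authors' code or data.

```python
from statistics import mean


def emotion_recognition_rate(confusion, labels):
    """Return per-emotion recognition rate in percent.

    Assumption: rows are the intended (converted) emotion and columns are
    the emotion chosen by listeners, so the diagonal holds correct responses.
    """
    rates = {}
    for i, label in enumerate(labels):
        row_total = sum(confusion[i])
        rates[label] = 100.0 * confusion[i][i] / row_total if row_total else 0.0
    return rates


def mean_opinion_score(ratings):
    """Return the mean of listener ratings, each expected on the 1-5 scale."""
    if any(r < 1 or r > 5 for r in ratings):
        raise ValueError("ratings must lie on the 1-5 scale")
    return mean(ratings)


labels = ["sadness", "happy", "anger", "fear"]

# Hypothetical listener counts, for illustration only (not results from the paper).
confusion = [
    [8, 1, 0, 1],
    [1, 7, 1, 1],
    [0, 2, 6, 2],
    [1, 1, 1, 7],
]

err = emotion_recognition_rate(confusion, labels)
mos = mean_opinion_score([4, 3, 4, 3, 4])  # hypothetical ratings

for label in labels:
    print(f"{label}: ERR = {err[label]:.1f}%")
print(f"MOS = {mos:.2f}")
```

Keeping ERR per row makes it easy to see which target emotion listeners confuse with which, which is the comparison the abstract draws between sadness and anger.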