Deep Learning for Emotion Recognition in Affective Virtual Reality and Music Applications
Jason Teo1, Jia Tian Chia2, Jie Yu Lee3

1,2,3Faculty of Computing & Informatics, Universiti Malaysia Sabah, Jalan UMS, Kota Kinabalu, Sabah, Malaysia.
Manuscript received on 26 June 2019 | Revised Manuscript received on 14 July 2019 | Manuscript Published on 26 July 2019 | PP: 162-170 | Volume-8 Issue-2S2 July 2019 | Retrieval Number: B10300782S219/2019©BEIESP | DOI: 10.35940/ijrte.B1030.0782S219
© The Authors. Blue Eyes Intelligence Engineering and Sciences Publication (BEIESP). This is an open access article under the CC-BY-NC-ND license.

Abstract: This paper presents a deep learning approach to emotion recognition as applied to virtual reality and music predictive analytics. First, it investigates the deep parameter tuning of multi-hidden-layer neural networks, commonly referred to simply as deep networks, used to conduct emotion detection in virtual reality electroencephalography (VR-EEG) predictive analytics. Deep networks have been studied extensively over the last decade and have been shown to be among the most accurate methods for predictive analytics in the image recognition and speech processing domains. However, most predictive analytics studies of deep networks focus on shallow parameter tuning when attempting to boost prediction accuracy, covering parameters such as the number of hidden layers, the number of hidden nodes per hidden layer and the types of activation functions used in the hidden nodes. Much less effort has been devoted to tuning deep parameters such as the input dropout ratio and the L1 (lasso) and L2 (ridge) regularization parameters of the deep networks. The goal of this study is therefore to investigate the tuning of these deep parameters for predicting emotions in a virtual reality environment, using EEG signals recorded while the user is exposed to immersive content. The results show that deep tuning of deep networks in VR-EEG can improve emotion prediction accuracy. The best emotion prediction accuracy improved to over 96% after deep tuning was conducted on the input dropout ratio and the L1 and L2 regularization parameters. Second, the paper investigates a similar approach applied to four-quadrant music emotion recognition. Recent studies have characterized music by genre, and various classification techniques have been used to achieve the best accuracy rate.
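The "deep parameters" named above (input dropout ratio, L1 and L2 regularization) can be illustrated with a minimal NumPy sketch of one forward pass of a regularized network. The paper's actual toolchain is not specified here, so all shapes, variable names and values below are illustrative assumptions, not the study's configuration.

```python
import numpy as np

rng = np.random.default_rng(0)

# Toy stand-in for EEG-derived features: 32 samples, 14 features, 4 emotion classes.
X = rng.standard_normal((32, 14))
y = rng.integers(0, 4, size=32)

# The three "deep parameters" under study (values are illustrative).
input_dropout = 0.2   # fraction of input features zeroed during training
l1, l2 = 1e-4, 1e-3   # lasso and ridge penalty strengths

# One hidden layer for brevity; the paper tunes multi-hidden-layer networks.
W1 = rng.standard_normal((14, 32)) * 0.1
W2 = rng.standard_normal((32, 4)) * 0.1

def forward(X, train=True):
    Xd = X.copy()
    if train:
        # Input dropout: randomly zero a fraction of the input features,
        # with inverted-dropout scaling so expected activations are unchanged.
        mask = rng.random(Xd.shape) >= input_dropout
        Xd = Xd * mask / (1.0 - input_dropout)
    h = np.maximum(Xd @ W1, 0.0)   # ReLU hidden layer
    return h @ W2                  # class logits

def regularized_loss(logits, y):
    # Softmax cross-entropy plus L1 + L2 (elastic-net-style) weight penalty.
    z = logits - logits.max(axis=1, keepdims=True)
    logp = z - np.log(np.exp(z).sum(axis=1, keepdims=True))
    ce = -logp[np.arange(len(y)), y].mean()
    penalty = sum(l1 * np.abs(W).sum() + l2 * (W ** 2).sum() for W in (W1, W2))
    return ce + penalty

print(regularized_loss(forward(X), y))
```

Tuning these parameters means repeating training with different `input_dropout`, `l1` and `l2` values and keeping the setting with the best held-out accuracy.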
Several deep learning studies have shown outstanding results in dimensional music emotion recognition. Yet there is no concrete and concise description for expressing music. To address this research gap, this study uses more detailed metadata with two-dimensional emotion annotations based on Russell's model. Rather than feeding music genres or lyrics into a machine learning algorithm for music emotion recognition (MER), a higher-level representation of musical information, namely acoustic features, is used. For the four-class classification problem, an available dataset named AMG1608 is fed into a training model built from a deep neural network. The dataset is first preprocessed to give full access to its variables before any machine learning is performed. The classification rate is then collected by running the scripts in the R environment. The preliminary result showed a classification rate of 46.0%. Experiments on architecture and hyperparameter tuning as well as instance reduction were designed and conducted. The tuned parameters that increased accuracy for the deep learners were the hidden-layer architecture, the number of epochs, instance reduction, the input dropout ratio and the ℓ1 and ℓ2 regularization parameters. The final best prediction accuracy obtained was 61.7%, an overall improvement of more than 15 percentage points for music emotion recognition based purely on the music's acoustic features.
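The architecture and hyperparameter tuning described above can be sketched as a grid search over a four-class classifier. This is an illustrative Python/scikit-learn sketch, not the paper's R workflow: the synthetic data merely stands in for AMG1608's acoustic features, the grid values are assumptions, and scikit-learn's `MLPClassifier` exposes only an L2-style `alpha` penalty (no input dropout or ℓ1, which the paper tuned separately).

```python
from sklearn.datasets import make_classification
from sklearn.model_selection import GridSearchCV
from sklearn.neural_network import MLPClassifier

# Synthetic stand-in for acoustic features labelled with 4 emotion quadrants
# (sample and feature counts are illustrative, not AMG1608's real dimensions).
X, y = make_classification(n_samples=600, n_features=72, n_informative=20,
                           n_classes=4, random_state=0)

# Grid over the kinds of parameters the study tuned: hidden-layer
# architecture, training length, and regularization strength.
param_grid = {
    "hidden_layer_sizes": [(32,), (32, 32)],
    "max_iter": [100, 200],   # stands in for the number of epochs
    "alpha": [1e-4, 1e-2],    # L2-style penalty
}

search = GridSearchCV(MLPClassifier(random_state=0), param_grid, cv=3)
search.fit(X, y)
print(search.best_params_, round(search.best_score_, 3))
```

The best cross-validated configuration is then retrained and evaluated on held-out data; instance reduction (discarding ambiguous training examples) would be applied to `X, y` before the search.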
Keywords: Neuroinformatics, Virtual Reality, Deep Learning, Electroencephalography, Emotion Classification, Music Emotion Recognition, Acoustic Features.
Scope of the Article: Deep Learning