Emotions play a pivotal role in an individual's overall physical health, and interest in emotion recognition in conversation (ERC) has been increasing steadily. In this work, we propose bidirectional long short-term memory (Bi-LSTM), convolutional neural network (CNN), and CNN-BiLSTM based models to predict the emotional climate established by peers during a conversation. The peers' speech signals across the conversation are represented by Mel frequency cepstral coefficients (MFCCs), which are then fed to the Bi-LSTM, CNN, and CNN-BiLSTM models to predict the valence and arousal emotional climate cues. The proposed approach was tested on a publicly available dataset, K-EmoCon, which includes emotion labels and the peers' speech signals recorded during their conversation. The Bi-LSTM, CNN, and CNN-BiLSTM models achieved classification accuracies (arousal/valence) of 67.5%/57.7%, 73.3%/66.9%, and 75.1%/68.3%, respectively. These encouraging results show that combining deep learning schemes could increase classification accuracy and provide efficient emotional climate recognition in naturalistic conversation environments.
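
For readers unfamiliar with the MFCC front end mentioned in the abstract, the following is a minimal, standard-library-only sketch of how MFCCs can be computed for a single speech frame. It is not the authors' feature extraction: the function names, the 8 kHz sample rate, the 256-sample frame, the 26 mel filters, and the 13 coefficients are all illustrative assumptions, and a practical pipeline would likely use an optimized audio library and process many overlapping frames.

```python
import math

def hz_to_mel(f):
    # Common mel-scale formula (one of several conventions in use).
    return 2595.0 * math.log10(1.0 + f / 700.0)

def mel_to_hz(m):
    return 700.0 * (10.0 ** (m / 2595.0) - 1.0)

def power_spectrum(frame):
    """Naive DFT power spectrum (bins 0..N/2) of a Hamming-windowed frame."""
    n = len(frame)
    windowed = [x * (0.54 - 0.46 * math.cos(2 * math.pi * i / (n - 1)))
                for i, x in enumerate(frame)]
    spec = []
    for k in range(n // 2 + 1):
        re = sum(x * math.cos(2 * math.pi * k * i / n) for i, x in enumerate(windowed))
        im = -sum(x * math.sin(2 * math.pi * k * i / n) for i, x in enumerate(windowed))
        spec.append((re * re + im * im) / n)
    return spec

def mel_filterbank(n_filters, n_fft, sample_rate):
    """Triangular filters whose edges are equally spaced on the mel scale."""
    top = hz_to_mel(sample_rate / 2.0)
    mels = [top * i / (n_filters + 1) for i in range(n_filters + 2)]
    bins = [int(math.floor((n_fft + 1) * mel_to_hz(m) / sample_rate)) for m in mels]
    n_bins = n_fft // 2 + 1
    bank = []
    for j in range(1, n_filters + 1):
        lo, mid, hi = bins[j - 1], bins[j], bins[j + 1]
        row = [0.0] * n_bins
        for k in range(lo, mid):      # rising edge of the triangle
            row[k] = (k - lo) / (mid - lo)
        for k in range(mid, hi):      # falling edge of the triangle
            row[k] = (hi - k) / (hi - mid)
        bank.append(row)
    return bank

def mfcc_frame(frame, sample_rate, n_filters=26, n_coeffs=13):
    """MFCCs for one frame: power spectrum -> mel energies -> log -> DCT-II."""
    spec = power_spectrum(frame)
    bank = mel_filterbank(n_filters, len(frame), sample_rate)
    # Floor the energies so that log() never sees zero.
    log_e = [math.log(max(sum(w * p for w, p in zip(row, spec)), 1e-10))
             for row in bank]
    return [sum(e * math.cos(math.pi * c * (j + 0.5) / n_filters)
                for j, e in enumerate(log_e))
            for c in range(n_coeffs)]

# Hypothetical input: a 440 Hz tone sampled at 8 kHz, one 256-sample frame.
sr = 8000
frame = [math.sin(2 * math.pi * 440 * t / sr) for t in range(256)]
coeffs = mfcc_frame(frame, sr)
```

Stacking such coefficient vectors over consecutive frames gives a time-by-coefficient matrix, which is the kind of input a CNN, Bi-LSTM, or CNN-BiLSTM classifier could consume.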

Conference paper

Pages 96 - 103