Conversations between two peers carry a large amount of emotional content, which dynamically creates an emotional climate (EC) over the course of the conversation. Recognizing this EC with artificial intelligence (AI) indicates how the conversation is emotionally interpreted by both interlocutors and by external parties. This paper presents a new method for EC detection, called DeepBispec, that is based on deep bispectral processing of conversational speech. The speech is segmented according to emotional labels and subjected to windowed bispectral analysis. The resulting 2D bispectra are fed as color images to a convolutional neural network (CNN). The CNN detects and extracts features from the bispectrum images, which are then fused with affect dynamics (AD) to classify arousal (A) and valence (V) into low and high classes. Extensive experiments on the IEMOCAP dataset with 2D emotions (i.e., A and V) show that DeepBispec outperforms previous state-of-the-art methods, achieving an accuracy of 0.826 (A) / 0.749 (V), a sensitivity of 0.898 (A) / 0.774 (V), and an area under the curve (AUC) of 0.845 (A) / 0.824 (V). The findings demonstrate the effectiveness of DeepBispec in detecting the emotional tone of conversations between peers, providing a deeper understanding of the emotional dynamics at play in social interactions.
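To make the windowed bispectral analysis step more concrete, the following is a minimal sketch of a direct (FFT-based) bispectrum estimate, B(f1, f2) = E[X(f1) X(f2) X*(f1 + f2)], averaged over overlapping frames of one speech segment. It is not the authors' implementation: the function names `bispectrum` and `to_image`, the frame length `nfft`, the hop size `hop`, the Hann window, and the log-magnitude normalization are all assumptions chosen for illustration. The abstract does not specify these details, nor how the bispectra are colored before being passed to the CNN.

```python
import numpy as np

def bispectrum(signal, nfft=64, hop=32):
    """Direct bispectrum estimate averaged over Hann-windowed frames.

    nfft and hop are illustrative values, not taken from the paper.
    """
    signal = np.asarray(signal, dtype=float)
    window = np.hanning(nfft)
    idx = np.arange(nfft)
    # Frequency index of f1 + f2, wrapped modulo nfft
    sum_idx = (idx[:, None] + idx[None, :]) % nfft
    acc = np.zeros((nfft, nfft), dtype=complex)
    n_frames = 0
    for start in range(0, len(signal) - nfft + 1, hop):
        frame = signal[start:start + nfft]
        frame = (frame - frame.mean()) * window  # remove DC, then taper
        X = np.fft.fft(frame)
        # Triple product X(f1) X(f2) conj(X(f1 + f2)) for every (f1, f2)
        acc += X[:, None] * X[None, :] * np.conj(X[sum_idx])
        n_frames += 1
    if n_frames == 0:
        raise ValueError("signal is shorter than nfft")
    return acc / n_frames

def to_image(bispec):
    """Map the bispectrum to a [0, 1] log-magnitude image (assumed scaling)."""
    mag = np.log1p(np.abs(bispec))
    span = mag.max() - mag.min()
    if span == 0:
        return np.zeros_like(mag)
    return (mag - mag.min()) / span

# Example with synthetic noise standing in for one labeled speech segment
rng = np.random.default_rng(0)
segment = rng.standard_normal(1024)
image = to_image(bispectrum(segment))
```

The resulting 2D array could then be passed through a colormap and resized to match a CNN's input layer. Both of those steps depend on the network used and are left out of this sketch.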

Original publication

DOI

10.1145/3652037.3663887

Type

Conference paper

Publication Date

26/06/2024

Pages

576 - 581