DEEP BISPECTRAL IMAGE ANALYSIS FOR SPEECH-BASED CONVERSATIONAL EMOTIONAL CLIMATE RECOGNITION
Alhussein G., Alkhodari M., Alfalahi H., Alshehhi A., Hadjileontiadis L.
Conversations between two peers carry a large amount of emotional content that dynamically shapes an emotional climate (EC) over the course of the interaction. Recognizing this EC with artificial intelligence (AI) reveals how the conversation is emotionally interpreted by both interlocutors and by external parties. This paper presents DeepBispec, a new method for EC detection based on deep bispectral processing of conversational speech. The speech is segmented according to emotional labels and subjected to windowed bispectral analysis. The resulting 2D bispectra are fed as colored images to a convolutional neural network (CNN), which detects and extracts features from the bispectrum images; these features are then fused with affect dynamics (AD) to classify arousal (A) and valence (V) into low/high classes. Extensive experiments on the IEMOCAP dataset with 2D emotions (i.e., A and V) show that DeepBispec outperforms previous state-of-the-art methods, achieving an accuracy of 0.826A/0.749V, a sensitivity of 0.898A/0.774V, and an area under the curve (AUC) of 0.845A/0.824V. The findings demonstrate the effectiveness of DeepBispec in detecting the emotional tone of conversations between peers, providing a deeper understanding of the emotional dynamics at play in social interactions.
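To make the bispectral front end concrete, the following is a minimal Python sketch (not the authors' implementation) of a direct, FFT-based windowed bispectrum estimate for a single speech segment, rendered as a color-mapped image of the kind that could be fed to a CNN. The window length, FFT size, hop, color map, and the synthetic test signal are illustrative assumptions rather than the settings used in DeepBispec.

```python
# Minimal sketch: windowed bispectrum of a speech segment rendered as a color image.
# Illustrative only; parameters and signal are assumptions, not the paper's configuration.
import numpy as np
import matplotlib.pyplot as plt

def windowed_bispectrum(x, nfft=128, win_len=128, hop=64):
    """Direct (FFT-based) bispectrum estimate averaged over overlapping windows:
    B(f1, f2) = E[ X(f1) X(f2) X*(f1 + f2) ]."""
    win = np.hanning(win_len)
    B = np.zeros((nfft, nfft), dtype=complex)
    n_seg = 0
    for start in range(0, len(x) - win_len + 1, hop):
        seg = x[start:start + win_len] * win
        X = np.fft.fft(seg, nfft)
        # Triple product X(f1) X(f2) X*(f1 + f2) on the full (f1, f2) grid
        f1, f2 = np.meshgrid(np.arange(nfft), np.arange(nfft), indexing="ij")
        B += X[f1] * X[f2] * np.conj(X[(f1 + f2) % nfft])
        n_seg += 1
    return B / max(n_seg, 1)

# Synthetic stand-in for an emotion-labelled speech segment (one second at 16 kHz)
fs = 16000
t = np.arange(fs) / fs
speech_like = np.sin(2 * np.pi * 220 * t) + 0.5 * np.sin(2 * np.pi * 440 * t)

B = windowed_bispectrum(speech_like)
mag = np.log1p(np.abs(B[:64, :64]))          # keep the low-frequency, non-redundant region

plt.imshow(mag, cmap="jet", origin="lower")  # color-mapped 2D bispectrum image
plt.axis("off")
plt.savefig("bispectrum_segment.png", bbox_inches="tight", pad_inches=0)
```

In such a pipeline, one image per emotion-labelled segment would be produced and passed to the CNN feature extractor, whose outputs are then fused with affect-dynamics features before the low/high arousal and valence classification.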