Deep Bispectral Analysis of Conversational Speech Towards Emotional Climate Recognition
Alhussein G., Alkhodari M., Khandoker AH., Hadjileontiadis LJ.
Peers' conversational speech plays a significant role in shaping the emotional climate (EC) during interactions. Machine-based recognition of EC provides insights into the emotional perception of conversations by both peers and external observers. In this paper, we propose DeepBispec, a novel approach for EC recognition using deep bispectral analysis. DeepBispec applies windowed bispectral analysis to the 1D conversational speech signal. By capturing higher-order spectral correlations, the bispectrum magnifies the nonlinear characteristics present in speech signals. The estimated 2D bispectrum magnitude contours, which represent these correlations, are transformed into colored images and fed into a convolutional neural network (CNN). The CNN learns deep features from the bispectrum magnitude contours, enabling it to predict the valence (V) and arousal (A) labels associated with the EC. Evaluating DeepBispec on the K-EmoCon dataset using 10-fold cross-validation, we achieve an accuracy of 0.789 (A)/0.771 (V), an F1 score of 0.850 (A)/0.836 (V), and an area under the curve (AUC) of 0.812 (A)/0.788 (V). These results surpass existing benchmarks, demonstrating the effectiveness of the bispectrum in capturing nonlinear characteristics and improving EC recognition. DeepBispec introduces an innovative approach to analyzing conversational speech for enhanced EC recognition. By leveraging deep bispectral analysis and a CNN, it uncovers the higher-order spectral correlations and nonlinear dynamics of speech signals. This contributes to a deeper understanding of emotional dynamics in conversations and provides valuable insights into EC perception.
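To make the windowed bispectral analysis step concrete, the following is a minimal sketch of a direct (FFT-based) bispectrum magnitude estimate, B(f1, f2) = E[X(f1) X(f2) X*(f1 + f2)], averaged over overlapping segments. The abstract does not state the estimator, window type, segment length, or overlap used by DeepBispec, so the function name `bispectrum_magnitude`, the Hann window, `nfft=128`, and `overlap=0.5` are all illustrative assumptions rather than the authors' settings.

```python
import numpy as np

def bispectrum_magnitude(x, nfft=128, overlap=0.5):
    """Sketch of a direct bispectrum estimate (assumed settings, not the paper's).

    Returns the (nfft, nfft) array |mean over segments of X(f1) X(f2) conj(X(f1+f2))|.
    """
    x = np.asarray(x, dtype=float)
    step = max(1, int(nfft * (1 - overlap)))   # hop between segment starts
    window = np.hanning(nfft)                  # assumed window choice
    idx = np.arange(nfft)
    sum_idx = (idx[:, None] + idx[None, :]) % nfft  # bin index of f1 + f2 (wrapped)

    acc = np.zeros((nfft, nfft), dtype=complex)
    n_seg = 0
    for start in range(0, len(x) - nfft + 1, step):
        seg = x[start:start + nfft]
        seg = (seg - seg.mean()) * window      # remove DC, then taper
        X = np.fft.fft(seg)
        # Triple product X(f1) X(f2) X*(f1 + f2) for every bin pair
        acc += X[:, None] * X[None, :] * np.conj(X[sum_idx])
        n_seg += 1

    if n_seg == 0:
        raise ValueError("signal is shorter than one segment")
    return np.abs(acc / n_seg)
```

Under these assumptions, the returned 2D magnitude array could be rendered as a contour plot and saved as a color image before being passed to a CNN. A quadratically phase-coupled test signal (components at f1, f2, and f1 + f2 with phases that sum consistently) should produce a peak at the bin pair (f1, f2), which is the kind of nonlinear interaction the bispectrum is meant to expose.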