Emotional Climate Recognition in Speech-Based Conversations: Leveraging Deep Bispectral Image Analysis and Affect Dynamics
Alhussein G., Alkhodari M., Saleem S., Khandoker AH., Hadjileontiadis LJ.
The growing availability of conversational data across multiple platforms has intensified interest in dynamic emotion recognition. Speech plays a pivotal role in shaping the emotional climate (EC) of peer conversations. We propose DeepBispec, the first framework to integrate deep bispectral image analysis with affect dynamics (AD) for speech-based EC recognition. Bispectrum representations capture nonlinear and non-Gaussian speech characteristics, while AD descriptors model temporal emotion fluctuations. Evaluated on the K-EmoCon, IEMOCAP, and SEWA datasets, DeepBispec consistently improved EC classification performance. For example, on K-EmoCon, arousal accuracy increased from 79.0% (bispectrum only) to 81.4% (with AD), while valence accuracy improved from 76.8% to 77.5%; similar trends were observed for IEMOCAP and SEWA. DeepBispec outperformed strong CNN, LSTM, and Transformer baselines, demonstrating robust cross-lingual performance across seven languages. These findings highlight its potential for real-world applications such as mental health monitoring, affect-aware learning platforms, and empathetic dialogue systems.
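The bispectral images referred to above can be illustrated with a minimal sketch of a direct (FFT-based) bispectrum estimator, which averages triple products of Fourier coefficients over overlapping segments. This is a generic textbook estimator, not the paper's exact pipeline; the segment length `nfft` and step `hop` are hypothetical parameters chosen for illustration.

```python
import numpy as np

def bispectrum(x, nfft=128, hop=64):
    """Direct bispectrum estimate of a 1-D signal:
    B(f1, f2) = E[X(f1) * X(f2) * conj(X(f1 + f2))],
    averaged over Hann-windowed, overlapping segments.
    Returns the magnitude, which can be rendered as an image.
    """
    n_freq = nfft // 2  # keep f1, f2 below Nyquist so f1 + f2 < nfft
    acc = np.zeros((n_freq, n_freq), dtype=complex)
    count = 0
    for start in range(0, len(x) - nfft + 1, hop):
        seg = x[start:start + nfft] * np.hanning(nfft)
        X = np.fft.fft(seg)
        f = np.arange(n_freq)
        # Triple product over the (f1, f2) grid
        acc += np.outer(X[f], X[f]) * np.conj(X[f[:, None] + f[None, :]])
        count += 1
    return np.abs(acc / count)

# Example: bispectral "image" of a short synthetic tone
signal = np.sin(2 * np.pi * 0.1 * np.arange(2048))
img = bispectrum(signal)          # shape (64, 64), non-negative
```

In a pipeline like the one described, such magnitude maps would be normalized and fed to a deep image model; nonlinear phase couplings in speech show up as off-diagonal structure that an ordinary power spectrum discards.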
