The growing availability of conversational data across platforms has intensified interest in dynamic emotion recognition. Speech plays a pivotal role in shaping the emotional climate (EC) of peer conversations. We propose DeepBispec, the first framework to integrate deep bispectral image analysis with affect dynamics (AD) for speech-based EC recognition. Bispectrum representations capture the nonlinear and non-Gaussian characteristics of speech, while AD descriptors model temporal fluctuations in emotion. Evaluated on the K-EmoCon, IEMOCAP, and SEWA datasets, DeepBispec consistently improved EC classification performance. For example, on K-EmoCon, arousal accuracy increased from 79.0% (bispectrum only) to 81.4% (with AD), and valence accuracy improved from 76.8% to 77.5%; similar trends were observed on IEMOCAP and SEWA. DeepBispec outperformed strong CNN, LSTM, and Transformer baselines and demonstrated robust cross-lingual performance across seven languages. These findings highlight its potential for real-world applications such as mental health monitoring, affect-aware learning platforms, and empathetic dialogue systems.
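The abstract does not specify how DeepBispec constructs its bispectral images, so the following is only a minimal sketch of the standard direct (FFT-based) bispectrum estimate, B(f1, f2) = E[X(f1) X(f2) X*(f1+f2)], which is a common way to build such images; the function and parameter names here (bispectrum_image, frames, nfft) are hypothetical, not the paper's API.

```python
import numpy as np

def bispectrum_image(frames: np.ndarray, nfft: int = 128) -> np.ndarray:
    """Direct bispectrum estimate: average X(f1) X(f2) X*(f1+f2) over frames.

    A Gaussian signal has an identically zero bispectrum, so nonzero entries
    reflect the nonlinear, non-Gaussian speech structure the abstract cites.
    frames: 2-D array of shape (n_frames, frame_len).
    """
    X = np.fft.fft(frames, n=nfft, axis=1)             # per-frame spectra
    half = nfft // 2                                    # keep the non-redundant quadrant
    F1, F2 = np.meshgrid(np.arange(half), np.arange(half), indexing="ij")
    triple = X[:, F1] * X[:, F2] * np.conj(X[:, (F1 + F2) % nfft])
    B = triple.mean(axis=0)                             # average over frames
    return np.log1p(np.abs(B))                          # magnitude image for a CNN

# Toy usage: 200 random 64-sample frames standing in for speech frames
rng = np.random.default_rng(0)
img = bispectrum_image(rng.standard_normal((200, 64)))
print(img.shape)  # (64, 64) image, suitable as CNN input
```

The log-magnitude map is one plausible choice for turning the complex-valued bispectrum into a 2-D image a deep network can consume; phase-based variants are equally possible and the paper's actual choice is not stated in the abstract.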