The growing availability of conversational data across multiple platforms has intensified interest in dynamic emotion recognition. Speech plays a pivotal role in shaping the emotional climate (EC) of peer conversations. We propose DeepBispec, the first framework to integrate deep bispectral image analysis with affect dynamics (AD) for speech-based EC recognition. Bispectrum representations capture nonlinear and non-Gaussian speech characteristics, while AD descriptors model temporal emotion fluctuations. Evaluated on K-EmoCon, IEMOCAP and SEWA datasets, DeepBispec consistently improved EC classification performance. For example, on K-EmoCon, arousal accuracy increased from 79.0% (bispectrum only) to 81.4% (with AD), while valence accuracy improved from 76.8% to 77.5%; similar trends were observed for IEMOCAP and SEWA. DeepBispec outperformed strong CNN, LSTM, and Transformer baselines, demonstrating robust cross-lingual performance across seven languages. These findings highlight its potential for real-world applications such as mental health monitoring, affect-aware learning platforms and empathetic dialogue systems.
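
The abstract does not spell out how the bispectral images are computed, but the underlying quantity is standard: for each frame, the bispectrum is the triple product B(f1, f2) = X(f1) X(f2) X*(f1 + f2), averaged over frames, whose magnitude exposes quadratic phase coupling and non-Gaussian structure that the ordinary power spectrum discards. The sketch below is a minimal, illustrative direct (FFT-based) estimator of such an image; the function name, frame/hop sizes, and normalisation are assumptions for illustration, not the DeepBispec pipeline itself.

```python
import numpy as np

def bispectrum_image(signal, frame_len=256, hop=128, n_freq=64):
    """Estimate a bispectrum magnitude image from a 1-D speech signal.

    Per frame, B(f1, f2) = X(f1) * X(f2) * conj(X(f1 + f2)) is accumulated
    and averaged; the log-compressed magnitude is returned as an image
    that could be fed to a CNN. Parameters here are illustrative choices.
    """
    window = np.hanning(frame_len)
    acc = np.zeros((n_freq, n_freq), dtype=complex)
    n_frames = 0
    for start in range(0, len(signal) - frame_len + 1, hop):
        frame = signal[start:start + frame_len] * window
        X = np.fft.rfft(frame)
        # Accumulate the triple product over the first n_freq bins,
        # skipping any pair whose sum frequency falls outside the spectrum.
        for f1 in range(n_freq):
            for f2 in range(n_freq):
                if f1 + f2 < len(X):
                    acc[f1, f2] += X[f1] * X[f2] * np.conj(X[f1 + f2])
        n_frames += 1
    bispec = np.abs(acc / max(n_frames, 1))
    # Log-compress and normalise to [0, 1] for use as an image input.
    img = np.log1p(bispec)
    return img / (img.max() + 1e-12)

# Example: a synthetic signal whose third component sits at the sum of the
# first two (1500 = 600 + 900 Hz), giving a visible bispectral peak.
fs = 16000
t = np.arange(fs) / fs
sig = (np.cos(2 * np.pi * 600 * t)
       + np.cos(2 * np.pi * 900 * t)
       + np.cos(2 * np.pi * 1500 * t))
image = bispectrum_image(sig)
print(image.shape)  # (64, 64)
```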

Original publication

DOI: 10.1111/exsy.70146
Type: Journal article
Publication Date: 1 November 2025
Volume: 42