Automatic identification of respiratory diseases from stethoscopic lung sound signals using ensemble classifiers
Fraiwan L., Hassanin O., Fraiwan M., Khassawneh B., Ibnian AM., Alkhodari M.
This paper investigates the application of different homogeneous ensemble learning methods to the multi-class classification of respiratory diseases. The dataset covered a total of 215 subjects and comprised 308 clinically acquired lung sound recordings and 1176 recordings obtained from the ICBHI Challenge database. These recordings corresponded to a wide range of conditions including healthy, asthma, pneumonia, heart failure, bronchiectasis or bronchitis, and chronic obstructive pulmonary disease. Feature representation of the lung sound signals was based on Shannon entropy, logarithmic energy entropy, and spectrogram-based spectral entropy. Decision trees and discriminant classifiers were employed as base learners to build bootstrap aggregation and adaptive boosting ensembles. The optimal structure of the investigated ensemble models was identified through Bayesian hyperparameter optimization and was then compared to typical classifiers in the literature. Experimental results showed that boosted decision trees provided the best overall accuracy, sensitivity, specificity, F1-score, and Cohen's kappa coefficient of 98.27%, 95.28%, 98.9%, 93.61%, and 92.28%, respectively. Among the baseline methods, the support vector machine (SVM) performed best, though slightly below the boosted decision trees, with an average accuracy of 98.20%, sensitivity of 91.5%, and specificity of 98.55%. Despite their simplicity, the investigated ensemble classification methods showed promising performance in detecting a wide range of respiratory disease conditions. The data fusion approach points to an alternative and potentially more suitable way to reduce the effect of imbalanced data in clinical applications in general and in respiratory sound analysis studies in particular.
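The abstract names the entropy features but does not give their formulas. As a rough illustration only, the sketch below computes two of them under the assumption that the energy-based (non-normalized) definitions common in wavelet analysis apply: Shannon entropy as -sum(s^2 ln s^2) and logarithmic energy entropy as sum(ln s^2), with zero samples contributing nothing. The function names are hypothetical, and the spectrogram-based spectral entropy is omitted because it depends on windowing choices not stated here.

```python
import math


def shannon_entropy(signal):
    """Non-normalized Shannon entropy: -sum(s^2 * ln(s^2)).

    Assumed definition (not confirmed by the abstract); samples equal
    to zero contribute 0, following the convention 0 * ln(0) = 0.
    """
    total = 0.0
    for s in signal:
        energy = s * s
        if energy > 0.0:
            total -= energy * math.log(energy)
    return total


def log_energy_entropy(signal):
    """Logarithmic energy entropy: sum(ln(s^2)).

    Assumed definition (not confirmed by the abstract); samples equal
    to zero are skipped, following the convention ln(0) = 0.
    """
    total = 0.0
    for s in signal:
        energy = s * s
        if energy > 0.0:
            total += math.log(energy)
    return total


def entropy_features(signal):
    """Return a two-element feature vector for one recording (hypothetical helper)."""
    return [shannon_entropy(signal), log_energy_entropy(signal)]
```

A feature vector of this kind, one per recording, is the sort of input that a bagged or boosted ensemble of decision trees would then be trained on.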