Abstract

Classifying histopathology images at the pixel level requires feature sets that capture the complex characteristics of the images, such as irregular cell morphology and color heterogeneity of the tissue. In this context, feature selection becomes a crucial step in the classification process: it reduces model complexity and computational cost, helps avoid overfitting, and thereby improves model performance. In this study, we propose a new ensemble feature selection method that combines a set of base selectors, classifiers, and rank aggregation methods. Its aim is to determine, from any initial set of handcrafted features, a smaller set of relevant pixel-level color and texture features, which are subsequently used for pixel-level segmentation of HER2 overexpression in breast cancer tissue images. Using the proposed ensemble feature selection method, we were able to significantly reduce the initial feature set. The best results are obtained using $$\chi^2$$, Random Forest, and Runoff as the base selector, classifier, and aggregation method, respectively. The best model trained on the selected feature set achieves 0.939 recall, 0.866 specificity, 0.903 accuracy, 0.875 precision, and 0.906 F1-score.
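To make the rank aggregation step concrete, the following is a minimal sketch of a runoff-style aggregation of feature rankings. It is an illustration under stated assumptions, not the implementation used in this study: the abstract does not specify the exact Runoff variant, so the sketch assumes each base selector provides a best-first ranking, that the feature with the most first-place votes is promoted in each round and then removed from every list, and that ties are broken by mean original position. The function name `runoff_aggregate` and the feature names are hypothetical.

```python
from collections import Counter


def runoff_aggregate(rankings):
    """Merge several best-first feature rankings into one ranking.

    Assumed variant: in each round the feature with the most first-place
    votes is appended to the result and removed from every ranking.
    """
    features = set(rankings[0])
    if any(set(r) != features for r in rankings):
        raise ValueError("all rankings must contain the same features")

    # Mean original position, used only to break ties in vote counts.
    mean_pos = {
        f: sum(r.index(f) for r in rankings) / len(rankings) for f in features
    }

    remaining = [list(r) for r in rankings]  # copies, inputs stay untouched
    aggregated = []
    while remaining[0]:
        votes = Counter(r[0] for r in remaining)
        # Most votes first, then lowest mean position, then name.
        winner = min(votes, key=lambda f: (-votes[f], mean_pos[f], f))
        aggregated.append(winner)
        for r in remaining:
            r.remove(winner)
    return aggregated


# Hypothetical rankings, e.g. from one base selector run on three data resamples.
rankings = [
    ["hue", "lbp", "sat", "glcm"],
    ["lbp", "hue", "glcm", "sat"],
    ["hue", "sat", "lbp", "glcm"],
]
selected = runoff_aggregate(rankings)[:2]  # keep the two top-ranked features
```

In a full pipeline, the selected subset would then be passed to a classifier such as Random Forest and scored with the metrics reported above; that part is omitted here.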
Complex & Intelligent Systems
Springer Science and Business Media LLC