Julbø FMI., Henriksen AL., Pradhan M., Lindstrøm EK., Kostolomov I., Oukrif D., van der Schee L., Isaksen MX., Manet A., Skrede O-J., De Raedt S., Liestøl K., Askautrud HA., Eide TJ., Holme Ø., Kerr DJ., Shepherd NA., Lacle MM., Novelli M., Hveem TS., Kleppe A.

BACKGROUND: The shortage of pathologists presents a significant bottleneck in delivering timely and accurate diagnoses. To address this challenge, we developed the POLyp Artificial Intelligence-based RISk classifier (POLARIS), a prescreening tool designed to assist pathologists in handling the increasing volume of colorectal biopsies.

METHODS: Using 15,079 whole-slide images (WSIs) from 2993 patients in the UK bowel cancer screening program between 2014 and 2018 and the histological diagnoses as ground truth, POLARIS was developed applying the open-source foundation model H-optimus-0, multiple instance learning, and a training scheme designed to provide robust models. Each WSI was categorised into one of five classes based on the diagnosis, reflecting increasing risk of malignancy, and regrouped into two broader categories for clinical interpretation. One category includes only samples not requiring pathological review, specifically those classified as normal tissue or tubular adenomas with low-grade dysplasia (LGD). The second category includes all other types of polyps, which are polyps recommended for review by a pathologist. A prespecified validation protocol defined how a specific model should be evaluated on a geographically external dataset comprising 10,842 WSIs from Cheltenham General Hospital between 2008 and 2019 and acquired using two scanners, Leica Aperio AT2 and Hamamatsu NanoZoomer XR. After validation, three experienced pathologists independently assessed the cases in which model prediction differed most from clinical diagnosis and, blinded to model output, reached a consensus.

FINDINGS: In the external validation, POLARIS correctly identified 98.94% (95% CI 96.92-99.78%) of polyps with high-grade dysplasia (HGD) and adenocarcinoma as POLARIS-positive, and correctly classified 83.04% (81.66-84.36%) of normal and tubular adenomas with LGD as POLARIS-negative. The overall balanced accuracy was 86.65% for this prespecified primary analysis. In the tuning set, balanced accuracy was 87.64%, the sensitivity for polyps with HGD and adenocarcinoma was 100% (95% CI 97.14-100), and the specificity for normal and tubular adenomas with LGD was 81.89% (79.94-83.70). POLARIS achieved an area under the receiver operating characteristic curve (AUROC) in external validation of 0.9449 (95% CI 0.9384-0.9507) for distinguishing normal and tubular adenomas with LGD from polyps recommended for review by a pathologist and 0.9788 (0.9718-0.9844) for polyps with HGD and adenocarcinoma cases specifically. The model predicted the same class for both scanners in 97.93% (97.49-98.32%) of the cases. In the review of selected cases, the pathologists agreed with the model and not the clinical diagnosis in 92.5% of the cases. Heatmaps highlighting regions where the model indicates high-risk features closely correlated with areas annotated by expert pathologists as high-risk.

INTERPRETATION: POLARIS has the potential to enhance diagnostic workflows in colorectal pathology by reliably identifying high-risk lesions and highlighting high-risk regions while substantially reducing the number of slides requiring pathologist review.

FUNDING: The Norwegian Cancer Society.
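The two-category regrouping described in the methods, and the kind of binary metrics reported in the findings, can be illustrated with a minimal Python sketch. It is not the authors' code. The label strings, the function names `needs_review` and `binary_metrics`, and the toy data are all hypothetical. The abstract names only the two classes that do not require review, so every other label is treated as review-recommended. Balanced accuracy is taken here as the mean of sensitivity and specificity on that binary grouping, which is the standard definition and an assumption about how the study computed it. The figures reported in the abstract are not reproduced.

```python
from typing import Dict, Sequence

# Diagnoses the abstract places in the "no pathological review" category.
# The label strings themselves are hypothetical placeholders.
NO_REVIEW = {"normal", "tubular_adenoma_lgd"}


def needs_review(diagnosis: str) -> bool:
    """Map a diagnosis label to the binary category (True = review recommended)."""
    return diagnosis not in NO_REVIEW


def binary_metrics(truth: Sequence[str], predicted_positive: Sequence[bool]) -> Dict[str, float]:
    """Sensitivity, specificity and balanced accuracy on the two-category grouping."""
    tp = fp = tn = fn = 0
    for diagnosis, predicted in zip(truth, predicted_positive):
        if needs_review(diagnosis):
            if predicted:
                tp += 1
            else:
                fn += 1
        else:
            if predicted:
                fp += 1
            else:
                tn += 1
    sensitivity = tp / (tp + fn)
    specificity = tn / (tn + fp)
    return {
        "sensitivity": sensitivity,
        "specificity": specificity,
        "balanced_accuracy": (sensitivity + specificity) / 2,
    }


# Toy example with made-up labels and predictions (not study data).
truth = ["normal", "tubular_adenoma_lgd", "adenocarcinoma",
         "hgd_polyp", "normal", "other_polyp"]
predicted_positive = [False, True, True, True, False, False]
metrics = binary_metrics(truth, predicted_positive)
print(metrics)
```

In a prescreening setting, a false negative (a review-recommended polyp flagged as not needing review) is the costly error, which is why sensitivity for the highest-risk classes is reported separately from the overall balanced accuracy.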

Reliable classification of polyps based on artificial intelligence: a development and validation study.
