The effective integration of artificial intelligence (AI) systems into clinical medicine depends on comprehensive and transparent performance evaluation; however, the lack of standardized and widely accepted metrics poses challenges for reproducibility and model adoption. A comprehensive, machine-interpretable framework is presented to formalize the nomenclature and descriptions of 207 graphical, matrix, and scalar metrics used to measure AI model performance. The metrics taxonomy, developed as part of the Radiology Ontology of AI Datasets, Models and Projects (ROADMAP), provides a logically structured representation that captures the semantics of AI evaluation metrics, supports reasoning over metric classes, and enables automated completeness checks for AI model reporting. For each metric, the taxonomy incorporates a definition and citations to authoritative reference sources; where applicable, the taxonomy also includes synonyms, abbreviations, alternate language forms, mathematical formulae, and numerical bounds. The taxonomy supports evaluation of models operating on structured data, medical images, audio signals, and/or unstructured text. Logical axioms link each metric to one or more of 18 AI model performance criteria, including classification, calibration, image segmentation, and text analysis. By harmonizing terminology and enabling structured queries, ROADMAP's taxonomy of AI performance metrics facilitates model comparison, bias detection, and selection of appropriate evaluation methods across diverse datasets and clinical tasks. © RSNA, 2026. See also the accompanying Special Report on the ROADMAP ontology.