During the COVID-19 pandemic, artificial intelligence (AI) models were created to address health-care resource constraints. Previous research shows that health-care datasets often have limitations, leading to biased AI technologies. This systematic review assessed datasets used for AI development during the pandemic, identifying several deficiencies. Datasets were identified by screening articles from MEDLINE and using Google Dataset Search. In total, 192 datasets were analysed for metadata completeness, composition, data accessibility, and ethical considerations. Findings revealed substantial gaps: only 48% of datasets documented individuals' country of origin, 43% reported age, and under 25% included sex, gender, race, or ethnicity. Information on data labelling, ethical review, or consent was frequently missing. Many datasets reused data with inadequate traceability. Notably, historical paediatric chest x-rays appeared in some datasets without acknowledgement. These deficiencies highlight the need for better data quality and transparent documentation to lessen the risk that biased AI models are developed in future health emergencies.

Original publication

DOI

10.1016/S2589-7500(24)00146-8

Type

Journal article

Journal

Lancet Digit Health

Publication Date

11/2024

Volume

6

Pages

e827 - e847

Keywords

Humans, Artificial Intelligence, COVID-19, Datasets as Topic, Pandemics