Interpretable machine learning leverages proteomics to improve cardiovascular disease risk prediction and biomarker identification.

Climente-González H., Oh M., Chajewska U., Hosseini R., Mukherjee S., Gan W., Traylor M., Hu S., Fatemifar G., Ghouse J., Del Villar PP., Vernet E., Koelling N., Du L., Abraham R., Li C., Howson JMM.

BACKGROUND: Cardiovascular diseases (CVDs) rank amongst the leading causes of long-term disability and mortality. Predicting CVD risk and identifying associated genes are crucial for prevention, early intervention, and drug discovery. The recent availability of UK Biobank proteomics data enables investigation of blood proteins and their association with a variety of diseases. We sought to predict 10-year CVD risk using this data modality and known CVD risk factors.

METHODS: We focused on the UK Biobank participants who were included in the UK Biobank Pharma Proteomics Project. After applying exclusions, 50,057 participants, aged 40-69 years at recruitment, were included. We employed the Explainable Boosting Machine (EBM), an interpretable machine learning model, to predict the 10-year risk of primary coronary artery disease, ischemic stroke or myocardial infarction. The model had access to 2978 features (2923 proteins and 55 risk factors). Model performance was evaluated using 10-fold cross-validation.

RESULTS: The EBM model using proteomics outperforms equation-based risk scores such as PREVENT, with an area under the receiver operating characteristic curve (AUROC) of 0.767 and an area under the precision-recall curve (AUPRC) of 0.241; adding clinical features improves these figures to 0.785 and 0.284, respectively. Our models demonstrate consistent performance across sexes and ethnicities and provide insights into individualized disease risk predictions and underlying disease biology.

CONCLUSIONS: We present a more accurate and explanatory framework for proteomics data analysis, supporting future approaches that prioritize individualized disease risk prediction and identification of target genes for drug development.
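The abstract reports AUROC and AUPRC estimated with 10-fold cross-validation. As a rough illustration of what those two metrics measure, the following is a minimal dependency-free sketch. It is not the authors' pipeline: the function names (`auroc`, `average_precision`, `kfold_indices`) and the toy labels and scores are hypothetical, and average precision is used here as one common step-wise estimate of AUPRC.

```python
from typing import Iterator, List, Sequence, Tuple


def auroc(y_true: Sequence[int], scores: Sequence[float]) -> float:
    """Probability that a random case outscores a random non-case (ties count 0.5)."""
    pos = [s for y, s in zip(y_true, scores) if y == 1]
    neg = [s for y, s in zip(y_true, scores) if y == 0]
    if not pos or not neg:
        raise ValueError("need at least one case and one non-case")
    wins = 0.0
    for p in pos:
        for n in neg:
            if p > n:
                wins += 1.0
            elif p == n:
                wins += 0.5
    return wins / (len(pos) * len(neg))


def average_precision(y_true: Sequence[int], scores: Sequence[float]) -> float:
    """Mean of the precision values at the rank of each true case.

    Assumes no tied scores; ties would need an explicit tie-breaking rule.
    """
    n_pos = sum(y_true)
    if n_pos == 0:
        raise ValueError("need at least one case")
    ranked = sorted(zip(scores, y_true), key=lambda pair: -pair[0])
    hits = 0
    total = 0.0
    for rank, (_, label) in enumerate(ranked, start=1):
        if label == 1:
            hits += 1
            total += hits / rank
    return total / n_pos


def kfold_indices(n: int, k: int) -> Iterator[Tuple[List[int], List[int]]]:
    """Yield (train, test) index lists; sample i is held out in fold i % k."""
    for fold in range(k):
        test = [i for i in range(n) if i % k == fold]
        train = [i for i in range(n) if i % k != fold]
        yield train, test


# Made-up labels (1 = event within the follow-up window) and risk scores.
toy_labels = [0, 0, 1, 0, 1, 0, 0, 1, 0, 0]
toy_scores = [0.10, 0.40, 0.35, 0.80, 0.70, 0.20, 0.30, 0.90, 0.05, 0.50]

toy_auroc = auroc(toy_labels, toy_scores)
toy_auprc = average_precision(toy_labels, toy_scores)
folds = list(kfold_indices(len(toy_labels), 10))
```

In a cross-validated evaluation, a model would be fitted on each `train` split and scored on the matching `test` split, with the metrics computed on the held-out predictions. Note that a classifier with no discriminative ability has an expected AUROC of about 0.5, whereas its AUPRC is roughly the event prevalence, so AUPRC values are best read relative to how common the outcome is in the cohort.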

DOI: 10.1038/s43856-025-00872-0

Type: Journal article

Publication Date: 2025-05-19
