HyLLM: A RAG-Based Large Language Model Framework for Phenotype-Guided Reasoning in Hypertensive Management
Alkhodari M., Sattwika PD., Wilmes N., Baktash V., Lapidaire W., Leeson P., Kart T.
Hypertension is a pervasive condition that often goes unnoticed but can cause significant damage to vital organs. While imaging has proven central to detecting end-organ changes, interpreting the results can be challenging for clinicians due to the heterogeneous nature of multi-modal imaging data. In this study, we present HyLLM, a multi-stage large language model (LLM) framework designed to generate phenotype-guided explanations for the severity of hypertension-induced organ damage. It leverages a retrieval-augmented generation (RAG) pipeline to extract contextual information from evidence-based resources, including the PubMed database and current clinical guidelines. Three major LLMs, namely GPT-OSS, DeepSeek, and Mistral, were employed within the framework and evaluated with quantitative and clinical metrics using the UK Biobank imaging substudy. GPT-OSS achieved the best clinical evaluation, with 94.4% coherence and 86.1% low risk of harm, while Mistral achieved the highest faithfulness (94.7%) and the best reading ease, with a score of 50.5, in our quantitative evaluation. These preliminary findings highlight the HyLLM framework's ability to generate guideline-consistent insights, bridging predictive modelling with clinical interpretability to facilitate informed clinical decision-making.
