Cookies on this website

We use cookies to ensure that we give you the best experience on our website. If you click 'Accept all cookies' we'll assume that you are happy to receive all cookies and you won't see this message again. If you click 'Reject all non-essential cookies' only necessary cookies providing core functionality such as security, network management, and accessibility will be enabled. Click 'Find out more' for information on how to change your cookie settings.

BACKGROUND: Randomized clinical trials (RCTs) are designed to produce evidence in selected populations. Assessing their effects in the real-world is essential to change medical practice, however, key populations are historically underrepresented in the RCTs. We define an approach to simulate RCT-based effects in real-world settings using RCT digital twins reflecting the covariate patterns in an electronic health record (EHR). METHODS: We developed a Generative Adversarial Network (GAN) model, RCT-Twin-GAN, which generates a digital twin of an RCT (RCT-Twin) conditioned on covariate distributions from an EHR cohort. We improved upon a traditional tabular conditional GAN, CTGAN, with a loss function adapted for data distributions and by conditioning on multiple discrete and continuous covariates simultaneously. We assessed the similarity between a Heart Failure with preserved Ejection Fraction (HFpEF) RCT (TOPCAT), a Yale HFpEF EHR cohort, and RCT-Twin. We also evaluated cardiovascular event-free survival stratified by Spironolactone (treatment) use. RESULTS: By applying RCT-Twin-GAN to 3445 TOPCAT participants and conditioning on 3445 Yale EHR HFpEF patients, we generated RCT-Twin datasets between 1141-3445 patients in size, depending on covariate conditioning and model parameters. RCT-Twin randomly allocated spironolactone (S)/ placebo (P) arms like an RCT, was similar to RCT by a multi-dimensional distance metric, and balanced covariates (median absolute standardized mean difference (MASMD) 0.017, IQR 0.0034-0.030). The 5 EHR-conditioned covariates in RCT-Twin were closer to the EHR compared with the RCT (MASMD 0.008 vs 0.63, IQR 0.005-0.018 vs 0.59-1.11). RCT-Twin reproduced the overall effect size seen in TOPCAT (5-year cardiovascular composite outcome odds ratio (95\% confidence interval) of 0.89 (0.75-1.06) in RCT vs 0.85 (0.69-1.04) in RCT-Twin). CONCLUSIONS: RCT-Twin-GAN simulates RCT-derived effects in real-world patients by translating these effects to the covariate distributions of EHR patients. This key methodological advance may enable the direct translation of RCT-derived effects into real-world patient populations and may enable causal inference in real-world settings.

Original publication




Journal article



Publication Date