Cookies on this website

We use cookies to ensure that we give you the best experience on our website. If you click 'Accept all cookies' we'll assume that you are happy to receive all cookies and you won't see this message again. If you click 'Reject all non-essential cookies' only necessary cookies providing core functionality such as security, network management, and accessibility will be enabled. Click 'Find out more' for information on how to change your cookie settings.

BACKGROUND AND PURPOSE: One-fifth of ischemic strokes are embolic strokes of undetermined source (ESUS). Their theoretical causes can be classified as cardioembolic versus noncardioembolic. This distinction has important implications, but the categories' proportions are unknown. METHODS: Using data from the Cornell Acute Stroke Academic Registry, we trained a machine-learning algorithm to distinguish cardioembolic versus non-cardioembolic strokes, then applied the algorithm to ESUS cases to determine the predicted proportion with an occult cardioembolic source. A panel of neurologists adjudicated stroke etiologies using standard criteria. We trained a machine learning classifier using data on demographics, comorbidities, vitals, laboratory results, and echocardiograms. An ensemble predictive method including L1 regularization, gradient-boosted decision tree ensemble (XGBoost), random forests, and multivariate adaptive splines was used. Random search and cross-validation were used to tune hyperparameters. Model performance was assessed using cross-validation among cases of known etiology. We applied the final algorithm to an independent set of ESUS cases to determine the predicted mechanism (cardioembolic or not). To assess our classifier's validity, we correlated the predicted probability of a cardioembolic source with the eventual post-ESUS diagnosis of atrial fibrillation. RESULTS: Among 1083 strokes with known etiologies, our classifier distinguished cardioembolic versus noncardioembolic cases with excellent accuracy (area under the curve, 0.85). Applied to 580 ESUS cases, the classifier predicted that 44% (95% credibility interval, 39%-49%) resulted from cardiac embolism. Individual ESUS patients' predicted likelihood of cardiac embolism was associated with eventual atrial fibrillation detection (OR per 10% increase, 1.27 [95% CI, 1.03-1.57]; c-statistic, 0.68 [95% CI, 0.58-0.78]). ESUS patients with high predicted probability of cardiac embolism were older and had more coronary and peripheral vascular disease, lower ejection fractions, larger left atria, lower blood pressures, and higher creatinine levels. CONCLUSIONS: A machine learning estimator that distinguished known cardioembolic versus noncardioembolic strokes indirectly estimated that 44% of ESUS cases were cardioembolic.

Original publication




Journal article



Publication Date





e203 - e210


atrial fibrillation, embolism, machine learning, probability, stroke, Aged, Aged, 80 and over, Algorithms, Atrial Fibrillation, Decision Trees, Electrocardiography, Female, Humans, Image Processing, Computer-Assisted, Intracranial Embolism, Machine Learning, Male, Middle Aged, Predictive Value of Tests, ROC Curve, Registries, Reproducibility of Results, Stroke