Data-driven modelling of mutational hotspots and in-silico predictors in hypertrophic cardiomyopathy
Waring AJ., Harper AR., Salatino S., Kramer CM., Neubauer S., Thomson KL., Watkins H., Farrall M.
ABSTRACT

Background: Although rare missense variants in Mendelian disease genes have been noted to cluster in specific regions of proteins, it is not clear how to use this information when evaluating the pathogenicity of a gene or variant. Here we introduce methods for gene association and variant interpretation that utilise this powerful signal.

Methods: We present a case-control rare-variant association test, ClusterBurden, that combines information on both variant burden and variant clustering. We then introduce a data-driven modelling framework to estimate mutational hotspots in genes with missense variant clustering and integrate further in-silico predictors into the models.

Results: We show that ClusterBurden can increase statistical power when scanning for putative disease genes driven by missense variants, both in simulated data and in a 34-gene panel dataset of 5,338 cases of hypertrophic cardiomyopathy. We demonstrate that data-driven models can allow quantitative application of the ACMG criteria PM1 and PP3, to resolve a wide range of pathogenicity potential amongst variants of uncertain significance. A web application (Pathogenicity_by_Position) is accessible for missense variant risk prediction in six sarcomeric genes, and an R package is available for association testing using ClusterBurden.

Conclusion: The inclusion of missense residue position enhances the power of disease-gene association and improves rare-variant pathogenicity interpretation.
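
To make the idea in Methods concrete, the following Python sketch shows one generic way a burden signal and a clustering signal could be combined into a single gene-level p-value. It is an illustration under stated assumptions, not the ClusterBurden implementation: the function names (`burden_p`, `ks_stat`, `clustering_p`, `fisher_combine`) are hypothetical, and the choice of statistics is assumed rather than taken from the paper. The burden is tested with a one-sided Fisher exact test on carrier counts, clustering with a permutation test on a Kolmogorov-Smirnov-type statistic over residue positions, and the two p-values are combined with Fisher's method, which assumes they are approximately independent.

```python
import math
import random


def burden_p(case_carriers, n_cases, control_carriers, n_controls):
    """One-sided Fisher exact test for an excess of carriers among cases."""
    total = case_carriers + control_carriers
    denom = math.comb(n_cases + n_controls, total)
    upper = min(total, n_cases)
    # Hypergeometric upper tail: P(K >= case_carriers).
    tail = sum(
        math.comb(n_cases, k) * math.comb(n_controls, total - k)
        for k in range(case_carriers, upper + 1)
    )
    return tail / denom


def ks_stat(a, b):
    """Largest gap between the empirical CDFs of two position lists."""
    points = sorted(set(a) | set(b))
    return max(
        abs(sum(x <= t for x in a) / len(a) - sum(x <= t for x in b) / len(b))
        for t in points
    )


def clustering_p(case_pos, control_pos, n_perm=500, seed=0):
    """Permutation p-value: do case and control variant positions differ?"""
    rng = random.Random(seed)
    observed = ks_stat(case_pos, control_pos)
    pooled = list(case_pos) + list(control_pos)
    n_case = len(case_pos)
    hits = 0
    for _ in range(n_perm):
        rng.shuffle(pooled)  # reassign case/control labels at random
        if ks_stat(pooled[:n_case], pooled[n_case:]) >= observed:
            hits += 1
    return (hits + 1) / (n_perm + 1)


def fisher_combine(p1, p2):
    """Fisher's method for two p-values (chi-square with 4 df, closed form)."""
    prod = p1 * p2
    return prod * (1.0 - math.log(prod))


if __name__ == "__main__":
    # Toy data (invented for illustration): case variants sit in one region,
    # control variants are spread along the protein.
    case_positions = [100 + i for i in range(20)]
    control_positions = [50 * i for i in range(20)]
    p_burden = burden_p(20, 1000, 20, 5000)
    p_cluster = clustering_p(case_positions, control_positions)
    print(fisher_combine(p_burden, p_cluster))
```

In this toy setup, a gene with only a modest excess of case variants can still reach a small combined p-value when those variants concentrate in one region, which is the intuition behind adding residue position to a burden test.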