
Abstract

Although rare missense variants underlying a number of Mendelian diseases have been noted to cluster in specific regions of proteins, this information may be underutilized when evaluating the pathogenicity of a gene or variant. We introduce ClusterBurden and GAMs, two methods for rapid association testing and predictive modelling, respectively, that combine variant burden and amino-acid residue clustering in case-control studies. We show that ClusterBurden increases statistical power to identify disease genes driven by missense variants, in both simulated and experimental 34-gene panel data for hypertrophic cardiomyopathy. We then demonstrate that GAMs can be used to apply the ACMG criteria PM1 and PP3 quantitatively, and to resolve a wide range of pathogenicity potential amongst variants of uncertain significance. An R package is available for association testing using ClusterBurden, and a web application (Pathogenicity_by_Position) is available for missense variant risk prediction using GAMs for six sarcomeric genes. In conclusion, the inclusion of amino-acid residue positional information enhances the accuracy of gene and rare variant pathogenicity interpretation.

Author Summary

Two statistical methods have been developed that utilize the signal in the residue position of missense variants. The first is a rapid association method that tests the joint hypothesis of an excess of rare variants and rare-variant clustering. The method, ClusterBurden, is powerful when rare missense variants cluster in discrete pathogenic regions of the protein. It can be applied to exome scans to discover novel Mendelian disease genes that may not be identified by classic burden testing. The second method is a statistical model for rare missense variant interpretation. It provides superior predictive performance compared to generic in silico predictors by training on our large case-control dataset. The method represents a data-driven, quantitative approach to applying the hotspot and in silico prediction criteria from the ACMG variant interpretation guidelines.
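
The idea of testing burden and clustering jointly can be illustrated with a minimal Python sketch. This is not the ClusterBurden implementation, and the summary above does not state which component tests or combination rule the R package uses. The choices here are assumptions made for illustration only: a one-sided Fisher's exact test for burden, a permutation test on a Kolmogorov-Smirnov-style statistic for positional clustering, and Fisher's method to combine the two p-values (which itself assumes the two tests are independent). All function names are hypothetical.

```python
import math
import random


def burden_pvalue(case_carriers, n_cases, control_carriers, n_controls):
    """One-sided Fisher's exact test for an excess of carriers among cases."""
    total = n_cases + n_controls
    carriers = case_carriers + control_carriers
    denom = math.comb(total, carriers)
    upper = min(carriers, n_cases)
    # Sum hypergeometric probabilities of tables at least as extreme.
    tail = sum(
        math.comb(n_cases, k) * math.comb(n_controls, carriers - k)
        for k in range(case_carriers, upper + 1)
    )
    return tail / denom


def ks_statistic(xs, ys):
    """Largest gap between the empirical CDFs of two sets of residue positions."""
    points = sorted(set(xs) | set(ys))
    return max(
        abs(sum(x <= p for x in xs) / len(xs) - sum(y <= p for y in ys) / len(ys))
        for p in points
    )


def clustering_pvalue(case_pos, control_pos, n_perm=2000, seed=0):
    """Permutation p-value for a difference in variant position distributions."""
    rng = random.Random(seed)
    observed = ks_statistic(case_pos, control_pos)
    pooled = list(case_pos) + list(control_pos)
    n_case = len(case_pos)
    hits = 0
    for _ in range(n_perm):
        rng.shuffle(pooled)
        if ks_statistic(pooled[:n_case], pooled[n_case:]) >= observed:
            hits += 1
    # Add-one correction keeps the p-value strictly positive.
    return (hits + 1) / (n_perm + 1)


def fisher_combine(p1, p2):
    """Fisher's method for two independent p-values (chi-square with 4 df)."""
    product = p1 * p2
    # Closed-form survival function of chi-square(4) at -2 * ln(product).
    return product * (1.0 - math.log(product))
```

With made-up inputs, a gene-level p-value would be obtained as `fisher_combine(burden_pvalue(30, 1000, 10, 1000), clustering_pvalue(case_positions, control_positions))`, where the position lists hold the amino-acid residue index of each rare missense variant observed in cases and controls. A gene with a modest burden signal but strongly clustered case variants can reach a smaller combined p-value than the burden test alone would give, which is the behaviour the summary describes.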

Original publication




Journal article

Publication Date



HCMR Investigators