It is now straightforward to sequence the DNA in a person's genome, and databases that link genetic data to a range of phenotypes are becoming ever larger. What is less straightforward is to process and interpret these data. My group is interested in developing computational and statistical methods to help use these growing resources to answer questions in medical genomics and population genetics.
The questions we address range from data processing to the interpretation of genetic variants and understanding the history of our species. This range is reflected in the variety of projects we take on, which recently have included:
- Improving the accuracy of reads from the Oxford Nanopore (ONT) single-molecule portable sequencing device;
- Inferring demographic events such as migrations and population bottlenecks from whole-genome sequencing data;
- Understanding the impact of non-coding mutations on disease by building sequence-to-phenotype models;
- Charting the differentiation of B cells in response to vaccination and infection.
We draw on a range of sources for our methods, but key recurring ingredients are Bayesian statistics, machine learning, and algorithm design. We are particularly interested in applying deep learning methods such as convolutional neural networks, which show a lot of promise for the interpretation of non-coding mutations. We also use more traditional methods, such as hidden Markov models and particle filters, and design novel algorithms, for example ones based around the Burrows-Wheeler transform, to deal with the often very large data sets.
Sequencing of human genomes with nanopore technology.
Bowden R. et al, (2019), Nat Commun, 10
Haplotype matching in large cohorts using the Li and Stephens model.
Lunter G., (2019), Bioinformatics, 35, 798 - 806
Non-Mendelian Causes of Temporal Lobe Epilepsy
Krestel H. et al, (2018), Epilepsia, 59, S316 - S316
An Equivariant Bayesian Convolutional Network predicts recombination hotspots and accurately resolves binding motifs.
Brown R. and Lunter G., (2018), Bioinformatics
A unified haplotype-based method for accurate and comprehensive variant calling
Cooke D. et al, (2018)