It is now straightforward to sequence the DNA in a person's genome, and databases that link genetic data to a range of phenotypes are becoming ever larger. Processing and interpreting these data is less straightforward. My group develops computational and statistical methods that use these growing resources to answer questions in medical genomics and population genetics.
The questions we address range from data processing to the interpretation of genetic variants and the history of our species. This range is reflected in the variety of our projects, which have recently included:
- Improving the accuracy of reads from the Oxford Nanopore Technologies (ONT) portable single-molecule sequencing device;
- Inferring demographic events such as migrations and population bottlenecks from whole-genome sequencing data;
- Understanding the impact of non-coding mutations on disease by building sequence-to-phenotype models;
- Charting the differentiation of B cells in response to vaccination and infection.
Our methods draw on several fields, but the key recurring ingredients are Bayesian statistics, machine learning and algorithm design. We are particularly interested in deep learning methods such as convolutional neural networks, which show considerable promise for interpreting non-coding mutations. We also use more established statistical and machine-learning approaches, such as Bayesian models, hidden Markov models and particle filters, and we design new algorithms, for example ones based on the Burrows-Wheeler transform, to handle the often very large data sets involved.
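This page does not describe the group's own Burrows-Wheeler-based algorithms. As a generic illustration of why the transform helps with very large sequence data, here is a minimal sketch of the textbook Burrows-Wheeler transform together with FM-index-style backward search, which counts pattern occurrences by scanning the pattern once, from right to left. The function names `bwt` and `count_occurrences` are hypothetical, and the naive rotation sort and `occ` scan are for clarity only.

```python
from collections import Counter


def bwt(text: str, sentinel: str = "$") -> str:
    """Burrows-Wheeler transform via explicitly sorted rotations.

    Naive O(n^2 log n) construction, for illustration only; practical
    tools derive the BWT from a suffix array instead.
    """
    assert sentinel not in text, "sentinel must not occur in the text"
    s = text + sentinel  # '$' sorts before 'A', 'C', 'G', 'T' in ASCII
    rotations = sorted(s[i:] + s[:i] for i in range(len(s)))
    # The BWT is the last column of the sorted rotation matrix.
    return "".join(rot[-1] for rot in rotations)


def count_occurrences(bwt_str: str, pattern: str) -> int:
    """Count occurrences of pattern in the original text by backward search."""
    # C[c] = number of characters in the text (incl. sentinel) smaller than c.
    counts = Counter(bwt_str)
    C, running = {}, 0
    for c in sorted(counts):
        C[c] = running
        running += counts[c]

    def occ(c: str, i: int) -> int:
        # Occurrences of c in bwt_str[:i]: a naive scan, where real
        # FM-indexes use sampled rank tables for fast lookups.
        return bwt_str[:i].count(c)

    # [lo, hi) is the range of sorted rotations that start with the
    # suffix of pattern processed so far; begin with all rotations.
    lo, hi = 0, len(bwt_str)
    for c in reversed(pattern):
        if c not in C:
            return 0
        lo = C[c] + occ(c, lo)
        hi = C[c] + occ(c, hi)
        if lo >= hi:
            return 0
    return hi - lo


if __name__ == "__main__":
    text = "ACGTACGTTACG"
    index = bwt(text)
    print(index)
    print(count_occurrences(index, "ACG"))  # 3 (positions 0, 4 and 9)
```

The point of the design is that, once the transform and its rank tables are built, each query costs time proportional to the pattern length rather than the genome length, which is what makes this family of indexes attractive for large cohorts and reference genomes.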
Inferring B cell specificity for vaccines using a Bayesian mixture model
Fowler A. et al. (2020), BMC Genomics, 21
Repertoire-wide phylogenetic models of B cell molecular evolution reveal evolutionary signatures of aging and vaccination
Pybus O. et al. (2019), PNAS
DeepC: Predicting chromatin interactions using megabase scaled deep neural networks and transfer learning
Schwessinger R. et al. (2019)
Sequencing of human genomes with nanopore technology
Bowden R. et al. (2019), Nat Commun, 10
Haplotype matching in large cohorts using the Li and Stephens model
Lunter G. (2019), Bioinformatics, 35, 798-806