Visiting Professor in Computational Biology and Artificial Intelligence
It is now straightforward to sequence the DNA in a person's genome, and databases that link genetic data to a range of phenotypes are becoming ever larger. What is less straightforward is processing and interpreting these data. My group develops computational and statistical methods that use these growing resources to answer questions in medical genomics and population genetics.
The questions we address range from data processing to the interpretation of genetic variants and the history of our species. This range is reflected in the variety of projects we take on, which have recently included:
- improving the accuracy of reads from the Oxford Nanopore Technologies (ONT) portable single-molecule sequencing device;
- inferring demographic events, such as migrations and population bottlenecks, from whole-genome sequencing data;
- understanding the impact of non-coding mutations on disease by building sequence-to-phenotype models;
- charting the differentiation of B cells in response to vaccination and infection.
We draw on a range of sources for our methods, but key recurring ingredients are Bayesian statistics, machine learning and algorithm design. We are particularly interested in deep learning methods such as convolutional neural networks, which show particular promise for interpreting non-coding mutations. We also use more traditional statistical and machine-learning methods, such as Bayesian inference, hidden Markov models and particle filters, and we design novel algorithms, for example ones based on the Burrows-Wheeler transform, to handle the often very large data sets involved.
A unified haplotype-based method for accurate and comprehensive variant calling.
Cooke DP. et al, (2021), Nat Biotechnol
Short and long-read genome sequencing methodologies for somatic variant detection; genomic analysis of a patient with diffuse large B-cell lymphoma.
Roberts HE. et al, (2021), Sci Rep, 11
Demographic inference from multiple whole genomes using a particle filter for continuous Markov jump processes.
Henderson D. et al, (2021), PLoS One, 16
DeepC: predicting 3D genome folding using megabase-scale transfer learning.
Schwessinger R. et al, (2020), Nat Methods, 17, 1118 - 1124
Multi Locus View: An Extensible Web Based Tool for the Analysis of Genomic Data.
Sergeant MJ. et al, (2020)