LanceOtron: a deep learning peak caller for ATAC-seq, ChIP-seq, and DNase-seq
Hentges LD., Sergeant MJ., Downes DJ., Hughes JR., Taylor S.
<jats:title>Abstract</jats:title><jats:p>Genomics technologies, such as ATAC-seq, ChIP-seq, and DNase-seq, have revolutionized molecular biology, generating a complete genome’s worth of signal in a single assay. Coupled with the use of genome browsers, researchers can now see and identify important DNA-encoded elements as peaks in an analog signal. Despite the ease with which humans can visually identify peaks, converting these signals into meaningful genome-wide peak calls from such massive datasets requires complex analytical techniques. Current methods use statistical frameworks to identify peaks as sites of significant signal enrichment, disregarding the fact that the analog data do not follow any archetypal distribution. Recent advances in artificial intelligence have shown great promise in image recognition, on par with or exceeding human ability, providing an opportunity to reimagine and improve peak calling. We present an interactive and intuitive peak calling framework, LanceOtron, built around image recognition using a wide and deep neural network. We hand-labelled 499 Mb of genomic data, built 5,000 models, and tested with over 100 unique users from labs around the world. In benchmarking open chromatin, transcription factor binding, and chromatin modification datasets, LanceOtron outperforms the long-standing, gold-standard peak caller MACS2 with its increased selectivity and near-perfect sensitivity. Additionally, this command-line-optional approach allows researchers to easily generate optimal peak calls using only a web interface. Together, the enhanced performance and usability of LanceOtron will improve the reliability and reproducibility of peak calls and subsequent data analysis. This tool highlights the general utility of applying machine learning to genomic data extraction and analysis.</jats:p>