A new deep learning method, DeepCpG, designed by researchers at EMBL-EBI, the Babraham Institute and the Sanger Institute, helps scientists better understand the epigenome – the biochemical activity around the genome. Published in Genome Biology, DeepCpG uses ‘deep neural networks’, multi-layered machine learning models inspired by the brain. Machine learning provides a valuable tool for research into health and disease.
Deep learning, one of the most active fields in machine learning, has driven recent advances in computer image classification, text translation and speech recognition. It also has major potential in computational biology, particularly for regulatory genomics and cellular imaging.
Nice book. But what does it mean?
“We now have this amazing ‘book’ of the human genome, thanks to projects like 1000 Genomes, divided up nicely into chapters and annotated in parts. But what does it mean? If we want to really understand how life works, we need to decipher both the genome – the set of instructions repeated in every cell – and the epigenome, the part that varies wildly between cells,” explains Oliver Stegle, group leader at EMBL-EBI.
To better understand how DNA sequences relate to biological changes, the genomics community is turning to artificial neural networks – a class of machine learning methods inspired by the wiring of the brain that rose to prominence in the 1980s. More recently, these models have been rebranded as ‘deep neural networks’, which form the field of ‘deep learning’.
A recent review of deep learning, published in Molecular Systems Biology, provides a ‘user guide’ to applying deep learning in genomics – an area of rapid technological change.
“Single-cell genomics allows us to generate a huge amount of highly detailed information about the genome and all the activity happening around it, in many different types and subtypes of cells. The complexity is simply staggering, and the idea of explicitly probing each of these potential interactions individually is not really workable,” says Stegle.
“Most existing methods require you to know a lot up front, for example which patterns in the DNA sequence are informative for a specific task. However, there is a huge number of possible patterns in the genome that we could explore, so these existing methods are not practical for genomics,” adds Christof Angermueller, PhD candidate at EMBL-EBI. “With deep learning, you do not have to spend your time manually crafting features that capture these patterns. Instead, the model uses raw DNA sequences as input and discovers relevant patterns itself.”
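To make the idea of learning from raw sequence more concrete, here is a minimal sketch; it is not DeepCpG's code. A DNA window is one-hot encoded and scanned with a single convolutional filter, which is the basic operation convolutional networks use to detect sequence motifs. The helper names (`one_hot`, `scan`) and the filter weights are hypothetical and hand-set for illustration. In a trained network, the weights would be learned from data rather than written by hand.

```python
# Illustrative sketch only: one-hot encoding plus a single hand-set
# convolutional filter. A real network learns many such filters.

BASES = "ACGT"

def one_hot(seq):
    """Encode a DNA string as a list of 4-element vectors (A, C, G, T).
    Unknown bases such as 'N' become all-zero vectors."""
    return [[1.0 if base == b else 0.0 for b in BASES] for base in seq.upper()]

def scan(encoded, filt):
    """Slide a (width x 4) filter along the encoded sequence and
    return one raw score per start position (a 1D convolution)."""
    width = len(filt)
    scores = []
    for start in range(len(encoded) - width + 1):
        score = 0.0
        for offset in range(width):
            for channel in range(4):
                score += encoded[start + offset][channel] * filt[offset][channel]
        scores.append(score)
    return scores

def relu(xs):
    """Keep positive responses only, as a ReLU activation does."""
    return [max(0.0, x) for x in xs]

# Hypothetical filter that responds to the dinucleotide "CG" (a CpG site)
cg_filter = [
    [-1.0, 1.0, -1.0, -1.0],  # first position favours C
    [-1.0, -1.0, 1.0, -1.0],  # second position favours G
]

seq = "TTCGATCGAA"
activations = relu(scan(one_hot(seq), cg_filter))
print(activations)  # → [0.0, 0.0, 2.0, 0.0, 0.0, 0.0, 2.0, 0.0, 0.0]
```

The filter fires only at positions 2 and 6, where "CG" occurs. A deep network stacks many learned filters of this kind, which is how it can pick out informative patterns without anyone specifying them in advance.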
Accelerating single-cell genomics
The team used deep learning to fill in the gaps in single-cell genomics, an emerging technology that offers a close-up view of epigenetics.
DeepCpG was designed to help scientists learn about the connections between DNA sequences and DNA methylation – a biochemical modification of DNA that can act like an off-switch for individual genes. Methylation plays a key part in important biological processes, including cell development, ageing and cancer progression.
The new method uses genomic and epigenomic data to predict DNA methylation in single cells. This matters because current single-cell technologies provide incomplete information: in each cell, methylation is measured at only a fraction of sites. With DeepCpG, researchers can fill in these gaps and obtain a more complete picture of DNA methylation. The model can also yield new biological insights, for example on the connection between DNA sequence and methylation.
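The imputation task itself can be illustrated with a much simpler stand-in. The sketch below is not DeepCpG, which learns this mapping with deep neural networks. It is a hypothetical baseline that estimates each missing CpG state in one cell from the observed states of nearby CpGs, weighting closer sites more heavily. The function name `impute`, the 1,000 bp window and the 0.5 fallback value are all assumptions chosen for illustration.

```python
# Hypothetical baseline for single-cell methylation imputation (NOT DeepCpG):
# a missing CpG state is estimated as the inverse-distance-weighted mean
# of observed CpG states within a fixed window in the same cell.

def impute(positions, states, window=1000):
    """positions: CpG coordinates (unique); states: 1 (methylated),
    0 (unmethylated) or None (not measured).
    Returns one methylation probability per CpG; observed states are kept."""
    result = []
    for pos, state in zip(positions, states):
        if state is not None:
            result.append(float(state))
            continue
        weighted_sum = 0.0
        total_weight = 0.0
        for other_pos, other_state in zip(positions, states):
            if other_state is None or other_pos == pos:
                continue  # skip unmeasured sites and the site itself
            distance = abs(other_pos - pos)
            if distance <= window:
                weight = 1.0 / distance  # closer CpGs count more
                weighted_sum += weight * other_state
                total_weight += weight
        # With no observed neighbours in range, fall back to an uninformative 0.5
        result.append(weighted_sum / total_weight if total_weight > 0 else 0.5)
    return result

positions = [100, 120, 200, 5000]
states = [1, None, 0, None]
probs = impute(positions, states)
print([round(p, 3) for p in probs])  # → [1.0, 0.8, 0.0, 0.5]
```

The CpG at position 120 sits closer to a methylated site than to an unmethylated one, so it is predicted as likely methylated. The isolated CpG at 5000 has no measured neighbours, so the baseline can say nothing about it; this is exactly the kind of case where a model that also reads the DNA sequence has an advantage.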
“DeepCpG actually learns meaningful features in a data-driven manner,” says Angermueller. “It has major advantages over previous methods, including the ability to more accurately predict DNA methylation and to study intercellular differences. By studying the wiring of the learnt network, we can understand how the biology of DNA methylation works. This has allowed us to recover known DNA sequence motifs that are important for methylation changes, as well as to discover new motifs, which are the starting point for future studies.”
“We have demonstrated that DeepCpG makes it possible to accurately predict and analyse DNA methylation in single cells. However, DeepCpG is just one example of how we can apply deep learning to genomics and single-cell technologies,” says Stegle. “It is exciting to see the versatile applications deep learning has already found in genomics. I am looking forward to seeing more deep learning techniques come online. I believe it will make a big difference to how we study biology and has the potential to yield new answers about how life works.”
“Single-cell epigenomics methods provide exciting insights into cell heterogeneity in development, ageing and disease; however, if you are just dealing with two genomes (in a single cell), bits of information are often lost during the experiment,” explains Wolf Reik of the Babraham Institute and Associate Faculty member at the Sanger Institute. “This new method recognises patterns of the epigenome in single cells and then reconstructs lost information, returning a data-rich single-cell epigenome.”
“Deep learning is now the state of the art in many fields. We are exploring its utility for making sense of large-scale biological data. Pioneering studies, such as the one by Angermueller and colleagues, prove that there is a lot to be gained by using deep learning methods in computational biology,” concludes Leopold Parts, Group Leader at the Sanger Institute.