1 Million Unannotated Exons Discovered in the Human Genome


Over two decades after the first human genome was sequenced, a team of researchers has discovered ~1 million new exons in the human genome.

The research group, from the University of Toronto’s (U of T) Donnelly Centre for Cellular and Biomolecular Research, said that none of the newly discovered exons are consistently found in the genomes of other species. “They seem to appear in the human genome mainly due to random mutation and are unlikely to play a significant role in our biology,” explained Dr. Timothy Hughes, principal investigator on the study and professor and chair of molecular genetics at U of T’s Temerty Faculty of Medicine. “This is evidence that evolution in humans involves a lot of trial and error – most likely enabled by the vast size of our genome.”

The study is published in Genome Research.

Surveying exons in the human genome

The human genome contains ~20,000 protein-coding genes. Genes consist of exons, stretches of DNA that encode protein, separated by introns, which are non-coding sequences. When a gene is transcribed, a process called splicing removes the introns so that only exons remain in the mature mRNA, which is then translated into protein. Exons are regarded as autonomous if they can be spliced into a mature RNA transcript without help from their surrounding context, such as flanking exons.
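As a rough illustration of what splicing does to a transcript, the toy sketch below joins exon segments and drops the introns between them. The sequence, the exon coordinates and the `splice` function are all hypothetical and chosen only for illustration; real splicing depends on sequence signals recognized by cellular machinery, not on pre-supplied coordinates.

```python
def splice(pre_mrna: str, exons: list[tuple[int, int]]) -> str:
    """Join exon segments in transcript order, discarding everything else.

    Coordinates are 0-based and end-exclusive (an assumption of this sketch).
    """
    return "".join(pre_mrna[start:end] for start, end in sorted(exons))


# Hypothetical toy transcript: uppercase = exon, lowercase = intron.
pre_mrna = "AUGGCC" + "guaaguuuuag" + "GAUUAC" + "guccccag" + "UGA"

# Hypothetical exon coordinates matching the uppercase segments above.
exons = [(0, 6), (17, 23), (31, 34)]

mature = splice(pre_mrna, exons)
print(mature)
```

In this toy picture, an autonomous exon would be one whose boundaries can be recognized from the fragment itself, without the neighboring exons that are present in the full gene.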

Hughes and colleagues assayed large fragments (100–500 base pairs) of the human genome using a method known as exon trapping. They wanted to test the exon definition model, a molecular biology concept that describes how splicing machinery is able to recognize exons during pre-mRNA processing. An assumption of this model is that the accurate removal of introns is achieved because there are clear and consistent indicators of where exons start, and where exons end. Sometimes, though, exon splicing doesn’t go as planned and mature RNA transcripts containing nonfunctional components are produced.

What is exon trapping?

Exon trapping is a traditional molecular biology technique used to find and isolate exons. A fragment of DNA is inserted into a vector, which carries it into a host cell. The RNA produced by the host cell is then analyzed, and exons that are expressed and “trapped” in the RNA can be detected using sequencing methods.

“We used a classical ‘exon trapping’ assay to survey the human genome for autonomous exons whereby genomic fragments are assayed outside of their normal contextual setting, for example, flanking exons, promoter, transcription level and distal intronic sequences,” the authors wrote.

“We reasoned that this survey would allow us to query whether protein-coding exons are generally autonomous, whether exons exist elsewhere in the genome, what sequence features they possess and whether exons arise at random, which would partly explain the existence of long non-coding RNAs (lncRNAs).”

“While exon trapping is not widely used anymore, it proved to be effective when used in combination with high-throughput sequencing to scan the entire human genome,” Hughes said.

Almost 1 million unannotated exons

Hughes and colleagues defined any trapped exon as “autonomous”. They identified ~1.25 million of these, including exons from most known mRNAs and annotated lncRNAs.

Almost 1 million of the trapped exons are not annotated, Hughes and colleagues said: “These exons are not conserved, suggesting they are nonfunctional and arose from random mutations. They are nonetheless highly enriched with known splicing promoting sequence features that delineate known exons.”

The translation of randomly mutated exons could have consequences for human health. lncRNAs are autonomous but lack a known function – though they have been associated with the development of cancer.

“This is an interesting study that broadens our knowledge of sequences across the human genome that have the potential to be recognized as exons in transcribed RNA,” Dr. Benjamin Blencowe, professor of molecular genetics at U of T, who was not involved in the study, said. “While the significance of the majority of the newly detected exons is unclear, some of them may be activated in certain contexts – for example, by disease mutations – and therefore cataloging them is important. This study will further serve as a valuable resource facilitating ongoing efforts directed at deciphering the splicing code.”

Improving tools such as SpliceAI

The researchers are confident that their exon trapping data will also be helpful when fed into programs such as SpliceAI, a tool that is used widely to determine splice sites. “SpliceAI often doesn’t provide details on the characteristics of exons and has a poor ability to predict splicing in exons that aren’t already cataloged,” said Hughes. “Our exon trapping data contains biologically meaningful information that can be fed into SpliceAI and other splicing predictors to open up new paths for exploring the dark genome.”

Reference: Stepankiw N, Yang AWH, Hughes TR. The human genome contains over a million autonomous exons. Genome Res. 2023. doi: 10.1101/gr.277792.123

This article is a rework of a press release issued by [name of institute]. Material has been edited for length and content.