Whole-Exome Sequencing at the Dawn of Personalized Medicine
Article May 02, 2019 | By Dmitry Velmeshev
Deciphering the first complete sequence of the human genome in 2003 required a combined effort of scientists from 20 institutions and $3 billion of funding. Over the last decade, whole-exome sequencing (WES) has established itself as a method that successfully balances cost against the output of useful data for diagnostic and research applications. Here, we look at how WES is used in both the laboratory and the clinic, and why it is often the method of choice in these areas.
Technology at the core of WES
Next-generation sequencing (NGS) of genomic DNA comes in two main flavors: whole-genome sequencing (WGS) and WES. “At their core, both WGS and WES rely on the same technology: massively parallel sequencing of millions to billions of short stretches of DNA produced by fragmenting long strands of genomic DNA,” says Marco Magistri, PhD, a cancer researcher at the Sylvester Comprehensive Cancer Center at the University of Miami. However, whereas WGS aims to sequence the entire genome, WES involves an additional step that selects DNA fragments from the exonic regions of the genome.
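The selection step happens in the laboratory, but its effect can be pictured computationally: of all the fragments produced, only those that overlap an exon are kept for sequencing. The sketch below is a toy illustration of that idea, not a model of any real capture kit. The function name select_exonic and all coordinates are hypothetical, intervals are half-open (start, end) pairs, and exons are assumed not to overlap one another.

```python
from bisect import bisect_left

def select_exonic(fragments, exons):
    """Keep fragments that overlap at least one exon.

    Intervals are half-open (start, end) pairs on a single chromosome.
    Assumes the exons do not overlap each other (illustrative simplification).
    """
    exons = sorted(exons)
    exon_starts = [start for start, _ in exons]
    kept = []
    for frag_start, frag_end in fragments:
        # Number of exons that begin before the fragment ends.
        i = bisect_left(exon_starts, frag_end)
        # Of those, the last one has the largest end; it overlaps the
        # fragment only if it ends after the fragment starts.
        if i > 0 and exons[i - 1][1] > frag_start:
            kept.append((frag_start, frag_end))
    return kept

# Hypothetical coordinates, chosen only to exercise the edge cases.
exons = [(100, 200), (500, 650)]
fragments = [(0, 90), (150, 300), (300, 480), (640, 800), (650, 700)]
exonic = select_exonic(fragments, exons)
print(exonic)  # [(150, 300), (640, 800)]
```

Fragments that merely touch an exon boundary, such as (650, 700) here, are dropped because half-open intervals that share only an endpoint do not overlap.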
Only one to two percent of the human genome codes for genes. The rest of the genome is mostly unexplored, though thanks to recent studies more and more “dark matter” DNA is being mapped to regulatory elements, such as enhancers1 and insulators2. In medicine, WGS offers the ability to identify potential disease-causing variations not only in genes, but also in regulatory elements.
However, this feature of WGS comes, literally, at a price: sequencing costs associated with WGS are three to four times those of WES3. Additionally, disease-causing DNA variants tend to cluster in and around gene-encoding regions in common disorders such as autism4 and cancer5. Thus, WES is the common method of choice when a large cohort of patient samples needs to be analyzed.
Data analysis bottlenecks
The output of a sequencing machine is a very large text file containing millions of strings called reads, each a sequence of the four DNA bases: A, T, G and C.
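To make the idea of a read file concrete, here is a minimal sketch that parses a few reads and tallies their bases. It assumes the file is in the widely used FASTQ layout (four lines per read: an @ header, the sequence, a + separator and a quality string), which the article does not specify. The two reads in the example are invented, and the function names are illustrative.

```python
from collections import Counter
from io import StringIO

def read_fastq(handle):
    """Yield (read_id, sequence, quality) tuples from a FASTQ text stream."""
    while True:
        header = handle.readline().rstrip()
        if not header:
            break  # end of file (or a blank line)
        sequence = handle.readline().rstrip()
        handle.readline()  # skip the '+' separator line
        quality = handle.readline().rstrip()
        yield header[1:], sequence, quality  # drop the leading '@'

def base_counts(handle):
    """Count how often each base occurs across all reads in the stream."""
    counts = Counter()
    for _, sequence, _ in read_fastq(handle):
        counts.update(sequence)
    return counts

# Two made-up reads standing in for the millions in a real file.
example = StringIO("@read1\nACGT\n+\nIIII\n@read2\nGGCA\n+\nIIII\n")
counts = base_counts(example)
print(dict(counts))
```

Because the parser reads one record at a time, it never holds the whole file in memory, which matters when the input runs to many gigabytes.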
A typical WGS or WES experiment requires every region of the genome to be sequenced 50-100 times (the exact number is referred to as “coverage”); otherwise, it is impossible to tell apart erroneous DNA bases introduced by PCR amplification (or misread by the sequencing machine) from true changes in the sequence of a patient’s DNA. Both in WES and WGS, the complete data analysis process is computationally demanding, usually requiring specialized high-performance computing systems with a lot of disk space, memory and parallelization capacity. Even on such systems, analyzing a single WES experiment takes several hours of computational time. “Since a WGS experiment produces ~50 times more raw data than a WES experiment at the same coverage, it is significantly more demanding in terms of computational power and time for data analysis,” comments Yonatan Perez, a postdoctoral fellow at the University of California, San Francisco.
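Two pieces of arithmetic sit behind this paragraph, and a rough sketch may help. First, the number of reads needed grows in proportion to the size of the region being sequenced, so if the exome is taken to be one fiftieth of the genome, WGS needs about 50 times the reads at the same coverage. Second, a simple binomial model shows why depth helps separate errors from real variants. The genome size, the 2 percent exome fraction, the 150-base read length and the 1 percent per-base error rate below are all assumed values for illustration, not figures from the article or from any particular platform.

```python
from math import comb

GENOME_BP = 3_100_000_000     # assumed approximate size of the human genome
EXOME_BP = GENOME_BP // 50    # assumed 2% exome (the article says one to two percent)

def reads_needed(target_bp, coverage, read_length=150):
    """Reads required for each base to be covered `coverage` times on average."""
    total_bases = target_bp * coverage
    return (total_bases + read_length - 1) // read_length  # integer ceiling

def prob_at_least(k, depth, p):
    """Binomial tail: chance that at least k of `depth` reads show an event of probability p."""
    return sum(comb(depth, i) * p**i * (1 - p) ** (depth - i)
               for i in range(k, depth + 1))

wgs_reads = reads_needed(GENOME_BP, coverage=60)
wes_reads = reads_needed(EXOME_BP, coverage=60)
print(f"WGS needs about {wgs_reads // wes_reads}x the reads of WES")

error_rate = 0.01  # assumed per-base error probability
for depth in (4, 60):
    k = -(-depth // 4)  # a quarter of the reads, rounded up
    chance = prob_at_least(k, depth, error_rate)
    print(f"depth {depth}: P(at least {k} erroneous reads at one position) = {chance:.2e}")
```

Under these assumptions, a single erroneous read at a position covered four times is not a rare event, whereas fifteen erroneous reads at a position covered 60 times is vanishingly unlikely. That gap is what lets an analysis pipeline treat a base change seen in a large share of reads as a genuine variant rather than noise.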
How WES is applied in modern biomedical research
A combination of relatively low price and coverage of all (annotated) genes in the genome has made WES a “workhorse” of modern biomedical research, offering a relatively cheap and comprehensive analysis of an individual's DNA. A recent study analyzed 35,584 whole exomes of autism patients6 and identified 99 high-confidence risk genes associated with the disorder. The study represents the largest whole-exome cohort of patients with autism to date and expands the catalogue of high-confidence risk genes with novel candidates, such as DEAF1, KCNQ3 and SCN1A.
Increasingly, genomic DNA analysis, including WES, is used not as a stand-alone technique but in conjunction with technologies such as RNA sequencing, making it possible to identify the effect that variation in the DNA sequence of a gene has on its expression level. One ongoing study combines WES with single-cell RNA sequencing7 (scRNA-seq), a technique that identifies gene expression changes in specific cell types of a complex tissue. “For our purpose, WES made a lot of sense to use since our focus is on applying scRNA-seq approaches to study changes in specific cell types in psychiatric diseases,” says Perez, who is part of a research team combining WES with single-cell RNA sequencing of patient post-mortem brain tissue samples.
For the majority of regulatory intergenic regions, such as enhancers, the identity of the target gene is unknown and can change depending on the cell type, so WES data is often easier to relate to the expression of particular RNAs and proteins. “We chose WES since our goal is to see whether genes dysregulated in patient brain tissue based on the single-cell RNA-seq analysis are associated with potentially pathogenic variants,” comments Perez.
WES as a diagnostic tool of choice
Until recently, DNA testing for disorders that include a genetic component was performed only using targeted assays for a single gene or a group of genes (often referred to as gene panels). These include testing for known predictive mutations in genes that are associated with subtypes of tumors, as well as for genetically defined neurodegenerative disorders, such as subtypes of frontotemporal dementia and early-onset Alzheimer’s disease. For example, mutations in BRCA1 or BRCA2 are well-established risk factors for breast cancer, and women carrying these mutations are found to have a 55-85 percent lifetime risk of breast cancer and to benefit from prophylactic mastectomy8. However, when the genetic cause of disease is unknown, but the clinical picture of a patient suggests a Mendelian disorder, WES has proven itself an ideal tool for causative gene discovery9. Moreover, the most common types of cancer and neurological diseases are too genetically heterogeneous to be covered by a test for a single gene or even a panel of genes.
“In one of our projects in the lab, we are studying a previously uncharacterized type of lymphoma,” says Magistri. “Since we don’t know the landscape of somatic mutations for this cancer subtype, we are using WES to acquire an unbiased and comprehensive readout of the genetic variation in these tumors.”
In the relatively new field of precision oncology, WES combined with other unbiased techniques is the method of choice for comprehensive molecular diagnostics. For instance, the somatic mutations identified by WES of pediatric solid tumors were useful for diagnosis and treatment in 40% of patients10. Importantly, most of these mutations were missed by conventional targeted assays. Therefore, WES is an economical and comprehensive option for several diagnostic applications.
Future of WES in the era of personalized medicine
WGS has been proposed to offer a number of advantages over WES, including detecting noncoding disease-causing variants and providing an increased diagnostic yield11. “WGS is a more comprehensive approach but is more expensive than WES. As we learn more about genomic regulatory elements and the sequencing costs drop, using WGS will become more attractive,” says Perez.
“However, for studies involving large patient cohorts and for healthcare, WES will still be a valuable tool due to its reduced price and streamlined analysis.” NGS technologies have only recently started to compete with and complement targeted gene testing in healthcare, and until sequencing costs drop to allow for routine WGS testing, WES strikes that perfect balance of price and diagnostic value.
1. Shlyueva, D., Stampfel, G. & Stark, A. Transcriptional enhancers: from properties to genome-wide predictions. Nature Reviews Genetics 15, 272, doi:10.1038/nrg3682 (2014).
2. West, A. G., Gaszner, M. & Felsenfeld, G. Insulators: many functions, many mechanisms. Genes Dev 16, 271-288, doi:10.1101/gad.954702 (2002).
3. Schwarze, K., Buchanan, J., Taylor, J. C. & Wordsworth, S. Are whole-exome and whole-genome sequencing approaches cost-effective? A systematic review of the literature. Genetics in Medicine 20, 1122-1130, doi:10.1038/gim.2017.247 (2018).
4. An, J.-Y. et al. Genome-wide de novo risk score implicates promoter variation in autism spectrum disorder. Science 362, eaat6576, doi:10.1126/science.aat6576 (2018).
5. Nik-Zainal, S. et al. Landscape of somatic mutations in 560 breast cancer whole-genome sequences. Nature 534, 47-54, doi:10.1038/nature17676 (2016).
6. Satterstrom, F. K. et al. Novel genes for autism implicate both excitatory and inhibitory cell lineages in risk. bioRxiv, 484113, doi:10.1101/484113 (2018).
7. Shapiro, E., Biezuner, T. & Linnarsson, S. Single-cell sequencing-based technologies will revolutionize whole-organism science. Nature Reviews Genetics 14, 618, doi:10.1038/nrg3542 (2013).
8. Meijers-Heijboer, H. et al. Breast Cancer after Prophylactic Bilateral Mastectomy in Women with a BRCA1 or BRCA2 Mutation. New England Journal of Medicine 345, 159-164, doi:10.1056/NEJM200107193450301 (2001).
9. Ng, S. B. et al. Exome sequencing identifies the cause of a mendelian disorder. Nat Genet 42, 30-35, doi:10.1038/ng.499 (2010).
10. Parsons, D. W. et al. Diagnostic Yield of Clinical Tumor and Germline Whole-Exome Sequencing for Children With Solid Tumors. JAMA Oncology 2, 616-624, doi:10.1001/jamaoncol.2015.5699 (2016).
11. Rusch, M. et al. Clinical cancer genomic profiling by three-platform sequencing of whole genome, whole exome and transcriptome. Nature Communications 9, 3962, doi:10.1038/s41467-018-06485-7 (2018).