RNA-seq: Basics, Applications and Protocol
Article Apr 06, 2018 | by Ruairi J Mackenzie, Science Writer for Technology Networks
What is RNA-seq?
RNA-seq (RNA-sequencing) is a technique that uses next-generation sequencing (NGS) to examine the quantity and sequences of RNA in a sample. It analyzes the transcriptome, revealing the gene expression patterns encoded within our RNA. Here, we look at why RNA-seq is useful, how the technique works, and the basic protocol that is commonly used today1.
What are the applications of RNA-seq?
RNA-seq lets us investigate and discover the transcriptome, the total cellular content of RNAs including mRNA, rRNA and tRNA. Understanding the transcriptome is key if we are to connect the information in our genome with its functional protein expression. RNA-seq can tell us which genes are turned on in a cell, what their level of expression is, and at what times they are activated or shut off2. This allows scientists to understand the biology of a cell more deeply and to assess changes that may indicate disease. Some of the most popular applications of RNA-seq are transcriptional profiling, SNP identification, RNA editing analysis and differential gene expression analysis3.
Transcriptome data can give researchers vital information about the function of genes. For example, the transcriptome can highlight all the tissues in which a gene of unknown function is expressed, which might indicate what its role is. RNA-seq also captures information about alternative splicing events (Figure 1), which produce different transcripts from a single gene sequence. These events would not be picked up by DNA sequencing. It can also identify post-transcriptional modifications that occur during mRNA processing, such as polyadenylation and 5’ capping2.
Figure 1: RNA-seq uses short reads of mRNA, which is free of intronic, non-coding sequence. These reads must then be aligned back to the reference genome.
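To make the point in Figure 1 concrete, here is a minimal Python sketch of why reads from spliced mRNA are harder to align. The gene sequence is made up for illustration, and the `splice` helper is a hypothetical stand-in for real splicing: it simply drops the lower-case (intronic) bases.

```python
# Toy gene: exons in upper case, one intron in lower case (made-up sequence).
genome = "ATGGCCAAGgtaagtcttcagTTCGATCCGTAA"


def splice(pre_mrna: str) -> str:
    """Remove lower-case (intronic) bases, mimicking splicing."""
    return "".join(base for base in pre_mrna if base.isupper())


mrna = splice(genome)

# A short read that spans the exon-exon junction of the mature mRNA.
junction_read = mrna[5:13]

# The read matches the mRNA, but not any contiguous stretch of the genome,
# because the intron sits between its two halves.
found_in_mrna = junction_read in mrna
found_in_genome = junction_read in genome.upper()

print(junction_read, found_in_mrna, found_in_genome)
```

A read like this can only be placed on the genome if the aligner is allowed to split it across the intron, which is why splice-aware alignment tools are used for RNA-seq data.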
How does RNA-seq work?
Early RNA-seq techniques used Sanger sequencing technology, a technique that, although innovative at the time, was also low-throughput, costly, and generally not quantitative. Only recently, with the advent and proliferation of NGS technology, have we been able to take full advantage of RNA-seq’s potential4.
The first step in the technique involves converting the population of RNA to be sequenced into cDNA fragments (a cDNA library). This allows the RNA to be put into an NGS workflow. Adapters are then added to each end of the fragments. These adapters contain functional elements which permit sequencing; for example, the amplification element and the primary sequencing priming site. The cDNA library is then analyzed by NGS, producing short sequences which correspond to either one or both ends of the fragment. The depth to which the library is sequenced varies depending on what the output data will be used for. The sequencing usually follows either single-read or paired-end methods. Single-read sequencing is a cheaper and faster technique (for reference, about 1% of the cost of Sanger sequencing) that sequences the cDNA from just one end, whilst paired-end methods sequence from both ends, and are therefore more expensive and time-consuming5,6.
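The difference between single-read and paired-end sequencing can be sketched in a few lines of Python. This is only an illustration under simple assumptions: the fragment is a made-up sequence, adapters are ignored, and the function names are hypothetical rather than part of any sequencing toolkit.

```python
def reverse_complement(seq: str) -> str:
    """Return the reverse complement of a DNA sequence."""
    return seq.translate(str.maketrans("ACGT", "TGCA"))[::-1]


def single_read(fragment: str, read_len: int) -> str:
    """Sequence the fragment from one end only."""
    return fragment[:read_len]


def paired_end(fragment: str, read_len: int) -> tuple[str, str]:
    """Sequence the fragment from both ends.

    The second read comes from the opposite strand, so it is the
    reverse complement of the far end of the fragment.
    """
    read1 = fragment[:read_len]
    read2 = reverse_complement(fragment)[:read_len]
    return read1, read2


fragment = "ATGGCCAAGTTCGATCCGTAA"  # made-up cDNA fragment
print(single_read(fragment, 6))
print(paired_end(fragment, 6))
```

Because the two reads of a pair come from a known fragment, their relative position and orientation give the aligner extra information that a single read does not provide.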
A further choice must be made between strand-specific and non-strand-specific protocols. With the former, the information about which DNA strand was transcribed is retained. The value of the extra information obtained from strand-specific protocols makes them the preferred option.
These reads, of which there will be many millions by the end of the workflow, can then be aligned to a reference genome and assembled to produce an RNA sequence map that spans the transcriptome7.
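As a rough illustration of the alignment step, the Python sketch below places a handful of reads on a toy reference by exact string matching and counts how many land on each gene. Everything here is an assumption made for the example: the sequences are invented, the reference holds two short "genes" rather than a genome, and real aligners tolerate mismatches and spliced reads, which this sketch does not.

```python
from collections import Counter

# Made-up reference sequences, keyed by a hypothetical gene name.
reference = {
    "geneA": "ATGGCCAAGTTCGATCCGTAA",
    "geneB": "ATGCTTGGAACCTTGACTTGA",
}

reads = ["GCCAAG", "TTCGAT", "CTTGGA", "GGGGGG", "CCGTAA"]


def align(read: str, reference: dict[str, str]):
    """Return (gene, offset) for the first exact match, or None."""
    for name, seq in reference.items():
        pos = seq.find(read)
        if pos != -1:
            return name, pos
    return None


counts = Counter()
unaligned = 0
for read in reads:
    hit = align(read, reference)
    if hit is None:
        unaligned += 1
    else:
        counts[hit[0]] += 1

print(dict(counts), "unaligned:", unaligned)
```

The per-gene read counts produced at this stage are the raw material for the expression estimates discussed later.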
RNA-seq vs microarrays: Why RNA-seq is considered superior
RNA-seq is widely regarded as superior to other technologies, such as microarray hybridization. There are several reasons for RNA-seq’s well-regarded status:
Not limited to genomic sequences – unlike hybridization-based approaches, which may require species-specific probes, RNA-seq can detect transcripts from organisms with previously undetermined genomic sequences. This makes it fundamentally superior for the detection of novel transcripts, SNPs or other alterations.
Low background signal – the cDNA sequences used in RNA-seq can be mapped to targeted regions on the genome, which makes it easy to remove experimental noise. Furthermore, the cross-hybridization and sub-standard hybridization that can plague microarray experiments do not arise in RNA-seq experiments.
More quantifiable – microarray data is only ever displayed as values relative to other signals detected on the array, whilst RNA-seq data is quantifiable. RNA-seq also avoids the issues microarrays have in detecting very high or very low expression levels.
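To show what "quantifiable" can mean in practice, here is a minimal Python sketch of transcripts per million (TPM), one widely used way to normalize RNA-seq read counts. TPM is not named in this article, so treat it as an illustrative assumption; the gene names, counts and lengths are invented.

```python
def tpm(counts: dict[str, float], lengths_kb: dict[str, float]) -> dict[str, float]:
    """Convert raw read counts to transcripts per million (TPM).

    1. Divide each count by the transcript length (reads per kilobase).
    2. Scale the rates so that they sum to one million.
    """
    rates = {gene: counts[gene] / lengths_kb[gene] for gene in counts}
    total = sum(rates.values())
    return {gene: rate / total * 1_000_000 for gene, rate in rates.items()}


# Hypothetical read counts and transcript lengths (in kilobases).
counts = {"geneA": 300, "geneB": 100, "geneC": 600}
lengths_kb = {"geneA": 3.0, "geneB": 1.0, "geneC": 2.0}

values = tpm(counts, lengths_kb)
print(values)
```

Note that geneA has three times as many reads as geneB but the same TPM, because it is three times as long. Correcting for length in this way is what allows expression levels to be compared between genes.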
An RNA-seq protocol
Preparation prior to starting your RNA-seq experiment is essential. Questions to answer before starting include10:
• What method of RNA purification are you using?
• How many reads will you need?
• Which platform will you use?
• What reference genome will you use?
• How are you assessing the quality of your RNA?
• Do you need to enrich your target RNA?
• Will you barcode your RNA?
• Do you have enough biological and technical replicates?
• Will you use single-read or paired-end sequencing?
cDNA Library Preparation
After these points have been considered, you can start preparing your cDNA library. This will require adding the platform-specific “adapter sequences” and amplifying the DNA, but the exact procedure depends heavily on the platform used. Generating the cDNA itself involves a reverse transcriptase-mediated first-strand synthesis followed by a DNA polymerase-mediated second-strand synthesis10,11.
Once the library is prepared and the adapters added, you can use your chosen sequencing platform to sequence your cDNA library to your desired depth. Once your transcript data has been produced, you can map the data to your reference genome. The alignment process can be complicated by the presence of splice variants and modifications, and the choice of reference genome also affects how difficult this stage is. Software packages such as STAR are useful at this stage, as are quality control tools like Picard or Qualimap12.
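As a sketch of what the mapping step might look like with STAR, the Python snippet below assembles a typical alignment command without running it. The index directory, FASTQ file names, output prefix and thread count are all hypothetical placeholders, and the options shown are only a small subset; check the STAR manual for the settings that suit your data.

```python
import shlex


def build_star_command(genome_dir, read_files, out_prefix, threads=4):
    """Assemble (but do not run) a STAR alignment command as an argument list."""
    return [
        "STAR",
        "--runThreadN", str(threads),            # number of threads
        "--genomeDir", genome_dir,               # directory holding the STAR index
        "--readFilesIn", *read_files,            # one file (single-read) or two (paired-end)
        "--outSAMtype", "BAM", "SortedByCoordinate",
        "--outFileNamePrefix", out_prefix,
    ]


# Hypothetical paths for a paired-end sample.
cmd = build_star_command(
    "star_index",
    ["sample1_R1.fastq", "sample1_R2.fastq"],
    "sample1_",
)
print(shlex.join(cmd))
```

With STAR installed and a genome index already built, the list could be passed to `subprocess.run(cmd, check=True)` to perform the alignment.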
RNA-Seq Data Analysis
After the alignment stage, you can focus on analyzing your data. Tools like Sailfish, RSEM and BitSeq12 will help you quantify your expression levels, whilst tools like MISO, which quantifies alternatively spliced genes, are available for more specialized analysis13. A large number of these tools are available, and reading reviews and roundups is your best way to find the right tool for your research.
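To give a feel for what quantifying alternative splicing involves, here is a naive Python sketch of "percent spliced in" (PSI) for a single skipped exon, estimated from junction read counts. This is a simple count-based approximation for illustration only, not the statistical model that MISO uses, and the read counts are invented.

```python
def naive_psi(inclusion_upstream: int, inclusion_downstream: int, exclusion: int):
    """Estimate the fraction of transcripts that include an exon.

    Inclusion is supported by two junctions (into and out of the exon),
    so their counts are averaged before comparing with the single
    junction that skips the exon.
    """
    inclusion = (inclusion_upstream + inclusion_downstream) / 2
    total = inclusion + exclusion
    if total == 0:
        return None  # no junction reads, so no estimate is possible
    return inclusion / total


# Hypothetical junction read counts for one exon in one sample.
psi = naive_psi(inclusion_upstream=30, inclusion_downstream=50, exclusion=10)
print(psi)
```

A value near 1 suggests the exon is almost always included, and a value near 0 suggests it is almost always skipped. Dedicated tools add confidence estimates and handle more complex splicing events.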
To sum up, modern-day RNA-seq is well established as the superior option to microarrays and will likely remain the preferred option for the time being.
1. Wang, Z., Gerstein, M., & Snyder, M. (2009). RNA-seq: a revolutionary tool for transcriptomics. Nature Reviews Genetics, 10(1), 57–63. https://doi.org/10.1038/nrg2484
2. Ozsolak, F., & Milos, P. M. (2011). RNA sequencing: advances, challenges and opportunities. Nature Reviews Genetics, 12(2), 87–98. https://doi.org/10.1038/nrg2934
3. Han, Y., Gao, S., Muegge, K., Zhang, W., & Zhou, B. (2015). Advanced Applications of RNA Sequencing and Challenges. Bioinformatics and Biology Insights, 9(Suppl 1), 29–46. https://doi.org/10.4137/BBI.S28991
4. Schuster, S. C. (2008). Next-generation sequencing transforms today’s biology. Nature Methods, 5(1), 16–18. https://doi.org/10.1038/nmeth1156
7. Zhao, S., Zhang, Y., Gordon, W., Quan, J., Xi, H., Du, S., … Zhang, B. (2015). Comparison of stranded and non-stranded RNA-seq transcriptome profiling and investigation of gene overlap. BMC Genomics, 16(1). https://doi.org/10.1186/s12864-015-1876-7
8. Zhao, S., Fung-Leung, W.-P., Bittner, A., Ngo, K., & Liu, X. (2014). Comparison of RNA-seq and Microarray in Transcriptome Profiling of Activated T Cells. PLOS ONE, 9(1), e78644. https://doi.org/10.1371/journal.pone.0078644
12. Conesa, A., Madrigal, P., Tarazona, S., Gomez-Cabrero, D., Cervera, A., McPherson, A., … Mortazavi, A. (2016). A survey of best practices for RNA-seq data analysis. Genome Biology, 17. https://doi.org/10.1186/s13059-016-0881-8
13. Katz, Y., Wang, E. T., Airoldi, E. M., & Burge, C. B. (2010). Analysis and design of RNA sequencing experiments for identifying isoform regulation. Nature Methods, 7(12), 1009–1015. https://doi.org/10.1038/nmeth.1528