RNA-Seq: Basics, Applications and Protocol
RNA-seq lets us investigate and discover the transcriptome, the total cellular content of RNAs including mRNA, rRNA and tRNA.
What is RNA-seq?
RNA-seq (RNA-sequencing) is a technique that can examine the quantity and sequences of RNA in a sample using next-generation sequencing (NGS). It analyzes the transcriptome, indicating which of the genes encoded in our DNA are turned on or off and to what extent. Here, we look at why RNA-seq is useful, how the technique works and the basic protocol that is commonly used today.1
Contents
What are the applications of RNA-seq?
How does RNA-seq work?
RNA-seq vs microarrays: Why RNA-seq is considered superior
An RNA-seq protocol
- Experiment planning
- cDNA library preparation
- cDNA sequencing
- RNA-seq data analysis
Challenges of RNA-seq
What are the applications of RNA-seq?
RNA-seq lets us investigate and discover the transcriptome, the total cellular content of RNAs including mRNA, rRNA and tRNA. Understanding the transcriptome is key if we are to connect the information in our genome with its functional protein expression. RNA-seq can tell us which genes are turned on in a cell, what their level of transcription is, and at what times they are activated or shut off.2 This allows scientists to understand the biology of a cell more deeply and assess changes that may indicate disease. Some of the most popular techniques that use RNA-seq are transcriptional profiling, single nucleotide polymorphism (SNP) identification,3 RNA editing and differential gene expression analysis.4
This can give researchers vital information about the function of genes. For example, the transcriptome can highlight all the tissues in which a gene of unknown function is turned on, which might indicate what its role is. It also captures information about alternative splicing events (Figure 1), which produce different transcripts from one single gene sequence. These events would not be picked up by DNA sequencing. It can also identify post-transcriptional modifications that occur during mRNA processing such as polyadenylation and 5’ capping.2
Figure 1: RNA-seq uses short reads derived from mRNA, which is free of intronic non-coding sequence. These reads must then be aligned back to the reference genome. Credit: Technology Networks.
How does RNA-seq work?
Early RNA-seq techniques used Sanger sequencing technology, which, although innovative at the time, was low-throughput and costly. Only recently, with the advent and proliferation of NGS technology, have we been able to take full advantage of RNA-seq’s potential.5
An RNA-seq workflow has several steps, which can be broadly summarized as:
- RNA extraction
- Reverse transcription into cDNA
- Adapter ligation
- Amplification
- Sequencing
Once you have obtained your RNA sample for analysis, the first step is to convert the population of RNA to be sequenced into complementary DNA (cDNA) fragments (a cDNA library). This is done by reverse transcription and allows the RNA to be put into an NGS workflow. The cDNA is then fragmented, and adapters are added to each end of the fragments. These adapters contain functional elements that permit sequencing, for example, the amplification element (which facilitates clonal amplification of the fragments) and the primary sequencing priming site. Following amplification, size selection, clean-up and quality checking, the cDNA library is analyzed by NGS, producing short sequences that each correspond to all or part of the fragment from which they were derived.
The depth to which the library is sequenced varies depending on the purpose for which the output data will be used. Sequencing may follow either single-end or paired-end methods. Single-end sequencing is a cheaper and faster technique (for reference, about 1% of the cost of Sanger sequencing) that sequences the cDNA fragments from just one end. Paired-end methods sequence from both ends and are therefore more expensive6,7 but offer advantages in post-sequencing data reconstruction.
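The difference between single-end and paired-end reads can be sketched with a toy example. This is only an illustration under simplifying assumptions: the fragment sequence, the read length and the function names are hypothetical, and real reads come from a sequencing instrument rather than from string slicing.

```python
# Toy illustration only: slices a made-up cDNA fragment to mimic reads.
COMPLEMENT = str.maketrans("ACGT", "TGCA")


def reverse_complement(seq: str) -> str:
    """Return the reverse complement of a DNA sequence (A, C, G, T only)."""
    return seq.translate(COMPLEMENT)[::-1]


def single_end_read(fragment: str, read_length: int) -> str:
    """Read `read_length` bases from one end of the fragment."""
    return fragment[:read_length]


def paired_end_reads(fragment: str, read_length: int) -> tuple:
    """Read from both ends; the second read is taken from the opposite strand."""
    read1 = fragment[:read_length]
    read2 = reverse_complement(fragment)[:read_length]
    return read1, read2


fragment = "ATGGCGTACGTTAGCCTAGGATCCGATTACA"  # hypothetical cDNA fragment
print(single_end_read(fragment, 8))    # one read from one end
print(paired_end_reads(fragment, 8))   # a read from each end
```

Because the two reads of a pair are known to come from opposite ends of the same fragment, they constrain where the fragment can be placed, which is one reason paired-end data helps with reconstruction after sequencing.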
A further choice must be made between strand-specific and non-strand-specific protocols. Strand-specific protocols retain the information about which DNA strand was transcribed. The value of this extra information makes them the favorable option.
The sequencing reads, of which there will be many millions by the end of the workflow, can then be aligned to a reference genome if available or assembled de novo to produce an RNA sequence map that spans the transcriptome.8
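As a rough sketch of what "aligning reads to a reference" means, the toy code below indexes a reference by short subsequences (k-mers) and looks up where each read matches exactly. It is an assumption-laden illustration, not how production aligners work: real tools tolerate mismatches and must handle reads that span splice junctions. The reference string, the reads and the function names are all hypothetical.

```python
from collections import defaultdict


def build_kmer_index(reference: str, k: int) -> dict:
    """Map every k-mer in the reference to the positions where it starts."""
    index = defaultdict(list)
    for pos in range(len(reference) - k + 1):
        index[reference[pos:pos + k]].append(pos)
    return index


def map_read(read: str, reference: str, index: dict, k: int) -> list:
    """Return 0-based reference positions where the read matches exactly."""
    hits = []
    for pos in index.get(read[:k], []):  # seed with the read's first k-mer
        if reference[pos:pos + len(read)] == read:  # verify the full read
            hits.append(pos)
    return hits


reference = "GATTACAGGCTTACGATCGGATTACCA"  # hypothetical reference sequence
index = build_kmer_index(reference, k=4)

for read in ["CTTACGAT", "GATTAC", "AAAA"]:
    print(read, map_read(read, reference, index, k=4))
```

The second read maps to two places, a small-scale version of the multi-mapping problem that repeated sequences cause in real data.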
RNA-seq vs microarrays: Why RNA-seq is considered superior
RNA-seq is widely regarded as superior to other technologies, such as microarray hybridization. There are several reasons for RNA-seq’s well-regarded status:
Not limited to known genomic sequences – unlike hybridization-based approaches, which may require species-specific probes, RNA-seq can detect transcripts from organisms with previously undetermined genomic sequences. This makes it fundamentally superior for the detection of novel transcripts, SNPs or other alterations.9,10
Low background signal – the cDNA sequences used in RNA-seq can be mapped to targeted regions on the genome, which makes it easy to remove experimental noise. Furthermore, cross-hybridization and sub-standard hybridization, which can plague microarray experiments, do not arise in RNA-seq experiments.
More quantifiable – microarray data is only ever displayed as values relative to other signals detected on the array, whilst RNA-seq data, being based on counts of sequenced reads, is directly quantifiable. RNA-seq also avoids the issues microarrays have in detecting very high or very low transcription levels.
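To make the idea of quantifying expression from read counts concrete, here is a minimal sketch of one widely used normalization, transcripts per million (TPM). The article does not name a specific method, so the choice of TPM is an assumption for illustration, and the gene names, counts and lengths are invented.

```python
def tpm(counts: dict, lengths_bp: dict) -> dict:
    """Convert raw read counts to transcripts per million (TPM).

    Each count is first divided by transcript length in kilobases, so that
    long transcripts are not favored, then scaled so the values sum to 1e6.
    """
    rates = {gene: counts[gene] / (lengths_bp[gene] / 1000) for gene in counts}
    total = sum(rates.values())
    if total == 0:
        return {gene: 0.0 for gene in counts}
    return {gene: rate / total * 1e6 for gene, rate in rates.items()}


# Hypothetical read counts and transcript lengths (in base pairs)
counts = {"geneA": 100, "geneB": 300, "geneC": 600}
lengths = {"geneA": 1000, "geneB": 3000, "geneC": 2000}

for gene, value in tpm(counts, lengths).items():
    print(f"{gene}: {value:.0f} TPM")
```

In this example geneB has three times as many reads as geneA but is also three times longer, so the two end up with the same TPM value.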
Figure 2: A workflow for RNA-seq. Credit: Technology Networks.
An RNA-seq protocol
Experiment planning
Preparation prior to starting your RNA-seq experiment is essential. Questions to answer before starting include:11
- What method of RNA purification are you using?
- What read depth will you need?
- Which platform will you use?
- Is there a reference genome available and which will you use?
- How are you assessing the quality of your RNA?
- Do you need to enrich your target RNA?
- Will you barcode your RNA?
- Do you have enough biological and technical replicates?
- Single-end or paired-end sequencing?
- What read length will you use?
- Do you want to retain strand-specific information?
cDNA library preparation
cDNA sequencing
RNA-seq data analysis
To sum up, modern-day RNA-seq is well established as the superior option to microarrays and will likely remain the preferred option for the time being.
Challenges of RNA-seq
Significant progress has been made in the field of RNA-seq over the last decade or so. The associated costs have reduced significantly while throughput has increased, sequence fidelity is far superior to earlier iterations of the NGS technologies and the availability of data analysis tools and pipelines has improved tremendously. However, there remain a number of challenges for scientists to bear in mind when considering RNA-seq experiments. These include:
Isolating sufficient, high-quality RNA – while the sample quantity requirements for RNA-seq analysis have reduced drastically, it is still important to ensure you are able to obtain sufficient RNA to fulfill all your analysis requirements, including repeats if necessary. It is also important to bear in mind that, while you may isolate total RNA, depending upon your experimental question, you are likely only to be sequencing a fraction of this (typically messenger RNA (mRNA)), further reducing your sample quantity. The RNA must also be of high quality and purity, as poor samples are likely to lead to poor results or, in some cases, failure within the library preparation protocol. The quality and concentration of RNA can be determined using UV-visible spectroscopy. Unlike DNA, RNA degrades rapidly, so it is important to treat samples with care at all stages of isolation and purification. Degradation may not be uniform, hindering the comparison of transcription levels between genes. Low-level transcripts may be lost from the sequenced population altogether.
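The UV-visible measurement mentioned above is usually read as two numbers: a concentration estimated from absorbance at 260 nm and an A260/A280 purity ratio. The sketch below assumes the conventional conversion of roughly 40 µg/mL per absorbance unit for single-stranded RNA with a 1 cm path length, and the common rule of thumb that a ratio near 2.0 indicates RNA with little protein contamination. The readings are invented, and absorbance alone does not show whether the RNA is intact.

```python
# Conventional factor for single-stranded RNA at 260 nm, 1 cm path length
# (an assumption of this sketch; check your instrument's documentation).
RNA_UG_PER_ML_PER_A260 = 40.0


def rna_concentration(a260: float, dilution_factor: float = 1.0) -> float:
    """Estimate RNA concentration in µg/mL from absorbance at 260 nm."""
    return a260 * RNA_UG_PER_ML_PER_A260 * dilution_factor


def purity_ratio(a260: float, a280: float) -> float:
    """Return the A260/A280 ratio used as a rough purity indicator."""
    return a260 / a280


# Hypothetical readings from a sample diluted 1:10
a260, a280 = 0.5, 0.25
print(f"Concentration: {rna_concentration(a260, dilution_factor=10):.0f} µg/mL")
print(f"A260/A280: {purity_ratio(a260, a280):.2f}")
```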
The impact of sample pooling – pooling samples prior to library preparation (without the use of barcoding) can reduce sequencing effort and costs or enable sequencing in cases where sample quantities are very limited. However, it is important to account for this during data analysis, with one such pool considered to be one biological replicate, not however many samples went into making up the pool. Variations between the pooled samples can lead to misleading results and statistical issues, so possible implications should be considered during the experimental design process.
Trading off sequencing depth against sample number – it may seem appealing to get as many samples done in a single sequencing run as possible to reduce costs and machine time. However, this comes at a cost. The more samples that are multiplexed, the fewer reads will be obtained for each of those samples. With reducing read depth comes mounting uncertainty as to the reliability of the sequences obtained. Sequencing technologies are still far from perfect, and mistakes are made in reads. It is therefore important to find the sweet spot between obtaining sufficient read depth to give confidence in the quality and fidelity of the sequencing data obtained and maximizing sequencing capacity to ensure sufficient biological replicates can be analyzed to give meaningful data.
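The arithmetic behind this trade-off is simple enough to sketch. The figures below (a run yielding 400 million reads, a target of 20 million reads per sample) are hypothetical placeholders rather than recommendations, and the calculation assumes reads are shared evenly between samples, which real multiplexed runs only approximate.

```python
def reads_per_sample(total_reads: int, n_samples: int) -> int:
    """Reads each sample receives if the run is shared evenly."""
    return total_reads // n_samples


def max_samples(total_reads: int, min_reads_per_sample: int) -> int:
    """Largest number of samples that still meets the per-sample depth target."""
    return total_reads // min_reads_per_sample


run_output = 400_000_000   # hypothetical reads produced by one run
depth_target = 20_000_000  # hypothetical minimum reads wanted per sample

print(max_samples(run_output, depth_target))  # samples that fit at the target
for n in (8, 12, 24, 48):
    depth = reads_per_sample(run_output, n)
    status = "meets target" if depth >= depth_target else "below target"
    print(f"{n} samples: {depth:,} reads each ({status})")
```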