RNA-Seq: Basics, Applications and Protocol

RNA-seq lets us investigate and discover the transcriptome, the total cellular content of RNAs including mRNA, rRNA and tRNA.


Published: April 6, 2018 | Last Updated: January 24, 2024

Ruairi J Mackenzie

A printer outputting RNA bases from an RNA sequence.

Credit: Technology Networks


Read time: 9 minutes

What is RNA-seq?

RNA-seq (RNA-sequencing) is a technique that can examine the quantity and sequences of RNA in a sample using next-generation sequencing (NGS). It analyzes the transcriptome, indicating which of the genes encoded in our DNA are turned on or off and to what extent. Here, we look at why RNA-seq is useful, how the technique works and the basic protocol that is commonly used today.¹

Contents

What are the applications of RNA-seq?
How does RNA-seq work?
RNA-seq vs microarrays: Why RNA-seq is considered superior
An RNA-seq protocol
- Experiment planning
- cDNA library preparation
- cDNA sequencing
- RNA-seq data analysis
Challenges of RNA-seq

References

What are the applications of RNA-seq?

RNA-seq lets us investigate and discover the transcriptome, the total cellular content of RNAs including mRNA, rRNA and tRNA. Understanding the transcriptome is key if we are to connect the information in our genome with its functional protein expression. RNA-seq can tell us which genes are turned on in a cell, what their level of transcription is, and at what times they are activated or shut off.² This allows scientists to understand the biology of a cell more deeply and assess changes that may indicate disease. Some of the most popular techniques that use RNA-seq are transcriptional profiling, single nucleotide polymorphism (SNP) identification,³ RNA editing and differential gene expression analysis.⁴

This can give researchers vital information about the function of genes. For example, the transcriptome can highlight all the tissues in which a gene of unknown function is turned on, which might indicate what its role is. It also captures information about alternative splicing events (Figure 1), which produce different transcripts from one single gene sequence. These events would not be picked up by DNA sequencing. It can also identify post-transcriptional modifications that occur during mRNA processing such as polyadenylation and 5’ capping.²

An image explains how RNA short reads are split by intron when aligning to a reference genome.

Figure 1: RNA-seq data consist of short reads derived from mRNA, which is free of intronic non-coding sequence. These reads must then be aligned back to the reference genome. Credit: Technology Networks.
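The alignment problem described in Figure 1 can be illustrated with a toy sketch. The sequences, the function name `split_align` and the brute-force search are all assumptions made for illustration; real spliced aligners use indexed, far more efficient methods. The sketch only shows why a read that spans an exon-exon junction cannot be matched to the genome in one piece, but can be matched in two.

```python
def split_align(read, genome, min_anchor=3):
    """Find a two-piece placement of `read` on `genome`.

    Returns (left_start, split, right_start), or None if no placement is found.
    Toy approach: try each split point, place the left piece at its first
    occurrence, then look for the right piece further downstream.
    """
    for split in range(min_anchor, len(read) - min_anchor + 1):
        left, right = read[:split], read[split:]
        left_start = genome.find(left)
        if left_start == -1:
            continue
        # Require at least one skipped base between the two pieces.
        right_start = genome.find(right, left_start + len(left) + 1)
        if right_start != -1:
            return left_start, split, right_start
    return None


# Hypothetical toy gene: two exons separated by one intron.
exon1 = "ATGGCCTTCA"
intron = "GTAAGTCCCCCCTTTTAG"
exon2 = "CGATTGACCA"
genome = exon1 + intron + exon2
mrna = exon1 + exon2          # the intron is removed in the mature transcript

read = mrna[6:14]             # a short read spanning the exon-exon junction
contiguous_hit = genome.find(read)      # -1: no unspliced match exists
spliced_hit = split_align(read, genome)
```

Here the gap between the two aligned pieces (`right_start - (left_start + split)`) equals the intron length, which is the kind of signal that reveals splice junctions.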

How does RNA-seq work?

Early RNA-seq techniques used Sanger sequencing technology, a technique that, although innovative at the time, was also low-throughput and costly. Only recently, with the advent and proliferation of NGS technology, have we been able to take full advantage of RNA-seq’s potential.⁵

An RNA-seq workflow has several steps, which can be broadly summarized as:

RNA extraction
Reverse transcription into cDNA
Adapter ligation
Amplification
Sequencing

Once you have obtained your RNA sample for analysis, the first step in the technique involves converting the population of RNA to be sequenced into complementary DNA (cDNA) fragments (a cDNA library). This is done by reverse transcription and allows the RNA to be put into an NGS workflow. The cDNA is then fragmented, and adapters are added to each end of the fragments. These adapters contain functional elements that permit sequencing, for example, the amplification element (which facilitates clonal amplification of the fragments) and the primary sequencing priming site. Following amplification, size selection, clean-up and quality checking, the cDNA library is analyzed by NGS, producing short sequences that correspond to all or part of the fragments from which they were derived. The depth to which the library is sequenced varies depending on the purpose for which the output data will be used. Sequencing may follow either single-end or paired-end methods. Single-end sequencing is a cheaper and faster technique (for reference, about 1% of the cost of Sanger sequencing) that sequences the cDNA fragments from just one end. Paired-end methods sequence from both ends and are therefore more expensive⁶,⁷ but offer advantages in post-sequencing data reconstruction.
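The difference between single-end and paired-end reads can be sketched in a few lines. This is a simplified illustration, not a model of any particular platform: the fragment sequence, read length and function names are hypothetical, and adapters, errors and quality scores are ignored. It assumes the second read of a pair is taken from the opposite strand, so it appears as the reverse complement of the far end of the fragment.

```python
COMPLEMENT = str.maketrans("ACGT", "TGCA")


def reverse_complement(seq):
    """Return the reverse complement of a DNA sequence."""
    return seq.translate(COMPLEMENT)[::-1]


def single_end_read(fragment, read_length):
    """Sequence the fragment from one end only."""
    return fragment[:read_length]


def paired_end_reads(fragment, read_length):
    """Sequence the fragment from both ends.

    Read 1 comes from the start of the fragment; read 2 comes from the
    other end, read off the opposite strand.
    """
    read1 = fragment[:read_length]
    read2 = reverse_complement(fragment)[:read_length]
    return read1, read2


fragment = "ATGGCCTTCACGATTGACCA"   # hypothetical 20-base cDNA fragment
r1, r2 = paired_end_reads(fragment, 6)
# r1 covers the first 6 bases; r2 covers the last 6, reverse-complemented.
```

Because both ends of the same fragment are known, the pair constrains where the fragment can sit on the reference, which is one reason paired-end data help with downstream reconstruction.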

A further choice must be made between strand-specific and non-strand-specific protocols. The former method means the information about which DNA strand was transcribed is retained. The value of the extra information obtained from strand-specific protocols makes them the favorable option.

These reads, of which there will be many millions by the end of the workflow, can then be aligned to a reference genome if available or assembled de novo to produce an RNA sequence map that spans the transcriptome.⁸

RNA-seq vs microarrays: Why RNA-seq is considered superior

RNA-seq is widely regarded as superior to other technologies, such as microarray hybridization. There are several reasons for RNA-seq’s well-regarded status:

Not limited to genomic sequences – unlike hybridization-based approaches, which may require species-specific probes, RNA-seq can detect transcripts from organisms with previously undetermined genomic sequences. This makes it fundamentally superior for the detection of novel transcripts, SNPs or other alterations.⁹,¹⁰

Low background signal – the cDNA sequences used in RNA-seq can be mapped to targeted regions on the genome, which makes it easy to remove experimental noise. Furthermore, issues with cross-hybridization or sub-standard hybridization, which can plague microarray experiments, are not an issue in RNA-seq experiments.

More quantifiable – microarray data is only ever displayed as values relative to other signals detected on the array, whilst RNA-seq data is quantifiable. RNA-seq also avoids the issues microarrays have in detecting very high or very low transcription levels.

Figure 2: A workflow for RNA-seq. Credit: Technology Networks.

An RNA-seq protocol

Experiment planning

Preparation prior to starting your RNA-seq experiment is essential. Questions to answer before starting include:¹¹

What method of RNA purification are you using?
What read depth will you need?
Which platform will you use?
Is there a reference genome available and which will you use?
How are you assessing the quality of your RNA?
Do you need to enrich your target RNA?
Will you barcode your RNA?
Do you have enough biological and technical replicates?
Single-end or paired-end sequencing?
What read length will you use?
Do you want to retain strand-specific information?

cDNA library preparation

After these points have been considered, you can start preparing your cDNA library. This will require fragmentation of the cDNA, addition of the platform-specific “adapter sequences” and amplification of the cDNA, but the exact procedure will be very specific to the platform used at this stage. For strand-specific protocols, the amplification of the cDNA involves a reverse transcriptase-mediated first strand synthesis followed by a DNA polymerase-mediated second strand synthesis.¹¹,¹² Barcodes may also be added that enable multiplexing, so numerous samples can be sequenced in a single run. It can be beneficial to quantify your library at the end of the library preparation stage to ensure the protocol has been successful and check the quality and concentration of your library to enable optimal sequencing performance.
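To show how barcodes make multiplexing possible, here is a minimal demultiplexing sketch. It assumes, purely for simplicity, that each read begins with an inline barcode of fixed length and that barcodes must match exactly; many platforms instead read the barcode (index) separately, and real tools usually tolerate mismatches. The barcode sequences, sample names and the `demultiplex` function are hypothetical.

```python
from collections import defaultdict


def demultiplex(reads, barcode_to_sample, barcode_length):
    """Group reads by sample using the barcode at the start of each read.

    Reads whose barcode is not recognised go to an 'undetermined' bin.
    The barcode itself is trimmed off before the read is stored.
    """
    bins = defaultdict(list)
    for read in reads:
        barcode, insert = read[:barcode_length], read[barcode_length:]
        sample = barcode_to_sample.get(barcode, "undetermined")
        bins[sample].append(insert)
    return dict(bins)


# Hypothetical barcodes and pooled reads from a single run.
barcodes = {"ACGT": "control", "TGCA": "treated"}
pooled_reads = ["ACGTGGGCCC", "TGCAAAATTT", "ACGTTTTAAA", "NNNNCCCGGG"]
by_sample = demultiplex(pooled_reads, barcodes, barcode_length=4)
```

After this step each sample's reads can be processed independently, even though they were sequenced together.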

cDNA sequencing

Once the library is prepared, you can use your chosen sequencing platform to sequence your cDNA library to your desired depth and requirements. Once your transcript data has been produced, you can map the data to your reference genome or assemble it de novo if no reference is available. The alignment process can be complicated by the presence of splice variants and modifications, and the choice of reference genome will also affect how difficult this stage is. Software packages such as STAR are useful at this stage, as are quality control tools like Picard or Qualimap.¹³ De novo assembly will allow for the discovery of novel transcripts in addition to those already known.

RNA-seq data analysis

After the alignment stage, you can focus on analyzing your data. Tools like Sailfish, RSEM and BitSeq¹³ will help you quantify your transcription levels, whilst tools like MISO, which quantifies alternatively spliced genes, are available for more specialized analysis.¹⁴ There is a library of these tools out there, and reading reviews and roundups is your best way to find the right tool for your research.
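As a rough idea of what "quantifying transcription levels" means, the sketch below computes transcripts per million (TPM), a commonly used normalized unit, from raw read counts. This is an assumption chosen for illustration rather than a description of how the tools named above work internally; they rely on more sophisticated statistical models. The gene names, counts and lengths are made up.

```python
def tpm(counts, lengths_kb):
    """Convert raw read counts to transcripts per million (TPM).

    Step 1: divide each count by the transcript length (in kilobases),
            so long transcripts are not favoured just for being long.
    Step 2: rescale so the values across all genes sum to one million.
    """
    rates = {gene: counts[gene] / lengths_kb[gene] for gene in counts}
    total = sum(rates.values())
    return {gene: rate / total * 1_000_000 for gene, rate in rates.items()}


# Hypothetical read counts and transcript lengths.
counts = {"geneA": 100, "geneB": 100, "geneC": 300}
lengths_kb = {"geneA": 1.0, "geneB": 2.0, "geneC": 3.0}
example_tpm = tpm(counts, lengths_kb)
```

In this example geneA and geneB have the same raw count, but geneB is twice as long, so its length-normalized abundance is half that of geneA.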

To sum up, modern-day RNA-seq is well established as the superior option to microarrays and will likely remain the preferred option for the time being.

Challenges of RNA-seq

Significant progress has been made in the field of RNA-seq over the last decade or so. The associated costs have fallen significantly while throughput has increased, sequence fidelity is far superior to that of earlier iterations of NGS technologies, and the availability of data analysis tools and pipelines has improved tremendously. However, there remain a number of challenges for scientists to bear in mind when considering RNA-seq experiments. These include:

Isolating sufficient, high-quality RNA – while the sample quantity requirements for RNA-seq analysis have reduced drastically, it is still important to ensure you are able to obtain sufficient RNA to fulfill all your analysis requirements, including repeats if necessary. It is also important to bear in mind that, while you may isolate total RNA, depending upon your experimental question, you are likely only to be sequencing a fraction of this (typically messenger RNA (mRNA)), further reducing your sample quantity. This must also be of high quality and purity, as poor samples are likely to lead to poor results, or in some cases failure within the library preparation protocol. The quality and concentration of RNA can be determined using UV-visible spectroscopy. Unlike DNA, RNA degrades rapidly, so it is important to treat samples with care at all stages of isolation and purification. Degradation may not be uniform, hindering the comparison of transcription levels between genes. Low-level transcripts may be lost from the sequenced population altogether.

The impact of sample pooling – pooling samples prior to library preparation (without the use of barcoding) can reduce sequencing effort and costs or enable sequencing in cases where sample quantities are very limited. However, it is important to account for this during data analysis, with one such pool considered to be one biological replicate, not however many samples went into making up the pool. Variations between the pooled samples can lead to misleading results and statistical issues, so possible implications should be considered during the experimental design process.

Trading off sequencing depth against sample number – it may seem appealing to get as many samples done in a single sequencing run as possible to reduce costs and machine time. However, this comes at a cost. The more samples are multiplexed, the fewer reads will be obtained for each of those samples. With reducing read depth comes mounting uncertainty as to the reliability of the sequences obtained. Sequencing technologies are still far from perfect, and mistakes are made in reads. It is therefore important to find the sweet spot between obtaining sufficient read depth to give confidence in the quality and fidelity of the sequencing data obtained and maximizing sequencing capacity to ensure sufficient biological replicates can be analyzed to give meaningful data.
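The depth-versus-sample-number trade-off is simple arithmetic, sketched below. The run capacity and target depth are hypothetical numbers chosen for illustration, not recommendations, and the calculation assumes reads are shared evenly between samples, which real runs only approximate.

```python
def reads_per_sample(total_reads, n_samples):
    """Reads available to each sample if the run is shared evenly."""
    return total_reads // n_samples


def max_samples(total_reads, min_reads_per_sample):
    """Largest number of samples that still meets the target depth."""
    return total_reads // min_reads_per_sample


run_capacity = 400_000_000      # hypothetical reads produced by one run
target_depth = 30_000_000       # hypothetical minimum reads per sample

depth_with_8 = reads_per_sample(run_capacity, 8)     # 50 million each
depth_with_16 = reads_per_sample(run_capacity, 16)   # 25 million each
sample_limit = max_samples(run_capacity, target_depth)
```

With these made-up figures, doubling the number of multiplexed samples from 8 to 16 halves the depth per sample and drops it below the chosen target, so the run could hold at most 13 samples at that depth.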

1. Wang Z, Gerstein M, & Snyder M. RNA-seq: a revolutionary tool for transcriptomics. Nat. Rev. Genet, 2009;10(1), 57–63. doi:10.1038/nrg2484

2. Ozsolak F, & Milos PM. RNA sequencing: advances, challenges and opportunities. Nat. Rev. Genet, 2011;12(2), 87–98. doi:10.1038/nrg2934

3. Bakhtiarizadeh MR, Alamouti AA. RNA-Seq based genetic variant discovery provides new insights into controlling fat deposition in the tail of sheep. Sci. Rep., 2020;10, 13525. doi:10.1038/s41598-020-70527-8

4. Han Y, Gao S, Muegge K, Zhang W, & Zhou B. Advanced applications of RNA sequencing and challenges. Bioinform. Biol. Insights, 2015;9(Suppl 1), 29–46. doi:10.4137/BBI.S28991

5. Schuster SC. Next-generation sequencing transforms today’s biology. Nat. Methods, 2008;5(1), 16–18. doi:10.1038/nmeth1156

6. JP Sulzberger Columbia Genome Center. Genome sequencing: Defining your experiment. Columbia Systems Biology. https://systemsbiology.columbia.edu/genome-sequencing-defining-your-experiment. Accessed August 24, 2021.

7. Functional genomics II. EMBL-EBI. https://www.ebi.ac.uk/training/online/courses/functional-genomics-ii-common-technologies-and-data-analysis-methods/rna-sequencing/performing-a-rna-seq-experiment/design-considerations/. Accessed September 6, 2021.

8. Zhao S, Zhang Y, Gordon W et al. Comparison of stranded and non-stranded RNA-seq transcriptome profiling and investigation of gene overlap. BMC Genomics, 2015;16(1). doi:10.1186/s12864-015-1876-7

9. Zhao S, Fung-Leung W-P, Bittner A, Ngo K, & Liu X. Comparison of RNA-seq and microarray in transcriptome profiling of activated T cells. PLOS ONE, 2014;9(1), e78644. doi:10.1371/journal.pone.0078644

10. Rao MS, Van Vleet TR, Ciurlionis R, et al. Comparison of RNA-seq and microarray gene expression platforms for the toxicogenomic evaluation of liver from short-term rat toxicity studies. Front. Genet. 2019;9:636. doi:10.3389/fgene.2018.00636

11. Kukurba KR, Montgomery SB. RNA sequencing and analysis. Cold Spring Harb Protoc. 2015;2015(11):951-969. doi:10.1101/pdb.top084970

12. The Cresko Lab of the University of Oregon. RNA-seqlopedia. University of Oregon. https://rnaseq.uoregon.edu/#library-prep-stranded-libraries. Accessed August 24, 2021.

13. Conesa A, Madrigal P, Tarazona S, et al. A survey of best practices for RNA-seq data analysis. Genome Biol., 2016;17. doi:10.1186/s13059-016-0881-8

14. Katz Y, Wang ET, Airoldi EM, & Burge CB. Analysis and design of RNA sequencing experiments for identifying isoform regulation. Nat. Methods, 2010;7(12), 1009–1015. doi:10.1038/nmeth.1528


What are the basic principles behind RNA-Seq technology?
RNA-Seq (RNA sequencing) is a technology that uses next-generation sequencing (NGS) to reveal the presence and quantity of RNA in a biological sample at a given moment. It sequences the RNA molecules, then maps them back to the genome, providing a detailed picture of gene expression, alternatively spliced transcripts, mutations and fusions.

How does RNA-Seq contribute to our understanding of gene expression patterns?
RNA-Seq allows researchers to measure levels of gene expression at an unprecedented resolution. By sequencing RNA, it provides real-time snapshots of cellular activity, including which genes are being expressed and at what levels. This can give insights into cellular responses to various stimuli or conditions and can uncover new mutations or alternative splicing events.

What are some practical applications of RNA-Seq in biomedical research and clinical practice?
RNA-Seq has numerous applications in biomedical research and clinical practice. It's used in developmental biology, cancer genomics, infectious diseases and neuroscience, among other areas. In clinical practice, it can aid in diagnosing diseases, understanding disease mechanisms and personalizing treatments based on a patient's gene expression profile.

How does the RNA-Seq protocol work and what are the key steps involved?
The RNA-Seq protocol involves several steps. First, RNA is extracted from the sample and then converted into complementary DNA (cDNA) using reverse transcription. The cDNA is fragmented and adapters are added for sequencing. These fragments are then sequenced using next-generation sequencing. The resulting sequences, or 'reads', are aligned to a reference genome, and gene expression levels are quantified.

What are some challenges and solutions in RNA-Seq data analysis?
Data analysis in RNA-Seq can be challenging due to the vast amount of data produced. It involves read mapping, gene expression quantification, differential expression analysis and interpretation. Solutions include developing robust computational algorithms, bioinformatics tools and statistical methods. Increasingly, machine learning and AI are being used to analyze and interpret RNA-Seq data.

Meet the Author

Ruairi J Mackenzie

RJ is a freelance science writer based in Glasgow. He covers biological and biomedical science, with a focus on the complexities and curiosities of the brain and emerging AI technologies. RJ was a science writer at Technology Networks for six years. RJ has a Master’s degree in Clinical Neurosciences from the University of Cambridge.
