Carl Robinson holds a PhD in microbiology from the University of Sheffield. He is currently a senior research associate at the University of Cambridge School of Veterinary Medicine.
Getting clean, accurate transcriptomics data isn’t just about sequencing power; it’s about getting every step right, from study design and sample prep to data analysis. Even small mistakes upstream can wreck outcomes, wasting time and resources.
With new sequencing technology and bioinformatics tools expanding fast, knowing how to control variables at each stage matters more than ever. This guide breaks down essential practices to help boost reproducibility, accuracy and overall data quality.
Download this guide to explore:
Critical decisions in experimental design that impact downstream results
How to reduce variability during RNA extraction, library prep and storage
Smart data analysis strategies to get more from each sequencing run
Top Tips for Obtaining Meaningful
Transcriptomics Data
Carl Robinson, PhD
Transcriptomics enables researchers to assess which genes are active under specific
conditions. It gives insight into how cells respond and adapt to changes in their environment,
for example, in response to infectious disease, pharmaceutical interventions or during development.
The first attempt at studying a partial transcriptome, in 1991, detailed 609 transcripts from the human
brain. Since then, rapid advances in next-generation sequencing (NGS) technologies
have enabled researchers to capture massively more data
at relatively little cost.1 RNA sequencing (RNA-seq) is now a very widely used technique for studying a
variety of transcriptomes.
Obtaining meaningful transcriptomics data from any sample, however,
depends on a number of key steps in the process, from experimental design, sample preparation
and sequencing to final data analysis. Failure to optimize each step will result in poor data that, at best,
will be a waste of time and resources. At worst, this failure could generate misleading results and conclusions
that derail follow-on or related studies.
This guide will provide some top tips on how to produce valid transcriptomics data by optimizing each
step of the process.
Experimental design – what do you want to know?
The first step in the process, and probably the most important, is to clearly define the hypothesis or biological
question you are trying to answer. It is imperative that you know what you are trying to find out.
For example, are you looking at disease vs non-disease, the effects of drug treatment, differences in biological
phenotype, pathogen–host interactions, temporal and spatial dynamics within tissues, etc.?
Once you know the question, there are a number of key factors to consider when designing your experiment.
You must include at least three biological replicates to obtain statistical power. Biological
replicates are independent samples representing the natural variability within biological samples, as
opposed to technical replicates, which represent repeated technical applications performed on the same
biological sample.
A number of programs are available to help improve your study design by
optimizing parameters such as sample size and sequencing read depth for different applications, for example,
RNASeqPower, scPower and PROPER. If you are in any doubt, it is always worth getting a statistician or
bioinformatician involved in this process to ensure you have sufficient power in the experimental design.
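To make the trade-off between replicates, read depth and effect size concrete, here is a minimal Python sketch of a normal-approximation sample-size formula of the kind that power tools for RNA-seq build on. The formula, n = 2(z_alpha + z_power)^2 (1/depth + cv^2) / ln(effect)^2, is an assumption of this sketch. The function name and all parameter values are illustrative and are not taken from any of the programs named above. Treat the result as a rough guide only and use a dedicated tool or a statistician for real designs.

```python
import math
from statistics import NormalDist


def samples_per_group(depth, cv, effect, alpha=0.05, power=0.9):
    """Approximate number of biological replicates needed per group.

    depth  : average read count for the gene of interest (assumed value)
    cv     : biological coefficient of variation within a group (assumed value)
    effect : fold change you want to detect, e.g. 2 for a doubling
    """
    if depth <= 0 or effect <= 0 or effect == 1:
        raise ValueError("depth must be > 0 and effect a positive fold change other than 1")
    z_alpha = NormalDist().inv_cdf(1 - alpha / 2)  # two-sided significance level
    z_power = NormalDist().inv_cdf(power)          # desired statistical power
    variance = 1 / depth + cv ** 2                 # counting noise + biological noise
    return 2 * (z_alpha + z_power) ** 2 * variance / math.log(effect) ** 2


# Hypothetical scenario: 20 reads per gene, CV of 0.4, detect a 2-fold change
replicates = math.ceil(samples_per_group(depth=20, cv=0.4, effect=2))
```

Under these assumed inputs the estimate rounds up to 10 replicates per group. Smaller fold changes or noisier samples push the number up quickly, which is why the minimum of three replicates should be read as a floor rather than a target.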
How To Guide
Sample collection and preparation – quality from the start
It sounds simple, but firstly, you must use samples that are appropriate and not just convenient. For
example, choose the correct body tissue or fluid: if you need a urine sample to test your hypothesis,
do not use saliva. Similarly, there is little point in taking a stationary-phase bacterial culture sample when
you are asking a question about exponential-phase growth or, conversely, using a healthy log-phase
sample when you want to answer a question about the transcriptome under a stress condition.
It’s important to minimize batch effects and bias during sample collection and processing, but keep this in
mind throughout the whole process and not just at this stage. Randomly assigning samples from different
biological groups to all aspects of the protocol you have designed, including RNA extraction batches,
library preparation and sequencing lane assignment, will help to reduce the effect of batches.
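The randomization described above can be sketched in a few lines of Python. This is a minimal illustration, assuming you want each biological group spread as evenly as possible across batches (a stratified randomization). The function name, sample IDs and group labels are hypothetical.

```python
import random
from collections import defaultdict


def assign_batches(samples, n_batches, seed=0):
    """Randomly assign samples to batches, balancing groups across batches.

    samples   : dict mapping sample ID -> biological group
    n_batches : number of extraction batches, library prep runs or lanes
    seed      : fixed seed so the assignment can be reproduced and documented
    Returns a dict mapping sample ID -> batch index (0-based).
    """
    rng = random.Random(seed)
    by_group = defaultdict(list)
    for sample_id, group in sorted(samples.items()):
        by_group[group].append(sample_id)

    assignment = {}
    offset = 0  # carries on from the previous group so batch sizes stay even
    for group in sorted(by_group):
        ids = by_group[group]
        rng.shuffle(ids)  # random order within each group
        for i, sample_id in enumerate(ids):
            assignment[sample_id] = (i + offset) % n_batches
        offset += len(ids)
    return assignment


# Hypothetical design: 6 control and 6 treated samples across 3 batches
samples = {f"ctrl_{i}": "control" for i in range(6)}
samples.update({f"trt_{i}": "treated" for i in range(6)})
batches = assign_batches(samples, n_batches=3, seed=1)
```

Running the same function separately for RNA extraction, library preparation and lane assignment (with different seeds) avoids the same samples always travelling together, and recording the seed keeps the design auditable.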
It is also essential that you are consistent in how all of the samples are handled. Be consistent in the
protocols and consumables that you are using, e.g., the same kits, RNase-free labware, same batches
of media, tissue disruption and timings for each aspect of the experiment. To avoid inter-user variation,
experiments should ideally be performed by the same person using the same equipment each time.
It is vital that all samples are stored in an appropriate fashion and in monitored equipment. If you say
your samples are at -70 °C, make sure they are and document this. Try to avoid, where possible, repeated
freeze-thawing, which can damage samples and will introduce variation into the experiment.
You will need to measure both the integrity and quantity of your RNA. Integrity is now commonly measured
using bioanalyzers that assign an RNA integrity number (RIN) to your sample – ideally, this should
be above 7 on a scale of 1 to 10. If you don’t have access to a bioanalyzer, an easy and cheap method is to
separate the sample by agarose gel electrophoresis and visually check the integrity. This is less satisfactory
overall because it is subjective, but it at least provides an idea of RNA integrity. Both spectrophotometric
and fluorescence methods are available for accurately measuring the quantity of RNA (or DNA) within
your sample. Using a spectrophotometric method will also allow you to detect protein and
organic material contamination.
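A simple way to apply these checks consistently is to encode them once and run every sample through the same function. The sketch below is illustrative only: the RIN cut-off of 7 comes from the text above, while the absorbance ratio cut-offs (A260/280 and A260/230 of at least 1.8) are commonly quoted rules of thumb assumed here, not values from this guide. Adjust them to your own kit and application.

```python
def rna_qc_flags(rin, a260_280, a260_230,
                 min_rin=7.0, min_260_280=1.8, min_260_230=1.8):
    """Return a list of QC warnings for one RNA sample (empty list = pass).

    rin      : RNA integrity number (1 to 10) from a bioanalyzer
    a260_280 : absorbance ratio; low values suggest protein contamination
    a260_230 : absorbance ratio; low values suggest organic contamination
    The ratio thresholds are assumed defaults, not fixed standards.
    """
    flags = []
    if rin <= min_rin:  # the guide recommends a RIN above 7
        flags.append("low RIN")
    if a260_280 < min_260_280:
        flags.append("low A260/280")
    if a260_230 < min_260_230:
        flags.append("low A260/230")
    return flags


# Hypothetical readings for two samples
good_sample = rna_qc_flags(rin=8.5, a260_280=2.05, a260_230=2.10)
poor_sample = rna_qc_flags(rin=5.0, a260_280=1.60, a260_230=2.10)
```

Here the first sample passes with no flags, while the second is flagged for both low integrity and a low A260/280 ratio. Logging the flags alongside the sample ID gives you a record of which samples were excluded and why.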
Finally, there are additional methods available for selecting specific types of RNA. For example, if you are
focusing on mRNA, kits are available that can be used to deplete ribosomal RNA (the bulk of the sample)
and capture polyadenylated RNA.
Library preparation and sequencing
RNA-seq using NGS follows a relatively simple set of processes to generate useful information. The majority
of the suggestions below refer primarily to short-read sequencing, although some, such as maintaining a
clean working environment, are equally valid for long-read technologies.
The first step in the process is to fragment your chosen RNA population into smaller components, the size
of which will depend upon which kits and sequencing platform you are using. Fragmentation can be achieved either
enzymatically (using an RNase) or through mechanical methods. Check the size of the fragments you have generated
by using a bioanalyzer or by agarose gel electrophoresis – some optimization of the fragmentation step
may be required to obtain the correct fragments. To reiterate, it is vitally important to maintain an RNase-free
environment, including your equipment and the solutions that you are using, to minimize unwanted
fragmentation of your samples.
The second step in the process is to convert the RNA into complementary DNA (cDNA) using random
primers and an enzyme called reverse transcriptase, which generates an RNA–DNA hybrid. The incubation
time for this step may require optimization to suit the RNA fragments you are using, i.e., larger fragments
need longer. This hybrid is then converted to double-stranded cDNA using RNase H, DNA polymerase and
DNA ligase. After this, the cDNA is purified using either column-based kits or bead-based methods. If you
are using a bead-based method, it is critical that you do not disturb the beads once the DNA is bound to
them (until you elute it) and that you never over-dry the beads after washing, as this will result in lower
recovery of your cDNA sample. At this point, the cDNA can be stored at -20 °C. However, this will introduce
a freeze–thaw cycle that may compromise the final results, so it is best avoided if possible. Organize your
time so that you can complete as many stages of the protocol as possible in one go.
The next stage is to end-repair the cDNA so that it contains 5’ phosphate and 3’ hydroxyl groups and does
not have overhangs. Most sequencing platforms rely on adaptors being added to allow the actual sequencing
to happen, but depending on which platform you are using to sequence, you may need to dA-tail
your DNA to allow the ligation of the adaptor sequences. It is also possible at this stage to incorporate
barcoding to enable multiplexing of samples so that they can be run on the sequencer at the same time.
Size selection should be applied at this stage for general mRNA-seq to ensure you have the correct fragments
for downstream processing. This can be achieved in a number of ways, including using beads or
column-based kits, electrophoretically on acrylamide or agarose gels or using automated systems. If you
are sequencing other types of RNA then size selection may need to be applied at different points in the
protocol – consult the manufacturer’s instructions. Once purified, the adaptor-ligated cDNA is enriched by
PCR and cleaned again; it is a good idea to minimize the number of PCR cycles performed to reduce any
bias that may be introduced at this stage.
Check the DNA quality using either a bioanalyzer or gel electrophoresis, ensuring the size of the fragments
is correct for the sequencing platform you are using. The library fragments are then quantified by qPCR;
this will enable you to optimize the amount of each library you load for sequencing and also flag
any poor-quality or failed library preparations. Finally, sequence the DNA to the depth you require. This
can vary significantly depending on what you are looking for. For example, if you are looking for low-abundance
transcripts, you will need far more depth than if you are looking at single-cell transcripts. You will
also need to decide whether you want single- or paired-end sequencing; the former is cheaper and faster,
but the latter is often better for the subsequent data analysis you are going to perform.
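Two pieces of arithmetic come up repeatedly when planning a run: converting a library concentration into molarity for loading, and estimating how many reads each multiplexed sample will receive. The Python sketch below illustrates both. It assumes an average mass of 660 g/mol per base pair of double-stranded DNA, which is a standard approximation, and an even split of reads across samples, which real runs only approximate. The function names and the example numbers are hypothetical.

```python
def library_molarity_nm(conc_ng_per_ul, mean_fragment_bp, mass_per_bp=660.0):
    """Convert a dsDNA library concentration (ng/uL) to molarity (nM).

    mean_fragment_bp : average library fragment size, including adaptors
    mass_per_bp      : assumed average molar mass of one base pair (g/mol)
    """
    if mean_fragment_bp <= 0:
        raise ValueError("mean_fragment_bp must be positive")
    # ng/uL -> g/L is a factor of 1e-3; mol/L -> nM is a factor of 1e9
    return conc_ng_per_ul * 1e6 / (mass_per_bp * mean_fragment_bp)


def reads_per_sample(total_reads, n_samples):
    """Expected reads per sample if a run is shared evenly between samples."""
    if n_samples <= 0:
        raise ValueError("n_samples must be positive")
    return total_reads // n_samples


# Hypothetical library: 10 ng/uL with a mean fragment size of 400 bp
molarity = library_molarity_nm(10, 400)           # about 37.9 nM
# Hypothetical run: 400 million reads shared between 16 barcoded samples
depth = reads_per_sample(400_000_000, 16)         # 25 million reads each
```

Comparing the expected reads per sample with the depth your question requires tells you how many samples can share a run before low-abundance transcripts start to be missed.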
Data analysis – the interesting part
All of the above is essentially just a means to get to this point – now things get interesting, as you are
about to discover whether you can answer the question you are asking. Firstly, you must quality check the
reads before aligning them with your reference sequence. For example, FastQC and MultiQC (which aggregates
multiple QC reports) are useful tools for this process, as is fastp, which has the additional advantage of
trimming your sequences to remove things like adaptor sequences.
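To show what a read-level quality check involves, here is a minimal Python sketch that parses FASTQ records and reports the mean Phred quality of each read. It is an illustration of the idea only, not a replacement for the dedicated tools named above. It assumes standard four-line FASTQ records with Phred+33 quality encoding, and the function name and example reads are hypothetical.

```python
import io


def mean_read_qualities(handle, phred_offset=33):
    """Yield (read_id, mean Phred score) for each four-line FASTQ record.

    handle       : open text file or file-like object containing FASTQ data
    phred_offset : ASCII offset of the quality encoding (33 assumed here)
    """
    while True:
        header = handle.readline().rstrip("\r\n")
        if not header:
            break  # end of file (or a blank line) ends the parse
        sequence = handle.readline().rstrip("\r\n")
        handle.readline()  # '+' separator line, ignored
        quality = handle.readline().rstrip("\r\n")
        if not header.startswith("@") or len(sequence) != len(quality):
            raise ValueError(f"malformed FASTQ record near {header!r}")
        scores = [ord(ch) - phred_offset for ch in quality]
        mean_q = sum(scores) / len(scores) if scores else 0.0
        yield header[1:].split()[0], mean_q


# Two hypothetical reads: one high quality ('I' = Q40), one very poor ('!' = Q0)
example = io.StringIO("@read1\nACGT\n+\nIIII\n@read2 lane1\nACGT\n+\n!!!!\n")
results = list(mean_read_qualities(example))
low_quality = [read_id for read_id, q in results if q < 20]
```

In this example only `read2` falls below the illustrative Q20 cut-off. The mean used here is a plain arithmetic average of Phred scores, which is a simplification; dedicated QC tools report far richer per-base and per-sequence statistics.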
Once the reads are checked and trimmed, the next step is to align them to a reference sequence and analyze
them. There is a plethora of tools available that enable you to do this. Which set of tools or pipelines you
use depends upon the question you are trying to answer, and there are review papers available that
will help you to choose wisely.2-6 If a sequenced genome is not available, it is also possible to assemble
the transcriptome de novo using the RNA-seq reads alone.7 As suggested earlier, if you are in any
doubt, then consult a bioinformatician. They can advise you from the onset of the experimental design process,
through the quality checking at various steps, to the final analysis of your data, and are an invaluable
resource that you can tap into for advice. One final suggestion is to combine the transcriptomics data you
have with other omics outputs, such as genomics, proteomics and metabolomics, in a so-called multiomics
approach, which can significantly enhance the value of your data.
The advances in sequencing technologies and the multitude of computational tools
developed over the past 20 or so years have been nothing short of spectacular. They have transformed
our understanding of the transcriptomes of eukaryotic and prokaryotic species alike. Hopefully,
by using at least some of the suggestions in this guide, you will be able to use
good transcriptomic data to enhance your research and answer some of the fundamental questions you
are asking.
References
1. Lowe R, Shirley N, Bleackley M, Dolan S, Shafee T. Transcriptomics technologies. PLoS Comput Biol. 2017;13(5):e1005457. doi: 10.1371/journal.pcbi.1005457
2. Li J, Varghese RS, Ressom HW. RNA-seq data analysis. Methods Mol Biol. 2024;2822:263-290. doi: 10.1007/978-1-0716-3918-4_18
3. Batut B, van den Beek M, Doyle MA, Soranzo N. RNA-seq data analysis in Galaxy. Methods Mol Biol. 2021;2284:367-392. doi: 10.1007/978-1-0716-1307-8_20
4. Su M, Pan T, Chen QZ, Zhou WW, Gong Y, Xu G et al. Data analysis guidelines for single-cell RNA-seq in biomedical studies and clinical applications. Mil Med Res. 2022;9(1):68. doi: 10.1186/s40779-022-00434-8
5. Hwang B. Computational analysis of single-cell RNA-seq data. Methods Mol Biol. 2023;2594:165-172. doi: 10.1007/978-1-0716-2815-7_12
6. Deshpande D, Chhugani K, Chang Y, Karlsberg A, Loeffler C, Zhang J et al. RNA-seq data science: From raw data to effective interpretation. Front Genet. 2023;14:997383. doi: 10.3389/fgene.2023.997383
7. Raghavan V, Kraft L, Mesny F, Rigerte L. A simple guide to de novo transcriptome assembly and annotation. Brief Bioinform. 2022;23(2):bbab563. doi: 10.1093/bib/bbab563