Tackling the Challenges of Multiplexing in NGS
Product News Nov 06, 2018
The need for multiplexing
Next-generation sequencing (NGS) is fast. Runs capable of sequencing an entire genome take just hours, rather than the days, or even years, needed for the Human Genome Project. The capacity and availability of NGS technology are also greater than ever, with hundreds of millions of reads producing hundreds of gigabases of data in a single run.
No single genome sequencing experiment requires such vast capacity, however; most read-depth needs are comfortably covered by a fraction of it. As the cost of an individual run is still substantial, making full use of that capacity by multiplexing hundreds or thousands of libraries lets us continue driving down the cost of sequencing a genome.
Multiplexing is not without its challenges, though. Let’s look at the challenges of index misassignment and parallel sample preparation, and how we can address them to improve both workflow and data quality, ultimately reducing your cost per sample.
The challenges associated with multiplexed sequencing
Despite the goal of making each sequencing run as productive as possible, any approach that involves scaling up a workflow will introduce some unique challenges. In the case of NGS multiplexing, these can be found in both the library preparation and the sequencing data analysis stages.
Multiplexing requires tagging each library with indexing barcodes before the libraries are pooled and run on a single patterned flow cell. Software can then identify each library's reads and separate them out from the bulk sequencing data. A pool might contain only a handful of libraries, or thousands, each with a unique index.
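The idea can be sketched in a few lines of code. The index sequences, library names, and reads below are invented for illustration; production pipelines use dedicated demultiplexing tools rather than anything this simple.

```python
# Minimal demultiplexing sketch: route each read to its library based on
# its index barcode. All sequences and names here are hypothetical.
INDEX_TO_LIBRARY = {"ACGTAC": "library_A", "GGTTCA": "library_B"}

def demultiplex(reads, index_map):
    """Group (index, sequence) pairs by library; reads whose index is
    not recognized are collected under 'undetermined'."""
    bins = {library: [] for library in index_map.values()}
    bins["undetermined"] = []
    for index, sequence in reads:
        bins[index_map.get(index, "undetermined")].append(sequence)
    return bins

reads = [
    ("ACGTAC", "TTGCAGGT"),  # belongs to library_A
    ("GGTTCA", "AAGGTCCA"),  # belongs to library_B
    ("ACGTAC", "CCATGAAT"),  # belongs to library_A
    ("NNNNNN", "GGCCATTA"),  # unreadable index
]
bins = demultiplex(reads, INDEX_TO_LIBRARY)
```

Note that an unreadable index lands in the "undetermined" bin rather than being guessed at; misassignment, discussed next, is the harder case where a read carries a plausible but wrong index.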
Sometimes, however, indexes are misassigned: a read from a molecule in one library is mistakenly assigned to another, complicating the analysis. These events are fairly uncommon (perhaps 1-2% of reads, though rates as high as 10% have been reported), but given the sheer numbers involved, they can add up to many misassigned reads.
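A rough calculation puts those percentages in perspective. The run output of 400 million reads used here is an assumed figure for illustration; actual output varies by instrument.

```python
# Back-of-envelope: how many reads are misassigned at typical rates.
# The 400 million read run output is an assumption for illustration.
total_reads = 400_000_000
for rate in (0.01, 0.02, 0.10):
    misassigned = int(total_reads * rate)
    print(f"{rate:.0%} misassignment -> {misassigned:,} misassigned reads")
```

Even at the low end, millions of reads end up in the wrong library.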
This is a particular issue in low-frequency allele detection, where it may be impossible to distinguish true positives from false ones, or where samples might be scarce and you need to maximize yield.
Tackling index misassignment
Index misassignment is largely attributed to 'index hopping,' in which contaminating free adapters and index primers are thought to bind clustered molecules on the flow cell. These can extend during clonal amplification and produce reads carrying another library's index. Index hopping appears to be more common with Illumina's ExAmp clustering chemistry than with the older bridge amplification method, according to tests conducted by Illumina and the NGS community.
Misassignment is a well-known problem, and is now largely addressed by dual indexing and the use of unique molecular identifiers (UMIs). This approach enables the software to discard any reads that do not carry the correct combination of two indexes or UMIs.
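The dual-index principle can be sketched as a lookup: only index pairs that were actually assigned to a library are valid, so a "hopped" read, which carries a combination that was never used, can be recognized and discarded. All index sequences here are hypothetical.

```python
# Dual-index filtering sketch: a read is kept only if its (i7, i5) index
# pair matches a known, deliberately assigned combination.
VALID_PAIRS = {
    ("ATCACG", "TAGATC"): "library_A",
    ("CGATGT", "CTCTCT"): "library_B",
}

def assign_library(i7, i5):
    """Return the library for a valid dual-index pair, or None to discard."""
    return VALID_PAIRS.get((i7, i5))

# A hopped read mixes library_A's i7 with library_B's i5:
hopped = assign_library("ATCACG", "CTCTCT")   # None -> discarded
genuine = assign_library("ATCACG", "TAGATC")  # "library_A" -> kept
```

With single indexing there is no such redundancy, so a hopped read is indistinguishable from a genuine one.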
Parallel sample preparation
Parallel sample preparation of multiple individual libraries for multiplexing requires extra time and resources, and adds potential sources of error.
Challenges in sample preparation are nothing new, of course, and the work is compounded by the multiplex rate. Preparing a handful of libraries manually over several days was acceptable just a few years ago. Now, you might need to prepare hundreds or thousands of samples in the same time to maintain a competitive cost per sample, all while performing painstakingly accurate quantitation, fragment size identification, and normalization for every library to generate high-quality, reproducible data.
Multiplexing libraries with different-sized fragments already creates inconsistencies in read depth, as there is a natural bias towards sequencing smaller fragments more efficiently than larger ones. Pooling therefore needs to be as accurate as possible.
If quantitation methods differ because of time, resource, or equipment limitations, the same experiment using the same sample sources might produce data of different quality. Simple, fast methods like spectrophotometry aren't accurate enough, and they quantitate all nucleic acids, including primers and free nucleotides. Electrophoretic methods are good for size determination, but not reliable for quantitation either. Ideally, you would use qPCR, but this is time-consuming.
Normalization is also a source of error, requiring careful and precise handling of volumes that are often less than 10 µl. Small errors or user-to-user variation here affect data quality and confidence, and repeating preparation and pooling isn't always easy, especially with scarce samples.
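The arithmetic behind equimolar pooling shows why both accurate quantitation and fragment sizing matter. This sketch uses the standard average mass of roughly 660 g/mol per base pair of double-stranded DNA; the library concentrations, fragment sizes, and target molarity are invented for illustration.

```python
# Convert each library's mass concentration to molarity, then work out
# the dilution needed to bring it to a common pooling concentration.
def molarity_nM(conc_ng_per_ul, mean_fragment_bp):
    """Molarity in nM for dsDNA, using ~660 g/mol per base pair."""
    return conc_ng_per_ul * 1e6 / (660 * mean_fragment_bp)

libraries = {               # name: (ng/ul, mean fragment size in bp)
    "lib_1": (10.0, 400),
    "lib_2": (25.0, 350),
}
target_nM = 4.0             # assumed target concentration for pooling
for name, (conc, size) in libraries.items():
    m = molarity_nM(conc, size)
    print(f"{name}: {m:.1f} nM, dilute {m / target_nM:.1f}-fold "
          f"to reach {target_nM} nM")
```

An error in either the concentration or the fragment-size estimate propagates directly into the dilution factor, and from there into uneven read depth across the pool.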
So, there is a need for a more practical workflow that doesn’t compromise on data quality.
Solving the sample preparation challenge
A certain amount of automation does help alleviate some of the issues with parallel sample preparation. But this is much like putting a more powerful engine in a vehicle that’s not aerodynamically efficient. It would make more sense to improve the workflow. Rather than processing everything in parallel, pooling libraries at an early stage and processing them all together would be more practical. This would be an ideal solution, reducing the overall workload and minimizing any sample-to-sample variation introduced by the user.
Key to making this approach successful would be easy or automatic barcoding and normalization of libraries, and a high tolerance for variation in input amounts to avoid compromising data quality. For example, a simple molecular tagging step for each library could pull out equimolar quantities of fragments. This would remove the need for separate quantitation and normalization steps, simplifying the process and integrating easily into existing workflows.
At GE Healthcare Life Sciences, we’re working to resolve the challenges of multiplexed library preparation. Our aim is to develop a practical workflow that maintains high data quality, reduces cost per sample, and doesn’t require any specialist knowledge or resources.