Easing the "Informatics Bottleneck" Burden in Next-generation Sequencing
A major challenge facing researchers in the next-generation sequencing (NGS) space is informatics analysis. This can heavily impact the journey from raw data to discovery. To address this issue, Thermo Fisher Scientific has entered into a co-marketing agreement with Genialis, a data science company leading the industry in RNA-based discovery.
Through the agreement, an analysis workflow that integrates the Invitrogen Collibri Stranded RNA Library Prep Kit for Illumina Systems with Genialis Expressions analysis software will be created, supporting data analysis for Collibri customers.
“The goal is to accelerate discovery to empower scientists to find the story in their data. We want researchers focused on great experiments rather than data processing,” says Rafael Rosengarten, CEO of Genialis.
Technology Networks recently spoke with Rosengarten and Raymond Mercier, VP and GM, Molecular Biology, Thermo Fisher Scientific, to learn more about how the agreement will help ease the burden of the "informatics bottleneck" for research scientists.
Molly Campbell (MC): What is causing the “informatics bottleneck” in NGS-based research, and how does it impact the work of scientists?
RM: Sequencing technologies have advanced in their capacity and adoption at a truly astounding rate, and this has opened a whole world of opportunities in scientific discovery. But the flip side is having to contend with the volume and scale of the data resulting from NGS-based studies. Key challenges include the technical, such as data storage and processing; the organizational, such as data management and curation; and the scientific imperative of interpreting the data to arrive at an answer or the next hypothesis. These constriction points are interrelated, and each one needs to be addressed to get the most discovery bang for your sequencing buck.
In the sequencing world, instrumentation and wet-lab capabilities have outpaced informatics solutions. Bioinformatics and biology-fluent computational talent is scarce as these are emerging specializations. The solution lies in a combination of training future generations of polyglot informaticians, assembling multidisciplinary teams with a focus on communication, and building great tools to engage biologists and clinicians with their data.
MC: What other “bottlenecks” are slowing down NGS-based research?
RM: The first step in NGS-based research is library preparation, a time-consuming and delicate operation. It’s easy to introduce an error at this stage, and if you do, then everything downstream is corrupted. The worst part is you often won’t realize a mistake has been made until sequencing is complete, and at that point the data is meaningless. Researchers then must spend even more precious time trying to determine where the error originated. It’s a constant source of frustration to realize it was a user error from the very beginning.
Thermo Fisher’s mission is to enable scientists to make the world healthier, cleaner and safer. As part of that mission, we are always looking for ways to simplify workflows for scientists and take the complexity and inefficiency out of life in the lab. That’s why we developed the Invitrogen Collibri Stranded RNA Library Prep Kits for Illumina Systems. The kits simplify the library prep process and shorten it from days to as little as four and a half hours, reducing the risk of user error that can contaminate sequencing results. When we designed the kits, we included a tracking dye: a color change in the solution shows that reagents are thoroughly mixed, while a lack of color change indicates incomplete addition or mixing. It’s a simple innovation but one that effectively ensures workflow success, enabling researchers to spend saved time and resources on other valuable activities. This new technology builds upon the success experienced by scientists who use ERCC spike-in controls for increased reproducibility and confidence in experimental results.
MC: How will the agreement between Thermo Fisher Scientific and Genialis help ease this burden for researchers?
RM: RNA-sequencing (RNA-seq) requires a lot of time and care, and the larger the scale, the worse the bottlenecks at each step of the process. Research often gets held up at the library prep stage due to user errors that can lead to costly delays. The research can also slow down again when it comes to analyzing the data to glean useful insights. Our agreement with Genialis is a natural partnership bringing together solutions up- and downstream in the genome sequencing workflow. It converts challenging steps in the sequencing process to a “one-click” experience for our customers. The goal is to save scientists time on library prep and data analysis by making these solutions accessible so they can focus on finding the story in their experiments.
Rafael Rosengarten (RR): As a former biomedical researcher, I felt firsthand the frustration of a bench biologist faced with gigabytes of raw data and the uncertainty of what to do next. At Baylor, I was fortunate enough to partner with an incredibly talented data science group from BioLab at the University of Ljubljana, Slovenia. This collaboration provided me with the ability to derive fascinating insights from my research. I recognized that the next wave of biomedical innovation required this kind of cross-pollination. I wanted to help other scientists find meaning in their data, which is why I am so excited about Genialis’ collaboration with Thermo Fisher Scientific to expand access to our informatics solutions to scientists around the world.
MC: What types of research will benefit from this support?
RM: Genomics is a growing field of study, especially as it becomes easier and less expensive to conduct genome sequencing. Dedicated academic research centers are focusing solely on conducting genomic research on a large scale and, as a result, have made some groundbreaking discoveries. These studies are changing the research paradigm because they allow researchers to go into research without forming a set hypothesis to test. Instead, researchers can approach studies with an open mind and let the data speak for itself. This is only possible when scientists have the tools they need to interpret that data.
In addition, clinical institutions are increasingly looking to sequencing to guide patient diagnoses. In the future, as sequencing becomes more accessible and cost-effective, providers may adopt genome sequencing as a part of routine preventative care. Meanwhile, direct-to-consumer companies have developed swab kits offering consumers the ability to better understand their unique genetic footprint and what it means for their health and ancestry. There are even kits available to test your dog’s genetics for breed and health information. As more people purchase these kinds of tests, the pressure on sequencing centers to operate efficiently and handle the workload increases.
MC: Why is it so critical that researchers be able to scale up their experiments and what other challenges do they face when doing so? How can data analytics help?
RR: There are countless reasons to do large-scale experiments. One, biology is really complex with lots of components, so there are a ton of samples or cells to process. Two, biology can be messy and noisy, so you’ll want a lot of data to ensure sound statistical power. And three, ambition—some folks just want to sequence all the things! And while the chance to unlock insights might increase with this scale of sequencing, so do all the challenges we already discussed.
Software like ours can handle the complexity of large-scale sequencing from the point of data generation to insight. Robotics in the lab makes the experimental processing more efficient, and excellent reagents like the Collibri kit take the guesswork out of sample processing. In our experience working with drug development data, library prep error is a major confounding factor, especially when a contract research organization (CRO) outsources the work. This high failure rate becomes really problematic when trying to analyze data across experiments and sites. Efficient, end-to-end workflows are critical to meeting high quality standards, and ultimately, engendering trust in the experimental results and conclusions drawn.
MC: What other opportunities are there for implementing data analysis to further advance the field of health and science?
RR: Genialis is excited about the power of RNA-seq to contribute to drug discovery and development. It’s already a key technology in the early R&D stages, and in understanding disease biology. We’re excited to work on the frontier of target discovery, challenging long-held notions of what is druggable. Measuring transcription, and the regulation of gene expression, helps uncover entirely new paths to treating disease. Further down the pipeline, we are increasingly working with drug developers to measure gene expression in patient cohorts to better understand the impact of their drugs in humans, and to find biomarkers that will ensure the drug reaches the right patients. We think RNA-seq, in concert with best-practice approaches to data management and advanced analytics, will prove central to the realization of precision medicine.
Raymond Mercier, VP and GM, Molecular Biology, Thermo Fisher Scientific, and Rafael Rosengarten, CEO, Genialis, were speaking with Molly Campbell, Science Writer, Technology Networks.