How Can Data Platforms Unlock the Power of Single-Cell Analysis?
Single-cell technologies have positioned themselves at the forefront of biomedical research. These platforms allow researchers to bypass the uncertainties of bulk data and instead interrogate biological systems at a level of detail that was previously unreachable. But the value of these systems risks being impaired by bottlenecks in data analysis and handling. To find out more, we spoke to Marilyn Matz, CEO and co-founder of Paradigm4, an analytics solutions company that aims to provide solutions to these data problems.
Ruairi Mackenzie (RM): Could you tell us more about your data platform, REVEAL?
Marilyn Matz (MM): We heard from scientists and data scientists about the challenges they encountered working with ever larger, more complex data sets. We set out to help them focus on their science, to more easily ask and answer hard questions without getting bogged down in the computer science mechanics required to do so.
Our approach involves two distinct but interdependent parts: the REVEAL™ suite is a family of user-friendly, application-specific apps that sit on top of our unique Scientific Analytics Engine, SciDB™, a massively scalable array-native analytical platform, designed specifically for scientific data and scientific computing.
Scientists also clearly told us they wanted higher-level, use-case focused solutions, not just ‘ecosystems’ or workspaces which require them to assemble their own combinations of tools. So, each REVEAL™ app is designed as a complete package that allows researchers to query and probe their data using familiar R and Python languages.
With the combination of SciDB™ and REVEAL™ apps we offer researchers a cost-effective, scalable and reproducible storage and elastic computing platform that is tailored to their area of interest. For example, the recently launched REVEAL: Single Cell™ app gives biopharmaceutical developers the ability to break through the data wrangling and programming challenges associated with the analysis of large-scale, single-cell datasets. And our REVEAL: Biobank™ app brings together multiple data types, such as multi-omics data, medical records, as well as biometric and imaging data to support scientists in population-scale translational medicine and healthcare research. As the family of apps grows, we plan to support more areas of life science research too.
RM: Why are single-cell datasets such a focus of modern research?
MM: The idea of precision medicine – delivering the right drug treatment to the right patient at the right time and at the right dose – underpins current thinking in healthcare practice, and in pharma R&D. However, until single-cell ‘omics came along, researchers were looking at an aggregated picture – the ‘omics of a tissue system, rather than that of a single cell type.
Now, single-cell analysis has become a major focus of interest and is widely seen as the ‘game changer’ – with the potential to take precision medicine to the next level by adding ‘right cell’ into the mix.
Through ‘omics analysis, notably genomics, transcriptomics, epigenomics and proteomics at the single-cell level, the identification of minor subpopulations of cells that may play a critical role in a biological process is now possible. And, as sequencing depth increases to allow for a deeper view of the transcriptome and proteome, the biological state of the individual cells will become clearer and improve definition of cell types and cell states.
In practice, single-cell sequencing within tumors can help oncologists understand the distribution of mutations and their co-occurrence within individual cells, potentially guiding treatment decisions. With this new toolbox, researchers and clinicians can look for insights into the transition from ‘healthy’ to ‘disease’ states, study potential biomarkers, understand the mechanics of disease pathways and assess response to drug targets or available therapeutic regimens over time.
RM: How can we speed up data retrieval in translational medicine?
MM: This is an interesting question because whilst getting results in a few minutes or hours (rather than days or weeks) can be a significant step forward, in fact the speed of data retrieval in translational medicine is only one part of the entire problem.
Taking single-cell DNA and RNA sequencing as our example, information from tens of thousands of cells per patient is available and while this provides clear opportunities in terms of increasing the statistical power of growing datasets, the technical and interpretative challenges associated with such ‘Big Data’ are holding back the biological insights that should be coming out. To unlock this value, life scientists will need to tackle the variety of omics layers (genomes, epigenomes, transcriptomes and proteomes), along with reference maps like the Human Cell Atlas (HCA), at unprecedented levels of resolution, specificity and volume.
In addition, we need to remember that current single-cell data sets only represent a small number of individuals, and statistical significance relies on the number of patients studied, rather than the total number of cells. This is because cells from the same patient are ‘siblings’ and not true biological replicates. Datasets with 100,000s of patients/treatment conditions will therefore necessitate technology to manage billions of cells.
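The point about ‘sibling’ cells can be illustrated with a small sketch. This is not how REVEAL or SciDB works; it is a generic illustration under stated assumptions. The function names, the toy values and the patient labels are all hypothetical. The first function collapses cells to one value per patient (often called pseudobulking), and the second applies the standard design-effect approximation for equally sized clusters, where `icc` is an assumed within-patient correlation.

```python
from collections import defaultdict
from statistics import mean


def pseudobulk(cells):
    """Collapse per-cell values to one mean per patient.

    `cells` is an iterable of (patient_id, value) pairs. Both names are
    hypothetical stand-ins for whatever a real dataset provides.
    """
    by_patient = defaultdict(list)
    for patient_id, value in cells:
        by_patient[patient_id].append(value)
    return {pid: mean(vals) for pid, vals in by_patient.items()}


def effective_sample_size(n_patients, cells_per_patient, icc):
    """Design-effect approximation for equally sized clusters.

    design effect = 1 + (m - 1) * icc, where m is cells per patient and
    icc is the (assumed) correlation between cells of the same patient.
    """
    design_effect = 1 + (cells_per_patient - 1) * icc
    return n_patients * cells_per_patient / design_effect


# Toy data: 3 patients, 4 cells each (invented numbers, illustration only).
cells = [
    ("P1", 1.0), ("P1", 1.2), ("P1", 0.8), ("P1", 1.0),
    ("P2", 2.0), ("P2", 2.2), ("P2", 1.8), ("P2", 2.0),
    ("P3", 3.0), ("P3", 3.1), ("P3", 2.9), ("P3", 3.0),
]

per_patient = pseudobulk(cells)
print(len(cells), "cells ->", len(per_patient), "independent units")

# Fully correlated cells contribute no more information than one per patient.
print(effective_sample_size(3, 4, icc=1.0))
# Fully independent cells would each count as a separate observation.
print(effective_sample_size(3, 4, icc=0.0))
```

In this toy case, 12 cells reduce to three independent units when cells within a patient are perfectly correlated, which is why adding patients, rather than cells, is what increases statistical power.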
All this demands database platforms that can evaluate key biological hypotheses by querying a mind-bending amount of single-cell data. Many established approaches and tools are simply not suitable for this challenge. Current methods require repetitive extract/transform/load operations (data science janitorial work), increasing time and computational overhead with every question asked of the data. Many also significantly constrain the number of total cells/datasets that can be inter-compared.
This is where SciDB™ and REVEAL™ apps become quite enabling. Built with a ‘load/QA once – interrogate often’ philosophy, and a natural capacity to scale to evaluate billions of cells in a cost-effective way, the apps utilise capacity on existing cloud-based machines, rather than needing dedicated and expensive hardware.
RM: Where will your application-specific solutions take us next – can you give us a glimpse into the future?
MM: To channel Freeman Dyson, “New directions in science are launched by new tools much more often than by new concepts.” For the life sciences, new tools encompass both new data-generating instruments and data collection initiatives as well as next generation software to combine and mine that data in unique ways. With our platform, we aspire to enable scientists to do breakthrough science by giving them the ability to ask and answer bigger and more complex questions of their data more easily and more cost-effectively. The result is an increased ability and confidence to make earlier and adaptive change decisions that will guide development, and provide earlier access to complex, real-time data that can detect efficacy and safety signals sooner. Importantly, working in partnership with users, we will continue to expand the analytical, computational and machine-learning capabilities that will help drive their innovation. As vendors introduce new technology and toolkits, the research community is quick to embrace and exploit what becomes the ‘new normal’. In turn, as they solve their problems, new information is generated, new tools and methodologies are developed, and the cycle of discovering ‘new directions’ repeats again.
Marilyn Matz is CEO and co-founder, along with Turing laureate Michael Stonebraker, of Paradigm4. She also serves on the Board of Directors of Teradyne, a leading supplier of automation equipment for test and industrial applications.