Informatics in Biopharma: 2022 Is the Year of Value Creation and ROI
At the beginning of each year, the great and the good in every industry are asked for their predictions for the year ahead. What’s going to be hot, and what’s not. The life science and pharma industries, and publishers dedicated to these sectors, are no exception. Over the last few weeks, I have read interesting opinions on, for example: working at scale with ever larger, more complex datasets, the potential of machine learning (ML) and artificial intelligence (AI) to streamline extracting insights from multi-modal data, and the rise of single-cell proteomics.
Many reviews explore the potential importance of these concepts, such as Google AI’s plan for 2022¹ and an article on single-cell proteomics taking center stage². In this article, as well as adding to the debate, I will highlight some practical examples where new thinking in informatics is having real impact. In addition, I propose that in 2022 a related question should be uppermost in the minds of drug discovery and development researchers and their bioinformatics colleagues: “How can my scientific data management and computational platform create additional value and return on investment (ROI) for me this year?”
Opportunities – from disease etiology to drugs in the clinic
Gaining an understanding of the underlying basis of disease has been a constant driver in pharmaceutical research and development. Data that allows a new molecular entity (NME) to advance through the discovery and formulation phases, become a candidate drug that moves into clinical trials and gain market approval has long been the currency of drug development. The dramatic advances in genomics, proteomics and metabolomics in the past three decades have fueled the rise of biotherapeutics and the growth of the biopharma industry. For example, in 2021, one of the 50 NMEs approved by the FDA was the 100th monoclonal antibody product.³ In addition, the first KRAS inhibitor for cancer and the first anti-amyloid antibody for Alzheimer’s disease were FDA approved in 2021.⁴
Looking back, many believed that following the sequencing of the first human genome, understanding disease and drug discovery would be simplified. However, we now know that biological systems are more complex than that and enormous amounts of data are required to identify diseases and cures. It is how this data is utilized that will be key to how the field continues to evolve.
Progress produced massive datasets
In recent years, biobanks have developed into a central resource for drug development. Many countries either have or are developing local biobanks, including the UK (UK Biobank), China (China Kadoorie Biobank), Japan (JENGER), the US (All of Us) and Finland (FinnGen). The UK Biobank (UKBB) has 7,400 categories of phenotypes along with single nucleotide polymorphism (SNP) and whole exome sequencing (WES) data from 500,000 participants, an important milestone in the availability of population health data. It is now embarking on whole genome sequencing and proteomics.
A key requirement for working with the resulting massive, multi-dimensional datasets is the ability to perform ad hoc analyses on them. There are some significant barriers to doing this. One is that calculations such as linkage disequilibrium (LD) are so computationally demanding that results are difficult to obtain, and they are usually limited to cis locations rather than extended to inter-chromosomal pairs. LD provides insights into genetic interactions and can be coupled with known physical interaction datasets to test for physiologically relevant physical interactions where genetic evidence points to an in-human phenotype when specific mutations are combined. Since LD represents an enormous calculation space, it is crucial to find a way to calculate LD as needed. When this data is coupled with a rich collection of phenotypic information, burden tests can be used to examine the pairwise relationship between mutations in two loci and to find which pairs of mutations are consequential. LD and burden tests can uncover novel drivers of disease and interactions not previously thought to be involved in the development of disease.
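As a rough illustration of the pairwise calculation discussed above, the following Python sketch estimates LD between two loci as the squared Pearson correlation of alternate-allele dosages (0, 1 or 2 copies per participant), a common approximation when genotypes are unphased. The function name and the toy genotype vectors are assumptions made for illustration; they are not taken from the UK Biobank or from any particular platform.

```python
from typing import Sequence


def genotype_r2(dosages_a: Sequence[int], dosages_b: Sequence[int]) -> float:
    """Squared Pearson correlation of allele dosages at two loci.

    Each sequence holds one value per participant: 0, 1 or 2 copies
    of the alternate allele. Hypothetical helper for illustration only.
    """
    if len(dosages_a) != len(dosages_b):
        raise ValueError("both loci need one dosage per participant")
    n = len(dosages_a)
    if n == 0:
        raise ValueError("no participants supplied")

    mean_a = sum(dosages_a) / n
    mean_b = sum(dosages_b) / n
    cov = sum((a - mean_a) * (b - mean_b) for a, b in zip(dosages_a, dosages_b))
    var_a = sum((a - mean_a) ** 2 for a in dosages_a)
    var_b = sum((b - mean_b) ** 2 for b in dosages_b)

    # A locus with no variation carries no LD signal; report 0 by convention here.
    if var_a == 0 or var_b == 0:
        return 0.0
    return (cov * cov) / (var_a * var_b)


# Toy data (made up): six participants, three loci.
locus_1 = [0, 1, 2, 0, 1, 2]
locus_2 = [0, 1, 2, 0, 1, 2]   # travels perfectly with locus_1
locus_3 = [0, 2, 0, 2, 1, 1]   # unrelated to locus_1 in this toy sample

print(genotype_r2(locus_1, locus_2))
print(genotype_r2(locus_1, locus_3))
```

Because every pair of variants needs its own calculation, the number of pairs grows roughly with the square of the number of variants. That is why being able to compute LD on demand, for any pair including inter-chromosomal ones, matters at biobank scale.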
An illustration of this power was shown in a poster at ASHG⁵, where mutations in the KCNJ1 gene, which is involved in Bartter syndrome, and in the interacting intracellular scaffold proteins SLC9A3R1 and SLC9A3R2 were tested for LD and burden. The UK Biobank dataset was used to provide both the synonymous mutations and the phenotypic information. The results suggest a very strong link between specific mutations in the proteins and red blood cell production and liver inflammation (a putative precursor to non-alcoholic steatohepatitis (NASH)). The ability to uncover novel disease-protein connections is a very powerful tool in understanding disease initiation and progression.
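One simple way to frame a pairwise burden-style comparison like the one described above is to ask whether participants who carry mutations at both loci are enriched for a phenotype. The Python sketch below does this with a one-sided Fisher's exact test on a 2x2 table. It is an illustrative assumption, not the method used in the poster; the function names and toy data are hypothetical.

```python
from math import comb
from typing import Sequence, Tuple


def double_carrier_table(
    carrier_a: Sequence[bool],
    carrier_b: Sequence[bool],
    has_phenotype: Sequence[bool],
) -> Tuple[int, int, int, int]:
    """Count (double carriers with, double carriers without,
    others with, others without) the phenotype."""
    a = b = c = d = 0
    for in_a, in_b, affected in zip(carrier_a, carrier_b, has_phenotype):
        both = in_a and in_b
        if both and affected:
            a += 1
        elif both:
            b += 1
        elif affected:
            c += 1
        else:
            d += 1
    return a, b, c, d


def fisher_enrichment_p(a: int, b: int, c: int, d: int) -> float:
    """One-sided Fisher's exact p-value: the probability of seeing at least
    `a` affected double carriers if carrier status and phenotype were unrelated."""
    n = a + b + c + d
    carriers = a + b   # participants carrying both mutations
    cases = a + c      # participants with the phenotype
    upper = min(carriers, cases)
    favourable = sum(
        comb(cases, k) * comb(n - cases, carriers - k)
        for k in range(a, upper + 1)
    )
    return favourable / comb(n, carriers)


# Toy data (made up): six participants.
carrier_a = [True, True, True, False, False, False]
carrier_b = [True, True, True, True, False, False]
phenotype = [True, True, True, False, False, False]

table = double_carrier_table(carrier_a, carrier_b, phenotype)
print(table)
print(fisher_enrichment_p(*table))
```

In practice such a test would be repeated over many pairs of mutations and many phenotypes, so any p-values would need correcting for multiple testing before a pair is called consequential.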
Adding even more complexity, single-cell nucleic acid sequencing generates orders of magnitude more data than traditional sequencing approaches and has changed the game for drug discovery. Now, single-cell proteomics aims to get us closer to the dynamic post-transcription phenotypic information that will better inform, for example, disease diagnostics and response to drugs. Recent experiments that combine novel microfluidics techniques with mass spectrometry can measure around 1,000 proteins or ‘proteoforms’ per cell, compared to established flow cytometry and mass cytometry methods that are limited to identifying around 50 proteins per cell.⁶,⁷
Combining genetic and proteomic data at the single-cell level will allow facile testing along the central paradigm of molecular biology, and the quick confirmation of hypotheses generated from large human genetic datasets. As the bank of data available increases even further, the prospect of ML and/or AI solutions to data interpretation and value creation comes into play. However, from what I know, even the most optimistic outlook suggests it will take 3-5 years for ML and AI to become an established, value-creating strategy for pharma and healthcare.
Creating value with bioinformatics
The topics and examples discussed here highlight important bioinformatics challenges that face the industry. Taken together, they point to what must be at the heart of successful research and drug discovery efforts in 2022: building an ability within an organization to collect, curate and manage, compute, share and interrogate all the relevant data at each stage of the development journey.
To be effective, this must be done in a way that is research scientist-friendly, works at scale on massive datasets and provides predictable, cost-effective performance so organizations can significantly shorten timelines, answer more of their big questions, and accelerate the commercialization of their discoveries.
What has never been more certain is that data drives discovery and development decisions, and that, in 2022, informatics platforms and approaches must be assessed on an ability to unlock value and provide measurable ROI from the mountain of available data.
About the author:
Dr. Zachary Pitluk is vice president of life sciences and healthcare at Paradigm4. He has worked in sales and marketing for 23 years, from being a pharmaceutical representative for BMS to management roles in Life Science technology companies. Since 2003, his positions have included VP of business development at Gene Network Sciences and chief commercial officer at Proveris Scientific. Zach has held academic positions at Yale University Department of Molecular Biophysics and Biochemistry: associate research scientist, postdoctoral fellow and graduate student, and has been named as co-inventor on numerous patents.
1. Gopani A. Google AI’s plan for 2022 and beyond. Analytics India Magazine. https://analyticsindiamag.com/google-ais-plan-for-2022-and-beyond/. Published 2022. Accessed 15th February 2022.
2. Perkel J. Single-cell proteomics takes centre stage. Nature. 2021;597(7877):580-582. doi:10.1038/d41586-021-02530-6
3. Mullard A. FDA approves 100th monoclonal antibody product. Nature Reviews Drug Discovery. 2021;20(7):491-495. doi:10.1038/d41573-021-00079-7
4. Mullard A. 2020 FDA drug approvals. Nature Reviews Drug Discovery. 2021;20(2):85-90. doi:10.1038/d41573-021-00002-0
5. Pitluk Z, Sarangi S, Colosimo M, Moore S, Peterson M, Poliakov A. Identifying synthetic interactions between synonymous mutations in the UK Biobank WES data using REVEAL™: Biobank. ASHG event. 2021. Program #2881.
6. Vistain L, Tay S. Single-cell proteomics. Trends in Biochemical Sciences. 2021. doi:10.1016/j.tibs.2021.01.013
7. Perkel J. Single-cell proteomics takes centre stage. Nature. 2021;597(7877):580-582. doi:10.1038/d41586-021-02530-6