The Five Steps Needed To Turn AI Hype in Life Sciences Into Reality
The last 18 months have seen remarkable R&D breakthroughs in viable COVID-19 vaccines and treatments. The ability to make these gains at such speed and scale owes much to the application of data, both old and new. The development of mRNA COVID-19 vaccines, for example, has drawn heavily on data and lessons from existing research into HIV. This ever-growing reliance on data is driving the adoption of artificial intelligence (AI) in the life sciences. A GlobalData report predicts that in 2021, AI will be “the most disruptive technology across the pharmaceutical industry.”
There’s no doubt that AI has reduced the time spent processing data, helping to accelerate R&D. But a note of caution is warranted. Despite the understandable excitement, the majority of scientists in research functions still spend almost half of their time “wrangling” data. Though this is an improvement on prior years, when that figure was close to 80 percent, it is still a lot of time and effort spent on preparing data rather than drawing insights from it.
This is particularly true of large and complex research organizations. R&D functions in these organizations hold far more data, much of which may be siloed and poorly organized. They may also lack the agility of nimble, tech-savvy start-ups that were born in the cloud and are adept at AI. So, how can R&D leaders turn the huge potential of AI into real-world success? Here are five ways to unlock the value of AI.
- Ensure domain-specific expertise: Generic, broad-spectrum AI platforms will not enrich data sufficiently or deliver results at the level of accuracy required in life sciences R&D. IBM Watson’s unsuccessful forays into healthcare are a salient reminder of this fact. While advanced ML models exist in the public domain (e.g., BERT, BioBERT, ELMo, Word2vec), they do not address real-world use cases on their own and often require domain-specific tuning at the training, validation and interpretation stages. Domain-specific AI employs sophisticated models built by domain experts, ranging from named-entity recognition (NER) to semantic relationship extraction and question answering based on semantic structures. Such specificity is essential, for example, when retrieving information from large volumes of unstructured scientific literature.
- Embrace the noise: Much data, particularly historical data, is messy, complex and difficult to retrieve. But it is too valuable to be discarded or ignored. Organizations need to be able to capture, filter, tag and mine existing data at speed so that it can be harmonized with new research and made machine readable. It’s estimated that 80 percent of enterprise data is held in unstructured text such as Word documents and PDFs. The same is true of external data sources such as patents, clinical notes and literature databases. Domain-specific AI that applies precise ontologies helps organizations embrace the “noise” and turn it into insights.
- Integrate real-world evidence: Real-world evidence (RWE) is hugely beneficial in life sciences and public health. We’ve seen this in action with COVID-19 symptom study apps like ZOE, which tracks symptoms in real time. The monitoring of COVID-19 vaccine side effects via the NHS’s Yellow Card app is also adding to the pool of real-world knowledge. Another emerging source of RWE is online forums and social media. Discussion groups set up by patients suffering from a particular disease can hold valuable real-world, real-time insights and context. We are now beginning to see such data incorporated into AI models.
- Embed data standards: Industry standards for data will contribute greatly to successful AI projects. The FAIR data principles (findable, accessible, interoperable, reusable) are key here. When datasets across an entire enterprise are “FAIR-ified”, they can be applied to new questions or AI models quickly, without the need for an extensive period of data cleansing. Additionally, R&D is not just an internal process. Life sciences companies require data from third parties, ranging from contract research organizations (CROs) to public data sources like PubMed. By embedding FAIR principles and encouraging others within the ecosystem to do the same, collaboration, data sharing and reuse are made much more straightforward.
- Keep the human at the center: Finally, it’s important that life sciences organizations don’t forget the value of the human in any AI undertaking. Though we are living in an age of rapid technological innovation and growing computing power, researchers’ insights remain indispensable. Human inference and nuance are crucial in spotting anomalies, identifying patterns, drawing sensible conclusions, interpreting results accurately and training models with a high chance of success. Advancing our understanding of disease and achieving breakthroughs will ultimately be a joint effort, where AI augments the skills of human researchers.
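To make the points about domain-specific tagging and ontologies more concrete, here is a minimal Python sketch of dictionary-based entity tagging over unstructured text. It is an illustration only, not a description of any particular vendor's system: the concept IDs (`EX:0001` and so on), the synonym lists and the helper names `build_patterns` and `tag` are all hypothetical, and a production pipeline would rely on curated ontologies and trained NER models rather than simple pattern matching.

```python
import re
from dataclasses import dataclass

# Toy ontology: the concept IDs and synonym lists are hypothetical
# placeholders, not entries from any real vocabulary.
TOY_ONTOLOGY = {
    "EX:0001": {"label": "COVID-19",
                "synonyms": ["covid-19", "coronavirus disease 2019"]},
    "EX:0002": {"label": "mRNA vaccine",
                "synonyms": ["mrna vaccine", "messenger rna vaccine"]},
    "EX:0003": {"label": "HIV",
                "synonyms": ["hiv", "human immunodeficiency virus"]},
}


@dataclass(frozen=True)
class Annotation:
    concept_id: str
    label: str
    start: int
    end: int
    text: str


def build_patterns(ontology):
    """Compile one case-insensitive, whole-word regex per concept."""
    patterns = []
    for concept_id, entry in ontology.items():
        # Try longer synonyms first so the longest phrase wins.
        synonyms = sorted(entry["synonyms"], key=len, reverse=True)
        body = "|".join(re.escape(s) for s in synonyms)
        regex = re.compile(r"(?<!\w)(?:" + body + r")(?!\w)", re.IGNORECASE)
        patterns.append((concept_id, entry["label"], regex))
    return patterns


def tag(text, patterns):
    """Return annotations for every synonym found, ordered by position."""
    found = []
    for concept_id, label, regex in patterns:
        for match in regex.finditer(text):
            found.append(Annotation(concept_id, label,
                                    match.start(), match.end(), match.group(0)))
    return sorted(found, key=lambda a: a.start)


PATTERNS = build_patterns(TOY_ONTOLOGY)

if __name__ == "__main__":
    sample = ("The mRNA vaccine for Covid-19 built on earlier "
              "Human Immunodeficiency Virus research.")
    for ann in tag(sample, PATTERNS):
        print(ann.concept_id, ann.label, repr(ann.text))
```

Mapping every surface form to a single concept ID is what lets differently worded documents be harmonized and queried together. The whole-word guards in the pattern keep short synonyms such as "hiv" from matching inside unrelated words like "archive".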
AI will play a critical role in the future of life sciences R&D. We’ve already seen AI helping to make great strides throughout the pandemic – for instance, in screening large numbers of approved drugs for those with potential therapeutic use against COVID-19. However, the reality today is that life sciences is only just scratching the surface of what AI, alongside human ingenuity, can do. As R&D leaders mature their approach to AI and follow the above steps, we’ll realize the benefits even faster – helping to advance our understanding of disease and accelerate the speed at which vital therapeutics reach patients.