7 Ways Big Data is Delivering for Drug Discovery

Just when you thought you’d got to grips with the idea of big data, a whole new set of buzzwords comes along: artificial intelligence (AI), machine learning, deep learning, structured and unstructured data. But for the pharmaceutical industry right now, these are more than just buzzwords: they are the promise of a new era of productivity. It’s the perfect storm of two things: the availability of big data and the computational firepower to meaningfully analyze it. So, are these new tools, from AI to machine learning algorithms, really going to revolutionize R&D, or is it more hype than hope?

This list highlights some of the ways big data is delivering for drug discovery right now, and shows how much the industry stands to gain from the latest evolution of big data.

1. Spotting missed targets

One of the most promising applications of big data in drug discovery is its power to uncover new drug targets that would have been missed by previous methods. The likes of GlaxoSmithKline (GSK) and Pfizer are already capitalizing on this, partnering with AI technology providers to find new drug targets and strategies from huge amounts of disparate biological data. Several biotech companies are also building their own AI platforms for in-house drug discovery. Among them is Berg Health, whose approach is to start with the individual patient and use its proprietary machine learning algorithms to compare the molecular activity of healthy and diseased cellular environments. The approach has already yielded several targets and candidate drugs for diseases including glaucoma, alopecia, cancer and neurodegenerative disorders.

2. Finding HIV’s Achilles’ heel

Taking target discovery to an entirely new level, researchers in Hong Kong recently used machine learning to estimate the fitness landscape of gp160 [1], the polyprotein that comprises HIV's viral spike and is an attractive target for HIV vaccines and antibody-based drugs. To create the ideal vaccine, the team wanted to target the fragment of the spike that HIV relies on to replicate. This required them to map the polyprotein sequence to degrees of fitness, something that would have been impossible without big data approaches. They used AI to process 815 amino acid residues across 20,043 sequences from 1,918 HIV-infected individuals. The results should help to pinpoint new immunogens and vaccination approaches that force the virus to mutate into unfit states and limit its ability to cause infection.
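To give a flavor of what mapping a sequence to fitness can involve, here is a deliberately simplified Python sketch. It is not the researchers' method; it only computes how variable each position is across a set of aligned sequences, on the rough assumption that positions which almost never change are the ones where mutations are most costly to the virus. The function name and the four-residue toy sequences are invented for illustration.

```python
import math
from collections import Counter


def position_entropy(sequences):
    """Shannon entropy (in bits) of the residues seen at each aligned position.

    Low entropy means the position is highly conserved across sequences.
    """
    length = len(sequences[0])
    if any(len(seq) != length for seq in sequences):
        raise ValueError("sequences must be aligned to the same length")

    entropies = []
    for i in range(length):
        counts = Counter(seq[i] for seq in sequences)
        total = sum(counts.values())
        # Sum p * log2(1/p) over every residue observed at this position.
        h = sum((c / total) * math.log2(total / c) for c in counts.values())
        entropies.append(h)
    return entropies


# Toy alignment (hypothetical data, not real gp160 sequences).
toy_alignment = ["MKVA", "MKIA", "MRVA", "MKLA"]
entropy = position_entropy(toy_alignment)

# Positions that never vary are candidate "weak spots" in this crude picture.
fully_conserved = [i for i, h in enumerate(entropy) if h == 0.0]
```

In this toy alignment the first and last positions never change, so they are the ones flagged. A real fitness landscape also has to account for how mutations at different positions interact, which is where the large datasets and machine learning come in.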

3. Drug screening with a difference

Machine learning is also earning its stripes in the arena of drug screening, where faster, higher resolution methods of interrogating drug responses can reveal hits that might otherwise have been missed. This is the concept behind Recursion’s drug repurposing platform, which uses ‘computer vision’ to extract thousands of morphological measures at the level of individual cells from its human cellular models of genetic diseases. The approach adds particular power to drug discovery for rare diseases, where little or no target biology is already known, and is being adopted by companies such as Sanofi for screening their clinical-stage molecules in new indications [2].
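As a rough illustration of how morphological measures can be turned into a screening readout, the Python sketch below scores how far a treatment moves a diseased cell profile back toward a healthy one. This is an assumption-laden toy, not Recursion's platform: the three features, their values and the `rescue_score` function are all hypothetical, and a real profile would contain thousands of measures.

```python
import math


def distance(a, b):
    """Euclidean distance between two equal-length feature profiles."""
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))


def rescue_score(healthy, diseased, treated):
    """Fraction of the healthy-to-diseased gap that the treatment closes.

    1.0 means the treated cells look healthy, 0.0 means no improvement,
    and negative values mean the treated cells look worse than diseased.
    """
    gap = distance(healthy, diseased)
    if gap == 0:
        raise ValueError("healthy and diseased profiles are identical")
    return 1 - distance(healthy, treated) / gap


# Hypothetical profiles: [cell area, nucleus roundness, granularity].
healthy = [1.0, 0.5, 2.0]
diseased = [3.0, 0.5, 2.0]
treated = [1.5, 0.5, 2.0]

score = rescue_score(healthy, diseased, treated)  # 0.75 for these values
```

Ranking compounds by a score like this needs no knowledge of the underlying target, which is why image-based screening suits rare diseases where the biology is poorly understood.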

4. Mining the literature

AI technologies present the pharmaceutical industry with an opportunity to carry out all aspects of R&D more efficiently, and that extends to the humble but important task of literature research. BenevolentAI has developed a Judgment Correlation System (JACS), whose algorithms can review billions of sentences and paragraphs from millions of scientific research papers and abstracts. JACS then identifies direct relationships within the data and organizes them into ‘known facts’. These known facts are curated, and hitherto unrealized connections made, to generate a large number of possible hypotheses using criteria set by the scientist. An expert team of researchers then assesses the validity of these hypotheses to generate a prioritized list of those considered worth exploring further. The biomedical arm of the company applied this technology to amyotrophic lateral sclerosis (ALS), and prioritized five compounds for testing in the lab. Of these, two were much more effective against ALS cells in the lab than the gold standard treatments. These compounds are now being developed further in trials [3].
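The idea of turning sentences into ‘known facts’ and then looking for connections nobody has made yet can be sketched in a few lines of Python. This is a toy under heavy assumptions, not how JACS works: it treats two terms as linked whenever they appear in the same sentence, then proposes pairs that share a linked term but are never mentioned together. The term list, the sentences and both function names are invented placeholders.

```python
from collections import defaultdict
from itertools import combinations


def build_links(sentences, terms):
    """Record a 'known fact' linking any two terms found in one sentence."""
    links = defaultdict(set)
    for sentence in sentences:
        text = sentence.lower()
        present = sorted(t for t in terms if t in text)
        for a, b in combinations(present, 2):
            links[a].add(b)
            links[b].add(a)
    return links


def indirect_hypotheses(links):
    """Pairs of terms that share a neighbour but are never linked directly."""
    hypotheses = []
    for a, b in combinations(sorted(links), 2):
        shared = links[a] & links[b]
        if shared and b not in links[a]:
            hypotheses.append((a, b, tuple(sorted(shared))))
    return hypotheses


# Placeholder vocabulary and sentences (hypothetical, lowercase terms).
terms = {"compound x", "protein y", "disease z"}
sentences = [
    "Compound X inhibits Protein Y in cultured cells.",
    "Protein Y is overactive in Disease Z.",
]

links = build_links(sentences, terms)
hypotheses = indirect_hypotheses(links)
# hypotheses == [("compound x", "disease z", ("protein y",))]
```

Here no sentence mentions the compound and the disease together, yet both are linked to the same protein, so the pair surfaces as a hypothesis for an expert to assess. Scaled up to millions of papers, the hard parts are recognizing entities reliably and filtering out the many spurious connections.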

5. Patient-powered research

Big data technology is not just revolutionizing the analysis of information, but our ability to collect it as well. LymeDisease.org recently launched MyLymeData, a patient-powered registry which aims to accelerate research for chronic Lyme disease – the most common vector-borne infectious disease in the US [4]. Many patients who contract Lyme disease remain seriously ill after treatment. In fact, treatment failure rates can be as high as 35% to 50%. But there have only been three NIH-funded trials in chronic Lyme disease over the past 15 years, and they were too small to provide robust insights into treatment response. The hope is that data collected through the registry from thousands of patients will provide the numbers necessary to understand why some patients respond to treatment, while others don't.

6. Crowd-sourcing cancer drug data

Launched by the Collaboration for Oncology Data in Europe, CODE-cancer.com is an initiative that aims to collate data about cancer drug use from 200 cancer treatment centers in seven European countries in the first three years, scaling up to a potential 2,000 treatment centers within the next decade. The technology platform has been designed to aggregate data on anti-cancer medicine usage for all forms of cancer, in all patients and for all treatment centers across Europe that wish to join. The goal? To collate comprehensive, up-to-date data describing how anti-cancer medicines are used, and then use this to inform the next generation of cancer treatments.

7. Mining health data with AI

While not strictly drug discovery, a new ‘grand challenge’ in cancer may well set a precedent for our ability to integrate and mine structured and unstructured data. This is particularly crucial for the pharmaceutical industry, which needs to analyze data from diverse in-house and public sources. Cancer Research UK is offering a $30 million grant for researchers who can find new ways to interrogate medical and non-medical data sources to spot patterns which could detect cancer earlier. The challenge will require collating anonymized datasets from healthcare records and combining them with anonymized data from online and social activity, then developing new tools and algorithms to spot patterns, which can be optimized as data sources change over time. If successful, teams will have answered some of the most difficult and ethically complex big data questions – if we can find ways to use personal medical data, who would use it? And how?

Meet the Author
Joanna Owens, PhD