Transforming Drug Discovery With Artificial Intelligence

Pile of red, white and blue capsule medicinal drugs.
Credit: mmmCCC / Pixabay

This article includes research findings that are yet to be peer-reviewed. Results are therefore regarded as preliminary and should be interpreted as such. Find out about the role of the peer review process in research here. For further information, please contact the cited source.

The past two centuries have seen a tremendous evolution in the field of drug research and development (R&D), stretching from the foundational works of Louis Pasteur and Alexander Fleming to the modern era of precision medicine and immunotherapy. Despite these strides, formidable challenges persist in designing therapeutic strategies to combat complex, multifactorial diseases, and the quest for reliable, informative models to effectively guide drug development is ongoing and urgent. Recently, the meteoric rise of transformer-based models, such as generative pre-trained transformers (GPT), has demonstrated the potential to revolutionize almost every possible domain, including life sciences.


In this article, we explore the potential of recent advancements in artificial intelligence (AI) to revolutionize drug R&D. By decoding the biological lexicon and integrating diverse forms of data, AI offers new possibilities for modeling biology and designing therapies. We will also discuss the associated risks and limitations to provide a balanced perspective on this promising frontier in drug R&D.


Navigating the challenge: drug development for complex diseases


Multifactorial diseases such as cancer, neurodegenerative and inflammatory disorders present a significant challenge to drug development. Such diseases involve complicated interplay between multiple biological mechanisms and factors. For instance, genomic alterations can influence protein expression and, in turn, cellular interactions and spatial biology. This network of interconnected biological relationships impacts everything from disease initiation to progression and, ultimately, a patient’s response to therapy. To develop effective therapeutic strategies, a comprehensive understanding of the interactions between relevant biological systems is essential.


While the resources and tools available for drug development are now more advanced than ever, harnessing them effectively has become increasingly challenging. It requires a broad spectrum of knowledge, skills and experience, as well as the ability to integrate disparate data modalities – a task that goes beyond the proficiency of any single individual. In addition, traditional methods like cell and animal models often fall short of accurately capturing the intricacies of human biology, limiting their predictive power for multifaceted diseases.

While the use of computational, predictive models in biological research is not new, current models often necessitate professional expertise and typically capture specific aspects of biology rather than providing a true representation of the disease's multifaceted nature. This creates an urgent need for innovative, all-encompassing solutions that account for the complexities of biological systems and can support drug R&D efforts toward more effective treatments.


Transforming our understanding of biology through AI


The world is currently witnessing extraordinary progress in the development and advancement of AI. Until recently, AI was a tool available to computer scientists and programmers only. However, the emergence of user-friendly applications such as ChatGPT, DALL-E and others has created a profound paradigm shift by making AI accessible to anyone with internet access. Undoubtedly, these potent tools underlie the transformative role of AI across many sectors and domains.


Many of these new, powerful models are based on transformers, neural networks that identify relationships in data, thus “learning” meaning and context. This architecture forms the backbone of large language models (LLMs) such as GPT, multimodal models like the visual-textual contrastive language–image pre-training (CLIP) model and others. According to Stanford researchers, these models “have stretched our imagination of what is possible.”
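
To make "identifying relationships in data" more concrete, here is a minimal sketch of scaled dot-product attention, the operation commonly described as the core of transformer architectures. The article does not name this mechanism, so treat it as background rather than a description of any specific model. The toy vectors and the function names `softmax` and `attention` are illustrative assumptions.

```python
import math


def softmax(scores):
    """Turn raw scores into weights that are positive and sum to 1."""
    peak = max(scores)  # subtract the max for numerical stability
    exps = [math.exp(s - peak) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]


def attention(queries, keys, values):
    """Toy scaled dot-product attention over lists of equal-length vectors.

    Returns (outputs, weights). weights[i][j] says how strongly item i
    "attends" to item j, i.e. how related the two are judged to be.
    """
    dim = len(keys[0])
    outputs, all_weights = [], []
    for q in queries:
        # Similarity of this query to every key, scaled by sqrt(dimension).
        scores = [
            sum(qi * ki for qi, ki in zip(q, k)) / math.sqrt(dim) for k in keys
        ]
        weights = softmax(scores)
        # Each output is a weighted average of the value vectors.
        out = [
            sum(w * v[j] for w, v in zip(weights, values))
            for j in range(len(values[0]))
        ]
        outputs.append(out)
        all_weights.append(weights)
    return outputs, all_weights


# Hypothetical 2-dimensional "embeddings" for three tokens; the first two
# point in similar directions, the third does not.
tokens = [[1.0, 0.0], [0.9, 0.1], [0.0, 1.0]]
_, weights = attention(tokens, tokens, tokens)
for row in weights:
    print([round(w, 3) for w in row])
```

In this toy example the first token assigns more weight to the second token than to the third, because their vectors are more similar. In real models the vectors are learned from data and have far more dimensions.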


The unparalleled ability of models like GPT to parse, extract and integrate complex, multimodal information is the result of an architecture that is designed to identify relationships and has been trained on enormous and diverse datasets. This ability can be adapted to the scientific domain easily. For example, scientific knowledge has traditionally been accessed through extensive literature searches, data collection and analyses. The integration and analysis of the collected information and data tend to yield the most fruitful results when involving personnel of various specialties, often requiring a significant amount of time from all involved. Today, models like GPT-4 can accurately identify, summarize and contextualize scientific information and even explain experimental results and generate hypotheses.


For life sciences, the potential goes far beyond integrating knowledge from scientific literature. Recent preliminary works have demonstrated the use of transformer-based models for a variety of biologically related applications. Some interesting projects include training an LLM on the “language of proteins” to predict protein structure, using the “language of DNA” to identify genomic features relevant to predicting molecular phenotypes and harnessing the “language of RNA” from single-cell data to model and predict various aspects of cellular biology. The implementation of large multimodal LLMs takes this a step further, allowing human-language interpretation and interaction with visual data, such as medical imaging and spatial biology.


Although preliminary, these works indicate that the use of LLMs could make it easier to investigate and understand different biological data modalities and, therefore, accelerate their utilization in scientific and pharmaceutical R&D.


From the lab to the clinic: comprehensive disease modeling

These LLM applications illustrate the promise of transformer-based models in accessing, analyzing and performing predictions on biological data. Fueled by these advances, there's a budding potential for the birth of groundbreaking tools. These won't merely consolidate our existing scientific knowledge, but will integrate the myriad of data, all the way from genomics to spatial biology, into a comprehensive, holistic, dynamic model of disease. Instead of researchers focusing on individual systems or elements of disease, this powerful framework could be used to do something that has not yet been possible: to visualize and simulate various disease processes together in complete context.


Such a holistic and comprehensive approach would open new horizons in biopharmaceutical research. In drug discovery, rather than focusing on individual outcomes, screening assays could be powered by an understanding of a compound's overall impact on the patient. This all-inclusive perspective would enhance the probability of pinpointing drugs with real-world effectiveness in the clinic. When exploring drug mechanisms of action, these models could lay out an expansive landscape of the cascading effects induced by a new compound. For instance, when targeting a particular pathway, these models could identify other processes that might inadvertently be affected or reveal unexpected resistance or compensatory mechanisms, effects that are easily missed with traditional approaches. In clinical research, they could help anticipate potential adverse events, explain why some patients are non-responsive to treatment and even predict the consequences of altering dosages.

These insights could lead to more informed clinical trial design, personalized treatment plans and, ultimately, improved patient outcomes. An integrated approach of this nature, building upon the pillars of generations of scientific research and meticulous experimental observations, could redefine the field of drug discovery and development.


Moreover, these AI-empowered models could promote a broader understanding of disease, depending mostly on scientific and biological thinking rather than specialized technical skill. The language of biology will become more accessible through simple human language, unlocking possibilities for a wider range of users and encouraging collaborative inquiry. Researchers could ask pointed questions and explore “what if” scenarios, fostering a dynamic platform for scientific exploration. By converting static data into dynamic, interactive models, we could streamline the drug development process. This is not a marginal improvement, but a paradigm shift in how we approach the intricate challenges of drug development for complex diseases.



While the use of AI models within research stages, or as supporting tools, has considerable potential, it does come with its share of constraints. The risks associated with the implementation of AI models, specifically LLMs, are predominantly related to their inputs. For example, the data used for training is often not curated and might therefore contain inaccuracies and mistakes. The way one phrases the research inquiry (“prompting”) can also significantly impact the resulting output, and the algorithm’s generative nature (i.e., the fact that its operation is based on sampling from a probability distribution) could lead to different answers to the exact same question.

From a quality perspective, the application of such models within drug development requires the adoption of appropriate controls. The control approach and activities should be commensurate with the risks associated with the development phase, and may include the following:

  Verification of both prompts and outputs by human subject matter experts to optimize predictions and support selective implementation.

  Setting the model’s temperature (a parameter set by the user to define the degree of creativity, or randomness, of the model’s responses) to zero to limit randomness and produce more consistent output.
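
The effect of temperature on distribution sampling can be illustrated with a toy sketch. It assumes the model produces one raw score ("logit") per candidate answer, and it treats a temperature of zero as always picking the highest-scoring candidate. The scores and the function names `softmax_with_temperature` and `sample_index` are hypothetical. This is not how any particular model or vendor API is implemented.

```python
import math
import random


def softmax_with_temperature(logits, temperature):
    """Convert raw scores to probabilities; lower temperature sharpens them."""
    if temperature <= 0:
        raise ValueError("temperature must be positive here")
    scaled = [x / temperature for x in logits]
    peak = max(scaled)  # subtract the max for numerical stability
    exps = [math.exp(s - peak) for s in scaled]
    total = sum(exps)
    return [e / total for e in exps]


def sample_index(logits, temperature, rng):
    """Pick one candidate; temperature 0 is treated as greedy selection."""
    if temperature == 0:
        # Zero limit: always return the highest-scoring candidate.
        return max(range(len(logits)), key=lambda i: logits[i])
    probs = softmax_with_temperature(logits, temperature)
    return rng.choices(range(len(logits)), weights=probs, k=1)[0]


# Hypothetical scores for three candidate answers to the same question.
logits = [2.0, 1.5, 0.3]
rng = random.Random(0)  # fixed seed so the demo itself is repeatable

at_zero = {sample_index(logits, 0, rng) for _ in range(200)}
at_one = {sample_index(logits, 1.0, rng) for _ in range(200)}
print("distinct answers at temperature 0:", sorted(at_zero))
print("distinct answers at temperature 1:", sorted(at_one))
```

In this sketch, repeated calls at temperature zero always return the same candidate, while a higher temperature spreads choices across several candidates. Deployed models may still show some variation at temperature zero for reasons outside this simplified picture, which is one reason human verification of outputs remains important.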

Additionally, the use of regulatory science terminology and concepts has the potential to facilitate collaboration between the fields of AI and medicine, enabling safer and more effective adoption of this emerging power of deduction.

As we venture deeper into this new frontier, the hope is that these advanced AI models will continue to evolve and unlock further possibilities, bringing us closer to the vision of tailored, effective therapies for complex multifactorial diseases.

About the authors:

Mor Kenigsbuch, PhD, is a product manager at Nucleai.
Hagar Sachs is head of RA/QA at Nucleai.