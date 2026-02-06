

Drug discovery pipelines are notoriously costly, slow, and failure-prone, which is why AI and machine learning are increasingly used to accelerate progress and improve outcomes.





Currently, machine learning in drug discovery centers on data-rich stages, which provide plentiful data for algorithm training. However, parts of the pipeline that generate less data could also benefit from machine learning.





Ahead of the Society for Laboratory Automation and Screening (SLAS) Conference 2026, Technology Networks spoke to Dr. Daniel Reker, an assistant professor of biomedical engineering at Duke University, about his work on pairwise molecular learning, which enables better computational decision-making in data-scarce scenarios.





In this interview, Reker discusses how pairwise molecular learning opens up new avenues in drug discovery, including for first-in-class drug candidates, and explores what happens when machine learning is integrated into automated laboratories.





Katie Brighton (KB): How would you describe the role machine learning plays in modern drug discovery today—and where does it still fall short?





Dr. Daniel Reker (DR): Machine learning is actively reshaping drug discovery across multiple stages of the pipeline, and we see widespread adoption from pharma and biotech as well as interest from tech companies and numerous startups. The majority of these efforts currently focus on target identification, lead generation, and clinical trials. While it's still too early for definitive assessments, early readouts suggest computational approaches have accelerated timelines and modestly improved success rates, which could be significant given how costly, slow, and failure-prone drug discovery is.





However, the current impact of machine learning concentrates heavily on data-rich stages that leverage high-throughput screening, genomics, and large-scale clinical datasets to enable training and fine-tuning of complex algorithms.





Substantial progress can still be made in addressing data-scarce drug discovery challenges like lead optimization, safety, and formulation development. These stages rely on low-throughput experiments, such as complex synthesis, material characterization, and in vivo animal studies, but they represent critical decision points that determine the fate of drug candidates.





Innovations in experimental platforms and robust computational algorithms are poised to improve these decisions, potentially reducing cost and failure rates even more than what we have seen so far, and ultimately positioning the community to bring more and better therapies to patients.





KB: Could you explain a little more about what pairwise molecular learning is?





DR: Pairwise molecular learning transforms the traditional machine learning task into a contrastive problem where the algorithm directly compares two molecules rather than evaluating each one independently.





Essentially, instead of asking the computer, “What is the potency of molecule A?” we transform the question to “Which of these two molecules is more potent?” This enables combinatorial data augmentation, creating millions of molecular comparisons from just hundreds to thousands of original datapoints. In simple terms, we give deep neural networks different perspectives on the same underlying data to enhance training efficiency.
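As a rough illustration of the combinatorial augmentation described above, the sketch below pairs every measured molecule with every other one. It is not Reker's implementation: the function name `make_pairs`, the molecule identifiers, and the potency values are hypothetical, and a real model would take molecular representations as input rather than identifiers.

```python
from itertools import product


def make_pairs(dataset):
    """Cross every measured molecule with every other one (ordered pairs).

    dataset: list of (molecule_id, measured_value) tuples.
    Returns a list of (mol_a, mol_b, delta, a_is_higher) training examples.
    """
    pairs = []
    for (mol_a, val_a), (mol_b, val_b) in product(dataset, repeat=2):
        if mol_a == mol_b:
            continue  # skip self-comparisons in this sketch
        delta = val_a - val_b
        pairs.append((mol_a, mol_b, delta, delta > 0))
    return pairs


# Hypothetical toy data: identifiers and potency values are made up.
toy_data = [("mol_A", 6.2), ("mol_B", 7.5), ("mol_C", 5.1), ("mol_D", 7.9)]
pairs = make_pairs(toy_data)
print(len(toy_data), "measurements ->", len(pairs), "ordered pairs")

# The number of ordered pairs grows roughly quadratically with dataset size.
for n in (100, 1000):
    print(n, "compounds ->", n * (n - 1), "ordered pairs")
```

Under this pairing scheme, n measurements yield n × (n − 1) ordered comparisons, so 1,000 compounds give close to a million pairs.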





This allows us to train cutting-edge deep learning architectures on datasets of as few as 100–1000 compounds, which is where a lot of the real-world pharmaceutical decision-making around critical properties like drug safety, metabolism, and pharmacokinetics happens—these are expensive to measure experimentally but essential for advancing the best candidates. We believe pairwise learning will enable the community to unleash the predictive power of deep neural networks for these data-scarce but high-value decision points.

KB: What kind of avenues in drug discovery does pairwise molecular learning open up?





DR: Pairwise molecular learning opens several exciting avenues in drug discovery. First, it enables more accurate computational molecular optimization by directly predicting which chemical modifications will improve critical drug properties like safety, metabolism, and potency. This helps medicinal chemists prioritize which compounds to synthesize next, saving time and resources.

Second, this pairwise augmentation approach enables better computational decision-making in data-scarce scenarios. This is particularly valuable for properties like drug safety, metabolism, and formulations—critical decision points where experimental data is limited and expensive to generate.





It can also enhance predictive performance on novel and challenging drug targets where little knowledge has been accumulated so far, thereby providing an opportunity for machine learning to better support the identification of first-in-class therapies. This capability is further strengthened algorithmically by pairwise learning's ability to incorporate bounded or incompletely characterized datapoints that are normally discarded from modeling efforts. While insufficiently characterized for direct inclusion in traditional models, these datapoints still provide important perspectives and contrast to stronger candidates.
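To make the point about bounded datapoints concrete, here is a minimal sketch of how a measurement reported only as "greater than" or "less than" some limit can still yield a usable comparison. The qualifier encoding (`"="`, `">"`, `"<"`), the helper names, and the toy values are assumptions for illustration, not the method used in Reker's lab.

```python
import math
from itertools import combinations


def to_interval(qualifier, value):
    """Turn a reported measurement into a (low, high) range of possible true values."""
    if qualifier == "=":
        return (value, value)
    if qualifier == ">":
        return (value, math.inf)
    if qualifier == "<":
        return (-math.inf, value)
    raise ValueError(f"unknown qualifier: {qualifier!r}")


def compare(a, b):
    """Return 1 if a is certainly higher than b, -1 if certainly lower, else None."""
    a_low, a_high = to_interval(*a)
    b_low, b_high = to_interval(*b)
    if a_low > b_high:
        return 1
    if a_high < b_low:
        return -1
    return None  # overlapping ranges (or a tie at the bound): no usable label


# Hypothetical measurements on a made-up potency scale.
measurements = {
    "mol_A": ("=", 7.5),
    "mol_B": ("=", 6.0),
    "mol_C": ("<", 5.0),  # below the assay's detection limit
    "mol_D": (">", 8.0),  # above the assay's upper limit
    "mol_E": ("<", 7.0),
}

labeled_pairs = []
for (name_a, a), (name_b, b) in combinations(measurements.items(), 2):
    outcome = compare(a, b)
    if outcome is not None:
        labeled_pairs.append((name_a, name_b, outcome))

print(len(labeled_pairs), "usable comparisons")
```

In this toy set only two molecules have exact values, so a model restricted to exact measurements would see a single comparison. Keeping the bounded entries yields eight, while pairs whose ranges overlap (such as mol_B versus mol_E) are left out because their order cannot be determined.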





Third, our data suggests the algorithm excels at identifying genuinely novel molecules. By learning the impact of molecular changes rather than simply identifying analogues of known compounds, it avoids the memorization problem common in complex algorithms and pushes the algorithm to focus learning on relationships and patterns. In our proof-of-concept data, this enables more drastic structural modifications during optimization, with strong potential to further enhance safety and efficacy of drug candidates.





KB: What are the biggest gains you’ve seen from combining machine learning with automated labs, and where are the remaining bottlenecks?





DR: The biggest gains I have seen from combining machine learning with automated labs stem from creating truly adaptive experimental design loops. In the machine learning community, we call these “active learning workflows” to indicate that the predictive algorithm is directly involved in the data acquisition and can request the most informative and valuable datapoints. Our work and that of others has shown that such “active learning” setups can potentially reduce the data required for decision-making by up to 90% and enable better predictive models by directly addressing biases in the data. These setups have helped us identify new drug candidates using fewer datapoints and identify, with greater accuracy, new nanoparticle formulations that enhance the efficacy and safety of medications.
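The general shape of an active learning loop can be sketched in a few lines. The example below uses query-by-committee, a generic textbook strategy chosen here for illustration, and does not reproduce the workflows Reker describes. The function `run_experiment` is a hypothetical stand-in for a laboratory measurement, and the one-dimensional candidate descriptors are made up.

```python
import random
import statistics


def run_experiment(x):
    """Stand-in for a wet-lab measurement; the response curve is made up."""
    return (x - 0.3) ** 2


def nearest_neighbor_predict(train, x):
    """Predict with the label of the closest training point (1-NN)."""
    return min(train, key=lambda point: abs(point[0] - x))[1]


def committee_disagreement(labeled, x, rng, n_models=10):
    """Spread of predictions from 1-NN models fit on bootstrap resamples."""
    predictions = []
    for _ in range(n_models):
        resample = [rng.choice(labeled) for _ in labeled]
        predictions.append(nearest_neighbor_predict(resample, x))
    return statistics.pstdev(predictions)


def active_learning(pool, n_initial=3, n_rounds=5, seed=0):
    """Measure a few random candidates, then let the committee pick the rest."""
    rng = random.Random(seed)
    pool = list(pool)
    rng.shuffle(pool)
    labeled = [(x, run_experiment(x)) for x in pool[:n_initial]]
    unlabeled = pool[n_initial:]
    for _ in range(n_rounds):
        # Request the candidate the committee disagrees on most.
        query = max(unlabeled, key=lambda x: committee_disagreement(labeled, x, rng))
        unlabeled.remove(query)
        labeled.append((query, run_experiment(query)))
    return labeled, unlabeled


candidates = [i / 20 for i in range(21)]  # hypothetical 1-D descriptor values
labeled, unlabeled = active_learning(candidates)
print(f"measured {len(labeled)} of {len(candidates)} candidates")
```

The key design point is that the model, not a fixed plate layout, decides which experiment to run next. In an automated lab, the call to `run_experiment` would be replaced by a request to the instrument.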

A major remaining bottleneck in the deployment of such feedback loops centers on automation infrastructure and algorithmic robustness. Most high-throughput screening platforms are optimized for scale at the cost of flexibility. For example, they rely on rapidly screening pre-defined compound libraries rather than enabling adaptive cherry-picking of individual experiments suggested by algorithms. Additionally, several of the critical experiments, such as material characterization or even in vivo studies, are difficult to integrate into these automated workflows.





We believe these feedback cycles are most impactful in truly low-data scenarios—like early-stage projects with under 100 datapoints. But building predictive models and enabling them to decide which datapoint to acquire next remains challenging even for the most data-efficient computational approaches. We're addressing this through pairwise learning methods as well as other new active learning developments including yoked learning, where algorithms are paired to work together. There's substantial room for further innovation in automation architecture and experimental design strategies to maximize the impact of integrated laboratories on drug discovery.





KB: Is there anything you can tease about your talk at SLAS 2026?





DR: I'm really excited for what promises to be a stimulating SLAS 2026. There will be a lot of great presentations and discussions around the intersection of automation and AI in drug discovery.

For my talk specifically, I'll be introducing some of these pairwise and active learning concepts we've been developing, along with some new and unpublished developments that I think the community will find intriguing. One highlight is a novel class of algorithms that actually “forget data” strategically to enhance their learning—it seems counterintuitive, but we're seeing some remarkable improvements in how quickly these models converge to better solutions.





I'll be building in concrete examples from our work in drug discovery and nanoparticle design to showcase the practical potential of these algorithms. The goal is to demonstrate how adaptive machine learning can bring better decision-making to every stage of drug development—from early hit identification through formulation optimization.





I'm looking forward to connecting with potential partners and collaborators who are interested in deploying these approaches in their own pipelines. The real breakthroughs will come from getting these tools into the hands of more research teams across academia and industry.