It’s an exciting day when a new piece of kit arrives in the lab. Between postdocs planning assays, PhD students wondering what will happen if they break it, and lab managers working out how to fit it on the benchtop, technological advances affect every person involved in science. But certain advances don’t just promise new capabilities; they threaten to change, from the ground up, how research is conducted in a lab group or company.
Andrew Anderson, Vice President of Innovation and Informatics Strategy at industry software solutions provider ACD/Labs, has had a unique vantage point on how technology affects the analytical chemistry field. In this interview, I ask Andrew how the day-to-day life of a chemist has been changed by technologies like AI and automation, and how budding chemists can get their skillset up to scratch to handle the changing face of analytical science.
Ruairi Mackenzie (RM): How are technological advancements changing the current job specifications for a research scientist?
Andrew Anderson (AA): It’s a great question. If we had talked five years ago, I would have had a different vision than I have today. In the pharmaceutical industry there are some good examples, particularly around commercializing Katalyst D2D. As you may recall, we worked collaboratively with one major pharmaceutical company, and since then with several others. To describe the changes I’ve seen in scientists’ job roles in these companies, I’ll talk about what I’m used to, particularly in chemistry. If you think about the pharmaceutical industry, traditionally therapeutics are made using small molecule technology, and they matriculate through a drug discovery and development process, ultimately into commercialization.
If you go back five years, what we saw, particularly in discovery, was reliance on an external ecosystem of suppliers, contract research organizations and contract development and manufacturing organizations. Those ecosystems are healthy and vibrant. But what we are also seeing now, in my view of the market, is a resurgence of internal investment. That view is based on our immersion in the market, working with clients who bring these different perspectives. What I see is a shift back to investment in core infrastructure, like technology platforms, robotics and automation. I won’t go into the reasons behind that resurgence, but it does feel like the scientists working on really pivotal projects want a shift back to a more balanced portfolio between internal and external work.
I could surmise that there are several reasons for that. One might be the ability to collaborate at the project team level. I don’t think that the technology that supports collaboration has fully replaced the quality of face-to-face collaboration. Certainly there are efforts to use technology to help geographically disparate collaborators work more effectively together. But when you look at the level of cross-functional effort that goes into working through the drug discovery, development and commercialization process, those folks at times need to work very closely together. That’s one reason. The other is, from my perspective, advances in experimentation technology. You can have in-house staff who are today using far more productive technology than they were in the past.
Early adopters of this new technology have realized productivity gains in their drug discovery and development processes. As an example, utilizing high throughput experimentation technology, you can produce materials at a faster rate than you could in the past. Now there are drawbacks, but certainly the high throughput experimentation paradigm is yielding benefits. We do see an uplift in interest in investing in automation, and so you have in-house capabilities that are highly productive. You’re certainly going to continue to leverage externalized resources for work that doesn’t require automation. But I think balancing between the two is really important, and we do see that senior leaders within these organizations are also recognizing the value of that balance. I do see an uplift in hiring of internal staff in parts of the world where major pharmaceutical R&D operations are performed.
What I see is a lot of chemists moving into those organizations. Now, the chemist of today is a different chemist than they were even five years ago.
RM: How has the role of a chemist today changed from five years ago?
AA: Let’s pretend you’re a medicinal chemist in a pharmaceutical organization. Your responsibility is to determine how to optimize a particular lead compound and make it into something that would be nominated for candidacy to clinical development. In the old world your responsibility was largely focused around a fume hood, and your work would often be devoted to design work: understanding what molecule to make based on things like structure–activity relationships. You may even do modeling to determine how a particular molecule might fit into a particular drug target and optimize the molecule based on how it should fit in.
Now, a changing factor is artificial intelligence, where you have machines prescribing what to make. That is a realistic future where you’re using in silico tools to help augment the scientist’s decision making around what to make next.
Following a machine-augmented approach to defining what to make, you now decide how to make it. You’ll want to make these materials and subject them to physical assays, in vitro assays, etc. You would then utilize tools to help prescribe the process of making a particular material. If you have heard anything about the innovation in retrosynthetic analysis and reaction prediction, that is an area where scientists of today will utilize those technologies. What that then implies is that if machine learning or artificial intelligence applications are prescribing what to make and how to make it, presumably that process can be very fast, given the speed of computers and other factors.
The bottleneck then moves to how to make things in parallel and in high throughput, and the next technological innovation is high throughput experimentation, where you’re able to produce more materials faster than you could in the past. Historically, you’d work on one reaction or a small set of reactions at a time; in today’s paradigm, or maybe the short-term future’s paradigm, depending on what company you’re talking to, you can now use automation tools to produce up to 1500 molecules at a time. The rate of going through that traditional trial-and-error process to arrive at a drug development candidate is much faster if you utilize the combination of artificial intelligence for design, artificial intelligence for reaction planning and then automation tools for high throughput and parallel experimentation.
The final thing you’ll want to have is what I’d call no-loss-fidelity decision support interfaces. What a lot of companies are also investing in is looking across all of the data that is generated during those discrete unit operations in the scientific process, from design to execution, to be able to present that data in a holistic fashion to decision makers.
From my perspective, what that means for the scientist is that in addition to their chemistry knowledge, their biology knowledge and their pharmaceutical knowledge, they also need to be able to deal with a lot of data. Part of their job transitions from being a chemist to almost being a data scientist or a data engineer.
RM: Does that mean that today’s analytical chemist will spend less time in the fume hood, or will they be expected to spend the same amount of time in the fume hood, and on top of that analyze data?
AA: I would say that in the future there is no fume hood. What I mean by that is instead of interfacing with what you’d classically visualize as a fume hood, with reaction flasks and the like, the future paradigm, or even the current paradigm, is that you’re walking up to robots that sit inside the fume hood or glove box, and effectively providing machine instructions to the robots, which go and do the work for you. The transition is that the scientists aren’t touching materials any more. You’re essentially providing machines with instructions across the set of unit operations you would execute during the process. That’s certainly different from what you would do even three years ago as a traditional chemist. You’re really interfacing with robotics and digital software interfaces.
RM: How can software help chemists in this new role?
AA: It’s the transcription and translation between systems. Certainly, there is currently a significant amount of human effort (we talked about this data engineering need) to transcribe information from one system to another. A simple example: if I’ve executed a reaction with an automated reactor system, I would say the majority of analysis is performed off the deck. What I mean by that is the reaction deck has a robot, a robotic arm, that dispenses materials into containers; those containers serve as reaction vessels, and those reaction vessels are subjected to different environmental conditions, like heating, stirring or pressurization.
At the conclusion of the experiment, or even during it, you’ll want to perform some sort of analysis to determine how the reaction is going. Oftentimes what that means is the robot will sample, either at the end of the experiment or during it, and create analysis samples. The analysis equipment is usually separate from the reaction equipment.
I need to make sure that the data I generate from the analytical experiment is somehow associated, through the sample provenance, with my reaction experiment. That’s indeed one of the challenges right now: interfacing between these systems. What we’re strong advocates for is making software do that work of transcription and translation for you. What we work on is helping our customers interface the reaction equipment and the analysis equipment by creating digital representations of that sample provenance and then formatting those digital representations so that they can be consumed by, for example, analysis equipment.
A practical example: if I’ve sampled a 96-well plate’s worth of reactions at the end of the experiment, I’m going to sample each reaction and drop it into one of 96 HPLC vials, and then I’d walk over and load those HPLC vials. What I need are identifiers that associate each HPLC vial with its position in the 96-well plate, so that I know which reaction the sample belongs to.
Within our Katalyst application we have identifiers for the reaction plate, and we map those identifiers to the sample plate that you would load onto the system. Furthermore, we prepare a sequence file for those samples in which the sample identifier relationships are accounted for, whether in a comment field, the name of the file or a variety of other ways to make that association. Then, once the data is acquired, Katalyst will read the sample identifier and make the association to the appropriate reaction information.
What that then gives you is a software experience that has all of the reaction information, like what reagents I added and what product I made, with all the analytical data associated to that reaction information. Now I’m able to walk up to a software interface that has all of that information in one place. Traditionally, after this whole experiment, scientists would have to take data from the analytical software package and data from the reaction and make the associations themselves. That can be quite time-consuming work. We’ve reduced that work practically to zero.
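The plate-to-vial bookkeeping Anderson describes can be sketched in a few lines of Python. This is an illustrative sketch only; the identifiers, field names and function names below are assumptions for the example, not Katalyst’s actual data model or API.

```python
# Illustrative sketch of sample-provenance mapping between a 96-well
# reaction plate and HPLC vials. All names are hypothetical, not Katalyst's.
from itertools import product

def plate_positions(rows="ABCDEFGH", cols=range(1, 13)):
    """Yield the 96 well positions A1..H12 in row-major order."""
    for r, c in product(rows, cols):
        yield f"{r}{c}"

def build_sequence(plate_id, reactions):
    """Map each well's reaction metadata to an HPLC vial identifier,
    producing the kind of sequence the chromatography system would consume."""
    sequence = []
    for idx, (well, rxn) in enumerate(zip(plate_positions(), reactions), start=1):
        sequence.append({
            "vial": f"{plate_id}-V{idx:03d}",  # vial loaded on the HPLC tray
            "well": well,                      # source well on the reaction plate
            "reaction": rxn,                   # e.g. reagents, conditions
        })
    return sequence

def associate(sequence, analytical_results):
    """Join acquired analytical data back to reaction info via the vial id."""
    by_vial = {entry["vial"]: entry for entry in sequence}
    return [
        {**by_vial[res["vial"]], "result": res["data"]}
        for res in analytical_results
        if res["vial"] in by_vial
    ]
```

With a mapping like this in place, the manual step Anderson mentions, matching analytical files back to wells by hand, reduces to a dictionary lookup keyed on the vial identifier.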
RM: Will software advances mean that scientists don’t need all this data-handling training, or will they just take a lot of the manual labor of data handling out of the equation?
AA: There are two schools of thought, from my perspective. The first is that you build tightly integrated monolithic systems. There are certain companies that build these very high-end platforms with perfectly integrated monolithic applications: robotics platforms coupled with software, all tightly integrated. While those are great and, in that paradigm, you see less data engineering, because these are monoliths, they’re not modular.
There’s a consequence: if the scientific experiment you’re performing doesn’t fit into the platform, it won’t be supported by it. I’ll give you an example to illustrate the point. Say you had a type of chemistry that required a pressure level the platform couldn’t support. Now you’re relegated back to the fume hood chemistry you would do traditionally; your platform doesn’t support it. The breadth of experiments that you can perform with those monolithic platforms is limited. The analogy I like to use is that it’s like having a house where all you want to do is move the couch, but you have to rebuild the house to do so. The monoliths, while efficient for their intended scope, are very difficult to adapt if the scope changes.
Another trend we see is modular automation. If you need to change a particular element or a unit operation in your automated process, there are plenty of options for that particular unit. It’s the data integration that becomes a burden.
What we try to do is offer the ability to integrate or change different components of the platform, using software and integration tools, to reduce the risk of creating monoliths; the platform is made modular a priori. You do that with effective software integration, and your data gets integrated. As opposed to hard-coding, building software that operates equipment as a monolith, we’re effectively using the software that already exists in each modular component and providing instruction lists between them. That instruction list can be human-delivered or software-delivered; it depends on the modular component’s application programming interface and what it can receive and support.
The point is: I don’t think you’ll ever be able to do away with understanding data engineering entirely, because of that need for modularity. We certainly want to reduce the burden of manual transcription between systems, and we would facilitate an automated, or at least very convenient and efficient, mechanism by which you can translate information, by virtue of automatically reformatting data. If we can reformat that data using software, it greatly reduces the burden on a scientist to transcribe information from one system to another.
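The modular, integrate-via-software approach can be pictured as a thin adapter layer: each instrument keeps its own native software, and a small translator turns a shared instruction format into whatever each component accepts. The classes, commands and instruction fields below are hypothetical, invented purely to illustrate the pattern; they do not represent any vendor’s real interface.

```python
# Hypothetical sketch of modular lab automation via adapters: swapping one
# component means swapping its adapter, not rebuilding the whole platform.

class ReactorAdapter:
    """Translates generic instructions into this reactor's native command."""
    def run(self, instruction):
        return f"reactor: heat to {instruction['temp_C']} C for {instruction['hours']} h"

class HPLCAdapter:
    """Translates generic instructions into this HPLC system's native command."""
    def run(self, instruction):
        return f"hplc: inject vial {instruction['vial']}"

def execute(pipeline, instructions):
    """Drive a chain of modular components with a shared instruction list.
    The instruction list could be human-delivered or software-delivered."""
    return [adapter.run(instr) for adapter, instr in zip(pipeline, instructions)]
```

The design choice mirrors Anderson’s point: the integration burden lives in small, replaceable translators rather than in one hard-coded monolith.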
RM: Do you have any other advice for new chemists coming into a field which has changed so rapidly in the last five years?
AA: I would say that the more experience you have in dealing with predictive applications, the better; that is certainly an important skill set to acquire. The second thing is being able to deal with data using some of the more modern data processing and analysis tools, which is equally important. Finally, from my perspective, because we’re talking about high throughput and parallel experimentation, I can’t help but think that a good understanding of statistics is an important skill set to acquire. The reason is that if you have access to highly scalable reaction equipment, the ability to ensure you’re conducting an effective statistical design of experiments, so that you capture as many variables as possible with the minimum set of experiments, is a really important knowledge set to have. If you can execute 1536 experiments in parallel, it’s probably a good idea to maximize the amount of information you’ll glean from those 1536 experiments, and one way to do that is to utilize statistical design of experiments math. I think that’s an important thing to be aware of. By the way, a lot of AI and machine learning builds on those same statistics, so you get double the benefit: not just in your experiment design but also in the way you analyze data.
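As a concrete flavor of the design-of-experiments idea Anderson raises, a two-level full-factorial design enumerates every combination of factor levels, covering all main effects and interactions in 2^k runs. This is a minimal sketch under assumed, illustrative factors (the names and levels below are invented for the example, not taken from any real screen).

```python
# Minimal two-level full-factorial design sketch. Factor names and levels
# are illustrative assumptions, not a real reaction screen.
from itertools import product

def full_factorial(factors):
    """Enumerate every combination of factor levels.

    For k two-level factors this yields 2**k runs: a structured design
    that captures all main effects and interactions, far fewer runs than
    varying one factor at a time over fine grids."""
    names = list(factors)
    return [dict(zip(names, levels)) for levels in product(*factors.values())]

factors = {
    "temperature_C": (25, 60),
    "catalyst_mol_pct": (1, 5),
    "solvent": ("DMF", "MeCN"),
}
design = full_factorial(factors)  # 2**3 = 8 runs
```

In practice one would usually go further, using fractional factorial or optimal designs to screen many more factors within a fixed plate budget; the same enumeration idea underlies those as well.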
Andrew Anderson was speaking to Ruairi J Mackenzie, Science Writer for Technology Networks