A New Language for Biological Research
Industry Insight Sep 30, 2018 | by Ruairi J Mackenzie, Science Writer for Technology Networks
Whilst all life sciences have benefited immensely from digitization and automation in the last ten years, tools designed specifically for biologists have been somewhat scarce. London-based Synthace are aiming to change that. Built by an interdisciplinary team of computer scientists, biologists, mathematicians, and chemists, Synthace’s flagship product is the software platform Antha, a comprehensive tool aimed at biologists that promises easy automation and integration for their experiments. Intriguingly, the underlying Antha programming language has been made available open source. In this blog, we ask Synthace co-founder and Protocol Engineer Chris Grant how Antha was developed and why it has earned Synthace a “Cool Vendor of the Year” award from Gartner.
Ruairi Mackenzie (RM): Could you tell us about Antha and how Synthace designed and first used it?
Chris Grant (CG): Antha stemmed from our early work, in which we set out to optimize the production of certain biomolecules, proteins and small molecules, using widely used design-of-experiment (DoE) methodologies and genetic factors, harnessing the synthetic biology advances of the time.
We built a series of technologies which abstracted the DoE design from the execution itself, that being the low-level details of how you lay out plates, and how you denote instructions for different robotic instruments.
We were essentially using it in-house as a tool to speed up our own development, because the experiments we were running were very, very complex. It spun out of our own internal requirements, such that we built this series of software tools, which now comprises a language itself.
So Antha is now a high-level programming language, which is designed to be approachable for biologists, and to simplify a lot of those complex, low-level instructions which you need to fully describe the process from the abstract concept of what the process means. For example, if you're doing a construct assembly in which you want to stitch five DNA parts together by cutting them with particular restriction enzymes, and then ligating them with a particular ligase enzyme, the high-level process is essentially, mix these five liquids together with these volumes, and apply some simulation to test whether the actual generated sequences are going to produce a viable product.
Whereas in reality, when you come to run that in the lab, there are all sorts of low-level details which you want to capture but are a burden to manually record: the specific plate used, the specific stocks, what flow rates you're pipetting at, how many times you mix the sample... All of this is tacit knowledge which the experts in the lab who carry out these procedures often have but won't necessarily record.
So, the language was designed to be expressive enough to describe the actual concept of what you're trying to do, and then leave a lot of the low-level details off for the system to decide at run time, so then you can transfer processes to different instruments, where you might map those low-level details in a different way, depending on the instrument. Also, it becomes easier to record what happened in those experiments, because you're not relying on the person to go back and record what they did. It's being recorded as you run the experiment, so you're capturing a lot more of the details which are critical to successfully reproducing a piece of work.
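The separation Grant describes can be sketched in a few lines of Python. This is an illustration only, not Antha code: the `Mix` class, the `POLICIES` table, the instrument names and every parameter value are hypothetical, chosen to show one high-level step being expanded into different low-level instructions depending on the instrument.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Mix:
    """High-level intent: combine named liquids at given volumes (microlitres)."""
    components: dict  # liquid name -> volume in microlitres
    destination: str  # well that receives the mixture


# Hypothetical per-instrument policies: the low-level details that the
# scientist does not write down and the system decides at run time.
POLICIES = {
    "robot_a": {"plate": "96-well PCR", "flow_rate_ul_s": 50.0, "mix_cycles": 3},
    "robot_b": {"plate": "384-well", "flow_rate_ul_s": 20.0, "mix_cycles": 5},
}


def compile_mix(step, instrument):
    """Expand one high-level step into low-level instructions for an instrument.

    The returned list doubles as a record of what was actually done.
    """
    policy = POLICIES[instrument]
    instructions = []
    for liquid, volume in step.components.items():
        instructions.append({
            "op": "transfer",
            "liquid": liquid,
            "volume_ul": volume,
            "to": step.destination,
            "plate": policy["plate"],
            "flow_rate_ul_s": policy["flow_rate_ul_s"],
        })
    instructions.append({
        "op": "mix",
        "at": step.destination,
        "cycles": policy["mix_cycles"],
    })
    return instructions


# The same abstract step, compiled for two different (made-up) instruments.
assembly = Mix(
    components={"part_1": 2.0, "part_2": 2.0, "enzyme": 1.0, "ligase": 1.0},
    destination="A1",
)
log_a = compile_mix(assembly, "robot_a")
log_b = compile_mix(assembly, "robot_b")
```

The protocol author only writes the `Mix` step. Everything else comes from the policy, and the instruction list is kept as the record of the run without anyone writing it up afterwards.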
RM: Is data kept within Antha's cloud, meaning it can be transferred between researchers?
CG: The details would be something that could be transferred. All those low-level details are still captured, either in the system policy or in the metadata, when you come to run an experiment. So that is available to then transfer to different labs or different environments.
RM: How does Antha incorporate machine learning?
CG: It can provide a structured dataset of exactly what you did in one run, and also compare it to previous runs which may have used the same set of conditions, or different conditions, or a campaign of parallel experiments deliberately perturbed under different conditions, so then you can upload this data to cloud or local machine learning platforms. The key thing is that Antha provides that structured dataset that gives you the flexibility to apply the appropriate machine learning for that use case.
A lot of machine learning falls down, really, because it's only as good as the data it was trained upon. If you upload data which has been manually run, or run using different environments, then there's a lot of holes and unknowns which can produce variability.
You get a lot of unexplained behavioral differences that way, whereas the key thing with our system is that we're producing very structured datasets so that you inherently get less noise, and less unknown variability between different datasets. You have a more reliable means to generate useful insights.
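As a rough illustration of what a structured dataset could look like in practice, the sketch below flattens a few run records into a single table that a machine learning tool could read. The record layout, field names and numbers are invented placeholders, not Antha's actual data format or real results.

```python
import csv
import io

# Hypothetical run records: every run carries its conditions, the low-level
# metadata captured during execution, and the measured result.
runs = [
    {
        "run_id": "run-001",
        "conditions": {"temperature_c": 30, "ph": 7.0},
        "metadata": {"instrument": "robot_a", "mix_cycles": 3},
        "result": {"yield_mg_l": 12.4},
    },
    {
        "run_id": "run-002",
        "conditions": {"temperature_c": 37, "ph": 7.0},
        "metadata": {"instrument": "robot_a", "mix_cycles": 3},
        "result": {"yield_mg_l": 15.1},
    },
]


def flatten(run):
    """Turn one nested run record into a flat row with dotted column names."""
    row = {"run_id": run["run_id"]}
    for section in ("conditions", "metadata", "result"):
        for key, value in run[section].items():
            row[f"{section}.{key}"] = value
    return row


def to_csv(records):
    """Serialize run records as CSV text; assumes all records share one schema."""
    rows = [flatten(record) for record in records]
    buffer = io.StringIO()
    writer = csv.DictWriter(buffer, fieldnames=list(rows[0]))
    writer.writeheader()
    writer.writerows(rows)
    return buffer.getvalue()


table = to_csv(runs)
```

Because every run shares the same columns, runs from different days or sites can be compared directly, which is the property the answer above relies on.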
RM: Zooming out from Antha, tell me about how Synthace came together.
CG: Back in 2011, it formed in the labs of University College London (UCL). Three of us, Sean Ward, Markus Gershater and myself, were all in different departments in UCL. We came together, and we all had a sort of common philosophy about how we felt biology and process optimization should be carried out, all learning from each other's disciplines.
We started a company which applied DoE methodology, and quality by design approaches, to develop processes for high value molecules, and later service work for particular customers, to work on problematic processes for them. Applying these techniques manually was very troublesome to do at the scale of experiments we wanted to do.
So, we decided very early on that automation was the way to go. We had access to a few different types of robots, so we built this interface which could translate DoE designs from one, and then compile down the instructions necessary for several different robots. And that's how Antha was born, really.
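To make the idea of translating a DoE design into something a robot can run more concrete, here is a minimal sketch. It assumes a full factorial design, which is the simplest DoE layout and not necessarily the one Synthace used, and the factor names, levels and plate dimensions are hypothetical.

```python
from itertools import product

# Hypothetical factors and levels for a small screening experiment.
factors = {
    "temperature_c": [25, 30, 37],
    "inducer_mm": [0.1, 1.0],
    "ph": [6.5, 7.5],
}


def full_factorial(factors):
    """Return every combination of factor levels as a list of run dicts."""
    names = list(factors)
    return [dict(zip(names, levels)) for levels in product(*factors.values())]


def assign_wells(design, rows="ABCDEFGH", columns=12):
    """Map each run to a well on a plate, filling row by row (A1, A2, ...)."""
    if len(design) > len(rows) * columns:
        raise ValueError("design does not fit on one plate")
    layout = {}
    for index, run in enumerate(design):
        well = f"{rows[index // columns]}{index % columns + 1}"
        layout[well] = run
    return layout


design = full_factorial(factors)  # 3 x 2 x 2 = 12 runs
layout = assign_wells(design)     # fills wells A1 to A12 of a 96-well plate
```

A compiler for a specific robot would then turn each well's conditions into that instrument's own instructions, so the design is written once and reused across machines.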
RM: You’ve been named a Cool Vendor of the Year by Gartner – why did you win this award?
CG: I think they recognized that this was an area of the industry which was ripe for modernization, really. There aren’t many other players in this space doing exactly what we're doing. There are a few over in the States trying to build robotic cloud labs, which attempt to run similar processes: you have a workcell that runs other people's processes and generates structured data which you can then learn from.
Our approach is to be deployed into existing pharmaceutical labs, so it could work on their own equipment, and it would be more conducive to protecting your own IP, and more flexible to either running it in house, or passing it over the fence to a different department, or a different site in another part of the world. Or having that well-defined characterization of your process to then pass across to a contract manufacturer.
We know the troubles that the pharmaceutical industry faces, like declining efficiency of R&D and rising healthcare costs. I think it was recognized that our approach was one of those technologies that could really help out the industry.
RM: Your approach is quite unique - what do you think it is about biological data that has maybe scared others off from focusing on this field?
CG: I think it's the inherent noise, really. With a lot of the statistical methods applied in other industries, you can make a lot of headway because you can control the set points much more easily than you can in biology. If you set something to a pH of 7.1, then you can be quite confident it's set to 7.1. In a biological system, you've got a heterogeneous mixture of cells. Even in the most well-studied models, like E. coli and yeast, there are a huge number of unknowns.
Additionally, the rate at which you can iterate is a lot slower, particularly when you come to mammalian cells, where experiments are very expensive to run. It can take months to generate variants, and even the most sophisticated analytical methods you have access to, like LC-MS, Q-ToF and Orbitrap, will only have a certain amount of resolution, so you're not getting the full picture.
So, I think the complexity of the problem is a barrier, and the cost involved is extortionate. There aren’t many people that can afford to do the breadth of experimentation needed in order to really tackle these problems which the industry faces.
RM: You've made the base language to Antha available open source. What was behind that decision?
CG: Antha’s success will partially be governed by adoption of the language. If it's open source, other people can contribute to the language. And basically, the research which underpins the discovery of new drugs often originates from university research, basic research.
And if people in universities have access to tools which can record what they did more thoroughly, then I think that research translates into industry. You'll get a better success rate, instantly. I think it's in everybody's interest to make these tools more widely available, from a real grassroots perspective. I mean, the pharma industry has its own methods to achieve reproducibility and quality by design, but they're very rigorous and quite onerous for people to actually carry out.
Obviously, they're not going to expect basic research in universities to be done to GMP standards, but you almost kind of need that, in a way, to really make sure a lot of research is genuinely reproducible. If you can get something closer to the kind of rigor you would get in a pharmaceutical setting, but without the overheads and inconvenience of filling out huge amounts of paperwork, essentially capturing a lot of this information for free through the automation software, then I think it's really in everybody's interest, and in our interest as well.
Chris Grant was speaking to Ruairi J Mackenzie, Science Writer for Technology Networks