Automating Research to Improve Reproducibility and Throughput
For several decades, researchers have looked to automate tedious and error-prone manual steps carried out in the laboratory, with the goal of improving scientific reproducibility and throughput. We recently spoke to Charles Fracchia, CEO and Co-Founder of BioBright, to learn how automation can be adopted to help researchers analyze their data. Charles discusses the challenges to consider when performing analysis in an automated fashion, and also highlights the value of your data and the importance of cyber security.
Laura Lansdowne (LL): How is data revolutionizing the way we do science?
Charles Fracchia (CF): Up until now, pretty much all of science has been process driven. In particular, at the beginning this process was very manual: the subject selection step was performed manually, the observation was done manually and the analysis was done manually.
We have made a tremendous amount of progress in science, and automation has really helped, particularly with the first two steps (subject selection and observation), but the last frontier, if you will, is the ability to perform analysis in a completely automated fashion.
That concept was unthinkable a few years ago. And now we have wonderful technologies, like machine learning and artificial intelligence (AI), that can help us automate analysis.
We have a volume of data that's unprecedented and we now have the capability of computation that's unprecedented.
That is how data is primarily changing the way we do science – we are going from a process-driven approach to a data-driven approach. It is turning the whole scientific process on its head. Instead of saying, “I'm going to do A, B and C, and then trust the results,” researchers are adopting a data-driven process: “I have A, B and C pieces of data… what is that telling me, and what other data do I need to collect?” – this inversion can pose a lot of challenges.
For example, if you're not careful when controlling your data, or if you are careless when collecting your data, your experiment may become completely worthless. The reproducibility crisis is a phenomenon related to this, and it is costing the US economy an estimated $28 billion each year. We often see situations nowadays where scientists are drowning in data with no means to handle its volume and complexity, leading to a tremendous waste of time and resources.
LL: You have explained that both the subject selection and observation steps have been automated for quite some time, whereas the analysis step is still primarily manual. Why do you think that is?
CF: It is a more challenging step; traditionally, it's the step that humans have done. And to this day, we still do it, right? A lot of people would like to think that automating the analysis step is just a matter of throwing data into an AI platform and “voilà”, magic happens and meaningful insights appear. That's understandable – we're amidst the hype cycle, where there's a lot of promise. But the reality is that automation is just another analysis method; it shifts the role of the human from one where they are absolutely the bottleneck to one that is more supervisory.
There are notions in automation in other fields – not in the biomedical space yet, but it's coming – of the human in the loop versus the human on the loop, whereby the loop keeps turning and the human is positioned on the loop, supervising the process. This is now commonplace in car manufacturing, network security and financial services, but we are bringing these principles to the biomedical field.
Clearly, we're not quite there yet. After every run, even if it's a high-throughput run, a human still does the analysis; however, they're playing a narrower and narrower role, which I think is good. This new automation step, which reduces error and increases throughput, is a welcome transition that will bring us closer to automated analysis.
LL: You spoke about the fact that, in the lab workflow, the analysis step can really benefit from machine learning; however, you touched on some of the pitfalls and things to watch out for. Can you highlight some of those?
CF: There are three key pitfalls:
1. The hype cycle
2. Cyber security
Cyber security is paramount if your process relies on all this data and all this training, which feeds into a machine learning algorithm, which then feeds you an outcome. A competitor or a user with malicious intent could otherwise spike that data – there are tons of examples of that happening in other fields.
3. Human usability
LL: Cyber security is something that you are particularly aware of. Why are people not getting savvy to the fact that this is something they really need to take notice of, especially as there are examples of how damaging this can be in other fields?
CF: I think, culturally speaking, there's been a very wide chasm between computer science and biology, and while that is shrinking rapidly, there is still a gap. Not only that, even if you focus specifically on the computer science side, it doesn't necessarily get cyber security spot on. I mean, we've had enormous crises, recent crises, that we're still trying to fix now. We are certainly still learning as a field.
As I mentioned previously, cyber security is one of our priorities, and this goes all the way back to how we were founded: we received a grant from the Defense Advanced Research Projects Agency (DARPA) to start our own company, and I jumped out of my PhD program to become the CEO of BioBright. That is an example of an area where security has been a really important concern for years, and one that has a lot more experience with autonomous systems compared to other industries.
We must be equipped to avoid situations similar to those where hackers break into power distribution networks and sometimes shut down parts of a country, right?
We can't have that happen to the biomedical industry because...
It just simply cannot happen.
LL: Do you believe every lab can benefit from automation?
CF: Good question – I certainly think so, if you take automation in the broader sense. I do feel we are moving towards a more data-centric approach; that said, even if the whole process remains manual, putting automation of data analysis in place still offers huge gains.
So largely I agree. Although, to the point of the question, you know, automation is not a panacea; it is something that has to be applied correctly and thoughtfully, in a way that enhances the scientist.