Automating Research to Improve Reproducibility and Throughput


Published: July 5, 2019

Laura Elizabeth Lansdowne



For several decades, mankind has looked to automate tedious and error-prone manual steps carried out in the laboratory, with the goal of improving scientific reproducibility and throughput. We recently spoke to Charles Fracchia, CEO and Co-Founder of BioBright, to learn how automation can be adopted to help researchers analyze their data. Charles discusses the challenges to consider when performing analysis in an automated fashion. He also highlights the value of your data and the importance of cyber security.

Laura Lansdowne (LL): How is data revolutionizing the way we do science?

Charles Fracchia (CF): Up until now, pretty much all of science has been process driven. In particular, at the beginning this process was very manual. The subject selection step was performed manually, the observation was done manually and the analysis was done manually.

We made a tremendous amount of progress when it comes to science, and automation has really helped, particularly with the first two steps (subject selection and observation) but the last frontier, if you will, is the ability to perform analysis in a completely automated fashion.

Figure 1: A scientific method involves three steps: subject selection, observation and analysis. Whilst tremendous efforts have led to the automation of the first two steps, efforts to successfully automate the final step (analysis) are still ongoing. Credit: Charles Fracchia, BioBright.

That concept was unthinkable a few years ago. And now we have wonderful technologies, like machine learning and artificial intelligence (AI), that can help us automate analysis.

We have a volume of data that's unprecedented and we now have the capability of computation that's unprecedented.

That is how data is primarily changing the way we do science – we are going from a process driven approach to a data driven approach. It is turning the whole scientific process on its head. Instead of saying, “I'm going to do A, B and C, and then trust the results,” researchers are adopting a data driven process: “I have A, B and C pieces of data… what is that telling me and what other data do I need to collect?” This inversion can pose a lot of challenges.

For example, if you're not careful when controlling your data, or if you are careless when collecting your data, your experiment may become completely worthless. The reproducibility crisis is a related phenomenon to this, and it is costing the US economy an estimated $28 billion each year. We often see situations nowadays where scientists are drowning in data with no means to handle the volume and complexity of this data, leading to a tremendous waste of time and resources.

LL: You have explained that both the subject selection and observation steps have been automated for quite some time, whereas the analysis step is still primarily manual. Why do you think that is?

CF: It is a more challenging step. Traditionally, it's the step that humans have done. And to this day, we still do it, right? A lot of people would like to think that automating the analysis step is just a matter of throwing data into an AI platform and “voilà”, magic happens and meaningful insights appear. It's normal; we're amidst the hype cycle where there's a lot of promise. But the reality is that automation is just another analysis method, one that shifts the role of the human from being absolutely the bottleneck to being more supervisory.

There are notions in automation in other fields – not in the biomedical space yet, but it's coming – of the human in the loop versus the human on the loop, whereby the loop is turning around and the human is positioned on the loop, supervising the process. This is now commonplace in car manufacturing, network security and financial services, and we are bringing these principles to the biomedical field.

Clearly, we're not quite there yet. After every run, even if it's a high-throughput run, a human is analyzing; however, they're playing a narrower and narrower role, which I think is good. This new automation step, which reduces error and increases throughput, is a welcome transition that will bring us closer to automated analysis.

LL: You spoke about the fact that in the lab workflow the analysis step can really benefit from machine learning; however, you touch on some of the pitfalls and the things to watch out for. Can you just highlight some of those?

CF: There are three key pitfalls:

1. The hype cycle

We need to be careful of the hype cycle to ensure people don't jump to conclusions when it comes to machine learning. It isn’t a black box that will solve all our woes, and it's not a silver bullet.

This phase has been seen in other areas – self-driving cars are a great example.

We have to go through that hype cycle. So, understanding that machine learning is not something that is going to just magically solve your problems is really important. Understanding when and how it is applicable is extremely important.

2. Cyber security

Something that is very close to BioBright’s heart is cyber security. When you move to a data centered process, your data is everything. It holds all the value. I’d like this to become more front and center in our public dialogue. This is something that we at BioBright are very concerned about, because other industries have made that mistake before. So we're actively working to position that at the forefront of what we do. For example, our platform DarwinSync is fully encrypted, and it was designed to be so from day one, due to our origins being funded by DARPA and the Department of Defense. It is essential that data be encrypted at rest, that it travels on secure channels and that the whole system be designed to have multiple failsafes, to minimize the impact of a breach.

Cyber security is paramount if your process relies on all this data and all this training, which then feeds into a machine learning algorithm, which then feeds you an outcome. A competitor or malicious user could otherwise spike that data – there are tons of examples of that happening in other fields.

3. Human usability

No matter how automated a process gets in this field, there is always a human in the loop. Even in the companies that have the highest level of automation, they still have humans interpreting the data and deciding which direction to take in their experiments. Unfortunately, in our field a lot of the solutions that are around don’t incorporate human usability into their design. If you look at a lot of proprietary software, the function is often good, but it’s barely usable. And most of the time, the vendor is more preoccupied with locking you down than giving you the tools and interfaces to make the most of the data it generates.

LL: Cyber security is something that you are particularly aware of. Why are people not getting savvy to the fact that this is something they really need to take notice of, especially as there are examples of how damaging this can be in other fields?

CF: I think, culturally speaking, there's been a very wide chasm between computer science and biology, and while that is shrinking rapidly there is still a gap. Not only that, even if you focus specifically on the computer science side, it doesn't necessarily get cyber security spot on. I mean, we've had enormous crises, recent crises, that we're still trying to fix now. We are certainly still learning as a field.

As I mentioned previously, cyber security is one of our priorities, and this goes all the way back to how we were founded: we received a grant to start our own company from the Defense Advanced Research Projects Agency (DARPA), and I jumped out of my PhD program to become the CEO of BioBright. This is an example of where security is a really important concern – something that it has been for years – and they have a lot more experience in autonomous systems compared to other industries.

We must be equipped to avoid situations similar to those where hackers are breaking into power distribution networks and sometimes shutting parts of a country down, right?

We can't have that happen to the biomedical industry because...

It just simply cannot happen.

LL: Do you believe every lab can benefit from automation?

CF: Good question – I certainly think so if you take automation as the broader point. I do feel we are moving towards a more data centric approach. That said, even if the whole process remains manual but automation of data analysis is put in place, there are still huge gains to be had.

So largely I agree. Although, to the point of the question, you know, automation is not this panacea, it is something that has to be applied correctly and thoughtfully, in a way that enhances the scientist.

Charles Fracchia was speaking to Laura Elizabeth Lansdowne, Science Writer for Technology Networks.

Meet the Author

Laura Elizabeth Lansdowne

Managing Editor

Laura Lansdowne is the managing editor at Technology Networks. She holds a first-class honors degree in biology. Before her move into scientific publishing, Laura worked at the Wellcome Sanger Institute and GW Pharma.
