More Tools and More Time: An Easier Future for Bioinformaticians?
Jul 09, 2018 | by Ruairi J Mackenzie, Science Writer for Technology Networks
In the second of a two-part blog series, we talk with Genestack CEO Misha Kapushesky about the developments in bioinformatics that he’s most excited about, and the latest additions to Genestack’s bioinformatics platform.
Ruairi Mackenzie (RM): You were recently presenting at Bio-IT World 2018. What are the developments within the bioinformatics industry that you are most excited about?
Misha Kapushesky (MK): I’d say that there are three developments that I’m really excited about. One of them is the elephant in the room, which you cannot really ignore: the possibilities that are opened up by advances in machine learning and associated tools. It’s really exciting stuff, and I think that you can see that in the collaborations that are happening between pharma companies and young start-ups like ours, and in the resources these companies put into developing ways to bring diverse data together to make better predictions.
At first, machine learning was used in areas where you had imaging data, which is rich in volume and fits well into the sort of classic machine learning formulation or categorization. But now we’re talking about conversational interfaces, we’re talking about large-scale networks of data and extracting knowledge out of them. For all of this to work, you need significant human input to keep data and metadata clean and organized; it’s not just about having more data, but about having good-quality data. This is a pretty important development; I think it’s an interesting evolution where some of the routine things that humans have been doing will be taken up by algorithms, and this will free up humans to do exciting intellectual things.
Another development that I’m excited about is that, up until recently, multi-omics and omics data generally belonged purely to the research domain. Now it’s being recognized that the clinical domain is interested in omics data, so clinical trials now collect it. When you talk about a new drug’s development, you start to look at patient omics profiles, and again you’ll see what used to be translational medicine now edging even further into the clinical part of the drug development cycle.
The third thing is that it’s exciting to see that the precision of what we are measuring about our genomes is increasing. You look at the origins of genomics and genetics: we had Mendel, who would classify peas as either wrinkled or smooth; that was the extent of what we knew. Then we grasped the existence of genes and were able to understand that there is such a thing as mRNA, which corresponds to gene activity, and we were able to put it on a gel and get a readout: “bright versus not very bright”. So that was already something. Then microarrays came along, and we were able to measure this on a log scale, to say with clarity that “this is four-fold brighter, so we have four times more expression”. Then, we invented sequencing. But all of this was, you know, fairly large-scale measurement. Now we’re talking about single-cell measurement, which was an exciting newcomer, let’s say a year or two ago. Now we’re getting routine data sets of millions of cells, where you can get an idea of what’s going on in the genetics at the level of an individual cell. And I think that the next step is to analyze a single cell without taking it out of the body, and to do so over time. We will have a complete picture of what each cell in your body is doing over your lifetime. That’s kind of the holy grail. We’re not there yet, but that’s the goal.
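Editor's note: the log-scale arithmetic mentioned above can be illustrated with a short sketch. This is not from the interview. It assumes the common convention of comparing intensities as log2 ratios, under which a four-fold difference corresponds to a log2 fold change of 2. The function names and the intensity values are made up for illustration.

```python
import math


def log2_fold_change(sample_intensity, reference_intensity):
    """Return log2(sample / reference) for two positive intensities."""
    if sample_intensity <= 0 or reference_intensity <= 0:
        raise ValueError("intensities must be positive")
    return math.log2(sample_intensity / reference_intensity)


def fold_change(log2_fc):
    """Convert a log2 fold change back to a linear fold change."""
    return 2.0 ** log2_fc


# Hypothetical spot intensities: the sample is four times brighter.
lfc = log2_fold_change(800.0, 200.0)
print(lfc)               # 2.0
print(fold_change(lfc))  # 4.0
```

On this scale a doubling and a halving are symmetric (+1 and -1), which is one reason log ratios are convenient for expression data.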
RM: At Bio-IT World, you were presenting your Expression Data Miner module, a transcriptomics tool. Why have you decided to focus on this omics data type with your latest module?
MK: There are several reasons. It has less to do with transcriptomic data somehow being more difficult to handle than other data, and more to do with the relationship between the biologist, the researcher, the bioinformatician and the data manager. Currently, to some degree, all omics data is siloed, pretty difficult to find and pretty difficult to organize.
One of the first things that we have developed is a single place where, in this case, all transcriptomics datasets from across all public sources are searchable. If you have your own collection of this data type, you can also bring it in. This is just the first module; the modules that we have underway will tackle additional omics data types.
In drug discovery, one of the first things you do is identify and validate targets. Transcriptomics data, which lets you understand gene function, is going to be one of the first data types that you look at. You want to know if your target gene is expressed in a particular tissue and only in that tissue; in other words, if you want to target it in the lungs, you don’t also want this gene to be active in the brain, the heart and elsewhere, so that you can minimize off-target effects. You also want to make sure that the gene is only active during the disease condition, as opposed to all the time. Having your information well-organized and easy to find lets you be quite confident, at certain stages, about whether to pursue the target or not, and it supplies an important chunk of the information required to decide on a target.
In this decision-making process, there is a big role for a bioinformatician and a data manager. We can save these people a huge amount of the time and money usually spent aggregating data collections and pushing them through an analytical pipeline, essentially to answer more or less the same questions that arise time and again in these early stages of drug discovery.
The idea for us is to free up the time of the bioinformatician and data manager and give the task of identifying and interpreting gene activity directly to the biologist, meaning the bioinformatician and data manager can spend their time on building more complex, interesting analytics than is commonly the case.
I would also add that the technology that is available to us today means that we can make things that are truly interactive, truly real-time and really graphically appealing, and we can capitalize on new technology developments purely from a user experience point of view, so it’s an exciting time.
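Editor's note: the target check Kapushesky describes earlier in this answer (expressed in the intended tissue only, and active only in the disease condition) can be sketched in a few lines. This is a toy illustration under stated assumptions, not Genestack's method or the Expression Data Miner API. The gene names, expression values, the `is_candidate_target` helper and the fixed threshold are all hypothetical; a real analysis would use proper statistics rather than a single cutoff.

```python
# Hypothetical expression values (arbitrary units); not real data.
EXPRESSION = {
    "GENE_A": {
        "healthy": {"lung": 1.0, "brain": 0.5, "heart": 0.8},
        "disease": {"lung": 40.0, "brain": 0.6, "heart": 0.9},
    },
    "GENE_B": {
        "healthy": {"lung": 35.0, "brain": 30.0, "heart": 28.0},
        "disease": {"lung": 38.0, "brain": 31.0, "heart": 29.0},
    },
}


def is_candidate_target(profile, target_tissue, threshold=10.0):
    """Return True if the gene is 'on' (>= threshold, an assumed cutoff)
    in the target tissue during disease, 'off' in every other tissue
    during disease, and 'off' in the target tissue when healthy."""
    disease = profile["disease"]
    healthy = profile["healthy"]
    on_in_target = disease[target_tissue] >= threshold
    off_elsewhere = all(
        value < threshold
        for tissue, value in disease.items()
        if tissue != target_tissue
    )
    off_when_healthy = healthy[target_tissue] < threshold
    return on_in_target and off_elsewhere and off_when_healthy


candidates = [
    gene for gene, profile in EXPRESSION.items()
    if is_candidate_target(profile, "lung")
]
print(candidates)  # ['GENE_A']
```

Here GENE_B is rejected because it is active in brain and heart as well as lung, which is the off-target concern raised in the interview.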
Misha Kapushesky was speaking to Ruairi J Mackenzie, Science Writer for Technology Networks