Optimizing Data Visualization and Analysis for Biologics Discovery
Industry Insight Mar 14, 2018 | by Laura Elizabeth Mason, Science Writer, Technology Networks
Dotmatics recently announced the new release of their data visualization and analysis solution, Vortex, with new bioinformatics features that are tailored to support biologics discovery. We caught up with Andrew LeBeau, PhD, Dotmatics Senior Manager of Biologics Marketing, to learn more about Dotmatics and Vortex. Andrew discusses the challenges associated with harnessing large data sets and the implications that could arise from the inability to sufficiently access, share and analyze scientific data in terms of biopharmaceutical and small molecule discovery.
Laura Mason (LM): Could you tell me more about Dotmatics?
Andrew LeBeau (AL): Dotmatics is a scientific software company that provides cheminformatics and bioinformatics solutions to pharmaceutical, biotechnology, academic, food and beverage, oil and gas, and agrochemical customers. Dedicated to supporting scientific discovery, the company’s products are used mostly in the research phase (e.g. from target identification to lead optimization, when used for drug discovery). The founders of the company, both former scientists at Merck in the UK, are CEO Steve Gallagher and CTO Alastair Hill. The company started by identifying and solving a key challenge faced by scientists: being able to aggregate and view all the relevant data for a project so that they can make properly informed decisions in a timely fashion. The initial Dotmatics product, Browser, solved this problem, allowing siloed data to be aggregated, browsed, and shared without having to migrate it from legacy systems. From this initial success, Dotmatics has built a complete suite of integrated informatics products, including an electronic laboratory notebook (Studies Notebook), an assay data management system (Studies), registration for small molecules (Register) and biopharmaceuticals (Bioregister), analysis and visualization (Vortex) and workflow management (Cascade).
LM: The volume of data being generated in laboratories is increasing dramatically. Could you touch on the challenges researchers face when it comes to accessing, managing and analyzing large data sets?
AL: The drug discovery industry has traditionally been much more effective at storing data (in databases and other repositories) than it has been at extracting the data and making use of it! As mentioned previously, the very rationale for Dotmatics was to solve this problem, and so the full suite of products is fundamentally built around accessing and using data. Many other vendors in this space take the opposite approach: they focus initially on storing data, and accessing it is a secondary consideration. Drug discovery projects by their nature are complex and span multiple years, meaning that a lot of data are generated over an extended period of time. The data are generated by multiple teams, and increasingly by multiple organizations as more and more work is outsourced to Contract Research Organizations (CROs). The result is that large volumes of project data are spread across multiple vendor and home-grown systems, some of which may be upgraded or converted to different vendor solutions over the course of an individual discovery project. Aggregating the data can be a huge challenge without a system built for that purpose. This problem is exacerbated for biopharmaceuticals, compared with small molecules, because the compounds and associated entities are so much larger, which can strain the ability of traditional software applications to analyze the data and present it to the user with good performance. Scientific research is a highly intellectual process requiring focused minds. If an analysis takes many minutes or hours to complete, it interrupts that intellectual pursuit and slows down the entire process.
LM: One of your key areas is ‘biologics discovery’. Specifically thinking about biopharmaceutical research and development, what implications could arise from the inability to sufficiently access, share and analyze scientific data?
AL: The ability to access, share and analyze scientific data is a huge issue for the industry, both for biopharmaceuticals and small molecule discovery. The industry as a whole has been much better at getting data into databases and other storage systems than it has at getting that data out again, which is what scientists need in order to actually use it in their research. Addressing this industry-wide challenge remains a key strategy in our roadmap. Dotmatics creates informatics solutions that are capable of aggregating data from multiple sources and presenting it in a consumable form to researchers. The drug discovery process is fundamentally an iterative process of experimentation aimed at understanding and refining candidate compounds, always attempting to narrow the set of compounds to those that are the most active and have the fewest potentially harmful side effects. When researchers are not able to view and analyze all the available data, they are basing their decisions about the next round of experimentation on incomplete and/or poor-quality information. Sometimes the data are simply not accessible. In other cases, scientists are able to find the data, but it takes too long, and there is always pressure to progress the project. This can lead to experiments being repeated unnecessarily, wasting resources and adding to the cost of the project. The ultimate implication is that research and development is slowed, meaning it takes longer to get drugs to market. Since patent protection starts while the drug candidates are still in development, every day of patent protection spent in development is a day lost from the period after the drug is approved and on the market, before generic molecules (biosimilars) can enter the market to compete with the original biopharmaceutical. This can easily amount to millions of dollars of lost revenue every day the drug is not on the market.
LM: What is Vortex? Could you tell us more about the benefits of using this solution?
AL: Vortex is Dotmatics’ solution for advanced analysis and visualization. While other parts of the Dotmatics suite provide basic analysis and presentation capabilities, Vortex has the horsepower to perform complex computations and visualize data in a myriad of different chart types. This includes charts that are highly optimized for scientific data. A key feature of Vortex is that it is built specifically to handle scientific data types. So not only does it understand numbers and text, but it is “scientifically-aware” and natively understands what a molecular structure is, and what a biological sequence is. This is essential for scientists, as they need their tools to work inherently in the lingua franca of the industry, and should not feel that they have to compromise when trying to interpret their data. Vortex has been used by chemists in pharmaceutical companies for many years, and in the last few years Dotmatics has significantly extended its capabilities around biopharmaceuticals, to parallel the increasingly important role that biologics play in the overall drug discovery industry.
As part of this initiative, much of the very low-level code was rewritten from first principles to allow extremely high performance when working with large biologics sequences, such as a whole human genome, or when working with millions of sequences. The performance improvements are both in terms of computational speed, allowing billions of sequence comparisons to be made in less time than it takes to make a cup of coffee(!), and in terms of displaying large volumes of data with no discernible lag as users scroll through huge datasets, all on a standard business laptop. These performance enhancements, combined with Vortex’s comprehensive range of analytical capabilities, mean that researchers can focus on doing innovative science.
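To give a sense of what a single "sequence comparison" involves, the following is a minimal Python sketch. It is not Vortex code and makes no claim about how Vortex works internally; the function names `percent_identity` and `all_pairs_identity` are hypothetical, and the sketch assumes the sequences have already been aligned to the same length.

```python
from itertools import combinations


def percent_identity(seq_a: str, seq_b: str) -> float:
    """Percentage of positions at which two pre-aligned sequences match."""
    if len(seq_a) != len(seq_b):
        raise ValueError("sequences must be aligned to the same length")
    if not seq_a:
        return 0.0
    matches = sum(a == b for a, b in zip(seq_a, seq_b))
    return 100.0 * matches / len(seq_a)


def all_pairs_identity(sequences: dict[str, str]) -> dict[tuple[str, str], float]:
    """Compare every unordered pair of named sequences (illustrative helper)."""
    return {
        (name_a, name_b): percent_identity(sequences[name_a], sequences[name_b])
        for name_a, name_b in combinations(sequences, 2)
    }
```

Comparing every pair among n sequences takes n(n-1)/2 comparisons, so the count grows quadratically with the size of the collection, which is how the number of comparisons can reach the billions.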
LM: You recently announced a new release of Vortex. The announcement references new bioinformatics features that are tailored to support biologics discovery. What are these new features and how can they aid researchers working in this field?
AL: There has been a considerable focus on extending the biologics discovery capabilities of Vortex in the last couple of years. Vortex can display biological sequences, richly annotated with information that allows scientists to understand the relevance and function of the molecules they represent. Users can easily find key stretches of the sequence that confer functionality, and edit the sequences to perform “what if” experiments and understand how changes to the sequence can improve the potential of their drug candidates. There are new tools in Vortex that specifically support development of antibody drugs. With these performance improvements in place, Vortex can now be used to conduct very advanced analyses where the sequence of a drug candidate can be associated with its activity, allowing characterization of the relationships between form and function. These approaches have been used for several years in small molecule drug discovery, but are novel for biologic drugs.
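As a rough illustration of what associating sequence with activity can mean, the Python sketch below groups variants by the residue found at one position and averages their measured activity. It is an assumption made for illustration only, not a description of Vortex's method; the function name `mean_activity_by_residue` and the input layout are hypothetical.

```python
from collections import defaultdict
from statistics import mean


def mean_activity_by_residue(variants, position):
    """Average activity of variants grouped by the residue at one position.

    variants: iterable of (aligned_sequence, activity) pairs (hypothetical layout).
    position: 0-based index into the aligned sequences.
    """
    groups = defaultdict(list)
    for sequence, activity in variants:
        groups[sequence[position]].append(activity)
    return {residue: mean(values) for residue, values in groups.items()}
```

Repeating this over every position shows which substitutions tend to accompany higher or lower activity, which is the same basic idea as the structure-activity analyses long used for small molecules.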
In summary, Dotmatics remains highly committed to advancing science and providing tools to facilitate and accelerate discovery.
Andrew LeBeau, PhD was speaking to Laura Elizabeth Mason, Science Writer for Technology Networks.