Today’s Challenge: Managing Big Data
Now that we have it, what do we do with it?
There are some striking parallels between where society stands today in the digital revolution and where it stood in the electrical revolution a hundred years ago. Within the first decade of the 20th century, an entirely new infrastructure, from telephone networks to roads to indoor plumbing, was being built, and the way people lived and worked was changing profoundly. A similar sea change is under way in the early years of the 21st century, as the promise of digital technology blossoms into new applications that seem to evolve before our eyes.
The recent Molecular Med Tri-Con 2014 conference, held February 9-14 in San Francisco, CA, offered a wealth of insightful presentations on driving change and shaping the future of medicine. Tri-Con offered channels and symposia on diagnostics, clinical, cancer and, of course, informatics topics, with focused tracks within each channel that zeroed in on specific areas of interest. The informatics channel covered Bioinformatics for Big Data, Integrated R&D Informatics and Knowledge Management, and Genome and Transcriptome Analysis. No one can attend more than a handful of the presentations, but the ones this editor attended were excellent, offering case studies, panel discussions and insightful commentary on how to deal with current laboratory data management challenges.
Data Management Challenges
In a nutshell, the problem is not just the volume of data being generated but the difficulty of searching, retrieving and sharing it. Conference presentations included numerous case studies on how different organizations, from big pharma to small biotech labs, solved various aspects of their data management and workflow challenges.
There is clear recognition that managing big data is ultimately tied to lab workflows and that the way things are done in the lab needs to change. But, and it’s a big but, several presenters emphasized that organizations must also recognize staff comfort levels with change, and that new systems and procedures need to be easy to use, reasonably intuitive and as non-disruptive as possible.
A panel on data integration and sharing argued that the key to the entire big data conundrum is making the data useful. Panelists agreed that this is easier said than done, because the data must be tailored to different discovery teams, each with its own needs.
Research Goals are Changing
As researchers are well aware, the more we know, the more we need to know. As one panelist put it, “this makes the real challenge behind data not one of generating it, but of understanding it.” In research, this challenge is compounded by a shift in goals: away from big blockbuster drugs and toward personalized medicine, a more elusive and (so far) less lucrative end game.
The focus is no longer on finding a single marker of a disease state and addressing it with a specifically targeted drug. Instead, the challenge has broadened and deepened: it now extends beyond genetics into narrower lines of inquiry in many directions, such as epigenetics.
Part of this challenge is filtering the information so that researchers can focus on the most important attributes. That means capturing data digitally and automating the handling of entity, user and investigation information to drive better visualization and analysis. The emerging solution is network-centric, in contrast to the hardware-centric and software-centric approaches of the past.
Digital Infrastructure is Complex
The complexity of the digital infrastructure that supports networking, sharing and analyzing data is a large part of the challenge right now. Add the constant innovation in laboratory hardware (the instruments) and software (the systems), and you have a moving target that has mushroomed over the past decade. In particular, it is not enough to make the various databases accessible to many more people; extending them and integrating them into a shared knowledge network is a significant part of the big data challenge.
Dr. Vinod Kumar’s presentation, “Harnessing Big Data to Accelerate Drug Development,” underscored the scale of the data challenge. He pointed out that 4.0 zettabytes of information (that’s 4.0 trillion gigabytes) were created and replicated in 2013, and that this figure is expected to grow 50-fold by the end of 2020.
The challenge of managing all this complex, disparate, large and rapidly growing data will drive an equally dramatic sea change, one we cannot yet predict but will someday look back on with amazement and appreciation. Kumar emphasized that there is no end point in sight and that storage is a real concern.
Research organizations are also grappling with the fact that most drug candidates fail before reaching the market. The era of the blockbuster drug is drawing to a close. Being a multinational pharmaceutical company with vast resources no longer provides a competitive advantage; neither does being a small, entrepreneurial biotech. The problem of understanding the data and developing viable candidates crosses all boundaries. One response has been drug repositioning, in which a drug created for one disease or condition is found to work well for another. For instance, Viagra was originally developed for hypertension but is now a leading treatment for erectile dysfunction. However, repositioning is an interim measure that bolsters the bottom line for a while, not a path to long-term fiscal sustainability.
Key to Future Success
The key to the future will be a complete overhaul of how activities and workflows are performed, not just in the lab but across the entire research organization. Processes will need to be more efficient and more effective. A close examination of who does what, when, where and how will realign the way things are done. Organizations need to better capture, preserve and use the data and associated knowledge they possess. In research organizations that follow Six Sigma practices, streamlining workflows for process efficiency will become an ongoing activity.
In the current economy, it will remain difficult to add new researchers and their associated overhead, and in any case, more brains won’t solve the data management challenge. Instead, many organizations are outsourcing tasks to a trusted contract research organization (CRO) and automating as many processes as possible to get their data into a digital format that supports fast, shareable retrieval.
When a research organization has thousands of projects running simultaneously, those projects will need to be connected to achieve economies of scale and process efficiency through better use of resources. That will mean not only more outsourcing of research but also more sharing of knowledge outside the organization to strengthen knowledge within it. It will be a brave new world.