Digital Love - Data management and the future of science
Blog Feb 28, 2014
A recent blog post by Digital Science highlighted the impact of poor data management on the future of science. Startled by the statistics and the breadth of the potential impact, I caught up with Nathan Westgarth, Product Manager for Research Tools at Digital Science, to understand more about the company, the importance of data management and how Digital Science are addressing these issues.
AB: Can you tell me a little about Digital Science?
Nathan Westgarth (NW): Digital Science was founded to make scientific research more efficient through the better use of technology. We believe passionately that tomorrow's research will be different - and better - than today's. As such we’re committed to putting scientists at the heart of everything we do and creating tools that truly work to help change the way science is done. We incubate and invest in promising start-ups as well as develop tools internally that serve the needs of scientists. In essence everything we do is about helping those who work in science to work more efficiently.
AB: Your recent blog “Five Top Reasons to Protect Your Data and Practise Safe Science” highlights the importance of effective data management. How big a problem is poor data management, and what impact can it have?
NW: Poor data management is a really pressing issue for the scientific community - not just for lab management teams, but also for every individual researcher. The amount of research data being generated is currently increasing by 30% every year. Worryingly, one study has found that the odds of sourcing datasets decline by 17% each year and a massive 80% of scientific data is then lost within two decades (Vines T.H. et al. 2013).
From our discussions with scientists about the problems and challenges they face in their work, the difficulty of managing and accessing their data is one of the issues most commonly cited to us. In an extreme case, we heard from biologist Billy Hinchen, who told us, “I lost 400GB of data and close to 4 years of work after my laptop was stolen. As a result I ended up getting an M.Phil rather than a PhD.” Clearly, these issues are having a large impact at both a personal and global level.
The concern is that as data output grows, effective data organisation is only going to get more difficult. And if data continues to be managed poorly then science will ultimately suffer. At best experiments will be hard to replicate and findings called into question. At worst papers will be retracted and careers impacted.
To highlight this issue and spark conversation around the problems of research data management, we’ve investigated the statistics and have produced our ‘Love Your Data’ infographic - which includes the five top reasons to protect your data.
AB: How does Digital Science help enable effective data management?
NW: Two of our tools have been designed specifically to help scientific researchers manage their data better. Our popular tool, figshare, is a cloud-based repository where researchers can store their data privately, share it with colleagues, or make it publicly available and citable with a permanent Digital Object Identifier (DOI). We have also recently released Projects, a simple desktop app that helps researchers stay on top of all their data with a structured and safe way to organise their research. Projects is Mac only for now but a Windows version is in development. Data from Projects can be uploaded to the figshare cloud with one click, creating a truly integrated solution.
AB: With the output of data growing so rapidly what challenges do you see in the coming years and how will Digital Science help address these?
NW: The main challenge is encouraging the scientific community to change their behaviour to address the issue of data management. If there is no incentive for researchers to record their analyses, then simply telling them it’s a good idea is not enough, since it can seem like extra work when they already have enough to do. At Digital Science we try to help by creating software tools that make it easy to integrate best practices into their existing workflows. By engaging with the scientific community via funding bodies, institutions, publishers and government we hope to effect change in a positive way.
Another challenge is that of historical data and how this can be migrated to the new platforms and tools. Data migration of legacy data files is complex and expensive if not done correctly. It's also critical to the seamless transition required of researchers who need access to existing databases whilst at the same time generating new outputs. We have, through products like Labguru and others we have in development, become skilled in helping manage this process in ways that make it as painless as possible. It's all very well talking about the data produced now, but if we ignore the legacy then we're only solving part of the problem.
At Digital Science we’re building tools to make sure that data can be accessed by anyone and linked in a way that fosters new insights into big data. figshare is looking at new ways to make the data as easy to reuse as possible in this manner. With a tidal wave of data and research outputs, academics will need better filters, and Altmetric aids the discoverability of research that is getting attention and being discussed online. From a more general organisational point of view, Projects helps researchers organise their data on the desktop, while figshare brings the cloud-based storage and sharing features that scientists need.
We always refer back to the analogy that dealing with large amounts of data in our personal lives is something we’ve all become accustomed to - for example we have great tools to help us with our music and photo management. Digital Science’s mission is to bring those same best practices and software standards to the scientific community, to enable those who work in science to function more efficiently and effectively.