Big Data: Why Researchers Should Look for Depth, Not Breadth
Advances in healthcare systems mean that providers have access to more data than ever before. But is more data always beneficial? In this insight, Ben Mansfield, founder of ClinOwl, a content discovery platform for healthcare professionals, discusses the opportunities and challenges of big data for medical researchers.
In scientific and medical research, big data is often touted as the way we will answer some of our biggest questions and treat diseases that are currently untreatable. But is more data really better? And what do we need to do to make the most of all this information?
What are the advantages of big data in research?
Big data has the potential to answer many unsolved questions and, ultimately, to deliver therapies for diseases that are still incurable, transforming and even saving patients’ lives. With the rise of the internet, shared resources and large-scale data hubs, researchers can now draw on data from all over the world. Analysing these larger data sets lets us detect relationships and monitor patterns with greater precision, because estimates based on more patients are less affected by chance. This is especially important for rare diseases, where healthcare professionals previously struggled to assemble patient populations large enough to draw reliable conclusions [1]. Pooled data of this kind can speed up the diagnosis and treatment of diseases that would traditionally have been difficult to identify.
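To make the point about precision concrete, here is a minimal Python sketch of how the uncertainty around an estimated proportion shrinks as more patients are pooled. The 30% outcome rate, the sample sizes and the function name are hypothetical, and the simple normal-approximation (Wald) interval is an assumption chosen for brevity; for very small samples a Wilson interval is usually preferred. The sketch also assumes the pooled patients are comparable, which real multi-site data does not guarantee.

```python
import math

def ci_half_width(p: float, n: int, z: float = 1.96) -> float:
    """Half-width of a normal-approximation (Wald) 95% confidence interval for a proportion."""
    if not 0.0 <= p <= 1.0:
        raise ValueError("p must be between 0 and 1")
    if n <= 0:
        raise ValueError("n must be positive")
    return z * math.sqrt(p * (1.0 - p) / n)

# Hypothetical scenario: an outcome observed in about 30% of patients with a rare disease.
observed_rate = 0.30
for label, n in [("single clinic", 40), ("national registry", 400), ("pooled international data", 4000)]:
    hw = ci_half_width(observed_rate, n)
    print(f"{label:>26}: n={n:>5}, 95% CI = {observed_rate:.2f} +/- {hw:.3f}")
```

Because the half-width scales with 1/sqrt(n), a hundredfold increase in patients narrows the interval tenfold.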
Big data also offers a valuable opportunity to support, or refute, ongoing scientific research, since additional data points can corroborate and reinforce conclusions. But open science is key to making this data accessible. We have observed a significant increase in the availability of shared resources and collaborative approaches [2] during the COVID-19 pandemic, which could have promising implications for open access and research going forward. This does, however, bring its own challenges in a world governed by data protection rules such as the GDPR. Using personal data, and in this case medical data, requires it to be anonymised without losing the key medical information. Big data is therefore powerless without the correct protection, as well as intelligent management and analysis.
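The anonymisation requirement can be illustrated with a minimal Python sketch that drops direct identifiers, replaces the patient ID with a keyed hash so records can still be linked, and coarsens age and postcode while keeping the clinical fields intact. All field names, the key and the example record are hypothetical. Note that a keyed hash is pseudonymisation rather than anonymisation: under the GDPR, pseudonymised data is still personal data, so a real release would also need a re-identification risk assessment.

```python
import hashlib
import hmac

# Hypothetical secret key, held by the data custodian and never shared with analysts.
SECRET_KEY = b"replace-with-a-securely-stored-key"

# Hypothetical field names for direct identifiers that are dropped entirely.
DIRECT_IDENTIFIERS = {"name", "address", "phone"}

def pseudonym(patient_id: str, key: bytes = SECRET_KEY) -> str:
    """Replace a patient ID with a keyed SHA-256 hash so records can still be linked."""
    return hmac.new(key, patient_id.encode("utf-8"), hashlib.sha256).hexdigest()[:16]

def age_band(age: int, width: int = 10) -> str:
    """Generalise an exact age into a band such as '40-49'."""
    lower = (age // width) * width
    return f"{lower}-{lower + width - 1}"

def deidentify(record: dict) -> dict:
    """Drop direct identifiers, coarsen quasi-identifiers, keep clinical fields unchanged."""
    out = {k: v for k, v in record.items() if k not in DIRECT_IDENTIFIERS}
    out["patient_id"] = pseudonym(record["patient_id"])
    out["age"] = age_band(record["age"])
    out["postcode"] = record["postcode"].split()[0]  # keep only the outward part, e.g. "AB12"
    return out

# Hypothetical example record.
example = {
    "patient_id": "P-000123", "name": "Jane Doe", "address": "1 Example Street",
    "phone": "000 0000", "age": 47, "postcode": "AB12 3CD",
    "diagnosis": "example-diagnosis", "hba1c_mmol_mol": 52,
}
print(deidentify(example))
```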
Quality is key
Data management is a key concern, alongside making sure we use the right data in the first place. More data points can reveal relationships more clearly, but adding extra fields that have no relevance in a specific context can lead to spurious results and even the observation of correlations that do not exist [3]: the more variables you test, the more likely it is that some will appear related purely by chance. It is therefore essential to collect and analyse the right fields to strengthen your study and produce reliable results. It is also vital to remember that correlation is not causation, so while more data is helpful, it cannot replace rigorous scientific research that provides in-depth understanding [4]. One way of looking at it is that we want depth of data, not breadth.
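The risk of spurious correlations from irrelevant fields can be demonstrated with a small simulation, sketched here in Python. It generates a random outcome for 50 hypothetical patients and 200 fields of pure noise, then counts how many noise fields pass a conventional p < 0.05 threshold for correlation. Because nothing is truly related, every hit is a false positive, and at a 5% threshold roughly one field in twenty is expected to pass. The function names and parameters are illustrative, and the critical value |r| of about 0.279 is the approximate two-sided 5% cut-off for 50 observations only.

```python
import math
import random

def pearson(x: list[float], y: list[float]) -> float:
    """Pearson correlation coefficient of two equal-length sequences."""
    n = len(x)
    mean_x, mean_y = sum(x) / n, sum(y) / n
    cov = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    var_x = sum((a - mean_x) ** 2 for a in x)
    var_y = sum((b - mean_y) ** 2 for b in y)
    return cov / math.sqrt(var_x * var_y)

def find_spurious_fields(n_patients: int = 50, n_fields: int = 200,
                         r_critical: float = 0.279, seed: int = 1) -> list[tuple[int, float]]:
    """Return (field index, r) for noise fields that 'correlate' with a random outcome.

    r_critical = 0.279 is approximately the two-sided p < 0.05 cut-off for n = 50;
    it must be recomputed if n_patients changes.
    """
    rng = random.Random(seed)
    outcome = [rng.gauss(0.0, 1.0) for _ in range(n_patients)]
    hits = []
    for field in range(n_fields):
        # Each field is independent noise, so any correlation with the outcome is chance.
        values = [rng.gauss(0.0, 1.0) for _ in range(n_patients)]
        r = pearson(values, outcome)
        if abs(r) > r_critical:
            hits.append((field, r))
    return hits

spurious = find_spurious_fields()
print(f"{len(spurious)} of 200 irrelevant fields look 'significant'")
for field, r in spurious:
    print(f"  field {field}: r = {r:+.2f}")
```

Pre-specifying the fields to analyse, or applying a multiple-comparison correction such as Bonferroni (dividing the significance threshold by the number of tests), keeps these chance findings in check.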
Using algorithms to analyse data makes sense as a way to reduce confirmation bias, since the tendency to look for data that confirms existing opinions is inherent to human nature. This applies to big data just as much as to smaller data sets. However, algorithms do not remove this risk completely: humans design them, so they can still be biased, and if they are, any conclusions drawn from them will be biased too.
Analysing missing data is just as important as looking at the data you have, since gaps can have significant implications and lead to inaccurate correlations. A study of tweets posted after Hurricane Sandy suggested that Manhattan had experienced the worst of the storm, which was not the case [5]. Most of the tweets did originate there, but that reflected population density and the proportion of smartphone users in New York, not the actual impact of the storm.
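The Hurricane Sandy example can be reduced to a toy calculation, sketched here in Python: compare a raw count of tweets per area with the same count normalised by the number of people able to tweet. The area names and all figures are hypothetical, not data from the study. Normalisation is only a partial fix; it cannot recover information from people who were unable to post at all, which is exactly the kind of gap described above.

```python
from dataclasses import dataclass

@dataclass
class Area:
    name: str
    population: int
    smartphone_share: float  # fraction of residents with a smartphone
    storm_tweets: int        # storm-related tweets observed from the area

# Entirely hypothetical figures, chosen only to illustrate the calculation.
areas = [
    Area("dense_city_centre", population=1_600_000, smartphone_share=0.60, storm_tweets=48_000),
    Area("coastal_suburb", population=120_000, smartphone_share=0.35, storm_tweets=2_100),
]

def rank_by_raw_count(areas: list[Area]) -> list[str]:
    """Naive ranking: more tweets is read as 'worse hit'."""
    return [a.name for a in sorted(areas, key=lambda a: a.storm_tweets, reverse=True)]

def tweets_per_thousand_users(area: Area) -> float:
    """Normalise tweet volume by the number of residents able to tweet."""
    smartphone_users = area.population * area.smartphone_share
    return 1000 * area.storm_tweets / smartphone_users

print("Raw ranking:", rank_by_raw_count(areas))
for a in areas:
    print(f"{a.name}: {tweets_per_thousand_users(a):.1f} tweets per 1,000 smartphone users")
```

With these made-up figures, the raw count puts the city centre first by a wide margin, yet both areas produce 50 tweets per 1,000 smartphone users, so the difference in volume says nothing about storm damage.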
Publishing and data
As mentioned, access to quality data is important for the progress of medical research, and science publishing has a significant role to play. As with big data, the volume of published research has grown dramatically with the rise of the internet. This provides a fantastic opportunity to advance research, but once again, depth rather than breadth is essential. Without the right management, the sheer quantity of available research makes it harder to find high-quality, relevant papers.
Scientific researchers and healthcare professionals are busier than ever during this pandemic, so saving time and making essential research and learning activities more efficient is a priority. Platforms that collate research from multiple peer-reviewed journals and analyse it to surface relevant, high-quality papers are one example of how technology can help [6].
Big data clearly has huge potential and is already transforming research, but the data is only as good as the analysis. Selecting sensible data sets and designing intelligent algorithms are the only way to make sure we reach reliable conclusions. We need to be intelligent in the way we use data, especially in science and medicine, where people’s health and wellbeing are at stake.