Harnessing Big Data in the Fight Against Pediatric Cancer
We recently spoke to Dr Samuel L. Volchenboum to learn more about his current research within the areas of pediatric cancer and inflammatory bowel disease. He discusses current challenges of harnessing data and informatics, the importance of robust data collection and standardization, and strategies for optimizing data to enhance the efficiency and impact of research.
Your current research is mainly focused on harnessing data and informatics for pediatric cancer research. Could you tell us more about this research?
Samuel L Volchenboum (SV): My main academic interest is in harnessing large sets of data for research. For the pediatric cancer community, this means bringing together international groups of researchers to create common data models for sharing information. Ultimately, the goal is to have data collected all over the world using standardized data dictionaries and then to make those data available in a de-identified format to the worldwide research community for study. Of course, there are myriad issues with doing this – everything from lack of data standards, to worries about security and privacy, to issues with data embargo and proprietary claims. But these issues are surmountable, and I continue to be highly optimistic about our success.
In addition to my research as the Director of the Center for Informatics, I am also working, as the Chief Medical Officer of Litmus Health, with the University of Chicago and Dr David T. Rubin to study inflammatory bowel disease (IBD). The project is to collect real-world data about activity, sleep, heart rate, and diet to better understand how these factors affect patients with Crohn’s disease or ulcerative colitis. Harnessing these data effectively could lay the foundation for personalized medicine in chronic diseases such as IBD.
In a recent publication, you reviewed the use of data commons in modern healthcare. To what extent has health data been taken out of non-sharing ‘silos’ and made freely available? How much further does data sharing have to go?
SV: When the 21st Century Cures Act was passed in 2016, it created a mandate to bring medical innovations to market more efficiently. As a result, the National Institutes of Health (NIH) is focusing more on data sharing.
There are many impediments to effective data sharing, but the two main issues are a lack of interoperability and misaligned incentives.
Most data are still collected without regard to any standards. This means that clinicians and researchers often choose their own ways to collect and store data. So, when the time comes to share or combine data, the data points do not align and the data have to be transformed into a common standard. This process often leads to data loss or compromise. Organizations like the Clinical Data Interchange Standards Consortium (CDISC) are working with clinicians, pharmaceutical companies, and governmental agencies to create mandated standards for data reporting. Of course, this will help ensure that data are transformed into a common standard, but we have a long way to go before the data are actually collected in a standardized fashion without the need for one or more transformations.
Of course, there must be an incentive for a clinician or researcher to share their data. There must be the right kind of governance in place to assure data contributors that the data will be kept safe and shared only under appropriate conditions and with the proper attribution. This is one of the thorniest areas and must be addressed through data sharing agreements.
Your publication “Data Commons to Support Pediatric Cancer Research” references the discrepancy in data-driven progress for adult cancers compared to pediatric cancers. It highlights that the rarity of pediatric cancer cases could be why children with cancer are failing to benefit from the technological revolution driving the precision-medicine age. What can be done to help combat this discrepancy?
SV: Pediatric cancer is rare – only about 15,000 new cases in the US each year, compared to over 1.6 million adult cancers. Discoveries are made when studying large numbers of subjects, the characteristics of their disease, and their treatment outcomes. When trying to associate genomic findings with clinical outcomes, it is even more critical to have large numbers of patients and associated rich clinical data. Because of the paucity of cases for pediatric cancer, innovations have been driven by consortium trials. But even then, a Children’s Oncology Group study over many years might only enroll dozens or hundreds of patients, depending on the disease. This has led to incredible improvements in survival for most kinds of pediatric cancer. But there remain many cancers that are incurable or curable only with incredibly toxic regimens. To understand and develop better cures, two important innovations need to occur. First, groups around the world need to come together to build standardized data models, so that data can be collected and shared worldwide. Second, much richer clinical data need to be collected alongside the genomic information. Right now, genomic sequencing studies are severely limited by the lack of clinical data and could be significantly enriched through linkage to the associated information from the medical record.
To help solve these vexing problems, we are working to bring together groups from around the world to build common data models for pediatric cancer. By creating a commons of pediatric cancer clinical trials data, we hope to provide the world with a rich source of phenotype data to enrich the incredible amount of genomic data being collected.
Could you tell us more about Litmus Health? What impact can the Litmus platform have on clinical research?
SV: As they currently stand, most trials still use outdated data tracking methods to monitor successes. When researchers use digitized forms and questionnaires to touch base with their patients once a month, they sacrifice a lot of the data that the patient is generating. Until recently, we have lacked a reliable way to gather continuous, objective data about our patients in a rigorous and compliant way that adheres to industry standards for clinical trial data collection.
Litmus is a clinical data science platform focused on health-related quality of life. We use real-life data collected at the point of experience from wearables, smart devices, and home sensors to guide management, to inform endpoints, and to help researchers describe the full value of their work in observational studies, therapeutic trials, and post-market research.
Big picture – we make trials more efficient with these new devices. Wearables and other sensors are incredibly rich sources of data, but manufacturers have not yet figured out how to make these data ready for analysis in a clinical setting.
One of our core value props is that we take these real-life data and make them research-ready.
This smarter methodology allows for better tracking of patient data, can streamline the drug-to-market route, and can help clinicians make better go/no-go decisions and realize the full value of their drug.
In an industry of life or death, where we see one success for a dozen failures, it is imperative to make those successes count from research phase to market adoption.
Biomedical data comes in many forms, and different institutions will store and handle their data differently. How can researchers overcome these inconsistencies when data is grouped to enable maximum value to be taken from these big datasets?
SV: As described above, we lack both the tools and incentives for data standardization and harmonization. Each stakeholder has developed their own, sometimes proprietary, ways of collecting and storing data. Data managers running clinical trials at academic medical centers routinely have to copy and paste or re-enter data into vendor-specific platforms for their clinical trials. The trial protocols are usually in static formats like Microsoft Word or PDF, requiring each study site to manually abstract terms for the study and create order sets and data collection forms de novo. Data for studies must be transformed into a common format for reporting to federal agencies like the US Food and Drug Administration (FDA). Outside of clinical trials, there are even fewer incentives for data harmonization and sharing. While many large hospitals have robust data warehouses, few adhere to accepted standards for data collection and storage, and fewer still use available reference data from public sources to inform their data collection modalities. In short, collecting data so it can be shared and studied is not the priority for most medical centers that are struggling with payments and services and an over-abundance of regulations.
The solutions to these problems will require a realignment of the incentives for robust data collection and sharing. This will come from federal agencies, disease-specific consortia, industry, private foundations, and maybe even the electronic health record vendors. We are starting to see some of these emerging trends now in the form of new grants designed to encourage data standardization and private granting agencies requiring best practices for collecting and managing data. We are very hopeful that the industry is finally waking up to the need for better data collection, and that we are now beginning to leverage these larger and richer sets of data.
Samuel L. Volchenboum was speaking to Ruairi MacKenzie and Laura Elizabeth Mason, Science Writers for Technology Networks.