Taken on Trust: Ensuring Safe, Secure Access to Health Data for Research
Patient data is one of our most valuable resources for understanding health and tackling disease. However, we must ensure that sensitive patient data is kept safe and secure while it’s being used for research. In this article, Health Data Research UK’s chief technology expert, Gerry Reilly, discusses the key development in health data research security: Trusted Research Environments.
Health data research relies on allowing scientists to access sensitive patient data, so they can uncover patterns and details about diseases and treatments that may otherwise be difficult or even impossible to find.
However, patient health data can be extremely personal and sensitive, so it is essential that it is kept secure and only accessed for legitimate research purposes. "We've got to be able to go back to the patients and say thank you for letting us use your data, we're going to look after it properly," says Gerry Reilly, Health Data Research UK's chief technology expert.
Currently, most health data, such as GP records, are kept in a secure database. Scientists can request access to the data they need to answer their particular research questions and are then sent anonymized extracts of the database, in a model known as 'data release'.
Although any personal identifying information has been removed, releasing sensitive data in this way is still a security risk. What's more, it can take a long time for scientists to receive the data they need, and when they do, it may already be out of date.
A safe haven for data
Trusted Research Environments (TREs) are the next generation of secure platforms for data research that could help researchers get the data they need, whilst maintaining security.
"Trusted Research Environments provide a safe technology area that researchers can use to analyze anonymized patient-level data without that data ever leaving that secure area," explains Reilly.
Rather than sending datasets out into the world, TREs allow vetted researchers to come into the database and access anonymized information in a safe, secure environment. It's a bit like having to go to a special secure library to read a precious, rare book, instead of having a few badly photocopied pages sent over to you.
One example of a TRE is the SAIL Databank in Swansea. SAIL enables scientists to access anonymized data remotely, using a secure gateway. The scientists conduct their research remotely within the safe, protected environment, and once their analysis is complete, they can only remove their statistical results, not individual patient data.
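The principle that only statistical results leave the environment can be illustrated with a minimal Python sketch. This is a toy model and not SAIL's actual interface: the class name `ToyResearchEnvironment`, the `count_by` method and the sample records are all hypothetical, invented purely for illustration.

```python
from collections import Counter


class ToyResearchEnvironment:
    """Hypothetical stand-in for a TRE: it holds patient-level rows
    internally and hands back only aggregate counts."""

    def __init__(self, records):
        # Patient-level rows stay inside the object; no method returns them.
        self._records = list(records)

    def count_by(self, field):
        # Return the number of records per value of `field`, not the rows.
        return dict(Counter(row[field] for row in self._records))


# Made-up, already anonymized example rows (not real data).
records = [
    {"region": "Wales", "tested_positive": True},
    {"region": "Wales", "tested_positive": False},
    {"region": "England", "tested_positive": True},
]

tre = ToyResearchEnvironment(records)
summary = tre.count_by("region")
print(summary)
```

The researcher receives a table of counts per region, while the individual rows never leave the object.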
TREs come of age during COVID-19
As well as increased security, the TRE frameworks provide rapid access to the latest health data, which has been particularly important during the COVID-19 pandemic when researchers needed answers quickly.
Many vital datasets are now being fed into the UK’s four major national health data-focused TREs - the NHS Digital Data Processing Service in England, Scotland's National Data Safe Haven, the Northern Irish Honest Broker Service, and SAIL in Wales - so they can be accessed safely and securely to support the UK’s response and shed light on this fast-moving new disease.
For example, data from the 4 million users of the UK’s COVID Symptom Study app is being held by SAIL, so it can be safely accessed by vetted researchers keen to understand more about when, where and how COVID-19 is affecting the population.
"Instead of waiting 3-6 months to do their analysis, researchers have been able to access the data they need in days. It's shown us how beneficial TREs are during a public health crisis," Reilly says.
Another TRE that has been used to find out more about COVID-19 is OpenSAFELY. This platform was designed by scientists at the Evidence-Based Medicine DataLab at the University of Oxford and the London School of Hygiene and Tropical Medicine to allow highly secure access to GP records for researchers exploring the risks from COVID-19.
The platform has been used to analyze the data from 17 million people, revealing a clearer picture of who is most at risk of dying from COVID-19. The study was one of the first to show that people from black, Asian and minority ethnic (BAME) backgrounds were more likely to catch coronavirus and die from the disease. It also confirmed that being male or having uncontrolled diabetes or severe asthma increases the risk of dying from COVID-19.
Cutting down on file fatigue
Besides increased security and rapid access to up-to-date data, TREs also remove the need to transfer massive data files to researchers - a problem that is only getting more pressing as the size of modern health datasets grows.
"Research linking things like imaging data, genomic data, and electronic health data is never going to be done by data release because the data sets required are just too big, so TREs are the only scalable way forward," says Reilly.
Although TREs offer several obvious advantages for health data research, Reilly says that most research is still conducted using data release and that we are still early on in the journey.
"Some scientists believe that TREs reduce productivity because they are more complex," says Reilly. "Going forwards, we need to strike a balance between security and productivity, but I believe we are making big strides, and the opportunities are just incredible."
About the author
Health Data Research UK is working to make health data securely and safely accessible for research to improve people's lives.