Why Patient-Generated Health Data Could Power the Next Era of Clinical Research
Leveraging patient-generated health data (PGHD) in healthcare and clinical trials has become significantly easier with the advent of digitalization. Alison Bourke, scientific director for real world insights at IQVIA and past president of the International Society for Pharmacoepidemiology, has worked with PGHD over several decades. In partnership with researchers from the pharmaceutical industry, academia and the regulatory sphere, Bourke helps design clinical studies that make the best use of PGHD.
After the recent publication of her co-authored review of PGHD in pharmacoepidemiological research, we talked to Bourke about how PGHD is collected, why it is useful and how we can best safeguard sensitive data.
Ruairi Mackenzie (RM): What is PGHD?
Alison Bourke (AB): I would define PGHD as information about health that comes directly from the patient, either actively or passively, rather than through the lens of the healthcare team. Traditionally we have undertaken research that is underpinned by data from controlled clinical trials or “real-world” information derived from hospital or GP patient medical records.
RM: What techniques can we use to acquire PGHD en masse in large-scale studies?
AB: There are both active and passive techniques. With active techniques, the participant is engaged to collect data, so they might sign up, for example, to record symptoms of COVID-19, perhaps using questionnaires via the web or smartphones. But equally you can do passive data collection, often with the participants’ involvement, such as using wearable technology or scraping comments from social media to collect anonymous insights into how patients feel or their experience of side effects.
RM: As opposed to clinician-sourced data, there will naturally be more variation in PGHD. How can we account for that in our collection and processing of the data?
AB: Generally speaking, it is best to retain structured data, and it can still be coded – you can ask patients, for example, to pick from a list of symptoms. But you can also collect free text and natural language. Then, you would have to employ natural language processing (NLP) techniques to code it in order to make better sense of the data for analysis.
There is an opportunity to gather a huge variety of information not routinely collected via other methods – such as diet, exercise, and the weather. Such rich contextual data may really help understand the multiple facets of health.
RM: How would you take a data set and then use a technique like NLP to make it more structured?
AB: NLP looks at all the language that people are using and transfers it into very much more standardized terminology and, in this context, equates it to particular diseases or drugs. NLP has come a long way in recent times in terms of sophistication of the algorithms and the methods used to make sense of what would otherwise be quite unstructured, jumbled data.
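The idea of turning everyday language into standardized terms can be illustrated with a deliberately simple sketch. It is not the method used in any study mentioned here: the `LEXICON` entries and the `code_free_text` function are hypothetical, and real NLP pipelines rely on established medical vocabularies and far more sophisticated models than phrase lookup.

```python
import re

# Hypothetical lexicon mapping everyday phrases to standardized terms.
# A real study would draw on an established medical vocabulary instead.
LEXICON = {
    "threw up": "vomiting",
    "throwing up": "vomiting",
    "felt sick": "nausea",
    "feel sick": "nausea",
    "bad headache": "headache",
    "head is pounding": "headache",
    "itchy rash": "rash",
    "rash": "rash",
}


def code_free_text(text):
    """Return the sorted standardized terms whose phrases occur in text."""
    # Lowercase, replace punctuation with spaces, collapse whitespace.
    normalized = re.sub(r"[^a-z\s]", " ", text.lower())
    normalized = re.sub(r"\s+", " ", normalized).strip()

    found = set()
    for phrase, term in LEXICON.items():
        # Whole-word match so that short phrases do not hit inside other words.
        if re.search(r"\b" + re.escape(phrase) + r"\b", normalized):
            found.add(term)
    return sorted(found)


comment = "I felt sick all morning, then threw up. Also got an itchy rash!"
coded_terms = code_free_text(comment)
```

Here a single free-text comment is reduced to a short list of coded terms that can be counted and analyzed alongside structured questionnaire answers.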
RM: Could you outline for me what you think the key advantages of a PGHD-led approach are?
AB: It’s all about patient-centric data and a good illustration is a story about a friend of mine who had a bad rash. He went to the doctor who said it was likely to be a dietary issue, so put him on a really restrictive diet, with no wheat, no sugar, no alcohol and no fruit.
Sure enough, the rash cleared up completely. A few months later I met him, and the rash was back. “Is the diet not working anymore?” I asked and he replied, “I gave it up. I looked at my priorities and I flipped the scenario. If I had a dietary intolerance where I couldn't eat wheat, alcohol, fruit and sugar and the doctor said, ‘I can give you this pill, but it will give you a rash,’ I would have taken that pill in a heartbeat, because it’s so important for me to be able to eat what I want, when I want. I get so much pleasure out of it that I can cope with the rash.”
This illustrates that from a clinician's point of view, the end point is to get rid of the rash, almost at all costs, but a patient's focus is very different, depending on their values and their lifestyle. Healthcare is moving much more towards a personalized approach based on what the patient wants and needs. A lot of clinical trials are done in a very restricted way, so you don't know the effectiveness of the treatment once people go out and use it in the real world. Collecting information directly from patients means you can understand more of their values and gain insight into what they want from their treatment and health.
PGHD also gives the opportunity to collect information that's often hard to assess in research, such as adherence. Are patients actually taking their drugs? If they aren’t, why not? Using traditional data sources, trial organizers might not be aware that a patient is not taking their pills anymore because they make them feel sick. But if you're asking patients directly, they can tell you what's really happening.
RM: What are the biggest barriers to using PGHD?
AB: There are many challenges including technology, consistency of data and privacy, but one of the main issues is selection bias, because if you're recruiting patients, for example, by social media or advertisements, then you're going to get a particular group of patients responding and you really need to understand the generalizability of what they're telling you.
Perhaps, if you were doing a social media study on depression, you might find that as your participants felt better, they might be less inclined to be on social media. They might (in non-COVID-19 times) be out enjoying themselves with friends, so you would get a select population responding throughout your study.
One big challenge mentioned in our paper is that there are many new players in this area, such as Big Tech, who have entered the field of PGHD and are particularly good on the analysis side, but they might not understand some of the biases inherent in the data or the analytical pitfalls. This can lead to problems with the interpretation of the analysis. Even though it's a relatively new source of data, you have to use robust, tried-and-tested methodology that's been evolving for decades.
RM: Big Tech has made trillions of dollars through passive data collection for advertising purposes. The collection of people’s healthcare data will be incredibly valuable as well for these companies. How can you convince people that data will be safe with these corporations’ involvement?
AB: This is a point I've written about in the BMJ and various other places. These data are hugely beneficial and can provide an amazing insight and a step change in the way we do healthcare. But you're absolutely right, there is this danger. I think pharma companies have completely bought into protection of this data. They have always obtained informed consent and they keep data extremely secure. However, technology companies may not be quite so responsible, as it is not baked into their DNA.
There are many clever ways to anonymize data. You can make the data foggy by changing dates or by aggregating symptoms to be less specific, over and above obviously anonymizing by taking away all patient IDs. But it is not just patient IDs, because that would be too simple. The richer the data you have, the more chance there is of identifying people.
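The "fogging" steps described above (removing patient IDs, changing dates and aggregating symptoms) can be sketched as follows. This is an illustrative example under stated assumptions, not a vetted de-identification procedure: the record layout, the `SYMPTOM_GROUPS` mapping and the `fog_records` function are all hypothetical, and, as noted above, rich data may still allow re-identification after steps like these.

```python
import datetime
import random

# Hypothetical mapping from specific symptoms to broader, less specific groups.
SYMPTOM_GROUPS = {
    "migraine": "neurological",
    "headache": "neurological",
    "nausea": "gastrointestinal",
    "vomiting": "gastrointestinal",
}


def fog_records(records, max_shift_days=30, seed=None):
    """Return copies of the records with IDs dropped, dates shifted, symptoms grouped.

    Each patient gets one random date offset, so the intervals between
    that patient's own records are preserved while real dates are hidden.
    """
    rng = random.Random(seed)
    offsets = {}
    fogged = []
    for rec in records:
        pid = rec["patient_id"]
        if pid not in offsets:
            offsets[pid] = rng.randint(-max_shift_days, max_shift_days)
        fogged.append({
            # The patient ID is deliberately left out of the output record.
            "date": rec["date"] + datetime.timedelta(days=offsets[pid]),
            "symptom_group": SYMPTOM_GROUPS.get(rec["symptom"], "other"),
        })
    return fogged


example = [
    {"patient_id": "p1", "date": datetime.date(2021, 1, 10), "symptom": "migraine"},
    {"patient_id": "p1", "date": datetime.date(2021, 1, 17), "symptom": "nausea"},
]
fogged_example = fog_records(example, seed=1)
```

Using one offset per patient is a design choice in this sketch: it keeps the time between a patient's events usable for analysis, at the cost of hiding less than fully independent shifts would.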
You need to protect the data, but also protect access to the data. This is fundamental and seen as essential by scientists and pharmaceutical companies. In fact, it's slightly weird to me that the public would trust Google more than they would trust a pharmaceutical company, because the pharmaceutical company has far more to lose if there were any data anonymity problems. Most people trust the National Health Service (NHS); however, there have been far more data breaches from public bodies than commercial companies. So how can we build more trust? It's very difficult. I think one of the good things coming out of this pandemic is that people have a better understanding of the research cycle and science in general, and perhaps an interest in contributing their scientifically valuable data in a secure way to the growing body of healthcare data, to help produce healthcare more focused on their values.
Alison Bourke was speaking to Ruairi J Mackenzie, Senior Science Writer for Technology Networks.