

Uncovering Real-World Patient Insights With Social Media and NLP


Historically, drug researchers have relied on methods such as focus groups, interviews and questionnaires for first-person accounts of patient-reported outcomes, views, symptoms, use of competitive products and other relevant data points. These methods provide valuable data but are costly and resource intensive. Social media has emerged as a lower-cost alternative that the research community increasingly accepts as a reputable source of information. Social posts represent a massive and ever-growing source of real-world data (RWD) capable of offering critical insight into patient health, outcomes and experience.

By observing data from social media, researchers can generate insights that cover broader populations of patients, healthcare professionals and key opinion leaders, and can do so more efficiently and cost-effectively. However, the continuous stream of content that social platforms generate presents researchers with the significant challenge of separating relevant signals from the noise. Natural language processing (NLP) is becoming a prevalent method for addressing this challenge in research because it can automatically extract and surface structured data from internal and external unstructured sources such as social media and mobile platforms.

Addressing the historic challenges of real-world evidence

The enactment of the 21st Century Cures Act (Cures Act) signaled that regulatory agencies recognize the importance of real-world evidence (RWE). The Cures Act is intended to help accelerate life-saving clinical innovation and development for the patients who need it most, and to build on the FDA's ongoing work to incorporate the perspectives of patients into the development of drugs, biological products and devices and into the regulator’s decision-making process. Such work includes the Sentinel System, the largest multisite distributed database in the world dedicated to medical product safety, and the National Evaluation System, created for health technology companies to generate evidence across the total product lifecycle of medical devices.

RWD adoption continues to gain traction; however, many data challenges remain, including structure, extraction, integration, standardization and quality, as well as patient privacy. Furthermore, traditional methods of evidence generation have proven costly, with large pharmaceutical companies spending close to 20 million annually [1] on RWE generation. NLP can support improved evidence generation by swiftly and efficiently extracting key facts from these unstructured data sources and using relevant reasoning and focused queries to generate RWE from RWD. This enables more patient-centric intelligence for decision making, in alignment with the goals of the 21st Century Cures Act, while also supporting more efficient use of resources and greater cost-effectiveness.

Extracting more holistic patient insights

Collecting patient feedback helps pharmaceutical companies understand the patient experience throughout drug discovery and development and into the post-market environment. The value of patient-reported outcomes is three-fold: they help evaluate whether treatments are achieving their intended purpose, offer insight into how certain drugs are affecting a patient’s quality of life and help determine appropriate clinical trial endpoints for patient-centric drug development.

NLP can refine data from social media to advance our understanding of disease states and better incorporate the patient voice, improving the design of future clinical research trials. Agile text mining provides us with the tools and methodology to make sense of what might otherwise be a blur of social media content. For example, collecting data from disease-specific patient social platforms, blogs, patient forums and relevant websites can provide a good substrate for developing clinical endpoints that are relevant to patients. By employing NLP, researchers can better evaluate the following to improve the design of future clinical trials for specific populations:

  • Symptoms of specific diseases, and the impact of these on patients
  • Treatment patterns such as drug switching, adherence or discontinuation
  • Patient information, including history of disease, demographics, social factors and lifestyle

The flexibility of NLP allows it to be easily adapted for different datasets, as well as optimized to extract scientific and healthcare insights from social media platforms or treatment pattern choices in patient blogs. 
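
As a rough illustration of the categories listed above, the sketch below tags posts with symptom mentions and treatment-pattern events and then drops posts that contain neither. It is a deliberately simple, rule-based stand-in and not the method used by any particular NLP platform: the term lists, the regular expressions, the `PostRecord` structure and the example posts are all hypothetical, and a production system would rely on clinical vocabularies and far richer linguistic analysis.

```python
import re
from dataclasses import dataclass, field

# Hypothetical, hand-written lexicon for illustration only.
SYMPTOM_TERMS = ["fatigue", "nausea", "headache", "joint pain"]

# Hypothetical surface patterns for treatment-related events.
TREATMENT_PATTERNS = {
    "switching": r"\bswitched (?:from|to)\b",
    "discontinuation": r"\b(?:stopped|quit) taking\b",
    "adherence": r"\b(?:missed|skipped) (?:a|my) dose\b",
}


@dataclass
class PostRecord:
    """Structured view of one unstructured post."""
    text: str
    symptoms: list = field(default_factory=list)
    treatment_events: list = field(default_factory=list)


def extract(post: str) -> PostRecord:
    """Tag a post with any symptom terms and treatment events it mentions."""
    lowered = post.lower()
    record = PostRecord(text=post)
    for term in SYMPTOM_TERMS:
        # Word boundaries avoid matching a term inside a longer word.
        if re.search(r"\b" + re.escape(term) + r"\b", lowered):
            record.symptoms.append(term)
    for label, pattern in TREATMENT_PATTERNS.items():
        if re.search(pattern, lowered):
            record.treatment_events.append(label)
    return record


def is_relevant(record: PostRecord) -> bool:
    """Treat a post as signal if it yielded at least one extracted fact."""
    return bool(record.symptoms or record.treatment_events)


# Invented example posts, not real patient data.
posts = [
    "Switched to a new medication last month and the fatigue is finally easing.",
    "Great weather for a walk today!",
    "I stopped taking my tablets because of constant nausea and headache.",
]

records = [extract(p) for p in posts]
relevant = [r for r in records if is_relevant(r)]

for r in relevant:
    print(r.symptoms, r.treatment_events)
```

Keeping the lexicon and patterns in plain data structures is one way to reflect the adaptability described above: moving to a different disease area or data source would mean swapping the term lists rather than rewriting the extraction logic.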

Transforming the future of clinical innovation

Social media represents a crucial opportunity for researchers to tap into the unfiltered experiences and outcomes of their key patient populations. By leveraging NLP, they can filter out non-relevant information, observe the linguistic context of these discussions and structure information to generate quality RWE. This ability to discern what matters most to patients could influence clinical innovation – from clinical trial design and outcome measures through post-market research. With this expanded knowledge of the patient, researchers can now transform the way we approach clinical innovation going forward, with patient experience as a central focus.


1. Market research. Giving Intelligence Teams an AI-powered advantage. (October 2018). Retrieved January 5, 2022, from https://www.reportlinker.com/p05723260/Pharmaceutical-and-Life-Sciences-Real-World-Evidence-Market-Landscape-and-Competitive-Insights.html

Jane Reed is director at Linguamatics (an IQVIA company)