Date: 2025-08-04


Social scientists are exploring the use of large language models (LLMs) to emulate human speech and responses. These tools can simulate study participants, allowing researchers to test assumptions, pilot studies, and optimize experimental design at lower cost. Early findings suggest LLMs can complement, but not replace, human data collection.

Using LLMs to replicate social experiments

Luke Hewitt, senior research fellow at Stanford PACS, and colleagues tested whether LLMs could replicate results from 476 randomized controlled trials (RCTs). Using GPT-4, they simulated how Americans would respond to treatments previously studied with human participants. The model’s predictions correlated strongly with the measured outcomes (a correlation of 0.85) and were as accurate as forecasts from human experts, even for studies published after GPT-4’s training period.
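To make the correlation figure concrete, the sketch below computes a Pearson correlation between LLM-predicted and human-measured treatment effects. It is an illustration only: the function name pearson_r and the effect sizes are made up for this example and are not data or code from the study, and the article does not specify which correlation statistic the researchers used.

```python
import math


def pearson_r(predicted, observed):
    """Pearson correlation between two equal-length sequences of effect sizes."""
    if len(predicted) != len(observed) or len(predicted) < 2:
        raise ValueError("need two equal-length sequences with at least 2 values")
    mean_p = sum(predicted) / len(predicted)
    mean_o = sum(observed) / len(observed)
    # Covariance and variances are left unnormalized; the factors cancel in r.
    cov = sum((p - mean_p) * (o - mean_o) for p, o in zip(predicted, observed))
    var_p = sum((p - mean_p) ** 2 for p in predicted)
    var_o = sum((o - mean_o) ** 2 for o in observed)
    return cov / math.sqrt(var_p * var_o)


# Hypothetical effect sizes, for illustration only (not from the study).
llm_predicted_effects = [0.10, 0.25, -0.05, 0.40, 0.15]
human_measured_effects = [0.12, 0.20, 0.00, 0.35, 0.18]

r = pearson_r(llm_predicted_effects, human_measured_effects)
print(f"correlation between predicted and measured effects: {r:.2f}")
```

A value near 1 means the simulated effects rank and scale much like the measured ones; it does not by itself show that each individual prediction is accurate.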



While this accuracy is promising, Hewitt noted that newer models are harder to evaluate because they can access fresh online data. Researchers may need archives of unpublished studies to reliably benchmark future LLMs.

Addressing limits in response variability

One key limitation is distributional alignment, or how well LLMs capture the range of human responses. Nicole Meister, a Stanford graduate student, examined methods to improve variation in simulated answers. Techniques such as prompting LLMs to generate multiple simulated participants or providing prior distribution data (“few-shot” steering) helped align outputs with real human response patterns.
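One way to picture distributional alignment is to compare the share of simulated participants choosing each answer with the share of human participants doing the same. The sketch below uses total variation distance for that comparison. This metric is an assumption chosen for illustration, not necessarily the one used in Meister's work, and the function names and survey answers are hypothetical.

```python
from collections import Counter


def answer_distribution(answers, options):
    """Share of responses falling on each answer option."""
    if not answers:
        raise ValueError("answers must not be empty")
    counts = Counter(answers)
    return {opt: counts[opt] / len(answers) for opt in options}


def total_variation(p, q):
    """Total variation distance: 0 means identical, 1 means no overlap."""
    return 0.5 * sum(abs(p[opt] - q[opt]) for opt in p)


options = ["agree", "neutral", "disagree"]

# Hypothetical responses, for illustration only.
human_answers = ["agree"] * 50 + ["neutral"] * 30 + ["disagree"] * 20
simulated_answers = ["agree"] * 80 + ["neutral"] * 15 + ["disagree"] * 5

human_dist = answer_distribution(human_answers, options)
simulated_dist = answer_distribution(simulated_answers, options)
tv = total_variation(human_dist, simulated_dist)
print(f"total variation distance: {tv:.2f}")
```

In this made-up case the simulated sample piles up on "agree", which is the kind of narrowed variation that steering techniques aim to reduce.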



This approach was most effective for opinion-based questions but less reliable for predicting individual preferences, suggesting that LLMs are not yet suited for tasks such as forecasting consumer choices.

Challenges in bias and validation

Researchers identified further concerns with LLMs:

Bias: Models often misrepresent social groups, relying on stereotypes.

Sycophancy: Models provide agreeable but inaccurate responses.

Generalization: Models perform poorly when applied to unfamiliar populations or settings.

Validation gaps: Confidence in model predictions depends on testing them against well-understood scenarios, which remains limited.



These challenges underline the need for methods that keep experiments grounded in human data.

Combining AI with human studies

A hybrid method known as prediction-powered inference integrates human and AI-generated data. Stanford sociology graduate student David Broska developed this approach, which uses a small human pilot study alongside an LLM simulation to assess interchangeability. This combined dataset can increase statistical power while lowering costs.
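The core idea can be sketched with the standard prediction-powered estimate of a mean: take the average of a large set of LLM-simulated responses, then correct it by the average gap between human responses and LLM predictions in the small pilot. The code below is a minimal illustration under that assumption. It is not Broska's implementation, the function name ppi_mean and all numbers are hypothetical, and it omits the confidence intervals a real analysis would need.

```python
from statistics import mean


def ppi_mean(human_outcomes, llm_on_pilot, llm_on_extra):
    """Prediction-powered point estimate of a mean outcome.

    human_outcomes: responses from the small human pilot
    llm_on_pilot:   LLM predictions for those same pilot participants
    llm_on_extra:   LLM predictions for a larger simulated sample
    """
    if len(human_outcomes) != len(llm_on_pilot):
        raise ValueError("pilot outcomes and pilot predictions must be paired")
    # Rectifier: how far the LLM is off, on average, where humans were measured.
    rectifier = mean([h - s for h, s in zip(human_outcomes, llm_on_pilot)])
    return mean(llm_on_extra) + rectifier


# Hypothetical ratings on a 1-7 scale, for illustration only.
human_outcomes = [3.0, 4.0, 5.0, 4.0]
llm_on_pilot = [3.5, 4.5, 5.0, 5.0]
llm_on_extra = [4.0, 4.5, 5.0, 4.5, 4.0, 5.0]

estimate = ppi_mean(human_outcomes, llm_on_pilot, llm_on_extra)
print(f"bias-corrected estimate: {estimate:.2f}")
```

Here the simulated sample averages 4.5, but the pilot shows the LLM overshooting humans by 0.5 on average, so the corrected estimate is lower. The closer the LLM tracks the pilot data, the more the simulated responses can contribute.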



By first simulating experiments with LLMs, researchers can refine study designs before recruiting participants. However, final validation still depends on human trials, reinforcing the principle that experiments on human behavior must ultimately be anchored in real-world data.



Reference: Anthis JR, Liu R, Richardson SM, et al. LLM Social Simulations Are a Promising Research Method. arXiv. 2025. doi: 10.48550/arXiv.2504.02234



This article has been republished from the following materials. Note: material may have been edited for length and content. For further information, please contact the cited source.





This article is based on research findings that are yet to be peer-reviewed. Results are therefore regarded as preliminary and should be interpreted as such.





This content includes text that has been generated with the assistance of AI. Technology Networks' AI policy can be found here.