Large language models (LLMs), such as ChatGPT, have rapidly entered healthcare, but strong clinical evidence for their real-world use remains limited. A new study published in Gastroenterology & Endoscopy provides the first overview of randomized controlled trials (RCTs) evaluating LLMs specifically in digestive diseases.
The international research team systematically reviewed published and ongoing RCTs conducted since 2022 and identified only 14 eligible trials worldwide—four published and ten ongoing. Most studies were carried out in China and the United States and focused primarily on gastrointestinal and hepatobiliary diseases. The most common applications of LLMs included clinical decision-making and patient education, with question answering being the dominant task.
“We found that while enthusiasm for using LLMs in digestive diseases is growing, high-quality clinical evidence is still scarce,” said Dr. Peng Wu, first author of the study, of the Shenzhen Institutes of Advanced Technology, Chinese Academy of Sciences. “Randomized controlled trials are essential to determine whether these tools truly improve patient outcomes and healthcare quality.”
Notably, although many studies claim clinical relevance, only a subset used real patient data, and most trials were single-center and exploratory in nature. The authors also found that both general-purpose models (such as ChatGPT) and domain-specific medical language models are being tested, reflecting different strategies for integrating AI into clinical workflows.
Dr. Zhirong Yang, co-corresponding author, emphasized the importance of cautious implementation. “Large language models should not replace clinicians. Instead, they should be evaluated as supportive tools that extend clinical capabilities while maintaining human oversight,” he said.
The review also highlights several gaps in current research, including the lack of international multicenter trials, inconsistent reporting standards, and limited assessment of ethical risks such as hallucinated outputs and data privacy. The authors call for future trials to adopt standardized reporting guidelines and focus on real-world patient outcomes.
Overall, this study provides a timely snapshot of how AI language models are beginning to move from experimental tools to potential clinical assistants in digestive healthcare—while underscoring the urgent need for stronger evidence before widespread adoption.
###
Contact the authors:
Dr. Feng Sha, Shenzhen Institutes of Advanced Technology, Chinese Academy of Sciences, Shenzhen, China, Email: feng.sha@siat.ac.cn
Dr. Zhirong Yang, Shenzhen University of Advanced Technology, Shenzhen, China, Email: yangzhirong@suat-sz.edu.cn
The publisher KeAi was established by Elsevier and China Science Publishing & Media Ltd to share quality research globally. In 2013, our focus shifted to open access publishing. We now proudly publish more than 200 world-class, open access, English language journals, spanning all scientific disciplines. Many of these are titles we publish in partnership with prestigious societies and academic institutions, such as the National Natural Science Foundation of China (NSFC).
Journal: Gastroenterology & Endoscopy
Method of Research: Literature review
Subject of Research: People
Article Title: Randomized controlled trials evaluating large language models in digestive diseases: a scoping review
COI Statement: The authors declare that they have no known competing financial interests or personal relationships that could have appeared to influence the work reported in this paper.