image: A graphical overview of the study evaluating the clinical utility of large language models (LLMs) for hepatocellular carcinoma treatment. The study analyzed 13,614 patients to compare real-world physician decisions with recommendations from ChatGPT, Gemini, and Claude. The findings reveal that while LLM concordance is associated with improved survival in early-stage disease, it correlates with worse outcomes in advanced stages due to divergent clinical priorities.
Credit: Keungmo Yang and Ji Won Han, The Catholic University of Korea (CC-BY 4.0, https://creativecommons.org/licenses/by/4.0/)
Large language models (LLMs) can generate treatment recommendations for straightforward cases of hepatocellular carcinoma (HCC) that align with clinical guidelines but fall short in more complex cases, according to a new study by Ji Won Han from The Catholic University of Korea and colleagues publishing January 13th in the open-access journal PLOS Medicine.
Choosing the most appropriate treatment for patients with liver cancer is complicated. While international treatment guidelines provide recommendations, clinicians must tailor treatment choices to each patient's cancer stage and liver function, as well as other factors such as comorbidities.
To assess whether LLMs can provide HCC treatment recommendations that reflect real-world clinical practice, researchers compared suggestions generated by three LLMs (ChatGPT, Gemini, and Claude) with the actual treatments received by more than 13,000 newly diagnosed patients with HCC in South Korea.
They found that, in patients with early-stage HCC, higher agreement between LLM recommendations and actual treatments was associated with improved survival. The inverse was seen in patients with advanced-stage disease, where higher agreement between LLM recommendations and actual practice was associated with worse survival. LLMs placed greater emphasis on tumor-related factors, such as tumor size and number, while physicians prioritized liver function.
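As a rough illustration of the comparison described above, the sketch below shows one way per-stage concordance between an LLM's recommendation and the treatment actually given might be tallied. The field names, treatment labels, and toy records are hypothetical, not taken from the study; the study itself went further and linked concordance to survival in registry data, which this sketch does not attempt.

```python
# Illustrative sketch only, not the authors' analysis code.
# Field names, stage labels, and the toy records are assumptions.
from collections import defaultdict

# Each record pairs a hypothetical LLM recommendation with the treatment
# the patient actually received, plus the disease stage at diagnosis.
records = [
    {"stage": "early",    "llm_recommendation": "resection", "actual_treatment": "resection"},
    {"stage": "early",    "llm_recommendation": "ablation",  "actual_treatment": "resection"},
    {"stage": "advanced", "llm_recommendation": "systemic",  "actual_treatment": "TACE"},
    {"stage": "advanced", "llm_recommendation": "systemic",  "actual_treatment": "systemic"},
]

# Concordance = share of patients whose actual treatment matches the LLM
# recommendation, computed separately for each stage.
matches, totals = defaultdict(int), defaultdict(int)
for r in records:
    totals[r["stage"]] += 1
    if r["llm_recommendation"] == r["actual_treatment"]:
        matches[r["stage"]] += 1

for stage in sorted(totals):
    rate = matches[stage] / totals[stage]
    print(f"{stage}: concordance {rate:.0%} ({matches[stage]}/{totals[stage]})")
```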
Overall, the findings suggest that LLMs may help to support straightforward treatment decisions, particularly in early-stage disease, but are not presently suitable for guiding care in more complex cases that require nuanced clinical judgment. Regardless of stage, LLM advice should be used with caution and treated as a supplement to clinical expertise.
The authors add, “Our study shows that large language models can help support treatment decisions for early-stage liver cancer, but their performance is more limited in advanced disease. This highlights the importance of using LLMs as a complement to, rather than a replacement for, clinical expertise.”
In your coverage, please use this URL to provide access to the freely available paper in PLOS Medicine: https://plos.io/48VHQcm
Citation: Yang K, Lee J, Jang JW, Sung PS, Han JW (2026) Evaluating the clinical utility of large language models for hepatocellular carcinoma treatment recommendations: A nationwide retrospective registry study. PLoS Med 23(1): e1004855. https://doi.org/10.1371/journal.pmed.1004855
Author countries: Republic of Korea
Funding: This work was supported by a National Research Foundation of Korea (NRF) grant funded by the Korean government (Ministry of Science and ICT) (RS-2025-23525359 to J.W.H.) and by the Ministry of Health & Welfare, Republic of Korea.
Journal
PLOS Medicine
Method of Research
Computational simulation/modeling
Subject of Research
Not applicable
COI Statement
Competing interests: The authors have declared that no competing interests exist.