Image: Co-author Marvin Kopka from the Division of Ergonomics, Department of Psychology & Ergonomics (IPA) at Technische Universität Berlin.
Credit: Marvin Kopka
(Toronto, May 11, 2026) Researchers at Technische Universität Berlin have discovered that teaching Large Language Models (LLMs) to mimic human intuition and reasoning significantly improves their ability to provide accurate medical care-seeking advice. The study, published in JMIR Biomedical Engineering from JMIR Publications, suggests a paradigm shift in prompt engineering: moving away from computer-focused instructions toward strategies rooted in applied psychology.
As millions of users turn to tools like ChatGPT for health advice, a persistent issue remains: AI chatbots often default to recommending emergency or professional care, even for minor issues, out of an abundance of caution. This over-triage can lead to unnecessary healthcare costs and patient anxiety.
The Breakthrough: Naturalistic Decision-Making (NDM)
The research team, led by Marvin Kopka and Markus A. Feufel, tested 10 different ChatGPT models (including the newest GPT-4o and GPT-5 series) using prompts inspired by Naturalistic Decision-Making (NDM). Unlike traditional rule-based approaches to decision-making, NDM describes how human experts make high-stakes decisions under uncertainty.
The study utilized two specific psychological frameworks (see the illustrative sketch after this list):
- Recognition-Primed Decision-Making (RPD): Instructing the AI to match the patient’s symptoms to typical cases and mentally simulate the outcome.
- Data-Frame Theory: Tasking the AI to build a mental frame of the situation and constantly question it as new data emerges.
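For readers curious how instructions like these might look in practice, below is a minimal, hypothetical sketch in Python using the OpenAI SDK. The prompt wording, the care_seeking_advice helper, and the model name are illustrative assumptions that paraphrase the two frameworks above; they are not the study’s actual prompts.

```python
# A minimal sketch of an NDM-inspired prompt, assuming the OpenAI Python SDK.
# The instruction text is a paraphrase of RPD and Data-Frame Theory, not the
# exact prompts used in the study.
from openai import OpenAI

client = OpenAI()  # assumes OPENAI_API_KEY is set in the environment

NDM_SYSTEM_PROMPT = (
    "You are assisting with care-seeking advice. Reason like an experienced "
    "clinician: (1) Match the patient's symptoms to typical cases you "
    "recognize (Recognition-Primed Decision-Making). (2) Mentally simulate "
    "how each course of action (emergency care, professional care, self-care) "
    "would play out. (3) Build a frame of the situation and actively question "
    "that frame whenever a new detail does not fit it (Data-Frame Theory). "
    "Then recommend one course of action."
)

def care_seeking_advice(case_description: str, model: str = "gpt-4o") -> str:
    """Send a patient vignette to the model with NDM-style instructions."""
    response = client.chat.completions.create(
        model=model,
        messages=[
            {"role": "system", "content": NDM_SYSTEM_PROMPT},
            {"role": "user", "content": case_description},
        ],
    )
    return response.choices[0].message.content

print(care_seeking_advice("Mild sore throat and runny nose for two days, no fever."))
```

In the study, reframing the prompt in this human-reasoning style, rather than as computer-focused instructions, produced the accuracy gains summarized below.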
Key Results
- Significant Accuracy Boost: NDM-inspired prompts increased overall accuracy across all models. The most notable gains were in self-care advice, which jumped from a meager 13.4% with standard prompts to nearly 30% with NDM reasoning.
- Activating "Thinking" in Simpler Models: Non-reasoning models (which typically failed to identify self-care cases) began providing accurate, nuanced advice when given a "human reasoning blueprint."
- Safety Maintained: While the AI became better at identifying when it was safe to stay home, it maintained its high accuracy in identifying true emergencies.
“When testing AI, we too often give it perfect information and then see that it performs extremely well,” said author Marvin Kopka. “But many problems in the real world are ill-defined. We have good models for how experts make decisions in such situations, so using them as prompts seemed like an obvious next step. I hope that applying human decision-making to LLMs will help us develop AI tools that are also useful in real-world decision-making.”
Bridging the Gap to Personalized Medicine
The study suggests that in real-world situations, where medical data is often messy or incomplete, a "reasoning blueprint" based on human cognition can be more effective than standard computational logic. By instructing the AI to simulate outcomes and question its own initial "frames" of a situation, the researchers were able to mitigate the common AI tendency toward over-caution.
While these findings mark a significant step forward in making LLMs more effective partners in clinical decision-making, the team notes that the approach has so far been validated only in controlled environments. Future research will be essential to determine whether these NDM-inspired prompts translate into better decision support for everyday users in non-standardized settings.
Recognition for Excellence
About the Author Team: The research was conducted by Marvin Kopka and Markus A. Feufel at the Division of Ergonomics, Department of Psychology & Ergonomics (IPA) at Technische Universität Berlin. Their work focuses on human factors and the safe integration of AI into human decision-making environments. Marvin was recently recognized as one of the five winners of the 2025 JMIR Publications Early Career Researcher Award, an honor that underscores the caliber and impact of the research presented in this study.
Original article: Kopka M, Feufel MA. Increasing Large Language Model Accuracy for Care-Seeking Advice Using Prompts Reflecting Human Reasoning Strategies in the Real World: Validation Study. JMIR Biomed Eng 2026;11:e88053
URL: https://biomedeng.jmir.org/2026/1/e88053
DOI: 10.2196/88053
About JMIR Publications
JMIR Publications is a leading open access publisher of digital health research and a champion of open science. With a focus on author advocacy and research amplification, JMIR Publications partners with researchers to advance their careers and maximize the impact of their work. As a technology organization with publishing at its core, we provide innovative tools and resources that go beyond traditional publishing, supporting researchers at every step of the dissemination process. Our portfolio features a range of peer-reviewed journals, including the renowned Journal of Medical Internet Research.
To learn more about JMIR Publications, please visit jmirpublications.com or connect with us via X, LinkedIn, YouTube, Facebook, and Instagram.
Head office: 130 Queens Quay East, Unit 1100, Toronto, ON, M5A 0P6 Canada
Media contact: communications@jmir.org
The content of this communication is licensed under the terms of the Creative Commons Attribution License (https://creativecommons.org/licenses/by/4.0/), which permits unrestricted use, distribution, and reproduction in any medium, provided the original work, published by JMIR Publications, is properly cited.
Journal
JMIR Biomedical Engineering
DOI
10.2196/88053
Method of Research
Data/statistical analysis
Subject of Research
Not applicable
Article Title
Increasing Large Language Model Accuracy for Care-Seeking Advice Using Prompts Reflecting Human Reasoning Strategies in the Real World: Validation Study
Article Publication Date
8-Apr-2026
COI Statement
MK is an associate editor for JMIR Public Health and Surveillance.