Technion researchers develop an innovative approach for identifying limitations and “hallucinations” in artificial intelligence models
Technion-Israel Institute of Technology
Reports and Proceedings
Reliability Check: Technion Researchers Develop a New Method to Detect AI Errors and Hallucinations
As large language models (LLMs) become increasingly integrated into everyday applications—from translation and content generation to coding and scientific research—ensuring their reliability has emerged as a critical challenge. Researchers at the Technion – Israel Institute of Technology have developed an innovative approach for identifying errors, hallucinations, and other undesirable behaviors in AI systems without requiring a full understanding of how the models work internally.
The research was led by Dr. Haggai Maron of the Andrew and Erna Viterbi Faculty of Electrical and Computer Engineering, together with Ph.D. student Guy Bar-Shalom, postdoctoral researcher Dr. Fabrizio Frasca, Prof. Ran El-Yaniv, and Dr. Yftah Ziser of the University of Groningen and NVIDIA. Their findings were presented in three papers accepted to leading machine learning conferences: NeurIPS 2025, AAAI 2026, and ICLR 2026.
Rather than attempting to fully interpret the complex internal mechanisms of large language models, the researchers developed a practical and computationally efficient framework that analyzes the models’ internal computations. By training lightweight machine-learning systems on these internal signals, the method can uncover hidden information that reveals when a model is likely to make mistakes, generate inaccurate content, ignore instructions, or behave unexpectedly.
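As a rough illustration of the general idea of training a lightweight model on internal signals, the sketch below fits a small logistic-regression "probe" that maps an activation vector to an estimated probability of error. It is not the researchers' method, which this release does not detail. The activation vectors here are synthetic stand-ins rather than hidden states from a real language model, and every name in it (`synthetic_activations`, `train_probe`, `error_risk`, `accuracy`) is hypothetical. The assumption that errors shift a single coordinate is made only to keep the demonstration small.

```python
import math
import random


def sigmoid(z):
    # Numerically stable logistic function.
    if z >= 0:
        return 1.0 / (1.0 + math.exp(-z))
    e = math.exp(z)
    return e / (1.0 + e)


def synthetic_activations(n, dim, seed):
    """Hypothetical stand-in for hidden states captured from a model.

    Label 1 means the (imaginary) model output was wrong.
    """
    rng = random.Random(seed)
    data = []
    for _ in range(n):
        label = rng.randint(0, 1)
        vec = [rng.gauss(0.0, 1.0) for _ in range(dim)]
        # Assumption for this demo only: errors shift the first coordinate.
        vec[0] += 2.0 if label else -2.0
        data.append((vec, label))
    return data


def train_probe(data, lr=0.1, epochs=200):
    """Fit a logistic-regression probe with full-batch gradient descent."""
    dim = len(data[0][0])
    w, b = [0.0] * dim, 0.0
    for _ in range(epochs):
        grad_w, grad_b = [0.0] * dim, 0.0
        for vec, label in data:
            logit = sum(wi * xi for wi, xi in zip(w, vec)) + b
            err = sigmoid(logit) - label
            for i, xi in enumerate(vec):
                grad_w[i] += err * xi
            grad_b += err
        w = [wi - lr * gi / len(data) for wi, gi in zip(w, grad_w)]
        b -= lr * grad_b / len(data)
    return w, b


def error_risk(vec, w, b):
    """Probe's estimated probability that this activation precedes an error."""
    return sigmoid(sum(wi * xi for wi, xi in zip(w, vec)) + b)


def accuracy(data, w, b, threshold=0.5):
    """Fraction of examples where the thresholded risk matches the label."""
    hits = sum((error_risk(v, w, b) >= threshold) == bool(y) for v, y in data)
    return hits / len(data)


train = synthetic_activations(300, 8, seed=0)
held_out = synthetic_activations(100, 8, seed=1)
w, b = train_probe(train)
print(f"held-out accuracy: {accuracy(held_out, w, b):.2f}")
```

In a real setting the synthetic vectors would be replaced by activations recorded while the model generates text, paired with labels marking which outputs turned out to be wrong. The probe stays small and cheap to train because the language model itself is never modified.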
The researchers demonstrate that risks in AI-generated outputs can be monitored, diagnosed, and predicted at low cost by a separate system operating alongside the model, enabling users to supervise and control model behavior without access to the model’s training process or a complete understanding of its internal workings.
The work addresses one of the most pressing questions in artificial intelligence: how to determine when an AI system is producing unreliable information. The new methods could support the development of warning mechanisms, quality-assurance tools, and safety standards for AI systems deployed in high-stakes fields such as medicine, education, scientific research, and regulation.
The studies are part of a broader research program in Dr. Maron’s laboratory exploring how information embedded in trained models—including their weights and training signals—can be used to improve the safety, reliability, and interpretability of artificial intelligence systems.
Meeting: The 40th Annual AAAI Conference on Artificial Intelligence