image: University of Washington researchers developed the game AI Puzzlers to show kids an area where AI systems still typically and blatantly fail: solving certain reasoning puzzles. In the game, users solve puzzles by completing patterns of colored blocks. They can then ask various AI chatbots to solve the puzzles and explain their solutions, which the systems nearly always fail to do accurately. Here, two children in the UW KidsTeam group test the game.
Credit: University of Washington
While the current generation of artificial intelligence chatbots still flub basic facts, the systems answer with such confidence that they’re often more persuasive than humans.
Adults, even those with deep domain knowledge, such as lawyers, still regularly fall for this. But spotting errors in text is especially difficult for children, since they often don’t have the contextual knowledge to sniff out falsehoods.
University of Washington researchers developed the game AI Puzzlers to show kids an area where AI systems still typically and blatantly fail: solving certain reasoning puzzles. In the game, users solve ‘ARC’ puzzles (short for Abstraction and Reasoning Corpus) by completing patterns of colored blocks. They can then ask various AI chatbots to solve the puzzles and have the systems explain their solutions, which they nearly always fail to do accurately. The team tested the game with two groups of kids and found that the kids learned to think critically about AI responses and discovered ways to nudge the systems toward better answers.
Researchers presented their findings June 25 at the Interaction Design and Children 2025 conference in Reykjavik, Iceland.
“Kids naturally loved ARC puzzles and they’re not specific to any language or culture,” said lead author Aayushi Dangol, a UW doctoral student in human centered design and engineering. “Because the puzzles rely solely on visual pattern recognition, even kids who can’t read yet can play and learn. They get a lot of satisfaction in being able to solve the puzzles, and then in seeing AI — which they might consider super smart — fail at the puzzles that they thought were easy.”
ARC puzzles were developed in 2019 to be difficult for computers but easy for humans because they demand abstraction: being able to look at a few examples of a pattern, then apply it to a new example. Current cutting-edge AI models have improved at ARC puzzles, but they’ve not caught up with humans.
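To make the abstraction step concrete, here is a minimal illustrative sketch in Python of what an ARC-style task looks like. This is a made-up toy task, not one from the actual corpus or from AI Puzzlers; it assumes grids are represented as nested lists of integer color codes, and the hidden rule is a horizontal mirror.

```python
# Toy ARC-style task (invented for illustration, not from the real corpus).
# The hidden rule: mirror the grid horizontally. A solver sees only the
# example input/output pair, must abstract the rule, then apply it to a
# new test grid it has never seen.

def mirror_horizontal(grid):
    """The hidden transformation: reverse each row of the grid."""
    return [row[::-1] for row in grid]

# One example pair demonstrating the rule (0 = background, 1/2 = colors).
example_input  = [[1, 0, 0],
                  [2, 1, 0]]
example_output = [[0, 0, 1],
                  [0, 1, 2]]
assert mirror_horizontal(example_input) == example_output

# A new test grid: anyone who has abstracted the rule can now solve it.
test_input = [[0, 2, 2],
              [1, 0, 0]]
print(mirror_horizontal(test_input))  # [[2, 2, 0], [0, 0, 1]]
```

Real ARC tasks hide far less obvious rules (object counting, symmetry completion, color propagation), which is exactly where pattern abstraction from only a few examples remains hard for current models.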
Researchers built AI Puzzlers with 12 ARC puzzles that kids can solve. They can then compare their solutions to those from various AI chatbots; users can pick the model from a drop-down menu. An “Ask AI to Explain” button generates a text explanation of the system’s solution attempt. Even when the system gets a puzzle right, its explanation of how it arrived at the answer is frequently inaccurate. An “Assist Mode” lets kids try to guide the AI system to a correct solution.
“Initially, kids were giving really broad hints,” Dangol said. “Like, ‘Oh, this pattern is like a doughnut.’ An AI model might not understand that a kid means that there’s a hole in the middle, so then the kid needs to iterate. Maybe they say, ‘A white space surrounded by blue squares.’”
The researchers tested the system at the UW College of Engineering’s Discovery Days last year with over 100 kids from grades 3 to 8. They also led two sessions with KidsTeam UW, a project that works with a group of kids to collaboratively design technologies. In these sessions, 21 children ages 6 to 11 played AI Puzzlers and worked with the researchers.
“The kids in KidsTeam are used to giving advice on how to make a piece of technology better,” said co-senior author Jason Yip, a UW associate professor in the Information School and KidsTeam director. “We hadn't really thought about adding the Assist Mode feature, but during these co-design sessions, we were talking with the kids about how we might help AI solve the puzzles and the idea came from that.”
Through the testing, the team found that kids were able to spot errors both in the puzzle solutions and in the text explanations from the AI models. They also recognized differences between how human brains think and how AI systems generate information. “This is the internet’s mind,” one kid said. “It’s trying to solve it based only on the internet, but the human brain is creative.”
The researchers also found that as kids worked in Assist Mode, they learned to use AI as a tool that needs guidance rather than as an answer machine.
“Kids are smart and capable,” said co-senior author Julie Kientz, a UW professor and chair in human centered design and engineering. “We need to give them opportunities to make up their own minds about what AI is and isn't, because they're actually really capable of recognizing it. And they can be bigger skeptics than adults.”
Runhua Zhao and Robert Wolfe, both doctoral students in the Information School, and Trushaa Ramanan, a master’s student in human centered design and engineering, are also co-authors on this paper. This research was funded by the National Science Foundation, the Institute of Education Sciences and the Jacobs Foundation’s CERES Network.
For more information, contact Dangol at adango@uw.edu, Yip at jcyip@uw.edu, and Kientz at jkientz@uw.edu.
Article Title
"AI just keeps guessing": Using ARC Puzzles to Help Children Identify Reasoning Errors in Generative AI
Article Publication Date
25-Jun-2025