Feature Story | 18-Jun-2026

How AI helps World Cup referees make the call

Computer vision won’t replace referees at the World Cup. But it can help them make better calls when every inch matters.

University of Rochester

More than 1.5 billion people worldwide are expected to watch the 2026 World Cup finals. With that many fans scrutinizing every pass, touch, and goal, FIFA is leaning on advanced computer vision technology to help referees make faster, more accurate calls on the way to crowning this year’s victors.

This year, the tournament’s officiating toolkit includes Sony’s Hawk-Eye technology, which supports video assistant referees (VAR), goal-line technology, advanced semi-automated offside technology, and a “last touch” feature for corner and goal kicks.

“It’s a very sophisticated system that glues together multiple computer vision techniques,” says Chenliang Xu, an associate professor of computer science at the University of Rochester and an expert in computer vision. “You have calibrated cameras, real-time vision models to detect the ball, players, and their poses, as well as a decision layer to identify when some sort of intervention needs to happen.”

For players and fans alike, the result may be shorter waits for close calls.

FIFA first deployed Sony’s Hawk-Eye ball-tracking technology in 2012 at the Club World Cup. At the 2022 World Cup, FIFA introduced semi-automated offside technology, which combines limb- and ball-tracking data with artificial intelligence to provide referees and video match officials with information in mere seconds to inform offside decisions.

How does computer vision track players and the ball?

Player- and ball-tracking systems rely on dedicated computer vision neural networks trained on millions of annotated images and videos.

“Training a computer-vision algorithm to detect a human pose is like teaching a child how to recognize things—you feed it different examples,” says Xu. By taking in a massive collection of examples, the deep neural networks learn to locate players, their body parts, and the ball during a match. Beyond recognizing players and the ball in individual frames, these systems continuously track them over time and across multiple camera views, which is critical for determining offside positions and identifying who touched the ball last.
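To make the offside step concrete, here is a minimal sketch of the geometric comparison that a decision layer could run once limb and ball positions are known. It is an illustration, not FIFA's or Hawk-Eye's implementation. The body-part names, the coordinate convention (meters along the pitch, attacking toward larger values, halfway line at 52.5), and the function names are all assumptions made for this example. The rule it encodes comes from the Laws of the Game: hands and arms do not count, and a player level with the second-last opponent is not offside. The check applies at the moment a teammate plays the ball.

```python
# Hypothetical body-part labels; hands and arms are left out because
# they do not count toward an offside position.
SCORING_PARTS = {"head", "torso", "knee", "foot"}


def foremost(parts):
    """Return the largest x (closest to the attacked goal line) among counted parts."""
    return max(x for name, x in parts.items() if name in SCORING_PARTS)


def in_offside_position(attacker, defenders, ball_x, halfway_x=52.5):
    """Check an attacker's position at the instant the ball is played.

    attacker:  dict mapping body-part name -> x position in meters
    defenders: list of such dicts for the opposing team, goalkeeper included
    ball_x:    x position of the ball in meters
    """
    attacker_x = foremost(attacker)
    # Sort opponents from nearest to farthest from their own goal line.
    defender_lines = sorted((foremost(d) for d in defenders), reverse=True)
    second_last_x = defender_lines[1]
    # Strict comparisons: being level is not offside.
    return (
        attacker_x > halfway_x
        and attacker_x > ball_x
        and attacker_x > second_last_x
    )


# Made-up positions for illustration only.
goalkeeper = {"torso": 103.0, "foot": 103.2}
defender = {"torso": 88.0, "foot": 88.25, "hand": 88.6}
attacker = {"head": 88.1, "torso": 88.0, "foot": 88.3, "hand": 88.9}

offside = in_offside_position(attacker, [goalkeeper, defender], ball_x=80.0)
print(offside)  # the attacker's foot is 5 cm beyond the defender's foot
```

In this made-up case the attacker's hand is the farthest-forward body part, but it is ignored. The call turns on the foot, which is why tracking individual limbs rather than whole players matters.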

During this year’s World Cup matches, sixteen optical tracking cameras positioned around each stadium feed those tracking systems with live data.

Why so many cameras? A single camera view can be blocked or misleading. Multiple cameras enable the triangulation of the ball, players, and boundaries to create precise reconstructions in three dimensions. Those 3D reconstructions are generated in seconds and then provided to officials who make the final call.

“Just like with humans, if you block one of your eyes, it’s very hard to perceive depth,” says Xu. “But when you have both of your eyes open, you can actually fill out the depth and 3D location of the object you’re looking at.”
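The two-eyes idea can be sketched with a standard textbook method, linear triangulation: each calibrated camera turns one pixel observation into two linear constraints on the 3D point, and two or more cameras together pin the point down. The sketch below assumes NumPy and simple pinhole cameras. The camera parameters, the ball position, and the function names are invented for illustration, and nothing here is taken from the Hawk-Eye system itself.

```python
import numpy as np


def projection_matrix(K, R, t):
    """Build a 3x4 pinhole projection matrix P = K [R | t]."""
    return K @ np.hstack([R, t.reshape(3, 1)])


def project(P, X):
    """Project a 3D point X (length 3) to pixel coordinates (u, v)."""
    x = P @ np.append(X, 1.0)
    return x[:2] / x[2]


def triangulate(Ps, pixels):
    """Recover a 3D point from its pixel coordinates in two or more views."""
    rows = []
    for P, (u, v) in zip(Ps, pixels):
        # Each view contributes two linear equations in the homogeneous point.
        rows.append(u * P[2] - P[0])
        rows.append(v * P[2] - P[1])
    A = np.asarray(rows)
    # The solution is the right singular vector with the smallest singular value.
    _, _, Vt = np.linalg.svd(A)
    X = Vt[-1]
    return X[:3] / X[3]


# Assumed intrinsics: 1000-pixel focal length, principal point at (640, 360).
K = np.array([[1000.0, 0.0, 640.0],
              [0.0, 1000.0, 360.0],
              [0.0, 0.0, 1.0]])
R = np.eye(3)

# Two hypothetical cameras 2 meters apart, both looking down the z axis.
P1 = projection_matrix(K, R, np.array([0.0, 0.0, 0.0]))
P2 = projection_matrix(K, R, np.array([-2.0, 0.0, 0.0]))

ball = np.array([1.0, 0.5, 20.0])           # made-up ball position in meters
pixels = [project(P1, ball), project(P2, ball)]

estimate = triangulate([P1, P2], pixels)
print(np.round(estimate, 3))
```

With a single camera the system has only two equations for three unknowns, so every point along the viewing ray fits equally well. That is the one-eye depth ambiguity Xu describes. Real measurements are noisy, which is one reason more than two views help.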

How can AI refereeing tools work so quickly?

FIFA estimates that the tracking cameras provide more than 150 million tracking data points per match. Handling that much data in seconds depends on specialization.

“When FIFA deploys these deep neural networks, they only need them to work well in very particular scenarios,” says Xu. “You don’t necessarily need your algorithm to recognize a bird, fans, or anything else unrelated to the match; you just need them to recognize the players.”

That narrower focus helps the system process a still massive stream of match data quickly. A model may begin as a large neural network trained on many kinds of images, according to Xu. Then, it gets refined and scaled back for the specific problems it needs to solve on the pitch.

Xu says these applications would have been hard to imagine just a decade or so ago. Two advances made the systems of today possible: deep neural networks and graphics processing units (GPUs).

The deep neural networks—machine learning systems inspired by the human brain—that have emerged in recent years dramatically improved performance on visual recognition and tracking tasks compared with many earlier approaches. These networks excel at taking vast amounts of unstructured data and identifying complex relationships with little human intervention.

“Neural networks have changed the whole paradigm since it’s no longer necessary to have manually designed features that we need to train the system to look for,” says Xu. “You input the image and the system automatically learns the visual representations needed for the task.”

Meanwhile, the capabilities of GPUs—the electronic circuits specifically designed to process and generate videos, images, and 3D graphics—jumped significantly in the 2010s, making today’s large-scale AI systems possible.

“The computing power has gotten so much better, so we can train those large neural networks with tons of data that we couldn’t imagine maybe 10 or 15 years ago,” says Xu.

Where else is this technology used?

While similar systems are used for measuring first downs in NFL games, line-calling at the US Open, and making goaltending calls in the NBA, Xu says the technology has applications outside of sports as well.

“This is very similar to the technology that you deploy in self-driving cars,” says Xu. “Those systems need to figure out the vehicle’s environment, detect different traffic participants and track them over time, and have a decision system built inside to choose whether to accelerate, apply the brakes, or change lanes.”

Xu thinks the underlying computer vision technology could be used for security, surveillance, and other settings where cameras need to follow activity across a complex physical space.

“If you want a smart system that tracks people’s activity on a property that contains multiple buildings—indoors and outdoors—and you have cameras deployed in different locations throughout the property, you can see the parallels,” says Xu. “Just like in a soccer match, you could use these systems for person detection and tracking and perhaps reviewing a 3D reconstruction of the property.”

Even as the technology behind the World Cup becomes faster and more sophisticated, Xu says the human element remains at the heart of the game. Computer vision can help officials determine whether a player’s toe drifted offside or who touched the ball last. But at least for now, it can’t predict the brilliance of a last-minute goal, the agony of a missed penalty kick, or the collective joy and heartbreak that keep billions of fans watching until the final whistle.
