Image: Gemini uses three images per candidate—New (the latest science frame showing the putative transient), Reference (an earlier or stacked template of the same part of the sky), and Difference (New minus Reference, isolating any transient signal). From each triplet, Gemini returns three outputs: (1) a real/bogus classification (astrophysical source vs artefact), (2) a concise text explanation describing salient image features and the reasoning behind the decision, and (3) an interest score indicating follow-up prioritisation for rapid flagging to astronomers. Credit: Stoppa & Bulmus et al., Nature Astronomy (2025).
A new study co-led by the University of Oxford and Google Cloud has shown how general-purpose AI can accurately classify real changes in the night sky — such as an exploding star, a black hole tearing apart a passing star, a fast-moving asteroid, or a brief stellar flare from a compact star system — and explain its reasoning, without the need for complex training.
Published today (8 October) in Nature Astronomy, the study by researchers from the University of Oxford, Google Cloud, and Radboud University demonstrates that a general-purpose large language model (LLM) — Google’s Gemini — can be transformed into an expert astronomy assistant with minimal guidance.
Using just 15 example images and a simple set of instructions, Gemini learned to distinguish real cosmic events from imaging artefacts with approximately 93% accuracy. Crucially, the AI also provided a plain-English explanation for every classification — an important step towards making AI-driven science more transparent and trustworthy, and towards building accessible tools that don’t require massive training datasets or deep expertise in AI programming.
“It’s striking that a handful of examples and clear text instructions can deliver such accuracy,” said Dr Fiorenzo Stoppa, co-lead author from the University of Oxford’s Department of Physics. “This makes it possible for a broad range of scientists to develop their own classifiers without deep expertise in training neural networks — only the will to create one.”
"As someone without formal astronomy training, this research is incredibly exciting.” said Turan Bulmus, co-lead author from Google Cloud. “It demonstrates how general-purpose LLMs can democratise scientific discovery, empowering anyone with curiosity to contribute meaningfully to fields they might not have a traditional background in. It's a testament to the power of accessible AI to break down barriers in scientific research."
Rare Signals in a Universe of Noise
Modern telescopes scan the sky relentlessly, generating millions of alerts every night about potential changes. While some of these are genuine discoveries like exploding stars, the vast majority are ‘bogus’ signals caused by satellite trails, cosmic ray hits, or other instrumental artefacts.
Traditionally, astronomers have relied on specialised machine learning models to filter this data. However, these systems often operate like a ‘black box,’ providing a simple ‘real’ or ‘bogus’ label without explaining their logic. This forces scientists to either blindly trust the output or spend countless hours manually verifying thousands of candidates — a task that will become impossible with the next generation of telescopes, such as the Vera C. Rubin Observatory, which will produce around 20 terabytes of data every 24 hours.
The research team asked a simple question: could a general-purpose, multimodal AI like Gemini, designed to understand text and images together, not only match the accuracy of specialised models but also explain what it sees?
The team provided the LLM with just 15 labelled examples for each of three major sky surveys (ATLAS, MeerLICHT, and Pan-STARRS). Each example included a small image of a new alert, a reference image of the same patch of sky, and a "difference" image highlighting the change, along with a brief expert note. Guided only by these few-shot examples and concise instructions, the model then classified thousands of new alerts, providing a label (real/bogus), a priority score, and a short, readable description of its decision.
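To make that workflow concrete, the sketch below shows how such a few-shot, multimodal prompt might be assembled with the publicly available google-generativeai Python SDK. The model choice, file layout, prompt wording, and output fields are illustrative assumptions, not the study’s actual prompt or code.

```python
# A minimal sketch, assuming the google-generativeai Python SDK and a local folder
# of labelled example triplets; names, model, and prompt text are illustrative only.
from pathlib import Path

import google.generativeai as genai
from PIL import Image

genai.configure(api_key="YOUR_API_KEY")          # assumed: supplied securely in practice
model = genai.GenerativeModel("gemini-1.5-pro")  # assumed: any multimodal Gemini model

INSTRUCTIONS = (
    "You are an expert at vetting astronomical transients. For each "
    "New/Reference/Difference image triplet, return a JSON object with: "
    "label ('real' or 'bogus'), interest_score (0-10), and a short "
    "explanation of the image features behind your decision."
)

# Few-shot block: 15 labelled triplets, each paired with a brief expert note.
few_shot = []
for i in range(15):
    few_shot += [Image.open(f"examples/{i}_{part}.png") for part in ("new", "ref", "diff")]
    few_shot.append(Path(f"examples/{i}_note.txt").read_text())

def classify(new_path: str, ref_path: str, diff_path: str) -> str:
    """Classify one candidate triplet using the instructions plus few-shot examples."""
    parts = [INSTRUCTIONS, *few_shot, "Classify this new candidate:",
             Image.open(new_path), Image.open(ref_path), Image.open(diff_path)]
    return model.generate_content(parts).text

print(classify("cand_new.png", "cand_ref.png", "cand_diff.png"))
```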
A Human in the Loop: An AI That Knows When to Ask for Help
A key component of the study was verifying the quality and usefulness of the AI’s explanations. A panel of 12 astronomers reviewed the AI’s descriptions and rated them as highly coherent and useful.
Moreover, in a parallel test, the team had Gemini review its own answers and assign a ‘coherence score’ to each one. They discovered that the model’s confidence was a powerful indicator of its accuracy: low-coherence outputs were much more likely to be incorrect. This self-assessment capability is critical for building a reliable ‘human-in-the-loop’ workflow. By automatically flagging its own uncertain cases for human review, the system can focus astronomers' attention where it is most needed. Using this self-correction loop to refine the initial examples, the team improved the model's performance on one dataset from ~93.4% to ~96.7%, demonstrating how the system can learn and improve in partnership with human experts.
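As an illustration of how such a triage loop might be wired up, the sketch below routes low-coherence outputs to human reviewers. The 0–1 coherence scale, the threshold value, and the function names are assumptions for illustration, not the paper’s implementation.

```python
# A minimal sketch of coherence-based triage; the scoring scale, threshold,
# and function names are illustrative assumptions rather than the study's code.
def triage(candidates, classify, score_coherence, threshold=0.7):
    """Split candidates into auto-accepted results and cases needing human review."""
    auto_accepted, needs_review = [], []
    for cand in candidates:
        result = classify(cand)              # label, explanation, interest score
        coherence = score_coherence(result)  # model re-reads its own answer, returns 0-1
        if coherence >= threshold:
            auto_accepted.append((cand, result))
        else:
            needs_review.append((cand, result))
    return auto_accepted, needs_review
```

Cases flagged for review and corrected by experts can then be folded back into the few-shot examples, which is how the reported improvement from ~93.4% to ~96.7% was achieved.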
Co-author Professor Stephen Smartt (Department of Physics, University of Oxford) said: “I’ve worked on this problem of rapidly processing data from sky surveys for over 10 years, and we are constantly plagued by weeding out the real events from the bogus signals in the data processing. We have spent years training machine learning models, neural networks, to do image recognition. However, the LLM’s accuracy at recognising sources with minimal guidance rather than task-specific training was remarkable. If we can engineer this to scale up, it could be a total game changer for the field, another example of AI enabling scientific discovery.”
The team envisions this technology as the foundation for autonomous ‘agentic assistants’ in science. Such systems could do far more than classify a single image; they could integrate multiple data sources (like images and brightness measurements), check their own confidence, autonomously request follow-up observations from robotic telescopes, and escalate only the most promising and unusual discoveries to human scientists.
Because the method requires only a small set of examples and plain-language instructions, it can be rapidly adapted for new scientific instruments, surveys, and research goals across different fields.
"We are entering an era where scientific discovery is accelerated not by black-box algorithms, but by transparent AI partners," said Turan Bulmus, co-lead author from Google Cloud. "This work shows a path towards systems that learn with us, explain their reasoning, and empower researchers in any field to focus on what matters most: asking the next great question."
Notes to editors:
For media enquiries and interview requests, contact
Fiorenzo Stoppa — fiorenzo.stoppa@physics.ox.ac.uk
Turan Bulmus – turanbulmus@google.com
The study ‘Textual interpretation of transient image classifications from large language models’ will be published in Nature Astronomy at 10:00 BST / 05:00 ET on Wednesday 8 October 2025 at https://www.nature.com/articles/s41550-025-02670-z. To view a copy of the study under embargo before this time, contact fiorenzo.stoppa@physics.ox.ac.uk
About the University of Oxford
Oxford University has been placed number 1 in the Times Higher Education World University Rankings for the ninth year running, and number 3 in the QS World Rankings 2024. At the heart of this success are the twin pillars of our ground-breaking research and innovation and our distinctive educational offer.
Oxford is world-famous for research and teaching excellence and home to some of the most talented people from across the globe. Our work helps the lives of millions, solving real-world problems through a huge network of partnerships and collaborations. The breadth and interdisciplinary nature of our research alongside our personalised approach to teaching sparks imaginative and inventive insights and solutions.
Through its research commercialisation arm, Oxford University Innovation, Oxford is the highest university patent filer in the UK and is ranked first in the UK for university spinouts, having created more than 300 new companies since 1988. Over a third of these companies have been created in the past five years. The university is a catalyst for prosperity in Oxfordshire and the United Kingdom, contributing around £16.9 billion to the UK economy in 2021/22, and supports more than 90,400 full-time jobs.
Journal
Nature Astronomy
Article Title
Textual interpretation of transient image classifications from large language models
Article Publication Date
8-Oct-2025