Article Highlight | 12-Nov-2025

A*STAR scientists develop method to decode new DNA ‘letters’ that could transform medicine and biotechnology

Agency for Science, Technology and Research (A*STAR), Singapore

A research team led by the A*STAR Genome Institute of Singapore (A*STAR GIS) has developed a method to accurately and efficiently read DNA containing non-standard bases, a task once thought too complex for conventional DNA sequencers. Their work, published in Nature Communications, combines nanopore sequencing with artificial intelligence (AI) to decode these extra “letters” with high speed and accuracy.

 

The Challenge: Decoding DNA, Nature’s Hidden Language

 

DNA is nature’s instruction manual, built from four standard “letters” or bases: A, T, C, and G. Scientists have long imagined expanding this genetic alphabet by adding new “letters”, known as non-canonical bases (NCBs). These NCBs can occur naturally in some viruses or be created in the lab, and they have the potential to unlock new ways of designing molecules, materials, and biological systems.

 

However, DNA sequencing machines were built to recognise only the standard four bases. Because these machines struggle to detect or decode new bases, scientists have been unable to fully harness the potential of NCBs to develop more precise medicines, engineer artificial genomes for sustainable chemical production, and design programmable materials and nanoscale devices for future technologies.

 

“Our ability to quickly read a piece of text relies largely on how familiar we are with the vocabulary used,” said Dr Mauricio Lisboa Perez, Scientist at A*STAR GIS and first author of the study. “Similarly, for an AI model to ‘speed-read’ DNA, it must have seen enough examples of every base. Non-canonical bases are rare and harder to produce, so we had to design creative ways to generate sufficient examples for our AI model to learn from.”

 

The Solution: Using AI to Translate DNA with Non-Standard Bases

 

The team created a large library of artificial DNA containing both standard and non-standard bases in different combinations, then used nanopore sequencing to record the unique electrical signals produced as each base passed through microscopic pores. Because the data were often noisy and incomplete, the researchers developed an AI-driven approach that could learn and improve iteratively, refining its predictions over time. They also enhanced the AI model’s learning by creatively rearranging existing signal data to represent more combinations. This adaptive method enabled the AI to accurately recognise each base’s distinct pattern, allowing the sequencer to read new DNA “letters” directly.
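The idea of rearranging existing signal data can be illustrated with a toy sketch. This is not the team's published procedure, which this release does not detail; it assumes, purely for illustration, that signal segments have already been assigned to short sequence contexts (k-mers), and that a synthetic training example for an unseen sequence can be assembled by joining one observed segment per context. The function name `splice_signals`, the placeholder letter "X" for a non-canonical base, and all signal values are hypothetical.

```python
import random


def splice_signals(segments_by_kmer, target_kmers, rng):
    """Build a synthetic signal by joining one observed segment per k-mer.

    segments_by_kmer: dict mapping a k-mer string to a list of observed
        signal segments (each a list of current samples).
    target_kmers: ordered k-mers of the sequence we want to simulate.
    rng: a random.Random instance, so results are reproducible.
    """
    signal = []
    for kmer in target_kmers:
        # Pick one previously observed segment for this context at random.
        signal.extend(rng.choice(segments_by_kmer[kmer]))
    return signal


# Toy pool of observed segments (arbitrary units, invented values).
# "X" is a placeholder for a hypothetical non-canonical base.
pool = {
    "ACX": [[80.1, 79.6, 80.4], [81.0, 80.2]],
    "CXG": [[95.3, 94.8], [96.1, 95.0, 95.7]],
    "XGT": [[70.2, 71.1, 70.5]],
}

rng = random.Random(0)
synthetic = splice_signals(pool, ["ACX", "CXG", "XGT"], rng)
print(len(synthetic), synthetic)
```

Each call with a different random seed yields a different plausible signal for the same sequence, which is one simple way a small set of measurements could be stretched to cover more base combinations.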

 

While other research groups have explored similar challenges, this study is the first to demonstrate a DNA sequencer that can directly and reliably identify non-standard bases at scale using AI, overcoming key bottlenecks such as limited training data.

 

“Being able to accurately identify these new bases at scale gives us a much richer vocabulary for writing and reading biological information,” said Dr Niranjan Nagarajan, Associate Director, AI & Compute at A*STAR GIS and senior author of the study. “It’s like learning to recognise new letters, which allows us to understand many more words and meanings in the language of life.”

 

Transformative Potential of the Method

 

This breakthrough could drive innovation across multiple fields:

Healthcare and Therapeutics: Accurately reading and analysing non-standard bases removes a major bottleneck in developing DNA- and RNA-based treatments, paving the way for new drugs and diagnostics.

Advanced Materials and Biotechnology: Non-standard bases could serve as new building blocks for nanostructures and nanorobots, leading to breakthroughs in medicine, manufacturing, and sustainable chemical production.

Data and Information Storage: Encoding information using expanded DNA alphabets could make data storage more affordable and energy-efficient, potentially reducing the environmental footprint of data centres.
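To put the data storage point in rough numbers, the sketch below computes the theoretical maximum information per base for a given alphabet size, using the standard relation of log2(alphabet size) bits per symbol. The alphabet sizes of 6 and 8 are hypothetical examples, not figures from the study, and the calculation ignores error correction and synthesis constraints that any real storage scheme would need.

```python
import math


def bits_per_base(alphabet_size):
    """Theoretical maximum bits carried by one base."""
    return math.log2(alphabet_size)


def bases_needed(n_bytes, alphabet_size):
    """Minimum number of bases needed to store n_bytes, with no overhead."""
    return math.ceil(n_bytes * 8 / bits_per_base(alphabet_size))


# Compare the standard 4-letter alphabet with hypothetical larger ones.
for size in (4, 6, 8):
    print(size, round(bits_per_base(size), 3), bases_needed(1000, size))
```

Under these idealised assumptions, a larger alphabet stores the same data in fewer bases, which is the basis for the potential savings described above.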

 

The researchers plan to extend their work to discover more non-standard bases in viruses and enhance the AI model’s ability to detect them.

 

“We are excited about this new DNA sequencing method and the possibilities it brings,” said Dr Wan Yue, Executive Director at A*STAR GIS. “Working with an expanded DNA alphabet will create more opportunities for scientists to develop new therapeutics, novel organisms that produce chemicals environmentally, and new programmable materials for nanostructures and nanorobots. These innovations can advance scientific discovery, create economic value, and ultimately improve lives.”

 

###
