New AI model reads DNA sequences to reconstruct ancestry
Peer-Reviewed Publication
This month, we’re focusing on artificial intelligence (AI), a topic that continues to capture attention everywhere. Here, you’ll find the latest research news, insights, and discoveries shaping how AI is being developed and used across the world.
Updates every hour. Last Updated: 14-Jun-2026 06:16 ET (14-Jun-2026 10:16 GMT/UTC)
Large language models like ChatGPT are huge. Federated learning, an approach that lets many participants train such models together without sharing users’ private data, is slow and inefficient at this scale. To collaborate, the models must constantly share their updated versions in full, which is a huge amount of information to exchange. This consumes a lot of network bandwidth and memory and is energy intensive. As a result, models can’t be synchronized as often as necessary, which leaves participants working with outdated versions. Now, a new study designs an algorithm that improves AI data sharing, boosts performance and reduces power consumption.
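To make the communication cost concrete, here is a minimal sketch of plain federated averaging, the textbook baseline in which every participant sends its full set of updated parameters each round. It is not the algorithm from the new study, which this summary does not describe. The function names, the toy parameter values, and the 7-billion-parameter, 10-client scenario are all illustrative assumptions.

```python
from typing import List, Sequence


def federated_average(client_weights: Sequence[Sequence[float]],
                      client_sizes: Sequence[int]) -> List[float]:
    """Average client parameter vectors, weighted by local dataset size."""
    total = sum(client_sizes)
    n_params = len(client_weights[0])
    averaged = [0.0] * n_params
    for weights, size in zip(client_weights, client_sizes):
        for i, w in enumerate(weights):
            averaged[i] += w * size / total
    return averaged


def bytes_per_round(n_params: int, n_clients: int,
                    bytes_per_param: int = 4) -> int:
    """Traffic for one round when every client uploads and downloads the full model."""
    return 2 * n_params * n_clients * bytes_per_param


# Toy example: three clients, a "model" of four parameters (made-up values).
clients = [
    [1.0, 2.0, 3.0, 4.0],
    [2.0, 2.0, 2.0, 2.0],
    [0.0, 4.0, 0.0, 4.0],
]
sizes = [100, 100, 200]  # hypothetical number of local training examples
global_weights = federated_average(clients, sizes)
print(global_weights)

# Hypothetical scale: 7 billion parameters, 10 clients, 32-bit floats.
print(bytes_per_round(7_000_000_000, 10) / 1e9, "GB per round")
```

Under these assumed numbers, a single synchronization round moves 560 GB. That is the kind of cost that limits how often models can be synchronized.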
* The project will use environmental DNA (eDNA)—traces of genetic material left behind by organisms—to understand the health of streams throughout California and assess impacts from land use and climate change.
* This work will combine advanced machine learning and geospatial data with the on-the-ground efforts of hundreds of volunteer community scientists.
* The project will launch an open-source, low-cost cloud platform so that Indigenous tribes, land managers, watershed groups, and local agencies have the tools to evaluate stream conditions and monitor biodiversity.
Insilico Medicine announced that its research paper, “When Single Answer Is Not Enough: Rethinking Single-Step Retrosynthesis Benchmarks for LLMs,” has been accepted for presentation at the International Conference on Machine Learning 2026. The study challenges conventional retrosynthesis benchmarking approaches that rely on single “ground-truth” answers and Top-K accuracy metrics, which may not reflect the multi-solution nature of real-world chemistry.
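The concern about single ground-truth answers can be illustrated with a small sketch of Top-K accuracy. It scores the same predictions twice: once against one recorded answer per target, and once against a set of acceptable answers. This is not ChemCensor or any code from the paper. The route labels and accepted sets are hypothetical placeholders rather than real reactions.

```python
from typing import List, Set


def top_k_hit(predictions: List[str], accepted: Set[str], k: int) -> bool:
    """True if any of the first k ranked predictions is an accepted answer."""
    return any(p in accepted for p in predictions[:k])


def top_k_accuracy(all_predictions: List[List[str]],
                   all_accepted: List[Set[str]], k: int) -> float:
    """Fraction of targets with at least one accepted answer in the top k."""
    hits = sum(top_k_hit(p, a, k) for p, a in zip(all_predictions, all_accepted))
    return hits / len(all_predictions)


# Ranked model outputs for two hypothetical target molecules.
predictions = [
    ["route_B", "route_C"],  # target 1
    ["route_X", "route_Y"],  # target 2
]

# Conventional benchmark: exactly one recorded answer per target.
single_truth = [{"route_A"}, {"route_X"}]

# Assumed expert view: route_B is also a valid way to make target 1.
multi_truth = [{"route_A", "route_B"}, {"route_X"}]

single_score = top_k_accuracy(predictions, single_truth, k=1)
multi_score = top_k_accuracy(predictions, multi_truth, k=1)
print(single_score, multi_score)
```

In this toy case the single-answer score counts the prediction for target 1 as a miss even though it is assumed to be chemically valid, so the same model output scores 0.5 under one convention and 1.0 under the other.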
The paper introduces ChemCensor, a chemistry-aware evaluation metric designed to assess model performance based on reaction centers and functional groups, aligning more closely with expert human reasoning. Additional contributions include the CREED dataset, comprising 6.4 million validated reactions; benchmarking results from the C3LM model; and the URSA-expert-2026 dataset, an expert-annotated benchmark designed to reduce data leakage and improve evaluation rigor.
The research supports the development of more realistic and scalable training and evaluation frameworks for AI-driven retrosynthesis and drug discovery. Supporting materials will be made publicly available to promote transparency and reproducibility.