image: Contrasting alignment paradigms for general-domain versus biomolecular LLMs.
Credit: Qiang Zhang et al. / corresponding authors Haofen Wang and Huajun Chen.
Advances in AI have propelled a new era of molecular sciences, with models capable of predicting protein structures at near-atomic resolution, accelerating ligand design, and opening new routes for materials discovery. This progress has created the impression that computation can compress long cycles of scientific trial and error into much faster loops of design and validation. Nonetheless, despite this impressive progress, significant challenges persist. Many AI-generated molecules violate physical laws, are infeasible to synthesize, or pose safety and regulatory risks, problems that stem from misalignment between AI objectives and real-world scientific and societal requirements.
This gap arises because most models optimize statistical proxies or surrogate scores, neglecting the underlying physics, thermodynamics, kinetics, and safety principles that govern practical feasibility. As a result, these models often produce unusable candidates, leading to costly experimental failures and slower overall progress. Recognizing these challenges, an international group of scientists introduced a comprehensive alignment framework that aims to embed natural laws, scientific goals, and ethical standards directly into AI-driven molecular design processes, transforming AI from a mere predictor into a trustworthy partner aligned with real-world constraints.
According to the researchers, protein sequences that look plausible in silico may violate physical principles, lack kinetic feasibility, or fold into structures that cannot be synthesized.
To describe this gap, the team introduces the concept of misalignment. In their framework, misalignment refers to the divergence between what AI systems are optimized for and what matters in scientific practice. This work identifies three major dimensions of this problem. The first is misalignment with natural laws, where models rely on simplified or static proxies while overlooking dynamics, entropy, kinetics, phase competition, or other governing constraints. The second is misalignment with scientific goals, where optimization focuses on what is easy to compute rather than on what truly determines success, such as catalytic activity, ADMET properties (Absorption, Distribution, Metabolism, Excretion, and Toxicity, key criteria in drug development), manufacturability, durability, or device-level performance. The third is misalignment with research principles, where biosafety, toxicity, dual-use risks, environmental impact, and regulatory compliance are treated as optional downstream checks rather than baseline design requirements.
The paper makes a key distinction between super alignment in general-purpose large language models and the kind of alignment needed for molecular AI. While super alignment concerns broad human values, user intent, and general safety, comprehensive alignment is proposed as a domain-specific framework for biomolecular and chemical research. In this context, success is not measured by fluency or benchmark accuracy alone, but by whether designed molecules can function, persist, be synthesized, and be deployed responsibly.
To bridge the gap, the authors outline several directions for building more trustworthy molecular AI systems. First, they call for feasibility-rich datasets that include not only successful cases but also failed syntheses, unstable folds, toxic compounds, and other negative examples. Second, they argue for constrained or hybrid model architectures that embed physical and mission-relevant constraints directly into the generation process, rather than relying only on post hoc screening. Third, they emphasize the need to rethink evaluation, moving beyond narrow proxies such as RMSD, docking scores, or formation energies toward broader translational criteria that better reflect real experimental and regulatory success. Finally, they advocate a shift from open-loop generation to closed-loop discovery, where failed experiments, robotic assays, and compliance feedback are fed back into model retraining as valuable error signals. A figure in the paper summarizes this as a transition from reactive filtering to intrinsically aligned workflows.
Rather than treating these failures as proof that AI is unsuitable for molecular discovery, the authors argue that the deeper issue lies in how AI is embedded within broader socio-technical workflows. In other words, the problem is often not the tool itself, but the proxies selected, the constraints deferred, and the validation structures omitted. From this perspective, comprehensive alignment is not merely a model property. It is also a workflow property, shaped by human decisions, institutional safeguards, experimental feedback, and regulatory oversight.
The perspective concludes that the future of molecular AI will be defined not only by predictive accuracy, but by trustworthiness, reproducibility, and societal value. By embedding alignment more deeply into data, models, evaluation, and feedback loops, the authors envision a future in which AI becomes not just a generator of molecular possibilities, but a trustworthy co-pilot for scientific discovery.
The perspective has been recently published in AI For Science, an interdisciplinary and international peer-reviewed gold open access journal committed to publishing high-impact original research, reviews, and perspectives that highlight the transformative applications of AI in driving scientific innovation.
Reference: Qiang Zhang, Xiang Zhuang, Chenyi Zhou, Yihang Zhu, Tong Xu, Tianhao Li, Dianbo Liu, Shengchao Liu, Keyan Ding, Michael Li, Emine Yilmaz, Haofen Wang, Huajun Chen. Beyond accuracy: comprehensive alignment for AI-driven molecular design. AI for Science, 2026, 2(2): 023001. DOI: 10.1088/3050-287X/ae5f37
Article Title
Beyond accuracy: comprehensive alignment for AI-driven molecular design
Article Publication Date
24-Apr-2026