image: The workflow of SQL execution
Credit: HIGHER EDUCATION PRESS
Database optimization has long relied on traditional methods that struggle with the complexities of modern data environments. These methods often fail to efficiently handle large-scale data, complex queries, and dynamic workloads, leading to suboptimal performance and increased computational costs. To address these challenges, researchers have turned to AI4DB (Artificial Intelligence for Database), integrating advanced machine learning and deep learning techniques to enhance database optimization.
To provide an in-depth review of AI4DB, a research team led by Shaojie Qiao published their new research on 15 December 2025 in Frontiers of Computer Science, co-published by Higher Education Press and Springer Nature.
The team focuses on four key areas: cardinality/cost estimation, learning-based join order selection, end-to-end query optimizers, and text-to-SQL models. Their research highlights significant advancements in each area:
Cardinality/cost estimation: DB optimizers rely heavily on cardinality and cost estimates to select the optimal execution plan. However, traditional techniques struggle to capture correlations between different columns and tables, which leads to inaccurate estimates. Current enterprise DB cardinality estimators perform well for simple operations, such as single-table queries, but as data grows drastically and queries become more complex, cardinality estimation errors can increase significantly and degrade join order selection. Cost estimation, in turn, improves with better cardinality estimates, but it is also influenced by other factors, such as hardware overhead and cache size, which can increase operation costs. Recently, deep learning has been used to estimate cardinality and cost by capturing correlations across tables, and these approaches can yield more accurate estimates.
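To make the correlation problem concrete, the following Python sketch (an illustration, not taken from the review) compares the classic independence-based estimate with the true cardinality on a small synthetic table. The table, its `city` and `province` columns, and the helper names are hypothetical; the q-error (the larger of estimate/actual and actual/estimate) is a common way to report estimation error.

```python
import random

# Hypothetical synthetic table: "province" is fully determined by "city",
# so the two columns are strongly correlated. This breaks the
# attribute-value-independence assumption used by many traditional estimators.
random.seed(0)
CITY_TO_PROVINCE = {
    "Chengdu": "Sichuan", "Mianyang": "Sichuan",
    "Guangzhou": "Guangdong", "Shenzhen": "Guangdong",
}
rows = []
for _ in range(10_000):
    city = random.choice(list(CITY_TO_PROVINCE))
    rows.append({"city": city, "province": CITY_TO_PROVINCE[city]})


def selectivity(rows, col, value):
    """Fraction of rows where rows[col] == value (a per-column statistic)."""
    return sum(r[col] == value for r in rows) / len(rows)


def independent_estimate(rows, preds):
    """Traditional estimate: multiply per-column selectivities, assuming independence."""
    est = float(len(rows))
    for col, value in preds:
        est *= selectivity(rows, col, value)
    return est


def true_cardinality(rows, preds):
    """Exact count of rows satisfying every equality predicate."""
    return sum(all(r[c] == v for c, v in preds) for r in rows)


preds = [("city", "Chengdu"), ("province", "Sichuan")]
est = independent_estimate(rows, preds)
actual = true_cardinality(rows, preds)
q_error = max(est, actual) / min(est, actual)
print(f"estimated={est:.0f} actual={actual} q-error={q_error:.2f}")
```

Because every Chengdu row is also a Sichuan row, multiplying the two selectivities roughly halves the true count, so the estimate is off by about 2x. Learned estimators aim to capture exactly this kind of cross-column dependency.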
Join order selection: In complex business scenarios, obtaining the optimal query plan is crucial. Traditional DB optimizers use static plan selection guided by the cost estimates of the DB estimator: they apply heuristic algorithms or dynamic programming to search the space of plans and choose the one with the lowest estimated cost. However, static plan selection has a significant drawback: it struggles to find good plans for tables with large amounts of data. Exploring the vast space of possible query plans is highly costly, because join order selection is an NP-hard problem. Methods based on deep reinforcement learning can instead select good plans automatically through trial and error and iterative feedback.
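The dynamic-programming approach used by traditional optimizers can be sketched as follows. This is a minimal illustration under stated assumptions, not the method of any particular system: the table names, cardinalities, and selectivities are hypothetical, only left-deep plans are considered, and the cost of a plan is simply the sum of its intermediate result sizes.

```python
from itertools import combinations

# Hypothetical catalog: base-table cardinalities and join-predicate selectivities.
CARD = {"A": 1_000, "B": 50_000, "C": 200, "D": 10_000}
SEL = {
    frozenset("AB"): 0.001,
    frozenset("BC"): 0.01,
    frozenset("CD"): 0.005,
    frozenset("AD"): 0.0001,
}


def result_size(tables):
    """Estimated output size of joining `tables`, assuming independent predicates."""
    size = 1.0
    for t in tables:
        size *= CARD[t]
    for pair, s in SEL.items():
        if pair <= tables:  # apply each join predicate whose tables are both present
            size *= s
    return size


def best_left_deep_plan(tables):
    """Dynamic programming over table subsets (Selinger-style, left-deep only).

    Simplified cost model (an assumption): sum of intermediate result sizes.
    """
    # best[subset] = (cost, join order as a tuple of table names)
    best = {frozenset([t]): (0.0, (t,)) for t in tables}
    for k in range(2, len(tables) + 1):
        for combo in combinations(tables, k):
            subset = frozenset(combo)
            for last in subset:  # the table joined last onto the best plan for the rest
                prev_cost, prev_order = best[subset - {last}]
                cost = prev_cost + result_size(subset)
                if subset not in best or cost < best[subset][0]:
                    best[subset] = (cost, prev_order + (last,))
    return best[frozenset(tables)]


cost, order = best_left_deep_plan(list(CARD))
print("join order:", " -> ".join(order), f"(cost {cost:,.0f})")
```

The DP reuses the best plan for each subset of tables instead of enumerating every order, but the number of subsets still grows as 2^n, and the number of left-deep orders grows as n! (24 for 4 tables, 3,628,800 for 10), which is why exhaustive search becomes impractical for large join queries.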
End-to-end optimizer: Designing an appropriate query optimizer is essential for DB systems. Traditional optimizers do not take complex DB environments into account when searching for the optimal execution plan. Learning-based optimizers, by contrast, can leverage deep neural networks to optimize SQL queries end to end, with the goal of producing the optimal execution plan.
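Learned optimizers improve their plan choices from execution feedback. As a deliberately simplified stand-in, the sketch below swaps the deep neural network for an epsilon-greedy multi-armed bandit that picks among a fixed set of candidate plans for a single query. The plan names, the latency simulator, and all parameters are hypothetical.

```python
import random

# Hypothetical true latencies (ms) for three candidate plans of one query.
# A real system would observe latencies by executing plans; a noisy simulator stands in.
TRUE_LATENCY_MS = {"hash_join_plan": 120.0, "nested_loop_plan": 480.0, "merge_join_plan": 200.0}


def run_plan(plan, rng):
    """Simulated execution: true latency plus Gaussian noise (an assumption)."""
    return max(1.0, rng.gauss(TRUE_LATENCY_MS[plan], 20.0))


def learn_plan_choice(rounds=500, epsilon=0.1, seed=42):
    """Epsilon-greedy selection: usually exploit the plan with the lowest observed
    mean latency, occasionally explore a random plan, and update from feedback."""
    rng = random.Random(seed)
    plans = list(TRUE_LATENCY_MS)
    counts = {p: 0 for p in plans}
    mean_latency = {p: 0.0 for p in plans}
    for _ in range(rounds):
        untried = [p for p in plans if counts[p] == 0]
        if untried:
            plan = untried[0]  # try every plan once before exploiting
        elif rng.random() < epsilon:
            plan = rng.choice(plans)  # explore
        else:
            plan = min(plans, key=mean_latency.get)  # exploit the current best
        latency = run_plan(plan, rng)
        counts[plan] += 1
        # Incremental mean update from the observed latency (the feedback signal).
        mean_latency[plan] += (latency - mean_latency[plan]) / counts[plan]
    return counts, mean_latency


counts, means = learn_plan_choice()
best = min(means, key=means.get)
print("chosen plan:", best, {p: round(m) for p, m in means.items()}, counts)
```

Many learned optimizers generalize across queries by featurizing plans and training a neural model on observed latencies; this bandit keeps only per-plan averages, so it captures the trial-and-error feedback loop but not that generalization.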
Text-to-SQL: Text-to-SQL enables users to interact with DBs without needing to master SQL. As more people use DBs for data security and management, many of them lack professional SQL skills. Text-to-SQL models address this by automatically converting user-entered natural-language text into machine-executable SQL queries.
For future research directions, the team suggests:
Developing adaptive models that can dynamically adjust to changing data distributions and query workloads.
Enhancing the robustness of text-to-SQL models to handle real-world variations in natural language input.
Exploring federated learning and transfer learning techniques to improve the scalability and generalizability of AI4DB models.
Integrating AI4DB techniques with emerging technologies like edge computing and the Internet of Things (IoT) to optimize data processing in distributed environments.
These advancements in AI4DB mark a significant step forward in database optimization, promising more efficient, intelligent, and accessible database systems for experts and non-experts alike.
Journal
Frontiers of Computer Science
Method of Research
Experimental study
Subject of Research
Not applicable
Article Title
Learning database optimization techniques: the state-of-the-art and prospects
Article Publication Date
15-Dec-2025