Why Scientific AI Needs Query Planning, Not Just Retrieval
Scientific questions require multiple tools, validation steps, and provenance-aware execution.
Retrieval is an important operator in scientific AI, but it is only one operator.
Many scientific questions require multiple systems to participate. A question may begin with retrieval from a literature index, continue through a database query, call an analysis tool, compare results against a knowledge graph, and finish by archiving a reproducible trace. That is not a retrieval problem alone. It is a planning problem.
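One way to make the "planning problem" concrete is to treat the question as a dependency graph of steps rather than a single lookup. The sketch below is illustrative only: the step names are hypothetical labels for the stages described above, and the standard-library `graphlib` module is used simply to derive a valid execution order.

```python
from graphlib import TopologicalSorter

# Hypothetical step names. Each step maps to the set of steps it depends on.
plan = {
    "retrieve_literature": set(),
    "query_database": {"retrieve_literature"},
    "run_analysis": {"query_database"},
    "compare_with_kg": {"run_analysis"},
    "archive_trace": {"compare_with_kg"},
}


def execution_order(plan):
    """Return one valid order in which every step follows its dependencies."""
    return list(TopologicalSorter(plan).static_order())


order = execution_order(plan)
print(order)
```

A real workflow would usually branch, for example two independent retrievals feeding one analysis; the same structure handles that case without changes.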
The systems challenge is that each step has constraints. Some data sources are slow. Some are protected. Some tools are stochastic. Some require preconditions before execution. The planner has to reason about those constraints and about the quality of the path it chooses.
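A minimal sketch of how a planner might represent these constraints, assuming each step carries a few declared properties. The `Step` fields and the `blockers` helper are hypothetical names chosen for illustration, not part of any existing system.

```python
from dataclasses import dataclass, field


@dataclass
class Step:
    name: str
    latency_s: float            # rough expected latency in seconds (assumed estimate)
    protected: bool = False     # source requires authorization
    stochastic: bool = False    # output may differ between runs
    requires: set = field(default_factory=set)  # facts that must hold before running


def blockers(step, facts, authorized):
    """List the reasons a step cannot run yet; an empty list means it is runnable."""
    reasons = [f"missing precondition: {r}" for r in sorted(step.requires - facts)]
    if step.protected and step.name not in authorized:
        reasons.append("no authorization for protected source")
    return reasons


step = Step(
    "query_database",
    latency_s=12.0,
    protected=True,
    requires={"schema_known", "ids_resolved"},
)
problems = blockers(step, facts={"schema_known"}, authorized=set())
print(problems)
```

Returning reasons instead of a bare boolean lets the planner report why a path was rejected, which matters once people need to inspect the plan.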
Database query optimizers already choose among candidate plans when cost and selectivity estimates are uncertain. Scientific AI can borrow that logic, while adding provenance and validation as first-class plan outputs. A good plan should say not just what to run, but why each step runs, what evidence it uses, and what state it changes.
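The borrowed logic can be sketched as a toy cost model: estimate each candidate plan's cost from per-operator cost and selectivity, pick the cheapest, and return the reasoning along with the choice. The operator names, cost units, and selectivity values below are made-up assumptions; a real optimizer would derive them from statistics.

```python
from dataclasses import dataclass


@dataclass
class Operator:
    name: str
    cost_per_row: float  # assumed cost units per input row
    selectivity: float   # assumed fraction of rows passed downstream (0..1)


def estimated_cost(operators, input_rows):
    """Sum per-operator cost while shrinking the row estimate by each selectivity."""
    total, rows = 0.0, float(input_rows)
    for op in operators:
        total += rows * op.cost_per_row
        rows *= op.selectivity
    return total


def choose_plan(candidates, input_rows):
    """Pick the cheapest candidate and record why it was chosen."""
    costs = {name: estimated_cost(ops, input_rows) for name, ops in candidates.items()}
    best = min(costs, key=costs.get)
    return {
        "plan": best,
        "estimated_costs": costs,
        "reason": f"lowest estimated cost for {input_rows} input rows",
    }


# Hypothetical operators: a cheap, selective filter and an expensive analysis tool.
metadata_filter = Operator("metadata_filter", cost_per_row=1.0, selectivity=0.1)
analysis_tool = Operator("analysis_tool", cost_per_row=50.0, selectivity=1.0)

candidates = {
    "filter_first": [metadata_filter, analysis_tool],
    "analyze_first": [analysis_tool, metadata_filter],
}
decision = choose_plan(candidates, input_rows=1000)
print(decision["plan"], decision["estimated_costs"])
```

The point of returning the full cost table and a reason string is that the decision itself becomes part of the plan's output, not a hidden side effect of the optimizer.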
That perspective becomes especially important when a workflow spans people as well as machines. If a scientist, data steward, or engineer needs to inspect the path later, the system must explain which sources were used, which assumptions were made, and which steps are repeatable.
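As a rough illustration of what such an explanation could look like, the sketch below records one entry per executed step and renders it for a human reader. The `TraceEntry` fields, the example values, and the fingerprint helper are assumptions made for this example rather than a description of an existing tool.

```python
import hashlib
import json
from dataclasses import asdict, dataclass


@dataclass
class TraceEntry:
    step: str
    sources: list       # names of the sources the step read from
    assumptions: list   # assumptions made while running the step
    repeatable: bool    # whether re-running should give the same result


def explain(trace):
    """Render one human-readable line per executed step."""
    lines = []
    for i, entry in enumerate(trace, start=1):
        tag = "repeatable" if entry.repeatable else "not repeatable"
        sources = ", ".join(entry.sources) or "none"
        assumptions = ", ".join(entry.assumptions) or "none"
        lines.append(
            f"{i}. {entry.step} [{tag}] sources={sources}; assumptions={assumptions}"
        )
    return "\n".join(lines)


def fingerprint(trace):
    """Hash the serialized trace so an archived copy can be checked for changes."""
    payload = json.dumps([asdict(entry) for entry in trace], sort_keys=True)
    return hashlib.sha256(payload.encode("utf-8")).hexdigest()


# Hypothetical trace for a two-step workflow.
trace = [
    TraceEntry("retrieve_literature", ["literature_index"], [], True),
    TraceEntry("run_analysis", ["query_result"], ["random seed not fixed"], False),
]
report = explain(trace)
print(report)
print(fingerprint(trace))
```

Marking a step as not repeatable is as informative as marking it repeatable: it tells a later reader which results they should expect to differ on a re-run.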
References
TemPredict, Quantum Data Hub, and P2KG; full titles are listed under Related publications.
Related projects
- National Data Platform Search
- Quantum Data Hub
Related publications
- TemPredict: A Big Data Analytical Platform for Scalable Exploration and Monitoring of Personalized Multimodal Data for COVID-19
- Quantum Data Hub: A Collaborative Data and Analysis Platform for Quantum Material Science
- P2KG: Declarative Construction and Quality Evaluation of Knowledge Graphs from Polystores
I am interested in exchanging ideas with researchers and engineering teams working on related systems.