Databases for Agentic AI
Planning, execution, metadata, state, and provenance matter as much in agents as they do in database systems.
Subhasis Dasgupta, Ph.D.
Subhasis Dasgupta is a database-systems researcher and scientific-platform architect at the San Diego Supercomputer Center and UC San Diego. He designs trusted infrastructure that connects heterogeneous data, high-performance computing, knowledge systems, scientific workflows, and AI agents. Earlier in his career, he was the first employee of the U.S. cloud startup Kaavo, where he helped build its multi-cloud management platform and establish its India engineering operation.
Research question: How can intelligent systems plan and operate reliably across databases, APIs, models, computing resources, and scientific instruments while preserving provenance, performance, security, and human oversight?
Trusted across institutions and systems
Signature framework
Reliable scientific AI starts with query planning, provenance, execution control, and infrastructure that can span data, models, workflows, and instruments.
Databases for Agentic AI
Planning, execution, metadata, state, and provenance matter as much in agents as they do in database systems.
Scientific AI Infrastructure
Heterogeneous data, HPC, workflows, and knowledge systems need to cooperate without hiding system behavior.
Architecture and Technical Leadership
Complex systems succeed when the technical story is legible to researchers, operators, and collaborators.
Flagship work
Project
Heterogeneous database integration and query processing for relational, graph, text, and analytical systems.
San Diego Supercomputer Center / UC San Diego
A research and platform effort focused on making heterogeneous databases work as one system for scientific and operational workloads.
Project
Federated scientific-data discovery with metadata, ontology-driven search, and national-scale APIs.
San Diego Supercomputer Center / National Data Platform
A search and discovery layer for scientific data platforms that emphasizes metadata, semantic access, and community-scale usability.
Project
Wearable and multimodal data infrastructure for predictive-health research and clinical support.
UCSF Osher Center for Integrative Health / UC San Diego
A system focused on ingesting and organizing multimodal health data for analysis, collaboration, and research workflows.
Research agenda
How should agents plan across databases, APIs, models, and scientific tools?
Can query optimizers reason about subjective and open-world predicates?
How should scientific digital twins be registered, discovered, executed, evaluated, and archived?
Can database instrumentation generate actionable system recommendations?
Research focus
The work centers on scientific data systems that preserve meaning, manage execution cost, and let both humans and agents make reliable decisions.
Publications and patents
Journal
Journal
Journal
Journal
Conference
Declarative construction and quality evaluation of knowledge graphs from polystore data.
Journal
Patent
Patent
Blogs
Blog
Building before requirements stabilize, and how platform decisions shape long-term engineering.
Blog
Scientific questions require multiple tools, validation steps, and provenance-aware execution.
Blog
Why planning, provenance, optimization, and data semantics matter for reliable agent systems.
Speaking and workshops
The speaking page will expand as verified talks, panels, and workshops are added.
Leadership story
Subhasis Dasgupta, Ph.D., is a database-systems researcher and scientific-platform architect at the San Diego Supercomputer Center and UC San Diego. His work spans query processing, heterogeneous databases, distributed systems, scientific workflows, HPC, knowledge graphs, and AI-enabled data discovery. Earlier in his career, he was the first employee of the U.S. cloud startup Kaavo, where he helped build its multi-cloud management platform and establish its India engineering operation.
What I bring
Connect
I welcome conversations with researchers, founders, engineers, and institutions working on databases, scientific data platforms, distributed systems, and AI-enabled infrastructure.