Subhasis Dasgupta, Ph.D.

Building the Data and Systems Foundations for Scientific AI

Subhasis Dasgupta is a database-systems researcher and scientific-platform architect at the San Diego Supercomputer Center and UC San Diego. He designs trusted infrastructure that connects heterogeneous data, high-performance computing, knowledge systems, scientific workflows, and AI agents. Earlier in his career, he was the first employee of the U.S. cloud startup Kaavo, where he helped build its multi-cloud management platform and establish its India engineering operation.

Research question: How can intelligent systems plan and operate reliably across databases, APIs, models, computing resources, and scientific instruments while preserving provenance, performance, security, and human oversight?


Trusted across institutions and systems

  • UC San Diego
  • San Diego Supercomputer Center
  • Kaavo
  • Database Systems
  • Scientific Platforms
  • AI-Enabled Discovery

Signature framework

The Scientific Intelligence Stack

Reliable scientific AI starts with query planning, provenance, execution control, and infrastructure that can span data, models, workflows, and instruments.

Databases for Agentic AI

Planning, execution, metadata, state, and provenance matter as much in agents as they do in database systems.

Scientific AI Infrastructure

Heterogeneous data, HPC, workflows, and knowledge systems need to cooperate without hiding system behavior.

Architecture and Technical Leadership

Complex systems succeed when the technical story is legible to researchers, operators, and collaborators.

Figure: The Scientific Intelligence Stack. A layered architecture that lets scientific AI plan, execute, explain, and govern across data, models, and compute. The layers are Scientific and Operational Data (sensors, metadata, APIs, files, publications, instruments); Database and Query Systems (query planning, search, polystores, streaming, graph processing); Distributed and HPC Execution (Kubernetes, HPC, GPU workflows, scientific workflows, monitoring); and Knowledge, Models, and Agents (graphs, ontologies, LLMs, retrieval, planning, and evaluation). The stack is designed for both humans and agents: provenance stays visible, execution stays observable and tunable, and discovery stays grounded in the source data. Cross-cutting concerns: trust, security, observability, governance, and reproducibility.

Flagship work

Representative projects and contributions


San Diego Supercomputer Center / UC San Diego

AWESOME Polystore

A research and platform effort focused on making heterogeneous databases work as one system for scientific and operational workloads.

  • Enabled published work on polystore ingestion and query planning.
  • Led to U.S. patent applications on polystore data ingestion and query processing.

San Diego Supercomputer Center / National Data Platform

National Data Platform Search

A search and discovery layer for scientific data platforms that emphasizes metadata, semantic access, and community-scale usability.

  • Connected distributed scientific catalogs to a reusable discovery layer.
  • Aligned with the National Data Platform's AI-ready data and workspace model.

UCSF Osher Center for Integrative Health / UC San Diego

TemPredict

A system focused on ingesting and organizing multimodal health data for analysis, collaboration, and research workflows.

  • Enabled a published conference paper and a large-scale wearable-sensing study.
  • Connected platform work to symptom-prediction and monitoring workflows.

Research agenda

The questions that keep recurring

How should agents plan across databases, APIs, models, and scientific tools?
Can query optimizers reason about subjective and open-world predicates?
How should scientific digital twins be registered, discovered, executed, evaluated, and archived?
Can database instrumentation generate actionable system recommendations?

Research focus

The work centers on scientific data systems that preserve meaning, manage execution cost, and let both humans and agents make reliable decisions.

  • Query planning and optimization across heterogeneous systems
  • Provenance, governance, and reproducibility in scientific platforms
  • Distributed execution, HPC, and workflow orchestration
  • Knowledge graphs, retrieval, and AI-assisted discovery

Publications and patents

Selected research record


journal

Sex Differences in the Variability of Physical Activity Measurements Across Multiple Timescales Recorded by a Wearable Device: Observational Retrospective Cohort Study

K. J. Varner, L. K. Keeler Bruce, S. Soltani, W. Hartogensis, S. Dilchert, F. M. Hecht, A. Chowdhary, L. Pandya, S. Dasgupta, I. Altintas, A. Gupta · Journal of Medical Internet Research · 2025

  • Wearable Data
  • Biomedical Data
  • Scientific Platforms

journal

Biometrics of complete human pregnancy recorded by wearable devices

L. K. Keeler Bruce, D. González, S. Dasgupta, B. L. Smarr · npj Digital Medicine · 2024

  • Wearable Data
  • Biomedical Data
  • Scientific Platforms

journal

General feature selection technique supporting sex-debiasing in chronic illness algorithms validated using wearable device data

J. H. Burks, L. K. Bruce, P. Kasl, S. Soltani, V. Viswanath, W. Hartogensis, S. Dilchert, F. M. Hecht, S. Dasgupta, I. Altintas, A. Gupta · npj Women's Health · 2024

  • Wearable Data
  • Biomedical Data
  • Scientific Platforms

journal

Utilizing Wearable Device Data for Syndromic Surveillance: A Fever Detection Approach

P. Kasl, L. K. Keeler Bruce, W. Hartogensis, S. Dasgupta, L. S. Pandya, S. Dilchert, F. M. Hecht, A. Gupta, I. Altintas, A. E. Mason, B. L. Smarr · Sensors · 2024

  • Wearable Data
  • Biomedical Data
  • Scientific Platforms

conference

P2KG: Declarative Construction and Quality Evaluation of Knowledge Graphs from Polystores

X. Zheng, S. Dasgupta, A. Gupta · ADBIS · 2023


  • Knowledge Graphs
  • Polystores
  • Scientific Data

journal

Detection of COVID-19 using multimodal data from a wearable device: results from the first TemPredict Study

A. E. Mason, F. M. Hecht, S. K. Davis, J. L. Natale, W. Hartogensis, N. Damaso, K. T. Claypool, S. Dilchert, S. Dasgupta, S. Purawat, V. K. Viswanath · Scientific Reports · 2022

  • Wearable Data
  • Biomedical Data
  • Scientific Platforms

Patent

Data Ingestion into a Polystore

U.S. patent application US201762594408P.

Patent

Query Processing in a Polystore

U.S. patent application US20220083552P.

Speaking and workshops

Talks, panels, and collaboration

The speaking page will expand as verified talks, panels, and workshops are added.

  • Research collaboration
  • Academic partnership
  • Startup technology discussion
  • Invited talk or panel
  • Technical workshop
  • Open-source collaboration


Leadership story

From first employee to platform builder

Earlier in his career, Subhasis Dasgupta, Ph.D., was the first employee of the U.S. cloud startup Kaavo, where he helped build its multi-cloud management platform and establish its India engineering operation. Today, as a database-systems researcher and scientific-platform architect at the San Diego Supercomputer Center and UC San Diego, his work spans query processing, heterogeneous databases, distributed systems, scientific workflows, HPC, knowledge graphs, and AI-enabled data discovery.

What I bring

  • Research depth in database systems and scientific data infrastructure
  • Platform experience across HPC, cloud, and workflow environments
  • Technical leadership that connects architecture to execution
  • Collaboration with scientists, engineers, and institutional teams

Connect

Let’s build the systems layer for scientific AI.

I welcome conversations with researchers, founders, engineers, and institutions working on databases, scientific data platforms, distributed systems, and AI-enabled infrastructure.