Nature Biomedical Engineering

DSWizard

Reliable Biomedical Data Analysis Agent

DSWizard (Data Science Wizard) is a two-phase agent designed for reliable biomedical data analysis. It operates by first creating a detailed analysis plan in natural language, then converting that plan into executable Python code with automatic error handling and iterative refinement.

Two-Phase Architecture

Phase 1: Planning

  • Explores available datasets and schemas
  • Creates structured, step-by-step analysis plans
  • Identifies quality control steps
  • Iterates until plan is complete and unambiguous

Phase 2: Implementation

  • Reviews plan and checks feasibility
  • Generates complete Python code
  • Executes in sandboxed Docker environment
  • Returns results with artifacts and reports
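
The two phases above can be sketched as a small control loop. This is a minimal illustration under stated assumptions, not DSWizard's actual implementation: `run_two_phase`, `RunResult`, and the three callables (`draft_plan`, `write_code`, `execute`) are hypothetical names, and the toy executor runs code in-process rather than in a sandbox.

```python
from dataclasses import dataclass, field
from typing import Callable, List, Tuple


@dataclass
class RunResult:
    """Outcome of the implementation phase (hypothetical container)."""
    plan: List[str]
    code: str
    output: str
    attempts: int
    errors: List[str] = field(default_factory=list)


def run_two_phase(
    task: str,
    draft_plan: Callable[[str], List[str]],
    write_code: Callable[[List[str], str], str],
    execute: Callable[[str], Tuple[bool, str]],
    max_attempts: int = 3,
) -> RunResult:
    """Plan first, then generate and execute code, retrying on errors."""
    # Phase 1: produce a step-by-step plan in natural language.
    plan = draft_plan(task)
    if not plan:
        raise ValueError("planner returned an empty plan")

    # Phase 2: turn the plan into code; feed errors back on failure.
    errors: List[str] = []
    feedback = ""
    code = ""
    for attempt in range(1, max_attempts + 1):
        code = write_code(plan, feedback)
        ok, output = execute(code)
        if ok:
            return RunResult(plan, code, output, attempt, errors)
        errors.append(output)
        feedback = output  # the next draft sees the last error message
    return RunResult(plan, code, "", max_attempts, errors)


# Toy stand-ins for the LLM planner, LLM coder, and sandbox (all assumed).
def toy_planner(task: str) -> List[str]:
    return ["load the values", "compute their mean"]


def toy_coder(plan: List[str], feedback: str) -> str:
    # The first draft contains a bug; the second draft is written after
    # the error message from the first run is passed back as feedback.
    if feedback:
        return "result = sum([1, 2, 3]) / 3"
    return "result = undefined_name"


def toy_execute(code: str) -> Tuple[bool, str]:
    # In-process exec is NOT a sandbox; it only keeps the sketch runnable.
    scope: dict = {}
    try:
        exec(code, scope)
    except Exception as exc:
        return False, f"{type(exc).__name__}: {exc}"
    return True, str(scope["result"])


result = run_two_phase("toy task", toy_planner, toy_coder, toy_execute)
print(result.attempts, result.output)  # 2 2.0
```

Passing the planner, coder, and executor in as callables keeps the loop independent of any particular LLM provider or execution backend.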

Key Features

Sandboxed Execution

Safe, isolated code execution in Docker containers with resource monitoring.
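
As a rough illustration of isolated execution, the sketch below builds a `docker run` command that disables networking and caps memory and CPU. It is an assumption about how such a sandbox could be configured, not DSWizard's own runner: the helper names, the `python:3.11-slim` image, and the default limits are hypothetical. It also shows resource limits only, not monitoring.

```python
import shlex
import subprocess
from pathlib import Path
from typing import List


def build_docker_command(
    script_name: str,
    workdir: Path,
    image: str = "python:3.11-slim",  # assumed image, not from the source
    memory: str = "2g",               # assumed memory cap
    cpus: float = 1.0,                # assumed CPU cap
) -> List[str]:
    """Build a `docker run` command for a script located in `workdir`."""
    return [
        "docker", "run", "--rm",           # remove the container afterwards
        "--network", "none",               # no network access
        "--memory", memory,                # hard memory limit
        "--cpus", str(cpus),               # CPU quota
        "--volume", f"{workdir.resolve()}:/workspace",
        "--workdir", "/workspace",
        image,
        "python", script_name,
    ]


def run_in_sandbox(command: List[str], timeout_s: int = 600):
    """Run the command and capture its output (requires Docker)."""
    # The timeout only stops waiting on the docker client; a production
    # runner would also need to stop the container itself.
    return subprocess.run(
        command, capture_output=True, text=True, timeout=timeout_s
    )


command = build_docker_command("analysis.py", Path("workspace"))
print(shlex.join(command))
```

Building the command as a list instead of a shell string avoids quoting problems when paths contain spaces.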

Multi-LLM Support

Works with OpenAI, Azure OpenAI, and Anthropic Claude models.

PDF Reports

Generates professional reports with embedded visualizations and code.

Benchmark Results

DSWizard consistently outperforms other methods across biomedical data analysis tasks.

  • Easy (30 tasks): 90%
  • Medium (38 tasks): 76%
  • Hard (60 tasks): 48%

Easy Tasks

  • DSWizard: 90%
  • ManualPrompt: 63%
  • PlanPrompt: 63%
  • Few-shot: 60%
  • RAG: 60%
  • AutoPrompt: 50%
  • CoderAgent: 50%

Medium Tasks

  • DSWizard: 76%
  • PlanPrompt: 58%
  • CoderAgent: 45%
  • ManualPrompt: 37%
  • RAG: 34%
  • AutoPrompt: 32%
  • Few-shot: 29%

Hard Tasks

  • DSWizard: 55%
  • PlanPrompt: 48%
  • Few-shot: 13%
  • ManualPrompt: 13%
  • CoderAgent: 13%
  • RAG: 10%
  • AutoPrompt: 3%

DSWizard achieves state-of-the-art Pass@1 on Easy and Medium tasks, improving on the best baseline by 27 and 18 percentage points, respectively.
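
The two margins can be checked directly from the Easy and Medium figures listed above. The short sketch below assumes Pass@1 is read as the percentage of tasks solved on the first attempt; the function name is illustrative.

```python
from typing import Dict

# Pass@1 (%) copied from the Easy and Medium results above.
PASS_AT_1: Dict[str, Dict[str, int]] = {
    "Easy": {
        "DSWizard": 90, "ManualPrompt": 63, "PlanPrompt": 63,
        "Few-shot": 60, "RAG": 60, "AutoPrompt": 50, "CoderAgent": 50,
    },
    "Medium": {
        "DSWizard": 76, "PlanPrompt": 58, "CoderAgent": 45,
        "ManualPrompt": 37, "RAG": 34, "AutoPrompt": 32, "Few-shot": 29,
    },
}


def margin_over_best_baseline(
    scores: Dict[str, int], method: str = "DSWizard"
) -> int:
    """Difference, in percentage points, to the strongest other method."""
    best_baseline = max(v for name, v in scores.items() if name != method)
    return scores[method] - best_baseline


margins = {
    level: margin_over_best_baseline(scores)
    for level, scores in PASS_AT_1.items()
}
print(margins)  # {'Easy': 27, 'Medium': 18}
```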

Benchmark Datasets

Three comprehensive benchmarks for evaluating biomedical data science agents.

BioDSA-1K: Hypothesis Validation

1,029 real biomedical hypothesis tasks from published studies

Example Task
Validate whether TP53 mutation status correlates with overall survival in breast cancer patients using the provided clinical and genomic data.
Task Categories
Survival Analysis, Gene Expression, Clinical Features, Enrichment Analysis
39 cBioPortal studies · 1,029 tasks
HuggingFace

BioDSBench Dataset Overview

Comprehensive coverage of study types, analysis categories, and programming packages across 39 cBioPortal studies.

39 studies · 293 analyses · 7 study types · 8 analysis types

Study Types

38 studies
  • Biomarkers: 12
  • Integrative: 7
  • Molecular: 6
  • Genomics: 5
  • Therapeutic: 4
  • Translational: 2
  • Pan-cancer: 2

Analysis Types

498 analyses
  • Descriptive stats: 134
  • Gene expression: 125
  • Survival analysis: 56
  • Data integration: 50
  • Enrichment/pathway: 46
  • Clinical features: 41
  • Genomic alteration: 33
  • Treatment response: 13

Python Packages

14 studies · 128 analyses
  • pandas: 42.2%
  • matplotlib: 25.6%
  • lifelines: 15.7%
  • seaborn: 4.5%
  • Others: 12.1%

R Packages

25 studies · 165 analyses
  • ggplot2: 16.3%
  • ggrepel: 14%
  • tidyr: 12.7%
  • dplyr: 12.7%
  • clusterProfiler: 10%
  • org.Hs.eg.db: 10%
  • pheatmap: 5.9%
  • Others: 18.6%

Quick Start

python
import os

from biodsa.agents import DSWizardAgent

# Initialize the agent
agent = DSWizardAgent(
    model_name="gpt-5",
    api_type="openai",
    api_key=os.environ.get("OPENAI_API_KEY")
)

# Register a dataset for analysis
agent.register_workspace("./biomedical_data/cBioPortal/datasets/")

# Execute a data science task
results = agent.go(
    "Cluster patients based on genomic mutations to maximize "
    "separation of prognostic survival outcomes."
)

# Generate PDF report
results.to_pdf(output_dir="reports")

Citation

@article{wang2026reliable,
  title     = {Making large language models reliable data science programming copilots for biomedical research},
  author    = {Wang, Zifeng and Danek, Benjamin and Yang, Ziwei and Chen, Zheng and Sun, Jimeng},
  journal   = {Nature Biomedical Engineering},
  year      = {2026},
  doi       = {10.1038/s41551-025-01587-2},
}