Nature Biomedical Engineering

DSWizard

Reliable Biomedical Data Analysis Agent

DSWizard (Data Science Wizard) is a two-phase agent designed for reliable biomedical data analysis. It operates by first creating a detailed analysis plan in natural language, then converting that plan into executable Python code with automatic error handling and iterative refinement.

Two-Phase Architecture

Phase 1: Planning

•Explores available datasets and schemas
•Creates structured, step-by-step analysis plans
•Identifies quality control steps
•Iterates until plan is complete and unambiguous

Phase 2: Implementation

•Reviews plan and checks feasibility
•Generates complete Python code
•Executes in sandboxed Docker environment
•Returns results with artifacts and reports
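The two phases above can be sketched as a pair of bounded loops: one that refines the plan until nothing is ambiguous, and one that regenerates code while feeding execution errors back in. This is a minimal illustration, not DSWizard's actual implementation. `Plan`, `run_two_phase`, and the three callables (`draft_plan`, `write_code`, `execute`) are hypothetical names standing in for the LLM calls and the sandbox, and the round limits are arbitrary.

```python
from dataclasses import dataclass, field
from typing import Callable, List, Tuple


@dataclass
class Plan:
    """Hypothetical container for a natural-language analysis plan."""
    steps: List[str] = field(default_factory=list)
    open_questions: List[str] = field(default_factory=list)

    def is_complete(self) -> bool:
        # Complete means: at least one step and no unresolved ambiguity.
        return bool(self.steps) and not self.open_questions


def run_two_phase(
    task: str,
    draft_plan: Callable[[str, Plan], Plan],          # stand-in for the planning LLM call
    write_code: Callable[[Plan, str], str],           # stand-in for the coding LLM call
    execute: Callable[[str], Tuple[bool, str]],       # stand-in for the sandbox
    max_plan_rounds: int = 5,
    max_fix_rounds: int = 3,
) -> Tuple[Plan, str, str]:
    # Phase 1: refine the plan until it is complete and unambiguous.
    plan = Plan()
    for _ in range(max_plan_rounds):
        plan = draft_plan(task, plan)
        if plan.is_complete():
            break
    else:
        raise RuntimeError("plan did not converge")

    # Phase 2: generate code from the plan; on failure, pass the error back.
    error = ""
    for _ in range(max_fix_rounds):
        code = write_code(plan, error)
        ok, output = execute(code)
        if ok:
            return plan, code, output
        error = output
    raise RuntimeError(f"implementation failed: {error}")
```

Keeping the planner, coder, and executor as injected callables makes the control flow testable with plain stubs, without any model or container.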

Key Features

Sandboxed Execution

Safe, isolated code execution in Docker containers with resource monitoring.
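As a rough illustration of what isolated execution with resource limits can look like, the sketch below builds a `docker run` command that disables networking and caps memory and CPU, then runs it with a timeout. It assumes the Docker CLI is installed. The image name, mount point, script name, and limits are illustrative choices, not DSWizard's configuration, and `build_docker_command` and `run_sandboxed` are hypothetical helpers.

```python
import subprocess
from pathlib import Path
from typing import List


def build_docker_command(
    workdir: Path,
    script_name: str = "analysis.py",   # assumed name of the generated script
    image: str = "python:3.11-slim",    # assumed image, not DSWizard's
    memory: str = "2g",
    cpus: str = "1.0",
) -> List[str]:
    """Assemble a `docker run` invocation with no network and capped resources."""
    return [
        "docker", "run", "--rm",
        "--network", "none",            # no outbound network access
        "--memory", memory,             # hard memory limit
        "--cpus", cpus,                 # CPU quota
        "-v", f"{workdir.resolve()}:/workspace",
        "-w", "/workspace",
        image,
        "python", script_name,
    ]


def run_sandboxed(workdir: Path, timeout_s: int = 600) -> subprocess.CompletedProcess:
    """Run the script inside the container and capture stdout and stderr."""
    cmd = build_docker_command(workdir)
    return subprocess.run(cmd, capture_output=True, text=True, timeout=timeout_s)
```

The flags here only enforce limits. Actual resource monitoring, as described above, would need something extra, such as polling `docker stats`.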

Multi-LLM Support

Works with OpenAI, Azure OpenAI, and Anthropic Claude models.

PDF Reports

Generates professional reports with embedded visualizations and code.

Benchmark Results

DSWizard consistently outperforms other methods across biomedical data analysis tasks.

•Easy (30 tasks): 90%
•Medium (38 tasks): 76%
•Hard (60 tasks): 48%

Easy Tasks

•DSWizard: 90%
•ManualPrompt: 63%
•PlanPrompt: 63%
•Few-shot: 60%
•RAG: 60%
•AutoPrompt: 50%
•CoderAgent: 50%

Medium Tasks

•DSWizard: 76%
•PlanPrompt: 58%
•CoderAgent: 45%
•ManualPrompt: 37%
•RAG: 34%
•AutoPrompt: 32%
•Few-shot: 29%

Hard Tasks

•DSWizard: 55%
•PlanPrompt: 48%
•Few-shot: 13%
•ManualPrompt: 13%
•CoderAgent: 13%
•RAG: 10%
•AutoPrompt

DSWizard achieves state-of-the-art Pass@1 on Easy and Medium tasks, improving on the best baseline by 27 and 18 percentage points, respectively.
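The arithmetic behind these figures can be checked in a few lines. Pass@1 is taken here to mean the share of tasks solved by the first generated solution, and the margin is the gap in percentage points between DSWizard and the strongest baseline. The scores are copied from the Easy and Medium results above. The helper names are illustrative.

```python
from typing import Dict, Sequence


def pass_at_1(first_attempt_passed: Sequence[bool]) -> float:
    """Percentage of tasks solved by the first generated solution."""
    return 100.0 * sum(first_attempt_passed) / len(first_attempt_passed)


def margin_over_best_baseline(scores: Dict[str, float], method: str) -> float:
    """Gap, in percentage points, between `method` and the best other method."""
    best_baseline = max(v for k, v in scores.items() if k != method)
    return scores[method] - best_baseline


# Pass@1 scores (%) as listed in the Easy and Medium results.
easy = {"DSWizard": 90, "ManualPrompt": 63, "PlanPrompt": 63, "Few-shot": 60,
        "RAG": 60, "AutoPrompt": 50, "CoderAgent": 50}
medium = {"DSWizard": 76, "PlanPrompt": 58, "CoderAgent": 45, "ManualPrompt": 37,
          "RAG": 34, "AutoPrompt": 32, "Few-shot": 29}

print(margin_over_best_baseline(easy, "DSWizard"))    # 27
print(margin_over_best_baseline(medium, "DSWizard"))  # 18
```

For scale, 27 first-attempt passes out of 30 tasks corresponds to a Pass@1 of 90%.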

Benchmark Datasets

Three comprehensive benchmarks for evaluating biomedical data science agents.

BioDSA-1K: Hypothesis Validation

1,029 real biomedical hypothesis tasks from published studies

Example Task

Validate whether TP53 mutation status correlates with overall survival in breast cancer patients using the provided clinical and genomic data.
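A task like this usually reduces to comparing survival between two groups, for instance with a log-rank test. The sketch below is a standard-library implementation of the two-group log-rank statistic, written for illustration only. It is not code from the benchmark or from DSWizard, the function name is hypothetical, and the follow-up times are made-up toy values. In practice a survival library such as lifelines would be the natural choice.

```python
import math
from typing import Sequence, Tuple


def logrank_test(
    times_a: Sequence[float], events_a: Sequence[int],
    times_b: Sequence[float], events_b: Sequence[int],
) -> Tuple[float, float]:
    """Two-group log-rank test. Returns (chi-square statistic, p-value)."""
    samples = [(t, e, 0) for t, e in zip(times_a, events_a)]
    samples += [(t, e, 1) for t, e in zip(times_b, events_b)]
    event_times = sorted({t for t, e, _ in samples if e})

    observed_a = expected_a = variance = 0.0
    for t in event_times:
        # Subjects still under observation just before time t.
        at_risk_a = sum(1 for s, _, g in samples if s >= t and g == 0)
        at_risk_b = sum(1 for s, _, g in samples if s >= t and g == 1)
        deaths_a = sum(1 for s, e, g in samples if s == t and e and g == 0)
        deaths_b = sum(1 for s, e, g in samples if s == t and e and g == 1)
        n = at_risk_a + at_risk_b
        d = deaths_a + deaths_b
        observed_a += deaths_a
        expected_a += d * at_risk_a / n
        if n > 1:
            # Hypergeometric variance of the death count in group A.
            variance += d * (at_risk_a / n) * (at_risk_b / n) * (n - d) / (n - 1)

    if variance == 0:
        return 0.0, 1.0
    chi2 = (observed_a - expected_a) ** 2 / variance
    # Survival function of a chi-square distribution with 1 degree of freedom.
    p_value = math.erfc(math.sqrt(chi2 / 2))
    return chi2, p_value


# Toy follow-up times in months and event flags (1 = death observed); invented values.
mutant_times, mutant_events = [5, 8, 12, 14, 20], [1, 1, 1, 0, 1]
wildtype_times, wildtype_events = [15, 22, 30, 34, 40], [1, 0, 1, 0, 0]

chi2, p_value = logrank_test(mutant_times, mutant_events,
                             wildtype_times, wildtype_events)
print(f"chi2={chi2:.2f}, p={p_value:.4f}")
```

A small p-value would support a difference in overall survival between TP53-mutant and wild-type patients. A real analysis would also need to handle covariates and check the quality of the clinical data.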

Task Categories

•Survival Analysis
•Gene Expression
•Clinical Features
•Enrichment Analysis

39 cBioPortal studies · 1,029 tasks

BioDSBench Dataset Overview

Comprehensive coverage of study types, analysis categories, and programming packages across 39 cBioPortal studies.

293 analyses

Study Types

38 studies

•Biomarkers
•Integrative
•Molecular
•Genomics
•Therapeutic
•Translational
•Pan-cancer

Analysis Types

498 analyses

•Descriptive stats: 134
•Gene expression: 125
•Survival analysis
•Data integration
•Enrichment/pathway
•Clinical features
•Genomic alteration
•Treatment response

Python Packages

14 studies · 128 analyses

•pandas: 42.2%
•matplotlib: 25.6%
•lifelines: 15.7%
•seaborn: 4.5%
•Others: 12.1%

R Packages

25 studies · 165 analyses

•ggplot2: 16.3%
•ggrepel: 14%
•tidyr: 12.7%
•dplyr: 12.7%
•clusterProfiler: 10%
•org.Hs.eg.db: 10%
•pheatmap: 5.9%
•Others: 18.6%

Quick Start

python

import os

from biodsa.agents import DSWizardAgent

# Initialize the agent
agent = DSWizardAgent(
    model_name="gpt-5",
    api_type="openai",
    api_key=os.environ.get("OPENAI_API_KEY")
)

# Register a dataset for analysis
agent.register_workspace("./biomedical_data/cBioPortal/datasets/")

# Execute a data science task
results = agent.go(
    "Cluster patients based on genomic mutations to maximize "
    "separation of prognostic survival outcomes."
)

# Generate PDF report
results.to_pdf(output_dir="reports")

Citation

@article{wang2026reliable,
  title     = {Making large language models reliable data science programming copilots for biomedical research},
  author    = {Wang, Zifeng and Danek, Benjamin and Yang, Ziwei and Chen, Zheng and Sun, Jimeng},
  journal   = {Nature Biomedical Engineering},
  year      = {2026},
  doi       = {10.1038/s41551-025-01587-2},
}