arXiv Preprint

DeepEvidence

Deep Knowledge Graph Research Agent

DeepEvidence is a hierarchical multi-agent system for comprehensive biomedical literature research and evidence synthesis. It leverages deep knowledge graph exploration to systematically gather, analyze, and synthesize evidence from multiple biomedical knowledge bases.

Multi-Agent Architecture

DeepEvidence Architecture

Orchestrator Agent

Coordinates research strategy, decides which knowledge bases to explore, and synthesizes findings.

BFRS Agent

Breadth-First ReSearch (BFRS) of knowledge graphs to discover related concepts and broad connections.

DFRS Agent

Depth-First ReSearch (DFRS) of specific knowledge paths to extract detailed information.
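
The difference between the two sub-agents' traversal styles can be illustrated on a toy graph. This is a minimal sketch of plain breadth-first and depth-first traversal, not the DeepEvidence implementation; the graph, node names, and the `max_depth` parameter are hypothetical and chosen only for illustration.

```python
from collections import deque

# Toy knowledge graph (hypothetical entities, for illustration only).
GRAPH = {
    "disease": ["gene_A", "gene_B"],
    "gene_A": ["pathway_X", "drug_1"],
    "gene_B": ["pathway_Y"],
    "pathway_X": ["paper_1"],
    "pathway_Y": [],
    "drug_1": [],
    "paper_1": [],
}


def breadth_first(graph, start, max_depth):
    """Visit neighbors level by level, up to max_depth hops from start."""
    seen, order = {start}, [start]
    queue = deque([(start, 0)])
    while queue:
        node, depth = queue.popleft()
        if depth == max_depth:
            continue  # do not expand beyond the hop budget
        for nxt in graph.get(node, []):
            if nxt not in seen:
                seen.add(nxt)
                order.append(nxt)
                queue.append((nxt, depth + 1))
    return order


def depth_first(graph, start):
    """Follow one path to its end before backtracking."""
    seen, order = set(), []
    stack = [start]
    while stack:
        node = stack.pop()
        if node in seen:
            continue
        seen.add(node)
        order.append(node)
        # Push in reverse so the first-listed neighbor is explored first.
        stack.extend(reversed(graph.get(node, [])))
    return order


print(breadth_first(GRAPH, "disease", max_depth=1))
print(depth_first(GRAPH, "disease"))
```

With a one-hop budget the breadth-first pass surfaces both genes without going deeper, while the depth-first pass follows `gene_A` through its pathway to the supporting paper before returning to `gene_B`.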

Unified Knowledge Graph APIs

DeepEvidence integrates 15+ biomedical APIs across multiple domains for comprehensive research.

BioDSA Unified Knowledge Bases

Literature & Publications

PubMed, PubTator, UMLS

Genes & Proteins

MyGene, UniProt, Gene Ontology, MyVariants

Drugs & Chemicals

PubChem, MyChem, OpenFDA

Diseases & Phenotypes

MyDisease, OpenTargets

Pathways & Reactions

Reactome, KEGG

Clinical Data

ClinicalTrials.gov
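
The catalog above can be written down as a simple lookup table. This is only an illustrative sketch: the dictionary and the `sources_for` helper are hypothetical and are not part of the BioDSA API, and the keys accepted by the `knowledge_bases` argument in the Quick Start may be named differently.

```python
# Domain -> sources, copied from the catalog on this page.
# The structure itself is hypothetical, not a BioDSA configuration object.
KNOWLEDGE_BASES = {
    "Literature & Publications": ["PubMed", "PubTator", "UMLS"],
    "Genes & Proteins": ["MyGene", "UniProt", "Gene Ontology", "MyVariants"],
    "Drugs & Chemicals": ["PubChem", "MyChem", "OpenFDA"],
    "Diseases & Phenotypes": ["MyDisease", "OpenTargets"],
    "Pathways & Reactions": ["Reactome", "KEGG"],
    "Clinical Data": ["ClinicalTrials.gov"],
}


def sources_for(domain):
    """Return the sources listed for a domain (empty list if unknown)."""
    return KNOWLEDGE_BASES.get(domain, [])


total_sources = sum(len(sources) for sources in KNOWLEDGE_BASES.values())
print(total_sources)  # 15 sources across 6 domains
```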

Evidence Graph

DeepEvidence builds a persistent knowledge graph during research that captures entities (papers, genes, diseases, drugs) and their relationships.

  • Accumulates knowledge across search rounds
  • Enables retrieval of previously discovered information
  • Supports iterative refinement of research questions
  • Exports to interactive HTML/PDF visualizations
python
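# `results` is assumed to be the object returned by agent.go(...) (see Quick Start)

# Export the evidence graph as an interactive HTML file
results.export_evidence_graph_html("evidence_graph.html")

# Access the discovered entities and relations
entities = results.evidence_graph_data["entities"]
relations = results.evidence_graph_data["relations"]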
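
How a graph can accumulate knowledge across search rounds is sketched below. This is a minimal stand-in written for illustration, not the DeepEvidence data model: the `EvidenceGraph` class, its fields, the identifier format, and the relation triples are all assumptions, and the entity names are placeholders.

```python
from dataclasses import dataclass, field


@dataclass
class EvidenceGraph:
    """Hypothetical in-memory evidence graph (not the BioDSA schema)."""
    entities: dict = field(default_factory=dict)  # id -> attribute dict
    relations: set = field(default_factory=set)   # (source, relation, target)

    def add_round(self, entities, relations):
        """Merge one search round; return (new entities, new relations)."""
        new_entities = 0
        for ent_id, attrs in entities.items():
            if ent_id not in self.entities:
                new_entities += 1
                self.entities[ent_id] = dict(attrs)
            else:
                # Already known: enrich the stored attributes.
                self.entities[ent_id].update(attrs)
        before = len(self.relations)
        self.relations.update(relations)
        return new_entities, len(self.relations) - before

    def neighbors(self, ent_id):
        """Retrieve targets of previously discovered outgoing relations."""
        return sorted(t for s, _, t in self.relations if s == ent_id)


graph = EvidenceGraph()
round1 = graph.add_round(
    {"gene:G1": {"type": "gene"}, "disease:D1": {"type": "disease"}},
    {("gene:G1", "associated_with", "disease:D1")},
)
round2 = graph.add_round(
    {"gene:G1": {"type": "gene"}, "drug:X": {"type": "drug"}},
    {("drug:X", "targets", "gene:G1"),
     ("gene:G1", "associated_with", "disease:D1")},
)
print(round1, round2)  # (2, 1) (1, 1)
```

The second round rediscovers `gene:G1` and its disease link, so only the drug and its targeting relation count as new. Deduplicating on identifiers is what lets later rounds build on earlier ones instead of repeating them.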

Evidence Graph Exploration

Interactive visualization showing how DeepEvidence iteratively builds a knowledge graph through systematic exploration. Use the controls to step through the 6 phases of target identification.


Benchmark Results

DeepEvidence scores highest among the compared systems on all four biomedical research benchmarks below, with the largest margin on HLE-Medicine.

  • HLE-Medicine: 40% (about 12× GPT-5's 3.3%)
  • LabBench-LitQA2: 80% (about 1.7× GPT-5's 48%)
  • SuperGPQA-Hard: 47% (about 1.6× GPT-5's 29.4%)
  • TrialPanorama: 96% (about 1.3× GPT-5's 72%)

HLE-Medicine

Hard medicine questions

  • DeepEvidence: 40%
  • Biomni: 20%
  • Sonnet-4.5: 10%
  • ToolUniverse: 6.7%
  • GPT-5: 3.3%

LabBench-LitQA2

Literature QA

  • DeepEvidence: 80%
  • GPT-5: 48%
  • Sonnet-4.5: 40%
  • Biomni: 32%
  • ToolUniverse: 24%

SuperGPQA-Hard

Expert medicine questions

  • DeepEvidence: 47.1%
  • Sonnet-4.5: 43.6%
  • Biomni: 40.7%
  • ToolUniverse: 37.2%
  • GPT-5: 29.4%

TrialPanorama

Evidence synthesis

  • DeepEvidence: 96%
  • Sonnet-4.5: 88%
  • Biomni: 84%
  • ToolUniverse: 73.7%
  • GPT-5: 72%
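
The headline multipliers relative to GPT-5 follow directly from the per-benchmark scores above, as the ratio of the DeepEvidence score to the GPT-5 score. The short check below uses only numbers reported on this page; the helper name `ratio_vs_baseline` is introduced here for illustration.

```python
# Scores (percent) as reported on this page.
SCORES = {
    "HLE-Medicine": {"DeepEvidence": 40.0, "GPT-5": 3.3},
    "LabBench-LitQA2": {"DeepEvidence": 80.0, "GPT-5": 48.0},
    "SuperGPQA-Hard": {"DeepEvidence": 47.1, "GPT-5": 29.4},
    "TrialPanorama": {"DeepEvidence": 96.0, "GPT-5": 72.0},
}


def ratio_vs_baseline(scores, system="DeepEvidence", baseline="GPT-5"):
    """Score ratio of `system` over `baseline`, rounded to one decimal."""
    return {
        bench: round(s[system] / s[baseline], 1)
        for bench, s in scores.items()
    }


ratios = ratio_vs_baseline(SCORES)
for bench, ratio in ratios.items():
    print(f"{bench}: {ratio}x")
```

The ratios come out to 12.1, 1.7, 1.6, and 1.3, matching the rounded headline figures.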

DeepEvidence Benchmark

Seven knowledge-graph-driven deep research task types spanning the biomedical discovery pipeline.

Target Identification

25 tasks

Identify therapeutic targets for diseases by integrating gene-disease associations and pathway evidence.

MoA Pathway Reasoning

25 tasks

Multi-hop mechanistic reasoning to explain molecular perturbation propagation through pathways.

Metabolic Flux Response

25 tasks

Predict metabolic flux suppression in pre-clinical models based on pathway dependencies.

Drug Regimen Design

25 tasks

Design drug dosing regimens considering pharmacological and clinical factors.

Surrogate Endpoint

14 tasks

Identify plausible surrogate endpoints that reflect downstream clinical outcomes.

Sample Size Estimation

25 tasks

Estimate clinical trial sample sizes under design assumptions and outcome constraints.

Evidence Gap Discovery

20 tasks

Identify missing, weak, or conflicting evidence across biomedical knowledge sources.

  • 159 total tasks
  • 7 task types
  • 15+ APIs integrated
  • Multi-hop reasoning required
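
The task total can be checked against the per-type counts listed above. The snippet is a plain arithmetic check over the numbers on this page; the dictionary is written here for illustration and is not a file shipped with the benchmark.

```python
# Per-type task counts as listed on this page.
TASK_COUNTS = {
    "Target Identification": 25,
    "MoA Pathway Reasoning": 25,
    "Metabolic Flux Response": 25,
    "Drug Regimen Design": 25,
    "Surrogate Endpoint": 14,
    "Sample Size Estimation": 25,
    "Evidence Gap Discovery": 20,
}

total_tasks = sum(TASK_COUNTS.values())
print(len(TASK_COUNTS), "task types,", total_tasks, "tasks")  # 7 task types, 159 tasks
```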

Example Discovery Tasks

DeepEvidence tackles complex biomedical research questions requiring multi-hop reasoning across knowledge bases.

Target Identification

Therapeutic target discovery for inflammatory diseases

User Instruction

Which genes are most promising to be effective therapeutic targets for modulating the inflammatory response in Ulcerative Colitis?

Agent Execution Plan

Identify candidate gene targets using disease-gene association databases (OpenTargets, DisGeNET). Filter for genes involved in inflammatory pathways (NF-κB, JAK-STAT, IL-6 signaling). Prioritize druggable targets with existing small molecule or antibody modulators. Cross-reference with clinical trial data for IBD therapeutics.
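
The plan above amounts to a filter-then-rank pipeline, which can be sketched on toy data. Everything in this example is an assumption made for illustration: the `Candidate` fields, the `min_score` threshold, the ranking order, and the placeholder genes are hypothetical, and the real agent obtains this information through knowledge base queries rather than a hard-coded list.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Candidate:
    gene: str
    association_score: float  # disease-gene association strength, 0..1
    pathways: frozenset       # pathways the gene participates in
    druggable: bool           # has a small-molecule or antibody modulator
    in_trials: bool           # appears in trials of IBD therapeutics


INFLAMMATORY_PATHWAYS = frozenset({"NF-kB", "JAK-STAT", "IL-6 signaling"})


def prioritize(candidates, min_score=0.5):
    """Filter by association and pathway, then rank by actionability."""
    shortlisted = [
        c for c in candidates
        if c.association_score >= min_score
        and c.pathways & INFLAMMATORY_PATHWAYS
    ]
    # Druggable targets first, then trial evidence, then association score.
    return sorted(
        shortlisted,
        key=lambda c: (c.druggable, c.in_trials, c.association_score),
        reverse=True,
    )


# Placeholder candidates; names and values are made up for the sketch.
candidates = [
    Candidate("GENE_A", 0.90, frozenset({"JAK-STAT"}), True, True),
    Candidate("GENE_B", 0.80, frozenset({"NF-kB"}), True, False),
    Candidate("GENE_C", 0.95, frozenset({"Wnt"}), True, True),
    Candidate("GENE_D", 0.70, frozenset({"IL-6 signaling"}), False, False),
    Candidate("GENE_E", 0.30, frozenset({"JAK-STAT"}), True, True),
]

ranked = [c.gene for c in prioritize(candidates)]
print(ranked)  # ['GENE_A', 'GENE_B', 'GENE_D']
```

`GENE_C` is dropped because it is not on an inflammatory pathway and `GENE_E` because its association score is below the threshold, even though both are druggable and in trials.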

Quick Start

python
import os

from biodsa.agents import DeepEvidenceAgent

# Initialize the agent
agent = DeepEvidenceAgent(
    model_name="gpt-5",
    api_type="azure",
    api_key=os.environ.get("AZURE_OPENAI_API_KEY"),
    endpoint=os.environ.get("AZURE_OPENAI_ENDPOINT"),
    model_kwargs={
        "max_completion_tokens": 5000,
        "reasoning_effort": "minimal",
    },
    subagent_action_rounds_budget=5,  # action rounds for sub research agents
    main_search_rounds_budget=2,      # search rounds for main orchestrator
    main_action_rounds_budget=15,     # action rounds for main orchestrator
    light_mode=False,                  # use full memory graph
    llm_timeout=120,
)

# Run the agent
execution_results = agent.go(
    "Summarize the cutting-edge immunotherapy drugs that are in late-phase "
    "clinical trials or have been approved for NSCLC.",
    knowledge_bases=[
        "pubmed_papers", "clinical_trials", "drug", "disease"
    ]
)

Citation

@article{wang2025deepevidence,
  title   = {DeepEvidence: Empowering Biomedical Discovery with Deep Knowledge Graph Research},
  author  = {Wang, Zifeng and Chen, Zheng and Yang, Ziwei and Wang, Xuan and Jin, Qiao and Peng, Yifan and Lu, Zhiyong and Sun, Jimeng},
  journal = {arXiv preprint},
  year    = {2025}
}