
RESEARCH REPORT

State of Clinical AI Report 2026

An annual synthesis of the most significant developments, evidence, and emerging challenges in clinical AI.


The State of Clinical AI Report is the inaugural annual synthesis of the most significant developments, evidence, and emerging challenges in clinical AI. It offers a comprehensive, carefully curated view of where the field meaningfully advanced this year, spanning model performance, evaluation, workflows, and patient-facing tools, while also highlighting the gaps that remain.

Produced by ARISE (AI Research and Science Evaluation), a Stanford-Harvard Research Network, the report aims to make the landscape easier to navigate, support responsible adoption, and offer a shared reference point for clinicians, researchers, and health leaders as the field continues to evolve.

Supported by

Stanford Computational Medicine
Harvard Medical School Shapiro Institute
Beth Israel Deaconess Medical Center
Stanford Medicine
Harvard Medical School Blavatnik Institute
Stanford University

Key Themes

Takeaways from the report

01

Models made major leaps in actionable prediction and autonomous clinical reasoning

However, the "jagged frontier" exists. While some models show superhuman capabilities on controlled tasks, brittleness remains in identifying their own uncertainty.

02

Benchmarks and Evaluation: Real-world task evaluation is a prerequisite for trustworthy clinical AI

As traditional QA benchmarks saturate, evaluation needs to shift toward multi-turn, unstructured, real-world data; the real-world consequences of model errors; and administrative and workflow tasks.

03

Foundational Methods: Shifting from better models to better systems

Tokenized medical events, multi-agent orchestration, multimodal models, and reasoning fine-tuning are enabling advances in disease prediction and diagnosis, but they can also introduce system-level design trade-offs (a toy sketch of event tokenization follows these themes).

04

AI in Clinical Workflows: Human–computer workflow design is as important as model capability

While humans + AI often outperform humans alone, there is substantial room for improvement in workflow design and failure-mode training to optimize success while mitigating automation bias and deskilling.

05

Patient-Facing AI: A new landscape for patient engagement

Measurable outcome improvements and safeguards against harm should be prioritized across use cases such as history taking, coaching, and translation. Patients cannot be assumed to play any oversight role.

06

Applied AI and Demos: The time for context-specific prospective trials is now

Research on model capability is dense, with studies across multiple medical specialties routinely showing incremental, task-specific improvements; randomized prospective trials have already commenced and should form the next wave of evidence in 2026.
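
To make the "tokenized medical events" idea from theme 03 concrete, below is a minimal sketch of turning a patient's coded event timeline into a token sequence a sequence model could consume. The event codes, toy vocabulary, and names such as encode_timeline are illustrative assumptions, not taken from the report or any specific system.

    # Minimal sketch: turning a patient's timeline of coded clinical events
    # into a token sequence for a sequence model. The event codes, vocabulary,
    # and names such as encode_timeline are illustrative assumptions, not
    # taken from the report or any specific system.

    from dataclasses import dataclass

    @dataclass
    class Event:
        time: int  # days since first visit (toy convention)
        code: str  # diagnosis/lab/medication code (toy values)

    # Toy vocabulary mapping event codes and special tokens to integer ids.
    VOCAB = {"[PAD]": 0, "[VISIT]": 1, "DX:E11.9": 2, "LAB:HBA1C_HIGH": 3, "RX:METFORMIN": 4}

    def encode_timeline(events: list[Event], max_len: int = 8) -> list[int]:
        """Sort events by time, insert a [VISIT] token at each new day,
        map codes to ids, and pad or truncate to a fixed length."""
        tokens, last_day = [], None
        for ev in sorted(events, key=lambda e: e.time):
            if ev.time != last_day:
                tokens.append(VOCAB["[VISIT]"])
                last_day = ev.time
            # Unknown codes fall back to [PAD] in this toy version.
            tokens.append(VOCAB.get(ev.code, VOCAB["[PAD]"]))
        return (tokens + [VOCAB["[PAD]"]] * max_len)[:max_len]

    timeline = [Event(0, "DX:E11.9"), Event(0, "LAB:HBA1C_HIGH"), Event(30, "RX:METFORMIN")]
    print(encode_timeline(timeline))  # [1, 2, 3, 1, 4, 0, 0, 0]

The design choice this illustrates is that visit boundaries, not just events, carry signal: the same codes grouped into one visit versus spread across many can mean different clinical trajectories.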

What is needed in 2026

  • Evaluate models in prospective and post-deployment real-world scenarios to yield evidence-based medicine

  • Prioritize human–computer interaction design in clinical decision support trials as much as primary outcomes

  • Innovate human–AI and agentic AI workflows to reduce clinical and administrative burden

  • Measure uncertainty, bias, and harm explicitly, especially for patient-facing AI

  • Pursue claim-level grounding and verification of reasoning traces; measuring support, not fluency, will build user trust (a toy sketch follows this list)
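
As one way to make "measuring support, not fluency" concrete, here is a toy sketch that splits a reasoning trace into sentence-level claims and scores each against supplied evidence. A real verifier would use an entailment/NLI model; simple token overlap is a stand-in assumption here, and all names are illustrative, not from the report.

    # Toy sketch of claim-level support checking: split a model's reasoning
    # trace into sentence-level claims and score each against the evidence.
    # A real verifier would use an entailment/NLI model; token overlap is a
    # stand-in assumption here, and all names are illustrative.

    import re

    def claims(trace: str) -> list[str]:
        """Naively split a reasoning trace into sentence-level claims."""
        return [s.strip() for s in re.split(r"(?<=[.!?])\s+", trace) if s.strip()]

    def support_score(claim: str, evidence: list[str]) -> float:
        """Best token-overlap (Jaccard) between the claim and any evidence snippet."""
        c = set(re.findall(r"\w+", claim.lower()))
        best = 0.0
        for snippet in evidence:
            e = set(re.findall(r"\w+", snippet.lower()))
            if c | e:
                best = max(best, len(c & e) / len(c | e))
        return best

    def grounding_report(trace: str, evidence: list[str], threshold: float = 0.3):
        """Score each claim and return the fraction judged supported."""
        scored = [(cl, support_score(cl, evidence)) for cl in claims(trace)]
        supported = sum(s >= threshold for _, s in scored)
        return scored, supported / max(len(scored), 1)

    trace = ("HbA1c is 9.1 percent, above goal. Metformin 500 mg was started. "
             "The patient is allergic to penicillin.")
    evidence = ["Lab result: HbA1c 9.1 percent (goal under 7).",
                "Medication order: metformin 500 mg."]
    scored, frac = grounding_report(trace, evidence)
    for cl, s in scored:
        print(f"{s:.2f}  {cl}")
    print(f"supported fraction: {frac:.2f}")

In this example the penicillin-allergy claim scores zero because no evidence mentions it, even though the sentence itself is perfectly fluent, which is exactly the distinction between support and fluency the recommendation calls for.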

Authors

Peter Brodeur, Ethan Goh, Adam Rodman, Jonathan Chen

Acknowledgements

We would like to thank the following reviewers and designers for generously providing feedback: Emily Tat, Liam McCoy, David Wu, Priyank Jain, Rebecca Handler, Jason Hom, Laura Zwaan, Vishnu Ravi, Brian Han, Kevin Schulman, Kathleen Lacar, Kameron Black, Adi Badhwar, Adrian Haimovich, Eric Horvitz

Join us in shaping the future of healthcare with AI
