ARISE

MAST

Medical AI Superintelligence Test

Welcome to the MAST project. We're a multi-institutional collaboration independently evaluating medical AI. We curate robust, realistic benchmarks in areas including clinical reasoning, safety, and medical images, so you can make informed decisions about the health AI you use.

Which AI can you trust for medical questions?

We measure performance for general medical AI usage as well as clinician-focused usage. Last updated June 15, 2026.

The latest large and small model from each provider, ranked on diagnostic and management reasoning, safety, radiology, and medical images.

Rank  Model             Provider     Score
1     GPT-5.5           OpenAI       62.1%
2     Claude Opus 4.7   Anthropic    59.2%
3     Gemini 3.1 Pro    Google       58.0%
4     Gemini 3.5 Flash  Google       58.0%
5     Kimi K2.5         Moonshot AI  55.6%
6     Grok 4 Fast       xAI          54.5%
7     Grok 4            xAI          54.2%
8     GPT-5.4 mini      OpenAI       54.1%

Compare models

How two models compare across all benchmark dimensions.

Dimension  Gap
Diagnostic Reasoning
Management Reasoning
Safety
Multimodal Images
Multimodal Radiology
Agentic

Looking for more technical information?

The interactive technical leaderboard hosts the full dataset that powers MAST, with granular benchmark results for researchers, developers, organizations, and labs.

Researchers & Developers

View the public codebase, explore datasets, and run your own models.