MAST
The Medical AI Superintelligence Test is an independent effort run by the ARISE AI Research Network to curate robust and realistic clinical benchmarks to measure the performance of medical AI. MAST exists to ensure that AI entering healthcare is rigorously tested, independently validated, and held to the highest clinical standards before it reaches patients.

MAST's mission is to establish an open, evidence-based evaluation framework that holds medical AI to the highest clinical standards, ensuring that deployed systems help rather than harm patients. We believe rigorous, independent benchmarking is the foundation of safe AI adoption in healthcare.
MAST is developed by a multidisciplinary team of clinicians, AI researchers, biostatisticians, and medical educators from ARISE, an independent academic collaborative spanning Stanford Medicine, Harvard Medical School, and partner institutions.
The MAST Steering Committee provides strategic direction, approves new evaluation domains, and sets the weighting methodology for the composite score.
The annotation workforce consists of board-certified physicians across many medical specialties who undergo standardized training on scoring rubrics before participating in evaluations. Technical infrastructure is maintained by a dedicated engineering team responsible for the evaluation pipeline, data security, and leaderboard operations.
MAST is developed by ARISE, an independent academic research network. MAST may accept external funding, but any external funds support the general development of benchmarks, human baselines, and model testing, and are never directed toward any particular model, evaluation, or outcome, to prevent conflicts of interest.
Our evaluation schedule, scoring rubrics, and publication timeline are determined by the MAST Steering Committee. Model providers are notified of results only after scoring is finalized, and they have no opportunity to influence or preview findings before publication.
For most benchmarks, MAST open-sources at least 20% of the evaluation set and, where possible, keeps the remainder as a private held-out set to prevent overfitting.
MAST team members disclose the following funding and conflicts:
We have accepted token credits from the following companies to run benchmark inference. All judging costs are paid from MAST general funds, so no evaluated company pays for the evaluation that scores it:


ARISE maintains conflict of interest policies to protect the integrity of MAST evaluations. The following principles apply to team members and collaborators involved in the benchmark process:
During evaluation, model providers submit their systems through our controlled API pipeline. MAST does not share benchmark cases with model providers before or after evaluation. All evaluation data is processed in a secure environment, and model outputs are stored only for the duration needed to complete scoring.
De-identified clinical cases used in the benchmark contain no patient-identifiable information. Model providers' API keys and system configurations are handled under standard data protection protocols and are not retained after evaluation completion.
For questions about MAST, our methodology, or our transparency practices, email contact@arise-ai.org.