Resources

Tools, code, and documentation for developers and researchers working with MAST.

GitHub Repository

Clone the repository, run evaluations, open issues, and submit pull requests. All tools, datasets, and documentation below are hosted on GitHub.

View on GitHub

Evaluation Harness

The MAST evaluation harness provides a standardized framework for running benchmarks against AI models. Configure your model endpoint and generate reproducible results.

View Setup Guide

Datasets

Access the de-identified clinical datasets used in the MAST benchmark suite. All datasets are IRB-approved and formatted for direct use in evaluation pipelines.

Browse Datasets

API and Integration

Integrate MAST evaluations into your development pipelines, model training workflows, or internal quality dashboards. The API supports batch evaluation, webhook notifications, and structured result export.

View API Docs
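As a rough illustration of what batch evaluation with structured result export can look like on the client side, here is a minimal sketch using only the Python standard library. It does not call the MAST API: the helper names (`batched`, `placeholder_evaluate`, `export_results`) and the field names (`case_id`, `score`) are hypothetical and chosen for illustration only. Consult the API docs for the actual endpoints and schema.

```python
import json
from typing import Iterator


def batched(items: list[dict], size: int) -> Iterator[list[dict]]:
    """Yield consecutive chunks of at most `size` items."""
    if size < 1:
        raise ValueError("size must be at least 1")
    for start in range(0, len(items), size):
        yield items[start:start + size]


def placeholder_evaluate(batch: list[dict]) -> list[dict]:
    """Stand-in for a real evaluation call (hypothetical).

    A real client would send `batch` to the evaluation service here.
    This stub only echoes each case ID with an empty score.
    """
    return [{"case_id": case["case_id"], "score": None} for case in batch]


def export_results(results: list[dict]) -> str:
    """Serialize results as JSON with sorted keys.

    Sorting keys keeps repeated exports of the same data byte-identical,
    which makes result files easier to diff and archive.
    """
    payload = {"count": len(results), "results": results}
    return json.dumps(payload, sort_keys=True, indent=2)


# Hypothetical input: five evaluation cases, processed two at a time.
cases = [{"case_id": i} for i in range(5)]
all_results: list[dict] = []
for batch in batched(cases, size=2):
    all_results.extend(placeholder_evaluate(batch))

report = export_results(all_results)
```

Chunking on the client keeps each request small and lets a failed batch be retried without rerunning the whole evaluation. The batch size of 2 is arbitrary.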

Submitting a Benchmark

We welcome new benchmark submissions. Each submission must include a peer-reviewed or preprint manuscript, a publicly accessible dataset, and reproducible evaluation code. Results should be generated using the official MAST evaluation harness. The MAST Steering Committee reviews submissions for clinical relevance, methodological rigor, and reproducibility. Accepted benchmarks are integrated into the composite score on a quarterly cycle.

Submission guidelines on GitHub

Submitting a Model for Evaluation

If you represent an AI company and want your model independently evaluated for the MAST leaderboard, follow the submission instructions on GitHub. For questions, contact the MAST team.

Submit model