MAST
Tools, code, and documentation for developers and researchers working with MAST.
Clone the repository, run evaluations, open issues, and submit pull requests. All tools, datasets, and documentation below are hosted on GitHub.
View on GitHub
The MAST evaluation harness provides a standardized framework for running benchmarks against AI models. Configure your model endpoint and generate reproducible results.
View Setup Guide
Access the de-identified clinical datasets used in the MAST benchmark suite. All datasets are IRB-approved and formatted for direct use in evaluation pipelines.
Browse Datasets
Integrate MAST evaluations into your development pipelines, model training workflows, or internal quality dashboards. The API supports batch evaluation, webhook notifications, and structured result export.
View API Docs
We welcome new benchmark submissions. All submissions must include a peer-reviewed or pre-print manuscript, a publicly accessible dataset, and reproducible evaluation code. Results should be generated using the official MAST evaluation harness. Submissions are reviewed by the MAST Steering Committee for clinical relevance, methodological rigor, and reproducibility. Accepted benchmarks are integrated into the composite score on a quarterly cycle.
Submission guidelines on GitHub
If you are an AI company and want your model independently evaluated for the MAST leaderboard, please follow the submission instructions on GitHub. For any questions, get in touch with the team.
Submit model