MAST
Tools, datasets, and guides to help you evaluate and benchmark medical AI systems using the MAST framework.
Access curated clinical datasets used in the MAST benchmark suite. All datasets are de-identified, IRB-approved, and formatted for direct use in evaluation pipelines.
The MAST evaluation harness provides a standardized framework for running benchmarks against medical AI models. Clone the repository, configure your model endpoint, and generate reproducible evaluation results.
Curated case sets spanning multiple medical specialties, designed for benchmarking clinical reasoning, diagnostic accuracy, and safety. Each case is authored and validated by board-certified physicians.
Integrate MAST evaluations into your CI/CD pipelines, model training workflows, or internal quality dashboards. Our API supports batch evaluation, webhook notifications, and structured result export.