Explore our publications and preprints advancing healthcare through rigorous AI evaluation.
As capabilities of artificial intelligence (AI) advance rapidly, human understanding of these systems is increasingly falling behind. Several trends are […]
Differential diagnosis is an iterative process that integrates patient information with broader medical knowledge. Clinical case series such as the […]
Real-world clinical practice is inherently multimodal, relying on the synthesis of patient history with visual information such as medical imagery […]
Large language models (LLMs) have been rapidly adopted for their potential to reduce clinician documentation burden and assist with clinical […]
Artificial intelligence (AI) is rapidly reshaping healthcare and the competencies expected of graduating medical students, yet AI curricula and competency […]
Large language models (LLMs) are increasingly positioned as general-purpose medical systems, with demonstrated potential in diagnosis and management reasoning for […]
High-quality discharge summaries are essential for safe care transitions but contribute substantially to clinician documentation burden and burnout. While retrospective […]
We introduce PhysicianBench, a benchmark for evaluating LLM agents on physician tasks grounded in real clinical settings within electronic health […]
More than 65 years ago, complex clinical diagnostic reasoning cases were introduced as the gold standard for the evaluation of […]
Large language model generative artificial intelligence (AI) systems have opened Pandora’s box, beating human benchmarks across a range of tasks. […]