Poster Abstract: Comparative Performance of Agentic AI and Physicians in Clinical History Taking Across Leading LLMs

Sonu Subudhi, Instructor, Massachusetts General Hospital

Abstract

Introduction: Comprehensive clinical history taking is essential for diagnostic reasoning, triage, and treatment planning, yet is often constrained by time pressure and documentation burden in outpatient care. We hypothesized that large language models (LLMs), when guided by a structured agentic framework, can reliably collect clinically meaningful patient histories. We developed a modular, iterative prompting system that sequentially traverses standard history domains, evaluates relevance and completeness of patient responses, generates targeted follow-up questions, and determines when sufficient information has been obtained. The system produces an EHR-ready clinical summary alongside diagnostic impressions, identification of dangerous conditions to rule out, and recommended investigations.
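The iterative loop described above can be illustrated with a minimal sketch. It is not the authors' implementation: the domain list, the function names, the follow-up cap, and the rule-based stand-ins for the LLM's sufficiency judgment and follow-up generation are all assumptions made for illustration.

```python
from dataclasses import dataclass, field
from typing import Callable, Dict, List

# Hypothetical domain list; the abstract says only "standard history domains".
DOMAINS = [
    "chief complaint",
    "history of present illness",
    "past medical history",
    "medications and allergies",
]


@dataclass
class DomainRecord:
    """Answers gathered for one history domain."""
    domain: str
    answers: List[str] = field(default_factory=list)


def collect_history(
    ask_patient: Callable[[str], str],
    is_sufficient: Callable[[str, List[str]], bool],
    next_question: Callable[[str, List[str]], str],
    max_followups: int = 3,
) -> Dict[str, DomainRecord]:
    """Traverse domains in order, asking follow-ups until each is sufficient.

    In a real system `is_sufficient` and `next_question` would be LLM calls;
    here they are injected so the control flow can be run without a model.
    """
    history: Dict[str, DomainRecord] = {}
    for domain in DOMAINS:
        record = DomainRecord(domain)
        question = f"Tell me about your {domain}."
        # One opening question plus at most `max_followups` follow-ups.
        for _ in range(max_followups + 1):
            record.answers.append(ask_patient(question))
            if is_sufficient(domain, record.answers):
                break
            question = next_question(domain, record.answers)
        history[domain] = record
    return history


# Rule-based stand-ins (assumptions) for the model's judgments.
def simple_sufficiency(domain: str, answers: List[str]) -> bool:
    # Treat an answer of four or more words as "complete enough".
    return len(answers[-1].split()) >= 4


def simple_followup(domain: str, answers: List[str]) -> str:
    return f"Can you say more about your {domain}?"
```

Injecting the three callables keeps the traversal logic separate from any particular model, which is one plausible way to benchmark several LLMs against the same loop.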

Methods: We implemented this framework in a patient-facing web application and benchmarked three LLMs (GPT-4o, Gemini-2.5-Flash-Lite, and Grok-3) on simulated patient interactions derived from 52 published clinical case reports spanning 13 medical specialties, together with 20 constructed clinical scenarios designed to reflect common outpatient presentations. Three blinded physicians independently evaluated chatbot-generated histories against gold-standard references using a predefined rubric.

Results: Across models, relevant history elements were captured with >85% accuracy, and recommended investigations aligned with those used to establish the final diagnoses. Section-wise analysis showed balanced performance across major history components, with no systematic underperformance in any domain. Clinically important red flags were identified consistently across models.
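A section-wise comparison against gold-standard references could be computed along the following lines. The abstract does not describe the rubric, so the set-overlap scoring, the function names, and the section keys below are assumptions for illustration only, not the physicians' actual scoring method.

```python
from typing import Dict, Set


def section_capture_rates(
    gold: Dict[str, Set[str]], captured: Dict[str, Set[str]]
) -> Dict[str, float]:
    """Fraction of gold-standard elements captured, per history section."""
    rates: Dict[str, float] = {}
    for section, gold_items in gold.items():
        if not gold_items:
            continue  # skip sections with nothing to capture
        hits = gold_items & captured.get(section, set())
        rates[section] = len(hits) / len(gold_items)
    return rates


def overall_capture_rate(
    gold: Dict[str, Set[str]], captured: Dict[str, Set[str]]
) -> float:
    """Fraction of all gold-standard elements captured across sections."""
    total = sum(len(items) for items in gold.values())
    hits = sum(
        len(items & captured.get(section, set()))
        for section, items in gold.items()
    )
    return hits / total if total else 0.0
```

Elements the chatbot records that are absent from the reference do not raise the score here; a rubric that also penalizes irrelevant content would need an additional precision-style term.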

Conclusion: These findings demonstrate that a structured agentic framework substantially stabilizes LLM performance for clinical history collection. Automated, structured pre-visit intake may reduce documentation burden, improve completeness of patient information, and support safer clinical workflows, motivating prospective evaluation in real-world clinical settings.