An LLM Pipeline to Elevate the Clinician Voice: Lessons From an Inpatient AI Chart Summarization Pilot

Stephen P. Ma, MD, PhD¹, April S. Liang, MD¹, Olivia Aparicio-Kratz, RN², Jonathan Colston, MD¹, Matthew A. Eisenberg, MD²; ¹Stanford University School of Medicine; ²Stanford Health Care

Meeting: SHM Converge 2026

Abstract Number: 38

Session Type: Finalist

Category: Innovations

Sub-Category: Technology in Hospital Medicine

Keywords: Finalist

Background: Health systems are rapidly piloting Large Language Model (LLM)-enabled tools within the Electronic Health Record (EHR) [1–4], but there is limited guidance on how to evaluate these tools during early deployment [5]. Qualitative clinician feedback is essential for understanding real-world usability, identifying safety concerns, and guiding iterative improvement of novel technologies [6]. Traditional qualitative research methods for synthesizing narrative feedback are resource-intensive and difficult to scale [7], limiting their usefulness for rapid iteration and governance.

Purpose: We developed and applied a mixed-methods evaluation framework while piloting a novel inpatient artificial intelligence (AI) chart review tool, combining structured ratings with an LLM-assisted pipeline for high-throughput quantitative thematic analysis of free-text clinician feedback.

Description: The EHR-integrated inpatient AI chart review tool was piloted with a small group of clinicians from 7/4/2025 through 9/5/2025. Structured and narrative feedback was solicited for all generated summaries. Free-text feedback was analyzed using a custom LLM pipeline that first segmented narrative comments into discrete units and then applied mixed deductive/inductive labeling and sentiment classification. High-level feedback categories (summative, task-specific, and feature-specific) were defined by the implementation team, while low-level codes were generated inductively via LLM labeling using prompts refined through iterative physician review. An open-source, locally hosted LLM (gemma-3-27b) was used for accessibility and generalizability.

Twenty-six clinicians participated in the pilot (21 physicians/advanced practice providers, 3 nurses, and 2 case managers). Participants generated 783 summaries across three summary types. Of these, 172 summaries received structured feedback, with 107 (62%) labeled “helpful without errors.” When errors were present, the most frequent category was inaccurate/omitted information (31/65 = 48%). Free-text feedback was provided for 124 summaries, which were segmented by the LLM into 331 discrete segments (mean 1.9 per summary). A total of 44 inductive codes were generated across the 3 high-level categories: 8 summative, 19 task-specific, and 17 feature-specific. Example labels and analysis are provided in Table 1 and Figure 1 for segments classified as summative feedback.

Sentiment regarding overall utility was predominantly positive (24/37 labels = 65%), consistent with structured ratings. Task- and feature-specific codes highlighted strengths in care team summarization (3/6 positive labels = 50%) and the citation feature (7/11 positive labels = 64%), alongside clear opportunities for improvement in tasks such as lab summarization (21/26 negative labels = 81%) and features such as date timestamps (6/6 negative labels = 100%).
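The segment-then-label flow described above can be illustrated with a minimal Python sketch. This is not the authors' implementation: the abstract does not publish their prompts or code, so the prompt wording, the function names (`segment_comment`, `label_segment`, `run_pipeline`, `sentiment_by_code`), and the `toy_llm` stub are all hypothetical. The sketch assumes the model is reachable through a plain callable that takes a prompt string and returns JSON text; in practice that callable might wrap a locally hosted model such as gemma-3-27b.

```python
import json
from collections import Counter, defaultdict
from dataclasses import dataclass
from typing import Callable, List

# Deductive high-level categories named in the abstract.
CATEGORIES = ("summative", "task-specific", "feature-specific")

# Hypothetical model interface: prompt text in, JSON text out.
LLM = Callable[[str], str]


@dataclass
class LabeledSegment:
    text: str
    category: str   # one of CATEGORIES (deductive)
    code: str       # free-form low-level code (inductive)
    sentiment: str  # e.g. "positive" or "negative"


def segment_comment(comment: str, llm: LLM) -> List[str]:
    """Step 1: ask the model to split one comment into discrete units."""
    prompt = "Split into discrete feedback units; return a JSON list.\n" + comment
    return [s.strip() for s in json.loads(llm(prompt)) if s.strip()]


def label_segment(segment: str, llm: LLM) -> LabeledSegment:
    """Step 2: assign a category, an inductive code, and a sentiment."""
    prompt = (
        "Label this feedback. Return JSON with keys category, code, sentiment. "
        f"category must be one of {list(CATEGORIES)}.\n" + segment
    )
    out = json.loads(llm(prompt))
    if out["category"] not in CATEGORIES:
        raise ValueError(f"unexpected category: {out['category']}")
    return LabeledSegment(segment, out["category"], out["code"], out["sentiment"])


def run_pipeline(comments: List[str], llm: LLM) -> List[LabeledSegment]:
    """Segment every comment, then label every resulting segment."""
    return [label_segment(s, llm) for c in comments for s in segment_comment(c, llm)]


def sentiment_by_code(labeled: List[LabeledSegment]) -> dict:
    """Tally sentiment labels per low-level code for quantitative summary."""
    counts = defaultdict(Counter)
    for seg in labeled:
        counts[seg.code][seg.sentiment] += 1
    return counts


def toy_llm(prompt: str) -> str:
    """Deterministic stand-in (assumption) so the sketch runs without a model."""
    body = prompt.split("\n", 1)[1]
    if prompt.startswith("Split"):
        return json.dumps([p for p in body.split(". ") if p])
    is_lab = "lab" in body.lower()
    negative = "wrong" in body.lower()
    return json.dumps({
        "category": "task-specific" if is_lab else "summative",
        "code": "lab summarization" if is_lab else "overall utility",
        "sentiment": "negative" if negative else "positive",
    })


if __name__ == "__main__":
    demo = run_pipeline(["Very helpful overall. Lab values were wrong"], toy_llm)
    for code, tally in sentiment_by_code(demo).items():
        print(code, dict(tally))
```

Keeping the model behind a single callable is a design choice of this sketch: it lets the segmentation and labeling prompts be revised independently, which mirrors the iterative prompt refinement the abstract describes, and it allows a deterministic stub to stand in for the model during testing.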

Conclusions: A mixed-methods framework anchored by LLM-assisted thematic analysis can transform large volumes of clinician free-text feedback into structured, quantitative signals. For the inpatient AI chart review tool, this approach identified both strong perceived utility and specific, actionable areas for enhancement, aligning with and extending structured rating data. More broadly, LLM-assisted thematic analysis offers a scalable approach for health systems to transform narrative clinician feedback into actionable insights that can guide iterative improvement and responsible evaluation of novel technologies and processes.

IMAGE 1: Table 1: Examples of Labeled Clinician Feedback Segments Categorized as Summative Feedback

IMAGE 2: Figure 1: Visualization of Labeled Clinician Feedback Segments Categorized as Summative Feedback

Posted by webshmorg on April 30th, 2026

To cite this abstract:

Stephen P. Ma, MD, PhD¹, April S. Liang, MD¹, Olivia Aparicio-Kratz, RN², Jonathan Colston, MD¹, Matthew A. Eisenberg, MD².

AN LLM PIPELINE TO ELEVATE THE CLINICIAN VOICE: LESSONS FROM AN INPATIENT AI CHART SUMMARIZATION PILOT.

Abstract published at SHM Converge 2026.

Abstract 38

2026 Converge - Posters.

https://shmabstracts.org/abstract/an-llm-pipeline-to-elevate-the-clinician-voice-lessons-from-an-inpatient-ai-chart-summarization-pilot-2/.

May 7th 2026.
