Background: Advance care planning (ACP) conversations are essential for aligning medical decisions with patient goals and values. However, these conversations vary widely in structure and quality, and documentation in the electronic health record (EHR) often emphasizes treatment decisions rather than the underlying values that should guide them [1]. Structured approaches to ACP conversations elicit patient values and can enhance patient-centeredness, but their consistent adoption in practice remains limited. Here we develop a clinician-validated large language model (LLM) pipeline to evaluate the patient-centeredness of ACP documentation for hospitalized patients and to compare structured-template notes with narrative notes.
Methods: We conducted a retrospective chart review of ACP notes from adult patients hospitalized at a tertiary academic medical center between 2011 and 2025, accessed via a de-identified research database. All notes from the baseline period (before September 2018) were unstructured narrative notes. A structured note template for documenting ACP conversations was introduced in September 2018, with voluntary adoption during the intervention period. Notes limited to code status, durable power of attorney, advance directives, or procedural consents were excluded. Patient-centeredness was evaluated using the Advance Care Planning Communication Assessment Tool (ACP-CAT) [2], which three trained human reviewers iteratively refined into a 9-item binary rubric (0 or 1 per item, maximum score 9). The rubric was applied to a validation set of 52 randomly sampled notes. Because multiple LLMs performed similarly on this validation set, an open-source Qwen-based LLM was selected for downstream analysis to maximize generalizability and accessibility. We randomly sampled 1000 notes from each of the baseline and intervention periods for LLM scoring. We performed univariate analyses to compare mean scores across periods and between structured and narrative notes.
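As an illustration of the scoring scheme described in the Methods, the sketch below shows one way a 9-item binary rubric could be totaled, how a consensus majority-vote label could be derived from three reviewers, and how a scorer's item-level accuracy against that consensus could be computed. The function names and the toy labels are hypothetical and are not taken from the study; the actual rubric items and pipeline are not specified here.

```python
N_ITEMS = 9  # the rubric has 9 binary items, so the maximum total score is 9


def total_score(item_flags):
    """Sum the binary item flags for one note (0 to 9)."""
    if len(item_flags) != N_ITEMS or any(f not in (0, 1) for f in item_flags):
        raise ValueError("expected 9 binary item flags")
    return sum(item_flags)


def majority_vote(labels_per_reviewer):
    """Item-wise majority label across reviewers (assumes an odd number of reviewers)."""
    return [
        1 if 2 * sum(item_labels) > len(item_labels) else 0
        for item_labels in zip(*labels_per_reviewer)
    ]


def item_accuracy(predicted, consensus):
    """Fraction of rubric items on which a scorer matches the consensus label."""
    matches = sum(p == c for p, c in zip(predicted, consensus))
    return matches / len(consensus)


# Toy labels for a single note (made up for illustration only).
reviewers = [
    [1, 1, 0, 1, 0, 0, 1, 1, 0],
    [1, 0, 0, 1, 0, 1, 1, 1, 0],
    [1, 1, 0, 0, 0, 0, 1, 1, 1],
]
llm_labels = [1, 1, 1, 1, 0, 1, 1, 1, 0]

consensus = majority_vote(reviewers)
print(consensus, total_score(consensus), total_score(llm_labels))
print(round(item_accuracy(llm_labels, consensus), 2))
```

In this toy case the hypothetical LLM labels total higher than the consensus labels, which is the kind of over-scoring that an item-level comparison against the majority vote would surface.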
Results: The validation set contained 22 structured notes and 30 narrative notes. Inter-rater agreement between human reviewers was high across all rubric items (mean pairwise agreement 0.83-0.96), with accuracy >=94% for all reviewers compared to the consensus majority vote label. Mean scores were higher for structured notes than for narrative notes (6.29 vs. 4.45). In contrast, the Qwen-based LLM reached 80% accuracy against the consensus majority vote label and assigned higher scores on average than the human reviewers (7.05 vs. 4.97). Discrepancies between the LLM and the human reviewers were largest for documentation of fears/worries/concerns and of critical functions acceptable for quality of life. After exclusions, 483 and 488 notes from the baseline and intervention periods, respectively, were scored. There was a small but statistically significant increase in the average score after structured note templates were introduced (5.05 +/- 1.70 vs. 5.28 +/- 1.77, p = 0.04). Within the intervention period, the 84 (17%) structured notes scored substantially higher than the 404 (83%) narrative notes (6.42 +/- 1.67 vs. 5.05 +/- 1.70, p < 0.001).
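The reported group comparisons can be approximately checked from the summary statistics alone. The sketch below assumes that the +/- values are standard deviations and uses an unequal-variance (Welch-style) comparison with a normal approximation, which is reasonable at these sample sizes. The abstract does not state which univariate test was actually used, so this is an illustrative check rather than a reproduction of the analysis.

```python
import math


def two_sample_z_from_summary(mean_a, sd_a, n_a, mean_b, sd_b, n_b):
    """Unequal-variance comparison of two means from summary statistics.

    Returns the test statistic and a two-sided p-value based on the
    normal approximation (assumed adequate for large samples).
    """
    standard_error = math.sqrt(sd_a**2 / n_a + sd_b**2 / n_b)
    z = (mean_b - mean_a) / standard_error
    p_two_sided = math.erfc(abs(z) / math.sqrt(2))
    return z, p_two_sided


# Baseline vs. intervention period (values from the Results).
z_period, p_period = two_sample_z_from_summary(5.05, 1.70, 483, 5.28, 1.77, 488)

# Narrative vs. structured notes within the intervention period.
z_template, p_template = two_sample_z_from_summary(5.05, 1.70, 404, 6.42, 1.67, 84)

print(f"period comparison:   z = {z_period:.2f}, p = {p_period:.3f}")
print(f"template comparison: z = {z_template:.2f}, p = {p_template:.1e}")
```

Under these assumptions, the period comparison gives a p-value close to the reported 0.04 and the template comparison gives a p-value well below 0.001, consistent with the reported results.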
Conclusions: Structured template-based documentation was associated with higher patient-centeredness scores than narrative documentation. Our clinician-validated LLM framework for quantifying patient-centeredness in ACP notes provides a scalable means to rigorously evaluate structured templates, offering actionable insights to guide clinician training and system-level improvement.