Background: Frailty is linked to poor outcomes in older patients, especially those with multiple conditions. The electronic frailty index (eFI) is a validated tool based on the cumulative deficit model used to screen hospitalized patients at risk for poor outcomes.1 The eFI relies on 35 ICD-10 codes associated with encounters across four domains (morbidity, sensory, cognitive, functional).2 However, some diagnoses, such as gait abnormality in the functional domain, are often missing,3 potentially underestimating eFI scores. Physical therapist (PT) and case management (CM) notes typically include functional status assessments in unstructured text, which could identify these missing diagnosis codes. This study aims to determine whether prompting Large Language Models (LLMs) like ChatGPT to identify functional status diagnosis codes from PT and CM assessments improves eFI calculation in older adults with multiple conditions likely to be frail.
Methods: We conducted a retrospective analysis of hospitalized patients (>51 years) from two studies (Table 1, footnote). We collected demographic information, ICD-10 diagnosis codes from ambulatory encounters in the 24 months before the hospital encounter, PT and CM notes from the hospital encounter, and 90-day post-discharge data (readmission, disposition, mortality) from the EHR (Epic Systems, Inc.). We prompted GPT-3.5 and GPT-4 in Azure AI Studio (Microsoft, Inc.) to identify functional diagnoses from unstructured PT and CM documentation (Figure 1). We calculated eFIs using only ambulatory encounter diagnoses, and then recalculated them with the GPT-identified functional diagnoses added. Descriptive statistics were used to report demographic characteristics and post-discharge events for the overall cohort and frail cases (eFI > 0.2). We compared the frailty rates identified by the unenhanced eFI and the GPT-enhanced eFIs using bivariate statistics, and analyzed post-discharge events (rehospitalization, institutionalization, death) within 90 days for frail and non-frail cases using multivariable analyses.
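The cumulative-deficit eFI calculation described above can be sketched as follows. This is an illustrative sketch only: the ICD-10 prefix mapping and deficit names are assumptions for demonstration, not the study's actual 35-code list, and the real eFI implementation may differ.

```python
# Hedged sketch of a cumulative-deficit eFI, per the abstract's description:
# eFI = (deficits present) / (35 deficits considered); frail if eFI > 0.2.
# The deficit-to-code mapping below is hypothetical, not the study's list.

TOTAL_DEFICITS = 35        # number of deficits in the eFI per the abstract
FRAILTY_THRESHOLD = 0.2    # eFI > 0.2 flags a case as frail

# Illustrative ICD-10 prefix mapping (assumed examples from each domain)
DEFICIT_CODE_PREFIXES = {
    "gait_abnormality": ("R26",),    # functional domain
    "hearing_loss": ("H90", "H91"),  # sensory domain
    "dementia": ("F03",),            # cognitive domain
    "heart_failure": ("I50",),       # morbidity domain
    # ... remaining deficits omitted in this sketch
}

def efi_score(icd10_codes: set[str]) -> float:
    """Fraction of deficits present among all deficits considered."""
    present = sum(
        any(code.startswith(p) for code in icd10_codes for p in prefixes)
        for prefixes in DEFICIT_CODE_PREFIXES.values()
    )
    return present / TOTAL_DEFICITS

# Unenhanced eFI uses ambulatory encounter codes only; the enhanced eFI
# adds functional codes the LLM extracted from PT/CM notes.
ambulatory = {"I50.9", "F03.90"}
llm_extracted = {"R26.2"}  # e.g., a gait abnormality found in a PT note

unenhanced = efi_score(ambulatory)
enhanced = efi_score(ambulatory | llm_extracted)
```

In this toy example the enhanced score rises because the LLM-extracted gait code counts as an additional deficit, mirroring how missing functional diagnoses would otherwise underestimate the eFI.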
Results: The characteristics of the 616 included cases (Table 1a) and frail cases (unenhanced, n=91; GPT-3.5, n=323; GPT-4, n=247) were similar. In our cohort, 320 cases (51.9%) had at least one post-discharge event (Table 1b) within 90 days after hospitalization. An example of the prompt, a truncated PT note, and GPT output are shown in Figure 1. More frail cases were identified using LLM-enhanced eFIs (GPT-3.5: 52.4% vs. 14.8%, p<0.01; GPT-4: 40.1% vs. 14.8%, p<0.01) compared to the unenhanced eFI. Post-discharge events were significantly more frequent in frail cases compared to non-frail cases for all eFIs (unenhanced: 17.8% vs. 11.5%, p=0.03; GPT-3.5: 58.8% vs. 35.5%, p<0.01; GPT-4: 61.1% vs. 32.4%, p<0.01). Adjusted analyses showed frailty was associated with at least one post-discharge event for each eFI (unenhanced: OR 1.13 [1.01, 1.26], p=0.04; GPT-3.5: OR 1.16 [1.07, 1.26], p<0.01; GPT-4: OR 1.15 [1.06, 1.25], p<0.01).
Conclusions: In older hospitalized patients, LLM-enhanced eFIs identified significantly more frail cases by detecting functional deficits in unstructured PT and CM documentation. This approach may improve care planning and facilitate comprehensive frailty screening for older adults as required by CMS's Age-Friendly Hospital Measure. Future research should validate these findings in prospective studies with larger, more diverse populations.

