Background: Individuals with diabetes have a 2- to 3-fold higher hospitalization rate than those without diabetes. During hospitalization, individuals with diabetes frequently experience elevated blood glucose levels (hyperglycemia) (1), which can increase monitoring by nurses and hospitalists, length of stay, and healthcare costs. We therefore sought to develop a machine learning model to predict hyperglycemia in hospitalized individuals with diabetes and to identify risk factors for hyperglycemia in this population to guide interventions and optimize resource utilization.
Methods: We conducted a retrospective analysis of individuals with diabetes hospitalized at an institution’s 19 hospitals across the United States from July 2017 to March 2024. The dataset comprised 179,495 admissions to adult medical and surgical services, of which 121,262 had blood glucose readings. The outcome was hyperglycemia, defined as ≥1 blood glucose level exceeding 180 mg/dL. We extracted individual-level data from the electronic health record and used 33 features (e.g., demographics, comorbidities, medications, laboratory data) for model development. We defined the rurality of patients’ residence using Rural-Urban Commuting Area (RUCA) codes. We estimated the temporal trend (i.e., slope) of laboratory data (e.g., hemoglobin A1c, creatinine) using linear mixed-effects models. We trained an Extreme Gradient Boosting (XGBoost) model with a random 80/20 train-test split, applying the Synthetic Minority Oversampling Technique (SMOTE) (2) to address outcome imbalance (75% of patients with hyperglycemia; 25% without). We used SHapley Additive exPlanations (SHAP) (3) to enhance model interpretability, with higher SHAP values indicating a larger contribution to the model’s predictions. Analyses were completed using the Python programming language on our institution’s Google Cloud Platform (4).
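The oversampling step described in Methods can be illustrated with a minimal, dependency-free sketch of the interpolation idea behind SMOTE. This is not the study's code: the function name `smote_sketch`, the two toy features, and all values are hypothetical, and a library implementation (for example, `SMOTE` from imbalanced-learn) would normally be used instead. The abstract does not state whether oversampling was applied before or after the train-test split; the sketch assumes it is applied to the training split only, so that no synthetic rows reach the test set.

```python
import math
import random


def smote_sketch(minority, n_new, k=5, seed=0):
    """Create n_new synthetic rows by interpolating between minority-class neighbours."""
    rng = random.Random(seed)
    synthetic = []
    for _ in range(n_new):
        base = rng.choice(minority)
        # The k nearest minority rows to the chosen row (excluding the row itself).
        others = [row for row in minority if row is not base]
        neighbours = sorted(others, key=lambda row: math.dist(base, row))[:k]
        neighbour = rng.choice(neighbours)
        gap = rng.random()  # position along the segment from base to neighbour
        synthetic.append([b + gap * (n - b) for b, n in zip(base, neighbour)])
    return synthetic


# Toy training split with two hypothetical numeric features per admission.
# Label 1 = hyperglycemia (majority, 75 rows); label 0 = none (minority, 25 rows).
rng = random.Random(42)
X_train = [[rng.gauss(8.0, 1.0), rng.gauss(1.2, 0.3)] for _ in range(75)]
y_train = [1] * 75
X_train += [[rng.gauss(6.5, 0.5), rng.gauss(0.9, 0.2)] for _ in range(25)]
y_train += [0] * 25

# Oversample the minority class until both classes are the same size.
minority = [x for x, y in zip(X_train, y_train) if y == 0]
n_needed = y_train.count(1) - y_train.count(0)
X_balanced = X_train + smote_sketch(minority, n_needed)
y_balanced = y_train + [0] * n_needed
```

Because each synthetic row lies on a line segment between two real minority rows, the added rows stay inside the region already occupied by the minority class rather than duplicating existing rows exactly.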
Results: The cohort of 121,262 admissions had a mean (± SD) age of 64.3 ± 14.6 years; 56.9% were men (n = 69,050), 87.9% were of white race (n = 106,666), and 63.3% resided in urban areas (n = 76,732). Among these admissions, 74.7% had ≥1 hyperglycemia episode. The model achieved an accuracy of 0.78, a specificity of 0.84, and a sensitivity of 0.57 for hyperglycemia (Table). The top 5 predictors of hyperglycemia were hemoglobin A1c trend (SHAP score: 0.59), hospital steroid use (SHAP score: 0.32), home long-acting insulin (SHAP score: 0.26), home short-acting insulin (SHAP score: 0.22), and creatinine trend (SHAP score: 0.20). In contrast, an increasing creatinine trend, a higher diabetes composite score, and residing in an urban area were associated with fewer hyperglycemic episodes (Figure).
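For readers less familiar with the reported metrics, the sketch below shows how accuracy, sensitivity, and specificity follow from a confusion matrix. The counts are illustrative only and are not taken from the study; the helper name `classification_metrics` is hypothetical, and the sketch assumes hyperglycemia is treated as the positive class, which the abstract does not state explicitly.

```python
def classification_metrics(tp, fp, tn, fn):
    """Compute accuracy, sensitivity, and specificity from confusion-matrix counts."""
    total = tp + fp + tn + fn
    return {
        "accuracy": (tp + tn) / total,    # share of all predictions that are correct
        "sensitivity": tp / (tp + fn),    # share of true positives that are detected
        "specificity": tn / (tn + fp),    # share of true negatives that are detected
    }


# Illustrative counts only (not study data).
metrics = classification_metrics(tp=60, fp=10, tn=90, fn=40)
```

With an imbalanced outcome, accuracy is a prevalence-weighted average of sensitivity and specificity, so the three values are best read together rather than in isolation.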
Conclusions: In this multi-site analysis of hospitalized adults with diabetes, we developed a model that predicts hyperglycemia with an accuracy of 0.78. Factors associated with hyperglycemia, such as sociodemographic factors (e.g., residing in a rural area) and hospital factors (e.g., steroid use, admission at night, admission on a weekend), can guide interventions to improve patient monitoring and resource utilization. To our knowledge, this is the largest multi-site analysis of hospital hyperglycemia. Future work will evaluate the association between hyperglycemia and outcomes, including mortality and hospital readmission.

