Background:

The ability to accurately identify diabetic patients from electronic data can be a critical component of quality improvement (QI), performance measurement, and research applications. Poor case-finding definitions may produce unreliable reports and inaccurate information.

Multiple methods have been described for automated diabetes case-finding. The development of case-finding definitions can be challenging and is dependent on the ability to evaluate the accuracy of the data. We sought to develop an in-depth understanding of the benefits and risks of three different methodologies to identify patients with diabetes.

Methods:

We applied three automated strategies: two identified from the literature and a novel approach refined from previous local QI efforts. The three strategies are as follows:

1) One occurrence of an ICD-9 code for diabetes at any time.

2) A diabetes-related prescription in the prior year and/or two or more diabetes-related ICD-9 codes from patient visits in the preceding two years.

3) An ICD-9 code for diabetes, or at least two separate fills for a diabetes medication, or two separate HbA1c values >= 6.5%.

Each definition was applied to all patients in the study sample and produced a binary result of diabetes/no diabetes. The gold standard for analysis was based on objective criteria in the 2010 American Diabetes Association definition. Case validation was completed by chart review on a purposeful random sample.
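The three definitions above can be expressed as simple predicates over a patient's coded diagnoses, medication fills, and laboratory values. The sketch below is illustrative only: the record layout, field names, and the use of ICD-9 prefix 250 for diabetes are assumptions, not the study's actual data schema.

```python
from datetime import date, timedelta

# Hypothetical index date for the lookback windows.
TODAY = date(2010, 1, 1)

def strategy_1(patient):
    """Definition 1: one diabetes ICD-9 code (250.xx) at any time."""
    return any(code.startswith("250") for code, _ in patient["icd9_codes"])

def strategy_2(patient):
    """Definition 2: a diabetes-related prescription in the prior year,
    and/or >= 2 diabetes ICD-9 codes from visits in the prior two years."""
    rx_recent = any(TODAY - d <= timedelta(days=365)
                    for d in patient["diabetes_rx_dates"])
    recent_codes = sum(
        1 for code, d in patient["icd9_codes"]
        if code.startswith("250") and TODAY - d <= timedelta(days=730)
    )
    return rx_recent or recent_codes >= 2

def strategy_3(patient):
    """Definition 3: any diabetes ICD-9 code, or >= 2 separate medication
    fills, or >= 2 HbA1c values >= 6.5%."""
    has_code = any(code.startswith("250") for code, _ in patient["icd9_codes"])
    two_fills = len(patient["diabetes_rx_dates"]) >= 2
    two_a1c = sum(1 for v in patient["hba1c_values"] if v >= 6.5) >= 2
    return has_code or two_fills or two_a1c
```

Each function returns the binary diabetes/no-diabetes result that the abstract describes, so the three strategies can be compared patient by patient against the gold standard.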

Results:

Our initial sample included 245,754 patients who had vital signs, clinic visits, or medication refills within the preceding 396 days. We excluded patients who were non-veteran employees, had a documented date of death, or had not had either an HbA1c or a blood glucose drawn within the preceding 396 days. Our analysis was applied to the 143,177 remaining patients.

Sensitivities of the three diabetes definition strategies were 1) 89.84%, 2) 87.70%, and 3) 90.27%. Specificities were 1) 91.29%, 2) 94.39%, and 3) 94.69%. Correlation coefficients between methods were as follows: 1&2 = 0.8843, 1&3 = 0.8910, and 2&3 = 0.9182.
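Given each strategy's binary results and the chart-review gold standard, the reported measures reduce to counts in a 2x2 table. A minimal sketch (function names and the list-of-booleans input format are illustrative, not from the study):

```python
def sensitivity(predicted, gold):
    """True-positive rate: flagged patients among gold-standard diabetics."""
    tp = sum(1 for p, g in zip(predicted, gold) if p and g)
    fn = sum(1 for p, g in zip(predicted, gold) if not p and g)
    return tp / (tp + fn)

def specificity(predicted, gold):
    """True-negative rate: unflagged patients among gold-standard non-diabetics."""
    tn = sum(1 for p, g in zip(predicted, gold) if not p and not g)
    fp = sum(1 for p, g in zip(predicted, gold) if p and not g)
    return tn / (tn + fp)

def phi(a, b):
    """Pearson correlation of two binary vectors (the phi coefficient),
    usable for the between-method correlations."""
    n11 = sum(1 for x, y in zip(a, b) if x and y)
    n10 = sum(1 for x, y in zip(a, b) if x and not y)
    n01 = sum(1 for x, y in zip(a, b) if not x and y)
    n00 = len(a) - n11 - n10 - n01
    num = n11 * n00 - n10 * n01
    den = ((n11 + n10) * (n01 + n00) * (n11 + n01) * (n10 + n00)) ** 0.5
    return num / den
```

Whether the abstract's between-method coefficients are phi coefficients or another correlation statistic is not stated; the phi coefficient shown here is one standard choice for paired binary classifications.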

Conclusions:

Using three different automated definitions to identify diabetic patients resulted in similar, though not identical, sensitivities and specificities. A disease definition that incorporated laboratory data in addition to pharmacy and coding data was superior. All three strategies appear to be acceptable methods for case-finding based on the data and resources available at a local institution.