Background: The evolving COVID-19 pandemic has raised clinical questions about direct patient care that require rapid answers and flexibility in data generation and analysis. Thorough and reliable patient-level data are not available at the local, state, national, or international levels. Institutional efforts to produce datasets derived from electronic health records (EHR) can take months to years and are limited in the reliability of non-discrete data (e.g., presenting symptoms recorded in History & Physical text) and of discrete data other than labs and vitals (e.g., the “active problem list”).

Purpose: We sought to create a novel COVID learning group to rapidly create a flexible, iterative, and human-verified dataset to answer clinical questions.

Description: The COVID learning group identified the need for thoughtfully curated data to drive what was anticipated to be rapidly evolving, best-available evidence-based medicine (EBM) for COVID-19 care decisions. We adopted a rapid-cycle, quality improvement (QI)-based Plan-Do-Study-Act framework for group development. A biostatistician was engaged to ensure a well-designed dataset that allows robust statistical analysis and iterative enhancements to address future questions. We used a series of “quick hit” Zoom meetings, each under thirty minutes, to identify needs, available resources, and collaborators. We selected COVID variables through a quasi-systematic review of the COVID-19 literature and an analysis of front-line clinical questions specific to Thomas Jefferson University Hospital (TJUH). Team members carried out independent trial runs of data extraction and hypothesis generation to refine efficiency and find further variables and clinical questions of interest; minimal structure was provided to promote broad idea generation. The group created an executive committee to improve the efficiency of the manual chart review and computer-aided extraction sub-teams. We prioritized communication with colleagues at TJUH to avoid duplicating work and to find opportunities for dataset collaboration. Data collection began in earnest within one month of our first meeting.

Conclusions: To our knowledge, we have created the largest single-institution, United States-based COVID-19 dataset that uses reliable methodology to ensure data validity; at present, it contains 700+ patients. The group’s approach has educated team members in methodologically sound QI, EBM, and patient-privacy-adherent techniques while searching for COVID-19 clinical answers. The dataset has been used to validate data gathered by Woo et al. (2020) for a web-based COVID-19 severity risk calculator, currently under peer review for publication. Multiple teams at TJUH are using the dataset to answer pressing front-line clinical questions in near real time (e.g., identification of low-risk patients, deterioration risk, efficacy of off-label therapies, and validation of external models); analysis and database enhancements will continue as clinical needs dictate. We describe the successful use of QI, EBM, organizational leadership, and team-building strategies to create an iteratively updated and enhanced novel COVID dataset to answer near-real-time clinical questions.