Background: We created a generative artificial intelligence chatbot (“VLRChat”) that draws on a comprehensive curricular resource, the 480-page Vanderbilt Housestaff Handbook (“VIMBook”), to answer point-of-care clinical questions posed by hospitalists and inpatient clinical teams. Time constraints and cognitive overload frequently prevent hospitalist faculty and learners from finding the optimal, most current answers to clinical questions; as many as 50% of clinical questions go unanswered (3,4). Large language models (LLMs) have demonstrated remarkable capabilities in complex problem solving, reasoning, and document synthesis for answering clinical questions, but few studies have evaluated them on authentic clinical questions or integrated them into hospitalist workflows. We present the aims, design, and preliminary human evaluation of VLRChat, and share results from analysis of an initial test dataset of 180 clinical question-answer pairs.
Purpose: VLRChat aims to provide trustworthy, reliable, and accurate responses to point-of-care clinical questions. By integrating content from our own internal medicine residency’s handbook (VIMBook), VLRChat provides systems-based educational decision support at Vanderbilt and the Nashville VA. The handbook, updated and reviewed annually by dozens of residents and faculty, averages 55,000 visits to its website per month (1,2).
Description: VLRChat unites several components in an architecture that pairs a large language model framework with retrieval from a curated medical resource (Figure 1). In sequence: (1) the user enters a text query; (2) the tool retrieves relevant context from a vector database built from a digitized version of VIMBook; (3) VLRChat passes the query and retrieved context to the user-selected LLM; (4) VLRChat responds with a synthesized answer that cites relevant chapters; and (5) the user provides immediate feedback. A minimal sketch of this retrieval-augmented flow appears below. An initial test set of 180 authentic, non-PHI-containing clinical questions was developed by our research team (two medical students, one fellow, and four board-certified hospitalists). We tasked a subset of this group (two students, two hospitalists) with evaluating VLRChat across four domains: accuracy, relevance, completeness, and clarity. VLRChat successfully answered 150 of the 180 questions; human evaluators rated these question-answer pairs highly for accuracy (4.5/5.0), relevance (4.7/5.0), completeness (4.6/5.0), and clarity (4.8/5.0) (Figure 2).
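To make the numbered flow concrete, the sketch below approximates steps (1)–(4) in plain Python. Every name in it (Chunk, embed_text, call_llm) is an illustrative stand-in under our own assumptions, not VLRChat’s actual code: the toy hash embedding and LLM stub merely mark where a real embedding model and the user-selected LLM would plug in, and the feedback step (5) is omitted.

```python
# Minimal sketch of a retrieval-augmented answering loop over a
# pre-chunked, pre-embedded handbook. All names are illustrative
# placeholders, not VLRChat's actual implementation.
from dataclasses import dataclass
import math

@dataclass
class Chunk:
    chapter: str          # handbook chapter the passage came from
    text: str             # passage text
    vector: list[float]   # embedding stored in the vector database

def embed_text(text: str) -> list[float]:
    # Toy bag-of-words hash embedding; a real system would call an
    # embedding model here (assumption, not VLRChat's method).
    vec = [0.0] * 64
    for token in text.lower().split():
        vec[hash(token) % 64] += 1.0
    return vec

def cosine(a: list[float], b: list[float]) -> float:
    dot = sum(x * y for x, y in zip(a, b))
    norm = math.sqrt(sum(x * x for x in a)) * math.sqrt(sum(y * y for y in b))
    return dot / norm if norm else 0.0

def call_llm(prompt: str, model: str) -> str:
    # Stub standing in for the user-selected LLM (step 3); a deployment
    # would send `prompt` to a hosted model API instead.
    return f"[synthesized answer from {model} would appear here]"

def answer_question(query: str, chunks: list[Chunk], model: str, k: int = 3) -> dict:
    q_vec = embed_text(query)                    # step 1: embed the text query
    top = sorted(chunks,                         # step 2: retrieve the k most
                 key=lambda c: cosine(q_vec, c.vector),  # similar passages
                 reverse=True)[:k]
    context = "\n\n".join(f"[{c.chapter}]\n{c.text}" for c in top)
    prompt = ("Answer the clinical question using ONLY the handbook excerpts "
              f"below, citing the chapters used.\n\n{context}\n\nQuestion: {query}")
    return {                                     # step 4: synthesized answer
        "answer": call_llm(prompt, model),       # plus chapter citations
        "citations": sorted({c.chapter for c in top}),
    }

if __name__ == "__main__":
    corpus = [Chunk(ch, txt, embed_text(txt)) for ch, txt in [
        ("Hyponatremia", "Limit sodium correction rate to avoid osmotic demyelination."),
        ("Sepsis", "Obtain cultures and start empiric antibiotics promptly."),
    ]]
    print(answer_question("How fast can I correct sodium?", corpus, model="example-llm"))
```

Precomputing passage embeddings and retrieving by cosine similarity is the standard retrieval-augmented generation pattern; constraining the prompt to the retrieved excerpts is what grounds the answer in the handbook and lets it cite specific chapters.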
Conclusions: Clinical questions bridge the working mental model of hospitalists and the data (clinical and historical) of the patients they treat. Posed authentic clinical questions, VLRChat generated answers that human evaluators of varying experience levels rated as highly accurate, relevant, and clear. Analysis of this preliminary dataset suggests that VLRChat could enhance educational decision support by rapidly providing users with answers grounded in a trusted, accurate resource. VLRChat may also augment care by directing users rapidly to curated content of which they were previously unaware, an effect amplified by its ability to synthesize content across multiple chapters of VIMBook. Timely, intuitive platforms that deliver evidence-based guidance improve the quality and safety of patient care and contribute to a Learning Health System. Following further refinement and validation based on human feedback and co-design, we plan to launch VLRChat for public use in January 2025, offering a trusted, real-time resource for bolstering clinical knowledge.

