Background: With access to big data in medicine, it is becoming increasingly necessary to use tools to automate the pooling of data into relevant thematic structures. Topic modeling is most often used to uncover these thematic structures in large sets of textual data. Latent Dirichlet allocation (LDA) is one such algorithm-based topic model that has been used widely to uncover hidden variables among observed corpora and to summarize large amounts of text.

Purpose: In this study, LDA was used to discover patterns of word use among more than 100,000 Google reviews of US-based hospital systems in order to uncover consumer sentiment toward, and perception of, the hospital experience. We sought to answer the question, “What do Google users think about hospitals?”

Description: A browser automation library was used to collect all user-submitted Google reviews for United States hospitals listed by the Centers for Medicare & Medicaid Services. Data were collected from 4,788 acute care centers, critical access facilities, and hospitals, yielding 100,776 user-submitted reviews. A topic model was then created using LDA, a probabilistic model that groups large collections of text into sets of related terms.
The LDAvis plot in Figure 1 shows the twenty unique topics discovered across all Google reviews, plotted based on their relatedness to one another. Figure 2 provides examples of the terms within a given topic in Figure 1. An interactive version of the LDA plot can be found at: https://goo.gl/uMvW0H.
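As a rough illustration of how LDA assigns words to topics, the following minimal sketch fits a two-topic model with collapsed Gibbs sampling, one common inference method for LDA. It is not the implementation used in this study, which the abstract does not specify. The function names (fit_lda, top_terms), the hyperparameters alpha and beta, and the four toy reviews are all assumptions invented for this example.

```python
import random


def fit_lda(docs, num_topics, iterations=200, alpha=0.1, beta=0.01, seed=0):
    """Fit LDA by collapsed Gibbs sampling (illustrative sketch only).

    docs is a list of token lists. alpha and beta are assumed symmetric
    Dirichlet priors on document-topic and topic-word distributions.
    """
    rng = random.Random(seed)
    vocab = sorted({w for doc in docs for w in doc})
    word_id = {w: i for i, w in enumerate(vocab)}
    vocab_size = len(vocab)

    # Count tables updated as each token's topic assignment changes.
    doc_topic = [[0] * num_topics for _ in docs]
    topic_word = [[0] * vocab_size for _ in range(num_topics)]
    topic_total = [0] * num_topics

    # Start from a random topic assignment for every token.
    assignments = []
    for d, doc in enumerate(docs):
        z_doc = []
        for w in doc:
            z = rng.randrange(num_topics)
            z_doc.append(z)
            doc_topic[d][z] += 1
            topic_word[z][word_id[w]] += 1
            topic_total[z] += 1
        assignments.append(z_doc)

    for _ in range(iterations):
        for d, doc in enumerate(docs):
            for i, w in enumerate(doc):
                wid = word_id[w]
                # Remove the token's current assignment from the counts.
                z = assignments[d][i]
                doc_topic[d][z] -= 1
                topic_word[z][wid] -= 1
                topic_total[z] -= 1
                # Resample its topic in proportion to
                # (topic weight in this document) * (word weight in topic).
                weights = [
                    (doc_topic[d][k] + alpha)
                    * (topic_word[k][wid] + beta)
                    / (topic_total[k] + vocab_size * beta)
                    for k in range(num_topics)
                ]
                z = rng.choices(range(num_topics), weights=weights)[0]
                assignments[d][i] = z
                doc_topic[d][z] += 1
                topic_word[z][wid] += 1
                topic_total[z] += 1

    return vocab, doc_topic, topic_word


def top_terms(vocab, topic_word, n=3):
    """Return the n most frequently assigned words for each topic."""
    ranked = []
    for row in topic_word:
        order = sorted(range(len(vocab)), key=lambda i: -row[i])
        ranked.append([vocab[i] for i in order[:n]])
    return ranked


# Toy reviews invented for illustration; they are not from the study data.
reviews = [
    "long wait emergency room wait billing",
    "billing error long wait emergency",
    "great nurses surgery baby delivery",
    "surgery staff great delivery nurses",
]
docs = [text.lower().split() for text in reviews]
vocab, doc_topic, topic_word = fit_lda(docs, num_topics=2)
topics = top_terms(vocab, topic_word)
for k, terms in enumerate(topics):
    print(f"Topic {k}: {', '.join(terms)}")
```

On a corpus of roughly 100,000 reviews, an optimized library would be a more practical choice than a pure-Python sampler, and stop-word removal and other preprocessing would normally come first. The sketch is meant only to show the per-token resampling that produces topic-word groupings like those plotted in Figure 1.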

Overall, the twenty topics can be loosely grouped into three clusters (Clusters 1, 2, and 3 in Figure 1, respectively): negative sentiment associated with hospital operations, including billing and wait times; negative sentiment associated with medical conditions, treatment, and outcomes; and positive sentiment associated with medical procedures, including surgery and childbirth.

Analysis of individual topics revealed themes shared across hospitals. For example, Topic 13 highlights reviews written predominantly in Spanish. The terms associated with Topic 13 have a positive sentiment, suggesting that Spanish-speaking users perceive their experience at hospitals as favorable. In contrast, Topic 4 reflects negative sentiment about wait times and overcrowding in emergency rooms. Staff and employees are associated with positive sentiment in Topic 8 and with negative sentiment in Topic 19. Topic 18 indicates that positive sentiments are most often associated with facilities, most commonly hospital parking and cafeterias.

Conclusions: Topic modeling in healthcare can help draw system-wide conclusions from Google reviews of hospitals and clarify how patients and family members perceive various aspects of the hospital experience. It can reveal previously unrecognized factors, such as parking and cafeteria food having a larger influence on a user’s attitude than construction and building aesthetics. By analyzing reviews through text mining and LDA, hospital systems can learn how users perceive hospital stays and identify areas where these experiences can be improved.

IMAGE 1: Figure 1. LDAvis Plot

IMAGE 2: Figure 2. Top 30 Most Relevant Terms for Topic 4