Improving Restaurants by Extracting Subtopics from Yelp Reviews

In this paper, we describe latent subtopics discovered in Yelp restaurant reviews by running an online Latent Dirichlet Allocation (LDA) algorithm. The goal is to surface what customers demand from a large, high-dimensional collection of reviews. These topics can give restaurants meaningful insight into what customers care about, helping them raise their Yelp ratings, which directly affect their revenue. We used the open dataset from the Yelp Dataset Challenge, which contains over 158,000 restaurant reviews. To find latent subtopics in the reviews, we adopted Online LDA, a generative probabilistic model for collections of discrete data such as text corpora. We present the breakdown of hidden topics over all reviews, predict star ratings for each discovered topic, and extend our findings to temporal information about restaurants' peak hours. Overall, we found several interesting insights and a method that could prove useful to restaurant owners.
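
To make the phrase "generative probabilistic model" concrete, the following is a minimal sketch of the generative process that LDA assumes for a single review: draw a topic mixture from a Dirichlet prior, then draw a topic and a word for each token. It is not the paper's implementation and does not perform Online LDA inference, which works in the opposite direction by estimating topics from observed reviews. The vocabulary, the three topics, their word probabilities, and the prior `alpha` are all hypothetical toy values chosen for illustration.

```python
import random


def sample_dirichlet(alpha, rng):
    """Draw one Dirichlet sample by normalizing independent Gamma draws."""
    draws = [rng.gammavariate(a, 1.0) for a in alpha]
    total = sum(draws)
    return [d / total for d in draws]


def generate_review(topic_word_probs, vocab, alpha, n_words, rng):
    """Generate one synthetic review under the LDA generative process."""
    # Per-review topic mixture, drawn from the Dirichlet prior.
    theta = sample_dirichlet(alpha, rng)
    topic_ids = list(range(len(topic_word_probs)))
    words = []
    for _ in range(n_words):
        # Pick a topic for this token, then a word from that topic.
        z = rng.choices(topic_ids, weights=theta)[0]
        w = rng.choices(vocab, weights=topic_word_probs[z])[0]
        words.append(w)
    return theta, words


# Hypothetical toy vocabulary and topics (not taken from the paper).
VOCAB = ["wait", "line", "server", "friendly", "pizza", "crust"]
TOPIC_WORD_PROBS = [
    [0.45, 0.45, 0.04, 0.02, 0.02, 0.02],  # assumed "waiting" topic
    [0.04, 0.02, 0.45, 0.45, 0.02, 0.02],  # assumed "service" topic
    [0.02, 0.02, 0.02, 0.04, 0.45, 0.45],  # assumed "food" topic
]

rng = random.Random(0)  # fixed seed so repeated runs give the same review
theta, words = generate_review(
    TOPIC_WORD_PROBS, VOCAB, alpha=[0.5, 0.5, 0.5], n_words=12, rng=rng
)
print([round(t, 2) for t in theta])
print(" ".join(words))
```

A small `alpha` (below 1) tends to concentrate each review on one or two topics, which matches the intuition that a single review usually dwells on a few aspects of a restaurant.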
