Yelp Dataset Challenge

Project Dates: 
January 2011 to February 2019
Research Area(s): 
Project Description: 

Yelp reviews and ratings are an important source of information for making informed decisions about a venue. We conjecture that further classifying Yelp reviews into relevant categories can help users make decisions based on how much they personally care about each category. This is especially useful when users do not have time to read many reviews to infer how popular a venue is in each category. In this paper, we demonstrated how reviews for restaurants can be automatically classified into five relevant categories with precision and recall of 0.72 and 0.71, respectively. We found that an ensemble of two multi-label classification techniques (Binary Relevance and Label Powerset) performed better than either technique individually. Moreover, we found no significant difference in performance when using a combination of unigrams, bigrams and trigrams instead of only unigrams. We also showed how the results of this study can be incorporated into Yelp’s existing website.
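
To make the ensemble idea concrete, the following is a minimal sketch of combining Binary Relevance and Label Powerset for multi-label review classification. It is an illustration under stated assumptions, not the implementation used in the paper: the base classifier (a small multinomial Naive Bayes), the combination rule (averaging per-label probabilities and thresholding at 0.5), the category names, and the toy reviews are all hypothetical, and only three categories are used instead of five to keep the example short.

```python
import math
from collections import Counter, defaultdict

def tokens(text):
    # Unigram bag-of-words features (lower-cased, whitespace-split).
    return text.lower().split()

class NaiveBayes:
    """Tiny multinomial Naive Bayes with Laplace smoothing (assumed base learner)."""

    def fit(self, docs, classes):
        self.class_counts = Counter(classes)
        self.word_counts = defaultdict(Counter)
        for doc, c in zip(docs, classes):
            self.word_counts[c].update(tokens(doc))
        self.vocab = {w for wc in self.word_counts.values() for w in wc}
        return self

    def predict_proba(self, doc):
        n = sum(self.class_counts.values())
        log_scores = {}
        for c, count in self.class_counts.items():
            total = sum(self.word_counts[c].values()) + len(self.vocab)
            score = math.log(count / n)
            for w in tokens(doc):
                if w in self.vocab:  # ignore words never seen in training
                    score += math.log((self.word_counts[c][w] + 1) / total)
            log_scores[c] = score
        # Normalise log scores into probabilities.
        top = max(log_scores.values())
        exp = {c: math.exp(s - top) for c, s in log_scores.items()}
        z = sum(exp.values())
        return {c: v / z for c, v in exp.items()}

def binary_relevance(docs, label_sets, labels):
    # Binary Relevance: one independent yes/no classifier per label.
    models = {l: NaiveBayes().fit(docs, [l in s for s in label_sets])
              for l in labels}
    return lambda doc: {l: m.predict_proba(doc).get(True, 0.0)
                        for l, m in models.items()}

def label_powerset(docs, label_sets, labels):
    # Label Powerset: one multi-class classifier over whole label combinations.
    model = NaiveBayes().fit(docs, [frozenset(s) for s in label_sets])

    def score(doc):
        proba = model.predict_proba(doc)
        # A label's probability is the mass of all combinations containing it.
        return {l: sum(p for combo, p in proba.items() if l in combo)
                for l in labels}
    return score

def ensemble(docs, label_sets, labels, threshold=0.5):
    # Assumed combination rule: average the two per-label probabilities.
    br = binary_relevance(docs, label_sets, labels)
    lp = label_powerset(docs, label_sets, labels)

    def predict(doc):
        a, b = br(doc), lp(doc)
        return {l for l in labels if (a[l] + b[l]) / 2 >= threshold}
    return predict

# Hypothetical categories and toy reviews, for illustration only.
LABELS = ["food", "service", "ambience"]
train_docs = [
    "delicious pasta and tasty dessert",
    "friendly waiter and quick service",
    "cozy lighting and quiet music",
    "tasty burger but slow waiter",
]
train_labels = [{"food"}, {"service"}, {"ambience"}, {"food", "service"}]

predict = ensemble(train_docs, train_labels, LABELS)
print(predict("tasty pasta"))
print(predict("slow waiter"))
```

Binary Relevance treats each category independently, while Label Powerset models combinations of categories that occur together, so averaging the two is one plausible way to let each compensate for the other's blind spots. Extending `tokens` to also emit bigrams and trigrams would be the natural place to compare n-gram feature sets.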