reviews


TripAdvisor European restaurants

Publication Date: 2021
Creators: (Leone, Stefano)

TripAdvisor is the most popular travel website and it stores data for almost all restaurants, showing locations (even latitude and longitude coordinates), restaurant descriptions, user ratings and reviews, and many more aspects. The dataset is 0.68 GB in size.

The TripAdvisor dataset includes 1,083,397 restaurants with attributes such as location data, average rating, number of reviews, open hours, cuisine types, awards, etc.

The dataset combines restaurants from the main European countries. The data was scraped in early May 2021.

Yelp Open Dataset

Publication Date: 2015
Creators: Yelp, Inc.
The Yelp dataset is a subset of businesses, reviews, and user data for use in personal, educational, and academic purposes. It contains 6.9M online reviews for 150k businesses. It also includes more than 200,000 images related to the reviews. The data consists of multiple sub-datasets:

  1. Yelp Business data: Contains business data including location data, attributes, and categories.
  2. Yelp Review data: Contains full review text data including the user_id that wrote the review and the business_id the review is written for.
  3. Yelp User data: User data including the user’s friend mapping and all the metadata associated with the user.
  4. Yelp Checkin data: Checkins on a business.
  5. Yelp Tip data: Tips written by a user on a business. Tips are shorter than reviews and tend to convey quick suggestions.
  6. Yelp Photo data: Contains photo data including the caption and classification (one of “food”, “drink”, “menu”, “inside” or “outside”).

The data is available as JSON files. You can use it to teach students about databases, to learn NLP, or as sample production data while you learn how to make mobile apps.

 

Goodreads Datasets

Publication Date: 2017
Creators: Wan, Mengting; McAuley, Julian

We collected three groups of datasets: (1) meta-data of the books, (2) user-book interactions (users’ public shelves) and (3) users’ detailed book reviews. These datasets can be merged together by matching book/user/review ids. The complete book graph includes 2,360,655 books, 876,145 users and 228,648,342 user-book interactions in users’ shelves.
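The merge by ids described above can be sketched as follows. This is a minimal illustration on toy data, not the authors' code: the column names (book_id, user_id, review_id, title, is_read, review_text) and the sample rows are assumptions made for the example, and the real files may name their fields differently.

```python
import pandas as pd

# Toy stand-ins for the three dataset groups (all values are made up).
books = pd.DataFrame({"book_id": [1, 2], "title": ["Book A", "Book B"]})
interactions = pd.DataFrame(
    {"user_id": [10, 10, 11], "book_id": [1, 2, 2], "is_read": [True, False, True]}
)
reviews = pd.DataFrame(
    {"review_id": [100], "user_id": [11], "book_id": [2], "review_text": ["Great."]}
)

# Attach book metadata to every shelf entry by matching on the book id.
shelves = interactions.merge(books, on="book_id", how="left")

# Attach review text where the same user reviewed the same book;
# shelf entries without a review keep missing values in the review columns.
merged = shelves.merge(reviews, on=["user_id", "book_id"], how="left")

print(merged)
```

A left join keeps every user-book interaction even when no review exists, which matters here because interactions far outnumber reviews.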

Trip Advisor Hotel Reviews

Publication Date: 2020
Creators: Alam, Md. H.; Ryu, Woo-Jong; Lee, SangKeun

Hotels play a crucial role in traveling, and with increased access to information, new ways of selecting the best ones have emerged. With this dataset, consisting of 20k reviews crawled from Tripadvisor, you can explore what makes a great hotel and maybe even use this model in your travels!

Google Local Reviews

Publication Date: 2017
Creators: He, Ruining; Kang, Wang-Cheng; McAuley, Julian

We introduce a new dataset from Google which contains 11,453,845 reviews and ratings from 4,567,431 users on 3,116,785 local businesses (with detailed name, hours, phone number, address, GPS, etc.). There are as many as 48,013 categories of local businesses distributed over five continents, ranging from restaurants, hotels, parks, shopping malls, movie theaters, schools, military recruiting offices, bird control, mediation services, etc.

IMDb Movie Reviews Dataset

Publication Date: 2011
Creators: Maas, Andrew L.; Daly, Raymond E.; Pham, Peter T.; Huang, Dan; Ng, Andrew Y.; Potts, Christopher

The IMDb Movie Reviews dataset is a binary sentiment analysis dataset consisting of 50,000 reviews from the Internet Movie Database (IMDb) labeled as positive or negative. The providers also include an additional 50,000 unlabeled documents for unsupervised learning.

The dataset contains an equal number of positive and negative reviews. Only highly polarized reviews are considered. A negative review has a score ≤ 4 out of 10, and a positive review has a score ≥ 7 out of 10. No more than 30 reviews are included per movie. See the README file contained in the release for more details.
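The score thresholds above amount to a simple labeling rule. The sketch below illustrates it; the function name polarity_label and the sample ratings are hypothetical and are not part of the dataset release.

```python
from typing import Optional


def polarity_label(score: int) -> Optional[str]:
    """Map a score out of 10 to a sentiment label.

    Returns None for mid-range scores (5 or 6), which the
    description above excludes as not polarized enough.
    """
    if score <= 4:
        return "negative"
    if score >= 7:
        return "positive"
    return None


# Hypothetical ratings, used only to exercise the rule.
sample_ratings = [1, 4, 5, 6, 7, 10]
labels = [polarity_label(r) for r in sample_ratings]
print(labels)
```

Scores of 5 and 6 fall between the two thresholds, so reviews with those scores receive no label under this rule.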

The data is split into a train (25k reviews) and test (25k reviews) set. A preview file cannot be provided – please download the data directly from the data provider’s website.

When using the dataset, please cite: Andrew L. Maas, Raymond E. Daly, Peter T. Pham, Dan Huang, Andrew Y. Ng, and Christopher Potts. (2011). Learning Word Vectors for Sentiment Analysis. The 49th Annual Meeting of the Association for Computational Linguistics (ACL 2011).

Food.com Recipe & Review Data

Publication Date: 2019
Creators: Majumder, Bodhisattwa P.; Li, Shuyang; Ni, Jianmo; McAuley, Julian
This dataset consists of 180K+ recipes and 700K+ recipe reviews covering 18 years of user interactions and uploads on Food.com (formerly GeniusKitchen), an online recipe aggregator. This dataset contains three sets of data from Food.com:

Interaction splits

  • interactions_test.csv
  • interactions_validation.csv
  • interactions_train.csv

Preprocessed data for result reproduction

In this format, the recipe text metadata is tokenized via the GPT subword tokenizer, with added special tokens such as start-of-step.

  • PP_recipes.csv
  • PP_users.csv

To convert these CSV files into the pickle format required to run our code off-the-shelf, you may use pandas.read_csv and pandas.to_pickle.
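A minimal sketch of that conversion is shown below. The helper name csv_to_pickle and the .pkl output file names are assumptions made for illustration; the authors' code may expect different pickle file names or locations, so check its documentation before relying on these.

```python
from pathlib import Path

import pandas as pd


def csv_to_pickle(csv_path, pickle_path=None):
    """Read a CSV with pandas and write it back out as a pickle.

    If no output path is given, the CSV's name is reused with a
    .pkl suffix (an assumption, not a documented requirement).
    """
    csv_path = Path(csv_path)
    pickle_path = Path(pickle_path) if pickle_path else csv_path.with_suffix(".pkl")
    frame = pd.read_csv(csv_path)
    pd.to_pickle(frame, pickle_path)
    return pickle_path


# Convert the two preprocessed files if they are in the working directory.
for name in ("PP_recipes.csv", "PP_users.csv"):
    if Path(name).exists():
        print(csv_to_pickle(name))
```

The resulting files can be loaded back with pandas.read_pickle. Pickle files should only be loaded from sources you trust.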

 
