Sentiment Analysis

Multi-aspect Reviews

Publication Date: 2013
Creators: Julian McAuley; Jure Leskovec; Dan Jurafsky
These datasets include reviews with multiple rated dimensions. The most comprehensive of them are the beer review datasets from Ratebeer and BeerAdvocate, which include ratings for sensory aspects such as taste, look, feel, and smell. The data set is about 1 GB in size.

Ratebeer:

  • Number of users: 40,213
  • Number of items: 110,419
  • Number of ratings/reviews: 2,855,232
  • Timespan: April, 2000 – November, 2011

BeerAdvocate:

  • Number of users: 33,387
  • Number of items: 66,051
  • Number of ratings/reviews: 1,586,259
  • Timespan: January 1998 – November 2011
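The counts above are enough to compare how active users are and how sparse each dataset is. The sketch below is only arithmetic on the listed figures; "density" here is the fraction of the user × beer matrix with an observed rating, which assumes each user reviews a given beer at most once.

```python
# Counts copied from the Ratebeer and BeerAdvocate summaries above.
STATS = {
    "Ratebeer": {"users": 40_213, "items": 110_419, "reviews": 2_855_232},
    "BeerAdvocate": {"users": 33_387, "items": 66_051, "reviews": 1_586_259},
}


def summarize(stats):
    """Return average reviews per user/item and user-item matrix density."""
    users, items, reviews = stats["users"], stats["items"], stats["reviews"]
    return {
        "reviews_per_user": reviews / users,
        "reviews_per_item": reviews / items,
        # Assumes at most one review per (user, beer) pair.
        "density": reviews / (users * items),
    }


for name, stats in STATS.items():
    s = summarize(stats)
    print(
        f"{name}: {s['reviews_per_user']:.1f} reviews/user, "
        f"{s['reviews_per_item']:.1f} reviews/item, density {s['density']:.2e}"
    )
```

Worked out, Ratebeer averages about 71 reviews per user and 26 per beer, while BeerAdvocate averages about 47.5 per user and 24 per beer; both matrices are well under 0.1% filled.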

Amazon Reviews: Unlocked Mobile Phones

Publication Date: 2019
Creators: PromptCloud, Inc.
PromptCloud analyzed more than 400,000 reviews of close to 4,400 unlocked mobile phones sold on Amazon.com to find insights into reviews, ratings, and prices and how they relate. The analysis found that most reviewers on Amazon's product review platform gave 4-star and 3-star ratings, and that reviews average close to 230 characters in length. It also found that longer reviews tend to be more helpful and that price and rating are positively correlated.
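A price–rating relationship like the one described above is typically measured with Pearson's correlation coefficient. The sketch below computes it from a CSV export; the column names `Price` and `Rating` are assumptions about the file layout, not confirmed by this listing, and rows with missing or non-numeric values are simply skipped.

```python
import csv
import math
from typing import Sequence, TextIO


def pearson(xs: Sequence[float], ys: Sequence[float]) -> float:
    """Pearson correlation coefficient of two equal-length sequences."""
    if len(xs) != len(ys) or len(xs) < 2:
        raise ValueError("need two equal-length sequences with at least 2 points")
    mx = sum(xs) / len(xs)
    my = sum(ys) / len(ys)
    cov = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    sx = math.sqrt(sum((x - mx) ** 2 for x in xs))
    sy = math.sqrt(sum((y - my) ** 2 for y in ys))
    if sx == 0 or sy == 0:
        raise ValueError("correlation is undefined for a constant sequence")
    return cov / (sx * sy)


def price_rating_correlation(f: TextIO) -> float:
    """Correlate price with star rating; column names are assumed."""
    prices, ratings = [], []
    for row in csv.DictReader(f):
        try:
            price = float(row["Price"])  # assumed column name
            rating = float(row["Rating"])  # assumed column name
        except (KeyError, TypeError, ValueError):
            continue  # skip rows with missing or non-numeric fields
        prices.append(price)
        ratings.append(rating)
    return pearson(prices, ratings)
```

The same `pearson` helper could be applied to review length versus helpfulness votes, provided the export has a column for vote counts.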

IMDb Movie Reviews Dataset

Publication Date: 2011
Creators: Maas, Andrew L.; Daly, Raymond E.; Pham, Peter T.; Huang, Dan; Ng, Andrew Y.; Potts, Christopher

The IMDb Movie Reviews dataset is a binary sentiment analysis dataset consisting of 50,000 reviews from the Internet Movie Database (IMDb) labeled as positive or negative. The providers also include an additional 50,000 unlabeled documents for unsupervised learning.

The dataset contains an equal number of positive and negative reviews. Only highly polarized reviews are included: a negative review has a score ≤ 4 out of 10, and a positive review has a score ≥ 7 out of 10. No more than 30 reviews are included per movie. See the README file included in the release for more details.
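The selection rule above can be expressed as a small filter. This is an illustrative sketch, not the providers' actual procedure: the `(movie_id, score, text)` input tuples are hypothetical, reviews are kept in input order, and it does not reproduce the class balancing of the released data.

```python
from collections import defaultdict
from typing import Dict, Iterable, List, Optional, Tuple


def polarity(score: int) -> Optional[str]:
    """Map a 1-10 IMDb score to a label, or None if it is not polarized."""
    if score <= 4:
        return "neg"
    if score >= 7:
        return "pos"
    return None  # scores 5 and 6 are excluded as not highly polarized


def select_reviews(
    reviews: Iterable[Tuple[str, int, str]], max_per_movie: int = 30
) -> List[Tuple[str, str, str]]:
    """Keep polarized reviews, at most max_per_movie per movie (input order)."""
    per_movie: Dict[str, int] = defaultdict(int)
    kept = []
    for movie_id, score, text in reviews:
        label = polarity(score)
        if label is None or per_movie[movie_id] >= max_per_movie:
            continue
        per_movie[movie_id] += 1
        kept.append((movie_id, label, text))
    return kept
```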

The labeled data is split into a training set (25,000 reviews) and a test set (25,000 reviews). A preview file cannot be provided; please download the data directly from the data provider's website.
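Once downloaded, the labeled splits can be read straight from disk. The loader below assumes the layout `<root>/<split>/{pos,neg}/<id>_<rating>.txt`, where the file name carries the review's 1-10 score; this layout is an assumption based on the aclImdb release and should be checked against the README.

```python
from pathlib import Path
from typing import List, Tuple

# (review text, label, star rating); label 1 = positive, 0 = negative.
Example = Tuple[str, int, int]


def load_split(root: str, split: str) -> List[Example]:
    """Load the labeled reviews of one split ("train" or "test").

    Assumes <root>/<split>/{pos,neg}/<id>_<rating>.txt; verify against the README.
    """
    examples: List[Example] = []
    for subdir, label in (("pos", 1), ("neg", 0)):
        for path in sorted((Path(root) / split / subdir).glob("*.txt")):
            # File stem "<review id>_<rating>", e.g. "12_8" -> rating 8.
            _, rating = path.stem.split("_")
            text = path.read_text(encoding="utf-8")
            examples.append((text, label, int(rating)))
    return examples
```

If the layout matches, `load_split("aclImdb", "train")` should return 25,000 examples, and the ratings parsed from file names give a quick check that positive reviews score ≥ 7 and negative ones ≤ 4.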

When using the dataset, please cite: Andrew L. Maas, Raymond E. Daly, Peter T. Pham, Dan Huang, Andrew Y. Ng, and Christopher Potts. (2011). Learning Word Vectors for Sentiment Analysis. The 49th Annual Meeting of the Association for Computational Linguistics (ACL 2011).