
Customer Support on Twitter

Creators: Axelbrooke, Stuart
Publication Date: 2017

The Customer Support on Twitter dataset is a large, modern corpus of tweets and replies, intended to aid innovation in natural language understanding and conversational models and to support the study of modern customer support practices and their impact. Compiled by Stuart Axelbrooke in 2017, it contains tweets and replies involving prominent companies such as Apple, Amazon, Uber, Delta, and Spotify, which makes it a useful resource for researchers working on automated response generation, sentiment analysis, and conversational flow modeling. The dataset is approximately 516.53 MB in size.

Each tweet entry has a unique, anonymized tweet ID (tweet_id), an anonymized user ID (author_id), a timestamp (created_at), and the tweet text (text), in which sensitive information such as phone numbers and email addresses has been masked to protect privacy. The inbound field distinguishes inbound tweets, which customers direct at companies, from outbound tweets, which are the companies’ responses. The in_response_to_tweet_id and response_tweet_id fields link tweets to their responses, so entire conversation threads can be reconstructed.
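As a rough illustration of how the linking fields might be used, the sketch below rebuilds a thread by following in_response_to_tweet_id from a tweet back to the first message in the conversation. Only the field names come from the description above; the sample rows, the company name, and the `thread_for` helper are made up for illustration, and the sketch assumes the rows have already been loaded into dictionaries.

```python
# Hypothetical sample rows; only the field names come from the dataset description.
tweets = [
    {"tweet_id": 1, "author_id": "115712", "inbound": True,
     "text": "@ExampleCo my order never arrived",
     "in_response_to_tweet_id": None},
    {"tweet_id": 2, "author_id": "ExampleCo", "inbound": False,
     "text": "@115712 Sorry to hear that! Please DM us your order number.",
     "in_response_to_tweet_id": 1},
    {"tweet_id": 3, "author_id": "115712", "inbound": True,
     "text": "@ExampleCo sent, thanks",
     "in_response_to_tweet_id": 2},
]


def thread_for(tweet_id, rows):
    """Return the conversation from the root tweet down to `tweet_id`."""
    by_id = {row["tweet_id"]: row for row in rows}
    chain = []
    seen = set()  # guards against cycles in malformed data
    current = by_id.get(tweet_id)
    while current is not None and current["tweet_id"] not in seen:
        seen.add(current["tweet_id"])
        chain.append(current)
        # A missing or unknown parent ID ends the walk.
        current = by_id.get(current["in_response_to_tweet_id"])
    return list(reversed(chain))


thread = thread_for(3, tweets)
for row in thread:
    speaker = "customer" if row["inbound"] else "company"
    print(f"{speaker}: {row['text']}")
```

Walking upward through in_response_to_tweet_id yields a single path per tweet; following response_tweet_id downward would be needed to recover branches where one tweet received several replies.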

Spotify Million Playlist Dataset

Creators: Chen, Ching-Wei; Lamere, Paul ; Schedl, Markus ; Zamani, Hamed
Publication Date: 2018
We released a dataset of one million user-created playlists from the Spotify platform, dubbed the Million Playlist Dataset (MPD). For each playlist, the dataset includes its title, the list of tracks (including album and artist names), and some additional metadata such as Spotify URIs and the playlist’s number of followers. The dataset has a size of 5.39 GB and contains 1,000,000 playlists created by users on the Spotify platform between January 2010 and October 2017. It is suited to building and evaluating recommendation algorithms, studying user behavior in music consumption, and understanding how playlists evolve over time. The dataset is widely used by researchers and developers to improve machine learning models for music streaming applications, with the aim of a more personalized and engaging experience for users.

The Music Streaming Sessions Dataset

Creators: Brost, Brian; Mehrotra, Rishabh; Jehan, Tristan
Publication Date: 2018

The Music Streaming Sessions Dataset (MSSD) is a large-scale collection of user interaction data from a music streaming service, designed to support research in user behavior modeling, music information retrieval, and session-based recommendation systems. Released in 2019, this dataset contains approximately 160 million listening sessions, making it one of the most extensive datasets available for analyzing how users engage with music streaming platforms. It supports the study of listening habits, session structures, and sequential user interactions, and therefore of music recommendation, user retention, and engagement patterns. The dataset has a size of 70 GB and covers approximately 3.7 million unique tracks. Each session includes detailed user interactions, such as play, pause, skip, and seek actions, offering a granular view of how listeners interact with music over time. Additionally, it contains metadata and audio features for each track, including details such as track ID, artist name, album name, and genre, along with audio attributes like tempo, key, and loudness. These elements make the dataset useful for both behavioral studies and technical research in music information retrieval.

Amazon product co-purchasing network metadata

Creators: Leskovec, Jure
Publication Date: 2006

The data was collected by crawling the Amazon website and contains product metadata and review information about 548,552 different products (books, music CDs, DVDs, and VHS video tapes). It is valuable for analyzing product relationships, customer behavior, and the dynamics of product co-purchasing networks. For each product the following information is available:

  • Title
  • Sales rank
  • List of similar products (that get co-purchased with the current product)
  • Detailed product categorization
  • Product reviews: time, customer, rating, number of votes, number of people that found the review helpful

The data was collected in summer 2006. It has a size of 201 MB and is structured into:

  • Product Metadata: Information such as product ID, ASIN, title, group, sales rank, similar products, and categories.

  • Product Reviews: Details including review time, customer ID, rating, number of votes, and helpfulness votes.
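To make the co-purchasing structure concrete, the sketch below turns per-product lists of similar products into a directed graph and counts how often each product is listed by others. It assumes the metadata has already been parsed into dictionaries; the records, the key names `asin` and `similar`, and both helper functions are hypothetical and do not reflect the actual file layout.

```python
from collections import defaultdict

# Hypothetical records; "similar" stands in for the list of co-purchased products.
products = [
    {"asin": "A1", "title": "Book one", "similar": ["A2", "A3"]},
    {"asin": "A2", "title": "Book two", "similar": ["A1"]},
    {"asin": "A3", "title": "Music CD", "similar": []},
]


def build_copurchase_graph(records):
    """Map each ASIN to the set of ASINs listed as co-purchased with it."""
    graph = defaultdict(set)
    for record in records:
        graph[record["asin"]].update(record["similar"])
    return graph


def in_degree(graph):
    """Count how often each product appears in other products' lists."""
    counts = defaultdict(int)
    for neighbours in graph.values():
        for asin in neighbours:
            counts[asin] += 1
    return dict(counts)


graph = build_copurchase_graph(products)
print(in_degree(graph))
```

The graph is kept directed because a product listing another as similar does not imply the reverse, as the A1 and A3 records in the sample show.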

Yelp Open Dataset

Creators: Yelp, Inc.
Publication Date: 2015
The Yelp dataset offers a collection of real-world data from Yelp, intended for educational and academic purposes. It encompasses information about businesses, user reviews, photos, and check-ins, providing insights into local commerce and consumer behavior. In total, this dataset contains 6.9M online reviews for 150k businesses and covers 11 metropolitan areas. It also includes more than 200,000 images related to the reviews. The data is available as JSON files, with a size of 4.9 GB compressed and 10.9 GB uncompressed. It consists of multiple sub-datasets:

  1. Yelp Business data: Contains business data including location data, attributes, and categories.
  2. Yelp Review data: Contains full review text data including the user_id that wrote the review and the business_id the review is written for.
  3. Yelp User data: User data including the user’s friend mapping and all the metadata associated with the user.
  4. Yelp Checkin data: Checkins on a business.
  5. Yelp Tip data: Tips written by a user on a business. Tips are shorter than reviews and tend to convey quick suggestions.
  6. Yelp Photo data: Contains photo data including the caption and classification (one of “food”, “drink”, “menu”, “inside” or “outside”).
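Because reviews reference businesses through business_id, the sub-datasets can be joined on that key. The sketch below counts reviews per business using in-memory stand-ins for the business and review files. It assumes each file holds one JSON object per line, which should be checked against the actual download; the `name` and `text` keys, the sample records, and the `read_json_lines` helper are illustrative.

```python
import io
import json
from collections import Counter

# Stand-ins for the business and review files; assumed one JSON object per line.
business_file = io.StringIO(
    '{"business_id": "b1", "name": "Example Cafe"}\n'
    '{"business_id": "b2", "name": "Example Diner"}\n'
)
review_file = io.StringIO(
    '{"user_id": "u1", "business_id": "b1", "text": "Great coffee."}\n'
    '{"user_id": "u2", "business_id": "b1", "text": "Friendly staff."}\n'
    '{"user_id": "u1", "business_id": "b2", "text": "Slow service."}\n'
)


def read_json_lines(handle):
    """Yield one parsed record per non-empty line."""
    for line in handle:
        line = line.strip()
        if line:
            yield json.loads(line)


# Look up business names by ID, then count reviews per business_id.
names = {b["business_id"]: b["name"] for b in read_json_lines(business_file)}
review_counts = Counter(r["business_id"] for r in read_json_lines(review_file))

for business_id, count in review_counts.most_common():
    print(names.get(business_id, "unknown"), count)
```

Reading line by line keeps memory use low, which matters for a review file of several gigabytes; with real files, the `io.StringIO` objects would be replaced by `open(...)` handles.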

Available as JSON files, you can use it to teach students about databases, to learn NLP, or for sample production data while you learn how to make mobile apps.


Twitter 2010 data set

Creators: Lerman, Kristina; Ghosh, Rumi; Surachawala, Tawan
Publication Date: 2010

The Twitter 2010 dataset contains Twitter activity data, focusing on tweets containing URLs and the follower-followee relationships among users during October 2010. It is particularly valuable for studying information diffusion, social network structures, and user interaction patterns on Twitter. The dataset has a size of 21.5 kB and includes 2,859,764 tweets that contain URLs, offering insights into content-sharing behaviors, and 736,930 users who posted these tweets. Additionally, it features 36,743,448 follower-followee relationships, allowing for the reconstruction of the social graph of active users. Each tweet record contains metadata such as tweet ID, creation date, source device, in-reply-to information, and the user’s follower and followee counts.

Web data: Amazon Fine Foods reviews

Creators: McAuley, Julian; Leskovec, Jure
Publication Date: 2012

This dataset consists of reviews of fine foods from Amazon. The data is 116 MB in size and spans a period of more than 10 years, including all ~500,000 reviews up to October 2012. Each review includes detailed information such as the product’s unique identifier (ASIN), user ID, profile name, helpfulness rating, score, time of review (in Unix time), summary, and the full text of the review. This dataset is particularly valuable for analyzing consumer behavior, sentiment analysis, and the evolution of user expertise in online reviews.

The dataset is organized with each review capturing multiple attributes:

  • Product Information: Including the product’s unique identifier (ASIN).

  • User Information: Such as user ID and profile name.

  • Review Details: Encompassing helpfulness rating, score, time of review, summary, and the full text.
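Since review times are stored as Unix time, they need converting before any analysis over calendar periods. The sketch below groups a few made-up reviews by UTC year and averages their scores. It assumes the reviews have already been parsed into dictionaries; the key names and the `mean_score_by_year` helper are illustrative and are not the labels used in the distributed file.

```python
from collections import defaultdict
from datetime import datetime, timezone

# Hypothetical reviews; key names are illustrative, not the file's actual labels.
reviews = [
    {"asin": "B000000001", "user_id": "U1", "score": 5.0, "time": 1303862400},
    {"asin": "B000000001", "user_id": "U2", "score": 3.0, "time": 1320000000},
    {"asin": "B000000002", "user_id": "U1", "score": 4.0, "time": 1346976000},
]


def mean_score_by_year(rows):
    """Group reviews by the UTC year of their Unix timestamp and average the scores."""
    buckets = defaultdict(list)
    for row in rows:
        year = datetime.fromtimestamp(row["time"], tz=timezone.utc).year
        buckets[year].append(row["score"])
    return {year: sum(scores) / len(scores)
            for year, scores in sorted(buckets.items())}


print(mean_score_by_year(reviews))
```

Passing `tz=timezone.utc` keeps the result independent of the machine's local time zone, which matters for reviews posted near a year boundary.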

 

Web data: Amazon movie reviews

Creators: McAuley, Julian; Leskovec, Jure
Publication Date: 2012

This dataset is a collection of approximately 8 million movie reviews from Amazon, spanning over a decade up to October 2012. It is particularly valuable for analyzing consumer behavior, sentiment analysis, and the evolution of user expertise in online reviews. In total, the dataset has a size of 3.1 GB. Each review includes detailed information such as the product’s unique identifier (ASIN), user ID, profile name, helpfulness rating, score, time of review (in Unix time), summary, and the full text of the review.

The dataset is organized with each review capturing multiple attributes:

  • Product Information: Including the product’s unique identifier (ASIN).

  • User Information: Such as user ID and profile name.

  • Review Details: Encompassing helpfulness rating, score, time of review, summary, and the full text.

 
