online communities

Showing 1-3 of 3 results

Customer Support on Twitter

Creators: Axelbrooke, Stuart
Publication Date: 2017

The Customer Support on Twitter dataset is a large, modern corpus of tweets and replies, compiled by Stuart Axelbrooke in 2017 to aid innovation in natural language understanding and conversational models and to support the study of modern customer support practices and their impact. It covers tweets and replies from prominent companies such as Apple, Amazon, Uber, Delta, and Spotify, which makes it useful for researchers working on automated response generation, sentiment analysis, and conversational flow modeling. The dataset is approximately 516.53 MB in size.

Each tweet entry has a unique, anonymized tweet ID (tweet_id), an anonymized user ID (author_id), a timestamp (created_at), and the tweet text (text), in which sensitive information such as phone numbers and email addresses has been masked to protect privacy. The inbound field distinguishes inbound tweets, which customers direct at companies, from outbound tweets, which are the companies' responses. The in_response_to_tweet_id and response_tweet_id fields link each tweet to the tweet it answers and to its replies, so entire conversation threads can be reconstructed.
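As a rough illustration of the thread reconstruction described above, the sketch below walks in_response_to_tweet_id links back to the first tweet of a conversation. The field names come from the description; the sample rows, the author IDs, and the helper name thread_for are hypothetical and are not taken from the dataset itself.

```python
def thread_for(tweet_id, tweets_by_id):
    """Return the conversation from its first tweet down to tweet_id.

    Follows in_response_to_tweet_id links upward until a tweet has no
    parent, the parent is missing from the data, or a cycle is detected.
    """
    chain = []
    seen = set()
    current = tweet_id
    while current is not None and current in tweets_by_id and current not in seen:
        seen.add(current)
        tweet = tweets_by_id[current]
        chain.append(tweet)
        current = tweet["in_response_to_tweet_id"]
    chain.reverse()  # oldest tweet first
    return chain


# Hypothetical rows using the dataset's field names (values are made up).
rows = [
    {"tweet_id": 1, "author_id": "115712", "inbound": True,
     "text": "My order never arrived.", "in_response_to_tweet_id": None},
    {"tweet_id": 2, "author_id": "ExampleSupport", "inbound": False,
     "text": "Sorry to hear that! Please DM us your order number.",
     "in_response_to_tweet_id": 1},
    {"tweet_id": 3, "author_id": "115712", "inbound": True,
     "text": "Sent, thanks.", "in_response_to_tweet_id": 2},
]

tweets_by_id = {row["tweet_id"]: row for row in rows}

for tweet in thread_for(3, tweets_by_id):
    speaker = "customer" if tweet["inbound"] else "company"
    print(f"{speaker}: {tweet['text']}")
```

Indexing the tweets by tweet_id first keeps each lookup constant-time, which matters for a corpus of this size. The same walk could be done in the other direction with response_tweet_id to collect every reply to a given tweet.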

Amazon product co-purchasing network metadata

Creators: Leskovec, Jure
Publication Date: 2006

The data was collected by crawling the Amazon website and contains product metadata and review information about 548,552 different products (Books, music CDs, DVDs and VHS video tapes). It is valuable for analyzing product relationships, customer behavior, and the dynamics of product co-purchasing networks. For each product the following information is available:

Title
Salesrank
List of similar products (that get co-purchased with the current product)
Detailed product categorization
Product reviews: time, customer, rating, number of votes, number of people that found the review helpful.

The data was collected in summer 2006. It is 201 MB in size and is structured into:

  • Product Metadata: Information such as product ID, ASIN, title, group, sales rank, similar products, and categories.

  • Product Reviews: Details including review time, customer ID, rating, number of votes, and helpfulness votes.
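The "similar products" lists above are what make this a co-purchasing network. A minimal sketch of turning such lists into a graph follows; it assumes the lists have already been parsed into a dictionary keyed by product ID, and it treats co-purchasing as an undirected relation, which is a simplification. The product IDs and the function name build_copurchase_graph are hypothetical.

```python
from collections import defaultdict


def build_copurchase_graph(similar_by_product):
    """Build an undirected adjacency map from per-product 'similar' lists."""
    graph = defaultdict(set)
    for product, similar in similar_by_product.items():
        graph[product]  # touch the key so products without links still appear
        for other in similar:
            if other != product:  # ignore self-references
                graph[product].add(other)
                graph[other].add(product)
    return dict(graph)


# Hypothetical product IDs standing in for real ASINs.
sample = {
    "A": ["B", "C"],
    "B": ["A"],
    "C": [],
    "D": [],
}

graph = build_copurchase_graph(sample)
for product in sorted(graph):
    print(product, sorted(graph[product]))
```

Keeping the edges directed instead (only graph[product].add(other)) would preserve which product lists which, at the cost of asymmetric neighborhoods.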

Food.com Recipe & Review Data

Creators: Majumder, Bodhisattwa P.; Li, Shuyang; Ni, Jianmo; McAuley, Julian
Publication Date: 2019

This dataset consists of 180K+ recipes and 700K+ recipe reviews covering 18 years of user interactions and uploads on Food.com (formerly GeniusKitchen), an online recipe aggregator. This extensive collection allows for in-depth analysis of culinary trends, user preferences, and recipe characteristics over nearly two decades. The dataset is 0.85 GB in size and contains three sets of data from Food.com:

Interaction splits

  • interactions_test.csv
  • interactions_validation.csv
  • interactions_train.csv

Preprocessed data for result reproduction

In this format, the recipe text metadata is tokenized with the GPT subword tokenizer, with special tokens (start-of-step, etc.) added.

  • PP_recipes.csv
  • PP_users.csv

To convert these files into the pickle format required to run our code off-the-shelf, you may use pandas.read_csv and pandas.to_pickle.
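The conversion described above might look like the following sketch. pandas.read_csv and pandas.to_pickle are the calls named in the description; the helper name csv_to_pickle and the ".pkl" output names are assumptions, since the file names the authors' code expects are not stated here.

```python
from pathlib import Path

import pandas as pd


def csv_to_pickle(csv_path, pickle_path=None):
    """Read a CSV with pandas and write the resulting DataFrame as a pickle."""
    csv_path = Path(csv_path)
    if pickle_path is None:
        # Assumed naming: same location and stem, with a .pkl extension.
        pickle_path = csv_path.with_suffix(".pkl")
    frame = pd.read_csv(csv_path)
    pd.to_pickle(frame, pickle_path)
    return Path(pickle_path)


# Convert the preprocessed files if they are present in the working directory.
for name in ("PP_recipes.csv", "PP_users.csv"):
    if Path(name).exists():
        print("wrote", csv_to_pickle(name))
```

The result can be loaded back with pandas.read_pickle. Pickle files should only be loaded from sources you trust.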

 
