linguistics


Goodreads-books

Creators: Zając, Zygmunt
Publication Date: 2019

This dataset was created to meet the need for a clean, well-structured dataset of books. It contains key features such as book titles, authors, average ratings, ISBN identifiers, language codes, number of pages, ratings count, text reviews count, publication dates, and publishers. These features support a wide range of book-related analyses, such as trends in book popularity, author influence, and reader preferences. The dataset is 1.56 MB in size and was scraped via the Goodreads API. It contains over 10,000 observations, each representing a unique book entry. The structure is straightforward: a single CSV file with the following key columns:

  • bookID: A unique identification number for each book.
  • title: The official title of the book.
  • authors: Names of the authors, with multiple authors separated by a delimiter.
  • average_rating: The average user rating for the book.
  • isbn & isbn13: The 10-digit and 13-digit International Standard Book Numbers, respectively.
  • language_code: The primary language in which the book is published (e.g., ‘eng’ for English).
  • num_pages: The total number of pages in the book.
  • ratings_count: The total number of ratings the book has received from users.
  • text_reviews_count: The total number of text reviews written by users.
  • publication_date: The original publication date of the book.
  • publisher: The name of the publishing house.
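As a rough illustration of how these columns might be used, the sketch below parses a few rows and averages the rating per language code. The rows in `SAMPLE_CSV` are invented placeholders, not records from the dataset; the `/` author delimiter and the helper names `load_books` and `mean_rating_by_language` are assumptions made for this example.

```python
import csv
import io
from collections import defaultdict

# Placeholder rows for illustration only; they are NOT taken from the real file.
# Column names follow the list above.
SAMPLE_CSV = """bookID,title,authors,average_rating,isbn,isbn13,language_code,num_pages,ratings_count,text_reviews_count,publication_date,publisher
1,Example Book A,Author One/Author Two,4.50,0000000001,9780000000001,eng,300,1000,50,1/15/2005,Example Press
2,Example Book B,Author Three,3.50,0000000002,9780000000002,eng,150,200,10,6/1/2010,Example Press
3,Example Book C,Author Four,4.00,0000000003,9780000000003,fre,220,80,4,3/9/1999,Sample House
"""


def load_books(handle, author_delimiter="/"):
    """Read book rows and convert the numeric columns.

    The author delimiter is an assumption; check the actual file.
    """
    books = []
    for row in csv.DictReader(handle):
        books.append({
            "bookID": int(row["bookID"]),
            "title": row["title"],
            "authors": [a.strip() for a in row["authors"].split(author_delimiter)],
            "average_rating": float(row["average_rating"]),
            "language_code": row["language_code"],
            "num_pages": int(row["num_pages"]),
            "ratings_count": int(row["ratings_count"]),
        })
    return books


def mean_rating_by_language(books):
    """Return the unweighted mean of average_rating for each language_code."""
    totals = defaultdict(lambda: [0.0, 0])  # language -> [sum, count]
    for book in books:
        entry = totals[book["language_code"]]
        entry[0] += book["average_rating"]
        entry[1] += 1
    return {lang: total / count for lang, (total, count) in totals.items()}


books = load_books(io.StringIO(SAMPLE_CSV))
ratings = mean_rating_by_language(books)
print(ratings)
```

To run this against the real file, replace `io.StringIO(SAMPLE_CSV)` with an open file handle, for instance `open(path, newline="", encoding="utf-8")`, where `path` points to the downloaded CSV.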

Twitter US Airline Sentiment

Creators: Makone, Ashutosh
Publication Date: 2016

The Twitter US Airline Sentiment dataset is a collection of tweets for analyzing public sentiment toward major U.S. airlines. It consists of 14,640 tweets from 7,700 unique users, collected over a one-month period in February 2015, and so offers a snapshot of public sentiment during that specific timeframe. It is a useful resource for sentiment analysis and natural language processing research, particularly for studying customer satisfaction, airline service quality, and issues reported by travelers. Each tweet is labeled with one of three sentiment categories: positive, neutral, or negative. Negative tweets are further categorized by reason, such as late flight, customer service issue, canceled flight, and lost luggage, which gives deeper insight into common complaints. The dataset also identifies the airline mentioned in each tweet, covering six major U.S. carriers: United Airlines, US Airways, American Airlines, Southwest Airlines, Delta Air Lines, and Virgin America. Additional metadata is provided for each tweet, including tweet ID, tweet text, tweet coordinates (if available), user information, and location data, allowing for further contextual analysis. At 8.46 MB, the dataset is relatively small and easily manageable for sentiment analysis tasks and machine learning applications.
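The labeling scheme described above lends itself to simple tallies. The following sketch counts sentiment labels per airline and the reasons attached to negative tweets. The column names (`airline`, `airline_sentiment`, `negativereason`) are assumed for this example and should be checked against the actual file; the rows in `SAMPLE_CSV` are invented placeholders, and `summarize` is a hypothetical helper.

```python
import csv
import io
from collections import Counter, defaultdict

# Placeholder rows for illustration only; they are NOT real tweets.
# Column names are assumptions and may differ in the actual file.
SAMPLE_CSV = """tweet_id,airline,airline_sentiment,negativereason,text
1,United,negative,Late Flight,placeholder text 1
2,United,positive,,placeholder text 2
3,Delta,neutral,,placeholder text 3
4,United,negative,Lost Luggage,placeholder text 4
5,Delta,negative,Late Flight,placeholder text 5
"""


def summarize(handle):
    """Count sentiment labels per airline and reasons for negative tweets."""
    sentiment_by_airline = defaultdict(Counter)
    negative_reasons = Counter()
    for row in csv.DictReader(handle):
        sentiment_by_airline[row["airline"]][row["airline_sentiment"]] += 1
        # Only negative tweets carry a reason; skip empty values.
        if row["airline_sentiment"] == "negative" and row["negativereason"]:
            negative_reasons[row["negativereason"]] += 1
    return sentiment_by_airline, negative_reasons


sentiment_by_airline, negative_reasons = summarize(io.StringIO(SAMPLE_CSV))
for airline, counts in sentiment_by_airline.items():
    print(airline, dict(counts))
print(negative_reasons.most_common())
```

Keeping the per-airline counts in a `Counter` means a label that never occurs for an airline reads as zero rather than raising an error, which is convenient when comparing carriers.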
