
Steam Video Game Database

Publication Date: 2023
Creators: Beliaev, Volodymyr

JSON file of all games available on Steam with prices and additional data from Steam Spy, GameFAQs, Metacritic, IGDB and HLTB.

The greatest hip-hop songs of all time

Publication Date: 2019
Creators: BBC

BBC Music polled over 1001 critics in 15 countries to find the best hip-hop song ever. This repo contains poll data, originally published by BBC Music, as well as code for transforming the data, adding cover artwork, and publishing charts via Datawrapper. The poll data was extracted from this article on bbc.com: The greatest hip-hop songs of all time – who voted

Netflix Prize Data Set

Publication Date: 2009
Creators: Netflix

This dataset was constructed to support participants in the Netflix Prize. See [Web Link] for details about the prize.

There are over 480,000 customers in the dataset, each identified by a unique integer id.

The title and release year for each movie are also provided. There are over 17,000 movies in the dataset, each identified by a unique integer id.

The dataset contains over 100 million ratings. The ratings were collected between October 1998 and December 2005 and reflect the distribution of all ratings received during this period. Each rating has a customer id, a movie id, the date of the rating, and the value of the rating.

As part of the original Netflix Prize a set of ratings was identified whose rating values were not provided in the original dataset. The object of the Prize was to accurately predict the ratings from this ‘qualifying’ set. These missing ratings are now available in the grand_prize.tar.gz dataset file.

One million comic books panel

Publication Date: 2016
Creators: Iyyer, Mohit; Manjunatha, Varun; Guha, Anupam; Vyas, Yogarshi; Boyd-Graber, Jordan; Daumé III, Hal; Davis, Larry

Visual narrative is often a combination of explicit information and judicious omissions, relying on the viewer to supply missing details. In comics, most movements in time and space are hidden in the “gutters” between panels. To follow the story, readers logically connect panels together by inferring unseen actions through a process called “closure”. While computers can now describe what is explicitly depicted in natural images, in this paper we examine whether they can understand the closure-driven narratives conveyed by stylized artwork and dialogue in comic book panels. We construct a dataset, COMICS, that consists of over 1.2 million panels (120 GB) paired with automatic textbox transcriptions. An in-depth analysis of COMICS demonstrates that neither text nor image alone can tell a comic book story, so a computer must understand both modalities to keep up with the plot. We introduce three cloze-style tasks that ask models to predict narrative and character-centric aspects of a panel given n preceding panels as context. Various deep neural architectures underperform human baselines on these tasks, suggesting that COMICS contains fundamental challenges for both vision and language.

Twitch Livestreaming Interactions

Publication Date: 2021
Creators: Rappaz, Jérémie; McAuley, Julian; Aberer, Karl

This is a dataset of users consuming streaming content on Twitch. We retrieved all streamers, and all users connected in their respective chats, every 10 minutes over 43 days.

The dataset Twitch 100k includes:

  • Users: 100k
  • Streamers (items): 162.6k
  • Interactions: 3M
  • Time steps: 6148

Steam Video Game and Bundle Data

Publication Date: 2018
Creators: Kang, Wang-Cheng; McAuley, Julian; Pathak, Apurva; Gupta, Kshitiz

These datasets contain reviews from the Steam video game platform, and information about which games were bundled together. The datasets include:

  • Reviews: 7,793,069
  • Users: 2,567,538
  • Items: 15,474
  • Bundles: 615

Goodreads-books

Publication Date: 2019
Creators: Zając, Zygmunt

This dataset was created to meet the need for a good, clean dataset of books. It contains book names, authors, ratings and review counts. The dataset is 1.56 MB in size and was scraped via the Goodreads API.
