
Steam Video Game Database

Publication Date: 2023
Creators: Beliaev, Volodymyr

JSON file of all games available on Steam with prices and additional data from Steam Spy, GameFAQs, Metacritic, IGDB and HLTB.

The greatest hip-hop songs of all time

Publication Date: 2019
Creators: BBC

BBC Music polled over 1001 critics in 15 countries to find the best hip-hop song ever. This repo contains poll data, originally published by BBC Music, as well as code for transforming the data, adding cover artwork, and publishing charts via Datawrapper. The poll data was extracted from the article "The greatest hip-hop songs of all time – who voted".

Netflix Prize Data Set

Publication Date: 2009
Creators: Netflix

This is the official data set used in the Netflix Prize competition. The data consists of about 100 million movie ratings, and the goal is to predict missing entries in the movie-user rating matrix.

One million comic book panels

Publication Date: 2016
Creators: Iyyer, Mohit; Manjunatha, Varun; Guha, Anupam; Vyas, Yogarshi; Boyd-Graber, Jordan; Daumé III, Hal; Davis, Larry

Comic books make use of white space — or gutters — to propel the story forward, relying on readers’ intuitive ability to fill in the gaps between panels. To see whether computers could learn to make the same inferences, a group of computer scientists built a giant corpus of public-domain comics and tried training a series of neural networks on it. (Spoiler: Humans are much better at this.) The underlying dataset contains 1.2 million panels from nearly 200,000 scanned pages of nearly 4,000 books in the Digital Comic Museum, all published during the 1938–1954 “Golden Age” of American comics. It also contains 2.5 million chunks of text extracted from the comics’ speech balloons, thought bubbles, and narration boxes. [h/t Robin Sloan]

Twitch Livestreaming Interactions

Publication Date: 2021
Creators: Rappaz, Jérémie; McAuley, Julian; Aberer, Karl

This is a dataset of users consuming streaming content on Twitch. We retrieved all streamers, and all users connected to their respective chats, every 10 minutes over 43 days.

Steam Video Game and Bundle Data

Publication Date: 2018
Creators: Kang, Wang-Cheng; McAuley, Julian; Pathak, Apurva; Gupta, Kshitiz

These datasets contain reviews from the Steam video game platform, and information about which games were bundled together.


Publication Date: 2019
Creators: Zając, Zygmunt

This dataset was created to meet the need for a good, clean dataset of books. It contains book names, authors, ratings, and review counts.
