Facebook Social Connectedness Index

Publication Date: 2021
Creators: Meta

We use an anonymized snapshot of all active Facebook users and their friendship networks to measure the intensity of connectedness between locations. The Social Connectedness Index (SCI) is a measure of the social connectedness between different geographies. Specifically, it measures the relative probability that two individuals across two locations are friends with each other on Facebook.

Third Eye Data: TV News Archive chyrons

Publication Date: 2017
Creators: TV News Archive

The TV News Archive’s Third Eye project captures the chyrons–or narrative text–that appear on the lower third of TV news screens and turns them into downloadable data and a Twitter feed for research, journalism, online tools, and other projects. At project launch (September 2017) we are collecting chyrons from BBC News, CNN, Fox News, and MSNBC–more than four million collected over just two weeks.

Social Recommendation Data

Publication Date: 2017
Creators: Cai, Chenwei; He, Ruining; McAuley, Julian; Zhao, Tong; King, Irwin

These datasets include ratings as well as social (or trust) relationships between users. Data are from LibraryThing (a book review website) and Epinions (general consumer reviews).

Pinterest Fashion Compatibility

Publication Date: 2019
Creators: Kang, Wang-Cheng; Kim, Eric; Leskovec, Jure; Rosenberg, Charles; McAuley, Julian

This dataset contains images (scenes) containing fashion products, which are labeled with bounding boxes and links to the corresponding products.

The dataset is about 29 MB and includes:

  • Scenes: 47,739
  • Products: 38,111
  • Scene-Product Pairs: 93,274

Behance Community Art Data

Publication Date: 2016
Creators: He, Ruining; Fang, Chen; Wang, Zhaowen; McAuley, Julian

Likes and image data from the community art website Behance. This is a small, anonymized version of a larger proprietary dataset.

The dataset is about 3.5 GB and includes:

  • Users: 63,497
  • Items: 178,788
  • Appreciates (“likes”): 1,000,000

Multi-aspect Reviews

Publication Date: 2013
Creators: Julian McAuley; Jure Leskovec; Dan Jurafsky

These datasets include reviews with multiple rated dimensions. The most comprehensive of these are beer review datasets from Ratebeer and Beeradvocate, which include sensory aspects such as taste, look, feel, and smell. The dataset is about 1 GB.

Ratebeer:

  • Number of users: 40,213
  • Number of items: 110,419
  • Number of ratings/reviews: 2,855,232
  • Timespan: April, 2000 – November, 2011

Beeradvocate:

  • Number of users: 33,387
  • Number of items: 66,051
  • Number of ratings/reviews: 1,586,259
  • Timespan: January, 1998 – November, 2011


Facebook URL Shares

Publication Date: 2018
Creators: Solomon Messing; Bogdan State; Chaya Nayak; Gary King; Nate Persily

The data describes web page addresses (URLs) that have been shared on Facebook starting January 1, 2017 and ending about a month before the present day. URLs are included if they were shared by at least 20 unique accounts and shared publicly at least once. We estimate the full dataset will contain on the order of 2 million unique URLs shared in 300 million posts per week.

COVID-19 Twitter Chatter Dataset

Publication Date: 2024
Creators: Banda, Juan M.; Tekumalla, Ramya; Wang, Guanyu; Yu, Jingyuan; Liu, Tuo; Ding, Yuning; Artemova, Katya; Tutubalina, Elena; Chowell, Gerardo

Due to the relevance of the COVID-19 global pandemic, we are releasing our dataset of tweets acquired from the Twitter Stream related to COVID-19 chatter. Since our first release we have received additional data from our new collaborators, allowing this resource to grow to its current size. Dedicated data gathering started on March 11th, yielding over 4 million tweets a day. We have added additional data provided by our new collaborators from January 27th to March 27th, to provide extra longitudinal coverage. Version 10 added ~1.5 million tweets in the Russian language collected between January 1st and May 8th, graciously provided to us by Katya Artemova (NRU HSE) and Elena Tutubalina (KFU). From version 12 we have included daily hashtags, mentions, and emojis and their frequencies in the respective zip files. From version 14 we have included the tweet identifiers and their respective language for the clean version of the dataset. Since version 20 we have included language and place location for all tweets. The dataset is 14.2 GB.
