Facebook

CrowdTangle Platform and API

Creators: Garmur, Matt; King, Gary; Mukerjee, Zagreb; Persily, Nate; Silverman, Brandon
Publication Date: 2019

This document describes the CrowdTangle API and user interface being provided to researchers
by Social Science One under its collaboration framework with Facebook. CrowdTangle is a
content discovery and analytics platform designed to give content creators the data and insights
they need to succeed. This dataset enables users to monitor public content interactions, track trends, and identify influential accounts. The CrowdTangle API surfaces stories along with data for measuring their social performance and identifying influencers. This codebook describes the data’s scope, structure, and fields.

CrowdTangle’s dataset covers public posts made by pages, groups, or verified profiles that have either surpassed 100,000 likes since 2014 or been tracked by any active API user. It includes all public posts from qualifying accounts since 2014.

Key features include:

  • Content Discovery: Access to real-time data on trending posts, facilitating the identification of viral content and emerging topics.

  • Performance Analytics: Metrics such as likes, shares, comments, and interaction rates, allowing for the assessment of content engagement.

  • Influencer Identification: Tools to pinpoint accounts with significant influence within specific niches or broader audiences.

Facebook URL Shares

Creators: Solomon Messing; Bogdan State; Chaya Nayak; Gary King; Nate Persily
Publication Date: 2018

The data describes web page addresses (URLs) that have been shared on Facebook starting January 1, 2017 and ending about a month before the present day. URLs are included if they were shared by at least 20 unique accounts and shared publicly at least once. We estimate the full data set will contain on the order of 2 million unique URLs shared in 300 million posts per week. The dataset shows how web content spreads on Facebook. Researchers can use it to explore patterns in user engagement, the virality of content, and the reach of various web pages within the Facebook ecosystem. Because URLs must be shared by a minimum number of unique accounts, the data represents content with a certain level of engagement and filters out less significant shares.

The dataset is structured to include the following key components:

  • URL Information: Each entry includes the web page address (URL) that was shared on Facebook.

  • Share Metrics: Data on the number of times each URL was shared, including the count of unique accounts that shared it and the total number of posts containing the URL.

  • Engagement Metrics: Information on user interactions with the shared URLs, such as likes, comments, and shares.
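The inclusion rule described above (at least 20 unique sharing accounts and at least one public share) can be illustrated with a minimal sketch. The record type and its field names (`unique_sharers`, `public_shares`) are hypothetical and are not taken from the codebook; the sample records are made up.

```python
from dataclasses import dataclass


@dataclass
class UrlShareRecord:
    # Hypothetical field names, chosen for illustration only.
    url: str
    unique_sharers: int   # distinct accounts that shared the URL
    public_shares: int    # number of public shares of the URL


MIN_UNIQUE_SHARERS = 20   # threshold stated in the dataset description
MIN_PUBLIC_SHARES = 1     # "shared publicly at least once"


def is_included(record: UrlShareRecord) -> bool:
    """Return True if the record meets both inclusion criteria."""
    return (record.unique_sharers >= MIN_UNIQUE_SHARERS
            and record.public_shares >= MIN_PUBLIC_SHARES)


# Made-up sample records.
records = [
    UrlShareRecord("https://example.com/a", unique_sharers=25, public_shares=3),
    UrlShareRecord("https://example.com/b", unique_sharers=19, public_shares=10),
    UrlShareRecord("https://example.com/c", unique_sharers=40, public_shares=0),
]

included = [r.url for r in records if is_included(r)]
```

Only the first record passes: the second falls below the 20-account threshold and the third was never shared publicly.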

Facebook Ad Library

Creators: Franklin Fowler, Erika; Franz, Mike; King, Gary; Martin, Greg; Mukerjee, Zagreb; Persily, Nate
Publication Date: 2019

The Ad Library API provides programmatic access to the Facebook Ad Library, a collection of all political advertisements run on Facebook and Instagram since May 2018 in the US, and since other dates in other countries. The codebook describes the scope, structure, and fields of these data. The Ad Library offers detailed information about each advertisement, including:

  • Ad Creative: Visual and textual content of the ad.

  • Impressions: Number of times the ad was displayed.

  • Spend: Estimated amount spent on the ad.

  • Demographics: Age, gender, and location breakdown of the audience reached.

Given that the Ad Library archives all ads related to political content, social issues, and elections since May 2018, the number of observations runs into the millions. The Ad Library’s data is structured to include various attributes for each advertisement:

  • Ad ID: Unique identifier for each ad.

  • Page ID and Name: Information about the page running the ad.

  • Ad Creative: Content and format of the ad.

  • Impressions and Spend: Metrics indicating the ad’s reach and budget.

  • Demographic Distribution: Breakdown of the audience by age, gender, and location.
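As a rough illustration of how records with the attributes listed above might be represented and summarized, here is a minimal sketch. The `AdRecord` class, its field names, and the sample values are assumptions made for this example; they are not the Ad Library API's actual schema, and the real API may report spend and impressions differently.

```python
from collections import defaultdict
from dataclasses import dataclass


@dataclass
class AdRecord:
    # Hypothetical fields mirroring the attributes listed above.
    ad_id: str
    page_id: str
    page_name: str
    impressions: int
    spend: float  # estimated spend, in an unspecified currency


def spend_by_page(ads):
    """Sum estimated spend per page name."""
    totals = defaultdict(float)
    for ad in ads:
        totals[ad.page_name] += ad.spend
    return dict(totals)


# Made-up sample data.
ads = [
    AdRecord("1", "p1", "Page A", impressions=1000, spend=50.0),
    AdRecord("2", "p1", "Page A", impressions=500, spend=25.0),
    AdRecord("3", "p2", "Page B", impressions=200, spend=10.0),
]

totals = spend_by_page(ads)
```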

Facebook Privacy-Protected Full URLs Data Set

Creators: Messing, Solomon; DeGregorio, Christina; Hillenbrand, Bennett; King, Gary; Mahanti, Saurav; Mukerjee, Zagreb; Nayak, Chaya; Persily, Nate; State, Bogdan; Wilkins, Arjun
Publication Date: 2020

This is a codebook for data on the demographics of people who viewed, shared, and otherwise interacted with web pages (URLs) shared on Facebook between January 1, 2017 and October 31, 2022. The data has about 68 million URLs, over 3.1 trillion rows, and over 71 trillion cell values. It results from a collaboration between Facebook and Social Science One (at IQSS at Harvard). The codebook was originally prepared for Social Science One grantees and describes the “full” URLs dataset, including its scope, structure, and fields. This is version 10 of the codebook and data (released 4/13/2023); the dataset was first described by Gary King and Nathaniel Persily at https://socialscience.one/blog/update-social-science-one. Each entry corresponds to a unique URL and includes aggregated user interaction metrics. These metrics are broken down by demographic dimensions such as age, gender, and country. For users in the United States, additional categorizations include political page affinity, which makes it possible to study how political leanings relate to content engagement.
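To make the idea of interaction metrics broken down by demographic dimensions concrete, the following sketch totals one metric along one dimension. The row layout, the keys (`country`, `age_bracket`, `gender`, `views`, `shares`), and all values are hypothetical; the codebook defines the real fields.

```python
from collections import Counter

# Made-up rows: one row per URL and demographic combination.
rows = [
    {"url": "https://example.com/a", "country": "US", "age_bracket": "18-24",
     "gender": "F", "views": 120, "shares": 7},
    {"url": "https://example.com/a", "country": "US", "age_bracket": "25-34",
     "gender": "M", "views": 80, "shares": 3},
    {"url": "https://example.com/a", "country": "DE", "age_bracket": "18-24",
     "gender": "F", "views": 40, "shares": 1},
]


def total_by(rows, dimension, metric):
    """Sum `metric` over all rows, grouped by the value of `dimension`."""
    totals = Counter()
    for row in rows:
        totals[row[dimension]] += row[metric]
    return dict(totals)


views_by_country = total_by(rows, "country", "views")
shares_by_age = total_by(rows, "age_bracket", "shares")
```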

Graph Embedding with Self Clustering: Facebook; GEMSEC

Creators: Rozemberczki, Benedek; Davies, Ryan; Sarkar, Rik; Sutton, Charles
Publication Date: 2019

We collected data about Facebook pages (November 2017). These datasets represent blue verified Facebook page networks across eight distinct categories: Government, News Sites, Athletes, Public Figures, TV Shows, Politicians, Artists, and Companies. Nodes represent individual Facebook pages, and edges denote mutual likes between these pages, reflecting the interconnectedness within and between different interest groups. We reindexed the nodes in order to achieve a certain level of anonymity. The csv files contain the edges; nodes are indexed from 0. For each dataset we list the number of nodes and edges. The dataset is divided into eight sub-datasets, one per page category, and their sizes vary: the largest subset (Artists) contains 50,515 nodes and 819,306 edges, and the smallest (TV Shows) comprises 3,892 nodes and 17,262 edges. In total, the dataset has a size of 0.005 GB and encompasses 134,833 nodes and 1,380,293 edges.
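A minimal sketch of reading one such edge list and counting its nodes and edges follows. It assumes each csv file has a header row followed by one `source,target` pair of zero-indexed integers per line; the header names and the inline sample are invented for illustration and should be checked against the actual files.

```python
import csv
import io
from collections import defaultdict

# Made-up sample standing in for one category's edge file.
SAMPLE_CSV = """node_1,node_2
0,1
0,2
1,2
2,3
"""


def load_edges(handle):
    """Read (source, target) integer pairs, skipping the assumed header row."""
    reader = csv.reader(handle)
    next(reader)  # assumption: the first row is a header
    return [(int(row[0]), int(row[1])) for row in reader if row]


def build_adjacency(edges):
    """Build an undirected adjacency map, since mutual likes are symmetric."""
    adjacency = defaultdict(set)
    for a, b in edges:
        adjacency[a].add(b)
        adjacency[b].add(a)
    return adjacency


edges = load_edges(io.StringIO(SAMPLE_CSV))
adjacency = build_adjacency(edges)
num_nodes = len(adjacency)
num_edges = len(edges)
```

With a real file, `io.StringIO(SAMPLE_CSV)` would be replaced by an open file handle, and the resulting counts could be compared with the node and edge figures reported for that category.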

Social circles: Facebook

Creators: McAuley, Julian; Leskovec, Jure
Publication Date: 2012

This dataset consists of ‘circles’ (or ‘friends lists’) from Facebook. The data was collected from survey participants using a Facebook app and has been anonymized by replacing the Facebook-internal ids for each user with a new value. The dataset includes node features (profiles), circles, and ego networks. It contains 4,039 nodes and 88,234 edges and is approximately 0.01 GB in size.

The dataset is structured into:

  • Node features (profiles): Anonymized user profile information, where specific attributes (e.g., political affiliation) are replaced with generic labels (e.g., ‘anonymized feature 1’).

  • Circles: Lists of friends grouped by users, representing their social circles.

  • Ego networks: Subgraphs centered around individual users (egos), including their direct friends and the connections among those friends.
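The ego network definition above (an ego, its direct friends, and the connections among them) can be sketched as a small function over an adjacency map. The graph below is made up, and the function is only an illustration of the concept, not the procedure used to build the dataset.

```python
def ego_network(adjacency, ego):
    """Return the members and edges of the subgraph induced by `ego`
    and its direct friends."""
    friends = adjacency.get(ego, set())
    members = set(friends) | {ego}
    edges = set()
    for u in members:
        for v in adjacency.get(u, set()):
            if v in members:
                # Store each undirected edge once, smaller id first.
                edges.add((min(u, v), max(u, v)))
    return members, edges


# Made-up undirected friendship graph.
adjacency = {
    0: {1, 2},
    1: {0, 2},
    2: {0, 1, 3},
    3: {2},
}

members, edges = ego_network(adjacency, ego=0)
```

Node 3 is excluded here because it is a friend of a friend rather than a direct friend of the ego.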
