Facebook Social Connectedness Index

Creators: Meta
Publication Date: 2021

We use an anonymized snapshot of all active Facebook users and their friendship networks to measure the intensity of connectedness between locations. The Social Connectedness Index (SCI) measures the relative probability that two individuals across two locations are friends with each other on Facebook, offering insights into social ties across regions. Each entry represents a pair of locations and the strength of social connectedness between them. The dataset has a size of 3,9 kB and reflects a specific snapshot in time, with the latest available data from October 2021. It is organized into multiple sub-datasets, each detailing social connectedness at a different geographic level:

  1. Country-Country Pairs:

    • user_loc: ISO2 code of the first country.

    • fr_loc: ISO2 code of the second country.

    • scaled_sci: Scaled Social Connectedness Index between the two countries.

  2. US County-Country Pairs:

    • user_loc: 5-digit FIPS code of the U.S. county.

    • fr_loc: ISO2 code of the country.

    • scaled_sci: Scaled Social Connectedness Index between the U.S. county and the country.
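As a rough illustration of how a pair file with these three columns might be used, the sketch below parses a few rows and lists the locations most strongly connected to a given one. The sample values, the tab-separated layout, and the function names load_pairs and strongest_ties are assumptions made for this example, not part of the published files.

```python
import csv
import io

# Made-up sample in the three-column layout described above.
# The tab delimiter is an assumption about the file format.
SAMPLE = (
    "user_loc\tfr_loc\tscaled_sci\n"
    "DE\tAT\t250000\n"
    "DE\tFR\t40000\n"
    "DE\tUS\t9000\n"
    "FR\tDE\t40000\n"
)


def load_pairs(text):
    """Parse rows into (user_loc, fr_loc, scaled_sci) tuples."""
    reader = csv.DictReader(io.StringIO(text), delimiter="\t")
    return [(r["user_loc"], r["fr_loc"], int(r["scaled_sci"])) for r in reader]


def strongest_ties(pairs, loc, k=2):
    """Return the k locations with the highest scaled_sci for `loc`."""
    ties = [(fr, sci) for user, fr, sci in pairs if user == loc and fr != loc]
    return sorted(ties, key=lambda t: t[1], reverse=True)[:k]


pairs = load_pairs(SAMPLE)
top = strongest_ties(pairs, "DE")
print(top)
```

Because scaled_sci is a relative measure, it is more meaningful to compare values within one file than to read a single value in isolation.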

Creators: ProPublica

This free download is a database of more than 12,000 civilian complaints filed against New York City police officers. After New York state repealed the statute that kept police disciplinary records secret, known as 50-a, ProPublica filed a records request with New York City’s Civilian Complaint Review Board, which investigates complaints by the public about NYPD officers. The board provided us with records about closed cases for every police officer still on the force as of late June 2020 who had at least one substantiated allegation against them. The records span decades, from September 1985 to January 2020. Each entry includes specifics such as the nature of the allegation (e.g., use of force, abuse of authority), the outcome of the investigation, and any disciplinary actions taken. The dataset provides information on the officers involved, including their rank and assignment at the time of the complaint. Entries contain timestamps and locations of the alleged incidents, facilitating analyses of patterns over time and across different areas.

Structurally, the dataset contains the following information:

  • Complaint ID: A unique identifier for each complaint.

  • Date and Time: When the incident allegedly occurred.

  • Location: Where the incident took place.

  • Officer Details: Information about the officer(s) involved, such as badge number, rank, and assignment.

  • Allegation Details: Type of misconduct reported (e.g., excessive force, discourtesy).

  • Investigation Outcome: Findings of the investigation, including whether the allegation was substantiated, unsubstantiated, exonerated, or unfounded.

  • Disciplinary Action: Any penalties or corrective actions imposed following the investigation.
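One analysis this structure supports is the share of allegations of each type that were substantiated. The following is a minimal sketch on made-up records; the field names (complaint_id, allegation, outcome) and the function name substantiation_rate are hypothetical and may not match the headers in the actual download.

```python
from collections import Counter

# Hypothetical records; field names and values are illustrative only.
complaints = [
    {"complaint_id": 1, "allegation": "Force", "outcome": "Substantiated"},
    {"complaint_id": 2, "allegation": "Force", "outcome": "Exonerated"},
    {"complaint_id": 3, "allegation": "Discourtesy", "outcome": "Substantiated"},
    {"complaint_id": 4, "allegation": "Force", "outcome": "Unsubstantiated"},
]


def substantiation_rate(records):
    """Share of allegations per type whose outcome is 'Substantiated'."""
    totals = Counter(r["allegation"] for r in records)
    hits = Counter(
        r["allegation"] for r in records if r["outcome"] == "Substantiated"
    )
    return {kind: hits[kind] / n for kind, n in totals.items()}


rates = substantiation_rate(complaints)
print(rates)
```

Note that the records only cover officers with at least one substantiated allegation, so rates computed this way describe that subset and not all complaints filed.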

The greatest hip-hop songs of all time

Creators: BBC
Publication Date: 2019

BBC Music polled over 100 critics in 15 countries to find the best hip-hop song ever. This repo contains poll data, originally published by BBC Music, as well as code for transforming the data, adding cover artwork, and publishing charts via Datawrapper. The poll data was extracted from this article on bbc.com: The greatest hip-hop songs of all time – who voted. It covers over 300 hip-hop songs, each representing an individual observation. The songs span from the late 1970s to 2019, covering the evolution of hip-hop over approximately four decades. In total, the dataset has a size of 60,8 kB. Structurally, it is built upon the following key variables:

  • Rank: The position of the song in the overall ranking.

  • Title: The name of the song.

  • Artist: The performing artist(s) of the song.

  • Year: The release year of the song.

  • Total Points: The cumulative points the song received based on the poll’s scoring system.

  • Number of Votes: The count of critics who included the song in their top five lists.
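To make the relationship between ballots, Total Points, and Number of Votes concrete, here is a small sketch that tallies ranked top-five lists. The weights (10, 8, 6, 4, 2) are an assumption chosen for illustration and the poll's actual scoring system may differ; the song titles are placeholders, and tally is a hypothetical helper name.

```python
from collections import defaultdict

# Assumed points for ballot positions 1 to 5; the real scoring may differ.
WEIGHTS = (10, 8, 6, 4, 2)


def tally(ballots):
    """Return (title, total_points, number_of_votes) rows, best first."""
    points = defaultdict(int)
    votes = defaultdict(int)
    for ballot in ballots:
        for position, title in enumerate(ballot[:5]):
            points[title] += WEIGHTS[position]
            votes[title] += 1
    # Sort by points, then by votes, then alphabetically for stable ties.
    return sorted(
        ((t, points[t], votes[t]) for t in points),
        key=lambda row: (-row[1], -row[2], row[0]),
    )


# Placeholder ballots, not real poll entries.
ballots = [
    ["Song A", "Song B", "Song C"],
    ["Song B", "Song A"],
    ["Song B", "Song C", "Song A"],
]
ranking = tally(ballots)
print(ranking)
```

The Rank column then corresponds to the position of each row in the sorted result.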

Snap Political Ads Library

Creators: Snapchat
Publication Date: 2024

The Political and Advocacy Ads Library is an important step in our efforts to increase the level of transparency around political and issue advertising on Snapchat. The dataset includes advertisements from 2018 through 2025 with a total size of 538,8 kB. It offers detailed insights into political advertising on Snapchat, including:

  • Ad Content: Access to the creative content of each advertisement through unique URLs.

  • Financial Information: Details on the amount spent on each ad campaign, specified in the local currency.

  • Impressions: Data on the number of times each ad was viewed by Snapchat users.

  • Targeting Criteria: Information on the demographic and geographic targeting parameters used in each campaign.

These features enable researchers, policymakers, and the public to analyze the reach, expenditure, and strategies of political advertisers on Snapchat.
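A simple way to relate expenditure to reach is spend per 1,000 impressions. The sketch below computes it on made-up rows; the field names (advertiser, currency, spend, impressions) and the helper cost_per_thousand are hypothetical. Since spend is reported in local currency, the example groups by currency and does not attempt any conversion.

```python
from collections import defaultdict

# Hypothetical rows; field names and values are illustrative only.
ads = [
    {"advertiser": "Org A", "currency": "USD", "spend": 500.0, "impressions": 200_000},
    {"advertiser": "Org A", "currency": "USD", "spend": 100.0, "impressions": 100_000},
    {"advertiser": "Org B", "currency": "EUR", "spend": 300.0, "impressions": 60_000},
]


def cost_per_thousand(rows):
    """Spend per 1,000 impressions, keyed by (advertiser, currency)."""
    spend = defaultdict(float)
    views = defaultdict(int)
    for row in rows:
        key = (row["advertiser"], row["currency"])
        spend[key] += row["spend"]
        views[key] += row["impressions"]
    # Skip groups without impressions to avoid dividing by zero.
    return {key: 1000 * spend[key] / views[key] for key in spend if views[key] > 0}


cpm = cost_per_thousand(ads)
print(cpm)
```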

IMINTREG data

Creators: Zuber, Christina Isabel
Publication Date: 2011

IMINTREG offers data on immigrant integration policies of Italian regions, Spanish autonomous communities and German Länder. You can download the original regional integration laws, as well as measures of the inclusiveness vs. exclusiveness of regional integration policies based on my coding of the laws. The dataset has a size of 84,2 kB and covers the following aspects:

  • Regional Integration Laws: The dataset includes the original texts of regional integration laws from the specified regions, allowing for direct analysis of legislative content.

  • Inclusiveness Measures: It provides coded measures assessing the inclusiveness of these policies, facilitating comparative studies across different regions.

  • Comparative Scope: By covering multiple countries with varying approaches to integration, the dataset offers a unique opportunity to study the effects of regional policies within different national contexts.

CVE List

Creators: CVE
Publication Date: 2024

The mission of the CVE® Program is to identify, define, and catalog publicly disclosed cybersecurity vulnerabilities. This dataset covers vulnerabilities identified from 1999 through June 25, 2024, providing a historical perspective on the evolution of cybersecurity threats over a span of 25 years. The program's primary purpose is to standardize the identification of vulnerabilities across platforms and security tools, facilitating consistent and efficient communication within the cybersecurity community. As of June 25, 2024, the CVE List comprises 269,759 records with a size of 36,3 kB.

Each CVE entry includes several key components:

  • CVE Identifier (CVE ID): A unique alphanumeric code assigned to each vulnerability, following the format “CVE-YYYY-NNNN,” where “YYYY” denotes the year of identification, and “NNNN” is a sequential number of four or more digits.

  • Description: A brief summary outlining the nature of the vulnerability, including affected software or hardware, potential impacts, and any known exploits.

  • References: Links to external resources such as security advisories, vendor bulletins, or detailed analyses that provide additional context or mitigation information.
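The identifier format lends itself to a simple validity check. The following sketch uses a regular expression for the “CVE-YYYY-NNNN” pattern, allowing the sequence part to run longer than four digits; the function name parse_cve_id is a hypothetical helper, not part of any official CVE tooling.

```python
import re

# "CVE-", a four-digit year, then a sequence number of four or more digits.
CVE_ID = re.compile(r"CVE-(\d{4})-(\d{4,})")


def parse_cve_id(text):
    """Return (year, sequence) for a well-formed CVE ID, otherwise None."""
    match = CVE_ID.fullmatch(text)
    if match is None:
        return None
    return int(match.group(1)), int(match.group(2))


print(parse_cve_id("CVE-2014-0160"))
print(parse_cve_id("CVE-2021-44228"))
print(parse_cve_id("CVE-21-1"))
```

Using fullmatch rather than search rejects strings that merely contain an ID, which is usually what a validation step needs.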

Soccer Power Index (SPI) Ratings

Creators: FiveThirtyEight
Publication Date: 2022

This file contains links to the data behind our Club Soccer Predictions and Global Club Soccer Rankings. The data describe soccer team performances worldwide, providing insights into team strengths and match outcomes.

The SPI database covers data up to the year 2022 with a total size of 52,6 kB and includes the following metrics:

  • SPI Rating: An overall measure of a team’s strength, combining offensive and defensive capabilities.

  • Offensive and Defensive Ratings: Separate evaluations of a team’s attacking and defensive proficiencies.

  • Match Probabilities: Predicted probabilities for home win, away win, and draw outcomes, offering insights into expected match results.

  • Projected Scores: Anticipated goal counts for both home and away teams, aiding in match analysis and forecasting.
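To show how projected scores and match probabilities can relate, the sketch below derives home win, draw, and away win probabilities from two expected goal counts using independent Poisson distributions. This is a common textbook approach used here purely for illustration; it is not claimed to be how FiveThirtyEight computes its figures, and the input values are made up.

```python
import math


def poisson_pmf(k, mean):
    """Probability of exactly k goals given an expected goal count."""
    return math.exp(-mean) * mean ** k / math.factorial(k)


def outcome_probabilities(home_goals, away_goals, max_goals=10):
    """Home win, draw, away win probabilities under independent Poisson scores."""
    home = draw = away = 0.0
    for h in range(max_goals + 1):
        for a in range(max_goals + 1):
            p = poisson_pmf(h, home_goals) * poisson_pmf(a, away_goals)
            if h > a:
                home += p
            elif h == a:
                draw += p
            else:
                away += p
    total = home + draw + away  # renormalise the truncated score grid
    return home / total, draw / total, away / total


# Made-up projected scores for a single match.
p_home, p_draw, p_away = outcome_probabilities(1.8, 1.1)
print(round(p_home, 3), round(p_draw, 3), round(p_away, 3))
```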

 

Netflix Prize Data Set

Creators: Netflix
Publication Date: 2009

This dataset was constructed to support participants in the Netflix Prize. See [Web Link] for details about the prize.

There are over 480,000 customers in the dataset, each identified by a unique integer id.

The title and release year for each movie are also provided. There are over 17,000 movies in the dataset, each identified by a unique integer id.

The dataset contains over 100 million ratings and has a size of 7,7 kB. The ratings were collected between October 1998 and December 2005 and reflect the distribution of all ratings received during this period. Each rating has a customer id, a movie id, the date of the rating, and the value of the rating.

As part of the original Netflix Prize a set of ratings was identified whose rating values were not provided in the original dataset. The object of the Prize was to accurately predict the ratings from this ‘qualifying’ set. These missing ratings are now available in the grand_prize.tar.gz dataset file.
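Prediction accuracy in the Netflix Prize was judged by root mean squared error (RMSE). The sketch below scores a naive baseline that predicts the global mean rating for every record; the sample records follow the four fields described above but are made up, and rmse is a hypothetical helper name.

```python
import math


def rmse(predicted, actual):
    """Root mean squared error between two equally long rating lists."""
    if len(predicted) != len(actual) or not actual:
        raise ValueError("inputs must be non-empty and of equal length")
    squared = sum((p - a) ** 2 for p, a in zip(predicted, actual))
    return math.sqrt(squared / len(actual))


# Made-up (customer_id, movie_id, date, rating) records.
ratings = [
    (6, 1, "2005-09-06", 3),
    (7, 1, "2005-05-13", 5),
    (6, 2, "2005-10-19", 4),
]
actual = [r[3] for r in ratings]

# Naive baseline: predict the global mean rating for every record.
baseline = [sum(actual) / len(actual)] * len(actual)
score = rmse(baseline, actual)
print(round(score, 4))
```

Any candidate model can be compared against this baseline by passing its predictions to the same function.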
