
The Upworthy Research Archive

Creators: The Upworthy Research Archive
Publication Date: 2019

The Upworthy Research Archive is an open dataset of thousands of A/B tests of headlines conducted by Upworthy from January 2013 to April 2015. This repository includes the full data from the archive: 32,488 records of headline experiments, approximately 149.7 MB in total, showing how different headline variations affected user engagement. The records form a time series of experiments, each detailing the performance metrics of the headline variations tested. This structure lets researchers compare the effectiveness of different headlines and trace user engagement patterns over time.
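As a minimal sketch of how such headline experiments might be analyzed, the snippet below groups variants by test and picks the variant with the highest click-through rate (clicks divided by impressions). It assumes each record carries a test identifier, a headline, an impression count, and a click count; the field names test_id, headline, impressions, and clicks, and the sample rows, are hypothetical rather than the archive's documented schema.

```python
from collections import defaultdict

# Hypothetical records: one row per headline variant within an A/B test.
# Field names are assumptions, not the archive's documented columns.
records = [
    {"test_id": "t1", "headline": "A", "impressions": 1000, "clicks": 12},
    {"test_id": "t1", "headline": "B", "impressions": 980, "clicks": 20},
    {"test_id": "t2", "headline": "C", "impressions": 500, "clicks": 5},
    {"test_id": "t2", "headline": "D", "impressions": 510, "clicks": 4},
]


def click_through_rate(row):
    """Clicks divided by impressions; 0.0 when a variant was never shown."""
    return row["clicks"] / row["impressions"] if row["impressions"] else 0.0


def best_headlines(rows):
    """Return {test_id: (headline, ctr)} for the highest-CTR variant of each test."""
    by_test = defaultdict(list)
    for row in rows:
        by_test[row["test_id"]].append(row)
    return {
        test_id: max(
            ((r["headline"], click_through_rate(r)) for r in variants),
            key=lambda pair: pair[1],
        )
        for test_id, variants in by_test.items()
    }


if __name__ == "__main__":
    for test_id, (headline, ctr) in best_headlines(records).items():
        print(f"{test_id}: {headline} ({ctr:.2%})")
```

Comparing raw click-through rates ignores sampling noise; a fuller analysis would attach a confidence interval or significance test to each within-test comparison.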

Steam Video Game Database

Creators: Beliaev, Volodymyr
Publication Date: 2023

This dataset aggregates information on all games available on the Steam platform, enriched with data from sources such as Steam Spy, GameFAQs, Metacritic, IGDB, and HowLongToBeat (HLTB). It is aimed at researchers, developers, and enthusiasts who want to analyze aspects of video games such as pricing, ratings, and gameplay duration. Each entry includes game identifiers, store URLs, promotional content, user scores, release dates, descriptions, pricing, supported platforms, developers, publishers, available languages, genres, tags, and achievements. The dataset reflects the state of the Steam catalog as of 2023 and has a size of 7.1 kB.
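The per-game fields listed above suggest a record type along the following lines. This is a minimal sketch: the class name SteamGame, its field names, the assumed 0-100 score scale, and the sample entries are all hypothetical, chosen only to illustrate filtering the catalog by tag and price.

```python
from dataclasses import dataclass, field
from typing import List, Optional


@dataclass
class SteamGame:
    # Hypothetical subset of the per-game fields described above.
    app_id: int
    name: str
    price_usd: float
    user_score: float  # assumed 0-100 scale
    tags: List[str] = field(default_factory=list)
    hltb_main_hours: Optional[float] = None  # HowLongToBeat estimate, if known


def cheap_games_with_tag(games, tag, max_price):
    """Games carrying `tag` priced at or below `max_price`, best user score first."""
    matches = [g for g in games if tag in g.tags and g.price_usd <= max_price]
    return sorted(matches, key=lambda g: g.user_score, reverse=True)


# Made-up entries for illustration only.
catalog = [
    SteamGame(1, "Alpha", 9.99, 88.0, ["Indie", "Puzzle"], 6.5),
    SteamGame(2, "Beta", 29.99, 91.0, ["RPG"], 40.0),
    SteamGame(3, "Gamma", 4.99, 79.0, ["Indie"]),
]

for game in cheap_games_with_tag(catalog, "Indie", 10.0):
    print(game.name, game.price_usd, game.user_score)
```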

Large-scale CelebFaces Attributes (CelebA) Dataset

Creators: Liu, Ziwei; Luo, Ping; Wang, Xiaogang; Tang, Xiaoou
Publication Date: 2015

CelebFaces Attributes Dataset (CelebA) is a large-scale face attributes dataset with more than 200K celebrity images, each with 40 attribute annotations. The images cover large pose variations and background clutter. CelebA offers large diversity, large quantity, and rich annotations: 10,177 identities, 202,599 face images, and, per image, 5 landmark locations and 40 binary attribute annotations. The attributes describe facial features and accessories such as eyeglasses, smiling, or bangs, while the five landmark points (eyes, nose, mouth corners) support tasks like facial alignment. The wide range of poses, expressions, and occlusions reflects real-world conditions and improves the robustness of models trained on this data. The dataset has a size of 25.3 kB and is organized in three main components:

  • Images:

    • In-the-Wild Images: Original images depicting celebrities in various environments and conditions.

    • Aligned and Cropped Images: Faces have been aligned and cropped to a consistent size, facilitating standardized analysis.

  • Annotations:

    • Landmark Locations: Coordinates for five key facial points (left eye, right eye, nose, left mouth corner, right mouth corner) per image.

    • Attribute Labels: Binary labels indicating the presence or absence of 40 distinct facial attributes for each image.

    • Identity Labels: Each image is associated with an identity label, linking it to one of the 10,177 unique individuals.

  • Evaluation Partitions:

    • The dataset is divided into training, validation, and test sets, enabling standardized evaluation of algorithms.
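A minimal sketch of reading the attribute labels and evaluation partitions described above. It assumes plain-text annotation files in which the attribute file starts with an image count and a line of attribute names, followed by one row per image with values of 1 (present) or -1 (absent), and the partition file maps each image to 0 (train), 1 (validation), or 2 (test). Verify this layout against the files you download; the three-attribute sample is a toy excerpt, not real data.

```python
# Assumed layout (check against the downloaded files):
#   attribute file: line 1 = image count, line 2 = attribute names,
#                   then "<image> v1 v2 ..." with each v in {1, -1}
#   partition file: "<image> <split>" with 0 = train, 1 = validation, 2 = test
SPLITS = {0: "train", 1: "validation", 2: "test"}


def parse_attributes(text):
    """Return (attribute_names, {image: {attribute: bool}})."""
    lines = text.strip().splitlines()
    names = lines[1].split()
    table = {}
    for line in lines[2:]:
        image, *values = line.split()
        table[image] = {n: int(v) == 1 for n, v in zip(names, values)}
    return names, table


def parse_partitions(text):
    """Return {image: split_name} from the evaluation-partition file."""
    result = {}
    for line in text.strip().splitlines():
        image, split = line.split()
        result[image] = SPLITS[int(split)]
    return result


# Toy excerpt using 3 of the 40 attributes.
attr_text = """3
Eyeglasses Smiling Bangs
000001.jpg -1  1 -1
000002.jpg  1 -1  1
000003.jpg -1  1  1
"""
partition_text = """000001.jpg 0
000002.jpg 1
000003.jpg 2
"""

names, attrs = parse_attributes(attr_text)
splits = parse_partitions(partition_text)
# Training images labeled as smiling.
smiling_train = [img for img, a in attrs.items() if a["Smiling"] and splits[img] == "train"]
print(smiling_train)
```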

IRS E-File Bucket

Creators: Internal Revenue Service
Publication Date: 2016

This bucket contains a mirror of the IRS e-file release as of December 31, 2016, consisting of Form 990 filings: the annual information returns submitted by tax-exempt organizations in the United States. The data helps to understand the financial and operational aspects of nonprofit organizations. Each Form 990 provides insight into an organization’s mission, programs, and governance structures, and includes detailed financial data such as revenues, expenses, assets, and liabilities, offering a clear view of the organization’s financial health. As mandated by law, these forms are publicly accessible, promoting transparency and allowing stakeholders to make informed decisions. In total, the dataset has a size of 5.3 kB and is divided into individual Form 990 filings, each corresponding to a specific tax-exempt organization. Each filing includes:

  • Organizational Details: Name, Employer Identification Number (EIN), address, and mission statement.

  • Financial Information: Detailed breakdowns of revenues (e.g., contributions, grants, program service revenue), expenses (e.g., salaries, grants, operational costs), assets, and liabilities.

  • Governance and Compliance: Information on board members, key employees, governance policies, and compliance with tax regulations.
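Assuming each filing is distributed as an XML document (an assumption here), the organizational and financial fields above could be pulled out with the standard library's xml.etree.ElementTree. The element names below are simplified placeholders, not the IRS e-file schema, which is namespaced and far more deeply nested; the organization, EIN, and amounts are made up.

```python
import xml.etree.ElementTree as ET

# Simplified, hypothetical filing. Real e-file XML follows the IRS schema,
# with namespaces and much deeper nesting.
SAMPLE_FILING = """
<Return>
  <Filer>
    <EIN>123456789</EIN>
    <Name>Example Community Fund</Name>
  </Filer>
  <Financials>
    <TotalRevenue>500000</TotalRevenue>
    <TotalExpenses>450000</TotalExpenses>
    <TotalAssets>1200000</TotalAssets>
    <TotalLiabilities>300000</TotalLiabilities>
  </Financials>
</Return>
"""


def summarize_filing(xml_text):
    """Extract identity fields and two derived financial indicators."""
    root = ET.fromstring(xml_text.strip())

    def amount(tag):
        # Missing amounts are treated as 0 in this sketch.
        return int(root.findtext(f"Financials/{tag}", default="0"))

    revenue, expenses = amount("TotalRevenue"), amount("TotalExpenses")
    return {
        "ein": root.findtext("Filer/EIN"),
        "name": root.findtext("Filer/Name"),
        "surplus": revenue - expenses,
        "net_assets": amount("TotalAssets") - amount("TotalLiabilities"),
    }


print(summarize_filing(SAMPLE_FILING))
```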

Computer generated building footprints for the United States

Creators: Microsoft Bing Maps Team
Publication Date: 2018

Microsoft Maps is releasing countrywide open building footprint datasets for the United States. This dataset contains 129,591,852 computer-generated building footprints derived by applying our computer vision algorithms to satellite imagery. Building footprints were extracted using deep neural networks for semantic segmentation, followed by polygonization to convert detected building pixels into vector shapes. This data is freely available for download and use. The dataset is organized by U.S. state and provided in GeoJSON format. Each GeoJSON file contains polygon geometries representing building footprints, accompanied by metadata such as the capture date of the underlying imagery. Notably, footprints in some regions, approximately 73,250,745 buildings in total, are based on imagery from 2019-2020.
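Because each footprint is a GeoJSON polygon (coordinates given as [longitude, latitude] pairs, per the GeoJSON specification), simple geometric measures can be computed directly from a file. The sketch below approximates a footprint's area in square metres by projecting the ring onto a local equirectangular plane and applying the shoelace formula. This small-polygon approximation is chosen for illustration only and is not the method used to produce the dataset; the sample feature is made up.

```python
import json
import math

EARTH_RADIUS_M = 6_371_008.8  # mean Earth radius in metres


def ring_area_m2(ring):
    """Approximate area (m^2) of a small [lon, lat] ring: project to a local
    equirectangular plane, then apply the shoelace formula."""
    lat0 = math.radians(sum(lat for _, lat in ring) / len(ring))
    pts = [
        (math.radians(lon) * EARTH_RADIUS_M * math.cos(lat0),
         math.radians(lat) * EARTH_RADIUS_M)
        for lon, lat in ring
    ]
    # Shoelace: sum of cross products of consecutive vertices (wrapping around).
    twice_area = sum(
        x1 * y2 - x2 * y1
        for (x1, y1), (x2, y2) in zip(pts, pts[1:] + pts[:1])
    )
    return abs(twice_area) / 2.0


def footprint_area_m2(feature):
    """Area of a GeoJSON Polygon feature: outer ring minus any holes."""
    geometry = feature["geometry"]
    if geometry["type"] != "Polygon":
        raise ValueError("expected a Polygon geometry")
    outer, *holes = geometry["coordinates"]
    return ring_area_m2(outer) - sum(ring_area_m2(h) for h in holes)


# Made-up footprint: a square roughly 11 m on a side at the equator.
SAMPLE = json.loads("""
{"type": "Feature",
 "properties": {},
 "geometry": {"type": "Polygon",
              "coordinates": [[[0.0, 0.0], [0.0001, 0.0], [0.0001, 0.0001],
                               [0.0, 0.0001], [0.0, 0.0]]]}}
""")

print(round(footprint_area_m2(SAMPLE), 1))
```

For large polygons or precise work, a proper equal-area projection would be the safer choice.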

3 Million Russian troll tweets

Creators: FiveThirtyEight; Warren, Patrick; Linvill, Darren
Publication Date: 2018

This directory contains data on nearly 3 million tweets sent from Twitter handles connected to the Internet Research Agency, a Russian “troll factory” and a defendant in an indictment filed by the Justice Department in February 2018 as part of special counsel Robert Mueller’s Russia investigation. The tweets in this database were sent between February 2012 and May 2018, with the vast majority posted from 2015 through 2017. Each entry includes the tweet’s content, author handle, language, and publication date, along with account metrics such as follower and following counts. The dataset also classifies each account by thematic focus (e.g., Right Troll, Left Troll, News Feed), as coded by researchers Darren Linvill and Patrick Warren. It has a total size of 507.2 kB.
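A minimal sketch of tallying tweets by account classification and by year with Python's standard csv module. The column names (author, content, language, publish_date, account_category), the month/day/year timestamp format, and the sample rows are assumptions for illustration; check them against the actual files before use.

```python
import csv
import io
from collections import Counter
from datetime import datetime

# Hypothetical excerpt; column names and timestamp format are assumptions.
SAMPLE_CSV = """author,content,language,publish_date,account_category
handle_a,example text 1,English,1/15/2015 10:00,Right Troll
handle_b,example text 2,English,3/2/2016 18:30,Left Troll
handle_a,example text 3,English,7/4/2016 09:12,Right Troll
handle_c,example text 4,English,9/9/2017 21:45,News Feed
"""


def read_rows(csv_text):
    """Parse CSV text into a list of dicts keyed by the header row."""
    return list(csv.DictReader(io.StringIO(csv_text)))


def tweets_per_category(rows):
    """Count tweets per account classification."""
    return Counter(row["account_category"] for row in rows)


def tweets_per_year(rows):
    """Count tweets per year, assuming 'month/day/year hour:minute' timestamps."""
    return Counter(
        datetime.strptime(row["publish_date"], "%m/%d/%Y %H:%M").year
        for row in rows
    )


rows = read_rows(SAMPLE_CSV)
print(tweets_per_category(rows))
print(tweets_per_year(rows))
```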

Creators: ProPublica

This free download is a database of more than 12,000 civilian complaints filed against New York City police officers. After New York state repealed the statute that kept police disciplinary records secret, known as 50-a, ProPublica filed a records request with New York City’s Civilian Complaint Review Board, which investigates complaints by the public about NYPD officers. The board provided us with records about closed cases for every police officer still on the force as of late June 2020 who had at least one substantiated allegation against them. The records span decades, from September 1985 to January 2020. Each entry includes specifics such as the nature of the allegation (e.g., use of force, abuse of authority), the outcome of the investigation, and any disciplinary actions taken. The dataset provides information on the officers involved, including their rank and assignment at the time of the complaint. Entries contain timestamps and locations of the alleged incidents, facilitating analyses of patterns over time and across different areas.

Structurally, the dataset contains the following information:

  • Complaint ID: A unique identifier for each complaint.

  • Date and Time: When the incident allegedly occurred.

  • Location: Where the incident took place.

  • Officer Details: Information about the officer(s) involved, such as badge number, rank, and assignment.

  • Allegation Details: Type of misconduct reported (e.g., excessive force, discourtesy).

  • Investigation Outcome: Findings of the investigation, including whether the allegation was substantiated, unsubstantiated, exonerated, or unfounded.

  • Disciplinary Action: Any penalties or corrective actions imposed following the investigation.
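As a minimal sketch of the kind of analysis these fields allow, the snippet below computes the share of allegations of each type that were substantiated. The field names (complaint_id, officer_id, allegation, outcome) and the rows are hypothetical placeholders, not the dataset's actual column headers.

```python
from collections import defaultdict

# Hypothetical allegation rows mirroring the fields listed above.
allegations = [
    {"complaint_id": 1, "officer_id": "A", "allegation": "Force", "outcome": "Substantiated"},
    {"complaint_id": 2, "officer_id": "A", "allegation": "Discourtesy", "outcome": "Unsubstantiated"},
    {"complaint_id": 3, "officer_id": "B", "allegation": "Force", "outcome": "Exonerated"},
    {"complaint_id": 4, "officer_id": "B", "allegation": "Force", "outcome": "Substantiated"},
    {"complaint_id": 5, "officer_id": "C", "allegation": "Abuse of Authority", "outcome": "Unfounded"},
]


def substantiation_rate_by_allegation(rows):
    """Fraction of rows per allegation type whose outcome is substantiated."""
    totals = defaultdict(int)
    substantiated = defaultdict(int)
    for row in rows:
        totals[row["allegation"]] += 1
        # startswith tolerates qualified labels (an assumption about the data);
        # "Unsubstantiated" does not match.
        if row["outcome"].startswith("Substantiated"):
            substantiated[row["allegation"]] += 1
    return {kind: substantiated[kind] / totals[kind] for kind in totals}


print(substantiation_rate_by_allegation(allegations))
```

Because the records cover only officers with at least one substantiated allegation, rates computed this way describe that group of officers, not all complaints against the NYPD.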

Facebook Social Connectedness Index

Creators: Meta
Publication Date: 2021

We use an anonymized snapshot of all active Facebook users and their friendship networks to measure the intensity of connectedness between locations. The Social Connectedness Index (SCI) measures the social connectedness between different geographies: specifically, the relative probability that two individuals across two locations are friends with each other on Facebook. Each entry represents a pair of locations and gives the strength of social connectedness between them, offering insight into social ties across regions. The dataset has a size of 3.9 kB and reflects a specific snapshot in time, with the latest available data from October 2021. It is organized into multiple sub-datasets, each detailing social connectedness at a different geographic level:

  1. Country-Country Pairs:

    • user_loc: ISO2 code of the first country.

    • fr_loc: ISO2 code of the second country.

    • scaled_sci: Scaled Social Connectedness Index between the two countries.

  2. US County-Country Pairs:

    • user_loc: 5-digit FIPS code of the U.S. county.

    • fr_loc: ISO2 code of the country.

    • scaled_sci: Scaled Social Connectedness Index between the U.S. county and the country.
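The pair layout above (user_loc, fr_loc, scaled_sci) can be illustrated with a toy computation. This is a minimal sketch assuming the index divides the number of friendship links between two locations by the product of their user counts (giving the share of all possible cross-location user pairs that are friends) and then rescales the result; the user and link counts and the scaling constant scale_max are hypothetical, and Meta's actual pipeline may differ.

```python
# Hypothetical inputs: active users per location and friendship links per pair.
users = {"US": 200, "CA": 30, "MX": 80}
links = {("US", "CA"): 1200, ("US", "MX"): 900, ("CA", "MX"): 60}


def raw_sci(a, b):
    """Links between a and b divided by the product of their user counts:
    the share of all possible cross-location user pairs that are friends."""
    key = (a, b) if (a, b) in links else (b, a)
    return links.get(key, 0) / (users[a] * users[b])


def scaled_sci_table(scale_max=1_000_000):
    """Rescale so the strongest pair equals scale_max (constant is an assumption)."""
    pairs = [(a, b) for a in users for b in users if a != b]
    raw = {pair: raw_sci(*pair) for pair in pairs}
    top = max(raw.values())
    return {pair: round(value / top * scale_max) for pair, value in raw.items()}


def strongest_partners(table, user_loc, n=2):
    """(fr_loc, scaled_sci) rows for one user_loc, strongest first."""
    rows = [(fr_loc, sci) for (u, fr_loc), sci in table.items() if u == user_loc]
    return sorted(rows, key=lambda row: row[1], reverse=True)[:n]


table = scaled_sci_table()
print(strongest_partners(table, "MX"))
```

Dividing by the product of user counts keeps large locations from dominating simply because they have more users.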
