Digital Twins Dataset

Creators: Olivier Toubia, George Z. Gui, Tianyi Peng, Daniel J. Merlau, Ang Li and Haozhe Chen
Publication Date: 2025

The Twin-2K-500 dataset is a publicly released, large-scale survey dataset designed to support the construction of digital twins of individuals.

  • It covers 2,058 U.S. participants who completed four waves of data collection.

  • Across the first three waves, each person responded to about 500 questions spanning a rich battery of measures: demographic variables, personality and psychological scales, cognitive performance tasks, economic preferences, behavioral experiments (heuristics & biases), and a pricing survey.

  • The fourth wave re-administered selected behavioral tasks (from earlier waves) to establish a test-retest baseline for assessing prediction fidelity.

  • On average, participants spent around 2.42 hours total responding across all waves.

  • The survey was implemented via Qualtrics; participants who completed all waves were compensated.

  • The dataset is organized into different “persona” representations (JSON/text) and wave splits for training and evaluating models.

In sum: Twin-2K-500 provides richly annotated, multi-wave behavioral and psychological data on over two thousand individuals, enabling researchers to train and evaluate digital-twin models that predict human responses across domains.
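One possible way to use the test-retest baseline is sketched below: compare how often a participant's wave-four answers match their own earlier answers with how often a model's predictions match the wave-four answers. The answer lists, variable names, and the ratio at the end are illustrative assumptions, not the dataset's actual fields or the creators' evaluation method.

```python
def agreement(a, b):
    """Share of positions where two equal-length answer lists match."""
    if len(a) != len(b) or not a:
        raise ValueError("answer lists must be non-empty and equal in length")
    return sum(x == y for x, y in zip(a, b)) / len(a)


# Hypothetical answers of one participant to four repeated tasks.
earlier_wave = ["A", "B", "A", "C"]   # original answers (waves 1-3)
wave_four = ["A", "B", "B", "C"]      # same tasks, re-administered
model_pred = ["A", "A", "A", "C"]     # a digital twin's predictions

retest = agreement(earlier_wave, wave_four)  # human self-consistency
model = agreement(model_pred, wave_four)     # model vs. wave-four answers
relative = model / retest                    # assumed normalization, for illustration

print(retest, model, relative)
```

Dividing by the test-retest agreement is only one way to read model accuracy against the ceiling set by human self-consistency; other comparisons are equally plausible.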

Advertising and Media Engagement in the United States

Creators: Matthew Gentzkow, Stanford University and NBER; Jesse Shapiro, Harvard University and NBER; Frank Yang, Stanford University; Ali Yurukoglu, Stanford University and NBER
Publication Date: 2023-11-15

This collection gathers datasets from advertising platforms, media tracking tools, and auxiliary geographic sources to analyze digital and traditional advertising dynamics in the U.S. It includes detailed Facebook ad campaign records, performance metrics across timeframes, and datasets capturing the effects of ad exposure on consumer behavior. Supplementary files offer geographic and temporal context via zip code–timezone mappings, enabling precise localization and segmentation. Together, these datasets support in-depth analysis of advertising effectiveness, audience targeting strategies, campaign timing, and media consumption patterns across diverse regions and platforms.

Instagram Data

Creators: Thales Bertaglia
Publication Date: 2023

Users can create the dataset themselves by running the main script, which collects and processes data from Instagram using the CrowdTangle API. A sample of the data is attached.

The Economist Historical Advertisements - Master Dataset

Creators: Kluge, Stefan; Gehrmann, Leonie; Stahl, Florian
Publication Date: 2023

This dataset contains metadata for 512,599 historical advertisements from all 8,840 issues of The Economist magazine published from 1843 to 2014. It is part of a series of datasets related to The Economist Historical Archive (https://www.gale.com/intl/c/the-economist-historical-archive). You will need this Master Dataset if you want to work with any of the related datasets. Each advertisement entry includes metadata fields such as publication date, issue number, page number, and advertisement dimensions. This structured information enables detailed analyses of trends and patterns in advertising practices over time. In total, the dataset has a size of 195.4 MB.

The Economist Historical Advertisements - Faces Dataset

Creators: Kluge, Stefan
Publication Date: 2023

This dataset contains 116,746 identified faces (bounding box location on the image, predicted age, and predicted gender) for all historical advertisements from all 8,840 issues of The Economist magazine published from 1843 to 2014. Faces were detected using the following library: https://pythonrepo.com/repo/timesler-facenet-pytorch-python-deep-learning. To work with the data, you will also need The Economist Historical Advertisements - Master Dataset. In total, the dataset has a size of 20.2 MB and is organized as follows:

  • Filename: A unique identifier corresponding to each advertisement where a face has been detected. This identifier links directly to the specific advertisement within The Economist archives.

  • Bounding Box Coordinates:

    • Bounding Box relative X1 and Y1: These values represent the top-left corner coordinates of the bounding box encapsulating the detected face, expressed as proportions relative to the image dimensions.
    • Bounding Box relative X2 and Y2: These values denote the bottom-right corner coordinates of the bounding box, also as relative proportions. To determine the absolute pixel coordinates, multiply these relative values by the image’s width and height, respectively.

  • Segmentation Confidence Score: A numerical value indicating the confidence level of the neural network algorithm that the identified bounding box indeed contains a face. Higher scores reflect greater confidence in accurate face detection.

  • Size Relative: A metric indicating the proportion of the advertisement occupied by the detected face. For example, a value of 1 signifies that the face covers the entire advertisement, while 0.5 indicates it covers half.

  • Predicted Age: An estimated age of the individual based on facial analysis performed by the detection algorithm.

  • Gender Probability: A probability score representing the likelihood of the detected face being female. A value of 0 indicates male, 1 indicates female, and intermediate values (e.g., 0.4) suggest a 40% likelihood of the individual being female.
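The conversion from relative to absolute bounding-box coordinates described above can be sketched as follows. The function names and the example image size are hypothetical, and treating Size Relative as the fraction of the image covered by the bounding box is an assumption about how that field is computed.

```python
def to_pixel_box(x1, y1, x2, y2, width, height):
    """Scale relative corner coordinates (0..1) to whole pixels.

    X values are multiplied by the image width and Y values by the
    image height, as the field descriptions state.
    """
    return (
        round(x1 * width),
        round(y1 * height),
        round(x2 * width),
        round(y2 * height),
    )


def relative_area(x1, y1, x2, y2):
    """Fraction of the image covered by the box (assumed meaning of Size Relative)."""
    return (x2 - x1) * (y2 - y1)


# Hypothetical face record on an 800 x 1200 pixel advertisement scan.
box = (0.25, 0.125, 0.75, 0.625)
pixel_box = to_pixel_box(*box, width=800, height=1200)
covered = relative_area(*box)

print(pixel_box)  # (200, 150, 600, 750)
print(covered)    # 0.25
```

The image width and height are not fields of the Faces Dataset listed here; they would have to come from the image itself or from the advertisement dimensions recorded in the Master Dataset.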

The Economist Historical Advertisements - Objects Dataset

Creators: Kluge, Stefan
Publication Date: 2023

This dataset contains 191,994 identified object locations and classes for all historical advertisements from all 8,840 issues of The Economist magazine published from 1843 to 2014. The objects were detected with the following classifier: https://tfhub.dev/google/openimages_v4/ssd/mobilenet_v2/1. To work with the data, you will also need The Economist Historical Advertisements - Master Dataset. The dataset has a size of 29.8 MB.

Creators: Kluge, Stefan
This dataset is a specialized collection of metadata for 92,592 historical advertisements from the banking industry, extracted from all 8,840 issues of The Economist magazine published from 1843 to 2014. It is part of a series of datasets related to The Economist Historical Archive (https://www.gale.com/intl/c/the-economist-historical-archive). In total, the dataset has a size of 136.0 MB. Each advertisement entry includes metadata fields such as publication date, issue number, page number, and advertisement dimensions. This structured information enables detailed analyses of trends and patterns in the banking industry’s advertising practices.

Super Bowl Ads

Creators: Superbowl Ads; FiveThirtyEight
Publication Date: 2021

This dataset contains a list of ads from the 10 brands that had the most advertisements in Super Bowls from 2000 to 2020, according to data from superbowl-ads.com, with matching videos found on YouTube. Each advertisement is evaluated on seven characteristics: humor, early product display, patriotism, celebrity presence, danger elements, inclusion of animals, and use of sexual content. This granular assessment supports in-depth analysis of advertising strategies. Links to the corresponding YouTube videos are included, giving immediate access to the commercials for further qualitative analysis. The dataset documents 233 advertisements and has a total size of 38.6 kB. It is organized as a CSV file with the following columns:

  • year: Year the advertisement aired.

  • brand: Brand of the advertiser, standardized to account for variations and sub-brands.

  • superbowl_ads_dot_com_url: Link to the advertisement’s entry on superbowl-ads.com.

  • youtube_url: Link to the corresponding YouTube video of the advertisement.

  • funny: Indicates if the ad was intended to be humorous (TRUE/FALSE).

  • show_product_quickly: Indicates if the product was shown within the first 10 seconds (TRUE/FALSE).

  • patriotic: Indicates if the ad had patriotic elements (TRUE/FALSE).

  • celebrity: Indicates if a celebrity appeared in the ad (TRUE/FALSE).

  • danger: Indicates if the ad involved elements of danger (TRUE/FALSE).

  • animals: Indicates if animals were featured in the ad (TRUE/FALSE).

  • use_sex: Indicates if sexual content was used to promote the product (TRUE/FALSE).
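A minimal sketch of reading a file with the columns listed above, using only Python's standard library. The three sample rows, the brand names, and the example.com links are invented placeholders, not records from the dataset; only the column names and the TRUE/FALSE encoding come from the description.

```python
import csv
import io
from collections import Counter

# The seven TRUE/FALSE characteristic columns described above.
FLAGS = ["funny", "show_product_quickly", "patriotic", "celebrity",
         "danger", "animals", "use_sex"]

# Placeholder rows standing in for the real CSV file.
SAMPLE = """year,brand,superbowl_ads_dot_com_url,youtube_url,funny,show_product_quickly,patriotic,celebrity,danger,animals,use_sex
2001,BrandA,https://example.com/a,https://example.com/va,TRUE,FALSE,FALSE,TRUE,FALSE,TRUE,FALSE
2002,BrandB,https://example.com/b,https://example.com/vb,FALSE,TRUE,FALSE,FALSE,TRUE,FALSE,FALSE
2003,BrandA,https://example.com/c,https://example.com/vc,TRUE,TRUE,FALSE,FALSE,FALSE,TRUE,FALSE
"""


def load_ads(handle):
    """Parse rows, converting year to int and TRUE/FALSE strings to bool."""
    rows = []
    for raw in csv.DictReader(handle):
        row = dict(raw)
        row["year"] = int(row["year"])
        for flag in FLAGS:
            row[flag] = row[flag].strip().upper() == "TRUE"
        rows.append(row)
    return rows


ads = load_ads(io.StringIO(SAMPLE))

# Count how many ads show each characteristic.
flag_counts = Counter(flag for ad in ads for flag in FLAGS if ad[flag])
print(flag_counts.most_common())
```

To run this against the real file, pass an open file handle (for example `open(path, newline="")`) to `load_ads` instead of the in-memory sample.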
