Digital Twins Dataset

Creators:
Olivier Toubia, George Z. Gui, Tianyi Peng, Daniel J. Merlau, Ang Li and Haozhe Chen
Publication Date:
2025
Data Category:
Dataset Description:

The Twin-2K-500 dataset is a publicly released, large-scale survey dataset designed to support the construction of digital twins of individuals.

  • It covers 2,058 U.S. participants who completed four waves of data collection.

  • Across the first three waves, each person responded to about 500 questions spanning a rich battery of measures: demographic variables, personality and psychological scales, cognitive performance tasks, economic preferences, behavioral experiments (heuristics & biases), and a pricing survey.

  • The fourth wave repeated selected behavioral tasks from the earlier waves to establish a test-retest baseline against which prediction fidelity can be assessed.

  • On average, participants spent about 2.42 hours in total across all waves.

  • The survey was implemented via Qualtrics; participants who completed all waves were compensated.

  • The dataset is organized into different “persona” representations (JSON and text) and wave splits for training and evaluating models.

In sum: Twin-2K-500 provides richly annotated, multi-wave behavioral and psychological data on over two thousand individuals, enabling researchers to train and evaluate digital-twin models that predict human responses across domains.
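
As an illustration of how a test-retest baseline can be used to assess prediction fidelity, the sketch below compares a digital twin's predicted answers with a participant's fourth-wave answers and relates that accuracy to how consistently the participant answered the same tasks twice. The answer lists, the variable names, and the choice to divide by the retest agreement are assumptions made for this example; they are not taken from the dataset files or from the authors' evaluation code.

```python
from typing import Sequence


def agreement(a: Sequence[str], b: Sequence[str]) -> float:
    """Return the fraction of positions at which two answer lists match."""
    if len(a) != len(b) or not a:
        raise ValueError("answer lists must be non-empty and the same length")
    return sum(x == y for x, y in zip(a, b)) / len(a)


# Hypothetical toy answers for one participant on four repeated tasks.
earlier_wave = ["A", "B", "A", "C"]      # answers given in waves 1-3
wave_4 = ["A", "B", "B", "C"]            # same tasks, re-administered
twin_prediction = ["A", "A", "B", "A"]   # model's predicted wave-4 answers

# How consistent the person is with themselves (test-retest agreement).
retest_agreement = agreement(earlier_wave, wave_4)

# How well the twin predicts the wave-4 answers.
twin_accuracy = agreement(twin_prediction, wave_4)

# Assumed normalization: twin accuracy relative to the person's own consistency.
normalized_accuracy = twin_accuracy / retest_agreement

print(f"retest={retest_agreement:.2f} twin={twin_accuracy:.2f} "
      f"normalized={normalized_accuracy:.2f}")
```

Relating twin accuracy to the retest agreement reflects the idea that a model can hardly be expected to predict a person better than that person reproduces their own earlier answers.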
