Digital Twins Dataset
The Twin-2K-500 dataset is a publicly released, large-scale survey dataset designed to support the construction of digital twins of individuals.
- It covers 2,058 U.S. participants who completed four waves of data collection.
- Across the first three waves, each person responded to about 500 questions spanning a rich battery of measures: demographic variables, personality and psychological scales, cognitive performance tasks, economic preferences, behavioral experiments (heuristics & biases), and a pricing survey.
- The fourth wave re-administered selected behavioral tasks from the earlier waves to establish a test-retest baseline against which prediction fidelity can be assessed.
- On average, participants spent about 2.42 hours in total responding across all waves.
- The survey was implemented via Qualtrics; participants who completed all waves were compensated.
- The dataset is organized into different “persona” representations (JSON/text) and wave splits for training and evaluating models.
In sum: Twin-2K-500 provides richly annotated, multi-wave behavioral and psychological data on over two thousand individuals, enabling researchers to train and evaluate digital-twin models that predict human responses across domains.
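
To make the role of the test-retest baseline concrete, here is a minimal sketch of one plausible way to use it: score a twin's predictions against a participant's wave-4 answers, then divide by how consistently that participant reproduced their own earlier answers. The function names, the integer answer encoding, and the toy data are all assumptions for illustration; this is not taken from the dataset's documentation and is not necessarily the metric the dataset authors use.

```python
from typing import Sequence


def agreement(a: Sequence[int], b: Sequence[int]) -> float:
    """Fraction of positions at which two equal-length answer lists match."""
    if len(a) != len(b) or len(a) == 0:
        raise ValueError("answer lists must be non-empty and equal in length")
    return sum(x == y for x, y in zip(a, b)) / len(a)


def normalized_fidelity(
    predicted: Sequence[int],
    earlier: Sequence[int],
    wave4: Sequence[int],
) -> float:
    """Twin accuracy on wave-4 answers, relative to the person's own consistency.

    A value near 1.0 means the twin matches the wave-4 answers about as
    often as the participant's earlier answers do.
    """
    retest = agreement(earlier, wave4)  # test-retest baseline
    if retest == 0:
        raise ValueError("test-retest agreement is zero; ratio is undefined")
    return agreement(predicted, wave4) / retest


# Hypothetical coded answers for one participant on five repeated tasks.
earlier_answers = [1, 0, 2, 1, 3]    # answers given in waves 1-3
wave4_answers = [1, 0, 2, 2, 3]      # same tasks, re-administered in wave 4
twin_predictions = [1, 1, 1, 2, 3]   # a model's predicted answers

retest_baseline = agreement(earlier_answers, wave4_answers)
twin_accuracy = agreement(twin_predictions, wave4_answers)
fidelity = normalized_fidelity(twin_predictions, earlier_answers, wave4_answers)
print(f"retest={retest_baseline:.2f} twin={twin_accuracy:.2f} ratio={fidelity:.2f}")
```

Dividing by the baseline reflects that people do not answer identically when asked twice, so a twin cannot reasonably be expected to exceed the participant's own consistency by much.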

