Food Consumption, Prices, and Nutrient Characteristics

Creators: Jean‐Jacques Forneron
Publication Date: 2023-05-01

This collection includes structured data on food item consumption quantities, associated prices, and nutritional content. Variables cover a wide range of food categories and nutrients, enabling analysis of consumption patterns, relative pricing, and dietary composition. Additional metadata clarifies food group classifications and measurement units to support consistent interpretation. The data structure allows for temporal and comparative analysis across items, suitable for applications in food economics, nutrition studies, and price analysis.
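A minimal sketch of the kind of computation this structure supports: total nutrient intake (quantity times nutrient content per unit, summed over items) and the price paid per unit of a nutrient. The `FoodItem` record and its field names are hypothetical, not the dataset's actual columns, and the sketch assumes quantities, prices, and nutrient contents have already been harmonized to the same measurement unit using the accompanying metadata.

```python
from dataclasses import dataclass


@dataclass
class FoodItem:
    # Hypothetical record layout; the dataset's real column names may differ.
    name: str
    quantity: float               # consumed quantity, in the item's measurement unit
    price: float                  # price per measurement unit
    nutrients: dict[str, float]   # nutrient content per measurement unit


def total_nutrients(items: list[FoodItem]) -> dict[str, float]:
    """Sum nutrient intake across items: quantity x nutrient content per unit."""
    totals: dict[str, float] = {}
    for item in items:
        for nutrient, per_unit in item.nutrients.items():
            totals[nutrient] = totals.get(nutrient, 0.0) + item.quantity * per_unit
    return totals


def cost_per_nutrient(item: FoodItem, nutrient: str) -> float:
    """Price paid per unit of one nutrient (e.g. cost per kcal); inf if absent."""
    per_unit = item.nutrients.get(nutrient, 0.0)
    if per_unit == 0:
        return float("inf")
    return item.price / per_unit
```

Ranking items by `cost_per_nutrient` is one simple way to compare relative pricing across food groups.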

Macroeconomic Time Series for Policy Shock Analysis

Creators: Alisdair McKay, Christian K. Wolf
Publication Date: 2023-09-01

This dataset provides macroeconomic time-series variables prepared for structural vector autoregression (VAR) analysis. It includes indicators such as real output, inflation, interest rates, and monetary policy shocks. The variables are formatted to support empirical evaluation of policy scenarios through VAR-based counterfactual exercises. Metadata from the accompanying documentation clarifies variable transformations and modeling roles within the policy identification framework.
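The basic object behind VAR-based policy analysis is the impulse response to an identified shock. The sketch below computes impulse responses for a generic VAR(1) in structural form, y_t = A y_{t-1} + B eps_t, where the response at horizon h is A^h B e_j. It assumes A and B have already been estimated and identified elsewhere; it is a textbook illustration, not the authors' counterfactual procedure.

```python
# Type alias: a matrix stored as a list of rows.
Matrix = list[list[float]]


def matmul(a: Matrix, b: Matrix) -> Matrix:
    """Plain-Python matrix product a @ b."""
    return [
        [sum(a[i][k] * b[k][j] for k in range(len(b))) for j in range(len(b[0]))]
        for i in range(len(a))
    ]


def impulse_responses(A: Matrix, B: Matrix, shock: int, horizons: int) -> list[list[float]]:
    """Response of every variable at h = 0..horizons-1 to a unit structural shock.

    Assumes y_t = A y_{t-1} + B eps_t, so the horizon-h response is A^h B e_shock.
    """
    n = len(A)
    # Impact response: the column of B for the chosen shock, as an n x 1 matrix.
    current = [[B[i][shock]] for i in range(n)]
    path = []
    for _ in range(horizons):
        path.append([row[0] for row in current])
        current = matmul(A, current)  # propagate the response one period forward
    return path
```

How such responses are combined into a policy counterfactual is method-specific and is not shown here.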

Digital Twin Dataset

Creators: Olivier Toubia, George Z. Gui, Tianyi Peng, Daniel J. Merlau, Ang Li, Haozhe Chen
Publication Date: 2025-08-20

This dataset, Twin-2K-500, contains comprehensive persona information from a representative sample of 2,058 US participants, including rich demographic and psychological data. It is designed specifically for building digital twins for LLM simulations.

Dataset Structure and Format

The Twin-2K-500 dataset is organized into three folders, each with its own format and purpose:

1. Full Persona Folder

This folder contains complete persona information for each participant. The data is split into chunks for easier processing:

  • pid: Participant ID
  • persona_text: Complete survey responses in text format, including all questions and answers. For questions that appear in both waves 1-3 and wave 4, the wave 4 responses are used.
  • persona_summary: A concise summary of each participant’s key characteristics and responses, designed to provide a quick overview without needing to process the full survey data. This summary captures the essential traits and patterns in the participant’s responses.
  • persona_json: Complete survey responses in JSON format, following the same structure as persona_text. The JSON format is useful when a subset of questions needs to be excluded or revised.
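A minimal sketch of excluding questions via persona_json and re-rendering the rest as prompt text. The JSON layout assumed here (a list of objects with `question_id`, `question_text`, and `answer` keys) is hypothetical; the actual schema of the files may differ and the key names would need to be adapted.

```python
import json


def drop_questions(persona_json: str, exclude_ids: set[str]) -> list[dict]:
    """Parse a persona_json string and drop questions whose IDs are in exclude_ids.

    Assumes a hypothetical layout: a JSON list of objects with
    "question_id", "question_text", and "answer" keys.
    """
    questions = json.loads(persona_json)
    return [q for q in questions if q["question_id"] not in exclude_ids]


def to_persona_text(questions: list[dict]) -> str:
    """Render the remaining questions as a simple Q/A text block for a prompt."""
    return "\n".join(f"Q: {q['question_text']}\nA: {q['answer']}" for q in questions)
```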

2. Wave Split Folder

This folder is designed for testing and evaluating different LLM persona creation methodologies (from prompt engineering to RAG, fine-tuning, and RLHF):

  • pid: Participant ID
  • wave1_3_persona_text: Persona information from waves 1-3 in text format, including questions that did not appear in wave 4. This can be used as training data for creating personas.
  • wave1_3_persona_json: Persona information from waves 1-3 in JSON format, following the same structure as wave1_3_persona_text.
  • wave4_Q_wave1_3_A: Wave 4 questions with answers from waves 1-3, useful for human test-retest evaluation.
  • wave4_Q_wave4_A: Wave 4 questions with their actual answers from wave 4, serving as ground truth for evaluating persona prediction accuracy.

The Wave Split Folder is particularly useful for:

  • Training persona creation models using wave 1-3 data
  • Evaluating how well the created personas can predict wave 4 responses
  • Comparing different LLM-based approaches (prompt engineering, RAG, fine-tuning, RLHF) for persona creation
  • Testing the reliability and consistency of persona predictions across different time periods
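The evaluation described above can be sketched as an exact-match rate against wave4_Q_wave4_A, with wave4_Q_wave1_3_A supplying the human test-retest rate as a reference ceiling. This assumes each participant's answers are available as a dict keyed by a question ID (a hypothetical representation), and exact matching is only a reasonable metric for categorical answers; numeric scales may call for a distance-based score instead.

```python
import math


def match_rate(predicted: dict[str, str], truth: dict[str, str]) -> float:
    """Share of ground-truth questions answered identically; missing predictions count as misses."""
    if not truth:
        return math.nan
    hits = sum(predicted.get(q) == answer for q, answer in truth.items())
    return hits / len(truth)


def normalized_accuracy(llm: dict[str, str], retest: dict[str, str], truth: dict[str, str]) -> float:
    """LLM match rate relative to the human test-retest match rate."""
    ceiling = match_rate(retest, truth)
    if ceiling == 0 or math.isnan(ceiling):
        return math.nan
    return match_rate(llm, truth) / ceiling
```

Dividing by the test-retest rate is one possible design choice: it expresses model accuracy relative to how consistently the same people answer the same questions over time.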

3. Raw Data Folder

This folder provides access to the raw survey response files from Qualtrics, after anonymization and removal of sensitive columns. These files are particularly useful for social scientists interested in measuring correlations across questions and analyzing heterogeneous treatment effects in experiments.

The folder contains the following files for each wave (1-4):

  • Labels CSV (e.g., wave_1_labels_anonymized.csv): Contains survey answers as text.
  • Numbers CSV (e.g., wave_1_numbers_anonymized.csv): Contains survey answers as numerical codes.

Additionally:

  • Questionnaire: questionnaire files are provided in the questionnaire subfolder. These files can help visualize the survey structure and question flows.
  • Wave 4 Simulation Results: We also uploaded the Wave 4 CSV simulated by GPT-4.1-mini (our default setup), along with the corresponding human CSV. This facilitates in-depth analysis of LLM simulation patterns.
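As an example of measuring correlations across questions, the sketch below computes a Pearson correlation between two numerically coded columns of a Numbers CSV, keeping only respondents with a numeric answer to both. The column names are hypothetical, and the sketch assumes a single header row; if the files retain Qualtrics' extra header rows (question text, import IDs), those rows would need to be skipped first.

```python
import csv
import io
import math


def pearson(xs: list[float], ys: list[float]) -> float:
    """Pearson correlation coefficient of two equal-length samples."""
    n = len(xs)
    if n < 2:
        return math.nan
    mx, my = sum(xs) / n, sum(ys) / n
    cov = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    sx = math.sqrt(sum((x - mx) ** 2 for x in xs))
    sy = math.sqrt(sum((y - my) ** 2 for y in ys))
    if sx == 0 or sy == 0:
        return math.nan  # a constant column has no defined correlation
    return cov / (sx * sy)


def question_correlation(csv_text: str, col_a: str, col_b: str) -> float:
    """Correlate two numerically coded questions using pairwise deletion."""
    xs, ys = [], []
    for row in csv.DictReader(io.StringIO(csv_text)):
        a = (row.get(col_a) or "").strip()
        b = (row.get(col_b) or "").strip()
        try:
            x, y = float(a), float(b)
        except ValueError:
            continue  # blank or non-numeric cell: skip this respondent
        xs.append(x)
        ys.append(y)
    return pearson(xs, ys)
```

To use it on a file, read its contents (for example `open("wave_1_numbers_anonymized.csv", newline="", encoding="utf-8").read()`) and pass the text together with the two column names of interest.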

Job postings of DAX40 companies (2023)

Creators: HR Forecast GmbH
Publication Date: 2022-12-31

Job posting dataset of the DAX40 companies for the year 2023, aggregated from multiple public sources. The data contains anonymized information about job advertisements, including job title, job requirements, location, and type of employment.

US State-Level Corporate Tax, Apportionment, and Control Variables with FIPS Crosswalk (1980–2014)

Creators: Juan Carlos Suárez Serrato, Owen Zidar
Publication Date: 2019-12-10

This dataset collection provides comprehensive state-level information on corporate tax rates, apportionment factors, and tax base controls for the United States, covering various periods from 1980 to 2014. It includes a crosswalk file linking state FIPS codes to state names and abbreviations; detailed records of both state and federal corporate tax rates by state-year; and apportionment factors (payroll, sales, and property) sourced from The Book of the States by the Council of State Governments. Additionally, it offers a range of state tax base control variables for each state-year, which were used as controls in related research. This collection is designed to support empirical analysis in public finance, tax policy, and state-level economic research.
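To illustrate how apportionment factors enter a state tax calculation: under formula apportionment, the share of a multistate firm's profit taxable in a state is a weighted average of the firm's in-state shares of payroll, sales, and property, with state-specific weights (for example, equal weights, or double-weighted sales). The sketch below assumes the weights sum to one; the class and function names are illustrative, not part of the dataset.

```python
from dataclasses import dataclass


@dataclass
class ApportionmentWeights:
    """Weights a state places on each apportionment factor (assumed to sum to 1)."""
    payroll: float
    sales: float
    property_: float  # trailing underscore avoids shadowing the builtin name


def apportionment_share(w: ApportionmentWeights, payroll_share: float,
                        sales_share: float, property_share: float) -> float:
    """Fraction of a firm's national profit assigned to one state."""
    if abs(w.payroll + w.sales + w.property_ - 1.0) > 1e-9:
        raise ValueError("apportionment weights should sum to 1")
    return w.payroll * payroll_share + w.sales * sales_share + w.property_ * property_share


def state_tax(rate: float, national_profit: float, w: ApportionmentWeights,
              payroll_share: float, sales_share: float, property_share: float) -> float:
    """State corporate tax owed: statutory rate x apportioned share x national profit."""
    return rate * national_profit * apportionment_share(w, payroll_share, sales_share, property_share)
```

With double-weighted sales (0.25, 0.5, 0.25) and in-state shares of 0.1 (payroll), 0.4 (sales), and 0.2 (property), the apportioned share is 0.025 + 0.2 + 0.05 = 0.275.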

Experimental and Survey Data on the Effects of Peer-to-Peer Recognition in the Workplace

Creators: Paul W. Black, Mark S. Cecchini, Andrew H. Newman
Publication Date: 2024-12

This dataset contains experimental and survey responses used to analyze the psychological effects of peer-to-peer recognition in the workplace. The data was collected as part of the study by Paul W. Black, Mark S. Cecchini, and Andrew H. Newman, published in Accounting, Organizations and Society (2024). The dataset explores how different types of recognition (anonymous vs. attributed) influence perceived appreciation and employee engagement, including responses from both lab experiments and field surveys.
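A basic comparison of the two recognition conditions is a difference in mean outcomes between groups. The sketch below computes Welch's t statistic and Welch-Satterthwaite degrees of freedom, which do not assume equal variances. It assumes each group's outcome (for example, a perceived-appreciation score) is available as a list of numbers; it is a generic illustration, not the analysis reported in the study. SciPy's `scipy.stats.ttest_ind(a, b, equal_var=False)` runs the same test and also returns a p-value.

```python
import math
from statistics import mean, variance


def welch_test(a: list[float], b: list[float]) -> tuple[float, float]:
    """Welch's t statistic and degrees of freedom for mean(a) - mean(b)."""
    # Squared standard errors of each group mean (sample variance / n).
    va, vb = variance(a) / len(a), variance(b) / len(b)
    t = (mean(a) - mean(b)) / math.sqrt(va + vb)
    df = (va + vb) ** 2 / (va ** 2 / (len(a) - 1) + vb ** 2 / (len(b) - 1))
    return t, df
```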

Creators: Susanne Preuss, Malte M. Max

This dataset contains firm-level data linking sociopolitical claims—focused on diversity and environmental protection—to political action committee (PAC) contributions and politician ratings from advocacy groups. It was used by Susanne Preuss and Malte M. Max in their 2024 study published in Accounting, Organizations and Society to assess political alignment between firms’ public statements and their actual political spending behavior.
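One simple way to summarize a firm's political spending on the same scale as advocacy-group ratings is a contribution-weighted average rating of the politicians its PAC funds. The sketch below assumes contributions and ratings are keyed by a common politician identifier (a hypothetical setup); it illustrates the idea and is not the measure used by the authors.

```python
import math


def contribution_weighted_rating(contributions: dict[str, float], ratings: dict[str, float]) -> float:
    """Dollar-weighted mean advocacy-group rating of the politicians a firm's PAC funds.

    Recipients without a rating are ignored; returns NaN if no rated recipient
    received money.
    """
    rated = {p: amount for p, amount in contributions.items() if p in ratings}
    total = sum(rated.values())
    if total == 0:
        return math.nan
    return sum(amount * ratings[p] for p, amount in rated.items()) / total
```

Comparing this score across firms with and without diversity or environmental claims gives a first look at whether statements and spending line up.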

Creators: Aljoscha Janssen, Xuan Zhang

This dataset contains pharmacy-level data on prescription opioid orders, including OxyContin, and tracks pharmacy ownership status (independent vs. chain). It was used by Aljoscha Janssen and Xuan Zhang in their 2023 study published in the American Economic Review to analyze how ownership affects opioid dispensing behavior. The dataset supports comparisons between pharmacy types and captures the effects of the 2010 OxyContin reformulation.
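A generic way to compare pharmacy types around the 2010 reformulation is a difference-in-differences: the change in one group's mean outcome from before to after the event, minus the change for the comparison group. The sketch below treats one pharmacy type as the "treated" group and the other as the comparison group purely for illustration; it assumes outcomes (for example, OxyContin orders per pharmacy-period) are already split by group and period, and it is not necessarily the study's specification, which would typically also control for pharmacy and time effects.

```python
from statistics import mean


def diff_in_diff(treated_pre: list[float], treated_post: list[float],
                 control_pre: list[float], control_post: list[float]) -> float:
    """Change in the treated group's mean minus the change in the control group's mean."""
    treated_change = mean(treated_post) - mean(treated_pre)
    control_change = mean(control_post) - mean(control_pre)
    return treated_change - control_change
```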
