
Industry-Level Policy and Economic Indicators for Indian Sectors

Creators: Natalie Bau, Adrien Matray
Publication Date: 2023-01-01

This dataset collection compiles firm- and sector-level data to support the analysis of economic activity and policy conditions across Indian industries. It includes measures of foreign direct investment (FDI) restrictions at multiple NIC classification levels, tariff schedules, dereservation indicators, and state-level financial development variables. Additionally, it provides merged datasets linking financial indicators from the Reserve Bank of India (RBI) with NIC-based classifications. To ensure comparability over time, the collection contains crosswalks between NIC codes and a variety of deflators, covering output, input, capital, and GDP, along with currency conversion rates. These files enable real-term adjustments and longitudinal consistency in economic indicators, facilitating empirical research on industrial dynamics and regulatory environments.
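The deflators support the real-term adjustments described above. Below is a minimal sketch of that conversion. It assumes each deflator is an index equal to 100 in its base year and that both nominal values and deflators are keyed by (NIC code, year). The function names, key layout, and base convention are illustrative assumptions, not the documented structure of the files.

```python
# Minimal sketch: convert nominal values to base-year (real) terms.
# Assumption: a deflator index equals `base` (default 100) in its reference year.
# The (nic_code, year) key layout is hypothetical.

def to_real(nominal: float, index: float, base: float = 100.0) -> float:
    """Express a nominal value in base-year prices: nominal / (index / base)."""
    if index <= 0:
        raise ValueError("deflator index must be positive")
    return nominal * base / index


def deflate_series(nominal_by_key: dict[tuple[str, int], float],
                   deflator_by_key: dict[tuple[str, int], float],
                   base: float = 100.0) -> dict[tuple[str, int], float]:
    """Deflate values keyed by (nic_code, year) using a matching deflator table.

    Keys without a deflator are skipped rather than imputed.
    """
    return {key: to_real(value, deflator_by_key[key], base)
            for key, value in nominal_by_key.items()
            if key in deflator_by_key}
```

Skipping unmatched keys is deliberate: it surfaces gaps in the NIC-to-deflator mapping instead of silently filling them.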

Elite Backgrounds, Career Paths, and Promotion Patterns in the Chinese Communist Party (CCP)

Creators: Patrick Francois, Francesco Trebbi, Kairong Xiao
Publication Date: 2023-03-01

This dataset collection contains detailed individual-level data on members of the Chinese Communist Party (CCP), primarily from the 19th Central Committee and related leadership bodies. The data include variables on personal demographics (e.g., gender, ethnicity, education), career trajectories (e.g., office tenures, geographic placements, sectoral experience), and elite backgrounds (e.g., military service, red/technocratic family origins, bureaucratic ties). Several datasets focus on promotion outcomes and covariates relevant for regression analysis, while others record coded biographical information from public sources such as official CCP documents and media reports. Variables span institutional affiliations, cadre rank levels, prior leadership experience, and exposure to national vs. local governance. This structure enables analysis of determinants of elite selection and promotion within the CCP hierarchy.

Cereal Products, Nutritional Composition, and Market Prices

Creators: Nano Barahona, Cristóbal Otero, Sebastián Otero
Publication Date: 2023-05-01

This dataset collection integrates product-level and time-series data related to cereal and grain products. It includes nutritional composition data from various years and sources, detailed product characteristics from scanned records and field observations, consumer belief responses linked to product features, and historical commodity price trends for key ingredients such as cocoa, corn, oats, rice, sugar, and wheat. Additionally, a week-to-date mapping file supports temporal alignment across datasets. Together, these files enable the analysis of nutritional trends, consumer perceptions, and pricing dynamics in the cereal product market.

Food Consumption, Prices, and Nutrient Characteristics

Creators: Jean‐Jacques Forneron
Publication Date: 2023-05-01

This collection includes structured data on food item consumption quantities, associated prices, and nutritional content. Variables cover a wide range of food categories and nutrients, enabling analysis of consumption patterns, relative pricing, and dietary composition. Additional metadata clarifies food group classifications and measurement units to support consistent interpretation. The data structure allows for temporal and comparative analysis across items, suitable for applications in food economics, nutrition studies, and price analysis.

Macroeconomic Time Series for Policy Shock Analysis

Creators: Alisdair McKay, Christian K. Wolf
Publication Date: 2023-09-01

This dataset provides macroeconomic time-series variables prepared for structural vector autoregression (VAR) analysis. It includes indicators such as real output, inflation, interest rates, and monetary policy shocks. The variables are formatted to support empirical evaluation of policy scenarios through VAR-based counterfactual exercises. Metadata from the accompanying documentation clarifies variable transformations and modeling roles within the policy identification framework.

Digital Twin Dataset

Creators: Olivier Toubia, George Z. Gui, Tianyi Peng, Daniel J. Merlau, Ang Li, Haozhe Chen
Publication Date: 2025-08-20

The Twin-2K-500 dataset contains comprehensive persona information from a representative sample of 2,058 US participants, providing rich demographic and psychological data. It is designed specifically for building digital twins for LLM simulations.

Dataset Structure and Format

The Twin-2K-500 dataset is organized into three folders, each with its own format and purpose:

1. Full Persona Folder

This folder contains the complete persona information for each participant. The data are split into chunks for easier processing and include the following fields:

  • pid: Participant ID
  • persona_text: Complete survey responses in text format, including all questions and answers. For questions that appear in both waves 1-3 and wave 4, the wave 4 responses are used.
  • persona_summary: A concise summary of each participant’s key characteristics and responses, designed to provide a quick overview without needing to process the full survey data. This summary captures the essential traits and patterns in the participant’s responses.
  • persona_json: Complete survey responses in JSON format, following the same structure as persona_text. The JSON format is useful if you want to exclude or revise a subset of questions.
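The snippet below is a rough sketch of the subsetting use case: it drops selected questions from a persona_json record. It assumes, hypothetically, that the JSON decodes to a list of question objects with an identifier stored under a key called question_id. The real field names may differ, so the key is passed as the `id_key` parameter.

```python
import json

def exclude_questions(persona_json: str, excluded_ids: set[str],
                      id_key: str = "question_id") -> str:
    """Return persona JSON with the listed questions removed.

    Hypothetical schema: a JSON array of objects, each carrying a question
    identifier under `id_key`. Adjust `id_key` to match the actual files.
    """
    blocks = json.loads(persona_json)
    # Keep every block whose identifier is not in the exclusion set.
    kept = [block for block in blocks if block.get(id_key) not in excluded_ids]
    return json.dumps(kept)
```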

2. Wave Split Folder

This folder is designed for testing and evaluating different LLM persona creation methodologies (from prompt engineering to RAG, fine-tuning, and RLHF):

  • pid: Participant ID
  • wave1_3_persona_text: Persona information from waves 1-3 in text format, including questions that did not appear in wave 4. This can be used as training data for creating personas.
  • wave1_3_persona_json: Persona information from waves 1-3 in JSON format, following the same structure as wave1_3_persona_text.
  • wave4_Q_wave1_3_A: Wave 4 questions with answers from waves 1-3, useful for human test-retest evaluation.
  • wave4_Q_wave4_A: Wave 4 questions with their actual answers from wave 4, serving as ground truth for evaluating persona prediction accuracy.

The Wave Split Folder is particularly useful for:

  • Training persona creation models using wave 1-3 data
  • Evaluating how well the created personas can predict wave 4 responses
  • Comparing different LLM-based approaches (prompt engineering, RAG, fine-tuning, RLHF) for persona creation
  • Testing the reliability and consistency of persona predictions across different time periods
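One simple way to score predictions against the wave 4 ground truth (wave4_Q_wave4_A) is exact-match accuracy. The same score can be computed for the human test-retest baseline (wave4_Q_wave1_3_A). The sketch below assumes answers have already been parsed into dictionaries keyed by a hypothetical question identifier. Exact match is an illustrative choice that suits categorical items; numeric or open-ended questions would need a different metric.

```python
def exact_match_accuracy(predicted: dict[str, str], truth: dict[str, str]) -> float:
    """Fraction of shared questions where the predicted answer equals the truth.

    Both arguments map a hypothetical question identifier to an answer string.
    Only questions present in both mappings are scored.
    """
    shared = [q for q in truth if q in predicted]
    if not shared:
        raise ValueError("no overlapping questions to score")
    hits = sum(predicted[q] == truth[q] for q in shared)
    return hits / len(shared)
```

To judge a persona method, compare `exact_match_accuracy(llm_answers, wave4_truth)` with `exact_match_accuracy(retest_answers, wave4_truth)`. This shows how close the method comes to participants' own consistency across waves. The names `llm_answers`, `retest_answers`, and `wave4_truth` are placeholders.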

3. Raw Data Folder

This folder provides access to the raw survey response files from Qualtrics, after anonymization and removal of sensitive columns. These files are particularly useful for social scientists interested in measuring correlations across questions and analyzing heterogeneous effects in experiments.

The folder contains the following files for each wave (1-4):

  • Labels CSV (e.g., wave_1_labels_anonymized.csv): Contains survey answers as text.
  • Numbers CSV (e.g., wave_1_numbers_anonymized.csv): Contains survey answers as numerical codes.

Additionally,

  • Questionnaire: Questionnaire files are provided in the questionnaire subfolder. They help visualize the survey structure and question flow.
  • Wave 4 Simulation Results: We also uploaded the wave 4 CSV simulated by GPT-4.1-mini (our default setup) along with the human CSV. This facilitates in-depth analysis of LLM simulation patterns.

Job postings of DAX40 companies (2023)

Creators: HR Forecast GmbH
Publication Date: 2022-12-31

This dataset contains job postings from DAX40 companies for the year 2023, aggregated from multiple public sources. The data include anonymized information about job advertisements, such as job title, job requirements, location, and type of employment.

US State-Level Corporate Tax, Apportionment, and Control Variables with FIPS Crosswalk (1980–2014)

Creators: Juan Carlos Suárez Serrato, Owen Zidar
Publication Date: 2019-12-10

This dataset collection provides comprehensive, state-level information on corporate tax rates, apportionment factors, and tax base controls for the United States covering various periods from 1980 to 2014. It includes a crosswalk file linking state FIPS codes to different state names and abbreviations, detailed records of both state and federal corporate tax rates by state-year, and apportionment factors (payroll, sales, and property) sourced from the Book of the States by the Council of State Governments. Additionally, it offers a range of state tax base control variables for each state-year, which were used for analytical controls in related research. This collection is designed to support empirical analysis in public finance, taxation policy, and state-level economic research.
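Combining the tax-rate and apportionment files requires a common state-year key, and the FIPS crosswalk provides one. The sketch below assumes, hypothetically, that records are dictionaries with "state" and "year" fields. It also assumes the crosswalk can be read as (FIPS code, name-or-abbreviation) pairs. The real column names in the files may differ.

```python
# Hypothetical record layout: dicts with "state" (name or abbreviation) and "year".

def build_crosswalk(pairs):
    """Map each state name/abbreviation variant (case-insensitive) to a 2-digit FIPS string."""
    return {variant.strip().upper(): str(fips).zfill(2) for fips, variant in pairs}


def attach_fips(records, crosswalk):
    """Return copies of `records` with a "fips" field looked up from `crosswalk`."""
    out = []
    for rec in records:
        fips = crosswalk.get(rec["state"].strip().upper())
        if fips is None:
            raise KeyError(f"no FIPS code for state {rec['state']!r}")
        out.append({**rec, "fips": fips})
    return out


def merge_on_fips_year(left, right):
    """Inner join two record lists on ("fips", "year"); right-hand fields win on name clashes."""
    index = {(r["fips"], r["year"]): r for r in right}
    merged = []
    for rec in left:
        match = index.get((rec["fips"], rec["year"]))
        if match is not None:
            merged.append({**rec, **match})
    return merged
```

Keying on FIPS rather than on names avoids mismatches between spellings such as full state names and postal abbreviations. The crosswalk is designed to resolve exactly this problem.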
