Creators: Maximilian Witte

This dataset consists of two complementary components capturing both the official positions of the major political parties in the 2021 German general election and the public perception of these positions.

The first component contains pre-processed short versions of the election programs from the six major parties that competed at the federal level in 2021: CDU/CSU, SPD, Bündnis 90/Die Grünen, FDP, Die Linke, and AfD. All statements are processed for text classification and were sourced from party publications released for the 2021 campaign.

The second component includes 7,500 individual statements from consumers describing what they believe these political parties stand for and do not stand for. Participants were asked to freely express their perceptions without constraints on length or structure. Each statement is linked to the referenced party where applicable and contains metadata on anonymized participant ID, timestamp, and language. Statements include both supportive and critical assessments and therefore represent a wide range of public interpretations of party identity and political priorities.

Together, the two components enable the study of the relationship between the official communication of political parties and the way citizens mentally represent party beliefs. The dataset can be used for research in political communication, perception gaps, misinformation, narrative framing, and natural language processing applications such as stance detection and text similarity.

Indonesia Social Protection and Poverty Metrics Dataset

Creators: Abhijit Banerjee, Massachusetts Institute of Technology; Rema Hanna, Harvard Kennedy School; Benjamin Olken, Massachusetts Institute of Technology; Elan Satriawan, Gadjah Mada University and TNP2K; Sudarno Sumarto, SMERU and TNP2K
Publication Date: 2022-11-30

This dataset collection compiles diverse data sources related to poverty measurement, household-level participation in social assistance programs, and aid distribution in Indonesia. It includes official statistics on provincial poverty thresholds from the national statistics agency (BPS), covering multiple periods and regions. De-identified household survey data from the BSPN program captures demographic characteristics, program awareness, assistance received, and interview metadata, enabling analysis of targeting and delivery effectiveness. Additionally, the collection features survey responses from distribution points for major aid programs such as Rastra and BPNT, documenting the type of distributors, geographic locations, and operational details of aid delivery. These datasets are designed to support empirical research on poverty dynamics, social protection coverage, and policy implementation performance.

Consumer Expectations and Economic Preferences in Belgium

Creators: Olivier Coibion, University of Texas-Austin; Dimitris Georgarakos, European Central Bank; Yuriy Gorodnichenko, University of California-Berkeley; Geoff Kenny, European Central Bank; Michael Weber, University of Chicago Booth School of Business
Publication Date: 2023-11-05

This dataset contains microdata from a consumer survey conducted in Belgium as part of a broader study on economic expectations, preferences, and behavior. It includes detailed information on respondents’ employment status, income, housing conditions, job sectors, and economic outlook. The dataset supports analyses of how individuals form expectations about future economic conditions (e.g., GDP growth) and how these expectations influence consumer behavior. Variables also capture demographic characteristics, weighting factors, and data quality indicators, enabling robust quantitative research on household-level economic sentiment and decision-making in a European context.

Economic Expectations and Media Dynamics in the U.S.

Creators: Charles Angelucci, Massachusetts Institute of Technology; Andrea Prat, Columbia University
Publication Date: 2023-12-06

This dataset collection combines microdata from a U.S.-based survey on economic expectations with complementary media-related data to support the analysis of public sentiment, media influence, and journalist activity over time. The survey includes detailed information on respondents’ income, employment, housing, and perceptions of macroeconomic indicators such as GDP growth and inflation. These variables enable in-depth research on how individuals form economic expectations and how those expectations affect decision-making.

In addition, the collection features a longitudinal dataset tracking journalist performance metrics across multiple months and years. These include story counts and journalist-level rankings, allowing researchers to study media output, content dynamics, and exposure to economic news or narratives over time. This integrated structure supports empirical research on the interaction between individual beliefs, media exposure, and macroeconomic behavior.

Evaluation Data from a Conditional Cash Transfer Program and Educational Outcomes

Creators: Paul Goldsmith-Pinkham, Yale University; Peter Hull, Brown University; Michal Kolesar, Princeton University
Publication Date: 2024-09-13

This dataset collection supports the evaluation of a Conditional Cash Transfer (CCT) program designed to improve school attendance and learning outcomes. It includes baseline, endline, and monitoring surveys at the household and school levels, along with administrative records on program delivery, stratification data for experimental assignment, and dropout tracking. Student-level files contain assessment scores and structured profiles, while school visit records detail infrastructure and teacher inputs. Additional datasets document treatment assignment, outcomes across waves, and contextual variables such as household income and school participation. The data are structured for longitudinal analysis and impact evaluation through randomized and stratified designs at the student, household, and school levels.

Real-World LLM Use Cases

Creators: Jingwen Cheng, Kshitish Ghate, Wenyue Hua, William Yang Wang, Hong Shen, Fei Fang
Publication Date: 2025-03-24

This dataset contains 93,259 LLM use cases collected from Reddit and news articles between June 2020 and December 2024. It captures two key dimensions: the diverse applications of LLMs and the demographics of their users. It categorizes LLM applications and explores how users’ occupations relate to the types of applications they use.

If you use this dataset, please cite this paper: https://doi.org/10.48550/arXiv.2503.18792.
