ParlSpeech V2

Creators: Rauh, Christian; Schwalbach, Jan

This dataset is an extensive collection of parliamentary speeches from nine representative democracies, offering insights into legislative discourse across different political systems. ParlSpeech V2 contains the complete full texts of more than 6.3 million parliamentary speeches in the key legislative chambers of Austria, the Czech Republic, Germany, Denmark, the Netherlands, New Zealand, Spain, Sweden, and the United Kingdom, covering periods of between 21 and 32 years. Metadata include the date, speaker, party, and, for part of the corpus, the agenda item under which a speech was held. The accompanying release note (2020-03-11) provides a more detailed guide to the data. The speeches span 1987 to 2018, and the full dataset is 1.5 GB in size.

The data are organized into separate R data files, one per country, each containing:

  • debate: Title of the debate (if available).

  • party: Name of the party to which the speaker belongs.

  • text: Full text of the speech as recorded by the parliament.

  • speaker: Name of the individual delivering the speech.
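
As an illustration of how these fields might be used, the Python sketch below counts speeches and the average speech length (in words) per party. It assumes the records have already been loaded from the R data files into dictionaries keyed by the field names listed above (for example with R's readRDS, or a third-party Python reader such as pyreadr); the helper name `party_summary` and the sample records are hypothetical placeholders, not real ParlSpeech entries.

```python
from collections import defaultdict
from typing import Iterable, Mapping


def party_summary(speeches: Iterable[Mapping[str, str]]) -> dict[str, tuple[int, float]]:
    """Return {party: (number_of_speeches, mean_words_per_speech)}.

    Each speech is assumed to be a mapping with at least the
    'party' and 'text' fields described above.
    """
    counts: dict[str, int] = defaultdict(int)
    words: dict[str, int] = defaultdict(int)
    for speech in speeches:
        # Speeches with a missing or empty party label are grouped under "NA".
        party = speech.get("party") or "NA"
        counts[party] += 1
        # Whitespace tokenization: a rough word count, not a linguistic one.
        words[party] += len(speech["text"].split())
    return {p: (counts[p], words[p] / counts[p]) for p in counts}


# Hypothetical records standing in for rows of one country's file.
sample = [
    {"debate": "Budget", "party": "A", "speaker": "Speaker 1", "text": "We support this budget."},
    {"debate": "Budget", "party": "B", "speaker": "Speaker 2", "text": "We oppose it."},
    {"debate": "Budget", "party": "A", "speaker": "Speaker 3", "text": "Agreed."},
]

for party, (n, mean_len) in sorted(party_summary(sample).items()):
    print(f"{party}: {n} speech(es), {mean_len:.1f} words on average")
```

With 6.3 million speeches, the same aggregation would more likely be done in a data-frame library, but the logic is unchanged.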

The Market for Data Privacy

Creators: Ramadorai, Tarun; Uetwiller, Antoine; Walther, Ansgar
Publication Date: 2019

The dataset accompanies an analysis of the privacy policies of U.S. firms, providing insights into corporate data privacy practices. We scrape a comprehensive set of U.S. firms’ privacy policies and study them alongside the firms’ web data extraction behaviour. We find considerable and systematic variation in privacy policies along multiple dimensions, including ease of access, length, readability, and clarity, both within and between industries. Surprisingly, firms’ data extraction is strongly and positively related to the length and complexity of their privacy policies. Firms with intermediate levels of technical sophistication have longer, more complex policies. A simple signalling model of firms engaging in data extraction in an economy with both myopic and sophisticated consumers helps to rationalize these findings. The dataset is 12.1 kB in size and reflects privacy policies up to 2019. In total, it includes privacy attributes for 7,020 U.S. firms and the full texts of privacy policies for 3,047 firms.
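
The description compares policies on length and readability but does not say which readability measure was used. Purely as an assumed stand-in, the Python sketch below computes word count and the Flesch Reading Ease score (higher means easier to read), using a crude vowel-group heuristic for syllables; the function names and sample texts are hypothetical.

```python
import re


def count_syllables(word: str) -> int:
    """Rough syllable estimate: count groups of consecutive vowels."""
    word = word.lower()
    count = len(re.findall(r"[aeiouy]+", word))
    # A trailing silent 'e' (as in "safe") usually does not form a syllable,
    # but words ending in "-le" (as in "example") keep it.
    if word.endswith("e") and not word.endswith("le") and count > 1:
        count -= 1
    return max(count, 1)


def policy_metrics(text: str) -> dict[str, float]:
    """Length and Flesch Reading Ease for one policy text (heuristic)."""
    sentences = [s for s in re.split(r"[.!?]+", text) if s.strip()]
    words = re.findall(r"[A-Za-z']+", text)
    if not sentences or not words:
        raise ValueError("text must contain at least one sentence and one word")
    syllables = sum(count_syllables(w) for w in words)
    flesch = (
        206.835
        - 1.015 * (len(words) / len(sentences))
        - 84.6 * (syllables / len(words))
    )
    return {"words": len(words), "sentences": len(sentences), "flesch_reading_ease": flesch}


# Hypothetical policy fragments: plain versus legalistic wording.
simple = "We keep your data safe. We do not sell it."
dense = ("Notwithstanding the foregoing, the company may disclose aggregated information "
         "to affiliated organizations for legitimate operational purposes.")
print(policy_metrics(simple))
print(policy_metrics(dense))
```

The syllable heuristic is deliberately simple; a dictionary-based syllable counter would give more accurate absolute scores, though relative comparisons between long, legalistic and short, plain texts are usually similar.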

COVID-19 Economic Stimulus Packages Database

Creators: Elgin, C; Yalaman, A
Publication Date: 2021

We conduct a comprehensive review of different economic policy measures adopted by 166 countries in response to the COVID-19 pandemic and create a large database including fiscal, monetary, and exchange rate measures. Furthermore, using principal component analysis (PCA), we construct a COVID-19 Economic Stimulus Index (CESI) that combines all adopted policy measures. This index standardises economic responses taken by governments and allows us to study cross-country differences in policies. Finally, using simple cross-country OLS regressions, we report that the median age of the population, the number of hospital beds per capita, GDP per capita, and the total number of cases are all significantly associated with the extent of countries’ economic policy responses. The dataset covers economic policy measures adopted during the COVID-19 pandemic; the latest version, dated May 7, 2021, reflects the policies implemented up to that point. It is 6.2 kB in size.
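
The idea of collapsing several policy measures into one index with PCA can be sketched in plain Python. This assumes each country is described by a few numeric policy variables (the variables and values below are hypothetical): each variable is standardised, the first principal component of the resulting correlation matrix is found by power iteration, and each country's score is its projection onto that component. The published CESI may differ in preprocessing, sign convention, and rescaling.

```python
import math


def standardise(columns: list[list[float]]) -> list[list[float]]:
    """Z-score each variable (column) so that its units do not dominate the PCA.

    Assumes every variable varies across countries (non-zero standard deviation).
    """
    out = []
    for col in columns:
        mean = sum(col) / len(col)
        sd = math.sqrt(sum((x - mean) ** 2 for x in col) / (len(col) - 1))
        out.append([(x - mean) / sd for x in col])
    return out


def first_component(z: list[list[float]], iters: int = 500) -> list[float]:
    """Leading eigenvector (unit length) of the correlation matrix, via power iteration."""
    k, n = len(z), len(z[0])
    corr = [[sum(z[i][t] * z[j][t] for t in range(n)) / (n - 1) for j in range(k)]
            for i in range(k)]
    v = [1.0] * k
    for _ in range(iters):
        w = [sum(corr[i][j] * v[j] for j in range(k)) for i in range(k)]
        norm = math.sqrt(sum(x * x for x in w))
        v = [x / norm for x in w]
    # The eigenvector's sign is arbitrary; flip it so loadings sum to a positive
    # number, which makes a higher score mean a larger overall response.
    if sum(v) < 0:
        v = [-x for x in v]
    return v


def stimulus_index(columns: list[list[float]]) -> list[float]:
    """One score per country: projection of its standardised measures on PC1."""
    z = standardise(columns)
    v = first_component(z)
    n = len(z[0])
    return [sum(v[i] * z[i][t] for i in range(len(z))) for t in range(n)]


# Hypothetical measures for four countries (one list per variable).
fiscal = [2.0, 5.0, 8.0, 1.0]       # fiscal package, % of GDP
rate_cut = [0.5, 1.0, 1.5, 0.25]    # policy-rate cut, percentage points
macrofin = [1.0, 3.0, 4.0, 0.5]     # macro-financial package, % of GDP
scores = stimulus_index([fiscal, rate_cut, macrofin])
print([round(s, 2) for s in scores])
```

Because the variables are standardised first, the scores sum to zero across countries; an index like this ranks responses relative to one another rather than in absolute units.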

Creators: Borret, Camille; Laurer, Moritz

The dataset contains 142,036 EU laws, almost the entire corpus of the EU’s digitally available legal acts passed between 1952 and 2019. It encompasses the full texts of these legal documents, providing researchers and analysts with extensive material for legal, historical, and policy-related studies. The dataset covers the three types of legally binding acts passed by the EU institutions, all in English: 102,304 regulations, 4,070 directives, and 35,798 decisions. It was scraped from the official EU legal database (Eur-lex.eu) and transformed into machine-readable CSV format using the programming languages R and Python. It was collected by the Centre for European Policy Studies (CEPS) for the TRIGGER project (https://trigger-project.eu/). We hope that it will facilitate future quantitative and computational research on the EU. Covering EU legal acts from 1952 up to August 2019, the dataset provides a historical perspective on the evolution of EU legislation over nearly seven decades. It is 1.5 GB in size.

Speeches Dataset

Creators: European Central Bank
Publication Date: 2019

To assist researchers in the field of central bank communication, we offer a precompiled dataset containing the full text of all speeches delivered by ECB Executive Board members, together with limited metadata. Each entry includes the complete text of the speech, offering detailed insights into the ECB’s communication, along with its date, speakers, title, and a subtitle giving additional context about the occasion or event. One special characteristic of this dataset is that it is updated regularly, typically monthly, so researchers have access to the latest speeches. It is 1.4 MB in size and covers speeches from the inception of the ECB up to the most recent update, on March 1, 2025. The dataset is structured as a CSV file with the following columns:

  • date: Publication date of the speech, in YYYY-MM-DD format.

  • speakers: Names of the Executive Board members who delivered the speech.

  • title: Title of the speech.

  • subtitle: Additional context or description of the occasion or event.

  • contents: Full text of the speech, including any footnotes.
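
A minimal Python sketch of reading this file and counting speeches per speaker in a given year, using the standard `csv` module. It assumes the column names listed above and a pipe (`|`) delimiter, which is an assumption to check against the header of the downloaded file (pass `delimiter=","` if it is comma-separated). The helper name `speeches_per_speaker` and the sample rows are hypothetical.

```python
import csv
import io
from collections import Counter
from typing import Iterable


def speeches_per_speaker(lines: Iterable[str], year: str, delimiter: str = "|") -> Counter:
    """Count speeches per value of the 'speakers' column for one year.

    The default delimiter is an assumption; check it against the file header.
    """
    counts: Counter = Counter()
    for row in csv.DictReader(lines, delimiter=delimiter):
        # Dates are YYYY-MM-DD, so the year is the prefix before the first hyphen.
        if row["date"].startswith(f"{year}-"):
            # Joint speeches keep the full 'speakers' string as one key.
            counts[row["speakers"]] += 1
    return counts


# Hypothetical rows in the column layout described above.
SAMPLE = (
    "date|speakers|title|subtitle|contents\n"
    "2024-01-15|Speaker A|Title 1|Subtitle 1|Text one\n"
    "2024-03-02|Speaker B|Title 2|Subtitle 2|Text two\n"
    "2023-11-20|Speaker A|Title 3|Subtitle 3|Text three\n"
    "2024-06-30|Speaker A|Title 4|Subtitle 4|Text four\n"
)
print(speeches_per_speaker(io.StringIO(SAMPLE), "2024"))
```

For the real file, open it with `open(path, encoding="utf-8", newline="")` and pass the file object in place of the `StringIO`; the `newline=""` argument lets the `csv` module handle line breaks inside quoted speech texts.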

The German Federal Courts Dataset

Creators: Hamann, Hanjo
Publication Date: 2019

This paper introduces the German Federal Courts Dataset (GFCD) as a resource for empirical legal scholars, with the objective of inspiring more European lawyers to engage with empirical aspects of civil-law adjudication. The project eases access to data on the German federal courts and lowers the threshold for empirical studies on judicial behavior. To that end, several thousand pages of German court documentation were digitized, transcribed into machine-readable tables (ready to be imported into statistics software), and published online (www.richter-im-internet.de). To explore innovative ways of sharing public-domain datasets at the same time, the data were modeled as linked open data and imported into the Wikidata repository for use in any computational application. The dataset covers the years 1950 to 2020 and is about 573 MB in size.

State of the State

Creators: FiveThirtyEight
Publication Date: 2019

We conducted a text analysis of all 50 governors’ 2019 state of the state speeches to see what issues were talked about the most and whether there were differences between what Democratic and Republican governors were focusing on.

index.csv lists the 50 speeches, one per state, along with the name and party of each state’s governor and a link to an official source for the speech.

words.csv contains every one-word phrase that was mentioned in at least 10 speeches and every two- or three-word phrase that was mentioned in at least five speeches, after a list of stop words was removed and the word “healthcare” was replaced with “health care” so that the two spellings were not counted as distinct phrases. It also contains the results of a chi-squared (χ²) test for each phrase, reporting its test statistic and associated p-value. Overall, the dataset is 134.5 kB in size.
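
The phrase filter and the significance test described above can be sketched in Python. The stop-word list, tokenizer, and exact form of FiveThirtyEight's χ² test are not specified here, so this sketch makes its own assumptions: a tiny hypothetical stop-word list, a regex tokenizer, and a 2×2 contingency table per phrase (Democratic versus Republican governors, speech mentions the phrase or not), scored with Pearson's χ² without continuity correction. For one degree of freedom the p-value is erfc(√(χ²/2)).

```python
import math
import re
from collections import Counter

# Hypothetical, deliberately tiny stop-word list.
STOP_WORDS = {"the", "and", "of", "to", "a", "in", "we", "our", "is", "for"}


def phrases(text: str) -> set[str]:
    """Distinct one- to three-word phrases in one speech."""
    # Merge spellings first, then drop stop words before forming n-grams.
    text = text.lower().replace("healthcare", "health care")
    tokens = [t for t in re.findall(r"[a-z']+", text) if t not in STOP_WORDS]
    found = set()
    for n in (1, 2, 3):
        for i in range(len(tokens) - n + 1):
            found.add(" ".join(tokens[i:i + n]))
    return found


def frequent_phrases(speeches: list[str], min_uni: int = 10, min_multi: int = 5) -> set[str]:
    """Keep one-word phrases in >= min_uni speeches and 2-3-word phrases in >= min_multi."""
    # phrases() returns a set, so each speech counts at most once per phrase.
    doc_freq = Counter(p for s in speeches for p in phrases(s))
    return {p for p, df in doc_freq.items()
            if df >= (min_multi if " " in p else min_uni)}


def chi2_2x2(dem_with: int, dem_total: int, rep_with: int, rep_total: int) -> tuple[float, float]:
    """Pearson chi-squared statistic and p-value (1 d.f.) for a 2x2 table."""
    table = [[dem_with, dem_total - dem_with],
             [rep_with, rep_total - rep_with]]
    n = dem_total + rep_total
    row = [sum(r) for r in table]
    col = [table[0][j] + table[1][j] for j in range(2)]
    stat = 0.0
    for i in range(2):
        for j in range(2):
            expected = row[i] * col[j] / n
            if expected > 0:  # an empty row/column contributes nothing
                stat += (table[i][j] - expected) ** 2 / expected
    return stat, math.erfc(math.sqrt(stat / 2))


# Hypothetical counts: a phrase used in 10 of 10 speeches from one party, 0 of 10 from the other.
print(chi2_2x2(10, 10, 0, 10))
```

The filter counts speeches that mention a phrase, not total mentions, which matches the "mentioned in at least N speeches" wording above.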

MUStARD: Multimodal Sarcasm Detection Dataset

Creators: Castro, Santiago; Hazarika, Devamanyu; Pérez-Rosas, Verónica; Zimmermann, Roger; Mihalcea, Rada; Poria, Soujanya
Publication Date: 2019

We release the MUStARD dataset, a multimodal video corpus for research in automated sarcasm detection. The dataset is compiled from popular TV shows, including Friends, The Golden Girls, The Big Bang Theory, and Sarcasmaholics Anonymous. MUStARD consists of audiovisual utterances annotated with sarcasm labels. Each entry in the dataset combines textual transcripts, audio signals, and visual cues, enabling comprehensive analysis of sarcasm as it manifests across different channels. Beyond isolated utterances, the dataset includes the preceding conversational context, providing insights into how prior dialogue influences the interpretation of sarcasm. The dataset was compiled and released in 2019 and is approximately 11.9 kB in size.

Key numbers:

  • Total Utterances: 690

  • Sarcastic Utterances: 345

  • Non-Sarcastic Utterances: 345

The dataset is organized in JSON format with the following fields:

  • utterance: The text of the target utterance to classify.

  • speaker: Speaker of the target utterance.

  • context: List of utterances (in chronological order) preceding the target utterance.

  • context_speakers: Respective speakers of the context utterances.

  • sarcasm: Binary label indicating sarcasm (1 for sarcastic, 0 for non-sarcastic).
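
A common way to use these fields is to prepend the context utterances, with their speakers, to the target utterance before classification. The Python sketch below does that with the standard `json` module, assuming the file maps utterance IDs to entries with the fields listed above (a plain list of entries is also accepted). The helper name `build_examples` and the two sample entries are hypothetical.

```python
import json


def build_examples(raw_json: str, use_context: bool = True) -> list[tuple[str, int]]:
    """Turn MUStARD-style entries into (input_text, label) pairs."""
    data = json.loads(raw_json)
    # Assumption: entries are stored as an object keyed by utterance ID;
    # a plain JSON list of entries is handled as well.
    entries = data.values() if isinstance(data, dict) else data
    examples = []
    for entry in entries:
        target = f"{entry['speaker']}: {entry['utterance']}"
        if use_context:
            # Context utterances are in chronological order, paired with their speakers.
            history = [f"{s}: {u}" for s, u in zip(entry["context_speakers"], entry["context"])]
            target = " ".join(history + [target])
        # int() maps both 0/1 and false/true labels to 0/1.
        examples.append((target, int(entry["sarcasm"])))
    return examples


# Hypothetical entries in the field layout described above.
SAMPLE = json.dumps({
    "id_1": {"utterance": "Oh great, another meeting.", "speaker": "A",
             "context": ["We have a meeting at 5."], "context_speakers": ["B"], "sarcasm": 1},
    "id_2": {"utterance": "Thanks for the help.", "speaker": "B",
             "context": ["I fixed your code."], "context_speakers": ["A"], "sarcasm": 0},
})
for text, label in build_examples(SAMPLE):
    print(label, text)
```

Applied to the full file, summing the labels is a quick check that the classes are balanced as in the key numbers above.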
