Creators: Rauh, Christian; Schwalbach, Jan

This dataset is an extensive collection of parliamentary speeches from nine representative democracies, offering valuable insights into legislative discourse across different political systems. ParlSpeech V2 contains complete full-text vectors of more than 6.3 million parliamentary speeches in the key legislative chambers of Austria, the Czech Republic, Germany, Denmark, the Netherlands, New Zealand, Spain, Sweden, and the United Kingdom, covering periods between 21 and 32 years. Metadata include the date, speaker, party, and, for part of the corpus, the agenda item under which a speech was held. The accompanying release note (2020-03-11) provides a more detailed guide to the data. The speeches span from 1987 to 2018 and the full dataset has a size of 1.5 GB.

It is organized into separate R Data files for each country, with each file containing:

  • debate: Title of the debate (if available).

  • party: Name of the party to which the speaker belongs.

  • text: Full text of the speech as recorded by the parliament.

  • speaker: Name of the individual delivering the speech.

The Market for Data Privacy

Creators: Ramadorai, Tarun; Uetwiller, Antoine; Walther, Ansgar
Publication Date: 2019

The dataset covers an analysis of privacy policies across U.S. firms, providing valuable insights into corporate data privacy practices. We scrape a comprehensive set of U.S. firms’ privacy policies, and study them alongside firms’ web data extraction behaviour. We find considerable and systematic variation in privacy policies along multiple dimensions including ease of access, length, readability, and clarity, both within and between industries. Surprisingly, firms’ data extraction is strongly and positively related to the length and complexity of their privacy policies. Firms with intermediate levels of technical sophistication have longer, more complex policies. A simple signalling model of firms engaging in data extraction in an economy with both myopic and sophisticated consumers helps to rationalize these findings. The dataset has a size of 12.1 kB and reflects privacy policies up to the year 2019. In total, it includes privacy attributes for 7,020 U.S. firms and the full texts of privacy policies for 3,047 firms.

The German Federal Courts Dataset

Creators: Hamann, Hanjo
Publication Date: 2019

This paper introduces the German Federal Courts Dataset (GFCD) as a resource for empirical legal scholars, with the objective of inspiring more European lawyers to engage with empirical aspects of civil-law adjudication. The project eases access to data on the German federal courts and lowers the threshold for empirical studies on judicial behavior. To that end, several thousand pages of German court documentation were digitized, transcribed into machine-readable tables (ready to be imported into statistics software), and published online (www.richter-im-internet.de). To simultaneously explore innovative ways of sharing public-domain datasets, the data were modeled as linked open data and imported into the Wikidata repository for use in any computational application. The dataset covers the years 1950 to 2020 and is about 573 MB in size.

Speeches Dataset

Creators: European Central Bank
Publication Date: 2019

To assist researchers in the field of central bank communication, the ECB offers a precompiled dataset containing the full text of all speeches delivered by its Executive Board members, together with limited metadata. The metadata comprise the date of the speech in YYYY-MM-DD format, the names of the Executive Board members who delivered it, the title, and a subtitle giving additional context about the occasion or event. The dataset is updated regularly, typically on a monthly basis, so researchers have access to the latest speeches. It has a size of 1.4 MB and covers speeches from the inception of the ECB up to the most recent update, which took place on March 1, 2025. The dataset is structured as a CSV file with the following columns: date (publication date of the speech), speakers (names of the Executive Board members who delivered the speech), title (title of the speech), subtitle (additional context or description), and contents (full text of the speech, including any footnotes).
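As a minimal sketch of how a file with these five columns could be read, the snippet below parses rows with Python's standard `csv` module and converts the `date` column to a date object. The sample rows, the function name `load_speeches`, and the comma delimiter are assumptions for illustration; the delimiter is a parameter because the actual file may use a different separator.

```python
import csv
import io
from datetime import date


def load_speeches(fileobj, delimiter=","):
    """Read speech rows and parse the date column into datetime.date."""
    reader = csv.DictReader(fileobj, delimiter=delimiter)
    speeches = []
    for row in reader:
        # Dates are described as YYYY-MM-DD, which fromisoformat accepts.
        row["date"] = date.fromisoformat(row["date"])
        speeches.append(row)
    return speeches


# Two made-up rows standing in for the real file (hypothetical content).
sample = io.StringIO(
    "date,speakers,title,subtitle,contents\n"
    '2001-05-03,Speaker A,"First title","Remarks at event X","Text one, with a comma."\n'
    '1999-01-14,Speaker B,"Second title","Remarks at event Y","Text two."\n'
)
speeches = load_speeches(sample)
speeches.sort(key=lambda s: s["date"])  # oldest speech first
```

Parsing the date up front makes it straightforward to sort or filter speeches by period before any text analysis.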

Creators: Borret, Camille; Laurer, Moritz

The dataset contains 142,036 EU laws, almost the entire corpus of the EU’s digitally available legal acts passed between 1952 and 2019. It encompasses the full texts of these legal documents, providing researchers and analysts with extensive material for legal, historical, and policy-related studies. The dataset covers the three types of legally binding acts passed by the EU institutions: 102,304 regulations, 4,070 directives, and 35,798 decisions, all in English. The dataset was scraped from the official EU legal database (Eur-lex.eu) and transformed into machine-readable CSV format with the programming languages R and Python. It was collected by the Centre for European Policy Studies (CEPS) for the TRIGGER project (https://trigger-project.eu/). We hope that it will facilitate future quantitative and computational research on the EU. Covering EU legal acts from 1952 up to August 2019, this dataset provides a historical perspective on the evolution of EU legislation over nearly seven decades. It has a size of 1.5 GB.

COVID-19 Economic Stimulus Packages Database

Creators: Elgin, C; Yalaman, A
Publication Date: 2021

We conduct a comprehensive review of different economic policy measures adopted by 166 countries as a response to the COVID-19 pandemic and create a large database including fiscal, monetary, and exchange rate measures. Furthermore, using principal component analysis (PCA), we construct a COVID-19 Economic Stimulus Index (CESI) that combines all adopted policy measures. This index standardises economic responses taken by governments and allows us to study cross-country differences in policies. Finally, using simple cross-country OLS regressions, we report that the median age of the population, the number of hospital beds per capita, GDP per capita, and the number of total cases are all significantly associated with the extent of countries’ economic policy responses. The dataset covers economic policy measures adopted during the COVID-19 pandemic, with the latest version dated May 7, 2021, reflecting the policies implemented up to that point. It has a size of 6.2 kB.
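To illustrate the general idea of a PCA-based index, the sketch below standardizes a few policy measures and scores each country on the first principal component, found by power iteration. It is not the authors' procedure: the description does not say how components are combined, so using only the first component is an assumption, and the function name and the numbers are hypothetical. It also assumes no column is constant (which would give a zero standard deviation).

```python
import math


def first_component_index(rows, iterations=200):
    """Score each row on the first principal component of standardized columns."""
    n, k = len(rows), len(rows[0])
    # Standardize each column to mean 0 and (population) standard deviation 1.
    means = [sum(r[j] for r in rows) / n for j in range(k)]
    stds = [math.sqrt(sum((r[j] - means[j]) ** 2 for r in rows) / n) for j in range(k)]
    z = [[(r[j] - means[j]) / stds[j] for j in range(k)] for r in rows]
    # Correlation matrix of the standardized data.
    corr = [[sum(row[a] * row[b] for row in z) / n for b in range(k)] for a in range(k)]
    # Power iteration converges to the eigenvector with the largest eigenvalue,
    # provided the uniform start vector is not orthogonal to it.
    v = [1.0 / math.sqrt(k)] * k
    for _ in range(iterations):
        w = [sum(corr[a][b] * v[b] for b in range(k)) for a in range(k)]
        norm = math.sqrt(sum(x * x for x in w))
        v = [x / norm for x in w]
    # Eigenvectors are defined up to sign; make the loadings sum positive.
    if sum(v) < 0:
        v = [-x for x in v]
    scores = [sum(row[j] * v[j] for j in range(k)) for row in z]
    return v, scores


# Hypothetical measures for four countries (rows) and three policies (columns).
measures = [
    [1.0, 2.0, 0.5],
    [2.0, 4.1, 1.0],
    [3.0, 5.9, 1.4],
    [4.0, 8.0, 2.1],
]
loadings, index = first_component_index(measures)
```

Because the columns are standardized first, measures recorded in different units contribute on a comparable scale, which is the sense in which such an index standardises responses.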

Third Eye Data: TV News Archive chyrons

Creators: TV News Archive
Publication Date: 2017

The Third Eye: TV News Archive Chyrons dataset captures and analyzes the “lower third” text, known as chyrons, displayed during live TV news broadcasts. This dataset provides a unique look into the real-time editorial choices of major news networks, offering insights into how different media outlets frame news stories. Using Optical Character Recognition (OCR) technology, chyrons are extracted and archived continuously, making it possible to track how key topics are covered over time.

At its inception in September 2017, the dataset collected chyrons from four major news networks: BBC News, CNN, Fox News, and MSNBC. Within just two weeks of its launch, over four million chyrons had already been captured, highlighting the vast amount of real-time data available. The dataset has been continuously updated since, allowing for longitudinal studies of media framing and news presentation trends. Its size is approximately 12.5 kB in TSV format.

The dataset is structured into several key components. Each chyron entry includes:

  • The exact chyron text, showing the wording used by the network.
  • Timestamps, allowing analysis of how frequently specific topics appear.
  • Channel identifiers, enabling comparisons between different networks.
  • Duration data, indicating how long a chyron remained on screen, which can suggest emphasis or prioritization of certain stories.

By leveraging this dataset, researchers, journalists, and media analysts can examine bias in news presentation, media influence on public perception, and breaking news coverage trends. It serves as a powerful tool for studying news framing, editorial strategies, and the evolution of televised news narratives across competing networks.
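As a rough sketch of a cross-network comparison, the snippet below sums on-screen duration per channel from tab-separated rows. The column order (timestamp, channel, duration in seconds, text), the channel labels, and the sample rows are assumptions for illustration and should be checked against the actual TSV file.

```python
import csv
import io
from collections import defaultdict


def seconds_on_screen_by_channel(fileobj):
    """Sum the duration column per channel from a tab-separated chyron file."""
    totals = defaultdict(int)
    # QUOTE_NONE treats quote characters in OCR text as ordinary characters.
    reader = csv.reader(fileobj, delimiter="\t", quoting=csv.QUOTE_NONE)
    for timestamp, channel, duration, text in reader:
        totals[channel] += int(duration)
    return dict(totals)


# Hypothetical rows: timestamp, channel, duration (seconds), chyron text.
sample = io.StringIO(
    "2017-09-07 00:00\tCHANNEL_A\t12\tEXAMPLE HEADLINE ONE\n"
    "2017-09-07 00:01\tCHANNEL_B\t30\tEXAMPLE HEADLINE TWO\n"
    "2017-09-07 00:02\tCHANNEL_A\t8\tEXAMPLE \"QUOTED\" HEADLINE\n"
)
totals = seconds_on_screen_by_channel(sample)
```

The same loop could be extended to filter on a keyword in the text column before summing, to compare how long each network kept a given topic on screen.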

CO2 emissions and ancillary data for 343 cities from diverse sources

Creators: Nangini, Cathy; Peregon, Anna; Ciais, Philippe; Weddige, Ulf; Vogel, Felix; Wang, Jun; Bréon, François-Marie; Bachra, Simeran; Wang, Yilong; Gurney, Kevin; Yamagata, Yoshiki; Appleby, Kyra; Telahoun, Sara; Canadell, Josep G; Grübler, Arnulf; Dhakal, Shobhakar; Creutzig, Felix
Publication Date: 2019

This dataset collects anthropogenic carbon dioxide emissions data, supplemented with various socio-economic and environmental factors, across 343 cities worldwide. It has dimensions 343 × 179 and combines CO2 emissions from CDP (187 cities, few in developing countries), the Bonn Center for Local Climate Action and Reporting (73 cities, mainly in developing countries), and data collected by Peking University (83 cities in China). Further, a set of socio-economic variables, called ancillary data, were collected from other datasets (e.g. socio-economic and traffic indices) or calculated (climate indices, urban area expansion), then combined with the emission data. The remaining attributes are descriptive (e.g. city name, country) or related to quality assurance/control checks. The file size is 1.8 MB, and the majority (88%) of the cities reported emissions between 2010 and 2015. Structurally, the dataset contains:

  • City Identification: Each entry includes city name, country, and other descriptive attributes.
  • CO₂ Emissions Data: Reported emissions with quality assurance/control checks.
  • Ancillary Variables: Socio-economic data, traffic indices, climate indices, urban area expansion metrics, and more.

Please open using Tab as separator and " as text delimiter.
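The opening instruction above can be expressed with Python's standard `csv` module, as in the following sketch. The column names and the two sample rows are hypothetical and only demonstrate the separator and delimiter settings.

```python
import csv
import io


def read_city_table(fileobj):
    """Parse a tab-separated table whose text fields are wrapped in double quotes."""
    reader = csv.DictReader(fileobj, delimiter="\t", quotechar='"')
    return list(reader)


# Hypothetical two-row excerpt; the real file has 343 rows and 179 columns.
sample = io.StringIO(
    '"city"\t"country"\t"emissions"\n'
    '"Example City"\t"Country A"\t"12.5"\n'
    '"Town\twith tab"\t"Country B"\t"3.0"\n'
)
rows = read_city_table(sample)
```

Setting the quote character matters because a quoted text field may itself contain a tab, as in the second sample row, and would otherwise be split into two columns.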
