
Soccer Power Index (SPI) Ratings

Creators: FiveThirtyEight
Publication Date: 2022

This file contains links to the data behind our Club Soccer Predictions and Global Club Soccer Rankings. These data analyse soccer team performances worldwide, providing valuable insights into team strengths and match outcomes.

The SPI database covers data up to the year 2022, has a total size of 52.6 kB, and includes the following metrics:

  • SPI Rating: An overall measure of a team’s strength, combining offensive and defensive capabilities.

  • Offensive and Defensive Ratings: Separate evaluations of a team’s attacking and defensive proficiencies.

  • Match Probabilities: Predicted probabilities for home win, away win, and draw outcomes, offering insights into expected match results.

  • Projected Scores: Anticipated goal counts for both home and away teams, aiding in match analysis and forecasting.
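As a rough illustration of how the match-level metrics listed above might be used together, the sketch below parses a few rows and reports the most likely outcome and the projected goal difference for each match. The column names (`prob_home`, `prob_draw`, `prob_away`, `proj_home`, `proj_away`) and the two sample rows are made up for this example and are not taken from the actual files.

```python
import csv
import io

# Hypothetical sample; column names and values are illustrative only.
SAMPLE = """home,away,prob_home,prob_draw,prob_away,proj_home,proj_away
Team A,Team B,0.55,0.25,0.20,1.8,1.0
Team C,Team D,0.30,0.28,0.42,1.1,1.4
"""


def most_likely_outcome(row):
    """Return the label of the outcome with the highest predicted probability."""
    probs = {
        "home win": float(row["prob_home"]),
        "draw": float(row["prob_draw"]),
        "away win": float(row["prob_away"]),
    }
    # The three outcomes are exhaustive, so the probabilities should sum to ~1.
    if abs(sum(probs.values()) - 1.0) > 0.01:
        raise ValueError("probabilities do not sum to 1")
    return max(probs, key=probs.get)


def summarize(text):
    """Parse CSV text into (home, away, likely outcome, projected goal difference)."""
    results = []
    for row in csv.DictReader(io.StringIO(text)):
        goal_diff = float(row["proj_home"]) - float(row["proj_away"])
        results.append(
            (row["home"], row["away"], most_likely_outcome(row), round(goal_diff, 2))
        )
    return results


if __name__ == "__main__":
    for line in summarize(SAMPLE):
        print(line)
```

A positive goal difference favours the home team; with real data, the column names would need to be replaced by those used in the downloaded files.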

 

2020 U.S. Election Emails

Creators: Mathur, Arunesh; Wang, Angelina; Schwemmer, Carsten; Hamin, Maia; Stewart, Brandon M; Narayanan, Arvind
Publication Date: 2023

This is a preliminary release of the code and data associated with the research paper “Manipulative tactics are the norm in political emails: Evidence from 300K emails from the 2020 U.S. election cycle”.

The corpus contains emails from over 3,000 political campaigns and organizations in the 2020 election cycle in the U.S. The corpus aims to be comprehensive and includes coverage of emails from the candidates in prominent federal and state races as well as political organizations such as Political Action Committees (PACs) and political parties active in the 2020 cycle. We automated the process of signing up to receive emails from the websites of the political campaigns and organizations. For each entity’s website, if the bot discovered an email sign-up form, it filled it in with the information of a fictional recipient. The entire dataset contains 317,366 emails.

Comparative Constitutions Project Data

Creators: Comparative Constitutions Project
Publication Date: 2022

The dataset includes information on 799 constitutional systems and 2,999 amendments across various countries since 1789. The primary objective of the CCP is to record the characteristics of national constitutions written since 1789. The CCP aims to fill this informational gap by providing systematic data to comparative legal scholars for analysis long before they provide advice to constitution drafters. It is our hope that the analysis of, and insights from, these data will promote peace, justice, and human development through the constitution-making process. The dataset has a size of approximately 435 kB and is organized into several key components:

  • Chronology of Constitutional Events: This component documents each constitutional event (e.g., adoption, amendment, suspension) for recognized independent states since 1789.

  • Constitutional Texts: The CCP has collected the texts of nearly every constitutional system as well as most amendments, providing a repository for textual analysis.

  • Characteristics of National Constitutions: This dataset includes approximately 650 variables coded for each constitution, detailing various aspects such as governmental structure, rights, and amendment processes.

Pile of Law: Learning Responsible Data Filtering from the Law and a 256GB Open-Source Legal Dataset

Creators: Henderson, Peter; Krass, Mark S.; Zheng, Lucia; Guha, Neel; Manning, Christopher D.; Jurafsky, Dan; Ho, Daniel E.
Publication Date: 2022

We curate a large corpus of legal and administrative data. The utility of this data is twofold: (1) to aggregate legal and administrative data sources that demonstrate different norms and legal standards for data filtering; (2) to collect a dataset that can be used in the future for pretraining legal-domain language models, a key direction in access-to-justice initiatives. The data encompasses a vast number of observations, meticulously collected from 35 distinct sources. These sources include court opinions, contracts, administrative rules, legislative records, and more, reflecting various norms and legal standards for data filtering. The dataset has a size of 256GB. The temporal coverage of the dataset varies across its subsets, as each source spans different time ranges. For instance, U.S. court opinions from CourtListener are synchronized as of December 31, 2022, while the Federal Register includes draft rulemaking documents filed by agencies over an extended period.

Political risks and Covid-19 measures

Creators: Firm-level-risk
Publication Date: 2019

This website aggregates firm-level measures of exposure, risk, and sentiment constructed using textual analysis of quarterly earnings conference calls held by more than 11,000 listed firms in 81 countries. The data draws on work in three papers:
In our paper “Firm-Level Political Risk: Measurement and Effects,” as published in the Quarterly Journal of Economics, we construct measures of political sentiment and risk ranging from 2002 to 2021q2.

In “The Global Impact of Brexit Uncertainty,” we extend this methodology to construct measures of the costs, benefits, and risks associated with specific shocks, such as the UK’s decision to leave the EU. Our measures of Brexit exposure, risk, and sentiment are currently updated through the first quarter of 2019.

In a third paper, “Firm-level Exposure to Epidemic Disease: Covid-19, SARS, and H1N1,” we apply the same methodology to construct measures of the costs, benefits, and risks that individual firms associate with the spread of Covid-19, SARS, H1N1, Ebola, Zika, and MERS, ranging from 2002 to 2021q2.

The dataset (CSV) has a size of about 76.1 MB.

Creators: Remschel, Tobias; Kroeber, Corinna

We introduce a unique dataset containing all written communication published by the German Bundestag between 1949 and 2017. Increasing numbers of scholars make use of protocols of parliamentary speeches, parliamentary questions, or the texts of legislative drafts in various fields of comparative politics including representation, responsiveness, professionalization and political careers, or parliamentary agenda studies. Since preparing parliamentary documents is rather resource-intensive, these studies remain limited to single points in time, types of documents and/or policy areas. The long time horizon and various types of documents covered by our new comprehensive dataset will enable scholars interested in parliaments, parties and representatives to answer various innovative research questions related to legislative studies. (2020-11-14)

The dataset has a size of about 1.1 GB.

Creators: Rauh, Christian; Schwalbach, Jan

This dataset is an extensive collection of parliamentary speeches from nine representative democracies, offering valuable insights into legislative discourse across different political systems. ParlSpeech V2 contains complete full-text vectors of more than 6.3 million parliamentary speeches in the key legislative chambers of Austria, the Czech Republic, Germany, Denmark, the Netherlands, New Zealand, Spain, Sweden, and the United Kingdom, covering periods between 21 and 32 years. Metadata include information on date, speaker, party, and, in part, the agenda item under which a speech was held. The accompanying release note provides a more detailed guide to the data (2020-03-11). The speeches span from 1987 to 2018 and the full dataset has a size of 1.5 GB.

It is organized into separate R Data files for each country, with each file containing:

  • debate: Title of the debate (if available).

  • party: Name of the party to which the speaker belongs.

  • text: Full text of the speech as recorded by the parliament.

  • speaker: Name of the individual delivering the speech.
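To show how records with the fields listed above could be worked with once loaded, here is a minimal sketch that counts speeches and words per party. It assumes the speeches have already been converted from the R Data files into Python dictionaries; that conversion step, the helper name `words_per_party`, and the sample records are assumptions for illustration only and are not part of the dataset.

```python
from collections import defaultdict

# Hypothetical records; real speeches would come from the per-country files.
SPEECHES = [
    {"debate": "Budget", "party": "Party X", "text": "We support the proposal."},
    {"debate": "Budget", "party": "Party Y", "text": "We cannot agree."},
    {"debate": None, "party": "Party X", "text": "Thank you."},  # debate title missing
]


def words_per_party(speeches):
    """Return {party: (number of speeches, total whitespace-separated words)}."""
    counts = defaultdict(lambda: [0, 0])
    for speech in speeches:
        entry = counts[speech["party"]]
        entry[0] += 1                            # one more speech for this party
        entry[1] += len(speech["text"].split())  # naive word count
    return {party: tuple(values) for party, values in counts.items()}


if __name__ == "__main__":
    for party, (n_speeches, n_words) in words_per_party(SPEECHES).items():
        print(party, n_speeches, n_words)
```

Splitting on whitespace is only a crude word count; any serious text analysis would use a proper tokenizer suited to each language.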

The Market for Data Privacy

Creators: Ramadorai, Tarun; Uettwiller, Antoine; Walther, Ansgar
Publication Date: 2019

The dataset covers an analysis of privacy policies across U.S. firms, providing valuable insights into corporate data privacy practices. We scrape a comprehensive set of U.S. firms’ privacy policies, and study them alongside firms’ web data extraction behaviour. We find considerable and systematic variation in privacy policies along multiple dimensions including ease of access, length, readability, and clarity, both within and between industries. Surprisingly, firms’ data extraction is strongly and positively related to the length and complexity of their privacy policies. Firms with intermediate levels of technical sophistication have longer, more complex policies. A simple signalling model of firms engaging in data extraction in an economy with both myopic and sophisticated consumers helps to rationalize these findings. The dataset has a size of 12.1 kB and reflects privacy policies up to the year 2019. In total, it includes privacy attributes for 7,020 U.S. firms and the full texts of privacy policies for 3,047 firms.
