The CEPS EurLex dataset: 142.036 EU laws from 1952-2019 with full text and 22 variables

Borret, Camille; Laurer, Moritz
Publication Date:
Dataset Description:

The dataset contains 142.036 EU laws, almost the entire corpus of the EU’s digitally available legal acts passed between 1952 and 2019. It encompasses the three types of legally binding acts passed by the EU institutions, all in English: 102.304 regulations, 4.070 directives and 35.798 decisions. The dataset was scraped from the official EU legal database and transformed into machine-readable CSV format with the programming languages R and Python. The dataset was collected by the Centre for European Policy Studies (CEPS) for the TRIGGER project. We hope that it will facilitate future quantitative and computational research on the EU.
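Because each row carries the full text of an act, a plain CSV reader can hit its per-field size limit. The following is a minimal sketch of streaming such a file with Python's standard library. It assumes the column names listed in the variable table; the in-memory sample and its values are illustrative stand-ins, not rows from the dataset, and the helper name `iter_acts` is hypothetical.

```python
import csv
import io

# Full-text fields can exceed the csv module's default per-field limit
# (131072 characters), so raise the limit before reading.
csv.field_size_limit(2**31 - 1)


def iter_acts(handle):
    """Yield one dict per act from an open CSV file handle (hypothetical helper)."""
    yield from csv.DictReader(handle)


# Tiny in-memory stand-in for the real file; the values are illustrative only.
sample = io.StringIO(
    "CELEX,Act_name,Status,act_raw_text\n"
    '32016R0679,"Example regulation",In Force,"' + "x" * 200_000 + '"\n'
    '31995L0046,"Example directive",Not in Force,"short text"\n'
)

# Stream the rows and keep only the identifiers of acts marked as in force.
in_force = [row["CELEX"] for row in iter_acts(sample) if row["Status"] == "In Force"]
print(in_force)
```

For the real file, the same function could be given a handle from `open(path, newline="", encoding="utf-8")`, where the path and the encoding are assumptions to check against the downloaded file.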

| Name | Description |
| --- | --- |
| Act_amends | CELEX number of the older act amended by this act (see detail on CELEX below) |
| Act_cites | CELEX numbers of other acts cited by the act |
| Act_name | Full name of the act |
| act_raw_text | Full raw text of the act in one string. Mostly includes title, recitals, legal articles and annex. Please note that the text of older laws is not always clean. |
| Additional_info | Additional information |
| Amends_links | Link to the previous act that is amended by this act |
| Authors | Names of the act’s authors |
| CELEX | Unique CELEX identifier of the act |
| Cites_links | Links to other acts cited by the act |
| Date_document | Date of the document. The website does not explain which exact date in the legislative process this represents. The dataset ranges from 1952 to August 2019. |
| Date_publication | Date the document was published |
| ELI_link | European Legislation Identifier (ELI) link to the act |
| EUROVOC | Group of EuroVoc keywords associated with the act |
| Eurlex_link | Link to the act on the EUR-Lex website |
| First_entry_into_force | Date when the act first entered into force |
| Legal_basis_celex | CELEX number of the act’s legal basis |
| Oeil_link | Link to the European Parliament’s Legislative Observatory (Oeil), which provides procedural information |
| Procedure_number | Number of the legislative procedure leading to the act |
| Proposal_link | Link to the Commission proposal preceding the act |
| Status | Whether the act was in force at the time of scraping (August 2019): “In Force” or “Not in Force” |
| Subject_matter | Group of keywords representing the subject matter of the act. Similar to EUROVOC, but less detailed and more abstract. |
| Temporal_status | Date of end of validity of the act |
| Treaty | Name of the Treaty the act is based on |
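Several variables hold CELEX numbers, and the act type is not listed as a separate variable. As a hedged sketch, the type and year can be derived from the identifier itself, assuming the values follow the usual CELEX layout for legislation: one sector digit, a four-digit year, a type letter (R for regulation, L for directive, D for decision), then a running number. The function name `parse_celex` and the sample identifiers are chosen for illustration; whether every value in the dataset matches this pattern is an assumption to verify.

```python
import re
from collections import Counter

# Assumed layout: sector digit, 4-digit year, 1-2 type letters, running number.
CELEX_RE = re.compile(r"^(?P<sector>\d)(?P<year>\d{4})(?P<type>[A-Z]{1,2})(?P<number>\d+)")

# Type letters for the three kinds of binding acts covered by the dataset.
ACT_TYPES = {"R": "regulation", "L": "directive", "D": "decision"}


def parse_celex(celex):
    """Split a CELEX number into its parts; return None if it does not match."""
    match = CELEX_RE.match(celex.strip())
    if match is None:
        return None
    return {
        "sector": match["sector"],
        "year": int(match["year"]),
        # Unknown type letters are passed through unchanged.
        "type": ACT_TYPES.get(match["type"], match["type"]),
        "number": match["number"],
    }


# Illustrative identifiers: one regulation, one directive, one decision.
examples = ["32016R0679", "31995L0046", "32010D0087"]
type_counts = Counter(parse_celex(c)["type"] for c in examples)
print(type_counts)
```

Applied to the `CELEX` column, a tally like `type_counts` gives a quick check against the regulation, directive and decision totals stated in the description.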
Publisher: Harvard Dataverse
Size: 1.5 GB
License: Creative Commons Zero v1.0 Universal

