Pile of Law: Learning Responsible Data Filtering from the Law and a 256GB Open-Source Legal Dataset

Creators:

Henderson, Peter; Krass, Mark S.; Zheng, Lucia; Guha, Neel; Manning, Christopher D.; Jurafsky, Dan; Ho, Daniel E.

Publication Date:

2022

Data Category:

Dataset Description:

We curate a large corpus of legal and administrative data. The utility of this data is twofold: (1) to aggregate legal and administrative data sources that demonstrate different norms and legal standards for data filtering; (2) to collect a dataset that can be used in the future for pretraining legal-domain language models, a key direction in access-to-justice initiatives. The corpus is collected from 35 sources, including court opinions, contracts, administrative rules, legislative records, and more, and totals 256GB. Temporal coverage varies by subset because each source spans a different time range. For instance, U.S. court opinions from CourtListener are synchronized as of December 31, 2022, while the Federal Register subset contains draft rulemaking documents filed by agencies over an extended period.

Variables:

Name	Description
text	The document text.
created_timestamp	The timestamp of the document's creation, if the original source provided one. These timestamps may be inaccurate: for example, CourtListener case opinions carry the time the opinion was uploaded to CourtListener, not the time it was published.
downloaded_timestamp	The time at which the document was scraped.
url	The source URL.
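
The four variables above can be read as a simple record schema. The following is a minimal sketch, assuming each document is serialized as one JSON object per line; that storage format, the `LegalDocument` class, the `parse_records` helper, and the sample values are illustrative assumptions, not part of this record. Timestamps are kept as plain strings because their format is not specified here.

```python
import json
from dataclasses import dataclass
from typing import Iterable, Iterator, Optional


@dataclass
class LegalDocument:
    """Hypothetical container mirroring the four listed variables."""
    text: str
    url: str
    downloaded_timestamp: str
    # Optional: only present when the original source provided it,
    # and it may reflect an upload time rather than a publication time.
    created_timestamp: Optional[str] = None


def parse_records(lines: Iterable[str]) -> Iterator[LegalDocument]:
    """Parse JSON Lines input (an assumed format) into LegalDocument records."""
    for line in lines:
        line = line.strip()
        if not line:
            continue  # skip blank lines
        raw = json.loads(line)
        yield LegalDocument(
            text=raw["text"],
            url=raw["url"],
            downloaded_timestamp=raw["downloaded_timestamp"],
            # Treat a missing or empty value as "not provided".
            created_timestamp=raw.get("created_timestamp") or None,
        )


# Made-up sample records for illustration only.
sample_lines = [
    json.dumps({
        "text": "Opinion of the court ...",
        "created_timestamp": "2020-01-15",
        "downloaded_timestamp": "2022-06-01",
        "url": "https://example.org/opinion/1",
    }),
    json.dumps({
        "text": "Proposed rule ...",
        "created_timestamp": "",
        "downloaded_timestamp": "2022-06-02",
        "url": "https://example.org/rule/2",
    }),
]

docs = list(parse_records(sample_lines))
with_created = [d for d in docs if d.created_timestamp is not None]
```

Keeping `created_timestamp` optional makes the caveat in the table explicit in code: downstream filtering by date should not assume the field is present or that it marks the publication date.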
