Pile of Law: Learning Responsible Data Filtering from the Law and a 256GB Open-Source Legal Dataset

We curate a large corpus of legal and administrative data. The utility of this data is twofold: (1) to aggregate legal and administrative data sources that demonstrate different norms and legal standards for data filtering; (2) to collect a dataset that can be used in the future for pretraining legal-domain language models, a key direction in access-to-justice initiatives.
