"LaCour!" - a multilingual dataset of hearing transcripts with 2 million+ tokens

Dataset Summary

This dataset contains transcripts of court hearings, sourced from the official webcasts of the European Court of Human Rights (https://www.echr.coe.int/webcasts-of-hearings). The hearings are 154 selected webcasts (videos) from 2012-2022 in their original language (no interpretation). Language labels were annotated manually, and the extracted audio was processed automatically with pyannote and whisper-large-v2. The resulting dataset contains 4000 speaker turns and 88920 individual lines. The dataset has two subsets: transcripts, and documents, which holds the metadata with linked documents. The transcripts are additionally available as .txt or .xml.

The languages with the largest share of the transcripts are: English, French

A smaller portion also contains the following languages:

Russian, Spanish, Croatian, Italian, Portuguese, Turkish, Polish, Lithuanian, German, Ukrainian, Hungarian, Dutch, Albanian, Romanian, Serbian

The collected metadata is in: English

Dataset Structure

Data Instances

Each instance in transcripts represents an entire segment of a transcript, similar to a conversation turn in a dialog.

{
  'id': 0,
  'webcast_id': '1021112_29112017',
  'segment_id': 0,
  'speaker_name': 'UNK',
  'speaker_role': 'Announcer',
  'data': {
    'begin': [12.479999542236328],
    'end': [13.359999656677246],
    'language': ['fr'],
    'text': ['La Cour!']
  }
}
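
Judging from the sample instance, the fields under data appear to be parallel lists: position i of begin, end, language and text together describes one transcribed line of the segment. The following is a minimal Python sketch of how a segment could be flattened into one record per line. It assumes the four lists always have equal length; the helper name iter_lines and the derived duration field are hypothetical and not part of the dataset.

```python
from typing import Any, Dict, Iterator

# Sample instance copied from the example above.
segment = {
    "id": 0,
    "webcast_id": "1021112_29112017",
    "segment_id": 0,
    "speaker_name": "UNK",
    "speaker_role": "Announcer",
    "data": {
        "begin": [12.479999542236328],
        "end": [13.359999656677246],
        "language": ["fr"],
        "text": ["La Cour!"],
    },
}


def iter_lines(seg: Dict[str, Any]) -> Iterator[Dict[str, Any]]:
    """Yield one record per transcribed line in a segment (hypothetical helper)."""
    data = seg["data"]
    # Assumption: the four lists are aligned index by index.
    columns = (data["begin"], data["end"], data["language"], data["text"])
    if len({len(col) for col in columns}) != 1:
        raise ValueError("data lists differ in length")
    for begin, end, language, text in zip(*columns):
        yield {
            "webcast_id": seg["webcast_id"],
            "segment_id": seg["segment_id"],
            "speaker_role": seg["speaker_role"],
            "begin": begin,
            "end": end,
            "duration": end - begin,  # derived field, in the same unit as begin/end
            "language": language,
            "text": text,
        }


lines = list(iter_lines(segment))
print(lines[0]["language"], lines[0]["text"])
```

Keeping webcast_id and segment_id on every record means the flattened lines can still be grouped back into speaker turns or matched to the hearing they came from.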

Each instance in documents holds information on a document in HUDOC that is associated with a hearing, together with the metadata of that hearing. The actual document is linked and can also be found in HUDOC with the case_id. Note: hearing_type states the type of the hearing, type states the type of the document. If the hearing is a “Grand Chamber hearing”, the “CHAMBER” document refers to a different hearing.


{
  'id': 16,
  'webcast_id': '1232311_02102012',
  'hearing_title': 'Michaud v. France (nos. 12323/11)',
  'hearing_date': '2012-10-02 00:00:00',
  'hearing_type': 'Chamber hearing',
  'application_number': ['12323/11'],
  'case_id': '001-115377',
  'case_name': 'CASE OF MICHAUD v. FRANCE',
  'case_url': 'https://hudoc.echr.coe.int/eng?i=001-115377',
  'ecli': 'ECLI:CE:ECHR:2012:1206JUD001232311',
  'type': 'CHAMBER',
  'document_date': '2012-12-06 00:00:00',
  'importance': 1,
  'articles': ['8', '8-1', '8-2', '34', '35'],
  'respondent_government': ['FRA'],
  'issue': 'Decision of the National Bar Council of 12 July 2007 "adopting regulations on internal procedures for implementing the obligation to combat money laundering and terrorist financing, and an internal supervisory mechanism to guarantee compliance with those procedures" ; Article 21-1 of the Law of 31 December 1971 ; Law no. 2004-130 of 11 February 2004 ; Monetary and Financial Code',
  'strasbourg_caselaw': 'André and Other v. France, no 18603/03, 24 July 2008;Bosphorus Hava Yollari Turizm ve Ticaret Anonim Sirketi v. Ireland [GC], no 45036/98, ECHR 2005-VI;[…]',
  'external_sources': 'Directive 91/308/EEC, 10 June 1991;Article 6 of the Treaty on European Union;Charter of Fundamental Rights of the European Union;Articles 169, 170, 173, 175, 177, 184 and 189 of the Treaty establishing the European Community;Recommendations 12 and 16 of the financial action task force ("FATF") on money laundering;Council of Europe Convention on Laundering, Search, Seizure and Confiscation of the Proceeds from Crime and on the Financing of Terrorism (16 May 2005)',
  'conclusion': 'Remainder inadmissible;No violation of Article 8 – Right to respect for private and family life (Article 8-1 – Respect for correspondence;Respect for private life)',
  'separate_opinion': True
}
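
Several fields of a documents instance are stored as plain strings. In the sample, both dates follow a 'YYYY-MM-DD HH:MM:SS' pattern, and external_sources packs several entries into one string separated by semicolons. The Python sketch below shows one way such a record might be normalised, and how the note on hearing_type and type could be turned into a flag. The helper names split_entries and normalise_document, the DATE_FORMAT constant and the from_other_hearing key are hypothetical; the date format and the separator are assumptions drawn from this single sample.

```python
from datetime import datetime
from typing import Any, Dict, List

# Sample record copied from the instance above; external_sources is cut to its
# first three entries for brevity, and the other long text fields are omitted.
document = {
    "id": 16,
    "webcast_id": "1232311_02102012",
    "hearing_date": "2012-10-02 00:00:00",
    "hearing_type": "Chamber hearing",
    "case_id": "001-115377",
    "type": "CHAMBER",
    "document_date": "2012-12-06 00:00:00",
    "external_sources": (
        "Directive 91/308/EEC, 10 June 1991;"
        "Article 6 of the Treaty on European Union;"
        "Charter of Fundamental Rights of the European Union"
    ),
}

DATE_FORMAT = "%Y-%m-%d %H:%M:%S"  # assumed from the sample values


def split_entries(value: str) -> List[str]:
    """Split a semicolon-separated field into stripped, non-empty entries."""
    return [part.strip() for part in value.split(";") if part.strip()]


def normalise_document(doc: Dict[str, Any]) -> Dict[str, Any]:
    """Return a copy with parsed dates and list-valued sources (hypothetical helper)."""
    out = dict(doc)
    out["hearing_date"] = datetime.strptime(doc["hearing_date"], DATE_FORMAT)
    out["document_date"] = datetime.strptime(doc["document_date"], DATE_FORMAT)
    out["external_sources"] = split_entries(doc["external_sources"])
    # Per the note above: a CHAMBER document attached to a Grand Chamber
    # hearing refers to a different hearing.
    out["from_other_hearing"] = (
        doc["hearing_type"] == "Grand Chamber hearing" and doc["type"] == "CHAMBER"
    )
    return out


record = normalise_document(document)
days_to_document = (record["document_date"] - record["hearing_date"]).days
print(days_to_document, record["from_other_hearing"], len(record["external_sources"]))
```

The same split is deliberately not applied to conclusion: in the sample, that field contains a semicolon inside parentheses ("Respect for correspondence;Respect for private life"), so a naive split would cut one conclusion into two pieces.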


