This American Life podcast transcripts

Creators:
Julian McAuley
Publication Date:
2020
Data Category:
Dataset Description:
This dataset contains transcripts of This American Life podcast episodes from 1995 to 2020, together with the associated episode audio. It covers 663 episodes and approximately 637.70 hours of audio (about 58 minutes per episode on average); each episode is one observation. Each transcript includes the episode acts, speaker names, speaker utterances, and utterance lengths, and is paired with the episode audio. Because it provides real-world examples of long-form, multi-speaker conversation, the dataset is suited to developing and testing methods for automatic speech recognition, speaker diarization, and natural language understanding.
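As a rough illustration of how the transcript structure described above might be used, the sketch below totals speaking time per speaker from utterance-level records. The record layout and the field names (`episode`, `act`, `speaker`, `start`, `end`, `text`) are assumptions made for this example, not the dataset's documented schema, and the sample values are invented placeholders.

```python
from collections import defaultdict

# Hypothetical utterance records. Field names and values are assumptions
# for illustration only; check the actual files for the real schema.
sample_utterances = [
    {"episode": "ep-1", "act": "prologue", "speaker": "Host",
     "start": 0.0, "end": 12.5, "text": "(placeholder)"},
    {"episode": "ep-1", "act": "prologue", "speaker": "Guest",
     "start": 12.5, "end": 20.0, "text": "(placeholder)"},
    {"episode": "ep-1", "act": "act1", "speaker": "Host",
     "start": 20.0, "end": 30.0, "text": "(placeholder)"},
]


def speaking_time_by_speaker(utterances):
    """Sum utterance durations (end - start, in seconds) for each speaker."""
    totals = defaultdict(float)
    for utt in utterances:
        totals[utt["speaker"]] += utt["end"] - utt["start"]
    return dict(totals)


if __name__ == "__main__":
    for speaker, seconds in speaking_time_by_speaker(sample_utterances).items():
        print(f"{speaker}: {seconds:.1f} s")
```

Per-speaker totals like these are a common first check before diarization experiments, since they show how unevenly talk time is distributed within an episode.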
Variables:
Details:
