Audio books data (audio + text)
Creators:
Usborne Publishing Ltd
Publication Date:
2017-xx-xx
Data Category:
Dataset Description:
The dataset contains audiobook recordings of a female British English speaker (Lesley Sims), released for the Blizzard Challenge 2017 to support speech synthesis research. It comprises approximately 6.5 hours of speech across 56 children's audiobooks, offering high-quality, natural speech recordings paired with their textual content. The total size of the dataset is 765 MB. Its temporal coverage is the period leading up to the Blizzard Challenge 2017, for which the data was specifically released. Structurally, the dataset consists of:
- Audio Files: High-quality recordings of Lesley Sims narrating various texts. These files capture natural prosody and articulation, essential for developing and testing speech synthesis models.
- Text Files: Corresponding textual content for each audio recording, facilitating alignment between spoken and written language.
- Label Files: Sentence-level segmentation and alignment between the text and speech for a portion of the data. These labels were initially created by Toshiba's Cambridge Research Laboratory and later re-processed by the University of Edinburgh to ensure they correspond accurately to the original audio recordings.
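As a rough illustration of how the audio and text components might be paired for use, the sketch below matches files by shared base name. This record does not document the directory layout or file extensions, so the folder arguments, the `.wav` and `.txt` extensions, and the helper name `pair_by_stem` are all assumptions made for the example.

```python
from pathlib import Path


def pair_by_stem(audio_dir, text_dir, audio_ext=".wav", text_ext=".txt"):
    """Pair audio and text files that share a base name.

    Hypothetical helper: the directory layout and the file extensions
    are assumptions, not documented properties of the dataset.
    Returns (pairs, unmatched), where pairs is a list of
    (audio_path, text_path) tuples and unmatched is a sorted list of
    base names found on only one side.
    """
    # Index each side by file name without extension, searching recursively.
    audio = {p.stem: p for p in Path(audio_dir).rglob(f"*{audio_ext}")}
    text = {p.stem: p for p in Path(text_dir).rglob(f"*{text_ext}")}

    # Base names present on both sides form the usable pairs.
    common = sorted(audio.keys() & text.keys())
    pairs = [(audio[stem], text[stem]) for stem in common]

    # Base names present on exactly one side are reported, not dropped silently.
    unmatched = sorted(audio.keys() ^ text.keys())
    return pairs, unmatched
```

Calling `pair_by_stem("audio", "text")` on an unpacked copy would return the matched pairs along with any base names lacking a counterpart, which is a quick way to check that every recording has its text before further processing. If the actual files use a different naming scheme, the matching key would need to be adapted.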
Variables:
Details: