MUStARD: Multimodal Sarcasm Detection Dataset

Creators:

Castro, Santiago; Hazarika, Devamanyu; Pérez-Rosas, Verónica; Zimmermann, Roger; Mihalcea, Rada; Poria, Soujanya

Publication Date:

2019

Data Category:

Dataset Description:

We release the MUStARD dataset, a multimodal video corpus for research in automated sarcasm discovery. The dataset is compiled from popular TV shows including Friends, The Golden Girls, The Big Bang Theory, and Sarcasmaholics Anonymous. MUStARD consists of audiovisual utterances annotated with sarcasm labels. Each entry combines a textual transcript, an audio signal, and visual cues, enabling analysis of sarcasm as it manifests across different channels. Each utterance is accompanied by its preceding conversational context, which shows how prior dialogue influences the interpretation of sarcasm. The dataset was compiled and released in 2019 and is approximately 11.9 kB in size. Key numbers:

Total Utterances: 690
Sarcastic Utterances: 345
Non-Sarcastic Utterances: 345

The dataset is organized in JSON format with the following fields:

utterance: The text of the target utterance to classify.
speaker: Speaker of the target utterance.
context: List of utterances (in chronological order) preceding the target utterance.
context_speakers: Respective speakers of the context utterances.
sarcasm: Binary label indicating sarcasm (1 for sarcastic, 0 for non-sarcastic).
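
The field layout above can be checked with a short Python sketch. This is a minimal illustration under stated assumptions, not the dataset's official loader: the top-level JSON structure (a mapping from an ID to a record, or a plain list of records) is assumed, the two sample records are invented placeholders rather than entries from MUStARD, and the helper names `load_records`, `label_counts`, and `format_dialogue` are hypothetical.

```python
import json
from collections import Counter

# Hypothetical two-record sample. IDs, speakers, and text are placeholders
# invented for illustration; they are NOT entries copied from MUStARD.
SAMPLE_JSON = """
{
  "example_1": {
    "utterance": "Oh great, another meeting.",
    "speaker": "SPEAKER_A",
    "context": ["We have a meeting at nine.", "And another one at ten."],
    "context_speakers": ["SPEAKER_B", "SPEAKER_B"],
    "sarcasm": 1
  },
  "example_2": {
    "utterance": "I will bring the slides.",
    "speaker": "SPEAKER_B",
    "context": ["Who is presenting today?"],
    "context_speakers": ["SPEAKER_A"],
    "sarcasm": 0
  }
}
"""

# The five fields named in the dataset description.
REQUIRED_FIELDS = ("utterance", "speaker", "context", "context_speakers", "sarcasm")


def load_records(text):
    """Parse JSON text and return a list of validated records."""
    data = json.loads(text)
    # Assumption: the top level is either {id: record, ...} or [record, ...].
    records = list(data.values()) if isinstance(data, dict) else list(data)
    for rec in records:
        missing = [name for name in REQUIRED_FIELDS if name not in rec]
        if missing:
            raise ValueError(f"record is missing fields: {missing}")
        # Each context utterance should have exactly one speaker.
        if len(rec["context"]) != len(rec["context_speakers"]):
            raise ValueError("context and context_speakers differ in length")
    return records


def label_counts(records):
    """Count sarcastic (1) and non-sarcastic (0) records.

    bool() is applied first so that 1/0 and true/false labels are both accepted.
    """
    return Counter(int(bool(rec["sarcasm"])) for rec in records)


def format_dialogue(rec):
    """Render the context followed by the target utterance, one turn per line."""
    turns = [f"{s}: {u}" for s, u in zip(rec["context_speakers"], rec["context"])]
    turns.append(f"{rec['speaker']}: {rec['utterance']}")
    return "\n".join(turns)


if __name__ == "__main__":
    records = load_records(SAMPLE_JSON)
    counts = label_counts(records)
    print("sarcastic:", counts[1], "non-sarcastic:", counts[0])
    print(format_dialogue(records[0]))
```

Run against the full dataset file instead of the sample, `label_counts` would be expected to report the 345/345 split listed under the key numbers, provided the file matches the assumed structure.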

Publications Citing This Dataset:

Băroiu, Alexandru-Costin, and Ștefan Trăușan-Matu. 2023. "Comparison of Deep Learning Models for Automatic Detection of Sarcasm Context on the MUStARD Dataset" Electronics 12, no. 3: 666. https://doi.org/10.3390/electronics12030666
Dushyant Singh Chauhan, Dhanush S R, Asif Ekbal, and Pushpak Bhattacharyya. 2020. "Sentiment and Emotion help Sarcasm? A Multi-task Learning Framework for Multi-Modal Sarcasm, Sentiment and Emotion Analysis" In Proceedings of the 58th Annual Meeting of the Association for Computational Linguistics, pages 4351–4360, Online. Association for Computational Linguistics. https://doi.org/10.18653/v1/2020.acl-main.401

Variables:

Name	Description
utterance	The text of the target utterance to classify.
speaker	Speaker of the target utterance.
context	List of utterances (in chronological order) preceding the target utterance.
context_speakers	Respective speakers of the context utterances.
sarcasm	Binary label indicating sarcasm (1 for sarcastic, 0 for non-sarcastic).
