

Publication Date: 2021
Creators: ConceptNet

ConceptNet aims to give computers access to common-sense knowledge, the kind of information that ordinary people know but usually leave unstated. ConceptNet is a semantic network that represents things that computers should know about the world, especially for the purpose of understanding text written by people. Its “concepts” are represented using words and phrases of many different natural languages; unlike similar projects, it is not limited to a single language such as English. It expresses over 13 million links between these concepts and makes the whole data set available under a Creative Commons license.

MUStARD: Multimodal Sarcasm Detection Dataset

Publication Date: 2019
Creators: Castro, Santiago; Hazarika, Devamanyu; Pérez-Rosas, Verónica; Zimmermann, Roger; Mihalcea, Rada; Poria, Soujanya

We release the MUStARD dataset, a multimodal video corpus for research in automated sarcasm discovery. The dataset is compiled from popular TV shows including Friends, The Golden Girls, The Big Bang Theory, and Sarcasmaholics Anonymous. MUStARD consists of audiovisual utterances annotated with sarcasm labels. Each utterance is accompanied by its context, which provides additional information on the scenario where the utterance occurs.

Twitter US Airline Sentiment

Publication Date: 2016
Creators: Makone, Ashutosh

A sentiment analysis job about the problems of each major U.S. airline. Twitter data was scraped from February of 2015 and contributors were asked to first classify positive, negative, and neutral tweets, followed by categorizing negative reasons (such as “late flight” or “rude service”).

Consumer Complaint Database

Publication Date: 2011
Creators: Consumer Financial Protection Bureau (CFPB)

Each week we send thousands of consumers’ complaints about financial products and services to companies for response. Those complaints are published here after the company responds or after 15 days, whichever comes first. By adding their voice, consumers help improve the financial marketplace.

Stanford Natural Language Inference; SNLI

Publication Date: 2015
Creators: Bowman, Samuel R.; Angeli, Gabor; Potts, Christopher; Manning, Christopher D.

The Stanford Natural Language Inference (SNLI) corpus (version 1.0) is a collection of 570k human-written English sentence pairs manually labeled for balanced classification with the labels entailment, contradiction, and neutral. We aim for it to serve both as a benchmark for evaluating representational systems for text, especially those induced by representation-learning methods, and as a resource for developing NLP models of any kind.
