ConceptNet

Publication Date: 2021
Creators: ConceptNet

ConceptNet aims to give computers access to common-sense knowledge, the kind of information that ordinary people know but usually leave unstated. ConceptNet is a semantic network that represents things that computers should know about the world, especially for the purpose of understanding text written by people. Its “concepts” are represented using words and phrases of many different natural languages; unlike similar projects, it is not limited to a single language such as English. It expresses over 13 million links between these concepts, and makes the whole data set available under a Creative Commons license.
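To make the idea of a multilingual semantic network concrete, here is a minimal sketch of storing and querying concept-to-concept links. It assumes ConceptNet-style URIs (`/c/<language>/<term>` for concepts, `/r/<Relation>` for relations); the three edges are illustrative examples rather than rows copied from the data set, and the `SemanticNetwork` class and `language_of` helper are hypothetical.

```python
from __future__ import annotations

from collections import defaultdict
from typing import NamedTuple


class Edge(NamedTuple):
    """One directed, labeled link between two concepts."""
    start: str     # e.g. "/c/en/dog"
    relation: str  # e.g. "/r/IsA"
    end: str       # e.g. "/c/en/animal"


class SemanticNetwork:
    """Hypothetical in-memory store of concept-to-concept links."""

    def __init__(self) -> None:
        # Map each concept URI to the edges that start at it.
        self._outgoing: dict[str, list[Edge]] = defaultdict(list)

    def add(self, start: str, relation: str, end: str) -> None:
        self._outgoing[start].append(Edge(start, relation, end))

    def related(self, concept: str, relation: str | None = None) -> list[str]:
        """Return concepts linked from `concept`, optionally filtered by relation."""
        return [e.end for e in self._outgoing.get(concept, [])
                if relation is None or e.relation == relation]


def language_of(concept_uri: str) -> str:
    """Extract the language code from an (assumed) URI shaped like /c/<lang>/<term>."""
    return concept_uri.split("/")[2]


# Illustrative edges only; concepts from different languages share one graph.
net = SemanticNetwork()
net.add("/c/en/dog", "/r/IsA", "/c/en/animal")
net.add("/c/en/dog", "/r/CapableOf", "/c/en/bark")
net.add("/c/en/dog", "/r/Synonym", "/c/fr/chien")

print(net.related("/c/en/dog", "/r/IsA"))                          # ['/c/en/animal']
print(sorted({language_of(c) for c in net.related("/c/en/dog")}))  # ['en', 'fr']
```

Keeping the language inside each concept identifier is what lets a single graph hold links both within and across languages.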

MUStARD: Multimodal Sarcasm Detection Dataset

Publication Date: 2019
Creators: Castro, Santiago; Hazarika, Devamanyu; Pérez-Rosas, Verónica; Zimmermann, Roger; Mihalcea, Rada; Poria, Soujanya

We release the MUStARD dataset, a multimodal video corpus for research in automated sarcasm discovery. The dataset is compiled from popular TV shows including Friends, The Golden Girls, The Big Bang Theory, and Sarcasmaholics Anonymous. MUStARD consists of audiovisual utterances annotated with sarcasm labels. Each utterance is accompanied by its context, which provides additional information on the scenario in which the utterance occurs.
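Because each labeled utterance comes with its preceding context, a common first step is to pair the two. The sketch below assumes a JSON file keyed by utterance ID whose records carry `utterance`, `speaker`, `context`, `context_speakers`, `show`, and `sarcasm` fields; treat those names as an assumption to check against the copy you download. It handles only the transcript text, not the audiovisual features, and the sample record is made up.

```python
from __future__ import annotations

import json
from dataclasses import dataclass


@dataclass
class SarcasmExample:
    """One target utterance plus the dialogue that precedes it."""
    utterance_id: str
    show: str
    context: list[tuple[str, str]]  # (speaker, line) pairs, in order
    speaker: str
    utterance: str
    is_sarcastic: bool


def load_examples(raw_json: str) -> list[SarcasmExample]:
    """Parse MUStARD-style JSON (assumed field names) into examples."""
    records = json.loads(raw_json)
    examples = []
    for uid, rec in records.items():
        examples.append(SarcasmExample(
            utterance_id=uid,
            show=rec["show"],
            # Pair each context line with the speaker who said it.
            context=list(zip(rec["context_speakers"], rec["context"])),
            speaker=rec["speaker"],
            utterance=rec["utterance"],
            is_sarcastic=bool(rec["sarcasm"]),
        ))
    return examples


def as_text(example: SarcasmExample) -> str:
    """Flatten context and target utterance into one transcript string."""
    lines = [f"{who}: {line}" for who, line in example.context]
    lines.append(f"{example.speaker}: {example.utterance}")
    return "\n".join(lines)


# A made-up record in the assumed layout (not taken from the dataset).
sample = json.dumps({
    "demo_1": {
        "utterance": "Oh great, another meeting.",
        "speaker": "A",
        "context": ["We have a meeting at five."],
        "context_speakers": ["B"],
        "show": "DEMO",
        "sarcasm": True,
    }
})

ex = load_examples(sample)[0]
print(ex.is_sarcastic)  # True
print(as_text(ex))
```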

Twitter US Airline Sentiment

Publication Date: 2016
Creators: Makone, Ashutosh

A sentiment analysis job about the problems of each major U.S. airline. Twitter data were scraped in February 2015, and contributors were asked first to classify tweets as positive, negative, or neutral, and then to categorize the reasons behind the negative ones (such as “late flight” or “rude service”).
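The two-stage labeling means a negative reason is only meaningful for tweets labeled negative in the first stage. A minimal sketch of tallying those reasons per airline follows; it assumes a CSV with `airline`, `airline_sentiment`, and `negativereason` columns (check the header of the file you actually obtain), and the inline rows are invented for illustration.

```python
import csv
import io
from collections import Counter, defaultdict

# Made-up rows in the assumed column layout; a real file may have more columns.
SAMPLE_CSV = """airline,airline_sentiment,negativereason,text
United,negative,Late Flight,Stuck at the gate again
United,negative,Customer Service Issue,On hold for an hour
United,positive,,Great crew today
Delta,negative,Late Flight,Two hour delay
Delta,neutral,,Is there wifi on this route?
"""


def negative_reasons_by_airline(rows):
    """Tally second-stage reasons, counting only tweets labeled negative."""
    tallies = defaultdict(Counter)
    for row in rows:
        # Stage 1 filter: only negative tweets were given a reason in stage 2.
        if row["airline_sentiment"] != "negative":
            continue
        reason = row["negativereason"] or "unspecified"
        tallies[row["airline"]][reason] += 1
    return tallies


rows = list(csv.DictReader(io.StringIO(SAMPLE_CSV)))
sentiment_counts = Counter(r["airline_sentiment"] for r in rows)
reasons = negative_reasons_by_airline(rows)

print(sentiment_counts["negative"])  # 3
print(dict(reasons["Delta"]))        # {'Late Flight': 1}
```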

Consumer Complaint Database

Publication Date: 2011
Creators: Consumer Financial Protection Bureau (CFPB)

Each week we send thousands of consumers’ complaints about financial products and services to companies for response. Those complaints are published here after the company responds, confirming a commercial relationship with the consumer, or after 15 days, whichever comes first. Complaint narratives are consumers’ descriptions of their experiences in their own words. By adding their voice, consumers help improve the financial marketplace. The database generally updates daily.
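The publication rule above amounts to taking the earlier of two dates. The sketch assumes the 15 days are calendar days counted from the date the complaint is sent to the company, and that only a response confirming a commercial relationship triggers earlier publication; `earliest_publication_date` is a hypothetical helper, not CFPB code, and it ignores the lag introduced by the database's update schedule.

```python
from datetime import date, timedelta
from typing import Optional

PUBLICATION_WINDOW = timedelta(days=15)  # assumed to be calendar days


def earliest_publication_date(
    sent_to_company: date,
    company_response: Optional[date] = None,
    confirms_relationship: bool = False,
) -> date:
    """Simplified reading of the rule: publish after a confirming response
    or after 15 days, whichever comes first."""
    deadline = sent_to_company + PUBLICATION_WINDOW
    if company_response is not None and confirms_relationship:
        return min(company_response, deadline)
    # No confirming response: the 15-day mark applies.
    return deadline


sent = date(2024, 3, 1)
print(earliest_publication_date(sent, date(2024, 3, 5), True))   # 2024-03-05
print(earliest_publication_date(sent))                           # 2024-03-16
print(earliest_publication_date(sent, date(2024, 3, 20), True))  # 2024-03-16
```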

Stanford Natural Language Inference (SNLI)

Publication Date: 2015
Creators: Bowman, Samuel R.; Angeli, Gabor; Potts, Christopher; Manning, Christopher D.

The Stanford Natural Language Inference (SNLI) corpus (version 1.0) is a collection of 570k human-written English sentence pairs manually labeled for balanced classification with the labels entailment, contradiction, and neutral. It is only about 0.09 GB in size.

It consists of training, validation, and test sets. The variables contained in each of these subsets are described below.

The data providers intend it to serve both as a benchmark for evaluating representational systems for text, especially those induced by representation-learning methods, and as a resource for developing NLP models of any kind.
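Each SNLI item is a premise/hypothesis pair with one of three labels. A minimal loading sketch follows, assuming the JSON Lines layout with `sentence1`, `sentence2`, and `gold_label` fields, where `"-"` marks pairs on which annotators reached no consensus (such pairs are commonly filtered out). Other redistributions, such as CSV copies, may use different column names; the records below are invented.

```python
import json
from collections import Counter
from typing import Iterable, Iterator, NamedTuple

LABELS = ("entailment", "contradiction", "neutral")


class Pair(NamedTuple):
    premise: str
    hypothesis: str
    label: str


def read_pairs(lines: Iterable[str]) -> Iterator[Pair]:
    """Yield labeled pairs from SNLI-style JSON Lines (assumed field names)."""
    for line in lines:
        if not line.strip():
            continue  # tolerate blank lines
        rec = json.loads(line)
        # "-" (no annotator consensus) is not one of the three labels; skip it.
        if rec["gold_label"] not in LABELS:
            continue
        yield Pair(rec["sentence1"], rec["sentence2"], rec["gold_label"])


# Made-up records; in practice, pass an open file for one of the splits.
sample = [
    '{"sentence1": "A man plays guitar.", "sentence2": "A person makes music.", "gold_label": "entailment"}',
    '{"sentence1": "A man plays guitar.", "sentence2": "The man is asleep.", "gold_label": "contradiction"}',
    '{"sentence1": "A man plays guitar.", "sentence2": "He is on stage.", "gold_label": "neutral"}',
    '{"sentence1": "A man plays guitar.", "sentence2": "He is famous.", "gold_label": "-"}',
]

pairs = list(read_pairs(sample))
print(len(pairs))  # 3
print(Counter(p.label for p in pairs))
```

Because `read_pairs` accepts any iterable of lines, the same function works on an open file object for any of the splits without loading the whole file into memory.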

The following paper introduces the corpus in detail. If you use the corpus in published work, please cite it:

Samuel R. Bowman, Gabor Angeli, Christopher Potts, and Christopher D. Manning. 2015. A large annotated corpus for learning natural language inference. In Proceedings of the 2015 Conference on Empirical Methods in Natural Language Processing (EMNLP).
