Stanford Natural Language Inference; SNL

The Stanford Natural Language Inference (SNLI) corpus (version 1.0) is a collection of 570k human-written English sentence pairs manually labeled for balanced classification with the labels entailment, contradiction, and neutral. It is only 0.09GB large

It consists of a training, validation, and test set. The variables contained in each of these sub datasets is described below.

The data providers aim for it to serve both as a benchmark for evaluating representational systems for text, especially including those induced by representation-learning methods, as well as a resource for developing NLP models of any kind.

The following paper introduces the corpus in detail. If you use the corpus in published work, please cite it:

Samuel R. Bowman, Gabor Angeli, Christopher Potts, and Christopher D. Manning. 2015. A large annotated corpus for learning natural language inference. In Proceedings of the 2015 Conference on Empirical Methods in Natural Language Processing (EMNLP).

Variables:
Name Description
gold_label This is the label chosen by the majority of annotators. Where no majority exists, this is ‘-‘, and the pair should not be included when evaluating hard classification accuracy.
sentence1_binary_parse The same parse as in sentence{1,2}_parse, but formatted for use in tree-structured neural networks with no unary nodes and no labels.
sentence2_binary_parse The same parse as in sentence{1,2}_parse, but formatted for use in tree-structured neural networks with no unary nodes and no labels.
sentence1_parse The parse produced by the Stanford Parser (3.5.2, case insensitive PCFG, trained on the standard training set augmented with the parsed Brown Corpus) in Penn Treebank format.
sentence2_parse The parse produced by the Stanford Parser (3.5.2, case insensitive PCFG, trained on the standard training set augmented with the parsed Brown Corpus) in Penn Treebank format.
sentence1 The premise caption that was supplied to the author of the pair.
sentence2 The hypothesis caption that was written by the author of the pair.
captionID A unique identifier for each sentence1 from the original Flickr30k example.
pairID A unique identifier for each sentence1–sentence2 pair.
label1 These are all of the individual labels from annotators in phases 1 and 2. The first label comes from the phase 1 author, and is the only label for examples that did not undergo phase 2 annotation. In a few cases, the one of the phase 2 labels may be blank, indicating that an annotator saw the example but could not annotate it.
label2 These are all of the individual labels from annotators in phases 1 and 2. The first label comes from the phase 1 author, and is the only label for examples that did not undergo phase 2 annotation. In a few cases, the one of the phase 2 labels may be blank, indicating that an annotator saw the example but could not annotate it.
label3 These are all of the individual labels from annotators in phases 1 and 2. The first label comes from the phase 1 author, and is the only label for examples that did not undergo phase 2 annotation. In a few cases, the one of the phase 2 labels may be blank, indicating that an annotator saw the example but could not annotate it.
label4 These are all of the individual labels from annotators in phases 1 and 2. The first label comes from the phase 1 author, and is the only label for examples that did not undergo phase 2 annotation. In a few cases, the one of the phase 2 labels may be blank, indicating that an annotator saw the example but could not annotate it.
label5 These are all of the individual labels from annotators in phases 1 and 2. The first label comes from the phase 1 author, and is the only label for examples that did not undergo phase 2 annotation. In a few cases, the one of the phase 2 labels may be blank, indicating that an annotator saw the example but could not annotate it.
Publication Date:
2015
Creators:
Bowman, Samuel R.; Angeli, Gabor; Potts, Christopher; Manning, Christopher D.
Publisher:
Stanford University
Companies:
Stanford University; Stanford University; Stanford University; Stanford University
Size:
0.09
Formats:
JSON; CSV
License:
Creative Commons Attribution Share Alike 4.0 International
Data Category:
Sentiment and Opinion Data

Sign In

Register

Reset Password

Please enter your username or email address, you will receive a link to create a new password via email.