The Stanford Natural Language Inference (SNLI) Corpus

Creators:
Bowman, Samuel R.; Angeli, Gabor; Potts, Christopher; Manning, Christopher D.
Publication Date:
2015
Data Category:
Dataset Description:
The Stanford Natural Language Inference (SNLI) corpus (version 1.0) is a collection of 570k human-written English sentence pairs manually labeled for balanced classification with the labels entailment, contradiction, and neutral. It is about 0.09 GB in size and consists of a training set, a validation set, and a test set. The variables contained in each of these subsets are described below. The data providers intend it to serve both as a benchmark for evaluating representational systems for text, especially those induced by representation-learning methods, and as a resource for developing NLP models of any kind. The following paper introduces the corpus in detail. If you use the corpus in published work, please cite it: Samuel R. Bowman, Gabor Angeli, Christopher Potts, and Christopher D. Manning. 2015. A large annotated corpus for learning natural language inference. In Proceedings of the 2015 Conference on Empirical Methods in Natural Language Processing (EMNLP).
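As a rough illustration of the three-way labeling described above, the following Python sketch tallies labels over a handful of sentence pairs. It assumes one JSON record per line with the field names `gold_label`, `sentence1`, and `sentence2`; this record does not specify the file layout, so treat those names as assumptions. The sample pairs are invented for illustration and are not taken from the corpus.

```python
import json
from collections import Counter

# The three labels named in the dataset description.
LABELS = {"entailment", "contradiction", "neutral"}


def count_labels(lines):
    """Count labels over an iterable of JSON-lines records.

    Assumes each record has a "gold_label" field (an assumption, not
    confirmed by this record). Records whose label is not one of the
    three expected values are skipped.
    """
    counts = Counter()
    for line in lines:
        line = line.strip()
        if not line:
            continue  # ignore blank lines
        record = json.loads(line)
        label = record.get("gold_label")
        if label in LABELS:
            counts[label] += 1
    return counts


# Hypothetical sample records, made up for this sketch.
sample = [
    '{"gold_label": "entailment", "sentence1": "A man plays a guitar.", "sentence2": "A person makes music."}',
    '{"gold_label": "contradiction", "sentence1": "A man plays a guitar.", "sentence2": "A man is asleep."}',
    '{"gold_label": "neutral", "sentence1": "A man plays a guitar.", "sentence2": "A man performs at a wedding."}',
    '{"gold_label": "neutral", "sentence1": "A dog runs on a beach.", "sentence2": "The dog is chasing a ball."}',
]

counts = count_labels(sample)
for label in sorted(LABELS):
    print(label, counts[label])
```

To run the same tally on a real file, pass an open file handle to `count_labels` in place of the sample list, adjusting the field name if the downloaded files use a different one.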
Variables:
Details:
