AmazonQA

Creators:
Gupta, Mansi; Kulkarni, Nitish; Chanda, Raghuveer; Rayasam, Anirudha; Lipton, Zachary C.
Publication Date:
2019
Data Category:
Dataset Description:
We introduce a new dataset and propose a method that combines information retrieval techniques for selecting relevant reviews (given a question) and “reading comprehension” models for synthesizing an answer (given a question and review). Our dataset consists of 923k questions, 3.6M answers and 14M reviews across 156k products. Building on the well-known Amazon dataset, we collect additional annotations, marking each question as either answerable or unanswerable based on the available reviews. This dataset is particularly valuable for developing models that integrate information retrieval techniques to select relevant reviews and "reading comprehension" models to synthesize answers based on those reviews. The dataset is approximately 4 GB in size and is available in JSON format.

The dataset uses the following variables:

  • questionText: The text of the question posed by the consumer.

  • questionType: Indicates whether the question is 'yes/no' for boolean questions or 'descriptive' for open-ended questions.

  • review_snippets: A list of extracted review snippets relevant to the question (up to ten).

  • answerText: The text of the answer provided.

  • answerType: Specifies the type of answer.

  • helpful: A list containing two integers; the first indicates the number of users who found the answer helpful, and the second indicates the total number of responses.

  • asin: The unique Amazon Standard Identification Number (ASIN) for the product the question pertains to.

  • qid: A unique question identifier within the dataset.

  • category: The product category.

  • top_review_wilson: The review with the highest Wilson score.

  • top_review_helpful: The review voted as most helpful by users.

  • is_answerable: A boolean indicating whether the question is answerable using the review snippets, based on an answerability classifier.

  • top_sentences_IR: A list of top sentences (up to ten) based on Information Retrieval (IR) score with the question.

Variables:
Details:

Sign In

Register

Reset Password

Please enter your username or email address, you will receive a link to create a new password via email.