AmazonQA

We introduce a new dataset and propose a method that combines information retrieval techniques for selecting relevant reviews (given a question) with “reading comprehension” models for synthesizing an answer (given a question and a review). Our dataset consists of 923k questions, 3.6M answers and 14M reviews across 156k products. Building on the well-known Amazon dataset, we collect additional annotations, marking each question as either answerable or unanswerable based on the available reviews.

Variables:
Name Description
questionText String. The question.
questionType String. Either “yesno” for a boolean question, or “descriptive” for a non-boolean question.
review_snippets List of strings. Extracted review snippets relevant to the question (at most ten).
answerText String. The text for the answer.
answerType String. Type of the answer.
helpful List of two integers. The first integer indicates the number of users who found the answer helpful. The second integer indicates the total number of responses.
asin String. Unique ID of the product the question pertains to.
qid Integer. Unique ID of the question (across the entire dataset).
category String. Product category.
top_review_wilson String. The review with the highest Wilson score.
top_review_helpful String. The review voted as most helpful by the users.
is_answerable Boolean. Output of the answerability classifier indicating whether the question is answerable using the review snippets.
top_sentences_IR List of strings. The top sentences (at most ten), ranked by IR score against the question.
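
The variables above can be sketched as a typed record. The following is a minimal illustration, not the dataset's official loader: it assumes each record is stored as one JSON object per line (the storage format is not stated here), models only a subset of the fields, and uses a made-up sample record. The names QARecord, parse_record and helpful_fraction are hypothetical helpers introduced for this example.

```python
import json
from dataclasses import dataclass, fields
from typing import List, Optional


@dataclass
class QARecord:
    # Subset of the variables listed above; field names match the table.
    questionText: str
    questionType: str            # "yesno" or "descriptive"
    review_snippets: List[str]   # at most ten snippets
    answerText: str
    helpful: List[int]           # [users who found it helpful, total responses]
    asin: str
    qid: int
    is_answerable: bool

    def helpful_fraction(self) -> Optional[float]:
        """Share of responses that found the answer helpful (None if no responses)."""
        found_helpful, total = self.helpful
        return found_helpful / total if total else None


def parse_record(line: str) -> QARecord:
    """Parse one JSON-encoded record. The JSON-lines encoding is an assumption."""
    raw = json.loads(line)
    if raw["questionType"] not in ("yesno", "descriptive"):
        raise ValueError(f"unexpected questionType: {raw['questionType']!r}")
    if len(raw["review_snippets"]) > 10:
        raise ValueError("more than ten review snippets")
    if len(raw["helpful"]) != 2:
        raise ValueError("helpful must hold exactly two integers")
    # Keep only the fields modelled above; any extra keys are ignored.
    return QARecord(**{f.name: raw[f.name] for f in fields(QARecord)})


# Made-up record for illustration only; not taken from the dataset.
sample_line = json.dumps({
    "questionText": "Does this fit a 15 inch laptop?",
    "questionType": "yesno",
    "review_snippets": ["My 15 inch laptop fits with room to spare."],
    "answerText": "Yes, it does.",
    "helpful": [3, 4],
    "asin": "B000000000",
    "qid": 1,
    "category": "Electronics",
    "is_answerable": True,
})
record = parse_record(sample_line)
print(record.questionType, record.helpful_fraction())  # prints: yesno 0.75
```

Validating questionType, the snippet count, and the shape of helpful at parse time surfaces malformed rows early instead of during model training.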
