Characterizing Online Discussion Using Coarse Discourse Sequences

Creators:

Zhang, Amy; Culbertson, Brian; Paritosh, Praveen

Publication Date:

2017

Dataset Description:

In this work, we present a novel method for classifying comments in online discussions into a set of coarse discourse acts, with the goal of better understanding discussions at scale. To facilitate this study, we devise a categorization of coarse discourse acts designed to encompass general online discussion and to allow for easy annotation by crowd workers. We collect and release a corpus of over 9,000 threads comprising over 100,000 comments, randomly sampled from Reddit and manually annotated with discourse acts via paid crowdsourcing. Using our corpus, we demonstrate how the analysis of discourse acts can characterize different types of discussions, including discourse sequences such as Q&A pairs and chains of disagreement, as well as different communities. Finally, we conduct experiments to predict discourse acts using our corpus, finding that structured prediction models such as conditional random fields can achieve an F1 score of 75%. We also demonstrate how broadening the set of discourse acts from simply question and answer to a richer set of categories can improve recall in Q&A extraction.
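To illustrate what a discourse sequence such as a Q&A pair or a chain of disagreement might look like in practice, the following is a minimal sketch that counts parent-to-reply discourse act pairs in a single thread. The record layout, the field names (`id`, `in_reply_to`, `act`), the helper `act_pairs`, and the sample thread are all hypothetical and chosen for illustration; they are not taken from the released corpus and may not match its actual schema.

```python
from collections import Counter

# Hypothetical thread: field names and values are illustrative assumptions,
# not the schema of the released corpus.
thread = [
    {"id": "c1", "in_reply_to": None, "act": "question"},
    {"id": "c2", "in_reply_to": "c1", "act": "answer"},
    {"id": "c3", "in_reply_to": "c2", "act": "appreciation"},
    {"id": "c4", "in_reply_to": "c1", "act": "answer"},
    {"id": "c5", "in_reply_to": "c4", "act": "disagreement"},
    {"id": "c6", "in_reply_to": "c5", "act": "disagreement"},
]


def act_pairs(comments):
    """Count (parent act, reply act) pairs over the reply tree of one thread."""
    by_id = {c["id"]: c for c in comments}
    pairs = Counter()
    for c in comments:
        parent = by_id.get(c["in_reply_to"])  # None for the top-level post
        if parent is not None:
            pairs[(parent["act"], c["act"])] += 1
    return pairs


pairs = act_pairs(thread)
for (parent_act, reply_act), n in sorted(pairs.items()):
    print(f"{parent_act} -> {reply_act}: {n}")
```

Under these assumptions, a `question -> answer` pair corresponds to a Q&A pair, and a `disagreement -> disagreement` pair is one link in a chain of disagreement. Aggregating such counts per community is one simple way to compare how discussions differ across communities.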

Publications Citing This Dataset:

Dutta, Chakraborty, and Das. (2019). How did the discussion go: Discourse act classification in social media conversations. Linking and Mining Heterogeneous and Multi-view Data, 137-160.
https://doi.org/10.1007/978-3-030-01872-6_6
