Fact-Checking Facebook Politics Pages

Publication Date: 2016
Creators: Silverman, Craig; Strapagiel, Lauren; Shaban, Hamza; Hall, Ellie; Singer-Vine, Jeremy

This repository contains the data and analysis for the BuzzFeed News article, “Hyperpartisan Facebook Pages Are Publishing False And Misleading Information At An Alarming Rate,” published October 20, 2016.

Characterizing Online Discussion Using Coarse Discourse Sequences

Publication Date: 2017
Creators: Zhang, Amy; Culbertson, Brian; Paritosh, Praveen

In this work, we present a novel method for classifying comments in online discussions into a set of coarse discourse acts towards the goal of better understanding discussions at scale. To facilitate this study, we devise a categorization of coarse discourse acts designed to encompass general online discussion and allow for easy annotation by crowd workers. We collect and release a corpus of over 9,000 threads comprising over 100,000 comments manually annotated via paid crowdsourcing with discourse acts and randomly sampled from the site Reddit. Using our corpus, we demonstrate how the analysis of discourse acts can characterize different types of discussions, including discourse sequences such as Q&A pairs and chains of disagreement, as well as different communities. Finally, we conduct experiments to predict discourse acts using our corpus, finding that structured prediction models such as conditional random fields can achieve an F1 score of 75%. We also demonstrate how the broadening of discourse acts from simply question and answer to a richer set of categories can improve the recall performance of Q&A extraction.

Stack Exchange Data

Publication Date: 2014
Creators: Stack Exchange Inc.

This is an anonymized dump of all user-contributed content on the Stack Exchange network. Each site is formatted as a separate archive consisting of XML files zipped via 7-zip using bzip2 compression. Each site archive includes Posts, Users, Votes, Comments, PostHistory and PostLinks.
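As a rough illustration of working with one of these XML files after extraction, the sketch below streams records from a Posts-style file in Python. It assumes each record is a `row` element whose fields are stored as attributes. The sample data, the attribute names shown, and the helper name `iter_rows` are hypothetical and used only for illustration, so check them against the actual files in the archive.

```python
import io
import xml.etree.ElementTree as ET


def iter_rows(source):
    """Yield the attributes of each <row> element as a dict.

    Assumption: records are <row .../> elements with fields as attributes.
    iterparse streams the file, so large dumps need not fit in memory.
    """
    for _event, elem in ET.iterparse(source, events=("end",)):
        if elem.tag == "row":
            yield dict(elem.attrib)
            elem.clear()  # release the element once its data is copied


# Tiny made-up sample standing in for an extracted Posts.xml file.
SAMPLE = b"""<?xml version="1.0" encoding="utf-8"?>
<posts>
  <row Id="1" PostTypeId="1" Score="5" />
  <row Id="2" PostTypeId="2" Score="3" />
</posts>"""

rows = list(iter_rows(io.BytesIO(SAMPLE)))
for row in rows:
    print(row["Id"], row["Score"])
```

For a real file, pass a path or an open binary file object to `iter_rows` in place of the in-memory sample.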

3 Million Russian troll tweets

Publication Date: 2018
Creators: FiveThirtyEight; Warren, Patrick; Linvill, Darren

This directory contains data on nearly 3 million tweets sent from Twitter handles connected to the Internet Research Agency, a Russian “troll factory” and a defendant in an indictment filed by the Justice Department in February 2018, as part of special counsel Robert Mueller’s Russia investigation. The tweets in this database were sent between February 2012 and May 2018, with the vast majority posted from 2015 through 2017.
