Twitter Dataset

Creators:
Cheng, Zhiyuan; Caverlee, James; Lee, Kyumin
Publication Date:
2010
Data Category:
Dataset Description:
This dataset is a collection of scraped public Twitter updates, gathered for an academic project studying the geolocation of Twitter users. It provides both the training set and the test set used in the paper "You Are Where You Tweet: A Content-Based Approach to Geo-locating Twitter Users" (CIKM 2010). The tweets span five months, from September 2009 to January 2010.

The training set contains 115,886 Twitter users and 3,844,612 updates from those users. All user locations in the training set are self-labeled, lie in the United States, and are given at city-level granularity. The test set contains 5,136 Twitter users and 5,156,047 tweets from those users. All user locations in the test set were uploaded from the users' smartphones in the form "UT: Latitude,Longitude". In total, the dataset has a size of 30,0 kB.

Structurally, the dataset is divided into four text files with tab-separated fields. The training set users file ("training_set_users.txt") contains user information in the format "UserID\tUserLocation", and the training set tweets file ("training_set_tweets.txt") stores tweets in the format "UserID\tTweetID\tTweet\tCreatedAt". The test set users file ("test_set_users.txt") and the test set tweets file ("test_set_tweets.txt") follow the same formats as their training set counterparts.
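The file formats described above could be read with a few small helpers. The following is a minimal Python sketch, not code shipped with the dataset. It assumes the fields are separated by tab characters, that a tweet's text may itself contain tabs, and that test set locations look exactly like "UT: Latitude,Longitude" with decimal numbers. The function names, the `Tweet` type, and the sample values are hypothetical and chosen only for illustration.

```python
import re
from typing import NamedTuple, Optional, Tuple


class Tweet(NamedTuple):
    """One record from a tweets file (hypothetical container type)."""
    user_id: str
    tweet_id: str
    text: str
    created_at: str


def parse_user_line(line: str) -> Optional[Tuple[str, str]]:
    """Parse 'UserID<TAB>UserLocation'; return None for malformed lines."""
    parts = line.rstrip("\r\n").split("\t", 1)
    if len(parts) != 2:
        return None
    return parts[0], parts[1]


def parse_tweet_line(line: str) -> Optional[Tweet]:
    """Parse 'UserID<TAB>TweetID<TAB>Tweet<TAB>CreatedAt'.

    Assumption: if the tweet text contains tabs, the first two fields and
    the last field are still the IDs and the timestamp, so everything in
    between is rejoined as the text.
    """
    parts = line.rstrip("\r\n").split("\t")
    if len(parts) < 4:
        return None
    return Tweet(parts[0], parts[1], "\t".join(parts[2:-1]), parts[-1])


# Assumed shape of a test set location: "UT: <latitude>,<longitude>".
UT_PATTERN = re.compile(r"^UT:\s*(-?\d+(?:\.\d+)?)\s*,\s*(-?\d+(?:\.\d+)?)$")


def parse_ut_location(location: str) -> Optional[Tuple[float, float]]:
    """Return (latitude, longitude), or None if the string is not in UT form."""
    match = UT_PATTERN.match(location.strip())
    if match is None:
        return None
    return float(match.group(1)), float(match.group(2))
```

For example, `parse_user_line` could be applied to each line of "test_set_users.txt", with the resulting location string passed to `parse_ut_location`. Returning `None` for malformed lines, rather than raising, lets a caller count and skip bad records in scraped data.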
Variables:
Details:
