Due to the relevance of the COVID-19 global pandemic, we are releasing our dataset of tweets acquired from the Twitter Stream related to COVID-19 chatter. Since our first release we have received additional data from our new collaborators, allowing this resource to grow to its current size. Dedicated data gathering started on March 11th, yielding over 4 million tweets a day. We have added data provided by our new collaborators covering January 27th to March 27th, to provide extra longitudinal coverage. Version 10 added ~1.5 million tweets in the Russian language collected between January 1st and May 8th, graciously provided to us by Katya Artemova (NRU HSE) and Elena Tutubalina (KFU). From version 12 we have included daily hashtags, mentions and emojis and their frequencies in the respective zip files. From version 14 we have included the tweet identifiers and their respective language for the clean version of the dataset. Since version 20 we have included language and place location for all tweets. The dataset is 14.2 GB in size.
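Because the clean version pairs each tweet identifier with its language, a common first step is to select the identifiers for one language or to tally how many tweets each language has. The following is only a minimal sketch of that step. It assumes a tab-separated file with a header row and columns named `tweet_id` and `lang`; those column names, the sample rows and the helper function names are illustrative assumptions, not the documented layout of the release, so check the actual files before reusing it.

```python
import csv
import io
from collections import Counter


def filter_ids_by_language(tsv_file, language):
    """Return the tweet IDs whose language code equals `language`.

    Assumes a tab-separated file with a header row containing the
    (hypothetical) column names `tweet_id` and `lang`.
    """
    reader = csv.DictReader(tsv_file, delimiter="\t")
    return [row["tweet_id"] for row in reader if row["lang"] == language]


def count_languages(tsv_file):
    """Count how many tweets appear for each language code."""
    reader = csv.DictReader(tsv_file, delimiter="\t")
    return Counter(row["lang"] for row in reader)


# Made-up rows standing in for a real file; not actual dataset content.
SAMPLE_TSV = "tweet_id\tlang\n1001\ten\n1002\tru\n1003\ten\n1004\tes\n"

english_ids = filter_ids_by_language(io.StringIO(SAMPLE_TSV), "en")
language_counts = count_languages(io.StringIO(SAMPLE_TSV))

print(english_ids)
print(language_counts.most_common(1))
```

For a real file, replace `io.StringIO(SAMPLE_TSV)` with `open(path, newline="", encoding="utf-8")`. Both helpers read the file row by row, which matters for a multi-gigabyte dataset, although the list of matching IDs returned by the filter is still held in memory.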
Publications Citing This Dataset:
Edgar León-Sandoval, Mahdi Zareei, Liliana Ibeth Barbosa-Santillán, Luis Eduardo Falcón Morales, Antonio Pareja Lora, Gilberto Ochoa Ruiz, "Monitoring the Emotional Response to the COVID-19 Pandemic Using Sentiment Analysis: A Case Study in Mexico", Computational Intelligence and Neuroscience, vol. 2022, Article ID 4914665, 11 pages, 2022. https://doi.org/10.1155/2022/4914665
Kirn, S.L., Hinders, M.K. Ridge count thresholding to uncover coordinated networks during onset of the Covid-19 pandemic. Soc. Netw. Anal. Min. 12, 45 (2022). https://doi.org/10.1007/s13278-022-00873-0
Preiss, J. Predicting the impact of online news articles – is information necessary?. Multimed Tools Appl 82, 8791–8809 (2023). https://doi.org/10.1007/s11042-021-11621-5
Lopez, C.E., Gallemore, C. An augmented multilingual Twitter dataset for studying the COVID-19 infodemic. Soc. Netw. Anal. Min. 11, 102 (2021). https://doi.org/10.1007/s13278-021-00825-0
Rajdeep Mukherjee, Atharva Naik, Sriyash Poddar, Soham Dasgupta, and Niloy Ganguly. 2021. Understanding the Role of Affect Dimensions in Detecting Emotions from Tweets: A Multi-task Approach. In Proceedings of the 44th International ACM SIGIR Conference on Research and Development in Information Retrieval (SIGIR '21). Association for Computing Machinery, New York, NY, USA, 2303–2307. https://doi.org/10.1145/3404835.3463080
Lyu JC, Luli GK. Understanding the Public Discussion About the Centers for Disease Control and Prevention During the COVID-19 Pandemic Using Twitter Data: Text Mining Analysis Study. J Med Internet Res 2021;23(2):e25108. https://doi.org/10.2196/25108
O'Leary, DE, Storey, VC. A Google–Wikipedia–Twitter Model as a Leading Indicator of the Numbers of Coronavirus Deaths. Intell Sys Acc Fin Mgmt. 2020; 27: 151–158. https://doi.org/10.1002/isaf.1482