Due to the relevance of the COVID-19 global pandemic, we are releasing our dataset of tweets acquired from the Twitter Stream related to COVID-19 chatter. Since our first release we have received additional data from our new collaborators, allowing this resource to grow to its current size. Dedicated data gathering started on March 11th, yielding over 4 million tweets a day. We have added data provided by our new collaborators covering January 27th to March 27th, to provide extra longitudinal coverage. Version 10 added ~1.5 million tweets in the Russian language collected between January 1st and May 8th, graciously provided to us by Katya Artemova (NRU HSE) and Elena Tutubalina (KFU). From version 12 we have included daily hashtags, mentions and emojis and their frequencies in the respective zip files. From version 14 we have included the tweet identifiers and their respective language for the clean version of the dataset. Since version 20 we have included language and place location for all tweets. The dataset is 14.2 GB in size.
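Because the release pairs tweet identifiers with their language, a common first step is to select the identifiers for one language before retrieving the tweets themselves. The following is a minimal sketch of that step, not the authors' tooling: it assumes a tab-separated file with a header row, and the column names `tweet_id` and `lang` are hypothetical placeholders that should be checked against the actual files in the release.

```python
import csv
import io
from collections import Counter
from typing import Iterator, TextIO


def iter_tweet_ids(handle: TextIO, lang: str,
                   id_col: str = "tweet_id",
                   lang_col: str = "lang") -> Iterator[str]:
    """Yield tweet identifiers whose language tag equals `lang`.

    The column names are assumptions for illustration only. Identifiers
    are kept as strings so they round-trip unchanged.
    """
    reader = csv.DictReader(handle, delimiter="\t")
    for row in reader:
        if row[lang_col] == lang:
            yield row[id_col]


def language_counts(handle: TextIO, lang_col: str = "lang") -> Counter:
    """Count how many rows carry each language tag."""
    reader = csv.DictReader(handle, delimiter="\t")
    return Counter(row[lang_col] for row in reader)


# Tiny in-memory stand-in for a real identifier file (made-up values).
SAMPLE = "tweet_id\tlang\n1\ten\n2\tru\n3\ten\n"

if __name__ == "__main__":
    print(list(iter_tweet_ids(io.StringIO(SAMPLE), "en")))
    print(language_counts(io.StringIO(SAMPLE)))
```

Both functions read the file row by row, so a multi-gigabyte file never has to be loaded into memory at once; for a real file, pass an opened file handle instead of the in-memory sample.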
Publications Citing This Dataset:
Edgar León-Sandoval, Mahdi Zareei, Liliana Ibeth Barbosa-Santillán, Luis Eduardo Falcón Morales, Antonio Pareja Lora, Gilberto Ochoa Ruiz, "Monitoring the Emotional Response to the COVID-19 Pandemic Using Sentiment Analysis: A Case Study in Mexico", Computational Intelligence and Neuroscience, vol. 2022, Article ID 4914665, 11 pages, 2022. https://doi.org/10.1155/2022/4914665
Kirn, S.L., Hinders, M.K. Ridge count thresholding to uncover coordinated networks during onset of the Covid-19 pandemic. Soc. Netw. Anal. Min. 12, 45 (2022). https://doi.org/10.1007/s13278-022-00873-0
Preiss, J. Predicting the impact of online news articles – is information necessary?. Multimed Tools Appl 82, 8791–8809 (2023). https://doi.org/10.1007/s11042-021-11621-5
Lopez, C.E., Gallemore, C. An augmented multilingual Twitter dataset for studying the COVID-19 infodemic. Soc. Netw. Anal. Min. 11, 102 (2021). https://doi.org/10.1007/s13278-021-00825-0
Rajdeep Mukherjee, Atharva Naik, Sriyash Poddar, Soham Dasgupta, and Niloy Ganguly. 2021. Understanding the Role of Affect Dimensions in Detecting Emotions from Tweets: A Multi-task Approach. In Proceedings of the 44th International ACM SIGIR Conference on Research and Development in Information Retrieval (SIGIR '21). Association for Computing Machinery, New York, NY, USA, 2303–2307. https://doi.org/10.1145/3404835.3463080
Lyu JC, Luli GK. Understanding the Public Discussion About the Centers for Disease Control and Prevention During the COVID-19 Pandemic Using Twitter Data: Text Mining Analysis Study. J Med Internet Res 2021;23(2):e25108. doi:10.2196/25108
O'Leary, DE, Storey, VC. A Google–Wikipedia–Twitter Model as a Leading Indicator of the Numbers of Coronavirus Deaths. Intell Sys Acc Fin Mgmt. 2020; 27: 151–158. https://doi.org/10.1002/isaf.1482