The Music Streaming Sessions Dataset (MSSD) is a large-scale collection of user interaction data from a music streaming service, designed to support research in user behavior modeling, music information retrieval, and session-based recommendation. Released in 2019, it contains approximately 160 million listening sessions covering approximately 3.7 million unique tracks, with a total size of 70 GB, making it one of the largest datasets available for analyzing how users engage with music streaming platforms. Each session records detailed user interactions, such as play, pause, skip, and seek actions, giving a granular view of how listeners interact with music over time. The dataset also provides metadata and audio features for each track, including track ID, artist name, album name, and genre, along with audio attributes such as tempo, key, and loudness. Together, the session logs and track features support studies of listening habits, session structure, music recommendation, user retention, and engagement.
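To make the session structure described above concrete, the following minimal Python sketch groups interaction records into sessions and computes a per-session skip rate. The field names (`session_id`, `position`, `track_id`, `action`) and the sample records are hypothetical, chosen only for illustration; they are not taken from the released files, whose actual schema should be checked against the dataset documentation.

```python
from collections import defaultdict

# Hypothetical interaction records. Field names and values are illustrative
# assumptions and do not reflect the dataset's actual schema.
records = [
    {"session_id": "s1", "position": 2, "track_id": "t_b", "action": "skip"},
    {"session_id": "s1", "position": 1, "track_id": "t_a", "action": "play"},
    {"session_id": "s1", "position": 3, "track_id": "t_c", "action": "skip"},
    {"session_id": "s2", "position": 1, "track_id": "t_a", "action": "play"},
    {"session_id": "s2", "position": 2, "track_id": "t_d", "action": "pause"},
]


def group_sessions(rows):
    """Group records by session and order each session by position."""
    sessions = defaultdict(list)
    for row in rows:
        sessions[row["session_id"]].append(row)
    for events in sessions.values():
        events.sort(key=lambda r: r["position"])
    return dict(sessions)


def skip_rate(events):
    """Fraction of a session's interactions that are skips."""
    if not events:
        return 0.0
    skips = sum(1 for e in events if e["action"] == "skip")
    return skips / len(events)


sessions = group_sessions(records)
rates = {sid: skip_rate(events) for sid, events in sessions.items()}

for sid, rate in sorted(rates.items()):
    print(f"{sid}: {len(sessions[sid])} interactions, skip rate {rate:.2f}")
```

With real data at this scale, the same grouping would typically be done in a streaming or chunked fashion rather than by holding all records in memory.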
Brost, Brian; Mehrotra, Rishabh; Jehan, Tristan (2019). The Music Streaming Sessions Dataset. WWW ’19: Proceedings of the 2019 World Wide Web Conference.
https://doi.org/10.1145/3308558.3313641
Meggetto, F., Revie, C., Levine, J., & Moshfeghi, Y. (2023). Why People Skip Music? On Predicting Music Skips using Deep Reinforcement Learning. Proceedings of the 2023 Conference on Human Information Interaction and Retrieval (CHIIR ’23), Austin, TX, USA.
https://doi.org/10.1145/3576840.3578312
Lyu, Yan & Dai, Sunhao & Wu, Peng & Dai, Quanyu & Deng, Yuhao & Hu, Wenjie & Dong, Zhenhua & Xu, Jun & Zhu, Shengyu & Zhou, Xiao-Hua. (2022). A Semi-Synthetic Dataset Generation Framework for Causal Inference in Recommender Systems.
https://doi.org/10.48550/arXiv.2202.11351
Hosseininasab, Amin & van Hoeve, Willem-Jan & Cire, Andre. (2022). Memory Efficient Tries for Sequential Pattern Mining.
https://doi.org/10.48550/arXiv.2202.06834
Heggli, O. A., Stupacher, J., & Vuust, P. (2021). Diurnal fluctuations in musical preference. Royal Society Open Science, 8(11), 210885.
https://doi.org/10.1098/rsos.210885
Hongyi Wen, Longqi Yang, and Deborah Estrin. 2019. Leveraging post-click feedback for content recommendations. In Proceedings of the 13th ACM Conference on Recommender Systems (RecSys '19). Association for Computing Machinery, New York, NY, USA, 278–286.
https://doi.org/10.1145/3298689.3347037
Francesco Meggetto, Crawford Revie, John Levine, and Yashar Moshfeghi. 2021. On Skipping Behaviour Types in Music Streaming Sessions. In Proceedings of the 30th ACM International Conference on Information & Knowledge Management (CIKM '21). Association for Computing Machinery, New York, NY, USA, 3333–3337.
https://doi.org/10.1145/3459637.3482123
Casper Hansen, Christian Hansen, Lucas Maystre, Rishabh Mehrotra, Brian Brost, Federico Tomasi, and Mounia Lalmas. 2020. Contextual and Sequential User Embeddings for Large-Scale Music Recommendation. In Proceedings of the 14th ACM Conference on Recommender Systems (RecSys '20). Association for Computing Machinery, New York, NY, USA, 53–62.
https://doi.org/10.1145/3383313.3412248
Cheng, L., Guo, R., Moraffah, R., Candan, K.S., Raglin, A.J., & Liu, H. (2019). A Practical Data Repository for Causal Learning with Big Data. BenchCouncil International Symposium.
https://doi.org/10.1007/978-3-030-49556-5_23
Chang, S., Lee, S., & Lee, K. (2019). Sequential Skip Prediction with Few-shot in Streamed Music Contents. ArXiv, abs/1901.08203.
https://doi.org/10.13140/RG.2.2.34790.88647