The Million Song Dataset is a large-scale music dataset created by The Echo Nest and LabROSA to advance research in music information retrieval and recommendation systems. It contains metadata for one million contemporary music tracks, including details such as song titles, artists, release years, and genres, as well as audio features like tempo, loudness, and key.
Publications Citing This Dataset:
Ikotun, A. M., Ezugwu, A. E., Abualigah, L., Abuhaija, B., & Heming, J. (2023). K-means clustering algorithms: A comprehensive review, variants analysis, and advances in the era of big data. Information Sciences, 622, 178-210.
Afsar, M. M., Crump, T., & Far, B. (2022). Reinforcement learning based recommender systems: A survey. ACM Computing Surveys, 55(7), 1-38.
Gorishniy, Y., Rubachev, I., Khrulkov, V., & Babenko, A. (2021). Revisiting deep learning models for tabular data. Advances in Neural Information Processing Systems, 34, 18932-18943.
Liang, D., Krishnan, R. G., Hoffman, M. D., & Jebara, T. (2018, April). Variational autoencoders for collaborative filtering. In Proceedings of the 2018 World Wide Web Conference (pp. 689-698).
Choi, K., Fazekas, G., Sandler, M., & Cho, K. (2017, March). Convolutional recurrent neural networks for music classification. In 2017 IEEE International Conference on Acoustics, Speech and Signal Processing (ICASSP) (pp. 2392-2396). IEEE.
Variables:
Variable: Description
Song_ID: Unique identifier for each track in the dataset.
Title: Title of the song.
Artist: Name of the artist or band that performed the song.
Album: Name of the album on which the song was released.
Release_Year: Year in which the song was originally released.
Genre: Music genre classification for the song (e.g., rock, pop, jazz).
Tempo: Speed of the song, measured in beats per minute (BPM).
Loudness: Average volume level of the song in decibels (dB).
Key: Key in which the song is composed (e.g., C major, G minor).
Time_Signature: Meter of the song, indicating the number of beats per measure (e.g., 4/4, 3/4).
Mode: Tonal mode of the song, typically major or minor.
Song_Similarity: Data indicating similarity between songs, used for recommendation and playlist generation.
Artist_Popularity: Measure of the artist’s popularity, often based on play counts or ranking systems.
Artist_Biographical_Data: Biographical information about the artist, such as origin, genre, and career timeline.
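As a rough illustration of how the variables listed above might be represented in code, the sketch below models one track as a Python dataclass. The field types, the default values, the Track class, and the tracks_in_tempo_range helper are all assumptions made for this example; the listing above does not specify storage types, and the sample rows are invented rather than taken from the dataset.

```python
from dataclasses import dataclass, field
from typing import Dict, List


@dataclass
class Track:
    """One record with the variables listed above (types are assumed)."""
    song_id: str
    title: str
    artist: str
    album: str
    release_year: int
    genre: str              # e.g. "rock", "pop", "jazz"
    tempo: float            # beats per minute (BPM)
    loudness: float         # average level in decibels (dB)
    key: str                # e.g. "C major"
    time_signature: str     # e.g. "4/4"
    mode: str               # "major" or "minor"
    # Assumed shapes for the less structured variables:
    song_similarity: List[str] = field(default_factory=list)  # IDs of similar songs
    artist_popularity: float = 0.0                             # numeric score
    artist_biographical_data: Dict[str, str] = field(default_factory=dict)


def tracks_in_tempo_range(tracks, low_bpm, high_bpm):
    """Return the tracks whose tempo lies in [low_bpm, high_bpm]."""
    return [t for t in tracks if low_bpm <= t.tempo <= high_bpm]


# Invented rows for illustration only; not real dataset entries.
sample = [
    Track("S0001", "Example Song A", "Example Artist", "Example Album",
          1999, "rock", 120.0, -7.5, "C major", "4/4", "major"),
    Track("S0002", "Example Song B", "Example Artist", "Example Album",
          2004, "jazz", 90.0, -12.0, "G minor", "3/4", "minor"),
]

selected = tracks_in_tempo_range(sample, 100, 130)
print([t.song_id for t in selected])
```

A typed record like this makes it easy to filter or group tracks by any audio feature before passing them to a clustering or recommendation experiment.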
Details:
Publisher: LabROSA
Dataset Size: Audio features and metadata for one million contemporary popular music tracks, totaling approximately 300 GB.
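For planning storage or download time, the stated size can be turned into a rough per-track average. The sketch below assumes that "300 GB" means decimal gigabytes (10^9 bytes) and that the data is spread evenly across tracks; both are simplifying assumptions, and the function name is hypothetical.

```python
TOTAL_BYTES = 300 * 10**9   # "approximately 300 GB", read as decimal GB (assumption)
TRACK_COUNT = 1_000_000     # one million tracks, as stated above


def average_kb_per_track(total_bytes=TOTAL_BYTES, track_count=TRACK_COUNT):
    """Average size per track in kilobytes (1 kB = 1000 bytes)."""
    return total_bytes / track_count / 1000


print(average_kb_per_track())  # 300.0
```

Under these assumptions each track accounts for roughly 300 kB on average.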