This dataset was constructed to support participants in the Netflix Prize. See
[Web Link] for details about the prize.
There are over 480,000 customers in the dataset, each identified by a unique integer id.
The title and release year of each movie are also provided. There are over 17,000 movies in the dataset, each identified by a unique integer id.
The dataset contains over 100 million ratings. The ratings were collected between October 1998 and December 2005 and reflect the distribution of all ratings received during this period. Each rating has a customer id, a movie id, the date of the rating, and the value of the rating.
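The four fields of a rating can be modeled as a small record type. The sketch below is illustrative only: the class name `Rating`, the helper `parse_rating`, the comma-separated line layout, the integer rating value, and the sample values are all assumptions made for this example, not a description of how the dataset files are actually laid out.

```python
from dataclasses import dataclass
from datetime import date


@dataclass(frozen=True)
class Rating:
    """One rating record with the four fields named in the description."""
    customer_id: int   # unique integer id of the customer
    movie_id: int      # unique integer id of the movie
    rated_on: date     # date the rating was given
    value: int         # rating value (assumed here to be an integer)


def parse_rating(line: str) -> Rating:
    """Parse a hypothetical 'customer_id,movie_id,YYYY-MM-DD,value' line.

    This layout is an assumption for illustration; check the dataset
    files for the real format before reusing this function.
    """
    customer_id, movie_id, rated_on, value = line.strip().split(",")
    return Rating(
        customer_id=int(customer_id),
        movie_id=int(movie_id),
        rated_on=date.fromisoformat(rated_on),
        value=int(value),
    )


# Made-up sample line, not taken from the dataset.
example = parse_rating("1488844,1,2005-09-06,3")
```

A frozen dataclass keeps each record immutable and hashable, so records can be placed in sets or used as dictionary keys.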
As part of the original Netflix Prize, a set of ratings was identified whose rating values were withheld from the original dataset. The object of the Prize was to accurately predict the ratings in this 'qualifying' set. These missing ratings are now available in the grand_prize.tar.gz dataset file.
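Prediction accuracy in the Netflix Prize was measured by root mean squared error (RMSE) between predicted and withheld ratings. The following is a minimal sketch of that metric; the function name `rmse` and the sample numbers are illustrative assumptions and are not taken from the dataset or from any official scoring code.

```python
import math
from typing import Sequence


def rmse(predicted: Sequence[float], actual: Sequence[float]) -> float:
    """Root mean squared error between two equal-length sequences."""
    if len(predicted) != len(actual):
        raise ValueError("predicted and actual must have the same length")
    if not actual:
        raise ValueError("at least one rating is required")
    squared_errors = ((p - a) ** 2 for p, a in zip(predicted, actual))
    return math.sqrt(sum(squared_errors) / len(actual))


# Made-up predictions and true ratings, for illustration only.
score = rmse([3.5, 4.0, 2.0], [4, 4, 1])
```

Lower values indicate better predictions, and a perfect set of predictions scores 0.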
Wu, M. (2007). Collaborative Filtering via Ensembles of Matrix Factorizations. In KDD Cup and Workshop.
https://pure.mpg.de/pubman/faces/ViewItemOverviewPage.jsp?itemId=item_1790294
Paterek, A. (2007). Improving regularized singular value decomposition for collaborative filtering. In Proceedings of KDD cup and workshop (Vol. 2007, pp. 5-8).
https://zhangyk8.github.io/teaching/file_spring2018/Improving_regularized_singular_value_decomposition_for_collaborative_filtering.pdf
Szomszor, Martin, Cattuto, Ciro, Alani, Harith, O’Hara, Kieron, Baldassarri, Andrea, Loreto, Vittorio and Servedio, Vito D.P. (2007). Folksonomies, the Semantic Web, and Movie Recommendation. 4th European Semantic Web Conference, Bridging the Gap between Semantic Web and Web 2.0, Innsbruck, Austria.
https://eprints.soton.ac.uk/264007/
Lim, Y. J., & Teh, Y. W. (2007, August). Variational Bayesian approach to movie rating prediction. In Proceedings of KDD cup and workshop (Vol. 7, pp. 15-21).
https://www.stats.ox.ac.uk/~teh/research/bayesml/kddcup2007.pdf
Goel, D., & Batra, D. (2009). Predicting user preference for movies using Netflix database. Department of Electrical and Computer Engineering, Carnegie Mellon University, 1-7.
https://www.cs.cmu.edu/~epxing/Class/10701-06f/project-reports/goel_batra.pdf
Bell, R. M., & Koren, Y. (2007, August). Improved neighborhood-based collaborative filtering. In KDD cup and workshop at the 13th ACM SIGKDD international conference on knowledge discovery and data mining (pp. 7-14).
https://www.cs.uic.edu/~liub/KDD-cup-2007/proceedings/Neighbor-Koren.pdf
Salakhutdinov, R., Mnih, A., & Hinton, G. (2007, June). Restricted Boltzmann machines for collaborative filtering. In Proceedings of the 24th international conference on Machine learning (pp. 791-798).
https://dl.acm.org/doi/abs/10.1145/1273496.1273596
Bell, R. M., & Koren, Y. (2007, October). Scalable collaborative filtering with jointly derived neighborhood interpolation weights. In Seventh IEEE international conference on data mining (ICDM 2007) (pp. 43-52). IEEE.
https://ieeexplore.ieee.org/abstract/document/4470228
Raiko, T., Ilin, A., Karhunen, J. (2007). Principal Component Analysis for Large Scale Problems with Lots of Missing Values. In: Kok, J.N., Koronacki, J., Mantaras, R.L.d., Matwin, S., Mladenič, D., Skowron, A. (eds) Machine Learning: ECML 2007. Lecture Notes in Computer Science, vol 4701. Springer, Berlin, Heidelberg.
https://doi.org/10.1007/978-3-540-74958-5_69
Mnih, A., & Salakhutdinov, R. R. (2007). Probabilistic matrix factorization. Advances in neural information processing systems, 20.
https://proceedings.neurips.cc/paper_files/paper/2007/file/d7322ed717dedf1eb4e6e52a37ea7bcd-Paper.pdf
Narayanan, A., & Shmatikov, V. (2008, May). Robust de-anonymization of large sparse datasets. In 2008 IEEE Symposium on Security and Privacy (sp 2008) (pp. 111-125). IEEE.
https://ieeexplore.ieee.org/document/4531148