Resources by Stefan

Food.com Recipe & Review Data

Creators: Majumder, Bodhisattwa P.; Li, Shuyang; Ni, Jianmo; McAuley, Julian
Publication Date: 2019
This dataset consists of 180K+ recipes and 700K+ recipe reviews covering 18 years of user interactions and uploads on Food.com (formerly GeniusKitchen), an online recipe aggregator. The collection supports analysis of culinary trends, user preferences, and recipe characteristics over nearly two decades. The dataset is 0.85 GB in size and contains three sets of data from Food.com:

Interaction splits

  • interactions_test.csv
  • interactions_validation.csv
  • interactions_train.csv

Preprocessed data for result reproduction

In this format, the recipe text metadata is tokenized with the GPT subword tokenizer, with special tokens (start-of-step, etc.) added.

  • PP_recipes.csv
  • PP_users.csv

To convert these CSV files into the pickle format required to run our code off-the-shelf, you may use pandas.read_csv and pandas.to_pickle.
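The conversion described above can be sketched roughly as follows. This is a minimal sketch, not the authors' own script: the helper name csv_to_pickle and the choice of a .pkl extension next to the source file are assumptions made for illustration. Columns that hold tokenized lists may be read back as plain strings, so additional parsing may be needed depending on what the downstream code expects.

```python
from pathlib import Path
from typing import Optional

import pandas as pd


def csv_to_pickle(csv_path: str, pickle_path: Optional[str] = None) -> Path:
    """Read a CSV file and write its contents as a pickled DataFrame.

    Hypothetical helper: the name and the default output location
    are illustrative assumptions, not part of the dataset's code.
    """
    src = Path(csv_path)
    # Default output: same directory and file stem, with a .pkl extension.
    dst = Path(pickle_path) if pickle_path is not None else src.with_suffix(".pkl")
    df = pd.read_csv(src)
    df.to_pickle(dst)
    return dst


# Example usage (assumes the files are in the working directory):
# csv_to_pickle("PP_recipes.csv")
# csv_to_pickle("PP_users.csv")
```

The pickled files can then be loaded back with pandas.read_pickle.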

 

Advertisement CTR Prediction Data

Creators: Huawei
Publication Date: 2020

Advertisement CTR (click-through rate) prediction is a key problem in computational advertising. Increasing the accuracy of CTR prediction is critical to improving the effectiveness of precision marketing. In this competition, we release large, anonymized advertising datasets. Based on these datasets, contestants are required to build advertisement CTR prediction models. The aim of the event is to find talented individuals to promote the development of CTR prediction algorithms.

The datasets contain advertising behavior data collected over seven consecutive days and are split into a training dataset and a testing dataset. Their total size is 6.86 GB, with millions of observations. The variables capture different aspects of user-ad interactions: user identifiers, ad identifiers, timestamps, user behavior features, and ad content features.

These variables allow researchers to analyze engagement patterns and develop predictive models for ad click-through rates, which makes the dataset useful for improving advertising strategies and refining targeted marketing approaches.

Relative Wealth Index Data

Creators: Chi, Guanghua; Fang, Han; Chatterjee, Sourav; Blumenstock, Joshua E.
Publication Date: 2021
The Relative Wealth Index predicts the relative standard of living within countries using de-identified connectivity data, satellite imagery and other nontraditional data sources.
It has been built by researchers at the University of California, Berkeley and Facebook. The estimates are built by applying machine learning algorithms to vast and heterogeneous data from satellites, mobile phone networks, topographic maps, as well as aggregated and de-identified connectivity data from Facebook. They train and calibrate the estimates using nationally representative household survey data from 56 LMICs, then validate their accuracy using four independent sources of household survey data from 18 countries. They also provide confidence intervals for each micro-estimate to facilitate responsible downstream use.
The data is provided for 93 low- and middle-income countries at 2.4 km resolution. It covers the time between April 01, 2021 and December 22, 2023. An interactive map of the Relative Wealth Index is available here: http://beta.povertymaps.net/
The combined size of the dataset is approximately 0.08 GB, available in CSV format.
Please cite / attribute any use of this dataset using the following:
Guanghua Chi, Han Fang, Sourav Chatterjee, Joshua E. Blumenstock. Microestimates of wealth for all low- and middle-income countries. Proceedings of the National Academy of Sciences, Jan 2022, 119 (3) e2113658119; DOI: 10.1073/pnas.2113658119
