Filename |
Unique identifier of this advertisement |
URLs TheEconomistPageScans |
Comma-separated list of URLs to JPG image files of the scanned The Economist pages containing this ad. A multi-page ad can have multiple URLs. |
Date of Issue |
Date of The Economist issue (Year-Month-Day)
Ad size (pages) |
e.g. 1 = one full page, 0.75 = 3/4 of a page, 2 = two pages |
Ad size < 1/4 |
1 if Ad covers less than 25% of the page; 0 if Ad does not cover less than 25% of the page |
1/4 <= Ad size < 2/4 |
1 if Ad covers at least 25% of the page, but less than 50%; 0 if Ad size is not in this range |
2/4 <= Ad size < 3/4 |
1 if Ad covers at least 50% of the page, but less than 75%; 0 if Ad size is not in this range
3/4 <= Ad size < 4/4 |
1 if Ad covers at least 75% of the page, but less than 100%; 0 if Ad size is not in this range
4/4 <= Ad size < 8/4 |
1 if Ad covers at least one full page, but less than two pages; 0 if Ad size is not in this range
8/4 <= Ad size |
1 if Ad covers two pages or more; 0 if Ad covers less than two pages
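The six size indicator columns above can be read as mutually exclusive bins over "Ad size (pages)". A minimal sketch of that mapping, assuming the indicators are derived directly from the ad size value (the function name `ad_size_bins` is hypothetical, not part of the dataset):

```python
def ad_size_bins(ad_size_pages):
    """Return the six 0/1 size indicators for an ad size given in pages."""
    # Bin edges taken from the column names; each bin is [low, high).
    bins = [
        ("Ad size < 1/4", 0.0, 0.25),
        ("1/4 <= Ad size < 2/4", 0.25, 0.5),
        ("2/4 <= Ad size < 3/4", 0.5, 0.75),
        ("3/4 <= Ad size < 4/4", 0.75, 1.0),
        ("4/4 <= Ad size < 8/4", 1.0, 2.0),
        ("8/4 <= Ad size", 2.0, float("inf")),
    ]
    return {name: int(low <= ad_size_pages < high) for name, low, high in bins}


# A 3/4-page ad falls into exactly one bin.
print(ad_size_bins(0.75))
```

For any non-negative ad size exactly one indicator is 1, which makes the columns usable as dummy variables.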
Bounding Box relative X1 |
X coordinate of the left-top corner of a rectangle identifying the ad on the page, expressed relative to the pixel coordinates of the image from column 2 ("URLs …"). Multiply this value by the width of the image to get the absolute x coordinate. If the ad is a multi-page ad, the images from column 2 have to be horizontally concatenated first.
Bounding Box relative Y1 |
Y coordinate of the left-top corner of the rectangle (relative value, analogous to X1).
Bounding Box relative X2 |
X coordinate of the right-bottom corner of the rectangle (relative value, analogous to X1).
Bounding Box relative Y2 |
Y coordinate of the right-bottom corner of the rectangle (relative value, analogous to X1).
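The conversion from relative bounding box values to pixel coordinates can be sketched as follows. The description only states the rule for x (multiply by the image width, after horizontal concatenation for multi-page ads); scaling y by the image height is an assumption here, and `bbox_to_pixels` and `page_sizes` are hypothetical names:

```python
def bbox_to_pixels(x1, y1, x2, y2, page_sizes):
    """Convert relative bounding box values to absolute pixel coordinates.

    page_sizes: list of (width, height) tuples, one per page image,
    in the same order as the URLs in column 2.
    """
    # Multi-page ads: pages are concatenated horizontally, so widths add up.
    total_width = sum(width for width, _ in page_sizes)
    # Assumption: y values are relative to the (common) page height.
    height = max(h for _, h in page_sizes)
    return (
        round(x1 * total_width),
        round(y1 * height),
        round(x2 * total_width),
        round(y2 * height),
    )


# Single page of 1000 x 1400 pixels; ad occupies the right half, middle band.
print(bbox_to_pixels(0.5, 0.25, 1.0, 0.75, [(1000, 1400)]))
```

For a two-page ad, passing both page sizes makes the x values span the combined width of the concatenated image.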
Feature Complexity (JPG file size in kb / Ad Size) |
More complex images will have higher values. |
JPG File Size (Byte) |
e.g. 186609 |
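As a rough illustration, "Feature Complexity" can be recomputed from "JPG File Size (Byte)" and "Ad size (pages)". Whether the dataset uses 1000 or 1024 bytes per kb is not stated, so the divisor below is an assumption exposed as a parameter; the function name is hypothetical:

```python
def feature_complexity(jpg_size_bytes, ad_size_pages, bytes_per_kb=1000):
    """JPG file size in kb divided by ad size in pages."""
    # bytes_per_kb is an assumption; pass 1024 if the dataset uses binary kb.
    return (jpg_size_bytes / bytes_per_kb) / ad_size_pages


# The example file size from the table, for a full-page and a half-page ad.
print(feature_complexity(186609, 1))
print(feature_complexity(186609, 0.5))
```

Dividing by the ad size normalizes for area, so a small ad with a large file scores as more complex than a full page of the same file size.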
OCR GoogleVision |
Advertisement text, based on text recognition using Google Vision API (2021) of the full ad image. |
Brand |
Brand name of advertiser |
Brand is generic (e.g. 'Notices') |
If "True" then this ad doesn't represent a single brand, but a category of ad-like content. Most common categories are "Notices", "Appointments", "Courses". |
Text Class GoogleVision |
Text classes assigned to the ad from its OCR text using GoogleVision API (2021). See full list of categories. This column contains a JSON string with a list of text classes and their class probabilities.
Category most confident, Level 1 |
Top level category from Google Vision text analysis for this ad. E.g. "/Finance" |
Category most confident, Level 2 |
e.g. "/Finance/Banking" |
Category most confident, Level 3 |
e.g. "/Finance/Banking/B2B" |
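The three level columns follow from the category path itself: each level is a prefix of the full path. A small sketch, assuming paths always use "/" as the separator as in the examples above (`category_levels` is a hypothetical helper):

```python
def category_levels(category_path):
    """Split a category path into its cumulative level prefixes."""
    parts = [part for part in category_path.split("/") if part]
    # Level i keeps the first i path components, with a leading slash.
    return ["/" + "/".join(parts[:i]) for i in range(1, len(parts) + 1)]


print(category_levels("/Finance/Banking/B2B"))
```

A category with fewer than three components simply yields fewer levels, so the deeper level columns would stay empty for it.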
Colorfulness (Hasler & Suesstrunk, 2003) |
Colorfulness of the ad, computed following Hasler & Suesstrunk (2003).
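For orientation, the commonly used form of the Hasler & Suesstrunk colorfulness metric combines the spread and the mean of two opponent color channels. The sketch below works on a plain list of RGB tuples; the use of population standard deviation and the absence of any resizing or color-space conversion are assumptions, and the dataset's exact pipeline is not documented here:

```python
from math import hypot
from statistics import fmean, pstdev


def colorfulness(pixels):
    """Colorfulness of an image given as (r, g, b) tuples with values 0..255."""
    pixels = list(pixels)
    # Opponent color channels: red-green and yellow-blue.
    rg = [r - g for r, g, b in pixels]
    yb = [0.5 * (r + g) - b for r, g, b in pixels]
    # Combine standard deviations and means of both channels.
    sigma = hypot(pstdev(rg), pstdev(yb))
    mu = hypot(fmean(rg), fmean(yb))
    return sigma + 0.3 * mu


# A uniform gray image has no colorfulness; saturated colors score higher.
print(colorfulness([(128, 128, 128)] * 4))
print(colorfulness([(255, 0, 0), (0, 0, 255)]))
```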
Color variety (Ke et al., 2006) |
Color variety of the ad, computed following Ke et al. (2006).
Brightness_Mean |
Mean of brightness values of all pixels in ad. |
Brightness_SD |
Standard deviation of brightness values of all pixels. |
Red_Mean |
Mean value of redness of all pixels. |
Red_SD |
Standard deviation of redness of all pixels. |
Green_Mean |
Mean value of greenness of all pixels. |
Green_SD |
Standard deviation of greenness of all pixels. |
Blue_Mean |
Mean value of blueness of all pixels. |
Blue_SD |
Standard deviation of blueness of all pixels. |
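The brightness and per-channel statistics above are plain means and standard deviations over all pixels. A minimal sketch follows; the table does not define brightness, so the per-pixel average of R, G, and B used here is an assumption, as is the choice of population rather than sample standard deviation:

```python
from statistics import fmean, pstdev


def channel_stats(pixels):
    """Mean and SD of red, green, blue, and brightness over (r, g, b) tuples."""
    pixels = list(pixels)
    stats = {}
    for name, index in (("Red", 0), ("Green", 1), ("Blue", 2)):
        values = [pixel[index] for pixel in pixels]
        stats[f"{name}_Mean"] = fmean(values)
        stats[f"{name}_SD"] = pstdev(values)
    # Assumption: brightness is the plain average of the three channels.
    brightness = [sum(pixel) / 3 for pixel in pixels]
    stats["Brightness_Mean"] = fmean(brightness)
    stats["Brightness_SD"] = pstdev(brightness)
    return stats


# One black and one white pixel.
print(channel_stats([(0, 0, 0), (255, 255, 255)]))
```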
Text readability Gunning Fog |
Text readability measure according to the Gunning Fog index.
Text readability SMOG |
Text readability measure according to the SMOG index.
Text readability Flesch Reading Ease |
Text readability measure according to the Flesch Reading Ease score.
Text readability Dale Chall |
Text readability measure according to the Dale-Chall formula.
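For reference, three of the four readability measures can be written as closed formulas over simple text counts. The sketch below uses the standard published formulas and takes the counts as inputs; how the dataset tokenizes OCR text and counts syllables is not documented, so the counting step is deliberately left out. Dale-Chall is omitted because it additionally needs a list of familiar words:

```python
from math import sqrt


def gunning_fog(words, sentences, complex_words):
    """Gunning Fog index; complex words have three or more syllables."""
    return 0.4 * (words / sentences + 100 * complex_words / words)


def smog(polysyllables, sentences):
    """SMOG grade; polysyllables are words with three or more syllables."""
    return 1.0430 * sqrt(polysyllables * 30 / sentences) + 3.1291


def flesch_reading_ease(words, sentences, syllables):
    """Flesch Reading Ease; higher values mean easier text."""
    return 206.835 - 1.015 * (words / sentences) - 84.6 * (syllables / words)


# Example counts: 100 words, 5 sentences, 150 syllables, 10 complex words.
print(gunning_fog(100, 5, 10))
print(smog(10, 5))
print(flesch_reading_ease(100, 5, 150))
```

Note the opposite directions: Gunning Fog and SMOG rise with difficulty, while Flesch Reading Ease falls.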