business


Goodreads-books

Creators: Zając, Zygmunt
Publication Date: 2019

This dataset was created to meet the need for a good, clean dataset of books. It contains important features such as book titles, authors, average ratings, ISBN identifiers, language codes, number of pages, ratings count, text reviews count, publication dates, and publishers. It supports a wide range of book-related analyses, such as trends in book popularity, author influence, and reader preferences. The dataset is 1.56 MB in size and was scraped via the Goodreads API. It encompasses over 10,000 observations, each representing a unique book entry with multiple attributes. The structure is straightforward: a single CSV file with the following key columns:

  • bookID: A unique identification number for each book.
  • title: The official title of the book.
  • authors: Names of the authors, with multiple authors separated by a delimiter.
  • average_rating: The average user rating for the book.
  • isbn & isbn13: The 10-digit and 13-digit International Standard Book Numbers, respectively.
  • language_code: The primary language in which the book is published (e.g., ‘eng’ for English).
  • num_pages: The total number of pages in the book.
  • ratings_count: The total number of ratings the book has received from users.
  • text_reviews_count: The total number of text reviews written by users.
  • publication_date: The original publication date of the book.
  • publisher: The name of the publishing house.
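As a minimal sketch of how the columns above might be read, the following Python snippet parses a small in-memory CSV with the standard library. The two sample rows are hypothetical, only a subset of the columns is used, and the "/" author delimiter is an assumption (the description does not name the delimiter), so it is exposed as a parameter.

```python
import csv
import io

# Hypothetical two-row sample using a subset of the columns listed above.
SAMPLE_CSV = """bookID,title,authors,average_rating,num_pages,ratings_count
1,Example Book One,Author A/Author B,4.25,320,1500
2,Example Book Two,Author C,3.80,210,42
"""

def load_books(handle, author_delimiter="/"):
    """Parse book rows, converting numeric columns and splitting authors.

    author_delimiter is an assumption; check the real file before relying on it.
    """
    books = []
    for row in csv.DictReader(handle):
        books.append({
            "bookID": int(row["bookID"]),
            "title": row["title"].strip(),
            "authors": [a.strip() for a in row["authors"].split(author_delimiter)],
            "average_rating": float(row["average_rating"]),
            "num_pages": int(row["num_pages"]),
            "ratings_count": int(row["ratings_count"]),
        })
    return books

books = load_books(io.StringIO(SAMPLE_CSV))
# Pick the highest-rated book in the sample.
top = max(books, key=lambda b: b["average_rating"])
print(top["title"], top["authors"])
```

To run this against the real file, the `io.StringIO` object could be replaced with an opened file handle, for example `open(path, newline="", encoding="utf-8")`, where `path` points to the downloaded CSV.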

US Funds dataset from Yahoo Finance

Creators: Leone, Stefano
Publication Date: 2018

The US Funds dataset from Yahoo Finance collects data on 24,821 mutual funds and 1,680 exchange-traded funds (ETFs). It contains detailed information on each fund, including general characteristics, portfolio indicators, returns, and financial ratios. A notable feature of this dataset is its extensive coverage of both mutual funds and ETFs, which makes it useful for comparative analyses and investment research. The dataset was published in 2018 and contains data up to November 2020. In total, it is 1.7 GB in size.

The dataset includes various variables for each fund, such as:

  • fund_symbol: Symbol of the ETF.
  • price_date: Date of the price (in YYYY-MM-DD format).
  • open: Open daily price.
  • high: Highest daily price.
  • low: Lowest daily price.
  • close: Close daily price.
  • adj_close: Adjusted close daily price, which considers elements that have impacted the price such as share splits, dividends, etc.
  • volume: Daily traded volume.
  • nav_per_share: Daily Net Asset Value (NAV) per share.
  • region: Name of the region in which the fund has the domicile.
  • initial_investment: Minimum amount for initial investment.
  • subsequent_investment: Minimum amount for subsequent investments.
  • exchange_code: Code of the exchange where the fund is traded.
  • exchange_name: Name of the exchange where the fund is traded.
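The price variables above lend themselves to simple return calculations. The sketch below, assuming a CSV layout with the fund_symbol, price_date, and adj_close columns, groups rows by fund and computes day-over-day returns from the adjusted close. The symbols and prices in the sample are hypothetical, and the real files may be organised differently.

```python
import csv
import io
from collections import defaultdict

# Hypothetical sample rows; symbols and prices are made up for illustration.
SAMPLE_PRICES = """fund_symbol,price_date,adj_close
AAA,2020-01-03,110.0
AAA,2020-01-02,100.0
BBB,2020-01-02,50.0
BBB,2020-01-03,45.0
"""

def daily_returns(handle):
    """Return {fund_symbol: [(price_date, simple_return), ...]} per fund."""
    series = defaultdict(list)
    for row in csv.DictReader(handle):
        series[row["fund_symbol"]].append(
            (row["price_date"], float(row["adj_close"]))
        )
    returns = {}
    for symbol, points in series.items():
        # YYYY-MM-DD strings sort chronologically, so a plain sort is enough.
        points.sort()
        returns[symbol] = [
            (curr_date, curr / prev - 1.0)
            for (_, prev), (curr_date, curr) in zip(points, points[1:])
        ]
    return returns

returns = daily_returns(io.StringIO(SAMPLE_PRICES))
for symbol, values in returns.items():
    print(symbol, values)
```

Using adj_close rather than close means that splits and dividends do not show up as artificial jumps in the return series.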

 

Amazon product co-purchasing network metadata

Creators: Leskovec, Jure
Publication Date: 2006

The data was collected by crawling the Amazon website and contains product metadata and review information about 548,552 different products (Books, music CDs, DVDs and VHS video tapes). It is valuable for analyzing product relationships, customer behavior, and the dynamics of product co-purchasing networks. For each product the following information is available:

  • Title
  • Sales rank
  • List of similar products (that get co-purchased with the current product)
  • Detailed product categorization
  • Product reviews: time, customer, rating, number of votes, number of people that found the review helpful

The data was collected in summer 2006. It is 201 MB in size and is structured into:

  • Product Metadata: Information such as product ID, ASIN, title, group, sales rank, similar products, and categories.

  • Product Reviews: Details including review time, customer ID, rating, number of votes, and helpfulness votes.

Yelp Open Dataset

Creators: Yelp, Inc.
Publication Date: 2015
The Yelp dataset offers a collection of real-world data from Yelp, intended for educational and academic purposes. It encompasses information about businesses, user reviews, photos, and check-ins, providing valuable insights into local commerce and consumer behavior. In total, this dataset contains 6.9M online reviews for 150k businesses and covers 11 metropolitan areas. It also includes more than 200,000 images related to the reviews. It has a compressed size of 4.9 GB and an uncompressed size of 10.9 GB, and is available as JSON files. The data consists of multiple sub-datasets:

  1. Yelp Business data: Contains business data including location data, attributes, and categories.
  2. Yelp Review data: Contains full review text data including the user_id that wrote the review and the business_id the review is written for.
  3. Yelp User data: User data including the user’s friend mapping and all the metadata associated with the user.
  4. Yelp Checkin data: Checkins on a business.
  5. Yelp Tip data: Tips written by a user on a business. Tips are shorter than reviews and tend to convey quick suggestions.
  6. Yelp Photo data: Contains photo data including the caption and classification (one of “food”, “drink”, “menu”, “inside” or “outside”).

The data is available as JSON files. You can use it to teach students about databases, to learn NLP, or as sample production data while you learn how to make mobile apps.
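As a small illustration of working with the review data, the sketch below counts reviews per business_id. It assumes each review is stored as one JSON object per line, which is an assumption about the file layout rather than something stated above. The sample records and the "text" field name are hypothetical; only user_id and business_id come from the description.

```python
import io
import json
from collections import Counter

# Hypothetical sample records, one JSON object per line (assumed layout).
SAMPLE_REVIEWS = "\n".join([
    json.dumps({"user_id": "u1", "business_id": "b1", "text": "Great coffee."}),
    json.dumps({"user_id": "u2", "business_id": "b1", "text": "Friendly staff."}),
    json.dumps({"user_id": "u1", "business_id": "b2", "text": "Slow service."}),
])

def reviews_per_business(handle):
    """Count how many reviews reference each business_id."""
    counts = Counter()
    for line in handle:
        line = line.strip()
        if not line:
            continue  # skip blank lines between records
        review = json.loads(line)
        counts[review["business_id"]] += 1
    return counts

review_counts = reviews_per_business(io.StringIO(SAMPLE_REVIEWS))
for business_id, n in review_counts.most_common():
    print(business_id, n)
```

Reading line by line keeps memory use low, which matters for a review file of several gigabytes.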

 

Consumer Complaint Database

Creators: Consumer Financial Protection Bureau (CFPB)
Publication Date: 2011
The database encompasses a wide array of complaints related to various financial sectors, including debt collection, credit reporting, mortgages, credit cards, and more. Each week, the CFPB sends thousands of consumers’ complaints about financial products and services to companies for response. Those complaints are published after the company responds, confirming a commercial relationship with the consumer, or after 15 days, whichever comes first. Complaint narratives are consumers’ descriptions of their experiences in their own words. By adding their voice, consumers help improve the financial marketplace. The database generally updates daily. Each complaint entry includes details such as the date received, product type, issue, company involved, consumer’s narrative (if consented for publication), company response, and the complaint’s current status. This dataset serves as a valuable resource for identifying trends, assessing company practices, and informing policy decisions.
As of February 22, 2025, the database contains a total of 7,867,198 complaints. The dataset is 1.36 GB in size and available for download in CSV format. The dataset spans from December 1, 2011, to the present, with regular updates to include new complaints.
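A first pass at trend analysis could start by counting complaints per product type. The following sketch does this for a small in-memory CSV. The header names "Date received", "Product", and "Company" are assumptions, as are the sample rows, so the column name is passed as a parameter and should be checked against the downloaded file.

```python
import csv
import io
from collections import Counter

# Hypothetical sample; header names and rows are assumptions for illustration.
SAMPLE_COMPLAINTS = """Date received,Product,Company
2024-01-05,Mortgage,Example Bank
2024-01-06,Debt collection,Example Collections
2024-01-07,Mortgage,Another Bank
"""

def complaints_by_product(handle, product_column="Product"):
    """Count complaints per value of the product column."""
    counts = Counter()
    for row in csv.DictReader(handle):
        counts[row[product_column]] += 1
    return counts

product_counts = complaints_by_product(io.StringIO(SAMPLE_COMPLAINTS))
for product, n in product_counts.most_common():
    print(product, n)
```

Because the function streams rows one at a time, the same approach should also work on the full CSV without loading it into memory at once.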

Future of Business - Survey Results

Creators: Facebook; OECD; World Bank
Publication Date: 2018

The Future of Business survey is a collaboration between Facebook, the OECD and the World Bank to provide timely insights on the perceptions, challenges, and outlook of online Small and Medium Enterprises (SMEs). The Future of Business survey was first launched as a monthly survey in 17 countries in February 2016 and expanded to 42 countries in 2018. In 2019, the Future of Business survey increased coverage to 97 countries and moved to a bi-annual cadence.

The target population consists of SMEs that have an active Facebook business Page and includes both newer and longer-standing businesses across a variety of sectors. To date, more than 90 million SMEs have created a Facebook Page, and more than 700,000 of these Facebook Page owners have taken the survey. With more businesses leveraging online tools each day, the survey provides a lens into a new mobilized, digital economy and, in particular, insights on the actors: a relatively unmeasured community worthy of deeper consideration and considerable policy interest. The dataset is approximately 0.04 GB in size.

The survey includes questions about perceptions of current and future economic activity, challenges, business characteristics and strategy. Custom modules include questions related to regulation, access to finance, digital payments, and digital skills.

Advertisement CTR Prediction Data

Creators: Huawei
Publication Date: 2020

Advertisement CTR prediction is a key problem in the area of computational advertising. Increasing the accuracy of advertisement CTR prediction is critical to improving the effectiveness of precision marketing. In this competition, we release big advertising datasets that are anonymized. Based on the datasets, contestants are required to build advertisement CTR prediction models. The aim of the event is to find talented individuals to promote the development of advertisement CTR prediction algorithms. The datasets contain advertising behavior data collected over seven consecutive days and are split into a training dataset and a testing dataset. The total size of the datasets is 6.86 GB, comprising millions of observations. The variables capture different aspects of user-ad interactions, including user identifiers, ad identifiers, timestamps, user behavior features, and ad content features, allowing researchers to analyze engagement patterns and develop predictive models for ad click-through rates. This dataset is valuable for improving advertising strategies and refining targeted marketing approaches.
