
Social capital I: measurement and associations with economic mobility

Creators: Chetty, Raj; Jackson, Matthew O.; Kuchler, Theresa; Stroebel, Johannes; Hiller, Abigail; Oppenheimer, Sarah
Publication Date: 2022
Social capital – the strength of our relationships and communities – has been shown to play an important role in outcomes ranging from income to health. This dataset provides a detailed analysis of social capital across various U.S. communities, focusing on its impact on economic mobility. Using privacy-protected data on 21 billion friendships from Facebook, we measure three types of social capital in each neighborhood, high school, and college in the United States:

  • Cohesiveness: the degree to which social networks are fragmented into cliques
  • Economic connectedness: the degree to which low-income and high-income people are friends with each other
  • Civic engagement: rates of volunteering and participation in community organizations

The dataset is approximately 8 MB in size and structured into different geographical levels, including ZIP codes, high schools, and colleges across the United States. Each entry details the three key measures of social capital—economic connectedness, cohesiveness, and civic engagement—allowing for targeted analysis at various community levels.

 

Facebook Privacy-Protected Full URLs Data Set

Creators: Messing, Solomon; DeGregorio, Christina; Hillenbrand, Bennett; King, Gary; Mahanti, Saurav; Mukerjee, Zagreb; Nayak, Chaya; Persily, Nate; State, Bogdan; Wilkins, Arjun
Publication Date: 2020

This is a codebook for data on the demographics of people who viewed, shared, and otherwise interacted with web pages (URLs) shared on Facebook between January 1, 2017 and October 31, 2022. The data has about 68 million URLs, over 3.1 trillion rows, and over 71 trillion cell values. The dataset results from a collaboration between Facebook and Social Science One (at IQSS at Harvard). The codebook was originally prepared for Social Science One grantees and describes the “full” URLs dataset, including its scope, structure, and fields. This is version 10 of the codebook and data (released 4/13/2023); the dataset was first described by Gary King and Nathaniel Persily at https://socialscience.one/blog/update-social-science-one. Each entry corresponds to a unique URL and includes aggregated user interaction metrics. These metrics are further broken down by demographic dimensions such as age, gender, and country. For users in the United States, the breakdown also includes political page affinity, which allows analysis of how content engagement varies with political leaning.

Video Game Sales

Creators: Smith, Gregory
Publication Date: 2016

This dataset contains a list of video games with sales greater than 100,000 copies. It was generated by a scrape of vgchartz.com. The dataset is 1.36 MB in size and includes games released up to 2016, offering a historical perspective on video game sales over several decades. It supports analysis of sales trends across regions, platforms, and genres, which makes it useful for market analysis and strategic planning within the video game industry. Each entry in the dataset includes the following attributes:

  • Rank: Overall sales ranking of the game.
  • Name: Title of the game.
  • Platform: The platform on which the game was released (e.g., PC, PS4).
  • Year: Year of the game’s release.
  • Genre: Genre classification of the game.
  • Publisher: Company that published the game.
  • NA_Sales: Sales figures in North America (in millions).
  • EU_Sales: Sales figures in Europe (in millions).
  • JP_Sales: Sales figures in Japan (in millions).
  • Other_Sales: Sales figures in the rest of the world (in millions).
  • Global_Sales: Total worldwide sales (in millions).
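As a minimal sketch of working with these columns, the following Python snippet totals Global_Sales per genre and checks whether the four regional columns add up to Global_Sales within a small rounding tolerance, which the column descriptions suggest they should. The three sample rows are made-up placeholders, not values from the dataset, and the helper names `load_rows`, `regional_total`, and `sales_by_genre` are hypothetical. In practice the rows would be read from the downloaded CSV file instead of the inline string.

```python
import csv
import io
from collections import defaultdict

# Made-up placeholder rows using the column names listed above.
# These are NOT real figures from the dataset.
SAMPLE_CSV = """Rank,Name,Platform,Year,Genre,Publisher,NA_Sales,EU_Sales,JP_Sales,Other_Sales,Global_Sales
1,Example Game A,PC,2010,Puzzle,Example Publisher,1.50,1.00,0.25,0.25,3.00
2,Example Game B,PS4,2015,Racing,Example Publisher,0.50,0.75,0.00,0.25,1.50
3,Example Game C,PC,2012,Puzzle,Other Publisher,0.25,0.25,0.25,0.25,1.00
"""

REGION_COLUMNS = ("NA_Sales", "EU_Sales", "JP_Sales", "Other_Sales")


def load_rows(text):
    """Parse CSV text into a list of dicts keyed by column name."""
    return list(csv.DictReader(io.StringIO(text)))


def regional_total(row):
    """Sum the four regional sales columns (in millions)."""
    return sum(float(row[column]) for column in REGION_COLUMNS)


def sales_by_genre(rows):
    """Total Global_Sales (in millions) for each genre."""
    totals = defaultdict(float)
    for row in rows:
        totals[row["Genre"]] += float(row["Global_Sales"])
    return dict(totals)


rows = load_rows(SAMPLE_CSV)
genre_totals = sales_by_genre(rows)

# Rows whose regional columns do not add up to Global_Sales
# within an assumed tolerance of 0.02 million copies.
mismatches = [
    row["Name"]
    for row in rows
    if abs(regional_total(row) - float(row["Global_Sales"])) > 0.02
]
```

The tolerance of 0.02 is an assumption that allows for rounding in the published figures; a stricter or looser value may suit the real file better.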

World Happiness Report

Creators: Helliwell, John F.; Layard, Richard; Sachs, Jeffrey D.; De Neve, Jan-Emmanuel; Aknin, Lara B.; Wang, Shun
Publication Date: 2012

The happiness scores and rankings use data from the Gallup World Poll. The scores are based on answers to the main life evaluation question asked in the poll. This question, known as the Cantril ladder, asks respondents to think of a ladder with the best possible life for them being a 10 and the worst possible life being a 0, and to rate their own current lives on that scale. The scores are from nationally representative samples for the years 2013-2016 and use the Gallup weights to make the estimates representative. The columns following the happiness score estimate the extent to which each of six factors – economic production, social support, life expectancy, freedom, absence of corruption, and generosity – contributes to making life evaluations higher in each country than they are in Dystopia, a hypothetical country that has values equal to the world’s lowest national averages for each of the six factors. These columns have no impact on the total score reported for each country, but they do explain why some countries rank higher than others. The dataset is 80.86 kB in size.

Popular Movies of TMDb

Creators: Mondal, Sankha Subhra
Publication Date: 2020

This dataset of the 10,000 most popular movies worldwide was fetched through the TMDb read API. TMDb’s free API lets developers and their teams programmatically fetch and use TMDb’s data. The API is free to use as long as you attribute TMDb as the source of the data and/or images, and it is updated from time to time. The dataset is 3.2 MB in size. It offers insights into global cinematic trends and preferences.

Each movie entry in the dataset includes the following attributes:

  • title: The name of the movie.
  • overview: A brief summary of the movie’s plot.
  • original_language: The language in which the movie was originally produced.
  • vote_average: The average user rating of the movie on TMDb.

goodbooks-10k

Creators: Zając, Zygmunt
Publication Date: 2017

The dataset contains six million ratings for the ten thousand most popular books (those with the most ratings). It offers a rich resource for analyzing reading habits, book popularity, and user engagement within the literary community. It also includes books marked “to read” by users, book metadata (author, year, etc.), and tags/shelves/genres.

ratings contains ratings sorted by time. Ratings go from one to five. Both book IDs and user IDs are contiguous. For books, they are 1-10000, for users, 1-53424.

to_read provides IDs of the books marked “to read” by each user, as user_id,book_id pairs, sorted by time. There are close to a million pairs.

books has metadata for each book (goodreads IDs, authors, title, average rating, etc.). The metadata have been extracted from goodreads XML files.

book_tags contains tags/shelves/genres assigned by users to books. Tags in this file are represented by their IDs. They are sorted by goodreads_book_id ascending and count descending.

The dataset is 68.8 MB in size.
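The relationship between these files can be sketched in Python as follows. The snippet computes the mean rating per book from ratings and links book_tags back to the contiguous book IDs through the goodreads IDs held in books. All records are made-up placeholders. The column names `rating`, `tag_id`, and `title`, and the assumption that books carries both `book_id` and `goodreads_book_id`, are guesses about the file layout rather than something the description above confirms. The function names are hypothetical.

```python
from collections import defaultdict

# Made-up placeholder records. Column names other than user_id, book_id,
# goodreads_book_id and count are assumptions about the file layout.
ratings = [
    {"user_id": 1, "book_id": 1, "rating": 5},
    {"user_id": 2, "book_id": 1, "rating": 3},
    {"user_id": 1, "book_id": 2, "rating": 4},
]
books = [
    {"book_id": 1, "goodreads_book_id": 1001, "title": "Example Book A"},
    {"book_id": 2, "goodreads_book_id": 1002, "title": "Example Book B"},
]
book_tags = [
    {"goodreads_book_id": 1001, "tag_id": 7, "count": 120},
    {"goodreads_book_id": 1001, "tag_id": 9, "count": 40},
    {"goodreads_book_id": 1002, "tag_id": 7, "count": 15},
]


def mean_rating_per_book(rating_rows):
    """Average the one-to-five ratings for each contiguous book_id."""
    sums = defaultdict(int)
    counts = defaultdict(int)
    for row in rating_rows:
        sums[row["book_id"]] += row["rating"]
        counts[row["book_id"]] += 1
    return {book_id: sums[book_id] / counts[book_id] for book_id in sums}


def top_tag_per_book(book_rows, tag_rows):
    """Map each contiguous book_id to the tag_id with the highest count."""
    # book_tags is keyed by goodreads_book_id, so translate via books first.
    to_book_id = {b["goodreads_book_id"]: b["book_id"] for b in book_rows}
    best = {}
    for tag in tag_rows:
        book_id = to_book_id.get(tag["goodreads_book_id"])
        if book_id is None:
            continue  # tag refers to a book missing from the metadata
        if book_id not in best or tag["count"] > best[book_id][1]:
            best[book_id] = (tag["tag_id"], tag["count"])
    return {book_id: tag_id for book_id, (tag_id, _) in best.items()}


mean_ratings = mean_rating_per_book(ratings)
top_tags = top_tag_per_book(books, book_tags)
```

Plain dictionaries keep the sketch dependency-free. With six million ratings, a dataframe library or a database would likely be a better fit.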

European Funds dataset from Morningstar

Creators: Leone, Stefano
Publication Date: 2019

The file contains 57,603 mutual funds and 9,495 ETFs with general attributes (such as total net assets, management company, and size), portfolio indicators (such as cash, stocks, bonds, and sectors), returns (such as year-to-date, 2020-11), and financial ratios (such as price/earnings, Treynor and Sharpe ratios, alpha, and beta).
Sustainability data is also available. A key feature of this dataset is the inclusion of detailed Morningstar ratings, which are widely used in the financial industry to assess fund quality based on past performance, risk-adjusted returns, and analyst evaluations. The dataset also categorizes funds, allowing for segmentation by investment type, sector, region, and fund style (e.g., growth vs. value investing). The dataset has a total size of approximately 103.88 MB.

Overall, the dataset is structured into the following variables:

  • ticker: Fund ticker code.
  • isin: Fund ISIN code.
  • fund_name: Extended name of the fund.
  • inception_date: Date of the fund’s inception.
  • category: Fund category.
  • rating: Morningstar rating.
  • analyst_rating: Morningstar analyst rating.
  • risk_rating: Morningstar risk rating.
  • performance_rating: Morningstar performance rating.

TripAdvisor European restaurants

Creators: Leone, Stefano
Publication Date: 2021

TripAdvisor is a widely used travel website that stores data for a very large number of restaurants, including locations (with latitude and longitude coordinates), restaurant descriptions, user ratings and reviews, and many other attributes. The dataset is 0.68 GB in size.

The TripAdvisor dataset includes 1,083,397 restaurants with attributes such as location data, average rating, number of reviews, open hours, cuisine types, awards, etc.

The dataset combines the restaurants from the main European countries. The data was scraped in early May 2021.

The dataset is structured with various variables for each restaurant, such as:

  • restaurant_link: Unique TripAdvisor restaurant link.
  • restaurant_name: Name of the restaurant on TripAdvisor.
  • original_location: Original location displayed on TripAdvisor.
  • country: Country name retrieved from original_location.
  • region: Region name retrieved from original_location.
  • province: Province name retrieved from original_location.
  • city: City name retrieved from original_location.
  • address: Address displayed on TripAdvisor.
  • latitude: Latitude coordinate.
  • longitude: Longitude coordinate.
  • claimed: Indicates if the restaurant business is claimed on TripAdvisor.
  • awards: Award names.
  • popularity_detailed: Detailed popularity ranking.
  • popularity_generic: Generic popularity ranking (among all places to eat in the area).
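As one illustration of what the latitude and longitude columns allow, the following Python sketch computes great-circle distances with the haversine formula and picks the restaurant closest to a given point. The restaurant names and coordinates are made-up placeholders, the function names are hypothetical, and the Earth radius of 6371 km is the usual mean-radius approximation rather than anything specified by the dataset.

```python
import math

EARTH_RADIUS_KM = 6371.0  # mean Earth radius, an approximation

# Made-up placeholder rows using column names from the list above.
restaurants = [
    {"restaurant_name": "Example Bistro", "latitude": 48.0, "longitude": 2.0},
    {"restaurant_name": "Example Trattoria", "latitude": 48.0, "longitude": 2.1},
    {"restaurant_name": "Example Tavern", "latitude": 50.0, "longitude": 4.0},
]


def haversine_km(lat1, lon1, lat2, lon2):
    """Great-circle distance in kilometres between two points in degrees."""
    phi1 = math.radians(lat1)
    phi2 = math.radians(lat2)
    delta_phi = phi2 - phi1
    delta_lambda = math.radians(lon2 - lon1)
    a = (
        math.sin(delta_phi / 2) ** 2
        + math.cos(phi1) * math.cos(phi2) * math.sin(delta_lambda / 2) ** 2
    )
    return 2 * EARTH_RADIUS_KM * math.asin(math.sqrt(a))


def nearest_restaurant(rows, latitude, longitude):
    """Return the row closest to the given point, or None if rows is empty."""
    if not rows:
        return None
    return min(
        rows,
        key=lambda row: haversine_km(
            latitude, longitude, row["latitude"], row["longitude"]
        ),
    )


closest = nearest_restaurant(restaurants, 48.0, 2.08)
```

A linear scan like this is fine for a handful of rows. For the full set of roughly one million restaurants, a spatial index would be the more practical choice.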
