
Facebook Large Page-Page Network; MUSAE

Creators: Rozemberczki, Benedek; Allen, Carl; Sarkar, Rik
Publication Date: 2019

This webgraph is a page-page graph of verified Facebook sites, offering a snapshot of the mutual like relationships among verified Facebook pages. Nodes represent official Facebook pages, and links are mutual likes between them. Node features are extracted from the site descriptions that page owners wrote to summarize the purpose of the site. The graph was collected through the Facebook Graph API in November 2017 and restricted to pages from four categories defined by Facebook: politicians, governmental organizations, television shows, and companies. The task associated with this dataset is multi-class node classification over these four categories. The dataset is approximately 0.001 GB in size and comprises 22,470 nodes and 171,002 edges. Beyond node classification, it is also suited to link prediction, community detection, and network visualization.

The components of the dataset contain:

  • Nodes: Each node corresponds to a verified Facebook page, categorized into one of four types: politicians, governmental organizations, television shows, and companies.

  • Edges: Edges represent mutual like relationships between pages, forming an undirected graph that reflects the interconnectedness of these entities.

  • Node Features: Features are extracted from the descriptions provided by page owners, summarizing the purpose of their respective pages.
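
As a minimal sketch of how the node and edge counts quoted above could be checked, the following builds an undirected adjacency map from (page, page) pairs and counts nodes and edges. The toy edge list and the function names are illustrative assumptions; the layout of the real download is not described here, so loading the actual file is left out.

```python
from collections import defaultdict

def build_undirected_graph(edge_pairs):
    """Build an adjacency map from (u, v) pairs, treating every edge as undirected."""
    adjacency = defaultdict(set)
    for u, v in edge_pairs:
        adjacency[u].add(v)
        adjacency[v].add(u)
    return adjacency

def count_edges(adjacency):
    """Count undirected edges; duplicates are already merged, self-loops count once."""
    loops = sum(1 for node, nbrs in adjacency.items() if node in nbrs)
    total_degree = sum(len(nbrs) for nbrs in adjacency.values())
    return (total_degree - loops) // 2 + loops

# Toy data standing in for the real edge list (hypothetical page ids).
# The pair (1, 0) repeats (0, 1) and must not be counted twice.
toy_edges = [(0, 1), (1, 2), (2, 0), (2, 3), (1, 0)]
graph = build_undirected_graph(toy_edges)
num_nodes = len(graph)
num_edges = count_edges(graph)
print(num_nodes, num_edges)
```

Using sets for neighbours means a mutual like listed in both directions collapses to a single undirected edge, which matches the description of the graph as undirected.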

Social circles: Twitter

Creators: McAuley, Julian; Leskovec, Jure
Publication Date: 2012

This dataset consists of ‘circles’ (or ‘lists’) from Twitter, crawled from public sources. It includes node features (profiles), circles, and ego networks, and covers 81,306 nodes and 1,768,149 edges, which makes it useful for analyzing community structure, social influence, and network dynamics. It is approximately 0.02 GB in size and structured into:

  • Node features (profiles): Anonymized user profile information, where specific attributes are replaced with generic labels.

  • Circles: Lists of friends grouped by users, representing their social circles.

  • Ego networks: Subgraphs centered around individual users (egos), including their direct friends and the connections among those friends.
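
The ego network definition above can be illustrated with a short sketch: given an adjacency map, it returns the ego's direct friends and the connections among those friends. The toy graph, the user labels, and the function name are assumptions made for illustration, and connections are treated as undirected for simplicity, which may not match how the actual files encode them.

```python
def ego_network(adjacency, ego):
    """Return (friends, edges among friends) for an undirected adjacency map."""
    friends = set(adjacency.get(ego, ())) - {ego}
    friend_edges = set()
    for u in friends:
        for v in adjacency.get(u, ()):
            # Keep each undirected edge once by ordering its endpoints.
            if v in friends and u < v:
                friend_edges.add((u, v))
    return friends, friend_edges

# Toy graph with hypothetical user labels.
toy_adjacency = {
    "a": {"b", "c", "d"},
    "b": {"a", "c"},
    "c": {"a", "b"},
    "d": {"a", "e"},
    "e": {"d"},
}
friends, friend_edges = ego_network(toy_adjacency, "a")
print(sorted(friends), sorted(friend_edges))
```

Here user "e" is excluded because it is a friend of a friend, not a direct friend of the ego.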

Wikipedia vote network

Creators: Leskovec, Jure; Huttenlocher, Daniel; Kleinberg, Jon
Publication Date: 2010

A small fraction of Wikipedia contributors are administrators, users with access to additional technical features that aid in maintenance. For a user to become an administrator, a Request for adminship (RfA) is issued, and the Wikipedia community decides whom to promote via a public discussion or a vote. Using the latest complete dump of Wikipedia page edit history (from January 3, 2008), we extracted all administrator elections and vote history data. This gave us 2,794 elections with 103,663 total votes and 7,066 users participating in the elections (either casting a vote or being voted on). Of these, 1,235 elections resulted in a successful promotion, while 1,559 did not. About half of the votes in the dataset are by existing admins, while the other half come from ordinary Wikipedia users. The network contains all Wikipedia voting data from the inception of Wikipedia until January 2008. Nodes represent Wikipedia users, and a directed edge from node i to node j means that user i voted on user j. In total, the dataset is approximately 0.01 GB in size.
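
A small sketch of how the directed edges described above might be summarized: it counts votes cast (out-degree) and votes received (in-degree) per user, and checks that the two election outcome counts add up to the stated total. The toy vote list and the function name are illustrative assumptions; only the election figures are taken from the description.

```python
from collections import Counter

def vote_degrees(votes):
    """Given (voter, candidate) pairs, return votes cast and votes received per user."""
    cast = Counter(voter for voter, _ in votes)
    received = Counter(candidate for _, candidate in votes)
    return cast, received

# Toy directed edges (hypothetical user ids): (i, j) means user i voted on user j.
toy_votes = [(1, 3), (2, 3), (4, 3), (1, 2), (3, 5)]
cast, received = vote_degrees(toy_votes)

# Users participating either as voter or as candidate.
participants = {user for edge in toy_votes for user in edge}

# Election figures quoted in the description.
successful, unsuccessful = 1235, 1559
total_elections = successful + unsuccessful
success_rate = successful / total_elections  # roughly 44% of elections
print(cast[1], received[3], len(participants), total_elections)
```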

Social circles: Google+

Creators: McAuley, Julian; Leskovec, Jure
Publication Date: 2012

This dataset consists of ‘circles’ from Google+. The data was collected from users who had manually shared their circles using the ‘share circle’ feature. It includes node features (profiles), circles, and ego networks, and covers 107,614 nodes and 13,673,453 edges, providing insight into the structure and characteristics of social connections.

It is approximately 0.4 GB in size and is structured into:

  • Node features (profiles): Anonymized user profile information, where specific attributes are replaced with generic labels.

  • Circles: Lists of friends grouped by users, representing their social circles.

  • Ego networks: Subgraphs centered around individual users (egos), including their direct friends and the connections among those friends.

AmazonQA

Creators: Gupta, Mansi; Kulkarni, Nitish; Chanda, Raghuveer; Rayasam, Anirudha; Lipton, Zachary C.
Publication Date: 2019

We introduce a new dataset and propose a method that combines information retrieval techniques for selecting relevant reviews (given a question) and “reading comprehension” models for synthesizing an answer (given a question and review). Our dataset consists of 923k questions, 3.6M answers and 14M reviews across 156k products. Building on the well-known Amazon dataset, we collect additional annotations, marking each question as either answerable or unanswerable based on the available reviews. The dataset is approximately 4 GB in size and is available in JSON format.

The dataset uses the following variables:

  • questionText: The text of the question posed by the consumer.

  • questionType: Indicates whether the question is ‘yes/no’ for boolean questions or ‘descriptive’ for open-ended questions.

  • review_snippets: A list of extracted review snippets relevant to the question (up to ten).

  • answerText: The text of the answer provided.

  • answerType: Specifies the type of answer.

  • helpful: A list containing two integers; the first indicates the number of users who found the answer helpful, and the second indicates the total number of responses.

  • asin: The unique Amazon Standard Identification Number (ASIN) for the product the question pertains to.

  • qid: A unique question identifier within the dataset.

  • category: The product category.

  • top_review_wilson: The review with the highest Wilson score.

  • top_review_helpful: The review voted as most helpful by users.

  • is_answerable: A boolean indicating whether the question is answerable using the review snippets, based on an answerability classifier.

  • top_sentences_IR: A list of top sentences (up to ten) based on Information Retrieval (IR) score with the question.
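
To make the variable list above concrete, here is a hedged sketch that parses one JSON record and derives a helpfulness ratio from the two-integer `helpful` field. The record is invented for illustration, its flat layout is an assumption (the real files may nest answers differently), and `helpful_ratio` is a hypothetical helper.

```python
import json

# Made-up record using the field names listed above; values are illustrative only.
sample_line = json.dumps({
    "qid": 1,
    "asin": "B000000000",
    "category": "Electronics",
    "questionText": "Does this work with a 220V outlet?",
    "questionType": "yes/no",
    "answerText": "Yes, it worked fine for me.",
    "helpful": [3, 4],
    "is_answerable": True,
    "review_snippets": ["Used it abroad without problems."],
})

def helpful_ratio(record):
    """Fraction of responses that marked the answer helpful, or None if there are none."""
    helpful_votes, total_votes = record["helpful"]
    return helpful_votes / total_votes if total_votes else None

record = json.loads(sample_line)
ratio = helpful_ratio(record)
print(record["questionType"], record["is_answerable"], ratio)
```

Returning `None` for answers with no responses avoids a division by zero and keeps unrated answers distinguishable from answers rated unhelpful.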

Social circles: Facebook

Creators: McAuley, Julian; Leskovec, Jure
Publication Date: 2012

This dataset consists of ‘circles’ (or ‘friends lists’) from Facebook. The data was collected from survey participants using a Facebook app. The dataset includes node features (profiles), circles, and ego networks, and covers 4,039 nodes and 88,234 edges. It has been anonymized by replacing the Facebook-internal id of each user with a new value. The dataset is approximately 0.01 GB in size.

The dataset is structured into:

  • Node features (profiles): Anonymized user profile information, where specific attributes (e.g., political affiliation) are replaced with generic labels (e.g., ‘anonymized feature 1’).

  • Circles: Lists of friends grouped by users, representing their social circles.

  • Ego networks: Subgraphs centered around individual users (egos), including their direct friends and the connections among those friends.

YouTube-8M Dataset

Creators: Abu-El-Haija, Sami; Kothari, Nisarg; Lee, Joonseok; Natsev, Paul; Toderici, George; Varadarajan, Balakrishnan; Vijayanarasimhan, Sudheendra
Publication Date: 2016

YouTube-8M is a large-scale labeled video dataset that consists of millions of YouTube video IDs with high-quality machine-generated and partially human-verified annotations from a diverse vocabulary of 3,800+ visual entities.

It comprises two subsets:

  • YouTube-8M Segments Dataset: 230K human-verified segment labels, 1,000 classes, 5 segments per video.

  • YouTube-8M Dataset, May 2018 version (current): 6.1M videos, 3,862 classes, 3.0 labels per video, 2.6B audio-visual features.

The dataset comes with precomputed audio-visual features from billions of frames and audio segments, designed to fit on a single hard disk. This makes it possible to train a strong baseline model in less than a day on a single GPU. At the same time, the dataset’s scale and diversity enable deep exploration of complex audio-visual models that can take weeks to train even in a distributed fashion.

YouTube offers the YouTube-8M dataset for download as TensorFlow Record files on its website. Starter code for the dataset can be found on its GitHub page.

Amazon Reviews: Unlocked Mobile Phones

Creators: PromptCloud, Inc.
Publication Date: 2019

We analyzed more than 400,000 reviews of close to 4,400 unlocked mobile phones sold on Amazon.com to find insights into reviews, ratings, and price and the relationships among them, making the dataset a rich resource for analyzing customer sentiment and product performance. The dataset is approximately 0.13 GB in size and is available in CSV format. The authors found that most reviewers on Amazon’s product review platform gave 4-star and 3-star ratings, and that the average review length is close to 230 characters. They also found that longer reviews tend to be more helpful and that price and rating are positively correlated.

Structurally, each entry in the dataset includes the following variables:

  • Product Name: The name of the product (e.g., “Sprint EPIC 4G Galaxy SPH-D7”).
  • Brand Name: The manufacturer or parent company (e.g., “Samsung”).
  • Price: The listed price of the product, with values ranging from a minimum of $1.73 to a maximum of $2,598, and an average price of $226.86.
  • Rating: The user-assigned rating, ranging between 1 and 5 stars.
  • Reviews: The textual content of the user’s review, detailing their experience and opinions.
  • Review Votes: The number of helpfulness votes each review received from other users, with a minimum of 0, a maximum of 645, and an average of 1.50 votes.
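
The following sketch shows one way the CSV columns listed above could be read with the Python standard library to compute summary figures such as the average rating and average review length. The three rows are invented, and the blank Price and Review Votes cells are an assumption included only to show defensive parsing; `to_float` is a hypothetical helper.

```python
import csv
import io
from statistics import mean

# Invented rows using the column names listed above.
toy_csv = """Product Name,Brand Name,Price,Rating,Reviews,Review Votes
Phone A,BrandX,199.99,5,Works great,2
Phone B,BrandY,,3,"Okay, but battery is weak",0
Phone C,BrandX,89.50,4,Good value,
"""

def to_float(value):
    """Convert a CSV cell to float, returning None for blank cells."""
    return float(value) if value and value.strip() else None

rows = list(csv.DictReader(io.StringIO(toy_csv)))

ratings = [int(row["Rating"]) for row in rows]
prices = [p for p in (to_float(row["Price"]) for row in rows) if p is not None]
votes = [v for v in (to_float(row["Review Votes"]) for row in rows) if v is not None]

avg_rating = mean(ratings)
avg_price = mean(prices)
avg_review_length = mean(len(row["Reviews"]) for row in rows)
print(avg_rating, round(avg_price, 2), avg_review_length)
```

For the real file, `io.StringIO(toy_csv)` would be replaced by an opened file handle; the rest of the logic stays the same under the stated assumptions.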
