
Marketing Technology Survey

Creators: University of Hamburg
Publication Date: 2025-02-01

A survey of marketing decision-makers sheds light on how marketing processes can be successfully automated. The results show which activities are best suited for automation and which mix of technologies is particularly promising. The survey was preceded by a preliminary study of 18 semi-structured interviews with decision-makers from sales, marketing, and business intelligence, as well as members of top management, and the findings of these interviews were used to develop the survey items. Participants in both the preliminary study and the main survey were recruited via a business-to-business (B2B) panel in order to achieve the greatest possible representativeness for the German-speaking region and to cover all sectors in both B2B and business-to-consumer (B2C) marketing. They include marketing and business-intelligence decision-makers who are responsible for relevant software decisions, as well as employees who are responsible for operationalizing automation software in marketing, sales, and business intelligence. In total, 124 companies based in Germany, Austria, and Switzerland were reached.

In addition to general information on marketing automation and its future prospects, the study is divided into two main areas: marketing analytics and communications. Marketing analytics encompasses real-time analysis, target group analysis, the creation of forecasts, and the controlling of marketing activities. The second area of focus, marketing communications, concentrates on automated campaign management and relates to paid and owned media activities, social media campaigns, and customer service.

Car Design Ratings for Text Analysis

Creators: Maximilian Witte
Publication Date: 2025-11-21

This dataset captures consumer evaluations of car design wireframes along three perceptual dimensions relevant to automotive styling research: aggressiveness, complexity, and typicality. It contains no visual material and is therefore designed exclusively for text-based analysis.

The dataset comprises 232 distinct car design wireframes, each represented through text descriptions detailing the visual form of the design. For every wireframe, consumers rated perceived aggressiveness, perceived complexity, and perceived typicality. Each dimension includes 20 independent ratings per wireframe, resulting in more than 13,000 numeric evaluations. The ratings are stored in separate CSV files, one for each dimension, and include the wireframe ID, anonymized rater ID, and the numeric score.
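Because each dimension is stored in its own CSV file with a wireframe ID, a rater ID, and a score, a per-wireframe average is a natural first aggregation. The following is a minimal sketch only: the column names `wireframe_id`, `rater_id`, and `score` are hypothetical placeholders for the actual headers, and the in-memory sample stands in for one of the real dimension files.

```python
import csv
import io
from collections import defaultdict
from statistics import mean

# Hypothetical column names; replace them with the real CSV headers.
ID_COL, RATER_COL, SCORE_COL = "wireframe_id", "rater_id", "score"

# 232 wireframes x 20 ratings x 3 dimensions = 13,920 ratings across all files.


def mean_ratings(csv_file):
    """Return {wireframe_id: (mean score, number of ratings)} for one dimension file."""
    scores = defaultdict(list)
    for row in csv.DictReader(csv_file):
        scores[row[ID_COL]].append(float(row[SCORE_COL]))
    return {wid: (mean(vals), len(vals)) for wid, vals in scores.items()}


# Tiny made-up stand-in for, e.g., the aggressiveness file.
sample = io.StringIO(
    "wireframe_id,rater_id,score\n"
    "w001,r01,4\n"
    "w001,r02,6\n"
    "w002,r01,3\n"
)
print(mean_ratings(sample))  # {'w001': (5.0, 2), 'w002': (3.0, 1)}
```

Returning the rating count alongside the mean makes it easy to check that every wireframe received the expected 20 ratings per dimension.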

In addition to the numeric evaluations, a free-text description provided by participants accompanies every wireframe. These statements capture how consumers interpreted individual design elements and why they formed their perceptions. Together, the rating data and participant statements enable quantitative and qualitative analyses of design perception.

The dataset supports a wide range of applications including text-based modeling of aesthetic impressions, computational analysis of design language, semantic feature extraction, prediction of numeric perception ratings from text, and research on variability in consumer interpretations of automotive forms.

Crowdfunding datasets

Creators: Web Robots
Publication Date: 2016-03-01

This dataset comes from large-scale web-scraping projects that provide publicly available datasets of e-commerce product listings, reviews, pricing, and other related data from sources such as Kickstarter and Indiegogo. These datasets are used for data mining, analysis, and machine learning, enabling users to explore trends in product performance, customer sentiment, and pricing strategies across multiple industries. Data are collected monthly, so the datasets stay up to date with the latest project information. The initial release included data on approximately 91,500 Indiegogo projects, and the dataset has been updated monthly since May 2016. As of the latest update, the Kickstarter dataset includes information on all current and historic projects, starting in March 2016. Each dataset entry includes the following key variables:

  • Project ID: A unique identifier for each project.

  • Title: The name of the project.

  • Category: The category under which the project is listed (e.g., Technology, Art).

  • Creator: The name of the project creator.

  • Goal: The funding goal set by the creator.

  • Pledged Amount: The total amount pledged by backers.

  • Backers: The number of backers supporting the project.

  • Launch Date: The date when the project was launched.

  • Deadline: The funding deadline for the project.

  • Status: The current status of the project (e.g., active, successful, failed).
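Two common first analyses of these variables are the funding ratio (Pledged Amount divided by Goal) and the success rate per category. This minimal sketch assumes each project has already been loaded as a dictionary with the hypothetical keys `category`, `goal`, `pledged`, and `status`. The real files may use different field names and further status values (for example canceled projects), which this sketch simply counts as finished but not successful.

```python
from collections import defaultdict


def funding_ratio(project):
    """Pledged amount as a fraction of the goal (1.0 means exactly funded)."""
    if project["goal"] <= 0:
        return None  # avoid dividing by a missing or zero goal
    return project["pledged"] / project["goal"]


def success_rate_by_category(projects):
    """Share of finished projects in each category whose status is 'successful'.

    Active projects are skipped because their outcome is still open.
    """
    finished = defaultdict(int)
    succeeded = defaultdict(int)
    for p in projects:
        if p["status"] == "active":
            continue
        finished[p["category"]] += 1
        if p["status"] == "successful":
            succeeded[p["category"]] += 1
    return {cat: succeeded[cat] / finished[cat] for cat in finished}


projects = [  # hypothetical records with hypothetical field names
    {"category": "Technology", "goal": 1000.0, "pledged": 1500.0, "status": "successful"},
    {"category": "Technology", "goal": 5000.0, "pledged": 1200.0, "status": "failed"},
    {"category": "Art", "goal": 800.0, "pledged": 950.0, "status": "successful"},
    {"category": "Art", "goal": 2000.0, "pledged": 300.0, "status": "active"},
]
print(success_rate_by_category(projects))  # {'Technology': 0.5, 'Art': 1.0}
print(funding_ratio(projects[0]))          # 1.5
```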

Stack Overflow Q&A

Creators: Stack Exchange
Publication Date: 2014-01-23

The Stack Exchange Data Dump is a quarterly, anonymized release of all user-contributed content from the Stack Exchange network, including posts, comments, votes, and user data, licensed under Creative Commons BY-SA 3.0. This content supports comprehensive analyses of user interactions, content quality, and community dynamics within the Stack Exchange network. As of March 1, 2020, the data dump contained a total of 47,931,101 posts, covering both questions and answers, accumulated from 2008 to that date. The size of the dumps has grown with the platform: the January 2011 dump, for instance, exceeded 3 GB, while later versions reached tens of gigabytes. Each dump captures the state of the network up to its release date, providing a periodic snapshot of user-generated content and activity. Structurally, the dataset is organized into multiple tables, each corresponding to a different aspect of the platform:

  • Posts: Contains all questions and answers, including metadata such as post ID, score, and content.

  • Users: Includes user-related information like user ID, reputation score, and profile details.

  • Votes: Records voting activity on posts, specifying vote types and associated post IDs.

  • Comments: Holds all comments made on posts, detailing comment text, scores, and related post and user IDs.

  • Badges: Documents badges awarded to users, noting badge names, dates awarded, and badge classes (e.g., bronze, silver, gold).

  • Post History: Tracks changes to posts, recording edits, rollbacks, and other modifications along with details on the post and the user making the change.

  • Post Links: Contains links between posts, such as duplicates or related posts, along with link types and creation dates.
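The dumps are commonly distributed as one XML file per table (for example `Posts.xml`), in which every record is a `<row>` element whose attributes hold the columns. Treat this layout, and the convention that `PostTypeId` 1 marks a question and 2 an answer, as assumptions to verify against the dump you download. Because later dumps reach tens of gigabytes, the sketch streams the file with `xml.etree.ElementTree.iterparse` instead of loading the whole tree into memory.

```python
import io
import xml.etree.ElementTree as ET
from collections import Counter


def count_post_types(xml_file):
    """Stream a Posts.xml-style file (assumed layout) and count questions and answers."""
    counts = Counter()
    for _event, elem in ET.iterparse(xml_file, events=("end",)):
        if elem.tag != "row":
            continue
        post_type = elem.get("PostTypeId")  # assumed: "1" = question, "2" = answer
        if post_type == "1":
            counts["questions"] += 1
        elif post_type == "2":
            counts["answers"] += 1
        elem.clear()  # drop the row's data once counted to limit memory use
    return dict(counts)


sample = io.BytesIO(b"""<?xml version="1.0" encoding="utf-8"?>
<posts>
  <row Id="1" PostTypeId="1" Score="10" />
  <row Id="2" PostTypeId="2" ParentId="1" Score="4" />
  <row Id="3" PostTypeId="2" ParentId="1" Score="1" />
</posts>""")
print(count_post_types(sample))  # {'questions': 1, 'answers': 2}
```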

MLW Zettelmaterial

Creators: Bayerische Akademie der Wissenschaften (BAdW)
Publication Date: 2023

General information:

The data set comprises a total of 114,653 images (18.9 GB), corresponding to 3,507 distinct lemmas.
All images are in RGB but are not uniform in size, i.e., height and width differ from image to image.
Additionally, the corresponding lemma for each image is available in a separate JSON file.

Structure:

Most record cards follow the same structure, consisting of three main parts:

  • The first part (1), deemed the most challenging, is the lemma, which is always located in the upper left corner of the record card.
  • The second part (2) is the index of the text in which the lemma is found.
  • The third part (3) contains a text extract in which the word (corresponding to the lemma) occurs in context.

Character inventory:

There are 17 distinct first characters in total: eight letters, each occurring in both upper- and lowercase, plus one special character.
Capitalization plays a crucial role, since a word's meaning can change depending on it.
Since the majority of our data stems from the S-series of the dictionary, most lemmas start with the letter “s”.
A comparatively large number of lemmas also start with “m”, “v”, “t”, “u”, “l”, and “n”.

Occurrence frequencies:

  • A total of 2,420 lemmas (69%) appear on ten or fewer record cards.
  • 854 lemmas (24.4%) appear on between 11 and 100 record cards.
  • 233 lemmas (6.6%) appear on more than 100 record cards.
  • 1,123 lemmas (approximately 32.0% of all 3,507 lemmas) appear on only one record card.
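These occurrence bins can be recomputed from a mapping of lemma to record-card count, for instance one built from the per-image JSON files. This sketch assumes the bin boundaries are ten or fewer, 11 to 100, and more than 100 cards; the toy lemma names are placeholders, not entries from the data set. The last loop applies the same share computation to the reported counts, using the 3,507 lemmas as the denominator.

```python
from collections import Counter


def bin_label(n_cards):
    """Assign a lemma to a frequency bin (assumed boundaries: <=10, 11-100, >100)."""
    if n_cards <= 10:
        return "<=10"
    if n_cards <= 100:
        return "11-100"
    return ">100"


def frequency_summary(cards_per_lemma):
    """Return bin counts, bin shares in percent, and the number of single-card lemmas."""
    total = len(cards_per_lemma)
    counts = Counter(bin_label(n) for n in cards_per_lemma.values())
    shares = {label: round(100 * c / total, 1) for label, c in counts.items()}
    singletons = sum(1 for n in cards_per_lemma.values() if n == 1)
    return dict(counts), shares, singletons


# Toy input: hypothetical lemma -> number of record cards it appears on.
toy = {"lemma_a": 1, "lemma_b": 5, "lemma_c": 50, "lemma_d": 200}
print(frequency_summary(toy))
# ({'<=10': 2, '11-100': 1, '>100': 1}, {'<=10': 50.0, '11-100': 25.0, '>100': 25.0}, 1)

# Shares of the reported counts relative to all 3,507 lemmas.
for label, n in [("<=10", 2420), ("11-100", 854), (">100", 233), ("one card", 1123)]:
    print(label, round(100 * n / 3507, 1))  # prints 69.0, 24.4, 6.6 and 32.0
```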

Lengths:

  • Lemma lengths range from one character up to a maximum of 19 characters.
  • The average length of the lemmas lies between five and six characters.

Availability:

Research activity:

  • Koch, P., Nuñez, G. V., Arias, E. G., Heumann, C., Schöffel, M., Häberlin, A., & Aßenmacher, M. (2023). A tailored Handwritten-Text-Recognition System for Medieval Latin. arXiv preprint arXiv:2308.09368.

Crowdsourced air traffic data from The OpenSky Network 2020

Creators: Olive, Xavier; Strohmeier, Martin; Lübbe, Jannis
Publication Date: 2022

The data in this dataset are derived and cleaned from the full OpenSky dataset to illustrate the development of air traffic during the COVID-19 pandemic. It spans all flights seen by the network's more than 2,500 members since 1 January 2019, and more data will be added periodically until the end of the COVID-19 pandemic. The data are based on ADS-B signals received by volunteers worldwide, which makes for a rich and diverse source. The dataset includes records of 41,900,660 flights, capturing data from 160,737 unique aircraft. Flight operations involving 13,934 airports across 127 countries are documented. In total, the dataset has a size of 7.0 GB. Each month is represented by a separate CSV file containing the flight data for that period. Each file includes the following columns:

  • callsign: Identifier used for air traffic control communications.
  • number: Commercial flight number, if available.
  • icao24: Unique 24-bit address assigned to the aircraft’s transponder.
  • registration: Aircraft’s registration number.
  • typecode: Aircraft model code.
  • origin: ICAO code of the departure airport.
  • destination: ICAO code of the arrival airport.
  • firstseen: Timestamp of the first detection during the flight.
  • lastseen: Timestamp of the last detection during the flight.
  • day: Date of the flight.
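Because each file carries both `firstseen` and `lastseen`, the observed duration of a flight and the daily traffic volume follow directly from the columns listed above. This sketch assumes the timestamps are ISO 8601 strings with a UTC offset (for example `2020-03-01 10:00:00+00:00`), which `datetime.fromisoformat` can parse; check the actual format in the files. The sample rows are made up.

```python
import csv
import io
from collections import Counter
from datetime import datetime


def summarize_flights(csv_file):
    """Return flights per day and observed duration in minutes per flight.

    Assumes ISO 8601 timestamps with a UTC offset in firstseen/lastseen.
    """
    per_day = Counter()
    durations = {}
    for row in csv.DictReader(csv_file):
        per_day[row["day"][:10]] += 1  # keep only the YYYY-MM-DD part
        first = datetime.fromisoformat(row["firstseen"])
        last = datetime.fromisoformat(row["lastseen"])
        # The transponder address plus first detection time identifies one flight.
        durations[(row["icao24"], row["firstseen"])] = (last - first).total_seconds() / 60
    return dict(per_day), durations


header = "callsign,number,icao24,registration,typecode,origin,destination,firstseen,lastseen,day\n"
sample = io.StringIO(  # made-up rows
    header
    + "ABC123,,aaaaaa,X-AAAA,A320,EDDF,LSZH,2020-03-01 10:00:00+00:00,2020-03-01 11:05:00+00:00,2020-03-01\n"
    + "XYZ9,,bbbbbb,X-BBBB,B738,LSZH,EGLL,2020-03-01 07:00:00+00:00,2020-03-01 08:40:00+00:00,2020-03-01\n"
)
per_day, durations = summarize_flights(sample)
print(per_day)                     # {'2020-03-01': 2}
print(sorted(durations.values()))  # [65.0, 100.0]
```

Note that `firstseen` and `lastseen` mark when the network first and last detected the aircraft, so the computed value is an observed duration, not necessarily the full block time.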

O Say Can You See: Early Washington, D.C., Law and Family Project

Creators: O Say Can You See Project
Publication Date: 2024

This site documents the challenge to slavery and the quest for freedom in early Washington, D.C., by collecting, digitizing, making accessible, and analyzing freedom suits filed between 1800 and 1862, and by tracing the multigenerational family networks these suits reveal. The project encompasses hundreds of freedom suits from the Circuit Court for the District of Columbia, Maryland state courts, and the U.S. Supreme Court, providing invaluable insights into the legal battles fought by enslaved individuals seeking freedom. By exploring the web of litigants, jurists, attorneys, and community members present in the case files, the project maps the relationships of early Washington, D.C., in depth, illustrating how each person is connected to others in the city and beyond. The dataset has a size of 3.0 kB and is organized into several interconnected components:

  • People: A database of individuals involved in the cases, including litigants, jurists, attorneys, and community members, with detailed profiles and social connections.

  • Families: Kinship and family networks of multigenerational Black, white, and mixed families, created using information derived from court records and genealogical research.

  • Cases: A collection of hundreds of freedom cases from various courts, providing detailed accounts of each legal battle.

  • Stories: Interactive analyses of the court cases, families, attorneys, and judges, focusing on historical or legal questions raised by these cases.

AudioSet dataset

Creators: Google
Publication Date: 2017

AudioSet consists of an expanding ontology of 632 audio event classes and a collection of 2,084,320 human-labeled 10-second sound clips drawn from YouTube videos. The ontology is specified as a hierarchical graph of event categories, covering a wide range of human and animal sounds, musical instruments and genres, and common everyday environmental sounds. The dataset has a size of 19.0 kB and is divided into three primary subsets:

  • Evaluation Set: Contains 20,383 segments from distinct videos, ensuring at least 59 examples for each of the 527 sound classes used.

  • Balanced Training Set: Consists of 22,176 segments from distinct videos, selected to provide a balanced representation with at least 59 examples per class.

  • Unbalanced Training Set: Includes 2,042,985 segments from distinct videos, representing the remainder of the dataset.
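The "at least 59 examples per class" criterion of the evaluation and balanced training sets can be checked by counting labels across segments. This sketch assumes the segment lists are CSV files with `#`-prefixed header lines and the columns `YTID`, `start_seconds`, `end_seconds`, and a quoted, comma-separated `positive_labels` field; verify this layout against the released files. The video IDs and label IDs in the sample are placeholders.

```python
import csv
import io
from collections import Counter


def count_labels(segments_file):
    """Count how many segments carry each label in an AudioSet-style segments CSV (assumed layout)."""
    data_lines = (line for line in segments_file if not line.startswith("#"))
    counts = Counter()
    # skipinitialspace lets the reader recognise the quoted label field after ", "
    for row in csv.reader(data_lines, skipinitialspace=True):
        if len(row) < 4:
            continue  # skip blank or malformed lines
        counts.update(row[3].split(","))
    return counts


def classes_below(counts, minimum=59):
    """Labels that occur in fewer than `minimum` segments, sorted."""
    return sorted(label for label, n in counts.items() if n < minimum)


sample = io.StringIO(
    "# YTID, start_seconds, end_seconds, positive_labels\n"
    '--vid0001, 30.000, 40.000, "/m/aaa,/m/bbb"\n'
    '--vid0002, 0.000, 10.000, "/m/aaa"\n'
)
counts = count_labels(sample)
print(dict(counts))                      # {'/m/aaa': 2, '/m/bbb': 1}
print(classes_below(counts, minimum=2))  # ['/m/bbb']
```

On a real evaluation or balanced training file, an empty result from `classes_below(counts)` would be consistent with the stated minimum of 59 examples per class.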
