List of Dirty, Naughty, Obscene, and Otherwise Bad Words

Creators:
Shutterstock
Publication Date:
2019
Data Category:
Dataset Description:
With millions of images in our library and billions of user-submitted keywords, we work hard at Shutterstock to make sure that bad words don't show up in places they shouldn't. This repo, published in 2019, contains a list of words that we use to filter results from our autocomplete server and recommendation engine. The dataset covers offensive terms in multiple languages and is open for contributions: users can add or refine entries, particularly in non-English languages, which broadens its coverage across cultural contexts. The number of entries varies by language; the English list, for instance, contains 403 entries. In total, the dataset is 25.7 kB in size. The data is organized as one plain text file per language, named by language code (for example, 'en' for English and 'de' for German), with one offensive term per line. This layout allows language-specific content filtering and makes the lists easy to integrate into text processing pipelines.
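
Because each language file is plain text with one term per line, loading a list and applying it to candidate suggestions takes only a few lines. The following is a minimal sketch in Python, not Shutterstock's actual filtering code: the function names (parse_word_list, load_word_list, filter_suggestions) are hypothetical, and the case-insensitive exact-match rule is an assumption made for illustration.

```python
from pathlib import Path
from typing import Iterable, List, Set


def parse_word_list(text: str) -> Set[str]:
    """Parse the contents of one language file: one term per line, blanks ignored."""
    # casefold() gives case-insensitive matching (an assumption, not a documented rule).
    return {line.strip().casefold() for line in text.splitlines() if line.strip()}


def load_word_list(directory: Path, language_code: str) -> Set[str]:
    """Read the file named after the language code, e.g. 'en' or 'de'."""
    return parse_word_list((directory / language_code).read_text(encoding="utf-8"))


def filter_suggestions(suggestions: Iterable[str], blocked: Set[str]) -> List[str]:
    """Keep only suggestions that do not exactly match a blocked term."""
    return [s for s in suggestions if s.strip().casefold() not in blocked]


if __name__ == "__main__":
    # Placeholder terms stand in for real entries from a language file.
    blocked_terms = parse_word_list("badword\n\nAnother Bad Phrase\n")
    print(filter_suggestions(["sunset", "BadWord", "mountain"], blocked_terms))
```

Exact matching on the whole suggestion keeps multi-word entries intact and avoids flagging harmless words that merely contain a listed term as a substring; a pipeline that needs token-level or substring matching would have to replace that comparison.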

Variables:
Details:
