Wikipedia archive

Creators:
DBpedia Association
Publication Date:
2024-10-23
Data Category:
Dataset Description:
The DBpedia individual datasets extract structured information from Wikipedia, such as labels, facts, geo-coordinates, and Wikipedia categories, with wide language coverage for data analysis. Users can run complex queries across a wide range of topics, including people, places, and organizations. DBpedia systematically extracts structured data from Wikipedia's semi-structured content, such as infoboxes, categorization information, and links, and converts it into a machine-readable format that facilitates advanced data analysis and integration. The dataset is organized according to a cross-domain, community-curated ontology, with mappings from Wikipedia infoboxes to ontology classes and properties. This ontology provides a consistent framework for representing diverse types of information and supports data integration across domains.

As of the 2016-04 release, DBpedia describes 6.0 million entities, including 1.5 million persons, 810,000 places, 135,000 music albums, 106,000 films, 20,000 video games, 275,000 organizations, 301,000 species, and 5,000 diseases. The dataset comprises 9.5 billion RDF triples, with 1.3 billion extracted from the English edition of Wikipedia and 5.0 billion from other language editions. Each RDF triple consists of a subject, a predicate, and an object. The dataset reflects the state of Wikipedia at the time of each DBpedia release. For example, the 2016-04 release corresponds to Wikipedia's content as of April 2016.
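The subject, predicate, object structure described above can be illustrated with a minimal Python sketch. It is not DBpedia's own tooling: the `Triple` type, the `match` helper, and the `subjects_in_country` function are hypothetical names introduced here, and the four sample triples are hand-written for illustration rather than copied from any release. Only the `http://dbpedia.org/resource/` and `http://dbpedia.org/ontology/` namespace prefixes and the RDFS label URI are taken as given.

```python
from typing import Iterable, Iterator, NamedTuple, Optional

# Namespace prefixes used by DBpedia resources and ontology terms.
DBR = "http://dbpedia.org/resource/"
DBO = "http://dbpedia.org/ontology/"
RDFS_LABEL = "http://www.w3.org/2000/01/rdf-schema#label"


class Triple(NamedTuple):
    """One RDF statement: subject, predicate, object (hypothetical helper type)."""
    subject: str
    predicate: str
    obj: str


# Hand-written sample data for this sketch (an assumption, not a dataset excerpt).
TRIPLES = [
    Triple(DBR + "Berlin", DBO + "country", DBR + "Germany"),
    Triple(DBR + "Berlin", RDFS_LABEL, "Berlin"),
    Triple(DBR + "Hamburg", DBO + "country", DBR + "Germany"),
    Triple(DBR + "Paris", DBO + "country", DBR + "France"),
]


def match(
    triples: Iterable[Triple],
    subject: Optional[str] = None,
    predicate: Optional[str] = None,
    obj: Optional[str] = None,
) -> Iterator[Triple]:
    """Yield triples that fit the pattern; None acts as a wildcard."""
    for t in triples:
        if subject is not None and t.subject != subject:
            continue
        if predicate is not None and t.predicate != predicate:
            continue
        if obj is not None and t.obj != obj:
            continue
        yield t


def subjects_in_country(triples: Iterable[Triple], country: str) -> list:
    """Return sorted subjects whose dbo:country object is the given resource."""
    pattern = match(triples, predicate=DBO + "country", obj=DBR + country)
    return sorted(t.subject for t in pattern)


german_places = subjects_in_country(TRIPLES, "Germany")
print(german_places)
```

Matching on a triple pattern with wildcards is the same basic idea that a SPARQL query expresses. For data at the scale of a full release, an RDF library or a triple store would be the practical choice over an in-memory list.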
Variables:
Details:
