Note: We are moving the content of this website to our new page, currently located here. We will switch within the next few days (written 29th of April).
DBpedia is a crowd-sourced community effort to extract structured information from Wikipedia and make this information available on the Web. DBpedia allows you to ask sophisticated queries against Wikipedia, and to link the different data sets on the Web to Wikipedia data. We hope that this work will make it easier for the huge amount of information in Wikipedia to be used in some new interesting ways. Furthermore, it might inspire new mechanisms for navigating, linking, and improving the encyclopedia itself.
This Wiki provides information about the DBpedia community project:
- Datasets gives an overview of the DBpedia knowledge base.
- Ontology gives an overview of the DBpedia ontology.
- Online Access describes how the data set can be accessed via a SPARQL endpoint and as Linked Data.
- Downloads provides the DBpedia data sets for download.
- Interlinking describes how the DBpedia data set is interlinked with various other datasets on the Web.
- Use Cases lists different use cases for the DBpedia data set.
- Extraction Framework describes the DBpedia information extraction framework.
- Data Provision Architecture paints a picture of the software and protocols used to serve DBpedia on the Web.
- Community explains how the DBpedia community collaborates and how people can contribute to the DBpedia effort.
- DBpedia Mapping Wiki contains the mappings used by the DBpedia extraction.
- DBpedia Internationalization Effort works towards providing multiple language-specific versions of DBpedia.
- DBpedia-Live presents the new DBpedia-Live framework.
- DBpedia Spotlight presents the DBpedia Spotlight tool for the semantic annotation of textual content.
- Credits lists the people and institutions that have contributed to DBpedia so far.
- Change Log lists the DBpedia releases and gives an overview of the changes for each release.
- Next steps describes ideas and future plans for the DBpedia project.
The DBpedia Knowledge Base
Knowledge bases are playing an increasingly important role in enhancing the intelligence of Web and enterprise search and in supporting information integration. Today, most knowledge bases cover only specific domains, are created by relatively small groups of knowledge engineers, and are very cost intensive to keep up-to-date as domains change. At the same time, Wikipedia has grown into one of the central knowledge sources of mankind, maintained by thousands of contributors.
The DBpedia project leverages this gigantic source of knowledge by extracting structured information from Wikipedia and by making this information accessible on the Web under the terms of the Creative Commons Attribution-ShareAlike 3.0 License and the GNU Free Documentation License.
The English version of the DBpedia knowledge base describes 4.58 million things, out of which 4.22 million are classified in a consistent ontology, including 1,445,000 persons, 735,000 places (including 478,000 populated places), 411,000 creative works (including 123,000 music albums, 87,000 films and 19,000 video games), 241,000 organizations (including 58,000 companies and 49,000 educational institutions), 251,000 species and 6,000 diseases.
In addition, we provide localized versions of DBpedia in 125 languages. All these versions together describe 38.3 million things, out of which 23.8 million are localized descriptions of things that also exist in the English version of DBpedia. The full DBpedia data set features 38 million labels and abstracts in 125 different languages, 25.2 million links to images and 29.8 million links to external web pages, 80.9 million links to Wikipedia categories, and 41.2 million links to YAGO categories. DBpedia is connected with other Linked Datasets by around 50 million RDF links. Altogether the DBpedia 2014 release consists of 3 billion pieces of information (RDF triples), out of which 580 million were extracted from the English edition of Wikipedia and 2.46 billion were extracted from other language editions. Detailed statistics about the DBpedia datasets in 24 popular languages are provided at Dataset Statistics.
The DBpedia knowledge base has several advantages over existing knowledge bases: it covers many domains, it represents real community agreement, it automatically evolves as Wikipedia changes, and it is truly multilingual. The DBpedia knowledge base allows you to ask quite surprising queries against Wikipedia, for instance “Give me all cities in New Jersey with more than 10,000 inhabitants” or “Give me all Italian musicians from the 18th century”. Altogether, the use cases of the DBpedia knowledge base are widespread and range from enterprise knowledge management and Web search to revolutionizing Wikipedia search.
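As a rough illustration of the first example query, the sketch below builds a SPARQL query and a request URL for it in Python. The endpoint URL `https://dbpedia.org/sparql`, the `format` parameter, and the choice of `dbo:City`, `dbo:isPartOf` and `dbo:populationTotal` are assumptions that are not stated on this page. The actual data may be modelled with other classes and properties, so treat this as a starting point rather than a definitive query.

```python
import json
from urllib.parse import urlencode
from urllib.request import urlopen

# Assumption: address of the public SPARQL endpoint (not given in the text above).
ENDPOINT = "https://dbpedia.org/sparql"


def cities_query(region_uri: str, min_population: int) -> str:
    """Build a SPARQL query for cities in a region above a population threshold."""
    # Assumption: cities are typed dbo:City, linked to the region via dbo:isPartOf,
    # and carry their population in dbo:populationTotal.
    return f"""
PREFIX dbo: <http://dbpedia.org/ontology/>
SELECT ?city ?population WHERE {{
  ?city a dbo:City ;
        dbo:isPartOf <{region_uri}> ;
        dbo:populationTotal ?population .
  FILTER (?population > {min_population})
}}
ORDER BY DESC(?population)
""".strip()


def request_url(query: str) -> str:
    """Encode the query as an HTTP GET URL asking for JSON results."""
    params = {"query": query, "format": "application/sparql-results+json"}
    return ENDPOINT + "?" + urlencode(params)


def run(query: str) -> list:
    """Send the query over the network and return the result bindings."""
    with urlopen(request_url(query), timeout=30) as response:
        return json.load(response)["results"]["bindings"]


query = cities_query("http://dbpedia.org/resource/New_Jersey", 10000)
url = request_url(query)
print(query)
```

Calling `run(query)` would perform the actual HTTP request; it is kept separate so the query can be inspected or adjusted before anything is sent.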
Nucleus for the Web of Data
Within the W3C Linking Open Data (LOD) community effort, an increasing number of data providers have started to publish and interlink data on the Web according to Tim Berners-Lee’s Linked Data principles. The resulting Web of Data currently consists of several billion RDF triples and covers domains such as geographic information, people, companies, online communities, films, music, books and scientific publications. In addition to publishing and interlinking datasets, there is also ongoing work on Linked Data browsers, Linked Data crawlers, Web of Data search engines and other applications that consume Linked Data from the Web.
The DBpedia knowledge base is served as Linked Data on the Web. As DBpedia defines Linked Data URIs for millions of concepts, various data providers have started to set RDF links from their data sets to DBpedia, making DBpedia one of the central interlinking-hubs of the emerging Web of Data.
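A minimal Python sketch of both ideas follows: dereferencing a DBpedia URI as Linked Data, and setting an RDF link from another data set to DBpedia. It assumes the URI pattern `http://dbpedia.org/resource/<name>`, content negotiation via an `Accept: text/turtle` header, and `owl:sameAs` as the linking property. The `http://example.org/...` URI is hypothetical and stands in for a third-party data set.

```python
from urllib.parse import quote
from urllib.request import Request

# Assumption: DBpedia resource URIs follow this pattern.
RESOURCE_BASE = "http://dbpedia.org/resource/"
OWL_SAME_AS = "http://www.w3.org/2002/07/owl#sameAs"


def dbpedia_uri(label: str) -> str:
    """Turn a Wikipedia-style title into a DBpedia resource URI."""
    return RESOURCE_BASE + quote(label.replace(" ", "_"))


def linked_data_request(label: str, media_type: str = "text/turtle") -> Request:
    """Prepare an HTTP request that asks for an RDF description of the resource."""
    return Request(dbpedia_uri(label), headers={"Accept": media_type})


def same_as_link(local_uri: str, label: str) -> str:
    """Serialize an RDF link from a local resource to DBpedia as one N-Triples line."""
    return f"<{local_uri}> <{OWL_SAME_AS}> <{dbpedia_uri(label)}> ."


request = linked_data_request("New Jersey")
# Hypothetical local URI, used only for illustration.
link = same_as_link("http://example.org/places/nj", "New Jersey")
print(request.full_url)
print(link)
```

Passing `request` to `urllib.request.urlopen` would fetch the description; the sketch stops short of the network call so it can be run offline.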
News
Hereby we announce the release of DBpedia 2016-04. The new release is based on updated Wikipedia dumps dating from March/April 2016 featuring a significantly expanded base of information as well as richer and (hopefully) cleaner data based on the DBpedia ontology.
During the latest DBpedia meeting in Leipzig we discussed ways to support DBpedia and what benefits this support would bring. For the next two months, we are aiming to raise money to support the hosting of the main services and the next DBpedia release (especially to shorten release intervals). On top of that, we need to buy a new server to host DBpedia Spotlight, which has so far been generously hosted by third parties. If you use DBpedia and want us to keep going forward, we kindly invite you to donate here or become a member of the DBpedia association.
The English version of the DBpedia knowledge base currently describes 6.0M entities, of which 4.6M have abstracts, 1.53M have geo coordinates and 1.6M have depictions. In total, 5.2M resources are classified in a consistent ontology, consisting of 1.5M persons, 810K places (including 505K populated places), 490K works (including 135K music albums, 106K films and 20K video games), 275K organizations (including 67K companies and 53K educational institutions), 301K species and 5K diseases. The total number of resources in English DBpedia is 16.9M, which, besides the 6.0M resources, includes 1.7M SKOS concepts (categories), 7.3M redirect pages, 260K disambiguation pages and 1.7M intermediate nodes.
Altogether the DBpedia 2016-04 release consists of 9.5 billion (2015-10: 8.8 billion) pieces of information (RDF triples) out of which 1.3 billion (2015-10: 1.1 billion) were extracted from the English edition of Wikipedia, 5.0 billion (2015-04: 4.4 billion) were extracted from other language editions and 3.2 billion (2015-10: 3.2 billion) from DBpedia Commons and Wikidata. In general, we observed a growth in mapping-based statements of about 2%.
The DBpedia community added new classes and properties to the DBpedia ontology via the mappings wiki. The DBpedia 2016-04 ontology encompasses:
- 754 classes (DBpedia 2015-10: 739)
- 1,103 object properties (DBpedia 2015-10: 1,099)
- 1,608 datatype properties (DBpedia 2015-10: 1,596)
- 132 specialized datatype properties (DBpedia 2015-10: 132)
- 410 owl:equivalentClass and 221 owl:equivalentProperty mappings to external vocabularies (DBpedia 2015-04: 407 – 221)
The editor community of the mappings wiki also defined many new mappings from Wikipedia templates to DBpedia classes. For the DBpedia 2016-04 extraction, we used a total of 5800 template mappings (DBpedia 2015-10: 5553 mappings). For the second time the top language, gauged by the number of mappings, is Dutch (646 mappings), followed by the English community (604 mappings).
- In addition to the datasets normalized to English DBpedia (en-uris), we now provide normalized datasets based on the DBpedia Wikidata (DBw) datasets (wkd-uris). These sorted datasets will be the foundation for the upcoming fusion process with Wikidata. From the following releases on, the DBw-based URIs will be the only ones provided.
- We now filter out triples from the Raw Infobox Extractor that are already mapped. E.g. no more “<x> dbo:birthPlace <z>” and “<x> dbp:birthPlace|dbp:placeOfBirth|… <z>” in the same resource. These triples are now moved to the “infobox-properties-mapped” datasets and not loaded on the main endpoint. See issue 22 for more details.
- Major improvements in our citation extraction. See here for more details.
- We incorporated the statistical distribution approach of Heiko Paulheim for creating type statements automatically and provide them as an additional dataset (instance_types_sdtyped_dbo).
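The filtering of already-mapped raw infobox triples mentioned in the list above can be pictured with the following Python sketch. It is not the extraction framework's code. It assumes that a raw triple counts as "already mapped" when a mapped triple with the same subject and object exists, and the names `Triple`, `split_raw_triples` and the property `dbp:caption` are made up for illustration.

```python
from typing import NamedTuple


class Triple(NamedTuple):
    subject: str
    predicate: str
    obj: str


def split_raw_triples(mapped, raw):
    """Separate raw infobox triples that duplicate a mapped statement.

    Assumption: a raw triple is a duplicate if some mapped triple has the
    same subject and the same object, whatever the predicate is called.
    """
    covered = {(t.subject, t.obj) for t in mapped}
    kept, moved = [], []
    for triple in raw:
        if (triple.subject, triple.obj) in covered:
            moved.append(triple)  # would go to "infobox-properties-mapped"
        else:
            kept.append(triple)   # stays in the raw infobox dataset
    return kept, moved


mapped = [Triple("<x>", "dbo:birthPlace", "<z>")]
raw = [
    Triple("<x>", "dbp:birthPlace", "<z>"),
    Triple("<x>", "dbp:placeOfBirth", "<z>"),
    Triple("<x>", "dbp:caption", '"a caption"'),  # hypothetical unmapped property
]
kept, moved = split_raw_triples(mapped, raw)
print(len(kept), len(moved))
```

In this toy input the two birth-place variants are moved and only the unmapped property remains in the raw set.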
In case you missed it, what we changed in the previous release (2015-10):
- English DBpedia switched to IRIs. This can be a breaking change to some applications that need to change their stored DBpedia resource URIs / links. We provide the “uri-same-as-iri” dataset for English to ease the transition.
- The instance-types dataset is now split into two files: instance-types (containing only direct types) and instance-types-transitive (containing the transitive types of a resource based on the DBpedia ontology).
- The mappingbased-properties file is now split into three (3) files:
  - “geo-coordinates-mappingbased”, which contains the coordinates originating from the mappings wiki. The “geo-coordinates” dataset continues to provide the coordinates originating from the GeoExtractor.
  - “mappingbased-literals”, which contains mapping-based facts with literal values.
  - “mappingbased-objects”, which contains mapping-based facts with object values.
  - The “mappingbased-objects-disjoint-[domain|range]” datasets contain facts that are filtered out from the “mappingbased-objects” datasets as errors but are still provided.
- We added a new extractor for citation data that provides two files:
  - citation links: linking resources to citations
  - citation data: trying to get additional data from citations. This is quite an interesting dataset, but we need help to clean it up.
- All datasets are available in .ttl and .tql serialization (the .nt and .nq datasets were dropped for reasons of redundancy and server capacity).
- Dataset normalization: We are going to normalize datasets based on Wikidata URIs and no longer on the English language edition, as a prerequisite to finally start the fusion process with Wikidata.
- RML Integration: Wouter Maroy has already provided the necessary groundwork on GitHub for switching the mappings wiki to an RML-based approach. We are not there yet, but this is at the top of our list of changes.
- Starting with the next release we are adding datasets with NIF annotations of the abstracts (as we already provided those for the 2015-04 release). We will eventually extend the NIF annotation dataset to cover the whole Wikipedia article of a resource.
- SDTypes: We extended the coverage of the automatically created type statements (instance_types_sdtyped_dbo) to English, German and Dutch (see above).
- Extensions: In the extension folder (2016-04/ext) we provide two new datasets, both of which are to be considered experimental:
  - DBpedia World Facts: This dataset is authored by the DBpedia association itself. It lists all countries, all currencies in use and (most) languages spoken in the world, as well as how these concepts relate to each other (spoken in, primary language etc.) and useful properties like ISO codes (ontology diagram). This dataset extends the very useful LEXVO dataset with facts from DBpedia and the CIA Factbook. Please report any errors or suggestions regarding this dataset to Markus.
  - Lector Facts: This experimental dataset was provided by Matteo Cannaviccio and demonstrates his approach to generating facts by using common sequences of words (i.e. phrases) that are frequently used to describe instances of binary relations in a text. We are looking into using this approach as a regular extraction step. It would be helpful to get some feedback from you.
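For readers unfamiliar with the 2015-10 split of instance-types into direct and transitive types mentioned in the list above, here is a small Python sketch of how transitive types can be derived from a class hierarchy. The hierarchy fragment, the single-parent assumption and the decision to exclude the direct type from the result are illustrative assumptions, not a description of the extraction framework.

```python
# Assumption: a toy fragment of a class hierarchy with one parent per class.
SUBCLASS_OF = {
    "dbo:MusicalArtist": "dbo:Artist",
    "dbo:Artist": "dbo:Person",
    "dbo:Person": "dbo:Agent",
    "dbo:Agent": "owl:Thing",
}


def transitive_types(direct_type, parents=SUBCLASS_OF):
    """Return all ancestors of a direct type, nearest first.

    The direct type itself is not included; it would belong to the
    instance-types file, the ancestors to instance-types-transitive.
    """
    ancestors = []
    current = parents.get(direct_type)
    # Stop at the root, and guard against accidental cycles in the input.
    while current is not None and current != direct_type and current not in ancestors:
        ancestors.append(current)
        current = parents.get(current)
    return ancestors


print(transitive_types("dbo:MusicalArtist"))
```

A resource whose direct type is `dbo:MusicalArtist` would thus receive one direct type statement and four transitive ones under these assumptions.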
Lots of thanks to
- Markus Freudenberg (University of Leipzig / DBpedia Association) for taking over the whole release process and creating the revamped download & statistics pages.
- Dimitris Kontokostas (University of Leipzig / DBpedia Association) for conveying his considerable knowledge of the extraction and release process.
- All editors that contributed to the DBpedia ontology mappings via the Mappings Wiki.
- The whole DBpedia Internationalization Committee for pushing the DBpedia internationalization forward.
- Heiko Paulheim (University of Mannheim) for providing the necessary code for his algorithm to generate additional type statements for formerly untyped resources and to identify and remove wrong statements, which is now part of the DIEF.
- Václav Zeman, Thomas Klieger and the whole LHD team (University of Prague) for their contribution of additional DBpedia types
- Marco Fossati (FBK) for contributing the DBTax types
- Alan Meehan (TCD) for performing a big external link cleanup
- Aldo Gangemi (LIPN University, France & ISTC-CNR, Italy) for providing the links from DOLCE to DBpedia ontology.
- Kingsley Idehen, Patrick van Kleef, and Mitko Iliev (all OpenLink Software) for loading the new data set into the Virtuoso instance that provides 5-Star Linked Open Data publication and SPARQL Query Services.
- OpenLink Software (http://www.openlinksw.com/) collectively for providing the SPARQL Query Services and Linked Open Data publishing infrastructure for DBpedia in addition to their continuous infrastructure support.
- Ruben Verborgh from Ghent University – iMinds for publishing the dataset as Triple Pattern Fragments, and iMinds for sponsoring DBpedia’s Triple Pattern Fragments server.
- Ali Ismayilov (University of Bonn) for extending the DBpedia Wikidata dataset.
- Vladimir Alexiev (Ontotext) for leading a successful mapping and ontology clean up effort.
- All the GSoC students and mentors who directly or indirectly influenced the DBpedia release.
- Special thanks to members of the DBpedia Association, the AKSW and the department for Business Information Systems of the University of Leipzig.
The work on the DBpedia 2016-04 release was financially supported by the European Commission through the project ALIGNED – quality-centric, software and data engineering (http://aligned-project.eu/). More information about DBpedia is found at http://dbpedia.org as well as in the new overview article about the project available at http://wiki.dbpedia.org/Publications.
Have fun with the new DBpedia 2016-04 release!
Very shortly after the largest DBpedia meeting to date, we are crossing the Atlantic for the second time. We are happy to announce that the 8th DBpedia Community Meeting will be held in Sunnyvale on October 27th 2016, hosted by Yahoo.
Please read below on different ways you can participate. We are looking forward to meeting again in person with the US-based DBpedia community.
- Web URL: http://wiki.dbpedia.org/meetings/California2016
- Hashtag: #DBpediaCA
- When: October 27th, 2016
- Where: Yahoo, 701 First Avenue, Sunnyvale, CA.
- Host: Yahoo
- Call for Contribution: Submit your proposal in our form
- Registration: through eventbrite (limited seats)
If you would like to become a sponsor for the 8th DBpedia Meeting, please contact the DBpedia Association.
| Yahoo! | For hosting the meeting and the catering |
| Google Summer of Code 2016 | Amazing program and the reason some of our core DBpedia devs are visiting California |
| ALIGNED – Software and Data Engineering | For funding the development of DBpedia as a project use-case and covering part of the travel cost |
| Institute for Applied Informatics | For supporting the DBpedia Association |
| OpenLink Software | For continuous hosting of the main DBpedia Endpoint |
- Nicolas Torzec, Yahoo Knowledge Graph.
- Pablo N. Mendes, Lattice Data Inc.
- Dimitris Kontokostas, DBpedia Association and AKSW, Uni Leipzig
- Sebastian Hellmann, DBpedia Association and AKSW, Uni Leipzig
Attending the DBpedia Community meeting is free of charge, but seats are limited. Make sure to register to reserve a seat.
Call for Contribution
Please submit your proposal through our form. Contribution proposals may include (but are not limited to) presentations, demos, lightning talks, panels and session suggestions. We intend to accept as many proposals as possible in the available meeting time.
The meeting will take place at the Yahoo headquarters in Sunnyvale. Address: Yahoo! (Building B, 701 First Avenue, Sunnyvale, CA)
Your DBpedia Association
After the success of the last two community meetings in Palo Alto and in The Hague, we thought it was time to meet in Leipzig, where the DBpedia Association is located. During SEMANTiCS 2016 in Leipzig, Sep 12-15, the DBpedia community met on the 15th of September. First and foremost, we would like to thank the Institute for Applied Informatics for supporting our community, the University of Leipzig for hosting our meeting, and SEMANTiCS for hosting and sponsoring the meeting.
During the opening session, Lydia Pintscher, product manager of Wikidata, presented Wikidata: bringing structured data to Wikipedia with 16000 volunteers. Lydia described similarities and differences between DBpedia and Wikidata and talked about prospective steps for Wikidata. Harald Sack from the Hasso-Plattner-Institut also spoke during the opening session. He introduced the dwerft project – DBpedia and Linked Data for the Media Value Chain, which aims at the common technology platform »Linked Production Data Cloud«.
The DBpedia showcase session started with the DBpedia 2016-04 release update by Markus Freudenberg (AKSW/KILT). At this session, six speakers presented how to utilize DBpedia in novel and interesting ways. For example:
- Miel Vander Sande (iMinds) talked about DBpedia Archives as Memento with Triple Pattern Fragments.
- Jörn Hees (DFKI) introduced us to Human associations in the Semantic Web and DBpedia.
- Peter de Laat from GoUnitive urged the community to personalize user interaction in a Linked Data environment.
DBpedia Association hour
The 7th edition of the community meeting covered the first DBpedia Association hour, which provided a platform for the community to discuss and give feedback. Sebastian Hellmann (AKSW, KILT), Julia Holze (DBpedia Association) and Dimitris Kontokostas (AKSW, KILT) gave an update on the DBpedia Association status. We talked about our technical progress, DBpedia funding and visions. Sebastian Hellmann introduced the Board of Trustees, which is the main decision-making body of the DBpedia Association and oversees the association and its work as its ultimate corporate authority.
Enno Meijers (KB) of the Dutch DBpedia chapter announced a successful cooperation between Huygens ING, iMinds/Univ. Gent, Vrije Universiteit Amsterdam, Institute for Sound and Vision, Koninklijke Bibliotheek (KB) and the NL-DBpedia community. By signing the Memorandum of Understanding (MoU) they officially support the goals of the DBpedia Association and strengthen the Dutch chapter and community.
You will find community feedback and all questions which we discussed at the first DBpedia Association hour here: https://pad.okfn.org/p/how-to-improve-DBpedia. Participants who wanted to learn DBpedia basics joined the DBpedia tutorial session by Markus Freudenberg (AKSW/KILT).
The sessions in the afternoon highlighted two important fields of research and development, namely the DBpedia ontology and DBpedia & NLP. At the DBpedia ontology session, Wouter Maroy (iMinds) presented DBpedia RML mappings, which he created during this year’s Google Summer of Code project, and Gerard Kuys (Ordina) discussed the question ‘Does extraction prelude structure?’ with the DBpedia ontology group. At the same time, Milan Dojchinovski (AKSW/KILT) chaired the DBpedia & NLP session with eight very interesting talks. You will find all presentations given during this session on our website. The last two presentations, “Analyzing and improving the Polish Wikipedia Citations” (part of the Wikipedia References & Citations challenge) and “Greek DBpedia updates”, were given by Krzysztof Węcel (Poznan University) and Sotiris Karampatakis (OKF Greece), respectively.
In the closing session we wrapped up the meeting and gave out our prizes:
- The “DBpedia Excellence in Engineering” went to Markus Freudenberg for keeping up with the DBpedia releases
- The “Citations Challenge prize” went to Krzysztof Węcel for his very thorough citation analysis.
Summing up, the event brought together more than 150 DBpedians from Europe, who engaged in vital conversations about interesting projects and approaches to questions and problems revolving around DBpedia. We would like to thank the organizers Magnus Knuth (HPI, DBpedia German & Commons), Monika Solanki (University of Oxford) and representatives of the DBpedia Association such as Dimitris Kontokostas, Sebastian Hellmann and Julia Holze for devoting their time to the organization of the meeting and the program.
We are now looking forward to the 8th DBpedia Community Meeting (which is most probably coming sooner than you think, across the Atlantic). Check our website for further updates or follow #DBpedia on Twitter.
Your DBpedia Association.
For a recent overview paper about DBpedia, please refer to:
- Jens Lehmann, Robert Isele, Max Jakob, Anja Jentzsch, Dimitris Kontokostas, Pablo N. Mendes, Sebastian Hellmann, Mohamed Morsey, Patrick van Kleef, Sören Auer, Christian Bizer: DBpedia – A Large-scale, Multilingual Knowledge Base Extracted from Wikipedia. Semantic Web Journal, Vol. 6 No. 2, pp 167–195, 2015.
- Further papers about DBpedia can be found at Publications