Next Steps

The next steps for the DBpedia project are:

  1. Synchronize Wikipedia and DBpedia by deploying DBpedia live extraction, which updates the DBpedia knowledge base immediately whenever a Wikipedia article changes.
  2. Enable the community to edit and maintain the DBpedia ontology and the infobox mappings used by the extraction framework in a public wiki.
  3. Increase the quality of the extracted data by improving and fine-tuning the extraction code.
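The live-extraction idea in step 1 can be sketched as a simple update loop. The functions below are hypothetical stand-ins (not part of the actual extraction framework): `fetch_recent_changes` would poll Wikipedia's recent-changes feed and `extract_triples` would run the infobox extractors on one article; here they are stubbed so the control flow is runnable.

```python
def fetch_recent_changes():
    """Stub: would poll Wikipedia's recent-changes feed for edited articles."""
    return ["Berlin", "Albert Einstein"]

def extract_triples(article):
    """Stub: would run the DBpedia extractors on a single article."""
    return [(article, "rdfs:label", article)]

def live_update(store):
    """Re-extract every changed article and replace its triples in the store."""
    for article in fetch_recent_changes():
        store[article] = extract_triples(article)
    return store

store = live_update({})
```

The key design point is that only the changed articles are re-extracted, rather than re-running a full dump extraction, which is what keeps DBpedia synchronized with Wikipedia in near real time.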

Other next steps include:

  • Integrate both data sets into a single data set under a shared URI schema. The new URIs for Wikipedia concepts are likely to be {article name from the English edition of Wikipedia}.
  • Extend the data set to all 1.6 million concepts in the English version of Wikipedia. This will yield an RDF data set of about 20-50 million triples.
  • Set up a better server to serve the integrated data set as linked data and to provide a SPARQL endpoint over the data set. The data is now hosted entirely in OpenLink Virtuoso, which also provides the SPARQL endpoint, courtesy of OpenLink Software. See Architecture.
  • Improve the information extraction algorithms and apply some data cleansing heuristics to extracted information.
  • Put some user-friendly search and browse interfaces on top of DBpedia. Candidates include Longwell.
  • Experiment with domain knowledge and inference over the data set.
  • Implement some cool client applications for specific use cases.
  • Set up the data extraction process to run on a regular schedule.
  • Make the DBpedia data set more useful by interlinking it with additional data sources. Candidates include
  • Improve the classification of DBpedia entries. We are currently trying different approaches, including importing classification information from the YAGO data set and from Freebase. An overview of this work is given in a blog post by Michael Bergman.
  • Give feedback to the Wikipedia community on how their templates could be changed to ease information extraction.
  • Grow the DBpedia community and engage more interested parties in the project. We especially welcome support in improving the classification and in linking external data sets to DBpedia. See also the Linking Open Data project regarding the latter.
  • Extract infobox data from more language versions of Wikipedia. Top candidates, in order of official Wikipedia article count:
    • English
    • German
    • French
    • Polish
    • Japanese
    • Italian
    • Dutch
    • Portuguese
    • Spanish
    • Russian
    • Swedish
    • Chinese
    • Norwegian (Bokmål)
    • Finnish
    • Catalan
  • Improve extraction of infobox data from supported non-English versions of Wikipedia.
  • Look for somebody who wants to implement chemistry and biology extractors so that we also obtain the non-infobox data for these domains.
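The shared URI schema mentioned above derives an identifier from the English Wikipedia article name. The sketch below shows one plausible mapping; the `base` prefix is an assumption for illustration (the final URI prefix is left unspecified on this page), and the exact set of characters left unescaped is likewise a guess.

```python
from urllib.parse import quote

def wikipedia_name_to_uri(article_name, base="http://dbpedia.org/resource/"):
    # base is an illustrative assumption; the page does not fix the URI prefix.
    # Spaces become underscores, as in Wikipedia's own article URLs, and the
    # rest is percent-encoded (keeping a few characters common in titles).
    name = article_name.strip().replace(" ", "_")
    return base + quote(name, safe="_()',")

wikipedia_name_to_uri("Tim Berners-Lee")
```

One article name then always maps to exactly one URI, which is what makes merging the two data sets under a single schema possible.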
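The SPARQL endpoint mentioned in the list above can be queried over plain HTTP. The snippet below only builds such a request (it does not touch the network); the endpoint URL and the result format parameter are assumptions for illustration, not something this page specifies.

```python
from urllib.parse import urlencode

# Assumed endpoint URL for illustration; not stated on this page.
ENDPOINT = "http://dbpedia.org/sparql"

# Example query: fetch a few labels for one resource.
query = """
SELECT ?label WHERE {
  <http://dbpedia.org/resource/Berlin> rdfs:label ?label .
} LIMIT 5
"""

# A SPARQL endpoint typically accepts the query as a URL parameter.
request_url = ENDPOINT + "?" + urlencode({
    "query": query,
    "format": "application/sparql-results+json",  # assumed format parameter
})
```

Fetching `request_url` with any HTTP client would then return the bindings for `?label` in the requested serialization.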