DBpedia 2014 Data Set Statistics


This page provides statistics about the DBpedia 2014 release. The release contains localized editions of DBpedia for 125 languages, each extracted from the Wikipedia edition in the corresponding language. For 28 of these languages, we report the overall number of things (instances) described in the localized version of DBpedia as well as the number of facts (statements) extracted from the infoboxes describing these things. We then report the number of instances of popular classes within these 28 DBpedia editions.


Dataset statistics for DBpedia 3.9 can be found here. Below we compare the numbers between the two releases.




1. Instances, Properties, and Statements per Language


The same thing, for instance a person or city, might be described by multiple pages within Wikipedia editions in different languages. Pages describing the same thing are often interlinked by cross-language links within Wikipedia.


When DBpedia extracts data from these pages, it produces two types of data sets. The localized data sets contain all things that are described in a specific language, and each thing is identified with a language-specific URI. In addition, we produce a canonicalized data set for each language. The canonicalized data sets contain only things for which a corresponding page exists in the English edition of Wikipedia. Within all canonicalized data sets, the same thing is identified with the same URI from the generic namespace http://dbpedia.org/resource/.
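The relationship between localized and canonicalized identifiers can be illustrated with a minimal sketch. It is not the DBpedia extraction framework: the function names, the sample titles, and the language-specific URI pattern `http://<lang>.dbpedia.org/resource/` are assumptions made for illustration, and only the generic namespace is taken from the text above.

```python
# Generic namespace used by all canonicalized data sets (from the text above).
CANONICAL_NS = "http://dbpedia.org/resource/"


def localized_uri(lang, title):
    # Assumed pattern for language-specific URIs; illustrative only.
    return f"http://{lang}.dbpedia.org/resource/{title.replace(' ', '_')}"


def canonical_uri(english_title):
    return CANONICAL_NS + english_title.replace(" ", "_")


def canonicalize(lang, titles, links_to_english):
    """Map localized URIs to canonical URIs.

    `links_to_english` is a hypothetical dict from a local page title to the
    title of the English page it is cross-language linked to. Pages without
    an English counterpart are skipped, mirroring the rule that canonicalized
    data sets only contain things that have an English Wikipedia page.
    """
    result = {}
    for title in titles:
        english_title = links_to_english.get(title)
        if english_title is None:
            continue  # stays in the localized data set only
        result[localized_uri(lang, title)] = canonical_uri(english_title)
    return result


# Made-up sample data: two pages with an English counterpart, one without.
german_titles = ["München", "Berlin", "Seite ohne englisches Pendant"]
links = {"München": "Munich", "Berlin": "Berlin"}
mapping = canonicalize("de", german_titles, links)
```

Under these assumptions, `mapping` contains two entries, and the third German page would appear only in the localized data set. This is why the "Instances, CD, all" counts reported below are never larger than the "Instances, LD, all" counts.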


DBpedia uses two different extractors to extract data from Wikipedia infoboxes. The mapping-based extractor extracts data only for the infoboxes for which a language-specific extraction mapping to the DBpedia ontology exists in the DBpedia mapping wiki. Based on these mappings, it normalizes the different names that are used in various languages to refer to the same property. The second extractor is the raw infobox extractor which uses a generic heuristic to extract data from all infoboxes. The raw infobox extractor does not normalize property names but produces language-specific properties that directly reflect the property name in the Wikipedia infobox.
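The difference between the two extractors can be sketched as follows. This is a simplified illustration, not the actual extractor code: the `MAPPINGS` table, the sample infobox, and the raw property namespace pattern are hypothetical, and real mappings in the mapping wiki are defined per infobox template rather than per property name alone.

```python
# Hypothetical mapping table: (language, infobox property name) -> ontology property.
MAPPINGS = {
    ("en", "birth_date"): "birthDate",
    ("de", "geburtsdatum"): "birthDate",
}

ONTOLOGY_NS = "http://dbpedia.org/ontology/"


def mapping_based_extract(lang, infobox):
    """Keep only properties that have a mapping and normalize their names."""
    statements = {}
    for name, value in infobox.items():
        ontology_property = MAPPINGS.get((lang, name.lower()))
        if ontology_property is not None:
            statements[ONTOLOGY_NS + ontology_property] = value
    return statements


def raw_extract(lang, infobox):
    """Keep every property under its original, language-specific name."""
    # Assumed namespace pattern for raw properties; illustrative only.
    namespace = f"http://{lang}.dbpedia.org/property/"
    return {namespace + name: value for name, value in infobox.items()}


# Made-up German infobox with one mapped and one unmapped property.
de_infobox = {"geburtsdatum": "1879-03-14", "wohnort": "Ulm"}
mapped = mapping_based_extract("de", de_infobox)
raw = raw_extract("de", de_infobox)
```

In this sketch the mapping-based result contains a single normalized `birthDate` statement, while the raw result keeps both properties under their German names. The same contrast explains why the raw property counts below are much larger than the mapping property counts.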


Below we report the overall number of things (instances), distinct ontology and raw-infobox properties, infobox statements and type statements for all 28 languages for which mappings exist in the DBpedia mapping wiki. The rows are sorted approximately in descending order of the number of instances for which mapping-based infobox data exists (Instances, CD, withMD column); the four languages added in this release, marked (new), are listed last.


The column headings have the following meanings:


  • LD = Localized Data Sets.
  • CD = Canonicalized Data Sets.
  • all = Overall number of instances in the data set, calculated based on the labels and redirects dumps.
  • withMD = Number of instances for which mapping-based infobox data exists.
  • Raw Properties = Number of different properties that are generated by the raw infobox extractor.
  • Mapping Properties = Number of different properties that are generated by the mapping-based infobox extractor.
  • Raw Statements = Number of statements (facts) that are generated by the raw infobox extractor.
  • Mapping Statements = Number of statements (facts) that are generated by the mapping-based infobox extractor; includes type statements.

Language | Instances, LD, all | Instances, CD, all | Instances, CD, withMD | Raw Properties, CD | Mapping Properties, CD | Raw Statements, CD | Mapping Statements, CD | Type Statements, CD
en 4,584,616 4,584,616 4,232,626 55,986 1,367 68,091,260 61,255,734 28,563,803
it 1,128,909 745,345 540,474 10,591 304 13,840,025 7,984,501 3,929,338
de 1,692,634 857,196 479,731 11,695 507 9,677,586 6,733,886 3,468,237
nl 1,774,536 674,849 455,222 8,100 814 8,044,539 6,752,260 3,118,581
es 1,086,296 683,251 419,328 17,347 557 9,728,204 7,070,608 3,190,529
fr 1,504,453 942,505 415,390 15,111 928 11,521,313 6,899,052 3,396,756
pl 1,043,400 653,571 411,883 7,751 263 8,554,227 6,031,811 3,189,677
pt 812,610 552,362 321,211 14,637 628 7,069,586 5,098,947 2,185,948
ru 1,119,142 579,612 266,562 15,665 158 8,825,572 4,070,294 1,986,532
ja 913,488 397,907 134,380 15,981 414 4,403,612 2,136,719 1,002,180
ca 426,696 289,485 128,544 10,183 189 3,643,659 1,727,749 962,352
eu 178,822 139,023 90,948 2,947 138 2,010,728 1,038,208 577,224
hu 260,512 171,391 76,273 7,806 298 2,429,115 883,482 536,290
ko 276,881 178,872 58,937 8,503 443 1,409,638 919,678 458,870
tr 233,737 143,914 57,034 9,008 445 1,636,893 868,494 443,345
cs 296,094 193,674 48,356 6,368 332 2,272,303 669,398 377,149
bg 161,427 112,571 44,698 5,095 268 964,269 655,482 333,355
ar 266,386 170,430 44,298 11,008 292 1,185,465 517,106 316,167
id 354,326 142,616 43,980 11,514 378 1,599,822 682,877 347,255
el 96,301 67,390 36,255 4,437 534 389,068 401,475 252,492
sl 140,612 85,167 25,494 4,844 493 950,604 342,495 212,340
ga 30,670 27,674 4,176 1,231 76 83,457 52,378 31,872
hr 135,272 92,952 12,003 3,674 149 827,890 205,023 106,691
bn 29,631 26,136 2,160 6,609 93 271,070 30,740 19,015
be (new) 71,656 52,040 23,512 4,998 214 557,540 358,957 168,132
cy (new) 57,127 43,127 11,945 2,084 31 204,058 59,485 54,578
sk (new) 192,410 138,492 5,268 4,757 30 1,814,997 75,897 21,148
sr (new) 246,996 189,158 138,166 6,069 607 2,278,757 2,059,154 873,394
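For readers who want to work with these numbers programmatically, the following is a minimal sketch of how one whitespace-separated row of the table above could be parsed. The field names in `COLUMNS` and the `parse_row` helper are hypothetical and introduced only for this example.

```python
# Hypothetical field names for the eight numeric columns, in table order.
COLUMNS = [
    "instances_ld_all",
    "instances_cd_all",
    "instances_cd_with_md",
    "raw_properties_cd",
    "mapping_properties_cd",
    "raw_statements_cd",
    "mapping_statements_cd",
    "type_statements_cd",
]


def parse_row(line):
    """Parse one table row into a dict of integers plus language metadata."""
    tokens = line.split()
    language = tokens[0]
    is_new = "(new)" in tokens
    # Drop the "(new)" marker and the thousands separators before converting.
    values = [int(t.replace(",", "")) for t in tokens[1:] if t != "(new)"]
    if len(values) != len(COLUMNS):
        raise ValueError(f"expected {len(COLUMNS)} numbers, got {len(values)}")
    row = dict(zip(COLUMNS, values))
    row["lang"] = language
    row["new"] = is_new
    return row


row = parse_row("be (new) 71,656 52,040 23,512 4,998 214 557,540 358,957 168,132")

# Share of canonicalized instances that have mapping-based infobox data.
coverage = row["instances_cd_with_md"] / row["instances_cd_all"]
```

A derived ratio such as `coverage` gives a rough indication of how much of a language edition is covered by mappings, which the absolute counts alone do not show.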

The following table integrates the Dataset Statistics for DBpedia 3.9 with the statistics presented above, allowing a comparison between the two versions. The % columns contain the percentage increase in the number of instances, properties, or statements in version 2014 with respect to 3.9. There are four new languages in the 2014 release, Belarusian (be), Serbian (sr), Welsh (cy), and Slovak (sk), for which property mappings have become available; their numbers can be found in the last four rows of the table. The decrease in the number of raw statements for some languages is due to the fact that triple de-duplication was introduced in 2014.
Language | Instances, LD, all | Instances, CD, all | Instances, CD, withMD | Raw Properties, CD | Mapping Properties, CD | Raw Statements, CD | Mapping Statements, CD | Type Statements, CD
Language | 3.9 2014 % | 3.9 2014 % | 3.9 2014 % | 3.9 2014 % | 3.9 2014 % | 3.9 2014 % | 3.9 2014 % | 3.9 2014 %
en 4,258,406 4,584,616 7.7 4,258,406 4,584,616 7.7 3,255,435 4,232,626 30 51,736 55,986 8.2 1,373 1,367 -0.4 70,147,399 68,091,260 -2.9 41,804,545 61,255,734 46.5 16,366,701 28,563,803 74.5
it 1,029,528 1,128,909 9.7 672,981 745,345 10.8 473,595 540,474 14.1 10,241 10,591 3.4 211 304 44.1 14,366,288 13,840,025 -3.7 5,724,415 7,984,501 39.5 2,364,096 3,929,338 66.2
de 1,547,785 1,692,634 9.4 779,104 857,196 10 327,548 479,731 46.5 10,659 11,695 9.7 327 507 55 9,284,326 9,677,586 4.2 4,070,927 6,733,886 65.4 1,800,424 3,468,237 92.6
nl 1,461,314 1,774,536 21.4 590,014 674,849 14.4 368,688 455,222 23.5 7,481 8,100 8.3 642 814 26.8 7,916,452 8,044,539 1.6 5,039,583 6,752,260 34 2,144,581 3,118,581 45.4
es 1,003,158 1,086,296 8.3 621,472 683,251 9.9 376,975 419,328 11.2 15,992 17,347 8.5 549 557 1.5 9,147,643 9,728,204 6.3 5,950,626 7,070,608 18.8 2,305,659 3,190,529 38.4
fr 1,378,099 1,504,453 9.2 856,004 942,505 10.1 346,214 415,390 20 13,990 15,111 8 689 928 34.7 10,741,192 11,521,313 7.3 5,273,302 6,899,052 30.8 2,145,950 3,396,756 58.3
pl 960,880 1,043,400 8.6 598,754 653,571 9.2 334,214 411,883 23.2 7,478 7,751 3.7 264 263 -0.4 8,113,838 8,554,227 5.4 4,624,126 6,031,811 30.4 2,031,952 3,189,677 57
pt 764,132 812,610 6.3 511,741 552,362 7.9 298,475 321,211 7.6 13,740 14,637 6.5 620 628 1.3 6,934,107 7,069,586 2 4,489,235 5,098,947 13.6 1,641,916 2,185,948 33.1
ru 999,165 1,119,142 12 516,870 579,612 12.1 236,067 266,562 12.9 14,771 15,665 6.1 149 158 6 8,390,368 8,825,572 5.2 3,174,725 4,070,294 28.2 1,315,619 1,986,532 51
ja 860,917 913,488 6.1 370,912 397,907 7.3 115,227 134,380 16.6 14,752 15,981 8.3 395 414 4.8 4,353,518 4,403,612 1.2 1,674,891 2,136,719 27.6 656,290 1,002,180 52.7
ca 400,271 426,696 6.6 267,856 289,485 8.1 119,675 128,544 7.4 9,391 10,183 8.4 184 189 2.7 4,057,610 3,643,659 -10.2 1,420,025 1,727,749 21.7 757,526 962,352 27
eu 150,294 178,822 19 119,752 139,023 16.1 74,114 90,948 22.7 2,683 2,947 9.8 97 138 42.3 2,381,903 2,010,728 -15.6 975,775 1,038,208 6.4 456,815 577,224 26.4
hu 239,711 260,512 8.7 157,034 171,391 9.1 68,939 76,273 10.6 7,283 7,806 7.2 298 298 0 2,859,593 2,429,115 -15.1 669,836 883,482 31.9 358,586 536,290 49.6
ko 237,506 276,881 16.6 154,397 178,872 15.9 47,081 58,937 25.2 7,605 8,503 11.8 435 443 1.8 1,276,866 1,409,638 10.4 646,461 919,678 42.3 271,610 458,870 68.9
tr 213,820 233,737 9.3 127,281 143,914 13.1 47,673 57,034 19.6 8,172 9,008 10.2 438 445 1.6 1,701,192 1,636,893 -3.8 648,288 868,494 34 270,546 443,345 63.9
cs 263,317 296,094 12.4 172,763 193,674 12.1 40,549 48,356 19.3 5,873 6,368 8.4 340 332 -2.4 2,192,854 2,272,303 3.6 556,742 669,398 20.2 244,058 377,149 54.5
bg 146,608 161,427 10.1 101,310 112,571 11.1 43,961 44,698 1.7 4,728 5,095 7.8 268 268 0 950,554 964,269 1.4 564,830 655,482 16 225,843 333,355 47.6
ar 215,042 266,386 23.9 129,600 170,430 31.5 25,325 44,298 74.9 9,492 11,008 16 286 292 2.1 883,730 1,185,465 34.1 256,761 517,106 101.4 143,042 316,167 121
id 208,891 354,326 69.6 113,047 142,616 26.2 33,385 43,980 31.7 10,264 11,514 12.2 372 378 1.6 1,417,031 1,599,822 12.9 449,244 682,877 52 199,564 347,255 74
el 84,359 96,301 14.2 57,249 67,390 17.7 27,856 36,255 30.2 3,695 4,437 20.1 461 534 15.8 287,562 389,068 35.3 275,669 401,475 45.6 159,570 252,492 58.2
sl 136,684 140,612 2.9 80,102 85,167 6.3 23,584 25,494 8.1 4,473 4,844 8.3 474 493 4 1,335,247 950,604 -28.8 265,908 342,495 28.8 151,203 212,340 40.4
hr 127,930 135,272 5.7 82,016 92,952 13.3 11,452 12,003 4.8 3,501 3,674 4.9 158 149 -5.7 779,862 827,890 6.2 168,804 205,023 21.5 74,455 106,691 43.3
ga 19,450 30,670 57.7 17,350 27,674 59.5 3,791 4,176 10.2 1,128 1,231 9.1 72 76 5.6 76,746 83,457 8.7 41,331 52,378 26.7 21,847 31,872 45.9
bn 25,811 29,631 14.8 20,753 26,136 25.9 1,275 2,160 69.4 5,467 6,609 20.9 86 93 8.1 176,630 271,070 53.5 13,852 30,740 121.9 6,856 19,015 177.3
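cy n/a 57,127 n/a n/a 43,127 n/a n/a 11,945 n/a n/a 2,084 n/a n/a 31 n/a n/a 204,058 n/a n/a 59,485 n/a n/a 54,578 n/a
be n/a 71,656 n/a n/a 52,040 n/a n/a 23,512 n/a n/a 4,998 n/a n/a 214 n/a n/a 557,540 n/a n/a 358,957 n/a n/a 168,132 n/a
sk n/a 192,410 n/a n/a 138,492 n/a n/a 5,268 n/a n/a 4,757 n/a n/a 30 n/a n/a 1,814,997 n/a n/a 75,897 n/a n/a 21,148 n/a
sr n/a 246,996 n/a n/a 189,158 n/a n/a 138,166 n/a n/a 6,069 n/a n/a 607 n/a n/a 2,278,757 n/a n/a 2,059,154 n/a n/a 873,394 n/a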
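The % columns in the table above can be reproduced from the 3.9 and 2014 values. The sketch below assumes the usual relative-change formula rounded to one decimal place; the helper name `percent_increase` is introduced here for illustration, and the inputs are taken from the English (en) row.

```python
def percent_increase(old, new):
    """Relative change from the 3.9 value to the 2014 value, in percent."""
    return round((new - old) / old * 100, 1)


# English row, "Instances, LD, all": 4,258,406 (3.9) -> 4,584,616 (2014)
instances_change = percent_increase(4_258_406, 4_584_616)       # 7.7

# English row, "Raw Statements, CD": 70,147,399 (3.9) -> 68,091,260 (2014)
raw_statements_change = percent_increase(70_147_399, 68_091_260)  # -2.9

# English row, "Mapping Statements, CD": 41,804,545 (3.9) -> 61,255,734 (2014)
mapping_statements_change = percent_increase(41_804_545, 61_255_734)  # 46.5
```

A negative result, as for the English raw statements, indicates that the 2014 value is smaller than the 3.9 value. The four new languages have no 3.9 baseline, so no percentage can be computed for them.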

2. Instances of Selected Classes per Language


The table below reports the number of instances for a set of selected classes within the canonicalized DBpedia data sets for each language.
Class en it pl es pt fr de ru ca bn eu ga hr nl cs el bg hu ko sl tr id ja cy be sk sr
Person 1,445,104 189,448 96,135 99,147 60,056 134,749 179,421 86,269 10,533 1,788 4,366 1,511 5,869 54,879 12,884 5,964 19,047 22,444 21,844 7,740 21,422 17,627 48,642 656 6,724 668 16,300
Athlete 268,773 67,932 45,890 31,527 19,849 65,782 42,101 16,631 905 0 480 197 0 26,113 4,885 1,782 3,625 5,919 5,828 2,577 8,144 6,439 17,883 0 827 668 3,889
Actor 6,501 508 10,106 13,831 7,546 14,019 0 0 2,054 0 1,052 515 1,550 8,117 2,680 0 1,552 2,519 7,000 435 2,912 0 10,633 0 25 0 2,054
Artist 96,282 15,621 20,180 34,898 14,603 32,562 0 30,266 3,193 0 1,052 823 4,348 16,656 3,195 1,567 2,404 6,633 10,118 1,276 7,805 2,867 21,896 0 1,931 0 4,449
Musical Artist 45,089 15,113 6,924 14,594 6,332 11,138 0 9,015 1,139 0 0 76 2,121 5,959 0 200 48 0 2,525 614 3,499 2,186 8,566 0 434 0 1,049
Politician 40,343 4,893 10,639 7,460 4,110 11,461 0 0 1,901 0 977 316 0 1,805 0 792 0 1,025 707 63 0 0 2,849 283 30 0 1,552
Scientist 18,233 0 0 4,626 6,242 2,431 0 9,322 0 604 0 0 189 1,309 356 19 872 487 737 421 0 612 1,148 0 1,002 0 1,042
Place 735,062 177,524 211,084 156,377 123,114 148,586 168,082 91,099 74,835 0 50,969 1,385 1,063 202,393 21,582 5,000 10,865 19,992 11,031 12,634 10,697 7,068 20,669 11,182 11,820 0 78,506
PopulatedPlace 478,351 160,582 191,208 133,947 114,155 118,716 96,556 86,137 74,344 0 48,804 1,385 128 183,335 16,661 4,132 5,321 16,331 3,472 12,148 8,865 4,493 4,889 376 10,031 0 76,063
Building 68,582 3,888 2,549 4,455 228 6,926 990 82 0 0 1,013 0 23 2,373 293 30 236 513 306 42 427 236 1,106 0 324 0 173
Airport 13,649 1,069 3,392 1,921 720 1,499 2,087 0 0 0 0 0 0 1,050 174 0 0 187 528 78 0 1,330 565 0 12 0 132
Bridge 3,543 216 249 444 108 0 711 0 0 0 0 0 0 305 0 0 0 107 31 0 337 0 0 0 0 0 27
River 26,295 2,099 1,859 3 4,397 3,957 7,949 4,598 0 0 469 0 696 1,257 1,712 54 378 730 163 172 0 101 792 0 537 0 759
Organisation 241,286 15,554 15,288 15,955 13,625 27,542 28,935 15,414 1,623 0 912 177 0 12,234 1,327 2,336 3,924 4,829 6,448 704 5,190 2,734 8,475 77 765 0 3,479
Company 58,400 5,337 3,054 1,077 2,610 8,180 9,473 4,512 541 0 0 0 0 2,490 708 192 430 831 1,844 160 956 560 3,485 0 209 0 363
Educ. Institution 49,172 918 845 1,709 514 2,943 2,600 1,418 0 0 0 0 0 775 146 106 175 158 1,001 56 430 449 1,938 77 181 0 103
Band 30,572 0 4,054 0 5,076 5,177 6,462 4,656 297 0 324 104 0 2,057 0 250 1,348 949 1,395 14 21 0 0 0 87 0 336
SportsTeam 28,357 7,900 5,048 5,585 3,575 5,844 4,157 3,930 0 0 429 73 0 5,751 0 1,169 1,032 2,322 1,384 188 2,276 1,301 2,034 0 45 0 1,804
Work 411,295 78,975 37,363 50,374 43,263 61,212 38,945 45,848 6,113 372 1,065 119 4,822 30,126 8,489 16,072 5,204 12,449 9,068 1,221 11,226 6,802 29,139 30 503 0 3,800
Musical Work 180,308 31,309 17,406 21,379 20,815 22,065 7,540 12,445 230 0 151 0 2,808 8,774 4,189 1,018 1,792 5,374 2,137 379 4,338 2,804 8,900 0 9 0 911
Album 123,374 30,252 12,406 12,897 13,006 14,426 5,462 8,856 230 0 151 0 2,135 4,786 4,041 845 1,364 3,952 1,256 202 2,108 1,741 5,178 0 0 0 635
Single 45,433 0 5,000 7,296 6,271 5,621 1,532 3,589 0 0 0 0 673 2,982 0 166 428 1,276 833 169 2,055 1,063 3,722 0 9 0 268
Film 87,282 24,156 12,555 12,140 11,643 15,669 18,707 14,912 5,105 220 914 119 1,487 10,239 2,388 1,128 2,095 3,188 3,392 259 3,768 3,011 10,126 0 80 0 1,733
Book 31,029 6,083 1,687 2,217 1,343 3,549 0 18,491 109 123 0 0 241 953 597 13,133 556 608 216 239 754 241 547 30 67 0 220
Software 31,401 7,145 1,187 6,284 4,245 8,980 5,286 0 669 29 0 0 0 3,878 1,172 305 99 1,105 1,753 182 1,081 430 5,526 0 141 0 276
Television Show 29,466 570 3,071 3,544 3,433 4,373 3,399 0 0 0 0 0 0 2,077 0 186 559 1,152 1,285 117 728 0 888 0 0 0 530
Event 45,377 18,118 4,064 5,050 6,488 23,123 3,294 3,519 0 0 0 0 9 5,552 1,429 359 753 3,627 1,720 1,220 2,006 758 1,864 0 381 0 1,539
Celestial Body 32,864 16,974 13,312 2,541 15,648 0 5,199 544 0 0 0 0 0 1,666 7 2,146 138 10,581 0 735 689 0 3,666 0 59 4,600 5,512
Species 252,166 22,810 23,474 64,950 47,407 0 28,491 23,303 35,440 0 0 749 0 129,539 0 0 3,897 94 6,290 7 0 8,068 10,603 0 2,448 0 7,542
Mean Of Transportation 50,984 6,692 4,293 5,019 3,401 2,005 6,681 0 0 0 0 0 0 2,683 1,207 553 148 1,201 488 429 576 769 2,048 0 26 0 1,073
Disease 6,078 1,868 1,448 2,397 1,134 0 0 0 0 0 0 0 0 1,088 290 286 29 0 536 283 300 0 548 0 192 0 587


3. Cross-Language Overlap


For detailed statistics about the overlap of the DBpedia data sets in different languages, please refer to Cross-Language Overlap Statistics.