{"status":"ok","message-type":"work","message-version":"1.0.0","message":{"indexed":{"date-parts":[[2026,5,15]],"date-time":"2026-05-15T15:50:58Z","timestamp":1778860258629,"version":"3.51.4"},"reference-count":68,"publisher":"MDPI AG","issue":"4","license":[{"start":{"date-parts":[[2018,3,29]],"date-time":"2018-03-29T00:00:00Z","timestamp":1522281600000},"content-version":"vor","delay-in-days":0,"URL":"https:\/\/creativecommons.org\/licenses\/by\/4.0\/"}],"content-domain":{"domain":[],"crossmark-restriction":false},"short-container-title":["Information"],"abstract":"<jats:p>Large-scale knowledge graphs, such as DBpedia, Wikidata, or YAGO, can be enhanced by relation extraction from text, using the data in the knowledge graph as training data, i.e., using distant supervision. While most existing approaches use language-specific methods (usually for English), we present a language-agnostic approach that exploits background knowledge from the graph instead of language-specific techniques and builds machine learning models only from language-independent features. We demonstrate the extraction of relations from Wikipedia abstracts, using the twelve largest language editions of Wikipedia. From those, we can extract 1.6 M new relations in DBpedia at a level of precision of 95%, using a RandomForest classifier trained only on language-independent features. We furthermore investigate the similarity of models for different languages and show an exemplary geographical breakdown of the information extracted. In a second series of experiments, we show how the approach can be transferred to DBkWik, a knowledge graph extracted from thousands of Wikis. 
We discuss the challenges and first results of extracting relations from a larger set of Wikis, using a less formalized knowledge graph.<\/jats:p>","DOI":"10.3390\/info9040075","type":"journal-article","created":{"date-parts":[[2018,3,29]],"date-time":"2018-03-29T12:51:56Z","timestamp":1522327916000},"page":"75","update-policy":"https:\/\/doi.org\/10.3390\/mdpi_crossmark_policy","source":"Crossref","is-referenced-by-count":15,"title":["Language-Agnostic Relation Extraction from Abstracts in Wikis"],"prefix":"10.3390","volume":"9","author":[{"given":"Nicolas","family":"Heist","sequence":"first","affiliation":[{"name":"Data and Web Science Group, University of Mannheim, Mannheim 68131, Germany"}],"role":[{"role":"author","vocabulary":"crossref"}]},{"given":"Sven","family":"Hertling","sequence":"additional","affiliation":[{"name":"Data and Web Science Group, University of Mannheim, Mannheim 68131, Germany"}],"role":[{"role":"author","vocabulary":"crossref"}]},{"ORCID":"https:\/\/orcid.org\/0000-0003-4386-8195","authenticated-orcid":false,"given":"Heiko","family":"Paulheim","sequence":"additional","affiliation":[{"name":"Data and Web Science Group, University of Mannheim, Mannheim 68131, Germany"}],"role":[{"role":"author","vocabulary":"crossref"}]}],"member":"1968","published-online":{"date-parts":[[2018,3,29]]},"reference":[{"key":"ref_1","doi-asserted-by":"crossref","first-page":"167","DOI":"10.3233\/SW-140134","article-title":"DBpedia\u2014A Large-scale, Multilingual Knowledge Base Extracted from Wikipedia","volume":"6","author":"Lehmann","year":"2013","journal-title":"Semant. Web J."},{"key":"ref_2","doi-asserted-by":"crossref","unstructured":"Bollacker, K., Evans, C., Paritosh, P., Sturge, T., and Taylor, J. (2008, January 9\u201312). Freebase: A Collaboratively Created Graph Database For Structuring Human Knowledge. 
Proceedings of the 2008 ACM SIGMOD International Conference on Management of Data, Vancouver, BC, Canada.","DOI":"10.1145\/1376616.1376746"},{"key":"ref_3","doi-asserted-by":"crossref","first-page":"78","DOI":"10.1145\/2629489","article-title":"Wikidata: A Free Collaborative Knowledge Base","volume":"57","year":"2014","journal-title":"Commun. ACM"},{"key":"ref_4","unstructured":"Mahdisoltani, F., Biega, J., and Suchanek, F.M. (2015, January 4\u20137). YAGO3: A Knowledge Base from Multilingual Wikipedias. Proceedings of the Conference on Innovative Data Systems Research, Asilomar, CA, USA."},{"key":"ref_5","doi-asserted-by":"crossref","unstructured":"Ringler, D., and Paulheim, H. (2017, January 25\u201329). One knowledge graph to rule them all? Analyzing the differences between DBpedia, YAGO, Wikidata & Co. Proceedings of the German Conference on Artificial Intelligence (K\u00fcnstliche Intelligenz), Dortmund, Germany.","DOI":"10.1007\/978-3-319-67190-1_33"},{"key":"ref_6","doi-asserted-by":"crossref","first-page":"489","DOI":"10.3233\/SW-160218","article-title":"Knowledge Graph Refinement: A Survey of Approaches and Evaluation Methods","volume":"8","author":"Paulheim","year":"2016","journal-title":"Semant. Web"},{"key":"ref_7","unstructured":"Hofmann, A., Perchani, S., Portisch, J., Hertling, S., and Paulheim, H. (2017, January 21\u201325). DBkWik: Towards knowledge graph creation from thousands of wikis. Proceedings of the International Semantic Web Conference (Posters and Demos), Vienna, Austria."},{"key":"ref_8","doi-asserted-by":"crossref","first-page":"365","DOI":"10.1093\/bioinformatics\/btl616","article-title":"RelEx\u2014Relation extraction using dependency parse trees","volume":"23","author":"Fundel","year":"2007","journal-title":"Bioinformatics"},{"key":"ref_9","doi-asserted-by":"crossref","unstructured":"Schutz, A., and Buitelaar, P. (2005, January 6\u201310). Relext: A tool for relation extraction from text in ontology extension. 
Proceedings of the Semantic Web\u2014ISWC 2005, Galway, Ireland.","DOI":"10.1007\/11574620_43"},{"key":"ref_10","first-page":"1083","article-title":"Kernel methods for relation extraction","volume":"3","author":"Zelenko","year":"2003","journal-title":"J. Mach. Learn. Res."},{"key":"ref_11","doi-asserted-by":"crossref","unstructured":"Bender, E.M. (2009, January 30). Linguistically na\u00efve! = language independent: Why NLP needs linguistic typology. Proceedings of the EACL 2009 Workshop on the Interaction between Linguistics and Computational Linguistics: Virtuous, Vicious or Vacuous?, Athens, Greece.","DOI":"10.3115\/1642038.1642044"},{"key":"ref_12","unstructured":"Leuf, B., and Cunningham, W. (2001). The Wiki Way: Quick Collaboration on the Web, Addison-Wesley."},{"key":"ref_13","unstructured":"(2018, March 29). MediaWiki. Available online: https:\/\/www.mediawiki.org\/."},{"key":"ref_14","unstructured":"(2018, March 29). Wiki Usage. Available online: https:\/\/trends.builtwith.com\/cms\/wiki."},{"key":"ref_15","doi-asserted-by":"crossref","unstructured":"Weaver, G., Strickland, B., and Crane, G. (2006, January 11\u201315). Quantifying the accuracy of relational statements in wikipedia: A methodology. Proceedings of the 6th ACM\/IEEE-CS Joint Conference on Digital Libraries (JCDL \u201906), Chapel Hill, NC, USA.","DOI":"10.1145\/1141753.1141853"},{"key":"ref_16","doi-asserted-by":"crossref","first-page":"130","DOI":"10.1016\/j.artint.2012.04.005","article-title":"Evaluating entity linking with Wikipedia","volume":"194","author":"Hachey","year":"2013","journal-title":"Artif. Intell."},{"key":"ref_17","doi-asserted-by":"crossref","unstructured":"Moro, A., and Navigli, R. (2015, January 4\u20135). SemEval-2015 Task 13: Multilingual All-Words Sense Disambiguation and Entity Linking. 
Proceedings of the 9th International Workshop on Semantic Evaluation (SemEval 2015), Denver, CO, USA.","DOI":"10.18653\/v1\/S15-2049"},{"key":"ref_18","unstructured":"Heist, N., and Paulheim, H. (2017, January 21\u201325). Language-agnostic relation extraction from wikipedia abstracts. Proceedings of the International Semantic Web Conference, Vienna, Austria."},{"key":"ref_19","doi-asserted-by":"crossref","unstructured":"Mintz, M., Bills, S., Snow, R., and Jurafsky, D. (2009, January 2\u20137). Distant supervision for relation extraction without labeled data. Proceedings of the Joint Conference of the 47th Annual Meeting of the ACL and the 4th International Joint Conference on Natural Language Processing of the AFNLP, Singapore.","DOI":"10.3115\/1690219.1690287"},{"key":"ref_20","unstructured":"Aprosio, A.P., Giuliano, C., and Lavelli, A. (2013, January 22). Extending the Coverage of DBpedia Properties using Distant Supervision over Wikipedia. Proceedings of the NLP&DBpedia, Sydney, Australia. CEUR Workshop Proceedings."},{"key":"ref_21","unstructured":"Gerber, D., and Ngomo, A.C.N. (2011, January 23\u201327). Bootstrapping the Linked Data web. Proceedings of the Workshop on Web Scale Knowledge Extraction, Bonn, Germany."},{"key":"ref_22","doi-asserted-by":"crossref","unstructured":"Nguyen, D.P., Matsuo, Y., and Ishizuka, M. (2007, January 22\u201326). Relation extraction from wikipedia using subtree mining. Proceedings of the National Conference on Artificial Intelligence, Vancouver, BC, Canada.","DOI":"10.3115\/1614108.1614140"},{"key":"ref_23","doi-asserted-by":"crossref","unstructured":"Yan, Y., Okazaki, N., Matsuo, Y., Yang, Z., and Ishizuka, M. (2009, January 2\u20137). Unsupervised Relation Extraction by Mining Wikipedia Texts Using Information from the Web. 
Proceedings of the Joint Conference of the 47th Annual Meeting of the ACL and the 4th International Joint Conference on Natural Language Processing of the AFNLP, Singapore.","DOI":"10.3115\/1690219.1690289"},{"key":"ref_24","doi-asserted-by":"crossref","unstructured":"Lange, D., B\u00f6hm, C., and Naumann, F. (2010, January 26\u201330). Extracting structured information from Wikipedia articles to populate infoboxes. Proceedings of the 19th ACM Conference on Information and Knowledge Management (CIKM), Toronto, ON, Canada.","DOI":"10.1145\/1871437.1871698"},{"key":"ref_25","doi-asserted-by":"crossref","unstructured":"Wu, F., Hoffmann, R., and Weld, D.S. (2008, January 24\u201327). Information extraction from Wikipedia: Moving down the long tail. Proceedings of the 14th ACM SIGKDD International Conference on Knowledge Discovery and Data Mining, Las Vegas, NV, USA.","DOI":"10.1145\/1401890.1401978"},{"key":"ref_26","unstructured":"Wang, G., Yu, Y., and Zhu, H. (2007, January 11\u201315). PORE: Positive-only Relation Extraction from Wikipedia Text. Proceedings of the 6th International The Semantic Web and 2nd Asian Conference on Asian Semantic Web Conference, ISWC\u201907\/ASWC\u201907, Busan, Korea."},{"key":"ref_27","doi-asserted-by":"crossref","unstructured":"Faruqui, M., and Kumar, S. (arXiv, 2015). Multilingual open relation extraction using cross-lingual projection, arXiv.","DOI":"10.3115\/v1\/N15-1151"},{"key":"ref_28","doi-asserted-by":"crossref","unstructured":"Nguyen, T.H., and Grishman, R. (2015, January 5). Relation extraction: Perspective from convolutional neural networks. Proceedings of the NAACL-HLT, Denver, CO, USA.","DOI":"10.3115\/v1\/W15-1506"},{"key":"ref_29","doi-asserted-by":"crossref","unstructured":"Verga, P., Belanger, D., Strubell, E., Roth, B., and McCallum, A. (arXiv, 2015). 
Multilingual relation extraction using compositional universal schema, arXiv.","DOI":"10.18653\/v1\/N16-1103"},{"key":"ref_30","doi-asserted-by":"crossref","unstructured":"Zeng, D., Liu, K., Chen, Y., and Zhao, J. (2015, January 17\u201321). Distant supervision for relation extraction via piecewise convolutional neural networks. Proceedings of the 2015 Conference on Empirical Methods in Natural Language Processing (EMNLP), Lisbon, Portugal.","DOI":"10.18653\/v1\/D15-1203"},{"key":"ref_31","unstructured":"Zeng, D., Liu, K., Lai, S., Zhou, G., and Zhao, J. (2014, January 23\u201329). Relation Classification via Convolutional Deep Neural Network. Proceedings of the COLING 2014, the 25th International Conference on Computational Linguistics: Technical Papers, Dublin, Ireland."},{"key":"ref_32","unstructured":"(2018, March 29). DBpedia(2014). Available online: http:\/\/oldwiki.dbpedia.org\/Downloads2014."},{"key":"ref_33","unstructured":"(2018, March 29). Web Ontology Language (OWL). Available online: https:\/\/www.w3.org\/OWL\/."},{"key":"ref_34","doi-asserted-by":"crossref","unstructured":"Gal\u00e1rraga, L.A., Teflioudi, C., Hose, K., and Suchanek, F. (2013, January 13\u201317). AMIE: Association rule mining under incomplete evidence in ontological knowledge bases. Proceedings of the 22nd International Conference on World Wide Web, Rio de Janeiro, Brazil.","DOI":"10.1145\/2488388.2488425"},{"key":"ref_35","doi-asserted-by":"crossref","unstructured":"Dong, X.L., Murphy, K., Gabrilovich, E., Heitz, G., Horn, W., Lao, N., Strohmann, T., Sun, S., and Zhang, W. (2014, January 24\u201327). Knowledge Vault: A Web-scale approach to probabilistic knowledge fusion. Proceedings of the 20th ACM SIGKDD International Conference on Knowledge Discovery and Data Mining, New York, NY, USA.","DOI":"10.1145\/2623330.2623623"},{"key":"ref_36","doi-asserted-by":"crossref","unstructured":"Cohen, W.W. (1995, January 9\u201312). Fast Effective Rule Induction. 
Proceedings of the Machine Learning, Twelfth International Conference on Machine Learning, Tahoe City, CA, USA.","DOI":"10.1016\/B978-1-55860-377-6.50023-2"},{"key":"ref_37","doi-asserted-by":"crossref","first-page":"5","DOI":"10.1023\/A:1010933404324","article-title":"Random Forests","volume":"45","author":"Breiman","year":"2001","journal-title":"Mach. Learn."},{"key":"ref_38","doi-asserted-by":"crossref","first-page":"409","DOI":"10.1017\/S0269888998214044","article-title":"Neural networks: A comprehensive foundation by Simon Haykin, Macmillan, 1994, ISBN 0-02-352781-7","volume":"13","author":"Kubat","year":"1999","journal-title":"Knowl. Eng. Rev."},{"key":"ref_39","unstructured":"Cristianini, N., and Shawe-Taylor, J. (2010). An Introduction to Support Vector Machines and Other Kernel-Based Learning Methods, Cambridge University Press."},{"key":"ref_40","unstructured":"(2018, March 29). Rapidminer. Available online: http:\/\/www.rapidminer.com\/."},{"key":"ref_41","unstructured":"(2018, March 29). Code and data. Available online: http:\/\/dws.informatik.uni-mannheim.de\/en\/research\/language-agnostic-relation-extraction-from-wikipedia-abstracts."},{"key":"ref_42","unstructured":"(2018, March 29). Wikistats. Available online: http:\/\/wikistats.wmflabs.org\/display.php?t=wp."},{"key":"ref_43","unstructured":"(2018, March 29). DBpedia ontology. Available online: http:\/\/dbpedia.org\/services-resources\/ontology."},{"key":"ref_44","unstructured":"Paulheim, H. (June, January 28). Data-driven joint debugging of the DBpedia mappings and ontology. Proceedings of the European Semantic Web Conference, Portoroz, Slovenia."},{"key":"ref_45","unstructured":"(2018, March 29). Skinner(2017). Available online: https:\/\/tobyskinner.net\/2017\/06\/11\/the-worlds-most-prolific-writer\/."},{"key":"ref_46","unstructured":"(2018, March 29). Oxford Internet Institute(2013). 
Available online: http:\/\/geography.oii.ox.ac.uk\/?page=geographic-intersections-of-languages-in-wikipedia."},{"key":"ref_47","doi-asserted-by":"crossref","first-page":"142","DOI":"10.1016\/j.websem.2015.06.004","article-title":"Mining the web of linked data with rapidminer","volume":"35","author":"Ristoski","year":"2015","journal-title":"Web Semant. Sci. Serv. Agents World Wide Web"},{"key":"ref_48","unstructured":"(2018, March 29). DBpedia Extraction Framework(2018). Available online: https:\/\/github.com\/dbpedia\/extraction-framework."},{"key":"ref_49","unstructured":"(2018, March 29). WikiApiary(2018). Available online: https:\/\/wikiapiary.com\/wiki\/Statistics."},{"key":"ref_50","unstructured":"(2018, March 29). Fandom(2018). Available online: http:\/\/fandom.wikia.com\/."},{"key":"ref_51","unstructured":"(2018, March 29). Alexa. Available online: http:\/\/www.alexa.com\/topsites\/category\/Computers\/Software\/Groupware\/Wiki\/WikiFarms."},{"key":"ref_52","unstructured":"(2018, March 29). DBkWik(2018a). Available online: http:\/\/dbkwik.webdatacommons.org."},{"key":"ref_53","unstructured":"(2018, March 29). DBkWik(2018b). Available online: http:\/\/data.dws.informatik.uni-mannheim.de\/dbkwik\/dbkwik-v1.0.tar.gz."},{"key":"ref_54","doi-asserted-by":"crossref","first-page":"334","DOI":"10.1007\/s007780100057","article-title":"A survey of approaches to automatic schema matching","volume":"10","author":"Rahm","year":"2001","journal-title":"VLDB J."},{"key":"ref_55","doi-asserted-by":"crossref","unstructured":"Shvaiko, P., and Euzenat, J. (2005). A survey of schema-based matching approaches. Journal on Data Semantics IV, Springer.","DOI":"10.1007\/11603412_5"},{"key":"ref_56","doi-asserted-by":"crossref","first-page":"10","DOI":"10.1145\/1656274.1656278","article-title":"The WEKA data mining software: An update","volume":"11","author":"Hall","year":"2009","journal-title":"ACM SIGKDD Explor. 
Newslett."},{"key":"ref_57","unstructured":"V\u00f6lker, J., and Niepert, M. (June, January 29). Statistical schema induction. Proceedings of the Extended Semantic Web Conference, Heraklion, Greece."},{"key":"ref_58","doi-asserted-by":"crossref","unstructured":"T\u00f6pper, G., Knuth, M., and Sack, H. (2012, January 5\u20137). DBpedia ontology enrichment for inconsistency detection. Proceedings of the 8th International Conference on Semantic Systems, Graz, Austria.","DOI":"10.1145\/2362499.2362505"},{"key":"ref_59","doi-asserted-by":"crossref","unstructured":"Paulheim, H., and Bizer, C. (2013, January 21\u201325). Type inference on noisy rdf data. Proceedings of the International Semantic Web Conference, Sydney, Australia.","DOI":"10.1007\/978-3-642-41335-3_32"},{"key":"ref_60","unstructured":"Str\u00f6tgen, J., and Gertz, M. (2010, January 15\u201316). HeidelTime: High quality rule-based extraction and normalization of temporal expressions. Proceedings of the 5th International Workshop on Semantic Evaluation, Los Angeles, CA, USA."},{"key":"ref_61","unstructured":"Paulheim, H. (, January 25\u201329). A robust number parser based on conditional random fields. Proceedings of the Joint German\/Austrian Conference on Artificial Intelligence (K\u00fcnstliche Intelligenz), Dortmund, Germany."},{"key":"ref_62","doi-asserted-by":"crossref","first-page":"63","DOI":"10.4018\/ijswis.2014040104","article-title":"Improving the Quality of Linked Data Using Statistical Distributions","volume":"10","author":"Paulheim","year":"2014","journal-title":"Int. J. Semant. Web Inf. Syst."},{"key":"ref_63","doi-asserted-by":"crossref","unstructured":"Paulheim, H., and Gangemi, A. (2015, January 11\u201315). Serving DBpedia with DOLCE\u2014More than Just Adding a Cherry on Top. Proceedings of the International Semantic Web Conference, Bethlehem, PA, USA. 
LNCS.","DOI":"10.1007\/978-3-319-25007-6_11"},{"key":"ref_64","doi-asserted-by":"crossref","first-page":"85","DOI":"10.1016\/j.websem.2015.08.001","article-title":"DeFacto\u2014 Temporal and multilingual Deep Fact Validation","volume":"35","author":"Gerber","year":"2015","journal-title":"Web Semant. Sci. Serv. Agents World Wide Web"},{"key":"ref_65","first-page":"357","article-title":"Detecting Errors in Numerical Linked Data Using Cross-Checked Outlier Detection","volume":"Volume 8796","author":"Fleischhacker","year":"2014","journal-title":"Proceedings of the Semantic Web\u2014ISWC"},{"key":"ref_66","first-page":"504","article-title":"Detecting Incorrect Numerical Data in DBpedia","volume":"Volume 8465","author":"Wienand","year":"2014","journal-title":"Proceedings of the Semantic Web: Trends and Challenges"},{"key":"ref_67","doi-asserted-by":"crossref","unstructured":"Bryl, V., and Bizer, C. (2014, January 7\u201311). Learning conflict resolution strategies for cross-language wikipedia data fusion. Proceedings of the 23rd International Conference on World Wide Web, Seoul, Korea.","DOI":"10.1145\/2567948.2578999"},{"key":"ref_68","doi-asserted-by":"crossref","unstructured":"Mendes, P.N., Jakob, M., Garc\u00eda-Silva, A., and Bizer, C. (2011, January 7\u20139). DBpedia spotlight: Shedding light on the web of documents. 
Proceedings of the 7th International Conference on Semantic Systems, Graz, Austria.","DOI":"10.1145\/2063518.2063519"}],"container-title":["Information"],"original-title":[],"language":"en","link":[{"URL":"https:\/\/www.mdpi.com\/2078-2489\/9\/4\/75\/pdf","content-type":"unspecified","content-version":"vor","intended-application":"similarity-checking"}],"deposited":{"date-parts":[[2025,10,11]],"date-time":"2025-10-11T14:58:58Z","timestamp":1760194738000},"score":1,"resource":{"primary":{"URL":"https:\/\/www.mdpi.com\/2078-2489\/9\/4\/75"}},"subtitle":[],"short-title":[],"issued":{"date-parts":[[2018,3,29]]},"references-count":68,"journal-issue":{"issue":"4","published-online":{"date-parts":[[2018,4]]}},"alternative-id":["info9040075"],"URL":"https:\/\/doi.org\/10.3390\/info9040075","relation":{},"ISSN":["2078-2489"],"issn-type":[{"value":"2078-2489","type":"electronic"}],"subject":[],"published":{"date-parts":[[2018,3,29]]}}}