{"status":"ok","message-type":"work","message-version":"1.0.0","message":{"indexed":{"date-parts":[[2025,10,12]],"date-time":"2025-10-12T04:07:59Z","timestamp":1760242079194,"version":"build-2065373602"},"reference-count":53,"publisher":"MDPI AG","issue":"1","license":[{"start":{"date-parts":[[2018,12,25]],"date-time":"2018-12-25T00:00:00Z","timestamp":1545696000000},"content-version":"vor","delay-in-days":0,"URL":"https:\/\/creativecommons.org\/licenses\/by\/4.0\/"}],"funder":[{"DOI":"10.13039\/501100000038","name":"Natural Sciences and Engineering Research Council of Canada","doi-asserted-by":"publisher","award":["RGPIN-2017-04031, RGPIN-2017-04323"],"award-info":[{"award-number":["RGPIN-2017-04031, RGPIN-2017-04323"]}],"id":[{"id":"10.13039\/501100000038","id-type":"DOI","asserted-by":"publisher"}]},{"name":"UOttawa Startup Fund","award":["602599"],"award-info":[{"award-number":["602599"]}]}],"content-domain":{"domain":[],"crossmark-restriction":false},"short-container-title":["Information"],"abstract":"<jats:p>This article presents and evaluates a method for the detection of DBpedia types and entities that can be used for knowledge base completion and maintenance. This method compares entity embeddings with traditional N-gram models coupled with clustering and classification. We tackle two challenges: (a) the detection of entity types, which can be used to detect invalid DBpedia types and assign DBpedia types for type-less entities; and (b) the detection of invalid entities in the resource description of a DBpedia entity. 
Our results show that entity embeddings outperform n-gram models for type and entity detection and can contribute to the improvement of DBpedia\u2019s quality, maintenance, and evolution.<\/jats:p>","DOI":"10.3390\/info10010006","type":"journal-article","created":{"date-parts":[[2018,12,26]],"date-time":"2018-12-26T04:29:54Z","timestamp":1545798594000},"page":"6","update-policy":"https:\/\/doi.org\/10.3390\/mdpi_crossmark_policy","source":"Crossref","is-referenced-by-count":2,"title":["A Comparison of Word Embeddings and N-gram Models for DBpedia Type and Invalid Entity Detection"],"prefix":"10.3390","volume":"10","author":[{"given":"Hanqing","family":"Zhou","sequence":"first","affiliation":[{"name":"School of Electrical Engineering and Computer Science, University of Ottawa, Ottawa ON K1N 6N5, Canada"}],"role":[{"role":"author","vocabulary":"crossref"}]},{"given":"Amal","family":"Zouaq","sequence":"additional","affiliation":[{"name":"School of Electrical Engineering and Computer Science, University of Ottawa, Ottawa ON K1N 6N5, Canada"}],"role":[{"role":"author","vocabulary":"crossref"}]},{"given":"Diana","family":"Inkpen","sequence":"additional","affiliation":[{"name":"School of Electrical Engineering and Computer Science, University of Ottawa, Ottawa ON K1N 6N5, Canada"}],"role":[{"role":"author","vocabulary":"crossref"}]}],"member":"1968","published-online":{"date-parts":[[2018,12,25]]},"reference":[{"key":"ref_1","doi-asserted-by":"crossref","first-page":"34","DOI":"10.1038\/scientificamerican0501-34","article-title":"The semantic web","volume":"284","author":"Hendler","year":"2001","journal-title":"Sci. Am."},{"key":"ref_2","first-page":"1","article-title":"Linked Data\u2014The Story So Far","volume":"5","author":"Bizer","year":"2009","journal-title":"Int. J. Semant. Web Inf. 
Syst."},{"key":"ref_3","doi-asserted-by":"crossref","first-page":"154","DOI":"10.1016\/j.websem.2009.07.002","article-title":"DBpedia\u2014A crystallization point for the Web of Data","volume":"7","author":"Bizer","year":"2009","journal-title":"J. Web Semant."},{"key":"ref_4","doi-asserted-by":"crossref","first-page":"113","DOI":"10.1016\/j.ieri.2014.08.018","article-title":"Knowledge-based Data Mining Using Semantic Web","volume":"7","author":"Kabir","year":"2014","journal-title":"IERI Procedia"},{"key":"ref_5","doi-asserted-by":"crossref","first-page":"542","DOI":"10.1016\/j.autcon.2013.07.002","article-title":"Architecture of an open knowledge base for sustainable buildings based on Linked Data technologies","volume":"35","author":"Dirnbek","year":"2013","journal-title":"Autom. Construct."},{"key":"ref_6","unstructured":"Syed, Z., and Finin, T. (2010, January 22\u201324). Creating and Exploiting a Hybrid Knowledge Base for Linked Data. Proceedings of the Second International Conference, ICAART 2010, Valencia, Spain."},{"key":"ref_7","doi-asserted-by":"crossref","unstructured":"Hellmann, S., Stadler, C., Lehmann, J., and Auer, S. (2009, January 1\u20136). DBpedia live extraction. Proceedings of the OTM Confederated International Conferences \u201cOn the Move to Meaningful Internet Systems\u201d, Vilamoura, Portugal.","DOI":"10.1007\/978-3-642-05151-7_33"},{"key":"ref_8","doi-asserted-by":"crossref","unstructured":"Moran, S. (2012). Using Linked Data to Create a Typological Knowledge Base. Linked Data in Linguistics: Representing and Connecting Language Data and Language Metadata, Springer.","DOI":"10.1007\/978-3-642-28249-2_13"},{"key":"ref_9","doi-asserted-by":"crossref","unstructured":"Morsey, M., Lehmann, J., Auer, S., Stadler, C., and Hellmann, S. (2012). DBpedia and the live extraction of structured data from Wikipedia. 
Program.","DOI":"10.1108\/00330331211221828"},{"key":"ref_10","unstructured":"Auer, S., Bizer, C., Kobilarov, G., Lehmann, J., Cyganiak, R., and Ives, Z. (2007, January 11\u201315). DBpedia: A nucleus for a Web of open data. Proceedings of the 6th International Semantic Web Conference, 2nd Asian Semantic Web Conference, ISWC 2007 + ASWC 2007, Busan, Korea."},{"key":"ref_11","doi-asserted-by":"crossref","first-page":"167","DOI":"10.3233\/SW-140134","article-title":"DBpedia\u2014A large-scale, multilingual knowledge base extracted from Wikipedia","volume":"6","author":"Lehmann","year":"2015","journal-title":"Semant. Web"},{"key":"ref_12","unstructured":"Zhang, Z., Chen, S., and Feng, Z. (2013, January 25\u201328). Semantic annotation for web services based on DBpedia. Proceedings of the 2013 IEEE 7th International Symposium on Service-Oriented System Engineering (SOSE 2013), San Francisco Bay, CA, USA."},{"key":"ref_13","doi-asserted-by":"crossref","first-page":"14","DOI":"10.1109\/MIS.2003.1179189","article-title":"Automatic Ontology-Based Knowledge Extraction from Web Documents","volume":"18","author":"Alani","year":"2003","journal-title":"IEEE Intell. Syst."},{"key":"ref_14","doi-asserted-by":"crossref","unstructured":"Keong, B.V., and Anthony, P. (2011, January 27\u201329). Meta search engine powered by DBpedia. Proceedings of the 2011 International Conference on Semantic Technology and Information Retrieval (STAIR 2011), Kuala Lumpur, Malaysia.","DOI":"10.1109\/STAIR.2011.5995770"},{"key":"ref_15","doi-asserted-by":"crossref","unstructured":"Meimaris, M., Papastefanatos, G., Mamoulis, N., and Anagnostopoulos, I. (2017, January 19\u201322). Extended Characteristic Sets: Graph Indexing for SPARQL Query Optimization. 
Proceedings of the 2017 IEEE 33rd International Conference on Data Engineering (ICDE), San Diego, CA, USA.","DOI":"10.1109\/ICDE.2017.106"},{"key":"ref_16","unstructured":"Vrande\u010di\u0107, D., Bontcheva, K., Su\u00e1rez-Figueroa, M.C., Celino, V.P., Sabou, M., Kaffee, Lu., and Simperl, E. (2018, January 8\u201312). Towards Empty Answers in SPARQL: Approximating Querying with RDF Embedding. Proceedings of the 17th International Semantic Web Conference, Monterey, CA, USA."},{"key":"ref_17","unstructured":"Azmy, M., Shi, P., Lin, J., and Ilyas, I.F. (2018, January 20\u201326). Farewell Freebase: Migrating the SimpleQuestions Dataset to DBpedia. Proceedings of the 27th International Conference on Computational Linguistics, Santa Fe, NM, USA."},{"key":"ref_18","doi-asserted-by":"crossref","unstructured":"Fleischhacker, D., Paulheim, H., Bryl, V., V\u00f6lker, J., and Bizer, C. (2014, January 19\u201323). Detecting errors in numerical linked data using cross-checked outlier detection. Proceedings of the 13th International Semantic Web Conference, Riva del Garda, Italy.","DOI":"10.1007\/978-3-319-11964-9_23"},{"key":"ref_19","doi-asserted-by":"crossref","unstructured":"Font, L., Zouaq, A., and Gagnon, M. (2015, January 23\u201327). Assessing the Quality of Domain Concepts Descriptions in DBpedia. Proceedings of the 11th International Conference on Signal-Image Technology and Internet-Based Systems (SITIS 2015), Bangkok, Thailand.","DOI":"10.1109\/SITIS.2015.104"},{"key":"ref_20","doi-asserted-by":"crossref","unstructured":"Sheng, Z., Wang, X., Shi, H., and Feng, Z. (2012, January 26\u201328). Checking and handling inconsistency of DBpedia. Proceedings of the International Conference on Web Information Systems and Mining (WISM 2012), Chengdu, China.","DOI":"10.1007\/978-3-642-33469-6_60"},{"key":"ref_21","doi-asserted-by":"crossref","unstructured":"T\u00f6pper, G., Knuth, M., and Sack, H. (2012, January 5\u20137). 
DBpedia ontology enrichment for inconsistency detection. Proceedings of the 8th International Conference on Semantic Systems\u2014I-SEMANTICS\u201912, Graz, Austria.","DOI":"10.1145\/2362499.2362505"},{"key":"ref_22","doi-asserted-by":"crossref","unstructured":"Wienand, D., and Paulheim, H. (2014, January 25\u201329). Detecting incorrect numerical data in DBpedia. Proceedings of the 11th International Conference, The Semantic Web: Trends and Challenges (ESWC 2014), Anissaras, Greece.","DOI":"10.1007\/978-3-319-07443-6_34"},{"key":"ref_23","doi-asserted-by":"crossref","first-page":"59","DOI":"10.1016\/j.websem.2014.11.001","article-title":"Linked hypernyms: Enriching DBpedia with Targeted Hypernym Discovery","volume":"31","author":"Kliegr","year":"2015","journal-title":"J. Web Semant."},{"key":"ref_24","unstructured":"Kliegr, T., and Zamazal, O. (2014, January 26\u201331). Towards Linked Hypernyms Dataset 2.0: Complementing DBpedia with hypernym discovery. Proceedings of the Ninth International Conference on Language Resources and Evaluation (LREC-2014), Reykjavik, Iceland."},{"key":"ref_25","unstructured":"Chen, T., Tang, L.A., Sun, Y., Chen, Z., and Zhang, K. (2016, January 9\u201315). Entity embedding-based anomaly detection for heterogeneous categorical events. Proceedings of the IJCAI International Joint Conference on Artificial Intelligence, New York, NY, USA."},{"key":"ref_26","unstructured":"Hu, Z., Huang, P., Deng, Y., Gao, Y., and Xing, E. (2015, January 26\u201331). Entity Hierarchy Embedding. Proceedings of the Association for Computational Linguistics 2015 (ACL 2015), Beijing, China."},{"key":"ref_27","first-page":"3111","article-title":"Distributed Representations of Words and Phrases and Their Compositionality","volume":"2","author":"Mikolov","year":"2013","journal-title":"Proc. Adv. Neural Inf. Process. Syst."},{"key":"ref_28","unstructured":"Mikolov, T., Corrado, G., Chen, K., and Dean, J. (2013, January 2\u20134). 
Efficient Estimation of Word Representations in Vector Space. Proceedings of the International Conference on Learning Representations (ICLR 2013), Scottsdale, AZ, USA."},{"key":"ref_29","unstructured":"Mikolov, T., Yih, W.-T., and Zweig, G. (2013, January 9\u201314). Linguistic Regularities in Continuous Space Word Representations. Proceedings of the 2013 Conference of the North American Chapter of the Association for Computational Linguistics: Human Language Technologies (NAACL-HLT-2013), Atlanta, GA, USA."},{"key":"ref_30","first-page":"93","article-title":"Named entity recognition using word embedding as a feature","volume":"10","author":"Seok","year":"2016","journal-title":"Int. J. Softw. Eng. Its Appl."},{"key":"ref_31","doi-asserted-by":"crossref","unstructured":"Ganguly, D., Roy, D., Mitra, M., and Jones, G.J.F. (2015, January 9\u201313). Word Embedding based Generalized Language Model for Information Retrieval. Proceedings of the 38th International ACM SIGIR Conference on Research and Development in Information Retrieval\u2014SIGIR\u201915, Santiago, Chile.","DOI":"10.1145\/2766462.2767780"},{"key":"ref_32","doi-asserted-by":"crossref","unstructured":"Zhou, G., He, T., Zhao, J., and Hu, P. (2015, January 26\u201331). Learning Continuous Word Embedding with Metadata for Question Retrieval in Community Question Answering. Proceedings of the 53rd Annual Meeting of the Association for Computational Linguistics and the 7th International Joint Conference on Natural Language Processing, Beijing, China.","DOI":"10.3115\/v1\/P15-1025"},{"key":"ref_33","doi-asserted-by":"crossref","unstructured":"Zhou, H., Zouaq, A., and Inkpen, D. (2017, January 8\u201310). DBpedia entity type detection using entity embeddings and N-gram models. Proceedings of the 8th International Conference on Knowledge Engineering and the Semantic Web (KESW 2017), Szczecin, Poland.","DOI":"10.1007\/978-3-319-69548-8_21"},{"key":"ref_34","doi-asserted-by":"crossref","unstructured":"Paulheim, H. 
(2017). Knowledge graph refinement: A survey of approaches and evaluation methods. Semantic Web.","DOI":"10.3233\/SW-160218"},{"key":"ref_35","unstructured":"F\u00e4rber, M., Ell, B., Menne, C., and Rettinger, A. (2015). A Comparative Survey of DBpedia, Freebase, OpenCyc, Wikidata, and YAGO. Semantic Web."},{"key":"ref_36","doi-asserted-by":"crossref","unstructured":"Zaveri, A., Kontokostas, D., Sherif, M.A., B\u00fchmann, L., Morsey, M., Auer, S., and Lehmann, J. (2013, January 4\u20136). User-driven quality evaluation of DBpedia. Proceedings of the 9th International Conference on Semantic Systems\u2013I-SEMANTICS\u201913, Graz, Austria.","DOI":"10.1145\/2506182.2506195"},{"key":"ref_37","first-page":"1","article-title":"Assessing and Improving Domain Knowledge Representation in DBpedia","volume":"4","author":"Font","year":"2017","journal-title":"Open J. Semant. Web"},{"key":"ref_38","doi-asserted-by":"crossref","unstructured":"Lehmann, J., and B\u00fchmann, L. (2010, January 7\u201311). ORE\u2014A tool for repairing and enriching knowledge bases. Proceedings of the 9th International Semantic Web Conference (ISWC 2010), Shanghai, China.","DOI":"10.1007\/978-3-642-17749-1_12"},{"key":"ref_39","doi-asserted-by":"crossref","unstructured":"Gangemi, A., Nuzzolese, A., Presutti, V., Draicchio, F., Musetti, A., and Ciancarini, P. (2012, January 11\u201315). Automatic Typing of DBpedia Entities. Proceedings of the Semantic Web\u2014ISWC 11th International Semantic Web Conference, Boston, MA, USA.","DOI":"10.1007\/978-3-642-35176-1_5"},{"key":"ref_40","doi-asserted-by":"crossref","unstructured":"Haidar-Ahmad, L., Font, L., Zouaq, A., and Gagnon, M. (2016). Entity typing and linking using SPARQL patterns and DBpedia. 
Proceedings of the Third SemWebEval Challenge at ESWC 2016, Heraklion, Greece.","DOI":"10.1007\/978-3-319-46565-4_5"},{"key":"ref_41","doi-asserted-by":"crossref","unstructured":"Paulheim, H., and Bizer, C. (2013, January 21\u201325). 
Type inference on noisy RDF data. Proceedings of the 12th International Semantic Web Conference, Sydney, Australia.","DOI":"10.1007\/978-3-642-41335-3_32"},{"key":"ref_42","doi-asserted-by":"crossref","unstructured":"Paulheim, H., and Bizer, C. (2014). Improving the Quality of Linked Data Using Statistical Distributions. Int. J. Semant. Web Inf. Syst.","DOI":"10.4018\/ijswis.2014040104"},{"key":"ref_43","doi-asserted-by":"crossref","unstructured":"R\u00f6der, M., Usbeck, R., Speck, R., and Ngonga Ngomo, A.C. (2015). CETUS\u2014A baseline approach to type extraction. Proceedings of the Second SemWebEval Challenge at ESWC 2015, Portoro\u017e, Slovenia.","DOI":"10.1007\/978-3-319-25518-7_2"},{"key":"ref_44","doi-asserted-by":"crossref","unstructured":"Van Erp, M., and Vossen, P. (2017, January 17\u201321). Entity typing using distributional semantics and DBpedia. Proceedings of the ISWC 2016 International Workshops: KEKI and NLP&DBpedia, Kobe, Japan.","DOI":"10.1007\/978-3-319-68723-0_9"},{"key":"ref_45","doi-asserted-by":"crossref","unstructured":"Debattista, J., Lange, C., and Auer, S. (2016, January 2\u20134). A preliminary investigation towards improving linked data quality using distance-based outlier detection. Proceedings of the 6th Joint International Semantic Technology Conference (JIST 2016), Singapore.","DOI":"10.1007\/978-3-319-50112-3_9"},{"key":"ref_46","doi-asserted-by":"crossref","first-page":"433","DOI":"10.1002\/wics.101","article-title":"Principal component analysis","volume":"2","author":"Abdi","year":"2010","journal-title":"Wiley Interdiscip. Rev. Comput. Stat."},{"key":"ref_47","doi-asserted-by":"crossref","first-page":"2225","DOI":"10.1016\/j.patrec.2010.03.014","article-title":"Variable selection using random forests","volume":"31","author":"Genuer","year":"2010","journal-title":"Pattern Recognit. 
Lett."},{"key":"ref_48","doi-asserted-by":"crossref","first-page":"19","DOI":"10.32614\/RJ-2015-018","article-title":"VSURF: An R Package for Variable Selection Using Random Forests","volume":"7","author":"Genuer","year":"2015","journal-title":"R J."},{"key":"ref_49","doi-asserted-by":"crossref","first-page":"1447","DOI":"10.2174\/092986609789839250","article-title":"Robust prediction of B-factor profile from sequence using two-stage SVR based on random forest feature selection","volume":"16","author":"Pan","year":"2009","journal-title":"Protein Pept. Lett."},{"key":"ref_50","doi-asserted-by":"crossref","unstructured":"Rogers, J., and Gunn, S. (2006, January 23\u201325). Identifying feature relevance using a random forest. Proceedings of the Statistical and Optimization Perspectives Workshop \u201cSubspace, Latent Structure and Feature Selection\u201d (SLSFS 2005), Bohinj, Slovenia.","DOI":"10.1007\/11752790_12"},{"key":"ref_51","first-page":"2825","article-title":"Scikit-Learn: Machine Learning in Python","volume":"12","author":"Pedregosa","year":"2011","journal-title":"J. Mach. Learn. Res."},{"key":"ref_52","doi-asserted-by":"crossref","unstructured":"Grover, A., and Leskovec, J. (2016, January 13\u201317). node2vec. Proceedings of the 22nd ACM SIGKDD International Conference on Knowledge Discovery and Data Mining\u2014KDD\u201916, San Francisco, CA, USA.","DOI":"10.1145\/2939672.2939754"},{"key":"ref_53","doi-asserted-by":"crossref","unstructured":"Pennington, J., Socher, R., and Manning, C. (2014, January 25\u201329). Glove: Global Vectors for Word Representation. 
Proceedings of the 2014 Conference on Empirical Methods in Natural Language Processing (EMNLP), Doha, Qatar.","DOI":"10.3115\/v1\/D14-1162"}],"container-title":["Information"],"original-title":[],"language":"en","link":[{"URL":"https:\/\/www.mdpi.com\/2078-2489\/10\/1\/6\/pdf","content-type":"unspecified","content-version":"vor","intended-application":"similarity-checking"}],"deposited":{"date-parts":[[2025,10,11]],"date-time":"2025-10-11T15:36:02Z","timestamp":1760196962000},"score":1,"resource":{"primary":{"URL":"https:\/\/www.mdpi.com\/2078-2489\/10\/1\/6"}},"subtitle":[],"short-title":[],"issued":{"date-parts":[[2018,12,25]]},"references-count":53,"journal-issue":{"issue":"1","published-online":{"date-parts":[[2019,1]]}},"alternative-id":["info10010006"],"URL":"https:\/\/doi.org\/10.3390\/info10010006","relation":{},"ISSN":["2078-2489"],"issn-type":[{"type":"electronic","value":"2078-2489"}],"subject":[],"published":{"date-parts":[[2018,12,25]]}}}