{"status":"ok","message-type":"work","message-version":"1.0.0","message":{"indexed":{"date-parts":[[2022,4,2]],"date-time":"2022-04-02T22:10:42Z","timestamp":1648937442053},"reference-count":47,"publisher":"Cambridge University Press (CUP)","issue":"4","license":[{"start":{"date-parts":[[2013,10,14]],"date-time":"2013-10-14T00:00:00Z","timestamp":1381708800000},"content-version":"unspecified","delay-in-days":0,"URL":"https:\/\/www.cambridge.org\/core\/terms"}],"content-domain":{"domain":[],"crossmark-restriction":false},"short-container-title":["Nat. Lang. Eng."],"published-print":{"date-parts":[[2014,10]]},"abstract":"<jats:title>Abstract<\/jats:title><jats:p>This paper proposes a new method for semantic document analysis: densification, which identifies and ranks Wikipedia pages relevant to a given document. Although there are similarities with established tasks such as wikification and entity linking, the method does not aim for strict disambiguation of named entity mentions. Instead, densification uses existing links to rank additional articles that are relevant to the document, a form of explicit semantic indexing that enables higher-level semantic retrieval procedures that can be beneficial for a wide range of NLP applications. Because a gold standard for densification evaluation does not exist, a study is carried out to investigate the level of agreement achievable by humans, which questions the feasibility of creating an annotated data set. As a result, a semi-supervised approach is employed to develop a two-stage densification system: filtering unlikely candidate links and then ranking the remaining links. In a first evaluation experiment, Wikipedia articles are used to automatically estimate the performance in terms of recall. Results show that the proposed densification approach outperforms several wikification systems. A second experiment measures the impact of integrating the links predicted by the densification system into a semantic question answering (QA) system that relies on Wikipedia links to answer complex questions. Densification enables the QA system to find twice as many additional answers than when using a state-of-the-art wikification system.<\/jats:p>","DOI":"10.1017\/s1351324913000296","type":"journal-article","created":{"date-parts":[[2013,10,14]],"date-time":"2013-10-14T13:55:47Z","timestamp":1381758947000},"page":"469-500","source":"Crossref","is-referenced-by-count":2,"title":["Densification: Semantic document analysis using Wikipedia"],"prefix":"10.1017","volume":"20","author":[{"given":"IUSTIN","family":"DORNESCU","sequence":"first","affiliation":[]},{"given":"CONSTANTIN","family":"OR\u0102SAN","sequence":"additional","affiliation":[]}],"member":"56","published-online":{"date-parts":[[2013,10,14]]},"reference":[{"key":"S1351324913000296_ref009","volume-title":"9th Extended Semantic Web Conference (ESWC2012)","author":"Damljanovic","year":"2012"},{"key":"S1351324913000296_ref028","doi-asserted-by":"publisher","DOI":"10.1002\/asi.22829"},{"key":"S1351324913000296_ref034","first-page":"1","volume-title":"Proceedings of the 7th International Conference on Semantic Systems (I-Semantics)","author":"Mendes","year":"2011"},{"key":"S1351324913000296_ref014","doi-asserted-by":"publisher","DOI":"10.1037\/h0031619"},{"key":"S1351324913000296_ref043","volume-title":"Proceedings of the Sixteenth Text REtrieval Conference (TREC)","author":"Schlaefer","year":"2007"},{"key":"S1351324913000296_ref026","volume-title":"Content Analysis: An Introduction to Its Methodology","author":"Krippendorff","year":"2004"},{"key":"S1351324913000296_ref002","first-page":"993","article-title":"Latent Dirichlet allocation","volume":"3","author":"Blei","year":"2003","journal-title":"Journal of Machine Learning Research"},{"key":"S1351324913000296_ref030","first-page":"1","volume-title":"Proceedings of the 19th International Conference on Computational Linguistics-Volume 1","author":"Li","year":"2002"},{"key":"S1351324913000296_ref021","doi-asserted-by":"publisher","DOI":"10.1017\/S1351324901002807"},{"key":"S1351324913000296_ref006","doi-asserted-by":"publisher","DOI":"10.1109\/TKDE.2007.48"},{"key":"S1351324913000296_ref018","doi-asserted-by":"publisher","DOI":"10.1017\/S1351324903003176"},{"key":"S1351324913000296_ref003","doi-asserted-by":"publisher","DOI":"10.1007\/978-3-642-17746-0_6"},{"key":"S1351324913000296_ref036","first-page":"25","volume-title":"Proceedings of the first AAAI Workshop on Wikipedia and Artificial Intelligence (WIKIAI 2008)","author":"Milne","year":"2008"},{"key":"S1351324913000296_ref041","first-page":"212","volume-title":"CLEF 1","author":"Santos","year":"2009"},{"key":"S1351324913000296_ref046","doi-asserted-by":"publisher","DOI":"10.2307\/1412159"},{"key":"S1351324913000296_ref024","first-page":"344","volume-title":"CLEF","author":"Jijkoun","year":"2007"},{"key":"S1351324913000296_ref037","doi-asserted-by":"crossref","first-page":"509","DOI":"10.1145\/1458082.1458150","volume-title":"Proceedings of the 17th ACM Conference on Information and Knowledge Management (CIKM 2008)","author":"Milne","year":"2008"},{"key":"S1351324913000296_ref027","first-page":"331","volume-title":"Proceedings of the Twelfth International Conference on Machine Learning","author":"Lang","year":"1995"},{"key":"S1351324913000296_ref032","volume-title":"Proceedings of the 2009 Text Analysis Conference","author":"McNamee","year":"2009"},{"key":"S1351324913000296_ref033","volume-title":"Proceedings of the Fifth Text Analysis Conference (TAC 2012)","author":"McNamee","year":"2012"},{"key":"S1351324913000296_ref001","first-page":"19","volume-title":"Proceedings of the 2nd Workshop on The People's Web Meets NLP: Collaboratively Constructed Semantic Resources","author":"Bentivogli","year":"2010"},{"key":"S1351324913000296_ref012","unstructured":"Dornescu I. , 2012. Encyclopaedic Question Answering. PhD thesis. Wolverhampton: University of Wolverhampton, UK."},{"key":"S1351324913000296_ref004","first-page":"9","volume-title":"Proceedings of the 11th Conference of the European Chapter of the Association for Computational Linguistics (EACL-06)","author":"Bunescu","year":"2006"},{"key":"S1351324913000296_ref005","doi-asserted-by":"publisher","DOI":"10.3115\/1073445.1073449"},{"key":"S1351324913000296_ref007","doi-asserted-by":"publisher","DOI":"10.1177\/001316446002000104"},{"key":"S1351324913000296_ref008","first-page":"708","volume-title":"Proceedings of the 2007 Joint Conference on Empirical Methods in Natural Language Processing and Computational Natural Language Learning (EMNLP-CoNLL)","author":"Cucerzan","year":"2007"},{"key":"S1351324913000296_ref011","doi-asserted-by":"publisher","DOI":"10.1007\/978-3-642-15754-7_39"},{"key":"S1351324913000296_ref013","first-page":"277","volume-title":"Proceedings of the 23rd International Conference on Computational Linguistics","author":"Dredze","year":"2010"},{"key":"S1351324913000296_ref015","first-page":"1606","volume-title":"Proceedings of the Twentieth International Joint Conference for Artificial Intelligence","author":"Gabrilovich","year":"2007"},{"key":"S1351324913000296_ref016","first-page":"804","volume-title":"Proceedings of the 2011 Conference on Empirical Methods in Natural Language Processing (EMNLP 2011)","author":"Gottipati","year":"2011"},{"key":"S1351324913000296_ref042","doi-asserted-by":"publisher","DOI":"10.1007\/978-3-642-04447-2_118"},{"key":"S1351324913000296_ref017","first-page":"945","volume-title":"Proceedings of the 49th Annual Meeting of the Association for Computational Linguistics: Human Language Technologies \u2013 Volume 1","author":"Han","year":"2011"},{"key":"S1351324913000296_ref019","first-page":"560","volume-title":"The Oxford Handbook of Computational Linguistics","author":"Harabagiu","year":"2003"},{"key":"S1351324913000296_ref020","volume-title":"Lucene in Action","author":"Hatcher","year":"2004"},{"key":"S1351324913000296_ref022","doi-asserted-by":"crossref","first-page":"50","DOI":"10.1145\/312624.312649","volume-title":"Proceedings of the 22nd Annual International ACM SIGIR Conference on Research and Development in Information Retrieval","author":"Hofmann","year":"1999"},{"key":"S1351324913000296_ref025","first-page":"1036","volume-title":"Proceedings of the 22nd Annual Conference of the Cognitive Science Society","author":"Kanerva","year":"2000"},{"key":"S1351324913000296_ref029","volume-title":"Proceedings of the 2009 Text Analysis Conference","author":"Li","year":"2009"},{"key":"S1351324913000296_ref031","volume-title":"Proceedings of the 2009 Text Analysis Conference","author":"McNamee","year":"2009"},{"key":"S1351324913000296_ref035","first-page":"233","volume-title":"Proceedings of the 16th ACM Conference on Information and Knowledge Management (CIKM 2007)","author":"Mihalcea","year":"2007"},{"key":"S1351324913000296_ref038","doi-asserted-by":"publisher","DOI":"10.1016\/j.jal.2005.12.005"},{"key":"S1351324913000296_ref040","volume-title":"Methods and Applications of Semantic Indexing Workshop at the 7th International Conference on Terminology and Knowledge Engineering, TKE","author":"Sahlgren","year":"2005"},{"key":"S1351324913000296_ref044","doi-asserted-by":"publisher","DOI":"10.1086\/266577"},{"key":"S1351324913000296_ref010","doi-asserted-by":"publisher","DOI":"10.1002\/(SICI)1097-4571(199009)41:6<391::AID-ASI1>3.0.CO;2-9"},{"key":"S1351324913000296_ref023","first-page":"1","volume-title":"HLT '01: Proceedings of the First International Conference on Human Language Technology Research","author":"Hovy","year":"2001"},{"key":"S1351324913000296_ref045","doi-asserted-by":"crossref","first-page":"257","DOI":"10.1093\/ptj\/85.3.257","article-title":"The kappa statistic in reliability studies: use, interpretation, and sample size requirements","volume":"85","author":"Sim","year":"2005","journal-title":"Physical Therapy"},{"key":"S1351324913000296_ref039","first-page":"1","volume-title":"Proceedings of the ISWC\u201911 Workshop on Web Scale Knowledge Extraction (WEKEX\u201911)","author":"Rizzo","year":"2011"},{"key":"S1351324913000296_ref047","doi-asserted-by":"publisher","DOI":"10.1017\/S1351324901002789"}],"container-title":["Natural Language Engineering"],"original-title":[],"language":"en","link":[{"URL":"https:\/\/www.cambridge.org\/core\/services\/aop-cambridge-core\/content\/view\/S1351324913000296","content-type":"unspecified","content-version":"vor","intended-application":"similarity-checking"}],"deposited":{"date-parts":[[2022,3,9]],"date-time":"2022-03-09T02:38:29Z","timestamp":1646793509000},"score":1,"resource":{"primary":{"URL":"https:\/\/www.cambridge.org\/core\/product\/identifier\/S1351324913000296\/type\/journal_article"}},"subtitle":[],"short-title":[],"issued":{"date-parts":[[2013,10,14]]},"references-count":47,"journal-issue":{"issue":"4","published-print":{"date-parts":[[2014,10]]}},"alternative-id":["S1351324913000296"],"URL":"https:\/\/doi.org\/10.1017\/s1351324913000296","relation":{},"ISSN":["1351-3249","1469-8110"],"issn-type":[{"value":"1351-3249","type":"print"},{"value":"1469-8110","type":"electronic"}],"subject":[],"published":{"date-parts":[[2013,10,14]]}}}