{"status":"ok","message-type":"work","message-version":"1.0.0","message":{"indexed":{"date-parts":[[2025,3,5]],"date-time":"2025-03-05T05:34:45Z","timestamp":1741152885840,"version":"3.38.0"},"reference-count":44,"publisher":"SAGE Publications","issue":"3","license":[{"start":{"date-parts":[[2013,2,8]],"date-time":"2013-02-08T00:00:00Z","timestamp":1360281600000},"content-version":"tdm","delay-in-days":0,"URL":"https:\/\/journals.sagepub.com\/page\/policies\/text-and-data-mining-license"}],"content-domain":{"domain":["journals.sagepub.com"],"crossmark-restriction":true},"short-container-title":["Journal of Information Science"],"published-print":{"date-parts":[[2013,6]]},"abstract":"<jats:p> Topical annotation of documents with keyphrases is a proven method for revealing the subject of scientific and research documents to both human readers and information retrieval systems. This article describes a machine learning-based keyphrase annotation method for scientific documents that utilizes Wikipedia as a thesaurus for candidate selection from documents\u2019 content. We have devised a set of 20 statistical, positional and semantical features for candidate phrases to capture and reflect various properties of those candidates that have the highest keyphraseness probability. We first introduce a simple unsupervised method for ranking and filtering the most probable keyphrases, and then evolve it into a novel supervised method using genetic algorithms. We have evaluated the performance of both methods on a third-party dataset of research papers. Reported experimental results show that the performance of our proposed methods, measured in terms of consistency with human annotators, is on a par with that achieved by humans and outperforms rival supervised and unsupervised methods. <\/jats:p>","DOI":"10.1177\/0165551512472138","type":"journal-article","created":{"date-parts":[[2013,2,9]],"date-time":"2013-02-09T03:36:07Z","timestamp":1360380967000},"page":"410-426","update-policy":"https:\/\/doi.org\/10.1177\/sage-journals-update-policy","source":"Crossref","is-referenced-by-count":17,"title":["Automatic keyphrase annotation of scientific documents using Wikipedia and genetic algorithms"],"prefix":"10.1177","volume":"39","author":[{"given":"Arash","family":"Joorabchi","sequence":"first","affiliation":[{"name":"Department of Electronic and Computer Engineering, University of Limerick, Ireland"}]},{"given":"Abdulhussain E.","family":"Mahdi","sequence":"additional","affiliation":[{"name":"Department of Electronic and Computer Engineering, University of Limerick, Ireland"}]}],"member":"179","published-online":{"date-parts":[[2013,2,8]]},"reference":[{"key":"bibr1-0165551512472138","doi-asserted-by":"publisher","DOI":"10.1016\/j.molcel.2006.02.012"},{"key":"bibr2-0165551512472138","doi-asserted-by":"publisher","DOI":"10.1007\/s10791-008-9044-1"},{"key":"bibr3-0165551512472138","doi-asserted-by":"publisher","DOI":"10.1002\/asi.10068"},{"volume-title":"18th international conference on World wide web","author":"Grineva M","key":"bibr4-0165551512472138"},{"key":"bibr5-0165551512472138","doi-asserted-by":"publisher","DOI":"10.1177\/0165551510388080"},{"volume-title":"Fourth ACM conference on Digital libraries","author":"Witten IH","key":"bibr6-0165551512472138"},{"key":"bibr7-0165551512472138","doi-asserted-by":"publisher","DOI":"10.1023\/A:1009976227802"},{"first-page":"434","volume-title":"Proceedings of the 18th international joint conference on artificial intelligence","author":"Turney PD","key":"bibr8-0165551512472138"},{"first-page":"317","volume-title":"Proceedings of the 10th international conference on Asian digital libraries: Looking back 10 years and forging new frontiers","author":"Nguyen TD","key":"bibr9-0165551512472138"},{"key":"bibr10-0165551512472138","first-page":"82","author":"Mark\u00f3 KG","year":"2004","journal-title":"Computer-assisted information retrieval (recherche d'information et ses applications) \u2013 RIAO"},{"volume-title":"Ontologies and Information Extraction. Workshop at EUROLAN\u20192003","year":"2003","author":"Pouliquen B","key":"bibr11-0165551512472138"},{"first-page":"296","volume-title":"Proceedings of the 6th ACM\/IEEE-CS joint conference on digital libraries","author":"Medelyan O","key":"bibr12-0165551512472138"},{"key":"bibr13-0165551512472138","doi-asserted-by":"publisher","DOI":"10.1002\/asi.20790"},{"first-page":"442","volume-title":"Proceedings of the 2006 IEEE\/WIC\/ACM international conference on web intelligence","author":"Milne D","key":"bibr14-0165551512472138"},{"key":"bibr15-0165551512472138","doi-asserted-by":"publisher","DOI":"10.1016\/j.ijhcs.2009.05.004"},{"volume-title":"First AAAI Workshop on Wikipedia and Artificial Intelligence (WIKIAI'08)","author":"Medelyan O","key":"bibr16-0165551512472138"},{"key":"bibr17-0165551512472138","unstructured":"Medelyan O. Human-competitive automatic topic indexing. PhD thesis, University of Waikato, New Zealand, 2009, http:\/\/adt.waikato.ac.nz\/public\/adt-uow20091029.160923 (accessed 11 March 2012)."},{"volume-title":"New Zealand computer science research student conference","author":"Milne D","key":"bibr18-0165551512472138"},{"key":"bibr19-0165551512472138","doi-asserted-by":"publisher","DOI":"10.1016\/0306-4573(88)90021-0"},{"volume-title":"Proceedings of the Association for Computational Linguistics (ACL 2008)","author":"Csomai A","key":"bibr20-0165551512472138"},{"key":"bibr21-0165551512472138","unstructured":"Hulth A. Combining machine learning and natural language processing for automatic keyword extraction. PhD thesis, Stockholm University, 2004, http:\/\/people.dsv.su.se\/~hulth\/thesis_hulth.pdf (accessed 11 March 2012)."},{"first-page":"9","volume-title":"Proceedings of the workshop on multiword expressions: Identification, interpretation, disambiguation and applications","author":"Kim SN","key":"bibr22-0165551512472138"},{"key":"bibr23-0165551512472138","doi-asserted-by":"publisher","DOI":"10.1007\/3-540-45486-1_4"},{"key":"bibr24-0165551512472138","unstructured":"enwiki dump progress on 22\/07\/2011 (Wikimedia dump service, 2011), http:\/\/dumps.wikimedia.org\/enwiki\/20110722\/ (accessed 11 March 2012)"},{"key":"bibr25-0165551512472138","doi-asserted-by":"publisher","DOI":"10.1108\/eb046814"},{"key":"bibr26-0165551512472138","unstructured":"Porter MF. The English (Porter2) stemming algorithm (Snowball, 2002), http:\/\/snowball.tartarus.org\/algorithms\/english\/stemmer.html (accessed 11 March 2012)."},{"first-page":"509","volume-title":"Proceedings of the 17th ACM conference on Information and knowledge management","author":"Milne D","key":"bibr27-0165551512472138"},{"volume-title":"First AAAI workshop on Wikipedia and artificial intelligence (WIKIAI\u201908)","author":"Milne D","key":"bibr28-0165551512472138"},{"key":"bibr29-0165551512472138","first-page":"1419","volume-title":"Proceedings of the 21st national conference on Artificial intelligence","volume":"2","author":"Strube M"},{"key":"bibr30-0165551512472138","doi-asserted-by":"publisher","DOI":"10.1162\/LEON_a_00344"},{"key":"bibr31-0165551512472138","doi-asserted-by":"publisher","DOI":"10.1109\/21.24528"},{"key":"bibr32-0165551512472138","doi-asserted-by":"crossref","first-page":"265","DOI":"10.7551\/mitpress\/7287.003.0018","volume-title":"WordNet: An electronic lexical database","author":"Leacock C","year":"1998"},{"key":"bibr33-0165551512472138","unstructured":"O\u2019Madadhain J, Fisher D, Nelson T, White S, Boey Y-B. JUNG 2.0 (released under the open source GPL licence, 2009), http:\/\/jung.sourceforge.net\/index.html (accessed 11 March 2012)."},{"key":"bibr34-0165551512472138","doi-asserted-by":"crossref","unstructured":"Bastian M, Heymann S, Jacomy M. Gephi: An open source software for exploring and manipulating networks (2009).","DOI":"10.1609\/icwsm.v3i1.13937"},{"key":"bibr35-0165551512472138","doi-asserted-by":"publisher","DOI":"10.1108\/00220410410560591"},{"key":"bibr36-0165551512472138","doi-asserted-by":"publisher","DOI":"10.1017\/S1351324900000139"},{"key":"bibr37-0165551512472138","unstructured":"Luke S, Panait L, Balan G, Paus S, Skolicki Z, Popovici E, Sullivan K, Harrison J, Bassett J, Hubley R, Chircop A, Compton J, Haddon W, Donnelly S, Jamil B, Zelibor J, Kangas E, Abidi F, Mooers H, O\u2019Beirne AJ. ECJ \u2013 A Java-based evolutionary computation research system. George Mason University's Evolutionary Computation Laboratory, http:\/\/cs.gmu.edu\/~eclab\/projects\/ecj\/ (accessed 11 March 2012)."},{"key":"bibr38-0165551512472138","unstructured":"Medelyan O, Witten IH. Wiki-20 dataset, University of Waikato, New Zealand, 2009, http:\/\/maui-indexer.googlecode.com\/files\/wiki20.tar.gz (accessed 11 March 2012)."},{"key":"bibr39-0165551512472138","doi-asserted-by":"publisher","DOI":"10.1016\/0306-4573(81)90028-5"},{"volume-title":"Information retrieval","year":"1979","author":"Rijsbergen CJV","key":"bibr40-0165551512472138"},{"key":"bibr41-0165551512472138","doi-asserted-by":"publisher","DOI":"10.3115\/1699648.1699678"},{"key":"bibr42-0165551512472138","doi-asserted-by":"publisher","DOI":"10.1002\/asi.4630200313"},{"volume-title":"Proceedings of the third ACM conference on digital libraries","author":"Giles CL","key":"bibr43-0165551512472138"},{"first-page":"248","volume-title":"Proceedings of the 5th international workshop on semantic evaluation","author":"Lopez P","key":"bibr44-0165551512472138"}],"container-title":["Journal of Information Science"],"original-title":[],"language":"en","link":[{"URL":"https:\/\/journals.sagepub.com\/doi\/pdf\/10.1177\/0165551512472138","content-type":"application\/pdf","content-version":"vor","intended-application":"text-mining"},{"URL":"https:\/\/journals.sagepub.com\/doi\/full-xml\/10.1177\/0165551512472138","content-type":"application\/xml","content-version":"vor","intended-application":"text-mining"},{"URL":"https:\/\/journals.sagepub.com\/doi\/pdf\/10.1177\/0165551512472138","content-type":"unspecified","content-version":"vor","intended-application":"similarity-checking"}],"deposited":{"date-parts":[[2025,3,4]],"date-time":"2025-03-04T11:19:52Z","timestamp":1741087192000},"score":1,"resource":{"primary":{"URL":"https:\/\/journals.sagepub.com\/doi\/10.1177\/0165551512472138"}},"subtitle":[],"short-title":[],"issued":{"date-parts":[[2013,2,8]]},"references-count":44,"journal-issue":{"issue":"3","published-print":{"date-parts":[[2013,6]]}},"alternative-id":["10.1177\/0165551512472138"],"URL":"https:\/\/doi.org\/10.1177\/0165551512472138","relation":{},"ISSN":["0165-5515","1741-6485"],"issn-type":[{"type":"print","value":"0165-5515"},{"type":"electronic","value":"1741-6485"}],"subject":[],"published":{"date-parts":[[2013,2,8]]}}}