{"status":"ok","message-type":"work","message-version":"1.0.0","message":{"indexed":{"date-parts":[[2025,11,27]],"date-time":"2025-11-27T13:47:18Z","timestamp":1764251238730},"reference-count":31,"publisher":"Oxford University Press (OUP)","issue":"21","license":[{"start":{"date-parts":[[2016,10,2]],"date-time":"2016-10-02T00:00:00Z","timestamp":1475366400000},"content-version":"vor","delay-in-days":2223,"URL":"http:\/\/creativecommons.org\/licenses\/by-nc\/2.0\/uk\/"}],"content-domain":{"domain":[],"crossmark-restriction":false},"short-container-title":[],"published-print":{"date-parts":[[2010,11,1]]},"abstract":"<jats:title>Abstract<\/jats:title>\n               <jats:p>Motivation: Recognizing words that are key to a document is important for ranking relevant scientific documents. Traditionally, important words in a document are either nominated subjectively by authors and indexers or selected objectively by some statistical measures. As an alternative, we propose to use documents' words popularity in user queries to identify click-words, a set of prominent words from the users' perspective. Although they often overlap, click-words differ significantly from other document keywords.<\/jats:p>\n               <jats:p>Results: We developed a machine learning approach to learn the unique characteristics of click-words. Each word was represented by a set of features that included different types of information, such as semantic type, part of speech tag, term frequency\u2013inverse document frequency (TF\u2013IDF) weight and location in the abstract. We identified the most important features and evaluated our model using 6 months of PubMed click-through logs. Our results suggest that, in addition to carrying high TF\u2013IDF weight, click-words tend to be biomedical entities, to exist in article titles, and to occur repeatedly in article abstracts. 
Given the abstract and title of a document, we are able to accurately predict the words likely to appear in user queries that lead to document clicks.<\/jats:p>\n               <jats:p>Contact: \u00a0luzh@ncbi.nlm.nih.gov<\/jats:p>\n               <jats:p>Supplementary information: \u00a0Supplementary data are available at Bioinformatics online.<\/jats:p>","DOI":"10.1093\/bioinformatics\/btq459","type":"journal-article","created":{"date-parts":[[2010,9,2]],"date-time":"2010-09-02T02:00:56Z","timestamp":1283392856000},"page":"2767-2775","source":"Crossref","is-referenced-by-count":11,"title":["Click-words: learning to predict document keywords from a user perspective"],"prefix":"10.1093","volume":"26","author":[{"given":"Rezarta","family":"Islamaj Do\u011fan","sequence":"first","affiliation":[{"name":"National Center for Biotechnology Information, National Library of Medicine, National Institutes of Health, Bethesda, MD 20894, USA"}],"role":[{"role":"author","vocabulary":"crossref"}]},{"given":"Zhiyong","family":"Lu","sequence":"additional","affiliation":[{"name":"National Center for Biotechnology Information, National Library of Medicine, National Institutes of Health, Bethesda, MD 20894, USA"}],"role":[{"role":"author","vocabulary":"crossref"}]}],"member":"286","published-online":{"date-parts":[[2010,9,1]]},"reference":[{"key":"2023012507542894400_B1","doi-asserted-by":"crossref","first-page":"600","DOI":"10.1093\/bioinformatics\/14.7.600","article-title":"Automatic extraction of keywords from scientific text: application to the knowledge domain of protein families","volume":"14","author":"Andrade","year":"1998","journal-title":"Bioinformatics"},{"key":"2023012507542894400_B2","first-page":"17","article-title":"Effective mapping of biomedical text to the UMLS Metathesaurus: the MetaMap program","author":"Aronson","year":"2001","journal-title":"Proc. 
AMIA Symp."},{"key":"2023012507542894400_B3","doi-asserted-by":"crossref","first-page":"227","DOI":"10.1145\/1367497.1367529","article-title":"Online learning from click data for sponsored search","volume-title":"WWW '08: Proceeding of the 17th International Conference on World Wide Web","author":"Ciaramita","year":"2008"},{"key":"2023012507542894400_B4","doi-asserted-by":"crossref","first-page":"331","DOI":"10.1145\/1390334.1390392","article-title":"A user browsing model to predict search engine click data from past observations","volume-title":"SIGIR'08: Proceedings of the 31st Annual International ACM SIGIR Conference on Research and Development in Information Retrieval","author":"Dupret","year":"2008"},{"key":"2023012507542894400_B5","doi-asserted-by":"crossref","first-page":"292","DOI":"10.1111\/j.1553-2712.1999.tb00392.x","article-title":"The effect of abbreviations on MEDLINE searching","volume":"6","author":"Federiuk","year":"1999","journal-title":"Acad. Emerg. Med."},{"key":"2023012507542894400_B6","first-page":"61","article-title":"Using the wisdom of the crowds for keyword generation","volume-title":"International Conference on World Wide Web (WWW)","author":"Fuxman","year":"2008"},{"key":"2023012507542894400_B7","first-page":"17","article-title":"Improving rankings in small-scale Web search using click-implied descriptions","author":"Hawking","year":"2006","journal-title":"Aust. J. Intell. Inf. Process. 
Syst."},{"key":"2023012507542894400_B8","volume-title":"Information Retrieval: A Health and Biomedical Perspective.","author":"Hersh","year":"2003"},{"key":"2023012507542894400_B9","doi-asserted-by":"crossref","first-page":"216","DOI":"10.3115\/1119355.1119383","article-title":"Improved automatic keyword extraction given more linguistic knowledge","volume-title":"Proceedings of the 2003 Conference on Empirical Methods in Natural Language Processing","author":"Hulth","year":"2003"},{"key":"2023012507542894400_B10","doi-asserted-by":"crossref","DOI":"10.1093\/database\/bap018","article-title":"Understanding PubMed(R) user search behavior through log analysis","author":"Islamaj Do\u011fan","year":"2009","journal-title":"Database"},{"key":"2023012507542894400_B11","doi-asserted-by":"crossref","first-page":"35","DOI":"10.1145\/1571941.1571950","article-title":"Global ranking by exploiting user clicks","volume-title":"SIGIR'09: Proceedings of the 32nd International ACM SIGIR Conference on Research and Development in Information Retrieval","author":"Ji","year":"2009"},{"key":"2023012507542894400_B12","doi-asserted-by":"crossref","first-page":"756","DOI":"10.1145\/1571941.1572113","article-title":"A ranking approach to keyphrase extraction","volume-title":"SIGIR'09: Proceedings of the 32nd International ACM SIGIR Conference on Research and Development in Information Retrieval","author":"Jiang","year":"2009"},{"key":"2023012507542894400_B13","doi-asserted-by":"crossref","first-page":"549","DOI":"10.1145\/1148170.1148265","article-title":"Learning to advertise","volume-title":"SIGIR'06: Proceedings of the 29th Annual International ACM SIGIR Conference on Research and Development in Information Retrieval","author":"Lacerda","year":"2006"},{"key":"2023012507542894400_B14","doi-asserted-by":"crossref","first-page":"17","DOI":"10.3115\/1613172.1613178","article-title":"Graph-based keyword extraction for single-document summarization","volume-title":"MMIES'08: Proceedings of the 
Workshop on Multi-source Multilingual Information Extraction and Summarization","author":"Litvak","year":"2008"},{"key":"2023012507542894400_B15","first-page":"620","article-title":"Unsupervised approaches for automatic keyword extraction using meeting transcripts","volume-title":"NAACL'09: Proceedings of Human Language Technologies: The 2009 Annual Conference of the North American Chapter of the Association for Computational Linguistics","author":"Liu","year":"2009"},{"key":"2023012507542894400_B16","first-page":"394","article-title":"Comparison of two Schemes for automatic keyword extraction from MEDLINE for functional gene clustering","volume-title":"CSB'04: Proceedings of the 2004 IEEE Computational Systems Bioinformatics Conference","author":"Liu","year":"2004"},{"key":"2023012507542894400_B17","first-page":"292","article-title":"Text mining functional keywords associated with genes","volume":"107","author":"Liu","year":"2004","journal-title":"Stud. Health Technol. Inform."},{"key":"2023012507542894400_B18","doi-asserted-by":"crossref","first-page":"32","DOI":"10.1197\/jamia.M2935","article-title":"Evaluating relevance ranking strategies for MEDLINE retrieval","volume":"16","author":"Lu","year":"2009","journal-title":"J. Am. Med. Inform. Assoc."},{"key":"2023012507542894400_B19","doi-asserted-by":"crossref","DOI":"10.1017\/CBO9780511809071","volume-title":"Introduction to Information Retrieval.","author":"Manning","year":"2008"},{"key":"2023012507542894400_B20","doi-asserted-by":"crossref","first-page":"157","DOI":"10.1142\/S0218213004001466","article-title":"Keyword extraction from a single document using word co-occurrence statistical information","volume":"13","author":"Matsuo","year":"2003","journal-title":"Int. J. Artif. Intell. 
Tools"},{"key":"2023012507542894400_B21","doi-asserted-by":"crossref","first-page":"513","DOI":"10.1016\/0306-4573(88)90021-0","article-title":"Term-weighting approaches in automatic text retrieval","volume":"24","author":"Salton","year":"1988","journal-title":"Inf. Process. Manag."},{"key":"2023012507542894400_B22","first-page":"341","article-title":"Mining web query hierarchies from clickthrough data","volume-title":"AAAI'07: Proceedings of the 22nd National Conference on Artificial Intelligence","author":"Shen","year":"2007"},{"key":"2023012507542894400_B23","doi-asserted-by":"crossref","first-page":"402","DOI":"10.1186\/1471-2105-9-402","article-title":"Abbreviation definition identification based on automatic precision estimates","volume":"9","author":"Sohn","year":"2008","journal-title":"BMC Bioinformatics"},{"key":"2023012507542894400_B24","doi-asserted-by":"crossref","first-page":"2320","DOI":"10.1093\/bioinformatics\/bth227","article-title":"MedPost: a part-of-speech tagger for bioMedical text","volume":"20","author":"Smith","year":"2004","journal-title":"Bioinformatics"},{"key":"2023012507542894400_B25","doi-asserted-by":"crossref","first-page":"28","DOI":"10.3115\/1572306.1572311","article-title":"Mining the biomedical literature for genic information","volume-title":"BioNLP'08: Proceedings of the Workshop on Current Trends in Biomedical Natural Language Processing","author":"Tudor","year":"2008"},{"key":"2023012507542894400_B26","doi-asserted-by":"crossref","first-page":"3031","DOI":"10.1093\/bioinformatics\/btp475","article-title":"PubMed-EX: a web browser extension to enhance PubMed search with text mining features","volume":"25","author":"Tsai","year":"2009","journal-title":"Bioinformatics"},{"key":"2023012507542894400_B27","doi-asserted-by":"crossref","first-page":"2559","DOI":"10.1093\/bioinformatics\/btn469","article-title":"FACTA: a text search engine for finding associated biomedical 
concepts","volume":"24","author":"Tsuruoka","year":"2008","journal-title":"Bioinformatics"},{"key":"2023012507542894400_B28","doi-asserted-by":"crossref","first-page":"264","DOI":"10.1002\/asi.20979","article-title":"How to interpret PubMed queries and why it matters","volume":"60","author":"Yeganova","year":"2009","journal-title":"JASIST"},{"key":"2023012507542894400_B29","doi-asserted-by":"crossref","first-page":"213","DOI":"10.1145\/1135777.1135813","article-title":"Finding advertising keywords on web pages","volume-title":"WWW'06: Proceedings of the 15th international conference on World Wide Web","author":"Yih","year":"2006"},{"key":"2023012507542894400_B30","doi-asserted-by":"crossref","first-page":"918","DOI":"10.1145\/1015330.1015332","article-title":"Solving large scale linear prediction problems using stochastic gradient descent algorithms","volume-title":"Twenty-first International Conference on Machine Learning","author":"Zhang","year":"2004"},{"issue":"15","key":"2023012507542894400_B31","doi-asserted-by":"crossref","first-page":"1944","DOI":"10.1093\/bioinformatics\/btp338","article-title":"Enhancing MEDLINE document clustering by incorporating MeSH semantic 
similarity","volume":"25","author":"Zhu","year":"2009","journal-title":"Bioinformatics"}],"container-title":["Bioinformatics"],"original-title":[],"language":"en","link":[{"URL":"https:\/\/academic.oup.com\/bioinformatics\/article-pdf\/26\/21\/2767\/48852851\/bioinformatics_26_21_2767.pdf","content-type":"application\/pdf","content-version":"vor","intended-application":"syndication"},{"URL":"https:\/\/academic.oup.com\/bioinformatics\/article-pdf\/26\/21\/2767\/48852851\/bioinformatics_26_21_2767.pdf","content-type":"unspecified","content-version":"vor","intended-application":"similarity-checking"}],"deposited":{"date-parts":[[2023,1,25]],"date-time":"2023-01-25T07:55:13Z","timestamp":1674633313000},"score":1,"resource":{"primary":{"URL":"https:\/\/academic.oup.com\/bioinformatics\/article\/26\/21\/2767\/212598"}},"subtitle":[],"short-title":[],"issued":{"date-parts":[[2010,9,1]]},"references-count":31,"journal-issue":{"issue":"21","published-print":{"date-parts":[[2010,11,1]]}},"URL":"https:\/\/doi.org\/10.1093\/bioinformatics\/btq459","relation":{},"ISSN":["1367-4811","1367-4803"],"issn-type":[{"value":"1367-4811","type":"electronic"},{"value":"1367-4803","type":"print"}],"subject":[],"published-other":{"date-parts":[[2010,11,1]]},"published":{"date-parts":[[2010,9,1]]}}}