{"status":"ok","message-type":"work","message-version":"1.0.0","message":{"indexed":{"date-parts":[[2025,10,8]],"date-time":"2025-10-08T22:52:07Z","timestamp":1759963927360,"version":"3.38.0"},"reference-count":18,"publisher":"SAGE Publications","issue":"1","license":[{"start":{"date-parts":[[2019,9,30]],"date-time":"2019-09-30T00:00:00Z","timestamp":1569801600000},"content-version":"unspecified","delay-in-days":0,"URL":"https:\/\/creativecommons.org\/licenses\/by-nc\/4.0\/"}],"funder":[{"name":"NIH","award":["1U24HL126126-01"],"award-info":[{"award-number":["1U24HL126126-01"]}]},{"name":"UTHealth Innovation for Cancer Prevention Research Training Program","award":["RP160015"],"award-info":[{"award-number":["RP160015"]}]}],"content-domain":{"domain":["journals.sagepub.com"],"crossmark-restriction":true},"short-container-title":["Health Informatics J"],"published-print":{"date-parts":[[2020,3]]},"abstract":"<jats:p> Software tools now are essential to research and applications in the biomedical domain. However, existing software repositories are mainly built using manual curation, which is time-consuming and unscalable. This study took the initiative to manually annotate software names in 1,120 MEDLINE abstracts and titles and used this corpus to develop and evaluate machine learning\u2013based named entity recognition systems for biomedical software. Specifically, two strategies were proposed for feature engineering: (1) domain knowledge features and (2) unsupervised word representation features of clustered and binarized word embeddings. Our best system achieved an F-measure of 91.79% for recognizing software from titles and an F-measure of 86.35% for recognizing software from both titles and abstracts using inexact matching criteria. We then created a biomedical software catalog with 19,557 entries using the developed system. This study demonstrates the feasibility of using natural language processing methods to automatically build a high-quality software index from biomedical literature. <\/jats:p>","DOI":"10.1177\/1460458219869490","type":"journal-article","created":{"date-parts":[[2019,10,1]],"date-time":"2019-10-01T03:11:01Z","timestamp":1569899461000},"page":"21-33","update-policy":"https:\/\/doi.org\/10.1177\/sage-journals-update-policy","source":"Crossref","is-referenced-by-count":7,"title":["Recognizing software names in biomedical literature using machine learning"],"prefix":"10.1177","volume":"26","author":[{"ORCID":"https:\/\/orcid.org\/0000-0002-8665-0201","authenticated-orcid":false,"given":"Qiang","family":"Wei","sequence":"first","affiliation":[]},{"given":"Yaoyun","family":"Zhang","sequence":"additional","affiliation":[]},{"given":"Muhammad","family":"Amith","sequence":"additional","affiliation":[{"name":"The University of Texas Health Science Center at Houston, USA"}]},{"given":"Rebecca","family":"Lin","sequence":"additional","affiliation":[{"name":"Johns Hopkins University, USA"}]},{"given":"Jenay","family":"Lapeyrolerie","sequence":"additional","affiliation":[{"name":"Baylor University, USA"}]},{"given":"Cui","family":"Tao","sequence":"additional","affiliation":[]},{"given":"Hua","family":"Xu","sequence":"additional","affiliation":[{"name":"The University of Texas Health Science Center at Houston, USA"}]}],"member":"179","published-online":{"date-parts":[[2019,9,30]]},"reference":[{"key":"bibr1-1460458219869490","doi-asserted-by":"publisher","DOI":"10.1186\/gb-2004-5-10-r80"},{"key":"bibr2-1460458219869490","doi-asserted-by":"publisher","DOI":"10.1093\/bioinformatics\/btt100"},{"key":"bibr3-1460458219869490","doi-asserted-by":"publisher","DOI":"10.1093\/nar\/gkq394"},{"key":"bibr4-1460458219869490","doi-asserted-by":"publisher","DOI":"10.1093\/nar\/gkv1116"},{"key":"bibr5-1460458219869490","doi-asserted-by":"publisher","DOI":"10.1093\/database\/bau069"},{"key":"bibr6-1460458219869490","unstructured":"Wang W, Bleakley B, Ju C, et al. Aztec: a platform to render biomedical software findable, accessible, interoperable, and reusable, 2017, https:\/\/arxiv.org\/abs\/1706.06087"},{"key":"bibr7-1460458219869490","doi-asserted-by":"publisher","DOI":"10.1371\/journal.pone.0146300"},{"key":"bibr8-1460458219869490","doi-asserted-by":"publisher","DOI":"10.1371\/journal.pone.0157989"},{"key":"bibr9-1460458219869490","doi-asserted-by":"publisher","DOI":"10.1093\/jamia\/ocx132"},{"key":"bibr10-1460458219869490","doi-asserted-by":"publisher","DOI":"10.1016\/j.ipm.2013.03.002"},{"key":"bibr11-1460458219869490","doi-asserted-by":"publisher","DOI":"10.1186\/1758-2946-7-S1-S8"},{"key":"bibr12-1460458219869490","unstructured":"Van Antwerp M, Madey G. Advances in the sourceforge research data archive. In: Workshop on public data about software development (WoPDaSD) at the 4th international conference on open source systems, Milan, 2008, https:\/\/flosshub.org\/sites\/flosshub.org\/files\/srda2008.pdf"},{"first-page":"160","volume-title":"Proceedings of the 25th international conference on machine learning","author":"Collobert R","key":"bibr13-1460458219869490"},{"key":"bibr14-1460458219869490","unstructured":"Mnih A, Hinton GE. A scalable hierarchical distributed language model. In: Advances in neural information processing systems, Vancouver, BC, Canada, 2009, pp. 1081\u20131088, https:\/\/papers.nips.cc\/paper\/3583-a-scalable-hierarchical-distributed-language-model.pdf"},{"key":"bibr15-1460458219869490","unstructured":"Mikolov T, Chen K, Corrado G, et al. Efficient estimation of word representations in vector space, 2013, https:\/\/arxiv.org\/abs\/1301.3781"},{"first-page":"110","volume-title":"Proceedings of the 2014 conference on empirical methods in natural language processing (EMNLP)","author":"Guo J","key":"bibr16-1460458219869490"},{"key":"bibr17-1460458219869490","doi-asserted-by":"publisher","DOI":"10.1186\/1471-2105-14-194"},{"first-page":"386","volume-title":"International conference on industrial, engineering and other applications of applied intelligent systems","author":"Amith M","key":"bibr18-1460458219869490"}],"container-title":["Health Informatics Journal"],"original-title":[],"language":"en","link":[{"URL":"https:\/\/journals.sagepub.com\/doi\/pdf\/10.1177\/1460458219869490","content-type":"application\/pdf","content-version":"vor","intended-application":"text-mining"},{"URL":"https:\/\/journals.sagepub.com\/doi\/full-xml\/10.1177\/1460458219869490","content-type":"application\/xml","content-version":"vor","intended-application":"text-mining"},{"URL":"https:\/\/journals.sagepub.com\/doi\/pdf\/10.1177\/1460458219869490","content-type":"unspecified","content-version":"vor","intended-application":"similarity-checking"}],"deposited":{"date-parts":[[2025,3,2]],"date-time":"2025-03-02T01:18:39Z","timestamp":1740878319000},"score":1,"resource":{"primary":{"URL":"https:\/\/journals.sagepub.com\/doi\/10.1177\/1460458219869490"}},"subtitle":[],"short-title":[],"issued":{"date-parts":[[2019,9,30]]},"references-count":18,"journal-issue":{"issue":"1","published-print":{"date-parts":[[2020,3]]}},"alternative-id":["10.1177\/1460458219869490"],"URL":"https:\/\/doi.org\/10.1177\/1460458219869490","relation":{},"ISSN":["1460-4582","1741-2811"],"issn-type":[{"type":"print","value":"1460-4582"},{"type":"electronic","value":"1741-2811"}],"subject":[],"published":{"date-parts":[[2019,9,30]]}}}