{"status":"ok","message-type":"work","message-version":"1.0.0","message":{"indexed":{"date-parts":[[2025,8,15]],"date-time":"2025-08-15T02:24:23Z","timestamp":1755224663852,"version":"3.43.0"},"reference-count":25,"publisher":"Oxford University Press (OUP)","issue":"7","license":[{"start":{"date-parts":[[2025,6,27]],"date-time":"2025-06-27T00:00:00Z","timestamp":1750982400000},"content-version":"vor","delay-in-days":0,"URL":"https:\/\/creativecommons.org\/licenses\/by\/4.0\/"}],"content-domain":{"domain":[],"crossmark-restriction":false},"short-container-title":[],"published-print":{"date-parts":[[2025,7,1]]},"abstract":"<jats:title>Abstract<\/jats:title>\n               <jats:sec>\n                  <jats:title>Motivation<\/jats:title>\n                  <jats:p>The rise of transformer-based architectures has dramatically improved our ability to analyze natural language. However, the power and flexibility of these general-purpose models come at the cost of highly complex model architectures with billions of parameters that are not always needed.<\/jats:p>\n               <\/jats:sec>\n               <jats:sec>\n                  <jats:title>Results<\/jats:title>\n                  <jats:p>In this work, we present CSpace: a concise word embedding of biomedical concepts that outperforms all alternatives in terms of out-of-vocabulary ratio and semantic textual similarity task, and has comparable performance with respect to transformer-based alternatives in the sentence similarity task. This ability can serve as the foundation for semantic search by enabling efficient retrieval of conceptually related terms. 
Additionally, CSpace incorporates ontological identifiers (MeSH, NCBI gene and taxonomy IDs), enabling computationally efficient disease, gene or condition relatedness measurement, potentially unlocking previously unknown disease-condition associations.<\/jats:p>\n               <\/jats:sec>\n               <jats:sec>\n                  <jats:title>Availability and implementation<\/jats:title>\n                  <jats:p>Full and compressed models are available on Zenodo at https:\/\/doi.org\/10.5281\/zenodo.14781672, while training code, examples, interactive visualizations and experiments are available at https:\/\/doi.org\/10.5281\/zenodo.15125706 and on the GitHub repository.<\/jats:p>\n               <\/jats:sec>","DOI":"10.1093\/bioinformatics\/btaf376","type":"journal-article","created":{"date-parts":[[2025,6,27]],"date-time":"2025-06-27T07:36:58Z","timestamp":1751009818000},"source":"Crossref","is-referenced-by-count":0,"title":["CSpace: a concept embedding space for biomedical applications"],"prefix":"10.1093","volume":"41","author":[{"ORCID":"https:\/\/orcid.org\/0000-0002-8427-4230","authenticated-orcid":false,"given":"Danilo","family":"Tomasoni","sequence":"first","affiliation":[{"name":"Fondazione The Microsoft Research\u2014University of Trento Centre for Computational and Systems Biology (COSBI) , 38068 Rovereto (TN),","place":["Italy"]}]},{"ORCID":"https:\/\/orcid.org\/0000-0001-9043-7705","authenticated-orcid":false,"given":"Luca","family":"Marchetti","sequence":"additional","affiliation":[{"name":"Fondazione The Microsoft Research\u2014University of Trento Centre for Computational and Systems Biology (COSBI) , 38068 Rovereto (TN),","place":["Italy"]},{"name":"Department of Cellular, Computational and Integrative Biology (CIBIO), University of Trento , 38123 Povo 
(TN),","place":["Italy"]}]}],"member":"286","published-online":{"date-parts":[[2025,6,27]]},"reference":[{"key":"2025081218484417800_btaf376-B1","doi-asserted-by":"crossref","DOI":"10.1162\/tacl_a_00051","article-title":"Enriching word vectors with subword information","author":"Bojanowski","year":"2017","journal-title":"Transactions of the Association for Computational Linguistics"},{"first-page":"166","year":"2016","author":"Chiu","key":"2025081218484417800_btaf376-B2"},{"author":"Devlin","key":"2025081218484417800_btaf376-B3"},{"author":"Dupr\u00e9","key":"2025081218484417800_btaf376-B4"},{"key":"2025081218484417800_btaf376-B5","article-title":"POT: Python Optimal Transport","author":"Flamary","year":"2021","journal-title":"J Mach Learn Res"},{"year":"2025","author":"Fondazione The Microsoft Research - COSBI","key":"2025081218484417800_btaf376-B20"},{"key":"2025081218484417800_btaf376-B6","doi-asserted-by":"publisher","first-page":"146","DOI":"10.1080\/00437956.1954.11659520","article-title":"Distributional structure","volume":"10","author":"Harris","year":"1954","journal-title":"Word"},{"year":"2017","author":"Honnibal","key":"2025081218484417800_btaf376-B7"},{"year":"2015","author":"Kusner","key":"2025081218484417800_btaf376-B8"},{"author":"Mikolov","key":"2025081218484417800_btaf376-B9","doi-asserted-by":"publisher","DOI":"10.48550\/arXiv.1301.3781"},{"year":"2013","author":"Mikolov","key":"2025081218484417800_btaf376-B10"},{"year":"2022","author":"Neelakantan","key":"2025081218484417800_btaf376-B11","doi-asserted-by":"publisher","DOI":"10.48550\/arXiv.2201.10005"},{"author":"Neumann","key":"2025081218484417800_btaf376-B12"},{"key":"2025081218484417800_btaf376-B13","doi-asserted-by":"publisher","first-page":"103867","DOI":"10.1016\/j.jbi.2021.103867","article-title":"Improved biomedical word embeddings in the transformer era","volume":"120","author":"Noh","year":"2021","journal-title":"J Biomed 
Inform"},{"first-page":"572","year":"2010","author":"Pakhomov","key":"2025081218484417800_btaf376-B14"},{"key":"2025081218484417800_btaf376-B15","doi-asserted-by":"publisher","first-page":"288","DOI":"10.1016\/j.jbi.2006.06.004","article-title":"Measures of semantic similarity and relatedness in the biomedical domain","volume":"40","author":"Pedersen","year":"2007","journal-title":"J Biomed Inform"},{"key":"2025081218484417800_btaf376-B16","doi-asserted-by":"publisher","first-page":"45","DOI":"10.13140\/2.1.2393.1847","author":"\u0158eh\u016f\u0159ek","year":"2010"},{"year":"2017","author":"Sanjeev","key":"2025081218484417800_btaf376-B17"},{"key":"2025081218484417800_btaf376-B18","doi-asserted-by":"publisher","first-page":"i49","DOI":"10.1093\/bioinformatics\/btx238","article-title":"BIOSSES: a semantic sentence similarity estimation system for the biomedical domain","volume":"33","author":"So\u011fanc\u0131o\u011flu","year":"2017","journal-title":"Bioinformatics"},{"year":"2024","author":"Sturua","key":"2025081218484417800_btaf376-B19","doi-asserted-by":"publisher","DOI":"10.48550\/arXiv.2409.10173"},{"author":"Vaswani","key":"2025081218484417800_btaf376-B21"},{"key":"2025081218484417800_btaf376-B22","doi-asserted-by":"publisher","first-page":"W540","DOI":"10.1093\/nar\/gkae235","article-title":"PubTator 3.0: an AI-powered literature resource for unlocking biomedical knowledge","volume":"52","author":"Wei","year":"2024","journal-title":"Nucleic Acids Res"},{"key":"2025081218484417800_btaf376-B23","doi-asserted-by":"publisher","first-page":"W587","DOI":"10.1093\/nar\/gkz389","article-title":"PubTator central: automated concept annotation for biomedical full text articles","volume":"47","author":"Wei","year":"2019","journal-title":"Nucleic Acids Res"},{"key":"2025081218484417800_btaf376-B24","doi-asserted-by":"publisher","first-page":"379","DOI":"10.1186\/s12931-024-03008-5","article-title":"Plasma genome-wide Mendelian randomization identifies potentially causal 
genes in idiopathic pulmonary fibrosis","volume":"25","author":"Zhang","year":"2024","journal-title":"Respir Res"},{"key":"2025081218484417800_btaf376-B25","doi-asserted-by":"publisher","first-page":"52","DOI":"10.1038\/s41597-019-0055-0","article-title":"BioWordVec, improving biomedical word embeddings with subword information and MeSH","volume":"6","author":"Zhang","year":"2019","journal-title":"Sci Data"}],"container-title":["Bioinformatics"],"original-title":[],"language":"en","link":[{"URL":"https:\/\/academic.oup.com\/bioinformatics\/advance-article-pdf\/doi\/10.1093\/bioinformatics\/btaf376\/63606578\/btaf376.pdf","content-type":"application\/pdf","content-version":"am","intended-application":"syndication"},{"URL":"https:\/\/academic.oup.com\/bioinformatics\/article-pdf\/41\/7\/btaf376\/63606578\/btaf376.pdf","content-type":"application\/pdf","content-version":"vor","intended-application":"syndication"},{"URL":"https:\/\/academic.oup.com\/bioinformatics\/article-pdf\/41\/7\/btaf376\/63606578\/btaf376.pdf","content-type":"unspecified","content-version":"vor","intended-application":"similarity-checking"}],"deposited":{"date-parts":[[2025,8,12]],"date-time":"2025-08-12T22:48:47Z","timestamp":1755038927000},"score":1,"resource":{"primary":{"URL":"https:\/\/academic.oup.com\/bioinformatics\/article\/doi\/10.1093\/bioinformatics\/btaf376\/8176565"}},"subtitle":[],"editor":[{"given":"Zhiyong","family":"Lu","sequence":"additional","affiliation":[]}],"short-title":[],"issued":{"date-parts":[[2025,6,27]]},"references-count":25,"journal-issue":{"issue":"7","published-print":{"date-parts":[[2025,7,1]]}},"URL":"https:\/\/doi.org\/10.1093\/bioinformatics\/btaf376","relation":{},"ISSN":["1367-4811"],"issn-type":[{"type":"electronic","value":"1367-4811"}],"subject":[],"published-other":{"date-parts":[[2025,7]]},"published":{"date-parts":[[2025,6,27]]},"article-number":"btaf376"}}