{"status":"ok","message-type":"work","message-version":"1.0.0","message":{"indexed":{"date-parts":[[2026,3,6]],"date-time":"2026-03-06T21:50:05Z","timestamp":1772833805574,"version":"3.50.1"},"reference-count":62,"publisher":"Springer Science and Business Media LLC","issue":"1","license":[{"start":{"date-parts":[[2022,10,18]],"date-time":"2022-10-18T00:00:00Z","timestamp":1666051200000},"content-version":"tdm","delay-in-days":0,"URL":"https:\/\/creativecommons.org\/licenses\/by\/4.0"},{"start":{"date-parts":[[2022,10,18]],"date-time":"2022-10-18T00:00:00Z","timestamp":1666051200000},"content-version":"vor","delay-in-days":0,"URL":"https:\/\/creativecommons.org\/licenses\/by\/4.0"}],"funder":[{"DOI":"10.13039\/100000936","name":"Gordon and Betty Moore Foundation","doi-asserted-by":"publisher","award":["GBMF 4552"],"award-info":[{"award-number":["GBMF 4552"]}],"id":[{"id":"10.13039\/100000936","id-type":"DOI","asserted-by":"publisher"}]},{"DOI":"10.13039\/100000936","name":"Gordon and Betty Moore Foundation","doi-asserted-by":"publisher","award":["GBMF 4552"],"award-info":[{"award-number":["GBMF 4552"]}],"id":[{"id":"10.13039\/100000936","id-type":"DOI","asserted-by":"publisher"}]},{"DOI":"10.13039\/100000936","name":"Gordon and Betty Moore Foundation","doi-asserted-by":"publisher","award":["GBMF 4552"],"award-info":[{"award-number":["GBMF 4552"]}],"id":[{"id":"10.13039\/100000936","id-type":"DOI","asserted-by":"publisher"}]},{"DOI":"10.13039\/100000051","name":"National Human Genome Research Institute","doi-asserted-by":"publisher","award":["R01 HG010067"],"award-info":[{"award-number":["R01 HG010067"]}],"id":[{"id":"10.13039\/100000051","id-type":"DOI","asserted-by":"publisher"}]}],"content-domain":{"domain":["link.springer.com"],"crossmark-restriction":false},"short-container-title":["BioData Mining"],"abstract":"<jats:title>Abstract<\/jats:title>\n                  <jats:sec>\n                    <jats:title>Background<\/jats:title>\n                    
<jats:p>Knowledge graphs support biomedical research efforts by providing contextual information for biomedical entities, constructing networks, and supporting the interpretation of high-throughput analyses. These databases are populated via manual curation, which is challenging to scale with an exponentially rising publication rate. Data programming is a paradigm that circumvents this arduous manual process by combining databases with simple rules and heuristics written as label functions, which are programs designed to annotate textual data automatically. Unfortunately, writing a useful label function requires substantial error analysis and is a nontrivial task that takes multiple days per function. This bottleneck makes populating a knowledge graph with multiple node and edge types practically infeasible. Thus, we sought to accelerate the label function creation process by evaluating how label functions can be re-used across multiple edge types.<\/jats:p>\n                  <\/jats:sec>\n                  <jats:sec>\n                    <jats:title>Results<\/jats:title>\n                    <jats:p>We obtained entity-tagged abstracts and subsetted these entities to contain only compound, gene, and disease mentions. We extracted sentences containing co-mentions of certain biomedical entities contained in a previously described knowledge graph, Hetionet v1. We trained a baseline model that used database-only label functions and then used a sampling approach to measure how well adding edge-specific or edge-mismatch label function combinations improved over our baseline. Next, we trained a discriminator model to detect sentences that indicated a biomedical relationship and then estimated the number of edge types that could be recalled and added to Hetionet v1. We found that adding edge-mismatch label functions rarely improved relationship extraction, while control edge-specific label functions did. 
There were two exceptions to this trend, Compound-binds-Gene and Gene-interacts-Gene, which both indicated physical relationships and showed signs of transferability. Across the scenarios tested, discriminative model performance strongly depends on generated annotations. Using the best discriminative model for each edge type, we recalled close to 30% of established edges within Hetionet v1.<\/jats:p>\n                  <\/jats:sec>\n                  <jats:sec>\n                    <jats:title>Conclusions<\/jats:title>\n                    <jats:p>Our results show that this framework can incorporate novel edges into our source knowledge graph. However, results with label function transfer were mixed. Only label functions describing very similar edge types supported improved performance when transferred. We expect that the continued development of this strategy may provide essential building blocks to populating biomedical knowledge graphs with discoveries, ensuring that these resources include cutting-edge results.<\/jats:p>\n                  <\/jats:sec>","DOI":"10.1186\/s13040-022-00311-z","type":"journal-article","created":{"date-parts":[[2022,10,18]],"date-time":"2022-10-18T07:04:10Z","timestamp":1666076650000},"update-policy":"https:\/\/doi.org\/10.1007\/springer_crossmark_policy","source":"Crossref","is-referenced-by-count":6,"title":["Expanding a database-derived biomedical knowledge graph via multi-relation extraction from biomedical abstracts"],"prefix":"10.1186","volume":"15","author":[{"ORCID":"https:\/\/orcid.org\/0000-0003-0002-5761","authenticated-orcid":false,"given":"David N.","family":"Nicholson","sequence":"first","affiliation":[]},{"ORCID":"https:\/\/orcid.org\/0000-0002-3012-7446","authenticated-orcid":false,"given":"Daniel S.","family":"Himmelstein","sequence":"additional","affiliation":[]},{"ORCID":"https:\/\/orcid.org\/0000-0001-8713-9213","authenticated-orcid":false,"given":"Casey 
S.","family":"Greene","sequence":"additional","affiliation":[]}],"member":"297","published-online":{"date-parts":[[2022,10,18]]},"reference":[{"key":"311_CR1","doi-asserted-by":"publisher","DOI":"10.1371\/journal.pone.0084912","author":"r gramatica","year":"2014","unstructured":"Gramatica R, Di Matteo T, Giorgetti S, Barbiani M, Bevec D, Aste T. Graph theory enables drug repurposing \u2013 how a mathematical model can drive the discovery of hidden mechanisms of action. PLOS One. 2014. https:\/\/doi.org\/10.1371\/journal.pone.0084912. https:\/\/doi.org\/gf45zp. PMID: 24416311 \u00b7 PMCID: PMC3886994.","journal-title":"plos one"},{"key":"311_CR2","doi-asserted-by":"publisher","DOI":"10.1101\/385617","author":"m alshahrani","year":"2018","unstructured":"Alshahrani M, Hoehndorf R. Drug repurposing through joint learning on knowledge graphs and literature. Cold Spring Harbor Labor. 2018. https:\/\/doi.org\/10.1101\/385617 https:\/\/doi.org\/gf45zk.","journal-title":"cold spring harbor labor"},{"key":"311_CR3","doi-asserted-by":"publisher","DOI":"10.7554\/elife.26726","author":"ds himmelstein","year":"2017","unstructured":"Himmelstein DS, Lizee A, Hessler C, Brueggeman L, Chen SL, Hadley D, Green A, Khankhanian P, Baranzini SE. Systematic integration of biomedical knowledge prioritizes drugs for repurposing. eLife. 2017. https:\/\/doi.org\/10.7554\/elife.26726 https:\/\/doi.org\/cdfk. PMID: 28936969 \u00b7 PMCID: PMC5640425.","journal-title":"elife"},{"key":"311_CR4","doi-asserted-by":"publisher","unstructured":"Mintz M, Bills S, Snow R, Jurafsky D. Distant supervision for relation extraction without labeled data. in: proceedings of the joint conference of the 47th annual meeting of the acl and the 4th international joint conference on natural language processing of the afnlp: volume 2 - acl-ijcnlp \u201909. 2009. 
https:\/\/doi.org\/10.3115\/1690219.1690287.","DOI":"10.3115\/1690219.1690287"},{"key":"311_CR5","doi-asserted-by":"publisher","DOI":"10.1101\/444398","author":"a junge","year":"2018","unstructured":"Junge A, Jensen LJ. COCOSCORE: context-aware co-occurrence scoring for text mining applications using distant supervision. Cold Spring Harbor Labor. 2018. https:\/\/doi.org\/10.1101\/444398 https:\/\/doi.org\/gf45zm.","journal-title":"cold spring harbor labor"},{"key":"311_CR6","doi-asserted-by":"publisher","DOI":"10.1186\/s12859-019-2873-7","author":"h zhou","year":"2019","unstructured":"Zhou H, Lang C, Liu Z, Ning S, Lin Y, Du L. Knowledge-guided convolutional networks for chemical-disease relation extraction. BMC Bioinformatics. 2019. https:\/\/doi.org\/10.1186\/s12859-019-2873-7 https:\/\/doi.org\/gf45zn. PMID: 31113357 \u00b7 PMCID: PMC6528333.","journal-title":"bmc bioinformatics"},{"key":"311_CR7","doi-asserted-by":"publisher","DOI":"10.1093\/bib\/bbn043","author":"r winnenburg","year":"2008","unstructured":"Winnenburg R, Wachter T, Plake C, Doms A, Schroeder M. Facts from text: can text mining help to scale-up high-quality manual curation of gene products with ontologies? brief bioinformatics. 2008. https:\/\/doi.org\/10.1093\/bib\/bbn043 https:\/\/doi.org\/bfsnwg. PMID: 19060303.","journal-title":"brief bioinformatics"},{"key":"311_CR8","doi-asserted-by":"publisher","unstructured":"Baumgartner WA Jr, Cohen KB, Fox LM, Acquaah-Mensah G, Hunter L. Manual curation is not sufficient for annotation of genomic databases. Bioinformatics. 2007. https:\/\/doi.org\/10.1093\/bioinformatics\/btm229 https:\/\/doi.org\/dtck86. PMID: 17646325 \u00b7 PMCID: PMC2516305.","DOI":"10.1093\/bioinformatics\/btm229"},{"key":"311_CR9","doi-asserted-by":"publisher","DOI":"10.1002\/asi.23329","author":"l bornmann","year":"2015","unstructured":"Bornmann L, Mutz R. Growth rates of modern science: a bibliometric analysis based on the number of publications and cited references. 
J Assoc Inf Sci Technol. 2015. https:\/\/doi.org\/10.1002\/asi.23329 (https:\/\/doi.org\/gfj5zc).","journal-title":"j assoc inf sci technol"},{"key":"311_CR10","doi-asserted-by":"publisher","DOI":"10.1016\/j.ymeth.2014.11.020","author":"s pletscher-frankild","year":"2015","unstructured":"Pletscher-Frankild S, Pallej\u00e0 A, Tsafou K, Binder JX, Jensen LJ. diseases: text mining and data integration of disease\u2013gene associations. Methods. 2015. https:\/\/doi.org\/10.1016\/j.ymeth.2014.11.020 (https:\/\/doi.org\/f3mn6s pmid: 25484339).","journal-title":"methods"},{"key":"311_CR11","doi-asserted-by":"publisher","DOI":"10.1093\/nar\/gkv383","author":"y liu","year":"2015","unstructured":"Liu Y, Liang Y, Wishart D. Polysearch2: a significantly improved text-mining system for discovering associations between human diseases, genes, drugs, metabolites, toxins and more. Nucleic Acids Res. 2015. https:\/\/doi.org\/10.1093\/nar\/gkv383 (https:\/\/doi.org\/f7nzn5. PMID: 25925572 \u00b7 PMCID: PMC4489268).","journal-title":"nucleic acids res"},{"key":"311_CR12","doi-asserted-by":"publisher","DOI":"10.1186\/s12859-018-2048-y","author":"j zhou","year":"2018","unstructured":"Zhou J, Fu B. The research on gene-disease association based on text-mining of pubmed. BMC Bioinformatics. 2018. https:\/\/doi.org\/10.1186\/s12859-018-2048-y (https:\/\/doi.org\/gf479k. pmid: 29415654 \u00b7 pmcid: pmc5804013).","journal-title":"bmc bioinformatics"},{"key":"311_CR13","doi-asserted-by":"publisher","DOI":"10.1371\/journal.pcbi.1005962","author":"d westergaard","year":"2018","unstructured":"Westergaard D, St\u00e6rfeldt H-H, T\u00f8nsberg C, Jensen LJ, Brunak S. A comprehensive and quantitative comparison of text-mining in 15 million full-text articles versus their corresponding abstracts. PLOS Comput Biol. 2018. https:\/\/doi.org\/10.1371\/journal.pcbi.1005962 (https:\/\/doi.org\/gcx747. 
PMID: 29447159 \u00b7 PMCID: PMC5831415).","journal-title":"plos comput biol"},{"key":"311_CR14","doi-asserted-by":"publisher","DOI":"10.1371\/journal.pcbi.1000943","author":"r frijters","year":"2010","unstructured":"Frijters R, van Vugt M, Smeets R, van Schaik R, de Vlieg J, Alkema W. Literature mining for the discovery of hidden connections between drugs, genes and diseases. PLOS Comput Biol. 2010. https:\/\/doi.org\/10.1371\/journal.pcbi.1000943 (https:\/\/doi.org\/bhrw7x. PMID: 20885778 \u00b7 PMCID: PMC2944780).","journal-title":"plos comput biol"},{"key":"311_CR15","doi-asserted-by":"publisher","DOI":"10.1186\/s12859-019-2634-7","author":"a al-aamri","year":"2019","unstructured":"Al-aamri A, Taha K, Al-hammadi Y, Maalouf M, Homouz D. analyzing a co-occurrence gene-interaction network to identify disease-gene association. BMC Bioinformatics. 2019. https:\/\/doi.org\/10.1186\/s12859-019-2634-7 (https:\/\/doi.org\/gf49nm. PMID: 30736752 \u00b7 PMCID: PMC6368766).","journal-title":"bmc bioinformatics"},{"key":"311_CR16","doi-asserted-by":"publisher","DOI":"10.1093\/database\/bau012","author":"jx binder","year":"2014","unstructured":"Binder JX, Pletscher-frankild S, Tsafou K, Stolte C, O\u2019Donoghue SI, Schneider R, Jensen LJ. Compartments: unification and visualization of protein subcellular localization evidence. database. 2014. https:\/\/doi.org\/10.1093\/database\/bau012 (https:\/\/doi.org\/btbm. PMID: 24573882 \u00b7 PMCID: PMC3935310).","journal-title":"database"},{"key":"311_CR17","doi-asserted-by":"publisher","DOI":"10.1109\/bibm.2015.7359766","author":"m rastegar-mojarad","year":"2015","unstructured":"Rastegar-Mojarad M, Komandurelayavilli R, Li D, Prasad R, Liu H. A new method for prioritizing drug repositioning candidates extracted by literature-based discovery. 2015 Int Conf Bioinform Biomed (BIBM). 2015. 
https:\/\/doi.org\/10.1109\/bibm.2015.7359766 (https:\/\/doi.org\/gf479j).","journal-title":"2015 int conf bioinform biomed (bibm)"},{"key":"311_CR18","doi-asserted-by":"publisher","DOI":"10.7717\/peerj.1054","author":"a santos","year":"2015","unstructured":"Santos A, Tsafou K, Stolte C, Pletscher-Frankild S, O\u2019Donoghue SI, Jensen LJ. Comprehensive comparison of large-scale tissue expression datasets. PeerJ. 2015. https:\/\/doi.org\/10.7717\/peerj.1054 (https:\/\/doi.org\/f3mn6p. PMID: 26157623 \u00b7 PMCID: PMC4493645).","journal-title":"peerj"},{"key":"311_CR19","doi-asserted-by":"publisher","DOI":"10.1093\/bioinformatics\/bty114","author":"b percha","year":"2018","unstructured":"Percha B, Altman RB. A global network of biomedical relationships derived from text. Bioinformatics. 2018. https:\/\/doi.org\/10.1093\/bioinformatics\/bty114 (https:\/\/doi.org\/gc3ndk. PMCID: PMC6061699).","journal-title":"bioinformatics"},{"key":"311_CR20","doi-asserted-by":"publisher","DOI":"10.1109\/tcbb.2014.2372765","author":"m torii","year":"2015","unstructured":"Torii M, Arighi Cn, Li G, Wang Q, Wu Ch, Vijay-shanker K. RLIMS-P 20: a generalizable rule-based information extraction system for literature mining of protein phosphorylation information. IEEE\/ACM Trans Comput Biol Bioinform. 2015. https:\/\/doi.org\/10.1109\/tcbb.2014.2372765 (https:\/\/doi.org\/gf8fpv. PMID: 26357075 \u00b7 PMCID: PMC4568560).","journal-title":"ieee\/acm trans comput biol bioinform"},{"key":"311_CR21","doi-asserted-by":"publisher","DOI":"10.1186\/1471-2105-14-181","author":"r xu","year":"2013","unstructured":"Xu R, Wang QQ. Large-scale extraction of accurate drug-disease treatment pairs from biomedical literature for drug repurposing. BMC Bioinformatics. 2013. https:\/\/doi.org\/10.1186\/1471-2105-14-181 (https:\/\/doi.org\/gb8v3k. 
PMID: 23742147 \u00b7 PMCID: PMC3702428).","journal-title":"bmc bioinformatics"},{"key":"311_CR22","doi-asserted-by":"publisher","DOI":"10.1186\/1471-2105-10-s2-s6","author":"y garten","year":"2009","unstructured":"Garten Y, Altman RB. Pharmspresso: a text mining tool for extraction of pharmacogenomic concepts and relationships from full text. BMC Bioinformatics. 2009. https:\/\/doi.org\/10.1186\/1471-2105-10-s2-s6 (https:\/\/doi.org\/df75hq. PMID: 19208194 \u00b7 PMCID: PMC2646239).","journal-title":"bmc bioinformatics"},{"key":"311_CR23","doi-asserted-by":"publisher","DOI":"10.1093\/nar\/gkx462","author":"a ca\u00f1ada","year":"2017","unstructured":"Ca\u00f1ada A, Capella-gutierrez S, Rabal O, Oyarzabal J, Valencia A, Krallinger M. LimTox: a web tool for applied text mining of adverse event and toxicity associations of compounds, drugs and genes. Nucleic Acids Res. 2017. https:\/\/doi.org\/10.1093\/nar\/gkx462 (https:\/\/doi.org\/gf479h. PMID: 28531339 \u00b7 PMCID: PMC5570141).","journal-title":"nucleic acids res"},{"key":"311_CR24","doi-asserted-by":"publisher","DOI":"10.1093\/database\/bas052","author":"k raja","year":"2013","unstructured":"Raja K, Subramani S, Natarajan J. PPinterFinder\u2014a mining tool for extracting causal relations on human proteins from literature. Database. 2013. https:\/\/doi.org\/10.1093\/database\/bas052 (https:\/\/doi.org\/gf479b. PMID: 23325628 \u00b7 PMCID: PMC3548331).","journal-title":"database"},{"key":"311_CR25","doi-asserted-by":"publisher","DOI":"10.1016\/j.jbi.2015.08.008","author":"m song","year":"2015","unstructured":"Song M, Kim WC, Lee D, Heo GE, Kang KY. PKDE4J: entity and relation extraction for public knowledge discovery. J Biomed Inform. 2015. 
https:\/\/doi.org\/10.1016\/j.jbi.2015.08.008 (https:\/\/www.ncbi.nlm.nih.gov\/pubmed\/26277115 PMID: 26277115).","journal-title":"j biomed inform"},{"key":"311_CR26","doi-asserted-by":"publisher","DOI":"10.1371\/journal.pone.0200699","author":"b bhasuran","year":"2018","unstructured":"Bhasuran B, Natarajan J. Automatic extraction of gene-disease associations from literature using joint ensemble learning. PLOS One. 2018. https:\/\/doi.org\/10.1371\/journal.pone.0200699 (https:\/\/doi.org\/gdx63f. PMID: 30048465 \u00b7 PMCID: PMC6061985).","journal-title":"plos one"},{"key":"311_CR27","doi-asserted-by":"publisher","DOI":"10.1093\/bioinformatics\/btw503","author":"d xu","year":"2016","unstructured":"Xu D, Zhang M, Xie Y, Wang F, Chen M, Zhu KQ, Wei J. DTMiner: identification of potential disease targets through biomedical literature mining. Bioinformatics. 2016. https:\/\/doi.org\/10.1093\/bioinformatics\/btw503 (https:\/\/doi.org\/f9nw36. PMID: 27506226 \u00b7 PMCID: PMC5181534).","journal-title":"bioinformatics"},{"key":"311_CR28","doi-asserted-by":"publisher","DOI":"10.1093\/database\/bay102","author":"s liu","year":"2018","unstructured":"Liu S, Shen F, Komandurelayavilli R, Wang Y, Rastegar-mojarad M, Chaudhary V, Liu H. Extracting chemical\u2013protein relations using attention-based neural networks. Database. 2018. https:\/\/doi.org\/10.1093\/database\/bay102 (https:\/\/doi.org\/gfdz8d. PMID: 30295724 \u00b7 PMCID: PMC6174551).","journal-title":"database"},{"key":"311_CR29","doi-asserted-by":"publisher","DOI":"10.1016\/j.neunet.2014.09.003","author":"j schmidhuber","year":"2015","unstructured":"Schmidhuber J. Deep learning in neural networks: an overview. Neural Netw. 2015. https:\/\/doi.org\/10.1016\/j.neunet.2014.09.003 (https:\/\/doi.org\/f6v78n. PMID: 25462637).","journal-title":"neural netw"},{"key":"311_CR30","doi-asserted-by":"crossref","unstructured":"Jin Q, Dhingra B, Cohen Ww, Lu X. Probing biomedical embeddings from language models. arXiv. 2019. 
https:\/\/arxiv.org\/abs\/1904.02181","DOI":"10.18653\/v1\/W19-2011"},{"key":"311_CR31","doi-asserted-by":"publisher","DOI":"10.1093\/bioinformatics\/btz682","author":"j lee","year":"2019","unstructured":"Lee J, Yoon W, Kim S, Kim D, Kim S, So Ch, Kang J. BioBERT: a pre-trained biomedical language representation model for biomedical text mining. arXiv. 2019. https:\/\/doi.org\/10.1093\/bioinformatics\/btz682 (https:\/\/arxiv.org\/abs\/1901.08746).","journal-title":"arxiv"},{"key":"311_CR32","unstructured":"Vaswani A, Shazeer N, Parmar N, Uszkoreit J, Jones L, Gomez AN, Kaiser L, Polosukhin I. attention is all you need. arXiv. 2017. https:\/\/arxiv.org\/abs\/1706.03762"},{"key":"311_CR33","doi-asserted-by":"publisher","DOI":"10.1093\/database\/bay060","author":"s lim","year":"2018","unstructured":"Lim S, Kang J. Chemical\u2013gene relation extraction using recursive neural network. Database. 2018. https:\/\/doi.org\/10.1093\/database\/bay060 (https:\/\/doi.org\/gdss6f PMID: 29961818 \u00b7 PMCID: PMC6014134).","journal-title":"database"},{"key":"311_CR34","doi-asserted-by":"publisher","DOI":"10.1186\/s12859-015-0472-9","author":"\u00e0 bravo","year":"2015","unstructured":"Bravo \u00c0, Pi\u00f1ero J, Queralt-Rosinach N, Rautschka M, Furlong LI. Extraction of relations between genes and diseases from text and large-scale data analysis: implications for translational research. BMC Bioinformatics. 2015. https:\/\/doi.org\/10.1186\/s12859-015-0472-9 (https:\/\/doi.org\/f7kn8s PMID: 25886734 \u00b7 PMCID: PMC4466840).","journal-title":"bmc bioinformatics"},{"key":"311_CR35","doi-asserted-by":"publisher","DOI":"10.1016\/j.jbi.2012.04.004","author":"em van mulligen","year":"2012","unstructured":"van Mulligen EM, Fourrier-Reglat A, Gurwitz D, Molokhia M, Nieto A, Trifiro G, Kors JA, Furlong LI. The eu-adr corpus: annotated drugs, diseases, targets, and their relationships. J Biomed Inform. 2012. https:\/\/doi.org\/10.1016\/j.jbi.2012.04.004 (https:\/\/doi.org\/f36vn6. 
PMID: 22554700).","journal-title":"j biomed inform"},{"key":"311_CR36","doi-asserted-by":"publisher","DOI":"10.1016\/j.artmed.2004.07.016","author":"r bunescu","year":"2005","unstructured":"Bunescu R, Ge R, Kate RJ, Marcotte EM, Mooney RJ, Ramani AK, Wong YW. Comparative experiments on learning information extractors for proteins and their interactions. Artif Intell Med. 2005. https:\/\/doi.org\/10.1016\/j.artmed.2004.07.016 (https:\/\/doi.org\/dhztpn. PMID: 15811782).","journal-title":"artif intell med"},{"key":"311_CR37","doi-asserted-by":"publisher","DOI":"10.1186\/1471-2105-8-50","author":"s pyysalo","year":"2007","unstructured":"Pyysalo S, Ginter F, Heimonen J, Bj\u00f6rne J, Boberg J, J\u00e4rvinen J, Salakoski T. BioInfer: a corpus for information extraction in the biomedical domain. BMC Bioinformatics. 2007. https:\/\/doi.org\/10.1186\/1471-2105-8-50 (https:\/\/doi.org\/b7bhhc. PMID: 17291334 \u00b7 PMCID: PMC1808065).","journal-title":"bmc bioinformatics"},{"key":"311_CR38","doi-asserted-by":"publisher","DOI":"10.1093\/bioinformatics\/btl616","author":"k fundel","year":"2006","unstructured":"Fundel K, Kuffner R, Zimmer R. Relex\u2013relation extraction using dependency parse trees. Bioinformatics. 2006. https:\/\/doi.org\/10.1093\/bioinformatics\/btl616 (https:\/\/doi.org\/cz7q4d. PMID: 17142812).","journal-title":"bioinformatics"},{"key":"311_CR39","doi-asserted-by":"publisher","DOI":"10.1093\/database\/baw068","author":"j li","year":"2016","unstructured":"Li J, Sun Y, Johnson Rj, Sciaky D, Wei C-h, Leaman R, Davis Ap, Mattingly Cj, Wiegers Tc, Lu Z. BioCreative V CDR task corpus: a resource for chemical disease relation extraction. Database. 2016. https:\/\/doi.org\/10.1093\/database\/baw068 (https:\/\/doi.org\/gf5hfw. PMID: 27161011 \u00b7 PMCID: PMC4860626).","journal-title":"database"},{"key":"311_CR40","unstructured":"Krallinger M, Rabal O, Akhondi SA, et al. Overview of the biocreative vi chemical-protein interaction track. 
Proc Sixth Biocreative Chall Eval Workshop. 2017. https:\/\/www.semanticscholar.org\/paper\/overview-of-the-biocreative-vi-chemical-protein-krallinger-rabal\/eed781f498b563df5a9e8a241c67d63dd1d92ad5"},{"key":"311_CR41","doi-asserted-by":"publisher","DOI":"10.1186\/1471-2105-9-s3-s6","author":"s pyysalo","year":"2008","unstructured":"Pyysalo S, Airola A, Heimonen J, Bj\u00f6rne J, Ginter F, Salakoski T. Comparative analysis of five protein-protein interaction corpora. BMC Bioinformatics. 2008. https:\/\/doi.org\/10.1186\/1471-2105-9-s3-s6 (https:\/\/doi.org\/fh3df7. PMID: 18426551 \u00b7 PMCID: PMC2349296).","journal-title":"bmc bioinformatics"},{"key":"311_CR42","unstructured":"Jiang T, Liu J, Lin C-y, Sui Z. Revisiting distant supervision for relation extraction. Proc Eleventh Int Conf Lang Resour Eval (LREC 2018) 2018. https:\/\/aclanthology.org\/l18-1566"},{"key":"311_CR43","doi-asserted-by":"publisher","DOI":"10.1093\/bioinformatics\/btv476","author":"ek mallory","year":"2015","unstructured":"Mallory EK, Zhang C, R\u00e9 C, Altman RB. Large-scale extraction of gene interactions from full-text literature using deepdive. Bioinformatics. 2015. https:\/\/doi.org\/10.1093\/bioinformatics\/btv476 (https:\/\/doi.org\/gb5g7b. PMID: 26338771 \u00b7 PMCID: PMC4681986).","journal-title":"bioinformatics"},{"key":"311_CR44","doi-asserted-by":"publisher","DOI":"10.1007\/978-981-13-2354-6_39","author":"b bhasuran","year":"2018","unstructured":"Bhasuran B, Natarajan J. Distant supervision for large-scale extraction of gene-disease associations from literature using deepdive. Int Conf Innov Comput Commun. 2018. https:\/\/doi.org\/10.1007\/978-981-13-2354-6_39 (https:\/\/doi.org\/gf5hfv).","journal-title":"int conf innov comput commun"},{"key":"311_CR45","doi-asserted-by":"publisher","DOI":"10.1093\/bioinformatics\/btz490","author":"a junge","year":"2019","unstructured":"Junge A, Jensen LJ. 
CoCoScore: context-aware co-occurrence scoring for text mining applications using distant supervision. Bioinformatics. 2019. https:\/\/doi.org\/10.1093\/bioinformatics\/btz490 (https:\/\/doi.org\/gf4789. PMID: 31199464 \u00b7 PMCID: PMC6956794).","journal-title":"bioinformatics"},{"key":"311_CR46","unstructured":"Ratner A, De Sa C, Wu S, Selsam D, R\u00e9 C. Data programming: creating large training sets, quickly. arXiv. 2018. https:\/\/arxiv.org\/abs\/1605.07723"},{"key":"311_CR47","doi-asserted-by":"publisher","DOI":"10.1093\/nar\/gkw1133","author":"j macarthur","year":"2016","unstructured":"Macarthur J, Bowler E, Cerezo M, Gil L, Hall P, Hastings E, Junkins H, Mcmahon A, Milano A, Morales J, et al. The new nhgri-ebi catalog of published genome-wide association studies (gwas catalog). Nucleic Acids Res. 2016. https:\/\/doi.org\/10.1093\/nar\/gkw1133 (https:\/\/doi.org\/f9v7cp. PMID: 27899670 \u00b7 PMCID: PMC5210590).","journal-title":"nucleic acids res"},{"key":"311_CR48","doi-asserted-by":"publisher","DOI":"10.1016\/j.cell.2014.10.050","author":"t rolland","year":"2014","unstructured":"Rolland T, Ta\u015fan M, Charloteaux B, Pevzner SJ, Zhong Q, Sahni N, Yi S, Lemmens I, Fontanillo C, Mosca R, et al. A proteome-scale map of the human interactome network. Cell. 2014. https:\/\/doi.org\/10.1016\/j.cell.2014.10.050 (https:\/\/doi.org\/f3mn6x. PMID: 25416956 \u00b7 PMCID: PMC4266588).","journal-title":"cell"},{"key":"311_CR49","doi-asserted-by":"publisher","DOI":"10.1093\/nar\/gkx1037","author":"ds wishart","year":"2017","unstructured":"Wishart DS, Feunang YD, Guo AC, Lo EJ, Marcu A, Grant JR, Sajed T, Johnson D, Li C, Sayeeda Z, et al. DrugBank 5.0: a major update to the drugbank database for 2018. Nucleic Acids Res. 2017. https:\/\/doi.org\/10.1093\/nar\/gkx1037 (https:\/\/doi.org\/gcwtzk. 
PMID: 29126136 \u00b7 PMCID: PMC5753335).","journal-title":"nucleic acids res"},{"key":"311_CR50","doi-asserted-by":"publisher","DOI":"10.1093\/nar\/gkz389","author":"c-h wei","year":"2019","unstructured":"Wei C-H, Allot A, Leaman R, Lu Z. PubTator central: automated concept annotation for biomedical full text articles. Nucleic Acids Res. 2019. https:\/\/doi.org\/10.1093\/nar\/gkz389 (https:\/\/doi.org\/ggzfsc. PMID: 31114887 \u00b7 PMCID: PMC6602571).","journal-title":"nucleic acids res"},{"key":"311_CR51","doi-asserted-by":"publisher","DOI":"10.1093\/bioinformatics\/btw343","author":"r leaman","year":"2016","unstructured":"Leaman R, Lu Z. TaggerOne: joint named entity recognition and normalization with semi-markov models. Bioinformatics. 2016. https:\/\/doi.org\/10.1093\/bioinformatics\/btw343.","journal-title":"bioinformatics"},{"key":"311_CR52","doi-asserted-by":"publisher","DOI":"10.1093\/bioinformatics\/btx541","author":"c-h wei","year":"2017","unstructured":"Wei C-H, Phan L, Feltz J, Maiti R, Hefferon T, Lu Z. tmVar 2.0: integrating genomic variant information from literature with dbsnp and clinvar for precision medicine. 2017. Bioinformatics. https:\/\/doi.org\/10.1093\/bioinformatics\/btx541 (https:\/\/doi.org\/gbzsmc. PMID: 28968638 \u00b7 PMCID: PMC5860583).","journal-title":"bioinformatics"},{"key":"311_CR53","doi-asserted-by":"publisher","DOI":"10.1155\/2015\/918710","author":"c-h wei","year":"2015","unstructured":"Wei C-H, Kao H-Y, Lu Z. GNormPlus: an integrative approach for tagging genes, gene families, and protein domains. Biomed Res Int. 2015. https:\/\/doi.org\/10.1155\/2015\/918710 (https:\/\/doi.org\/gb85jb. PMID: 26380306 \u00b7 PMCID: PMC4561873).","journal-title":"biomed res int"},{"key":"311_CR54","doi-asserted-by":"publisher","DOI":"10.1371\/journal.pone.0038460","author":"c-h wei","year":"2012","unstructured":"Wei C-H, Kao H-Y, Lu Z. SR4GN: A species recognition software tool for gene normalization. PLOS One. 2012. 
https:\/\/doi.org\/10.1371\/journal.pone.0038460 (https:\/\/doi.org\/gpq498. PMID: 22679507 \u00b7 PMCID: PMC3367953).","journal-title":"plos one"},{"key":"311_CR55","volume-title":"spaCy 2: natural language understanding with bloom embeddings, convolutional neural networks and incremental parsing","author":"M Honnibal","year":"2017","unstructured":"Honnibal M, Montani I. spaCy 2: natural language understanding with bloom embeddings, convolutional neural networks and incremental parsing. 2017."},{"key":"311_CR56","doi-asserted-by":"publisher","DOI":"10.1007\/s00778-019-00552-1","author":"a ratner","year":"2019","unstructured":"Ratner A, Bach SH, Ehrenberg H, Fries J, Wu S, R\u00e9 C. Snorkel: rapid training data creation with weak supervision. VLDB J. 2019. https:\/\/doi.org\/10.1007\/s00778-019-00552-1 (https:\/\/doi.org\/ghbw5f. PMID: 32214778 \u00b7 PMCID: PMC7075849).","journal-title":"vldb j"},{"key":"311_CR57","unstructured":"Devlin J, Chang M-w, Lee K, Toutanova K. BERT: pre-training of deep bidirectional transformers for language understanding. arXiv. 2019. https:\/\/arxiv.org\/abs\/1810.04805"},{"key":"311_CR58","doi-asserted-by":"publisher","DOI":"10.1073\/pnas.98.2.381","author":"rj roberts","year":"2001","unstructured":"Roberts RJ. PubMed central: the genbank of the published literature. Proc National Acad Sci. 2001. https:\/\/doi.org\/10.1073\/pnas.98.2.381 (https:\/\/doi.org\/bbn9k8. PMID: 11209037 \u00b7 PMCID: PMC33354).","journal-title":"proc national acad sci"},{"key":"311_CR59","doi-asserted-by":"crossref","unstructured":"Wolf T, Debut L, Sanh V, Chaumond J, Delangue C, Moi A, Cistac P, Ma C, Jernite Y, Plu J, et al. Transformers: state-of-the-art natural language processing. Assoc Comput Linguist. 2020. https:\/\/www.aclweb.org\/anthology\/2020.emnlp-demos.6","DOI":"10.18653\/v1\/2020.emnlp-demos.6"},{"key":"311_CR60","unstructured":"Kingma DP, Ba J. Adam: a method for stochastic optimization. arXiv. 2017. 
https:\/\/arxiv.org\/abs\/1412.6980"},{"key":"311_CR61","doi-asserted-by":"publisher","DOI":"10.1145\/3209889.3209898","author":"a ratner","year":"2018","unstructured":"Ratner A, Hancock B, Dunnmon J, Goldman R, R\u00e9 C. Snorkel MeTal. Proc Second Workshop Data Manag End End Mach Learn. 2018. https:\/\/doi.org\/10.1145\/3209889.3209898 (https:\/\/doi.org\/gf3xk7. PMID: 30931438 \u00b7 PMCID: PMC6436830).","journal-title":"proc second workshop data manag end end mach learn"},{"key":"311_CR62","doi-asserted-by":"publisher","DOI":"10.1186\/s40537-016-0043-6","author":"k weiss","year":"2016","unstructured":"Weiss K, Khoshgoftaar TM, Wang Dd. a survey of transfer learning. J Big Data. 2016. https:\/\/doi.org\/10.1186\/s40537-016-0043-6 (https:\/\/doi.org\/gfkr2w).","journal-title":"j big data"}],"container-title":["BioData Mining"],"original-title":[],"language":"en","link":[{"URL":"https:\/\/link.springer.com\/content\/pdf\/10.1186\/s13040-022-00311-z.pdf","content-type":"application\/pdf","content-version":"vor","intended-application":"text-mining"},{"URL":"https:\/\/link.springer.com\/article\/10.1186\/s13040-022-00311-z\/fulltext.html","content-type":"text\/html","content-version":"vor","intended-application":"text-mining"},{"URL":"https:\/\/link.springer.com\/content\/pdf\/10.1186\/s13040-022-00311-z.pdf","content-type":"application\/pdf","content-version":"vor","intended-application":"similarity-checking"}],"deposited":{"date-parts":[[2022,10,18]],"date-time":"2022-10-18T07:04:25Z","timestamp":1666076665000},"score":1,"resource":{"primary":{"URL":"https:\/\/biodatamining.biomedcentral.com\/articles\/10.1186\/s13040-022-00311-z"}},"subtitle":[],"short-title":[],"issued":{"date-parts":[[2022,10,18]]},"references-count":62,"journal-issue":{"issue":"1","published-online":{"date-parts":[[2022,12]]}},"alternative-id":["311"],"URL":"https:\/\/doi.org\/10.1186\/s13040-022-00311-z","relation":{"has-preprint":[{"id-type":"doi","id":"10.1101\/730085","asserted-by":"object"}
]},"ISSN":["1756-0381"],"issn-type":[{"value":"1756-0381","type":"electronic"}],"subject":[],"published":{"date-parts":[[2022,10,18]]},"assertion":[{"value":"1 May 2022","order":1,"name":"received","label":"Received","group":{"name":"ArticleHistory","label":"Article History"}},{"value":"17 September 2022","order":2,"name":"accepted","label":"Accepted","group":{"name":"ArticleHistory","label":"Article History"}},{"value":"18 October 2022","order":3,"name":"first_online","label":"First Online","group":{"name":"ArticleHistory","label":"Article History"}},{"order":1,"name":"Ethics","group":{"name":"EthicsHeading","label":"Declarations"}},{"value":"DNN receives a salary from Digital Science and Research Solutions Ltd. where he contributes NLP expertise for bibliometric analyses on research papers and grants. The research performed occurred before DNN's employment with Digital Science. Digital Science did not restrict the results or interpretations that could be published in this manuscript. The opinions expressed here do not reflect the official policy or positions of Digital Science and Research Solutions Ltd.","order":2,"name":"Ethics","group":{"name":"EthicsHeading","label":"Competing interests"}}],"article-number":"26"}}