{"status":"ok","message-type":"work","message-version":"1.0.0","message":{"indexed":{"date-parts":[[2026,6,10]],"date-time":"2026-06-10T20:57:03Z","timestamp":1781125023803,"version":"3.54.1"},"reference-count":37,"publisher":"Springer Science and Business Media LLC","issue":"1","license":[{"start":{"date-parts":[[2023,11,1]],"date-time":"2023-11-01T00:00:00Z","timestamp":1698796800000},"content-version":"tdm","delay-in-days":0,"URL":"https:\/\/creativecommons.org\/licenses\/by\/4.0"},{"start":{"date-parts":[[2023,11,1]],"date-time":"2023-11-01T00:00:00Z","timestamp":1698796800000},"content-version":"vor","delay-in-days":0,"URL":"https:\/\/creativecommons.org\/licenses\/by\/4.0"}],"funder":[{"DOI":"10.13039\/100000092","name":"U.S. National Library of Medicine","doi-asserted-by":"publisher","award":["T15 LM007359"],"award-info":[{"award-number":["T15 LM007359"]}],"id":[{"id":"10.13039\/100000092","id-type":"DOI","asserted-by":"publisher"}]},{"name":"National Science Foundation","award":["ICER 1343760"],"award-info":[{"award-number":["ICER 1343760"]}]},{"DOI":"10.13039\/100000002","name":"National Institutes of Health","doi-asserted-by":"publisher","award":["K01AR07212"],"award-info":[{"award-number":["K01AR07212"]}],"id":[{"id":"10.13039\/100000002","id-type":"DOI","asserted-by":"publisher"}]},{"DOI":"10.13039\/100000002","name":"National Institutes of Health","doi-asserted-by":"publisher","award":["NIH GM102756"],"award-info":[{"award-number":["NIH GM102756"]}],"id":[{"id":"10.13039\/100000002","id-type":"DOI","asserted-by":"publisher"}]}],"content-domain":{"domain":["link.springer.com"],"crossmark-restriction":false},"short-container-title":["BMC Bioinformatics"],"abstract":"<jats:title>Abstract<\/jats:title>\n                  <jats:sec>\n                    <jats:title>Background<\/jats:title>\n                    <jats:p>The PubMed archive contains more than 34 million articles; consequently, it is becoming increasingly difficult for a biomedical researcher to keep up-to-date with different knowledge domains. Computationally efficient and interpretable tools are needed to help researchers find and understand associations between biomedical concepts. The goal of literature-based discovery (LBD) is to connect concepts in isolated literature domains that would normally go undiscovered. This usually takes the form of an A\u2013B\u2013C relationship, where A and C terms are linked through a B term intermediate. Here we describe Serial KinderMiner (SKiM), an LBD algorithm for finding statistically significant links between an A term and one or more C terms through some B term intermediate(s). The development of SKiM is motivated by the observation that there are only a few LBD tools that provide a functional web interface, and that the available tools are limited in one or more of the following ways: (1) they identify a relationship but not the type of relationship, (2) they do not allow the user to provide their own lists of B or C terms, hindering flexibility, (3) they do not allow for querying thousands of C terms (which is crucial if, for instance, the user wants to query connections between a disease and the thousands of available drugs), or (4) they are specific for a particular biomedical domain (such as cancer). We provide an open-source tool and web interface that improves on all of these issues.<\/jats:p>\n                  <\/jats:sec>\n                  <jats:sec>\n                    <jats:title>Results<\/jats:title>\n                    <jats:p>\n                      We demonstrate SKiM\u2019s ability to discover useful A\u2013B\u2013C linkages in three control experiments: classic LBD discoveries, drug repurposing, and finding associations related to cancer. Furthermore, we supplement SKiM with a knowledge graph built with transformer machine-learning models to aid in interpreting the relationships between terms found by SKiM. Finally, we provide a simple and intuitive open-source web interface (\n                      <jats:ext-link xmlns:xlink=\"http:\/\/www.w3.org\/1999\/xlink\" ext-link-type=\"uri\" xlink:href=\"https:\/\/skim.morgridge.org\">https:\/\/skim.morgridge.org<\/jats:ext-link>\n                      ) with comprehensive lists of drugs, diseases, phenotypes, and symptoms so that anyone can easily perform SKiM searches.\n                    <\/jats:p>\n                  <\/jats:sec>\n                  <jats:sec>\n                    <jats:title>Conclusions<\/jats:title>\n                    <jats:p>SKiM is a simple algorithm that can perform LBD searches to discover relationships between arbitrary user-defined concepts. SKiM is generalized for any domain, can perform searches with many thousands of C term concepts, and moves beyond the simple identification of an existence of a relationship; many relationships are given relationship type labels from our knowledge graph.<\/jats:p>\n                  <\/jats:sec>","DOI":"10.1186\/s12859-023-05539-y","type":"journal-article","created":{"date-parts":[[2023,11,1]],"date-time":"2023-11-01T04:02:41Z","timestamp":1698811361000},"update-policy":"https:\/\/doi.org\/10.1007\/springer_crossmark_policy","source":"Crossref","is-referenced-by-count":8,"title":["Serial KinderMiner (SKiM) discovers and annotates biomedical knowledge using co-occurrence and transformer models"],"prefix":"10.1186","volume":"24","author":[{"given":"Robert J.","family":"Millikin","sequence":"first","affiliation":[],"role":[{"vocabulary":"crossref","role":"author"}]},{"given":"Kalpana","family":"Raja","sequence":"additional","affiliation":[],"role":[{"vocabulary":"crossref","role":"author"}]},{"given":"John","family":"Steill","sequence":"additional","affiliation":[],"role":[{"vocabulary":"crossref","role":"author"}]},{"given":"Cannon","family":"Lock","sequence":"additional","affiliation":[],"role":[{"vocabulary":"crossref","role":"author"}]},{"given":"Xuancheng","family":"Tu","sequence":"additional","affiliation":[],"role":[{"vocabulary":"crossref","role":"author"}]},{"given":"Ian","family":"Ross","sequence":"additional","affiliation":[],"role":[{"vocabulary":"crossref","role":"author"}]},{"given":"Lam C.","family":"Tsoi","sequence":"additional","affiliation":[],"role":[{"vocabulary":"crossref","role":"author"}]},{"given":"Finn","family":"Kuusisto","sequence":"additional","affiliation":[],"role":[{"vocabulary":"crossref","role":"author"}]},{"given":"Zijian","family":"Ni","sequence":"additional","affiliation":[],"role":[{"vocabulary":"crossref","role":"author"}]},{"given":"Miron","family":"Livny","sequence":"additional","affiliation":[],"role":[{"vocabulary":"crossref","role":"author"}]},{"given":"Brian","family":"Bockelman","sequence":"additional","affiliation":[],"role":[{"vocabulary":"crossref","role":"author"}]},{"given":"James","family":"Thomson","sequence":"additional","affiliation":[],"role":[{"vocabulary":"crossref","role":"author"}]},{"given":"Ron","family":"Stewart","sequence":"additional","affiliation":[],"role":[{"vocabulary":"crossref","role":"author"}]}],"member":"297","published-online":{"date-parts":[[2023,11,1]]},"reference":[{"key":"5539_CR1","doi-asserted-by":"publisher","first-page":"103141","DOI":"10.1016\/j.jbi.2019.103141","volume":"93","author":"V Gopalakrishnan","year":"2019","unstructured":"Gopalakrishnan V, Jha K, Jin W, Zhang A. A survey on literature based discovery approaches in biomedical domain. J Biomed Inform. 2019;93:103141.","journal-title":"J Biomed Inform"},{"issue":"6","key":"5539_CR2","doi-asserted-by":"publisher","first-page":"1","DOI":"10.1145\/3365756","volume":"52","author":"M Thilakaratne","year":"2019","unstructured":"Thilakaratne M, Falkner K, Atapattu T. A systematic review on literature-based discovery. ACM Comput Surv. 2019;52(6):1\u201334.","journal-title":"ACM Comput Surv"},{"issue":"4","key":"5539_CR3","first-page":"43","volume":"2","author":"NR Smalheiser","year":"2017","unstructured":"Smalheiser NR. Rediscovering Don Swanson: the past, present and future of literature-based discovery. J Data Inf Sci. 2017;2(4):43\u201364.","journal-title":"J Data Inf Sci"},{"issue":"1","key":"5539_CR4","doi-asserted-by":"publisher","first-page":"7","DOI":"10.1353\/pbm.1986.0087","volume":"30","author":"DR Swanson","year":"1986","unstructured":"Swanson DR. Fish oil, Raynaud\u2019s syndrome, and undiscovered public knowledge. Perspect Biol Med. 1986;30(1):7\u201318.","journal-title":"Perspect Biol Med"},{"key":"5539_CR5","doi-asserted-by":"publisher","first-page":"827207","DOI":"10.3389\/fbinf.2022.827207","volume":"2","author":"A Lardos","year":"2022","unstructured":"Lardos A, Aghaebrahimian A, Koroleva A, Sidorova J, Wolfram E, Anisimova M, Gil M. Computational literature-based discovery for natural products research: current state and future prospects. Front Bioinform. 2022;2:827207.","journal-title":"Front Bioinform"},{"key":"5539_CR6","doi-asserted-by":"crossref","unstructured":"Zhao S, Su C, Lu Z, Wang F. Recent advances in biomedical literature mining. Brief Bioinform. 2021;22(3).","DOI":"10.1093\/bib\/bbaa057"},{"key":"5539_CR7","doi-asserted-by":"publisher","first-page":"832","DOI":"10.12688\/f1000research.25523.1","volume":"9","author":"F Kuusisto","year":"2020","unstructured":"Kuusisto F, Ng D, Steill J, Ross I, Livny M, Thomson J, Page D, Stewart R. KinderMiner Web: a simple web tool for ranking pairwise associations in biomedical applications. F1000Research. 2020;9:832.","journal-title":"F1000Research"},{"key":"5539_CR8","first-page":"166","volume":"2017","author":"F Kuusisto","year":"2017","unstructured":"Kuusisto F, Steill J, Kuang Z, Thomson J, Page D, Stewart R. A simple text mining approach for ranking pairwise associations in biomedical applications. AMIA Summits Transl Sci Proc. 2017;2017:166.","journal-title":"AMIA Summits Transl Sci Proc"},{"issue":"7","key":"5539_CR9","doi-asserted-by":"publisher","first-page":"548","DOI":"10.1002\/asi.1104","volume":"52","author":"M Weeber","year":"2001","unstructured":"Weeber M, Klein H, de Jong-van den Berg LTW, Vos R. Using concepts in literature-based discovery: simulating Swanson\u2019s Raynaud-fish oil and migraine-magnesium discoveries. J Am Soc Inf Sci Technol. 2001;52(7):548\u201357.","journal-title":"J Am Soc Inf Sci Technol"},{"issue":"1","key":"5539_CR10","doi-asserted-by":"publisher","first-page":"188","DOI":"10.1186\/s12859-020-3517-7","volume":"21","author":"H Kilicoglu","year":"2020","unstructured":"Kilicoglu H, Rosemblat G, Fiszman M, Shin D. Broad-coverage biomedical relation extraction with SemRep. BMC Bioinform. 2020;21(1):188.","journal-title":"BMC Bioinform"},{"key":"5539_CR11","doi-asserted-by":"publisher","first-page":"1414","DOI":"10.1016\/j.csbj.2020.05.017","volume":"18","author":"DN Nicholson","year":"2020","unstructured":"Nicholson DN, Greene CS. Constructing knowledge graphs and their biomedical applications. Comput Struct Biotechnol J. 2020;18:1414\u201328.","journal-title":"Comput Struct Biotechnol J"},{"issue":"1","key":"5539_CR12","doi-asserted-by":"publisher","first-page":"26","DOI":"10.1186\/s13040-022-00311-z","volume":"15","author":"DN Nicholson","year":"2022","unstructured":"Nicholson DN, Himmelstein DS, Greene CS. Expanding a database-derived biomedical knowledge graph via multi-relation extraction from biomedical abstracts. BioData Min. 2022;15(1):26.","journal-title":"BioData Min"},{"key":"5539_CR13","unstructured":"Nadkarni R, Wadden D, Beltagy I, Smith N, Hajishirzi H, Hope T. Scientific language models for biomedical knowledge base completion: an empirical study. arXiv preprint. 2020(2106.09700)"},{"issue":"1","key":"5539_CR14","doi-asserted-by":"publisher","first-page":"1","DOI":"10.1145\/3458754","volume":"3","author":"Y Gu","year":"2021","unstructured":"Gu Y, Tinn R, Cheng H, Lucas M, Usuyama N, Liu X, Naumann T, Gao J, Poon H. Domain-specific language model pretraining for biomedical natural language processing. ACM Trans Comput Healthc. 2021;3(1):1\u201323.","journal-title":"ACM Trans Comput Healthc"},{"issue":"4","key":"5539_CR15","doi-asserted-by":"publisher","first-page":"1234","DOI":"10.1093\/bioinformatics\/btz682","volume":"36","author":"J Lee","year":"2020","unstructured":"Lee J, Yoon W, Kim S, Kim D, Kim S, So CH, Kang J. BioBERT: a pre-trained biomedical language representation model for biomedical text mining. Bioinformatics. 2020;36(4):1234\u201340.","journal-title":"Bioinformatics"},{"issue":"9","key":"5539_CR16","doi-asserted-by":"publisher","first-page":"1553","DOI":"10.1093\/bioinformatics\/bty845","volume":"35","author":"S Pyysalo","year":"2019","unstructured":"Pyysalo S, Baker S, Ali I, Haselwimmer S, Shah T, Young A, Guo Y, Hogberg J, Stenius U, Narita M, Korhonen A. LION LBD: a literature-based discovery system for cancer biology. Bioinformatics. 2019;35(9):1553\u201361.","journal-title":"Bioinformatics"},{"issue":"2\u20134","key":"5539_CR17","doi-asserted-by":"publisher","first-page":"289","DOI":"10.1016\/j.ijmedinf.2004.04.024","volume":"74","author":"D Hristovski","year":"2005","unstructured":"Hristovski D, Peterlin B, Mitchell JA, Humphrey SM. Using literature-based discovery to identify disease candidate genes. Int J Med Inform. 2005;74(2\u20134):289\u201398.","journal-title":"Int J Med Inform"},{"issue":"2","key":"5539_CR18","doi-asserted-by":"publisher","first-page":"183","DOI":"10.1016\/S0004-3702(97)00008-8","volume":"91","author":"D Swanson","year":"1997","unstructured":"Swanson D, Smalheiser N. An interactive system for finding complementary literatures: a stimulus to scientific discovery. Artif Intell. 1997;91(2):183\u2013203.","journal-title":"Artif Intell"},{"issue":"2","key":"5539_CR19","doi-asserted-by":"publisher","first-page":"190","DOI":"10.1016\/j.cmpb.2008.12.006","volume":"94","author":"NR Smalheiser","year":"2009","unstructured":"Smalheiser NR, Torvik VI, Zhou W. Arrowsmith two-node search interface: a tutorial on finding meaningful links between two disparate sets of articles in MEDLINE. Comput Methods Programs Biomed. 2009;94(2):190\u20137.","journal-title":"Comput Methods Programs Biomed"},{"key":"5539_CR20","unstructured":"Vaswani A, Shazeer N, Parmar N, Uszkoreit J, Jones L, Gomez A, Kaiser \u0141, Polosukhin I. Attention is all you need. Advances in neural information processing systems. 2017:30."},{"key":"5539_CR21","unstructured":"Honnibal M, Montani I, Van Landeghem S, Boyd A. spaCy: industrial-strength natural language processing in python. 2020."},{"key":"5539_CR22","unstructured":"Montani I, Honnibal M. Prodigy: a modern and scriptable annotation tool for creating training data for machine learning models."},{"key":"5539_CR23","doi-asserted-by":"publisher","unstructured":"The Center for High Throughput Computing [Available from: https:\/\/doi.org\/10.21231\/GNT1-HW21].","DOI":"10.21231\/GNT1-HW21"},{"key":"5539_CR24","doi-asserted-by":"crossref","unstructured":"Swanson DR. Migraine and magnesium: eleven neglected connections. 1988.","DOI":"10.1353\/pbm.1988.0009"},{"key":"5539_CR25","doi-asserted-by":"crossref","unstructured":"Smalheiser NR, Swanson DR. Indomethacin and Alzheimer's disease. 1996.","DOI":"10.1212\/WNL.46.2.583"},{"key":"5539_CR26","doi-asserted-by":"crossref","unstructured":"Smalheiser NR, Swanson DR. Linking estrogen to Alzheimer's disease: an informatics approach. 1996.","DOI":"10.1212\/WNL.47.3.809"},{"issue":"2","key":"5539_CR27","doi-asserted-by":"publisher","first-page":"157","DOI":"10.1353\/pbm.1990.0031","volume":"33","author":"DR Swanson","year":"1990","unstructured":"Swanson DR. Somatomedin C and arginine: implicit connections between mutually isolated literatures. Perspect Biol Med. 1990;33(2):157\u201386.","journal-title":"Perspect Biol Med"},{"issue":"5","key":"5539_CR28","doi-asserted-by":"publisher","first-page":"bbac282","DOI":"10.1093\/bib\/bbac282","volume":"23","author":"L Luo","year":"2022","unstructured":"Luo L, Lai PT, Wei CH, Arighi CN, Lu Z. BioRED: a rich biomedical relation extraction dataset. Brief Bioinform. 2022;23(5):bbac282.","journal-title":"Brief Bioinform"},{"issue":"4","key":"5539_CR29","doi-asserted-by":"publisher","first-page":"346","DOI":"10.1111\/j.1600-0404.1988.tb03667.x","volume":"78","author":"KV Sorensen","year":"1988","unstructured":"Sorensen KV. Valproate: a new drug in migraine prophylaxis. Acta Neurol Scand. 1988;78(4):346\u20138.","journal-title":"Acta Neurol Scand"},{"issue":"Suppl 2","key":"5539_CR30","doi-asserted-by":"publisher","first-page":"147","DOI":"10.1186\/s12911-022-01848-z","volume":"22","author":"J Peng","year":"2022","unstructured":"Peng J, Xu D, Lee R, Xu S, Zhou Y, Wang K. Expediting knowledge acquisition by a web framework for Knowledge Graph Exploration and Visualization (KGEV): case studies on COVID-19 and Human Phenotype Ontology. BMC Med Inform Decis Mak. 2022;22(Suppl 2):147.","journal-title":"BMC Med Inform Decis Mak"},{"issue":"23","key":"5539_CR31","doi-asserted-by":"publisher","first-page":"3158","DOI":"10.1093\/bioinformatics\/bts591","volume":"28","author":"H Kilicoglu","year":"2012","unstructured":"Kilicoglu H, Shin D, Fiszman M, Rosemblat G, Rindflesch TC. SemMedDB: a PubMed-scale repository of biomedical semantic predications. Bioinformatics. 2012;28(23):3158\u201360.","journal-title":"Bioinformatics"},{"issue":"6917","key":"5539_CR32","doi-asserted-by":"publisher","first-page":"860","DOI":"10.1038\/nature01322","volume":"420","author":"LM Coussens","year":"2002","unstructured":"Coussens LM, Werb Z. Inflammation and cancer. Nature. 2002;420(6917):860\u20137.","journal-title":"Nature"},{"key":"5539_CR33","doi-asserted-by":"crossref","unstructured":"Guarnieri T. Aryl hydrocarbon receptor connects inflammation to breast cancer. Int J Mol Sci. 2020;21(15).","DOI":"10.3390\/ijms21155264"},{"key":"5539_CR34","doi-asserted-by":"publisher","first-page":"636595","DOI":"10.3389\/fcell.2021.636595","volume":"9","author":"X Li","year":"2021","unstructured":"Li X, Wang F, Xu X, Zhang J, Xu G. The dual role of STAT1 in ovarian cancer: insight into molecular mechanisms and application potentials. Front Cell Dev Biol. 2021;9:636595.","journal-title":"Front Cell Dev Biol"},{"key":"5539_CR35","doi-asserted-by":"publisher","first-page":"80","DOI":"10.1016\/j.mce.2017.02.023","volume":"451","author":"R Lu","year":"2017","unstructured":"Lu R, Zhang YG, Sun J. STAT3 activation in infection and infection-associated cancer. Mol Cell Endocrinol. 2017;451:80\u20137.","journal-title":"Mol Cell Endocrinol"},{"issue":"4","key":"5539_CR36","doi-asserted-by":"publisher","first-page":"35","DOI":"10.3390\/cancers9040035","volume":"9","author":"BY Owusu","year":"2017","unstructured":"Owusu BY, Galemmo R, Janetka J, Klampfer L. Hepatocyte growth factor, a key tumor-promoting factor in the tumor microenvironment. Cancers. 2017;9(4):35.","journal-title":"Cancers"},{"issue":"1","key":"5539_CR37","doi-asserted-by":"publisher","first-page":"263","DOI":"10.1038\/s41392-021-00658-5","volume":"6","author":"H Zhao","year":"2021","unstructured":"Zhao H, Wu L, Yan G, Chen Y, Zhou M, Wu Y, Li Y. Inflammation and tumor progression: signaling pathways and targeted intervention. Signal Transduct Target Ther. 2021;6(1):263.","journal-title":"Signal Transduct Target Ther"}],"container-title":["BMC Bioinformatics"],"original-title":[],"language":"en","link":[{"URL":"https:\/\/link.springer.com\/content\/pdf\/10.1186\/s12859-023-05539-y.pdf","content-type":"application\/pdf","content-version":"vor","intended-application":"text-mining"},{"URL":"https:\/\/link.springer.com\/article\/10.1186\/s12859-023-05539-y\/fulltext.html","content-type":"text\/html","content-version":"vor","intended-application":"text-mining"},{"URL":"https:\/\/link.springer.com\/content\/pdf\/10.1186\/s12859-023-05539-y.pdf","content-type":"application\/pdf","content-version":"vor","intended-application":"similarity-checking"}],"deposited":{"date-parts":[[2023,11,1]],"date-time":"2023-11-01T04:02:55Z","timestamp":1698811375000},"score":1,"resource":{"primary":{"URL":"https:\/\/bmcbioinformatics.biomedcentral.com\/articles\/10.1186\/s12859-023-05539-y"}},"subtitle":[],"short-title":[],"issued":{"date-parts":[[2023,11,1]]},"references-count":37,"journal-issue":{"issue":"1","published-online":{"date-parts":[[2023,12]]}},"alternative-id":["5539"],"URL":"https:\/\/doi.org\/10.1186\/s12859-023-05539-y","relation":{"has-preprint":[{"id-type":"doi","id":"10.1101\/2023.05.30.542911","asserted-by":"object"}]},"ISSN":["1471-2105"],"issn-type":[{"value":"1471-2105","type":"electronic"}],"subject":[],"published":{"date-parts":[[2023,11,1]]},"assertion":[{"value":"24 May 2023","order":1,"name":"received","label":"Received","group":{"name":"ArticleHistory","label":"Article History"}},{"value":"19 October 2023","order":2,"name":"accepted","label":"Accepted","group":{"name":"ArticleHistory","label":"Article History"}},{"value":"1 November 2023","order":3,"name":"first_online","label":"First Online","group":{"name":"ArticleHistory","label":"Article History"}},{"order":1,"name":"Ethics","group":{"name":"EthicsHeading","label":"Declarations"}},{"value":"Not applicable.","order":2,"name":"Ethics","group":{"name":"EthicsHeading","label":"Ethics approval and consent to participate"}},{"value":"Not applicable.","order":3,"name":"Ethics","group":{"name":"EthicsHeading","label":"Consent for publication"}},{"value":"The authors declare that they have no competing interests.","order":4,"name":"Ethics","group":{"name":"EthicsHeading","label":"Competing interests"}}],"article-number":"412"}}