{"status":"ok","message-type":"work","message-version":"1.0.0","message":{"indexed":{"date-parts":[[2025,12,11]],"date-time":"2025-12-11T07:39:52Z","timestamp":1765438792276,"version":"3.37.3"},"reference-count":34,"publisher":"Oxford University Press (OUP)","issue":"12","license":[{"start":{"date-parts":[[2022,4,20]],"date-time":"2022-04-20T00:00:00Z","timestamp":1650412800000},"content-version":"vor","delay-in-days":0,"URL":"https:\/\/academic.oup.com\/journals\/pages\/open_access\/funder_policies\/chorus\/standard_publication_model"}],"funder":[{"DOI":"10.13039\/100000050","name":"National Heart, Lung, and Blood Institute","doi-asserted-by":"crossref","award":["1-OT3-HL142479-01"],"award-info":[{"award-number":["1-OT3-HL142479-01"]}],"id":[{"id":"10.13039\/100000050","id-type":"DOI","asserted-by":"crossref"}]},{"DOI":"10.13039\/100006108","name":"National Center for Advancing Translational Sciences","doi-asserted-by":"crossref","award":["1-OT3-TR002020-01"],"award-info":[{"award-number":["1-OT3-TR002020-01"]}],"id":[{"id":"10.13039\/100006108","id-type":"DOI","asserted-by":"crossref"}]},{"name":"Helping to End Addiction Long-Term (HEAL) Office","award":["1-OT2-OD031940-01"],"award-info":[{"award-number":["1-OT2-OD031940-01"]}]}],"content-domain":{"domain":[],"crossmark-restriction":false},"short-container-title":[],"published-print":{"date-parts":[[2022,6,13]]},"abstract":"<jats:title>Abstract<\/jats:title><jats:sec><jats:title>Motivation<\/jats:title><jats:p>As the number of public data resources continues to proliferate, identifying relevant datasets across heterogenous repositories is becoming critical to answering scientific questions. To help researchers navigate this data landscape, we developed Dug: a semantic search tool for biomedical datasets utilizing evidence-based relationships from curated knowledge graphs to find relevant datasets and explain why those results are returned.<\/jats:p><\/jats:sec><jats:sec><jats:title>Results<\/jats:title><jats:p>Developed through the National Heart, Lung and Blood Institute\u2019s (NHLBI) BioData Catalyst ecosystem, Dug has indexed more than 15 911 study variables from public datasets. On a manually curated search dataset, Dug\u2019s total recall (total relevant results\/total results) of 0.79 outperformed default Elasticsearch\u2019s total recall of 0.76. When using synonyms or related concepts as search queries, Dug (0.36) far outperformed Elasticsearch (0.14) in terms of total recall with no significant loss in the precision of its top results.<\/jats:p><\/jats:sec><jats:sec><jats:title>Availability and implementation<\/jats:title><jats:p>Dug is freely available at https:\/\/github.com\/helxplatform\/dug. An example Dug deployment is also available for use at https:\/\/search.biodatacatalyst.renci.org\/.<\/jats:p><\/jats:sec><jats:sec><jats:title>Supplementary information<\/jats:title><jats:p>Supplementary data are available at Bioinformatics online.<\/jats:p><\/jats:sec>","DOI":"10.1093\/bioinformatics\/btac284","type":"journal-article","created":{"date-parts":[[2022,4,15]],"date-time":"2022-04-15T19:20:39Z","timestamp":1650050439000},"page":"3252-3258","source":"Crossref","is-referenced-by-count":3,"title":["Dug: a semantic search engine leveraging peer-reviewed knowledge to query biomedical data repositories"],"prefix":"10.1093","volume":"38","author":[{"ORCID":"https:\/\/orcid.org\/0000-0002-4502-7894","authenticated-orcid":false,"given":"Alexander M","family":"Waldrop","sequence":"first","affiliation":[{"name":"Center for Genomics, Bioinformatics, and Translational Research, RTI International , Research Triangle Park, NC 27709-2194, USA"}],"role":[{"role":"author","vocabulary":"crossref"}]},{"given":"John B","family":"Cheadle","sequence":"additional","affiliation":[{"name":"Research Computing Division, RTI International , Research Triangle Park, NC 27709-2194, USA"}],"role":[{"role":"author","vocabulary":"crossref"}]},{"given":"Kira","family":"Bradford","sequence":"additional","affiliation":[{"name":"Renaissance Computing Institute, University of Chapel Hill, North Carolina , Chapel Hill, NC 27599-7568, USA"}],"role":[{"role":"author","vocabulary":"crossref"}]},{"given":"Alexander","family":"Preiss","sequence":"additional","affiliation":[{"name":"Center for Data Science, RTI International , Research Triangle Park, NC 27709-2194, USA"}],"role":[{"role":"author","vocabulary":"crossref"}]},{"given":"Robert","family":"Chew","sequence":"additional","affiliation":[{"name":"Center for Data Science, RTI International , Research Triangle Park, NC 27709-2194, USA"}],"role":[{"role":"author","vocabulary":"crossref"}]},{"given":"Jonathan R","family":"Holt","sequence":"additional","affiliation":[{"name":"Center for Data Science, RTI International , Research Triangle Park, NC 27709-2194, USA"}],"role":[{"role":"author","vocabulary":"crossref"}]},{"given":"Yaphet","family":"Kebede","sequence":"additional","affiliation":[{"name":"Renaissance Computing Institute, University of Chapel Hill, North Carolina , Chapel Hill, NC 27599-7568, USA"}],"role":[{"role":"author","vocabulary":"crossref"}]},{"given":"Nathan","family":"Braswell","sequence":"additional","affiliation":[{"name":"Research Computing Division, RTI International , Research Triangle Park, NC 27709-2194, USA"}],"role":[{"role":"author","vocabulary":"crossref"}]},{"given":"Matt","family":"Watson","sequence":"additional","affiliation":[{"name":"Renaissance Computing Institute, University of Chapel Hill, North Carolina , Chapel Hill, NC 27599-7568, USA"}],"role":[{"role":"author","vocabulary":"crossref"}]},{"given":"Virginia","family":"Hench","sequence":"additional","affiliation":[{"name":"Center for Genomics, Bioinformatics, and Translational Research, RTI International , Research Triangle Park, NC 27709-2194, USA"}],"role":[{"role":"author","vocabulary":"crossref"}]},{"given":"Andrew","family":"Crerar","sequence":"additional","affiliation":[{"name":"Center for Genomics, Bioinformatics, and Translational Research, RTI International , Research Triangle Park, NC 27709-2194, USA"}],"role":[{"role":"author","vocabulary":"crossref"}]},{"given":"Chris M","family":"Ball","sequence":"additional","affiliation":[{"name":"Research Computing Division, RTI International , Research Triangle Park, NC 27709-2194, USA"}],"role":[{"role":"author","vocabulary":"crossref"}]},{"given":"Carl","family":"Schreep","sequence":"additional","affiliation":[{"name":"Renaissance Computing Institute, University of Chapel Hill, North Carolina , Chapel Hill, NC 27599-7568, USA"}],"role":[{"role":"author","vocabulary":"crossref"}]},{"given":"P J","family":"Linebaugh","sequence":"additional","affiliation":[{"name":"Renaissance Computing Institute, University of Chapel Hill, North Carolina , Chapel Hill, NC 27599-7568, USA"}],"role":[{"role":"author","vocabulary":"crossref"}]},{"given":"Hannah","family":"Hiles","sequence":"additional","affiliation":[{"name":"Renaissance Computing Institute, University of Chapel Hill, North Carolina , Chapel Hill, NC 27599-7568, USA"}],"role":[{"role":"author","vocabulary":"crossref"}]},{"given":"Rebecca","family":"Boyles","sequence":"additional","affiliation":[{"name":"Research Computing Division, RTI International , Research Triangle Park, NC 27709-2194, USA"}],"role":[{"role":"author","vocabulary":"crossref"}]},{"ORCID":"https:\/\/orcid.org\/0000-0002-9491-7674","authenticated-orcid":false,"given":"Chris","family":"Bizon","sequence":"additional","affiliation":[{"name":"Renaissance Computing Institute, University of Chapel Hill, North Carolina , Chapel Hill, NC 27599-7568, USA"}],"role":[{"role":"author","vocabulary":"crossref"}]},{"given":"Ashok","family":"Krishnamurthy","sequence":"additional","affiliation":[{"name":"Renaissance Computing Institute, University of Chapel Hill, North Carolina , Chapel Hill, NC 27599-7568, USA"},{"name":"Department of Computer Science, University of North Carolina at Chapel Hill , Chapel Hill, NC 27599-7548, USA"}],"role":[{"role":"author","vocabulary":"crossref"}]},{"given":"Steve","family":"Cox","sequence":"additional","affiliation":[{"name":"Renaissance Computing Institute, University of Chapel Hill, North Carolina , Chapel Hill, NC 27599-7568, USA"}],"role":[{"role":"author","vocabulary":"crossref"}]}],"member":"286","published-online":{"date-parts":[[2022,4,20]]},"reference":[{"key":"2023041408101377000_","first-page":"816","article-title":"Finding useful data across multiple biomedical data repositories using DataMed","volume":"49","author":"Bell","year":"2019","journal-title":"Nat. Genet"},{"key":"2023041408101377000_","doi-asserted-by":"crossref","first-page":"91","DOI":"10.1111\/cts.12592","article-title":"The biomedical data translator program: conception, culture, and community","volume":"12","year":"2019","journal-title":"Clin. Transl. Sci"},{"key":"2023041408101377000_","doi-asserted-by":"crossref","first-page":"4968","DOI":"10.1021\/acs.jcim.9b00683","article-title":"ROBOKOP KG and KGB: integrated knowledge graphs from federated sources","volume":"59","author":"Bizon","year":"2019","journal-title":"J. Chem. Inf. Model"},{"key":"2023041408101377000_","doi-asserted-by":"crossref","first-page":"D267","DOI":"10.1093\/nar\/gkh061","article-title":"The unified medical language system (UMLS): integrating biomedical terminology","volume":"32","author":"Bodenreider","year":"2004","journal-title":"Nucleic Acids Res"},{"first-page":"1365","year":"2019","author":"Brickley","key":"2023041408101377000_"},{"key":"2023041408101377000_","doi-asserted-by":"crossref","DOI":"10.1093\/database\/baz132","article-title":"GenoSurf: metadata driven semantic search system for integrated genomic datasets","volume":"2019","author":"Canakoglu","year":"2019","journal-title":"Database (Oxford)"},{"key":"2023041408101377000_","doi-asserted-by":"crossref","first-page":"251","DOI":"10.1007\/s00778-019-00564-x","article-title":"Dataset search: a survey","volume":"29","author":"Chapman","year":"2020","journal-title":"VLDB J"},{"first-page":"0","year":"2019","author":"Chen","key":"2023041408101377000_"},{"key":"2023041408101377000_","doi-asserted-by":"crossref","first-page":"300","DOI":"10.1093\/jamia\/ocx121","article-title":"DataMed \u2013 an open source discovery index for finding biomedical datasets","volume":"25","author":"Chen","year":"2018","journal-title":"J. Am. Med. Informatics Assoc"},{"key":"2023041408101377000_","doi-asserted-by":"crossref","first-page":"129","DOI":"10.1001\/jama.2018.8826","article-title":"Helping to end addiction over the long-term: the research plan for the NIH HEAL initiative","volume":"320","author":"Collins","year":"2018","journal-title":"JAMA"},{"key":"2023041408101377000_","doi-asserted-by":"crossref","first-page":"e17964","DOI":"10.2196\/17964","article-title":"Visualization environment for federated knowledge graphs: development of an interactive biomedical query language and web application interface","volume":"8","author":"Cox","year":"2020","journal-title":"JMIR Med. Inform"},{"key":"2023041408101377000_","doi-asserted-by":"crossref","first-page":"19","DOI":"10.1186\/1472-6947-6-19","article-title":"NIDDK data repository: a Central collection of clinical trial data","volume":"6","author":"Cuticchia","year":"2006","journal-title":"BMC Med. Inform. Decis. Mak"},{"key":"2023041408101377000_"},{"key":"2023041408101377000_","doi-asserted-by":"crossref","first-page":"25","DOI":"10.1186\/s13326-016-0064-2","article-title":"OmniSearch: a semantic search system based on the ontology for MIcroRNA target (OMIT) for microRNA-target gene interaction data","volume":"7","author":"Huang","year":"2016","journal-title":"J. Biomed. Semantics"},{"key":"2023041408101377000_","first-page":"339","article-title":"Analysis of document viewing patterns of web search engine users","author":"Jansen","year":"2005"},{"year":"2016","author":"Ku\u0107","key":"2023041408101377000_"},{"key":"2023041408101377000_","doi-asserted-by":"crossref","DOI":"10.1093\/database\/bas016","article-title":"Ontology searching and browsing at the rat genome database","volume":"2012","author":"Laulederkind","year":"2012","journal-title":"Database (Oxford)"},{"key":"2023041408101377000_","doi-asserted-by":"crossref","first-page":"D712","DOI":"10.1093\/nar\/gkw1128","article-title":"The monarch initiative: an integrative data and analytic platform connecting phenotypes to genotypes across species","volume":"45","author":"Mungall","year":"2017","journal-title":"Nucleic Acids Res"},{"key":"2023041408101377000_"},{"year":"2020","key":"2023041408101377000_"},{"year":"2018","author":"Pagliardini","key":"2023041408101377000_"},{"key":"2023041408101377000_","doi-asserted-by":"crossref","first-page":"65","DOI":"10.1136\/amiajnl-2013-002577","article-title":"BiobankConnect: software to rapidly connect data elements for pooled analysis across biobanks using ontological and lexical indexing","volume":"22","author":"Pang","year":"2015","journal-title":"J. Am. Med. Inform. Assoc"},{"key":"2023041408101377000_","doi-asserted-by":"crossref","DOI":"10.1038\/d41586-021-00331-5","article-title":"The broken promise that undermines human genome research","author":"Powell","year":"2021","journal-title":"Nat. News"},{"key":"2023041408101377000_","doi-asserted-by":"crossref","first-page":"710","DOI":"10.2215\/CJN.06570714","article-title":"The national institute of diabetes and digestive and kidney diseases central repositories: a valuable resource for nephrology research","volume":"10","author":"Rasooly","year":"2015","journal-title":"Clin. J. Am. Soc. Nephrol"},{"key":"2023041408101377000_","doi-asserted-by":"crossref","first-page":"100155","DOI":"10.1016\/j.patter.2020.100155","article-title":"KG-COVID-19: a framework to produce customized knowledge graphs for COVID-19 response","volume":"2","author":"Reese","year":"2021","journal-title":"Patterns (NY)"},{"key":"2023041408101377000_","first-page":"1","article-title":"OPEN DATS, the data tag suite to enable discoverability of datasets","author":"Sansone","year":"2017","journal-title":"Sci Data"},{"key":"2023041408101377000_","doi-asserted-by":"crossref","first-page":"1799","DOI":"10.1093\/bioinformatics\/bty871","article-title":"Thalia: semantic search engine for biomedical abstracts","volume":"35","author":"Soto","year":"2019","journal-title":"Bioinformatics"},{"key":"2023041408101377000_","first-page":"1977","article-title":"A system for phenotype harmonization in the NHLBI Trans-Omics for precision medicine (TOPMed) program. Am. J. Epidemiol.,","author":"Stilp","year":"2021"},{"key":"2023041408101377000_","doi-asserted-by":"crossref","first-page":"668","DOI":"10.1056\/NEJMsr1809937","article-title":"The \u201cAll of Us\u201d Research Program","volume":"381","year":"2019","journal-title":"N. Engl. J. Med"},{"key":"2023041408101377000_","doi-asserted-by":"crossref","first-page":"523","DOI":"10.1007\/978-3-540-76298-0_38","volume-title":"The Semantic Web","author":"Tran","year":"2007"},{"journal-title":"Natl. Inst. Heal","key":"2023041408101377000_","article-title":"What is the HEAL data ecosystem?"},{"year":"2020","key":"2023041408101377000_"},{"key":"2023041408101377000_","doi-asserted-by":"crossref","first-page":"261","DOI":"10.1038\/s41592-019-0686-2","article-title":"{SciPy} 1.0: fundamental algorithms for scientific computing in python","volume":"17","author":"Virtanen","year":"2020","journal-title":"Nat. Methods"},{"key":"2023041408101377000_","doi-asserted-by":"crossref","first-page":"160018","DOI":"10.1038\/sdata.2016.18","article-title":"The FAIR guiding principles for scientific data management and stewardship","volume":"3","author":"Wilkinson","year":"2016","journal-title":"Sci. Data"}],"container-title":["Bioinformatics"],"original-title":[],"language":"en","link":[{"URL":"https:\/\/academic.oup.com\/bioinformatics\/advance-article-pdf\/doi\/10.1093\/bioinformatics\/btac284\/43753747\/btac284.pdf","content-type":"application\/pdf","content-version":"am","intended-application":"syndication"},{"URL":"https:\/\/academic.oup.com\/bioinformatics\/article-pdf\/38\/12\/3252\/49885805\/btac284.pdf","content-type":"application\/pdf","content-version":"vor","intended-application":"syndication"},{"URL":"https:\/\/academic.oup.com\/bioinformatics\/article-pdf\/38\/12\/3252\/49885805\/btac284.pdf","content-type":"unspecified","content-version":"vor","intended-application":"similarity-checking"}],"deposited":{"date-parts":[[2023,11,20]],"date-time":"2023-11-20T02:13:47Z","timestamp":1700446427000},"score":1,"resource":{"primary":{"URL":"https:\/\/academic.oup.com\/bioinformatics\/article\/38\/12\/3252\/6571145"}},"subtitle":[],"editor":[{"given":"Jonathan","family":"Wren","sequence":"additional","affiliation":[],"role":[{"role":"editor","vocabulary":"crossref"}]}],"short-title":[],"issued":{"date-parts":[[2022,4,20]]},"references-count":34,"journal-issue":{"issue":"12","published-print":{"date-parts":[[2022,6,13]]}},"URL":"https:\/\/doi.org\/10.1093\/bioinformatics\/btac284","relation":{},"ISSN":["1367-4803","1367-4811"],"issn-type":[{"type":"print","value":"1367-4803"},{"type":"electronic","value":"1367-4811"}],"subject":[],"published-other":{"date-parts":[[2022,6,15]]},"published":{"date-parts":[[2022,4,20]]}}}