{"status":"ok","message-type":"work","message-version":"1.0.0","message":{"indexed":{"date-parts":[[2026,5,4]],"date-time":"2026-05-04T03:00:00Z","timestamp":1777863600131,"version":"3.51.4"},"update-to":[{"DOI":"10.1371\/journal.pcbi.1008920","type":"new_version","label":"New version","source":"publisher","updated":{"date-parts":[[2021,5,18]],"date-time":"2021-05-18T00:00:00Z","timestamp":1621296000000}}],"reference-count":41,"publisher":"Public Library of Science (PLoS)","issue":"5","license":[{"start":{"date-parts":[[2021,5,4]],"date-time":"2021-05-04T00:00:00Z","timestamp":1620086400000},"content-version":"vor","delay-in-days":0,"URL":"http:\/\/creativecommons.org\/licenses\/by\/4.0\/"}],"funder":[{"DOI":"10.13039\/100013407","name":"Netherlands eScience Center","doi-asserted-by":"publisher","award":["ASDI.2017.030"],"award-info":[{"award-number":["ASDI.2017.030"]}],"id":[{"id":"10.13039\/100013407","id-type":"DOI","asserted-by":"publisher"}]},{"DOI":"10.13039\/501100000268","name":"Biotechnology and Biological Sciences Research Council","doi-asserted-by":"publisher","award":["BB\/R022054\/1"],"award-info":[{"award-number":["BB\/R022054\/1"]}],"id":[{"id":"10.13039\/501100000268","id-type":"DOI","asserted-by":"publisher"}]},{"DOI":"10.13039\/501100000268","name":"Biotechnology and Biological Sciences Research Council","doi-asserted-by":"publisher","award":["BB\/R022054\/1"],"award-info":[{"award-number":["BB\/R022054\/1"]}],"id":[{"id":"10.13039\/501100000268","id-type":"DOI","asserted-by":"publisher"}]},{"DOI":"10.13039\/501100000268","name":"Biotechnology and Biological Sciences Research Council","doi-asserted-by":"publisher","award":["BB\/R022054\/1"],"award-info":[{"award-number":["BB\/R022054\/1"]}],"id":[{"id":"10.13039\/501100000268","id-type":"DOI","asserted-by":"publisher"}]},{"DOI":"10.13039\/501100000582","name":"Carnegie Trust for the Universities of Scotland","doi-asserted-by":"publisher","award":["Collaborative Research Grant"],"award-info":[{"award-number":["Collaborative Research Grant"]}],"id":[{"id":"10.13039\/501100000582","id-type":"DOI","asserted-by":"publisher"}]},{"DOI":"10.13039\/501100000582","name":"Carnegie Trust for the Universities of Scotland","doi-asserted-by":"publisher","award":["Collaborative Research Grant"],"award-info":[{"award-number":["Collaborative Research Grant"]}],"id":[{"id":"10.13039\/501100000582","id-type":"DOI","asserted-by":"publisher"}]},{"DOI":"10.13039\/501100000582","name":"Carnegie Trust for the Universities of Scotland","doi-asserted-by":"publisher","award":["Collaborative Research Grant"],"award-info":[{"award-number":["Collaborative Research Grant"]}],"id":[{"id":"10.13039\/501100000582","id-type":"DOI","asserted-by":"publisher"}]},{"DOI":"10.13039\/501100002341","name":"Academy of Finland","doi-asserted-by":"publisher","award":["310107"],"award-info":[{"award-number":["310107"]}],"id":[{"id":"10.13039\/501100002341","id-type":"DOI","asserted-by":"publisher"}]},{"DOI":"10.13039\/501100002341","name":"Academy of Finland","doi-asserted-by":"crossref","award":["334790"],"award-info":[{"award-number":["334790"]}],"id":[{"id":"10.13039\/501100002341","id-type":"DOI","asserted-by":"crossref"}]},{"name":"Scottish Informatics and Computing Science Alliance","award":["Distinguished Visiting Fellow Scheme"],"award-info":[{"award-number":["Distinguished Visiting Fellow Scheme"]}]}],"content-domain":{"domain":["www.ploscompbiol.org"],"crossmark-restriction":false},"short-container-title":["PLoS Comput Biol"],"abstract":"<jats:p>Specialised metabolites from microbial sources are well-known for their wide range of biomedical applications, particularly as antibiotics. When mining paired genomic and metabolomic data sets for novel specialised metabolites, establishing links between Biosynthetic Gene Clusters (BGCs) and metabolites represents a promising way of finding such novel chemistry. However, due to the lack of detailed biosynthetic knowledge for the majority of predicted BGCs, and the large number of possible combinations, this is not a simple task. This problem is becoming ever more pressing with the increased availability of paired omics data sets. Current tools are not effective at identifying valid links automatically, and manual verification is a considerable bottleneck in natural product research. We demonstrate that using multiple link-scoring functions together makes it easier to prioritise true links relative to others. Based on standardising a commonly used score, we introduce a new, more effective score, and introduce a novel score using an Input-Output Kernel Regression approach. Finally, we present NPLinker, a software framework to link genomic and metabolomic data. Results are verified using publicly available data sets that include validated links.<\/jats:p>","DOI":"10.1371\/journal.pcbi.1008920","type":"journal-article","created":{"date-parts":[[2021,5,5]],"date-time":"2021-05-05T12:07:28Z","timestamp":1620216448000},"page":"e1008920","update-policy":"https:\/\/doi.org\/10.1371\/journal.pcbi.corrections_policy","source":"Crossref","is-referenced-by-count":54,"title":["Ranking microbial metabolomic and genomic links in the NPLinker framework using complementary scoring functions"],"prefix":"10.1371","volume":"17","author":[{"ORCID":"https:\/\/orcid.org\/0000-0002-9580-6685","authenticated-orcid":true,"given":"Gr\u00edmur","family":"Hj\u00f6rleifsson Eldj\u00e1rn","sequence":"first","affiliation":[]},{"ORCID":"https:\/\/orcid.org\/0000-0002-1451-8973","authenticated-orcid":true,"given":"Andrew","family":"Ramsay","sequence":"additional","affiliation":[]},{"ORCID":"https:\/\/orcid.org\/0000-0002-9340-5511","authenticated-orcid":true,"given":"Justin J. J.","family":"van der Hooft","sequence":"additional","affiliation":[]},{"ORCID":"https:\/\/orcid.org\/0000-0002-3670-4849","authenticated-orcid":true,"given":"Katherine R.","family":"Duncan","sequence":"additional","affiliation":[]},{"ORCID":"https:\/\/orcid.org\/0000-0002-3868-102X","authenticated-orcid":true,"given":"Sylvia","family":"Soldatou","sequence":"additional","affiliation":[]},{"ORCID":"https:\/\/orcid.org\/0000-0002-0705-4314","authenticated-orcid":true,"given":"Juho","family":"Rousu","sequence":"additional","affiliation":[]},{"ORCID":"https:\/\/orcid.org\/0000-0002-1275-6820","authenticated-orcid":true,"given":"R\u00f3n\u00e1n","family":"Daly","sequence":"additional","affiliation":[]},{"ORCID":"https:\/\/orcid.org\/0000-0002-3068-4664","authenticated-orcid":true,"given":"Joe","family":"Wandy","sequence":"additional","affiliation":[]},{"given":"Simon","family":"Rogers","sequence":"additional","affiliation":[]}],"member":"340","published-online":{"date-parts":[[2021,5,4]]},"reference":[{"key":"pcbi.1008920.ref001","article-title":"Natural Products as Sources of New Drugs over the Nearly Four Decades from 01\/1981 to 09\/2019","author":"DJ Newman","year":"2020","journal-title":"J Nat Prod"},{"issue":"W1","key":"pcbi.1008920.ref002","doi-asserted-by":"crossref","first-page":"W81","DOI":"10.1093\/nar\/gkz310","article-title":"antiSMASH 5.0: updates to the secondary metabolite genome mining pipeline","volume":"47","author":"K Blin","year":"2019","journal-title":"Nucleic Acids Res"},{"issue":"18","key":"pcbi.1008920.ref003","doi-asserted-by":"crossref","first-page":"e110","DOI":"10.1093\/nar\/gkz654","article-title":"A deep learning genome-mining strategy for biosynthetic gene cluster prediction","volume":"47","author":"GD Hannigan","year":"2019","journal-title":"Nucleic Acids Res"},{"issue":"2","key":"pcbi.1008920.ref004","doi-asserted-by":"crossref","first-page":"412","DOI":"10.1016\/j.cell.2014.06.034","article-title":"Insights into secondary metabolism from a global analysis of prokaryotic biosynthetic gene clusters","volume":"158","author":"P Cimermancic","year":"2014","journal-title":"Cell"},{"issue":"4-5","key":"pcbi.1008920.ref005","doi-asserted-by":"crossref","first-page":"573","DOI":"10.1007\/s10295-016-1815-x","article-title":"Gifted microbes for genome mining and natural product discovery","volume":"44","author":"RH Baltz","year":"2017","journal-title":"J Ind Microbiol Biotechnol"},{"issue":"18","key":"pcbi.1008920.ref006","doi-asserted-by":"crossref","first-page":"e110","DOI":"10.1093\/nar\/gkz654","article-title":"A deep learning genome-mining strategy for biosynthetic gene cluster prediction","volume":"47","author":"GD Hannigan","year":"2019","journal-title":"Nucleic Acids Research"},{"issue":"1","key":"pcbi.1008920.ref007","doi-asserted-by":"crossref","first-page":"30","DOI":"10.1038\/nchembio.2219","article-title":"Dereplication of peptidic natural products through database search of mass spectra","volume":"13","author":"H Mohimani","year":"2017","journal-title":"Nat Chem Biol"},{"issue":"1","key":"pcbi.1008920.ref008","doi-asserted-by":"crossref","first-page":"4035","DOI":"10.1038\/s41467-018-06082-8","article-title":"Dereplication of microbial metabolites through database search of mass spectra","volume":"9","author":"H Mohimani","year":"2018","journal-title":"Nat Commun"},{"issue":"4","key":"pcbi.1008920.ref009","doi-asserted-by":"crossref","first-page":"460","DOI":"10.1016\/j.chembiol.2015.03.010","article-title":"Molecular networking and pattern-based genome mining improves discovery of biosynthetic gene clusters and their products from Salinispora species","volume":"22","author":"KR Duncan","year":"2015","journal-title":"Chem Biol"},{"issue":"37","key":"pcbi.1008920.ref010","doi-asserted-by":"crossref","first-page":"7311","DOI":"10.1039\/C8SC02170H","article-title":"Genome mining, isolation, chemical synthesis and biological evaluation of a novel lanthipeptide, tikitericin, from the extremophilic microorganism strain T81","volume":"9","author":"B Xu","year":"2018","journal-title":"Chem Sci"},{"issue":"2","key":"pcbi.1008920.ref011","doi-asserted-by":"crossref","first-page":"453","DOI":"10.1039\/C8SC03814G","article-title":"Triggering the expression of a silent gene cluster from genetically intractable bacteria results in scleric acid discovery","volume":"10","author":"F Alberti","year":"2019","journal-title":"Chem Sci"},{"issue":"12","key":"pcbi.1008920.ref012","doi-asserted-by":"crossref","first-page":"1838","DOI":"10.1002\/ajoc.201700433","article-title":"Isolation and Structure Determination of New Antibacterial Peptide Curacomycin Based on Genome Mining","volume":"6","author":"I Kaweewan","year":"2017","journal-title":"Asian Journal of Organic Chemistry"},{"issue":"1","key":"pcbi.1008920.ref013","doi-asserted-by":"crossref","first-page":"60","DOI":"10.1038\/s41589-019-0400-9","article-title":"A computational framework to explore large-scale biosynthetic diversity","volume":"16","author":"JC Navarro-Mu\u00f1oz","year":"2020","journal-title":"Nat Chem Biol"},{"issue":"12","key":"pcbi.1008920.ref014","doi-asserted-by":"crossref","first-page":"3452","DOI":"10.1021\/acschembio.6b00779","article-title":"Elucidating the Rimosamide-Detoxin Natural Product Families and Their Biosynthesis Using Metabolite\/Gene Cluster Correlations","volume":"11","author":"RA McClure","year":"2016","journal-title":"ACS Chemical Biology"},{"issue":"2","key":"pcbi.1008920.ref015","doi-asserted-by":"crossref","first-page":"99","DOI":"10.1021\/acscentsci.5b00331","article-title":"Metabologenomics: Correlation of Microbial Gene Clusters with Metabolites Drives Discovery of a Nonribosomal Peptide with an Unusual Amino Acid Monomer","volume":"2","author":"AW Goering","year":"2016","journal-title":"ACS Cent Sci"},{"issue":"11","key":"pcbi.1008920.ref016","doi-asserted-by":"crossref","first-page":"794","DOI":"10.1038\/nchembio.684","article-title":"A mass spectrometry-guided genome mining approach for natural product peptidogenomics","volume":"7","author":"RD Kersten","year":"2011","journal-title":"Nat Chem Biol"},{"issue":"11","key":"pcbi.1008920.ref017","doi-asserted-by":"crossref","first-page":"963","DOI":"10.1038\/nchembio.1659","article-title":"A roadmap for natural product discovery based on large-scale genomics and metabolomics","volume":"10","author":"JR Doroghazi","year":"2014","journal-title":"Nat Chem Biol"},{"issue":"20","key":"pcbi.1008920.ref018","doi-asserted-by":"crossref","first-page":"3202","DOI":"10.1093\/bioinformatics\/btx400","article-title":"SANDPUMA: ensemble predictions of nonribosomal peptide chemistry reveal biosynthetic diversity across Actinobacteria","volume":"33","author":"MG Chevrette","year":"2017","journal-title":"Bioinformatics"},{"key":"pcbi.1008920.ref019","doi-asserted-by":"crossref","first-page":"8421","DOI":"10.1038\/ncomms9421","article-title":"An automated Genomes-to-Natural Products platform (GNP) for the discovery of modular natural products","volume":"6","author":"CW Johnston","year":"2015","journal-title":"Nat Commun"},{"issue":"6","key":"pcbi.1008920.ref020","doi-asserted-by":"crossref","first-page":"600","DOI":"10.1016\/j.cels.2019.09.004","article-title":"MetaMiner: A Scalable Peptidogenomics Approach for Discovery of Ribosomal Peptide Natural Products with Blind Modifications from Microbial Communities","volume":"9","author":"L Cao","year":"2019","journal-title":"Cell Systems"},{"issue":"1","key":"pcbi.1008920.ref021","doi-asserted-by":"crossref","first-page":"1","DOI":"10.1093\/gigascience\/giaa154","article-title":"BiG-SLiCE: A highly scalable tool maps the diversity of 1.2 million biosynthetic gene clusters","volume":"10","author":"SA Kautsar","year":"2021","journal-title":"GigaScience"},{"issue":"26","key":"pcbi.1008920.ref022","doi-asserted-by":"crossref","first-page":"E1743","DOI":"10.1073\/pnas.1203689109","article-title":"Mass spectral molecular networking of living microbial colonies","volume":"109","author":"J Watrous","year":"2012","journal-title":"Proc Natl Acad Sci U S A"},{"issue":"13","key":"pcbi.1008920.ref023","doi-asserted-by":"crossref","DOI":"10.1093\/femsle\/fnz142","article-title":"Linking biosynthetic and chemical space to accelerate microbial secondary metabolite discovery","volume":"366","author":"S Soldatou","year":"2019","journal-title":"FEMS Microbiol Lett"},{"key":"pcbi.1008920.ref024","article-title":"Linking genomics and metabolomics to chart specialized metabolic diversity","author":"JJJ van der Hooft","year":"2020","journal-title":"Chem Soc Rev"},{"issue":"7","key":"pcbi.1008920.ref025","doi-asserted-by":"crossref","first-page":"1545","DOI":"10.1021\/cb500199h","article-title":"Automated genome mining of ribosomal peptide natural products","volume":"9","author":"H Mohimani","year":"2014","journal-title":"ACS Chem Biol"},{"issue":"12","key":"pcbi.1008920.ref026","doi-asserted-by":"crossref","first-page":"i28","DOI":"10.1093\/bioinformatics\/btw246","article-title":"Fast metabolite identification with Input Output Kernel Regression","volume":"32","author":"C Brouard","year":"2016","journal-title":"Bioinformatics"},{"issue":"1","key":"pcbi.1008920.ref027","first-page":"1","article-title":"Input Output Kernel Regression: Supervised and Semi-Supervised Structured Output Prediction with Operator-Valued Kernels","volume":"17","author":"C Brouard","year":"2016","journal-title":"J Mach Learn Res"},{"issue":"41","key":"pcbi.1008920.ref028","doi-asserted-by":"crossref","first-page":"12580","DOI":"10.1073\/pnas.1509788112","article-title":"Searching molecular structure databases with tandem mass spectra using CSI:FingerID","volume":"112","author":"K D\u00fchrkop","year":"2015","journal-title":"Proc Natl Acad Sci U S A"},{"issue":"1","key":"pcbi.1008920.ref029","doi-asserted-by":"crossref","first-page":"33","DOI":"10.1186\/s13321-017-0220-4","article-title":"The Chemistry Development Kit (CDK) v2.0: atom typing, depiction, molecular formulas, and substructure searching","volume":"9","author":"EL Willighagen","year":"2017","journal-title":"J Cheminform"},{"key":"pcbi.1008920.ref030","first-page":"819","article-title":"Probability Product Kernels","volume":"5","author":"T Jebara","year":"2004","journal-title":"J Mach Learn Res"},{"issue":"2","key":"pcbi.1008920.ref031","doi-asserted-by":"crossref","DOI":"10.3390\/md19020103","article-title":"Comparative Metabologenomics Analysis of Polar Actinomycetes","volume":"19","author":"S Soldatou","year":"2021","journal-title":"Marine Drugs"},{"issue":"9","key":"pcbi.1008920.ref032","doi-asserted-by":"crossref","first-page":"625","DOI":"10.1038\/nchembio.1890","article-title":"Minimum Information about a Biosynthetic Gene cluster","volume":"11","author":"MH Medema","year":"2015","journal-title":"Nat Chem Biol"},{"issue":"8","key":"pcbi.1008920.ref033","doi-asserted-by":"crossref","first-page":"828","DOI":"10.1038\/nbt.3597","article-title":"Sharing and community curation of mass spectrometry data with Global Natural Products Social Molecular Networking","volume":"34","author":"M Wang","year":"2016","journal-title":"Nat Biotechnol"},{"issue":"4","key":"pcbi.1008920.ref034","doi-asserted-by":"crossref","first-page":"363","DOI":"10.1038\/s41589-020-00724-z","article-title":"A community resource for paired genomic and metabolomic data mining","volume":"17","author":"MA Schorn","year":"2021","journal-title":"Nature Chemical Biology"},{"issue":"1","key":"pcbi.1008920.ref035","doi-asserted-by":"crossref","first-page":"31","DOI":"10.1021\/ci00057a005","article-title":"SMILES, a chemical language and information system. 1. Introduction to methodology and encoding rules","volume":"28","author":"D Weininger","year":"1988","journal-title":"J Chem Inf Model"},{"key":"pcbi.1008920.ref036","doi-asserted-by":"crossref","first-page":"23","DOI":"10.1186\/s13321-015-0068-4","article-title":"InChI, the IUPAC International Chemical Identifier","volume":"7","author":"SR Heller","year":"2015","journal-title":"J Cheminform"},{"issue":"D1","key":"pcbi.1008920.ref037","first-page":"D454","article-title":"MIBiG 2.0: a repository for biosynthetic gene clusters of known function","volume":"48","author":"SA Kautsar","year":"2019","journal-title":"Nucleic Acids Research"},{"issue":"3","key":"pcbi.1008920.ref038","doi-asserted-by":"crossref","first-page":"588","DOI":"10.1021\/acs.jnatprod.6b00722","article-title":"Prioritizing Natural Product Diversity in a Collection of 146 Bacterial Strains Based on Growth and Extraction Protocols","volume":"80","author":"M Cr\u00fcsemann","year":"2017","journal-title":"J Nat Prod"},{"issue":"1","key":"pcbi.1008920.ref039","doi-asserted-by":"crossref","first-page":"53","DOI":"10.1016\/j.chembiol.2006.11.007","article-title":"The genomisotopic approach: a systematic method to isolate products of orphan biosynthetic gene clusters","volume":"14","author":"H Gross","year":"2007","journal-title":"Chem Biol"},{"issue":"12","key":"pcbi.1008920.ref040","doi-asserted-by":"crossref","first-page":"3198","DOI":"10.1073\/pnas.1618556114","article-title":"Comparative genomics uncovers the prolific and distinctive metabolic potential of the cyanobacterial genus","volume":"114","author":"T Leao","year":"2017","journal-title":"Proc Natl Acad Sci U S A"},{"key":"pcbi.1008920.ref041","article-title":"Spec2Vec: Improved mass spectral similarity scoring through learning of structural relationships","author":"F Huber","year":"2020","journal-title":"bioRxiv"}],"updated-by":[{"DOI":"10.1371\/journal.pcbi.1008920","type":"new_version","label":"New version","source":"publisher","updated":{"date-parts":[[2021,5,18]],"date-time":"2021-05-18T00:00:00Z","timestamp":1621296000000}}],"container-title":["PLOS Computational Biology"],"original-title":[],"language":"en","link":[{"URL":"https:\/\/dx.plos.org\/10.1371\/journal.pcbi.1008920","content-type":"unspecified","content-version":"vor","intended-application":"similarity-checking"}],"deposited":{"date-parts":[[2021,5,18]],"date-time":"2021-05-18T13:50:47Z","timestamp":1621345847000},"score":1,"resource":{"primary":{"URL":"https:\/\/dx.plos.org\/10.1371\/journal.pcbi.1008920"}},"subtitle":[],"editor":[{"given":"Niranjan","family":"Nagarajan","sequence":"first","affiliation":[]}],"short-title":[],"issued":{"date-parts":[[2021,5,4]]},"references-count":41,"journal-issue":{"issue":"5","published-online":{"date-parts":[[2021,5,4]]}},"URL":"https:\/\/doi.org\/10.1371\/journal.pcbi.1008920","relation":{"has-preprint":[{"id-type":"doi","id":"10.1101\/2020.06.12.148205","asserted-by":"object"}]},"ISSN":["1553-7358"],"issn-type":[{"value":"1553-7358","type":"electronic"}],"subject":[],"published":{"date-parts":[[2021,5,4]]}}}