{"status":"ok","message-type":"work","message-version":"1.0.0","message":{"indexed":{"date-parts":[[2025,11,12]],"date-time":"2025-11-12T03:29:26Z","timestamp":1762918166575,"version":"3.37.3"},"reference-count":26,"publisher":"Oxford University Press (OUP)","issue":"5","license":[{"start":{"date-parts":[[2020,11,22]],"date-time":"2020-11-22T00:00:00Z","timestamp":1606003200000},"content-version":"vor","delay-in-days":0,"URL":"https:\/\/academic.oup.com\/journals\/pages\/open_access\/funder_policies\/chorus\/standard_publication_model"}],"funder":[{"DOI":"10.13039\/501100011753","name":"National Institute of Immunology","doi-asserted-by":"publisher","id":[{"id":"10.13039\/501100011753","id-type":"DOI","asserted-by":"publisher"}]},{"DOI":"10.13039\/501100010803","name":"Department of Biotechnology","doi-asserted-by":"publisher","id":[{"id":"10.13039\/501100010803","id-type":"DOI","asserted-by":"publisher"}]},{"name":"BTIS project","award":["BT\/BI\/03\/009\/2002"],"award-info":[{"award-number":["BT\/BI\/03\/009\/2002"]}]},{"DOI":"10.13039\/100011194","name":"COE","doi-asserted-by":"publisher","award":["BT\/COE\/34\/SP15138\/2015"],"award-info":[{"award-number":["BT\/COE\/34\/SP15138\/2015"]}],"id":[{"id":"10.13039\/100011194","id-type":"DOI","asserted-by":"publisher"}]}],"content-domain":{"domain":[],"crossmark-restriction":false},"short-container-title":[],"published-print":{"date-parts":[[2021,5,5]]},"abstract":"<jats:title>Abstract<\/jats:title>\n               <jats:sec>\n                  <jats:title>Motivation<\/jats:title>\n                  <jats:p>Even though genome mining tools have successfully identified large numbers of non-ribosomal peptide synthetase (NRPS) and polyketide synthase (PKS) biosynthetic gene clusters (BGCs) in bacterial genomes, currently no tool can predict the chemical structure of the secondary metabolites biosynthesized by these BGCs. Lack of algorithms for predicting complex macrocyclization patterns of linear PK\/NRP biosynthetic intermediates has been the major bottleneck in deciphering the final bioactive chemical structures of PKs\/NRPs by genome mining.<\/jats:p>\n               <\/jats:sec>\n               <jats:sec>\n                  <jats:title>Results<\/jats:title>\n                  <jats:p>Using a large dataset of known chemical structures of macrocyclized PKs\/NRPs, we have developed a machine learning (ML) algorithm for distinguishing the correct macrocyclization pattern of PKs\/NRPs from the library of all theoretically possible cyclization patterns. Benchmarking of this ML classifier on completely independent datasets has revealed ROC\u2013AUC and PR\u2013AUC values of 0.82 and 0.81, respectively. This cyclization prediction algorithm has been used to develop SBSPKSv3, a genome mining tool for completely automated prediction of macrocyclized structures of NRPs\/PKs. SBSPKSv3 has been extensively benchmarked on a dataset of over 100 BGCs with known PKs\/NRPs products.<\/jats:p>\n               <\/jats:sec>\n               <jats:sec>\n                  <jats:title>Availability and implementation<\/jats:title>\n                  <jats:p>The macrocyclization prediction pipeline and all the datasets used in this study are freely available at http:\/\/www.nii.ac.in\/sbspks3.html.<\/jats:p>\n               <\/jats:sec>\n               <jats:sec>\n                  <jats:title>Supplementary information<\/jats:title>\n                  <jats:p>Supplementary data are available at Bioinformatics online.<\/jats:p>\n               <\/jats:sec>","DOI":"10.1093\/bioinformatics\/btaa851","type":"journal-article","created":{"date-parts":[[2020,9,19]],"date-time":"2020-09-19T04:34:36Z","timestamp":1600490076000},"page":"603-611","source":"Crossref","is-referenced-by-count":12,"title":["A machine learning-based method for prediction of macrocyclization patterns of polyketides and non-ribosomal peptides"],"prefix":"10.1093","volume":"37","author":[{"given":"Priyesh","family":"Agrawal","sequence":"first","affiliation":[{"name":"Bioinformatics Centre, National Institute of Immunology , New Delhi 110067, India"}]},{"given":"Debasisa","family":"Mohanty","sequence":"additional","affiliation":[{"name":"Bioinformatics Centre, National Institute of Immunology , New Delhi 110067, India"}]}],"member":"286","published-online":{"date-parts":[[2020,11,22]]},"reference":[{"key":"2023051704101860200_btaa851-B1","doi-asserted-by":"crossref","first-page":"W80","DOI":"10.1093\/nar\/gkx408","article-title":"RiPPMiner: a bioinformatics resource for deciphering chemical structures of RiPPs based on prediction of cleavage and cross-links","volume":"45","author":"Agrawal","year":"2017","journal-title":"Nucleic Acids Res"},{"key":"2023051704101860200_btaa851-B2","doi-asserted-by":"crossref","first-page":"361","DOI":"10.2217\/17460913.3.3.361","article-title":"Evolution and taxonomic distribution of nonribosomal peptide and polyketide synthases","volume":"3","author":"Amoutzias","year":"2008","journal-title":"Future Microbiol"},{"key":"2023051704101860200_btaa851-B3","doi-asserted-by":"crossref","first-page":"W36","DOI":"10.1093\/nar\/gkx319","article-title":"antiSMASH 4.0-improvements in chemistry prediction and gene cluster boundary identification","volume":"45","author":"Blin","year":"2017","journal-title":"Nucleic Acids Res"},{"key":"2023051704101860200_btaa851-B4","doi-asserted-by":"crossref","first-page":"1103","DOI":"10.1093\/bib\/bbx146","article-title":"Recent development of antiSMASH and other computational approaches to mine secondary metabolite biosynthetic gene clusters","volume":"20","author":"Blin","year":"2019","journal-title":"Brief. Bioinf"},{"key":"2023051704101860200_btaa851-B5","doi-asserted-by":"crossref","first-page":"211","DOI":"10.1016\/S1074-5521(00)00091-0","article-title":"Predictive, structure-based model of amino acid recognition by nonribosomal peptide synthetase adenylation domains","volume":"7","author":"Challis","year":"2000","journal-title":"Chem. Biol"},{"key":"2023051704101860200_btaa851-B6","doi-asserted-by":"crossref","first-page":"D402","DOI":"10.1093\/nar\/gks993","article-title":"ClusterMine360: a database of microbial PKS\/NRPS biosynthesis","volume":"41","author":"Conway","year":"2012","journal-title":"Nucleic Acids Res"},{"key":"2023051704101860200_btaa851-B7","doi-asserted-by":"crossref","first-page":"D509","DOI":"10.1093\/nar\/gkx893","article-title":"ClusterCAD: a computational platform for type I modular polyketide synthase design","volume":"46","author":"Eng","year":"2018","journal-title":"Nucleic Acids Res"},{"key":"2023051704101860200_btaa851-B8","doi-asserted-by":"crossref","first-page":"D1113","DOI":"10.1093\/nar\/gkv1143","article-title":"Norine, the knowledgebase dedicated to non-ribosomal peptides, is now open to crowdsourcing","volume":"44","author":"Flissi","year":"2016","journal-title":"Nucleic Acids Res"},{"key":"2023051704101860200_btaa851-B9","doi-asserted-by":"crossref","first-page":"2479","DOI":"10.1093\/bioinformatics\/bth261","article-title":"Data mining in bioinformatics using Weka","volume":"20","author":"Frank","year":"2004","journal-title":"Bioinformatics"},{"key":"2023051704101860200_btaa851-B10","doi-asserted-by":"crossref","first-page":"D408","DOI":"10.1093\/nar\/gks1177","article-title":"DoBISCUIT: a database of secondary metabolite biosynthetic gene clusters","volume":"41","author":"Ichikawa","year":"2012","journal-title":"Nucleic Acids Res"},{"key":"2023051704101860200_btaa851-B11","doi-asserted-by":"crossref","first-page":"80","DOI":"10.1016\/j.synbio.2016.03.001","article-title":"In silico methods for linking genes and secondary metabolites: the way forward","volume":"1","author":"Khater","year":"2016","journal-title":"Synth. Syst. Biotechnol"},{"key":"2023051704101860200_btaa851-B12","doi-asserted-by":"crossref","first-page":"W72","DOI":"10.1093\/nar\/gkx344","article-title":"SBSPKSv2: structure-based sequence analysis of polyketide synthases and non-ribosomal peptide synthetases","volume":"45","author":"Khater","year":"2017","journal-title":"Nucleic Acids Res"},{"key":"2023051704101860200_btaa851-B13","doi-asserted-by":"crossref","first-page":"D509","DOI":"10.1093\/nar\/gkv1319","article-title":"StreptomeDB 2.0\u2014an extended resource of natural products produced by streptomycetes","volume":"44","author":"Klementz","year":"2016","journal-title":"Nucleic Acids Res"},{"key":"2023051704101860200_btaa851-B14","doi-asserted-by":"crossref","first-page":"639","DOI":"10.1038\/nchembio.1884","article-title":"Computational approaches to natural product discovery","volume":"11","author":"Medema","year":"2015","journal-title":"Nat. Chem. Biol"},{"key":"2023051704101860200_btaa851-B15","doi-asserted-by":"crossref","first-page":"625","DOI":"10.1038\/nchembio.1890","article-title":"Minimum Information about a Biosynthetic Gene cluster","volume":"11","author":"Medema","year":"2015","journal-title":"Nat. Chem. Biol"},{"key":"2023051704101860200_btaa851-B16","doi-asserted-by":"crossref","first-page":"742","DOI":"10.1021\/ci100050t","article-title":"Extended-connectivity fingerprints","volume":"50","author":"Rogers","year":"2010","journal-title":"J. Chem. Inf. Model"},{"key":"2023051704101860200_btaa851-B17","doi-asserted-by":"crossref","first-page":"93","DOI":"10.1007\/s001140100211","article-title":"Multimodular biocatalysts for natural product assembly","volume":"88","author":"Schwarzer","year":"2001","journal-title":"Die Naturwissenschaften"},{"key":"2023051704101860200_btaa851-B18","doi-asserted-by":"crossref","first-page":"997","DOI":"10.1016\/S1074-5521(01)00068-0","article-title":"Exploring the impact of different thioesterase domains for the design of hybrid peptide synthetases","volume":"8","author":"Schwarzer","year":"2001","journal-title":"Chem. Biol"},{"key":"2023051704101860200_btaa851-B19","doi-asserted-by":"crossref","first-page":"604","DOI":"10.1038\/nature25978","article-title":"Planning chemical syntheses with deep neural networks and symbolic AI","volume":"555","author":"Segler","year":"2018","journal-title":"Nature"},{"key":"2023051704101860200_btaa851-B20","doi-asserted-by":"crossref","first-page":"W49","DOI":"10.1093\/nar\/gkx320","article-title":"PRISM 3: expanded prediction of natural product chemical structures from microbial genomes","volume":"45","author":"Skinnider","year":"2017","journal-title":"Nucleic Acids Res"},{"key":"2023051704101860200_btaa851-B21","doi-asserted-by":"crossref","first-page":"493","DOI":"10.1016\/S1074-5521(99)80082-9","article-title":"The specificity-conferring code of adenylation domains in nonribosomal peptide synthetases","volume":"6","author":"Stachelhaus","year":"1999","journal-title":"Chem. Biol"},{"key":"2023051704101860200_btaa851-B22","doi-asserted-by":"crossref","first-page":"470","DOI":"10.1038\/nchembio.2319","article-title":"A new genome-mining tool redefines the lasso peptide biosynthetic landscape","volume":"13","author":"Tietz","year":"2017","journal-title":"Nat. Chem. Biol"},{"key":"2023051704101860200_btaa851-B23","doi-asserted-by":"crossref","first-page":"828","DOI":"10.1038\/nbt.3597","article-title":"Sharing and community curation of mass spectrometry data with global natural products social molecular networking","volume":"34","author":"Wang","year":"2016","journal-title":"Nat. Biotechnol"},{"key":"2023051704101860200_btaa851-B24","doi-asserted-by":"crossref","first-page":"335","DOI":"10.1016\/S0022-2836(03)00232-8","article-title":"Computational approach for prediction of domain organization and substrate specificity of modular polyketide synthases","volume":"328","author":"Yadav","year":"2003","journal-title":"J. Mol. Biol"},{"key":"2023051704101860200_btaa851-B25","doi-asserted-by":"crossref","first-page":"e1000351","DOI":"10.1371\/journal.pcbi.1000351","article-title":"Towards prediction of metabolic products of polyketide synthases: an in silico analysis","volume":"5","author":"Yadav","year":"2009","journal-title":"PLoS Comput. Biol"},{"key":"2023051704101860200_btaa851-B26","doi-asserted-by":"crossref","first-page":"W64","DOI":"10.1093\/nar\/gkx289","article-title":"SeMPI: a genome-based secondary metabolite prediction and identification web server","volume":"45","author":"Zierep","year":"2017","journal-title":"Nucleic Acids Res"}],"container-title":["Bioinformatics"],"original-title":[],"language":"en","link":[{"URL":"http:\/\/academic.oup.com\/bioinformatics\/advance-article-pdf\/doi\/10.1093\/bioinformatics\/btaa851\/34500778\/btaa851.pdf","content-type":"application\/pdf","content-version":"am","intended-application":"syndication"},{"URL":"https:\/\/academic.oup.com\/bioinformatics\/article-pdf\/37\/5\/603\/50356527\/btaa851.pdf","content-type":"application\/pdf","content-version":"vor","intended-application":"syndication"},{"URL":"https:\/\/academic.oup.com\/bioinformatics\/article-pdf\/37\/5\/603\/50356527\/btaa851.pdf","content-type":"unspecified","content-version":"vor","intended-application":"similarity-checking"}],"deposited":{"date-parts":[[2023,5,17]],"date-time":"2023-05-17T04:10:54Z","timestamp":1684296654000},"score":1,"resource":{"primary":{"URL":"https:\/\/academic.oup.com\/bioinformatics\/article\/37\/5\/603\/5917628"}},"subtitle":[],"editor":[{"given":"Pier Luigi","family":"Martelli","sequence":"additional","affiliation":[]}],"short-title":[],"issued":{"date-parts":[[2020,11,22]]},"references-count":26,"journal-issue":{"issue":"5","published-print":{"date-parts":[[2021,5,5]]}},"URL":"https:\/\/doi.org\/10.1093\/bioinformatics\/btaa851","relation":{},"ISSN":["1367-4803","1367-4811"],"issn-type":[{"type":"print","value":"1367-4803"},{"type":"electronic","value":"1367-4811"}],"subject":[],"published-other":{"date-parts":[[2021,3,1]]},"published":{"date-parts":[[2020,11,22]]}}}