{"status":"ok","message-type":"work","message-version":"1.0.0","message":{"indexed":{"date-parts":[[2025,11,25]],"date-time":"2025-11-25T06:28:54Z","timestamp":1764052134957,"version":"3.45.0"},"reference-count":51,"publisher":"Frontiers Media SA","license":[{"start":{"date-parts":[[2025,11,25]],"date-time":"2025-11-25T00:00:00Z","timestamp":1764028800000},"content-version":"vor","delay-in-days":0,"URL":"https:\/\/creativecommons.org\/licenses\/by\/4.0\/"}],"content-domain":{"domain":["frontiersin.org"],"crossmark-restriction":true},"short-container-title":["Front. Bioinform."],"abstract":"<jats:p>\n                    Bacteriocins offer a promising solution to antibiotic resistance, possessing the ability to target a wide range of bacteria with precision. Thus, there is an urgent need for a computational model to predict new bacteriocins and aid in drug development. This work centers on constructing web-based predictive models using the XGBoost machine learning algorithm, based on the physicochemical properties, structural characteristics, and sequence profiles of protein sequences. We employed correlation analyses, cross-validation, and hypergraph-based techniques to select features. Cross-validated feature selection (CVFS) partitions the dataset, selects features within each partition, and identifies common features, ensuring representativeness. On the contrary, hypergraph-based feature evaluation (HFE) focuses on minimizing hypergraph cut conductance, leveraging higher-order data relationships to precisely utilize information regarding feature and sample correlations. The XGBoost models were built using the selected features obtained from these two feature evaluation methods. We also analyzed the feature contributions directly from the best model using SHapley Additive exPlanations (SHAP). Our HFE-based approach achieved 99.11% accuracy and an AUC of 0.9974 on the test data, overall outperforming the CVFS-based feature evaluation method and yielding results comparable to existing approaches. The most influential features are related to solvent accessibility for buried residues, followed by the composition of cysteine. Our web application, accessible at\n                    <jats:ext-link>https:\/\/shiny.tricities.wsu.edu\/bacteriocin-prediction\/<\/jats:ext-link>\n                    , offers prediction results, probability scores, and SHAP plots using both cross-validation- and hypergraph-based methods, along with previously implemented approaches for feature selection.\n                  <\/jats:p>","DOI":"10.3389\/fbinf.2025.1694009","type":"journal-article","created":{"date-parts":[[2025,11,25]],"date-time":"2025-11-25T06:25:27Z","timestamp":1764051927000},"update-policy":"https:\/\/doi.org\/10.3389\/crossmark-policy","source":"Crossref","is-referenced-by-count":0,"title":["Bacteriocin prediction through cross-validation-based and hypergraph-based feature evaluation approaches"],"prefix":"10.3389","volume":"5","author":[{"given":"Suraiya","family":"Akhter","sequence":"first","affiliation":[]},{"given":"John H.","family":"Miller","sequence":"additional","affiliation":[]}],"member":"1965","published-online":{"date-parts":[[2025,11,25]]},"reference":[{"key":"B1","doi-asserted-by":"publisher","first-page":"313","DOI":"10.1186\/s12859-023-05330-z","article-title":"BaPreS: a software tool for predicting bacteriocins using an optimal set of features","volume":"24","author":"Akhter","year":"","journal-title":"BMC Bioinforma."},{"key":"B2","doi-asserted-by":"publisher","first-page":"1284705","DOI":"10.3389\/fbinf.2023.1284705","article-title":"BPAGS: a web application for bacteriocin prediction via feature evaluation using alternating decision tree, genetic algorithm, and linear support vector classifier","volume":"3","author":"Akhter","year":"","journal-title":"Front. Bioinforma."},{"key":"B3","doi-asserted-by":"publisher","first-page":"630695","DOI":"10.3389\/fmicb.2021.630695","article-title":"Bacteriocins: an overview of antimicrobial, toxicity, and biosafety assessment by in vivo models","volume":"12","author":"Ben\u00edtez-Chao","year":"2021","journal-title":"Front. Microbiol."},{"key":"B4","doi-asserted-by":"publisher","first-page":"W29","DOI":"10.1093\/nar\/gkt282","article-title":"BLAST: a more efficient report with usability improvements","volume":"41","author":"Boratyn","year":"2013","journal-title":"Nucleic acids Res."},{"key":"B5","doi-asserted-by":"publisher","first-page":"109","DOI":"10.1016\/j.micpath.2018.02.021","article-title":"Safety, potential biotechnological and probiotic properties of bacteriocinogenic Enterococcus lactis strains isolated from raw shrimps","volume":"117","author":"Bra\u00efek","year":"2018","journal-title":"Microb. Pathog."},{"key":"B6","doi-asserted-by":"publisher","first-page":"238","DOI":"10.1038\/nrmicro1098","article-title":"Antimicrobial peptides: pore formers or metabolic inhibitors in bacteria?","volume":"3","author":"Brogden","year":"2005","journal-title":"Nat. Rev. Microbiol."},{"key":"B7","doi-asserted-by":"publisher","first-page":"S12","DOI":"10.1186\/1471-2105-9-s12-s12","article-title":"Real value prediction of protein solvent accessibility using enhanced PSSM features","volume":"9","author":"Chang","year":"2008","journal-title":"BMC Bioinforma."},{"key":"B8","doi-asserted-by":"crossref","DOI":"10.1145\/2939672.2939785","article-title":"Xgboost: a scalable tree boosting system","volume-title":"Proceedings of the 22nd acm sigkdd international conference on knowledge discovery and data mining","author":"Chen","year":"2016"},{"key":"B9","doi-asserted-by":"publisher","first-page":"95","DOI":"10.1038\/nrmicro2937","article-title":"Bacteriocins\u2014A viable alternative to antibiotics?","volume":"11","author":"Cotter","year":"2013","journal-title":"Nat. Rev. Microbiol."},{"key":"B10","doi-asserted-by":"publisher","first-page":"3663","DOI":"10.1109\/TCBB.2021.3122183","article-title":"A random multi-scale convolutional neural network for marine microbial bacteriocins identification","volume":"19","author":"Cui","year":"2021","journal-title":"IEEE\/ACM Trans. Comput. Biol. Bioinforma."},{"key":"B11","doi-asserted-by":"publisher","first-page":"e24093","DOI":"10.1002\/jcla.24093","article-title":"Bacteriocins: properties and potential use as antimicrobials","volume":"36","author":"Darbandi","year":"2022","journal-title":"J. Clin. Laboratory Analysis"},{"key":"B12","doi-asserted-by":"publisher","first-page":"467","DOI":"10.1016\/s0968-4328(96)00028-5","article-title":"Bacteriocins: nature, function and structure","volume":"27","author":"Daw","year":"1996","journal-title":"Micron"},{"key":"B13","doi-asserted-by":"publisher","first-page":"8615","DOI":"10.3390\/ijms22168615","article-title":"Identification of potential probiotics producing bacteriocins active against Listeria monocytogenes by a combination of screening tools","volume":"22","author":"Desiderato","year":"2021","journal-title":"Int. J. Mol. Sci."},{"key":"B14","doi-asserted-by":"publisher","first-page":"564","DOI":"10.1128\/mmbr.00016-05","article-title":"The continuing story of class IIa bacteriocins","volume":"70","author":"Drider","year":"2006","journal-title":"Microbiol. Mol. Biol. Rev."},{"key":"B15","doi-asserted-by":"publisher","first-page":"8700","DOI":"10.1073\/pnas.92.19.8700","article-title":"Prediction of protein folding class using global description of amino acid sequence","volume":"92","author":"Dubchak","year":"1995","journal-title":"Proc. Natl. Acad. Sci."},{"key":"B16","doi-asserted-by":"publisher","first-page":"43","DOI":"10.1002\/ddr.21601","article-title":"Novel antimicrobial peptide discovery using machine learning and biophysical selection of minimal bacteriocin domains","volume":"81","author":"Fields","year":"2020","journal-title":"Drug Dev. Res."},{"key":"B17","doi-asserted-by":"publisher","first-page":"688","DOI":"10.1002\/psc.699","article-title":"Pediocin\u2010like antimicrobial peptides (class IIa bacteriocins) and their immunity proteins: biosynthesis, structure, and mode of action","volume":"11","author":"Fimland","year":"2005","journal-title":"J. peptide Sci. official Publ. Eur. Peptide Soc."},{"key":"B18","doi-asserted-by":"publisher","first-page":"3150","DOI":"10.1093\/bioinformatics\/bts565","article-title":"CD-HIT: accelerated for clustering the next-generation sequencing data","volume":"28","author":"Fu","year":"2012","journal-title":"Bioinformatics"},{"key":"B19","doi-asserted-by":"publisher","first-page":"901","DOI":"10.1021\/cr400031z","article-title":"Multifaceted roles of disulfide bonds. Peptides as therapeutics","volume":"114","author":"Gongora-Benitez","year":"2014","journal-title":"Chem. Rev."},{"key":"B20","article-title":"Bacteriocin detection with distributed biological sequence representation","volume-title":"ICML computational Biology workshop","author":"Hamid","year":"2017"},{"key":"B21","doi-asserted-by":"publisher","first-page":"2009","DOI":"10.1093\/bioinformatics\/bty937","article-title":"Identifying antimicrobial peptides using word embedding with deep recurrent neural networks","volume":"35","author":"Hamid","year":"2019","journal-title":"Bioinformatics"},{"key":"B22","doi-asserted-by":"publisher","first-page":"22","DOI":"10.1186\/1471-2180-10-22","article-title":"BACTIBASE second release: a database and tool platform for bacteriocin characterization","volume":"10","author":"Hammami","year":"2010","journal-title":"Bmc Microbiol."},{"key":"B23","doi-asserted-by":"publisher","first-page":"491","DOI":"10.1128\/cmr.00056-05","article-title":"Peptide antimicrobial agents","volume":"19","author":"Jenssen","year":"2006","journal-title":"Clin. Microbiol. Rev."},{"key":"B24","doi-asserted-by":"publisher","first-page":"172","DOI":"10.5851\/kosfa.2018.38.1.172","article-title":"Isolation and molecular identification of bacteriocin-producing enterococci with broad antibacterial activity from traditional dairy products in Kerman province of Iran","volume":"38","author":"Khodaei","year":"2018","journal-title":"Korean J. Food Sci. Animal Resour."},{"key":"B25","doi-asserted-by":"publisher","first-page":"137","DOI":"10.3390\/genes12020137","article-title":"Ensemble-AMPPred: robust AMP prediction and recognition using the ensemble learning method with a new hybrid feature for differentiating AMPs","volume":"12","author":"Lertampaiporn","year":"2021","journal-title":"Genes"},{"key":"B26","doi-asserted-by":"publisher","first-page":"56","DOI":"10.1038\/s42256-019-0138-9","article-title":"From local explanations to global understanding with explainable AI for trees","volume":"2","author":"Lundberg","year":"2020","journal-title":"Nat. Mach. Intell."},{"key":"B27","doi-asserted-by":"publisher","first-page":"540","DOI":"10.3390\/fermentation10110540","article-title":"A review of antimicrobial peptides: structure, mechanism of action, and molecular optimization strategies","volume":"10","author":"Ma","year":"2024","journal-title":"Fermentation"},{"key":"B28","doi-asserted-by":"publisher","first-page":"32","DOI":"10.3390\/antibiotics9010032","article-title":"Bacteriocins, potent antimicrobial peptides and the fight against multi drug resistant species: resistance is futile?","volume":"9","author":"Meade","year":"2020","journal-title":"Antibiotics"},{"key":"B29","doi-asserted-by":"publisher","first-page":"1654","DOI":"10.3389\/fmicb.2018.01654","article-title":"Heterologous expression of biopreservative bacteriocins with a view to low cost production","volume":"9","author":"Mesa-Pereira","year":"2018","journal-title":"Front. Microbiol."},{"key":"B30","article-title":"Efficient estimation of word representations in vector space","author":"Mikolov","year":"2013"},{"key":"B31","doi-asserted-by":"publisher","first-page":"1657","DOI":"10.1007\/s10115-022-01786-2","article-title":"Hypergraph-based importance assessment for binary classification data","volume":"65","author":"Misiorek","year":"2023","journal-title":"Knowl. Inf. Syst."},{"key":"B32","doi-asserted-by":"publisher","first-page":"bpac008","DOI":"10.1093\/biomethods\/bpac008","article-title":"PSSMCOOL: a comprehensive R package for generating evolutionary-based descriptors of protein sequences from PSSM profiles","volume":"7","author":"Mohammadi","year":"2022","journal-title":"Biol. Methods Protoc."},{"key":"B33","doi-asserted-by":"publisher","first-page":"381","DOI":"10.1186\/s12859-015-0792-9","article-title":"A large scale prediction of bacteriocin gene blocks suggests a wide functional spectrum for bacteriocins","volume":"16","author":"Morton","year":"2015","journal-title":"BMC Bioinforma."},{"key":"B34","doi-asserted-by":"publisher","first-page":"464","DOI":"10.1016\/j.tibtech.2011.05.001","article-title":"The expanding scope of antimicrobial peptide structures and their modes of action","volume":"29","author":"Nguyen","year":"2011","journal-title":"Trends Biotechnol."},{"key":"B35","doi-asserted-by":"publisher","first-page":"9","DOI":"10.1038\/nchembio.286","article-title":"Follow the leader: the use of leader peptides to guide natural product biosynthesis","volume":"6","author":"Oman","year":"2010","journal-title":"Nat. Chem. Biol."},{"key":"B36","doi-asserted-by":"publisher","first-page":"11833","DOI":"10.1038\/s41598-018-30271-6","article-title":"The potency of the broad spectrum bacteriocin, bactofencin A, against staphylococci is highly dependent on primary structure, N-terminal charge and disulphide formation","volume":"8","author":"O\u2019Connor","year":"2018","journal-title":"Sci. Rep."},{"key":"B37","doi-asserted-by":"publisher","first-page":"S3","DOI":"10.1186\/1475-2859-13-s1-s3","article-title":"Novel bacteriocins from lactic acid bacteria (LAB): various structures and applications","volume":"13","author":"Perez","year":"2014","journal-title":"Microb. cell factories"},{"key":"B38","doi-asserted-by":"publisher","first-page":"9571","DOI":"10.3390\/su14159571","article-title":"Bacteriocin from Lacticaseibacillus rhamnosus sp. A5: isolation, purification, characterization, and antibacterial evaluation for sustainable food processing","volume":"14","author":"Ren","year":"2022","journal-title":"Sustainability"},{"key":"B39","doi-asserted-by":"publisher","first-page":"117","DOI":"10.1146\/annurev.micro.56.012302.161024","article-title":"Bacteriocins: evolution, ecology, and application","volume":"56","author":"Riley","year":"2002","journal-title":"Annu. Rev. Microbiol."},{"key":"B40","doi-asserted-by":"publisher","first-page":"584","DOI":"10.1006\/jmbi.1993.1413","article-title":"Prediction of protein secondary structure at better than 70% accuracy","volume":"232","author":"Rost","year":"1993","journal-title":"J. Mol. Biol."},{"key":"B41","doi-asserted-by":"publisher","first-page":"756","DOI":"10.17706\/jsw.11.8.756-767","article-title":"Protein fold recognition using genetic algorithm optimized voting scheme and profile bigram","volume":"11","author":"Saini","year":"2016","journal-title":"J. Softw."},{"key":"B42","doi-asserted-by":"publisher","first-page":"e80635","DOI":"10.1371\/journal.pone.0080635","article-title":"Maximum allowed solvent accessibilites of residues in proteins","volume":"8","author":"Tien","year":"2013","journal-title":"PloS one"},{"key":"B43","doi-asserted-by":"publisher","first-page":"W448","DOI":"10.1093\/nar\/gkt391","article-title":"BAGEL3: automated identification of genes encoding bacteriocins and (non-) bactericidal posttranslationally modified peptides","volume":"41","author":"Van Heel","year":"2013","journal-title":"Nucleic acids Res."},{"key":"B44","doi-asserted-by":"publisher","first-page":"W278","DOI":"10.1093\/nar\/gky383","article-title":"BAGEL4: a user-friendly web server to thoroughly mine RiPPs and bacteriocins","volume":"46","author":"van Heel","year":"2018","journal-title":"Nucleic acids Res."},{"key":"B45","doi-asserted-by":"publisher","first-page":"W237","DOI":"10.1093\/nar\/gkv437","article-title":"antiSMASH 3.0\u2014a comprehensive resource for the genome mining of biosynthetic gene clusters","volume":"43","author":"Weber","year":"2015","journal-title":"Nucleic acids Res."},{"key":"B46","doi-asserted-by":"publisher","first-page":"905","DOI":"10.1021\/cb1001558","article-title":"Describing the mechanism of antimicrobial peptide action with the interfacial activity model","volume":"5","author":"Wimley","year":"2010","journal-title":"ACS Chem. Biol."},{"key":"B47","doi-asserted-by":"publisher","first-page":"1857","DOI":"10.1093\/bioinformatics\/btv042","article-title":"protr\/ProtrWeb: r package and web server for generating various numerical representation schemes of protein sequences","volume":"31","author":"Xiao","year":"2015","journal-title":"Bioinformatics"},{"key":"B48","doi-asserted-by":"publisher","first-page":"769","DOI":"10.1016\/j.csbj.2022.12.046","article-title":"A Cross-Validated feature Selection (CVFS) approach for extracting the most parsimonious feature sets and discovering potential antimicrobial resistance (AMR) biomarkers","volume":"21","author":"Yang","year":"2023","journal-title":"Comput. Struct. Biotechnol. J."},{"key":"B49","doi-asserted-by":"publisher","first-page":"241","DOI":"10.3389\/fmicb.2014.00241","article-title":"Antibacterial activities of bacteriocins: application in foods and pharmaceuticals","volume":"5","author":"Yang","year":"2014","journal-title":"Front. Microbiol."},{"key":"B50","doi-asserted-by":"publisher","first-page":"499","DOI":"10.1111\/j.1365-2672.2007.03575.x","article-title":"Bacteriocin detection by liquid chromatography\/mass spectrometry for rapid identification","volume":"104","author":"Zendo","year":"2008","journal-title":"J. Appl. Microbiol."},{"key":"B51","doi-asserted-by":"publisher","first-page":"2165","DOI":"10.3389\/fmicb.2018.02165","article-title":"Purification and partial characterization of bacteriocin Lac-B23, a novel bacteriocin production by Lactobacillus plantarum J23, isolated from Chinese traditional fermented milk","volume":"9","author":"Zhang","year":"2018","journal-title":"Front. Microbiol."}],"container-title":["Frontiers in Bioinformatics"],"original-title":[],"link":[{"URL":"https:\/\/www.frontiersin.org\/articles\/10.3389\/fbinf.2025.1694009\/full","content-type":"unspecified","content-version":"vor","intended-application":"similarity-checking"}],"deposited":{"date-parts":[[2025,11,25]],"date-time":"2025-11-25T06:25:28Z","timestamp":1764051928000},"score":1,"resource":{"primary":{"URL":"https:\/\/www.frontiersin.org\/articles\/10.3389\/fbinf.2025.1694009\/full"}},"subtitle":[],"short-title":[],"issued":{"date-parts":[[2025,11,25]]},"references-count":51,"alternative-id":["10.3389\/fbinf.2025.1694009"],"URL":"https:\/\/doi.org\/10.3389\/fbinf.2025.1694009","relation":{},"ISSN":["2673-7647"],"issn-type":[{"value":"2673-7647","type":"electronic"}],"subject":[],"published":{"date-parts":[[2025,11,25]]},"article-number":"1694009"}}