{"status":"ok","message-type":"work","message-version":"1.0.0","message":{"indexed":{"date-parts":[[2026,2,9]],"date-time":"2026-02-09T23:42:36Z","timestamp":1770680556804,"version":"3.49.0"},"reference-count":57,"publisher":"Oxford University Press (OUP)","issue":"3","license":[{"start":{"date-parts":[[2023,3,17]],"date-time":"2023-03-17T00:00:00Z","timestamp":1679011200000},"content-version":"vor","delay-in-days":0,"URL":"https:\/\/academic.oup.com\/journals\/pages\/open_access\/funder_policies\/chorus\/standard_publication_model"}],"funder":[{"name":"Department of Biotechnology, Government of India"},{"DOI":"10.13039\/501100011753","name":"National Institute of Immunology","doi-asserted-by":"publisher","id":[{"id":"10.13039\/501100011753","id-type":"DOI","asserted-by":"publisher"}]},{"DOI":"10.13039\/501100010803","name":"Department of Biotechnology","doi-asserted-by":"publisher","award":["BT\/PR40325\/BTIS\/137\/1\/2020"],"award-info":[{"award-number":["BT\/PR40325\/BTIS\/137\/1\/2020"]}],"id":[{"id":"10.13039\/501100010803","id-type":"DOI","asserted-by":"publisher"}]},{"name":"National Supercomputing Mission, MeiTY, India","award":["MeitY\/R&D\/HPC\/2(1)\/2014\/"],"award-info":[{"award-number":["MeitY\/R&D\/HPC\/2(1)\/2014\/"]}]},{"name":"National Supercomputing Mission, MeiTY, India","award":["3191"],"award-info":[{"award-number":["3191"]}]},{"name":"Senior Research Fellowship from CSIR, India"}],"content-domain":{"domain":[],"crossmark-restriction":false},"short-container-title":[],"published-print":{"date-parts":[[2023,5,19]]},"abstract":"<jats:title>Abstract<\/jats:title>\n               <jats:p>Small open reading frames (smORFs) encoding proteins less than 100\u00a0amino acids (aa) are known to be important regulators of key cellular processes. However, their computational identification remains a challenge. 
Based on a comprehensive analysis of known prokaryotic small ORFs, we have developed the ProsmORF-pred resource which uses a machine learning (ML)-based method for prediction of smORFs in the prokaryotic genome sequences. ProsmORF-pred consists of two ML models, one for initiation site recognition in nucleic acid sequences upstream of putative start codons and the other uses translated amino acid sequences to decipher functional protein like sequences. The nucleotide sequence-based initiation site recognition model has been trained using longer ORFs (&gt;100\u00a0aa) in the same genome while the ML model for identification of protein like sequences has been trained using annotated smORFs from Escherichia coli. Comprehensive benchmarking of ProsmORF-pred reveals that its performance is comparable to other state-of-the-art approaches on the annotated smORF set derived from 32 prokaryotic genomes. Its performance is distinctly superior to other tools like PRODIGAL and RANSEPS for prediction of newly identified smORFs which have a length range of 10\u201330\u00a0aa, where prediction of smORFs has been a major challenge. Apart from identification of smORFs in genomic sequences, ProsmORF-pred can also aid in functional annotation of the predicted smORFs based on sequence similarity and genomic neighbourhood similarity searches in ProsmORFDB, a well-curated database of known smORFs. 
ProsmORF-pred along with its backend database ProsmORFDB is available as a user-friendly web server (http:\/\/www.nii.ac.in\/prosmorfpred.html).<\/jats:p>","DOI":"10.1093\/bib\/bbad101","type":"journal-article","created":{"date-parts":[[2023,3,30]],"date-time":"2023-03-30T05:46:32Z","timestamp":1680155192000},"source":"Crossref","is-referenced-by-count":10,"title":["ProsmORF-pred: a machine learning-based method for the identification of small ORFs in prokaryotic genomes"],"prefix":"10.1093","volume":"24","author":[{"given":"Akshay","family":"Khanduja","sequence":"first","affiliation":[{"name":"National Institute of Immunology , Aruna Asaf Ali Marg, New Delhi 110067 , India"}]},{"given":"Manish","family":"Kumar","sequence":"additional","affiliation":[{"name":"National Institute of Immunology , Aruna Asaf Ali Marg, New Delhi 110067 , India"}]},{"ORCID":"https:\/\/orcid.org\/0000-0002-3374-0588","authenticated-orcid":false,"given":"Debasisa","family":"Mohanty","sequence":"additional","affiliation":[{"name":"National Institute of Immunology , Aruna Asaf Ali Marg, New Delhi 110067 , India"}]}],"member":"286","published-online":{"date-parts":[[2023,3,17]]},"reference":[{"key":"2023052021585520300_ref1","doi-asserted-by":"crossref","first-page":"1178","DOI":"10.1002\/cbic.201900677","article-title":"Rapid biophysical characterization and NMR spectroscopy structural analysis of small proteins from bacteria and archaea","volume":"21","author":"Kubatova","year":"2020","journal-title":"Chembiochem"},{"key":"2023052021585520300_ref2","doi-asserted-by":"crossref","first-page":"e104763","DOI":"10.15252\/embj.2020104763","article-title":"Translation of small downstream ORFs enhances translation of canonical main open reading frames","volume":"39","author":"Wu","year":"2020","journal-title":"EMBO J"},{"key":"2023052021585520300_ref3","doi-asserted-by":"crossref","first-page":"4131","DOI":"10.1021\/acs.biochem.0c00672","article-title":"The NBDY microprotein regulates cellular RNA 
Decapping","volume":"59","author":"Na","year":"2020","journal-title":"Biochemistry"},{"key":"2023052021585520300_ref4","doi-asserted-by":"crossref","first-page":"660","DOI":"10.1038\/ncb1595","article-title":"Small peptide regulators of actin-based cell morphogenesis encoded by a polycistronic mRNA","volume":"9","author":"Kondo","year":"2007","journal-title":"Nat Cell Biol"},{"key":"2023052021585520300_ref5","doi-asserted-by":"crossref","first-page":"336","DOI":"10.1126\/science.1188158","article-title":"Small peptides switch the transcriptional activity of Shavenbaby during drosophila embryogenesis","volume":"329","author":"Kondo","year":"2010","journal-title":"Science"},{"key":"2023052021585520300_ref6","doi-asserted-by":"crossref","first-page":"1029","DOI":"10.1093\/nar\/gkz734","article-title":"Alternative ORFs and small ORFs: shedding light on the dark proteome","volume":"48","author":"Orr","year":"2020","journal-title":"Nucleic Acids Res"},{"key":"2023052021585520300_ref7","doi-asserted-by":"crossref","first-page":"575","DOI":"10.1038\/nrm.2017.58","article-title":"Classification and function of small open reading frames","volume":"18","author":"Couso","year":"2017","journal-title":"Nat Rev Mol Cell Biol"},{"key":"2023052021585520300_ref8","doi-asserted-by":"crossref","first-page":"909","DOI":"10.1038\/nchembio.1964","article-title":"Discovery and characterization of smORF-encoded bioactive polypeptides","volume":"11","author":"Saghatelian","year":"2015","journal-title":"Nat Chem Biol"},{"key":"2023052021585520300_ref9","doi-asserted-by":"crossref","first-page":"595","DOI":"10.1016\/j.cell.2015.01.009","article-title":"A micropeptide encoded by a putative long noncoding RNA regulates muscle performance","volume":"160","author":"Anderson","year":"2015","journal-title":"Cell"},{"key":"2023052021585520300_ref10","doi-asserted-by":"crossref","first-page":"e36972","DOI":"10.1371\/journal.pone.0036972","article-title":"Phylogenomics of prokaryotic ribosomal 
proteins","volume":"7","author":"Yutin","year":"2012","journal-title":"PloS One"},{"key":"2023052021585520300_ref11","doi-asserted-by":"crossref","first-page":"81","DOI":"10.1016\/j.mib.2017.09.010","article-title":"Small bacterial and phagic proteins: an updated view on a rapidly moving field","volume":"39","author":"Duval","year":"2017","journal-title":"Curr Opin Microbiol"},{"key":"2023052021585520300_ref12","doi-asserted-by":"crossref","first-page":"16696","DOI":"10.1073\/pnas.1210093109","article-title":"Conserved small protein associates with the multidrug efflux pump AcrB and differentially affects antibiotic resistance","volume":"109","author":"Hobbs","year":"2012","journal-title":"Proc Natl Acad Sci USA"},{"key":"2023052021585520300_ref13","doi-asserted-by":"crossref","first-page":"e1005641","DOI":"10.1371\/journal.pgen.1005641","article-title":"Leaderless transcripts and small proteins are common features of the mycobacterial translational landscape","volume":"11","author":"Shell","year":"2015","journal-title":"PLoS Genet"},{"key":"2023052021585520300_ref14","doi-asserted-by":"crossref","first-page":"983","DOI":"10.1534\/g3.116.036939","article-title":"Identification of unannotated small genes in salmonella","volume":"7","author":"Baek","year":"2017","journal-title":"G3 (Bethesda)"},{"key":"2023052021585520300_ref15","article-title":"Identifying small proteins by ribosome profiling with stalled initiation complexes","volume":"10","author":"Weaver","journal-title":"mBio"},{"key":"2023052021585520300_ref16","doi-asserted-by":"crossref","first-page":"481","DOI":"10.1016\/j.molcel.2019.02.017","article-title":"Retapamulin-assisted ribosome profiling reveals the alternative bacterial proteome","volume":"74","author":"Meydan","year":"2019","journal-title":"Mol Cell"},{"key":"2023052021585520300_ref17","doi-asserted-by":"crossref","first-page":"103604","DOI":"10.1016\/j.jprot.2019.103604","article-title":"Enrichment and identification of small proteins in a 
simplified human gut microbiome","volume":"213","author":"Petruschke","year":"2020","journal-title":"J Proteomics"},{"key":"2023052021585520300_ref18","doi-asserted-by":"crossref","first-page":"e1009585","DOI":"10.1371\/journal.pgen.1009585","article-title":"Towards the characterization of the hidden world of small proteins in Staphylococcus aureus, a proteogenomics approach","volume":"17","author":"Fuchs","year":"2021","journal-title":"PLoS Genet"},{"key":"2023052021585520300_ref19","doi-asserted-by":"crossref","first-page":"3268","DOI":"10.1038\/s41467-020-17081-z","article-title":"MetaRibo-Seq measures translation in microbiomes","volume":"11","author":"Fremin","year":"2020","journal-title":"Nat Commun"},{"key":"2023052021585520300_ref20","article-title":"microProteInS - a proteogenomics pipeline for finding novel bacterial microproteins encoded by small ORFs","volume":"38","author":"Souza","journal-title":"Bioinformatics"},{"key":"2023052021585520300_ref21","doi-asserted-by":"crossref","first-page":"e45103","DOI":"10.1371\/journal.pone.0045103","article-title":"Predicting statistical properties of open reading frames in bacterial genomes","volume":"7","author":"Mir","year":"2012","journal-title":"PLoS One"},{"key":"2023052021585520300_ref22","doi-asserted-by":"crossref","first-page":"1245","DOI":"10.1016\/j.cell.2019.07.016","article-title":"Large-scale analyses of human microbiomes reveal thousands of small, novel genes","volume":"178","author":"Sberro","year":"2019","journal-title":"Cell"},{"key":"2023052021585520300_ref23","doi-asserted-by":"crossref","first-page":"e1700064","DOI":"10.1002\/pmic.201700064","article-title":"Identifying new small proteins in Escherichia coli","volume":"18","author":"VanOrsdel","year":"2018","journal-title":"Proteomics"},{"key":"2023052021585520300_ref24","doi-asserted-by":"crossref","first-page":"1487","DOI":"10.1111\/j.1365-2958.2008.06495.x","article-title":"Small membrane proteins found by comparative genomics and ribosome 
binding site models","volume":"70","author":"Hemm","year":"2008","journal-title":"Mol Microbiol"},{"key":"2023052021585520300_ref25","doi-asserted-by":"crossref","first-page":"578","DOI":"10.1261\/rna.2536111","article-title":"RNAcode: robust discrimination of coding and noncoding regions in comparative sequence data","volume":"17","author":"Washietl","year":"2011","journal-title":"RNA"},{"key":"2023052021585520300_ref26","article-title":"OCCAM: prediction of small ORFs in bacterial genomes by means of a target-decoy database approach and machine learning techniques","volume":"2020","author":"RC","year":"2020","journal-title":"Database (Oxford)"},{"key":"2023052021585520300_ref27","doi-asserted-by":"crossref","first-page":"403","DOI":"10.1016\/S0022-2836(05)80360-2","article-title":"Basic local alignment search tool","volume":"215","author":"Altschul","year":"1990","journal-title":"J Mol Biol"},{"key":"2023052021585520300_ref28","doi-asserted-by":"crossref","first-page":"946","DOI":"10.1186\/1471-2164-15-946","article-title":"Conservation analysis of the CydX protein yields insights into small protein identification and evolution","volume":"15","author":"Allen","year":"2014","journal-title":"BMC Genomics"},{"key":"2023052021585520300_ref29","doi-asserted-by":"crossref","first-page":"119","DOI":"10.1186\/1471-2105-11-119","article-title":"Prodigal: prokaryotic gene recognition and translation initiation site identification","volume":"11","author":"Hyatt","year":"2010","journal-title":"BMC Bioinformatics"},{"key":"2023052021585520300_ref30","doi-asserted-by":"crossref","first-page":"103933","DOI":"10.1016\/j.compbiomed.2020.103933","article-title":"VEPAD - predicting the effect of variants associated with Alzheimer's disease using machine learning","volume":"124","author":"Rangaswamy","year":"2020","journal-title":"Comput Biol 
Med"},{"key":"2023052021585520300_ref31","doi-asserted-by":"crossref","first-page":"1690","DOI":"10.3389\/fphar.2019.01690","article-title":"SAMbinder: a web server for predicting S-Adenosyl-L-methionine binding residues of a protein from its amino acid sequence","volume":"10","author":"Agrawal","year":"2019","journal-title":"Front Pharmacol"},{"key":"2023052021585520300_ref32","doi-asserted-by":"crossref","first-page":"4118","DOI":"10.1093\/bioinformatics\/bty496","article-title":"Operon-mapper: a web server for precise operon identification in bacterial and archaeal genomes","volume":"34","author":"Taboada","year":"2018","journal-title":"Bioinformatics"},{"key":"2023052021585520300_ref33","doi-asserted-by":"crossref","first-page":"2","DOI":"10.1186\/s40360-018-0282-6","article-title":"eToxPred: a machine learning-based approach to estimate the toxicity of drug candidates","volume":"20","author":"Pu","year":"2019","journal-title":"BMC Pharmacol Toxicol"},{"key":"2023052021585520300_ref34","doi-asserted-by":"crossref","first-page":"17314","DOI":"10.1038\/s41598-017-17330-0","article-title":"Identification of a gene signature for discriminating metastatic from primary melanoma using a molecular interaction network approach","volume":"7","author":"Metri","year":"2017","journal-title":"Sci Rep"},{"key":"2023052021585520300_ref35","doi-asserted-by":"crossref","first-page":"e8290","DOI":"10.15252\/msb.20188290","article-title":"Unraveling the hidden universe of small proteins in bacterial genomes","volume":"15","author":"Miravet-Verde","year":"2019","journal-title":"Mol Syst Biol"},{"key":"2023052021585520300_ref36","doi-asserted-by":"crossref","first-page":"121","DOI":"10.1016\/j.chom.2020.11.002","article-title":"Automated prediction and annotation of small open reading frames in microbial genomes","volume":"29","author":"Durrant","year":"2021","journal-title":"Cell Host 
Microbe"},{"key":"2023052021585520300_ref37","doi-asserted-by":"crossref","first-page":"e36","DOI":"10.1093\/nar\/gkz061","article-title":"DeepRibo: a neural network for precise gene annotation of prokaryotes by combining ribosome profiling signal and binding site patterns","volume":"47","author":"Clauwaert","year":"2019","journal-title":"Nucleic Acids Res"},{"key":"2023052021585520300_ref38","doi-asserted-by":"crossref","first-page":"e168","DOI":"10.1093\/nar\/gkx758","article-title":"REPARATION: ribosome profiling assisted (re-)annotation of bacterial genomes","volume":"45","author":"Ndah","year":"2017","journal-title":"Nucleic Acids Res"},{"key":"2023052021585520300_ref39","doi-asserted-by":"crossref","first-page":"e89","DOI":"10.1093\/nar\/gkab477","article-title":"smORFer: a modular algorithm to detect small ORFs in prokaryotes","volume":"49","author":"Bartholomaus","year":"2021","journal-title":"Nucleic Acids Res"},{"key":"2023052021585520300_ref40","doi-asserted-by":"crossref","first-page":"D480","DOI":"10.1093\/nar\/gkaa1100","article-title":"UniProt: the universal protein knowledgebase in 2021","volume":"49","author":"UniProt","year":"2021","journal-title":"Nucleic Acids Res"},{"key":"2023052021585520300_ref41","doi-asserted-by":"crossref","first-page":"D851","DOI":"10.1093\/nar\/gkx1068","article-title":"RefSeq: an update on prokaryotic genome annotation and curation","volume":"46","author":"Haft","year":"2018","journal-title":"Nucleic Acids Res"},{"key":"2023052021585520300_ref42","doi-asserted-by":"crossref","first-page":"193","DOI":"10.1093\/dnares\/dsw008","article-title":"Comprehensive identification of translation start sites by tetracycline-inhibited ribosome profiling","volume":"23","author":"Nakahigashi","year":"2016","journal-title":"DNA Res"},{"key":"2023052021585520300_ref43","doi-asserted-by":"crossref","DOI":"10.1093\/femsml\/uqaa002","article-title":"A global data-driven census of salmonella small proteins and their potential functions in 
bacterial virulence","volume":"1","author":"Venturini","year":"2020","journal-title":"microLife"},{"key":"2023052021585520300_ref44","doi-asserted-by":"crossref","first-page":"e0124722","DOI":"10.1128\/mbio.01247-22","article-title":"Discovery of unannotated small open reading frames in Streptococcus pneumoniae D39 involved in quorum sensing and virulence using ribosome profiling","volume":"13","author":"Laczkovich","year":"2022","journal-title":"MBio"},{"key":"2023052021585520300_ref45","doi-asserted-by":"crossref","first-page":"e1004463","DOI":"10.1371\/journal.pgen.1004463","article-title":"The coding and noncoding architecture of the Caulobacter crescentus genome","volume":"10","author":"Schrader","year":"2014","journal-title":"PLoS Genet"},{"key":"2023052021585520300_ref46","doi-asserted-by":"crossref","first-page":"2479","DOI":"10.1093\/bioinformatics\/bth261","article-title":"Data mining in bioinformatics using Weka","volume":"20","author":"Frank","year":"2004","journal-title":"Bioinformatics"},{"key":"2023052021585520300_ref47","doi-asserted-by":"crossref","first-page":"3940","DOI":"10.1093\/bioinformatics\/bti623","article-title":"ROCR: visualizing classifier performance in R","volume":"21","author":"Sing","year":"2005","journal-title":"Bioinformatics"},{"key":"2023052021585520300_ref48","doi-asserted-by":"crossref","first-page":"1047","DOI":"10.1093\/bib\/bbz041","article-title":"iLearn: an integrated platform and meta-learner for feature engineering, machine-learning analysis and modeling of DNA, RNA and protein sequence data","volume":"21","author":"Chen","year":"2020","journal-title":"Brief Bioinform"},{"key":"2023052021585520300_ref49","doi-asserted-by":"crossref","first-page":"3019","DOI":"10.1093\/bioinformatics\/btab090","article-title":"Orfipy: a fast and flexible tool for extracting 
ORFs","volume":"37","author":"Singh","year":"2021","journal-title":"Bioinformatics"},{"key":"2023052021585520300_ref50","doi-asserted-by":"crossref","first-page":"D222","DOI":"10.1093\/nar\/gku1221","article-title":"CDD: NCBI's conserved domain database","volume":"43","author":"Marchler-Bauer","year":"2015","journal-title":"Nucleic Acids Res"},{"key":"2023052021585520300_ref51","doi-asserted-by":"crossref","first-page":"1188","DOI":"10.1101\/gr.849004","article-title":"WebLogo: a sequence logo generator","volume":"14","author":"Crooks","year":"2004","journal-title":"Genome Res"},{"key":"2023052021585520300_ref52","first-page":"917540","article-title":"Position weight matrix, gibbs sampler, and the associated significance tests in motif characterization and prediction","volume":"2012","author":"Xia","year":"2012","journal-title":"Scientifica (Cairo)"},{"key":"2023052021585520300_ref53","doi-asserted-by":"crossref","first-page":"604","DOI":"10.1186\/s12864-015-1808-6","article-title":"Distribution and diversity of ribosome binding sites in prokaryotic genomes","volume":"16","author":"Omotajo","year":"2015","journal-title":"BMC Genomics"},{"key":"2023052021585520300_ref54","doi-asserted-by":"crossref","first-page":"469","DOI":"10.1186\/s12859-019-3039-3","article-title":"A classification model for lncRNA and mRNA based on k-mers and a convolutional neural network","volume":"20","author":"Wen","year":"2019","journal-title":"BMC Bioinformatics"},{"key":"2023052021585520300_ref55","doi-asserted-by":"crossref","first-page":"635","DOI":"10.1007\/s00726-007-0016-3","article-title":"Combing ontologies and dipeptide composition for predicting DNA-binding proteins","volume":"34","author":"Nanni","year":"2008","journal-title":"Amino Acids"},{"key":"2023052021585520300_ref56","doi-asserted-by":"crossref","first-page":"3691","DOI":"10.1093\/bioinformatics\/btv421","article-title":"Roary: rapid large-scale prokaryote pan genome 
analysis","volume":"31","author":"Page","year":"2015","journal-title":"Bioinformatics"},{"key":"2023052021585520300_ref57","doi-asserted-by":"crossref","first-page":"D412","DOI":"10.1093\/nar\/gkaa913","article-title":"Pfam: the protein families database in 2021","volume":"49","author":"Mistry","year":"2021","journal-title":"Nucleic Acids Res"}],"container-title":["Briefings in Bioinformatics"],"original-title":[],"language":"en","link":[{"URL":"https:\/\/academic.oup.com\/bib\/article-pdf\/24\/3\/bbad101\/50410196\/bbad101.pdf","content-type":"application\/pdf","content-version":"vor","intended-application":"syndication"},{"URL":"https:\/\/academic.oup.com\/bib\/article-pdf\/24\/3\/bbad101\/50410196\/bbad101.pdf","content-type":"unspecified","content-version":"vor","intended-application":"similarity-checking"}],"deposited":{"date-parts":[[2023,5,20]],"date-time":"2023-05-20T21:59:31Z","timestamp":1684619971000},"score":1,"resource":{"primary":{"URL":"https:\/\/academic.oup.com\/bib\/article\/doi\/10.1093\/bib\/bbad101\/7079710"}},"subtitle":[],"short-title":[],"issued":{"date-parts":[[2023,3,17]]},"references-count":57,"journal-issue":{"issue":"3","published-print":{"date-parts":[[2023,5,19]]}},"URL":"https:\/\/doi.org\/10.1093\/bib\/bbad101","relation":{},"ISSN":["1467-5463","1477-4054"],"issn-type":[{"value":"1467-5463","type":"print"},{"value":"1477-4054","type":"electronic"}],"subject":[],"published-other":{"date-parts":[[2023,5]]},"published":{"date-parts":[[2023,3,17]]},"article-number":"bbad101"}}