{"status":"ok","message-type":"work","message-version":"1.0.0","message":{"indexed":{"date-parts":[[2026,2,26]],"date-time":"2026-02-26T20:34:32Z","timestamp":1772138072291,"version":"3.50.1"},"reference-count":31,"publisher":"Oxford University Press (OUP)","issue":"3","license":[{"start":{"date-parts":[[2025,3,7]],"date-time":"2025-03-07T00:00:00Z","timestamp":1741305600000},"content-version":"vor","delay-in-days":29,"URL":"https:\/\/creativecommons.org\/licenses\/by\/4.0\/"}],"funder":[{"DOI":"10.13039\/501100001691","name":"Japan Society for the Promotion of Science","doi-asserted-by":"publisher","award":["KAKENHI JP16H06279"],"award-info":[{"award-number":["KAKENHI JP16H06279"]}],"id":[{"id":"10.13039\/501100001691","id-type":"DOI","asserted-by":"publisher"}]}],"content-domain":{"domain":[],"crossmark-restriction":false},"short-container-title":[],"published-print":{"date-parts":[[2025,3,4]]},"abstract":"<jats:title>Abstract<\/jats:title>\n                  <jats:sec>\n                    <jats:title>Motivation<\/jats:title>\n                    <jats:p>Progress in sequencing technology has led to the determination of large numbers of protein sequences, and large enzyme databases are now available. Although many computational tools for enzyme annotation have been developed, sequence information is unavailable for many enzymes, known as orphan enzymes. These orphan enzymes hinder sequence similarity-based functional annotation, leading to gaps in understanding the association between sequences and enzymatic reactions.<\/jats:p>\n                  <\/jats:sec>\n                  <jats:sec>\n                    <jats:title>Results<\/jats:title>\n                    <jats:p>Therefore, we developed DeepES, a deep learning-based tool for enzyme screening to identify orphan enzyme genes, focusing on biosynthetic gene clusters and reaction class. 
DeepES uses protein sequences as inputs and evaluates whether the input genes contain biosynthetic gene clusters of interest by integrating the outputs of the binary classifier for each reaction class. The validation results suggested that DeepES can capture functional similarity between protein sequences, and it can be implemented to explore orphan enzyme genes. By applying DeepES to 4744 metagenome-assembled genomes, we identified candidate genes for 236 orphan enzymes, including those involved in short-chain fatty acid production as a characteristic pathway in human gut bacteria.<\/jats:p>\n                  <\/jats:sec>\n                  <jats:sec>\n                    <jats:title>Availability and implementation<\/jats:title>\n                    <jats:p>DeepES is available at https:\/\/github.com\/yamada-lab\/DeepES. Model weights and the candidate genes are available at Zenodo (https:\/\/doi.org\/10.5281\/zenodo.11123900).<\/jats:p>\n                  <\/jats:sec>","DOI":"10.1093\/bioinformatics\/btaf053","type":"journal-article","created":{"date-parts":[[2025,2,5]],"date-time":"2025-02-05T22:42:16Z","timestamp":1738795336000},"source":"Crossref","is-referenced-by-count":3,"title":["DeepES: deep learning-based enzyme screening to identify orphan enzyme genes"],"prefix":"10.1093","volume":"41","author":[{"ORCID":"https:\/\/orcid.org\/0009-0003-6697-2355","authenticated-orcid":false,"given":"Keisuke","family":"Hirota","sequence":"first","affiliation":[{"name":"School of Life Science and Technology, Institute of Science Tokyo , Tokyo, 152-8550,","place":["Japan"]}]},{"given":"Felix","family":"Salim","sequence":"additional","affiliation":[{"name":"School of Life Science and Technology, Institute of Science Tokyo , Tokyo, 152-8550,","place":["Japan"]}]},{"ORCID":"https:\/\/orcid.org\/0000-0002-9622-1849","authenticated-orcid":false,"given":"Takuji","family":"Yamada","sequence":"additional","affiliation":[{"name":"School of Life Science and Technology, Institute 
of Science Tokyo , Tokyo, 152-8550,","place":["Japan"]},{"name":"Metagen, Inc. , Yamagata, 997-0052,","place":["Japan"]},{"name":"Metagen Therapeutics, Inc. , Yamagata, 997-0052,","place":["Japan"]},{"name":"digzyme, Inc. , Tokyo, 105-0001,","place":["Japan"]}]}],"member":"286","published-online":{"date-parts":[[2025,2,6]]},"reference":[{"key":"2025030711564358600_btaf053-B1","doi-asserted-by":"crossref","first-page":"403","DOI":"10.1016\/S0022-2836(05)80360-2","article-title":"Basic local alignment search tool","volume":"215","author":"Altschul","year":"1990","journal-title":"J Mol Biol"},{"key":"2025030711564358600_btaf053-B2","doi-asserted-by":"crossref","first-page":"btad620","DOI":"10.1093\/bioinformatics\/btad620","article-title":"Predicting enzymatic function of protein sequences with attention","volume":"39","author":"Buton","year":"2023","journal-title":"Bioinformatics"},{"key":"2025030711564358600_btaf053-B3","doi-asserted-by":"crossref","first-page":"2825","DOI":"10.1093\/bioinformatics\/btab198","article-title":"TALE: transformer-based protein function annotation with joint sequence\u2013label embedding","volume":"37","author":"Cao","year":"2021","journal-title":"Bioinformatics"},{"key":"2025030711564358600_btaf053-B4","doi-asserted-by":"crossref","first-page":"D498","DOI":"10.1093\/nar\/gkaa1025","article-title":"BRENDA, the ELIXIR core data resource in 2021: new developments and updates","volume":"49","author":"Chang","year":"2021","journal-title":"Nucleic Acids Res"},{"key":"2025030711564358600_btaf053-B5","doi-asserted-by":"crossref","first-page":"1028","DOI":"10.1016\/j.foodchem.2017.08.003","article-title":"Absorption and metabolism of yerba mate phenolic compounds in humans","volume":"240","author":"G\u00f3mez-Juaristi","year":"2018","journal-title":"Food Chem"},{"key":"2025030711564358600_btaf053-B6","doi-asserted-by":"crossref","first-page":"583","DOI":"10.1038\/s41586-021-03819-2","article-title":"Highly accurate protein structure prediction 
with AlphaFold","volume":"596","author":"Jumper","year":"2021","journal-title":"Nature"},{"key":"2025030711564358600_btaf053-B7","doi-asserted-by":"crossref","first-page":"D545","DOI":"10.1093\/nar\/gkaa970","article-title":"KEGG: integrating viruses and cellular organisms","volume":"49","author":"Kanehisa","year":"2021","journal-title":"Nucleic Acids Res"},{"key":"2025030711564358600_btaf053-B8","doi-asserted-by":"crossref","first-page":"D490","DOI":"10.1093\/nar\/gkaa812","article-title":"BiG-FAM: the biosynthetic gene cluster families database","volume":"49","author":"Kautsar","year":"2021","journal-title":"Nucleic Acids Res"},{"key":"2025030711564358600_btaf053-B9","doi-asserted-by":"crossref","first-page":"7370","DOI":"10.1038\/s41467-023-43216-z","article-title":"Functional annotation of enzyme-encoding genes using deep learning with transformer layers","volume":"14","author":"Kim","year":"2023","journal-title":"Nat Commun"},{"key":"2025030711564358600_btaf053-B10","doi-asserted-by":"crossref","first-page":"16487","DOI":"10.1021\/ja0466457","article-title":"Computational assignment of the EC numbers for genomic-scale analysis of enzymatic reactions","volume":"126","author":"Kotera","year":"2004","journal-title":"J Am Chem Soc"},{"key":"2025030711564358600_btaf053-B11","doi-asserted-by":"crossref","first-page":"42","DOI":"10.1126\/science.307.5706.42a","article-title":"Orphan enzymes?","volume":"307","author":"Lespinet","year":"2005","journal-title":"Science"},{"key":"2025030711564358600_btaf053-B12","doi-asserted-by":"crossref","first-page":"1123","DOI":"10.1126\/science.ade2574","article-title":"Evolutionary-scale prediction of atomic-level protein structure with a language model","volume":"379","author":"Lin","year":"2023","journal-title":"Science"},{"key":"2025030711564358600_btaf053-B13","doi-asserted-by":"crossref","first-page":"1160","DOI":"10.1038\/s41598-020-80786-0","article-title":"Embeddings from deep learning transfer GO annotations beyond 
homology","volume":"11","author":"Littmann","year":"2021","journal-title":"Sci Rep"},{"key":"2025030711564358600_btaf053-B14","doi-asserted-by":"crossref","first-page":"84","DOI":"10.1016\/j.gpb.2022.07.002","article-title":"Gut microbiome in colorectal cancer: clinical diagnosis and treatment","volume":"21","author":"Liu","year":"2023","journal-title":"Genom Proteom Bioinform"},{"key":"2025030711564358600_btaf053-B15","doi-asserted-by":"crossref","first-page":"625","DOI":"10.1038\/nchembio.1890","article-title":"Minimum information about a biosynthetic gene cluster","volume":"11","author":"Medema","year":"2015","journal-title":"Nat Chem Biol"},{"key":"2025030711564358600_btaf053-B16","doi-asserted-by":"crossref","first-page":"510","DOI":"10.1021\/acs.jcim.5b00216","article-title":"Identification of enzyme genes using chemical structure alignments of substrate\u2013product pairs","volume":"56","author":"Moriya","year":"2016","journal-title":"J Chem Inf Model"},{"key":"2025030711564358600_btaf053-B17","doi-asserted-by":"crossref","first-page":"613","DOI":"10.1021\/ci3005379","article-title":"Modular architecture of metabolic pathways revealed by conserved sequences of reactions","volume":"53","author":"Muto","year":"2013","journal-title":"J Chem Inf Model"},{"key":"2025030711564358600_btaf053-B18","doi-asserted-by":"publisher","first-page":"1486","DOI":"10.3389\/fimmu.2019.00277","article-title":"Short chain fatty acids (SCFAs)-mediated gut epithelial and immune regulation and its relevance for inflammatory bowel diseases","volume":"10","author":"Parada Venegas","year":"2019","journal-title":"Front Immunol"},{"key":"2025030711564358600_btaf053-B19","doi-asserted-by":"crossref","first-page":"244","DOI":"10.1186\/1471-2105-8-244","article-title":"A survey of orphan enzyme activities","volume":"8","author":"Pouliot","year":"2007","journal-title":"BMC 
Bioinformatics"},{"key":"2025030711564358600_btaf053-B20","doi-asserted-by":"crossref","first-page":"1325","DOI":"10.1186\/s12885-021-09054-2","article-title":"Gut microbiome and its role in colorectal cancer","volume":"21","author":"Rebersek","year":"2021","journal-title":"BMC Cancer"},{"key":"2025030711564358600_btaf053-B21","doi-asserted-by":"crossref","first-page":"D753","DOI":"10.1093\/nar\/gkac1080","article-title":"MGnify: the microbiome sequence data analysis resource in 2023","volume":"51","author":"Richardson","year":"2023","journal-title":"Nucleic Acids Res"},{"key":"2025030711564358600_btaf053-B22","doi-asserted-by":"crossref","first-page":"13996","DOI":"10.1073\/pnas.1821905116","article-title":"Deep learning enables high-quality and high-throughput prediction of enzyme commission numbers","volume":"116","author":"Ryu","year":"2019","journal-title":"Proc Natl Acad Sci U S A"},{"key":"2025030711564358600_btaf053-B23","doi-asserted-by":"crossref","first-page":"e80942","DOI":"10.7554\/eLife.80942","article-title":"ProteInfer, deep neural networks for protein functional inference","volume":"12","author":"Sanderson","year":"2023","journal-title":"Elife"},{"key":"2025030711564358600_btaf053-B24","doi-asserted-by":"crossref","first-page":"bbae419","DOI":"10.1093\/bib\/bbae419","article-title":"Enteropathway: the metabolic pathway database for the human gut microbiota","volume":"25","author":"Shiroma","year":"2024","journal-title":"Brief Bioinform"},{"key":"2025030711564358600_btaf053-B25","doi-asserted-by":"crossref","first-page":"10","DOI":"10.1186\/1745-6150-9-10","article-title":"Profiling the orphan enzymes","volume":"9","author":"Sorokina","year":"2014","journal-title":"Biol Direct"},{"key":"2025030711564358600_btaf053-B26","doi-asserted-by":"crossref","first-page":"1026","DOI":"10.1038\/nbt.3988","article-title":"MMseqs2 enables sensitive protein sequence searching for the analysis of massive data 
sets","volume":"35","author":"Steinegger","year":"2017","journal-title":"Nat Biotechnol"},{"key":"2025030711564358600_btaf053-B27","doi-asserted-by":"crossref","first-page":"7344","DOI":"10.1038\/s41598-019-43708-3","article-title":"DEEPred: automated protein function prediction with multi-task feed-forward deep neural networks","volume":"9","author":"Sureyya Rifaioglu","year":"2019","journal-title":"Sci Rep"},{"key":"2025030711564358600_btaf053-B28","doi-asserted-by":"crossref","first-page":"7292","DOI":"10.1128\/JB.187.21.7292-7308.2005","article-title":"Whole-genome sequencing of Staphylococcus haemolyticus uncovers the extreme plasticity of its genome and the evolution of human-colonizing staphylococcal species","volume":"187","author":"Takeuchi","year":"2005","journal-title":"J Bacteriol"},{"key":"2025030711564358600_btaf053-B29","doi-asserted-by":"crossref","first-page":"581","DOI":"10.1038\/msb.2012.13","article-title":"Prediction and identification of sequences coding for orphan enzymes using genomic and metagenomic neighbours","volume":"8","author":"Yamada","year":"2012","journal-title":"Mol Syst Biol"},{"key":"2025030711564358600_btaf053-B30","doi-asserted-by":"crossref","first-page":"1358","DOI":"10.1126\/science.adf2465","article-title":"Enzyme function prediction using contrastive learning","volume":"379","author":"Yu","year":"2023","journal-title":"Science"},{"key":"2025030711564358600_btaf053-B31","doi-asserted-by":"crossref","first-page":"137","DOI":"10.1016\/j.gce.2022.10.003","article-title":"Enzyme annotation for orphan reactions and its applications in biomanufacturing","volume":"4","author":"Zhang","year":"2023","journal-title":"Green Chem 
Eng"}],"container-title":["Bioinformatics"],"original-title":[],"language":"en","link":[{"URL":"https:\/\/academic.oup.com\/bioinformatics\/advance-article-pdf\/doi\/10.1093\/bioinformatics\/btaf053\/61772428\/btaf053.pdf","content-type":"application\/pdf","content-version":"am","intended-application":"syndication"},{"URL":"https:\/\/academic.oup.com\/bioinformatics\/article-pdf\/41\/3\/btaf053\/62327836\/btaf053.pdf","content-type":"application\/pdf","content-version":"vor","intended-application":"syndication"},{"URL":"https:\/\/academic.oup.com\/bioinformatics\/article-pdf\/41\/3\/btaf053\/62327836\/btaf053.pdf","content-type":"unspecified","content-version":"vor","intended-application":"similarity-checking"}],"deposited":{"date-parts":[[2025,3,7]],"date-time":"2025-03-07T06:57:52Z","timestamp":1741330672000},"score":1,"resource":{"primary":{"URL":"https:\/\/academic.oup.com\/bioinformatics\/article\/doi\/10.1093\/bioinformatics\/btaf053\/8002965"}},"subtitle":[],"editor":[{"given":"Lenore","family":"Cowen","sequence":"additional","affiliation":[]}],"short-title":[],"issued":{"date-parts":[[2025,2,6]]},"references-count":31,"journal-issue":{"issue":"3","published-print":{"date-parts":[[2025,3,4]]}},"URL":"https:\/\/doi.org\/10.1093\/bioinformatics\/btaf053","relation":{"has-preprint":[{"id-type":"doi","id":"10.1101\/2024.05.09.592857","asserted-by":"object"}]},"ISSN":["1367-4811"],"issn-type":[{"value":"1367-4811","type":"electronic"}],"subject":[],"published-other":{"date-parts":[[2025,3]]},"published":{"date-parts":[[2025,2,6]]},"article-number":"btaf053"}}