{"status":"ok","message-type":"work","message-version":"1.0.0","message":{"indexed":{"date-parts":[[2026,2,26]],"date-time":"2026-02-26T20:34:35Z","timestamp":1772138075266,"version":"3.50.1"},"reference-count":23,"publisher":"Oxford University Press (OUP)","issue":"6","license":[{"start":{"date-parts":[[2025,5,15]],"date-time":"2025-05-15T00:00:00Z","timestamp":1747267200000},"content-version":"vor","delay-in-days":0,"URL":"https:\/\/creativecommons.org\/licenses\/by\/4.0\/"}],"funder":[{"DOI":"10.13039\/501100001665","name":"French National Research Agency","doi-asserted-by":"publisher","award":["ANR-21-ESRE-0021"],"award-info":[{"award-number":["ANR-21-ESRE-0021"]}],"id":[{"id":"10.13039\/501100001665","id-type":"DOI","asserted-by":"publisher"}]}],"content-domain":{"domain":[],"crossmark-restriction":false},"short-container-title":[],"published-print":{"date-parts":[[2025,6,2]]},"abstract":"<jats:title>Abstract<\/jats:title>\n                  <jats:sec>\n                    <jats:title>Motivation<\/jats:title>\n                    <jats:p>The analysis of enzyme active sites is essential for understanding their activity in terms of catalyzed reaction and substrate specificity, providing insights for engineering to obtain targeted properties or modify the substrate scope. In 2010, a first version of the Active Site Modeling and Clustering (ASMC) workflow was published. ASMC predicts isofunctional clusters from enzyme families, based on structural modeling and clustering of active sites. Since then, structure- and sequence-based methods have developed considerably.<\/jats:p>\n                  <\/jats:sec>\n                  <jats:sec>\n                    <jats:title>Results<\/jats:title>\n                    <jats:p>We present here a redesign of the ASMC workflow. This new major version includes recent pocket prediction, structural alignment and clustering methods, as well as a refined amino acid distance matrix, thereby improving the relevance of results and reducing the need for laborious manual analysis to obtain relevant clusters. In addition, we have implemented multiple sequence alignment as a possible input for the clustering step, along with an additional script to compare 2D and 3D active sites. Finally, the code has been unified from three to one programming language (Python) to facilitate its installation and maintenance. This new version of ASMC was evaluated on a set of protein families, resulting in overall better performances compared to its original version.<\/jats:p>\n                  <\/jats:sec>\n                  <jats:sec>\n                    <jats:title>Availability and implementation<\/jats:title>\n                    <jats:p>ASMC is supported on Linux operating system and freely available at https:\/\/github.com\/labgem\/ASMC, along with a complete documentation (wiki, tutorial).<\/jats:p>\n                  <\/jats:sec>","DOI":"10.1093\/bioinformatics\/btaf307","type":"journal-article","created":{"date-parts":[[2025,5,14]],"date-time":"2025-05-14T08:18:14Z","timestamp":1747210694000},"source":"Crossref","is-referenced-by-count":0,"title":["ASMC: investigating the amino acid diversity of enzyme active sites"],"prefix":"10.1093","volume":"41","author":[{"given":"Thomas","family":"Bailly","sequence":"first","affiliation":[{"name":"LABGeM, G\u00e9nomique M\u00e9tabolique, Genoscope, Institut Fran\u00e7ois Jacob, CEA, CNRS, Universit\u00e9 Evry, Universit\u00e9 Paris-Saclay , 91057 Evry,","place":["France"]}]},{"ORCID":"https:\/\/orcid.org\/0000-0003-2971-5782","authenticated-orcid":false,"given":"Eddy","family":"Elis\u00e9e","sequence":"additional","affiliation":[{"name":"LABGeM, G\u00e9nomique M\u00e9tabolique, Genoscope, Institut Fran\u00e7ois Jacob, CEA, CNRS, Universit\u00e9 Evry, Universit\u00e9 Paris-Saclay , 91057 Evry,","place":["France"]}]},{"ORCID":"https:\/\/orcid.org\/0000-0001-6648-0332","authenticated-orcid":false,"given":"David","family":"Vallenet","sequence":"additional","affiliation":[{"name":"LABGeM, G\u00e9nomique M\u00e9tabolique, Genoscope, Institut Fran\u00e7ois Jacob, CEA, CNRS, Universit\u00e9 Evry, Universit\u00e9 Paris-Saclay , 91057 Evry,","place":["France"]}]}],"member":"286","published-online":{"date-parts":[[2025,5,15]]},"reference":[{"key":"2025070408320879200_btaf307-B1","doi-asserted-by":"crossref","first-page":"16587","DOI":"10.1038\/s41598-018-34795-9","article-title":"Structural studies based on two lysine dioxygenases with distinct regioselectivity brings insights into enzyme specificity within the clavaminate synthase-like family","volume":"8","author":"Bastard","year":"2018","journal-title":"Sci Rep"},{"key":"2025070408320879200_btaf307-B2","doi-asserted-by":"crossref","first-page":"858","DOI":"10.1038\/nchembio.2397","article-title":"Parallel evolution of non-homologous isofunctional enzymes in methionine biosynthesis","volume":"13","author":"Bastard","year":"2017","journal-title":"Nat Chem Biol"},{"key":"2025070408320879200_btaf307-B3","doi-asserted-by":"crossref","first-page":"42","DOI":"10.1038\/nchembio.1387","article-title":"Revealing the hidden functional diversity of an enzyme family","volume":"10","author":"Bastard","year":"2014","journal-title":"Nat Chem Biol"},{"key":"2025070408320879200_btaf307-B4","doi-asserted-by":"crossref","first-page":"e160","DOI":"10.1371\/journal.pcbi.0030160","article-title":"Automated protein subfamily identification and classification","volume":"3","author":"Brown","year":"2007","journal-title":"PLoS Comput Biol"},{"key":"2025070408320879200_btaf307-B5","doi-asserted-by":"crossref","first-page":"585","DOI":"10.1186\/s12864-023-09677-8","article-title":"plotnineSeqSuite: a Python package for visualizing sequence data using ggplot2 style","volume":"24","author":"Cao","year":"2023","journal-title":"BMC Genomics"},{"key":"2025070408320879200_btaf307-B6","doi-asserted-by":"crossref","first-page":"W148","DOI":"10.1093\/nar\/gkv488","article-title":"CATH FunFHMMer web server: protein functional annotations using functional family assignments","volume":"43","author":"Das","year":"2015","journal-title":"Nucleic Acids Res"},{"key":"2025070408320879200_btaf307-B7","doi-asserted-by":"crossref","first-page":"e1005001","DOI":"10.1371\/journal.pcbi.1005001","article-title":"Isofunctional protein subfamily detection using data integration and spectral clustering","volume":"12","author":"de Lima","year":"2016","journal-title":"PLoS Comput. Biol"},{"key":"2025070408320879200_btaf307-B8","doi-asserted-by":"crossref","first-page":"3075","DOI":"10.1093\/bioinformatics\/btq595","article-title":"Identification of subfamily-specific sites based on active sites modeling and clustering","volume":"26","author":"de Melo-Minardi","year":"2010","journal-title":"Bioinformatics"},{"key":"2025070408320879200_btaf307-B9","doi-asserted-by":"crossref","first-page":"4933","DOI":"10.1038\/s41467-024-49009-2","article-title":"A refined picture of the native amine dehydrogenase family revealed by extensive biodiversity screening","volume":"15","author":"Elis\u00e9e","year":"2024","journal-title":"Nat Commun"},{"key":"2025070408320879200_btaf307-B10","first-page":"226","article-title":"A density-based algorithm for discovering clusters in large spatial databases with noise","volume":"1","author":"Ester","year":"1996","journal-title":"Knowledge Discov Data Min"},{"key":"2025070408320879200_btaf307-B11","doi-asserted-by":"crossref","first-page":"8.10.1","DOI":"10.1002\/0471250953.bi0810s14","article-title":"Active site profiling to identify protein functional sites in sequences and structures using the deacon active site profiler (DASP)","volume":"14","author":"Fetrow","year":"2006","journal-title":"Curr Protoc Bioinform"},{"key":"2025070408320879200_btaf307-B0232675","doi-asserted-by":"publisher","first-page":"139","DOI":"10.1007\/BF00114265","article-title":"Knowledge acquisition via incremental conceptual clustering","volume":"2","author":"Fisher","year":"1987","journal-title":"Mach Learn"},{"key":"2025070408320879200_btaf307-B12","doi-asserted-by":"crossref","first-page":"e1005284","DOI":"10.1371\/journal.pcbi.1005284","article-title":"An atlas of peroxiredoxins created using an active site profile-based approach to functionally relevant clustering of proteins","volume":"13","author":"Harper","year":"2017","journal-title":"PLoS Comput Biol"},{"key":"2025070408320879200_btaf307-B13","doi-asserted-by":"crossref","first-page":"677","DOI":"10.1002\/pro.3112","article-title":"An approach to functionally relevant clustering of the protein universe: active site profile-based clustering of protein structures and sequences","volume":"26","author":"Knutson","year":"2017","journal-title":"Protein Sci"},{"key":"2025070408320879200_btaf307-B14","doi-asserted-by":"crossref","first-page":"39","DOI":"10.1186\/s13321-018-0285-8","article-title":"P2Rank: machine learning based tool for rapid and accurate prediction of ligand binding sites from protein structure","volume":"10","author":"Kriv\u00e1k","year":"2018","journal-title":"J Cheminform"},{"key":"2025070408320879200_btaf307-B15","doi-asserted-by":"crossref","first-page":"720","DOI":"10.1093\/nar\/gkp1049","article-title":"GeMMA: functional subfamily classification within superfamilies of predicted protein structural domains","volume":"38","author":"Lee","year":"2010","journal-title":"Nucleic Acids Res"},{"key":"2025070408320879200_btaf307-B16","doi-asserted-by":"crossref","first-page":"1123","DOI":"10.1126\/science.ade2574","article-title":"Evolutionary-scale prediction of atomic-level protein structure with a language model","volume":"379","author":"Lin","year":"2023","journal-title":"Science"},{"key":"2025070408320879200_btaf307-B17","doi-asserted-by":"crossref","first-page":"569","DOI":"10.1093\/protein\/gzp040","article-title":"Alignment of multiple protein structures based on sequence and structure features","volume":"22","author":"Madhusudhan","year":"2009","journal-title":"Protein Eng Des Sel"},{"key":"2025070408320879200_btaf307-B18","doi-asserted-by":"crossref","first-page":"324","DOI":"10.1038\/s41929-019-0249-z","article-title":"A family of native amine dehydrogenases for the asymmetric reductive amination of ketones","volume":"2","author":"Mayol","year":"2019","journal-title":"Nat Catal"},{"key":"2025070408320879200_btaf307-B8544311","doi-asserted-by":"publisher","first-page":"e0291801","DOI":"10.1371\/journal.pone.0291801","article-title":"AutoPhy: Automated phylogenetic identification of novel protein subfamilies","volume":"19","author":"Ortiz-Velez","year":"2024","journal-title":"PLoS One"},{"key":"2025070408320879200_btaf307-B19","doi-asserted-by":"crossref","first-page":"D368","DOI":"10.1093\/nar\/gkad1011","article-title":"AlphaFold protein structure database in 2024: providing structure coverage for over 214 million protein sequences","volume":"52","author":"Varadi","year":"2024","journal-title":"Nucleic Acids Res"},{"key":"2025070408320879200_btaf307-B20","doi-asserted-by":"crossref","first-page":"5.6.1","DOI":"10.1002\/cpbi.3","article-title":"Comparative protein structure modeling using MODELLER","volume":"54","author":"Webb","year":"2016","journal-title":"Curr. Protoc Bioinform"},{"key":"2025070408320879200_btaf307-B21","doi-asserted-by":"crossref","first-page":"1109","DOI":"10.1038\/s41592-022-01585-1","article-title":"US-align: universal structure alignments of proteins, nucleic acids, and macromolecular complexes","volume":"19","author":"Zhang","year":"2022","journal-title":"Nat Methods"}],"container-title":["Bioinformatics"],"original-title":[],"language":"en","link":[{"URL":"https:\/\/academic.oup.com\/bioinformatics\/advance-article-pdf\/doi\/10.1093\/bioinformatics\/btaf307\/63196704\/btaf307.pdf","content-type":"application\/pdf","content-version":"am","intended-application":"syndication"},{"URL":"https:\/\/academic.oup.com\/bioinformatics\/article-pdf\/41\/6\/btaf307\/63196704\/btaf307.pdf","content-type":"application\/pdf","content-version":"vor","intended-application":"syndication"},{"URL":"https:\/\/academic.oup.com\/bioinformatics\/article-pdf\/41\/6\/btaf307\/63196704\/btaf307.pdf","content-type":"unspecified","content-version":"vor","intended-application":"similarity-checking"}],"deposited":{"date-parts":[[2025,7,4]],"date-time":"2025-07-04T08:32:22Z","timestamp":1751617942000},"score":1,"resource":{"primary":{"URL":"https:\/\/academic.oup.com\/bioinformatics\/article\/doi\/10.1093\/bioinformatics\/btaf307\/8132948"}},"subtitle":[],"editor":[{"given":"Lenore","family":"Cowen","sequence":"additional","affiliation":[]}],"short-title":[],"issued":{"date-parts":[[2025,5,15]]},"references-count":23,"journal-issue":{"issue":"6","published-print":{"date-parts":[[2025,6,2]]}},"URL":"https:\/\/doi.org\/10.1093\/bioinformatics\/btaf307","relation":{"has-preprint":[{"id-type":"doi","id":"10.1101\/2025.04.07.647545","asserted-by":"object"}]},"ISSN":["1367-4811"],"issn-type":[{"value":"1367-4811","type":"electronic"}],"subject":[],"published-other":{"date-parts":[[2025,6]]},"published":{"date-parts":[[2025,5,15]]},"article-number":"btaf307"}}