{"status":"ok","message-type":"work","message-version":"1.0.0","message":{"indexed":{"date-parts":[[2025,10,18]],"date-time":"2025-10-18T13:10:06Z","timestamp":1760793006930,"version":"build-2065373602"},"reference-count":34,"publisher":"Oxford University Press (OUP)","issue":"10","license":[{"start":{"date-parts":[[2025,9,24]],"date-time":"2025-09-24T00:00:00Z","timestamp":1758672000000},"content-version":"vor","delay-in-days":0,"URL":"https:\/\/creativecommons.org\/licenses\/by\/4.0\/"}],"funder":[{"name":"European Union\u2019s Horizon 2020 Research and Innovation Programme"}],"content-domain":{"domain":[],"crossmark-restriction":false},"short-container-title":[],"published-print":{"date-parts":[[2025,10,2]]},"abstract":"<jats:title>Abstract<\/jats:title>\n               <jats:sec>\n                  <jats:title>Motivation<\/jats:title>\n                  <jats:p>Machine learning tools have become increasingly common in biological research, driven by the emergence of pre-trained large language models. However, training effective models remains a complex task, since many choices influence their performance. AutoML (automated machine learning) approaches help address these challenges by streamlining the entire model development pipeline.<\/jats:p>\n               <\/jats:sec>\n               <jats:sec>\n                  <jats:title>Results<\/jats:title>\n                  <jats:p>We developed aMLProt, an AutoML framework tailored specifically for protein applications, such as enzyme engineering and bioprospecting. It features a modular design, allowing each component to be used independently or in combination. Notably, aMLProt integrates 19 classifiers and 26 regressors, along with pre-trained protein language models. It also includes standalone applications proven useful for protein-related workflows. To enhance usability, aMLProt is integrated with Horus, a GUI-based application with a visual interface.<\/jats:p>\n               <\/jats:sec>\n               <jats:sec>\n                  <jats:title>Availability and implementation<\/jats:title>\n                  <jats:p>aMLProt is available on https:\/\/github.com\/etiur\/aMLProt.git and https:\/\/doi.org\/10.5281\/zenodo.14971157; The aMLProt plugin is available via the official Horus Plugin Repository https:\/\/horus.bsc.es\/repo\/plugins\/amlprot, and Horus itself can be freely downloaded from https:\/\/horus.bsc.es. Moreover, a demo of aMLProt can be found, without previous registration or download, at the horus.bsc.es\/amlprot and horus.bsc.es\/amlprot-suggest. The results and data from the pH optima regression model are available at: https:\/\/zenodo.org\/records\/15394097.<\/jats:p>\n               <\/jats:sec>","DOI":"10.1093\/bioinformatics\/btaf543","type":"journal-article","created":{"date-parts":[[2025,9,23]],"date-time":"2025-09-23T11:58:41Z","timestamp":1758628721000},"source":"Crossref","is-referenced-by-count":0,"title":["aMLProt: an automated machine learning library for protein applications"],"prefix":"10.1093","volume":"41","author":[{"ORCID":"https:\/\/orcid.org\/0000-0002-3541-599X","authenticated-orcid":false,"given":"Ruite","family":"Xiang","sequence":"first","affiliation":[{"name":"Department of Life Sciences, Barcelona Supercomputing Center (BSC) , Barcelona 08034,","place":["Spain"]},{"name":"Facultat de Farm\u00e0cia i Ci\u00e8ncies de l\u2018Alimentaci\u00f3, Universitat de Barcelona , Barcelona 08028,","place":["Spain"]}]},{"given":"Christian","family":"Dom\u00ednguez-Dalmases","sequence":"additional","affiliation":[{"name":"Department of Life Sciences, Barcelona Supercomputing Center (BSC) , Barcelona 08034,","place":["Spain"]}]},{"ORCID":"https:\/\/orcid.org\/0000-0001-8120-888X","authenticated-orcid":false,"given":"Albert","family":"Ca\u00f1ellas-Sol\u00e9","sequence":"additional","affiliation":[{"name":"Department of Life Sciences, Barcelona Supercomputing Center (BSC) , Barcelona 08034,","place":["Spain"]},{"name":"Facultat de Farm\u00e0cia i Ci\u00e8ncies de l\u2018Alimentaci\u00f3, Universitat de Barcelona , Barcelona 08028,","place":["Spain"]}]},{"ORCID":"https:\/\/orcid.org\/0000-0002-4580-1114","authenticated-orcid":false,"given":"Victor","family":"Guallar","sequence":"additional","affiliation":[{"name":"Department of Life Sciences, Barcelona Supercomputing Center (BSC) , Barcelona 08034,","place":["Spain"]},{"name":"Catalan Institution for Research and Advanced Studies (ICREA) , Barcelona 08010,","place":["Spain"]}]}],"member":"286","published-online":{"date-parts":[[2025,9,24]]},"reference":[{"year":"2020","author":"Ali","key":"2025101808514373400_btaf543-B1"},{"year":"2024","author":"Chen","key":"2025101808514373400_btaf543-B2","doi-asserted-by":"publisher","DOI":"10.48550\/arXiv.2412.12154"},{"key":"2025101808514373400_btaf543-B3","doi-asserted-by":"crossref","first-page":"785","DOI":"10.1145\/2939672.2939785","volume-title":"Proceedings of the 22nd ACM SIGKDD International Conference on Knowledge Discovery and Data Mining","author":"Chen","year":"2016"},{"key":"2025101808514373400_btaf543-B4","doi-asserted-by":"crossref","first-page":"2499","DOI":"10.1093\/bioinformatics\/bty140","article-title":"iFeature: a Python package and web server for features extraction and selection from protein and peptide sequences","volume":"34","author":"Chen","year":"2018","journal-title":"Bioinformatics"},{"first-page":"1","year":"2024","author":"de Oliveira","key":"2025101808514373400_btaf543-B5"},{"year":"2022","author":"Feurer","key":"2025101808514373400_btaf543-B6","doi-asserted-by":"publisher","DOI":"10.48550\/arXiv.2007.04074"},{"key":"2025101808514373400_btaf543-B7","doi-asserted-by":"crossref","first-page":"319","DOI":"10.1093\/nar\/gkg015","article-title":"The Lipase Engineering Database: a navigation and analysis tool for protein families","volume":"31","author":"Fischer","year":"2003","journal-title":"Nucleic Acids Res"},{"key":"2025101808514373400_btaf543-B8","first-page":"716"},{"key":"2025101808514373400_btaf543-B9","doi-asserted-by":"crossref","first-page":"1","DOI":"10.1038\/s43586-022-00184-w","article-title":"Principal component analysis","volume":"2","author":"Greenacre","year":"2022","journal-title":"Nat Rev Methods Primers"},{"key":"2025101808514373400_btaf543-B10","doi-asserted-by":"crossref","first-page":"1","DOI":"10.1007\/978-3-540-35488-8","volume-title":"Feature Extraction","author":"Guyon","year":"2006"},{"key":"2025101808514373400_btaf543-B11","doi-asserted-by":"crossref","first-page":"1323","DOI":"10.1093\/bioinformatics\/btw006","article-title":"MMseqs software suite for fast and deep clustering and searching of large protein sequence sets","volume":"32","author":"Hauser","year":"2016","journal-title":"Bioinformatics"},{"key":"2025101808514373400_btaf543-B12","doi-asserted-by":"crossref","first-page":"275","DOI":"10.1038\/s41587-023-01763-2","article-title":"Efficient evolution of human antibodies from general protein language models","volume":"42","author":"Hie","year":"2024","journal-title":"Nat Biotechnol"},{"year":"2021","author":"Hopf","key":"2025101808514373400_btaf543-B13","doi-asserted-by":"publisher","DOI":"10.48550\/arXiv.2111.12140"},{"year":"2021","author":"Hu","key":"2025101808514373400_btaf543-B14","doi-asserted-by":"publisher","DOI":"10.48550\/arXiv.2106.09685"},{"key":"2025101808514373400_btaf543-B15","doi-asserted-by":"crossref","first-page":"659","DOI":"10.1038\/s41580-019-0176-5","article-title":"Setting the standards for machine learning in biology","volume":"20","author":"Jones","year":"2019","journal-title":"Nat Rev Mol Cell Biol"},{"key":"2025101808514373400_btaf543-B16","doi-asserted-by":"crossref","first-page":"11453","DOI":"10.1021\/acsomega.3c08036","article-title":"Evaluation and optimization methods for applicability domain methods and their hyperparameters, considering the prediction performance of machine learning models","volume":"9","author":"Kaneko","year":"2024","journal-title":"ACS Omega"},{"key":"2025101808514373400_btaf543-B17","doi-asserted-by":"crossref","first-page":"1123","DOI":"10.1126\/science.ade2574","article-title":"Evolutionary-scale prediction of atomic-level protein structure with a language model","volume":"379","author":"Lin","year":"2023","journal-title":"Science"},{"year":"2024","author":"Liu","key":"2025101808514373400_btaf543-B18","doi-asserted-by":"publisher","DOI":"10.48550\/arXiv.2402.09353"},{"year":"2024","author":"Liu","key":"2025101808514373400_btaf543-B19","doi-asserted-by":"publisher","DOI":"10.48550\/arXiv.2411.04440"},{"key":"2025101808514373400_btaf543-B20","first-page":"29287","volume-title":"Advances in Neural Information Processing Systems","author":"Meier","year":"2021"},{"key":"2025101808514373400_btaf543-B21","doi-asserted-by":"crossref","first-page":"btae157","DOI":"10.1093\/bioinformatics\/btae157","article-title":"TemStaPro: protein thermostability prediction using sequence representations from protein language models","volume":"40","author":"Pud\u017eiuvelyt\u0117","year":"2024","journal-title":"Bioinformatics"},{"key":"2025101808514373400_btaf543-B22","doi-asserted-by":"crossref","first-page":"e1003571","DOI":"10.1371\/journal.pcbi.1003571","article-title":"rDock: a fast, versatile and open source program for docking ligands to proteins and nucleic acids","volume":"10","author":"Ruiz-Carmona","year":"2014","journal-title":"PLOS Comput Biol"},{"key":"2025101808514373400_btaf543-B23","doi-asserted-by":"crossref","first-page":"2994","DOI":"10.1093\/nar\/29.14.2994","article-title":"Improving the accuracy of PSI-BLAST protein database searches with composition-based statistics and other refinements","volume":"29","author":"Sch\u00e4ffer","year":"2001","journal-title":"Nucleic Acids Res"},{"key":"2025101808514373400_btaf543-B24","doi-asserted-by":"crossref","first-page":"1026","DOI":"10.1038\/nbt.3988","article-title":"MMseqs2 enables sensitive protein sequence searching for the analysis of massive data sets","volume":"35","author":"Steinegger","year":"2017","journal-title":"Nat Biotechnol"},{"key":"2025101808514373400_btaf543-B25","doi-asserted-by":"crossref","first-page":"1023","DOI":"10.1038\/s41587-021-01156-3","article-title":"SignalP 6.0 predicts all five types of signal peptides using protein language models","volume":"40","author":"Teufel","year":"2022","journal-title":"Nat Biotechnol"},{"key":"2025101808514373400_btaf543-B26","doi-asserted-by":"crossref","first-page":"W83","DOI":"10.1093\/nar\/gkae410","article-title":"The Galaxy platform for accessible, reproducible, and collaborative data analyses: 2024 update","volume":"52","author":"The Galaxy Community","year":"2024","journal-title":"Nucleic Acids Res"},{"key":"2025101808514373400_btaf543-B27","doi-asserted-by":"crossref","first-page":"e0224365","DOI":"10.1371\/journal.pone.0224365","article-title":"Machine learning algorithm validation with a limited sample size","volume":"14","author":"Vabalas","year":"2019","journal-title":"PLoS ONE"},{"key":"2025101808514373400_btaf543-B28","doi-asserted-by":"crossref","first-page":"525","DOI":"10.1016\/j.cels.2023.05.007","article-title":"BioAutoMATED: an end-to-end automated machine learning tool for explanation and design of biological sequences","volume":"14","author":"Valeri","year":"2023","journal-title":"Cell Syst"},{"key":"2025101808514373400_btaf543-B29","doi-asserted-by":"crossref","first-page":"2756","DOI":"10.1093\/bioinformatics\/btx302","article-title":"POSSUM: a bioinformatics toolkit for generating numerical sequence feature descriptors based on PSSM profiles","volume":"33","author":"Wang","year":"2017","journal-title":"Bioinformatics"},{"key":"2025101808514373400_btaf543-B30","doi-asserted-by":"crossref","first-page":"2736","DOI":"10.1038\/s41467-025-58038-4","article-title":"Robust enzyme discovery and engineering with deep learning using CataPro","volume":"16","author":"Wang","year":"2025","journal-title":"Nat Commun"},{"key":"2025101808514373400_btaf543-B31","doi-asserted-by":"crossref","first-page":"38","DOI":"10.18653\/v1\/2020.emnlp-demos.6","volume-title":"Proceedings of the 2020 Conference on Empirical Methods in Natural Language Processing: System Demonstrations","author":"Wolf","year":"2020"},{"key":"2025101808514373400_btaf543-B32","doi-asserted-by":"crossref","first-page":"1529","DOI":"10.3390\/biom12101529","article-title":"EP-Pred: a machine learning tool for bioprospecting promiscuous ester hydrolases","volume":"12","author":"Xiang","year":"2022","journal-title":"Biomolecules"},{"year":"2022","author":"Xu","key":"2025101808514373400_btaf543-B33","doi-asserted-by":"publisher","DOI":"10.48550\/arXiv.2206.02096"},{"key":"2025101808514373400_btaf543-B34","doi-asserted-by":"crossref","first-page":"3013","DOI":"10.1021\/acssynbio.4c00465","article-title":"Approaching optimal pH enzyme prediction with large language models","volume":"13","author":"Zaretckii","year":"2024","journal-title":"ACS Synth Biol"}],"container-title":["Bioinformatics"],"original-title":[],"language":"en","link":[{"URL":"https:\/\/academic.oup.com\/bioinformatics\/advance-article-pdf\/doi\/10.1093\/bioinformatics\/btaf543\/64373682\/btaf543.pdf","content-type":"application\/pdf","content-version":"am","intended-application":"syndication"},{"URL":"https:\/\/academic.oup.com\/bioinformatics\/article-pdf\/41\/10\/btaf543\/64373682\/btaf543.pdf","content-type":"application\/pdf","content-version":"vor","intended-application":"syndication"},{"URL":"https:\/\/academic.oup.com\/bioinformatics\/article-pdf\/41\/10\/btaf543\/64373682\/btaf543.pdf","content-type":"unspecified","content-version":"vor","intended-application":"similarity-checking"}],"deposited":{"date-parts":[[2025,10,18]],"date-time":"2025-10-18T12:51:52Z","timestamp":1760791912000},"score":1,"resource":{"primary":{"URL":"https:\/\/academic.oup.com\/bioinformatics\/article\/doi\/10.1093\/bioinformatics\/btaf543\/8262956"}},"subtitle":[],"editor":[{"given":"Jonathan","family":"Wren","sequence":"additional","affiliation":[]}],"short-title":[],"issued":{"date-parts":[[2025,9,24]]},"references-count":34,"journal-issue":{"issue":"10","published-print":{"date-parts":[[2025,10,2]]}},"URL":"https:\/\/doi.org\/10.1093\/bioinformatics\/btaf543","relation":{},"ISSN":["1367-4811"],"issn-type":[{"type":"electronic","value":"1367-4811"}],"subject":[],"published-other":{"date-parts":[[2025,10]]},"published":{"date-parts":[[2025,9,24]]},"article-number":"btaf543"}}