{"status":"ok","message-type":"work","message-version":"1.0.0","message":{"indexed":{"date-parts":[[2026,3,23]],"date-time":"2026-03-23T23:11:26Z","timestamp":1774307486511,"version":"3.50.1"},"reference-count":50,"publisher":"Oxford University Press (OUP)","issue":"10","license":[{"start":{"date-parts":[[2025,9,24]],"date-time":"2025-09-24T00:00:00Z","timestamp":1758672000000},"content-version":"vor","delay-in-days":0,"URL":"https:\/\/creativecommons.org\/licenses\/by\/4.0\/"}],"funder":[{"name":"National Renewable Energy Laboratory for the US Department of Energy","award":["DE-AC36-08GO28308"],"award-info":[{"award-number":["DE-AC36-08GO28308"]}]},{"name":"US Department of Energy Office of Science Biological and Environmental Research","award":["DE-SC0023278"],"award-info":[{"award-number":["DE-SC0023278"]}]},{"name":"US Department of Energy Office of Energy Efficiency and Renewable Energy Bioenergy Technologies Office"},{"DOI":"10.13039\/100016791","name":"Agile BioFoundry","doi-asserted-by":"publisher","id":[{"id":"10.13039\/100016791","id-type":"DOI","asserted-by":"publisher"}]},{"name":"U.S. Department of Energy, Office of Science, Office of Biological and Environmental Research","award":["DE-SC0022024"],"award-info":[{"award-number":["DE-SC0022024"]}]},{"name":"US Government"},{"name":"Novo Nordisk Foundation centre PRISM","award":["NNF18OC0033950"],"award-info":[{"award-number":["NNF18OC0033950"]}]}],"content-domain":{"domain":[],"crossmark-restriction":false},"short-container-title":[],"published-print":{"date-parts":[[2025,10,2]]},"abstract":"<jats:title>Abstract<\/jats:title>\n                  <jats:sec>\n                    <jats:title>Summary<\/jats:title>\n                    <jats:p>Protein property prediction via machine learning with and without labeled data is becoming increasingly powerful, yet methods are disparate and capabilities vary widely over applications. The software presented here, \u201cArtificial Intelligence Driven protein Estimation (AIDE)\u201d, enables instantiating, optimizing, and testing many zero-shot and supervised property prediction methods for variants and variable length homologs in a single, reproducible notebook or script by defining a modular, standardized application programming interface (API), i.e. drop-in compatible with scikit-learn transformers and pipelines.<\/jats:p>\n                  <\/jats:sec>\n                  <jats:sec>\n                    <jats:title>Availability and implementation<\/jats:title>\n                    <jats:p>AIDE is an installable, importable python package inheriting from scikit-learn classes and API and is installable on Windows, Mac, and Linux. Many of the wrapped models internal to AIDE will be effectively inaccessible without a GPU, and some assume CUDA. The newest stable, tested version can be found at https:\/\/github.com\/beckham-lab\/aide_predict and a full user guide and API reference can be found at https:\/\/beckham-lab.github.io\/aide_predict\/. Static versions of both at the time of writing can be found on Zenodo.<\/jats:p>\n                  <\/jats:sec>","DOI":"10.1093\/bioinformatics\/btaf544","type":"journal-article","created":{"date-parts":[[2025,9,24]],"date-time":"2025-09-24T12:16:35Z","timestamp":1758716195000},"source":"Crossref","is-referenced-by-count":1,"title":["Accessible, uniform protein property prediction with a scikit-learn based toolset AIDE"],"prefix":"10.1093","volume":"41","author":[{"given":"Evan","family":"Komp","sequence":"first","affiliation":[{"name":"Renewable Resources and Enabling Sciences Center, National Renewable Energy Laboratory , Golden Colorado, CO 80401,","place":["United States"]},{"name":"Agile BioFoundry , Emeryville, CA 94608,","place":["United States"]}]},{"given":"Kristoffer E","family":"Johansson","sequence":"additional","affiliation":[{"name":"Linderstr\u00f8m-Lang Centre for Protein Science, Section for Biomolecular Sciences, Department of Biology, University of Copenhagen , Copenhagen,","place":["Denmark"]}]},{"given":"Nicholas P","family":"Gauthier","sequence":"additional","affiliation":[{"name":"Department of Systems Biology, Harvard Medical School , Boston, MA 02115,","place":["United States"]},{"name":"Department of Data Sciences, Dana-Farber Cancer Institute , Boston, MA 02115,","place":["United States"]}]},{"given":"Japheth E","family":"Gado","sequence":"additional","affiliation":[{"name":"Renewable Resources and Enabling Sciences Center, National Renewable Energy Laboratory , Golden Colorado, CO 80401,","place":["United States"]}]},{"given":"Kresten","family":"Lindorff-Larsen","sequence":"additional","affiliation":[{"name":"Linderstr\u00f8m-Lang Centre for Protein Science, Section for Biomolecular Sciences, Department of Biology, University of Copenhagen , Copenhagen,","place":["Denmark"]}]},{"ORCID":"https:\/\/orcid.org\/0000-0002-3480-212X","authenticated-orcid":false,"given":"Gregg T","family":"Beckham","sequence":"additional","affiliation":[{"name":"Renewable Resources and Enabling Sciences Center, National Renewable Energy Laboratory , Golden Colorado, CO 80401,","place":["United States"]},{"name":"Agile BioFoundry , Emeryville, CA 94608,","place":["United States"]}]}],"member":"286","published-online":{"date-parts":[[2025,9,24]]},"reference":[{"key":"2025102511212723400_btaf544-B1","doi-asserted-by":"crossref","first-page":"1514","DOI":"10.1038\/s41592-024-02272-z","article-title":"OpenFold: retraining AlphaFold2 yields new insights into its learning mechanisms and capacity for generalization","volume":"21","author":"Ahdritz","year":"2024","journal-title":"Nat Methods"},{"key":"2025102511212723400_btaf544-B2","doi-asserted-by":"crossref","first-page":"457","DOI":"10.1146\/annurev-ecolsys-102320-112153","article-title":"Epistasis and adaptation on fitness landscapes","volume":"53","author":"Bank","year":"2022","journal-title":"Annu Rev Ecol Evol Syst"},{"key":"2025102511212723400_btaf544-B3","doi-asserted-by":"crossref","first-page":"9646","DOI":"10.1038\/s41467-024-53982-z","article-title":"SSEmb: a joint embedding of protein sequence and structure enables robust variant effect predictions","volume":"15","author":"Blaabjerg","year":"2024","journal-title":"Nat Commun"},{"key":"2025102511212723400_btaf544-B4","doi-asserted-by":"crossref","first-page":"607","DOI":"10.1002\/prot.21104","article-title":"Physicochemical descriptors to discriminate protein\u2013protein interactions in permanent and transient complexes selected by means of machine learning algorithms","volume":"65","author":"Block","year":"2006","journal-title":"Proteins Struct Funct Bioinf"},{"key":"2025102511212723400_btaf544-B5","doi-asserted-by":"crossref","first-page":"59","DOI":"10.1016\/S0164-1212(01)00136-4","article-title":"Fundamental principles of software engineering\u2014a journey","volume":"62","author":"Bourque","year":"2002","journal-title":"J Syst Softw"},{"key":"2025102511212723400_btaf544-B6","doi-asserted-by":"crossref","first-page":"W434","DOI":"10.1093\/nar\/gkac351","article-title":"iFeatureOmega: an integrative platform for engineering, visualization and analysis of features from molecular sequences, structural and ligand data sets","volume":"50","author":"Chen","year":"2022","journal-title":"Nucleic Acids Research"},{"key":"2025102511212723400_btaf544-B7","doi-asserted-by":"crossref","first-page":"e113","DOI":"10.1002\/cpz1.113","article-title":"Learned embeddings from deep learning to visualize and predict protein sets","volume":"1","author":"Dallago","year":"2021","journal-title":"Curr Protoc"},{"key":"2025102511212723400_btaf544-B8","doi-asserted-by":"crossref","first-page":"49","DOI":"10.1126\/science.add2187","article-title":"Robust deep learning-based protein sequence design using ProteinMPNN","volume":"378","author":"Dauparas","year":"2022","journal-title":"Science"},{"key":"2025102511212723400_btaf544-B9","doi-asserted-by":"crossref","first-page":"1639","DOI":"10.1038\/s41467-024-45621-4","article-title":"Protein design using structure-based residue preferences","volume":"15","author":"Ding","year":"2024","journal-title":"Nat Commun"},{"key":"2025102511212723400_btaf544-B10","doi-asserted-by":"crossref","first-page":"2411","DOI":"10.1038\/s41467-023-38039-x","article-title":"Engineering protein-based therapeutics through structural and chemical design","volume":"14","author":"Ebrahimi","year":"2023","journal-title":"Nat Commun"},{"key":"2025102511212723400_btaf544-B11","doi-asserted-by":"crossref","first-page":"e1002195","DOI":"10.1371\/journal.pcbi.1002195","article-title":"Accelerated profile HMM searches","volume":"7","author":"Eddy","year":"2011","journal-title":"PLOS Comput Biol"},{"key":"2025102511212723400_btaf544-B12","doi-asserted-by":"crossref","first-page":"539","DOI":"10.1038\/s41929-021-00648-4","article-title":"Chemical and biological catalysis for plastics recycling and upcycling","volume":"4","author":"Ellis","year":"2021","journal-title":"Nat Catal"},{"key":"2025102511212723400_btaf544-B13","doi-asserted-by":"crossref","first-page":"7112","DOI":"10.1109\/TPAMI.2021.3095381","article-title":"ProtTrans: toward understanding the language of life through self-supervised learning","volume":"44","author":"Elnaggar","year":"2022","journal-title":"IEEE Trans Pattern Anal Mach Intell"},{"key":"2025102511212723400_btaf544-B14","doi-asserted-by":"crossref","first-page":"238","DOI":"10.1016\/j.csbj.2022.11.014","article-title":"From sequence to function through structure: deep learning for protein design","volume":"21","author":"Ferruz","year":"2023","journal-title":"Comput Struct Biotechnol J"},{"key":"2025102511212723400_btaf544-B15","doi-asserted-by":"crossref","first-page":"338","DOI":"10.1038\/nbt1286","article-title":"Improving catalytic function by ProSAR-driven enzyme evolution","volume":"25","author":"Fox","year":"2007","journal-title":"Nat Biotechnol"},{"key":"2025102511212723400_btaf544-B16","doi-asserted-by":"crossref","first-page":"91","DOI":"10.1038\/s41586-021-04043-8","article-title":"Disease variant prediction with deep generative models of evolutionary data","volume":"599","author":"Frazer","year":"2021","journal-title":"Nature"},{"key":"2025102511212723400_btaf544-B17","author":"Funk","year":"2024"},{"key":"2025102511212723400_btaf544-B18","doi-asserted-by":"publisher","first-page":"716","DOI":"10.1038\/s42256-025-01026-6","article-title":"Machine learning prediction of enzyme optimum ph","volume":"7","author":"Gado","year":"2025","journal-title":"Nat Mach Intell"},{"key":"2025102511212723400_btaf544-B19","author":"Gomez-Uribe","year":"2024"},{"key":"2025102511212723400_btaf544-B20","doi-asserted-by":"publisher","first-page":"850","DOI":"10.1126\/science.ads0018","article-title":"Simulating 500 million years of evolution with a language model","volume":"387","author":"Hayes","year":"2025","journal-title":"Science"},{"key":"2025102511212723400_btaf544-B21","doi-asserted-by":"crossref","first-page":"1582","DOI":"10.1093\/bioinformatics\/bty862","article-title":"The EVcouplings Python framework for coevolutionary sequence analysis","volume":"35","author":"Hopf","year":"2019","journal-title":"Bioinformatics"},{"key":"2025102511212723400_btaf544-B22","doi-asserted-by":"crossref","first-page":"128","DOI":"10.1038\/nbt.3769","article-title":"Mutation effects predicted from sequence co-variation","volume":"35","author":"Hopf","year":"2017","journal-title":"Nat Biotechnol"},{"key":"2025102511212723400_btaf544-B222","doi-asserted-by":"crossref","first-page":"1114","DOI":"10.1038\/s41587-021-01146-5","article-title":"Learning protein fitness models from evolutionary and assay-labeled data","volume":"40","author":"Hsu","year":"2022","journal-title":"Nat Biotechnol"},{"key":"2025102511212723400_btaf544-B23","doi-asserted-by":"crossref","first-page":"e2400439121","DOI":"10.1073\/pnas.2400439121","article-title":"A combinatorially complete epistatic fitness landscape in an enzyme active site","volume":"121","author":"Johnston","year":"2024","journal-title":"Proc Natl Acad Sci"},{"key":"2025102511212723400_btaf544-B24","doi-asserted-by":"crossref","first-page":"682","DOI":"10.1038\/s41597-023-02553-w","article-title":"Homologous pairs of low and high temperature originating proteins spanning the known prokaryotic universe","volume":"10","author":"Komp","year":"2023","journal-title":"Sci Data"},{"key":"2025102511212723400_btaf544-B25","unstructured":"Komp E, Beckham GT. \u00a0beckham-lab\/aide_predict. \u00a02025. 10.5281\/zenodo.16986183"},{"key":"2025102511212723400_btaf544-B26","doi-asserted-by":"crossref","first-page":"124617","DOI":"10.1016\/j.biortech.2020.124617","article-title":"Design of novel enzyme biocatalysts for industrial bioprocess: harnessing the power of protein engineering, high throughput screening and synthetic biology","volume":"325","author":"Madhavan","year":"2021","journal-title":"Bioresour Technol"},{"key":"2025102511212723400_btaf544-B27","doi-asserted-by":"crossref","first-page":"1629","DOI":"10.1007\/s00439-021-02411-y","article-title":"Embeddings from protein language models predict conservation and variant effects","volume":"141","author":"Marquet","year":"2022","journal-title":"Hum Genet"},{"key":"2025102511212723400_btaf544-B28","doi-asserted-by":"publisher","author":"Meier","year":"2021","DOI":"10.1101\/2021.07.09.450648"},{"key":"2025102511212723400_btaf544-B29","unstructured":"Norton-Baker B, Komp E, Gado J \u00a0et al \u00a0Activity across temperature and pH of PET hydrolase candidates. [Data set]. Zenodo. 2025. 10.5281\/zenodo.15417757"},{"key":"2025102511212723400_btaf544-B30","doi-asserted-by":"crossref","first-page":"16070","DOI":"10.1021\/acscatal.5c03460","article-title":"Machine learning-guided identification of PET hydrolases from natural diversity","volume":"15","author":"Norton-Baker","year":"2025","journal-title":"ACS Catal"},{"key":"2025102511212723400_btaf544-B31","doi-asserted-by":"publisher","author":"Notin","year":"2022","DOI":"10.48550\/arXiv.2205.13760"},{"key":"2025102511212723400_btaf544-B32","author":"Notin","year":"2023"},{"key":"2025102511212723400_btaf544-B33","author":"Notin","year":"2022"},{"key":"2025102511212723400_btaf544-B34","doi-asserted-by":"crossref","first-page":"216","DOI":"10.1038\/s41587-024-02127-0","article-title":"Machine learning for functional protein design","volume":"42","author":"Notin","year":"2024","journal-title":"Nat Biotechnol"},{"key":"2025102511212723400_btaf544-B35","author":"Notin","year":"2023"},{"key":"2025102511212723400_btaf544-B36","author":"Park","year":"2024"},{"key":"2025102511212723400_btaf544-B37","doi-asserted-by":"publisher","author":"Paszke","year":"2019","DOI":"10.48550\/arXiv.1912.01703"},{"key":"2025102511212723400_btaf544-B38","author":"Pedregosa","year":"2011"},{"key":"2025102511212723400_btaf544-B39","doi-asserted-by":"crossref","first-page":"btae157","DOI":"10.1093\/bioinformatics\/btae157","article-title":"TemStaPro: protein thermostability prediction using sequence representations from protein language models","volume":"40","author":"Pud\u017eiuvelyt\u0117","year":"2024","journal-title":"Bioinformatics"},{"key":"2025102511212723400_btaf544-B40","author":"Rao","year":"2021"},{"key":"2025102511212723400_btaf544-B41","doi-asserted-by":"crossref","first-page":"E193","DOI":"10.1073\/pnas.1215251110","article-title":"Navigating the protein fitness landscape with Gaussian processes","volume":"110","author":"Romero","year":"2013","journal-title":"Proc Natl Acad Sci"},{"key":"2025102511212723400_btaf544-B42","doi-asserted-by":"crossref","first-page":"172","DOI":"10.1016\/j.neucom.2021.07.102","article-title":"ProPythia: a python package for protein classification based on machine and deep learning","volume":"484","author":"Sequeira","year":"2022","journal-title":"Neurocomputing (Amst)"},{"key":"2025102511212723400_btaf544-B43","doi-asserted-by":"crossref","first-page":"3463","DOI":"10.1021\/acs.jcim.1c00099","article-title":"PyPEF\u2014an integrated framework for data-driven protein engineering","volume":"61","author":"Siedhoff","year":"2021","journal-title":"J Chem Inf Model"},{"key":"2025102511212723400_btaf544-B44","doi-asserted-by":"crossref","first-page":"1026","DOI":"10.1038\/nbt.3988","article-title":"MMseqs2 enables sensitive protein sequence searching for the analysis of massive data sets","volume":"35","author":"Steinegger","year":"2017","journal-title":"Nat Biotechnol"},{"key":"2025102511212723400_btaf544-B45","author":"Su","year":"2023"},{"key":"2025102511212723400_btaf544-B46","author":"Su","year":"2024"},{"key":"2025102511212723400_btaf544-B47","doi-asserted-by":"crossref","first-page":"389","DOI":"10.1002\/bab.2117","article-title":"Enzyme engineering and its industrial applications","volume":"69","author":"Victorino de Silva Amatto","year":"2022","journal-title":"Biotechnol Appl Biochem"},{"key":"2025102511212723400_btaf544-B48","doi-asserted-by":"crossref","first-page":"1026","DOI":"10.1016\/j.cels.2021.07.008","article-title":"Informed training set design enables efficient machine learning-assisted directed protein evolution","volume":"12","author":"Wittmann","year":"2021","journal-title":"Cell Syst"},{"key":"2025102511212723400_btaf544-B49","doi-asserted-by":"crossref","first-page":"8852","DOI":"10.1073\/pnas.1901979116","article-title":"Machine learning-assisted directed protein evolution with combinatorial libraries","volume":"116","author":"Wu","year":"2019","journal-title":"Proc Natl Acad Sci U S A"}],"container-title":["Bioinformatics"],"original-title":[],"language":"en","link":[{"URL":"https:\/\/academic.oup.com\/bioinformatics\/advance-article-pdf\/doi\/10.1093\/bioinformatics\/btaf544\/64373661\/btaf544.pdf","content-type":"application\/pdf","content-version":"am","intended-application":"syndication"},{"URL":"https:\/\/academic.oup.com\/bioinformatics\/article-pdf\/41\/10\/btaf544\/64373661\/btaf544.pdf","content-type":"application\/pdf","content-version":"vor","intended-application":"syndication"},{"URL":"https:\/\/academic.oup.com\/bioinformatics\/article-pdf\/41\/10\/btaf544\/64373661\/btaf544.pdf","content-type":"unspecified","content-version":"vor","intended-application":"similarity-checking"}],"deposited":{"date-parts":[[2025,10,25]],"date-time":"2025-10-25T15:21:39Z","timestamp":1761405699000},"score":1,"resource":{"primary":{"URL":"https:\/\/academic.oup.com\/bioinformatics\/article\/doi\/10.1093\/bioinformatics\/btaf544\/8262952"}},"subtitle":[],"editor":[{"given":"Xin","family":"Gao","sequence":"additional","affiliation":[]}],"short-title":[],"issued":{"date-parts":[[2025,9,24]]},"references-count":50,"journal-issue":{"issue":"10","published-print":{"date-parts":[[2025,10,2]]}},"URL":"https:\/\/doi.org\/10.1093\/bioinformatics\/btaf544","relation":{},"ISSN":["1367-4803","1367-4811"],"issn-type":[{"value":"1367-4803","type":"print"},{"value":"1367-4811","type":"electronic"}],"subject":[],"published-other":{"date-parts":[[2025,10]]},"published":{"date-parts":[[2025,9,24]]},"article-number":"btaf544"}}