{"status":"ok","message-type":"work","message-version":"1.0.0","message":{"indexed":{"date-parts":[[2026,4,11]],"date-time":"2026-04-11T07:03:45Z","timestamp":1775891025818,"version":"3.50.1"},"reference-count":42,"publisher":"Oxford University Press (OUP)","issue":"1","license":[{"start":{"date-parts":[[2021,1,8]],"date-time":"2021-01-08T00:00:00Z","timestamp":1610064000000},"content-version":"vor","delay-in-days":7,"URL":"http:\/\/creativecommons.org\/licenses\/by\/4.0\/"}],"funder":[{"name":"Czech Ministry of Education","award":["857560"],"award-info":[{"award-number":["857560"]}]},{"name":"Czech Ministry of Education","award":["02.1.01\/0.0\/0.0\/18_046\/0015975"],"award-info":[{"award-number":["02.1.01\/0.0\/0.0\/18_046\/0015975"]}]},{"name":"Czech Ministry of Education","award":["CZ.02.1.01\/0.0\/0.0\/16_026\/0008451"],"award-info":[{"award-number":["CZ.02.1.01\/0.0\/0.0\/16_026\/0008451"]}]},{"name":"LQ1602"},{"name":"Czech Grant Agency","award":["20-15915Y"],"award-info":[{"award-number":["20-15915Y"]}]},{"DOI":"10.13039\/501100000780","name":"European Commission","doi-asserted-by":"publisher","award":["857560"],"award-info":[{"award-number":["857560"]}],"id":[{"id":"10.13039\/501100000780","id-type":"DOI","asserted-by":"publisher"}]},{"DOI":"10.13039\/501100000780","name":"European Commission","doi-asserted-by":"publisher","award":["720776"],"award-info":[{"award-number":["720776"]}],"id":[{"id":"10.13039\/501100000780","id-type":"DOI","asserted-by":"publisher"}]},{"DOI":"10.13039\/501100000780","name":"European Commission","doi-asserted-by":"publisher","award":["814418"],"award-info":[{"award-number":["814418"]}],"id":[{"id":"10.13039\/501100000780","id-type":"DOI","asserted-by":"publisher"}]},{"name":"AI Methods for Cybersecurity and Control Systems project"},{"DOI":"10.13039\/501100004585","name":"Brno University of Technology","doi-asserted-by":"publisher","award":["FIT-S-20-6293"],"award-info":[{"award-number":["FIT-S-20-6293"]}],"id":[{"id":"10.13039\/501100004585","id-type":"DOI","asserted-by":"publisher"}]},{"name":"e-Infrastruktura CZ","award":["e-INFRA LM2018140"],"award-info":[{"award-number":["e-INFRA LM2018140"]}]},{"name":"ELIXIR-CZ","award":["LM2018131"],"award-info":[{"award-number":["LM2018131"]}]},{"name":"Czech Ministry of Education"}],"content-domain":{"domain":[],"crossmark-restriction":false},"short-container-title":[],"published-print":{"date-parts":[[2021,4,9]]},"abstract":"<jats:title>Abstract<\/jats:title>\n                  <jats:sec>\n                    <jats:title>Motivation<\/jats:title>\n                    <jats:p>Poor protein solubility hinders the production of many therapeutic and industrially useful proteins. Experimental efforts to increase solubility are plagued by low success rates and often reduce biological activity. Computational prediction of protein expressibility and solubility in Escherichia coli using only sequence information could reduce the cost of experimental studies by enabling prioritization of highly soluble proteins.<\/jats:p>\n                  <\/jats:sec>\n                  <jats:sec>\n                    <jats:title>Results<\/jats:title>\n                    <jats:p>A new tool for sequence-based prediction of soluble protein expression in E.coli, SoluProt, was created using the gradient boosting machine technique with the TargetTrack database as a training set. When evaluated against a balanced independent test set derived from the NESG database, SoluProt\u2019s accuracy of 58.5% and AUC of 0.62 exceeded those of a suite of alternative solubility prediction tools. There is also evidence that it could significantly increase the success rate of experimental protein studies. SoluProt is freely available as a standalone program and a user-friendly webserver at https:\/\/loschmidt.chemi.muni.cz\/soluprot\/.<\/jats:p>\n                  <\/jats:sec>\n                  <jats:sec>\n                    <jats:title>Availability and implementation<\/jats:title>\n                    <jats:p>https:\/\/loschmidt.chemi.muni.cz\/soluprot\/.<\/jats:p>\n                  <\/jats:sec>\n                  <jats:sec>\n                    <jats:title>Supplementary information<\/jats:title>\n                    <jats:p>Supplementary data are available at Bioinformatics online.<\/jats:p>\n                  <\/jats:sec>","DOI":"10.1093\/bioinformatics\/btaa1102","type":"journal-article","created":{"date-parts":[[2020,12,28]],"date-time":"2020-12-28T07:41:22Z","timestamp":1609141282000},"page":"23-28","source":"Crossref","is-referenced-by-count":207,"title":["SoluProt: prediction of soluble protein expression in\n                    <i>Escherichia coli<\/i>"],"prefix":"10.1093","volume":"37","author":[{"given":"Jiri","family":"Hon","sequence":"first","affiliation":[{"name":"Loschmidt Laboratories, Centre for Toxic Compounds in the Environment RECETOX and Department of Experimental Biology, Faculty of Science, Masaryk University , Brno 625 00, Czech Republic"},{"name":"International Clinical Research Center, St. Anne\u2019s University Hospital Brno , Brno 656 91, Czech Republic"},{"name":"IT4Innovations Centre of Excellence, Faculty of Information Technology, Brno University of Technology , Brno 612 66, Czech Republic"}]},{"given":"Martin","family":"Marusiak","sequence":"additional","affiliation":[{"name":"IT4Innovations Centre of Excellence, Faculty of Information Technology, Brno University of Technology , Brno 612 66, Czech Republic"}]},{"given":"Tomas","family":"Martinek","sequence":"additional","affiliation":[{"name":"IT4Innovations Centre of Excellence, Faculty of Information Technology, Brno University of Technology , Brno 612 66, Czech Republic"}]},{"given":"Antonin","family":"Kunka","sequence":"additional","affiliation":[{"name":"Loschmidt Laboratories, Centre for Toxic Compounds in the Environment RECETOX and Department of Experimental Biology, Faculty of Science, Masaryk University , Brno 625 00, Czech Republic"},{"name":"International Clinical Research Center, St. Anne\u2019s University Hospital Brno , Brno 656 91, Czech Republic"}]},{"given":"Jaroslav","family":"Zendulka","sequence":"additional","affiliation":[{"name":"IT4Innovations Centre of Excellence, Faculty of Information Technology, Brno University of Technology , Brno 612 66, Czech Republic"}]},{"given":"David","family":"Bednar","sequence":"additional","affiliation":[{"name":"Loschmidt Laboratories, Centre for Toxic Compounds in the Environment RECETOX and Department of Experimental Biology, Faculty of Science, Masaryk University , Brno 625 00, Czech Republic"},{"name":"International Clinical Research Center, St. Anne\u2019s University Hospital Brno , Brno 656 91, Czech Republic"}]},{"ORCID":"https:\/\/orcid.org\/0000-0002-7848-8216","authenticated-orcid":false,"given":"Jiri","family":"Damborsky","sequence":"additional","affiliation":[{"name":"Loschmidt Laboratories, Centre for Toxic Compounds in the Environment RECETOX and Department of Experimental Biology, Faculty of Science, Masaryk University , Brno 625 00, Czech Republic"},{"name":"International Clinical Research Center, St. Anne\u2019s University Hospital Brno , Brno 656 91, Czech Republic"}]}],"member":"286","published-online":{"date-parts":[[2021,1,8]]},"reference":[{"key":"2023051510463243400_btaa1102-B1","doi-asserted-by":"crossref","first-page":"2975","DOI":"10.1093\/bioinformatics\/btu420","article-title":"ccSOL omics: a webserver for solubility prediction of endogenous and heterologous expression in Escherichia coli","volume":"30","author":"Agostini","year":"2014","journal-title":"Bioinformatics"},{"key":"2023051510463243400_btaa1102-B2","doi-asserted-by":"crossref","first-page":"237","DOI":"10.1016\/j.jmb.2011.12.005","article-title":"Sequence-based prediction of protein solubility","volume":"421","author":"Agostini","year":"2012","journal-title":"J. Mol. Biol"},{"key":"2023051510463243400_btaa1102-B3","author":"Berman","year":"2017"},{"key":"2023051510463243400_btaa1102-B4","doi-asserted-by":"crossref","first-page":"235","DOI":"10.1093\/nar\/28.1.235","article-title":"The Protein Data Bank","volume":"28","author":"Berman","year":"2000","journal-title":"Nucleic Acids Res"},{"key":"2023051510463243400_btaa1102-B5","doi-asserted-by":"crossref","first-page":"4691","DOI":"10.1093\/bioinformatics\/btaa578","article-title":"Solubility-Weighted Index: fast and accurate prediction of protein solubility","volume":"36","author":"Bhandari","year":"2020","journal-title":"Bioinformatics"},{"key":"2023051510463243400_btaa1102-B6","doi-asserted-by":"crossref","first-page":"D464","DOI":"10.1093\/nar\/gky1004","article-title":"RCSB Protein Data Bank: biological macromolecular structures enabling research and education in fundamental biology, biomedicine, biotechnology and energy","volume":"47","author":"Burley","year":"2019","journal-title":"Nucleic Acids Res"},{"key":"2023051510463243400_btaa1102-B7","doi-asserted-by":"crossref","first-page":"26","DOI":"10.1186\/s12896-019-0520-z","article-title":"Surface patches on recombinant erythropoietin predict protein solubility: engineering proteins to minimise aggregation","volume":"19","author":"Carballo-Amador","year":"2019","journal-title":"BMC Biotechnology"},{"key":"2023051510463243400_btaa1102-B8","doi-asserted-by":"crossref","first-page":"1185","DOI":"10.1016\/j.biotechadv.2011.09.016","article-title":"Cell-free protein synthesis: applications come of age","volume":"30","author":"Carlson","year":"2012","journal-title":"Biotechnol. Adv"},{"key":"2023051510463243400_btaa1102-B9","doi-asserted-by":"crossref","first-page":"3333","DOI":"10.1038\/srep03333","article-title":"Soluble expression of proteins correlates with a lack of positively-charged surface","volume":"3","author":"Chan","year":"2013","journal-title":"Sci. Rep"},{"key":"2023051510463243400_btaa1102-B10","doi-asserted-by":"crossref","first-page":"W264","DOI":"10.1093\/nar\/gku270","article-title":"The DynaMine webserver: predicting protein dynamics from sequence","volume":"42","author":"Cilia","year":"2014","journal-title":"Nucleic Acids Res"},{"key":"2023051510463243400_btaa1102-B11","doi-asserted-by":"crossref","first-page":"1422","DOI":"10.1093\/bioinformatics\/btp163","article-title":"Biopython: freely available Python tools for computational molecular biology and bioinformatics","volume":"25","author":"Cock","year":"2009","journal-title":"Bioinformatics"},{"key":"2023051510463243400_btaa1102-B12","doi-asserted-by":"crossref","first-page":"63","DOI":"10.3389\/fmicb.2014.00063","article-title":"Fusion tags for protein solubility, purification and immunogenicity in Escherichia coli: the novel Fh8 system","volume":"5","author":"Costa","year":"2014","journal-title":"Front. Microbiol"},{"key":"2023051510463243400_btaa1102-B13","doi-asserted-by":"crossref","first-page":"382","DOI":"10.1002\/(SICI)1097-0290(19991120)65:4<382::AID-BIT2>3.0.CO;2-I","article-title":"New fusion protein systems designed to give soluble expression in Escherichia coli","volume":"65","author":"Davis","year":"1999","journal-title":"Biotechnol. Bioeng"},{"key":"2023051510463243400_btaa1102-B14","doi-asserted-by":"crossref","first-page":"374","DOI":"10.1002\/bit.22537","article-title":"Prediction of protein solubility in Escherichia coli using logistic regression","volume":"105","author":"Diaz","year":"2010","journal-title":"Biotechnol. Bioeng"},{"key":"2023051510463243400_btaa1102-B15","doi-asserted-by":"crossref","first-page":"2460","DOI":"10.1093\/bioinformatics\/btq461","article-title":"Search and clustering orders of magnitude faster than BLAST","volume":"26","author":"Edgar","year":"2010","journal-title":"Bioinformatics"},{"key":"2023051510463243400_btaa1102-B16","doi-asserted-by":"crossref","first-page":"1189","DOI":"10.1214\/aos\/1013203451","article-title":"Greedy function approximation: a gradient boosting machine","volume":"29","author":"Friedman","year":"2001","journal-title":"Ann. Stat"},{"key":"2023051510463243400_btaa1102-B17","doi-asserted-by":"crossref","first-page":"3098","DOI":"10.1093\/bioinformatics\/btx345","article-title":"Protein\u2013Sol: a web tool for predicting protein solubility from sequence","volume":"33","author":"Hebditch","year":"2017","journal-title":"Bioinformatics"},{"key":"2023051510463243400_btaa1102-B18","doi-asserted-by":"crossref","first-page":"1444","DOI":"10.1002\/pmic.201200175","article-title":"ESPRESSO: a system for estimating protein expression and solubility in protein expression systems","volume":"13","author":"Hirose","year":"2013","journal-title":"Proteomics"},{"key":"2023051510463243400_btaa1102-B19","doi-asserted-by":"crossref","first-page":"W104","DOI":"10.1093\/nar\/gkaa372","article-title":"EnzymeMiner: automated mining of soluble enzymes with diverse structures, catalytic properties and stabilities","volume":"48","author":"Hon","year":"2020","journal-title":"Nucleic Acids Res"},{"key":"2023051510463243400_btaa1102-B20","doi-asserted-by":"crossref","first-page":"2605","DOI":"10.1093\/bioinformatics\/bty166","article-title":"DeepSol: a deep learning framework for sequence-based protein solubility prediction","volume":"34","author":"Khurana","year":"2018","journal-title":"Bioinformatics"},{"key":"2023051510463243400_btaa1102-B21","doi-asserted-by":"crossref","first-page":"1907","DOI":"10.1016\/j.bpj.2012.01.060","article-title":"Toward a molecular understanding of protein solubility: increased negative surface charge correlates with increased solubility","volume":"102","author":"Kramer","year":"2012","journal-title":"Biophys. J"},{"key":"2023051510463243400_btaa1102-B22","doi-asserted-by":"crossref","first-page":"567","DOI":"10.1006\/jmbi.2000.4315","article-title":"Predicting transmembrane protein topology with a hidden Markov model: application to complete genomes","volume":"305","author":"Krogh","year":"2001","journal-title":"J. Mol. Biol"},{"key":"2023051510463243400_btaa1102-B23","doi-asserted-by":"crossref","first-page":"2200","DOI":"10.1093\/bioinformatics\/btp386","article-title":"SOLpro: accurate sequence-based prediction of protein solubility","volume":"25","author":"Magnan","year":"2009","journal-title":"Bioinformatics"},{"key":"2023051510463243400_btaa1102-B24","first-page":"56","author":"McKinney","year":"2010"},{"key":"2023051510463243400_btaa1102-B25","doi-asserted-by":"crossref","first-page":"1033","DOI":"10.1021\/acscatal.8b03613","article-title":"Computational design of stable and soluble biocatalysts","volume":"9","author":"Musil","year":"2019","journal-title":"ACS Catal"},{"key":"2023051510463243400_btaa1102-B26","doi-asserted-by":"crossref","first-page":"4201","DOI":"10.1073\/pnas.0811922106","article-title":"Bimodal protein solubility distribution revealed by an aggregation analysis of the entire ensemble of Escherichia coli proteins","volume":"106","author":"Niwa","year":"2009","journal-title":"Proc. Natl. Acad. Sci. USA"},{"key":"2023051510463243400_btaa1102-B27","doi-asserted-by":"crossref","first-page":"8937","DOI":"10.1073\/pnas.1201380109","article-title":"Global analysis of chaperone effects using a reconstituted cell-free translation system","volume":"109","author":"Niwa","year":"2012","journal-title":"Proc. Natl. Acad. Sci. USA"},{"key":"2023051510463243400_btaa1102-B28","first-page":"2825","article-title":"Scikit-learn: machine learning in Python","volume":"12","author":"Pedregosa","year":"2011","journal-title":"J. Mach. Learn. Res"},{"key":"2023051510463243400_btaa1102-B29","doi-asserted-by":"crossref","first-page":"1889","DOI":"10.1093\/bioinformatics\/btx085","article-title":"FELLS: fast estimator of latent local structure","volume":"33","author":"Piovesan","year":"2017","journal-title":"Bioinformatics"},{"key":"2023051510463243400_btaa1102-B30","doi-asserted-by":"crossref","first-page":"6","DOI":"10.1186\/2042-5783-1-6","article-title":"Large-scale experimental studies show unexpected amino acid effects on protein expression and solubility in vivo in E. coli","volume":"1","author":"Price","year":"2011","journal-title":"Microb. Inf. Exp"},{"key":"2023051510463243400_btaa1102-B31","doi-asserted-by":"crossref","first-page":"e1007722","DOI":"10.1371\/journal.pcbi.1007722","article-title":"Insight into the protein solubility driving forces with neural attention","volume":"16","author":"Raimondi","year":"2020","journal-title":"PLoS Comput. Biol"},{"key":"2023051510463243400_btaa1102-B32","doi-asserted-by":"crossref","first-page":"172","DOI":"10.3389\/fmicb.2014.00172","article-title":"Recombinant protein expression in Escherichia coli: advances and challenges","volume":"5","author":"Rosano","year":"2014","journal-title":"Front. Microbiol"},{"key":"2023051510463243400_btaa1102-B33","doi-asserted-by":"crossref","first-page":"1147","DOI":"10.1002\/prot.25594","article-title":"AggScore: prediction of aggregation-prone regions in proteins based on the distribution of surface patches","volume":"86","author":"Sankar","year":"2018","journal-title":"Proteins"},{"key":"2023051510463243400_btaa1102-B34","doi-asserted-by":"crossref","first-page":"751","DOI":"10.1038\/90802","article-title":"Cell-free translation reconstituted with purified components","volume":"19","author":"Shimizu","year":"2001","journal-title":"Nat. Biotechnol"},{"key":"2023051510463243400_btaa1102-B35","doi-asserted-by":"crossref","first-page":"2192","DOI":"10.1111\/j.1742-4658.2012.08603.x","article-title":"PROSO II - a new method for protein solubility prediction","volume":"279","author":"Smialowski","year":"2012","journal-title":"FEBS J"},{"key":"2023051510463243400_btaa1102-B36","doi-asserted-by":"crossref","first-page":"478","DOI":"10.1016\/j.jmb.2014.09.026","article-title":"The CamSol method of rational design of protein mutants with enhanced solubility","volume":"427","author":"Sormanni","year":"2015","journal-title":"J. Mol. Biol"},{"key":"2023051510463243400_btaa1102-B37","doi-asserted-by":"crossref","first-page":"1026","DOI":"10.1038\/nbt.3988","article-title":"MMseqs2 enables sensitive protein sequence searching for the analysis of massive data sets","volume":"35","author":"Steinegger","year":"2017","journal-title":"Nat. Biotechnol"},{"key":"2023051510463243400_btaa1102-B38","doi-asserted-by":"crossref","first-page":"267","DOI":"10.1111\/j.2517-6161.1996.tb02080.x","article-title":"Regression shrinkage and selection via the Lasso","volume":"58","author":"Tibshirani","year":"1996","journal-title":"J. R. Stat. Soc. Ser. B (Methodological)"},{"key":"2023051510463243400_btaa1102-B39","doi-asserted-by":"crossref","first-page":"W401","DOI":"10.1093\/nar\/gkv485","article-title":"The TOPCONS web server for consensus prediction of membrane protein topology and signal peptides","volume":"43","author":"Tsirigos","year":"2015","journal-title":"Nucleic Acids Res"},{"key":"2023051510463243400_btaa1102-B40","doi-asserted-by":"crossref","first-page":"2402","DOI":"10.1021\/acscatal.7b03523","article-title":"Exploration of enzyme diversity by integrating bioinformatics with expression analysis and biochemical characterization","volume":"8","author":"Vanacek","year":"2018","journal-title":"ACS Catal"},{"key":"2023051510463243400_btaa1102-B41","doi-asserted-by":"crossref","first-page":"503","DOI":"10.1093\/bioinformatics\/btr682","article-title":"ESpritz: accurate and fast prediction of protein disorder","volume":"28","author":"Walsh","year":"2012","journal-title":"Bioinformatics"},{"key":"2023051510463243400_btaa1102-B42","first-page":"443","article-title":"Predicting the solubility of recombinant proteins in Escherichia coli","volume":"9","author":"Wilkinson","year":"1991","journal-title":"Biotechnology (N.Y.)"}],"container-title":["Bioinformatics"],"original-title":[],"language":"en","link":[{"URL":"http:\/\/academic.oup.com\/bioinformatics\/advance-article-pdf\/doi\/10.1093\/bioinformatics\/btaa1102\/36158337\/btaa1102.pdf","content-type":"application\/pdf","content-version":"am","intended-application":"syndication"},{"URL":"https:\/\/academic.oup.com\/bioinformatics\/article-pdf\/37\/1\/23\/50321271\/btaa1102.pdf","content-type":"application\/pdf","content-version":"vor","intended-application":"syndication"},{"URL":"https:\/\/academic.oup.com\/bioinformatics\/article-pdf\/37\/1\/23\/50321271\/btaa1102.pdf","content-type":"unspecified","content-version":"vor","intended-application":"similarity-checking"}],"deposited":{"date-parts":[[2024,8,20]],"date-time":"2024-08-20T03:01:54Z","timestamp":1724122914000},"score":1,"resource":{"primary":{"URL":"https:\/\/academic.oup.com\/bioinformatics\/article\/37\/1\/23\/6070085"}},"subtitle":[],"editor":[{"given":"Jinbo","family":"Xu","sequence":"additional","affiliation":[]}],"short-title":[],"issued":{"date-parts":[[2021,1,1]]},"references-count":42,"journal-issue":{"issue":"1","published-print":{"date-parts":[[2021,4,9]]}},"URL":"https:\/\/doi.org\/10.1093\/bioinformatics\/btaa1102","relation":{"has-preprint":[{"id-type":"doi","id":"10.26434\/chemrxiv.13047818.v1","asserted-by":"object"},{"id-type":"doi","id":"10.26434\/chemrxiv.13047818","asserted-by":"object"}]},"ISSN":["1367-4803","1367-4811"],"issn-type":[{"value":"1367-4803","type":"print"},{"value":"1367-4811","type":"electronic"}],"subject":[],"published-other":{"date-parts":[[2021,1,1]]},"published":{"date-parts":[[2021,1,1]]}}}