{"status":"ok","message-type":"work","message-version":"1.0.0","message":{"indexed":{"date-parts":[[2026,3,18]],"date-time":"2026-03-18T00:39:26Z","timestamp":1773794366767,"version":"3.50.1"},"reference-count":50,"publisher":"Oxford University Press (OUP)","issue":"5","license":[{"start":{"date-parts":[[2019,10,11]],"date-time":"2019-10-11T00:00:00Z","timestamp":1570752000000},"content-version":"vor","delay-in-days":0,"URL":"https:\/\/academic.oup.com\/journals\/pages\/open_access\/funder_policies\/chorus\/standard_publication_model"}],"funder":[{"DOI":"10.13039\/501100002661","name":"FNRS","doi-asserted-by":"publisher","id":[{"id":"10.13039\/501100002661","id-type":"DOI","asserted-by":"publisher"}]},{"name":"John von Neumann Institute for Computing"},{"DOI":"10.13039\/100013137","name":"NIC","doi-asserted-by":"publisher","id":[{"id":"10.13039\/100013137","id-type":"DOI","asserted-by":"publisher"}]}],"content-domain":{"domain":[],"crossmark-restriction":false},"short-container-title":[],"published-print":{"date-parts":[[2020,3,1]]},"abstract":"<jats:title>Abstract<\/jats:title>\n                  <jats:sec>\n                    <jats:title>Motivation<\/jats:title>\n                    <jats:p>The solubility of a protein is often decisive for its proper functioning. Lack of solubility is a major bottleneck in high-throughput structural genomic studies and in high-concentration protein production, and the formation of protein aggregates causes a wide variety of diseases. Since solubility measurements are time-consuming and expensive, there is a strong need for solubility prediction tools.<\/jats:p>\n                  <\/jats:sec>\n                  <jats:sec>\n                    <jats:title>Results<\/jats:title>\n                    <jats:p>We have recently introduced solubility-dependent distance potentials that are able to unravel the role of residue\u2013residue interactions in promoting or decreasing protein solubility. 
Here, we extended their construction by defining solubility-dependent potentials based on backbone torsion angles and solvent accessibility, and integrated them, together with other structure- and sequence-based features, into a random forest model trained on a set of Escherichia coli proteins with experimental structures and solubility values. We thus obtained the SOLart protein solubility predictor, whose most informative features turned out to be folding free energy differences computed from our solubility-dependent statistical potentials. SOLart performances are very good, with a Pearson correlation coefficient between experimental and predicted solubility values of almost 0.7 both in cross-validation on the training dataset and in an independent set of Saccharomyces cerevisiae proteins. On test sets of modeled structures, only a limited drop in performance is observed. SOLart can thus be used with both high-resolution and low-resolution structures, and clearly outperforms state-of-the-art solubility predictors. 
It is available through a user-friendly webserver, which is easy to use by non-expert scientists.<\/jats:p>\n                  <\/jats:sec>\n                  <jats:sec>\n                    <jats:title>Availability and implementation<\/jats:title>\n                    <jats:p>The SOLart webserver is freely available at http:\/\/babylone.ulb.ac.be\/SOLART\/.<\/jats:p>\n                  <\/jats:sec>\n                  <jats:sec>\n                    <jats:title>Supplementary information<\/jats:title>\n                    <jats:p>Supplementary data are available at Bioinformatics online.<\/jats:p>\n                  <\/jats:sec>","DOI":"10.1093\/bioinformatics\/btz773","type":"journal-article","created":{"date-parts":[[2019,10,8]],"date-time":"2019-10-08T15:33:06Z","timestamp":1570548786000},"page":"1445-1452","source":"Crossref","is-referenced-by-count":73,"title":["SOLart: a structure-based method to predict protein solubility and aggregation"],"prefix":"10.1093","volume":"36","author":[{"given":"Qingzhen","family":"Hou","sequence":"first","affiliation":[{"name":"Computational Biology and Bioinformatics, Universit\u00e9 Libre de Bruxelles , Avenue Roosevelt 50, 1050 Brussels, Belgium"},{"name":"Interuniversity Institute of Bioinformatics in Brussels , Boulevard du Triomphe, 1050 Brussels, Belgium"}]},{"given":"Jean Marc","family":"Kwasigroch","sequence":"additional","affiliation":[{"name":"Computational Biology and Bioinformatics, Universit\u00e9 Libre de Bruxelles , Avenue Roosevelt 50, 1050 Brussels, Belgium"},{"name":"Interuniversity Institute of Bioinformatics in Brussels , Boulevard du Triomphe, 1050 Brussels, Belgium"}]},{"given":"Marianne","family":"Rooman","sequence":"additional","affiliation":[{"name":"Computational Biology and Bioinformatics, Universit\u00e9 Libre de Bruxelles , Avenue Roosevelt 50, 1050 Brussels, Belgium"},{"name":"Interuniversity Institute of Bioinformatics in Brussels , Boulevard du Triomphe, 1050 Brussels, 
Belgium"}]},{"given":"Fabrizio","family":"Pucci","sequence":"additional","affiliation":[{"name":"Computational Biology and Bioinformatics, Universit\u00e9 Libre de Bruxelles , Avenue Roosevelt 50, 1050 Brussels, Belgium"},{"name":"Interuniversity Institute of Bioinformatics in Brussels , Boulevard du Triomphe, 1050 Brussels, Belgium"},{"name":"John von Neumann Institute for Computing , J\u00fclich Supercomputer Centre, Forschungszentrum J\u00fclich, 52428 J\u00fclich, Germany"}]}],"member":"286","published-online":{"date-parts":[[2019,10,11]]},"reference":[{"key":"2023060910270784400_btz773-B1","doi-asserted-by":"crossref","first-page":"2975","DOI":"10.1093\/bioinformatics\/btu420","article-title":"cc SOL omics: a webserver for solubility prediction of endogenous and heterologous expression in Escherichia coli","volume":"30","author":"Agostini","year":"2014","journal-title":"Bioinformatics"},{"key":"2023060910270784400_btz773-B2","doi-asserted-by":"crossref","first-page":"D115","DOI":"10.1093\/nar\/gkh131","article-title":"UniProt: the universal protein knowledgebase","volume":"32","author":"Apweiler","year":"2004","journal-title":"Nucleic Acids Res"},{"key":"2023060910270784400_btz773-B3","doi-asserted-by":"crossref","first-page":"1399.","DOI":"10.1038\/nbt1029","article-title":"Recombinant protein folding and misfolding in Escherichia coli","volume":"22","author":"Baneyx","year":"2004","journal-title":"Nat. 
Biotechnol"},{"key":"2023060910270784400_btz773-B4","doi-asserted-by":"crossref","first-page":"235","DOI":"10.1093\/nar\/28.1.235","article-title":"The protein data bank","volume":"28","author":"Berman","year":"2000","journal-title":"Nucleic Acids Res"},{"key":"2023060910270784400_btz773-B5","doi-asserted-by":"crossref","first-page":"D313","DOI":"10.1093\/nar\/gkw1132","article-title":"The SWISS-MODEL repository\u2014new features and functionality","volume":"45","author":"Bienert","year":"2017","journal-title":"Nucleic Acids Res"},{"key":"2023060910270784400_btz773-B6","doi-asserted-by":"crossref","first-page":"507.","DOI":"10.1038\/416507a","article-title":"Inherent toxicity of aggregates implies a common mechanism for protein misfolding diseases","volume":"416","author":"Bucciantini","year":"2002","journal-title":"Nature"},{"key":"2023060910270784400_btz773-B7","doi-asserted-by":"crossref","first-page":"3333.","DOI":"10.1038\/srep03333","article-title":"Soluble expression of proteins correlates with a lack of positively-charged surface","volume":"3","author":"Chan","year":"2013","journal-title":"Sci. Rep"},{"key":"2023060910270784400_btz773-B8","doi-asserted-by":"crossref","first-page":"333","DOI":"10.1146\/annurev.biochem.75.101304.123901","article-title":"Protein misfolding, functional amyloid, and human disease","volume":"75","author":"Chiti","year":"2006","journal-title":"Annu. Rev. 
Biochem"},{"key":"2023060910270784400_btz773-B9","doi-asserted-by":"crossref","first-page":"1734","DOI":"10.1002\/prot.24527","article-title":"Cation-\u03c0, amino-\u03c0, \u03c0-\u03c0, and h-bond interactions stabilize antigen-antibody interfaces","volume":"82","author":"Dalkas","year":"2014","journal-title":"Proteins"},{"key":"2023060910270784400_btz773-B10","doi-asserted-by":"crossref","first-page":"D289","DOI":"10.1093\/nar\/gkw1098","article-title":"CATH: an expanded resource to predict protein function through structure and sequence","volume":"45","author":"Dawson","year":"2017","journal-title":"Nucleic Acids Res"},{"key":"2023060910270784400_btz773-B11","doi-asserted-by":"crossref","first-page":"2537","DOI":"10.1093\/bioinformatics\/btp445","article-title":"Fast and accurate predictions of protein stability changes upon mutations using statistical potentials and neural networks: popmusic-2.0","volume":"25","author":"Dehouck","year":"2009","journal-title":"Bioinformatics"},{"key":"2023060910270784400_btz773-B12","doi-asserted-by":"crossref","first-page":"667","DOI":"10.1016\/j.bpj.2009.10.050","article-title":"Thermo- and mesostabilizing protein interactions identified by temperature-dependent statistical potentials","volume":"98","author":"Folch","year":"2010","journal-title":"Biophys. J"},{"key":"2023060910270784400_btz773-B13","doi-asserted-by":"crossref","first-page":"8933","DOI":"10.1021\/ja049297h","article-title":"A simple method for improving protein solubility and long-term stability","volume":"126","author":"Golovanov","year":"2004","journal-title":"J. Am. Chem. 
Soc"},{"key":"2023060910270784400_btz773-B14","doi-asserted-by":"crossref","first-page":"3098","DOI":"10.1093\/bioinformatics\/btx345","article-title":"Protein\u2013Sol: a web tool for predicting protein solubility from sequence","volume":"33","author":"Hebditch","year":"2017","journal-title":"Bioinformatics"},{"key":"2023060910270784400_btz773-B15","doi-asserted-by":"crossref","first-page":"1444","DOI":"10.1002\/pmic.201200175","article-title":"Espresso: a system for estimating protein expression and solubility in protein expression systems","volume":"13","author":"Hirose","year":"2013","journal-title":"Proteomics"},{"key":"2023060910270784400_btz773-B16","doi-asserted-by":"crossref","first-page":"14661.","DOI":"10.1038\/s41598-018-32988-w","article-title":"Computational analysis of the amino acid interactions that promote or decrease protein solubility","volume":"8","author":"Hou","year":"2018","journal-title":"Sci. Rep"},{"key":"2023060910270784400_btz773-B17","doi-asserted-by":"crossref","first-page":"278","DOI":"10.1093\/bioinformatics\/bti810","article-title":"A support vector machine-based method for predicting the propensity of a protein to be soluble or to form inclusion body on overexpression in Escherichia coli","volume":"22","author":"Idicula-Thomas","year":"2006","journal-title":"Bioinformatics"},{"key":"2023060910270784400_btz773-B18","doi-asserted-by":"crossref","first-page":"451.","DOI":"10.2119\/2007-00100.Irvine","article-title":"Protein aggregation in the brain: the molecular basis for Alzheimer\u2019s and Parkinson\u2019s diseases","volume":"14","author":"Irvine","year":"2008","journal-title":"Mol. 
Med"},{"key":"2023060910270784400_btz773-B19","doi-asserted-by":"crossref","first-page":"2605","DOI":"10.1093\/bioinformatics\/bty166","article-title":"Deepsol: a deep learning framework for sequence-based protein solubility prediction","volume":"34","author":"Khurana","year":"2018","journal-title":"Bioinformatics"},{"key":"2023060910270784400_btz773-B20","doi-asserted-by":"crossref","first-page":"1598","DOI":"10.1006\/jmbi.1994.1109","article-title":"Factors influencing the ability of knowledge-based potentials to identify native sequence-structure matches","volume":"235","author":"Kocher","year":"1994","journal-title":"J. Mol. Biol"},{"key":"2023060910270784400_btz773-B21","doi-asserted-by":"crossref","first-page":"1907","DOI":"10.1016\/j.bpj.2012.01.060","article-title":"Toward a molecular understanding of protein solubility: increased negative surface charge correlates with increased solubility","volume":"102","author":"Kramer","year":"2012","journal-title":"Biophys. J"},{"key":"2023060910270784400_btz773-B22","doi-asserted-by":"crossref","first-page":"1","DOI":"10.18637\/jss.v028.i05","article-title":"Building predictive models in r using the caret package","volume":"28","author":"Kuhn","year":"2008","journal-title":"J. Stat. Softw"},{"key":"2023060910270784400_btz773-B23","doi-asserted-by":"crossref","first-page":"W300","DOI":"10.1093\/nar\/gkz321","article-title":"Aggrescan3D (A3D) 2.0: prediction and engineering of protein solubility","volume":"47","author":"Kuriata","year":"2019","journal-title":"Nucleic Acids Res"},{"key":"2023060910270784400_btz773-B24","doi-asserted-by":"crossref","first-page":"1","DOI":"10.18637\/jss.v036.i11","article-title":"Feature selection with the Boruta package","volume":"36","author":"Kursa","year":"2010","journal-title":"J. Stat. 
Softw"},{"key":"2023060910270784400_btz773-B25","first-page":"18","article-title":"Classification and regression by randomforest","volume":"2","author":"Liaw","year":"2002","journal-title":"R News"},{"key":"2023060910270784400_btz773-B26","doi-asserted-by":"crossref","first-page":"2200","DOI":"10.1093\/bioinformatics\/btp386","article-title":"Solpro: accurate sequence-based prediction of protein solubility","volume":"25","author":"Magnan","year":"2009","journal-title":"Bioinformatics"},{"key":"2023060910270784400_btz773-B27","doi-asserted-by":"crossref","first-page":"4.","DOI":"10.1186\/1475-2859-8-4","article-title":"Learning about protein solubility from bacterial inclusion bodies","volume":"8","author":"Mart\u00ednez-Alonso","year":"2009","journal-title":"Microb. Cell Fact"},{"key":"2023060910270784400_btz773-B28","doi-asserted-by":"crossref","first-page":"4201","DOI":"10.1073\/pnas.0811922106","article-title":"Bimodal protein solubility distribution revealed by an aggregation analysis of the entire ensemble of Escherichia coli proteins","volume":"106","author":"Niwa","year":"2009","journal-title":"Proc. Natl. Acad. Sci"},{"key":"2023060910270784400_btz773-B29","doi-asserted-by":"crossref","first-page":"263","DOI":"10.1146\/annurev-chembioeng-062011-081052","article-title":"Engineering aggregation-resistant antibodies","volume":"3","author":"Perchiacca","year":"2012","journal-title":"Annu. Rev. Chem. Biomol. Eng"},{"key":"2023060910270784400_btz773-B30","doi-asserted-by":"crossref","first-page":"23257.","DOI":"10.1038\/srep23257","article-title":"Predicting protein thermal stability changes upon point mutations using statistical potentials: introducing hotmusic","volume":"6","author":"Pucci","year":"2016","journal-title":"Sci. 
Rep"},{"key":"2023060910270784400_btz773-B31","doi-asserted-by":"crossref","first-page":"e91659.","DOI":"10.1371\/journal.pone.0091659","article-title":"Protein thermostability prediction within homologous families using temperature-dependent statistical potentials","volume":"9","author":"Pucci","year":"2014","journal-title":"PLoS One"},{"key":"2023060910270784400_btz773-B32","doi-asserted-by":"crossref","first-page":"1092","DOI":"10.1093\/bioinformatics\/btx662","article-title":"PaRSnIP: sequence-based protein solubility prediction using gradient boosting machine","volume":"34","author":"Rawi","year":"2018","journal-title":"Bioinformatics"},{"key":"2023060910270784400_btz773-B33","doi-asserted-by":"crossref","first-page":"372","DOI":"10.1016\/j.tibtech.2014.05.005","article-title":"Therapeutic protein aggregation: mechanisms, design, and control","volume":"32","author":"Roberts","year":"2014","journal-title":"Trends Biotechnol"},{"key":"2023060910270784400_btz773-B34","doi-asserted-by":"crossref","first-page":"961","DOI":"10.1016\/0022-2836(91)80186-X","article-title":"Prediction of protein backbone conformation based on seven structure assignments: influence of local interactions","volume":"221","author":"Rooman","year":"1991","journal-title":"J. Mol. Biol"},{"key":"2023060910270784400_btz773-B35","doi-asserted-by":"crossref","first-page":"S10.","DOI":"10.1038\/nm1066","article-title":"Protein aggregation and neurodegenerative disease","volume":"10","author":"Ross","year":"2004","journal-title":"Nat. 
Med"},{"key":"2023060910270784400_btz773-B36","doi-asserted-by":"crossref","first-page":"299","DOI":"10.1016\/j.ymeth.2005.04.006","article-title":"Protein synthesis by pure translation systems","volume":"36","author":"Shimizu","year":"2005","journal-title":"Methods"},{"key":"2023060910270784400_btz773-B37","doi-asserted-by":"crossref","first-page":"41.","DOI":"10.1186\/s12934-015-0222-8","article-title":"Protein recovery from inclusion bodies of Escherichia coli using mild solubilization process","volume":"14","author":"Singh","year":"2015","journal-title":"Microb. Cell Fact"},{"key":"2023060910270784400_btz773-B38","doi-asserted-by":"crossref","first-page":"303","DOI":"10.1263\/jbb.99.303","article-title":"Solubilization and refolding of bacterial inclusion body proteins","volume":"99","author":"Singh","year":"2005","journal-title":"J. Biosci. Bioeng"},{"key":"2023060910270784400_btz773-B39","doi-asserted-by":"crossref","first-page":"2192","DOI":"10.1111\/j.1742-4658.2012.08603.x","article-title":"PROSO II\u2013a new method for protein solubility prediction","volume":"279","author":"Smialowski","year":"2012","journal-title":"FEBS J"},{"key":"2023060910270784400_btz773-B40","doi-asserted-by":"crossref","first-page":"2536","DOI":"10.1093\/bioinformatics\/btl623","article-title":"Protein solubility: sequence based prediction and experimental verification","volume":"23","author":"Smialowski","year":"2007","journal-title":"Bioinformatics"},{"key":"2023060910270784400_btz773-B41","doi-asserted-by":"crossref","first-page":"478","DOI":"10.1016\/j.jmb.2014.09.026","article-title":"The CamSol method of rational design of protein mutants with enhanced solubility","volume":"427","author":"Sormanni","year":"2015","journal-title":"J. Mol. 
Biol"},{"key":"2023060910270784400_btz773-B42","doi-asserted-by":"crossref","first-page":"2601","DOI":"10.1529\/biophysj.107.127746","article-title":"Prediction of protein solubility from calculation of transfer free energy","volume":"95","author":"Tjong","year":"2008","journal-title":"Biophys. J"},{"key":"2023060910270784400_btz773-B43","doi-asserted-by":"crossref","first-page":"136","DOI":"10.1016\/j.sbi.2017.01.004","article-title":"Exploring the relationships between protein sequence, structure and solubility","volume":"42","author":"Trainor","year":"2017","journal-title":"Curr. Opin. Struct. Biol"},{"key":"2023060910270784400_btz773-B44","doi-asserted-by":"crossref","first-page":"4155","DOI":"10.1002\/jps.21327","article-title":"Measuring and increasing protein solubility","volume":"97","author":"Trevino","year":"2008","journal-title":"J. Pharm. Sci"},{"key":"2023060910270784400_btz773-B45","doi-asserted-by":"crossref","first-page":"678.","DOI":"10.1038\/s41598-017-18977-5","article-title":"Large-scale aggregation analysis of eukaryotic proteins reveals an involvement of intrinsically disordered regions in protein folding","volume":"8","author":"Uemura","year":"2018","journal-title":"Sci. Rep"},{"key":"2023060910270784400_btz773-B46","doi-asserted-by":"crossref","first-page":"11.","DOI":"10.1186\/1475-2859-3-11","article-title":"Strategies for the recovery of active proteins through refolding of bacterial inclusion body proteins","volume":"3","author":"Vallejo","year":"2004","journal-title":"Microb. 
Cell Fact"},{"key":"2023060910270784400_btz773-B47","doi-asserted-by":"crossref","first-page":"1589","DOI":"10.1093\/bioinformatics\/btg224","article-title":"Pisces: a protein sequence culling server","volume":"19","author":"Wang","year":"2003","journal-title":"Bioinformatics"},{"key":"2023060910270784400_btz773-B48","doi-asserted-by":"crossref","first-page":"294","DOI":"10.1021\/mp4004749","article-title":"Lysine and arginine content of proteins: computational analysis suggests a new tool for solubility design","volume":"11","author":"Warwicker","year":"2014","journal-title":"Mol. Pharm"},{"key":"2023060910270784400_btz773-B49","doi-asserted-by":"crossref","first-page":"443.","DOI":"10.1038\/nbt0591-443","article-title":"Predicting the solubility of recombinant proteins in Escherichia coli","volume":"9","author":"Wilkinson","year":"1991","journal-title":"Nat. Biotechnol"},{"key":"2023060910270784400_btz773-B50","doi-asserted-by":"crossref","first-page":"D613","DOI":"10.1093\/nar\/gks1235","article-title":"EcoGene 3.0","volume":"41","author":"Zhou","year":"2013","journal-title":"Nucleic Acids 
Res"}],"container-title":["Bioinformatics"],"original-title":[],"language":"en","link":[{"URL":"http:\/\/academic.oup.com\/bioinformatics\/advance-article-pdf\/doi\/10.1093\/bioinformatics\/btz773\/30332404\/btz773.pdf","content-type":"application\/pdf","content-version":"am","intended-application":"syndication"},{"URL":"https:\/\/academic.oup.com\/bioinformatics\/article-pdf\/36\/5\/1445\/50552858\/bioinformatics_36_5_1445.pdf","content-type":"application\/pdf","content-version":"vor","intended-application":"syndication"},{"URL":"https:\/\/academic.oup.com\/bioinformatics\/article-pdf\/36\/5\/1445\/50552858\/bioinformatics_36_5_1445.pdf","content-type":"unspecified","content-version":"vor","intended-application":"similarity-checking"}],"deposited":{"date-parts":[[2023,6,9]],"date-time":"2023-06-09T06:28:10Z","timestamp":1686292090000},"score":1,"resource":{"primary":{"URL":"https:\/\/academic.oup.com\/bioinformatics\/article\/36\/5\/1445\/5585748"}},"subtitle":[],"editor":[{"given":"Alfonso","family":"Valencia","sequence":"additional","affiliation":[]}],"short-title":[],"issued":{"date-parts":[[2019,10,11]]},"references-count":50,"journal-issue":{"issue":"5","published-print":{"date-parts":[[2020,3,1]]}},"URL":"https:\/\/doi.org\/10.1093\/bioinformatics\/btz773","relation":{"has-preprint":[{"id-type":"doi","id":"10.1101\/600734","asserted-by":"object"}]},"ISSN":["1367-4803","1367-4811"],"issn-type":[{"value":"1367-4803","type":"print"},{"value":"1367-4811","type":"electronic"}],"subject":[],"published-other":{"date-parts":[[2020,3]]},"published":{"date-parts":[[2019,10,11]]}}}