{"status":"ok","message-type":"work","message-version":"1.0.0","message":{"indexed":{"date-parts":[[2026,1,9]],"date-time":"2026-01-09T12:26:20Z","timestamp":1767961580522,"version":"3.49.0"},"reference-count":62,"publisher":"Oxford University Press (OUP)","issue":"3","content-domain":{"domain":[],"crossmark-restriction":false},"short-container-title":[],"published-print":{"date-parts":[[2006,2,1]]},"abstract":"<jats:title>Abstract<\/jats:title>\n               <jats:p>Motivation: Inclusion body formation has been a major deterrent for overexpression studies since a large number of proteins form insoluble inclusion bodies when overexpressed in Escherichia coli. The formation of inclusion bodies is known to be an outcome of improper protein folding; thus the composition and arrangement of amino acids in the proteins would be a major influencing factor in deciding its aggregation propensity. There is a significant need for a prediction algorithm that would enable the rational identification of both mutants and also the ideal protein candidates for mutations that would confer higher solubility-on-overexpression instead of the presently used trial-and-error procedures.<\/jats:p>\n               <jats:p>Results: Six physicochemical properties together with residue and dipeptide-compositions have been used to develop a support vector machine-based classifier to predict the overexpression status in E.coli. The prediction accuracy is \u223c72% suggesting that it performs reasonably well in predicting the propensity of a protein to be soluble or to form inclusion bodies. The algorithm could also correctly predict the change in solubility for most of the point mutations reported in literature. 
This algorithm can be a useful tool in screening protein libraries to identify soluble variants of proteins.<\/jats:p>\n               <jats:p>Availability: Software is available on request from the authors.<\/jats:p>\n               <jats:p>Contact: balaji@iitcb.ac.in; vk.jayaraman@ncl.res.in<\/jats:p>\n               <jats:p>Supplementary information: Supplementary data are available at Bioinformatics Online web site.<\/jats:p>","DOI":"10.1093\/bioinformatics\/bti810","type":"journal-article","created":{"date-parts":[[2005,12,7]],"date-time":"2005-12-07T03:03:48Z","timestamp":1133924628000},"page":"278-284","source":"Crossref","is-referenced-by-count":85,"title":["A support vector machine-based method for predicting the propensity of a protein to be soluble or to form inclusion body on overexpression in <i>Escherichia coli<\/i>"],"prefix":"10.1093","volume":"22","author":[{"given":"Susan","family":"Idicula-Thomas","sequence":"first","affiliation":[{"name":"School of Biosciences and Bioengineering, Indian Institute of Technology Bombay, Powai, Mumbai 400 076, India"}]},{"given":"Abhijit J.","family":"Kulkarni","sequence":"additional","affiliation":[{"name":"Chemical Engineering and Process Development Division, National Chemical Laboratory, Dr Homi Bhabha Road, Pune 411 008, India"}]},{"given":"Bhaskar D.","family":"Kulkarni","sequence":"additional","affiliation":[{"name":"Chemical Engineering and Process Development Division, National Chemical Laboratory, Dr Homi Bhabha Road, Pune 411 008, India"}]},{"given":"Valadi K.","family":"Jayaraman","sequence":"additional","affiliation":[{"name":"Chemical Engineering and Process Development Division, National Chemical Laboratory, Dr Homi Bhabha Road, Pune 411 008, India"}]},{"given":"Petety V.","family":"Balaji","sequence":"additional","affiliation":[{"name":"School of Biosciences and Bioengineering, Indian Institute of 
Technology Bombay, Powai, Mumbai 400 076, India"}]}],"member":"286","published-online":{"date-parts":[[2005,12,6]]},"reference":[{"key":"2023012408323420800_b1","doi-asserted-by":"crossref","first-page":"2884","DOI":"10.1093\/nar\/29.13.2884","article-title":"SPINE: an integrated tracking database and data mining approach for identifying feasible targets in high-throughput structural proteomics","volume":"29","author":"Bertone","year":"2001","journal-title":"Nucleic Acids Res."},{"issue":"(Web Server issue)","key":"2023012408323420800_b2","doi-asserted-by":"crossref","first-page":"W414","DOI":"10.1093\/nar\/gkh350","article-title":"ESLpred: SVM-based method for subcellular localization of eukaryotic proteins using dipeptide composition and PSI-BLAST","volume":"32","author":"Bhasin","year":"2004","journal-title":"Nucleic Acids Res."},{"key":"2023012408323420800_b3","doi-asserted-by":"crossref","first-page":"262","DOI":"10.1073\/pnas.97.1.262","article-title":"Knowledge-based analysis of microarray gene expression data by using support vector machines","volume":"97","author":"Brown","year":"2000","journal-title":"Proc. Natl Acad. Sci. USA."},{"key":"2023012408323420800_b4","doi-asserted-by":"crossref","first-page":"121","DOI":"10.1023\/A:1009715923555","article-title":"A tutorial on support vector machines for pattern recognition","volume":"2","author":"Burges","year":"1998","journal-title":"Data Min. Knowl. Disc."},{"key":"2023012408323420800_b5","first-page":"67","article-title":"Support vector machine applications in bioinformatics","volume":"2","author":"Byvatov","year":"2003","journal-title":"Appl. Bioinformatics"},{"key":"2023012408323420800_b6","doi-asserted-by":"crossref","first-page":"1","DOI":"10.1016\/S0079-6107(01)00005-0","article-title":"The interrelationships of side-chain and main-chain conformations in proteins","volume":"76","author":"Chakrabarti","year":"2001","journal-title":"Prog. Biophys. Mol. 
Biol."},{"key":"2023012408323420800_b7","doi-asserted-by":"crossref","first-page":"9238","DOI":"10.1063\/1.466677","article-title":"Transition states and folding dynamics of proteins and heteropolymers","volume":"100","author":"Chan","year":"1994","journal-title":"J. Chem. Phys."},{"key":"2023012408323420800_b8","article-title":"LIBSVM: a library for support vector machines","author":"Chang","year":"2001"},{"key":"2023012408323420800_b9","doi-asserted-by":"crossref","first-page":"805","DOI":"10.1038\/nature01891","article-title":"Rationalization of the effects of mutations on peptide and protein aggregation rates","volume":"424","author":"Chiti","year":"2003","journal-title":"Nature"},{"key":"2023012408323420800_b10","doi-asserted-by":"crossref","first-page":"157","DOI":"10.1016\/S0958-1669(98)80109-2","article-title":"Refolding of recombinant proteins","volume":"9","author":"Clark","year":"1998","journal-title":"Curr. Opin. Biotechnol."},{"key":"2023012408323420800_b11","doi-asserted-by":"crossref","first-page":"537","DOI":"10.1016\/S0006-291X(02)00226-7","article-title":"Silent mutations affect in vivo protein folding in Escherichia coli","volume":"293","author":"Cortazzo","year":"2002","journal-title":"Biochem. Biophys. Res. Commun."},{"key":"2023012408323420800_b12","doi-asserted-by":"crossref","first-page":"153","DOI":"10.1006\/mben.1998.0112","article-title":"Classification and sensitivity analysis of a proposed primary metabolic reaction network for Streptomyces lividans","volume":"1","author":"Daae","year":"1999","journal-title":"Metab. 
Eng."},{"key":"2023012408323420800_b13","doi-asserted-by":"crossref","first-page":"933","DOI":"10.1093\/protein\/7.7.933","article-title":"Improving protein solubility through rationally designed amino acid replacements: solubilization of the trimethoprim-resistant type S1 dihydrofolate reductase","volume":"7","author":"Dale","year":"1994","journal-title":"Protein Eng."},{"key":"2023012408323420800_b14","doi-asserted-by":"crossref","first-page":"382","DOI":"10.1002\/(SICI)1097-0290(19991120)65:4<382::AID-BIT2>3.0.CO;2-I","article-title":"New fusion protein systems designed to give soluble expression in Escherichia coli","volume":"65","author":"Davis","year":"1999","journal-title":"Biotechnol. Bioeng."},{"key":"2023012408323420800_b15","doi-asserted-by":"crossref","first-page":"349","DOI":"10.1093\/bioinformatics\/17.4.349","article-title":"Multi-class protein fold recognition using support vector machines and neural networks","volume":"17","author":"Ding","year":"2001","journal-title":"Bioinformatics"},{"key":"2023012408323420800_b16","doi-asserted-by":"crossref","first-page":"687","DOI":"10.1023\/B:JCAM.0000017375.61558.ad","article-title":"Comparison of correlation vector methods for ligand-based similarity searching","volume":"17","author":"Fechner","year":"2003","journal-title":"J. Comput. Aided Mol. 
Des."},{"key":"2023012408323420800_b17","doi-asserted-by":"crossref","first-page":"R9","DOI":"10.1016\/S1359-0278(98)00002-9","article-title":"Protein aggregation: folding aggregates, inclusion bodies and amyloid","volume":"3","author":"Fink","year":"1998","journal-title":"Fold Des."},{"key":"2023012408323420800_b18","doi-asserted-by":"crossref","first-page":"575","DOI":"10.1021\/bi991518m","article-title":"Aggregation events occur prior to stable intermediate formation during refolding of interleukin 1beta","volume":"39","author":"Finke","year":"2000","journal-title":"Biochemistry"},{"key":"2023012408323420800_b19","doi-asserted-by":"crossref","first-page":"906","DOI":"10.1093\/bioinformatics\/16.10.906","article-title":"Support vector machine classification and validation of cancer tissue samples using microarray expression data","volume":"16","author":"Furey","year":"2000","journal-title":"Bioinformatics"},{"key":"2023012408323420800_b20","doi-asserted-by":"crossref","first-page":"48","DOI":"10.1016\/S0076-6879(99)09005-9","article-title":"Isolating inclusion bodies from bacteria","volume":"309","author":"Georgiou","year":"1999","journal-title":"Methods Enzymol."},{"key":"2023012408323420800_b21","doi-asserted-by":"crossref","first-page":"418","DOI":"10.2144\/04373ST07","article-title":"Method for enhancing solubility of the expressed recombinant proteins in Escherichia coli","volume":"37","author":"Ghosh","year":"2004","journal-title":"Biotechniques"},{"key":"2023012408323420800_b22","doi-asserted-by":"crossref","first-page":"115","DOI":"10.1016\/j.jmb.2003.11.053","article-title":"Mining the structural genomics pipeline: identification of protein properties that affect high-throughput experimental analysis","volume":"336","author":"Goh","year":"2004","journal-title":"J. Mol. 
Biol."},{"key":"2023012408323420800_b23","article-title":"Support vector machines for classification and regression","volume-title":"ISIS technical report","author":"Gunn","year":"1997"},{"key":"2023012408323420800_b24","doi-asserted-by":"crossref","first-page":"313","DOI":"10.1110\/ps.22102","article-title":"Rapid screening for improved solubility of small human proteins produced as fusion proteins in Escherichia coli","volume":"11","author":"Hammarstrom","year":"2002","journal-title":"Protein Sci."},{"key":"2023012408323420800_b25","doi-asserted-by":"crossref","first-page":"315","DOI":"10.1002\/1097-0290(20010205)72:3<315::AID-BIT8>3.0.CO;2-G","article-title":"Kinetic model of in vivo folding and inclusion body formation in recombinant Escherichia coli","volume":"72","author":"Hoffmann","year":"2001","journal-title":"Biotechnol Bioeng."},{"key":"2023012408323420800_b26","doi-asserted-by":"crossref","first-page":"582","DOI":"10.1110\/ps.041009005","article-title":"Understanding the relationship between the primary structure of proteins and its propensity to be soluble on overexpression in Escherichia coli","volume":"14","author":"Idicula-Thomas","year":"2005","journal-title":"Protein Sci."},{"key":"2023012408323420800_b27","doi-asserted-by":"crossref","first-page":"95","DOI":"10.1089\/10665270050081405","article-title":"A discriminative framework for detecting remote protein homologies","volume":"7","author":"Jaakkola","year":"2000","journal-title":"J Comput Biol."},{"key":"2023012408323420800_b28","doi-asserted-by":"crossref","first-page":"6057","DOI":"10.1073\/pnas.92.13.6057","article-title":"Catalytic domain of human immunodeficiency virus type 1 integrase: identification of a soluble mutant by systematic replacement of hydrophobic residues","volume":"92","author":"Jenkins","year":"1995","journal-title":"Proc. Natl Acad. Sci. 
USA."},{"key":"2023012408323420800_b29","doi-asserted-by":"crossref","first-page":"12945","DOI":"10.1074\/jbc.M010402200","article-title":"Prediction of amyloid fibril-forming proteins","volume":"276","author":"Kallberg","year":"2001","journal-title":"J. Biol. Chem."},{"key":"2023012408323420800_b30","doi-asserted-by":"crossref","first-page":"57","DOI":"10.1096\/fasebj.10.1.8566549","article-title":"Thermolabile folding intermediates: inclusion body precursors and chaperonin substrates","volume":"10","author":"King","year":"1996","journal-title":"FASEB J."},{"key":"2023012408323420800_b31","doi-asserted-by":"crossref","first-page":"79","DOI":"10.1006\/abio.2001.5331","article-title":"Screening for soluble expression of recombinant proteins in a 96-well format","volume":"297","author":"Knaust","year":"2001","journal-title":"Anal. Biochem."},{"key":"2023012408323420800_b32","doi-asserted-by":"crossref","first-page":"387","DOI":"10.1016\/S0014-5793(99)01566-5","article-title":"Synonymous codon substitutions affect ribosome traffic and protein folding during in vitro translation","volume":"462","author":"Komar","year":"1999","journal-title":"FEBS Lett."},{"key":"2023012408323420800_b33","doi-asserted-by":"crossref","first-page":"311","DOI":"10.1016\/S0098-1354(03)00188-1","article-title":"Support vector classification with parameter tuning assisted by agent-based technique","volume":"28","author":"Kulkarni","year":"2004","journal-title":"Comput. Chem. Eng."},{"key":"2023012408323420800_b34","doi-asserted-by":"crossref","first-page":"497","DOI":"10.1016\/S0958-1669(98)80035-9","article-title":"Advances in refolding of proteins produced in E. coli.","volume":"9","author":"Lilie","year":"1998","journal-title":"Curr. Opin. 
Biotechnol."},{"key":"2023012408323420800_b35","doi-asserted-by":"crossref","first-page":"191","DOI":"10.1023\/A:1012406528296","article-title":"Support vector machines for classification in nonstandard situations","volume":"46","author":"Lin","year":"2002","journal-title":"Machine Learning"},{"key":"2023012408323420800_b36","doi-asserted-by":"crossref","first-page":"2102","DOI":"10.1101\/gr.2520504","article-title":"High-throughput expression of C. elegans proteins","volume":"14","author":"Luan","year":"2004","journal-title":"Genome Res."},{"key":"2023012408323420800_b37","first-page":"41","article-title":"Overproduction of beta-glucosidase in active form by an Escherichia coli system coexpressing the chaperonin GroEL\/ES","volume":"159","author":"Machida","year":"1998","journal-title":"FEMS Microbiol Lett."},{"key":"2023012408323420800_b38","doi-asserted-by":"crossref","first-page":"512","DOI":"10.1128\/mr.60.3.512-538.1996","article-title":"Strategies for achieving high-level expression of genes in Escherichia coli","volume":"60","author":"Makrides","year":"1996","journal-title":"Microbiol. Rev."},{"key":"2023012408323420800_b39","doi-asserted-by":"crossref","first-page":"4352","DOI":"10.1046\/j.1432-1327.2001.02357.x","article-title":"Improving solubility of catalytic domain of human beta-1,4-galactosyltransferase 1 through rationally designed amino acid replacements","volume":"268","author":"Malissard","year":"2001","journal-title":"Eur. J. 
Biochem."},{"key":"2023012408323420800_b40","doi-asserted-by":"crossref","first-page":"253","DOI":"10.1016\/j.jmb.2003.10.082","article-title":"The regions of the sequence most exposed to the solvent within the amyloidogenic state of a protein initiate the aggregation process","volume":"336","author":"Monti","year":"2004","journal-title":"J Mol Biol."},{"key":"2023012408323420800_b41","doi-asserted-by":"crossref","first-page":"181","DOI":"10.1109\/72.914517","article-title":"An Introduction to Kernel-Based Learning Algorithms","volume":"2","author":"Muller","year":"2001","journal-title":"IEEE Trans Neural Netw."},{"key":"2023012408323420800_b42","doi-asserted-by":"crossref","first-page":"38","DOI":"10.1111\/j.1432-1033.1995.tb20531.x","article-title":"Hydrophobicity engineering to increase solubility and stability of a recombinant protein from respiratory syncytial virus","volume":"230","author":"Murby","year":"1995","journal-title":"Eur. J. Biochem."},{"key":"2023012408323420800_b43","doi-asserted-by":"crossref","first-page":"149","DOI":"10.1093\/protein\/13.3.149","article-title":"Simplified amino acid alphabets for protein fold recognition and implications for folding","volume":"13","author":"Murphy","year":"2000","journal-title":"Protein Eng."},{"key":"2023012408323420800_b44","doi-asserted-by":"crossref","first-page":"11","DOI":"10.1002\/prot.20092","article-title":"Prediction of transmembrane regions of beta-barrel proteins using ANN- and SVM-based methods","volume":"56","author":"Natt","year":"2004","journal-title":"Proteins"},{"key":"2023012408323420800_b45","doi-asserted-by":"crossref","first-page":"927","DOI":"10.1038\/nbt732","article-title":"Engineering soluble proteins for structural genomics","volume":"20","author":"Pedelacq","year":"2002","journal-title":"Nat. 
Biotechnol."},{"key":"2023012408323420800_b46","doi-asserted-by":"crossref","first-page":"131","DOI":"10.1093\/protein\/7.1.131","article-title":"Secondary structure characterization of beta-lactamase inclusion bodies","volume":"7","author":"Przybycien","year":"1994","journal-title":"Protein Eng."},{"key":"2023012408323420800_b47","doi-asserted-by":"crossref","first-page":"834","DOI":"10.1126\/science.4023714","article-title":"Hydrophobicity of amino acid residues in globular proteins","volume":"229","author":"Rose","year":"1985","journal-title":"Science"},{"key":"2023012408323420800_b48","first-page":"308","article-title":"Solubility as a function of protein structure and solvent components","volume":"8","author":"Schein","year":"1990","journal-title":"Biotechnology"},{"key":"2023012408323420800_b49","doi-asserted-by":"crossref","first-page":"1519","DOI":"10.1063\/1.467775","article-title":"Folding kinetics of protein-like heteropolymers","volume":"100","author":"Socci","year":"1994","journal-title":"J. Chem. Phys."},{"key":"2023012408323420800_b50","doi-asserted-by":"crossref","first-page":"R177","DOI":"10.1016\/S0969-2126(00)00193-3","article-title":"Design of high-throughput methods of protein production for structural biology","volume":"8","author":"Stevens","year":"2000","journal-title":"Structure"},{"key":"2023012408323420800_b51","doi-asserted-by":"crossref","first-page":"1767","DOI":"10.1046\/j.1432-1033.2003.03538.x","article-title":"Functional analysis of disease-causing mutations in human galactokinase","volume":"270","author":"Timson","year":"2003","journal-title":"Eur. J. Biochem."},{"key":"2023012408323420800_b52","doi-asserted-by":"crossref","first-page":"195","DOI":"10.1023\/B:JSFG.0000029017.46332.e3","article-title":"Refolding strategies from inclusion bodies in a structural genomics project","volume":"5","author":"Tresaugues","year":"2004","journal-title":"J. Struct. Funct. 
Genomics"},{"key":"2023012408323420800_b53","doi-asserted-by":"crossref","DOI":"10.1007\/978-1-4757-2440-0","volume-title":"The nature of statistical learning theory","author":"Vapnik","year":"1995","edition":"1st edn."},{"key":"2023012408323420800_b54","doi-asserted-by":"crossref","first-page":"33","DOI":"10.1016\/S1367-5931(02)00017-0","article-title":"Genetic screens and directed evolution for protein solubility","volume":"7","author":"Waldo","year":"2003","journal-title":"Curr. Opin. Chem. Biol."},{"key":"2023012408323420800_b55","doi-asserted-by":"crossref","first-page":"764","DOI":"10.1093\/bioinformatics\/btg054","article-title":"Feature selection and transduction for prediction of molecular bioactivity for drug design","volume":"19","author":"Weston","year":"2003","journal-title":"Bioinformatics"},{"key":"2023012408323420800_b56","first-page":"731","article-title":"Mutations in human interferon gamma affecting inclusion body formation identified by a general immunochemical screen","volume":"9","author":"Wetzel","year":"1991","journal-title":"Biotechnology"},{"key":"2023012408323420800_b57","first-page":"443","article-title":"Predicting the solubility of recombinant proteins in Escherichia coli","volume":"9","author":"Wilkinson","year":"1991","journal-title":"Biotechnology"},{"key":"2023012408323420800_b58","doi-asserted-by":"crossref","first-page":"175","DOI":"10.1016\/S0168-1656(00)00356-4","article-title":"Increased production of human proinsulin in the periplasmic space of Escherichia coli by fusion to DsbA","volume":"84","author":"Winter","year":"2001","journal-title":"J. Biotechnol."},{"key":"2023012408323420800_b59","doi-asserted-by":"crossref","first-page":"455","DOI":"10.1073\/pnas.0137017100","article-title":"Directed evolution approach to a structural genomics project: Rv2002 from Mycobacterium tuberculosis","volume":"100","author":"Yang","year":"2003","journal-title":"Proc. Natl Acad. Sci. 
USA."},{"key":"2023012408323420800_b60","doi-asserted-by":"crossref","first-page":"689","DOI":"10.1093\/bioinformatics\/18.5.689","article-title":"Support vector machines with selective kernel scaling for protein classification and identification of key amino acid positions","volume":"18","author":"Zavaljevski","year":"2002","journal-title":"Bioinformatics"},{"key":"2023012408323420800_b61","doi-asserted-by":"crossref","first-page":"159","DOI":"10.1006\/prep.1997.0834","article-title":"Expression of eukaryotic proteins in soluble form in Escherichia coli","volume":"12","author":"Zhang","year":"1998","journal-title":"Protein Expr. Purif."},{"key":"2023012408323420800_b62","doi-asserted-by":"crossref","first-page":"799","DOI":"10.1093\/bioinformatics\/16.9.799","article-title":"Engineering support vector machine kernels that recognize translation initiation sites","volume":"16","author":"Zien","year":"2000","journal-title":"Bioinformatics"}],"container-title":["Bioinformatics"],"original-title":[],"language":"en","link":[{"URL":"https:\/\/academic.oup.com\/bioinformatics\/article-pdf\/22\/3\/278\/48838012\/bioinformatics_22_3_278.pdf","content-type":"application\/pdf","content-version":"vor","intended-application":"syndication"},{"URL":"https:\/\/academic.oup.com\/bioinformatics\/article-pdf\/22\/3\/278\/48838012\/bioinformatics_22_3_278.pdf","content-type":"unspecified","content-version":"vor","intended-application":"similarity-checking"}],"deposited":{"date-parts":[[2023,1,24]],"date-time":"2023-01-24T08:49:51Z","timestamp":1674550191000},"score":1,"resource":{"primary":{"URL":"https:\/\/academic.oup.com\/bioinformatics\/article\/22\/3\/278\/220718"}},"subtitle":[],"short-title":[],"issued":{"date-parts":[[2005,12,6]]},"references-count":62,"journal-issue":{"issue":"3","published-print":{"date-parts":[[2006,2,1]]}},"URL":"https:\/\/doi.org\/10.1093\/bioinformatics\/bti810","relation":{},"ISSN":["1367-4811","1367-4803"],"issn-type":[{"value":"1367-4811","type":"elect
ronic"},{"value":"1367-4803","type":"print"}],"subject":[],"published-other":{"date-parts":[[2006,2,1]]},"published":{"date-parts":[[2005,12,6]]}}}