{"status":"ok","message-type":"work","message-version":"1.0.0","message":{"indexed":{"date-parts":[[2026,4,17]],"date-time":"2026-04-17T23:47:12Z","timestamp":1776469632340,"version":"3.51.2"},"reference-count":53,"publisher":"Oxford University Press (OUP)","issue":"20","content-domain":{"domain":[],"crossmark-restriction":false},"short-container-title":[],"published-print":{"date-parts":[[2008,10,15]]},"abstract":"<jats:title>Abstract<\/jats:title>\n               <jats:p>We present a novel computational method for predicting which proteins from highly and abnormally expressed genes in diseased human tissues, such as cancers, can be secreted into the bloodstream, suggesting possible marker proteins for follow-up serum proteomic studies. A main challenging issue in tackling this problem is that our understanding about the downstream localization after proteins are secreted outside the cells is very limited and not sufficient to provide useful hints about secretion to the bloodstream. To bypass this difficulty, we have taken a data mining approach by first collecting, through extensive literature searches, human proteins that are known to be secreted into the bloodstream due to various pathological conditions as detected by previous proteomic studies, and then asking the question: \u2018what do these secreted proteins have in common in terms of their physical and chemical properties, amino acid sequence and structural features that can be used to predict them?\u2019 We have identified a list of features, such as signal peptides, transmembrane domains, glycosylation sites, disordered regions, secondary structural content, hydrophobicity and polarity measures that show relevance to protein secretion. Using these features, we have trained a support vector machine-based classifier to predict protein secretion to the bloodstream. 
On a large test set containing 98 secretory proteins and 6601 non-secretory proteins of human, our classifier achieved \u223c90% prediction sensitivity and \u223c98% prediction specificity. Several additional datasets are used to further assess the performance of our classifier. On a set of 122 proteins that were found to be of abnormally high abundance in human blood due to various cancers, our program predicted 62 as blood-secreted proteins. By applying our program to abnormally highly expressed genes in gastric cancer and lung cancer tissues detected through microarray gene expression studies, we predicted 13 and 31 as blood secreted, respectively, suggesting that they could serve as potential biomarkers for these two cancers, respectively. Our study demonstrated that our method can provide highly useful information to link genomic and proteomic studies for disease biomarker discovery. Our software can be accessed at http:\/\/csbl1.bmb.uga.edu\/cgi-bin\/Secretion\/secretion.cgi.<\/jats:p>\n               <jats:p>Contact: \u00a0xyn@bmb.uga.edu<\/jats:p>\n               <jats:p>Supplementary information: \u00a0Supplementary data are available at Bioinformatics online.<\/jats:p>","DOI":"10.1093\/bioinformatics\/btn418","type":"journal-article","created":{"date-parts":[[2008,8,13]],"date-time":"2008-08-13T02:35:24Z","timestamp":1218594924000},"page":"2370-2375","source":"Crossref","is-referenced-by-count":59,"title":["Computational prediction of human proteins that can be secreted into the bloodstream"],"prefix":"10.1093","volume":"24","author":[{"given":"Juan","family":"Cui","sequence":"first","affiliation":[{"name":"1 Department of Biochemistry and Molecular Biology, University of Georgia, Athens, GA 30602, USA, 2Zhejiang-California International Nanosystems Institute, Zhejiang University, Hangzhou 310029, China and 3Institute of Bioinformatics, University of Georgia, Athens, GA 30602, 
USA"}]},{"given":"Qi","family":"Liu","sequence":"additional","affiliation":[{"name":"1 Department of Biochemistry and Molecular Biology, University of Georgia, Athens, GA 30602, USA, 2Zhejiang-California International Nanosystems Institute, Zhejiang University, Hangzhou 310029, China and 3Institute of Bioinformatics, University of Georgia, Athens, GA 30602, USA"}]},{"given":"David","family":"Puett","sequence":"additional","affiliation":[{"name":"1 Department of Biochemistry and Molecular Biology, University of Georgia, Athens, GA 30602, USA, 2Zhejiang-California International Nanosystems Institute, Zhejiang University, Hangzhou 310029, China and 3Institute of Bioinformatics, University of Georgia, Athens, GA 30602, USA"}]},{"given":"Ying","family":"Xu","sequence":"additional","affiliation":[{"name":"1 Department of Biochemistry and Molecular Biology, University of Georgia, Athens, GA 30602, USA, 2Zhejiang-California International Nanosystems Institute, Zhejiang University, Hangzhou 310029, China and 3Institute of Bioinformatics, University of Georgia, Athens, GA 30602, USA"}]}],"member":"286","published-online":{"date-parts":[[2008,8,12]]},"reference":[{"key":"2023020211253227400_B1","doi-asserted-by":"crossref","first-page":"947","DOI":"10.1074\/mcp.M200066-MCP200","article-title":"Toward a human blood serum proteome: analysis by multidimensional separation coupled with mass 
spectrometry","volume":"1","author":"Adkins","year":"2002","journal-title":"Mol. Cell Proteomics"},{"key":"2023020211253227400_B2","doi-asserted-by":"crossref","first-page":"3389","DOI":"10.1093\/nar\/25.17.3389","article-title":"Gapped BLAST and PSI-BLAST: a new generation of protein database search programs","volume":"25","author":"Altschul","year":"1997","journal-title":"Nucleic Acids Res."},{"key":"2023020211253227400_B3","doi-asserted-by":"crossref","first-page":"845","DOI":"10.1074\/mcp.R200007-MCP200","article-title":"The human plasma proteome: history, character, and diagnostic prospects","volume":"1","author":"Anderson","year":"2002","journal-title":"Mol. Cell Proteomics"},{"key":"2023020211253227400_B4","doi-asserted-by":"crossref","first-page":"276","DOI":"10.1093\/nar\/30.1.276","article-title":"The Pfam protein families database","volume":"30","author":"Bateman","year":"2002","journal-title":"Nucleic Acids Res."},{"key":"2023020211253227400_B5","doi-asserted-by":"crossref","first-page":"167","DOI":"10.1186\/1471-2105-6-167","article-title":"Prediction of twin-arginine signal peptides","volume":"6","author":"Bendtsen","year":"2005","journal-title":"BMC Bioinformatics"},{"issue":"Suppl. 1","key":"2023020211253227400_B6","doi-asserted-by":"crossref","first-page":"i38","DOI":"10.1093\/bioinformatics\/bti1016","article-title":"Kernel methods for predicting protein-protein interactions","volume":"21","author":"Ben-Hur","year":"2005","journal-title":"Bioinformatics"},{"key":"2023020211253227400_B7","doi-asserted-by":"crossref","first-page":"23262","DOI":"10.1074\/jbc.M401932200","article-title":"Classification of nuclear receptors based on amino acid composition and dipeptide composition","volume":"279","author":"Bhasin","year":"2004","journal-title":"J. Biol. 
Chem."},{"key":"2023020211253227400_B8","doi-asserted-by":"crossref","first-page":"1100","DOI":"10.1038\/nbt0906-1100","article-title":"The sweet side of biomarker discovery","volume":"24","author":"Bosques","year":"2006","journal-title":"Nat. Biotechnol."},{"key":"2023020211253227400_B9","doi-asserted-by":"crossref","first-page":"538","DOI":"10.1016\/j.urolonc.2006.07.004","article-title":"Molecular markers of prostate cancer","volume":"24","author":"Bradford","year":"2006","journal-title":"Urol. Oncol."},{"key":"2023020211253227400_B10","first-page":"1408","article-title":"The unique physiology of solid tumors: opportunities (and problems) for cancer therapy","volume":"58","author":"Brown","year":"1998","journal-title":"Cancer Res."},{"key":"2023020211253227400_B11","first-page":"6996","article-title":"Secreted and cell surface genes expressed in benign and malignant colorectal tumors","volume":"61","author":"Buckhaults","year":"2001","journal-title":"Cancer Res."},{"key":"2023020211253227400_B12","doi-asserted-by":"crossref","first-page":"5","DOI":"10.1016\/S0097-8485(01)00094-8","article-title":"Drug design by machine learning: support vector machines for pharmaceutical data analysis","volume":"26","author":"Burbidge","year":"2001","journal-title":"Comput. 
Chem."},{"key":"2023020211253227400_B13","doi-asserted-by":"crossref","first-page":"3692","DOI":"10.1093\/nar\/gkg600","article-title":"SVM-Prot: web-based support vector machine software for functional classification of a protein from its primary sequence","volume":"31","author":"Cai","year":"2003","journal-title":"Nucleic Acids Res."},{"key":"2023020211253227400_B14","doi-asserted-by":"crossref","first-page":"D169","DOI":"10.1093\/nar\/gki093","article-title":"SPD \u2013 a web-based secreted protein database","volume":"33","author":"Chen","year":"2005","journal-title":"Nucleic Acids Res"},{"key":"2023020211253227400_B15","doi-asserted-by":"crossref","first-page":"514","DOI":"10.1016\/j.molimm.2006.02.010","article-title":"Computer prediction of allergen proteins from sequence-derived protein structural and physicochemical properties","volume":"44","author":"Cui","year":"2007","journal-title":"Mol. Immunol."},{"key":"2023020211253227400_B16","doi-asserted-by":"crossref","first-page":"95","DOI":"10.2174\/157489307780618222","article-title":"Advances in exploration of machine learning methods for predicting functional class and interaction profiles of proteins and peptides irrespective of sequence homology","volume":"2","author":"Cui","year":"2007","journal-title":"Curr. Bioinformatics"},{"key":"2023020211253227400_B17","doi-asserted-by":"crossref","first-page":"539","DOI":"10.1146\/annurev.biochem.73.011303.074048","article-title":"Structural insights into the signal recognition particle","volume":"73","author":"Doudna","year":"2004","journal-title":"Annu. Rev. Biochem."},{"key":"2023020211253227400_B18","doi-asserted-by":"crossref","first-page":"8700","DOI":"10.1073\/pnas.92.19.8700","article-title":"Prediction of protein folding class using global description of amino acid sequence","volume":"92","author":"Dubchak","year":"1995","journal-title":"Proc. Natl Acad. Sci. 
USA"},{"key":"2023020211253227400_B19","doi-asserted-by":"crossref","first-page":"157","DOI":"10.1002\/(SICI)1097-0134(199606)25:2<157::AID-PROT2>3.0.CO;2-F","article-title":"Prediction of secondary structural content of proteins from their amino acid composition alone. I. New analytic vector decomposition methods","volume":"25","author":"Eisenhaber","year":"1996","journal-title":"Proteins"},{"key":"2023020211253227400_B20","doi-asserted-by":"crossref","first-page":"269","DOI":"10.1023\/A:1007091128394","article-title":"Prediction of membrane protein types based on the hydrophobic index of amino acids","volume":"19","author":"Feng","year":"2000","journal-title":"J. Protein Chem."},{"key":"2023020211253227400_B21","doi-asserted-by":"crossref","first-page":"W210","DOI":"10.1093\/nar\/gkl093","article-title":"pTARGET: a web server for predicting protein subcellular localization","volume":"34","author":"Guda","year":"2006","journal-title":"Nucleic Acids Res"},{"key":"2023020211253227400_B22","doi-asserted-by":"crossref","first-page":"57","DOI":"10.1016\/S0092-8674(00)81683-9","article-title":"The hallmarks of cancer","volume":"100","author":"Hanahan","year":"2000","journal-title":"Cell"},{"key":"2023020211253227400_B23","doi-asserted-by":"crossref","first-page":"W585","DOI":"10.1093\/nar\/gkm259","article-title":"WoLF PSORT: protein localization predictor","volume":"35","author":"Horton","year":"2007","journal-title":"Nucleic Acids Res"},{"key":"2023020211253227400_B24","doi-asserted-by":"crossref","first-page":"397","DOI":"10.1006\/jmbi.2001.4580","article-title":"A novel method of protein secondary structure prediction with high segment overlap measure: support vector machine approach","volume":"308","author":"Hua","year":"2001","journal-title":"J. Mol. 
Biol."},{"key":"2023020211253227400_B25","doi-asserted-by":"crossref","first-page":"87","DOI":"10.1016\/j.lungcan.2006.06.011","article-title":"Proteomics-based identification of secreted protein dihydrodiol dehydrogenase as a novel serum markers of non-small cell lung cancer","volume":"54","author":"Huang","year":"2006","journal-title":"Lung Cancer"},{"key":"2023020211253227400_B26","doi-asserted-by":"crossref","first-page":"637","DOI":"10.1162\/089976601300014493","article-title":"Improvements to Platt's SMO algorithm for SVM classifier design","volume":"13","author":"Keerthi","year":"2001","journal-title":"Neural Comput."},{"key":"2023020211253227400_B27","doi-asserted-by":"crossref","first-page":"1671","DOI":"10.1001\/jama.287.13.1671","article-title":"Osteopontin as a potential diagnostic biomarker for ovarian cancer","volume":"287","author":"Kim","year":"2002","journal-title":"J. Am. Med. Assoc."},{"key":"2023020211253227400_B28","doi-asserted-by":"crossref","first-page":"473","DOI":"10.1158\/1078-0432.473.11.2","article-title":"Identification of gastric cancer-related genes using a cDNA microarray containing novel expressed sequence tags expressed in gastric cancer cells","volume":"11","author":"Kim","year":"2005","journal-title":"Clin. Cancer Res."},{"key":"2023020211253227400_B29","doi-asserted-by":"crossref","first-page":"843","DOI":"10.1016\/S1567-5769(02)00028-0","article-title":"Synthesis of factor D by gastric cancer-derived cell lines","volume":"2","author":"Kitano","year":"2002","journal-title":"Int. Immunopharmacol."},{"key":"2023020211253227400_B30","doi-asserted-by":"crossref","first-page":"234","DOI":"10.1016\/j.drudis.2007.01.008","article-title":"Computational classification of classically secreted proteins","volume":"12","author":"Klee","year":"2007","journal-title":"Drug Discov. 
Today"},{"key":"2023020211253227400_B31","doi-asserted-by":"crossref","first-page":"315","DOI":"10.1016\/j.lungcan.2007.08.037","article-title":"Identification of genes involved in squamous cell carcinoma of the lung using synchronized data from DNA copy number and transcript expression profiling analysis","volume":"59","author":"Lo","year":"2008","journal-title":"Lung Cancer"},{"key":"2023020211253227400_B32","doi-asserted-by":"crossref","first-page":"2145","DOI":"10.1256\/003590002320603584","article-title":"Areas beneath the relative operating characteristics (ROC) and levels (ROL) curves: statistical significance and interpretation","volume":"128","author":"Mason","year":"2002","journal-title":"Q. J. Roy. Meteorol. Soc"},{"key":"2023020211253227400_B33","doi-asserted-by":"crossref","first-page":"741","DOI":"10.1093\/bioinformatics\/16.8.741","article-title":"A comparison of signal sequence prediction methods using a test set of signal peptides","volume":"16","author":"Menne","year":"2000","journal-title":"Bioinformatics"},{"key":"2023020211253227400_B34","doi-asserted-by":"crossref","first-page":"1458","DOI":"10.1093\/jnci\/93.19.1458","article-title":"Prostasin, a potential serum marker for ovarian cancer: identification through microarray technology","volume":"93","author":"Mok","year":"2001","journal-title":"J. Natl Cancer Inst."},{"key":"2023020211253227400_B35","doi-asserted-by":"crossref","first-page":"1168","DOI":"10.1101\/gr.96802","article-title":"Predicting protein cellular localization using a domain projection method","volume":"12","author":"Mott","year":"2002","journal-title":"Genome Res."},{"key":"2023020211253227400_B36","doi-asserted-by":"crossref","first-page":"85","DOI":"10.1016\/j.jmb.2005.02.025","article-title":"Mimicking cellular sorting improves prediction of subcellular localization","volume":"348","author":"Nair","year":"2005","journal-title":"J. Mol. 
Biol."},{"key":"2023020211253227400_B37","doi-asserted-by":"crossref","first-page":"3226","DOI":"10.1002\/pmic.200500358","article-title":"Overview of the HUPO plasma proteome project: results from the pilot phase with 35 collaborating laboratories and multiple analytical groups, generating a core dataset of 3020 proteins and a publicly-available database","volume":"5","author":"Omenn","year":"2005","journal-title":"Proteomics"},{"key":"2023020211253227400_B38","doi-asserted-by":"crossref","first-page":"17923","DOI":"10.1073\/pnas.0506483102","article-title":"A human transporter protein that mediates the final excretion step for toxic organic cations","volume":"102","author":"Otsuka","year":"2005","journal-title":"Proc. Natl Acad. Sci. USA"},{"key":"2023020211253227400_B39","doi-asserted-by":"crossref","first-page":"2802","DOI":"10.1021\/pr070021t","article-title":"Biomarker discovery from uveal melanoma secretomes: identification of gp100 and cathepsin D in patient serum","volume":"6","author":"Pardo","year":"2007","journal-title":"J. 
Proteome Res."},{"key":"2023020211253227400_B40","doi-asserted-by":"crossref","first-page":"1345","DOI":"10.1002\/pmic.200300449","article-title":"The human serum proteome: display of nearly 3700 chromatographically separated protein spots on two-dimensional electrophoresis gels and identification of 325 distinct proteins","volume":"3","author":"Pieper","year":"2003","journal-title":"Proteomics"},{"key":"2023020211253227400_B41","first-page":"185","article-title":"Fast training of support vector machines using sequential minimal optimization","volume-title":"Advances in Kernel Methods: Support Vector Learning","author":"Platt","year":"1999"},{"key":"2023020211253227400_B42","first-page":"3616","article-title":"The DEF data base of sequence based protein fold class predictions","volume":"22","author":"Reczko","year":"1994","journal-title":"Nucleic Acids Res."},{"key":"2023020211253227400_B43","doi-asserted-by":"crossref","first-page":"433","DOI":"10.1002\/pmic.200390058","article-title":"Use of serological proteomic methods to find biomarkers associated with breast cancer","volume":"3","author":"Rui","year":"2003","journal-title":"Proteomics"},{"key":"2023020211253227400_B44","doi-asserted-by":"crossref","first-page":"S55","DOI":"10.1016\/S0167-7799(01)01800-5","article-title":"Peptidomics technologies for human body fluids","volume":"19","author":"Schrader","year":"2001","journal-title":"Trends Biotechnol"},{"key":"2023020211253227400_B45","doi-asserted-by":"crossref","first-page":"2536","DOI":"10.1093\/bioinformatics\/btl623","article-title":"Protein solubility: sequence based prediction and experimental verification","volume":"23","author":"Smialowski","year":"2007","journal-title":"Bioinformatics"},{"key":"2023020211253227400_B46","doi-asserted-by":"crossref","first-page":"745","DOI":"10.1038\/313745a0","article-title":"Autocrine growth factors and 
cancer","volume":"313","author":"Sporn","year":"1985","journal-title":"Nature"},{"key":"2023020211253227400_B47","doi-asserted-by":"crossref","first-page":"330","DOI":"10.1186\/1471-2105-8-330","article-title":"Protein subcellular localization prediction based on compartment-specific features and structure conservation","volume":"8","author":"Su","year":"2007","journal-title":"BMC Bioinformatics"},{"key":"2023020211253227400_B48","doi-asserted-by":"crossref","first-page":"9996","DOI":"10.1158\/0008-5472.CAN-07-1601","article-title":"Derivation of stable microarray cancer-differentiating signatures using consensus scoring of multiple random sampling and gene-ranking consistency evaluation","volume":"67","author":"Tang","year":"2007","journal-title":"Cancer Res."},{"key":"2023020211253227400_B49","doi-asserted-by":"crossref","first-page":"184","DOI":"10.6026\/97320630001184","article-title":"TATPred: a Bayesian method for the identification of twin arginine translocation pathway signal sequences","volume":"1","author":"Taylor","year":"2006","journal-title":"Bioinformation"},{"key":"2023020211253227400_B50","doi-asserted-by":"crossref","first-page":"515","DOI":"10.1128\/MMBR.64.3.515-547.2000","article-title":"Signal peptide-dependent protein transport in Bacillus subtilis: a genome-based survey of the secretome","volume":"64","author":"Tjalsma","year":"2000","journal-title":"Microbiol. Mol. Biol. 
Rev."},{"key":"2023020211253227400_B51","doi-asserted-by":"crossref","first-page":"45","DOI":"10.1002\/pmic.200390008","article-title":"Serological and proteomic evaluation of antibody responses in the identification of tumor antigens in renal cell carcinoma","volume":"3","author":"Unwin","year":"2003","journal-title":"Proteomics"},{"key":"2023020211253227400_B52","doi-asserted-by":"crossref","first-page":"1176","DOI":"10.1073\/pnas.98.3.1176","article-title":"Analysis of gene expression profiles in normal and neoplastic ovarian tissue samples identifies candidate molecular markers of epithelial ovarian cancer","volume":"98","author":"Welsh","year":"2001","journal-title":"Proc. Natl Acad. Sci. USA"},{"key":"2023020211253227400_B53","doi-asserted-by":"crossref","first-page":"3410","DOI":"10.1073\/pnas.0530278100","article-title":"Large-scale delineation of secreted protein biomarkers overexpressed in cancer tissue and serum","volume":"100","author":"Welsh","year":"2003","journal-title":"Proc. Natl Acad. Sci. 
USA"}],"container-title":["Bioinformatics"],"original-title":[],"language":"en","link":[{"URL":"https:\/\/academic.oup.com\/bioinformatics\/article-pdf\/24\/20\/2370\/49051208\/bioinformatics_24_20_2370.pdf","content-type":"application\/pdf","content-version":"vor","intended-application":"syndication"},{"URL":"https:\/\/academic.oup.com\/bioinformatics\/article-pdf\/24\/20\/2370\/49051208\/bioinformatics_24_20_2370.pdf","content-type":"unspecified","content-version":"vor","intended-application":"similarity-checking"}],"deposited":{"date-parts":[[2023,2,2]],"date-time":"2023-02-02T14:08:46Z","timestamp":1675346926000},"score":1,"resource":{"primary":{"URL":"https:\/\/academic.oup.com\/bioinformatics\/article\/24\/20\/2370\/258417"}},"subtitle":[],"short-title":[],"issued":{"date-parts":[[2008,8,12]]},"references-count":53,"journal-issue":{"issue":"20","published-print":{"date-parts":[[2008,10,15]]}},"URL":"https:\/\/doi.org\/10.1093\/bioinformatics\/btn418","relation":{},"ISSN":["1367-4811","1367-4803"],"issn-type":[{"value":"1367-4811","type":"electronic"},{"value":"1367-4803","type":"print"}],"subject":[],"published-other":{"date-parts":[[2008,10,15]]},"published":{"date-parts":[[2008,8,12]]}}}