{"status":"ok","message-type":"work","message-version":"1.0.0","message":{"indexed":{"date-parts":[[2026,3,29]],"date-time":"2026-03-29T16:15:30Z","timestamp":1774800930409,"version":"3.50.1"},"reference-count":27,"publisher":"Springer Science and Business Media LLC","issue":"1","license":[{"start":{"date-parts":[[2016,9,9]],"date-time":"2016-09-09T00:00:00Z","timestamp":1473379200000},"content-version":"tdm","delay-in-days":0,"URL":"https:\/\/creativecommons.org\/licenses\/by\/4.0"},{"start":{"date-parts":[[2016,9,9]],"date-time":"2016-09-09T00:00:00Z","timestamp":1473379200000},"content-version":"vor","delay-in-days":0,"URL":"https:\/\/creativecommons.org\/licenses\/by\/4.0"}],"funder":[{"DOI":"10.13039\/100000185","name":"Defense Advanced Research Projects Agency","doi-asserted-by":"crossref","award":["N66001-11-1-4183"],"award-info":[{"award-number":["N66001-11-1-4183"]}],"id":[{"id":"10.13039\/100000185","id-type":"DOI","asserted-by":"crossref"}]},{"DOI":"10.13039\/100000185","name":"Defense Advanced Research Projects Agency","doi-asserted-by":"crossref","award":["W911NF-16-C-0050"],"award-info":[{"award-number":["W911NF-16-C-0050"]}],"id":[{"id":"10.13039\/100000185","id-type":"DOI","asserted-by":"crossref"}]}],"content-domain":{"domain":["link.springer.com"],"crossmark-restriction":false},"short-container-title":["BMC Bioinformatics"],"abstract":"<jats:title>Abstract<\/jats:title><jats:sec>\n                <jats:title>Background<\/jats:title>\n                <jats:p>Machine learning models have been adapted in biomedical research and practice for knowledge discovery and decision support. While mainstream biomedical informatics research focuses on developing more accurate models, the importance of data preprocessing draws less attention. We propose the Generalized Logistic (GL) algorithm that scales data uniformly to an appropriate interval by learning a generalized logistic function to fit the empirical cumulative distribution function of the data. The GL algorithm is simple yet effective; it is intrinsically robust to outliers, so it is particularly suitable for diagnostic\/classification models in clinical\/medical applications where the number of samples is usually small; it scales the data in a nonlinear fashion, which leads to potential improvement in accuracy.<\/jats:p>\n              <\/jats:sec><jats:sec>\n                <jats:title>Results<\/jats:title>\n                <jats:p>To evaluate the effectiveness of the proposed algorithm, we conducted experiments on 16 binary classification tasks with different variable types and cover a wide range of applications. The resultant performance in terms of area under the receiver operation characteristic curve (AUROC) and percentage of correct classification showed that models learned using data scaled by the GL algorithm outperform the ones using data scaled by the Min-max and the Z-score algorithm, which are the most commonly used data scaling algorithms.<\/jats:p>\n              <\/jats:sec><jats:sec>\n                <jats:title>Conclusion<\/jats:title>\n                <jats:p>The proposed GL algorithm is simple and effective. It is robust to outliers, so no additional denoising or outlier detection step is needed in data preprocessing. Empirical results also show models learned from data scaled by the GL algorithm have higher accuracy compared to the commonly used data scaling algorithms.<\/jats:p>\n              <\/jats:sec>","DOI":"10.1186\/s12859-016-1236-x","type":"journal-article","created":{"date-parts":[[2016,9,9]],"date-time":"2016-09-09T10:48:06Z","timestamp":1473418086000},"update-policy":"https:\/\/doi.org\/10.1007\/springer_crossmark_policy","source":"Crossref","is-referenced-by-count":178,"title":["A robust data scaling algorithm to improve classification accuracies in biomedical data"],"prefix":"10.1186","volume":"17","author":[{"given":"Xi Hang","family":"Cao","sequence":"first","affiliation":[]},{"given":"Ivan","family":"Stojkovic","sequence":"additional","affiliation":[]},{"given":"Zoran","family":"Obradovic","sequence":"additional","affiliation":[]}],"member":"297","published-online":{"date-parts":[[2016,9,9]]},"reference":[{"key":"1236_CR1","doi-asserted-by":"publisher","first-page":"8","DOI":"10.1016\/j.csbj.2014.11.005","volume":"13","author":"K Kourou","year":"2015","unstructured":"Kourou K, Exarchos TP, Exarchos KP, Karamouzis MV, Fotiadis DI. Machine learning applications in cancer prognosis and prediction. Comput Struct Biotechnol J. 2015; 13:8\u201317.","journal-title":"Comput Struct Biotechnol J"},{"issue":"12","key":"1236_CR2","doi-asserted-by":"publisher","first-page":"595","DOI":"10.1089\/omi.2013.0017","volume":"17","author":"AL Swan","year":"2013","unstructured":"Swan AL, Mobasheri A, Allaway D, Liddell S, Bacardit J. Application of machine learning to proteomics data: classification and biomarker identification in postgenomics biology. Omics: J Integr Biol. 2013; 17(12):595\u2013610.","journal-title":"Omics: J Integr Biol"},{"issue":"4\u20135","key":"1236_CR3","doi-asserted-by":"publisher","first-page":"353","DOI":"10.1002\/pmic.201300289","volume":"14","author":"P Kelchtermans","year":"2014","unstructured":"Kelchtermans P, Bittremieux W, Grave K, Degroeve S, Ramon J, Laukens K, Valkenborg D, Barsnes H, Martens L. Machine learning applications in proteomics research: How the past can boost the future. Proteomics. 2014; 14(4\u20135):353\u201366.","journal-title":"Proteomics"},{"issue":"1","key":"1236_CR4","doi-asserted-by":"publisher","first-page":"94","DOI":"10.1186\/1475-925X-13-94","volume":"13","author":"KR Foster","year":"2014","unstructured":"Foster KR, Koprowski R, Skufca JD. Machine learning, medical diagnosis, and biomedical engineering research-commentary. Biomed Eng Online. 2014; 13(1):94.","journal-title":"Biomed Eng Online"},{"issue":"25","key":"1236_CR5","doi-asserted-by":"publisher","first-page":"6240","DOI":"10.1200\/JCO.2005.06.866","volume":"23","author":"M Maltoni","year":"2005","unstructured":"Maltoni M, Caraceni A, Brunelli C, Broeckaert B, Christakis N, Eychmueller S, Glare P, Nabal M, Vigano A, Larkin P, et al. Prognostic factors in advanced cancer patients: evidence-based clinical recommendations\u2013a study by the steering committee of the european association for palliative care. Journal of Clinical Oncology. 2005; 23(25):6240\u2013248.","journal-title":"Journal of Clinical Oncology"},{"key":"1236_CR6","volume-title":"Data Mining: Concepts and Techniques: Concepts and Techniques","author":"J Han","year":"2011","unstructured":"Han J, Kamber M, Pei J. Data Mining: Concepts and Techniques: Concepts and Techniques. Massachusetts: Morgan Kaufmann Publishers; 2011."},{"key":"1236_CR7","volume-title":"Neural Networks and Learning Machines","author":"SS Haykin","year":"2009","unstructured":"Haykin SS. Neural Networks and Learning Machines. New Jersey: Pearson Education Upper Saddle River; 2009."},{"key":"1236_CR8","first-page":"111","volume":"12","author":"S Dudoit","year":"2002","unstructured":"Dudoit S, Yang YH, Callow MJ, Speed TP. Statistical methods for identifying differentially expressed genes in replicated cdna microarray experiments. Stat Sin. 2002; 12:111\u2013139.","journal-title":"Stat Sin."},{"key":"1236_CR9","volume-title":"Bioinformatics and Bioengineering (BIBE), 2015 IEEE 15th International Conference On","author":"XH Cao","year":"2015","unstructured":"Cao XH, Obradovic Z. A robust data scaling algorithm for gene expression classification. In: Bioinformatics and Bioengineering (BIBE), 2015 IEEE 15th International Conference On. Belgrade: IEEE: 2015. p. 1\u20134."},{"key":"1236_CR10","volume-title":"Digital image processing","author":"R Gonzalez","year":"2008","unstructured":"Gonzalez R, Woods R. Digital image processing. Upper Saddle River: Pearson Prentice Hall; 2008."},{"issue":"1","key":"1236_CR11","first-page":"114","volume":"2","author":"SR Bowling","year":"2009","unstructured":"Bowling SR, Khasawneh MT, Kaewkuekool S, Cho BR. A logistic approximation to the cumulative normal distribution. J Ind Eng Manag. 2009; 2(1):114\u201327.","journal-title":"J Ind Eng Manag"},{"key":"1236_CR12","unstructured":"Acuna E, Rodriguez C. A meta analysis study of outlier detection methods in classification. Technical paper, Department of Mathematics, University of Puerto Rico at Mayaguez. 2004."},{"issue":"7","key":"1236_CR13","doi-asserted-by":"publisher","first-page":"491","DOI":"10.1016\/j.ijmedinf.2005.05.002","volume":"74","author":"A Statnikov","year":"2005","unstructured":"Statnikov A, Tsamardinos I, Dosbayev Y, Aliferis CF. Gems: a system for automated cancer diagnosis and biomarker discovery from microarray gene expression data. Int J Med Inform. 2005; 74(7):491\u2013503.","journal-title":"Int J Med Inform"},{"issue":"1","key":"1236_CR14","doi-asserted-by":"publisher","first-page":"207","DOI":"10.1093\/nar\/30.1.207","volume":"30","author":"R Edgar","year":"2002","unstructured":"Edgar R, Domrachev M, Lash AE. Gene expression omnibus: Ncbi gene expression and hybridization array data repository. Nucleic Acids Res. 2002; 30(1):207\u201310.","journal-title":"Nucleic Acids Res"},{"issue":"11","key":"1236_CR15","doi-asserted-by":"publisher","first-page":"2130","DOI":"10.1101\/gr.138347.112","volume":"22","author":"R H\u00e4sler","year":"2012","unstructured":"H\u00e4sler R, Feng Z, B\u00e4ckdahl L, Spehlmann ME, Franke A, Teschendorff A, Rakyan VK, Down TA, Wilson GA, Feber A, et al. A functional methylome map of ulcerative colitis. Genome Res. 2012; 22(11):2130\u2013137.","journal-title":"Genome Res"},{"issue":"26","key":"1236_CR16","doi-asserted-by":"publisher","first-page":"15149","DOI":"10.1073\/pnas.211566398","volume":"98","author":"S Ramaswamy","year":"2001","unstructured":"Ramaswamy S, Tamayo P, Rifkin R, Mukherjee S, Yeang CH, Angelo M, Ladd C, Reich M, Latulippe E, Mesirov JP, et al. Multiclass cancer diagnosis using tumor gene expression signatures. Proc Natl Acad Sci. 2001; 98(26):15149\u201315154.","journal-title":"Proc Natl Acad Sci"},{"issue":"5439","key":"1236_CR17","doi-asserted-by":"publisher","first-page":"531","DOI":"10.1126\/science.286.5439.531","volume":"286","author":"TR Golub","year":"1999","unstructured":"Golub TR, Slonim DK, Tamayo P, Huard C, Gaasenbeek M, Mesirov JP, Coller H, Loh ML, Downing JR, Caligiuri MA, et al. Molecular classification of cancer: class discovery and class prediction by gene expression monitoring. Science. 1999; 286(5439):531\u20137.","journal-title":"Science"},{"issue":"1","key":"1236_CR18","first-page":"50","volume":"4","author":"MG Kibriya","year":"2011","unstructured":"Kibriya MG, Raza M, Jasmine F, Roy S, Paul-Brutus R, Rahaman R, Dodsworth C, Rakibuz-Zaman M, Kamal M, Ahsan H. A genome-wide dna methylation study in colorectal carcinoma. BMC Med Genet. 2011; 4(1):50.","journal-title":"BMC Med Genet"},{"issue":"4","key":"1236_CR19","doi-asserted-by":"publisher","first-page":"539","DOI":"10.1016\/j.bbrc.2011.02.082","volume":"406","author":"OH Kwon","year":"2011","unstructured":"Kwon OH, Park JL, Kim M, Kim JH, Lee HC, Kim HJ, Noh SM, Song KS, Yoo HS, Paik SG, et al. Aberrant up-regulation of lamb3 and lamc2 by promoter demethylation in gastric cancer. Biochem Biophys Res Commun. 2011; 406(4):539\u201345.","journal-title":"Biochem Biophys Res Commun"},{"issue":"5","key":"1236_CR20","doi-asserted-by":"publisher","first-page":"346","DOI":"10.1007\/BF02520002","volume":"34","author":"J Jossinet","year":"1996","unstructured":"Jossinet J. Variability of impedivity in normal and pathological breast tissue. Med Biol Eng Comput. 1996; 34(5):346\u201350.","journal-title":"Med Biol Eng Comput"},{"issue":"1","key":"1236_CR21","doi-asserted-by":"publisher","first-page":"181","DOI":"10.1109\/TNSRE.2013.2293575","volume":"22","author":"A Tsanas","year":"2014","unstructured":"Tsanas A, Little MA, Fox C, Ramig LO. Objective automatic assessment of rehabilitative speech treatment in parkinson\u2019s disease. IEEE Trans Neural Syst Rehabil Eng. 2014; 22(1):181\u201390.","journal-title":"IEEE Trans Neural Syst Rehabil Eng"},{"issue":"25","key":"1236_CR22","doi-asserted-by":"publisher","first-page":"1937","DOI":"10.1056\/NEJMoa012914","volume":"346","author":"A Rosenwald","year":"2002","unstructured":"Rosenwald A, Wright G, Chan WC, Connors JM, Campo E, Fisher RI, Gascoyne RD, Muller-Hermelink HK, Smeland EB, Giltnane JM, et al. The use of molecular profiling to predict survival after chemotherapy for diffuse large-b-cell lymphoma. N Engl J Med. 2002; 346(25):1937\u20131947.","journal-title":"N Engl J Med"},{"issue":"26","key":"1236_CR23","doi-asserted-by":"publisher","first-page":"2483","DOI":"10.1056\/NEJMoa030847","volume":"349","author":"E Tian","year":"2003","unstructured":"Tian E, Zhan F, Walker R, Rasmussen E, Ma Y, Barlogie B, Shaughnessy Jr JD. The role of the wnt-signaling antagonist dkk1 in the development of osteolytic lesions in multiple myeloma. N Engl J Med. 2003; 349(26):2483\u2013494.","journal-title":"N Engl J Med"},{"issue":"4","key":"1236_CR24","doi-asserted-by":"publisher","first-page":"1015","DOI":"10.1109\/TBME.2008.2005954","volume":"56","author":"MA Little","year":"2009","unstructured":"Little MA, McSharry PE, Hunter EJ, Spielman J, Ramig LO. Suitability of dysphonia measurements for telemonitoring of parkinson\u2019s disease. IEEE Trans Biomed Eng. 2009; 56(4):1015\u20131022.","journal-title":"IEEE Trans Biomed Eng"},{"key":"1236_CR25","volume-title":"IS&T\/SPIE\u2019s Symposium on Electronic Imaging: Science and Technology","author":"WN Street","year":"1993","unstructured":"Street WN, Wolberg WH, Mangasarian OL. Nuclear feature extraction for breast tumor diagnosis. In: IS&T\/SPIE\u2019s Symposium on Electronic Imaging: Science and Technology. San Jose: International Society for Optics and Photonics: 1993. p. 861\u201370."},{"issue":"2","key":"1236_CR26","doi-asserted-by":"publisher","first-page":"101","DOI":"10.5121\/ijdms.2011.3207","volume":"3","author":"BV Ramana","year":"2011","unstructured":"Ramana BV, Babu MSP, Venkateswarlu N. A critical study of selected classification algorithms for liver disease diagnosis. Int J Database Manag Syst. 2011; 3(2):101\u201314.","journal-title":"Int J Database Manag Syst"},{"key":"1236_CR27","unstructured":"Smith JW, Everhart J, Dickson W, Knowler W, Johannes R. Using the adap learning algorithm to forecast the onset of diabetes mellitus. In: Proceedings of the Annual Symposium on Computer Application in Medical Care. American Medical Informatics Association: 1988. p. 261."}],"container-title":["BMC Bioinformatics"],"original-title":[],"language":"en","link":[{"URL":"https:\/\/link.springer.com\/content\/pdf\/10.1186\/s12859-016-1236-x.pdf","content-type":"application\/pdf","content-version":"vor","intended-application":"text-mining"},{"URL":"https:\/\/link.springer.com\/article\/10.1186\/s12859-016-1236-x\/fulltext.html","content-type":"text\/html","content-version":"vor","intended-application":"text-mining"},{"URL":"https:\/\/link.springer.com\/content\/pdf\/10.1186\/s12859-016-1236-x.pdf","content-type":"application\/pdf","content-version":"vor","intended-application":"similarity-checking"}],"deposited":{"date-parts":[[2024,2,1]],"date-time":"2024-02-01T18:10:04Z","timestamp":1706811004000},"score":1,"resource":{"primary":{"URL":"https:\/\/bmcbioinformatics.biomedcentral.com\/articles\/10.1186\/s12859-016-1236-x"}},"subtitle":[],"short-title":[],"issued":{"date-parts":[[2016,9,9]]},"references-count":27,"journal-issue":{"issue":"1","published-online":{"date-parts":[[2016,12]]}},"alternative-id":["1236"],"URL":"https:\/\/doi.org\/10.1186\/s12859-016-1236-x","relation":{},"ISSN":["1471-2105"],"issn-type":[{"value":"1471-2105","type":"electronic"}],"subject":[],"published":{"date-parts":[[2016,9,9]]},"assertion":[{"value":"1 April 2016","order":1,"name":"received","label":"Received","group":{"name":"ArticleHistory","label":"Article History"}},{"value":"1 September 2016","order":2,"name":"accepted","label":"Accepted","group":{"name":"ArticleHistory","label":"Article History"}},{"value":"9 September 2016","order":3,"name":"first_online","label":"First Online","group":{"name":"ArticleHistory","label":"Article History"}}],"article-number":"359"}}