{"status":"ok","message-type":"work","message-version":"1.0.0","message":{"indexed":{"date-parts":[[2024,5,17]],"date-time":"2024-05-17T21:45:07Z","timestamp":1715982307307},"reference-count":32,"publisher":"Springer Science and Business Media LLC","issue":"1","content-domain":{"domain":["link.springer.com"],"crossmark-restriction":false},"short-container-title":["BMC Bioinformatics"],"published-print":{"date-parts":[[2011,12]]},"abstract":"<jats:title>Abstract<\/jats:title>\n          <jats:sec>\n            <jats:title>Background<\/jats:title>\n            <jats:p>High-throughput functional genomics technologies generate large amount of data with hundreds or thousands of measurements per sample. The number of sample is usually much smaller in the order of ten or hundred. This poses statistical challenges and calls for appropriate solutions for the analysis of this kind of data.<\/jats:p>\n          <\/jats:sec>\n          <jats:sec>\n            <jats:title>Results<\/jats:title>\n            <jats:p>Principal component discriminant analysis (PCDA), an adaptation of classical linear discriminant analysis (LDA) for high-dimensional data, has been selected as an example of a base learner. The multiple versions of PCDA models from repeated double cross-validation were aggregated, and the final classification was performed by majority voting. The performance of this approach was evaluated by simulation, genomics, proteomics and metabolomics data sets.<\/jats:p>\n          <\/jats:sec>\n          <jats:sec>\n            <jats:title>Conclusions<\/jats:title>\n            <jats:p>The aggregating PCDA learner can improve the prediction performance, provide more stable result, and help to know the variability of the models. 
The disadvantage and limitations of aggregating were also discussed.<\/jats:p>\n          <\/jats:sec>","DOI":"10.1186\/1471-2105-12-153","type":"journal-article","created":{"date-parts":[[2011,5,13]],"date-time":"2011-05-13T18:28:50Z","timestamp":1305311330000},"update-policy":"http:\/\/dx.doi.org\/10.1007\/springer_crossmark_policy","source":"Crossref","is-referenced-by-count":8,"title":["To aggregate or not to aggregate high-dimensional classifiers"],"prefix":"10.1186","volume":"12","author":[{"given":"Cheng-Jian","family":"Xu","sequence":"first","affiliation":[],"role":[{"role":"author","vocabulary":"crossref"}]},{"given":"Huub CJ","family":"Hoefsloot","sequence":"additional","affiliation":[],"role":[{"role":"author","vocabulary":"crossref"}]},{"given":"Age K","family":"Smilde","sequence":"additional","affiliation":[],"role":[{"role":"author","vocabulary":"crossref"}]}],"member":"297","published-online":{"date-parts":[[2011,5,13]]},"reference":[{"key":"4530_CR1","doi-asserted-by":"publisher","DOI":"10.1007\/978-0-387-84858-7","volume-title":"The Elements of Statistical Learning: Data Mining, Inference and Prediction","author":"T Hastie","year":"2009","unstructured":"Hastie T, Tibshirani R, Friedman JH: The Elements of Statistical Learning: Data Mining, Inference and Prediction. 2nd edition. New York: Springer; 2009.","edition":"2"},{"key":"4530_CR2","volume-title":"Proceedings of the international Congress of Mathematicians","author":"JQ Fan","year":"2006","unstructured":"Fan JQ, Li RZ, Statistical challenges with high dimensionality: feature selection in knowledge discovery. In Proceedings of the international Congress of Mathematicians. Madrid, Spain: 2006 European Mathematical Society; 2006."},{"key":"4530_CR3","volume-title":"Introduction to Statistical Pattern Recognition","author":"K Fukunaga","year":"1990","unstructured":"Fukunaga K: Introduction to Statistical Pattern Recognition. 
New York: Academic Press; 1990."},{"issue":"10","key":"4530_CR4","doi-asserted-by":"publisher","first-page":"1713","DOI":"10.1016\/S0031-3203(99)00139-9","volume":"33","author":"LF Chen","year":"2000","unstructured":"Chen LF, Liao HYM, Ko MT, Lin JC, Yu GJ: A new LDA-based face recognition system which can solve the small sample size problem. Pattern Recognit 2000, 33(10):1713\u20131726.","journal-title":"Pattern Recognit"},{"issue":"7","key":"4530_CR5","doi-asserted-by":"publisher","first-page":"909","DOI":"10.1016\/S0031-3203(97)00110-6","volume":"31","author":"M Skurichina","year":"1998","unstructured":"Skurichina M, Duin RPW: Bagging for linear classifiers. Pattern Recognit 1998, 31(7):909\u2013930.","journal-title":"Pattern Recognit"},{"issue":"2","key":"4530_CR6","first-page":"123","volume":"24","author":"L Breiman","year":"1996","unstructured":"Breiman L: Bagging predictors. Mach Learn 1996, 24(2):123\u2013140.","journal-title":"Mach Learn"},{"issue":"14","key":"4530_CR7","doi-asserted-by":"publisher","first-page":"3138","DOI":"10.1093\/bioinformatics\/bti494","volume":"21","author":"P Geurts","year":"2005","unstructured":"Geurts P, Fillet M, de Seny D, Meuwis MA, Malaise M, Merville MP, Wehenkel L: Proteomic mass spectra classification using decision tree based ensemble methods. Bioinformatics 2005, 21(14):3138\u20133145.","journal-title":"Bioinformatics"},{"key":"4530_CR8","doi-asserted-by":"publisher","first-page":"319","DOI":"10.1186\/1471-2105-9-319","volume":"9","author":"A Statnikov","year":"2008","unstructured":"Statnikov A, Wang L, Aliferis CF: A comprehensive comparison of random forests and support vector machines for microarray-based cancer classification. 
BMC Bioinformatics 2008, 9: 319.","journal-title":"BMC Bioinformatics"},{"issue":"16","key":"4530_CR9","doi-asserted-by":"publisher","first-page":"9608","DOI":"10.1073\/pnas.1632587100","volume":"100","author":"EC Gunther","year":"2003","unstructured":"Gunther EC, Stone DJ, Gerwien RW, Bento P, Heyes MP: Prediction of clinical drug efficacy by classification of drug-induced genomic expression profiles in vitro. Proc Natl Acad Sci USA 2003, 100(16):9608\u20139613.","journal-title":"Proc Natl Acad Sci USA"},{"key":"4530_CR10","doi-asserted-by":"publisher","first-page":"10","DOI":"10.1155\/2009\/158368","volume":"2009","author":"TT Vu","year":"2009","unstructured":"Vu TT, Braga-Neto UM: Is Bagging Effective in the Classification of Small-Sample Genomic and Proteomic Data? EURASIP Journal on Bioinformatics and Systems Biology 2009, 2009: 10. Article ID 158368","journal-title":"EURASIP Journal on Bioinformatics and Systems Biology"},{"key":"4530_CR11","volume-title":"International Joint Conference on Artificial Intelligence (IJCAI)","author":"R Kohavi","year":"1995","unstructured":"Kohavi R: A study of cross-validation and bootstrap for accuracy estimation and model selection. International Joint Conference on Artificial Intelligence (IJCAI) 1995. [http:\/\/robotics.stanford.edu\/users\/ronnyk\/]"},{"issue":"1","key":"4530_CR12","first-page":"323","volume":"2004","author":"SB Kotsiantis","year":"2004","unstructured":"Kotsiantis SB, Pintelas PE: Combining Bagging and Boosting. International Journal of computational Intelligence 2004, 2004(1):323\u2013333.","journal-title":"International Journal of computational Intelligence"},{"issue":"2","key":"4530_CR13","doi-asserted-by":"publisher","first-page":"210","DOI":"10.1016\/j.aca.2007.04.043","volume":"592","author":"S Smit","year":"2007","unstructured":"Smit S, van Breemen MJ, Hoefsloot HCJ, Smilde AK, Aerts J, de Koster CG: Assessing the statistical validity of proteomics based biomarkers. 
Anal Chim Acta 2007, 592(2):210\u2013217.","journal-title":"Anal Chim Acta"},{"issue":"11","key":"4530_CR14","doi-asserted-by":"publisher","first-page":"1710","DOI":"10.1021\/ac00261a016","volume":"55","author":"R Hoogerbrugge","year":"1983","unstructured":"Hoogerbrugge R, Willig SJ, Kistemaker PG: Discriminant-analysis by double stage principal component analysis. Anal Chem 1983, 55(11):1710\u20131712.","journal-title":"Anal Chem"},{"issue":"7","key":"4530_CR15","doi-asserted-by":"publisher","first-page":"711","DOI":"10.1109\/34.598228","volume":"19","author":"PN Belhumeur","year":"1997","unstructured":"Belhumeur PN, Hespanha JP, Kriegman DJ: Eigenfaces vs. Fisherfaces: Recognition using class specific linear projection. IEEE Trans Pattern Anal Mach Intell 1997, 19(7):711\u2013720.","journal-title":"IEEE Trans Pattern Anal Mach Intell"},{"key":"4530_CR16","first-page":"7","volume-title":"Stat Appl Genet Mol Biol","author":"HCJ Hoefsloot","year":"2008","unstructured":"Hoefsloot HCJ, Smit S, Smilde AK: A classification model for the Leiden proteomics competition. Stat Appl Genet Mol Biol 2008, 7(2): Article 8"},{"key":"4530_CR17","first-page":"111","volume-title":"Cross-Validatory Choice and Assessment of Statistical Predictions J R Stat Soc B","author":"M Stone","year":"1974","unstructured":"Stone M: Cross-Validatory Choice and Assessment of Statistical Predictions. J R Stat Soc B 1974, 36: 111\u2013147."},{"key":"4530_CR18","volume-title":"Handbook of Chemometrics and Qualimetrics: Part B","author":"BGM Vandeginste","year":"1998","unstructured":"Vandeginste BGM, Massart DL, Buydens LMC, Jong SD, Lewi PJ, Smeyers-Verbeke J: Handbook of Chemometrics and Qualimetrics: Part B. 
Amsterdam: Elsevier; 1998."},{"issue":"9","key":"4530_CR19","doi-asserted-by":"publisher","first-page":"1591","DOI":"10.1089\/cmb.2006.13.1591","volume":"13","author":"BJA Mertens","year":"2006","unstructured":"Mertens BJA, De Noo ME, Tollenaar R, Deelder AM: Mass spectrometry proteomic diagnosis: Enacting the double cross-validatory paradigm. J Comput Biol 2006, 13(9):1591\u20131605.","journal-title":"J Comput Biol"},{"issue":"3-4","key":"4530_CR20","doi-asserted-by":"publisher","first-page":"160","DOI":"10.1002\/cem.1225","volume":"23","author":"P Filmoser","year":"2009","unstructured":"Filmoser P, Liebmann B, Varmuza K: Repeated double cross validation. J Chemometr 2009, 23(3\u20134):160\u2013171.","journal-title":"J Chemometr"},{"issue":"5439","key":"4530_CR21","doi-asserted-by":"publisher","first-page":"531","DOI":"10.1126\/science.286.5439.531","volume":"286","author":"TR Golub","year":"1999","unstructured":"Golub TR, Slonim DK, Tamayo P, Huard C, Gaasenbeek M, Mesirov JP, Coller H, Loh ML, Downing JR, Caligiuri MA, Bloomfield CD, Lander ES: Molecular classification of cancer: Class discovery and class prediction by gene expression monitoring. Science 1999, 286(5439):531\u2013537.","journal-title":"Science"},{"issue":"10","key":"4530_CR22","doi-asserted-by":"publisher","first-page":"6567","DOI":"10.1073\/pnas.082099299","volume":"99","author":"R Tibshirani","year":"2002","unstructured":"Tibshirani R, Hastie T, Narasimhan B, Chu G: Diagnosis of multiple cancer types by shrunken centroids of gene expression. Proc Natl Acad Sci USA 2002, 99(10):6567\u20136572.","journal-title":"Proc Natl Acad Sci USA"},{"key":"4530_CR23","doi-asserted-by":"publisher","first-page":"137","DOI":"10.1080\/00401706.1969.10490666","volume":"11","author":"RW Kennard","year":"1969","unstructured":"Kennard RW, Stone L: Computer aided design of experiments. 
Technometrics 1969, 11: 137\u2013148.","journal-title":"Technometrics"},{"issue":"1","key":"4530_CR24","doi-asserted-by":"publisher","first-page":"55","DOI":"10.1023\/A:1009778005914","volume":"1","author":"JH Friedman","year":"1997","unstructured":"Friedman JH: On bias, variance, 0\/1 - Loss, and the curse-of-dimensionality. Data Min Knowl Discov 1997, 1(1):55\u201377.","journal-title":"Data Min Knowl Discov"},{"issue":"4","key":"4530_CR25","doi-asserted-by":"publisher","first-page":"927","DOI":"10.1214\/aos\/1031689014","volume":"30","author":"P Buhlmann","year":"2002","unstructured":"Buhlmann P, Yu B: Analyzing bagging. Ann Stat 2002, 30(4):927\u2013961.","journal-title":"Ann Stat"},{"issue":"3","key":"4530_CR26","doi-asserted-by":"publisher","first-page":"251","DOI":"10.1023\/B:MACH.0000027783.34431.42","volume":"55","author":"Y Grandvalet","year":"2004","unstructured":"Grandvalet Y: Bagging equalizes influence. Mach Learn 2004, 55(3):251\u2013270.","journal-title":"Mach Learn"},{"key":"4530_CR27","volume-title":"Statistical Learning from a Regression Perspective","author":"RA Berk","year":"2008","unstructured":"Berk RA: Statistical Learning from a Regression Perspective. New York: Springer-Verlag; 2008."},{"issue":"2","key":"4530_CR28","doi-asserted-by":"publisher","first-page":"563","DOI":"10.1016\/S0031-3203(02)00048-1","volume":"36","author":"J Yang","year":"2003","unstructured":"Yang J, Yang JY: Why can LDA be performed in PCA transformed space? Pattern Recognit 2003, 36(2):563\u2013566.","journal-title":"Pattern Recognit"},{"key":"4530_CR29","doi-asserted-by":"publisher","DOI":"10.1007\/978-1-4757-2440-0","volume-title":"The Nature of Statistical Learning Theory","author":"V Vapnik","year":"1995","unstructured":"Vapnik V: The Nature of Statistical Learning Theory. Springer-Verlag; 1995."},{"issue":"2","key":"4530_CR30","first-page":"323","volume":"16","author":"A Buja","year":"2006","unstructured":"Buja A, Stuetzle W: Observations on bagging. 
Stat Sin 2006, 16(2):323\u2013351.","journal-title":"Stat Sin"},{"issue":"4","key":"4530_CR31","doi-asserted-by":"publisher","first-page":"349","DOI":"10.1093\/bioinformatics\/17.4.349","volume":"17","author":"CHQ Ding","year":"2001","unstructured":"Ding CHQ, Dubchak I: Multi-class protein fold recognition using support vector machines and neural networks. Bioinformatics 2001, 17(4):349\u2013358.","journal-title":"Bioinformatics"},{"issue":"1-3","key":"4530_CR32","doi-asserted-by":"publisher","first-page":"83","DOI":"10.1016\/j.febslet.2004.07.055","volume":"573","author":"R Breitling","year":"2004","unstructured":"Breitling R, Armengaud P, Amtmann A, Herzyk P, Rank products: a simple, yet powerful, new method to detect differentially regulated genes in replicated microarray experiments. FEBS Lett 2004, 573(1\u20133):83\u201392.","journal-title":"FEBS Lett"}],"container-title":["BMC Bioinformatics"],"original-title":[],"language":"en","link":[{"URL":"https:\/\/link.springer.com\/content\/pdf\/10.1186\/1471-2105-12-153.pdf","content-type":"application\/pdf","content-version":"vor","intended-application":"similarity-checking"}],"deposited":{"date-parts":[[2021,9,1]],"date-time":"2021-09-01T13:58:54Z","timestamp":1630504734000},"score":1,"resource":{"primary":{"URL":"https:\/\/bmcbioinformatics.biomedcentral.com\/articles\/10.1186\/1471-2105-12-153"}},"subtitle":[],"short-title":[],"issued":{"date-parts":[[2011,5,13]]},"references-count":32,"journal-issue":{"issue":"1","published-print":{"date-parts":[[2011,12]]}},"alternative-id":["4530"],"URL":"https:\/\/doi.org\/10.1186\/1471-2105-12-153","relation":{},"ISSN":["1471-2105"],"issn-type":[{"value":"1471-2105","type":"electronic"}],"subject":[],"published":{"date-parts":[[2011,5,13]]},"assertion":[{"value":"4 November 2010","order":1,"name":"received","label":"Received","group":{"name":"ArticleHistory","label":"Article History"}},{"value":"13 May 
2011","order":2,"name":"accepted","label":"Accepted","group":{"name":"ArticleHistory","label":"Article History"}},{"value":"13 May 2011","order":3,"name":"first_online","label":"First Online","group":{"name":"ArticleHistory","label":"Article History"}}],"article-number":"153"}}