{"status":"ok","message-type":"work","message-version":"1.0.0","message":{"indexed":{"date-parts":[[2026,3,31]],"date-time":"2026-03-31T20:02:47Z","timestamp":1774987367324,"version":"3.50.1"},"reference-count":29,"publisher":"Springer Science and Business Media LLC","issue":"1","content-domain":{"domain":["link.springer.com"],"crossmark-restriction":false},"short-container-title":["BMC Bioinformatics"],"published-print":{"date-parts":[[2011,12]]},"abstract":"<jats:title>Abstract<\/jats:title>\n          <jats:sec>\n            <jats:title>Background<\/jats:title>\n            <jats:p>Genomic and other high dimensional analyses often require one to summarize multiple related variables by a single representative. This task is also variously referred to as collapsing, combining, reducing, or aggregating variables. Examples include summarizing several probe measurements corresponding to a single gene, representing the expression profiles of a co-expression module by a single expression profile, and aggregating cell-type marker information to de-convolute expression data. Several standard statistical summary techniques can be used, but network methods also provide useful alternative methods to find representatives. Currently few collapsing functions are developed and widely applied.<\/jats:p>\n          <\/jats:sec>\n          <jats:sec>\n            <jats:title>Results<\/jats:title>\n            <jats:p>We introduce the R function collapseRows that implements several collapsing methods and evaluate its performance in three applications. First, we study a crucial step of the meta-analysis of microarray data: the merging of independent gene expression data sets, which may have been measured on different platforms. Toward this end, we collapse multiple microarray probes for a single gene and then merge the data by gene identifier. We find that choosing the probe with the highest average expression leads to best between-study consistency. Second, we study methods for summarizing the gene expression profiles of a co-expression module. Several gene co-expression network analysis applications show that the optimal collapsing strategy depends on the analysis goal. Third, we study aggregating the information of cell type marker genes when the aim is to predict the abundance of cell types in a tissue sample based on gene expression data (\"expression deconvolution\"). We apply different collapsing methods to predict cell type abundances in peripheral human blood and in mixtures of blood cell lines. Interestingly, the most accurate prediction method involves choosing the most highly connected \"hub\" marker gene. Finally, to facilitate biological interpretation of collapsed gene lists, we introduce the function userListEnrichment, which assesses the enrichment of gene lists for known brain and blood cell type markers, and for other published biological pathways.<\/jats:p>\n          <\/jats:sec>\n          <jats:sec>\n            <jats:title>Conclusions<\/jats:title>\n            <jats:p>The R function collapseRows implements several standard and network-based collapsing methods. In various genomic applications we provide evidence that both types of methods are robust and biologically relevant tools.<\/jats:p>\n          <\/jats:sec>","DOI":"10.1186\/1471-2105-12-322","type":"journal-article","created":{"date-parts":[[2011,8,5]],"date-time":"2011-08-05T06:57:16Z","timestamp":1312527436000},"update-policy":"https:\/\/doi.org\/10.1007\/springer_crossmark_policy","source":"Crossref","is-referenced-by-count":271,"title":["Strategies for aggregating gene expression data: The collapseRows R function"],"prefix":"10.1186","volume":"12","author":[{"given":"Jeremy A","family":"Miller","sequence":"first","affiliation":[]},{"given":"Chaochao","family":"Cai","sequence":"additional","affiliation":[]},{"given":"Peter","family":"Langfelder","sequence":"additional","affiliation":[]},{"given":"Daniel H","family":"Geschwind","sequence":"additional","affiliation":[]},{"given":"Sunil M","family":"Kurian","sequence":"additional","affiliation":[]},{"given":"Daniel R","family":"Salomon","sequence":"additional","affiliation":[]},{"given":"Steve","family":"Horvath","sequence":"additional","affiliation":[]}],"member":"297","published-online":{"date-parts":[[2011,8,4]]},"reference":[{"issue":"1","key":"4746_CR1","doi-asserted-by":"publisher","first-page":"54","DOI":"10.1186\/1752-0509-1-54","volume":"1","author":"P Langfelder","year":"2007","unstructured":"Langfelder P, Horvath S: Eigengene networks for studying the relationships between co-expression modules. BMC Syst Biol 2007, 1(1):54. 10.1186\/1752-0509-1-54","journal-title":"BMC Syst Biol"},{"issue":"4","key":"4746_CR2","doi-asserted-by":"publisher","first-page":"482","DOI":"10.1038\/ng776","volume":"29","author":"H Ge","year":"2001","unstructured":"Ge H, Liu Z, Church GM, Vidal M: Correlation between transcriptome and interactome mapping data from Saccharomyces cerevisiae. Nature Genetics 2001, 29(4):482\u2013486. 10.1038\/ng776","journal-title":"Nature Genetics"},{"issue":"28","key":"4746_CR3","doi-asserted-by":"publisher","first-page":"12698","DOI":"10.1073\/pnas.0914257107","volume":"107","author":"J Miller","year":"2010","unstructured":"Miller J, Horvath S, Geschwind D: Divergence of human and mouse brain transcriptome highlights Alzheimer disease pathways. Proceedings of the National Academy of Sciences of the United States of America 2010, 107(28):12698\u201312703. 10.1073\/pnas.0914257107","journal-title":"Proceedings of the National Academy of Sciences of the United States of America"},{"issue":"11","key":"4746_CR4","doi-asserted-by":"publisher","first-page":"1271","DOI":"10.1038\/nn.2207","volume":"11","author":"M Oldham","year":"2008","unstructured":"Oldham M, Konopka G, Iwamoto K, Langfelder P, Kato T, Horvath S, Geschwind D: Functional organization of the transcriptome in human brain. Nature Neuroscience 2008, 11(11):1271\u20131282. 10.1038\/nn.2207","journal-title":"Nature Neuroscience"},{"issue":"1","key":"4746_CR5","doi-asserted-by":"publisher","first-page":"559","DOI":"10.1186\/1471-2105-9-559","volume":"9","author":"P Langfelder","year":"2008","unstructured":"Langfelder P, Horvath S: WGCNA: an R package for weighted correlation network analysis. BMC Bioinformatics 2008, 9(1):559. 10.1186\/1471-2105-9-559","journal-title":"BMC Bioinformatics"},{"key":"4746_CR6","unstructured":"Miller J, Langfelder P, Chaochao C, Horvath S: The collapseRows function.[http:\/\/www.genetics.ucla.edu\/labs\/horvath\/CoexpressionNetwork\/collapseRows]"},{"issue":"7","key":"4746_CR7","doi-asserted-by":"publisher","first-page":"e6098","DOI":"10.1371\/journal.pone.0006098","volume":"4","author":"A Abbas","year":"2009","unstructured":"Abbas A, Wolslegel K, Seshasayee D, Modrusan Z, Clark H: Deconvolution of blood microarray data identifies cellular activation patterns in systemic lupus erythematosus. PloS one 2009, 4(7):e6098. 10.1371\/journal.pone.0006098","journal-title":"PloS one"},{"issue":"3","key":"4746_CR8","doi-asserted-by":"publisher","first-page":"e1000873","DOI":"10.1371\/journal.pgen.1000873","volume":"6","author":"V Dumeaux","year":"2010","unstructured":"Dumeaux V, Olsen K, Nuel G, Paulssen R, Borresen-Dale AL, Lund E: Deciphering normal blood gene expression variation--The NOWAC postgenome study. PLoS Genet 2010, 6(3):e1000873. 10.1371\/journal.pgen.1000873","journal-title":"PLoS Genet"},{"issue":"10","key":"4746_CR9","doi-asserted-by":"publisher","first-page":"1208","DOI":"10.1038\/ng2119","volume":"39","author":"H Goring","year":"2007","unstructured":"Goring H, Curran J, Johnson M, Dyer T, Charlesworth J, Cole S, Jowett J, Abraham L, Rainwater D, Comuzzie A, et al.: Discovery of expression QTLs using large-scale transcriptional profiling in human lymphocytes. Nature Genetics 2007, 39(10):1208\u20131216. 10.1038\/ng2119","journal-title":"Nature Genetics"},{"issue":"10","key":"4746_CR10","doi-asserted-by":"publisher","first-page":"e13358","DOI":"10.1371\/journal.pone.0013358","volume":"5","author":"Y Grigoryev","year":"2010","unstructured":"Grigoryev Y, Kurian S, Avnur Z, Borie D, Deng J, Campbell D, Sung J, Nikolcheva T, Quinn A, Schulman H, et al.: Deconvoluting post-transplant immunity: cell subset-specific mapping reveals pathways for activation and expansion of memory T, monocytes and B cells. PloS one 2010, 5(10):e13358. 10.1371\/journal.pone.0013358","journal-title":"PloS one"},{"issue":"11","key":"4746_CR11","doi-asserted-by":"publisher","first-page":"R127","DOI":"10.1186\/gb-2009-10-11-r127","volume":"10","author":"R Pankla","year":"2009","unstructured":"Pankla R, Buddhisa S, Berry M, Blankenship D, Bancroft G, Banchereau J, Lertmemongkolchai G, Chaussabel D: Genomic transcriptional profiling identifies a candidate blood biomarker signature for the diagnosis of septicemic melioidosis. Genome Biol 2009, 10(11):R127. 10.1186\/gb-2009-10-11-r127","journal-title":"Genome Biol"},{"issue":"1","key":"4746_CR12","doi-asserted-by":"publisher","first-page":"405","DOI":"10.1186\/1471-2164-10-405","volume":"10","author":"C Saris","year":"2009","unstructured":"Saris C, Horvath S, van Vught P, van Es M, Blauw H, Fuller T, Langfelder P, DeYoung J, Wokke J, Veldink J, et al.: Weighted gene co-expression network analysis of the peripheral blood from Amyotrophic Lateral Sclerosis patients. BMC Genomics 2009, 10(1):405. 10.1186\/1471-2164-10-405","journal-title":"BMC Genomics"},{"issue":"8","key":"4746_CR13","doi-asserted-by":"publisher","first-page":"e1000117","DOI":"10.1371\/journal.pcbi.1000117","volume":"4","author":"S Horvath","year":"2008","unstructured":"Horvath S, Dong J: Geometric interpretation of gene coexpression network analysis. PLoS computational biology 2008, 4(8):e1000117. 10.1371\/journal.pcbi.1000117","journal-title":"PLoS computational biology"},{"issue":"47","key":"4746_CR14","doi-asserted-by":"publisher","first-page":"17973","DOI":"10.1073\/pnas.0605938103","volume":"103","author":"MC Oldham","year":"2006","unstructured":"Oldham MC, Horvath S, Geschwind DH: Conservation and evolution of gene coexpression networks in human and chimpanzee brains. Proc Natl Acad Sci U S A 2006, 103(47):17973\u201317978. 10.1073\/pnas.0605938103","journal-title":"Proc Natl Acad Sci U S A"},{"key":"4746_CR15","doi-asserted-by":"crossref","first-page":"Article17","DOI":"10.2202\/1544-6115.1128","volume":"4","author":"B Zhang","year":"2005","unstructured":"Zhang B, Horvath S: A general framework for weighted gene co-expression network analysis. Stat Appl Genet Mol Biol 2005, 4: Article17.","journal-title":"Stat Appl Genet Mol Biol"},{"issue":"24","key":"4746_CR16","doi-asserted-by":"publisher","first-page":"9490","DOI":"10.1158\/0008-5472.CAN-09-2183","volume":"69","author":"L Wang","year":"2009","unstructured":"Wang L, Tang H, Thayanithy V, Subramanian S, Oberg A, Cunningham J, Cerhan J, Steer C, Thibodeau SN: Gene Networks and microRNAs Implicated in Aggressive Prostate Cancer. Cancer Research 2009, 69(24):9490\u20139497. 10.1158\/0008-5472.CAN-09-2183","journal-title":"Cancer Research"},{"issue":"1","key":"4746_CR17","doi-asserted-by":"publisher","first-page":"5","DOI":"10.1186\/1755-8794-4-5","volume":"4","author":"S Ma","year":"2011","unstructured":"Ma S, Kosorok M, Huang J, Dai Y: Incorporating higher-order representative features improves prediction in network-based cancer prognosis analysis. BMC Med Genomics 2011, 4(1):5. 10.1186\/1755-8794-4-5","journal-title":"BMC Med Genomics"},{"issue":"24","key":"4746_CR18","doi-asserted-by":"publisher","first-page":"10060","DOI":"10.1158\/0008-5472.CAN-10-2465","volume":"70","author":"A Ivliev","year":"2010","unstructured":"Ivliev A, 't Hoen P, Sergeeva M: Coexpression network analysis identifies transcriptional modules related to proastrocytic differentiation and sprouty signaling in glioma. Cancer Research 2010, 70(24):10060\u201310070. 10.1158\/0008-5472.CAN-10-2465","journal-title":"Cancer Research"},{"issue":"1","key":"4746_CR19","doi-asserted-by":"publisher","first-page":"16","DOI":"10.1186\/1752-0509-2-16","volume":"2","author":"D Weston","year":"2008","unstructured":"Weston D, Gunter L, Rogers A, Wullschleger S: Connecting Genes, Coexpression Modules, and Molecular Signatures to Environmental Stress Phenotypes in Plants. BMC Syst Biol 2008, 2(1):16. 10.1186\/1752-0509-2-16","journal-title":"BMC Syst Biol"},{"issue":"5","key":"4746_CR20","doi-asserted-by":"publisher","first-page":"719","DOI":"10.1093\/bioinformatics\/btm563","volume":"24","author":"P Langfelder","year":"2008","unstructured":"Langfelder P, Zhang B, Horvath S: Defining clusters from a hierarchical cluster tree: the Dynamic Tree Cut package for R. Bioinformatics 2008, 24(5):719\u2013720. 10.1093\/bioinformatics\/btm563","journal-title":"Bioinformatics"},{"issue":"8","key":"4746_CR21","doi-asserted-by":"publisher","first-page":"1043","DOI":"10.1093\/bioinformatics\/btq097","volume":"26","author":"J Clarke","year":"2010","unstructured":"Clarke J, Seo P, Clarke B: Statistical expression deconvolution from mixed tissue samples. Bioinformatics 2010, 26(8):1043\u20131049. 10.1093\/bioinformatics\/btq097","journal-title":"Bioinformatics"},{"issue":"18","key":"4746_CR22","doi-asserted-by":"publisher","first-page":"10370","DOI":"10.1073\/pnas.1832361100","volume":"100","author":"P Lu","year":"2003","unstructured":"Lu P, Nakorchevskiy A, Marcotte E: Expression deconvolution: a reinterpretation of DNA microarray data reveals dynamic changes in cell populations. Proceedings of the National Academy of Sciences of the United States of America 2003, 100(18):10370\u201310375. 10.1073\/pnas.1832361100","journal-title":"Proceedings of the National Academy of Sciences of the United States of America"},{"issue":"2","key":"4746_CR23","doi-asserted-by":"publisher","first-page":"615","DOI":"10.1073\/pnas.2536479100","volume":"101","author":"R Stuart","year":"2004","unstructured":"Stuart R, Wachsman W, Berry C, Wang-Rodriguez J, Wasserman L, Klacansky I, Masys D, Arden K, Goodison S, McClelland M, et al.: In silico dissection of cell-type-associated patterns of gene expression in prostate cancer. Proceedings of the National Academy of Sciences of the United States of America 2004, 101(2):615\u2013620. 10.1073\/pnas.2536479100","journal-title":"Proceedings of the National Academy of Sciences of the United States of America"},{"issue":"1","key":"4746_CR24","doi-asserted-by":"publisher","first-page":"294","DOI":"10.1186\/1471-2164-11-294","volume":"11","author":"B Ballester","year":"2010","unstructured":"Ballester B, Johnson N, Proctor G, Flicek P: Consistent annotation of gene expression arrays. BMC Genomics 2010, 11(1):294. 10.1186\/1471-2164-11-294","journal-title":"BMC Genomics"},{"issue":"11","key":"4746_CR25","doi-asserted-by":"publisher","first-page":"879","DOI":"10.1038\/nmeth1107-879","volume":"4","author":"R Chen","year":"2007","unstructured":"Chen R, Li L, Butte A: AILUN: reannotating gene expression data automatically. Nature methods 2007, 4(11):879\u2013879. 10.1038\/nmeth1107-879","journal-title":"Nature methods"},{"issue":"20","key":"4746_CR26","doi-asserted-by":"publisher","first-page":"e175","DOI":"10.1093\/nar\/gni179","volume":"33","author":"M Dai","year":"2005","unstructured":"Dai M, Wang P, Boyd A, Kostov G, Athey B, Jones E, Bunney W, Myers R, Speed T, Akil H, et al.: Evolving gene\/transcript definitions significantly alter the interpretation of GeneChip data. Nucl Acids Res 2005, 33(20):e175-e175. 10.1093\/nar\/gni179","journal-title":"Nucl Acids Res"},{"issue":"11","key":"4746_CR27","doi-asserted-by":"publisher","first-page":"SOFTWARE0002","DOI":"10.1186\/gb-2001-2-11-software0002","volume":"2","author":"J Tsai","year":"2001","unstructured":"Tsai J, Sultana R, Lee Y, Pertea G, Karamycheva S, Antonescu V, Cho J, Parvizi B, Cheung F, Quackenbush J: RESOURCERER: a database for annotating and linking microarray resources within and across species. Genome Biol 2001, 2(11):SOFTWARE0002.","journal-title":"Genome Biol"},{"issue":"18","key":"4746_CR28","doi-asserted-by":"publisher","first-page":"3681","DOI":"10.1093\/bioinformatics\/bti587","volume":"21","author":"T Liefeld","year":"2005","unstructured":"Liefeld T, Reich M, Gould J, Zhang P, Tamayo P, Mesirov J: GeneCruiser: a web service for the annotation of microarray data. Bioinformatics 2005, 21(18):3681\u20133682. 10.1093\/bioinformatics\/bti587","journal-title":"Bioinformatics"},{"issue":"13","key":"4746_CR29","doi-asserted-by":"publisher","first-page":"1665","DOI":"10.1093\/bioinformatics\/btl163","volume":"22","author":"F Pan","year":"2006","unstructured":"Pan F, Kamath K, Zhang K, Pulapura S, Achar A, Nunez-Iglesias J, Huang Y, Yan X, Han J, Hu H, et al.: Integrative Array Analyzer: a software package for analysis of cross-platform and cross-species microarray data. Bioinformatics 2006, 22(13):1665\u20131667. 10.1093\/bioinformatics\/btl163","journal-title":"Bioinformatics"}],"container-title":["BMC Bioinformatics"],"original-title":[],"language":"en","link":[{"URL":"https:\/\/link.springer.com\/content\/pdf\/10.1186\/1471-2105-12-322.pdf","content-type":"application\/pdf","content-version":"vor","intended-application":"similarity-checking"}],"deposited":{"date-parts":[[2021,9,1]],"date-time":"2021-09-01T15:59:54Z","timestamp":1630511994000},"score":1,"resource":{"primary":{"URL":"https:\/\/bmcbioinformatics.biomedcentral.com\/articles\/10.1186\/1471-2105-12-322"}},"subtitle":[],"short-title":[],"issued":{"date-parts":[[2011,8,4]]},"references-count":29,"journal-issue":{"issue":"1","published-print":{"date-parts":[[2011,12]]}},"alternative-id":["4746"],"URL":"https:\/\/doi.org\/10.1186\/1471-2105-12-322","relation":{},"ISSN":["1471-2105"],"issn-type":[{"value":"1471-2105","type":"electronic"}],"subject":[],"published":{"date-parts":[[2011,8,4]]},"assertion":[{"value":"23 May 2011","order":1,"name":"received","label":"Received","group":{"name":"ArticleHistory","label":"Article History"}},{"value":"4 August 2011","order":2,"name":"accepted","label":"Accepted","group":{"name":"ArticleHistory","label":"Article History"}},{"value":"4 August 2011","order":3,"name":"first_online","label":"First Online","group":{"name":"ArticleHistory","label":"Article History"}}],"article-number":"322"}}