{"status":"ok","message-type":"work","message-version":"1.0.0","message":{"indexed":{"date-parts":[[2025,10,29]],"date-time":"2025-10-29T13:01:03Z","timestamp":1761742863580},"reference-count":17,"publisher":"Springer Science and Business Media LLC","issue":"1","content-domain":{"domain":["link.springer.com"],"crossmark-restriction":false},"short-container-title":["BMC Bioinformatics"],"published-print":{"date-parts":[[2006,12]]},"abstract":"<jats:title>Abstract<\/jats:title>\n          <jats:sec>\n            <jats:title>Background<\/jats:title>\n            <jats:p>Many of the most popular pre-processing methods for Affymetrix expression arrays, such as RMA, gcRMA, and PLIER, simultaneously analyze data across a set of predetermined arrays to improve precision of the final measures of expression. One problem associated with these algorithms is that expression measurements for a particular sample are highly dependent on the set of samples used for normalization and results obtained by normalization with a different set may not be comparable. A related problem is that an organization producing and\/or storing large amounts of data in a sequential fashion will need to either re-run the pre-processing algorithm every time an array is added or store them in batches that are pre-processed together. Furthermore, pre-processing of large numbers of arrays requires loading all the feature-level data into memory which is a difficult task even with modern computers. We utilize a scheme that produces all the information necessary for pre-processing using a very large training set that can be used for summarization of samples outside of the training set. All subsequent pre-processing tasks can be done on an individual array basis. We demonstrate the utility of this approach by defining a new version of the Robust Multi-chip Averaging (RMA) algorithm which we refer to as refRMA.<\/jats:p>\n          <\/jats:sec>\n          <jats:sec>\n            <jats:title>Results<\/jats:title>\n            <jats:p>We assess performance based on multiple sets of samples processed over HG U133A Affymetrix GeneChip<jats:sup>\u00ae<\/jats:sup> arrays. We show that the refRMA workflow, when used in conjunction with a large, biologically diverse training set, results in the same general characteristics as that of RMA in its classic form when comparing overall data structure, sample-to-sample correlation, and variation. Further, we demonstrate that the refRMA workflow and reference set can be robustly applied to na\u00efve organ types and to benchmark data where its performance indicates respectable results.<\/jats:p>\n          <\/jats:sec>\n          <jats:sec>\n            <jats:title>Conclusion<\/jats:title>\n            <jats:p>Our results indicate that a biologically diverse reference database can be used to train a model for estimating probe set intensities of exclusive test sets, while retaining the overall characteristics of the base algorithm. Although the results we present are specific for RMA, similar versions of other multi-array normalization and summarization schemes can be developed.<\/jats:p>\n          <\/jats:sec>","DOI":"10.1186\/1471-2105-7-464","type":"journal-article","created":{"date-parts":[[2006,10,24]],"date-time":"2006-10-24T14:28:54Z","timestamp":1161700134000},"update-policy":"http:\/\/dx.doi.org\/10.1007\/springer_crossmark_policy","source":"Crossref","is-referenced-by-count":70,"title":["A summarization approach for Affymetrix GeneChip data using a reference training set from a large, biologically diverse database"],"prefix":"10.1186","volume":"7","author":[{"given":"Simon","family":"Katz","sequence":"first","affiliation":[]},{"given":"Rafael A","family":"Irizarry","sequence":"additional","affiliation":[]},{"given":"Xue","family":"Lin","sequence":"additional","affiliation":[]},{"given":"Mark","family":"Tripputi","sequence":"additional","affiliation":[]},{"given":"Mark W","family":"Porter","sequence":"additional","affiliation":[]}],"member":"297","published-online":{"date-parts":[[2006,10,23]]},"reference":[{"key":"1203_CR1","doi-asserted-by":"publisher","first-page":"e15","DOI":"10.1093\/nar\/gng015","volume":"31","author":"RA Irizarry","year":"2003","unstructured":"Irizarry RA, Bolstad BM, Collin F, Cope LM, Hobbs B, Speed TP: Summaries of Affymetrix GeneChip probe level data. Nucleic Acids Res 2003, 31: e15. 10.1093\/nar\/gng015","journal-title":"Nucleic Acids Res"},{"key":"1203_CR2","doi-asserted-by":"crossref","first-page":"RESEARCH0032","DOI":"10.1186\/gb-2001-2-10-reports0032","volume":"2","author":"C Li","year":"2001","unstructured":"Li C, Hung Wong W: Model-based analysis of oligonucleotide arrays: model validation, design issues and standard error application. Genome Biol 2001, 2: RESEARCH0032.","journal-title":"Genome Biol"},{"key":"1203_CR3","doi-asserted-by":"publisher","first-page":"909","DOI":"10.1198\/016214504000000683","volume":"99","author":"Z Wu","year":"2004","unstructured":"Wu Z, Irizarry RA, Gentleman R, Martinez-Murillo F, Spencer F: A model-based background adjustment for oligonucleotide expression arrays. Journal of the American Statistical Association 2004, 99: 909\u2013917. 10.1198\/016214504000000683","journal-title":"Journal of the American Statistical Association"},{"key":"1203_CR4","volume-title":"Guide to Probe Logarithmic Intensity Error (PLIER) Estimation","author":"Affymetrix","year":"2005","unstructured":"Affymetrix: Guide to Probe Logarithmic Intensity Error (PLIER) Estimation. Edited by: Affymetrix I. Santa Clara, CA, ; 2005."},{"key":"1203_CR5","doi-asserted-by":"publisher","first-page":"R16","DOI":"10.1186\/gb-2005-6-2-r16","volume":"6","author":"SE Choe","year":"2005","unstructured":"Choe SE, Boutros M, Michelson AM, Church GM, Halfon MS: Preferred analysis methods for Affymetrix GeneChips revealed by a wholly defined control dataset. Genome Biol 2005, 6: R16. 10.1186\/gb-2005-6-2-r16","journal-title":"Genome Biol"},{"key":"1203_CR6","doi-asserted-by":"publisher","first-page":"185","DOI":"10.1093\/bioinformatics\/19.2.185","volume":"19","author":"BM Bolstad","year":"2003","unstructured":"Bolstad BM, Irizarry RA, Astrand M, Speed TP: A comparison of normalization methods for high density oligonucleotide array data based on variance and bias. Bioinformatics 2003, 19: 185\u2013193. 10.1093\/bioinformatics\/19.2.185","journal-title":"Bioinformatics"},{"key":"1203_CR7","doi-asserted-by":"publisher","first-page":"249","DOI":"10.1093\/biostatistics\/4.2.249","volume":"4","author":"RA Irizarry","year":"2003","unstructured":"Irizarry RA, Hobbs B, Collin F, Beazer-Barclay YD, Antonellis KJ, Scherf U, Speed TP: Exploration, normalization, and summaries of high density oligonucleotide array probe level data. Biostatistics 2003, 4: 249\u2013264. 10.1093\/biostatistics\/4.2.249","journal-title":"Biostatistics"},{"key":"1203_CR8","doi-asserted-by":"publisher","first-page":"26","DOI":"10.1186\/1471-2105-6-26","volume":"6","author":"K Shedden","year":"2005","unstructured":"Shedden K, Chen W, Kuick R, Ghosh D, Macdonald J, Cho KR, Giordano TJ, Gruber SB, Fearon ER, Taylor JM, Hanash S: Comparison of seven methods for producing Affymetrix expression scores based on False Discovery Rates in disease profiling data. BMC Bioinformatics 2005, 6: 26. 10.1186\/1471-2105-6-26","journal-title":"BMC Bioinformatics"},{"key":"1203_CR9","doi-asserted-by":"publisher","first-page":"323","DOI":"10.1093\/bioinformatics\/btg410","volume":"20","author":"LM Cope","year":"2004","unstructured":"Cope LM, Irizarry RA, Jaffee HA, Wu Z, Speed TP: A benchmark for Affymetrix GeneChip expression measures. Bioinformatics 2004, 20: 323\u2013331. 10.1093\/bioinformatics\/btg410","journal-title":"Bioinformatics"},{"key":"1203_CR10","doi-asserted-by":"publisher","first-page":"D562","DOI":"10.1093\/nar\/gki022","volume":"33","author":"T Barrett","year":"2005","unstructured":"Barrett T, Suzek TO, Troup DB, Wilhite SE, Ngau WC, Ledoux P, Rudnev D, Lash AE, Fujibuchi W, Edgar R: NCBI GEO: mining millions of expression profiles--database and tools. Nucleic Acids Res 2005, 33: D562\u20136. 10.1093\/nar\/gki022","journal-title":"Nucleic Acids Res"},{"key":"1203_CR11","doi-asserted-by":"crossref","first-page":"456","DOI":"10.1201\/9780429246593","volume-title":"An Introduction to the Bootstrap","author":"B Efron","year":"1994","unstructured":"Efron B, Tibshirani R: An Introduction to the Bootstrap. Boca Raton, Chapman & Hall\/CRC; 1994:456."},{"key":"1203_CR12","doi-asserted-by":"publisher","first-page":"345","DOI":"10.1038\/nmeth756","volume":"2","author":"RA Irizarry","year":"2005","unstructured":"Irizarry RA, Warren D, Spencer F, Kim IF, Biswal S, Frank BC, Gabrielson E, Garcia JG, Geoghegan J, Germino G, Griffin C, Hilmer SC, Hoffman E, Jedlicka AE, Kawasaki E, Martinez-Murillo F, Morsberger L, Lee H, Petersen D, Quackenbush J, Scott A, Wilson M, Yang Y, Ye SQ, Yu W: Multiple-laboratory comparison of microarray platforms. Nat Methods 2005, 2: 345\u2013350. 10.1038\/nmeth756","journal-title":"Nat Methods"},{"issue":"19","key":"1203_CR13","doi-asserted-by":"publisher","first-page":"2364","DOI":"10.1093\/bioinformatics\/btl402","volume":"22","author":"DR Goldstein","year":"2006","unstructured":"Goldstein DR: Partition Resampling and Extrapolation Averaging: approximation methods for quantifying gene expression in large numbers of short oligonucleotide arrays. Bioinformatics 2006, 22(19):2364\u20132372. 10.1093\/bioinformatics\/btl402","journal-title":"Bioinformatics"},{"key":"1203_CR14","volume-title":"Am J Respir Cell Mol Biol","author":"MP Gruber","year":"2006","unstructured":"Gruber MP, Coldren CD, Woolum MD, Cosgrove GP, Zeng C, Baron AE, Moore MD, Cool CD, Worthen GS, Brown KK, Geraci MW: Human Lung Project: Evaluating Variance of Gene Expression in the Human Lung. Am J Respir Cell Mol Biol 2006."},{"key":"1203_CR15","doi-asserted-by":"publisher","first-page":"4","DOI":"10.1186\/1471-2105-3-4","volume":"3","author":"M Bakay","year":"2002","unstructured":"Bakay M, Chen YW, Borup R, Zhao P, Nagaraju K, Hoffman EP: Sources of variability and effect of experimental approach on expression profiling data interpretation. BMC Bioinformatics 2002, 3: 4. 10.1186\/1471-2105-3-4","journal-title":"BMC Bioinformatics"},{"key":"1203_CR16","doi-asserted-by":"publisher","first-page":"307","DOI":"10.1093\/bioinformatics\/btg405","volume":"20","author":"L Gautier","year":"2004","unstructured":"Gautier L, Cope L, Bolstad BM, Irizarry RA: affy--analysis of Affymetrix GeneChip data at the probe level. Bioinformatics 2004, 20: 307\u2013315. 10.1093\/bioinformatics\/btg405","journal-title":"Bioinformatics"},{"key":"1203_CR17","doi-asserted-by":"publisher","first-page":"R80","DOI":"10.1186\/gb-2004-5-10-r80","volume":"5","author":"RC Gentleman","year":"2004","unstructured":"Gentleman RC, Carey VJ, Bates DM, Bolstad B, Dettling M, Dudoit S, Ellis B, Gautier L, Ge Y, Gentry J, Hornik K, Hothorn T, Huber W, Iacus S, Irizarry R, Leisch F, Li C, Maechler M, Rossini AJ, Sawitzki G, Smith C, Smyth G, Tierney L, Yang JY, Zhang J: Bioconductor: open software development for computational biology and bioinformatics. Genome Biol 2004, 5: R80. 10.1186\/gb-2004-5-10-r80","journal-title":"Genome Biol"}],"container-title":["BMC Bioinformatics"],"original-title":[],"language":"en","link":[{"URL":"https:\/\/link.springer.com\/content\/pdf\/10.1186\/1471-2105-7-464.pdf","content-type":"application\/pdf","content-version":"vor","intended-application":"similarity-checking"}],"deposited":{"date-parts":[[2021,9,1]],"date-time":"2021-09-01T03:17:07Z","timestamp":1630466227000},"score":1,"resource":{"primary":{"URL":"https:\/\/bmcbioinformatics.biomedcentral.com\/articles\/10.1186\/1471-2105-7-464"}},"subtitle":[],"short-title":[],"issued":{"date-parts":[[2006,10,23]]},"references-count":17,"journal-issue":{"issue":"1","published-print":{"date-parts":[[2006,12]]}},"alternative-id":["1203"],"URL":"https:\/\/doi.org\/10.1186\/1471-2105-7-464","relation":{},"ISSN":["1471-2105"],"issn-type":[{"value":"1471-2105","type":"electronic"}],"subject":[],"published":{"date-parts":[[2006,10,23]]},"assertion":[{"value":"11 July 2006","order":1,"name":"received","label":"Received","group":{"name":"ArticleHistory","label":"Article History"}},{"value":"23 October 2006","order":2,"name":"accepted","label":"Accepted","group":{"name":"ArticleHistory","label":"Article History"}},{"value":"23 October 2006","order":3,"name":"first_online","label":"First Online","group":{"name":"ArticleHistory","label":"Article History"}}],"article-number":"464"}}