{"status":"ok","message-type":"work","message-version":"1.0.0","message":{"indexed":{"date-parts":[[2025,10,30]],"date-time":"2025-10-30T22:38:49Z","timestamp":1761863929812,"version":"3.37.3"},"reference-count":36,"publisher":"Oxford University Press (OUP)","issue":"11","license":[{"start":{"date-parts":[[2018,10,24]],"date-time":"2018-10-24T00:00:00Z","timestamp":1540339200000},"content-version":"vor","delay-in-days":0,"URL":"http:\/\/creativecommons.org\/licenses\/by-nc\/4.0\/"}],"funder":[{"DOI":"10.13039\/501100007835","name":"Silesian University of Technology","doi-asserted-by":"publisher","award":["BK-200\/RAU1\/2018\/8"],"award-info":[{"award-number":["BK-200\/RAU1\/2018\/8"]}],"id":[{"id":"10.13039\/501100007835","id-type":"DOI","asserted-by":"publisher"}]},{"name":"National Science Center","award":["2015\/19\/B\/ST6\/01736"],"award-info":[{"award-number":["2015\/19\/B\/ST6\/01736"]}]},{"name":"Harmonia","award":["DEC-2013\/08\/M\/ST6\/00924"],"award-info":[{"award-number":["DEC-2013\/08\/M\/ST6\/00924"]}]},{"name":"OPUS","award":["2016\/21\/B\/ST6\/02153"],"award-info":[{"award-number":["2016\/21\/B\/ST6\/02153"]}]},{"name":"GeCONiI computational infrastructure","award":["POIG.02.03.01-24-099\/13"],"award-info":[{"award-number":["POIG.02.03.01-24-099\/13"]}]}],"content-domain":{"domain":[],"crossmark-restriction":false},"short-container-title":[],"published-print":{"date-parts":[[2019,6,1]]},"abstract":"<jats:title>Abstract<\/jats:title>\n               <jats:sec>\n                  <jats:title>Motivation<\/jats:title>\n                  <jats:p>In contemporary biological experiments, bias, which interferes with the measurements, requires attentive processing. Important sources of bias in high-throughput biological experiments are batch effects and diverse methods towards removal of batch effects have been established. These include various normalization techniques, yet many require knowledge on the number of batches and assignment of samples to batches. 
Only a few can deal with the problem of identifying batch effects of unknown structure. For this reason, an original batch identification algorithm based on dynamic programming is introduced for omics data that may be sorted on a timescale.<\/jats:p>\n               <\/jats:sec>\n               <jats:sec>\n                  <jats:title>Results<\/jats:title>\n                  <jats:p>The BatchI algorithm is based on partitioning a series of high-throughput experiment samples into sub-series corresponding to estimated batches. The dynamic programming method is used for splitting data with maximal dispersion between batches, while maintaining minimal within-batch dispersion. The procedure has been tested on a number of available datasets with and without prior information about batch partitioning. Datasets with a priori identified batches have been split accordingly, measured with weighted average Dice Index. Batch effect correction is justified by higher intra-group correlation. In the blank datasets, identified batch divisions lead to improvement of parameters and quality of biological information, shown by literature study and Information Content. The outcome of the algorithm serves as a starting point for correction methods. 
It has been demonstrated that omitting the essential step of batch effect control may lead to waste of valuable potential discoveries.<\/jats:p>\n               <\/jats:sec>\n               <jats:sec>\n                  <jats:title>Availability and implementation<\/jats:title>\n                  <jats:p>The implementation is available within the BatchI R package at http:\/\/zaed.aei.polsl.pl\/index.php\/pl\/111-software.<\/jats:p>\n               <\/jats:sec>\n               <jats:sec>\n                  <jats:title>Supplementary information<\/jats:title>\n                  <jats:p>Supplementary data are available at Bioinformatics online.<\/jats:p>\n               <\/jats:sec>","DOI":"10.1093\/bioinformatics\/bty900","type":"journal-article","created":{"date-parts":[[2018,10,24]],"date-time":"2018-10-24T09:33:38Z","timestamp":1540373618000},"page":"1885-1892","source":"Crossref","is-referenced-by-count":20,"title":["BatchI: Batch effect Identification in high-throughput screening data using a dynamic programming algorithm"],"prefix":"10.1093","volume":"35","author":[{"given":"Anna","family":"Papiez","sequence":"first","affiliation":[{"name":"Institute of Automatic Control, Silesian University of Technology, Gliwice, Poland"}]},{"given":"Michal","family":"Marczyk","sequence":"additional","affiliation":[{"name":"Institute of Automatic Control, Silesian University of Technology, Gliwice, Poland"},{"name":"Department of Internal Medicine, Yale School of Medicine, Yale University, New Haven, CT, USA"}]},{"given":"Joanna","family":"Polanska","sequence":"additional","affiliation":[{"name":"Institute of Automatic Control, Silesian University of Technology, Gliwice, Poland"}]},{"given":"Andrzej","family":"Polanski","sequence":"additional","affiliation":[{"name":"Institute of Informatics, Silesian University of Technology, Gliwice, 
Poland"}]}],"member":"286","published-online":{"date-parts":[[2018,10,24]]},"reference":[{"key":"2023012713080056800_bty900-B1","doi-asserted-by":"crossref","first-page":"10101","DOI":"10.1073\/pnas.97.18.10101","article-title":"Singular value decomposition for genome-wide expression data processing and modeling","volume":"97","author":"Alter","year":"2000","journal-title":"Proc. Natl. Acad. Sci. USA"},{"key":"2023012713080056800_bty900-B2","doi-asserted-by":"crossref","first-page":"405","DOI":"10.1534\/genetics.110.114983","article-title":"Statistical design and analysis of RNA sequencing data","volume":"185","author":"Auer","year":"2010","journal-title":"Genetics"},{"key":"2023012713080056800_bty900-B3","doi-asserted-by":"crossref","first-page":"284.","DOI":"10.1145\/366573.366611","article-title":"On the approximation of curves by line segments using dynamic programming","volume":"4","author":"Bellman","year":"1961","journal-title":"Commun. ACM"},{"key":"2023012713080056800_bty900-B4","doi-asserted-by":"crossref","first-page":"105","DOI":"10.1093\/bioinformatics\/btg385","article-title":"Adjustment of systematic microarray data biases","volume":"20","author":"Benito","year":"2004","journal-title":"Bioinformatics"},{"key":"2023012713080056800_bty900-B5","doi-asserted-by":"crossref","first-page":"207.","DOI":"10.1186\/1471-2105-8-207","article-title":"Orthogonal projections to latent structures as a strategy for microarray data normalization","volume":"8","author":"Bylesj\u00f6","year":"2007","journal-title":"BMC Bioinformatics"},{"key":"2023012713080056800_bty900-B6","doi-asserted-by":"crossref","first-page":"e17238.","DOI":"10.1371\/journal.pone.0017238","article-title":"Removing batch effects in analysis of expression microarray data: an evaluation of six batch adjustment methods","volume":"6","author":"Chen","year":"2011","journal-title":"PLoS 
One"},{"key":"2023012713080056800_bty900-B7","doi-asserted-by":"crossref","first-page":"297","DOI":"10.2307\/1932409","article-title":"Measures of the amount of ecologic association between species","volume":"26","author":"Dice","year":"1945","journal-title":"Ecology"},{"key":"2023012713080056800_bty900-B8","doi-asserted-by":"crossref","first-page":"15","DOI":"10.1093\/bioinformatics\/bts635","article-title":"Star: ultrafast universal RNA-seq aligner","volume":"29","author":"Dobin","year":"2013","journal-title":"Bioinformatics"},{"key":"2023012713080056800_bty900-B9","doi-asserted-by":"crossref","first-page":"E359","DOI":"10.1002\/ijc.29210","article-title":"Cancer incidence and mortality worldwide: sources, methods and major patterns in GLOBOCAN 2012","volume":"136","author":"Ferlay","year":"2015","journal-title":"Int. J. Cancer"},{"key":"2023012713080056800_bty900-B10","doi-asserted-by":"crossref","first-page":"539","DOI":"10.1093\/biostatistics\/kxr034","article-title":"Using control genes to correct for unwanted variation in microarray data","volume":"13","author":"Gagnon-Bartsch","year":"2012","journal-title":"Biostatistics"},{"key":"2023012713080056800_bty900-B11","doi-asserted-by":"crossref","first-page":"1","DOI":"10.1155\/2008\/586721","article-title":"Inflammation, adenoma and cancer: objective classification of colon biopsy specimens with gene expression signature","volume":"25","author":"Galamb","year":"2008","journal-title":"Dis. Mark"},{"key":"2023012713080056800_bty900-B12","doi-asserted-by":"crossref","first-page":"668","DOI":"10.1158\/1078-0432.CCR-08-1067","article-title":"Molecular classification and prognostication of adrenocortical tumors by transcriptome profiling","volume":"15","author":"Giordano","year":"2009","journal-title":"Clin. 
Cancer Res"},{"key":"2023012713080056800_bty900-B13","doi-asserted-by":"crossref","first-page":"191.","DOI":"10.1055\/s-0029-1242458","article-title":"Colorectal cancer epidemiology: incidence, mortality, survival, and risk factors","volume":"22","author":"Haggar","year":"2009","journal-title":"Clin. Colon Rectal Surg"},{"key":"2023012713080056800_bty900-B14","doi-asserted-by":"crossref","first-page":"249","DOI":"10.1093\/biostatistics\/4.2.249","article-title":"Exploration, normalization, and summaries of high density oligonucleotide array probe level data","volume":"4","author":"Irizarry","year":"2003","journal-title":"Biostatistics"},{"key":"2023012713080056800_bty900-B15","doi-asserted-by":"crossref","first-page":"105","DOI":"10.1109\/LSP.2001.838216","article-title":"An algorithm for optimal partitioning of data on an interval","volume":"12","author":"Jackson","year":"2005","journal-title":"Signal Process. Lett. IEEE"},{"key":"2023012713080056800_bty900-B16","doi-asserted-by":"crossref","first-page":"118","DOI":"10.1093\/biostatistics\/kxj037","article-title":"Adjusting batch effects in microarray expression data using empirical Bayes methods","volume":"8","author":"Johnson","year":"2007","journal-title":"Biostatistics"},{"key":"2023012713080056800_bty900-B17","doi-asserted-by":"crossref","first-page":"2579","DOI":"10.1016\/j.ejca.2013.02.034","article-title":"Adrenocortical carcinoma: a population-based study on incidence and survival in the Netherlands since 1993","volume":"49","author":"Kerkhofs","year":"2013","journal-title":"Eur. J. 
Cancer"},{"key":"2023012713080056800_bty900-B18","first-page":"D868","article-title":"ArrayExpress update \u2013 simplifying data submissions","volume":"37","author":"Kolesnikov","year":"2014","journal-title":"Nucleic Acids Res."},{"key":"2023012713080056800_bty900-B19","doi-asserted-by":"crossref","first-page":"1724","DOI":"10.1371\/journal.pgen.0030161","article-title":"Capturing heterogeneity in gene expression studies by surrogate variable analysis","volume":"3","author":"Leek","year":"2007","journal-title":"PLoS Genet"},{"key":"2023012713080056800_bty900-B20","doi-asserted-by":"crossref","first-page":"733","DOI":"10.1038\/nrg2825","article-title":"Tackling the widespread and critical impact of batch effects in high-throughput data","volume":"11","author":"Leek","year":"2010","journal-title":"Nat. Rev. Genet"},{"key":"2023012713080056800_bty900-B21","doi-asserted-by":"crossref","first-page":"278","DOI":"10.1038\/tpj.2010.57","article-title":"A comparison of batch effect removal methods for enhancement of prediction performance using MAQC-II microarray gene expression data","volume":"10","author":"Luo","year":"2010","journal-title":"Pharmacogenomics J"},{"key":"2023012713080056800_bty900-B22","doi-asserted-by":"crossref","first-page":"3836","DOI":"10.1093\/bioinformatics\/btw538","article-title":"Batchqc: interactive software for evaluating sample and batch effects in genomic data","volume":"32","author":"Manimaran","year":"2016","journal-title":"Bioinformatics"},{"key":"2023012713080056800_bty900-B23","doi-asserted-by":"crossref","first-page":"e561.","DOI":"10.7717\/peerj.561","article-title":"Removing batch effects for prediction problems with frozen surrogate variable analysis","volume":"2","author":"Parker","year":"2014","journal-title":"PeerJ"},{"key":"2023012713080056800_bty900-B24","first-page":"148","article-title":"Comparison of peptide cancer signatures identified by mass spectrometry in serum of patients with head and neck, lung and colorectal 
cancers: association with tumor progression","volume":"40","author":"Pietrowska","year":"2012","journal-title":"Int. J. Oncol"},{"key":"2023012713080056800_bty900-B25","doi-asserted-by":"crossref","first-page":"e0134256.","DOI":"10.1371\/journal.pone.0134256","article-title":"Signal partitioning algorithm for highly efficient gaussian mixture modeling in mass spectrometry","volume":"10","author":"Polanski","year":"2015","journal-title":"PLoS One"},{"key":"2023012713080056800_bty900-B26","doi-asserted-by":"crossref","first-page":"16234","DOI":"10.1073\/pnas.1209508109","article-title":"Transcriptional profiling in facioscapulohumeral muscular dystrophy to identify candidate biomarkers","volume":"109","author":"Rahimov","year":"2012","journal-title":"Proc. Natl. Acad. Sci. USA"},{"key":"2023012713080056800_bty900-B27","doi-asserted-by":"crossref","first-page":"2877","DOI":"10.1093\/bioinformatics\/btt480","article-title":"A new statistic for identifying batch effects in high-throughput genomic data that uses guided principal components analysis","volume":"29","author":"Reese","year":"2013","journal-title":"Bioinformatics"},{"first-page":"448","year":"1995","author":"Resnik","key":"2023012713080056800_bty900-B28"},{"key":"2023012713080056800_bty900-B29","doi-asserted-by":"crossref","DOI":"10.1002\/9780470685983","volume-title":"Batch Effects and Noise in Microarray Experiments: Sources and Solutions. 
Wiley Series in Probability and Statistics","author":"Scherer","year":"2009"},{"key":"2023012713080056800_bty900-B30","doi-asserted-by":"crossref","DOI":"10.1201\/9781315140919","volume-title":"Density Estimation for Statistics and Data Analysis","author":"Silverman","year":"2018"},{"key":"2023012713080056800_bty900-B31","doi-asserted-by":"crossref","first-page":"1","DOI":"10.1186\/1755-8794-1-42","article-title":"The removal of multiplicative, systematic bias allows integration of breast cancer gene expression datasets\u2013improving meta-analysis and prediction of prognosis","volume":"1","author":"Sims","year":"2008","journal-title":"BMC Med. Genomics"},{"key":"2023012713080056800_bty900-B32","doi-asserted-by":"crossref","first-page":"1","DOI":"10.1186\/1755-8794-4-84","article-title":"Batch effect correction for genome-wide methylation data with Illumina Infinium platform","volume":"4","author":"Sun","year":"2011","journal-title":"BMC Med. Genomics"},{"key":"2023012713080056800_bty900-B33","doi-asserted-by":"crossref","first-page":"119","DOI":"10.1002\/cem.695","article-title":"Orthogonal projections to latent structures (O-PLS)","volume":"16","author":"Trygg","year":"2002","journal-title":"J. Chemometr"},{"key":"2023012713080056800_bty900-B34","doi-asserted-by":"crossref","first-page":"192","DOI":"10.1016\/j.nbd.2009.12.019","article-title":"Expression profiling in peripheral blood reveals signature for penetrance in DYT1 dystonia","volume":"38","author":"Walter","year":"2010","journal-title":"Neurobiol. 
Dis"},{"key":"2023012713080056800_bty900-B35","doi-asserted-by":"crossref","first-page":"671","DOI":"10.1016\/S0140-6736(05)17947-1","article-title":"Gene-expression profiles to predict distant metastasis of lymph-node-negative primary breast cancer","volume":"365","author":"Wang","year":"2005","journal-title":"The Lancet"},{"key":"2023012713080056800_bty900-B36","doi-asserted-by":"crossref","first-page":"1141","DOI":"10.1093\/bioinformatics\/btx635","article-title":"Detecting hidden batch factors through data adaptive adjustment for biological effects","volume":"34","author":"Yi","year":"2018","journal-title":"Bioinformatics"}],"container-title":["Bioinformatics"],"original-title":[],"language":"en","link":[{"URL":"https:\/\/academic.oup.com\/bioinformatics\/article-pdf\/35\/11\/1885\/48935163\/bioinformatics_35_11_1885.pdf","content-type":"application\/pdf","content-version":"vor","intended-application":"syndication"},{"URL":"https:\/\/academic.oup.com\/bioinformatics\/article-pdf\/35\/11\/1885\/48935163\/bioinformatics_35_11_1885.pdf","content-type":"unspecified","content-version":"vor","intended-application":"similarity-checking"}],"deposited":{"date-parts":[[2023,1,27]],"date-time":"2023-01-27T14:13:26Z","timestamp":1674828806000},"score":1,"resource":{"primary":{"URL":"https:\/\/academic.oup.com\/bioinformatics\/article\/35\/11\/1885\/5144170"}},"subtitle":[],"editor":[{"given":"Bonnie","family":"Berger","sequence":"additional","affiliation":[]}],"short-title":[],"issued":{"date-parts":[[2018,10,24]]},"references-count":36,"journal-issue":{"issue":"11","published-print":{"date-parts":[[2019,6,1]]}},"URL":"https:\/\/doi.org\/10.1093\/bioinformatics\/bty900","relation":{},"ISSN":["1367-4803","1367-4811"],"issn-type":[{"type":"print","value":"1367-4803"},{"type":"electronic","value":"1367-4811"}],"subject":[],"published-other":{"date-parts":[[2019,6,1]]},"published":{"date-parts":[[2018,10,24]]}}}