{"status":"ok","message-type":"work","message-version":"1.0.0","message":{"indexed":{"date-parts":[[2026,3,30]],"date-time":"2026-03-30T20:37:50Z","timestamp":1774903070730,"version":"3.50.1"},"reference-count":34,"publisher":"Springer Science and Business Media LLC","issue":"1","license":[{"start":{"date-parts":[[2023,11,25]],"date-time":"2023-11-25T00:00:00Z","timestamp":1700870400000},"content-version":"tdm","delay-in-days":0,"URL":"https:\/\/creativecommons.org\/licenses\/by\/4.0"},{"start":{"date-parts":[[2023,11,25]],"date-time":"2023-11-25T00:00:00Z","timestamp":1700870400000},"content-version":"vor","delay-in-days":0,"URL":"https:\/\/creativecommons.org\/licenses\/by\/4.0"}],"content-domain":{"domain":["link.springer.com"],"crossmark-restriction":false},"short-container-title":["Brain Inf."],"published-print":{"date-parts":[[2023,12]]},"abstract":"<jats:title>Abstract<\/jats:title>\n                  <jats:p>Machine Learning (ML) is now an essential tool in the analysis of Magnetic Resonance Imaging (MRI) data, in particular in the identification of brain correlates in neurological and neurodevelopmental disorders. ML requires datasets of appropriate size for training, which in neuroimaging are typically obtained by collecting data from multiple acquisition centers. However, analyzing large multicentric datasets can introduce bias due to differences between acquisition centers. ComBat harmonization is commonly used to address batch effects, but it can lead to data leakage when the entire dataset is used to estimate model parameters. In this study, structural and functional MRI data from the Autism Brain Imaging Data Exchange (ABIDE) collection were used to distinguish subjects with Autism Spectrum Disorders (ASD) from Typically Developing controls (TD). 
We compared the classical approach (external harmonization), in which harmonization is performed before the train\/test split, with a harmonization calculated only on the train set (internal harmonization), and with the dataset with no harmonization. The results showed that harmonization using the whole dataset achieved higher discrimination performance, while non-harmonized data and harmonization using only the train set showed similar results, for both structural and connectivity features. We also showed that the higher performance of external harmonization is not due to the larger sample available for estimating the model; hence the improved performance obtained with the entire dataset may be ascribed to data leakage. To prevent this leakage, it is recommended to define the harmonization model solely using the train set.<\/jats:p>","DOI":"10.1186\/s40708-023-00210-x","type":"journal-article","created":{"date-parts":[[2023,11,25]],"date-time":"2023-11-25T09:01:39Z","timestamp":1700902899000},"update-policy":"https:\/\/doi.org\/10.1007\/springer_crossmark_policy","source":"Crossref","is-referenced-by-count":3,"title":["Effect of data harmonization of multicentric dataset in ASD\/TD classification"],"prefix":"10.1186","volume":"10","author":[{"given":"Giacomo","family":"Serra","sequence":"first","affiliation":[]},{"given":"Francesca","family":"Mainas","sequence":"additional","affiliation":[]},{"given":"Bruno","family":"Golosio","sequence":"additional","affiliation":[]},{"given":"Alessandra","family":"Retico","sequence":"additional","affiliation":[]},{"given":"Piernicola","family":"Oliva","sequence":"additional","affiliation":[]}],"member":"297","published-online":{"date-parts":[[2023,11,25]]},"reference":[{"issue":"8","key":"210_CR1","doi-asserted-by":"publisher","first-page":"1228","DOI":"10.1176\/ajp.152.8.122","volume":"152","author":"Samuel B Guze","year":"1995","unstructured":"Guze Samuel B (1995) Diagnostic and statistical manual of mental 
disorders, 4th ed. (DSM-IV). Am J Psychiatry 152(8):1228\u20131228. https:\/\/doi.org\/10.1176\/ajp.152.8.122","journal-title":"Am J Psychiatry"},{"key":"210_CR2","unstructured":"World Health Organization: The ICD-10 classification of mental and behavioural disorders : diagnostic criteria for research. World Health Organization (1993)"},{"key":"210_CR3","doi-asserted-by":"publisher","DOI":"10.1038\/s41467-019-13005-8","author":"M Postema","year":"2019","unstructured":"Postema M, Van Rooij D, Anagnostou E, Arango C, Auzias G, Behrmann M, Busatto G, Calderoni S, Calvo R, Daly E, Deruelle C, Di Martino A, Dinstein I, Duran F, Durston S, Ecker C, Ehrlich S, Fair D, Fedor J, Francks C (2019) Altered structural brain asymmetry in autism spectrum disorder in a study of 54 datasets. Nat Commun. https:\/\/doi.org\/10.1038\/s41467-019-13005-8","journal-title":"Nat Commun"},{"key":"210_CR4","doi-asserted-by":"publisher","DOI":"10.1007\/s11682-016-9534-5","author":"K Riddle","year":"2017","unstructured":"Riddle K, Cascio C, Woodward N (2017) Brain structure in autism: a voxel-based morphometry analysis of the autism brain imaging database exchange (abide). Brain Imaging Behav. https:\/\/doi.org\/10.1007\/s11682-016-9534-5","journal-title":"Brain Imaging Behav"},{"issue":"3","key":"210_CR5","doi-asserted-by":"publisher","first-page":"738","DOI":"10.1016\/j.celrep.2013.10.001","volume":"5","author":"K Supekar","year":"2013","unstructured":"Supekar K, Uddin LQ, Khouzam A, Phillips J, Gaillard WD, Kenworthy LE, Yerys BE, Vaidya CJ, Menon V (2013) Brain hyperconnectivity in children with autism and its links to social deficits. Cell Rep 5(3):738\u2013747. 
https:\/\/doi.org\/10.1016\/j.celrep.2013.10.001","journal-title":"Cell Rep"},{"key":"210_CR6","doi-asserted-by":"publisher","DOI":"10.3389\/fpsyt.2019.00620","author":"G Spera","year":"2019","unstructured":"Spera G, Retico A, Bosco P, Ferrari E, Palumbo L, Oliva P, Muratori F, Calderoni S (2019) Evaluation of altered functional connections in male children with autism spectrum disorders on multiple-site data optimized with machine learning. Front Psychiatry. https:\/\/doi.org\/10.3389\/fpsyt.2019.00620","journal-title":"Front Psychiatry"},{"issue":"5","key":"210_CR7","doi-asserted-by":"publisher","first-page":"1842","DOI":"10.1002\/hbm.23140","volume":"37","author":"H Jamalabadi","year":"2016","unstructured":"Jamalabadi H, Alizadeh S, Sch\u00f6nauer M, Leibold C, Gais S (2016) Classification based hypothesis testing in neuroscience: below-chance level classification rates and overlooked statistical properties of linear parametric classifiers. Human Brain Map 37(5):1842\u20131855. https:\/\/doi.org\/10.1002\/hbm.23140","journal-title":"Human Brain Map"},{"issue":"9","key":"210_CR8","doi-asserted-by":"publisher","first-page":"700","DOI":"10.1089\/brain.2016.0429","volume":"6","author":"C Zhang","year":"2016","unstructured":"Zhang C, Cahill ND, Arbabshirani MR, White T, Baum SA, Michael AM (2016) Sex and age effects of functional connectivity in early adulthood. Brain Connect 6(9):700\u2013713. https:\/\/doi.org\/10.1089\/brain.2016.0429","journal-title":"Brain Connect"},{"issue":"2","key":"210_CR9","doi-asserted-by":"publisher","first-page":"169","DOI":"10.1001\/archneur.55.2.169","volume":"55","author":"CE Coffey","year":"1998","unstructured":"Coffey CE, Lucke JF, Saxton JA, Ratcliff G, Unitas LJ, Billig B, Bryan RN (1998) Sex differences in brain aging: a quantitative magnetic resonance imaging study. Arch Neurol 55(2):169\u2013179. 
https:\/\/doi.org\/10.1001\/archneur.55.2.169","journal-title":"Arch Neurol"},{"key":"210_CR10","doi-asserted-by":"publisher","first-page":"9137","DOI":"10.1038\/s41598-020-66100-y","volume":"10","author":"V Costumero-Ramos","year":"2020","unstructured":"Costumero-Ramos V, Bueichek\u00fa E, Adri\u00e1n-Ventura J, Avila C (2020) Opening or closing eyes at rest modulates the functional connectivity of v1 with default and salience networks. Sci Rep 10:9137. https:\/\/doi.org\/10.1038\/s41598-020-66100-y","journal-title":"Sci Rep"},{"key":"210_CR11","doi-asserted-by":"publisher","DOI":"10.1038\/mp.2013.78","author":"A Di Martino","year":"2013","unstructured":"Di Martino A, Yan C-G, Li Q, Denio E, Castellanos F, Alaerts K, Anderson J, Assaf M, Bookheimer S, Dapretto M, Deen B, Delmonte S, Dinstein I, Birgit E-W, Fair D, Gallagher L, Kennedy D, Keown C, Keysers C, Milham M (2013) The autism brain imaging data exchange: towards large-scale evaluation of the intrinsic brain architecture in autism. Mol Psychiatry. https:\/\/doi.org\/10.1038\/mp.2013.78","journal-title":"Mol Psychiatry"},{"key":"210_CR12","doi-asserted-by":"publisher","DOI":"10.1038\/sdata.2017.10","volume":"4","author":"A Di Martino","year":"2017","unstructured":"...Di Martino A, O\u2019connor D, Chen B, Alaerts K, Anderson JS, Assaf M, Balsters JH, Baxter L, Beggiato A, Bernaerts S, Blanken LME, Bookheimer SY, Braden BB, Byrge L, Castellanos FX, Dapretto M, Delorme R, Fair DA, Fishman I, Fitzgerald J, Gallagher L, Keehn RJJ, Kennedy DP, Lainhart JE, Luna B, Mostofsky SH, M\u00fcller R-A, Nebel MB, Nigg JT, O\u2019hearn, K., Solomon, M., Toro, R., Vaidya, C.J., Wenderoth, N., White, T., Craddock, R.C., Lord, C., Leventhal, B.L., Milham, M. (2017) Enhancing studies of the connectome in autism using the autism brain imaging data exchange II. Sci Data 4:170010. 
https:\/\/doi.org\/10.1038\/sdata.2017.10","journal-title":"Sci Data"},{"key":"210_CR13","doi-asserted-by":"publisher","DOI":"10.1016\/j.neuroimage.2023.120125","volume":"274","author":"F Hu","year":"2023","unstructured":"Hu F, Chen AA, Horng H, Bashyam V, Davatzikos C, Alexander-Bloch A, Li M, Shou H, Satterthwaite TD, Yu M, Shinohara RT (2023) Image harmonization: a review of statistical and deep learning methods for removing batch effects and evaluation metrics for effective harmonization. NeuroImage 274:120125. https:\/\/doi.org\/10.1016\/j.neuroimage.2023.120125","journal-title":"NeuroImage"},{"key":"210_CR14","doi-asserted-by":"publisher","first-page":"149","DOI":"10.1016\/j.neuroimage.2017.08.047","volume":"161","author":"J-P Fortin","year":"2017","unstructured":"Fortin J-P, Parker D, Tun\u00e7 B, Watanabe T, Elliott MA, Ruparel K, Roalf DR, Satterthwaite TD, Gur RC, Gur RE, Schultz RT, Verma R, Shinohara RT (2017) Harmonization of multi-site diffusion tensor imaging data. NeuroImage 161:149\u2013170. https:\/\/doi.org\/10.1016\/j.neuroimage.2017.08.047","journal-title":"NeuroImage"},{"key":"210_CR15","doi-asserted-by":"publisher","first-page":"118","DOI":"10.1093\/biostatistics\/kxj037","volume":"8","author":"W Johnson","year":"2007","unstructured":"Johnson W, Li C, Rabinovic A (2007) Adjusting batch effects in microarray expression data using empirical bayes methods. Biostatistics 8:118\u201327. 
https:\/\/doi.org\/10.1093\/biostatistics\/kxj037","journal-title":"Biostatistics"},{"key":"210_CR16","doi-asserted-by":"publisher","DOI":"10.1016\/j.neuroimage.2019.116450","volume":"208","author":"R Pomponio","year":"2020","unstructured":"...Pomponio R, Erus G, Habes M, Doshi J, Srinivasan D, Mamourian E, Bashyam V, Nasrallah IM, Satterthwaite TD, Fan Y, Launer LJ, Masters CL, Maruff P, Zhuo C, V\u00f6lzke H, Johnson SC, Fripp J, Koutsouleris N, Wolf DH, Gur R, Gur R, Morris J, Albert MS, Grabe HJ, Resnick SM, Bryan RN, Wolk DA, Shinohara RT, Shou H, Davatzikos C (2020) Harmonization of large mri datasets for the analysis of brain imaging patterns throughout the lifespan. NeuroImage 208:116450. https:\/\/doi.org\/10.1016\/j.neuroimage.2019.116450","journal-title":"NeuroImage"},{"issue":"11","key":"210_CR17","doi-asserted-by":"publisher","first-page":"4213","DOI":"10.1002\/hbm.24241","volume":"39","author":"M Yu","year":"2018","unstructured":"Yu M, Linn KA, Cook PA, Phillips ML, McInnis M, Fava M, Trivedi MH, Weissman MM, Shinohara RT, Sheline YI (2018) Statistical harmonization corrects site effects in functional connectivity measurements from multi-site fmri data. Human Brain Map 39(11):4213\u20134227. https:\/\/doi.org\/10.1002\/hbm.24241","journal-title":"Human Brain Map"},{"key":"210_CR18","doi-asserted-by":"publisher","DOI":"10.1016\/j.neuroimage.2022.119198","volume":"256","author":"AA Chen","year":"2022","unstructured":"Chen AA, Srinivasan D, Pomponio R, Fan Y, Nasrallah IM, Resnick SM, Beason-Held LL, Davatzikos C, Satterthwaite TD, Bassett DS, Shinohara RT, Shou H (2022) Harmonizing functional connectivity reduces scanner effects in community detection. NeuroImage 256:119198. 
https:\/\/doi.org\/10.1016\/j.neuroimage.2022.119198","journal-title":"NeuroImage"},{"issue":"12","key":"210_CR19","doi-asserted-by":"publisher","first-page":"3628","DOI":"10.1109\/TBME.2021.3080259","volume":"68","author":"M Ingalhalikar","year":"2021","unstructured":"Ingalhalikar M, Shinde S, Karmarkar A, Rajan A, Rangaprakash D, Deshpande G (2021) Functional connectivity-based prediction of autism on site harmonized abide dataset. IEEE Trans Biomed Eng 68(12):3628\u20133637. https:\/\/doi.org\/10.1109\/TBME.2021.3080259","journal-title":"IEEE Trans Biomed Eng"},{"key":"210_CR20","doi-asserted-by":"publisher","DOI":"10.3389\/fncom.2021.762781","author":"AM Reardon","year":"2021","unstructured":"Reardon AM, Li K, Hu XP (2021) Improving between-group effect size for multi-site functional connectivity data via site-wise de-meaning. Front Computation Neurosci. https:\/\/doi.org\/10.3389\/fncom.2021.762781","journal-title":"Front Computation Neurosci"},{"key":"210_CR21","doi-asserted-by":"publisher","DOI":"10.1093\/biostatistics\/kxab039","author":"T Li","year":"2021","unstructured":"Li T, Zhang Y, Patil P, Johnson WE (2021) Overcoming the impacts of two-step batch effect correction on gene expression estimation and inference. Biostatistics. https:\/\/doi.org\/10.1093\/biostatistics\/kxab039","journal-title":"Biostatistics"},{"issue":"2","key":"210_CR22","doi-asserted-by":"publisher","first-page":"774","DOI":"10.1016\/j.neuroimage.2012.01.021","volume":"62","author":"B Fischl","year":"2012","unstructured":"Fischl B (2012) Freesurfer. NeuroImage 62(2):774\u2013781. https:\/\/doi.org\/10.1016\/j.neuroimage.2012.01.021. 
(20 YEARS OF fMRI)","journal-title":"NeuroImage"},{"key":"210_CR23","doi-asserted-by":"publisher","DOI":"10.1016\/j.nicl.2022.103082","volume":"35","author":"S Saponaro","year":"2022","unstructured":"Saponaro S, Giuliano A, Bellotti R, Lombardi A, Tangaro S, Oliva P, Calderoni S, Retico A (2022) Multi-site harmonization of mri data uncovers machine-learning discrimination capability in barely separable populations: an example from the abide dataset. NeuroImage Clin 35:103082. https:\/\/doi.org\/10.1016\/j.nicl.2022.103082","journal-title":"NeuroImage Clin"},{"key":"210_CR24","doi-asserted-by":"publisher","DOI":"10.3389\/fnins.2012.00171","author":"A Klein","year":"2012","unstructured":"Klein A, Tourville J (2012) 101 Labeled brain images and a consistent human cortical labeling protocol. Front Neurosci. https:\/\/doi.org\/10.3389\/fnins.2012.00171","journal-title":"Front Neurosci"},{"issue":"2","key":"210_CR25","doi-asserted-by":"publisher","first-page":"782","DOI":"10.1016\/j.neuroimage.2011.09.015","volume":"62","author":"M Jenkinson","year":"2012","unstructured":"Jenkinson M, Beckmann CF, Behrens TEJ, Woolrich MW, Smith SM (2012) Fsl. NeuroImage 62(2):782\u2013790. https:\/\/doi.org\/10.1016\/j.neuroimage.2011.09.015. (20 YEARS OF fMRI)","journal-title":"NeuroImage"},{"issue":"11","key":"210_CR26","doi-asserted-by":"publisher","first-page":"5740","DOI":"10.1002\/hbm.23764","volume":"38","author":"H Chen","year":"2017","unstructured":"Chen H, Nomi JS, Uddin LQ, Duan X, Chen H (2017) Intrinsic functional connectivity variance and state-specific under-connectivity in autism. Human Brain Map 38(11):5740\u20135755. https:\/\/doi.org\/10.1002\/hbm.23764","journal-title":"Human Brain Map"},{"key":"210_CR27","doi-asserted-by":"publisher","DOI":"10.1007\/978-1-4757-2440-0","volume-title":"The Nature of Statistical Learning Theory","author":"VN Vapnik","year":"1995","unstructured":"Vapnik VN (1995) The Nature of Statistical Learning Theory. Springer, New York. 
https:\/\/doi.org\/10.1007\/978-1-4757-2440-0"},{"key":"210_CR28","doi-asserted-by":"publisher","DOI":"10.3389\/fpsyt.2016.00177","author":"P Kassraian-Fard","year":"2016","unstructured":"Kassraian-Fard P, Matthis C, Balsters JH, Maathuis MH, Wenderoth N (2016) Promises, pitfalls, and basic guidelines for applying machine learning classifiers to psychiatric imaging data, with autism as an example. Front Psychiatry. https:\/\/doi.org\/10.3389\/fpsyt.2016.00177","journal-title":"Front Psychiatry"},{"key":"210_CR29","first-page":"2825","volume":"12","author":"F Pedregosa","year":"2011","unstructured":"Pedregosa F, Varoquaux G, Gramfort A, Michel V, Thirion B, Grisel O, Blondel M, Prettenhofer P, Weiss R, Dubourg V, Vanderplas J, Passos A, Cournapeau D, Brucher M, Perrot M, Duchesnay E (2011) Scikit-learn: machine learning in python. J Mach Learn Res 12:2825\u20132830","journal-title":"J Mach Learn Res"},{"issue":"6","key":"210_CR30","doi-asserted-by":"publisher","first-page":"413","DOI":"10.1016\/j.jacr.2006.02.021","volume":"3","author":"CE Metz","year":"2006","unstructured":"Metz CE (2006) Receiver operating characteristic analysis: a tool for the quantitative evaluation of observer performance and imaging systems. J Am College Radiol 3(6):413\u2013422. https:\/\/doi.org\/10.1016\/j.jacr.2006.02.021. (Special Issue: Image Perception)","journal-title":"J Am College Radiol"},{"key":"210_CR31","doi-asserted-by":"publisher","first-page":"29","DOI":"10.1148\/radiology.143.1.7063747","volume":"143","author":"JA Hanley","year":"1982","unstructured":"Hanley JA, Mcneil B (1982) The meaning and use of the area under a receiver operating characteristic (roc) curve. Radiology 143:29\u201336. https:\/\/doi.org\/10.1148\/radiology.143.1.7063747","journal-title":"Radiology"},{"key":"210_CR32","volume-title":"Principal component analysis (PCA)","author":"T Kurita","year":"2019","unstructured":"Kurita T (2019) Principal component analysis (PCA). 
Springer, Cham"},{"key":"210_CR33","unstructured":"Eli5\u2019s documentation: Permutation importance doi: https:\/\/eli5.readthedocs.io\/en\/latest\/blackbox\/permutation_importance.html . Accessed 15 April 2023"},{"key":"210_CR34","unstructured":"Eli5\u2019s documentation doi: https:\/\/eli5.readthedocs.io\/en\/latest\/index.html . Accessed 15 April 2023"}],"container-title":["Brain Informatics"],"original-title":[],"language":"en","link":[{"URL":"https:\/\/link.springer.com\/content\/pdf\/10.1186\/s40708-023-00210-x.pdf","content-type":"application\/pdf","content-version":"vor","intended-application":"text-mining"},{"URL":"https:\/\/link.springer.com\/article\/10.1186\/s40708-023-00210-x\/fulltext.html","content-type":"text\/html","content-version":"vor","intended-application":"text-mining"},{"URL":"https:\/\/link.springer.com\/content\/pdf\/10.1186\/s40708-023-00210-x.pdf","content-type":"application\/pdf","content-version":"vor","intended-application":"similarity-checking"}],"deposited":{"date-parts":[[2023,11,25]],"date-time":"2023-11-25T09:04:00Z","timestamp":1700903040000},"score":1,"resource":{"primary":{"URL":"https:\/\/braininformatics.springeropen.com\/articles\/10.1186\/s40708-023-00210-x"}},"subtitle":[],"short-title":[],"issued":{"date-parts":[[2023,11,25]]},"references-count":34,"journal-issue":{"issue":"1","published-print":{"date-parts":[[2023,12]]}},"alternative-id":["210"],"URL":"https:\/\/doi.org\/10.1186\/s40708-023-00210-x","relation":{"has-preprint":[{"id-type":"doi","id":"10.21203\/rs.3.rs-3143234\/v1","asserted-by":"object"}]},"ISSN":["2198-4018","2198-4026"],"issn-type":[{"value":"2198-4018","type":"print"},{"value":"2198-4026","type":"electronic"}],"subject":[],"published":{"date-parts":[[2023,11,25]]},"assertion":[{"value":"5 July 2023","order":1,"name":"received","label":"Received","group":{"name":"ArticleHistory","label":"Article History"}},{"value":"16 October 
2023","order":2,"name":"accepted","label":"Accepted","group":{"name":"ArticleHistory","label":"Article History"}},{"value":"25 November 2023","order":3,"name":"first_online","label":"First Online","group":{"name":"ArticleHistory","label":"Article History"}},{"order":1,"name":"Ethics","group":{"name":"EthicsHeading","label":"Declarations"}},{"value":"We used a public and easily accessible database, ABIDE. The ABIDE initiative declared: \u201cThis database was created through the aggregation of datasets independently collected across more than 24 international brain imaging laboratories and are being made available to investigators throughout the world, consistent with open science principles, such as those at the core of the International Neuroimaging Data-sharing Initiative. In accordance with HIPAA guidelines and 1000 Functional Connectomes Project \/ INDI protocols, all datasets have been anonymized, with no protected health information included.\u201d","order":2,"name":"Ethics","group":{"name":"EthicsHeading","label":"Ethics approval and consent to participate"}},{"value":"Not applicable.","order":3,"name":"Ethics","group":{"name":"EthicsHeading","label":"Consent for publication"}},{"value":"The authors declare that they have no competing interests.","order":4,"name":"Ethics","group":{"name":"EthicsHeading","label":"Competing interests"}}],"article-number":"32"}}