{"status":"ok","message-type":"work","message-version":"1.0.0","message":{"indexed":{"date-parts":[[2026,3,13]],"date-time":"2026-03-13T04:35:42Z","timestamp":1773376542256,"version":"3.50.1"},"reference-count":23,"publisher":"Oxford University Press (OUP)","issue":"2","license":[{"start":{"date-parts":[[2026,3,12]],"date-time":"2026-03-12T00:00:00Z","timestamp":1773273600000},"content-version":"vor","delay-in-days":11,"URL":"https:\/\/creativecommons.org\/licenses\/by-nc\/4.0\/"}],"content-domain":{"domain":[],"crossmark-restriction":false},"short-container-title":[],"published-print":{"date-parts":[[2026,3,1]]},"abstract":"<jats:title>Abstract<\/jats:title>\n                  <jats:p>Copy number variants (CNVs) have been shown to play a significant role in the pathogenesis of various human diseases. Although several tools have been developed for detecting CNVs based on whole-exome sequencing (WES) data, their performance remains suboptimal for small exon-level CNVs (exCNVs). This is primarily due to multiple technical variabilities, including probe capture efficiency, mappability, exon size, batch effects, experimental background noise bias, and control sample selection, all of which can lead to false negatives and false positives in exCNV detection. To address these challenges, we developed ML-ExonCNV, which innovatively integrates the XGBoost machine learning model with a multi-expert ensemble approach. The model was trained using 14 features derived from 22\u2009364 real-world, quantitative polymerase chain reaction-validated rare exCNVs. Evaluation on a test set of 492 real WES and the NA12878 gold-standard dataset demonstrated that ML-ExonCNV outperformed widely used tools such as GATK-gCNV, ExomeDepth, and CNVkit. Notably, ML-ExonCNV can detect large segmental CNVs, mosaic CNVs, and breakpoint CNVs on exon region. Furthermore, our analysis revealed recurrent exCNV-associated genes and their phenotypic correlations. Neurodevelopmental and musculoskeletal abnormalities were identified as the most frequently associated phenotypes with high-recurrence exCNVs.<\/jats:p>","DOI":"10.1093\/bib\/bbag100","type":"journal-article","created":{"date-parts":[[2026,3,5]],"date-time":"2026-03-05T12:40:56Z","timestamp":1772714456000},"source":"Crossref","is-referenced-by-count":0,"title":["ML-ExonCNV: a robust XGBoost multi-expert ensemble framework for rare exon CNV detection in whole-exome sequencing data"],"prefix":"10.1093","volume":"27","author":[{"ORCID":"https:\/\/orcid.org\/0009-0008-0385-8770","authenticated-orcid":false,"given":"Shuang-Hao","family":"Yang","sequence":"first","affiliation":[{"name":"Department of Bioinformatics, Chigene (Beijing) Translational Medical Research Center Co., Ltd. , Beijing Yizhuang Biomedical Park, 100176 ,","place":["China"]},{"name":"Department of Bioinformatics, Beijing Quanpu Medical Laboratory Co., Ltd. , E2, 3rd Floor, No. 88 Kechuang 6th Road, Beijing Yizhuang Biomedical Park, 100176 ,","place":["China"]},{"name":"Hubei Key Laboratory of Agricultural Bioinformatics, College of Informatics, Huazhong Agricultural University , Wuhan 430070 ,","place":["China"]}]},{"given":"Hua","family":"He","sequence":"additional","affiliation":[{"name":"Department of Bioinformatics, Chigene (Beijing) Translational Medical Research Center Co., Ltd. , Beijing Yizhuang Biomedical Park, 100176 ,","place":["China"]}]},{"given":"Shuyu","family":"Hou","sequence":"additional","affiliation":[{"name":"Department of Bioinformatics, Chigene (Beijing) Translational Medical Research Center Co., Ltd. , Beijing Yizhuang Biomedical Park, 100176 ,","place":["China"]}]},{"given":"Tuanfeng","family":"Yang","sequence":"additional","affiliation":[{"name":"Department of Neurology, Peking University International Hospital , No. 1 Life Park Road, Changping District, Beijing 102206 ,","place":["China"]}]},{"given":"Zehao","family":"Yin","sequence":"additional","affiliation":[{"name":"Department of Bioinformatics, Chigene (Beijing) Translational Medical Research Center Co., Ltd. , Beijing Yizhuang Biomedical Park, 100176 ,","place":["China"]}]},{"ORCID":"https:\/\/orcid.org\/0000-0002-8078-4401","authenticated-orcid":false,"given":"Hong-Yu","family":"Zhang","sequence":"additional","affiliation":[{"name":"Hubei Key Laboratory of Agricultural Bioinformatics, College of Informatics, Huazhong Agricultural University , Wuhan 430070 ,","place":["China"]}]},{"given":"Weiyue","family":"Gu","sequence":"additional","affiliation":[{"name":"Department of Bioinformatics, Chigene (Beijing) Translational Medical Research Center Co., Ltd. , Beijing Yizhuang Biomedical Park, 100176 ,","place":["China"]},{"name":"Department of Bioinformatics, Beijing Quanpu Medical Laboratory Co., Ltd. , E2, 3rd Floor, No. 88 Kechuang 6th Road, Beijing Yizhuang Biomedical Park, 100176 ,","place":["China"]}]}],"member":"286","published-online":{"date-parts":[[2026,3,12]]},"reference":[{"key":"2026031216243208100_ref1","doi-asserted-by":"publisher","first-page":"1589","DOI":"10.1038\/s41588-023-01449-0","article-title":"GATK-gCNV enables the discovery of rare copy number variants from exome sequencing data","volume":"55","author":"Babadi","year":"2023","journal-title":"Nat Genet"},{"key":"2026031216243208100_ref2","doi-asserted-by":"crossref","first-page":"10493","DOI":"10.1038\/s41598-020-64353-1","article-title":"CONY: a Bayesian procedure for detecting copy number variations from sequencing read depths","volume":"10","author":"Wei","year":"2020","journal-title":"Sci Rep"},{"key":"2026031216243208100_ref3","doi-asserted-by":"crossref","first-page":"167854","DOI":"10.1016\/j.bbadis.2025.167854","article-title":"The improvement in diagnostic yield of developmental and epileptic encephalopathy by the multi-omics sequential testing method","volume":"1871","author":"Yang","year":"2025","journal-title":"Biochim Biophys Acta Mol Basis Dis"},{"key":"2026031216243208100_ref4","doi-asserted-by":"crossref","first-page":"749","DOI":"10.1016\/j.ajhg.2010.04.006","article-title":"Consensus statement: chromosomal microarray is a first-tier clinical diagnostic test for individuals with developmental disabilities or congenital anomalies","volume":"86","author":"Miller","year":"2010","journal-title":"Am J Hum Genet"},{"key":"2026031216243208100_ref5","doi-asserted-by":"publisher","first-page":"380","DOI":"10.1093\/bib\/bbu027","article-title":"Exome sequence read depth methods for identifying copy number changes","volume":"16","author":"Kadalayil","year":"2015","journal-title":"Brief Bioinform"},{"key":"2026031216243208100_ref6","doi-asserted-by":"publisher","first-page":"2413","DOI":"10.1038\/s41436-019-0554-6","article-title":"Meta-analysis and multidisciplinary consensus statement: exome sequencing is a first-tier clinical diagnostic test for individuals with neurodevelopmental disorders","volume":"21","author":"Srivastava","year":"2019","journal-title":"Genet Med"},{"key":"2026031216243208100_ref7","doi-asserted-by":"publisher","first-page":"2747","DOI":"10.1093\/bioinformatics\/bts526","article-title":"A robust model for read count data in exome sequencing experiments and implications for copy number variant calling","volume":"28","author":"Plagnol","year":"2012","journal-title":"Bioinformatics"},{"key":"2026031216243208100_ref8","doi-asserted-by":"publisher","first-page":"e1004873","DOI":"10.1371\/journal.pcbi.1004873","article-title":"CNVkit: genome-wide copy number detection and visualization from targeted DNA sequencing","volume":"12","author":"Talevich","year":"2016","journal-title":"PLoS Comput Biol"},{"key":"2026031216243208100_ref9","doi-asserted-by":"crossref","first-page":"132","DOI":"10.1038\/s41467-023-44116-y","article-title":"ECOLE: learning to call copy number variants on whole exome sequencing data","volume":"15","author":"Mandiracioglu","year":"2024","journal-title":"Nat Commun"},{"key":"2026031216243208100_ref10","doi-asserted-by":"publisher","first-page":"1645","DOI":"10.1038\/s41431-020-0675-z","article-title":"Evaluation of CNV detection tools for NGS panel data in genetic diagnostics","volume":"28","author":"Moreno-Cabrera","year":"2020","journal-title":"Eur J Hum Genet"},{"key":"2026031216243208100_ref11","doi-asserted-by":"publisher","first-page":"1525","DOI":"10.1101\/gr.138115.112","article-title":"Copy number variation detection and genotyping from exome sequence data","volume":"22","author":"Krumm","year":"2012","journal-title":"Genome Res"},{"key":"2026031216243208100_ref12","doi-asserted-by":"publisher","first-page":"176","DOI":"10.1038\/nmeth.1810","article-title":"Detection of structural variants and indels within exome data","volume":"9","author":"Karakoc","year":"2011","journal-title":"Nat Methods"},{"key":"2026031216243208100_ref13","doi-asserted-by":"publisher","first-page":"974","DOI":"10.1101\/gr.114876.110","article-title":"CNVnator: an approach to discover, genotype, and characterize typical and atypical CNVs from family and population genome sequencing","volume":"21","author":"Abyzov","year":"2011","journal-title":"Genome Res"},{"key":"2026031216243208100_ref14","doi-asserted-by":"crossref","first-page":"6283","DOI":"10.3390\/cancers13246283","article-title":"A comparison of tools for copy-number variation detection in germline whole exome and whole genome sequencing data","volume":"13","author":"Gabrielaite","year":"2021","journal-title":"Cancers"},{"key":"2026031216243208100_ref15","doi-asserted-by":"publisher","first-page":"114","DOI":"10.1093\/nar\/gkad1140","article-title":"Exome-wide benchmark of difficult-to-sequence regions using short-read next-generation DNA sequencing","volume":"52","author":"Hijikata","year":"2024","journal-title":"Nucleic Acids Res"},{"key":"2026031216243208100_ref16","doi-asserted-by":"crossref","DOI":"10.18653\/v1\/2024.acl-long.70","article-title":"DeepSeekMoE: towards ultimate expert specialization in mixture-of-experts language models","author":"Dai","year":"2024"},{"key":"2026031216243208100_ref17","doi-asserted-by":"publisher","first-page":"e72","DOI":"10.1093\/nar\/gks001","article-title":"Summarizing and correcting the GC content bias in high-throughput sequencing","volume":"40","author":"Benjamini","year":"2012","journal-title":"Nucleic Acids Res"},{"key":"2026031216243208100_ref18","doi-asserted-by":"crossref","first-page":"1220","DOI":"10.1093\/bioinformatics\/btv710","article-title":"Manta: rapid detection of structural variants and indels for germline and cancer sequencing applications","volume":"32","author":"Chen","year":"2016","journal-title":"Bioinformatics"},{"key":"2026031216243208100_ref19","doi-asserted-by":"crossref","first-page":"518","DOI":"10.1186\/s12859-020-03859-x","article-title":"Performance of copy number variants detection based on whole-genome sequencing by DNBSEQ platforms","volume":"21","author":"Rao","year":"2020","journal-title":"BMC Bioinformatics"},{"key":"2026031216243208100_ref20","doi-asserted-by":"publisher","first-page":"867","DOI":"10.1093\/bioinformatics\/btx699","article-title":"Mosdepth: quick coverage calculation for genomes and exomes","volume":"34","author":"Pedersen","year":"2018","journal-title":"Bioinformatics"},{"key":"2026031216243208100_ref21","doi-asserted-by":"crossref","first-page":"9424","DOI":"10.1038\/s41598-020-66331-z","article-title":"Shedding light on dark genes: enhanced targeted resequencing by optimizing the combination of enrichment technology and DNA fragment length","volume":"10","author":"Iadarola","year":"2020","journal-title":"Sci Rep"},{"key":"2026031216243208100_ref22","doi-asserted-by":"publisher","first-page":"1282","DOI":"10.1038\/gim.2016.58","article-title":"Navigating highly homologous genes in a molecular diagnostic setting: a resource for clinical next-generation sequencing","volume":"18","author":"Mandelker","year":"2016","journal-title":"Genet Med"},{"key":"2026031216243208100_ref23","doi-asserted-by":"publisher","first-page":"bbae645","DOI":"10.1093\/bib\/bbae645","article-title":"Detection of germline CNVs from gene panel data: benchmarking the state of the art","volume":"26","author":"Munt\u00e9","year":"2024","journal-title":"Brief Bioinform"}],"container-title":["Briefings in Bioinformatics"],"original-title":[],"language":"en","link":[{"URL":"https:\/\/academic.oup.com\/bib\/article-pdf\/27\/2\/bbag100\/67318666\/bbag100.pdf","content-type":"application\/pdf","content-version":"vor","intended-application":"syndication"},{"URL":"https:\/\/academic.oup.com\/bib\/article-pdf\/27\/2\/bbag100\/67318666\/bbag100.pdf","content-type":"unspecified","content-version":"vor","intended-application":"similarity-checking"}],"deposited":{"date-parts":[[2026,3,12]],"date-time":"2026-03-12T20:24:40Z","timestamp":1773347080000},"score":1,"resource":{"primary":{"URL":"https:\/\/academic.oup.com\/bib\/article\/doi\/10.1093\/bib\/bbag100\/8516650"}},"subtitle":[],"short-title":[],"issued":{"date-parts":[[2026,3,1]]},"references-count":23,"journal-issue":{"issue":"2","published-print":{"date-parts":[[2026,3,1]]}},"URL":"https:\/\/doi.org\/10.1093\/bib\/bbag100","relation":{},"ISSN":["1467-5463","1477-4054"],"issn-type":[{"value":"1467-5463","type":"print"},{"value":"1477-4054","type":"electronic"}],"subject":[],"published-other":{"date-parts":[[2026,3]]},"published":{"date-parts":[[2026,3,1]]},"article-number":"bbag100"}}