{"status":"ok","message-type":"work","message-version":"1.0.0","message":{"indexed":{"date-parts":[[2025,12,19]],"date-time":"2025-12-19T09:27:57Z","timestamp":1766136477474},"reference-count":31,"publisher":"Oxford University Press (OUP)","issue":"13","content-domain":{"domain":[],"crossmark-restriction":false},"short-container-title":[],"published-print":{"date-parts":[[2014,7,1]]},"abstract":"<jats:title>Abstract<\/jats:title>\n               <jats:p>Motivation: Whole-genome and -exome sequencing on parent\u2013offspring trios is a powerful approach to identifying disease-associated genes by detecting de novo mutations in patients. Accurate detection of de novo mutations from sequencing data is a critical step in trio-based genetic studies. Existing bioinformatic approaches usually yield high error rates due to sequencing artifacts and alignment issues, which may either miss true de novo mutations or call too many false ones, making downstream validation and analysis difficult. In particular, current approaches have much worse specificity than sensitivity, and developing effective filters to discriminate genuine from spurious de novo mutations remains an unsolved challenge.<\/jats:p>\n               <jats:p>Results: In this article, we curated 59 sequence features in whole genome and exome alignment context which are considered to be relevant to discriminating true de novo mutations from artifacts, and then employed a machine-learning approach to classify candidates as true or false de novo mutations. Specifically, we built a classifier, named De Novo Mutation Filter (DNMFilter), using gradient boosting as the classification algorithm. We built the training set using experimentally validated true and false de novo mutations as well as collected false de novo mutations from an in-house large-scale exome-sequencing project. 
We evaluated DNMFilter\u2019s theoretical performance and investigated relative importance of different sequence features on the classification accuracy. Finally, we applied DNMFilter on our in-house whole exome trios and one CEU trio from the 1000 Genomes Project and found that DNMFilter could be coupled with commonly used de novo mutation detection approaches as an effective filtering approach to significantly reduce false discovery rate without sacrificing sensitivity.<\/jats:p>\n               <jats:p>Availability: The software DNMFilter implemented using a combination of Java and R is freely available from the website at http:\/\/humangenome.duke.edu\/software .<\/jats:p>\n               <jats:p>Contact: \u00a0ydwang@hit.edu.cn<\/jats:p>","DOI":"10.1093\/bioinformatics\/btu141","type":"journal-article","created":{"date-parts":[[2014,3,12]],"date-time":"2014-03-12T00:16:49Z","timestamp":1394583409000},"page":"1830-1836","source":"Crossref","is-referenced-by-count":37,"title":["A gradient-boosting approach for filtering \n            <i>de novo<\/i>\n             mutations in parent\u2013offspring trios"],"prefix":"10.1093","volume":"30","author":[{"given":"Yongzhuang","family":"Liu","sequence":"first","affiliation":[{"name":"1 School of Computer Science and Technology, Harbin Institute of Technology, Harbin 150001, China, 2 Center for Human Genome Variation, Duke University, Durham, NC 27708 and 3 Center for Human Genetics Research, Department of Molecular Physiology and Biophysics, Vanderbilt University, Nashville, TN 37235, USA"},{"name":"1 School of Computer Science and Technology, Harbin Institute of Technology, Harbin 150001, China, 2 Center for Human Genome Variation, Duke University, Durham, NC 27708 and 3 Center for Human Genetics Research, Department of Molecular Physiology and Biophysics, Vanderbilt University, Nashville, TN 37235, USA"}]},{"given":"Bingshan","family":"Li","sequence":"additional","affiliation":[{"name":"1 School of Computer Science 
and Technology, Harbin Institute of Technology, Harbin 150001, China, 2 Center for Human Genome Variation, Duke University, Durham, NC 27708 and 3 Center for Human Genetics Research, Department of Molecular Physiology and Biophysics, Vanderbilt University, Nashville, TN 37235, USA"}]},{"given":"Renjie","family":"Tan","sequence":"additional","affiliation":[{"name":"1 School of Computer Science and Technology, Harbin Institute of Technology, Harbin 150001, China, 2 Center for Human Genome Variation, Duke University, Durham, NC 27708 and 3 Center for Human Genetics Research, Department of Molecular Physiology and Biophysics, Vanderbilt University, Nashville, TN 37235, USA"},{"name":"1 School of Computer Science and Technology, Harbin Institute of Technology, Harbin 150001, China, 2 Center for Human Genome Variation, Duke University, Durham, NC 27708 and 3 Center for Human Genetics Research, Department of Molecular Physiology and Biophysics, Vanderbilt University, Nashville, TN 37235, USA"}]},{"given":"Xiaolin","family":"Zhu","sequence":"additional","affiliation":[{"name":"1 School of Computer Science and Technology, Harbin Institute of Technology, Harbin 150001, China, 2 Center for Human Genome Variation, Duke University, Durham, NC 27708 and 3 Center for Human Genetics Research, Department of Molecular Physiology and Biophysics, Vanderbilt University, Nashville, TN 37235, USA"}]},{"given":"Yadong","family":"Wang","sequence":"additional","affiliation":[{"name":"1 School of Computer Science and Technology, Harbin Institute of Technology, Harbin 150001, China, 2 Center for Human Genome Variation, Duke University, Durham, NC 27708 and 3 Center for Human Genetics Research, Department of Molecular Physiology and Biophysics, Vanderbilt University, Nashville, TN 37235, 
USA"}]}],"member":"286","published-online":{"date-parts":[[2014,3,10]]},"reference":[{"key":"2023012711153314800_btu141-B1","doi-asserted-by":"crossref","first-page":"5","DOI":"10.1023\/A:1010933404324","article-title":"Random forests","volume":"45","author":"Breiman","year":"2001","journal-title":"Mach. Learn."},{"key":"2023012711153314800_btu141-B2","doi-asserted-by":"crossref","first-page":"8","DOI":"10.1186\/1471-2105-13-8","article-title":"An integrative variant analysis suite for whole exome next-generation sequencing data","volume":"13","author":"Challis","year":"2012","journal-title":"BMC Bioinform."},{"key":"2023012711153314800_btu141-B3","doi-asserted-by":"crossref","first-page":"e145","DOI":"10.1093\/nar\/gks606","article-title":"SVM\n              2\n              : an improved paired-end-based tool for the detection of small genomic structural variations using high-throughput single-genome resequencing data","volume":"40","author":"Chiara","year":"2012","journal-title":"Nucleic Acids Res."},{"key":"2023012711153314800_btu141-B4","doi-asserted-by":"crossref","first-page":"266","DOI":"10.1214\/09-AOAS285","article-title":"Bart: bayesian additive regression trees","volume":"4","author":"Chipman","year":"2010","journal-title":"Ann. Appl. Stat."},{"key":"2023012711153314800_btu141-B31","doi-asserted-by":"crossref","first-page":"712","DOI":"10.1038\/ng.862","article-title":"Variation in genome-wide mutation rates within and between human families","volume":"43","author":"Conrad","year":"2011","journal-title":"Nature genetics"},{"key":"2023012711153314800_btu141-B5","doi-asserted-by":"crossref","first-page":"1921","DOI":"10.1056\/NEJMoa1206524","article-title":"Diagnostic exome sequencing in persons with severe intellectual disability","volume":"367","author":"de Ligt","year":"2012","journal-title":"New England J. 
Med."},{"key":"2023012711153314800_btu141-B6","doi-asserted-by":"crossref","first-page":"491","DOI":"10.1038\/ng.806","article-title":"A framework for variation discovery and genotyping using next-generation DNA sequencing data","volume":"43","author":"DePristo","year":"2011","journal-title":"Nat. Genet."},{"key":"2023012711153314800_btu141-B7","doi-asserted-by":"crossref","first-page":"167","DOI":"10.1093\/bioinformatics\/btr629","article-title":"Feature-based classifiers for somatic mutation detection in tumour-normal paired sequencing data","volume":"28","author":"Ding","year":"2012","journal-title":"Bioinformatics"},{"key":"2023012711153314800_btu141-B8","doi-asserted-by":"crossref","first-page":"217","DOI":"10.1038\/nature12439","article-title":"De novo mutations in epileptic encephalopathies","volume":"501","author":"Epi4K Consortium & Epilepsy Phenome\/Genome Project","year":"2013","journal-title":"Nature"},{"key":"2023012711153314800_btu141-B9","doi-asserted-by":"crossref","first-page":"1189","DOI":"10.1214\/aos\/1013203451","article-title":"Greedy function approximation: a gradient boosting machine","volume":"29","author":"Friedman","year":"2001","journal-title":"Ann. Stat."},{"key":"2023012711153314800_btu141-B10","doi-asserted-by":"crossref","first-page":"367","DOI":"10.1016\/S0167-9473(01)00065-2","article-title":"Stochastic gradient boosting","volume":"38","author":"Friedman","year":"2002","journal-title":"Comput. Stat. Data An."},{"key":"2023012711153314800_btu141-B11","doi-asserted-by":"crossref","first-page":"860","DOI":"10.1038\/ng.886","article-title":"Increased exonic de novo mutation rate in individuals with schizophrenia","volume":"43","author":"Girard","year":"2011","journal-title":"Nat. 
Genet."},{"key":"2023012711153314800_btu141-B12","doi-asserted-by":"crossref","DOI":"10.1007\/978-0-387-84858-7","volume-title":"The Elements of Statistical Learning","author":"Hastie","year":"2009"},{"key":"2023012711153314800_btu141-B13","doi-asserted-by":"crossref","first-page":"568","DOI":"10.1101\/gr.129684.111","article-title":"VarScan 2: somatic mutation and copy number alteration discovery in cancer by exome sequencing","volume":"22","author":"Koboldt","year":"2012","journal-title":"Genome Res."},{"key":"2023012711153314800_btu141-B14","doi-asserted-by":"crossref","first-page":"952","DOI":"10.1101\/gr.113084.110","article-title":"SNP detection and genotyping from low-coverage sequencing data on multiple diploid samples","volume":"21","author":"Le","year":"2011","journal-title":"Genome Res."},{"key":"2023012711153314800_btu141-B15","doi-asserted-by":"crossref","first-page":"1754","DOI":"10.1093\/bioinformatics\/btp324","article-title":"Fast and accurate short read alignment with Burrows\u2013Wheeler transform","volume":"25","author":"Li","year":"2009","journal-title":"Bioinformatics"},{"key":"2023012711153314800_btu141-B16","doi-asserted-by":"crossref","first-page":"e1002944","DOI":"10.1371\/journal.pgen.1002944","article-title":"A likelihood-based framework for variant calling and de novo mutation detection in families","volume":"8","author":"Li","year":"2012","journal-title":"PLoS Genet."},{"key":"2023012711153314800_btu141-B17","doi-asserted-by":"crossref","first-page":"451","DOI":"10.1186\/1471-2105-12-451","article-title":"Identification and correction of systematic error in high-throughput sequence data","volume":"12","author":"Meacham","year":"2011","journal-title":"BMC Bioinform."},{"key":"2023012711153314800_btu141-B18","doi-asserted-by":"crossref","first-page":"819","DOI":"10.1038\/nmeth.2085","article-title":"forestSV: structural variant discovery through statistical learning","volume":"9","author":"Michaelson","year":"2012","journal-title":"Nat. 
Methods"},{"key":"2023012711153314800_btu141-B19","doi-asserted-by":"crossref","first-page":"1431","DOI":"10.1016\/j.cell.2012.11.019","article-title":"Whole-genome sequencing in autism identifies hot spots for \n              de novo\n               germline mutation","volume":"151","author":"Michaelson","year":"2012","journal-title":"Cell"},{"key":"2023012711153314800_btu141-B20","doi-asserted-by":"crossref","first-page":"242","DOI":"10.1038\/nature11011","article-title":"Patterns and rates of exonic \n              de novo\n               mutations in autism spectrum disorders","volume":"485","author":"Neale","year":"2012","journal-title":"Nature"},{"key":"2023012711153314800_btu141-B21","doi-asserted-by":"crossref","first-page":"443","DOI":"10.1038\/nrg2986","article-title":"Genotype and SNP calling from next-generation sequencing data","volume":"12","author":"Nielsen","year":"2011","journal-title":"Nat. Rev. Genet."},{"key":"2023012711153314800_btu141-B22","doi-asserted-by":"crossref","first-page":"1361","DOI":"10.1093\/bioinformatics\/btt172","article-title":"A support vector machine for identification of single-nucleotide polymorphisms from next-generation sequencing data","volume":"29","author":"O\u2019Fallon","year":"2013","journal-title":"Bioinformatics"},{"key":"2023012711153314800_btu141-B23","doi-asserted-by":"crossref","first-page":"246","DOI":"10.1038\/nature10989","article-title":"Sporadic autism exomes reveal a highly interconnected protein network of de novo mutations","volume":"485","author":"O\u2019Roak","year":"2012","journal-title":"Nature"},{"key":"2023012711153314800_btu141-B24","doi-asserted-by":"crossref","first-page":"985","DOI":"10.1038\/nmeth.2611","article-title":"DeNovoGear: \n              de novo\n               indel and point mutation discovery and phasing","volume":"10","author":"Ramu","year":"2013","journal-title":"Nat. 
Methods"},{"key":"2023012711153314800_btu141-B25","doi-asserted-by":"crossref","first-page":"24","DOI":"10.1038\/nbt.1754","article-title":"Integrative genomics viewer","volume":"29","author":"Robinson","year":"2011","journal-title":"Nat. Biotechnol."},{"key":"2023012711153314800_btu141-B26","doi-asserted-by":"crossref","first-page":"1674","DOI":"10.1016\/S0140-6736(12)61480-9","article-title":"Range of genetic mutations associated with severe non-syndromic sporadic intellectual disability: an exome sequencing study","volume":"380","author":"Rauch","year":"2012","journal-title":"Lancet"},{"key":"2023012711153314800_btu141-B27","doi-asserted-by":"crossref","first-page":"237","DOI":"10.1038\/nature10945","article-title":"De novo mutations revealed by whole-exome sequencing are strongly associated with autism","volume":"485","author":"Sanders","year":"2012","journal-title":"Nature"},{"key":"2023012711153314800_btu141-B28","doi-asserted-by":"crossref","first-page":"565","DOI":"10.1038\/nrg3241","article-title":"De novo\n               mutations in human genetic disease","volume":"13","author":"Veltman","year":"2012","journal-title":"Nat. Rev. Genet."},{"key":"2023012711153314800_btu141-B29","doi-asserted-by":"crossref","first-page":"1365","DOI":"10.1038\/ng.2446","article-title":"De novo\n               gene mutations highlight patterns of genetic and neural complexity in schizophrenia","volume":"44","author":"Xu","year":"2012","journal-title":"Nat. Genet."},{"key":"2023012711153314800_btu141-B30","doi-asserted-by":"crossref","first-page":"864","DOI":"10.1038\/ng.902","article-title":"Exome sequencing supports a de novo mutational paradigm for schizophrenia","volume":"43","author":"Xu","year":"2011","journal-title":"Nat. 
Genet."}],"container-title":["Bioinformatics"],"original-title":[],"language":"en","link":[{"URL":"https:\/\/academic.oup.com\/bioinformatics\/article-pdf\/30\/13\/1830\/48924870\/bioinformatics_30_13_1830.pdf","content-type":"application\/pdf","content-version":"vor","intended-application":"syndication"},{"URL":"https:\/\/academic.oup.com\/bioinformatics\/article-pdf\/30\/13\/1830\/48924870\/bioinformatics_30_13_1830.pdf","content-type":"unspecified","content-version":"vor","intended-application":"similarity-checking"}],"deposited":{"date-parts":[[2023,1,27]],"date-time":"2023-01-27T11:52:53Z","timestamp":1674820373000},"score":1,"resource":{"primary":{"URL":"https:\/\/academic.oup.com\/bioinformatics\/article\/30\/13\/1830\/2422269"}},"subtitle":[],"short-title":[],"issued":{"date-parts":[[2014,3,10]]},"references-count":31,"journal-issue":{"issue":"13","published-print":{"date-parts":[[2014,7,1]]}},"URL":"https:\/\/doi.org\/10.1093\/bioinformatics\/btu141","relation":{},"ISSN":["1367-4811","1367-4803"],"issn-type":[{"value":"1367-4811","type":"electronic"},{"value":"1367-4803","type":"print"}],"subject":[],"published-other":{"date-parts":[[2014,7,1]]},"published":{"date-parts":[[2014,3,10]]}}}