{"status":"ok","message-type":"work","message-version":"1.0.0","message":{"indexed":{"date-parts":[[2025,11,22]],"date-time":"2025-11-22T11:09:29Z","timestamp":1763809769105,"version":"3.37.3"},"reference-count":29,"publisher":"Oxford University Press (OUP)","issue":"10","license":[{"start":{"date-parts":[[2018,1,11]],"date-time":"2018-01-11T00:00:00Z","timestamp":1515628800000},"content-version":"vor","delay-in-days":0,"URL":"https:\/\/academic.oup.com\/journals\/pages\/about_us\/legal\/notices"}],"content-domain":{"domain":[],"crossmark-restriction":false},"short-container-title":[],"published-print":{"date-parts":[[2018,5,15]]},"abstract":"<jats:title>Abstract<\/jats:title>\n               <jats:sec>\n                  <jats:title>Motivation<\/jats:title>\n                  <jats:p>Detailed knowledge of coding sequences has led to different candidate models for pathogenic variant prioritization. Several deleteriousness scores have been proposed for the non-coding part of the genome, but no large-scale comparison has been realized to date to assess their performance.<\/jats:p>\n               <\/jats:sec>\n               <jats:sec>\n                  <jats:title>Results<\/jats:title>\n                  <jats:p>We compared the leading scoring tools (CADD, FATHMM-MKL, Funseq2 and GWAVA) and some recent competitors (DANN, SNP and SOM scores) for their ability to discriminate assumed pathogenic variants from assumed benign variants (using the ClinVar, COSMIC and 1000 genomes project databases). Using the ClinVar benchmark, CADD was the best tool for detecting the pathogenic variants that are mainly located in protein coding gene regions. Using the COSMIC benchmark, FATHMM-MKL, GWAVA and SOMliver outperformed the other tools for pathogenic variants that are typically located in lincRNAs, pseudogenes and other parts of the non-coding genome. However, all tools had low precision, which could potentially be improved by future non-coding genome feature discoveries. These results may have been influenced by the presence of potential benign variants in the COSMIC database. The development of a gold standard as consistent as ClinVar for these regions will be necessary to confirm our tool ranking.<\/jats:p>\n               <\/jats:sec>\n               <jats:sec>\n                  <jats:title>Availability and implementation<\/jats:title>\n                  <jats:p>The Snakemake, C++\u2009and R codes are freely available from https:\/\/github.com\/Oncostat\/BenchmarkNCVTools and supported on Linux.<\/jats:p>\n               <\/jats:sec>\n               <jats:sec>\n                  <jats:title>Supplementary information<\/jats:title>\n                  <jats:p>Supplementary data are available at Bioinformatics online.<\/jats:p>\n               <\/jats:sec>","DOI":"10.1093\/bioinformatics\/bty008","type":"journal-article","created":{"date-parts":[[2018,1,9]],"date-time":"2018-01-09T21:39:22Z","timestamp":1515533962000},"page":"1635-1641","source":"Crossref","is-referenced-by-count":21,"title":["A benchmark study of scoring methods for non-coding mutations"],"prefix":"10.1093","volume":"34","author":[{"ORCID":"https:\/\/orcid.org\/0000-0002-9997-9727","authenticated-orcid":false,"given":"Damien","family":"Drubay","sequence":"first","affiliation":[{"name":"INSERM U1018, CESP, Fac. de M\u00e9decine\u2013Univ. Paris-Sud\u2013UVSQ, INSERM, Universit\u00e9 Paris-Saclay, Villejuif cedex, France"},{"name":"Gustave Roussy, Service de Biostatistique et d\u2019Epid\u00e9miologie, Villejuif, France"}]},{"given":"Daniel","family":"Gautheret","sequence":"additional","affiliation":[{"name":"Institute for Integrative Biology of the Cell, Universit\u00e9 Paris-Sud, CNRS, CEA, Gif-sur-Yvette, France"}]},{"ORCID":"https:\/\/orcid.org\/0000-0002-6963-2968","authenticated-orcid":false,"given":"Stefan","family":"Michiels","sequence":"additional","affiliation":[{"name":"INSERM U1018, CESP, Fac. de M\u00e9decine\u2013Univ. Paris-Sud\u2013UVSQ, INSERM, Universit\u00e9 Paris-Saclay, Villejuif cedex, France"},{"name":"Gustave Roussy, Service de Biostatistique et d\u2019Epid\u00e9miologie, Villejuif, France"}]}],"member":"286","published-online":{"date-parts":[[2018,1,11]]},"reference":[{"key":"2023012713414932200_bty008-B1","doi-asserted-by":"crossref","first-page":"415","DOI":"10.1038\/nature12477","article-title":"Signatures of mutational processes in human cancer","volume":"500","author":"Alexandrov","year":"2013","journal-title":"Nature"},{"key":"2023012713414932200_bty008-B2","doi-asserted-by":"crossref","first-page":"1601","DOI":"10.1534\/genetics.115.177220","article-title":"The nature of genetic variation for complex traits revealed by GWAS and regional heritability mapping analyses","volume":"201","author":"Caballero","year":"2015","journal-title":"Genetics"},{"first-page":"233","year":"2006","author":"Davis","key":"2023012713414932200_bty008-B3"},{"key":"2023012713414932200_bty008-B4","doi-asserted-by":"crossref","first-page":"1589","DOI":"10.1101\/gr.134635.111","article-title":"MuSiC: identifying mutational significance in cancer genomes","volume":"22","author":"Dees","year":"2012","journal-title":"Genome Res"},{"key":"2023012713414932200_bty008-B5","doi-asserted-by":"crossref","first-page":"131","DOI":"10.1097\/CMR.0000000000000048","article-title":"Melanomas of unknown primary frequently harbor TERT-promoter mutations","volume":"24","author":"Egberts","year":"2014","journal-title":"Melanoma Res"},{"volume-title":"Current Protocols in Human Genetics","year":"2008","author":"Forbes","key":"2023012713414932200_bty008-B6"},{"key":"2023012713414932200_bty008-B7","doi-asserted-by":"crossref","first-page":"480.","DOI":"10.1186\/s13059-014-0480-5","article-title":"FunSeq2: a framework for prioritizing noncoding regulatory variants in cancer","volume":"15","author":"Fu","year":"2014","journal-title":"Genome Biol"},{"key":"2023012713414932200_bty008-B8","doi-asserted-by":"crossref","first-page":"13373.","DOI":"10.1038\/srep13373","article-title":"Smoking gun or circumstantial evidence? Comparison of statistical learning methods using functional annotations for prioritizing risk variants","volume":"5","author":"Gagliano","year":"2015","journal-title":"Sci. Rep"},{"key":"2023012713414932200_bty008-B9","doi-asserted-by":"crossref","first-page":"S4.1","DOI":"10.1186\/gb-2006-7-s1-s4","article-title":"GENCODE: producing a reference annotation for ENCODE","volume":"7","author":"Harrow","year":"2006","journal-title":"Genome Biol"},{"key":"2023012713414932200_bty008-B10","doi-asserted-by":"crossref","first-page":"93","DOI":"10.1038\/nrg.2015.17","article-title":"Role of non-coding sequence variants in cancer","volume":"17","author":"Khurana","year":"2016","journal-title":"Nat. Rev. Genetics"},{"key":"2023012713414932200_bty008-B11","doi-asserted-by":"crossref","first-page":"310","DOI":"10.1038\/ng.2892","article-title":"A general framework for estimating the relative pathogenicity of human genetic variants","volume":"46","author":"Kircher","year":"2014","journal-title":"Nat. Genetics"},{"key":"2023012713414932200_bty008-B12","doi-asserted-by":"crossref","first-page":"D980","DOI":"10.1093\/nar\/gkt1113","article-title":"ClinVar: public archive of relationships among sequence variation and human phenotype","volume":"42","author":"Landrum","year":"2014","journal-title":"Nucleic Acids Res"},{"key":"2023012713414932200_bty008-B13","doi-asserted-by":"crossref","first-page":"e1004583","DOI":"10.1371\/journal.pcbi.1004583","article-title":"A dual model for prioritizing cancer mutations in the non-coding genome based on germline and somatic events","volume":"11","author":"Li","year":"2015","journal-title":"PLoS Comput. Biol"},{"key":"2023012713414932200_bty008-B14","doi-asserted-by":"crossref","first-page":"307","DOI":"10.1016\/j.canlet.2015.09.015","article-title":"Mining the coding and non-coding genome for cancer drivers","volume":"369","author":"Li","year":"2015","journal-title":"Cancer Lett"},{"key":"2023012713414932200_bty008-B15","doi-asserted-by":"crossref","first-page":"R143","DOI":"10.1530\/ERC-15-0533","article-title":"TERT promoter mutations in thyroid cancer","volume":"23","author":"Liu","year":"2016","journal-title":"Endocrine-Related Cancer"},{"key":"2023012713414932200_bty008-B16","doi-asserted-by":"crossref","first-page":"134","DOI":"10.1136\/jmedgenet-2016-104369","article-title":"The performance of deleteriousness prediction scores for rare non-protein-changing single nucleotide variants in human genes","volume":"54","author":"Liu","year":"2017","journal-title":"J. Med. Genetics"},{"key":"2023012713414932200_bty008-B17","doi-asserted-by":"crossref","first-page":"34","DOI":"10.1016\/j.tig.2016.10.008","article-title":"Mining the unknown: assigning function to noncoding single nucleotide polymorphisms","volume":"33","author":"Nishizaki","year":"2017","journal-title":"Trends Genetics: TIG"},{"key":"2023012713414932200_bty008-B18","first-page":"366","article-title":"So much \u2019junk\u2019 DNA in our genome","volume":"23","author":"Ohno","year":"1972","journal-title":"Brookhaven Symposia Biol"},{"key":"2023012713414932200_bty008-B19","doi-asserted-by":"crossref","first-page":"68.","DOI":"10.3389\/fmed.2015.00068","article-title":"Pseudogenes in human cancer","volume":"2","author":"Poliseno","year":"2015","journal-title":"Front. Med"},{"key":"2023012713414932200_bty008-B20","doi-asserted-by":"crossref","first-page":"761","DOI":"10.1093\/bioinformatics\/btu703","article-title":"DANN: a deep learning approach for annotating the pathogenicity of genetic variants","volume":"31","author":"Quang","year":"2015","journal-title":"Bioinformatics (Oxford, England)"},{"key":"2023012713414932200_bty008-B21","doi-asserted-by":"crossref","first-page":"294","DOI":"10.1038\/nmeth.2832","article-title":"Functional annotation of noncoding sequence variants","volume":"11","author":"Ritchie","year":"2014","journal-title":"Nat. Methods"},{"key":"2023012713414932200_bty008-B22","doi-asserted-by":"crossref","first-page":"1536","DOI":"10.1093\/bioinformatics\/btv009","article-title":"An integrative approach to predicting the functional effects of non-coding and coding sequence variation","volume":"31","author":"Shihab","year":"2015","journal-title":"Bioinformatics (Oxford, England)"},{"key":"2023012713414932200_bty008-B23","doi-asserted-by":"crossref","first-page":"1034","DOI":"10.1101\/gr.3715005","article-title":"Evolutionarily conserved elements in vertebrate, insect, worm, and yeast genomes","volume":"15","author":"Siepel","year":"2005","journal-title":"Genome Res"},{"key":"2023012713414932200_bty008-B24","doi-asserted-by":"crossref","first-page":"13.","DOI":"10.1186\/gm13","article-title":"The human gene mutation database: 2008 update","volume":"1","author":"Stenson","year":"2009","journal-title":"Genome Med"},{"key":"2023012713414932200_bty008-B25","doi-asserted-by":"crossref","first-page":"1","DOI":"10.1007\/s00439-013-1358-4","article-title":"The human gene mutation database: building a comprehensive mutation repository for clinical and molecular genetics, diagnostic testing and personalized genomic medicine","volume":"133","author":"Stenson","year":"2014","journal-title":"Human Genetics"},{"key":"2023012713414932200_bty008-B26","doi-asserted-by":"crossref","first-page":"68","DOI":"10.1038\/nature15393","article-title":"A global reference for human genetic variation","volume":"526","author":"The 1000 Genomes Project Consortium","year":"2015","journal-title":"Nature"},{"key":"2023012713414932200_bty008-B27","doi-asserted-by":"crossref","first-page":"1095","DOI":"10.1038\/nbt.2422","article-title":"Interpreting noncoding genetic variation in complex traits and human disease","volume":"30","author":"Ward","year":"2012","journal-title":"Nat. Biotechnol"},{"key":"2023012713414932200_bty008-B28","doi-asserted-by":"crossref","first-page":"99.","DOI":"10.1186\/1471-2105-12-99","article-title":"SNP-based pathway enrichment analysis for genome-wide association studies","volume":"12","author":"Weng","year":"2011","journal-title":"BMC Bioinformatics"},{"key":"2023012713414932200_bty008-B29","doi-asserted-by":"crossref","first-page":"145.","DOI":"10.3389\/fgene.2015.00145","article-title":"Long noncoding RNAs: a potential novel class of cancer biomarkers","volume":"6","author":"Yarmishyn","year":"2015","journal-title":"Front. Genetics"}],"container-title":["Bioinformatics"],"original-title":[],"language":"en","link":[{"URL":"https:\/\/academic.oup.com\/bioinformatics\/article-pdf\/34\/10\/1635\/48935600\/bioinformatics_34_10_1635.pdf","content-type":"application\/pdf","content-version":"vor","intended-application":"syndication"},{"URL":"https:\/\/academic.oup.com\/bioinformatics\/article-pdf\/34\/10\/1635\/48935600\/bioinformatics_34_10_1635.pdf","content-type":"unspecified","content-version":"vor","intended-application":"similarity-checking"}],"deposited":{"date-parts":[[2023,1,27]],"date-time":"2023-01-27T14:22:58Z","timestamp":1674829378000},"score":1,"resource":{"primary":{"URL":"https:\/\/academic.oup.com\/bioinformatics\/article\/34\/10\/1635\/4798701"}},"subtitle":[],"editor":[{"given":"Inanc","family":"Birol","sequence":"additional","affiliation":[]}],"short-title":[],"issued":{"date-parts":[[2018,1,11]]},"references-count":29,"journal-issue":{"issue":"10","published-print":{"date-parts":[[2018,5,15]]}},"URL":"https:\/\/doi.org\/10.1093\/bioinformatics\/bty008","relation":{},"ISSN":["1367-4803","1367-4811"],"issn-type":[{"type":"print","value":"1367-4803"},{"type":"electronic","value":"1367-4811"}],"subject":[],"published-other":{"date-parts":[[2018,5,15]]},"published":{"date-parts":[[2018,1,11]]}}}