{"status":"ok","message-type":"work","message-version":"1.0.0","message":{"indexed":{"date-parts":[[2025,2,22]],"date-time":"2025-02-22T00:38:36Z","timestamp":1740184716355,"version":"3.37.3"},"reference-count":32,"publisher":"Oxford University Press (OUP)","issue":"22","license":[{"start":{"date-parts":[[2017,7,29]],"date-time":"2017-07-29T00:00:00Z","timestamp":1501286400000},"content-version":"vor","delay-in-days":0,"URL":"http:\/\/creativecommons.org\/licenses\/by-nc\/4.0\/"}],"funder":[{"DOI":"10.13039\/501100004587","name":"ISCIII","doi-asserted-by":"publisher","id":[{"id":"10.13039\/501100004587","id-type":"DOI","asserted-by":"publisher"}]}],"content-domain":{"domain":[],"crossmark-restriction":false},"short-container-title":[],"published-print":{"date-parts":[[2017,11,15]]},"abstract":"<jats:title>Abstract<\/jats:title>\n               <jats:sec>\n                  <jats:title>Motivation<\/jats:title>\n                  <jats:p>Current plant and animal genomic studies are often based on newly assembled genomes that have not been properly consolidated. In this scenario, misassembled regions can easily lead to false-positive findings. Although quality control scores are included within genotyping protocols, they are usually employed to evaluate individual sample quality rather than reference sequence reliability. We propose a statistical model that combines quality control scores across samples in order to detect incongruent patterns at every genomic region. Our model is inherently robust since common artifact signals are expected to be shared between independent samples over misassembled regions of the genome.<\/jats:p>\n               <\/jats:sec>\n               <jats:sec>\n                  <jats:title>Results<\/jats:title>\n                  <jats:p>The reliability of our protocol has been extensively tested through different experiments and organisms with accurate results, improving state-of-the-art methods. 
Our analysis demonstrates synergistic relations between quality control scores and allelic variability estimators, that improve the detection of misassembled regions, and is able to find strong artifact signals even within the human reference assembly. Furthermore, we demonstrated how our model can be trained to properly rank the confidence of a set of candidate variants obtained from new independent samples.<\/jats:p>\n               <\/jats:sec>\n               <jats:sec>\n                  <jats:title>Availability and implementation<\/jats:title>\n                  <jats:p>This tool is freely available at http:\/\/gitlab.com\/carbonell\/ces.<\/jats:p>\n               <\/jats:sec>\n               <jats:sec>\n                  <jats:title>Supplementary information<\/jats:title>\n                  <jats:p>Supplementary data are available at Bioinformatics online.<\/jats:p>\n               <\/jats:sec>","DOI":"10.1093\/bioinformatics\/btx482","type":"journal-article","created":{"date-parts":[[2017,7,28]],"date-time":"2017-07-28T11:09:19Z","timestamp":1501240159000},"page":"3511-3517","source":"Crossref","is-referenced-by-count":0,"title":["Reference genome assessment from a population scale perspective: an accurate profile of variability and noise"],"prefix":"10.1093","volume":"33","author":[{"given":"Jos\u00e9","family":"Carbonell-Caballero","sequence":"first","affiliation":[{"name":"Computational Genomics, Principe Felipe Research Centre, Valencia"}]},{"given":"Alicia","family":"Amadoz","sequence":"additional","affiliation":[{"name":"Computational Genomics, Principe Felipe Research Centre, Valencia"}]},{"given":"Roberto","family":"Alonso","sequence":"additional","affiliation":[{"name":"Computational Genomics, Principe Felipe Research Centre, Valencia"}]},{"given":"Marta R","family":"Hidalgo","sequence":"additional","affiliation":[{"name":"Computational Genomics, Principe Felipe Research Centre, 
Valencia"}]},{"given":"Cankut","family":"\u00c7ubuk","sequence":"additional","affiliation":[{"name":"Computational Genomics, Principe Felipe Research Centre, Valencia"}]},{"given":"David","family":"Conesa","sequence":"additional","affiliation":[{"name":"Estad\u00edstica e investigaci\u00f3n Operativa, Universitat de Val\u00e8ncia, Burjassot"}]},{"given":"Antonio","family":"L\u00f3pez-Qu\u00edlez","sequence":"additional","affiliation":[{"name":"Estad\u00edstica e investigaci\u00f3n Operativa, Universitat de Val\u00e8ncia, Burjassot"}]},{"ORCID":"https:\/\/orcid.org\/0000-0003-3318-120X","authenticated-orcid":false,"given":"Joaqu\u00edn","family":"Dopazo","sequence":"additional","affiliation":[{"name":"Computational Genomics, Principe Felipe Research Centre, Valencia"},{"name":"Clinical Bioinformatics Area, Fundaci\u00f3n Progreso y Salud, Hospital Virgen del Rocio, Sevilla"},{"name":"Functional Genomics Node (INB), Fundaci\u00f3n Progreso y Salud, Hospital Virgen del Rocio, Sevilla"},{"name":"Bioinformatics in Rare Diseases (BiER), Centro de Investigaci\u00f3n Biom\u00e9dica en Red de Enfermedades Raras (CIBERER), Fundaci\u00f3n Progreso y Salud, Hospital Virgen del Rocio, Sevilla, Spain"}]}],"member":"286","published-online":{"date-parts":[[2017,7,29]]},"reference":[{"key":"2023051308272032700_btx482-B1","doi-asserted-by":"crossref","first-page":"56","DOI":"10.1038\/nature11632","article-title":"An integrated map of genetic variation from 1,092 human genomes","volume":"491","author":"Abecasis","year":"2012","journal-title":"Nature"},{"key":"2023051308272032700_btx482-B2","doi-asserted-by":"crossref","first-page":"455","DOI":"10.1089\/cmb.2012.0021","article-title":"SPAdes: a new genome assembly algorithm and its applications to single-cell sequencing","volume":"19","author":"Bankevich","year":"2012","journal-title":"J. Comput. 
Biol"},{"key":"2023051308272032700_btx482-B3","doi-asserted-by":"crossref","first-page":"474","DOI":"10.1002\/dvg.22877","article-title":"The arabidopsis information resource: making and mining the \u2018gold standard\u2019 annotated reference plant genome","volume":"53","author":"Berardini","year":"2015","journal-title":"Genesis"},{"key":"2023051308272032700_btx482-B4","doi-asserted-by":"crossref","first-page":"221","DOI":"10.1038\/ejhg.2013.118","article-title":"The genome of the Netherlands: design, and project goals","volume":"22","author":"Boomsma","year":"2014","journal-title":"Eur. J. Hum. Genet"},{"key":"2023051308272032700_btx482-B5","doi-asserted-by":"crossref","first-page":"10.","DOI":"10.1186\/2047-217X-2-10","article-title":"Assemblathon 2: evaluating de novo methods of genome assembly in three vertebrate species","volume":"2","author":"Bradnam","year":"2013","journal-title":"Gigascience"},{"key":"2023051308272032700_btx482-B6","doi-asserted-by":"crossref","first-page":"435","DOI":"10.1093\/bioinformatics\/bts723","article-title":"ALE: a generic assembly likelihood evaluation framework for assessing the accuracy of genome and metagenome assemblies","volume":"29","author":"Clark","year":"2013","journal-title":"Bioinformatics"},{"key":"2023051308272032700_btx482-B7","doi-asserted-by":"crossref","first-page":"1205","DOI":"10.1093\/molbev\/msw005","article-title":"267 Spanish exomes reveal population-specific differences in disease-related genetic variation","volume":"33","author":"Dopazo","year":"2016","journal-title":"Mol. Biol. Evol"},{"volume-title":"Statistical Methods for Research Workers","year":"1925","author":"Fisher","key":"2023051308272032700_btx482-B8"},{"key":"2023051308272032700_btx482-B9","doi-asserted-by":"crossref","first-page":"435","DOI":"10.1038\/ng.3247","article-title":"Large-scale whole-genome sequencing of the Icelandic population","volume":"47","author":"Gudbjartsson","year":"2015","journal-title":"Nat. 
Genet"},{"key":"2023051308272032700_btx482-B10","doi-asserted-by":"crossref","first-page":"1072","DOI":"10.1093\/bioinformatics\/btt086","article-title":"QUAST: quality assessment tool for genome assemblies","volume":"29","author":"Gurevich","year":"2013","journal-title":"Bioinformatics"},{"key":"2023051308272032700_btx482-B11","doi-asserted-by":"crossref","first-page":"R47.","DOI":"10.1186\/gb-2013-14-5-r47","article-title":"REAPR: a universal tool for genome assembly evaluation","volume":"14","author":"Hunt","year":"2013","journal-title":"Genome Biol"},{"key":"2023051308272032700_btx482-B12","doi-asserted-by":"crossref","first-page":"931","DOI":"10.1038\/nature03001","article-title":"Finishing the euchromatic sequence of the human genome","volume":"431","author":"International Human Genome Sequencing Consortium","year":"2004","journal-title":"Nature"},{"key":"2023051308272032700_btx482-B13","doi-asserted-by":"crossref","first-page":"860","DOI":"10.1038\/35057062","article-title":"Initial sequencing and analysis of the human genome","volume":"409","author":"Lander","year":"2001","journal-title":"Nature"},{"key":"2023051308272032700_btx482-B14","first-page":"44","article-title":"The European nucleotide archive","volume":"39 (Suppl. 1)","author":"Leinonen","year":"2011","journal-title":"Nucleic Acids Res"},{"key":"2023051308272032700_btx482-B15","first-page":"2010","article-title":"The sequence read archive","volume":"39 (Suppl. 
1)","author":"Leinonen","year":"2011","journal-title":"Nucleic Acids Res"},{"key":"2023051308272032700_btx482-B16","doi-asserted-by":"crossref","first-page":"285","DOI":"10.1038\/nature19057","article-title":"Analysis of protein-coding genetic variation in 60,706 humans","volume":"536","author":"Lek","year":"2016","journal-title":"Nature"},{"key":"2023051308272032700_btx482-B17","doi-asserted-by":"crossref","first-page":"589","DOI":"10.1093\/bioinformatics\/btp698","article-title":"Fast and accurate long-read alignment with Burrows-Wheeler transform","volume":"26","author":"Li","year":"2010","journal-title":"Bioinformatics"},{"key":"2023051308272032700_btx482-B18","doi-asserted-by":"crossref","first-page":"1718","DOI":"10.1093\/bioinformatics\/btt273","article-title":"GAGE-B: an evaluation of genome assemblers for bacterial organisms","volume":"29","author":"Magoc","year":"2013","journal-title":"Bioinformatics"},{"key":"2023051308272032700_btx482-B19","doi-asserted-by":"crossref","first-page":"1297","DOI":"10.1101\/gr.107524.110","article-title":"The Genome Analysis Toolkit: a MapReduce framework for analyzing next-generation DNA sequencing data","volume":"20","author":"McKenna","year":"2010","journal-title":"Genome Res"},{"key":"2023051308272032700_btx482-B20","doi-asserted-by":"crossref","first-page":"1088","DOI":"10.1093\/bioinformatics\/btv697","article-title":"MetaQUAST: evaluation of metagenome assemblies","volume":"32","author":"Mikheenko","year":"2016","journal-title":"Bioinformatics"},{"key":"2023051308272032700_btx482-B21","doi-asserted-by":"crossref","first-page":"422","DOI":"10.1016\/j.ajhg.2013.07.006","article-title":"Genetic evidence for recent population mixture in India","volume":"93","author":"Moorjani","year":"2013","journal-title":"Am. J. Hum. 
Genet"},{"key":"2023051308272032700_btx482-B22","doi-asserted-by":"crossref","first-page":"8018","DOI":"10.1038\/ncomms9018","article-title":"Rare variant discovery by deep whole-genome sequencing of 1,070 Japanese individuals","volume":"6","author":"Nagasaki","year":"2015","journal-title":"Nat. Commun"},{"key":"2023051308272032700_btx482-B23","doi-asserted-by":"crossref","first-page":"D733","DOI":"10.1093\/nar\/gkv1189","article-title":"Reference sequence (RefSeq) database at NCBI: current status, taxonomic expansion, and functional annotation","volume":"44","author":"O\u2019Leary","year":"2016","journal-title":"Nucleic Acids Res"},{"key":"2023051308272032700_btx482-B24","doi-asserted-by":"crossref","first-page":"R8","DOI":"10.1186\/gb-2013-14-1-r8","article-title":"CGAL: computing genome assembly likelihoods","volume":"14","author":"Rahman","year":"2013","journal-title":"Genome Biol"},{"key":"2023051308272032700_btx482-B25","doi-asserted-by":"crossref","first-page":"557","DOI":"10.1101\/gr.131383.111","article-title":"GAGE: a critical evaluation of genome assemblies and assembly algorithms","volume":"22","author":"Salzberg","year":"2012","journal-title":"Genome Res"},{"key":"2023051308272032700_btx482-B26","doi-asserted-by":"crossref","first-page":"1117","DOI":"10.1101\/gr.089532.108","article-title":"ABySS: a parallel assembler for short read sequence data","volume":"19","author":"Simpson","year":"2009","journal-title":"Genome Res"},{"key":"2023051308272032700_btx482-B27","doi-asserted-by":"crossref","first-page":"1035","DOI":"10.1126\/science.1172257","article-title":"The genetic structure and history of Africans and African Americans","volume":"324","author":"Tishkoff","year":"2009","journal-title":"Science"},{"key":"2023051308272032700_btx482-B28","doi-asserted-by":"crossref","first-page":"e52210","DOI":"10.1371\/journal.pone.0052210","article-title":"Reevaluating assembly evaluations with feature response curves: GAGE and 
assemblathons","volume":"7","author":"Vezzi","year":"2012","journal-title":"PLoS One"},{"key":"2023051308272032700_btx482-B29","doi-asserted-by":"crossref","first-page":"e112963.","DOI":"10.1371\/journal.pone.0112963","article-title":"Pilon: an integrated tool for comprehensive microbial variant detection and genome assembly improvement","volume":"9","author":"Walker","year":"2014","journal-title":"PLoS One"},{"key":"2023051308272032700_btx482-B30","doi-asserted-by":"crossref","first-page":"1113","DOI":"10.1038\/ng.2764","article-title":"The Cancer Genome Atlas Pan-Cancer analysis project","volume":"45","author":"Weinstein","year":"2013","journal-title":"Nat. Genet"},{"key":"2023051308272032700_btx482-B31","doi-asserted-by":"crossref","first-page":"R113.","DOI":"10.1186\/gb-2010-11-11-r113","article-title":"Genetic diversity in India and the inference of Eurasian population expansion","volume":"11","author":"Xing","year":"2010","journal-title":"Genome Biol"},{"key":"2023051308272032700_btx482-B32","doi-asserted-by":"crossref","first-page":"386.","DOI":"10.1186\/s12859-015-0818-3","article-title":"misFinder: identify mis-assemblies in an unbiased manner using reference and paired-end reads","volume":"16","author":"Zhu","year":"2015","journal-title":"BMC 
Bioinformatics"}],"container-title":["Bioinformatics"],"original-title":[],"language":"en","link":[{"URL":"https:\/\/academic.oup.com\/bioinformatics\/article-pdf\/33\/22\/3511\/50307261\/bioinformatics_33_22_3511.pdf","content-type":"application\/pdf","content-version":"vor","intended-application":"syndication"},{"URL":"https:\/\/academic.oup.com\/bioinformatics\/article-pdf\/33\/22\/3511\/50307261\/bioinformatics_33_22_3511.pdf","content-type":"unspecified","content-version":"vor","intended-application":"similarity-checking"}],"deposited":{"date-parts":[[2023,5,13]],"date-time":"2023-05-13T08:27:52Z","timestamp":1683966472000},"score":1,"resource":{"primary":{"URL":"https:\/\/academic.oup.com\/bioinformatics\/article\/33\/22\/3511\/4056065"}},"subtitle":[],"editor":[{"given":"Bonnie","family":"Berger","sequence":"additional","affiliation":[]}],"short-title":[],"issued":{"date-parts":[[2017,7,29]]},"references-count":32,"journal-issue":{"issue":"22","published-print":{"date-parts":[[2017,11,15]]}},"URL":"https:\/\/doi.org\/10.1093\/bioinformatics\/btx482","relation":{},"ISSN":["1367-4803","1367-4811"],"issn-type":[{"type":"print","value":"1367-4803"},{"type":"electronic","value":"1367-4811"}],"subject":[],"published-other":{"date-parts":[[2017,11,15]]},"published":{"date-parts":[[2017,7,29]]}}}