{"status":"ok","message-type":"work","message-version":"1.0.0","message":{"indexed":{"date-parts":[[2026,2,26]],"date-time":"2026-02-26T20:34:51Z","timestamp":1772138091303,"version":"3.50.1"},"reference-count":29,"publisher":"Oxford University Press (OUP)","issue":"7","license":[{"start":{"date-parts":[[2018,9,1]],"date-time":"2018-09-01T00:00:00Z","timestamp":1535760000000},"content-version":"vor","delay-in-days":0,"URL":"http:\/\/creativecommons.org\/licenses\/by\/4.0\/"}],"funder":[{"name":"Bayer CropScience NV"},{"DOI":"10.13039\/501100000765","name":"University College London","doi-asserted-by":"publisher","id":[{"id":"10.13039\/501100000765","id-type":"DOI","asserted-by":"publisher"}]},{"DOI":"10.13039\/501100001711","name":"Swiss National Science Foundation","doi-asserted-by":"publisher","award":["PP00P3_150654"],"award-info":[{"award-number":["PP00P3_150654"]}],"id":[{"id":"10.13039\/501100001711","id-type":"DOI","asserted-by":"publisher"}]},{"DOI":"10.13039\/501100000268","name":"Biotechnology and Biological Sciences Research Council","doi-asserted-by":"publisher","award":["BB\/L018241\/1"],"award-info":[{"award-number":["BB\/L018241\/1"]}],"id":[{"id":"10.13039\/501100000268","id-type":"DOI","asserted-by":"publisher"}]}],"content-domain":{"domain":[],"crossmark-restriction":false},"short-container-title":[],"published-print":{"date-parts":[[2019,4,1]]},"abstract":"<jats:title>Abstract<\/jats:title>\n                  <jats:sec>\n                    <jats:title>Motivation<\/jats:title>\n                    <jats:p>As the time and cost of sequencing decrease, the number of available genomes and transcriptomes rapidly increases. Yet the quality of the assemblies and the gene annotations varies considerably and often remains poor, affecting downstream analyses. 
This is particularly true when fragments of the same gene are annotated as distinct genes, which may cause them to be mistaken as paralogs.<\/jats:p>\n                  <\/jats:sec>\n                  <jats:sec>\n                    <jats:title>Results<\/jats:title>\n                    <jats:p>In this study, we introduce two novel phylogenetic tests to infer non-overlapping or partially overlapping genes that are in fact parts of the same gene. One approach collapses branches with low bootstrap support and the other computes a likelihood ratio test. We extensively validated these methods by (i) introducing and recovering fragmentation on the bread wheat, Triticum aestivum cv. Chinese Spring, chromosome 3B; (ii) by applying the methods to the low-quality 3B assembly and validating predictions against the high-quality 3B assembly; and (iii) by comparing the performance of the proposed methods to the performance of existing methods, namely Ensembl Compara and ESPRIT. Application of this combination to a draft shotgun assembly of the entire bread wheat genome revealed 1221 pairs of genes that are highly likely to be fragments of the same gene. 
Our approach demonstrates the power of fine-grained evolutionary inferences across multiple species to improving genome assemblies and annotations.<\/jats:p>\n                  <\/jats:sec>\n                  <jats:sec>\n                    <jats:title>Availability and implementation<\/jats:title>\n                    <jats:p>An open source software tool is available at https:\/\/github.com\/DessimozLab\/esprit2.<\/jats:p>\n                  <\/jats:sec>\n                  <jats:sec>\n                    <jats:title>Supplementary information<\/jats:title>\n                    <jats:p>Supplementary data are available at Bioinformatics online.<\/jats:p>\n                  <\/jats:sec>","DOI":"10.1093\/bioinformatics\/bty772","type":"journal-article","created":{"date-parts":[[2018,9,1]],"date-time":"2018-09-01T00:01:06Z","timestamp":1535760066000},"page":"1159-1166","source":"Crossref","is-referenced-by-count":1,"title":["Phylogenetic approaches to identifying fragments of the same gene, with application to the wheat genome"],"prefix":"10.1093","volume":"35","author":[{"given":"Ivana","family":"Pili\u017eota","sequence":"first","affiliation":[{"name":"Department of Genetics Evolution & Environment, University College London, UK"},{"name":"Department of Computer Science, University College London, UK"}]},{"given":"Cl\u00e9ment-Marie","family":"Train","sequence":"additional","affiliation":[{"name":"Department of Computational Biology, Lausanne, Switzerland"},{"name":"Center for Integrative Genomics University of Lausanne, Lausanne, Switzerland"},{"name":"Swiss Institute of Bioinformatics, Biophore Building, Lausanne, Switzerland"}]},{"given":"Adrian","family":"Altenhoff","sequence":"additional","affiliation":[{"name":"Swiss Institute of Bioinformatics, Biophore Building, Lausanne, Switzerland"}]},{"given":"Henning","family":"Redestig","sequence":"additional","affiliation":[{"name":"Bayer CropScience NV, Ghent, 
Belgium"}]},{"ORCID":"https:\/\/orcid.org\/0000-0002-2170-853X","authenticated-orcid":false,"given":"Christophe","family":"Dessimoz","sequence":"additional","affiliation":[{"name":"Department of Genetics Evolution & Environment, University College London, UK"},{"name":"Department of Computer Science, University College London, UK"},{"name":"Department of Computational Biology, Lausanne, Switzerland"},{"name":"Center for Integrative Genomics University of Lausanne, Lausanne, Switzerland"},{"name":"Swiss Institute of Bioinformatics, Biophore Building, Lausanne, Switzerland"}]}],"member":"286","published-online":{"date-parts":[[2018,9,1]]},"reference":[{"key":"2023013107274142700_bty772-B1","doi-asserted-by":"crossref","first-page":"e53786.","DOI":"10.1371\/journal.pone.0053786","article-title":"Inferring hierarchical orthologous groups from orthologous gene pairs","volume":"8","author":"Altenhoff","year":"2013","journal-title":"PLoS One"},{"key":"2023013107274142700_bty772-B2","doi-asserted-by":"crossref","first-page":"D240","DOI":"10.1093\/nar\/gku1158","article-title":"The OMA orthology database in 2015: function predictions, better plant support, synteny view and other improvements","volume":"43","author":"Altenhoff","year":"2014","journal-title":"Nucleic Acids Res"},{"key":"2023013107274142700_bty772-B3","doi-asserted-by":"crossref","first-page":"562","DOI":"10.1038\/nbt.3535","article-title":"Sequencing wild and cultivated cassava and related species reveals extensive interspecific hybridization and genetic diversity","volume":"34","author":"Bredeson","year":"2016","journal-title":"Nat. 
Biotechnol"},{"key":"2023013107274142700_bty772-B4","doi-asserted-by":"crossref","first-page":"421.","DOI":"10.1186\/1471-2105-10-421","article-title":"BLAST+: architecture and applications","volume":"10","author":"Camacho","year":"2009","journal-title":"BMC Bioinformatics"},{"key":"2023013107274142700_bty772-B5","doi-asserted-by":"crossref","first-page":"1249721.","DOI":"10.1126\/science.1249721","article-title":"Structural and functional partitioning of bread wheat chromosome 3B","volume":"345","author":"Choulet","year":"2014","journal-title":"Science"},{"key":"2023013107274142700_bty772-B6","doi-asserted-by":"crossref","first-page":"439","DOI":"10.3390\/biology1020439","article-title":"Why assembling plant genome sequences is so challenging","volume":"1","author":"Claros","year":"2012","journal-title":"Biology"},{"key":"2023013107274142700_bty772-B7","doi-asserted-by":"crossref","first-page":"e56925.","DOI":"10.1371\/journal.pone.0056925","article-title":"The impact of gene duplication, insertion, deletion, lateral gene transfer and sequencing error on orthology inference: a simulation study","volume":"8","author":"Dalquen","year":"2013","journal-title":"PLoS One"},{"key":"2023013107274142700_bty772-B8","doi-asserted-by":"crossref","first-page":"e1003998.","DOI":"10.1371\/journal.pcbi.1003998","article-title":"Extensive error in the number of genes inferred from draft genome assemblies","volume":"10","author":"Denton","year":"2014","journal-title":"PLoS Comput. Biol"},{"key":"2023013107274142700_bty772-B9","doi-asserted-by":"crossref","first-page":"474","DOI":"10.1093\/bib\/bbr038","article-title":"Comparative genomics approach to detecting split-coding regions in a low-coverage genome: lessons from the chimaera Callorhinchus milii (Holocephali, Chondrichthyes)","volume":"12","author":"Dessimoz","year":"2011","journal-title":"Brief. 
Bioinform"},{"key":"2023013107274142700_bty772-B10","doi-asserted-by":"crossref","first-page":"7085","DOI":"10.1073\/pnas.93.14.7085","article-title":"Bootstrap confidence levels for phylogenetic trees","volume":"93","author":"Efron","year":"1996","journal-title":"Proc. Natl. Acad. Sci. USA"},{"key":"2023013107274142700_bty772-B11","doi-asserted-by":"crossref","first-page":"220","DOI":"10.1007\/978-1-4899-4541-9_16","volume-title":"An Introduction to the Bootstrap","author":"Efron","year":"1993"},{"key":"2023013107274142700_bty772-B12","doi-asserted-by":"crossref","first-page":"5737","DOI":"10.1073\/pnas.0900906106","article-title":"Plants with double genomes might have had a better chance to survive the Cretaceous-Tertiary extinction event","volume":"106","author":"Fawcett","year":"2009","journal-title":"Proc. Natl. Acad. Sci. USA"},{"key":"2023013107274142700_bty772-B13","doi-asserted-by":"crossref","first-page":"D1178","DOI":"10.1093\/nar\/gkr944","article-title":"Phytozome: a comparative platform for green plant genomics","volume":"40","author":"Goodstein","year":"2012","journal-title":"Nucleic Acids Res"},{"key":"2023013107274142700_bty772-B14","doi-asserted-by":"crossref","first-page":"182","DOI":"10.1007\/BF00166252","article-title":"Statistical tests of models of DNA substitution","volume":"36","author":"Goldman","year":"1993","journal-title":"J. Mol. Evol"},{"key":"2023013107274142700_bty772-B15","doi-asserted-by":"crossref","first-page":"333","DOI":"10.1038\/nrg.2016.49","article-title":"Coming of age: ten years of next-generation sequencing technologies","volume":"17","author":"Goodwin","year":"2016","journal-title":"Nat. Rev. 
Genet"},{"key":"2023013107274142700_bty772-B16","doi-asserted-by":"crossref","first-page":"baw053","DOI":"10.1093\/database\/baw053","article-title":"Ensembl comparative genomics resources","volume":"2016","author":"Herrero","year":"2016","journal-title":"Database"},{"key":"2023013107274142700_bty772-B17","doi-asserted-by":"crossref","first-page":"1251788.","DOI":"10.1126\/science.1251788","article-title":"A chromosome-based draft sequence of the hexaploid bread wheat (Triticum aestivum) genome","volume":"345","year":"2014","journal-title":"Science"},{"key":"2023013107274142700_bty772-B18","doi-asserted-by":"crossref","first-page":"64","DOI":"10.1016\/j.pbi.2017.02.002","article-title":"The impact of third generation genomic technologies on plant genome assembly","volume":"36","author":"Jiao","year":"2017","journal-title":"Curr. Opin. Plant Biol"},{"key":"2023013107274142700_bty772-B19","doi-asserted-by":"crossref","first-page":"772","DOI":"10.1093\/molbev\/mst010","article-title":"MAFFT multiple sequence alignment software version 7: improvements in performance and usability","volume":"30","author":"Katoh","year":"2013","journal-title":"Mol. Biol. Evol"},{"key":"2023013107274142700_bty772-B20","doi-asserted-by":"crossref","first-page":"155","DOI":"10.1007\/978-1-61779-582-4_5","article-title":"Next-generation sequencing technologies and fragment assembly algorithms","volume":"855","author":"Lee","year":"2012","journal-title":"Methods Mol. 
Biol"},{"key":"2023013107274142700_bty772-B21","doi-asserted-by":"crossref","first-page":"160","DOI":"10.1093\/bioinformatics\/bts661","article-title":"Scaffolding low quality genomes using orthologous protein sequences","volume":"29","author":"Li","year":"2013","journal-title":"Bioinformatics"},{"key":"2023013107274142700_bty772-B22","doi-asserted-by":"crossref","first-page":"e9490.","DOI":"10.1371\/journal.pone.0009490","article-title":"FastTree 2\u2014approximately maximum-likelihood trees for large alignments","volume":"5","author":"Price","year":"2010","journal-title":"PLoS One"},{"key":"2023013107274142700_bty772-B23","doi-asserted-by":"crossref","first-page":"88","DOI":"10.1007\/s12042-011-9088-z","article-title":"The cassava genome: current progress, future directions","volume":"5","author":"Prochnik","year":"2012","journal-title":"Trop. Plant Biol"},{"key":"2023013107274142700_bty772-B24","doi-asserted-by":"crossref","first-page":"3279","DOI":"10.1093\/molbev\/msx261","article-title":"Fragmentary gene sequences negatively impact gene tree and species tree reconstruction","volume":"34","author":"Sayyari","year":"2017","journal-title":"Mol. Biol. 
Evol"},{"key":"2023013107274142700_bty772-B25","doi-asserted-by":"crossref","first-page":"i75","DOI":"10.1093\/bioinformatics\/btx229","article-title":"Orthologous Matrix (OMA) algorithm 2.0: more robust to asymmetric evolutionary rates and more scalable hierarchical orthologous group inference","volume":"33","author":"Train","year":"2017","journal-title":"Bioinformatics"},{"key":"2023013107274142700_bty772-B26","doi-asserted-by":"crossref","first-page":"327","DOI":"10.1101\/gr.073585.107","article-title":"EnsemblCompara GeneTrees: complete, duplication-aware phylogenetic trees in vertebrates","volume":"19","author":"Vilella","year":"2009","journal-title":"Genome Res"},{"key":"2023013107274142700_bty772-B27","doi-asserted-by":"crossref","first-page":"60","DOI":"10.1214\/aoms\/1177732360","article-title":"The large-sample distribution of the likelihood ratio for testing composite hypotheses","volume":"9","author":"Wilks","year":"1938","journal-title":"Ann. Math. Stat"},{"key":"2023013107274142700_bty772-B28","doi-asserted-by":"crossref","first-page":"31.","DOI":"10.1186\/s13742-016-0136-3","article-title":"AGOUTI: improving genome assembly and annotation using transcriptome data","volume":"5","author":"Zhang","year":"2016","journal-title":"Gigascience"},{"key":"2023013107274142700_bty772-B29","doi-asserted-by":"crossref","first-page":"3193","DOI":"10.1093\/bioinformatics\/btw378","article-title":"PEP_scaffolder: using (homologous) proteins to scaffold 
genomes","volume":"32","author":"Zhu","year":"2016","journal-title":"Bioinformatics"}],"container-title":["Bioinformatics"],"original-title":[],"language":"en","link":[{"URL":"https:\/\/academic.oup.com\/bioinformatics\/article-pdf\/35\/7\/1159\/48968136\/bioinformatics_35_7_1159.pdf","content-type":"application\/pdf","content-version":"vor","intended-application":"syndication"},{"URL":"https:\/\/academic.oup.com\/bioinformatics\/article-pdf\/35\/7\/1159\/48968136\/bioinformatics_35_7_1159.pdf","content-type":"unspecified","content-version":"vor","intended-application":"similarity-checking"}],"deposited":{"date-parts":[[2023,1,31]],"date-time":"2023-01-31T05:29:22Z","timestamp":1675142962000},"score":1,"resource":{"primary":{"URL":"https:\/\/academic.oup.com\/bioinformatics\/article\/35\/7\/1159\/5089230"}},"subtitle":[],"editor":[{"given":"Janet","family":"Kelso","sequence":"additional","affiliation":[]}],"short-title":[],"issued":{"date-parts":[[2018,9,1]]},"references-count":29,"journal-issue":{"issue":"7","published-print":{"date-parts":[[2019,4,1]]}},"URL":"https:\/\/doi.org\/10.1093\/bioinformatics\/bty772","relation":{"has-preprint":[{"id-type":"doi","id":"10.1101\/182550","asserted-by":"object"}]},"ISSN":["1367-4803","1367-4811"],"issn-type":[{"value":"1367-4803","type":"print"},{"value":"1367-4811","type":"electronic"}],"subject":[],"published-other":{"date-parts":[[2019,4,1]]},"published":{"date-parts":[[2018,9,1]]}}}