{"status":"ok","message-type":"work","message-version":"1.0.0","message":{"indexed":{"date-parts":[[2026,2,26]],"date-time":"2026-02-26T20:34:28Z","timestamp":1772138068632,"version":"3.50.1"},"reference-count":31,"publisher":"Oxford University Press (OUP)","issue":"Supplement_1","license":[{"start":{"date-parts":[[2024,6,28]],"date-time":"2024-06-28T00:00:00Z","timestamp":1719532800000},"content-version":"vor","delay-in-days":0,"URL":"https:\/\/creativecommons.org\/licenses\/by\/4.0\/"}],"funder":[{"name":"Australian Research Council Discovery","award":["210101802"],"award-info":[{"award-number":["210101802"]}]}],"content-domain":{"domain":[],"crossmark-restriction":false},"short-container-title":[],"published-print":{"date-parts":[[2024,6,28]]},"abstract":"<jats:title>Abstract<\/jats:title>\n                  <jats:sec>\n                    <jats:title>Motivation<\/jats:title>\n                    <jats:p>Insertions and deletions (indels) influence the genetic code in fundamentally distinct ways from substitutions, significantly impacting gene product structure and function. Despite their influence, the evolutionary history of indels is often neglected in phylogenetic tree inference and ancestral sequence reconstruction, hindering efforts to comprehend biological diversity determinants and engineer variants for medical and industrial applications.<\/jats:p>\n                  <\/jats:sec>\n                  <jats:sec>\n                    <jats:title>Results<\/jats:title>\n                    <jats:p>We frame determining the optimal history of indel events as a single Mixed-Integer Programming (MIP) problem, across all branch points in a phylogenetic tree adhering to topological constraints, and all sites implied by a given set of aligned, extant sequences. By disentangling the impact on ancestral sequences at each branch point, this approach identifies the minimal indel events that jointly explain the diversity in sequences mapped to the tips of that tree. MIP can recover alternate optimal indel histories, if available. We evaluated MIP for indel inference on a dataset comprising 15 real phylogenetic trees associated with protein families ranging from 165 to 2000 extant sequences, and on 60 synthetic trees at comparable scales of data and reflecting realistic rates of mutation. Across relevant metrics, MIP outperformed alternative parsimony-based approaches and reported the fewest indel events, on par or below their occurrence in synthetic datasets. MIP offers a rational justification for indel patterns in extant sequences; importantly, it uniquely identifies global optima on complex protein data sets without making unrealistic assumptions of independence or evolutionary underpinnings, promising a deeper understanding of molecular evolution and aiding novel protein design.<\/jats:p>\n                  <\/jats:sec>\n                  <jats:sec>\n                    <jats:title>Availability and implementation<\/jats:title>\n                    <jats:p>The implementation is available via GitHub at https:\/\/github.com\/santule\/indelmip.<\/jats:p>\n                  <\/jats:sec>","DOI":"10.1093\/bioinformatics\/btae254","type":"journal-article","created":{"date-parts":[[2024,5,9]],"date-time":"2024-05-09T08:43:12Z","timestamp":1715244192000},"page":"i277-i286","source":"Crossref","is-referenced-by-count":3,"title":["Optimal phylogenetic reconstruction of insertion and deletion events"],"prefix":"10.1093","volume":"40","author":[{"ORCID":"https:\/\/orcid.org\/0000-0001-5630-5792","authenticated-orcid":false,"given":"Sanjana","family":"Tule","sequence":"first","affiliation":[{"name":"School of Chemistry and Molecular Biosciences, The University of Queensland , Brisbane, QLD 4072, Australia"}]},{"ORCID":"https:\/\/orcid.org\/0000-0002-0487-2629","authenticated-orcid":false,"given":"Gabriel","family":"Foley","sequence":"additional","affiliation":[{"name":"School of Chemistry and Molecular Biosciences, The University of Queensland , Brisbane, QLD 4072, Australia"}]},{"ORCID":"https:\/\/orcid.org\/0000-0003-2829-957X","authenticated-orcid":false,"given":"Chongting","family":"Zhao","sequence":"additional","affiliation":[{"name":"School of Chemistry and Molecular Biosciences, The University of Queensland , Brisbane, QLD 4072, Australia"}]},{"ORCID":"https:\/\/orcid.org\/0000-0002-2240-7245","authenticated-orcid":false,"given":"Michael","family":"Forbes","sequence":"additional","affiliation":[{"name":"School of Mathematics and Physics, The University of Queensland , Brisbane, QLD 4072, Australia"}]},{"ORCID":"https:\/\/orcid.org\/0000-0003-3548-268X","authenticated-orcid":false,"given":"Mikael","family":"Bod\u00e9n","sequence":"additional","affiliation":[{"name":"School of Chemistry and Molecular Biosciences, The University of Queensland , Brisbane, QLD 4072, Australia"}]}],"member":"286","published-online":{"date-parts":[[2024,6,28]]},"reference":[{"key":"2024062809040818300_btae254-B1","doi-asserted-by":"crossref","first-page":"1160","DOI":"10.1073\/pnas.1220450110","article-title":"Evolutionary inference via the poisson indel process","volume":"110","author":"Bouchard-C\u00f4t\u00e9","year":"2013","journal-title":"Proc Natl Acad Sci U S A"},{"key":"2024062809040818300_btae254-B2","doi-asserted-by":"crossref","first-page":"721","DOI":"10.1142\/S0219720006002168","article-title":"On the inference of parsimonious indel evolutionary scenarios","volume":"4","author":"Chindelevitch","year":"2006","journal-title":"J Bioinform Comput Biol"},{"key":"2024062809040818300_btae254-B3","doi-asserted-by":"crossref","first-page":"R37","DOI":"10.1186\/gb-2010-11-4-r37","article-title":"Phylogenetic assessment of alignments reveals neglected tree signal in gaps","volume":"11","author":"Dessimoz","year":"2010","journal-title":"Genome Biol"},{"key":"2024062809040818300_btae254-B4","doi-asserted-by":"crossref","first-page":"211","DOI":"10.1186\/1471-2148-9-211","article-title":"Phylogenetic inference under varying proportions of indel-induced alignment gaps","volume":"9","author":"Dwivedi","year":"2009","journal-title":"BMC Evol Biol"},{"key":"2024062809040818300_btae254-B5","doi-asserted-by":"crossref","first-page":"123","DOI":"10.1186\/1471-2105-5-123","article-title":"Gasp: gapped ancestral sequence prediction for proteins","volume":"5","author":"Edwards","year":"2004","journal-title":"BMC Bioinformatics"},{"key":"2024062809040818300_btae254-B6","first-page":"559","author":"Fischer","year":"2014"},{"key":"2024062809040818300_btae254-B7","doi-asserted-by":"crossref","first-page":"1879","DOI":"10.1093\/molbev\/msp098","article-title":"INDELible: a flexible simulator of biological sequence evolution","volume":"26","author":"Fletcher","year":"2009","journal-title":"Mol Biol Evol"},{"key":"2024062809040818300_btae254-B8","doi-asserted-by":"crossref","first-page":"e1010633","DOI":"10.1371\/journal.pcbi.1010633","article-title":"Engineering indel and substitution variants of diverse and ancient enzymes using graphical representation of ancestral sequence predictions","volume":"18","author":"Foley","year":"2022","journal-title":"PLoS Comput Biol"},{"key":"2024062809040818300_btae254-B9","author":"Fredslund"},{"key":"2024062809040818300_btae254-B10","doi-asserted-by":"crossref","first-page":"1546","DOI":"10.1093\/bioinformatics\/bth126","article-title":"Combining partial order alignment and progressive multiple sequence alignment increases alignment speed and scalability to very large alignment problems","volume":"20","author":"Grasso","year":"2004","journal-title":"Bioinformatics"},{"key":"2024062809040818300_btae254-B11","author":"Gurobi Optimization, LLC","year":"2023"},{"key":"2024062809040818300_btae254-B12","doi-asserted-by":"crossref","first-page":"495","DOI":"10.1109\/TCBB.2008.119","article-title":"Parsimony score of phylogenetic networks: hardness results and a linear-time heuristic","volume":"6","author":"Jin","year":"2009","journal-title":"IEEE\/ACM Trans Comput Biol Bioinform"},{"key":"2024062809040818300_btae254-B13","doi-asserted-by":"crossref","first-page":"24","DOI":"10.1186\/1472-6807-10-24","article-title":"Systematic analysis of short internal indels and their impact on protein folding","volume":"10","author":"Kim","year":"2010","journal-title":"BMC Struct Biol"},{"key":"2024062809040818300_btae254-B14","doi-asserted-by":"crossref","first-page":"4453","DOI":"10.1093\/bioinformatics\/btz305","article-title":"RAxML-NG: a fast, scalable and user-friendly tool for maximum likelihood phylogenetic inference","volume":"35","author":"Kozlov","year":"2019","journal-title":"Bioinformatics"},{"key":"2024062809040818300_btae254-B15","doi-asserted-by":"crossref","first-page":"9","DOI":"10.1046\/j.1420-9101.1991.4010009.x","article-title":"Multi-residue gaps, a class of molecular characters with exceptional reliability for phylogenetic analyses","volume":"4","author":"Lloyd","year":"1991","journal-title":"J of Evolutionary Biology"},{"key":"2024062809040818300_btae254-B16","doi-asserted-by":"crossref","first-page":"148","DOI":"10.1021\/acs.biochem.2c00188","article-title":"Insertions and deletions (indels): a missing piece of the protein engineering jigsaw","volume":"62","author":"Miton","year":"2023","journal-title":"Biochemistry"},{"key":"2024062809040818300_btae254-B17","article-title":"FireProtASR: a web server for fully automated ancestral sequence reconstruction","volume":"22","author":"Musil","year":"2020","journal-title":"Brief Bioinform"},{"key":"2024062809040818300_btae254-B18","doi-asserted-by":"crossref","first-page":"e49794","DOI":"10.1371\/journal.pone.0049794","article-title":"Re-mind the gap! insertion\u2014deletion data reveal neglected phylogenetic potential of the nuclear ribosomal internal transcribed spacer (ITS) of fungi","volume":"7","author":"Nagy","year":"2012","journal-title":"PLoS One"},{"key":"2024062809040818300_btae254-B19","doi-asserted-by":"crossref","first-page":"4290","DOI":"10.1093\/bioinformatics\/btz249","article-title":"Accounting for ambiguity in ancestral sequence reconstruction","volume":"35","author":"Oliva","year":"2019","journal-title":"Bioinformatics"},{"key":"2024062809040818300_btae254-B20","doi-asserted-by":"crossref","first-page":"890","DOI":"10.1093\/oxfordjournals.molbev.a026369","article-title":"A fast algorithm for joint reconstruction of ancestral amino acid sequences","volume":"17","author":"Pupko","year":"2000","journal-title":"Mol Biol Evol"},{"key":"2024062809040818300_btae254-B21","doi-asserted-by":"crossref","first-page":"e1000172","DOI":"10.1371\/journal.pcbi.1000172","article-title":"Probabilistic phylogenetic inference with insertions and deletions","volume":"4","author":"Rivas","year":"2008","journal-title":"PLoS Comput Biol"},{"key":"2024062809040818300_btae254-B22","doi-asserted-by":"crossref","first-page":"108010","DOI":"10.1016\/j.biotechadv.2022.108010","article-title":"Insertions and deletions in protein evolution and engineering","volume":"60","author":"Savino","year":"2022","journal-title":"Biotechnol Adv"},{"key":"2024062809040818300_btae254-B23","doi-asserted-by":"crossref","first-page":"15","DOI":"10.1186\/s13015-022-00216-w","article-title":"Treewidth-based algorithms for the small parsimony problem on networks","volume":"17","author":"Scornavacca","year":"2022","journal-title":"Algorithms Mol Biol"},{"key":"2024062809040818300_btae254-B24","doi-asserted-by":"crossref","first-page":"387","DOI":"10.1016\/0958-1669(95)80067-0","article-title":"The emerging role of insertions and deletions in protein engineering","volume":"6","author":"Shortle","year":"1995","journal-title":"Curr Opin Biotechnol"},{"key":"2024062809040818300_btae254-B25","doi-asserted-by":"crossref","first-page":"369","DOI":"10.1093\/sysbio\/49.2.369","article-title":"Gaps as characters in Sequence-Based phylogenetic analyses","volume":"49","author":"Simmons","year":"2000","journal-title":"Systematic Biology"},{"key":"2024062809040818300_btae254-B26","first-page":"265","author":"Snir","year":"2006"},{"key":"2024062809040818300_btae254-B27","doi-asserted-by":"crossref","first-page":"748","DOI":"10.1016\/j.ympev.2012.10.023","article-title":"Incorporating indels as phylogenetic characters: impact for interfamilial relationships within arctoidea (mammalia: Carnivora)","volume":"66","author":"Luan","year":"2013","journal-title":"Molecular Phylogenetics and Evolution"},{"key":"2024062809040818300_btae254-B28","doi-asserted-by":"crossref","first-page":"114","DOI":"10.1007\/BF02193625","article-title":"An evolutionary model for maximum likelihood alignment of DNA sequences","volume":"33","author":"Thorne","year":"1991","journal-title":"J Mol Evol"},{"key":"2024062809040818300_btae254-B29","doi-asserted-by":"crossref","first-page":"3","DOI":"10.1007\/BF00163848","article-title":"Inching toward reality: an improved likelihood model of sequence evolution","volume":"34","author":"Thorne","year":"1992","journal-title":"J Mol Evol"},{"key":"2024062809040818300_btae254-B30","doi-asserted-by":"crossref","first-page":"2399","DOI":"10.1002\/pro.5560051203","article-title":"Protein structural plasticity exemplified by insertion and deletion mutants in T4 lysozyme","volume":"5","author":"Vetter","year":"1996","journal-title":"Protein Sci"},{"key":"2024062809040818300_btae254-B31","doi-asserted-by":"crossref","first-page":"1586","DOI":"10.1093\/molbev\/msm088","article-title":"PAML 4: phylogenetic analysis by maximum likelihood","volume":"24","author":"Yang","year":"2007","journal-title":"Mol Biol Evol"}],"container-title":["Bioinformatics"],"original-title":[],"language":"en","link":[{"URL":"https:\/\/academic.oup.com\/bioinformatics\/article-pdf\/40\/Supplement_1\/i277\/58354750\/btae254.pdf","content-type":"application\/pdf","content-version":"vor","intended-application":"syndication"},{"URL":"https:\/\/academic.oup.com\/bioinformatics\/article-pdf\/40\/Supplement_1\/i277\/58354750\/btae254.pdf","content-type":"unspecified","content-version":"vor","intended-application":"similarity-checking"}],"deposited":{"date-parts":[[2024,6,28]],"date-time":"2024-06-28T05:28:09Z","timestamp":1719552489000},"score":1,"resource":{"primary":{"URL":"https:\/\/academic.oup.com\/bioinformatics\/article\/40\/Supplement_1\/i277\/7700856"}},"subtitle":[],"short-title":[],"issued":{"date-parts":[[2024,6,28]]},"references-count":31,"journal-issue":{"issue":"Supplement_1","published-print":{"date-parts":[[2024,6,28]]}},"URL":"https:\/\/doi.org\/10.1093\/bioinformatics\/btae254","relation":{"has-preprint":[{"id-type":"doi","id":"10.1101\/2024.01.24.577130","asserted-by":"object"}]},"ISSN":["1367-4803","1367-4811"],"issn-type":[{"value":"1367-4803","type":"print"},{"value":"1367-4811","type":"electronic"}],"subject":[],"published-other":{"date-parts":[[2024,7]]},"published":{"date-parts":[[2024,6,28]]}}}