{"status":"ok","message-type":"work","message-version":"1.0.0","message":{"indexed":{"date-parts":[[2026,4,4]],"date-time":"2026-04-04T19:15:15Z","timestamp":1775330115810,"version":"3.50.1"},"reference-count":44,"publisher":"Oxford University Press (OUP)","issue":"Supplement_1","license":[{"start":{"date-parts":[[2022,6,27]],"date-time":"2022-06-27T00:00:00Z","timestamp":1656288000000},"content-version":"vor","delay-in-days":3,"URL":"https:\/\/creativecommons.org\/licenses\/by\/4.0\/"}],"funder":[{"DOI":"10.13039\/100000001","name":"National Science Foundation","doi-asserted-by":"publisher","award":["DBI-1838344"],"award-info":[{"award-number":["DBI-1838344"]}],"id":[{"id":"10.13039\/100000001","id-type":"DOI","asserted-by":"publisher"}]},{"DOI":"10.13039\/100000001","name":"National Science Foundation","doi-asserted-by":"publisher","award":["DBI-1759943"],"award-info":[{"award-number":["DBI-1759943"]}],"id":[{"id":"10.13039\/100000001","id-type":"DOI","asserted-by":"publisher"}]}],"content-domain":{"domain":[],"crossmark-restriction":false},"short-container-title":[],"published-print":{"date-parts":[[2022,6,24]]},"abstract":"<jats:title>Abstract<\/jats:title><jats:sec><jats:title>Motivation<\/jats:title><jats:p>Simulation is an essential technique for generating biomolecular data with a \u2018known\u2019 history for use in validating phylogenetic inference and other evolutionary methods. On longer time scales, simulation supports investigations of equilibrium behavior and provides a formal framework for testing competing evolutionary hypotheses. Twenty years of molecular evolution research have produced a rich repertoire of simulation methods. However, current models do not capture the stringent constraints acting on the domain insertions, duplications, and deletions by which multidomain architectures evolve. 
Although these processes have the potential to generate any combination of domains, only a tiny fraction of possible domain combinations are observed in nature. Modeling these stringent constraints on domain order and co-occurrence is a fundamental challenge in domain architecture simulation that does not arise with sequence and gene family simulation.<\/jats:p><\/jats:sec><jats:sec><jats:title>Results<\/jats:title><jats:p>Here, we introduce a stochastic model of domain architecture evolution to simulate evolutionary trajectories that reflect the constraints on domain order and co-occurrence observed in nature. This framework is implemented in a novel domain architecture simulator, DomArchov, using the Metropolis\u2013Hastings algorithm with data-driven transition probabilities. The use of a data-driven event module enables quick and easy redeployment of the simulator for use in different taxonomic and protein function contexts. Using empirical evaluation with metazoan datasets, we demonstrate that domain architectures simulated by DomArchov recapitulate properties of genuine domain architectures that reflect the constraints on domain order and adjacency seen in nature. This work expands the realm of evolutionary processes that are amenable to simulation.<\/jats:p><\/jats:sec><jats:sec><jats:title>Availability and implementation<\/jats:title><jats:p>DomArchov is written in Python 3 and is available at http:\/\/www.cs.cmu.edu\/~durand\/DomArchov. 
The data underlying this article are available via the same link.<\/jats:p><\/jats:sec><jats:sec><jats:title>Supplementary information<\/jats:title><jats:p>Supplementary data are available at Bioinformatics online.<\/jats:p><\/jats:sec>","DOI":"10.1093\/bioinformatics\/btac242","type":"journal-article","created":{"date-parts":[[2022,4,14]],"date-time":"2022-04-14T11:10:15Z","timestamp":1649934615000},"page":"i134-i142","source":"Crossref","is-referenced-by-count":5,"title":["Simulating domain architecture evolution"],"prefix":"10.1093","volume":"38","author":[{"ORCID":"https:\/\/orcid.org\/0000-0002-9838-553X","authenticated-orcid":false,"given":"Xiaoyue","family":"Cui","sequence":"first","affiliation":[{"name":"Computational Biology, Carnegie Mellon University, Pittsburgh, PA 15213, USA"}],"role":[{"role":"author","vocabulary":"crossref"}]},{"ORCID":"https:\/\/orcid.org\/0000-0003-4068-2707","authenticated-orcid":false,"given":"Yifan","family":"Xue","sequence":"additional","affiliation":[{"name":"Computational Biology, Carnegie Mellon University, Pittsburgh, PA 15213, USA"},{"name":"Department of Biological Sciences, Carnegie Mellon University, Pittsburgh, PA 15213, USA"}],"role":[{"role":"author","vocabulary":"crossref"}]},{"given":"Collin","family":"McCormack","sequence":"additional","affiliation":[{"name":"Computational Biology, Carnegie Mellon University, Pittsburgh, PA 15213, USA"},{"name":"Department of Biological Sciences, Carnegie Mellon University, Pittsburgh, PA 15213, USA"}],"role":[{"role":"author","vocabulary":"crossref"}]},{"given":"Alejandro","family":"Garces","sequence":"additional","affiliation":[{"name":"Department of Biological Sciences, Carnegie Mellon University, Pittsburgh, PA 15213, USA"}],"role":[{"role":"author","vocabulary":"crossref"}]},{"given":"Thomas W","family":"Rachman","sequence":"additional","affiliation":[{"name":"Computational Biology, Carnegie Mellon University, Pittsburgh, PA 15213, 
USA"}],"role":[{"role":"author","vocabulary":"crossref"}]},{"given":"Yang","family":"Yi","sequence":"additional","affiliation":[{"name":"Computational Biology, Carnegie Mellon University, Pittsburgh, PA 15213, USA"},{"name":"Department of Biological Sciences, Carnegie Mellon University, Pittsburgh, PA 15213, USA"}],"role":[{"role":"author","vocabulary":"crossref"}]},{"ORCID":"https:\/\/orcid.org\/0000-0001-5137-9485","authenticated-orcid":false,"given":"Maureen","family":"Stolzer","sequence":"additional","affiliation":[{"name":"Department of Biological Sciences, Carnegie Mellon University, Pittsburgh, PA 15213, USA"}],"role":[{"role":"author","vocabulary":"crossref"}]},{"ORCID":"https:\/\/orcid.org\/0000-0002-3505-6640","authenticated-orcid":false,"given":"Dannie","family":"Durand","sequence":"additional","affiliation":[{"name":"Department of Biological Sciences, Carnegie Mellon University, Pittsburgh, PA 15213, USA"}],"role":[{"role":"author","vocabulary":"crossref"}]}],"member":"286","published-online":{"date-parts":[[2022,6,27]]},"reference":[{"key":"2023041407531413600_","doi-asserted-by":"crossref","first-page":"D376","DOI":"10.1093\/nar\/gkz1064","article-title":"The SCOP database in 2020: expanded classification of representative family and superfamily domains of known protein structures","volume":"48","author":"Andreeva","year":"2020","journal-title":"Nucleic Acids Res"},{"key":"2023041407531413600_","doi-asserted-by":"crossref","first-page":"67","DOI":"10.1023\/A:1026113408773","article-title":"Multi-domain protein families and domain pairs: comparison with known structures and a random model of domain recombination","volume":"4","author":"Apic","year":"2003","journal-title":"J. Struct. Funct. Genomics"},{"key":"2023041407531413600_","doi-asserted-by":"crossref","first-page":"927","DOI":"10.1006\/jmbi.2001.5288","article-title":"The geometry of domain combination in proteins","volume":"315","author":"Bashton","year":"2002","journal-title":"J. Mol. 
Biol"},{"key":"2023041407531413600_","doi-asserted-by":"crossref","first-page":"449","DOI":"10.1101\/gr.6943508","article-title":"Evolution of protein domain promiscuity in eukaryotes","volume":"18","author":"Basu","year":"2008","journal-title":"Genome Res"},{"key":"2023041407531413600_","doi-asserted-by":"crossref","first-page":"205","DOI":"10.1093\/bib\/bbn057","article-title":"Domain mobility in proteins: functional and evolutionary implications","volume":"10","author":"Basu","year":"2009","journal-title":"Brief. Bioinform"},{"key":"2023041407531413600_","doi-asserted-by":"crossref","first-page":"911","DOI":"10.1016\/j.jmb.2005.08.067","article-title":"Domain rearrangements in protein evolution","volume":"353","author":"Bj\u00f6rklund","year":"2005","journal-title":"J. Mol. Biol"},{"key":"2023041407531413600_","doi-asserted-by":"crossref","first-page":"e114","DOI":"10.1371\/journal.pcbi.0020114","article-title":"Expansion of protein domain repeats","volume":"2","author":"Bj\u00f6rklund","year":"2006","journal-title":"PLoS Comput. Biol"},{"key":"2023041407531413600_","doi-asserted-by":"crossref","first-page":"38","DOI":"10.1016\/j.jmb.2010.07.011","article-title":"Nebulin: a study of protein repeat evolution","volume":"402","author":"Bj\u00f6rklund","year":"2010","journal-title":"J. Mol. Biol"},{"key":"2023041407531413600_","doi-asserted-by":"crossref","first-page":"D344","DOI":"10.1093\/nar\/gkaa977","article-title":"The InterPro protein families and domains database: 20 years on","volume":"49","author":"Blum","year":"2021","journal-title":"Nucleic Acids Res"},{"key":"2023041407531413600_","doi-asserted-by":"crossref","DOI":"10.1201\/b10905","volume-title":"Handbook of Markov Chain Monte Carlo","author":"Brooks","year":"2011"},{"key":"2023041407531413600_","doi-asserted-by":"crossref","first-page":"751","DOI":"10.1042\/BST0370751","article-title":"The evolution of protein domain families","volume":"37","author":"Buljan","year":"2009","journal-title":"Biochem. 
Soc. Trans"},{"key":"2023041407531413600_","doi-asserted-by":"crossref","first-page":"R74","DOI":"10.1186\/gb-2010-11-7-r74","article-title":"Quantifying the mechanisms of domain gain in animal proteins","volume":"11","author":"Buljan","year":"2010","journal-title":"Genome Biol"},{"key":"2023041407531413600_","doi-asserted-by":"crossref","first-page":"784","DOI":"10.1039\/C0MB00182A","article-title":"Evolution of domain promiscuity in eukaryotic genomes-a perspective from the inferred ancestral domain architectures","volume":"7","author":"Cohen-Gihon","year":"2011","journal-title":"Mol. Biosyst"},{"key":"2023041407531413600_","doi-asserted-by":"crossref","first-page":"2897","DOI":"10.1093\/gbe\/evu228","article-title":"New tricks for \u201cold\u201d domains: how novel architectures and promiscuous hubs contributed to the organization and evolution of the ECM","volume":"6","author":"Cromar","year":"2014","journal-title":"Genome Biol. Evol"},{"key":"2023041407531413600_","doi-asserted-by":"crossref","first-page":"pii:baw013","DOI":"10.1093\/database\/baw013","article-title":"PhyloPro2.0: a database for the dynamic exploration of phylogenetically conserved proteins and their domain architectures across the Eukarya","volume":"2016","author":"Cromar","year":"2016","journal-title":"Database (Oxford)"},{"key":"2023041407531413600_","doi-asserted-by":"crossref","first-page":"1286","DOI":"10.1093\/bioinformatics\/btz710","article-title":"Zombi: a phylogenetic simulator of trees, genomes and sequences that accounts for dead linages","volume":"36","author":"Dav\u00edn","year":"2020","journal-title":"Bioinformatics"},{"key":"2023041407531413600_","doi-asserted-by":"crossref","first-page":"30","DOI":"10.1186\/s12862-020-1591-0","article-title":"The modular nature of protein evolution: domain rearrangement rates across eukaryotic life","volume":"20","author":"Dohmen","year":"2020","journal-title":"BMC Evol. 
Biol"},{"key":"2023041407531413600_","doi-asserted-by":"crossref","first-page":"457","DOI":"10.1214\/ss\/1177011136","article-title":"Inference from iterative simulation using multiple sequences","volume":"7","author":"Gelman","year":"1992","journal-title":"Stat. Sci"},{"key":"2023041407531413600_","doi-asserted-by":"crossref","first-page":"268","DOI":"10.1093\/nar\/30.1.268","article-title":"SUPERFAMILY: HMMs representing all proteins of known structure. SCOP sequence searches, alignments and genome assignments","volume":"30","author":"Gough","year":"2002","journal-title":"Nucleic Acids Res"},{"key":"2023041407531413600_","doi-asserted-by":"crossref","first-page":"319","DOI":"10.1038\/nrm2144","article-title":"The folding and evolution of multidomain proteins","volume":"8","author":"Han","year":"2007","journal-title":"Nat. Rev. Mol. Cell Biol"},{"key":"2023041407531413600_","volume-title":"Speech and Language Processing: An Introduction to Natural Language Processing, Computational Linguistics, and Speech Recognition","author":"Jurafsky","year":"2008","edition":"2nd edn."},{"key":"2023041407531413600_","doi-asserted-by":"crossref","first-page":"18","DOI":"10.1186\/1471-2148-2-18","article-title":"Birth and death of protein domains: a simple model of evolution explains power law behavior","volume":"2","author":"Karev","year":"2002","journal-title":"BMC Evol. Biol"},{"key":"2023041407531413600_","doi-asserted-by":"crossref","first-page":"32","DOI":"10.1186\/1471-2148-4-32","article-title":"Gene family evolution: an in-depth theoretical and simulation analysis of non-linear birth-death-innovation models","volume":"4","author":"Karev","year":"2004","journal-title":"BMC Evol. Biol"},{"key":"2023041407531413600_","doi-asserted-by":"crossref","first-page":"230","DOI":"10.1111\/j.2517-6161.1949.tb00032.x","article-title":"Stochastic processes and population growth","volume":"11","author":"Kendall","year":"1949","journal-title":"J. R. Stat. Soc. Ser. 
B"},{"key":"2023041407531413600_","doi-asserted-by":"crossref","first-page":"2133","DOI":"10.1093\/molbev\/mss078","article-title":"REvolver: modeling sequence evolution under domain constraints","volume":"29","author":"Koestler","year":"2012","journal-title":"Mol. Biol. Evol"},{"key":"2023041407531413600_","doi-asserted-by":"crossref","first-page":"39","DOI":"10.1186\/1471-2105-10-39","article-title":"Protein domain organisation: adding order","volume":"10","author":"Kummerfeld","year":"2009","journal-title":"BMC Bioinformatics"},{"key":"2023041407531413600_","doi-asserted-by":"crossref","first-page":"25","DOI":"10.1016\/j.tig.2004.11.007","article-title":"Relative rates of gene fusion and fission in multi-domain proteins","author":"Kummerfeld","year":"2005","journal-title":"Trends Genet"},{"key":"2023041407531413600_","doi-asserted-by":"crossref","first-page":"3496","DOI":"10.1093\/bioinformatics\/btz081","article-title":"SaGePhy: an improved phylogenetic simulation framework for gene and subgene evolution","volume":"35","author":"Kundu","year":"2019","journal-title":"Bioinformatics"},{"key":"2023041407531413600_","doi-asserted-by":"crossref","first-page":"D257","DOI":"10.1093\/nar\/gkj079","article-title":"SMART 5: domains in the context of genomes and networks","volume":"34","author":"Letunic","year":"2006","journal-title":"Nucleic Acids Res"},{"key":"2023041407531413600_","doi-asserted-by":"crossref","first-page":"D265","DOI":"10.1093\/nar\/gkz991","article-title":"CDD\/SPARCLE: the conserved domain database in 2020","volume":"48","author":"Lu","year":"2020","journal-title":"Nucleic Acids Res"},{"key":"2023041407531413600_","doi-asserted-by":"crossref","first-page":"334","DOI":"10.1093\/sysbio\/syv082","article-title":"SimPhy: phylogenomic simulation of gene, locus, and species trees","volume":"65","author":"Mallo","year":"2016","journal-title":"Syst. 
Biol"},{"key":"2023041407531413600_","doi-asserted-by":"crossref","first-page":"751","DOI":"10.1126\/science.285.5428.751","article-title":"Detecting protein function and protein-protein interactions from genome sequences","volume":"285","author":"Marcotte","year":"1999","journal-title":"Science"},{"key":"2023041407531413600_","doi-asserted-by":"crossref","first-page":"D412","DOI":"10.1093\/nar\/gkaa913","article-title":"PFAM: the protein families database in 2021","volume":"49","author":"Mistry","year":"2021","journal-title":"Nucleic Acids Res"},{"key":"2023041407531413600_","doi-asserted-by":"crossref","first-page":"444","DOI":"10.1016\/j.tibs.2008.05.008","article-title":"Arrangements in the modular evolution of proteins","volume":"33","author":"Moore","year":"2008","journal-title":"Trends Biochem. Sci"},{"key":"2023041407531413600_","doi-asserted-by":"crossref","first-page":"351","DOI":"10.1089\/cmb.2006.13.351","article-title":"Graph theoretical insights into evolution of multidomain proteins","volume":"13","author":"Przytycka","year":"2006","journal-title":"J. Comput. Biol"},{"key":"2023041407531413600_","doi-asserted-by":"crossref","first-page":"3170","DOI":"10.1093\/molbev\/msw194","article-title":"Evolution of protein domain repeats in metazoa","volume":"33","author":"Sch\u00fcler","year":"2016","journal-title":"Mol. Biol. Evol"},{"key":"2023041407531413600_","doi-asserted-by":"crossref","first-page":"209","DOI":"10.1186\/1471-2105-14-209","article-title":"GenPhyloData: realistic simulation of gene family evolution","volume":"14","author":"Sj\u00f6strand","year":"2013","journal-title":"BMC Bioinformatics"},{"key":"2023041407531413600_","doi-asserted-by":"crossref","first-page":"9","DOI":"10.1016\/S0168-9525(99)01924-1","article-title":"Genome evolution. 
Gene fusion versus gene fission","volume":"16","author":"Snel","year":"2000","journal-title":"Trends Genet"},{"key":"2023041407531413600_","doi-asserted-by":"crossref","first-page":"2581","DOI":"10.1093\/molbev\/msp174","article-title":"Biological sequence simulation for testing complex evolutionary hypotheses: indel-Seq-Gen version 2.0","volume":"26","author":"Strope","year":"2009","journal-title":"Mol. Biol. Evol"},{"key":"2023041407531413600_","doi-asserted-by":"crossref","first-page":"5064","DOI":"10.1111\/j.1742-4658.2005.04917.x","article-title":"Modules, multidomain proteins and organismic complexity","volume":"272","author":"Tordai","year":"2005","journal-title":"FEBS J"},{"key":"2023041407531413600_","doi-asserted-by":"crossref","first-page":"208","DOI":"10.1016\/j.sbi.2004.03.011","article-title":"Structure, function and evolution of multidomain proteins","volume":"14","author":"Vogel","year":"2004","journal-title":"Curr. Opin. Struct. Biol"},{"key":"2023041407531413600_","doi-asserted-by":"crossref","first-page":"355","DOI":"10.1016\/j.jmb.2004.11.050","article-title":"The relationship between domain duplication and recombination","volume":"346","author":"Vogel","year":"2005","journal-title":"J. Mol. 
Biol"},{"key":"2023041407531413600_","doi-asserted-by":"crossref","first-page":"2037","DOI":"10.1111\/j.1742-4658.2006.05220.x","article-title":"Domain deletions and substitutions in the modular protein evolution","volume":"273","author":"Weiner","year":"2006","journal-title":"FEBS J"},{"key":"2023041407531413600_","doi-asserted-by":"crossref","first-page":"343","DOI":"10.1101\/gr.1610504","article-title":"Comparative analysis of protein domain organization","volume":"14","author":"Ye","year":"2004","journal-title":"Genome Res"}],"container-title":["Bioinformatics"],"original-title":[],"language":"en","link":[{"URL":"https:\/\/academic.oup.com\/bioinformatics\/article-pdf\/38\/Supplement_1\/i134\/49886656\/btac242.pdf","content-type":"application\/pdf","content-version":"vor","intended-application":"syndication"},{"URL":"https:\/\/academic.oup.com\/bioinformatics\/article-pdf\/38\/Supplement_1\/i134\/49886656\/btac242.pdf","content-type":"unspecified","content-version":"vor","intended-application":"similarity-checking"}],"deposited":{"date-parts":[[2024,9,22]],"date-time":"2024-09-22T06:55:50Z","timestamp":1726988150000},"score":1,"resource":{"primary":{"URL":"https:\/\/academic.oup.com\/bioinformatics\/article\/38\/Supplement_1\/i134\/6617482"}},"subtitle":[],"short-title":[],"issued":{"date-parts":[[2022,6,24]]},"references-count":44,"journal-issue":{"issue":"Supplement_1","published-print":{"date-parts":[[2022,6,24]]}},"URL":"https:\/\/doi.org\/10.1093\/bioinformatics\/btac242","relation":{},"ISSN":["1367-4803","1367-4811"],"issn-type":[{"value":"1367-4803","type":"print"},{"value":"1367-4811","type":"electronic"}],"subject":[],"published-other":{"date-parts":[[2022,7,1]]},"published":{"date-parts":[[2022,6,24]]}}}