{"status":"ok","message-type":"work","message-version":"1.0.0","message":{"indexed":{"date-parts":[[2026,3,19]],"date-time":"2026-03-19T05:05:22Z","timestamp":1773896722735,"version":"3.50.1"},"reference-count":51,"publisher":"Oxford University Press (OUP)","issue":"19","license":[{"start":{"date-parts":[[2019,3,15]],"date-time":"2019-03-15T00:00:00Z","timestamp":1552608000000},"content-version":"vor","delay-in-days":1,"URL":"https:\/\/academic.oup.com\/journals\/pages\/open_access\/funder_policies\/chorus\/standard_publication_model"}],"funder":[{"DOI":"10.13039\/100000001","name":"National Science Foundation","doi-asserted-by":"publisher","award":["IOS 1546858"],"award-info":[{"award-number":["IOS 1546858"]}],"id":[{"id":"10.13039\/100000001","id-type":"DOI","asserted-by":"publisher"}]},{"name":"Orphan Genes: An Untapped Genetic Reservoir of Novel Traits"},{"name":"Center for Metabolic Biology, Iowa State University"}],"content-domain":{"domain":[],"crossmark-restriction":false},"short-container-title":[],"published-print":{"date-parts":[[2019,10,1]]},"abstract":"<jats:title>Abstract<\/jats:title>\n                  <jats:sec>\n                    <jats:title>Motivation<\/jats:title>\n                    <jats:p>The goal of phylostratigraphy is to infer the evolutionary origin of each gene in an organism. This is done by searching for homologs within increasingly broad clades. The deepest clade that contains a homolog of the protein(s) encoded by a gene is that gene\u2019s phylostratum.<\/jats:p>\n                  <\/jats:sec>\n                  <jats:sec>\n                    <jats:title>Results<\/jats:title>\n                    <jats:p>We have created a general R-based framework, phylostratr, to estimate the phylostratum of every gene in a species. The program fully automates analysis: selecting species for balanced representation, retrieving sequences, building databases, inferring phylostrata and returning diagnostics. Key diagnostics include: detection of genes with inferred homologs in old clades, but not intermediate ones; proteome quality assessments; false-positive diagnostics, and checks for missing organellar genomes. phylostratr allows extensive customization and systematic comparisons of the influence of analysis parameters or genomes on phylostrata inference. A user may: modify the automatically generated clade tree or use their own tree; provide custom sequences in place of those automatically retrieved from UniProt; replace BLAST with an alternative algorithm; or tailor the method and sensitivity of the homology inference classifier. We show the utility of phylostratr through case studies in Arabidopsis thaliana and Saccharomyces cerevisiae.<\/jats:p>\n                  <\/jats:sec>\n                  <jats:sec>\n                    <jats:title>Availability and implementation<\/jats:title>\n                    <jats:p>Source code available at https:\/\/github.com\/arendsee\/phylostratr.<\/jats:p>\n                  <\/jats:sec>\n                  <jats:sec>\n                    <jats:title>Supplementary information<\/jats:title>\n                    <jats:p>Supplementary data are available at Bioinformatics online.<\/jats:p>\n                  <\/jats:sec>","DOI":"10.1093\/bioinformatics\/btz171","type":"journal-article","created":{"date-parts":[[2019,3,13]],"date-time":"2019-03-13T16:12:44Z","timestamp":1552493564000},"page":"3617-3627","source":"Crossref","is-referenced-by-count":52,"title":["<tt>phylostratr<\/tt>\n                    : a framework for phylostratigraphy"],"prefix":"10.1093","volume":"35","author":[{"given":"Zebulun","family":"Arendsee","sequence":"first","affiliation":[{"name":"Bioinformatics and Computational Biology Program, Iowa State University , Ames, IA, USA"},{"name":"Genetics, Development, and Cell Biology, Iowa State University , Ames, IA, USA"},{"name":"Center for Metabolic Biology, Iowa State University , Ames, IA, USA"}]},{"given":"Jing","family":"Li","sequence":"additional","affiliation":[{"name":"Bioinformatics and Computational Biology Program, Iowa State University , Ames, IA, USA"},{"name":"Genetics, Development, and Cell Biology, Iowa State University , Ames, IA, USA"}]},{"given":"Urminder","family":"Singh","sequence":"additional","affiliation":[{"name":"Bioinformatics and Computational Biology Program, Iowa State University , Ames, IA, USA"},{"name":"Genetics, Development, and Cell Biology, Iowa State University , Ames, IA, USA"}]},{"given":"Arun","family":"Seetharam","sequence":"additional","affiliation":[{"name":"Genetics, Development, and Cell Biology, Iowa State University , Ames, IA, USA"},{"name":"Genome Informatics Facility, Iowa State University , Ames, IA, USA"}]},{"given":"Karin","family":"Dorman","sequence":"additional","affiliation":[{"name":"Bioinformatics and Computational Biology Program, Iowa State University , Ames, IA, USA"},{"name":"Genetics, Development, and Cell Biology, Iowa State University , Ames, IA, USA"},{"name":"Department of Statistics, Iowa State University , Ames, IA, USA"}]},{"given":"Eve Syrkin","family":"Wurtele","sequence":"additional","affiliation":[{"name":"Bioinformatics and Computational Biology Program, Iowa State University , Ames, IA, USA"},{"name":"Genetics, Development, and Cell Biology, Iowa State University , Ames, IA, USA"},{"name":"Center for Metabolic Biology, Iowa State University , Ames, IA, USA"}]}],"member":"286","published-online":{"date-parts":[[2019,3,14]]},"reference":[{"key":"2023020108350934600_btz171-B1","doi-asserted-by":"crossref","first-page":"698","DOI":"10.1016\/j.tplants.2014.07.003","article-title":"Coming of age: orphan genes in plants","volume":"19","author":"Arendsee","year":"2014","journal-title":"Trends Plant Sci"},{"key":"2023020108350934600_btz171-B2","doi-asserted-by":"crossref","first-page":"280","DOI":"10.1126\/science.1137614","article-title":"Protein sequences from mastodon and Tyrannosaurus rex revealed by mass spectrometry","volume":"316","author":"Asara","year":"2007","journal-title":"Science"},{"key":"2023020108350934600_btz171-B3","doi-asserted-by":"crossref","first-page":"32","DOI":"10.1016\/j.plantsci.2017.10.014","article-title":"Raising orphans from a metadata morass: a researcher\u2019s guide to re-use of public \u2019omics data","volume":"267","author":"Bhandary","year":"2018","journal-title":"Plant Sci"},{"key":"2023020108350934600_btz171-B4","doi-asserted-by":"crossref","first-page":"1","DOI":"10.1146\/annurev-genet-120215-035329","article-title":"Witnessing genome evolution: experimental reconstruction of endosymbiotic and horizontal gene transfer","volume":"51","author":"Bock","year":"2017","journal-title":"Annu. Rev. Genet"},{"key":"2023020108350934600_btz171-B5","doi-asserted-by":"crossref","first-page":"370","DOI":"10.1038\/nature11184","article-title":"Proto-genes and de novo gene birth","volume":"487","author":"Carvunis","year":"2012","journal-title":"Nature"},{"key":"2023020108350934600_btz171-B6","first-page":"2906","article-title":"From de novo to \u201cde nono\u201d: the majority of novel protein-coding genes identified with phylostratigraphy are old genes or recent duplicates","volume":"10","author":"Casola","year":"2018","journal-title":"Genome Biol. Evol"},{"key":"2023020108350934600_btz171-B7","doi-asserted-by":"crossref","first-page":"3811","DOI":"10.1073\/pnas.94.8.3811","article-title":"Evolution of antifreeze glycoprotein gene from a trypsinogen gene in Antarctic notothenioid fish","volume":"94","author":"Chen","year":"1997","journal-title":"Proc. Natl. Acad. Sci. USA"},{"key":"2023020108350934600_btz171-B8","doi-asserted-by":"crossref","first-page":"789","DOI":"10.1111\/tpj.13415","article-title":"Araport11: a complete reannotation of the Arabidopsis thaliana reference genome","volume":"89","author":"Cheng","year":"2017","journal-title":"Plant J"},{"key":"2023020108350934600_btz171-B9","doi-asserted-by":"crossref","first-page":"1556","DOI":"10.1093\/molbev\/msv047","article-title":"A \u201cdevelopmental hourglass\u201d in fungi","volume":"32","author":"Cheng","year":"2015","journal-title":"Mol. Biol. Evol"},{"key":"2023020108350934600_btz171-B10","first-page":"43, D204\u2013D212","article-title":"UniProt: a hub for protein information","year":"2014","journal-title":"Nucleic Acids Res"},{"key":"2023020108350934600_btz171-B11","doi-asserted-by":"crossref","first-page":"533","DOI":"10.1016\/j.tig.2007.08.014","article-title":"A phylostratigraphy approach to uncover the genomic history of major adaptations in metazoan lineages","volume":"23","author":"Domazet-Lo\u0161o","year":"2007","journal-title":"Trends Genet"},{"key":"2023020108350934600_btz171-B12","first-page":"843","article-title":"No evidence for phylostratigraphic bias impacting inferences on patterns of gene emergence and evolution","volume":"34","author":"Domazet-Lo\u0161o","year":"2017","journal-title":"Mol. Biol. Evol"},{"key":"2023020108350934600_btz171-B13","doi-asserted-by":"crossref","first-page":"1221","DOI":"10.1093\/molbev\/msv012","article-title":"Evidence for active maintenance of phylotranscriptomic hourglass patterns in animal and plant embryogenesis","volume":"32","author":"Drost","year":"2015","journal-title":"Mol. Biol. Evol"},{"key":"2023020108350934600_btz171-B14","doi-asserted-by":"crossref","first-page":"1589","DOI":"10.1093\/bioinformatics\/btx835","article-title":"myTAI: evolutionary transcriptomics with R","volume":"34","author":"Drost","year":"2018","journal-title":"Bioinformatics"},{"key":"2023020108350934600_btz171-B15","doi-asserted-by":"crossref","first-page":"793","DOI":"10.1126\/science.1086132","article-title":"Structural dynamics of eukaryotic chromosome evolution","volume":"301","author":"Eichler","year":"2003","journal-title":"Science"},{"key":"2023020108350934600_btz171-B16","doi-asserted-by":"crossref","first-page":"2053","DOI":"10.1093\/bioinformatics\/btw122","article-title":"ORFanFinder: automated identification of taxonomically restricted orphan genes","volume":"32","author":"Ekstrom","year":"2016","journal-title":"Bioinformatics"},{"key":"2023020108350934600_btz171-B17","volume-title":"Statistical Methods in Bioinformatics: An Introduction","author":"Ewens","year":"2006"},{"key":"2023020108350934600_btz171-B18","doi-asserted-by":"crossref","first-page":"D136","DOI":"10.1093\/nar\/gkr1178","article-title":"The NCBI taxonomy database","volume":"40","author":"Federhen","year":"2012","journal-title":"Nucleic Acids Res"},{"key":"2023020108350934600_btz171-B19","doi-asserted-by":"crossref","first-page":"W30","DOI":"10.1093\/nar\/gkv397","article-title":"Hmmer web server: 2015 update","volume":"43","author":"Finn","year":"2015","journal-title":"Nucleic Acids Res"},{"key":"2023020108350934600_btz171-B20","doi-asserted-by":"crossref","first-page":"23","DOI":"10.1007\/s10142-013-0345-0","article-title":"Horizontal gene transfer in plants","volume":"14","author":"Gao","year":"2014","journal-title":"Funct. Integr. Genomics"},{"key":"2023020108350934600_btz171-B21","doi-asserted-by":"crossref","first-page":"1161","DOI":"10.1126\/science.860134","article-title":"Evolution and tinkering","volume":"196","author":"Jacob","year":"1977","journal-title":"Science"},{"key":"2023020108350934600_btz171-B22","doi-asserted-by":"crossref","first-page":"531","DOI":"10.1093\/gbe\/evz008","article-title":"The evolutionary traceability of a protein","volume":"11","author":"Jain","year":"2019","journal-title":"Genome Biol. Evol"},{"key":"2023020108350934600_btz171-B23","doi-asserted-by":"crossref","first-page":"e50226.","DOI":"10.1371\/journal.pone.0050226","article-title":"Evaluating methods for isolating total RNA and predicting the success of sequencing phylogenetically diverse plant transcriptomes","volume":"7","author":"Johnson","year":"2012","journal-title":"PLoS One"},{"key":"2023020108350934600_btz171-B24","doi-asserted-by":"crossref","first-page":"1313","DOI":"10.1101\/gr.101386.109","article-title":"Origins, evolution, and phenotypic impact of new genes","volume":"20","author":"Kaessmann","year":"2010","journal-title":"Genome Res"},{"key":"2023020108350934600_btz171-B25","doi-asserted-by":"crossref","first-page":"404","DOI":"10.1016\/j.tig.2009.07.006","article-title":"More than just orphans: are taxonomically-restricted genes important in evolution?","volume":"25","author":"Khalturin","year":"2009","journal-title":"Trends Genet"},{"key":"2023020108350934600_btz171-B26","doi-asserted-by":"crossref","first-page":"2605","DOI":"10.1111\/febs.14504","article-title":"Origins and structural properties of novel and de novo protein domains during insect evolution","volume":"285","author":"Klasberg","year":"2018","journal-title":"The FEBS J"},{"key":"2023020108350934600_btz171-B27","doi-asserted-by":"crossref","first-page":"R66.","DOI":"10.1186\/gb-2013-14-6-r66","article-title":"Separating homeologs by phasing in the tetraploid wheat transcriptome","volume":"14","author":"Krasileva","year":"2013","journal-title":"Genome Biol"},{"key":"2023020108350934600_btz171-B28","doi-asserted-by":"crossref","first-page":"1812","DOI":"10.1093\/gbe\/evw113","article-title":"Towards consensus gene ages","volume":"8","author":"Liebeskind","year":"2016","journal-title":"Genome Biol. Evol"},{"key":"2023020108350934600_btz171-B29","doi-asserted-by":"crossref","first-page":"2823","DOI":"10.1093\/molbev\/msx210","article-title":"A comprehensive analysis of transcript-supported de novo genes in saccharomyces sensu stricto yeasts","volume":"34","author":"Lu","year":"2017","journal-title":"Mol. Biol. Evol"},{"key":"2023020108350934600_btz171-B30","doi-asserted-by":"crossref","first-page":"17.","DOI":"10.1186\/2047-217X-3-17","article-title":"Data access for the 1,000 plants (1KP) project","volume":"3","author":"Matasci","year":"2014","journal-title":"Gigascience"},{"key":"2023020108350934600_btz171-B31","doi-asserted-by":"crossref","first-page":"567.","DOI":"10.1038\/nrg.2016.78","article-title":"Open questions in the study of de novo genes: what, how and why","volume":"17","author":"McLysaght","year":"2016","journal-title":"Nat. Rev. Genet"},{"key":"2023020108350934600_btz171-B32","doi-asserted-by":"crossref","first-page":"3579","DOI":"10.1073\/pnas.1517551113","article-title":"Protein networks identify novel symbiogenetic genes resulting from plastid endosymbiosis","volume":"113","author":"M\u00e9heust","year":"2016","journal-title":"Proc. Natl. Acad. Sci. USA"},{"key":"2023020108350934600_btz171-B33","doi-asserted-by":"crossref","first-page":"258","DOI":"10.1093\/molbev\/msu286","article-title":"Phylostratigraphic bias creates spurious patterns of genome evolution","volume":"32","author":"Moyers","year":"2015","journal-title":"Mol. Biol. Evol"},{"key":"2023020108350934600_btz171-B34","doi-asserted-by":"crossref","first-page":"1245","DOI":"10.1093\/molbev\/msw008","article-title":"Evaluating phylostratigraphic evidence for widespread de novo gene birth in genome evolution","volume":"33","author":"Moyers","year":"2016","journal-title":"Mol. Biol. Evol"},{"key":"2023020108350934600_btz171-B35","doi-asserted-by":"crossref","first-page":"1519","DOI":"10.1093\/gbe\/evx109","article-title":"Further simulations and analyses demonstrate open problems of phylostratigraphy","volume":"9","author":"Moyers","year":"2017","journal-title":"Genome Biol. Evol"},{"key":"2023020108350934600_btz171-B36","doi-asserted-by":"crossref","first-page":"2037","DOI":"10.1093\/gbe\/evy161","article-title":"Toward reducing phylostratigraphic errors and biases","volume":"10","author":"Moyers","year":"2018","journal-title":"Genome Biol. Evol"},{"key":"2023020108350934600_btz171-B37","doi-asserted-by":"crossref","first-page":"117.","DOI":"10.1186\/1471-2164-14-117","article-title":"Phylogenetic patterns of emergence of new genes support a model of frequent de novo evolution","volume":"14","author":"Neme","year":"2013","journal-title":"BMC Genomics"},{"key":"2023020108350934600_btz171-B38","doi-asserted-by":"crossref","first-page":"3","DOI":"10.1002\/0471250953.bi0301s42","article-title":"An introduction to sequence similarity (\u201chomology\u201d) searching","volume":"42","author":"Pearson","year":"2013","journal-title":"Curr. Protoc. Bioinf"},{"key":"2023020108350934600_btz171-B39","doi-asserted-by":"crossref","first-page":"e32","DOI":"10.1093\/nar\/gkq953","article-title":"Metaphors: orthology and paralogy predictions from multiple phylogenetic evidence using a consistency-based confidence score","volume":"39","author":"Pryszcz","year":"2011","journal-title":"Nucleic Acids Res"},{"key":"2023020108350934600_btz171-B40","doi-asserted-by":"crossref","first-page":"85.","DOI":"10.1186\/s13059-017-1214-2","article-title":"Horizontal gene transfer is not a hallmark of the human genome","volume":"18","author":"Salzberg","year":"2017","journal-title":"Genome Biol"},{"key":"2023020108350934600_btz171-B41","doi-asserted-by":"crossref","first-page":"299","DOI":"10.1093\/molbev\/msu319","article-title":"Phylostratigraphic profiles in zebrafish uncover chordate origins of the vertebrate brain","volume":"32","author":"\u0160estak","year":"2015","journal-title":"Mol. Biol. Evol"},{"key":"2023020108350934600_btz171-B42","first-page":"451","article-title":"Heterogeneous molecular processes among the causes of how sequence similarity scores can fail to recapitulate phylogeny","volume":"18","author":"Smith","year":"2017","journal-title":"Briefings Bioinf"},{"key":"2023020108350934600_btz171-B43","doi-asserted-by":"crossref","first-page":"563.","DOI":"10.3390\/genes9110563","article-title":"Legume cytosolic and plastid acetyl-coenzyme\u2014a carboxylase genes differ by evolutionary patterns and selection pressure schemes acting before and after whole-genome duplications","volume":"9","author":"Szczepaniak","year":"2018","journal-title":"Genes"},{"key":"2023020108350934600_btz171-B44","doi-asserted-by":"crossref","first-page":"692","DOI":"10.1038\/nrg3053","article-title":"The evolutionary origin of orphan genes","volume":"12","author":"Tautz","year":"2011","journal-title":"Nat. Rev. Genet"},{"key":"2023020108350934600_btz171-B45","doi-asserted-by":"crossref","first-page":"2716","DOI":"10.1093\/gbe\/evy183","article-title":"Shared transcriptional control and disparate gain and loss of aphid parasitism genes","volume":"10","author":"Thorpe","year":"2018","journal-title":"Genome Biol. Evol"},{"key":"2023020108350934600_btz171-B46","doi-asserted-by":"crossref","first-page":"631","DOI":"10.1093\/molbev\/msx315","article-title":"A molecular portrait of de novo genes in yeasts","volume":"35","author":"Vakirlis","year":"2018","journal-title":"Mol. Biol. Evol"},{"key":"2023020108350934600_btz171-B47","doi-asserted-by":"crossref","first-page":"E4859","DOI":"10.1073\/pnas.1323926111","article-title":"Phylotranscriptomic analysis of the origin and early diversification of land plants","volume":"111","author":"Wickett","year":"2014","journal-title":"Proc. Natl. Acad. Sci. USA"},{"key":"2023020108350934600_btz171-B48","doi-asserted-by":"crossref","first-page":"e01024","DOI":"10.1128\/mBio.01024-18","article-title":"Tracing the de novo origin of protein-coding genes in yeast","volume":"9","author":"Wu","year":"2018","journal-title":"MBio"},{"key":"2023020108350934600_btz171-B49","doi-asserted-by":"crossref","first-page":"1660","DOI":"10.1093\/bioinformatics\/btu077","article-title":"SOAPdenovo-trans: de novo transcriptome assembly with short RNA-seq reads","volume":"30","author":"Xie","year":"2014","journal-title":"Bioinformatics"},{"key":"2023020108350934600_btz171-B50","doi-asserted-by":"crossref","first-page":"1152","DOI":"10.1038\/ncomms2148","article-title":"Widespread impact of horizontal gene transfer on plant colonization of land","volume":"3","author":"Yue","year":"2012","journal-title":"Nat. Commun"},{"key":"2023020108350934600_btz171-B51","doi-asserted-by":"crossref","first-page":"649","DOI":"10.1101\/gr.7.6.649","article-title":"Powerblast: a new network blast application for interactive or automated sequence analysis and annotation","volume":"7","author":"Zhang","year":"1997","journal-title":"Genome Res"}],"container-title":["Bioinformatics"],"original-title":[],"language":"en","link":[{"URL":"https:\/\/academic.oup.com\/bioinformatics\/article-pdf\/35\/19\/3617\/48976930\/bioinformatics_35_19_3617.pdf","content-type":"application\/pdf","content-version":"vor","intended-application":"syndication"},{"URL":"https:\/\/academic.oup.com\/bioinformatics\/article-pdf\/35\/19\/3617\/48976930\/bioinformatics_35_19_3617.pdf","content-type":"unspecified","content-version":"vor","intended-application":"similarity-checking"}],"deposited":{"date-parts":[[2023,2,1]],"date-time":"2023-02-01T14:42:42Z","timestamp":1675262562000},"score":1,"resource":{"primary":{"URL":"https:\/\/academic.oup.com\/bioinformatics\/article\/35\/19\/3617\/5380767"}},"subtitle":[],"editor":[{"given":"Russell","family":"Schwartz","sequence":"additional","affiliation":[]}],"short-title":[],"issued":{"date-parts":[[2019,3,14]]},"references-count":51,"journal-issue":{"issue":"19","published-print":{"date-parts":[[2019,10,1]]}},"URL":"https:\/\/doi.org\/10.1093\/bioinformatics\/btz171","relation":{"has-preprint":[{"id-type":"doi","id":"10.1101\/360164","asserted-by":"object"}]},"ISSN":["1367-4803","1367-4811"],"issn-type":[{"value":"1367-4803","type":"print"},{"value":"1367-4811","type":"electronic"}],"subject":[],"published-other":{"date-parts":[[2019,10,1]]},"published":{"date-parts":[[2019,3,14]]}}}