{"status":"ok","message-type":"work","message-version":"1.0.0","message":{"indexed":{"date-parts":[[2026,2,6]],"date-time":"2026-02-06T05:15:58Z","timestamp":1770354958323,"version":"3.49.0"},"reference-count":50,"publisher":"Oxford University Press (OUP)","issue":"Supplement_1","license":[{"start":{"date-parts":[[2020,7,13]],"date-time":"2020-07-13T00:00:00Z","timestamp":1594598400000},"content-version":"vor","delay-in-days":12,"URL":"http:\/\/creativecommons.org\/licenses\/by-nc\/4.0\/"}],"funder":[{"DOI":"10.13039\/100000001","name":"National Science Foundation","doi-asserted-by":"publisher","id":[{"id":"10.13039\/100000001","id-type":"DOI","asserted-by":"publisher"}]},{"DOI":"10.13039\/100000001","name":"NSF","doi-asserted-by":"publisher","award":["NSF-1815485"],"award-info":[{"award-number":["NSF-1815485"]}],"id":[{"id":"10.13039\/100000001","id-type":"DOI","asserted-by":"publisher"}]},{"DOI":"10.13039\/100000001","name":"NSF","doi-asserted-by":"publisher","award":["NSF-1845967"],"award-info":[{"award-number":["NSF-1845967"]}],"id":[{"id":"10.13039\/100000001","id-type":"DOI","asserted-by":"publisher"}]},{"DOI":"10.13039\/100016353","name":"San Diego Supercomputer Center","doi-asserted-by":"crossref","id":[{"id":"10.13039\/100016353","id-type":"DOI","asserted-by":"crossref"}]},{"DOI":"10.13039\/100000001","name":"NSF","doi-asserted-by":"publisher","award":["ACI-1053575"],"award-info":[{"award-number":["ACI-1053575"]}],"id":[{"id":"10.13039\/100000001","id-type":"DOI","asserted-by":"publisher"}]}],"content-domain":{"domain":[],"crossmark-restriction":false},"short-container-title":[],"published-print":{"date-parts":[[2020,7,1]]},"abstract":"<jats:title>Abstract<\/jats:title>\n               <jats:sec>\n                  <jats:title>Motivation<\/jats:title>\n                  <jats:p>Consider a simple computational problem. The inputs are (i) the set of mixed reads generated from a sample that combines two organisms and (ii) separate sets of reads for several reference genomes of known origins. The goal is to find the two organisms that constitute the mixed sample. When constituents are absent from the reference set, we seek to phylogenetically position them with respect to the underlying tree of the reference species. This simple yet fundamental problem (which we call phylogenetic double-placement) has enjoyed surprisingly little attention in the literature. As genome skimming (low-pass sequencing of genomes at low coverage, precluding assembly) becomes more prevalent, this problem finds wide-ranging applications in areas as varied as biodiversity research, food production and provenance, and evolutionary reconstruction.<\/jats:p>\n               <\/jats:sec>\n               <jats:sec>\n                  <jats:title>Results<\/jats:title>\n                  <jats:p>We introduce a model that relates distances between a mixed sample and reference species to the distances between constituents and reference species. Our model is based on Jaccard indices computed between each sample represented as k-mer sets. The model, built on several assumptions and approximations, allows us to formalize the phylogenetic double-placement problem as a non-convex optimization problem that decomposes mixture distances and performs phylogenetic placement simultaneously. Using a variety of techniques, we are able to solve this optimization problem numerically. We test the resulting method, called MIxed Sample Analysis tool (MISA), on a varied set of simulated and biological datasets. Despite all the assumptions used, the method performs remarkably well in practice.<\/jats:p>\n               <\/jats:sec>\n               <jats:sec>\n                  <jats:title>Availability and implementation<\/jats:title>\n                  <jats:p>The software and data are available at https:\/\/github.com\/balabanmetin\/misa and https:\/\/github.com\/balabanmetin\/misa-data.<\/jats:p>\n               <\/jats:sec>\n               <jats:sec>\n                  <jats:title>Supplementary information<\/jats:title>\n                  <jats:p>Supplementary data are available at Bioinformatics online.<\/jats:p>\n               <\/jats:sec>","DOI":"10.1093\/bioinformatics\/btaa489","type":"journal-article","created":{"date-parts":[[2020,5,7]],"date-time":"2020-05-07T19:10:19Z","timestamp":1588878619000},"page":"i335-i343","source":"Crossref","is-referenced-by-count":13,"title":["Phylogenetic double placement of mixed samples"],"prefix":"10.1093","volume":"36","author":[{"given":"Metin","family":"Balaban","sequence":"first","affiliation":[{"name":"Bioinformatics and Systems Biology Department, University of California San Diego , San Diego, CA 92093, USA"}]},{"given":"Siavash","family":"Mirarab","sequence":"additional","affiliation":[{"name":"Electrical and Computer Engineering Department, University of California San Diego , San Diego, CA 92093, USA"}]}],"member":"286","published-online":{"date-parts":[[2020,7,13]]},"reference":[{"key":"2024021913333914000_btaa489-B1","first-page":"566","author":"Balaban","year":"2020"},{"key":"2024021913333914000_btaa489-B2","doi-asserted-by":"crossref","first-page":"365","DOI":"10.1093\/sysbio\/syy054","article-title":"EPA-ng: massively parallel evolutionary placement of genetic sequences","volume":"68","author":"Barbera","year":"2019","journal-title":"System. Biol"},{"key":"2024021913333914000_btaa489-B3","first-page":"896","author":"Boyd","year":"2017"},{"key":"2024021913333914000_btaa489-B4","doi-asserted-by":"crossref","first-page":"673","DOI":"10.1038\/nmeth.1358","article-title":"Phymm and PhymmBL: metagenomic phylogenetic classification with interpolated Markov models","volume":"6","author":"Brady","year":"2009","journal-title":"Nat. Methods"},{"key":"2024021913333914000_btaa489-B5","author":"Bushnell","year":"2014"},{"key":"2024021913333914000_btaa489-B6","first-page":"233","article-title":"Phylogenetic analysis. Models and estimation procedures","volume":"19","author":"Cavalli-Sforza","year":"1967","journal-title":"Am. J. Hum. Genet"},{"key":"2024021913333914000_btaa489-B7","author":"Conn","year":"2000"},{"key":"2024021913333914000_btaa489-B8","doi-asserted-by":"crossref","first-page":"2296","DOI":"10.1093\/bioinformatics\/btn436","article-title":"Markov model plus k-word distributions: a synergy that produces novel statistical measures for sequence comparison","volume":"24","author":"Dai","year":"2008","journal-title":"Bioinformatics"},{"key":"2024021913333914000_btaa489-B9","doi-asserted-by":"crossref","first-page":"749","DOI":"10.1007\/s00442-017-3968-3","article-title":"Nutritional composition of honey bee food stores vary with floral composition","volume":"185","author":"Donkersley","year":"2017","journal-title":"Oecologia"},{"key":"2024021913333914000_btaa489-B10","doi-asserted-by":"crossref","first-page":"1610","DOI":"10.1101\/gr.076075.108","article-title":"Reconstruction of the genome origins and evolution of the hybrid lager yeast Saccharomyces pastorianus","volume":"18","author":"Dunn","year":"2008","journal-title":"Genome Res"},{"key":"2024021913333914000_btaa489-B11","doi-asserted-by":"crossref","first-page":"522","DOI":"10.1186\/s12864-015-1647-5","article-title":"An assembly and alignment-free method of phylogeny reconstruction from next-generation sequencing data","volume":"16","author":"Fan","year":"2015","journal-title":"BMC Genomics"},{"key":"2024021913333914000_btaa489-B12","doi-asserted-by":"crossref","first-page":"279","DOI":"10.1126\/science.155.3760.279","article-title":"Construction of phylogenetic trees","volume":"155","author":"Fitch","year":"1967","journal-title":"Science"},{"key":"2024021913333914000_btaa489-B13","doi-asserted-by":"crossref","first-page":"68","DOI":"10.1093\/bioinformatics\/14.1.68","article-title":"SplitsTree: analyzing and visualizing evolutionary data","volume":"14","author":"Huson","year":"1998","journal-title":"Bioinformatics"},{"key":"2024021913333914000_btaa489-B14","first-page":"21","article-title":"Evolution of protein molecules","author":"Jukes","year":"1969"},{"key":"2024021913333914000_btaa489-B15","doi-asserted-by":"crossref","first-page":"2096","DOI":"10.1093\/bioinformatics\/btt336","article-title":"Quikr: a method for rapid reconstruction of bacterial communities via compressive sensing","volume":"29","author":"Koslicki","year":"2013","journal-title":"Bioinformatics"},{"key":"2024021913333914000_btaa489-B16","doi-asserted-by":"crossref","first-page":"e91784","DOI":"10.1371\/journal.pone.0091784","article-title":"WGSQuikr: fast whole-genome shotgun metagenomic classification","volume":"9","author":"Koslicki","year":"2014","journal-title":"PLoS One"},{"key":"2024021913333914000_btaa489-B17","doi-asserted-by":"crossref","first-page":"1","DOI":"10.3389\/fmicb.2018.02253","article-title":"A unique Saccharomyces cerevisiae \u00d7 Saccharomyces uvarum hybrid isolated from norwegian farmhouse beer: characterization and reconstruction","volume":"9","author":"Krogerus","year":"2018","journal-title":"Front. Microbiol"},{"key":"2024021913333914000_btaa489-B18","first-page":"2835","article-title":"sppIDer: a species identification tool to investigate hybrid genomes with high-throughput sequencing","volume":"35","author":"Langdon","year":"2018"},{"key":"2024021913333914000_btaa489-B19","doi-asserted-by":"crossref","first-page":"1576","DOI":"10.1038\/s41559-019-0998-8","article-title":"Fermentation innovation through complex hybridization of wild and domesticated yeasts","volume":"3","author":"Langdon","year":"2019","journal-title":"Nat. Ecol. Evol"},{"key":"2024021913333914000_btaa489-B20","doi-asserted-by":"crossref","first-page":"2798","DOI":"10.1093\/molbev\/msv150","article-title":"FastME 2.0: a comprehensive, accurate, and fast distance-based phylogeny inference program","volume":"32","author":"Lefort","year":"2015","journal-title":"Mol. Biol. Evol"},{"key":"2024021913333914000_btaa489-B21","doi-asserted-by":"crossref","first-page":"14539","DOI":"10.1073\/pnas.1105430108","article-title":"Microbe domestication and the identification of the wild genetic stock of lager-brewing yeast","volume":"108","author":"Libkind","year":"2011","journal-title":"Proc. Natl. Acad. Sci. USA"},{"key":"2024021913333914000_btaa489-B22","author":"Liu","year":"2011"},{"key":"2024021913333914000_btaa489-B23","doi-asserted-by":"crossref","first-page":"279","DOI":"10.1038\/nature05706","article-title":"Hybrid speciation","volume":"446","author":"Mallet","year":"2007","journal-title":"Nature"},{"key":"2024021913333914000_btaa489-B24","doi-asserted-by":"crossref","first-page":"538","DOI":"10.1186\/1471-2105-11-538","article-title":"pplacer: linear time maximum-likelihood and Bayesian phylogenetic placement of sequences onto a fixed reference tree","volume":"11","author":"Matsen","year":"2010","journal-title":"BMC Bioinformatics"},{"key":"2024021913333914000_btaa489-B25","doi-asserted-by":"crossref","first-page":"e31009","DOI":"10.1371\/journal.pone.0031009","article-title":"A format for phylogenetic placements","volume":"7","author":"Matsen","year":"2012","journal-title":"PLoS One"},{"key":"2024021913333914000_btaa489-B26","doi-asserted-by":"crossref","first-page":"182","DOI":"10.1186\/s13059-017-1299-7","article-title":"Comprehensive benchmarking and ensemble approaches for metagenomic classifiers","volume":"18","author":"McIntyre","year":"2017","journal-title":"Genome Biol"},{"key":"2024021913333914000_btaa489-B27","doi-asserted-by":"crossref","first-page":"51","DOI":"10.1186\/s13059-019-1646-y","article-title":"Assessing taxonomic metagenome profilers with OPAL","volume":"20","author":"Meyer","year":"2019","journal-title":"Genome Biol"},{"key":"2024021913333914000_btaa489-B28","doi-asserted-by":"crossref","first-page":"3131","DOI":"10.1534\/g3.118.200160","article-title":"Highly contiguous genome assemblies of 15 Drosophila species generated using nanopore sequencing","volume":"8","author":"Miller","year":"2018","journal-title":"G3 Genes Genomes Genet"},{"key":"2024021913333914000_btaa489-B29","first-page":"247","volume-title":"Pacific Symposium on Biocomputing","author":"Mirarab","year":"2012"},{"key":"2024021913333914000_btaa489-B30","author":"Moshiri","year":"2018"},{"key":"2024021913333914000_btaa489-B31","doi-asserted-by":"crossref","first-page":"719","DOI":"10.1016\/j.tree.2013.09.004","article-title":"Computational approaches to species phylogeny inference and gene tree reconciliation","volume":"28","author":"Nakhleh","year":"2013","journal-title":"Trends Ecol. Evol"},{"key":"2024021913333914000_btaa489-B32","doi-asserted-by":"crossref","first-page":"3548","DOI":"10.1093\/bioinformatics\/btu721","article-title":"TIPP: taxonomic identification and phylogenetic profiling","volume":"30","author":"Nguyen","year":"2014","journal-title":"Bioinformatics"},{"key":"2024021913333914000_btaa489-B33","doi-asserted-by":"crossref","first-page":"132","DOI":"10.1186\/s13059-016-0997-x","article-title":"Mash: fast genome and metagenome distance estimation using MinHash","volume":"17","author":"Ondov","year":"2016","journal-title":"Genome Biol"},{"key":"2024021913333914000_btaa489-B34","doi-asserted-by":"crossref","DOI":"10.1111\/1755-0998.13135","article-title":"On the impact of contaminants on the accuracy of genome skimming and the effectiveness of exclusion read filters","volume":"20","author":"Rachtman","year":"2020","journal-title":"Mol. Ecol. Resources"},{"key":"2024021913333914000_btaa489-B35","doi-asserted-by":"crossref","first-page":"127","DOI":"10.1093\/bioinformatics\/btq619","article-title":"NBC: the naive Bayes classification tool webserver for taxonomic classification of metagenomic reads","volume":"27","author":"Rosen","year":"2011","journal-title":"Bioinformatics"},{"key":"2024021913333914000_btaa489-B36","doi-asserted-by":"crossref","first-page":"2634","DOI":"10.1038\/srep02634","article-title":"Next-Generation Anchor Based Phylogeny (NexABP): constructing phylogeny from Next-generation sequencing data","volume":"3","author":"Roychowdhury","year":"2013","journal-title":"Sci. Rep"},{"key":"2024021913333914000_btaa489-B37","doi-asserted-by":"crossref","first-page":"34","DOI":"10.1186\/s13059-019-1632-4","article-title":"Skmer: assembly-free and alignment-free sample identification using genome skims","volume":"20","author":"Sarmashghi","year":"2019","journal-title":"Genome Biol"},{"key":"2024021913333914000_btaa489-B38","doi-asserted-by":"crossref","first-page":"1063","DOI":"10.1038\/nmeth.4458","article-title":"Critical assessment of metagenome interpretation\u2014a benchmark of metagenomics software","volume":"14","author":"Sczyrba","year":"2017","journal-title":"Nat. Methods"},{"key":"2024021913333914000_btaa489-B39","doi-asserted-by":"crossref","first-page":"811","DOI":"10.1038\/nmeth.2066","article-title":"Metagenomic microbial community profiling using unique clade-specific marker genes","volume":"9","author":"Segata","year":"2012","journal-title":"Nat. Methods"},{"key":"2024021913333914000_btaa489-B40","doi-asserted-by":"crossref","first-page":"3927","DOI":"10.1534\/g3.116.034744","article-title":"Reconstructing the backbone of the Saccharomycotina yeast phylogeny using genome-scale data","volume":"6","author":"Shen","year":"2016","journal-title":"G3 Genes Genomes Genet"},{"key":"2024021913333914000_btaa489-B41","doi-asserted-by":"crossref","first-page":"461","DOI":"10.1186\/1471-2164-11-461","article-title":"MLTreeMap\u2014accurate maximum likelihood placement of environmental DNA sequences into taxonomic and functional reference phylogenies","volume":"11","author":"Stark","year":"2010","journal-title":"BMC Genomics"},{"key":"2024021913333914000_btaa489-B42","doi-asserted-by":"crossref","first-page":"571","DOI":"10.1093\/dnares\/dsx026","article-title":"The evolutionary history of Saccharomyces species inferred from completed mitochondrial genomes and revision in the \u2018yeast mitochondrial genetic code\u2019","volume":"24","author":"Sulo","year":"2017","journal-title":"DNA Res"},{"key":"2024021913333914000_btaa489-B43","doi-asserted-by":"crossref","first-page":"1196","DOI":"10.1038\/nmeth.2693","article-title":"Metagenomic species profiling using universal phylogenetic marker genes","volume":"10","author":"Sunagawa","year":"2013","journal-title":"Nat. Methods"},{"key":"2024021913333914000_btaa489-B44","doi-asserted-by":"crossref","first-page":"266","DOI":"10.1186\/s13059-019-1872-3","article-title":"Afann: bias adjustment for alignment-free sequence comparison based on sequencing data using neural network regression","volume":"20","author":"Tang","year":"2019","journal-title":"Genome Biol"},{"key":"2024021913333914000_btaa489-B45","doi-asserted-by":"crossref","first-page":"336","DOI":"10.1089\/cmb.2006.13.336","article-title":"The average common substring approach to phylogenomic reconstruction","volume":"13","author":"Ulitsky","year":"2006","journal-title":"J. Comput. Biol"},{"key":"2024021913333914000_btaa489-B46","first-page":"261","author":"Virtanen","year":"2020"},{"key":"2024021913333914000_btaa489-B47","doi-asserted-by":"crossref","first-page":"R46","DOI":"10.1186\/gb-2014-15-3-r46","article-title":"Kraken: ultrafast metagenomic sequence classification using exact alignments","volume":"15","author":"Wood","year":"2014","journal-title":"Genome Biol"},{"key":"2024021913333914000_btaa489-B48","doi-asserted-by":"crossref","first-page":"e33","DOI":"10.1093\/nar\/gkn075","article-title":"Performance comparison between k-tuple distance and four model-based distances in phylogenetic tree reconstruction","volume":"36","author":"Yang","year":"2008","journal-title":"Nucleic Acids Res"},{"key":"2024021913333914000_btaa489-B49","doi-asserted-by":"crossref","first-page":"779","DOI":"10.1016\/j.cell.2019.07.010","article-title":"Benchmarking metagenomics tools for taxonomic classification","volume":"178","author":"Ye","year":"2019","journal-title":"Cell"},{"key":"2024021913333914000_btaa489-B50","doi-asserted-by":"crossref","first-page":"e75","DOI":"10.1093\/nar\/gkt003","article-title":"Co-phylog: an assembly-free phylogenomic approach for closely related organisms","volume":"41","author":"Yi","year":"2013","journal-title":"Nucleic Acids Res"}],"container-title":["Bioinformatics"],"original-title":[],"language":"en","link":[{"URL":"https:\/\/academic.oup.com\/bioinformatics\/article-pdf\/36\/Supplement_1\/i335\/56702480\/bioinformatics_36_supplement1_i335.pdf","content-type":"application\/pdf","content-version":"vor","intended-application":"syndication"},{"URL":"https:\/\/academic.oup.com\/bioinformatics\/article-pdf\/36\/Supplement_1\/i335\/56702480\/bioinformatics_36_supplement1_i335.pdf","content-type":"unspecified","content-version":"vor","intended-application":"similarity-checking"}],"deposited":{"date-parts":[[2024,2,19]],"date-time":"2024-02-19T13:42:49Z","timestamp":1708350169000},"score":1,"resource":{"primary":{"URL":"https:\/\/academic.oup.com\/bioinformatics\/article\/36\/Supplement_1\/i335\/5870522"}},"subtitle":[],"short-title":[],"issued":{"date-parts":[[2020,7,1]]},"references-count":50,"journal-issue":{"issue":"Supplement_1","published-print":{"date-parts":[[2020,7,1]]}},"URL":"https:\/\/doi.org\/10.1093\/bioinformatics\/btaa489","relation":{},"ISSN":["1367-4803","1367-4811"],"issn-type":[{"value":"1367-4803","type":"print"},{"value":"1367-4811","type":"electronic"}],"subject":[],"published-other":{"date-parts":[[2020,7]]},"published":{"date-parts":[[2020,7,1]]}}}