{"status":"ok","message-type":"work","message-version":"1.0.0","message":{"institution":[{"name":"bioRxiv"}],"indexed":{"date-parts":[[2026,1,15]],"date-time":"2026-01-15T09:33:20Z","timestamp":1768469600767,"version":"3.49.0"},"posted":{"date-parts":[[2017,10,3]]},"group-title":"Bioinformatics","reference-count":9,"publisher":"openRxiv","license":[{"start":{"date-parts":[[2017,10,3]],"date-time":"2017-10-03T00:00:00Z","timestamp":1506988800000},"content-version":"vor","delay-in-days":0,"URL":"http:\/\/creativecommons.org\/licenses\/by-nc-nd\/4.0\/"}],"content-domain":{"domain":[],"crossmark-restriction":false},"short-container-title":[],"accepted":{"date-parts":[[2017,12,29]]},"abstract":"<jats:title>Abstract<\/jats:title>\n                <jats:sec>\n                  <jats:title>Motivation<\/jats:title>\n                  <jats:p>Advances in sequencing technologies have made it feasible to obtain massive datasets for phylogenomic inference, often consisting of large numbers of loci from multiple species and individuals. The phylogenomic analysis of next-generation sequencing (NGS) data implies a complex computational pipeline where multiple technical and methodological decisions are necessary that can influence the final tree obtained, like those related to coverage, assembly, mapping, variant calling and\/or phasing.<\/jats:p>\n                <\/jats:sec>\n                <jats:sec>\n                  <jats:title>Results<\/jats:title>\n                  <jats:p>To assess the influence of these variables we introduce NGSphy, an open-source tool for the simulation of Illumina reads\/read counts obtained from haploid\/diploid individual genomes with thousands of independent gene families evolving under a common species tree. In order to resemble real NGS experiments, NGSphy includes multiple options to model sequencing coverage (depth) heterogeneity across species, individuals and loci, including off-target or uncaptured loci. For comprehensive simulations covering multiple evolutionary scenarios, parameter values for the different replicates can be sampled from user-defined statistical distributions.<\/jats:p>\n                <\/jats:sec>\n                <jats:sec>\n                  <jats:title>Availability<\/jats:title>\n                  <jats:p>\n                    Source code, full documentation and tutorials including a quick start guide are available at\n                    <jats:ext-link xmlns:xlink=\"http:\/\/www.w3.org\/1999\/xlink\" ext-link-type=\"uri\" xlink:href=\"http:\/\/github.com\/merlyescalona\/ngsphy\">http:\/\/github.com\/merlyescalona\/ngsphy<\/jats:ext-link>\n                    .\n                  <\/jats:p>\n                <\/jats:sec>\n                <jats:sec>\n                  <jats:title>Contact<\/jats:title>\n                  <jats:p>\n                    <jats:email>merlyescalona@uvigo.es<\/jats:email>\n                    .\n                    <jats:email>dposada@uvigo.es<\/jats:email>\n                  <\/jats:p>\n                <\/jats:sec>","DOI":"10.1101\/197715","type":"posted-content","created":{"date-parts":[[2017,10,4]],"date-time":"2017-10-04T01:10:17Z","timestamp":1507079417000},"source":"Crossref","is-referenced-by-count":0,"title":["NGSphy: phylogenomic simulation of next-generation sequencing data"],"prefix":"10.64898","author":[{"ORCID":"https:\/\/orcid.org\/0000-0003-0213-4777","authenticated-orcid":false,"given":"Merly","family":"Escalona","sequence":"first","affiliation":[]},{"ORCID":"https:\/\/orcid.org\/0000-0002-9705-1511","authenticated-orcid":false,"given":"Sara","family":"Rocha","sequence":"additional","affiliation":[]},{"ORCID":"https:\/\/orcid.org\/0000-0003-1407-3406","authenticated-orcid":false,"given":"David","family":"Posada","sequence":"additional","affiliation":[]}],"member":"54368","reference":[{"key":"2024080317334526000_197715v2.1","doi-asserted-by":"crossref","first-page":"1059","DOI":"10.1111\/1755-0998.12449","article-title":"Exon capture phylogenomics: efficacy across scales of divergence","volume":"16","year":"2016","journal-title":"Mol. Ecol. Resour"},{"key":"2024080317334526000_197715v2.2","doi-asserted-by":"publisher","DOI":"10.1093\/molbev\/msp098"},{"key":"2024080317334526000_197715v2.3","doi-asserted-by":"publisher","DOI":"10.1093\/bioinformatics\/btr708"},{"key":"2024080317334526000_197715v2.4","doi-asserted-by":"crossref","unstructured":"Mallo, D. and Posada, D. (2016) Multilocus inference of species trees and DNA barcoding. Philos. Trans. R. Soc. Lond. B Biol. Sci., 371.","DOI":"10.1098\/rstb.2015.0335"},{"key":"2024080317334526000_197715v2.5","doi-asserted-by":"publisher","DOI":"10.1016\/j.ympev.2011.12.007"},{"key":"2024080317334526000_197715v2.6","doi-asserted-by":"publisher","DOI":"10.1101\/gr.107524.110"},{"key":"2024080317334526000_197715v2.7","doi-asserted-by":"publisher","DOI":"10.1186\/s12859-017-1592-1"},{"key":"2024080317334526000_197715v2.8","doi-asserted-by":"publisher","DOI":"10.1093\/bioinformatics\/btq228"},{"key":"2024080317334526000_197715v2.9","doi-asserted-by":"crossref","first-page":"22","DOI":"10.1109\/MCSE.2011.37","article-title":"The NumPy Array: A Structure for Efficient Numerical Computation","volume":"13","year":"2011","journal-title":"Computing in Science Engineering"}],"container-title":[],"original-title":[],"link":[{"URL":"https:\/\/syndication.highwire.org\/content\/doi\/10.1101\/197715","content-type":"unspecified","content-version":"vor","intended-application":"similarity-checking"}],"deposited":{"date-parts":[[2026,1,14]],"date-time":"2026-01-14T21:50:46Z","timestamp":1768427446000},"score":1,"resource":{"primary":{"URL":"http:\/\/biorxiv.org\/lookup\/doi\/10.1101\/197715"}},"subtitle":[],"short-title":[],"issued":{"date-parts":[[2017,10,3]]},"references-count":9,"URL":"https:\/\/doi.org\/10.1101\/197715","relation":{},"subject":[],"published":{"date-parts":[[2017,10,3]]},"subtype":"preprint"}}