{"status":"ok","message-type":"work","message-version":"1.0.0","message":{"indexed":{"date-parts":[[2026,3,16]],"date-time":"2026-03-16T15:57:10Z","timestamp":1773676630120,"version":"3.50.1"},"reference-count":30,"publisher":"Springer Science and Business Media LLC","issue":"S9","license":[{"start":{"date-parts":[[2019,11,1]],"date-time":"2019-11-01T00:00:00Z","timestamp":1572566400000},"content-version":"tdm","delay-in-days":0,"URL":"http:\/\/creativecommons.org\/licenses\/by\/4.0\/"},{"start":{"date-parts":[[2019,11,22]],"date-time":"2019-11-22T00:00:00Z","timestamp":1574380800000},"content-version":"vor","delay-in-days":21,"URL":"http:\/\/creativecommons.org\/licenses\/by\/4.0\/"}],"funder":[{"DOI":"10.13039\/501100003500","name":"Universit\u00e0 degli Studi di Padova","doi-asserted-by":"publisher","award":["-"],"award-info":[{"award-number":["-"]}],"id":[{"id":"10.13039\/501100003500","id-type":"DOI","asserted-by":"publisher"}]}],"content-domain":{"domain":["link.springer.com"],"crossmark-restriction":false},"short-container-title":["BMC Bioinformatics"],"published-print":{"date-parts":[[2019,11]]},"abstract":"<jats:title>Abstract<\/jats:title><jats:sec><jats:title>Background<\/jats:title><jats:p>In the last few years, 16S rRNA gene sequencing (16S rDNA-seq) has seen a surprisingly rapid increase in adoption as a methodology for microbial community studies. Despite the considerable popularity of this technique, only a few specific tools are currently available for proper 16S rDNA-seq count data preprocessing and simulation. Indeed, the great majority of tools have been developed by adapting methodologies previously used for bulk RNA-seq data, with poor assessment of their applicability in the metagenomics field. For such tools, and for the few specifically developed for 16S rDNA-seq data, performance assessment is challenging, mainly due to the complex nature of the data and the lack of realistic simulation models. 
In fact, to the best of our knowledge, no software designed for data simulation is available to directly obtain synthetic 16S rDNA-seq count tables that properly model the heavy sparsity and compositionality typical of these data.<\/jats:p><\/jats:sec><jats:sec><jats:title>Results<\/jats:title><jats:p>In this paper we present metaSPARSim, a sparse count matrix simulator intended for use in the development of 16S rDNA-seq metagenomic data processing pipelines. metaSPARSim implements a new generative process that models the sequencing process with a Multivariate Hypergeometric distribution in order to realistically simulate 16S rDNA-seq count tables that resemble the compositionality and sparsity of real experimental data. It provides ready-to-use count matrices, can reproduce different pre-coded scenarios, and can estimate simulation parameters from real experimental data. The tool is made available at <jats:ext-link xmlns:xlink=\"http:\/\/www.w3.org\/1999\/xlink\" ext-link-type=\"uri\" xlink:href=\"http:\/\/sysbiobig.dei.unipd.it\/?q=Software#metaSPARSim\">http:\/\/sysbiobig.dei.unipd.it\/?q=Software#metaSPARSim<\/jats:ext-link> and <jats:ext-link xmlns:xlink=\"http:\/\/www.w3.org\/1999\/xlink\" ext-link-type=\"uri\" xlink:href=\"https:\/\/gitlab.com\/sysbiobig\/metasparsim\">https:\/\/gitlab.com\/sysbiobig\/metasparsim<\/jats:ext-link>.<\/jats:p><\/jats:sec><jats:sec><jats:title>Conclusion<\/jats:title><jats:p>metaSPARSim can generate count matrices resembling real 16S rDNA-seq data. The availability of count data simulators is extremely valuable both for method developers, who need a ground truth for tool validation, and for users who want to assess state-of-the-art analysis tools in order to choose the most accurate one. 
Thus, we believe that metaSPARSim is a valuable tool for researchers involved in developing, testing and using robust and reliable data analysis methods in the context of 16S rRNA gene sequencing.<\/jats:p><\/jats:sec>","DOI":"10.1186\/s12859-019-2882-6","type":"journal-article","created":{"date-parts":[[2019,11,22]],"date-time":"2019-11-22T10:02:38Z","timestamp":1574416958000},"update-policy":"https:\/\/doi.org\/10.1007\/springer_crossmark_policy","source":"Crossref","is-referenced-by-count":26,"title":["metaSPARSim: a 16S rRNA gene sequencing count data simulator"],"prefix":"10.1186","volume":"20","author":[{"given":"Ilaria","family":"Patuzzi","sequence":"first","affiliation":[]},{"given":"Giacomo","family":"Baruzzo","sequence":"additional","affiliation":[]},{"given":"Carmen","family":"Losasso","sequence":"additional","affiliation":[]},{"given":"Antonia","family":"Ricci","sequence":"additional","affiliation":[]},{"given":"Barbara","family":"Di Camillo","sequence":"additional","affiliation":[]}],"member":"297","published-online":{"date-parts":[[2019,11,22]]},"reference":[{"issue":"8","key":"2882_CR1","doi-asserted-by":"publisher","first-page":"1922","DOI":"10.1128\/JCM.34.8.1922-1925.1996","volume":"34","author":"B Choi","year":"1996","unstructured":"Choi B, Wyss C, G\u00f6bel U. Phylogenetic analysis of pathogen-related oral spirochetes. J Clin Microbiol. 1996; 34(8):1922\u20135.","journal-title":"J Clin Microbiol"},{"issue":"7","key":"2882_CR2","doi-asserted-by":"publisher","first-page":"3023","DOI":"10.1128\/JCM.42.7.3023-3029.2004","volume":"42","author":"M Munson","year":"2004","unstructured":"Munson M, Banerjee A, Watson T, Wade W. Molecular analysis of the microflora associated with dental caries. J Clin Microbiol. 
2004; 42(7):3023\u20139.","journal-title":"J Clin Microbiol"},{"issue":"8","key":"2882_CR3","doi-asserted-by":"publisher","first-page":"3557","DOI":"10.1128\/AEM.67.8.3557-3563.2001","volume":"67","author":"A Schmalenberger","year":"2001","unstructured":"Schmalenberger A, Schwieger F, Tebbe CC. Effect of primers hybridizing to different evolutionarily conserved regions of the small-subunit rRNA gene in PCR-based microbial community analyses and genetic profiling. Appl Environ Microbiol. 2001; 67(8):3557\u201363.","journal-title":"Appl Environ Microbiol"},{"issue":"10","key":"2882_CR4","doi-asserted-by":"publisher","first-page":"e7401","DOI":"10.1371\/journal.pone.0007401","volume":"4","author":"Y Wang","year":"2009","unstructured":"Wang Y, Qian PY. Conservative fragments in bacterial 16S rRNA genes and primer design for 16S ribosomal DNA amplicons in metagenomic studies. PloS ONE. 2009; 4(10):e7401.","journal-title":"PloS ONE"},{"issue":"1","key":"2882_CR5","doi-asserted-by":"publisher","first-page":"343","DOI":"10.1186\/s12859-018-2360-6","volume":"19","author":"F Sambo","year":"2018","unstructured":"Sambo F, Finotello F, Lavezzo E, Baruzzo G, Masi G, Peta E, et al.Optimizing PCR primers targeting the bacterial 16S ribosomal RNA gene. BMC Bioinformatics. 2018; 19(1):343.","journal-title":"BMC Bioinformatics"},{"issue":"3","key":"2882_CR6","doi-asserted-by":"publisher","first-page":"S30","DOI":"10.1101\/gr.3.3.S30","volume":"3","author":"C Dieffenbach","year":"1993","unstructured":"Dieffenbach C, Lowe T, Dveksler G. General concepts for PCR primer design. PCR Methods Appl. 1993; 3(3):S30\u20137.","journal-title":"PCR Methods Appl"},{"key":"2882_CR7","doi-asserted-by":"publisher","first-page":"1509","DOI":"10.1101\/gr.079558.108","volume":"18.9","author":"JC Marioni","year":"2008","unstructured":"Marioni JC, Mason CE, Mane SM, Stephens M, Gilad Y. RNA-seq: an assessment of technical reproducibility and comparison with gene expression arrays. Genome Res. 
2008; 18.9:1509\u201317.","journal-title":"Genome Res"},{"issue":"10","key":"2882_CR8","doi-asserted-by":"publisher","first-page":"R106","DOI":"10.1186\/gb-2010-11-10-r106","volume":"11","author":"S Anders","year":"2010","unstructured":"Anders S, Huber W. Differential expression analysis for sequence count data. Genome Biol. 2010; 11(10):R106.","journal-title":"Genome Biol"},{"issue":"1","key":"2882_CR9","doi-asserted-by":"publisher","first-page":"139","DOI":"10.1093\/bioinformatics\/btp616","volume":"26","author":"MD Robinson","year":"2010","unstructured":"Robinson MD, McCarthy DJ. Smyth GK. edgeR: a Bioconductor package for differential expression analysis of digital gene expression data. Bioinformatics. 2010; 26(1):139\u201340.","journal-title":"Bioinformatics"},{"issue":"7","key":"2882_CR10","doi-asserted-by":"publisher","first-page":"e0129606","DOI":"10.1371\/journal.pone.0129606","volume":"10","author":"L Xu","year":"2015","unstructured":"Xu L, Paterson AD, Turpin W, Xu W. Assessment and selection of competing models for zero-inflated microbiome data. PloS ONE. 2015; 10(7):e0129606.","journal-title":"PloS ONE"},{"issue":"1","key":"2882_CR11","doi-asserted-by":"publisher","first-page":"1","DOI":"10.2307\/1269547","volume":"34","author":"D Lambert","year":"1992","unstructured":"Lambert D. Zero-inflated Poisson regression, with an application to defects in manufacturing. Technometrics. 1992; 34(1):1\u201314.","journal-title":"Technometrics"},{"issue":"3","key":"2882_CR12","doi-asserted-by":"publisher","first-page":"341","DOI":"10.1016\/0304-4076(86)90002-3","volume":"33","author":"J Mullahy","year":"1986","unstructured":"Mullahy J. Specification and testing of some modified count data models. J Econ. 
1986; 33(3):341\u201365.","journal-title":"J Econ"},{"issue":"16","key":"2882_CR13","doi-asserted-by":"publisher","first-page":"2870","DOI":"10.1093\/bioinformatics\/bty175","volume":"34","author":"TP Quinn","year":"2018","unstructured":"Quinn TP, Erb I, Richardson MF, Crowley TM. Understanding sequencing data as compositions: an outlook and review. Bioinformatics. 2018; 34(16):2870\u20138.","journal-title":"Bioinformatics"},{"key":"2882_CR14","doi-asserted-by":"publisher","first-page":"507","DOI":"10.1038\/nature24460","volume":"551","author":"D Vandeputte","year":"2017","unstructured":"Vandeputte D, Kathagen G, D\u2019hoe K, Vieira-Silva S, Valles-Colomer M, Sabino J, et al.Quantitative microbiome profiling links gut community variation to microbial load. Nature. 2017; 551:507\u201311.","journal-title":"Nature"},{"key":"2882_CR15","doi-asserted-by":"crossref","DOI":"10.1002\/9781119003144","volume-title":"Modeling and analysis of compositional data","author":"V Pawlowsky-Glahn","year":"2015","unstructured":"Pawlowsky-Glahn V, Egozcue JJ, Tolosana-Delgado R. Modeling and analysis of compositional data. Hoboken: Wiley; 2015."},{"issue":"3","key":"2882_CR16","doi-asserted-by":"publisher","first-page":"372","DOI":"10.1093\/bioinformatics\/btx549","volume":"34","author":"T \u00c4ij\u00f6","year":"2017","unstructured":"\u00c4ij\u00f6 T, M\u00fcller CL, Bonneau R. Temporal probabilistic modeling of bacterial compositions derived from 16S rRNA sequencing. Bioinformatics. 2017; 34(3):372\u201380.","journal-title":"Bioinformatics"},{"issue":"1","key":"2882_CR17","doi-asserted-by":"publisher","first-page":"418","DOI":"10.1214\/12-AOAS592","volume":"7","author":"J Chen","year":"2013","unstructured":"Chen J, Li H. Variable selection for sparse Dirichlet-multinomial regression with an application to microbiome data analysis. Ann Appl Stat. 
2013; 7(1):418\u201342.","journal-title":"Ann Appl Stat"},{"issue":"2","key":"2882_CR18","doi-asserted-by":"publisher","first-page":"e30126","DOI":"10.1371\/journal.pone.0030126","volume":"7","author":"I Holmes","year":"2012","unstructured":"Holmes I, Harris K, Quince C. Dirichlet multinomial mixtures: generative models for microbial metagenomics. PloS ONE. 2012; 7(2):e30126.","journal-title":"PloS ONE"},{"issue":"4","key":"2882_CR19","doi-asserted-by":"publisher","first-page":"1053","DOI":"10.1111\/biom.12079","volume":"69","author":"F Xia","year":"2013","unstructured":"Xia F, Chen J, Fung WK, Li H. A logistic normal multinomial regression model for microbiome compositional data analysis. Biometrics. 2013; 69(4):1053\u201363.","journal-title":"Biometrics"},{"issue":"12","key":"2882_CR20","doi-asserted-by":"publisher","first-page":"e94","DOI":"10.1093\/nar\/gks251","volume":"40","author":"FE Angly","year":"2012","unstructured":"Angly FE, Willner D, Rohwer F, Hugenholtz P, Tyson GW. Grinder: a versatile amplicon and shotgun sequence simulator. Nucleic Acids Res. 2012; 40(12):e94.","journal-title":"Nucleic Acids Res"},{"issue":"10","key":"2882_CR21","doi-asserted-by":"publisher","first-page":"e3373","DOI":"10.1371\/journal.pone.0003373","volume":"3","author":"DC Richter","year":"2008","unstructured":"Richter DC, Ott F, Auch AF, Schmid R, Huson DH. MetaSim\u2014a sequencing simulator for genomics and metagenomics. PloS ONE. 2008; 3(10):e3373.","journal-title":"PloS ONE"},{"key":"2882_CR22","first-page":"210","volume":"20.1","author":"S Hawinkel","year":"2017","unstructured":"Hawinkel S, Mattiello F, Bijnens L, Thas O. A broken promise: microbiome differential abundance methods do not control the false discovery rate. Brief Bioinform. 
2017; 20.1:210\u201321.","journal-title":"Brief Bioinform"},{"issue":"1","key":"2882_CR23","doi-asserted-by":"publisher","first-page":"27","DOI":"10.1186\/s40168-017-0237-y","volume":"5","author":"S Weiss","year":"2017","unstructured":"Weiss S, Xu ZZ, Peddada S, Amir A, Bittinger K, Gonzalez A, et al.Normalization and microbial differential abundance strategies depend upon data characteristics. Microbiome. 2017; 5(1):27.","journal-title":"Microbiome"},{"issue":"4","key":"2882_CR24","doi-asserted-by":"publisher","first-page":"e1003531","DOI":"10.1371\/journal.pcbi.1003531","volume":"10","author":"PJ McMurdie","year":"2014","unstructured":"McMurdie PJ, Holmes S. Waste not, want not: why rarefying microbiome data is inadmissible. PLoS Comput Biol. 2014; 10(4):e1003531.","journal-title":"PLoS Comput Biol"},{"issue":"4","key":"2882_CR25","doi-asserted-by":"publisher","first-page":"643","DOI":"10.1093\/bioinformatics\/btx650","volume":"34","author":"J Chen","year":"2017","unstructured":"Chen J, King E, Deek R, Wei Z, Yu Y, Grill D, et al.An omnibus test for differential distribution analysis of microbiome sequencing data. Bioinformatics. 2017; 34(4):643\u201351.","journal-title":"Bioinformatics"},{"issue":"5","key":"2882_CR26","doi-asserted-by":"publisher","first-page":"e1004226","DOI":"10.1371\/journal.pcbi.1004226","volume":"11","author":"ZD Kurtz","year":"2015","unstructured":"Kurtz ZD, M\u00fcller CL, Miraldi ER, Littman DR, Blaser MJ, Bonneau RA. Sparse and compositionally robust inference of microbial ecological networks. PLoS Comput Biol. 2015; 11(5):e1004226.","journal-title":"PLoS Comput Biol"},{"issue":"7402","key":"2882_CR27","doi-asserted-by":"publisher","first-page":"207","DOI":"10.1038\/nature11234","volume":"486","author":"C Huttenhower","year":"2012","unstructured":"Huttenhower C, Gevers D, Knight R, Abubucker S, Badger JH, Chinwalla AT, et al.Structure, function and diversity of the healthy human microbiome. Nature. 
2012; 486(7402):207.","journal-title":"Nature"},{"issue":"7402","key":"2882_CR28","doi-asserted-by":"publisher","first-page":"215","DOI":"10.1038\/nature11209","volume":"486","author":"BA Meth\u00e9","year":"2012","unstructured":"Meth\u00e9 BA, Nelson KE, Pop M, Creasy HH, Giglio MG, Huttenhower C, et al.A framework for human microbiome research. Nature. 2012; 486(7402):215.","journal-title":"Nature"},{"key":"2882_CR29","first-page":"2122","volume":"5","author":"ATL Lun","year":"2016","unstructured":"Lun ATL, McCarthy DJ, Marioni JC. A step-by-step workflow for low-level analysis of single-cell RNA-seq data with Bioconductor. F1000Res. 2016; 5:2122.","journal-title":"F1000Res"},{"issue":"1","key":"2882_CR30","doi-asserted-by":"publisher","first-page":"174","DOI":"10.1186\/s13059-017-1305-0","volume":"18","author":"L Zappia","year":"2017","unstructured":"Zappia L, Phipson B, Oshlack A. Splatter: simulation of single-cell RNA sequencing data. Genome Biol. 2017; 18(1):174.","journal-title":"Genome Biol"}],"container-title":["BMC 
Bioinformatics"],"original-title":[],"language":"en","link":[{"URL":"http:\/\/link.springer.com\/content\/pdf\/10.1186\/s12859-019-2882-6.pdf","content-type":"application\/pdf","content-version":"vor","intended-application":"text-mining"},{"URL":"http:\/\/link.springer.com\/article\/10.1186\/s12859-019-2882-6\/fulltext.html","content-type":"text\/html","content-version":"vor","intended-application":"text-mining"},{"URL":"http:\/\/link.springer.com\/content\/pdf\/10.1186\/s12859-019-2882-6.pdf","content-type":"application\/pdf","content-version":"vor","intended-application":"similarity-checking"}],"deposited":{"date-parts":[[2023,9,23]],"date-time":"2023-09-23T07:02:26Z","timestamp":1695452546000},"score":1,"resource":{"primary":{"URL":"https:\/\/bmcbioinformatics.biomedcentral.com\/articles\/10.1186\/s12859-019-2882-6"}},"subtitle":[],"short-title":[],"issued":{"date-parts":[[2019,11]]},"references-count":30,"journal-issue":{"issue":"S9","published-print":{"date-parts":[[2019,11]]}},"alternative-id":["2882"],"URL":"https:\/\/doi.org\/10.1186\/s12859-019-2882-6","relation":{},"ISSN":["1471-2105"],"issn-type":[{"value":"1471-2105","type":"electronic"}],"subject":[],"published":{"date-parts":[[2019,11]]},"assertion":[{"value":"19 April 2019","order":1,"name":"received","label":"Received","group":{"name":"ArticleHistory","label":"Article History"}},{"value":"7 May 2019","order":2,"name":"accepted","label":"Accepted","group":{"name":"ArticleHistory","label":"Article History"}},{"value":"22 November 2019","order":3,"name":"first_online","label":"First Online","group":{"name":"ArticleHistory","label":"Article History"}},{"value":"The experimental design was approved by IZSVe ethical committee (OpBA project n. 
05\/2015).","order":1,"name":"Ethics","group":{"name":"EthicsHeading","label":"Ethics approval and consent to participate"}},{"value":"Not applicable.","order":2,"name":"Ethics","group":{"name":"EthicsHeading","label":"Consent for publication"}},{"value":"The authors declare that they have no competing interests.","order":3,"name":"Ethics","group":{"name":"EthicsHeading","label":"Competing interests"}}],"article-number":"416"}}