{"status":"ok","message-type":"work","message-version":"1.0.0","message":{"indexed":{"date-parts":[[2026,4,15]],"date-time":"2026-04-15T03:10:00Z","timestamp":1776222600503,"version":"3.50.1"},"reference-count":47,"publisher":"Oxford University Press (OUP)","issue":"2","license":[{"start":{"date-parts":[[2025,4,11]],"date-time":"2025-04-11T00:00:00Z","timestamp":1744329600000},"content-version":"vor","delay-in-days":41,"URL":"https:\/\/creativecommons.org\/licenses\/by\/4.0\/"}],"content-domain":{"domain":[],"crossmark-restriction":false},"short-container-title":[],"published-print":{"date-parts":[[2025,3,4]]},"abstract":"<jats:title>Abstract<\/jats:title>\n                  <jats:sec>\n                    <jats:title>Motivation<\/jats:title>\n                    <jats:p>The statistical problem of estimating the total number of distinct species in a population (or distinct elements in a multiset), given only a small sample, occurs in various areas, ranging from the unseen species problem in ecology to estimating the diversity of immune repertoires. Accurately estimating the true richness from very small samples is challenging, in particular for highly diverse populations with many rare species. Depending on the application, different estimation strategies have been proposed that incorporate explicit or implicit assumptions about either the species distribution or about the sampling process. These methods are scattered across the literature, and an extensive overview of their assumptions, methodology, and performance is currently lacking.<\/jats:p>\n                  <\/jats:sec>\n                  <jats:sec>\n                    <jats:title>Results<\/jats:title>\n                    <jats:p>We comprehensively review and evaluate a variety of existing methods on real and simulated data with different compositions of rare and abundant species. Our evaluation shows that, depending on species composition, different methods provide the most accurate richness estimates. Simple methods based on the observed number of singletons yield accurate asymptotic lower bounds for several of the tested simulated species compositions, but tend to underestimate the true richness for heterogeneous populations and small samples containing 1% to 5% of the population. When the population size is known, upsampling (extrapolating) estimators such as PreSeq and RichnEst yield accurate estimates of the total species richness in a sample that is up to 10 times larger than the observed sample.<\/jats:p>\n                  <\/jats:sec>\n                  <jats:sec>\n                    <jats:title>Availability<\/jats:title>\n                    <jats:p>Source code for data simulation and richness estimation is available at https:\/\/gitlab.com\/rahmannlab\/speciesrichness.<\/jats:p>\n                  <\/jats:sec>","DOI":"10.1093\/bib\/bbaf158","type":"journal-article","created":{"date-parts":[[2025,4,2]],"date-time":"2025-04-02T10:27:38Z","timestamp":1743589658000},"source":"Crossref","is-referenced-by-count":10,"title":["A comprehensive review and evaluation of species richness estimation"],"prefix":"10.1093","volume":"26","author":[{"ORCID":"https:\/\/orcid.org\/0009-0002-6377-2561","authenticated-orcid":false,"given":"Johanna","family":"Elena Schmitz","sequence":"first","affiliation":[{"name":"Algorithmic Bioinformatics , Center for Bioinformatics Saar, Saarland Informatics Campus, 66123 Saarbr\u00fccken,","place":["Germany"]},{"name":"Fakult\u00e4t MI , Saarland University, Saarland Informatics Campus, 66123 Saarbr\u00fccken,","place":["Germany"]},{"name":"Saarbr\u00fccken Graduate School of Computer Science , Saarland Informatics Campus, 66123 Saarbr\u00fccken,","place":["Germany"]}]},{"ORCID":"https:\/\/orcid.org\/0000-0002-8536-6065","authenticated-orcid":false,"given":"Sven","family":"Rahmann","sequence":"additional","affiliation":[{"name":"Algorithmic Bioinformatics , Center for Bioinformatics Saar, Saarland Informatics Campus, 66123 Saarbr\u00fccken,","place":["Germany"]},{"name":"Fakult\u00e4t MI , Saarland University, Saarland Informatics Campus, 66123 Saarbr\u00fccken,","place":["Germany"]}]}],"member":"286","published-online":{"date-parts":[[2025,4,11]]},"reference":[{"key":"2025041107282792800_ref1","doi-asserted-by":"publisher","first-page":"42","DOI":"10.2307\/1411","article-title":"The relation between the number of species and the number of individuals in a random sample of an animal population","volume":"12","author":"Fisher","year":"1943","journal-title":"J Anim Ecol"},{"key":"2025041107282792800_ref2","doi-asserted-by":"publisher","first-page":"237","DOI":"10.1093\/biomet\/40.3-4.237","article-title":"The population frequencies of species and the estimation of population parameters","volume":"40","author":"Good","year":"1953","journal-title":"Biometrika"},{"key":"2025041107282792800_ref3","doi-asserted-by":"publisher","first-page":"1229","DOI":"10.1007\/s10311-020-01010-z","article-title":"Metagenomic applications in microbial diversity, bioremediation, pollution monitoring, enzyme and drug discovery. A review","volume":"18","author":"Saptashwa Datta","year":"2020","journal-title":"Environ Chem Lett"},{"key":"2025041107282792800_ref4","doi-asserted-by":"publisher","first-page":"1613","DOI":"10.3390\/nu11071613","article-title":"Gut microbiome: profound implications for diet and disease","volume":"11","author":"Hills","year":"2019","journal-title":"Nutrients"},{"key":"2025041107282792800_ref5","doi-asserted-by":"publisher","first-page":"17","DOI":"10.1093\/biomet\/asab012","article-title":"More for less: predicting and maximizing genomic variant discovery via Bayesian nonparametrics","volume":"109","author":"Masoero","year":"2022","journal-title":"Biometrika"},{"key":"2025041107282792800_ref6","doi-asserted-by":"publisher","first-page":"20140291","DOI":"10.1098\/rstb.2014.0291","article-title":"Estimating T-cell repertoire diversity: limitations of classical estimators and a new approach","volume":"370","author":"Laydon","year":"2015","journal-title":"Philos Trans R Soc B Biol Sci"},{"key":"2025041107282792800_ref7","first-page":"265","article-title":"Nonparametric estimation of the number of classes in a population","volume":"11","author":"Chao","year":"1984","journal-title":"Scand J Stat"},{"key":"2025041107282792800_ref8","doi-asserted-by":"publisher","first-page":"625","DOI":"10.1093\/biomet\/65.3.625","article-title":"Estimation of the size of a closed population when capture probabilities vary among animals","volume":"65","author":"Burnham","year":"1978","journal-title":"Biometrika"},{"key":"2025041107282792800_ref9","doi-asserted-by":"publisher","first-page":"27","DOI":"10.1093\/biomet\/71.1.27","article-title":"Statistical inference for Poisson and multinomial models for capture-recapture experiments","volume":"71","author":"Sandland","year":"1984","journal-title":"Biometrika"},{"key":"2025041107282792800_ref10","first-page":"e1298v2","article-title":"Efficient duplicate rate estimation from subsamples of sequencing libraries","volume":"3","author":"Schr\u00f6der","year":"2015","journal-title":"PeerJ PrePrints"},{"key":"2025041107282792800_ref11","doi-asserted-by":"publisher","first-page":"1","DOI":"10.1145\/3125643","article-title":"Estimating the unseen: improved estimators for entropy and other properties","volume":"64","author":"Valiant","year":"2017","journal-title":"J ACM"},{"key":"2025041107282792800_ref12","doi-asserted-by":"publisher","first-page":"1042","DOI":"10.1111\/biom.12332","article-title":"Estimating diversity via frequency ratios","volume":"71","author":"Willis","year":"2015","journal-title":"Biometrics"},{"key":"2025041107282792800_ref13","doi-asserted-by":"publisher","first-page":"3","DOI":"10.1093\/jpe\/rtr044","article-title":"Models and estimators linking individual-based and sample-based rarefaction, extrapolation and comparison of assemblages","volume":"5","author":"Colwell","year":"2012","journal-title":"J Plant Ecol"},{"key":"2025041107282792800_ref14","doi-asserted-by":"publisher","first-page":"11881","DOI":"10.1038\/ncomms11881","article-title":"Robust estimates of overall immune-repertoire diversity from high-throughput measurements on samples","volume":"7","author":"Kaplinsky","year":"2016","journal-title":"Nat Commun"},{"key":"2025041107282792800_ref15","doi-asserted-by":"publisher","first-page":"539","DOI":"10.1111\/1755-0998.13730","article-title":"Pitfalls in the statistical analysis of microbiome amplicon sequencing data","volume":"23","author":"Boshuizen","year":"2023","journal-title":"Mol Ecol Resour"},{"key":"2025041107282792800_ref16","doi-asserted-by":"publisher","first-page":"210","DOI":"10.1080\/01621459.1992.10475194","article-title":"Estimating the number of classes via sample coverage","volume":"87","author":"Chao","year":"1992","journal-title":"J Am Stat Assoc"},{"key":"2025041107282792800_ref17","doi-asserted-by":"publisher","first-page":"531","DOI":"10.1111\/j.0006-341X.2002.00531.x","article-title":"Estimating the number of species in a stochastic abundance model","volume":"58","author":"Chao","year":"2002","journal-title":"Biometrics"},{"key":"2025041107282792800_ref18","doi-asserted-by":"publisher","first-page":"2302","DOI":"10.1016\/j.csda.2011.01.017","article-title":"An extension of Chao\u2019s estimator of population size based on the first three capture frequency counts","volume":"55","author":"Lanumteang","year":"2011","journal-title":"Comput Stat Data Anal"},{"key":"2025041107282792800_ref19","doi-asserted-by":"publisher","first-page":"e14540","DOI":"10.7717\/peerj.14540","article-title":"A more reliable species richness estimator based on the gamma\u2013Poisson model","volume":"11","author":"Chiu","year":"2023","journal-title":"PeerJ"},{"key":"2025041107282792800_ref20","doi-asserted-by":"publisher","first-page":"765","DOI":"10.1214\/10-BA527","article-title":"Objective Bayesian estimation for the number of species","volume":"5","author":"Barger","year":"2010","journal-title":"Bayesian Anal"},{"key":"2025041107282792800_ref21","doi-asserted-by":"crossref","first-page":"e4363","DOI":"10.1002\/ecs2.4363","article-title":"Estimating total species richness: fitting rarefaction by asymptotic approximation","volume":"14","author":"Zou","year":"2023","journal-title":"Ecosphere"},{"key":"2025041107282792800_ref22","doi-asserted-by":"crossref","first-page":"1451","DOI":"10.1111\/2041-210X.12613","article-title":"iNEXT: an R package for rarefaction and extrapolation of species diversity (hill numbers)","volume":"7","author":"Hsieh","year":"2016","journal-title":"Method Ecol Evol"},{"key":"2025041107282792800_ref23","doi-asserted-by":"crossref","first-page":"13283","DOI":"10.1073\/pnas.1607774113","article-title":"Optimal prediction of the number of unseen species","volume":"113","author":"Orlitsky","year":"2016","journal-title":"Proc Natl Acad Sci"},{"key":"2025041107282792800_ref24","doi-asserted-by":"publisher","first-page":"325","DOI":"10.1038\/nmeth.2375","article-title":"Predicting the molecular complexity of sequencing libraries","volume":"10","author":"Daley","year":"2013","journal-title":"Nat Methods"},{"key":"2025041107282792800_ref25","doi-asserted-by":"publisher","first-page":"145","DOI":"10.1007\/BF01213386","article-title":"Exchangeable and partially exchangeable random partitions","volume":"102","author":"Pitman","year":"1995","journal-title":"Probab Theory Relat Fields"},{"key":"2025041107282792800_ref26","doi-asserted-by":"publisher","first-page":"e1003646","DOI":"10.1371\/journal.pcbi.1003646","article-title":"Quantification of HTLV-1 clonality and TCR diversity","volume":"10","author":"Laydon","year":"2014","journal-title":"PLoS Comput Biol"},{"key":"2025041107282792800_ref27","doi-asserted-by":"publisher","first-page":"663","DOI":"10.1080\/01621459.1952.10483446","article-title":"A generalization of sampling without replacement from a finite universe","volume":"47","author":"Horvitz","year":"1952","journal-title":"J Am Stat Assoc"},{"key":"2025041107282792800_ref28","doi-asserted-by":"publisher","first-page":"927","DOI":"10.2307\/1936861","article-title":"Robust estimation of population size when capture probabilities vary among animals","volume":"60","author":"Burnham","year":"1979","journal-title":"Ecology"},{"key":"2025041107282792800_ref29","article-title":"Species estimation and applications","volume-title":"Encyclopedia of Statistical Sciences","author":"Chao","year":"2006"},{"key":"2025041107282792800_ref30","doi-asserted-by":"publisher","first-page":"45","DOI":"10.2307\/2333577","article-title":"The number of new species, and the increase in population coverage, when a sample is increased","volume":"43","author":"Good","year":"1956","journal-title":"Biometrika"},{"key":"2025041107282792800_ref31","doi-asserted-by":"publisher","first-page":"305","DOI":"10.1191\/1471082X03st057oa","article-title":"Point and interval estimation of the population size using the truncated Poisson regression model","volume":"3","author":"van der Heijden","year":"2003","journal-title":"Stat Model"},{"key":"2025041107282792800_ref32","doi-asserted-by":"publisher","first-page":"783","DOI":"10.2307\/2531532","article-title":"Estimating the population size for capture-recapture data with unequal catchability","volume":"43","author":"Chao","year":"1987","journal-title":"Biometrics"},{"key":"2025041107282792800_ref33","doi-asserted-by":"crossref","first-page":"135","DOI":"10.1080\/10618600.2011.647174","article-title":"Use of the ratio plot in capture\u2013recapture estimation","volume":"22","author":"Dankmar B\u00f6hning","year":"2013","journal-title":"J Comput Graph Stat"},{"key":"2025041107282792800_ref34","doi-asserted-by":"publisher","first-page":"361","DOI":"10.1007\/s00184-018-0689-5","article-title":"A modification of Chao\u2019s lower bound estimator in the case of one-inflation","volume":"82","author":"B\u00f6hning","year":"2019","journal-title":"Metrika"},{"key":"2025041107282792800_ref35","article-title":"Species richness estimation with high diversity but spurious singletons","author":"Willis","year":"2016"},{"key":"2025041107282792800_ref36","doi-asserted-by":"publisher","first-page":"577","DOI":"10.2307\/1934145","article-title":"The nonconcept of species diversity: a critique and alternative parameters","volume":"52","author":"Hurlbert","year":"1971","journal-title":"Ecology"},{"key":"2025041107282792800_ref37","doi-asserted-by":"crossref","first-page":"283","DOI":"10.2307\/2529778","article-title":"Sampling properties of a family of diversity measures","volume":"33","author":"Smith","year":"1977","journal-title":"Biometrics"},{"key":"2025041107282792800_ref38","doi-asserted-by":"publisher","first-page":"45","DOI":"10.1890\/13-0133.1","article-title":"Rarefaction and extrapolation with hill numbers: a framework for sampling and estimation in species diversity studies","volume":"84","author":"Chao","year":"2014","journal-title":"Ecol Monogr"},{"key":"2025041107282792800_ref39","doi-asserted-by":"publisher","first-page":"435","DOI":"10.1093\/biomet\/63.3.435","article-title":"Estimating the number of unseen species: how many words did Shakespeare know","volume":"63","author":"Efron","year":"1976","journal-title":"Biometrika"},{"key":"2025041107282792800_ref40","doi-asserted-by":"publisher","first-page":"175","DOI":"10.1515\/sagmb-2012-0049","article-title":"Exploring the sampling universe of RNA-seq","volume":"12","author":"Tauber","year":"2013","journal-title":"Stat Appl Genet Mol Biol"},{"key":"2025041107282792800_ref41","first-page":"101","article-title":"Estimating terrestrial biodiversity through extrapolation","volume":"345","author":"Colwell","year":"1997","journal-title":"Philos Trans R Soc Lond B Biol Sci"},{"key":"2025041107282792800_ref42","doi-asserted-by":"publisher","first-page":"33","DOI":"10.12688\/f1000research.29032.2","article-title":"Sustainable data analysis with Snakemake","volume":"10","author":"M\u00f6lder","year":"2021","journal-title":"F1000Research"},{"key":"2025041107282792800_ref43","doi-asserted-by":"publisher","first-page":"e1004503","DOI":"10.1371\/journal.pcbi.1004503","article-title":"VDJtools: unifying post-analysis of T cell receptor repertoires","volume":"11","author":"Shugay","year":"2015","journal-title":"PLoS Comput Biol"},{"key":"2025041107282792800_ref44","doi-asserted-by":"publisher","first-page":"3030","DOI":"10.1038\/s41598-021-82726-y","article-title":"Comparison between 16S rRNA and shotgun sequencing data for the taxonomic characterization of the gut microbiota","volume":"11","author":"Durazzi","year":"2021","journal-title":"Sci Rep"},{"key":"2025041107282792800_ref45","doi-asserted-by":"publisher","first-page":"6875","DOI":"10.1038\/s41467-021-27212-9","article-title":"Species richness and identity both determine the biomass of global reef fish communities","volume":"12","author":"Lefcheck","year":"2021","journal-title":"Nat Commun"},{"key":"2025041107282792800_ref46","first-page":"e1634","article-title":"Estimating and comparing microbial diversity in the presence of sequencing errors","volume-title":"PeerJ","author":"Chiu","year":"2016"},{"key":"2025041107282792800_ref47","doi-asserted-by":"publisher","first-page":"31","DOI":"10.1038\/s43705-021-00033-z","article-title":"Handling of spurious sequences affects the outcome of high-throughput 16s rRNA gene amplicon profiling","volume":"1","author":"Reitmeier","year":"2021","journal-title":"ISME Commun"}],"container-title":["Briefings in Bioinformatics"],"original-title":[],"language":"en","link":[{"URL":"https:\/\/academic.oup.com\/bib\/article-pdf\/26\/2\/bbaf158\/62909347\/bbaf158.pdf","content-type":"application\/pdf","content-version":"vor","intended-application":"syndication"},{"URL":"https:\/\/academic.oup.com\/bib\/article-pdf\/26\/2\/bbaf158\/62909347\/bbaf158.pdf","content-type":"unspecified","content-version":"vor","intended-application":"similarity-checking"}],"deposited":{"date-parts":[[2025,4,11]],"date-time":"2025-04-11T03:30:12Z","timestamp":1744342212000},"score":1,"resource":{"primary":{"URL":"https:\/\/academic.oup.com\/bib\/article\/doi\/10.1093\/bib\/bbaf158\/8110880"}},"subtitle":[],"short-title":[],"issued":{"date-parts":[[2025,3]]},"references-count":47,"journal-issue":{"issue":"2","published-print":{"date-parts":[[2025,3,4]]}},"URL":"https:\/\/doi.org\/10.1093\/bib\/bbaf158","relation":{"has-preprint":[{"id-type":"doi","id":"10.1101\/2024.10.09.615408","asserted-by":"object"}]},"ISSN":["1467-5463","1477-4054"],"issn-type":[{"value":"1467-5463","type":"print"},{"value":"1477-4054","type":"electronic"}],"subject":[],"published-other":{"date-parts":[[2025,3]]},"published":{"date-parts":[[2025,3]]},"article-number":"bbaf158"}}