{"status":"ok","message-type":"work","message-version":"1.0.0","message":{"indexed":{"date-parts":[[2026,4,14]],"date-time":"2026-04-14T09:42:01Z","timestamp":1776159721976,"version":"3.50.1"},"reference-count":38,"publisher":"Springer Science and Business Media LLC","issue":"S15","license":[{"start":{"date-parts":[[2019,12,1]],"date-time":"2019-12-01T00:00:00Z","timestamp":1575158400000},"content-version":"tdm","delay-in-days":0,"URL":"https:\/\/creativecommons.org\/licenses\/by\/4.0"},{"start":{"date-parts":[[2019,12,24]],"date-time":"2019-12-24T00:00:00Z","timestamp":1577145600000},"content-version":"vor","delay-in-days":23,"URL":"https:\/\/creativecommons.org\/licenses\/by\/4.0"}],"content-domain":{"domain":["link.springer.com"],"crossmark-restriction":false},"short-container-title":["BMC Bioinformatics"],"published-print":{"date-parts":[[2019,12]]},"abstract":"<jats:title>Abstract<\/jats:title><jats:sec>\n<jats:title>Background<\/jats:title>\n<jats:p>A survey of presences and absences of specific species across multiple biogeographic units (or bioregions) is used in a broad area of biological studies from ecology to microbiology. Using binary presence-absence data, we evaluate species co-occurrences that help elucidate relationships among organisms and environments. To summarize similarity between occurrences of species, we routinely use the Jaccard\/Tanimoto coefficient, which is the ratio of their intersection to their union. It is natural, then, to identify statistically significant Jaccard\/Tanimoto coefficients, which suggest non-random co-occurrences of species. However, statistical hypothesis testing using this similarity coefficient has seldom been used or studied.<\/jats:p>\n<\/jats:sec><jats:sec>\n<jats:title>Results<\/jats:title>\n<jats:p>We introduce a hypothesis test for similarity for biological presence-absence data, using the Jaccard\/Tanimoto coefficient. 
Several key improvements are presented, including unbiased estimation of expectation and centered Jaccard\/Tanimoto coefficients that account for occurrence probabilities. The exact and asymptotic solutions are derived. To overcome the computational burden due to high dimensionality, we propose the bootstrap and measurement concentration algorithms to efficiently estimate the statistical significance of binary similarity. Comprehensive simulation studies demonstrate that our proposed methods produce accurate <jats:italic>p<\/jats:italic>-values and false discovery rates. The proposed estimation methods are orders of magnitude faster than the exact solution, particularly with increasing dimensionality. We showcase their applications in evaluating co-occurrences of bird species on 28 islands of Vanuatu and fish species in 3347 freshwater habitats in France. The proposed methods are implemented in an open source R package called <jats:italic>jaccard<\/jats:italic> (<jats:ext-link xmlns:xlink=\"http:\/\/www.w3.org\/1999\/xlink\" ext-link-type=\"uri\" xlink:href=\"https:\/\/cran.r-project.org\/package=jaccard\">https:\/\/cran.r-project.org\/package=jaccard<\/jats:ext-link>).<\/jats:p>\n<\/jats:sec><jats:sec>\n<jats:title>Conclusion<\/jats:title>\n<jats:p>We introduce a suite of statistical methods for the Jaccard\/Tanimoto similarity coefficient for binary data that enable straightforward incorporation of probabilistic measures in the analysis of species co-occurrences. 
Due to their generality, the proposed methods and implementations are applicable to a wide range of binary data arising from genomics, biochemistry, and other areas of science.<\/jats:p>\n<\/jats:sec>","DOI":"10.1186\/s12859-019-3118-5","type":"journal-article","created":{"date-parts":[[2019,12,24]],"date-time":"2019-12-24T09:02:35Z","timestamp":1577178155000},"update-policy":"https:\/\/doi.org\/10.1007\/springer_crossmark_policy","source":"Crossref","is-referenced-by-count":247,"title":["Jaccard\/Tanimoto similarity test and estimation methods for biological presence-absence data"],"prefix":"10.1186","volume":"20","author":[{"ORCID":"https:\/\/orcid.org\/0000-0001-6798-8867","authenticated-orcid":false,"given":"Neo Christopher","family":"Chung","sequence":"first","affiliation":[]},{"given":"B\u0142a\u017cej","family":"Miasojedow","sequence":"additional","affiliation":[]},{"given":"Micha\u0142","family":"Startek","sequence":"additional","affiliation":[]},{"ORCID":"https:\/\/orcid.org\/0000-0003-3476-3017","authenticated-orcid":false,"given":"Anna","family":"Gambin","sequence":"additional","affiliation":[]}],"member":"297","published-online":{"date-parts":[[2019,12,24]]},"reference":[{"issue":"2","key":"3118_CR1","doi-asserted-by":"publisher","first-page":"37","DOI":"10.1111\/j.1469-8137.1912.tb05611.x","volume":"11","author":"P Jaccard","year":"1912","unstructured":"Jaccard P. The distribution of the flora in the alpine zone. New Phytologist. 1912; 11(2):37\u201350. https:\/\/doi.org\/10.1111\/j.1469-8137.1912.tb05611.x.","journal-title":"New Phytologist"},{"key":"3118_CR2","unstructured":"Tanimoto T. An elementary mathematical theory of classification and prediction. Technical report. 1958."},{"key":"3118_CR3","first-page":"165","volume":"24","author":"HJB Birks","year":"1987","unstructured":"Birks HJB. Recent methodological developments in quantitative descriptive biogeography. Ann Zool Fenn. 
1987; 24:165\u201378.","journal-title":"Ann Zool Fenn"},{"issue":"5","key":"3118_CR4","doi-asserted-by":"publisher","first-page":"930","DOI":"10.1086\/285367","volume":"139","author":"DA Jackson","year":"1992","unstructured":"Jackson DA, Somers KM, Harvey HH. Null models and fish communities: Evidence of nonrandom patterns. Am Natural. 1992; 139(5):930\u201351.","journal-title":"Am Natural"},{"issue":"3","key":"3118_CR5","doi-asserted-by":"publisher","first-page":"380","DOI":"10.1093\/sysbio\/45.3.380","volume":"45","author":"R Real","year":"1996","unstructured":"Real R, Vargas JM. The probabilistic basis of Jaccard\u2019s index of similarity. Syst Biol. 1996; 45(3):380\u20135. https:\/\/doi.org\/10.1093\/sysbio\/45.3.380.","journal-title":"Syst Biol"},{"key":"3118_CR6","volume-title":"Randomization, Bootstrap and Monte Carlo Methods in Biology","author":"BFJ Manly","year":"2006","unstructured":"Manly BFJ. Randomization, Bootstrap and Monte Carlo Methods in Biology. Boca Raton, FL: Chapman & Hall \/ CRC Press; 2006."},{"key":"3118_CR7","volume-title":"An Introduction to Behavioural Ecology","author":"NB Davies","year":"1993","unstructured":"Davies NB, Krebs JR. An Introduction to Behavioural Ecology. USA: Wiley-Blackwell; 1993."},{"key":"3118_CR8","volume-title":"Essentials of Ecology","author":"CR Townsend","year":"2002","unstructured":"Townsend CR, Begon M, Harper JL. Essentials of Ecology. USA: Wiley-Blackwell; 2002."},{"issue":"3","key":"3118_CR9","doi-asserted-by":"publisher","first-page":"279","DOI":"10.2307\/1943563","volume":"30","author":"RH Whittaker","year":"1960","unstructured":"Whittaker RH. Vegetation of the Siskiyou Mountains, Oregon and California. Ecol Monogr. 1960; 30(3):279\u2013338. 
https:\/\/doi.org\/10.2307\/1943563.","journal-title":"Ecol Monogr"},{"issue":"1","key":"3118_CR10","doi-asserted-by":"publisher","first-page":"151","DOI":"10.2307\/5518","volume":"61","author":"S Harrison","year":"1992","unstructured":"Harrison S, Ross SJ, Lawton JH. Beta diversity on geographic gradients in Britain. J Animal Ecol. 1992; 61(1):151. https:\/\/doi.org\/10.2307\/5518.","journal-title":"J Animal Ecol"},{"issue":"3","key":"3118_CR11","doi-asserted-by":"publisher","first-page":"367","DOI":"10.1046\/j.1365-2656.2003.00710.x","volume":"72","author":"P Koleff","year":"2003","unstructured":"Koleff P, Gaston KJ, Lennon JJ. Measuring beta diversity for presence-absence data. J Animal Ecol. 2003; 72(3):367\u201382. https:\/\/doi.org\/10.1046\/j.1365-2656.2003.00710.x.","journal-title":"J Animal Ecol"},{"issue":"6","key":"3118_CR12","doi-asserted-by":"publisher","first-page":"1132","DOI":"10.2307\/1936961","volume":"60","author":"EF Connor","year":"1979","unstructured":"Connor EF, Simberloff D. The assembly of species communities: Chance or competition? Ecology. 1979; 60(6):1132. https:\/\/doi.org\/10.2307\/1936961.","journal-title":"Ecology"},{"key":"3118_CR13","doi-asserted-by":"publisher","first-page":"64","DOI":"10.1007\/BF00349013","volume":"52","author":"JM Diamond","year":"1982","unstructured":"Diamond JM, Gilpin ME. Examination of the \u201cnull\u201d model of Connor and Simberloff for species co-occurrence on islands. Oecologia. 1982; 52:64\u201374. https:\/\/doi.org\/10.1007\/BF00349013.","journal-title":"Oecologia"},{"key":"3118_CR14","doi-asserted-by":"publisher","first-page":"75","DOI":"10.1007\/BF00349014","volume":"52","author":"ME Gilpin","year":"1982","unstructured":"Gilpin ME, Diamond JM. Factors contributing to non-randomness in species co-occurrences on islands. Oecologia. 1982; 52:75\u201384. 
https:\/\/doi.org\/10.1007\/BF00349014.","journal-title":"Oecologia"},{"issue":"4","key":"3118_CR15","doi-asserted-by":"publisher","first-page":"579","DOI":"10.1007\/BF00379419","volume":"73","author":"JB Wilson","year":"1987","unstructured":"Wilson JB. Methods for detecting non-randomness in species co-occurrences: a contribution. Oecologia. 1987; 73(4):579\u201382. https:\/\/doi.org\/10.1007\/BF00379419.","journal-title":"Oecologia"},{"issue":"4","key":"3118_CR16","doi-asserted-by":"publisher","first-page":"1109","DOI":"10.2307\/1940919","volume":"76","author":"BFJ Manly","year":"1995","unstructured":"Manly BFJ. A note on the analysis of species co-occurrences. Ecology. 1995; 76(4):1109\u201315. https:\/\/doi.org\/10.2307\/1940919.","journal-title":"Ecology"},{"issue":"1\u20132","key":"3118_CR17","doi-asserted-by":"publisher","first-page":"275","DOI":"10.1007\/s004420050589","volume":"116","author":"J Sanderson","year":"1998","unstructured":"Sanderson J, Moulton M, Selfridge R. Null matrices and the analysis of species co-occurrences. Oecologia. 1998; 116(1\u20132):275\u201383. https:\/\/doi.org\/10.1007\/s004420050589.","journal-title":"Oecologia"},{"issue":"4","key":"3118_CR18","doi-asserted-by":"publisher","first-page":"277","DOI":"10.1111\/j.1461-0248.2009.01284.x","volume":"12","author":"MDF Ellwood","year":"2009","unstructured":"Ellwood MDF, Manica A, Foster WA. Stochastic and deterministic processes jointly structure tropical arthropod communities. Ecol Lett. 2009; 12(4):277\u201384. https:\/\/doi.org\/10.1111\/j.1461-0248.2009.01284.x.","journal-title":"Ecol Lett"},{"issue":"1576","key":"3118_CR19","doi-asserted-by":"publisher","first-page":"2351","DOI":"10.1098\/rstb.2011.0063","volume":"366","author":"JM Chase","year":"2011","unstructured":"Chase JM, Myers JA. Disentangling the importance of ecological niches from stochastic processes across scales. Philosoph Trans Royal Soc B: Biol Sci. 2011; 366(1576):2351\u201363. 
https:\/\/doi.org\/10.1098\/rstb.2011.0063.","journal-title":"Philosoph Trans Royal Soc B: Biol Sci"},{"issue":"4","key":"3118_CR20","doi-asserted-by":"publisher","first-page":"707","DOI":"10.1111\/j.1365-2745.2007.01236.x","volume":"95","author":"JD Fridley","year":"2007","unstructured":"Fridley JD, Vandermast DB, Kuppinger DM, Manthey M, Peet RK. Co-occurrence based assessment of habitat generalists and specialists: A new approach for the measurement of niche width. J Ecol. 2007; 95(4):707\u201322. https:\/\/doi.org\/10.1111\/j.1365-2745.2007.01236.x.","journal-title":"J Ecol"},{"key":"3118_CR21","doi-asserted-by":"publisher","unstructured":"Ara\u00fajo MB, Rozenfeld A. The geographic scaling of biotic interactions. Ecography. 2013. https:\/\/doi.org\/10.1111\/j.1600-0587.2013.00643.x.","DOI":"10.1111\/j.1600-0587.2013.00643.x"},{"issue":"3","key":"3118_CR22","doi-asserted-by":"publisher","first-page":"251","DOI":"10.2307\/2412493","volume":"25","author":"C Baroni-Urbani","year":"1976","unstructured":"Baroni-Urbani C, Buser MW. Similarity of binary data. Syst Zool. 1976; 25(3):251. https:\/\/doi.org\/10.2307\/2412493.","journal-title":"Syst Zool"},{"issue":"3","key":"3118_CR23","doi-asserted-by":"publisher","first-page":"287","DOI":"10.1007\/BF00545229","volume":"44","author":"C Baroni-Urbani","year":"1979","unstructured":"Baroni-Urbani C. A statistical table for the degree of coexistence between two species. Oecologia. 1979; 44(3):287\u20139. https:\/\/doi.org\/10.1007\/bf00545229.","journal-title":"Oecologia"},{"key":"3118_CR24","doi-asserted-by":"publisher","first-page":"252","DOI":"10.1111\/j.1466-8238.2012.00789.x","volume":"22","author":"JA Veech","year":"2013","unstructured":"Veech JA. A probabilistic model for analysing species co-occurrence. Global Ecol Biogeogr. 2013; 22:252\u201360. 
https:\/\/doi.org\/10.1111\/j.1466-8238.2012.00789.x.","journal-title":"Global Ecol Biogeogr"},{"key":"3118_CR25","doi-asserted-by":"publisher","unstructured":"Griffith DM, Veech JA, Marsh CJ. cooccur: Probabilistic species co-occurrence analysis in R. J Stat Softw. 2016; 69. https:\/\/doi.org\/10.18637\/jss.v069.c02.","DOI":"10.18637\/jss.v069.c02"},{"key":"3118_CR26","volume-title":"R: A Language and Environment for Statistical Computing","author":"R Core Team","year":"2017","unstructured":"R Core Team. R: A Language and Environment for Statistical Computing. Vienna, Austria: R Foundation for Statistical Computing; 2017. https:\/\/www.R-project.org."},{"key":"3118_CR27","volume-title":"All of Statistics: A Concise Course in Statistical Inference","author":"L Wasserman","year":"2010","unstructured":"Wasserman L. All of Statistics: A Concise Course in Statistical Inference. New York: Springer; 2010."},{"issue":"6","key":"3118_CR28","doi-asserted-by":"publisher","first-page":"3272","DOI":"10.1021\/acs.analchem.6b01459","volume":"89","author":"MK \u0141\u0105cki","year":"2017","unstructured":"\u0141\u0105cki MK, Startek M, Valkenborg D, Gambin A. IsoSpec: Hyperfast fine structure calculator. Analyt Chem. 2017; 89(6):3272\u20137. https:\/\/doi.org\/10.1021\/acs.analchem.6b01459.","journal-title":"Analyt Chem"},{"key":"3118_CR29","doi-asserted-by":"crossref","DOI":"10.1201\/9780429246593","volume-title":"An Introduction to the Bootstrap","author":"B Efron","year":"1994","unstructured":"Efron B, Tibshirani R. An Introduction to the Bootstrap. Boca Raton, Florida: Chapman & Hall \/ CRC Press; 1994."},{"key":"3118_CR30","doi-asserted-by":"publisher","first-page":"219","DOI":"10.2307\/2937300","volume":"48","author":"EF Connor","year":"1978","unstructured":"Connor EF, Simberloff D. Species number and compositional similarity of the Galapagos flora and avifauna. Ecol Monogr. 1978; 48:219\u201348. 
https:\/\/doi.org\/10.2307\/2937300.","journal-title":"Ecol Monogr"},{"key":"3118_CR31","unstructured":"Gotelli NJ, Hart EM, Ellison AM. EcoSimR: Null Model Analysis for Ecological Data. R package version 0.1.0. 2015. http:\/\/github.com\/gotellilab\/EcoSimR."},{"key":"3118_CR32","unstructured":"Oksanen J, Blanchet FG, Friendly M, Kindt R, Legendre P, McGlinn D, Minchin PR, O\u2019Hara RB, Simpson GL, Solymos P, Stevens MHH, Szoecs E, Wagner H. Vegan: Community Ecology Package. R package version 2.4-5. 2017. https:\/\/CRAN.R-project.org\/package=vegan. Accessed 14 Jun 2018."},{"issue":"16","key":"3118_CR33","doi-asserted-by":"publisher","first-page":"9440","DOI":"10.1073\/pnas.1530509100","volume":"100","author":"JD Storey","year":"2003","unstructured":"Storey JD, Tibshirani R. Statistical significance for genomewide studies. Proc Nat Acad Sci. 2003; 100(16):9440\u20135. https:\/\/doi.org\/10.1073\/pnas.1530509100.","journal-title":"Proc Nat Acad Sci"},{"issue":"10","key":"3118_CR34","doi-asserted-by":"publisher","first-page":"1008","DOI":"10.1111\/ecog.01871","volume":"39","author":"L Comte","year":"2016","unstructured":"Comte L, Hugueny B, Grenouillet G. Climate interacts with anthropogenic drivers to determine extirpation dynamics. Ecography. 2016; 39(10):1008\u201316. https:\/\/doi.org\/10.1111\/ecog.01871.","journal-title":"Ecography"},{"issue":"11","key":"3118_CR35","doi-asserted-by":"publisher","first-page":"2884","DOI":"10.1021\/ci300261r","volume":"52","author":"R Todeschini","year":"2012","unstructured":"Todeschini R, Consonni V, Xiang H, Holliday J, Buscema M, Willett P. Similarity coefficients for binary chemoinformatics data: Overview and extended comparison using simulated and real data sets. J Chem Inf Model. 2012; 52(11):2884\u2013901. 
https:\/\/doi.org\/10.1021\/ci300261r.","journal-title":"J Chem Inf Model"},{"issue":"2","key":"3118_CR36","doi-asserted-by":"publisher","first-page":"171","DOI":"10.1038\/nmeth.2803","volume":"11","author":"SA Rahman","year":"2014","unstructured":"Rahman SA, Cuesta SM, Furnham N, Holliday GL, Thornton JM. EC-BLAST: a tool to automatically search and compare enzyme reactions. Nature Methods. 2014; 11(2):171\u20134. https:\/\/doi.org\/10.1038\/nmeth.2803.","journal-title":"Nature Methods"},{"key":"3118_CR37","doi-asserted-by":"publisher","unstructured":"Bajusz D, R\u00e1cz A, H\u00e9berger K. Why is Tanimoto index an appropriate choice for fingerprint-based similarity calculations? J Cheminform. 2015; 7(1). https:\/\/doi.org\/10.1186\/s13321-015-0069-3.","DOI":"10.1186\/s13321-015-0069-3"},{"key":"3118_CR38","doi-asserted-by":"publisher","unstructured":"Quinlan AR. BEDTools: the Swiss-army tool for genome feature analysis. Current Protocols in Bioinformatics. 2014; 47:11.12.1\u201334. https:\/\/doi.org\/10.1002\/0471250953.bi1112s47.","DOI":"10.1002\/0471250953.bi1112s47"}],"container-title":["BMC 
Bioinformatics"],"original-title":[],"language":"en","link":[{"URL":"http:\/\/link.springer.com\/content\/pdf\/10.1186\/s12859-019-3118-5.pdf","content-type":"application\/pdf","content-version":"vor","intended-application":"text-mining"},{"URL":"http:\/\/link.springer.com\/article\/10.1186\/s12859-019-3118-5\/fulltext.html","content-type":"text\/html","content-version":"vor","intended-application":"text-mining"},{"URL":"http:\/\/link.springer.com\/content\/pdf\/10.1186\/s12859-019-3118-5.pdf","content-type":"application\/pdf","content-version":"vor","intended-application":"similarity-checking"}],"deposited":{"date-parts":[[2020,12,23]],"date-time":"2020-12-23T00:05:40Z","timestamp":1608681940000},"score":1,"resource":{"primary":{"URL":"https:\/\/bmcbioinformatics.biomedcentral.com\/articles\/10.1186\/s12859-019-3118-5"}},"subtitle":[],"short-title":[],"issued":{"date-parts":[[2019,12]]},"references-count":38,"journal-issue":{"issue":"S15","published-print":{"date-parts":[[2019,12]]}},"alternative-id":["3118"],"URL":"https:\/\/doi.org\/10.1186\/s12859-019-3118-5","relation":{},"ISSN":["1471-2105"],"issn-type":[{"value":"1471-2105","type":"electronic"}],"subject":[],"published":{"date-parts":[[2019,12]]},"assertion":[{"value":"19 September 2019","order":1,"name":"received","label":"Received","group":{"name":"ArticleHistory","label":"Article History"}},{"value":"27 September 2019","order":2,"name":"accepted","label":"Accepted","group":{"name":"ArticleHistory","label":"Article History"}},{"value":"24 December 2019","order":3,"name":"first_online","label":"First Online","group":{"name":"ArticleHistory","label":"Article History"}},{"value":"Not applicable.","order":1,"name":"Ethics","group":{"name":"EthicsHeading","label":"Ethics approval and consent to participate"}},{"value":"Not applicable.","order":2,"name":"Ethics","group":{"name":"EthicsHeading","label":"Consent for publication"}},{"value":"The authors declare that they have no competing 
interests","order":3,"name":"Ethics","group":{"name":"EthicsHeading","label":"Competing interests"}}],"article-number":"644"}}