{"status":"ok","message-type":"work","message-version":"1.0.0","message":{"indexed":{"date-parts":[[2026,6,25]],"date-time":"2026-06-25T11:46:19Z","timestamp":1782387979813,"version":"3.54.5"},"reference-count":23,"publisher":"Oxford University Press (OUP)","issue":"6","content-domain":{"domain":[],"crossmark-restriction":false},"short-container-title":[],"published-print":{"date-parts":[[2011,3,15]]},"abstract":"<jats:title>Abstract<\/jats:title>\n                  <jats:p>Motivation: Counting the number of occurrences of every k-mer (substring of length k) in a long string is a central subproblem in many applications, including genome assembly, error correction of sequencing reads, fast multiple sequence alignment and repeat detection. Recently, the deep sequence coverage generated by next-generation sequencing technologies has caused the amount of sequence to be processed during a genome project to grow rapidly, and has rendered current k-mer counting tools too slow and memory intensive. At the same time, large multicore computers have become commonplace in research facilities allowing for a new parallel computational paradigm.<\/jats:p>\n                  <jats:p>Results: We propose a new k-mer counting algorithm and associated implementation, called Jellyfish, which is fast and memory efficient. It is based on a multithreaded, lock-free hash table optimized for counting k-mers up to 31 bases in length. Due to their flexibility, suffix arrays have been the data structure of choice for solving many string problems. For the task of k-mer counting, important in many biological applications, Jellyfish offers a much faster and more memory-efficient solution.<\/jats:p>\n                  <jats:p>Availability: The Jellyfish software is written in C++ and is GPL licensed. It is available for download at http:\/\/www.cbcb.umd.edu\/software\/jellyfish.<\/jats:p>\n                  <jats:p>Contact: \u00a0gmarcais@umd.edu<\/jats:p>\n                  <jats:p>Supplementary information: \u00a0Supplementary data are available at Bioinformatics online.<\/jats:p>","DOI":"10.1093\/bioinformatics\/btr011","type":"journal-article","created":{"date-parts":[[2011,1,8]],"date-time":"2011-01-08T20:13:43Z","timestamp":1294517623000},"page":"764-770","source":"Crossref","is-referenced-by-count":4436,"title":["A fast, lock-free approach for efficient parallel counting of occurrences of\n                    <i>k<\/i>\n                    -mers"],"prefix":"10.1093","volume":"27","author":[{"given":"Guillaume","family":"Mar\u00e7ais","sequence":"first","affiliation":[{"name":"1 Program in Applied Mathematics, Statistics and Scientific Computation and 2Department of Computer Science and Institute for Advanced Computer Studies, University of Maryland, College Park, MD 20742, USA"}],"role":[{"vocabulary":"crossref","role":"author"}]},{"given":"Carl","family":"Kingsford","sequence":"additional","affiliation":[{"name":"1 Program in Applied Mathematics, Statistics and Scientific Computation and 2Department of Computer Science and Institute for Advanced Computer Studies, University of Maryland, College Park, MD 20742, USA"}],"role":[{"vocabulary":"crossref","role":"author"}]}],"member":"286","published-online":{"date-parts":[[2011,1,7]]},"reference":[{"key":"2023012511552544300_B1","doi-asserted-by":"crossref","first-page":"582","DOI":"10.1093\/bioinformatics\/bti039","article-title":"RAP: a new computer program for de novo identification of repeated sequences in whole genomes","volume":"21","author":"Campagna","year":"2005","journal-title":"Bioinformatics"},{"key":"2023012511552544300_B2","volume-title":"Introduction to Algorithms.","author":"Cormen","year":"1990"},{"key":"2023012511552544300_B3","doi-asserted-by":"crossref","first-page":"e1000475","DOI":"10.1371\/journal.pbio.1000475","article-title":"Multi-platform next-generation sequencing of the domestic turkey (Meleagris gallopavo): genome assembly and analysis","volume":"8","author":"Dalloul","year":"2010","journal-title":"PLoS Biol"},{"key":"2023012511552544300_B4","doi-asserted-by":"crossref","first-page":"107","DOI":"10.1145\/1327452.1327492","article-title":"MapReduce: simplified data processing on large clusters","volume":"51","author":"Dean","year":"2008","journal-title":"Commun. ACM"},{"key":"2023012511552544300_B5","doi-asserted-by":"crossref","first-page":"1792","DOI":"10.1093\/nar\/gkh340","article-title":"MUSCLE: multiple sequence alignment with high accuracy and high throughput","volume":"32","author":"Edgar","year":"2004","journal-title":"Nucleic Acids Res."},{"key":"2023012511552544300_B6","first-page":"50a","article-title":"Almost wait-free resizable hashtables","volume-title":"Proceeding of the 18th International Parallel and Distributed Processing Symposium","author":"Gao","year":"2004"},{"key":"2023012511552544300_B7","doi-asserted-by":"crossref","first-page":"2306","DOI":"10.1101\/gr.1350803","article-title":"Annotating large genomes with exact word matches","volume":"13","author":"Healy","year":"2003","journal-title":"Genome Res."},{"key":"2023012511552544300_B8","doi-asserted-by":"crossref","first-page":"91","DOI":"10.1101\/gr.828403","article-title":"Whole-genome sequence assembly for mammalian genomes: Arachne 2","volume":"13","author":"Jaffe","year":"2003","journal-title":"Genome Res."},{"key":"2023012511552544300_B9","doi-asserted-by":"crossref","first-page":"R116","DOI":"10.1186\/gb-2010-11-11-r116","article-title":"Quake: quality-aware detection and correction of sequencing errors","volume":"11","author":"Kelley","year":"2010","journal-title":"Genome Biol."},{"key":"2023012511552544300_B10","doi-asserted-by":"crossref","first-page":"517","DOI":"10.1186\/1471-2164-9-517","article-title":"A new method to compute k-mer frequencies and its application to annotate large repetitive plant genomes","volume":"9","author":"Kurtz","year":"2008","journal-title":"BMC Genomics"},{"key":"2023012511552544300_B11","first-page":"117","article-title":"An optimistic approach to lock-free fifo queues","volume-title":"Proceedings of the 18th International Symposium on Distributed Computing, LNCS 3274","author":"Ladan-mozes","year":"2004"},{"key":"2023012511552544300_B12","doi-asserted-by":"crossref","first-page":"319","DOI":"10.1093\/bioinformatics\/btf843","article-title":"FORRepeats: detects repeats on entire chromosomes and between genomes","volume":"19","author":"Lefebvre","year":"2003","journal-title":"Bioinformatics"},{"key":"2023012511552544300_B13","doi-asserted-by":"crossref","first-page":"311","DOI":"10.1038\/nature08696","article-title":"The sequence and de novo assembly of the giant panda genome","volume":"463","author":"Li","year":"2010","journal-title":"Nature"},{"key":"2023012511552544300_B14","doi-asserted-by":"crossref","first-page":"73","DOI":"10.1145\/564870.564881","article-title":"High performance dynamic lock-free hash tables and list-based sets","volume-title":"SPAA '02: Proceedings of the fourteenth annual ACM symposium on Parallel algorithms and architectures","author":"Michael","year":"2002"},{"key":"2023012511552544300_B15","doi-asserted-by":"crossref","DOI":"10.1145\/248052.248106","article-title":"Simple, fast, and practical non-blocking and blocking concurrent queue algorithms","author":"Michael","year":"1996","journal-title":"Proceeding of PODC '96"},{"key":"2023012511552544300_B16","doi-asserted-by":"crossref","first-page":"2818","DOI":"10.1093\/bioinformatics\/btn548","article-title":"Aggressive assembly of pyrosequencing reads with mates","volume":"24","author":"Miller","year":"2008","journal-title":"Bioinformatics"},{"key":"2023012511552544300_B17","doi-asserted-by":"crossref","first-page":"2196","DOI":"10.1126\/science.287.5461.2196","article-title":"A whole-genome assembly of Drosophila","volume":"287","author":"Myers","year":"2000","journal-title":"Science"},{"key":"2023012511552544300_B18","doi-asserted-by":"crossref","DOI":"10.1007\/11561927_10","article-title":"Non-blocking hashtables with open addressing","volume-title":"Technical Report 639","author":"Purcell","year":"2005"},{"key":"2023012511552544300_B19","article-title":"Efficient generation of random nonsingular matrices","volume-title":"Technical Report","author":"Randall","year":"1991"},{"key":"2023012511552544300_B20","doi-asserted-by":"crossref","first-page":"1165","DOI":"10.1101\/gr.101360.109","article-title":"Assembly of large genomes using second-generation sequencing","volume":"20","author":"Schatz","year":"2010","journal-title":"Genome Res."},{"key":"2023012511552544300_B21","article-title":"Information sorting in the application of electronic digital computers to business operations","volume-title":"Master's Thesis","author":"Seward","year":"1954"},{"key":"2023012511552544300_B22","doi-asserted-by":"crossref","first-page":"379","DOI":"10.1145\/1147954.1147958","article-title":"Split-ordered lists: Lock-free extensible hash tables","volume":"53","author":"Shalev","year":"2006","journal-title":"J. ACM"},{"key":"2023012511552544300_B23","doi-asserted-by":"crossref","first-page":"061912","DOI":"10.1103\/PhysRevE.78.061912","article-title":"Duplication count distributions in DNA sequences","volume":"78","author":"Sindi","year":"2008","journal-title":"Phys. Rev. E"}],"container-title":["Bioinformatics"],"original-title":[],"language":"en","link":[{"URL":"https:\/\/academic.oup.com\/bioinformatics\/article-pdf\/27\/6\/764\/48866141\/bioinformatics_27_6_764.pdf","content-type":"application\/pdf","content-version":"vor","intended-application":"syndication"},{"URL":"https:\/\/academic.oup.com\/bioinformatics\/article-pdf\/27\/6\/764\/48866141\/bioinformatics_27_6_764.pdf","content-type":"unspecified","content-version":"vor","intended-application":"similarity-checking"}],"deposited":{"date-parts":[[2023,1,25]],"date-time":"2023-01-25T07:10:11Z","timestamp":1674630611000},"score":1,"resource":{"primary":{"URL":"https:\/\/academic.oup.com\/bioinformatics\/article\/27\/6\/764\/234905"}},"subtitle":[],"short-title":[],"issued":{"date-parts":[[2011,1,7]]},"references-count":23,"journal-issue":{"issue":"6","published-print":{"date-parts":[[2011,3,15]]}},"URL":"https:\/\/doi.org\/10.1093\/bioinformatics\/btr011","relation":{"has-review":[{"id-type":"doi","id":"10.3410\/f.9766956.10459055","asserted-by":"object"}]},"ISSN":["1367-4811","1367-4803"],"issn-type":[{"value":"1367-4811","type":"electronic"},{"value":"1367-4803","type":"print"}],"subject":[],"published-other":{"date-parts":[[2011,3,15]]},"published":{"date-parts":[[2011,1,7]]}}}