{"status":"ok","message-type":"work","message-version":"1.0.0","message":{"indexed":{"date-parts":[[2026,2,27]],"date-time":"2026-02-27T06:19:02Z","timestamp":1772173142752,"version":"3.50.1"},"update-to":[{"DOI":"10.1371\/journal.pcbi.1010638","type":"new_version","label":"New version","source":"publisher","updated":{"date-parts":[[2022,11,9]],"date-time":"2022-11-09T00:00:00Z","timestamp":1667952000000}}],"reference-count":18,"publisher":"Public Library of Science (PLoS)","issue":"10","license":[{"start":{"date-parts":[[2022,10,28]],"date-time":"2022-10-28T00:00:00Z","timestamp":1666915200000},"content-version":"vor","delay-in-days":0,"URL":"http:\/\/creativecommons.org\/licenses\/by\/4.0\/"}],"funder":[{"DOI":"10.13039\/501100003977","name":"Israel Science Foundation","doi-asserted-by":"publisher","award":["3165\/19"],"award-info":[{"award-number":["3165\/19"]}],"id":[{"id":"10.13039\/501100003977","id-type":"DOI","asserted-by":"publisher"}]},{"DOI":"10.13039\/501100003977","name":"Israel Science Foundation","doi-asserted-by":"publisher","award":["1339\/18"],"award-info":[{"award-number":["1339\/18"]}],"id":[{"id":"10.13039\/501100003977","id-type":"DOI","asserted-by":"publisher"}]},{"name":"Len Blavatnik and the Blavatnik Family Foundation"},{"name":"Edmond J. Safra Center for Bioinformatics at Tel-Aviv University"}],"content-domain":{"domain":["www.ploscompbiol.org"],"crossmark-restriction":false},"short-container-title":["PLoS Comput Biol"],"abstract":"<jats:sec id=\"sec001\">\n                    <jats:title>Motivation<\/jats:title>\n                    <jats:p>Sequencing long reads presents novel challenges to mapping. One such challenge is low sequence similarity between the reads and the reference, due to high sequencing error and mutation rates. This occurs, e.g., in a cancer tumor, or due to differences between strains of viruses or bacteria. A key idea in mapping algorithms is to sketch sequences with their minimizers. Recently, syncmers were introduced as an alternative sketching method that is more robust to mutations and sequencing errors.<\/jats:p>\n                  <\/jats:sec>\n                  <jats:sec id=\"sec002\">\n                    <jats:title>Results<\/jats:title>\n                    <jats:p>We introduce parameterized syncmer schemes (PSS), a generalization of syncmers, and provide a theoretical analysis for multi-parameter schemes. By combining PSS with downsampling or minimizers we can achieve any desired compression and window guarantee. We implemented the use of PSS in the popular minimap2 and Winnowmap2 mappers. In tests on simulated and real long-read data from a variety of genomes, the PSS-based algorithms, with scheme parameters selected on the basis of our theoretical analysis, reduced unmapped reads by 20-60% at high compression while usually using less memory. The advantage was more pronounced at low sequence identity. At sequence identity of 75% and medium compression, PSS-minimap had only 37% as many unmapped reads, and 8% fewer of the reads that did map were incorrectly mapped. Even at lower compression and error rates, PSS-based mapping mapped more reads than the original minimizer-based mappers as well as mappers using the original syncmer schemes. We conclude that using PSS can improve mapping of long reads in a wide range of settings.<\/jats:p>\n                  <\/jats:sec>","DOI":"10.1371\/journal.pcbi.1010638","type":"journal-article","created":{"date-parts":[[2022,10,28]],"date-time":"2022-10-28T13:28:04Z","timestamp":1666963684000},"page":"e1010638","update-policy":"https:\/\/doi.org\/10.1371\/journal.pcbi.corrections_policy","source":"Crossref","is-referenced-by-count":22,"title":["Parameterized syncmer schemes improve long-read mapping"],"prefix":"10.1371","volume":"18","author":[{"ORCID":"https:\/\/orcid.org\/0000-0001-6519-9581","authenticated-orcid":true,"given":"Abhinav","family":"Dutta","sequence":"first","affiliation":[]},{"ORCID":"https:\/\/orcid.org\/0000-0002-6296-5209","authenticated-orcid":true,"given":"David","family":"Pellow","sequence":"additional","affiliation":[]},{"ORCID":"https:\/\/orcid.org\/0000-0003-1889-9870","authenticated-orcid":true,"given":"Ron","family":"Shamir","sequence":"additional","affiliation":[]}],"member":"340","published-online":{"date-parts":[[2022,10,28]]},"reference":[{"issue":"Supplement_1","key":"pcbi.1010638.ref001","doi-asserted-by":"crossref","first-page":"i111","DOI":"10.1093\/bioinformatics\/btaa435","article-title":"Weighted minimizer sampling improves long read mapping","volume":"36","author":"C Jain","year":"2020","journal-title":"Bioinformatics"},{"issue":"6","key":"pcbi.1010638.ref002","doi-asserted-by":"crossref","first-page":"461","DOI":"10.1038\/s41592-018-0001-7","article-title":"Accurate detection of complex structural variations using single-molecule sequencing","volume":"15","author":"FJ Sedlazeck","year":"2018","journal-title":"Nature methods"},{"issue":"18","key":"pcbi.1010638.ref003","doi-asserted-by":"crossref","first-page":"3094","DOI":"10.1093\/bioinformatics\/bty191","article-title":"Minimap2: pairwise alignment for nucleotide sequences","volume":"34","author":"H Li","year":"2018","journal-title":"Bioinformatics"},{"issue":"12","key":"pcbi.1010638.ref004","doi-asserted-by":"crossref","first-page":"i201","DOI":"10.1093\/bioinformatics\/btw279","article-title":"Compacting de Bruijn graphs from sequencing data quickly and in low memory","volume":"32","author":"R Chikhi","year":"2016","journal-title":"Bioinformatics"},{"issue":"17","key":"pcbi.1010638.ref005","doi-asserted-by":"crossref","first-page":"2759","DOI":"10.1093\/bioinformatics\/btx304","article-title":"KMC 3: counting and manipulating k-mer statistics","volume":"33","author":"M Kokot","year":"2017","journal-title":"Bioinformatics"},{"issue":"1","key":"pcbi.1010638.ref006","first-page":"1","article-title":"Improved metagenomic analysis with Kraken 2","volume":"20","author":"DE Wood","year":"2019","journal-title":"Genome biology"},{"key":"pcbi.1010638.ref007","doi-asserted-by":"crossref","first-page":"e10805","DOI":"10.7717\/peerj.10805","article-title":"Syncmers are more sensitive than minimizers for selecting conserved k-mers in biological sequences","volume":"9","author":"R Edgar","year":"2021","journal-title":"PeerJ"},{"issue":"2","key":"pcbi.1010638.ref008","doi-asserted-by":"crossref","first-page":"lqaa037","DOI":"10.1093\/nargab\/lqaa037","article-title":"Benchmarking of long-read correction methods","volume":"2","author":"JC Dohm","year":"2020","journal-title":"NAR Genomics and Bioinformatics"},{"key":"pcbi.1010638.ref009","article-title":"Theory of local k-mer selection with applications to long-read alignment","author":"J Shaw","year":"2021","journal-title":"Bioinformatics"},{"key":"pcbi.1010638.ref010","doi-asserted-by":"crossref","unstructured":"Li H. New strategies to improve minimap2 alignment accuracy. arXiv preprint arXiv:210803515. 2021.","DOI":"10.1093\/bioinformatics\/btab705"},{"key":"pcbi.1010638.ref011","first-page":"1","article-title":"Long-read mapping to repetitive reference sequences using Winnowmap2","author":"C Jain","year":"2022","journal-title":"Nature Methods"},{"key":"pcbi.1010638.ref012","doi-asserted-by":"crossref","unstructured":"Schleimer S, Wilkerson DS, Aiken A. Winnowing: local algorithms for document fingerprinting. In: Proceedings of the 2003 ACM SIGMOD international conference on Management of data; 2003. p. 76\u201385.","DOI":"10.1145\/872757.872770"},{"issue":"5","key":"pcbi.1010638.ref013","doi-asserted-by":"crossref","first-page":"849","DOI":"10.1101\/gr.213611.116","article-title":"Evaluation of GRCh38 and de novo haploid genome assemblies demonstrates the enduring quality of the reference assembly","volume":"27","author":"VA Schneider","year":"2017","journal-title":"Genome research"},{"issue":"6588","key":"pcbi.1010638.ref014","doi-asserted-by":"crossref","first-page":"44","DOI":"10.1126\/science.abj6987","article-title":"The complete sequence of a human genome","volume":"376","author":"S Nurk","year":"2022","journal-title":"Science"},{"issue":"5331","key":"pcbi.1010638.ref015","doi-asserted-by":"crossref","first-page":"1453","DOI":"10.1126\/science.277.5331.1453","article-title":"The complete genome sequence of Escherichia coli K-12","volume":"277","author":"FR Blattner","year":"1997","journal-title":"Science"},{"key":"pcbi.1010638.ref016","unstructured":"PacificBiosciences. Microbial Multiplexing Data Set 48 plex: PacBio Sequel II System, Chemistry v2.0, SMRT Link v8.0 Analysis; 2019. https:\/\/github.com\/PacificBiosciences\/DevNet\/wiki\/Microbial-Multiplexing-Data-Set---48-plex:-PacBio-Sequel-II-System,-Chemistry-v2.0,-SMRT-Link-v8.0-Analysis."},{"issue":"1","key":"pcbi.1010638.ref017","doi-asserted-by":"crossref","first-page":"119","DOI":"10.1093\/bioinformatics\/bts649","article-title":"PBSIM: PacBio reads simulator\u2014toward accurate genome assembly","volume":"29","author":"Y Ono","year":"2013","journal-title":"Bioinformatics"},{"issue":"4","key":"pcbi.1010638.ref018","doi-asserted-by":"crossref","first-page":"gix010","DOI":"10.1093\/gigascience\/gix010","article-title":"NanoSim: nanopore sequence read simulator based on statistical characterization","volume":"6","author":"C Yang","year":"2017","journal-title":"GigaScience"}],"updated-by":[{"DOI":"10.1371\/journal.pcbi.1010638","type":"new_version","label":"New version","source":"publisher","updated":{"date-parts":[[2022,11,9]],"date-time":"2022-11-09T00:00:00Z","timestamp":1667952000000}}],"container-title":["PLOS Computational Biology"],"original-title":[],"language":"en","link":[{"URL":"https:\/\/dx.plos.org\/10.1371\/journal.pcbi.1010638","content-type":"unspecified","content-version":"vor","intended-application":"similarity-checking"}],"deposited":{"date-parts":[[2022,11,9]],"date-time":"2022-11-09T14:09:46Z","timestamp":1668002986000},"score":1,"resource":{"primary":{"URL":"https:\/\/dx.plos.org\/10.1371\/journal.pcbi.1010638"}},"subtitle":[],"editor":[{"given":"Heng","family":"Li","sequence":"first","affiliation":[]}],"short-title":[],"issued":{"date-parts":[[2022,10,28]]},"references-count":18,"journal-issue":{"issue":"10","published-online":{"date-parts":[[2022,10,28]]}},"URL":"https:\/\/doi.org\/10.1371\/journal.pcbi.1010638","relation":{"has-preprint":[{"id-type":"doi","id":"10.1101\/2022.01.10.475696","asserted-by":"object"}]},"ISSN":["1553-7358"],"issn-type":[{"value":"1553-7358","type":"electronic"}],"subject":[],"published":{"date-parts":[[2022,10,28]]}}}