{"status":"ok","message-type":"work","message-version":"1.0.0","message":{"indexed":{"date-parts":[[2026,5,3]],"date-time":"2026-05-03T07:32:17Z","timestamp":1777793537064,"version":"3.51.4"},"reference-count":27,"publisher":"Springer Science and Business Media LLC","issue":"1","license":[{"start":{"date-parts":[[2021,5,20]],"date-time":"2021-05-20T00:00:00Z","timestamp":1621468800000},"content-version":"tdm","delay-in-days":0,"URL":"https:\/\/creativecommons.org\/licenses\/by\/4.0"},{"start":{"date-parts":[[2021,5,20]],"date-time":"2021-05-20T00:00:00Z","timestamp":1621468800000},"content-version":"vor","delay-in-days":0,"URL":"https:\/\/creativecommons.org\/licenses\/by\/4.0"}],"funder":[{"DOI":"10.13039\/100010664","name":"H2020 Future and Emerging Technologies","doi-asserted-by":"publisher","award":["863320"],"award-info":[{"award-number":["863320"]}],"id":[{"id":"10.13039\/100010664","id-type":"DOI","asserted-by":"publisher"}]}],"content-domain":{"domain":["link.springer.com"],"crossmark-restriction":false},"short-container-title":["BMC Bioinformatics"],"published-print":{"date-parts":[[2021,12]]},"abstract":"<jats:title>Abstract<\/jats:title>\n                  <jats:sec>\n                    <jats:title>Background<\/jats:title>\n                    <jats:p>Improvements in sequencing technology continue to drive sequencing cost towards $100 per genome. However, mapping sequenced data to a reference genome remains a computationally-intensive task due to the dependence on edit distance for dealing with INDELs and mismatches introduced by sequencing. All modern aligners use seed\u2013filter\u2013extend methodology and rely on filtration heuristics to reduce the overhead of edit distance computation. However, filtering has inherent performance\u2013accuracy trade-offs that limits its effectiveness.<\/jats:p>\n                  <\/jats:sec>\n                  <jats:sec>\n                    <jats:title>Results<\/jats:title>\n                    <jats:p>\n                      Motivated by algorithmic advances in randomized low-distortion embedding, we introduce SEE, a new methodology for developing sequence mappers and aligners. While SFE focuses on eliminating sub-optimal candidates, SEE focuses instead on identifying optimal candidates. To do so, SEE transforms the read and reference strings from edit distance regime to the Hamming regime by embedding them using a randomized algorithm, and uses Hamming distance over the embedded set to identify optimal candidates. To show that SEE performs well in practice, we present Accel-Align\u00a0an SEE-based short-read sequence mapper and aligner that is 3\u201312\n                      <jats:inline-formula>\n                        <jats:alternatives>\n                          <jats:tex-math>$$\\times$$<\/jats:tex-math>\n                          <mml:math xmlns:mml=\"http:\/\/www.w3.org\/1998\/Math\/MathML\">\n                            <mml:mo>\u00d7<\/mml:mo>\n                          <\/mml:math>\n                        <\/jats:alternatives>\n                      <\/jats:inline-formula>\n                      faster than state-of-the-art aligners on commodity CPUs, without any special-purpose hardware, while providing comparable accuracy.\n                    <\/jats:p>\n                  <\/jats:sec>\n                  <jats:sec>\n                    <jats:title>Conclusions<\/jats:title>\n                    <jats:p>As sequencing technologies continue to increase read length while improving throughput and accuracy, we believe that randomized embeddings open up new avenues for optimization that cannot be achieved by using edit distance. Thus, the techniques presented in this paper have a much broader scope as they can be used for other applications like graph alignment, multiple sequence alignment, and sequence assembly.<\/jats:p>\n                  <\/jats:sec>","DOI":"10.1186\/s12859-021-04162-z","type":"journal-article","created":{"date-parts":[[2021,5,20]],"date-time":"2021-05-20T15:09:10Z","timestamp":1621523350000},"update-policy":"https:\/\/doi.org\/10.1007\/springer_crossmark_policy","source":"Crossref","is-referenced-by-count":18,"title":["Accel-Align: a fast sequence mapper and aligner based on the seed\u2013embed\u2013extend method"],"prefix":"10.1186","volume":"22","author":[{"given":"Yiqing","family":"Yan","sequence":"first","affiliation":[]},{"given":"Nimisha","family":"Chaturvedi","sequence":"additional","affiliation":[]},{"ORCID":"https:\/\/orcid.org\/0000-0001-5887-4091","authenticated-orcid":false,"given":"Raja","family":"Appuswamy","sequence":"additional","affiliation":[]}],"member":"297","published-online":{"date-parts":[[2021,5,20]]},"reference":[{"issue":"3","key":"4162_CR1","doi-asserted-by":"publisher","first-page":"436","DOI":"10.1109\/JPROC.2015.2455551","volume":"105","author":"S Canzar","year":"2017","unstructured":"Canzar S, Salzberg SL. Short read mapping: an algorithmic tour. Proc IEEE. 2017;105(3):436\u201358.","journal-title":"Proceedings of the IEEE"},{"key":"4162_CR2","doi-asserted-by":"crossref","unstructured":"Backurs, A., Indyk, P.: Edit distance cannot be computed in strongly subquadratic time (unless seth is false). In: Proceedings of the forty-seventh annual ACM symposium on theory of computing, pp. 51\u201358 (2015)","DOI":"10.1145\/2746539.2746612"},{"key":"4162_CR3","doi-asserted-by":"crossref","unstructured":"Xin H, Lee D, Hormozdiari F, Yedkar S, Mutlu O, Alkan C. Accelerating read mapping with fasthash. BMC Genomics. 2013;14.","DOI":"10.1186\/1471-2164-14-S1-S13"},{"issue":"10","key":"4162_CR4","doi-asserted-by":"publisher","first-page":"1553","DOI":"10.1093\/bioinformatics\/btu856","volume":"31","author":"H Xin","year":"2015","unstructured":"Xin H, Greth J, Emmons J, Pekhimenko G, Kingsford C, Alkan C, Mutlu O. Shifted Hamming distance: a fast and accurate SIMD-friendly filter to accelerate alignment verification in read mapping. Bioinformatics. 2015;31(10):1553\u201360.","journal-title":"Bioinformatics"},{"key":"4162_CR5","unstructured":"Alser, M., Mutlu, O., Alkan, C.: Magnet: understanding and improving the accuracy of genome pre-alignment filtering. arXiv preprint arXiv:1707.01631 (2017)"},{"key":"4162_CR6","doi-asserted-by":"crossref","unstructured":"Kim J, Senol Cali D, Xin H, Lee D, Ghose S, Alser M, Hassan H, Ergin O, Alkan C, Mutlu O. Grim-filter: fast seed location filtering in dna read mapping using processing-in-memory technologies. BMC Genomics. 2018;19.","DOI":"10.1186\/s12864-018-4460-0"},{"issue":"21","key":"4162_CR7","doi-asserted-by":"publisher","first-page":"4255","DOI":"10.1093\/bioinformatics\/btz234","volume":"35","author":"M Alser","year":"2019","unstructured":"Alser M, Hassan H, Kumar A, Mutlu O, Alkan C. Shouji: a fast and efficient pre-alignment filter for sequence alignment. Bioinformatics. 2019;35(21):4255\u201363.","journal-title":"Bioinformatics"},{"key":"4162_CR8","doi-asserted-by":"crossref","unstructured":"Liao Y, Smyth GK, Shi W. The Subread aligner: fast, accurate and scalable read mapping by seed-and-vote. Nuc Acids Res. 2013;41(10).","DOI":"10.1093\/nar\/gkt214"},{"key":"4162_CR9","doi-asserted-by":"crossref","unstructured":"Chakraborty, D., Goldenberg, E., Kouck\u1ef3, M.: Streaming algorithms for embedding and computing edit distance in the low distance regime. In: Proceedings of the forty-eighth annual ACM symposium on theory of computing, pp. 712\u2013725 (2016)","DOI":"10.1145\/2897518.2897577"},{"key":"4162_CR10","doi-asserted-by":"crossref","unstructured":"Zhang, H., Zhang, Q.: Embedjoin: Efficient edit similarity joins via embeddings. In: Proceedings of the 23rd ACM SIGKDD international conference on knowledge discovery and data mining, pp. 585\u2013594 (2017)","DOI":"10.1145\/3097983.3098003"},{"key":"4162_CR11","unstructured":"Zhang, X., Yuan, Y., Indyk, P.: Neural embeddings for nearest neighbor search under edit distance (2019)"},{"key":"4162_CR12","doi-asserted-by":"crossref","unstructured":"Suzuki H, Kasahara M. Introducing difference recurrence relations for faster semi-global alignment of long sequences. BMC Bioinformatics. 2018;19(45).","DOI":"10.1186\/s12859-018-2014-8"},{"issue":"4","key":"4162_CR13","doi-asserted-by":"publisher","first-page":"357","DOI":"10.1038\/nmeth.1923","volume":"9","author":"B Langmead","year":"2012","unstructured":"Langmead B, Salzberg SL. Fast gapped-read alignment with bowtie 2. Nat Methods. 2012;9(4):357.","journal-title":"Nature methods"},{"key":"4162_CR14","unstructured":"Li, H.: Aligning sequence reads, clone sequences and assembly contigs with bwa-mem. arXiv preprint arXiv:1303.3997 (2013)"},{"issue":"18","key":"4162_CR15","doi-asserted-by":"publisher","first-page":"3094","DOI":"10.1093\/bioinformatics\/bty191","volume":"34","author":"H Li","year":"2018","unstructured":"Li H. Minimap2: pairwise alignment for nucleotide sequences. Bioinformatics. 2018;34(18):3094\u2013100.","journal-title":"Bioinformatics"},{"issue":"10","key":"4162_CR16","doi-asserted-by":"publisher","first-page":"108","DOI":"10.1093\/nar\/gkt214","volume":"41","author":"Y Liao","year":"2013","unstructured":"Liao Y, Smyth GK, Shi W. The subread aligner: fast, accurate and scalable read mapping by seed-and-vote. Nucl Acids Res. 2013;41(10):108\u2013108.","journal-title":"Nucleic acids research"},{"key":"4162_CR17","unstructured":"Zaharia, M., Bolosky, W.J., Curtis, K., Fox, A., Patterson, D., Shenker, S., Stoica, I., Karp, R.M., Sittler, T.: Faster and more accurate sequence alignment with SNAP (2011). 1111.5572"},{"key":"4162_CR18","unstructured":"Holtgrewe, M.: Mason: a read simulator for second generation sequencing data (2010)"},{"key":"4162_CR19","doi-asserted-by":"crossref","unstructured":"Marco-Sola, S., Moure L\u00f3pez, J.C., Moreto Planas, M., Espinosa\u00a0Morales, A.: Fast gap-affine pairwise alignment using the wavefront algorithm. Bioinformatics (btaa777), 1\u20138 (2020)","DOI":"10.1093\/bioinformatics\/btaa777"},{"key":"4162_CR20","doi-asserted-by":"crossref","unstructured":"Kumaran M, Subramanian U, Devarajan B. Performance assessment of variant calling pipelines using human whole exome sequencing and simulated data. BMC Bioinformatics. 2019;20(342).","DOI":"10.1186\/s12859-019-2928-9"},{"issue":"5","key":"4162_CR21","doi-asserted-by":"publisher","first-page":"491","DOI":"10.1038\/ng.806","volume":"43","author":"MA DePristo","year":"2011","unstructured":"DePristo MA, Banks E, Poplin R, Garimella KV, Maguire JR, Hartl C, Philippakis AA, Angel GD, Rivas MA, Hanna M. A framework for variation discovery and genotyping using next-generation dna sequencing data. Nat Genetics. 2011;43(5):491.","journal-title":"Nature genetics"},{"key":"4162_CR22","doi-asserted-by":"publisher","first-page":"1","DOI":"10.1371\/journal.pcbi.1005944","volume":"14","author":"G Mar\u00e7ais","year":"2018","unstructured":"Mar\u00e7ais G, Delcher AL, Phillippy AM, Coston R, Salzberg SL, Zimin A. Mummer4: A fast and versatile genome alignment system. PLOS Comput Biol. 2018;14:1\u201314.","journal-title":"PLOS Computational Biology"},{"issue":"3","key":"4162_CR23","doi-asserted-by":"publisher","first-page":"440","DOI":"10.1093\/bioinformatics\/18.3.440","volume":"18","author":"B Ma","year":"2002","unstructured":"Ma B, Tromp J, Li M. Patternhunter: faster and more sensitive homology search. Bioinformatics. 2002;18(3):440\u20135.","journal-title":"Bioinformatics"},{"issue":"3","key":"4162_CR24","doi-asserted-by":"publisher","first-page":"487","DOI":"10.1101\/gr.113985.110","volume":"21","author":"SM Kie\u0142basa","year":"2011","unstructured":"Kie\u0142basa SM, Wan R, Sato K, Horton P, Frith MC. Adaptive seeds tame genomic sequence comparison. Genome Res. 2011;21(3):487\u201393.","journal-title":"Genome research"},{"issue":"18","key":"4162_CR25","doi-asserted-by":"publisher","first-page":"3363","DOI":"10.1093\/bioinformatics\/bth408","volume":"20","author":"M Roberts","year":"2004","unstructured":"Roberts M, Hayes W, Hunt BR, Mount SM, Yorke JA. Reducing storage requirements for biological sequence comparison. Bioinformatics. 2004;20(18):3363\u20139.","journal-title":"Bioinformatics"},{"issue":"11","key":"4162_CR26","doi-asserted-by":"publisher","first-page":"1632","DOI":"10.1093\/bioinformatics\/btv670","volume":"32","author":"H Xin","year":"2015","unstructured":"Xin H, Nahar S, Zhu R, Emmons J, Pekhimenko G, Kingsford C, Alkan C, Mutlu O. Optimal seed solver: optimizing seed selection in read mapping. Bioinformatics. 2015;32(11):1632\u201342.","journal-title":"Bioinformatics"},{"key":"4162_CR27","doi-asserted-by":"crossref","unstructured":"Appuswamy, R., Fellay, J., Chaturvedi, N.: Sequence alignment through the looking glass. In: 2018 IEEE international parallel and distributed processing symposium workshops (IPDPSW) (2018)","DOI":"10.1109\/IPDPSW.2018.00050"}],"container-title":["BMC Bioinformatics"],"original-title":[],"language":"en","link":[{"URL":"https:\/\/link.springer.com\/content\/pdf\/10.1186\/s12859-021-04162-z.pdf","content-type":"application\/pdf","content-version":"vor","intended-application":"text-mining"},{"URL":"https:\/\/link.springer.com\/article\/10.1186\/s12859-021-04162-z\/fulltext.html","content-type":"text\/html","content-version":"vor","intended-application":"text-mining"},{"URL":"https:\/\/link.springer.com\/content\/pdf\/10.1186\/s12859-021-04162-z.pdf","content-type":"application\/pdf","content-version":"vor","intended-application":"similarity-checking"}],"deposited":{"date-parts":[[2021,5,20]],"date-time":"2021-05-20T15:09:44Z","timestamp":1621523384000},"score":1,"resource":{"primary":{"URL":"https:\/\/bmcbioinformatics.biomedcentral.com\/articles\/10.1186\/s12859-021-04162-z"}},"subtitle":[],"short-title":[],"issued":{"date-parts":[[2021,5,20]]},"references-count":27,"journal-issue":{"issue":"1","published-print":{"date-parts":[[2021,12]]}},"alternative-id":["4162"],"URL":"https:\/\/doi.org\/10.1186\/s12859-021-04162-z","relation":{"has-preprint":[{"id-type":"doi","id":"10.1101\/2020.07.20.211888","asserted-by":"object"}]},"ISSN":["1471-2105"],"issn-type":[{"value":"1471-2105","type":"electronic"}],"subject":[],"published":{"date-parts":[[2021,5,20]]},"assertion":[{"value":"31 August 2020","order":1,"name":"received","label":"Received","group":{"name":"ArticleHistory","label":"Article History"}},{"value":"4 May 2021","order":2,"name":"accepted","label":"Accepted","group":{"name":"ArticleHistory","label":"Article History"}},{"value":"20 May 2021","order":3,"name":"first_online","label":"First Online","group":{"name":"ArticleHistory","label":"Article History"}},{"order":1,"name":"Ethics","group":{"name":"EthicsHeading","label":"Declarations"}},{"value":"The authors declare that they have no competing interests.","order":2,"name":"Ethics","group":{"name":"EthicsHeading","label":"Competing interests"}}],"article-number":"257"}}