{"status":"ok","message-type":"work","message-version":"1.0.0","message":{"indexed":{"date-parts":[[2026,3,21]],"date-time":"2026-03-21T19:11:09Z","timestamp":1774120269178,"version":"3.50.1"},"reference-count":37,"publisher":"Oxford University Press (OUP)","issue":"2","license":[{"start":{"date-parts":[[2016,9,25]],"date-time":"2016-09-25T00:00:00Z","timestamp":1474761600000},"content-version":"vor","delay-in-days":0,"URL":"https:\/\/academic.oup.com\/journals\/pages\/about_us\/legal\/notices"}],"funder":[{"name":"Nature Science Foundation of China","award":["61301204"],"award-info":[{"award-number":["61301204"]}]},{"name":"Nature Science Foundation of China","award":["31301089"],"award-info":[{"award-number":["31301089"]}]},{"name":"High-Tech Research and Development Program (863) of China","award":["2015AA020101"],"award-info":[{"award-number":["2015AA020101"]}]},{"name":"High-Tech Research and Development Program (863) of China","award":["2015AA020108"],"award-info":[{"award-number":["2015AA020108"]}]},{"name":"High-Tech Research and Development Program (863) of China","award":["2014AA021505"],"award-info":[{"award-number":["2014AA021505"]}]}],"content-domain":{"domain":[],"crossmark-restriction":false},"short-container-title":[],"published-print":{"date-parts":[[2017,1,15]]},"abstract":"<jats:title>Abstract<\/jats:title>\n               <jats:sec>\n                  <jats:title>Motivation<\/jats:title>\n                  <jats:p>Read length is continuously increasing with the development of novel high-throughput sequencing technologies, which has enormous potentials on cutting-edge genomic studies. However, longer reads could more frequently span the breakpoints of structural variants (SVs) than that of shorter reads. This may greatly influence read alignment, since most state-of-the-art aligners are designed for handling relatively small variants in a co-linear alignment framework. Meanwhile, long read alignment is still not as efficient as that of short reads, which could be also a bottleneck for the upcoming wide application.<\/jats:p>\n               <\/jats:sec>\n               <jats:sec>\n                  <jats:title>Results<\/jats:title>\n                  <jats:p>We propose long approximate matches-based split aligner (LAMSA), a novel split read alignment approach. It takes the advantage of the rareness of SVs to implement a specifically designed two-step strategy. That is, LAMSA initially splits the read into relatively long fragments and co-linearly align them to solve the small variations or sequencing errors, and mitigate the effect of repeats. The alignments of the fragments are then used for implementing a sparse dynamic programming-based split alignment approach to handle the large or non-co-linear variants. We benchmarked LAMSA with simulated and real datasets having various read lengths and sequencing error rates, the results demonstrate that it is substantially faster than the state-of-the-art long read aligners; meanwhile, it also has good ability to handle various categories of SVs.<\/jats:p>\n               <\/jats:sec>\n               <jats:sec>\n                  <jats:title>Availability and Implementation<\/jats:title>\n                  <jats:p>LAMSA is available at https:\/\/github.com\/hitbc\/LAMSA<\/jats:p>\n               <\/jats:sec>\n               <jats:sec>\n                  <jats:title>Supplementary information<\/jats:title>\n                  <jats:p>Supplementary data are available at Bioinformatics online.<\/jats:p>\n               <\/jats:sec>","DOI":"10.1093\/bioinformatics\/btw594","type":"journal-article","created":{"date-parts":[[2016,9,26]],"date-time":"2016-09-26T00:07:40Z","timestamp":1474848460000},"page":"192-201","source":"Crossref","is-referenced-by-count":23,"title":["LAMSA: fast split read alignment with long approximate matches"],"prefix":"10.1093","volume":"33","author":[{"given":"Bo","family":"Liu","sequence":"first","affiliation":[{"name":"Center for Bioinformatics, Harbin Institute of Technology, Harbin, Heilongjiang, China"}]},{"given":"Yan","family":"Gao","sequence":"additional","affiliation":[{"name":"Center for Bioinformatics, Harbin Institute of Technology, Harbin, Heilongjiang, China"}]},{"given":"Yadong","family":"Wang","sequence":"additional","affiliation":[{"name":"Center for Bioinformatics, Harbin Institute of Technology, Harbin, Heilongjiang, China"}]}],"member":"286","published-online":{"date-parts":[[2016,9,25]]},"reference":[{"key":"2023020204303489200_btw594-B1","doi-asserted-by":"crossref","first-page":"1679","DOI":"10.1093\/bioinformatics\/btt198","article-title":"RSVSim: an R\/Bioconductor package for the simulation of structural variations","volume":"29","author":"Bartenhagen","year":"2013","journal-title":"Bioinformatics"},{"key":"2023020204303489200_btw594-B3","doi-asserted-by":"crossref","first-page":"238.","DOI":"10.1186\/1471-2105-13-238","article-title":"Mapping single molecule sequencing reads using basic local alignment with successive refinement (BLASR): application and theory","volume":"13","author":"Chaisson","year":"2012","journal-title":"BMC Bioinformatics"},{"key":"2023020204303489200_btw594-B4","doi-asserted-by":"crossref","first-page":"608","DOI":"10.1038\/nature13907","article-title":"Resolving the complexity of the human genome using single-molecule sequencing","volume":"517","author":"Chaisson","year":"2015","journal-title":"Nature"},{"key":"2023020204303489200_btw594-B5","doi-asserted-by":"crossref","first-page":"e1002384","DOI":"10.1371\/journal.pgen.1002384","article-title":"Repetitive elements may comprise over two-thirds of the human genome","volume":"7","author":"De Koning","year":"2011","journal-title":"PLoS Genet"},{"key":"2023020204303489200_btw594-B6","doi-asserted-by":"crossref","first-page":"491","DOI":"10.1038\/ng.806","article-title":"A framework for variation discovery and genotyping using next-generation DNA sequencing data","volume":"43","author":"DePristo","year":"2011","journal-title":"Nat. Genet"},{"key":"2023020204303489200_btw594-B7","doi-asserted-by":"crossref","first-page":"15","DOI":"10.1093\/bioinformatics\/bts635","article-title":"STAR: ultrafast universal RNA-seq aligner","volume":"29","author":"Dobin","year":"2012","journal-title":"Bioinformatics"},{"key":"2023020204303489200_btw594-B8","doi-asserted-by":"crossref","first-page":"133","DOI":"10.1126\/science.1162986","article-title":"Real-time DNA sequencing from single polymerase molecules","volume":"323","author":"Eid","year":"2009","journal-title":"Science"},{"key":"2023020204303489200_btw594-B9","doi-asserted-by":"crossref","first-page":"295","DOI":"10.1038\/nbt0412-295","article-title":"Oxford Nanopore announcement sets sequencing sector abuzz","volume":"30","author":"Eisenstein","year":"2012","journal-title":"Nat. Biotechnol"},{"key":"2023020204303489200_btw594-B10","doi-asserted-by":"crossref","first-page":"2417","DOI":"10.1093\/bioinformatics\/bts456","article-title":"YAHA: fast and flexible long-read alignment with optimal breakpoint detection","volume":"28","author":"Faust","year":"2012","journal-title":"Bioinformatics"},{"key":"2023020204303489200_btw594-B11","doi-asserted-by":"crossref","first-page":"390","DOI":"10.1109\/SFCS.2000.892127","article-title":"Opportunistic data structures with applications","author":"Ferragina","year":"2000","journal-title":"Proceedings of the 41st Symposium on Foundations of Computer Science (FOCS 2000), IEEE Computer Society"},{"key":"2023020204303489200_btw594-B12","doi-asserted-by":"crossref","first-page":"85","DOI":"10.1038\/nrg1767","article-title":"Structural variation in the human genome","volume":"7","author":"Feuk","year":"2006","journal-title":"Nat. Rev. Genet"},{"key":"2023020204303489200_btw594-B13","doi-asserted-by":"crossref","first-page":"3169","DOI":"10.1093\/bioinformatics\/bts605","article-title":"Tools for mapping high-throughput sequencing data","volume":"28","author":"Fonseca","year":"2012","journal-title":"Bioinformatics"},{"key":"2023020204303489200_btw594-B14","doi-asserted-by":"crossref","first-page":"688","DOI":"10.1101\/gr.168450.113","article-title":"Reconstructing complex regions of genomes using long-read sequencing technology","volume":"24","author":"Huddleston","year":"2014","journal-title":"Genome Res"},{"key":"2023020204303489200_btw594-B15","doi-asserted-by":"crossref","first-page":"1075","DOI":"10.12688\/f1000research.7201.1","article-title":"MinION Analysis and Reference Consortium: Phase 1 data release and analysis","volume":"4","author":"Ip","year":"2015","journal-title":"F1000Res"},{"key":"2023020204303489200_btw594-B16","doi-asserted-by":"crossref","first-page":"2576","DOI":"10.1093\/bioinformatics\/bts484","article-title":"PRISM: pair-read informed split-read mapping for base-pair level detection of insertion, deletion and structural variants","volume":"28","author":"Jiang","year":"2012","journal-title":"Bioinformatics"},{"key":"2023020204303489200_btw594-B17","first-page":"656","article-title":"BLAT\u2014the BLAST-like alignment tool","volume":"12","author":"Kent","year":"2002","journal-title":"Genome Res"},{"key":"2023020204303489200_btw594-B18","doi-asserted-by":"crossref","first-page":"487","DOI":"10.1101\/gr.113985.110","article-title":"Adaptive seeds tame genomic sequence comparison","volume":"21","author":"Kie\u0142basa","year":"2011","journal-title":"Genome Res"},{"key":"2023020204303489200_btw594-B19","doi-asserted-by":"crossref","first-page":"693","DOI":"10.1038\/nbt.2280","article-title":"Hybrid error correction and de novo assembly of single-molecule sequencing reads","volume":"30","author":"Koren","year":"2012","journal-title":"Nat. Biotechnol"},{"key":"2023020204303489200_btw594-B20","doi-asserted-by":"crossref","first-page":"R25","DOI":"10.1186\/gb-2009-10-3-r25","article-title":"Ultrafast and memory-efficient alignment of short DNA sequences to the human genome","volume":"10","author":"Langmead","year":"2009","journal-title":"Genome Biol"},{"key":"2023020204303489200_btw594-B21","doi-asserted-by":"crossref","first-page":"357","DOI":"10.1038\/nmeth.1923","article-title":"Fast gapped-read alignment with Bowtie 2","volume":"9","author":"Langmead","year":"2012","journal-title":"Nat. Methods"},{"key":"2023020204303489200_btw594-B22","first-page":"589","author":"Li","year":"2013"},{"key":"2023020204303489200_btw594-B23","doi-asserted-by":"crossref","first-page":"1754","DOI":"10.1093\/bioinformatics\/btp324","article-title":"Fast and accurate short read alignment with Burrows\u2013Wheeler transform","volume":"25","author":"Li","year":"2009","journal-title":"Bioinformatics"},{"key":"2023020204303489200_btw594-B24","doi-asserted-by":"crossref","first-page":"589","DOI":"10.1093\/bioinformatics\/btp698","article-title":"Fast and accurate long-read alignment with Burrows\u2013Wheeler transform","volume":"26","author":"Li","year":"2010","journal-title":"Bioinformatics"},{"key":"2023020204303489200_btw594-B25","doi-asserted-by":"crossref","first-page":"2078","DOI":"10.1093\/bioinformatics\/btp352","article-title":"The sequence alignment\/map format and SAMtools","volume":"25","author":"Li","year":"2009","journal-title":"Bioinformatics"},{"key":"2023020204303489200_btw594-B26","doi-asserted-by":"crossref","first-page":"1851","DOI":"10.1101\/gr.078212.108","article-title":"Mapping short DNA sequencing reads and calling variants using mapping quality scores","volume":"18","author":"Li","year":"2008","journal-title":"Genome Res"},{"key":"2023020204303489200_btw594-B27","doi-asserted-by":"crossref","first-page":"e107","DOI":"10.1093\/nar\/gkv533","article-title":"BatAlign: an incremental method for accurate alignment of sequencing reads","volume":"43","author":"Lim","year":"2015","journal-title":"Nucleic Acids Res"},{"key":"2023020204303489200_btw594-B28","doi-asserted-by":"crossref","first-page":"878","DOI":"10.1093\/bioinformatics\/bts061","article-title":"SOAP3: ultra-fast GPU-based parallel alignment tool for short reads","volume":"28","author":"Liu","year":"2012","journal-title":"Bioinformatics"},{"key":"2023020204303489200_btw594-B29","doi-asserted-by":"crossref","first-page":"D986","DOI":"10.1093\/nar\/gkt958","article-title":"The database of genomic variants: a curated collection of structural variation in the human genome","volume":"42","author":"MacDonald","year":"2014","journal-title":"Nucleic Acids Res"},{"key":"2023020204303489200_btw594-B30","doi-asserted-by":"crossref","first-page":"1185","DOI":"10.1038\/nmeth.2221","article-title":"The GEM mapper: fast, accurate and versatile alignment by filtration","volume":"9","author":"Marco-Sola","year":"2012","journal-title":"Nat. Methods"},{"key":"2023020204303489200_btw594-B31","doi-asserted-by":"crossref","first-page":"1297","DOI":"10.1101\/gr.107524.110","article-title":"The Genome Analysis Toolkit: a MapReduce framework for analyzing next-generation DNA sequencing data","volume":"20","author":"McKenna","year":"2010","journal-title":"Genome Res"},{"key":"2023020204303489200_btw594-B32","doi-asserted-by":"crossref","first-page":"59","DOI":"10.1038\/nature09708","article-title":"Mapping copy number variation by population-scale genome sequencing","volume":"470","author":"Mills","year":"2011","journal-title":"Nature"},{"key":"2023020204303489200_btw594-B33","doi-asserted-by":"crossref","first-page":"2366","DOI":"10.1093\/bioinformatics\/bts450","article-title":"Fast and accurate read alignment for resequencing","volume":"28","author":"Mu","year":"2012","journal-title":"Bioinformatics"},{"key":"2023020204303489200_btw594-B34","doi-asserted-by":"crossref","first-page":"1725","DOI":"10.1101\/gr.194201","article-title":"SSAHA: a fast search method for large DNA databases","volume":"11","author":"Ning","year":"2001","journal-title":"Genome Res"},{"key":"2023020204303489200_btw594-B35","doi-asserted-by":"crossref","first-page":"119","DOI":"10.1093\/bioinformatics\/bts649","article-title":"PBSIM: PacBio reads simulator\u2013toward accurate genome assembly","volume":"29","author":"Ono","year":"2013","journal-title":"Bioinformatics"},{"key":"2023020204303489200_btw594-B36","doi-asserted-by":"crossref","first-page":"326","DOI":"10.1038\/nbt.2181","article-title":"DNA sequencing with nanopores","volume":"30","author":"Schneider","year":"2012","journal-title":"Nat. Biotechnol"},{"key":"2023020204303489200_btw594-B37","doi-asserted-by":"crossref","first-page":"75","DOI":"10.1038\/nature15394","article-title":"An integrated map of structural variation in 2,504 human genomes","volume":"526","author":"Sudmant","year":"2015","journal-title":"Nature"},{"key":"2023020204303489200_btw594-B38","doi-asserted-by":"crossref","first-page":"36","DOI":"10.1038\/nrg3117","article-title":"Repetitive DNA and next-generation sequencing: computational challenges and solutions","volume":"13","author":"Treangen","year":"2012","journal-title":"Nat. Rev. Genet"}],"container-title":["Bioinformatics"],"original-title":[],"language":"en","link":[{"URL":"https:\/\/academic.oup.com\/bioinformatics\/article-pdf\/33\/2\/192\/49037224\/bioinformatics_33_2_192.pdf","content-type":"application\/pdf","content-version":"vor","intended-application":"syndication"},{"URL":"https:\/\/academic.oup.com\/bioinformatics\/article-pdf\/33\/2\/192\/49037224\/bioinformatics_33_2_192.pdf","content-type":"unspecified","content-version":"vor","intended-application":"similarity-checking"}],"deposited":{"date-parts":[[2023,2,2]],"date-time":"2023-02-02T04:31:21Z","timestamp":1675312281000},"score":1,"resource":{"primary":{"URL":"https:\/\/academic.oup.com\/bioinformatics\/article\/33\/2\/192\/2525703"}},"subtitle":[],"editor":[{"given":"John","family":"Hancock","sequence":"additional","affiliation":[]}],"short-title":[],"issued":{"date-parts":[[2016,9,25]]},"references-count":37,"journal-issue":{"issue":"2","published-print":{"date-parts":[[2017,1,15]]}},"URL":"https:\/\/doi.org\/10.1093\/bioinformatics\/btw594","relation":{},"ISSN":["1367-4803","1367-4811"],"issn-type":[{"value":"1367-4803","type":"print"},{"value":"1367-4811","type":"electronic"}],"subject":[],"published-other":{"date-parts":[[2017,1,15]]},"published":{"date-parts":[[2016,9,25]]}}}