{"status":"ok","message-type":"work","message-version":"1.0.0","message":{"indexed":{"date-parts":[[2022,3,29]],"date-time":"2022-03-29T07:51:17Z","timestamp":1648540277257},"reference-count":22,"publisher":"Springer Science and Business Media LLC","issue":"1","content-domain":{"domain":["link.springer.com"],"crossmark-restriction":false},"short-container-title":["BMC Bioinformatics"],"published-print":{"date-parts":[[2012,12]]},"abstract":"<jats:title>Abstract<\/jats:title>\n          <jats:sec>\n            <jats:title>Background<\/jats:title>\n            <jats:p>Despite significant advancement in alignment algorithms, the exponential growth of nucleotide sequencing throughput threatens to outpace bioinformatic analysis. Computation may become the bottleneck of genome analysis if growing alignment costs are not mitigated by further improvement in algorithms. Much gain has been gleaned from indexing and compressing alignment databases, but many widely used alignment tools process input reads sequentially and are oblivious to any underlying redundancy in the reads themselves.<\/jats:p>\n          <\/jats:sec>\n          <jats:sec>\n            <jats:title>Results<\/jats:title>\n            <jats:p>Here we present Oculus, a software package that attaches to standard aligners and exploits read redundancy by performing streaming compression, alignment, and decompression of input sequences. This nearly lossless process (&gt; 99.9%) led to alignment speedups of up to 270% across a variety of data sets, while requiring a modest amount of memory. We expect that streaming read compressors such as Oculus could become a standard addition to existing RNA-Seq and ChIP-Seq alignment pipelines, and potentially other applications in the future as throughput increases.<\/jats:p>\n          <\/jats:sec>\n          <jats:sec>\n            <jats:title>Conclusions<\/jats:title>\n            <jats:p>Oculus efficiently condenses redundant input reads and wraps existing aligners to provide nearly identical SAM output in a fraction of the aligner runtime. It includes a number of useful features, such as tunable performance and fidelity options, compatibility with FASTA or FASTQ files, and adherence to the SAM format. The platform-independent C++ source code is freely available online, at <jats:ext-link xmlns:xlink=\"http:\/\/www.w3.org\/1999\/xlink\" xlink:href=\"http:\/\/code.google.com\/p\/oculus-bio\" ext-link-type=\"uri\">http:\/\/code.google.com\/p\/oculus-bio<\/jats:ext-link>.<\/jats:p>\n          <\/jats:sec>","DOI":"10.1186\/1471-2105-13-297","type":"journal-article","created":{"date-parts":[[2012,11,13]],"date-time":"2012-11-13T17:14:25Z","timestamp":1352826865000},"update-policy":"http:\/\/dx.doi.org\/10.1007\/springer_crossmark_policy","source":"Crossref","is-referenced-by-count":3,"title":["Oculus: faster sequence alignment by streaming read compression"],"prefix":"10.1186","volume":"13","author":[{"given":"Brendan A","family":"Veeneman","sequence":"first","affiliation":[]},{"given":"Matthew K","family":"Iyer","sequence":"additional","affiliation":[]},{"given":"Arul M","family":"Chinnaiyan","sequence":"additional","affiliation":[]}],"member":"297","published-online":{"date-parts":[[2012,11,13]]},"reference":[{"key":"5548_CR1","unstructured":"Wetterstrand KA: DNA sequencing costs: data from the NHGRI large-scale genome sequencing program. http:\/\/www.genome.gov\/sequencingcosts,"},{"key":"5548_CR2","doi-asserted-by":"publisher","first-page":"666","DOI":"10.1126\/science.331.6018.666","volume":"331","author":"E Pennisi","year":"2011","unstructured":"Pennisi E: Human genome 10th anniversary. Will computers crash genomics?. Science. 2011, 331: 666-668. 10.1126\/science.331.6018.666.","journal-title":"Science"},{"key":"5548_CR3","doi-asserted-by":"publisher","first-page":"R25","DOI":"10.1186\/gb-2009-10-3-r25","volume":"10","author":"B Langmead","year":"2009","unstructured":"Langmead B, Trapnell C, Pop M, Salzberg SL: Ultrafast and memory-efficient alignment of short DNA sequences to the human genome. Genome Biol. 2009, 10: R25-10.1186\/gb-2009-10-3-r25.","journal-title":"Genome Biol"},{"key":"5548_CR4","doi-asserted-by":"publisher","first-page":"1754","DOI":"10.1093\/bioinformatics\/btp324","volume":"25","author":"H Li","year":"2009","unstructured":"Li H, Durbin R: Fast and accurate short read alignment with burrows-wheeler transform. Bioinformatics. 2009, 25: 1754-1760. 10.1093\/bioinformatics\/btp324.","journal-title":"Bioinformatics"},{"key":"5548_CR5","doi-asserted-by":"publisher","first-page":"1851","DOI":"10.1101\/gr.078212.108","volume":"18","author":"H Li","year":"2008","unstructured":"Li H, Ruan J, Durbin R: Mapping short DNA sequencing reads and calling variants using mapping quality scores. Genome Res. 2008, 18: 1851-1858. 10.1101\/gr.078212.108.","journal-title":"Genome Res"},{"key":"5548_CR6","doi-asserted-by":"publisher","first-page":"1646","DOI":"10.1101\/gr.088823.108","volume":"19","author":"D Weese","year":"2009","unstructured":"Weese D, Emde AK, Rausch T, D\u00f6ring A, Reinert K: RazerS\u2013fast read mapping with sensitivity control. Genome Res. 2009, 19: 1646-1654. 10.1101\/gr.088823.108.","journal-title":"Genome Res"},{"key":"5548_CR7","doi-asserted-by":"publisher","first-page":"403","DOI":"10.1016\/S0022-2836(05)80360-2","volume":"215","author":"SF Altschul","year":"1990","unstructured":"Altschul SF, Gish W, Miller W, Myers EW, Lipman DJ: Basic local alignment search tool. J Mol Biol. 1990, 215: 403-410.","journal-title":"J Mol Biol"},{"key":"5548_CR8","doi-asserted-by":"publisher","first-page":"3389","DOI":"10.1093\/nar\/25.17.3389","volume":"25","author":"SF Altschul","year":"1997","unstructured":"Altschul SF, Madden TL, Sch\u00e4ffer AA, Zhang J, Zhang Z, Miller W, Lipman DJ: Gapped BLAST and PSI-BLAST: a new generation of protein database search programs. Nucleic Acids Res. 1997, 25: 3389-3402. 10.1093\/nar\/25.17.3389.","journal-title":"Nucleic Acids Res"},{"key":"5548_CR9","doi-asserted-by":"publisher","first-page":"1363","DOI":"10.1093\/bioinformatics\/btp236","volume":"25","author":"MC Schatz","year":"2009","unstructured":"Schatz MC: CloudBurst: highly sensitive read mapping with MapReduce. Bioinformatics. 2009, 25: 1363-1369. 10.1093\/bioinformatics\/btp236.","journal-title":"Bioinformatics"},{"key":"5548_CR10","doi-asserted-by":"publisher","first-page":"171","DOI":"10.1186\/1756-0500-4-171","volume":"4","author":"T Nguyen","year":"2011","unstructured":"Nguyen T, Shi W, Ruden D: CloudAligner: A fast and full-featured MapReduce based tool for sequence mapping. BMC Res Notes. 2011, 4: 171-10.1186\/1756-0500-4-171.","journal-title":"BMC Res Notes"},{"key":"5548_CR11","doi-asserted-by":"publisher","first-page":"2159","DOI":"10.1093\/bioinformatics\/btr325","volume":"27","author":"L Pireddu","year":"2011","unstructured":"Pireddu L, Leo S, Zanetti G: SEAL: a distributed short read mapping and duplicate removal tool. Bioinformatics. 2011, 27: 2159-2160. 10.1093\/bioinformatics\/btr325.","journal-title":"Bioinformatics"},{"key":"5548_CR12","doi-asserted-by":"publisher","first-page":"464","DOI":"10.1093\/bioinformatics\/btq677","volume":"27","author":"K Shimizu","year":"2010","unstructured":"Shimizu K, Tsuda K: SlideSort: all pairs similarity search for short reads. Bioinformatics. 2010, 27: 464-470.","journal-title":"Bioinformatics"},{"key":"5548_CR13","doi-asserted-by":"publisher","first-page":"576","DOI":"10.1038\/nmeth0810-576","volume":"7","author":"F Hach","year":"2010","unstructured":"Hach F, Hormozdiari F, Alkan C, Hormozdiari F, Birol I, Eichler EE, Sahinalp SC: mrsFAST: a cache-oblivious algorithm for short-read mapping. Nat Methods. 2010, 7: 576-577. 10.1038\/nmeth0810-576.","journal-title":"Nat Methods"},{"key":"5548_CR14","unstructured":"Burriesci MS, Lehnert EM, Pringle JR: Fulcrum: condensing redundant reads from high-throughput sequencing studies. Bioinformatics. in press"},{"key":"5548_CR15","doi-asserted-by":"publisher","first-page":"636","DOI":"10.1126\/science.1105136","volume":"306","author":"Encode Project Consortium","year":"2004","unstructured":"Encode Project Consortium: The ENCODE (ENCyclopedia of DNA elements) project. Science. 2004, 306: 636-640.","journal-title":"Science"},{"key":"5548_CR16","doi-asserted-by":"publisher","first-page":"e17490","DOI":"10.1371\/journal.pone.0017490","volume":"6","author":"Z Sun","year":"2011","unstructured":"Sun Z, Asmann YW, Kalari KR, Bot B, Eckel-Passow JE, Baker TR, Carr JM, Khrebtukova I, Luo S, Zhang L, Schroth GP, Perez EA, Thompson EA: Integrated analysis of gene expression, CpG island methylation, and gene copy number in breast cancer cells by deep sequencing. PLoS One. 2011, 6: e17490-10.1371\/journal.pone.0017490.","journal-title":"PLoS One"},{"key":"5548_CR17","doi-asserted-by":"publisher","first-page":"i383","DOI":"10.1093\/bioinformatics\/btr247","volume":"27","author":"PP \u0141abaj","year":"2011","unstructured":"\u0141abaj PP, Leparc GG, Linggi BE, Markillie LM, Wiley HS, Kreil DP: Characterization and improvement of RNA-Seq precision in quantitative transcript expression profiling. Bioinformatics. 2011, 27: i383-i391. 10.1093\/bioinformatics\/btr247.","journal-title":"Bioinformatics"},{"key":"5548_CR18","doi-asserted-by":"publisher","first-page":"1105","DOI":"10.1093\/bioinformatics\/btp120","volume":"25","author":"C Trapnell","year":"2009","unstructured":"Trapnell C, Pachter L, Salzberg SL: TopHat: discovering splice junctions with RNA-Seq. Bioinformatics. 2009, 25: 1105-1111. 10.1093\/bioinformatics\/btp120.","journal-title":"Bioinformatics"},{"key":"5548_CR19","unstructured":"sparsehash: An extremely memory-efficient hash_map implementation. http:\/\/code.google.com\/p\/sparsehash\/,"},{"key":"5548_CR20","unstructured":"MurmurHash: http:\/\/sites.google.com\/site\/murmurhash,"},{"key":"5548_CR21","unstructured":"Kent Informatics, Inc: BLAT and other fine software. http:\/\/www.kentinformatics.com,"},{"key":"5548_CR22","doi-asserted-by":"publisher","first-page":"195","DOI":"10.1016\/0022-2836(81)90087-5","volume":"147","author":"TF Smith","year":"1981","unstructured":"Smith TF, Waterman MS: Identification of common molecular subsequences. J Mol Biol. 1981, 147: 195-197. 10.1016\/0022-2836(81)90087-5.","journal-title":"J Mol Biol"}],"container-title":["BMC Bioinformatics"],"original-title":[],"language":"en","link":[{"URL":"https:\/\/link.springer.com\/content\/pdf\/10.1186\/1471-2105-13-297.pdf","content-type":"application\/pdf","content-version":"vor","intended-application":"similarity-checking"}],"deposited":{"date-parts":[[2021,9,1]],"date-time":"2021-09-01T21:09:04Z","timestamp":1630530544000},"score":1,"resource":{"primary":{"URL":"https:\/\/bmcbioinformatics.biomedcentral.com\/articles\/10.1186\/1471-2105-13-297"}},"subtitle":[],"short-title":[],"issued":{"date-parts":[[2012,11,13]]},"references-count":22,"journal-issue":{"issue":"1","published-print":{"date-parts":[[2012,12]]}},"alternative-id":["5548"],"URL":"https:\/\/doi.org\/10.1186\/1471-2105-13-297","relation":{},"ISSN":["1471-2105"],"issn-type":[{"value":"1471-2105","type":"electronic"}],"subject":[],"published":{"date-parts":[[2012,11,13]]},"assertion":[{"value":"10 April 2012","order":1,"name":"received","label":"Received","group":{"name":"ArticleHistory","label":"Article History"}},{"value":"1 November 2012","order":2,"name":"accepted","label":"Accepted","group":{"name":"ArticleHistory","label":"Article History"}},{"value":"13 November 2012","order":3,"name":"first_online","label":"First Online","group":{"name":"ArticleHistory","label":"Article History"}}],"article-number":"297"}}