{"status":"ok","message-type":"work","message-version":"1.0.0","message":{"indexed":{"date-parts":[[2026,5,27]],"date-time":"2026-05-27T19:19:33Z","timestamp":1779909573041,"version":"3.53.1"},"reference-count":18,"publisher":"Springer Science and Business Media LLC","issue":"1","license":[{"start":{"date-parts":[[2020,10,1]],"date-time":"2020-10-01T00:00:00Z","timestamp":1601510400000},"content-version":"tdm","delay-in-days":0,"URL":"https:\/\/creativecommons.org\/licenses\/by\/4.0"},{"start":{"date-parts":[[2020,10,1]],"date-time":"2020-10-01T00:00:00Z","timestamp":1601510400000},"content-version":"vor","delay-in-days":0,"URL":"https:\/\/creativecommons.org\/licenses\/by\/4.0"}],"funder":[{"name":"National Science Foundation","award":["Grant number IOS-1744001"],"award-info":[{"award-number":["Grant number IOS-1744001"]}]}],"content-domain":{"domain":["link.springer.com"],"crossmark-restriction":false},"short-container-title":["BMC Bioinformatics"],"published-print":{"date-parts":[[2020,12]]},"abstract":"<jats:title>Abstract<\/jats:title>\n                  <jats:sec>\n                    <jats:title>Background<\/jats:title>\n                    <jats:p>PacBio sequencing is an incredibly valuable third-generation DNA sequencing method due to very long read lengths, ability to detect methylated bases, and its real-time sequencing methodology. Yet, hitherto no tool was available for analyzing the quality of, subsampling, and filtering PacBio data.<\/jats:p>\n                  <\/jats:sec>\n                  <jats:sec>\n                    <jats:title>Results<\/jats:title>\n                    <jats:p>\n                      Here we present\n                      <jats:italic>SequelTools<\/jats:italic>\n                      , a command-line program containing three tools: Quality Control, Read Subsampling, and Read Filtering. The Quality Control tool quickly processes PacBio Sequel raw sequence data from multiple SMRTcells producing multiple statistics and publication-quality plots describing the quality of the data including N50, read length and count statistics, PSR, and ZOR. The Read Subsampling tool allows the user to subsample reads by one or more of the following criteria: longest subreads per CLR or random CLR selection. The Read Filtering tool provides options for normalizing data by filtering out certain low-quality scraps reads and\/or by minimum CLR length.\n                      <jats:italic>SequelTools<\/jats:italic>\n                      is implemented in bash, R, and Python using only standard libraries and packages and is platform independent.\n                    <\/jats:p>\n                  <\/jats:sec>\n                  <jats:sec>\n                    <jats:title>Conclusions<\/jats:title>\n                    <jats:p>\n                      <jats:italic>SequelTools<\/jats:italic>\n                      is a program that provides the only free, fast, and easy-to-use quality control tool, and the only program providing this kind of read subsampling and read filtering for PacBio Sequel raw sequence data, and is available at\n                      <jats:ext-link xmlns:xlink=\"http:\/\/www.w3.org\/1999\/xlink\" ext-link-type=\"uri\" xlink:href=\"https:\/\/github.com\/ISUgenomics\/SequelTools\">https:\/\/github.com\/ISUgenomics\/SequelTools<\/jats:ext-link>\n                      .\n                    <\/jats:p>\n                  <\/jats:sec>","DOI":"10.1186\/s12859-020-03751-8","type":"journal-article","created":{"date-parts":[[2020,10,1]],"date-time":"2020-10-01T12:04:36Z","timestamp":1601553876000},"update-policy":"https:\/\/doi.org\/10.1007\/springer_crossmark_policy","source":"Crossref","is-referenced-by-count":42,"title":["SequelTools: a suite of tools for working with PacBio Sequel raw sequence data"],"prefix":"10.1186","volume":"21","author":[{"ORCID":"https:\/\/orcid.org\/0000-0002-6221-2493","authenticated-orcid":false,"given":"David E.","family":"Hufnagel","sequence":"first","affiliation":[],"role":[{"vocabulary":"crossref","role":"author"}]},{"given":"Matthew B.","family":"Hufford","sequence":"additional","affiliation":[],"role":[{"vocabulary":"crossref","role":"author"}]},{"given":"Arun S.","family":"Seetharam","sequence":"additional","affiliation":[],"role":[{"vocabulary":"crossref","role":"author"}]}],"member":"297","published-online":{"date-parts":[[2020,10,1]]},"reference":[{"issue":"R2","key":"3751_CR1","doi-asserted-by":"publisher","first-page":"227","DOI":"10.1093\/hmg\/ddq416","volume":"19","author":"EE Schadt","year":"2010","unstructured":"Schadt EE, Turner S, Kasarskis A. A window into third-generation sequencing. Hum Mol Genet. 2010;19(R2):227\u201340.","journal-title":"Hum Mol Genet"},{"issue":"6","key":"3751_CR2","doi-asserted-by":"publisher","first-page":"333","DOI":"10.1038\/nrg.2016.49","volume":"17","author":"S Goodwin","year":"2016","unstructured":"Goodwin S, McPherson JD, McCombie WR. Coming of age: ten years of next-generation sequencing technologies. Nat Rev Genet. 2016;17(6):333\u201351.","journal-title":"Nat Rev Genet"},{"issue":"5","key":"3751_CR3","doi-asserted-by":"publisher","first-page":"278","DOI":"10.1016\/j.gpb.2015.08.002","volume":"13","author":"A Rhoads","year":"2015","unstructured":"Rhoads A, Au KF. PacBio sequencing and its applications. Genom Proteom Bioinform. 2015;13(5):278\u201389.","journal-title":"Genom Proteom Bioinform"},{"issue":"5","key":"3751_CR4","doi-asserted-by":"publisher","first-page":"51","DOI":"10.1186\/gb-2013-14-5-r51","volume":"14","author":"MG Ross","year":"2013","unstructured":"Ross MG, Russ C, Costello M, Hollinger A, Lennon NJ, Hegarty R, Nusbaum C, Jaffe DB. Characterizing and measuring bias in sequence data. Genome Biol. 2013;14(5):51.","journal-title":"Genome Biol"},{"issue":"5","key":"3751_CR5","doi-asserted-by":"publisher","first-page":"2159","DOI":"10.1093\/nar\/gky066","volume":"46","author":"S Ardui","year":"2018","unstructured":"Ardui S, Ameur A, Vermeesch JR, Hestand MS. Single molecule real-time (SMRT) sequencing comes of age: applications and utilities for medical diagnostics. Nucleic Acids Res. 2018;46(5):2159\u201368. https:\/\/doi.org\/10.1093\/nar\/gky066.","journal-title":"Nucleic Acids Res"},{"key":"3751_CR6","volume-title":"FastQC","author":"S Andrews","year":"2012","unstructured":"Andrews S, Krueger F, Segonds-Pichon A, Biggins L, Krueger C, Wingett S. FastQC. Babraham: Babraham Institute; 2012."},{"issue":"3","key":"3751_CR7","doi-asserted-by":"publisher","first-page":"523","DOI":"10.1093\/bioinformatics\/bty654","volume":"35","author":"R Lanfear","year":"2019","unstructured":"Lanfear R, Schalamun M, Kainer D, Wang W, Schwessinger B. MinIONQC: fast and simple quality control for MinION sequencing data. Bioinformatics. 2019;35(3):523\u20135.","journal-title":"Bioinformatics"},{"issue":"11","key":"3751_CR8","doi-asserted-by":"publisher","first-page":"1934","DOI":"10.1093\/bioinformatics\/bty034","volume":"34","author":"D Desvillechabrol","year":"2018","unstructured":"Desvillechabrol D, Legendre R, Rioualen C, Bouchier C, Van Helden J, Kennedy S, Cokelaer T. Sequanix: a dynamic graphical interface for snakemake workflows. Bioinformatics. 2018;34(11):1934\u20136.","journal-title":"Bioinformatics"},{"key":"3751_CR9","unstructured":"PacificBiosciences\/stsPlots. Plot primary analysis quality control metrics. https:\/\/github.com\/PacificBiosciences\/stsPlots. Accessed 8 June 2019"},{"key":"3751_CR10","unstructured":"Software Downloads. Accessed 5 July 2019. https:\/\/www.pacb.com\/support\/software-downloads\/"},{"issue":"10","key":"3751_CR11","doi-asserted-by":"publisher","first-page":"1155","DOI":"10.1038\/s41587-019-0217-9","volume":"37","author":"AM Wenger","year":"2019","unstructured":"Wenger AM, Peluso P, Rowell WJ, Chang PC, Hall RJ, Concepcion GT, Ebler J, Fungtammasan A, Kolesnikov A, Olson ND, Topfer A, Alonge M, Mahmoud M, Qian Y, Chin CS, Phillippy AM, Schatz MC, Myers G, DePristo MA, Ruan J, Marschall T, Sedlazeck FJ, Zook JM, Li H, Koren S, Carroll A, Rank DR, Hunkapiller MW. Accurate circular consensus long-read sequencing improves variant detection and assembly of a human genome. Nat Biotechnol. 2019;37(10):1155\u201362.","journal-title":"Nat Biotechnol"},{"key":"3751_CR12","unstructured":"Pacific Biosciences of California I. ccs. GitHub. https:\/\/github.com\/PacificBiosciences\/ccs. Accessed 7 Oct 2019"},{"issue":"16","key":"3751_CR13","doi-asserted-by":"publisher","first-page":"2078","DOI":"10.1093\/bioinformatics\/btp352","volume":"25","author":"H Li","year":"2009","unstructured":"Li H, Handsaker B, Wysoker A, Fennell T, Ruan J, Homer N, Marth G, Abecasis G, Durbin R. The sequence alignment\/map format and SAMtools. Bioinformatics. 2009;25(16):2078\u20139.","journal-title":"Bioinformatics"},{"key":"3751_CR14","unstructured":"BAM format specification for PacBio. Accessed 12 Oct 2019. https:\/\/pacbiofileformats.readthedocs.io\/en\/5.1\/BAM.html."},{"key":"3751_CR15","doi-asserted-by":"crossref","unstructured":"Ou S, Liu J, Manchanda N, Gilbert AM, Wei X, Chin C-S, Hufnagel DE, Pedersen S, Snodgrass S, Fengler K, et al. Effect of sequence depth and length in long-read assembly of the maize inbred nc358. Nat Biotechnol; 2019.","DOI":"10.1101\/858365"},{"key":"3751_CR16","unstructured":"The PacBio Arabidopsis Thaliana Genome. https:\/\/downloads.pacbcloud.com\/public\/SequelData\/ArabidopsisDemoData\/SequenceData\/1_A01_customer\/. Accessed 12 July 2020."},{"key":"3751_CR17","doi-asserted-by":"publisher","first-page":"722","DOI":"10.1101\/gr.215087.116","volume":"27","author":"K Sergey","year":"2017","unstructured":"Sergey K, Walenz BP, Berlin K, Miller JR, Bergman NH, Phillippy AM. Canu: scalable and accurate long-read assembly via adaptive k-mer weighting and repeat separation. Genome Res. 2017;27:722\u201336.","journal-title":"Genome Res"},{"key":"3751_CR18","doi-asserted-by":"publisher","DOI":"10.1186\/2047-217X-2-10","author":"KR Bradnam","year":"2013","unstructured":"Bradnam KR, Fass JN, Alexandrov A, Baranay P, Bechner M, Birol I, Boisvert S, Chapman JA, Chapuis G, Chikhi R, Chitsaz H, Chou W-C, Corbeil J, Del Fabbro C, Docking TR, Durbin R, Earl D, Emrich S, Fedotov P, Fonseca NA, Ganapathy G, Gibbs RA, Gnerre S, Godzaridis Goldstein S, Haimel M, Hall G, Haussler D, Hiatt JB, Ho IY, Howard J, Hunt M, Jackman SD, Jaffe DB, Jarvis ED, Jiang H, Kazakov S, Kersey PJ, Kitzman JO, Knight JR, Koren S, Lam T-W, Lavenier D, Laviolette F, Li Y, Li Z, Liu B, Liu Y, Luo R, MacCallum I, MacManes MD, Maillet N, Melnikov S, Naquin D, Ning Z, Otto TD, Paten B, Paulo OS, Phillippy AM, Pina-Martins F, Place M, Przybylski D, Qin X, Qu C, Ribeiro FJ, Richards S, Rokhsar DS, Ruby JG, Scalabrin S, Schatz MC, Schwartz DC, Sergushichev A, Sharpe T, Shaw TI, Shendure J, Shi Y, Simpson JT, Song H, Tsarev F, Vezzi F, Vicedomini R, Vieira BM, Wang J, Worley KC, Yin S, Yiu S-M, Yuan J, Zhang G, Zhang H, Zhou S, Korf IF. Assemblathon 2: evaluating de novo methods of genome assembly in three vertebrate species. GigaScience. 2013. https:\/\/doi.org\/10.1186\/2047-217X-2-10.","journal-title":"GigaScience"}],"container-title":["BMC Bioinformatics"],"original-title":[],"language":"en","link":[{"URL":"https:\/\/link.springer.com\/content\/pdf\/10.1186\/s12859-020-03751-8.pdf","content-type":"application\/pdf","content-version":"vor","intended-application":"text-mining"},{"URL":"https:\/\/link.springer.com\/article\/10.1186\/s12859-020-03751-8\/fulltext.html","content-type":"text\/html","content-version":"vor","intended-application":"text-mining"},{"URL":"https:\/\/link.springer.com\/content\/pdf\/10.1186\/s12859-020-03751-8.pdf","content-type":"application\/pdf","content-version":"vor","intended-application":"similarity-checking"}],"deposited":{"date-parts":[[2021,9,30]],"date-time":"2021-09-30T20:31:52Z","timestamp":1633033912000},"score":1,"resource":{"primary":{"URL":"https:\/\/bmcbioinformatics.biomedcentral.com\/articles\/10.1186\/s12859-020-03751-8"}},"subtitle":[],"short-title":[],"issued":{"date-parts":[[2020,10,1]]},"references-count":18,"journal-issue":{"issue":"1","published-print":{"date-parts":[[2020,12]]}},"alternative-id":["3751"],"URL":"https:\/\/doi.org\/10.1186\/s12859-020-03751-8","relation":{"has-preprint":[{"id-type":"doi","id":"10.1101\/611814","asserted-by":"object"}]},"ISSN":["1471-2105"],"issn-type":[{"value":"1471-2105","type":"electronic"}],"subject":[],"published":{"date-parts":[[2020,10,1]]},"assertion":[{"value":"19 December 2019","order":1,"name":"received","label":"Received","group":{"name":"ArticleHistory","label":"Article History"}},{"value":"11 September 2020","order":2,"name":"accepted","label":"Accepted","group":{"name":"ArticleHistory","label":"Article History"}},{"value":"1 October 2020","order":3,"name":"first_online","label":"First Online","group":{"name":"ArticleHistory","label":"Article History"}},{"value":"Not applicable.","order":1,"name":"Ethics","group":{"name":"EthicsHeading","label":"Ethics approval and consent to participate"}},{"value":"Not applicable.","order":2,"name":"Ethics","group":{"name":"EthicsHeading","label":"Consent for publication"}},{"value":"The authors declare that they have no competing interests.","order":3,"name":"Ethics","group":{"name":"EthicsHeading","label":"Competing interests"}}],"article-number":"429"}}