{"status":"ok","message-type":"work","message-version":"1.0.0","message":{"indexed":{"date-parts":[[2026,6,18]],"date-time":"2026-06-18T09:07:48Z","timestamp":1781773668775,"version":"3.54.5"},"reference-count":24,"publisher":"Oxford University Press (OUP)","issue":"3","license":[{"start":{"date-parts":[[2024,2,21]],"date-time":"2024-02-21T00:00:00Z","timestamp":1708473600000},"content-version":"vor","delay-in-days":0,"URL":"https:\/\/creativecommons.org\/licenses\/by\/4.0\/"}],"funder":[{"DOI":"10.13039\/501100000925","name":"NHMRC","doi-asserted-by":"publisher","award":["GNT2016547"],"award-info":[{"award-number":["GNT2016547"]}],"id":[{"id":"10.13039\/501100000925","id-type":"DOI","asserted-by":"publisher"}]}],"content-domain":{"domain":[],"crossmark-restriction":false},"short-container-title":[],"published-print":{"date-parts":[[2024,3,4]]},"abstract":"<jats:title>Abstract<\/jats:title>\n                  <jats:sec>\n                    <jats:title>Motivation<\/jats:title>\n                    <jats:p>The process of analyzing high throughput sequencing data often requires the identification and extraction of specific target sequences. This could include tasks, such as identifying cellular barcodes and UMIs in single-cell data, and specific genetic variants for genotyping. However, existing tools, which perform these functions are often task-specific, such as only demultiplexing barcodes for a dedicated type of experiment, or are not tolerant to noise in the sequencing data.<\/jats:p>\n                  <\/jats:sec>\n                  <jats:sec>\n                    <jats:title>Results<\/jats:title>\n                    <jats:p>To overcome these limitations, we developed Flexiplex, a versatile and fast sequence searching and demultiplexing tool for omics data, which is based on the Levenshtein distance and thus allows imperfect matches. We demonstrate Flexiplex\u2019s application on three use cases, identifying cell-line-specific sequences in Illumina short-read single-cell data, and discovering and demultiplexing cellular barcodes from noisy long-read single-cell RNA-seq data. We show that Flexiplex achieves an excellent balance of accuracy and computational efficiency compared to leading task-specific tools.<\/jats:p>\n                  <\/jats:sec>\n                  <jats:sec>\n                    <jats:title>Availability and implementation<\/jats:title>\n                    <jats:p>Flexiplex is available at https:\/\/davidsongroup.github.io\/flexiplex\/.<\/jats:p>\n                  <\/jats:sec>","DOI":"10.1093\/bioinformatics\/btae102","type":"journal-article","created":{"date-parts":[[2024,2,21]],"date-time":"2024-02-21T02:52:50Z","timestamp":1708483970000},"source":"Crossref","is-referenced-by-count":26,"title":["Flexiplex: a versatile demultiplexer and search tool for omics data"],"prefix":"10.1093","volume":"40","author":[{"given":"Oliver","family":"Cheng","sequence":"first","affiliation":[{"name":"Blood Cells and Blood Cancer Division, The Walter and Eliza Hall Institute of Medical Research , Parkville, VIC 3052, Australia"},{"name":"Bioinformatics Division, The Walter and Eliza Hall Institute of Medical Research , Parkville, VIC 3052, Australia"},{"name":"Faculty of Science, The University of Melbourne , Parkville, VIC 3010, Australia"}],"role":[{"vocabulary":"crossref","role":"author"}]},{"given":"Min Hao","family":"Ling","sequence":"additional","affiliation":[{"name":"Department for Epigenetic and Epitranscriptomic Regulation, Genome Institute of Singapore (GIS), Agency for Science, Technology and Research (A*STAR) , Singapore 138672, Republic of Singapore"}],"role":[{"vocabulary":"crossref","role":"author"}]},{"ORCID":"https:\/\/orcid.org\/0000-0001-7868-3882","authenticated-orcid":false,"given":"Changqing","family":"Wang","sequence":"additional","affiliation":[{"name":"Epigenetics and Development Division, The Walter and Eliza Hall Institute of Medical Research , Parkville, VIC 3052, Australia"},{"name":"Department of Medical Biology, Faculty of Medicine, Dentistry and Health Sciences, The University of Melbourne , Parkville, VIC 3010, Australia"}],"role":[{"vocabulary":"crossref","role":"author"}]},{"ORCID":"https:\/\/orcid.org\/0000-0002-7631-5688","authenticated-orcid":false,"given":"Shuyi","family":"Wu","sequence":"additional","affiliation":[{"name":"Blood Cells and Blood Cancer Division, The Walter and Eliza Hall Institute of Medical Research , Parkville, VIC 3052, Australia"},{"name":"Bioinformatics Division, The Walter and Eliza Hall Institute of Medical Research , Parkville, VIC 3052, Australia"},{"name":"Faculty of Science, The University of Melbourne , Parkville, VIC 3010, Australia"}],"role":[{"vocabulary":"crossref","role":"author"}]},{"given":"Matthew E","family":"Ritchie","sequence":"additional","affiliation":[{"name":"Epigenetics and Development Division, The Walter and Eliza Hall Institute of Medical Research , Parkville, VIC 3052, Australia"},{"name":"Department of Medical Biology, Faculty of Medicine, Dentistry and Health Sciences, The University of Melbourne , Parkville, VIC 3010, Australia"}],"role":[{"vocabulary":"crossref","role":"author"}]},{"given":"Jonathan","family":"G\u00f6ke","sequence":"additional","affiliation":[{"name":"Department for Epigenetic and Epitranscriptomic Regulation, Genome Institute of Singapore (GIS), Agency for Science, Technology and Research (A*STAR) , Singapore 138672, Republic of Singapore"},{"name":"Department of Statistics and Data Science, National University of Singapore , Singapore 117546, Republic of Singapore"}],"role":[{"vocabulary":"crossref","role":"author"}]},{"given":"Noorul","family":"Amin","sequence":"additional","affiliation":[{"name":"Blood Cells and Blood Cancer Division, The Walter and Eliza Hall Institute of Medical Research , Parkville, VIC 3052, Australia"},{"name":"Bioinformatics Division, The Walter and Eliza Hall Institute of Medical Research , Parkville, VIC 3052, Australia"},{"name":"Department of Medical Biology, Faculty of Medicine, Dentistry and Health Sciences, The University of Melbourne , Parkville, VIC 3010, Australia"}],"role":[{"vocabulary":"crossref","role":"author"}]},{"ORCID":"https:\/\/orcid.org\/0000-0002-8461-7467","authenticated-orcid":false,"given":"Nadia M","family":"Davidson","sequence":"additional","affiliation":[{"name":"Blood Cells and Blood Cancer Division, The Walter and Eliza Hall Institute of Medical Research , Parkville, VIC 3052, Australia"},{"name":"Bioinformatics Division, The Walter and Eliza Hall Institute of Medical Research , Parkville, VIC 3052, Australia"},{"name":"Department of Medical Biology, Faculty of Medicine, Dentistry and Health Sciences, The University of Melbourne , Parkville, VIC 3010, Australia"}],"role":[{"vocabulary":"crossref","role":"author"}]}],"member":"286","published-online":{"date-parts":[[2024,2,21]]},"reference":[{"key":"2024030523335457800_btae102-B1","doi-asserted-by":"crossref","first-page":"3287","DOI":"10.1109\/TIT.2020.2996543","article-title":"Levenshtein distance, sequence comparison and biological database search","volume":"67","author":"Berger","year":"2021","journal-title":"IEEE Trans Inf Theory"},{"key":"2024030523335457800_btae102-B2","doi-asserted-by":"crossref","first-page":"1436","DOI":"10.1038\/s41596-019-0290-z","article-title":"Clonal tracking using embedded viral barcoding and high-throughput sequencing","volume":"15","author":"Bramlett","year":"2020","journal-title":"Nat Protoc"},{"key":"2024030523335457800_btae102-B3","doi-asserted-by":"crossref","first-page":"268","DOI":"10.1158\/0008-5472.CAN-20-0696","article-title":"Single-cell transcriptomic heterogeneity in invasive ductal and lobular breast cancer cells","volume":"81","author":"Chen","year":"2021","journal-title":"Cancer Res"},{"key":"2024030523335457800_btae102-B4","doi-asserted-by":"crossref","first-page":"43","DOI":"10.1186\/s13073-015-0167-x","article-title":"JAFFA: high sensitivity transcriptome-focused fusion gene detection","volume":"7","author":"Davidson","year":"2015","journal-title":"Genome Med"},{"key":"2024030523335457800_btae102-B5","doi-asserted-by":"crossref","first-page":"lqaa037","DOI":"10.1093\/nargab\/lqaa037","article-title":"Benchmarking of long-read correction methods","volume":"2","author":"Dohm","year":"2020","journal-title":"NAR Genom Bioinform"},{"key":"2024030523335457800_btae102-B6","doi-asserted-by":"crossref","first-page":"104530","DOI":"10.1016\/j.isci.2022.104530","article-title":"Fast and accurate matching of cellular barcodes across short-reads and long-reads of single-cell RNA-seq experiments","volume":"25","author":"Ebrahimi","year":"2022","journal-title":"iScience"},{"key":"2024030523335457800_btae102-B7","doi-asserted-by":"crossref","first-page":"R6","DOI":"10.1186\/gb-2011-12-1-r6","article-title":"Identification of fusion genes in breast cancer by paired-end RNA-sequencing","volume":"12","author":"Edgren","year":"2011","journal-title":"Genome Biol"},{"key":"2024030523335457800_btae102-B8","doi-asserted-by":"crossref","first-page":"2667","DOI":"10.1038\/s41467-018-05083-x","article-title":"Detection and removal of barcode swapping in single-cell RNA-seq data","volume":"9","author":"Griffiths","year":"2018","journal-title":"Nat Commun"},{"key":"2024030523335457800_btae102-B9","author":"Jabbari"},{"key":"2024030523335457800_btae102-B10","doi-asserted-by":"crossref","first-page":"4025","DOI":"10.1038\/s41467-020-17800-6","article-title":"High throughput error corrected nanopore single cell transcriptome sequencing","volume":"11","author":"Lebrigand","year":"2020","journal-title":"Nat Commun"},{"key":"2024030523335457800_btae102-B11","doi-asserted-by":"crossref","first-page":"766","DOI":"10.1038\/s41467-019-08595-2","article-title":"Barcoding reveals complex clonal behavior in patient-derived xenografts of metastatic triple negative breast cancer","volume":"10","author":"Merino","year":"2019","journal-title":"Nat Commun"},{"key":"2024030523335457800_btae102-B12","doi-asserted-by":"crossref","first-page":"e99439","DOI":"10.1371\/journal.pone.0099439","article-title":"The \u2018grep\u2019 command but not FusionMap, FusionFinder or ChimeraScan captures the CIC-DUX4 fusion gene from whole transcriptome sequencing data on a small round cell tumor with t(4;19)(q35;q13)","volume":"9","author":"Panagopoulos","year":"2014","journal-title":"PLoS One"},{"key":"2024030523335457800_btae102-B13","doi-asserted-by":"crossref","first-page":"1517","DOI":"10.1038\/s41587-021-00965-w","article-title":"Nanopore sequencing of single-cell transcriptomes with scCOLOR-seq","volume":"39","author":"Philpott","year":"2021","journal-title":"Nat Biotechnol"},{"key":"2024030523335457800_btae102-B14","author":"Putri","year":"2023"},{"key":"2024030523335457800_btae102-B15","doi-asserted-by":"crossref","first-page":"e0163962","DOI":"10.1371\/journal.pone.0163962","article-title":"SeqKit: a cross-platform and ultrafast toolkit for FASTA\/Q file manipulation","volume":"11","author":"Shen","year":"2016","journal-title":"PLoS One"},{"key":"2024030523335457800_btae102-B16","doi-asserted-by":"crossref","first-page":"e142","DOI":"10.1093\/nar\/gkq368","article-title":"Highly-multiplexed barcode sequencing: an efficient method for parallel analysis of pooled samples","volume":"38","author":"Smith","year":"2010","journal-title":"Nucleic Acids Res"},{"key":"2024030523335457800_btae102-B17","doi-asserted-by":"crossref","first-page":"1394","DOI":"10.1093\/bioinformatics\/btw753","article-title":"Edlib: a C\/C++ library for fast, exact sequence alignment using edit distance","volume":"33","author":"\u0160o\u0161i\u0107","year":"2017","journal-title":"Bioinformatics"},{"key":"2024030523335457800_btae102-B18","author":"Sullivan","year":"2023"},{"key":"2024030523335457800_btae102-B19","doi-asserted-by":"crossref","first-page":"310","DOI":"10.1186\/s13059-021-02525-6","article-title":"Comprehensive characterization of single-cell full-length isoforms in human and mouse with long-read sequencing","volume":"22","author":"Tian","year":"2021","journal-title":"Genome Biol"},{"key":"2024030523335457800_btae102-B20","doi-asserted-by":"crossref","first-page":"631","DOI":"10.12688\/f1000research.11547.1","article-title":"Investigation of chimeric reads using the MinION","volume":"6","author":"White","year":"2017","journal-title":"F1000Res"},{"key":"2024030523335457800_btae102-B21","doi-asserted-by":"crossref","first-page":"141","DOI":"10.12688\/wellcomeopenres.16791.1","article-title":"Ultraplex: a rapid, flexible, all-in-one fastq demultiplexer","volume":"6","author":"Wilkins","year":"2021","journal-title":"Wellcome Open Res"},{"key":"2024030523335457800_btae102-B22","author":"Wu"},{"key":"2024030523335457800_btae102-B23","doi-asserted-by":"crossref","first-page":"66","DOI":"10.1186\/s13059-023-02907-y","article-title":"Identification of cell barcodes from long-read single-cell RNA-seq with BLAZE","volume":"24","author":"You","year":"2023","journal-title":"Genome Biol"},{"key":"2024030523335457800_btae102-B24","doi-asserted-by":"crossref","first-page":"giaa151","DOI":"10.1093\/gigascience\/giaa151","article-title":"SoupX removes ambient RNA contamination from droplet-based single-cell RNA sequencing data","volume":"9","author":"Young","year":"2020","journal-title":"Gigascience"}],"container-title":["Bioinformatics"],"original-title":[],"language":"en","link":[{"URL":"https:\/\/academic.oup.com\/bioinformatics\/advance-article-pdf\/doi\/10.1093\/bioinformatics\/btae102\/56724783\/btae102.pdf","content-type":"application\/pdf","content-version":"am","intended-application":"syndication"},{"URL":"https:\/\/academic.oup.com\/bioinformatics\/article-pdf\/40\/3\/btae102\/56850933\/btae102.pdf","content-type":"application\/pdf","content-version":"vor","intended-application":"syndication"},{"URL":"https:\/\/academic.oup.com\/bioinformatics\/article-pdf\/40\/3\/btae102\/56850933\/btae102.pdf","content-type":"unspecified","content-version":"vor","intended-application":"similarity-checking"}],"deposited":{"date-parts":[[2024,3,5]],"date-time":"2024-03-05T18:34:17Z","timestamp":1709663657000},"score":1,"resource":{"primary":{"URL":"https:\/\/academic.oup.com\/bioinformatics\/article\/doi\/10.1093\/bioinformatics\/btae102\/7611801"}},"subtitle":[],"editor":[{"given":"Anthony","family":"Mathelier","sequence":"additional","affiliation":[],"role":[{"vocabulary":"crossref","role":"editor"}]}],"short-title":[],"issued":{"date-parts":[[2024,2,21]]},"references-count":24,"journal-issue":{"issue":"3","published-print":{"date-parts":[[2024,3,4]]}},"URL":"https:\/\/doi.org\/10.1093\/bioinformatics\/btae102","relation":{"has-preprint":[{"id-type":"doi","id":"10.1101\/2023.08.21.554084","asserted-by":"object"}]},"ISSN":["1367-4803","1367-4811"],"issn-type":[{"value":"1367-4803","type":"print"},{"value":"1367-4811","type":"electronic"}],"subject":[],"published-other":{"date-parts":[[2024,3,1]]},"published":{"date-parts":[[2024,2,21]]},"article-number":"btae102"}}