{"status":"ok","message-type":"work","message-version":"1.0.0","message":{"indexed":{"date-parts":[[2026,5,19]],"date-time":"2026-05-19T00:47:24Z","timestamp":1779151644326,"version":"3.51.4"},"reference-count":19,"publisher":"Springer Science and Business Media LLC","issue":"1","license":[{"start":{"date-parts":[[2021,4,2]],"date-time":"2021-04-02T00:00:00Z","timestamp":1617321600000},"content-version":"tdm","delay-in-days":0,"URL":"http:\/\/creativecommons.org\/licenses\/by\/4.0\/"},{"start":{"date-parts":[[2021,4,2]],"date-time":"2021-04-02T00:00:00Z","timestamp":1617321600000},"content-version":"vor","delay-in-days":0,"URL":"http:\/\/creativecommons.org\/licenses\/by\/4.0\/"}],"funder":[{"DOI":"10.13039\/501100001659","name":"Deutsche Forschungsgemeinschaft","doi-asserted-by":"publisher","award":["SFB 876 \/ C1"],"award-info":[{"award-number":["SFB 876 \/ C1"]}],"id":[{"id":"10.13039\/501100001659","id-type":"DOI","asserted-by":"publisher"}]},{"DOI":"10.13039\/501100013325","name":"Mercator Research Center Ruhr","doi-asserted-by":"publisher","award":["Pe-2013-0012"],"award-info":[{"award-number":["Pe-2013-0012"]}],"id":[{"id":"10.13039\/501100013325","id-type":"DOI","asserted-by":"publisher"}]},{"DOI":"10.13039\/501100008349","name":"Universit\u00e4t Duisburg-Essen","doi-asserted-by":"crossref","id":[{"id":"10.13039\/501100008349","id-type":"DOI","asserted-by":"crossref"}]}],"content-domain":{"domain":["link.springer.com"],"crossmark-restriction":false},"short-container-title":["Algorithms Mol Biol"],"published-print":{"date-parts":[[2021,12]]},"abstract":"<jats:title>Abstract<\/jats:title>\n                  <jats:sec>\n                    <jats:title>Motivation<\/jats:title>\n                    <jats:p>With an increasing number of patient-derived xenograft (PDX) models being created and subsequently sequenced to study tumor heterogeneity and to guide therapy decisions, there is a similarly increasing need for methods to separate reads originating from the graft (human) tumor and reads originating from the host species\u2019 (mouse) surrounding tissue. Two kinds of methods are in use: On the one hand, alignment-based tools require that reads are mapped and aligned (by an external mapper\/aligner) to the host and graft genomes separately first; the tool itself then processes the resulting alignments and quality metrics (typically BAM files) to assign each read or read pair. On the other hand, alignment-free tools work directly on the raw read data (typically FASTQ files). Recent studies compare different approaches and tools, with varying results.<\/jats:p>\n                  <\/jats:sec>\n                  <jats:sec>\n                    <jats:title>Results<\/jats:title>\n                    <jats:p>We show that alignment-free methods for xenograft sorting are superior concerning CPU time usage and equivalent in accuracy. We improve upon the state of the art sorting by presenting a fast lightweight approach based on three-way bucketed quotiented Cuckoo hashing. Our hash table requires memory comparable to an FM index typically used for read alignment and less than other alignment-free approaches. It allows extremely fast lookups and uses less CPU time than other alignment-free methods and alignment-based methods at similar accuracy. Several engineering steps (e.g., shortcuts for unsuccessful lookups, software prefetching) improve the performance even further.<\/jats:p>\n                  <\/jats:sec>\n                  <jats:sec>\n                    <jats:title>Availability<\/jats:title>\n                    <jats:p>\n                      Our software\n                      <jats:italic>xengsort<\/jats:italic>\n                      is available under the MIT license at\n                      <jats:ext-link xmlns:xlink=\"http:\/\/www.w3.org\/1999\/xlink\" ext-link-type=\"uri\" xlink:href=\"http:\/\/gitlab.com\/genomeinformatics\/xengsort\">http:\/\/gitlab.com\/genomeinformatics\/xengsort<\/jats:ext-link>\n                      . It is written in numba-compiled Python and comes with sample Snakemake workflows for hash table construction and dataset processing.\n                    <\/jats:p>\n                  <\/jats:sec>","DOI":"10.1186\/s13015-021-00181-w","type":"journal-article","created":{"date-parts":[[2021,4,2]],"date-time":"2021-04-02T06:03:05Z","timestamp":1617343385000},"update-policy":"https:\/\/doi.org\/10.1007\/springer_crossmark_policy","source":"Crossref","is-referenced-by-count":31,"title":["Fast lightweight accurate xenograft sorting"],"prefix":"10.1186","volume":"16","author":[{"given":"Jens","family":"Zentgraf","sequence":"first","affiliation":[],"role":[{"role":"author","vocabulary":"crossref"}]},{"ORCID":"https:\/\/orcid.org\/0000-0002-8536-6065","authenticated-orcid":false,"given":"Sven","family":"Rahmann","sequence":"additional","affiliation":[],"role":[{"role":"author","vocabulary":"crossref"}]}],"member":"297","published-online":{"date-parts":[[2021,4,2]]},"reference":[{"issue":"1","key":"181_CR1","doi-asserted-by":"publisher","first-page":"231","DOI":"10.1186\/s13059-019-1849-2","volume":"20","author":"SY Jo","year":"2019","unstructured":"Jo SY, Kim E, Kim S. Impact of mouse contamination in genomic profiling of patient-derived models and best practice for robust analysis. Genome Biol. 2019;20(1):231.","journal-title":"Genome Biol"},{"issue":"1","key":"181_CR2","doi-asserted-by":"publisher","first-page":"366","DOI":"10.1186\/s12859-018-2353-5","volume":"19","author":"RJC Kluin","year":"2018","unstructured":"Kluin RJC, Kemper K, Kuilman T, de Ruiter JR, Iyer V, Forment JV, Cornelissen-Steijger P, de Rink I, Ter Brugge P, Song JY, Klarenbeek S, McDermott U, Jonkers J, Velds A, Adams DJ, Peeper DS, Krijgsman O. XenofilteR: computational deconvolution of mouse and human reads in tumor xenograft sequence data. BMC Bioinform. 2018;19(1):366.","journal-title":"BMC Bioinform"},{"key":"181_CR3","unstructured":"Giner G. XenoSplit. Unpublished; 2019. source code available at https:\/\/github.com\/goknurginer\/XenoSplit."},{"issue":"8","key":"181_CR4","doi-asserted-by":"publisher","first-page":"1012","DOI":"10.1158\/1541-7786.MCR-16-0431","volume":"15","author":"G Khandelwal","year":"2017","unstructured":"Khandelwal G, Girotti MR, Smowton C, Taylor S, Wirth C, Dynowski M, Frese KK, Brady G, Dive C, Marais R, Miller C. Next-generation sequencing analysis and algorithms for PDX and CDX models. Mol Cancer Res. 2017;15(8):1012\u20136.","journal-title":"Mol Cancer Res"},{"key":"181_CR5","doi-asserted-by":"publisher","first-page":"2741","DOI":"10.12688\/f1000research.10082.1","volume":"5","author":"MJ Ahdesm\u00e4ki","year":"2016","unstructured":"Ahdesm\u00e4ki MJ, Gray SR, Johnson JH, Lai Z. Disambiguate: an open-source application for disambiguating two species in next generation sequencing data from grafted samples. F1000Res. 2016;5:2741.","journal-title":"F1000Res"},{"key":"181_CR6","unstructured":"Bushnell B. BBsplit, Joint Genome Institute, Walnut Creek, CA. Part of BBTools; 2014\u20132020. https:\/\/jgi.doe.gov\/data-and-tools\/bbtools\/."},{"issue":"12","key":"181_CR7","doi-asserted-by":"publisher","first-page":"172","DOI":"10.1093\/bioinformatics\/bts236","volume":"28","author":"T Conway","year":"2012","unstructured":"Conway T, Wazny J, Bromage A, Tymms M, Sooraj D, Williams ED, Beresford-Smith B. Xenome\u2014a tool for classifying reads from xenograft samples. Bioinformatics. 2012;28(12):172\u20138.","journal-title":"Bioinformatics"},{"issue":"1","key":"181_CR8","doi-asserted-by":"publisher","first-page":"19","DOI":"10.1186\/s12864-017-4414-y","volume":"19","author":"M Callari","year":"2018","unstructured":"Callari M, Batra AS, Batra RN, Sammut SJ, Greenwood W, Clifford H, Hercus C, Chin SF, Bruna A, Rueda OM, Caldas C. Computational approach to discriminate human and mouse sequences in patient-derived tumour xenografts. BMC Genomics. 2018;19(1):19.","journal-title":"BMC Genomics"},{"issue":"7","key":"181_CR9","doi-asserted-by":"publisher","first-page":"345","DOI":"10.1016\/j.jgg.2018.07.001","volume":"45","author":"W Dai","year":"2018","unstructured":"Dai W, Liu J, Li Q, Liu W, Li YX, Li YY. A comparison of next-generation sequencing analysis methods for cancer xenograft samples. J Genet Genomics. 2018;45(7):345\u201350.","journal-title":"J Genet Genomics"},{"key":"181_CR10","doi-asserted-by":"publisher","unstructured":"Walzer S. Load thresholds for Cuckoo hashing with overlapping blocks. In: Chatzigiannakis I, Kaklamanis C, Marx D, Sannella D, editors. 45th international colloquium on automata, languages, and programming, ICALP 2018. LIPIcs; 2018. vol. 107, p. 102\u2013110210. Schloss Dagstuhl-Leibniz-Zentrum fuer Informatik, Wadern, Germany. https:\/\/doi.org\/10.4230\/LIPIcs.ICALP.2018.102","DOI":"10.4230\/LIPIcs.ICALP.2018.102"},{"key":"181_CR11","doi-asserted-by":"publisher","unstructured":"Zentgraf J, Timm H, Rahmann S. Cost-optimal assignment of elements in genome-scale multi-way bucketed Cuckoo hash tables. In: Proceedings of the symposium on algorithm engineering and experiments (ALENEX) 2020, 2020, p. 186\u201398. SIAM, Philadelphia, PA, USA. https:\/\/doi.org\/10.1137\/1.9781611976007.15","DOI":"10.1137\/1.9781611976007.15"},{"key":"181_CR12","unstructured":"Espinosa A. Cuckoo breeding ground\u2014a better cuckoo hash table; 2018. https:\/\/cbg.netlify.app\/publication\/research_cuckoo_cbg\/."},{"key":"181_CR13","unstructured":"Bray NL, Pimentel H, Melsted P, Pachter L. Near-optimal probabilistic RNA-seq quantification. Nat. Biotechnol. 2016;34(5): 525\u20137. Erratum in Nat. Biotechnol. 2016;34(8):888."},{"issue":"4","key":"181_CR14","doi-asserted-by":"publisher","first-page":"656","DOI":"10.1101\/gr.229202","volume":"12","author":"WJ Kent","year":"2002","unstructured":"Kent WJ. BLAT\u2014the BLAST-like alignment tool. Genome Res. 2002;12(4):656\u201364.","journal-title":"Genome Res"},{"issue":"1","key":"181_CR15","doi-asserted-by":"publisher","first-page":"10","DOI":"10.14806\/ej.17.1.200","volume":"17","author":"M Martin","year":"2011","unstructured":"Martin M. Cutadapt removes adapter sequences from high-throughput sequencing reads. EMBnet J. 2011;17(1):10\u20132. https:\/\/doi.org\/10.14806\/ej.17.1.200.","journal-title":"EMBnet J"},{"key":"181_CR16","unstructured":"Andrews S. FastQC: a quality control tool for high throughput sequence data. Babraham Bioinformatics, Inc; 2010. http:\/\/www.bioinformatics.babraham.ac.uk\/projects\/fastqc\/."},{"key":"181_CR17","doi-asserted-by":"publisher","first-page":"421","DOI":"10.1186\/1471-2105-10-421","volume":"10","author":"C Camacho","year":"2009","unstructured":"Camacho C, Coulouris G, Avagyan V, Ma N, Papadopoulos J, Bealer K, Madden TL. BLAST+: architecture and applications. BMC Bioinform. 2009;10:421.","journal-title":"BMC Bioinform"},{"key":"181_CR18","doi-asserted-by":"publisher","first-page":"28","DOI":"10.1016\/j.isci.2019.07.032","volume":"18","author":"D.S Standage","year":"2019","unstructured":"Standage D.S, Brown C.T, Hormozdiari F. Kevlar: a mapping-free framework for accurate discovery of de novo variants. iScience. 2019;18:28\u201336.","journal-title":"iScience"},{"key":"181_CR19","doi-asserted-by":"publisher","unstructured":"Lam SK, Pitrou A, Seibert S. Numba: a LLVM-based python JIT compiler. In: Finkel H, editor. Proceedings of the second workshop on the LLVM compiler infrastructure in HPC, LLVM 2015; 2015, p. 7\u2013176. New York: ACM. https:\/\/doi.org\/10.1145\/2833157.2833162.","DOI":"10.1145\/2833157.2833162"}],"container-title":["Algorithms for Molecular Biology"],"original-title":[],"language":"en","link":[{"URL":"http:\/\/link.springer.com\/content\/pdf\/10.1186\/s13015-021-00181-w.pdf","content-type":"application\/pdf","content-version":"vor","intended-application":"text-mining"},{"URL":"http:\/\/link.springer.com\/article\/10.1186\/s13015-021-00181-w\/fulltext.html","content-type":"text\/html","content-version":"vor","intended-application":"text-mining"},{"URL":"http:\/\/link.springer.com\/content\/pdf\/10.1186\/s13015-021-00181-w.pdf","content-type":"application\/pdf","content-version":"vor","intended-application":"similarity-checking"}],"deposited":{"date-parts":[[2021,4,2]],"date-time":"2021-04-02T06:03:23Z","timestamp":1617343403000},"score":1,"resource":{"primary":{"URL":"https:\/\/almob.biomedcentral.com\/articles\/10.1186\/s13015-021-00181-w"}},"subtitle":[],"short-title":[],"issued":{"date-parts":[[2021,4,2]]},"references-count":19,"journal-issue":{"issue":"1","published-print":{"date-parts":[[2021,12]]}},"alternative-id":["181"],"URL":"https:\/\/doi.org\/10.1186\/s13015-021-00181-w","relation":{"has-preprint":[{"id-type":"doi","id":"10.1101\/2020.05.14.095604","asserted-by":"object"},{"id-type":"doi","id":"10.21203\/rs.3.rs-204922\/v1","asserted-by":"object"}]},"ISSN":["1748-7188"],"issn-type":[{"value":"1748-7188","type":"electronic"}],"subject":[],"published":{"date-parts":[[2021,4,2]]},"assertion":[{"value":"2 February 2021","order":1,"name":"received","label":"Received","group":{"name":"ArticleHistory","label":"Article History"}},{"value":"24 March 2021","order":2,"name":"accepted","label":"Accepted","group":{"name":"ArticleHistory","label":"Article History"}},{"value":"2 April 2021","order":3,"name":"first_online","label":"First Online","group":{"name":"ArticleHistory","label":"Article History"}},{"order":1,"name":"Ethics","group":{"name":"EthicsHeading","label":"Declarations"}},{"value":"The authors declare that they have no competing interests.","order":2,"name":"Ethics","group":{"name":"EthicsHeading","label":"Competing interests"}}],"article-number":"2"}}