{"status":"ok","message-type":"work","message-version":"1.0.0","message":{"indexed":{"date-parts":[[2026,3,11]],"date-time":"2026-03-11T02:07:17Z","timestamp":1773194837493,"version":"3.50.1"},"reference-count":37,"publisher":"Springer Science and Business Media LLC","issue":"1","license":[{"start":{"date-parts":[[2021,12,1]],"date-time":"2021-12-01T00:00:00Z","timestamp":1638316800000},"content-version":"tdm","delay-in-days":0,"URL":"https:\/\/creativecommons.org\/licenses\/by\/4.0"},{"start":{"date-parts":[[2021,12,24]],"date-time":"2021-12-24T00:00:00Z","timestamp":1640304000000},"content-version":"vor","delay-in-days":23,"URL":"https:\/\/creativecommons.org\/licenses\/by\/4.0"}],"funder":[{"DOI":"10.13039\/100001513","name":"breast cancer alliance","doi-asserted-by":"crossref","id":[{"id":"10.13039\/100001513","id-type":"DOI","asserted-by":"crossref"}]}],"content-domain":{"domain":["link.springer.com"],"crossmark-restriction":false},"short-container-title":["BMC Bioinformatics"],"published-print":{"date-parts":[[2021,12]]},"abstract":"<jats:title>Abstract<\/jats:title>\n                  <jats:sec>\n                    <jats:title>Background<\/jats:title>\n                    <jats:p>Exogenous cDNA introduced into an experimental system, either intentionally or accidentally, can appear as added read coverage over that gene in next-generation sequencing libraries derived from this system. If not properly recognized and managed, this cross-contamination with exogenous signal can lead to incorrect interpretation of research results. Yet, this problem is not routinely addressed in current sequence processing pipelines.<\/jats:p>\n                  <\/jats:sec>\n                  <jats:sec>\n                    <jats:title>Results<\/jats:title>\n                    <jats:p>We present cDNA-detector, a computational tool to identify and remove exogenous cDNA contamination in DNA sequencing experiments. 
We demonstrate that cDNA-detector can identify cDNAs quickly and accurately from alignment files. A source inference step attempts to separate endogenous cDNAs (retrocopied genes) from potential cloned, exogenous cDNAs. cDNA-detector provides a mechanism to decontaminate the alignment from detected cDNAs. Simulation studies show that cDNA-detector is highly sensitive and specific, outperforming existing tools. We apply cDNA-detector to several highly-cited public databases (TCGA, ENCODE, NCBI SRA) and show that contaminant genes appear in sequencing experiments where they lead to incorrect coverage peak calls.<\/jats:p>\n                  <\/jats:sec>\n                  <jats:sec>\n                    <jats:title>Conclusions<\/jats:title>\n                    <jats:p>cDNA-detector is a user-friendly and accurate tool to detect and remove cDNA contamination in NGS libraries. This two-step design reduces the risk of true variant removal since it allows for manual review of candidates. We find that contamination with intentionally and accidentally introduced cDNAs is an underappreciated problem even in widely-used consortium datasets, where it can lead to spurious results. 
Our findings highlight the importance of sensitive detection and removal of contaminant cDNA from NGS libraries before downstream analysis.<\/jats:p>\n                  <\/jats:sec>","DOI":"10.1186\/s12859-021-04529-2","type":"journal-article","created":{"date-parts":[[2021,12,24]],"date-time":"2021-12-24T13:02:43Z","timestamp":1640350963000},"update-policy":"https:\/\/doi.org\/10.1007\/springer_crossmark_policy","source":"Crossref","is-referenced-by-count":4,"title":["cDNA-detector: detection and removal of cDNA contamination in DNA sequencing libraries"],"prefix":"10.1186","volume":"22","author":[{"ORCID":"https:\/\/orcid.org\/0000-0002-8671-3110","authenticated-orcid":false,"given":"Meifang","family":"Qi","sequence":"first","affiliation":[]},{"ORCID":"https:\/\/orcid.org\/0000-0002-9024-6084","authenticated-orcid":false,"given":"Utthara","family":"Nayar","sequence":"additional","affiliation":[]},{"ORCID":"https:\/\/orcid.org\/0000-0002-2916-2164","authenticated-orcid":false,"given":"Leif S.","family":"Ludwig","sequence":"additional","affiliation":[]},{"given":"Nikhil","family":"Wagle","sequence":"additional","affiliation":[]},{"ORCID":"https:\/\/orcid.org\/0000-0002-5566-6729","authenticated-orcid":false,"given":"Esther","family":"Rheinbay","sequence":"additional","affiliation":[]}],"member":"297","published-online":{"date-parts":[[2021,12,24]]},"reference":[{"key":"4529_CR1","doi-asserted-by":"publisher","first-page":"E20","DOI":"10.1038\/s41586-020-2522-3","volume":"584","author":"J Kim","year":"2020","unstructured":"Kim J, Zhao B, Huang AY, Miller MB, Lodato MA, Walsh CA, et al. APP gene copy number changes reflect exogenous contamination. Nature. 2020;584:E20\u20138.","journal-title":"Nature"},{"key":"4529_CR2","doi-asserted-by":"publisher","DOI":"10.1038\/s41586-018-0718-6","author":"M-H Lee","year":"2018","unstructured":"Lee M-H, Siddoway B, Kaeser GE, Segota I, Rivera R, Romanow WJ, et al. 
Somatic APP gene recombination in Alzheimer\u2019s disease and normal neurons. Nature. 2018. https:\/\/doi.org\/10.1038\/s41586-018-0718-6.","journal-title":"Nature"},{"key":"4529_CR3","doi-asserted-by":"publisher","first-page":"395","DOI":"10.1038\/nm.3824","volume":"21","author":"JS Lim","year":"2015","unstructured":"Lim JS, Kim W-I, Kang H-C, Kim SH, Park AH, Park EK, et al. Brain somatic mutations in MTOR cause focal cortical dysplasia type II leading to intractable epilepsy. Nat Med. 2015;21:395\u2013400.","journal-title":"Nat Med"},{"key":"4529_CR4","doi-asserted-by":"publisher","DOI":"10.1093\/bioinformatics\/btw383","author":"J Kim","year":"2016","unstructured":"Kim J, Maeng JH, Lim JS, Son H, Lee J, Lee JH, et al. Vecuum: identification and filtration of false somatic variants caused by recombinant vector contamination. Bioinformatics. 2016. https:\/\/doi.org\/10.1093\/bioinformatics\/btw383.","journal-title":"Bioinformatics"},{"key":"4529_CR5","doi-asserted-by":"publisher","DOI":"10.1126\/science.aav1898","author":"MR Corces","year":"2018","unstructured":"Corces MR, Granja JM, Shams S, Louie BH, Seoane JA, Zhou W, et al. The chromatin accessibility landscape of primary human cancers. Science. 2018. https:\/\/doi.org\/10.1126\/science.aav1898.","journal-title":"Science"},{"key":"4529_CR6","unstructured":"Sequence Cleaner [Internet]. [cited 2021 Jul 13]. https:\/\/sourceforge.net\/projects\/seqclean\/"},{"key":"4529_CR7","unstructured":"VecScreen: Screen for Vector Contamination. [cited 2021 Jul 13]. https:\/\/www.ncbi.nlm.nih.gov\/tools\/vecscreen\/"},{"key":"4529_CR8","doi-asserted-by":"publisher","first-page":"363","DOI":"10.1038\/74184","volume":"24","author":"C Esnault","year":"2000","unstructured":"Esnault C, Maestre J, Heidmann T. Human LINE retrotransposons generate processed pseudogenes. Nat Genet. 
2000;24:363\u20137.","journal-title":"Nat Genet"},{"key":"4529_CR9","doi-asserted-by":"publisher","first-page":"1429","DOI":"10.1128\/MCB.21.4.1429-1439.2001","volume":"21","author":"W Wei","year":"2001","unstructured":"Wei W, Gilbert N, Ooi SL, Lawler JF, Ostertag EM, Kazazian HH, et al. Human L1 retrotransposition: cis preference versus trans complementation. Mol Cell Biol. 2001;21:1429\u201339.","journal-title":"Mol Cell Biol"},{"key":"4529_CR10","doi-asserted-by":"publisher","first-page":"19","DOI":"10.1038\/nrg2487","volume":"10","author":"H Kaessmann","year":"2009","unstructured":"Kaessmann H, Vinckenbosch N, Long M. RNA-based gene duplication: mechanistic and evolutionary insights. Nat Rev Genet. 2009;10:19\u201331.","journal-title":"Nat Rev Genet"},{"key":"4529_CR11","unstructured":"The UniVec Database [Internet]. [cited 2021 Jul 13]. https:\/\/www.ncbi.nlm.nih.gov\/tools\/vecscreen\/univec\/"},{"key":"4529_CR12","doi-asserted-by":"publisher","first-page":"11","DOI":"10.1186\/s13100-015-0041-9","volume":"6","author":"W Bao","year":"2015","unstructured":"Bao W, Kojima KK, Kohany O. Repbase Update, a database of repetitive elements in eukaryotic genomes. Mob DNA. 2015;6:11.","journal-title":"Mob DNA"},{"key":"4529_CR13","doi-asserted-by":"publisher","first-page":"38","DOI":"10.1186\/1471-2105-11-38","volume":"11","author":"J Falgueras","year":"2010","unstructured":"Falgueras J, Lara AJ, Fern\u00e1ndez-Pozo N, Cant\u00f3n FR, P\u00e9rez-Trabado G, Claros MG. SeqTrim: a high-throughput pipeline for pre-processing any type of sequence read. BMC Bioinform. 2010;11:38.","journal-title":"BMC Bioinform"},{"key":"4529_CR14","doi-asserted-by":"publisher","first-page":"e17288","DOI":"10.1371\/journal.pone.0017288","volume":"6","author":"R Schmieder","year":"2011","unstructured":"Schmieder R, Edwards R. Fast identification and removal of sequence contamination from genomic and metagenomic datasets. PLoS ONE. 
2011;6:e17288.","journal-title":"PLoS ONE"},{"key":"4529_CR15","doi-asserted-by":"publisher","DOI":"10.1093\/bioinformatics\/btp324","author":"H Li","year":"2009","unstructured":"Li H, Durbin R. Fast and accurate short read alignment with Burrows-Wheeler transform. Bioinformatics. 2009. https:\/\/doi.org\/10.1093\/bioinformatics\/btp324.","journal-title":"Bioinformatics"},{"key":"4529_CR16","doi-asserted-by":"publisher","first-page":"966","DOI":"10.1126\/science.1213506","volume":"335","author":"JH Lee","year":"2012","unstructured":"Lee JH, Silhavy JL, Lee JE, Al-Gazali L, Thomas S, Davis EE, et al. Evolutionarily assembled cis-regulatory module at a human ciliopathy locus. Science. 2012;335:966\u20139.","journal-title":"Science"},{"key":"4529_CR17","doi-asserted-by":"publisher","first-page":"675","DOI":"10.1016\/j.stem.2015.09.017","volume":"17","author":"C Mazumdar","year":"2015","unstructured":"Mazumdar C, Shen Y, Xavy S, Zhao F, Reinisch A, Li R, et al. Leukemia-associated cohesin mutants dominantly enforce stem cell programs and impair human hematopoietic progenitor differentiation. Cell Stem Cell. 2015;17:675\u201388.","journal-title":"Cell Stem Cell"},{"key":"4529_CR18","doi-asserted-by":"publisher","first-page":"228","DOI":"10.1016\/j.molcel.2017.05.022","volume":"67","author":"YG Chen","year":"2017","unstructured":"Chen YG, Kim MV, Chen X, Batista PJ, Aoyama S, Wilusz JE, et al. Sensing self and foreign circular RNAs by intron identity. Mol Cell. 2017;67:228-238.e5.","journal-title":"Mol Cell"},{"key":"4529_CR19","doi-asserted-by":"publisher","first-page":"995","DOI":"10.15252\/embj.201695534","volume":"36","author":"C-W Pan","year":"2017","unstructured":"Pan C-W, Jin X, Zhao Y, Pan Y, Yang J, Karnes RJ, et al. AKT-phosphorylated FOXO1 suppresses ERK activation and chemoresistance by disrupting IQGAP1-MAPK interaction. EMBO J. 
2017;36:995\u20131010.","journal-title":"EMBO J"},{"key":"4529_CR20","doi-asserted-by":"publisher","first-page":"6524","DOI":"10.1158\/0008-5472.CAN-17-0686","volume":"77","author":"Y Yang","year":"2017","unstructured":"Yang Y, Blee AM, Wang D, An J, Pan Y, Yan Y, et al. Loss of FOXO1 cooperates with TMPRSS2\u2013ERG overexpression to promote prostate tumorigenesis and cell invasion. Cancer Res. 2017;77:6524\u201337.","journal-title":"Cancer Res"},{"key":"4529_CR21","doi-asserted-by":"publisher","DOI":"10.1186\/s12943-019-1096-x","author":"Q Shi","year":"2019","unstructured":"Shi Q, Zhu Y, Ma J, Chang K, Ding D, Bai Y, et al. Prostate cancer-associated SPOP mutations enhance cancer cell survival and docetaxel resistance by upregulating Caprin1-dependent stress granule assembly. Mol Cancer. 2019. https:\/\/doi.org\/10.1186\/s12943-019-1096-x.","journal-title":"Mol Cancer"},{"key":"4529_CR22","doi-asserted-by":"publisher","first-page":"361","DOI":"10.15252\/embj.201592426","volume":"36","author":"SN Huang","year":"2017","unstructured":"Huang SN, Williams JS, Arana ME, Kunkel TA, Pommier Y. Topoisomerase I-mediated cleavage at unrepaired ribonucleotides generates DNA double-strand breaks. EMBO J EMBO. 2017;36:361\u201373.","journal-title":"EMBO J EMBO"},{"key":"4529_CR23","doi-asserted-by":"publisher","first-page":"69","DOI":"10.1038\/s41586-018-0519-y","volume":"562","author":"M Seehawer","year":"2018","unstructured":"Seehawer M, Heinzmann F, D\u2019Artista L, Harbig J, Roux P-F, Hoenicke L, et al. Necroptosis microenvironment directs lineage commitment in liver cancer. Nature. 2018;562:69\u201375.","journal-title":"Nature"},{"key":"4529_CR24","doi-asserted-by":"publisher","first-page":"503","DOI":"10.1038\/s41586-019-1186-3","volume":"569","author":"M Ghandi","year":"2019","unstructured":"Ghandi M, Huang FW, Jan\u00e9-Valbuena J, Kryukov GV, Lo CC, McDonald ER 3rd, et al. Next-generation characterization of the cancer cell line encyclopedia. Nature. 
2019;569:503\u20138.","journal-title":"Nature"},{"key":"4529_CR25","doi-asserted-by":"publisher","unstructured":"Wilson DJ. The harmonic mean p-value for combining dependent tests. https:\/\/doi.org\/10.1101\/171751","DOI":"10.1101\/171751"},{"key":"4529_CR26","doi-asserted-by":"crossref","first-page":"289","DOI":"10.1111\/j.2517-6161.1995.tb02031.x","volume":"57","author":"Y Benjamini","year":"1995","unstructured":"Benjamini Y, Hochberg Y. Controlling the false discovery rate: a practical and powerful approach to multiple testing. J R Stat Soc. 1995;57:289\u2013300.","journal-title":"J R Stat Soc"},{"key":"4529_CR27","doi-asserted-by":"publisher","first-page":"1351","DOI":"10.1093\/gbe\/evx081","volume":"9","author":"C Casola","year":"2017","unstructured":"Casola C, Betr\u00e1n E. The genomic impact of gene retrocopies: what have we learned from comparative genomics, population genomics, and transcriptomic analyses? Genome Biol Evol. 2017;9:1351\u201373.","journal-title":"Genome Biol Evol"},{"key":"4529_CR28","doi-asserted-by":"publisher","first-page":"D221","DOI":"10.1093\/nar\/gkx1031","volume":"46","author":"S Pujar","year":"2018","unstructured":"Pujar S, O\u2019Leary NA, Farrell CM, Loveland JE, Mudge JM, Wallin C, et al. Consensus coding sequence (CCDS) database: a standardized set of human and mouse protein-coding regions supported by expert curation. Nucl Acids Res. 2018;46:D221\u20138.","journal-title":"Nucl Acids Res"},{"key":"4529_CR29","doi-asserted-by":"publisher","first-page":"24","DOI":"10.1038\/nbt.1754","volume":"29","author":"JT Robinson","year":"2011","unstructured":"Robinson JT, Thorvaldsd\u00f3ttir H, Winckler W, Guttman M, Lander ES, Getz G, et al. Integrative genomics viewer. Nat Biotechnol. 
2011;29:24\u20136.","journal-title":"Nat Biotechnol"},{"key":"4529_CR30","doi-asserted-by":"publisher","first-page":"R137","DOI":"10.1186\/gb-2008-9-9-r137","volume":"9","author":"Y Zhang","year":"2008","unstructured":"Zhang Y, Liu T, Meyer CA, Eeckhoute J, Johnson DS, Bernstein BE, et al. Model-based analysis of ChIP-Seq (MACS). Genome Biol. 2008;9:R137.","journal-title":"Genome Biol"},{"key":"4529_CR31","doi-asserted-by":"publisher","first-page":"841","DOI":"10.1093\/bioinformatics\/btq033","volume":"26","author":"AR Quinlan","year":"2010","unstructured":"Quinlan AR, Hall IM. BEDTools: a flexible suite of utilities for comparing genomic features. Bioinformatics. 2010;26:841\u20132.","journal-title":"Bioinformatics"},{"key":"4529_CR32","doi-asserted-by":"publisher","first-page":"561","DOI":"10.1038\/s41587-019-0074-6","volume":"37","author":"JM Zook","year":"2019","unstructured":"Zook JM, McDaniel J, Olson ND, Wagner J, Parikh H, Heaton H, et al. An open resource for accurately benchmarking small variant and reference calls. Nat Biotechnol. 2019;37:561\u20136.","journal-title":"Nat Biotechnol"},{"key":"4529_CR33","doi-asserted-by":"publisher","first-page":"2987","DOI":"10.1093\/bioinformatics\/btr509","volume":"27","author":"H Li","year":"2011","unstructured":"Li H. A statistical framework for SNP calling, mutation discovery, association mapping and population genetical parameter estimation from sequencing data. Bioinformatics. 2011;27:2987\u201393.","journal-title":"Bioinformatics"},{"key":"4529_CR34","doi-asserted-by":"publisher","first-page":"D794","DOI":"10.1093\/nar\/gkx1081","volume":"46","author":"CA Davis","year":"2018","unstructured":"Davis CA, Hitz BC, Sloan CA, Chan ET, Davidson JM, Gabdank I, et al. The encyclopedia of DNA elements (ENCODE): data portal update. Nucl Acids Res. 
2018;46:D794-801.","journal-title":"Nucl Acids Res"},{"key":"4529_CR35","doi-asserted-by":"publisher","first-page":"2078","DOI":"10.1093\/bioinformatics\/btp352","volume":"25","author":"H Li","year":"2009","unstructured":"Li H, Handsaker B, Wysoker A, Fennell T, Ruan J, Homer N, et al. The sequence alignment\/map format and SAMtools. Bioinformatics. 2009;25:2078\u20139.","journal-title":"Bioinformatics"},{"key":"4529_CR36","doi-asserted-by":"publisher","first-page":"38551","DOI":"10.18632\/oncotarget.9535","volume":"7","author":"J Zhao","year":"2016","unstructured":"Zhao J, Zhao Y, Wang L, Zhang J, Karnes RJ, Kohli M, et al. Alterations of androgen receptor-regulated enhancer RNAs (eRNAs) contribute to enzalutamide resistance in castration-resistant prostate cancer. Oncotarget. 2016;7:38551\u201365.","journal-title":"Oncotarget"},{"key":"4529_CR37","doi-asserted-by":"publisher","first-page":"599","DOI":"10.1016\/j.celrep.2016.03.038","volume":"15","author":"Y Zhao","year":"2016","unstructured":"Zhao Y, Wang L, Ren S, Wang L, Blackburn PR, McNulty MS, et al. Activation of P-TEFb by androgen receptor-regulated enhancer RNAs in castration-resistant prostate cancer. Cell Rep. 
2016;15:599\u2013610.","journal-title":"Cell Rep"}],"container-title":["BMC Bioinformatics"],"original-title":[],"language":"en","link":[{"URL":"https:\/\/link.springer.com\/content\/pdf\/10.1186\/s12859-021-04529-2.pdf","content-type":"application\/pdf","content-version":"vor","intended-application":"text-mining"},{"URL":"https:\/\/link.springer.com\/article\/10.1186\/s12859-021-04529-2\/fulltext.html","content-type":"text\/html","content-version":"vor","intended-application":"text-mining"},{"URL":"https:\/\/link.springer.com\/content\/pdf\/10.1186\/s12859-021-04529-2.pdf","content-type":"application\/pdf","content-version":"vor","intended-application":"similarity-checking"}],"deposited":{"date-parts":[[2024,9,15]],"date-time":"2024-09-15T01:21:17Z","timestamp":1726363277000},"score":1,"resource":{"primary":{"URL":"https:\/\/bmcbioinformatics.biomedcentral.com\/articles\/10.1186\/s12859-021-04529-2"}},"subtitle":[],"short-title":[],"issued":{"date-parts":[[2021,12]]},"references-count":37,"journal-issue":{"issue":"1","published-print":{"date-parts":[[2021,12]]}},"alternative-id":["4529"],"URL":"https:\/\/doi.org\/10.1186\/s12859-021-04529-2","relation":{"has-preprint":[{"id-type":"doi","id":"10.1101\/2021.08.11.455962","asserted-by":"object"}]},"ISSN":["1471-2105"],"issn-type":[{"value":"1471-2105","type":"electronic"}],"subject":[],"published":{"date-parts":[[2021,12]]},"assertion":[{"value":"13 August 2021","order":1,"name":"received","label":"Received","group":{"name":"ArticleHistory","label":"Article History"}},{"value":"13 December 2021","order":2,"name":"accepted","label":"Accepted","group":{"name":"ArticleHistory","label":"Article History"}},{"value":"24 December 2021","order":3,"name":"first_online","label":"First Online","group":{"name":"ArticleHistory","label":"Article History"}},{"order":1,"name":"Ethics","group":{"name":"EthicsHeading","label":"Declarations"}},{"value":"Not 
applicable.","order":2,"name":"Ethics","group":{"name":"EthicsHeading","label":"Ethics approval and consent to participate"}},{"value":"Not applicable.","order":3,"name":"Ethics","group":{"name":"EthicsHeading","label":"Consent for publication"}},{"value":"The authors declare that they have no competing interests.","order":4,"name":"Ethics","group":{"name":"EthicsHeading","label":"Competing interests"}}],"article-number":"611"}}