{"status":"ok","message-type":"work","message-version":"1.0.0","message":{"indexed":{"date-parts":[[2026,3,10]],"date-time":"2026-03-10T22:23:43Z","timestamp":1773181423045,"version":"3.50.1"},"reference-count":35,"publisher":"Springer Science and Business Media LLC","issue":"1","license":[{"start":{"date-parts":[[2020,7,23]],"date-time":"2020-07-23T00:00:00Z","timestamp":1595462400000},"content-version":"tdm","delay-in-days":0,"URL":"https:\/\/creativecommons.org\/licenses\/by\/4.0"},{"start":{"date-parts":[[2020,7,23]],"date-time":"2020-07-23T00:00:00Z","timestamp":1595462400000},"content-version":"vor","delay-in-days":0,"URL":"https:\/\/creativecommons.org\/licenses\/by\/4.0"}],"funder":[{"DOI":"10.13039\/501100001809","name":"National Natural Science Foundation of China","doi-asserted-by":"publisher","award":["61901238"],"award-info":[{"award-number":["61901238"]}],"id":[{"id":"10.13039\/501100001809","id-type":"DOI","asserted-by":"publisher"}]},{"DOI":"10.13039\/501100001809","name":"National Natural Science Foundation of China","doi-asserted-by":"publisher","award":["61363018"],"award-info":[{"award-number":["61363018"]}],"id":[{"id":"10.13039\/501100001809","id-type":"DOI","asserted-by":"publisher"}]},{"name":"Science and Technique Research Foundation of Ningxia Institutions of Higher Education","award":["NGY2018-54"],"award-info":[{"award-number":["NGY2018-54"]}]}],"content-domain":{"domain":["link.springer.com"],"crossmark-restriction":false},"short-container-title":["BMC Bioinformatics"],"published-print":{"date-parts":[[2020,12]]},"abstract":"<jats:title>Abstract<\/jats:title><jats:sec>\n<jats:title>Background<\/jats:title>\n<jats:p>A number of simulators have been developed for emulating next-generation sequencing data by incorporating known errors such as base substitutions and indels. However, their practicality may be degraded by functional and runtime limitations. 
Particularly, the positional and genomic contextual information is not effectively utilized for reliably characterizing base substitution patterns, and the positional and contextual differences in Phred quality scores are not fully investigated. Thus, a more effective and efficient bioinformatics tool is sorely required.<\/jats:p>\n<\/jats:sec><jats:sec>\n<jats:title>Results<\/jats:title>\n<jats:p>Here, we introduce a novel tool, SimuSCoP, to reliably emulate complex DNA sequencing data. The base substitution patterns and the statistical behavior of quality scores in Illumina sequencing data are fully explored and integrated into the simulation model for reliably emulating datasets for different applications. In addition, an integrated and easy-to-use pipeline is employed in SimuSCoP to facilitate end-to-end simulation of complex samples, and high runtime efficiency is achieved by implementing the tool to run in multithreading with low memory consumption. These features enable SimuSCoP to achieve substantial improvements in reliability, functionality, practicality and runtime efficiency. The tool is comprehensively evaluated in multiple aspects including consistency of profiles, simulation of genomic variations and complex tumor samples, and the results demonstrate the advantages of SimuSCoP over existing tools.<\/jats:p>\n<\/jats:sec><jats:sec>\n<jats:title>Conclusions<\/jats:title>\n<jats:p>SimuSCoP, a new bioinformatics tool, is developed to learn informative profiles from real sequencing data and reliably mimic complex data by introducing various genomic variations. 
We believe that the presented work will catalyse new development of downstream bioinformatics methods for analyzing sequencing data.<\/jats:p>\n<\/jats:sec>","DOI":"10.1186\/s12859-020-03665-5","type":"journal-article","created":{"date-parts":[[2020,7,23]],"date-time":"2020-07-23T12:03:26Z","timestamp":1595505806000},"update-policy":"https:\/\/doi.org\/10.1007\/springer_crossmark_policy","source":"Crossref","is-referenced-by-count":15,"title":["SimuSCoP: reliably simulate Illumina sequencing data based on position and context dependent profiles"],"prefix":"10.1186","volume":"21","author":[{"ORCID":"https:\/\/orcid.org\/0000-0001-6526-6991","authenticated-orcid":false,"given":"Zhenhua","family":"Yu","sequence":"first","affiliation":[],"role":[{"role":"author","vocabulary":"crossref"}]},{"given":"Fang","family":"Du","sequence":"additional","affiliation":[],"role":[{"role":"author","vocabulary":"crossref"}]},{"given":"Rongjun","family":"Ban","sequence":"additional","affiliation":[],"role":[{"role":"author","vocabulary":"crossref"}]},{"given":"Yuanwei","family":"Zhang","sequence":"additional","affiliation":[],"role":[{"role":"author","vocabulary":"crossref"}]}],"member":"297","published-online":{"date-parts":[[2020,7,23]]},"reference":[{"key":"3665_CR1","first-page":"251364","volume":"2012","author":"L Liu","year":"2012","unstructured":"Liu L, Li Y, Li S, Hu N, He Y, Pong R, Lin D, Lu L, Law M. Comparison of next-generation sequencing systems. J Biomed Biotechnol. 2012;2012:251364.","journal-title":"J Biomed Biotechnol"},{"issue":"1","key":"3665_CR2","doi-asserted-by":"crossref","first-page":"154","DOI":"10.1093\/bib\/bbv029","volume":"17","author":"D Laehnemann","year":"2015","unstructured":"Laehnemann D, Borkhardt A, McHardy AC. Denoising DNA deep sequencing data\u2014high-throughput sequencing errors and their correction. Brief Bioinform. 
2015;17(1):154\u201379.","journal-title":"Brief Bioinform"},{"issue":"1","key":"3665_CR3","doi-asserted-by":"crossref","first-page":"56","DOI":"10.1038\/nrg3655","volume":"15","author":"K Robasky","year":"2014","unstructured":"Robasky K, Lewis NE, Church GM. The role of replicates for error mitigation in next-generation sequencing. Nat Rev Genet. 2014;15(1):56.","journal-title":"Nat Rev Genet"},{"issue":"3","key":"3665_CR4","doi-asserted-by":"crossref","first-page":"175","DOI":"10.1101\/gr.8.3.175","volume":"8","author":"B Ewing","year":"1998","unstructured":"Ewing B, Hillier L, Wendl MC, Green P. Base-calling of automated sequencer traces using phred. Genome Res. 1998;8(3):175\u201385.","journal-title":"Genome Res"},{"issue":"1","key":"3665_CR5","doi-asserted-by":"crossref","first-page":"125","DOI":"10.1186\/s12859-016-0976-y","volume":"17","author":"M Schirmer","year":"2016","unstructured":"Schirmer M, D\u2019Amore R, Ijaz UZ, Hall N, Quince C. Illumina error profiles: resolving fine-scale variation in metagenomic sequencing data. BMC Bioinformatics. 2016;17(1):125.","journal-title":"BMC Bioinformatics"},{"issue":"6","key":"3665_CR6","doi-asserted-by":"crossref","first-page":"e37","DOI":"10.1093\/nar\/gku1341","volume":"43","author":"M Schirmer","year":"2015","unstructured":"Schirmer M, Ijaz UZ, D'Amore R, Hall N, Sloan WT, Quince C. Insight into biases and sequencing errors for amplicon sequencing with the Illumina MiSeq platform. Nucleic Acids Res. 2015;43(6):e37.","journal-title":"Nucleic Acids Res"},{"issue":"4","key":"3665_CR7","doi-asserted-by":"crossref","first-page":"593","DOI":"10.1093\/bioinformatics\/btr708","volume":"28","author":"W Huang","year":"2011","unstructured":"Huang W, Li L, Myers JR, Marth GT. ART: a next-generation sequencing read simulator. Bioinformatics. 
2011;28(4):593\u20134.","journal-title":"Bioinformatics"},{"issue":"12","key":"3665_CR8","doi-asserted-by":"crossref","first-page":"e94","DOI":"10.1093\/nar\/gks251","volume":"40","author":"FE Angly","year":"2012","unstructured":"Angly FE, Willner D, Rohwer F, Hugenholtz P, Tyson GW. Grinder: a versatile amplicon and shotgun sequence simulator. Nucleic Acids Res. 2012;40(12):e94.","journal-title":"Nucleic Acids Res"},{"issue":"11","key":"3665_CR9","doi-asserted-by":"crossref","first-page":"1533","DOI":"10.1093\/bioinformatics\/bts187","volume":"28","author":"X Hu","year":"2012","unstructured":"Hu X, Yuan J, Shi Y, Lu J, Liu B, Li Z, Chen Y, Mu D, Zhang H, Li N. pIRS: profile-based Illumina pair-end reads simulator. Bioinformatics. 2012;28(11):1533\u20135.","journal-title":"Bioinformatics"},{"issue":"1","key":"3665_CR10","doi-asserted-by":"crossref","first-page":"1","DOI":"10.1186\/1471-2164-13-74","volume":"13","author":"KE McElroy","year":"2012","unstructured":"McElroy KE, Luciani F, Thomas T. GemSIM: general, error-model based simulator of next-generation sequencing data. BMC Genomics. 2012;13(1):1.","journal-title":"BMC Genomics"},{"issue":"8","key":"3665_CR11","doi-asserted-by":"crossref","first-page":"1076","DOI":"10.1093\/bioinformatics\/btt074","volume":"29","author":"S Kim","year":"2013","unstructured":"Kim S, Jeong K, Bafna V. Wessim: a whole-exome sequencing simulator based on in silico exome capture. Bioinformatics. 2013;29(8):1076\u20137.","journal-title":"Bioinformatics"},{"issue":"10","key":"3665_CR12","doi-asserted-by":"crossref","first-page":"e75448","DOI":"10.1371\/journal.pone.0075448","volume":"8","author":"B Jia","year":"2013","unstructured":"Jia B, Xuan L, Cai K, Hu Z, Ma L, Wei C. NeSSM: a next-generation sequencing simulator for Metagenomics. PLoS One. 
2013;8(10):e75448.","journal-title":"PLoS One"},{"issue":"Suppl 9","key":"3665_CR13","doi-asserted-by":"crossref","first-page":"S14","DOI":"10.1186\/1471-2105-15-S9-S14","volume":"15","author":"S Johnson","year":"2014","unstructured":"Johnson S, Trost B, Long JR, Pittet V, Kusalik A. A better sequence-read simulator program for metagenomics. BMC Bioinformatics. 2014;15(Suppl 9):S14.","journal-title":"BMC Bioinformatics"},{"key":"3665_CR14","doi-asserted-by":"crossref","first-page":"533","DOI":"10.1186\/1756-0500-7-533","volume":"7","author":"A Shcherbina","year":"2014","unstructured":"Shcherbina A. FASTQSim: platform-independent data characterization and in silico read generation for NGS datasets. BMC Res Notes. 2014;7:533.","journal-title":"BMC Res Notes"},{"key":"3665_CR15","doi-asserted-by":"crossref","first-page":"40","DOI":"10.1186\/1471-2105-15-40","volume":"15","author":"S Pattnaik","year":"2014","unstructured":"Pattnaik S, Gupta S, Rao AA, Panda B. SInC: an accurate and fast error-model based simulator for SNPs, Indels and CNVs coupled with a read generator for short-read sequence data. BMC Bioinformatics. 2014;15:40.","journal-title":"BMC Bioinformatics"},{"issue":"1","key":"3665_CR16","doi-asserted-by":"crossref","first-page":"66","DOI":"10.1186\/s12859-015-0502-7","volume":"16","author":"M Qin","year":"2015","unstructured":"Qin M, Liu B, Conroy JM, Morrison CD, Hu Q, Cheng Y, Murakami M, Odunsi AO, Johnson CS, Wei L. SCNVSim: somatic copy number variation and structure variation simulator. BMC Bioinformatics. 2015;16(1):66.","journal-title":"BMC Bioinformatics"},{"issue":"11","key":"3665_CR17","doi-asserted-by":"crossref","first-page":"e0167047","DOI":"10.1371\/journal.pone.0167047","volume":"11","author":"ZD Stephens","year":"2016","unstructured":"Stephens ZD, Hudson ME, Mainzer LS, Taschuk M, Weber MR, Iyer RK. Simulating next-generation sequencing datasets from empirical mutation and sequencing models. PLoS One. 
2016;11(11):e0167047.","journal-title":"PLoS One"},{"issue":"2","key":"3665_CR18","doi-asserted-by":"crossref","first-page":"441","DOI":"10.1109\/TBME.2016.2560939","volume":"64","author":"X Yuan","year":"2017","unstructured":"Yuan X, Zhang J, Yang L. IntSIM: an integrated simulator of next-generation sequencing data. IEEE Trans Biomed Eng. 2017;64(2):441\u201351.","journal-title":"IEEE Trans Biomed Eng"},{"issue":"3","key":"3665_CR19","doi-asserted-by":"crossref","first-page":"53","DOI":"10.1186\/s12859-017-1464-8","volume":"18","author":"Y Xia","year":"2017","unstructured":"Xia Y, Liu Y, Deng M, Xi R. Pysim-sv: a package for simulating structural variation data with GC-biases. BMC Bioinformatics. 2017;18(3):53.","journal-title":"BMC Bioinformatics"},{"issue":"3","key":"3665_CR20","doi-asserted-by":"crossref","first-page":"521","DOI":"10.1093\/bioinformatics\/bty630","volume":"35","author":"H Gourl\u00e9","year":"2019","unstructured":"Gourl\u00e9 H, Karlsson-Lindsj\u00f6 O, Hayer J, Bongcam-Rudloff E. Simulating Illumina metagenomic data with InSilicoSeq. Bioinformatics. 2019;35(3):521\u20132.","journal-title":"Bioinformatics"},{"key":"3665_CR21","doi-asserted-by":"publisher","unstructured":"Silverman BW. Density Estimation for Statistics and Data Analysis. New York: Routledge; 1998, https:\/\/doi.org\/10.1201\/9781315140919.","DOI":"10.1201\/9781315140919"},{"key":"3665_CR22","doi-asserted-by":"crossref","unstructured":"Nakamura K, Oshima T, Morimoto T, Ikeda S, Yoshikawa H, Shiwa Y, Ishikawa S, Linak MC, Hirai A, Takahashi H. Sequence-specific error profile of Illumina sequencers. Nucleic Acids Res. 2011;39(13):e90\u20130.","DOI":"10.1093\/nar\/gkr344"},{"issue":"1","key":"3665_CR23","doi-asserted-by":"crossref","first-page":"219","DOI":"10.1186\/s12859-018-2223-1","volume":"19","author":"M Hadigol","year":"2018","unstructured":"Hadigol M, Khiabanian H. MERIT reveals the impact of genomic context on sequencing error rate in ultra-deep applications. 
BMC Bioinformatics. 2018;19(1):219.","journal-title":"BMC Bioinformatics"},{"issue":"10","key":"3665_CR24","doi-asserted-by":"crossref","first-page":"1995","DOI":"10.1101\/gr.137570.112","volume":"22","author":"G Ha","year":"2012","unstructured":"Ha G, Roth A, Lai D, Bashashati A, Ding J, Goya R, Giuliany R, Rosner J, Oloumi A, Shumansky K, et al. Integrative analysis of genome-wide loss of heterozygosity and monoallelic expression at nucleotide resolution reveals disrupted pathways in triple-negative breast cancer. Genome Res. 2012;22(10):1995\u20132007.","journal-title":"Genome Res"},{"issue":"10","key":"3665_CR25","doi-asserted-by":"crossref","first-page":"e72","DOI":"10.1093\/nar\/gks001","volume":"40","author":"Y Benjamini","year":"2012","unstructured":"Benjamini Y, Speed TP. Summarizing and correcting the GC content bias in high-throughput sequencing. Nucleic Acids Res. 2012;40(10):e72.","journal-title":"Nucleic Acids Res"},{"key":"3665_CR26","doi-asserted-by":"crossref","first-page":"521","DOI":"10.1093\/bioinformatics\/bty630","volume":"35","author":"H Gourle","year":"2018","unstructured":"Gourle H, Karlsson-Lindsjo O, Hayer J, Bongcam-Rudloff E. Simulating Illumina metagenomic data with InSilicoSeq. Bioinformatics. 2018;35:521\u20132.","journal-title":"Bioinformatics"},{"issue":"9","key":"3665_CR27","doi-asserted-by":"crossref","first-page":"1297","DOI":"10.1101\/gr.107524.110","volume":"20","author":"A McKenna","year":"2010","unstructured":"McKenna A, Hanna M, Banks E, Sivachenko A, Cibulskis K, Kernytsky A, Garimella K, Altshuler D, Gabriel S, Daly M, et al. The genome analysis Toolkit: a MapReduce framework for analyzing next-generation DNA sequencing data. Genome Res. 
2010;20(9):1297\u2013303.","journal-title":"Genome Res"},{"issue":"16","key":"3665_CR28","doi-asserted-by":"crossref","first-page":"2078","DOI":"10.1093\/bioinformatics\/btp352","volume":"25","author":"H Li","year":"2009","unstructured":"Li H, Handsaker B, Wysoker A, Fennell T, Ruan J, Homer N, Marth G, Abecasis G, Durbin R. The sequence alignment\/map format and SAMtools. Bioinformatics. 2009;25(16):2078\u20139.","journal-title":"Bioinformatics"},{"issue":"14","key":"3665_CR29","doi-asserted-by":"crossref","first-page":"1754","DOI":"10.1093\/bioinformatics\/btp324","volume":"25","author":"H Li","year":"2009","unstructured":"Li H, Durbin R. Fast and accurate short read alignment with burrows-wheeler transform. Bioinformatics. 2009;25(14):1754\u201360.","journal-title":"Bioinformatics"},{"issue":"1","key":"3665_CR30","doi-asserted-by":"crossref","first-page":"145","DOI":"10.1109\/18.61115","volume":"37","author":"J Lin","year":"1991","unstructured":"Lin J. Divergence measures based on the Shannon entropy. IEEE Trans Inf Theory. 1991;37(1):145\u201351.","journal-title":"IEEE Trans Inf Theory"},{"issue":"1","key":"3665_CR31","doi-asserted-by":"crossref","first-page":"56","DOI":"10.1093\/bib\/bbs015","volume":"14","author":"X Yang","year":"2013","unstructured":"Yang X, Chockalingam SP, Aluru S. A survey of error-correction methods for next-generation sequencing. Brief Bioinform. 2013;14(1):56\u201366.","journal-title":"Brief Bioinform"},{"issue":"18","key":"3665_CR32","doi-asserted-by":"crossref","first-page":"2576","DOI":"10.1093\/bioinformatics\/btu346","volume":"30","author":"Z Yu","year":"2014","unstructured":"Yu Z, Liu Y, Shen Y, Wang M, Li A. CLImAT: accurate detection of copy number alteration and loss of heterozygosity in impure and aneuploid tumor samples using whole-genome sequencing data. Bioinformatics. 
2014;30(18):2576\u201383.","journal-title":"Bioinformatics"},{"issue":"4","key":"3665_CR33","doi-asserted-by":"crossref","first-page":"357","DOI":"10.1038\/nmeth.1923","volume":"9","author":"B Langmead","year":"2012","unstructured":"Langmead B, Salzberg SL. Fast gapped-read alignment with bowtie 2. Nat Methods. 2012;9(4):357\u20139.","journal-title":"Nat Methods"},{"issue":"3","key":"3665_CR34","doi-asserted-by":"crossref","first-page":"423","DOI":"10.1093\/bioinformatics\/btr670","volume":"28","author":"V Boeva","year":"2012","unstructured":"Boeva V, Popova T, Bleakley K, Chiche P, Cappo J, Schleiermacher G, Janoueix-Lerosey I, Delattre O, Barillot E. Control-FREEC: a tool for assessing copy number and allelic content using next-generation sequencing data. Bioinformatics. 2012;28(3):423\u20135.","journal-title":"Bioinformatics"},{"issue":"1","key":"3665_CR35","first-page":"15","volume":"10","author":"Z Yu","year":"2017","unstructured":"Yu Z, Li A, Wang M. CLImAT-HET: detecting subclonal copy number alterations and loss of heterozygosity in heterogeneous tumor samples from whole-genome sequencing data. BMC Med Genet. 
2017;10(1):15.","journal-title":"BMC Med Genet"}],"container-title":["BMC Bioinformatics"],"original-title":[],"language":"en","link":[{"URL":"https:\/\/link.springer.com\/content\/pdf\/10.1186\/s12859-020-03665-5.pdf","content-type":"application\/pdf","content-version":"vor","intended-application":"text-mining"},{"URL":"https:\/\/link.springer.com\/article\/10.1186\/s12859-020-03665-5\/fulltext.html","content-type":"text\/html","content-version":"vor","intended-application":"text-mining"},{"URL":"https:\/\/link.springer.com\/content\/pdf\/10.1186\/s12859-020-03665-5.pdf","content-type":"application\/pdf","content-version":"vor","intended-application":"similarity-checking"}],"deposited":{"date-parts":[[2021,7,22]],"date-time":"2021-07-22T23:09:59Z","timestamp":1626995399000},"score":1,"resource":{"primary":{"URL":"https:\/\/bmcbioinformatics.biomedcentral.com\/articles\/10.1186\/s12859-020-03665-5"}},"subtitle":[],"short-title":[],"issued":{"date-parts":[[2020,7,23]]},"references-count":35,"journal-issue":{"issue":"1","published-print":{"date-parts":[[2020,12]]}},"alternative-id":["3665"],"URL":"https:\/\/doi.org\/10.1186\/s12859-020-03665-5","relation":{},"ISSN":["1471-2105"],"issn-type":[{"value":"1471-2105","type":"electronic"}],"subject":[],"published":{"date-parts":[[2020,7,23]]},"assertion":[{"value":"8 December 2018","order":1,"name":"received","label":"Received","group":{"name":"ArticleHistory","label":"Article History"}},{"value":"16 July 2020","order":2,"name":"accepted","label":"Accepted","group":{"name":"ArticleHistory","label":"Article History"}},{"value":"23 July 2020","order":3,"name":"first_online","label":"First Online","group":{"name":"ArticleHistory","label":"Article History"}},{"value":"Not applicable.","order":1,"name":"Ethics","group":{"name":"EthicsHeading","label":"Ethics approval and consent to participate"}},{"value":"Not applicable.","order":2,"name":"Ethics","group":{"name":"EthicsHeading","label":"Consent for 
publication"}},{"value":"The authors declare that they have no competing interests.","order":3,"name":"Ethics","group":{"name":"EthicsHeading","label":"Competing interests"}}],"article-number":"331"}}