{"status":"ok","message-type":"work","message-version":"1.0.0","message":{"indexed":{"date-parts":[[2024,8,4]],"date-time":"2024-08-04T20:33:54Z","timestamp":1722803634645},"reference-count":58,"publisher":"Walter de Gruyter GmbH","issue":"3","license":[{"start":{"date-parts":[[2017,9,23]],"date-time":"2017-09-23T00:00:00Z","timestamp":1506124800000},"content-version":"unspecified","delay-in-days":0,"URL":"http:\/\/creativecommons.org\/licenses\/by-nc-nd\/3.0"}],"content-domain":{"domain":[],"crossmark-restriction":false},"short-container-title":[],"published-print":{"date-parts":[[2017,9,23]]},"abstract":"<jats:title>Abstract<\/jats:title><jats:p>Detecting sources of bias in transcriptomic data is essential to determine signals of Biological significance. We outline a novel method to detect sequence specific bias in short read Next Generation Sequencing data. This is based on determining intra-exon correlations between specific motifs. This requires a mild assumption that short reads sampled from specific regions from the same exon will be correlated with each other. This has been implemented on Apache Spark and used to analyse two <jats:italic>D. melanogaster<\/jats:italic> eye-antennal disc data sets generated at the same laboratory. The wild type data set in drosophila indicates a variation due to motif GC content that is more significant than that found due to exon GC content. The software is available online and could be applied for cross-experiment transcriptome data analysis in eukaryotes.<\/jats:p>","DOI":"10.1515\/jib-2017-0025","type":"journal-article","created":{"date-parts":[[2017,9,23]],"date-time":"2017-09-23T10:01:08Z","timestamp":1506160868000},"source":"Crossref","is-referenced-by-count":2,"title":["A Novel Method to Detect Bias in Short Read NGS Data"],"prefix":"10.1515","volume":"14","author":[{"given":"Jamie","family":"Alnasir","sequence":"first","affiliation":[{"name":"Department of Computer Science, Royal Holloway, University of London, London TW20 0EX, England, UK"}]},{"given":"Hugh P.","family":"Shanahan","sequence":"additional","affiliation":[{"name":"Department of Computer Science, Royal Holloway, University of London, London TW20 0EX, England, UK"}]}],"member":"374","reference":[{"key":"ref441","first-page":"95","article-title":"Spark: cluster computing with working sets","volume":"10","year":"2010","journal-title":"HotCloud"},{"key":"ref181","doi-asserted-by":"crossref","first-page":"621","DOI":"10.1038\/nmeth.1226","article-title":"Mapping and quantifying mammalian transcriptomes by RNA-seq","volume":"5","year":"2008","journal-title":"Nat Methods"},{"key":"ref111","doi-asserted-by":"crossref","first-page":"480","DOI":"10.1186\/1471-2105-12-480","article-title":"GC-content normalization for RNA-Seq data","volume":"12","year":"2011","journal-title":"BMC Bioinform"},{"key":"ref141","first-page":"259","article-title":"Probes containing runs of guanines provide insights into the biophysics and bioinformatics of Affymetrix GeneChips","volume":"10","year":"2009","journal-title":"Brief Bioinform"},{"key":"ref371","doi-asserted-by":"crossref","first-page":"13","DOI":"10.1186\/s13059-016-0881-8","article-title":"A survey of best practices for RNA-seq data analysis","volume":"17","year":"2016","journal-title":"Genome Biol"},{"key":"ref281","doi-asserted-by":"crossref","first-page":"795","DOI":"10.1109\/TCBB.2014.2366103","article-title":"ResSeq: enhancing short-read sequencing alignment by rescuing error-containing reads","volume":"12","year":"2015","journal-title":"IEEE\/ACM Trans Comput Biol Bioinform"},{"key":"ref541","doi-asserted-by":"crossref","first-page":"1105","DOI":"10.1093\/bioinformatics\/btp120","article-title":"TopHat: discovering splice junctions with RNA-Seq","volume":"25","year":"2009","journal-title":"Bioinformatics"},{"key":"ref241","doi-asserted-by":"crossref","first-page":"74","DOI":"10.1101\/gr.140426.112","article-title":"Comparative motif discovery combined with comparative transcriptomics yields accurate targetome and enhancer predictions","volume":"23","year":"2013","journal-title":"Genome Res"},{"key":"ref311","doi-asserted-by":"crossref","first-page":"23","DOI":"10.1186\/s13742-015-0064-7","article-title":"Investigation into the annotation of protocol sequencing steps in the sequence read archive","volume":"4","year":"2015","journal-title":"GigaScience"},{"key":"ref251","doi-asserted-by":"crossref","first-page":"1105","DOI":"10.1093\/bioinformatics\/btp120","article-title":"TopHat: discovering splice junctions with RNA-Seq","volume":"25","year":"2009","journal-title":"Bioinformatics"},{"key":"ref461","doi-asserted-by":"crossref","first-page":"R22","DOI":"10.1186\/gb-2011-12-3-r22","article-title":"Improving RNA-Seq expression estimates by correcting for fragment bias","volume":"12","year":"2011","journal-title":"Genome Biol"},{"key":"ref381","doi-asserted-by":"crossref","first-page":"38","DOI":"10.1186\/s12859-016-1457-z","article-title":"Empirical assessment of analysis workflows for differential expression analysis of human samples using RNA-Seq","volume":"18","year":"2017","journal-title":"BMC Bioinform"},{"key":"ref91","doi-asserted-by":"crossref","first-page":"38","DOI":"10.1186\/s12859-016-1457-z","article-title":"Empirical assessment of analysis workflows for differential expression analysis of human samples using RNA-Seq","volume":"18","year":"2017","journal-title":"BMC Bioinform"},{"key":"ref291","doi-asserted-by":"crossref","first-page":"31","DOI":"10.1038\/nrg2626","article-title":"Sequencing technologies \u2013 the next generation","volume":"11","year":"2010","journal-title":"Nat Rev Genet"},{"key":"ref191","doi-asserted-by":"crossref","first-page":"281","DOI":"10.1007\/s12064-012-0162-3","article-title":"Measurement of mrna abundance using RNA-seq data: RPKM measure is inconsistent among samples","volume":"131","year":"2012","journal-title":"Theory Biosci"},{"key":"ref161","first-page":"1","article-title":"Big data: astronomical or genomical?","volume":"13","year":"2015","journal-title":"PLoS Biol"},{"key":"ref511","doi-asserted-by":"crossref","first-page":"1","DOI":"10.1007\/978-3-662-48653-5_1","volume-title":"International symposium on distributed computing","volume":"vol. 9363","year":"2015"},{"key":"ref21","doi-asserted-by":"crossref","first-page":"23","DOI":"10.1186\/s13742-015-0064-7","article-title":"Investigation into the annotation of protocol sequencing steps in the sequence read archive","volume":"4","year":"2015","journal-title":"GigaScience"},{"key":"ref261","doi-asserted-by":"crossref","first-page":"462","DOI":"10.1038\/nbt.2862","article-title":"Sailfish enables alignment-free isoform quantification from RNA-seq reads using lightweight algorithms","volume":"32","year":"2014","journal-title":"Nat Biotechnol"},{"key":"ref361","doi-asserted-by":"crossref","first-page":"3","DOI":"10.1186\/1479-7364-8-3","article-title":"A survey of software for genome-wide discovery of differential splicing in RNA-Seq data","volume":"8","year":"2014","journal-title":"Hum Genomics"},{"key":"ref51","doi-asserted-by":"crossref","first-page":"489","DOI":"10.1093\/bib\/bbq077","article-title":"Base-calling for next-generation sequencing platforms","volume":"12","year":"2011","journal-title":"Brief Bioinform"},{"key":"ref521","first-page":"2","year":"2012","journal-title":"Resilient distributed datasets: a fault-tolerant abstraction for in-memory cluster computing"},{"key":"ref351","article-title":"Index switching causes \u201cspreading-of-signal\u201d among multiplexed samples in Illumina HiSeq 4000 DNA sequencing","year":"2017","journal-title":"bioRxiv"},{"key":"ref11","doi-asserted-by":"crossref","first-page":"57","DOI":"10.1038\/nrg2484","article-title":"RNA-Seq: a revolutionary tool for transcriptomics","volume":"10","year":"2009","journal-title":"Nat Rev Genet"},{"key":"ref221","doi-asserted-by":"crossref","first-page":"1","DOI":"10.1007\/978-3-662-48653-5_1","volume-title":"International symposium on distributed computing","volume":"vol. 9363","year":"2015"},{"key":"ref331","doi-asserted-by":"crossref","first-page":"489","DOI":"10.1093\/bib\/bbq077","article-title":"Base-calling for next-generation sequencing platforms","volume":"12","year":"2011","journal-title":"Briefings in Bioinformatics"},{"key":"ref61","article-title":"Index switching causes \u201cspreading-of-signal\u201d among multiplexed samples in Illumina HiSeq 4000 DNA sequencing","year":"2017","journal-title":"bioRxiv"},{"key":"ref81","doi-asserted-by":"crossref","first-page":"13","DOI":"10.1186\/s13059-016-0881-8","article-title":"A survey of best practices for RNA-seq data analysis","volume":"17","year":"2016","journal-title":"Genome Biol"},{"key":"ref341","doi-asserted-by":"crossref","first-page":"489","DOI":"10.1093\/bib\/bbq077","article-title":"Base-calling for next-generation sequencing platforms","volume":"12","year":"2011","journal-title":"Brief Bioinform"},{"key":"ref01","doi-asserted-by":"crossref","first-page":"31","DOI":"10.1038\/nrg2626","article-title":"Sequencing technologies \u2013 the next generation","volume":"11","year":"2010","journal-title":"Nat Rev Genet"},{"key":"ref131","doi-asserted-by":"crossref","first-page":"e131","DOI":"10.1093\/nar\/gkq224","article-title":"Biases in Illumina transcriptome sequencing caused by random hexamer priming","volume":"38","year":"2010","journal-title":"Nucleic Acids Res"},{"key":"ref171","doi-asserted-by":"crossref","first-page":"R22","DOI":"10.1186\/gb-2011-12-3-r22","article-title":"Improving RNA-Seq expression estimates by correcting for fragment bias","volume":"12","year":"2011","journal-title":"Genome Biol"},{"key":"ref321","doi-asserted-by":"crossref","first-page":"2194","DOI":"10.1093\/bioinformatics\/btp383","article-title":"Swift: primary data analysis for the Illumina Solexa sequencing platform","volume":"25","year":"2009","journal-title":"Bioinformatics"},{"key":"ref491","doi-asserted-by":"crossref","first-page":"347","DOI":"10.1186\/s12859-015-0778-7","article-title":"Comparing the normalization methods for the differential analysis of Illumina high-throughput RNA-Seq data","volume":"16","year":"2015","journal-title":"BMC Bioinform"},{"key":"ref481","doi-asserted-by":"crossref","first-page":"281","DOI":"10.1007\/s12064-012-0162-3","article-title":"Measurement of mrna abundance using RNA-seq data: RPKM measure is inconsistent among samples","volume":"131","year":"2012","journal-title":"Theory Biosci"},{"key":"ref551","doi-asserted-by":"crossref","first-page":"462","DOI":"10.1038\/nbt.2862","article-title":"Sailfish enables alignment-free isoform quantification from RNA-seq reads using lightweight algorithms","volume":"32","year":"2014","journal-title":"Nat Biotechnol"},{"key":"ref71","doi-asserted-by":"crossref","first-page":"3","DOI":"10.1186\/1479-7364-8-3","article-title":"A survey of software for genome-wide discovery of differential splicing in RNA-Seq data","volume":"8","year":"2014","journal-title":"Hum Genomics"},{"key":"ref531","doi-asserted-by":"crossref","first-page":"74","DOI":"10.1101\/gr.140426.112","article-title":"Comparative motif discovery combined with comparative transcriptomics yields accurate targetome and enhancer predictions","volume":"23","year":"2013","journal-title":"Genome Res"},{"key":"ref571","doi-asserted-by":"crossref","first-page":"795","DOI":"10.1109\/TCBB.2014.2366103","article-title":"ResSeq: enhancing short-read sequencing alignment by rescuing error-containing reads","volume":"12","year":"2015","journal-title":"IEEE\/ACM Trans Comput Biol Bioinform"},{"key":"ref471","doi-asserted-by":"crossref","first-page":"621","DOI":"10.1038\/nmeth.1226","article-title":"Mapping and quantifying mammalian transcriptomes by RNA-seq","volume":"5","year":"2008","journal-title":"Nat Methods"},{"key":"ref151","first-page":"95","article-title":"Spark: cluster computing with working sets","volume":"10","year":"2010","journal-title":"HotCloud"},{"key":"ref451","first-page":"1","article-title":"Big data: astronomical or genomical?","volume":"13","year":"2015","journal-title":"PLoS Biol"},{"key":"ref31","doi-asserted-by":"crossref","first-page":"2194","DOI":"10.1093\/bioinformatics\/btp383","article-title":"Swift: primary data analysis for the Illumina Solexa sequencing platform","volume":"25","year":"2009","journal-title":"Bioinformatics"},{"key":"ref101","doi-asserted-by":"crossref","first-page":"315","DOI":"10.1016\/j.ygeno.2010.03.001","article-title":"Assembly algorithms for next-generation sequencing data","volume":"95","year":"2010","journal-title":"Genomics"},{"key":"ref271","doi-asserted-by":"crossref","first-page":"1215","DOI":"10.1111\/j.1541-0420.2011.01605.x","article-title":"BM-Map: Bayesian mapping of multireads for next-generation sequencing data","volume":"67","year":"2011","journal-title":"Biometrics"},{"key":"ref561","doi-asserted-by":"crossref","first-page":"1215","DOI":"10.1111\/j.1541-0420.2011.01605.x","article-title":"BM-Map: Bayesian mapping of multireads for next-generation sequencing data","volume":"67","year":"2011","journal-title":"Biometrics"},{"key":"ref391","doi-asserted-by":"crossref","first-page":"315","DOI":"10.1016\/j.ygeno.2010.03.001","article-title":"Assembly algorithms for next-generation sequencing data","volume":"95","year":"2010","journal-title":"Genomics"},{"key":"ref421","doi-asserted-by":"crossref","first-page":"e131","DOI":"10.1093\/nar\/gkq224","article-title":"Biases in Illumina transcriptome sequencing caused by random hexamer priming","volume":"38","year":"2010","journal-title":"Nucleic Acids Res"},{"key":"ref401","doi-asserted-by":"crossref","first-page":"480","DOI":"10.1186\/1471-2105-12-480","article-title":"GC-content normalization for RNA-Seq data","volume":"12","year":"2011","journal-title":"BMC Bioinform"},{"key":"ref501","doi-asserted-by":"crossref","first-page":"D486","DOI":"10.1093\/nar\/gkl827","article-title":"FlyBase: genomes by the dozen","volume":"35","year":"2007","journal-title":"Nucleic Acids Research"},{"key":"ref431","first-page":"259","article-title":"Probes containing runs of guanines provide insights into the biophysics and bioinformatics of Affymetrix GeneChips","volume":"10","year":"2009","journal-title":"Brief Bioinform"},{"key":"ref121","doi-asserted-by":"crossref","first-page":"290","DOI":"10.1186\/1471-2105-12-290","article-title":"Bias detection and correction in RNA-sequencing data","volume":"12","year":"2011","journal-title":"BMC Bioinform"},{"key":"ref211","doi-asserted-by":"crossref","first-page":"D486","DOI":"10.1093\/nar\/gkl827","article-title":"FlyBase: genomes by the dozen","volume":"35","year":"2007","journal-title":"Nucleic Acids Research"},{"key":"ref301","doi-asserted-by":"crossref","first-page":"57","DOI":"10.1038\/nrg2484","article-title":"RNA-Seq: a revolutionary tool for transcriptomics","volume":"10","year":"2009","journal-title":"Nat Rev Genet"},{"key":"ref231","first-page":"2","year":"2012","journal-title":"Resilient distributed datasets: a fault-tolerant abstraction for in-memory cluster computing"},{"key":"ref201","doi-asserted-by":"crossref","first-page":"347","DOI":"10.1186\/s12859-015-0778-7","article-title":"Comparing the normalization methods for the differential analysis of Illumina high-throughput RNA-Seq data","volume":"16","year":"2015","journal-title":"BMC Bioinform"},{"key":"ref41","doi-asserted-by":"crossref","first-page":"489","DOI":"10.1093\/bib\/bbq077","article-title":"Base-calling for next-generation sequencing platforms","volume":"12","year":"2011","journal-title":"Briefings in Bioinformatics"},{"key":"ref411","doi-asserted-by":"crossref","first-page":"290","DOI":"10.1186\/1471-2105-12-290","article-title":"Bias detection and correction in RNA-sequencing data","volume":"12","year":"2011","journal-title":"BMC Bioinform"}],"container-title":["Journal of Integrative Bioinformatics"],"original-title":[],"link":[{"URL":"https:\/\/www.degruyter.com\/view\/journals\/jib\/14\/3\/article-20170025.xml","content-type":"text\/html","content-version":"vor","intended-application":"text-mining"},{"URL":"https:\/\/www.degruyter.com\/document\/doi\/10.1515\/jib-2017-0025\/pdf","content-type":"unspecified","content-version":"vor","intended-application":"similarity-checking"}],"deposited":{"date-parts":[[2021,4,22]],"date-time":"2021-04-22T01:57:36Z","timestamp":1619056656000},"score":1,"resource":{"primary":{"URL":"https:\/\/www.degruyter.com\/document\/doi\/10.1515\/jib-2017-0025\/html"}},"subtitle":[],"short-title":[],"issued":{"date-parts":[[2017,9,23]]},"references-count":58,"journal-issue":{"issue":"3"},"URL":"https:\/\/doi.org\/10.1515\/jib-2017-0025","relation":{},"ISSN":["1613-4516"],"issn-type":[{"value":"1613-4516","type":"electronic"}],"subject":[],"published":{"date-parts":[[2017,9,23]]}}}