{"status":"ok","message-type":"work","message-version":"1.0.0","message":{"indexed":{"date-parts":[[2025,9,4]],"date-time":"2025-09-04T13:20:09Z","timestamp":1756992009968,"version":"3.37.3"},"reference-count":36,"publisher":"Oxford University Press (OUP)","issue":"10","license":[{"start":{"date-parts":[[2017,12,29]],"date-time":"2017-12-29T00:00:00Z","timestamp":1514505600000},"content-version":"vor","delay-in-days":0,"URL":"http:\/\/creativecommons.org\/licenses\/by-nc\/4.0\/"}],"funder":[{"DOI":"10.13039\/100000002","name":"National Institutes of Health","doi-asserted-by":"publisher","award":["R01HG007182"],"award-info":[{"award-number":["R01HG007182"]}],"id":[{"id":"10.13039\/100000002","id-type":"DOI","asserted-by":"publisher"}]},{"DOI":"10.13039\/100000002","name":"National Institutes of Health","doi-asserted-by":"publisher","id":[{"id":"10.13039\/100000002","id-type":"DOI","asserted-by":"publisher"}]}],"content-domain":{"domain":[],"crossmark-restriction":false},"short-container-title":[],"published-print":{"date-parts":[[2018,5,15]]},"abstract":"<jats:title>Abstract<\/jats:title><jats:sec><jats:title>Motivation<\/jats:title><jats:p>Sequencing studies on non-model organisms often interrogate both genomes and transcriptomes with massive amounts of short sequences. Such studies require de novo analysis tools and techniques, when the species and closely related species lack high quality reference resources. For certain applications such as de novo annotation, information on putative exons and alternative splicing may be desirable.<\/jats:p><\/jats:sec><jats:sec><jats:title>Results<\/jats:title><jats:p>Here we present ChopStitch, a new method for finding putative exons de novo and constructing splice graphs using an assembled transcriptome and whole genome shotgun sequencing (WGSS) data. ChopStitch identifies exon-exon boundaries in de novo assembled RNA-Seq data with the help of a Bloom filter that represents the k-mer spectrum of WGSS reads. The algorithm also accounts for base substitutions in transcript sequences that may be derived from sequencing or assembly errors, haplotype variations, or putative RNA editing events. The primary output of our tool is a FASTA file containing putative exons. Further, exon edges are interrogated for alternative exon-exon boundaries to detect transcript isoforms, which are represented as splice graphs in DOT output format.<\/jats:p><\/jats:sec><jats:sec><jats:title>Availability and implementation<\/jats:title><jats:p>ChopStitch is written in Python and C++ and is released under the GPL license. It is freely available at https:\/\/github.com\/bcgsc\/ChopStitch.<\/jats:p><\/jats:sec><jats:sec><jats:title>Supplementary information<\/jats:title><jats:p>Supplementary data are available at Bioinformatics online.<\/jats:p><\/jats:sec>","DOI":"10.1093\/bioinformatics\/btx839","type":"journal-article","created":{"date-parts":[[2017,12,28]],"date-time":"2017-12-28T20:10:43Z","timestamp":1514491843000},"page":"1697-1704","source":"Crossref","is-referenced-by-count":4,"title":["ChopStitch: exon annotation and splice graph construction using transcriptome assembly and whole genome sequencing data"],"prefix":"10.1093","volume":"34","author":[{"given":"Hamza","family":"Khan","sequence":"first","affiliation":[{"name":"Canada\u2019s Michael Smith Genome Sciences Centre, British Columbia Cancer Agency, Vancouver, BC, Canada"}],"role":[{"role":"author","vocabulary":"crossref"}]},{"given":"Hamid","family":"Mohamadi","sequence":"additional","affiliation":[{"name":"Canada\u2019s Michael Smith Genome Sciences Centre, British Columbia Cancer Agency, Vancouver, BC, Canada"}],"role":[{"role":"author","vocabulary":"crossref"}]},{"given":"Benjamin P","family":"Vandervalk","sequence":"additional","affiliation":[{"name":"Canada\u2019s Michael Smith Genome Sciences Centre, British Columbia Cancer Agency, Vancouver, BC, Canada"}],"role":[{"role":"author","vocabulary":"crossref"}]},{"ORCID":"https:\/\/orcid.org\/0000-0002-9890-2293","authenticated-orcid":false,"given":"Rene L","family":"Warren","sequence":"additional","affiliation":[{"name":"Canada\u2019s Michael Smith Genome Sciences Centre, British Columbia Cancer Agency, Vancouver, BC, Canada"}],"role":[{"role":"author","vocabulary":"crossref"}]},{"given":"Justin","family":"Chu","sequence":"additional","affiliation":[{"name":"Canada\u2019s Michael Smith Genome Sciences Centre, British Columbia Cancer Agency, Vancouver, BC, Canada"}],"role":[{"role":"author","vocabulary":"crossref"}]},{"ORCID":"https:\/\/orcid.org\/0000-0003-0950-7839","authenticated-orcid":false,"given":"Inanc","family":"Birol","sequence":"additional","affiliation":[{"name":"Canada\u2019s Michael Smith Genome Sciences Centre, British Columbia Cancer Agency, Vancouver, BC, Canada"}],"role":[{"role":"author","vocabulary":"crossref"}]}],"member":"286","published-online":{"date-parts":[[2017,12,29]]},"reference":[{"key":"2023012713415293300_btx839-B100","doi-asserted-by":"crossref","first-page":"361","DOI":"10.1609\/icwsm.v3i1.13937","article-title":"Gephi: an open source software for exploring and manipulating networks","volume":"8","author":"Bastian","year":"2009","journal-title":"Icwsm"},{"key":"2023012713415293300_btx839-B1","doi-asserted-by":"crossref","first-page":"1","DOI":"10.1371\/journal.pone.0130720","article-title":"De novo transcriptome assemblies of rana (Lithobates) catesbeiana and Xenopus laevis tadpole livers for comparative genomics without reference genomes","volume":"10","author":"Birol","year":"2015","journal-title":"PLoS One"},{"key":"2023012713415293300_btx839-B2","doi-asserted-by":"crossref","first-page":"422","DOI":"10.1145\/362686.362692","article-title":"Space\/time trade-offs in hash coding with allowable errors","volume":"13","author":"Bloom","year":"1970","journal-title":"Commun. ACM"},{"key":"2023012713415293300_btx839-B3","doi-asserted-by":"crossref","first-page":"2210.","DOI":"10.1093\/bioinformatics\/btw218","article-title":"rnaQUAST: a quality assessment tool for de novo transcriptome assemblies","volume":"32","author":"Bushmanova","year":"2016","journal-title":"Bioinformatics"},{"key":"2023012713415293300_btx839-B4","doi-asserted-by":"crossref","first-page":"30.","DOI":"10.1186\/s13059-015-0596-2","article-title":"Bridger: a new framework for de novo transcriptome assembly using RNA-seq data","volume":"16","author":"Chang","year":"2015","journal-title":"Genome Biol"},{"key":"2023012713415293300_btx839-B5","doi-asserted-by":"crossref","first-page":"3402.","DOI":"10.1093\/bioinformatics\/btu558","article-title":"BioBloom tools: fast, accurate and memory-efficient host species sequence screening using bloom filters","volume":"30","author":"Chu","year":"2014","journal-title":"Bioinformatics"},{"key":"2023012713415293300_btx839-B6","doi-asserted-by":"crossref","first-page":"3674","DOI":"10.1093\/bioinformatics\/bti610","article-title":"Blast2go: a universal tool for annotation, visualization and analysis in functional genomics research","volume":"21","author":"Conesa","year":"2005","journal-title":"Bioinformatics"},{"key":"2023012713415293300_btx839-B7","doi-asserted-by":"crossref","first-page":"13.","DOI":"10.1186\/s13059-016-0881-8","article-title":"A survey of best practices for RNA-seq data analysis","volume":"17","author":"Conesa","year":"2016","journal-title":"Genome Biol"},{"key":"2023012713415293300_btx839-B8","doi-asserted-by":"crossref","first-page":"151.","DOI":"10.1093\/bfgp\/elr020","article-title":"RNA splicing: disease and therapy","volume":"10","author":"Douglas","year":"2011","journal-title":"Brief. Funct. Genomics"},{"key":"2023012713415293300_btx839-B101","first-page":"127","article-title":"Graphviz and dynagraph-static and dynamic graph drawing tools","author":"Ellson","year":"2004","journal-title":"Graph drawing software"},{"key":"2023012713415293300_btx839-B9","doi-asserted-by":"crossref","first-page":"644","DOI":"10.1038\/nbt.1883","article-title":"Full-length transcriptome assembly from RNA-seq data without a reference genome","volume":"29","author":"Grabherr","year":"2011","journal-title":"Nat. Biotechnol"},{"key":"2023012713415293300_btx839-B10","first-page":"e127","article-title":"Detection and visualization of differential splicing in RNA-Seq data with JunctionSeq","volume":"44","author":"Hartley","year":"2016","journal-title":"Nucleic Acids Res"},{"key":"2023012713415293300_btx839-B11","doi-asserted-by":"crossref","first-page":"768","DOI":"10.1101\/gr.214346.116","article-title":"ABySS 2.0: resource-efficient assembly of large genomes using a Bloom filter","volume":"27","author":"Jackman","year":"2017","journal-title":"Genome Res"},{"key":"2023012713415293300_btx839-B12","doi-asserted-by":"crossref","first-page":"R36.","DOI":"10.1186\/gb-2013-14-4-r36","article-title":"TopHat2: accurate alignment of transcriptomes in the presence of insertions, deletions and gene fusions","volume":"14","author":"Kim","year":"2013","journal-title":"Genome Biol"},{"key":"2023012713415293300_btx839-B13","doi-asserted-by":"crossref","first-page":"357","DOI":"10.1038\/nmeth.3317","article-title":"HISAT: a fast spliced aligner with low memory requirements","volume":"12","author":"Kim","year":"2015","journal-title":"Nat. Methods"},{"key":"2023012713415293300_btx839-B14","doi-asserted-by":"crossref","first-page":"15.","DOI":"10.1371\/journal.pone.0143329","article-title":"LEMONS \u2013 a tool for the identification of splice junctions in transcriptomes of organisms lacking reference genomes","volume":"10","author":"Levin","year":"2015","journal-title":"Plos One"},{"key":"2023012713415293300_btx839-B15","doi-asserted-by":"crossref","first-page":"323.","DOI":"10.1186\/1471-2105-12-323","article-title":"RSEM: accurate transcript quantification from RNA-Seq data with or without a reference genome","volume":"12","author":"Li","year":"2011","journal-title":"BMC Bioinformatics"},{"key":"2023012713415293300_btx839-B16","doi-asserted-by":"crossref","first-page":"e1004772.","DOI":"10.1371\/journal.pcbi.1004772","article-title":"Binpacker: packing-based de novo transcriptome assembly from RNA-seq data","volume":"12","author":"Liu","year":"2016","journal-title":"PLoS Comput. Biol"},{"key":"2023012713415293300_btx839-B17","doi-asserted-by":"crossref","first-page":"333.","DOI":"10.1186\/1471-2105-12-333","article-title":"Efficient counting of k-mers in DNA sequences using a bloom filter","volume":"12","author":"Melsted","year":"2011","journal-title":"BMC Bioinformatics"},{"key":"2023012713415293300_btx839-B18","doi-asserted-by":"crossref","first-page":"1","DOI":"10.1371\/journal.pone.0126409","article-title":"DIDA: Distributed Indexing Dispatched Alignment","volume":"10","author":"Mohamadi","year":"2015","journal-title":"PLoS One"},{"key":"2023012713415293300_btx839-B19","doi-asserted-by":"crossref","first-page":"3492","DOI":"10.1093\/bioinformatics\/btw397","article-title":"ntHash: recursive nucleotide hashing","volume":"32","author":"Mohamadi","year":"2016","journal-title":"Bioinformatics"},{"key":"2023012713415293300_btx839-B20","doi-asserted-by":"crossref","first-page":"1324","DOI":"10.1093\/bioinformatics\/btw832","article-title":"ntCard: a streaming algorithm for cardinality estimation in genomics data","volume":"33","author":"Mohamadi","year":"2017","journal-title":"Bioinformatics"},{"key":"2023012713415293300_btx839-B21","doi-asserted-by":"crossref","first-page":"290","DOI":"10.1038\/nbt.3122","article-title":"StringTie enables improved reconstruction of a transcriptome from RNA-seq reads","volume":"33","author":"Pertea","year":"2015","journal-title":"Nat. Biotechnol"},{"key":"2023012713415293300_btx839-B22","doi-asserted-by":"crossref","first-page":"909","DOI":"10.1038\/nmeth.1517","article-title":"De novo assembly and analysis of RNA-seq data","volume":"7","author":"Robertson","year":"2010","journal-title":"Nat. Methods"},{"key":"2023012713415293300_btx839-B23","doi-asserted-by":"crossref","first-page":"R4.","DOI":"10.1186\/gb-2012-13-1-r4","article-title":"SpliceGrapher: detecting patterns of alternative splicing from RNA-Seq data in the context of gene models and EST data","volume":"13","author":"Rogers","year":"2012","journal-title":"Genome Biol"},{"key":"2023012713415293300_btx839-B24","doi-asserted-by":"crossref","first-page":"S5","DOI":"10.1186\/1471-2105-13-S6-S5","article-title":"K is s plice: de-novo calling alternative splicing events from RNA-seq data","volume":"13","author":"Sacomoto","year":"2012","journal-title":"BMC Bioinformatics"},{"key":"2023012713415293300_btx839-B25","doi-asserted-by":"crossref","first-page":"2.","DOI":"10.1186\/1748-7188-9-2","article-title":"Using cascading Bloom filters to improve the memory usage for de Brujin graphs","volume":"9","author":"Salikhov","year":"2014","journal-title":"Algorithms Mol. Biol"},{"key":"2023012713415293300_btx839-B26","doi-asserted-by":"crossref","first-page":"1086","DOI":"10.1093\/bioinformatics\/bts094","article-title":"Oases: robust de novo rna-seq assembly across the dynamic range of expression levels","volume":"28","author":"Schulz","year":"2012","journal-title":"Bioinformatics"},{"key":"2023012713415293300_btx839-B27","doi-asserted-by":"crossref","first-page":"W435","DOI":"10.1093\/nar\/gkl200","article-title":"Augustus: ab initio prediction of alternative transcripts","volume":"34","author":"Stanke","year":"2006","journal-title":"Nucleic Acids Res"},{"key":"2023012713415293300_btx839-B28","doi-asserted-by":"crossref","first-page":"1177","DOI":"10.1038\/nmeth.2714","article-title":"Assessment of transcript reconstruction methods for rna-seq","volume":"10","author":"Steijger","year":"2013","journal-title":"Nat. Methods"},{"key":"2023012713415293300_btx839-B29","doi-asserted-by":"crossref","first-page":"511","DOI":"10.1038\/nbt.1621","article-title":"Transcript assembly and quantification by RNA-Seq reveals unannotated transcripts and isoform switching during cell differentiation","volume":"28","author":"Trapnell","year":"2010","journal-title":"Nat. Biotechnol"},{"year":"2014","author":"Vandervalk","key":"2023012713415293300_btx839-B30"},{"key":"2023012713415293300_btx839-B31","doi-asserted-by":"crossref","first-page":"S1.","DOI":"10.1186\/1755-8794-8-S3-S1","article-title":"Konnector v2.0: pseudo-long reads from paired-end sequencing data","volume":"8","author":"Vandervalk","year":"2015","journal-title":"BMC Med. Genomics"},{"key":"2023012713415293300_btx839-B32","doi-asserted-by":"crossref","first-page":"57","DOI":"10.1038\/nrg2484","article-title":"RNA-Seq: a revolutionary tool for transcriptomics","volume":"10","author":"Wang","year":"2009","journal-title":"Nat. Rev. Genet"},{"key":"2023012713415293300_btx839-B33","doi-asserted-by":"crossref","first-page":"E4859","DOI":"10.1073\/pnas.1323926111","article-title":"Phylotranscriptomic analysis of the origin and early diversification of land plants","volume":"111","author":"Wickett","year":"2014","journal-title":"Proc. Natl. Acad. Sci. USA"},{"key":"2023012713415293300_btx839-B34","doi-asserted-by":"crossref","first-page":"1660","DOI":"10.1093\/bioinformatics\/btu077","article-title":"Soapdenovo-trans: de novo transcriptome assembly with short RNA-seq reads","volume":"30","author":"Xie","year":"2014","journal-title":"Bioinformatics"}],"container-title":["Bioinformatics"],"original-title":[],"language":"en","link":[{"URL":"https:\/\/academic.oup.com\/bioinformatics\/article-pdf\/34\/10\/1697\/48935592\/bioinformatics_34_10_1697.pdf","content-type":"application\/pdf","content-version":"vor","intended-application":"syndication"},{"URL":"https:\/\/academic.oup.com\/bioinformatics\/article-pdf\/34\/10\/1697\/48935592\/bioinformatics_34_10_1697.pdf","content-type":"unspecified","content-version":"vor","intended-application":"similarity-checking"}],"deposited":{"date-parts":[[2023,8,30]],"date-time":"2023-08-30T08:10:57Z","timestamp":1693383057000},"score":1,"resource":{"primary":{"URL":"https:\/\/academic.oup.com\/bioinformatics\/article\/34\/10\/1697\/4781691"}},"subtitle":[],"editor":[{"given":"Bonnie","family":"Berger","sequence":"additional","affiliation":[],"role":[{"role":"editor","vocabulary":"crossref"}]}],"short-title":[],"issued":{"date-parts":[[2017,12,29]]},"references-count":36,"journal-issue":{"issue":"10","published-print":{"date-parts":[[2018,5,15]]}},"URL":"https:\/\/doi.org\/10.1093\/bioinformatics\/btx839","relation":{},"ISSN":["1367-4803","1367-4811"],"issn-type":[{"type":"print","value":"1367-4803"},{"type":"electronic","value":"1367-4811"}],"subject":[],"published-other":{"date-parts":[[2018,5,15]]},"published":{"date-parts":[[2017,12,29]]}}}