{"status":"ok","message-type":"work","message-version":"1.0.0","message":{"indexed":{"date-parts":[[2026,2,26]],"date-time":"2026-02-26T20:34:00Z","timestamp":1772138040574,"version":"3.50.1"},"reference-count":30,"publisher":"Oxford University Press (OUP)","issue":"4","license":[{"start":{"date-parts":[[2020,9,14]],"date-time":"2020-09-14T00:00:00Z","timestamp":1600041600000},"content-version":"vor","delay-in-days":0,"URL":"http:\/\/creativecommons.org\/licenses\/by\/4.0\/"}],"funder":[{"name":"European Union\u2019s Horizon 2020 research and innovation programme under the Marie Sk\u0142odowska-Curie","award":["872539"],"award-info":[{"award-number":["872539"]}]}],"content-domain":{"domain":[],"crossmark-restriction":false},"short-container-title":[],"published-print":{"date-parts":[[2021,5,1]]},"abstract":"<jats:title>Abstract<\/jats:title>\n                  <jats:sec>\n                    <jats:title>Motivation<\/jats:title>\n                    <jats:p>Recent advances in high-throughput RNA-Seq technologies allow to produce massive datasets. When a study focuses only on a handful of genes, most reads are not relevant and degrade the performance of the tools used to analyze the data. Removing irrelevant reads from the input dataset leads to improved efficiency without compromising the results of the study.<\/jats:p>\n                  <\/jats:sec>\n                  <jats:sec>\n                    <jats:title>Results<\/jats:title>\n                    <jats:p>We introduce a novel computational problem, called gene assignment and we propose an efficient alignment-free approach to solve it. Given an RNA-Seq sample and a panel of genes, a gene assignment consists in extracting from the sample, the reads that most probably were sequenced from those genes. The problem becomes more complicated when the sample exhibits evidence of novel alternative splicing events. We implemented our approach in a tool called Shark and assessed its effectiveness in speeding up differential splicing analysis pipelines. This evaluation shows that Shark is able to significantly improve the performance of RNA-Seq analysis tools without having any impact on the final results.<\/jats:p>\n                  <\/jats:sec>\n                  <jats:sec>\n                    <jats:title>Availability and implementation<\/jats:title>\n                    <jats:p>The tool is distributed as a stand-alone module and the software is freely available at https:\/\/github.com\/AlgoLab\/shark.<\/jats:p>\n                  <\/jats:sec>\n                  <jats:sec>\n                    <jats:title>Supplementary information<\/jats:title>\n                    <jats:p>Supplementary data are available at Bioinformatics online.<\/jats:p>\n                  <\/jats:sec>","DOI":"10.1093\/bioinformatics\/btaa779","type":"journal-article","created":{"date-parts":[[2020,9,2]],"date-time":"2020-09-02T07:27:57Z","timestamp":1599031677000},"page":"464-472","source":"Crossref","is-referenced-by-count":11,"title":["Shark: fishing relevant reads in an RNA-Seq sample"],"prefix":"10.1093","volume":"37","author":[{"ORCID":"https:\/\/orcid.org\/0000-0001-8786-2276","authenticated-orcid":false,"given":"Luca","family":"Denti","sequence":"first","affiliation":[{"name":"Department of Informatics, Systems and Communication, University of Milano-Bicocca , Milano 20126, Italy"}]},{"ORCID":"https:\/\/orcid.org\/0000-0002-8479-7592","authenticated-orcid":false,"given":"Yuri","family":"Pirola","sequence":"additional","affiliation":[{"name":"Department of Informatics, Systems and Communication, University of Milano-Bicocca , Milano 20126, Italy"}]},{"ORCID":"https:\/\/orcid.org\/0000-0003-3040-9539","authenticated-orcid":false,"given":"Marco","family":"Previtali","sequence":"additional","affiliation":[{"name":"Department of Informatics, Systems and Communication, University of Milano-Bicocca , Milano 20126, Italy"}]},{"given":"Tamara","family":"Ceccato","sequence":"additional","affiliation":[{"name":"Department of Informatics, Systems and Communication, University of Milano-Bicocca , Milano 20126, Italy"}]},{"ORCID":"https:\/\/orcid.org\/0000-0001-5584-3089","authenticated-orcid":false,"given":"Gianluca","family":"Della Vedova","sequence":"additional","affiliation":[{"name":"Department of Informatics, Systems and Communication, University of Milano-Bicocca , Milano 20126, Italy"}]},{"given":"Raffaella","family":"Rizzi","sequence":"additional","affiliation":[{"name":"Department of Informatics, Systems and Communication, University of Milano-Bicocca , Milano 20126, Italy"}]},{"given":"Paola","family":"Bonizzoni","sequence":"additional","affiliation":[{"name":"Department of Informatics, Systems and Communication, University of Milano-Bicocca , Milano 20126, Italy"}]}],"member":"286","published-online":{"date-parts":[[2020,9,14]]},"reference":[{"key":"2023051706073095700_btaa779-B1","doi-asserted-by":"crossref","first-page":"i169","DOI":"10.1093\/bioinformatics\/bty292","article-title":"A space and time-efficient index for the compacted colored de Bruijn graph","volume":"34","author":"Almodaresi","year":"2018","journal-title":"Bioinformatics"},{"key":"2023051706073095700_btaa779-B2","first-page":"1","volume-title":"RECOMB","author":"Almodaresi","year":"2019"},{"key":"2023051706073095700_btaa779-B3","first-page":"145","volume-title":"SPIRE","author":"Belazzougui","year":"2016"},{"key":"2023051706073095700_btaa779-B4","doi-asserted-by":"crossref","first-page":"1","DOI":"10.1038\/s41598-018-21770-7","article-title":"Complementarity of assembly-first and mapping-first approaches for alternative splicing annotation and differential analysis from RNAseq data","volume":"8","author":"Benoit-Pilven","year":"2018","journal-title":"Sci. Rep"},{"key":"2023051706073095700_btaa779-B5","first-page":"49","author":"Beretta","year":"2017"},{"key":"2023051706073095700_btaa779-B6","doi-asserted-by":"crossref","first-page":"4760","DOI":"10.1038\/ncomms5760","article-title":"Human Tra2 proteins jointly control a CHEK1 splicing switch among alternative and constitutive target exons","volume":"5","author":"Best","year":"2014","journal-title":"Nat. Commun"},{"key":"2023051706073095700_btaa779-B7","doi-asserted-by":"crossref","first-page":"422","DOI":"10.1145\/362686.362692","article-title":"Space\/time trade-offs in hash coding with allowable errors","volume":"13","author":"Bloom","year":"1970","journal-title":"Commun. ACM"},{"key":"2023051706073095700_btaa779-B8","doi-asserted-by":"crossref","first-page":"525","DOI":"10.1038\/nbt.3519","article-title":"Near-optimal probabilistic RNA-seq quantification","volume":"34","author":"Bray","year":"2016","journal-title":"Nat. Biotechnol"},{"key":"2023051706073095700_btaa779-B9","doi-asserted-by":"crossref","first-page":"13","DOI":"10.1186\/s13059-016-0881-8","article-title":"A survey of best practices for RNA-seq data analysis","volume":"17","author":"Conesa","year":"2016","journal-title":"Genome Biol"},{"key":"2023051706073095700_btaa779-B10","doi-asserted-by":"crossref","first-page":"D745","DOI":"10.1093\/nar\/gky1113","article-title":"Ensembl","volume":"47","author":"Cunningham","year":"2019","journal-title":"Nucleic Acids Res"},{"key":"2023051706073095700_btaa779-B11","doi-asserted-by":"crossref","first-page":"444","DOI":"10.1186\/s12859-018-2436-3","article-title":"ASGAL: aligning RNA-Seq data to a splicing graph to detect novel alternative splicing events","volume":"19","author":"Denti","year":"2018","journal-title":"BMC Bioinformatics"},{"key":"2023051706073095700_btaa779-B12","doi-asserted-by":"crossref","first-page":"20","DOI":"10.1016\/j.isci.2019.07.011","article-title":"MALVA: genotyping by mapping-free allele detection of known variants","volume":"18","author":"Denti","year":"2019","journal-title":"iScience"},{"key":"2023051706073095700_btaa779-B13","doi-asserted-by":"crossref","first-page":"15","DOI":"10.1093\/bioinformatics\/bts635","article-title":"STAR: ultrafast universal RNA-seq aligner","volume":"29","author":"Dobin","year":"2013","journal-title":"Bioinformatics"},{"key":"2023051706073095700_btaa779-B14","first-page":"326","volume-title":"SEA","author":"Gog","year":"2014"},{"key":"2023051706073095700_btaa779-B15","doi-asserted-by":"crossref","first-page":"10073","DOI":"10.1093\/nar\/gks666","article-title":"Modelling and simulating generic RNA-Seq experiments with the flux simulator","volume":"40","author":"Griebel","year":"2012","journal-title":"Nucleic Acids Res"},{"key":"2023051706073095700_btaa779-B16","doi-asserted-by":"crossref","first-page":"475","DOI":"10.1038\/s41592-018-0046-7","article-title":"Bioconda: sustainable and comprehensive software distribution for the life sciences","volume":"15","author":"Gr\u00fcning","year":"2018","journal-title":"Nat. Methods"},{"key":"2023051706073095700_btaa779-B17","doi-asserted-by":"crossref","first-page":"1494","DOI":"10.1038\/nprot.2013.084","article-title":"De novo transcript sequence reconstruction from RNA-Seq: reference generation and analysis with trinity","volume":"8","author":"Haas","year":"2013","journal-title":"Nat. Protoc"},{"key":"2023051706073095700_btaa779-B18","doi-asserted-by":"crossref","first-page":"1840","DOI":"10.1093\/bioinformatics\/btw076","article-title":"SplAdder: identification, quantification and testing of alternative splicing events from RNA-Seq data","volume":"32","author":"Kahles","year":"2016","journal-title":"Bioinformatics"},{"key":"2023051706073095700_btaa779-B19","doi-asserted-by":"crossref","first-page":"211","DOI":"10.1016\/j.ccell.2018.07.001","article-title":"Comprehensive analysis of alternative splicing across tumors from 8,705 patients","volume":"34","author":"Kahles","year":"2018","journal-title":"Cancer Cell"},{"key":"2023051706073095700_btaa779-B20","doi-asserted-by":"crossref","first-page":"2759","DOI":"10.1093\/bioinformatics\/btx304","article-title":"KMC 3: counting and manipulating k-mer statistics","volume":"33","author":"Kokot","year":"2017","journal-title":"Bioinformatics"},{"key":"2023051706073095700_btaa779-B21","doi-asserted-by":"crossref","first-page":"2520","DOI":"10.1093\/bioinformatics\/bts480","article-title":"Snakemake\u2013a scalable bioinformatics workflow engine","volume":"28","author":"K\u00f6ster","year":"2012","journal-title":"Bioinformatics"},{"key":"2023051706073095700_btaa779-B22","doi-asserted-by":"crossref","first-page":"764","DOI":"10.1093\/bioinformatics\/btr011","article-title":"A fast, lock-free approach for efficient parallel counting of occurrences of k-mers","volume":"27","author":"Mar\u00e7ais","year":"2011","journal-title":"Bioinformatics"},{"key":"2023051706073095700_btaa779-B23","doi-asserted-by":"crossref","first-page":"462","DOI":"10.1038\/nbt.2862","article-title":"Sailfish enables alignment-free isoform quantification from RNA-seq reads using lightweight algorithms","volume":"32","author":"Patro","year":"2014","journal-title":"Nat. Biotechnol"},{"key":"2023051706073095700_btaa779-B24","doi-asserted-by":"crossref","first-page":"417","DOI":"10.1038\/nmeth.4197","article-title":"Salmon provides fast and bias-aware quantification of transcript expression","volume":"14","author":"Patro","year":"2017","journal-title":"Nat. Methods"},{"key":"2023051706073095700_btaa779-B25","doi-asserted-by":"crossref","first-page":"S5","DOI":"10.1186\/1471-2105-13-S6-S5","article-title":"KISSPLICE: de-novo calling alternative splicing events from RNA-seq data","volume":"13","author":"Sacomoto","year":"2012","journal-title":"BMC Bioinformatics"},{"key":"2023051706073095700_btaa779-B26","doi-asserted-by":"crossref","first-page":"E5593","DOI":"10.1073\/pnas.1419161111","article-title":"rMATS: robust and flexible detection of differential alternative splicing from replicate RNA-Seq data","volume":"111","author":"Shen","year":"2014","journal-title":"Proc. Natl. Acad. Sci. USA"},{"key":"2023051706073095700_btaa779-B27","doi-asserted-by":"crossref","first-page":"i192","DOI":"10.1093\/bioinformatics\/btw277","article-title":"RapMap: a rapid, sensitive and accurate tool for mapping RNA-seq reads to transcriptomes","volume":"32","author":"Srivastava","year":"2016","journal-title":"Bioinformatics"},{"key":"2023051706073095700_btaa779-B28","doi-asserted-by":"crossref","first-page":"467","DOI":"10.1089\/cmb.2017.0258","article-title":"AllSome sequence bloom trees","volume":"25","author":"Sun","year":"2018","journal-title":"J. Comput. Biol"},{"key":"2023051706073095700_btaa779-B29","doi-asserted-by":"crossref","first-page":"14","DOI":"10.1016\/j.bbadis.2008.09.017","article-title":"Alternative splicing and disease","volume":"1792","author":"Tazi","year":"2009","journal-title":"Biochim. Biophys. Acta"},{"key":"2023051706073095700_btaa779-B30","doi-asserted-by":"crossref","first-page":"40","DOI":"10.1186\/s13059-018-1417-1","article-title":"SUPPA2: fast, accurate, and uncertainty-aware differential splicing analysis across multiple conditions","volume":"19","author":"Trincado","year":"2018","journal-title":"Genome Biol"}],"container-title":["Bioinformatics"],"original-title":[],"language":"en","link":[{"URL":"http:\/\/academic.oup.com\/bioinformatics\/advance-article-pdf\/doi\/10.1093\/bioinformatics\/btaa779\/34774651\/btaa779.pdf","content-type":"application\/pdf","content-version":"am","intended-application":"syndication"},{"URL":"https:\/\/academic.oup.com\/bioinformatics\/article-pdf\/37\/4\/464\/50359813\/btaa779.pdf","content-type":"application\/pdf","content-version":"vor","intended-application":"syndication"},{"URL":"https:\/\/academic.oup.com\/bioinformatics\/article-pdf\/37\/4\/464\/50359813\/btaa779.pdf","content-type":"unspecified","content-version":"vor","intended-application":"similarity-checking"}],"deposited":{"date-parts":[[2023,5,17]],"date-time":"2023-05-17T02:15:06Z","timestamp":1684289706000},"score":1,"resource":{"primary":{"URL":"https:\/\/academic.oup.com\/bioinformatics\/article\/37\/4\/464\/5905477"}},"subtitle":[],"editor":[{"given":"Yann","family":"Ponty","sequence":"additional","affiliation":[]}],"short-title":[],"issued":{"date-parts":[[2020,9,14]]},"references-count":30,"journal-issue":{"issue":"4","published-print":{"date-parts":[[2021,5,1]]}},"URL":"https:\/\/doi.org\/10.1093\/bioinformatics\/btaa779","relation":{"has-preprint":[{"id-type":"doi","id":"10.1101\/836130","asserted-by":"object"}]},"ISSN":["1367-4803","1367-4811"],"issn-type":[{"value":"1367-4803","type":"print"},{"value":"1367-4811","type":"electronic"}],"subject":[],"published-other":{"date-parts":[[2021,2,15]]},"published":{"date-parts":[[2020,9,14]]}}}