{"status":"ok","message-type":"work","message-version":"1.0.0","message":{"indexed":{"date-parts":[[2026,4,13]],"date-time":"2026-04-13T11:23:06Z","timestamp":1776079386762,"version":"3.50.1"},"reference-count":16,"publisher":"Oxford University Press (OUP)","issue":"23","content-domain":{"domain":[],"crossmark-restriction":false},"short-container-title":[],"published-print":{"date-parts":[[2015,12,1]]},"abstract":"<jats:title>Abstract<\/jats:title>\n               <jats:p>Motivation: The next-generation sequencing era requires reliable, fast and efficient approaches for the accurate annotation of the ever-increasing number of biological sequences and their variations. Transfer of annotation upon similarity search is a standard approach. The procedure of all-against-all protein comparison is a preliminary step of different available methods that annotate sequences based on information already present in databases. Given the actual volume of sequences, methods are necessary to pre-process data to reduce the time of sequence comparison.<\/jats:p>\n               <jats:p>Results: We present an algorithm that optimizes the partition of a large volume of sequences (the whole database) into sets where sequence length values (in residues) are constrained depending on a bounded minimal and expected alignment coverage. The idea is to optimally group protein sequences according to their length, and then computing the all-against-all sequence alignments among sequences that fall in a selected length range. 
We describe a mathematically optimal solution and we show that our method leads to a 5-fold speed-up in real world cases.<\/jats:p>\n               <jats:p>Availability and implementation: The software is available for downloading at http:\/\/www.biocomp.unibo.it\/~giuseppe\/partitioning.html.<\/jats:p>\n               <jats:p>Contact: giuseppe.profiti2@unibo.it<\/jats:p>\n               <jats:p>Supplementary information: Supplementary data are available at Bioinformatics online.<\/jats:p>","DOI":"10.1093\/bioinformatics\/btv451","type":"journal-article","created":{"date-parts":[[2015,8,1]],"date-time":"2015-08-01T00:40:08Z","timestamp":1438389608000},"page":"3841-3843","source":"Crossref","is-referenced-by-count":4,"title":["AlignBucket: a tool to speed up \u2018all-against-all\u2019 protein sequence alignments optimizing length constraints"],"prefix":"10.1093","volume":"31","author":[{"given":"Giuseppe","family":"Profiti","sequence":"first","affiliation":[{"name":"1 Department of Computer Science and Engineering, via Mura Anteo Zamboni 7, Bologna,"},{"name":"2 Bologna Biocomputing group, via S. Giacomo 9\/2, Bologna and"},{"name":"3 Health Sciences and Technologies ICIR, via Tolara di Sopra 41\/E, Ozzano dell\u2019Emilia, Italy"}],"role":[{"role":"author","vocabulary":"crossref"}]},{"given":"Piero","family":"Fariselli","sequence":"additional","affiliation":[{"name":"1 Department of Computer Science and Engineering, via Mura Anteo Zamboni 7, Bologna,"},{"name":"2 Bologna Biocomputing group, via S. Giacomo 9\/2, Bologna and"}],"role":[{"role":"author","vocabulary":"crossref"}]},{"given":"Rita","family":"Casadio","sequence":"additional","affiliation":[{"name":"2 Bologna Biocomputing group, via S. 
Giacomo 9\/2, Bologna and"},{"name":"3 Health Sciences and Technologies ICIR, via Tolara di Sopra 41\/E, Ozzano dell\u2019Emilia, Italy"}],"role":[{"role":"author","vocabulary":"crossref"}]}],"member":"286","published-online":{"date-parts":[[2015,7,30]]},"reference":[{"key":"2023020202423598700_btv451-B1","doi-asserted-by":"crossref","first-page":"403","DOI":"10.1016\/S0022-2836(05)80360-2","article-title":"Basic local alignment search tool","volume":"215","author":"Altschul","year":"1990","journal-title":"J. Mol. Biol."},{"key":"2023020202423598700_btv451-B2","doi-asserted-by":"crossref","first-page":"823","DOI":"10.1002\/j.1460-2075.1986.tb04288.x","article-title":"The relation between the divergence of sequence and structure in proteins","volume":"5","author":"Chothia","year":"1986","journal-title":"EMBO J."},{"key":"2023020202423598700_btv451-B3","doi-asserted-by":"crossref","first-page":"429","DOI":"10.1016\/S0168-9525(01)02348-4","article-title":"Intrinsic errors in genome annotation","volume":"17","author":"Devos","year":"2001","journal-title":"Trends Genet."},{"key":"2023020202423598700_btv451-B4","doi-asserted-by":"crossref","first-page":"1632","DOI":"10.1101\/gr.183801","article-title":"Annotation transfer for genomics: measuring functional divergence in multi-domain proteins","volume":"11","author":"Hegyi","year":"2001","journal-title":"Genome Res."},{"key":"2023020202423598700_btv451-B5","doi-asserted-by":"crossref","first-page":"116","DOI":"10.1186\/1471-2105-12-116","article-title":"Ultra-fast sequence clustering from similarity networks with silix","volume":"12","author":"Miele","year":"2011","journal-title":"BMC Bioinformatics"},{"key":"2023020202423598700_btv451-B6","doi-asserted-by":"crossref","first-page":"443","DOI":"10.1016\/0022-2836(70)90057-4","article-title":"A general method applicable to the search for similarities in the amino acid sequence of two proteins","volume":"48","author":"Needleman","year":"1970","journal-title":"J. Mol. 
Biol."},{"key":"2023020202423598700_btv451-B7","doi-asserted-by":"crossref","first-page":"W197","DOI":"10.1093\/nar\/gkr292","article-title":"Bar-plus: the bologna annotation resource plus for functional and structural annotation of protein sequences","volume":"39","author":"Piovesan","year":"2011","journal-title":"Nucleic Acids Res."},{"key":"2023020202423598700_btv451-B8","doi-asserted-by":"crossref","first-page":"S4","DOI":"10.1186\/1471-2105-14-S3-S4","article-title":"How to inherit statistically validated annotation within BAR+ protein clusters","volume":"14","author":"Piovesan","year":"2013","journal-title":"BMC Bioinformatics"},{"key":"2023020202423598700_btv451-B9","doi-asserted-by":"crossref","first-page":"595","DOI":"10.1016\/S0022-2836(02)00016-5","article-title":"Enzyme function less conserved than anticipated","volume":"318","author":"Rost","year":"2002","journal-title":"J. Mol. Biol."},{"key":"2023020202423598700_btv451-B10","doi-asserted-by":"crossref","first-page":"518","DOI":"10.1186\/1471-2105-9-518","article-title":"Algorithm of oma for large-scale orthology inference","volume":"9","author":"Roth","year":"2008","journal-title":"BMC Bioinformatics"},{"key":"2023020202423598700_btv451-B11","doi-asserted-by":"crossref","first-page":"2993","DOI":"10.1093\/bioinformatics\/btu492","article-title":"Big data and other challenges in the quest for orthologs","volume":"30","author":"Sonnhammer","year":"2014","journal-title":"Bioinformatics"},{"key":"2023020202423598700_btv451-B12","doi-asserted-by":"crossref","first-page":"863","DOI":"10.1016\/j.jmb.2003.08.057","article-title":"How well is enzyme function conserved as a function of pairwise sequence identity?","volume":"333","author":"Tian","year":"2003","journal-title":"J. Mol. 
Biol."},{"key":"2023020202423598700_btv451-B13","doi-asserted-by":"crossref","first-page":"D214","DOI":"10.1093\/nar\/gkq1020","article-title":"Ongoing and future developments at the universal protein resource","volume":"39","author":"UniProt","year":"2011","journal-title":"Nucleic Acids Res."},{"key":"2023020202423598700_btv451-B14","doi-asserted-by":"crossref","first-page":"327","DOI":"10.1101\/gr.073585.107","article-title":"Ensemblcompara genetrees: complete, duplication-aware phylogenetic trees in vertebrates","volume":"19","author":"Vilella","year":"2009","journal-title":"Genome Res."},{"key":"2023020202423598700_btv451-B15","doi-asserted-by":"crossref","first-page":"D358","DOI":"10.1093\/nar\/gks1116","article-title":"Orthodb: a hierarchical catalog of animal, fungal and bacterial orthologs","volume":"41","author":"Waterhouse","year":"2013","journal-title":"Nucleic Acids Res."},{"key":"2023020202423598700_btv451-B16","doi-asserted-by":"crossref","first-page":"e607","DOI":"10.7717\/peerj.607","article-title":"Speeding up all-against-all protein comparisons while maintaining sensitivity by considering subsequence-level 
homology","volume":"2","author":"Wittwer","year":"2014","journal-title":"PeerJ"}],"container-title":["Bioinformatics"],"original-title":[],"language":"en","link":[{"URL":"https:\/\/academic.oup.com\/bioinformatics\/article-pdf\/31\/23\/3841\/49035722\/bioinformatics_31_23_3841.pdf","content-type":"application\/pdf","content-version":"vor","intended-application":"syndication"},{"URL":"https:\/\/academic.oup.com\/bioinformatics\/article-pdf\/31\/23\/3841\/49035722\/bioinformatics_31_23_3841.pdf","content-type":"unspecified","content-version":"vor","intended-application":"similarity-checking"}],"deposited":{"date-parts":[[2023,2,2]],"date-time":"2023-02-02T03:57:31Z","timestamp":1675310251000},"score":1,"resource":{"primary":{"URL":"https:\/\/academic.oup.com\/bioinformatics\/article\/31\/23\/3841\/208710"}},"subtitle":[],"short-title":[],"issued":{"date-parts":[[2015,7,30]]},"references-count":16,"journal-issue":{"issue":"23","published-print":{"date-parts":[[2015,12,1]]}},"URL":"https:\/\/doi.org\/10.1093\/bioinformatics\/btv451","relation":{},"ISSN":["1367-4811","1367-4803"],"issn-type":[{"value":"1367-4811","type":"electronic"},{"value":"1367-4803","type":"print"}],"subject":[],"published-other":{"date-parts":[[2015,12,1]]},"published":{"date-parts":[[2015,7,30]]}}}