{"status":"ok","message-type":"work","message-version":"1.0.0","message":{"indexed":{"date-parts":[[2026,3,11]],"date-time":"2026-03-11T23:45:26Z","timestamp":1773272726515,"version":"3.50.1"},"reference-count":38,"publisher":"Oxford University Press (OUP)","issue":"7","content-domain":{"domain":[],"crossmark-restriction":false},"short-container-title":[],"published-print":{"date-parts":[[2005,4,1]]},"abstract":"<jats:title>Abstract<\/jats:title><jats:p>Motivation: It is widely recognized that homology search and ortholog clustering are very useful for analyzing biological sequences. However, recent growth of sequence database size makes homolog detection difficult, and rapid and accurate methods are required.<\/jats:p><jats:p>Results: We present a novel method for fast and accurate homology detection, assuming that the Smith\u2013Waterman (SW) scores between all similar sequence pairs in a target database are computed and stored. In this method, SW alignment is computed only if the upper bound, which is derived from our novel inequality, is higher than the given threshold. In contrast to other methods such as FASTA and BLAST, this method is guaranteed to find all sequences whose scores against the query are higher than the specified threshold. Results of computational experiments suggest that the method is dozens of times faster than SSEARCH if genome sequence data of closely related species are available.<\/jats:p><jats:p>Availability: The programs for fast homolog detection can be downloaded from ftp:\/\/ftp.kuicr.kyoto-u.ac.jp\/itoh\/<\/jats:p><jats:p>Contact: \u00a0itoh@kuicr.kyoto-u.ac.jp<\/jats:p>","DOI":"10.1093\/bioinformatics\/bti076","type":"journal-article","created":{"date-parts":[[2004,10,28]],"date-time":"2004-10-28T00:23:01Z","timestamp":1098922981000},"page":"912-921","source":"Crossref","is-referenced-by-count":16,"title":["Fast and accurate database homology search using upper bounds of local alignment scores"],"prefix":"10.1093","volume":"21","author":[{"given":"Masumi","family":"Itoh","sequence":"first","affiliation":[{"name":"Bioinformatics Center, Institute for Chemical Research, Kyoto University Gokasho, Uji, Kyoto 611-0011, Japan"}]},{"given":"Susumu","family":"Goto","sequence":"additional","affiliation":[{"name":"Bioinformatics Center, Institute for Chemical Research, Kyoto University Gokasho, Uji, Kyoto 611-0011, Japan"}]},{"given":"Tatsuya","family":"Akutsu","sequence":"additional","affiliation":[{"name":"Bioinformatics Center, Institute for Chemical Research, Kyoto University Gokasho, Uji, Kyoto 611-0011, Japan"}]},{"given":"Minoru","family":"Kanehisa","sequence":"additional","affiliation":[{"name":"Bioinformatics Center, Institute for Chemical Research, Kyoto University Gokasho, Uji, Kyoto 611-0011, Japan"}]}],"member":"286","published-online":{"date-parts":[[2004,10,27]]},"reference":[{"key":"2023013107281490200_B1","doi-asserted-by":"crossref","unstructured":"Altschul, S.F., Gish, W., Miller, W., Myers, E., Lipman, D.J. 1990Basic local alignment search tool. J. Mol. Biol.215403\u2013410","DOI":"10.1016\/S0022-2836(05)80360-2"},{"key":"2023013107281490200_B2","doi-asserted-by":"crossref","unstructured":"Altschul, S.F., Madden, T.L., Schaffer, A.A., Zhang, J., Zhang, Z., Miller, W., Lipman, D.J. 1997Gapped BLAST and PSI-BLAST: a new generation of protein database search programs. Nucleic Acids Res.253389\u20133402","DOI":"10.1093\/nar\/25.17.3389"},{"key":"2023013107281490200_B3","unstructured":"Bateman, A., Coin, L., Durbin, R., Finn, R.D., Hollich, V., Griffiths-Jones, S., Khanna, A., Marshall, M., Moxon, S., Sonnhammer, E.L., et al. 2004The Pfam protein families database. Nucleic Acids Res.32D138\u2013D141"},{"key":"2023013107281490200_B4","doi-asserted-by":"crossref","unstructured":"Brenner, S.E., Chothia, C., Hubbard, T.J.P. 1998Assessing sequence comparison methods with reliable structurally identified distant evolutionary relationships. Proc. Natl Acad. Sci.956073\u20136078","DOI":"10.1073\/pnas.95.11.6073"},{"key":"2023013107281490200_B5","doi-asserted-by":"crossref","unstructured":"Chandonia, J.M., Hon, G., Walker, N.S., Lo Conte, L., Koehl, P., Levitt, M., Brenner, S.E. 2004The ASTRAL compendium in 2004. Nucleic Acids Res.32D189\u2013D192","DOI":"10.1093\/nar\/gkh034"},{"key":"2023013107281490200_B6","doi-asserted-by":"crossref","unstructured":"Davidson, A. 2001A fast pruning algorithm for optimal sequence alignment. Proc. 2nd IEEE Int. Symp. Bioinformatics Bioeng.49\u201356","DOI":"10.1109\/BIBE.2001.974411"},{"key":"2023013107281490200_B7","unstructured":"Dayhoff, M.O., Schwartz, R.M., Orcutt, B.C. 1978A model of evolutionary change in proteins. In Dayhoff, M.O. (Ed.). Atlas of Protein Sequence and Structure , Washington, DC National Biomedical Research Foundation Vol. 5Suppl. 3,, pp. 345\u2013352"},{"key":"2023013107281490200_B8","doi-asserted-by":"crossref","unstructured":"Delcher, A.L., Kasif, S., Fleischmann, R.D., Peterson, J., White, O., Salzverg, S.L. 1999Alignment of whole genomes. Nucleic Acids Res.272369\u20132376","DOI":"10.1093\/nar\/27.11.2369"},{"key":"2023013107281490200_B9","unstructured":"Dubey, A., Hwang, S., Rangel, C. 2004Clustering protein sequence and structure space with infinite Gaussian mixture models. Proc. Pac. Symp. Biocomp.9399\u2013410"},{"key":"2023013107281490200_B10","unstructured":"Eddy, S.R. 1998Profile hidden Markov models. Bioinformatics14755\u2013763"},{"key":"2023013107281490200_B11","doi-asserted-by":"crossref","unstructured":"Henikoff, S. and Henikoff, J. 1992Amino acid substitution matrices from protein blocks. Proc. Natl Acad. Sci.8910915\u201310919","DOI":"10.1073\/pnas.89.22.10915"},{"key":"2023013107281490200_B12","unstructured":"Itoh, M., Akutsu, T., Kanehisa, M. 2004Clustering of database sequences for fast homology search using upper bounds on alignment score. Genome Informatics1593\u2013104"},{"key":"2023013107281490200_B13","doi-asserted-by":"crossref","unstructured":"Kanehisa, M., Goto, S., Kawashima, S., Nakaya, A. 2002The KEGG databases at GenomeNet. Nucleic Acids Res.3042\u201346","DOI":"10.1093\/nar\/30.1.42"},{"key":"2023013107281490200_B14","doi-asserted-by":"crossref","unstructured":"Kanehisa, M., Goto, S., Kawashima, S., Okuno, Y., Hattori, M. 2004The KEGG resource for deciphering the genome. Nucleic Acids Res.32D277\u2013D280","DOI":"10.1093\/nar\/gkh063"},{"key":"2023013107281490200_B15","doi-asserted-by":"crossref","unstructured":"Karplus, K., Barrett, C., Hughey, R. 1998Hidden Markov models for detecting remote protein homologies. Bioinformatics14846\u2013856","DOI":"10.1093\/bioinformatics\/14.10.846"},{"key":"2023013107281490200_B16","doi-asserted-by":"crossref","unstructured":"Kriventseva, E.V., Fleischmann, W., Zdobnov, E.M., Apweiler, R. 2001CluSTr: a database of clusters of SWISS-PROT + TrEMBL proteins. Nucleic Acids Res.2933\u201336","DOI":"10.1093\/nar\/29.1.33"},{"key":"2023013107281490200_B17","doi-asserted-by":"crossref","unstructured":"Kriventseva, E.V., Servant, F., Apweiler, R. 2003Improvements to CluSTr: the database of SWISS-PROT + TrEMBL protein clusters. Nucleic Acids Res.31388\u2013389","DOI":"10.1093\/nar\/gkg035"},{"key":"2023013107281490200_B18","doi-asserted-by":"crossref","unstructured":"Li, W., Jaroszewski, L., Godzik, A. 2001Clustering of highly homologous sequences to reduce the size of large protein databases. Bioinformatics17282\u2013283","DOI":"10.1093\/bioinformatics\/17.3.282"},{"key":"2023013107281490200_B19","doi-asserted-by":"crossref","unstructured":"Li, W., Jaroszewski, L., Godzik, A. 2002Sequence clustering strategies improver remote homology recognitions while reducing search time. Protein Eng.15643\u2013649","DOI":"10.1093\/protein\/15.8.643"},{"key":"2023013107281490200_B20","unstructured":"Lipman, D.J. and Pearson, W.R. 1985Rapid and sensitive protein similarity searches. Science2271435\u20131441"},{"key":"2023013107281490200_B21","unstructured":"Ma, B., Tromp, J., Li, M. 2002PatternHunter: faster and more sensitive homology search. Bioinformatics18440\u2013445"},{"key":"2023013107281490200_B22","unstructured":"Matsuda, H., Ishida, T., Hashimoto, A. 1999Classifying molecular sequences using a linkage graph with their pairwise similarities. Theor. Comput. Sci.210305\u2013325"},{"key":"2023013107281490200_B23","unstructured":"Mount, D.W. Bioinformatics: Sequence and Genome Analysis2001, Cold Spring Harbor, NY Cold Spring Harbor Laboratory Press"},{"key":"2023013107281490200_B24","doi-asserted-by":"crossref","unstructured":"Muthukrishnan, S. and Sahinalp, S.C. 2000Approximate nearest neighbors and sequence comparison with block operations. Proc. 32nd Ann. ACM Symp. Theory Comput. , pp. 416\u2013424","DOI":"10.1145\/335305.335353"},{"key":"2023013107281490200_B25","unstructured":"Myers, E.W. 1986An O(ND) difference algorithm and its variations. Algorithmica1251\u2013266"},{"key":"2023013107281490200_B26","unstructured":"Myers, E.W. 1994A sublinear algorithm for approximate keyword searching. Algorithmica12345\u2013374"},{"key":"2023013107281490200_B27","doi-asserted-by":"crossref","unstructured":"Ogasawara, J. and Morishita, S. 2003A fast and sensitive algorithm for aligning ESTs to the human genome. J. Bioinfo. Comput. Biol.1363\u2013386","DOI":"10.1142\/S0219720003000058"},{"key":"2023013107281490200_B28","doi-asserted-by":"crossref","unstructured":"Rognes, T. and Seeberg, E. 2000Six-fold speed-up of Smith\u2013Waterman sequence database searches using parallel processing on common microprocessors. Bioinformatics16699\u2013706","DOI":"10.1093\/bioinformatics\/16.8.699"},{"key":"2023013107281490200_B29","doi-asserted-by":"crossref","unstructured":"Rognes, T. and Seeberg, E. 1998SALSA: improved protein database searching by a new algorithm for assembly of sequence fragments into gapped alignments. Bioinformatics14839\u2013845","DOI":"10.1093\/bioinformatics\/14.10.839"},{"key":"2023013107281490200_B30","unstructured":"Smith, T.F. and Waterman, M.S. 1981Identification of common molecular subsequences. J. Mol. Biol.147195\u2013197"},{"key":"2023013107281490200_B31","doi-asserted-by":"crossref","unstructured":"Spiro, P.A. and Macura, N. 2004A Local alignment metric for accelerating biosequence database search. J. Comp. Biol.1161\u201382","DOI":"10.1089\/106652704773416894"},{"key":"2023013107281490200_B32","unstructured":"Stojmirovi\u0107, A. 2003Quasi-metric spaces with measure. arXiv e-print:math. GN\/0311070"},{"key":"2023013107281490200_B33","unstructured":"Stojmirovi\u0107, A. and Pestov, V. 2003Index schemes for similarity search in datasets of short protein fragments. arXvi e-print:cs.DS\/0309005"},{"key":"2023013107281490200_B34","unstructured":"Suharnan, S., Itou, T., Matsuda, H., Mori, H. 1997Clustering molecular sequences with their components. Genome Informatics8125\u2013134"},{"key":"2023013107281490200_B35","doi-asserted-by":"crossref","unstructured":"Tatusov, R.L., Koonin, E.V., Lipman, D.J. 1997A genomic perspective on protein families. Science278631\u2013637","DOI":"10.1126\/science.278.5338.631"},{"key":"2023013107281490200_B36","doi-asserted-by":"crossref","unstructured":"Tatusov, R.L., Natale, D.A., Garkavtsev, I.V., Tatusova, T.A., Shankavaram, U.T., Rao, B.S., Kiryutin, B., Galperin, M.Y., Fedorova, N.D., Koonin, E.V. 2001The COG database: new developments in phylogenetic classification of proteins from complete genomes. Nucleic Acids Res.2922\u201328","DOI":"10.1093\/nar\/29.1.22"},{"key":"2023013107281490200_B37","unstructured":"Waterman, M.S., Smith, T.F., Beyer, W.A. 1976Some biological sequence metrics. Adv. Math.20367\u2013387"},{"key":"2023013107281490200_B38","unstructured":"Zhang, Z., Schwartz, S., Wagner, L., Miller, W. 2000A greedy algorithm for aligning DNA sequences. J. Comp. Biol.7203\u2013214"}],"container-title":["Bioinformatics"],"original-title":[],"language":"en","link":[{"URL":"https:\/\/academic.oup.com\/bioinformatics\/article-pdf\/21\/7\/912\/48966561\/bioinformatics_21_7_912.pdf","content-type":"application\/pdf","content-version":"vor","intended-application":"syndication"},{"URL":"https:\/\/academic.oup.com\/bioinformatics\/article-pdf\/21\/7\/912\/48966561\/bioinformatics_21_7_912.pdf","content-type":"unspecified","content-version":"vor","intended-application":"similarity-checking"}],"deposited":{"date-parts":[[2024,1,14]],"date-time":"2024-01-14T19:36:59Z","timestamp":1705261019000},"score":1,"resource":{"primary":{"URL":"https:\/\/academic.oup.com\/bioinformatics\/article\/21\/7\/912\/268799"}},"subtitle":[],"short-title":[],"issued":{"date-parts":[[2004,10,27]]},"references-count":38,"journal-issue":{"issue":"7","published-print":{"date-parts":[[2005,4,1]]}},"URL":"https:\/\/doi.org\/10.1093\/bioinformatics\/bti076","relation":{},"ISSN":["1367-4811","1367-4803"],"issn-type":[{"value":"1367-4811","type":"electronic"},{"value":"1367-4803","type":"print"}],"subject":[],"published-other":{"date-parts":[[2005,4,1]]},"published":{"date-parts":[[2004,10,27]]}}}