{"status":"ok","message-type":"work","message-version":"1.0.0","message":{"indexed":{"date-parts":[[2025,9,20]],"date-time":"2025-09-20T19:27:18Z","timestamp":1758396438068},"reference-count":25,"publisher":"Oxford University Press (OUP)","issue":"19","content-domain":{"domain":[],"crossmark-restriction":false},"short-container-title":[],"published-print":{"date-parts":[[2009,10,1]]},"abstract":"<jats:title>Abstract<\/jats:title>\n               <jats:p>Background: Low-complexity sequence regions present a common problem in finding true homologs to a protein query sequence. Several solutions to this have been suggested, but a detailed comparison between these on challenging data has so far been lacking. A common benchmark for homology detection procedures is to use SCOP\/ASTRAL domain sequences belonging to the same or different superfamilies, but these contain almost no low complexity sequences.<\/jats:p>\n               <jats:p>Results: We here introduce an alternative benchmarking strategy based around Pfam domains and clans on whole-proteome data sets. This gives a realistic level of low complexity sequences. We used it to evaluate all six built-in BLAST low complexity filter settings as well as a range of settings in the MSPcrunch post-processing filter. The effect on alignment length was also assessed.<\/jats:p>\n               <jats:p>Conclusion: Score matrix adjustment methods provide a low false positive rate at a relatively small loss in sensitivity relative to no filtering, across the range of test conditions we apply. MSPcrunch achieved even less loss in sensitivity, but at a higher false positive rate. A drawback of the score matrix adjustment methods is however that the alignments often become truncated.<\/jats:p>\n               <jats:p>Availability: Perl scripts for MSPcrunch BLAST filtering and for generating the benchmark dataset are available at http:\/\/sonnhammer.sbc.su.se\/download\/software\/MSPcrunch+Blixem\/benchmark.tar.gz<\/jats:p>\n               <jats:p>Contact: \u00a0kristoffer.forslund@sbc.su.se<\/jats:p>\n               <jats:p>Supplementary information: \u00a0Supplementary data are available at Bioinformatics online.<\/jats:p>","DOI":"10.1093\/bioinformatics\/btp446","type":"journal-article","created":{"date-parts":[[2009,7,21]],"date-time":"2009-07-21T00:58:23Z","timestamp":1248137903000},"page":"2500-2505","source":"Crossref","is-referenced-by-count":10,"title":["Benchmarking homology detection procedures with low complexity filters"],"prefix":"10.1093","volume":"25","author":[{"given":"Kristoffer","family":"Forslund","sequence":"first","affiliation":[{"name":"Stockholm Bioinformatics Center, Stockholm University, SE-10691 Stockholm, Sweden"}]},{"given":"Erik LL","family":"Sonnhammer","sequence":"additional","affiliation":[{"name":"Stockholm Bioinformatics Center, Stockholm University, SE-10691 Stockholm, Sweden"}]}],"member":"286","published-online":{"date-parts":[[2009,7,20]]},"reference":[{"key":"2023013112124547500_B1","doi-asserted-by":"crossref","first-page":"403","DOI":"10.1016\/S0022-2836(05)80360-2","article-title":"Basic local alignment search tool","volume":"215","author":"Altschul","year":"1990","journal-title":"J. Mol. Biol."},{"key":"2023013112124547500_B2","doi-asserted-by":"crossref","first-page":"3389","DOI":"10.1093\/nar\/25.17.3389","article-title":"Gapped BLAST and PSI-BLAST: a new generation of protein database search programs","volume":"25","author":"Altschul","year":"1997","journal-title":"Nucleic Acids Res."},{"key":"2023013112124547500_B3","doi-asserted-by":"crossref","first-page":"5101","DOI":"10.1111\/j.1742-4658.2005.04945.x","article-title":"Protein database searches using compositionally adjusted substitution matrices","volume":"272","author":"Altschul","year":"2005","journal-title":"FEBS J."},{"key":"2023013112124547500_B4","doi-asserted-by":"crossref","first-page":"D263","DOI":"10.1093\/nar\/gkm1020","article-title":"InParanoid 6: eukaryotic ortholog clusters with inparalogs","volume":"36","author":"Berglund","year":"2008","journal-title":"Nucleic Acids Res."},{"key":"2023013112124547500_B5","doi-asserted-by":"crossref","first-page":"D189","DOI":"10.1093\/nar\/gkh034","article-title":"The ASTRAL compendium in 2004","volume":"32","author":"Chandonia","year":"2004","journal-title":"Nucleic Acids Res."},{"key":"2023013112124547500_B6","doi-asserted-by":"crossref","DOI":"10.1017\/CBO9780511790492","volume-title":"Biological Sequence Analysis: Probabilistic Models of Proteins and Nucleic Acids.","author":"Durbin","year":"1998"},{"key":"2023013112124547500_B7","doi-asserted-by":"crossref","first-page":"e1000069","DOI":"10.1371\/journal.pcbi.1000069","article-title":"A probabilistic model of local sequence alignment that simplifies statistical significance estimation","volume":"4","author":"Eddy","year":"2008","journal-title":"PLoS Comput. Biol."},{"key":"2023013112124547500_B8","doi-asserted-by":"crossref","first-page":"43","DOI":"10.1038\/nature03481","article-title":"The genome of the social amoeba Dictyostelium discoideum","volume":"435","author":"Eichinger","year":"2005","journal-title":"Nature"},{"key":"2023013112124547500_B9","doi-asserted-by":"crossref","first-page":"D247","DOI":"10.1093\/nar\/gkj149","article-title":"Pfam: clans, web tools and services","volume":"34","author":"Finn","year":"2006","journal-title":"Nucleic Acids Res."},{"key":"2023013112124547500_B10","doi-asserted-by":"crossref","first-page":"D281","DOI":"10.1093\/nar\/gkm960","article-title":"The Pfam protein families database","volume":"36","author":"Finn","year":"2008","journal-title":"Nucleic Acids Res."},{"key":"2023013112124547500_B11","doi-asserted-by":"crossref","first-page":"254","DOI":"10.1093\/molbev\/msm254","article-title":"Domain tree-based analysis of protein architecture evolution","volume":"25","author":"Forslund","year":"2008","journal-title":"Mol. Biol. Evol."},{"key":"2023013112124547500_B12","doi-asserted-by":"crossref","first-page":"1464","DOI":"10.1093\/bioinformatics\/bti204","article-title":"Convergent evolution of domain architectures (is rare)","volume":"21","author":"Gough","year":"2005","journal-title":"Bioinformatics"},{"key":"2023013112124547500_B13","doi-asserted-by":"crossref","first-page":"903","DOI":"10.1006\/jmbi.2001.5080","article-title":"Assignment of homology to genome sequences using a library of hidden Markov models that represent all proteins of known structure","volume":"313","author":"Gough","year":"2001","journal-title":"J. Mol. Biol."},{"key":"2023013112124547500_B14","doi-asserted-by":"crossref","first-page":"1501","DOI":"10.1006\/jmbi.1994.1104","article-title":"Hidden Markov models in computational biology. Applications to protein modeling","volume":"235","author":"Krogh","year":"1994","journal-title":"J. Mol. Biol."},{"key":"2023013112124547500_B15","doi-asserted-by":"crossref","first-page":"4321","DOI":"10.1093\/nar\/gkf544","article-title":"A comparison of profile hidden Markov model procedures for remote homology detection","volume":"30","author":"Madera","year":"2002","journal-title":"Nucleic Acids Res."},{"key":"2023013112124547500_B16","doi-asserted-by":"crossref","first-page":"536","DOI":"10.1016\/S0022-2836(05)80134-2","article-title":"SCOP: a structural classification of proteins database for the investigation of sequences and structures","volume":"247","author":"Murzin","year":"1995","journal-title":"J. Mol. Biol."},{"key":"2023013112124547500_B17","doi-asserted-by":"crossref","first-page":"1041","DOI":"10.1006\/jmbi.2000.5197","article-title":"Automatic clustering of orthologs and in-paralogs from pairwise species comparisons","volume":"314","author":"Remm","year":"2001","journal-title":"J. Mol. Biol."},{"key":"2023013112124547500_B18","doi-asserted-by":"crossref","first-page":"2994","DOI":"10.1093\/nar\/29.14.2994","article-title":"Improving the accuracy of PSI-BLAST protein database searches with composition-based statistics and other refinements","volume":"29","author":"Sch\u00e4ffer","year":"2001","journal-title":"Nucleic Acids Res."},{"key":"2023013112124547500_B19","first-page":"363","article-title":"An expert system for processing sequence homology data","volume":"2","author":"Sonnhammer","year":"1994","journal-title":"ISMB"},{"key":"2023013112124547500_B20","doi-asserted-by":"crossref","first-page":"320","DOI":"10.1093\/nar\/26.1.320","article-title":"Pfam: multiple sequence alignments and HMM-profiles of protein domains","volume":"26","author":"Sonnhammer","year":"1998","journal-title":"Nucleic Acids Res."},{"key":"2023013112124547500_B21","doi-asserted-by":"crossref","first-page":"99","DOI":"10.1186\/1471-2105-6-99","article-title":"Improved profile HMM performance by assessment of critical algorithmic features in SAM and HMMER","volume":"6","author":"Wistrand","year":"2005","journal-title":"BMC Bioinformatics"},{"key":"2023013112124547500_B22","doi-asserted-by":"crossref","first-page":"554","DOI":"10.1016\/S0076-6879(96)66035-2","article-title":"Analysis of compositionally biased regions in sequence databases","volume":"266","author":"Wootton","year":"1996","journal-title":"Methods Enzymol."},{"key":"2023013112124547500_B23","doi-asserted-by":"crossref","first-page":"902","DOI":"10.1093\/bioinformatics\/bti070","article-title":"The construction of amino acid substitution matrices for the comparison of proteins with non-standard compositions","volume":"21","author":"Yu","year":"2005","journal-title":"Bioinformatics"},{"key":"2023013112124547500_B24","doi-asserted-by":"crossref","first-page":"15688","DOI":"10.1073\/pnas.2533904100","article-title":"The compositional adjustment of amino acid substitution matrices","volume":"100","author":"Yu","year":"2003","journal-title":"Proc. Natl. Acad. Sci. U S A"},{"key":"2023013112124547500_B25","doi-asserted-by":"crossref","first-page":"5966","DOI":"10.1093\/nar\/gkl731","article-title":"Retrieval accuracy, statistical significance and compositional similarity in protein sequence database searches","volume":"34","author":"Yu","year":"2006","journal-title":"Nucleic Acids Res."}],"container-title":["Bioinformatics"],"original-title":[],"language":"en","link":[{"URL":"https:\/\/academic.oup.com\/bioinformatics\/article-pdf\/25\/19\/2500\/48995277\/bioinformatics_25_19_2500.pdf","content-type":"application\/pdf","content-version":"vor","intended-application":"syndication"},{"URL":"https:\/\/academic.oup.com\/bioinformatics\/article-pdf\/25\/19\/2500\/48995277\/bioinformatics_25_19_2500.pdf","content-type":"unspecified","content-version":"vor","intended-application":"similarity-checking"}],"deposited":{"date-parts":[[2023,1,31]],"date-time":"2023-01-31T21:38:49Z","timestamp":1675201129000},"score":1,"resource":{"primary":{"URL":"https:\/\/academic.oup.com\/bioinformatics\/article\/25\/19\/2500\/180656"}},"subtitle":[],"short-title":[],"issued":{"date-parts":[[2009,7,20]]},"references-count":25,"journal-issue":{"issue":"19","published-print":{"date-parts":[[2009,10,1]]}},"URL":"https:\/\/doi.org\/10.1093\/bioinformatics\/btp446","relation":{},"ISSN":["1367-4811","1367-4803"],"issn-type":[{"value":"1367-4811","type":"electronic"},{"value":"1367-4803","type":"print"}],"subject":[],"published-other":{"date-parts":[[2009,10,1]]},"published":{"date-parts":[[2009,7,20]]}}}