{"status":"ok","message-type":"work","message-version":"1.0.0","message":{"indexed":{"date-parts":[[2025,5,3]],"date-time":"2025-05-03T04:30:43Z","timestamp":1746246643107,"version":"3.37.3"},"reference-count":54,"publisher":"Oxford University Press (OUP)","issue":"7","license":[{"start":{"date-parts":[[2020,9,8]],"date-time":"2020-09-08T00:00:00Z","timestamp":1599523200000},"content-version":"vor","delay-in-days":0,"URL":"https:\/\/academic.oup.com\/journals\/pages\/open_access\/funder_policies\/chorus\/standard_publication_model"}],"funder":[{"DOI":"10.13039\/501100001809","name":"National Natural Science Foundation of China","doi-asserted-by":"publisher","award":["61822306","61672184","61702134"],"award-info":[{"award-number":["61822306","61672184","61702134"]}],"id":[{"id":"10.13039\/501100001809","id-type":"DOI","asserted-by":"publisher"}]},{"DOI":"10.13039\/501100004826","name":"Beijing Natural Science Foundation","doi-asserted-by":"publisher","award":["JQ19019"],"award-info":[{"award-number":["JQ19019"]}],"id":[{"id":"10.13039\/501100004826","id-type":"DOI","asserted-by":"publisher"}]},{"DOI":"10.13039\/501100012166","name":"National Key R&D Program of China","doi-asserted-by":"crossref","award":["2018AAA0100100"],"award-info":[{"award-number":["2018AAA0100100"]}],"id":[{"id":"10.13039\/501100012166","id-type":"DOI","asserted-by":"crossref"}]},{"name":"Guangdong Special Support Program of Technology Young talents","award":["2016TQ03X618"],"award-info":[{"award-number":["2016TQ03X618"]}]}],"content-domain":{"domain":[],"crossmark-restriction":false},"short-container-title":[],"published-print":{"date-parts":[[2021,5,17]]},"abstract":"<jats:title>Abstract<\/jats:title>\n               <jats:sec>\n                  <jats:title>Motivation<\/jats:title>\n                  <jats:p>As one of the most important and widely used mainstream iterative search tool for protein sequence search, an accurate Position-Specific Scoring Matrix (PSSM) is the key of PSI-BLAST. However, PSSMs containing non-homologous information obviously reduce the performance of PSI-BLAST for protein remote homology.<\/jats:p>\n               <\/jats:sec>\n               <jats:sec>\n                  <jats:title>Results<\/jats:title>\n                  <jats:p>To further study this problem, we summarize three types of Incorrectly Selected Homology (ISH) errors in PSSMs. A new search tool Supervised-Manner-based Iterative BLAST (SMI-BLAST) is proposed based on PSI-BLAST for solving these errors. SMI-BLAST obviously outperforms PSI-BLAST on the Structural Classification of Proteins-extended (SCOPe) dataset. Compared with PSI-BLAST on the ISH error subsets of SCOPe dataset, SMI-BLAST detects 1.6\u20132.87 folds more remote homologous sequences, and outperforms PSI-BLAST by 35.66% in terms of ROC1 scores. Furthermore, this framework is applied to JackHMMER, DELTA-BLAST and PSI-BLASTexB, and their performance is further improved.<\/jats:p>\n               <\/jats:sec>\n               <jats:sec>\n                  <jats:title>Availability and implementation<\/jats:title>\n                  <jats:p>User-friendly webservers for SMI-BLAST, JackHMMER, DELTA-BLAST and PSI-BLASTexB are established at http:\/\/bliulab.net\/SMI-BLAST\/, by which the users can easily get the results without the need to go through the mathematical details.<\/jats:p>\n               <\/jats:sec>\n               <jats:sec>\n                  <jats:title>Supplementary information<\/jats:title>\n                  <jats:p>Supplementary data are available at Bioinformatics online.<\/jats:p>\n               <\/jats:sec>","DOI":"10.1093\/bioinformatics\/btaa772","type":"journal-article","created":{"date-parts":[[2020,8,28]],"date-time":"2020-08-28T11:51:37Z","timestamp":1598615497000},"page":"913-920","source":"Crossref","is-referenced-by-count":26,"title":["SMI-BLAST: a novel supervised search framework based on PSI-BLAST for protein remote homology detection"],"prefix":"10.1093","volume":"37","author":[{"given":"Xiaopeng","family":"Jin","sequence":"first","affiliation":[{"name":"School of Computer Science and Technology, Harbin Institute of Technology , Shenzhen, Guangdong 518055, China"}]},{"given":"Qing","family":"Liao","sequence":"additional","affiliation":[{"name":"School of Computer Science and Technology, Harbin Institute of Technology , Shenzhen, Guangdong 518055, China"}]},{"given":"Hang","family":"Wei","sequence":"additional","affiliation":[{"name":"School of Computer Science and Technology, Harbin Institute of Technology , Shenzhen, Guangdong 518055, China"}]},{"given":"Jun","family":"Zhang","sequence":"additional","affiliation":[{"name":"School of Computer Science and Technology, Harbin Institute of Technology , Shenzhen, Guangdong 518055, China"}]},{"given":"Bin","family":"Liu","sequence":"additional","affiliation":[{"name":"School of Computer Science and Technology, Harbin Institute of Technology , Shenzhen, Guangdong 518055, China"},{"name":"School of Computer Science and Technology, Beijing Institute of Technology , Beijing 100081, China"},{"name":"Advanced Research Institute of Multidisciplinary Science, Beijing Institute of Technology , Beijing 100081, China"}]}],"member":"286","published-online":{"date-parts":[[2020,9,8]]},"reference":[{"key":"2023051612174840900_btaa772-B1","doi-asserted-by":"crossref","first-page":"13814","DOI":"10.1073\/pnas.0405612101","article-title":"Comparative homology agreement search: an effective combination of homology-search methods","volume":"101","author":"Alam","year":"2004","journal-title":"Proc. Natl. Acad. Sci. USA"},{"key":"2023051612174840900_btaa772-B2","doi-asserted-by":"crossref","first-page":"3389","DOI":"10.1093\/nar\/25.17.3389","article-title":"Gapped BLAST and PSI-BLAST: a new generation of protein database search programs","volume":"25","author":"Altschul","year":"1997","journal-title":"Nucleic Acids Res"},{"key":"2023051612174840900_btaa772-B3","doi-asserted-by":"crossref","first-page":"403","DOI":"10.1016\/S0022-2836(05)80360-2","article-title":"Basic local alignment search tool","volume":"215","author":"Altschul","year":"1990","journal-title":"J. Mol. Biol"},{"key":"2023051612174840900_btaa772-B4","doi-asserted-by":"crossref","first-page":"1169","DOI":"10.1038\/nmeth.2728","article-title":"Using networks to measure similarity between genes: association index selection","volume":"10","author":"Bass","year":"2013","journal-title":"Nat. Methods"},{"key":"2023051612174840900_btaa772-B5","first-page":"1089","article-title":"No unbiased estimator of the variance of K-fold cross-validation","volume":"5","author":"Bengio","year":"2004","journal-title":"J. Mach. Learn. Res"},{"key":"2023051612174840900_btaa772-B6","doi-asserted-by":"crossref","first-page":"3770","DOI":"10.1073\/pnas.0810767106","article-title":"Sequence context-specific profiles for homology searching","volume":"106","author":"Biegert","year":"2009","journal-title":"Proc. Natl. Acad. Sci. USA"},{"key":"2023051612174840900_btaa772-B7","doi-asserted-by":"crossref","first-page":"12","DOI":"10.1186\/1745-6150-7-12","article-title":"Domain enhanced lookup time accelerated BLAST","volume":"7","author":"Boratyn","year":"2012","journal-title":"Biol. Direct"},{"key":"2023051612174840900_btaa772-B9","doi-asserted-by":"crossref","first-page":"321","DOI":"10.1016\/0734-189X(84)90035-5","article-title":"Distance transformations in arbitrary dimensions","volume":"27","author":"Borgefors","year":"1984","journal-title":"Comput. Graph. Image Process"},{"key":"2023051612174840900_btaa772-B10","first-page":"81","article-title":"From ranknet to lambdarank to lambdamart: an overview","volume":"11","author":"Burges","year":"2010","journal-title":"Learning"},{"key":"2023051612174840900_btaa772-B11","first-page":"89","article-title":"Learning to rank using gradient descent","author":"Burges","year":"2005"},{"key":"2023051612174840900_btaa772-B12","first-page":"193"},{"key":"2023051612174840900_btaa772-B15","doi-asserted-by":"crossref","first-page":"531","DOI":"10.1109\/TCBB.2014.2366112","article-title":"Improving retrieval efficacy of homology searches using the false discovery rate","volume":"12","author":"Carroll","year":"2015","journal-title":"IEEE ACM Trans. Comput. Biol"},{"key":"2023051612174840900_btaa772-B7439507","doi-asserted-by":"crossref","first-page":"D475","DOI":"10.1093\/nar\/gky1134","article-title":"SCOPe: classification of large macromolecular structures in the structural classification of proteins\u2014extended database","author":"Chandonia","year":"2019","journal-title":"Nucleic Acids Research"},{"key":"2023051612174840900_btaa772-B16","doi-asserted-by":"crossref","first-page":"3473","DOI":"10.1093\/bioinformatics\/btx429","article-title":"ProtDec-LTR2.0: an improved method for protein remote homology detection by combining pseudo protein and supervised learning to rank","volume":"33","author":"Chen","year":"2017","journal-title":"Bioinformatics"},{"key":"2023051612174840900_btaa772-B17","doi-asserted-by":"crossref","first-page":"231","DOI":"10.1093\/bib\/bbw108","article-title":"A comprehensive review and comparison of different computational methods for protein remote homology detection","volume":"19","author":"Chen","year":"2018","journal-title":"Brief. Bioinf"},{"key":"2023051612174840900_btaa772-B18","doi-asserted-by":"crossref","first-page":"246","DOI":"10.1002\/prot.1035","article-title":"Prediction of protein cellular attributes using pseudo-amino acid composition","volume":"43","author":"Chou","year":"2001","journal-title":"Proteins Struct. Funct. Genet"},{"key":"2023051612174840900_btaa772-B19","doi-asserted-by":"crossref","first-page":"10","DOI":"10.1093\/bioinformatics\/bth466","article-title":"Using amphiphilic pseudo amino acid composition to predict enzyme subfamily classes","volume":"21","author":"Chou","year":"2005","journal-title":"Bioinformatics"},{"key":"2023051612174840900_btaa772-B21","doi-asserted-by":"crossref","first-page":"227","DOI":"10.1016\/0146-664X(80)90054-4","article-title":"Euclidean distance mapping","volume":"14","author":"Danielsson","year":"1980","journal-title":"Comput. Graph. Image Process"},{"key":"2023051612174840900_btaa772-B22","doi-asserted-by":"crossref","first-page":"2655","DOI":"10.1093\/bioinformatics\/btp500","article-title":"A new taxonomy-based protein fold recognition approach based on autocross-covariance transformation","volume":"25","author":"Dong","year":"2009","journal-title":"Bioinformatics"},{"key":"2023051612174840900_btaa772-B23","first-page":"460","article-title":"On the local optimality of LambdaRank","author":"Donmez","year":"2009","journal-title":"In: Proceedings of\u00a0the\u00a032nd international ACM SIGIR conference on Research and development in information retrieval"},{"key":"2023051612174840900_btaa772-B24","doi-asserted-by":"crossref","first-page":"e1002195","DOI":"10.1371\/journal.pcbi.1002195","article-title":"Accelerated profile HMM searches","volume":"7","author":"Eddy","year":"2011","journal-title":"PLoS Comput. Biol"},{"key":"2023051612174840900_btaa772-B25","doi-asserted-by":"crossref","first-page":"2177","DOI":"10.1093\/nar\/gkp1219","article-title":"Homologous over-extension: a challenge for iterative similarity searches","volume":"38","author":"Gonzalez","year":"2010","journal-title":"Nucleic Acids Res"},{"key":"2023051612174840900_btaa772-B26","doi-asserted-by":"crossref","first-page":"3025","DOI":"10.1093\/nar\/gkn159","article-title":"Using support vector machine combined with auto covariance to predict protein\u2013protein interactions from protein sequences","volume":"36","author":"Guo","year":"2008","journal-title":"Nucleic Acids Res"},{"key":"2023051612174840900_btaa772-B27","doi-asserted-by":"crossref","first-page":"1295","DOI":"10.1093\/bioinformatics\/btx780","article-title":"DeepSF: deep convolutional neural network for mapping protein sequences to folds","volume":"34","author":"Hou","year":"2018","journal-title":"Bioinformatics"},{"key":"2023051612174840900_btaa772-B28","doi-asserted-by":"crossref","first-page":"431","DOI":"10.1186\/1471-2105-11-431","article-title":"Hidden Markov model speed heuristic and iterative HMM search procedure","volume":"11","author":"Johnson","year":"2010","journal-title":"BMC Bioinformatics"},{"key":"2023051612174840900_btaa772-B27150120","first-page":"2611","article-title":"Permutation Arrays Under the Chebyshev Distance","author":"Klove","journal-title":"IEEE Transactions on Information Theory"},{"key":"2023051612174840900_btaa772-B29","doi-asserted-by":"crossref","first-page":"1339","DOI":"10.1093\/bioinformatics\/btn130","article-title":"Simple is beautiful: a straightforward approach to improve the delineation of true and false positives in PSI-BLAST searches","volume":"24","author":"Lee","year":"2008","journal-title":"Bioinformatics"},{"key":"2023051612174840900_btaa772-B30","doi-asserted-by":"crossref","first-page":"59","DOI":"10.1080\/00031305.1988.10475524","article-title":"Thirteen ways to look at the correlation coefficient","volume":"42","author":"Lee Rodgers","year":"1988","journal-title":"Am. Stat"},{"key":"2023051612174840900_btaa772-B31","doi-asserted-by":"crossref","first-page":"1854","DOI":"10.1587\/transinf.E94.D.1854","article-title":"A short introduction to learning to rank","volume":"E94-D","author":"Li","year":"2011","journal-title":"IEICE Trans. Inf. Syst"},{"key":"2023051612174840900_btaa772-B32","doi-asserted-by":"crossref","first-page":"1650","DOI":"10.1093\/bioinformatics\/bts240","article-title":"PSI-Search: iterative HOE-reduced profile SSEARCH searching","volume":"28","author":"Li","year":"2012","journal-title":"Bioinformatics"},{"key":"2023051612174840900_btaa772-B33","doi-asserted-by":"crossref","first-page":"510","DOI":"10.1186\/1471-2105-9-510","article-title":"A discriminative method for protein remote homology detection and fold recognition combining Top-n-grams and latent semantic analysis","volume":"9","author":"Liu","year":"2008","journal-title":"BMC Bioinformatics"},{"key":"2023051612174840900_btaa772-B34","doi-asserted-by":"crossref","first-page":"e46633","DOI":"10.1371\/journal.pone.0046633","article-title":"Using amino acid physicochemical distance transformation for fast protein remote homology detection","volume":"7","author":"Liu","year":"2012","journal-title":"PLoS One"},{"key":"2023051612174840900_btaa772-B0614289","first-page":"S3","article-title":"Using distances between Top-n-gram and residue pairs for protein remote homology detection","author":"Liu","year":"2014","journal-title":"BMC Bioinformatics"},{"key":"2023051612174840900_btaa772-B37","article-title":"iDNA-Prot|dis: identifying DNA-binding proteins by incorporating amino acid distance-pairs and reduced alphabet profile into the general pseudo amino acid composition","volume":"9","author":"Liu","year":"2014","journal-title":"PLoS One"},{"key":"2023051612174840900_btaa772-B38","doi-asserted-by":"crossref","first-page":"3492","DOI":"10.1093\/bioinformatics\/btv413","article-title":"Application of learning to rank to protein remote homology detection","volume":"31","author":"Liu","year":"2015","journal-title":"Bioinformatics"},{"key":"2023051612174840900_btaa772-B39","doi-asserted-by":"crossref","first-page":"W65","DOI":"10.1093\/nar\/gkv458","article-title":"Pse-in-One: a web server for generating various modes of pseudo components of DNA, RNA, and protein sequences","volume":"43","author":"Liu","year":"2015","journal-title":"Nucleic Acids Res"},{"key":"2023051612174840900_btaa772-B41","doi-asserted-by":"crossref","first-page":"33","DOI":"10.1093\/bioinformatics\/btx579","article-title":"iPromoter-2L: a two-layer predictor for identifying promoters and their types by multi-window-based PseKNC","volume":"34","author":"Liu","year":"2018","journal-title":"Bioinformatics"},{"key":"2023051612174840900_btaa772-B42","doi-asserted-by":"crossref","first-page":"e127","DOI":"10.1093\/nar\/gkz740","article-title":"BioSeq-Analysis2.0: an updated platform for analyzing DNA, RNA and protein sequences at sequence level and residue level based on machine learning approaches","volume":"47","author":"Liu","year":"2019","journal-title":"Nucleic Acids Res"},{"key":"2023051612174840900_btaa772-B43","doi-asserted-by":"crossref","first-page":"281","DOI":"10.1093\/nar\/30.1.281","article-title":"CDD: a database of conserved domain alignments with links to domain three-dimensional structure","volume":"30","author":"Marchler-Bauer","year":"2002","journal-title":"Nucleic Acids Res"},{"key":"2023051612174840900_btaa772-B44","doi-asserted-by":"crossref","first-page":"D225","DOI":"10.1093\/nar\/gkq1189","article-title":"CDD: a Conserved Domain Database for the functional annotation of proteins","volume":"39","author":"Marchler-Bauer","year":"2011","journal-title":"Nucleic Acids Res"},{"key":"2023051612174840900_btaa772-B45","doi-asserted-by":"crossref","first-page":"288","DOI":"10.1186\/s12859-017-1686-9","article-title":"Simple adjustment of the sequence weight algorithm remarkably enhances PSI-BLAST performance","volume":"18","author":"Oda","year":"2017","journal-title":"BMC Bioinformatics"},{"key":"2023051612174840900_btaa772-B46","doi-asserted-by":"crossref","first-page":"63","DOI":"10.1016\/0076-6879(90)83007-V","article-title":"Rapid and sensitive sequence comparison with FASTP and FASTA","volume":"183","author":"Pearson","year":"1990","journal-title":"Methods Enzymol"},{"key":"2023051612174840900_btaa772-B47","doi-asserted-by":"crossref","first-page":"2444","DOI":"10.1073\/pnas.85.8.2444","article-title":"Improved tools for biological sequence comparison","volume":"85","author":"Pearson","year":"1988","journal-title":"Proc. Natl. Acad. Sci. USA"},{"key":"2023051612174840900_btaa772-B48","doi-asserted-by":"crossref","first-page":"e46","DOI":"10.1093\/nar\/gkw1207","article-title":"Query-seeded iterative sequence similarity searching improves selectivity 5-20-fold","volume":"45","author":"Pearson","year":"2017","journal-title":"Nucleic Acids Res"},{"key":"2023051612174840900_btaa772-B49","doi-asserted-by":"crossref","first-page":"2353","DOI":"10.1093\/bioinformatics\/btm355","article-title":"Methods of remote homology detection can be combined to increase coverage by 10% in the midnight zone","volume":"23","author":"Reid","year":"2007","journal-title":"Bioinformatics"},{"key":"2023051612174840900_btaa772-B50","doi-asserted-by":"crossref","first-page":"173","DOI":"10.1038\/nmeth.1818","article-title":"HHblits: lightning-fast iterative protein sequence searching by HMM-HMM alignment","volume":"9","author":"Remmert","year":"2012","journal-title":"Nat. Methods"},{"key":"2023051612174840900_btaa772-B51","doi-asserted-by":"crossref","first-page":"706","DOI":"10.1038\/s41586-019-1923-7","article-title":"Improved protein structure prediction using potentials from deep learning","volume":"577","author":"Senior","year":"2020","journal-title":"Nature"},{"key":"2023051612174840900_btaa772-B52","first-page":"35","article-title":"Modern information retrieval: a brief overview","volume":"24","author":"Singhal","year":"2001","journal-title":"IEEE Data Eng. Bull"},{"key":"2023051612174840900_btaa772-B5896626","doi-asserted-by":"crossref","first-page":"303","DOI":"10.3354\/meps07841","article-title":"Identification of the Bray-Curtis similarity index: Comment on Yoshioka (2008)","author":"Somerfield","year":"2008","journal-title":"Marine Ecology Progress Series"},{"key":"2023051612174840900_btaa772-B53","doi-asserted-by":"crossref","first-page":"D158","DOI":"10.1093\/nar\/gkw1099","article-title":"UniProt: the universal protein knowledgebase","volume":"45","author":"The UniProt","year":"2017","journal-title":"Nucleic Acids Res"},{"key":"2023051612174840900_btaa772-B54","doi-asserted-by":"crossref","first-page":"99","DOI":"10.1186\/1471-2105-6-99","article-title":"Improved profile HMM performance by assessment of critical algorithmic features in SAM and HMMER","volume":"6","author":"Wistrand","year":"2005","journal-title":"BMC Bioinformatics"},{"key":"2023051612174840900_btaa772-B55","doi-asserted-by":"crossref","first-page":"2982","DOI":"10.1093\/bioinformatics\/btz040","article-title":"Protein fold recognition based on multi-view modeling","volume":"35","author":"Yan","year":"2019","journal-title":"Bioinformatics"},{"key":"2023051612174840900_btaa772-B56","doi-asserted-by":"crossref","DOI":"10.1093\/database\/baz092","article-title":"Combined alignments of sequences and domains characterize unknown proteins with remotely related protein search PSISearch2D","volume":"2019","author":"Yang","year":"2019","journal-title":"Database (Oxford)"},{"key":"2023051612174840900_btaa772-B57","doi-asserted-by":"crossref","DOI":"10.1186\/s12918-016-0353-5","article-title":"Pretata: predicting TATA binding proteins with novel features and dimensionality reduction strategy","volume":"10","author":"Zou","year":"2016","journal-title":"BMC Syst. Biol"}],"container-title":["Bioinformatics"],"original-title":[],"language":"en","link":[{"URL":"http:\/\/academic.oup.com\/bioinformatics\/advance-article-pdf\/doi\/10.1093\/bioinformatics\/btaa772\/34686141\/btaa772.pdf","content-type":"application\/pdf","content-version":"am","intended-application":"syndication"},{"URL":"https:\/\/academic.oup.com\/bioinformatics\/article-pdf\/37\/7\/913\/50341192\/btaa772.pdf","content-type":"application\/pdf","content-version":"vor","intended-application":"syndication"},{"URL":"https:\/\/academic.oup.com\/bioinformatics\/article-pdf\/37\/7\/913\/50341192\/btaa772.pdf","content-type":"unspecified","content-version":"vor","intended-application":"similarity-checking"}],"deposited":{"date-parts":[[2023,5,16]],"date-time":"2023-05-16T12:19:28Z","timestamp":1684239568000},"score":1,"resource":{"primary":{"URL":"https:\/\/academic.oup.com\/bioinformatics\/article\/37\/7\/913\/5902827"}},"subtitle":[],"editor":[{"given":"Xu","family":"Jinbo","sequence":"additional","affiliation":[]}],"short-title":[],"issued":{"date-parts":[[2020,9,8]]},"references-count":54,"journal-issue":{"issue":"7","published-print":{"date-parts":[[2021,5,17]]}},"URL":"https:\/\/doi.org\/10.1093\/bioinformatics\/btaa772","relation":{},"ISSN":["1367-4803","1367-4811"],"issn-type":[{"type":"print","value":"1367-4803"},{"type":"electronic","value":"1367-4811"}],"subject":[],"published-other":{"date-parts":[[2021,4,1]]},"published":{"date-parts":[[2020,9,8]]}}}