{"status":"ok","message-type":"work","message-version":"1.0.0","message":{"indexed":{"date-parts":[[2023,2,2]],"date-time":"2023-02-02T00:40:07Z","timestamp":1675298407526},"reference-count":36,"publisher":"Oxford University Press (OUP)","issue":"1","content-domain":{"domain":[],"crossmark-restriction":false},"short-container-title":[],"published-print":{"date-parts":[[2015,1,1]]},"abstract":"<jats:title>Abstract<\/jats:title>\n               <jats:p>Motivation : Insertion\/deletion (indel) and amino acid substitution are two common events that lead to the evolution of and variations in protein sequences. Further, many of the human diseases and functional divergence between homologous proteins are more related to indel mutations, even though they occur less often than the substitution mutations do. A reliable identification of indels and their flanking regions is a major challenge in research related to protein evolution, structures and functions.<\/jats:p>\n               <jats:p>Results : In this article, we propose a novel scheme to predict indel flanking regions in a protein sequence for a given protein fold, based on a variable-order Markov model. The proposed indel flanking region (IndelFR) predictors are designed based on prediction by partial match (PPM) and probabilistic suffix tree (PST), which are referred to as the PPM IndelFR and PST IndelFR predictors, respectively. The overall performance evaluation results show that the proposed predictors are able to predict IndelFRs in the protein sequences with a high accuracy and F1 measure. 
In addition, the results show that if one is interested only in predicting IndelFRs in protein sequences, it would be preferable to use the proposed predictors instead of HMMER 3.0 in view of the substantially superior performance of the former.<\/jats:p>\n               <jats:p>Contact : m_alshat@ece.concordia.ca or omair@ece.concordia.ca or swamy@ece.concordia.ca .<\/jats:p>\n               <jats:p>Supplementary information: \u00a0Supplementary data are available at Bioinformatics online.<\/jats:p>","DOI":"10.1093\/bioinformatics\/btu556","type":"journal-article","created":{"date-parts":[[2014,9,2]],"date-time":"2014-09-02T00:17:48Z","timestamp":1409617068000},"page":"40-47","source":"Crossref","is-referenced-by-count":5,"title":["Prediction of Indel flanking regions in protein sequences using a variable-order Markov model"],"prefix":"10.1093","volume":"31","author":[{"given":"Mufleh","family":"Al-Shatnawi","sequence":"first","affiliation":[{"name":"Department of Electrical and Computer Engineering, Concordia University, QC H3G 2W1, Canada"}]},{"given":"M. 
Omair","family":"Ahmad","sequence":"additional","affiliation":[{"name":"Department of Electrical and Computer Engineering, Concordia University, QC H3G 2W1, Canada"}]},{"given":"M.N.S.","family":"Swamy","sequence":"additional","affiliation":[{"name":"Department of Electrical and Computer Engineering, Concordia University, QC H3G 2W1, Canada"}]}],"member":"286","published-online":{"date-parts":[[2014,8,31]]},"reference":[{"key":"2023020116155931200_btu556-B1","doi-asserted-by":"crossref","first-page":"D419","DOI":"10.1093\/nar\/gkm993","article-title":"Data growth and its impact on the scop database: new developments","volume":"36","author":"Andreeva","year":"2008","journal-title":"Nucleic Acids Res."},{"key":"2023020116155931200_btu556-B2","doi-asserted-by":"crossref","first-page":"385","DOI":"10.1613\/jair.1491","article-title":"On prediction using variable order markov models","volume":"22","author":"Begleiter","year":"2004","journal-title":"J. Artif. Intell. Res."},{"key":"2023020116155931200_btu556-B3","doi-asserted-by":"crossref","first-page":"23","DOI":"10.1093\/bioinformatics\/17.1.23","article-title":"Variations on probabilistic suffix trees: statistical modeling and prediction of protein families","volume":"17","author":"Bejerano","year":"2001","journal-title":"Bioinformatics"},{"key":"2023020116155931200_btu556-B4","doi-asserted-by":"crossref","first-page":"1065","DOI":"10.1006\/jmbi.1993.1105","article-title":"Empirical and structural models for insertions and deletions in the divergent evolution of proteins","volume":"229","author":"Benner","year":"1993","journal-title":"J. Mol. Biol."},{"key":"2023020116155931200_btu556-B5","doi-asserted-by":"crossref","first-page":"4661","DOI":"10.1073\/pnas.0330964100","article-title":"Majority of divergence between closely related dna samples is due to indels","volume":"100","author":"Britten","year":"2003","journal-title":"Proc. Natl Acad. 
Sci."},{"key":"2023020116155931200_btu556-B6","doi-asserted-by":"crossref","first-page":"480","DOI":"10.1214\/aos\/1018031204","article-title":"Variable length Markov chains","volume":"27","author":"Buhlmann","year":"1999","journal-title":"The Annals of Statistics"},{"key":"2023020116155931200_btu556-B7","doi-asserted-by":"crossref","first-page":"227","DOI":"10.1186\/1471-2105-8-227","article-title":"Relationship between insertion\/deletion (indel) frequency of proteins and essentiality","volume":"8","author":"Chan","year":"2007","journal-title":"BMC bioinformatics"},{"key":"2023020116155931200_btu556-B8","doi-asserted-by":"crossref","first-page":"D189","DOI":"10.1093\/nar\/gkh034","article-title":"The ASTRAL compendium in 2004","volume":"32","author":"Chandonia","year":"2004","journal-title":"Nucleic Acids Res."},{"key":"2023020116155931200_btu556-B9","doi-asserted-by":"crossref","first-page":"1523","DOI":"10.1093\/molbev\/msp063","article-title":"Variation in the ratio of nucleotide substitution and indel rates across genomes in mammals and bacteria","volume":"26","author":"Chen","year":"2009","journal-title":"Mol. Biol. Evol."},{"key":"2023020116155931200_btu556-B10","doi-asserted-by":"crossref","first-page":"396","DOI":"10.1109\/TCOM.1984.1096090","article-title":"Data compression using adaptive coding and partial string matching","volume":"32","author":"Cleary","year":"1984","journal-title":"IEEE Trans. 
Commun."},{"key":"2023020116155931200_btu556-B11","first-page":"2447","article-title":"Mutations at coding repeat sequences in mismatch repair-deficient human cancers: toward a new concept of target genes for instability","volume":"62","author":"Duval","year":"2002","journal-title":"Cancer Res."},{"key":"2023020116155931200_btu556-B12","doi-asserted-by":"crossref","first-page":"755","DOI":"10.1093\/bioinformatics\/14.9.755","article-title":"Profile hidden Markov models","volume":"14","author":"Eddy","year":"1998","journal-title":"Bioinformatics"},{"key":"2023020116155931200_btu556-B13","doi-asserted-by":"crossref","first-page":"861","DOI":"10.1016\/j.patrec.2005.10.010","article-title":"An introduction to ROC analysis","volume":"27","author":"Fawcett","year":"2006","journal-title":"Pattern Recogn. Lett."},{"key":"2023020116155931200_btu556-B14","doi-asserted-by":"crossref","first-page":"W29","DOI":"10.1093\/nar\/gkr367","article-title":"HMMER web server: interactive sequence similarity searching","volume":"39","author":"Finn","year":"2011","journal-title":"Nucleic Acids Res."},{"key":"2023020116155931200_btu556-B15","doi-asserted-by":"crossref","first-page":"167","DOI":"10.1006\/jsbi.2001.4335","article-title":"Fold change in evolution of protein structures","volume":"134","author":"Grishin","year":"2001","journal-title":"J. Struct. Biol."},{"key":"2023020116155931200_btu556-B16","first-page":"135","article-title":"Using substitution probabilities to improve position-specific scoring matrices","volume":"12","author":"Henikoff","year":"1996","journal-title":"Comput. Appl. 
Biosci."},{"key":"2023020116155931200_btu556-B17","doi-asserted-by":"crossref","first-page":"293","DOI":"10.1186\/1471-2105-9-293","article-title":"Indel PDB: a database of structural insertions and deletions derived from sequence alignments of closely related proteins","volume":"9","author":"Hsing","year":"2008","journal-title":"BMC bioinformatics"},{"key":"2023020116155931200_btu556-B18","first-page":"95","article-title":"Hidden Markov models for sequence analysis: extension and analysis of the basic method","volume":"12","author":"Hughey","year":"1996","journal-title":"Comput. Appl. Biosci."},{"key":"2023020116155931200_btu556-B19","doi-asserted-by":"crossref","first-page":"444","DOI":"10.1186\/1471-2105-8-444","article-title":"Insertions and the emergence of novel protein structure: a structure-based phylogenetic study of insertions","volume":"8","author":"Jiang","year":"2007","journal-title":"BMC Bioinformatics"},{"key":"2023020116155931200_btu556-B20","doi-asserted-by":"crossref","first-page":"846","DOI":"10.1093\/bioinformatics\/14.10.846","article-title":"Hidden Markov models for detecting remote protein homologies","volume":"14","author":"Karplus","year":"1998","journal-title":"Bioinformatics"},{"key":"2023020116155931200_btu556-B21","first-page":"2256","article-title":"Secondary-structure matching (SSM), a new tool for fast protein structure alignment in three dimensions","volume":"D60","author":"Krissinel","year":"2004","journal-title":"Acta cryst."},{"key":"2023020116155931200_btu556-B22","doi-asserted-by":"crossref","first-page":"1501","DOI":"10.1006\/jmbi.1994.1104","article-title":"Hidden Markov models in computational biology: applications to protein modeling","volume":"235","author":"Krogh","year":"1994","journal-title":"J. Mol. 
Biol."},{"key":"2023020116155931200_btu556-B23","doi-asserted-by":"crossref","first-page":"1917","DOI":"10.1109\/26.61469","article-title":"Implementing the PPM data compression scheme","volume":"38","author":"Moffat","year":"1990","journal-title":"IEEE Trans. Commun."},{"key":"2023020116155931200_btu556-B24","first-page":"363","article-title":"Towards behaviometric security systems: learning to identify a typist","volume-title":"Proceedings of the 7th European Conference on Principles and Practice of Knowledge Discovery in Databases, Cavtat-Dubrovnik, Croatia","author":"Nisenson","year":"2003"},{"key":"2023020116155931200_btu556-B25","doi-asserted-by":"crossref","first-page":"131","DOI":"10.1517\/14622416.3.1.131","article-title":"Recent progress in multiple sequence alignment: a survey","volume":"3","author":"Notredame","year":"2002","journal-title":"Pharmacogenomics"},{"key":"2023020116155931200_btu556-B26","doi-asserted-by":"crossref","first-page":"D290","DOI":"10.1093\/nar\/gkr1065","article-title":"The Pfam protein families database","volume":"40","author":"Punta","year":"2012","journal-title":"Nucleic Acids Res."},{"key":"2023020116155931200_btu556-B27","doi-asserted-by":"crossref","first-page":"117","DOI":"10.1023\/A:1026490906255","article-title":"The power of amnesia: Learning probabilistic automata with variable memory length","volume":"25","author":"Ron","year":"1996","journal-title":"Mach. Learn."},{"key":"2023020116155931200_btu556-B28","doi-asserted-by":"crossref","first-page":"198","DOI":"10.1093\/bib\/bbm064","article-title":"ROC analysis: applications to the classification of biological sequences and 3d structures","volume":"9","author":"Sonego","year":"2008","journal-title":"Brief. 
Bioinform."},{"key":"2023020116155931200_btu556-B29","doi-asserted-by":"crossref","first-page":"105","DOI":"10.1038\/nature07175","article-title":"Single-nucleotide mutation rate increases close to insertions\/deletions in eukaryotes","volume":"455","author":"Tian","year":"2008","journal-title":"Nature"},{"key":"2023020116155931200_btu556-B30","doi-asserted-by":"crossref","first-page":"1267","DOI":"10.1093\/bioinformatics\/bth493","article-title":"SABmark benchmark for sequence alignment that covers the entire known fold space","volume":"21","author":"Walle","year":"2005","journal-title":"Bioinformatics"},{"key":"2023020116155931200_btu556-B31","doi-asserted-by":"crossref","first-page":"653","DOI":"10.1109\/18.382012","article-title":"The context-tree weighting method: basic properties","volume":"41","author":"Willems","year":"1995","journal-title":"IEEE Trans. Inform. Theory"},{"key":"2023020116155931200_btu556-B32","volume-title":"Introduction to Computational Proteomics","author":"Yona","year":"2011"},{"key":"2023020116155931200_btu556-B33","doi-asserted-by":"crossref","first-page":"e14316","DOI":"10.1371\/journal.pone.0014316","article-title":"The combined effects of amino acid substitutions and indels on the evolution of structure within protein families","volume":"5","author":"Zhang","year":"2010","journal-title":"PloS One"},{"key":"2023020116155931200_btu556-B34","doi-asserted-by":"crossref","first-page":"291","DOI":"10.1093\/molbev\/msq196","article-title":"Impact of indels on the flanking regions in structural domains","volume":"28","author":"Zhang","year":"2011","journal-title":"Mol. Biol. 
Evol."},{"key":"2023020116155931200_btu556-B35","doi-asserted-by":"crossref","first-page":"D512","DOI":"10.1093\/nar\/gkr1107","article-title":"IndelFR: a database of indels in protein structures and their flanking regions","volume":"40","author":"Zhang","year":"2012","journal-title":"Nucleic Acids Res."},{"key":"2023020116155931200_btu556-B36","doi-asserted-by":"crossref","first-page":"2353","DOI":"10.1093\/molbev\/msp144","article-title":"Genomewide association between insertions\/deletions and the nucleotide diversity in bacteria","volume":"26","author":"Zhu","year":"2009","journal-title":"Mol. Biol. Evol."}],"container-title":["Bioinformatics"],"original-title":[],"language":"en","link":[{"URL":"https:\/\/academic.oup.com\/bioinformatics\/article-pdf\/31\/1\/40\/49012318\/bioinformatics_31_1_40.pdf","content-type":"application\/pdf","content-version":"vor","intended-application":"syndication"},{"URL":"https:\/\/academic.oup.com\/bioinformatics\/article-pdf\/31\/1\/40\/49012318\/bioinformatics_31_1_40.pdf","content-type":"unspecified","content-version":"vor","intended-application":"similarity-checking"}],"deposited":{"date-parts":[[2023,2,2]],"date-time":"2023-02-02T00:26:17Z","timestamp":1675297577000},"score":1,"resource":{"primary":{"URL":"https:\/\/academic.oup.com\/bioinformatics\/article\/31\/1\/40\/2364860"}},"subtitle":[],"short-title":[],"issued":{"date-parts":[[2014,8,31]]},"references-count":36,"journal-issue":{"issue":"1","published-print":{"date-parts":[[2015,1,1]]}},"URL":"https:\/\/doi.org\/10.1093\/bioinformatics\/btu556","relation":{},"ISSN":["1367-4811","1367-4803"],"issn-type":[{"value":"1367-4811","type":"electronic"},{"value":"1367-4803","type":"print"}],"subject":[],"published-other":{"date-parts":[[2015,1,1]]},"published":{"date-parts":[[2014,8,31]]}}}