{"status":"ok","message-type":"work","message-version":"1.0.0","message":{"indexed":{"date-parts":[[2025,5,30]],"date-time":"2025-05-30T00:29:04Z","timestamp":1748564944292},"reference-count":39,"publisher":"Oxford University Press (OUP)","issue":"18","content-domain":{"domain":[],"crossmark-restriction":false},"short-container-title":[],"published-print":{"date-parts":[[2016,9,15]]},"abstract":"<jats:title>Abstract<\/jats:title>\n               <jats:p>Motivation: Wide application of modeling of three-dimensional protein structures in biomedical research motivates developing protein sequence alignment computer tools featuring high alignment accuracy and sensitivity to remotely homologous proteins. In this paper, we aim at improving the quality of alignments between sequence profiles, encoded multiple sequence alignments. Modeling profile contexts, fixed-length profile fragments, is engaged to achieve this goal.<\/jats:p>\n               <jats:p>Results: We develop a hierarchical Dirichlet process mixture model to describe the distribution of profile contexts, which is able to capture dependencies between amino acids in each context position. The model represents an attempt at modeling profile fragments at several hierarchical levels, within the profile and among profiles. Even modeling unit-length contexts leads to greater improvements than processing 13-length contexts previously. We develop a new profile comparison method, called COMER, integrating the model. A benchmark with three other profile-to-profile comparison methods shows an increase in both sensitivity and alignment quality.<\/jats:p>\n               <jats:p>Availability and Implementation: COMER is open-source software licensed under the GNU GPLv3, available at https:\/\/sourceforge.net\/projects\/comer.<\/jats:p>\n               <jats:p>Contact: \u00a0mindaugas.margelevicius@bti.vu.lt<\/jats:p>\n               <jats:p>Supplementary information: \u00a0Supplementary data are available at Bioinformatics online.<\/jats:p>","DOI":"10.1093\/bioinformatics\/btw213","type":"journal-article","created":{"date-parts":[[2016,4,24]],"date-time":"2016-04-24T00:17:47Z","timestamp":1461457067000},"page":"2744-2752","source":"Crossref","is-referenced-by-count":8,"title":["Bayesian nonparametrics in protein remote homology search"],"prefix":"10.1093","volume":"32","author":[{"given":"Mindaugas","family":"Margelevi\u010dius","sequence":"first","affiliation":[{"name":"Institute of Biotechnology, Vilnius University, Vilnius 10257, Lithuania"}]}],"member":"286","published-online":{"date-parts":[[2016,4,22]]},"reference":[{"key":"2023020113391657400_btw213-B1","doi-asserted-by":"crossref","first-page":"261","DOI":"10.2307\/2335470","article-title":"Logistic-normal distributions: some properties and uses","volume":"67","author":"Aitchison","year":"1980","journal-title":"Biometrika"},{"key":"2023020113391657400_btw213-B2","doi-asserted-by":"crossref","first-page":"3389","DOI":"10.1093\/nar\/25.17.3389","article-title":"Gapped BLAST and PSI-BLAST: a new generation of protein database search programs","volume":"25","author":"Altschul","year":"1997","journal-title":"Nucleic Acids Res"},{"key":"2023020113391657400_btw213-B3","doi-asserted-by":"crossref","first-page":"815","DOI":"10.1093\/nar\/gkn981","article-title":"PSI-BLAST pseudocounts and the minimum description length principle","volume":"37","author":"Altschul","year":"2009","journal-title":"Nucleic Acids Res"},{"key":"2023020113391657400_btw213-B4","doi-asserted-by":"crossref","first-page":"235","DOI":"10.1093\/nar\/28.1.235","article-title":"The Protein Data Bank","volume":"28","author":"Berman","year":"2000","journal-title":"Nucleic Acids Res"},{"key":"2023020113391657400_btw213-B5","doi-asserted-by":"crossref","first-page":"3770","DOI":"10.1073\/pnas.0810767106","article-title":"Sequence context-specific profiles for homology searching","volume":"106","author":"Biegert","year":"2009","journal-title":"Proc. Natl. Acad. Sci. USA"},{"key":"2023020113391657400_btw213-B6","doi-asserted-by":"crossref","first-page":"353","DOI":"10.1214\/aos\/1176342372","article-title":"Ferguson distributions via P\u00f3lya urn schemes","volume":"1","author":"Blackwell","year":"1973","journal-title":"Ann. Stat"},{"key":"2023020113391657400_btw213-B7","doi-asserted-by":"crossref","first-page":"e1003926","DOI":"10.1371\/journal.pcbi.1003926","article-title":"ECOD: An evolutionary classification of protein domains","volume":"10","author":"Cheng","year":"2014","journal-title":"PLOS Comput. Biol"},{"key":"2023020113391657400_btw213-B8","doi-asserted-by":"crossref","first-page":"e1002195.","DOI":"10.1371\/journal.pcbi.1002195","article-title":"Accelerated profile HMM searches","volume":"7","author":"Eddy","year":"2011","journal-title":"PLOS Comput. Biol"},{"key":"2023020113391657400_btw213-B9","doi-asserted-by":"crossref","first-page":"209","DOI":"10.1214\/aos\/1176342360","article-title":"A Bayesian analysis of some nonparametric problems","volume":"1","author":"Ferguson","year":"1973","journal-title":"Ann. Stat"},{"key":"2023020113391657400_btw213-B10","doi-asserted-by":"crossref","first-page":"D304","DOI":"10.1093\/nar\/gkt1240","article-title":"SCOPe: Structural classification of proteins\u2014extended, integrating SCOP and ASTRAL data and classification of new structures","volume":"42","author":"Fox","year":"2013","journal-title":"Nucleic Acids Res"},{"key":"2023020113391657400_btw213-B11","doi-asserted-by":"crossref","first-page":"2177","DOI":"10.1093\/nar\/gkp1219","article-title":"Homologous over-extension: a challenge for iterative similarity searches","volume":"38","author":"Gonzalez","year":"2010","journal-title":"Nucleic Acids Res"},{"key":"2023020113391657400_btw213-B12","doi-asserted-by":"crossref","first-page":"574","DOI":"10.1016\/0022-2836(94)90032-9","article-title":"Position-based sequence weights","volume":"243","author":"Henikoff","year":"1994","journal-title":"J. Mol. Biol"},{"key":"2023020113391657400_btw213-B13","doi-asserted-by":"crossref","first-page":"2780","DOI":"10.1093\/bioinformatics\/btn507","article-title":"Searching protein structure databases with DaliLite v.3","volume":"24","author":"Holm","year":"2008","journal-title":"Bioinformatics"},{"key":"2023020113391657400_btw213-B14","doi-asserted-by":"crossref","first-page":"158","DOI":"10.1198\/1061860043001","article-title":"A split-merge Markov chain Monte Carlo procedure for the Dirichlet process mixture model","volume":"13","author":"Jain","year":"2004","journal-title":"J. Comput. Graph. Stat"},{"key":"2023020113391657400_btw213-B15","doi-asserted-by":"crossref","first-page":"W38","DOI":"10.1093\/nar\/gkr441","article-title":"FFAS server: novel features and applications","volume":"39","author":"Jaroszewski","year":"2011","journal-title":"Nucleic Acids Res"},{"key":"2023020113391657400_btw213-B16","doi-asserted-by":"crossref","first-page":"195","DOI":"10.1006\/jmbi.1999.3091","article-title":"Protein secondary structure prediction based on position-specific scoring matrices","volume":"292","author":"Jones","year":"1999","journal-title":"J. Mol. Biol"},{"key":"2023020113391657400_btw213-B17","doi-asserted-by":"crossref","first-page":"3733","DOI":"10.1073\/pnas.1321614111","article-title":"Trends in structural coverage of the protein universe and the impact of the Protein Structure Initiative","volume":"111","author":"Khafizov","year":"2014","journal-title":"Proc. Natl. Acad. Sci. USA"},{"key":"2023020113391657400_btw213-B18","doi-asserted-by":"crossref","first-page":"164","DOI":"10.1002\/prot.24448","article-title":"CASP10 results compared to those of previous CASP experiments","volume":"82","author":"Kryshtafovych","year":"2014","journal-title":"Proteins"},{"key":"2023020113391657400_btw213-B19","doi-asserted-by":"crossref","first-page":"89.","DOI":"10.1186\/1471-2105-11-89","article-title":"Detection of distant evolutionary relationships between protein families using theory of sequence profile-profile comparison","volume":"11","author":"Margelevi\u010dius","year":"2010","journal-title":"BMC Bioinformatics"},{"key":"2023020113391657400_btw213-B20","doi-asserted-by":"crossref","first-page":"7.","DOI":"10.3410\/B4-7","article-title":"The Protein Structure Initiative: achievements and visions for the future","volume":"4","author":"Montelione","year":"2012","journal-title":"F1000 Biol. Rep"},{"key":"2023020113391657400_btw213-B21","doi-asserted-by":"crossref","first-page":"1","DOI":"10.1002\/prot.24452","article-title":"Critical assessment of methods of protein structure prediction (CASP) \u2013 round X","volume":"82","author":"Moult","year":"2014","journal-title":"Proteins"},{"key":"2023020113391657400_btw213-B22","doi-asserted-by":"crossref","first-page":"536","DOI":"10.1016\/S0022-2836(05)80134-2","article-title":"SCOP: a structural classification of proteins database for the investigation of sequences and structures","volume":"247","author":"Murzin","year":"1995","journal-title":"J. Mol. Biol"},{"key":"2023020113391657400_btw213-B23","doi-asserted-by":"crossref","first-page":"1","DOI":"10.1089\/cmb.2012.0244","article-title":"Dirichlet mixtures, the Dirichlet process, and the structure of protein space","volume":"20","author":"Nguyen","year":"2013","journal-title":"J. Comput. Biol"},{"key":"2023020113391657400_btw213-B24","doi-asserted-by":"crossref","first-page":"173","DOI":"10.1038\/nmeth.1818","article-title":"HHblits: lightning-fast iterative protein sequence searching by HMM-HMM alignment","volume":"9","author":"Remmert","year":"2012","journal-title":"Nat. Methods"},{"key":"2023020113391657400_btw213-B25","doi-asserted-by":"crossref","first-page":"725","DOI":"10.1038\/nprot.2010.5","article-title":"I-TASSER: a unified platform for automated protein structure and function prediction","volume":"5","author":"Roy","year":"2010","journal-title":"Nat. Protoc"},{"key":"2023020113391657400_btw213-B26","doi-asserted-by":"crossref","first-page":"321","DOI":"10.1016\/j.sbi.2009.04.009","article-title":"Discrete-continuous duality of protein structure space","volume":"19","author":"Sadreyev","year":"2009","journal-title":"Curr. Opin. Struct. Biol"},{"key":"2023020113391657400_btw213-B27","doi-asserted-by":"crossref","first-page":"2994","DOI":"10.1093\/nar\/29.14.2994","article-title":"Improving the accuracy of PSI-BLAST protein database searches with composition-based statistics and other refinements","volume":"29","author":"Sch\u00e4ffer","year":"2001","journal-title":"Nucleic Acids Res"},{"key":"2023020113391657400_btw213-B28","doi-asserted-by":"crossref","first-page":"1531","DOI":"10.1016\/j.str.2013.08.007","article-title":"Protein modeling: What happened to the \u201cprotein structure gap\u201d?","volume":"21","author":"Schwede","year":"2013","journal-title":"Structure"},{"key":"2023020113391657400_btw213-B29","first-page":"327","article-title":"Dirichlet mixtures: a method for improved detection of weak but significant protein sequence homology","volume":"12","author":"Sj\u00f6lander","year":"1996","journal-title":"Comput. Appl. Biosci"},{"key":"2023020113391657400_btw213-B30","doi-asserted-by":"crossref","first-page":"951","DOI":"10.1093\/bioinformatics\/bti125","article-title":"Protein homology detection by HMM-HMM comparison","volume":"21","author":"S\u00f6ding","year":"2005","journal-title":"Bioinformatics"},{"key":"2023020113391657400_btw213-B31","doi-asserted-by":"crossref","first-page":"404","DOI":"10.1016\/j.sbi.2011.03.005","article-title":"Protein sequence comparison and fold recognition: progress and good-practice benchmarking","volume":"21","author":"S\u00f6ding","year":"2011","journal-title":"Curr. Opin. Struct. Biol"},{"key":"2023020113391657400_btw213-B32","doi-asserted-by":"crossref","first-page":"1566","DOI":"10.1198\/016214506000000302","article-title":"Hierarchical Dirichlet processes","volume":"101","author":"Teh","year":"2006","journal-title":"J. Am. Stat. Assoc"},{"key":"2023020113391657400_btw213-B33","doi-asserted-by":"crossref","first-page":"D191","DOI":"10.1093\/nar\/gku469","article-title":"Activities at the Universal protein resource (UniProt)","volume":"42","author":"The UniProt Consortium","year":"2014","journal-title":"Nucleic Acids Res"},{"key":"2023020113391657400_btw213-B34","doi-asserted-by":"crossref","first-page":"81","DOI":"10.1002\/prot.22515","article-title":"The use of automatic tools and human expertise in template-based modeling of CASP8 target proteins","volume":"77","author":"Venclovas","year":"2009","journal-title":"Proteins"},{"key":"2023020113391657400_btw213-B101","doi-asserted-by":"crossref","first-page":"1134","DOI":"10.1111\/j.0006-341X.2000.01134.x","article-title":"A permutation test to compare receiver operating characteristic curves","volume":"56","author":"Venkatraman","year":"2000","journal-title":"Biometrics"},{"key":"2023020113391657400_btw213-B35","doi-asserted-by":"crossref","first-page":"779","DOI":"10.1006\/jmbi.1993.1626","article-title":"Comparative protein modelling by satisfaction of spatial restraints","volume":"234","author":"\u0160ali","year":"1993","journal-title":"J. Mol. Biol"},{"key":"2023020113391657400_btw213-B36","doi-asserted-by":"crossref","first-page":"2619","DOI":"10.1038\/srep02619","article-title":"A comparative assessment and analysis of 20 representative sequence alignment methods for protein structure prediction","volume":"3","author":"Yan","year":"2013","journal-title":"Sci. Rep"},{"key":"2023020113391657400_btw213-B37","doi-asserted-by":"crossref","first-page":"902","DOI":"10.1093\/bioinformatics\/bti070","article-title":"The construction of amino acid substitution matrices for the comparison of proteins with non-standard compositions","volume":"21","author":"Yu","year":"2005","journal-title":"Bioinformatics"},{"key":"2023020113391657400_btw213-B38","doi-asserted-by":"crossref","first-page":"702","DOI":"10.1002\/prot.20264","article-title":"Scoring function for automated assessment of protein structure template quality","volume":"57","author":"Zhang","year":"2004","journal-title":"Proteins"}],"container-title":["Bioinformatics"],"original-title":[],"language":"en","link":[{"URL":"https:\/\/academic.oup.com\/bioinformatics\/article-pdf\/32\/18\/2744\/49020828\/bioinformatics_32_18_2744.pdf","content-type":"application\/pdf","content-version":"vor","intended-application":"syndication"},{"URL":"https:\/\/academic.oup.com\/bioinformatics\/article-pdf\/32\/18\/2744\/49020828\/bioinformatics_32_18_2744.pdf","content-type":"unspecified","content-version":"vor","intended-application":"similarity-checking"}],"deposited":{"date-parts":[[2023,2,1]],"date-time":"2023-02-01T23:44:20Z","timestamp":1675295060000},"score":1,"resource":{"primary":{"URL":"https:\/\/academic.oup.com\/bioinformatics\/article\/32\/18\/2744\/1742891"}},"subtitle":[],"short-title":[],"issued":{"date-parts":[[2016,4,22]]},"references-count":39,"journal-issue":{"issue":"18","published-print":{"date-parts":[[2016,9,15]]}},"URL":"https:\/\/doi.org\/10.1093\/bioinformatics\/btw213","relation":{},"ISSN":["1367-4811","1367-4803"],"issn-type":[{"value":"1367-4811","type":"electronic"},{"value":"1367-4803","type":"print"}],"subject":[],"published-other":{"date-parts":[[2016,9,15]]},"published":{"date-parts":[[2016,4,22]]}}}