{"status":"ok","message-type":"work","message-version":"1.0.0","message":{"indexed":{"date-parts":[[2023,1,31]],"date-time":"2023-01-31T21:40:17Z","timestamp":1675201217598},"reference-count":35,"publisher":"Oxford University Press (OUP)","issue":"15","license":[{"start":{"date-parts":[[2016,10,2]],"date-time":"2016-10-02T00:00:00Z","timestamp":1475366400000},"content-version":"vor","delay-in-days":2686,"URL":"http:\/\/creativecommons.org\/licenses\/by-nc\/2.0\/uk\/"}],"content-domain":{"domain":[],"crossmark-restriction":false},"short-container-title":[],"published-print":{"date-parts":[[2009,8,1]]},"abstract":"<jats:title>Abstract<\/jats:title>\n               <jats:p>Motivation: Homologous protein families share highly conserved sequence and structure regions that are frequent targets for comparative analysis of related proteins and families. Many protein families, such as the curated domain families in the Conserved Domain Database (CDD), exhibit similar structural cores. To improve accuracy in aligning such protein families, we propose a profile\u2013profile method CORAL that aligns individual core regions as gap-free units.<\/jats:p>\n               <jats:p>Results: CORAL computes optimal local alignment of two profiles with heuristics to preserve continuity within core regions. We benchmarked its performance on curated domains in CDD, which have pre-defined core regions, against COMPASS, HHalign and PSI-BLAST, using structure superpositions and comprehensive curator-optimized alignments as standards of truth. CORAL improves alignment accuracy on core regions over general profile methods, returning a balanced score of 0.57 for over 80% of all domain families in CDD, compared with the highest balanced score of 0.45 from other methods. Further, CORAL provides E-values to aid in detecting homologous protein families and, by respecting block boundaries, produces alignments with improved \u2018readability\u2019 that facilitate manual refinement.<\/jats:p>\n               <jats:p>Availability: CORAL will be included in future versions of the NCBI Cn3D\/CDTree software, which can be downloaded at http:\/\/www.ncbi.nlm.nih.gov\/Structure\/cdtree\/cdtree.shtml.<\/jats:p>\n               <jats:p>Contact: \u00a0fongj@ncbi.nlm.nih.gov.<\/jats:p>\n               <jats:p>Supplementary information: \u00a0Supplementary data are available at Bioinformatics online.<\/jats:p>","DOI":"10.1093\/bioinformatics\/btp334","type":"journal-article","created":{"date-parts":[[2009,5,27]],"date-time":"2009-05-27T01:49:12Z","timestamp":1243388952000},"page":"1862-1868","source":"Crossref","is-referenced-by-count":3,"title":["CORAL: aligning conserved core regions across domain families"],"prefix":"10.1093","volume":"25","author":[{"given":"Jessica H.","family":"Fong","sequence":"first","affiliation":[{"name":"National Center for Biotechnology Information, National Library of Medicine, National Institutes of Health, 8600 Rockville Pike, Bethesda, MD 20894, USA"}]},{"given":"Aron","family":"Marchler-Bauer","sequence":"additional","affiliation":[{"name":"National Center for Biotechnology Information, National Library of Medicine, National Institutes of Health, 8600 Rockville Pike, Bethesda, MD 20894, USA"}]}],"member":"286","published-online":{"date-parts":[[2009,5,26]]},"reference":[{"key":"2023013112054311800_B1","doi-asserted-by":"crossref","first-page":"3389","DOI":"10.1093\/nar\/25.17.3389","article-title":"Gapped BLAST and PSI-BLAST: a new generation of protein database search programs","volume":"25","author":"Altschul","year":"1997","journal-title":"Nucleic Acids Res"},{"key":"2023013112054311800_B2","doi-asserted-by":"crossref","first-page":"e160","DOI":"10.1371\/journal.pcbi.0030160","article-title":"Automated protein subfamily identification and classification","volume":"3","author":"Brown","year":"2007","journal-title":"PLoS Comput. Biol"},{"key":"2023013112054311800_B3","doi-asserted-by":"crossref","first-page":"2598","DOI":"10.1093\/nar\/gkl274","article-title":"Refining multiple sequence alignments with conserved core regions","volume":"34","author":"Chakrabarti","year":"2006","journal-title":"Nucleic Acids Res"},{"key":"2023013112054311800_B4","doi-asserted-by":"crossref","first-page":"1792","DOI":"10.1093\/nar\/gkh340","article-title":"MUSCLE: multiple sequence alignment with high accuracy and high throughput","volume":"32","author":"Edgar","year":"2004","journal-title":"Nucleic Acids Res"},{"key":"2023013112054311800_B5","doi-asserted-by":"crossref","first-page":"1301","DOI":"10.1093\/bioinformatics\/bth090","article-title":"A comparison of scoring functions for protein sequence profile alignment","volume":"20","author":"Edgar","year":"2004","journal-title":"Bioinformatics"},{"key":"2023013112054311800_B6","doi-asserted-by":"crossref","first-page":"D247","DOI":"10.1093\/nar\/gkj149","article-title":"Pfam: clans, web tools and services","volume":"34","author":"Finn","year":"2006","journal-title":"Nucleic Acids Res"},{"key":"2023013112054311800_B7","doi-asserted-by":"crossref","first-page":"377","DOI":"10.1016\/S0959-440X(96)80058-3","article-title":"Surprising similarities in structure comparison","volume":"6","author":"Gibrat","year":"1996","journal-title":"Curr. Opin. Struct. Biol."},{"key":"2023013112054311800_B8","first-page":"361","article-title":"Optimal alignment between groups of sequences and its application to multiple sequence alignment","volume":"9","author":"Gotoh","year":"1993","journal-title":"Comput. Appl. Biosci."},{"key":"2023013112054311800_B9","doi-asserted-by":"crossref","first-page":"272","DOI":"10.1093\/bioinformatics\/17.3.272","article-title":"Picasso: generating a covering set of protein family profiles","volume":"17","author":"Heger","year":"2001","journal-title":"Bioinformatics"},{"key":"2023013112054311800_B10","doi-asserted-by":"crossref","first-page":"228","DOI":"10.1093\/nar\/28.1.228","article-title":"Increased coverage of protein families with the blocks database servers","volume":"28","author":"Henikoff","year":"2000","journal-title":"Nucleic Acids Res"},{"key":"2023013112054311800_B11","doi-asserted-by":"crossref","first-page":"1451","DOI":"10.1093\/bioinformatics\/bti233","article-title":"A structure-based method for protein sequence alignment","volume":"21","author":"Kann","year":"2005","journal-title":"Bioinformatics"},{"key":"2023013112054311800_B12","doi-asserted-by":"crossref","first-page":"4678","DOI":"10.1093\/nar\/gkm414","article-title":"The identification of complete domains within protein sequences using accurate E-values for semi-global alignment","volume":"35","author":"Kann","year":"2007","journal-title":"Nucleic Acids Res"},{"key":"2023013112054311800_B13","doi-asserted-by":"crossref","first-page":"2264","DOI":"10.1073\/pnas.87.6.2264","article-title":"Methods for assessing the statistical significance of molecular sequence features by using general scoring schemes","volume":"87","author":"Karlin","year":"1990","journal-title":"Proc. Natl Acad. Sci. USA"},{"key":"2023013112054311800_B14","doi-asserted-by":"crossref","first-page":"939","DOI":"10.1006\/jmbi.2001.4466","article-title":"Consistency analysis of similarity between multiple alignments: prediction of protein function and fold structure from analysis of local sequence motifs","volume":"307","author":"Kunin","year":"2001","journal-title":"J. Mol. Biol."},{"key":"2023013112054311800_B15","doi-asserted-by":"crossref","first-page":"D257","DOI":"10.1093\/nar\/gkj079","article-title":"SMART 5: domains in the context of genomes and networks","volume":"34","author":"Letunic","year":"2006","journal-title":"Nucleic Acids Res."},{"key":"2023013112054311800_B16","doi-asserted-by":"crossref","first-page":"356","DOI":"10.1002\/prot.340230309","article-title":"Threading a database of protein cores","volume":"23","author":"Madej","year":"1995","journal-title":"Proteins"},{"key":"2023013112054311800_B17","doi-asserted-by":"crossref","first-page":"281","DOI":"10.1093\/nar\/30.1.281","article-title":"CDD: a database of conserved domain alignments with links to domain three-dimensional structure","volume":"30","author":"Marchler-Bauer","year":"2002","journal-title":"Nucleic Acids Res"},{"key":"2023013112054311800_B18","doi-asserted-by":"crossref","first-page":"D205","DOI":"10.1093\/nar\/gkn845","article-title":"CDD: specific functional annotation with the Conserved Domain Database","volume":"37","author":"Marchler-Bauer","year":"2009","journal-title":"Nucleic Acids Res"},{"key":"2023013112054311800_B19","doi-asserted-by":"crossref","first-page":"1531","DOI":"10.1093\/bioinformatics\/btg185","article-title":"Probabilistic scoring measures for profile-profile comparison yield more accurate short seed alignments","volume":"19","author":"Mittelman","year":"2003","journal-title":"Bioinformatics"},{"key":"2023013112054311800_B20","doi-asserted-by":"crossref","first-page":"536","DOI":"10.1016\/S0022-2836(05)80134-2","article-title":"SCOP: a structural classification of proteins database for the investigation of sequences and structures","volume":"247","author":"Murzin","year":"1995","journal-title":"J. Mol. Biol."},{"key":"2023013112054311800_B21","doi-asserted-by":"crossref","first-page":"443","DOI":"10.1016\/0022-2836(70)90057-4","article-title":"A general method applicable to the search for similarities in the amino acid sequence of two proteins","volume":"48","author":"Needleman","year":"1970","journal-title":"J. Mol. Biol."},{"key":"2023013112054311800_B22","doi-asserted-by":"crossref","first-page":"253","DOI":"10.1186\/1471-2105-6-253","article-title":"ProfNet, a method to derive profile-profile alignment scoring functions that improves the alignments of distantly related proteins","volume":"6","author":"Ohlson","year":"2005","journal-title":"BMC Bioinformatics"},{"key":"2023013112054311800_B23","doi-asserted-by":"crossref","first-page":"188","DOI":"10.1002\/prot.20184","article-title":"Profile-profile methods provide improved fold-recognition: a study of different profile-profile alignment methods","volume":"57","author":"Ohlson","year":"2004","journal-title":"Proteins"},{"key":"2023013112054311800_B24","doi-asserted-by":"crossref","first-page":"683","DOI":"10.1093\/nar\/gkg154","article-title":"Finding weak similarities between proteins by sequence profile comparison","volume":"31","author":"Panchenko","year":"2003","journal-title":"Nucleic Acids Res"},{"key":"2023013112054311800_B25","doi-asserted-by":"crossref","first-page":"3836","DOI":"10.1093\/nar\/24.19.3836","article-title":"Searching databases of conserved sequence regions by aligning protein multiple-alignments","volume":"24","author":"Pietrokovski","year":"1996","journal-title":"Nucleic Acids Res"},{"key":"2023013112054311800_B26","doi-asserted-by":"crossref","first-page":"232","DOI":"10.1110\/ps.9.2.232","article-title":"Comparison of sequence profiles. Strategies for structural predictions using sequence information","volume":"9","author":"Rychlewski","year":"2000","journal-title":"Protein Sci"},{"key":"2023013112054311800_B27","doi-asserted-by":"crossref","first-page":"317","DOI":"10.1016\/S0022-2836(02)01371-2","article-title":"COMPASS: a tool for comparison of multiple protein alignments with assessment of statistical significance","volume":"326","author":"Sadreyev","year":"2003","journal-title":"J. Mol. Biol."},{"key":"2023013112054311800_B28","doi-asserted-by":"crossref","first-page":"6","DOI":"10.1002\/(SICI)1097-0134(20000701)40:1<6::AID-PROT30>3.0.CO;2-7","article-title":"Large-scale comparison of protein sequence alignment algorithms with structure alignments","volume":"40","author":"Sauder","year":"2000","journal-title":"Proteins"},{"key":"2023013112054311800_B29","doi-asserted-by":"crossref","first-page":"2994","DOI":"10.1093\/nar\/29.14.2994","article-title":"Improving the accuracy of PSI-BLAST protein database searches with composition-based statistics and other refinements","volume":"29","author":"Schaffer","year":"2001","journal-title":"Nucleic Acids Res"},{"key":"2023013112054311800_B30","doi-asserted-by":"crossref","first-page":"195","DOI":"10.1016\/0022-2836(81)90087-5","article-title":"Identification of common molecular subsequences","volume":"147","author":"Smith","year":"1981","journal-title":"J. Mol. Biol."},{"key":"2023013112054311800_B31","doi-asserted-by":"crossref","first-page":"951","DOI":"10.1093\/bioinformatics\/bti125","article-title":"Protein homology detection by HMM-HMM comparison","volume":"21","author":"Soding","year":"2005","journal-title":"Bioinformatics"},{"key":"2023013112054311800_B32","doi-asserted-by":"crossref","first-page":"1267","DOI":"10.1093\/bioinformatics\/bth493","article-title":"SABmark\u2013a benchmark for sequence alignment that covers the entire known fold space","volume":"21","author":"Van Walle","year":"2005","journal-title":"Bioinformatics"},{"key":"2023013112054311800_B33","first-page":"252","article-title":"Profile-profile alignment: a powerful tool for protein structure prediction","author":"von Ohsen","year":"2003","journal-title":"Pac. Symp. Biocomput."},{"key":"2023013112054311800_B34","doi-asserted-by":"crossref","first-page":"D308","DOI":"10.1093\/nar\/gkl910","article-title":"The SUPERFAMILY database in 2007: families and functions","volume":"35","author":"Wilson","year":"2007","journal-title":"Nucleic Acids Res"},{"key":"2023013112054311800_B35","doi-asserted-by":"crossref","first-page":"1257","DOI":"10.1006\/jmbi.2001.5293","article-title":"Within the twilight zone: a sensitive profile-profile comparison tool based on information theory","volume":"315","author":"Yona","year":"2002","journal-title":"J. Mol. Biol."}],"container-title":["Bioinformatics"],"original-title":[],"language":"en","link":[{"URL":"https:\/\/academic.oup.com\/bioinformatics\/article-pdf\/25\/15\/1862\/48994684\/bioinformatics_25_15_1862.pdf","content-type":"application\/pdf","content-version":"vor","intended-application":"syndication"},{"URL":"https:\/\/academic.oup.com\/bioinformatics\/article-pdf\/25\/15\/1862\/48994684\/bioinformatics_25_15_1862.pdf","content-type":"unspecified","content-version":"vor","intended-application":"similarity-checking"}],"deposited":{"date-parts":[[2023,1,31]],"date-time":"2023-01-31T21:21:55Z","timestamp":1675200115000},"score":1,"resource":{"primary":{"URL":"https:\/\/academic.oup.com\/bioinformatics\/article\/25\/15\/1862\/212274"}},"subtitle":[],"short-title":[],"issued":{"date-parts":[[2009,5,26]]},"references-count":35,"journal-issue":{"issue":"15","published-print":{"date-parts":[[2009,8,1]]}},"URL":"https:\/\/doi.org\/10.1093\/bioinformatics\/btp334","relation":{},"ISSN":["1367-4811","1367-4803"],"issn-type":[{"value":"1367-4811","type":"electronic"},{"value":"1367-4803","type":"print"}],"subject":[],"published-other":{"date-parts":[[2009,8,1]]},"published":{"date-parts":[[2009,5,26]]}}}