{"status":"ok","message-type":"work","message-version":"1.0.0","message":{"indexed":{"date-parts":[[2024,6,9]],"date-time":"2024-06-09T21:33:00Z","timestamp":1717968780487},"reference-count":15,"publisher":"Oxford University Press (OUP)","issue":"16","content-domain":{"domain":[],"crossmark-restriction":false},"short-container-title":[],"published-print":{"date-parts":[[2016,8,15]]},"abstract":"<jats:title>Abstract<\/jats:title>\n               <jats:p>Motivation: Evolutionarily conserved amino acids within proteins characterize functional or structural regions. Conversely, less conserved amino acids within these regions are generally areas of evolutionary divergence. A priori knowledge of biological function and species can help interpret the amino acid differences between sequences. However, this information is often erroneous or unavailable, hampering discovery with supervised algorithms. Also, most of the current unsupervised methods depend on full sequence similarity, which become inaccurate when proteins diverge (e.g. inversions, deletions, insertions). Due to these and other shortcomings, we developed a novel unsupervised algorithm which discovers highly conserved regions and uses two types of information measures: (i) data measures computed from input sequences; and (ii) class measures computed using a priori class groupings in order to reveal subgroups (i.e. classes) or functional characteristics.<\/jats:p>\n               <jats:p>Results: Using known and putative sequences of two proteins belonging to a relatively uncharacterized protein family we were able to group evolutionarily related sequences and identify conserved regions, which are strong homologous association patterns called Aligned Pattern Clusters, within individual proteins and across the members of this family. An initial synthetic demonstration and in silico results reveal that (i) the data measures are unbiased and (ii) our class measures can accurately rank the quality of the evolutionarily relevant groupings. Furthermore, combining our data and class measures allowed us to interpret the results by inferring regions of biological importance within the binding domain of these proteins. Compared to popular supervised methods, our algorithm has a superior runtime and comparable accuracy.<\/jats:p>\n               <jats:p>Availability and implementation: The dataset and results are available at www.pami.uwaterloo.ca\/\u223cealee\/files\/classification2015 .<\/jats:p>\n               <jats:p>Contact: \u00a0akcwong@uwaterloo.ca<\/jats:p>\n               <jats:p>Supplementary information: \u00a0Supplementary data are available at Bioinformatics online.<\/jats:p>","DOI":"10.1093\/bioinformatics\/btw211","type":"journal-article","created":{"date-parts":[[2016,4,24]],"date-time":"2016-04-24T00:17:47Z","timestamp":1461457067000},"page":"2427-2434","source":"Crossref","is-referenced-by-count":8,"title":["Partitioning and correlating subgroup characteristics from Aligned Pattern Clusters"],"prefix":"10.1093","volume":"32","author":[{"given":"En-Shiun Annie","family":"Lee","sequence":"first","affiliation":[{"name":"1 Department of Systems Design Engineering, University of Waterloo, Waterloo, ON, Canada"}],"role":[{"role":"author","vocabulary":"crossref"}]},{"given":"Fiona J.","family":"Whelan","sequence":"additional","affiliation":[{"name":"2 Department of Biochemistry and Biomedical Sciences"}],"role":[{"role":"author","vocabulary":"crossref"}]},{"given":"Dawn M. E.","family":"Bowdish","sequence":"additional","affiliation":[{"name":"3 Department of Pathology and Molecular Medicine, McMaster University, Hamilton, ON, Canada"}],"role":[{"role":"author","vocabulary":"crossref"}]},{"given":"Andrew K. C.","family":"Wong","sequence":"additional","affiliation":[{"name":"1 Department of Systems Design Engineering, University of Waterloo, Waterloo, ON, Canada"}],"role":[{"role":"author","vocabulary":"crossref"}]}],"member":"286","published-online":{"date-parts":[[2016,4,22]]},"reference":[{"key":"2023020112533635100_btw211-B1","doi-asserted-by":"crossref","first-page":"1462","DOI":"10.1006\/bbrc.2002.6378","article-title":"Arginine residues in domain V have a central role for bacteria-binding activity of macrophage scavenger receptor MARCO","volume":"290","author":"Br\u00e4nnstr\u00f6m","year":"2002","journal-title":"Biochem. Biophys. Res. Commun"},{"key":"2023020112533635100_btw211-B2","doi-asserted-by":"crossref","first-page":"8.","DOI":"10.1186\/1687-4153-2012-8","article-title":"Statistical discovery of site inter-dependencies in sub-molecular hierarchical protein structuring","volume":"2012","author":"Durston","year":"2012","journal-title":"EURASIP J. Bioinf. Syst. Biol"},{"key":"2023020112533635100_btw211-B3","author":"Lee","year":"2013"},{"key":"2023020112533635100_btw211-B4","doi-asserted-by":"crossref","first-page":"S2.","DOI":"10.1186\/1471-2105-15-S12-S2","article-title":"Discovering co-occurring patterns and their biological significance in protein families","volume":"15","author":"Lee","year":"2014","journal-title":"BMC Bioinformatics"},{"key":"2023020112533635100_btw211-B5","author":"Leslie","year":"2002"},{"key":"2023020112533635100_btw211-B6","doi-asserted-by":"crossref","first-page":"61","DOI":"10.1146\/annurev.genom.7.080505.115630","article-title":"Predicting the effects of amino acid substitutions on protein function","volume":"7","author":"Ng","year":"2006","journal-title":"Annu. Rev. Genomics Hum. Genet"},{"key":"2023020112533635100_btw211-B7","doi-asserted-by":"crossref","DOI":"10.1007\/3-540-45065-3","volume-title":"Machine Learning and Data Mining in Pattern Recognition","author":"Perner","year":"2003"},{"key":"2023020112533635100_btw211-B8","doi-asserted-by":"crossref","first-page":"148","DOI":"10.1016\/S0006-3495(96)79210-X","article-title":"The Shannon information entropy of protein sequences","volume":"71","author":"Strait","year":"1996","journal-title":"Biophys. J"},{"key":"2023020112533635100_btw211-B9","author":"Wang","year":"1978"},{"key":"2023020112533635100_btw211-B10","doi-asserted-by":"crossref","first-page":"227.","DOI":"10.1186\/1471-2148-12-227","article-title":"The evolution of the class A scavenger receptors","volume":"12","author":"Whelan","year":"2012","journal-title":"BMC Evol. Biol"},{"key":"2023020112533635100_btw211-B11","author":"Wong"},{"key":"2023020112533635100_btw211-B12","doi-asserted-by":"crossref","first-page":"1408","DOI":"10.1109\/TKDE.2011.100","article-title":"Discovery of delta closed patterns and noninduced patterns from sequences","volume":"24","author":"Wong","year":"2012","journal-title":"IEEE Trans. Knowl. Data Eng"},{"key":"2023020112533635100_btw211-B13","doi-asserted-by":"crossref","first-page":"911","DOI":"10.1109\/TKDE.2008.38","article-title":"Simultaneous pattern and data clustering for pattern cluster analysis","volume":"20","author":"Wong","year":"2008","journal-title":"IEEE Trans. Knowl. Data Eng"},{"key":"2023020112533635100_btw211-B14","doi-asserted-by":"crossref","first-page":"1","DOI":"10.3389\/fimmu.2015.00342","article-title":"The evolution of the scavenger receptor cysteine-rich domain of the class A scavenger receptors","volume":"6","author":"Yap","year":"2015","journal-title":"Front. Immunol"},{"key":"2023020112533635100_btw211-B15","author":"Zhuang"}],"container-title":["Bioinformatics"],"original-title":[],"language":"en","link":[{"URL":"https:\/\/academic.oup.com\/bioinformatics\/article-pdf\/32\/16\/2427\/49020488\/bioinformatics_32_16_2427.pdf","content-type":"application\/pdf","content-version":"vor","intended-application":"syndication"},{"URL":"https:\/\/academic.oup.com\/bioinformatics\/article-pdf\/32\/16\/2427\/49020488\/bioinformatics_32_16_2427.pdf","content-type":"unspecified","content-version":"vor","intended-application":"similarity-checking"}],"deposited":{"date-parts":[[2023,2,1]],"date-time":"2023-02-01T22:52:54Z","timestamp":1675291974000},"score":1,"resource":{"primary":{"URL":"https:\/\/academic.oup.com\/bioinformatics\/article\/32\/16\/2427\/2288456"}},"subtitle":[],"short-title":[],"issued":{"date-parts":[[2016,4,22]]},"references-count":15,"journal-issue":{"issue":"16","published-print":{"date-parts":[[2016,8,15]]}},"URL":"https:\/\/doi.org\/10.1093\/bioinformatics\/btw211","relation":{},"ISSN":["1367-4811","1367-4803"],"issn-type":[{"value":"1367-4811","type":"electronic"},{"value":"1367-4803","type":"print"}],"subject":[],"published-other":{"date-parts":[[2016,8,15]]},"published":{"date-parts":[[2016,4,22]]}}}