{"status":"ok","message-type":"work","message-version":"1.0.0","message":{"indexed":{"date-parts":[[2023,11,6]],"date-time":"2023-11-06T08:41:55Z","timestamp":1699260115958},"reference-count":23,"publisher":"World Scientific Pub Co Pte Lt","issue":"05","content-domain":{"domain":[],"crossmark-restriction":false},"short-container-title":["J. Bioinform. Comput. Biol."],"published-print":{"date-parts":[[2011,10]]},"abstract":"<jats:p> Characterizing enzyme sequences and identifying their active sites is a very important task. The current experimental methods are too expensive and labor intensive to handle the rapidly accumulating protein sequences and structure data. Thus accurate, high-throughput in silico methods for identifying catalytic residues and enzyme function prediction are much needed. In this paper, we propose a novel sequence-based catalytic domain prediction method using a sequence clustering and an information-theoretic approaches. The first step is to perform the sequence clustering analysis of enzyme sequences from the same functional category (those with the same EC label). The clustering analysis is used to handle the problem of widely varying sequence similarity levels in enzyme sequences. The clustering analysis constructs a sequence graph where nodes are enzyme sequences and edges are a pair of sequences with a certain degree of sequence similarity, and uses graph properties, such as biconnected components and articulation points, to generate sequence segments common to the enzyme sequences. Then amino acid subsequences in the common shared regions are aligned and then an information theoretic approach called aggregated column related scoring scheme is performed to highlight potential active sites in enzyme sequences. The aggregated information content scoring scheme is shown to be effective to highlight residues of active sites effectively. The proposed method of combining the clustering and the aggregated information content scoring methods was successful in highlighting known catalytic sites in enzymes of Escherichia coli K12 in terms of the Catalytic Site Atlas database. Our method is shown to be not only accurate in predicting potential active sites in the enzyme sequences but also computationally efficient since the clustering approach utilizes two graph properties that can be computed in linear to the number of edges in the sequence graph and computation of mutual information does not require much time. We believe that the proposed method can be useful for identifying active sites of enzyme sequences from many genome projects. <\/jats:p>","DOI":"10.1142\/s0219720011005677","type":"journal-article","created":{"date-parts":[[2011,8,11]],"date-time":"2011-08-11T08:32:57Z","timestamp":1313051577000},"page":"597-611","source":"Crossref","is-referenced-by-count":7,"title":["SEQUENCE-BASED ENZYME CATALYTIC DOMAIN PREDICTION USING CLUSTERING AND AGGREGATED MUTUAL INFORMATION CONTENT"],"prefix":"10.1142","volume":"09","author":[{"given":"KWANGMIN","family":"CHOI","sequence":"first","affiliation":[{"name":"Division of Experimental Hematology and Cancer Biology, Cincinnati Children's Hospital Medical Center, 3333 Burnet Avenue, Cincinnati, Ohio 45229, USA"}]},{"given":"SUN","family":"KIM","sequence":"additional","affiliation":[{"name":"School of Computer Science and Engineering and Interdisciplinary Program in Bioinformatics, Seoul National University, Gwanak-1, Gwanak-ro, Gwanak-gu, Seoul 151-744, Republic of Korea"}]}],"member":"219","published-online":{"date-parts":[[2011,11,21]]},"reference":[{"key":"rf1","doi-asserted-by":"publisher","DOI":"10.1093\/bioinformatics\/btm270"},{"key":"rf2","doi-asserted-by":"publisher","DOI":"10.1002\/prot.20321"},{"key":"rf3","doi-asserted-by":"publisher","DOI":"10.1093\/bioinformatics\/btm626"},{"key":"rf4","doi-asserted-by":"publisher","DOI":"10.1089\/cmb.2007.0042"},{"key":"rf5","doi-asserted-by":"publisher","DOI":"10.1016\/S0022-2836(03)00515-1"},{"key":"rf6","doi-asserted-by":"publisher","DOI":"10.1186\/1471-2105-8-153"},{"key":"rf7","doi-asserted-by":"publisher","DOI":"10.1110\/ps.062523907"},{"key":"rf8","doi-asserted-by":"publisher","DOI":"10.1504\/IJDMB.2006.010855"},{"key":"rf9","doi-asserted-by":"publisher","DOI":"10.1093\/bioinformatics\/btl398"},{"key":"rf10","volume-title":"Enzyme Nomenclature 1992: Recommendations of the Nomenclature Committee of the International Union of Biochemistry and Molecular Biology on the Nomenclature and Classification of Enzymes","author":"Webb E. C.","year":"1992"},{"key":"rf11","doi-asserted-by":"publisher","DOI":"10.1016\/S1367-5931(03)00028-0"},{"key":"rf12","doi-asserted-by":"publisher","DOI":"10.1021\/bi990140p"},{"key":"rf13","doi-asserted-by":"publisher","DOI":"10.1128\/JB.183.8.2405-2410.2001"},{"key":"rf14","doi-asserted-by":"publisher","DOI":"10.1101\/gr.849004"},{"key":"rf15","doi-asserted-by":"publisher","DOI":"10.1016\/S0022-2836(05)80360-2"},{"key":"rf16","doi-asserted-by":"publisher","DOI":"10.1186\/1471-2105-5-113"},{"key":"rf17","doi-asserted-by":"publisher","DOI":"10.1093\/nar\/30.1.42"},{"key":"rf18","doi-asserted-by":"publisher","DOI":"10.1093\/nar\/gkh063"},{"key":"rf19","doi-asserted-by":"publisher","DOI":"10.1093\/nar\/18.20.6097"},{"key":"rf20","doi-asserted-by":"publisher","DOI":"10.1016\/j.is.2003.10.006"},{"key":"rf21","doi-asserted-by":"publisher","DOI":"10.1093\/nar\/gkp985"},{"key":"rf22","doi-asserted-by":"publisher","DOI":"10.1073\/pnas.85.8.2444"},{"key":"rf23","doi-asserted-by":"publisher","DOI":"10.1016\/S1367-5931(03)00027-9"}],"container-title":["Journal of Bioinformatics and Computational Biology"],"original-title":[],"language":"en","link":[{"URL":"https:\/\/www.worldscientific.com\/doi\/pdf\/10.1142\/S0219720011005677","content-type":"unspecified","content-version":"vor","intended-application":"similarity-checking"}],"deposited":{"date-parts":[[2019,8,7]],"date-time":"2019-08-07T02:39:16Z","timestamp":1565145556000},"score":1,"resource":{"primary":{"URL":"https:\/\/www.worldscientific.com\/doi\/abs\/10.1142\/S0219720011005677"}},"subtitle":[],"short-title":[],"issued":{"date-parts":[[2011,10]]},"references-count":23,"journal-issue":{"issue":"05","published-online":{"date-parts":[[2011,11,21]]},"published-print":{"date-parts":[[2011,10]]}},"alternative-id":["10.1142\/S0219720011005677"],"URL":"https:\/\/doi.org\/10.1142\/s0219720011005677","relation":{},"ISSN":["0219-7200","1757-6334"],"issn-type":[{"value":"0219-7200","type":"print"},{"value":"1757-6334","type":"electronic"}],"subject":[],"published":{"date-parts":[[2011,10]]}}}