{"status":"ok","message-type":"work","message-version":"1.0.0","message":{"indexed":{"date-parts":[[2026,1,24]],"date-time":"2026-01-24T07:18:35Z","timestamp":1769239115194,"version":"3.49.0"},"reference-count":28,"publisher":"Springer Science and Business Media LLC","issue":"S4","content-domain":{"domain":["link.springer.com"],"crossmark-restriction":false},"short-container-title":["BMC Bioinformatics"],"published-print":{"date-parts":[[2007,5]]},"abstract":"<jats:title>Abstract<\/jats:title>\n          <jats:sec>\n            <jats:title>Background<\/jats:title>\n            <jats:p>Structural genomics initiatives are producing increasing numbers of three-dimensional (3D) structures for which there is little functional information. Structure-based annotation of molecular function is therefore becoming critical. We previously presented FEATURE, a method for describing microenvironments around functional sites in proteins. However, FEATURE uses supervised machine learning and so is limited to building models for sites of known importance and location. We hypothesized that there are a large number of sites in proteins that are associated with function that have not yet been recognized. Toward that end, we have developed a method for clustering protein microenvironments in order to evaluate the potential for discovering novel sites that have not been previously identified.<\/jats:p>\n          <\/jats:sec>\n          <jats:sec>\n            <jats:title>Results<\/jats:title>\n            <jats:p>We have prototyped a computational method for rapid clustering of millions of microenvironments in order to discover residues whose surrounding environments are similar and which may therefore share a functional or structural role. We clustered nearly 2,000,000 environments from 9,600 protein chains and defined 4,550 clusters. As a preliminary validation, we asked whether known 3D environments associated with PROSITE motifs were \"rediscovered\". 
We found examples of clusters highly enriched for residues that share PROSITE sequence motifs.<\/jats:p>\n          <\/jats:sec>\n          <jats:sec>\n            <jats:title>Conclusion<\/jats:title>\n            <jats:p>Our results demonstrate that we can cluster protein environments successfully using a simplified representation and K-means clustering algorithm. The rediscovery of known 3D motifs allows us to calibrate the size and intercluster distances that characterize useful clusters. This information will then allow us to find new clusters with similar characteristics that represent novel structural or functional sites.<\/jats:p>\n          <\/jats:sec>","DOI":"10.1186\/1471-2105-8-s4-s10","type":"journal-article","created":{"date-parts":[[2007,5,23]],"date-time":"2007-05-23T16:35:21Z","timestamp":1179938121000},"update-policy":"https:\/\/doi.org\/10.1007\/springer_crossmark_policy","source":"Crossref","is-referenced-by-count":24,"title":["Clustering protein environments for function prediction: finding PROSITE motifs in 3D"],"prefix":"10.1186","volume":"8","author":[{"given":"Sungroh","family":"Yoon","sequence":"first","affiliation":[],"role":[{"role":"author","vocabulary":"crossref"}]},{"given":"Jessica C","family":"Ebert","sequence":"additional","affiliation":[],"role":[{"role":"author","vocabulary":"crossref"}]},{"given":"Eui-Young","family":"Chung","sequence":"additional","affiliation":[],"role":[{"role":"author","vocabulary":"crossref"}]},{"given":"Giovanni","family":"De Micheli","sequence":"additional","affiliation":[],"role":[{"role":"author","vocabulary":"crossref"}]},{"given":"Russ B","family":"Altman","sequence":"additional","affiliation":[],"role":[{"role":"author","vocabulary":"crossref"}]}],"member":"297","published-online":{"date-parts":[[2007,5,22]]},"reference":[{"issue":"3","key":"1913_CR1","doi-asserted-by":"publisher","first-page":"225","DOI":"10.1093\/bib\/bbl004","volume":"7","author":"I 
Friedberg","year":"2006","unstructured":"Friedberg I: Automated protein function prediction \u2013 the genomic challenge. Brief Bioinform 2006, 7(3):225\u2013242. 10.1093\/bib\/bbl004","journal-title":"Brief Bioinform"},{"issue":"41","key":"1913_CR2","doi-asserted-by":"publisher","first-page":"14754","DOI":"10.1073\/pnas.0404569101","volume":"101","author":"F Pazos","year":"2004","unstructured":"Pazos F, Sternberg MJ: Automated prediction of protein function and detection of functional sites from structure. Proc Natl Acad Sci USA 2004, 101(41):14754\u201314759. 10.1073\/pnas.0404569101","journal-title":"Proc Natl Acad Sci USA"},{"issue":"13","key":"1913_CR3","doi-asserted-by":"publisher","first-page":"1644","DOI":"10.1093\/bioinformatics\/btg226","volume":"19","author":"JA Barker","year":"2003","unstructured":"Barker JA, Thornton JM: An algorithm for constraint-based structural template matching: application to 3D templates with statistical analysis. Bioinformatics 2003, 19(13):1644\u20131649. 10.1093\/bioinformatics\/btg226","journal-title":"Bioinformatics"},{"issue":"2\u20133","key":"1913_CR4","doi-asserted-by":"publisher","first-page":"232","DOI":"10.1006\/jsbi.2001.4391","volume":"134","author":"JA Di Gennaro","year":"2001","unstructured":"Di Gennaro JA, Siew N, Hoffman BT, Zhang L, Skolnick J, Neilson LI, Fetrow JS: Enhanced functional annotation of protein sequences via the use of structural descriptors. J Struct Biol 2001, 134(2\u20133):232\u2013245. 10.1006\/jsbi.2001.4391","journal-title":"J Struct Biol"},{"issue":"2\u20133","key":"1913_CR5","doi-asserted-by":"publisher","first-page":"159","DOI":"10.1023\/A:1026115125950","volume":"4","author":"O Lichtarge","year":"2003","unstructured":"Lichtarge O, Yao H, Kristensen DM, Madabushi S, Mihalek I: Accurate and scalable identification of functional sites by evolutionary tracing. J Struct Funct Genomics 2003, 4(2\u20133):159\u2013166. 
10.1023\/A:1026115125950","journal-title":"J Struct Funct Genomics"},{"key":"1913_CR6","doi-asserted-by":"crossref","unstructured":"Landau M, Mayrose I, Rosenberg Y, Glaser F, Martz E, Pupko T, Ben-Tal N: ConSurf 2005: the projection of evolutionary conservation scores of residues on protein structures. Nucleic acids research 2005, (33 Web Server):W299\u2013302. 10.1093\/nar\/gki370","DOI":"10.1093\/nar\/gki370"},{"issue":"22","key":"1913_CR7","doi-asserted-by":"publisher","first-page":"12473","DOI":"10.1073\/pnas.211436698","volume":"98","author":"MJ Ondrechen","year":"2001","unstructured":"Ondrechen MJ, Clifton JG, Ringe D: THEMATICS: a simple computational predictor of enzyme function from structure. Proc Natl Acad Sci USA 2001, 98(22):12473\u201312478. 10.1073\/pnas.211436698","journal-title":"Proc Natl Acad Sci USA"},{"issue":"Suppl 4","key":"1913_CR8","doi-asserted-by":"publisher","first-page":"S5","DOI":"10.1186\/1471-2105-6-S4-S5","volume":"6","author":"G Ausiello","year":"2005","unstructured":"Ausiello G, Via A, Helmer-Citterich M: Query3d: a new method for high-throughput analysis of functional residues in protein structures. BMC Bioinformatics 2005, 6(Suppl 4):S5. 10.1186\/1471-2105-6-S4-S5","journal-title":"BMC Bioinformatics"},{"issue":"2","key":"1913_CR9","doi-asserted-by":"publisher","first-page":"137","DOI":"10.1002\/prot.10339","volume":"52","author":"M Jambon","year":"2003","unstructured":"Jambon M, Imberty A, Deleage G, Geourjon C: A new bioinformatic approach to detect common 3D sites in protein structures. Proteins 2003, 52(2):137\u2013145. 10.1002\/prot.10339","journal-title":"Proteins"},{"key":"1913_CR10","first-page":"12","volume":"3","author":"SC Bagley","year":"1995","unstructured":"Bagley SC, Wei L, Cheng C, Altman RB: Characterizing oriented protein structural sites using biochemical properties. 
Proc Int Conf Intell Syst Mol Biol 1995, 3: 12\u201320.","journal-title":"Proc Int Conf Intell Syst Mol Biol"},{"issue":"4","key":"1913_CR11","doi-asserted-by":"publisher","first-page":"622","DOI":"10.1002\/pro.5560040404","volume":"4","author":"SC Bagley","year":"1995","unstructured":"Bagley SC, Altman RB: Characterizing the microenvironment surrounding protein sites. Protein Sci 1995, 4(4):622\u2013635.","journal-title":"Protein Sci"},{"issue":"1","key":"1913_CR12","doi-asserted-by":"publisher","first-page":"119","DOI":"10.1142\/S0219720003000150","volume":"1","author":"L Wei","year":"2003","unstructured":"Wei L, Altman RB: Recognizing complex, asymmetric functional sites in protein structures using a Bayesian scoring function. J Bioinform Comput Biol 2003, 1(1):119\u2013138. 10.1142\/S0219720003000150","journal-title":"J Bioinform Comput Biol"},{"key":"1913_CR13","first-page":"497","volume-title":"Pac Symp Biocomput","author":"L Wei","year":"1998","unstructured":"Wei L, Altman RB: Recognizing protein binding sites using statistical descriptions of their 3D environments. Pac Symp Biocomput 1998, 497\u2013508."},{"issue":"5","key":"1913_CR14","doi-asserted-by":"publisher","first-page":"371","DOI":"10.1016\/S1359-0278(96)00052-1","volume":"1","author":"SC Bagley","year":"1996","unstructured":"Bagley SC, Altman RB: Conserved features in the active site of nonhomologous serine proteases. Fold Des 1996, 1(5):371\u2013379. 10.1016\/S1359-0278(96)00052-1","journal-title":"Fold Des"},{"issue":"13","key":"1913_CR15","doi-asserted-by":"publisher","first-page":"3324","DOI":"10.1093\/nar\/gkg553","volume":"31","author":"MP Liang","year":"2003","unstructured":"Liang MP, Banatao DR, Klein TE, Brutlag DL, Altman RB: WebFEATURE: An interactive web tool for identifying and visualizing functional sites on macromolecular structures. Nucleic acids research 2003, 31(13):3324\u20133327. 
10.1093\/nar\/gkg553","journal-title":"Nucleic acids research"},{"issue":"Pt 6 No 1","key":"1913_CR16","doi-asserted-by":"publisher","first-page":"899","DOI":"10.1107\/S0907444902003451","volume":"58","author":"HM Berman","year":"2002","unstructured":"Berman HM, Battistuz T, Bhat TN, Bluhm WF, Bourne PE, Burkhardt K, Feng Z, Gilliland GL, Iype L, Jain S, et al.: The Protein Data Bank. Acta Crystallogr D Biol Crystallogr 2002, 58(Pt 6 No 1):899\u2013907. 10.1107\/S0907444902003451","journal-title":"Acta Crystallogr D Biol Crystallogr"},{"issue":"3","key":"1913_CR17","doi-asserted-by":"publisher","first-page":"611","DOI":"10.1016\/j.jmb.2004.02.047","volume":"338","author":"AV Tendulkar","year":"2004","unstructured":"Tendulkar AV, Joshi AA, Sohoni MA, Wangikar PP: Clustering of protein structural fragments reveals modular building block approach of nature. Journal of molecular biology 2004, 338(3):611\u2013629. 10.1016\/j.jmb.2004.02.047","journal-title":"Journal of molecular biology"},{"key":"1913_CR18","doi-asserted-by":"crossref","unstructured":"Espadaler J, Fernandez-Fuentes N, Hermoso A, Querol E, Aviles FX, Sternberg MJ, Oliva B: ArchDB: automated protein loop classification as a tool for structural genomics. Nucleic acids research 2004, (32 Database):D185\u2013188. 10.1093\/nar\/gkh002","DOI":"10.1093\/nar\/gkh002"},{"issue":"3","key":"1913_CR19","doi-asserted-by":"publisher","first-page":"539","DOI":"10.1002\/prot.20136","volume":"56","author":"N Fernandez-Fuentes","year":"2004","unstructured":"Fernandez-Fuentes N, Hermoso A, Espadaler J, Querol E, Aviles FX, Oliva B: Classification of common functional loops of kinase super-families. Proteins 2004, 56(3):539\u2013555. 
10.1002\/prot.20136","journal-title":"Proteins"},{"issue":"4","key":"1913_CR20","doi-asserted-by":"publisher","first-page":"741","DOI":"10.1002\/prot.20661","volume":"61","author":"SD Mooney","year":"2005","unstructured":"Mooney SD, Liang MH, DeConde R, Altman RB: Structural characterization of proteins using residue environments. Proteins 2005, 61(4):741\u2013747. 10.1002\/prot.20661","journal-title":"Proteins"},{"key":"1913_CR21","doi-asserted-by":"publisher","first-page":"4","DOI":"10.1186\/1472-6807-6-4","volume":"6","author":"B Peters","year":"2006","unstructured":"Peters B, Moad C, Youn E, Buffington K, Heiland R, Mooney S: Identification of similar regions of protein structures using integrated sequence and structure analysis tools. BMC structural biology 2006, 6: 4. 10.1186\/1472-6807-6-4","journal-title":"BMC structural biology"},{"key":"1913_CR22","doi-asserted-by":"crossref","unstructured":"Hulo N, Bairoch A, Bulliard V, Cerutti L, De Castro E, Langendijk-Genevaux PS, Pagni M, Sigrist CJ: The PROSITE database. Nucleic acids research 2006, (34 Database):D227\u2013230. 10.1093\/nar\/gkj063","DOI":"10.1093\/nar\/gkj063"},{"key":"1913_CR23","first-page":"204","volume-title":"Pac Symp Biocomput","author":"MP Liang","year":"2003","unstructured":"Liang MP, Brutlag DL, Altman RB: Automated construction of structural motifs for predicting functional sites on protein structures. Pac Symp Biocomput 2003, 204\u2013215."},{"key":"1913_CR24","unstructured":"RCSB Protein Data Bank[ftp:\/\/ftp.rcsb.org\/pub\/pdb\/derived_data\/NR\/]"},{"key":"1913_CR25","doi-asserted-by":"publisher","DOI":"10.1002\/9780470316801","volume-title":"Finding groups in data: an introduction to cluster analysis","author":"L Kaufman","year":"1990","unstructured":"Kaufman L, Rousseeuw PJ: Finding groups in data: an introduction to cluster analysis. 
New York: Wiley; 1990."},{"key":"1913_CR26","volume-title":"The elements of statistical learning","author":"T Hastie","year":"2003","unstructured":"Hastie T, Tibshirani R, Friedman JH: The elements of statistical learning. Springer; 2003."},{"key":"1913_CR27","unstructured":"FEATURE Microenvironment Clusters[http:\/\/helix-web.stanford.edu\/pubs\/syoon-cluster\/]"},{"key":"1913_CR28","unstructured":"The PyMol Molecular Graphics System[http:\/\/www.pymol.org]"}],"container-title":["BMC Bioinformatics"],"original-title":[],"language":"en","link":[{"URL":"https:\/\/link.springer.com\/content\/pdf\/10.1186\/1471-2105-8-S4-S10.pdf","content-type":"application\/pdf","content-version":"vor","intended-application":"similarity-checking"}],"deposited":{"date-parts":[[2021,8,31]],"date-time":"2021-08-31T21:28:27Z","timestamp":1630445307000},"score":1,"resource":{"primary":{"URL":"https:\/\/bmcbioinformatics.biomedcentral.com\/articles\/10.1186\/1471-2105-8-S4-S10"}},"subtitle":[],"short-title":[],"issued":{"date-parts":[[2007,5]]},"references-count":28,"journal-issue":{"issue":"S4","published-print":{"date-parts":[[2007,5]]}},"alternative-id":["1913"],"URL":"https:\/\/doi.org\/10.1186\/1471-2105-8-s4-s10","relation":{},"ISSN":["1471-2105"],"issn-type":[{"value":"1471-2105","type":"electronic"}],"subject":[],"published":{"date-parts":[[2007,5]]},"assertion":[{"value":"22 May 2007","order":1,"name":"first_online","label":"First Online","group":{"name":"ArticleHistory","label":"Article History"}}],"article-number":"S10"}}