{"status":"ok","message-type":"work","message-version":"1.0.0","message":{"indexed":{"date-parts":[[2026,3,31]],"date-time":"2026-03-31T03:38:28Z","timestamp":1774928308164,"version":"3.50.1"},"reference-count":34,"publisher":"Oxford University Press (OUP)","issue":"21","content-domain":{"domain":[],"crossmark-restriction":false},"short-container-title":[],"published-print":{"date-parts":[[2012,11,1]]},"abstract":"<jats:title>Abstract<\/jats:title><jats:p>Motivation: Automating the assignment of existing domain and protein family classifications to new sets of sequences is an important task. Current methods often miss assignments because remote relationships fail to achieve statistical significance. Some assignments are not as long as the actual domain definitions because local alignment methods often cut alignments short. Long insertions in query sequences often erroneously result in two copies of the domain assigned to the query. Divergent repeat sequences in proteins are often missed.<\/jats:p><jats:p>Results: We have developed a multilevel procedure to produce nearly complete assignments of protein families of an existing classification system to a large set of sequences. We apply this to the task of assigning Pfam domains to sequences and structures in the Protein Data Bank (PDB). We found that HHsearch alignments frequently scored more remotely related Pfams in Pfam clans higher than closely related Pfams, thus, leading to erroneous assignment at the Pfam family level. A greedy algorithm allowing for partial overlaps was, thus, applied first to sequence\/HMM alignments, then HMM\u2013HMM alignments and then structure alignments, taking care to join partial alignments split by large insertions into single-domain assignments. Additional assignment of repeat Pfams with weaker E-values was allowed after stronger assignments of the repeat HMM. 
Our database of assignments, presented in a database called PDBfam, contains Pfams for 99.4% of chains &gt;50 residues.<\/jats:p><jats:p>Availability: The Pfam assignment data in PDBfam are available at http:\/\/dunbrack2.fccc.edu\/ProtCid\/PDBfam, which can be searched by PDB codes and Pfam identifiers. They will be updated regularly.<\/jats:p><jats:p>Contact: Roland.Dunbrack@fccc.edu<\/jats:p>","DOI":"10.1093\/bioinformatics\/bts533","type":"journal-article","created":{"date-parts":[[2012,9,1]],"date-time":"2012-09-01T20:37:41Z","timestamp":1346531861000},"page":"2763-2772","source":"Crossref","is-referenced-by-count":57,"title":["Assignment of protein sequences to existing domain and family classification systems: Pfam and the PDB"],"prefix":"10.1093","volume":"28","author":[{"given":"Qifang","family":"Xu","sequence":"first","affiliation":[{"name":"Institute for Cancer Research, Fox Chase Cancer Center, 333 Cottman Avenue, Philadelphia, PA 19111, USA"}]},{"suffix":"Jr","given":"Roland L.","family":"Dunbrack","sequence":"additional","affiliation":[{"name":"Institute for Cancer Research, Fox Chase Cancer Center, 333 Cottman Avenue, Philadelphia, PA 19111, USA"}]}],"member":"286","published-online":{"date-parts":[[2012,8,31]]},"reference":[{"key":"2023012513145830500_bts533-B1","doi-asserted-by":"crossref","first-page":"3389","DOI":"10.1093\/nar\/25.17.3389","article-title":"Gapped BLAST and PSI-BLAST: a new generation of database programs","volume":"25","author":"Altschul","year":"1997","journal-title":"Nucleic Acids Res."},{"key":"2023012513145830500_bts533-B2","doi-asserted-by":"crossref","first-page":"1761","DOI":"10.1371\/journal.pcbi.0030178","article-title":"Characterization of protein hubs by inferring interacting motifs from protein interactions","volume":"3","author":"Aragues","year":"2007","journal-title":"PLoS Comput. 
Biol."},{"key":"2023012513145830500_bts533-B3","doi-asserted-by":"crossref","first-page":"D154","DOI":"10.1093\/nar\/gki070","article-title":"The universal protein resource (UniProt)","volume":"33","author":"Bairoch","year":"2005","journal-title":"Nucleic Acids Res."},{"key":"2023012513145830500_bts533-B4","doi-asserted-by":"crossref","first-page":"235","DOI":"10.1093\/nar\/28.1.235","article-title":"The protein data bank","volume":"28","author":"Berman","year":"2000","journal-title":"Nucleic Acids Res."},{"key":"2023012513145830500_bts533-B5","doi-asserted-by":"crossref","first-page":"869","DOI":"10.1016\/j.str.2009.03.015","article-title":"PSI-2: structural genomics to cover protein domain family space","volume":"17","author":"Dessailly","year":"2009","journal-title":"Structure"},{"key":"2023012513145830500_bts533-B6","doi-asserted-by":"crossref","first-page":"410","DOI":"10.1093\/bioinformatics\/bti011","article-title":"iPfam: visualization of protein-protein interactions in PDB at domain and amino acid resolutions","volume":"21","author":"Finn","year":"2005","journal-title":"Bioinformatics"},{"key":"2023012513145830500_bts533-B7","doi-asserted-by":"crossref","first-page":"D247","DOI":"10.1093\/nar\/gkj149","article-title":"Pfam: clans, web tools and services","volume":"34","author":"Finn","year":"2006","journal-title":"Nucleic Acids Res."},{"key":"2023012513145830500_bts533-B8","doi-asserted-by":"crossref","first-page":"211","DOI":"10.1093\/nar\/gkp985","article-title":"The Pfam protein families database","volume":"38","author":"Finn","year":"2010","journal-title":"Nucleic Acids Res."},{"key":"2023012513145830500_bts533-B9","doi-asserted-by":"crossref","first-page":"D211","DOI":"10.1093\/nar\/gkn785","article-title":"InterPro: the integrative protein signature database","volume":"37","author":"Hunter","year":"2009","journal-title":"Nucleic Acids 
Res."},{"key":"2023012513145830500_bts533-B10","doi-asserted-by":"crossref","first-page":"W284","DOI":"10.1093\/nar\/gki418","article-title":"FFAS03: a server for profile\u2013profile sequence alignments","volume":"33","author":"Jaroszewski","year":"2005","journal-title":"Nucleic Acids Res."},{"key":"2023012513145830500_bts533-B11","doi-asserted-by":"crossref","first-page":"2287","DOI":"10.1093\/bioinformatics\/bti374","article-title":"Quasi-consensus-based comparison of profile hidden Markov models for protein sequences","volume":"21","author":"Kahsay","year":"2005","journal-title":"Bioinformatics"},{"key":"2023012513145830500_bts533-B12","doi-asserted-by":"crossref","first-page":"774","DOI":"10.1016\/j.jmb.2007.05.022","article-title":"Inference of macromolecular assemblies from crystalline state","volume":"372","author":"Krissinel","year":"2007","journal-title":"J. Mol. Biol."},{"key":"2023012513145830500_bts533-B13","doi-asserted-by":"crossref","first-page":"282","DOI":"10.1093\/bioinformatics\/17.3.282","article-title":"Clustering of highly homologous sequences to reduce the size of large protein databases","volume":"17","author":"Li","year":"2000","journal-title":"Bioinformatics"},{"key":"2023012513145830500_bts533-B14","doi-asserted-by":"crossref","first-page":"536","DOI":"10.1016\/S0022-2836(05)80134-2","article-title":"SCOP: a structural classification of proteins database for the investigation of sequences and structures","volume":"247","author":"Murzin","year":"1995","journal-title":"J. Mol. 
Biol."},{"key":"2023012513145830500_bts533-B15","doi-asserted-by":"crossref","first-page":"1093","DOI":"10.1016\/S0969-2126(97)00260-8","article-title":"CATH\u2014a hierarchic classification of protein domain structures","volume":"5","author":"Orengo","year":"1997","journal-title":"Structure"},{"key":"2023012513145830500_bts533-B16","doi-asserted-by":"crossref","first-page":"289","DOI":"10.1093\/nar\/30.1.289","article-title":"SUPFAM\u2014a database of potential protein superfamily relationships derived by comparing sequence-based and structure-based families: implications for structural genomics and function annotation in genomes","volume":"30","author":"Pandit","year":"2002","journal-title":"Nucleic Acids Res."},{"key":"2023012513145830500_bts533-B17","doi-asserted-by":"crossref","first-page":"1987","DOI":"10.1093\/bioinformatics\/btn384","article-title":"Powerful fusion: PSI-BLAST and consensus sequences","volume":"24","author":"Przybylski","year":"2008","journal-title":"Bioinformatics"},{"key":"2023012513145830500_bts533-B18","doi-asserted-by":"crossref","first-page":"D290","DOI":"10.1093\/nar\/gkr1065","article-title":"The Pfam protein families database","volume":"40","author":"Punta","year":"2012","journal-title":"Nucleic Acids Res."},{"key":"2023012513145830500_bts533-B19","doi-asserted-by":"crossref","first-page":"173","DOI":"10.1038\/nmeth.1818","article-title":"HHblits: lightning-fast iterative protein sequence searching by HMM-HMM alignment","volume":"9","author":"Remmert","year":"2012","journal-title":"Nat. Methods"},{"key":"2023012513145830500_bts533-B20","doi-asserted-by":"crossref","first-page":"5857","DOI":"10.1073\/pnas.95.11.5857","article-title":"SMART, a simple modular architecture research tool: identification of signaling domains","volume":"95","author":"Schultz","year":"1998","journal-title":"Proc. Natl. Acad. Sci. 
USA"},{"key":"2023012513145830500_bts533-B21","doi-asserted-by":"crossref","first-page":"246","DOI":"10.1093\/bib\/3.3.246","article-title":"ProDom: automated clustering of homologous domains","volume":"3","author":"Servant","year":"2002","journal-title":"Brief. Bioinform."},{"key":"2023012513145830500_bts533-B22","doi-asserted-by":"crossref","first-page":"683","DOI":"10.1111\/j.2517-6161.1991.tb01857.x","article-title":"A reliable data-based bandwidth selection method for kernel density estimation","volume":"53","author":"Sheather","year":"1991","journal-title":"J. R. Stat. Soc. Series B Stat. Methodol."},{"key":"2023012513145830500_bts533-B23","doi-asserted-by":"crossref","first-page":"951","DOI":"10.1093\/bioinformatics\/bti125","article-title":"Protein homology detection by HMM-HMM comparison","volume":"21","author":"S\u00f6ding","year":"2005","journal-title":"Bioinformatics"},{"key":"2023012513145830500_bts533-B24","doi-asserted-by":"crossref","first-page":"405","DOI":"10.1002\/(SICI)1097-0134(199707)28:3<405::AID-PROT10>3.0.CO;2-L","article-title":"Pfam: a comprehensive database of protein domain families based on seed alignments","volume":"28","author":"Sonnhammer","year":"1997","journal-title":"Proteins"},{"key":"2023012513145830500_bts533-B25","doi-asserted-by":"crossref","first-page":"D718","DOI":"10.1093\/nar\/gkq962","article-title":"3did: identification and classification of domain-based interactions of known three-dimensional structure","volume":"39","author":"Stein","year":"2011","journal-title":"Nucleic Acids Res."},{"key":"2023012513145830500_bts533-B26","doi-asserted-by":"crossref","first-page":"8","DOI":"10.1002\/prot.20717","article-title":"Domain definition and target classification for CASP6","volume":"61","author":"Tress","year":"2005","journal-title":"Proteins"},{"key":"2023012513145830500_bts533-B27","doi-asserted-by":"crossref","first-page":"D262","DOI":"10.1093\/nar\/gki058","article-title":"E-MSD: an integrated data resource for 
bioinformatics","volume":"33","author":"Velankar","year":"2005","journal-title":"Nucleic Acids Res."},{"key":"2023012513145830500_bts533-B28","doi-asserted-by":"crossref","first-page":"1832","DOI":"10.1038\/nprot.2008.184","article-title":"SCWRL and MolIDE: computer programs for side-chain conformation prediction and homology modeling","volume":"3","author":"Wang","year":"2008","journal-title":"Nat. Protoc."},{"key":"2023012513145830500_bts533-B29","doi-asserted-by":"crossref","first-page":"988","DOI":"10.1093\/bioinformatics\/bti082","article-title":"PDBML: the representation of archival macromolecular structure data in XML","volume":"21","author":"Westbrook","year":"2005","journal-title":"Bioinformatics"},{"key":"2023012513145830500_bts533-B34","doi-asserted-by":"crossref","first-page":"D761","DOI":"10.1093\/nar\/gkq1059","article-title":"The protein common interface database (ProtCID)\u2014a comprehensive database of interactions of homologous proteins in multiple crystal forms","volume":"39","author":"Xu","year":"2011","journal-title":"Nucleic Acids Res."},{"key":"2023012513145830500_bts533-B30","doi-asserted-by":"crossref","first-page":"2876","DOI":"10.1093\/bioinformatics\/btl490","article-title":"ProtBuD: a database of biological unit structures of protein families and superfamilies","volume":"22","author":"Xu","year":"2006","journal-title":"Bioinformatics"},{"key":"2023012513145830500_bts533-B31","doi-asserted-by":"crossref","first-page":"487","DOI":"10.1016\/j.jmb.2008.06.002","article-title":"Statistical analysis of interface similarity in crystals of homologous proteins","volume":"381","author":"Xu","year":"2008","journal-title":"J. Mol. 
Biol."},{"key":"2023012513145830500_bts533-B32","doi-asserted-by":"crossref","first-page":"II246","DOI":"10.1093\/bioinformatics\/btg1086","article-title":"Flexible structure alignment by chaining aligned fragment pairs allowing twists","volume":"19","author":"Ye","year":"2003","journal-title":"Bioinformatics"},{"key":"2023012513145830500_bts533-B33","doi-asserted-by":"crossref","first-page":"1","DOI":"10.1186\/1471-2105-6-77","article-title":"Comparative mapping of sequence-based and structure-based protein domains","volume":"6","author":"Zhang","year":"2005","journal-title":"BMC Bioinformatics"}],"container-title":["Bioinformatics"],"original-title":[],"language":"en","link":[{"URL":"https:\/\/academic.oup.com\/bioinformatics\/article-pdf\/28\/21\/2763\/48872059\/bioinformatics_28_21_2763.pdf","content-type":"application\/pdf","content-version":"vor","intended-application":"syndication"},{"URL":"https:\/\/academic.oup.com\/bioinformatics\/article-pdf\/28\/21\/2763\/48872059\/bioinformatics_28_21_2763.pdf","content-type":"unspecified","content-version":"vor","intended-application":"similarity-checking"}],"deposited":{"date-parts":[[2024,4,28]],"date-time":"2024-04-28T21:31:56Z","timestamp":1714339916000},"score":1,"resource":{"primary":{"URL":"https:\/\/academic.oup.com\/bioinformatics\/article\/28\/21\/2763\/237209"}},"subtitle":[],"short-title":[],"issued":{"date-parts":[[2012,8,31]]},"references-count":34,"journal-issue":{"issue":"21","published-print":{"date-parts":[[2012,11,1]]}},"URL":"https:\/\/doi.org\/10.1093\/bioinformatics\/bts533","relation":{},"ISSN":["1367-4811","1367-4803"],"issn-type":[{"value":"1367-4811","type":"electronic"},{"value":"1367-4803","type":"print"}],"subject":[],"published-other":{"date-parts":[[2012,11,1]]},"published":{"date-parts":[[2012,8,31]]}}}