{"status":"ok","message-type":"work","message-version":"1.0.0","message":{"indexed":{"date-parts":[[2024,8,5]],"date-time":"2024-08-05T06:44:30Z","timestamp":1722840270544},"reference-count":19,"publisher":"Oxford University Press (OUP)","issue":"6","content-domain":{"domain":[],"crossmark-restriction":false},"short-container-title":[],"published-print":{"date-parts":[[2010,3,15]]},"abstract":"<jats:title>Abstract<\/jats:title>\n               <jats:p>Motivation: Accurate prediction of the domain content and arrangement in multi-domain proteins (which make up &gt;65% of the large-scale protein databases) provides a valuable tool for function prediction, comparative genomics and studies of molecular evolution. However, scanning a multi-domain protein against a database of domain sequence profiles can often produce conflicting and overlapping matches. We have developed a novel method that employs heaviest weighted clique-finding (HCF), which we show significantly outperforms standard published approaches based on successively assigning the best non-overlapping match (Best Match Cascade, BMC).<\/jats:p>\n               <jats:p>Results: We created a benchmark data set of structural domain assignments in the CATH database and a corresponding set of Hidden Markov Model-based domain predictions. Using these, we demonstrate that by considering all possible combinations of matches using the HCF approach, we achieve much higher prediction accuracy than the standard BMC method. We also show that it is essential to allow overlapping domain matches to a query in order to identify correct domain assignments. 
Furthermore, we introduce a straightforward and effective protocol for resolving any overlapping assignments, and producing a single set of non-overlapping predicted domains.<\/jats:p>\n               <jats:p>Availability and implementation: The new approach will be used to determine MDAs for UniProt and Ensembl, and made available via the Gene3D website: http:\/\/gene3d.biochem.ucl.ac.uk\/Gene3D\/. The software has been implemented in C++ and compiled for Linux: source code and binaries can be found at: ftp:\/\/ftp.biochem.ucl.ac.uk\/pub\/gene3d_data\/DomainFinder3\/<\/jats:p>\n               <jats:p>Contact: \u00a0yeats@biochem.ucl.ac.uk<\/jats:p>\n               <jats:p>Supplementary information: \u00a0Supplementary data are available at Bioinformatics online.<\/jats:p>","DOI":"10.1093\/bioinformatics\/btq034","type":"journal-article","created":{"date-parts":[[2010,1,30]],"date-time":"2010-01-30T01:25:53Z","timestamp":1264814753000},"page":"745-751","source":"Crossref","is-referenced-by-count":39,"title":["A fast and automated solution for accurately resolving protein domain architectures"],"prefix":"10.1093","volume":"26","author":[{"given":"Corin","family":"Yeats","sequence":"first","affiliation":[{"name":"Department of Structural and Molecular Biology, UCL, Darwin Building, Gower Street, London WC1E 6BT, UK"}],"role":[{"role":"author","vocabulary":"crossref"}]},{"given":"Oliver C.","family":"Redfern","sequence":"additional","affiliation":[{"name":"Department of Structural and Molecular Biology, UCL, Darwin Building, Gower Street, London WC1E 6BT, UK"}],"role":[{"role":"author","vocabulary":"crossref"}]},{"given":"Christine","family":"Orengo","sequence":"additional","affiliation":[{"name":"Department of Structural and Molecular Biology, UCL, Darwin Building, Gower Street, London WC1E 6BT, 
UK"}],"role":[{"role":"author","vocabulary":"crossref"}]}],"member":"286","published-online":{"date-parts":[[2010,1,29]]},"reference":[{"key":"2023012508015698500_B1","doi-asserted-by":"crossref","first-page":"D419","DOI":"10.1093\/nar\/gkm993","article-title":"Data growth and its impact on the SCOP database: new developments","volume":"36","author":"Andreeva","year":"2008","journal-title":"Nucleic Acids Res."},{"key":"2023012508015698500_B2","doi-asserted-by":"crossref","first-page":"980","DOI":"10.1038\/nsb1203-980","article-title":"Announcing the world-wide Protein Data Bank","volume":"10","author":"Berman","year":"2003","journal-title":"Nat. Struct. Biol."},{"key":"2023012508015698500_B3","doi-asserted-by":"crossref","first-page":"D310","DOI":"10.1093\/nar\/gkn877","article-title":"The CATH classification revisited\u2014architectures reviewed and new ways to characterize structural divergence in superfamilies","volume":"37","author":"Cuff","year":"2009","journal-title":"Nucleic Acids Res."},{"key":"2023012508015698500_B4","doi-asserted-by":"crossref","first-page":"231","DOI":"10.1016\/j.jmb.2005.02.007","article-title":"Multi-domain proteins in the three kingdoms of life: orphan domains and other unassigned regions","volume":"348","author":"Ekman","year":"2005","journal-title":"J. Mol. Biol."},{"key":"2023012508015698500_B5","doi-asserted-by":"crossref","first-page":"D281","DOI":"10.1093\/nar\/gkm960","article-title":"The Pfam protein families database","volume":"36","author":"Finn","year":"2008","journal-title":"Nucleic Acids Res."},{"key":"2023012508015698500_B6","doi-asserted-by":"crossref","first-page":"749","DOI":"10.1016\/S0022-2836(03)00269-9","article-title":"Exhaustive enumeration of protein domain families","volume":"328","author":"Heger","year":"2003","journal-title":"J. Mol. 
Biol."},{"key":"2023012508015698500_B7","doi-asserted-by":"crossref","first-page":"D690","DOI":"10.1093\/nar\/gkn828","article-title":"Ensembl 2009","volume":"37","author":"Hubbard","year":"2009","journal-title":"Nucleic Acids Res."},{"key":"2023012508015698500_B8","doi-asserted-by":"crossref","first-page":"D211","DOI":"10.1093\/nar\/gkn785","article-title":"InterPro: the integrative signature database","volume":"37","author":"Hunter","year":"2009","journal-title":"Nucleic Acids Res."},{"key":"2023012508015698500_B9","doi-asserted-by":"crossref","first-page":"491","DOI":"10.1002\/prot.10540","article-title":"Combining local-structure, fold-recognition, and new-fold methods for protein structure prediction","volume":"53","author":"Karplus","year":"2003","journal-title":"Proteins Struct. Funct. Genet. B"},{"key":"2023012508015698500_B10","doi-asserted-by":"crossref","first-page":"W569","DOI":"10.1093\/nar\/gkh481","article-title":"CHOP: parsing proteins into structural domains","volume":"32","author":"Liu","year":"2004","journal-title":"Nucleic Acids Res."},{"key":"2023012508015698500_B11","doi-asserted-by":"crossref","first-page":"443","DOI":"10.1016\/0022-2836(70)90057-4","article-title":"A general method applicable to the search of similarities in the amino acid sequence of two proteins","volume":"48","author":"Needleman","year":"1970","journal-title":"J. Mol. Biol."},{"key":"2023012508015698500_B12","doi-asserted-by":"crossref","first-page":"197","DOI":"10.1016\/S0166-218X(01)00290-6","article-title":"A fast algorithm for the maximum clique problem","volume":"120","author":"Ostergard","year":"2002","journal-title":"Disc. Appl. 
Math."},{"key":"2023012508015698500_B13","doi-asserted-by":"crossref","first-page":"D61","DOI":"10.1093\/nar\/gkl842","article-title":"NCBI Reference Sequence (RefSeq): a curated non-redundant sequence database of genomes, transcripts and proteins","volume":"35","author":"Pruitt","year":"2007","journal-title":"Nucleic Acids Res."},{"key":"2023012508015698500_B14","doi-asserted-by":"crossref","first-page":"e232","DOI":"10.1371\/journal.pcbi.0030232","article-title":"CATHEDRAL: a fast and effective algorithm to predict folds and domain boundaries from multi-domain protein structures","volume":"3","author":"Redfern","year":"2007","journal-title":"PLOS Comput. Biol."},{"key":"2023012508015698500_B15","doi-asserted-by":"crossref","first-page":"648","DOI":"10.1101\/gr.222902","article-title":"Predicting Gene Ontology Functions from ProDom and CDD Protein Domains","volume":"12","author":"Schug","year":"2002","journal-title":"Genome Res."},{"key":"2023012508015698500_B16","doi-asserted-by":"crossref","first-page":"1800","DOI":"10.1110\/ps.041056105","article-title":"Assessing strategies for improved superfamily recognition","volume":"7","author":"Sillitoe","year":"2005","journal-title":"Protein Sci."},{"key":"2023012508015698500_B17","doi-asserted-by":"crossref","first-page":"D169","DOI":"10.1093\/nar\/gkn664","article-title":"The Universal Protein Resource (UniProt) 2009","volume":"37","author":"UniProt Consortium","year":"2009","journal-title":"Nucleic Acids Res."},{"key":"2023012508015698500_B18","doi-asserted-by":"crossref","first-page":"D380","DOI":"10.1093\/nar\/gkn762","article-title":"SUPERFAMILY\u2014sophisticated comparative genomics, data mining, visualization and phylogeny","volume":"37","author":"Wilson","year":"2009","journal-title":"Nucleic Acids Res."},{"key":"2023012508015698500_B19","doi-asserted-by":"crossref","first-page":"D414","DOI":"10.1093\/nar\/gkm1019","article-title":"Gene3D, Comprehensive structural and functional annotation of 
genomes","volume":"36","author":"Yeats","year":"2009","journal-title":"Nucleic Acids Res."}],"container-title":["Bioinformatics"],"original-title":[],"language":"en","link":[{"URL":"https:\/\/academic.oup.com\/bioinformatics\/article-pdf\/26\/6\/745\/48855590\/bioinformatics_26_6_745.pdf","content-type":"application\/pdf","content-version":"vor","intended-application":"syndication"},{"URL":"https:\/\/academic.oup.com\/bioinformatics\/article-pdf\/26\/6\/745\/48855590\/bioinformatics_26_6_745.pdf","content-type":"unspecified","content-version":"vor","intended-application":"similarity-checking"}],"deposited":{"date-parts":[[2023,1,25]],"date-time":"2023-01-25T08:02:11Z","timestamp":1674633731000},"score":1,"resource":{"primary":{"URL":"https:\/\/academic.oup.com\/bioinformatics\/article\/26\/6\/745\/244708"}},"subtitle":[],"short-title":[],"issued":{"date-parts":[[2010,1,29]]},"references-count":19,"journal-issue":{"issue":"6","published-print":{"date-parts":[[2010,3,15]]}},"URL":"https:\/\/doi.org\/10.1093\/bioinformatics\/btq034","relation":{},"ISSN":["1367-4811","1367-4803"],"issn-type":[{"value":"1367-4811","type":"electronic"},{"value":"1367-4803","type":"print"}],"subject":[],"published-other":{"date-parts":[[2010,3,15]]},"published":{"date-parts":[[2010,1,29]]}}}