{"status":"ok","message-type":"work","message-version":"1.0.0","message":{"indexed":{"date-parts":[[2026,1,1]],"date-time":"2026-01-01T03:08:09Z","timestamp":1767236889730},"reference-count":28,"publisher":"Oxford University Press (OUP)","issue":"12","license":[{"start":{"date-parts":[[2016,10,2]],"date-time":"2016-10-02T00:00:00Z","timestamp":1475366400000},"content-version":"vor","delay-in-days":2685,"URL":"http:\/\/creativecommons.org\/licenses\/by-nc\/2.0\/uk\/"}],"content-domain":{"domain":[],"crossmark-restriction":false},"short-container-title":[],"published-print":{"date-parts":[[2009,6,15]]},"abstract":"<jats:title>Abstract<\/jats:title>\n               <jats:p>Motivation: Classification of gene and protein sequences into homologous families, i.e. sets of sequences that share common ancestry, is an essential step in comparative genomic analyses. This is typically achieved by construction of a sequence homology network, followed by clustering to identify dense subgraphs corresponding to families. Accurate classification of single domain families is now within reach due to major algorithmic advances in remote homology detection and graph clustering. However, classification of multidomain families remains a significant challenge. The presence of the same domain in sequences that do not share common ancestry introduces false edges in the homology network that link unrelated families and stymy clustering algorithms.<\/jats:p>\n               <jats:p>Results: Here, we investigate a network-rewiring strategy designed to eliminate edges due to promiscuous domains. We show that this strategy can reduce noise in and restore structure to artificial networks with simulated noise, as well as to the yeast genome homology network. We further evaluate this approach on a hand-curated set of multidomain sequences in mouse and human, and demonstrate that classification using the rewired network delivers dramatic improvement in Precision and Recall, compared with current methods. Families in our test set exhibit a broad range of domain architectures and sequence conservation, demonstrating that our method is flexible, robust and suitable for high-throughput, automated processing of heterogeneous, genome-scale data.<\/jats:p>\n               <jats:p>contact: \u00a0jacobmj@cmu.edu<\/jats:p>","DOI":"10.1093\/bioinformatics\/btp207","type":"journal-article","created":{"date-parts":[[2009,5,28]],"date-time":"2009-05-28T15:48:54Z","timestamp":1243525734000},"page":"i45-i53","source":"Crossref","is-referenced-by-count":19,"title":["Family classification without domain chaining"],"prefix":"10.1093","volume":"25","author":[{"given":"Jacob M.","family":"Joseph","sequence":"first","affiliation":[{"name":"1 Computational Biology and 2Departments of Biological and Computer Sciences, Carnegie Mellon University, 4400 Fifth Avenue, Pittsburgh, PA 15213, USA"}]},{"given":"Dannie","family":"Durand","sequence":"additional","affiliation":[{"name":"1 Computational Biology and 2Departments of Biological and Computer Sciences, Carnegie Mellon University, 4400 Fifth Avenue, Pittsburgh, PA 15213, USA"}]}],"member":"286","published-online":{"date-parts":[[2009,5,27]]},"reference":[{"key":"2023013111591927000_B1","doi-asserted-by":"crossref","first-page":"3389","DOI":"10.1093\/nar\/25.17.3389","article-title":"Gapped BLAST and PSI-BLAST: a new generation of protein database search programs","volume":"25","author":"Altschul","year":"1997","journal-title":"Nucleic Acids Res."},{"key":"2023013111591927000_B2","doi-asserted-by":"crossref","first-page":"935","DOI":"10.1093\/bioinformatics\/17.10.935","article-title":"Clustering protein sequences\u2014structure prediction by transitive homology","volume":"17","author":"Bolten","year":"2001","journal-title":"Bioinformatics"},{"key":"2023013111591927000_B3","doi-asserted-by":"crossref","first-page":"911","DOI":"10.1016\/j.jmb.2005.08.067","article-title":"Domain rearrangements in protein evolution","volume":"353","author":"Bjorklund","year":"2005","journal-title":"J. Mol. Biol."},{"key":"2023013111591927000_B4","first-page":"42","article-title":"Optimal spaced seeds for homologous coding regions","volume-title":"Proceedings of Symposium on Combinatorial Pattern Matching (CPM'03) 2676 of Lecture Notes in Computer Science.","author":"Brejova","year":"2003"},{"key":"2023013111591927000_B5","doi-asserted-by":"crossref","first-page":"479","DOI":"10.1371\/journal.pcbi.0020077","article-title":"Functional classification using phylogenomic inference","volume":"2","author":"Brown","year":"2006","journal-title":"PLoS Comput. Biol."},{"key":"2023013111591927000_B6","doi-asserted-by":"crossref","first-page":"67","DOI":"10.1145\/640075.640083","article-title":"Designing seeds for similarity search in genomic DNA","volume-title":"RECOMB'03: Proceedings of the Seventh Annual International Conference on Research in Computational Molecular Biology.","author":"Buhler","year":"2003"},{"key":"2023013111591927000_B7","doi-asserted-by":"crossref","first-page":"1456","DOI":"10.1101\/gr.3672305","article-title":"The Yeast Gene Order Browser: combining curated homology and syntenic context reveals gene fate in polyploid species","volume":"15","author":"Byrne","year":"2005","journal-title":"Genome Res."},{"key":"2023013111591927000_B8","doi-asserted-by":"crossref","first-page":"93","DOI":"10.1007\/978-1-59745-547-3_6","article-title":"Sybil: methods and software for multiple genome comparison and visualization","volume":"408","author":"Crabtree","year":"2007","journal-title":"Methods Mol. Biol."},{"key":"2023013111591927000_B9","doi-asserted-by":"crossref","first-page":"e85","DOI":"10.1371\/journal.pone.0000085","article-title":"The evolution of mammalian gene families","volume":"1","author":"Demuth","year":"2006","journal-title":"PLoS ONE"},{"key":"2023013111591927000_B10","first-page":"1203","article-title":"An open graph visualization system and its applications","volume":"30","author":"Emden","year":"1999","journal-title":"Software Pract. and Exper."},{"key":"2023013111591927000_B11","doi-asserted-by":"crossref","first-page":"1575","DOI":"10.1093\/nar\/30.7.1575","article-title":"An efficient algorithm for large-scale detection of protein families","volume":"30","author":"Enright","year":"2002","journal-title":"Nucleic Acids Res."},{"key":"2023013111591927000_B12","doi-asserted-by":"crossref","first-page":"227","DOI":"10.1016\/S0168-9525(00)02005-9","article-title":"Homology: a personal view on some of the problems","volume":"16","author":"Fitch","year":"2000","journal-title":"Trends Genet."},{"key":"2023013111591927000_B13","doi-asserted-by":"crossref","first-page":"321","DOI":"10.1016\/S0079-6107(00)00013-4","article-title":"Towards a covering set of protein family profiles","volume":"73","author":"Heger","year":"2000","journal-title":"Prog. Biophys. Mol. Biol."},{"key":"2023013111591927000_B14","doi-asserted-by":"crossref","first-page":"e766","DOI":"10.1371\/journal.pone.0000766","article-title":"The princeton protein orthology database (P-POD): a comparative genomics analysis tool for biologists","volume":"2","author":"Heinicke","year":"2007","journal-title":"PLoS ONE"},{"key":"2023013111591927000_B15","doi-asserted-by":"crossref","first-page":"5849","DOI":"10.1073\/pnas.95.11.5849","article-title":"Measuring genome evolution","volume":"95","author":"Huynen","year":"1998","journal-title":"Proc. Natl Acad. Sci. USA"},{"key":"2023013111591927000_B16","doi-asserted-by":"crossref","DOI":"10.1504\/IJDMB.2006.010855","article-title":"Bag: a graph theoretic sequence clustering algorithm","volume":"1","author":"Kim","year":"2006","journal-title":"Int. J. Data Min. Bioinform."},{"key":"2023013111591927000_B17","doi-asserted-by":"crossref","first-page":"15","DOI":"10.1186\/1471-2105-6-15","article-title":"Large scale hierarchical clustering of protein sequences","volume":"6","author":"Krause","year":"2005","journal-title":"BMC Bioinformatics"},{"key":"2023013111591927000_B18","doi-asserted-by":"crossref","first-page":"1571","DOI":"10.1093\/nar\/gkj515","article-title":"Spectral clustering of protein sequences","volume":"34","author":"Paccanaro","year":"2006","journal-title":"Nucleic Acids Res."},{"key":"2023013111591927000_B19","doi-asserted-by":"crossref","first-page":"391","DOI":"10.1142\/9781860948732_0040","article-title":"Exact and heuristic algorithms for weighted cluster editing","volume":"6","author":"Rahmann","year":"2007","journal-title":"Comput. Syst. Bioinformatics Conf."},{"key":"2023013111591927000_B20","doi-asserted-by":"crossref","first-page":"348","DOI":"10.1093\/nar\/gkg096","article-title":"ProtoNet: hierarchical classification of the protein space","volume":"31","author":"Sasson","year":"2003","journal-title":"Nucleic Acids Res."},{"key":"2023013111591927000_B21","doi-asserted-by":"crossref","first-page":"496","DOI":"10.1089\/cmb.2007.A009","article-title":"Domain architecture comparison for multidomain homology identification","volume":"14","author":"Song","year":"2007","journal-title":"J. Comput. Biol."},{"key":"2023013111591927000_B22","doi-asserted-by":"crossref","first-page":"e1000063","DOI":"10.1371\/journal.pcbi.1000063","article-title":"Sequence similarity network reveals common ancestry of multidomain proteins","volume":"4","author":"Song","year":"2008","journal-title":"PLoS. Comput. Biol."},{"key":"2023013111591927000_B23","doi-asserted-by":"crossref","first-page":"41","DOI":"10.1186\/1471-2105-4-41","article-title":"The COG database: an updated version includes eukaryotes","volume":"4","author":"Tatusov","year":"2003","journal-title":"BMC Bioinformatics"},{"key":"2023013111591927000_B24","doi-asserted-by":"crossref","first-page":"6559","DOI":"10.1073\/pnas.0308067101","article-title":"Protein ranking: from local to global structure in the protein similarity network","volume":"101","author":"Weston","year":"2004","journal-title":"Proc. Natl Acad. Sci."},{"key":"2023013111591927000_B25","doi-asserted-by":"crossref","first-page":"D13","DOI":"10.1093\/nar\/gkm1000","article-title":"Database resources of the National Center for Biotechnology Information","volume":"36","author":"Wheeler","year":"2008","journal-title":"Nucleic Acids Res."},{"key":"2023013111591927000_B26","doi-asserted-by":"crossref","first-page":"396","DOI":"10.1186\/1471-2105-8-396","article-title":"Large scale clustering of protein sequences with FORCE -a layout based heuristic for weighted cluster editing","volume":"8","author":"Wittkop","year":"2007","journal-title":"BMC Bioinformatics"},{"key":"2023013111591927000_B27","doi-asserted-by":"crossref","first-page":"37","DOI":"10.1016\/S1476-9271(02)00098-1","article-title":"Protein family classification and functional annotation","volume":"27","author":"Wu","year":"2003","journal-title":"Comput. Biol. Chem."},{"key":"2023013111591927000_B28","doi-asserted-by":"crossref","first-page":"3986","DOI":"10.1093\/nar\/26.17.3986","article-title":"Protein sequence similarity searches using patterns as seeds","volume":"26","author":"Zhang","year":"1998","journal-title":"Nucleic Acids Res."}],"container-title":["Bioinformatics"],"original-title":[],"language":"en","link":[{"URL":"https:\/\/academic.oup.com\/bioinformatics\/article-pdf\/25\/12\/i45\/48995185\/bioinformatics_25_12_i45.pdf","content-type":"application\/pdf","content-version":"vor","intended-application":"syndication"},{"URL":"https:\/\/academic.oup.com\/bioinformatics\/article-pdf\/25\/12\/i45\/48995185\/bioinformatics_25_12_i45.pdf","content-type":"unspecified","content-version":"vor","intended-application":"similarity-checking"}],"deposited":{"date-parts":[[2023,1,31]],"date-time":"2023-01-31T21:07:12Z","timestamp":1675199232000},"score":1,"resource":{"primary":{"URL":"https:\/\/academic.oup.com\/bioinformatics\/article\/25\/12\/i45\/189448"}},"subtitle":[],"short-title":[],"issued":{"date-parts":[[2009,5,27]]},"references-count":28,"journal-issue":{"issue":"12","published-print":{"date-parts":[[2009,6,15]]}},"URL":"https:\/\/doi.org\/10.1093\/bioinformatics\/btp207","relation":{},"ISSN":["1367-4811","1367-4803"],"issn-type":[{"value":"1367-4811","type":"electronic"},{"value":"1367-4803","type":"print"}],"subject":[],"published-other":{"date-parts":[[2009,6,15]]},"published":{"date-parts":[[2009,5,27]]}}}