{"status":"ok","message-type":"work","message-version":"1.0.0","message":{"indexed":{"date-parts":[[2026,1,5]],"date-time":"2026-01-05T22:24:36Z","timestamp":1767651876375},"reference-count":37,"publisher":"Oxford University Press (OUP)","issue":"6","content-domain":{"domain":[],"crossmark-restriction":false},"short-container-title":[],"published-print":{"date-parts":[[2010,3,15]]},"abstract":"<jats:title>Abstract<\/jats:title>\n               <jats:p>Motivation: Protein sequences are often composed of regions that have distinct evolutionary histories as a consequence of domain shuffling, recombination or gene conversion. New approaches are required to discover, visualize and analyze these sequence regions and thus enable a better understanding of protein evolution.<\/jats:p>\n               <jats:p>Results: Here, we have developed an alignment-free and visual approach to analyze sequence relationships. We use the number of shared n-grams between sequences as a measure of sequence similarity and rearrange the resulting affinity matrix applying a spectral technique. Heat maps of the affinity matrix are employed to identify and visualize clusters of related sequences or outliers, while n-gram-based dot plots and conservation profiles allow detailed analysis of similarities among selected sequences. Using this approach, we have identified signatures of domain shuffling in an otherwise poorly characterized family, and homology clusters in another. We conclude that this approach may be generally useful as a framework to analyze related, but highly divergent protein sequences. 
It is particularly useful as a fast method to study sequence relationships prior to much more time-consuming multiple sequence alignment and phylogenetic analysis.<\/jats:p>\n               <jats:p>Availability: A software implementation (MOSAIC) of the framework described here can be downloaded from http:\/\/bioinformatics.org.au\/mosaic\/<\/jats:p>\n               <jats:p>Contact: m.ragan@uq.edu.au<\/jats:p>\n               <jats:p>Supplementary information: Supplementary data are available at Bioinformatics online.<\/jats:p>","DOI":"10.1093\/bioinformatics\/btq042","type":"journal-article","created":{"date-parts":[[2010,2,4]],"date-time":"2010-02-04T01:55:22Z","timestamp":1265248522000},"page":"737-744","source":"Crossref","is-referenced-by-count":14,"title":["A visual framework for sequence analysis using <i>n<\/i>-grams and spectral rearrangement"],"prefix":"10.1093","volume":"26","author":[{"given":"Stefan R.","family":"Maetschke","sequence":"first","affiliation":[{"name":"1 Institute for Molecular Bioscience, 2 ARC Centre of Excellence in Bioinformatics and 3 School of Chemistry and Molecular Biosciences, The University of Queensland, Brisbane QLD 4072, Australia"},{"name":"1 Institute for Molecular Bioscience, 2 ARC Centre of Excellence in Bioinformatics and 3 School of Chemistry and Molecular Biosciences, The University of Queensland, Brisbane QLD 4072, Australia"}]},{"given":"Karin S.","family":"Kassahn","sequence":"additional","affiliation":[{"name":"1 Institute for Molecular Bioscience, 2 ARC Centre of Excellence in Bioinformatics and 3 School of Chemistry and Molecular Biosciences, The University of Queensland, Brisbane QLD 4072, Australia"},{"name":"1 Institute for Molecular Bioscience, 2 ARC Centre of Excellence in Bioinformatics and 3 School of Chemistry and Molecular Biosciences, The University of Queensland, Brisbane QLD 4072, Australia"}]},{"given":"Jasmyn A.","family":"Dunn","sequence":"additional","affiliation":[{"name":"1 Institute 
for Molecular Bioscience, 2 ARC Centre of Excellence in Bioinformatics and 3 School of Chemistry and Molecular Biosciences, The University of Queensland, Brisbane QLD 4072, Australia"}]},{"given":"Siew-Ping","family":"Han","sequence":"additional","affiliation":[{"name":"1 Institute for Molecular Bioscience, 2 ARC Centre of Excellence in Bioinformatics and 3 School of Chemistry and Molecular Biosciences, The University of Queensland, Brisbane QLD 4072, Australia"}]},{"given":"Eva Z.","family":"Curley","sequence":"additional","affiliation":[{"name":"1 Institute for Molecular Bioscience, 2 ARC Centre of Excellence in Bioinformatics and 3 School of Chemistry and Molecular Biosciences, The University of Queensland, Brisbane QLD 4072, Australia"}]},{"given":"Katryn J.","family":"Stacey","sequence":"additional","affiliation":[{"name":"1 Institute for Molecular Bioscience, 2 ARC Centre of Excellence in Bioinformatics and 3 School of Chemistry and Molecular Biosciences, The University of Queensland, Brisbane QLD 4072, Australia"},{"name":"1 Institute for Molecular Bioscience, 2 ARC Centre of Excellence in Bioinformatics and 3 School of Chemistry and Molecular Biosciences, The University of Queensland, Brisbane QLD 4072, Australia"}]},{"given":"Mark A.","family":"Ragan","sequence":"additional","affiliation":[{"name":"1 Institute for Molecular Bioscience, 2 ARC Centre of Excellence in Bioinformatics and 3 School of Chemistry and Molecular Biosciences, The University of Queensland, Brisbane QLD 4072, Australia"},{"name":"1 Institute for Molecular Bioscience, 2 ARC Centre of Excellence in Bioinformatics and 3 School of Chemistry and Molecular Biosciences, The University of Queensland, Brisbane QLD 4072, Australia"}]}],"member":"286","published-online":{"date-parts":[[2010,2,3]]},"reference":[{"key":"2023012508015837500_B1","doi-asserted-by":"crossref","first-page":"7","DOI":"10.1007\/s00335-002-2229-9","article-title":"Evolutionary analysis of a cluster of ATP-binding cassette 
(ABC) genes","volume":"14","author":"Annilo","year":"2003","journal-title":"Mamm. Genome"},{"key":"2023012508015837500_B2","doi-asserted-by":"crossref","first-page":"345","DOI":"10.1016\/S0168-9525(03)00112-4","article-title":"Phylogeny for the faint of heart: a tutorial","volume":"19","author":"Baldauf","year":"2003","journal-title":"Trends Genet."},{"key":"2023012508015837500_B3","first-page":"711","article-title":"A fast multilevel implementation of recursive spectral bisection for partitioning unstructured problems","volume-title":"Proceedings of the Sixth SIAM Conference on Parallel Processing for Scientific Computing","author":"Barnard","year":"1993"},{"key":"2023012508015837500_B4","doi-asserted-by":"crossref","first-page":"255","DOI":"10.1093\/molbev\/msh018","article-title":"Neighbor-Net: an agglomerative method for the construction of phylogenetic networks","volume":"21","author":"Bryant","year":"2004","journal-title":"Mol. Biol. Evol."},{"key":"2023012508015837500_B5","doi-asserted-by":"crossref","first-page":"1481","DOI":"10.1093\/bioinformatics\/btn231","article-title":"A distance metric for a class of tree-sibling phylogenetic networks","volume":"24","author":"Cardona","year":"2008","journal-title":"Bioinformatics"},{"key":"2023012508015837500_B6","doi-asserted-by":"crossref","first-page":"e4524","DOI":"10.1371\/journal.pone.0004524","article-title":"Are protein domains modules of lateral genetic transfer?","volume":"4","author":"Chan","year":"2009","journal-title":"PLoS ONE"},{"key":"2023012508015837500_B7","doi-asserted-by":"crossref","first-page":"275","DOI":"10.1016\/S0097-8485(99)00009-1","article-title":"Zones of low entropy in genomic sequences","volume":"23","author":"Crochemore","year":"1999","journal-title":"Comput. 
Chem."},{"key":"2023012508015837500_B8","doi-asserted-by":"crossref","DOI":"10.1137\/1.9780898719192","article-title":"Lanczos algorithms for large symmetric eigenvalue computations","volume-title":"Classics in Applied Mathematics","author":"Cullum","year":"2002"},{"key":"2023012508015837500_B9","doi-asserted-by":"crossref","first-page":"289","DOI":"10.1146\/annurev.bi.62.070193.001445","article-title":"hnRNP proteins and the biogenesis of mRNA","volume":"62","author":"Dreyfuss","year":"1993","journal-title":"Annu. Rev. Biochem."},{"key":"2023012508015837500_B10","doi-asserted-by":"crossref","first-page":"619","DOI":"10.21136\/CMJ.1975.101357","article-title":"A property of eigenvectors of nonnegative symmetric matrices and its application to graph theory","volume":"25","author":"Fiedler","year":"1975","journal-title":"Czechoslovak Math. J."},{"key":"2023012508015837500_B11","doi-asserted-by":"crossref","first-page":"206","DOI":"10.1080\/10635150701294741","article-title":"Is multiple sequence alignment required for accurate inference of phylogeny?","volume":"56","author":"H\u00f6hl","year":"2007","journal-title":"Syst. Biol."},{"key":"2023012508015837500_B12","doi-asserted-by":"crossref","first-page":"357","DOI":"10.1177\/117693430600200016","article-title":"Pattern-based phylogenetic distance estimation and tree reconstruction","volume":"2","author":"H\u00f6hl","year":"2006","journal-title":"Evol. Bioinform."},{"key":"2023012508015837500_B13","doi-asserted-by":"crossref","first-page":"1471","DOI":"10.1210\/me.2005-0247","article-title":"The evolution of mineralocorticoid receptors","volume":"20","author":"Hu","year":"2008","journal-title":"Mol. 
Endocrinol."},{"key":"2023012508015837500_B14","doi-asserted-by":"crossref","first-page":"1642","DOI":"10.1101\/gr.520702","article-title":"Signatures of domain shuffling in the human genome","volume":"12","author":"Kaessmann","year":"2002","journal-title":"Genome Res."},{"key":"2023012508015837500_B15","doi-asserted-by":"crossref","first-page":"359","DOI":"10.1137\/S1064827595287997","article-title":"A fast and high quality multilevel scheme for partitioning irregular graphs","volume":"20","author":"Karypis","year":"1999","journal-title":"SIAM J. Sci. Comput."},{"key":"2023012508015837500_B16","doi-asserted-by":"crossref","first-page":"1393","DOI":"10.1101\/gr.087072.108","article-title":"Domain shuffling and the evolution of vertebrates","volume":"19","author":"Kawashima","year":"2009","journal-title":"Genome Res."},{"key":"2023012508015837500_B17","doi-asserted-by":"crossref","first-page":"148","DOI":"10.1186\/1471-2148-7-148","article-title":"Gene conversion limits divergence of mammalian TLR1 and TLR6","volume":"7","author":"Kruithof","year":"2007","journal-title":"BMC Evol. Biol."},{"key":"2023012508015837500_B18","doi-asserted-by":"crossref","first-page":"449","DOI":"10.1042\/BJ20050872","article-title":"Structure and function of steroid receptor AF1 transactivation domains: induction of active conformations","volume":"391","author":"Lavery","year":"2005","journal-title":"Biochem. J."},{"key":"2023012508015837500_B19","doi-asserted-by":"crossref","first-page":"664","DOI":"10.1093\/bioinformatics\/17.7.664","article-title":"T-Rex: reconstructing and visualizing phylogenetic trees and reticulation networks","volume":"17","author":"Makarenkov","year":"2001","journal-title":"Bioinformatics"},{"key":"2023012508015837500_B20","first-page":"849","article-title":"On spectral clustering: analysis and an algorithm","volume":"14","author":"Ng","year":"2001","journal-title":"Adv. Neural Inf. Process. 
Syst."},{"key":"2023012508015837500_B21","doi-asserted-by":"crossref","first-page":"457","DOI":"10.1006\/jtbi.1993.1030","article-title":"Entropic profiles of DNA sequences through chaos-game-derived images","volume":"160","author":"Oliver","year":"1993","journal-title":"J. Theor. Biol."},{"key":"2023012508015837500_B22","doi-asserted-by":"crossref","first-page":"1571","DOI":"10.1093\/nar\/gkj515","article-title":"Spectral clustering of protein sequences","volume":"34","author":"Paccanaro","year":"2006","journal-title":"Nucleic Acids Res."},{"key":"2023012508015837500_B23","volume-title":"Protein evolution.","author":"Patthy","year":"1999"},{"key":"2023012508015837500_B24","first-page":"845","article-title":"Spectral clustering of biological sequence data","volume-title":"The Twentieth National Conference on Artificial Intelligence and the Seventeenth Innovative Applications of Artificial Intelligence Conference","author":"Pentney","year":"2005"},{"key":"2023012508015837500_B25","doi-asserted-by":"crossref","first-page":"500","DOI":"10.1093\/jhered\/esn029","article-title":"Concerted evolution of vertebrate CCR2 and CCR5 genes and the origin of a recombinant equine CCR5\/2 gene","volume":"99","author":"Perelygin","year":"2008","journal-title":"J. Hered."},{"key":"2023012508015837500_B26","doi-asserted-by":"crossref","first-page":"1057","DOI":"10.1126\/science.1169841","article-title":"HIN-200 proteins regulate caspase activation in response to foreign cytoplasmic DNA","volume":"323","author":"Roberts","year":"2009","journal-title":"Science"},{"key":"2023012508015837500_B27","doi-asserted-by":"crossref","first-page":"372","DOI":"10.1007\/978-3-642-03070-3_28","article-title":"Fast spectral clustering with random projection and sampling","volume":"5632","author":"Sakai","year":"2009","journal-title":"Lect. Notes Comput. 
Sci."},{"key":"2023012508015837500_B28","doi-asserted-by":"crossref","first-page":"262","DOI":"10.1002\/bies.20546","article-title":"The origins of polypeptide domains","volume":"29","author":"Schmidt","year":"2007","journal-title":"Bioessays"},{"key":"2023012508015837500_B29","doi-asserted-by":"crossref","first-page":"888","DOI":"10.1109\/34.868688","article-title":"Normalized cuts and image segmentation","volume":"22","author":"Shi","year":"2000","journal-title":"IEEE Trans. Pattern Anal. Mach. Intell."},{"key":"2023012508015837500_B30","doi-asserted-by":"crossref","first-page":"679","DOI":"10.1093\/bioinformatics\/18.5.679","article-title":"Sequence complexity profiles of prokaryotic genomic sequences: a fast algorithm for calculating linguistic complexity","volume":"18","author":"Troyanskaya","year":"2002","journal-title":"Bioinformatics"},{"key":"2023012508015837500_B31","article-title":"A comparison of spectral clustering algorithms","volume-title":"Technical Report 03-05-01.","author":"Verma","year":"2001"},{"key":"2023012508015837500_B32","doi-asserted-by":"crossref","first-page":"513","DOI":"10.1093\/bioinformatics\/btg005","article-title":"Alignment-free sequence comparison\u2014a review","volume":"19","author":"Vinga","year":"2003","journal-title":"Bioinformatics"},{"key":"2023012508015837500_B33","doi-asserted-by":"crossref","first-page":"393","DOI":"10.1186\/1471-2105-8-393","article-title":"Local Renyi entropic profiles of DNA sequences","volume":"8","author":"Vinga","year":"2007","journal-title":"BMC Bioinformatics"},{"key":"2023012508015837500_B34","doi-asserted-by":"crossref","first-page":"208","DOI":"10.1016\/j.sbi.2004.03.011","article-title":"Structure, function and evolution of multidomain proteins","volume":"14","author":"Vogel","year":"2004","journal-title":"Curr. Opin. Struct. 
Biol."},{"key":"2023012508015837500_B35","doi-asserted-by":"crossref","first-page":"395","DOI":"10.1007\/s11222-007-9033-z","article-title":"A tutorial on spectral clustering","volume":"17","author":"von Luxburg","year":"2007","journal-title":"Stat. Comput."},{"key":"2023012508015837500_B36","doi-asserted-by":"crossref","first-page":"110","DOI":"10.1002\/(SICI)1097-4644(1999)75:32+<110::AID-JCB14>3.0.CO;2-T","article-title":"Steroid hormone receptors: Evolution, ligands and molecular basis of biologic function","volume":"32\/33","author":"Whitfield","year":"1999","journal-title":"J. Cell. Biochem."},{"key":"2023012508015837500_B37","doi-asserted-by":"crossref","first-page":"i77","DOI":"10.1093\/bioinformatics\/btn144","article-title":"MACHOS: Markov clusters of homologous subsequences","volume":"24","author":"Wong","year":"2008","journal-title":"Bioinformatics"}],"container-title":["Bioinformatics"],"original-title":[],"language":"en","link":[{"URL":"https:\/\/academic.oup.com\/bioinformatics\/article-pdf\/26\/6\/737\/48855699\/bioinformatics_26_6_737.pdf","content-type":"application\/pdf","content-version":"vor","intended-application":"syndication"},{"URL":"https:\/\/academic.oup.com\/bioinformatics\/article-pdf\/26\/6\/737\/48855699\/bioinformatics_26_6_737.pdf","content-type":"unspecified","content-version":"vor","intended-application":"similarity-checking"}],"deposited":{"date-parts":[[2023,1,25]],"date-time":"2023-01-25T08:02:19Z","timestamp":1674633739000},"score":1,"resource":{"primary":{"URL":"https:\/\/academic.oup.com\/bioinformatics\/article\/26\/6\/737\/245345"}},"subtitle":[],"short-title":[],"issued":{"date-parts":[[2010,2,3]]},"references-count":37,"journal-issue":{"issue":"6","published-print":{"date-parts":[[2010,3,15]]}},"URL":"https:\/\/doi.org\/10.1093\/bioinformatics\/btq042","relation":{},"ISSN":["1367-4811","1367-4803"],"issn-type":[{"value":"1367-4811","type":"electronic"},{"value":"1367-4803","type":"print"}],"subject":[],"published-other":{
"date-parts":[[2010,3,15]]},"published":{"date-parts":[[2010,2,3]]}}}