{"status":"ok","message-type":"work","message-version":"1.0.0","message":{"indexed":{"date-parts":[[2025,9,29]],"date-time":"2025-09-29T03:49:16Z","timestamp":1759117756873},"reference-count":43,"publisher":"Springer Science and Business Media LLC","issue":"1","content-domain":{"domain":["link.springer.com"],"crossmark-restriction":false},"short-container-title":["BMC Bioinformatics"],"published-print":{"date-parts":[[2008,12]]},"abstract":"<jats:title>Abstract<\/jats:title>\n          <jats:sec>\n            <jats:title>Background<\/jats:title>\n            <jats:p>Occult organizational structures in DNA sequences may hold the key to understanding functional and evolutionary aspects of the DNA molecule. Such structures can also provide the means for identifying and discriminating organisms using genomic data. Species specific genomic signatures are useful in a variety of contexts such as evolutionary analysis, assembly and classification of genomic sequences from large uncultivated microbial communities and a rapid identification system in health hazard situations.<\/jats:p>\n          <\/jats:sec>\n          <jats:sec>\n            <jats:title>Results<\/jats:title>\n            <jats:p>We have analyzed genomic sequences of eukaryotic and prokaryotic chromosomes as well as various subtypes of viruses using an information theoretic framework. We confirm the existence of a species specific average mutual information (AMI) profile. We use these profiles to define a very simple, computationally efficient, alignment free, distance measure that reflects the evolutionary relationships between genomic sequences. 
We use this distance measure to classify chromosomes according to species of origin, to separate and cluster subtypes of the HIV-1 virus, and classify DNA fragments to species of origin.<\/jats:p>\n          <\/jats:sec>\n          <jats:sec>\n            <jats:title>Conclusion<\/jats:title>\n            <jats:p>AMI profiles of DNA sequences prove to be species specific and easy to compute. The structure of AMI profiles are conserved, even in short subsequences of a species' genome, rendering a pervasive signature. This signature can be used to classify relatively short DNA fragments to species of origin.<\/jats:p>\n          <\/jats:sec>","DOI":"10.1186\/1471-2105-9-48","type":"journal-article","created":{"date-parts":[[2008,1,25]],"date-time":"2008-01-25T19:20:18Z","timestamp":1201288818000},"update-policy":"http:\/\/dx.doi.org\/10.1007\/springer_crossmark_policy","source":"Crossref","is-referenced-by-count":35,"title":["The Average Mutual Information Profile as a Genomic Signature"],"prefix":"10.1186","volume":"9","author":[{"given":"Mark","family":"Bauer","sequence":"first","affiliation":[],"role":[{"role":"author","vocabulary":"crossref"}]},{"given":"Sheldon M","family":"Schuster","sequence":"additional","affiliation":[],"role":[{"role":"author","vocabulary":"crossref"}]},{"given":"Khalid","family":"Sayood","sequence":"additional","affiliation":[],"role":[{"role":"author","vocabulary":"crossref"}]}],"member":"297","published-online":{"date-parts":[[2008,1,25]]},"reference":[{"key":"2033_CR1","doi-asserted-by":"publisher","first-page":"319","DOI":"10.1016\/S0022-5193(86)80144-8","volume":"119","author":"M Gates","year":"1986","unstructured":"Gates M: A Simple Way to Look at DNA. J Theor Biol 1986, 119: 319\u2013328. 
10.1016\/S0022-5193(86)80144-8","journal-title":"J Theor Biol"},{"issue":"25","key":"2033_CR2","doi-asserted-by":"publisher","first-page":"3805","DOI":"10.1103\/PhysRevLett.68.3805","volume":"68","author":"R Voss","year":"1992","unstructured":"Voss R: Evolution of long-range Fractal Correlations and 1\/ f Noise in DNA Base Sequences. Phys Rev Letters 1992, 68(25):3805\u20133808. 10.1103\/PhysRevLett.68.3805","journal-title":"Phys Rev Letters"},{"key":"2033_CR3","doi-asserted-by":"publisher","first-page":"168","DOI":"10.1038\/356168a0","volume":"356","author":"C Peng","year":"1992","unstructured":"Peng C, Buldyrev S, Goldberger A, Havlin S, Sciortino F, Simons M, Stanley H: Long Range Correlations in Nucleotide Sequences. Nature 1992, 356: 168\u2013170. 10.1038\/356168a0","journal-title":"Nature"},{"issue":"6","key":"2033_CR4","doi-asserted-by":"publisher","first-page":"4514","DOI":"10.1103\/PhysRevE.47.4514","volume":"47","author":"S Buldyrev","year":"1992","unstructured":"Buldyrev S, Goldberger A, Havlin A, Peng C, Simons M, Stanley H: Generalized Levy-Walk Model for DNA Nucleotide Sequences. Phys Rev E 1992, 47(6):4514\u20134523. 10.1103\/PhysRevE.47.4514","journal-title":"Phys Rev E"},{"issue":"5","key":"2033_CR5","doi-asserted-by":"publisher","first-page":"5281","DOI":"10.1103\/PhysRevE.52.5281","volume":"52","author":"P Allegrini","year":"1995","unstructured":"Allegrini P, Barbi M, Grigolini P, West B: Dynamical Walk Model for DNA Sequences. Phys Rev E 1995, 52(5):5281\u20135296. 10.1103\/PhysRevE.52.5281","journal-title":"Phys Rev E"},{"key":"2033_CR6","volume-title":"Information Theory and the Living System","author":"L Gatlin","year":"1972","unstructured":"Gatlin L: Information Theory and the Living System. 
New York: Columbia University Press; 1972."},{"issue":"7","key":"2033_CR7","doi-asserted-by":"publisher","first-page":"1187","DOI":"10.1016\/0031-3203(95)00145-X","volume":"29","author":"R Roman-Roldan","year":"1996","unstructured":"Roman-Roldan R, Bernaolo-Galvan P, Oliver J: Application of Information Theory to DNA Sequence Analysis: A Review. Pattern Recognition 1996, 29(7):1187\u20131194. 10.1016\/0031-3203(95)00145-X","journal-title":"Pattern Recognition"},{"key":"2033_CR8","doi-asserted-by":"publisher","first-page":"415","DOI":"10.1016\/0022-2836(86)90165-8","volume":"188","author":"TSGSL Gold","year":"1986","unstructured":"Gold TSGSL, Ehrefeucht A: Information Content of Binding Sites on Nucleotide Sequences. J Mol Biol 1986, 188: 415\u2013431. 10.1016\/0022-2836(86)90165-8","journal-title":"J Mol Biol"},{"key":"2033_CR9","doi-asserted-by":"publisher","first-page":"6097","DOI":"10.1093\/nar\/18.20.6097","volume":"18","author":"T Schneider","year":"1990","unstructured":"Schneider T, Stephens R: Sequence Logos: A New Way to Display Consensus Sequences. Nucleic Acid Res 1990, 18: 6097\u20136100. 10.1093\/nar\/18.20.6097","journal-title":"Nucleic Acid Res"},{"key":"2033_CR10","doi-asserted-by":"publisher","first-page":"259","DOI":"10.1016\/S0166-218X(96)00068-6","volume":"71","author":"T Schneider","year":"1996","unstructured":"Schneider T, Mastronade D: Fast Multiple Alignment of Ungapped DNA Sequences Using Information Theory and a Relaxation Method. Discrete Applied Mathematics 1996, 71: 259\u2013268. 10.1016\/S0166-218X(96)00068-6","journal-title":"Discrete Applied Mathematics"},{"issue":"5","key":"2033_CR11","doi-asserted-by":"publisher","first-page":"6312","DOI":"10.1103\/PhysRevE.58.6312","volume":"58","author":"B Giraud","year":"1998","unstructured":"Giraud B, Lapedes A, Liu L: Analysis of Correlations Between Sites in Models of Protein Sequences. Phys Rev E 1998, 58(5):6312\u20136322. 
10.1103\/PhysRevE.58.6312","journal-title":"Phys Rev E"},{"key":"2033_CR12","doi-asserted-by":"publisher","first-page":"800","DOI":"10.1103\/PhysRevE.55.800","volume":"55","author":"H Herzel","year":"1997","unstructured":"Herzel H, Grosse I: Correlations in DNA Sequences: The Role of Protein Coding Segments. Phys Rev E 1997, 55: 800\u2013810. 10.1103\/PhysRevE.55.800","journal-title":"Phys Rev E"},{"key":"2033_CR13","doi-asserted-by":"publisher","first-page":"7176","DOI":"10.1073\/pnas.90.15.7176","volume":"90","author":"B Korber","year":"1993","unstructured":"Korber B, Farber R, Wolpert D, Lapedes A: Covariation of Mutations in the V3 Loop of Human Immunodeficiency Virus Type I Envelope Protein: An Information Theoretic Analysis. Proc Natl Acad Sci 1993, 90: 7176\u20137180. 10.1073\/pnas.90.15.7176","journal-title":"Proc Natl Acad Sci"},{"key":"2033_CR14","doi-asserted-by":"publisher","first-page":"861","DOI":"10.1103\/PhysRevE.58.861","volume":"58","author":"L Luo","year":"1998","unstructured":"Luo L, Lee W: Statistical Correlation of Nucleotides in a DNA Sequence. Phys Rev E 1998, 58: 861\u2013871. 10.1103\/PhysRevE.58.861","journal-title":"Phys Rev E"},{"issue":"6","key":"2033_CR15","doi-asserted-by":"publisher","first-page":"1344","DOI":"10.1103\/PhysRevLett.80.1344","volume":"80","author":"R Roman-Roldan","year":"1999","unstructured":"Roman-Roldan R, Bernaolo-Galvan P, Oliver J: Sequence Compositional Complexity of DNA through an Entropic Segmentation Method. Phys Rev Letters 1999, 80(6):1344\u20131347. 10.1103\/PhysRevLett.80.1344","journal-title":"Phys Rev Letters"},{"key":"2033_CR16","first-page":"3336","volume-title":"Phys Rev Letters","author":"P Bernaolo-Galvan","year":"1999","unstructured":"Bernaolo-Galvan P, Oliver J, Ramon-Roldan R: Decomposition of DNA Sequence Complexity. Phys Rev Letters 1999, 3336\u20133339. 
10.1103\/PhysRevLett.83.3336"},{"key":"2033_CR17","doi-asserted-by":"publisher","first-page":"619","DOI":"10.1146\/annurev.mi.48.100194.003155","volume":"48","author":"SKL Cardon","year":"1994","unstructured":"Cardon SKL: Computational DNA Sequence Analysis. Annu Rev Microbiol 1994, 48: 619\u2013654. 10.1146\/annurev.micro.48.1.619","journal-title":"Annu Rev Microbiol"},{"issue":"3","key":"2033_CR18","doi-asserted-by":"crossref","first-page":"1886","DOI":"10.1128\/jvi.68.3.1886-1902.1994","volume":"68","author":"S Karlin","year":"1994","unstructured":"Karlin S, Mocarski E, Schachtel G: Molecular Evolution of Herpesviruses: Genomic and Protein Sequence Comparisons. J Virol 1994, 68(3):1886\u20131902.","journal-title":"J Virol"},{"issue":"3","key":"2033_CR19","doi-asserted-by":"publisher","first-page":"360","DOI":"10.1016\/0959-440X(95)80098-0","volume":"5","author":"S Karlin","year":"1995","unstructured":"Karlin S: Statistical Significance of Sequence Patterns in Proteins. Curr Opin Struct Biol 1995, 5(3):360\u2013371. 10.1016\/0959-440X(95)80098-0","journal-title":"Curr Opin Struct Biol"},{"issue":"12","key":"2033_CR20","doi-asserted-by":"publisher","first-page":"5854","DOI":"10.1073\/pnas.93.12.5854","volume":"93","author":"B Blaisdell","year":"1996","unstructured":"Blaisdell B, Campbell A, Karlin S: Similarities and Dissimilarities of Phage Genomes. Proc Natl Acad Sci USA 1996, 93(12):5854\u20135859. 10.1073\/pnas.93.12.5854","journal-title":"Proc Natl Acad Sci USA"},{"issue":"12","key":"2033_CR21","doi-asserted-by":"crossref","first-page":"3899","DOI":"10.1128\/jb.179.12.3899-3913.1997","volume":"179","author":"S Karlin","year":"1997","unstructured":"Karlin S, Mrazek J, Campbell A: Compositional Biases of Bacterial Genomes and Evolutionary Implications. 
J Bacteriol 1997, 179(12):3899\u20133913.","journal-title":"J Bacteriol"},{"key":"2033_CR22","doi-asserted-by":"publisher","first-page":"11","DOI":"10.1080\/07391102.1986.10507643","volume":"4","author":"V Brendel","year":"1986","unstructured":"Brendel V, Beckmann J, Trifonov E: Linguistics of Nucleotide Sequences: Morphology and Comparison of Vocabularies. J Biomol Struct Dyn 1986, 4: 11\u201321.","journal-title":"J Biomol Struct Dyn"},{"issue":"3","key":"2033_CR23","doi-asserted-by":"publisher","first-page":"391","DOI":"10.1080\/07391102.1986.10506357","volume":"4","author":"J Beckmann","year":"1986","unstructured":"Beckmann J, Brendel V, Trifonov E: Intervening Sequences Exhibit Distinct Vocabulary. J Biomol Struct Dyn 1986, 4(3):391\u2013400.","journal-title":"J Biomol Struct Dyn"},{"key":"2033_CR24","doi-asserted-by":"publisher","first-page":"183","DOI":"10.1016\/S0378-1119(02)01206-4","volume":"304","author":"EB E","year":"2003","unstructured":"E EB, Pizzi E, Giudice PD, Frontali C: Pentamer Vocabularies Characterizing Introns and Intron-like Intergenic Tracts from Caenorhabditis elegans and Drosophila melanogaster. Gene 2003, 304: 183\u2013192. 10.1016\/S0378-1119(02)01206-4","journal-title":"Gene"},{"key":"2033_CR25","doi-asserted-by":"publisher","first-page":"e6","DOI":"10.1093\/nar\/gni004","volume":"33","author":"C Dufraigne","year":"2005","unstructured":"Dufraigne C, Fertil B, Lespinats S, Giron A, Deschavanne P: Detection and Characterization of Horizontal Transfers in Prokaryotes Using Genomic Signature. Nuc Acids Res 2005, 33: e6. 
10.1093\/nar\/gni004","journal-title":"Nuc Acids Res"},{"issue":"6978","key":"2033_CR26","doi-asserted-by":"publisher","first-page":"37","DOI":"10.1038\/nature02340","volume":"428","author":"G Tyson","year":"2004","unstructured":"Tyson G, Chapman J, Hugenholtz P, Allen E, Ram R, Richardson P, Solovyev V, Rubin E, Rokhsar D, Banfield J: Community Structure and Metabolism Through Reconstruction of Microbial Genomes from the Environment. Nature 2004, 428(6978):37\u201343. 10.1038\/nature02340","journal-title":"Nature"},{"issue":"5667","key":"2033_CR27","doi-asserted-by":"publisher","first-page":"66","DOI":"10.1126\/science.1093857","volume":"304","author":"J Venter","year":"2004","unstructured":"Venter J, Remington K, Heidelberg J, Halpern A, Rusch D, Eisen J, Wu D, Paulsen I, Nelson K, Nelson W, Fouts D, Levy S, Knap A, Lomas M, Nealson K, White O, Peterson J, Hoffman J, Parsons R, Baden-Tillson H, Pfannkoch C, Rogers Y, Smith H: Environmental Genome Shotgun Sequencing of the Sargasso Sea. Science 2004, 304(5667):66\u201374. 10.1126\/science.1093857","journal-title":"Science"},{"key":"2033_CR28","doi-asserted-by":"publisher","first-page":"283","DOI":"10.1016\/S0168-9525(00)89076-9","volume":"11","author":"S Karlin","year":"1995","unstructured":"Karlin S, Burge C: Dinucleotide Relative Abundance Extremes: A Genomic Signature. Trends Genet 1995, 11: 283\u2013290. 10.1016\/S0168-9525(00)89076-9","journal-title":"Trends Genet"},{"key":"2033_CR29","doi-asserted-by":"publisher","first-page":"1391","DOI":"10.1093\/oxfordjournals.molbev.a026048","volume":"16","author":"P Deschavanne","year":"2000","unstructured":"Deschavanne P, Giron A, Vilain J, Fagot G, Fertil B: Genomic Signature: Characterization and Classification of Species Assessed by Chaos Game Representation of Sequences. 
Mol Biol Evol 2000, 16: 1391\u20131399.","journal-title":"Mol Biol Evol"},{"issue":"8","key":"2033_CR30","doi-asserted-by":"publisher","first-page":"1404","DOI":"10.1101\/gr.186401","volume":"11","author":"R Sandberg","year":"2001","unstructured":"Sandberg R, Winberg G, Branden CI, Kaske A, Ernberg I, Coster J: Capturing Whole-Genome Characteristics in Short Sequences Using a Naive Bayesian Classifier. Genome Res 2001, 11(8):1404\u20131409. 10.1101\/gr.186401","journal-title":"Genome Res"},{"key":"2033_CR31","doi-asserted-by":"publisher","first-page":"379","DOI":"10.1002\/j.1538-7305.1948.tb01338.x","volume":"27","author":"C Shannon","year":"1948","unstructured":"Shannon C: A Mathematical Theory of Communication. Bell Syst Tech J 1948, 27: 379\u2013423. 623\u2013656","journal-title":"Bell Syst Tech J"},{"key":"2033_CR32","doi-asserted-by":"publisher","first-page":"1059","DOI":"10.1016\/S0022-2836(02)00308-X","volume":"319","author":"I Hofacker","year":"2002","unstructured":"Hofacker I, Fekete M, Stadler P: Secondary structure prediction for aligned RNA sequences. Journal of Molecular Biology 2002, 319: 1059\u20131066. 10.1016\/S0022-2836(02)00308-X","journal-title":"Journal of Molecular Biology"},{"key":"2033_CR33","doi-asserted-by":"publisher","first-page":"2988","DOI":"10.1093\/bioinformatics\/btl514","volume":"22","author":"S Lindgreen","year":"2006","unstructured":"Lindgreen S, Gardner P, Krogh A: Measuring covariation in RNA alignments: physical realism improves information measure. Bioinformatics 2006, 22: 2988\u20132995. 10.1093\/bioinformatics\/btl514","journal-title":"Bioinformatics"},{"issue":"5","key":"2033_CR34","doi-asserted-by":"publisher","first-page":"5624","DOI":"10.1103\/PhysRevE.61.5624","volume":"61","author":"I Grosse","year":"2000","unstructured":"Grosse I, Herzel H, Buldyrev S, Stanley H: Species Independence of Mutual Information in Coding and Noncoding Regions. Phys Rev E 2000, 61(5):5624\u20135629. 
10.1103\/PhysRevE.61.5624","journal-title":"Phys Rev E"},{"key":"2033_CR35","doi-asserted-by":"publisher","first-page":"18297","DOI":"10.1073\/pnas.0507432102","volume":"102","author":"N Slonim","year":"2005","unstructured":"Slonim N, Atwal G, Tkacik G, Bialek W: Information-based clustering. Proceedings of the National Academy of Sciences 2005, 102: 18297\u201318302. 10.1073\/pnas.0507432102","journal-title":"Proceedings of the National Academy of Sciences"},{"key":"2033_CR36","first-page":"2","volume-title":"Molecular Systems Biology","author":"N Slonim","year":"2006","unstructured":"Slonim N, Elemento O, Tavazoie S: Ab initio genotype-phenotype association reveals intrinsic modularity in genetic networks. Molecular Systems Biology 2006, 2."},{"issue":"2","key":"2033_CR37","doi-asserted-by":"publisher","first-page":"L237","DOI":"10.1142\/S0219477504001574","volume":"4","author":"M Berryman","year":"2004","unstructured":"Berryman M, Allison A, Abbott D: Mutual Information for Examining Correlations in DNA. Fluctuation and Noise Letters 2004, 4(2):L237-L246. 10.1142\/S0219477504001574","journal-title":"Fluctuation and Noise Letters"},{"key":"2033_CR38","doi-asserted-by":"publisher","first-page":"061913 1","DOI":"10.1103\/PhysRevE.67.061913","volume":"67","author":"D Holste","year":"2003","unstructured":"Holste D, Beirer S, Schieg P, Grosse I, Herzel H: Repeats and Correlations in Human DNA Sequences. Phys Rev E 2003, 67: 061913 1\u2013061913 7. 10.1103\/PhysRevE.67.061913","journal-title":"Phys Rev E"},{"key":"2033_CR39","doi-asserted-by":"publisher","first-page":"81","DOI":"10.1016\/j.gene.2004.11.026","volume":"345","author":"M Dehnert","year":"2005","unstructured":"Dehnert M, Helm W, Hutt MT: Information theory reveals large scale synchronisation of statistical correlations in eukaryote genomes. Gene 2005, 345: 81\u201390. 
10.1016\/j.gene.2004.11.026","journal-title":"Gene"},{"key":"2033_CR40","doi-asserted-by":"publisher","first-page":"021913-1","DOI":"10.1103\/PhysRevE.74.021913","volume":"74","author":"M Dehnert","year":"2006","unstructured":"Dehnert M, Helm W, Hutt MT: Informational structure of two closely related eukaryote genomes. Physical Review E 2006, 74: 021913\u20131-021913\u20139. 10.1103\/PhysRevE.74.021913","journal-title":"Physical Review E"},{"key":"2033_CR41","first-page":"6","volume-title":"BMC Bioinformatics","author":"J Hummel","year":"2005","unstructured":"Hummel J, Keshvari N, Weckwerth W, Selbig J: Species-specific analysis of protein sequences using mutual information. BMC Bioinformatics 2005, 6."},{"key":"2033_CR42","doi-asserted-by":"publisher","first-page":"4116","DOI":"10.1093\/bioinformatics\/bti671","volume":"21","author":"L Martin","year":"2005","unstructured":"Martin L, Gloor G, Dunn S, Wahl L: Using Information Theory to Search for Co-evolving Residues in Proteins. Bioinformatics 2005, 21: 4116\u20134124. 
10.1093\/bioinformatics\/bti671","journal-title":"Bioinformatics"},{"key":"2033_CR43","unstructured":"Average Mutual Information Distance Plotter[http:\/\/sensin.unl.edu\/bioinformatics]"}],"container-title":["BMC Bioinformatics"],"original-title":[],"language":"en","link":[{"URL":"https:\/\/link.springer.com\/content\/pdf\/10.1186\/1471-2105-9-48.pdf","content-type":"application\/pdf","content-version":"vor","intended-application":"similarity-checking"}],"deposited":{"date-parts":[[2021,9,1]],"date-time":"2021-09-01T03:26:48Z","timestamp":1630466808000},"score":1,"resource":{"primary":{"URL":"https:\/\/bmcbioinformatics.biomedcentral.com\/articles\/10.1186\/1471-2105-9-48"}},"subtitle":[],"short-title":[],"issued":{"date-parts":[[2008,1,25]]},"references-count":43,"journal-issue":{"issue":"1","published-print":{"date-parts":[[2008,12]]}},"alternative-id":["2033"],"URL":"https:\/\/doi.org\/10.1186\/1471-2105-9-48","relation":{},"ISSN":["1471-2105"],"issn-type":[{"value":"1471-2105","type":"electronic"}],"subject":[],"published":{"date-parts":[[2008,1,25]]},"assertion":[{"value":"4 July 2007","order":1,"name":"received","label":"Received","group":{"name":"ArticleHistory","label":"Article History"}},{"value":"25 January 2008","order":2,"name":"accepted","label":"Accepted","group":{"name":"ArticleHistory","label":"Article History"}},{"value":"25 January 2008","order":3,"name":"first_online","label":"First Online","group":{"name":"ArticleHistory","label":"Article History"}}],"article-number":"48"}}