{"status":"ok","message-type":"work","message-version":"1.0.0","message":{"indexed":{"date-parts":[[2026,3,14]],"date-time":"2026-03-14T05:39:29Z","timestamp":1773466769864,"version":"3.50.1"},"reference-count":126,"publisher":"Oxford University Press (OUP)","license":[{"start":{"date-parts":[[2022,8,11]],"date-time":"2022-08-11T00:00:00Z","timestamp":1660176000000},"content-version":"vor","delay-in-days":222,"URL":"https:\/\/creativecommons.org\/licenses\/by\/4.0\/"}],"funder":[{"DOI":"10.13039\/501100001871","name":"Foundation for Science and Technology","doi-asserted-by":"publisher","award":["SFRH\/BD\/141851\/2018"],"award-info":[{"award-number":["SFRH\/BD\/141851\/2018"]}],"id":[{"id":"10.13039\/501100001871","id-type":"DOI","asserted-by":"publisher"}]}],"content-domain":{"domain":[],"crossmark-restriction":false},"short-container-title":[],"published-print":{"date-parts":[[2022,8,11]]},"abstract":"<jats:title>Abstract<\/jats:title>\n               <jats:sec>\n                  <jats:title>Background<\/jats:title>\n                  <jats:p>Viruses are among the shortest yet highly abundant species that harbor minimal instructions to infect cells, adapt, multiply, and exist. However, with the current substantial availability of viral genome sequences, the scientific repertory lacks a complexity landscape that automatically enlights viral genomes\u2019 organization, relation, and fundamental characteristics.<\/jats:p>\n               <\/jats:sec>\n               <jats:sec>\n                  <jats:title>Results<\/jats:title>\n                  <jats:p>This work provides a comprehensive landscape of the viral genome\u2019s complexity (or quantity of information), identifying the most redundant and complex groups regarding their genome sequence while providing their distribution and characteristics at a large and local scale. Moreover, we identify and quantify inverted repeats abundance in viral genomes. 
For this purpose, we measure the sequence complexity of each available viral genome using data compression, demonstrating that adequate data compressors can efficiently quantify the complexity of viral genome sequences, including subsequences better represented by algorithmic sources (e.g., inverted repeats). Using a state-of-the-art genomic compressor on an extensive viral genomes database, we show that double-stranded DNA viruses are, on average, the most redundant viruses while single-stranded DNA viruses are the least. Contrarily, double-stranded RNA viruses show a lower redundancy relative to single-stranded RNA. Furthermore, we extend the ability of data compressors to quantify local complexity (or information content) in viral genomes using complexity profiles, unprecedently providing a direct complexity analysis of human herpesviruses. We also conceive a features-based classification methodology that can accurately distinguish viral genomes at different taxonomic levels without direct comparisons between sequences. This methodology combines data compression with simple measures such as GC-content percentage and sequence length, followed by machine learning classifiers.<\/jats:p>\n               <\/jats:sec>\n               <jats:sec>\n                  <jats:title>Conclusions<\/jats:title>\n                  <jats:p>This article presents methodologies and findings that are highly relevant for understanding the patterns of similarity and singularity between viral groups, opening new frontiers for studying viral genomes\u2019 organization while depicting the complexity trends and classification components of these genomes at different taxonomic levels. 
The whole study is supported by an extensive website (https:\/\/asilab.github.io\/canvas\/) for comprehending the viral genome characterization using dynamic and interactive approaches.<\/jats:p>\n               <\/jats:sec>","DOI":"10.1093\/gigascience\/giac079","type":"journal-article","created":{"date-parts":[[2022,8,11]],"date-time":"2022-08-11T11:09:28Z","timestamp":1660216168000},"source":"Crossref","is-referenced-by-count":22,"title":["The complexity landscape of viral genomes"],"prefix":"10.1093","volume":"11","author":[{"ORCID":"https:\/\/orcid.org\/0000-0002-6331-6091","authenticated-orcid":false,"given":"Jorge Miguel","family":"Silva","sequence":"first","affiliation":[{"name":"Institute of Electronics and Informatics Engineering of Aveiro, University of Aveiro, Campus Universit\u00e1rio de Santiago , 3810-193 Aveiro,","place":["Portugal"]}]},{"ORCID":"https:\/\/orcid.org\/0000-0003-1176-552X","authenticated-orcid":false,"given":"Diogo","family":"Pratas","sequence":"additional","affiliation":[{"name":"Institute of Electronics and Informatics Engineering of Aveiro, University of Aveiro, Campus Universit\u00e1rio de Santiago , 3810-193 Aveiro,","place":["Portugal"]},{"name":"Department of Electronics Telecommunications and Informatics, University of Aveiro, Campus Universitario de Santiago , 3810-193 Aveiro,","place":["Portugal"]},{"name":"Department of Virology, University of Helsinki , Haartmaninkatu 3, 00014 Helsinki,","place":["Finland"]}]},{"ORCID":"https:\/\/orcid.org\/0000-0001-9157-8761","authenticated-orcid":false,"given":"T\u00e2nia","family":"Caetano","sequence":"additional","affiliation":[{"name":"Department of Biology, University of Aveiro, Campus Universitario de Santiago , 3810-193 Aveiro,","place":["Portugal"]}]},{"ORCID":"https:\/\/orcid.org\/0000-0003-1941-3983","authenticated-orcid":false,"given":"S\u00e9rgio","family":"Matos","sequence":"additional","affiliation":[{"name":"Institute of Electronics and Informatics Engineering of Aveiro, 
University of Aveiro, Campus Universit\u00e1rio de Santiago , 3810-193 Aveiro,","place":["Portugal"]},{"name":"Department of Electronics Telecommunications and Informatics, University of Aveiro, Campus Universitario de Santiago , 3810-193 Aveiro,","place":["Portugal"]}]}],"member":"286","published-online":{"date-parts":[[2022,8,11]]},"reference":[{"key":"2024111706273321500_bib1","doi-asserted-by":"crossref","first-page":"133","DOI":"10.1016\/B978-012680126-2\/50016-5","article-title":"Evolutionary relationships among diverse bacteriophages and prophages: all the world\u2019s a phage","volume-title":"Horizontal gene transfer","author":"Hendrix","year":"2002"},{"issue":"D1","key":"2024111706273321500_bib2","doi-asserted-by":"crossref","first-page":"D733","DOI":"10.1093\/nar\/gkv1189","article-title":"Reference sequence (RefSeq) database at NCBI: current status, taxonomic expansion, and functional annotation","volume":"44","author":"O\u2019Leary","year":"2016","journal-title":"Nucleic Acids Res"},{"issue":"6","key":"2024111706273321500_bib3","doi-asserted-by":"crossref","first-page":"504","DOI":"10.1038\/nrmicro1163","article-title":"Viral metagenomics","volume":"3","author":"Edwards","year":"2005","journal-title":"Nat Rev Microbiol"},{"issue":"19","key":"2024111706273321500_bib4","doi-asserted-by":"crossref","first-page":"12599","DOI":"10.1074\/jbc.R800078200","article-title":"Structural and functional studies of archaeal viruses","volume":"284","author":"Lawrence","year":"2009","journal-title":"J Biol Chem"},{"issue":"1","key":"2024111706273321500_bib5","doi-asserted-by":"crossref","first-page":"29","DOI":"10.1186\/1745-6150-1-29","article-title":"The ancient Virus World and evolution of cells","volume":"1","author":"Koonin","year":"2006","journal-title":"Biol Direct"},{"issue":"4","key":"2024111706273321500_bib6","doi-asserted-by":"crossref","first-page":"499","DOI":"10.1038\/s41587-020-0718-6","article-title":"A genomic catalog of Earth\u2019s 
microbiomes","volume":"39","author":"Nayfach","year":"2021","journal-title":"Nat Biotechnol"},{"key":"2024111706273321500_bib7","first-page":"17","article-title":"Virion structure, genome organization, and taxonomy of viruses","volume":"1","author":"Fermin","year":"2018","journal-title":"Viruses"},{"issue":"2","key":"2024111706273321500_bib8","doi-asserted-by":"crossref","first-page":"175","DOI":"10.1016\/S0166-6851(01)00388-7","article-title":"Discovering patterns in Plasmodium falciparum genomic DNA","volume":"118","author":"Stern","year":"2001","journal-title":"Mol Biochem Parasitol"},{"issue":"1","key":"2024111706273321500_bib9","doi-asserted-by":"crossref","first-page":"1","DOI":"10.1186\/1471-2105-11-599","article-title":"A genome alignment algorithm based on compression","volume":"11","author":"Cao","year":"2010","journal-title":"BMC Bioinformatics"},{"key":"2024111706273321500_bib10","first-page":"1","article-title":"Comparing biological networks via graph compression","volume-title":"BMC Syst Biol","author":"Hayashida","year":"2010"},{"issue":"4","key":"2024111706273321500_bib11","doi-asserted-by":"crossref","first-page":"e0119306","DOI":"10.1371\/journal.pone.0119306","article-title":"Prediction of protein structural features from sequence data based on Shannon entropy and Kolmogorov complexity","volume":"10","author":"Bywater","year":"2015","journal-title":"PLoS One"},{"key":"2024111706273321500_bib12","first-page":"259","article-title":"On the approximation of the Kolmogorov complexity for DNA sequences","volume-title":"Iberian Conference on Pattern Recognition and Image Analysis","author":"Pratas","year":"2017"},{"key":"2024111706273321500_bib13","doi-asserted-by":"crossref","first-page":"1177","DOI":"10.23919\/EUSIPCO.2018.8553297","article-title":"Metagenomic composition analysis of sedimentary ancient DNA from the Isle of Wight","volume-title":"2018 26th European Signal Processing Conference 
(EUSIPCO)","author":"Pratas","year":"2018"},{"issue":"5","key":"2024111706273321500_bib14","doi-asserted-by":"crossref","first-page":"giaa048","DOI":"10.1093\/gigascience\/giaa048","article-title":"Smash++: an alignment-free and memory-efficient tool to find genomic rearrangements","volume":"9","author":"Hosseini","year":"2020","journal-title":"GigaScience"},{"key":"2024111706273321500_bib15","doi-asserted-by":"crossref","first-page":"628","DOI":"10.1038\/nrmicro2644","article-title":"Microbiology by numbers","volume":"9","author":"","year":"2011","journal-title":"Nat Rev Microbiol"},{"issue":"1","key":"2024111706273321500_bib16","doi-asserted-by":"crossref","first-page":"133","DOI":"10.1016\/j.virusres.2006.01.008","article-title":"Mimivirus and the emerging concept of \u201cgiant\u201d virus","volume":"117","author":"Claverie","year":"2006","journal-title":"Virus Res"},{"key":"2024111706273321500_bib17","doi-asserted-by":"crossref","first-page":"89","DOI":"10.1007\/978-3-540-68618-7_3","article-title":"Mimivirus","volume-title":"Lesser Known Large dsDNA Viruses","author":"Claverie","year":"2009"},{"key":"2024111706273321500_bib18","first-page":"83","article-title":"Origins and evolution of viruses","volume-title":"Viruses","author":"Foster","year":"2018"},{"key":"2024111706273321500_bib19","doi-asserted-by":"crossref","first-page":"102333","DOI":"10.1016\/j.fsigen.2020.102333","article-title":"Species assignment in forensics and the challenge of hybrids","volume":"48","author":"Amorim","year":"2020","journal-title":"Forensic Sci Int Genet"},{"issue":"7080","key":"2024111706273321500_bib20","doi-asserted-by":"crossref","first-page":"41","DOI":"10.1038\/nature04531","article-title":"Introns and the origin of nucleus\u2013cytosol 
compartmentalization","volume":"440","author":"Martin","year":"2006","journal-title":"Nature"},{"issue":"1","key":"2024111706273321500_bib21","doi-asserted-by":"crossref","first-page":"7","DOI":"10.1186\/1745-6150-5-7","article-title":"Origin of the cell nucleus, mitosis and sex: roles of intracellular coevolution","volume":"5","author":"Cavalier-Smith","year":"2010","journal-title":"Biol Direct"},{"key":"2024111706273321500_bib22","doi-asserted-by":"crossref","first-page":"2169","DOI":"10.3389\/fmicb.2020.571831","article-title":"Medusavirus ancestor in a proto-eukaryotic cell: updating the hypothesis for the viral origin of the nucleus","volume":"11","author":"Takemura","year":"2020","journal-title":"Front Microbiol"},{"key":"2024111706273321500_bib23","doi-asserted-by":"crossref","first-page":"7","DOI":"10.3389\/fcimb.2021.657245","article-title":"The human bone marrow is host to the DNAs of several viruses","volume":"11","author":"Toppinen","year":"2021","journal-title":"Front Cell Infect Microbiol"},{"key":"2024111706273321500_bib24","doi-asserted-by":"crossref","first-page":"102353","DOI":"10.1016\/j.fsigen.2020.102353","article-title":"The landscape of persistent human DNA viruses in femoral bone","volume":"48","author":"Toppinen","year":"2020","journal-title":"Forensic Sci Int Genet"},{"issue":"2\u20133","key":"2024111706273321500_bib25","doi-asserted-by":"crossref","first-page":"169","DOI":"10.1016\/j.forsciint.2003.10.019","article-title":"Trial for the geographical identification using JC viral genotyping in Japan","volume":"139","author":"Ikegaya","year":"2004","journal-title":"Forensic Sci Int"},{"issue":"26","key":"2024111706273321500_bib26","doi-asserted-by":"crossref","first-page":"14542","DOI":"10.1073\/pnas.94.26.14542","article-title":"Asian genotypes of JC virus in Native Americans and in a Pacific Island population: markers of viral evolution and human migration","volume":"94","author":"Agostini","year":"1997","journal-title":"Proc Natl Acad 
Sci"},{"issue":"17","key":"2024111706273321500_bib27","doi-asserted-by":"crossref","first-page":"9191","DOI":"10.1073\/pnas.94.17.9191","article-title":"Typing of urinary JC virus DNA offers a novel means of tracing human migrations","volume":"94","author":"Sugimoto","year":"1997","journal-title":"Proc Natl Acad Sci"},{"issue":"3","key":"2024111706273321500_bib28","doi-asserted-by":"crossref","first-page":"322","DOI":"10.1007\/s00239-001-2329-2","article-title":"JC virus strains indigenous to northeastern Siberians and Canadian Inuits are unique but evolutionally related to those distributed throughout Europe and Mediterranean areas","volume":"55","author":"Sugimoto","year":"2002","journal-title":"J Mol Evol"},{"issue":"2","key":"2024111706273321500_bib29","doi-asserted-by":"crossref","first-page":"442","DOI":"10.1093\/molbev\/msz227","article-title":"You will never walk alone: codispersal of JC polyomavirus with human populations","volume":"37","author":"Forni","year":"2020","journal-title":"Mol Biol Evol"},{"issue":"12","key":"2024111706273321500_bib30","doi-asserted-by":"crossref","first-page":"1141","DOI":"10.1002\/prot.25834","article-title":"Protein structure prediction using multiple deep neural networks in the 13th Critical Assessment of Protein Structure Prediction (CASP13)","volume":"87","author":"Senior","year":"2019","journal-title":"Proteins"},{"issue":"7792","key":"2024111706273321500_bib31","doi-asserted-by":"crossref","first-page":"706","DOI":"10.1038\/s41586-019-1923-7","article-title":"Improved protein structure prediction using potentials from deep learning","volume":"577","author":"Senior","year":"2020","journal-title":"Nature"},{"key":"2024111706273321500_bib32","first-page":"228","article-title":"On the role of inverted repeats in DNA sequence similarity","author":"Hosseini","year":"2017","journal-title":"International Conference on Practical Applications of Computational Biology & 
Bioinformatics"},{"key":"2024111706273321500_bib33","author":"Toppinen","year":"2021","journal-title":"Parvoviral genomes in human soft tissues and bones over decades"},{"issue":"14","key":"2024111706273321500_bib34","doi-asserted-by":"crossref","first-page":"e01031","DOI":"10.1128\/JVI.01031-17","article-title":"Complexities of viral mutation rates","volume":"92","author":"Peck","year":"2018","journal-title":"J Virol"},{"issue":"29","key":"2024111706273321500_bib35","doi-asserted-by":"crossref","first-page":"9936","DOI":"10.1073\/pnas.0804510105","article-title":"Replication stalling at unstable inverted repeats: interplay between DNA hairpins and fork stabilizing proteins","volume":"105","author":"Voineagu","year":"2008","journal-title":"Proc Natl Acad Sci"},{"issue":"4","key":"2024111706273321500_bib36","doi-asserted-by":"crossref","first-page":"d408","DOI":"10.2741\/A284","article-title":"DNA inverted repeats and human disease","volume":"3","author":"Bissler","year":"1998","journal-title":"Front Biosci"},{"issue":"17","key":"2024111706273321500_bib37","doi-asserted-by":"crossref","first-page":"3529","DOI":"10.1093\/nar\/29.17.3529","article-title":"Inverted repeats as genetic elements for promoting DNA inverted duplication: implications in gene amplification","volume":"29","author":"Lin","year":"2001","journal-title":"Nucleic Acids Res"},{"issue":"15","key":"2024111706273321500_bib38","first-page":"7007","article-title":"Ribosomal frameshifting and transcriptional slippage: from genetic steganography and cryptography to adventitious use","volume":"44","author":"Atkins","year":"2016","journal-title":"Nucleic Acids Res"},{"issue":"7090","key":"2024111706273321500_bib39","doi-asserted-by":"crossref","first-page":"244","DOI":"10.1038\/nature04735","article-title":"A mechanical explanation of RNA pseudoknot function in programmed ribosomal 
frameshifting","volume":"441","author":"Namy","year":"2006","journal-title":"Nature"},{"issue":"1","key":"2024111706273321500_bib40","doi-asserted-by":"crossref","first-page":"1","DOI":"10.1038\/s41467-020-16961-8","article-title":"High-throughput interrogation of programmed ribosomal frameshifting in human cells","volume":"11","author":"Mikl","year":"2020","journal-title":"Nat Commun"},{"key":"2024111706273321500_bib41","doi-asserted-by":"crossref","first-page":"517","DOI":"10.1146\/annurev-virology-031413-085444","article-title":"Parvoviruses: small does not mean simple","volume":"1","author":"Cotmore","year":"2014","journal-title":"Annu Rev Virol"},{"issue":"1","key":"2024111706273321500_bib42","doi-asserted-by":"crossref","first-page":"364","DOI":"10.1128\/JVI.79.1.364-379.2005","article-title":"Inverted terminal repeat sequences are important for intermolecular recombination and circularization of adeno-associated virus genomes","volume":"79","author":"Yan","year":"2005","journal-title":"J Virol"},{"issue":"7","key":"2024111706273321500_bib43","doi-asserted-by":"crossref","first-page":"1233","DOI":"10.1101\/gr.091561.109","article-title":"The polyadenylation site of Mimivirus transcripts obeys a stringent \u2018hairpin rule\u2019","volume":"19","author":"Byrne","year":"2009","journal-title":"Genome Res"},{"key":"2024111706273321500_bib44","doi-asserted-by":"crossref","first-page":"49","DOI":"10.1146\/annurev-genet-102108-134255","article-title":"Mimivirus and its virophage","volume":"43","author":"Claverie","year":"2009","journal-title":"Annu Rev Genet"},{"issue":"1","key":"2024111706273321500_bib45","doi-asserted-by":"crossref","first-page":"1","DOI":"10.1016\/S0019-9958(64)90223-2","article-title":"A formal theory of inductive inference. 
Part I","volume":"7","author":"Solomonoff","year":"1964","journal-title":"Information Control"},{"issue":"2","key":"2024111706273321500_bib46","doi-asserted-by":"crossref","first-page":"224","DOI":"10.1016\/S0019-9958(64)90131-7","article-title":"A formal theory of inductive inference. Part II","volume":"7","author":"Solomonoff","year":"1964","journal-title":"Information Control"},{"issue":"1","key":"2024111706273321500_bib47","first-page":"1","article-title":"Three approaches to the quantitative definition of information","volume":"1","author":"Kolmogorov","year":"1965","journal-title":"Problems Information Transmission"},{"issue":"4","key":"2024111706273321500_bib48","doi-asserted-by":"crossref","first-page":"547","DOI":"10.1145\/321356.321363","article-title":"On the length of programs for computing finite binary sequences","volume":"13","author":"Chaitin","year":"1966","journal-title":"JACM"},{"issue":"2","key":"2024111706273321500_bib49","doi-asserted-by":"crossref","first-page":"442","DOI":"10.1006\/jcss.1999.1677","article-title":"Inequalities for Shannon entropy and Kolmogorov complexity","volume":"60","author":"Hammer","year":"2000","journal-title":"J Comput Syst Sci"},{"issue":"6","key":"2024111706273321500_bib50","doi-asserted-by":"crossref","first-page":"1101","DOI":"10.1111\/jep.12068","article-title":"Entropy and compression: two measures of complexity","volume":"19","author":"Henriques","year":"2013","journal-title":"J Eval Clin Pract"},{"issue":"5","key":"2024111706273321500_bib51","doi-asserted-by":"crossref","first-page":"18","DOI":"10.1371\/journal.pone.0096223","article-title":"Calculating Kolmogorov complexity from the output frequency distributions of small Turing machines","volume":"9","author":"Soler-Toscano","year":"2014","journal-title":"PLoS One"},{"issue":"8","key":"2024111706273321500_bib52","doi-asserted-by":"crossref","first-page":"605","DOI":"10.3390\/e20080605","article-title":"A decomposition method for global evaluation of Shannon 
entropy and local estimations of algorithmic complexity","volume":"20","author":"Zenil","year":"2018","journal-title":"Entropy"},{"key":"2024111706273321500_bib53","doi-asserted-by":"crossref","first-page":"341","DOI":"10.1016\/j.physa.2014.02.060","article-title":"Correlation of automorphism group size and topological properties with program-size complexity evaluations of graphs and complex networks","volume":"404","author":"Zenil","year":"2014","journal-title":"Physica A"},{"key":"2024111706273321500_bib54","doi-asserted-by":"crossref","first-page":"247","DOI":"10.1016\/j.cognition.2014.11.038","article-title":"Structure emerges faster during cultural transmission in children than in adults","volume":"136","author":"Kempe","year":"2015","journal-title":"Cognition"},{"key":"2024111706273321500_bib55","doi-asserted-by":"crossref","first-page":"e23","DOI":"10.7717\/peerj-cs.23","article-title":"Two-dimensional Kolmogorov complexity and an empirical validation of the Coding theorem method by compressibility","volume":"1","author":"Zenil","year":"2015","journal-title":"PeerJ Comput Sci"},{"key":"2024111706273321500_bib56","doi-asserted-by":"crossref","first-page":"107864","DOI":"10.1016\/j.patcog.2021.107864","article-title":"Automatic analysis of artistic paintings using information-based measures","volume":"114","author":"Silva","year":"2021","journal-title":"Pattern Recognition"},{"key":"2024111706273321500_bib57","doi-asserted-by":"crossref","DOI":"10.1007\/978-0-387-49820-1","volume-title":"An introduction to Kolmogorov complexity and its applications","author":"Li","year":"2008"},{"key":"2024111706273321500_bib58","doi-asserted-by":"crossref","first-page":"336","DOI":"10.1007\/978-3-319-11662-4_24","article-title":"A safe approximation for Kolmogorov complexity","volume-title":"International Conference on Algorithmic Learning 
Theory","author":"Bloem","year":"2014"},{"key":"2024111706273321500_bib59","doi-asserted-by":"crossref","DOI":"10.1155\/9789775945075","volume-title":"Genomic signal processing and statistics","author":"Dougherty","year":"2005"},{"key":"2024111706273321500_bib60","author":"Gailly","year":"2020"},{"key":"2024111706273321500_bib61","author":"bzip2","year":"2020"},{"key":"2024111706273321500_bib62","author":"Pavlov","year":"2020"},{"key":"2024111706273321500_bib63","doi-asserted-by":"crossref","first-page":"340","DOI":"10.1109\/DCC.1993.253115","article-title":"Compression of DNA sequences","volume-title":"[Proceedings] DCC93: Data Compression Conference","author":"Grumbach","year":"1993"},{"issue":"7","key":"2024111706273321500_bib64","doi-asserted-by":"crossref","first-page":"351","DOI":"10.1016\/S0169-5347(01)02187-5","article-title":"Chromosomal rearrangements and speciation","volume":"16","author":"Rieseberg","year":"2001","journal-title":"Trends Ecol Evol"},{"issue":"1","key":"2024111706273321500_bib65","doi-asserted-by":"crossref","first-page":"239","DOI":"10.1016\/0092-8674(80)90131-2","article-title":"DNA rearrangements associated with a transposable element in yeast","volume":"21","author":"Roeder","year":"1980","journal-title":"Cell"},{"key":"2024111706273321500_bib66","doi-asserted-by":"crossref","first-page":"19","DOI":"10.1146\/annurev-biodatasci-072018-021229","article-title":"Genomic data compression","volume":"2","author":"Hernaez","year":"2019","journal-title":"Annu Rev Biomed Data Sci"},{"issue":"6","key":"2024111706273321500_bib67","doi-asserted-by":"crossref","first-page":"875","DOI":"10.1016\/0306-4573(94)90014-0","article-title":"A new challenge for compression algorithms: genetic sequences","volume":"30","author":"Grumbach","year":"1994","journal-title":"Information Processing Management"},{"issue":"14","key":"2024111706273321500_bib68","first-page":"1397","article-title":"A simple and fast DNA 
compressor","volume":"34","author":"Manzini","year":"2004","journal-title":"Software"},{"key":"2024111706273321500_bib69","article-title":"Grammar-based compression of DNA sequences","volume":"21","author":"Cherniavsky","year":"2004","journal-title":"DIMACS Working Group on The Burrows-Wheeler Transform"},{"issue":"1","key":"2024111706273321500_bib70","doi-asserted-by":"crossref","first-page":"3","DOI":"10.1145\/1055709.1055711","article-title":"An efficient normalized maximum likelihood algorithm for DNA sequence compression","volume":"23","author":"Korodi","year":"2005","journal-title":"ACM Trans Information Syst"},{"key":"2024111706273321500_bib71","doi-asserted-by":"crossref","first-page":"8","DOI":"10.1093\/database\/bap013","article-title":"Differential direct coding: a compression algorithm for nucleotide sequence data","volume":"2009","author":"Vey","year":"2009","journal-title":"Database"},{"issue":"1","key":"2024111706273321500_bib72","doi-asserted-by":"crossref","first-page":"39","DOI":"10.5120\/757-954","article-title":"An efficient horizontal and vertical method for online DNA sequence compression","volume":"3","author":"Mishra","year":"2010","journal-title":"Int J Comput Applications"},{"key":"2024111706273321500_bib73","first-page":"25","article-title":"GENBIT Compress-Algorithm for repetitive and non repetitive DNA sequences","volume":"2","author":"Rajeswari","year":"2010","journal-title":"Int J Comput Sci Information Technol"},{"issue":"3","key":"2024111706273321500_bib74","doi-asserted-by":"crossref","first-page":"245","DOI":"10.2316\/Journal.202.2011.3.202-3114","article-title":"A novel approach for compressing DNA sequences using semi-statistical compressor","volume":"33","author":"Gupta","year":"2011","journal-title":"Int J Comput Applications"},{"issue":"5","key":"2024111706273321500_bib75","doi-asserted-by":"crossref","first-page":"643","DOI":"10.1109\/TEVC.2011.2160399","article-title":"DNA sequence compression using adaptive particle swarm 
optimization-based memetic algorithm","volume":"15","author":"Zhu","year":"2011","journal-title":"IEEE Trans Evol Comput"},{"issue":"6","key":"2024111706273321500_bib76","doi-asserted-by":"crossref","first-page":"e21588","DOI":"10.1371\/journal.pone.0021588","article-title":"On the representability of complete genomes by multiple competing finite-context (Markov) models","volume":"6","author":"Pinho","year":"2011","journal-title":"PLoS One"},{"key":"2024111706273321500_bib77","doi-asserted-by":"crossref","first-page":"231","DOI":"10.1109\/DCC.2016.60","article-title":"Efficient compression of genomic sequences","volume-title":"2016 Data Compression Conference (DCC)","author":"Pratas","year":"2016"},{"issue":"19","key":"2024111706273321500_bib78","doi-asserted-by":"crossref","first-page":"3826","DOI":"10.1093\/bioinformatics\/btz144","article-title":"Nucleotide Archival Format (NAF) enables efficient lossless reference-free compression of DNA sequences","volume":"35","author":"Kryukov","year":"2019","journal-title":"Bioinformatics"},{"key":"2024111706273321500_bib79","article-title":"Kirillkryukov\/NAF: Nucleotide archival format\u2014compressed file format for DNA\/RNA\/protein sequences","author":"Kryukov"},{"key":"2024111706273321500_bib80","doi-asserted-by":"crossref","first-page":"8","DOI":"10.1093\/gigascience\/giab099","article-title":"MBGC: Multiple Bacteria Genome Compressor","volume":"11","author":"Grabowski","year":"2022","journal-title":"GigaScience"},{"key":"2024111706273321500_bib81","article-title":"Byronknoll\/cmix: Cmix is a lossless data compression program aimed at optimizing compression ratio at the cost of high CPU\/memory usage","author":"Knoll"},{"key":"2024111706273321500_bib82","first-page":"43","article-title":"A simple statistical algorithm for biological sequence compression","volume-title":"2007 Data Compression Conference 
(DCC\u201907)","author":"Cao","year":"2007"},{"issue":"11","key":"2024111706273321500_bib83","doi-asserted-by":"crossref","first-page":"1074","DOI":"10.3390\/e21111074","article-title":"A reference-free lossless compression algorithm for DNA sequences using a competitive prediction of two classes of weighted models","volume":"21","author":"Pratas","year":"2019","journal-title":"Entropy"},{"issue":"11","key":"2024111706273321500_bib84","doi-asserted-by":"crossref","first-page":"giaa119","DOI":"10.1093\/gigascience\/giaa119","article-title":"Efficient DNA sequence compression with neural networks","volume":"9","author":"Silva","year":"2020","journal-title":"GigaScience"},{"issue":"7","key":"2024111706273321500_bib85","doi-asserted-by":"crossref","first-page":"giaa072","DOI":"10.1093\/gigascience\/giaa072","article-title":"Sequence Compression Benchmark (SCB) database\u2014a comprehensive evaluation of reference-free compressors for FASTA-formatted sequences","volume":"9","author":"Kryukov","year":"2020","journal-title":"GigaScience"},{"key":"2024111706273321500_bib86","doi-asserted-by":"crossref","first-page":"377","DOI":"10.1109\/DCC.2012.44","article-title":"A machine learning perspective on predictive coding with PAQ8","volume-title":"2012 Data Compression Conference","author":"Knoll","year":"2012"},{"key":"2024111706273321500_bib87","author":"Buchner"},{"issue":"8","key":"2024111706273321500_bib88","doi-asserted-by":"crossref","first-page":"1735","DOI":"10.1162\/neco.1997.9.8.1735","article-title":"Long short-term memory","volume":"9","author":"Hochreiter","year":"1997","journal-title":"Neural Computation"},{"key":"2024111706273321500_bib89","doi-asserted-by":"crossref","DOI":"10.1007\/978-3-030-23873-5_17","article-title":"GeCo2: An optimized tool for lossless compression and analysis of DNA sequences","volume-title":"International Conference on Practical Applications of Computational Biology & 
Bioinformatics","author":"Pratas"},{"issue":"11","key":"2024111706273321500_bib90","doi-asserted-by":"crossref","first-page":"e79922","DOI":"10.1371\/journal.pone.0079922","article-title":"DNA sequences at a glance","volume":"8","author":"Pinho","year":"2013","journal-title":"PLoS One"},{"key":"2024111706273321500_bib91","first-page":"2024","article-title":"Symbolic to numerical conversion of DNA sequences using finite-context models","volume-title":"2011 19th European Signal Processing Conference","author":"Pinho","year":"2011"},{"key":"2024111706273321500_bib92","doi-asserted-by":"crossref","first-page":"100535","DOI":"10.1016\/j.softx.2020.100535","article-title":"GTO: a toolkit to unify pipelines in genomic and proteomic research","volume":"12","author":"Almeida","year":"2020","journal-title":"SoftwareX"},{"issue":"8","key":"2024111706273321500_bib93","doi-asserted-by":"crossref","first-page":"1001","DOI":"10.1101\/gr.104372.109","article-title":"Contrasting GC-content dynamics across 33 mammalian genomes: relationship with life-history traits and chromosome sizes","volume":"20","author":"Romiguier","year":"2010","journal-title":"Genome Res"},{"key":"2024111706273321500_bib94","doi-asserted-by":"crossref","first-page":"285","DOI":"10.1146\/annurev-genom-082908-150001","article-title":"Biased gene conversion and the evolution of mammalian genomic landscapes","volume":"10","author":"Duret","year":"2009","journal-title":"Annu Rev Genomics Human Genet"},{"issue":"6","key":"2024111706273321500_bib95","doi-asserted-by":"crossref","first-page":"e1009596","DOI":"10.1371\/journal.ppat.1009596","article-title":"Extensive C->U transition biases in the genomes of a wide range of mammalian RNA viruses; potential associations with transcriptional mutations, damage- or host-mediated editing of viral RNA","volume":"17","author":"Simmonds","year":"2021","journal-title":"PLoS 
Pathogens"},{"issue":"2","key":"2024111706273321500_bib96","doi-asserted-by":"crossref","first-page":"564","DOI":"10.1093\/nar\/gkj454","article-title":"Base-stacking and base-pairing contributions into thermal stability of the DNA double helix","volume":"34","author":"Yakovchuk","year":"2006","journal-title":"Nucleic Acids Res"},{"issue":"14","key":"2024111706273321500_bib97","doi-asserted-by":"crossref","first-page":"8891","DOI":"10.1039\/D0CP06630C","article-title":"Analysis of DNA interactions and GC content with energy decomposition in large-scale quantum mechanical calculations","volume":"23","author":"Chen","year":"2021","journal-title":"Phys Chem Chem Phys"},{"key":"2024111706273321500_bib98","volume-title":"Entrez direct: E-utilities on the UNIX command line","author":"Kans","year":"2020"},{"key":"2024111706273321500_bib99","volume-title":"Discriminant analysis and statistical pattern recognition","author":"McLachlan","year":"2004"},{"key":"2024111706273321500_bib100","first-page":"41","article-title":"An empirical study of the naive Bayes classifier","volume-title":"IJCAI 2001 workshop on empirical methods in artificial intelligence","author":"Rish","year":"2001"},{"key":"2024111706273321500_bib101","first-page":"986","article-title":"KNN model-based approach in classification","author":"Guo","year":"2003"},{"key":"2024111706273321500_bib102","doi-asserted-by":"crossref","DOI":"10.1017\/CBO9780511801389","volume-title":"An introduction to support vector machines and other kernel-based learning methods","author":"Cristianini","year":"2000"},{"key":"2024111706273321500_bib103","doi-asserted-by":"crossref","first-page":"785","DOI":"10.1145\/2939672.2939785","article-title":"XGBoost: a scalable tree boosting system","volume-title":"Proceedings of the 22nd ACM SIGKDD International Conference on Knowledge Discovery and Data Mining KDD \u201916","author":"Chen","year":"2016"},{"key":"2024111706273321500_bib104","author":"Mahoney M","year":": 
02\/03\/2022","journal-title":"The PAQ Data Compression Programs"},{"issue":"2","key":"2024111706273321500_bib105","doi-asserted-by":"crossref","first-page":"135","DOI":"10.1099\/jgv.0.001190","article-title":"ICTV virus taxonomy profile: Tristromaviridae","volume":"100","author":"Prangishvili","year":"2019","journal-title":"J Gen Virol"},{"key":"2024111706273321500_bib106","doi-asserted-by":"crossref","first-page":"JVI","DOI":"10.1128\/JVI.00673-21","article-title":"Adnaviria: a new realm for archaeal filamentous viruses with linear A-form double-stranded DNA genomes","volume":"95","author":"Krupovic","year":"2021","journal-title":"Journal of Virology"},{"key":"2024111706273321500_bib107","doi-asserted-by":"crossref","first-page":"181","DOI":"10.1016\/j.virusres.2017.11.025","article-title":"Viruses of archaea: structural, functional, environmental and evolutionary genomics","volume":"244","author":"Krupovic","year":"2018","journal-title":"Virus Res"},{"issue":"5","key":"2024111706273321500_bib108","doi-asserted-by":"crossref","first-page":"454","DOI":"10.1099\/jgv.0.001409","article-title":"ICTV virus taxonomy profile: Botourmiaviridae","volume":"101","author":"Ayll\u00f3n","year":"2020","journal-title":"J Gen Virol"},{"issue":"1","key":"2024111706273321500_bib109","doi-asserted-by":"crossref","first-page":"1","DOI":"10.1186\/1743-422X-7-308","article-title":"A neurotropic herpesvirus infecting the gastropod, abalone, shares ancestry with oyster herpesvirus and a herpesvirus associated with the amphioxus genome","volume":"7","author":"Savin","year":"2010","journal-title":"Virol J"},{"key":"2024111706273321500_bib110","volume-title":"Virus taxonomy: ninth report of the International Committee on Taxonomy of Viruses","author":"King","year":"2011"},{"issue":"3","key":"2024111706273321500_bib111","doi-asserted-by":"crossref","first-page":"e00265","DOI":"10.1128\/mSphere.00265-20","article-title":"HERQ-9 is a new multiplex PCR for differentiation and quantification of 
all nine human herpesviruses","volume":"5","author":"Py\u00f6ri\u00e4","year":"2020","journal-title":"Msphere"},{"key":"2024111706273321500_bib112","doi-asserted-by":"crossref","DOI":"10.1017\/CBO9780511545313.006","article-title":"Genetic comparison of human alphaherpesvirus genomes","author":"Baines","year":"2007","journal-title":"Human herpesviruses: biology, therapy, and immunoprophylaxis"},{"issue":"8","key":"2024111706273321500_bib113","doi-asserted-by":"crossref","first-page":"e1008915","DOI":"10.1371\/journal.pgen.1008915","article-title":"Endogenization and excision of human herpesvirus 6 in human genomes","volume":"16","author":"Liu","year":"2020","journal-title":"PLoS Genet"},{"issue":"6","key":"2024111706273321500_bib114","doi-asserted-by":"crossref","first-page":"e33","DOI":"10.1093\/nar\/gkaa1237","article-title":"SurVirus: a repeat-aware virus integration caller","volume":"49","author":"Rajaby","year":"2021","journal-title":"Nucleic Acids Res"},{"key":"2024111706273321500_bib115","doi-asserted-by":"crossref","first-page":"104720","DOI":"10.1016\/j.antiviral.2020.104720","article-title":"Current understanding of human herpesvirus 6 (HHV-6) chromosomal integration","volume":"176","author":"Aimola","year":"2020","journal-title":"Antiviral Res"},{"key":"2024111706273321500_bib116","doi-asserted-by":"crossref","first-page":"121","DOI":"10.1007\/978-1-0716-1036-7_8","article-title":"Sequence comparison without alignment: the SpaM approaches","volume-title":"Multiple sequence alignment","author":"Morgenstern","year":"2021"},{"issue":"1","key":"2024111706273321500_bib117","doi-asserted-by":"crossref","first-page":"Lqz013","DOI":"10.1093\/nargab\/lqz013","article-title":"\u2018Multi-SpaM\u2019: a maximum-likelihood approach to phylogeny reconstruction using multiple spaced-word matches and quartet trees","volume":"2","author":"Dencker","year":"2019","journal-title":"NAR Genomics 
Bioinformatics"},{"key":"2024111706273321500_bib118","doi-asserted-by":"crossref","first-page":"5911","DOI":"10.1016\/j.csbj.2021.10.029","article-title":"A k-mer based approach for classifying viruses without taxonomy identifies viral associations in human autism and plant microbiomes","volume":"19","author":"Garcia","year":"2021","journal-title":"Computational Structural Biotechnol J"},{"issue":"1","key":"2024111706273321500_bib119","first-page":"1","article-title":"Viral phylogenomics using an alignment-free method: A three-step approach to determine optimal length of k-mer","volume":"7","author":"Zhang","year":"2017","journal-title":"Sci Rep"},{"key":"2024111706273321500_bib120","doi-asserted-by":"crossref","first-page":"105106","DOI":"10.1016\/j.meegid.2021.105106","article-title":"Alignment-free sequence comparison for virus genomes based on location correlation coefficient","volume":"96","author":"He","year":"2021","journal-title":"Infect Genet Evol"},{"key":"2024111706273321500_bib121","doi-asserted-by":"crossref","DOI":"10.1515\/sagmb-2018-0004","article-title":"Comparisons of classification methods for viral genomes and protein families using alignment-free vectorization","volume":"17","author":"Huang","year":"2018","journal-title":"Statistical Applications in Genetics and Molecular Biology"},{"issue":"6","key":"2024111706273321500_bib122","doi-asserted-by":"crossref","first-page":"e1006277","DOI":"10.1371\/journal.pcbi.1006277","article-title":"Removing contaminants from databases of draft genomes","volume":"14","author":"Lu","year":"2018","journal-title":"PLoS Comput Biol"},{"issue":"23","key":"2024111706273321500_bib123","doi-asserted-by":"crossref","first-page":"4433","DOI":"10.1007\/s00018-016-2299-6","article-title":"Mechanisms of viral mutation","volume":"73","author":"Sanju\u00e1n","year":"2016","journal-title":"Cell Mol Life 
Sci"},{"issue":"5","key":"2024111706273321500_bib124","doi-asserted-by":"crossref","first-page":"899","DOI":"10.3201\/eid1605.100164","article-title":"The evolution and emergence of RNA viruses","volume":"16","author":"Mahy","year":"2010","journal-title":"Emerg Infect Dis"},{"issue":"3","key":"2024111706273321500_bib125","doi-asserted-by":"crossref","first-page":"e00408","DOI":"10.1128\/mSphere.00408-20","article-title":"Rampant C\u2192 U hypermutation in the genomes of SARS-CoV-2 and other coronaviruses: causes and consequences for their short-and long-term evolutionary trajectories","volume":"5","author":"Simmonds","year":"2020","journal-title":"Msphere"},{"key":"2024111706273321500_bib126","doi-asserted-by":"crossref","unstructured":"Silva JM, Pratas D, Caetano T, et al. \u00a0Supporting data for \"The complexity landscape of viral genomes.\u201d. GigaScience Database. 2022. 10.5524\/102241.","DOI":"10.1093\/gigascience\/giac079"}],"container-title":["GigaScience"],"original-title":[],"language":"en","link":[{"URL":"https:\/\/academic.oup.com\/gigascience\/article-pdf\/doi\/10.1093\/gigascience\/giac079\/60705302\/giac079.pdf","content-type":"application\/pdf","content-version":"vor","intended-application":"syndication"},{"URL":"https:\/\/academic.oup.com\/gigascience\/article-pdf\/doi\/10.1093\/gigascience\/giac079\/60705302\/giac079.pdf","content-type":"unspecified","content-version":"vor","intended-application":"similarity-checking"}],"deposited":{"date-parts":[[2024,11,17]],"date-time":"2024-11-17T06:30:14Z","timestamp":1731825014000},"score":1,"resource":{"primary":{"URL":"https:\/\/academic.oup.com\/gigascience\/article\/doi\/10.1093\/gigascience\/giac079\/6661051"}},"subtitle":[],"short-title":[],"issued":{"date-parts":[[2022]]},"references-count":126,"URL":"https:\/\/doi.org\/10.1093\/gigascience\/giac079","relation":{},"ISSN":["2047-217X"],"issn-type":[{"value":"2047-217X","type":"electronic"}],"subject":[],"published-other":{"date-parts":[[2022]]},"publ
ished":{"date-parts":[[2022]]},"article-number":"giac079"}}