{"status":"ok","message-type":"work","message-version":"1.0.0","message":{"indexed":{"date-parts":[[2025,10,27]],"date-time":"2025-10-27T20:56:18Z","timestamp":1761598578089,"version":"3.41.2"},"reference-count":102,"publisher":"Oxford University Press (OUP)","license":[{"start":{"date-parts":[[2024,11,26]],"date-time":"2024-11-26T00:00:00Z","timestamp":1732579200000},"content-version":"vor","delay-in-days":330,"URL":"https:\/\/creativecommons.org\/licenses\/by\/4.0\/"}],"funder":[{"DOI":"10.13039\/501100001871","name":"Foundation for Science and Technology","doi-asserted-by":"publisher","award":["UIDB\/00127\/2020"],"award-info":[{"award-number":["UIDB\/00127\/2020"]}],"id":[{"id":"10.13039\/501100001871","id-type":"DOI","asserted-by":"publisher"}]},{"DOI":"10.13039\/501100000780","name":"EC","doi-asserted-by":"publisher","award":["101081813"],"award-info":[{"award-number":["101081813"]}],"id":[{"id":"10.13039\/501100000780","id-type":"DOI","asserted-by":"publisher"}]}],"content-domain":{"domain":[],"crossmark-restriction":false},"short-container-title":[],"published-print":{"date-parts":[[2024,1,2]]},"abstract":"<jats:title>Abstract<\/jats:title><jats:sec><jats:title>Background<\/jats:title><jats:p>The vast number of viral genome sequences generated during the latest pandemic has presented new challenges for computational analysis. Analyzing millions of viral genomes in multi-FASTA format is computationally demanding, especially when using alignment-based methods. Most existing methods are not designed to handle such large datasets, often requiring the analysis to be divided into smaller parts to obtain results using available computational resources.<\/jats:p><\/jats:sec><jats:sec><jats:title>Findings<\/jats:title><jats:p>We introduce AltaiR, a toolkit for analyzing multiple sequences in multi-FASTA format using exclusively alignment-free methodologies. 
AltaiR enables the identification of singularity and similarity patterns within sequences and computes static and temporal dynamics without restrictions on the number or size of input sequences. It automatically filters low-quality, biased, or deviant data. We demonstrate AltaiR\u2019s capabilities by analyzing more than 1.5 million complete severe acute respiratory syndrome coronavirus 2 (SARS-CoV-2) genome sequences, revealing changes in viral genome characteristics over time, such as shifts in nucleotide composition, a decrease in average Kolmogorov sequence complexity, and the evolution of the smallest sequences absent from the human host.<\/jats:p><\/jats:sec><jats:sec><jats:title>Conclusions<\/jats:title><jats:p>AltaiR can identify temporal characteristics and trends in large numbers of sequences, making it well suited to scenarios involving endemic or epidemic outbreaks with vast amounts of available sequence data. Implemented in C with multithreading and methodological optimizations, AltaiR is computationally efficient, flexible, and dependency-free. It accepts any sequence in FASTA format, including amino acid sequences. 
The complete toolkit is freely available at https:\/\/github.com\/cobilab\/altair.<\/jats:p><\/jats:sec>","DOI":"10.1093\/gigascience\/giae086","type":"journal-article","created":{"date-parts":[[2024,11,26]],"date-time":"2024-11-26T13:06:03Z","timestamp":1732626363000},"source":"Crossref","is-referenced-by-count":2,"title":["AltaiR: a C toolkit for alignment-free and temporal analysis of multi-FASTA data"],"prefix":"10.1093","volume":"13","author":[{"ORCID":"https:\/\/orcid.org\/0000-0002-6331-6091","authenticated-orcid":false,"given":"Jorge M","family":"Silva","sequence":"first","affiliation":[{"name":"IEETA\/LASI, Institute of Electronics and Informatics Engineering of Aveiro, University of Aveiro, Aveiro","place":["Portugal"]},{"name":"DETI, Department of Electronics, Telecommunications and Informatics, University of Aveiro, Aveiro","place":["Portugal"]}]},{"ORCID":"https:\/\/orcid.org\/0000-0002-9164-0016","authenticated-orcid":false,"given":"Armando J","family":"Pinho","sequence":"additional","affiliation":[{"name":"IEETA\/LASI, Institute of Electronics and Informatics Engineering of Aveiro, University of Aveiro, Aveiro","place":["Portugal"]},{"name":"DETI, Department of Electronics, Telecommunications and Informatics, University of Aveiro, Aveiro","place":["Portugal"]}]},{"ORCID":"https:\/\/orcid.org\/0000-0003-1176-552X","authenticated-orcid":false,"given":"Diogo","family":"Pratas","sequence":"additional","affiliation":[{"name":"IEETA\/LASI, Institute of Electronics and Informatics Engineering of Aveiro, University of Aveiro, Aveiro","place":["Portugal"]},{"name":"DETI, Department of Electronics, Telecommunications and Informatics, University of Aveiro, Aveiro","place":["Portugal"]},{"name":"DoV, Department of Virology, University of Helsinki, Helsinki 
,","place":["Finland"]}]}],"member":"286","published-online":{"date-parts":[[2024,11,26]]},"reference":[{"issue":"49","key":"2024112613050073900_bib1","doi-asserted-by":"publisher","first-page":"1049","DOI":"10.46234\/ccdcw2021.255","article-title":"GISAID\u2019s role in pandemic response","volume":"3","author":"Khare","year":"2021","journal-title":"China CDC Wkly"},{"issue":"D1","key":"2024112613050073900_bib2","doi-asserted-by":"publisher","first-page":"D482","DOI":"10.1093\/nar\/gkw1065","article-title":"Virus Variation Resource\u2013improved response to emergent viral outbreaks","volume":"45","author":"Hatcher","year":"2017","journal-title":"Nucleic Acids Res"},{"issue":"D1","key":"2024112613050073900_bib3","doi-asserted-by":"publisher","first-page":"D48","DOI":"10.1093\/nar\/gkv1323","article-title":"The international nucleotide sequence database collaboration","volume":"44","author":"Cochrane","year":"2016","journal-title":"Nucleic Acids Res"},{"issue":"D1","key":"2024112613050073900_bib4","doi-asserted-by":"publisher","first-page":"D92","DOI":"10.1093\/nar\/gkaa1023","article-title":"GenBank","volume":"49","author":"Sayers","year":"2021","journal-title":"Nucleic Acids Res"},{"issue":"D1","key":"2024112613050073900_bib5","doi-asserted-by":"publisher","first-page":"D82","DOI":"10.1093\/nar\/gkaa1028","article-title":"The European Nucleotide Archive in 2020","volume":"49","author":"Harrison","year":"2021","journal-title":"Nucleic Acids Res"},{"issue":"D1","key":"2024112613050073900_bib6","doi-asserted-by":"publisher","first-page":"D102","DOI":"10.1093\/nar\/gkab995","article-title":"DNA Data Bank of Japan (DDBJ) update report 2021","volume":"50","author":"Okido","year":"2022","journal-title":"Nucleic Acids Res"},{"key":"2024112613050073900_bib7","doi-asserted-by":"publisher","first-page":"104812","DOI":"10.1016\/j.jcv.2021.104812","article-title":"Recommendations for the introduction of metagenomic next-generation sequencing in clinical virology, part II: 
bioinformatic analysis and reporting","volume":"138","author":"de\u00a0Vries","year":"2021","journal-title":"J Clin Virol"},{"key":"2024112613050073900_bib8","doi-asserted-by":"publisher","first-page":"104691","DOI":"10.1016\/j.jcv.2020.104691","article-title":"Recommendations for the introduction of metagenomic high-throughput sequencing in clinical virology, part I: wet lab procedure","volume":"134","author":"L\u00f3pez-Labrador","year":"2021","journal-title":"J Clin Virol"},{"issue":"8","key":"2024112613050073900_bib9","doi-asserted-by":"publisher","first-page":"617","DOI":"10.1038\/nrmicro2614","article-title":"Why do RNA viruses recombine?","volume":"9","author":"Simon-Loriere","year":"2011","journal-title":"Nat Rev Microbiol"},{"issue":"27","key":"2024112613050073900_bib10","doi-asserted-by":"publisher","first-page":"eabb9153","DOI":"10.1126\/sciadv.abb9153","article-title":"Emergence of SARS-CoV-2 through recombination and strong purifying selection","volume":"6","author":"Li","year":"2020","journal-title":"Sci Adv"},{"issue":"5923","key":"2024112613050073900_bib11","doi-asserted-by":"publisher","first-page":"55","DOI":"10.1126\/science.1165557","article-title":"Sequencing and analyses of all known human rhinovirus genomes reveal structure and evolution","volume":"324","author":"Palmenberg","year":"2009","journal-title":"Science"},{"issue":"9","key":"2024112613050073900_bib12","doi-asserted-by":"publisher","first-page":"e609","DOI":"10.1016\/S2214-109X(16)30143-7","article-title":"Global burden of cancers attributable to infections in 2012: a synthetic analysis","volume":"4","author":"Plummer","year":"2016","journal-title":"Lancet Global Health"},{"key":"2024112613050073900_bib13","doi-asserted-by":"publisher","first-page":"29","DOI":"10.1146\/annurev-pathmechdis-012418-013023","article-title":"Epstein\u2013Barr virus and cancer","volume":"14","author":"Farrell","year":"2019","journal-title":"Annu Rev Pathol 
Mech"},{"issue":"8","key":"2024112613050073900_bib14","doi-asserted-by":"publisher","first-page":"762","DOI":"10.3390\/v11080762","article-title":"Viruses and autoimmunity: a review on the potential interaction and molecular mechanisms","volume":"11","author":"Smatti","year":"2019","journal-title":"Viruses"},{"issue":"7","key":"2024112613050073900_bib15","doi-asserted-by":"publisher","first-page":"3223","DOI":"10.1093\/nar\/gkad199","article-title":"Unmasking the tissue-resident eukaryotic DNA virome in humans","volume":"51","author":"Py\u00f6ri\u00e4","year":"2023","journal-title":"Nucleic Acids Res"},{"key":"2024112613050073900_bib16","doi-asserted-by":"publisher","first-page":"329","DOI":"10.3389\/fcimb.2021.657245","article-title":"The human bone marrow is host to the DNAs of several viruses","volume":"11","author":"Toppinen","year":"2021","journal-title":"Front Cell Infect Microbiol"},{"key":"2024112613050073900_bib17","doi-asserted-by":"publisher","first-page":"102353","DOI":"10.1016\/j.fsigen.2020.102353","article-title":"The landscape of persistent human DNA viruses in femoral bone","volume":"48","author":"Toppinen","year":"2020","journal-title":"Forensic Sci Int Genet"},{"issue":"2","key":"2024112613050073900_bib18","doi-asserted-by":"publisher","first-page":"141","DOI":"10.1007\/s10142-015-0433-4","article-title":"Insights from 20\u00a0years of bacterial genome sequencing","volume":"15","author":"Land","year":"2015","journal-title":"Functional Integrative Genomics"},{"issue":"6588","key":"2024112613050073900_bib19","doi-asserted-by":"publisher","first-page":"44","DOI":"10.1126\/science.abj6987","article-title":"The complete sequence of a human genome","volume":"376","author":"Nurk","year":"2022","journal-title":"Science"},{"key":"2024112613050073900_bib20","doi-asserted-by":"crossref","unstructured":"Qi W, Lim YW, Patrignani A, et al. 
The haplotype-resolved chromosome pairs of a heterozygous diploid African cassava cultivar reveal novel pan-genome and allele-specific transcriptome features. Gigascience. 2022;11:giac028. 10.1093\/gigascience\/giac028","DOI":"10.1093\/gigascience\/giac028"},{"key":"2024112613050073900_bib21","doi-asserted-by":"publisher","first-page":"687","DOI":"10.1038\/s41592-022-01440-3","article-title":"Chasing perfection: validation and polishing strategies for telomere-to-telomere genome assemblies","volume":"19","author":"Mc\u00a0Cartney","year":"2022","journal-title":"Nat Methods"},{"issue":"4","key":"2024112613050073900_bib22","doi-asserted-by":"publisher","first-page":"595","DOI":"10.1101\/gr.276723.122","article-title":"Implications of the first complete human genome assembly","volume":"32","author":"Alkan","year":"2022","journal-title":"Genome Res"},{"issue":"4","key":"2024112613050073900_bib23","doi-asserted-by":"publisher","first-page":"513","DOI":"10.1093\/bioinformatics\/btg005","article-title":"Alignment-free sequence comparison\u2014a review","volume":"19","author":"Vinga","year":"2003","journal-title":"Bioinformatics"},{"issue":"12","key":"2024112613050073900_bib24","doi-asserted-by":"publisher","first-page":"1615","DOI":"10.1089\/cmb.2009.0198","article-title":"Alignment-free sequence comparison (I): statistics and power","volume":"16","author":"Reinert","year":"2009","journal-title":"J Comput Biol"},{"issue":"11","key":"2024112613050073900_bib25","doi-asserted-by":"publisher","first-page":"1467","DOI":"10.1089\/cmb.2010.0056","article-title":"Alignment-free sequence comparison (II): theoretical power of comparison statistics","volume":"17","author":"Wan","year":"2010","journal-title":"J Comput Biol"},{"issue":"1","key":"2024112613050073900_bib26","doi-asserted-by":"publisher","first-page":"1","DOI":"10.1186\/s13059-017-1319-7","article-title":"Alignment-free sequence comparison: benefits, applications, and 
tools","volume":"18","author":"Zielezinski","year":"2017","journal-title":"Genome Biol"},{"issue":"1","key":"2024112613050073900_bib27","doi-asserted-by":"publisher","first-page":"1","DOI":"10.1186\/s13059-019-1755-7","article-title":"Benchmarking of alignment-free sequence comparison methods","volume":"20","author":"Zielezinski","year":"2019","journal-title":"Genome Biol"},{"issue":"9","key":"2024112613050073900_bib28","doi-asserted-by":"publisher","first-page":"814","DOI":"10.1016\/j.tibtech.2017.03.006","article-title":"Microbiome tools for forensic science","volume":"35","author":"Metcalf","year":"2017","journal-title":"Trends Biotechnol"},{"issue":"3","key":"2024112613050073900_bib29","doi-asserted-by":"publisher","first-page":"1047","DOI":"10.1093\/bib\/bbz041","article-title":"iLearn: an integrated platform and meta-learner for feature engineering, machine-learning analysis and modeling of DNA, RNA and protein sequence data","volume":"21","author":"Chen","year":"2020","journal-title":"Brief Bioinform"},{"issue":"16","key":"2024112613050073900_bib30","doi-asserted-by":"publisher","first-page":"2586","DOI":"10.1093\/bioinformatics\/btx223","article-title":"DMINDA 2.0: integrated and systematic views of regulatory DNA motif identification and analyses","volume":"33","author":"Yang","year":"2017","journal-title":"Bioinformatics"},{"issue":"23","key":"2024112613050073900_bib31","doi-asserted-by":"publisher","first-page":"3983","DOI":"10.1093\/bioinformatics\/bty476","article-title":"Meffil: efficient normalization and analysis of very large DNA methylation datasets","volume":"34","author":"Min","year":"2018","journal-title":"Bioinformatics"},{"issue":"1","key":"2024112613050073900_bib32","doi-asserted-by":"publisher","first-page":"1","DOI":"10.1186\/1471-2105-9-11","article-title":"SeqAn an efficient, generic C++ library for sequence analysis","volume":"9","author":"D\u00f6ring","year":"2008","journal-title":"BMC 
Bioinformatics"},{"key":"2024112613050073900_bib33","doi-asserted-by":"publisher","first-page":"900","DOI":"10.12688\/f1000research.6924.1","article-title":"The khmer software package: enabling efficient nucleotide sequence analysis","volume":"4","author":"Crusoe","year":"2015","journal-title":"F1000Research"},{"issue":"15","key":"2024112613050073900_bib34","doi-asserted-by":"publisher","first-page":"272","DOI":"10.21105\/joss.00272","article-title":"khmer release v2.1: software for biological sequence analysis","volume":"2","author":"Standage","year":"2017","journal-title":"J Open Source Softw"},{"key":"2024112613050073900_bib35","doi-asserted-by":"publisher","first-page":"100535","DOI":"10.1016\/j.softx.2020.100535","article-title":"GTO: a toolkit to unify pipelines in genomic and proteomic research","volume":"12","author":"Almeida","year":"2020","journal-title":"SoftwareX"},{"issue":"20","key":"2024112613050073900_bib36","doi-asserted-by":"publisher","first-page":"2959","DOI":"10.1093\/bioinformatics\/btu406","article-title":"GATB: genome assembly & analysis tool box","volume":"30","author":"Drezen","year":"2014","journal-title":"Bioinformatics"},{"issue":"W1","key":"2024112613050073900_bib37","doi-asserted-by":"publisher","first-page":"W102","DOI":"10.1093\/nar\/gky406","article-title":"Mutalisk: a web-based somatic MUTation AnaLyIS toolKit for genomic, transcriptional and epigenomic signatures","volume":"46","author":"Lee","year":"2018","journal-title":"Nucleic Acids Res"},{"issue":"9","key":"2024112613050073900_bib38","doi-asserted-by":"publisher","first-page":"1290","DOI":"10.1093\/bioinformatics\/btt756","article-title":"CGAT: computational genomics analysis toolkit","volume":"30","author":"Sims","year":"2014","journal-title":"Bioinformatics"},{"key":"2024112613050073900_bib39","doi-asserted-by":"crossref","unstructured":"Hiltemann S, Mei H, de\u00a0Hollander M, et al. CGtag: complete genomics toolkit and annotation in a cloud-based Galaxy. Gigascience. 
2014;3(1):2047\u2013217X-3-1. 10.1186\/2047-217X-3-1.","DOI":"10.1186\/2047-217X-3-1"},{"key":"2024112613050073900_bib40","doi-asserted-by":"crossref","unstructured":"de\u00a0Koning W, Miladi M, Hiltemann S, et al. NanoGalaxy: nanopore long-read sequencing data analysis in Galaxy. Gigascience. 2020;9(10):giaa105. 10.1093\/gigascience\/giaa105.","DOI":"10.1093\/gigascience\/giaa105"},{"key":"2024112613050073900_bib41","doi-asserted-by":"crossref","unstructured":"Silva JM, Qi W, Pinho AJ, et al. AlcoR: alignment-free simulation, mapping, and visualization of low-complexity regions in biological data. Gigascience. 2023;12:giad101. 10.1093\/gigascience\/giad101.","DOI":"10.1093\/gigascience\/giad101"},{"issue":"23","key":"2024112613050073900_bib42","doi-asserted-by":"publisher","first-page":"3399","DOI":"10.1093\/bioinformatics\/btu555","article-title":"Poretools: a toolkit for analyzing nanopore sequence data","volume":"30","author":"Loman","year":"2014","journal-title":"Bioinformatics"},{"issue":"8","key":"2024112613050073900_bib43","doi-asserted-by":"publisher","first-page":"1","DOI":"10.1186\/gb-2007-8-8-r171","article-title":"PyCogent: a toolkit for making sense from sequence","volume":"8","author":"Knight","year":"2007","journal-title":"Genome Biol"},{"issue":"10","key":"2024112613050073900_bib44","doi-asserted-by":"publisher","first-page":"e0163962","DOI":"10.1371\/journal.pone.0163962","article-title":"SeqKit: a cross-platform and ultrafast toolkit for FASTA\/Q file manipulation","volume":"11","author":"Shen","year":"2016","journal-title":"PLoS One"},{"key":"2024112613050073900_bib45","doi-asserted-by":"publisher","first-page":"e230","DOI":"10.1038\/mtna.2015.4","article-title":"FASTAptamer: a bioinformatic toolkit for high-throughput sequence analysis of combinatorial selections","volume":"4","author":"Alam","year":"2015","journal-title":"Mol Ther Nucl Acids"},{"key":"2024112613050073900_bib46","first-page":"48","article-title":"fairseq: a fast, extensible 
toolkit for sequence modeling","volume-title":"Proceedings of the 2019 Conference of the North American Chapter of the Association for Computational Linguistics (Demonstrations)","author":"Ott","year":"2019"},{"issue":"8","key":"2024112613050073900_bib47","doi-asserted-by":"publisher","first-page":"1194","DOI":"10.1016\/j.molp.2020.06.009","article-title":"TBtools: an integrative toolkit developed for interactive analyses of big biological data","volume":"13","author":"Chen","year":"2020","journal-title":"Mol Plant"},{"issue":"1","key":"2024112613050073900_bib48","doi-asserted-by":"publisher","first-page":"e108","DOI":"10.1002\/cpbi.108","article-title":"Protein sequence analysis using the MPI bioinformatics toolkit","volume":"72","author":"Gabler","year":"2020","journal-title":"Curr Protoc Bioinform"},{"issue":"7","key":"2024112613050073900_bib49","doi-asserted-by":"publisher","first-page":"566","DOI":"10.1038\/nbt.4163","article-title":"KBase: the United States Department of Energy Systems Biology Knowledgebase","volume":"36","author":"Arkin","year":"2018","journal-title":"Nat Biotechnol"},{"issue":"19","key":"2024112613050073900_bib50","doi-asserted-by":"publisher","first-page":"3826","DOI":"10.1093\/bioinformatics\/btz144","article-title":"Nucleotide archival format (NAF) enables efficient lossless reference-free compression of DNA sequences","volume":"35","author":"Kryukov","year":"2019","journal-title":"Bioinformatics"},{"issue":"3","key":"2024112613050073900_bib51","doi-asserted-by":"publisher","first-page":"btad097","DOI":"10.1093\/bioinformatics\/btad097","article-title":"AGC: compact representation of assembled genomes with fast queries and updates","volume":"39","author":"Deorowicz","year":"2023","journal-title":"Bioinformatics"},{"key":"2024112613050073900_bib52","doi-asserted-by":"crossref","unstructured":"Grabowski S, Kowalski TM. MBGC: Multiple Bacteria Genome Compressor. Gigascience. 2022;11:giab099. 
10.1093\/gigascience\/giab099.","DOI":"10.1093\/gigascience\/giab099"},{"issue":"1","key":"2024112613050073900_bib53","doi-asserted-by":"publisher","first-page":"117","DOI":"10.1093\/bioinformatics\/btt594","article-title":"MFCompress: a compression tool for FASTA and multi-FASTA data","volume":"30","author":"Pinho","year":"2014","journal-title":"Bioinformatics"},{"issue":"1","key":"2024112613050073900_bib54","doi-asserted-by":"publisher","first-page":"146","DOI":"10.1093\/bioinformatics\/bty645","article-title":"Cryfa: a secure encryption tool for genomic data","volume":"35","author":"Hosseini","year":"2019","journal-title":"Bioinformatics"},{"issue":"6","key":"2024112613050073900_bib55","doi-asserted-by":"publisher","first-page":"e1006277","DOI":"10.1371\/journal.pcbi.1006277","article-title":"Removing contaminants from databases of draft genomes","volume":"14","author":"Lu","year":"2018","journal-title":"PLoS Comput Biol"},{"key":"2024112613050073900_bib56","doi-asserted-by":"publisher","first-page":"17","DOI":"10.1186\/s13323-014-0017-4","article-title":"Editors\u2019 pick: contamination has always been the issue!","volume":"5","author":"Sajantila","year":"2014","journal-title":"BioMed Central"},{"issue":"12","key":"2024112613050073900_bib57","doi-asserted-by":"publisher","first-page":"3250","DOI":"10.1109\/TIT.2004.838101","article-title":"The similarity metric","volume":"50","author":"Li","year":"2004","journal-title":"IEEE Trans Inform Theory"},{"key":"2024112613050073900_bib58","doi-asserted-by":"publisher","DOI":"10.1007\/978-0-387-49820-1","article-title":"An introduction to Kolmogorov complexity and its applications","author":"Li","year":"2008"},{"issue":"1","key":"2024112613050073900_bib59","doi-asserted-by":"publisher","first-page":"1","DOI":"10.1080\/00207166808803030","article-title":"Three approaches to the quantitative definition of information","volume":"1","author":"Kolmogorov","year":"1965","journal-title":"Probl Inf 
Transm"},{"issue":"6","key":"2024112613050073900_bib60","doi-asserted-by":"publisher","first-page":"393","DOI":"10.3390\/e20060393","article-title":"Comparison of compression-based measures with application to the evolution of primate genomes","volume":"20","author":"Pratas","year":"2018","journal-title":"Entropy"},{"issue":"4","key":"2024112613050073900_bib61","doi-asserted-by":"publisher","first-page":"439","DOI":"10.3390\/e24040439","article-title":"Fast phylogeny of SARS-CoV-2 by compression","volume":"24","author":"Cilibrasi","year":"2022","journal-title":"Entropy"},{"issue":"5","key":"2024112613050073900_bib62","doi-asserted-by":"publisher","first-page":"530","DOI":"10.3390\/e23050530","article-title":"AC2: an efficient protein sequence compression tool using artificial neural networks and Cache-Hash models","volume":"23","author":"Silva","year":"2021","journal-title":"Entropy"},{"issue":"4","key":"2024112613050073900_bib63","doi-asserted-by":"publisher","first-page":"367","DOI":"10.4310\/CIS.2005.v5.n4.a1","article-title":"Common pitfalls using the normalized compression distance: what to watch out for in a compressor","volume":"5","author":"Cebri\u00e1n","year":"2005","journal-title":"Commun Inform Syst"},{"key":"2024112613050073900_bib64","doi-asserted-by":"publisher","first-page":"228","DOI":"10.1007\/978-3-319-60816-7_28","article-title":"On the role of inverted repeats in DNA sequence similarity","volume-title":"International Conference on Practical Applications of Computational Biology & Bioinformatics","author":"Hosseini","year":"2017"},{"key":"2024112613050073900_bib65","doi-asserted-by":"publisher","first-page":"265","DOI":"10.1007\/978-3-319-60816-7_32","article-title":"Substitutional tolerant Markov models for relative compression of DNA sequences","volume-title":"International Conference on Practical Applications of Computational Biology & 
Bioinformatics","author":"Pratas","year":"2017"},{"key":"2024112613050073900_bib66","doi-asserted-by":"crossref","unstructured":"Silva M, Pratas D, Pinho AJ. Efficient DNA sequence compression with neural networks. Gigascience. 2020;9(11):giaa119. 10.1093\/gigascience\/giaa119.","DOI":"10.1093\/gigascience\/giaa119"},{"key":"2024112613050073900_bib67","doi-asserted-by":"publisher","first-page":"259","DOI":"10.1007\/978-3-319-58838-4_29","article-title":"On the approximation of the Kolmogorov complexity for DNA sequences","volume-title":"Iberian Conference on Pattern Recognition and Image Analysis","author":"Pratas","year":"2017"},{"issue":"15","key":"2024112613050073900_bib68","doi-asserted-by":"publisher","first-page":"2421","DOI":"10.1093\/bioinformatics\/btv189","article-title":"Three minimal sequences found in Ebola virus genomes and absent from human DNA","volume":"31","author":"Silva","year":"2015","journal-title":"Bioinformatics"},{"key":"2024112613050073900_bib69","doi-asserted-by":"publisher","first-page":"5129","DOI":"10.1093\/bioinformatics\/btaa686","article-title":"Persistent minimal sequences of SARS-CoV-2","volume":"36","author":"Pratas","year":"2020","journal-title":"Bioinformatics"},{"article-title":"Compression and analysis of genomic data","year":"2016","author":"Pratas","key":"2024112613050073900_bib70"},{"key":"2024112613050073900_bib71","first-page":"555","article-title":"Minimal forbidden words and symbolic dynamics","volume-title":"Annual Symposium on Theoretical Aspects of Computer Science","author":"B\u00e9al","year":"1996"},{"issue":"3","key":"2024112613050073900_bib72","doi-asserted-by":"publisher","first-page":"111","DOI":"10.1016\/S0020-0190(98)00104-5","article-title":"Automata and forbidden words","volume":"67","author":"Crochemore","year":"1998","journal-title":"Inf Process Lett"},{"issue":"1","key":"2024112613050073900_bib73","doi-asserted-by":"publisher","first-page":"1","DOI":"10.1186\/1471-2105-10-137","article-title":"On 
finding minimal absent words","volume":"10","author":"Pinho","year":"2009","journal-title":"BMC Bioinformatics"},{"key":"2024112613050073900_bib74","doi-asserted-by":"publisher","first-page":"115","DOI":"10.1007\/978-3-030-89716-1_8","article-title":"Absent subsequences in words","volume-title":"International Conference on Reachability Problems","author":"Kosche","year":"2021"},{"key":"2024112613050073900_bib75","first-page":"1","article-title":"Constructing strings avoiding forbidden substrings","volume":"191","author":"Bernardini","year":"2021","journal-title":"32nd Annual Symposium on Combinatorial Pattern Matching (CPM 2021)"},{"issue":"6","key":"2024112613050073900_bib76","doi-asserted-by":"publisher","first-page":"3139","DOI":"10.1093\/nar\/gkab139","article-title":"Significant non-existence of sequences in genomes and proteomes","volume":"49","author":"Koulouras","year":"2021","journal-title":"Nucleic Acids Res"},{"issue":"10","key":"2024112613050073900_bib77","doi-asserted-by":"publisher","first-page":"1468","DOI":"10.1093\/bioinformatics\/btaa853","article-title":"ADACT: a tool for analysing (dis) similarity among nucleotide and protein sequences using minimal and relative absent words","volume":"37","author":"Akon","year":"2021","journal-title":"Bioinformatics"},{"key":"2024112613050073900_bib78","article-title":"AltaiR: a C toolkit for alignment-free and temporal analysis of multi-FASTA data","author":"cobilab","year":"2024","journal-title":"cobilab"},{"year":"2023","author":"NCBI. 
NCBI Virus","key":"2024112613050073900_bib79"},{"issue":"4","key":"2024112613050073900_bib80","doi-asserted-by":"publisher","first-page":"537","DOI":"10.1038\/s12276-021-00604-z","article-title":"On the origin and evolution of SARS-CoV-2","volume":"53","author":"Singh","year":"2021","journal-title":"Exp Mol Med"},{"issue":"24","key":"2024112613050073900_bib81","doi-asserted-by":"publisher","first-page":"13910","DOI":"10.1073\/pnas.96.24.13910","article-title":"Mutation rates among RNA viruses","volume":"96","author":"Drake","year":"1999","journal-title":"Proc Natl Acad Sci"},{"issue":"19","key":"2024112613050073900_bib82","doi-asserted-by":"publisher","first-page":"9733","DOI":"10.1128\/JVI.00694-10","article-title":"Viral mutation rates","volume":"84","author":"Sanju\u00e1n","year":"2010","journal-title":"J Virol"},{"issue":"4","key":"2024112613050073900_bib83","doi-asserted-by":"publisher","first-page":"794","DOI":"10.1016\/j.cell.2020.06.040","article-title":"Making sense of mutation: what D614G means for the COVID-19 pandemic remains unclear","volume":"182","author":"Grubaugh","year":"2020","journal-title":"Cell"},{"issue":"7","key":"2024112613050073900_bib84","doi-asserted-by":"publisher","first-page":"409","DOI":"10.1038\/s41579-021-00573-0","article-title":"SARS-CoV-2 variants, spike mutations and immune escape","volume":"19","author":"Harvey","year":"2021","journal-title":"Nat Rev Microbiol"},{"issue":"4","key":"2024112613050073900_bib85","doi-asserted-by":"publisher","first-page":"812","DOI":"10.1016\/j.cell.2020.06.043","article-title":"Tracking changes in SARS-CoV-2 spike: evidence that D614G increases infectivity of the COVID-19 virus","volume":"182","author":"Korber","year":"2020","journal-title":"Cell"},{"issue":"7852","key":"2024112613050073900_bib86","doi-asserted-by":"publisher","first-page":"116","DOI":"10.1038\/s41586-020-2895-3","article-title":"Spike mutation D614G alters SARS-CoV-2 
fitness","volume":"592","author":"Plante","year":"2021","journal-title":"Nature"},{"issue":"2","key":"2024112613050073900_bib87","doi-asserted-by":"publisher","first-page":"281","DOI":"10.1016\/j.cell.2020.02.058","article-title":"Structure, function, and antigenicity of the SARS-CoV-2 spike glycoprotein","volume":"181","author":"Walls","year":"2020","journal-title":"Cell"},{"volume-title":"Homo sapiens genome assembly T2T-CHM13v2.0.","year":"2023","author":"NCBI","key":"2024112613050073900_bib88"},{"article-title":"Human Genome Resources at NCBI","year":"2023","author":"NCBI","key":"2024112613050073900_bib89"},{"issue":"1","key":"2024112613050073900_bib90","doi-asserted-by":"publisher","first-page":"12331","DOI":"10.1038\/s41598-020-69342-y","article-title":"Human SARS-CoV-2 has evolved to reduce CG dinucleotide in its open reading frames","volume":"10","author":"Wang","year":"2020","journal-title":"Sci Rep UK"},{"issue":"7674","key":"2024112613050073900_bib91","doi-asserted-by":"publisher","first-page":"124","DOI":"10.1038\/nature24039","article-title":"CG dinucleotide suppression enables antiviral defence targeting non-self RNA","volume":"550","author":"Takata","year":"2017","journal-title":"Nature"},{"issue":"1","key":"2024112613050073900_bib92","doi-asserted-by":"publisher","first-page":"2420","DOI":"10.1038\/s41598-022-06046-5","article-title":"The low abundance of CpG in the SARS-CoV-2 genome is not an evolutionarily signature of ZAP","volume":"12","author":"Afrasiabi","year":"2022","journal-title":"Sci Rep UK"},{"issue":"1","key":"2024112613050073900_bib93","doi-asserted-by":"publisher","first-page":"443","DOI":"10.1038\/s42003-023-04784-4","article-title":"HaploCoV: unsupervised classification and rapid detection of novel emerging variants of SARS-CoV-2","volume":"6","author":"Chiara","year":"2023","journal-title":"Commun Biol"},{"volume-title":"Bioinformatics pipeline for analyzing SARS-CoV-2 
genomes","author":"B","key":"2024112613050073900_bib94"},{"volume-title":"HaploCoV: a tool for haplotype analysis in SARS-CoV-2 genomes","author":"Chiara","key":"2024112613050073900_bib95","doi-asserted-by":"publisher","DOI":"10.1038\/s42003-023-04784-4"},{"key":"2024112613050073900_bib96","doi-asserted-by":"publisher","first-page":"32","DOI":"10.1016\/j.bios.2016.04.091","article-title":"Aptamers, antibody scFv, and antibody Fab\u2019 fragments: an overview and comparison of three of the most versatile biosensor biorecognition elements","volume":"85","author":"Crivianu-Gaita","year":"2016","journal-title":"Biosensors Bioelectronics"},{"issue":"7","key":"2024112613050073900_bib97","doi-asserted-by":"publisher","first-page":"537","DOI":"10.1038\/nrd3141","article-title":"Aptamers as therapeutics","volume":"9","author":"Keefe","year":"2010","journal-title":"Nat Rev Drug Discov"},{"issue":"1","key":"2024112613050073900_bib98","doi-asserted-by":"publisher","first-page":"6074","DOI":"10.1038\/s41598-021-85629-0","article-title":"AptaNet as a deep learning approach for aptamer\u2013protein interaction prediction","volume":"11","author":"Emami","year":"2021","journal-title":"Sci Rep UK"},{"key":"2024112613050073900_bib99","doi-asserted-by":"publisher","first-page":"114096","DOI":"10.1016\/j.bcp.2020.114096","article-title":"Animal toxins\u2014Nature\u2019s evolutionary-refined toolkit for basic research and drug discovery","volume":"181","author":"Herzig","year":"2020","journal-title":"Biochem Pharmacol"},{"key":"2024112613050073900_bib100","doi-asserted-by":"crossref","unstructured":"Pratas D, Toppinen M, Py\u00f6ri\u00e4 L, et al. A hybrid pipeline for reconstruction and analysis of viral genomes at multi-organ level. Gigascience. 2020;9(8):giaa086. 
10.1093\/gigascience\/giaa086.","DOI":"10.1093\/gigascience\/giaa086"},{"issue":"1","key":"2024112613050073900_bib101","doi-asserted-by":"publisher","first-page":"1","DOI":"10.1186\/s12870-021-03383-x","article-title":"Identification of Pueraria spp. through DNA barcoding and comparative transcriptomics","volume":"22","author":"Adolfo","year":"2022","journal-title":"BMC Plant Biol"},{"key":"2024112613050073900_bib102","doi-asserted-by":"crossref","unstructured":"Silva JM, Pinho AJ, Pratas D. Supporting data for \u201cAltaiR: A C Toolkit for Alignment-Free and Temporal Analysis of Multi-FASTA Data.\u201d GigaScience Database. 2024. 10.5524\/102587.","DOI":"10.5524\/102587"}],"container-title":["GigaScience"],"original-title":[],"language":"en","link":[{"URL":"https:\/\/academic.oup.com\/gigascience\/article-pdf\/doi\/10.1093\/gigascience\/giae086\/60816901\/giae086.pdf","content-type":"application\/pdf","content-version":"vor","intended-application":"syndication"},{"URL":"https:\/\/academic.oup.com\/gigascience\/article-pdf\/doi\/10.1093\/gigascience\/giae086\/60816901\/giae086.pdf","content-type":"unspecified","content-version":"vor","intended-application":"similarity-checking"}],"deposited":{"date-parts":[[2024,11,29]],"date-time":"2024-11-29T17:50:23Z","timestamp":1732902623000},"score":1,"resource":{"primary":{"URL":"https:\/\/academic.oup.com\/gigascience\/article\/doi\/10.1093\/gigascience\/giae086\/7908817"}},"subtitle":[],"short-title":[],"issued":{"date-parts":[[2024]]},"references-count":102,"URL":"https:\/\/doi.org\/10.1093\/gigascience\/giae086","relation":{},"ISSN":["2047-217X"],"issn-type":[{"type":"electronic","value":"2047-217X"}],"subject":[],"published-other":{"date-parts":[[2024]]},"published":{"date-parts":[[2024]]},"article-number":"giae086"}}