{"status":"ok","message-type":"work","message-version":"1.0.0","message":{"indexed":{"date-parts":[[2026,1,31]],"date-time":"2026-01-31T06:24:25Z","timestamp":1769840665246,"version":"3.49.0"},"reference-count":91,"publisher":"MDPI AG","issue":"5","license":[{"start":{"date-parts":[[2021,4,26]],"date-time":"2021-04-26T00:00:00Z","timestamp":1619395200000},"content-version":"vor","delay-in-days":0,"URL":"https:\/\/creativecommons.org\/licenses\/by\/4.0\/"}],"content-domain":{"domain":[],"crossmark-restriction":false},"short-container-title":["Entropy"],"abstract":"<jats:p>Recently, the scientific community has witnessed a substantial increase in the generation of protein sequence data, triggering emergent challenges of increasing importance, namely efficient storage and improved data analysis. For both applications, data compression is a straightforward solution. However, in the literature, the number of specific protein sequence compressors is relatively low. Moreover, these specialized compressors marginally improve the compression ratio over the best general-purpose compressors. In this paper, we present AC2, a new lossless data compressor for protein (or amino acid) sequences. AC2 uses a neural network to mix experts with a stacked generalization approach and individual cache-hash memory models to the highest-context orders. Compared to the previous compressor (AC), we show gains of 2\u20139% and 6\u20137% in reference-free and reference-based modes, respectively. These gains come at the cost of three times slower computations. AC2 also improves memory usage against AC, with requirements about seven times lower, without being affected by the sequences\u2019 input size. As an analysis application, we use AC2 to measure the similarity between each SARS-CoV-2 protein sequence with each viral protein sequence from the whole UniProt database. The results consistently show higher similarity to the pangolin coronavirus, followed by the bat and human coronaviruses, contributing with critical results to a current controversial subject. AC2 is available for free download under GPLv3 license.<\/jats:p>","DOI":"10.3390\/e23050530","type":"journal-article","created":{"date-parts":[[2021,4,27]],"date-time":"2021-04-27T02:31:43Z","timestamp":1619490703000},"page":"530","update-policy":"https:\/\/doi.org\/10.3390\/mdpi_crossmark_policy","source":"Crossref","is-referenced-by-count":10,"title":["AC2: An Efficient Protein Sequence Compression Tool Using Artificial Neural Networks and Cache-Hash Models"],"prefix":"10.3390","volume":"23","author":[{"ORCID":"https:\/\/orcid.org\/0000-0002-7535-4933","authenticated-orcid":false,"given":"Milton","family":"Silva","sequence":"first","affiliation":[{"name":"IEETA\u2014Institute of Electronics and Informatics Engineering of Aveiro, University of Aveiro, 3810-193 Aveiro, Portugal"},{"name":"Department of Electronics Telecommunications and Informatics, University of Aveiro, 3810-193 Aveiro, Portugal"}]},{"ORCID":"https:\/\/orcid.org\/0000-0003-1176-552X","authenticated-orcid":false,"given":"Diogo","family":"Pratas","sequence":"additional","affiliation":[{"name":"IEETA\u2014Institute of Electronics and Informatics Engineering of Aveiro, University of Aveiro, 3810-193 Aveiro, Portugal"},{"name":"Department of Electronics Telecommunications and Informatics, University of Aveiro, 3810-193 Aveiro, Portugal"},{"name":"Department of Virology, University of Helsinki, 00014 Helsinki, Finland"}]},{"ORCID":"https:\/\/orcid.org\/0000-0002-9164-0016","authenticated-orcid":false,"given":"Armando J.","family":"Pinho","sequence":"additional","affiliation":[{"name":"IEETA\u2014Institute of Electronics and Informatics Engineering of Aveiro, University of Aveiro, 3810-193 Aveiro, Portugal"},{"name":"Department of Electronics Telecommunications and Informatics, University of Aveiro, 3810-193 Aveiro, Portugal"}]}],"member":"1968","published-online":{"date-parts":[[2021,4,26]]},"reference":[{"key":"ref_1","doi-asserted-by":"crossref","first-page":"1042","DOI":"10.1126\/science.1219021","article-title":"The protein-folding problem, 50 years on","volume":"338","author":"Dill","year":"2012","journal-title":"Science"},{"key":"ref_2","doi-asserted-by":"crossref","first-page":"1144","DOI":"10.1126\/science.370.6521.1144","article-title":"\u2018The game has changed.\u2019 AI triumphs at protein folding","volume":"370","author":"Service","year":"2020","journal-title":"Science"},{"key":"ref_3","doi-asserted-by":"crossref","unstructured":"Golan, A. (2018). Foundations of Info-Metrics: Modeling, Inference, and Imperfect Information, Oxford University Press.","DOI":"10.1093\/oso\/9780199349524.001.0001"},{"key":"ref_4","doi-asserted-by":"crossref","unstructured":"Sayood, K. (2017). Introduction to Data Compression, Morgan Kaufmann.","DOI":"10.1016\/B978-0-12-809474-7.00019-7"},{"key":"ref_5","unstructured":"Baxevanis, A.D., Bader, G.D., and Wishart, D.S. (2020). Bioinformatics, John Wiley & Sons."},{"key":"ref_6","doi-asserted-by":"crossref","unstructured":"Amich, M., De Luca, P., and Fiscale, S. (2020, January 5\u20138). Accelerated implementation of FQSqueezer novel genomic compression method. Proceedings of the 2020 19th International Symposium on Parallel and Distributed Computing (ISPDC), Warsaw, Poland.","DOI":"10.1109\/ISPDC51135.2020.00030"},{"key":"ref_7","doi-asserted-by":"crossref","first-page":"34","DOI":"10.3390\/e12010034","article-title":"Data compression concepts and algorithms and their applications to bioinformatics","volume":"12","author":"Nalbantoglu","year":"2010","journal-title":"Entropy"},{"key":"ref_8","doi-asserted-by":"crossref","unstructured":"Pratas, D., and Pinho, A.J. (2017, January 20\u201323). On the approximation of the Kolmogorov complexity for DNA sequences. Proceedings of the Iberian Conference on Pattern Recognition and Image Analysis, Faro Portugal.","DOI":"10.1007\/978-3-319-58838-4_29"},{"key":"ref_9","doi-asserted-by":"crossref","first-page":"68","DOI":"10.1007\/s12539-019-00322-1","article-title":"AC: A compression tool for amino acid sequences","volume":"11","author":"Hosseini","year":"2019","journal-title":"Interdiscip. Sci. Comput. Life Sci."},{"key":"ref_10","doi-asserted-by":"crossref","first-page":"407","DOI":"10.1093\/bioinformatics\/bti806","article-title":"Application of compression-based distance measures to protein sequence classification: A methodological study","volume":"22","author":"Kocsor","year":"2005","journal-title":"Bioinformatics"},{"key":"ref_11","doi-asserted-by":"crossref","unstructured":"Ferragina, P., Giancarlo, R., Greco, V., Manzini, G., and Valiente, G. (2007). Compression-based classification of biological sequences and structures via the Universal Similarity Metric: Experimental assessment. BMC Bioinform., 8.","DOI":"10.1186\/1471-2105-8-252"},{"key":"ref_12","doi-asserted-by":"crossref","unstructured":"Cilibrasi, R.L., and Vit\u00e1nyi, P.M. (2020). Fast Whole-Genome Phylogeny of the COVID-19 Virus SARS-CoV-2 by Compression. bioRxiv.","DOI":"10.1101\/2020.07.22.216242"},{"key":"ref_13","unstructured":"Cilibrasi, R.L. (2007). Statistical Inference through Data Compression. [Ph.D. Thesis, Universiteit van Amsterdam]."},{"key":"ref_14","doi-asserted-by":"crossref","unstructured":"Kuruppu, S., Puglisi, S.J., and Zobel, J. (2010, January 11\u201313). Relative Lempel-Ziv compression of genomes for large-scale storage and retrieval. Proceedings of the International Symposium on String Processing and Information Retrieval, Los Cabos, Mexico.","DOI":"10.1007\/978-3-642-16321-0_20"},{"key":"ref_15","doi-asserted-by":"crossref","first-page":"giaa048","DOI":"10.1093\/gigascience\/giaa048","article-title":"Smash++: An alignment-free and memory-efficient tool to find genomic rearrangements","volume":"9","author":"Hosseini","year":"2020","journal-title":"GigaScience"},{"key":"ref_16","doi-asserted-by":"crossref","first-page":"1413","DOI":"10.1093\/bioinformatics\/btz782","article-title":"An improved encoding of genetic variation in a Burrows\u2013Wheeler transform","volume":"36","author":"Ohlebusch","year":"2020","journal-title":"Bioinformatics"},{"key":"ref_17","doi-asserted-by":"crossref","unstructured":"Bywater, R.P. (2015). Prediction of protein structural features from sequence data based on Shannon entropy and Kolmogorov complexity. PLoS ONE, 10.","DOI":"10.1371\/journal.pone.0119306"},{"key":"ref_18","doi-asserted-by":"crossref","unstructured":"Subramanian, R., Allison, L., Stuckey, P.J., De La Banda, M.G., Abramson, D., Lesk, A.M., and Konagurthu, A.S. (2017, January 4\u20137). Statistical compression of protein folding patterns for inference of recurrent substructural themes. Proceedings of the 2017 Data Compression Conference (DCC), Snowbird, UT, USA.","DOI":"10.1109\/DCC.2017.46"},{"key":"ref_19","unstructured":"Beller, T., and Ohlebusch, E. (July, January 29). Efficient construction of a compressed de Bruijn graph for pan-genome analysis. Proceedings of the Annual Symposium on Combinatorial Pattern Matching, Ischia Island, Italy."},{"key":"ref_20","doi-asserted-by":"crossref","unstructured":"Pratas, D., and Pinho, A.J. (2018, January 3\u20137). Metagenomic composition analysis of sedimentary ancient DNA from the Isle of Wight. Proceedings of the 2018 26th European Signal Processing Conference (EUSIPCO), Eternal City, Italy.","DOI":"10.23919\/EUSIPCO.2018.8553297"},{"key":"ref_21","doi-asserted-by":"crossref","unstructured":"Wang, Y., Ding, Y., Guo, F., Wei, L., and Tang, J. (2017). Improved detection of DNA-binding proteins via compression technology on PSSM information. PLoS ONE, 12.","DOI":"10.1371\/journal.pone.0185587"},{"key":"ref_22","doi-asserted-by":"crossref","first-page":"380","DOI":"10.1016\/j.ymeth.2014.01.012","article-title":"Proteome compression via protein domain compositions","volume":"67","author":"Hayashida","year":"2014","journal-title":"Methods"},{"key":"ref_23","unstructured":"Hayashida, M., Ishibashi, K., and Koyano, H. (August, January 30). Analyzing Order of Domains in Grammar-based Compression of Proteomes. Proceedings of the International Conference on Parallel and Distributed Processing Techniques and Applications (PDPTA), Las Vegas, NV, USA."},{"key":"ref_24","doi-asserted-by":"crossref","unstructured":"Hosseini, M., Pratas, D., and Pinho, A.J. (2016). A survey on data compression methods for biological sequences. Information, 7.","DOI":"10.3390\/info7040056"},{"key":"ref_25","unstructured":"Hategan, A., and Tabus, I. (2004, January 9\u201311). Protein is compressible. Proceedings of the 6th Nordic Signal Processing Symposium, Espoo, Finland."},{"key":"ref_26","doi-asserted-by":"crossref","unstructured":"Hategan, A., and Tabus, I. (2007, January 10\u201312). Jointly Encoding Protein Sequences and their Secondary Structure Information. Proceedings of the 2007 IEEE International Workshop on Genomic Signal Processing and Statistics, Tuusula, Finland.","DOI":"10.1109\/GENSIPS.2007.4365849"},{"key":"ref_27","unstructured":"Adjeroh, D., and Nan, F. (2006, January 28\u201330). On compressibility of protein sequences. Proceedings of the Data Compression Conference (DCC\u201906), Snowbird, UT, USA."},{"key":"ref_28","doi-asserted-by":"crossref","first-page":"1","DOI":"10.1155\/2007\/60723","article-title":"Compressing proteomes: The relevance of medium range correlations","volume":"2007","author":"Benedetto","year":"2007","journal-title":"EURASIP J. Bioinform. Syst. Biol."},{"key":"ref_29","unstructured":"Cao, M.D., Dix, T.I., Allison, L., and Mears, C. (2007, January 27\u201329). A simple statistical algorithm for biological sequence compression. Proceedings of the 2007 Data Compression Conference (DCC\u201907), Snowbird, UT, USA."},{"key":"ref_30","doi-asserted-by":"crossref","first-page":"i283","DOI":"10.1093\/bioinformatics\/btt214","article-title":"Compressive genomics for protein databases","volume":"29","author":"Daniels","year":"2013","journal-title":"Bioinformatics"},{"key":"ref_31","first-page":"1","article-title":"Adaptive dictionary-based compression of protein sequences","volume":"5","author":"Nag","year":"2017","journal-title":"Int. J. Educ. Manag. Eng."},{"key":"ref_32","doi-asserted-by":"crossref","unstructured":"Pratas, D., Hosseini, M., and Pinho, A.J. (2018, January 20\u201322). Compression of amino acid sequences. Proceedings of the International Conference on Practical Applications of Computational Biology & Bioinformatics, Toledo, Spain.","DOI":"10.1007\/978-3-319-98702-6_13"},{"key":"ref_33","doi-asserted-by":"crossref","first-page":"3826","DOI":"10.1093\/bioinformatics\/btz144","article-title":"Nucleotide Archival Format (NAF) enables efficient lossless reference-free compression of DNA sequences","volume":"35","author":"Kryukov","year":"2019","journal-title":"Bioinformatics"},{"key":"ref_34","doi-asserted-by":"crossref","first-page":"227","DOI":"10.1093\/bioinformatics\/bty619","article-title":"CoMSA: Compression of protein multiple sequence alignment files","volume":"35","author":"Deorowicz","year":"2019","journal-title":"Bioinformatics"},{"key":"ref_35","doi-asserted-by":"crossref","unstructured":"Fulber-Garcia, V., and Sardi Mergen, S.L. (2020). LUISA: Decoupling the Frequency Model From the Context Model in Prediction-Based Compression. Comput. J.","DOI":"10.1093\/comjnl\/bxaa074"},{"key":"ref_36","doi-asserted-by":"crossref","first-page":"241","DOI":"10.1016\/S0893-6080(05)80023-1","article-title":"Stacked generalization","volume":"5","author":"Wolpert","year":"1992","journal-title":"Neural Netw."},{"key":"ref_37","doi-asserted-by":"crossref","first-page":"4675","DOI":"10.1093\/bioinformatics\/btaa572","article-title":"Allowing mutations in maximal matches boosts genome compression performance","volume":"36","author":"Liu","year":"2020","journal-title":"Bioinformatics"},{"key":"ref_38","doi-asserted-by":"crossref","unstructured":"Pratas, D., Hosseini, M., and Pinho, A.J. (2017, January 21\u201323). Substitutional tolerant Markov models for relative compression of DNA sequences. Proceedings of the International Conference on Practical Applications of Computational Biology & Bioinformatics, Porto, Portugal.","DOI":"10.1007\/978-3-319-60816-7_32"},{"key":"ref_39","doi-asserted-by":"crossref","unstructured":"Goyal, M., Tatwawadi, K., Chandak, S., and Ochoa, I. (2018). DeepZip: Lossless Data Compression using Recurrent Neural Networks. arXiv.","DOI":"10.1109\/DCC.2019.00087"},{"key":"ref_40","doi-asserted-by":"crossref","first-page":"79","DOI":"10.1016\/S0092-8240(89)80049-7","article-title":"Stochastic models for heterogeneous DNA sequences","volume":"51","author":"Churchill","year":"1989","journal-title":"Bull. Math. Biol."},{"key":"ref_41","doi-asserted-by":"crossref","unstructured":"Lara-Ben\u00edtez, P., Carranza-Garc\u00eda, M., Mart\u00ednez-\u00c1lvarez, F., and Riquelme, J.C. (2020). On the performance of deep learning models for time series classification in streaming. arXiv.","DOI":"10.1007\/978-3-030-57802-2_14"},{"key":"ref_42","doi-asserted-by":"crossref","first-page":"1351","DOI":"10.1016\/j.procs.2018.05.050","article-title":"NSE stock market prediction using deep-learning models","volume":"132","author":"Hiransha","year":"2018","journal-title":"Procedia Comput. Sci."},{"key":"ref_43","doi-asserted-by":"crossref","first-page":"291","DOI":"10.1016\/j.neucom.2018.09.098","article-title":"Hierarchical temporal memory and recurrent neural networks for time series prediction: An empirical validation and reduction to multilayer perceptrons","volume":"396","author":"Struye","year":"2020","journal-title":"Neurocomputing"},{"key":"ref_44","doi-asserted-by":"crossref","first-page":"917","DOI":"10.1007\/s10618-019-00619-1","article-title":"Deep learning for time series classification: A review","volume":"33","author":"Fawaz","year":"2019","journal-title":"Data Min. Knowl. Discov."},{"key":"ref_45","doi-asserted-by":"crossref","unstructured":"He, K., Zhang, X., Ren, S., and Sun, J. (2016, January 27\u201330). Deep residual learning for image recognition. Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition, Las Vegas, NV, USA.","DOI":"10.1109\/CVPR.2016.90"},{"key":"ref_46","doi-asserted-by":"crossref","unstructured":"Wang, Z., Yan, W., and Oates, T. (2017, January 14\u201319). Time series classification from scratch with deep neural networks: A strong baseline. Proceedings of the 2017 International joint conference on neural networks (IJCNN), Anchorage, AK, USA.","DOI":"10.1109\/IJCNN.2017.7966039"},{"key":"ref_47","doi-asserted-by":"crossref","unstructured":"Lin, T., Guo, T., and Aberer, K. (2017, January 19\u201325). Hybrid neural networks for learning the trend in time series. Proceedings of the Twenty-Sixth International Joint Conference on Artificial Intelligence, Melbourne, Australia.","DOI":"10.24963\/ijcai.2017\/316"},{"key":"ref_48","doi-asserted-by":"crossref","first-page":"1113","DOI":"10.1007\/s11600-020-00446-9","article-title":"Modelling reference evapotranspiration by combining neuro-fuzzy and evolutionary strategies","volume":"68","author":"Alizamir","year":"2020","journal-title":"Acta Geophys."},{"key":"ref_49","doi-asserted-by":"crossref","first-page":"319","DOI":"10.1016\/j.procs.2019.08.046","article-title":"A non-iterative neural-like framework for missing data imputation","volume":"155","author":"Tkachenko","year":"2019","journal-title":"Procedia Comput. Sci."},{"key":"ref_50","doi-asserted-by":"crossref","unstructured":"Tkachenko, R., and Izonin, I. (2018, January 18\u201320). Model and principles for the implementation of neural-like structures based on geometric data transformations. Proceedings of the International Conference on Computer Science, Engineering and Education Applications, Kiev, Ukraine.","DOI":"10.1007\/978-3-319-91008-6_58"},{"key":"ref_51","doi-asserted-by":"crossref","first-page":"216","DOI":"10.1162\/NECO_a_00793","article-title":"An empirical overview of the no free lunch theorem and its effect on real-world machine learning classification","volume":"28","author":"Rojas","year":"2016","journal-title":"Neural Comput."},{"key":"ref_52","doi-asserted-by":"crossref","unstructured":"Silva, M., Pratas, D., and Pinho, A.J. (2020). Efficient DNA sequence compression with neural networks. GigaScience, 9.","DOI":"10.1093\/gigascience\/giaa119"},{"key":"ref_53","unstructured":"Glorot, X., and Bengio, Y. (2010, January 13\u201315). Understanding the difficulty of training deep feedforward neural networks. Proceedings of the Thirteenth International Conference on Artificial Intelligence and Statistics, Sardinia, Italy."},{"key":"ref_54","doi-asserted-by":"crossref","unstructured":"LeCun, Y.A., Bottou, L., Orr, G.B., and M\u00fcller, K.R. (2012). Efficient backprop. Neural Networks: Tricks of the Trade, Springer.","DOI":"10.1007\/978-3-642-35289-8_3"},{"key":"ref_55","doi-asserted-by":"crossref","first-page":"400","DOI":"10.1214\/aoms\/1177729586","article-title":"A stochastic approximation method","volume":"22","author":"Robbins","year":"1951","journal-title":"Ann. Math. Stat."},{"key":"ref_56","doi-asserted-by":"crossref","unstructured":"Ferreira, P.J., and Pinho, A.J. (2014, January 4\u20139). Compression-based normal similarity measures for DNA sequences. Proceedings of the 2014 IEEE International Conference on Acoustics, Speech and Signal Processing (ICASSP), Florence, Italy.","DOI":"10.1109\/ICASSP.2014.6853630"},{"key":"ref_57","doi-asserted-by":"crossref","first-page":"giaa072","DOI":"10.1093\/gigascience\/giaa072","article-title":"Sequence Compression Benchmark (SCB) database\u2014A comprehensive evaluation of reference-free compressors for FASTA-formatted sequences","volume":"9","author":"Kryukov","year":"2020","journal-title":"GigaScience"},{"key":"ref_58","doi-asserted-by":"crossref","first-page":"D506","DOI":"10.1093\/nar\/gky1049","article-title":"UniProt: A worldwide hub of protein knowledge","volume":"47","author":"Consortium","year":"2019","journal-title":"Nucleic Acids Res."},{"key":"ref_59","doi-asserted-by":"crossref","first-page":"235","DOI":"10.1093\/nar\/28.1.235","article-title":"The protein data bank","volume":"28","author":"Berman","year":"2000","journal-title":"Nucleic Acids Res."},{"key":"ref_60","first-page":"D682","article-title":"Ensembl 2020","volume":"48","author":"Yates","year":"2020","journal-title":"Nucleic Acids Res."},{"key":"ref_61","unstructured":"Mahoney, M. (2020, October 18). Big Block BWT. Available online: http:\/\/mattmahoney.net\/dc\/#bbb."},{"key":"ref_62","unstructured":"Pavlov, I. (2021, April 23). Lzma Sdk (Software Development Kit). Available online: https:\/\/www.7-zip.org\/sdk.html."},{"key":"ref_63","unstructured":"Knoll, B. (2020, January 23). CMIX. Available online: http:\/\/www.byronknoll.com\/cmix.html."},{"key":"ref_64","unstructured":"(2020, May 06). BFLOAT16\u2014Hardware Numerics Definition. Available online: https:\/\/software.intel.com\/sites\/default\/files\/managed\/40\/8b\/bf16-hardware-numerics-definition-white-paper.pdf."},{"key":"ref_65","unstructured":"(2020, August 19). IBM Reveals Next-Generation IBM POWER10 Processor. Available online: https:\/\/newsroom.ibm.com\/2020-08-17-IBM-Reveals-Next-Generation-IBM-POWER10-Processor."},{"key":"ref_66","doi-asserted-by":"crossref","first-page":"100535","DOI":"10.1016\/j.softx.2020.100535","article-title":"GTO: A toolkit to unify pipelines in genomic and proteomic research","volume":"12","author":"Almeida","year":"2020","journal-title":"SoftwareX"},{"key":"ref_67","doi-asserted-by":"crossref","first-page":"65","DOI":"10.1038\/nrg705","article-title":"Segmental duplications and the evolution of the primate genome","volume":"3","author":"Samonte","year":"2002","journal-title":"Nat. Rev. Genet."},{"key":"ref_68","doi-asserted-by":"crossref","first-page":"1","DOI":"10.1186\/gb-2008-9-2-r28","article-title":"Hominoid chromosomal rearrangements on 17q map to complex regions of segmental duplication","volume":"9","author":"Cardone","year":"2008","journal-title":"Genome Biol."},{"key":"ref_69","doi-asserted-by":"crossref","first-page":"270","DOI":"10.1038\/s41586-020-2012-7","article-title":"A pneumonia outbreak associated with a new coronavirus of probable bat origin","volume":"579","author":"Zhou","year":"2020","journal-title":"Nature"},{"key":"ref_70","doi-asserted-by":"crossref","unstructured":"Xia, X. (2018). Bioinformatics and the Cell: Modern Computational Approaches in Genomics, Proteomics and Transcriptomics, Springer.","DOI":"10.1007\/978-3-319-90684-3"},{"key":"ref_71","doi-asserted-by":"crossref","first-page":"265","DOI":"10.1038\/s41586-020-2008-3","article-title":"A new coronavirus associated with human respiratory disease in China","volume":"579","author":"Wu","year":"2020","journal-title":"Nature"},{"key":"ref_72","doi-asserted-by":"crossref","first-page":"470","DOI":"10.1016\/S0140-6736(20)30185-9","article-title":"A novel coronavirus outbreak of global health concern","volume":"395","author":"Wang","year":"2020","journal-title":"Lancet"},{"key":"ref_73","doi-asserted-by":"crossref","first-page":"1088","DOI":"10.3389\/fmed.2020.607786","article-title":"Effects of environmental factors on severity and mortality of COVID-19","volume":"7","author":"Kifer","year":"2021","journal-title":"Front. Med."},{"key":"ref_74","doi-asserted-by":"crossref","unstructured":"Rusanen, J., Kareinen, L., Levanov, L., Mero, S., Pakkanen, S.H., Kantele, A., Amanat, F., Krammer, F., Hedman, K., and Vapalahti, O. (2021). A 10-Minute \u201cMix and Read\u201d Antibody Assay for SARS-CoV-2. Viruses, 13.","DOI":"10.3390\/v13020143"},{"key":"ref_75","doi-asserted-by":"crossref","first-page":"80","DOI":"10.1056\/NEJMc2032195","article-title":"Durability of responses after SARS-CoV-2 mRNA-1273 vaccination","volume":"384","author":"Widge","year":"2021","journal-title":"N. Engl. J. Med."},{"key":"ref_76","doi-asserted-by":"crossref","first-page":"952","DOI":"10.1016\/S0140-6736(21)00370-6","article-title":"SARS-CoV-2 variants and ending the COVID-19 pandemic","volume":"397","author":"Fontanet","year":"2021","journal-title":"Lancet"},{"key":"ref_77","doi-asserted-by":"crossref","first-page":"112906","DOI":"10.1016\/j.jim.2020.112906","article-title":"SARS-CoV-2 variants lacking a functional ORF8 may reduce accuracy of serological testing","volume":"488","author":"Pereira","year":"2021","journal-title":"J. Immunol. Methods"},{"key":"ref_78","doi-asserted-by":"crossref","first-page":"1346","DOI":"10.1016\/j.cub.2020.03.022","article-title":"Probable pangolin origin of SARS-CoV-2 associated with the COVID-19 outbreak","volume":"30","author":"Zhang","year":"2020","journal-title":"Curr. Biol."},{"key":"ref_79","doi-asserted-by":"crossref","first-page":"veaa098","DOI":"10.1093\/ve\/veaa098","article-title":"Synonymous mutations and the molecular evolution of SARS-Cov-2 origins","volume":"7","author":"Wang","year":"2021","journal-title":"Virus Evol."},{"key":"ref_80","doi-asserted-by":"crossref","first-page":"286","DOI":"10.1038\/s41586-020-2313-x","article-title":"Isolation of SARS-CoV-2-related coronavirus from Malayan pangolins","volume":"583","author":"Xiao","year":"2020","journal-title":"Nature"},{"key":"ref_81","first-page":"1","article-title":"Evidence for SARS-CoV-2 related coronaviruses circulating in bats and pangolins in Southeast Asia","volume":"12","author":"Wacharapluesadee","year":"2021","journal-title":"Nat. Commun."},{"key":"ref_82","doi-asserted-by":"crossref","first-page":"178","DOI":"10.1038\/d41586-020-01989-z","article-title":"Six months of coronavirus: The mysteries scientists are still racing to solve","volume":"583","author":"Callaway","year":"2020","journal-title":"Nature"},{"key":"ref_83","doi-asserted-by":"crossref","first-page":"1407","DOI":"10.1109\/18.681318","article-title":"Information distance","volume":"44","author":"Bennett","year":"1998","journal-title":"IEEE Trans. Inf. Theory"},{"key":"ref_84","doi-asserted-by":"crossref","first-page":"3250","DOI":"10.1109\/TIT.2004.838101","article-title":"The similarity metric","volume":"50","author":"Li","year":"2004","journal-title":"IEEE Trans. Inf. Theory"},{"key":"ref_85","doi-asserted-by":"crossref","unstructured":"Nikvand, N., Wang, Z., Farjow, W., Fernando, X., and Sadat-Nejad, S.Y. (2019, January 3\u20136). Perceptually Inspired Normalized Conditional Compression Distance. Proceedings of the 2019 53rd Asilomar Conference on Signals, Systems, and Computers, Pacific Grove, CA, USA.","DOI":"10.1109\/IEEECONF44664.2019.9048741"},{"key":"ref_86","doi-asserted-by":"crossref","first-page":"1","DOI":"10.1038\/s41467-020-19996-z","article-title":"Genetic architecture of host proteins involved in SARS-CoV-2 infection","volume":"11","author":"Pietzner","year":"2020","journal-title":"Nat. Commun."},{"key":"ref_87","doi-asserted-by":"crossref","first-page":"1819","DOI":"10.3201\/eid1911.131172","article-title":"Middle East respiratory syndrome coronavirus in bats, Saudi Arabia","volume":"19","author":"Memish","year":"2013","journal-title":"Emerg. Infect. Dis."},{"key":"ref_88","doi-asserted-by":"crossref","first-page":"660","DOI":"10.1128\/CMR.00023-07","article-title":"Severe acute respiratory syndrome coronavirus as an agent of emerging and reemerging infection","volume":"20","author":"Cheng","year":"2007","journal-title":"Clin. Microbiol. Rev."},{"key":"ref_89","doi-asserted-by":"crossref","first-page":"515","DOI":"10.1016\/j.tim.2020.04.001","article-title":"Pangolins harbor SARS-CoV-2-related coronaviruses","volume":"28","author":"Han","year":"2020","journal-title":"Trends Microbiol."},{"key":"ref_90","first-page":"1","article-title":"Three approaches to the quantitative definition of information","volume":"1","author":"Kolmogorov","year":"1965","journal-title":"Probl. Inf. Transm."},{"key":"ref_91","doi-asserted-by":"crossref","unstructured":"Silva, J.M., Pinho, E., Matos, S., and Pratas, D. (2020). Statistical Complexity Analysis of Turing Machine tapes with Fixed Algorithmic Complexity Using the Best-Order Markov Model. Entropy, 22.","DOI":"10.3390\/e22010105"}],"container-title":["Entropy"],"original-title":[],"language":"en","link":[{"URL":"https:\/\/www.mdpi.com\/1099-4300\/23\/5\/530\/pdf","content-type":"unspecified","content-version":"vor","intended-application":"similarity-checking"}],"deposited":{"date-parts":[[2025,10,11]],"date-time":"2025-10-11T05:52:48Z","timestamp":1760161968000},"score":1,"resource":{"primary":{"URL":"https:\/\/www.mdpi.com\/1099-4300\/23\/5\/530"}},"subtitle":[],"short-title":[],"issued":{"date-parts":[[2021,4,26]]},"references-count":91,"journal-issue":{"issue":"5","published-online":{"date-parts":[[2021,5]]}},"alternative-id":["e23050530"],"URL":"https:\/\/doi.org\/10.3390\/e23050530","relation":{},"ISSN":["1099-4300"],"issn-type":[{"value":"1099-4300","type":"electronic"}],"subject":[],"published":{"date-parts":[[2021,4,26]]}}}