{"status":"ok","message-type":"work","message-version":"1.0.0","message":{"indexed":{"date-parts":[[2025,10,12]],"date-time":"2025-10-12T02:56:34Z","timestamp":1760237794399,"version":"build-2065373602"},"reference-count":29,"publisher":"MDPI AG","issue":"6","license":[{"start":{"date-parts":[[2020,6,24]],"date-time":"2020-06-24T00:00:00Z","timestamp":1592956800000},"content-version":"vor","delay-in-days":0,"URL":"https:\/\/creativecommons.org\/licenses\/by\/4.0\/"}],"content-domain":{"domain":[],"crossmark-restriction":false},"short-container-title":["Algorithms"],"abstract":"<jats:p>The increase in memory and in network traffic used and caused by new sequenced biological data has recently deeply grown. Genomic projects such as HapMap and 1000 Genomes have contributed to the very large rise of databases and network traffic related to genomic data and to the development of new efficient technologies. The large-scale sequencing of samples of DNA has brought new attention and produced new research, and thus the interest in the scientific community for genomic data has greatly increased. In a very short time, researchers have developed hardware tools, analysis software, algorithms, private databases, and infrastructures to support the research in genomics. In this paper, we analyze different approaches for compressing digital files generated by Next-Generation Sequencing tools containing nucleotide sequences, and we discuss and evaluate the compression performance of generic compression algorithms by confronting them with a specific system designed by Jones et al. specifically for genomic file compression: Quip. 
Moreover, we present a simple but effective technique for the compression of DNA sequences in which we only consider the relevant DNA data and experimentally evaluate its performances.<\/jats:p>","DOI":"10.3390\/a13060151","type":"journal-article","created":{"date-parts":[[2020,6,24]],"date-time":"2020-06-24T08:54:50Z","timestamp":1592988890000},"page":"151","update-policy":"https:\/\/doi.org\/10.3390\/mdpi_crossmark_policy","source":"Crossref","is-referenced-by-count":4,"title":["Compression of Next-Generation Sequencing Data and of DNA Digital Files"],"prefix":"10.3390","volume":"13","author":[{"ORCID":"https:\/\/orcid.org\/0000-0003-1960-9986","authenticated-orcid":false,"given":"Bruno","family":"Carpentieri","sequence":"first","affiliation":[{"name":"Dipartimento di Informatica, Universit\u00e0 di Salerno; Via Giovanni Paolo II, 132-84084 Fisciano (SA), Italy"}],"role":[{"role":"author","vocabulary":"crossref"}]}],"member":"1968","published-online":{"date-parts":[[2020,6,24]]},"reference":[{"key":"ref_1","unstructured":"(2020, May 02). International HapMap Project, Available online: https:\/\/www.genome.gov\/10001688\/international-hapmap-project."},{"key":"ref_2","unstructured":"(2020, May 02). 1000 Genomes: A Deep Catalog of Human Genetic Variation. Available online: https:\/\/www.internationalgenome.org\/."},{"key":"ref_3","doi-asserted-by":"crossref","first-page":"147","DOI":"10.1186\/s13059-019-1763-7","article-title":"Challenges in funding and developing genomic software: Roots and remedies","volume":"20","author":"Siepel","year":"2019","journal-title":"Genome Biol."},{"key":"ref_4","doi-asserted-by":"crossref","first-page":"19","DOI":"10.1146\/annurev-biodatasci-072018-021229","article-title":"Genomic data compression","volume":"2","author":"Hernaez","year":"2019","journal-title":"Annu. Rev. Biomed. Data Sci."},{"key":"ref_5","unstructured":"Carpentieri, B. Next Generation Sequencing Data and its Compression. 
IOP Conference Series, Proceedings of the 5th World Multidisciplinary Earth Sciences Symposium (WMESS 2019), Prague, Czech Republic, 9\u201313 September 2019, IOP Publishing."},{"key":"ref_6","doi-asserted-by":"crossref","first-page":"e171","DOI":"10.1093\/nar\/gks754","article-title":"Compression of next-generation sequencing reads aided by highly efficient de novo assembly","volume":"40","author":"Jones","year":"2012","journal-title":"Nucleic Acids Res."},{"key":"ref_7","doi-asserted-by":"crossref","first-page":"484","DOI":"10.1093\/bib\/bbq016","article-title":"Challenges of sequencing human genomes","volume":"11","author":"Koboldt","year":"2010","journal-title":"Brief. Bioinform"},{"key":"ref_8","doi-asserted-by":"crossref","first-page":"2444","DOI":"10.1073\/pnas.85.8.2444","article-title":"Improved tools for biological sequence comparison","volume":"85","author":"Pearson","year":"1988","journal-title":"Proc. Natl. Acad. Sci. USA"},{"key":"ref_9","doi-asserted-by":"crossref","first-page":"2156","DOI":"10.1093\/bioinformatics\/btr330","article-title":"The variant call format and VCF tools","volume":"27","author":"Danecek","year":"2011","journal-title":"Bioinformatics"},{"key":"ref_10","doi-asserted-by":"crossref","unstructured":"Bonfield, J., and Mahoney, M.V. (2013). Compression of FASTQ and SAM format sequencing data. PLoS ONE, 8.","DOI":"10.1371\/journal.pone.0059190"},{"key":"ref_11","doi-asserted-by":"crossref","first-page":"107","DOI":"10.1016\/S0020-0255(01)00104-9","article-title":"LZ-based image compression","volume":"135","author":"Rizzo","year":"2001","journal-title":"Inf. Sci."},{"key":"ref_12","doi-asserted-by":"crossref","first-page":"76","DOI":"10.3390\/a5010076","article-title":"Visualization, band ordering and compression of hyperspectral images","volume":"5","author":"Pizzolante","year":"2012","journal-title":"Algorithms"},{"key":"ref_13","unstructured":"(2020, May 02). gzip. 
Available online: https:\/\/www.gzip.org\/."},{"key":"ref_14","unstructured":"(2020, May 02). bzip2. Available online: http:\/\/www.bzip.org\/."},{"key":"ref_15","doi-asserted-by":"crossref","first-page":"2818","DOI":"10.1093\/bioinformatics\/btu390","article-title":"The Scramble conversion tool","volume":"30","author":"Bonfield","year":"2014","journal-title":"Bioinformatics"},{"key":"ref_16","doi-asserted-by":"crossref","first-page":"1082","DOI":"10.1038\/nmeth.3133","article-title":"DeeZ: Reference-based compression by local assembly","volume":"11","author":"Hach","year":"2014","journal-title":"Nat. Methods"},{"key":"ref_17","doi-asserted-by":"crossref","first-page":"627","DOI":"10.1038\/nbt.2241","article-title":"Compressive genomics","volume":"30","author":"Loh","year":"2012","journal-title":"Nat. Biotechnol."},{"key":"ref_18","doi-asserted-by":"crossref","first-page":"i283","DOI":"10.1093\/bioinformatics\/btt214","article-title":"Compressive genomics for protein databases","volume":"29","author":"Daniels","year":"2013","journal-title":"Bioinformatics"},{"key":"ref_19","unstructured":"(2020, May 26). Quip. Available online: https:\/\/homes.cs.washington.edu\/~dcjones\/quip\/."},{"key":"ref_20","doi-asserted-by":"crossref","unstructured":"Pizzolante, R., and Carpentieri, B. (2013, January 1\u20133). Lossless, low-complexity, compression of three-dimensional volumetric medical images via linear prediction. Proceedings of the 18th International Conference on Digital Signal Processing (DSP), Fira, Greece.","DOI":"10.1109\/ICDSP.2013.6622763"},{"key":"ref_21","doi-asserted-by":"crossref","unstructured":"Pizzolante, R., Castiglione, A., Carpentieri, B., De Santis, A., and Castiglione, A. (2014, January 10\u201312). Protection of Microscopy Images through Digital Watermarking Techniques. 
Proceedings of the International Conference on Intelligent Networking and Collaborative Systems, Salerno, Italy.","DOI":"10.1109\/INCoS.2014.116"},{"key":"ref_22","doi-asserted-by":"crossref","first-page":"1","DOI":"10.1145\/2893474","article-title":"On-board format-independent security of functional magnetic resonance images","volume":"16","author":"Castiglione","year":"2017","journal-title":"ACM Trans. Embed. Comput. Syst."},{"key":"ref_23","doi-asserted-by":"crossref","first-page":"117","DOI":"10.1093\/bioinformatics\/btt594","article-title":"MFCompress: A compression tool for FASTA and multi-FASTA data","volume":"30","author":"Pinho","year":"2013","journal-title":"Bioinformatics"},{"key":"ref_24","unstructured":"(2020, May 26). ALAPY. Available online: http:\/\/alapy.com\/services\/alapy-compressor\/."},{"key":"ref_25","doi-asserted-by":"crossref","first-page":"3826","DOI":"10.1093\/bioinformatics\/btz144","article-title":"Nucleotide Archival Format (NAF) enables efficient lossless reference-free compression of DNA sequences","volume":"35","author":"Kryukov","year":"2019","journal-title":"Bioinformatics"},{"key":"ref_26","doi-asserted-by":"crossref","unstructured":"Kryukov, K., Ueda, M.T., Nakagawa, S., and Imanishi, T. (2019). Sequence Compression Benchmark (SCB) database\u2014A comprehensive evaluation of reference-free compressors for FASTA-formatted sequences. bioRxiv.","DOI":"10.1101\/642553"},{"key":"ref_27","doi-asserted-by":"crossref","first-page":"D19","DOI":"10.1093\/nar\/gkq1019","article-title":"International nucleotide sequence database collaboration the sequence read archive","volume":"39","author":"Leinonen","year":"2010","journal-title":"Nucleic Acids Res."},{"key":"ref_28","doi-asserted-by":"crossref","first-page":"819","DOI":"10.1038\/nprot.2017.148","article-title":"Genome-wide analysis of replication timing by next-generation sequencing with E\/L Repli-seq","volume":"13","author":"Marchal","year":"2018","journal-title":"Nat. 
Protoc."},{"key":"ref_29","doi-asserted-by":"crossref","first-page":"384","DOI":"10.1016\/j.cose.2017.06.003","article-title":"On the protection of consumer genomic data in the Internet of Living Things","volume":"74","author":"Pizzolante","year":"2018","journal-title":"Comput. Secur."}],"container-title":["Algorithms"],"original-title":[],"language":"en","link":[{"URL":"https:\/\/www.mdpi.com\/1999-4893\/13\/6\/151\/pdf","content-type":"unspecified","content-version":"vor","intended-application":"similarity-checking"}],"deposited":{"date-parts":[[2025,10,11]],"date-time":"2025-10-11T09:42:12Z","timestamp":1760175732000},"score":1,"resource":{"primary":{"URL":"https:\/\/www.mdpi.com\/1999-4893\/13\/6\/151"}},"subtitle":[],"short-title":[],"issued":{"date-parts":[[2020,6,24]]},"references-count":29,"journal-issue":{"issue":"6","published-online":{"date-parts":[[2020,6]]}},"alternative-id":["a13060151"],"URL":"https:\/\/doi.org\/10.3390\/a13060151","relation":{},"ISSN":["1999-4893"],"issn-type":[{"type":"electronic","value":"1999-4893"}],"subject":[],"published":{"date-parts":[[2020,6,24]]}}}