{"status":"ok","message-type":"work","message-version":"1.0.0","message":{"indexed":{"date-parts":[[2026,4,1]],"date-time":"2026-04-01T09:57:45Z","timestamp":1775037465328,"version":"3.50.1"},"reference-count":106,"publisher":"Oxford University Press (OUP)","issue":"11","license":[{"start":{"date-parts":[[2020,11,12]],"date-time":"2020-11-12T00:00:00Z","timestamp":1605139200000},"content-version":"vor","delay-in-days":11,"URL":"http:\/\/creativecommons.org\/licenses\/by\/4.0\/"}],"funder":[{"DOI":"10.13039\/100006136","name":"Fuel Cell Technologies Program","doi-asserted-by":"publisher","award":["UIDB\/00127\/2020"],"award-info":[{"award-number":["UIDB\/00127\/2020"]}],"id":[{"id":"10.13039\/100006136","id-type":"DOI","asserted-by":"publisher"}]},{"DOI":"10.13039\/100006136","name":"Fuel Cell Technologies Program","doi-asserted-by":"publisher","award":["CI-CTTI-94-ARH\/2019"],"award-info":[{"award-number":["CI-CTTI-94-ARH\/2019"]}],"id":[{"id":"10.13039\/100006136","id-type":"DOI","asserted-by":"publisher"}]}],"content-domain":{"domain":[],"crossmark-restriction":false},"short-container-title":[],"published-print":{"date-parts":[[2020,11,11]]},"abstract":"<jats:title>Abstract<\/jats:title>\n               <jats:sec>\n                  <jats:title>Background<\/jats:title>\n                  <jats:p>The increasing production of genomic data has led to an intensified need for models that can cope efficiently with the lossless compression of DNA sequences. Important applications include long-term storage and compression-based data analysis. In the literature, only a few recent articles propose the use of neural networks for DNA sequence compression. However, they fall short when compared with specific DNA compression tools, such as GeCo2. This limitation is due to the absence of models specifically designed for DNA sequences. In this work, we combine the power of neural networks with specific DNA models. 
For this purpose, we created GeCo3, a new genomic sequence compressor that uses neural networks for mixing multiple context and substitution-tolerant context models.<\/jats:p>\n               <\/jats:sec>\n               <jats:sec>\n                  <jats:title>Findings<\/jats:title>\n                  <jats:p>We benchmark GeCo3 as a reference-free DNA compressor in 5 datasets, including a balanced and comprehensive dataset of DNA sequences, the Y-chromosome and human mitogenome, 2 compilations of archaeal and virus genomes, 4 whole genomes, and 2 collections of FASTQ data of a human virome and ancient DNA. GeCo3 achieves a solid improvement in compression over the previous version (GeCo2) of $2.4\\%$, $7.1\\%$, $6.1\\%$, $5.8\\%$, and $6.0\\%$, respectively. To test its performance as a reference-based DNA compressor, we benchmark GeCo3 in 4 datasets constituted by the pairwise compression of the chromosomes of the genomes of several primates. GeCo3 improves the compression in $12.4\\%$, $11.7\\%$, $10.8\\%$, and $10.1\\%$ over the state of the art. The cost of this compression improvement is some additional computational time (1.7\u20133 times slower than GeCo2). The RAM use is constant, and the tool scales efficiently, independently of the sequence size. Overall, these values outperform the state of the art.<\/jats:p>\n               <\/jats:sec>\n               <jats:sec>\n                  <jats:title>Conclusions<\/jats:title>\n                  <jats:p>GeCo3 is a genomic sequence compressor with a neural network mixing approach that provides additional gains over top specific genomic compressors. The proposed mixing method is portable, requiring only the probabilities of the models as inputs, providing easy adaptation to other data compressors or compression-based data analysis tools. 
GeCo3 is released under GPLv3 and is available for free download at https:\/\/github.com\/cobilab\/geco3.<\/jats:p>\n               <\/jats:sec>","DOI":"10.1093\/gigascience\/giaa119","type":"journal-article","created":{"date-parts":[[2020,11,12]],"date-time":"2020-11-12T02:12:40Z","timestamp":1605147160000},"source":"Crossref","is-referenced-by-count":49,"title":["Efficient DNA sequence compression with neural networks"],"prefix":"10.1093","volume":"9","author":[{"ORCID":"https:\/\/orcid.org\/0000-0002-7535-4933","authenticated-orcid":false,"given":"Milton","family":"Silva","sequence":"first","affiliation":[{"name":"Institute of Electronics and Informatics Engineering of Aveiro, University of Aveiro , Campus Universit\u00e1rio de Santiago, 3810-193 Aveiro,","place":["Portugal"]},{"name":"Department of Electronics Telecommunications and Informatics, University of Aveiro , Campus Universit\u00e1rio de Santiago, 3810-193 Aveiro,","place":["Portugal"]}]},{"ORCID":"https:\/\/orcid.org\/0000-0003-1176-552X","authenticated-orcid":false,"given":"Diogo","family":"Pratas","sequence":"additional","affiliation":[{"name":"Institute of Electronics and Informatics Engineering of Aveiro, University of Aveiro , Campus Universit\u00e1rio de Santiago, 3810-193 Aveiro,","place":["Portugal"]},{"name":"Department of Electronics Telecommunications and Informatics, University of Aveiro , Campus Universit\u00e1rio de Santiago, 3810-193 Aveiro,","place":["Portugal"]},{"name":"Department of Virology, University of Helsinki , Haartmaninkatu 3, 00014 Helsinki,","place":["Finland"]}]},{"given":"Armando J","family":"Pinho","sequence":"additional","affiliation":[{"name":"Institute of Electronics and Informatics Engineering of Aveiro, University of Aveiro , Campus Universit\u00e1rio de Santiago, 3810-193 Aveiro,","place":["Portugal"]},{"name":"Department of Electronics Telecommunications and Informatics, University of Aveiro , Campus Universit\u00e1rio de Santiago, 3810-193 
Aveiro,","place":["Portugal"]}]}],"member":"286","published-online":{"date-parts":[[2020,11,11]]},"reference":[{"issue":"7","key":"2024111605542228700_bib1","doi-asserted-by":"crossref","first-page":"e1002195","DOI":"10.1371\/journal.pbio.1002195","article-title":"Big data: astronomical or genomical?","volume":"13","author":"Stephens","year":"2015","journal-title":"PLoS Biol"},{"key":"2024111605542228700_bib2","doi-asserted-by":"crossref","first-page":"231","DOI":"10.1109\/DCC.2016.60","article-title":"Efficient compression of genomic sequences","volume-title":"2016 Data Compression Conference (DCC)","author":"Pratas","year":"2016"},{"key":"2024111605542228700_bib3","first-page":"137","article-title":"GeCo2: An optimized tool for lossless compression and analysis of DNA sequences","volume-title":"International Conference on Practical Applications of Computational Biology and Bioinformatics","author":"Pratas","year":"2019"},{"key":"2024111605542228700_bib4","volume-title":"Data Compression Explained","author":"Mahoney","year":"2010\u20132012"},{"key":"2024111605542228700_bib5","first-page":"265","article-title":"Substitutional tolerant Markov models for relative compression of DNA sequences","volume-title":"International Conference on Practical Applications of Computational Biology and Bioinformatics","author":"Pratas","year":"2017"},{"issue":"3","key":"2024111605542228700_bib6","doi-asserted-by":"crossref","first-page":"21","DOI":"10.1109\/MCAS.2006.1688199","article-title":"Ensemble based systems in decision making","volume":"6","author":"Polikar","year":"2006","journal-title":"IEEE Circuits Syst Mag"},{"issue":"2","key":"2024111605542228700_bib7","doi-asserted-by":"crossref","first-page":"241","DOI":"10.1016\/S0893-6080(05)80023-1","article-title":"Stacked generalization","volume":"5","author":"Wolpert","year":"1992","journal-title":"Neural 
Netw"},{"key":"2024111605542228700_bib8","doi-asserted-by":"crossref","first-page":"372","DOI":"10.1109\/SAI.2014.6918213","article-title":"A survey of feature selection and feature extraction techniques in machine learning","volume-title":"2014 Science and Information Conference","author":"Khalid","year":"2014"},{"issue":"5","key":"2024111605542228700_bib9","doi-asserted-by":"crossref","first-page":"734","DOI":"10.1101\/gr.114819.110","article-title":"Efficient storage of high throughput DNA sequencing data using reference-based compression","volume":"21","author":"Fritz","year":"2011","journal-title":"Genome Res"},{"issue":"13","key":"2024111605542228700_bib10","doi-asserted-by":"crossref","first-page":"1575","DOI":"10.1093\/bioinformatics\/btp117","article-title":"Textual data compression in computational biology: A synopsis","volume":"25","author":"Giancarlo","year":"2009","journal-title":"Bioinformatics"},{"issue":"1","key":"2024111605542228700_bib11","doi-asserted-by":"crossref","first-page":"10203","DOI":"10.1038\/srep10203","article-title":"An alignment-free method to find and visualise rearrangements between pairs of DNA sequences","volume":"5","author":"Pratas","year":"2015","journal-title":"Sci Rep"},{"key":"2024111605542228700_bib12","doi-asserted-by":"crossref","first-page":"1177","DOI":"10.23919\/EUSIPCO.2018.8553297","article-title":"Metagenomic composition analysis of sedimentary ancient DNA from the Isle of Wight","volume-title":"2018 26th European Signal Processing Conference (EUSIPCO)","author":"Pratas","year":"2018"},{"issue":"5","key":"2024111605542228700_bib13","doi-asserted-by":"crossref","first-page":"1339","DOI":"10.1099\/ijsem.0.001814","article-title":"Pedobacter lusitanus sp. 
nov., isolated from sludge of a deactivated uranium mine","volume":"67","author":"Covas","year":"2017","journal-title":"Int J Syst Evol Microbiol"},{"issue":"3","key":"2024111605542228700_bib14","doi-asserted-by":"crossref","first-page":"e00265","DOI":"10.1128\/mSphere.00265-20","article-title":"HERQ-9 is a new multiplex PCR for differentiation and quantification of all nine human herpesviruses","volume":"5","author":"Py\u00f6ri\u00e4","year":"2020","journal-title":"Msphere"},{"key":"2024111605542228700_bib15","doi-asserted-by":"crossref","first-page":"102353","DOI":"10.1016\/j.fsigen.2020.102353","article-title":"The landscape of persistent human DNA viruses in femoral bone","volume":"48","author":"Toppinen","year":"2020","journal-title":"Forensic Sci Int Genet"},{"issue":"24","key":"2024111605542228700_bib16","doi-asserted-by":"crossref","first-page":"3407","DOI":"10.1016\/j.cub.2016.10.061","article-title":"17th century variola virus reveals the recent history of smallpox","volume":"26","author":"Duggan","year":"2016","journal-title":"Curr Biol"},{"key":"2024111605542228700_bib17","doi-asserted-by":"crossref","first-page":"207","DOI":"10.3389\/fmars.2016.00207","article-title":"A catalogue of marine biodiversity indicators","volume":"3","author":"Teixeira","year":"2016","journal-title":"Front Mar Sci"},{"key":"2024111605542228700_bib18","doi-asserted-by":"crossref","first-page":"97","DOI":"10.1016\/j.mib.2015.05.005","article-title":"Metagenomics of extreme environments","volume":"25","author":"Cowan","year":"2015","journal-title":"Curr Opin Microbiol"},{"issue":"7","key":"2024111605542228700_bib19","doi-asserted-by":"crossref","first-page":"351","DOI":"10.1016\/S0169-5347(01)02187-5","article-title":"Chromosomal rearrangements and speciation","volume":"16","author":"Rieseberg","year":"2001","journal-title":"Trends Ecology 
Evol"},{"issue":"1","key":"2024111605542228700_bib20","doi-asserted-by":"crossref","first-page":"239","DOI":"10.1016\/0092-8674(80)90131-2","article-title":"DNA rearrangements associated with a transposable element in yeast","volume":"21","author":"Roeder","year":"1980","journal-title":"Cell"},{"key":"2024111605542228700_bib21","doi-asserted-by":"crossref","first-page":"106","DOI":"10.1186\/s13323-014-0017-4","article-title":"Editors\u2019 Pick: Contamination has always been the issue!","volume":"5","author":"Sajantila","year":"2014","journal-title":"Investig Genet"},{"issue":"11","key":"2024111605542228700_bib22","doi-asserted-by":"crossref","first-page":"3439","DOI":"10.1073\/pnas.1418652112","article-title":"Evidence for recent, population-specific evolution of the human mutation rate","volume":"112","author":"Harris","year":"2015","journal-title":"Proc Natl Acad Sci U S A"},{"key":"2024111605542228700_bib23","doi-asserted-by":"crossref","first-page":"1","DOI":"10.1016\/j.gde.2014.06.011","article-title":"Adaptations to local environments in modern human populations","volume":"29","author":"Jeong","year":"2014","journal-title":"Curr Opin Genet Dev"},{"key":"2024111605542228700_bib24","first-page":"00403","article-title":"Transcriptome remodeling contributes to epidemic disease caused by the human pathogen Streptococcus pyogenes","author":"Beres","year":"2016","journal-title":"mBio"},{"key":"2024111605542228700_bib25","doi-asserted-by":"crossref","first-page":"9","DOI":"10.1016\/j.coi.2014.05.001","article-title":"Human genome variability, natural selection and infectious diseases","volume":"30","author":"Fumagalli","year":"2014","journal-title":"Curr Opin Immunol"},{"issue":"2","key":"2024111605542228700_bib26","doi-asserted-by":"crossref","first-page":"237","DOI":"10.1038\/s41559-017-0425-y","article-title":"Evolutionary determinants of genome-wide nucleotide composition","volume":"2","author":"Long","year":"2018","journal-title":"Nat Ecol 
Evol"},{"key":"2024111605542228700_bib27","doi-asserted-by":"crossref","DOI":"10.1093\/oso\/9780199349524.001.0001","volume-title":"Foundations of Info-Metrics: Modeling and Inference with Imperfect Information","author":"Golan","year":"2017"},{"key":"2024111605542228700_bib28","doi-asserted-by":"crossref","first-page":"19","DOI":"10.1146\/annurev-biodatasci-072018-021229","article-title":"Genomic data compression","volume":"2","author":"Hernaez","year":"2019","journal-title":"Annu Rev Biomed Data Sci"},{"issue":"4","key":"2024111605542228700_bib29","doi-asserted-by":"crossref","first-page":"56","DOI":"10.3390\/info7040056","article-title":"A survey on data compression methods for biological sequences","volume":"7","author":"Hosseini","year":"2016","journal-title":"Information"},{"key":"2024111605542228700_bib30","first-page":"340","article-title":"Compression of DNA sequences","volume-title":"DCC '93: Data Compression Conference, Snowbird, UT","author":"Grumbach","year":"1993"},{"issue":"6","key":"2024111605542228700_bib31","doi-asserted-by":"crossref","first-page":"875","DOI":"10.1016\/0306-4573(94)90014-0","article-title":"A new challenge for compression algorithms: genetic sequences","volume":"30","author":"Grumbach","year":"1994","journal-title":"Inf Process Manag"},{"key":"2024111605542228700_bib32","doi-asserted-by":"crossref","first-page":"453","DOI":"10.1109\/DCC.1996.488385","article-title":"A guaranteed compression scheme for repetitive DNA sequences","volume-title":"DCC '96: Data Compression Conference, Snowbird, UT","author":"Rivals","year":"1996"},{"key":"2024111605542228700_bib33","first-page":"125","article-title":"Significantly lower entropy estimates for natural DNA sequences","volume-title":"J Comput Biol","author":"Loewenstern","year":"1999"},{"key":"2024111605542228700_bib34","first-page":"8","article-title":"Compression of strings with approximate repeats","volume-title":"Proc Int Conf Intell Syst Mol 
Biol","author":"Allison","year":"1998"},{"key":"2024111605542228700_bib35","first-page":"143","article-title":"Compression of biological sequences by greedy off-line textual substitution","volume-title":"DCC '00: Proceedings of the Conference on Data Compression","author":"Apostolico","year":"2000"},{"issue":"12","key":"2024111605542228700_bib36","first-page":"1696","article-title":"DNACompress: Fast and effective DNA sequence compression","volume":"18","author":"Chen","year":"2002"},{"key":"2024111605542228700_bib37","first-page":"43","article-title":"Biological sequence compression algorithms","volume-title":"Genome Informatics 2000: Proc. of the 11th Workshop, Tokyo","author":"Matsumoto","year":"2000"},{"key":"2024111605542228700_bib38","first-page":"253","article-title":"DNA sequence compression using the normalized maximum likelihood model for discrete regression","volume-title":"DCC '03: Proceedings of the Conference on Data Compression","author":"Tabus","year":"2003"},{"issue":"1","key":"2024111605542228700_bib39","doi-asserted-by":"crossref","first-page":"3","DOI":"10.1145\/1055709.1055711","article-title":"An efficient normalized maximum likelihood algorithm for DNA sequence compression","volume":"23","author":"Korodi","year":"2005","journal-title":"ACM Trans Inf Syst"},{"key":"2024111605542228700_bib40","volume-title":"Grammar-based compression of DNA sequences","author":"Cherniavsky","year":"2004"},{"key":"2024111605542228700_bib41","doi-asserted-by":"crossref","first-page":"1397","DOI":"10.1002\/spe.619","article-title":"A simple and fast DNA compressor","volume":"34","author":"Manzini","year":"2004","journal-title":"J Softw Pract Exp"},{"key":"2024111605542228700_bib43","doi-asserted-by":"crossref","DOI":"10.1007\/11496656_17","article-title":"DNA compression challenge revisited: A dynamic programming approach","volume-title":"Combinatorial Pattern Matching: Proc. 
of CPM-2005","author":"Behzadi","year":"2005"},{"key":"2024111605542228700_bib44","first-page":"43","article-title":"A simple statistical algorithm for biological sequence compression","volume-title":"2007 Data Compression Conference (DCC'07), Snowbird, UT","author":"Cao","year":"2007"},{"key":"2024111605542228700_bib45","doi-asserted-by":"publisher","DOI":"10.1093\/database\/bap013","article-title":"Differential direct coding: a compression algorithm for nucleotide sequence data","volume":"2009","author":"Vey","year":"2009","journal-title":"Database (Oxford)"},{"issue":"1","key":"2024111605542228700_bib46","first-page":"39","article-title":"An efficient horizontal and vertical method for online dna sequence compression","volume":"3","author":"Mishra","year":"2010","journal-title":"Int J Comput Appl"},{"key":"2024111605542228700_bib47","first-page":"25","article-title":"GENBIT Compress-Algorithm for repetitive and non repetitive DNA sequences","volume":"2","author":"Rajeswari","year":"2010","journal-title":"Int J Comput Sci Inf Technol"},{"issue":"3","key":"2024111605542228700_bib48","first-page":"245","article-title":"A novel approach for compressing DNA sequences using semi-statistical compressor","volume":"33","author":"Gupta","year":"2011","journal-title":"Int J Comput Appl"},{"issue":"5","key":"2024111605542228700_bib49","doi-asserted-by":"crossref","first-page":"643","DOI":"10.1109\/TEVC.2011.2160399","article-title":"DNA sequence compression using adaptive particle swarm optimization-based memetic algorithm","volume":"15","author":"Zhu","year":"2011","journal-title":"IEEE Trans Evol Comput"},{"key":"2024111605542228700_bib50","first-page":"125","article-title":"Bacteria DNA sequence compression using a mixture of finite-context models","volume-title":"Proc. 
of the IEEE Workshop on Statistical Signal Processing, Nice, France","author":"Pinho","year":"2011"},{"issue":"6","key":"2024111605542228700_bib51","doi-asserted-by":"crossref","first-page":"e21588","DOI":"10.1371\/journal.pone.0021588","article-title":"On the representability of complete genomes by multiple competing finite-context (Markov) models","volume":"6","author":"Pinho","year":"2011","journal-title":"PLoS One"},{"key":"2024111605542228700_bib52","first-page":"1209.5905","article-title":"An efficient biological sequence compression technique using lut and repeat in the sequence","author":"Roy","year":"2012","journal-title":"arXiv"},{"key":"2024111605542228700_bib53","article-title":"GenCodex - A novel algorithm for compressing DNA sequences on multi-cores and GPUs","volume-title":"Proc. IEEE, 19th International Conf. on High Performance Computing (HiPC), Pune, India","author":"Satyanvesh","year":"2012"},{"issue":"4","key":"2024111605542228700_bib54","doi-asserted-by":"crossref","first-page":"785","DOI":"10.1007\/s12038-012-9230-6","article-title":"BIND\u2013An algorithm for loss-less compression of nucleotide sequence data","volume":"37","author":"Bose","year":"2012","journal-title":"J Biosci"},{"issue":"11","key":"2024111605542228700_bib55","first-page":"e80377","article-title":"DNA-COMPACT: DNA compression based on a pattern-aware contextual modeling technique","volume":"8","author":"Li","year":"2013"},{"key":"2024111605542228700_bib56","first-page":"2395","article-title":"Exploring deep Markov models in genomic data compression using sequence pre-analysis","volume-title":"22nd European Signal Processing Conference (EUSIPCO), Lisbon","author":"Pratas","year":"2014"},{"issue":"4","key":"2024111605542228700_bib57","doi-asserted-by":"crossref","first-page":"225","DOI":"10.1016\/j.ygeno.2014.08.007","article-title":"SeqCompress: An algorithm for biological sequence 
compression","volume":"104","author":"Sardaraz","year":"2014","journal-title":"Genomics"},{"key":"2024111605542228700_bib58","first-page":"29","article-title":"Genome compression based on Hilbert space filling curve","volume-title":"Proceedings of the 3rd International Conference on Management, Education, Information and Control (MEICI 2015), Shenyang, China","author":"Guo","year":"2015"},{"issue":"6","key":"2024111605542228700_bib59","doi-asserted-by":"crossref","first-page":"1275","DOI":"10.1109\/TCBB.2015.2430331","article-title":"CoGI: Towards compressing genomes as an image","volume":"12","author":"Xie","year":"2015","journal-title":"IEEE\/ACM Trans Comput Biol Bioinform"},{"issue":"2","key":"2024111605542228700_bib60","doi-asserted-by":"publisher","DOI":"10.4238\/gmr16026784","article-title":"Genome sequence compression based on optimized context weighting","volume":"16","author":"Chen","year":"2017","journal-title":"Genet Mol Res"},{"key":"2024111605542228700_bib61","doi-asserted-by":"crossref","first-page":"286","DOI":"10.1109\/ICENCO.2017.8289802","article-title":"Improve the compression of bacterial DNA sequence","volume-title":"2017 13th International Computer Engineering Conference (ICENCO)","author":"Bakr","year":"2017"},{"key":"2024111605542228700_bib62","doi-asserted-by":"crossref","first-page":"378","DOI":"10.1007\/978-3-030-04239-4_34","article-title":"One-Bit DNA Compression Algorithm","volume-title":"International Conference on Neural Information Processing","author":"Mansouri","year":"2018"},{"key":"2024111605542228700_bib63","doi-asserted-by":"crossref","first-page":"270","DOI":"10.1109\/BIBM.2018.8621140","article-title":"DeepDNA: A hybrid convolutional and recurrent neural network for compressing human mitochondrial genomes","volume-title":"2018 IEEE International Conference on Bioinformatics and Biomedicine 
(BIBM)","author":"Wang","year":"2018"},{"issue":"1","key":"2024111605542228700_bib64","doi-asserted-by":"crossref","first-page":"49","DOI":"10.1186\/s40246-019-0225-3","article-title":"Human mitochondrial genome compression using machine learning techniques","volume":"13","author":"Wang","year":"2019","journal-title":"Hum Genomics"},{"issue":"11","key":"2024111605542228700_bib65","doi-asserted-by":"crossref","first-page":"1074","DOI":"10.3390\/e21111074","article-title":"A reference-free lossless compression algorithm for DNA sequences using a competitive prediction of two classes of weighted models","volume":"21","author":"Pratas","year":"2019","journal-title":"Entropy"},{"issue":"19","key":"2024111605542228700_bib66","doi-asserted-by":"crossref","first-page":"2527","DOI":"10.1093\/bioinformatics\/bts467","article-title":"DELIMINATE\u2014A fast and efficient method for loss-less compression of genomic sequences: sequence analysis","volume":"28","author":"Mohammed","year":"2012","journal-title":"Bioinformatics"},{"issue":"1","key":"2024111605542228700_bib67","doi-asserted-by":"crossref","first-page":"117","DOI":"10.1093\/bioinformatics\/btt594","article-title":"MFCompress: A compression tool for FASTA and multi-FASTA data","volume":"30","author":"Pinho","year":"2014","journal-title":"Bioinformatics"},{"issue":"19","key":"2024111605542228700_bib68","doi-asserted-by":"crossref","first-page":"3826","DOI":"10.1093\/bioinformatics\/btz144","article-title":"Nucleotide Archival Format (NAF) enables efficient lossless reference-free compression of DNA sequences","volume":"35","author":"Kryukov","year":"2019","journal-title":"Bioinformatics"},{"issue":"2","key":"2024111605542228700_bib69","doi-asserted-by":"crossref","first-page":"274","DOI":"10.1093\/bioinformatics\/btn582","article-title":"Human genomes as email 
attachments","volume":"25","author":"Christley","year":"2009","journal-title":"Bioinformatics"},{"issue":"14","key":"2024111605542228700_bib70","doi-asserted-by":"crossref","first-page":"1731","DOI":"10.1093\/bioinformatics\/btp319","article-title":"Data structures and compression algorithms for genomic sequence data","volume":"25","author":"Brandon","year":"2009","journal-title":"Bioinformatics"},{"issue":"5","key":"2024111605542228700_bib71","doi-asserted-by":"crossref","first-page":"626","DOI":"10.1093\/bioinformatics\/btu698","article-title":"iDoComp: A compression scheme for assembled genomes","volume":"31","author":"Ochoa","year":"2015","journal-title":"Bioinformatics"},{"key":"2024111605542228700_bib72","doi-asserted-by":"crossref","first-page":"11565","DOI":"10.1038\/srep11565","article-title":"GDC 2: Compression of large collections of genomes","volume":"5","author":"Deorowicz","year":"2015","journal-title":"Sci Rep"},{"key":"2024111605542228700_bib73","doi-asserted-by":"crossref","first-page":"201","DOI":"10.1007\/978-3-642-16321-0_20","article-title":"Relative Lempel-Ziv compression of genomes for large-scale storage and retrieval","volume-title":"International Symposium on String Processing and Information Retrieval","author":"Kuruppu","year":"2010"},{"issue":"7","key":"2024111605542228700_bib74","doi-asserted-by":"crossref","first-page":"e45","DOI":"10.1093\/nar\/gkr009","article-title":"A novel compression tool for efficient storage of genome resequencing data","volume":"39","author":"Wang","year":"2011","journal-title":"Nucleic Acids Res"},{"key":"2024111605542228700_bib75","first-page":"91","article-title":"Optimized relative Lempel-Ziv compression of genomes","volume-title":"Proceedings of the Thirty-Fourth Australasian Computer Science Conference-Volume 
113","author":"Kuruppu","year":"2011"},{"issue":"21","key":"2024111605542228700_bib76","doi-asserted-by":"crossref","first-page":"2979","DOI":"10.1093\/bioinformatics\/btr505","article-title":"Robust relative compression of genomes with random access","volume":"27","author":"Deorowicz","year":"2011","journal-title":"Bioinformatics"},{"issue":"4","key":"2024111605542228700_bib77","doi-asserted-by":"crossref","first-page":"e27","DOI":"10.1093\/nar\/gkr1124","article-title":"GReEn: A tool for efficient compression of genome resequencing data","volume":"40","author":"Pinho","year":"2012","journal-title":"Nucleic Acids Res"},{"issue":"5","key":"2024111605542228700_bib78","doi-asserted-by":"crossref","first-page":"1275","DOI":"10.1109\/TCBB.2013.122","article-title":"FRESCO: Referential compression of highly similar sequences","volume":"10","author":"Wandelt","year":"2013","journal-title":"IEEE\/ACM Trans Comput Biol Bioinform"},{"issue":"21","key":"2024111605542228700_bib79","doi-asserted-by":"crossref","first-page":"3364","DOI":"10.1093\/bioinformatics\/btx412","article-title":"High-speed and high-ratio referential genome compression","volume":"33","author":"Liu","year":"2017","journal-title":"Bioinformatics"},{"key":"2024111605542228700_bib80","doi-asserted-by":"crossref","first-page":"82","DOI":"10.1109\/DCC.2017.50","article-title":"Complementary contextual models with FM-Index for DNA compression","volume-title":"2017 Data Compression Conference (DCC)","author":"Fan","year":"2017"},{"key":"2024111605542228700_bib81","doi-asserted-by":"publisher","DOI":"10.1155\/2019\/3108950","article-title":"HRCM: An efficient hybrid referential compression method for genomic big data","volume":"2019","author":"Yao","year":"2019","journal-title":"BioMed Res Int"},{"key":"2024111605542228700_bib82","author":"Byron"},{"key":"2024111605542228700_bib83","first-page":"1811.08162","article-title":"DeepZip: Lossless data compression using recurrent neural 
networks","author":"Goyal","year":"2018","journal-title":"arXiv"},{"key":"2024111605542228700_bib84","doi-asserted-by":"crossref","first-page":"1","DOI":"10.1109\/BdKCSE48644.2019.9010661","article-title":"A fast reference-free genome compression using deep neural networks","volume-title":"2019 Big Data, Knowledge and Control Systems Engineering (BdKCSE), Sofia, Bulgaria","author":"Absardi","year":"2019"},{"issue":"3","key":"2024111605542228700_bib85","doi-asserted-by":"crossref","first-page":"400","DOI":"10.1214\/aoms\/1177729586","article-title":"A stochastic approximation method","volume":"22","author":"Robbins","year":"1951","journal-title":"Ann Math Stat"},{"key":"2024111605542228700_bib86","doi-asserted-by":"crossref","first-page":"1351","DOI":"10.1016\/j.procs.2018.05.050","article-title":"NSE stock market prediction using deep-learning models","volume":"132","author":"Hiransha","year":"2018","journal-title":"Procedia Comput Sci"},{"key":"2024111605542228700_bib87","first-page":"249","article-title":"Understanding the difficulty of training deep feedforward neural networks","volume-title":"Proceedings of the Thirteenth International Conference on Artificial Intelligence and Statistics","author":"Glorot","year":"2010"},{"key":"2024111605542228700_bib88","doi-asserted-by":"crossref","first-page":"9","DOI":"10.1007\/978-3-642-35289-8_3","article-title":"Efficient backprop","volume-title":"Neural Networks: Tricks of the trade","author":"LeCun","year":"2012"},{"key":"2024111605542228700_bib89","doi-asserted-by":"crossref","first-page":"giaa086","DOI":"10.1093\/gigascience\/giaa086","article-title":"A hybrid pipeline for reconstruction and analysis of viral genomes at multi-organ level","volume":"9","author":"Pratas","year":"2020","journal-title":"Gigascience"},{"issue":"6104","key":"2024111605542228700_bib90","doi-asserted-by":"crossref","first-page":"222","DOI":"10.1126\/science.1224344","article-title":"A high-coverage genome sequence from an archaic Denisovan 
individual","volume":"338","author":"Meyer","year":"2012","journal-title":"Science"},{"issue":"7","key":"2024111605542228700_bib91","doi-asserted-by":"crossref","first-page":"giaa072","DOI":"10.1093\/gigascience\/giaa072","article-title":"Sequence Compression Benchmark (SCB) database\u2014A comprehensive evaluation of reference-free compressors for FASTA-formatted sequences","volume":"9","author":"Kryukov","year":"2020","journal-title":"Gigascience"},{"key":"2024111605542228700_bib92","first-page":"208","article-title":"A DNA sequence corpus for compression benchmark","volume-title":"International Conference on Practical Applications of Computational Biology and Bioinformatics","author":"Pratas","year":"2018"},{"issue":"20","key":"2024111605542228700_bib93","doi-asserted-by":"crossref","first-page":"9051","DOI":"10.1073\/pnas.88.20.9051","article-title":"Origin of human chromosome 2: An ancestral telomere-telomere fusion","volume":"88","author":"Ijdo","year":"1991","journal-title":"Proc Natl Acad Sci U S A"},{"key":"2024111605542228700_bib94","author":"Hagedoorn","journal-title":"AMD Ryzen 5 3600 review - Power Consumption and temperatures"},{"key":"2024111605542228700_bib95"},{"key":"2024111605542228700_bib96"},{"key":"2024111605542228700_bib97"},{"issue":"17","key":"2024111605542228700_bib98","doi-asserted-by":"crossref","first-page":"4325","DOI":"10.1073\/pnas.1720115115","article-title":"Earth BioGenome Project: Sequencing life for the future of life","volume":"115","author":"Lewin","year":"2018","journal-title":"Proc Natl Acad Sci U S A"},{"key":"2024111605542228700_bib99","first-page":"259","article-title":"On the approximation of the Kolmogorov complexity for DNA sequences","volume-title":"Iberian Conference on Pattern Recognition and Image Analysis","author":"Pratas","year":"2017"},{"issue":"5","key":"2024111605542228700_bib100","doi-asserted-by":"publisher","DOI":"10.1093\/gigascience\/giaa048","article-title":"Smash++: An alignment-free and 
memory-efficient tool to find genomic rearrangements","volume":"9","author":"Hosseini","year":"2020","journal-title":"Gigascience"},{"issue":"4","key":"2024111605542228700_bib101","doi-asserted-by":"crossref","first-page":"1523","DOI":"10.1109\/TIT.2005.844059","article-title":"Clustering by compression","volume":"51","author":"Cilibrasi","year":"2005","journal-title":"IEEE Trans Inf Theor"},{"key":"2024111605542228700_bib102","first-page":"cs\/0111054","article-title":"The similarity metric","author":"Li","year":"2008","journal-title":"arXiv"},{"key":"2024111605542228700_bib103"},{"key":"2024111605542228700_bib104"},{"key":"2024111605542228700_bib105","doi-asserted-by":"crossref","first-page":"439","DOI":"10.1016\/j.neucom.2004.04.002","article-title":"Artificial neural networks for non-stationary time series","volume":"61","author":"Kim","year":"2004","journal-title":"Neurocomputing"},{"issue":"20","key":"2024111605542228700_bib106","doi-asserted-by":"crossref","first-page":"638","DOI":"10.1186\/s12859-019-3205-7","article-title":"Read-SpaM: Assembly-free and alignment-free comparison of bacterial genomes with low sequencing coverage","volume":"20","author":"Lau","year":"2019","journal-title":"BMC Bioinformatics"},{"key":"2024111605542228700_bib107","doi-asserted-by":"crossref","unstructured":"Silva M, Pratas D, Pinho AJ. Supporting data for \u201cEfficient DNA sequence compression with neural networks.\u201d. GigaScience Database; 2020. 
10.5524\/100808","DOI":"10.1093\/gigascience\/giaa119"}],"container-title":["GigaScience"],"original-title":[],"language":"en","link":[{"URL":"https:\/\/academic.oup.com\/gigascience\/article-pdf\/9\/11\/giaa119\/60688371\/giaa119.pdf","content-type":"application\/pdf","content-version":"vor","intended-application":"syndication"},{"URL":"https:\/\/academic.oup.com\/gigascience\/article-pdf\/9\/11\/giaa119\/60688371\/giaa119.pdf","content-type":"unspecified","content-version":"vor","intended-application":"similarity-checking"}],"deposited":{"date-parts":[[2024,11,16]],"date-time":"2024-11-16T05:55:12Z","timestamp":1731736512000},"score":1,"resource":{"primary":{"URL":"https:\/\/academic.oup.com\/gigascience\/article\/doi\/10.1093\/gigascience\/giaa119\/5974977"}},"subtitle":[],"short-title":[],"issued":{"date-parts":[[2020,11]]},"references-count":106,"journal-issue":{"issue":"11","published-print":{"date-parts":[[2020,11,11]]}},"URL":"https:\/\/doi.org\/10.1093\/gigascience\/giaa119","relation":{},"ISSN":["2047-217X"],"issn-type":[{"value":"2047-217X","type":"electronic"}],"subject":[],"published-other":{"date-parts":[[2020,11]]},"published":{"date-parts":[[2020,11]]},"article-number":"giaa119"}}