{"status":"ok","message-type":"work","message-version":"1.0.0","message":{"indexed":{"date-parts":[[2026,3,8]],"date-time":"2026-03-08T06:16:36Z","timestamp":1772950596925,"version":"3.50.1"},"reference-count":67,"publisher":"Oxford University Press (OUP)","issue":"Supplement_1","license":[{"start":{"date-parts":[[2020,7,13]],"date-time":"2020-07-13T00:00:00Z","timestamp":1594598400000},"content-version":"vor","delay-in-days":12,"URL":"http:\/\/creativecommons.org\/licenses\/by-nc\/4.0\/"}],"funder":[{"name":"Genome Canada Large-Scale Applied Research Project"},{"name":"National Science and Engineering Research Council of Canada"}],"content-domain":{"domain":[],"crossmark-restriction":false},"short-container-title":[],"published-print":{"date-parts":[[2020,7,1]]},"abstract":"<jats:title>Abstract<\/jats:title>\n               <jats:sec>\n                  <jats:title>Motivation<\/jats:title>\n                  <jats:p>Accurate probabilistic models of sequence evolution are essential for a wide variety of bioinformatics tasks, including sequence alignment and phylogenetic inference. The ability to realistically simulate sequence evolution is also at the core of many benchmarking strategies. Yet, mutational processes have complex context dependencies that remain poorly modeled and understood.<\/jats:p>\n               <\/jats:sec>\n               <jats:sec>\n                  <jats:title>Results<\/jats:title>\n                  <jats:p>We introduce EvoLSTM, a recurrent neural network-based evolution simulator that captures mutational context dependencies. EvoLSTM uses a sequence-to-sequence long short-term memory model trained to predict mutation probabilities at each position of a given sequence, taking into consideration the 14 flanking nucleotides. EvoLSTM can realistically simulate mammalian and plant DNA sequence evolution and reveals unexpectedly strong long-range context dependencies in mutation probabilities. EvoLSTM brings modern machine-learning approaches to bear on sequence evolution. It will serve as a useful tool to study and simulate complex mutational processes.<\/jats:p>\n               <\/jats:sec>\n               <jats:sec>\n                  <jats:title>Availability and implementation<\/jats:title>\n                  <jats:p>Code and dataset are available at https:\/\/github.com\/DongjoonLim\/EvoLSTM.<\/jats:p>\n               <\/jats:sec>\n               <jats:sec>\n                  <jats:title>Supplementary information<\/jats:title>\n                  <jats:p>Supplementary data are available at Bioinformatics online.<\/jats:p>\n               <\/jats:sec>","DOI":"10.1093\/bioinformatics\/btaa447","type":"journal-article","created":{"date-parts":[[2020,7,1]],"date-time":"2020-07-01T19:11:27Z","timestamp":1593630687000},"page":"i353-i361","source":"Crossref","is-referenced-by-count":6,"title":["EvoLSTM: context-dependent models of sequence evolution using a sequence-to-sequence LSTM"],"prefix":"10.1093","volume":"36","author":[{"given":"Dongjoon","family":"Lim","sequence":"first","affiliation":[{"name":"School of Computer Science, McGill University , Montreal, Quebec H3A 0G4, Canada"}]},{"ORCID":"https:\/\/orcid.org\/0000-0002-9555-860X","authenticated-orcid":false,"given":"Mathieu","family":"Blanchette","sequence":"additional","affiliation":[{"name":"School of Computer Science, McGill University , Montreal, Quebec H3A 0G4, Canada"}]}],"member":"286","published-online":{"date-parts":[[2020,7,13]]},"reference":[{"key":"2024021913350355400_btaa447-B1","article-title":"TensorFlow: Large-Scale Machine Learning on Heterogeneous Systems","author":"Abadi","year":"2015"},{"key":"2024021913350355400_btaa447-B2","doi-asserted-by":"crossref","first-page":"349","DOI":"10.1038\/ng.3511","article-title":"An expanded sequence context model broadly explains variability in polymorphism levels across the human genome","volume":"48","author":"Aggarwala","year":"2016","journal-title":"Nat. Genet"},{"key":"2024021913350355400_btaa447-B3","doi-asserted-by":"crossref","first-page":"955","DOI":"10.1093\/molbev\/msz023","article-title":"Signals of variation in human mutation rate at multiple levels of sequence context","volume":"36","author":"Aikens","year":"2019","journal-title":"Mol. Biol. Evol"},{"key":"2024021913350355400_btaa447-B4","doi-asserted-by":"crossref","first-page":"403","DOI":"10.1016\/S0022-2836(05)80360-2","article-title":"Basic local alignment search tool","volume":"215","author":"Altschul","year":"1990","journal-title":"J. Mol. Biol"},{"key":"2024021913350355400_btaa447-B5","doi-asserted-by":"crossref","first-page":"319","DOI":"10.3389\/fgene.2015.00319","article-title":"Trends in substitution models of molecular evolution","volume":"6","author":"Arenas","year":"2015","journal-title":"Front. Genet"},{"key":"2024021913350355400_btaa447-B6","doi-asserted-by":"crossref","first-page":"2322","DOI":"10.1093\/bioinformatics\/bti376","article-title":"Identification and measurement of neighbor-dependent nucleotide substitution processes","volume":"21","author":"Arndt","year":"2005","journal-title":"Bioinformatics"},{"key":"2024021913350355400_btaa447-B7","doi-asserted-by":"crossref","first-page":"313","DOI":"10.1089\/10665270360688039","article-title":"DNA sequence evolution with neighbor-dependent mutation","volume":"10","author":"Arndt","year":"2003","journal-title":"J. Comput. Biol"},{"key":"2024021913350355400_btaa447-B8","doi-asserted-by":"crossref","first-page":"1283","DOI":"10.1126\/science.287.5456.1283","article-title":"Evidence for a high frequency of simultaneous double-nucleotide substitutions","volume":"287","author":"Averof","year":"2000","journal-title":"Science"},{"key":"2024021913350355400_btaa447-B9","article-title":"Neural machine translation by jointly learning to align and translate","author":"Bahdanau","year":"2014"},{"key":"2024021913350355400_btaa447-B10","doi-asserted-by":"crossref","first-page":"2923","DOI":"10.1128\/JCM.38.8.2923-2928.2000","article-title":"Mapping of IS6110 insertion sites in two epidemic strains of Mycobacterium tuberculosis","volume":"38","author":"Beggs","year":"2000","journal-title":"J. Clin. Microbiol"},{"key":"2024021913350355400_btaa447-B11","doi-asserted-by":"crossref","first-page":"1499","DOI":"10.1093\/nar\/8.7.1499","article-title":"DNA methylation and the frequency of CpG in animal DNA","volume":"8","author":"Bird","year":"1980","journal-title":"Nucleic Acids Res"},{"key":"2024021913350355400_btaa447-B12","doi-asserted-by":"crossref","first-page":"708","DOI":"10.1101\/gr.1933104","article-title":"Aligning multiple genomic sequences with the threaded blockset aligner","volume":"14","author":"Blanchette","year":"2004","journal-title":"Genome Res"},{"key":"2024021913350355400_btaa447-B13","doi-asserted-by":"crossref","first-page":"2412","DOI":"10.1101\/gr.2800104","article-title":"Reconstructing large regions of an ancestral mammalian genome in silico","volume":"14","author":"Blanchette","year":"2004","journal-title":"Genome Res"},{"key":"2024021913350355400_btaa447-B14","doi-asserted-by":"crossref","first-page":"1769","DOI":"10.1093\/molbev\/mss056","article-title":"Inferring divergence of context-dependent substitution rates in drosophila genomes with applications to comparative genomics","volume":"29","author":"Chachick","year":"2012","journal-title":"Mol. Biol. Evol"},{"key":"2024021913350355400_btaa447-B15","doi-asserted-by":"crossref","DOI":"10.3115\/v1\/D14-1179","article-title":"Learning phrase representations using RNN encoder-decoder for statistical machine translation","author":"Cho","year":"2014"},{"key":"2024021913350355400_btaa447-B16","article-title":"Keras","author":"Chollet","year":"2015"},{"key":"2024021913350355400_btaa447-B17","doi-asserted-by":"crossref","first-page":"1422","DOI":"10.1093\/bioinformatics\/btp163","article-title":"Biopython: freely available Python tools for computational molecular biology and bioinformatics","volume":"25","author":"Cock","year":"2009","journal-title":"Bioinformatics"},{"key":"2024021913350355400_btaa447-B18","first-page":"2745","article-title":"Mean field variational approximation for continuous-time Bayesian networks","volume":"11","author":"Cohn","year":"2010","journal-title":"J. Mach. Learn. Res"},{"key":"2024021913350355400_btaa447-B19","doi-asserted-by":"crossref","first-page":"361","DOI":"10.1038\/nrg1603","article-title":"Phylogenomics and the reconstruction of the tree of life","volume":"6","author":"Delsuc","year":"2005","journal-title":"Nat. Rev. Genet"},{"key":"2024021913350355400_btaa447-B20","doi-asserted-by":"crossref","first-page":"130","DOI":"10.1093\/bioinformatics\/btp600","article-title":"Ancestors 1.0: a web server for ancestral sequence reconstruction","volume":"26","author":"Diallo","year":"2010","journal-title":"Bioinformatics"},{"key":"2024021913350355400_btaa447-B21","doi-asserted-by":"crossref","first-page":"2077","DOI":"10.1101\/gr.174920.114","article-title":"Alignathon: a competitive assessment of whole-genome alignment methods","volume":"24","author":"Earl","year":"2014","journal-title":"Genome Res"},{"key":"2024021913350355400_btaa447-B22","doi-asserted-by":"crossref","first-page":"1792","DOI":"10.1093\/nar\/gkh340","article-title":"MUSCLE: multiple sequence alignment with high accuracy and high throughput","volume":"32","author":"Edgar","year":"2004","journal-title":"Nucleic Acids Res"},{"key":"2024021913350355400_btaa447-B23","article-title":"Evolver","author":"Edgar","year":"2019"},{"key":"2024021913350355400_btaa447-B24","doi-asserted-by":"crossref","first-page":"1350","DOI":"10.1126\/science.6262918","article-title":"5-methylcytosine in eukaryotic DNA","volume":"212","author":"Ehrlich","year":"1981","journal-title":"Science"},{"key":"2024021913350355400_btaa447-B25","doi-asserted-by":"crossref","first-page":"12777","DOI":"10.1074\/jbc.M112297200","article-title":"Transcription-coupled DNA repair is genomic context-dependent","volume":"277","author":"Feng","year":"2002","journal-title":"J. Biol. Chem"},{"key":"2024021913350355400_btaa447-B26","doi-asserted-by":"crossref","first-page":"1879","DOI":"10.1093\/molbev\/msp098","article-title":"INDELible: a flexible simulator of biological sequence evolution","volume":"26","author":"Fletcher","year":"2009","journal-title":"Mol. Biol. Evol"},{"key":"2024021913350355400_btaa447-B27","first-page":"2451","article-title":"Learning to forget: continual prediction with LSTM","author":"Gers","year":"2000"},{"key":"2024021913350355400_btaa447-B28","first-page":"725","article-title":"A codon-based model of nucleotide substitution for protein-coding DNA sequences","volume":"11","author":"Goldman","year":"1994","journal-title":"Mol. Biol. Evol"},{"key":"2024021913350355400_btaa447-B29","doi-asserted-by":"crossref","first-page":"2222","DOI":"10.1109\/TNNLS.2016.2582924","article-title":"LSTM: a search space odyssey","volume":"28","author":"Greff","year":"2017","journal-title":"IEEE Trans. Neural Networks Learn. Syst"},{"key":"2024021913350355400_btaa447-B30","doi-asserted-by":"crossref","first-page":"891","DOI":"10.1038\/ng.2684","article-title":"An atlas of over 90,000 conserved noncoding sequences provides insight into crucifer regulatory regions","volume":"45","author":"Haudry","year":"2013","journal-title":"Nat. Genet"},{"key":"2024021913350355400_btaa447-B31","doi-asserted-by":"crossref","first-page":"585","DOI":"10.1038\/nrg3729","article-title":"Mechanisms underlying mutational signatures in human cancers","volume":"15","author":"Helleday","year":"2014","journal-title":"Nat. Rev. Genet"},{"key":"2024021913350355400_btaa447-B32","first-page":"1449","article-title":"A probabilistic model for sequence alignment with context-sensitive indels","author":"Hickey","year":"2011"},{"key":"2024021913350355400_btaa447-B33","doi-asserted-by":"crossref","first-page":"166","DOI":"10.1186\/1471-2105-5-166","article-title":"A probabilistic model for the evolution of RNA structure","volume":"5","author":"Holmes","year":"2004","journal-title":"BMC Bioinform"},{"key":"2024021913350355400_btaa447-B34","doi-asserted-by":"crossref","first-page":"143","DOI":"10.1016\/j.gene.2004.02.043","article-title":"Cytosine methylation and cpg, tpg (cpa) and tpa frequencies","volume":"333","author":"Jabbari","year":"2004","journal-title":"Gene"},{"key":"2024021913350355400_btaa447-B35","doi-asserted-by":"crossref","first-page":"592","DOI":"10.1007\/s42452-019-0611-4","article-title":"Using deep reinforcement learning approach for solving the multiple sequence alignment problem","volume":"1","author":"Jafari","year":"2019","journal-title":"SN Appl. Sci"},{"key":"2024021913350355400_btaa447-B36","doi-asserted-by":"crossref","first-page":"499","DOI":"10.1239\/aap\/1013540176","article-title":"Probabilistic models of DNA sequence evolution with context dependent rates of substitution","volume":"32","author":"Jensen","year":"2000","journal-title":"Adv. Appl. Prob"},{"key":"2024021913350355400_btaa447-B37","first-page":"132","article-title":"Evolution of protein molecules","volume":"3","author":"Jukes","year":"1969","journal-title":"Mammalian Protein Metab"},{"key":"2024021913350355400_btaa447-B38","doi-asserted-by":"crossref","first-page":"111","DOI":"10.1007\/BF01731581","article-title":"A simple method for estimating evolutionary rates of base substitutions through comparative studies of nucleotide sequences","volume":"16","author":"Kimura","year":"1980","journal-title":"J. Mol. Evol"},{"key":"2024021913350355400_btaa447-B39","article-title":"Adam: a method for stochastic optimization","author":"Kingma","year":"2014"},{"key":"2024021913350355400_btaa447-B40","doi-asserted-by":"crossref","first-page":"298","DOI":"10.1186\/1471-2105-6-298","article-title":"Kalign\u2014an accurate and fast multiple sequence alignment algorithm","volume":"6","author":"Lassmann","year":"2005","journal-title":"BMC Bioinform"},{"key":"2024021913350355400_btaa447-B41","doi-asserted-by":"crossref","first-page":"893","DOI":"10.1093\/molbev\/msz248","article-title":"A Bayesian framework for inferring the influence of sequence context on point mutations","volume":"37","author":"Ling","year":"2019","journal-title":"Mol. Biol. Evol"},{"key":"2024021913350355400_btaa447-B42","doi-asserted-by":"crossref","first-page":"213","DOI":"10.1038\/nrg3890","article-title":"The effects of chromatin organization on variation in mutation rates in the genome","volume":"16","author":"Makova","year":"2015","journal-title":"Nat. Rev. Genet"},{"key":"2024021913350355400_btaa447-B43","doi-asserted-by":"crossref","first-page":"1190","DOI":"10.1093\/molbev\/msm035","article-title":"The majority of recent short DNA insertions in the human genome are tandem duplications","volume":"24","author":"Messer","year":"2007","journal-title":"Mol. Biol. Evol"},{"key":"2024021913350355400_btaa447-B44","article-title":"Efficient estimation of word representations in vector space","author":"Mikolov","year":"2013"},{"key":"2024021913350355400_btaa447-B45","doi-asserted-by":"crossref","first-page":"1797","DOI":"10.1101\/gr.6761107","article-title":"28-way vertebrate alignment and conservation track in the UCSC genome browser","volume":"17","author":"Miller","year":"2007","journal-title":"Genome Res"},{"key":"2024021913350355400_btaa447-B46","doi-asserted-by":"crossref","DOI":"10.1007\/978-3-319-62524-9_6","article-title":"A reinforcement learning based approach to multiple sequence alignment","author":"Mircea","year":"2018"},{"key":"2024021913350355400_btaa447-B47","doi-asserted-by":"crossref","first-page":"616","DOI":"10.1007\/s00239-002-2430-1","article-title":"The role of context-dependent mutations in generating compositional and codon usage bias in grass chloroplast DNA","volume":"56","author":"Morton","year":"2003","journal-title":"J. Mol. Evol"},{"key":"2024021913350355400_btaa447-B48","first-page":"807","article-title":"Rectified linear units improve restricted Boltzmann machines","author":"Nair","year":"2010"},{"key":"2024021913350355400_btaa447-B49","doi-asserted-by":"crossref","first-page":"443","DOI":"10.1016\/0022-2836(70)90057-4","article-title":"A general method applicable to the search for similarities in the amino acid sequence of two proteins","volume":"48","author":"Needleman","year":"1970","journal-title":"J. Mol. Biol"},{"key":"2024021913350355400_btaa447-B50","article-title":"Neural machine translation and sequence-to-sequence models: a tutorial","author":"Neubig","year":"2017"},{"key":"2024021913350355400_btaa447-B51","doi-asserted-by":"crossref","first-page":"1073","DOI":"10.1093\/bioinformatics\/btm076","article-title":"Cobalt: constraint-based alignment tool for multiple protein sequences","volume":"23","author":"Papadopoulos","year":"2007","journal-title":"Bioinformatics"},{"key":"2024021913350355400_btaa447-B52","doi-asserted-by":"crossref","first-page":"e9490","DOI":"10.1371\/journal.pone.0009490","article-title":"Fasttree 2\u2014approximately maximum-likelihood trees for large alignments","volume":"5","author":"Price","year":"2010","journal-title":"PLoS One"},{"key":"2024021913350355400_btaa447-B53","first-page":"61","article-title":"Rlalign: a reinforcement learning approach for multiple sequence alignment","author":"Ramakrishnan","year":"2018"},{"key":"2024021913350355400_btaa447-B54","doi-asserted-by":"crossref","first-page":"e22594","DOI":"10.1371\/journal.pone.0022594","article-title":"MACSE: multiple alignment of coding sequences accounting for frameshifts and stop codons","volume":"6","author":"Ranwez","year":"2011","journal-title":"PLoS One"},{"key":"2024021913350355400_btaa447-B55","doi-asserted-by":"crossref","first-page":"207","DOI":"10.1016\/j.gene.2004.12.011","article-title":"Site interdependence attributed to tertiary structure in amino acid sequence evolution","volume":"347","author":"Rodrigue","year":"2005","journal-title":"Gene"},{"key":"2024021913350355400_btaa447-B56","doi-asserted-by":"crossref","first-page":"577","DOI":"10.1101\/gr.10.4.577","article-title":"Pipmaker\u2014a web server for aligning two genomic DNA sequences","volume":"10","author":"Schwartz","year":"2000","journal-title":"Genome Res"},{"key":"2024021913350355400_btaa447-B57","doi-asserted-by":"crossref","first-page":"468","DOI":"10.1093\/molbev\/msh039","article-title":"Phylogenetic estimation of context-dependent substitution rates by maximum likelihood","volume":"21","author":"Siepel","year":"2003","journal-title":"Mol. Biol. Evol"},{"key":"2024021913350355400_btaa447-B58","doi-asserted-by":"crossref","first-page":"195","DOI":"10.1016\/0022-2836(81)90087-5","article-title":"Identification of common molecular subsequences","volume":"147","author":"Smith","year":"1981","journal-title":"J. Mol. Biol"},{"key":"2024021913350355400_btaa447-B59","doi-asserted-by":"crossref","first-page":"157","DOI":"10.1093\/bioinformatics\/14.2.157","article-title":"Rose: generating sequence families","volume":"14","author":"Stoye","year":"1998","journal-title":"Bioinformatics"},{"key":"2024021913350355400_btaa447-B60","doi-asserted-by":"crossref","DOI":"10.21437\/Interspeech.2012-65","article-title":"LSTM neural networks for language modeling","author":"Sundermeyer","year":"2012"},{"key":"2024021913350355400_btaa447-B61","doi-asserted-by":"crossref","first-page":"10571","DOI":"10.1073\/pnas.162278199","article-title":"Clusters of transcription-coupled repair in the human genome","volume":"99","author":"Surrall\u00e9s","year":"2002","journal-title":"Proc. Natl. Acad. Sci. USA"},{"key":"2024021913350355400_btaa447-B62","article-title":"Sequence to sequence learning with neural networks","author":"Sutskever","year":"2014"},{"key":"2024021913350355400_btaa447-B63","doi-asserted-by":"crossref","first-page":"114","DOI":"10.1007\/BF02193625","article-title":"An evolutionary model for maximum likelihood alignment of DNA sequences","volume":"33","author":"Thorne","year":"1991","journal-title":"J. Mol. Evol"},{"key":"2024021913350355400_btaa447-B64","first-page":"5998","article-title":"Attention is all you need","volume-title":"Advances in Neural Information Processing Systems","author":"Vaswani","year":"2017"},{"key":"2024021913350355400_btaa447-B65","doi-asserted-by":"crossref","first-page":"3169","DOI":"10.1099\/00221287-145-11-3169","article-title":"Context-sensitive transposition of IS6110 in mycobacteria","volume":"145","author":"Wall","year":"1999","journal-title":"Microbiology"},{"key":"2024021913350355400_btaa447-B66","doi-asserted-by":"crossref","first-page":"489","DOI":"10.1038\/s41580-018-0016-z","article-title":"Dynamics and function of DNA methylation in plants","volume":"19","author":"Zhang","year":"2018","journal-title":"Nat. Rev. Mol. Cell Biol"},{"key":"2024021913350355400_btaa447-B67","doi-asserted-by":"crossref","first-page":"843","DOI":"10.1534\/genetics.116.195677","article-title":"Statistical methods for identifying sequence motifs affecting point mutations","volume":"205","author":"Zhu","year":"2017","journal-title":"Genetics"}],"container-title":["Bioinformatics"],"original-title":[],"language":"en","link":[{"URL":"https:\/\/academic.oup.com\/bioinformatics\/article-pdf\/36\/Supplement_1\/i353\/56702721\/bioinformatics_36_supplement1_i353.pdf","content-type":"application\/pdf","content-version":"vor","intended-application":"syndication"},{"URL":"https:\/\/academic.oup.com\/bioinformatics\/article-pdf\/36\/Supplement_1\/i353\/56702721\/bioinformatics_36_supplement1_i353.pdf","content-type":"unspecified","content-version":"vor","intended-application":"similarity-checking"}],"deposited":{"date-parts":[[2024,2,19]],"date-time":"2024-02-19T13:45:15Z","timestamp":1708350315000},"score":1,"resource":{"primary":{"URL":"https:\/\/academic.oup.com\/bioinformatics\/article\/36\/Supplement_1\/i353\/5870475"}},"subtitle":[],"short-title":[],"issued":{"date-parts":[[2020,7,1]]},"references-count":67,"journal-issue":{"issue":"Supplement_1","published-print":{"date-parts":[[2020,7,1]]}},"URL":"https:\/\/doi.org\/10.1093\/bioinformatics\/btaa447","relation":{},"ISSN":["1367-4803","1367-4811"],"issn-type":[{"value":"1367-4803","type":"print"},{"value":"1367-4811","type":"electronic"}],"subject":[],"published-other":{"date-parts":[[2020,7]]},"published":{"date-parts":[[2020,7,1]]}}}