{"status":"ok","message-type":"work","message-version":"1.0.0","message":{"indexed":{"date-parts":[[2026,1,20]],"date-time":"2026-01-20T12:28:30Z","timestamp":1768912110055,"version":"3.49.0"},"reference-count":35,"publisher":"Oxford University Press (OUP)","issue":"11","license":[{"start":{"date-parts":[[2023,10,19]],"date-time":"2023-10-19T00:00:00Z","timestamp":1697673600000},"content-version":"vor","delay-in-days":0,"URL":"https:\/\/creativecommons.org\/licenses\/by\/4.0\/"}],"content-domain":{"domain":[],"crossmark-restriction":false},"short-container-title":[],"published-print":{"date-parts":[[2023,11,1]]},"abstract":"<jats:title>Abstract<\/jats:title>\n               <jats:sec>\n                  <jats:title>Motivation<\/jats:title>\n                  <jats:p>Multiple sequence alignment (MSA) is one of the hotspots of current research and is commonly used in sequence analysis scenarios. However, there is no lasting solution for MSA because it is a Nondeterministic Polynomially complete problem, and the existing methods still have room to improve the accuracy.<\/jats:p>\n               <\/jats:sec>\n               <jats:sec>\n                  <jats:title>Results<\/jats:title>\n                  <jats:p>We propose Deep reinforcement learning with Positional encoding and self-Attention for MSA, based on deep reinforcement learning, to enhance the accuracy of the alignment. Specifically, inspired by the translation technique in natural language processing, we introduce self-attention and positional encoding to improve accuracy and reliability. Firstly, positional encoding encodes the position of the sequence to prevent the loss of nucleotide position information. Secondly, the self-attention model is used to extract the key features of the sequence. Then input the features into a multi-layer perceptron, which can calculate the insertion position of the gap according to the features. 
In addition, a novel reinforcement learning environment is designed to convert the classic progressive alignment into progressive column alignment, gradually generating each column\u2019s sub-alignment. Finally, merge the sub-alignment into the complete alignment. Extensive experiments based on several datasets validate our method\u2019s effectiveness for MSA, outperforming some state-of-the-art methods in terms of the Sum-of-pairs and Column scores.<\/jats:p>\n               <\/jats:sec>\n               <jats:sec>\n                  <jats:title>Availability and implementation<\/jats:title>\n                  <jats:p>The process is implemented in Python and available as open-source software from https:\/\/github.com\/ZhangLab312\/DPAMSA.<\/jats:p>\n               <\/jats:sec>","DOI":"10.1093\/bioinformatics\/btad636","type":"journal-article","created":{"date-parts":[[2023,10,19]],"date-time":"2023-10-19T16:59:39Z","timestamp":1697734779000},"source":"Crossref","is-referenced-by-count":6,"title":["Multiple sequence alignment based on deep reinforcement learning with self-attention and positional encoding"],"prefix":"10.1093","volume":"39","author":[{"given":"Yuhang","family":"Liu","sequence":"first","affiliation":[{"name":"School of Computer Science, Chengdu University of Information Technology , Chengdu 610225, China"}]},{"given":"Hao","family":"Yuan","sequence":"additional","affiliation":[{"name":"School of Computer Science, Chengdu University of Information Technology , Chengdu 610225, China"}]},{"given":"Qiang","family":"Zhang","sequence":"additional","affiliation":[{"name":"School of Computer Science, Chengdu University of Information Technology , Chengdu 610225, China"}]},{"given":"Zixuan","family":"Wang","sequence":"additional","affiliation":[{"name":"College of Electronics and Information Engineering, Sichuan University , Chengdu 610065, China"}]},{"given":"Shuwen","family":"Xiong","sequence":"additional","affiliation":[{"name":"School of Computer Science, 
Chengdu University of Information Technology , Chengdu 610225, China"}]},{"given":"Naifeng","family":"Wen","sequence":"additional","affiliation":[{"name":"School of Mechanical and Electrical Engineering, Dalian Minzu University , Dalian 116600, China"}]},{"ORCID":"https:\/\/orcid.org\/0000-0003-3422-8305","authenticated-orcid":false,"given":"Yongqing","family":"Zhang","sequence":"additional","affiliation":[{"name":"School of Computer Science, Chengdu University of Information Technology , Chengdu 610225, China"}]}],"member":"286","published-online":{"date-parts":[[2023,10,19]]},"reference":[{"key":"2023110706585317500_btad636-B1","doi-asserted-by":"crossref","first-page":"197","DOI":"10.1137\/0149012","article-title":"Trees, stars, and multiple biological sequence alignment","volume":"49","author":"Altschul","year":"1989","journal-title":"SIAM J Appl Math"},{"key":"2023110706585317500_btad636-B2","doi-asserted-by":"crossref","first-page":"e0127431","DOI":"10.1371\/journal.pone.0127431","article-title":"Quantifying the displacement of mismatches in multiple sequence alignment benchmarks","volume":"10","author":"Bawono","year":"2015","journal-title":"PLoS One"},{"key":"2023110706585317500_btad636-B3","doi-asserted-by":"crossref","first-page":"12543","DOI":"10.1038\/s41598-017-13083-y","article-title":"Tm-aligner: multiple sequence alignment tool for transmembrane proteins with reduced time and improved accuracy","volume":"7","author":"Bhat","year":"2017","journal-title":"Sci Rep"},{"key":"2023110706585317500_btad636-B4","doi-asserted-by":"crossref","first-page":"1009","DOI":"10.1093\/bib\/bbv099","article-title":"Multiple sequence alignment modeling: methods and applications","volume":"17","author":"Chatzou","year":"2016","journal-title":"Brief Bioinform"},{"key":"2023110706585317500_btad636-B5","doi-asserted-by":"crossref","first-page":"15871","DOI":"10.1007\/s00500-020-04917-5","article-title":"A bi-objective function optimization approach for multiple sequence 
alignment using genetic algorithm","volume":"24","author":"Chowdhury","year":"2020","journal-title":"Soft Comput"},{"key":"2023110706585317500_btad636-B6","doi-asserted-by":"crossref","first-page":"1792","DOI":"10.1093\/nar\/gkh340","article-title":"Muscle: multiple sequence alignment with high accuracy and high throughput","volume":"32","author":"Edgar","year":"2004","journal-title":"Nucleic Acids Res"},{"key":"2023110706585317500_btad636-B7","doi-asserted-by":"crossref","first-page":"4291","DOI":"10.1109\/TNNLS.2020.3019893","article-title":"Attention in natural language processing","volume":"32","author":"Galassi","year":"2021","journal-title":"IEEE Trans Neural Netw Learn Syst"},{"key":"2023110706585317500_btad636-B8","doi-asserted-by":"crossref","first-page":"89","DOI":"10.1007\/978-1-0716-1036-7_6","article-title":"Multiple sequence alignment computation using the t-coffee regressive algorithm implementation","volume":"2231","author":"Garriga","year":"2021","journal-title":"Methods Mol Biol"},{"key":"2023110706585317500_btad636-B9","first-page":"571","author":"Hussein","year":"2019"},{"key":"2023110706585317500_btad636-B10","doi-asserted-by":"crossref","first-page":"592","DOI":"10.1007\/s42452-019-0611-4","article-title":"Using deep reinforcement learning approach for solving the multiple sequence alignment problem","volume":"1","author":"Jafari","year":"2019","journal-title":"SN Appl Sci"},{"key":"2023110706585317500_btad636-B11","doi-asserted-by":"crossref","first-page":"237","DOI":"10.1613\/jair.301","article-title":"Reinforcement learning: a survey","volume":"4","author":"Kaelbling","year":"1996","journal-title":"JAIR"},{"key":"2023110706585317500_btad636-B12","doi-asserted-by":"crossref","first-page":"1627","DOI":"10.1002\/jmv.27515","article-title":"Omicron variant genome evolution and phylogenetics","volume":"94","author":"Kandeel","year":"2022","journal-title":"J Med 
Virol"},{"key":"2023110706585317500_btad636-B13","doi-asserted-by":"crossref","first-page":"3059","DOI":"10.1093\/nar\/gkf436","article-title":"Mafft: a novel method for rapid multiple sequence alignment based on fast fourier transform","volume":"30","author":"Katoh","year":"2002","journal-title":"Nucleic Acids Res"},{"key":"2023110706585317500_btad636-B14","doi-asserted-by":"crossref","first-page":"1928","DOI":"10.1093\/bioinformatics\/btz795","article-title":"Kalign 3: multiple sequence alignment of large datasets","volume":"36","author":"Lassmann","year":"2019","journal-title":"Bioinformatics"},{"key":"2023110706585317500_btad636-B15","doi-asserted-by":"crossref","first-page":"1763","DOI":"10.1093\/bioinformatics\/bty851","article-title":"VIRULIGN: fast codon-correct alignment and annotation of viral genomes","volume":"35","author":"Libin","year":"2019","journal-title":"Bioinformatics"},{"key":"2023110706585317500_btad636-B16","doi-asserted-by":"crossref","first-page":"1958","DOI":"10.1093\/bioinformatics\/btq338","article-title":"MSAProbs: multiple sequence alignment based on pair hidden markov models and partition function posterior probabilities","volume":"26","author":"Liu","year":"2010","journal-title":"Bioinformatics"},{"key":"2023110706585317500_btad636-B17","doi-asserted-by":"crossref","first-page":"518","DOI":"10.1186\/s12859-021-04442-8","article-title":"ProPIP: a tool for progressive multiple sequence alignment with poisson indel process","volume":"22","author":"Maiolo","year":"2021","journal-title":"BMC Bioinformatics"},{"key":"2023110706585317500_btad636-B18","doi-asserted-by":"crossref","first-page":"377","DOI":"10.1089\/cmb.2014.0156","article-title":"PASTA: ultra-large multiple sequence alignment for nucleotide and amino-acid sequences","volume":"22","author":"Mirarab","year":"2015","journal-title":"J Comput 
Biol"},{"key":"2023110706585317500_btad636-B19","first-page":"51","author":"Mircea","year":"2015"},{"key":"2023110706585317500_btad636-B20","doi-asserted-by":"crossref","first-page":"529","DOI":"10.1038\/nature14236","article-title":"Human-level control through deep reinforcement learning","volume":"518","author":"Mnih","year":"2015","journal-title":"Nature"},{"key":"2023110706585317500_btad636-B21","doi-asserted-by":"crossref","first-page":"714","DOI":"10.1093\/bioinformatics\/btaa743","article-title":"ViralMSA: massively scalable reference-guided multiple sequence alignment of viral genomes","volume":"37","author":"Moshiri","year":"2021","journal-title":"Bioinformatics"},{"key":"2023110706585317500_btad636-B22","doi-asserted-by":"crossref","first-page":"48","DOI":"10.1016\/j.neucom.2021.03.091","article-title":"A review on the attention mechanism of deep learning","volume":"452","author":"Niu","year":"2021","journal-title":"Neurocomputing"},{"key":"2023110706585317500_btad636-B23","doi-asserted-by":"crossref","first-page":"205","DOI":"10.1006\/jmbi.2000.4042","article-title":"T-coffee: a novel method for fast and accurate multiple sequence alignment","volume":"302","author":"Notredame","year":"2000","journal-title":"J Mol Biol"},{"key":"2023110706585317500_btad636-B24","doi-asserted-by":"crossref","first-page":"47","DOI":"10.1186\/1471-2105-4-47","article-title":"OXBench: a benchmark for evaluation of protein multiple sequence alignment accuracy","volume":"4","author":"Raghava","year":"2003","journal-title":"BMC Bioinformatics"},{"key":"2023110706585317500_btad636-B25","first-page":"61","author":"Ramakrishnan","year":"2018"},{"key":"2023110706585317500_btad636-B26","doi-asserted-by":"crossref","first-page":"231","DOI":"10.1093\/nar\/28.1.231","article-title":"SMART: a web-based tool for the study of genetically mobile domains","volume":"28","author":"Schultz","year":"2000","journal-title":"Nucleic Acids 
Res"},{"key":"2023110706585317500_btad636-B27","doi-asserted-by":"crossref","first-page":"539","DOI":"10.1038\/msb.2011.75","article-title":"Fast, scalable generation of high-quality protein multiple sequence alignments using clustal omega","volume":"7","author":"Sievers","year":"2011","journal-title":"Mol Syst Biol"},{"key":"2023110706585317500_btad636-B28","first-page":"3999","author":"Takase","year":"2019"},{"key":"2023110706585317500_btad636-B29","doi-asserted-by":"crossref","first-page":"4673","DOI":"10.1093\/nar\/22.22.4673","article-title":"CLUSTAL W: improving the sensitivity of progressive multiple sequence alignment through sequence weighting, position-specific gap penalties and weight matrix choice","volume":"22","author":"Thompson","year":"1994","journal-title":"Nucleic Acids Res"},{"key":"2023110706585317500_btad636-B30","doi-asserted-by":"crossref","first-page":"127","DOI":"10.1002\/prot.20527","article-title":"Balibase 3.0: latest developments of the multiple sequence alignment benchmark","volume":"61","author":"Thompson","year":"2005","journal-title":"Proteins"},{"key":"2023110706585317500_btad636-B31","first-page":"3104","article-title":"Attention is all you need","volume":"30","author":"Vaswani","year":"2017","journal-title":"Adv Neural Inform Process Syst"},{"key":"2023110706585317500_btad636-B32","doi-asserted-by":"crossref","first-page":"1305","DOI":"10.1007\/s10529-020-02914-0","article-title":"Small design from big alignment: engineering proteins with multiple sequence alignment as the starting point","volume":"42","author":"Wang","year":"2020","journal-title":"Biotechnol Lett"},{"key":"2023110706585317500_btad636-B33","first-page":"7354","author":"Zhang","year":"2019"},{"key":"2023110706585317500_btad636-B34","doi-asserted-by":"crossref","first-page":"bbac069","DOI":"10.1093\/bib\/bbac069","article-title":"A survey on the algorithm and development of multiple sequence 
alignment","volume":"23","author":"Zhang","year":"2022","journal-title":"Brief Bioinform"},{"key":"2023110706585317500_btad636-B35","first-page":"213","article-title":"Deep reinforcement learning for power system applications: an overview","volume":"6","author":"Zhang","year":"2020","journal-title":"CSEE J Power Energy Syst"}],"container-title":["Bioinformatics"],"original-title":[],"language":"en","link":[{"URL":"https:\/\/academic.oup.com\/bioinformatics\/advance-article-pdf\/doi\/10.1093\/bioinformatics\/btad636\/52267437\/btad636.pdf","content-type":"application\/pdf","content-version":"am","intended-application":"syndication"},{"URL":"https:\/\/academic.oup.com\/bioinformatics\/article-pdf\/39\/11\/btad636\/52771912\/btad636.pdf","content-type":"application\/pdf","content-version":"vor","intended-application":"syndication"},{"URL":"https:\/\/academic.oup.com\/bioinformatics\/article-pdf\/39\/11\/btad636\/52771912\/btad636.pdf","content-type":"unspecified","content-version":"vor","intended-application":"similarity-checking"}],"deposited":{"date-parts":[[2023,11,7]],"date-time":"2023-11-07T06:59:21Z","timestamp":1699340361000},"score":1,"resource":{"primary":{"URL":"https:\/\/academic.oup.com\/bioinformatics\/article\/doi\/10.1093\/bioinformatics\/btad636\/7323576"}},"subtitle":[],"editor":[{"given":"Peter","family":"Robinson","sequence":"additional","affiliation":[]}],"short-title":[],"issued":{"date-parts":[[2023,10,19]]},"references-count":35,"journal-issue":{"issue":"11","published-print":{"date-parts":[[2023,11,1]]}},"URL":"https:\/\/doi.org\/10.1093\/bioinformatics\/btad636","relation":{},"ISSN":["1367-4811"],"issn-type":[{"value":"1367-4811","type":"electronic"}],"subject":[],"published-other":{"date-parts":[[2023,11,1]]},"published":{"date-parts":[[2023,10,19]]},"article-number":"btad636"}}