{"status":"ok","message-type":"work","message-version":"1.0.0","message":{"indexed":{"date-parts":[[2026,4,1]],"date-time":"2026-04-01T21:03:16Z","timestamp":1775077396329,"version":"3.50.1"},"reference-count":42,"publisher":"Oxford University Press (OUP)","issue":"7","license":[{"start":{"date-parts":[[2019,11,18]],"date-time":"2019-11-18T00:00:00Z","timestamp":1574035200000},"content-version":"vor","delay-in-days":0,"URL":"https:\/\/academic.oup.com\/journals\/pages\/open_access\/funder_policies\/chorus\/standard_publication_model"}],"funder":[{"DOI":"10.13039\/100000002","name":"National Institutes of Health","doi-asserted-by":"publisher","award":["GM083107"],"award-info":[{"award-number":["GM083107"]}],"id":[{"id":"10.13039\/100000002","id-type":"DOI","asserted-by":"publisher"}]},{"DOI":"10.13039\/100000002","name":"National Institutes of Health","doi-asserted-by":"publisher","award":["GM116960"],"award-info":[{"award-number":["GM116960"]}],"id":[{"id":"10.13039\/100000002","id-type":"DOI","asserted-by":"publisher"}]},{"DOI":"10.13039\/100000002","name":"National Institutes of Health","doi-asserted-by":"publisher","award":["AI134678"],"award-info":[{"award-number":["AI134678"]}],"id":[{"id":"10.13039\/100000002","id-type":"DOI","asserted-by":"publisher"}]},{"DOI":"10.13039\/100000001","name":"National Science Foundation","doi-asserted-by":"publisher","award":["DBI1564756"],"award-info":[{"award-number":["DBI1564756"]}],"id":[{"id":"10.13039\/100000001","id-type":"DOI","asserted-by":"publisher"}]},{"DOI":"10.13039\/100000001","name":"National Science Foundation","doi-asserted-by":"publisher","award":["IIS1901191"],"award-info":[{"award-number":["IIS1901191"]}],"id":[{"id":"10.13039\/100000001","id-type":"DOI","asserted-by":"publisher"}]}],"content-domain":{"domain":[],"crossmark-restriction":false},"short-container-title":[],"published-print":{"date-parts":[[2020,4,1]]},"abstract":"<jats:title>Abstract<\/jats:title>\n               <jats:sec>\n                  
<jats:title>Motivation<\/jats:title>\n                  <jats:p>The success of genome sequencing techniques has resulted in a rapid explosion of protein sequences. Collections of multiple homologous sequences can provide critical information for the modeling of the structure and function of unknown proteins. There is, however, no standard and efficient pipeline available for sensitive multiple sequence alignment (MSA) collection. This is particularly challenging when large whole-genome and metagenome databases are involved.<\/jats:p>\n               <\/jats:sec>\n               <jats:sec>\n                  <jats:title>Results<\/jats:title>\n                  <jats:p>We developed DeepMSA, a new open-source method for sensitive MSA construction, in which homologous sequences and alignments are created from multiple whole-genome and metagenome database sources through complementary hidden Markov model algorithms. The practical usefulness of the pipeline was examined in three large-scale benchmark experiments based on 614 non-redundant proteins. First, DeepMSA was used to generate MSAs for residue-level contact prediction by six coevolution-based and deep learning-based programs, which increased the accuracy of long-range contact prediction by up to 24.4% compared with the default programs. Next, multiple threading programs were run for homologous structure identification, where the average TM-score of the template alignments increased by over 7.5% with the use of the new DeepMSA profiles. Finally, DeepMSA was used for secondary structure prediction and resulted in statistically significant improvements in Q3 accuracy. 
It is noted that all these improvements were achieved without re-training the parameters and neural-network models, demonstrating the robustness and general usefulness of DeepMSA in protein structural bioinformatics applications, especially for targets without homologous templates in the PDB library.<\/jats:p>\n               <\/jats:sec>\n               <jats:sec>\n                  <jats:title>Availability and implementation<\/jats:title>\n                  <jats:p>https:\/\/zhanglab.ccmb.med.umich.edu\/DeepMSA\/.<\/jats:p>\n               <\/jats:sec>\n               <jats:sec>\n                  <jats:title>Supplementary information<\/jats:title>\n                  <jats:p>Supplementary data are available at Bioinformatics online.<\/jats:p>\n               <\/jats:sec>","DOI":"10.1093\/bioinformatics\/btz863","type":"journal-article","created":{"date-parts":[[2019,11,15]],"date-time":"2019-11-15T20:10:56Z","timestamp":1573848656000},"page":"2105-2112","source":"Crossref","is-referenced-by-count":184,"title":["DeepMSA: constructing deep multiple sequence alignment to improve contact prediction and fold-recognition for distant-homology proteins"],"prefix":"10.1093","volume":"36","author":[{"ORCID":"https:\/\/orcid.org\/0000-0001-7290-1324","authenticated-orcid":false,"given":"Chengxin","family":"Zhang","sequence":"first","affiliation":[{"name":"Department of Computational Medicine and Bioinformatics, University of Michigan , Ann Arbor, MI 48109, USA"}]},{"ORCID":"https:\/\/orcid.org\/0000-0002-2984-9003","authenticated-orcid":false,"given":"Wei","family":"Zheng","sequence":"additional","affiliation":[{"name":"Department of Computational Medicine and Bioinformatics, University of Michigan , Ann Arbor, MI 48109, USA"}]},{"given":"S M","family":"Mortuza","sequence":"additional","affiliation":[{"name":"Department of Computational Medicine and Bioinformatics, University of Michigan , Ann Arbor, MI 48109, 
USA"}]},{"given":"Yang","family":"Li","sequence":"additional","affiliation":[{"name":"Department of Computational Medicine and Bioinformatics, University of Michigan , Ann Arbor, MI 48109, USA"},{"name":"School of Computer Science and Engineering, Nanjing University of Science and Technology , Nanjing 210094, China"}]},{"given":"Yang","family":"Zhang","sequence":"additional","affiliation":[{"name":"Department of Computational Medicine and Bioinformatics, University of Michigan , Ann Arbor, MI 48109, USA"},{"name":"Department of Biological Chemistry, University of Michigan , Ann Arbor, MI 48109, USA"}]}],"member":"286","published-online":{"date-parts":[[2019,11,18]]},"reference":[{"key":"2023062312014893000_btz863-B1","doi-asserted-by":"crossref","first-page":"1466","DOI":"10.1093\/bioinformatics\/btx781","article-title":"DNCON2: improved protein contact prediction using two-level deep convolutional neural networks","volume":"34","author":"Adhikari","year":"2018","journal-title":"Bioinformatics"},{"key":"2023062312014893000_btz863-B2","doi-asserted-by":"crossref","first-page":"3389","DOI":"10.1093\/nar\/25.17.3389","article-title":"Gapped BLAST and PSI-BLAST: a new generation of protein database search programs","volume":"25","author":"Altschul","year":"1997","journal-title":"Nucleic Acids Res"},{"key":"2023062312014893000_btz863-B3","doi-asserted-by":"crossref","first-page":"78","DOI":"10.1002\/prot.25379","article-title":"Improved protein contact predictions with the MetaPSICOV2 server in CASP12","volume":"86","author":"Buchan","year":"2018","journal-title":"Proteins"},{"key":"2023062312014893000_btz863-B4","doi-asserted-by":"crossref","first-page":"31865","DOI":"10.1038\/srep31865","article-title":"FFPred 3: feature-based function prediction for all Gene Ontology domains","volume":"6","author":"Cozzetto","year":"2016","journal-title":"Sci. 
Rep"},{"key":"2023062312014893000_btz863-B5","doi-asserted-by":"crossref","first-page":"755","DOI":"10.1093\/bioinformatics\/14.9.755","article-title":"Profile hidden Markov models","volume":"14","author":"Eddy","year":"1998","journal-title":"Bioinformatics"},{"key":"2023062312014893000_btz863-B6","doi-asserted-by":"crossref","first-page":"12","DOI":"10.1093\/bioinformatics\/bty523","article-title":"The choice of sequence homologs included in multiple sequence alignments has a dramatic impact on evolutionary conservation analysis","volume":"35","author":"Gil","year":"2019","journal-title":"Bioinformatics"},{"key":"2023062312014893000_btz863-B7","doi-asserted-by":"crossref","first-page":"4039","DOI":"10.1093\/bioinformatics\/bty481","article-title":"Accurate prediction of protein contact maps by coupling residual two-dimensional bidirectional long short-term memory with convolutional neural networks","volume":"34","author":"Hanson","year":"2018","journal-title":"Bioinformatics"},{"key":"2023062312014893000_btz863-B8","doi-asserted-by":"crossref","first-page":"248","DOI":"10.1186\/1471-2105-14-248","article-title":"kClust: fast and sensitive clustering of large protein sequence databases","volume":"14","author":"Hauser","year":"2013","journal-title":"BMC Bioinformatics"},{"key":"2023062312014893000_btz863-B9","doi-asserted-by":"crossref","first-page":"2296","DOI":"10.1093\/bioinformatics\/btx164","article-title":"NeBcon: protein contact map prediction using neural network training coupled with na\u00efve Bayes classifiers","volume":"33","author":"He","year":"2017","journal-title":"Bioinformatics"},{"key":"2023062312014893000_btz863-B10","first-page":"1147","article-title":"SCOP, Structural Classification of Proteins Database: applications to Evaluation of the Effectiveness of Sequence Alignment Methods and Statistics of Protein Structural Data","volume":"54","author":"Hubbard","year":"2010","journal-title":"Acta 
Cryst"},{"key":"2023062312014893000_btz863-B11","doi-asserted-by":"crossref","first-page":"195","DOI":"10.1006\/jmbi.1999.3091","article-title":"Protein secondary structure prediction based on position-specific scoring matrices","volume":"292","author":"Jones","year":"1999","journal-title":"J. Mol. Biol"},{"key":"2023062312014893000_btz863-B12","doi-asserted-by":"crossref","first-page":"3308","DOI":"10.1093\/bioinformatics\/bty341","article-title":"High precision in protein contact prediction using fully convolutional neural networks and minimal sequence features","volume":"34","author":"Jones","year":"2018","journal-title":"Bioinformatics"},{"key":"2023062312014893000_btz863-B13","doi-asserted-by":"crossref","first-page":"1082","DOI":"10.1002\/prot.25798","article-title":"Ensembling multiple raw coevolutionary features with deep residual neural networks for contact-map prediction in CASP13","volume":"87","author":"Li","year":"2019","journal-title":"Proteins"},{"key":"2023062312014893000_btz863-B14","doi-asserted-by":"crossref","first-page":"65","DOI":"10.1016\/j.cels.2017.11.014","article-title":"Enhancing evolutionary couplings with deep convolutional neural networks","volume":"6","author":"Liu","year":"2018","journal-title":"Cell Syst"},{"key":"2023062312014893000_btz863-B15","first-page":"2677","article-title":"PconsC4: fast, free, easy, and accurate contact predictions","author":"Michel","year":"2018"},{"key":"2023062312014893000_btz863-B16","doi-asserted-by":"crossref","first-page":"D170","DOI":"10.1093\/nar\/gkw1081","article-title":"Uniclust databases of clustered and deeply annotated protein sequences and alignments","volume":"45","author":"Mirdita","year":"2017","journal-title":"Nucleic Acids Res"},{"key":"2023062312014893000_btz863-B17","doi-asserted-by":"crossref","first-page":"113","DOI":"10.1002\/prot.25390","article-title":"Protein structure prediction using Rosetta in CASP12","volume":"86 (Suppl. 
1","author":"Ovchinnikov","year":"2018","journal-title":"Proteins"},{"key":"2023062312014893000_btz863-B18","doi-asserted-by":"crossref","first-page":"294","DOI":"10.1126\/science.aah4043","article-title":"Protein structure determination using metagenome sequence data","volume":"355","author":"Ovchinnikov","year":"2017","journal-title":"Science"},{"key":"2023062312014893000_btz863-B19","doi-asserted-by":"crossref","first-page":"173","DOI":"10.1038\/nmeth.1818","article-title":"HHblits: lightning-fast iterative protein sequence searching by HMM-HMM alignment","volume":"9","author":"Remmert","year":"2012","journal-title":"Nat. Methods"},{"key":"2023062312014893000_btz863-B20","doi-asserted-by":"crossref","first-page":"51","DOI":"10.1002\/prot.25407","article-title":"Assessment of contact predictions in CASP12: co-evolution and deep learning coming of age","volume":"86 (Suppl. 1","author":"Schaarschmidt","year":"2018","journal-title":"Proteins"},{"key":"2023062312014893000_btz863-B21","doi-asserted-by":"crossref","first-page":"3128","DOI":"10.1093\/bioinformatics\/btu500","article-title":"CCMpred\u2014fast and precise prediction of protein residue\u2013residue contacts from correlated mutations","volume":"30","author":"Seemayer","year":"2014","journal-title":"Bioinformatics"},{"key":"2023062312014893000_btz863-B22","doi-asserted-by":"crossref","first-page":"539","DOI":"10.1038\/msb.2011.75","article-title":"Fast, scalable generation of high-quality protein multiple sequence alignments using Clustal Omega","volume":"7","author":"Sievers","year":"2011","journal-title":"Mol. Syst. 
Biol"},{"key":"2023062312014893000_btz863-B23","doi-asserted-by":"crossref","first-page":"951","DOI":"10.1093\/bioinformatics\/bti125","article-title":"Protein homology detection by HMM-HMM comparison","volume":"21","author":"Soding","year":"2005","journal-title":"Bioinformatics"},{"key":"2023062312014893000_btz863-B24","doi-asserted-by":"crossref","first-page":"473","DOI":"10.1186\/s12859-019-3019-7","article-title":"HH-suite3 for fast remote homology detection and deep protein annotation","volume":"20","author":"Steinegger","year":"2019","journal-title":"BMC Bioinformatics"},{"key":"2023062312014893000_btz863-B25","doi-asserted-by":"crossref","first-page":"2542","DOI":"10.1038\/s41467-018-04964-5","article-title":"Clustering huge protein sequence sets in linear time","volume":"9","author":"Steinegger","year":"2018","journal-title":"Nat. Commun"},{"key":"2023062312014893000_btz863-B26","doi-asserted-by":"crossref","first-page":"926","DOI":"10.1093\/bioinformatics\/btu739","article-title":"UniRef clusters: a comprehensive and scalable alternative for improving sequence similarity searches","volume":"31","author":"Suzek","year":"2015","journal-title":"Bioinformatics"},{"key":"2023062312014893000_btz863-B27","doi-asserted-by":"crossref","first-page":"62","DOI":"10.1109\/MCSE.2014.80","article-title":"XSEDE: accelerating scientific discovery","volume":"16","author":"Towns","year":"2014","journal-title":"Comput. Sci. Eng"},{"key":"2023062312014893000_btz863-B28","doi-asserted-by":"crossref","first-page":"e1005324","DOI":"10.1371\/journal.pcbi.1005324","article-title":"Accurate de novo prediction of protein contact map by ultra-deep learning model","volume":"13","author":"Wang","year":"2017","journal-title":"PLoS Comput. 
Biol"},{"key":"2023062312014893000_btz863-B29","doi-asserted-by":"crossref","first-page":"229","DOI":"10.1186\/s13059-019-1823-z","article-title":"Fueling ab initio folding with marine microbiome enables structure and function predictions of new protein families","volume":"20","author":"Wang","year":"2019","journal-title":"Genome Biol"},{"key":"2023062312014893000_btz863-B30","doi-asserted-by":"crossref","first-page":"1182","DOI":"10.1016\/j.str.2011.05.004","article-title":"Improving protein structure prediction using multiple sequence-based contact predictions","volume":"19","author":"Wu","year":"2011","journal-title":"Structure"},{"key":"2023062312014893000_btz863-B31","doi-asserted-by":"crossref","first-page":"3375","DOI":"10.1093\/nar\/gkm251","article-title":"LOMETS: a local meta-threading-server for protein structure prediction","volume":"35","author":"Wu","year":"2007","journal-title":"Nucleic Acids Res"},{"key":"2023062312014893000_btz863-B32","doi-asserted-by":"crossref","first-page":"e3400","DOI":"10.1371\/journal.pone.0003400","article-title":"ANGLOR: a composite machine-learning algorithm for protein backbone torsion angle prediction","volume":"3","author":"Wu","year":"2008","journal-title":"PLoS One"},{"key":"2023062312014893000_btz863-B33","doi-asserted-by":"crossref","first-page":"547","DOI":"10.1002\/prot.21945","article-title":"MUSTER: improving protein sequence profile\u2013profile alignments by using multiple sources of structure information","volume":"72","author":"Wu","year":"2008","journal-title":"Proteins"},{"key":"2023062312014893000_btz863-B34","doi-asserted-by":"crossref","first-page":"2619","DOI":"10.1038\/srep02619","article-title":"A comparative assessment and analysis of 20 representative sequence alignment methods for protein structure prediction","volume":"3","author":"Yan","year":"2013","journal-title":"Sci. 
Rep"},{"key":"2023062312014893000_btz863-B35","doi-asserted-by":"crossref","first-page":"2588","DOI":"10.1093\/bioinformatics\/btt447","article-title":"Protein-ligand binding site recognition using complementary binding-specific substructure comparison and sequence profile alignment","volume":"29","author":"Yang","year":"2013","journal-title":"Bioinformatics"},{"key":"2023062312014893000_btz863-B36","doi-asserted-by":"crossref","first-page":"7","DOI":"10.1038\/nmeth.3213","article-title":"The I-TASSER Suite: protein structure and function prediction","volume":"12","author":"Yang","year":"2015","journal-title":"Nat. Methods"},{"key":"2023062312014893000_btz863-B37","doi-asserted-by":"crossref","first-page":"220","DOI":"10.1002\/(SICI)1097-0134(19990201)34:2<220::AID-PROT7>3.0.CO;2-K","article-title":"A modified definition of Sov, a segment-based measure for protein secondary structure prediction assessment","volume":"34","author":"Zemla","year":"1999","journal-title":"Proteins"},{"key":"2023062312014893000_btz863-B38","doi-asserted-by":"crossref","first-page":"W291","DOI":"10.1093\/nar\/gkx366","article-title":"COFACTOR: improved protein function prediction by combining structure, sequence and protein\u2013protein interaction information","volume":"45","author":"Zhang","year":"2017","journal-title":"Nucleic Acids Res"},{"key":"2023062312014893000_btz863-B39","doi-asserted-by":"crossref","first-page":"136","DOI":"10.1002\/prot.25414","article-title":"Template-based and free modeling of I-TASSER and QUARK pipelines using predicted contact maps in CASP12","volume":"86 (Suppl. 
1","author":"Zhang","year":"2018","journal-title":"Proteins"},{"key":"2023062312014893000_btz863-B40","doi-asserted-by":"crossref","first-page":"702","DOI":"10.1002\/prot.20264","article-title":"Scoring function for automated assessment of protein structure template quality","volume":"57","author":"Zhang","year":"2004","journal-title":"Proteins"},{"key":"2023062312014893000_btz863-B41","doi-asserted-by":"crossref","first-page":"2302","DOI":"10.1093\/nar\/gki524","article-title":"TM-align: a protein structure alignment algorithm based on the TM-score","volume":"33","author":"Zhang","year":"2005","journal-title":"Nucleic Acids Res"},{"key":"2023062312014893000_btz863-B42","doi-asserted-by":"crossref","first-page":"W429","DOI":"10.1093\/nar\/gkz384","article-title":"LOMETS2: improved meta-threading server for fold-recognition and structure-based function annotation for distant-homology proteins","volume":"47","author":"Zheng","year":"2019","journal-title":"Nucleic Acids Res"}],"container-title":["Bioinformatics"],"original-title":[],"language":"en","link":[{"URL":"http:\/\/academic.oup.com\/bioinformatics\/advance-article-pdf\/doi\/10.1093\/bioinformatics\/btz863\/30960243\/btz863.pdf","content-type":"application\/pdf","content-version":"am","intended-application":"syndication"},{"URL":"https:\/\/academic.oup.com\/bioinformatics\/article-pdf\/36\/7\/2105\/50670279\/bioinformatics_36_7_2105.pdf","content-type":"application\/pdf","content-version":"vor","intended-application":"syndication"},{"URL":"https:\/\/academic.oup.com\/bioinformatics\/article-pdf\/36\/7\/2105\/50670279\/bioinformatics_36_7_2105.pdf","content-type":"unspecified","content-version":"vor","intended-application":"similarity-checking"}],"deposited":{"date-parts":[[2023,6,24]],"date-time":"2023-06-24T21:12:56Z","timestamp":1687641176000},"score":1,"resource":{"primary":{"URL":"https:\/\/academic.oup.com\/bioinformatics\/article\/36\/7\/2105\/5628221"}},"subtitle":[],"editor":[{"given":"Alfonso","family":
"Valencia","sequence":"additional","affiliation":[]}],"short-title":[],"issued":{"date-parts":[[2019,11,18]]},"references-count":42,"journal-issue":{"issue":"7","published-print":{"date-parts":[[2020,4,1]]}},"URL":"https:\/\/doi.org\/10.1093\/bioinformatics\/btz863","relation":{},"ISSN":["1367-4803","1367-4811"],"issn-type":[{"value":"1367-4803","type":"print"},{"value":"1367-4811","type":"electronic"}],"subject":[],"published-other":{"date-parts":[[2020,4,1]]},"published":{"date-parts":[[2019,11,18]]}}}