{"status":"ok","message-type":"work","message-version":"1.0.0","message":{"indexed":{"date-parts":[[2026,5,6]],"date-time":"2026-05-06T15:46:58Z","timestamp":1778082418385,"version":"3.51.4"},"reference-count":60,"publisher":"IOP Publishing","issue":"1","license":[{"start":{"date-parts":[[2020,4,27]],"date-time":"2020-04-27T00:00:00Z","timestamp":1587945600000},"content-version":"vor","delay-in-days":57,"URL":"http:\/\/creativecommons.org\/licenses\/by\/4.0"},{"start":{"date-parts":[[2020,4,27]],"date-time":"2020-04-27T00:00:00Z","timestamp":1587945600000},"content-version":"tdm","delay-in-days":57,"URL":"https:\/\/iopscience.iop.org\/info\/page\/text-and-data-mining"}],"funder":[{"name":"Emmy Noether program of the Deutsche Forschungsgemeinschaft"},{"name":"Foreign collaborative research study support by The Scientific and Technological Research Council of Turkey, TUBIITAK- BIDEB","award":["2214-A programme"],"award-info":[{"award-number":["2214-A programme"]}]},{"name":"Long Program Machine Learning for Physics and the Physics of Learning at the Institute for Pure and Applied Mathematics"}],"content-domain":{"domain":["iopscience.iop.org"],"crossmark-restriction":false},"short-container-title":["Mach. Learn.: Sci. Technol."],"published-print":{"date-parts":[[2020,3,1]]},"abstract":"<jats:title>Abstract<\/jats:title>\n               <jats:p>Extracting insight from the enormous quantity of data generated from molecular simulations requires the identification of a small number of collective variables whose corresponding low-dimensional free-energy landscape retains the essential features of the underlying system. Data-driven techniques provide a systematic route to constructing this landscape, without the need for extensive <jats:italic>a priori<\/jats:italic> intuition into the relevant driving forces. 
In particular, autoencoders are powerful tools for dimensionality reduction, as they naturally force an information bottleneck and, thereby, a low-dimensional embedding of the essential features. While variational autoencoders ensure continuity of the embedding by assuming a unimodal Gaussian prior, this is at odds with the multi-basin free-energy landscapes that typically arise from the identification of meaningful collective variables. In this work, we incorporate this physical intuition into the prior by employing a Gaussian mixture variational autoencoder (GMVAE), which encourages the separation of metastable states within the embedding. The GMVAE performs dimensionality reduction and clustering within a single unified framework, and is capable of identifying the inherent dimensionality of the input data, in terms of the number of Gaussians required to categorize the data. We illustrate our approach on two toy models, alanine dipeptide, and a challenging disordered peptide ensemble, demonstrating the enhanced clustering effect of the GMVAE prior compared to standard VAEs. 
The resulting embeddings appear to be promising representations for constructing Markov state models, highlighting the transferability of the dimensionality reduction from static equilibrium properties to dynamics.<\/jats:p>","DOI":"10.1088\/2632-2153\/ab80b7","type":"journal-article","created":{"date-parts":[[2020,3,18]],"date-time":"2020-03-18T14:29:44Z","timestamp":1584541784000},"page":"015012","update-policy":"https:\/\/doi.org\/10.1088\/crossmark-policy","source":"Crossref","is-referenced-by-count":35,"title":["Interpretable embeddings from molecular simulations using Gaussian mixture variational autoencoders"],"prefix":"10.1088","volume":"1","author":[{"ORCID":"https:\/\/orcid.org\/0000-0002-9179-2458","authenticated-orcid":false,"given":"Yasemin","family":"Bozkurt Varolg\u00fcne\u015f","sequence":"first","affiliation":[]},{"ORCID":"https:\/\/orcid.org\/0000-0001-9945-1271","authenticated-orcid":false,"given":"Tristan","family":"Bereau","sequence":"additional","affiliation":[]},{"ORCID":"https:\/\/orcid.org\/0000-0003-3403-640X","authenticated-orcid":false,"given":"Joseph F","family":"Rudzinski","sequence":"additional","affiliation":[]}],"member":"266","published-online":{"date-parts":[[2020,4,27]]},"reference":[{"key":"mlstab80b7bib1","author":"Binder","year":"1995"},{"key":"mlstab80b7bib2","doi-asserted-by":"publisher","first-page":"646","DOI":"10.1038\/nsb0902-646","article-title":"Molecular dynamics simulations of biomolecules","volume":"9","author":"Karplus","year":"2002","journal-title":"Nat. Struct. 
Molecular Biol."},{"key":"mlstab80b7bib3","doi-asserted-by":"publisher","first-page":"355","DOI":"10.1126\/science.aat4010","article-title":"Biophysical experiments and biomolecular simulations: A perfect match?","volume":"361","author":"Bottaro","year":"2018","journal-title":"Science"},{"key":"mlstab80b7bib4","author":"Allen","year":"1987"},{"key":"mlstab80b7bib5","author":"Bellman","year":"2015"},{"key":"mlstab80b7bib6","doi-asserted-by":"publisher","first-page":"715","DOI":"10.4310\/CMS.2003.v1.n4.a5","article-title":"Equation-free, coarse-grained multiscale computation: Enabling microscopic simulators to perform system-level analysis","volume":"1","author":"Kevrekidis","year":"2003","journal-title":"Commun. Math. Sci."},{"key":"mlstab80b7bib7","doi-asserted-by":"publisher","first-page":"884","DOI":"10.1038\/nature02261","article-title":"Protein folding and misfolding","volume":"426","author":"Dobson","year":"2003","journal-title":"Nature"},{"key":"mlstab80b7bib8","doi-asserted-by":"publisher","first-page":"70","DOI":"10.1016\/j.sbi.2004.01.009","article-title":"Theory of protein folding","volume":"14","author":"Onuchic","year":"2004","journal-title":"Curr. Opin. Struct. Biol."},{"key":"mlstab80b7bib9","first-page":"367","article-title":"Heterogeneous multiscale methods: a review","volume":"2","author":"Weinan","year":"2007","journal-title":"Commun. Comput. Phys."},{"key":"mlstab80b7bib10","doi-asserted-by":"publisher","first-page":"159","DOI":"10.1146\/annurev-physchem-040215-112229","article-title":"Enhancing important fluctuations: Rare events and metadynamics from a conceptual viewpoint","volume":"67","author":"Valsson","year":"2016","journal-title":"Annu. Rev. Phys. Chem."},{"key":"mlstab80b7bib11","doi-asserted-by":"publisher","first-page":"2386","DOI":"10.1021\/jacs.7b12191","article-title":"Markov state models: From an art to a science","volume":"140","author":"Husic","year":"2018","journal-title":"J. Am. Chem. 
Soc."},{"key":"mlstab80b7bib12","doi-asserted-by":"publisher","first-page":"559","DOI":"10.1080\/14786440109462720","article-title":"On lines and planes of closest fit to systems of points in space","volume":"2","author":"Pearson","year":"1901","journal-title":"London Edinburgh Dublin Phil. Mag. J. Sci."},{"key":"mlstab80b7bib13","doi-asserted-by":"publisher","first-page":"3634","DOI":"10.1103\/PhysRevLett.72.3634","article-title":"Separation of a mixture of independent signals using time delayed correlations","volume":"72","author":"Molgedey","year":"1994","journal-title":"Phys. Rev. Lett."},{"key":"mlstab80b7bib14","doi-asserted-by":"publisher","first-page":"2319","DOI":"10.1126\/science.290.5500.2319","article-title":"A global geometric framework for nonlinear dimensionality reduction","volume":"290","author":"Tenenbaum","year":"2000","journal-title":"Science"},{"key":"mlstab80b7bib15","doi-asserted-by":"publisher","first-page":"03B624","DOI":"10.1063\/1.3569857","article-title":"Determination of reaction coordinates via locally scaled diffusion map","volume":"134","author":"Rohrdanz","year":"2011","journal-title":"J. Chem. Phys."},{"key":"mlstab80b7bib16","doi-asserted-by":"publisher","first-page":"13023","DOI":"10.1073\/pnas.1108486108","article-title":"Simplifying the representation of complex free-energy landscapes using sketch-map","volume":"108","author":"Ceriotti","year":"2011","journal-title":"Proc. Natl. Acad. Sci."},{"key":"mlstab80b7bib17","doi-asserted-by":"publisher","first-page":"2079","DOI":"10.1002\/jcc.25520","article-title":"Molecular enhanced sampling with autoencoders: On-the-fly collective variable discovery and accelerated free energy landscape exploration","volume":"39","author":"Chen","year":"2018","journal-title":"J. Comput. 
Chem."},{"key":"mlstab80b7bib18","doi-asserted-by":"publisher","DOI":"10.1063\/1.5025487","article-title":"Reweighted autoencoded variational Bayes for enhanced sampling (RAVE)","volume":"149","author":"Ribeiro","year":"2018","journal-title":"J. Chem. Phys."},{"key":"mlstab80b7bib19","doi-asserted-by":"publisher","first-page":"17641","DOI":"10.1073\/pnas.1907975116","article-title":"Neural networks based variationally enhanced sampling","volume":"116","author":"Bonati","year":"2019","journal-title":"Proc. Natl Acad. Sci."},{"key":"mlstab80b7bib20","doi-asserted-by":"publisher","first-page":"6769","DOI":"10.1021\/jp045546c","article-title":"Automatic method for identifying reaction coordinates in complex systems","volume":"109","author":"Ao","year":"2005","journal-title":"J. Phys. Chem. B"},{"key":"mlstab80b7bib21","doi-asserted-by":"publisher","DOI":"10.1063\/1.4825111","article-title":"Neural networks for local structure detection in polymorphic systems","volume":"139","author":"Geiger","year":"2013","journal-title":"J. Chem. Phys."},{"key":"mlstab80b7bib22","doi-asserted-by":"publisher","first-page":"504","DOI":"10.1126\/science.1127647","article-title":"Reducing the dimensionality of data with neural networks","volume":"313","author":"Hinton","year":"2006","journal-title":"Science"},{"key":"mlstab80b7bib23","article-title":"Dimensionality reduction methods for molecular simulations","author":"Doerr","year":"2017"},{"key":"mlstab80b7bib24","doi-asserted-by":"publisher","first-page":"1209","DOI":"10.1021\/acs.jctc.8b00975","article-title":"Encodermap: Dimensionality reduction and generation of molecule conformations","volume":"15","author":"Lemke","year":"2019","journal-title":"J. Chem. 
Theory Comput."},{"key":"mlstab80b7bib25","article-title":"Auto-encoding variational Bayes","author":"Kingma","year":"2013"},{"key":"mlstab80b7bib26","first-page":"50","article-title":"Discovering interpretable representations for both deep generative and discriminative models","author":"Adel","year":"2018"},{"key":"mlstab80b7bib27","doi-asserted-by":"publisher","DOI":"10.1063\/1.5011399","article-title":"Time-lagged autoencoders: Deep learning of slow collective variables for molecular kinetics","volume":"148","author":"Wehmeyer","year":"2018","journal-title":"J. Chem. Phys."},{"key":"mlstab80b7bib28","doi-asserted-by":"publisher","DOI":"10.1103\/PhysRevE.97.062412","article-title":"Variational encoding of complex dynamics","volume":"97","author":"Hern\u00e1ndez","year":"2018","journal-title":"Phys. Rev. E"},{"key":"mlstab80b7bib29","doi-asserted-by":"publisher","DOI":"10.1063\/1.5112048","article-title":"Capabilities and limitations of time-lagged autoencoders for slow mode discovery in dynamical systems","volume":"151","author":"Chen","year":"2019","journal-title":"J. Chem. Phys."},{"key":"mlstab80b7bib30","article-title":"Deep unsupervised clustering with gaussian mixture variational autoencoders","author":"Dilokthanakul","year":"2016"},{"key":"mlstab80b7bib31","doi-asserted-by":"publisher","first-page":"1","DOI":"10.1038\/s41467-017-02388-1","article-title":"VAMPnets for deep learning of molecular kinetics","volume":"9","author":"Mardt","year":"2018","journal-title":"Nat. Commun."},{"key":"mlstab80b7bib32","doi-asserted-by":"publisher","first-page":"4950","DOI":"10.1038\/s41467-018-07210-0","article-title":"Deep learning for universal linear embeddings of nonlinear dynamics","volume":"9","author":"Lusch","year":"2018","journal-title":"Nat. 
Commun."},{"key":"mlstab80b7bib33","author":""},{"key":"mlstab80b7bib34","author":"Shu","year":"2016"},{"key":"mlstab80b7bib35","first-page":"867","article-title":"Variational autoencoder with truncated mixture of gaussians for functional connectivity analysis","author":"Zhao","year":"2019"},{"key":"mlstab80b7bib36","article-title":"Approximate inference for deep latent gaussian mixtures","volume":"vol 2","author":"Nalisnick","year":"2016"},{"key":"mlstab80b7bib37","article-title":"Fixing Gaussian mixture VAEs for interpretable text generation","author":"Shi","year":"2019"},{"key":"mlstab80b7bib38","article-title":"TensorFlow: Large-scale machine learning on heterogeneous systems","author":"Abadi","year":"2015"},{"key":"mlstab80b7bib39","article-title":"Adam: A method for stochastic optimization","author":"Kingma","year":"2014"},{"key":"mlstab80b7bib40","author":"Bowman","year":"2014"},{"key":"mlstab80b7bib41","doi-asserted-by":"publisher","DOI":"10.1063\/1.3565032","article-title":"Markov models of molecular kinetics: Generation and validation","volume":"134","author":"Prinz","year":"2011","journal-title":"J. Chem. Phys."},{"key":"mlstab80b7bib42","doi-asserted-by":"publisher","first-page":"07B604_1","DOI":"10.1063\/1.4811489","article-title":"Identification of slow molecular order parameters for Markov model construction","volume":"139","author":"P\u00e9rez-Hern\u00e1ndez","year":"2013","journal-title":"J. Chem. Phys."},{"key":"mlstab80b7bib43","doi-asserted-by":"publisher","first-page":"147","DOI":"10.1007\/s11634-013-0134-6","article-title":"Fuzzy spectral clustering by PCCA+: Application to Markov state models and data classification","volume":"7","author":"R\u00f6blitz","year":"2013","journal-title":"Adv. Data Anal. 
Classif."},{"key":"mlstab80b7bib44","doi-asserted-by":"publisher","first-page":"5525","DOI":"10.1021\/acs.jctc.5b00743","article-title":"Pyemma 2: A software package for estimation, validation and analysis of markov models","volume":"11","author":"Scherer","year":"2015","journal-title":"J. Chem. Theory Comput."},{"key":"mlstab80b7bib45","doi-asserted-by":"publisher","first-page":"1963","DOI":"10.1063\/1.1731802","article-title":"On the theory of helix coil transition in polypeptides","volume":"34","author":"Lifson","year":"1961","journal-title":"J. Chem. Phys."},{"key":"mlstab80b7bib46","doi-asserted-by":"publisher","first-page":"1","DOI":"10.1039\/9781847558282-00001","article-title":"The a-helix as the simplest protein model: Helix-coil theory, stability and design","author":"Doig","year":"2008","journal-title":"Protein Folding, Misfolding and Aggregation (Cambridge, Royal Society of Chemistry)"},{"key":"mlstab80b7bib47","doi-asserted-by":"publisher","first-page":"484","DOI":"10.1186\/s12859-018-2507-5","article-title":"Deep clustering of protein folding simulations","volume":"19","author":"Bhowmik","year":"2018","journal-title":"BMC Bioinform."},{"key":"mlstab80b7bib48","doi-asserted-by":"publisher","first-page":"600","DOI":"10.1021\/ct5007357","article-title":"Modeling molecular kinetics with tICA and the kernel trick","volume":"11","author":"Schwantes","year":"2015","journal-title":"J. Chem. Theory Comput."},{"key":"mlstab80b7bib49","doi-asserted-by":"publisher","first-page":"75","DOI":"10.1007\/BF00547608","article-title":"Location of saddle points and minimum energy paths by a constrained simplex optimization procedure","volume":"53","author":"M\u00fcller","year":"1979","journal-title":"Theor. Chim. 
Acta."},{"key":"mlstab80b7bib50","doi-asserted-by":"publisher","DOI":"10.1063\/1.4976518","article-title":"Markov state models from short non-equilibrium simulations\u2014analysis and correction of estimation bias","volume":"146","author":"N\u00fcske","year":"2017","journal-title":"J. Chem. Phys."},{"key":"mlstab80b7bib51","doi-asserted-by":"publisher","first-page":"5571","DOI":"10.1021\/acs.jpclett.9b02012","article-title":"Deep representation learning for complex free-energy landscapes","volume":"10","author":"Zhang","year":"2019","journal-title":"J. Phys. Chem. Lett."},{"key":"mlstab80b7bib52","doi-asserted-by":"publisher","DOI":"10.1063\/1.5092521","article-title":"Nonlinear discovery of slow molecular modes using state-free reversible VAMPnets","volume":"150","author":"Chen","year":"2019","journal-title":"J. Chem. Phys."},{"key":"mlstab80b7bib53","author":"","year":"2019"},{"key":"mlstab80b7bib54","doi-asserted-by":"publisher","first-page":"06B620","DOI":"10.1063\/1.2945165","article-title":"Construction of the free energy landscape of biomolecules via dihedral angle principal component analysis","volume":"128","author":"Altis","year":"2008","journal-title":"J. Chem. Phys."},{"key":"mlstab80b7bib55","doi-asserted-by":"publisher","first-page":"04B616","DOI":"10.1063\/1.2714538","article-title":"Automatic discovery of metastable states for the construction of markov models of macromolecular conformational dynamics","volume":"126","author":"Chodera","year":"2007","journal-title":"J. Chem. Phys."},{"key":"mlstab80b7bib56","doi-asserted-by":"publisher","DOI":"10.1063\/1.5025125","article-title":"Structural-kinetic-thermodynamic relationships identified from physics-based molecular simulation models","volume":"148","author":"Rudzinski","year":"2018","journal-title":"J. Chem. 
Phys."},{"key":"mlstab80b7bib57","doi-asserted-by":"publisher","first-page":"21","DOI":"10.3390\/computation6010021","article-title":"The role of conformational entropy in the determination of structural-kinetic relationships for helix-coil transitions","volume":"6","author":"Rudzinski","year":"2018","journal-title":"Computation"},{"key":"mlstab80b7bib58","doi-asserted-by":"publisher","first-page":"4726","DOI":"10.1021\/acs.jctc.6b00503","article-title":"Using dimensionality reduction to systematically expand conformational sampling of intrinsically disordered peptides","volume":"12","author":"Kukharenko","year":"2016","journal-title":"J. Chem. Theory Comput."},{"key":"mlstab80b7bib59","doi-asserted-by":"publisher","first-page":"3810","DOI":"10.1021\/ct300077q","article-title":"Identifying metastable states of folding proteins","volume":"8","author":"Jain","year":"2012","journal-title":"J. Chem. Theory Comput."},{"key":"mlstab80b7bib60","first-page":"1","article-title":"Past\u2013future information bottleneck for sampling molecular reaction coordinate simultaneously with thermodynamics and kinetics","volume":"10","author":"Wang","year":"2019","journal-title":"Nat. 
Commun."}],"container-title":["Machine Learning: Science and Technology"],"original-title":[],"link":[{"URL":"https:\/\/iopscience.iop.org\/article\/10.1088\/2632-2153\/ab80b7","content-type":"text\/html","content-version":"am","intended-application":"text-mining"},{"URL":"https:\/\/iopscience.iop.org\/article\/10.1088\/2632-2153\/ab80b7\/pdf","content-type":"application\/pdf","content-version":"am","intended-application":"text-mining"},{"URL":"https:\/\/iopscience.iop.org\/article\/10.1088\/2632-2153\/ab80b7\/pdf","content-type":"application\/pdf","content-version":"vor","intended-application":"text-mining"},{"URL":"https:\/\/iopscience.iop.org\/article\/10.1088\/2632-2153\/ab80b7","content-type":"text\/html","content-version":"vor","intended-application":"text-mining"},{"URL":"https:\/\/iopscience.iop.org\/article\/10.1088\/2632-2153\/ab80b7","content-type":"text\/html","content-version":"vor","intended-application":"text-mining"},{"URL":"https:\/\/iopscience.iop.org\/article\/10.1088\/2632-2153\/ab80b7\/pdf","content-type":"application\/pdf","content-version":"vor","intended-application":"text-mining"},{"URL":"https:\/\/iopscience.iop.org\/article\/10.1088\/2632-2153\/ab80b7\/pdf","content-type":"application\/pdf","content-version":"am","intended-application":"syndication"},{"URL":"https:\/\/iopscience.iop.org\/article\/10.1088\/2632-2153\/ab80b7\/pdf","content-type":"application\/pdf","content-version":"vor","intended-application":"syndication"},{"URL":"https:\/\/iopscience.iop.org\/article\/10.1088\/2632-2153\/ab80b7\/pdf","content-type":"application\/pdf","content-version":"am","intended-application":"similarity-checking"},{"URL":"https:\/\/iopscience.iop.org\/article\/10.1088\/2632-2153\/ab80b7\/pdf","content-type":"application\/pdf","content-version":"vor","intended-application":"similarity-checking"},{"URL":"https:\/\/iopscience.iop.org\/article\/10.1088\/2632-2153\/ab80b7","content-type":"text\/html","content-version":"vor","intended-application":"similari
ty-checking"},{"URL":"https:\/\/iopscience.iop.org\/article\/10.1088\/2632-2153\/ab80b7\/pdf","content-type":"application\/pdf","content-version":"vor","intended-application":"similarity-checking"}],"deposited":{"date-parts":[[2022,1,15]],"date-time":"2022-01-15T04:45:43Z","timestamp":1642221943000},"score":1,"resource":{"primary":{"URL":"https:\/\/iopscience.iop.org\/article\/10.1088\/2632-2153\/ab80b7"}},"subtitle":[],"short-title":[],"issued":{"date-parts":[[2020,3,1]]},"references-count":60,"journal-issue":{"issue":"1","published-online":{"date-parts":[[2020,4,27]]},"published-print":{"date-parts":[[2020,3,1]]}},"URL":"https:\/\/doi.org\/10.1088\/2632-2153\/ab80b7","relation":{},"ISSN":["2632-2153"],"issn-type":[{"value":"2632-2153","type":"electronic"}],"subject":[],"published":{"date-parts":[[2020,3,1]]},"assertion":[{"value":"Interpretable embeddings from molecular simulations using Gaussian mixture variational autoencoders","name":"article_title","label":"Article Title"},{"value":"Machine Learning: Science and Technology","name":"journal_title","label":"Journal Title"},{"value":"paper","name":"article_type","label":"Article Type"},{"value":"\u00a9 2020 The Author(s). Published by IOP Publishing Ltd","name":"copyright_information","label":"Copyright Information"},{"value":"2019-12-19","name":"date_received","label":"Date Received","group":{"name":"publication_dates","label":"Publication dates"}},{"value":"2020-03-17","name":"date_accepted","label":"Date Accepted","group":{"name":"publication_dates","label":"Publication dates"}},{"value":"2020-04-27","name":"date_epub","label":"Online publication date","group":{"name":"publication_dates","label":"Publication dates"}}]}}