{"status":"ok","message-type":"work","message-version":"1.0.0","message":{"indexed":{"date-parts":[[2026,2,27]],"date-time":"2026-02-27T06:20:06Z","timestamp":1772173206824,"version":"3.50.1"},"update-to":[{"DOI":"10.1371\/journal.pcbi.1010561","type":"new_version","label":"New version","source":"publisher","updated":{"date-parts":[[2022,10,11]],"date-time":"2022-10-11T00:00:00Z","timestamp":1665446400000}}],"reference-count":56,"publisher":"Public Library of Science (PLoS)","issue":"9","license":[{"start":{"date-parts":[[2022,9,29]],"date-time":"2022-09-29T00:00:00Z","timestamp":1664409600000},"content-version":"vor","delay-in-days":0,"URL":"http:\/\/creativecommons.org\/licenses\/by\/4.0\/"}],"funder":[{"DOI":"10.13039\/100000001","name":"National Science Foundation","doi-asserted-by":"publisher","award":["2155095 and TG-BIO210009"],"award-info":[{"award-number":["2155095 and TG-BIO210009"]}],"id":[{"id":"10.13039\/100000001","id-type":"DOI","asserted-by":"publisher"}]},{"DOI":"10.13039\/501100001665","name":"Agence Nationale de la Recherche","doi-asserted-by":"publisher","award":["RBMPro CE30-0021-01 and ANR-19 Decrypted CE30-0021-01"],"award-info":[{"award-number":["RBMPro CE30-0021-01 and ANR-19 Decrypted CE30-0021-01"]}],"id":[{"id":"10.13039\/501100001665","id-type":"DOI","asserted-by":"publisher"}]},{"name":"European Union\u2019s Horizon 2020 research and innovation programme under the Marie Sklodowska-Curie grant agreement","award":["101026293"],"award-info":[{"award-number":["101026293"]}]}],"content-domain":{"domain":["www.ploscompbiol.org"],"crossmark-restriction":false},"short-container-title":["PLoS Comput Biol"],"abstract":"<jats:p>Selection protocols such as SELEX, where molecules are selected over multiple rounds for their ability to bind to a target of interest, are popular methods for obtaining binders for diagnostic and therapeutic purposes. 
We show that Restricted Boltzmann Machines (RBMs), an unsupervised two-layer neural network architecture, can successfully be trained on sequence ensembles from single rounds of SELEX experiments for thrombin aptamers. RBMs assign scores to sequences that can be directly related to their fitnesses estimated through experimental enrichment ratios. Hence, RBMs trained from sequence data at a given round can be used to predict the effects of selection at later rounds. Moreover, the parameters of the trained RBMs are interpretable and identify functional features contributing most to sequence fitness. To exploit the generative capabilities of RBMs, we introduce two different training protocols: one taking into account sequence counts, capable of identifying the few best binders, and another based on unique sequences only, generating more diverse binders. We then use RBM models to generate novel aptamers with putative disruptive mutations or good binding properties, and validate the generated sequences with gel shift assay experiments. 
Finally, we compare the RBM\u2019s performance with different supervised learning approaches that include random forests and several deep neural network architectures.<\/jats:p>","DOI":"10.1371\/journal.pcbi.1010561","type":"journal-article","created":{"date-parts":[[2022,9,29]],"date-time":"2022-09-29T15:33:07Z","timestamp":1664465587000},"page":"e1010561","update-policy":"https:\/\/doi.org\/10.1371\/journal.pcbi.corrections_policy","source":"Crossref","is-referenced-by-count":31,"title":["Generative and interpretable machine learning for aptamer design and analysis of in vitro sequence selection"],"prefix":"10.1371","volume":"18","author":[{"ORCID":"https:\/\/orcid.org\/0000-0002-6085-7589","authenticated-orcid":true,"given":"Andrea","family":"Di Gioacchino","sequence":"first","affiliation":[]},{"given":"Jonah","family":"Procyk","sequence":"additional","affiliation":[]},{"ORCID":"https:\/\/orcid.org\/0000-0001-8838-4093","authenticated-orcid":true,"given":"Marco","family":"Molari","sequence":"additional","affiliation":[]},{"given":"John S.","family":"Schreck","sequence":"additional","affiliation":[]},{"given":"Yu","family":"Zhou","sequence":"additional","affiliation":[]},{"given":"Yan","family":"Liu","sequence":"additional","affiliation":[]},{"given":"R\u00e9mi","family":"Monasson","sequence":"additional","affiliation":[]},{"given":"Simona","family":"Cocco","sequence":"additional","affiliation":[]},{"ORCID":"https:\/\/orcid.org\/0000-0003-1565-6769","authenticated-orcid":true,"given":"Petr","family":"\u0160ulc","sequence":"additional","affiliation":[]}],"member":"340","published-online":{"date-parts":[[2022,9,29]]},"reference":[{"issue":"23","key":"pcbi.1010561.ref001","doi-asserted-by":"crossref","first-page":"6561","DOI":"10.1016\/j.bmcl.2009.10.032","article-title":"From selection to caged aptamers: identification of light-dependent ssDNA aptamers targeting cytohesin","volume":"19","author":"G Mayer","year":"2009","journal-title":"Bioorganic & medicinal 
chemistry letters"},{"issue":"18","key":"pcbi.1010561.ref002","doi-asserted-by":"crossref","first-page":"5459","DOI":"10.1002\/ange.201409597","article-title":"Selective Aptamer-Based Control of Intraneuronal Signaling","volume":"127","author":"S Lennarz","year":"2015","journal-title":"Angewandte Chemie"},{"issue":"5","key":"pcbi.1010561.ref003","doi-asserted-by":"crossref","first-page":"435","DOI":"10.1002\/cbic.201600491","article-title":"Activation of the glmS ribozyme confers bacterial growth inhibition","volume":"18","author":"A Sch\u00fcller","year":"2017","journal-title":"Chembiochem"},{"issue":"18","key":"pcbi.1010561.ref004","doi-asserted-by":"crossref","first-page":"10279","DOI":"10.1002\/anie.202100316","article-title":"A SARS-CoV-2 Spike Binding DNA Aptamer that Inhibits Pseudovirus Infection by an RBD-Independent Mechanism","volume":"60","author":"A Schmitz","year":"2021","journal-title":"Angewandte Chemie International Edition"},{"issue":"31","key":"pcbi.1010561.ref005","doi-asserted-by":"crossref","first-page":"10752","DOI":"10.1002\/anie.201903479","article-title":"A Receptor-Guided Design Strategy for Ligand Identification","volume":"58","author":"M Rosenthal","year":"2019","journal-title":"Angewandte Chemie International Edition"},{"key":"pcbi.1010561.ref006","article-title":"A synthetic RNA-based biosensor for fructose-1, 6-bisphosphate that reports glycolytic flux","author":"AD Ortega","year":"2021","journal-title":"Cell Chemical Biology"},{"issue":"50","key":"pcbi.1010561.ref007","doi-asserted-by":"crossref","first-page":"22600","DOI":"10.1002\/ange.202009240","article-title":"Aptamer-Mediated Reversible Transactivation of Gene Expression by Light","volume":"132","author":"C Renzl","year":"2020","journal-title":"Angewandte Chemie"},{"issue":"1","key":"pcbi.1010561.ref008","doi-asserted-by":"crossref","first-page":"1","DOI":"10.1038\/s41467-018-03631-z","article-title":"Poly-ligand profiling differentiates trastuzumab-treated breast cancer 
patients according to their outcomes","volume":"9","author":"V Domenyuk","year":"2018","journal-title":"Nature communications"},{"key":"pcbi.1010561.ref009","doi-asserted-by":"crossref","first-page":"53","DOI":"10.1016\/j.biochi.2017.10.007","article-title":"Systematic evaluation of cell-SELEX enriched aptamers binding to breast cancer cells","volume":"145","author":"L Civit","year":"2018","journal-title":"Biochimie"},{"issue":"8","key":"pcbi.1010561.ref010","doi-asserted-by":"crossref","first-page":"4013","DOI":"10.1093\/nar\/gkaa034","article-title":"ADAPT identifies an ESCRT complex composition that discriminates VCaP from LNCaP prostate cancer cell exosomes","volume":"48","author":"T Hornung","year":"2020","journal-title":"Nucleic acids research"},{"key":"pcbi.1010561.ref011","doi-asserted-by":"crossref","first-page":"e169","DOI":"10.1038\/mtna.2014.21","article-title":"Cell-type-specific, aptamer-functionalized agents for targeted disease therapy","volume":"3","author":"J Zhou","year":"2014","journal-title":"Molecular Therapy-Nucleic Acids"},{"issue":"4968","key":"pcbi.1010561.ref012","doi-asserted-by":"crossref","first-page":"505","DOI":"10.1126\/science.2200121","article-title":"Systematic evolution of ligands by exponential enrichment: RNA ligands to bacteriophage T4 DNA polymerase","volume":"249","author":"C Tuerk","year":"1990","journal-title":"Science"},{"issue":"6287","key":"pcbi.1010561.ref013","doi-asserted-by":"crossref","first-page":"818","DOI":"10.1038\/346818a0","article-title":"In vitro selection of RNA molecules that bind specific ligands","volume":"346","author":"AD Ellington","year":"1990","journal-title":"Nature"},{"key":"pcbi.1010561.ref014","doi-asserted-by":"crossref","first-page":"192","DOI":"10.1016\/j.omtn.2020.05.025","article-title":"Aptamers against live targets: is in vivo SELEX finally coming to the edge?","volume":"21","author":"M Sola","year":"2020","journal-title":"Molecular Therapy-Nucleic 
Acids"},{"issue":"4","key":"pcbi.1010561.ref015","doi-asserted-by":"crossref","first-page":"367","DOI":"10.1007\/s00253-005-0193-5","article-title":"Aptamers\u2014basic research, drug development, and clinical applications","volume":"69","author":"D Proske","year":"2005","journal-title":"Applied microbiology and biotechnology"},{"issue":"12","key":"pcbi.1010561.ref016","doi-asserted-by":"crossref","first-page":"4522","DOI":"10.3390\/ijms21124522","article-title":"Chemical modification of aptamers for increased binding affinity in diagnostic applications: Current status and future prospects","volume":"21","author":"JP Elskens","year":"2020","journal-title":"International Journal of Molecular Sciences"},{"issue":"4","key":"pcbi.1010561.ref017","doi-asserted-by":"crossref","first-page":"748","DOI":"10.1016\/j.drudis.2020.03.003","article-title":"Machine learning models for drug\u2013target interactions: current knowledge and future directions","volume":"25","author":"S D\u2019Souza","year":"2020","journal-title":"Drug Discovery Today"},{"issue":"7873","key":"pcbi.1010561.ref018","doi-asserted-by":"crossref","first-page":"583","DOI":"10.1038\/s41586-021-03819-2","article-title":"Highly accurate protein structure prediction with AlphaFold","volume":"596","author":"J Jumper","year":"2021","journal-title":"Nature"},{"issue":"6558","key":"pcbi.1010561.ref019","doi-asserted-by":"crossref","first-page":"1047","DOI":"10.1126\/science.abe5650","article-title":"Geometric deep learning of RNA structure","volume":"373","author":"RJ Townshend","year":"2021","journal-title":"Science"},{"key":"pcbi.1010561.ref020","article-title":"Machine learning directed drug formulation development","author":"P Bannigan","year":"2021","journal-title":"Advanced Drug Delivery Reviews"},{"issue":"12","key":"pcbi.1010561.ref021","doi-asserted-by":"crossref","first-page":"i215","DOI":"10.1093\/bioinformatics\/bts210","article-title":"Identification of sequence\u2013structure RNA binding motifs for 
SELEX-derived aptamers","volume":"28","author":"J Hoinka","year":"2012","journal-title":"Bioinformatics"},{"issue":"4","key":"pcbi.1010561.ref022","doi-asserted-by":"crossref","first-page":"3307","DOI":"10.1021\/acs.analchem.9b05203","article-title":"A sequential multidimensional analysis algorithm for aptamer identification based on structure analysis and machine learning","volume":"92","author":"J Song","year":"2019","journal-title":"Analytical chemistry"},{"key":"pcbi.1010561.ref023","doi-asserted-by":"crossref","first-page":"e230","DOI":"10.1038\/mtna.2015.4","article-title":"FASTAptamer: a bioinformatic toolkit for high-throughput sequence analysis of combinatorial selections","volume":"4","author":"KK Alam","year":"2015","journal-title":"Molecular Therapy-Nucleic Acids"},{"issue":"W1","key":"pcbi.1010561.ref024","doi-asserted-by":"crossref","first-page":"W39","DOI":"10.1093\/nar\/gkv416","article-title":"The MEME suite","volume":"43","author":"TL Bailey","year":"2015","journal-title":"Nucleic acids research"},{"issue":"18","key":"pcbi.1010561.ref025","doi-asserted-by":"crossref","first-page":"2665","DOI":"10.1093\/bioinformatics\/btu348","article-title":"MPBind: a Meta-motif-based statistical framework and pipeline to Predict Binding potential of SELEX-derived aptamers","volume":"30","author":"P Jiang","year":"2014","journal-title":"Bioinformatics"},{"issue":"12","key":"pcbi.1010561.ref026","doi-asserted-by":"crossref","first-page":"5939","DOI":"10.1021\/acs.jctc.5b00707","article-title":"Searching the Sequence Space for Potent Aptamers Using SELEX in Silico","volume":"11","author":"Q Zhou","year":"2015","journal-title":"Journal of Chemical Theory and Computation"},{"issue":"2","key":"pcbi.1010561.ref027","doi-asserted-by":"crossref","first-page":"407","DOI":"10.1021\/acs.jpclett.6b02769","article-title":"Exploring the Mutational Robustness of Nucleic Acids by Searching Genotype Neighborhoods in Sequence Space","volume":"8","author":"Q 
Zhou","year":"2017","journal-title":"The Journal of Physical Chemistry Letters"},{"issue":"14","key":"pcbi.1010561.ref028","doi-asserted-by":"crossref","first-page":"8167","DOI":"10.1093\/nar\/gkx540","article-title":"Analysis of in vitro evolution reveals the underlying distribution of catalytic activity among random sequences","volume":"45","author":"A Pressman","year":"2017","journal-title":"Nucleic Acids Research"},{"issue":"15","key":"pcbi.1010561.ref029","doi-asserted-by":"crossref","first-page":"6213","DOI":"10.1021\/jacs.8b13298","article-title":"Mapping a Systematic Ribozyme Fitness Landscape Reveals a Frustrated Evolutionary Network for Self-Aminoacylating RNA","volume":"141","author":"AD Pressman","year":"2019","journal-title":"Journal of the American Chemical Society"},{"key":"pcbi.1010561.ref030","first-page":"418459","article-title":"Inferring sequence-structure preferences of RNA-binding proteins with convolutional residual networks","author":"PK Koo","year":"2018","journal-title":"BioRxiv"},{"key":"pcbi.1010561.ref031","doi-asserted-by":"crossref","DOI":"10.3389\/fmolb.2021.673363","article-title":"Learning the regulatory code of gene expression","volume":"8","author":"J Zrimec","year":"2021","journal-title":"Frontiers in Molecular Biosciences"},{"key":"pcbi.1010561.ref032","doi-asserted-by":"crossref","first-page":"16","DOI":"10.1016\/j.coisb.2020.04.001","article-title":"Deep learning for inferring transcription factor binding sites","volume":"19","author":"PK Koo","year":"2020","journal-title":"Current opinion in systems biology"},{"issue":"6","key":"pcbi.1010561.ref033","doi-asserted-by":"crossref","first-page":"691","DOI":"10.1038\/s41587-020-00793-4","article-title":"Deep diversification of an AAV capsid protein by machine learning","volume":"39","author":"DH Bryant","year":"2021","journal-title":"Nature 
Biotechnology"},{"issue":"3","key":"pcbi.1010561.ref034","doi-asserted-by":"crossref","first-page":"032601","DOI":"10.1088\/1361-6633\/aa9965","article-title":"Inverse statistical physics of protein sequences: a key issues review","volume":"81","author":"S Cocco","year":"2018","journal-title":"Reports on Progress in Physics"},{"issue":"21","key":"pcbi.1010561.ref035","first-page":"10444","article-title":"Direct-Coupling Analysis of nucleotide coevolution facilitates RNA secondary and tertiary structure prediction","volume":"43","author":"E De Leonardis","year":"2015","journal-title":"Nucleic acids research"},{"issue":"49","key":"pcbi.1010561.ref036","doi-asserted-by":"crossref","first-page":"E1293","DOI":"10.1073\/pnas.1111471108","article-title":"Direct-coupling analysis of residue coevolution captures native contacts across many protein families","volume":"108","author":"F Morcos","year":"2011","journal-title":"Proceedings of the National Academy of Sciences"},{"issue":"6502","key":"pcbi.1010561.ref037","doi-asserted-by":"crossref","first-page":"440","DOI":"10.1126\/science.aba3304","article-title":"An evolution-based model for designing chorismate mutase enzymes","volume":"369","author":"WP Russ","year":"2020","journal-title":"Science"},{"issue":"1","key":"pcbi.1010561.ref038","first-page":"1","article-title":"Global pairwise RNA interaction landscapes reveal core features of protein recognition","volume":"9","author":"Q Zhou","year":"2018","journal-title":"Nature communications"},{"key":"pcbi.1010561.ref039","doi-asserted-by":"crossref","first-page":"e39397","DOI":"10.7554\/eLife.39397","article-title":"Learning protein constitutive motifs from sequence data","volume":"8","author":"J Tubiana","year":"2019","journal-title":"eLife"},{"issue":"2","key":"pcbi.1010561.ref040","doi-asserted-by":"crossref","first-page":"195","DOI":"10.1016\/j.cels.2020.11.005","article-title":"RBM-MHC: A Semi-Supervised Machine-Learning Method for Sample-Specific Prediction of Antigen 
Presentation by HLA-I Alleles","volume":"12","author":"B Bravi","year":"2021","journal-title":"Cell systems"},{"issue":"19","key":"pcbi.1010561.ref041","doi-asserted-by":"crossref","first-page":"2494","DOI":"10.1002\/cbic.201900265","article-title":"DNA-Nanoscaffold-Assisted Selection of Femtomolar Bivalent Human alpha-Thrombin Aptamers with Potent Anticoagulant Activity","volume":"20","author":"Y Zhou","year":"2019","journal-title":"ChemBioChem"},{"issue":"8","key":"pcbi.1010561.ref042","doi-asserted-by":"crossref","first-page":"1771","DOI":"10.1162\/089976602760128018","article-title":"Training Products of Experts by Minimizing Contrastive Divergence","volume":"14","author":"GE Hinton","year":"2002","journal-title":"Neural Computation"},{"key":"pcbi.1010561.ref043","doi-asserted-by":"crossref","unstructured":"Tieleman T. Training Restricted Boltzmann Machines Using Approximations to the Likelihood Gradient. In: Proceedings of the 25th International Conference on Machine Learning. ICML\u201908. New York, NY, USA: Association for Computing Machinery; 2008. p. 1064\u20131071. 
Available from: https:\/\/doi.org\/10.1145\/1390156.1390290.","DOI":"10.1145\/1390156.1390290"},{"issue":"4","key":"pcbi.1010561.ref044","doi-asserted-by":"crossref","first-page":"1283","DOI":"10.1103\/RevModPhys.83.1283","article-title":"Statistical genetics and evolution of quantitative traits","volume":"83","author":"RA Neher","year":"2011","journal-title":"Reviews of Modern Physics"},{"issue":"3","key":"pcbi.1010561.ref045","doi-asserted-by":"crossref","first-page":"655","DOI":"10.1093\/genetics\/111.3.655","article-title":"Limits of Adaptation: The Evolution of Selective Neutrality","volume":"111","author":"DL Hartl","year":"1985","journal-title":"Genetics"},{"issue":"24","key":"pcbi.1010561.ref046","doi-asserted-by":"crossref","first-page":"17651","DOI":"10.1016\/S0021-9258(17)46749-4","article-title":"The structure of alpha-thrombin inhibited by a 15-mer single-stranded DNA aptamer","volume":"268","author":"K Padmanabhan","year":"1993","journal-title":"Journal of Biological Chemistry"},{"issue":"1","key":"pcbi.1010561.ref047","doi-asserted-by":"crossref","first-page":"1","DOI":"10.1038\/s41598-018-29325-6","article-title":"Systematic evaluation of error rates and causes in short samples in next-generation sequencing","volume":"8","author":"F Pfeiffer","year":"2018","journal-title":"Scientific reports"},{"key":"pcbi.1010561.ref048","doi-asserted-by":"crossref","DOI":"10.1515\/9781400849383","volume-title":"Robustness and evolvability in living systems","author":"A Wagner","year":"2013"},{"issue":"1","key":"pcbi.1010561.ref049","doi-asserted-by":"crossref","first-page":"msab321","DOI":"10.1093\/molbev\/msab321","article-title":"Modeling sequence-space exploration and emergence of epistatic signals in protein evolution","volume":"39","author":"M Bisardi","year":"2022","journal-title":"Molecular biology and evolution"},{"key":"pcbi.1010561.ref050","first-page":"171","article-title":"Integrated analysis of RNA-binding protein complexes using in vitro selection 
and high-throughput sequencing and sequence specificity landscapes (SEQRS)","volume":"118","author":"TF Lou","year":"2017","journal-title":"Methods"},{"issue":"2","key":"pcbi.1010561.ref051","doi-asserted-by":"crossref","first-page":"e17","DOI":"10.1038\/jid.2013.521","article-title":"Antibody phage display: technique and applications","volume":"134","author":"CM Hammers","year":"2014","journal-title":"The Journal of investigative dermatology"},{"issue":"6","key":"pcbi.1010561.ref052","doi-asserted-by":"crossref","first-page":"598","DOI":"10.1016\/S0958-1669(02)00380-4","article-title":"Antibody discovery: phage display","volume":"13","author":"T Kretzschmar","year":"2002","journal-title":"Current opinion in biotechnology"},{"issue":"20","key":"pcbi.1010561.ref053","doi-asserted-by":"crossref","first-page":"10908","DOI":"10.3390\/ijms222010908","article-title":"AMaLa: Analysis of Directed Evolution Experiments via Annealed Mutational Approximated Landscape","volume":"22","author":"L Sesta","year":"2021","journal-title":"International journal of molecular sciences"},{"key":"pcbi.1010561.ref054","doi-asserted-by":"crossref","first-page":"138301","DOI":"10.1103\/PhysRevLett.118.138301","article-title":"Emergence of Compositional Representations in Restricted Boltzmann Machines","volume":"118","author":"J Tubiana","year":"2017","journal-title":"Phys Rev Lett"},{"issue":"3","key":"pcbi.1010561.ref055","doi-asserted-by":"crossref","first-page":"034109","DOI":"10.1103\/PhysRevE.104.034109","article-title":"Barriers and dynamical paths in alternating Gibbs sampling of restricted Boltzmann machines","volume":"104","author":"C Roussel","year":"2021","journal-title":"Physical Review E"},{"issue":"9","key":"pcbi.1010561.ref056","doi-asserted-by":"crossref","first-page":"e1007383","DOI":"10.1371\/journal.pcbi.1007383","article-title":"Why do G-quadruplexes dimerize through the 5\u2019-ends? 
Driving forces for G4 DNA dimerization examined in atomic detail","volume":"15","author":"M Kogut","year":"2019","journal-title":"PLoS computational biology"}],"updated-by":[{"DOI":"10.1371\/journal.pcbi.1010561","type":"new_version","label":"New version","source":"publisher","updated":{"date-parts":[[2022,10,11]],"date-time":"2022-10-11T00:00:00Z","timestamp":1665446400000}}],"container-title":["PLOS Computational Biology"],"original-title":[],"language":"en","link":[{"URL":"https:\/\/dx.plos.org\/10.1371\/journal.pcbi.1010561","content-type":"unspecified","content-version":"vor","intended-application":"similarity-checking"}],"deposited":{"date-parts":[[2022,10,11]],"date-time":"2022-10-11T13:29:16Z","timestamp":1665494956000},"score":1,"resource":{"primary":{"URL":"https:\/\/dx.plos.org\/10.1371\/journal.pcbi.1010561"}},"subtitle":[],"editor":[{"given":"Jinyan","family":"Li","sequence":"first","affiliation":[]}],"short-title":[],"issued":{"date-parts":[[2022,9,29]]},"references-count":56,"journal-issue":{"issue":"9","published-online":{"date-parts":[[2022,9,29]]}},"URL":"https:\/\/doi.org\/10.1371\/journal.pcbi.1010561","relation":{"has-preprint":[{"id-type":"doi","id":"10.1101\/2022.03.12.484094","asserted-by":"object"}]},"ISSN":["1553-7358"],"issn-type":[{"value":"1553-7358","type":"electronic"}],"subject":[],"published":{"date-parts":[[2022,9,29]]}}}