{"status":"ok","message-type":"work","message-version":"1.0.0","message":{"indexed":{"date-parts":[[2026,4,14]],"date-time":"2026-04-14T19:58:29Z","timestamp":1776196709686,"version":"3.50.1"},"reference-count":80,"publisher":"IOP Publishing","issue":"2","license":[{"start":{"date-parts":[[2023,6,29]],"date-time":"2023-06-29T00:00:00Z","timestamp":1687996800000},"content-version":"vor","delay-in-days":28,"URL":"http:\/\/creativecommons.org\/licenses\/by\/4.0"},{"start":{"date-parts":[[2023,6,29]],"date-time":"2023-06-29T00:00:00Z","timestamp":1687996800000},"content-version":"tdm","delay-in-days":28,"URL":"https:\/\/iopscience.iop.org\/info\/page\/text-and-data-mining"}],"content-domain":{"domain":["iopscience.iop.org"],"crossmark-restriction":false},"short-container-title":["Mach. Learn.: Sci. Technol."],"published-print":{"date-parts":[[2023,6,1]]},"abstract":"<jats:title>Abstract<\/jats:title>\n               <jats:p>Automated computational analysis of the vast chemical space is critical for numerous fields of research such as drug discovery and materials science. Representation learning techniques have recently been employed with the primary objective of generating compact and informative numerical expressions of complex data, for efficient usage in subsequent prediction tasks. One approach to efficiently learn molecular representations is processing string-based notations of chemicals via natural language processing algorithms. The majority of the methods proposed so far use the SMILES notation for this purpose, which is the most extensively used string-based encoding for molecules. However, SMILES is associated with numerous problems related to validity and robustness, which may prevent the model from effectively uncovering the knowledge hidden in the data. 
In this study, we propose SELFormer, a transformer architecture-based chemical language model (CLM) that uses a 100% valid, compact and expressive notation, SELFIES, as input to learn flexible and high-quality molecular representations. SELFormer is pre-trained on two million drug-like compounds and fine-tuned for diverse molecular property prediction tasks. Our performance evaluation revealed that SELFormer outperforms all competing methods, including graph learning-based approaches and SMILES-based CLMs, in predicting the aqueous solubility of molecules and adverse drug reactions, while producing comparable results on the remaining tasks. We also visualized molecular representations learned by SELFormer via dimensionality reduction, which indicated that even the pre-trained model can discriminate molecules with differing structural properties. We share SELFormer as a programmatic tool, together with its datasets and pre-trained models, at <jats:ext-link xmlns:xlink=\"http:\/\/www.w3.org\/1999\/xlink\" ext-link-type=\"uri\" xlink:href=\"https:\/\/github.com\/HUBioDataLab\/SELFormer\" xlink:type=\"simple\">https:\/\/github.com\/HUBioDataLab\/SELFormer<\/jats:ext-link>. 
Overall, our research demonstrates the benefit of using the SELFIES notations in the context of chemical language modeling and opens up new possibilities for the design and discovery of novel drug candidates with desired features.<\/jats:p>","DOI":"10.1088\/2632-2153\/acdb30","type":"journal-article","created":{"date-parts":[[2023,6,2]],"date-time":"2023-06-02T22:57:04Z","timestamp":1685746624000},"page":"025035","update-policy":"https:\/\/doi.org\/10.1088\/crossmark-policy","source":"Crossref","is-referenced-by-count":52,"title":["SELFormer: molecular representation learning via SELFIES language models"],"prefix":"10.1088","volume":"4","author":[{"given":"Atakan","family":"Y\u00fcksel","sequence":"first","affiliation":[]},{"given":"Erva","family":"Ulusoy","sequence":"additional","affiliation":[]},{"given":"Atabey","family":"\u00dcnl\u00fc","sequence":"additional","affiliation":[]},{"ORCID":"https:\/\/orcid.org\/0000-0002-1298-9763","authenticated-orcid":true,"given":"Tunca","family":"Do\u011fan","sequence":"additional","affiliation":[]}],"member":"266","published-online":{"date-parts":[[2023,6,29]]},"reference":[{"key":"mlstacdb30bib1","article-title":"ChemBERTa-2: towards chemical foundation models","author":"Ahmad","year":"2022"},{"key":"mlstacdb30bib2","doi-asserted-by":"publisher","first-page":"13","DOI":"10.3390\/asi5010013","article-title":"A novel machine learning approach for sentiment analysis on twitter incorporating the universal language model fine-tuning and SVM","volume":"5","author":"AlBadani","year":"2022","journal-title":"Appl. Syst. Innov."},{"key":"mlstacdb30bib3","doi-asserted-by":"publisher","first-page":"64","DOI":"10.1080\/02648725.2021.1966920","article-title":"In-silico strategies to combat COVID-19: a comprehensive review","volume":"37","author":"Basu","year":"2021","journal-title":"Biotechnol. Genet. Eng. 
Rev."},{"key":"mlstacdb30bib4","doi-asserted-by":"publisher","first-page":"2887","DOI":"10.1021\/jm9602928","article-title":"The properties of known drugs. 1. Molecular frameworks","volume":"39","author":"Bemis","year":"1996","journal-title":"J. Med. Chem."},{"key":"mlstacdb30bib5","doi-asserted-by":"publisher","first-page":"1798","DOI":"10.1109\/TPAMI.2013.50","article-title":"Representation learning: a review and new perspectives","volume":"35","author":"Bengio","year":"2013","journal-title":"IEEE Trans. Pattern Anal. Mach. Intell."},{"key":"mlstacdb30bib6","doi-asserted-by":"publisher","first-page":"185","DOI":"10.1016\/j.ijpharm.2018.01.044","article-title":"Computational prediction of drug solubility in water-based systems: qualitative and quantitative approaches used in the current drug discovery and development setting","volume":"540","author":"Bergstr\u00f6m","year":"2018","journal-title":"Int. J. Pharm."},{"key":"mlstacdb30bib7","doi-asserted-by":"publisher","first-page":"432","DOI":"10.1038\/s42256-023-00639-z","article-title":"Regression Transformer enables concurrent sequence regression and generation for molecular language modelling","volume":"5","author":"Born","year":"2023","journal-title":"Nat. Mach. Intell."},{"key":"mlstacdb30bib8","doi-asserted-by":"publisher","DOI":"10.1039\/D2DD00099G","article-title":"Chemical representation learning for toxicity prediction","author":"Born","year":"2023","journal-title":"Digit. Discovery"},{"key":"mlstacdb30bib9","doi-asserted-by":"publisher","first-page":"1096","DOI":"10.1021\/acs.jcim.8b00839","article-title":"GuacaMol: benchmarking models for de novo molecular design","volume":"59","author":"Brown","year":"2019","journal-title":"J. Chem. Inf. 
Model."},{"key":"mlstacdb30bib10","doi-asserted-by":"publisher","first-page":"bbac408","DOI":"10.1093\/bib\/bbac408","article-title":"FP-GNN: a versatile deep learning architecture for enhanced molecular property prediction","volume":"23","author":"Cai","year":"2022","journal-title":"Brief. Bioinformatics"},{"key":"mlstacdb30bib11","doi-asserted-by":"publisher","first-page":"3099","DOI":"10.1021\/ci300367a","article-title":"admetSAR: a comprehensive source and free tool for assessment of chemical ADMET properties","volume":"52","author":"Cheng","year":"2012","journal-title":"J. Chem. Inf. Model."},{"key":"mlstacdb30bib12","article-title":"ChemBERTa: large-scale self-supervised pretraining for molecular property prediction","author":"Chithrananda","year":"2020"},{"key":"mlstacdb30bib13","doi-asserted-by":"publisher","first-page":"8705","DOI":"10.1021\/acs.jmedchem.0c00385","article-title":"Learning molecular representations for medicinal chemistry: miniperspective","volume":"63","author":"Chuang","year":"2020","journal-title":"J. Med. Chem."},{"key":"mlstacdb30bib14","article-title":"Oral contraceptive pills","author":"Cooper","year":"2022"},{"key":"mlstacdb30bib15","doi-asserted-by":"publisher","first-page":"1000","DOI":"10.1021\/ci034243x","article-title":"ESOL: estimating aqueous solubility directly from molecular structure","volume":"44","author":"Delaney","year":"2004","journal-title":"J. Chem. Inf. Comput. Sci."},{"key":"mlstacdb30bib16","article-title":"BERT: pre-training of deep bidirectional transformers for language understanding","author":"Devlin","year":"2018"},{"key":"mlstacdb30bib17","doi-asserted-by":"publisher","first-page":"265","DOI":"10.1093\/jac\/dkx351","article-title":"Metronidazole: an update on metabolism, structure-cytotoxicity and resistance mechanisms","volume":"73","author":"Dingsdag","year":"2018","journal-title":"J. Antimicrob. 
Chemother."},{"key":"mlstacdb30bib18","doi-asserted-by":"crossref","first-page":"e5298","DOI":"10.7717\/peerj.5298","article-title":"HPO2GO: prediction of human phenotype ontology term associations for proteins using cross ontology annotation co-occurrences","volume":"6","author":"Do\u011fan","year":"2018","journal-title":"PeerJ"},{"key":"mlstacdb30bib19","doi-asserted-by":"publisher","first-page":"e96","DOI":"10.1093\/nar\/gkab543","article-title":"CROssBAR: comprehensive resource of biomedical relations with knowledge graph representations","volume":"49","author":"Do\u011fan","year":"2021","journal-title":"Nucleic Acids Res."},{"key":"mlstacdb30bib20","doi-asserted-by":"publisher","first-page":"42","DOI":"10.1109\/MSP.2021.3134634","article-title":"Self-supervised representation learning: introduction, advances, and challenges","volume":"39","author":"Ericsson","year":"2022","journal-title":"IEEE Signal Process. Mag."},{"key":"mlstacdb30bib21","article-title":"Molecular representation learning with language models and domain-relevant auxiliary tasks","author":"Fabian","year":"2020"},{"key":"mlstacdb30bib22","doi-asserted-by":"publisher","first-page":"127","DOI":"10.1038\/s42256-021-00438-4","article-title":"Geometry-enhanced molecular representation learning for property prediction","volume":"4","author":"Fang","year":"2022","journal-title":"Nat. Mach. 
Intell."},{"key":"mlstacdb30bib23","doi-asserted-by":"publisher","DOI":"10.26434\/chemrxiv-2022-3s512","article-title":"Neural scaling of deep chemical models","author":"Frey","year":"2022"},{"key":"mlstacdb30bib24","article-title":"Directional message passing for molecular graphs","author":"Gasteiger","year":"2020"},{"key":"mlstacdb30bib25","doi-asserted-by":"publisher","first-page":"D945","DOI":"10.1093\/nar\/gkw1074","article-title":"The ChEMBL database in 2017","volume":"45","author":"Gaulton","year":"2017","journal-title":"Nucleic Acids Res."},{"key":"mlstacdb30bib26","first-page":"1263","article-title":"Neural message passing for quantum chemistry","author":"Gilmer","year":"2017"},{"key":"mlstacdb30bib27","doi-asserted-by":"publisher","first-page":"1","DOI":"10.1186\/s13321-021-00535-x","article-title":"Translating the InChI: adapting neural machine translation to predict IUPAC names from a chemical identifier","volume":"13","author":"Handsel","year":"2021","journal-title":"J. Cheminformatics"},{"key":"mlstacdb30bib28","doi-asserted-by":"publisher","first-page":"397","DOI":"10.26355\/eurrev_201901_16788","article-title":"Therapeutic uses of metronidazole and its side effects: an update","volume":"23","author":"Hern\u00e1ndez Ceruelos","year":"2019","journal-title":"Eur. Rev. Med. Pharmacol. Sci."},{"key":"mlstacdb30bib29","article-title":"Strategies for pre-training graph neural networks","author":"Hu","year":"2019"},{"key":"mlstacdb30bib30","doi-asserted-by":"publisher","first-page":"1757","DOI":"10.1021\/ci3001277","article-title":"ZINC: a free tool to discover chemistry for biology","volume":"52","author":"Irwin","year":"2012","journal-title":"J. Chem. Inf. Model."},{"key":"mlstacdb30bib31","doi-asserted-by":"publisher","DOI":"10.1088\/2632-2153\/ac3ffb","article-title":"Chemformer: a pre-trained transformer for computational chemistry","volume":"3","author":"Irwin","year":"2022","journal-title":"Mach. Learn.: Sci. 
Technol."},{"key":"mlstacdb30bib32","article-title":"Predicting organic reaction outcomes with Weisfeiler-Lehman network","volume":"vol 30","author":"Jin","year":"2017"},{"key":"mlstacdb30bib33","article-title":"Ammus: a survey of transformer-based pretrained models in natural language processing","author":"Kalyan","year":"2021"},{"key":"mlstacdb30bib34","doi-asserted-by":"publisher","first-page":"D1373","DOI":"10.1093\/nar\/gkac956","article-title":"PubChem 2023 update","volume":"51","author":"Kim","year":"2023","journal-title":"Nucleic Acids Res."},{"key":"mlstacdb30bib35","doi-asserted-by":"publisher","DOI":"10.1016\/j.patter.2021.100198","article-title":"Latent representation learning in biology and translational medicine","volume":"2","author":"Kopf","year":"2021","journal-title":"Patterns"},{"key":"mlstacdb30bib36","doi-asserted-by":"publisher","DOI":"10.1016\/j.patter.2022.100588","article-title":"SELFIES and the future of molecular string representations","volume":"3","author":"Krenn","year":"2022","journal-title":"Patterns"},{"key":"mlstacdb30bib37","doi-asserted-by":"publisher","DOI":"10.1088\/2632-2153\/aba947","article-title":"Self-referencing embedded strings (SELFIES): a 100% robust molecular string representation","volume":"1","author":"Krenn","year":"2020","journal-title":"Mach. Learn.: Sci. 
Technol."},{"key":"mlstacdb30bib38","doi-asserted-by":"publisher","first-page":"D1075","DOI":"10.1093\/nar\/gkv1075","article-title":"The SIDER database of drugs and side effects","volume":"44","author":"Kuhn","year":"2016","journal-title":"Nucleic Acids Res."},{"key":"mlstacdb30bib39","article-title":"Bart: denoising sequence-to-sequence pre-training for natural language generation, translation, and comprehension","author":"Lewis","year":"2019"},{"key":"mlstacdb30bib40","doi-asserted-by":"crossref","DOI":"10.1145\/3534678.3539426","article-title":"Kpgt: knowledge-guided pre-training of graph transformer for molecular property prediction","author":"Li","year":"2022c"},{"key":"mlstacdb30bib41","doi-asserted-by":"publisher","first-page":"4541","DOI":"10.1609\/aaai.v36i4.20377","article-title":"Geomgcl: geometric graph contrastive learning for molecular property prediction","volume":"36","author":"Li","year":"2022a","journal-title":"Proc. AAAI Conf. Artif. Intell."},{"key":"mlstacdb30bib42","doi-asserted-by":"publisher","DOI":"10.1016\/j.drudis.2022.103373","article-title":"Deep learning methods for molecular representation and property prediction","volume":"27","author":"Li","year":"2022b","journal-title":"Drug Discov. 
Today"},{"key":"mlstacdb30bib43","doi-asserted-by":"publisher","first-page":"111","DOI":"10.1016\/j.aiopen.2022.10.001","article-title":"A survey of transformers","volume":"3","author":"Lin","year":"2022","journal-title":"AI Open"},{"key":"mlstacdb30bib44","article-title":"Multi-modal molecule structure-text model for text-based retrieval and editing","author":"Liu","year":"2022"},{"key":"mlstacdb30bib45","article-title":"Pre-training molecular graph representation with 3d geometry","author":"Liu","year":"2021"},{"key":"mlstacdb30bib46","article-title":"RoBERTa: a robustly optimized BERT pretraining approach","author":"Liu","year":"2019"},{"key":"mlstacdb30bib47","doi-asserted-by":"publisher","first-page":"1052","DOI":"10.1609\/aaai.v33i01.33011052","article-title":"Molecular property prediction: a multilevel quantum interactions modeling perspective","volume":"33","author":"Lu","year":"2019","journal-title":"Proc. AAAI Conf. Artif. Intell."},{"key":"mlstacdb30bib48","doi-asserted-by":"publisher","first-page":"1686","DOI":"10.1021\/ci300124c","article-title":"A Bayesian approach to in silico blood-brain barrier penetration modeling","volume":"52","author":"Martins","year":"2012","journal-title":"J. Chem. Inf. Model."},{"key":"mlstacdb30bib49","article-title":"Umap: uniform manifold approximation and projection for dimension reduction","author":"McInnes","year":"2018"},{"key":"mlstacdb30bib50","author":"","year":"n.d."},{"key":"mlstacdb30bib51","doi-asserted-by":"publisher","first-page":"711","DOI":"10.1007\/s10822-014-9747-x","article-title":"FreeSolv: a database of experimental and calculated hydration free energies, with input files","volume":"28","author":"Mobley","year":"2014","journal-title":"J. Comput.-Aided Mol. 
Des."},{"key":"mlstacdb30bib52","doi-asserted-by":"publisher","first-page":"4602","DOI":"10.1609\/aaai.v33i01.33014602","article-title":"Weisfeiler and Leman go neural: higher-order graph neural networks","volume":"33","author":"Morris","year":"2019","journal-title":"Proc. AAAI Conf. Artif. Intell."},{"key":"mlstacdb30bib53","doi-asserted-by":"publisher","first-page":"7079","DOI":"10.1039\/D1SC00231G","article-title":"Beyond generative models: superfast traversal, optimization, novelty, exploration and discovery (STONED) algorithm for molecules using SELFIES","volume":"12","author":"Nigam","year":"2021","journal-title":"Chem. Sci."},{"key":"mlstacdb30bib54","article-title":"Representation learning with contrastive predictive coding","author":"Oord","year":"2018"},{"key":"mlstacdb30bib55","author":"Radford","year":"2019"},{"key":"mlstacdb30bib56","doi-asserted-by":"publisher","first-page":"1256","DOI":"10.1038\/s42256-022-00580-7","article-title":"Large-scale chemical language representations capture molecular structure and properties","volume":"4","author":"Ross","year":"2022","journal-title":"Nat. Mach. 
Intell."},{"key":"mlstacdb30bib57","doi-asserted-by":"publisher","first-page":"II42","DOI":"10.1161\/01.HYP.11.3_Pt_2.II42","article-title":"Side effects of calcium channel blockers","volume":"11","author":"Russell","year":"1988","journal-title":"Hypertension"},{"key":"mlstacdb30bib58","doi-asserted-by":"publisher","first-page":"32","DOI":"10.1186\/s12987-017-0080-3","article-title":"The opioid epidemic: a central role for the blood brain barrier in opioid analgesia and abuse","volume":"14","author":"Schaefer","year":"2017","journal-title":"Fluids Barriers CNS"},{"key":"mlstacdb30bib59","article-title":"Schnet: a continuous-filter convolutional neural network for modeling quantum interactions","volume":"vol 30","author":"Sch\u00fctt","year":"2017"},{"key":"mlstacdb30bib60","article-title":"Roformer: enhanced transformer with rotary position embedding","author":"Su","year":"2021"},{"key":"mlstacdb30bib61","doi-asserted-by":"publisher","first-page":"1936","DOI":"10.1021\/acs.jcim.6b00290","article-title":"Computational modeling of \u03b2-secretase 1 (BACE-1) inhibitors using ligand based approaches","volume":"56","author":"Subramanian","year":"2016","journal-title":"J. Chem. Inf. Model."},{"key":"mlstacdb30bib62","doi-asserted-by":"publisher","first-page":"1","DOI":"10.1145\/3530811","article-title":"Efficient transformers: a survey","volume":"55","author":"Tay","year":"2022","journal-title":"ACM Comput. Surv."},{"key":"mlstacdb30bib63","article-title":"AIDS antiviral screen data\u2014NCI DTP data\u2014NCI Wiki","author":"","year":"n.d."},{"key":"mlstacdb30bib64","doi-asserted-by":"publisher","first-page":"227","DOI":"10.1038\/s42256-022-00457-9","article-title":"Learning functional properties of proteins with language models","volume":"4","author":"Unsal","year":"2022","journal-title":"Nat. Mach. 
Intell."},{"key":"mlstacdb30bib65","doi-asserted-by":"publisher","first-page":"463","DOI":"10.1038\/s41573-019-0024-5","article-title":"Applications of machine learning in drug discovery and development","volume":"18","author":"Vamathevan","year":"2019","journal-title":"Nat. Rev. Drug Discovery"},{"key":"mlstacdb30bib66","article-title":"Attention is all you need","volume":"vol 30","author":"Vaswani","year":"2017"},{"key":"mlstacdb30bib67","doi-asserted-by":"publisher","first-page":"1395","DOI":"10.1021\/ci700096r","article-title":"Development of reliable aqueous solubility models and their application in druglike analysis","volume":"47","author":"Wang","year":"2007","journal-title":"J. Chem. Inf. Model."},{"key":"mlstacdb30bib68","doi-asserted-by":"publisher","DOI":"10.1016\/j.conengprac.2020.104458","article-title":"Review on deep learning techniques for marine object recognition: architectures and algorithms","volume":"118","author":"Wang","year":"2022a","journal-title":"Control Eng. Pract."},{"key":"mlstacdb30bib69","doi-asserted-by":"publisher","first-page":"2977","DOI":"10.1021\/jm030580l","article-title":"The PDBbind database: collection of binding affinities for protein-ligand complexes with known three-dimensional structures","volume":"47","author":"Wang","year":"2004","journal-title":"J. Med. Chem."},{"key":"mlstacdb30bib70","doi-asserted-by":"publisher","first-page":"429","DOI":"10.1016\/j.isatra.2019.06.007","article-title":"SMILES-BERT: large scale unsupervised pre-training for molecular property prediction","author":"Wang","year":"2019"},{"key":"mlstacdb30bib71","doi-asserted-by":"publisher","first-page":"279","DOI":"10.1038\/s42256-022-00447-x","article-title":"Molecular contrastive learning of representations via graph neural networks","volume":"4","author":"Wang","year":"2022b","journal-title":"Nat. Mach. 
Intell."},{"key":"mlstacdb30bib72","doi-asserted-by":"publisher","first-page":"btad085","DOI":"10.1093\/bioinformatics\/btad085","article-title":"Multimodal representation learning for predicting molecule\u2013disease relations","volume":"39","author":"Wen","year":"2023","journal-title":"Bioinformatics"},{"key":"mlstacdb30bib73","doi-asserted-by":"publisher","first-page":"e1603","DOI":"10.1002\/wcms.1603","article-title":"A review of molecular representation in the age of machine learning","volume":"12","author":"Wigh","year":"2022","journal-title":"Wiley Interdiscip. Rev.-Comput. Mol. Sci."},{"key":"mlstacdb30bib74","article-title":"Huggingface\u2019s transformers: state-of-the-art natural language processing","author":"Wolf","year":"2019"},{"key":"mlstacdb30bib75","doi-asserted-by":"publisher","first-page":"513","DOI":"10.1039\/C7SC02664A","article-title":"MoleculeNet: a benchmark for molecular machine learning","volume":"9","author":"Wu","year":"2018","journal-title":"Chem. Sci."},{"key":"mlstacdb30bib76","article-title":"How powerful are graph neural networks?","author":"Xu","year":"2018"},{"key":"mlstacdb30bib77","doi-asserted-by":"publisher","DOI":"10.1101\/2020.12.23.424259","article-title":"X-MOL: large-scale pre-training for molecular understanding and diverse molecular analysis","author":"Xue","year":"2020"},{"key":"mlstacdb30bib78","doi-asserted-by":"publisher","first-page":"3370","DOI":"10.1021\/acs.jcim.9b00237","article-title":"Analyzing learned molecular representations for property prediction","volume":"59","author":"Yang","year":"2019","journal-title":"J. Chem. Inf. 
Model."},{"key":"mlstacdb30bib79","article-title":"SS-GNN: a simple-structured graph neural network for affinity prediction","author":"Zhang","year":"2022"},{"key":"mlstacdb30bib80","doi-asserted-by":"publisher","first-page":"15956","DOI":"10.1021\/acsomega.9b01997","article-title":"Onionnet: a multiple-layer intermolecular-contact-based convolutional neural network for protein\u2013ligand binding affinity prediction","volume":"4","author":"Zheng","year":"2019","journal-title":"ACS Omega"}],"container-title":["Machine Learning: Science and Technology"],"original-title":[],"link":[{"URL":"https:\/\/iopscience.iop.org\/article\/10.1088\/2632-2153\/acdb30","content-type":"text\/html","content-version":"am","intended-application":"text-mining"},{"URL":"https:\/\/iopscience.iop.org\/article\/10.1088\/2632-2153\/acdb30\/pdf","content-type":"application\/pdf","content-version":"am","intended-application":"text-mining"},{"URL":"https:\/\/iopscience.iop.org\/article\/10.1088\/2632-2153\/acdb30","content-type":"text\/html","content-version":"vor","intended-application":"text-mining"},{"URL":"https:\/\/iopscience.iop.org\/article\/10.1088\/2632-2153\/acdb30\/pdf","content-type":"application\/pdf","content-version":"vor","intended-application":"text-mining"},{"URL":"https:\/\/iopscience.iop.org\/article\/10.1088\/2632-2153\/acdb30\/pdf","content-type":"application\/pdf","content-version":"am","intended-application":"syndication"},{"URL":"https:\/\/iopscience.iop.org\/article\/10.1088\/2632-2153\/acdb30\/pdf","content-type":"application\/pdf","content-version":"vor","intended-application":"syndication"},{"URL":"https:\/\/iopscience.iop.org\/article\/10.1088\/2632-2153\/acdb30\/pdf","content-type":"application\/pdf","content-version":"am","intended-application":"similarity-checking"},{"URL":"https:\/\/iopscience.iop.org\/article\/10.1088\/2632-2153\/acdb30\/pdf","content-type":"application\/pdf","content-version":"vor","intended-application":"similarity-checking"}],"deposited":{"
date-parts":[[2023,6,29]],"date-time":"2023-06-29T13:36:11Z","timestamp":1688045771000},"score":1,"resource":{"primary":{"URL":"https:\/\/iopscience.iop.org\/article\/10.1088\/2632-2153\/acdb30"}},"subtitle":[],"short-title":[],"issued":{"date-parts":[[2023,6,1]]},"references-count":80,"journal-issue":{"issue":"2","published-online":{"date-parts":[[2023,6,29]]},"published-print":{"date-parts":[[2023,6,1]]}},"URL":"https:\/\/doi.org\/10.1088\/2632-2153\/acdb30","relation":{},"ISSN":["2632-2153"],"issn-type":[{"value":"2632-2153","type":"electronic"}],"subject":[],"published":{"date-parts":[[2023,6,1]]},"assertion":[{"value":"SELFormer: molecular representation learning via SELFIES language models","name":"article_title","label":"Article Title"},{"value":"Machine Learning: Science and Technology","name":"journal_title","label":"Journal Title"},{"value":"paper","name":"article_type","label":"Article Type"},{"value":"\u00a9 2023 The Author(s). Published by IOP Publishing Ltd","name":"copyright_information","label":"Copyright Information"},{"value":"2023-04-01","name":"date_received","label":"Date Received","group":{"name":"publication_dates","label":"Publication dates"}},{"value":"2023-06-02","name":"date_accepted","label":"Date Accepted","group":{"name":"publication_dates","label":"Publication dates"}},{"value":"2023-06-29","name":"date_epub","label":"Online publication date","group":{"name":"publication_dates","label":"Publication dates"}}]}}