{"status":"ok","message-type":"work","message-version":"1.0.0","message":{"indexed":{"date-parts":[[2025,11,12]],"date-time":"2025-11-12T14:17:03Z","timestamp":1762957023828,"version":"3.37.3"},"reference-count":48,"publisher":"IOP Publishing","issue":"4","license":[{"start":{"date-parts":[[2022,10,31]],"date-time":"2022-10-31T00:00:00Z","timestamp":1667174400000},"content-version":"vor","delay-in-days":0,"URL":"http:\/\/creativecommons.org\/licenses\/by\/4.0"},{"start":{"date-parts":[[2022,10,31]],"date-time":"2022-10-31T00:00:00Z","timestamp":1667174400000},"content-version":"tdm","delay-in-days":0,"URL":"https:\/\/iopscience.iop.org\/info\/page\/text-and-data-mining"}],"funder":[{"name":"European Research council","award":["ERC2017-StG-757733"],"award-info":[{"award-number":["ERC2017-StG-757733"]}]}],"content-domain":{"domain":["iopscience.iop.org"],"crossmark-restriction":false},"short-container-title":["Mach. Learn.: Sci. Technol."],"published-print":{"date-parts":[[2022,12,1]]},"abstract":"<jats:title>Abstract<\/jats:title>\n               <jats:p>Transformer models have been developed in molecular science with excellent performance in applications including quantitative structure-activity relationship (QSAR) and virtual screening (VS). Compared with other types of models, however, they are large and need voluminous data for training, which results in a high hardware requirement to abridge time for both training and inference processes. In this work, cross-layer parameter sharing (CLPS), and knowledge distillation (KD) are used to reduce the sizes of transformers in molecular science. Both methods not only have competitive QSAR predictive performance as compared to the original BERT model, but also are more parameter efficient. Furthermore, by integrating CLPS and KD into a two-state chemical network, we introduce a new deep lite chemical transformer model, DeLiCaTe. DeLiCaTe accomplishes 4\u00d7 faster rate for training and inference, due to a 10- and 3-times reduction of the number of parameters and layers, respectively. Meanwhile, the integrated model achieves comparable performance in QSAR and VS, because of capturing general-domain (basic structure) and task-specific knowledge (specific property prediction). Moreover, we anticipate that the model compression strategy provides a pathway to the creation of effective generative transformer models for organic drugs and material design.<\/jats:p>","DOI":"10.1088\/2632-2153\/ac99ba","type":"journal-article","created":{"date-parts":[[2022,10,12]],"date-time":"2022-10-12T22:43:40Z","timestamp":1665614620000},"page":"045009","update-policy":"https:\/\/doi.org\/10.1088\/crossmark-policy","source":"Crossref","is-referenced-by-count":1,"title":["Chemical transformer compression for accelerating both training and inference of molecular modeling"],"prefix":"10.1088","volume":"3","author":[{"ORCID":"https:\/\/orcid.org\/0000-0001-8360-005X","authenticated-orcid":false,"given":"Yi","family":"Yu","sequence":"first","affiliation":[]},{"ORCID":"https:\/\/orcid.org\/0000-0001-8533-201X","authenticated-orcid":true,"given":"Karl","family":"B\u00f6rjesson","sequence":"additional","affiliation":[]}],"member":"266","published-online":{"date-parts":[[2022,10,31]]},"reference":[{"key":"mlstac99babib1","doi-asserted-by":"publisher","first-page":"9121","DOI":"10.1039\/D0CS01065K","article-title":"A critical overview of computational approaches employed for COVID-19 drug discovery","volume":"50","author":"Muratov","year":"2021","journal-title":"Chem. Soc. 
Rev."},{"key":"mlstac99babib2","doi-asserted-by":"publisher","first-page":"1790","DOI":"10.1093\/bib\/bbaa034","article-title":"Virtual screening web servers: designing chemical probes and drug candidates in the cyberspace","volume":"22","author":"Singh","year":"2020","journal-title":"Brief. Bioinform."},{"key":"mlstac99babib3","doi-asserted-by":"publisher","first-page":"463","DOI":"10.1038\/s41573-019-0024-5","article-title":"Applications of machine learning in drug discovery and development","volume":"18","author":"Vamathevan","year":"2019","journal-title":"Nat. Rev. Drug Discovery"},{"key":"mlstac99babib4","doi-asserted-by":"publisher","first-page":"211","DOI":"10.1038\/s42256-022-00463-x","article-title":"The transformational role of GPU computing and deep learning in drug discovery","volume":"4","author":"Pandey","year":"2022","journal-title":"Nat. Mach. Intell."},{"key":"mlstac99babib5","doi-asserted-by":"publisher","first-page":"360","DOI":"10.1126\/science.aat2663","article-title":"Inverse molecular design using machine learning: generative models for matter engineering","volume":"361","author":"Sanchez-Lengeling","year":"2018","journal-title":"Science"},{"key":"mlstac99babib6","doi-asserted-by":"publisher","first-page":"353","DOI":"10.1038\/s41573-019-0050-3","article-title":"Rethinking drug design in the artificial intelligence era","volume":"19","author":"Schneider","year":"2020","journal-title":"Nat. Rev. Drug Discovery"},{"key":"mlstac99babib7","doi-asserted-by":"publisher","DOI":"10.1088\/2632-2153\/aba947","article-title":"Self-referencing embedded strings (SELFIES): a 100% robust molecular string representation","volume":"1","author":"Krenn","year":"2020","journal-title":"Mach. Learn.: Sci. Technol."},{"key":"mlstac99babib8","doi-asserted-by":"publisher","first-page":"31","DOI":"10.1021\/ci00057a005","article-title":"SMILES, a chemical language and information system. 1. Introduction to methodology and encoding rules","volume":"28","author":"Weininger","year":"1988","journal-title":"J. Chem. Inf. Comput. Sci."},{"year":"2010","author":"James","key":"mlstac99babib9"},{"key":"mlstac99babib10","doi-asserted-by":"publisher","first-page":"595","DOI":"10.1007\/s10822-016-9938-8","article-title":"Molecular graph convolutions: moving beyond fingerprints","volume":"30","author":"Kearnes","year":"2016","journal-title":"J. Comput. Aided Mol. Des."},{"key":"mlstac99babib11","doi-asserted-by":"publisher","first-page":"1379","DOI":"10.1016\/j.chempr.2020.02.017","article-title":"A structure-based platform for predicting chemical reactivity","volume":"6","author":"Sandfort","year":"2020","journal-title":"Chem"},{"key":"mlstac99babib12","doi-asserted-by":"publisher","first-page":"1096","DOI":"10.1021\/acs.jcim.8b00839","article-title":"GuacaMol: benchmarking models for de novo molecular design","volume":"59","author":"Brown","year":"2019","journal-title":"J. Chem. Inf. Model."},{"key":"mlstac99babib13","doi-asserted-by":"publisher","first-page":"120","DOI":"10.1021\/acscentsci.7b00512","article-title":"Generating focused molecule libraries for drug discovery with recurrent neural networks","volume":"4","author":"Segler","year":"2018","journal-title":"ACS Cent. Sci."},{"key":"mlstac99babib14","doi-asserted-by":"publisher","first-page":"bbab544","DOI":"10.1093\/bib\/bbab544","article-title":"Comprehensive assessment of deep generative architectures for de novo drug design","volume":"23","author":"Wang","year":"2021","journal-title":"Brief. 
Bioinform."},{"article-title":"Attention is all you need","year":"2017","author":"Vaswani","key":"mlstac99babib15"},{"key":"mlstac99babib16","doi-asserted-by":"publisher","DOI":"10.1088\/2632-2153\/ac3ffb","article-title":"Chemformer: a pre-trained transformer for computational chemistry","volume":"3","author":"Irwin","year":"2022","journal-title":"Mach. Learn.: Sci. Technol."},{"key":"mlstac99babib17","doi-asserted-by":"publisher","first-page":"bbab152","DOI":"10.1093\/bib\/bbab152","article-title":"MG-BERT: leveraging unsupervised atomic representation learning for molecular property prediction","volume":"22","author":"Zhang","year":"2021","journal-title":"Brief. Bioinform."},{"key":"mlstac99babib18","doi-asserted-by":"publisher","first-page":"429","DOI":"10.1145\/3307339.3342186","article-title":"SMILES-BERT: large scale unsupervised pre-training for molecular property prediction","author":"Wang","year":"2019"},{"article-title":"Do large scale molecular language representations capture important structural information?","year":"2021","author":"Ross","key":"mlstac99babib19"},{"article-title":"Molecule attention transformer","year":"2020","author":"Maziarka","key":"mlstac99babib20"},{"article-title":"Chemberta: large-scale self-supervised pretraining for molecular property prediction","year":"2020","author":"Chithrananda","key":"mlstac99babib21"},{"article-title":"Molecular representation learning with language models and domain-relevant auxiliary tasks","year":"2020","author":"Fabian","key":"mlstac99babib22"},{"key":"mlstac99babib23","doi-asserted-by":"publisher","first-page":"26","DOI":"10.1186\/s13321-021-00497-0","article-title":"Molecular optimization by capturing chemist\u2019s intuition using deep neural networks","volume":"13","author":"He","year":"2021","journal-title":"J. Cheminformatics"},{"key":"mlstac99babib24","doi-asserted-by":"publisher","first-page":"899","DOI":"10.1016\/j.scib.2022.01.029","article-title":"X-MOL: large-scale pre-training for molecular understanding and diverse molecular analysis","volume":"67","author":"Xue","year":"2022","journal-title":"Sci. 
Bull."},{"article-title":"DistilBERT, a distilled version of BERT: smaller, faster, cheaper and lighter","year":"2019","author":"Sanh","key":"mlstac99babib25"},{"article-title":"Tinybert: distilling bert for natural language understanding","year":"2019","author":"Jiao","key":"mlstac99babib26"},{"article-title":"Universal transformers","year":"2018","author":"Dehghani","key":"mlstac99babib27"},{"article-title":"Albert: a lite bert for self-supervised learning of language representations","year":"2019","author":"Lan","key":"mlstac99babib28"},{"article-title":"Efficient vision transformers via fine-grained manifold distillation","year":"2021","author":"Jia","key":"mlstac99babib29"},{"key":"mlstac99babib30","doi-asserted-by":"crossref","DOI":"10.1109\/WF-IoT48130.2020.9221198","article-title":"A survey of methods for low-power deep learning and computer vision","author":"Goel","year":"2020"},{"article-title":"Distilling task-specific knowledge from bert into simple neural networks","year":"2019","author":"Tang","key":"mlstac99babib31"},{"key":"mlstac99babib32","doi-asserted-by":"crossref","DOI":"10.18653\/v1\/D19-1441","article-title":"Patient knowledge distillation for bert model compression","author":"Sun","year":"2019"},{"article-title":"Minilm: deep self-attention distillation for task-agnostic compression of pre-trained transformers","year":"2020","author":"Wang","key":"mlstac99babib33"},{"key":"mlstac99babib34","doi-asserted-by":"crossref","DOI":"10.18653\/v1\/2020.acl-main.195","article-title":"Mobilebert: a compact task-agnostic bert for resource-limited devices","author":"Sun","year":"2020"},{"article-title":"Distilling the knowledge in a neural network","year":"2015","author":"Hinton","key":"mlstac99babib35"},{"key":"mlstac99babib36","doi-asserted-by":"publisher","first-page":"D945","DOI":"10.1093\/nar\/gkw1074","article-title":"The ChEMBL database in 2017","volume":"45","author":"Gaulton","year":"2016","journal-title":"Nucleic Acids Res."},{"year":"2020","author":"Landrum","key":"mlstac99babib37"},{"key":"mlstac99babib38","doi-asserted-by":"publisher","first-page":"1692","DOI":"10.1039\/C8SC04175J","article-title":"Learning continuous and data-driven molecular descriptors by translating equivalent chemical representations","volume":"10","author":"Winter","year":"2019","journal-title":"Chem. Sci."},{"key":"mlstac99babib39","doi-asserted-by":"publisher","first-page":"513","DOI":"10.1039\/C7SC02664A","article-title":"MoleculeNet: a benchmark for molecular machine learning","volume":"9","author":"Wu","year":"2018","journal-title":"Chem. Sci."},{"key":"mlstac99babib40","doi-asserted-by":"publisher","first-page":"2077","DOI":"10.1021\/ci900161g","article-title":"Benchmark Data Set for in Silico Prediction of Ames Mutagenicity","volume":"49","author":"Hansen","year":"2009","journal-title":"J. Chem. Inf. 
Model."},{"key":"mlstac99babib41","doi-asserted-by":"publisher","first-page":"372","DOI":"10.1016\/j.chemosphere.2015.07.036","article-title":"Identifying potential endocrine disruptors among industrial chemicals and their metabolites\u2014development and evaluation of in silico tools","volume":"139","author":"Rybacka","year":"2015","journal-title":"Chemosphere"},{"key":"mlstac99babib42","doi-asserted-by":"publisher","first-page":"D1083","DOI":"10.1093\/nar\/gkt1031","article-title":"The ChEMBL bioactivity database: an update","volume":"42","author":"Bento","year":"2013","journal-title":"Nucleic Acids Res."},{"key":"mlstac99babib43","doi-asserted-by":"publisher","first-page":"7257","DOI":"10.3390\/molecules26237257","article-title":"CRNNTL: convolutional recurrent neural network and transfer learning for QSAR modeling in organic drug and material discovery","volume":"26","author":"Li","year":"2021","journal-title":"Molecules"},{"key":"mlstac99babib44","doi-asserted-by":"publisher","first-page":"2829","DOI":"10.1021\/ci400466r","article-title":"Heterogeneous classifier fusion for ligand-based virtual screening: or, how decision making by committee can be a good thing","volume":"53","author":"Riniker","year":"2013","journal-title":"J. Chem. Inf. Model."},{"article-title":"Pytorch: an imperative style, high-performance deep learning library","year":"2019","author":"Paszke","key":"mlstac99babib45"},{"article-title":"Huggingface\u2019s transformers: state-of-the-art natural language processing","year":"2019","author":"Wolf","key":"mlstac99babib46"},{"key":"mlstac99babib47","doi-asserted-by":"crossref","DOI":"10.18653\/v1\/2020.acl-demos.2","article-title":"Textbrewer: an open-source knowledge distillation toolkit for natural language processing","author":"Yang","year":"2020"},{"key":"mlstac99babib48","doi-asserted-by":"publisher","first-page":"17","DOI":"10.1186\/s13321-020-00423-w","article-title":"Transformer-CNN: swiss knife for QSAR modeling and interpretation","volume":"12","author":"Karpov","year":"2020","journal-title":"J. 
Cheminformatics"}],"container-title":["Machine Learning: Science and Technology"],"original-title":[],"link":[{"URL":"https:\/\/iopscience.iop.org\/article\/10.1088\/2632-2153\/ac99ba","content-type":"text\/html","content-version":"am","intended-application":"text-mining"},{"URL":"https:\/\/iopscience.iop.org\/article\/10.1088\/2632-2153\/ac99ba\/pdf","content-type":"application\/pdf","content-version":"am","intended-application":"text-mining"},{"URL":"https:\/\/iopscience.iop.org\/article\/10.1088\/2632-2153\/ac99ba","content-type":"text\/html","content-version":"vor","intended-application":"text-mining"},{"URL":"https:\/\/iopscience.iop.org\/article\/10.1088\/2632-2153\/ac99ba\/pdf","content-type":"application\/pdf","content-version":"vor","intended-application":"text-mining"},{"URL":"https:\/\/iopscience.iop.org\/article\/10.1088\/2632-2153\/ac99ba\/pdf","content-type":"application\/pdf","content-version":"am","intended-application":"syndication"},{"URL":"https:\/\/iopscience.iop.org\/article\/10.1088\/2632-2153\/ac99ba\/pdf","content-type":"application\/pdf","content-version":"vor","intended-application":"syndication"},{"URL":"https:\/\/iopscience.iop.org\/article\/10.1088\/2632-2153\/ac99ba\/pdf","content-type":"application\/pdf","content-version":"am","intended-application":"similarity-checking"},{"URL":"https:\/\/iopscience.iop.org\/article\/10.1088\/2632-2153\/ac99ba\/pdf","content-type":"application\/pdf","content-version":"vor","intended-application":"similarity-checking"}],"deposited":{"date-parts":[[2022,10,31]],"date-time":"2022-10-31T13:04:42Z","timestamp":1667221482000},"score":1,"resource":{"primary":{"URL":"https:\/\/iopscience.iop.org\/article\/10.1088\/2632-2153\/ac99ba"}},"subtitle":[],"short-title":[],"issued":{"date-parts":[[2022,10,31]]},"references-count":48,"journal-issue":{"issue":"4","published-online":{"date-parts":[[2022,10,31]]},"published-print":{"date-parts":[[2022,12,1]]}},"URL":"https:\/\/doi.org\/10.1088\/2632-2153\/ac99ba","relation":{},"ISSN":["2632-2153"],"issn-type":[{"type":"electronic","value":"2632-2153"}],"subject":[],"published":{"date-parts":[[2022,10,31]]},"assertion":[{"value":"Chemical transformer compression for accelerating both training and inference of molecular modeling","name":"article_title","label":"Article Title"},{"value":"Machine Learning: Science and Technology","name":"journal_title","label":"Journal Title"},{"value":"paper","name":"article_type","label":"Article Type"},{"value":"\u00a9 2022 The Author(s). Published by IOP Publishing Ltd","name":"copyright_information","label":"Copyright Information"},{"value":"2022-05-13","name":"date_received","label":"Date Received","group":{"name":"publication_dates","label":"Publication dates"}},{"value":"2022-10-12","name":"date_accepted","label":"Date Accepted","group":{"name":"publication_dates","label":"Publication dates"}},{"value":"2022-10-31","name":"date_epub","label":"Online publication date","group":{"name":"publication_dates","label":"Publication dates"}}]}}