{"status":"ok","message-type":"work","message-version":"1.0.0","message":{"indexed":{"date-parts":[[2026,4,8]],"date-time":"2026-04-08T22:48:51Z","timestamp":1775688531901,"version":"3.50.1"},"reference-count":40,"publisher":"Oxford University Press (OUP)","issue":"19","license":[{"start":{"date-parts":[[2022,8,5]],"date-time":"2022-08-05T00:00:00Z","timestamp":1659657600000},"content-version":"vor","delay-in-days":0,"URL":"https:\/\/academic.oup.com\/journals\/pages\/open_access\/funder_policies\/chorus\/standard_publication_model"}],"funder":[{"DOI":"10.13039\/501100012166","name":"National Key Research and Development Program of China","doi-asserted-by":"publisher","award":["2021YFF1201400"],"award-info":[{"award-number":["2021YFF1201400"]}],"id":[{"id":"10.13039\/501100012166","id-type":"DOI","asserted-by":"publisher"}]},{"DOI":"10.13039\/501100001809","name":"National Natural Science Foundation of China","doi-asserted-by":"publisher","award":["22173118"],"award-info":[{"award-number":["22173118"]}],"id":[{"id":"10.13039\/501100001809","id-type":"DOI","asserted-by":"publisher"}]},{"DOI":"10.13039\/501100001809","name":"National Natural Science Foundation of China","doi-asserted-by":"publisher","award":["U1811462"],"award-info":[{"award-number":["U1811462"]}],"id":[{"id":"10.13039\/501100001809","id-type":"DOI","asserted-by":"publisher"}]},{"name":"Hunan Provincial Science Fund for Distinguished Young Scholars","award":["2021JJ10068"],"award-info":[{"award-number":["2021JJ10068"]}]},{"name":"science and technology innovation Program of Hunan Province","award":["2021RC4011"],"award-info":[{"award-number":["2021RC4011"]}]},{"name":"Changsha Municipal Natural Science Foundation","award":["kq2014144"],"award-info":[{"award-number":["kq2014144"]}]},{"name":"Changsha Science and Technology Bureau project","award":["kq2001034"],"award-info":[{"award-number":["kq2001034"]}]},{"name":"HKBU Strategic Development Fund project","award":["SDF19 0402 P02"],"award-info":[{"award-number":["SDF19 0402 P02"]}]}],"content-domain":{"domain":[],"crossmark-restriction":false},"short-container-title":[],"published-print":{"date-parts":[[2022,9,30]]},"abstract":"<jats:title>Abstract<\/jats:title><jats:sec><jats:title>Motivation<\/jats:title><jats:p>Automatic recognition of chemical structures from molecular images provides an important avenue for the rediscovery of chemicals. Traditional rule-based approaches that rely on expert knowledge and fail to consider all the stylistic variations of molecular images usually suffer from cumbersome recognition processes and low generalization ability. Deep learning-based methods that integrate different image styles and automatically learn valuable features are flexible, but currently under-researched and have limitations, and are therefore not fully exploited.<\/jats:p><\/jats:sec><jats:sec><jats:title>Results<\/jats:title><jats:p>MICER, an encoder\u2013decoder-based, reconstructed architecture for molecular image captioning, combines transfer learning, attention mechanisms and several strategies to strengthen effectiveness and plasticity in different datasets. The effects of stereochemical information, molecular complexity, data volume and pre-trained encoders on MICER performance were evaluated. Experimental results show that the intrinsic features of the molecular images and the sub-model match have a significant impact on the performance of this task. These findings inspire us to design the training dataset and the encoder for the final validation model, and the experimental results suggest that the MICER model consistently outperforms the state-of-the-art methods on four datasets. MICER was more reliable and scalable due to its interpretability and transfer capacity and provides a practical framework for developing comprehensive and accurate automated molecular structure identification tools to explore unknown chemical space.<\/jats:p><\/jats:sec><jats:sec><jats:title>Availability and implementation<\/jats:title><jats:p>https:\/\/github.com\/Jiacai-Yi\/MICER.<\/jats:p><\/jats:sec><jats:sec><jats:title>Supplementary information<\/jats:title><jats:p>Supplementary data are available at Bioinformatics online.<\/jats:p><\/jats:sec>","DOI":"10.1093\/bioinformatics\/btac545","type":"journal-article","created":{"date-parts":[[2022,8,5]],"date-time":"2022-08-05T13:43:50Z","timestamp":1659707030000},"page":"4562-4572","source":"Crossref","is-referenced-by-count":18,"title":["MICER: a pre-trained encoder\u2013decoder architecture for molecular image captioning"],"prefix":"10.1093","volume":"38","author":[{"ORCID":"https:\/\/orcid.org\/0000-0001-6823-1882","authenticated-orcid":false,"given":"Jiacai","family":"Yi","sequence":"first","affiliation":[{"name":"School of Computer Science, National University of Defense Technology , Changsha 410073, China"}]},{"ORCID":"https:\/\/orcid.org\/0000-0002-9688-5311","authenticated-orcid":false,"given":"Chengkun","family":"Wu","sequence":"additional","affiliation":[{"name":"Institute for Quantum Information & State Key Laboratory of High-Performance Computing, College of Computer Science and Technology, National University of Defense Technology , Changsha 410073, China"}]},{"given":"Xiaochen","family":"Zhang","sequence":"additional","affiliation":[{"name":"School of Computer Science, National University of Defense Technology , Changsha 410073, China"}]},{"given":"Xinyi","family":"Xiao","sequence":"additional","affiliation":[{"name":"School of Computer Science, National University of Defense Technology , Changsha 410073, China"}]},{"given":"Yanlong","family":"Qiu","sequence":"additional","affiliation":[{"name":"School of Computer Science, National University of Defense Technology , Changsha 410073, China"}]},{"given":"Wentao","family":"Zhao","sequence":"additional","affiliation":[{"name":"School of Computer Science, National University of Defense Technology , Changsha 410073, China"}]},{"ORCID":"https:\/\/orcid.org\/0000-0001-7227-2580","authenticated-orcid":false,"given":"Tingjun","family":"Hou","sequence":"additional","affiliation":[{"name":"Innovation Institute for Artificial Intelligence in Medicine of Zhejiang University, College of Pharmaceutical Sciences, Zhejiang University , Hangzhou 310058, China"}]},{"ORCID":"https:\/\/orcid.org\/0000-0003-3604-3785","authenticated-orcid":false,"given":"Dongsheng","family":"Cao","sequence":"additional","affiliation":[{"name":"Xiangya School of Pharmaceutical Sciences, Central South University , Changsha 410003, China"},{"name":"Hunan Key Laboratory of Diagnostic and Therapeutic Drug Research for Chronic Diseases, Central South University , Changsha 410013, China"}]}],"member":"286","published-online":{"date-parts":[[2022,8,5]]},"reference":[{"key":"2023041408225230800_","doi-asserted-by":"crossref","first-page":"1897","DOI":"10.1351\/pac200678101897","article-title":"Graphical representation of stereochemical configuration (IUPAC recommendations 2006)","volume":"78","author":"Brecher","year":"2006","journal-title":"Pure Appl. Chem"},{"key":"2023041408225230800_","author":"Cho","year":"2014"},{"key":"2023041408225230800_","first-page":"302","article-title":"Computational perception and recognition of digitized molecular structures","volume":"30","author":"Contreras","year":"1990","journal-title":"J. Chem. Inf. Model"},{"key":"2023041408225230800_","article-title":"Artificial intelligence in drug discovery: applications and techniques","author":"Deng","year":"2021"},{"key":"2023041408225230800_","doi-asserted-by":"crossref","first-page":"1","DOI":"10.1186\/1758-2946-6-17","article-title":"Chemical named entities recognition: a review on approaches and applications","volume":"6","author":"Eltyeb","year":"2014","journal-title":"J. Cheminform"},{"key":"2023041408225230800_","doi-asserted-by":"crossref","first-page":"4","DOI":"10.3389\/frai.2020.00004","article-title":"An introductory review of deep learning for prediction models with big data","volume":"3","author":"Emmert-Streib","year":"2020","journal-title":"Front. Artif. Intell"},{"key":"2023041408225230800_","first-page":"740","article-title":"Optical structure recognition software to recover chemical information: OSRA, an open source solution","author":"Filippov","year":"2009"},{"key":"2023041408225230800_","first-page":"33","article-title":"Attentional pooling for action recognition","author":"Girdhar","year":"2017"},{"key":"2023041408225230800_","first-page":"580","author":"Girshick","year":"2014"},{"key":"2023041408225230800_","first-page":"770","author":"He","year":"2016"},{"key":"2023041408225230800_","doi-asserted-by":"crossref","first-page":"1735","DOI":"10.1162\/neco.1997.9.8.1735","article-title":"Long short-term memory","volume":"9","author":"Hochreiter","year":"1997","journal-title":"Neural Comput"},{"key":"2023041408225230800_","doi-asserted-by":"crossref","first-page":"1","DOI":"10.1145\/3295748","article-title":"A comprehensive survey of deep learning for image captioning","volume":"51","author":"Hossain","year":"2019","journal-title":"ACM Comput. Surv"},{"key":"2023041408225230800_","first-page":"4700","author":"Huang","year":"2017"},{"key":"2023041408225230800_","article-title":"SqueezeNet: AlexNet-level accuracy with 50x fewer parameters and &lt; 0.5 MB model size","author":"Iandola","year":"2016"},{"key":"2023041408225230800_","doi-asserted-by":"crossref","first-page":"1757","DOI":"10.1021\/ci3001277","article-title":"ZINC: a free tool to discover chemistry for biology","volume":"52","author":"Irwin","year":"2012","journal-title":"J. Chem. Inf. Model"},{"key":"2023041408225230800_","doi-asserted-by":"crossref","first-page":"6065","DOI":"10.1021\/acs.jcim.0c00675","article-title":"ZINC20\u2014a free ultralarge-scale chemical database for ligand discovery","volume":"60","author":"Irwin","year":"2020","journal-title":"J. Chem. Inf. Model"},{"key":"2023041408225230800_","author":"Kingma","year":"2014"},{"key":"2023041408225230800_","doi-asserted-by":"crossref","first-page":"045024","DOI":"10.1088\/2632-2153\/aba947","article-title":"Self-referencing embedded strings (SELFIES): a 100% robust molecular string representation","volume":"1","author":"Krenn","year":"2020","journal-title":"Mach. Learn. Sci. Technol"},{"key":"2023041408225230800_","article-title":"Survey of dropout methods for deep neural networks","author":"Labach","year":"2019"},{"key":"2023041408225230800_","doi-asserted-by":"crossref","first-page":"2278","DOI":"10.1109\/5.726791","article-title":"Gradient-based learning applied to document recognition","volume":"86","author":"LeCun","year":"1998","journal-title":"Proc. IEEE"},{"key":"2023041408225230800_","first-page":"3005","article-title":"PyTorch distributed: experiences on accelerating data parallel training","author":"Li","year":"2020"},{"key":"2023041408225230800_","first-page":"3431","author":"Long","year":"2015"},{"key":"2023041408225230800_","doi-asserted-by":"crossref","first-page":"31","DOI":"10.1145\/375360.375365","article-title":"A guided tour to approximate string matching","volume":"33","author":"Navarro","year":"2001","journal-title":"ACM Comput. Surv"},{"key":"2023041408225230800_","doi-asserted-by":"crossref","first-page":"1","DOI":"10.1186\/1758-2946-4-22","article-title":"Towards a universal SMILES representation-A standard method to generate canonical SMILES based on the InChI","volume":"4","author":"O\u2019Boyle","year":"2012","journal-title":"J. Cheminform"},{"key":"2023041408225230800_","doi-asserted-by":"crossref","first-page":"4506","DOI":"10.1021\/acs.jcim.0c00459","article-title":"ChemGrapher: optical graph recognition of chemical compounds by deep learning","volume":"60","author":"Oldenhof","year":"2020","journal-title":"J. Chem. Inf. Model"},{"key":"2023041408225230800_","doi-asserted-by":"crossref","first-page":"P4","DOI":"10.1186\/1758-2946-3-S1-P4","article-title":"Indigo: universal cheminformatics API","volume":"3","author":"Pavlov","year":"2011","journal-title":"J. Cheminform"},{"key":"2023041408225230800_","doi-asserted-by":"crossref","first-page":"1","DOI":"10.1186\/s13321-020-00465-0","article-title":"A review of optical chemical structure recognition tools","volume":"12","author":"Rajan","year":"2020","journal-title":"J. Cheminform"},{"key":"2023041408225230800_","doi-asserted-by":"crossref","first-page":"1","DOI":"10.1186\/s13321-020-00469-w","article-title":"DECIMER: towards deep learning for chemical image recognition","volume":"12","author":"Rajan","year":"2020","journal-title":"J. Cheminform"},{"key":"2023041408225230800_","doi-asserted-by":"crossref","first-page":"742","DOI":"10.1021\/ci100050t","article-title":"Extended-connectivity fingerprints","volume":"50","author":"Rogers","year":"2010","journal-title":"J. Chem. Inf. Model"},{"key":"2023041408225230800_","doi-asserted-by":"crossref","first-page":"7","DOI":"10.1021\/ci00065a003","article-title":"Automatic processing of graphics for image databases in science","volume":"30","author":"Rozas","year":"1990","journal-title":"J. Chem. Inf. Comput. Sci"},{"key":"2023041408225230800_","first-page":"4510","author":"Sandler","year":"2018"},{"key":"2023041408225230800_","doi-asserted-by":"crossref","first-page":"85","DOI":"10.1016\/j.neunet.2014.09.003","article-title":"Deep learning in neural networks: an overview","volume":"61","author":"Schmidhuber","year":"2015","journal-title":"Neural Netw"},{"key":"2023041408225230800_","article-title":"Very deep convolutional networks for large-scale image recognition","author":"Simonyan","year":"2015"},{"key":"2023041408225230800_","first-page":"296","volume-title":"In: The Twentieth Text REtrieval Conference Proceedings, Gaithersburg, Maryland","author":"Smolov","year":"2011"},{"key":"2023041408225230800_","doi-asserted-by":"crossref","first-page":"1017","DOI":"10.1021\/acs.jcim.8b00669","article-title":"Molecular structure extraction from documents using deep learning","volume":"59","author":"Staker","year":"2019","journal-title":"J. Chem. Inf. Model"},{"key":"2023041408225230800_","first-page":"2818","author":"Szegedy","year":"2016"},{"key":"2023041408225230800_","first-page":"270","volume-title":"International Conference on Artificial Neural Networks","author":"Tan","year":"2018"},{"key":"2023041408225230800_","doi-asserted-by":"crossref","first-page":"1","DOI":"10.1186\/s12859-020-03899-3","article-title":"CGINet: graph convolutional network-based model for identifying chemical-gene interaction in an integrated multi-relational graph","volume":"21","author":"Wang","year":"2020","journal-title":"BMC Bioinformatics"},{"key":"2023041408225230800_","first-page":"2048","author":"Xu","year":"2015"},{"key":"2023041408225230800_","doi-asserted-by":"crossref","first-page":"43","DOI":"10.1109\/JPROC.2020.3004555","article-title":"A comprehensive survey on transfer learning","volume":"109","author":"Zhuang","year":"2020","journal-title":"Proc. IEEE"}],"container-title":["Bioinformatics"],"original-title":[],"language":"en","link":[{"URL":"https:\/\/academic.oup.com\/bioinformatics\/advance-article-pdf\/doi\/10.1093\/bioinformatics\/btac545\/45494387\/btac545.pdf","content-type":"application\/pdf","content-version":"am","intended-application":"syndication"},{"URL":"https:\/\/academic.oup.com\/bioinformatics\/article-pdf\/38\/19\/4562\/49885639\/btac545.pdf","content-type":"application\/pdf","content-version":"vor","intended-application":"syndication"},{"URL":"https:\/\/academic.oup.com\/bioinformatics\/article-pdf\/38\/19\/4562\/49885639\/btac545.pdf","content-type":"unspecified","content-version":"vor","intended-application":"similarity-checking"}],"deposited":{"date-parts":[[2023,11,25]],"date-time":"2023-11-25T10:10:44Z","timestamp":1700907044000},"score":1,"resource":{"primary":{"URL":"https:\/\/academic.oup.com\/bioinformatics\/article\/38\/19\/4562\/6656348"}},"subtitle":[],"editor":[{"given":"Jonathan","family":"Wren","sequence":"additional","affiliation":[]}],"short-title":[],"issued":{"date-parts":[[2022,8,5]]},"references-count":40,"journal-issue":{"issue":"19","published-print":{"date-parts":[[2022,9,30]]}},"URL":"https:\/\/doi.org\/10.1093\/bioinformatics\/btac545","relation":{},"ISSN":["1367-4803","1367-4811"],"issn-type":[{"value":"1367-4803","type":"print"},{"value":"1367-4811","type":"electronic"}],"subject":[],"published-other":{"date-parts":[[2022,10,1]]},"published":{"date-parts":[[2022,8,5]]}}}