{"status":"ok","message-type":"work","message-version":"1.0.0","message":{"indexed":{"date-parts":[[2026,4,3]],"date-time":"2026-04-03T21:45:17Z","timestamp":1775252717146,"version":"3.50.1"},"reference-count":35,"publisher":"Springer Science and Business Media LLC","issue":"6","license":[{"start":{"date-parts":[[2024,7,22]],"date-time":"2024-07-22T00:00:00Z","timestamp":1721606400000},"content-version":"tdm","delay-in-days":0,"URL":"https:\/\/creativecommons.org\/licenses\/by\/4.0"},{"start":{"date-parts":[[2024,7,22]],"date-time":"2024-07-22T00:00:00Z","timestamp":1721606400000},"content-version":"vor","delay-in-days":0,"URL":"https:\/\/creativecommons.org\/licenses\/by\/4.0"}],"funder":[{"name":"Important Drug Development Fund, Ministry of Science and Technology of China","award":["2018ZX09735002"],"award-info":[{"award-number":["2018ZX09735002"]}]},{"DOI":"10.13039\/501100012166","name":"National Key R&D Program of China","doi-asserted-by":"crossref","award":["2016YFA0502304"],"award-info":[{"award-number":["2016YFA0502304"]}],"id":[{"id":"10.13039\/501100012166","id-type":"DOI","asserted-by":"crossref"}]}],"content-domain":{"domain":["link.springer.com"],"crossmark-restriction":false},"short-container-title":["Complex Intell. Syst."],"published-print":{"date-parts":[[2024,12]]},"abstract":"<jats:title>Abstract<\/jats:title><jats:p>Optical chemical structure recognition (OCSR) is a fundamental and crucial task in the field of chemistry, which aims at transforming intricate chemical structure images into machine-readable formats. Current deep learning-based OCSR methods typically use image feature extractors to extract visual features and employ encoder-decoder architectures for chemical structure recognition. However, the performance of these methods is limited by their image feature extractors and the class imbalance of elements in chemical structure representation. 
This paper proposes MPOCSR (multi-path optical chemical structure recognition), which introduces the multi-path Vision Transformer (MPViT) and the class-balanced (CB) loss function to address these two challenges. MPOCSR uses MPViT as an image feature extractor, combining the advantages of convolutional neural networks and Vision Transformers. This strategy enables the provision of richer visual information for subsequent decoding processes. Furthermore, MPOCSR incorporates CB loss function to rebalance the loss weights among different categories. For training and validation of our method, we constructed a dataset that includes both Markush and non-Markush structures. Experimental results show that MPOCSR achieves an accuracy of 90.95% on the test set, surpassing other existing methods.<\/jats:p>","DOI":"10.1007\/s40747-024-01561-6","type":"journal-article","created":{"date-parts":[[2024,7,22]],"date-time":"2024-07-22T18:02:12Z","timestamp":1721671332000},"page":"7553-7563","update-policy":"https:\/\/doi.org\/10.1007\/springer_crossmark_policy","source":"Crossref","is-referenced-by-count":6,"title":["MPOCSR: optical chemical structure recognition based on multi-path Vision Transformer"],"prefix":"10.1007","volume":"10","author":[{"given":"Fan","family":"Lin","sequence":"first","affiliation":[]},{"ORCID":"https:\/\/orcid.org\/0000-0002-5438-1529","authenticated-orcid":false,"given":"Jianhua","family":"Li","sequence":"additional","affiliation":[]}],"member":"297","published-online":{"date-parts":[[2024,7,22]]},"reference":[{"issue":"1","key":"1561_CR1","doi-asserted-by":"publisher","first-page":"31","DOI":"10.1021\/CI00057A005","volume":"28","author":"D Weininger","year":"1988","unstructured":"Weininger D (1988) Smiles, a chemical language and information system. 1. Introduction to methodology and encoding rules. J Chem Inf Comput Sci 28(1):31\u201336. 
https:\/\/doi.org\/10.1021\/CI00057A005","journal-title":"J Chem Inf Comput Sci"},{"issue":"4","key":"1561_CR2","doi-asserted-by":"publisher","first-page":"45024","DOI":"10.1088\/2632-2153\/aba947","volume":"1","author":"M Krenn","year":"2020","unstructured":"Krenn M, H\u00e4se F, Nigam A, Friederich P, Aspuru-Guzik A (2020) Self-referencing embedded strings (SELFIES): a 100% robust molecular string representation. Mach Learn Sci Technol 1(4):45024. https:\/\/doi.org\/10.1088\/2632-2153\/aba947","journal-title":"Mach Learn Sci Technol"},{"issue":"4","key":"1561_CR3","doi-asserted-by":"publisher","first-page":"373","DOI":"10.1021\/ci00008a018","volume":"32","author":"JR McDaniel","year":"1992","unstructured":"McDaniel JR, Balmuth JR (1992) Kekule: ocr-optical chemical (structure) recognition. J Chem Inf Comput Sci 32(4):373\u2013378. https:\/\/doi.org\/10.1021\/ci00008a018","journal-title":"J Chem Inf Comput Sci"},{"key":"1561_CR4","doi-asserted-by":"publisher","unstructured":"Casey RG, Boyer S, Healey P, Miller A, Oudot B, Zilles K (1993) Optical recognition of chemical graphics. In: 2nd International Conference Document Analysis and Recognition, ICDAR \u201993, October 20\u201322, Tsukuba City. IEEE Computer Society, pp 627\u2013631. https:\/\/doi.org\/10.1109\/ICDAR.1993.395658","DOI":"10.1109\/ICDAR.1993.395658"},{"issue":"3","key":"1561_CR5","doi-asserted-by":"publisher","first-page":"338","DOI":"10.1021\/ci00013a010","volume":"33","author":"P Ibison","year":"1993","unstructured":"Ibison P, Jacquot M, Kam F, Neville AG, Simpson RW, Tonnelier CAG, Venczel T, Johnson AP (1993) Chemical literature data extraction: the clide project. J Chem Inf Comput Sci 33(3):338\u2013344. 
https:\/\/doi.org\/10.1021\/ci00013a010","journal-title":"J Chem Inf Comput Sci"},{"issue":"3","key":"1561_CR6","doi-asserted-by":"publisher","first-page":"740","DOI":"10.1021\/ci800067r","volume":"49","author":"IV Filippov","year":"2009","unstructured":"Filippov IV, Nicklaus MC (2009) Optical structure recognition software to recover chemical information: Osra, an open source solution. J Chem Inf Model 49(3):740\u2013743. https:\/\/doi.org\/10.1021\/ci800067r","journal-title":"J Chem Inf Model"},{"key":"1561_CR7","doi-asserted-by":"crossref","unstructured":"Smolov V, Zentsev F, Rybalkin M (2011) Imago: open-source toolkit for 2d chemical structure image recognition. In: Voorhees EM, Buckland LP (eds) Proceedings of The 20th Text REtrieval conference, TREC 2011, Gaithersburg, November 15\u201318, NIST Special Publication, vol. 500\u2013296. National Institute of Standards and Technology (NIST). http:\/\/trec.nist.gov\/pubs\/trec20\/papers\/GGA.chemical.pdf","DOI":"10.6028\/NIST.SP.500-296.chemical-GGA"},{"issue":"42","key":"1561_CR8","doi-asserted-by":"publisher","first-page":"14174","DOI":"10.1039\/D1SC01839F","volume":"12","author":"D-A Clevert","year":"2021","unstructured":"Clevert D-A, Le T, Winter R, Montanari F (2021) Img2mol-accurate smiles recognition from molecular graphical depictions. Chem Sci 12(42):14174\u201314181. https:\/\/doi.org\/10.1039\/D1SC01839F","journal-title":"Chem Sci"},{"issue":"3","key":"1561_CR9","doi-asserted-by":"publisher","first-page":"1017","DOI":"10.1021\/acs.jcim.8b00669","volume":"59","author":"J Staker","year":"2019","unstructured":"Staker J, Marshall K, Abel R, McQuaw CM (2019) Molecular structure extraction from documents using deep learning. J Chem Inf Model 59(3):1017\u20131029. https:\/\/doi.org\/10.1021\/acs.jcim.8b00669","journal-title":"J Chem Inf Model"},{"key":"1561_CR10","unstructured":"Kalchbrenner N, Danihelka I, Graves A (2016) Grid long short-term memory. 
arXiv:1507.01526v3"},{"issue":"1","key":"1561_CR11","doi-asserted-by":"publisher","DOI":"10.1002\/cmtd.202100069","volume":"2","author":"I Khokhlov","year":"2022","unstructured":"Khokhlov I, Krasnov L, Fedorov MV, Sosnin S (2022) Image2smiles: transformer-based molecular optical recognition engine. Chem Methods 2(1):e202100069. https:\/\/doi.org\/10.1002\/cmtd.202100069","journal-title":"Chem Methods"},{"key":"1561_CR12","doi-asserted-by":"publisher","unstructured":"He K, Zhang X, Ren S, Sun J (2016) Deep residual learning for image recognition. In: 2016 IEEE conference on computer vision and pattern recognition, CVPR 2016, Las Vegas, June 27\u201330. IEEE Computer Society, pp 770\u2013778. https:\/\/doi.org\/10.1109\/CVPR.2016.90","DOI":"10.1109\/CVPR.2016.90"},{"key":"1561_CR13","doi-asserted-by":"publisher","unstructured":"Vinyals O, Toshev A, Bengio S, Erhan D (2015) Show and tell: a neural image caption generator. In: IEEE conference on computer vision and pattern recognition, CVPR 2015, Boston, June 7\u201312. IEEE Computer Society, pp 3156\u20133164. https:\/\/doi.org\/10.1109\/CVPR.2015.7298935","DOI":"10.1109\/CVPR.2015.7298935"},{"issue":"1","key":"1561_CR14","doi-asserted-by":"publisher","first-page":"65","DOI":"10.1186\/s13321-020-00469-w","volume":"12","author":"K Rajan","year":"2020","unstructured":"Rajan K, Zielesny A, Steinbeck C (2020) Decimer: towards deep learning for chemical image recognition. J Cheminform 12(1):65\u201373. https:\/\/doi.org\/10.1186\/s13321-020-00469-w","journal-title":"J Cheminform"},{"issue":"1","key":"1561_CR15","doi-asserted-by":"publisher","first-page":"61","DOI":"10.1186\/s13321-021-00538-8","volume":"13","author":"K Rajan","year":"2021","unstructured":"Rajan K, Zielesny A, Steinbeck C (2021) Decimer 1.0: deep learning for chemical image recognition using transformers. J Cheminform 13(1):61\u201376. 
https:\/\/doi.org\/10.1186\/s13321-021-00538-8","journal-title":"J Cheminform"},{"issue":"1","key":"1561_CR16","doi-asserted-by":"publisher","first-page":"5045","DOI":"10.1038\/s41467-023-40782-0","volume":"14","author":"K Rajan","year":"2023","unstructured":"Rajan K, Brinkhaus HO, Agea MI, Zielesny A, Steinbeck C (2023) Decimer.ai: an open platform for automated optical chemical structure identification, segmentation and recognition in scientific publications. Nat Commun 14(1):5045\u20135062. https:\/\/doi.org\/10.1038\/s41467-023-40782-0","journal-title":"Nat Commun"},{"key":"1561_CR17","doi-asserted-by":"publisher","unstructured":"Szegedy C, Vanhoucke V, Ioffe S, Shlens J, Wojna Z (2016) Rethinking the inception architecture for computer vision. In: 2016 IEEE conference on computer vision and pattern recognition, CVPR 2016, Las Vegas, NV, June 27\u201330. IEEE Computer Society, pp 2818\u20132826. https:\/\/doi.org\/10.1109\/CVPR.2016.308","DOI":"10.1109\/CVPR.2016.308"},{"key":"1561_CR18","unstructured":"Chung J, Gulcehre C, Cho K, Bengio Y (2014) Empirical evaluation of gated recurrent neural networks on sequence modeling. arXiv:1412.3555v1"},{"key":"1561_CR19","unstructured":"Tan M, Le QV (2019) Efficientnet: rethinking model scaling for convolutional neural networks. In: Chaudhuri K, Salakhutdinov R (eds) Proceedings of the 36th international conference on machine learning, ICML 2019, 9\u201315 June 2019, Long Beach, Proceedings of Machine Learning Research, vol. 97. PMLR, pp 6105\u20136114. http:\/\/proceedings.mlr.press\/v97\/tan19a.html"},{"key":"1561_CR20","unstructured":"Tan M, Le QV (2021) Efficientnetv2: smaller models and faster training. In: Meila M, Zhang T (eds) Proceedings of the 38th international conference on machine learning, ICML 2021, 18\u201324 July 2021, virtual event, proceedings of machine learning research, vol. 139. PMLR, pp 10096\u201310106. 
http:\/\/proceedings.mlr.press\/v139\/tan21a.html"},{"issue":"1","key":"1561_CR21","doi-asserted-by":"publisher","first-page":"1","DOI":"10.1186\/s13321-022-00624-5","volume":"14","author":"Z Xu","year":"2022","unstructured":"Xu Z, Li J, Yang Z, Li S, Li H (2022) Swinocsr: end-to-end optical chemical structure recognition using a swin transformer. J Cheminform 14(1):1\u201313. https:\/\/doi.org\/10.1186\/s13321-022-00624-5","journal-title":"J Cheminform"},{"key":"1561_CR22","doi-asserted-by":"publisher","unstructured":"Liu Z, Lin Y, Cao Y, Hu H, Wei Y, Zhang Z, Lin S, Guo B (2021) Swin transformer: hierarchical vision transformer using shifted windows. In: 2021 IEEE\/CVF international conference on computer vision, ICCV 2021, Montreal, October 10\u201317. IEEE, pp 9992\u201310002. https:\/\/doi.org\/10.1109\/ICCV48922.2021.00986","DOI":"10.1109\/ICCV48922.2021.00986"},{"key":"1561_CR23","doi-asserted-by":"publisher","unstructured":"O\u2019Boyle N, Dalke A (2018) Deepsmiles: an adaptation of smiles for use in machine-learning of chemical structures. https:\/\/doi.org\/10.26434\/chemrxiv.7097960.v1","DOI":"10.26434\/chemrxiv.7097960.v1"},{"issue":"7","key":"1561_CR24","doi-asserted-by":"publisher","first-page":"1925","DOI":"10.1021\/acs.jcim.2c01480","volume":"63","author":"Y Qian","year":"2023","unstructured":"Qian Y, Guo J, Tu Z, Li Z, Coley CW, Barzilay R (2023) Molscribe: robust molecular structure recognition with image-to-graph generation. J Chem Inf Model 63(7):1925\u20131934. https:\/\/doi.org\/10.1021\/acs.jcim.2c01480","journal-title":"J Chem Inf Model"},{"key":"1561_CR25","doi-asserted-by":"publisher","unstructured":"Lee Y, Kim J, Willette J, Hwang SJ (2022) Mpvit: multi-path vision transformer for dense prediction. In: IEEE\/CVF conference on computer vision and pattern recognition, CVPR 2022, New Orleans, June 18\u201324. IEEE, pp 7277\u20137286. 
https:\/\/doi.org\/10.1109\/CVPR52688.2022.00714","DOI":"10.1109\/CVPR52688.2022.00714"},{"key":"1561_CR26","doi-asserted-by":"publisher","unstructured":"Cui Y, Jia M, Lin T, Song Y, Belongie SJ (2019) Class-balanced loss based on effective number of samples. In: IEEE conference on computer vision and pattern recognition, CVPR 2019, Long Beach, June 16\u201320. Computer Vision Foundation\/IEEE, pp 9268\u20139277. https:\/\/doi.org\/10.1109\/CVPR.2019.00949. http:\/\/openaccess.thecvf.com\/content_CVPR_2019\/html\/Cui_Class-Balanced_Loss_Based_on_Effective_Number_of_Samples_CVPR_2019_paper.html","DOI":"10.1109\/CVPR.2019.00949"},{"issue":"2","key":"1561_CR27","doi-asserted-by":"publisher","first-page":"84","DOI":"10.1039\/D1DD00013F","volume":"1","author":"K Rajan","year":"2022","unstructured":"Rajan K, Steinbeck C, Zielesny A (2022) Performance of chemical structure string representations for chemical image recognition using transformers. Digit Discov 1(2):84\u201390. https:\/\/doi.org\/10.1039\/D1DD00013F","journal-title":"Digit Discov"},{"key":"1561_CR28","doi-asserted-by":"publisher","unstructured":"Xu W, Xu Y, Chang TA, Tu Z (2021) Co-scale conv-attentional image transformers. In: 2021 IEEE\/CVF international conference on computer vision, ICCV 2021, Montreal, October 10\u201317. IEEE, pp 9961\u20139970. https:\/\/doi.org\/10.1109\/ICCV48922.2021.00983","DOI":"10.1109\/ICCV48922.2021.00983"},{"key":"1561_CR29","doi-asserted-by":"publisher","first-page":"1","DOI":"10.1186\/s13321-017-0220-4","volume":"9","author":"EL Willighagen","year":"2017","unstructured":"Willighagen EL, Mayfield JW, Alvarsson J, Berg A, Carlsson L, Jeliazkova N, Kuhn S, Pluskal T, Rojas-Chert\u00f3 M, Spjuth O (2017) The chemistry development kit (cdk) v2.0: atom typing, depiction, molecular formulas, and substructure searching. J Cheminform 9:1\u201319. 
https:\/\/doi.org\/10.1186\/s13321-017-0220-4","journal-title":"J Cheminform"},{"issue":"D1","key":"1561_CR30","doi-asserted-by":"publisher","first-page":"D1388","DOI":"10.1093\/nar\/gkaa971","volume":"49","author":"S Kim","year":"2021","unstructured":"Kim S, Chen J, Cheng T, Gindulyte A, He J, He S, Li Q, Shoemaker BA, Thiessen PA, Yu B (2021) Pubchem in 2021: new data content and improved web interfaces. Nucleic Acids Res 49(D1):D1388\u2013D1395. https:\/\/doi.org\/10.1093\/nar\/gkaa971","journal-title":"Nucleic Acids Res"},{"issue":"1","key":"1561_CR31","doi-asserted-by":"publisher","first-page":"31","DOI":"10.1186\/s13321-022-00609-4","volume":"14","author":"HO Brinkhaus","year":"2022","unstructured":"Brinkhaus HO, Rajan K, Zielesny A, Steinbeck C (2022) Randepict: random chemical structure depiction generator. J Cheminform 14(1):31\u201337. https:\/\/doi.org\/10.1186\/s13321-022-00609-4","journal-title":"J Cheminform"},{"key":"1561_CR32","unstructured":"Loshchilov I, Hutter F (2017) Fixing weight decay regularization in adam. arXiv:1711.05101"},{"key":"1561_CR33","unstructured":"OpenAI (2023) Gpt-4 technical report. arXiv:2303.08774"},{"key":"1561_CR34","unstructured":"Touvron H, Lavril T, Izacard G, Martinet X, Lachaux M-A, Lacroix T, Rozi\u00e8re B, Goyal N, Hambro E, Azhar F (2023) Llama: open and efficient foundation language models. 
arXiv:2302.13971"},{"key":"1561_CR35","unstructured":"Chowdhery A, Narang S, Devlin J, Bosma M, Mishra G, Roberts A, Barham P, Chung HW, Sutton C, Gehrmann S, Schuh P, Shi K, Tsvyashchenko S, Maynez J, Rao A, Barnes P, Tay Y, Shazeer N, Prabhakaran V, Reif E, Du N, Hutchinson B, Pope R, Bradbury J, Austin J, Isard M, Gur-Ari G, Yin P, Duke T, Levskaya A, Ghemawat S, Dev S, Michalewski H, Garcia X, Misra V, Robinson K, Fedus L, Zhou D, Ippolito D, Luan D, Lim H, Zoph B, Spiridonov A, Sepassi R, Dohan D, Agrawal S, Omernick M, Dai AM, Pillai TS, Pellat M, Lewkowycz A, Moreira E, Child R, Polozov O, Lee K, Zhou Z, Wang X, Saeta B, Diaz M, Firat O, Catasta M, Wei J, Meier-Hellstern K, Eck D, Dean J, Petrov S, Fiedel N (2023) Palm: scaling language modeling with pathways. J Mach Learn Res 24:240:1\u2013240:113. http:\/\/jmlr.org\/papers\/v24\/22-1144.html"}],"container-title":["Complex & Intelligent Systems"],"original-title":[],"language":"en","link":[{"URL":"https:\/\/link.springer.com\/content\/pdf\/10.1007\/s40747-024-01561-6.pdf","content-type":"application\/pdf","content-version":"vor","intended-application":"text-mining"},{"URL":"https:\/\/link.springer.com\/article\/10.1007\/s40747-024-01561-6\/fulltext.html","content-type":"text\/html","content-version":"vor","intended-application":"text-mining"},{"URL":"https:\/\/link.springer.com\/content\/pdf\/10.1007\/s40747-024-01561-6.pdf","content-type":"application\/pdf","content-version":"vor","intended-application":"similarity-checking"}],"deposited":{"date-parts":[[2024,10,16]],"date-time":"2024-10-16T22:07:50Z","timestamp":1729116470000},"score":1,"resource":{"primary":{"URL":"https:\/\/link.springer.com\/10.1007\/s40747-024-01561-6"}},"subtitle":[],"short-title":[],"issued":{"date-parts":[[2024,7,22]]},"references-count":35,"journal-issue":{"issue":"6","published-print":{"date-parts":[[2024,12]]}},"alternative-id":["1561"],"URL":"https:\/\/doi.org\/10.1007\/s40747-024-01561-6","relation":{},"ISSN":["2199-4536","2198-6053"],"issn-type":[{"value":"2199-4536","type":"print"},{"value":"2198-6053","type":"electronic"}],"subject":[],"published":{"date-parts":[[2024,7,22]]},"assertion":[{"value":"6 December 2023","order":1,"name":"received","label":"Received","group":{"name":"ArticleHistory","label":"Article History"}},{"value":"6 July 2024","order":2,"name":"accepted","label":"Accepted","group":{"name":"ArticleHistory","label":"Article History"}},{"value":"22 July 2024","order":3,"name":"first_online","label":"First Online","group":{"name":"ArticleHistory","label":"Article History"}},{"order":1,"name":"Ethics","group":{"name":"EthicsHeading","label":"Declarations"}},{"value":"The authors have no conflict of interest in the publication of this paper.","order":2,"name":"Ethics","group":{"name":"EthicsHeading","label":"Conflict of interest"}}]}}