{"status":"ok","message-type":"work","message-version":"1.0.0","message":{"indexed":{"date-parts":[[2026,3,11]],"date-time":"2026-03-11T02:38:08Z","timestamp":1773196688681,"version":"3.50.1"},"reference-count":26,"publisher":"Springer Science and Business Media LLC","issue":"1","license":[{"start":{"date-parts":[[2023,11,20]],"date-time":"2023-11-20T00:00:00Z","timestamp":1700438400000},"content-version":"tdm","delay-in-days":0,"URL":"https:\/\/creativecommons.org\/licenses\/by\/4.0"},{"start":{"date-parts":[[2023,11,20]],"date-time":"2023-11-20T00:00:00Z","timestamp":1700438400000},"content-version":"vor","delay-in-days":0,"URL":"https:\/\/creativecommons.org\/licenses\/by\/4.0"}],"funder":[{"name":"Natural Science Foundation of Hunan Provinces","award":["2022JJ30438"],"award-info":[{"award-number":["2022JJ30438"]}]}],"content-domain":{"domain":["link.springer.com"],"crossmark-restriction":false},"short-container-title":["J Cheminform"],"abstract":"<jats:title>Abstract<\/jats:title><jats:p>In chemistry-related disciplines, a vast repository of molecular structural data has been documented in scientific publications but remains inaccessible to computational analyses owing to its non-machine-readable format. Optical chemical structure recognition (OCSR) addresses this gap by converting images of chemical molecular structures into a format accessible to computers and convenient for storage, paving the way for further analyses and studies on chemical information. A pivotal initial step in OCSR is automating the noise-free extraction of molecular descriptions from literature. Despite efforts utilising rule-based and deep learning approaches for the extraction process, the accuracy achieved to date is unsatisfactory. To address this issue, we introduce a deep learning model named YoDe-Segmentation in this study, engineered for the automated retrieval of molecular structures from scientific documents. 
This model operates via a three-stage process encompassing detection, mask generation, and calculation. Initially, it identifies and isolates molecular structures during the detection phase. Subsequently, mask maps are created based on these isolated structures in the mask generation stage. In the final calculation stage, refined and separated mask maps are combined with the isolated molecular structure images, resulting in the acquisition of pure molecular structures. Our model underwent rigorous testing using texts from multiple chemistry-centric journals, with the outcomes subjected to manual validation. The results revealed the superior performance of YoDe-Segmentation compared to alternative algorithms, documenting an average extraction efficiency of 97.62%. This outcome not only highlights the robustness and reliability of the model but also suggests its applicability on a broad scale.<\/jats:p>","DOI":"10.1186\/s13321-023-00783-z","type":"journal-article","created":{"date-parts":[[2023,11,20]],"date-time":"2023-11-20T20:02:07Z","timestamp":1700510527000},"update-policy":"https:\/\/doi.org\/10.1007\/springer_crossmark_policy","source":"Crossref","is-referenced-by-count":4,"title":["YoDe-Segmentation: automated noise-free retrieval of molecular structures from scientific publications"],"prefix":"10.1186","volume":"15","author":[{"given":"Chong","family":"Zhou","sequence":"first","affiliation":[]},{"given":"Wei","family":"Liu","sequence":"additional","affiliation":[]},{"given":"Xiyue","family":"Song","sequence":"additional","affiliation":[]},{"given":"Mengling","family":"Yang","sequence":"additional","affiliation":[]},{"given":"Xiaowang","family":"Peng","sequence":"additional","affiliation":[]}],"member":"297","published-online":{"date-parts":[[2023,11,20]]},"reference":[{"issue":"1","key":"783_CR1","doi-asserted-by":"publisher","first-page":"60","DOI":"10.1186\/s13321-020-00465-0","volume":"12","author":"K Rajan","year":"2020","unstructured":"Rajan K, 
Brinkhaus HO, Zielesny A, Steinbeck C (2020) A review of optical chemical structure recognition tools. J Cheminform 12(1):60. https:\/\/doi.org\/10.1186\/s13321-020-00465-0","journal-title":"J Cheminform"},{"issue":"1","key":"783_CR2","doi-asserted-by":"publisher","first-page":"41","DOI":"10.1186\/s13321-022-00624-5","volume":"14","author":"Z Xu","year":"2022","unstructured":"Xu Z, Li J, Yang Z, Li S, Li H (2022) SwinOCSR: end-to-end optical chemical structure recognition using a Swin Transformer. J Cheminform 14(1):41. https:\/\/doi.org\/10.1186\/s13321-022-00624-5","journal-title":"J Cheminform"},{"issue":"1","key":"783_CR3","doi-asserted-by":"publisher","first-page":"61","DOI":"10.1186\/s13321-021-00538-8","volume":"13","author":"K Rajan","year":"2021","unstructured":"Rajan K, Zielesny A, Steinbeck C (2021) DECIMER 1.0: deep learning for chemical image recognition using transformers. J Cheminform 13(1):61. https:\/\/doi.org\/10.1186\/s13321-021-00538-8","journal-title":"J Cheminform"},{"issue":"4","key":"783_CR4","doi-asserted-by":"publisher","first-page":"373","DOI":"10.1021\/ci00008a018","volume":"32","author":"JR McDaniel","year":"1992","unstructured":"McDaniel JR, Balmuth JR (1992) Kekule: OCR-optical chemical (structure) recognition. J Chem Inf Comput Sci 32(4):373\u2013378. https:\/\/doi.org\/10.1021\/ci00008a018","journal-title":"J Chem Inf Comput Sci"},{"issue":"1","key":"783_CR5","doi-asserted-by":"publisher","first-page":"31","DOI":"10.1021\/ci00057a005","volume":"28","author":"D Weininger","year":"1988","unstructured":"Weininger D (1988) SMILES, a chemical language and information system. 1. Introduction to methodology and encoding rules. J Chem Inf Comput Sci 28(1):31\u201336. 
https:\/\/doi.org\/10.1021\/ci00057a005","journal-title":"J Chem Inf Comput Sci"},{"issue":"22","key":"783_CR6","doi-asserted-by":"publisher","first-page":"5321","DOI":"10.1021\/acs.jcim.2c00733","volume":"62","author":"Y Xu","year":"2022","unstructured":"Xu Y, Xiao J, Chou CH, Zhang J, Zhu J, Hu Q, Li H, Han N, Liu B, Zhang S, Han J, Zhang Z, Zhang S, Zhang W, Lai L, Pei J (2022) MolMiner: you only look once for chemical structure recognition. J Chem Inf Model 62(22):5321\u20135328. https:\/\/doi.org\/10.1021\/acs.jcim.2c00733","journal-title":"J Chem Inf Model"},{"issue":"5","key":"783_CR7","doi-asserted-by":"publisher","first-page":"883","DOI":"10.1109\/TEVC.2021.3064943","volume":"25","author":"X Liang","year":"2021","unstructured":"Liang X, Guo Q, Qian Y, Ding W, Zhang Q (2021) Evolutionary deep fusion method and its application in chemical structure recognition. IEEE Trans Evol Computat 25(5):883\u2013893. https:\/\/doi.org\/10.1109\/TEVC.2021.3064943","journal-title":"IEEE Trans Evol Computat"},{"issue":"19","key":"783_CR8","doi-asserted-by":"publisher","first-page":"4562","DOI":"10.1093\/bioinformatics\/btac545","volume":"38","author":"J Yi","year":"2022","unstructured":"Yi J, Wu C, Zhang X, Xiao X, Qiu Y, Zhao W, Hou T, Cao D (2022) MICER: a pre-trained encoder\u2013decoder architecture for molecular image captioning. Bioinformatics 38(19):4562\u20134572. https:\/\/doi.org\/10.1093\/bioinformatics\/btac545","journal-title":"Bioinformatics"},{"issue":"7","key":"783_CR9","doi-asserted-by":"publisher","first-page":"1925","DOI":"10.1021\/acs.jcim.2c01480","volume":"63","author":"Y Qian","year":"2023","unstructured":"Qian Y, Guo J, Tu Z, Li Z, Coley CW, Barzilay R (2023) MolScribe: robust molecular structure recognition with image-to-graph generation. J Chem Inf Model 63(7):1925\u20131934. 
https:\/\/doi.org\/10.1021\/acs.jcim.2c01480","journal-title":"J Chem Inf Model"},{"issue":"1","key":"783_CR10","doi-asserted-by":"publisher","first-page":"5045","DOI":"10.1038\/s41467-023-40782-0","volume":"14","author":"K Rajan","year":"2023","unstructured":"Rajan K, Brinkhaus HO, Agea MI, Zielesny A, Steinbeck C (2023) DECIMER.ai: an open platform for automated optical chemical structure identification, segmentation and recognition in scientific publications. Nat Commun 14(1):5045. https:\/\/doi.org\/10.1038\/s41467-023-40782-0","journal-title":"Nat Commun"},{"issue":"1","key":"783_CR11","doi-asserted-by":"publisher","first-page":"45","DOI":"10.1186\/s13321-023-00713-z","volume":"15","author":"S Nemoto","year":"2023","unstructured":"Nemoto S, Mizuno T, Kusuhara H (2023) Investigation of chemical structure recognition by encoder\u2013decoder models in learning progress. J Cheminform 15(1):45. https:\/\/doi.org\/10.1186\/s13321-023-00713-z","journal-title":"J Cheminform"},{"issue":"3","key":"783_CR12","doi-asserted-by":"publisher","first-page":"740","DOI":"10.1021\/ci800067r","volume":"49","author":"IV Filippov","year":"2009","unstructured":"Filippov IV, Nicklaus MC (2009) Optical structure recognition software to recover chemical information: OSRA, an open source solution. J Chem Inf Model 49(3):740\u2013743. https:\/\/doi.org\/10.1021\/ci800067r","journal-title":"J Chem Inf Model"},{"issue":"4","key":"783_CR13","doi-asserted-by":"publisher","first-page":"2059","DOI":"10.1021\/acs.jcim.0c00042","volume":"60","author":"EJ Beard","year":"2020","unstructured":"Beard EJ, Cole JM (2020) ChemSchematicResolver: a toolkit to decode 2-d chemical diagrams with labels and R-groups into annotated chemical named entities. J Chem Inf Model 60(4):2059\u20132072. 
https:\/\/doi.org\/10.1021\/acs.jcim.0c00042","journal-title":"J Chem Inf Model"},{"issue":"3","key":"783_CR14","doi-asserted-by":"publisher","first-page":"1017","DOI":"10.1021\/acs.jcim.8b00669","volume":"59","author":"J Staker","year":"2019","unstructured":"Staker J, Marshall K, Abel R, McQuaw CM (2019) Molecular Structure extraction from documents using deep learning. J Chem Inf Model 59(3):1017\u20131029. https:\/\/doi.org\/10.1021\/acs.jcim.8b00669","journal-title":"J Chem Inf Model"},{"key":"783_CR15","doi-asserted-by":"publisher","first-page":"234","DOI":"10.1007\/978-3-319-24574-4_28","volume-title":"U-net: convolutional networks for biomedical image segmentation","author":"O Ronneberger","year":"2015","unstructured":"Ronneberger O, Fischer P, Brox T (2015) U-net: convolutional networks for biomedical image segmentation. Springer, Cham, pp 234\u2013241. https:\/\/doi.org\/10.1007\/978-3-319-24574-4_28"},{"issue":"1","key":"783_CR16","doi-asserted-by":"publisher","first-page":"20","DOI":"10.1186\/s13321-021-00496-1","volume":"13","author":"K Rajan","year":"2021","unstructured":"Rajan K, Brinkhaus HO, Sorokina M, Zielesny A, Steinbeck C (2021) DECIMER-segmentation: automated extraction of chemical structure depictions from scientific literature. J Cheminform 13(1):20. https:\/\/doi.org\/10.1186\/s13321-021-00496-1","journal-title":"J Cheminform"},{"key":"783_CR17","doi-asserted-by":"crossref","unstructured":"He K, Gkioxari G, Doll\u00e1r P, Girshick R (2017) Mask r-cnn. In: Proceedings of the IEEE international conference on computer vision. pp.\u00a02961\u20132969","DOI":"10.1109\/ICCV.2017.322"},{"key":"783_CR18","unstructured":"Jocher G YOLOv5. https:\/\/github.com\/ultralytics\/yolov5. Accessed Jun 2022"},{"key":"783_CR19","unstructured":"PyTorch FAIR. https:\/\/pytorch.org\/docs. Accessed Jun 2022"},{"key":"783_CR20","unstructured":"CoderWanFeng python-office. https:\/\/github.com\/CoderWanFeng\/python-office. 
Accessed 9 Dec 2020"},{"key":"783_CR21","unstructured":"Jameslahm LabelMe. https:\/\/jameslahm.github.io\/labelme. Accessed Jun 2022"},{"key":"783_CR22","unstructured":"Chen LC, Papandreou G, Schroff F, Adam H (2017) Rethinking atrous convolution for semantic image segmentation. arXiv preprint arXiv:1706.05587"},{"issue":"8","key":"783_CR23","doi-asserted-by":"publisher","first-page":"1467","DOI":"10.1109\/JPROC.2010.2050290","volume":"98","author":"A Torralba","year":"2010","unstructured":"Torralba A, Russell BC, Yuen J (2010) Labelme: online image annotation and applications. Proc IEEE 98(8):1467\u20131484. https:\/\/doi.org\/10.1109\/JPROC.2010.2050290","journal-title":"Proc IEEE"},{"key":"783_CR24","unstructured":"Khayal M, Khan A, Bashir S, Khan FH, Aslam S (2011) Modified new algorithm for seed filling. J Theor Appl Inf Technol 26(1)"},{"issue":"4","key":"783_CR25","doi-asserted-by":"publisher","first-page":"045024","DOI":"10.1088\/2632-2153\/aba947","volume":"1","author":"M Krenn","year":"2020","unstructured":"Krenn M, H\u00e4se F, Nigam A, Friederich P, Aspuru-Guzik A (2020) Self-referencing embedded strings (SELFIES): a 100% robust molecular string representation. Mach Learn Sci Technol 1(4):045024. https:\/\/doi.org\/10.1088\/2632-2153\/aba947","journal-title":"Mach Learn Sci Technol"},{"issue":"1","key":"783_CR26","doi-asserted-by":"publisher","first-page":"65","DOI":"10.1186\/s13321-020-00469-w","volume":"12","author":"K Rajan","year":"2020","unstructured":"Rajan K, Zielesny A, Steinbeck C (2020) DECIMER: towards deep learning for chemical image recognition. J Cheminform 12(1):65. 
https:\/\/doi.org\/10.1186\/s13321-020-00469-w","journal-title":"J Cheminform"}],"container-title":["Journal of Cheminformatics"],"original-title":[],"language":"en","link":[{"URL":"https:\/\/link.springer.com\/content\/pdf\/10.1186\/s13321-023-00783-z.pdf","content-type":"application\/pdf","content-version":"vor","intended-application":"text-mining"},{"URL":"https:\/\/link.springer.com\/article\/10.1186\/s13321-023-00783-z\/fulltext.html","content-type":"text\/html","content-version":"vor","intended-application":"text-mining"},{"URL":"https:\/\/link.springer.com\/content\/pdf\/10.1186\/s13321-023-00783-z.pdf","content-type":"application\/pdf","content-version":"vor","intended-application":"similarity-checking"}],"deposited":{"date-parts":[[2023,11,20]],"date-time":"2023-11-20T20:04:18Z","timestamp":1700510658000},"score":1,"resource":{"primary":{"URL":"https:\/\/jcheminf.biomedcentral.com\/articles\/10.1186\/s13321-023-00783-z"}},"subtitle":[],"short-title":[],"issued":{"date-parts":[[2023,11,20]]},"references-count":26,"journal-issue":{"issue":"1","published-online":{"date-parts":[[2023,12]]}},"alternative-id":["783"],"URL":"https:\/\/doi.org\/10.1186\/s13321-023-00783-z","relation":{},"ISSN":["1758-2946"],"issn-type":[{"value":"1758-2946","type":"electronic"}],"subject":[],"published":{"date-parts":[[2023,11,20]]},"assertion":[{"value":"20 September 2023","order":1,"name":"received","label":"Received","group":{"name":"ArticleHistory","label":"Article History"}},{"value":"11 November 2023","order":2,"name":"accepted","label":"Accepted","group":{"name":"ArticleHistory","label":"Article History"}},{"value":"20 November 2023","order":3,"name":"first_online","label":"First Online","group":{"name":"ArticleHistory","label":"Article History"}},{"order":1,"name":"Ethics","group":{"name":"EthicsHeading","label":"Declarations"}},{"value":"Not applicable.","order":2,"name":"Ethics","group":{"name":"EthicsHeading","label":"Ethics approval and consent to 
participate"}},{"value":"Not applicable.","order":3,"name":"Ethics","group":{"name":"EthicsHeading","label":"Consent for publication"}},{"value":"The authors declare no competing financial interest.","order":4,"name":"Ethics","group":{"name":"EthicsHeading","label":"Competing interests"}}],"article-number":"111"}}