{"status":"ok","message-type":"work","message-version":"1.0.0","message":{"indexed":{"date-parts":[[2026,5,21]],"date-time":"2026-05-21T02:58:28Z","timestamp":1779332308048,"version":"3.51.4"},"reference-count":46,"publisher":"Oxford University Press (OUP)","issue":"19","license":[{"start":{"date-parts":[[2022,8,12]],"date-time":"2022-08-12T00:00:00Z","timestamp":1660262400000},"content-version":"vor","delay-in-days":0,"URL":"https:\/\/academic.oup.com\/journals\/pages\/open_access\/funder_policies\/chorus\/standard_publication_model"}],"funder":[{"DOI":"10.13039\/501100001809","name":"National Natural Science Foundation of China","doi-asserted-by":"publisher","award":["61702240"],"award-info":[{"award-number":["61702240"]}],"id":[{"id":"10.13039\/501100001809","id-type":"DOI","asserted-by":"publisher"}]},{"DOI":"10.13039\/501100004775","name":"Natural Science Foundation of Gansu Province","doi-asserted-by":"publisher","award":["20JR10RA613"],"award-info":[{"award-number":["20JR10RA613"]}],"id":[{"id":"10.13039\/501100004775","id-type":"DOI","asserted-by":"publisher"}]},{"DOI":"10.13039\/501100004775","name":"Natural Science Foundation of Gansu Province","doi-asserted-by":"publisher","award":["21JR7RA460"],"award-info":[{"award-number":["21JR7RA460"]}],"id":[{"id":"10.13039\/501100004775","id-type":"DOI","asserted-by":"publisher"}]}],"content-domain":{"domain":[],"crossmark-restriction":false},"short-container-title":[],"published-print":{"date-parts":[[2022,9,30]]},"abstract":"<jats:title>Abstract<\/jats:title><jats:sec><jats:title>Motivation<\/jats:title><jats:p>Extracting useful molecular features is essential for molecular property prediction. Atom-level representation is a common representation of molecules, ignoring the sub-structure or branch information of molecules to some extent; however, it is vice versa for the substring-level representation. Both atom-level and substring-level representations may lose the neighborhood or spatial information of molecules. While molecular graph representation aggregating the neighborhood information of a molecule has a weak ability in expressing the chiral molecules or symmetrical structure. In this article, we aim to make use of the advantages of representations in different granularities simultaneously for molecular property prediction. To this end, we propose a fusion model named MultiGran-SMILES, which integrates the molecular features of atoms, sub-structures and graphs from the input. Compared with the single granularity representation of molecules, our method leverages the advantages of various granularity representations simultaneously and adjusts the contribution of each type of representation adaptively for molecular property prediction.<\/jats:p><\/jats:sec><jats:sec><jats:title>Results<\/jats:title><jats:p>The experimental results show that our MultiGran-SMILES method achieves state-of-the-art performance on BBBP, LogP, HIV and ClinTox datasets. For the BACE, FDA and Tox21 datasets, the results are comparable with the state-of-the-art models. Moreover, the experimental results show that the gains of our proposed method are bigger for the molecules with obvious functional groups or branches.<\/jats:p><\/jats:sec><jats:sec><jats:title>Availability and implementation<\/jats:title><jats:p>The code and data underlying this work are available on GitHub at https:\/\/github. com\/Jiangjing0122\/MultiGran.<\/jats:p><\/jats:sec><jats:sec><jats:title>Supplementary information<\/jats:title><jats:p>Supplementary data are available at Bioinformatics online.<\/jats:p><\/jats:sec>","DOI":"10.1093\/bioinformatics\/btac550","type":"journal-article","created":{"date-parts":[[2022,8,12]],"date-time":"2022-08-12T21:40:20Z","timestamp":1660340420000},"page":"4573-4580","source":"Crossref","is-referenced-by-count":26,"title":["MultiGran-SMILES: multi-granularity SMILES learning for molecular property prediction"],"prefix":"10.1093","volume":"38","author":[{"ORCID":"https:\/\/orcid.org\/0000-0002-5648-2323","authenticated-orcid":false,"given":"Jing","family":"Jiang","sequence":"first","affiliation":[{"name":"School of Information Science and Engineering, Lanzhou University , Lanzhou 730000, China"},{"name":"Key Laboratory of China\u2019s Ethnic Languages and Information Technology of Ministry of Education, Northwest Minzu University , Lanzhou 730030, China"}],"role":[{"role":"author","vocabulary":"crossref"}]},{"given":"Ruisheng","family":"Zhang","sequence":"additional","affiliation":[{"name":"School of Information Science and Engineering, Lanzhou University , Lanzhou 730000, China"}],"role":[{"role":"author","vocabulary":"crossref"}]},{"given":"Zhili","family":"Zhao","sequence":"additional","affiliation":[{"name":"School of Information Science and Engineering, Lanzhou University , Lanzhou 730000, China"}],"role":[{"role":"author","vocabulary":"crossref"}]},{"given":"Jun","family":"Ma","sequence":"additional","affiliation":[{"name":"School of Information Science and Engineering, Lanzhou University , Lanzhou 730000, China"}],"role":[{"role":"author","vocabulary":"crossref"}]},{"given":"Yunwu","family":"Liu","sequence":"additional","affiliation":[{"name":"School of Information Science and Engineering, Lanzhou University , Lanzhou 730000, China"}],"role":[{"role":"author","vocabulary":"crossref"}]},{"given":"Yongna","family":"Yuan","sequence":"additional","affiliation":[{"name":"School of Information Science and Engineering, Lanzhou University , Lanzhou 730000, China"}],"role":[{"role":"author","vocabulary":"crossref"}]},{"given":"Bojuan","family":"Niu","sequence":"additional","affiliation":[{"name":"School of Information Science and Engineering, Lanzhou University , Lanzhou 730000, China"}],"role":[{"role":"author","vocabulary":"crossref"}]}],"member":"286","published-online":{"date-parts":[[2022,8,12]]},"reference":[{"key":"2023041408223923700_","first-page":"1","author":"Altszyler","year":"2018"},{"key":"2023041408223923700_","first-page":"1481","author":"Chakrabarty","year":"2017"},{"key":"2023041408223923700_","first-page":"103","author":"Cho","year":"2014"},{"key":"2023041408223923700_","author":"Chung","year":"2014"},{"key":"2023041408223923700_","doi-asserted-by":"crossref","first-page":"1757","DOI":"10.1021\/acs.jcim.6b00601","article-title":"Convolutional embedding of attributed molecular graphs for physical property prediction","volume":"57","author":"Coley","year":"2017","journal-title":"J. Chem. Inf. Model"},{"key":"2023041408223923700_","first-page":"4171","article-title":"BERT: pre-training of deep bidirectional transformers for language understanding","author":"Devlin","year":"2019","journal-title":"Proceedings of NAACL-HLT"},{"key":"2023041408223923700_","author":"Gasteiger","year":"2019"},{"key":"2023041408223923700_","doi-asserted-by":"crossref","first-page":"55","DOI":"10.1021\/cc9800071","article-title":"A knowledge-based approach in designing combinatorial or medicinal chemistry libraries for drug discovery. 1. A qualitative and quantitative characterization of known drug databases","volume":"1","author":"Ghose","year":"1999","journal-title":"J. Comb. Chem"},{"key":"2023041408223923700_","first-page":"1263","author":"Gilmer","year":"2017"},{"key":"2023041408223923700_","first-page":"199","article-title":"Circular fingerprints: flexible molecular descriptors with applications from physical chemistry to ADME","volume":"9","author":"Glem","year":"2006","journal-title":"IDrugs"},{"key":"2023041408223923700_","first-page":"435","author":"Guo","year":"2020"},{"key":"2023041408223923700_","first-page":"1025","author":"Hamilton","year":"2017"},{"key":"2023041408223923700_","author":"Honda","year":"2019"},{"key":"2023041408223923700_","doi-asserted-by":"crossref","first-page":"27","DOI":"10.1021\/acs.jcim.7b00616","article-title":"Mol2vec: unsupervised machine learning approach with chemical intuition","volume":"58","author":"Jaeger","year":"2018","journal-title":"J. Chem. Inf. Model"},{"key":"2023041408223923700_","doi-asserted-by":"crossref","first-page":"65","DOI":"10.1016\/j.ymeth.2020.05.009","article-title":"The message passing neural networks for chemical property prediction on smiles","volume":"179","author":"Jo","year":"2020","journal-title":"Methods"},{"key":"2023041408223923700_","doi-asserted-by":"crossref","first-page":"595","DOI":"10.1007\/s10822-016-9938-8","article-title":"Molecular graph convolutions: moving beyond fingerprints","volume":"30","author":"Kearnes","year":"2016","journal-title":"J. Comput. Aided Mol. Des"},{"key":"2023041408223923700_","author":"Kingma","year":"2015"},{"key":"2023041408223923700_","doi-asserted-by":"crossref","first-page":"1560","DOI":"10.1021\/acs.jcim.0c01127","article-title":"Smiles pair encoding: a data-driven substructure tokenization algorithm for deep learning","volume":"61","author":"Li","year":"2021","journal-title":"J. Chem. Inf. Model"},{"key":"2023041408223923700_","first-page":"1052","author":"Lu","year":"2019"},{"key":"2023041408223923700_","doi-asserted-by":"crossref","DOI":"10.1093\/bib\/bbab317","article-title":"Mol2Context-vec: learning molecular representation from context awareness for drug discovery","volume":"22","author":"Lv","year":"2021","journal-title":"Brief. Bioinformatics"},{"key":"2023041408223923700_","doi-asserted-by":"crossref","first-page":"193","DOI":"10.1016\/j.neucom.2021.06.037","article-title":"Molecular graph enhanced transformer for retrosynthesis prediction","volume":"457","author":"Mao","year":"2021","journal-title":"Neurocomputing"},{"key":"2023041408223923700_","doi-asserted-by":"crossref","first-page":"1686","DOI":"10.1021\/ci300124c","article-title":"A Bayesian approach to in silico blood-brain barrier penetration modeling","volume":"52","author":"Martins","year":"2012","journal-title":"J. Chem. Inf. Model"},{"key":"2023041408223923700_","doi-asserted-by":"crossref","first-page":"1077","DOI":"10.1351\/pac199466051077","article-title":"Glossary of terms used in physical organic chemistry (IUPAC recommendations 1994)","volume":"66","author":"Muller","year":"1994","journal-title":"Pure Appl. Chem"},{"key":"2023041408223923700_","doi-asserted-by":"crossref","first-page":"1","DOI":"10.1186\/1758-2946-4-22","article-title":"Towards a universal smiles representation-a standard method to generate canonical smiles based on the InChi","volume":"4","author":"O\u2019Boyle","year":"2012","journal-title":"J. Cheminform"},{"key":"2023041408223923700_","author":"Ramsundar","year":"2015"},{"key":"2023041408223923700_","doi-asserted-by":"crossref","first-page":"058301","DOI":"10.1103\/PhysRevLett.108.058301","article-title":"Fast and accurate modeling of molecular atomization energies with machine learning","volume":"108","author":"Rupp","year":"2012","journal-title":"Phys. Rev. Lett"},{"key":"2023041408223923700_","doi-asserted-by":"crossref","first-page":"2673","DOI":"10.1109\/78.650093","article-title":"Bidirectional recurrent neural networks","volume":"45","author":"Schuster","year":"1997","journal-title":"IEEE Trans. Signal Process"},{"key":"2023041408223923700_","author":"Sennrich","year":"2015"},{"key":"2023041408223923700_","doi-asserted-by":"crossref","first-page":"12","DOI":"10.1016\/j.neucom.2021.02.025","article-title":"Multi-view spectral graph convolution with consistent edge attention for molecular modeling","volume":"445","author":"Shang","year":"2021","journal-title":"Neurocomputing"},{"key":"2023041408223923700_","first-page":"429","author":"Sheng","year":"2019"},{"key":"2023041408223923700_","doi-asserted-by":"crossref","first-page":"2324","DOI":"10.1021\/acs.jcim.5b00559","article-title":"Zinc 15\u2013ligand discovery for everyone","volume":"55","author":"Sterling","year":"2015","journal-title":"J. Chem. Inf. Model"},{"key":"2023041408223923700_","doi-asserted-by":"crossref","first-page":"1936","DOI":"10.1021\/acs.jcim.6b00290","article-title":"Computational modeling of \u03b2-secretase 1 (BACE-1) inhibitors using ligand based approaches","volume":"56","author":"Subramanian","year":"2016","journal-title":"J. Chem. Inf. Model"},{"key":"2023041408223923700_","first-page":"5998","author":"Vaswani","year":"2017"},{"key":"2023041408223923700_","doi-asserted-by":"crossref","first-page":"263","DOI":"10.1021\/acs.accounts.0c00699","article-title":"Applications of deep learning in molecule generation and molecular property prediction","volume":"54","author":"Walters","year":"2021","journal-title":"Acc. Chem. Res"},{"key":"2023041408223923700_","doi-asserted-by":"crossref","first-page":"3505","DOI":"10.1002\/jcc.21939","article-title":"Application of molecular dynamics simulations in molecular property prediction II: diffusion coefficient","volume":"32","author":"Wang","year":"2011","journal-title":"J. Comput. Chem"},{"key":"2023041408223923700_","first-page":"429","author":"Wang","year":"2019"},{"key":"2023041408223923700_","first-page":"31","article-title":"SMILES, a chemical language and information system. 1. Introduction to methodology and encoding rules","volume":"28","author":"Weininger","year":"1988","journal-title":"J. Chem. Inf. Model"},{"key":"2023041408223923700_","doi-asserted-by":"crossref","first-page":"97","DOI":"10.1021\/ci00062a008","article-title":"SMILES. 2. Algorithm for generation of unique smiles notation","volume":"29","author":"Weininger","year":"1989","journal-title":"J. Chem. Inf. Comput. Sci"},{"key":"2023041408223923700_","doi-asserted-by":"crossref","first-page":"513","DOI":"10.1039\/C7SC02664A","article-title":"MoleculeNet: a benchmark for molecular machine learning","volume":"9","author":"Wu","year":"2018","journal-title":"Chem. Sci"},{"key":"2023041408223923700_","doi-asserted-by":"crossref","first-page":"4","DOI":"10.1109\/TNNLS.2020.2978386","article-title":"A comprehensive survey on graph neural networks","volume":"32","author":"Wu","year":"2021","journal-title":"IEEE Trans. Neural Netw. Learn. Syst"},{"key":"2023041408223923700_","doi-asserted-by":"crossref","first-page":"8749","DOI":"10.1021\/acs.jmedchem.9b00959","article-title":"Pushing the boundaries of molecular representation for drug discovery with the graph attention mechanism","volume":"63","author":"Xiong","year":"2020","journal-title":"J. Med. Chem"},{"key":"2023041408223923700_","first-page":"285","author":"Xu","year":"2017"},{"key":"2023041408223923700_","first-page":"404","author":"Zhang","year":"2018"},{"issue":"6","key":"2023041408223923700_","doi-asserted-by":"crossref","DOI":"10.1093\/bib\/bbab152","article-title":"MG-BERT: leveraging unsupervised atomic representation learning for molecular property prediction","volume":"22","author":"Zhang","year":"2021","journal-title":"Brief. Bioinformatics"},{"key":"2023041408223923700_","doi-asserted-by":"crossref","first-page":"2981","DOI":"10.1093\/bioinformatics\/btab195","article-title":"FRaGAT: a fragment-oriented multi-scale graph attention model for molecular property prediction","volume":"37","author":"Zhang","year":"2021","journal-title":"Bioinformatics"},{"key":"2023041408223923700_","first-page":"1","article-title":"Motif-based graph self-supervised learning for molecular property prediction","author":"Zhang","year":"2021"}],"container-title":["Bioinformatics"],"original-title":[],"language":"en","link":[{"URL":"https:\/\/academic.oup.com\/bioinformatics\/advance-article-pdf\/doi\/10.1093\/bioinformatics\/btac550\/45474841\/btac550.pdf","content-type":"application\/pdf","content-version":"am","intended-application":"syndication"},{"URL":"https:\/\/academic.oup.com\/bioinformatics\/article-pdf\/38\/19\/4573\/49884855\/btac550.pdf","content-type":"application\/pdf","content-version":"vor","intended-application":"syndication"},{"URL":"https:\/\/academic.oup.com\/bioinformatics\/article-pdf\/38\/19\/4573\/49884855\/btac550.pdf","content-type":"unspecified","content-version":"vor","intended-application":"similarity-checking"}],"deposited":{"date-parts":[[2024,10,1]],"date-time":"2024-10-01T16:55:25Z","timestamp":1727801725000},"score":1,"resource":{"primary":{"URL":"https:\/\/academic.oup.com\/bioinformatics\/article\/38\/19\/4573\/6663988"}},"subtitle":[],"editor":[{"given":"Jonathan","family":"Wren","sequence":"additional","affiliation":[],"role":[{"role":"editor","vocabulary":"crossref"}]}],"short-title":[],"issued":{"date-parts":[[2022,8,12]]},"references-count":46,"journal-issue":{"issue":"19","published-print":{"date-parts":[[2022,9,30]]}},"URL":"https:\/\/doi.org\/10.1093\/bioinformatics\/btac550","relation":{},"ISSN":["1367-4803","1367-4811"],"issn-type":[{"value":"1367-4803","type":"print"},{"value":"1367-4811","type":"electronic"}],"subject":[],"published-other":{"date-parts":[[2022,10,1]]},"published":{"date-parts":[[2022,8,12]]}}}