{"status":"ok","message-type":"work","message-version":"1.0.0","message":{"indexed":{"date-parts":[[2026,4,2]],"date-time":"2026-04-02T04:03:19Z","timestamp":1775102599245,"version":"3.50.1"},"reference-count":48,"publisher":"Oxford University Press (OUP)","issue":"13","license":[{"start":{"date-parts":[[2018,6,27]],"date-time":"2018-06-27T00:00:00Z","timestamp":1530057600000},"content-version":"vor","delay-in-days":0,"URL":"http:\/\/creativecommons.org\/licenses\/by-nc\/4.0\/"}],"funder":[{"name":"Bogazici University Research Fund"},{"DOI":"10.13039\/501100000544","name":"BAP","doi-asserted-by":"publisher","award":["12304"],"award-info":[{"award-number":["12304"]}],"id":[{"id":"10.13039\/501100000544","id-type":"DOI","asserted-by":"publisher"}]}],"content-domain":{"domain":[],"crossmark-restriction":false},"short-container-title":[],"published-print":{"date-parts":[[2018,7,1]]},"abstract":"<jats:title>Abstract<\/jats:title>\n               <jats:sec>\n                  <jats:title>Motivation<\/jats:title>\n                  <jats:p>The effective representation of proteins is a crucial task that directly affects the performance of many bioinformatics problems. Related proteins usually bind to similar ligands. Chemical characteristics of ligands are known to capture the functional and mechanistic properties of proteins suggesting that a ligand-based approach can be utilized in protein representation. In this study, we propose SMILESVec, a Simplified molecular input line entry system (SMILES)-based method to represent ligands and a novel method to compute similarity of proteins by describing them based on their ligands. The proteins are defined utilizing the word-embeddings of the SMILES strings of their ligands. The performance of the proposed protein description method is evaluated in protein clustering task using TransClust and MCL algorithms. Two other protein representation methods that utilize protein sequence, Basic local alignment tool and ProtVec, and two compound fingerprint-based protein representation methods are compared.<\/jats:p>\n               <\/jats:sec>\n               <jats:sec>\n                  <jats:title>Results<\/jats:title>\n                  <jats:p>We showed that ligand-based protein representation, which uses only SMILES strings of the ligands that proteins bind to, performs as well as protein sequence-based representation methods in protein clustering. The results suggest that ligand-based protein description can be an alternative to the traditional sequence or structure-based representation of proteins and this novel approach can be applied to different bioinformatics problems such as prediction of new protein\u2013ligand interactions and protein function annotation.<\/jats:p>\n               <\/jats:sec>\n               <jats:sec>\n                  <jats:title>Availability and implementation<\/jats:title>\n                  <jats:p>https:\/\/github.com\/hkmztrk\/SMILESVecProteinRepresentation<\/jats:p>\n               <\/jats:sec>\n               <jats:sec>\n                  <jats:title>Supplementary information<\/jats:title>\n                  <jats:p>Supplementary data are available at Bioinformatics online.<\/jats:p>\n               <\/jats:sec>","DOI":"10.1093\/bioinformatics\/bty287","type":"journal-article","created":{"date-parts":[[2018,4,13]],"date-time":"2018-04-13T19:27:15Z","timestamp":1523647635000},"page":"i295-i303","source":"Crossref","is-referenced-by-count":41,"title":["A novel methodology on distributed representations of proteins using their interacting ligands"],"prefix":"10.1093","volume":"34","author":[{"given":"Hakime","family":"\u00d6zt\u00fcrk","sequence":"first","affiliation":[{"name":"Department of Computer Engineering, Bogazici University, Istanbul, Turkey"}]},{"given":"Elif","family":"Ozkirimli","sequence":"additional","affiliation":[{"name":"Department of Chemical Engineering, Bogazici University, Istanbul, Turkey"}]},{"given":"Arzucan","family":"\u00d6zg\u00fcr","sequence":"additional","affiliation":[{"name":"Department of Computer Engineering, Bogazici University, Istanbul, Turkey"}]}],"member":"286","published-online":{"date-parts":[[2018,6,27]]},"reference":[{"key":"2023051604244664400_bty287-B1","doi-asserted-by":"crossref","first-page":"403","DOI":"10.1016\/S0022-2836(05)80360-2","article-title":"Basic local alignment search tool","volume":"215","author":"Altschul","year":"1990","journal-title":"J. Mol. Biol"},{"key":"2023051604244664400_bty287-B2","doi-asserted-by":"crossref","first-page":"e0141287.","DOI":"10.1371\/journal.pone.0141287","article-title":"Continuous distributed representation of biological sequences for deep proteomics and genomics","volume":"10","author":"Asgari","year":"2015","journal-title":"PloS One"},{"key":"2023051604244664400_bty287-B3","doi-asserted-by":"crossref","DOI":"10.1002\/9780470567623","volume-title":"Pharmaceutical Data Mining: Approaches and Applications for Drug Discovery","author":"Balakin","year":"2009"},{"key":"2023051604244664400_bty287-B4","doi-asserted-by":"crossref","first-page":"34.","DOI":"10.1186\/s12859-014-0445-4","article-title":"Evaluation and improvements of clustering algorithms for detecting remote homologous protein families","volume":"16","author":"Bernardes","year":"2015","journal-title":"BMC Bioinformatics"},{"key":"2023051604244664400_bty287-B100","doi-asserted-by":"crossref","first-page":"217","DOI":"10.1016\/S1574-1400(08)00012-1","article-title":"PubChem: integrated platform of small molecules and biological activities","volume":"4","author":"Bolton","year":"2008","journal-title":"Annu. Rep. Comput. Chem"},{"key":"2023051604244664400_bty287-B5","doi-asserted-by":"crossref","first-page":"3692","DOI":"10.1093\/nar\/gkg600","article-title":"Svm-prot: web-based support vector machine software for functional classification of a protein from its primary sequence","volume":"31","author":"Cai","year":"2003","journal-title":"Nucleic Acids Res"},{"key":"2023051604244664400_bty287-B6","doi-asserted-by":"crossref","first-page":"141","DOI":"10.1080\/1062936X.2011.645874","article-title":"In silico toxicity prediction by support vector machine and smiles representation-based string kernel","volume":"23","author":"Cao","year":"2012","journal-title":"SAR QSAR Environ. Res"},{"key":"2023051604244664400_bty287-B7","doi-asserted-by":"crossref","first-page":"84","DOI":"10.1016\/j.ymeth.2015.09.011","article-title":"Integrated protein function prediction by mining function associations, sequences, and protein\u2013protein and gene\u2013gene interaction networks","volume":"93","author":"Cao","year":"2016","journal-title":"Methods"},{"key":"2023051604244664400_bty287-B8","doi-asserted-by":"crossref","first-page":"348","DOI":"10.1016\/j.jmb.2016.11.023","article-title":"Scope: manual curation and artifact removal in the structural classification of proteins\u2013extended database","volume":"429","author":"Chandonia","year":"2017","journal-title":"J. Mol. Biol"},{"key":"2023051604244664400_bty287-B9","doi-asserted-by":"crossref","first-page":"S8.","DOI":"10.1186\/1471-2164-15-S9-S8","article-title":"Homopharma: a new concept for exploring the molecular binding mechanisms and drug repurposing","volume":"15","author":"Chiu","year":"2014","journal-title":"BMC Genomics"},{"key":"2023051604244664400_bty287-B10","doi-asserted-by":"crossref","first-page":"246","DOI":"10.1002\/prot.1035","article-title":"Prediction of protein cellular attributes using pseudo-amino acid composition","volume":"43","author":"Chou","year":"2001","journal-title":"Proteins"},{"key":"2023051604244664400_bty287-B11","doi-asserted-by":"crossref","first-page":"3241","DOI":"10.1093\/bioinformatics\/btt547","article-title":"Bioservices: a common python package to access biological web services programmatically","volume":"29","author":"Cokelaer","year":"2013","journal-title":"Bioinformatics"},{"key":"2023051604244664400_bty287-B12","doi-asserted-by":"crossref","first-page":"W612","DOI":"10.1093\/nar\/gkv352","article-title":"Chembl web services: streamlining access to drug discovery data and utilities","volume":"43","author":"Davies","year":"2015","journal-title":"Nucleic Acids Res"},{"key":"2023051604244664400_bty287-B13","doi-asserted-by":"crossref","first-page":"150","DOI":"10.1016\/j.patrec.2016.06.012","article-title":"Representation learning for very short texts using weighted word embedding aggregation","volume":"80","author":"De Boom","year":"2016","journal-title":"Pattern Recogn. Lett"},{"key":"2023051604244664400_bty287-B14","doi-asserted-by":"crossref","first-page":"1575","DOI":"10.1093\/nar\/30.7.1575","article-title":"An efficient algorithm for large-scale detection of protein families","volume":"30","author":"Enright","year":"2002","journal-title":"Nucleic Acids Res"},{"key":"2023051604244664400_bty287-B15","doi-asserted-by":"crossref","first-page":"D304","DOI":"10.1093\/nar\/gkt1240","article-title":"Scope: structural classification of proteins\u2013extended, integrating scop and astral data and classification of new structures","volume":"42","author":"Fox","year":"2014","journal-title":"Nucleic Acids Res"},{"key":"2023051604244664400_bty287-B16","author":"Frasca","year":"2017"},{"key":"2023051604244664400_bty287-B17","first-page":"D1100","author":"Gaulton","year":"2011"},{"key":"2023051604244664400_bty287-B18","doi-asserted-by":"crossref","first-page":"755","DOI":"10.1021\/ci8000259","article-title":"Quantifying the relationships among drug classes","volume":"48","author":"Hert","year":"2008","journal-title":"J. Chem. Inform. Model"},{"key":"2023051604244664400_bty287-B19","first-page":"615","volume-title":"Nat. Rev. Genet","author":"Hu","year":"2016"},{"key":"2023051604244664400_bty287-B20","first-page":"1","volume-title":"Computational Intelligence and Cybernetics (CYBERNETICSCOM), 2013 IEEE International Conference on","author":"Iqbal","year":"2013"},{"key":"2023051604244664400_bty287-B21","doi-asserted-by":"crossref","first-page":"27","DOI":"10.1021\/acs.jcim.7b00616","article-title":"Mol2vec: unsupervised machine learning approach with chemical intuition","volume":"58","author":"Jaeger","year":"2018","journal-title":"J. Chem. Inform. Model"},{"key":"2023051604244664400_bty287-B22","doi-asserted-by":"crossref","first-page":"43904.","DOI":"10.1038\/srep43904","article-title":"Mechanism of error-free dna synthesis across n1-methyl-deoxyadenosine by human dna polymerase-\u03b9","volume":"7","author":"Jain","year":"2017","journal-title":"Sci. Rep"},{"key":"2023051604244664400_bty287-B23","volume-title":"International Conference on Learning Representations, ICLR 2016 - Workshop Track","author":"Jastrz\u0119bski","year":"2016"},{"key":"2023051604244664400_bty287-B24","doi-asserted-by":"crossref","first-page":"197.","DOI":"10.1038\/nbt1284","article-title":"Relating protein pharmacology by ligand chemistry","volume":"25","author":"Keiser","year":"2007","journal-title":"Nat. Biotechnol"},{"key":"2023051604244664400_bty287-B25","doi-asserted-by":"crossref","first-page":"875","DOI":"10.1016\/S0969-2126(98)00089-6","article-title":"Protein folds and functions","volume":"6","author":"Martin","year":"1998","journal-title":"Structure"},{"key":"2023051604244664400_bty287-B26","first-page":"3111","volume-title":"Advances in Neural Information Processing Systems","author":"Mikolov","year":"2013"},{"key":"2023051604244664400_bty287-B101","doi-asserted-by":"crossref","first-page":"536","DOI":"10.1016\/S0022-2836(05)80134-2","article-title":"SCOP: a structural classification of proteins database for the investigation of sequences and structures","volume":"247","author":"Murzin","year":"1995","journal-title":"J. Mol. Biol"},{"key":"2023051604244664400_bty287-B27","doi-asserted-by":"crossref","first-page":"46.","DOI":"10.1186\/s12859-016-0890-3","article-title":"A multiple kernel learning algorithm for drug-target interaction prediction","volume":"17","author":"Nascimento","year":"2016","journal-title":"BMC Bioinformatics"},{"key":"2023051604244664400_bty287-B29","doi-asserted-by":"crossref","first-page":"e0160098.","DOI":"10.1371\/journal.pone.0160098","article-title":"Ligand similarity complements sequence, physical interaction, and co-expression for gene function prediction","volume":"11","author":"O\u2019meara","year":"2016","journal-title":"PloS One"},{"key":"2023051604244664400_bty287-B30","doi-asserted-by":"crossref","first-page":"e0117874.","DOI":"10.1371\/journal.pone.0117874","article-title":"Classification of beta-lactamases and penicillin binding proteins using ligand-centric network models","volume":"10","author":"\u00d6zt\u00fcrk","year":"2015","journal-title":"PloS One"},{"key":"2023051604244664400_bty287-B31","doi-asserted-by":"crossref","first-page":"128.","DOI":"10.1186\/s12859-016-0977-x","article-title":"A comparative study of smiles-based compound similarity functions for drug-target interaction prediction","volume":"17","author":"\u00d6zt\u00fcrk","year":"2016","journal-title":"BMC Bioinformatics"},{"key":"2023051604244664400_bty287-B32","first-page":"361","article-title":"The chembl database: a taster for medicinal chemists","volume":"6","author":"Papadatos","year":"2014","journal-title":"Future"},{"key":"2023051604244664400_bty287-B33","doi-asserted-by":"crossref","first-page":"240","DOI":"10.1098\/rspl.1895.0041","article-title":"Note on regression and inheritance in the case of two parents","volume":"58","author":"Pearson","year":"1895","journal-title":"Proc. Roy. Soc. Lond"},{"key":"2023051604244664400_bty287-B34","doi-asserted-by":"crossref","first-page":"15","DOI":"10.3389\/fchem.2016.00015","article-title":"How reliable are ligand-centric methods for target fishing?","volume":"4","author":"Pe\u00f3n","year":"2016","journal-title":"Front. Chem"},{"key":"2023051604244664400_bty287-B35","doi-asserted-by":"crossref","first-page":"290","DOI":"10.1016\/j.phrs.2016.06.018","article-title":"Network pharmacology of cancer: from understanding of complex interactomes to the design of multi-target specific therapeutics from nature","volume":"111","author":"Poornima","year":"2016","journal-title":"Pharmacol. Res"},{"key":"2023051604244664400_bty287-B36","first-page":"45","article-title":"Software Framework for Topic Modelling with Large Corpora","volume-title":"Proceedings of the LREC 2010 Workshop on New Challenges for NLP Frameworks","author":"\u0158eh\u016f\u0159ek","year":"2010"},{"key":"2023051604244664400_bty287-B37","doi-asserted-by":"crossref","first-page":"742","DOI":"10.1021\/ci100050t","article-title":"Extended-connectivity fingerprints","volume":"50","author":"Rogers","year":"2010","journal-title":"J. Chem. Inform. Model"},{"key":"2023051604244664400_bty287-B38","doi-asserted-by":"crossref","first-page":"694","DOI":"10.1016\/j.molmed.2014.10.002","article-title":"A network approach to clinical intervention in neurodegenerative diseases","volume":"20","author":"Santiago","year":"2014","journal-title":"Trends Mol. Med"},{"key":"2023051604244664400_bty287-B39","doi-asserted-by":"crossref","first-page":"719","DOI":"10.1002\/minf.201400066","article-title":"Benchmarking a wide range of chemical descriptors for drug-target interaction prediction using a chemogenomic approach","volume":"33","author":"Sawada","year":"2014","journal-title":"Mol. Inform"},{"key":"2023051604244664400_bty287-B40","doi-asserted-by":"crossref","first-page":"232","DOI":"10.1038\/nchembio.1199","article-title":"Target identification and mechanism of action in chemical biology and drug discovery","volume":"9","author":"Schenone","year":"2013","journal-title":"Nat. Chem. Biol"},{"key":"2023051604244664400_bty287-B41","doi-asserted-by":"crossref","first-page":"1979","DOI":"10.1021\/ci400206h","article-title":"Smifp (smiles fingerprint) chemical space for virtual screening and visualization of large databases of organic molecules","volume":"53","author":"Schwartz","year":"2013","journal-title":"J. Chem. Inform. Model"},{"key":"2023051604244664400_bty287-B42","doi-asserted-by":"crossref","first-page":"98","DOI":"10.1016\/j.ymeth.2015.04.036","article-title":"Predicting drug\u2013target interaction for new drugs using enhanced similarity measures and super-target clustering","volume":"83","author":"Shi","year":"2015","journal-title":"Methods"},{"key":"2023051604244664400_bty287-B43","doi-asserted-by":"crossref","first-page":"386","DOI":"10.1021\/ci0496797","article-title":"Lingo, an efficient holographic text based method to calculate biophysical properties and intermolecular similarities","volume":"45","author":"Vidal","year":"2005","journal-title":"J. Chem. Inform. Model"},{"key":"2023051604244664400_bty287-B45","doi-asserted-by":"crossref","first-page":"33.","DOI":"10.1186\/s13321-017-0220-4","article-title":"The chemistry development kit (cdk) v2. 0: atom typing, depiction, molecular formulas, and substructure searching","volume":"9","author":"Willighagen","year":"2017","journal-title":"J. Cheminform"},{"key":"2023051604244664400_bty287-B46","doi-asserted-by":"crossref","first-page":"419","DOI":"10.1038\/nmeth0610-419","article-title":"Partitioning biological data with transitivity clustering","volume":"7","author":"Wittkop","year":"2010","journal-title":"Nat. Methods"},{"key":"2023051604244664400_bty287-B47","doi-asserted-by":"crossref","first-page":"5597","DOI":"10.1158\/0008-5472.CAN-04-0603","article-title":"Altered dna polymerase \u03b9 expression in breast cancer cells leads to a reduction in dna replication fidelity and a higher rate of mutagenesis","volume":"64","author":"Yang","year":"2004","journal-title":"Cancer Res"},{"key":"2023051604244664400_bty287-B48","doi-asserted-by":"crossref","first-page":"32274.","DOI":"10.18632\/oncotarget.8580","article-title":"Dna polymerase iota (pol \u03b9) promotes invasion and metastasis of esophageal squamous cell carcinoma","volume":"7","author":"Zou","year":"2016","journal-title":"Oncotarget"}],"container-title":["Bioinformatics"],"original-title":[],"language":"en","link":[{"URL":"https:\/\/academic.oup.com\/bioinformatics\/article-pdf\/34\/13\/i295\/50316221\/bioinformatics_34_13_i295.pdf","content-type":"application\/pdf","content-version":"vor","intended-application":"syndication"},{"URL":"https:\/\/academic.oup.com\/bioinformatics\/article-pdf\/34\/13\/i295\/50316221\/bioinformatics_34_13_i295.pdf","content-type":"unspecified","content-version":"vor","intended-application":"similarity-checking"}],"deposited":{"date-parts":[[2023,5,16]],"date-time":"2023-05-16T04:28:06Z","timestamp":1684211286000},"score":1,"resource":{"primary":{"URL":"https:\/\/academic.oup.com\/bioinformatics\/article\/34\/13\/i295\/5045707"}},"subtitle":[],"short-title":[],"issued":{"date-parts":[[2018,6,27]]},"references-count":48,"journal-issue":{"issue":"13","published-print":{"date-parts":[[2018,7,1]]}},"URL":"https:\/\/doi.org\/10.1093\/bioinformatics\/bty287","relation":{},"ISSN":["1367-4803","1367-4811"],"issn-type":[{"value":"1367-4803","type":"print"},{"value":"1367-4811","type":"electronic"}],"subject":[],"published-other":{"date-parts":[[2018,7,1]]},"published":{"date-parts":[[2018,6,27]]}}}