{"status":"ok","message-type":"work","message-version":"1.0.0","message":{"indexed":{"date-parts":[[2026,5,14]],"date-time":"2026-05-14T22:20:24Z","timestamp":1778797224269,"version":"3.51.4"},"reference-count":46,"publisher":"Oxford University Press (OUP)","issue":"13","license":[{"start":{"date-parts":[[2016,10,2]],"date-time":"2016-10-02T00:00:00Z","timestamp":1475366400000},"content-version":"vor","delay-in-days":3015,"URL":"http:\/\/creativecommons.org\/licenses\/by-nc\/2.0\/uk\/"}],"content-domain":{"domain":[],"crossmark-restriction":false},"short-container-title":[],"published-print":{"date-parts":[[2008,7,1]]},"abstract":"<jats:title>Abstract<\/jats:title>\n               <jats:p>Motivation: The prediction of biologically active compounds is of great importance for high-throughput screening (HTS) approaches in drug discovery and chemical genomics. Many computational methods in this area focus on measuring the structural similarities between chemical structures. However, traditional similarity measures are often too rigid or consider only global similarities between structures. The maximum common substructure (MCS) approach provides a more promising and flexible alternative for predicting bioactive compounds.<\/jats:p>\n               <jats:p>Results: In this article, a new backtracking algorithm for MCS is proposed and compared to global similarity measurements. Our algorithm provides high flexibility in the matching process, and it is very efficient in identifying local structural similarities. To predict and cluster biologically active compounds more efficiently, the concept of basis compounds is proposed that enables researchers to easily combine the MCS-based and traditional similarity measures with modern machine learning techniques. Support vector machines (SVMs) are used to test how the MCS-based similarity measure and the basis compound vectorization method perform on two empirically tested datasets. 
The test results show that MCS complements the well-known atom pair descriptor-based similarity measure. By combining these two measures, our SVM-based model predicts the biological activities of chemical compounds with higher specificity and sensitivity.<\/jats:p>\n               <jats:p>Contact: \u00a0ycao@cs.ucr.edu<\/jats:p>\n               <jats:p>Supplementary information: \u00a0Supplementary data are available at Bioinformatics online.<\/jats:p>","DOI":"10.1093\/bioinformatics\/btn186","type":"journal-article","created":{"date-parts":[[2008,6,27]],"date-time":"2008-06-27T07:43:13Z","timestamp":1214552593000},"page":"i366-i374","source":"Crossref","is-referenced-by-count":180,"title":["A maximum common substructure-based algorithm for searching and predicting drug-like compounds"],"prefix":"10.1093","volume":"24","author":[{"given":"Yiqun","family":"Cao","sequence":"first","affiliation":[{"name":"1 Department of Computer Science and Engineering and 2Department of Botany and Plant Sciences, University of California, Riverside, CA 92521, USA"}],"role":[{"role":"author","vocabulary":"crossref"}]},{"given":"Tao","family":"Jiang","sequence":"additional","affiliation":[{"name":"1 Department of Computer Science and Engineering and 2Department of Botany and Plant Sciences, University of California, Riverside, CA 92521, USA"}],"role":[{"role":"author","vocabulary":"crossref"}]},{"given":"Thomas","family":"Girke","sequence":"additional","affiliation":[{"name":"1 Department of Computer Science and Engineering and 2Department of Botany and Plant Sciences, University of California, Riverside, CA 92521, USA"}],"role":[{"role":"author","vocabulary":"crossref"}]}],"member":"286","published-online":{"date-parts":[[2008,7,1]]},"reference":[{"key":"2023020210390624200_B1","doi-asserted-by":"crossref","first-page":"154","DOI":"10.1214\/ss\/1009213288","article-title":"Sequential approach for identifying lead compounds in large chemical 
databases","volume":"16","author":"Abt","year":"2001","journal-title":"Statist. Sci"},{"key":"2023020210390624200_B2","doi-asserted-by":"crossref","first-page":"102","DOI":"10.1136\/bmj.309.6947.102","article-title":"Diagnostic tests 2: predictive values","volume":"309","author":"Altman","year":"1994","journal-title":"Br. Med. J"},{"key":"2023020210390624200_B3","doi-asserted-by":"crossref","first-page":"83","DOI":"10.1016\/0020-0190(76)90049-1","article-title":"Subgraph isomorphism, matching relational structures and maximal cliques","volume":"4","author":"Barrow","year":"1976","journal-title":"Inf. Process. Lett"},{"key":"2023020210390624200_B4","doi-asserted-by":"crossref","first-page":"1089","DOI":"10.1109\/34.954600","article-title":"Efficient matching and indexing of graph models in content-based retrieval","volume":"23","author":"Berretti","year":"2001","journal-title":"IEEE Trans. Pattern Anal. Mach. Intell"},{"key":"2023020210390624200_B5","doi-asserted-by":"crossref","first-page":"115","DOI":"10.2174\/138620706775541882","article-title":"Comparison of methods for sequential screening of large compound sets","volume":"9","author":"Blower","year":"2006","journal-title":"Comb. Chem. High Throughput Screen"},{"key":"2023020210390624200_B6","first-page":"82","article-title":"Graph matching: theoretical foundations, algorithms and applications","author":"Bunke","year":"2000"},{"key":"2023020210390624200_B7","doi-asserted-by":"crossref","first-page":"255","DOI":"10.1016\/S0167-8655(97)00179-7","article-title":"A graph distance metric based on the maximal common subgraph","volume":"19","author":"Bunke","year":"1998","journal-title":"Pattern Recognit. 
Lett"},{"key":"2023020210390624200_B8","doi-asserted-by":"crossref","first-page":"11","DOI":"10.1002\/qsar.19960150103","article-title":"Using artificial neural networks to predict biological activity from simple molecular structural considerations","volume":"15","author":"Burden","year":"1996","journal-title":"Quant. Struct.-Act. Rel"},{"key":"2023020210390624200_B9","doi-asserted-by":"crossref","first-page":"64","DOI":"10.1021\/ci00046a002","article-title":"Atom pairs as molecular features in structure-activity studies: definition and applications","volume":"25","author":"Carhart","year":"1985","journal-title":"J. Chem. Inf. Comput. Sci"},{"key":"2023020210390624200_B10","unstructured":"Chang C, Lin C. LIBSVM: a library for support vector machines. 2001. Software available at http:\/\/www.csie.ntu.edu.tw\/cjlin\/libsvm"},{"key":"2023020210390624200_B11","doi-asserted-by":"crossref","first-page":"1407","DOI":"10.1021\/ci025531g","article-title":"Performance of similarity measures in 2D fragment-based similarity searching: comparison of structural descriptors and similarity coefficients","volume":"42","author":"Chen","year":"2002","journal-title":"J. Chem. Inf. Comput. Sci"},{"key":"2023020210390624200_B12","doi-asserted-by":"crossref","first-page":"71","DOI":"10.1038\/nbt1273","article-title":"Structure-based maximal affinity model predicts small-molecule druggability","volume":"25","author":"Cheng","year":"2007","journal-title":"Nat. 
Biotechnol"},{"key":"2023020210390624200_B13","doi-asserted-by":"crossref","DOI":"10.1017\/CBO9780511801389","volume-title":"An Introduction to Support Vector Machines and Other Kernel-based Learning Methods","author":"Christianini","year":"2000"},{"key":"2023020210390624200_B14","doi-asserted-by":"crossref","first-page":"7668","DOI":"10.1021\/ja00465a041","article-title":"Molecular structure comparison program for the identification of maximal common substructures","volume":"99","author":"Cone","year":"1977","journal-title":"J. Am. Chem. Soc"},{"key":"2023020210390624200_B15","doi-asserted-by":"crossref","first-page":"265","DOI":"10.1142\/S0218001404003228","article-title":"Thirty years of graph matching in pattern recognition","volume":"18","author":"Conte","year":"2004","journal-title":"Inter. J. Pattern Recognit. Artif. Intell"},{"key":"2023020210390624200_B16","first-page":"1582","article-title":"Graph matching: a fast algorithm and its evaluation","volume-title":"Proceedings of the 14th International Conference on Pattern Recognition","author":"Cordella","year":"1998"},{"key":"2023020210390624200_B17","first-page":"149","article-title":"An improved algorithm for matching large graphs","volume-title":"Proceedings of the 3rd IAPR TC-15 Workshop on Graph-based Representations in Pattern Recognition","author":"Cordella","year":"2001"},{"key":"2023020210390624200_B18","doi-asserted-by":"crossref","DOI":"10.1007\/978-94-011-1350-2","volume-title":"Molecular Similarity in Drug Design","author":"Dean","year":"1995"},{"key":"2023020210390624200_B19","doi-asserted-by":"crossref","first-page":"1036","DOI":"10.1109\/TKDE.2005.127","article-title":"Frequent Sub-structure-based approaches for classifying chemical compounds","volume":"17","author":"Deshpande","year":"2005","journal-title":"IEEE Trans. Knowled. 
Data Eng"},{"key":"2023020210390624200_B20","article-title":"e1071: misc functions of the department of Statistics (e1071)","author":"Dimitriadou","year":"2005"},{"key":"2023020210390624200_B21","doi-asserted-by":"crossref","first-page":"824","DOI":"10.1038\/nature03192","article-title":"Chemical space and biology","volume":"432","author":"Dobson","year":"2004","journal-title":"Nature"},{"key":"2023020210390624200_B22","first-page":"439","article-title":"Consistent inexact graph matching applied to labelling coronary segments in arteriograms. Pattern Recognition","volume-title":"Proceedings of the 11th Image, Speech and Signal Analysis (IAPR) International Conference on Pattern Recognition","author":"Dumay","year":"1992"},{"key":"2023020210390624200_B23","first-page":"275","article-title":"Smart screening: approaches to efficient HTS","volume":"4","author":"Engels","year":"2001","journal-title":"Curr. Opin. Drug Discov. Devel"},{"key":"2023020210390624200_B24","volume-title":"Computers and Intractability: a Guide to the Theory of NP-Completeness","author":"Garey","year":"1979"},{"key":"2023020210390624200_B25","doi-asserted-by":"crossref","first-page":"573","DOI":"10.1104\/pp.105.062687","article-title":"ChemMine. A compound mining database for chemical genomics","volume":"138","author":"Girke","year":"2005","journal-title":"Plant Physiol"},{"key":"2023020210390624200_B26","article-title":"Application of graph-based concept learning to the predictive toxicology domain","volume-title":"Proceedings of the Predictive Toxicology Challenge Workshop","author":"Gonzalez","year":"2001"},{"key":"2023020210390624200_B27","doi-asserted-by":"crossref","first-page":"515","DOI":"10.1021\/ci00009a019","article-title":"Molecular substructure similarity searching: efficient retrieval in two-dimensional structure databases","volume":"32","author":"Hagadone","year":"1992","journal-title":"J. Chem. Inf. Comput. 
Sci"},{"key":"2023020210390624200_B28","first-page":"169","article-title":"Substructure discovery in the subdue system","volume":"94","author":"Holder","year":"1994","journal-title":"Proc. AAAI"},{"key":"2023020210390624200_B29","volume-title":"Daylight Theory Manual","author":"James","year":"1995"},{"key":"2023020210390624200_B30","volume-title":"Concepts and Applications of Molecular Similarity","author":"Johnson","year":"1990"},{"key":"2023020210390624200_B31","doi-asserted-by":"crossref","first-page":"571","DOI":"10.1023\/A:1007967728701","article-title":"The discovery of indicator variables for QSAR using inductive logic programming","volume":"11","author":"King","year":"1997","journal-title":"J. Comput. Aided Mol. Des"},{"key":"2023020210390624200_B32","doi-asserted-by":"crossref","first-page":"1","DOI":"10.1016\/S0304-3975(00)00286-3","article-title":"Enumerating all connected maximal common subgraphs in two graphs","volume":"250","author":"Koch","year":"2001","journal-title":"Theor. Comput. Sci"},{"key":"2023020210390624200_B33","doi-asserted-by":"crossref","first-page":"341","DOI":"10.1007\/BF02575586","article-title":"A note on the derivation of maximal common subgraphs of two directed or undirected graphs","volume":"9","author":"Levi","year":"1973","journal-title":"Calcolo"},{"key":"2023020210390624200_B34","doi-asserted-by":"crossref","first-page":"1120","DOI":"10.1109\/34.954602","article-title":"Structural graph matching using the EM algorithm and singular value decomposition","volume":"23","author":"Luo","year":"2001","journal-title":"IEEE Trans. Pattern Anal. Mach. Intell"},{"key":"2023020210390624200_B35","doi-asserted-by":"crossref","first-page":"23","DOI":"10.1002\/spe.4380120103","article-title":"Backtrack search algorithms and the maximal common subgraph problem","volume":"12","author":"McGregor","year":"1982","journal-title":"Software-Pract. 
Exper"},{"key":"2023020210390624200_B36","doi-asserted-by":"crossref","first-page":"325","DOI":"10.1002\/qsar.2660110304","article-title":"Fuzzy adaptive least squares and its application to structure-activity studies","volume":"11","author":"Moriguchi","year":"1992","journal-title":"Quant. struct. activ. relation"},{"key":"2023020210390624200_B37","doi-asserted-by":"crossref","first-page":"203","DOI":"10.1023\/A:1007601015854","article-title":"Robust classification for imprecise environments","volume":"42","author":"Provost","year":"2001","journal-title":"Mach. Learn"},{"key":"2023020210390624200_B38","doi-asserted-by":"crossref","first-page":"305","DOI":"10.1021\/ci010381f","article-title":"Heuristics for similarity searching of chemical graphs using a maximum common edge subgraph algorithm","volume":"42","author":"Raymond","year":"2002","journal-title":"J. Chem. Inf. Comput. Sci"},{"key":"2023020210390624200_B39","doi-asserted-by":"crossref","first-page":"631","DOI":"10.1093\/comjnl\/45.6.631","article-title":"RASCAL: calculation of graph similarity using maximum common edge subgraphs","volume":"45","author":"Raymond","year":"2002","journal-title":"Comput. J"},{"key":"2023020210390624200_B40","doi-asserted-by":"crossref","first-page":"903","DOI":"10.1016\/S1359-6446(02)02411-X","article-title":"Why do we need so many chemical similarity search methods?","volume":"7","author":"Sheridan","year":"2002","journal-title":"Drug Discov. Today"},{"key":"2023020210390624200_B41","doi-asserted-by":"crossref","first-page":"757","DOI":"10.1109\/TSMC.1979.4310127","article-title":"Error-correcting isomorphisms of attributed relational graphs for pattern analysis","volume":"9","author":"Tsai","year":"1979","journal-title":"IEEE Trans. Syst. 
Man Cybern"},{"key":"2023020210390624200_B42","doi-asserted-by":"crossref","first-page":"588","DOI":"10.1109\/3477.604100","article-title":"Genetic-based search for error-correcting graph isomorphism","volume":"27","author":"Wang","year":"1997","journal-title":"IEEE Trans. Syst. Man Cybern. Part B"},{"issue":"(Database issue)","key":"2023020210390624200_B43","doi-asserted-by":"crossref","first-page":"D5","DOI":"10.1093\/nar\/gkl1031","article-title":"Database resources of the national center for biotechnology information","volume":"35","author":"Wheeler","year":"2007","journal-title":"Nucleic Acids Res"},{"key":"2023020210390624200_B44","doi-asserted-by":"crossref","first-page":"983","DOI":"10.1021\/ci9800211","article-title":"Chemical similarity searching","volume":"38","author":"Willett","year":"1998","journal-title":"J. Chem. Inf. Comput. Sci"},{"key":"2023020210390624200_B45","doi-asserted-by":"crossref","first-page":"634","DOI":"10.1109\/34.601251","article-title":"Structural matching by discrete relaxation","volume":"19","author":"Wilson","year":"1997","journal-title":"IEEE Trans. Pattern Anal. Mach. 
Intell"},{"key":"2023020210390624200_B46","doi-asserted-by":"crossref","first-page":"766","DOI":"10.1145\/1066157.1066244","article-title":"Substructure similarity search in graph databases","volume-title":"SIGMOD'05: Proceedings of the 2005 ACM SIGMOD International Conference on Management of Data","author":"Yan","year":"2005"}],"container-title":["Bioinformatics"],"original-title":[],"language":"en","link":[{"URL":"https:\/\/academic.oup.com\/bioinformatics\/article-pdf\/24\/13\/i366\/49050372\/bioinformatics_24_13_i366.pdf","content-type":"application\/pdf","content-version":"vor","intended-application":"syndication"},{"URL":"https:\/\/academic.oup.com\/bioinformatics\/article-pdf\/24\/13\/i366\/49050372\/bioinformatics_24_13_i366.pdf","content-type":"unspecified","content-version":"vor","intended-application":"similarity-checking"}],"deposited":{"date-parts":[[2023,2,2]],"date-time":"2023-02-02T12:22:12Z","timestamp":1675340532000},"score":1,"resource":{"primary":{"URL":"https:\/\/academic.oup.com\/bioinformatics\/article\/24\/13\/i366\/236402"}},"subtitle":[],"short-title":[],"issued":{"date-parts":[[2008,7,1]]},"references-count":46,"journal-issue":{"issue":"13","published-print":{"date-parts":[[2008,7,1]]}},"URL":"https:\/\/doi.org\/10.1093\/bioinformatics\/btn186","relation":{},"ISSN":["1367-4811","1367-4803"],"issn-type":[{"value":"1367-4811","type":"electronic"},{"value":"1367-4803","type":"print"}],"subject":[],"published-other":{"date-parts":[[2008,7,1]]},"published":{"date-parts":[[2008,7,1]]}}}