{"status":"ok","message-type":"work","message-version":"1.0.0","message":{"indexed":{"date-parts":[[2026,2,1]],"date-time":"2026-02-01T09:29:29Z","timestamp":1769938169887,"version":"3.49.0"},"reference-count":70,"publisher":"Springer Science and Business Media LLC","issue":"1","license":[{"start":{"date-parts":[[2019,10,22]],"date-time":"2019-10-22T00:00:00Z","timestamp":1571702400000},"content-version":"tdm","delay-in-days":0,"URL":"http:\/\/creativecommons.org\/licenses\/by\/4.0\/"},{"start":{"date-parts":[[2019,10,22]],"date-time":"2019-10-22T00:00:00Z","timestamp":1571702400000},"content-version":"vor","delay-in-days":0,"URL":"http:\/\/creativecommons.org\/licenses\/by\/4.0\/"}],"funder":[{"DOI":"10.13039\/501100001871","name":"Funda\u00e7\u00e3o para a Ci\u00eancia e a Tecnologia","doi-asserted-by":"publisher","award":["PTDC\/EEI-ESS\/4923\/2014"],"award-info":[{"award-number":["PTDC\/EEI-ESS\/4923\/2014"]}],"id":[{"id":"10.13039\/501100001871","id-type":"DOI","asserted-by":"publisher"}]},{"DOI":"10.13039\/501100001871","name":"Funda\u00e7\u00e3o para a Ci\u00eancia e a Tecnologia","doi-asserted-by":"publisher","award":["SFRH\/BD\/111654\/2015"],"award-info":[{"award-number":["SFRH\/BD\/111654\/2015"]}],"id":[{"id":"10.13039\/501100001871","id-type":"DOI","asserted-by":"publisher"}]}],"content-domain":{"domain":["link.springer.com"],"crossmark-restriction":false},"short-container-title":["J Cheminform"],"published-print":{"date-parts":[[2019,12]]},"abstract":"<jats:title>Abstract<\/jats:title>\n              <jats:sec>\n                <jats:title>Background<\/jats:title>\n                <jats:p>Molecular space visualization can help to explore the diversity of large heterogeneous chemical data, which ultimately may increase the understanding of structure-activity relationships (SAR) in drug discovery projects. 
Visual SAR analysis can therefore be useful for library design, for chemical classification prior to biological evaluation, and for virtual screening to select compounds for synthesis or in vitro testing. As such, computational approaches for molecular space visualization have become an important topic in cheminformatics research. The proposed approach uses molecular similarity as the sole input for computing a probabilistic surface of molecular activity (PSMA). The pairwise similarity matrix is projected into 2D using different dimensionality reduction algorithms (Principal Coordinates Analysis (PCooA), Kruskal multidimensional scaling, Sammon mapping and t-SNE). A kernel density function is then applied to this projection to compute the probability of activity at each coordinate of the new projected space.<\/jats:p>\n              <\/jats:sec>\n              <jats:sec>\n                <jats:title>Results<\/jats:title>\n                <jats:p>This methodology was tested on four different quantitative structure-activity relationship (QSAR) binary classification data sets, and PSMAs were computed for each. The generated maps showed internal consistency, with active molecules grouped together, for all data sets and all dimensionality reduction algorithms. To validate the quality of the generated maps, test molecules were projected into the new reference space using a data transformation matrix to obtain their 2D coordinates. In total, sixteen PSMAs were built, and their performance was assessed using the Area Under the Curve (AUC) and the Matthews Correlation Coefficient (MCC). For the best projection for each data set, test AUC values ranged from 0.87 to 0.98 and MCC scores ranged from 0.33 to 0.77, suggesting that this methodology can validly capture the complexities of the molecular activity space. 
All four mapping functions provided generally good results, yet the overall performance of PCooA and t-SNE was slightly better than that of Sammon mapping and Kruskal multidimensional scaling.<\/jats:p>\n              <\/jats:sec>\n              <jats:sec>\n                <jats:title>Conclusions<\/jats:title>\n                <jats:p>Our results show that, by using an appropriate combination of metric space representation and dimensionality reduction applied over metric spaces, it is possible to produce a visual PSMA whose consistency can be validated by using the map as a classification model. The produced maps can be used as prediction tools, since any molecule can easily be projected into this new reference space, provided that its similarities to the molecules used to compute the initial similarity matrix can be computed.<\/jats:p>\n              <\/jats:sec>","DOI":"10.1186\/s13321-019-0386-z","type":"journal-article","created":{"date-parts":[[2019,10,23]],"date-time":"2019-10-23T04:27:24Z","timestamp":1571804844000},"update-policy":"https:\/\/doi.org\/10.1007\/springer_crossmark_policy","source":"Crossref","is-referenced-by-count":8,"title":["A visual approach for analysis and inference of molecular activity spaces"],"prefix":"10.1186","volume":"11","author":[{"given":"Samina","family":"Kausar","sequence":"first","affiliation":[]},{"ORCID":"https:\/\/orcid.org\/0000-0002-3588-8746","authenticated-orcid":false,"given":"Andre O.","family":"Falcao","sequence":"additional","affiliation":[]}],"member":"297","published-online":{"date-parts":[[2019,10,22]]},"reference":[{"issue":"10","key":"386_CR1","doi-asserted-by":"publisher","first-page":"661","DOI":"10.2533\/chimia.2017.661","volume":"71","author":"M Awale","year":"2017","unstructured":"Awale M, Visini R, Probst D, Ar\u00fas-Pous J, Reymond J-L (2017) Chemical space: big data challenge for molecular diversity. CHIMIA Int J Chem 71(10):661\u2013666. 
\n                    https:\/\/doi.org\/10.2533\/chimia.2017.661","journal-title":"CHIMIA Int J Chem"},{"issue":"1","key":"386_CR2","doi-asserted-by":"publisher","first-page":"30","DOI":"10.1039\/c0md00020e","volume":"1","author":"JL Reymond","year":"2010","unstructured":"Reymond JL, Van Deursen R, Blum LC, Ruddigkeit L (2010) Chemical space as a source for new drugs. Med Chem Comm 1(1):30\u201338. \n                    https:\/\/doi.org\/10.1039\/c0md00020e","journal-title":"Med Chem Comm"},{"issue":"7019","key":"386_CR3","doi-asserted-by":"publisher","first-page":"824","DOI":"10.1038\/nature03192","volume":"432","author":"CM Dobson","year":"2004","unstructured":"Dobson CM (2004) Chemical space and biology. Nature 432(7019):824\u2013828. \n                    https:\/\/doi.org\/10.1038\/nature03192","journal-title":"Nature"},{"issue":"5","key":"386_CR4","doi-asserted-by":"publisher","first-page":"441","DOI":"10.1007\/s10822-017-0019-4","volume":"31","author":"P Sidorov","year":"2017","unstructured":"Sidorov P, Viira B, Davioud-Charvet E, Maran U, Marcou G, Horvath D, Varnek A (2017) QSAR modeling and chemical space analysis of antimalarial compounds. J Comput Aided Mol Design 31(5):441\u2013451. \n                    https:\/\/doi.org\/10.1007\/s10822-017-0019-4","journal-title":"J Comput Aided Mol Design"},{"issue":"6","key":"386_CR5","doi-asserted-by":"publisher","first-page":"1286","DOI":"10.1021\/acs.jcim.7b00048","volume":"57","author":"J Ash","year":"2017","unstructured":"Ash J, Fourches D (2017) Characterizing the chemical space of ERK2 kinase inhibitors using descriptors computed from molecular dynamics trajectories. J Chem Inf Model 57(6):1286\u20131299. 
\n                    https:\/\/doi.org\/10.1021\/acs.jcim.7b00048","journal-title":"J Chem Inf Model"},{"issue":"7","key":"386_CR6","doi-asserted-by":"publisher","first-page":"605","DOI":"10.1080\/17460441.2018.1465926","volume":"13","author":"M Vogt","year":"2018","unstructured":"Vogt M (2018) Progress with modeling activity landscapes in drug discovery. Expert Opin Drug Discov 13(7):605\u2013615. \n                    https:\/\/doi.org\/10.1080\/17460441.2018.1465926","journal-title":"Expert Opin Drug Discov"},{"key":"386_CR7","doi-asserted-by":"publisher","DOI":"10.1039\/9781847558879","volume-title":"Chemoinformatics Approaches to Virtual Screening","year":"2008","unstructured":"Varnek A, Tropsha A (2008) Chemoinformatics approaches to virtual screening. Royal Society of Chemistry, Cambridge. \n                    https:\/\/doi.org\/10.1039\/9781847558879\n                    \n                  . \n                    http:\/\/ebook.rsc.org\/?"},{"issue":"910","key":"386_CR8","doi-asserted-by":"publisher","first-page":"1006","DOI":"10.1002\/qsar.200330831","volume":"22","author":"N Nikolova","year":"2003","unstructured":"Nikolova N, Jaworska J (2003) Approaches to measure chemical similarity\u2014a review. QSAR Comb Sci 22(910):1006\u20131026. \n                    https:\/\/doi.org\/10.1002\/qsar.200330831","journal-title":"QSAR Comb Sci"},{"key":"386_CR9","volume-title":"Concepts and applications of molecular similarity","author":"MA Johnson","year":"1990","unstructured":"Johnson MA, Maggiora GM (1990) Concepts and applications of molecular similarity. Wiley, New York"},{"issue":"6","key":"386_CR10","doi-asserted-by":"publisher","first-page":"983","DOI":"10.1021\/ci9800211","volume":"38","author":"P Willett","year":"1998","unstructured":"Willett P, Barnard JM, Downs GM (1998) Chemical similarity searching. J Chem Inf Comput Sci 38(6):983\u2013996. 
\n                    https:\/\/doi.org\/10.1021\/ci9800211","journal-title":"J Chem Inf Comput Sci"},{"issue":"22","key":"386_CR11","doi-asserted-by":"publisher","first-page":"3204","DOI":"10.1039\/b409813g","volume":"2","author":"A Bender","year":"2004","unstructured":"Bender A, Glen RC (2004) Molecular similarity: a key technique in molecular informatics. Org Biomol Chem 2(22):3204\u20133218. \n                    https:\/\/doi.org\/10.1039\/b409813g","journal-title":"Org Biomol Chem"},{"issue":"8","key":"386_CR12","doi-asserted-by":"publisher","first-page":"3186","DOI":"10.1021\/jm401411z","volume":"57","author":"G Maggiora","year":"2014","unstructured":"Maggiora G, Vogt M, Stumpfe D, Bajorath J (2014) Molecular similarity in medicinal chemistry. J Med Chem 57(8):3186\u20133204. \n                    https:\/\/doi.org\/10.1021\/jm401411z","journal-title":"J Med Chem"},{"issue":"5\u20136","key":"386_CR13","doi-asserted-by":"publisher","first-page":"225","DOI":"10.1016\/j.drudis.2007.01.011","volume":"12","author":"H Eckert","year":"2007","unstructured":"Eckert H, Bajorath J (2007) Molecular similarity analysis in virtual screening: foundations, limitations and novel approaches. Drug Discov Today 12(5\u20136):225\u2013233. \n                    https:\/\/doi.org\/10.1016\/j.drudis.2007.01.011","journal-title":"Drug Discov Today"},{"issue":"2","key":"386_CR14","doi-asserted-by":"publisher","first-page":"260","DOI":"10.1002\/wcms.23","volume":"1","author":"D Stumpfe","year":"2011","unstructured":"Stumpfe D, Bajorath J (2011) Similarity searching. Wiley Interdiscip Rev Comput Mol Sci 1(2):260\u2013282. 
\n                    https:\/\/doi.org\/10.1002\/wcms.23","journal-title":"Wiley Interdiscip Rev Comput Mol Sci"},{"key":"386_CR15","series-title":"Methods in molecular biology\u2122","doi-asserted-by":"publisher","first-page":"1","DOI":"10.1385\/1-59259-802-1:001","volume-title":"Chemoinformatics","author":"GM Maggiora","year":"2004","unstructured":"Maggiora GM, Shanmugasundaram V (2004) Molecular similarity measures. In: Bajorath J (ed) Chemoinformatics. Methods in molecular biology\u2122,  vol 275. Humana Press, Totowa, NJ, pp. 1\u201350. \n                    https:\/\/doi.org\/10.1385\/1-59259-802-1:001"},{"key":"386_CR16","series-title":"Methods in Molecular Biology","doi-asserted-by":"publisher","first-page":"231","DOI":"10.1007\/978-1-4939-6613-4_13","volume-title":"Bioinformatics","author":"J Bajorath","year":"2017","unstructured":"Bajorath J (2017) Molecular Similarity Concepts for Informatics Applications. In: Keith J (ed) Bioinformatics. Methods in Molecular Biology, vol 1526. Humana Press, New York, NY, pp 231\u2013245. \n                    https:\/\/doi.org\/10.1007\/978-1-4939-6613-4_13"},{"issue":"10","key":"386_CR17","doi-asserted-by":"publisher","first-page":"2511","DOI":"10.1021\/ci400324u","volume":"53","author":"AL Teixeira","year":"2013","unstructured":"Teixeira AL, Falcao AO (2013) Noncontiguous atom matching structural similarity function. J Chem Inf Model 53(10):2511\u20132524. \n                    https:\/\/doi.org\/10.1021\/ci400324u","journal-title":"J Chem Inf Model"},{"issue":"1","key":"386_CR18","doi-asserted-by":"publisher","first-page":"68","DOI":"10.1002\/wcms.5","volume":"1","author":"H-C Ehrlich","year":"2011","unstructured":"Ehrlich H-C, Rarey M (2011) Maximum common subgraph isomorphism algorithms and their applications in molecular science: a review. Wiley Interdiscip Rev Comput Mol Sci 1(1):68\u201379. 
\n                    https:\/\/doi.org\/10.1002\/wcms.5","journal-title":"Wiley Interdiscip Rev Comput Mol Sci"},{"issue":"7","key":"386_CR19","doi-asserted-by":"publisher","first-page":"521","DOI":"10.1023\/A:1021271615909","volume":"16","author":"JW Raymond","year":"2002","unstructured":"Raymond JW, Willett P (2002) Maximum common subgraph isomorphism algorithms for the matching of chemical structures. J Comput Aided Mol Des 16(7):521\u201333. \n                    https:\/\/doi.org\/10.1023\/A:1021271615909","journal-title":"J Comput Aided Mol Des"},{"issue":"4","key":"386_CR20","doi-asserted-by":"publisher","first-page":"532","DOI":"10.1021\/ci00014a001","volume":"33","author":"JM Barnard","year":"1993","unstructured":"Barnard JM (1993) Substructure searching methods: old and new. J Chem Inf Model 33(4):532\u2013538. \n                    https:\/\/doi.org\/10.1021\/ci00014a001","journal-title":"J Chem Inf Model"},{"key":"386_CR21","doi-asserted-by":"publisher","first-page":"243","DOI":"10.1021\/bk-2016-1222.ch012","volume-title":"Frontiers in Molecular Design and Chemical Information Science - Herman Skolnik Award Symposium 2015: J\u00fcrgen Bajorath","author":"H\u00e9l\u00e9na A. Gaspar","year":"2016","unstructured":"Gaspar HA, Baskin II, Varnek A (2016) Visualization of a multidimensional descriptor space. ACS Symposium Series 1222. \n                    https:\/\/doi.org\/10.1021\/bk-2016-1222.ch012"},{"key":"386_CR22","series-title":"Lecture Notes in Computer Science","doi-asserted-by":"crossref","first-page":"617","DOI":"10.1007\/978-3-642-42054-2_77","volume-title":"Neural Information Processing. ICONIP 2013","author":"M Verleysen","year":"2013","unstructured":"Verleysen M, Lee JA (2013) Nonlinear Dimensionality Reduction for Visualization. In: Lee M, Hirose A, Hou ZG, Kil RM (eds) Neural Information Processing. ICONIP 2013. Lecture Notes in Computer Science, vol 8226. 
Springer, Berlin, Heidelberg, pp 617\u2013622"},{"issue":"6","key":"386_CR23","doi-asserted-by":"publisher","first-page":"1045","DOI":"10.1039\/c6md00108d","volume":"7","author":"D Stumpfe","year":"2016","unstructured":"Stumpfe D, Bajorath J (2016) Recent developments in SAR visualization. Med Chem Comm 7(6):1045\u20131055. \n                    https:\/\/doi.org\/10.1039\/c6md00108d","journal-title":"Med Chem Comm"},{"issue":"3","key":"386_CR24","doi-asserted-by":"publisher","first-page":"351","DOI":"10.1080\/00401706.1988.10488412","volume":"30","author":"Colin Goodall","year":"1988","unstructured":"Goodall C, Jolliffe IT (1988) Principal component analysis. Technometrics 30(3), 351. \n                    https:\/\/doi.org\/10.2307\/1270093\n                    \n                  . \n                    arXiv:1011.1669v3","journal-title":"Technometrics"},{"issue":"1","key":"386_CR25","doi-asserted-by":"publisher","first-page":"56","DOI":"10.1021\/ci300535x","volume":"53","author":"L Ruddigkeit","year":"2013","unstructured":"Ruddigkeit L, Blum LC, Reymond J-L (2013) Visualization and virtual screening of the chemical universe database GDB-17. J Chem Inf Model 53(1):56\u201365. \n                    https:\/\/doi.org\/10.1021\/ci300535x","journal-title":"J Chem Inf Model"},{"issue":"2","key":"386_CR26","doi-asserted-by":"publisher","first-page":"509","DOI":"10.1021\/ci300513m","volume":"53","author":"M Awale","year":"2013","unstructured":"Awale M, van Deursen R, Reymond J-L (2013) MQN-mapplet: visualization of chemical space with interactive maps of drugbank, ChEMBL, PubChem, GDB-11, and GDB-13. J Chem Inf Model 53(2):509\u2013518. \n                    https:\/\/doi.org\/10.1021\/ci300513m","journal-title":"J Chem Inf Model"},{"issue":"4","key":"386_CR27","doi-asserted-by":"publisher","first-page":"401","DOI":"10.1007\/BF02288916","volume":"17","author":"WS Torgerson","year":"1952","unstructured":"Torgerson WS (1952) Multidimensional scaling: I. 
Theory and method. Psychometrika 17(4):401\u2013419. \n                    https:\/\/doi.org\/10.1007\/BF02288916","journal-title":"Psychometrika"},{"issue":"1","key":"386_CR28","doi-asserted-by":"publisher","first-page":"1","DOI":"10.1007\/BF02289565","volume":"29","author":"JB Kruskal","year":"1964","unstructured":"Kruskal JB (1964) Multidimensional scaling by optimizing goodness of fit to a nonmetric hypothesis. Psychometrika 29(1):1\u201327. \n                    https:\/\/doi.org\/10.1007\/BF02289565","journal-title":"Psychometrika"},{"issue":"5","key":"386_CR29","doi-asserted-by":"publisher","first-page":"401","DOI":"10.1109\/T-C.1969.222678","volume":"C\u201318","author":"JW Sammon","year":"1969","unstructured":"Sammon JW (1969) A nonlinear mapping for data structure analysis. IEEE Trans Comput C\u201318(5):401\u2013409. \n                    https:\/\/doi.org\/10.1109\/T-C.1969.222678\n                    \n                  \n                           \n                    arXiv: 1011.1669","journal-title":"IEEE Trans Comput"},{"key":"386_CR30","first-page":"857","volume-title":"Advances in neural information processing systems 15","author":"GE Hinton","year":"2003","unstructured":"Hinton GE, Roweis ST (2003) Stochastic neighbor embedding. In: Becker S, Thrun S, Obermayer K (eds) Advances in neural information processing systems 15. MIT Press, Cambridge, pp. 857\u2013864.  \n                    http:\/\/papers.nips.cc\/paper\/2276-stochastic-neighbor-embedding.pdf\n                    \n                  . Accessed 30 Sept 2018"},{"issue":"10","key":"386_CR31","doi-asserted-by":"publisher","first-page":"1215","DOI":"10.1002\/jcc.10234","volume":"24","author":"DK Agrafiotis","year":"2003","unstructured":"Agrafiotis DK (2003) Stochastic proximity embedding. J Comput Chem 24(10):1215\u20131221. 
\n                    https:\/\/doi.org\/10.1002\/jcc.10234","journal-title":"J Comput Chem"},{"issue":"9","key":"386_CR32","doi-asserted-by":"publisher","first-page":"1464","DOI":"10.1109\/5.58325","volume":"78","author":"T Kohonen","year":"1990","unstructured":"Kohonen T (1990) The self-organizing map. Proc IEEE 78(9):1464\u20131480. \n                    https:\/\/doi.org\/10.1109\/5.58325","journal-title":"Proc IEEE"},{"issue":"3\u20134","key":"386_CR33","doi-asserted-by":"publisher","first-page":"301","DOI":"10.1002\/minf.201100163","volume":"31","author":"N Kireeva","year":"2012","unstructured":"Kireeva N, Baskin II, Gaspar HA, Horvath D, Marcou G, Varnek A (2012) Generative topographic mapping (gtm): Universal tool for data visualization, structure-activity modeling and dataset comparison. Mol Inform 31(3\u20134):301\u2013312. \n                    https:\/\/doi.org\/10.1002\/minf.201100163","journal-title":"Mol Inform"},{"issue":"23","key":"386_CR34","doi-asserted-by":"publisher","first-page":"8209","DOI":"10.1021\/jm100933w","volume":"53","author":"AM Wassermann","year":"2010","unstructured":"Wassermann AM, Wawer M, Bajorath J (2010) Activity landscape representations for structure-activity relationship analysis. J Med Chem 53(23):8209\u20138223. \n                    https:\/\/doi.org\/10.1021\/jm100933w","journal-title":"J Med Chem"},{"issue":"6","key":"386_CR35","doi-asserted-by":"publisher","first-page":"1021","DOI":"10.1021\/ci100091e","volume":"50","author":"L Peltason","year":"2010","unstructured":"Peltason L, Iyer P, Bajorath J (2010) Rationalizing three-dimensional activity landscapes and the influence of molecular representations on landscape topology and the formation of activity cliffs. J Chem Inf Model 50(6):1021\u20131033. 
\n                    https:\/\/doi.org\/10.1021\/ci100091e","journal-title":"J Chem Inf Model"},{"issue":"7","key":"386_CR36","doi-asserted-by":"publisher","first-page":"1833","DOI":"10.1021\/ci500110v","volume":"54","author":"AL Teixeira","year":"2014","unstructured":"Teixeira AL, Falcao AO (2014) Structural similarity based kriging for quantitative structure activity and property relationship modeling. J Chem Inf Model 54(7):1833\u20131849. \n                    https:\/\/doi.org\/10.1021\/ci500110v","journal-title":"J Chem Inf Model"},{"issue":"1","key":"386_CR37","doi-asserted-by":"publisher","first-page":"18","DOI":"10.1021\/jm401120g","volume":"57","author":"D Stumpfe","year":"2014","unstructured":"Stumpfe D, Hu Y, Dimova D, Bajorath J (2014) Recent progress in understanding activity cliffs and their utility in medicinal chemistry. J Med Chem 57(1):18\u201328. \n                    https:\/\/doi.org\/10.1021\/jm401120g","journal-title":"J Med Chem"},{"issue":"9","key":"386_CR38","doi-asserted-by":"publisher","first-page":"1","DOI":"10.3390\/molecules24091698","volume":"24","author":"S Kausar","year":"2019","unstructured":"Kausar S, Falcao AO (2019) Analysis and comparison of vector space and metric space representations in QSAR modeling. Molecules 24(9):1\u201322. \n                    https:\/\/doi.org\/10.3390\/molecules24091698","journal-title":"Molecules"},{"key":"386_CR39","first-page":"2579","volume":"9","author":"L van der Maaten","year":"2008","unstructured":"van der Maaten L, Hinton G (2008) Visualizing data using t-SNE. J Mach Learn Res 9:2579\u20132605","journal-title":"J Mach Learn Res"},{"issue":"1","key":"386_CR40","doi-asserted-by":"publisher","first-page":"120","DOI":"10.2307\/2347507","volume":"37","author":"P. J. Green","year":"1988","unstructured":"Silverman B (1986) Density estimation for statistics and data analysis. Chapman and Hall 37(1):1\u201322. 
\n                    https:\/\/doi.org\/10.2307\/2347507\n                    \n                  \n                           \n                    arXiv:1011.1669v3","journal-title":"Applied Statistics"},{"issue":"May","key":"386_CR41","doi-asserted-by":"publisher","first-page":"162","DOI":"10.3389\/fchem.2018.00162","volume":"6","author":"A Yosipof","year":"2018","unstructured":"Yosipof A, Guedes RC, Garc\u00eda-Sosa AT (2018) Data mining and machine learning models for predicting drug likeness and their disease or organ category. Front Chem 6(May):162. \n                    https:\/\/doi.org\/10.3389\/fchem.2018.00162","journal-title":"Front Chem"},{"key":"386_CR42","first-page":"445","volume":"33","author":"J Jaworska","year":"2005","unstructured":"Jaworska J, Aldenberg T, Nikolova N (2005) Review of methods for QSAR applicability domain estimation by the training set. Atla 33:445\u2013459","journal-title":"Atla"},{"issue":"5","key":"386_CR43","doi-asserted-by":"publisher","first-page":"4791","DOI":"10.3390\/molecules17054791","volume":"17","author":"F Sahigara","year":"2012","unstructured":"Sahigara F, Mansouri K, Ballabio D, Mauri A, Consonni V, Todeschini R (2012) Comparison of different approaches to define the applicability domain of QSAR models. Molecules 17(5):4791\u20134810. \n                    https:\/\/doi.org\/10.3390\/molecules17054791","journal-title":"Molecules"},{"issue":"1","key":"386_CR44","doi-asserted-by":"publisher","first-page":"1","DOI":"10.1186\/s13321-016-0182-y","volume":"8","author":"N Aniceto","year":"2016","unstructured":"Aniceto N, Freitas AA, Bender A, Ghafourian T (2016) A novel applicability domain technique for mapping predictive reliability across the chemical space of a QSAR: reliability-density neighbourhood. J Cheminform 8(1):1\u201320. 
\n                    https:\/\/doi.org\/10.1186\/s13321-016-0182-y","journal-title":"J Cheminform"},{"issue":"34","key":"386_CR45","doi-asserted-by":"publisher","first-page":"3494","DOI":"10.2174\/138161207782794257","volume":"13","author":"A Tropsha","year":"2007","unstructured":"Tropsha A, Golbraikh A (2007) Predictive QSAR modeling workflow, model applicability domains, and virtual screening. Curr Pharm Des 13(34):3494\u2013504. \n                    https:\/\/doi.org\/10.2174\/138161207782794257","journal-title":"Curr Pharm Des"},{"key":"386_CR46","doi-asserted-by":"publisher","first-page":"366","DOI":"10.1016\/j.electacta.2013.08.022","volume":"111","author":"Zhisheng Lv","year":"2013","unstructured":"Venables WN, Ripley BD (2002) modern applied statistics with S. Springer. \n                    https:\/\/doi.org\/10.1016\/j.electacta.2013.08.022\n                    \n                  . \n                    http:\/\/stat.ethz.ch\/ R-manual\/R-patched\/library\/stats\/html\/prcomp.html","journal-title":"Electrochimica Acta"},{"key":"386_CR47","doi-asserted-by":"publisher","DOI":"10.1002\/9783527618279","volume-title":"Handbook of Chemoinformatics","year":"2003","unstructured":"Gasteiger J (2003) Handbook of chemoinformatics. vol. 1\u20134, pp. 1\u20131870. Wiley-VCH Verlag GmbH, Weinheim, Germany. \n                    https:\/\/doi.org\/10.1002\/9783527618279\n                    \n                  . \n                    arXiv:1011.1669v3"},{"key":"386_CR48","doi-asserted-by":"publisher","DOI":"10.1002\/9783527628766","volume-title":"Molecular descriptors for chemoinformatics. Methods and principles in medicinal chemistry","author":"R Todeschini","year":"2009","unstructured":"Todeschini R, Consonni V (2009) Molecular descriptors for chemoinformatics. Methods and principles in medicinal chemistry. Wiley, Weinheim. 
\n                    https:\/\/doi.org\/10.1002\/9783527628766"},{"key":"386_CR49","unstructured":"James C, Weininger D, Delaney J (2011) Daylight theory manual version 4.9. \n                    http:\/\/www.daylight.com\/dayhtml\/doc\/theory\/\n                    \n                  . Accessed 30 Sept 2018"},{"issue":"6\u20137","key":"386_CR50","doi-asserted-by":"publisher","first-page":"403","DOI":"10.1002\/minf.201400024","volume":"33","author":"P Willett","year":"2014","unstructured":"Willett P (2014) The calculation of molecular structural similarity: principles and practice. Mol Inform 33(6\u20137):403\u2013413. \n                    https:\/\/doi.org\/10.1002\/minf.201400024","journal-title":"Mol Inform"},{"issue":"2","key":"386_CR51","doi-asserted-by":"publisher","first-page":"137","DOI":"10.1517\/17460441.2016.1117070","volume":"11","author":"I Muegge","year":"2016","unstructured":"Muegge I, Mukherjee P (2016) An overview of molecular fingerprint similarity search in virtual screening. Expert Opin Drug Discov 11(2):137\u2013148. \n                    https:\/\/doi.org\/10.1517\/17460441.2016.1117070","journal-title":"Expert Opin Drug Discov"},{"issue":"0","key":"386_CR52","doi-asserted-by":"publisher","first-page":"591","DOI":"10.12688\/f1000research.8357.2","volume":"5","author":"S Jasial","year":"2016","unstructured":"Jasial S, Hu Y, Vogt M, Bajorath J (2016) Activity-relevant similarity values for fingerprints and implications for similarity searching. F1000Res 5(0):591. \n                    https:\/\/doi.org\/10.12688\/f1000research.8357.2","journal-title":"F1000Res"},{"issue":"1","key":"386_CR53","doi-asserted-by":"publisher","first-page":"1","DOI":"10.1186\/s13321-015-0069-3","volume":"7","author":"D Bajusz","year":"2015","unstructured":"Bajusz D, R\u00e1cz A, H\u00e9berger K (2015) Why is Tanimoto index an appropriate choice for fingerprint-based similarity calculations? J Cheminform 7(1):1\u201313. 
\n                    https:\/\/doi.org\/10.1186\/s13321-015-0069-3","journal-title":"J Cheminform"},{"issue":"1","key":"386_CR54","first-page":"43","volume":"8","author":"C Seung-Seok","year":"2010","unstructured":"Seung-Seok C, Sung-Hyuk C, Tappert CC (2010) A survey of binary similarity and distance measures. J Syst Cybern Inform 8(1):43\u201348.","journal-title":"J Syst Cybern Inform"},{"key":"386_CR55","doi-asserted-by":"publisher","DOI":"10.2172\/7256702","volume-title":"Similarity indices I: what do they measure?","author":"JW Johnston","year":"1976","unstructured":"Johnston JW (1976) Similarity indices I: what do they measure?. Battelle Pacific Northwest Laboratories, Richland"},{"issue":"3","key":"386_CR56","doi-asserted-by":"publisher","first-page":"379","DOI":"10.1021\/ci970437z","volume":"38","author":"DR Flower","year":"1998","unstructured":"Flower DR (1998) On the properties of bit string-based measures of chemical similarity. J Chem Inf Model 38(3):379\u2013386. \n                    https:\/\/doi.org\/10.1021\/ci970437z","journal-title":"J Chem Inf Model"},{"issue":"2","key":"386_CR57","doi-asserted-by":"publisher","first-page":"338","DOI":"10.1021\/ci025592e","volume":"43","author":"VJ Gillet","year":"2003","unstructured":"Gillet VJ, Willett P, Bradshaw J (2003) Similarity searching using reduced graphs. J Chem Inf Comput Sci 43(2):338\u2013345. \n                    https:\/\/doi.org\/10.1021\/ci025592e","journal-title":"J Chem Inf Comput Sci"},{"issue":"17","key":"386_CR58","doi-asserted-by":"publisher","first-page":"903","DOI":"10.1016\/S1359-6446(02)02411-X","volume":"7","author":"RP Sheridan","year":"2002","unstructured":"Sheridan RP, Kearsley SK (2002) Why do we need so many chemical similarity search methods? Drug Discov Today 7(17):903\u2013911. 
\n                    https:\/\/doi.org\/10.1016\/S1359-6446(02)02411-X","journal-title":"Drug Discov Today"},{"issue":"5","key":"386_CR59","doi-asserted-by":"publisher","first-page":"1937","DOI":"10.1021\/ci0601261","volume":"46","author":"J Batista","year":"2006","unstructured":"Batista J, Godden JW, Bajorath J (2006) Assessment of molecular similarity from the analysis of randomly generated structural fragment populations. J Chem Inf Model 46(5):1937\u20131944. \n                    https:\/\/doi.org\/10.1021\/ci0601261","journal-title":"J Chem Inf Model"},{"issue":"5","key":"386_CR60","doi-asserted-by":"publisher","first-page":"1601","DOI":"10.1021\/ci0400213","volume":"44","author":"DJ Graham","year":"2004","unstructured":"Graham DJ, Malarkey C, Schulmerich MV (2004) Information content in organic molecules: quantification and statistical structure via brownian processing. J Chem Inf Comput Sci 44(5):1601\u20131611. \n                    https:\/\/doi.org\/10.1021\/ci0400213","journal-title":"J Chem Inf Comput Sci"},{"issue":"2","key":"386_CR61","doi-asserted-by":"publisher","first-page":"115","DOI":"10.1007\/BF00348251","volume":"9","author":"M Thorrington-Smith","year":"1971","unstructured":"Thorrington-Smith M (1971) West Indian Ocean phytoplankton: a numerical investigation of phytohydrographic regions and their characteristic phytoplankton associations. Mar Biol 9(2):115\u2013137. \n                    https:\/\/doi.org\/10.1007\/BF00348251","journal-title":"Mar Biol"},{"issue":"1","key":"386_CR62","doi-asserted-by":"publisher","first-page":"3","DOI":"10.1016\/j.chemolab.2005.11.001","volume":"87","author":"R Todeschini","year":"2007","unstructured":"Todeschini R, Ballabio D, Consonni V, Mauri A, Pavan M (2007) CAIMAN (Classification And Influence Matrix Analysis): a new approach to the classification based on leverage-scaled functions. Chemometri Intell Lab Syst 87(1):3\u201317. 
\n                    https:\/\/doi.org\/10.1016\/j.chemolab.2005.11.001","journal-title":"Chemometri Intell Lab Syst"},{"key":"386_CR63","doi-asserted-by":"publisher","DOI":"10.1137\/1.9781611972733.19","volume-title":"Nonparametric density estimation: toward computational tractability","author":"A Gray","year":"2003","unstructured":"Gray A, Moore A (2003) Proceedings of the 2003 SIAM international conference on data mining. In: Barbara D, Kamath C (eds) Nonparametric density estimation: toward computational tractability. Society for Industrial and Applied Mathematics, Philadelphia. \n                    https:\/\/doi.org\/10.1137\/1.9781611972733.19"},{"key":"386_CR64","volume-title":"Pattern classification","author":"RO Duda","year":"2000","unstructured":"Duda RO, Hart PE, Stork DG (2000) Pattern classification, 2nd edn. Wiley, New York","edition":"2"},{"key":"386_CR65","doi-asserted-by":"publisher","first-page":"1452","DOI":"10.1017\/S0269888904220161","volume-title":"Bioinformatics: the machine learning approach","author":"P Baldi","year":"2001","unstructured":"Baldi P, Brunak SS (2001) Bioinformatics: the machine learning approach. MIT Press, Cambridge, p 1452. \n                    https:\/\/doi.org\/10.1017\/S0269888904220161"},{"issue":"D1","key":"386_CR66","doi-asserted-by":"publisher","first-page":"945","DOI":"10.1093\/nar\/gkw1074","volume":"45","author":"A Gaulton","year":"2017","unstructured":"Gaulton A, Hersey A, Nowotka M, Bento AP, Chambers J, Mendez D, Mutowo P, Atkinson F, Bellis LJ, Cibri\u00e1n-Uhalte E, Davies M, Dedman N, Karlsson A, Magari\u00f1os MP, Overington JP, Papadatos G, Smit I, Leach AR (2017) The ChEMBL database in 2017. Nucleic Acids Res 45(D1):945\u2013954. 
\n                    https:\/\/doi.org\/10.1093\/nar\/gkw1074","journal-title":"Nucleic Acids Res"},{"issue":"3","key":"386_CR67","doi-asserted-by":"publisher","first-page":"213","DOI":"10.2174\/138620706776055539","volume":"9","author":"AZ Dudek","year":"2006","unstructured":"Dudek AZ, Arodz T, Galvez J (2006) Computational methods in developing quantitative structure-activity relationships (QSAR): a review. Comb Chem High Throughput Screen 9(3):213\u2013228. \n                    https:\/\/doi.org\/10.2174\/138620706776055539","journal-title":"Comb Chem High Throughput Screen"},{"issue":"1","key":"386_CR68","doi-asserted-by":"publisher","first-page":"1","DOI":"10.1186\/s13321-017-0256-5","volume":"10","author":"S Kausar","year":"2018","unstructured":"Kausar S, Falcao AO (2018) An automated framework for QSAR model building. J Cheminform 10(1):1. \n                    https:\/\/doi.org\/10.1186\/s13321-017-0256-5","journal-title":"J Cheminform"},{"key":"386_CR69","series-title":"Lecture Notes in Physics","doi-asserted-by":"publisher","DOI":"10.1007\/978-3-540-74686-7","volume-title":"Computational Many-Particle Physics","year":"2008","unstructured":"R Development Core Team, R.: R: a language and environment for statistical computing (2011). \n                    https:\/\/doi.org\/10.1007\/978-3-540-74686-7"},{"issue":"3\/4","key":"386_CR70","doi-asserted-by":"publisher","first-page":"325","DOI":"10.2307\/2333639","volume":"53","author":"JC Gower","year":"1966","unstructured":"Gower JC (1966) Some distance properties of latent root and vector methods used in multivariate analysis. Biometrika 53(3\/4):325\u2013328. 
\n                    https:\/\/doi.org\/10.2307\/2333639","journal-title":"Biometrika"}],"container-title":["Journal of Cheminformatics"],"original-title":[],"language":"en","link":[{"URL":"http:\/\/link.springer.com\/content\/pdf\/10.1186\/s13321-019-0386-z.pdf","content-type":"application\/pdf","content-version":"vor","intended-application":"text-mining"},{"URL":"http:\/\/link.springer.com\/article\/10.1186\/s13321-019-0386-z\/fulltext.html","content-type":"text\/html","content-version":"vor","intended-application":"text-mining"},{"URL":"http:\/\/link.springer.com\/content\/pdf\/10.1186\/s13321-019-0386-z.pdf","content-type":"application\/pdf","content-version":"vor","intended-application":"similarity-checking"}],"deposited":{"date-parts":[[2020,10,20]],"date-time":"2020-10-20T23:04:53Z","timestamp":1603235093000},"score":1,"resource":{"primary":{"URL":"https:\/\/jcheminf.biomedcentral.com\/articles\/10.1186\/s13321-019-0386-z"}},"subtitle":[],"short-title":[],"issued":{"date-parts":[[2019,10,22]]},"references-count":70,"journal-issue":{"issue":"1","published-print":{"date-parts":[[2019,12]]}},"alternative-id":["386"],"URL":"https:\/\/doi.org\/10.1186\/s13321-019-0386-z","relation":{},"ISSN":["1758-2946"],"issn-type":[{"value":"1758-2946","type":"electronic"}],"subject":[],"published":{"date-parts":[[2019,10,22]]},"assertion":[{"value":"1 October 2018","order":1,"name":"received","label":"Received","group":{"name":"ArticleHistory","label":"Article History"}},{"value":"5 October 2019","order":2,"name":"accepted","label":"Accepted","group":{"name":"ArticleHistory","label":"Article History"}},{"value":"22 October 2019","order":3,"name":"first_online","label":"First Online","group":{"name":"ArticleHistory","label":"Article History"}},{"value":"The authors declare that they have no competing interests.","order":1,"name":"Ethics","group":{"name":"EthicsHeading","label":"Competing interests"}}],"article-number":"63"}}