{"status":"ok","message-type":"work","message-version":"1.0.0","message":{"indexed":{"date-parts":[[2026,4,14]],"date-time":"2026-04-14T20:57:25Z","timestamp":1776200245802,"version":"3.50.1"},"reference-count":29,"publisher":"Springer Science and Business Media LLC","issue":"1","license":[{"start":{"date-parts":[[2013,5,30]],"date-time":"2013-05-30T00:00:00Z","timestamp":1369872000000},"content-version":"tdm","delay-in-days":0,"URL":"http:\/\/creativecommons.org\/licenses\/by\/2.0"}],"content-domain":{"domain":["link.springer.com"],"crossmark-restriction":false},"short-container-title":["J Cheminform"],"published-print":{"date-parts":[[2013,12]]},"abstract":"<jats:title>Abstract<\/jats:title>\n          <jats:sec>\n            <jats:title>Background<\/jats:title>\n            <jats:p>With the growing popularity of using QSAR predictions towards regulatory purposes, such predictive models are now required to be strictly validated, an essential feature of which is to have the model\u2019s Applicability Domain (AD) defined clearly. Although in recent years several different approaches have been proposed to address this goal, no optimal approach to define the model\u2019s AD has yet been recognized.<\/jats:p>\n          <\/jats:sec>\n          <jats:sec>\n            <jats:title>Results<\/jats:title>\n            <jats:p>This study proposes a novel descriptor-based AD method which accounts for the data distribution and exploits <jats:italic>k<\/jats:italic>-Nearest Neighbours (kNN) principle to derive a heuristic decision rule. The proposed method is a three-stage procedure to address several key aspects relevant in judging the reliability of QSAR predictions. Inspired from the adaptive kernel method for probability density function estimation, the first stage of the approach defines a pattern of thresholds corresponding to the various training samples and these thresholds are later used to derive the decision rule. Criterion deciding if a given test sample will be retained within the AD is defined in the second stage of the approach. Finally, the last stage tries reflecting upon the reliability in derived results taking model statistics and prediction error into account.<\/jats:p>\n          <\/jats:sec>\n          <jats:sec>\n            <jats:title>Conclusions<\/jats:title>\n            <jats:p>The proposed approach addressed a novel strategy that integrated the kNN principle to define the AD of QSAR models. Relevant features that characterize the proposed AD approach include: a) adaptability to local density of samples, useful when the underlying multivariate distribution is asymmetric, with wide regions of low data density; b) unlike several kernel density estimators (KDE), effectiveness also in high-dimensional spaces; c) low sensitivity to the smoothing parameter <jats:italic>k<\/jats:italic>; and d) versatility to implement various distances measures. The results derived on a case study provided a clear understanding of how the approach works and defines the model\u2019s AD for reliable predictions.<\/jats:p>\n          <\/jats:sec>","DOI":"10.1186\/1758-2946-5-27","type":"journal-article","created":{"date-parts":[[2013,5,30]],"date-time":"2013-05-30T12:14:32Z","timestamp":1369916072000},"update-policy":"https:\/\/doi.org\/10.1007\/springer_crossmark_policy","source":"Crossref","is-referenced-by-count":84,"title":["Defining a novel k-nearest neighbours approach to assess the applicability domain of a QSAR model for reliable predictions"],"prefix":"10.1186","volume":"5","author":[{"given":"Faizan","family":"Sahigara","sequence":"first","affiliation":[]},{"given":"Davide","family":"Ballabio","sequence":"additional","affiliation":[]},{"given":"Roberto","family":"Todeschini","sequence":"additional","affiliation":[]},{"given":"Viviana","family":"Consonni","sequence":"additional","affiliation":[]}],"member":"297","published-online":{"date-parts":[[2013,5,30]]},"reference":[{"key":"466_CR1","unstructured":"REACH. European Community Regulation on chemicals and their safe use. http:\/\/ec.europa.eu\/environment\/chemicals\/reach\/reach_intro.htm,"},{"key":"466_CR2","volume-title":"The Characterisation of (Quantitative) Structure-Activity Relationships: Preliminary Guidance. ECB Report EUR 21866 EN, 95pp","author":"AP Worth","year":"2005","unstructured":"Worth AP, Bassan A, Gallegos A, Netzeva TI, Patlewicz G, Pavan M, Tsakovska I, Vracko M: The Characterisation of (Quantitative) Structure-Activity Relationships: Preliminary Guidance. ECB Report EUR 21866 EN, 95pp. 2005, Ispra, Italy: European Commission, Joint Research Centre"},{"key":"466_CR3","unstructured":"OECD. Quantitative Structure-Activity Relationships Project. http:\/\/www.oecd.org\/document\/23\/0,3746,en_2649_34377_33957015_1_1_1_1,00.html,"},{"key":"466_CR4","doi-asserted-by":"publisher","first-page":"331","DOI":"10.1080\/10629360412331297371","volume":"15","author":"AP Worth","year":"2004","unstructured":"Worth AP, van Leeuwen CJ, Hartung T: The prospects for using (Q)SARs in a changing political environment: high expectations and a key role for the Commission\u2019s Joint Research Centre. SAR QSAR Environ Res. 2004, 15: 331-343.","journal-title":"SAR QSAR Environ Res"},{"key":"466_CR5","doi-asserted-by":"crossref","first-page":"461","DOI":"10.1177\/026119290503300510","volume":"33","author":"N Nikolova-Jeliazkova","year":"2005","unstructured":"Nikolova-Jeliazkova N, Jaworska J: An approach to determining applicability domains for QSAR group contribution models: an analysis of SRC KOWWIN. Altern Lab Anim. 2005, 33: 461-470.","journal-title":"Altern Lab Anim"},{"key":"466_CR6","doi-asserted-by":"publisher","first-page":"1912","DOI":"10.1021\/ci049782w","volume":"44","author":"RP Sheridan","year":"2004","unstructured":"Sheridan RP, Feuston BP, Maiorov VN, Kearsley S: Similarity to molecules in the training set is a good discriminator for prediction accuracy in QSAR. J Chem Inf Comput Sci. 2004, 44: 1912-1928.","journal-title":"J Chem Inf Comput Sci"},{"key":"466_CR7","doi-asserted-by":"publisher","first-page":"4791","DOI":"10.3390\/molecules17054791","volume":"17","author":"F Sahigara","year":"2012","unstructured":"Sahigara F, Mansouri K, Ballabio D, Mauri A, Consonni V, Todeschini R: Comparison of different approaches to define the applicability domain of QSAR models. Molecules. 2012, 17: 4791-4810.","journal-title":"Molecules"},{"key":"466_CR8","doi-asserted-by":"crossref","first-page":"155","DOI":"10.1177\/026119290503300209","volume":"33","author":"TI Netzeva","year":"2005","unstructured":"Netzeva TI, Worth A, Aldenberg T, Benigni R, Cronin MT, Gramatica P, Jaworska JS, Kahn S, Klopman G, Marchant CA, Myatt G, Nikolova-Jeliazkova N, Patlewicz GY, Perkins R, Roberts D, Schultz T, Stanton DW, van de Sandt JJ, Tong W, Veith G, Yang C: Current status of methods for defining the applicability domain of (quantitative) structure-activity relationships. The report and recommendations of ECVAM Workshop 52. Altern Lab Anim. 2005, 33: 155-173.","journal-title":"Altern Lab Anim"},{"key":"466_CR9","doi-asserted-by":"crossref","first-page":"445","DOI":"10.1177\/026119290503300508","volume":"33","author":"J Jaworska","year":"2005","unstructured":"Jaworska J, Nikolova-Jeliazkova N, Aldenberg T: QSAR applicabilty domain estimation by projection of the training set descriptor space: A review. Altern Lab Anim. 2005, 33: 445-459.","journal-title":"Altern Lab Anim"},{"key":"466_CR10","doi-asserted-by":"publisher","first-page":"839","DOI":"10.1021\/ci0500381","volume":"45","author":"S Dimitrov","year":"2005","unstructured":"Dimitrov S, Dimitrova G, Pavlov T, Dimitrova N, Patlewicz G, Niemela J, Mekenyan OA: Stepwise approach for defining the applicability domain of SAR and QSAR models. J Chem Inf Model. 2005, 45: 839-849.","journal-title":"J Chem Inf Model"},{"key":"466_CR11","doi-asserted-by":"publisher","first-page":"135","DOI":"10.1080\/00401706.1977.10489521","volume":"19","author":"L Breiman","year":"1977","unstructured":"Breiman L, Meisel W, Purcell E: Variable kernel estimates of multivariate densities. Technometrics. 1977, 19: 135-144.","journal-title":"Technometrics"},{"key":"466_CR12","doi-asserted-by":"publisher","first-page":"6724","DOI":"10.1021\/es049665h","volume":"38","author":"AH Asikainen","year":"2004","unstructured":"Asikainen AH, Ruuskanen J, Tuppurainen KA: Consensus kNN QSAR: a versatile method for predicting the estrogenic activity of organic compounds in silico. A comparative study with five estrogen receptors and a large, diverse set of ligands. Environ Sci Technol. 2004, 38: 6724-6729.","journal-title":"Environ Sci Technol"},{"key":"466_CR13","doi-asserted-by":"publisher","first-page":"476","DOI":"10.1002\/minf.201000061","volume":"29","author":"A Tropsha","year":"2010","unstructured":"Tropsha A: Best practices for QSAR model development, validation, and exploitation. Mol Inf. 2010, 29: 476-488.","journal-title":"Mol Inf"},{"key":"466_CR14","doi-asserted-by":"publisher","first-page":"255","DOI":"10.1023\/A:1025338411016","volume":"17","author":"W Cede\u00f1o","year":"2003","unstructured":"Cede\u00f1o W, Agrafiotis DK: Using particle swarms for the development of QSAR models based on K-nearest neighbor and kernel regression. J Comput Aided Mol Des. 2003, 17: 255-263.","journal-title":"J Comput Aided Mol Des"},{"key":"466_CR15","doi-asserted-by":"publisher","first-page":"241","DOI":"10.1023\/A:1025386326946","volume":"17","author":"A Golbraikh","year":"2003","unstructured":"Golbraikh A, Shen M, Xiao Z, Xiao YD, Lee KH, Tropsha A: Rational selection of training and test sets for the development of validated QSAR models. J Comput Aided Mol Des. 2003, 17: 241-253.","journal-title":"J Comput Aided Mol Des"},{"key":"466_CR16","doi-asserted-by":"publisher","first-page":"777","DOI":"10.1021\/ci049628+","volume":"45","author":"P Itskowitz","year":"2005","unstructured":"Itskowitz P, Tropsha A: k nearest neighbors QSAR modeling as a variational problem: theory and applications. J Chem Inf Model. 2005, 45: 777-785.","journal-title":"J Chem Inf Model"},{"key":"466_CR17","doi-asserted-by":"publisher","first-page":"2412","DOI":"10.1021\/ci060149f","volume":"46","author":"F Nigsch","year":"2006","unstructured":"Nigsch F, Bender A, van Buuren B, Tissen J, Nigsch E, Mitchell JB: Melting point prediction employing k-nearest neighbor algorithms and genetic parameter optimization. J Chem Inf Model. 2006, 46: 2412-2422.","journal-title":"J Chem Inf Model"},{"key":"466_CR18","doi-asserted-by":"publisher","first-page":"1733","DOI":"10.1021\/ci800151m","volume":"48","author":"IV Tetko","year":"2008","unstructured":"Tetko IV, Sushko I, Pandey AK, Zhu H, Tropsha A, Papa E, Oberg T, Todeschini R, Fourches D, Varnek A: Critical assessment of QSAR models of environmental toxicity against Tetrahymena pyriformis: Focusing on applicability domain and overfitting by variable selection. J Chem Inf Model. 2008, 48: 1733-1746.","journal-title":"J Chem Inf Model"},{"key":"466_CR19","doi-asserted-by":"publisher","DOI":"10.1007\/978-1-4899-3324-9","volume-title":"Density Estimation for Statistics and Data Analysis","author":"BW Silverman","year":"1986","unstructured":"Silverman BW: Density Estimation for Statistics and Data Analysis. 1986, London, UK: Chapman and Hall"},{"key":"466_CR20","doi-asserted-by":"publisher","first-page":"1701","DOI":"10.1016\/j.chemosphere.2008.09.033","volume":"73","author":"C Zhao","year":"2008","unstructured":"Zhao C, Boriani E, Chana A, Roncaglioni A, Benfenati E: A new hybrid system of QSAR models for predicting bioconcentration factors (BCF). Chemosphere. 2008, 73: 1701-1707.","journal-title":"Chemosphere"},{"issue":"Suppl 1","key":"466_CR21","doi-asserted-by":"publisher","first-page":"S1","DOI":"10.1186\/1752-153X-4-S1-S1","volume":"4","author":"A Lombardo","year":"2010","unstructured":"Lombardo A, Roncaglioni A, Boriani E, Milan C, Benfenati E: Assessment and validation of the CAESAR predictive model for bioconcentration factor (BCF) in fish. Chem Cent J. 2010, 4 (Suppl 1): S1-10.1186\/1752-153X-4-S1-S1.","journal-title":"Chem Cent J"},{"key":"466_CR22","volume-title":"Graphical methods for Data Analysis","author":"JM Chambers","year":"1983","unstructured":"Chambers JM, Cleveland WS, Kleiner B, Tukey PA: Graphical methods for Data Analysis. 1983, Pacific Grove, CA: Wadsworth & Brooks\/Cole"},{"key":"466_CR23","unstructured":"Box plot \u2013 MATLAB. http:\/\/www.mathworks.it\/it\/help\/stats\/boxplot.html,"},{"key":"466_CR24","doi-asserted-by":"publisher","first-page":"1669","DOI":"10.1021\/ci900115y","volume":"49","author":"V Consonni","year":"2009","unstructured":"Consonni V, Ballabio D, Todeschini R: Comments on the definition of the Q2 parameter for QSAR validation. J Chem Inf Model. 2009, 49: 1669-1678.","journal-title":"J Chem Inf Model"},{"key":"466_CR25","doi-asserted-by":"publisher","first-page":"194","DOI":"10.1002\/cem.1290","volume":"24","author":"V Consonni","year":"2010","unstructured":"Consonni V, Ballabio D, Todeschini R: Evaluation of model predictive ability by external validation techniques. J Chemometr. 2010, 24: 194-201.","journal-title":"J Chemometr"},{"key":"466_CR26","unstructured":"DRAGON (Software for Molecular Descriptor Calculations). Talete srl, Milano, Italy. http:\/\/www.talete.mi.it,"},{"key":"466_CR27","volume-title":"V-PARVUS software, User manual","author":"M Forina","year":"2004","unstructured":"Forina M, Lanteri S, Armanino C, Cerrato Oliveros C, Casolino C: V-PARVUS software, User manual. 2004, http:\/\/parvus@difar.unige.it,"},{"key":"466_CR28","unstructured":"MATLAB. The Language of Technical Computing. http:\/\/www.mathworks.com\/products\/matlab\/,"},{"key":"466_CR29","volume-title":"Anal Chim Acta","author":"R Todeschini","year":"2013","unstructured":"Todeschini R, Ballabio D, Consonni V, Sahigara F, Filzmoser P: Locally-centred Mahalanobis distance: a new distance measure with salient features towards outlier detection. Anal Chim Acta. 2013, 10.1016\/j.aca.2013.04.034."}],"container-title":["Journal of Cheminformatics"],"original-title":[],"language":"en","link":[{"URL":"http:\/\/link.springer.com\/content\/pdf\/10.1186\/1758-2946-5-27.pdf","content-type":"application\/pdf","content-version":"vor","intended-application":"text-mining"},{"URL":"http:\/\/link.springer.com\/article\/10.1186\/1758-2946-5-27\/fulltext.html","content-type":"text\/html","content-version":"vor","intended-application":"text-mining"},{"URL":"https:\/\/link.springer.com\/content\/pdf\/10.1186\/1758-2946-5-27.pdf","content-type":"application\/pdf","content-version":"vor","intended-application":"similarity-checking"}],"deposited":{"date-parts":[[2021,9,2]],"date-time":"2021-09-02T19:54:18Z","timestamp":1630612458000},"score":1,"resource":{"primary":{"URL":"https:\/\/jcheminf.biomedcentral.com\/articles\/10.1186\/1758-2946-5-27"}},"subtitle":[],"short-title":[],"issued":{"date-parts":[[2013,5,30]]},"references-count":29,"journal-issue":{"issue":"1","published-print":{"date-parts":[[2013,12]]}},"alternative-id":["466"],"URL":"https:\/\/doi.org\/10.1186\/1758-2946-5-27","relation":{},"ISSN":["1758-2946"],"issn-type":[{"value":"1758-2946","type":"electronic"}],"subject":[],"published":{"date-parts":[[2013,5,30]]},"assertion":[{"value":"13 February 2013","order":1,"name":"received","label":"Received","group":{"name":"ArticleHistory","label":"Article History"}},{"value":"23 May 2013","order":2,"name":"accepted","label":"Accepted","group":{"name":"ArticleHistory","label":"Article History"}},{"value":"30 May 2013","order":3,"name":"first_online","label":"First Online","group":{"name":"ArticleHistory","label":"Article History"}}],"article-number":"27"}}