{"status":"ok","message-type":"work","message-version":"1.0.0","message":{"indexed":{"date-parts":[[2026,2,21]],"date-time":"2026-02-21T00:32:50Z","timestamp":1771633970237,"version":"3.50.1"},"reference-count":77,"publisher":"MDPI AG","issue":"12","license":[{"start":{"date-parts":[[2023,12,18]],"date-time":"2023-12-18T00:00:00Z","timestamp":1702857600000},"content-version":"vor","delay-in-days":0,"URL":"https:\/\/creativecommons.org\/licenses\/by\/4.0\/"}],"funder":[{"name":"MESRI (Minist\u00e8re de l\u2019Enseignement sup\u00e9rieur, de la Recherche et de l\u2019Innovation)"},{"name":"Institute Carnot ICEEL"}],"content-domain":{"domain":[],"crossmark-restriction":false},"short-container-title":["Algorithms"],"abstract":"<jats:p>This article investigates the applicability domain (AD) of machine learning (ML) models trained on high-dimensional data, for the prediction of the ideal gas enthalpy of formation and entropy of molecules via descriptors. The AD is crucial as it describes the space of chemical characteristics in which the model can make predictions with a given reliability. This work studies the AD definition of a ML model throughout its development procedure: during data preprocessing, model construction and model deployment. Three AD definition methods, commonly used for outlier detection in high-dimensional problems, are compared: isolation forest (iForest), random forest prediction confidence (RF confidence) and k-nearest neighbors in the 2D projection of descriptor space obtained via t-distributed stochastic neighbor embedding (tSNE2D\/kNN). These methods compute an anomaly score that can be used instead of the distance metrics of classical low-dimension AD definition methods, the latter being generally unsuitable for high-dimensional problems. Typically, in low- (high-) dimensional problems, a molecule is considered to lie within the AD if its distance from the training domain (anomaly score) is below a given threshold. 
During data preprocessing, the three AD definition methods are used to identify outlier molecules and the effect of their removal is investigated. A more significant improvement of model performance is observed when outliers identified with RF confidence are removed (e.g., for a removal of 30% of outliers, the MAE (Mean Absolute Error) of the test dataset is divided by 2.5, 1.6 and 1.1 for RF confidence, iForest and tSNE2D\/kNN, respectively). While these three methods identify X-outliers, the effect of other types of outliers, namely Model-outliers and y-outliers, is also investigated. In particular, the elimination of X-outliers followed by that of Model-outliers enables us to divide MAE and RMSE (Root Mean Square Error) by 2 and 3, respectively, while reducing overfitting. The elimination of y-outliers does not display a significant effect on the model performance. During model construction and deployment, the AD serves to verify the position of the test data and of different categories of molecules with respect to the training data and associate this position with their prediction accuracy. 
For the data that are found to be close to the training data, according to RF confidence, and display high prediction errors, tSNE 2D representations are deployed to identify the possible sources of these errors (e.g., representation of the chemical information in the training data).<\/jats:p>","DOI":"10.3390\/a16120573","type":"journal-article","created":{"date-parts":[[2023,12,18]],"date-time":"2023-12-18T10:04:47Z","timestamp":1702893887000},"page":"573","update-policy":"https:\/\/doi.org\/10.3390\/mdpi_crossmark_policy","source":"Crossref","is-referenced-by-count":8,"title":["On the Development of Descriptor-Based Machine Learning Models for Thermodynamic Properties: Part 2\u2014Applicability Domain and Outliers"],"prefix":"10.3390","volume":"16","author":[{"ORCID":"https:\/\/orcid.org\/0000-0002-9743-9153","authenticated-orcid":false,"given":"Cindy","family":"Trinh","sequence":"first","affiliation":[{"name":"Universit\u00e9 de Lorraine, CNRS, LRGP, F-54001 Nancy, France"}]},{"ORCID":"https:\/\/orcid.org\/0000-0002-4013-9336","authenticated-orcid":false,"given":"Silvia","family":"Lasala","sequence":"additional","affiliation":[{"name":"Universit\u00e9 de Lorraine, CNRS, LRGP, F-54001 Nancy, France"}]},{"ORCID":"https:\/\/orcid.org\/0000-0002-2155-098X","authenticated-orcid":false,"given":"Olivier","family":"Herbinet","sequence":"additional","affiliation":[{"name":"Universit\u00e9 de Lorraine, CNRS, LRGP, F-54001 Nancy, France"}]},{"ORCID":"https:\/\/orcid.org\/0000-0001-7411-958X","authenticated-orcid":false,"given":"Dimitrios","family":"Meimaroglou","sequence":"additional","affiliation":[{"name":"Universit\u00e9 de Lorraine, CNRS, LRGP, F-54001 Nancy, France"}]}],"member":"1968","published-online":{"date-parts":[[2023,12,18]]},"reference":[{"key":"ref_1","doi-asserted-by":"crossref","first-page":"155","DOI":"10.1177\/026119290503300209","article-title":"Current status of methods for defining the applicability domain of (quantitative) structure-activity 
relationships","volume":"33","author":"Netzeva","year":"2005","journal-title":"ATLA Altern. Lab. Anim."},{"key":"ref_2","doi-asserted-by":"crossref","first-page":"061009","DOI":"10.1115\/1.4045516","article-title":"Comparison of Machine Learning Algorithms in the Interpolation and Extrapolation of Flame Describing Functions","volume":"142","author":"McCartney","year":"2020","journal-title":"J. Eng. Gas Turbines Power"},{"key":"ref_3","doi-asserted-by":"crossref","first-page":"20539517231169731","DOI":"10.1177\/20539517231169731","article-title":"Extrapolation and AI transparency: Why machine learning models should reveal when they make decisions beyond their training","volume":"10","author":"Cao","year":"2023","journal-title":"Big Data Soc."},{"key":"ref_4","unstructured":"European Commission Environment Directorate General (2014). Guidance Document on the Validation of (Quantitative) Structure-Activity Relationships [(Q)SAR] Models, OECD."},{"key":"ref_5","doi-asserted-by":"crossref","first-page":"241","DOI":"10.1080\/10629360902949567","article-title":"How not to develop a quantitative structure-activity or structure-property relationship (QSAR\/QSPR)","volume":"20","author":"Dearden","year":"2009","journal-title":"SAR QSAR Environ. Res."},{"key":"ref_6","doi-asserted-by":"crossref","unstructured":"Singh, M.M., and Smith, I.F.C. (2023, January 10\u201312). Extrapolation with machine learning based early-stage energy prediction models. Proceedings of the 2023 European Conference on Computing in Construction and the 40th International CIB W78 Conference, Crete, Greece.","DOI":"10.35490\/EC3.2023.210"},{"key":"ref_7","doi-asserted-by":"crossref","first-page":"1425","DOI":"10.1039\/D3DD00082F","article-title":"Interpretable models for extrapolation in scientific machine learning","volume":"2","author":"Muckley","year":"2023","journal-title":"Digit. 
Discov."},{"key":"ref_8","doi-asserted-by":"crossref","first-page":"408","DOI":"10.1214\/ss\/1177013627","article-title":"Influential Observations, High Leverage Points, and Outliers in Linear Regression: Comment","volume":"1","author":"Hoaglin","year":"1986","journal-title":"Stat. Sci."},{"key":"ref_9","doi-asserted-by":"crossref","unstructured":"Aggarwal, C.C., and Yu, P.S. (2001, January 21\u201324). Outlier detection for high dimensional data. Proceedings of the ACM SIGMOD International Conference on Management of Data, Santa Barbara, CA, USA.","DOI":"10.1145\/375663.375668"},{"key":"ref_10","doi-asserted-by":"crossref","first-page":"626","DOI":"10.1007\/s10618-014-0365-y","article-title":"Graph based anomaly detection and description: A survey","volume":"29","author":"Akoglu","year":"2015","journal-title":"Data Min. Knowl. Discov."},{"key":"ref_11","doi-asserted-by":"crossref","first-page":"100463","DOI":"10.1016\/j.cosrev.2022.100463","article-title":"A survey of outlier detection in high dimensional data streams","volume":"44","author":"Souiden","year":"2022","journal-title":"Comput. Sci. Rev."},{"key":"ref_12","doi-asserted-by":"crossref","first-page":"100306","DOI":"10.1016\/j.cosrev.2020.100306","article-title":"A critical overview of outlier detection methods","volume":"38","author":"Smiti","year":"2020","journal-title":"Comput. Sci. Rev."},{"key":"ref_13","doi-asserted-by":"crossref","first-page":"592","DOI":"10.1002\/jcc.21351","article-title":"A New Strategy of Outlier Detection for QSAR\/QSPR","volume":"31","author":"Cao","year":"2010","journal-title":"J. Comput. Chem."},{"key":"ref_14","first-page":"1","article-title":"The development of calibration models for spectroscopic data using principal component regression [Review]","volume":"2","author":"Estienne","year":"1999","journal-title":"Internet J. Chem."},{"key":"ref_15","doi-asserted-by":"crossref","unstructured":"Trinh, C., Tbatou, Y., Lasala, S., Herbinet, O., and Meimaroglou, D. (2023). 
On the Development of Descriptor-Based Machine Learning Models for Thermodynamic Properties. Part 1\u2014From Data Collection to Model Construction: Understanding of the Methods and their Effects. Processes, 11.","DOI":"10.3390\/pr11123325"},{"key":"ref_16","doi-asserted-by":"crossref","first-page":"4791","DOI":"10.3390\/molecules17054791","article-title":"Comparison of different approaches to define the applicability domain of QSAR models","volume":"17","author":"Sahigara","year":"2012","journal-title":"Molecules"},{"key":"ref_17","doi-asserted-by":"crossref","first-page":"445","DOI":"10.1177\/026119290503300508","article-title":"QSAR applicability domain estimation by projection of the training set in descriptor space: A review","volume":"33","author":"Jaworska","year":"2005","journal-title":"ATLA Altern. Lab. Anim."},{"key":"ref_18","doi-asserted-by":"crossref","first-page":"160","DOI":"10.1002\/minf.201501019","article-title":"Chemoinformatic Classification Methods and their Applicability Domain","volume":"35","author":"Mathea","year":"2016","journal-title":"Mol. Inform."},{"key":"ref_19","doi-asserted-by":"crossref","first-page":"22","DOI":"10.1016\/j.chemolab.2015.04.013","article-title":"On a simple approach for determining applicability domain of QSAR models","volume":"145","author":"Roy","year":"2015","journal-title":"Chemom. Intell. Lab. Syst."},{"key":"ref_20","doi-asserted-by":"crossref","first-page":"8305","DOI":"10.1021\/acs.jpca.9b04771","article-title":"Machine Learning to Predict Standard Enthalpy of Formation of Hydrocarbons","volume":"123","author":"Yalamanchi","year":"2019","journal-title":"J. Phys. Chem. A"},{"key":"ref_21","doi-asserted-by":"crossref","first-page":"6270","DOI":"10.1021\/acs.jpca.0c02785","article-title":"Data Science Approach to Estimate Enthalpy of Formation of Cyclic Hydrocarbons","volume":"124","author":"Yalamanchi","year":"2020","journal-title":"J. Phys. Chem. 
A"},{"key":"ref_22","doi-asserted-by":"crossref","first-page":"100054","DOI":"10.1016\/j.egyai.2021.100054","article-title":"Predicting entropy and heat capacity of hydrocarbons using machine learning","volume":"4","author":"Aldosari","year":"2021","journal-title":"Energy AI"},{"key":"ref_23","doi-asserted-by":"crossref","first-page":"108291","DOI":"10.1016\/j.compchemeng.2023.108291","article-title":"Application of interpretable group-embedded graph neural networks for pure compound properties","volume":"176","author":"Aouichaoui","year":"2023","journal-title":"Comput. Chem. Eng."},{"key":"ref_24","unstructured":"Balestriero, R., Pesenti, J., and LeCun, Y. (2021). Learning in High Dimension Always Amounts to Extrapolation. arXiv."},{"key":"ref_25","doi-asserted-by":"crossref","first-page":"583","DOI":"10.22190\/FUMI1903583G","article-title":"Mahalanobis Distance and Its Application for detecting multivariate outliers","volume":"34","author":"Ghorbani","year":"2019","journal-title":"Ser. Math. Inform."},{"key":"ref_26","doi-asserted-by":"crossref","first-page":"1","DOI":"10.1016\/S0169-7439(99)00047-7","article-title":"The Mahalanobis distance","volume":"50","author":"Massart","year":"2000","journal-title":"Chemom. Intell. Lab. Syst."},{"key":"ref_27","doi-asserted-by":"crossref","first-page":"725","DOI":"10.1021\/acs.jcim.2c01091","article-title":"Combining Group-Contribution Concept and Graph Neural Networks Toward Interpretable Molecular Property Models","volume":"63","author":"Aouichaoui","year":"2023","journal-title":"J. Chem. Inf. Model."},{"key":"ref_28","doi-asserted-by":"crossref","unstructured":"Mauri, A., and Bertola, M. (2022). Alvascience: A New Software Suite for the QSAR Workflow Applied to the Blood\u2013Brain Barrier Permeability. Int. J. Mol. 
Sci., 23.","DOI":"10.3390\/ijms232112882"},{"key":"ref_29","doi-asserted-by":"crossref","first-page":"1","DOI":"10.1002\/qua.26950","article-title":"Quantitative structure\u2013property relationship for the critical temperature of saturated monobasic ketones, aldehydes, and ethers with molecular descriptors","volume":"122","author":"Huoyu","year":"2022","journal-title":"Int. J. Quantum Chem."},{"key":"ref_30","doi-asserted-by":"crossref","first-page":"17","DOI":"10.1016\/j.jhazmat.2018.03.025","article-title":"Using machine learning and quantum chemistry descriptors to predict the toxicity of ionic liquids","volume":"352","author":"Cao","year":"2018","journal-title":"J. Hazard. Mater."},{"key":"ref_31","doi-asserted-by":"crossref","first-page":"1","DOI":"10.1186\/s12936-019-2941-5","article-title":"Quantitative structure-activity relationship to predict the anti-malarial activity in a set of new imidazolopiperazines based on artificial neural networks","volume":"18","author":"Yousefinejad","year":"2019","journal-title":"Malar. J."},{"key":"ref_32","doi-asserted-by":"crossref","first-page":"1928","DOI":"10.3390\/molecules16031928","article-title":"QSAR models for cxcr2 receptor antagonists based on the genetic algorithm for data preprocessing prior to application of the pls linear regression method and design of the new compounds using in silico virtual screening","volume":"16","author":"Asadollahi","year":"2011","journal-title":"Molecules"},{"key":"ref_33","first-page":"509","article-title":"Sources of High Leverage in Linear Regression Model","volume":"16","author":"Kim","year":"2004","journal-title":"J. Appl. Math. Inform."},{"key":"ref_34","doi-asserted-by":"crossref","first-page":"150","DOI":"10.1016\/j.jesp.2017.09.011","article-title":"Detecting multivariate outliers: Use a robust variant of the Mahalanobis distance","volume":"74","author":"Leys","year":"2018","journal-title":"J. Exp. Soc. 
Psychol."},{"key":"ref_35","doi-asserted-by":"crossref","first-page":"694","DOI":"10.1002\/qsar.200610151","article-title":"Principles of QSAR models validation: Internal and external","volume":"26","author":"Gramatica","year":"2007","journal-title":"QSAR Comb. Sci."},{"key":"ref_36","doi-asserted-by":"crossref","first-page":"59","DOI":"10.1016\/j.molliq.2017.06.039","article-title":"Development of robust generalized models for estimating the normal boiling points of pure chemical compounds","volume":"242","author":"Varamesh","year":"2017","journal-title":"J. Mol. Liq."},{"key":"ref_37","doi-asserted-by":"crossref","first-page":"105777","DOI":"10.1016\/j.asoc.2019.105777","article-title":"Neural-based approaches to overcome feature selection and applicability domain in drug-related property prediction","volume":"85","author":"Sabando","year":"2019","journal-title":"Appl. Soft Comput. J."},{"key":"ref_38","doi-asserted-by":"crossref","first-page":"63","DOI":"10.1007\/s11030-012-9415-9","article-title":"Reliably assessing prediction reliability for high dimensional QSAR data","volume":"17","author":"Huang","year":"2013","journal-title":"Mol. Divers."},{"key":"ref_39","doi-asserted-by":"crossref","unstructured":"Rakhimbekova, A., Madzhidov, T., Nugmanov, R.I., Gimadiev, T., Baskin, I., and Varnek, A. (2021). Comprehensive Analysis of Applicability Domains of QSPR Models for Chemical Reactions. Int. J. Mol. Sci., 21.","DOI":"10.3390\/ijms21155542"},{"key":"ref_40","doi-asserted-by":"crossref","first-page":"2469","DOI":"10.1021\/ci500364e","article-title":"Applicability domain based on ensemble learning in classification and regression analyses","volume":"54","author":"Kaneko","year":"2014","journal-title":"J. Chem. Inf. Model."},{"key":"ref_41","doi-asserted-by":"crossref","unstructured":"Rasmussen, C.E., and Williams, C.K.I. (2006). 
Gaussian Processes for Machine Learning, MIT Press.","DOI":"10.7551\/mitpress\/3206.001.0001"},{"key":"ref_42","unstructured":"Sushko, I. (2011). Applicability Domain of QSAR Models. [Ph.D. Thesis, Technical University of Munich]."},{"key":"ref_43","first-page":"1","article-title":"Outlier Detection in High Dimensional Data","volume":"19","author":"Kamalov","year":"2020","journal-title":"J. Inf. Knowl. Manag."},{"key":"ref_44","doi-asserted-by":"crossref","unstructured":"Riahi-Madvar, M., Nasersharif, B., and Azirani, A.A. (2021, January 3\u20134). Subspace outlier detection in high dimensional data using ensemble of PCA-based subspaces. Proceedings of the 26th International Computer Conference, Computer Society of Iran, CSICC 2021, Tehran, Iran.","DOI":"10.1109\/CSICC52343.2021.9420589"},{"key":"ref_45","doi-asserted-by":"crossref","unstructured":"Kriegel, H.P., Kr\u00f6ger, P., Schubert, E., and Zimek, A. (2009). Outlier Detection in Axis-Parallel Subspaces of High Dimensional Data, Springer.","DOI":"10.1007\/978-3-642-01307-2_86"},{"key":"ref_46","doi-asserted-by":"crossref","first-page":"1694","DOI":"10.1016\/j.csda.2007.05.018","article-title":"Outlier identification in high dimensions","volume":"52","author":"Filzmoser","year":"2008","journal-title":"Comput. Stat. Data Anal."},{"key":"ref_47","doi-asserted-by":"crossref","unstructured":"Angiulli, F., and Pizzuti, C. (2002, January 19\u201323). Fast outlier detection in high dimensional spaces. Proceedings of the Principles of Data Mining and Knowledge Discovery, 6th European Conference PKDD, Helsinki, Finland.","DOI":"10.1007\/3-540-45681-3_2"},{"key":"ref_48","doi-asserted-by":"crossref","unstructured":"Kriegel, H.P., Schubert, M., and Zimek, A. (2008, January 24\u201327). Angle-based outlier detection in high-dimensional data. 
Proceedings of the ACM SIGKDD International Conference on Knowledge Discovery and Data Mining, Las Vegas, NV, USA.","DOI":"10.1145\/1401890.1401946"},{"key":"ref_49","doi-asserted-by":"crossref","first-page":"1","DOI":"10.1145\/2133360.2133363","article-title":"Isolation-based anomaly detection","volume":"6","author":"Liu","year":"2012","journal-title":"ACM Trans. Knowl. Discov. Data"},{"key":"ref_50","doi-asserted-by":"crossref","first-page":"42","DOI":"10.1186\/s40537-020-00320-x","article-title":"A comprehensive survey of anomaly detection techniques for high dimensional big data","volume":"7","author":"Thudumu","year":"2020","journal-title":"J. Big Data"},{"key":"ref_51","doi-asserted-by":"crossref","first-page":"652","DOI":"10.2991\/ijcis.11.1.50","article-title":"A comparison of outlier detection techniques for high-dimensional data","volume":"11","author":"Xu","year":"2018","journal-title":"Int. J. Comput. Intell. Syst."},{"key":"ref_52","doi-asserted-by":"crossref","first-page":"363","DOI":"10.1002\/sam.11161","article-title":"A survey on unsupervised outlier detection in high-dimensional numerical data","volume":"5","author":"Zimek","year":"2012","journal-title":"Stat. Anal. Data Min."},{"key":"ref_53","doi-asserted-by":"crossref","first-page":"121","DOI":"10.1016\/j.patcog.2016.03.028","article-title":"High-dimensional and large-scale anomaly detection using a linear one-class SVM with deep learning","volume":"58","author":"Erfani","year":"2016","journal-title":"Pattern Recognit."},{"key":"ref_54","unstructured":"(2023, January 01). Alvascience, AlvaDesc (Software for Molecular Descriptors Calculation), Version 2.0.8. Available online: https:\/\/www.alvascience.com."},{"key":"ref_55","doi-asserted-by":"crossref","first-page":"801","DOI":"10.1007\/978-1-0716-0150-1_32","article-title":"alvaDesc: A tool to calculate and analyze molecular descriptors and fingerprints","volume":"2","author":"Mauri","year":"2020","journal-title":"Methods Pharmacol. 
Toxicol."},{"key":"ref_56","first-page":"2579","article-title":"Visualizing Data using t-SNE","volume":"9","author":"Hinton","year":"2008","journal-title":"J. Mach. Learn. Res."},{"key":"ref_57","unstructured":"Gaussian, Inc. (2010). Gaussian 09, Gaussian, Inc."},{"key":"ref_58","doi-asserted-by":"crossref","first-page":"2822","DOI":"10.1063\/1.477924","article-title":"A complete basis set model chemistry. VI. Use of density functional geometries and frequencies","volume":"110","author":"Montgomery","year":"1999","journal-title":"J. Chem. Phys."},{"key":"ref_59","doi-asserted-by":"crossref","first-page":"5648","DOI":"10.1063\/1.464913","article-title":"Thermochemistry. III. The role of exact exchange","volume":"98","author":"Becke","year":"1993","journal-title":"J. Chem. Phys."},{"key":"ref_60","unstructured":"Miyoshi, A. (2023, January 01). GPOP Software, Rev. 2022.01.20m1. Available online: http:\/\/akrmys.com\/gpop\/."},{"key":"ref_61","unstructured":"(2023, June 01). Non-Positive Definite Covariance Matrices. Available online: https:\/\/www.value-at-risk.net\/non-positive-definite-covariance-matrices."},{"key":"ref_62","doi-asserted-by":"crossref","first-page":"1069","DOI":"10.1016\/j.drudis.2014.02.003","article-title":"Activity cliffs in drug discovery: Dr Jekyll or Mr Hyde?","volume":"19","author":"Nicolotti","year":"2014","journal-title":"Drug Discov. Today"},{"key":"ref_63","doi-asserted-by":"crossref","first-page":"687","DOI":"10.1023\/B:JCAM.0000017375.61558.ad","article-title":"Comparison of correlation vector methods for ligand-based similarity searching","volume":"17","author":"Fechner","year":"2003","journal-title":"J. Comput.-Aided Mol. Des."},{"key":"ref_64","first-page":"2825","article-title":"Scikit-learn: Machine Learning in Python","volume":"12","author":"Pedregosa","year":"2011","journal-title":"J. Mach. Learn. 
Res."},{"key":"ref_65","doi-asserted-by":"crossref","first-page":"67","DOI":"10.1007\/s10822-010-9401-1","article-title":"Toward better QSAR\/QSPR modeling: Simultaneous outlier detection and variable selection using distribution of model features","volume":"25","author":"Cao","year":"2011","journal-title":"J. Comput.-Aided Mol. Des."},{"key":"ref_66","doi-asserted-by":"crossref","first-page":"1592","DOI":"10.1111\/biom.13553","article-title":"Simultaneous feature selection and outlier detection with optimality guarantees","volume":"78","author":"Insolia","year":"2022","journal-title":"Biometrics"},{"key":"ref_67","doi-asserted-by":"crossref","first-page":"3181","DOI":"10.1016\/j.csda.2010.02.014","article-title":"A diagnostic method for simultaneous feature selection and outlier identification in linear regression","volume":"54","author":"Menjoge","year":"2010","journal-title":"Comput. Stat. Data Anal."},{"key":"ref_68","doi-asserted-by":"crossref","first-page":"283","DOI":"10.1080\/02664760701833040","article-title":"Simultaneous variable selection and outlier identification in linear regression using the mean-shift outlier model","volume":"35","author":"Kim","year":"2008","journal-title":"J. Appl. Stat."},{"key":"ref_69","doi-asserted-by":"crossref","first-page":"135675","DOI":"10.1109\/ACCESS.2021.3115848","article-title":"Multi-Objective Evolutionary Simultaneous Feature Selection and Outlier Detection for Regression","volume":"9","author":"Jimenez","year":"2021","journal-title":"IEEE Access"},{"key":"ref_70","first-page":"149","article-title":"Simultaneous outlier detection and variable selection via difference-based regression model and stochastic search variable selection","volume":"26","author":"Park","year":"2019","journal-title":"Commun. Stat. Appl. 
Methods"},{"key":"ref_71","doi-asserted-by":"crossref","first-page":"108","DOI":"10.1016\/j.chemolab.2009.05.001","article-title":"Simultaneous variable selection and outlier detection using a robust genetic algorithm","volume":"98","author":"Wiegand","year":"2009","journal-title":"Chemom. Intell. Lab. Syst."},{"key":"ref_72","doi-asserted-by":"crossref","first-page":"527","DOI":"10.1007\/s00500-003-0310-2","article-title":"Genetic algorithms for outlier detection and variable selection in linear regression models","volume":"8","author":"Tolvi","year":"2004","journal-title":"Soft Comput."},{"key":"ref_73","doi-asserted-by":"crossref","first-page":"5586","DOI":"10.1039\/C6AN00764C","article-title":"The model adaptive space shrinkage (MASS) approach: A new method for simultaneous variable selection and outlier detection based on model population analysis","volume":"141","author":"Wen","year":"2016","journal-title":"Analyst"},{"key":"ref_74","unstructured":"(2023, June 01). t-SNE: The Effect of Various Perplexity Values on the Shape. Available online: https:\/\/scikit-learn.org\/stable\/auto_examples\/manifold\/plot_t_sne_perplexity.html."},{"key":"ref_75","doi-asserted-by":"crossref","first-page":"4","DOI":"10.1109\/TNNLS.2020.2978386","article-title":"A Comprehensive Survey on Graph Neural Networks","volume":"32","author":"Wu","year":"2021","journal-title":"IEEE Trans. Neural Netw. Learn. Syst."},{"key":"ref_76","unstructured":"Xu, K., Jegelka, S., Hu, W., and Leskovec, J. (2019, January 6\u20139). How powerful are graph neural networks?. Proceedings of the 7th International Conference on Learning Representations, ICLR 2019, New Orleans, LA, USA."},{"key":"ref_77","doi-asserted-by":"crossref","first-page":"1","DOI":"10.1186\/s13321-020-00479-8","article-title":"Could graph neural networks learn better molecular representation for drug discovery? 
A comparison study of descriptor-based and graph-based models","volume":"13","author":"Jiang","year":"2021","journal-title":"J. Cheminform."}],"container-title":["Algorithms"],"original-title":[],"language":"en","link":[{"URL":"https:\/\/www.mdpi.com\/1999-4893\/16\/12\/573\/pdf","content-type":"unspecified","content-version":"vor","intended-application":"similarity-checking"}],"deposited":{"date-parts":[[2025,10,10]],"date-time":"2025-10-10T21:40:50Z","timestamp":1760132450000},"score":1,"resource":{"primary":{"URL":"https:\/\/www.mdpi.com\/1999-4893\/16\/12\/573"}},"subtitle":[],"short-title":[],"issued":{"date-parts":[[2023,12,18]]},"references-count":77,"journal-issue":{"issue":"12","published-online":{"date-parts":[[2023,12]]}},"alternative-id":["a16120573"],"URL":"https:\/\/doi.org\/10.3390\/a16120573","relation":{},"ISSN":["1999-4893"],"issn-type":[{"value":"1999-4893","type":"electronic"}],"subject":[],"published":{"date-parts":[[2023,12,18]]}}}