{"status":"ok","message-type":"work","message-version":"1.0.0","message":{"indexed":{"date-parts":[[2026,1,31]],"date-time":"2026-01-31T11:29:02Z","timestamp":1769858942300,"version":"3.49.0"},"reference-count":46,"publisher":"Oxford University Press (OUP)","issue":"1","license":[{"start":{"date-parts":[[2019,6,24]],"date-time":"2019-06-24T00:00:00Z","timestamp":1561334400000},"content-version":"vor","delay-in-days":0,"URL":"https:\/\/academic.oup.com\/journals\/pages\/open_access\/funder_policies\/chorus\/standard_publication_model"}],"funder":[{"DOI":"10.13039\/501100002341","name":"Academy of Finland","doi-asserted-by":"publisher","award":["275151"],"award-info":[{"award-number":["275151"]}],"id":[{"id":"10.13039\/501100002341","id-type":"DOI","asserted-by":"publisher"}]},{"DOI":"10.13039\/501100002341","name":"Academy of Finland","doi-asserted-by":"publisher","award":["292307"],"award-info":[{"award-number":["292307"]}],"id":[{"id":"10.13039\/501100002341","id-type":"DOI","asserted-by":"publisher"}]},{"name":"EU H2020 NanoSolveIT","award":["814572"],"award-info":[{"award-number":["814572"]}]}],"content-domain":{"domain":[],"crossmark-restriction":false},"short-container-title":[],"published-print":{"date-parts":[[2020,1,1]]},"abstract":"<jats:title>Abstract<\/jats:title>\n               <jats:sec>\n                  <jats:title>Summary<\/jats:title>\n                  <jats:p>Quantitative structure\u2013activity relationship (QSAR) modelling is currently used in multiple fields to relate structural properties of compounds to their biological activities. This technique is also used for drug design purposes with the aim of predicting parameters that determine drug behaviour. To this end, a sophisticated process, involving various analytical steps concatenated in series, is employed to identify and fine-tune the optimal set of predictors from a large dataset of molecular descriptors (MDs). 
The search for the optimal model requires optimizing multiple objectives at the same time, as the aim is to obtain the minimal set of features that maximizes both the goodness of fit and the applicability domain (AD). Hence, a multi-objective optimization strategy, improving multiple parameters in parallel, can be applied. Here we propose a new multi-niche multi-objective genetic algorithm that simultaneously performs stable feature selection and yields robust, validated regression models with maximized AD. We benchmarked our method on two simulated datasets. Moreover, we analyzed an aquatic acute toxicity dataset and compared the performance of single- and multi-objective fitness functions on different regression models. Our results show that our multi-objective algorithm is a valid alternative to the classical QSAR modelling strategy for continuous response values, since it automatically finds the model with the best compromise among statistical robustness, predictive performance, widest AD and smallest number of MDs.<\/jats:p>\n               <\/jats:sec>\n               <jats:sec>\n                  <jats:title>Availability and implementation<\/jats:title>\n                  <jats:p>The Python implementation of MaNGA is available at https:\/\/github.com\/Greco-Lab\/MaNGA.<\/jats:p>\n               <\/jats:sec>\n               <jats:sec>\n                  <jats:title>Supplementary information<\/jats:title>\n                  <jats:p>Supplementary data are available at Bioinformatics online.<\/jats:p>\n               <\/jats:sec>","DOI":"10.1093\/bioinformatics\/btz521","type":"journal-article","created":{"date-parts":[[2019,6,19]],"date-time":"2019-06-19T11:10:32Z","timestamp":1560942632000},"page":"145-153","source":"Crossref","is-referenced-by-count":20,"title":["MaNGA: a novel multi-niche multi-objective genetic algorithm for QSAR 
modelling"],"prefix":"10.1093","volume":"36","author":[{"given":"Angela","family":"Serra","sequence":"first","affiliation":[{"name":"Faculty of Medicine and Health Technology, Tampere University , Tampere 33200, Finland"}]},{"given":"Serli","family":"\u00d6nl\u00fc","sequence":"additional","affiliation":[{"name":"Faculty of Medicine and Health Technology, Tampere University , Tampere 33200, Finland"}]},{"given":"Paola","family":"Festa","sequence":"additional","affiliation":[{"name":"Department of Mathematics and Applications, University of Napoli Federico II , Naples 80138, Italy"}]},{"given":"Vittorio","family":"Fortino","sequence":"additional","affiliation":[{"name":"Institute of Biomedicine, University of Eastern Finland , Kuopio, 80101 Finland"}]},{"given":"Dario","family":"Greco","sequence":"additional","affiliation":[{"name":"Faculty of Medicine and Health Technology, Tampere University , Tampere 33200, Finland"},{"name":"Institute of Biotechnology, University of Helsinki , Helsinki, 00014 Finland"},{"name":"BioMediTech Institute, Tampere University , Tampere 33200, Finland"}]}],"member":"286","published-online":{"date-parts":[[2019,6,24]]},"reference":[{"key":"2023013109450819600_btz521-B1","doi-asserted-by":"crossref","first-page":"385","DOI":"10.1002\/qsar.200430909","article-title":"The better predictive model: high q2 for the training set or low root mean square error of prediction for the test set?","volume":"24","author":"Aptula","year":"2005","journal-title":"QSAR Comb. Sci"},{"key":"2023013109450819600_btz521-B2","doi-asserted-by":"crossref","first-page":"2467","DOI":"10.1021\/acs.jcim.8b00378","article-title":"Multi-objective genetic algorithm (MOGA) as a feature selecting strategy in the development of ionic liquids\u2019 quantitative toxicity\u2013toxicity relationship models","volume":"58","author":"Barycki","year":"2018","journal-title":"J. Chem. Inf. 
Model"},{"key":"2023013109450819600_btz521-B3","first-page":"203","article-title":"Support vector regression","volume":"11","author":"Basak","year":"2007","journal-title":"Neural Inf. Proc. Let. Rev"},{"key":"2023013109450819600_btz521-B4","doi-asserted-by":"crossref","first-page":"217","DOI":"10.1080\/1062936X.2015.1018938","article-title":"A similarity-based QSAR model for predicting acute toxicity towards the fathead minnow (Pimephales promelas)","volume":"26","author":"Cassotti","year":"2015","journal-title":"SAR QSAR Environ. Res"},{"key":"2023013109450819600_btz521-B5","doi-asserted-by":"crossref","first-page":"4977","DOI":"10.1021\/jm4004285","article-title":"QSAR modeling: where have you been? Where are you going to?","volume":"57","author":"Cherkasov","year":"2014","journal-title":"J. Med. Chem"},{"key":"2023013109450819600_btz521-B6","doi-asserted-by":"crossref","first-page":"2044","DOI":"10.1021\/ci300084j","article-title":"Real external predictivity of QSAR models. Part 2. New intercomparable thresholds for different validation criteria and the need for scatter plot inspection","volume":"52","author":"Chirico","year":"2012","journal-title":"J. Chem. Inf. Model"},{"key":"2023013109450819600_btz521-B7","doi-asserted-by":"crossref","first-page":"1669","DOI":"10.1021\/ci900115y","article-title":"Comments on the definition of the Q2 parameter for QSAR validation","volume":"49","author":"Consonni","year":"2009","journal-title":"J. Chem. Inf. Model"},{"key":"2023013109450819600_btz521-B8","doi-asserted-by":"crossref","first-page":"194","DOI":"10.1002\/cem.1290","article-title":"Evaluation of model predictive ability by external validation techniques","volume":"24","author":"Consonni","year":"2010","journal-title":"J. 
Chemometrics"},{"key":"2023013109450819600_btz521-B9","doi-asserted-by":"crossref","first-page":"182","DOI":"10.1109\/4235.996017","article-title":"A fast and elitist multiobjective genetic algorithm: NSGA-II","volume":"6","author":"Deb","year":"2002","journal-title":"IEEE Trans. Evol. Comput"},{"key":"2023013109450819600_btz521-B10","doi-asserted-by":"crossref","first-page":"173","DOI":"10.1002\/minf.201100142","article-title":"Benchmarking variable selection in QSAR","volume":"31","author":"Eklund","year":"2012","journal-title":"Mol. Inf"},{"key":"2023013109450819600_btz521-B11","first-page":"2171","article-title":"DEAP: evolutionary algorithms made easy","volume":"13","author":"Fortin","year":"2012","journal-title":"J. Mach. Learn. Res"},{"key":"2023013109450819600_btz521-B12","doi-asserted-by":"crossref","first-page":"e107801","DOI":"10.1371\/journal.pone.0107801","article-title":"A robust and accurate method for feature selection and prioritization from multi-class omics data","volume":"9","author":"Fortino","year":"2014","journal-title":"PLoS One"},{"key":"2023013109450819600_btz521-B13","doi-asserted-by":"crossref","DOI":"10.1017\/CBO9780511815867","volume-title":"Statistical Models: Theory and Practice","author":"Freedman","year":"2009"},{"key":"2023013109450819600_btz521-B14","doi-asserted-by":"crossref","first-page":"3762","DOI":"10.1021\/jp980230o","article-title":"Prediction of hydrophobic (lipophilic) properties of small organic molecules using fragmental methods: an analysis of ALOGP and CLOGP methods","volume":"102","author":"Ghose","year":"1998","journal-title":"J. Phys. Chem. A"},{"key":"2023013109450819600_btz521-B15","doi-asserted-by":"crossref","first-page":"269","DOI":"10.1016\/S1093-3263(01)00123-1","article-title":"Beware of q2!","volume":"20","author":"Golbraikh","year":"2002","journal-title":"J. Mol. Graph. 
Model"},{"key":"2023013109450819600_btz521-B16","doi-asserted-by":"crossref","first-page":"636","DOI":"10.5740\/jaoacint.SGE_Goodarzi","article-title":"Feature selection methods in QSAR studies","volume":"95","author":"Goodarzi","year":"2012","journal-title":"J. AOAC Int"},{"key":"2023013109450819600_btz521-B17","first-page":"694","article-title":"Principles of QSAR models validation: internal and external","volume":"26","author":"Gramatica","year":"2007","journal-title":"Mol. Inf"},{"key":"2023013109450819600_btz521-B18","first-page":"499","volume-title":"Computational Toxicology. Methods in Molecular Biology (Methods and Protocols)","author":"Gramatica","year":"2013"},{"key":"2023013109450819600_btz521-B19","doi-asserted-by":"crossref","first-page":"2121","DOI":"10.1002\/jcc.23361","article-title":"QSARINS: a new software for the development, analysis, and validation of QSAR MLR models","volume":"34","author":"Gramatica","year":"2013","journal-title":"J. Comput. Chem"},{"key":"2023013109450819600_btz521-B20","first-page":"1157","article-title":"An introduction to variable and feature selection","volume":"3","author":"Guyon","year":"2003","journal-title":"J. Mach. Learn. Res"},{"key":"2023013109450819600_btz521-B21","doi-asserted-by":"crossref","first-page":"503","DOI":"10.1016\/j.jmgm.2005.03.003","article-title":"Assessing the reliability of a QSAR model\u2019s predictions","volume":"23","author":"He","year":"2005","journal-title":"J. Mol. Graph. Model"},{"key":"2023013109450819600_btz521-B22","doi-asserted-by":"crossref","first-page":"95","DOI":"10.1007\/s10115-006-0040-8","article-title":"Stability of feature selection algorithms: a study on high-dimensional spaces","volume":"12","author":"Kalousis","year":"2007","journal-title":"Knowl. Inf. 
Syst"},{"key":"2023013109450819600_btz521-B23","doi-asserted-by":"crossref","first-page":"D1202","DOI":"10.1093\/nar\/gkv951","article-title":"PubChem substance and compound databases","volume":"44","author":"Kim","year":"2016","journal-title":"Nucleic Acids Res"},{"key":"2023013109450819600_btz521-B24","doi-asserted-by":"crossref","first-page":"992","DOI":"10.1016\/j.ress.2005.11.018","article-title":"Multi-objective optimization using genetic algorithms: a tutorial","volume":"91","author":"Konak","year":"2006","journal-title":"Reliab. Eng. Syst. Saf"},{"key":"2023013109450819600_btz521-B25","doi-asserted-by":"crossref","first-page":"464","DOI":"10.1016\/S1093-3263(00)00068-1","article-title":"A widely applicable set of descriptors","volume":"18","author":"Labute","year":"2000","journal-title":"J. Mol. Graph. Model"},{"key":"2023013109450819600_btz521-B26","doi-asserted-by":"crossref","first-page":"1929","DOI":"10.1126\/science.1132939","article-title":"The connectivity map: using gene-expression signatures to connect small molecules, genes, and disease","volume":"313","author":"Lamb","year":"2006","journal-title":"Science"},{"key":"2023013109450819600_btz521-B27","doi-asserted-by":"crossref","first-page":"559","DOI":"10.1002\/cem.651","article-title":"Genetic algorithms in chemometrics and chemistry: a review","volume":"15","author":"Leardi","year":"2001","journal-title":"J. Chemometrics"},{"key":"2023013109450819600_btz521-B28","doi-asserted-by":"crossref","first-page":"1823","DOI":"10.1021\/ci049875d","article-title":"A comparative study on feature selection methods for drug discovery","volume":"44","author":"Liu","year":"2004","journal-title":"J. Chem. Inf. Comput. 
Sci"},{"key":"2023013109450819600_btz521-B29","doi-asserted-by":"crossref","first-page":"245","DOI":"10.1080\/1062936X.2015.1018939","article-title":"Comparison of global and mode of action-based models for aquatic toxicity","volume":"26","author":"Martin","year":"2015","journal-title":"SAR QSAR Environ. Res"},{"key":"2023013109450819600_btz521-B30","first-page":"237","article-title":"Dragon software: an easy approach to molecular descriptor calculations","volume":"56","author":"Mauri","year":"2006","journal-title":"Match"},{"key":"2023013109450819600_btz521-B31","doi-asserted-by":"crossref","first-page":"127","DOI":"10.1248\/cpb.40.127","article-title":"Simple method of calculating octanol\/water partition coefficient","volume":"40","author":"Moriguchi","year":"1992","journal-title":"Chem. Pharm. Bull"},{"key":"2023013109450819600_btz521-B32","doi-asserted-by":"crossref","first-page":"5069","DOI":"10.1021\/jm020919o","article-title":"Multiobjective optimization in quantitative structure\u2013activity relationships: deriving accurate and interpretable QSARs","volume":"45","author":"Nicolotti","year":"2002","journal-title":"J. Med. Chem"},{"key":"2023013109450819600_btz521-B33","article-title":"\u2013","year":"2014"},{"key":"2023013109450819600_btz521-B34","doi-asserted-by":"crossref","first-page":"1256","DOI":"10.1021\/ci050212l","article-title":"Statistically validated QSARS, based on theoretical descriptors, for modeling aquatic toxicity of organic chemicals in Pimephales promelas (fathead minnow)","volume":"45","author":"Papa","year":"2005","journal-title":"J. Chem. Inf. Model"},{"key":"2023013109450819600_btz521-B35","doi-asserted-by":"crossref","first-page":"1567","DOI":"10.1517\/17460441.2.12.1567","article-title":"On some aspects of validation of predictive quantitative structure\u2013activity relationship models","volume":"2","author":"Roy","year":"2007","journal-title":"Expert Opin. 
Drug Discov"},{"key":"2023013109450819600_btz521-B36","doi-asserted-by":"crossref","first-page":"2140","DOI":"10.1021\/ci800253u","article-title":"External validation and prediction employing the predictive squared correlation coefficient test set activity mean vs training set activity mean","volume":"48","author":"Sch\u00fc\u00fcrmann","year":"2008","journal-title":"J. Chem. Inf. Model"},{"key":"2023013109450819600_btz521-B37","doi-asserted-by":"crossref","first-page":"186","DOI":"10.1021\/ci000066d","article-title":"QSAR models using a large diverse set of estrogens","volume":"41","author":"Shi","year":"2001","journal-title":"J. Chem. Inf. Comput. Sci"},{"key":"2023013109450819600_btz521-B38","doi-asserted-by":"crossref","first-page":"1509","DOI":"10.1002\/qsar.200960053","article-title":"Multi-objective feature selection in QSAR using a machine learning approach","volume":"28","author":"Soto","year":"2009","journal-title":"QSAR Comb. Sci"},{"key":"2023013109450819600_btz521-B39","doi-asserted-by":"crossref","first-page":"1947","DOI":"10.1021\/ci034160g","article-title":"Random forest: a classification and regression tool for compound classification and QSAR modeling","volume":"43","author":"Svetnik","year":"2003","journal-title":"J. Chem. Inf. Comput. Sci"},{"key":"2023013109450819600_btz521-B40","volume-title":"Molecular Descriptors for Chemoinformatics: Volume I: Alphabetical Listing\/Volume II: Appendices, References. Methods and Principles in Medicinal Chemistry","author":"Todeschini","year":"2009"},{"key":"2023013109450819600_btz521-B41","doi-asserted-by":"crossref","first-page":"1006","DOI":"10.1021\/jm00280a002","article-title":"Utilization of operational schemes for analog synthesis in drug design","volume":"15","author":"Topliss","year":"1972","journal-title":"J. Med. 
Chem"},{"key":"2023013109450819600_btz521-B42","doi-asserted-by":"crossref","first-page":"476","DOI":"10.1002\/minf.201000061","article-title":"Best practices for QSAR model development, validation, and exploitation","volume":"29","author":"Tropsha","year":"2010","journal-title":"Mol. Inf"},{"key":"2023013109450819600_btz521-B43","doi-asserted-by":"crossref","first-page":"3494","DOI":"10.2174\/138161207782794257","article-title":"Predictive QSAR modeling workflow, model applicability domains, and virtual screening","volume":"13","author":"Tropsha","year":"2007","journal-title":"Curr. Pharm. Des"},{"key":"2023013109450819600_btz521-B44","doi-asserted-by":"crossref","first-page":"471","DOI":"10.1016\/0045-6535(92)90280-5","article-title":"Classifying environmental pollutants","volume":"25","author":"Verhaar","year":"1992","journal-title":"Chemosphere"},{"key":"2023013109450819600_btz521-B45","doi-asserted-by":"crossref","first-page":"1218","DOI":"10.1021\/ci010291a","article-title":"Toward an optimal procedure for variable selection and QSAR model building","volume":"41","author":"Yasri","year":"2001","journal-title":"J. Chem. Inf. Comput. 
Sci"},{"key":"2023013109450819600_btz521-B46","first-page":"908","article-title":"Semi-supervised regression with co-training","volume":"5","author":"Zhou","year":"2005","journal-title":"IJCAI"}],"container-title":["Bioinformatics"],"original-title":[],"language":"en","link":[{"URL":"http:\/\/academic.oup.com\/bioinformatics\/advance-article-pdf\/doi\/10.1093\/bioinformatics\/btz521\/28969048\/btz521.pdf","content-type":"application\/pdf","content-version":"am","intended-application":"syndication"},{"URL":"https:\/\/academic.oup.com\/bioinformatics\/article-pdf\/36\/1\/145\/48981352\/bioinformatics_36_1_145.pdf","content-type":"application\/pdf","content-version":"vor","intended-application":"syndication"},{"URL":"https:\/\/academic.oup.com\/bioinformatics\/article-pdf\/36\/1\/145\/48981352\/bioinformatics_36_1_145.pdf","content-type":"unspecified","content-version":"vor","intended-application":"similarity-checking"}],"deposited":{"date-parts":[[2023,1,31]],"date-time":"2023-01-31T18:26:19Z","timestamp":1675189579000},"score":1,"resource":{"primary":{"URL":"https:\/\/academic.oup.com\/bioinformatics\/article\/36\/1\/145\/5522367"}},"subtitle":[],"editor":[{"given":"Yann","family":"Ponty","sequence":"additional","affiliation":[]}],"short-title":[],"issued":{"date-parts":[[2019,6,24]]},"references-count":46,"journal-issue":{"issue":"1","published-print":{"date-parts":[[2020,1,1]]}},"URL":"https:\/\/doi.org\/10.1093\/bioinformatics\/btz521","relation":{},"ISSN":["1367-4803","1367-4811"],"issn-type":[{"value":"1367-4803","type":"print"},{"value":"1367-4811","type":"electronic"}],"subject":[],"published-other":{"date-parts":[[2020,1,1]]},"published":{"date-parts":[[2019,6,24]]}}}