{"status":"ok","message-type":"work","message-version":"1.0.0","message":{"indexed":{"date-parts":[[2026,3,4]],"date-time":"2026-03-04T04:47:31Z","timestamp":1772599651061,"version":"3.50.1"},"reference-count":65,"publisher":"IOP Publishing","issue":"3","license":[{"start":{"date-parts":[[2021,7,14]],"date-time":"2021-07-14T00:00:00Z","timestamp":1626220800000},"content-version":"vor","delay-in-days":0,"URL":"http:\/\/creativecommons.org\/licenses\/by\/4.0"},{"start":{"date-parts":[[2021,7,14]],"date-time":"2021-07-14T00:00:00Z","timestamp":1626220800000},"content-version":"tdm","delay-in-days":0,"URL":"https:\/\/iopscience.iop.org\/info\/page\/text-and-data-mining"}],"funder":[{"DOI":"10.13039\/100010663","name":"H2020 European Research Council","doi-asserted-by":"crossref","award":["677013-HBMAP"],"award-info":[{"award-number":["677013-HBMAP"]}],"id":[{"id":"10.13039\/100010663","id-type":"DOI","asserted-by":"crossref"}]},{"DOI":"10.13039\/501100000727","name":"Trinity College, University of Cambridge","doi-asserted-by":"crossref","id":[{"id":"10.13039\/501100000727","id-type":"DOI","asserted-by":"crossref"}]},{"name":"Swiss National Supercomputing Centre","award":["s1000"],"award-info":[{"award-number":["s1000"]}]}],"content-domain":{"domain":["iopscience.iop.org"],"crossmark-restriction":false},"short-container-title":["Mach. Learn.: Sci. Technol."],"published-print":{"date-parts":[[2021,9,1]]},"abstract":"<jats:title>Abstract<\/jats:title>\n               <jats:p>Selecting the most relevant features and samples out of a large set of candidates is a task that occurs very often in the context of automated data analysis, where it improves the computational performance and often the transferability of a model. 
Here we focus on two popular subselection schemes applied to this end: CUR decomposition, derived from a low-rank approximation of the feature matrix, and farthest point sampling (FPS), which relies on the iterative identification of the most diverse samples and discriminating features. We modify these unsupervised approaches, incorporating a supervised component following the same spirit as the principal covariates (PCov) regression method. We show how this results in selections that perform better in supervised tasks, demonstrating with models of increasing complexity, from ridge regression to kernel ridge regression and finally feed-forward neural networks. We also present adjustments to minimise the impact of any subselection when performing unsupervised tasks. We demonstrate the significant improvements associated with PCov-CUR and PCov-FPS selections for applications to chemistry and materials science, typically reducing by a factor of two the number of features and samples required to achieve a given level of regression accuracy.<\/jats:p>","DOI":"10.1088\/2632-2153\/abfe7c","type":"journal-article","created":{"date-parts":[[2021,5,6]],"date-time":"2021-05-06T22:19:00Z","timestamp":1620339540000},"page":"035038","update-policy":"https:\/\/doi.org\/10.1088\/crossmark-policy","source":"Crossref","is-referenced-by-count":34,"title":["Improving sample and feature selection with principal covariates regression"],"prefix":"10.1088","volume":"2","author":[{"ORCID":"https:\/\/orcid.org\/0000-0003-4515-3441","authenticated-orcid":false,"given":"Rose K","family":"Cersonsky","sequence":"first","affiliation":[]},{"ORCID":"https:\/\/orcid.org\/0000-0002-2260-7183","authenticated-orcid":false,"given":"Benjamin A","family":"Helfrecht","sequence":"additional","affiliation":[]},{"ORCID":"https:\/\/orcid.org\/0000-0003-2944-9445","authenticated-orcid":false,"given":"Edgar 
A","family":"Engel","sequence":"additional","affiliation":[]},{"ORCID":"https:\/\/orcid.org\/0000-0001-8326-325X","authenticated-orcid":false,"given":"Sergei","family":"Kliavinek","sequence":"additional","affiliation":[]},{"ORCID":"https:\/\/orcid.org\/0000-0003-2571-2832","authenticated-orcid":false,"given":"Michele","family":"Ceriotti","sequence":"additional","affiliation":[]}],"member":"266","published-online":{"date-parts":[[2021,7,14]]},"reference":[{"key":"mlstabfe7cbib1","doi-asserted-by":"publisher","first-page":"235","DOI":"10.1214\/ss\/1042727940","article-title":"Statistical fraud detection: a review","volume":"17","author":"Bolton","year":"2002","journal-title":"Stat. Sci."},{"key":"mlstabfe7cbib2","doi-asserted-by":"publisher","first-page":"654","DOI":"10.1016\/j.ejor.2017.11.054","article-title":"Deep learning with long short-term memory networks for financial market predictions","volume":"270","author":"Fischer","year":"2018","journal-title":"Eur. J. Oper. Res."},{"key":"mlstabfe7cbib3","doi-asserted-by":"publisher","first-page":"543","DOI":"10.1016\/S0167-9236(03)00086-1","article-title":"Credit rating analysis with support vector machines and neural networks: a market comparative study","volume":"37","author":"Huang","year":"2004","journal-title":"Decis. Support Syst."},{"key":"mlstabfe7cbib4","doi-asserted-by":"publisher","first-page":"2639","DOI":"10.1016\/j.eswa.2007.05.019","article-title":"Using neural network ensembles for bankruptcy prediction and credit scoring","volume":"34","author":"Tsai","year":"2008","journal-title":"Expert Syst. Appl."},{"key":"mlstabfe7cbib5","doi-asserted-by":"publisher","first-page":"389","DOI":"10.1023\/A:1012487302797","article-title":"Gene selection for cancer classification using support vector machines","volume":"46","author":"Guyon","year":"2002","journal-title":"Mach. 
Learn."},{"key":"mlstabfe7cbib6","doi-asserted-by":"publisher","first-page":"11","DOI":"10.1371\/journal.pone.0079476","article-title":"Extreme learning machine-based classification of ADHD using brain structural MRI data","volume":"8","author":"Peng","year":"2013","journal-title":"PLoS One"},{"key":"mlstabfe7cbib7","doi-asserted-by":"publisher","first-page":"18","DOI":"10.1038\/s41746-018-0029-1","article-title":"Scalable and accurate deep learning with electronic health records","volume":"1","author":"Rajkomar","year":"2018","journal-title":"NPJ Digital Med."},{"key":"mlstabfe7cbib8","doi-asserted-by":"publisher","first-page":"15","DOI":"10.1186\/s13059-017-1382-0","article-title":"SCANPY: large-scale single-cell gene expression data analysis","volume":"19","author":"Wolf","year":"2018","journal-title":"Genome Biol."},{"key":"mlstabfe7cbib9","doi-asserted-by":"publisher","first-page":"24","DOI":"10.1016\/j.isprsjprs.2016.01.011","article-title":"Random forest in remote sensing: a review of applications and future directions","volume":"114","author":"Belgiu","year":"2016","journal-title":"ISPRS J. Photogramm. Remote Sens."},{"key":"mlstabfe7cbib10","doi-asserted-by":"publisher","first-page":"446","DOI":"10.1016\/j.neuroimage.2013.10.027","article-title":"MNE software for processing MEG and EEG data","volume":"86","author":"Gramfort","year":"2014","journal-title":"Neuroimage"},{"key":"mlstabfe7cbib11","doi-asserted-by":"publisher","first-page":"247","DOI":"10.1016\/j.isprsjprs.2010.11.001","article-title":"Support vector machines in remote sensing: a review","volume":"66","author":"Mountrakis","year":"2011","journal-title":"ISPRS J. Photogramm. Remote Sens."},{"key":"mlstabfe7cbib12","doi-asserted-by":"publisher","first-page":"196","DOI":"10.1016\/j.chroma.2007.05.024","article-title":"Supervised pattern recognition in food analysis","volume":"1158","author":"Berrueta","year":"2007","journal-title":"J. Chromatogr. 
A"},{"key":"mlstabfe7cbib13","doi-asserted-by":"publisher","DOI":"10.1038\/srep42717","article-title":"SwissADME: a free web tool to evaluate pharmacokinetics, drug-likeness and medicinal chemistry friendliness of small molecules","volume":"7","author":"Daina","year":"2017","journal-title":"Sci. Rep."},{"key":"mlstabfe7cbib14","doi-asserted-by":"publisher","first-page":"1528","DOI":"10.1016\/j.bpj.2015.08.015","article-title":"MDTraj: a modern open library for the analysis of molecular dynamics trajectories","volume":"109","author":"McGibbon","year":"2015","journal-title":"Biophys. J."},{"key":"mlstabfe7cbib15","doi-asserted-by":"publisher","first-page":"245","DOI":"10.1016\/S0004-3702(97)00063-5","article-title":"Selection of relevant features and examples in machine learning","volume":"97","author":"Blum","year":"1997","journal-title":"Artif. Intell."},{"key":"mlstabfe7cbib16","doi-asserted-by":"publisher","first-page":"94","DOI":"10.1145\/3136625","article-title":"Feature selection: a data perspective","volume":"50","author":"Li","year":"2018","journal-title":"ACM Comput. 
Surv."},{"key":"mlstabfe7cbib17","doi-asserted-by":"publisher","first-page":"5","DOI":"10.1016\/j.neucom.2018.02.100","article-title":"Review of classical dimensionality reduction and sample selection methods for large-scale data processing","volume":"328","author":"Xu","year":"2019","journal-title":"Neurocomputing"},{"key":"mlstabfe7cbib18","doi-asserted-by":"publisher","first-page":"637","DOI":"10.1137\/S0036144599352836","article-title":"Centroidal Voronoi tessellations: applications and algorithms","volume":"41","author":"Du","year":"1999","journal-title":"SIAM Rev."},{"key":"mlstabfe7cbib19","doi-asserted-by":"publisher","first-page":"410","DOI":"10.1016\/j.artint.2010.01.001","article-title":"Democratic instance selection: a linear complexity instance selection algorithm based on classifier ensemble concepts","volume":"174","author":"Garc\u00eda-Osorio","year":"2010","journal-title":"Artif. Intell."},{"key":"mlstabfe7cbib20","doi-asserted-by":"publisher","first-page":"38","DOI":"10.1186\/s12711-015-0116-6","article-title":"Optimization of genomic selection training populations with a genetic algorithm","volume":"47","author":"Akdemir","year":"2015","journal-title":"Genet. Selection Evol."},{"key":"mlstabfe7cbib21","doi-asserted-by":"publisher","first-page":"1491","DOI":"10.1109\/TKDE.2011.67","article-title":"Maximum ambiguity-based sample selection in fuzzy decision tree induction","volume":"24","author":"Wang","year":"2012","journal-title":"IEEE Trans. Knowl. Data Eng."},{"key":"mlstabfe7cbib22","first-page":"1553\u20131","author":"Widrow","year":"1960"},{"key":"mlstabfe7cbib23","doi-asserted-by":"publisher","first-page":"1358","DOI":"10.1109\/72.963772","article-title":"Sensitivity analysis of multilayer perceptron to input and weight perturbations","volume":"12","author":"Zeng","year":"2001","journal-title":"IEEE Trans. 
Neural Netw."},{"key":"mlstabfe7cbib24","first-page":"2593","article-title":"Input sample selection for RBF neural network classification problems using sensitivity measure","volume":"3","author":"Ng","year":"2003"},{"key":"mlstabfe7cbib25","doi-asserted-by":"publisher","first-page":"515","DOI":"10.1109\/TIT.1968.1054155","article-title":"The condensed nearest neighbor rule","volume":"14","author":"Hart","year":"1968","journal-title":"IEEE Trans. Inf. Theory"},{"key":"mlstabfe7cbib26","first-page":"455","article-title":"On sensor evolution in robotics","volume":"98","author":"Balakrishnan","year":"1996"},{"key":"mlstabfe7cbib27","doi-asserted-by":"publisher","first-page":"185","DOI":"10.1142\/S0219720005001004","article-title":"Minimum redundancy feature selection from microarray gene expression data","volume":"03","author":"Ding","year":"2005","journal-title":"J. Bioinform. Computat. Biol."},{"key":"mlstabfe7cbib28","doi-asserted-by":"publisher","first-page":"169","DOI":"10.1007\/s10479-008-0506-z","article-title":"Optimizing feature selection to improve medical diagnosis","volume":"174","author":"Fan","year":"2010","journal-title":"Ann. Oper. Res."},{"key":"mlstabfe7cbib29","doi-asserted-by":"publisher","first-page":"29","DOI":"10.1016\/j.compbiolchem.2007.09.005","article-title":"Improved binary PSO for feature selection using gene expression data","volume":"32","author":"Chuang","year":"2008","journal-title":"Computat. Biol. Chem."},{"key":"mlstabfe7cbib30","doi-asserted-by":"publisher","first-page":"155","DOI":"10.1016\/0169-7439(92)80100-I","article-title":"Principal covariates regression: part I. Theory","author":"de Jong","year":"1992"},{"key":"mlstabfe7cbib31","doi-asserted-by":"publisher","first-page":"765","DOI":"10.1021\/acs.jctc.5b01006","article-title":"Ab Initio quality NMR parameters in solid-state materials using a high-dimensional neural-network representation","volume":"12","author":"Cuny","year":"2016","journal-title":"J. Chem. 
Theory Comput."},{"key":"mlstabfe7cbib32","doi-asserted-by":"publisher","first-page":"4501","DOI":"10.1038\/s41467-018-06972-x","article-title":"Chemical shifts in molecular solids by machine learning","volume":"9","author":"Paruzzo","year":"2018","journal-title":"Nat. Commun."},{"key":"mlstabfe7cbib33","doi-asserted-by":"publisher","DOI":"10.1103\/PhysRevLett.98.146401","article-title":"Generalized neural-network representation of high-dimensional potential-energy surfaces","volume":"98","author":"Behler","year":"2007","journal-title":"Phys. Rev. Lett."},{"key":"mlstabfe7cbib34","doi-asserted-by":"publisher","DOI":"10.1103\/PhysRevLett.104.136403","article-title":"Gaussian approximation potentials: the accuracy of quantum mechanics, without the electrons","volume":"104","author":"Bart\u00f3k","year":"2010","journal-title":"Phys. Rev. Lett."},{"key":"mlstabfe7cbib35","doi-asserted-by":"publisher","DOI":"10.1103\/PhysRevLett.108.058301","article-title":"Fast and accurate modeling of molecular atomization energies with machine learning","volume":"108","author":"Rupp","year":"2012","journal-title":"Phys. Rev. Lett."},{"key":"mlstabfe7cbib36","doi-asserted-by":"publisher","DOI":"10.1088\/2632-2153\/aba9ef","article-title":"Structure-property maps with Kernel principal covariates regression","volume":"1","author":"Helfrecht","year":"2020","journal-title":"Mach. Learn.: Sci. Technol."},{"key":"mlstabfe7cbib37","doi-asserted-by":"publisher","first-page":"36","DOI":"10.1016\/j.chemolab.2013.02.005","article-title":"On the selection of the weighting parameter value in principal covariates regression","volume":"123","author":"Vervloet","year":"2013","journal-title":"Chemometr. Intell. Lab. Syst."},{"key":"mlstabfe7cbib38","doi-asserted-by":"publisher","first-page":"1","DOI":"10.18637\/jss.v065.i08","article-title":"PCovR: an R Package for principal covariates regression","volume":"65","author":"Vervloet","year":"2015","journal-title":"J. Stat. 
Software"},{"key":"mlstabfe7cbib39","doi-asserted-by":"publisher","first-page":"1305","DOI":"10.1109\/83.623193","article-title":"The farthest point strategy for progressive image sampling","volume":"6","author":"Eldar","year":"1997","journal-title":"IEEE Trans. Image Process."},{"key":"mlstabfe7cbib40","doi-asserted-by":"publisher","DOI":"10.1063\/1.5024611","article-title":"Automatic selection of atomic fingerprints and reference configurations for machine-learning potentials","volume":"148","author":"Imbalzano","year":"2018","journal-title":"J. Chem. Phys."},{"key":"mlstabfe7cbib41","doi-asserted-by":"publisher","first-page":"697","DOI":"10.1073\/pnas.0803205106","article-title":"CUR matrix decompositions for improved data analysis","volume":"106","author":"Mahoney","year":"2009","journal-title":"Proc. Natl Acad. Sci. USA"},{"key":"mlstabfe7cbib42","doi-asserted-by":"publisher","first-page":"403","DOI":"10.1007\/BF02163027","article-title":"Singular value decomposition and least squares solutions","volume":"14","author":"Golub","year":"1970","journal-title":"Numer. Math."},{"key":"mlstabfe7cbib43","doi-asserted-by":"publisher","first-page":"164","DOI":"10.1109\/TAC.1980.1102314","article-title":"The singular value decomposition: its computation and some applications","volume":"25","author":"Klema","year":"1980","journal-title":"IEEE Trans. Autom. Control"},{"key":"mlstabfe7cbib44","doi-asserted-by":"publisher","first-page":"31","DOI":"10.1007\/BF01396012","article-title":"Rank-one modification of the symmetric eigenproblem","volume":"31","author":"Bunch","year":"1978","journal-title":"Numer. Math."},{"key":"mlstabfe7cbib45","doi-asserted-by":"publisher","first-page":"1266","DOI":"10.1137\/S089547989223924X","article-title":"A stable and efficient algorithm for the rank-one modification of the symmetric eigenproblem","volume":"15","author":"Gu","year":"1994","journal-title":"SIAM J. Matrix Anal. 
Appl."},{"key":"mlstabfe7cbib46","doi-asserted-by":"publisher","first-page":"906","DOI":"10.1021\/acs.jctc.8b00959","article-title":"Fast and accurate uncertainty estimation in chemical machine learning","volume":"15","author":"Musil","year":"2019","journal-title":"J. Chem. Theory Comput."},{"key":"mlstabfe7cbib47","doi-asserted-by":"publisher","DOI":"10.1103\/PhysRevB.87.184115","article-title":"On representing chemical environments","volume":"87","author":"Bart\u00f3k","year":"2013","journal-title":"Phys. Rev. B"},{"key":"mlstabfe7cbib48","doi-asserted-by":"publisher","DOI":"10.1063\/5.0044689","article-title":"Efficient implementation of atom-density representations","volume":"154","author":"Musil","year":"2021","journal-title":"J. Chem. Phys."},{"key":"mlstabfe7cbib49","doi-asserted-by":"publisher","first-page":"23385","DOI":"10.1039\/C9CP04489B","article-title":"A Bayesian approach to NMR crystal structure determination","volume":"21","author":"Engel","year":"2019","journal-title":"Phys. Chem. Chem. Phys."},{"key":"mlstabfe7cbib50","doi-asserted-by":"publisher","DOI":"10.1063\/5.0016005","article-title":"Sensitivity and dimensionality of atomic environment representations used for machine learning interatomic potentials","volume":"153","author":"Onat","year":"2020","journal-title":"J. Chem. Phys."},{"key":"mlstabfe7cbib51","doi-asserted-by":"publisher","first-page":"9b","DOI":"10.1021\/acs.jpca.9b08723","article-title":"Performance and cost assessment of machine learning interatomic potentials","volume":"124","author":"Zuo","year":"2020","journal-title":"J. Phys. Chem. A"},{"key":"mlstabfe7cbib52","doi-asserted-by":"publisher","DOI":"10.1063\/1.3553717","article-title":"Atom-centered symmetry functions for constructing high-dimensional neural network potentials","volume":"134","author":"Behler","year":"2011a","journal-title":"J. Chem. 
Phys."},{"key":"mlstabfe7cbib53","doi-asserted-by":"publisher","DOI":"10.1126\/sciadv.1701816","article-title":"Machine learning unifies the modeling of materials and molecules","volume":"3","author":"Bart\u00f3k","year":"2017","journal-title":"Sci. Adv."},{"key":"mlstabfe7cbib54","author":"Rasmussen","year":"2005"},{"key":"mlstabfe7cbib55","doi-asserted-by":"publisher","first-page":"2","DOI":"10.1088\/2632-2153\/abdaf7","article-title":"The role of feature space in atomistic learning","volume":"2","author":"Goscinski","year":"2020","journal-title":"Mach. Learn.: Sci. Technol."},{"key":"mlstabfe7cbib56","doi-asserted-by":"publisher","first-page":"300","DOI":"10.2307\/2348005","article-title":"A note on the use of principal components in regression","volume":"31","author":"Jolliffe","year":"1982","journal-title":"J. R. Stat. Soc. Ser. C"},{"key":"mlstabfe7cbib57","doi-asserted-by":"publisher","DOI":"10.1063\/5.0021116","article-title":"Recursive evaluation and iterative contraction of N-body equivariant features","volume":"153","author":"Nigam","year":"2020","journal-title":"J. Chem. Phys."},{"key":"mlstabfe7cbib58","doi-asserted-by":"publisher","first-page":"17930","DOI":"10.1039\/c1cp21668f","article-title":"Neural network potential-energy surfaces in chemistry: a tool for large-scale simulations","volume":"13","author":"Behler","year":"2011b","journal-title":"Phys. Chem. Chem. Phys. PCCP"},{"key":"mlstabfe7cbib59","article-title":"A complete description of thermodynamic stabilities of molecular crystals","author":"Kapil","year":"2021"},{"key":"mlstabfe7cbib60","doi-asserted-by":"publisher","DOI":"10.24435\/materialscloud:vp-jf","article-title":"Semi-local and hybrid functional DFT data for thermalised snapshots of polymorphs of benzene, succinic acid and glycine","volume":"2021.51","author":"Engel","year":"2021","journal-title":"Mater. 
Cloud Arch."},{"key":"mlstabfe7cbib61","doi-asserted-by":"publisher","DOI":"10.1103\/PhysRevB.81.184107","article-title":"Ab initio quality neural-network potential for sodium","volume":"81","author":"Eshet","year":"2010","journal-title":"Phys. Rev. B"},{"key":"mlstabfe7cbib62","doi-asserted-by":"publisher","DOI":"10.1103\/PhysRevB.81.100103","article-title":"Graphite-diamond phase coexistence study employing a neural-network mapping of the ab initio potential energy surface","volume":"81","author":"Khaliullin","year":"2010","journal-title":"Phys. Rev. B"},{"key":"mlstabfe7cbib63","doi-asserted-by":"publisher","first-page":"693","DOI":"10.1038\/nmat3078","article-title":"Nucleation mechanism for the direct graphite-to-diamond phase transition","volume":"10","author":"Khaliullin","year":"2011","journal-title":"Nat. Mater."},{"key":"mlstabfe7cbib64","doi-asserted-by":"publisher","first-page":"1110","DOI":"10.1073\/pnas.1815117116","article-title":"Ab initio thermodynamics of liquid and solid water","volume":"116","author":"Cheng","year":"2019","journal-title":"Proc. Natl Acad. Sci. 
USA"},{"key":"mlstabfe7cbib65","doi-asserted-by":"publisher","DOI":"10.5281\/zenodo.4752370","author":"Cersonsky","year":"2021","journal-title":"scikit-cosmo"}],"container-title":["Machine Learning: Science and Technology"],"original-title":[],"link":[{"URL":"https:\/\/iopscience.iop.org\/article\/10.1088\/2632-2153\/abfe7c","content-type":"text\/html","content-version":"am","intended-application":"text-mining"},{"URL":"https:\/\/iopscience.iop.org\/article\/10.1088\/2632-2153\/abfe7c\/pdf","content-type":"application\/pdf","content-version":"am","intended-application":"text-mining"},{"URL":"https:\/\/iopscience.iop.org\/article\/10.1088\/2632-2153\/abfe7c","content-type":"text\/html","content-version":"vor","intended-application":"text-mining"},{"URL":"https:\/\/iopscience.iop.org\/article\/10.1088\/2632-2153\/abfe7c\/pdf","content-type":"application\/pdf","content-version":"vor","intended-application":"text-mining"},{"URL":"https:\/\/iopscience.iop.org\/article\/10.1088\/2632-2153\/abfe7c\/pdf","content-type":"application\/pdf","content-version":"am","intended-application":"syndication"},{"URL":"https:\/\/iopscience.iop.org\/article\/10.1088\/2632-2153\/abfe7c\/pdf","content-type":"application\/pdf","content-version":"vor","intended-application":"syndication"},{"URL":"https:\/\/iopscience.iop.org\/article\/10.1088\/2632-2153\/abfe7c\/pdf","content-type":"application\/pdf","content-version":"am","intended-application":"similarity-checking"},{"URL":"https:\/\/iopscience.iop.org\/article\/10.1088\/2632-2153\/abfe7c\/pdf","content-type":"application\/pdf","content-version":"vor","intended-application":"similarity-checking"}],"deposited":{"date-parts":[[2021,12,12]],"date-time":"2021-12-12T17:57:38Z","timestamp":1639331858000},"score":1,"resource":{"primary":{"URL":"https:\/\/iopscience.iop.org\/article\/10.1088\/2632-2153\/abfe7c"}},"subtitle":[],"short-title":[],"issued":{"date-parts":[[2021,7,14]]},"references-count":65,"journal-issue":{"issue":"3","published-online
":{"date-parts":[[2021,7,14]]},"published-print":{"date-parts":[[2021,9,1]]}},"URL":"https:\/\/doi.org\/10.1088\/2632-2153\/abfe7c","relation":{},"ISSN":["2632-2153"],"issn-type":[{"value":"2632-2153","type":"electronic"}],"subject":[],"published":{"date-parts":[[2021,7,14]]},"assertion":[{"value":"Improving sample and feature selection with principal covariates regression","name":"article_title","label":"Article Title"},{"value":"Machine Learning: Science and Technology","name":"journal_title","label":"Journal Title"},{"value":"paper","name":"article_type","label":"Article Type"},{"value":"\u00a9 2021 The Author(s). Published by IOP Publishing Ltd","name":"copyright_information","label":"Copyright Information"},{"value":"2020-12-18","name":"date_received","label":"Date Received","group":{"name":"publication_dates","label":"Publication dates"}},{"value":"2021-05-06","name":"date_accepted","label":"Date Accepted","group":{"name":"publication_dates","label":"Publication dates"}},{"value":"2021-07-14","name":"date_epub","label":"Online publication date","group":{"name":"publication_dates","label":"Publication dates"}}]}}