{"status":"ok","message-type":"work","message-version":"1.0.0","message":{"indexed":{"date-parts":[[2026,1,14]],"date-time":"2026-01-14T00:17:32Z","timestamp":1768349852035,"version":"3.49.0"},"reference-count":38,"publisher":"Springer Science and Business Media LLC","issue":"2","license":[{"start":{"date-parts":[[2022,1,27]],"date-time":"2022-01-27T00:00:00Z","timestamp":1643241600000},"content-version":"tdm","delay-in-days":0,"URL":"https:\/\/creativecommons.org\/licenses\/by\/4.0"},{"start":{"date-parts":[[2022,1,27]],"date-time":"2022-01-27T00:00:00Z","timestamp":1643241600000},"content-version":"vor","delay-in-days":0,"URL":"https:\/\/creativecommons.org\/licenses\/by\/4.0"}],"funder":[{"DOI":"10.13039\/501100002341","name":"Academy of Finland","doi-asserted-by":"crossref","award":["326280"],"award-info":[{"award-number":["326280"]}],"id":[{"id":"10.13039\/501100002341","id-type":"DOI","asserted-by":"crossref"}]},{"DOI":"10.13039\/501100002341","name":"Academy of Finland","doi-asserted-by":"crossref","award":["326339"],"award-info":[{"award-number":["326339"]}],"id":[{"id":"10.13039\/501100002341","id-type":"DOI","asserted-by":"crossref"}]},{"name":"Finnish Grid and Cloud Infrastructure"},{"name":"Doctoral Programme in Computer Science at University of Helsinki"}],"content-domain":{"domain":["link.springer.com"],"crossmark-restriction":false},"short-container-title":["Data Min Knowl Disc"],"published-print":{"date-parts":[[2022,3]]},"abstract":"<jats:title>Abstract<\/jats:title><jats:p>Real-world datasets are often characterised by outliers; data items that do not follow the same structure as the rest of the data. These outliers might negatively influence modelling of the data. In data analysis it is, therefore, important to consider methods that are robust to outliers. In this paper we develop a robust regression method that finds the largest subset of data items that can be approximated using a sparse linear model to a given precision. We show that this can yield the best possible robustness to outliers. However, this problem is NP-hard and to solve it we present an efficient approximation algorithm, termed SLISE. Our method extends existing state-of-the-art robust regression methods, especially in terms of speed on high-dimensional datasets. We demonstrate our method by applying it to both synthetic and real-world regression problems.<\/jats:p>","DOI":"10.1007\/s10618-022-00819-2","type":"journal-article","created":{"date-parts":[[2022,1,27]],"date-time":"2022-01-27T17:02:34Z","timestamp":1643302954000},"page":"781-810","update-policy":"https:\/\/doi.org\/10.1007\/springer_crossmark_policy","source":"Crossref","is-referenced-by-count":5,"title":["Robust regression via error tolerance"],"prefix":"10.1007","volume":"36","author":[{"ORCID":"https:\/\/orcid.org\/0000-0002-7749-2918","authenticated-orcid":false,"given":"Anton","family":"Bj\u00f6rklund","sequence":"first","affiliation":[]},{"ORCID":"https:\/\/orcid.org\/0000-0002-4040-6967","authenticated-orcid":false,"given":"Andreas","family":"Henelius","sequence":"additional","affiliation":[]},{"ORCID":"https:\/\/orcid.org\/0000-0002-9623-6282","authenticated-orcid":false,"given":"Emilia","family":"Oikarinen","sequence":"additional","affiliation":[]},{"ORCID":"https:\/\/orcid.org\/0000-0001-9769-7163","authenticated-orcid":false,"given":"Kimmo","family":"Kallonen","sequence":"additional","affiliation":[]},{"ORCID":"https:\/\/orcid.org\/0000-0003-1819-1047","authenticated-orcid":false,"given":"Kai","family":"Puolam\u00e4ki","sequence":"additional","affiliation":[]}],"member":"297","published-online":{"date-parts":[[2022,1,27]]},"reference":[{"issue":"1","key":"819_CR1","doi-asserted-by":"publisher","first-page":"226","DOI":"10.1214\/12-AOAS575","volume":"7","author":"A Alfons","year":"2013","unstructured":"Alfons A, Croux C, Gelper S (2013) Sparse least trimmed squares regression for analyzing high-dimensional large data sets. Ann Appl Stat 7(1):226\u2013248. https:\/\/doi.org\/10.1214\/12-AOAS575","journal-title":"Ann Appl Stat"},{"issue":"1","key":"819_CR2","doi-asserted-by":"publisher","first-page":"181","DOI":"10.1016\/0304-3975(94)00254-G","volume":"147","author":"E Amaldi","year":"1995","unstructured":"Amaldi E, Kann V (1995) The complexity and approximability of finding maximum feasible subsystems of linear relations. Theor Comput Sci 147(1):181\u2013210. https:\/\/doi.org\/10.1016\/0304-3975(94)00254-G","journal-title":"Theor Comput Sci"},{"key":"819_CR3","doi-asserted-by":"publisher","DOI":"10.1007\/978-3-642-58412-1","volume-title":"Complexity and approximation: combinatorial optimization problems and their approximability properties","author":"G Ausiello","year":"1999","unstructured":"Ausiello G, Crescenzi P, Gambosi G, Kann V, Marchetti-Spaccamela A, Protasi M (1999) Complexity and approximation: combinatorial optimization problems and their approximability properties, 2nd edn. Springer, Berlin. https:\/\/doi.org\/10.1007\/978-3-642-58412-1","edition":"2"},{"key":"819_CR4","doi-asserted-by":"crossref","unstructured":"Barath D, Matas J (2018) Graph-cut RANSAC. In: Proceedings of the IEEE conference on computer vision and pattern recognition (CVPR). https:\/\/arxiv.org\/abs\/1706.00984v2","DOI":"10.1109\/CVPR.2018.00704"},{"key":"819_CR5","doi-asserted-by":"crossref","unstructured":"Barath D, Noskova J, Ivashechkin M, Matas J (2020) Magsac++, a fast, reliable and accurate robust estimator. In: Proceedings of the IEEE\/VF conference on computer vision and pattern recognition (CVPR). https:\/\/arxiv.org\/abs\/1912.05909","DOI":"10.1109\/CVPR42600.2020.00138"},{"key":"819_CR6","unstructured":"Bj\u00f6rklund A (2021) SLISE\u2014sparse linear subset explanations (Python version). https:\/\/github.com\/edahelsinki\/pyslise"},{"key":"819_CR7","doi-asserted-by":"publisher","unstructured":"Bj\u00f6rklund A, Henelius A, Oikarinen E, Kallonen K, Puolam\u00e4ki K (2019) Sparse robust regression for explaining classifiers. In: Discovery science. Springer, Berlin, pp 351\u2013366. https:\/\/doi.org\/10.1007\/978-3-030-33778-0_27","DOI":"10.1007\/978-3-030-33778-0_27"},{"key":"819_CR8","unstructured":"Bj\u00f6rklund A, Puolam\u00e4ki K, Henelius A (2021) SLISE\u2014sparse linear subset explanations (R version). https:\/\/github.com\/edahelsinki\/slise"},{"key":"819_CR9","doi-asserted-by":"crossref","unstructured":"Cohen G, Afshar S, Tapson J, van Schaik A (2017) EMNIST: an extension of MNIST to handwritten letters. arXiv:170205373https:\/\/arxiv.org\/abs\/1702.05373","DOI":"10.1109\/IJCNN.2017.7966217"},{"key":"819_CR10","unstructured":"Cortez P, Silva AMG (2008) Using data mining to predict secondary school student performance. In: Proceedings of 5th FUture BUsiness TEChnology Conference (FUBUTEC 2008)"},{"issue":"2","key":"819_CR11","doi-asserted-by":"publisher","first-page":"750","DOI":"10.1016\/j.snb.2007.09.060","volume":"129","author":"S De Vito","year":"2008","unstructured":"De Vito S, Massera E, Piga M, Martinotto L, Di Francia G (2008) On field calibration of an electronic nose for benzene estimation in an urban pollution monitoring scenario. Sens Actuators B Chem 129(2):750\u2013757. https:\/\/doi.org\/10.1016\/j.snb.2007.09.060","journal-title":"Sens Actuators B Chem"},{"key":"819_CR12","unstructured":"Donoho DL, Huber PJ (1983) The notion of breakdown point. A festschrift for Erich L Lehmann, pp 157\u2013184"},{"key":"819_CR13","unstructured":"Dua D, Graff C (2019) UCI machine learning repository. http:\/\/archive.ics.uci.edu\/ml"},{"issue":"1","key":"819_CR14","doi-asserted-by":"publisher","first-page":"338","DOI":"10.1080\/07350015.2019.1660177","volume":"39","author":"M Fernandes","year":"2021","unstructured":"Fernandes M, Guerre E, Horta E (2021) Smoothing quantile regressions. J Bus Econ Stat 39(1):338\u2013357. https:\/\/doi.org\/10.1080\/07350015.2019.1660177","journal-title":"J Bus Econ Stat"},{"key":"819_CR15","unstructured":"FGCI (2021) Finnish grid and cloud infrastructure. Urn:nbn:fi:research-infras-2016072533"},{"issue":"6","key":"819_CR16","doi-asserted-by":"publisher","first-page":"381","DOI":"10.1145\/358669.358692","volume":"24","author":"MA Fischler","year":"1981","unstructured":"Fischler MA, Bolles RC (1981) Random sample consensus: a paradigm for model fitting with applications to image analysis and automated cartography. Commun ACM 24(6):381\u2013395. https:\/\/doi.org\/10.1145\/358669.358692","journal-title":"Commun ACM"},{"issue":"11","key":"819_CR17","doi-asserted-by":"publisher","first-page":"3124","DOI":"10.1016\/j.csda.2005.06.005","volume":"50","author":"A Giloni","year":"2006","unstructured":"Giloni A, Simonoff JS, Sengupta B (2006) Robust weighted lad regression. Comput Stat Data Anal 50(11):3124\u20133140. https:\/\/doi.org\/10.1016\/j.csda.2005.06.005","journal-title":"Comput Stat Data Anal"},{"key":"819_CR18","doi-asserted-by":"publisher","first-page":"346","DOI":"10.1016\/j.commatsci.2018.07.052","volume":"154","author":"K Hamidieh","year":"2018","unstructured":"Hamidieh K (2018) A data-driven statistical model for predicting the critical temperature of a superconductor. Comput Mater Sci 154:346\u2013354. https:\/\/doi.org\/10.1016\/j.commatsci.2018.07.052","journal-title":"Comput Mater Sci"},{"key":"819_CR19","unstructured":"HIP CMS Experiment (2019) Helsinki OpenData Tuples. https:\/\/hot.hip.fi\/"},{"issue":"1","key":"819_CR20","doi-asserted-by":"publisher","first-page":"73","DOI":"10.1214\/aoms\/1177703732","volume":"35","author":"PJ Huber","year":"1964","unstructured":"Huber PJ (1964) Robust estimation of a location parameter. Ann Math Stat 35(1):73\u2013101. https:\/\/doi.org\/10.1214\/aoms\/1177703732","journal-title":"Ann Math Stat"},{"issue":"3","key":"819_CR21","doi-asserted-by":"publisher","first-page":"296","DOI":"10.1002\/wics.34","volume":"1","author":"M Hubert","year":"2009","unstructured":"Hubert M, Debruyne M (2009) Breakdown value. Wiley Interdiscip Rev Comput Stat 1(3):296\u2013302. https:\/\/doi.org\/10.1002\/wics.34","journal-title":"Wiley Interdiscip Rev Comput Stat"},{"issue":"4","key":"819_CR22","doi-asserted-by":"publisher","first-page":"143","DOI":"10.1257\/jep.15.4.143","volume":"15","author":"R Koenker","year":"2001","unstructured":"Koenker R, Hallock KF (2001) Quantile regression. J Econ Perspect 15(4):143\u2013156. https:\/\/doi.org\/10.1257\/jep.15.4.143","journal-title":"J Econ Perspect"},{"issue":"2","key":"819_CR23","doi-asserted-by":"publisher","first-page":"631","DOI":"10.1007\/s00180-016-0679-x","volume":"32","author":"M Koller","year":"2017","unstructured":"Koller M, Stahel WA (2017) Nonsingular subsampling for regression s estimators with categorical predictors. Comput Stat 32(2):631\u2013646. https:\/\/doi.org\/10.1007\/s00180-016-0679-x","journal-title":"Comput Stat"},{"key":"819_CR24","unstructured":"Maas AL, Daly RE, Pham PT, Huang D, Ng AY, Potts C (2011) Learning word vectors for sentiment analysis. In: Proceedings of the 49th annual meeting of the Association for Computational Linguistics: human language technologies, pp 142\u2013150. http:\/\/www.aclweb.org\/anthology\/P11-1015"},{"key":"819_CR25","unstructured":"Microsoft and R Core Team (2019) Microsoft R Open. https:\/\/mran.microsoft.com\/"},{"key":"819_CR26","doi-asserted-by":"crossref","unstructured":"Mobahi H, Fisher JW (2015) On the link between gaussian homotopy continuation and convex envelopes. In: Energy minimization methods in computer vision and pattern recognition. Springer, pp 43\u201356","DOI":"10.1007\/978-3-319-14612-6_4"},{"key":"819_CR27","unstructured":"Qin Y, Li S, Li Y, Yu Y (2017) Penalized maximum tangent likelihood estimation and robust variable selection. arXiv:170805439http:\/\/arxiv.org\/abs\/1708.05439"},{"key":"819_CR28","doi-asserted-by":"crossref","unstructured":"Ribeiro MT, Singh S, Guestrin C (2016) Why should I trust you?: Explaining the predictions of any classifier. In: SIGKDD, pp 1135\u20131144","DOI":"10.1145\/2939672.2939778"},{"issue":"388","key":"819_CR29","doi-asserted-by":"publisher","first-page":"871","DOI":"10.1080\/01621459.1984.10477105","volume":"79","author":"PJ Rousseeuw","year":"1984","unstructured":"Rousseeuw PJ (1984) Least median of squares regression. J Am Stat Assoc 79(388):871\u2013880. https:\/\/doi.org\/10.1080\/01621459.1984.10477105","journal-title":"J Am Stat Assoc"},{"issue":"1","key":"819_CR30","doi-asserted-by":"publisher","first-page":"73","DOI":"10.1002\/widm.2","volume":"1","author":"PJ Rousseeuw","year":"2011","unstructured":"Rousseeuw PJ, Hubert M (2011) Robust statistics for outlier detection. Wiley Interdiscip Rev Data Min Knowl Discov 1(1):73\u201379. https:\/\/doi.org\/10.1002\/widm.2","journal-title":"Wiley Interdiscip Rev Data Min Knowl Discov"},{"key":"819_CR31","doi-asserted-by":"publisher","first-page":"256","DOI":"10.1007\/978-1-4615-7821-5_15","volume-title":"Robust regression by means of S-estimators","author":"P Rousseeuw","year":"1984","unstructured":"Rousseeuw P, Yohai V (1984) Robust regression by means of S-estimators, vol 26. Springer, New York, pp 256\u2013272. https:\/\/doi.org\/10.1007\/978-1-4615-7821-5_15"},{"key":"819_CR32","doi-asserted-by":"publisher","unstructured":"Rousseeuw PJ, Van\u00a0Driessen K (2000) An algorithm for positive-breakdown regression based on concentration steps. In: Data analysis. Springer, pp 335\u2013346. https:\/\/doi.org\/10.1007\/978-3-642-58250-9_27","DOI":"10.1007\/978-3-642-58250-9_27"},{"issue":"411","key":"819_CR33","doi-asserted-by":"publisher","first-page":"633","DOI":"10.1080\/01621459.1990.10474920","volume":"85","author":"PJ Rousseeuw","year":"1990","unstructured":"Rousseeuw PJ, van Zomeren BC (1990) Unmasking multivariate outliers and leverage points. J Am Stat Assoc 85(411):633\u2013639. https:\/\/doi.org\/10.1080\/01621459.1990.10474920","journal-title":"J Am Stat Assoc"},{"key":"819_CR34","unstructured":"Schmidt M, Berg E, Friedlander M, Murphy K (2009) Optimizing costly functions with simple constraints: a limited-memory projected quasi-newton algorithm. In: Artificial intelligence and statistics, vol 5, pp 456\u2013463. http:\/\/proceedings.mlr.press\/v5\/schmidt09a.html"},{"key":"819_CR35","doi-asserted-by":"publisher","first-page":"116","DOI":"10.1016\/j.csda.2017.02.002","volume":"111","author":"E Smucler","year":"2017","unstructured":"Smucler E, Yohai VJ (2017) Robust and sparse estimators for linear regression models. Comput Stat Data Anal 111:116\u2013130. https:\/\/doi.org\/10.1016\/j.csda.2017.02.002","journal-title":"Comput Stat Data Anal"},{"issue":"1","key":"819_CR36","doi-asserted-by":"publisher","first-page":"267","DOI":"10.1111\/j.2517-6161.1996.tb02080.x","volume":"58","author":"R Tibshirani","year":"1996","unstructured":"Tibshirani R (1996) Regression shrinkage and selection via the Lasso. J R Stat Soc Ser B (Methodol) 58(1):267\u2013288. https:\/\/doi.org\/10.1111\/j.2517-6161.1996.tb02080.x","journal-title":"J R Stat Soc Ser B (Methodol)"},{"issue":"3","key":"819_CR37","doi-asserted-by":"publisher","first-page":"347","DOI":"10.1198\/073500106000000251","volume":"25","author":"H Wang","year":"2007","unstructured":"Wang H, Li G, Jiang G (2007) Robust regression shrinkage and consistent variable selection through the LAD-Lasso. J Bus Econ Stat 25(3):347\u2013355. https:\/\/doi.org\/10.1198\/073500106000000251","journal-title":"J Bus Econ Stat"},{"issue":"2","key":"819_CR38","doi-asserted-by":"publisher","first-page":"642","DOI":"10.1214\/aos\/1176350366","volume":"15","author":"VJ Yohai","year":"1987","unstructured":"Yohai VJ (1987) High breakdown-point and high efficiency robust estimates for regression. Ann Stat 15(2):642\u2013656. https:\/\/doi.org\/10.1214\/aos\/1176350366","journal-title":"Ann Stat"}],"container-title":["Data Mining and Knowledge Discovery"],"original-title":[],"language":"en","link":[{"URL":"https:\/\/link.springer.com\/content\/pdf\/10.1007\/s10618-022-00819-2.pdf","content-type":"application\/pdf","content-version":"vor","intended-application":"text-mining"},{"URL":"https:\/\/link.springer.com\/article\/10.1007\/s10618-022-00819-2\/fulltext.html","content-type":"text\/html","content-version":"vor","intended-application":"text-mining"},{"URL":"https:\/\/link.springer.com\/content\/pdf\/10.1007\/s10618-022-00819-2.pdf","content-type":"application\/pdf","content-version":"vor","intended-application":"similarity-checking"}],"deposited":{"date-parts":[[2022,3,29]],"date-time":"2022-03-29T10:03:44Z","timestamp":1648548224000},"score":1,"resource":{"primary":{"URL":"https:\/\/link.springer.com\/10.1007\/s10618-022-00819-2"}},"subtitle":[],"short-title":[],"issued":{"date-parts":[[2022,1,27]]},"references-count":38,"journal-issue":{"issue":"2","published-print":{"date-parts":[[2022,3]]}},"alternative-id":["819"],"URL":"https:\/\/doi.org\/10.1007\/s10618-022-00819-2","relation":{},"ISSN":["1384-5810","1573-756X"],"issn-type":[{"value":"1384-5810","type":"print"},{"value":"1573-756X","type":"electronic"}],"subject":[],"published":{"date-parts":[[2022,1,27]]},"assertion":[{"value":"16 February 2021","order":1,"name":"received","label":"Received","group":{"name":"ArticleHistory","label":"Article History"}},{"value":"4 January 2022","order":2,"name":"accepted","label":"Accepted","group":{"name":"ArticleHistory","label":"Article History"}},{"value":"27 January 2022","order":3,"name":"first_online","label":"First Online","group":{"name":"ArticleHistory","label":"Article History"}}]}}