{"status":"ok","message-type":"work","message-version":"1.0.0","message":{"indexed":{"date-parts":[[2026,2,4]],"date-time":"2026-02-04T17:39:40Z","timestamp":1770226780116,"version":"3.49.0"},"reference-count":58,"publisher":"Association for Computing Machinery (ACM)","issue":"3","license":[{"start":{"date-parts":[[2021,10,22]],"date-time":"2021-10-22T00:00:00Z","timestamp":1634860800000},"content-version":"vor","delay-in-days":0,"URL":"https:\/\/www.acm.org\/publications\/policies\/copyright_policy#Background"}],"funder":[{"name":"Hong Kong General Research Fund","award":["17306519"],"award-info":[{"award-number":["17306519"]}]},{"name":"National Key Research and Development Program of China","award":["2018AAA0101900"],"award-info":[{"award-number":["2018AAA0101900"]}]},{"DOI":"10.13039\/501100001809","name":"National Natural Science Foundation of China","doi-asserted-by":"crossref","award":["62006207 and 61625107"],"award-info":[{"award-number":["62006207 and 61625107"]}],"id":[{"id":"10.13039\/501100001809","id-type":"DOI","asserted-by":"crossref"}]},{"name":"Zhejiang Province Natural Science Foundation","award":["LQ21F020020"],"award-info":[{"award-number":["LQ21F020020"]}]}],"content-domain":{"domain":["dl.acm.org"],"crossmark-restriction":true},"short-container-title":["ACM Trans. Knowl. Discov. Data"],"published-print":{"date-parts":[[2022,6,30]]},"abstract":"<jats:p>In data mining and machine learning, it is commonly assumed that training and test data share the same population distribution. However, this assumption is often violated in practice because of the sample selection bias, which might induce the distribution shift from training data to test data. Such a model-agnostic distribution shift usually leads to prediction instability across unknown test data. This article proposes a novel balance-subsampled stable prediction (BSSP) algorithm based on the theory of fractional factorial design. 
It isolates the clear effect of each predictor from the confounding variables. A design-theoretic analysis shows that the proposed method can reduce the confounding effects among predictors induced by the distribution shift, improving both the accuracy of parameter estimation and the stability of prediction across unknown test data. Numerical experiments on synthetic and real-world datasets demonstrate that our BSSP algorithm can significantly outperform the baseline methods for stable prediction across unknown test data.<\/jats:p>","DOI":"10.1145\/3477052","type":"journal-article","created":{"date-parts":[[2021,10,23]],"date-time":"2021-10-23T04:28:40Z","timestamp":1634963320000},"page":"1-21","update-policy":"https:\/\/doi.org\/10.1145\/crossmark-policy","source":"Crossref","is-referenced-by-count":6,"title":["Balance-Subsampled Stable Prediction Across Unknown Test Data"],"prefix":"10.1145","volume":"16","author":[{"ORCID":"https:\/\/orcid.org\/0000-0001-5524-5185","authenticated-orcid":false,"given":"Kun","family":"Kuang","sequence":"first","affiliation":[{"name":"Zhejiang University, Zhejiang, China"}]},{"given":"Hengtao","family":"Zhang","sequence":"additional","affiliation":[{"name":"The University of Hong Kong, Hong Kong, China"}]},{"given":"Runze","family":"Wu","sequence":"additional","affiliation":[{"name":"Fuxi AI Lab, NetEase Games, Zhejiang, China"}]},{"given":"Fei","family":"Wu","sequence":"additional","affiliation":[{"name":"Zhejiang University, Zhejiang, China"}]},{"given":"Yueting","family":"Zhuang","sequence":"additional","affiliation":[{"name":"Zhejiang University, Zhejiang, China"}]},{"given":"Aijun","family":"Zhang","sequence":"additional","affiliation":[{"name":"The University of Hong Kong, Hong Kong, China"}]}],"member":"320","published-online":{"date-parts":[[2021,10,22]]},"reference":[{"key":"e_1_3_2_2_2","unstructured":"Martin Arjovsky L\u00e9on Bottou Ishaan Gulrajani and David Lopez-Paz. 2019. Invariant risk minimization. 
arXiv:1907.02893. Retrieved from https:\/\/arxiv.org\/abs\/1907.02893."},{"key":"e_1_3_2_3_2","doi-asserted-by":"publisher","DOI":"10.1111\/rssb.12268"},{"key":"e_1_3_2_4_2","doi-asserted-by":"publisher","DOI":"10.1080\/00273171.2011.568786"},{"key":"e_1_3_2_5_2","doi-asserted-by":"publisher","DOI":"10.1007\/s10994-009-5152-4"},{"key":"e_1_3_2_6_2","doi-asserted-by":"publisher","DOI":"10.5555\/1577069.1755858"},{"key":"e_1_3_2_7_2","article-title":"Domain generalization by marginal transfer learning","volume":"22","author":"Blanchard Gilles","year":"2021","unstructured":"Gilles Blanchard, Aniket Anand Deshmukh, Urun Dogan, Gyemin Lee, and Clayton Scott. 2021. Domain generalization by marginal transfer learning. Journal of Machine Learning Research 22 (2021), 2\u20131.","journal-title":"Journal of Machine Learning Research"},{"key":"e_1_3_2_8_2","doi-asserted-by":"publisher","DOI":"10.5555\/1610075.1610094"},{"key":"e_1_3_2_9_2","volume-title":"Statistics for Experimenters","author":"Box George E. P.","year":"2005","unstructured":"George E. P. Box, J. Stuart Hunter, and William G. Hunter. 2005. Statistics for Experimenters. Wiley Hoboken, NJ."},{"key":"e_1_3_2_10_2","doi-asserted-by":"publisher","DOI":"10.1145\/2382577.2382582"},{"key":"e_1_3_2_11_2","doi-asserted-by":"publisher","DOI":"10.1109\/CEC48606.2020.9185713"},{"key":"e_1_3_2_12_2","volume-title":"Fractional Factorial Plans","author":"Dey Aloke","year":"2009","unstructured":"Aloke Dey and Rahul Mukerjee. 2009. Fractional Factorial Plans. Vol. 496. 
John Wiley & Sons."},{"key":"e_1_3_2_13_2","doi-asserted-by":"publisher","DOI":"10.1007\/s00211-010-0331-6"},{"key":"e_1_3_2_14_2","doi-asserted-by":"publisher","DOI":"10.1214\/20-AOS2004"},{"key":"e_1_3_2_15_2","doi-asserted-by":"publisher","DOI":"10.1214\/17-AOAS1101"},{"issue":"4","key":"e_1_3_2_16_2","first-page":"601","article-title":"Minimum aberration 2^{k-p} designs","volume":"22","author":"Fries Arthur","year":"1980","unstructured":"Arthur Fries and William G. Hunter. 1980. Minimum aberration 2^{k-p} designs. Technometrics 22, 4 (1980), 601\u2013608.","journal-title":"Technometrics"},{"key":"e_1_3_2_17_2","doi-asserted-by":"publisher","DOI":"10.1109\/CEC.2019.8790318"},{"key":"e_1_3_2_18_2","doi-asserted-by":"publisher","DOI":"10.1007\/978-3-319-68759-9_45"},{"issue":"1","key":"e_1_3_2_19_2","first-page":"1","article-title":"R package FrF2 for creating and analyzing fractional factorial 2-level designs","volume":"56","author":"Gr\u00f6mping Ulrike","year":"2014","unstructured":"Ulrike Gr\u00f6mping. 2014. R package FrF2 for creating and analyzing fractional factorial 2-level designs. Journal of Statistical Software 56, 1 (2014), 1\u201356.","journal-title":"Journal of Statistical Software"},{"key":"e_1_3_2_20_2","doi-asserted-by":"publisher","DOI":"10.1093\/pan\/mpr025"},{"key":"e_1_3_2_21_2","volume-title":"Orthogonal Arrays: Theory and Applications","author":"Hedayat A. Samad","year":"2012","unstructured":"A. Samad Hedayat, Neil James Alexander Sloane, and John Stufken. 2012. Orthogonal Arrays: Theory and Applications. Springer Science & Business Media."},{"key":"e_1_3_2_22_2","doi-asserted-by":"publisher","DOI":"10.1109\/MCI.2015.2405277"},{"key":"e_1_3_2_23_2","first-page":"264","volume-title":"Proceedings of the 45th Annual Meeting of the Association for Computational Linguistics","author":"Jiang Jing","year":"2007","unstructured":"Jing Jiang and Chengxiang Zhai. 2007. Instance weighting for domain adaptation in NLP. 
In Proceedings of the 45th Annual Meeting of the Association for Computational Linguistics. 264\u2013271."},{"key":"e_1_3_2_24_2","doi-asserted-by":"publisher","DOI":"10.1145\/3219819.3220082"},{"key":"e_1_3_2_25_2","doi-asserted-by":"publisher","DOI":"10.1145\/3365677"},{"key":"e_1_3_2_26_2","doi-asserted-by":"publisher","DOI":"10.1145\/3097983.3098032"},{"key":"e_1_3_2_27_2","doi-asserted-by":"publisher","DOI":"10.5555\/3298239.3298261"},{"key":"e_1_3_2_28_2","doi-asserted-by":"publisher","DOI":"10.1016\/j.eng.2019.08.016"},{"key":"e_1_3_2_29_2","first-page":"6804","volume-title":"Proceedings of the 38th International Conference on Machine Learning","author":"Liu Jiashuo","year":"2021","unstructured":"Jiashuo Liu, Zheyuan Hu, Peng Cui, Bo Li, and Zheyan Shen. 2021. Heterogeneous risk minimization. In Proceedings of the 38th International Conference on Machine Learning. 6804\u20136814."},{"key":"e_1_3_2_30_2","first-page":"8662","volume-title":"Proceedings of the AAAI Conference on Artificial Intelligence","volume":"35","author":"Liu Jiashuo","year":"2021","unstructured":"Jiashuo Liu, Zheyan Shen, Peng Cui, Linjun Zhou, Kun Kuang, and Bo Li. 2021. Distributionally robust learning with stable adversarial training. 
Proceedings of the AAAI Conference on Artificial Intelligence 35, 10 (2021), 8662\u20138670."},{"key":"e_1_3_2_31_2","doi-asserted-by":"publisher","DOI":"10.1109\/TKDE.2013.111"},{"key":"e_1_3_2_32_2","doi-asserted-by":"publisher","DOI":"10.1016\/j.ins.2018.12.038"},{"key":"e_1_3_2_33_2","doi-asserted-by":"publisher","DOI":"10.1016\/j.eng.2019.12.013"},{"key":"e_1_3_2_34_2","doi-asserted-by":"publisher","DOI":"10.1007\/s001840100112"},{"key":"e_1_3_2_35_2","doi-asserted-by":"publisher","DOI":"10.5555\/2789272.2831141"},{"key":"e_1_3_2_36_2","doi-asserted-by":"publisher","DOI":"10.1007\/s12559-018-9549-x"},{"key":"e_1_3_2_37_2","volume-title":"The Theory of Error-Correcting Codes","author":"MacWilliams Florence Jessie","year":"1977","unstructured":"Florence Jessie MacWilliams and Neil James Alexander Sloane. 1977. The Theory of Error-Correcting Codes. Vol. 16. Elsevier."},{"key":"e_1_3_2_38_2","doi-asserted-by":"publisher","DOI":"10.5555\/3042817.3042820"},{"key":"e_1_3_2_39_2","doi-asserted-by":"publisher","DOI":"10.1109\/TKDE.2009.191"},{"key":"e_1_3_2_40_2","doi-asserted-by":"publisher","DOI":"10.1111\/rssb.12167"},{"key":"e_1_3_2_41_2","doi-asserted-by":"publisher","DOI":"10.1016\/j.eng.2019.12.012"},{"key":"e_1_3_2_42_2","doi-asserted-by":"publisher","DOI":"10.5555\/3291125.3291161"},{"key":"e_1_3_2_43_2","doi-asserted-by":"publisher","DOI":"10.1093\/biomet\/70.1.41"},{"key":"e_1_3_2_44_2","doi-asserted-by":"publisher","DOI":"10.1145\/3240508.3240577"},{"key":"e_1_3_2_45_2","volume-title":"Sampling (3rd ed.)","author":"Thompson Steven K.","year":"2012","unstructured":"Steven K. Thompson. 2012. Sampling (3rd ed.). 
Wiley, New York, NY."},{"key":"e_1_3_2_46_2","doi-asserted-by":"publisher","DOI":"10.1145\/3447548.3467403"},{"key":"e_1_3_2_47_2","doi-asserted-by":"publisher","DOI":"10.1111\/j.2517-6161.1996.tb02080.x"},{"key":"e_1_3_2_48_2","doi-asserted-by":"publisher","DOI":"10.1109\/TBME.2009.2036000"},{"issue":"132","key":"e_1_3_2_49_2","first-page":"1","article-title":"More efficient estimation for logistic regression with optimal subsamples","volume":"20","author":"Wang HaiYing","year":"2019","unstructured":"HaiYing Wang. 2019. More efficient estimation for logistic regression with optimal subsamples. The Journal of Machine Learning Research 20, 132 (2019), 1\u201359.","journal-title":"The Journal of Machine Learning Research"},{"key":"e_1_3_2_50_2","doi-asserted-by":"publisher","DOI":"10.1080\/01621459.2017.1408468"},{"key":"e_1_3_2_51_2","doi-asserted-by":"publisher","DOI":"10.5555\/3491440.3491741"},{"key":"e_1_3_2_52_2","volume-title":"Experiments: Planning, Analysis, and Optimization","author":"Wu C. F. Jeff","year":"2011","unstructured":"C. F. Jeff Wu and Michael S. Hamada. 2011. Experiments: Planning, Analysis, and Optimization. Vol. 552. John Wiley & Sons."},{"issue":"4","key":"e_1_3_2_53_2","first-page":"1066","article-title":"Generalized minimum aberration for asymmetrical fractional factorial designs","volume":"29","author":"Xu Hongquan","year":"2001","unstructured":"Hongquan Xu and C. F. Jeff Wu. 2001. Generalized minimum aberration for asymmetrical fractional factorial designs. 
The Annals of Statistics 29, 4 (2001), 1066\u20131077.","journal-title":"The Annals of Statistics"},{"key":"e_1_3_2_54_2","doi-asserted-by":"publisher","DOI":"10.1214\/009053605000000679"},{"key":"e_1_3_2_55_2","doi-asserted-by":"publisher","DOI":"10.1007\/s11222-020-09936-8"},{"key":"e_1_3_2_56_2","doi-asserted-by":"publisher","DOI":"10.18653\/v1\/K17-1040"},{"key":"e_1_3_2_57_2","doi-asserted-by":"publisher","DOI":"10.18653\/v1\/D18-1022"},{"key":"e_1_3_2_58_2","doi-asserted-by":"publisher","DOI":"10.18653\/v1\/P19-1591"},{"key":"e_1_3_2_59_2","doi-asserted-by":"publisher","DOI":"10.1080\/01621459.2015.1023805"}],"container-title":["ACM Transactions on Knowledge Discovery from Data"],"original-title":[],"language":"en","link":[{"URL":"https:\/\/dl.acm.org\/doi\/10.1145\/3477052","content-type":"unspecified","content-version":"vor","intended-application":"text-mining"},{"URL":"https:\/\/dl.acm.org\/doi\/pdf\/10.1145\/3477052","content-type":"unspecified","content-version":"vor","intended-application":"similarity-checking"}],"deposited":{"date-parts":[[2025,6,17]],"date-time":"2025-06-17T19:30:47Z","timestamp":1750188647000},"score":1,"resource":{"primary":{"URL":"https:\/\/dl.acm.org\/doi\/10.1145\/3477052"}},"subtitle":[],"short-title":[],"issued":{"date-parts":[[2021,10,22]]},"references-count":58,"journal-issue":{"issue":"3","published-print":{"date-parts":[[2022,6,30]]}},"alternative-id":["10.1145\/3477052"],"URL":"https:\/\/doi.org\/10.1145\/3477052","relation":{},"ISSN":["1556-4681","1556-472X"],"issn-type":[{"value":"1556-4681","type":"print"},{"value":"1556-472X","type":"electronic"}],"subject":[],"published":{"date-parts":[[2021,10,22]]},"assertion":[{"value":"2020-12-01","order":0,"name":"received","label":"Received","group":{"name":"publication_history","label":"Publication History"}},{"value":"2021-07-01","order":1,"name":"accepted","label":"Accepted","group":{"name":"publication_history","label":"Publication 
History"}},{"value":"2021-10-22","order":2,"name":"published","label":"Published","group":{"name":"publication_history","label":"Publication History"}}]}}