{"status":"ok","message-type":"work","message-version":"1.0.0","message":{"indexed":{"date-parts":[[2025,11,29]],"date-time":"2025-11-29T08:01:37Z","timestamp":1764403297608,"version":"3.41.0"},"reference-count":39,"publisher":"Association for Computing Machinery (ACM)","issue":"1","license":[{"start":{"date-parts":[[2024,3,7]],"date-time":"2024-03-07T00:00:00Z","timestamp":1709769600000},"content-version":"vor","delay-in-days":0,"URL":"https:\/\/www.acm.org\/publications\/policies\/copyright_policy#Background"}],"funder":[{"name":"National Science Foundation","award":["IIS-2142675, IIS-2142681, and III-2008557"],"award-info":[{"award-number":["IIS-2142675, IIS-2142681, and III-2008557"]}]},{"name":"Kent State University and iLambda Inc."}],"content-domain":{"domain":["dl.acm.org"],"crossmark-restriction":true},"short-container-title":["ACM Trans. Recomm. Syst."],"published-print":{"date-parts":[[2024,3,31]]},"abstract":"<jats:p>\n            Personalized recommender systems play a crucial role in modern society, especially in e-commerce, news, and advertising. Correctly evaluating and comparing candidate recommendation models is as essential as constructing them. The common offline evaluation strategy is to hold out some user-interacted items from the training data and evaluate recommendation models by how many of these held-out items they can retrieve. Specifically, for any hold-out item (the so-called target item) of a user, a recommendation model predicts the probability that the user would interact with the item and ranks it among all items, which is called\n            <jats:italic>global evaluation<\/jats:italic>\n            . Intuitively, a good recommendation model would assign high probabilities to such hold-out\/target items. 
Based on the specific ranks, metrics such as\n            <jats:italic>Recall@K<\/jats:italic>\n            and\n            <jats:italic>NDCG@K<\/jats:italic>\n            can be calculated to further quantify the quality of the recommendation model. Instead of ranking the target items among all items, Koren first proposed ranking them among a small\n            <jats:italic>sampled set of items<\/jats:italic>\n            and then quantifying the performance of the models, which is called\n            <jats:italic>sampling evaluation<\/jats:italic>\n            . Since then, a large body of work has adopted sampling evaluation because of its efficiency and frugality. In recent work, Rendle and Krichene argued that sampling evaluation is \u201cinconsistent\u201d with global evaluation in terms of offline top-\n            <jats:italic>K<\/jats:italic>\n            metrics.\n          <\/jats:p>\n          <jats:p>\n            In this work, we first investigate the \u201cinconsistent\u201d phenomenon by examining the connections between sampling evaluation and global evaluation. We reveal an approximately linear relationship between sampling evaluation and its global counterpart in terms of the top-\n            <jats:italic>K<\/jats:italic>\n            Recall metric. Second, we propose a new statistical perspective on sampling evaluation: estimating the global rank distribution of the entire population. Once the rank distribution is estimated, an approximation of the global metric can be derived from it. Third, we extend the work of Krichene and Rendle by directly minimizing the error with respect to the ground truth, providing not only a comprehensive empirical study but also a rigorous theoretical understanding of the proposed metric estimators. 
To address the \u201cblind spot\u201d issue, where accurately estimating metrics for small top-\n            <jats:italic>K<\/jats:italic>\n            values in sampling evaluation is challenging, we propose a novel adaptive sampling method that generalizes the expectation-maximization algorithm to this setting. Finally, we study the effect of user sampling on evaluation. This series of works outlines a clear roadmap for sampling evaluation and establishes a foundational theoretical framework. Extensive empirical studies validate the reliability of the sampling methods presented.\n          <\/jats:p>","DOI":"10.1145\/3629171","type":"journal-article","created":{"date-parts":[[2023,10,27]],"date-time":"2023-10-27T22:05:27Z","timestamp":1698444327000},"page":"1-36","update-policy":"https:\/\/doi.org\/10.1145\/crossmark-policy","source":"Crossref","is-referenced-by-count":6,"title":["On Item-Sampling Evaluation for Recommender System"],"prefix":"10.1145","volume":"2","author":[{"ORCID":"https:\/\/orcid.org\/0000-0002-2599-6065","authenticated-orcid":false,"given":"Dong","family":"Li","sequence":"first","affiliation":[{"name":"Kent State University, Kent, USA"}]},{"ORCID":"https:\/\/orcid.org\/0000-0003-1895-4243","authenticated-orcid":false,"given":"Ruoming","family":"Jin","sequence":"additional","affiliation":[{"name":"Kent State University, Kent, USA"}]},{"ORCID":"https:\/\/orcid.org\/0000-0001-9494-8748","authenticated-orcid":false,"given":"Zhenming","family":"Liu","sequence":"additional","affiliation":[{"name":"College of William & Mary, Williamsburg, USA"}]},{"ORCID":"https:\/\/orcid.org\/0000-0002-4116-5237","authenticated-orcid":false,"given":"Bin","family":"Ren","sequence":"additional","affiliation":[{"name":"College of William & Mary, Williamsburg, USA"}]},{"ORCID":"https:\/\/orcid.org\/0009-0002-4502-5650","authenticated-orcid":false,"given":"Jing","family":"Gao","sequence":"additional","affiliation":[{"name":"iLambda Inc., Aurora, 
USA"}]},{"ORCID":"https:\/\/orcid.org\/0000-0002-5248-4807","authenticated-orcid":false,"given":"Zhi","family":"Liu","sequence":"additional","affiliation":[{"name":"iLambda Inc., Aurora, USA"}]}],"member":"320","published-online":{"date-parts":[[2024,3,7]]},"reference":[{"key":"e_1_3_3_2_2","doi-asserted-by":"publisher","DOI":"10.1017\/CBO9780511804441"},{"key":"e_1_3_3_3_2","volume-title":"Statistical Inference","author":"Casella G.","year":"2002","unstructured":"G. Casella and R. L. Berger. 2002. Statistical Inference. Thomson Learning."},{"key":"e_1_3_3_4_2","doi-asserted-by":"publisher","DOI":"10.5555\/1146355"},{"key":"e_1_3_3_5_2","doi-asserted-by":"publisher","DOI":"10.1145\/1979742.1979896"},{"key":"e_1_3_3_6_2","doi-asserted-by":"publisher","DOI":"10.1145\/1864708.1864721"},{"key":"e_1_3_3_7_2","unstructured":"Maurizio Ferrari Dacrema, Simone Boglio, Paolo Cremonesi, and Dietmar Jannach. 2019. A troubling analysis of reproducibility and progress in recommender systems research. arXiv:1911.07698 (2019)."},{"key":"e_1_3_3_8_2","doi-asserted-by":"crossref","DOI":"10.1145\/3460231.3475943","article-title":"A case study on sampling strategies for evaluating neural sequential item recommendation models","author":"Dallmann Alexander","year":"2021","unstructured":"Alexander Dallmann, Daniel Zoller, and Andreas Hotho. 2021. A case study on sampling strategies for evaluating neural sequential item recommendation models. In Proceedings of the 15th ACM Conference on Recommender Systems (RecSys \u201921).","journal-title":"Proceedings of the 15th ACM Conference on Recommender Systems (RecSys \u201921)."},{"key":"e_1_3_3_9_2","doi-asserted-by":"publisher","DOI":"10.5555\/1248547.1248548"},{"key":"e_1_3_3_10_2","doi-asserted-by":"crossref","DOI":"10.1145\/963770.963776","article-title":"Item-based top-N recommendation algorithms","author":"Deshpande Mukund","year":"2004","unstructured":"Mukund Deshpande and George Karypis. 2004. Item-based top-N recommendation algorithms. 
ACM Transactions on Information Systems 22, 1 (2004), 143\u2013177.","journal-title":"ACM Transactions on Information Systems"},{"key":"e_1_3_3_11_2","doi-asserted-by":"publisher","DOI":"10.1145\/3209978.3209991"},{"key":"e_1_3_3_12_2","doi-asserted-by":"publisher","DOI":"10.1145\/2736277.2741667"},{"issue":"21","key":"e_1_3_3_13_2","article-title":"Recommendation systems: Algorithms, challenges, metrics, and business opportunities","volume":"10","author":"Fayyaz Zeshan","year":"2020","unstructured":"Zeshan Fayyaz, Mahsa Ebrahimian, Dina Nawara, Ahmed Ibrahim, and Rasha Kashef. 2020. Recommendation systems: Algorithms, challenges, metrics, and business opportunities. Applied Sciences 10, 21 (2020), 7748.","journal-title":"Applied Sciences"},{"key":"e_1_3_3_14_2","doi-asserted-by":"publisher","DOI":"10.1145\/3289600.3291027"},{"issue":"12","key":"e_1_3_3_15_2","article-title":"A survey of accuracy evaluation metrics of recommendation tasks.","volume":"10","author":"Gunawardana Asela","year":"2009","unstructured":"Asela Gunawardana and Guy Shani. 2009. A survey of accuracy evaluation metrics of recommendation tasks. Journal of Machine Learning Research 10, 12 (2009), 2935\u20132962.","journal-title":"Journal of Machine Learning Research"},{"key":"e_1_3_3_16_2","first-page":"982","volume-title":"Proceedings of the 2020 ACM\/IEEE 47th Annual International Symposium on Computer Architecture (ISCA \u201920)","author":"Gupta U.","year":"2020","unstructured":"U. Gupta, S. Hsia, V. Saraph, X. Wang, B. Reagen, G. Wei, H. S. Lee, D. Brooks, and C. Wu. 2020. DeepRecSys: A system for optimizing end-to-end at-scale neural recommendation inference. In Proceedings of the 2020 ACM\/IEEE 47th Annual International Symposium on Computer Architecture (ISCA \u201920). 982\u2013995."},{"key":"e_1_3_3_17_2","unstructured":"Xiangnan He, Lizi Liao, Hanwang Zhang, Liqiang Nie, Xia Hu, and Tat-Seng Chua. 2017. Neural collaborative filtering. 
In Proceedings of the 26th International Conference on World Wide Web (WWW \u201917)."},{"key":"e_1_3_3_18_2","doi-asserted-by":"publisher","DOI":"10.1145\/3219819.3219965"},{"key":"e_1_3_3_19_2","volume-title":"Proceedings of the 2008 9th IEEE International Conference on Data Mining (ICDM \u201908)","author":"Hu Y.","year":"2008","unstructured":"Y. Hu, Y. Koren, and C. Volinsky. 2008. Collaborative filtering for implicit feedback datasets. In Proceedings of the 2008 9th IEEE International Conference on Data Mining (ICDM \u201908)."},{"key":"e_1_3_3_20_2","doi-asserted-by":"publisher","DOI":"10.1145\/3447548.3467428"},{"key":"e_1_3_3_21_2","first-page":"4147","article-title":"On estimating recommendation evaluation metrics under sampling","author":"Jin Ruoming","year":"2021","unstructured":"Ruoming Jin, Dong Li, Benjamin Mudrak, Jing Gao, and Zhi Liu. 2021. On estimating recommendation evaluation metrics under sampling. Proceedings of the AAAI Conference on Artificial Intelligence 35, 5 (May 2021), 4147\u20134154.","journal-title":"Proceedings of the AAAI Conference on Artificial Intelligence"},{"key":"e_1_3_3_22_2","first-page":"27","article-title":"Multi-objective personalization in multi-stakeholder organizational bulk e-mail: A field experiment","volume":"6","author":"Kong Ruoyan","year":"2022","unstructured":"Ruoyan Kong, Charles Chuankai Zhang, Ruixuan Sun, Vishnu Chhabra, Tanushsrisai Nadimpalli, and Joseph A. Konstan. 2022. Multi-objective personalization in multi-stakeholder organizational bulk e-mail: A field experiment. Proceedings of the ACM on Human-Computer Interaction 6, CSCW2 (Nov. 
2022), Article 528, 27 pages.","journal-title":"Proceedings of the ACM on Human-Computer Interaction"},{"key":"e_1_3_3_23_2","doi-asserted-by":"publisher","DOI":"10.1145\/1401890.1401944"},{"key":"e_1_3_3_24_2","volume-title":"Proceedings of the International Conference on Learning Representations (ICLR \u201919)","author":"Krichene Walid","year":"2019","unstructured":"Walid Krichene, N. Mayoraz, S. Rendle, L. Zhang, X. Yi, L. Hong, Ed H. Chi, and J. R. Anderson. 2019. Efficient training on very large corpora via Gramian estimation. In Proceedings of the International Conference on Learning Representations (ICLR \u201919)."},{"key":"e_1_3_3_25_2","volume-title":"Proceedings of the 26th ACM SIGKDD International Conference on Knowledge Discovery and Data Mining (KDD \u201920)","author":"Krichene Walid","year":"2020","unstructured":"Walid Krichene and Steffen Rendle. 2020. On sampled metrics for item recommendation. In Proceedings of the 26th ACM SIGKDD International Conference on Knowledge Discovery and Data Mining (KDD \u201920). ACM, New York, NY, 1748\u20131757."},{"key":"e_1_3_3_26_2","volume-title":"Theory of Point Estimation","author":"Lehmann Erich L.","year":"2006","unstructured":"Erich L. Lehmann and George Casella. 2006. Theory of Point Estimation. Springer Science & Business Media."},{"key":"e_1_3_3_27_2","doi-asserted-by":"publisher","DOI":"10.1145\/3394486.3403262"},{"key":"e_1_3_3_28_2","volume-title":"Proceedings of the 37th AAAI Conference on Artificial Intelligence (AAAI \u201923)","author":"Li Dong","year":"2023","unstructured":"Dong Li, Ruoming Jin, Zhenming Liu, Bin Ren, Jing Gao, and Zhi Liu. 2023. Towards reliable item sampling for recommendation evaluation. In Proceedings of the 37th AAAI Conference on Artificial Intelligence (AAAI \u201923)."},{"key":"e_1_3_3_29_2","doi-asserted-by":"crossref","unstructured":"Wentian Li, Pedro Miramontes, and Germinal Cocho. 2010. 
Fitting ranked linguistic data with two-parameter functions. Entropy 12, 7 (2010), 1743\u20131764.","DOI":"10.3390\/e12071743"},{"key":"e_1_3_3_30_2","doi-asserted-by":"publisher","DOI":"10.1145\/3178876.3186150"},{"key":"e_1_3_3_31_2","first-page":"728","volume-title":"Proceedings of the 2020 IEEE International Conference on Big Data (Big Data \u201920)","author":"Liu Zhiwei","year":"2020","unstructured":"Zhiwei Liu, Xiaohan Li, Ziwei Fan, Stephen Guo, Kannan Achan, and Philip S. Yu. 2020. Basket recommendation with multi-intent translation graph neural network. In Proceedings of the 2020 IEEE International Conference on Big Data (Big Data \u201920). 728\u2013737."},{"key":"e_1_3_3_32_2","volume-title":"Signals and Systems","author":"Rao K. Deergha","year":"2018","unstructured":"K. Deergha Rao. 2018. Signals and Systems. Springer International Publishing."},{"key":"e_1_3_3_33_2","article-title":"Evaluation metrics for item recommendation under sampling","author":"Rendle Steffen","year":"2019","unstructured":"Steffen Rendle. 2019. Evaluation metrics for item recommendation under sampling. arXiv preprint arXiv:1912.02263 (2019).","journal-title":"arXiv preprint arXiv:1912.02263"},{"key":"e_1_3_3_34_2","doi-asserted-by":"crossref","DOI":"10.1145\/3308558.3313710","article-title":"Embarrassingly shallow autoencoders for sparse data","author":"Steck Harald","year":"2019","unstructured":"Harald Steck. 2019. Embarrassingly shallow autoencoders for sparse data. In Proceedings of the World Wide Web Conference (WWW \u201919). 
3251\u20133257.","journal-title":"Proceedings of the World Wide Web Conference (WWW \u201919)."},{"key":"e_1_3_3_35_2","doi-asserted-by":"publisher","DOI":"10.1609\/aimag.v42i3.18140"},{"key":"e_1_3_3_36_2","doi-asserted-by":"publisher","DOI":"10.1145\/3460231.3478848"},{"key":"e_1_3_3_37_2","volume-title":"Proceedings of the 33rd AAAI Conference on Artificial Intelligence (AAAI \u201919)","author":"Wang Xiang","year":"2019","unstructured":"Xiang Wang, D. Wang, C. Xu, X. He, Y. Cao, and T. Chua. 2019. Explainable reasoning over knowledge graphs for recommendation. In Proceedings of the 33rd AAAI Conference on Artificial Intelligence (AAAI \u201919)."},{"key":"e_1_3_3_38_2","volume-title":"Proceedings of the 11th ACM International Conference on Web Search and Data Mining (WSDM \u201918)","author":"Yang Longqi","year":"2018","unstructured":"Longqi Yang, Eugene Bagdasaryan, Joshua Gruenstein, Cheng-Kang Hsieh, and Deborah Estrin. 2018. OpenRec: A modular framework for extensible and adaptable recommendation algorithms. In Proceedings of the 11th ACM International Conference on Web Search and Data Mining (WSDM \u201918). 664\u2013672."},{"key":"e_1_3_3_39_2","doi-asserted-by":"publisher","DOI":"10.1145\/3240323.3240355"},{"key":"e_1_3_3_40_2","volume-title":"Proceedings of the Conference on Information and Knowledge Management (CIKM \u201921)","author":"Zhao Wayne Xin","year":"2021","unstructured":"Wayne Xin Zhao, Shanlei Mu, Yupeng Hou, Zihan Lin, Yushuo Chen, Xingyu Pan, Kaiyuan Li, Yujie Lu, Hui Wang, Changxin Tian, Yingqian Min, Zhichao Feng, Xinyan Fan, Xu Chen, Pengfei Wang, Wendi Ji, Yaliang Li, Xiaoling Wang, and Ji-Rong Wen. 2021. RecBole: Towards a unified, comprehensive and efficient framework for recommendation algorithms. 
In Proceedings of the Conference on Information and Knowledge Management (CIKM \u201921)."}],"container-title":["ACM Transactions on Recommender Systems"],"original-title":[],"language":"en","link":[{"URL":"https:\/\/dl.acm.org\/doi\/10.1145\/3629171","content-type":"unspecified","content-version":"vor","intended-application":"text-mining"},{"URL":"https:\/\/dl.acm.org\/doi\/pdf\/10.1145\/3629171","content-type":"unspecified","content-version":"vor","intended-application":"similarity-checking"}],"deposited":{"date-parts":[[2025,6,17]],"date-time":"2025-06-17T16:36:18Z","timestamp":1750178178000},"score":1,"resource":{"primary":{"URL":"https:\/\/dl.acm.org\/doi\/10.1145\/3629171"}},"subtitle":[],"short-title":[],"issued":{"date-parts":[[2024,3,7]]},"references-count":39,"journal-issue":{"issue":"1","published-print":{"date-parts":[[2024,3,31]]}},"alternative-id":["10.1145\/3629171"],"URL":"https:\/\/doi.org\/10.1145\/3629171","relation":{},"ISSN":["2770-6699"],"issn-type":[{"type":"electronic","value":"2770-6699"}],"subject":[],"published":{"date-parts":[[2024,3,7]]},"assertion":[{"value":"2022-12-01","order":0,"name":"received","label":"Received","group":{"name":"publication_history","label":"Publication History"}},{"value":"2023-10-02","order":1,"name":"accepted","label":"Accepted","group":{"name":"publication_history","label":"Publication History"}},{"value":"2024-03-07","order":2,"name":"published","label":"Published","group":{"name":"publication_history","label":"Publication History"}}]}}