{"status":"ok","message-type":"work","message-version":"1.0.0","message":{"indexed":{"date-parts":[[2026,2,17]],"date-time":"2026-02-17T03:38:39Z","timestamp":1771299519831,"version":"3.50.1"},"publisher-location":"New York, NY, USA","reference-count":34,"publisher":"ACM","license":[{"start":{"date-parts":[[2023,8,4]],"date-time":"2023-08-04T00:00:00Z","timestamp":1691107200000},"content-version":"vor","delay-in-days":0,"URL":"https:\/\/www.acm.org\/publications\/policies\/copyright_policy#Background"}],"content-domain":{"domain":["dl.acm.org"],"crossmark-restriction":true},"short-container-title":[],"published-print":{"date-parts":[[2023,8,6]]},"DOI":"10.1145\/3580305.3599780","type":"proceedings-article","created":{"date-parts":[[2023,8,4]],"date-time":"2023-08-04T18:13:58Z","timestamp":1691172838000},"page":"5039-5050","update-policy":"https:\/\/doi.org\/10.1145\/crossmark-policy","source":"Crossref","is-referenced-by-count":17,"title":["BERT4CTR: An Efficient Framework to Combine Pre-trained Language Model with Non-textual Features for CTR Prediction"],"prefix":"10.1145","author":[{"ORCID":"https:\/\/orcid.org\/0000-0002-9396-0398","authenticated-orcid":false,"given":"Dong","family":"Wang","sequence":"first","affiliation":[{"name":"STCA, Microsoft Corporation, Beijing, China"}]},{"ORCID":"https:\/\/orcid.org\/0000-0001-5557-9134","authenticated-orcid":false,"given":"Kav\u00e9","family":"Salamatian","sequence":"additional","affiliation":[{"name":"University of Savoie, Annecy, France"}]},{"ORCID":"https:\/\/orcid.org\/0009-0005-8608-574X","authenticated-orcid":false,"given":"Yunqing","family":"Xia","sequence":"additional","affiliation":[{"name":"STCA, Microsoft Corporation, Beijing, China"}]},{"ORCID":"https:\/\/orcid.org\/0009-0001-4793-9715","authenticated-orcid":false,"given":"Weiwei","family":"Deng","sequence":"additional","affiliation":[{"name":"STCA, Microsoft Corporation, Beijing, China"}]},{"ORCID":"https:\/\/orcid.org\/0009-0009-7438-7248","authenticated-orcid":false,"given":"Qi","family":"Zhang","sequence":"additional","affiliation":[{"name":"STCA, Microsoft Corporation, Beijing, China"}]}],"member":"320","published-online":{"date-parts":[[2023,8,4]]},"reference":[{"key":"e_1_3_2_2_1_1","volume-title":"Neural machine translation by jointly learning to align and translate. arXiv preprint arXiv:1409.0473","author":"Bahdanau Dzmitry","year":"2014","unstructured":"Dzmitry Bahdanau , Kyunghyun Cho , and Yoshua Bengio . 2014. Neural machine translation by jointly learning to align and translate. arXiv preprint arXiv:1409.0473 ( 2014 ). Dzmitry Bahdanau, Kyunghyun Cho, and Yoshua Bengio. 2014. Neural machine translation by jointly learning to align and translate. arXiv preprint arXiv:1409.0473 (2014)."},{"key":"e_1_3_2_2_2_1","doi-asserted-by":"publisher","DOI":"10.1145\/3326937.3341261"},{"key":"e_1_3_2_2_3_1","doi-asserted-by":"publisher","DOI":"10.1145\/2988450.2988454"},{"key":"e_1_3_2_2_4_1","volume-title":"Bert: Pre-training of deep bidirectional transformers for language understanding. arXiv preprint arXiv:1810.04805","author":"Devlin Jacob","year":"2018","unstructured":"Jacob Devlin , Ming-Wei Chang , Kenton Lee , and Kristina Toutanova . 2018 . Bert: Pre-training of deep bidirectional transformers for language understanding. arXiv preprint arXiv:1810.04805 (2018). Jacob Devlin, Ming-Wei Chang, Kenton Lee, and Kristina Toutanova. 2018. Bert: Pre-training of deep bidirectional transformers for language understanding. arXiv preprint arXiv:1810.04805 (2018)."},{"key":"e_1_3_2_2_5_1","doi-asserted-by":"publisher","DOI":"10.1145\/3292500.3330651"},{"key":"e_1_3_2_2_6_1","volume-title":"An introduction to ROC analysis. Pattern recognition letters","author":"Fawcett Tom","year":"2006","unstructured":"Tom Fawcett . 2006. An introduction to ROC analysis. Pattern recognition letters , Vol. 27 , 8 ( 2006 ), 861--874. Tom Fawcett. 2006. An introduction to ROC analysis. Pattern recognition letters, Vol. 27, 8 (2006), 861--874."},{"key":"e_1_3_2_2_7_1","unstructured":"Huifeng Guo Ruiming Tang Yunming Ye Zhenguo Li and Xiuqiang He. 2017. DeepFM: a factorization-machine based neural network for CTR prediction. arXiv preprint arXiv:1703.04247 (2017).  Huifeng Guo Ruiming Tang Yunming Ye Zhenguo Li and Xiuqiang He. 2017. DeepFM: a factorization-machine based neural network for CTR prediction. arXiv preprint arXiv:1703.04247 (2017)."},{"key":"e_1_3_2_2_8_1","doi-asserted-by":"publisher","DOI":"10.1145\/3340531.3412699"},{"key":"e_1_3_2_2_9_1","doi-asserted-by":"publisher","DOI":"10.1109\/CVPR.2016.90"},{"key":"e_1_3_2_2_10_1","volume-title":"Distilling the knowledge in a neural network. arXiv preprint arXiv:1503.02531","author":"Hinton Geoffrey","year":"2015","unstructured":"Geoffrey Hinton , Oriol Vinyals , and Jeff Dean . 2015. Distilling the knowledge in a neural network. arXiv preprint arXiv:1503.02531 ( 2015 ). Geoffrey Hinton, Oriol Vinyals, and Jeff Dean. 2015. Distilling the knowledge in a neural network. arXiv preprint arXiv:1503.02531 (2015)."},{"key":"e_1_3_2_2_11_1","doi-asserted-by":"publisher","DOI":"10.1504\/IJEB.2008.018068"},{"key":"e_1_3_2_2_12_1","doi-asserted-by":"crossref","unstructured":"Yunjiang Jiang Yue Shang Ziyang Liu Hongwei Shen Yun Xiao Wei Xiong Sulong Xu etal 2020. BERT2DNN: BERT Distillation with Massive Unlabeled Data for Online E-Commerce Search. arXiv preprint arXiv:2010.10442 (2020).  Yunjiang Jiang Yue Shang Ziyang Liu Hongwei Shen Yun Xiao Wei Xiong Sulong Xu et al. 2020. BERT2DNN: BERT Distillation with Massive Unlabeled Data for Online E-Commerce Search. arXiv preprint arXiv:2010.10442 (2020).","DOI":"10.1109\/ICDM50108.2020.00030"},{"key":"e_1_3_2_2_13_1","volume-title":"Albert: A lite bert for self-supervised learning of language representations. arXiv preprint arXiv:1909.11942","author":"Lan Zhenzhong","year":"2019","unstructured":"Zhenzhong Lan , Mingda Chen , Sebastian Goodman , Kevin Gimpel , Piyush Sharma , and Radu Soricut . 2019 . Albert: A lite bert for self-supervised learning of language representations. arXiv preprint arXiv:1909.11942 (2019). Zhenzhong Lan, Mingda Chen, Sebastian Goodman, Kevin Gimpel, Piyush Sharma, and Radu Soricut. 2019. Albert: A lite bert for self-supervised learning of language representations. arXiv preprint arXiv:1909.11942 (2019)."},{"key":"e_1_3_2_2_14_1","doi-asserted-by":"publisher","DOI":"10.1145\/3041021.3054192"},{"key":"e_1_3_2_2_15_1","volume-title":"Multi-task deep neural networks for natural language understanding. arXiv preprint arXiv:1901.11504","author":"Liu Xiaodong","year":"2019","unstructured":"Xiaodong Liu , Pengcheng He , Weizhu Chen , and Jianfeng Gao . 2019a. Multi-task deep neural networks for natural language understanding. arXiv preprint arXiv:1901.11504 ( 2019 ). Xiaodong Liu, Pengcheng He, Weizhu Chen, and Jianfeng Gao. 2019a. Multi-task deep neural networks for natural language understanding. arXiv preprint arXiv:1901.11504 (2019)."},{"key":"e_1_3_2_2_16_1","volume-title":"Roberta: A robustly optimized bert pretraining approach. arXiv preprint arXiv:1907.11692","author":"Liu Yinhan","year":"2019","unstructured":"Yinhan Liu , Myle Ott , Naman Goyal , Jingfei Du , Mandar Joshi , Danqi Chen , Omer Levy , Mike Lewis , Luke Zettlemoyer , and Veselin Stoyanov . 2019 b. Roberta: A robustly optimized bert pretraining approach. arXiv preprint arXiv:1907.11692 (2019). Yinhan Liu, Myle Ott, Naman Goyal, Jingfei Du, Mandar Joshi, Danqi Chen, Omer Levy, Mike Lewis, Luke Zettlemoyer, and Veselin Stoyanov. 2019b. Roberta: A robustly optimized bert pretraining approach. arXiv preprint arXiv:1907.11692 (2019)."},{"key":"e_1_3_2_2_17_1","doi-asserted-by":"publisher","DOI":"10.1016\/j.jss.2004.02.003"},{"key":"e_1_3_2_2_18_1","doi-asserted-by":"publisher","DOI":"10.1145\/3340531.3412747"},{"key":"e_1_3_2_2_19_1","doi-asserted-by":"publisher","DOI":"10.1145\/2487575.2488200"},{"key":"e_1_3_2_2_20_1","volume-title":"Information gain. School of Computer Science","author":"Moore Andrew W","year":"2001","unstructured":"Andrew W Moore . 2001. Information gain. School of Computer Science , Carnegie Mellon University , http:\/\/www. cs. cmu. edu\/ awm\/tutorials ( 2001 ). Andrew W Moore. 2001. Information gain. School of Computer Science, Carnegie Mellon University, http:\/\/www. cs. cmu. edu\/ awm\/tutorials (2001)."},{"key":"e_1_3_2_2_21_1","volume-title":"Proceedings of the 35th Conference on Neural Information Processing Systems (NeurIPS). 1--7.","author":"Muhamed Aashiq","year":"2021","unstructured":"Aashiq Muhamed , Iman Keivanloo , Sujan Perera , James Mracek , Yi Xu , Qingjun Cui , Santosh Rajagopalan , Belinda Zeng , and Trishul Chilimbi . 2021 . CTR-BERT: Cost-effective knowledge distillation for billion-parameter teacher models . In Proceedings of the 35th Conference on Neural Information Processing Systems (NeurIPS). 1--7. Aashiq Muhamed, Iman Keivanloo, Sujan Perera, James Mracek, Yi Xu, Qingjun Cui, Santosh Rajagopalan, Belinda Zeng, and Trishul Chilimbi. 2021. CTR-BERT: Cost-effective knowledge distillation for billion-parameter teacher models. In Proceedings of the 35th Conference on Neural Information Processing Systems (NeurIPS). 1--7."},{"key":"e_1_3_2_2_22_1","doi-asserted-by":"publisher","DOI":"10.1145\/3331184.3331268"},{"key":"e_1_3_2_2_23_1","doi-asserted-by":"publisher","DOI":"10.1109\/ICDM.2016.0151"},{"key":"e_1_3_2_2_24_1","first-page":"1989","article-title":"A survey of recommendation system: Research challenges","volume":"4","author":"Sharma Lalita","year":"2013","unstructured":"Lalita Sharma and Anju Gera . 2013 . A survey of recommendation system: Research challenges . International Journal of Engineering Trends and Technology (IJETT) , Vol. 4 , 5 (2013), 1989 -- 1992 . Lalita Sharma and Anju Gera. 2013. A survey of recommendation system: Research challenges. International Journal of Engineering Trends and Technology (IJETT), Vol. 4, 5 (2013), 1989--1992.","journal-title":"International Journal of Engineering Trends and Technology (IJETT)"},{"key":"e_1_3_2_2_25_1","volume-title":"Very deep convolutional networks for large-scale image recognition. arXiv preprint arXiv:1409.1556","author":"Simonyan Karen","year":"2014","unstructured":"Karen Simonyan and Andrew Zisserman . 2014. Very deep convolutional networks for large-scale image recognition. arXiv preprint arXiv:1409.1556 ( 2014 ). Karen Simonyan and Andrew Zisserman. 2014. Very deep convolutional networks for large-scale image recognition. arXiv preprint arXiv:1409.1556 (2014)."},{"key":"e_1_3_2_2_26_1","volume-title":"Vl-bert: Pre-training of generic visual-linguistic representations. arXiv preprint arXiv:1908.08530","author":"Su Weijie","year":"2019","unstructured":"Weijie Su , Xizhou Zhu , Yue Cao , Bin Li , Lewei Lu , Furu Wei , and Jifeng Dai . 2019 . Vl-bert: Pre-training of generic visual-linguistic representations. arXiv preprint arXiv:1908.08530 (2019). Weijie Su, Xizhou Zhu, Yue Cao, Bin Li, Lewei Lu, Furu Wei, and Jifeng Dai. 2019. Vl-bert: Pre-training of generic visual-linguistic representations. arXiv preprint arXiv:1908.08530 (2019)."},{"key":"e_1_3_2_2_27_1","doi-asserted-by":"publisher","DOI":"10.1109\/ICCV.2019.00756"},{"key":"e_1_3_2_2_28_1","volume-title":"Ernie: Enhanced representation through knowledge integration. arXiv preprint arXiv:1904.09223","author":"Sun Yu","year":"2019","unstructured":"Yu Sun , Shuohuan Wang , Yukun Li , Shikun Feng , Xuyi Chen , Han Zhang , Xin Tian , Danxiang Zhu , Hao Tian , and Hua Wu . 2019 b. Ernie: Enhanced representation through knowledge integration. arXiv preprint arXiv:1904.09223 (2019). Yu Sun, Shuohuan Wang, Yukun Li, Shikun Feng, Xuyi Chen, Han Zhang, Xin Tian, Danxiang Zhu, Hao Tian, and Hua Wu. 2019b. Ernie: Enhanced representation through knowledge integration. arXiv preprint arXiv:1904.09223 (2019)."},{"key":"e_1_3_2_2_29_1","volume-title":"Attention is all you need. Advances in neural information processing systems","author":"Vaswani Ashish","year":"2017","unstructured":"Ashish Vaswani , Noam Shazeer , Niki Parmar , Jakob Uszkoreit , Llion Jones , Aidan N Gomez , \u0141ukasz Kaiser , and Illia Polosukhin . 2017. Attention is all you need. Advances in neural information processing systems , Vol. 30 ( 2017 ). Ashish Vaswani, Noam Shazeer, Niki Parmar, Jakob Uszkoreit, Llion Jones, Aidan N Gomez, \u0141ukasz Kaiser, and Illia Polosukhin. 2017. Attention is all you need. Advances in neural information processing systems, Vol. 30 (2017)."},{"key":"e_1_3_2_2_30_1","doi-asserted-by":"publisher","DOI":"10.1145\/3534678.3539064"},{"key":"e_1_3_2_2_31_1","doi-asserted-by":"publisher","DOI":"10.1016\/j.procs.2013.05.103"},{"key":"e_1_3_2_2_32_1","volume-title":"Studies","volume":"55","author":"Wang Zhe","year":"2020","unstructured":"Zhe Wang , Rundong Shi , Shijie Li , and Peng Yan . 2020 . GBDT and BERT: a Hybrid Solution for Recognizing Citation Intent . Studies , Vol. 55 (2020), 12c2a39230188. Zhe Wang, Rundong Shi, Shijie Li, and Peng Yan. 2020. GBDT and BERT: a Hybrid Solution for Recognizing Citation Intent. Studies, Vol. 55 (2020), 12c2a39230188."},{"key":"e_1_3_2_2_33_1","doi-asserted-by":"publisher","DOI":"10.1145\/3442381.3449830"},{"key":"e_1_3_2_2_34_1","volume-title":"Do language embeddings capture scales? arXiv preprint arXiv:2010.05345","author":"Zhang Xikun","year":"2020","unstructured":"Xikun Zhang , Deepak Ramachandran , Ian Tenney , Yanai Elazar , and Dan Roth . 2020. Do language embeddings capture scales? arXiv preprint arXiv:2010.05345 ( 2020 ). Xikun Zhang, Deepak Ramachandran, Ian Tenney, Yanai Elazar, and Dan Roth. 2020. Do language embeddings capture scales? arXiv preprint arXiv:2010.05345 (2020)."}],"event":{"name":"KDD '23: The 29th ACM SIGKDD Conference on Knowledge Discovery and Data Mining","location":"Long Beach CA USA","acronym":"KDD '23","sponsor":["SIGMOD ACM Special Interest Group on Management of Data","SIGKDD ACM Special Interest Group on Knowledge Discovery in Data"]},"container-title":["Proceedings of the 29th ACM SIGKDD Conference on Knowledge Discovery and Data Mining"],"original-title":[],"link":[{"URL":"https:\/\/dl.acm.org\/doi\/10.1145\/3580305.3599780","content-type":"unspecified","content-version":"vor","intended-application":"text-mining"},{"URL":"https:\/\/dl.acm.org\/doi\/pdf\/10.1145\/3580305.3599780","content-type":"unspecified","content-version":"vor","intended-application":"similarity-checking"}],"deposited":{"date-parts":[[2025,6,17]],"date-time":"2025-06-17T17:49:22Z","timestamp":1750182562000},"score":1,"resource":{"primary":{"URL":"https:\/\/dl.acm.org\/doi\/10.1145\/3580305.3599780"}},"subtitle":[],"short-title":[],"issued":{"date-parts":[[2023,8,4]]},"references-count":34,"alternative-id":["10.1145\/3580305.3599780","10.1145\/3580305"],"URL":"https:\/\/doi.org\/10.1145\/3580305.3599780","relation":{},"subject":[],"published":{"date-parts":[[2023,8,4]]},"assertion":[{"value":"2023-08-04","order":2,"name":"published","label":"Published","group":{"name":"publication_history","label":"Publication History"}}]}}