{"status":"ok","message-type":"work","message-version":"1.0.0","message":{"indexed":{"date-parts":[[2026,1,23]],"date-time":"2026-01-23T04:55:48Z","timestamp":1769144148668,"version":"3.49.0"},"reference-count":34,"publisher":"SAGE Publications","issue":"1","license":[{"start":{"date-parts":[[2025,4,29]],"date-time":"2025-04-29T00:00:00Z","timestamp":1745884800000},"content-version":"tdm","delay-in-days":0,"URL":"https:\/\/journals.sagepub.com\/page\/policies\/text-and-data-mining-license"}],"content-domain":{"domain":["journals.sagepub.com"],"crossmark-restriction":true},"short-container-title":["Intelligent Data Analysis: An International Journal"],"published-print":{"date-parts":[[2026,1]]},"abstract":"<jats:p>Pre-trained language models have become a critical natural language processing component in many E-commerce applications. As businesses continue to evolve, the pre-trained models should be able to adopt new domain knowledge and new tasks. This paper proposes a novel sequential multi-task pre-trained language framework, ICL-BERT (In-loop Continual Learning BERT), which enables evolving the current model with new knowledge and new tasks. The contributions of ICL-BERT are (1) vocabularies and entities are optimized on an E-commerce corpus; (2) a new glyph embedding is introduced to learn glyph information for vocabularies and entities; (3) specific and general tasks are designed to encode E-commerce knowledge for pre-training ICL-BERT; and (4) a new task-gating mechanism, called ICL (In-loop Continual Learning), is proposed for sequential multi-task learning, which evolves the current model effectively and efficiently. Our evaluation results demonstrate that ICL-BERT outperforms existing models in both CLUE and E-commerce tasks, with an average accuracy improvement of 1.73% and 3.5%, respectively. Furthermore, ICL-BERT serves as a fundamental pre-trained language model that runs online in JingDong\u2019s daily business.<\/jats:p>",
"DOI":"10.1177\/1088467x251333230","type":"journal-article","created":{"date-parts":[[2025,4,29]],"date-time":"2025-04-29T05:07:58Z","timestamp":1745903278000},"page":"235-250","update-policy":"https:\/\/doi.org\/10.1177\/sage-journals-update-policy","source":"Crossref","is-referenced-by-count":0,"title":["ICL: In-loop continual learning framework for language model pre-training for E-commerce"],"prefix":"10.1177","volume":"30","author":[{"ORCID":"https:\/\/orcid.org\/0009-0007-0731-4343","authenticated-orcid":false,"given":"Nan","family":"Lu","sequence":"first","affiliation":[{"name":"School of Electronics and Information Engineering, Beijing Jiaotong University, China"}]},{"ORCID":"https:\/\/orcid.org\/0009-0006-5545-4845","authenticated-orcid":false,"given":"Chi-Man","family":"Wong","sequence":"additional","affiliation":[{"name":"Department of Computer and Information Science, University of Macau, China"}]},{"ORCID":"https:\/\/orcid.org\/0009-0005-6564-7678","authenticated-orcid":false,"given":"Yan","family":"Liu","sequence":"additional","affiliation":[{"name":"Risk Management Group, Retail JD.com, China"}]},{"ORCID":"https:\/\/orcid.org\/0009-0002-1498-4891","authenticated-orcid":false,"given":"Sanpeng","family":"Wang","sequence":"additional","affiliation":[{"name":"Risk Management Group, Retail JD.com, China"}]},{"ORCID":"https:\/\/orcid.org\/0000-0003-0428-9262","authenticated-orcid":false,"given":"Danyang","family":"Zhu","sequence":"additional","affiliation":[{"name":"Risk Management Group, Retail JD.com, China"}]},{"ORCID":"https:\/\/orcid.org\/0000-0001-7997-8279","authenticated-orcid":false,"given":"Chi-Man","family":"Vong","sequence":"additional","affiliation":[{"name":"Department of Computer and Information Science, University of Macau, 
China"}]},{"ORCID":"https:\/\/orcid.org\/0009-0000-6781-455X","authenticated-orcid":false,"given":"Rui","family":"Lin","sequence":"additional","affiliation":[{"name":"Risk Management Group, Retail JD.com, China"}]},{"ORCID":"https:\/\/orcid.org\/0000-0001-5737-2874","authenticated-orcid":false,"given":"Shaoyi","family":"Xu","sequence":"additional","affiliation":[{"name":"School of Electronics and Information Engineering, Beijing Jiaotong University, China"}]}],"member":"179","published-online":{"date-parts":[[2025,4,29]]},
"reference":[{"key":"e_1_3_4_2_2","unstructured":"Devlin J Chang M-W Lee K et\u00a0al. Bert: Pre-training of deep bidirectional transformers for language understanding. arXiv preprint arXiv:1810.04805 2018."},{"key":"e_1_3_4_3_2","unstructured":"Liu Y Ott M Goyal N et\u00a0al. Roberta: A robustly optimized bert pretraining approach. arXiv preprint arXiv:1907.11692 2019b."},{"key":"e_1_3_4_4_2","unstructured":"Sun Y Wang S Feng S et\u00a0al. Ernie 3.0: Large-scale knowledge enhanced pre-training for language understanding and generation. arXiv preprint arXiv:2107.02137 2021a."},{"key":"e_1_3_4_5_2","unstructured":"Sun Y Wang S Li Y et\u00a0al. Ernie: Enhanced representation through knowledge integration. arXiv preprint arXiv:1904.09223 2019."},{"key":"e_1_3_4_6_2","doi-asserted-by":"crossref","unstructured":"Sun Y Wang S Li Y et\u00a0al. Ernie 2.0: A continual pre-training framework for language understanding. In Proceedings of the AAAI Conference on Artificial Intelligence volume 34 pp.8968\u20138975 2020.","DOI":"10.1609\/aaai.v34i05.6428"},{"key":"e_1_3_4_7_2","unstructured":"Vaswani A Shazeer N Parmar N et\u00a0al. Attention is all you need. In: Advances in Neural Information Processing Systems 2017 pp.5998\u20136008."},{"key":"e_1_3_4_8_2","unstructured":"Mangalgi S Kumar L Tallamraju RB. Deep contextual embeddings for address classification in e-commerce. arXiv preprint arXiv:2007.03020 2020."},{"key":"e_1_3_4_9_2","doi-asserted-by":"crossref","unstructured":"Xu S Li H Yuan P et\u00a0al. K-plug: Knowledge-injected pre-trained language model for natural language understanding and generation in e-commerce. arXiv preprint arXiv:2104.06960 2021.","DOI":"10.18653\/v1\/2021.findings-emnlp.1"},{"key":"e_1_3_4_10_2","doi-asserted-by":"publisher","DOI":"10.1109\/TASLP.2021.3124365"},{"key":"e_1_3_4_11_2","doi-asserted-by":"publisher","DOI":"10.1162\/tacl_a_00300"},{"key":"e_1_3_4_12_2","doi-asserted-by":"crossref","unstructured":"Sun Z Li X Sun X et\u00a0al. Chinesebert: Chinese pretraining enhanced by glyph and pinyin information. arXiv preprint arXiv:2106.16038 2021b.","DOI":"10.18653\/v1\/2021.acl-long.161"},{"key":"e_1_3_4_13_2","doi-asserted-by":"crossref","unstructured":"Gao T Yao X Chen D. Simcse: Simple contrastive learning of sentence embeddings. arXiv preprint arXiv:2104.08821 2021.","DOI":"10.18653\/v1\/2021.emnlp-main.552"},{"key":"e_1_3_4_14_2","first-page":"1","article-title":"Exploring the limits of transfer learning with a unified text-to-text transformer","volume":"21","author":"Raffel C","year":"2020","unstructured":"Raffel C, Shazeer N, Roberts A, et\u00a0al. Exploring the limits of transfer learning with a unified text-to-text transformer. J Mach Learn Res 2020; 21: 1\u201367.","journal-title":"J Mach Learn Res"},{"key":"e_1_3_4_15_2","unstructured":"Microsoft. Turing-nlg: A 17-billion-parameter language model by microsoft. https:\/\/www.microsoft.com\/en-us\/research\/blog\/turing-nlg-a-17-billion-parameter-language-model-by-microsoft\/ 2020."},{"key":"e_1_3_4_16_2","doi-asserted-by":"crossref","unstructured":"Liu X He P Chen W et\u00a0al. Multi-task deep neural networks for natural language understanding. 
arXiv preprint arXiv:1901.11504 2019a.","DOI":"10.18653\/v1\/P19-1441"},{"key":"e_1_3_4_17_2","doi-asserted-by":"publisher","DOI":"10.1007\/978-3-031-01581-6"},{"key":"e_1_3_4_18_2","doi-asserted-by":"publisher","DOI":"10.1016\/j.neunet.2019.01.012"},{"key":"e_1_3_4_19_2","doi-asserted-by":"publisher","DOI":"10.1109\/TCSVT.2023.3304567"},{"key":"e_1_3_4_20_2","doi-asserted-by":"publisher","DOI":"10.1016\/j.neucom.2021.10.021"},{"key":"e_1_3_4_21_2","doi-asserted-by":"crossref","unstructured":"Smith JS Tian J Halbe S et\u00a0al. A closer look at rehearsal-free continual learning. In Proceedings of the IEEE\/CVF conference on computer vision and pattern recognition 2023 pp.2410\u20132420.","DOI":"10.1109\/CVPRW59228.2023.00239"},{"key":"e_1_3_4_22_2","doi-asserted-by":"crossref","unstructured":"Gopalakrishnan S Singh PR Fayek H et\u00a0al. Knowledge capture and replay for continual learning. In Proceedings of the IEEE\/CVF winter conference on applications of computer vision 2022 pp.10\u201318.","DOI":"10.1109\/WACV51458.2022.00041"},{"key":"e_1_3_4_23_2","doi-asserted-by":"publisher","DOI":"10.1109\/TNNLS.2023.3246049"},{"key":"e_1_3_4_24_2","doi-asserted-by":"crossref","unstructured":"Merlin G Lomonaco V Cossu A et\u00a0al. Practical recommendations for replay-based continual learning methods. In: International Conference on Image Analysis and Processing Springer 2022 pp.548\u2013559.","DOI":"10.1007\/978-3-031-13324-4_47"},{"key":"e_1_3_4_25_2","doi-asserted-by":"crossref","unstructured":"Douillard A Ram\u00e9 A Couairon G et\u00a0al. Dytox: Transformers for continual learning with dynamic token expansion. In Proceedings of the IEEE\/CVF Conference on Computer Vision and Pattern Recognition pp.9285\u20139295 2022.","DOI":"10.1109\/CVPR52688.2022.00907"},{"key":"e_1_3_4_26_2","doi-asserted-by":"crossref","unstructured":"Xue M Zhang H Song J et\u00a0al. Meta-attention for vit-backed continual learning. In: Proceedings of the IEEE\/CVF Conference on Computer Vision and Pattern Recognition 2022 pp.150\u2013159.","DOI":"10.1109\/CVPR52688.2022.00025"},
{"key":"e_1_3_4_27_2","doi-asserted-by":"publisher","DOI":"10.1073\/pnas.1611835114"},{"key":"e_1_3_4_28_2","doi-asserted-by":"crossref","unstructured":"Pf\u00fclb B Gepperth A Abdullah S et\u00a0al. Catastrophic forgetting: still a problem for dnns. In: Artificial Neural Networks and Machine Learning\u2013ICANN 2018: 27th International Conference on Artificial Neural Networks Rhodes Greece October 4-7 2018 Proceedings Part I 27 Springer 2018 pp.487\u2013497.","DOI":"10.1007\/978-3-030-01418-6_48"},{"key":"e_1_3_4_29_2","unstructured":"Wang T Isola P. Understanding contrastive representation learning through alignment and uniformity on the hypersphere. In: International conference on machine learning 2020 pp.9929\u20139939. PMLR."},{"key":"e_1_3_4_30_2","first-page":"1929","article-title":"Dropout: a simple way to prevent neural networks from overfitting","volume":"15","author":"Srivastava N","year":"2014","unstructured":"Srivastava N, Hinton G, Krizhevsky A, et\u00a0al. Dropout: a simple way to prevent neural networks from overfitting. J Mach Learn Res 2014; 15: 1929\u20131958.","journal-title":"J Mach Learn Res"},{"key":"e_1_3_4_31_2","doi-asserted-by":"crossref","unstructured":"Lin T-Y Goyal P Girshick R et\u00a0al. Focal loss for dense object detection. In: Proceedings of the IEEE international conference on computer vision 2017 pp.2980\u20132988.","DOI":"10.1109\/ICCV.2017.324"},{"key":"e_1_3_4_32_2","doi-asserted-by":"crossref","unstructured":"Rasley J Rajbhandari S Ruwase O et\u00a0al. Deepspeed: System optimizations enable training deep learning models with over 100 billion parameters. In: Proceedings of the 26th ACM SIGKDD International Conference on Knowledge Discovery & Data Mining 2020 pp.3505\u20133506.","DOI":"10.1145\/3394486.3406703"},
{"key":"e_1_3_4_33_2","doi-asserted-by":"crossref","unstructured":"Niu W Guan J Wang Y et\u00a0al. Dnnfusion: accelerating deep neural networks execution with advanced operator fusion. In: Proceedings of the 42nd ACM SIGPLAN International Conference on Programming Language Design and Implementation 2021 pp.883\u2013898.","DOI":"10.1145\/3453483.3454083"},{"key":"e_1_3_4_34_2","unstructured":"Xu L Hu H Zhang X et\u00a0al. Clue: A chinese language understanding evaluation benchmark. arXiv preprint arXiv:2004.05986 2020."},{"key":"e_1_3_4_35_2","unstructured":"CO L. Iflytek: a multiple categories chinese text classifier. competition official website 2019."}],"container-title":["Intelligent Data Analysis: An International Journal"],"original-title":[],"language":"en","link":[{"URL":"https:\/\/journals.sagepub.com\/doi\/pdf\/10.1177\/1088467X251333230","content-type":"application\/pdf","content-version":"vor","intended-application":"text-mining"},{"URL":"https:\/\/journals.sagepub.com\/doi\/full-xml\/10.1177\/1088467X251333230","content-type":"application\/xml","content-version":"vor","intended-application":"text-mining"},{"URL":"https:\/\/journals.sagepub.com\/doi\/pdf\/10.1177\/1088467X251333230","content-type":"unspecified","content-version":"vor","intended-application":"similarity-checking"}],"deposited":{"date-parts":[[2026,1,22]],"date-time":"2026-01-22T12:36:41Z","timestamp":1769085401000},"score":1,"resource":{"primary":{"URL":"https:\/\/journals.sagepub.com\/doi\/10.1177\/1088467X251333230"}},"subtitle":[],"short-title":[],"issued":{"date-parts":[[2025,4,29]]},"references-count":34,"journal-issue":{"issue":"1","published-print":{"date-parts":[[2026,1]]}},"alternative-id":["10.1177\/1088467X251333230"],"URL":"https:\/\/doi.org\/10.1177\/1088467x251333230","relation":{},"ISSN":["1088-467X","1571-4128"],"issn-type":[{"value":"1088-467X","type":"print"},{"value":"1571-4128","type":"electronic"}],"subject":[],"published":{"date-parts":[[2025,4,29]]}}}