{"status":"ok","message-type":"work","message-version":"1.0.0","message":{"indexed":{"date-parts":[[2025,8,23]],"date-time":"2025-08-23T05:27:11Z","timestamp":1755926831516,"version":"3.41.0"},"reference-count":46,"publisher":"Association for Computing Machinery (ACM)","issue":"5","license":[{"start":{"date-parts":[[2024,3,26]],"date-time":"2024-03-26T00:00:00Z","timestamp":1711411200000},"content-version":"vor","delay-in-days":0,"URL":"https:\/\/www.acm.org\/publications\/policies\/copyright_policy#Background"}],"funder":[{"name":"Pioneer R&D Program of Zhejiang","award":["2024C01021"],"award-info":[{"award-number":["2024C01021"]}]}],"content-domain":{"domain":["dl.acm.org"],"crossmark-restriction":true},"short-container-title":["ACM Trans. Knowl. Discov. Data"],"published-print":{"date-parts":[[2024,6,30]]},"abstract":"<jats:p>A formidable challenge in the multi-label text classification (MLTC) context is that the labels often exhibit a long-tailed distribution, which typically prevents deep MLTC models from obtaining satisfactory performance. To alleviate this problem, most existing solutions attempt to improve tail performance by means of sampling or introducing extra knowledge. Data-rich labels, though more trustworthy, have not received the attention they deserve. In this work, we propose a multiple-stage training framework to exploit both model- and feature-level knowledge from the head labels, to improve both the representation and generalization ability of MLTC models. Moreover, we theoretically prove the superiority of our framework design over other alternatives. Comprehensive experiments on widely used MLTC datasets clearly demonstrate that the proposed framework achieves highly superior results to state-of-the-art methods, highlighting the value of head labels in MLTC.<\/jats:p>","DOI":"10.1145\/3643853","type":"journal-article","created":{"date-parts":[[2024,2,5]],"date-time":"2024-02-05T12:24:13Z","timestamp":1707135853000},"page":"1-21","update-policy":"https:\/\/doi.org\/10.1145\/crossmark-policy","source":"Crossref","is-referenced-by-count":4,"title":["On the Value of Head Labels in Multi-Label Text Classification"],"prefix":"10.1145","volume":"18","author":[{"ORCID":"https:\/\/orcid.org\/0000-0001-8586-3048","authenticated-orcid":false,"given":"Haobo","family":"Wang","sequence":"first","affiliation":[{"name":"School of Software Technology, Zhejiang University, Hangzhou, China"}],"role":[{"role":"author","vocabulary":"crossref"}]},{"ORCID":"https:\/\/orcid.org\/0000-0002-7006-1728","authenticated-orcid":false,"given":"Cheng","family":"Peng","sequence":"additional","affiliation":[{"name":"The State Key Laboratory of Blockchain and Data Security, Zhejiang University, Hangzhou, China"}],"role":[{"role":"author","vocabulary":"crossref"}]},{"ORCID":"https:\/\/orcid.org\/0009-0007-4885-2762","authenticated-orcid":false,"given":"Hede","family":"Dong","sequence":"additional","affiliation":[{"name":"The State Key Laboratory of Blockchain and Data Security, Zhejiang University, Hangzhou, China"}],"role":[{"role":"author","vocabulary":"crossref"}]},{"ORCID":"https:\/\/orcid.org\/0000-0003-2839-5799","authenticated-orcid":false,"given":"Lei","family":"Feng","sequence":"additional","affiliation":[{"name":"School of Computer Science and Engineering, Nanyang Technological University, Singapore, Singapore"}],"role":[{"role":"author","vocabulary":"crossref"}]},{"ORCID":"https:\/\/orcid.org\/0000-0003-2450-3369","authenticated-orcid":false,"given":"Weiwei","family":"Liu","sequence":"additional","affiliation":[{"name":"School of Computer Science, Wuhan University, Wuhan, China"}],"role":[{"role":"author","vocabulary":"crossref"}]},{"ORCID":"https:\/\/orcid.org\/0000-0003-0744-6454","authenticated-orcid":false,"given":"Tianlei","family":"Hu","sequence":"additional","affiliation":[{"name":"The State Key Laboratory of Blockchain and Data Security, Zhejiang University, Hangzhou, China"}],"role":[{"role":"author","vocabulary":"crossref"}]},{"ORCID":"https:\/\/orcid.org\/0000-0002-3062-0900","authenticated-orcid":false,"given":"Ke","family":"Chen","sequence":"additional","affiliation":[{"name":"The State Key Laboratory of Blockchain and Data Security, Zhejiang University, Hangzhou, China"}],"role":[{"role":"author","vocabulary":"crossref"}]},{"ORCID":"https:\/\/orcid.org\/0000-0002-7483-0045","authenticated-orcid":false,"given":"Gang","family":"Chen","sequence":"additional","affiliation":[{"name":"The State Key Laboratory of Blockchain and Data Security, Zhejiang University, Hangzhou, China"}],"role":[{"role":"author","vocabulary":"crossref"}]}],"member":"320","published-online":{"date-parts":[[2024,3,26]]},"reference":[{"key":"e_1_3_3_2_2","doi-asserted-by":"publisher","DOI":"10.7717\/peerj-cs.93"},{"key":"e_1_3_3_3_2","first-page":"242","volume-title":"Proceedings of the ICML","volume":"97","author":"Allen-Zhu Zeyuan","year":"2019","unstructured":"Zeyuan Allen-Zhu, Yuanzhi Li, and Zhao Song. 2019. A convergence theory for deep learning via over-parameterization. In Proceedings of the ICML. Vol. 97, PMLR, 242\u2013252."},{"key":"e_1_3_3_4_2","doi-asserted-by":"publisher","DOI":"10.1145\/3018661.3018741"},{"key":"e_1_3_3_5_2","doi-asserted-by":"publisher","DOI":"10.1007\/s10994-019-05791-5"},{"key":"e_1_3_3_6_2","unstructured":"K. Bhatia K. Dahiya H. Jain A. Mittal Y. Prabhu and M. Varma. 2016. The extreme classification repository: Multi-label datasets and code. Retrieved from http:\/\/manikvarma.org\/downloads\/XC\/XMLRepository.html. Accessed 1-1-2024."},{"key":"e_1_3_3_7_2","first-page":"730","volume-title":"Proceedings of the NeurIPS","author":"Bhatia Kush","year":"2015","unstructured":"Kush Bhatia, Himanshu Jain, Purushottam Kar, Manik Varma, and Prateek Jain. 2015. Sparse local embeddings for extreme multi-label classification. In Proceedings of the NeurIPS. 730\u2013738."},{"key":"e_1_3_3_8_2","doi-asserted-by":"publisher","DOI":"10.18653\/v1\/2020.emnlp-main.607"},{"key":"e_1_3_3_9_2","doi-asserted-by":"publisher","DOI":"10.1145\/3394486.3403368"},{"key":"e_1_3_3_10_2","doi-asserted-by":"publisher","DOI":"10.18653\/v1\/2020.acl-main.283"},{"key":"e_1_3_3_11_2","first-page":"1538","volume-title":"Proceedings of the NeurIPS","author":"Chen Yao-Nan","year":"2012","unstructured":"Yao-Nan Chen and Hsuan-Tien Lin. 2012. Feature-aware label space dimension reduction for multi-label classification. In Proceedings of the NeurIPS. 1538\u20131546."},{"key":"e_1_3_3_12_2","first-page":"4171","volume-title":"Proceedings of the NAACL-HLT","author":"Devlin Jacob","year":"2019","unstructured":"Jacob Devlin, Ming-Wei Chang, Kenton Lee, and Kristina Toutanova. 2019. BERT: Pre-training of deep bidirectional transformers for language understanding. In Proceedings of the NAACL-HLT. Association for Computational Linguistics, 4171\u20134186."},{"key":"e_1_3_3_13_2","first-page":"876","volume-title":"Proceedings of the UAI","author":"Izmailov Pavel","year":"2018","unstructured":"Pavel Izmailov, Dmitrii Podoprikhin, Timur Garipov, Dmitry P. Vetrov, and Andrew Gordon Wilson. 2018. Averaging weights leads to wider optima and better generalization. In Proceedings of the UAI. AUAI Press, 876\u2013885."},{"key":"e_1_3_3_14_2","doi-asserted-by":"publisher","DOI":"10.1145\/2939672.2939756"},{"key":"e_1_3_3_15_2","doi-asserted-by":"publisher","DOI":"10.1609\/aaai.v35i9.16974"},{"key":"e_1_3_3_16_2","doi-asserted-by":"publisher","DOI":"10.1109\/TPAMI.2020.2992393"},{"key":"e_1_3_3_17_2","doi-asserted-by":"publisher","DOI":"10.1007\/s10994-020-05888-2"},{"key":"e_1_3_3_18_2","doi-asserted-by":"publisher","DOI":"10.18653\/v1\/2020.emnlp-main.287"},{"key":"e_1_3_3_19_2","doi-asserted-by":"publisher","DOI":"10.1145\/3077136.3080834"},{"key":"e_1_3_3_20_2","series-title":"Proceedings of the ICML","first-page":"4032","volume":"97","author":"Liu Weiwei","year":"2019","unstructured":"Weiwei Liu and Xiaobo Shen. 2019. Sparse extreme multi-label learning with oracle property. In Proceedings of the ICML. Proceedings of Machine Learning Research, Vol. 97, PMLR, 4032\u20134041."},{"key":"e_1_3_3_21_2","doi-asserted-by":"publisher","DOI":"10.18653\/v1\/2021.acl-short.135"},{"key":"e_1_3_3_22_2","unstructured":"Yinhan Liu Myle Ott Naman Goyal Jingfei Du Mandar Joshi Danqi Chen Omer Levy Mike Lewis Luke Zettlemoyer and Veselin Stoyanov. 2019. RoBERTa: A robustly optimized BERT pretraining approach. arxiv:1907.11692. Retrieved from http:\/\/arxiv.org\/abs\/1907.11692"},{"key":"e_1_3_3_23_2","doi-asserted-by":"publisher","DOI":"10.1007\/978-3-319-46379-7_1"},{"key":"e_1_3_3_24_2","article-title":"The benefit of multitask representation learning","author":"Maurer Andreas","year":"2016","unstructured":"Andreas Maurer, Massimiliano Pontil, and Bernardino Romera-Paredes. 2016. The benefit of multitask representation learning. J. Mach. Learn. Res. 17, 1 (2016), 2853\u20132884.","journal-title":"J. Mach. Learn. Res."},{"key":"e_1_3_3_25_2","doi-asserted-by":"publisher","DOI":"10.1145\/2507157.2507163"},{"key":"e_1_3_3_26_2","doi-asserted-by":"publisher","DOI":"10.1007\/978-3-540-87481-2_4"},{"key":"e_1_3_3_27_2","doi-asserted-by":"publisher","DOI":"10.18653\/v1\/N18-1100"},{"key":"e_1_3_3_28_2","first-page":"5413","volume-title":"Proceedings of the NeurIPS","author":"Nam Jinseok","year":"2017","unstructured":"Jinseok Nam, Eneldo Loza Menc\u00eda, Hyunwoo J. Kim, and Johannes F\u00fcrnkranz. 2017. Maximizing subset accuracy with recurrent neural networks in multi-label classification. In Proceedings of the NeurIPS. 5413\u20135423."},{"key":"e_1_3_3_29_2","doi-asserted-by":"publisher","DOI":"10.1109\/TKDE.2009.191"},{"key":"e_1_3_3_30_2","doi-asserted-by":"publisher","DOI":"10.1145\/3178876.3185998"},{"key":"e_1_3_3_31_2","doi-asserted-by":"publisher","unstructured":"Jesse Read Bernhard Pfahringer Geoff Holmes and Eibe Frank. 2021. Classifier chains: A review and perspectives. J. Artif. Intell. Res. 70 (2021) 683\u2013718. 10.1613\/jair.1.12376","DOI":"10.1613\/jair.1.12376"},{"key":"e_1_3_3_32_2","doi-asserted-by":"publisher","DOI":"10.18653\/v1\/D18-1352"},{"key":"e_1_3_3_33_2","doi-asserted-by":"publisher","unstructured":"Mohammadreza Qaraei Erik Schultheis Priyanshu Gupta and Rohit Babbar. 2021. Convex surrogates for unbiased loss functions in extreme classification with missing labels. WWW\u201921: The Web Conference 2021 Virtual Event\/Ljubljana Slovenia April 19-23 2021 Jure Leskovec Marko Grobelnik Marc Najork Jie Tang and Leila Zia (Eds.). ACM\/IW3C2 3711\u20133720. 10.1145\/3442381.3450139","DOI":"10.1145\/3442381.3450139"},{"key":"e_1_3_3_34_2","doi-asserted-by":"publisher","DOI":"10.1145\/3097983.3097987"},{"key":"e_1_3_3_35_2","doi-asserted-by":"publisher","DOI":"10.18653\/v1\/P19-1214"},{"issue":"7","key":"e_1_3_3_36_2","first-page":"2315","article-title":"Does tail label help for large-scale multi-label learning?","volume":"31","author":"Wei Tong","year":"2020","unstructured":"Tong Wei and Yu-Feng Li. 2020. Does tail label help for large-scale multi-label learning? IEEE Trans. Neural Networks Learn. Syst. 31, 7 (2020), 2315\u20132324.","journal-title":"IEEE Trans. Neural Networks Learn. Syst."},{"key":"e_1_3_3_37_2","doi-asserted-by":"publisher","DOI":"10.1145\/3447548.3467223"},{"key":"e_1_3_3_38_2","doi-asserted-by":"publisher","DOI":"10.18653\/v1\/2023.eacl-main.217"},{"key":"e_1_3_3_39_2","unstructured":"Thomas Wolf Lysandre Debut Victor Sanh Julien Chaumond Clement Delangue Anthony Moi Pierric Cistac Tim Rault R\u00e9mi Louf Morgan Funtowicz and Jamie Brew. 2019. HuggingFace\u2019s transformers: State-of-the-art natural language processing. arxiv:1910.03771. Retrieved from http:\/\/arxiv.org\/abs\/1910.03771"},{"key":"e_1_3_3_40_2","doi-asserted-by":"publisher","DOI":"10.1145\/2939672.2939798"},{"key":"e_1_3_3_41_2","first-page":"3915","volume-title":"Proceedings of the COLING","author":"Yang Pengcheng","year":"2018","unstructured":"Pengcheng Yang, Xu Sun, Wei Li, Shuming Ma, Wei Wu, and Houfeng Wang. 2018. SGM: Sequence generation model for multi-label classification. In Proceedings of the COLING. Association for Computational Linguistics, 3915\u20133926."},{"key":"e_1_3_3_42_2","volume-title":"Proceedings of the NeurIPS","author":"Yang Yuzhe","year":"2020","unstructured":"Yuzhe Yang and Zhi Xu. 2020. Rethinking the value of labels for improving class-imbalanced learning. In Proceedings of the NeurIPS."},{"key":"e_1_3_3_43_2","first-page":"5754","volume-title":"Proceedings of the NeurIPS","author":"Yang Zhilin","year":"2019","unstructured":"Zhilin Yang, Zihang Dai, Yiming Yang, Jaime G. Carbonell, Ruslan Salakhutdinov, and Quoc V. Le. 2019. XLNet: Generalized autoregressive pretraining for language understanding. In Proceedings of the NeurIPS. 5754\u20135764."},{"key":"e_1_3_3_44_2","series-title":"Proceedings of the ICML","first-page":"10809","volume":"119","author":"Ye Hui","year":"2020","unstructured":"Hui Ye, Zhiyu Chen, Da-Han Wang, and Brian D. Davison. 2020. Pretrained generalized autoregressive model with adaptive probabilistic label clusters for extreme multi-label text classification. In Proceedings of the ICML. Proceedings of Machine Learning Research, Vol. 119, PMLR, 10809\u201310819."},{"key":"e_1_3_3_45_2","doi-asserted-by":"publisher","DOI":"10.1145\/3097983.3098083"},{"key":"e_1_3_3_46_2","first-page":"5812","volume-title":"Proceedings of the NeurIPS","author":"You Ronghui","year":"2019","unstructured":"Ronghui You, Zihan Zhang, Ziye Wang, Suyang Dai, Hiroshi Mamitsuka, and Shanfeng Zhu. 2019. AttentionXML: Label tree-based attention-aware deep model for high-performance extreme multi-label text classification. In Proceedings of the NeurIPS. 5812\u20135822."},{"key":"e_1_3_3_47_2","unstructured":"Arkaitz Zubiaga. 2012. Enhancing navigation on Wikipedia with social tags. arxiv:1202.5469. Retrieved from http:\/\/arxiv.org\/abs\/1202.5469"}],"container-title":["ACM Transactions on Knowledge Discovery from Data"],"original-title":[],"language":"en","link":[{"URL":"https:\/\/dl.acm.org\/doi\/10.1145\/3643853","content-type":"unspecified","content-version":"vor","intended-application":"text-mining"},{"URL":"https:\/\/dl.acm.org\/doi\/pdf\/10.1145\/3643853","content-type":"unspecified","content-version":"vor","intended-application":"similarity-checking"}],"deposited":{"date-parts":[[2025,6,18]],"date-time":"2025-06-18T23:57:34Z","timestamp":1750291054000},"score":1,"resource":{"primary":{"URL":"https:\/\/dl.acm.org\/doi\/10.1145\/3643853"}},"subtitle":[],"short-title":[],"issued":{"date-parts":[[2024,3,26]]},"references-count":46,"journal-issue":{"issue":"5","published-print":{"date-parts":[[2024,6,30]]}},"alternative-id":["10.1145\/3643853"],"URL":"https:\/\/doi.org\/10.1145\/3643853","relation":{},"ISSN":["1556-4681","1556-472X"],"issn-type":[{"type":"print","value":"1556-4681"},{"type":"electronic","value":"1556-472X"}],"subject":[],"published":{"date-parts":[[2024,3,26]]},"assertion":[{"value":"2022-05-26","order":0,"name":"received","label":"Received","group":{"name":"publication_history","label":"Publication History"}},{"value":"2024-01-24","order":1,"name":"accepted","label":"Accepted","group":{"name":"publication_history","label":"Publication History"}},{"value":"2024-03-26","order":2,"name":"published","label":"Published","group":{"name":"publication_history","label":"Publication History"}}]}}