{"status":"ok","message-type":"work","message-version":"1.0.0","message":{"indexed":{"date-parts":[[2025,10,1]],"date-time":"2025-10-01T15:19:15Z","timestamp":1759331955022,"version":"3.41.0"},"reference-count":60,"publisher":"Association for Computing Machinery (ACM)","issue":"4","license":[{"start":{"date-parts":[[2020,5,18]],"date-time":"2020-05-18T00:00:00Z","timestamp":1589760000000},"content-version":"vor","delay-in-days":0,"URL":"https:\/\/www.acm.org\/publications\/policies\/copyright_policy#Background"}],"content-domain":{"domain":["dl.acm.org"],"crossmark-restriction":true},"short-container-title":["ACM Trans. Asian Low-Resour. Lang. Inf. Process."],"published-print":{"date-parts":[[2020,7,31]]},"abstract":"<jats:p>Dependency-based graph convolutional networks (DepGCNs) are proven helpful for text representation to handle many natural language tasks. Almost all previous models are trained with cross-entropy (CE) loss, which maximizes the posterior likelihood directly. However, the contribution of dependency structures is not well considered by CE loss. As a result, the performance improvement gained by using the structure information can be narrow due to the failure in learning to rely on this structure information. To face the challenge, we propose the novel structurally comparative hinge (SCH) loss function for DepGCNs. SCH loss aims at enlarging the margin gained by structural representations over non-structural ones. From the perspective of information theory, this is equivalent to improving the conditional mutual information of model decision and structure information given text. Our experimental results on both English and Chinese datasets show that by substituting SCH loss for CE loss on various tasks, for both induced structures and structures from an external parser, performance is improved without additional learnable parameters. Furthermore, the extent to which certain types of examples rely on the dependency structure can be measured directly by the learned margin, which results in better interpretability. In addition, through detailed analysis, we show that this structure margin has a positive correlation with task performance and structure induction of DepGCNs, and SCH loss can help model focus more on the shortest dependency path between entities. We achieve the new state-of-the-art results on TACRED, IMDB, and Zh. Literature datasets, even compared with ensemble and BERT baselines.<\/jats:p>","DOI":"10.1145\/3387633","type":"journal-article","created":{"date-parts":[[2020,5,22]],"date-time":"2020-05-22T23:47:19Z","timestamp":1590191239000},"page":"1-19","update-policy":"https:\/\/doi.org\/10.1145\/crossmark-policy","source":"Crossref","is-referenced-by-count":2,"title":["Structurally Comparative Hinge Loss for Dependency-Based Neural Text Representation"],"prefix":"10.1145","volume":"19","author":[{"given":"Kexin","family":"Wang","sequence":"first","affiliation":[{"name":"National Laboratory of Pattern Recognition, Institute of Automation, CAS and School of Artificial Intelligence, University of Chinese Academy of Sciences, Beijing 100049, P. R. China"}],"role":[{"role":"author","vocabulary":"crossref"}]},{"given":"Yu","family":"Zhou","sequence":"additional","affiliation":[{"name":"National Laboratory of Pattern Recognition, Institute of Automation, CAS, School of Artificial Intelligence, University of Chinese Academy of Sciences, Beijing 100049, P. R. China, and Beijing Fanyu Technology Co., Ltd"}],"role":[{"role":"author","vocabulary":"crossref"}]},{"given":"Jiajun","family":"Zhang","sequence":"additional","affiliation":[{"name":"National Laboratory of Pattern Recognition, Institute of Automation, CAS and School of Artificial Intelligence, University of Chinese Academy of Sciences, Beijing 100049, P. R. China"}],"role":[{"role":"author","vocabulary":"crossref"}]},{"given":"Shaonan","family":"Wang","sequence":"additional","affiliation":[{"name":"National Laboratory of Pattern Recognition, Institute of Automation, CAS and School of Artificial Intelligence, University of Chinese Academy of Sciences, Beijing 100049, P. R. China"}],"role":[{"role":"author","vocabulary":"crossref"}]},{"given":"Chengqing","family":"Zong","sequence":"additional","affiliation":[{"name":"National Laboratory of Pattern Recognition, Institute of Automation, CAS and School of Artificial Intelligence, University of Chinese Academy of Sciences, Beijing 100049, P. R. China"}],"role":[{"role":"author","vocabulary":"crossref"}]}],"member":"320","published-online":{"date-parts":[[2020,5,18]]},"reference":[{"key":"e_1_2_1_1_1","unstructured":"Joost Bastings Wilker Aziz Ivan Titov and Khalil Sima\u2019an. 2019. Modeling latent sentence structure in neural machine translation. arxiv:1901.06436.  Joost Bastings Wilker Aziz Ivan Titov and Khalil Sima\u2019an. 2019. Modeling latent sentence structure in neural machine translation. arxiv:1901.06436."},{"key":"e_1_2_1_2_1","doi-asserted-by":"publisher","DOI":"10.18653\/v1\/D17-1209"},{"key":"e_1_2_1_3_1","doi-asserted-by":"publisher","DOI":"10.18653\/v1\/W18-2704"},{"key":"e_1_2_1_4_1","doi-asserted-by":"publisher","DOI":"10.18653\/v1\/P16-1072"},{"key":"e_1_2_1_5_1","doi-asserted-by":"publisher","DOI":"10.18653\/v1\/D18-2029"},{"volume-title":"Kang Min Yoo, and SangGoo Lee","year":"2018","author":"Choi Jihun","key":"e_1_2_1_6_1"},{"volume-title":"Proceedings of the 5th International Conference on Learning Representations (ICLR\u201917)","year":"2017","author":"Chung Junyoung","key":"e_1_2_1_7_1"},{"volume-title":"Bert: Pre-training of deep bidirectional transformers for language understanding. arXiv:1810.04805.","year":"2018","author":"Devlin Jacob","key":"e_1_2_1_8_1"},{"key":"e_1_2_1_9_1","doi-asserted-by":"publisher","DOI":"10.1145\/2623330.2623758"},{"key":"e_1_2_1_10_1","doi-asserted-by":"publisher","DOI":"10.18653\/v1\/P17-2012"},{"key":"e_1_2_1_11_1","doi-asserted-by":"publisher","DOI":"10.18653\/v1\/D17-1254"},{"volume-title":"Proceedings of the 6th International Conference on Learning Representations (ICLR\u201918)","year":"2018","author":"Gong Yichen","key":"e_1_2_1_12_1"},{"key":"e_1_2_1_13_1","doi-asserted-by":"publisher","DOI":"10.18653\/v1\/D15-1205"},{"key":"e_1_2_1_14_1","doi-asserted-by":"publisher","DOI":"10.18653\/v1\/P19-1024"},{"key":"e_1_2_1_15_1","doi-asserted-by":"publisher","DOI":"10.1109\/CVPR.2016.90"},{"key":"e_1_2_1_16_1","doi-asserted-by":"publisher","DOI":"10.18653\/v1\/P18-1192"},{"key":"e_1_2_1_17_1","doi-asserted-by":"publisher","DOI":"10.18653\/v1\/N16-1162"},{"key":"e_1_2_1_18_1","doi-asserted-by":"publisher","DOI":"10.1162\/neco.1997.9.8.1735"},{"key":"e_1_2_1_19_1","doi-asserted-by":"publisher","DOI":"10.18653\/v1\/D15-1162"},{"key":"e_1_2_1_20_1","unstructured":"S\u00e9bastien Jean and Kyunghyun Cho. 2019. Context-aware learning for neural machine translation. arxiv:1903.04715.  S\u00e9bastien Jean and Kyunghyun Cho. 2019. Context-aware learning for neural machine translation. arxiv:1903.04715."},{"key":"e_1_2_1_21_1","doi-asserted-by":"publisher","DOI":"10.3115\/v1\/P14-1062"},{"volume-title":"Proceedings of the 5th International Conference on Learning Representations (ICLR\u201917)","author":"Thomas","key":"e_1_2_1_22_1"},{"key":"e_1_2_1_23_1","first-page":"07","volume-title":"Proceedings of the 2007 Joint Conference on Empirical Methods in Natural Language Processing and Computational Natural Language Learning (EMNLP-CoNLL\u201907)","author":"Koo Terry","year":"2007"},{"key":"e_1_2_1_24_1","doi-asserted-by":"publisher","DOI":"10.18653\/v1\/N16-1082"},{"key":"e_1_2_1_25_1","doi-asserted-by":"publisher","DOI":"10.18653\/v1\/P18-2023"},{"volume-title":"Proceedings of the 4th International Conference on Learning Representations (ICLR\u201916)","author":"Li Yujia","key":"e_1_2_1_26_1"},{"key":"e_1_2_1_27_1","first-page":"18","volume-title":"Proceedings of the 27th International Conference on Computational Linguistics (COLING\u201918)","author":"Liu Xin","year":"2018"},{"key":"e_1_2_1_28_1","doi-asserted-by":"publisher","DOI":"10.1162\/tacl_a_00005"},{"key":"e_1_2_1_29_1","unstructured":"Yang Liu Furu Wei Sujian Li Heng Ji Ming Zhou and Houfeng Wang. 2015. A dependency-based neural network for relation classification. In Proceedings of the 53rd Annual Meeting of the Association for Computational Linguistics and the 7th International Joint Conference on Natural Language Processing of the Asian Federation of Natural Language Processing (ACL\u201915) Volume 2: Short Papers. 285--290. http:\/\/aclweb.org\/anthology\/P\/P15\/P15-2047.pdf.  Yang Liu Furu Wei Sujian Li Heng Ji Ming Zhou and Houfeng Wang. 2015. A dependency-based neural network for relation classification. In Proceedings of the 53rd Annual Meeting of the Association for Computational Linguistics and the 7th International Joint Conference on Natural Language Processing of the Asian Federation of Natural Language Processing (ACL\u201915) Volume 2: Short Papers. 285--290. http:\/\/aclweb.org\/anthology\/P\/P15\/P15-2047.pdf."},{"volume-title":"Proceedings of the 5th International Conference on Learning Representations (ICLR\u201917)","year":"2017","author":"Maddison Chris J.","key":"e_1_2_1_30_1"},{"key":"e_1_2_1_31_1","doi-asserted-by":"publisher","DOI":"10.3115\/v1\/P14-5010"},{"key":"e_1_2_1_32_1","doi-asserted-by":"publisher","DOI":"10.18653\/v1\/D17-1159"},{"key":"e_1_2_1_33_1","doi-asserted-by":"publisher","DOI":"10.18653\/v1\/P16-1105"},{"key":"e_1_2_1_34_1","doi-asserted-by":"publisher","DOI":"10.18653\/v1\/D18-1108"},{"key":"e_1_2_1_35_1","doi-asserted-by":"publisher","DOI":"10.1109\/TASLP.2016.2520371"},{"volume-title":"Manning","year":"2014","author":"Pennington Jeffrey","key":"e_1_2_1_36_1"},{"key":"e_1_2_1_37_1","doi-asserted-by":"publisher","DOI":"10.18653\/v1\/N18-1202"},{"key":"e_1_2_1_38_1","doi-asserted-by":"publisher","DOI":"10.18653\/v1\/W18-5431"},{"key":"e_1_2_1_39_1","doi-asserted-by":"publisher","DOI":"10.18653\/v1\/W16-2209"},{"volume-title":"Proceedings of the 7th International Conference on Learning Representations (ICLR\u201919)","author":"Shen Yikang","key":"e_1_2_1_40_1"},{"key":"e_1_2_1_41_1","unstructured":"Peng Shi and Jimmy Lin. 2019. Simple BERT models for relation extraction and semantic role labeling. arxiv:1904.05255.  Peng Shi and Jimmy Lin. 2019. Simple BERT models for relation extraction and semantic role labeling. arxiv:1904.05255."},{"key":"e_1_2_1_42_1","doi-asserted-by":"publisher","DOI":"10.18653\/v1\/D16-1159"},{"key":"e_1_2_1_43_1","first-page":"13","volume-title":"Proceedings of the 2013 Conference on Empirical Methods in Natural Language Processing (EMNLP\u201913), a Meeting of SIGDAT, a Special Interest Group of the ACL. 1631--1642","author":"Socher Richard","year":"2013"},{"key":"e_1_2_1_44_1","doi-asserted-by":"publisher","DOI":"10.18653\/v1\/W18-6245"},{"first-page":"18","volume-title":"Proceedings of the 2018 Conference on Empirical Methods in Natural Language Processing (EMNLP\u201918)","author":"Swayamdipta Swabha","key":"e_1_2_1_45_1"},{"volume-title":"Manning","year":"2015","author":"Tai Kai Sheng","key":"e_1_2_1_46_1"},{"key":"e_1_2_1_47_1","doi-asserted-by":"publisher","DOI":"10.18653\/v1\/D15-1167"},{"key":"e_1_2_1_48_1","doi-asserted-by":"crossref","unstructured":"Ian Tenney Dipanjan Das and Ellie Pavlick. 2019. BERT rediscovers the classical NLP pipeline. arxiv:1905.05950.  Ian Tenney Dipanjan Das and Ellie Pavlick. 2019. BERT rediscovers the classical NLP pipeline. arxiv:1905.05950.","DOI":"10.18653\/v1\/P19-1452"},{"key":"e_1_2_1_49_1","doi-asserted-by":"crossref","unstructured":"Yufei Wang Mark Johnson Stephen Wan Yifang Sun and Wei Wang. 2019. How to best use syntax in semantic role labelling. arxiv:1906.00266.  Yufei Wang Mark Johnson Stephen Wan Yifang Sun and Wei Wang. 2019. How to best use syntax in semantic role labelling. arxiv:1906.00266.","DOI":"10.18653\/v1\/P19-1529"},{"key":"e_1_2_1_50_1","doi-asserted-by":"publisher","DOI":"10.24963\/ijcai.2017\/579"},{"volume-title":"Proceedings of the 26th International Conference on Computational Linguistics (COLING\u201916)","year":"2016","author":"Wang Zhiguo","key":"e_1_2_1_51_1"},{"key":"e_1_2_1_52_1","doi-asserted-by":"publisher","DOI":"10.1006\/jmps.1999.1278"},{"key":"e_1_2_1_53_1","doi-asserted-by":"publisher","DOI":"10.18653\/v1\/N18-2059"},{"key":"e_1_2_1_54_1","doi-asserted-by":"publisher","DOI":"10.18653\/v1\/D16-1007"},{"volume-title":"Proceedings of the 2016 Conference of the North American Chapter of the Association for Computational Linguistics: Human Language Technologies (NAACL-HLT\u201916)","author":"Yang Zichao","key":"e_1_2_1_55_1"},{"key":"e_1_2_1_56_1","doi-asserted-by":"publisher","DOI":"10.18653\/v1\/N19-1118"},{"key":"e_1_2_1_57_1","unstructured":"Xiang Zhang and Yann LeCun. 2017. Which encoding is the best for text classification in Chinese English Japanese and Korean?arxiv:1708.02657.  Xiang Zhang and Yann LeCun. 2017. Which encoding is the best for text classification in Chinese English Japanese and Korean?arxiv:1708.02657."},{"first-page":"18","volume-title":"Proceedings of the 2018 Conference on Empirical Methods in Natural Language Processing (EMNLP\u201918)","author":"Zhang Yuhao","key":"e_1_2_1_58_1"},{"first-page":"17","volume-title":"Proceedings of the 2017 Conference on Empirical Methods in Natural Language Processing (EMNLP\u201917)","author":"Zhang Yuhao","key":"e_1_2_1_59_1"},{"volume-title":"Proceedings of the 32nd International Conference on Machine Learning (ICML\u201915)","year":"2015","author":"Zhu Xiao-Dan","key":"e_1_2_1_60_1"}],"container-title":["ACM Transactions on Asian and Low-Resource Language Information Processing"],"original-title":[],"language":"en","link":[{"URL":"https:\/\/dl.acm.org\/doi\/10.1145\/3387633","content-type":"unspecified","content-version":"vor","intended-application":"text-mining"},{"URL":"https:\/\/dl.acm.org\/doi\/pdf\/10.1145\/3387633","content-type":"unspecified","content-version":"vor","intended-application":"similarity-checking"}],"deposited":{"date-parts":[[2025,6,17]],"date-time":"2025-06-17T22:41:36Z","timestamp":1750200096000},"score":1,"resource":{"primary":{"URL":"https:\/\/dl.acm.org\/doi\/10.1145\/3387633"}},"subtitle":[],"short-title":[],"issued":{"date-parts":[[2020,5,18]]},"references-count":60,"journal-issue":{"issue":"4","published-print":{"date-parts":[[2020,7,31]]}},"alternative-id":["10.1145\/3387633"],"URL":"https:\/\/doi.org\/10.1145\/3387633","relation":{},"ISSN":["2375-4699","2375-4702"],"issn-type":[{"type":"print","value":"2375-4699"},{"type":"electronic","value":"2375-4702"}],"subject":[],"published":{"date-parts":[[2020,5,18]]},"assertion":[{"value":"2019-08-01","order":0,"name":"received","label":"Received","group":{"name":"publication_history","label":"Publication History"}},{"value":"2020-03-01","order":1,"name":"accepted","label":"Accepted","group":{"name":"publication_history","label":"Publication History"}},{"value":"2020-05-18","order":2,"name":"published","label":"Published","group":{"name":"publication_history","label":"Publication History"}}]}}