{"status":"ok","message-type":"work","message-version":"1.0.0","message":{"indexed":{"date-parts":[[2025,5,20]],"date-time":"2025-05-20T04:12:43Z","timestamp":1747714363620,"version":"3.40.5"},"reference-count":49,"publisher":"MIT Press","issue":"1","content-domain":{"domain":[],"crossmark-restriction":false},"short-container-title":["Computational Linguistics"],"published-print":{"date-parts":[[2015,3]]},"abstract":"<jats:p>Manually annotated corpora are indispensable resources, yet for many annotation tasks, such as the creation of treebanks, there exist multiple corpora with different and incompatible annotation guidelines. This leads to an inefficient use of human expertise, but it could be remedied by integrating knowledge across corpora with different annotation guidelines. In this article we describe the problem of annotation adaptation and the intrinsic principles of the solutions, and present a series of successively enhanced models that can automatically adapt the divergence between different annotation formats.<\/jats:p><jats:p>We evaluate our algorithms on the tasks of Chinese word segmentation and dependency parsing. For word segmentation, where there are no universal segmentation guidelines because of the lack of morphology in Chinese, we perform annotation adaptation from the much larger People's Daily corpus to the smaller but more popular Penn Chinese Treebank. For dependency parsing, we perform annotation adaptation from the Penn Chinese Treebank to a semantics-oriented Dependency Treebank, which is annotated using significantly different annotation guidelines. 
In both experiments, automatic annotation adaptation brings significant improvement, achieving state-of-the-art performance despite the use of purely local features in training.<\/jats:p>","DOI":"10.1162\/coli_a_00210","type":"journal-article","created":{"date-parts":[[2015,2,23]],"date-time":"2015-02-23T15:11:01Z","timestamp":1424704261000},"page":"119-147","source":"Crossref","is-referenced-by-count":3,"title":["Automatic Adaptation of Annotations"],"prefix":"10.1162","volume":"41","author":[{"given":"Wenbin","family":"Jiang","sequence":"first","affiliation":[{"name":"Chinese Academy of Sciences"}]},{"given":"Yajuan","family":"L\u00fc","sequence":"additional","affiliation":[{"name":"Chinese Academy of Sciences"}]},{"given":"Liang","family":"Huang","sequence":"additional","affiliation":[{"name":"Queens College and Graduate Center, The City University of New York"}]},{"given":"Qun","family":"Liu","sequence":"additional","affiliation":[{"name":"Dublin City University"},{"name":"Chinese Academy of Sciences"}]}],"member":"281","reference":[{"key":"R1","doi-asserted-by":"publisher","DOI":"10.3115\/1610075.1610094"},{"key":"R2","unstructured":"Brill, Eric. 1995. Transformation-based error-driven learning and natural language processing: A case study in part-of-speech tagging. Computational Linguistics, 21(4):543\u2013565."},{"key":"R3","doi-asserted-by":"crossref","unstructured":"Buchholz, Sabine and Erwin Marsi. 2006. CoNLL-X shared task on multilingual dependency parsing. In Proceedings of CoNLL, pages 149\u2013164, New York, NY.","DOI":"10.3115\/1596276.1596305"},{"key":"R4","unstructured":"Cahill, Aoife, Mairead McCarthy, Josef van Genabith, and Andy Way. 2002. Automatic annotation of the Penn treebank with LFG F-structure information. In Proceedings of the LREC Workshop, Las Palmas."},{"key":"R5","unstructured":"Che, Wanxiang, Meishan Zhang, Yanqiu Shao, and Ting Liu. 2012. SemEval-2012 task 5: Chinese semantic dependency parsing. 
In Proceedings of SemEval, pages 378\u2013384, Montreal."},{"key":"R6","doi-asserted-by":"crossref","unstructured":"Collins, Michael. 2002. Discriminative training methods for hidden Markov models: Theory and experiments with perceptron algorithms. In Proceedings of EMNLP, pages 1\u20138, Philadelphia, PA.","DOI":"10.3115\/1118693.1118694"},{"key":"R7","unstructured":"Das, Dipanjan and Slav Petrov. 2011. Unsupervised part-of-speech tagging with bilingual graph-based projections. In Proceedings of ACL, pages 600\u2013609, Portland, OR."},{"key":"R8","unstructured":"Daum\u00e9 III, Hal. 2007. Frustratingly easy domain adaptation. In Proceedings of ACL, pages 256\u2013263, Prague."},{"key":"R9","doi-asserted-by":"crossref","unstructured":"Daum\u00e9 III, Hal. 2009. Unsupervised search based structured prediction. In Proceedings of ICML, pages 209\u2013216, Montreal.","DOI":"10.1145\/1553374.1553401"},{"key":"R10","doi-asserted-by":"publisher","DOI":"10.1613\/jair.1872"},{"key":"R11","doi-asserted-by":"crossref","unstructured":"Davis, Jesse and Pedro Domingos. 2009. Deep transfer via second-order Markov logic. In Proceedings of ICML, pages 217\u2013224, Montreal.","DOI":"10.1145\/1553374.1553402"},{"key":"R12","doi-asserted-by":"crossref","unstructured":"Eisner, Jason M. 1996. Three new probabilistic models for dependency parsing: An exploration. In Proceedings of COLING, pages 340\u2013345, Copenhagen.","DOI":"10.3115\/992628.992688"},{"key":"R13","doi-asserted-by":"crossref","unstructured":"Ganchev, Kuzman, Jennifer Gillenwater, and Ben Taskar. 2009. Dependency grammar induction via bitext projection constraints. In Proceedings of ACL, pages 369\u2013377, Singapore.","DOI":"10.3115\/1687878.1687931"},{"key":"R14","doi-asserted-by":"publisher","DOI":"10.3115\/1218955.1219014"},{"key":"R15","unstructured":"Hewlett, Daniel and Paul Cohen. 2011. Fully unsupervised word segmentation with BVE and MDL. 
In Proceedings of ACL, pages 540\u2013545, Portland, OR."},{"key":"R16","doi-asserted-by":"publisher","DOI":"10.1162\/coli.2007.33.3.355"},{"key":"R17","doi-asserted-by":"publisher","DOI":"10.1017\/S1351324905003840"},{"key":"R18","doi-asserted-by":"crossref","unstructured":"Hwa, Rebecca, Philip Resnik, Amy Weinberg, and Okan Kolak. 2002. Evaluating translational correspondence using annotation projection. In Proceedings of ACL, pages 392\u2013399, Philadelphia, PA.","DOI":"10.21236\/ADA455137"},{"key":"R19","doi-asserted-by":"crossref","unstructured":"Jiang, Wenbin, Liang Huang, and Qun Liu. 2009. Automatic adaptation of annotation standards: Chinese word segmentation and POS tagging \u2013 A case study. In Proceedings of ACL, pages 522\u2013530, Singapore.","DOI":"10.3115\/1687878.1687952"},{"key":"R20","doi-asserted-by":"crossref","unstructured":"Jiang, Wenbin, Liang Huang, Yajuan L\u00fc, and Qun Liu. 2008. A cascaded linear model for joint Chinese word segmentation and part-of-speech tagging. In Proceedings of ACL, pages 897\u2013904, Columbus, OH.","DOI":"10.3115\/1599081.1599130"},{"key":"R21","unstructured":"Jiang, Wenbin and Qun Liu. 2010. Dependency parsing and projection based on word-pair classification. In Proceedings of ACL, pages 12\u201320, Uppsala."},{"key":"R22","unstructured":"Jiang, Wenbin, Fandong Meng, Qun Liu, and Yajuan L\u00fc. 2012. Iterative annotation transformation with predict-self reestimation for Chinese word segmentation. In Proceedings of EMNLP, pages 412\u2013420, Jeju Island."},{"key":"R23","doi-asserted-by":"publisher","DOI":"10.3115\/1620754.1620800"},{"key":"R24","doi-asserted-by":"publisher","DOI":"10.3115\/1687878.1687951"},{"key":"R25","unstructured":"Li, Zhongguo. 2011. Parsing the internal structure of words: A new paradigm for Chinese word segmentation. In Proceedings of ACL, pages 1,405\u20131,414, Portland, OR."},{"key":"R26","unstructured":"Marcus, Mitchell P., Beatrice Santorini, and Mary Ann Marcinkiewicz. 1993. 
Building a large annotated corpus of English: The Penn treebank. Computational Linguistics, 19(2):313\u2013330."},{"key":"R27","doi-asserted-by":"publisher","DOI":"10.3115\/1613715.1613738"},{"key":"R28","doi-asserted-by":"publisher","DOI":"10.3115\/1219840.1219852"},{"key":"R29","unstructured":"McDonald, Ryan and Fernando Pereira. 2006. Online learning of approximate dependency parsing algorithms. In Proceedings of EACL, pages 81\u201388, Trento."},{"key":"R30","unstructured":"Mihalkova, Lilyana, Tuyen Huynh, and Raymond J. Mooney. 2007. Mapping and revising Markov logic networks for transfer learning. In Proceedings of AAAI, volume 7, pages 608\u2013614, Vancouver."},{"key":"R31","unstructured":"Mihalkova, Lilyana and Raymond J. Mooney. 2008. Transfer learning by mapping with minimal target data. In Proceedings of the AAAI Workshop on Transfer Learning for Complex Tasks, Chicago, IL."},{"key":"R32","doi-asserted-by":"publisher","DOI":"10.3115\/1687878.1687894"},{"key":"R33","doi-asserted-by":"publisher","DOI":"10.3115\/1557769.1557832"},{"key":"R34","unstructured":"Ng, Hwee Tou and Jin Kiat Low. 2004. Chinese part-of-speech tagging: One-at-a-time or all-at-once? Word-based or character-based? In Proceedings of EMNLP, pages 277\u2013284, Barcelona."},{"key":"R35","unstructured":"Nivre, Joakim and Ryan McDonald. 2008. Integrating graph-based and transition-based dependency parsers. In Proceedings of ACL, pages 950\u2013958, Columbus, OH."},{"key":"R36","doi-asserted-by":"crossref","unstructured":"Oepen, Stephan, Kristina Toutanova, Stuart Shieber, Christopher Manning, Dan Flickinger, and Thorsten Brants. 2002. The LinGO Redwoods treebank: Motivation and preliminary applications. In Proceedings of COLING, volume 2, pages 1\u20135, Taipei.","DOI":"10.3115\/1071884.1071909"},{"key":"R37","doi-asserted-by":"crossref","unstructured":"Pan, Sinno Jialin and Qiang Yang. 2010. A survey on transfer learning. 
IEEE TKDE, 22(10):1345\u20131359.","DOI":"10.1109\/TKDE.2009.191"},{"key":"R38","doi-asserted-by":"crossref","unstructured":"Sarkar, Anoop. 2001. Applying co-training methods to statistical parsing. In Proceedings of NAACL, pages 1\u20138, Pittsburgh, PA.","DOI":"10.3115\/1073336.1073359"},{"key":"R39","doi-asserted-by":"publisher","DOI":"10.3115\/1699571.1699620"},{"key":"R40","unstructured":"Sun, Weiwei. 2011. A stacked sub-word model for joint Chinese word segmentation and part-of-speech tagging. In Proceedings of ACL, pages 1,385\u20131,394, Portland, OR."},{"key":"R41","unstructured":"Sun, Weiwei and Xiaojun Wan. 2012. Reducing approximation and estimation errors for Chinese lexical processing with heterogeneous annotations. In Proceedings of ACL, volume 1, pages 232\u2013241, Jeju Island."},{"key":"R43","unstructured":"Wang, Kun, Chengqing Zong, and Keh-Yih Su. 2010. A character-based joint model for Chinese word segmentation. In Proceedings of COLING, pages 1,173\u20131,181, Beijing."},{"key":"R44","doi-asserted-by":"publisher","DOI":"10.3115\/1119250.1119278"},{"key":"R45","doi-asserted-by":"publisher","DOI":"10.1017\/S135132490400364X"},{"key":"R46","unstructured":"Yamada, H. and Y. Matsumoto. 2003. Statistical dependency analysis with support vector machines. In Proceedings of IWPT, pages 195\u2013206, Nancy."},{"key":"R48","unstructured":"Zhang, Yue and Stephen Clark. 2007. Chinese segmentation with a word-based perceptron algorithm. In Proceedings of ACL, pages 840\u2013847, Prague."},{"key":"R49","unstructured":"Zhang, Yue and Stephen Clark. 2010. A fast decoder for joint word segmentation and POS-tagging using a single discriminative model. In Proceedings of EMNLP, pages 843\u2013852, Cambridge, MA."},{"key":"R50","unstructured":"Zhao, Hai and Chunyu Kit. 2008. Unsupervised segmentation helps supervised learning of character tagging for word segmentation and named entity recognition. 
In Proceedings of IJCNLP, pages 106\u2013111, Hyderabad."},{"key":"R51","unstructured":"Zhu, Muhua, Jingbo Zhu, and Minghan Hu. 2011. Better automatic treebank conversion using a feature-based approach. In Proceedings of ACL, volume 2, pages 715\u2013719, Portland, OR."}],"container-title":["Computational Linguistics"],"original-title":[],"language":"en","link":[{"URL":"https:\/\/www.mitpressjournals.org\/doi\/pdf\/10.1162\/COLI_a_00210","content-type":"unspecified","content-version":"vor","intended-application":"similarity-checking"}],"deposited":{"date-parts":[[2025,5,19]],"date-time":"2025-05-19T22:20:24Z","timestamp":1747693224000},"score":1,"resource":{"primary":{"URL":"https:\/\/direct.mit.edu\/coli\/article\/41\/1\/119-147\/1500"}},"subtitle":[],"short-title":[],"issued":{"date-parts":[[2015,3]]},"references-count":49,"journal-issue":{"issue":"1","published-print":{"date-parts":[[2015,3]]}},"alternative-id":["10.1162\/COLI_a_00210"],"URL":"https:\/\/doi.org\/10.1162\/coli_a_00210","relation":{},"ISSN":["0891-2017","1530-9312"],"issn-type":[{"type":"print","value":"0891-2017"},{"type":"electronic","value":"1530-9312"}],"subject":[],"published":{"date-parts":[[2015,3]]}}}