{"status":"ok","message-type":"work","message-version":"1.0.0","message":{"indexed":{"date-parts":[[2025,10,1]],"date-time":"2025-10-01T15:40:03Z","timestamp":1759333203612,"version":"3.41.0"},"reference-count":27,"publisher":"Association for Computing Machinery (ACM)","issue":"2","license":[{"start":{"date-parts":[[2010,6,1]],"date-time":"2010-06-01T00:00:00Z","timestamp":1275350400000},"content-version":"vor","delay-in-days":0,"URL":"https:\/\/www.acm.org\/publications\/policies\/copyright_policy#Background"}],"content-domain":{"domain":["dl.acm.org"],"crossmark-restriction":true},"short-container-title":["ACM Transactions on Asian Language Information Processing"],"published-print":{"date-parts":[[2010,6]]},"abstract":"<jats:p>\n            This article investigates a relatively underdeveloped subject in natural language processing---the generation of punctuation marks. From a theoretical perspective, we study 16 Chinese punctuation marks as defined in the Chinese national standard of punctuation usage, and categorize these punctuation marks into three different types according to their syntactic properties. We implement a three-tier maximum entropy model incorporating linguistically-motivated features for generating the commonly used Chinese punctuation marks in unpunctuated sentences output by a surface realizer. Furthermore, we present a method to automatically extract cue words indicating sentence-final punctuation marks as a specialized feature to construct a more precise model. Evaluating on the Penn Chinese Treebank data, the MaxEnt model achieves an\n            <jats:italic>f<\/jats:italic>\n            -score of 79.83% for punctuation insertion and 74.61% for punctuation restoration using gold data input, 79.50% for insertion and 73.32% for restoration using parser-based imperfect input. 
The experiments show that the MaxEnt model significantly outperforms a baseline 5-gram language model that scores 54.99% for punctuation insertion and 52.01% for restoration. We show that our results are not far from human performance on the same task with human insertion\n            <jats:italic>f<\/jats:italic>\n            -scores in the range of 81-87% and human restoration in the range of 71-82%. Finally, a manual error analysis of the generation output shows that close to 40% of the mismatched punctuation marks do in fact result in acceptable choices, a fact obscured in the automatic string-matching based evaluation scores.\n          <\/jats:p>","DOI":"10.1145\/1781134.1781136","type":"journal-article","created":{"date-parts":[[2010,6,11]],"date-time":"2010-06-11T18:52:51Z","timestamp":1276282371000},"page":"1-27","update-policy":"https:\/\/doi.org\/10.1145\/crossmark-policy","source":"Crossref","is-referenced-by-count":3,"title":["A Linguistically Inspired Statistical Model for Chinese Punctuation Generation"],"prefix":"10.1145","volume":"9","author":[{"given":"Yuqing","family":"Guo","sequence":"first","affiliation":[{"name":"Toshiba (China) Research and Development Center"}]},{"given":"Haifeng","family":"Wang","sequence":"additional","affiliation":[{"name":"Toshiba (China) Research and Development Center"}]},{"given":"Josef","family":"van Genabith","sequence":"additional","affiliation":[{"name":"Dublin City University"}]}],"member":"320","published-online":{"date-parts":[[2010,6]]},"reference":[{"key":"e_1_2_1_1_1","doi-asserted-by":"publisher","DOI":"10.1016\/j.specom.2008.05.008"},{"volume-title":"Proceedings of the IEEE International Conference on Acoustics, Speech, and Signal (ICASSP\u201998)","author":"Beeferman D.","key":"e_1_2_1_2_1","unstructured":"Beeferman, D., Berger, A., and Lafferty, J. 1998. Cyberpunc: A lightweight punctuation annotation system for speech. 
In Proceedings of the IEEE International Conference on Acoustics, Speech, and Signal (ICASSP\u201998). 689--692."},{"volume-title":"rep","author":"Briscoe T.","key":"e_1_2_1_4_1","unstructured":"Briscoe, T. 1994. Parsing (with) punctuation etc. Tech. rep., Rank Xerox Research Centre, Grenoble, France."},{"key":"e_1_2_1_5_1","doi-asserted-by":"publisher","DOI":"10.3115\/1220175.1220305"},{"volume-title":"Proceedings of the ISCA Workshop on Prosody in Speech Recognition and Understanding (ISCA\u201901)","author":"Christensen H.","key":"e_1_2_1_6_1","unstructured":"Christensen, H., Gotoh, Y., and Renals, S. 2001. Punctuation annotation using statistical prosody models. In Proceedings of the ISCA Workshop on Prosody in Speech Recognition and Understanding (ISCA\u201901). 35--40."},{"key":"e_1_2_1_7_1","first-page":"183","article-title":"Word association norms, mutual information, and lexicography","volume":"16","author":"Church K. W.","year":"1990","unstructured":"Church, K. W. and Hanks, P. 1990. Word association norms, mutual information, and lexicography. Comput. Linguist. 16, 1, 183--213.","journal-title":"Comput. 
Linguist."},{"key":"e_1_2_1_8_1","volume-title":"Proceedings of the 5th International Workshop on Tree Adjoining Grammar and Related Formalisms (TAG\u201900)","author":"Doran C.","year":"2000","unstructured":"Doran, C. 2000. Punctuation in a lexicalized grammar. In Proceedings of the 5th International Workshop on Tree Adjoining Grammar and Related Formalisms (TAG\u201900)."},{"volume-title":"Transmission of Information: A Statistical Theory of Information","author":"Fano R. M.","key":"e_1_2_1_9_1","unstructured":"Fano, R. M. 1961. Transmission of Information: A Statistical Theory of Information. MIT Press, Cambridge, MA."},{"volume-title":"Proceedings of the Conference on Lexical Functional Grammar (LFG\u201907)","author":"Guo Y.","key":"e_1_2_1_10_1","unstructured":"Guo, Y., van Genabith, J., and Wang, H. 2007. Treebank-based acquisition of LFG resources for Chinese. In Proceedings of the Conference on Lexical Functional Grammar (LFG\u201907). 214--232."},{"volume-title":"Proceedings of the 22nd International Conference on Computational Linguistics (ICCL\u201908)","author":"Guo Y.","key":"e_1_2_1_11_1","unstructured":"Guo, Y., van Genabith, J., and Wang, H. 2008. Dependency-based n-gram models for general purpose sentence realization. 
In Proceedings of the 22nd International Conference on Computational Linguistics (ICCL\u201908). 297--304."},{"key":"e_1_2_1_12_1","unstructured":"Kaplan R. M. and Bresnan J. 1982. Lexical functional grammar: A formal system for grammatical representation. In The Mental Representation of Grammatical Relations. MIT Press Cambridge MA 173--282."},{"key":"e_1_2_1_13_1","volume-title":"Proceedings of the 2nd International Conference on Natural Language Generation (INLG\u201902)","author":"Langkilde I.","year":"2002","unstructured":"Langkilde, I. 2002. An empirical verification of coverage and correctness for a general-purpose sentence generator. In Proceedings of the 2nd International Conference on Natural Language Generation (INLG\u201902). 17--24."},{"volume-title":"Proceedings of the 2nd International Joint Conference on Natural Language Processing (IJNLP\u201905)","author":"Li X.","key":"e_1_2_1_14_1","unstructured":"Li, X., Zong, C., and Hu, R. 2005. A hierarchical parsing approach with punctuation processing for long Chinese sentences. In Proceedings of the 2nd International Joint Conference on Natural Language Processing (IJNLP\u201905). 
7--12."},{"key":"e_1_2_1_15_1","volume-title":"Proceedings of the 1st Workshop on Computational Terminology (WCT\u201998)","author":"Lin D.","year":"1998","unstructured":"Lin, D. 1998. Extracting collocations from text corpora. In Proceedings of the 1st Workshop on Computational Terminology (WCT\u201998). 57."},{"key":"e_1_2_1_16_1","unstructured":"Manning C. D. and Schutze H. 1999. Foundations of Statistical Natural Language Processing. MIT Press Cambridge MA."},{"volume-title":"The Linguistics of Punctuation","author":"Nunberg G.","key":"e_1_2_1_17_1","unstructured":"Nunberg, G. 1990. The Linguistics of Punctuation. CSLI Publications, Stanford, CA."},{"key":"e_1_2_1_18_1","volume-title":"Proceedings of the Empirical Methods in Natural Language Processing (EMNLP\u201996)","author":"Ratnaparkhi A.","year":"1996","unstructured":"Ratnaparkhi, A. 1996. A maximum entropy model for part-of-speech tagging. In Proceedings of the Empirical Methods in Natural Language Processing (EMNLP\u201996). 133--142."},{"volume-title":"A simple introduction to maximum entropy models for natural language processing. Tech. rep","author":"Ratnaparkhi A.","key":"e_1_2_1_19_1","unstructured":"Ratnaparkhi, A. 1997. A simple introduction to maximum entropy models for natural language processing. Tech. 
rep., University of Pennsylvania."},{"key":"e_1_2_1_20_1","unstructured":"Reed C. and Long D. 1997. Generating punctuation in written arguments. Tech. rep. University College London."},{"key":"e_1_2_1_21_1","doi-asserted-by":"publisher","DOI":"10.3115\/974557.974561"},{"volume-title":"Proceedings of the Association for Computational Linguistics Workshop on Punctuation (ACLWP\u201996)","author":"Say B.","key":"e_1_2_1_22_1","unstructured":"Say, B. and Akman, V. 1996. Information-based aspects of punctuation. In Proceedings of the Association for Computational Linguistics Workshop on Punctuation (ACLWP\u201996). 49--56."},{"key":"e_1_2_1_23_1","doi-asserted-by":"publisher","DOI":"10.1145\/505282.505283"},{"key":"e_1_2_1_24_1","doi-asserted-by":"publisher","DOI":"10.3115\/1117794.1117802"},{"volume-title":"Proceedings of the 4th SIGHAN Workshop on Chinese Language Processing (WCLP\u201905)","author":"Tseng H.","key":"e_1_2_1_25_1","unstructured":"Tseng, H., Chang, P., Andrew, G., Jurafsky, D., and Manning, C. 2005. A conditional random field word segmenter. In Proceedings of the 4th SIGHAN Workshop on Chinese Language Processing (WCLP\u201905). 168--171."},{"key":"e_1_2_1_26_1","volume-title":"Proceedings of the 5th European Workshop on Natural Language Generation (EWNLG\u201995)","author":"White M.","year":"1995","unstructured":"White, M. 1995. Presenting punctuation. 
In Proceedings of the 5th European Workshop on Natural Language Generation (EWNLG\u201995). 107--125."},{"volume-title":"Proceedings of the Workshop on Grammar Engineering Across Frameworks (COLING\u201908)","author":"White M.","key":"e_1_2_1_27_1","unstructured":"White, M. and Rajkumar, R. 2008. A more precise analysis of punctuation for broad-coverage surface realization with CCG. In Proceedings of the Workshop on Grammar Engineering Across Frameworks (COLING\u201908). 17--24."},{"volume-title":"Proceedings of the 9th International Joint Conferences on Artificial Intelligence (IJCAI\u201905)","author":"Xue N.","key":"e_1_2_1_28_1","unstructured":"Xue, N. and Palmer, M. 2005. Automatic semantic role labeling for Chinese verbs. In Proceedings of the 9th International Joint Conferences on Artificial Intelligence (IJCAI\u201905). 
1160--1165."}],"container-title":["ACM Transactions on Asian Language Information Processing"],"original-title":[],"language":"en","link":[{"URL":"https:\/\/dl.acm.org\/doi\/10.1145\/1781134.1781136","content-type":"unspecified","content-version":"vor","intended-application":"text-mining"},{"URL":"https:\/\/dl.acm.org\/doi\/pdf\/10.1145\/1781134.1781136","content-type":"unspecified","content-version":"vor","intended-application":"similarity-checking"}],"deposited":{"date-parts":[[2025,6,18]],"date-time":"2025-06-18T11:39:48Z","timestamp":1750246788000},"score":1,"resource":{"primary":{"URL":"https:\/\/dl.acm.org\/doi\/10.1145\/1781134.1781136"}},"subtitle":[],"short-title":[],"issued":{"date-parts":[[2010,6]]},"references-count":27,"journal-issue":{"issue":"2","published-print":{"date-parts":[[2010,6]]}},"alternative-id":["10.1145\/1781134.1781136"],"URL":"https:\/\/doi.org\/10.1145\/1781134.1781136","relation":{},"ISSN":["1530-0226","1558-3430"],"issn-type":[{"type":"print","value":"1530-0226"},{"type":"electronic","value":"1558-3430"}],"subject":[],"published":{"date-parts":[[2010,6]]},"assertion":[{"value":"2009-06-01","order":0,"name":"received","label":"Received","group":{"name":"publication_history","label":"Publication History"}},{"value":"2010-02-01","order":1,"name":"accepted","label":"Accepted","group":{"name":"publication_history","label":"Publication History"}},{"value":"2010-06-01","order":2,"name":"published","label":"Published","group":{"name":"publication_history","label":"Publication History"}}]}}