{"status":"ok","message-type":"work","message-version":"1.0.0","message":{"indexed":{"date-parts":[[2025,6,19]],"date-time":"2025-06-19T04:43:25Z","timestamp":1750308205515,"version":"3.41.0"},"publisher-location":"New York, NY, USA","reference-count":24,"publisher":"ACM","license":[{"start":{"date-parts":[[2004,10,28]],"date-time":"2004-10-28T00:00:00Z","timestamp":1098921600000},"content-version":"vor","delay-in-days":0,"URL":"https:\/\/www.acm.org\/publications\/policies\/copyright_policy#Background"}],"content-domain":{"domain":["dl.acm.org"],"crossmark-restriction":true},"short-container-title":[],"published-print":{"date-parts":[[2004,10,28]]},"DOI":"10.1145\/1030397.1030439","type":"proceedings-article","created":{"date-parts":[[2005,1,30]],"date-time":"2005-01-30T17:55:16Z","timestamp":1107107716000},"page":"220-228","update-policy":"https:\/\/doi.org\/10.1145\/crossmark-policy","source":"Crossref","is-referenced-by-count":5,"title":["Supervised learning for the legacy document conversion"],"prefix":"10.1145","author":[{"given":"Boris","family":"Chidlovskii","sequence":"first","affiliation":[{"name":"Xerox Research Centre Europe, Meylan, France"}]},{"given":"J\u00e9r\u00f4me","family":"Fuselier","sequence":"additional","affiliation":[{"name":"Xerox Research Centre Europe, Meylan, France"}]}],"member":"320","published-online":{"date-parts":[[2004,10,28]]},"reference":[{"key":"e_1_3_2_1_1_1","volume-title":"Compilers: Principles, Techniques, and Tools","author":"Ullman J.","year":"1986","unstructured":"J. Ullman , A. Aho , and R. Seti . Compilers: Principles, Techniques, and Tools . Addison-Wesley , 1986 . J. Ullman, A. Aho, and R. Seti. Compilers: Principles, Techniques, and Tools. Addison-Wesley, 1986."},{"key":"e_1_3_2_1_2_1","volume-title":"The Theory of Parsing, Translation, and Compiling","author":"Aho A.","year":"1972","unstructured":"A. Aho and J. Ullman . The Theory of Parsing, Translation, and Compiling . Prentice Hall , Englewood Cliffs, NJ , 1972 . A. Aho and J. Ullman. The Theory of Parsing, Translation, and Compiling. Prentice Hall, Englewood Cliffs, NJ, 1972."},{"key":"e_1_3_2_1_3_1","doi-asserted-by":"publisher","DOI":"10.1007\/PL00013569"},{"key":"e_1_3_2_1_4_1","doi-asserted-by":"publisher","DOI":"10.1145\/271074.271078"},{"key":"e_1_3_2_1_5_1","doi-asserted-by":"publisher","DOI":"10.5555\/234285.234289"},{"key":"e_1_3_2_1_6_1","first-page":"467","volume-title":"Computational Linguistics, 18(4)","author":"Brown P.","year":"1992","unstructured":"P. Brown , V. Della Pietra , P. deSouza , J. Lai , and R. Mercer . Class-based n-gram models of natural language . In Computational Linguistics, 18(4) , pages 467 -- 480 , 1992 . P. Brown, V. Della Pietra, P. deSouza, J. Lai, and R. Mercer. Class-based n-gram models of natural language. In Computational Linguistics, 18(4), pages 467--480, 1992."},{"key":"e_1_3_2_1_7_1","volume-title":"18th International Conference on Data Engineering (ICDE'02)","author":"Sundaresan Neel","year":"2002","unstructured":"Neel Sundaresan , Christina Yip Chung , and Michael Gertz . Reverse engineering for web data: From visual to semantic structures . In 18th International Conference on Data Engineering (ICDE'02) , San Jose, California , 2002 . Neel Sundaresan, Christina Yip Chung, and Michael Gertz. Reverse engineering for web data: From visual to semantic structures. In 18th International Conference on Data Engineering (ICDE'02), San Jose, California, 2002."},{"key":"e_1_3_2_1_8_1","series-title":"Lecture Notes in Computer Science","volume-title":"Machine learning for sequential data: A review","author":"Dietterich T. G.","year":"2002","unstructured":"T. G. Dietterich . Machine learning for sequential data: A review . In T. Caelli, editor, Lecture Notes in Computer Science . Springer-Verlag , 2002 . T. G. Dietterich. Machine learning for sequential data: A review. In T. Caelli, editor, Lecture Notes in Computer Science. Springer-Verlag, 2002."},{"key":"e_1_3_2_1_9_1","doi-asserted-by":"publisher","DOI":"10.1145\/585058.585078"},{"key":"e_1_3_2_1_10_1","first-page":"517","volume-title":"Proc. AAAI\/IAAI","author":"Freitag D.","year":"1998","unstructured":"D. Freitag . Information extraction from html: Application of a general machine learning approach . In Proc. AAAI\/IAAI , pages 517 -- 523 , 1998 . D. Freitag. Information extraction from html: Application of a general machine learning approach. In Proc. AAAI\/IAAI, pages 517--523, 1998."},{"key":"e_1_3_2_1_11_1","unstructured":"I4I - The WORD is XML. www.i4i.com\/life sciences.htm.  I4I - The WORD is XML. www.i4i.com\/life sciences.htm."},{"key":"e_1_3_2_1_12_1","first-page":"143","volume-title":"Proceedings of the 14th International Conference on Machine Learning ICML97","author":"Joachims T.","year":"1997","unstructured":"T. Joachims . A probabilistic analysis of the rocchio algorithm with TFIDF for text categorization . In Proceedings of the 14th International Conference on Machine Learning ICML97 , pages 143 -- 151 , 1997 . T. Joachims. A probabilistic analysis of the rocchio algorithm with TFIDF for text categorization. In Proceedings of the 14th International Conference on Machine Learning ICML97, pages 143--151, 1997."},{"key":"e_1_3_2_1_13_1","first-page":"591","volume-title":"Proc. 17th International Conf. on Machine Learning","author":"McCallum Andrew","year":"2000","unstructured":"Andrew McCallum , Dayne Freitag , and Fernando Pereira . Maximum entropy Markov models for information extraction and segmentation . In Proc. 17th International Conf. on Machine Learning , pages 591 -- 598 . Morgan Kaufmann, San Francisco, CA , 2000 . Andrew McCallum, Dayne Freitag, and Fernando Pereira. Maximum entropy Markov models for information extraction and segmentation. In Proc. 17th International Conf. on Machine Learning, pages 591--598. Morgan Kaufmann, San Francisco, CA, 2000."},{"key":"e_1_3_2_1_14_1","doi-asserted-by":"publisher","DOI":"10.1145\/335168.335171"},{"key":"e_1_3_2_1_15_1","doi-asserted-by":"publisher","DOI":"10.1145\/601858.601869"},{"key":"e_1_3_2_1_16_1","unstructured":"OmniPage Pro 14 Office. http:\/\/www.scansoft.com\/omnipage\/.  OmniPage Pro 14 Office. http:\/\/www.scansoft.com\/omnipage\/."},{"key":"e_1_3_2_1_17_1","doi-asserted-by":"publisher","DOI":"10.1145\/335168.335173"},{"key":"e_1_3_2_1_18_1","unstructured":"W2X Convertor. www.turnkey.com.au\/site\/xice\/xice\/convert.html.  W2X Convertor. www.turnkey.com.au\/site\/xice\/xice\/convert.html."},{"key":"e_1_3_2_1_19_1","volume-title":"International Workshop on Document Layout Interpretation and Its Applications (DLIAP'99)","author":"Wang Y.","year":"1999","unstructured":"Y. Wang , I. T. Phillips , and R. Haralick . From image to SGML\/XML representation: One method . In International Workshop on Document Layout Interpretation and Its Applications (DLIAP'99) , Bangalore, India , September 1999 . Y. Wang, I. T. Phillips, and R. Haralick. From image to SGML\/XML representation: One method. In International Workshop on Document Layout Interpretation and Its Applications (DLIAP'99), Bangalore, India, September 1999."},{"key":"e_1_3_2_1_20_1","doi-asserted-by":"publisher","DOI":"10.1007\/BFb0015253"},{"key":"e_1_3_2_1_21_1","unstructured":"Word and YAWC: A Poor Mans' XML Publishing Environment. www.idealliance.org\/papers\/xmle02\/dx_xmle02\/html\/abstract\/02-06-04.html.  Word and YAWC: A Poor Mans' XML Publishing Environment. www.idealliance.org\/papers\/xmle02\/dx_xmle02\/html\/abstract\/02-06-04.html."},{"key":"e_1_3_2_1_22_1","doi-asserted-by":"publisher","DOI":"10.1016\/S0304-3975(97)00014-5"},{"key":"e_1_3_2_1_23_1","doi-asserted-by":"publisher","DOI":"10.1145\/347090.347164"},{"key":"e_1_3_2_1_24_1","doi-asserted-by":"publisher","DOI":"10.1145\/956750.956787"}],"event":{"name":"DocEng04: ACM Symposium on Document Engineering","sponsor":["SIGWEB ACM Special Interest Group on Hypertext, Hypermedia, and Web","ACM Association for Computing Machinery"],"location":"Milwaukee Wisconsin USA","acronym":"DocEng04"},"container-title":["Proceedings of the 2004 ACM symposium on Document engineering"],"original-title":[],"link":[{"URL":"https:\/\/dl.acm.org\/doi\/10.1145\/1030397.1030439","content-type":"unspecified","content-version":"vor","intended-application":"text-mining"},{"URL":"https:\/\/dl.acm.org\/doi\/pdf\/10.1145\/1030397.1030439","content-type":"unspecified","content-version":"vor","intended-application":"similarity-checking"}],"deposited":{"date-parts":[[2025,6,18]],"date-time":"2025-06-18T16:30:55Z","timestamp":1750264255000},"score":1,"resource":{"primary":{"URL":"https:\/\/dl.acm.org\/doi\/10.1145\/1030397.1030439"}},"subtitle":[],"short-title":[],"issued":{"date-parts":[[2004,10,28]]},"references-count":24,"alternative-id":["10.1145\/1030397.1030439","10.1145\/1030397"],"URL":"https:\/\/doi.org\/10.1145\/1030397.1030439","relation":{},"subject":[],"published":{"date-parts":[[2004,10,28]]},"assertion":[{"value":"2004-10-28","order":2,"name":"published","label":"Published","group":{"name":"publication_history","label":"Publication History"}}]}}