{"status":"ok","message-type":"work","message-version":"1.0.0","message":{"indexed":{"date-parts":[[2026,3,13]],"date-time":"2026-03-13T21:48:10Z","timestamp":1773438490318,"version":"3.50.1"},"reference-count":44,"publisher":"Springer Science and Business Media LLC","issue":"1","license":[{"start":{"date-parts":[[2011,9,4]],"date-time":"2011-09-04T00:00:00Z","timestamp":1315094400000},"content-version":"tdm","delay-in-days":0,"URL":"http:\/\/www.springer.com\/tdm"}],"content-domain":{"domain":[],"crossmark-restriction":false},"short-container-title":["Lang Resources & Evaluation"],"published-print":{"date-parts":[[2012,3]]},"DOI":"10.1007\/s10579-011-9161-0","type":"journal-article","created":{"date-parts":[[2011,9,6]],"date-time":"2011-09-06T16:04:42Z","timestamp":1315325082000},"page":"53-74","source":"Crossref","is-referenced-by-count":12,"title":["By all these lovely tokens... Merging conflicting tokenizations"],"prefix":"10.1007","volume":"46","author":[{"given":"Christian","family":"Chiarcos","sequence":"first","affiliation":[],"role":[{"role":"author","vocabulary":"crossref"}]},{"given":"Julia","family":"Ritz","sequence":"additional","affiliation":[],"role":[{"role":"author","vocabulary":"crossref"}]},{"given":"Manfred","family":"Stede","sequence":"additional","affiliation":[],"role":[{"role":"author","vocabulary":"crossref"}]}],"member":"297","published-online":{"date-parts":[[2011,9,4]]},"reference":[{"key":"9161_CR1","unstructured":"Brants, T. (2000). TnT\u2014A statistical part-of-speech tagger. In Proceedings of the sixth applied natural language processing (ANLP-2000), Seattle, WA, pp. 224\u2013231."},{"issue":"4","key":"9161_CR2","first-page":"543","volume":"21","author":"E Brill","year":"1995","unstructured":"Brill, E. (1995). Transformation-based error-driven learning and natural language processing: A case study in part-of-speech tagging. 
Computational Linguistics, 21(4), 543\u2013565.","journal-title":"Computational Linguistics"},{"key":"9161_CR3","unstructured":"Burnard, L. (2007). Reference guide for the British national corpus (XML Edition). http:\/\/www.natcorp.ox.ac.uk\/XMLedition\/URG\/bnctags.html (August 6, 2011)."},{"issue":"3","key":"9161_CR4","doi-asserted-by":"crossref","first-page":"353","DOI":"10.3758\/BF03195511","volume":"35","author":"J Carletta","year":"2003","unstructured":"Carletta, J., Evert, S., Heid, U., Kilgour, J., Robertson, J., & Voormann, H. (2003). The NITE XML toolkit: Flexible annotation for multi-modal language data. Behavior Research Methods, Instruments, and Computers, 35(3), 353\u2013363.","journal-title":"Behavior Research Methods, Instruments, and Computers"},{"key":"9161_CR5","doi-asserted-by":"crossref","unstructured":"Carlson, L., Marcu, D., & Okurowski, M. E. (2003), Building a discourse-tagged corpus in the framework of rhetorical structure theory. In J. van Kuppevelt & R. W. Smith (Eds.), Current and new directions in discourse and dialogue, text, speech, and language technology; 22 (pp. 85\u2013112). Dordrecht: Kluwer.","DOI":"10.1007\/978-94-010-0019-2_5"},{"key":"9161_CR6","unstructured":"Cheng, L., & Demirdache, H. (1990). Superiority violations. In L. Cheng & H. Demirdache (Eds.), Papers on Wh-movement, MIT working papers in linguistics; 13, MITWPL, pp. 27\u201346."},{"issue":"2","key":"9161_CR7","first-page":"217","volume":"49","author":"C Chiarcos","year":"2008","unstructured":"Chiarcos, C., Dipper, S., G\u00f6tze, M., Leser, U., L\u00fcdeling, A., Ritz, J., & Stede, M. (2008). A flexible framework for integrating annotations from different tools and tagsets. TAL (Traitement automatique des langues), 49(2), 217\u2013246.","journal-title":"TAL (Traitement automatique des langues)"},{"key":"9161_CR8","unstructured":"Christ, O. (1994). A modular and flexible architecture for an integrated corpus query system. 
In Proceedings of the 3rd conference on computational lexicography and text research (COMPLEX 94), Budapest, Hungary, pp. 23\u201332."},{"key":"9161_CR9","unstructured":"Cunningham, H., Maynard, D., Bontcheva, K., & Tablan, V. (2002). GATE: An architecture for development of robust HLT applications. In Proceedings of the 40th anniversary meeting of the association for computational linguistics (ACL-2002), Philadelphia, Pennsylvania, pp. 168\u2013175."},{"key":"9161_CR10","unstructured":"Dipper, S. (2005). XML-based stand-off representation and exploitation of multi-level linguistic annotation. In Proceedings of berliner XML tage 2005 (BXML 2005), Berlin, Germany, pp. 39\u201350."},{"key":"9161_CR11","unstructured":"Dipper, S., & G\u00f6tze, M. (2005) Accessing heterogeneous linguistic data \u2013 Generic XML-based representation and flexible visualization. In Proceedings of the 2nd language and technology conference (L&T\u201905), Poznan, Poland, pp. 23\u201330."},{"issue":"3\/4","key":"9161_CR12","doi-asserted-by":"crossref","first-page":"327","DOI":"10.1017\/S1351324904003523","volume":"10","author":"D Ferrucci","year":"2004","unstructured":"Ferrucci, D., & Lally, A. (2004). UIMA: An architectural approach to unstructured information processing in the corporate research environment. Natural Language Engineering, 10(3\/4), 327\u2013348.","journal-title":"Natural Language Engineering"},{"issue":"4","key":"9161_CR13","first-page":"569","volume":"23","author":"J Guo","year":"1997","unstructured":"Guo, J. (1997). Critical tokenization and its properties. Computational Linguistics, 23(4), 569\u2013596.","journal-title":"Computational Linguistics"},{"key":"9161_CR14","unstructured":"Henderson, J. C. (2000). A DTD for reference key annotation of EDT entities and RDC relations in the ACE evaluations (v. 5.2.0, 2000\/01\/05). http:\/\/projects.ldc.upenn.edu\/ace\/annotation\/apf.v5.2.0.dtd . 
Accessed 6 August 2011."},{"issue":"4","key":"9161_CR15","first-page":"547","volume":"26","author":"C Heycock","year":"1995","unstructured":"Heycock, C. (1995). Asymmetries in reconstruction. Linguistic Inquiry, 26(4), 547\u2013570.","journal-title":"Linguistic Inquiry"},{"key":"9161_CR16","doi-asserted-by":"crossref","unstructured":"Hovy, E., Marcus, M., Palmer, M., Ramshaw, L., & Weischedel, R. (2006) OntoNotes: The 90% solution. In Proceedings of the human language technology conference of the NAACL (HLT 2006), New York City, USA, pp. 57\u201360.","DOI":"10.3115\/1614049.1614064"},{"key":"9161_CR17","unstructured":"Ide, N. (2008). The American national corpus: Then, now and tomorrow. Keynote paper presented at the HCSNet workshop on designing the Australian national corpus, 4\u20135 December, UNSW, Sydney, Australia."},{"key":"9161_CR18","doi-asserted-by":"crossref","unstructured":"Ide, N., & Suderman, K. (2007). GrAF: A graph-based format for linguistic annotations. In Proceedings of the linguistic annotation workshop (LAW) 2007, Prague, Czech Republic, pp. 1\u20138.","DOI":"10.3115\/1642059.1642060"},{"key":"9161_CR19","doi-asserted-by":"crossref","unstructured":"Jiampojamarn, S., & Kondrak, G. (2009). Online discriminative training for grapheme-to-phoneme conversion. In Proceedings of the 10th annual conference of the international speech communication association (Interspeech 2009), Brighton, pp. 1303\u20131306.","DOI":"10.21437\/Interspeech.2009-407"},{"key":"9161_CR20","unstructured":"Junghanns, U., & Zybatow, G., (1995). Fokus im Russischen. In Proceedings of the G\u00f6ttingen focus workshop at the 17th annual conference of the German linguistic society (DGfS 1995), G\u00f6ttingen, Germany, pp. 113\u2013136."},{"key":"9161_CR21","unstructured":"Kaplan, R., & Newman, P. (1997). Lexical resource reconciliation in the xerox linguistic environment. 
In Proceedings of the ACL\u201997 workshop on computational environments for grammar development and linguistic engineering, Madrid, Spain, pp. 54\u201361."},{"key":"9161_CR22","unstructured":"Kingsbury, P., & Palmer, M. (2002). From TreeBank to PropBank. In Proceedings of the third international conference on language resources and evaluation (LREC 2002), Las Palmas, Spain, pp. 1989\u20131993."},{"key":"9161_CR23","doi-asserted-by":"crossref","unstructured":"Kohler, K. (1996). Labelled data bank of spoken standard German. The Kiel Corpus of read\/spontaneous speech. In Proceedings of the fourth international conference on spoken language processing (ICSLP\u201996), Philadelphia, pp. 1938\u20131941.","DOI":"10.1109\/ICSLP.1996.608014"},{"key":"9161_CR24","doi-asserted-by":"crossref","unstructured":"K\u00f6nig, E., & Lezius, W. (2000). A description language for syntactically annotated corpora. In Proceedings of the 18th international conference on computational linguistics (COLING 2000), Saarbr\u00fccken, Germany, pp. 1056\u20131060.","DOI":"10.3115\/992730.992804"},{"key":"9161_CR25","unstructured":"Lezius, W. (2002). TIGERSearch. Ein Suchwerkzeug f\u00fcr Baumbanken. In Proceedings of the 6th Konferenz zur Verarbeitung nat\u00fcrlicher Sprache (KONVENS 2002), Saarbr\u00fccken, Germany, pp. 107\u2013114."},{"key":"9161_CR26","volume-title":"Foundations of statistical natural language processing","author":"C Manning","year":"1999","unstructured":"Manning, C., & Sch\u00fctze, H. (1999). Foundations of statistical natural language processing. Cambridge: MIT Press."},{"key":"9161_CR27","first-page":"313","volume":"19","author":"MP Marcus","year":"1993","unstructured":"Marcus, M. P., Santorini, B., & Marcinkiewicz, M. A. (1993). Building a large annotated corpus of English: The PennTreeBank. 
Computational Linguistics, 19, 313\u2013330.","journal-title":"Computational Linguistics"},{"key":"9161_CR28","unstructured":"Meyers, A., Reeves, R., Macleod, C., Szekely, R., Zilinska, V., & Young, B. (2004). The NomBank project: An interim report. In HLT-NAACL workshop on frontiers in corpus Annotation, Boston, Massachusetts, pp. 24\u201331."},{"key":"9161_CR30","first-page":"297","volume":"203","author":"S M\u00fcller","year":"2005","unstructured":"M\u00fcller, S. (2005). Zur Analyse der scheinbar mehrfachen Vorfeldbesetzung. Linguistische Berichte, 203, 297\u2013330.","journal-title":"Linguistische Berichte"},{"key":"9161_CR29","unstructured":"M\u00fcller, C., & Strube, M. (2006). Multi-level annotation of linguistic data with MMAX2. In S. Braun, K. Kohn, & J. Mukherjee (Eds.), Corpus technology and language pedagogy: New resources, new tools, new methods (pp. 197\u2013214). Frankfurt, Germany: Peter Lang."},{"key":"9161_CR31","unstructured":"Poesio, M., & Artstein, R. (2008). Anaphoric annotation in the ARRAU corpus. In N. Calzolari, K. Choukri, B. Maegaard, J. Mariani, J. Odijk, S. Piperidis, & D. Tapias (Eds.), Proceedings of the sixth international language resources and evaluation (LREC 2008), Marrakech, Morocco."},{"key":"9161_CR32","unstructured":"Prasad, R., Dinesh, N., Lee, A., Miltsakaki, E., Robaldo, L., Joshi, A., & Webber, B. (2008). The Penn Discourse TreeBank 2.0. In Proceedings of the sixth international language resources and evaluation (LREC 2008), Marrakech, Morocco."},{"key":"9161_CR33","unstructured":"Pustejovsky, J., Hanks, P., Saur\u00ed, R., See, A., Gaizauskas, R., Setzer, A., Radev, D., Beth Sundheim, D. D., Ferro, L., & Lazo, M. (2003). The TIMEBANK corpus. In Corpus linguistics, pp. 647\u2013656."},{"key":"9161_CR34","unstructured":"Rehm, G., Schonefeld, O., Witt, A., Chiarcos, C., & Lehmberg, T. (2008). SPLICR: A sustainability platform for linguistic corpora and resources. In A. Storrer, A. Geyken, A. Siebert, & K. M. 
W\u00fcrzner (Eds.), Text resources and lexical knowledge (pp. 85\u201396). Berlin, Germany: Mouton de Gruyter."},{"key":"9161_CR35","unstructured":"Sampson, G. R. (1999). CHRISTINE corpus, stage I: Documentation. http:\/\/www.grsampson.net\/ChrisDoc.htm ."},{"key":"9161_CR36","unstructured":"Schmidt, T. (2004). Transcribing and annotating spoken language with EXMARaLDA. In Proceedings of the LREC 2004 workshop on XML based richly annotated corpora, Lisboa, Portugal."},{"key":"9161_CR37","unstructured":"Sekerina, I. (1997). The syntax and processing of scrambling constructions in Russian. PhD thesis, The City University of New York."},{"key":"9161_CR38","unstructured":"Stede, M., Bieler, H., Dipper, S., & Suriyawongkul, A. (2006). Summar: Combining linguistics and statistics for text summarization. In Proceedings of the 17th European conference on artificial intelligence (ECAI-06), Riva del Garda, Italy, pp. 827\u2013828."},{"key":"9161_CR39","doi-asserted-by":"crossref","unstructured":"Vilain, M., Burger, J., Aberdeen, J., Connolly, D., & Hirschman, L. (1995) A model-theoretic coreference scoring scheme. In MUC6: Proceedings of the 6th conference on message understanding, Morristown, NJ, USA, pp. 45\u201352.","DOI":"10.3115\/1072399.1072405"},{"issue":"2","key":"9161_CR40","doi-asserted-by":"crossref","first-page":"249","DOI":"10.1162\/0891201054223977","volume":"31","author":"F. Wolf","year":"2005","unstructured":"Wolf, F., & Gibson, E. (2005). Representing discourse coherence: A corpus-based study. Computational Linguistics, 31(2), 249\u2013287.","journal-title":"Computational Linguistics"},{"key":"9161_CR41","unstructured":"Wu, D. (1998). A position statement on Chinese segmentation. In Proceedings of the Chinese language processing workshop, University of Pennsylvania, Pennsylvania, Philadelphia."},{"key":"9161_CR42","doi-asserted-by":"crossref","unstructured":"Yamamoto, K., Kudo, T., Konagaya, A., & Matsumoto, Y. (2003). 
Protein name tagging for biomedical annotation in text. In Proceedings of the ACL 2003 workshop on natural language processing in biomedicine, Morristown, NJ, USA, pp. 65\u201372.","DOI":"10.3115\/1118958.1118967"},{"key":"9161_CR43","unstructured":"Zeldes, A., Ritz, J., L\u00fcdeling, A., & Chiarcos, C. (2009). ANNIS: A search tool for multi-layer annotated corpora. In Proceedings of corpus linguistics 2009, Liverpool, UK."},{"key":"9161_CR44","unstructured":"Zipser, F., & Romary, L. (2010). A model oriented approach to the mapping of annotation formats using standards. In Proceedings of the 7th international conference on language resources and evaluation (LREC 2010), Valletta, Malta."}],"container-title":["Language Resources and Evaluation"],"original-title":[],"language":"en","link":[{"URL":"http:\/\/link.springer.com\/content\/pdf\/10.1007\/s10579-011-9161-0.pdf","content-type":"application\/pdf","content-version":"vor","intended-application":"text-mining"},{"URL":"http:\/\/link.springer.com\/article\/10.1007\/s10579-011-9161-0\/fulltext.html","content-type":"text\/html","content-version":"vor","intended-application":"text-mining"},{"URL":"http:\/\/link.springer.com\/content\/pdf\/10.1007\/s10579-011-9161-0","content-type":"unspecified","content-version":"vor","intended-application":"similarity-checking"}],"deposited":{"date-parts":[[2021,12,3]],"date-time":"2021-12-03T14:37:01Z","timestamp":1638542221000},"score":1,"resource":{"primary":{"URL":"http:\/\/link.springer.com\/10.1007\/s10579-011-9161-0"}},"subtitle":[],"short-title":[],"issued":{"date-parts":[[2011,9,4]]},"references-count":44,"journal-issue":{"issue":"1","published-print":{"date-parts":[[2012,3]]}},"alternative-id":["9161"],"URL":"https:\/\/doi.org\/10.1007\/s10579-011-9161-0","relation":{},"ISSN":["1574-020X","1574-0218"],"issn-type":[{"value":"1574-020X","type":"print"},{"value":"1574-0218","type":"electronic"}],"subject":[],"published":{"date-parts":[[2011,9,4]]}}}