{"status":"ok","message-type":"work","message-version":"1.0.0","message":{"indexed":{"date-parts":[[2026,3,13]],"date-time":"2026-03-13T23:13:25Z","timestamp":1773443605482,"version":"3.50.1"},"reference-count":89,"publisher":"Springer Science and Business Media LLC","issue":"2","license":[{"start":{"date-parts":[[2022,2,20]],"date-time":"2022-02-20T00:00:00Z","timestamp":1645315200000},"content-version":"tdm","delay-in-days":0,"URL":"https:\/\/creativecommons.org\/licenses\/by\/4.0"},{"start":{"date-parts":[[2022,2,20]],"date-time":"2022-02-20T00:00:00Z","timestamp":1645315200000},"content-version":"vor","delay-in-days":0,"URL":"https:\/\/creativecommons.org\/licenses\/by\/4.0"}],"content-domain":{"domain":["link.springer.com"],"crossmark-restriction":false},"short-container-title":["Lang Resources & Evaluation"],"published-print":{"date-parts":[[2023,6]]},"abstract":"<jats:title>Abstract<\/jats:title><jats:p>This article presents a discussion on the main linguistic phenomena which cause difficulties in the analysis of user-generated texts found on the web and in social media, and proposes a set of annotation guidelines for their treatment within the Universal Dependencies (UD) framework of syntactic analysis. Given on the one hand the increasing number of treebanks featuring user-generated content, and its somewhat inconsistent treatment in these resources on the other, the aim of this article is twofold: (1) to provide a condensed, though comprehensive, overview of such treebanks\u2014based on available literature\u2014along with their main features and a comparative analysis of their annotation criteria, and (2) to propose a set of tentative UD-based annotation guidelines, to promote consistent treatment of the particular phenomena found in these types of texts. 
The overarching goal of this article is to provide a common framework for researchers interested in developing similar resources in UD, thus promoting cross-linguistic consistency, which is a principle that has always been central to the spirit of UD.<\/jats:p>","DOI":"10.1007\/s10579-022-09581-9","type":"journal-article","created":{"date-parts":[[2022,2,20]],"date-time":"2022-02-20T05:02:18Z","timestamp":1645333338000},"page":"493-544","update-policy":"https:\/\/doi.org\/10.1007\/springer_crossmark_policy","source":"Crossref","is-referenced-by-count":12,"title":["Treebanking user-generated content: a UD based overview of guidelines, corpora and unified recommendations"],"prefix":"10.1007","volume":"57","author":[{"ORCID":"https:\/\/orcid.org\/0000-0002-0147-2208","authenticated-orcid":false,"given":"Manuela","family":"Sanguinetti","sequence":"first","affiliation":[]},{"given":"Cristina","family":"Bosco","sequence":"additional","affiliation":[]},{"given":"Lauren","family":"Cassidy","sequence":"additional","affiliation":[]},{"given":"\u00d6zlem","family":"\u00c7etino\u011flu","sequence":"additional","affiliation":[]},{"given":"Alessandra Teresa","family":"Cignarella","sequence":"additional","affiliation":[]},{"given":"Teresa","family":"Lynn","sequence":"additional","affiliation":[]},{"given":"Ines","family":"Rehbein","sequence":"additional","affiliation":[]},{"given":"Josef","family":"Ruppenhofer","sequence":"additional","affiliation":[]},{"given":"Djam\u00e9","family":"Seddah","sequence":"additional","affiliation":[]},{"given":"Amir","family":"Zeldes","sequence":"additional","affiliation":[]}],"member":"297","published-online":{"date-parts":[[2022,2,20]]},"reference":[{"key":"9581_CR1","doi-asserted-by":"crossref","unstructured":"Albogamy, F., & Ramsay, A. (2017). Universal dependencies for Arabic Tweets. In International  conference recent advances in natural language processing, (RANLP) (pp. 
46\u201351).","DOI":"10.26615\/978-954-452-049-6_007"},{"key":"9581_CR2","doi-asserted-by":"crossref","unstructured":"Aufrant, L., Wisniewski, G., & Yvon, F. (2017). LIMSI@CoNLL\u201917: UD shared task. In Proceedings of the CoNLL 2017 shared task: Multilingual parsing from raw text to universal dependencies (pp. 163\u2013173).","DOI":"10.18653\/v1\/K17-3017"},{"key":"9581_CR3","unstructured":"Azzi, A. A., Bouamor, H., & Ferradans, S. (2019). The FinSBD-2019 shared task: Sentence boundary detection in PDF noisy text in the financial domain. In Proceedings of the first workshop on financial technology and natural language processing (pp. 74\u201380), Macao, China. https:\/\/www.aclweb.org\/anthology\/W19-5512"},{"key":"9581_CR4","unstructured":"Balahur, A. (2013). Sentiment analysis in social media texts. In Proceedings of the 4th workshop on computational approaches to subjectivity, sentiment and social media analysis (pp. 120\u2013128), Atlanta, GA."},{"key":"9581_CR5","unstructured":"Behzad, S., & Zeldes, A. (2020). A cross-genre ensemble approach to robust reddit part of speech tagging. In Proceedings of the 12th web as corpus workshop (WAC-XII) (pp. 50\u201356), Marseille, France."},{"key":"9581_CR6","doi-asserted-by":"crossref","unstructured":"Bhat, I., Bhat, R.\u00a0A., Shrivastava, M., & Sharma, D. (2018). Universal dependency parsing for Hindi\u2013English code-switching. In Proceedings of the 2018 conference of the North American chapter of the Association for Computational Linguistics: Human language technologies, Vol. 1 (Long Papers) (pp. 987\u2013998).","DOI":"10.18653\/v1\/N18-1090"},{"key":"9581_CR7","doi-asserted-by":"crossref","unstructured":"Bird, S., & Loper, E. (2004). NLTK: The natural language toolkit. In Proceedings of the ACL interactive poster and demonstration sessions (pp. 214\u2013217), Barcelona, Spain. 
Association for Computational Linguistics.","DOI":"10.3115\/1219044.1219075"},{"key":"9581_CR8","doi-asserted-by":"publisher","unstructured":"Bj\u00f6rkelund, A., Falenska, A., Yu, X., & Kuhn, J. (2017). IMS at the CoNLL 2017 UD shared task: CRFs and perceptrons meet neural networks. In Proceedings of the CoNLL 2017 shared task: Multilingual parsing from raw text to universal dependencies (pp. 40\u201351), Vancouver, Canada. Association for Computational Linguistics. https:\/\/doi.org\/10.18653\/v1\/K17-3004","DOI":"10.18653\/v1\/K17-3004"},{"key":"9581_CR9","volume-title":"Le fran\u00e7ais parl\u00e9","author":"Claire Blanche-Benveniste","year":"1990","unstructured":"Blanche-Benveniste, C., Bilger, M., Rouget, C., Van Den Eynde, K., Mertens, P., & Willems, D. (1990). Le fran\u00e7ais parl\u00e9. \u00c9tudes grammaticales. CNRS Editions."},{"key":"9581_CR10","doi-asserted-by":"crossref","unstructured":"Blodgett, S.\u00a0L., Wei, J. T. Z., & O\u2019Connor, B. (2018). Twitter universal dependency parsing for African-American and mainstream American English. In Proceedings of the ACL 2018\u201456th annual Meeting of the Association for Computational Linguistics (Long Papers) (Vol. 1, pp. 1415\u20131425). ACL.","DOI":"10.18653\/v1\/P18-1131"},{"key":"9581_CR11","doi-asserted-by":"crossref","unstructured":"Bosco, C., Tamburini, F., Bolioli, A., & Mazzei, A. (2016). Overview of the EVALITA 2016 part of speech tagging on TWitter for ITAlian Task. In Proceedings of the fifth evaluation campaign of natural language processing and speech tools for Italian. Final Workshop (EVALITA 2016). CEUR.","DOI":"10.4000\/books.aaccademia.1956"},{"key":"9581_CR12","unstructured":"Candito, M., Guillaume, B., Perrier, G., & Seddah, D. (2017). Enhanced UD dependencies with neutralized diathesis alternation. In Proceedings of the fourth international conference on dependency linguistics (Depling 2017) (pp. 42\u201353), Pisa, Italy. Link\u00f6ping University Electronic Press. 
https:\/\/www.aclweb.org\/anthology\/W17-6507"},{"key":"9581_CR13","doi-asserted-by":"crossref","unstructured":"Caron, B., Courtin, M., Gerdes, K., & Kahane, S.. (2019). A surface-syntactic UD treebank for Naija. In Proceedings of the 18th international workshop on treebanks and linguistic theories (TLT'19) (pp. 13\u201324). ACL.","DOI":"10.18653\/v1\/W19-7803"},{"key":"9581_CR15","unstructured":"\u00c7etino\u011flu, \u00d6. (2016). A Turkish-German Code-Switching Corpus. In Proceedings of the tenth international conference on Language Resources and Evaluation (LREC\u201916) (pp. 4215\u20134220). ELRA."},{"key":"9581_CR14","doi-asserted-by":"crossref","unstructured":"\u00c7etino\u011flu, \u00d6., & \u00c7\u00f6ltekin, \u00c7. (2016). Part of speech annotation of a Turkish-German code-switching corpus. In Proceedings of the tenth Linguistic Annotation Workshop (LAW-X) (pp. 120\u2013130). ACL.","DOI":"10.18653\/v1\/W16-1714"},{"key":"9581_CR16","doi-asserted-by":"crossref","unstructured":"Cignarella, A.\u00a0T., Bosco, C., & Rosso, P. (2019). Presenting TWITTIRO-UD: An Italian Twitter Treebank in universal dependencies. In Proceedings of the fifth international conference on dependency linguistics (Depling, SyntaxFest 2019) (pp. 190\u2013197).","DOI":"10.18653\/v1\/W19-7723"},{"key":"9581_CR17","unstructured":"Croft, W., Nordquist, D., Looney, K., & Regan, M. (2017). Linguistic typology meets universal dependencies. In Proceedings of the 15th international workshop on Treebanks and Linguistic Theories (TLT) (pp. 63\u201375)."},{"key":"9581_CR18","unstructured":"Daiber, J., & Van Der Goot, R. (2016). The denoised Web Treebank: Evaluating dependency parsing under noisy input conditions. In Proceedings of the 10th international conference on Language Resources and Evaluation (LREC 2016) (pp. 649\u2013653)."},{"key":"9581_CR20","doi-asserted-by":"crossref","unstructured":"Dobrovoljc, K., Erjavec, T., & Krek, S. (2017). The Universal Dependencies treebank for Slovenian. 
In Proceedings of the 6th workshop on Balto-Slavic natural language processing. Association for Computational Linguistics.","DOI":"10.18653\/v1\/W17-1406"},{"key":"9581_CR19","unstructured":"Dobrovoljc, K., & Nivre, J. (2016). The universal dependencies treebank of spoken Slovenian. In Proceedings of the tenth international conference on Language Resources and Evaluation (LREC 2016) (pp. 1566\u20131573). ELRA."},{"key":"9581_CR21","doi-asserted-by":"crossref","unstructured":"Droganova, Kira, & Zeman, Daniel. (2019). Towards deep universal dependencies. In Proceedings of the fifth international conference on Dependency Linguistics (Depling, SyntaxFest 2019) (pp. 144\u2013152), Paris, France. ACL.","DOI":"10.18653\/v1\/W19-7717"},{"key":"9581_CR22","unstructured":"Eisenstein, J. (2013). What to do about bad language on the Internet. In Proceedings of the 2013 conference of the North American chapter of the Association for Computational Linguistics: Human Language Technologies (pp. 359\u2013369). ACL."},{"key":"9581_CR23","doi-asserted-by":"crossref","unstructured":"Fischer, K. (2006). Frames, constructions, and morphemic meanings: The functional polysemy of discourse particles. In K. Fischer (Ed.), Approaches to discourse particles (pp. 427\u2013447). Elsevier.","DOI":"10.1163\/9780080461588_023"},{"key":"9581_CR24","unstructured":"Foster, J. (2010). \u201ccba to check the spelling\u201d: Investigating parser performance on discussion forum posts. In Human language technologies: The 2010 annual conference of the North American Chapter of the Association for Computational Linguistics (pp. 381\u2013384). ACL."},{"key":"9581_CR25","unstructured":"Foster, J., \u00c7etino\u011flu, \u00d6., Wagner, J., Le Roux, J., Nivre, J., Hogan, D., & van Genabith, J. (2011). From News to comment: Resources and benchmarks for parsing the Language of Web 2.0. In Proceedings of 5th international joint conference on Natural Language Processing (pp. 
893\u2013901)."},{"key":"9581_CR26","doi-asserted-by":"crossref","unstructured":"Gerdes, K., Guillaume, B., Kahane, S., & Perrier, G.. (2018). SUD or surface-syntactic Universal Dependencies: An annotation scheme near-isomorphic to UD. In Proceedings of the second workshop on Universal Dependencies (UDW 2018) (pp. 66\u201374), Brussels, Belgium, November. ACL.","DOI":"10.18653\/v1\/W18-6008"},{"key":"9581_CR27","doi-asserted-by":"crossref","unstructured":"Gimpel, K., Schneider, N., O\u2019Connor, B., Das, D., Mills, D., Eisenstein, J., Heilman, M., Yogatama, D., Flanigan, J., & Smith, N.\u00a0A. (2011). Part-of-speech tagging for Twitter: Annotation, features, and experiments. In Proceedings of the 49th annual meeting of the Association for Computational Linguistics: Human Language Technologies (pp. 42\u201347). ACL.","DOI":"10.21236\/ADA547371"},{"key":"9581_CR28","doi-asserted-by":"crossref","unstructured":"Kaljahi, R., Foster, J., Roturier, J., Ribeyre, C., Lynn, T., & Le Roux, J. (2015). Foreebank: Syntactic analysis of customer support forums. In Conference proceedings\u2014EMNLP 2015: Conference on empirical methods in natural language processing (pp. 1341\u20131347).","DOI":"10.18653\/v1\/D15-1157"},{"key":"9581_CR30","doi-asserted-by":"crossref","unstructured":"Kirov, C., Cotterell, R., Sylak-Glassman, J., Walther, G., Vylomova, E., Xia, P., Faruqui, M., Mielke, S.\u00a0J., McCarthy, A., K\u00fcbler, S., Yarowsky, D., Eisner, J., & Hulden, M. (2018). UniMorph 2.0: Universal morphology. In Proceedings of the eleventh international conference on Language Resources and Evaluation (LREC 2018), Miyazaki, Japan. European Language Resources Association (ELRA). https:\/\/www.aclweb.org\/anthology\/L18-1293","DOI":"10.18653\/v1\/K18-3001"},{"key":"9581_CR29","unstructured":"Kirov, C., Sylak-Glassman, J., Que, R., & Yarowsky, D. (2016). Very-large scale parsing and normalization of Wiktionary morphological paradigms. 
In Proceedings of the tenth international conference on Language Resources and Evaluation (LREC). ELRA."},{"key":"9581_CR31","doi-asserted-by":"crossref","unstructured":"Kong, L., Schneider, N., Swayamdipta, S., Bhatia, A., Dyer, C., & Smith, N.\u00a0A. (2014). A dependency parser for Tweets. In The conference on Empirical Methods in Natural Language Processing (EMNLP\u201914) (pp. 1001\u20131012).","DOI":"10.3115\/v1\/D14-1108"},{"key":"9581_CR32","unstructured":"Lacheret, A., Kahane, S., Beliao, J., Dister, A., Gerdes, K., Goldman, J.-P., Obin, N., Pietrandrea, P., & Tchobanov, A. (2014). Rhapsodie: a prosodic-syntactic treebank for spoken French. In Proceedings of the ninth international conference on Language Resources and Evaluation (LREC\u201914) (pp. 295\u2013301), Reykjavik, Iceland. European Language Resources Association (ELRA). http:\/\/www.lrec-conf.org\/proceedings\/lrec2014\/pdf\/381_Paper.pdf"},{"key":"9581_CR34","unstructured":"Leung, H., Poiret, R., Wong, T., Chen, X., Gerdes, K., & Lee, J. (2016). Developing universal dependencies for Mandarin Chinese. In Proceedings of the 12th workshop on Asian Language Resources (ALR12) (pp. 20\u201329)."},{"key":"9581_CR35","doi-asserted-by":"crossref","unstructured":"Liu, Y., Zhu, Y., Che, W., Qin, B., Schneider, N., & Smith, N.\u00a0A. (2018). Parsing Tweets into universal dependencies. In Proceedings of the 2018 conference of the North American chapter of the Association for Computational Linguistics: Human Language Technologies, Vol. 1 (Long Papers) (pp. 965\u2013975). ACL.","DOI":"10.18653\/v1\/N18-1088"},{"key":"9581_CR36","unstructured":"Luotolahti, J., Kanerva, J., Laippala, V., Pyysalo, S., & Ginter, F. (2015). Towards universal web parsebanks. In Proceedings of the third international conference on Dependency Linguistics (Depling 2015) (pp. 211\u2013220). Uppsala University."},{"key":"9581_CR37","unstructured":"Lynn, T., & Scannell, K. (2019). Code-switching in Irish Tweets: A preliminary analysis. 
In Proceedings of the Celtic Language Technology workshop (pp. 32\u201340). European Association for Machine Translation."},{"key":"9581_CR38","doi-asserted-by":"crossref","unstructured":"Lynn, T., Scannell, K., & Maguire, E.. (2015). Minority language Twitter: Part-of-speech tagging and analysis of Irish Tweets. In Proceedings of the workshop on noisy user-generated text (pp. 1\u20138). ACL.","DOI":"10.18653\/v1\/W15-4301"},{"key":"9581_CR39","doi-asserted-by":"crossref","unstructured":"Manning, C.\u00a0D., Surdeanu, M., Bauer, J., Finkel, J., Bethard, S.\u00a0J., & McClosky, D. (2014). The Stanford CoreNLP natural language processing toolkit. In Proceedings of ACL 2014: System demonstrations (pp. 55\u201360), Baltimore, MD.","DOI":"10.3115\/v1\/P14-5010"},{"issue":"2","key":"9581_CR40","first-page":"313","volume":"19","author":"Mitchell Marcus","year":"1993","unstructured":"Marcus, M., Santorini, B., & Marcinkiewicz, M. A. (1993). Building a large annotated corpus of English: The Penn Treebank. Computational Linguistics, 19(2), 313\u2013330.","journal-title":"Computational Linguistics"},{"key":"9581_CR41","unstructured":"Mart\u00ednez Alonso, H., Seddah, D., & Sagot, B. (2016). From noisy questions to minecraft texts: Annotation challenges in extreme syntax scenarios. In Proceedings of the 2nd workshop on noisy user-generated text (pp. 127\u2013137)."},{"key":"9581_CR42","doi-asserted-by":"crossref","unstructured":"Mataoui, M., Hacine, T. E. B., Tellache, I., Bakhtouchi, A., & Zelmat, O. (2018). A new syntax-based aspect detection approach for sentiment analysis in Arabic reviews. In Proceedings of ICNLSP 2018 (pp. 1\u20136), Algiers.","DOI":"10.1109\/ICNLSP.2018.8374373"},{"key":"9581_CR44","unstructured":"McCarthy, A.\u00a0D., Kirov, C., Grella, M., Nidhi, A., Xia, P., Gorman, K., Vylomova, E., Mielke, S.\u00a0J., Nicolai, G., Silfverberg, M., Arkhangelskiy, T., NatalyKrizhanovsky, A. 
K., Klyachko, E., Sorokin, A., Mansfield, J., Ern\u0161treits, V., Pinter, Y., Jacobs, C.\u00a0L., Cotterell, R., Hulden, M., & Yarowsky, D. (2020). UniMorph 3.0: Universal morphology. In Proceedings of The 12th language resources and evaluation conference (pp. 3922\u20133931), Marseille, France. European Language Resources Association. ISBN 979-10-95546-34-4. https:\/\/www.aclweb.org\/anthology\/2020.lrec-1.483"},{"key":"9581_CR43","doi-asserted-by":"publisher","unstructured":"McCarthy, A.\u00a0D., Silfverberg, M., Cotterell, R., Hulden, M., & Yarowsky, D. (2018). Marrying universal dependencies and universal morphology. In Proceedings of the second workshop on Universal Dependencies (UDW 2018) (pp. 91\u2013101), Brussels, Belgium. Association for Computational Linguistics. https:\/\/doi.org\/10.18653\/v1\/W18-6011. https:\/\/www.aclweb.org\/anthology\/W18-6011","DOI":"10.18653\/v1\/W18-6011"},{"key":"9581_CR45","doi-asserted-by":"crossref","unstructured":"Nivre, J., de\u00a0Marneffe, M.-C., Ginter, F., Hajic, J., Manning, C.\u00a0D., Pyysalo, S., Schuster, S., Tyers, F.\u00a0M., & Zeman, D. (2020). Universal dependencies v2: An evergrowing multilingual treebank collection. CoRR. arXiv:2004.10643","DOI":"10.1162\/coli_a_00402"},{"key":"9581_CR90","unstructured":"\u00d8vrelid, L., & Hohle, P. (2016). Universal dependencies for Norwegian. In Proceedings of the tenth international conference on Language Resources and Evaluation (LREC 2016) (pp. 1579\u20131585). ELRA."},{"key":"9581_CR46","unstructured":"Owoputi, O., O\u2019Connor, B., Dyer, C., Gimpel, K., Schneider, N., & Smith, N. A. (2013). Improved part-of-speech tagging for online conversational text with word clusters. In Proceedings of NAACL, 2013 (pp. 380\u2013390)."},{"key":"9581_CR47","doi-asserted-by":"crossref","unstructured":"Pamay, T., Sulubacak, U., Toruno\u011flu-Selamet, D., & Eryi\u011fit, G.. (2015). The annotation process of the ITU Web Treebank. 
In Proceedings of the 9th linguistic annotation workshop (pp. 95\u2013101).","DOI":"10.3115\/v1\/W15-1610"},{"key":"9581_CR48","unstructured":"Peng, S., & Zeldes, A. (2018). All roads lead to UD: Converting Stanford and Penn parses to English Universal Dependencies with multilayer annotations. In Proceedings of the joint workshop on Linguistic Annotation, Multiword Expressions and Constructions (LAW-MWE-CxG-2018) (pp. 167\u2013177), Santa Fe, NM."},{"key":"9581_CR49","unstructured":"Petrov, S., & McDonald, R. (2012). Overview of the 2012 shared task on parsing the web. In Notes of the first workshop on syntactic analysis of non-canonical language (SANCL)."},{"key":"9581_CR50","doi-asserted-by":"crossref","unstructured":"Pietrandrea, P., Kahane, S., Lacheret-Dujour, A., & Sabio, F.. (2014). The notion of sentence and other discourse units in corpus annotation. In T. Raso & H. Mello (Eds.), Spoken corpora and linguistic studies (pp. 331\u2013364). John Benjamins.","DOI":"10.1075\/scl.61.12pie"},{"key":"9581_CR51","unstructured":"Popel, M., Zabokrtsk\u00fd, Z., & Vojtek, M. (2017). Udapi: Universal API for universal dependencies. In Universal dependencies workshop at NoDaLiDa, 2017 (pp. 96\u2013101)."},{"key":"9581_CR52","unstructured":"Proisl, T. (2018). Someweta: A part-of-speech tagger for German Social Media and web texts. In Proceedings of the 11th international conference on language resources and evaluation (LREC 2018) (pp. 665\u2013670). ELRA."},{"key":"9581_CR53","unstructured":"Read, J., Dridan, R., Oepen, S., & Solberg, L.\u00a0J.. (2012a). Sentence boundary detection: A long solved problem? In Proceedings of COLING 2012: Posters (pp. 985\u2013994), Mumbai, India. The COLING 2012 Organizing Committee. https:\/\/www.aclweb.org\/anthology\/C12-2096"},{"key":"9581_CR54","unstructured":"Read, J., Flickinger, D., Dridan, R., Oepen, S., & \u00d8vrelid, L. (2012b). The wesearch corpus, treebank, and treecache. a comprehensive sample of user-generated content. 
In Proceedings of the 8th international conference on language resources and evaluation."},{"key":"9581_CR55","doi-asserted-by":"crossref","unstructured":"Rehbein, I. (2015). Filled pauses in user-generated content are words with extra-propositional meaning. In Proceedings of the second workshop on extra-propositional aspects of meaning in computational semantics (ExProM 2015) (pp. 12\u201321). ACL.","DOI":"10.3115\/v1\/W15-1302"},{"key":"9581_CR57","doi-asserted-by":"crossref","unstructured":"Rehbein, I., Ruppenhofer, J., & Do, B.-N. (2019). tweeDe\u2014A universal dependencies Treebank for German tweets. In Proceedings of the 17th workshop on Treebanks and Linguistic Theories (TLT 2019).","DOI":"10.18653\/v1\/W19-7811"},{"key":"9581_CR56","unstructured":"Rehbein, I., Ruppenhofer, J., & Zimmermann, V. (2018). A harmonised testsuite for pos tagging of german social media data. In Proceedings of the 27th international conference on computational linguistics, KONVENS 2018 (pp. 18\u201328), Wien, \u00d6sterreich."},{"key":"9581_CR58","doi-asserted-by":"crossref","unstructured":"Reznicek, M., L\u00fcdeling, A., & Hirschmann, H. (2013). Competing target hypotheses in the Falko corpus: A flexible multi-layer corpus architecture. In A. D\u00edaz-Negrillo, N. Ballier, & P. Thompson (Eds.), Automatic treatment and analysis of learner corpus data (pp. 101\u2013124). John Benjamins.","DOI":"10.1075\/scl.59.07rez"},{"key":"9581_CR59","doi-asserted-by":"publisher","unstructured":"Sanchez, G. (2019). Sentence boundary detection in legal text. In Proceedings of the natural legal language processing workshop 2019 (pp. 31\u201338), Minneapolis, Minnesota. Association for Computational Linguistics. https:\/\/doi.org\/10.18653\/v1\/W19-2204. https:\/\/www.aclweb.org\/anthology\/W19-2204.","DOI":"10.18653\/v1\/W19-2204"},{"key":"9581_CR61","unstructured":"Sanguinetti, M., Bosco, C., Cassidy, L., \u00c7etino\u011flu, \u00d6., Cignarella, A. 
T., Lynn, T., Rehbein, I., Ruppenhofer, J., Seddah, D., & Zeldes, A. (2020). Treebanking user-generated content: A proposal for a unified representation in Universal Dependencies. In Proceedings of the 12th language resources and evaluation conference (pp. 5240\u20135250), Marseille, France. European Language Resources Association."},{"key":"9581_CR60","unstructured":"Sanguinetti, M., Bosco, C., Lavelli, A., Mazzei, A., Antonelli, O., & Tamburini, F. (2018). PoSTWITA-UD: An Italian Twitter Treebank in universal dependencies. In LREC 2018\u201411th international conference on language resources and evaluation (pp. 1768\u20131775)."},{"key":"9581_CR64","unstructured":"Schuster, S., Lamm, M., & Manning, C. D. (2017). Gapping constructions in universal dependencies v2. In Proceedings of the NoDaLiDa 2017 workshop on Universal Dependencies (UDW 2017) (pp. 123\u2013132). ACL."},{"key":"9581_CR62","unstructured":"Schuster, S., & Manning, C.\u00a0D. (2016a). Enhanced English Universal Dependencies: An improved representation for natural language understanding tasks. In Proceedings of the 10th language resources and evaluation conference (LREC 2016) (pp. 2371\u20132378). ELRA."},{"key":"9581_CR63","unstructured":"Schuster, S., & Manning, C.\u00a0D. (2016b). Enhanced English universal dependencies: An improved representation for natural language understanding tasks. In Proceedings of the tenth international conference on language resources and evaluation (LREC\u201916) (pp. 2371\u20132378), Portoro\u017e, Slovenia. European Language Resources Association (ELRA). https:\/\/www.aclweb.org\/anthology\/L16-1376"},{"key":"9581_CR66","doi-asserted-by":"publisher","unstructured":"Seddah, D., Essaidi, F., Fethi, A., Futeral, M., Muller, B., Su\u00e1rez, P.\u00a0J. O., Sagot, B., & Srivastava, A. (2020). Building a user-generated content North-African Arabizi treebank: Tackling hell. In Proceedings of the 58th annual meeting of the Association for Computational Linguistics (pp. 
1139\u20131150). Association for Computational Linguistics. https:\/\/doi.org\/10.18653\/v1\/2020.acl-main.107. https:\/\/www.aclweb.org\/anthology\/2020.acl-main.107","DOI":"10.18653\/v1\/2020.acl-main.107"},{"key":"9581_CR65","unstructured":"Seddah, D., Sagot, B., Candito, M., Mouilleron, V., & Combet, V. (2012). The French Social Media Bank: A treebank of noisy user generated content. In 24th International conference on computational linguistics\u2014Proceedings of COLING 2012: Technical papers (pp. 2441\u20132458). ACL."},{"issue":"1","key":"9581_CR67","doi-asserted-by":"publisher","first-page":"46","DOI":"10.1016\/j.ipm.2015.03.002","volume":"52","author":"Aliaksei Severyn","year":"2016","unstructured":"Severyn, A., Moschitti, A., Uryupina, O., Plank, B., & Filippova, K. (2016). Multi-lingual opinion mining on YouTube. Information Processing & Management, 52(1), 46\u201360.","journal-title":"Information Processing & Management"},{"key":"9581_CR68","unstructured":"Silveira, N., Dozat, T., De Marneffe, M.\u00a0C., Bowman, S.\u00a0R., Connor, M., Bauer, J., & Manning, C.\u00a0D. (2014). A gold standard dependency corpus for English. In Proceedings of the 9th international conference on language resources and evaluation, LREC 2014 (pp. 2897\u20132904). ELRA."},{"key":"9581_CR70","doi-asserted-by":"crossref","unstructured":"Solorio, T., Blair, E., Maharjan, S., Bethard, S., Diab, M., Ghoneim, M., Hawwari, A., Alghamdi, F., Hirschberg, J., Chang, A., & Fung, P. (2014). Overview for the first shared task on language identification in code-switched data. In Proceedings of the CodeSwitch workshop.","DOI":"10.3115\/v1\/W14-3907"},{"key":"9581_CR69","doi-asserted-by":"crossref","unstructured":"Solorio, T., & Liu, Y. (2008). Part-of-speech tagging for English-Spanish code-switched text. In Proceedings of the conference on empirical methods in natural language processing (EMNLP \u201908) (pp. 1051\u20131060). 
ACL.","DOI":"10.3115\/1613715.1613852"},{"key":"9581_CR71","doi-asserted-by":"publisher","unstructured":"Stevenson, M., & Gaizauskas, R. (2000). Experiments on sentence boundary detection. In Proceedings of the sixth conference on applied natural language processing, ANLC \u201900 (pp. 84\u201389), USA. Association for Computational Linguistics. https:\/\/doi.org\/10.3115\/974147.974159","DOI":"10.3115\/974147.974159"},{"key":"9581_CR72","doi-asserted-by":"crossref","unstructured":"Taul\u00e9, M., Mart\u00ed, M.\u00a0A., Bies, A., Nofre, M., Gar\u00ed, A., Song, Z., Strassel, S., & Ellis, J. (2015). Spanish treebank annotation of informal non-standard web text. In F. Daniel & O. Diaz (Eds.), Current trends in web engineering (pp. 15\u201327). Springer.","DOI":"10.1007\/978-3-319-24800-4_2"},{"key":"9581_CR73","unstructured":"Tyers, F.\u00a0M., & Mishchenkova, K. (2020). Dependency annotation of noun incorporation in polysynthetic languages. In Proceedings of the fourth workshop on Universal Dependencies (UDW 2020) (pp. 195\u2013204)."},{"key":"9581_CR74","unstructured":"universaldependencies.org. (2019a). Typos and other errors in underlying text: Wrongly split word. https:\/\/universaldependencies.org\/u\/overview\/typos.html#wrongly-split-word. Accessed: 2020-07-06."},{"key":"9581_CR75","unstructured":"universaldependencies.org. (2019b). Enhanced dependencies. Retrieved August 3, 2020, from https:\/\/universaldependencies.org\/u\/overview\/enhanced-syntax.html"},{"key":"9581_CR76","unstructured":"universaldependencies.org. (2019c). Annotation of foreign strings in the Universal Dependencies guidelines. Retrieved November 28, 2019, from https:\/\/universaldependencies.org\/cs\/dep\/flat-foreign.html"},{"key":"9581_CR77","unstructured":"universaldependencies.org. (2019d). Pos-tagging of foreign tokens in the Universal Dependencies guidelines. 
Retrieved November 28, 2019, from  https:\/\/universaldependencies.org\/u\/pos\/X.html"},{"key":"9581_CR78","unstructured":"universaldependencies.org. (2019e). Morphology: General principles. Retrieved July 15, 2019, from  https:\/\/universaldependencies.org\/u\/overview\/morphology.html"},{"key":"9581_CR79","unstructured":"universaldependencies.org. (2019f). Annotation of speech repair in the Universal Dependencies guidelines. Retrieved November 28, 2019, from  https:\/\/universaldependencies.org\/u\/dep\/reparandum.html"},{"key":"9581_CR80","unstructured":"universaldependencies.org. (2019g). Tokenization and Word Segmentation guidelines. Retrieved December 2, 2019, from  https:\/\/universaldependencies.org\/u\/overview\/tokenization.html"},{"key":"9581_CR81","unstructured":"universaldependencies.org. (2021). Annotation of style or sublanguage to which a word form belongs. Retrieved December 15, 2021, from  https:\/\/universaldependencies.org\/u\/feat\/Style.html"},{"key":"9581_CR82","doi-asserted-by":"crossref","unstructured":"Van Der\u00a0Goot, R., & van Noord, G. (2018). Modeling input uncertainty in neural network dependency parsing. In Proceedings of the 2018 conference on empirical methods in natural language processing (pp. 4984\u20134991).","DOI":"10.18653\/v1\/D18-1542"},{"key":"9581_CR83","doi-asserted-by":"publisher","unstructured":"Verdonik, D., Kosem, I., Vitez, A. Z., Krek, S., & Stabej, M. (2013). Compilation, transcription and usage of a reference speech corpus: The case of the slovene corpus gos. Language Resources and Evaluation, 47, 12. https:\/\/doi.org\/10.1007\/s10579-013-9216-5","DOI":"10.1007\/s10579-013-9216-5"},{"key":"9581_CR84","doi-asserted-by":"publisher","first-page":"45","DOI":"10.1016\/j.knosys.2016.11.014","volume":"118","author":"David Vilares","year":"2017","unstructured":"Vilares, D., G\u00f3mez-Rodr\u00edguez, C., & Alonso, M. A. (2017). Universal, unsupervised (rule-based), uncovered sentiment analysis. 
Knowledge-Based Systems, 118, 45\u201355.","journal-title":"Knowledge-Based Systems"},{"key":"9581_CR85","doi-asserted-by":"crossref","unstructured":"Wang, H., Zhang, Y., Chan, G. Y. L., Yang, J., & Chieu, H.\u00a0L. (2017). Universal dependencies parsing for colloquial Singaporean English. In ACL 2017\u201455th annual meeting of the Association for Computational Linguistics, Proceedings of the Conference (Long Papers) (Vol. 1, pp. 1732\u20131744).","DOI":"10.18653\/v1\/P17-1159"},{"key":"9581_CR86","doi-asserted-by":"crossref","unstructured":"Wang, W.\u00a0Y., Kong, L., Mazaitis, K., & Cohen, W.\u00a0W. (2014). Dependency parsing for Weibo: An efficient probabilistic logic programming approach. In EMNLP 2014\u20142014 conference on empirical methods in Natural Language Processing, Proceedings of the Conference (pp. 1152\u20131158).","DOI":"10.3115\/v1\/D14-1122"},{"key":"9581_CR87","unstructured":"Westpfahl, S., & Gorisch, J. (2018). A syntax-based scheme for the annotation and segmentation of German spoken language interactions. In Proceedings of the joint workshop on linguistic annotation, multiword expressions and constructions (LAW-MWE-CxG-2018) (pp. 109\u2013120), Santa Fe, New Mexico, USA. Association for Computational Linguistics. https:\/\/www.aclweb.org\/anthology\/W18-4913"},{"key":"9581_CR88","unstructured":"Wong, T., Gerdes, K., Leung, H., & Lee, J. (2017). Quantitative comparative syntax on the Cantonese\u2013Mandarin parallel dependency treebank. In Proceedings of the fourth international conference on Dependency Linguistics (Depling 2017) (pp. 266\u2013275), Pisa, Italy. Link\u00f6ping University Electronic Press. https:\/\/www.aclweb.org\/anthology\/W17-6530"},{"issue":"3","key":"9581_CR89","doi-asserted-by":"publisher","first-page":"581","DOI":"10.1007\/s10579-016-9343-x","volume":"51","author":"Amir Zeldes","year":"2017","unstructured":"Zeldes, A. (2017). The GUM corpus: Creating multilayer resources in the classroom. 
Language Resources and Evaluation, 51(3), 581\u2013612.","journal-title":"Language Resources and Evaluation"}],"container-title":["Language Resources and Evaluation"],"original-title":[],"language":"en","link":[{"URL":"https:\/\/link.springer.com\/content\/pdf\/10.1007\/s10579-022-09581-9.pdf","content-type":"application\/pdf","content-version":"vor","intended-application":"text-mining"},{"URL":"https:\/\/link.springer.com\/article\/10.1007\/s10579-022-09581-9\/fulltext.html","content-type":"text\/html","content-version":"vor","intended-application":"text-mining"},{"URL":"https:\/\/link.springer.com\/content\/pdf\/10.1007\/s10579-022-09581-9.pdf","content-type":"application\/pdf","content-version":"vor","intended-application":"similarity-checking"}],"deposited":{"date-parts":[[2023,5,20]],"date-time":"2023-05-20T15:06:34Z","timestamp":1684595194000},"score":1,"resource":{"primary":{"URL":"https:\/\/link.springer.com\/10.1007\/s10579-022-09581-9"}},"subtitle":[],"short-title":[],"issued":{"date-parts":[[2022,2,20]]},"references-count":89,"journal-issue":{"issue":"2","published-print":{"date-parts":[[2023,6]]}},"alternative-id":["9581"],"URL":"https:\/\/doi.org\/10.1007\/s10579-022-09581-9","relation":{},"ISSN":["1574-020X","1574-0218"],"issn-type":[{"value":"1574-020X","type":"print"},{"value":"1574-0218","type":"electronic"}],"subject":[],"published":{"date-parts":[[2022,2,20]]},"assertion":[{"value":"13 January 2022","order":1,"name":"accepted","label":"Accepted","group":{"name":"ArticleHistory","label":"Article History"}},{"value":"20 February 2022","order":2,"name":"first_online","label":"First Online","group":{"name":"ArticleHistory","label":"Article History"}}]}}