{"status":"ok","message-type":"work","message-version":"1.0.0","message":{"indexed":{"date-parts":[[2026,2,16]],"date-time":"2026-02-16T16:17:45Z","timestamp":1771258665695,"version":"3.50.1"},"reference-count":54,"publisher":"Cambridge University Press (CUP)","issue":"4","license":[{"start":{"date-parts":[[2022,1,20]],"date-time":"2022-01-20T00:00:00Z","timestamp":1642636800000},"content-version":"unspecified","delay-in-days":0,"URL":"https:\/\/creativecommons.org\/licenses\/by\/4.0\/"}],"content-domain":{"domain":["cambridge.org"],"crossmark-restriction":true},"short-container-title":["Nat. Lang. Eng."],"published-print":{"date-parts":[[2023,7]]},"abstract":"<jats:title>Abstract<\/jats:title><jats:p>Learning idiomatic expressions is seen as one of the most challenging stages in second-language learning because of their unpredictable meaning. A similar situation holds for their identification within natural language processing applications such as machine translation and parsing. The lack of high-quality usage samples exacerbates this challenge not only for humans but also for artificial intelligence systems. This article introduces a gamified crowdsourcing approach for collecting language learning materials for idiomatic expressions; a messaging bot is designed as an asynchronous multiplayer game for native speakers who compete with each other while providing idiomatic and nonidiomatic usage examples and rating other players\u2019 entries. As opposed to classical crowd-processing annotation efforts in the field, for the first time in the literature, a crowd-creating &amp; crowd-rating approach is implemented and tested for idiom corpora construction. The approach is language-independent and evaluated on two languages in comparison to traditional data preparation techniques in the field. The reaction of the crowd is monitored under different motivational means (namely, gamification affordances and monetary rewards). 
The results reveal that the proposed approach is powerful in collecting the targeted materials, and although being an explicit crowdsourcing approach, it is found entertaining and useful by the crowd. The approach has been shown to have the potential to speed up the construction of idiom corpora for different natural languages to be used as second-language learning material, training data for supervised idiom identification systems, or samples for lexicographic studies.<\/jats:p>","DOI":"10.1017\/s1351324921000401","type":"journal-article","created":{"date-parts":[[2022,1,20]],"date-time":"2022-01-20T12:14:16Z","timestamp":1642680856000},"page":"909-941","update-policy":"https:\/\/doi.org\/10.1017\/policypage","source":"Crossref","is-referenced-by-count":11,"title":["Gamified crowdsourcing for idiom corpora construction"],"prefix":"10.1017","volume":"29","author":[{"ORCID":"https:\/\/orcid.org\/0000-0003-4607-7305","authenticated-orcid":false,"given":"G\u00fcl\u015een","family":"Eryi\u011fit","sequence":"first","affiliation":[]},{"given":"Ali","family":"\u015eenta\u015f","sequence":"additional","affiliation":[]},{"given":"Johanna","family":"Monti","sequence":"additional","affiliation":[]}],"member":"56","published-online":{"date-parts":[[2022,1,20]]},"reference":[{"key":"S1351324921000401_ref3","unstructured":"Berk, G. , Erden, B. and G\u00fcng\u00f6r, T. (2018). Deep-BGT at PARSEME shared task 2018: Bidirectional LSTM-CRF model for verbal multiword expression identification. In Proceedings of the Joint Workshop on Linguistic Annotation, Multiword Expressions and Constructions (LAW-MWE-CxG-2018), Santa Fe, New Mexico, USA. Association for Computational Linguistics, pp. 
248\u2013253."},{"key":"S1351324921000401_ref25","doi-asserted-by":"publisher","DOI":"10.1016\/j.ijhcs.2020.102495"},{"key":"S1351324921000401_ref26","doi-asserted-by":"publisher","DOI":"10.1016\/j.cogpsych.2008.05.002"},{"key":"S1351324921000401_ref11","doi-asserted-by":"publisher","DOI":"10.1162\/COLI_a_00302"},{"key":"S1351324921000401_ref35","doi-asserted-by":"publisher","DOI":"10.1016\/j.infsof.2017.10.015"},{"key":"S1351324921000401_ref42","unstructured":"Rumshisky, A. , Botchan, N. , Kushkuley, S. and Pustejovsky, J. (2012). Word sense inventories by non-experts. In Proceedings of the Eighth International Conference on Language Resources and Evaluation (LREC\u201912), Istanbul, Turkey. European Language Resources Association (ELRA), pp. 4055\u20134059."},{"key":"S1351324921000401_ref16","unstructured":"Fort, K. , Guillaume, B. , Constant, M. , Lef\u00e8bvre, N. and Pilatte, Y.-A. (2018). \u201cfingers in the nose\u201d: Evaluating speakers\u2019 identification of multi-word expressions using a slightly gamified crowdsourcing platform. In Proceedings of the Joint Workshop on Linguistic Annotation, Multiword Expressions and Constructions (LAW-MWE-CxG-2018), Santa Fe, New Mexico, USA. Association for Computational Linguistics, pp. 207\u2013213."},{"key":"S1351324921000401_ref1","unstructured":"Akkaya, C. , Conrad, A. , Wiebe, J. and Mihalcea, R. (2010). Amazon Mechanical Turk for subjectivity word sense disambiguation. In Proceedings of the NAACL HLT 2010 Workshop on Creating Speech and Language Data with Amazon\u2019s Mechanical Turk, Los Angeles. Association for Computational Linguistics, pp. 195\u2013203."},{"key":"S1351324921000401_ref22","doi-asserted-by":"publisher","DOI":"10.1207\/s15516709cog0000_44"},{"key":"S1351324921000401_ref34","doi-asserted-by":"crossref","unstructured":"Morschheuser, B. , Hamari, J. and Maedche, A. (2019). Cooperation or competition \u2013 when do people contribute more? a field experiment on gamification of crowdsourcing. 
International Journal of Human-Computer Studies 127, 7\u201324. Strengthening gamification studies: critical challenges and new opportunities.","DOI":"10.1016\/j.ijhcs.2018.10.001"},{"key":"S1351324921000401_ref36","doi-asserted-by":"publisher","DOI":"10.1016\/j.tsc.2020.100645"},{"key":"S1351324921000401_ref41","unstructured":"Ramisch, C. , Cordeiro, S.R. , Savary, A. , Vincze, V. , Barbu Mititelu, V. , Bhatia, A. , Buljan, M. , Candito, M. , Gantar, P. , Giouli, V. , G\u00fcng\u00f6r, T. , Hawwari, A. , I\u00f1urrieta, U. , Kovalevskait\u0117, J. , Krek, S. , Lichte, T. , Liebeskind, C. , Monti, J. , Parra Escart\u00edn, C. , QasemiZadeh, B. , Ramisch, R. , Schneider, N. , Stoyanova, I. , Vaidya, A. and Walsh, A. (2018). Edition 1.1 of the PARSEME shared task on automatic identification of verbal multiword expressions. In Proceedings of the Joint Workshop on Linguistic Annotation, Multiword Expressions and Constructions (LAW-MWE-CxG-2018), Santa Fe, New Mexico, USA. Association for Computational Linguistics, pp. 222\u2013240."},{"key":"S1351324921000401_ref9","doi-asserted-by":"publisher","DOI":"10.3115\/1626481.1626511"},{"key":"S1351324921000401_ref43","author":"Savary","year":"2018"},{"key":"S1351324921000401_ref10","doi-asserted-by":"publisher","DOI":"10.1145\/1088622.1088644"},{"key":"S1351324921000401_ref20","doi-asserted-by":"publisher","DOI":"10.1007\/s10579-009-9104-1"},{"key":"S1351324921000401_ref18","doi-asserted-by":"publisher","DOI":"10.1016\/j.dss.2014.05.007"},{"key":"S1351324921000401_ref21","first-page":"1","article-title":"The rise of crowdsourcing","volume":"14","author":"Howe","year":"2006","journal-title":"Wired Magazine"},{"key":"S1351324921000401_ref6","author":"Bontcheva","year":"2017"},{"key":"S1351324921000401_ref44","author":"Saxena","year":"2020"},{"key":"S1351324921000401_ref47","doi-asserted-by":"crossref","unstructured":"Snow, R. , O\u2019Connor, B. , Jurafsky, D. and Ng, A. (2008). Cheap and fast \u2013 but is it good? 
evaluating non-expert annotations for natural language tasks. In Proceedings of the 2008 Conference on Empirical Methods in Natural Language Processing, Honolulu, Hawaii. Association for Computational Linguistics, pp. 254\u2013263.","DOI":"10.3115\/1613715.1613751"},{"key":"S1351324921000401_ref31","doi-asserted-by":"publisher","DOI":"10.1093\/ijl\/ecv023"},{"key":"S1351324921000401_ref27","unstructured":"Lawson, N. , Eustice, K. , Perkowitz, M. and Yetisgen-Yildiz, M. (2010). Annotating large email datasets for named entity recognition with Mechanical Turk. In Proceedings of the NAACL HLT 2010 Workshop on Creating Speech and Language Data with Amazon\u2019s Mechanical Turk, Los Angeles. Association for Computational Linguistics, pp. 71\u201379."},{"key":"S1351324921000401_ref19","doi-asserted-by":"crossref","unstructured":"Grace Araneta, M. , Eryigit, G. , K\u00f6nig, A. , Lee, J.-U. , Luis, A.R. , Lyding, V. , Nicolas, L. , Rodosthenous, C. and Sangati, F. (2020). Substituto - A synchronous educational language game for simultaneous teaching and crowdsourcing. In 9th Workshop on Natural Language Processing for Computer Assisted Language Learning (NLP4CALL 2020), Gothenburg, Sweden, pp. 1\u20139.","DOI":"10.3384\/ecp201759"},{"key":"S1351324921000401_ref7","unstructured":"Boros, T. and Burtica, R. (2018). GBD-NER at PARSEME shared task 2018: Multi-word expression detection using bidirectional long-short-term memory networks and graph-based decoding. In Proceedings of the Joint Workshop on Linguistic Annotation, Multiword Expressions and Constructions (LAW-MWE-CxG-2018), Santa Fe, New Mexico, USA. Association for Computational Linguistics, pp. 254\u2013260."},{"key":"S1351324921000401_ref29","unstructured":"Losnegaard, G.S. , Sangati, F. , Escart\u00edn, C.P. , Savary, A. , Bargmann, S. and Monti, J. (2016). PARSEME survey on MWE resources. In Proceedings of the Tenth International Conference on Language Resources and Evaluation (LREC\u201916), Portoro\u017d, Slovenia. 
European Language Resources Association (ELRA), pp. 2299\u20132306."},{"key":"S1351324921000401_ref28","doi-asserted-by":"publisher","DOI":"10.3115\/1118108.1118117"},{"key":"S1351324921000401_ref39","doi-asserted-by":"publisher","DOI":"10.18653\/v1\/2020.acl-demos.14"},{"key":"S1351324921000401_ref30","first-page":"37","article-title":"Crowdsourcing and its application","volume":"14","author":"Mitrovi\u0107","year":"2013","journal-title":"INFOtheca"},{"key":"S1351324921000401_ref48","doi-asserted-by":"publisher","DOI":"10.1016\/j.jml.2005.11.001"},{"key":"S1351324921000401_ref49","first-page":"1","article-title":"Teaching and learning idioms in L2: From theory to practice","volume":"39","author":"Vasiljevic","year":"2015","journal-title":"Mextesol Journal"},{"key":"S1351324921000401_ref50","unstructured":"Vincze, V. , Nagy, T. I. and Berend, G. (2011). Multiword expressions and named entities in the Wiki50 corpus. In Proceedings of the International Conference Recent Advances in Natural Language Processing 2011, Hissar, Bulgaria. Association for Computational Linguistics, pp. 289\u2013295."},{"key":"S1351324921000401_ref53","doi-asserted-by":"publisher","DOI":"10.1145\/1124772.1124784"},{"key":"S1351324921000401_ref8","unstructured":"Caruso, V. , Barbara, B. , Monti, J. and Roberta, P. (2019). How can app design improve lexicographic outcomes? examples from an Italian idiom dictionary. In ELEX 2019: SMART LEXICOGRAPHY. Lexical Computing CZ SRO, pp. 374\u2013396."},{"key":"S1351324921000401_ref14","doi-asserted-by":"publisher","DOI":"10.18653\/v1\/2020.figlang-1.29"},{"key":"S1351324921000401_ref54","unstructured":"Yirmibe\u015fo\u011flu, Z. and G\u00fcng\u00f6r, T. (2020). ERMI at PARSEME shared task 2020: Embedding-rich multiword expression identification. In Proceedings of the Joint Workshop on Multiword Expressions and Electronic Lexicons, online. Association for Computational Linguistics, pp. 
130\u2013135."},{"key":"S1351324921000401_ref46","doi-asserted-by":"publisher","DOI":"10.1177\/1362168817706842"},{"key":"S1351324921000401_ref52","doi-asserted-by":"publisher","DOI":"10.1145\/985692.985733"},{"key":"S1351324921000401_ref13","unstructured":"Dumitrache, A. , Aroyo, L. , Welty, C. , Sips, R.-J. and Levas, A. (2013). \u201cDr. Detective\u201d: Combining gamification techniques and crowdsourcing to create a gold standard in medical text. In Proceedings of the 1st International Conference on Crowdsourcing the Semantic Web - Volume 1030, CrowdSem\u201913, Aachen, DEU. CEUR-WS.org, pp. 16\u201331."},{"key":"S1351324921000401_ref5","doi-asserted-by":"publisher","DOI":"10.1162\/tacl_a_00051"},{"key":"S1351324921000401_ref15","doi-asserted-by":"publisher","DOI":"10.1162\/COLI_a_00057"},{"key":"S1351324921000401_ref45","unstructured":"Schneider, N. , Onuffer, S. , Kazour, N. , Danchik, E. , Mordowanec, M.T. , Conrad, H. and Smith, N.A. (2014). Comprehensive annotation of multiword expressions in a social web corpus. In Proceedings of the Ninth International Conference on Language Resources and Evaluation (LREC\u201914), Reykjavik, Iceland. European Language Resources Association (ELRA), pp. 455\u2013461."},{"key":"S1351324921000401_ref32","doi-asserted-by":"publisher","DOI":"10.1177\/1056492618790921"},{"key":"S1351324921000401_ref37","unstructured":"Palmero Aprosio, A. and Moretti, G. (2016). Italy goes to Stanford: a collection of CoreNLP modules for Italian. arXiv e-prints, arXiv:1609.06204."},{"key":"S1351324921000401_ref12","unstructured":"Cook, P. , Fazly, A. and Stevenson, S. (2008). The VNC-Tokens dataset. In Proceedings of the LREC Workshop Towards a Shared Task for Multiword Expressions (MWE 2008), pp. 
19\u201322."},{"key":"S1351324921000401_ref51","doi-asserted-by":"publisher","DOI":"10.1109\/MC.2006.196"},{"key":"S1351324921000401_ref40","author":"Quoc Viet Hung","year":"2013"},{"key":"S1351324921000401_ref38","doi-asserted-by":"publisher","DOI":"10.1016\/j.bushor.2014.09.005"},{"key":"S1351324921000401_ref17","unstructured":"Fort, K. , Guillaume, B. , Pilatte, Y.-A. , Constant, M. and Lef\u00e8bvre, N. (2020). Rigor mortis: Annotating MWEs with a gamified platform. In Proceedings of the 12th Language Resources and Evaluation Conference, Marseille, France. European Language Resources Association, pp. 4395\u20134401."},{"key":"S1351324921000401_ref33","doi-asserted-by":"publisher","DOI":"10.1016\/j.ijhcs.2017.04.005"},{"key":"S1351324921000401_ref23","unstructured":"Kato, A. , Shindo, H. and Matsumoto, Y. (2018). Construction of large-scale English verbal multiword expression annotated corpus. In Proceedings of the Eleventh International Conference on Language Resources and Evaluation (LREC 2018), Miyazaki, Japan. European Language Resources Association (ELRA)."},{"key":"S1351324921000401_ref4","unstructured":"Birke, J. and Sarkar, A. (2006). A clustering approach for nearly unsupervised recognition of nonliteral language. In 11th Conference of the European Chapter of the Association for Computational Linguistics, Trento, Italy. Association for Computational Linguistics, pp. 
329\u2013336."},{"key":"S1351324921000401_ref2","doi-asserted-by":"publisher","DOI":"10.1109\/IV.2009.100"},{"key":"S1351324921000401_ref24","doi-asserted-by":"publisher","DOI":"10.1007\/s10676-016-9401-5"}],"container-title":["Natural Language Engineering"],"original-title":[],"language":"en","link":[{"URL":"https:\/\/www.cambridge.org\/core\/services\/aop-cambridge-core\/content\/view\/S1351324921000401","content-type":"unspecified","content-version":"vor","intended-application":"similarity-checking"}],"deposited":{"date-parts":[[2023,7,19]],"date-time":"2023-07-19T08:59:27Z","timestamp":1689757167000},"score":1,"resource":{"primary":{"URL":"https:\/\/www.cambridge.org\/core\/product\/identifier\/S1351324921000401\/type\/journal_article"}},"subtitle":[],"short-title":[],"issued":{"date-parts":[[2022,1,20]]},"references-count":54,"journal-issue":{"issue":"4","published-print":{"date-parts":[[2023,7]]}},"alternative-id":["S1351324921000401"],"URL":"https:\/\/doi.org\/10.1017\/s1351324921000401","relation":{},"ISSN":["1351-3249","1469-8110"],"issn-type":[{"value":"1351-3249","type":"print"},{"value":"1469-8110","type":"electronic"}],"subject":[],"published":{"date-parts":[[2022,1,20]]},"assertion":[{"value":"\u00a9 The Author(s), 2022. Published by Cambridge University Press","name":"copyright","label":"Copyright","group":{"name":"copyright_and_licensing","label":"Copyright and Licensing"}},{"value":"This is an Open Access article, distributed under the terms of the Creative Commons Attribution licence (https:\/\/creativecommons.org\/licenses\/by\/4.0\/), which permits unrestricted re-use, distribution, and reproduction in any medium, provided the original work is properly cited.","name":"license","label":"License","group":{"name":"copyright_and_licensing","label":"Copyright and Licensing"}},{"value":"This content has been made available to all.","name":"free","label":"Free to read"}]}}