{"status":"ok","message-type":"work","message-version":"1.0.0","message":{"indexed":{"date-parts":[[2025,12,12]],"date-time":"2025-12-12T13:07:11Z","timestamp":1765544831525,"version":"3.37.3"},"reference-count":26,"publisher":"Springer Science and Business Media LLC","issue":"2","license":[{"start":{"date-parts":[[2022,1,5]],"date-time":"2022-01-05T00:00:00Z","timestamp":1641340800000},"content-version":"tdm","delay-in-days":0,"URL":"https:\/\/creativecommons.org\/licenses\/by\/4.0"},{"start":{"date-parts":[[2022,1,5]],"date-time":"2022-01-05T00:00:00Z","timestamp":1641340800000},"content-version":"vor","delay-in-days":0,"URL":"https:\/\/creativecommons.org\/licenses\/by\/4.0"}],"funder":[{"DOI":"10.13039\/501100011033","name":"agencia estatal de investigaci\u00f3n","doi-asserted-by":"publisher","award":["PID2019-107652RB-I00\/AEI\/10.13039\/501100011033"],"award-info":[{"award-number":["PID2019-107652RB-I00\/AEI\/10.13039\/501100011033"]}],"id":[{"id":"10.13039\/501100011033","id-type":"DOI","asserted-by":"publisher"}]},{"DOI":"10.13039\/100008049","name":"fundaci\u00f3n banco santander","doi-asserted-by":"publisher","id":[{"id":"10.13039\/100008049","id-type":"DOI","asserted-by":"publisher"}]}],"content-domain":{"domain":["link.springer.com"],"crossmark-restriction":false},"short-container-title":["Complex Intell. Syst."],"published-print":{"date-parts":[[2022,4]]},"abstract":"<jats:title>Abstract<\/jats:title><jats:p>Satirical content on social media is hard to distinguish from real news, misinformation, hoaxes or propaganda when there are no clues as to which medium these news were originally written in. It is important, therefore, to provide Information Retrieval systems with mechanisms to identify which results are legitimate and which ones are misleading. Our contribution for satire identification is twofold. On the one hand, we release the Spanish SatiCorpus 2021, a balanced dataset that contains satirical and non-satirical documents. 
On the other hand, we conduct an extensive evaluation of this dataset with linguistic features and embedding-based features. All feature sets are evaluated separately and combined using different strategies. Our best result is achieved with a combination of the linguistic features and BERT with an accuracy of 97.405%. Besides, we compare our proposal with existing datasets in Spanish regarding satire and irony.<\/jats:p>","DOI":"10.1007\/s40747-021-00625-1","type":"journal-article","created":{"date-parts":[[2022,1,5]],"date-time":"2022-01-05T09:02:58Z","timestamp":1641373378000},"page":"1723-1736","update-policy":"https:\/\/doi.org\/10.1007\/springer_crossmark_policy","source":"Crossref","is-referenced-by-count":6,"title":["Compilation and evaluation of the Spanish SatiCorpus 2021 for satire identification using linguistic features and transformers"],"prefix":"10.1007","volume":"8","author":[{"ORCID":"https:\/\/orcid.org\/0000-0002-3651-2660","authenticated-orcid":false,"given":"Jos\u00e9 Antonio","family":"Garc\u00eda-D\u00edaz","sequence":"first","affiliation":[]},{"ORCID":"https:\/\/orcid.org\/0000-0003-2457-1791","authenticated-orcid":false,"given":"Rafael","family":"Valencia-Garc\u00eda","sequence":"additional","affiliation":[]}],"member":"297","published-online":{"date-parts":[[2022,1,5]]},"reference":[{"key":"625_CR1","doi-asserted-by":"publisher","first-page":"464","DOI":"10.1007\/978-3-030-66046-8_38","volume-title":"International conference on computational data and social networks","author":"A Al Imran","year":"2020","unstructured":"Al Imran A, Wahid Z, Ahmed T (2020) Bnnet: a deep neural network for the identification of satire and fake Bangla news. International conference on computational data and social networks. 
Springer, Berlin, pp 464\u2013475"},{"issue":"11","key":"625_CR2","doi-asserted-by":"publisher","first-page":"2075","DOI":"10.3390\/math8112075","volume":"8","author":"\u00d3 Apolinario-Arzube","year":"2020","unstructured":"Apolinario-Arzube \u00d3, Garc\u00eda-D\u00edaz JA, Medina-Moreira J, Luna-Aveiga H, Valencia-Garc\u00eda R (2020) Comparing deep-learning architectures and traditional machine-learning approaches for satire identification in Spanish tweets. Mathematics 8(11):2075","journal-title":"Mathematics"},{"key":"625_CR3","first-page":"135","volume":"55","author":"F Barbieri","year":"2015","unstructured":"Barbieri F, Ronzano F, Saggion H (2015) Is this tweet satirical? A computational approach for satire detection in Spanish. Procesam del Leng Nat 55:135\u2013142","journal-title":"Procesam del Leng Nat"},{"key":"625_CR4","unstructured":"Ca\u00f1ete J, Chaperon G, Fuentes R, Ho JH, Kang H, P\u00e9rez J (2020) Spanish pre-trained BERT model and evaluation data. In: PML4DC at ICLR 2020"},{"key":"625_CR5","unstructured":"Devlin J, Chang MW, Lee K, Toutanova K (2019) BERT: pre-training of deep bidirectional transformers for language understanding. In: Proceedings of the 2019 conference of the North American chapter of the association for computational linguistics: human language technologies, pp 4171\u20134186"},{"key":"625_CR6","first-page":"139","volume":"65","author":"JA Garc\u00eda-D\u00edaz","year":"2020","unstructured":"Garc\u00eda-D\u00edaz JA, Almela \u00c1, Alcaraz-M\u00e1rmol G, Valencia-Garc\u00eda R (2020) UMUCorpusClassifier: compilation and evaluation of linguistic corpus for natural language processing tasks. 
Procesa del Leng Nat 65:139\u2013142","journal-title":"Procesa del Leng Nat"},{"key":"625_CR7","doi-asserted-by":"publisher","first-page":"506","DOI":"10.1016\/j.future.2020.08.032","volume":"114","author":"JA Garc\u00eda-D\u00edaz","year":"2021","unstructured":"Garc\u00eda-D\u00edaz JA, C\u00e1novas-Garc\u00eda M, Palacios RC, Valencia-Garc\u00eda R (2021) Detecting misogyny in Spanish tweets. An approach based on linguistics features and word embeddings. Future Gener Comput Syst 114:506\u2013518. https:\/\/doi.org\/10.1016\/j.future.2020.08.032","journal-title":"Future Gener Comput Syst"},{"key":"625_CR8","doi-asserted-by":"publisher","first-page":"641","DOI":"10.1016\/j.future.2020.06.019","volume":"112","author":"JA Garc\u00eda-D\u00edaz","year":"2020","unstructured":"Garc\u00eda-D\u00edaz JA, C\u00e1novas-Garc\u00eda M, Valencia-Garc\u00eda R (2020) Ontology-driven aspect-based sentiment analysis classification: an infodemiological case study regarding infectious diseases in Latin America. Future Gener Comput Syst 112:641\u2013657. https:\/\/doi.org\/10.1016\/j.future.2020.06.019","journal-title":"Future Gener Comput Syst"},{"key":"625_CR9","first-page":"141","volume":"12036","author":"B Ghanem","year":"2020","unstructured":"Ghanem B, Karoui J, Benamara F, Rosso P, Moriceau V (2020) Irony detection in a multilingual context. Adv Inf Retr 12036:141","journal-title":"Adv Inf Retr"},{"key":"625_CR10","unstructured":"Golazizian P, Sabeti B, Asli SAA, Majdabadi Z, Momenzadeh O, Fahmi R (2020) Irony detection in Persian language: a transfer learning approach using emoji prediction. In: Proceedings of the 12th Language Resources and Evaluation Conference, pp 2839\u20132845"},{"key":"625_CR11","doi-asserted-by":"crossref","unstructured":"Krasnowska-Kiera\u015b K, Wr\u00f3blewska A (2019) Empirical linguistic study of sentence embeddings. 
In: Proceedings of the 57th Annual Meeting of the Association for Computational Linguistics, pp 5729\u20135739","DOI":"10.18653\/v1\/P19-1573"},{"key":"625_CR12","unstructured":"Li L, Levi O, Hosseini P, Broniatowski DA (2020) A multi-modal method for satire detection using textual and visual cues. arXiv preprint arXiv:2010.06671"},{"key":"625_CR13","unstructured":"Libovick\u00fd J, Rosa R, Fraser A (2019) How language-neutral is multilingual BERT? arXiv preprint arXiv:1911.03310"},{"key":"625_CR14","unstructured":"Mikolov T, Chen K, Corrado G, Dean J (2013) Efficient estimation of word representations in vector space. arXiv preprint arXiv:1301.3781"},{"key":"625_CR15","unstructured":"Mikolov T, Grave E, Bojanowski P, Puhrsch C, Joulin A (2018) Advances in pre-training distributed word representations. In: Proceedings of the International Conference on Language Resources and Evaluation (LREC 2018)"},{"issue":"2","key":"625_CR16","doi-asserted-by":"publisher","first-page":"1086","DOI":"10.3906\/elk-1907-11","volume":"28","author":"A Onan","year":"2020","unstructured":"Onan A, To\u00e7o\u011flu MA (2020) Satire identification in Turkish news articles based on ensemble of classifiers. Turk J Electr Eng Comput Sci 28(2):1086\u20131106","journal-title":"Turk J Electr Eng Comput Sci"},{"key":"625_CR17","doi-asserted-by":"publisher","first-page":"7701","DOI":"10.1109\/ACCESS.2021.3049734","volume":"9","author":"A Onan","year":"2021","unstructured":"Onan A, To\u00e7o\u011flu MA (2021) A term weighted neural language model and stacked bidirectional LSTM based framework for sarcasm identification. IEEE Access 9:7701\u20137722","journal-title":"IEEE Access"},{"key":"625_CR18","first-page":"229","volume":"2421","author":"R Ortega-Bueno","year":"2019","unstructured":"Ortega-Bueno R, Rangel F, Hern\u00e1ndez Far\u00edas D, Rosso P, Montes-y-G\u00f3mez M, Medina Pagola JE (2019) Overview of the task on irony detection in Spanish variants. 
Proc Iber Lang Eval Forum 2421:229\u2013256","journal-title":"Proc Iber Lang Eval Forum"},{"key":"625_CR19","doi-asserted-by":"crossref","unstructured":"Pennington J, Socher R, Manning CD (2014) GloVe: Global vectors for word representation. In: Proceedings of the 2014 Conference on Empirical Methods in Natural Language Processing (EMNLP), pp 1532\u20131543","DOI":"10.3115\/v1\/D14-1162"},{"issue":"6","key":"625_CR20","doi-asserted-by":"publisher","first-page":"2105","DOI":"10.1007\/s10115-019-01425-3","volume":"62","author":"M del Pilar Salas-Z\u00e1rate","year":"2020","unstructured":"del Pilar Salas-Z\u00e1rate M, Alor-Hern\u00e1ndez G, S\u00e1nchez-Cervantes JL, Paredes-Valverde MA, Garc\u00eda-Alcaraz JL, Valencia-Garc\u00eda R (2020) Review of English literature on figurative language applied to social networks. Knowl Inf Syst 62(6):2105\u20132137. https:\/\/doi.org\/10.1007\/s10115-019-01425-3","journal-title":"Knowl Inf Syst"},{"key":"625_CR21","doi-asserted-by":"publisher","first-page":"20","DOI":"10.1016\/j.knosys.2017.04.009","volume":"128","author":"M del Pilar Salas-Z\u00e1rate","year":"2017","unstructured":"del Pilar Salas-Z\u00e1rate M, Paredes-Valverde MA, Rodr\u00edguez-Garc\u00eda M\u00c1, Valencia-Garc\u00eda R, Alor-Hern\u00e1ndez G (2017) Automatic detection of satire in Twitter: a psycholinguistic-based approach. Knowl Based Syst 128:20\u201333. https:\/\/doi.org\/10.1016\/j.knosys.2017.04.009","journal-title":"Knowl Based Syst"},{"key":"625_CR22","doi-asserted-by":"crossref","unstructured":"Qi P, Zhang Y, Zhang Y, Bolton J, Manning CD (2020) Stanza: a Python natural language processing toolkit for many human languages. 
In: Proceedings of the 58th annual meeting of the association for computational linguistics: system demonstrations, pp 101\u2013108","DOI":"10.18653\/v1\/2020.acl-demos.14"},{"key":"625_CR23","doi-asserted-by":"crossref","unstructured":"Reimers N, Gurevych I (2019) Sentence-BERT: sentence embeddings using Siamese BERT-Networks. In: Proceedings of the 2019 Conference on Empirical Methods in Natural Language Processing. Association for Computational Linguistics","DOI":"10.18653\/v1\/D19-1410"},{"key":"625_CR24","unstructured":"Sanh V, Debut L, Chaumond J, Wolf T (2019) DistilBERT, a distilled version of BERT: smaller, faster, cheaper and lighter. arXiv preprint arXiv:1910.01108"},{"issue":"1","key":"625_CR25","doi-asserted-by":"publisher","first-page":"24","DOI":"10.1177\/0261927X09351676","volume":"29","author":"YR Tausczik","year":"2010","unstructured":"Tausczik YR, Pennebaker JW (2010) The psychological meaning of words: LIWC and computerized text analysis methods. J Lang Soc Psychol 29(1):24\u201354","journal-title":"J Lang Soc Psychol"},{"key":"625_CR26","unstructured":"Virtanen A, Kanerva J, Ilo R, Luoma J, Luotolahti J, Salakoski T, Ginter F, Pyysalo S (2019) Multilingual is not enough: BERT for Finnish. 
arXiv preprint arXiv:1912.07076"}],"container-title":["Complex & Intelligent Systems"],"original-title":[],"language":"en","link":[{"URL":"https:\/\/link.springer.com\/content\/pdf\/10.1007\/s40747-021-00625-1.pdf","content-type":"application\/pdf","content-version":"vor","intended-application":"text-mining"},{"URL":"https:\/\/link.springer.com\/article\/10.1007\/s40747-021-00625-1\/fulltext.html","content-type":"text\/html","content-version":"vor","intended-application":"text-mining"},{"URL":"https:\/\/link.springer.com\/content\/pdf\/10.1007\/s40747-021-00625-1.pdf","content-type":"application\/pdf","content-version":"vor","intended-application":"similarity-checking"}],"deposited":{"date-parts":[[2022,4,29]],"date-time":"2022-04-29T17:16:29Z","timestamp":1651252589000},"score":1,"resource":{"primary":{"URL":"https:\/\/link.springer.com\/10.1007\/s40747-021-00625-1"}},"subtitle":[],"short-title":[],"issued":{"date-parts":[[2022,1,5]]},"references-count":26,"journal-issue":{"issue":"2","published-print":{"date-parts":[[2022,4]]}},"alternative-id":["625"],"URL":"https:\/\/doi.org\/10.1007\/s40747-021-00625-1","relation":{},"ISSN":["2199-4536","2198-6053"],"issn-type":[{"type":"print","value":"2199-4536"},{"type":"electronic","value":"2198-6053"}],"subject":[],"published":{"date-parts":[[2022,1,5]]},"assertion":[{"value":"23 July 2021","order":1,"name":"received","label":"Received","group":{"name":"ArticleHistory","label":"Article History"}},{"value":"17 December 2021","order":2,"name":"accepted","label":"Accepted","group":{"name":"ArticleHistory","label":"Article History"}},{"value":"5 January 2022","order":3,"name":"first_online","label":"First Online","group":{"name":"ArticleHistory","label":"Article History"}},{"order":1,"name":"Ethics","group":{"name":"EthicsHeading","label":"Declarations"}},{"value":"The authors declare that they have no conflict of interest.","order":2,"name":"Ethics","group":{"name":"EthicsHeading","label":"Conflict of interest"}}]}}