{"status":"ok","message-type":"work","message-version":"1.0.0","message":{"indexed":{"date-parts":[[2026,6,10]],"date-time":"2026-06-10T10:11:47Z","timestamp":1781086307206,"version":"3.54.1"},"publisher-location":"Cham","reference-count":30,"publisher":"Springer Nature Switzerland","isbn-type":[{"value":"9783031657931","type":"print"},{"value":"9783031657948","type":"electronic"}],"license":[{"start":{"date-parts":[[2024,1,1]],"date-time":"2024-01-01T00:00:00Z","timestamp":1704067200000},"content-version":"tdm","delay-in-days":0,"URL":"https:\/\/creativecommons.org\/licenses\/by\/4.0"},{"start":{"date-parts":[[2024,8,15]],"date-time":"2024-08-15T00:00:00Z","timestamp":1723680000000},"content-version":"vor","delay-in-days":227,"URL":"https:\/\/creativecommons.org\/licenses\/by\/4.0"}],"content-domain":{"domain":["link.springer.com"],"crossmark-restriction":false},"short-container-title":[],"published-print":{"date-parts":[[2024]]},"abstract":"<jats:title>Abstract<\/jats:title><jats:p>Detecting cite-worthiness in text is seen as the problem of flagging a missing reference to a scientific result (an article or a dataset) that should come to support a claim formulated in the text. Previous work has taken interest in this problem in the context of scientific literature, motivated by the need to allow for reference recommendation for researchers and flag missing citations in scientific work. In this preliminary study, we extend this idea towards the context of social media. As scientific claims are often made to support various arguments in societal debates on the Web, it is crucial to flag non-referenced or unsupported claims that relate to science, as this promises to contribute to improving the quality of the debates online. We experiment with baseline models, initially tested on scientific literature, by applying them on the SciTweets dataset which gathers science-related claims from X. We show that models trained on scientific papers struggle to detect cite-worthy text from X, we discuss implications of such results and argue for the necessity to train models on social media corpora for satisfactory flagging of missing references on social media. We make our data publicly available to encourage further research on cite-worthiness detection on social media.<\/jats:p>","DOI":"10.1007\/978-3-031-65794-8_2","type":"book-chapter","created":{"date-parts":[[2024,8,14]],"date-time":"2024-08-14T06:02:44Z","timestamp":1723615364000},"page":"19-30","update-policy":"https:\/\/doi.org\/10.1007\/springer_crossmark_policy","source":"Crossref","is-referenced-by-count":2,"title":["Cite-worthiness Detection on\u00a0Social Media: A Preliminary Study"],"prefix":"10.1007","author":[{"ORCID":"https:\/\/orcid.org\/0000-0002-1775-8542","authenticated-orcid":false,"given":"Salim","family":"Hafid","sequence":"first","affiliation":[],"role":[{"vocabulary":"crossref","role":"author"}]},{"given":"Wassim","family":"Ammar","sequence":"additional","affiliation":[],"role":[{"vocabulary":"crossref","role":"author"}]},{"ORCID":"https:\/\/orcid.org\/0000-0002-2830-3666","authenticated-orcid":false,"given":"Sandra","family":"Bringay","sequence":"additional","affiliation":[],"role":[{"vocabulary":"crossref","role":"author"}]},{"ORCID":"https:\/\/orcid.org\/0000-0002-9116-6692","authenticated-orcid":false,"given":"Konstantin","family":"Todorov","sequence":"additional","affiliation":[],"role":[{"vocabulary":"crossref","role":"author"}]}],"member":"297","published-online":{"date-parts":[[2024,8,15]]},"reference":[{"key":"2_CR1","doi-asserted-by":"crossref","unstructured":"Wright, D., Augenstein, I.: CiteWorth: cite-worthiness detection for improved scientific document understanding. In Findings of the Association for Computational Linguistics: ACL-IJCNLP 2021, pp. 1796\u20131807 (2021)","DOI":"10.18653\/v1\/2021.findings-acl.157"},{"key":"2_CR2","doi-asserted-by":"crossref","unstructured":"Beltagy, I., Lo, K., Cohan, A.: SciBERT: a pretrained language model for scientific text. In: Proceedings of the 2019 Conference on Empirical Methods in Natural Language Processing and the 9th International Joint Conference on Natural Language Processing (EMNLP-IJCNLP), pp. 3615\u20133620 (2019)","DOI":"10.18653\/v1\/D19-1371"},{"key":"2_CR3","doi-asserted-by":"crossref","unstructured":"Hafid, S., Schellhammer, S., Bringay, S., Todorov, K., Dietze, S.: SciTweets-a dataset and annotation framework for detecting scientific online discourse. In: Proceedings of the 31st ACM International Conference on Information & Knowledge Management, pp. 3988\u20133992 (2022)","DOI":"10.1145\/3511808.3557693"},{"key":"2_CR4","unstructured":"Beltagy, I., Peters, M.E., Cohan, A.: LongFormer: the long-document transformer. arXiv preprint arXiv:2004.05150 (2020)"},{"key":"2_CR5","doi-asserted-by":"crossref","unstructured":"Lo, K., Wang, L.L., Neumann, M., Kinney, R., Weld, D.S.: S2ORC: the semantic scholar open research corpus. In: Proceedings of the 58th Annual Meeting of the Association for Computational Linguistics, pp. 4969\u20134983 (2020)","DOI":"10.18653\/v1\/2020.acl-main.447"},{"key":"2_CR6","series-title":"Lecture Notes in Computer Science","doi-asserted-by":"publisher","first-page":"598","DOI":"10.1007\/978-3-319-76941-7_50","volume-title":"Advances in Information Retrieval","author":"M F\u00e4rber","year":"2018","unstructured":"F\u00e4rber, M., Thiemann, A., Jatowt, A.: To cite, or not to cite? Detecting citation contexts in text. In: Pasi, G., Piwowarski, B., Azzopardi, L., Hanbury, A. (eds.) ECIR 2018. LNCS, vol. 10772, pp. 598\u2013603. Springer, Cham (2018). https:\/\/doi.org\/10.1007\/978-3-319-76941-7_50"},{"issue":"1","key":"2_CR7","doi-asserted-by":"publisher","first-page":"75","DOI":"10.1177\/0963662520957252","volume":"30","author":"M Della Giusta","year":"2021","unstructured":"Della Giusta, M., Jaworska, S., Vukadinovi\u0107 Greetham, D.: Expert communication on Twitter: comparing economists\u2019 and scientists\u2019 social networks, topics and communicative styles. Public Underst. Sci. 30(1), 75\u201390 (2021)","journal-title":"Public Underst. Sci."},{"key":"2_CR8","doi-asserted-by":"publisher","first-page":"239","DOI":"10.1007\/s10619-010-7077-0","volume":"29","author":"ST Moturu","year":"2011","unstructured":"Moturu, S.T., Liu, H.: Quantifying the trustworthiness of social media content. Distrib. Parallel Databases 29, 239\u2013260 (2011)","journal-title":"Distrib. Parallel Databases"},{"key":"2_CR9","unstructured":"Sundriyal, M., Akhtar, M.S., Chakraborty, T.: Leveraging social discourse to measure check-worthiness of claims for fact-checking. arXiv preprint arXiv:2309.09274 (2023)"},{"key":"2_CR10","unstructured":"Satapara, S., Mehta, P., Ganguly, D., Modha, S.: Fighting fire with fire: adversarial prompting to generate a misinformation detection dataset. arXiv preprint arXiv:2401.04481 (2024)"},{"issue":"4","key":"2_CR11","doi-asserted-by":"publisher","first-page":"e2012","DOI":"10.2196\/jmir.2012","volume":"13","author":"G Eysenbach","year":"2011","unstructured":"Eysenbach, G.: Can tweets predict citations? Metrics of social impact based on Twitter and correlation with traditional metrics of scientific impact. J. Med. Internet Res. 13(4), e2012 (2011)","journal-title":"J. Med. Internet Res."},{"key":"2_CR12","doi-asserted-by":"crossref","unstructured":"Jain, N., Singh, M.: TweetPap: a dataset to study the social media discourse of scientific papers. In: 2021 ACM\/IEEE Joint Conference on Digital Libraries (JCDL), pp. 328\u2013329. IEEE (2021)","DOI":"10.1109\/JCDL52503.2021.00055"},{"key":"2_CR13","doi-asserted-by":"crossref","unstructured":"August, T., Card, D., Hsieh, G., Smith, N.A., Reinecke, K.: Explain like I am a scientist: the linguistic barriers of entry to r\/science. In: Proceedings of the 2020 CHI Conference on Human Factors in Computing Systems, pp. 1\u201312 (2020)","DOI":"10.1145\/3313831.3376524"},{"key":"2_CR14","doi-asserted-by":"crossref","unstructured":"Chandrasekharan, E., et al.: The Internet\u2019s hidden rules: an empirical study of Reddit norm violations at micro, meso, and macro scales. In: Proceedings of the ACM on Human-Computer Interaction, vol. 2, no. CSCW, pp. 1\u201325 (2018)","DOI":"10.1145\/3274301"},{"issue":"2","key":"2_CR15","doi-asserted-by":"publisher","first-page":"211","DOI":"10.1257\/jep.31.2.211","volume":"31","author":"H Allcott","year":"2017","unstructured":"Allcott, H., Gentzkow, M.: Social media and fake news in the 2016 election. J. Econ. Perspect. 31(2), 211\u2013236 (2017)","journal-title":"J. Econ. Perspect."},{"issue":"6380","key":"2_CR16","doi-asserted-by":"publisher","first-page":"1146","DOI":"10.1126\/science.aap9559","volume":"359","author":"S Vosoughi","year":"2018","unstructured":"Vosoughi, S., Roy, D., Aral, S.: The spread of true and false news online. Science 359(6380), 1146\u20131151 (2018)","journal-title":"Science"},{"issue":"1","key":"2_CR17","doi-asserted-by":"publisher","first-page":"1","DOI":"10.1145\/3140565","volume":"1","author":"K Garimella","year":"2018","unstructured":"Garimella, K., Morales, G.D.F., Gionis, A., Mathioudakis, M.: Quantifying controversy on social media. ACM Trans. Soc. Comput. 1(1), 1\u201327 (2018)","journal-title":"ACM Trans. Soc. Comput."},{"issue":"2","key":"2_CR18","first-page":"125","volume":"3","author":"V De Semir","year":"2000","unstructured":"De Semir, V.: Scientific journalism: problems and perspectives. Int. Microbiol. 3(2), 125\u2013128 (2000)","journal-title":"Int. Microbiol."},{"key":"2_CR19","doi-asserted-by":"crossref","unstructured":"Dunwoody, S.: Science journalism: prospects in the digital age. In: Routledge Handbook of Public Communication of Science and Technology, pp. 14\u201332. Routledge (2021)","DOI":"10.4324\/9781003039242-2-2"},{"key":"2_CR20","unstructured":"Arnold, P.: The challenges of online fact checking. Technical report, Full Fact (2020)"},{"key":"2_CR21","doi-asserted-by":"publisher","first-page":"960","DOI":"10.1016\/j.joi.2018.08.002","volume":"12","author":"F Didegah","year":"2018","unstructured":"Didegah, F., Mejlgaard, N., S\u00f8rensen, M.P.: Investigating the quality of interactions and public engagement around scientific papers on Twitter. J. Informet. 12, 960\u2013971 (2018)","journal-title":"J. Informet."},{"key":"2_CR22","doi-asserted-by":"crossref","unstructured":"Liu, Y., Whitfield, C., Zhang, T., Hauser, A., Reynolds, T., Anwar, M.: Monitoring COVID-19 pandemic through the lens of social media using natural language processing and machine learning. Health Inf. Sci. Syst. 9 (2021)","DOI":"10.1007\/s13755-021-00158-4"},{"key":"2_CR23","doi-asserted-by":"crossref","unstructured":"Raza, H., Faizan, M., Hamza, A., Mushtaq, A., Akhtar, N.: Scientific text sentiment analysis using machine learning techniques. Int. J. Adv. Comput. Sci. Appl. (2019)","DOI":"10.14569\/IJACSA.2019.0101222"},{"key":"2_CR24","doi-asserted-by":"crossref","unstructured":"Sugiyama, K., Kumar, T., Kan, M., Tripathi, R.C.: Identifying citing sentences in research papers using supervised learning. In: 2010 International Conference on Information Retrieval & Knowledge Management (CAMP), pp. 67\u201372 (2010)","DOI":"10.1109\/INFRKM.2010.5466945"},{"key":"2_CR25","unstructured":"Bird, S., et al.: The ACL anthology reference corpus: a reference dataset for bibliographic research in computational linguistics. In: International Conference on Language Resources and Evaluation (2008)"},{"key":"2_CR26","unstructured":"F\u00e4rber, M., Thiemann, A., Jatowt, A.: A high-quality gold standard for citation-based tasks. In: International Conference on Language Resources and Evaluation (2018)"},{"key":"2_CR27","doi-asserted-by":"crossref","unstructured":"Alperin, J.P., Fleerackers, A., Riedlinger, M., Haustein, S.: Second-order citations in altmetrics: a case study analyzing the audiences of COVID-19 research in the news and on social media. Quant. Sci. Stud. 1\u201328 (2024)","DOI":"10.1101\/2023.04.05.535734"},{"key":"2_CR28","unstructured":"Nakov, P., et al.: Overview of the CLEF-2022 CheckThat! Lab task 1 on identifying relevant claims in tweets. In: 2022 Conference and Labs of the Evaluation Forum, CLEF 2022, pp. 368\u2013392. CEUR Workshop Proceedings. CEUR-WS.org (2022)"},{"key":"2_CR29","unstructured":"Alam, F., et al.: Overview of the CLEF-2023 CheckThat! Lab task 1 on check-worthiness in multimodal and multigenre content. In: Working Notes of CLEF (2023)"},{"issue":"3","key":"2_CR30","doi-asserted-by":"publisher","first-page":"2519","DOI":"10.1007\/s11192-020-03564-9","volume":"124","author":"Z Fang","year":"2020","unstructured":"Fang, Z., Costas, R., Tian, W., Wang, X., Wouters, P.: An extensive analysis of the presence of altmetric data for Web of science publications across subject fields and research topics. Scientometrics 124(3), 2519\u20132549 (2020)","journal-title":"Scientometrics"}],"container-title":["Lecture Notes in Computer Science","Natural Scientific Language Processing and Research Knowledge Graphs"],"original-title":[],"language":"en","link":[{"URL":"https:\/\/link.springer.com\/content\/pdf\/10.1007\/978-3-031-65794-8_2","content-type":"unspecified","content-version":"vor","intended-application":"similarity-checking"}],"deposited":{"date-parts":[[2024,8,14]],"date-time":"2024-08-14T06:03:10Z","timestamp":1723615390000},"score":1,"resource":{"primary":{"URL":"https:\/\/link.springer.com\/10.1007\/978-3-031-65794-8_2"}},"subtitle":[],"short-title":[],"issued":{"date-parts":[[2024]]},"ISBN":["9783031657931","9783031657948"],"references-count":30,"URL":"https:\/\/doi.org\/10.1007\/978-3-031-65794-8_2","relation":{},"ISSN":["0302-9743","1611-3349"],"issn-type":[{"value":"0302-9743","type":"print"},{"value":"1611-3349","type":"electronic"}],"subject":[],"published":{"date-parts":[[2024]]},"assertion":[{"value":"15 August 2024","order":1,"name":"first_online","label":"First Online","group":{"name":"ChapterHistory","label":"Chapter History"}},{"value":"NSLP","order":1,"name":"conference_acronym","label":"Conference Acronym","group":{"name":"ConferenceInfo","label":"Conference Information"}},{"value":"International Workshop on Natural Scientific Language Processing and Research Knowledge Graphs","order":2,"name":"conference_name","label":"Conference Name","group":{"name":"ConferenceInfo","label":"Conference Information"}},{"value":"Hersonissos, Crete","order":3,"name":"conference_city","label":"Conference City","group":{"name":"ConferenceInfo","label":"Conference Information"}},{"value":"Greece","order":4,"name":"conference_country","label":"Conference Country","group":{"name":"ConferenceInfo","label":"Conference Information"}},{"value":"2024","order":5,"name":"conference_year","label":"Conference Year","group":{"name":"ConferenceInfo","label":"Conference Information"}},{"value":"26 May 2024","order":7,"name":"conference_start_date","label":"Conference Start Date","group":{"name":"ConferenceInfo","label":"Conference Information"}},{"value":"26 May 2024","order":8,"name":"conference_end_date","label":"Conference End Date","group":{"name":"ConferenceInfo","label":"Conference Information"}},{"value":"1","order":9,"name":"conference_number","label":"Conference Number","group":{"name":"ConferenceInfo","label":"Conference Information"}},{"value":"nslp2024","order":10,"name":"conference_id","label":"Conference ID","group":{"name":"ConferenceInfo","label":"Conference Information"}},{"value":"https:\/\/nfdi4ds.github.io\/nslp2024\/","order":11,"name":"conference_url","label":"Conference URL","group":{"name":"ConferenceInfo","label":"Conference Information"}}]}}