{"status":"ok","message-type":"work","message-version":"1.0.0","message":{"indexed":{"date-parts":[[2026,1,23]],"date-time":"2026-01-23T11:45:59Z","timestamp":1769168759987,"version":"3.49.0"},"reference-count":67,"publisher":"Association for Computing Machinery (ACM)","issue":"4","license":[{"start":{"date-parts":[[2020,8,18]],"date-time":"2020-08-18T00:00:00Z","timestamp":1597708800000},"content-version":"vor","delay-in-days":0,"URL":"https:\/\/www.acm.org\/publications\/policies\/copyright_policy#Background"}],"funder":[{"name":"QMUL Research-IT"}],"content-domain":{"domain":["dl.acm.org"],"crossmark-restriction":true},"short-container-title":["ACM Trans. Web"],"published-print":{"date-parts":[[2020,11,30]]},"abstract":"<jats:p>The unmoderated nature of social media enables the diffusion of hoaxes, which in turn jeopardises the credibility of information gathered from social media platforms. Existing research on automated detection of hoaxes has the limitation of using relatively small datasets, owing to the difficulty of getting labelled data. This, in turn, has limited research exploring early detection of hoaxes as well as exploring other factors such as the effect of the size of the training data or the use of sliding windows. To mitigate this problem, we introduce a semi-automated method that leverages the Wikidata knowledge base to build large-scale datasets for veracity classification, focusing on celebrity death reports. This enables us to create a dataset with 4,007 reports including over 13M tweets, 15% of which are fake. Experiments using class-specific representations of word embeddings show that we can achieve F1 scores nearing 72% within 10 minutes of the first tweet being posted when we expand the size of the training data following our semi-automated means. 
Our dataset represents a realistic scenario with a real distribution of true, commemorative, and false stories, which we release for further use as a benchmark in future research.<\/jats:p>","DOI":"10.1145\/3407194","type":"journal-article","created":{"date-parts":[[2020,8,18]],"date-time":"2020-08-18T16:07:10Z","timestamp":1597766830000},"page":"1-23","update-policy":"https:\/\/doi.org\/10.1145\/crossmark-policy","source":"Crossref","is-referenced-by-count":8,"title":["Early Detection of Social Media Hoaxes at Scale"],"prefix":"10.1145","volume":"14","author":[{"ORCID":"https:\/\/orcid.org\/0000-0003-4583-3623","authenticated-orcid":false,"given":"Arkaitz","family":"Zubiaga","sequence":"first","affiliation":[{"name":"Queen Mary University of London, London, United Kingdom"}]},{"ORCID":"https:\/\/orcid.org\/0000-0002-9269-733X","authenticated-orcid":false,"given":"Aiqi","family":"Jiang","sequence":"additional","affiliation":[{"name":"Queen Mary University of London, London, United Kingdom"}]}],"member":"320","published-online":{"date-parts":[[2020,8,18]]},"reference":[{"key":"e_1_2_1_1_1","doi-asserted-by":"publisher","DOI":"10.1016\/j.ipm.2019.02.016"},{"key":"e_1_2_1_3_1","first-page":"1137","article-title":"A neural probabilistic language model","author":"Bengio Yoshua","year":"2003","journal-title":"J. Mach. Learn. Res. 3"},{"key":"e_1_2_1_4_1","volume-title":"Proceedings of the Conference on Empirical Methods in Natural Language Processing (EMNLP\u201915)","author":"Bowman Samuel R."},{"key":"e_1_2_1_5_1","doi-asserted-by":"publisher","DOI":"10.5555\/176313.176316"},{"key":"e_1_2_1_6_1","first-page":"15","article-title":"Blogs, Twitter, and breaking news: The produsage of citizen journalism","volume":"80","author":"Bruns Axel","year":"2012","journal-title":"Produs. Theor. Dig. World: Intersect. Aud. Prod. Contemp. 
Theor."},{"key":"e_1_2_1_7_1","doi-asserted-by":"publisher","DOI":"10.1145\/1963405.1963500"},{"key":"e_1_2_1_8_1","doi-asserted-by":"publisher","DOI":"10.1177\/001316446002000104"},{"key":"e_1_2_1_9_1","doi-asserted-by":"publisher","DOI":"10.18653\/v1\/D17-1070"},{"key":"e_1_2_1_10_1","doi-asserted-by":"publisher","DOI":"10.1137\/S0097539701398363"},{"key":"e_1_2_1_11_1","doi-asserted-by":"publisher","DOI":"10.18653\/v1\/S17-2006"},{"key":"e_1_2_1_12_1","doi-asserted-by":"publisher","DOI":"10.1145\/2207676.2208409"},{"key":"e_1_2_1_13_1","volume-title":"Proceedings of the 27th International Conference on Computational Linguistics. 3360--3370","author":"Dungs Sebastian","year":"2018"},{"key":"e_1_2_1_14_1","doi-asserted-by":"crossref","volume-title":"Tweets and the Streets: Social Media and Contemporary Activism","author":"Gerbaudo Paolo","DOI":"10.2307\/j.ctt183pdzs"},{"key":"e_1_2_1_15_1","doi-asserted-by":"publisher","DOI":"10.1145\/3358528.3358567"},{"key":"e_1_2_1_16_1","doi-asserted-by":"publisher","DOI":"10.1145\/3201064.3201100"},{"key":"e_1_2_1_17_1","doi-asserted-by":"publisher","DOI":"10.18653\/v1\/S19-2147"},{"key":"e_1_2_1_18_1","volume-title":"News Use across Social Media Platforms","author":"Gottfried Jeffrey","year":"2016"},{"key":"e_1_2_1_19_1","doi-asserted-by":"publisher","DOI":"10.1007\/978-3-319-13734-6_16"},{"key":"e_1_2_1_20_1","first-page":"5","article-title":"Tweets and truth: Journalism as a discipline of collaborative verification","volume":"6","author":"Hermida Alfred","year":"2012","journal-title":"J. Pract."},{"key":"e_1_2_1_21_1","volume-title":"Proceedings of the AAAI Conference on Artificial Intelligence. 2972--2978","author":"Jin Zhiwei","year":"2016"},{"key":"e_1_2_1_23_1","volume-title":"Proceedings of the Annual Meeting of the Association for Computational Linguistics. 
595--603","author":"Koo Terry","year":"2008"},{"key":"e_1_2_1_24_1","doi-asserted-by":"publisher","DOI":"10.1145\/2872427.2883085"},{"key":"e_1_2_1_25_1","doi-asserted-by":"publisher","DOI":"10.1145\/1772690.1772751"},{"key":"e_1_2_1_26_1","doi-asserted-by":"publisher","DOI":"10.1371\/journal.pone.0168344"},{"key":"e_1_2_1_27_1","doi-asserted-by":"publisher","DOI":"10.1145\/2806416.2806651"},{"key":"e_1_2_1_28_1","volume-title":"Proceedings of the 32nd AAAI Conference on Artificial Intelligence.","author":"Liu Yang","year":"2018"},{"key":"e_1_2_1_29_1","volume-title":"Proceedings of the International Joint Conference on Artificial Intelligence. 3818--3824","author":"Ma Jing","year":"2016"},{"key":"e_1_2_1_30_1","unstructured":"Curtis Daniel MacDougall. 1958. Hoaxes. Vol. 465. Dover Pubns."},{"key":"e_1_2_1_31_1","doi-asserted-by":"publisher","DOI":"10.1145\/2872518.2890092"},{"key":"e_1_2_1_32_1","volume-title":"Proceedings of the Conference on Advances in Neural Information Processing Systems. 3111--3119","author":"Mikolov Tomas","year":"2013"},{"key":"e_1_2_1_33_1","volume-title":"Proceedings of the 9th International AAAI Conference on Web and Social Media.","author":"Mitra Tanushree","year":"2015"},{"key":"e_1_2_1_34_1","doi-asserted-by":"publisher","DOI":"10.1145\/3323503.3361698"},{"key":"e_1_2_1_35_1","doi-asserted-by":"publisher","DOI":"10.1145\/2145204.2145274"},{"key":"e_1_2_1_36_1","volume-title":"r\/Fakeddit: A new multimodal benchmark dataset for fine-grained fake news detection. Arxiv Preprint Arxiv:1911.03854","author":"Nakamura Kai","year":"2019"},{"key":"e_1_2_1_37_1","unstructured":"Robert Nares. 1822. A Glossary: Or Collection of Words Phrases Names and Allusions to Customs Proverbs &c. which Have Been Thought to Require Illustration in the Works of English Authors Particularly Shakespeare and His Contemporaries...R. Triphook London."},{"key":"e_1_2_1_38_1","volume-title":"Proceedings of the International AAAI Conference on Web and Social Media","volume":"13","author":"N\u00f8rregaard Jeppe","year":"2019"},{"key":"e_1_2_1_39_1","doi-asserted-by":"publisher","DOI":"10.5555\/1953048.2078195"},{"key":"e_1_2_1_40_1","volume-title":"Proceedings of the Conference on Empirical Methods in Natural Language Processing. 1589--1599","author":"Qazvinian Vahed","year":"2011"},{"key":"e_1_2_1_41_1","series-title":"Reports Series (1997)","volume-title":"Tech","author":"Ratnaparkhi Adwait"},{"key":"e_1_2_1_42_1","doi-asserted-by":"publisher","DOI":"10.1109\/MIS.2019.2899143"},{"key":"e_1_2_1_43_1","volume-title":"Proceedings of the International AAAI Conference on Web and Social Media","volume":"13","author":"Abu Salem Fatima K.","year":"2019"},{"key":"e_1_2_1_44_1","doi-asserted-by":"publisher","DOI":"10.1145\/2983323.2983697"},{"key":"e_1_2_1_45_1","doi-asserted-by":"publisher","DOI":"10.1145\/1653771.1653781"},{"key":"e_1_2_1_46_1","doi-asserted-by":"publisher","DOI":"10.1038\/s41467-018-06930-7"},{"key":"e_1_2_1_47_1","volume-title":"FakeNewsNet: A data repository with news content, social context, and spatialtemporal information for studying fake news on social media. 
Arxiv Preprint Arxiv:1809.01286","author":"Shu Kai","year":"2018"},{"key":"e_1_2_1_48_1","doi-asserted-by":"publisher","DOI":"10.1145\/3137597.3137600"},{"key":"e_1_2_1_49_1","volume-title":"Proceedings of the 2nd Workshop on Data Science for Social Good.","author":"Tacchini Eugenio","year":"2017"},{"key":"e_1_2_1_50_1","doi-asserted-by":"publisher","DOI":"10.1109\/SCIS-ISIS.2012.6505254"},{"key":"e_1_2_1_51_1","doi-asserted-by":"publisher","DOI":"10.18653\/v1\/N18-1074"},{"key":"e_1_2_1_52_1","doi-asserted-by":"publisher","DOI":"10.1145\/3184558.3188722"},{"key":"e_1_2_1_53_1","volume-title":"Proceedings of the Annual Meeting of the Association for Computational Linguistics. 384--394","author":"Turian Joseph","year":"2010"},{"key":"e_1_2_1_54_1","doi-asserted-by":"publisher","DOI":"10.1007\/978-3-030-30760-8_25"},{"key":"e_1_2_1_55_1","doi-asserted-by":"publisher","DOI":"10.1145\/2629489"},{"key":"e_1_2_1_56_1","doi-asserted-by":"publisher","DOI":"10.18653\/v1\/P17-2067"},{"key":"e_1_2_1_57_1","doi-asserted-by":"publisher","DOI":"10.1145\/3219819.3219903"},{"key":"e_1_2_1_58_1","doi-asserted-by":"publisher","DOI":"10.1145\/2350190.2350203"},{"key":"e_1_2_1_59_1","doi-asserted-by":"publisher","DOI":"10.1007\/s13278-019-0616-4"},{"key":"e_1_2_1_60_1","doi-asserted-by":"publisher","DOI":"10.1016\/j.ipm.2019.03.004"},{"key":"e_1_2_1_61_1","volume-title":"Fake news early detection: A theory-driven model. 
Arxiv Preprint Arxiv:1904.11679","author":"Zhou Xinyi","year":"2019"},{"key":"e_1_2_1_62_1","doi-asserted-by":"publisher","DOI":"10.1145\/3289600.3291382"},{"key":"e_1_2_1_63_1","doi-asserted-by":"publisher","DOI":"10.1016\/j.osnem.2019.100049"},{"key":"e_1_2_1_64_1","doi-asserted-by":"publisher","DOI":"10.1145\/3161603"},{"key":"e_1_2_1_65_1","doi-asserted-by":"publisher","DOI":"10.1007\/s13278-014-0163-y"},{"key":"e_1_2_1_66_1","doi-asserted-by":"publisher","DOI":"10.1145\/2449396.2449424"},{"key":"e_1_2_1_67_1","volume-title":"Proceedings of the International Conference on Computational Linguistics (COLING\u201916)","author":"Zubiaga Arkaitz","year":"2016"},{"key":"e_1_2_1_68_1","doi-asserted-by":"publisher","DOI":"10.1016\/j.ipm.2017.11.009"},{"key":"e_1_2_1_69_1","volume-title":"Geraldine Wong Sak Hoi, and Peter Tolmie","author":"Zubiaga Arkaitz","year":"2016"}],"container-title":["ACM Transactions on the Web"],"original-title":[],"language":"en","link":[{"URL":"https:\/\/dl.acm.org\/doi\/10.1145\/3407194","content-type":"unspecified","content-version":"vor","intended-application":"text-mining"},{"URL":"https:\/\/dl.acm.org\/doi\/pdf\/10.1145\/3407194","content-type":"unspecified","content-version":"vor","intended-application":"similarity-checking"}],"deposited":{"date-parts":[[2025,6,17]],"date-time":"2025-06-17T22:41:23Z","timestamp":1750200083000},"score":1,"resource":{"primary":{"URL":"https:\/\/dl.acm.org\/doi\/10.1145\/3407194"}},"subtitle":[],"short-title":[],"issued":{"date-parts":[[2020,8,18]]},"references-count":67,"journal-issue":{"issue":"4","published-print":{"date-parts":[[2020,11,30]]}},"alternative-id":["10.1145\/3407194"],"URL":"https:\/\/doi.org\/10.1145\/3407194","relation":{},"ISSN":["1559-1131","1559-114X"],"issn-type":[{"value":"1559-1131","type":"print"},{"value":"1559-114X","type":"electronic"}],"subject":[],"published":{"date-parts":[[2020,8,18]]},"assertion":[{"value":"2019-07-01","order":0,"name":"received","label":"Received","group":
{"name":"publication_history","label":"Publication History"}},{"value":"2020-06-01","order":1,"name":"accepted","label":"Accepted","group":{"name":"publication_history","label":"Publication History"}},{"value":"2020-08-18","order":2,"name":"published","label":"Published","group":{"name":"publication_history","label":"Publication History"}}]}}