{"status":"ok","message-type":"work","message-version":"1.0.0","message":{"indexed":{"date-parts":[[2026,3,17]],"date-time":"2026-03-17T04:59:25Z","timestamp":1773723565048,"version":"3.50.1"},"publisher-location":"New York, NY, USA","reference-count":37,"publisher":"ACM","license":[{"start":{"date-parts":[[2021,12,14]],"date-time":"2021-12-14T00:00:00Z","timestamp":1639440000000},"content-version":"vor","delay-in-days":0,"URL":"https:\/\/www.acm.org\/publications\/policies\/copyright_policy#Background"}],"funder":[{"name":"DFG Grant Managed Forgetting","award":["NI-1760\/1-1"],"award-info":[{"award-number":["NI-1760\/1-1"]}]},{"name":"European Union's Horizon 2020 research and innovation program - ROXANNE","award":["833635"],"award-info":[{"award-number":["833635"]}]},{"name":"European Union's Horizon 2020 research and innovation program - MIRROR","award":["832921"],"award-info":[{"award-number":["832921"]}]}],"content-domain":{"domain":["dl.acm.org"],"crossmark-restriction":true},"short-container-title":[],"published-print":{"date-parts":[[2021,12,14]]},"DOI":"10.1145\/3486622.3493960","type":"proceedings-article","created":{"date-parts":[[2022,4,14]],"date-time":"2022-04-14T01:18:53Z","timestamp":1649899133000},"page":"210-217","update-policy":"https:\/\/doi.org\/10.1145\/crossmark-policy","source":"Crossref","is-referenced-by-count":5,"title":["On the Impact of Dataset Size: A Twitter Classification Case Study"],"prefix":"10.1145","author":[{"given":"Thi Huyen","family":"Nguyen","sequence":"first","affiliation":[{"name":"L3S Research Center, Germany"}],"role":[{"role":"author","vocabulary":"crossref"}]},{"given":"Hoang H.","family":"Nguyen","sequence":"additional","affiliation":[{"name":"L3S Research Center, Germany"}],"role":[{"role":"author","vocabulary":"crossref"}]},{"given":"Zahra","family":"Ahmadi","sequence":"additional","affiliation":[{"name":"L3S Research Center, 
Germany"}],"role":[{"role":"author","vocabulary":"crossref"}]},{"given":"Tuan-Anh","family":"Hoang","sequence":"additional","affiliation":[{"name":"VNU University of Science, Vietnam"}],"role":[{"role":"author","vocabulary":"crossref"}]},{"given":"Thanh-Nam","family":"Doan","sequence":"additional","affiliation":[{"name":"University of Tennessee at Chattanooga, USA"}],"role":[{"role":"author","vocabulary":"crossref"}]}],"member":"320","published-online":{"date-parts":[[2022,4,13]]},"reference":[{"key":"e_1_3_2_1_1_1","doi-asserted-by":"publisher","DOI":"10.26615\/issn.2603-2821.2019_002"},{"key":"e_1_3_2_1_2_1","unstructured":"Ahmed Sulaiman\u00a0M Alharbi and Elise de Doncker. 2019. Twitter sentiment analysis with a deep neural network: An enhanced approach using user behavioral information. Cognitive Systems Research(2019)."},{"key":"e_1_3_2_1_3_1","volume-title":"Impact of Dataset Size on Classification Performance: An Empirical Evaluation in the Medical Domain. Applied Sciences","author":"Althnian Alhanoof","year":"2021","unstructured":"Alhanoof Althnian, Duaa AlSaeed, Heyam Al-Baity, Amani Samha, Alanoud\u00a0Bin Dris, Najla Alzakari, Afnan\u00a0Abou Elwafa, and Heba Kurdi. 2021. Impact of Dataset Size on Classification Performance: An Empirical Evaluation in the Medical Domain. Applied Sciences (2021)."},{"key":"e_1_3_2_1_4_1","unstructured":"Alessio Benavoli Giorgio Corani and Francesca Mangili. 2016. Should we really use post-hoc tests based on mean-ranks?Machine Learning Research(2016)."},{"key":"e_1_3_2_1_5_1","unstructured":"Cody Buntain Jennifer Golbeck Brooke Liu and Gary LaFree. 2016. Evaluating Public Response to the Boston Marathon Bombing and Other Acts of Terrorism through Twitter.. In ICWSM."},{"key":"e_1_3_2_1_6_1","volume-title":"Multi-class machine classification of suicide-related communication on Twitter. 
Online social networks and media 2","author":"Burnap Pete","year":"2017","unstructured":"Pete Burnap, Gualtiero Colombo, Rosie Amery, Andrei Hodorog, and Jonathan Scourfield. 2017. Multi-class machine classification of suicide-related communication on Twitter. Online social networks and media 2 (2017), 32\u201344."},{"key":"e_1_3_2_1_7_1","unstructured":"Junghwan Cho Kyewook Lee Ellie Shin Garry Choy and Synho Do. 2015. How much data is needed to train a medical image deep learning system to achieve necessary high accuracy?. In arXiv preprint arXiv:1511.06348."},{"key":"e_1_3_2_1_8_1","volume-title":"BERT: Pre-training of Deep Bidirectional Transformers for Language Understanding. In NAACL-HLT.","author":"Delvin Jacob","year":"2019","unstructured":"Jacob Delvin, Ming-Wei Chang, Kenton Lee, and Kristina Toutanova. 2019. BERT: Pre-training of Deep Bidirectional Transformers for Language Understanding. In NAACL-HLT."},{"key":"e_1_3_2_1_9_1","doi-asserted-by":"crossref","unstructured":"Nicholas\u00a0A Diakopoulos and David\u00a0A Shamma. 2010. Characterizing debate performance via aggregated twitter sentiment. In SIGCHI.","DOI":"10.1145\/1753326.1753504"},{"key":"e_1_3_2_1_10_1","unstructured":"Tobias Domhan Jost\u00a0Tobias Springenberg and Frank Hutter. 2015. Speeding up automatic hyperparameter optimization of deep neural networks by extrapolation of learning curves. In IJCAI."},{"key":"e_1_3_2_1_11_1","doi-asserted-by":"crossref","unstructured":"Rosa\u00a0L Figueroa Qing Zeng-Treitler Sasikiran Kandula and Long\u00a0H Ngo. 2012. Predicting sample size required for classification performance. BMC medical informatics and decision making(2012).","DOI":"10.1186\/1472-6947-12-8"},{"key":"e_1_3_2_1_12_1","volume-title":"Seventh International Workshop on Artificial Intelligence and Statistics. PMLR.","author":"Frey J","year":"1999","unstructured":"Lewis\u00a0J Frey and Douglas\u00a0H Fisher. 1999. Modeling decision tree performance with the power law. 
In Seventh International Workshop on Artificial Intelligence and Statistics. PMLR."},{"key":"e_1_3_2_1_13_1","doi-asserted-by":"crossref","unstructured":"Milton Friedman. 1940. A comparison of alternative tests of significance for the problem of m rankings. The Annals of Mathematical Statistics(1940).","DOI":"10.1214\/aoms\/1177731944"},{"key":"e_1_3_2_1_14_1","unstructured":"Alec Go Richa Bhayani and Lei Huang. 2009. Twitter sentiment classification using distant supervision. CS224N project report Stanford(2009)."},{"key":"e_1_3_2_1_15_1","volume-title":"LSTM: A search space odyssey","author":"Greff Klaus","year":"2016","unstructured":"Klaus Greff, Rupesh\u00a0K Srivastava, Jan Koutn\u00edk, Bas\u00a0R Steunebrink, and J\u00fcrgen Schmidhuber. 2016. LSTM: A search space odyssey. IEEE transactions on neural networks and learning systems (2016)."},{"key":"e_1_3_2_1_16_1","doi-asserted-by":"publisher","DOI":"10.1007\/3-540-47714-4_29"},{"key":"e_1_3_2_1_17_1","doi-asserted-by":"crossref","unstructured":"M.A. Hearst S.T. Dumais E. Osuna J. Platt and B. Scholkopf. 1998. Support vector machines. IEEE Intelligent Systems and their applications (1998).","DOI":"10.1109\/5254.708428"},{"key":"e_1_3_2_1_18_1","unstructured":"Joel Hestness Sharan Narang Newsha Ardalani Gregory Diamos Heewoo Jun Hassan Kianinejad Md Patwary Mostofa Ali Yang Yang and Yanqi Zhou. 2017. Deep learning scaling is predictable empirically. In arXiv preprint arXiv:1712.00409."},{"key":"e_1_3_2_1_19_1","doi-asserted-by":"crossref","unstructured":"Tuan-Anh Hoang Thi\u00a0Huyen Nguyen and Wolfgang Nejdl. 2019. Efficient Tracking of Breaking News in Twitter. In WebSci.","DOI":"10.1145\/3292522.3326058"},{"key":"e_1_3_2_1_20_1","doi-asserted-by":"crossref","unstructured":"Mark Johnson Peter Anderson Mark Dras and Mark Steedman. 2018. Predicting accuracy on large datasets from smaller pilot data. 
In ACL.","DOI":"10.18653\/v1\/P18-2072"},{"key":"e_1_3_2_1_21_1","volume-title":"Adam: A method for stochastic optimization. In ICLR.","author":"Kingma Diederik","year":"2015","unstructured":"Diederik Kingma and Jimmy Ba. 2015. Adam: A method for stochastic optimization. In ICLR."},{"key":"e_1_3_2_1_22_1","unstructured":"Prasanth Kolachina Nicola Cancedda Marc Dymetman and Sriram Venkatapathy. 2012. Prediction of learning curves in machine translation. In ACL."},{"key":"e_1_3_2_1_23_1","doi-asserted-by":"crossref","unstructured":"David\u00a0D. Lewis. 1998. The independence assumption in information retrieval. In ECML.","DOI":"10.1007\/BFb0026666"},{"key":"e_1_3_2_1_24_1","doi-asserted-by":"crossref","unstructured":"Trond Linjordet and Krisztian Balog. 2019. Impact of Training Dataset Size on Neural Answer Selection Models. In ECIR.","DOI":"10.1007\/978-3-030-15712-8_59"},{"key":"e_1_3_2_1_25_1","unstructured":"Tomas Mikolov Ilya Sutskever Kai Chen Greg\u00a0S. Corrado and Jeff Dean. 2013. Distributed Representations of Words and Phrases and their Compositionality. In NeurIPS."},{"key":"e_1_3_2_1_26_1","unstructured":"Sheeba Naz Aditi Sharan and Nidhi Malik. 2018. Sentiment classification on twitter data using support vector machine. In WI."},{"key":"e_1_3_2_1_27_1","doi-asserted-by":"crossref","unstructured":"Dat\u00a0Quoc Nguyen Thanh Vu and Anh\u00a0Tuan Nguyen. 2020. BERTweet: A pre-trained language model for English Tweets. In EMNLP: System Demonstrations.","DOI":"10.18653\/v1\/2020.emnlp-demos.2"},{"key":"e_1_3_2_1_28_1","doi-asserted-by":"publisher","DOI":"10.18653\/v1\/2020.wnut-1.41"},{"key":"e_1_3_2_1_29_1","volume-title":"Shafiq Joty, Hassan Sajjad, Muhammad Imran, and Prasenjit Mitra.","author":"Nguyen Dat\u00a0Tien","year":"2017","unstructured":"Dat\u00a0Tien Nguyen, Kamela Ali\u00a0Al Mannai, Shafiq Joty, Hassan Sajjad, Muhammad Imran, and Prasenjit Mitra. 2017. 
Robust Classification of Crisis-Related Data on Social Networks Using Convolutional Neural Networks. In ICWSM."},{"key":"e_1_3_2_1_30_1","doi-asserted-by":"crossref","unstructured":"Thi\u00a0Huyen Nguyen Tuan-Anh Hoang and Wolfgang Nejdl. 2019. Efficient Summarizing of Evolving Events from Twitter Streams. In SDM.","DOI":"10.1137\/1.9781611975673.26"},{"key":"e_1_3_2_1_31_1","volume-title":"Crisislex: A lexicon for collecting and filtering microblogged communications in crises. In ICWSM.","author":"Olteanu Alexandra","year":"2014","unstructured":"Alexandra Olteanu, Carlos Castillo, Fernando Diaz, and Sarah Vieweg. 2014. Crisislex: A lexicon for collecting and filtering microblogged communications in crises. In ICWSM."},{"key":"e_1_3_2_1_32_1","volume-title":"The Effect of Dataset Size on Training Tweet Sentiment Classifiers. ICMLA","author":"Prusa Joseph","year":"2015","unstructured":"Joseph Prusa, Taghi\u00a0M. Khoshgoftaar, and Naeem Seliya. 2015. The Effect of Dataset Size on Training Tweet Sentiment Classifiers. ICMLA (2015)."},{"key":"e_1_3_2_1_33_1","doi-asserted-by":"publisher","DOI":"10.1016\/j.ijdrr.2018.03.002"},{"key":"e_1_3_2_1_34_1","doi-asserted-by":"crossref","unstructured":"Takeshi Sakaki Makoto Okazaki and Yutaka Matsuo. 2010. Earthquake shakes Twitter users: real-time event detection by social sensors. In TheWebConf.","DOI":"10.1145\/1772690.1772777"},{"key":"e_1_3_2_1_35_1","doi-asserted-by":"crossref","unstructured":"John Shawe-Taylor Martin Anthony and N.L.Biggs. 1993. Bounding sample size with the Vapnik-Chervonenkis dimension. Discrete Applied Mathematics(1993).","DOI":"10.1016\/0166-218X(93)90179-R"},{"key":"e_1_3_2_1_36_1","doi-asserted-by":"crossref","unstructured":"Bing Xiang and Liang Zhou. 2014. Improving twitter sentiment analysis with topic-based mixture modeling and semi-supervised training. 
In ACL.","DOI":"10.3115\/v1\/P14-2071"},{"key":"e_1_3_2_1_37_1","doi-asserted-by":"publisher","DOI":"10.1145\/3185045"}],"event":{"name":"WI-IAT '21: IEEE\/WIC\/ACM International Conference on Web Intelligence","location":"Melbourne VIC Australia","acronym":"WI-IAT '21","sponsor":["SIGAI ACM Special Interest Group on Artificial Intelligence"]},"container-title":["IEEE\/WIC\/ACM International Conference on Web Intelligence and Intelligent Agent Technology"],"original-title":[],"link":[{"URL":"https:\/\/dl.acm.org\/doi\/10.1145\/3486622.3493960","content-type":"unspecified","content-version":"vor","intended-application":"text-mining"},{"URL":"https:\/\/dl.acm.org\/doi\/pdf\/10.1145\/3486622.3493960","content-type":"unspecified","content-version":"vor","intended-application":"similarity-checking"}],"deposited":{"date-parts":[[2025,6,17]],"date-time":"2025-06-17T20:12:05Z","timestamp":1750191125000},"score":1,"resource":{"primary":{"URL":"https:\/\/dl.acm.org\/doi\/10.1145\/3486622.3493960"}},"subtitle":[],"short-title":[],"issued":{"date-parts":[[2021,12,14]]},"references-count":37,"alternative-id":["10.1145\/3486622.3493960","10.1145\/3486622"],"URL":"https:\/\/doi.org\/10.1145\/3486622.3493960","relation":{},"subject":[],"published":{"date-parts":[[2021,12,14]]},"assertion":[{"value":"2022-04-13","order":2,"name":"published","label":"Published","group":{"name":"publication_history","label":"Publication History"}}]}}