{"status":"ok","message-type":"work","message-version":"1.0.0","message":{"indexed":{"date-parts":[[2026,4,16]],"date-time":"2026-04-16T05:18:50Z","timestamp":1776316730197,"version":"3.50.1"},"reference-count":33,"publisher":"Springer Science and Business Media LLC","issue":"1","license":[{"start":{"date-parts":[[2024,8,14]],"date-time":"2024-08-14T00:00:00Z","timestamp":1723593600000},"content-version":"tdm","delay-in-days":0,"URL":"https:\/\/creativecommons.org\/licenses\/by\/4.0"},{"start":{"date-parts":[[2024,8,14]],"date-time":"2024-08-14T00:00:00Z","timestamp":1723593600000},"content-version":"vor","delay-in-days":0,"URL":"https:\/\/creativecommons.org\/licenses\/by\/4.0"}],"funder":[{"DOI":"10.13039\/100019180","name":"HORIZON EUROPE European Research Council","doi-asserted-by":"publisher","award":["883121"],"award-info":[{"award-number":["883121"]}],"id":[{"id":"10.13039\/100019180","id-type":"DOI","asserted-by":"publisher"}]},{"DOI":"10.13039\/100019180","name":"HORIZON EUROPE European Research Council","doi-asserted-by":"publisher","award":["883121"],"award-info":[{"award-number":["883121"]}],"id":[{"id":"10.13039\/100019180","id-type":"DOI","asserted-by":"publisher"}]},{"DOI":"10.13039\/100019180","name":"HORIZON EUROPE European Research Council","doi-asserted-by":"publisher","award":["883121"],"award-info":[{"award-number":["883121"]}],"id":[{"id":"10.13039\/100019180","id-type":"DOI","asserted-by":"publisher"}]},{"DOI":"10.13039\/501100006447","name":"University of Zurich","doi-asserted-by":"crossref","id":[{"id":"10.13039\/501100006447","id-type":"DOI","asserted-by":"crossref"}]}],"content-domain":{"domain":["link.springer.com"],"crossmark-restriction":false},"short-container-title":["Soc. Netw. Anal. Min."],"abstract":"<jats:title>Abstract<\/jats:title>\n          <jats:p>Twitter data has been widely used by researchers across various social and computer science disciplines. 
A common aim when working with Twitter data is the construction of a random sample of users from a given country. However, while several methods have been proposed in the literature, their comparative performance is mostly unexplored. In this paper, we implement four common methods to create a random sample of Twitter users in the US: <jats:italic>1% Stream<\/jats:italic>, <jats:italic>Bounding Box<\/jats:italic>, <jats:italic>Location Query<\/jats:italic>, and <jats:italic>Language Query<\/jats:italic>. Then, we compare these methods according to their tweet- and user-level metrics as well as their accuracy in estimating the US population. Our results show that users collected by the <jats:italic>1% Stream<\/jats:italic> method tend to have more tweets, tweets per day, followers, and friends, and fewer likes; they also have younger accounts and include more male users than those collected by the other three methods. Moreover, this method achieves the minimum error in estimating the US population. However, the <jats:italic>1% Stream<\/jats:italic> method is time-consuming, cannot be used for past time frames, and is not suitable when user engagement is part of the study. 
In situations where these three drawbacks are important, our results support the <jats:italic>Bounding Box<\/jats:italic> method as the second-best method.<\/jats:p>","DOI":"10.1007\/s13278-024-01327-5","type":"journal-article","created":{"date-parts":[[2024,8,14]],"date-time":"2024-08-14T09:02:16Z","timestamp":1723626136000},"update-policy":"https:\/\/doi.org\/10.1007\/springer_crossmark_policy","source":"Crossref","is-referenced-by-count":3,"title":["Comparing methods for creating a national random sample of twitter users"],"prefix":"10.1007","volume":"14","author":[{"given":"Meysam","family":"Alizadeh","sequence":"first","affiliation":[],"role":[{"role":"author","vocabulary":"crossref"}]},{"given":"Darya","family":"Zare","sequence":"additional","affiliation":[],"role":[{"role":"author","vocabulary":"crossref"}]},{"given":"Zeynab","family":"Samei","sequence":"additional","affiliation":[],"role":[{"role":"author","vocabulary":"crossref"}]},{"given":"Mohammadamin","family":"Alizadeh","sequence":"additional","affiliation":[],"role":[{"role":"author","vocabulary":"crossref"}]},{"given":"Mael","family":"Kubli","sequence":"additional","affiliation":[],"role":[{"role":"author","vocabulary":"crossref"}]},{"given":"Mohammadhadi","family":"Aliahmadi","sequence":"additional","affiliation":[],"role":[{"role":"author","vocabulary":"crossref"}]},{"given":"Sarvenaz","family":"Ebrahimi","sequence":"additional","affiliation":[],"role":[{"role":"author","vocabulary":"crossref"}]},{"given":"Fabrizio","family":"Gilardi","sequence":"additional","affiliation":[],"role":[{"role":"author","vocabulary":"crossref"}]}],"member":"297","published-online":{"date-parts":[[2024,8,14]]},"reference":[{"key":"1327_CR1","doi-asserted-by":"crossref","unstructured":"Alizadeh M, Cioffi-Revilla C (2014). Distributions of opinion and extremist radicalization: insights from agent-based modeling. 
In: Social Informatics: 6th international conference, SocInfo 2014, Barcelona, Spain, November 11\u201313, 2014. proceedings 6. Springer, pp 348\u2013358","DOI":"10.1007\/978-3-319-13734-6_26"},{"issue":"1","key":"1327_CR2","doi-asserted-by":"publisher","first-page":"1","DOI":"10.1140\/epjds\/s13688-019-0193-9","volume":"8","author":"M Alizadeh","year":"2019","unstructured":"Alizadeh M, Weber I, Cioffi-Revilla C, Fortunato S, Macy M (2019) Psychology and morality of political extremists: evidence from Twitter language analysis of alt-right and Antifa. EPJ Data Sci 8(1):1\u201335","journal-title":"EPJ Data Sci"},{"key":"1327_CR3","doi-asserted-by":"crossref","unstructured":"Alizadeh M, Lewis M, Zarandi MHF, Jolai F (2011) Determining significant parameters in the design of ANFIS. In: 2011 Annual meeting of the North American fuzzy information processing society. IEEE, pp 1\u20136","DOI":"10.1109\/NAFIPS.2011.5751958"},{"issue":"30","key":"1327_CR4","doi-asserted-by":"publisher","first-page":"eabb5824","DOI":"10.1126\/sciadv.abb5824","volume":"6","author":"M Alizadeh","year":"2020","unstructured":"Alizadeh M, Shapiro JN, Buntain C, Tucker JA (2020) Content-based features predict social media influence operations. Sci Adv 6(30):eabb5824","journal-title":"Sci Adv"},{"key":"1327_CR5","unstructured":"Alizadeh M, Kubli M, Samei Z, Dehghani S, Bermeo J.\u00a0D, Korobeynikova M, Gilardi F (2023) Open-source large language models outperform crowd workers and approach ChatGPT in text-annotation tasks. arXiv:2307.02179"},{"issue":"4","key":"1327_CR6","doi-asserted-by":"publisher","first-page":"883","DOI":"10.1017\/S0003055419000352","volume":"113","author":"P Barber\u00e1","year":"2019","unstructured":"Barber\u00e1 P, Casas A, Nagler J, Egan PJ, Bonneau R, Jost JT, Tucker JA (2019) Who leads? Who follows? Measuring issue attention and agenda setting by legislators and the mass public using social media data. 
Am Polit Sci Rev 113(4):883\u2013901","journal-title":"Am Polit Sci Rev"},{"key":"1327_CR7","first-page":"1","volume":"1","author":"C Barrie","year":"2021","unstructured":"Barrie C, Siegel AA (2021) Kingdom of trolls? Influence operations in the Saudi Twittersphere. J Quantitat Descr 1:1\u201341","journal-title":"J Quantitat Descr"},{"issue":"4","key":"1327_CR8","doi-asserted-by":"publisher","first-page":"315","DOI":"10.1007\/s41060-021-00298-6","volume":"13","author":"V Batzdorfer","year":"2022","unstructured":"Batzdorfer V, Steinmetz H, Biella M, Alizadeh M (2022) Conspiracy theories on Twitter: emerging motifs and temporal dynamics during the COVID-19 pandemic. Int J Data Sci Anal 13(4):315\u2013333","journal-title":"Int J Data Sci Anal"},{"issue":"2","key":"1327_CR9","doi-asserted-by":"publisher","first-page":"388","DOI":"10.5117\/CCR2022.2.002.BOES","volume":"4","author":"L Boeschoten","year":"2022","unstructured":"Boeschoten L, Ausloos J, M\u00f6ller JE, Araujo T, Oberski DL (2022) A framework for privacy preserving digital trace data collection through data donation. Comput Commun Res 4(2):388\u2013423","journal-title":"Comput Commun Res"},{"key":"1327_CR10","unstructured":"Cerina R, Duch R (2023) Artificially intelligent opinion polling. arXiv:2309.06029"},{"key":"1327_CR11","doi-asserted-by":"publisher","first-page":"34","DOI":"10.1609\/icwsm.v4i1.14024","volume":"4","author":"M De Choudhury","year":"2010","unstructured":"De Choudhury M, Lin Y-R, Sundaram H, Candan KS, Xie L, Kelliher A (2010) How does the data sampling strategy impact the discovery of information diffusion in social media? 
Proc Int AAAI Conf Web Soc Media 4:34\u201341","journal-title":"Proc Int AAAI Conf Web Soc Media"},{"issue":"6","key":"1327_CR12","doi-asserted-by":"publisher","first-page":"649","DOI":"10.1177\/0894439313493979","volume":"31","author":"D Gayo-Avello","year":"2013","unstructured":"Gayo-Avello D (2013) A meta-analysis of state-of-the-art electoral prediction from Twitter data. Soc Sci Comput Rev 31(6):649\u2013679","journal-title":"Soc Sci Comput Rev"},{"key":"1327_CR13","doi-asserted-by":"publisher","first-page":"16","DOI":"10.1016\/j.socnet.2014.01.004","volume":"38","author":"S Gonz\u00e1lez-Bail\u00f3n","year":"2014","unstructured":"Gonz\u00e1lez-Bail\u00f3n S, Wang N, Rivero A, Borge-Holthoefer J, Moreno Y (2014) Assessing the bias in samples of large online networks. Soc Netw 38:16\u201327","journal-title":"Soc Netw"},{"key":"1327_CR14","doi-asserted-by":"publisher","first-page":"175","DOI":"10.1016\/j.ijinfomgt.2019.01.019","volume":"48","author":"A Hino","year":"2019","unstructured":"Hino A, Fahey RA (2019) Representing the Twittersphere: archiving a representative sample of Twitter data under resource constraints. Int J Inf Manag 48:175\u2013184","journal-title":"Int J Inf Manag"},{"key":"1327_CR15","doi-asserted-by":"crossref","unstructured":"Joseph K, Landwehr PM, Carley, KM (2014) Two 1% s don\u2019t make a whole: Comparing simultaneous samples from Twitter\u2019s streaming API. In: International conference on social computing, behavioral-cultural modeling, and prediction. Springer, pp 75\u201383","DOI":"10.1007\/978-3-319-05579-4_10"},{"key":"1327_CR16","doi-asserted-by":"crossref","unstructured":"Jungherr A, J\u00fcrgens P, Schoen H (2012) Why the pirate party won the german election of 2009 or the trouble with predictions: a response to tumasjan, a., sprenger, to, sander, pg, & welpe, im \u201cpredicting elections with twitter: What 140 characters reveal about political sentiment. 
Soc Sci Comput Rev 30(2):229\u2013234","DOI":"10.1177\/0894439311404119"},{"issue":"2","key":"1327_CR17","doi-asserted-by":"publisher","first-page":"205630511877283","DOI":"10.1177\/2056305118772836","volume":"4","author":"H Kim","year":"2018","unstructured":"Kim H, Jang SM, Kim S-H, Wan A (2018) Evaluating sampling methods for content analysis of Twitter data. Soc Media Soc 4(2):2056305118772836","journal-title":"Soc Media Soc"},{"issue":"4","key":"1327_CR18","doi-asserted-by":"publisher","first-page":"971","DOI":"10.1111\/ajps.12291","volume":"61","author":"G King","year":"2017","unstructured":"King G, Lam P, Roberts ME (2017) Computer-assisted keyword and document set discovery from unstructured text. Am J Polit Sci 61(4):971\u2013988","journal-title":"Am J Polit Sci"},{"key":"1327_CR19","doi-asserted-by":"publisher","first-page":"400","DOI":"10.1609\/icwsm.v7i1.14401","volume":"7","author":"F Morstatter","year":"2013","unstructured":"Morstatter F, Pfeffer J, Liu H, Carley K (2013) Is the sample good enough? comparing data from twitter\u2019s streaming api with twitter\u2019s firehose. Proc Int AAAI Conf Web Soc Media 7:400\u2013408","journal-title":"Proc Int AAAI Conf Web Soc Media"},{"key":"1327_CR20","doi-asserted-by":"crossref","unstructured":"Mosleh M, Rand DG (2024) Who is on Twitter (\u201cX\u201d)? Identifying demographic of Twitter users","DOI":"10.31235\/osf.io\/wxfcz"},{"issue":"1","key":"1327_CR21","doi-asserted-by":"publisher","first-page":"107","DOI":"10.1017\/S0007123420000198","volume":"52","author":"K Munger","year":"2022","unstructured":"Munger K, Egan PJ, Nagler J, Ronen J, Tucker J (2022) Political knowledge and misinformation in the era of social media: evidence from the 2015 UK election. 
Br J Polit Sci 52(1):107\u2013127","journal-title":"Br J Polit Sci"},{"issue":"1","key":"1327_CR22","doi-asserted-by":"publisher","first-page":"50","DOI":"10.1140\/epjds\/s13688-018-0178-0","volume":"7","author":"J Pfeffer","year":"2018","unstructured":"Pfeffer J, Mayer K, Morstatter F (2018) Tampering with Twitter\u2019s sample API. EPJ Data Sci 7(1):50","journal-title":"EPJ Data Sci"},{"key":"1327_CR23","unstructured":"Pointer D (2023) System design interview: scalable unique ID generator (twitter snowflake or a similar service). Accessed: 2023-02-08"},{"issue":"4","key":"1327_CR24","doi-asserted-by":"publisher","DOI":"10.1371\/journal.pone.0196087","volume":"13","author":"C Shao","year":"2018","unstructured":"Shao C, Hui P-M, Wang L, Jiang X, Flammini A, Menczer F, Ciampaglia GL (2018) Anatomy of an online misinformation network. PLoS ONE 13(4):e0196087","journal-title":"PLoS ONE"},{"issue":"1","key":"1327_CR25","doi-asserted-by":"publisher","first-page":"10","DOI":"10.1140\/epjds\/s13688-024-00450-9","volume":"13","author":"BT Truong","year":"2024","unstructured":"Truong BT, Allen OM, Menczer F (2024) Account credibility inference based on news-sharing networks. EPJ Data Sci 13(1):10","journal-title":"EPJ Data Sci"},{"key":"1327_CR26","doi-asserted-by":"crossref","unstructured":"Tufekci Z (2014) Big questions for social media big data: representativeness, validity and other methodological pitfalls. In: Eighth international AAAI conference on weblogs and social media","DOI":"10.1609\/icwsm.v8i1.14517"},{"issue":"3","key":"1327_CR27","doi-asserted-by":"publisher","first-page":"980","DOI":"10.1016\/j.ijforecast.2014.06.001","volume":"31","author":"W Wang","year":"2015","unstructured":"Wang W, Rothschild D, Goel S, Gelman A (2015) Forecasting elections with non-representative polls. 
Int J Forecast 31(3):980\u2013991","journal-title":"Int J Forecast"},{"issue":"3","key":"1327_CR28","doi-asserted-by":"publisher","first-page":"1","DOI":"10.1145\/2746366","volume":"9","author":"Y Wang","year":"2015","unstructured":"Wang Y, Callan J, Zheng B (2015) Should we use the sample? Analyzing datasets sampled from Twitter\u2019s stream API. ACM Trans Web (TWEB) 9(3):1\u201323","journal-title":"ACM Trans Web (TWEB)"},{"key":"1327_CR29","doi-asserted-by":"crossref","unstructured":"Wang Z, Hale S, Adelani DI, Grabowicz P, Hartman T, Fl\u00f6ck F, Jurgens D (2019) Demographic inference and representative population estimates from multilingual social media data. In: The world wide web conference, pp 2056\u20132067","DOI":"10.1145\/3308558.3313684"},{"key":"1327_CR30","doi-asserted-by":"publisher","first-page":"715","DOI":"10.1609\/icwsm.v14i1.7337","volume":"14","author":"S Wu","year":"2020","unstructured":"Wu S, Rizoiu M-A, Xie L (2020) Variation across scales: measurement fidelity under twitter data sampling. Proc Int AAAI Conf Web Soc Media 14:715\u2013725","journal-title":"Proc Int AAAI Conf Web Soc Media"},{"key":"1327_CR31","doi-asserted-by":"crossref","unstructured":"Yang K-C, Ferrara E, Menczer F (2022) Botometer 101: social bot practicum for computational social scientists. J Computat Soc Sci 1\u201318","DOI":"10.1007\/s42001-022-00177-5"},{"key":"1327_CR32","doi-asserted-by":"publisher","DOI":"10.7717\/peerj-cs.1025","volume":"8","author":"K-C Yang","year":"2022","unstructured":"Yang K-C, Hui P-M, Menczer F (2022) How Twitter data sampling biases US voter behavior characterizations. 
PeerJ Comput Sci 8:e1025","journal-title":"PeerJ Comput Sci"},{"issue":"3","key":"1327_CR33","doi-asserted-by":"publisher","first-page":"1","DOI":"10.1145\/2743023","volume":"9","author":"MB Zafar","year":"2015","unstructured":"Zafar MB, Bhattacharya P, Ganguly N, Gummadi KP, Ghosh S (2015) Sampling content from online social networks: comparing random vs expert sampling of the twitter stream. ACM Trans Web (TWEB) 9(3):1\u201333","journal-title":"ACM Trans Web (TWEB)"}],"container-title":["Social Network Analysis and Mining"],"original-title":[],"language":"en","link":[{"URL":"https:\/\/link.springer.com\/content\/pdf\/10.1007\/s13278-024-01327-5.pdf","content-type":"application\/pdf","content-version":"vor","intended-application":"text-mining"},{"URL":"https:\/\/link.springer.com\/article\/10.1007\/s13278-024-01327-5\/fulltext.html","content-type":"text\/html","content-version":"vor","intended-application":"text-mining"},{"URL":"https:\/\/link.springer.com\/content\/pdf\/10.1007\/s13278-024-01327-5.pdf","content-type":"application\/pdf","content-version":"vor","intended-application":"similarity-checking"}],"deposited":{"date-parts":[[2025,2,25]],"date-time":"2025-02-25T14:33:05Z","timestamp":1740493985000},"score":1,"resource":{"primary":{"URL":"https:\/\/link.springer.com\/10.1007\/s13278-024-01327-5"}},"subtitle":[],"short-title":[],"issued":{"date-parts":[[2024,8,14]]},"references-count":33,"journal-issue":{"issue":"1","published-online":{"date-parts":[[2024,12]]}},"alternative-id":["1327"],"URL":"https:\/\/doi.org\/10.1007\/s13278-024-01327-5","relation":{},"ISSN":["1869-5469"],"issn-type":[{"value":"1869-5469","type":"electronic"}],"subject":[],"published":{"date-parts":[[2024,8,14]]},"assertion":[{"value":"17 March 2024","order":1,"name":"received","label":"Received","group":{"name":"ArticleHistory","label":"Article History"}},{"value":"28 July 2024","order":2,"name":"revised","label":"Revised","group":{"name":"ArticleHistory","label":"Article 
History"}},{"value":"29 July 2024","order":3,"name":"accepted","label":"Accepted","group":{"name":"ArticleHistory","label":"Article History"}},{"value":"14 August 2024","order":4,"name":"first_online","label":"First Online","group":{"name":"ArticleHistory","label":"Article History"}},{"order":1,"name":"Ethics","group":{"name":"EthicsHeading","label":"Declarations"}},{"value":"The authors declare no conflict of interest.","order":2,"name":"Ethics","group":{"name":"EthicsHeading","label":"Conflicts of interest"}}],"article-number":"160"}}