{"status":"ok","message-type":"work","message-version":"1.0.0","message":{"indexed":{"date-parts":[[2025,9,27]],"date-time":"2025-09-27T13:51:03Z","timestamp":1758981063164,"version":"3.37.3"},"reference-count":30,"publisher":"Springer Science and Business Media LLC","issue":"1","license":[{"start":{"date-parts":[[2022,2,19]],"date-time":"2022-02-19T00:00:00Z","timestamp":1645228800000},"content-version":"tdm","delay-in-days":0,"URL":"https:\/\/creativecommons.org\/licenses\/by\/4.0"},{"start":{"date-parts":[[2022,2,19]],"date-time":"2022-02-19T00:00:00Z","timestamp":1645228800000},"content-version":"vor","delay-in-days":0,"URL":"https:\/\/creativecommons.org\/licenses\/by\/4.0"}],"content-domain":{"domain":["link.springer.com"],"crossmark-restriction":false},"short-container-title":["EPJ Data Sci."],"published-print":{"date-parts":[[2022,12]]},"abstract":"<jats:title>Abstract<\/jats:title><jats:p>As survey costs continue to rise and response rates decline, researchers are seeking more cost-effective ways to collect, analyze and process social and public opinion data. These issues have created an opportunity and interest in expanding the fit-for-purpose paradigm to include alternate sources such as passively collected sensor data and social media data. However, methods for accessing, sourcing and sampling social media data are just now being developed. In fact, there has been a small but growing body of literature focusing on comparing different Twitter data access methods through either the elaborate firehose or the free Twitter search or streaming APIs. Missing from the literature is a good understanding of how to randomly sample Tweets to produce datasets that are representative of the daily discourse, especially within geographical regions of interest, without requiring a census of all Tweets. This understanding is necessary for producing quality estimates of public opinion from social media sources such as Twitter. To address this gap, we propose and test the Velocity-Based Estimation for Sampling Tweets (VBEST) algorithm for selecting a probability based sample of tweets. We compare the performance of VBEST sample estimates to other methods of accessing Twitter through the Search API on the distribution of total Tweets as well as COVID-19 keyword incidence and frequency and find that the VBEST samples produce consistent and relatively low levels of overall bias compared to common methods of access through the Search API across many experimental conditions.<\/jats:p>","DOI":"10.1140\/epjds\/s13688-022-00321-1","type":"journal-article","created":{"date-parts":[[2022,2,19]],"date-time":"2022-02-19T14:04:27Z","timestamp":1645279467000},"update-policy":"https:\/\/doi.org\/10.1007\/springer_crossmark_policy","source":"Crossref","is-referenced-by-count":3,"title":["Sweet tweets! Evaluating a new approach for probability-based sampling of Twitter"],"prefix":"10.1140","volume":"11","author":[{"ORCID":"https:\/\/orcid.org\/0000-0003-4798-9154","authenticated-orcid":false,"given":"Trent D.","family":"Buskirk","sequence":"first","affiliation":[],"role":[{"role":"author","vocabulary":"crossref"}]},{"given":"Brian P.","family":"Blakely","sequence":"additional","affiliation":[],"role":[{"role":"author","vocabulary":"crossref"}]},{"given":"Adam","family":"Eck","sequence":"additional","affiliation":[],"role":[{"role":"author","vocabulary":"crossref"}]},{"given":"Richard","family":"McGrath","sequence":"additional","affiliation":[],"role":[{"role":"author","vocabulary":"crossref"}]},{"given":"Ravinder","family":"Singh","sequence":"additional","affiliation":[],"role":[{"role":"author","vocabulary":"crossref"}]},{"given":"Youzhi","family":"Yu","sequence":"additional","affiliation":[],"role":[{"role":"author","vocabulary":"crossref"}]}],"member":"297","published-online":{"date-parts":[[2022,2,19]]},"reference":[{"issue":"2","key":"321_CR1","doi-asserted-by":"publisher","first-page":"1","DOI":"10.29115\/SP-2018-0033","volume":"11","author":"ME Berzofsky","year":"2018","unstructured":"Berzofsky ME, McKay T, Hsieh YP, Smith A (2018) Probability-based samples on Twitter: methodology and application. Surv Pract 11(2):1\u201312","journal-title":"Surv Pract"},{"key":"321_CR2","doi-asserted-by":"publisher","first-page":"63","DOI":"10.1002\/9781118976357.ch2","volume-title":"Big data meets survey science: a collection of innovative methods","author":"A Burke-Garcia","year":"2020","unstructured":"Burke-Garcia A, Edwards B, Yan T (2020) The future is now: how surveys can harness social media to address twenty-first century challenges. In: Big data meets survey science: a collection of innovative methods, pp\u00a063\u201397"},{"key":"321_CR3","volume-title":"Statistical models in S","author":"WS Cleveland","year":"1991","unstructured":"Cleveland WS (1991) Local regression models. In: Statistical models in S"},{"key":"321_CR4","doi-asserted-by":"publisher","first-page":"596","DOI":"10.1080\/01621459.1988.10478639","volume":"83","author":"WS Cleveland","year":"1988","unstructured":"Cleveland WS, Devlin SJ (1988) Locally weighted regression: an approach to regression analysis by local fitting. J Am Stat Assoc 83:596\u2013610","journal-title":"J Am Stat Assoc"},{"issue":"4","key":"321_CR5","doi-asserted-by":"publisher","first-page":"489","DOI":"10.1177\/0894439319875692","volume":"39","author":"FG Conrad","year":"2021","unstructured":"Conrad FG, Gagnon-Bartsch JA, Ferg RA, Schober MF, Pasek J, Hou E (2021) Social media as an alternative to surveys of opinions about the economy. Soc Sci Comput Rev 39(4):489\u2013508","journal-title":"Soc Sci Comput Rev"},{"key":"321_CR6","doi-asserted-by":"publisher","first-page":"1325","DOI":"10.1145\/2020408.2020606","volume-title":"Proceedings of the 17th ACM SIGKDD international conference on knowledge discovery and data mining","author":"N Dalvi","year":"2011","unstructured":"Dalvi N, Kumar R, Machanavajjhala A, Rastogi V (2011) Sampling hidden objects using nearest-neighbor oracles. In: Proceedings of the 17th ACM SIGKDD international conference on knowledge discovery and data mining, pp\u00a01325\u20131333"},{"issue":"S1","key":"321_CR7","doi-asserted-by":"publisher","first-page":"213","DOI":"10.1093\/poq\/nfw061","volume":"81","author":"D Dutwin","year":"2017","unstructured":"Dutwin D, Buskirk TD (2017) Apples to oranges or gala versus golden delicious? Comparing data quality of nonprobability Internet samples to low response rate probability samples. Public Opin Q 81(S1):213\u2013239","journal-title":"Public Opin Q"},{"key":"321_CR8","doi-asserted-by":"publisher","unstructured":"Gerlitz C, Rieder B (2013) Mining one percent of Twitter: collections, baselines, sampling. M\/C J 16(2). https:\/\/doi.org\/10.5204\/mcj.620. Accessed 25 May 2021","DOI":"10.5204\/mcj.620"},{"key":"321_CR9","unstructured":"Goepp V, Bouaziz O, Nuel G (2018) Spline regression with automatic knot selection. arXiv preprint. arXiv:1808.01770"},{"issue":"6051","key":"321_CR10","doi-asserted-by":"publisher","first-page":"1878","DOI":"10.1126\/science.1202775","volume":"333","author":"SA Golder","year":"2011","unstructured":"Golder SA, Macy MW (2011) Diurnal and seasonal mood vary with work, sleep, and daylength across diverse cultures. Science 333(6051):1878\u20131881","journal-title":"Science"},{"key":"321_CR11","doi-asserted-by":"publisher","first-page":"175","DOI":"10.1016\/j.ijinfomgt.2019.01.019","volume":"48","author":"A Hino","year":"2019","unstructured":"Hino A, Fahey RA (2019) Representing the Twittersphere: archiving a representative sample of Twitter data under resource constraints. Int J Inf Manag 48:175\u2013184","journal-title":"Int J Inf Manag"},{"key":"321_CR12","doi-asserted-by":"publisher","first-page":"23","DOI":"10.1002\/9781119041702.ch2","volume-title":"Total survey error in practice: improving quality in the era of big data","author":"YP Hsieh","year":"2017","unstructured":"Hsieh YP, Murphy J (2017) Total Twitter error: decomposing public opinion measurement on Twitter from a total survey error perspective. In: Biemer PP, de Leeuw E, Eckman S, Edwards B, Kreuter F, Lyberg LE, Tucker NC, West BT (eds) Total survey error in practice: improving quality in the era of big data. Wiley, Hoboken, pp\u00a023\u201346"},{"issue":"2","key":"321_CR13","volume":"4","author":"H Kim","year":"2018","unstructured":"Kim H, Jang SM, Kim SH, Wan A (2018) Evaluating sampling methods for content analysis of Twitter data. Soc Media Society 4(2):2056305118772836","journal-title":"Soc Media Society"},{"issue":"3","key":"321_CR14","doi-asserted-by":"publisher","DOI":"10.3390\/ijerph17030864","volume":"17","author":"Y Kim","year":"2020","unstructured":"Kim Y, Nordgren R, Emery S (2020) The story of Goldilocks and three Twitter\u2019s APIs: a pilot study on Twitter data sources and disclosure. Int J Environ Res Public Health 17(3):864","journal-title":"Int J Environ Res Public Health"},{"key":"321_CR15","doi-asserted-by":"publisher","DOI":"10.1201\/9780429296284","volume-title":"Sampling: design and analysis","author":"SL Lohr","year":"2019","unstructured":"Lohr SL (2019) Sampling: design and analysis. Chapman & Hall\/CRC, Boca Raton"},{"key":"321_CR16","unstructured":"Mislove A, Lehmann S, Ahn Y, Onnela J, Rosenquist JN (2010). http:\/\/www.ccs.neu.edu\/home\/amislove\/twittermood\/. Accessed 15 May 2021"},{"issue":"4","key":"321_CR17","doi-asserted-by":"publisher","first-page":"1519","DOI":"10.1007\/s13202-017-0427-y","volume":"8","author":"SR Moosavi","year":"2018","unstructured":"Moosavi SR, Qajar J, Riazi M (2018) A comparison of methods for denoising of well test pressure data. J Pet Explor Prod Technol 8(4):1519\u20131534","journal-title":"J Pet Explor Prod Technol"},{"key":"321_CR18","doi-asserted-by":"publisher","first-page":"555","DOI":"10.1145\/2567948.2576952","volume-title":"Proceedings of the 23rd international conference on world wide web","author":"F Morstatter","year":"2014","unstructured":"Morstatter F, Pfeffer J, Liu H (2014) When is it biased? Assessing the representativeness of Twitter\u2019s streaming API. In: Proceedings of the 23rd international conference on world wide web, pp\u00a0555\u2013556"},{"issue":"1","key":"321_CR19","doi-asserted-by":"publisher","DOI":"10.1140\/epjds\/s13688-018-0178-0","volume":"7","author":"J Pfeffer","year":"2018","unstructured":"Pfeffer J, Mayer K, Morstatter F (2018) Tampering with Twitter\u2019s sample API. EPJ Data Sci 7(1):50","journal-title":"EPJ Data Sci"},{"issue":"1","key":"321_CR20","doi-asserted-by":"publisher","first-page":"108","DOI":"10.1177\/0049124119882477","volume":"51","author":"D Schneider","year":"2022","unstructured":"Schneider D, Harknett K (2022) What\u2019s to like? Facebook as a tool for survey data collection. Sociol Methods Res 51(1):108\u2013140","journal-title":"Sociol Methods Res"},{"issue":"1","key":"321_CR21","doi-asserted-by":"publisher","first-page":"180","DOI":"10.1093\/poq\/nfv048","volume":"80","author":"MF Schober","year":"2016","unstructured":"Schober MF, Pasek J, Guggenheim L, Lampe C, Conrad FG (2016) Social media analyses for social measurement. Public Opin Q 80(1):180\u2013211","journal-title":"Public Opin Q"},{"key":"321_CR22","doi-asserted-by":"publisher","first-page":"3510","DOI":"10.1109\/HICSS.2012.493","volume-title":"2012 45th Hawaii international conference on system sciences","author":"C Sibona","year":"2012","unstructured":"Sibona C, Walczak S (2012) Purposive sampling on Twitter: a case study. In: 2012 45th Hawaii international conference on system sciences. IEEE, pp\u00a03510\u20133519"},{"key":"321_CR23","unstructured":"Suzer-Gurtekin ZT, Fu Y, Li C, Lepkowski J, Curtin R (2021) Explaining consumer expectations using big data. Paper presented at the 76th annual American Association of Public Opinion Research conference, May 11\u201314, 2021"},{"issue":"3","key":"321_CR24","doi-asserted-by":"publisher","first-page":"273","DOI":"10.1007\/s10109-005-0007-4","volume":"7","author":"NJ Tate","year":"2005","unstructured":"Tate NJ, Brunsdon C, Charlton M, Fotheringham AS, Jarvis CH (2005) Smoothing\/filtering LiDAR digital surface models. Experiments with loess regression and discrete wavelets. J Geogr Syst 7(3):273\u2013290","journal-title":"J Geogr Syst"},{"key":"321_CR25","doi-asserted-by":"publisher","first-page":"1519","DOI":"10.1145\/2588555.2610517","volume-title":"Proceedings of the 2014 ACM SIGMOD international conference on management of data","author":"S Thirumuruganathan","year":"2014","unstructured":"Thirumuruganathan S, Zhang N, Hristidis V, Das G (2014) Aggregate estimation over a microblog platform. In: Proceedings of the 2014 ACM SIGMOD international conference on management of data, pp\u00a01519\u20131530"},{"key":"321_CR26","doi-asserted-by":"crossref","unstructured":"Tromble R, Storz A, Stockmann D (2017) We don\u2019t know what we don\u2019t know: when and how the use of Twitter\u2019s public APIs biases scientific inference. Available at SSRN 3079927","DOI":"10.2139\/ssrn.3079927"},{"issue":"3","key":"321_CR27","doi-asserted-by":"publisher","first-page":"1","DOI":"10.1145\/2746366","volume":"9","author":"Y Wang","year":"2015","unstructured":"Wang Y, Callan J, Zheng B (2015) Should we use the sample? Analyzing datasets sampled from Twitter\u2019s stream API. ACM Trans Web 9(3):1\u201323","journal-title":"ACM Trans Web"},{"key":"321_CR28","doi-asserted-by":"publisher","first-page":"2056","DOI":"10.1145\/3308558.3313684","volume-title":"The world wide web conference","author":"Z Wang","year":"2019","unstructured":"Wang Z, Hale S, Adelani DI, Grabowicz P, Hartman T, Fl\u00f6ck F, Jurgens D (2019) Demographic inference and representative population estimates from multilingual social media data. In: The world wide web conference, pp\u00a02056\u20132067"},{"issue":"4","key":"321_CR29","doi-asserted-by":"publisher","first-page":"709","DOI":"10.1093\/poq\/nfr020","volume":"75","author":"DS Yeager","year":"2011","unstructured":"Yeager DS, Krosnick JA, Chang L, Javitz HS, Levendusky MS, Simpser A, Wang R (2011) Comparing the accuracy of RDD telephone surveys and Internet surveys conducted with probability and non-probability samples. Public Opin Q 75(4):709\u2013747","journal-title":"Public Opin Q"},{"issue":"3","key":"321_CR30","doi-asserted-by":"publisher","first-page":"327","DOI":"10.1177\/0894439310382512","volume":"29","author":"JJ Zhu","year":"2011","unstructured":"Zhu JJ, Mo Q, Wang F, Lu H (2011) A random digit search (RDS) method for sampling of blogs and other user-generated content. Soc Sci Comput Rev 29(3):327\u2013339","journal-title":"Soc Sci Comput Rev"}],"container-title":["EPJ Data Science"],"original-title":[],"language":"en","link":[{"URL":"https:\/\/link.springer.com\/content\/pdf\/10.1140\/epjds\/s13688-022-00321-1.pdf","content-type":"application\/pdf","content-version":"vor","intended-application":"text-mining"},{"URL":"https:\/\/link.springer.com\/article\/10.1140\/epjds\/s13688-022-00321-1\/fulltext.html","content-type":"text\/html","content-version":"vor","intended-application":"text-mining"},{"URL":"https:\/\/link.springer.com\/content\/pdf\/10.1140\/epjds\/s13688-022-00321-1.pdf","content-type":"application\/pdf","content-version":"vor","intended-application":"similarity-checking"}],"deposited":{"date-parts":[[2022,2,19]],"date-time":"2022-02-19T14:04:34Z","timestamp":1645279474000},"score":1,"resource":{"primary":{"URL":"https:\/\/epjdatascience.springeropen.com\/articles\/10.1140\/epjds\/s13688-022-00321-1"}},"subtitle":[],"short-title":[],"issued":{"date-parts":[[2022,2,19]]},"references-count":30,"journal-issue":{"issue":"1","published-print":{"date-parts":[[2022,12]]}},"alternative-id":["321"],"URL":"https:\/\/doi.org\/10.1140\/epjds\/s13688-022-00321-1","relation":{},"ISSN":["2193-1127"],"issn-type":[{"type":"electronic","value":"2193-1127"}],"subject":[],"published":{"date-parts":[[2022,2,19]]},"assertion":[{"value":"6 June 2021","order":1,"name":"received","label":"Received","group":{"name":"ArticleHistory","label":"Article History"}},{"value":"2 February 2022","order":2,"name":"accepted","label":"Accepted","group":{"name":"ArticleHistory","label":"Article History"}},{"value":"19 February 2022","order":3,"name":"first_online","label":"First Online","group":{"name":"ArticleHistory","label":"Article History"}},{"order":1,"name":"Ethics","group":{"name":"EthicsHeading","label":"Declarations"}},{"value":"The authors declare that they have no competing interests.","order":2,"name":"Ethics","group":{"name":"EthicsHeading","label":"Competing interests"}}],"article-number":"9"}}