{"status":"ok","message-type":"work","message-version":"1.0.0","message":{"indexed":{"date-parts":[[2026,1,8]],"date-time":"2026-01-08T02:56:34Z","timestamp":1767840994386,"version":"3.49.0"},"reference-count":63,"publisher":"Association for Computing Machinery (ACM)","issue":"3","license":[{"start":{"date-parts":[[2024,7,13]],"date-time":"2024-07-13T00:00:00Z","timestamp":1720828800000},"content-version":"vor","delay-in-days":0,"URL":"https:\/\/www.acm.org\/publications\/policies\/copyright_policy#Background"}],"funder":[{"name":"National Science Foundation","award":["#1934925 and #1934494"],"award-info":[{"award-number":["#1934925 and #1934494"]}]},{"DOI":"10.13039\/100020175","name":"National Collaborative on Gun Violence Research","doi-asserted-by":"crossref","id":[{"id":"10.13039\/100020175","id-type":"DOI","asserted-by":"crossref"}]},{"name":"Massive Data Institute (MDI) at Georgetown University"}],"content-domain":{"domain":["dl.acm.org"],"crossmark-restriction":true},"short-container-title":["ACM Trans. Spatial Algorithms Syst."],"published-print":{"date-parts":[[2024,9,30]]},"abstract":"<jats:p>\n            Accurate estimates of user location are important for many online services, including event detection, disaster management, and determining public opinion. Neural network-based techniques have proven to be highly effective in predicting user location. However, these models typically require a large amount of labeled training data, which can be difficult to obtain in real-world scenarios. In this article, we present two approaches to tackle the issue of limited training data when predicting city level location. First, we consider a self-supervised approach that trains a state-level model without labeled data and then integrate this knowledge into the training dataset used for city-level predictions. 
Second, we explore the option of increasing the number of training examples by utilizing external resources to generate\n            <jats:italic>synthetic users<\/jats:italic>\n            . Finally, we combine these two strategies, exploiting the benefits of both. We empirically evaluate our proposed techniques on multiple Twitter\/X datasets and show that our models perform significantly better than the state-of-the-art with improvements of up to 6% for Acc@161 and 8% for F1 score.\n          <\/jats:p>","DOI":"10.1145\/3673899","type":"journal-article","created":{"date-parts":[[2024,6,19]],"date-time":"2024-06-19T10:52:00Z","timestamp":1718794320000},"page":"1-25","update-policy":"https:\/\/doi.org\/10.1145\/crossmark-policy","source":"Crossref","is-referenced-by-count":3,"title":["Utilizing External Knowledge to Enhance Location Prediction for Twitter\/X Users in Low Resource Settings"],"prefix":"10.1145","volume":"10","author":[{"ORCID":"https:\/\/orcid.org\/0000-0002-1926-444X","authenticated-orcid":false,"given":"Yaguang","family":"Liu","sequence":"first","affiliation":[{"name":"Georgetown University, Washington, United States"}]},{"ORCID":"https:\/\/orcid.org\/0000-0002-8300-2970","authenticated-orcid":false,"given":"Lisa","family":"Singh","sequence":"additional","affiliation":[{"name":"Georgetown University, Washington, United States"}]}],"member":"320","published-online":{"date-parts":[[2024,7,13]]},"reference":[{"key":"e_1_3_2_2_2","doi-asserted-by":"publisher","DOI":"10.1007\/978-3-319-76941-7_11"},{"key":"e_1_3_2_3_2","volume-title":"ICWSM","author":"Zamal F. Al","year":"2012","unstructured":"F. Al Zamal, W. Liu, and D. Ruths. 2012. Homophily and latent attribute inference: Inferring latent attributes of Twitter users from neighbors. 
In ICWSM."},{"key":"e_1_3_2_4_2","doi-asserted-by":"publisher","DOI":"10.1145\/1008992.1009040"},{"key":"e_1_3_2_5_2","article-title":"Inferring the location of authors from words in their texts","author":"Berggren M.","year":"2016","unstructured":"M. Berggren, J. Karlgren, R. \u00d6stling, and M. Parkvall. 2016. Inferring the location of authors from words in their texts. arXiv preprint arXiv:1612.06671 (2016).","journal-title":"arXiv preprint arXiv:1612.06671"},{"key":"e_1_3_2_6_2","doi-asserted-by":"publisher","DOI":"10.3115\/1119394.1119403"},{"key":"e_1_3_2_7_2","doi-asserted-by":"crossref","unstructured":"L. Bode P. Davis-Kean L. Singh T. Berger-Wolf C. Budak G. Chi A. Guess J. Hill A. Hughes J. Jensen et\u00a0al. 2020. Study designs for quantitative social science research using social media. PsyArXiv (2020).","DOI":"10.31234\/osf.io\/zp8q2"},{"key":"e_1_3_2_8_2","article-title":"Enriching word vectors with subword information","author":"Bojanowski P.","year":"2017","unstructured":"P. Bojanowski, E. Grave, A. Joulin, and T. Mikolov. 2017. Enriching word vectors with subword information. Trans. Assoc. Computat. Ling. 5 (2017), 135\u2013146.","journal-title":"Trans. Assoc. Computat. Ling."},{"key":"e_1_3_2_9_2","unstructured":"O. Buyukokkten J. Cho H. Garcia-Molina L. Gravano and N. Shivakumar. 1999. Exploiting geographical location information of web pages. In WebDB (1999)."},{"key":"e_1_3_2_10_2","doi-asserted-by":"publisher","DOI":"10.1016\/j.ssci.2016.04.002"},{"key":"e_1_3_2_11_2","volume-title":"ICML","author":"Chen T.","year":"2020","unstructured":"T. Chen, S. Kornblith, M. Norouzi, and G. Hinton. 2020. A simple framework for contrastive learning of visual representations. In ICML."},{"key":"e_1_3_2_12_2","volume-title":"ICWSM","author":"Chen X.","year":"2015","unstructured":"X. Chen, Y. Wang, E. Agichtein, and F. Wang. 2015. A comparative study of demographic attribute inference in Twitter. 
In ICWSM."},{"key":"e_1_3_2_13_2","doi-asserted-by":"publisher","DOI":"10.1145\/1871437.1871535"},{"key":"e_1_3_2_14_2","doi-asserted-by":"publisher","DOI":"10.1145\/2414425.2414427"},{"key":"e_1_3_2_15_2","article-title":"On the properties of neural machine translation: Encoder-decoder approaches","author":"Cho K.","year":"2014","unstructured":"K. Cho, B. Van Merri\u00ebnboer, D. Bahdanau, and Y. Bengio. 2014. On the properties of neural machine translation: Encoder-decoder approaches. arXiv preprint arXiv:1409.1259 (2014).","journal-title":"arXiv preprint arXiv:1409.1259"},{"key":"e_1_3_2_16_2","article-title":"Learning phrase representations using RNN encoder-decoder for statistical machine translation","author":"Cho K.","year":"2014","unstructured":"K. Cho, B. Van Merri\u00ebnboer, C. Gulcehre, D. Bahdanau, F. Bougares, H. Schwenk, and Y. Bengio. 2014. Learning phrase representations using RNN encoder-decoder for statistical machine translation. arXiv preprint arXiv:1406.1078 (2014).","journal-title":"arXiv preprint arXiv:1406.1078"},{"key":"e_1_3_2_17_2","doi-asserted-by":"publisher","DOI":"10.1609\/aaai.v29i1.9204"},{"key":"e_1_3_2_18_2","volume-title":"CoNLL","author":"Ebrahimi M.","year":"2018","unstructured":"M. Ebrahimi, E. ShafieiBavani, R. Wong, and F. Chen. 2018. A unified neural network model for geolocating Twitter users. In CoNLL."},{"key":"e_1_3_2_19_2","volume-title":"EMNLP","author":"Eisenstein J.","year":"2010","unstructured":"J. Eisenstein, B. O\u2019Connor, N. Smith, and E. Xing. 2010. A latent variable model for geographic lexical variation. In EMNLP."},{"key":"e_1_3_2_20_2","doi-asserted-by":"publisher","DOI":"10.1145\/3459637.3481949"},{"key":"e_1_3_2_21_2","doi-asserted-by":"publisher","DOI":"10.3390\/e19070330"},{"key":"e_1_3_2_22_2","volume-title":"ICWSM","author":"Fink C.","year":"2012","unstructured":"C. Fink, J. Kopecky, and M. Morawski. 2012. Inferring gender from the content of tweets: A region specific example. 
In ICWSM."},{"key":"e_1_3_2_23_2","article-title":"Unsupervised representation learning by predicting image rotations","author":"Gidaris Spyros","year":"2018","unstructured":"Spyros Gidaris, Praveer Singh, and Nikos Komodakis. 2018. Unsupervised representation learning by predicting image rotations. arXiv preprint arXiv:1803.07728 (2018).","journal-title":"arXiv preprint arXiv:1803.07728"},{"key":"e_1_3_2_24_2","volume-title":"COLING","author":"Han B.","year":"2012","unstructured":"B. Han, P. Cook, and T. Baldwin. 2012. Geolocation prediction in social media data by finding location indicative words. In COLING."},{"key":"e_1_3_2_25_2","volume-title":"ACL: System Demonstrations","author":"Han B.","year":"2013","unstructured":"B. Han, P. Cook, and T. Baldwin. 2013. A stacking-based approach to Twitter user geolocation prediction. In ACL: System Demonstrations."},{"key":"e_1_3_2_26_2","volume-title":"WNUT","author":"Han B.","year":"2016","unstructured":"B. Han, A. Rahimi, L. Derczynski, and T. Baldwin. 2016. Twitter geolocation prediction shared task of the 2016 workshop on noisy user-generated text. In WNUT."},{"key":"e_1_3_2_27_2","doi-asserted-by":"publisher","DOI":"10.1109\/SLT.2018.8639619"},{"key":"e_1_3_2_28_2","volume-title":"SIGCHI","author":"Hecht B.","year":"2011","unstructured":"B. Hecht, L. Hong, B. Suh, and E. Chi. 2011. Tweets from Justin Bieber\u2019s heart: The dynamics of the location field in user profiles. In SIGCHI."},{"key":"e_1_3_2_29_2","doi-asserted-by":"publisher","DOI":"10.1371\/journal.pone.0207112"},{"key":"e_1_3_2_30_2","doi-asserted-by":"publisher","DOI":"10.1007\/978-3-319-60240-0_34"},{"key":"e_1_3_2_31_2","article-title":"A hierarchical location prediction neural network for Twitter user geolocation","author":"Huang B.","year":"2019","unstructured":"B. Huang and K. Carley. 2019. A hierarchical location prediction neural network for Twitter user geolocation. 
arXiv preprint arXiv:1910.12941 (2019).","journal-title":"arXiv preprint arXiv:1910.12941"},{"key":"e_1_3_2_32_2","volume-title":"ICWSM","author":"Jurgens D.","year":"2013","unstructured":"D. Jurgens. 2013. That\u2019s what friends are for: Inferring location in online social media platforms based on social relationships. In ICWSM."},{"key":"e_1_3_2_33_2","article-title":"AEDA: An easier data augmentation technique for text classification","author":"Karimi A.","year":"2021","unstructured":"A. Karimi, L. Rossi, and A. Prati. 2021. AEDA: An easier data augmentation technique for text classification. arXiv preprint arXiv:2108.13230 (2021).","journal-title":"arXiv preprint arXiv:2108.13230"},{"key":"e_1_3_2_34_2","article-title":"Adam: A method for stochastic optimization","author":"Kingma D.","year":"2014","unstructured":"D. Kingma and J. Ba. 2014. Adam: A method for stochastic optimization. arXiv preprint arXiv:1412.6980 (2014).","journal-title":"arXiv preprint arXiv:1412.6980"},{"key":"e_1_3_2_35_2","doi-asserted-by":"publisher","DOI":"10.1007\/978-3-319-18818-8_12"},{"key":"e_1_3_2_36_2","article-title":"RoBERTa: A robustly optimized BERT pretraining approach","author":"Liu Y.","year":"2019","unstructured":"Y. Liu, M. Ott, N. Goyal, J. Du, M. Joshi, D. Chen, O. Levy, M. Lewis, L. Zettlemoyer, and V. Stoyanov. 2019. RoBERTa: A robustly optimized BERT pretraining approach. arXiv preprint arXiv:1907.11692 (2019).","journal-title":"arXiv preprint arXiv:1907.11692"},{"key":"e_1_3_2_37_2","doi-asserted-by":"publisher","DOI":"10.1145\/3459637.3482055"},{"key":"e_1_3_2_38_2","doi-asserted-by":"publisher","DOI":"10.1145\/3539597.3570462"},{"key":"e_1_3_2_39_2","volume-title":"DeLTA","author":"Liu Y.","year":"2021","unstructured":"Y. Liu, L. Singh, and Z. Mneimneh. 2021. A comparative analysis of classic and deep learning models for inferring gender and age of Twitter users. 
In DeLTA."},{"key":"e_1_3_2_40_2","volume-title":"ICWSM","author":"Mahmud J.","year":"2012","unstructured":"J. Mahmud, J. Nichols, and C. Drews. 2012. Where is this tweet from? Inferring home locations of Twitter users. In ICWSM."},{"key":"e_1_3_2_41_2","doi-asserted-by":"publisher","DOI":"10.1109\/IIPHDW.2018.8388338"},{"key":"e_1_3_2_42_2","volume-title":"WNUT","author":"Miura Y.","year":"2016","unstructured":"Y. Miura, M. Taniguchi, T. Taniguchi, and T. Ohkuma. 2016. A simple scalable neural networks based model for geolocation prediction in Twitter. In WNUT."},{"key":"e_1_3_2_43_2","doi-asserted-by":"publisher","DOI":"10.18653\/v1\/P17-1116"},{"key":"e_1_3_2_44_2","article-title":"How transferable are neural networks in NLP applications?","author":"Mou L.","year":"2016","unstructured":"L. Mou, Z. Meng, R. Yan, G. Li, Y. Xu, L. Zhang, and Z. Jin. 2016. How transferable are neural networks in NLP applications? arXiv preprint arXiv:1603.06111 (2016).","journal-title":"arXiv preprint arXiv:1603.06111"},{"key":"e_1_3_2_45_2","volume-title":"ICWSM","author":"Nguyen D.","year":"2013","unstructured":"D. Nguyen, R. Gravel, D. Trieschnigg, and T. Meder. 2013. \u201cHow old do you think I am?\u201d A study of language and age in Twitter. In ICWSM."},{"key":"e_1_3_2_46_2","doi-asserted-by":"publisher","DOI":"10.3115\/v1\/D14-1162"},{"key":"e_1_3_2_47_2","article-title":"The effectiveness of data augmentation in image classification using deep learning","author":"Perez L.","year":"2017","unstructured":"L. Perez and J. Wang. 2017. The effectiveness of data augmentation in image classification using deep learning. arXiv preprint arXiv:1712.04621 (2017).","journal-title":"arXiv preprint arXiv:1712.04621"},{"key":"e_1_3_2_48_2","unstructured":"A. Radford J. Kim C. Hallacy A. Ramesh G. Goh S. Agarwal G. Sastry A. Askell P. Mishkin J. Clark G. Krueger and I. Sutskever. 2021. Learning transferable visual models from natural language supervision. 
arXiv preprint arXiv:2103.00020 (2021)."},{"key":"e_1_3_2_49_2","article-title":"Semi-supervised user geolocation via graph convolutional networks","author":"Rahimi A.","year":"2018","unstructured":"A. Rahimi, T. Cohn, and T. Baldwin. 2018. Semi-supervised user geolocation via graph convolutional networks. arXiv preprint arXiv:1804.08049 (2018).","journal-title":"arXiv preprint arXiv:1804.08049"},{"key":"e_1_3_2_50_2","volume-title":"NIPS MLSN Workshop","year":"2010","unstructured":"D. Rao, D. Yarowsky, A. Shreevats, and M. Gupta. 2010. Classifying latent user attributes in Twitter. In NIPS MLSN Workshop."},{"key":"e_1_3_2_51_2","volume-title":"EMNLP","author":"Roller S.","year":"2012","unstructured":"S. Roller, M. Speriosu, S. Rallapalli, B. Wing, and J. Baldridge. 2012. Supervised text-based geolocation using language models on an adaptive grid. In EMNLP."},{"key":"e_1_3_2_52_2","volume-title":"WWW","author":"Ryoo K.","year":"2014","unstructured":"K. Ryoo and S. Moon. 2014. Inferring Twitter user locations with 10 km accuracy. In WWW."},{"key":"e_1_3_2_53_2","doi-asserted-by":"publisher","DOI":"10.3115\/v1\/W14-5408"},{"key":"e_1_3_2_54_2","volume-title":"WWW","author":"Sakaki T.","year":"2010","unstructured":"T. Sakaki, M. Okazaki, and Y. Matsuo. 2010. Earthquake shakes Twitter users: Real-time event detection by social sensors. In WWW."},{"key":"e_1_3_2_55_2","unstructured":"L. Singh M. Traugott L. Bode C. Budak P. Davis-Kean R. Guha J. Ladd Z. Mneimneh Q. Nguyen J. Pasek T. Raghunathan R. Ryan S. Soroka and L. Wahedi. 2020. Data blending: Haven\u2019t we been doing this for years? Georgetown Massive Data Institute Report (2020). https:\/\/live-guwordpressmccourt.pantheonsite.io\/wpcontent\/uploads\/2020\/05\/MDI-Data-Blending-White-Paper-April2020.pdf"},{"key":"e_1_3_2_56_2","doi-asserted-by":"publisher","DOI":"10.1371\/journal.pone.0115545"},{"key":"e_1_3_2_57_2","volume-title":"Twitter Decahose","year":"2023","unstructured":"Twitter. 2023. Twitter Decahose. 
Retrieved from https:\/\/developer.twitter.com\/en\/docs\/twitter-api\/enterprise\/decahose-api\/overview\/decahose"},{"key":"e_1_3_2_58_2","doi-asserted-by":"publisher","DOI":"10.18653\/v1\/P17-2076"},{"key":"e_1_3_2_59_2","article-title":"EDA: Easy data augmentation techniques for boosting performance on text classification tasks","author":"Wei J.","year":"2019","unstructured":"J. Wei and K. Zou. 2019. EDA: Easy data augmentation techniques for boosting performance on text classification tasks. arXiv preprint arXiv:1901.11196 (2019).","journal-title":"arXiv preprint arXiv:1901.11196"},{"key":"e_1_3_2_60_2","unstructured":"Wikipedia. 2023. Houston, Texas. Retrieved from https:\/\/en.wikipedia.org\/wiki\/Houston"},{"key":"e_1_3_2_61_2","doi-asserted-by":"publisher","DOI":"10.3115\/v1\/D14-1039"},{"key":"e_1_3_2_62_2","doi-asserted-by":"publisher","DOI":"10.18653\/v1\/W18-1114"},{"key":"e_1_3_2_63_2","doi-asserted-by":"publisher","DOI":"10.18653\/v1\/N16-1174"},{"key":"e_1_3_2_64_2","article-title":"A survey of location prediction on Twitter","author":"Zheng X.","year":"2018","unstructured":"X. Zheng, J. Han, and A. Sun. 2018. A survey of location prediction on Twitter. IEEE Trans. Knowl. Data Eng. 30, 9 (2018), 1652\u20131671.","journal-title":"IEEE Trans. Knowl. 
Data Eng."}],"container-title":["ACM Transactions on Spatial Algorithms and Systems"],"original-title":[],"language":"en","link":[{"URL":"https:\/\/dl.acm.org\/doi\/10.1145\/3673899","content-type":"unspecified","content-version":"vor","intended-application":"text-mining"},{"URL":"https:\/\/dl.acm.org\/doi\/pdf\/10.1145\/3673899","content-type":"unspecified","content-version":"vor","intended-application":"similarity-checking"}],"deposited":{"date-parts":[[2025,6,19]],"date-time":"2025-06-19T00:58:23Z","timestamp":1750294703000},"score":1,"resource":{"primary":{"URL":"https:\/\/dl.acm.org\/doi\/10.1145\/3673899"}},"subtitle":[],"short-title":[],"issued":{"date-parts":[[2024,7,13]]},"references-count":63,"journal-issue":{"issue":"3","published-print":{"date-parts":[[2024,9,30]]}},"alternative-id":["10.1145\/3673899"],"URL":"https:\/\/doi.org\/10.1145\/3673899","relation":{},"ISSN":["2374-0353","2374-0361"],"issn-type":[{"value":"2374-0353","type":"print"},{"value":"2374-0361","type":"electronic"}],"subject":[],"published":{"date-parts":[[2024,7,13]]},"assertion":[{"value":"2023-05-16","order":0,"name":"received","label":"Received","group":{"name":"publication_history","label":"Publication History"}},{"value":"2024-05-31","order":1,"name":"accepted","label":"Accepted","group":{"name":"publication_history","label":"Publication History"}},{"value":"2024-07-13","order":2,"name":"published","label":"Published","group":{"name":"publication_history","label":"Publication History"}}]}}