{"status":"ok","message-type":"work","message-version":"1.0.0","message":{"indexed":{"date-parts":[[2025,7,30]],"date-time":"2025-07-30T16:30:29Z","timestamp":1753893029658,"version":"3.41.2"},"reference-count":22,"publisher":"Frontiers Media SA","license":[{"start":{"date-parts":[[2022,12,9]],"date-time":"2022-12-09T00:00:00Z","timestamp":1670544000000},"content-version":"vor","delay-in-days":0,"URL":"https:\/\/creativecommons.org\/licenses\/by\/4.0\/"}],"content-domain":{"domain":["frontiersin.org"],"crossmark-restriction":true},"short-container-title":["Front. Artif. Intell."],"abstract":"<jats:p>Point-of-Interests (POIs) represent geographic location by different categories (e.g., touristic places, amenities, or shops) and play a prominent role in several location-based applications. However, the majority of POIs category labels are crowd-sourced by the community, thus often of low quality. In this paper, we introduce the first annotated dataset for the POIs categorical classification task in Vietnamese. A total of 750,000 POIs are collected from WeMap, a Vietnamese digital map. Large-scale hand-labeling is inherently time-consuming and labor-intensive, thus we have proposed a new approach using weak labeling. As a result, our dataset covers 15 categories with 275,000 weak-labeled POIs for training, and 30,000 gold-standard POIs for testing, making it the largest compared to the existing Vietnamese POIs dataset. We empirically conduct POI categorical classification experiments using a strong baseline (BERT-based fine-tuning) on our dataset and find that our approach shows high efficiency and is applicable on a large scale. The proposed baseline gives an F1 score of 90% on the test dataset, and significantly improves the accuracy of WeMap POI data by a margin of 37% (from 56 to 93%).<\/jats:p>","DOI":"10.3389\/frai.2022.1020532","type":"journal-article","created":{"date-parts":[[2022,12,9]],"date-time":"2022-12-09T07:55:47Z","timestamp":1670572547000},"update-policy":"https:\/\/doi.org\/10.3389\/crossmark-policy","source":"Crossref","is-referenced-by-count":1,"title":["Large-scale Vietnamese point-of-interest classification using weak labeling"],"prefix":"10.3389","volume":"5","author":[{"given":"Van Trung","family":"Tran","sequence":"first","affiliation":[],"role":[{"role":"author","vocabulary":"crossref"}]},{"given":"Quang Dao","family":"Le","sequence":"additional","affiliation":[],"role":[{"role":"author","vocabulary":"crossref"}]},{"given":"Bao Son","family":"Pham","sequence":"additional","affiliation":[],"role":[{"role":"author","vocabulary":"crossref"}]},{"given":"Viet Hung","family":"Luu","sequence":"additional","affiliation":[],"role":[{"role":"author","vocabulary":"crossref"}]},{"given":"Quang Hung","family":"Bui","sequence":"additional","affiliation":[],"role":[{"role":"author","vocabulary":"crossref"}]}],"member":"1965","published-online":{"date-parts":[[2022,12,9]]},"reference":[{"key":"B1","doi-asserted-by":"crossref","first-page":"362","DOI":"10.1145\/3299869.3314036","article-title":"\u201cSnorkel drybell: a case study in deploying weak supervision at industrial scale,\u201d","volume-title":"Proceedings of the 2019 International Conference on Management of Data, SIGMOD '19","author":"Bach","year":"2019"},{"key":"B2","doi-asserted-by":"publisher","first-page":"1588","DOI":"10.1080\/13658816.2019.1593422","article-title":"Crowdsourced geospatial data quality: challenges and future directions","volume":"33","author":"Basiri","year":"2019","journal-title":"Int. J. Geograph. Inf. Sci"},{"key":"B3","first-page":"4573","article-title":"\u201cCreating a dataset for named entity recognition in the archaeology domain,\u201d","volume-title":"Proceedings of the 12th Language Resources and Evaluation Conference","author":"Brandsen","year":"2020"},{"key":"B4","first-page":"13","article-title":"\u201cImproving sequence tagging for Vietnamese text using transformer-based neural models,\u201d","volume-title":"Proceedings of the 34th Pacific Asia Conference on Language, Information and Computation","author":"Bui","year":"2020"},{"key":"B5","first-page":"38","article-title":"\u201cA poi categorization by composition of onomastic and contextual information,\u201d","volume-title":"2014 IEEE\/WIC\/ACM International Joint Conferences on Web Intelligence (WI) and Intelligent Agent Technologies (IAT), Vol. 2","author":"Choi","year":"2014"},{"key":"B6","doi-asserted-by":"publisher","first-page":"265","DOI":"10.5555\/944790.944813","article-title":"On the algorithmic implementation of multiclass kernel-based vector machines","volume":"2","author":"Crammer","year":"2002","journal-title":"J. Mach. Learn. Res"},{"key":"B7","doi-asserted-by":"crossref","first-page":"182","DOI":"10.1145\/3340964.3340993","article-title":"\u201cLGM-pc: A tool for poi classification on QGIS,\u201d","volume-title":"Proceedings of the 16th International Symposium on Spatial and Temporal Databases, SSTD '19","author":"Eftaxias","year":"2019"},{"key":"B8","doi-asserted-by":"crossref","first-page":"4","DOI":"10.1145\/3356994.3365504","article-title":"\u201cClassifying points of interest with minimum metadata,\u201d","volume-title":"Proceedings of the 3rd ACM SIGSPATIAL International Workshop on Location-Based Recommendations, Geosocial Networks and Geoadvertising, LocalRec '19","author":"Giannopoulos","year":"2019"},{"key":"B9","doi-asserted-by":"crossref","first-page":"539","DOI":"10.1007\/978-3-319-34129-3_33","article-title":"\u201cLearning to classify spatiotextual entities in maps,\u201d","volume-title":"The Semantic Web. Latest Advances and New Domains","author":"Giannopoulos","year":"2016"},{"key":"B10","article-title":"\u201cNeighbourhood components analysis,\u201d","volume-title":"Advances in Neural Information Processing Systems, Vol. 17","author":"Goldberger","year":"2004"},{"key":"B11","doi-asserted-by":"publisher","first-page":"211","DOI":"10.1007\/s10708-007-9111-y","article-title":"Citizens as sensors: the world of volunteered geography","volume":"69","author":"Goodchild","year":"2007","journal-title":"GeoJournal"},{"key":"B12","first-page":"3111","article-title":"\u201cDistributed representations of words and phrases and their compositionality,\u201d","volume-title":"Proceedings of the 26th International Conference on Neural Information Processing Systems, Vol. 2, NIPS'13","author":"Mikolov","year":"2013"},{"key":"B13","doi-asserted-by":"crossref","first-page":"350","DOI":"10.1007\/978-3-642-23196-4_19","article-title":"\u201cAnalyzing the spatial-semantic interaction of points of interest in volunteered geographic information,\u201d","volume-title":"Spatial Information Theory","author":"M\u00fclligann","year":"2011"},{"key":"B14","doi-asserted-by":"crossref","first-page":"1037","DOI":"10.18653\/v1\/2020.findings-emnlp.92","article-title":"\u201cPhoBERT: pre-trained language models for Vietnamese,\u201d","volume-title":"Findings of the Association for Computational Linguistics: EMNLP 2020","author":"Nguyen","year":"2020"},{"key":"B15","doi-asserted-by":"publisher","first-page":"269","DOI":"10.14778\/3157794.3157797","article-title":"Snorkel: rapid training data creation with weak supervision","volume":"11","author":"Ratner","year":"2017","journal-title":"Proc. VLDB Endow"},{"key":"B16","doi-asserted-by":"publisher","first-page":"234","DOI":"10.2307\/143141","article-title":"A computer movie simulating urban growth in the detroit region","volume":"46","author":"Tobler","year":"1970","journal-title":"Econ. Geography"},{"key":"B17","doi-asserted-by":"publisher","DOI":"10.3390\/ijgi6030080","article-title":"Assessing crowdsourced poi quality: combining methods based on reference data, history, and spatial relations","volume":"6","author":"Touya","year":"2017","journal-title":"ISPRS Int. J. Geoinf"},{"key":"B18","doi-asserted-by":"crossref","DOI":"10.1007\/978-3-319-14280-7_4","volume-title":"Improving Volunteered Geographic Information Quality Using a Tag Recommender System: The Case of OpenStreetMap, Chapter 3","author":"Vandecasteele","year":"2015"},{"key":"B19","doi-asserted-by":"publisher","first-page":"223","DOI":"10.14778\/3291264.3291268","article-title":"Snuba: automating weak supervision to label training data","volume":"12","author":"Varma","year":"2018","journal-title":"Proc. VLDB Endow"},{"key":"B20","first-page":"56","article-title":"\u201cVnCoreNLP: a Vietnamese natural language processing toolkit,\u201d","volume-title":"Proceedings of the 2018 Conference of the North American Chapter of the Association for Computational Linguistics: Demonstrations","author":"Vu","year":"2018"},{"key":"B21","doi-asserted-by":"crossref","first-page":"38","DOI":"10.18653\/v1\/2020.emnlp-demos.6","article-title":"\u201cTransformers: State-of-the-art natural language processing,\u201d","volume-title":"Proceedings of the 2020 Conference on Empirical Methods in Natural Language Processing: System Demonstrations","author":"Wolf","year":"2020"},{"key":"B22","doi-asserted-by":"publisher","first-page":"944","DOI":"10.20965\/jaciii.2020.p0944","article-title":"Poi classification method based on feature extension and deep learning","volume":"24","author":"Zhou","year":"2020","journal-title":"J. Adv. Comput. Intell. Intell. Inform"}],"container-title":["Frontiers in Artificial Intelligence"],"original-title":[],"link":[{"URL":"https:\/\/www.frontiersin.org\/articles\/10.3389\/frai.2022.1020532\/full","content-type":"unspecified","content-version":"vor","intended-application":"similarity-checking"}],"deposited":{"date-parts":[[2022,12,9]],"date-time":"2022-12-09T07:55:54Z","timestamp":1670572554000},"score":1,"resource":{"primary":{"URL":"https:\/\/www.frontiersin.org\/articles\/10.3389\/frai.2022.1020532\/full"}},"subtitle":[],"short-title":[],"issued":{"date-parts":[[2022,12,9]]},"references-count":22,"alternative-id":["10.3389\/frai.2022.1020532"],"URL":"https:\/\/doi.org\/10.3389\/frai.2022.1020532","relation":{},"ISSN":["2624-8212"],"issn-type":[{"type":"electronic","value":"2624-8212"}],"subject":[],"published":{"date-parts":[[2022,12,9]]},"article-number":"1020532"}}