{"status":"ok","message-type":"work","message-version":"1.0.0","message":{"indexed":{"date-parts":[[2025,10,30]],"date-time":"2025-10-30T07:14:56Z","timestamp":1761808496484,"version":"build-2065373602"},"reference-count":64,"publisher":"MDPI AG","issue":"2","license":[{"start":{"date-parts":[[2022,1,26]],"date-time":"2022-01-26T00:00:00Z","timestamp":1643155200000},"content-version":"vor","delay-in-days":0,"URL":"https:\/\/creativecommons.org\/licenses\/by\/4.0\/"}],"funder":[{"name":"Regional Government of Aragon (Spain)","award":["T59_20R"],"award-info":[{"award-number":["T59_20R"]}]},{"DOI":"10.13039\/501100004837","name":"Spanish Ministry of Science and Innovation","doi-asserted-by":"publisher","award":["PID2020-113353RB-I00"],"award-info":[{"award-number":["PID2020-113353RB-I00"]}],"id":[{"id":"10.13039\/501100004837","id-type":"DOI","asserted-by":"publisher"}]}],"content-domain":{"domain":[],"crossmark-restriction":false},"short-container-title":["IJGI"],"abstract":"<jats:p>The discrete representation of resources in geospatial catalogues affects their information retrieval performance. The performance could be improved by using automatically generated clusters of related resources, which we name quasi-spatial dataset series. This work evaluates whether a clustering process can create quasi-spatial dataset series using only textual information from metadata elements. We assess the combination of different kinds of text cleaning approaches, word- and sentence-embedding representations (Word2Vec, GloVe, FastText, ELMo, Sentence-BERT, and Universal Sentence Encoder), and clustering techniques (K-Means, DBSCAN, OPTICS, and agglomerative clustering) for the task. The results demonstrate that combining word-embedding representations with agglomerative clustering creates better quasi-spatial dataset series than the other approaches. 
In addition, we have found that the ELMo representation with agglomerative clustering produces good results without any preprocessing step for text cleaning.<\/jats:p>","DOI":"10.3390\/ijgi11020087","type":"journal-article","created":{"date-parts":[[2022,1,26]],"date-time":"2022-01-26T11:02:53Z","timestamp":1643194973000},"page":"87","update-policy":"https:\/\/doi.org\/10.3390\/mdpi_crossmark_policy","source":"Crossref","is-referenced-by-count":7,"title":["Approaches for the Clustering of Geographic Metadata and the Automatic Detection of Quasi-Spatial Dataset Series"],"prefix":"10.3390","volume":"11","author":[{"ORCID":"https:\/\/orcid.org\/0000-0003-3071-5819","authenticated-orcid":false,"given":"Javier","family":"Lacasta","sequence":"first","affiliation":[{"name":"Arag\u00f3n Institute of Engineering Research (I3A), Universidad de Zaragoza, 50018 Zaragoza, Spain"}]},{"ORCID":"https:\/\/orcid.org\/0000-0001-6491-7430","authenticated-orcid":false,"given":"Francisco Javier","family":"Lopez-Pellicer","sequence":"additional","affiliation":[{"name":"Arag\u00f3n Institute of Engineering Research (I3A), Universidad de Zaragoza, 50018 Zaragoza, Spain"}]},{"ORCID":"https:\/\/orcid.org\/0000-0002-6557-2494","authenticated-orcid":false,"given":"Javier","family":"Zarazaga-Soria","sequence":"additional","affiliation":[{"name":"Arag\u00f3n Institute of Engineering Research (I3A), Universidad de Zaragoza, 50018 Zaragoza, Spain"}]},{"ORCID":"https:\/\/orcid.org\/0000-0001-7866-3793","authenticated-orcid":false,"given":"Rub\u00e9n","family":"B\u00e9jar","sequence":"additional","affiliation":[{"name":"Arag\u00f3n Institute of Engineering Research (I3A), Universidad de Zaragoza, 50018 Zaragoza, Spain"}]},{"ORCID":"https:\/\/orcid.org\/0000-0002-1279-0367","authenticated-orcid":false,"given":"Javier","family":"Nogueras-Iso","sequence":"additional","affiliation":[{"name":"Arag\u00f3n Institute of Engineering Research (I3A), Universidad de Zaragoza, 50018 Zaragoza, 
Spain"}]}],"member":"1968","published-online":{"date-parts":[[2022,1,26]]},"reference":[{"key":"ref_1","unstructured":"Nebert, D. (2021, November 26). Developing Spatial Data Infrastructures: The SDI Cookbook; Global Spatial Data Infrastructure (GSDI). Available online: http:\/\/gsdiassociation.org\/images\/publications\/cookbooks\/SDI_Cookbook_GSDI_2004_ver2.pdf."},{"key":"ref_2","unstructured":"(2021, November 26). ISO 19115-1:2014-Geographic Information\u2014Metadata\u2014Part 1: Fundamentals. Available online: https:\/\/iso.statuspage.io\/#!iso:std:53798:en."},{"key":"ref_3","unstructured":"Da Silva Santos, L.B., Wilkinson, M.D., Kuzniar, A., Kaliyaperumal, R., Thompson, M., Dumontier, M., and Burger, K. (2016). FAIR data points supporting big data interoperability. Enterprise Interoperability in the Digitized and Networked Factory of the Future, ISTE."},{"key":"ref_4","doi-asserted-by":"crossref","first-page":"80","DOI":"10.1109\/MIS.2004.15","article-title":"Ontology-based search for interactive digital maps","volume":"19","author":"Hubner","year":"2004","journal-title":"IEEE Intell. Syst."},{"key":"ref_5","unstructured":"Larson, J., Olmos, M.A., and Pereira, M. (2006, January 20\u201322). Are geospatial catalogues reaching their goals?. Proceedings of the 9th AGILE Conference on Geographic Information Science: Shaping the Future of Geographic Information Science in Europe, Visegr\u00e1d, Hungary."},{"key":"ref_6","doi-asserted-by":"crossref","unstructured":"Fugazza, C., Tagliolato, P., Frigerio, L., and Carrara, P. (2017). Web-scale normalization of geospatial metadata based on semantics-aware data sources. ISPRS Int. J. Geo-Inf., 6.","DOI":"10.3390\/ijgi6110354"},{"key":"ref_7","doi-asserted-by":"crossref","first-page":"49","DOI":"10.1080\/14498596.2017.1397559","article-title":"A recommender geoportal for geospatial resource discovery and recommendation","volume":"64","author":"Dareshiri","year":"2019","journal-title":"J. Spat. 
Sci."},{"key":"ref_8","doi-asserted-by":"crossref","first-page":"33","DOI":"10.5194\/isprs-archives-XLII-4-W20-33-2019","article-title":"Fair and standard access to spatial data as the means for achieving sustainable development goals","volume":"42","author":"Ivanova","year":"2019","journal-title":"Int. Arch. Photogramm. Remote Sens. Spat. Inf. Sci.-ISPRS Arch."},{"key":"ref_9","doi-asserted-by":"crossref","unstructured":"Giuliani, G., Cazeaux, H., Burgi, P.Y., Poussin, C., Richard, J.P., and Chatenoux, B. (2021). SwissEnvEO: A FAIR National Environmental Data Repository for Earth Observation Open Science. Data Sci. J., 20.","DOI":"10.5334\/dsj-2021-022"},{"key":"ref_10","unstructured":"ISO 19131:2007 (2021, November 26). Geographic Information\u2014Data Product Specifications. International Organization for Standardization (ISO). Available online: https:\/\/iso.statuspage.io\/#iso:std:iso:19131:ed-1:en."},{"key":"ref_11","doi-asserted-by":"crossref","unstructured":"Larson, R., and Frontiera, P. (2004, January 29). Ranking and representation for geographic information retrieval. Proceedings of the Extended Abstract in SIGIR 2004 Workshop on Geographic Information Retrieval, Sheffield, UK.","DOI":"10.1145\/1008992.1009143"},{"key":"ref_12","first-page":"141","article-title":"Ontology-based semantic description model for discovery and retrieval of geospatial information","volume":"32","author":"Zhan","year":"2008","journal-title":"Int. Arch. Photogramm. Remote Sens. Spat. Inf. Sci."},{"key":"ref_13","doi-asserted-by":"crossref","unstructured":"Zhang, Y., Chiang, Y.Y., Szekely, P., and Knoblock, C.A. (2013, January 4\u20135). A semantic approach to retrieving, linking, and integrating heterogeneous geospatial data. 
Proceedings of the Workshop on AI Problems and Approaches for Intelligent Environments and Workshop on Semantic Cities, Beijing, China.","DOI":"10.1145\/2516911.2516914"},{"key":"ref_14","doi-asserted-by":"crossref","first-page":"793","DOI":"10.1007\/s10707-014-0202-x","article-title":"Improving geographic information retrieval in spatial data infrastructures","volume":"18","author":"Davis","year":"2014","journal-title":"GeoInformatica"},{"key":"ref_15","doi-asserted-by":"crossref","first-page":"17","DOI":"10.1080\/17538947.2012.674561","article-title":"Towards geospatial semantic search: Exploiting latent semantic relations in geospatial data","volume":"7","author":"Li","year":"2014","journal-title":"Int. J. Digit. Earth"},{"key":"ref_16","doi-asserted-by":"crossref","unstructured":"Fugazza, C., Pepe, M., Oggioni, A., Tagliolato, P., and Carrara, P. (2018). Raising semantics-awareness in geospatial metadata management. ISPRS Int. J. Geo-Inf., 7.","DOI":"10.3390\/ijgi7090370"},{"key":"ref_17","doi-asserted-by":"crossref","first-page":"1579","DOI":"10.1007\/s12145-020-00559-1","article-title":"Decentralized geospatial metadata management","volume":"14","author":"Fugazza","year":"2021","journal-title":"Earth Sci. Inform."},{"key":"ref_18","doi-asserted-by":"crossref","first-page":"245","DOI":"10.1515\/geo-2020-0232","article-title":"An OGC web service geospatial data semantic similarity model for improving geospatial service discovery","volume":"13","author":"Miao","year":"2021","journal-title":"Open Geosci."},{"key":"ref_19","doi-asserted-by":"crossref","first-page":"104520","DOI":"10.1016\/j.cageo.2020.104520","article-title":"Improving search ranking of geospatial data based on deep learning using user behavior data","volume":"142","author":"Li","year":"2020","journal-title":"Comput. Geosci."},{"key":"ref_20","doi-asserted-by":"crossref","unstructured":"Aggarwal, C.C., and Zhai, C. (2012). A Survey of Text Clustering Algorithms. Mining Text Data, Springer. 
Chapter 4.","DOI":"10.1007\/978-1-4614-3223-4"},{"key":"ref_21","doi-asserted-by":"crossref","unstructured":"Ma, L., and Zhang, Y. (2015, October 29\u2013November 1). Using Word2Vec to process big text data. Proceedings of the 2015 IEEE International Conference on Big Data, Santa Clara, CA, USA.","DOI":"10.1109\/BigData.2015.7364114"},{"key":"ref_22","doi-asserted-by":"crossref","unstructured":"Li, C., Lu, Y., Wu, J., Zhang, Y., Xia, Z., Wang, T., Yu, D., Chen, X., Liu, P., and Guo, J. (2018, April 23\u201327). LDA Meets Word2Vec: A Novel Model for Academic Abstract Clustering. Companion Proceedings of the Web Conference 2018, Lyon, France.","DOI":"10.1145\/3184558.3191629"},{"key":"ref_23","first-page":"3111","article-title":"Distributed representations of words and phrases and their compositionality","volume":"2","author":"Mikolov","year":"2013","journal-title":"Adv. Neural Inf. Process. Syst."},{"key":"ref_24","doi-asserted-by":"crossref","unstructured":"Pennington, J., Socher, R., and Manning, C.D. (2014). GloVe: Global vectors for word representation. Proceedings of the 2014 Conference on Empirical Methods in Natural Language Processing, Association for Computational Linguistics. Available online: https:\/\/aclanthology.org\/D14-1162\/.","DOI":"10.3115\/v1\/D14-1162"},{"key":"ref_25","doi-asserted-by":"crossref","first-page":"135","DOI":"10.1162\/tacl_a_00051","article-title":"Enriching word vectors with subword information","volume":"5","author":"Bojanowski","year":"2017","journal-title":"Trans. Assoc. Comput. Linguist."},{"key":"ref_26","first-page":"2227","article-title":"Deep contextualized word representations","volume":"Volume 1","author":"Peters","year":"2018","journal-title":"Proceedings of the 2018 Conference of the North American Chapter of the Association for Computational Linguistics: Human Language Technologies"},{"key":"ref_27","unstructured":"Devlin, J., Chang, M.W., Lee, K., and Toutanova, K. 
(2018). BERT: Pre-training of Deep Bidirectional Transformers for Language Understanding. arXiv."},{"key":"ref_28","doi-asserted-by":"crossref","first-page":"681","DOI":"10.1007\/s11023-020-09548-1","article-title":"GPT-3: Its nature, scope, limits, and consequences","volume":"30","author":"Floridi","year":"2020","journal-title":"Minds Mach."},{"key":"ref_29","unstructured":"Arora, S., Liang, Y., and Ma, T. (2017). A Simple But Tough-to-Beat Baseline for Sentence Embeddings. Proceedings of the International Conference on Learning Representations. Available online: https:\/\/openreview.net\/pdf?id=SyK00v5xx."},{"key":"ref_30","unstructured":"Le, Q., and Mikolov, T. (2014). Distributed representations of sentences and documents. arXiv."},{"key":"ref_31","doi-asserted-by":"crossref","unstructured":"Reimers, N., and Gurevych, I. (2019). Sentence-BERT: Sentence Embeddings using Siamese BERT-Networks. arXiv.","DOI":"10.18653\/v1\/D19-1410"},{"key":"ref_32","doi-asserted-by":"crossref","unstructured":"Conneau, A., Kiela, D., Schwenk, H., Barrault, L., and Bordes, A. (2017). Supervised Learning of Universal Sentence Representations from Natural Language Inference Data. arXiv.","DOI":"10.18653\/v1\/D17-1070"},{"key":"ref_33","doi-asserted-by":"crossref","unstructured":"Cer, D., Yang, Y., Kong, S., Hua, N., Limtiaco, N., John, R., Constant, N., Guajardo-Cespedes, M., Yuan, S., and Tar, C. (2018). Universal Sentence Encoder for English. Proceedings of the 2018 Conference on Empirical Methods in Natural Language Processing: System Demonstrations. Available online: https:\/\/aclanthology.org\/D18-2029\/.","DOI":"10.18653\/v1\/D18-2029"},{"key":"ref_34","unstructured":"Kusner, M., Sun, Y., Kolkin, N., and Weinberger, K. (2015, July 6\u201311). From word embeddings to document distances. Proceedings of the International Conference on Machine Learning, Lille, France."},{"key":"ref_35","doi-asserted-by":"crossref","unstructured":"Zhang, C., Tao, F., Chen, X., Shen, J., Jiang, M., Sadler, B., and Han, J. 
(2018). Taxogen: Unsupervised topic taxonomy construction by adaptive term embedding and clustering. arXiv.","DOI":"10.1145\/3219819.3220064"},{"key":"ref_36","doi-asserted-by":"crossref","first-page":"1185","DOI":"10.1016\/j.ipm.2019.02.014","article-title":"Understanding the topic evolution of scientific literatures like an evolving city: Using Google Word2Vec model and spatial autocorrelation analysis","volume":"56","author":"Hu","year":"2019","journal-title":"Inf. Process. Manag."},{"key":"ref_37","doi-asserted-by":"crossref","first-page":"102219","DOI":"10.1016\/j.ipm.2020.102219","article-title":"An integrated model for textual social media data with spatio-temporal dimensions","volume":"57","author":"Diaz","year":"2020","journal-title":"Inf. Process. Manag."},{"key":"ref_38","doi-asserted-by":"crossref","unstructured":"Li, Y., Cai, J., and Wang, J. (2020, January 12\u201314). A Text Document Clustering Method Based on Weighted BERT Model. Proceedings of the 2020 IEEE 4th Information Technology, Networking, Electronic and Automation Control Conference (ITNEC), Chongqing, China.","DOI":"10.1109\/ITNEC48623.2020.9085059"},{"key":"ref_39","doi-asserted-by":"crossref","first-page":"102645","DOI":"10.1016\/j.ipm.2021.102645","article-title":"Convolutional neural encoding of online reviews for the identification of travel group type topics on TripAdvisor","volume":"58","author":"Toral","year":"2021","journal-title":"Inf. Process. Manag."},{"key":"ref_40","doi-asserted-by":"crossref","first-page":"102312","DOI":"10.1016\/j.ipm.2020.102312","article-title":"A Google Trends spatial clustering approach for a worldwide Twitter user geolocation","volume":"57","author":"Zola","year":"2020","journal-title":"Inf. Process. Manag."},{"key":"ref_41","doi-asserted-by":"crossref","unstructured":"Newman, D., Hagedorn, K., Chemudugunta, C., and Smyth, P. (2007, January 18\u201323). Subject metadata enrichment using statistical topic models. 
Proceedings of the 7th ACM\/IEEE-CS Joint Conference on Digital Libraries, Vancouver, BC, Canada.","DOI":"10.1145\/1255175.1255248"},{"key":"ref_42","doi-asserted-by":"crossref","unstructured":"Lacasta, J., Nogueras-Iso, J., Muro-Medrano, P.R., and Zarazaga-Soria, F.J. (2007). Thematic clustering of geographic resource metadata collections. International Symposium on Web and Wireless Geographical Information Systems, Springer.","DOI":"10.1007\/978-3-540-76925-5_3"},{"key":"ref_43","doi-asserted-by":"crossref","unstructured":"Thomas, R.E., and Khan, S.S. (2016, January 21\u201322). Improved clustering technique using metadata for text mining. Proceedings of the 2016 International Conference on Communication and Electronics Systems (ICCES), Coimbatore, India.","DOI":"10.1109\/CESYS.2016.7889835"},{"key":"ref_44","doi-asserted-by":"crossref","first-page":"639","DOI":"10.1007\/978-981-10-8848-3_61","article-title":"Clustering the Patent Data Using K-Means Approach","volume":"Volume 731","author":"Hoda","year":"2019","journal-title":"Software Engineering. Advances in Intelligent Systems and Computing"},{"key":"ref_45","doi-asserted-by":"crossref","unstructured":"Rakib, M.R.H., Zeh, N., Jankowska, M., and Milios, E. (2020). Enhancement of short text clustering by iterative classification. 
International Conference on Applications of Natural Language to Information Systems, Springer.","DOI":"10.1007\/978-3-030-51310-8_10"},{"key":"ref_46","doi-asserted-by":"crossref","first-page":"23346","DOI":"10.1109\/ACCESS.2020.2969440","article-title":"Adaptive density-based spatial clustering for massive data analysis","volume":"8","author":"Cai","year":"2020","journal-title":"IEEE Access"},{"key":"ref_47","doi-asserted-by":"crossref","first-page":"102686","DOI":"10.1016\/j.ipm.2021.102686","article-title":"A temporally dynamic examination of research method usage in the Chinese library and information science community","volume":"58","author":"Lou","year":"2021","journal-title":"Inf. Process. Manag."},{"key":"ref_48","doi-asserted-by":"crossref","first-page":"102519","DOI":"10.1016\/j.ipm.2021.102519","article-title":"Bias-Aware Hierarchical Clustering for detecting the discriminated groups of users in recommendation systems","volume":"58","author":"Indurkhya","year":"2021","journal-title":"Inf. Process. Manag."},{"key":"ref_49","unstructured":"Ahmad, M., and Ali, A. (2021, November 26). Mapping National Spatial Data Infrastructure Initiatives. Available online: https:\/\/www.google.com\/maps\/d\/viewer?mid=1596RIb8g_n0LPyi55-N1E2PuDw4&ll=24.147211357953225%2C-86.74911452879445&z=2."},{"key":"ref_50","doi-asserted-by":"crossref","unstructured":"Kalantari, M., Syahrudin, S., Rajabifard, A., Subagyo, H., and Hubbard, H. (2020). Spatial Metadata Usability Evaluation. ISPRS Int. J. Geo-Inf., 9.","DOI":"10.3390\/ijgi9070463"},{"key":"ref_51","first-page":"30","article-title":"User-centric SDI: Addressing users requirements in third-generation SDI. 
The Example of Nature-SDIplus","volume":"10","author":"Hennig","year":"2011","journal-title":"Geoforum Perspekt."},{"key":"ref_52","doi-asserted-by":"crossref","first-page":"983","DOI":"10.1080\/13658816.2011.620570","article-title":"Tuning the second-generation SDI: Theoretical aspects and real use cases","volume":"26","author":"Pons","year":"2012","journal-title":"Int. J. Geogr. Inf. Sci."},{"key":"ref_53","doi-asserted-by":"crossref","first-page":"1583","DOI":"10.1080\/13658816.2017.1319949","article-title":"Aggregation-based information retrieval system for geospatial data catalogs","volume":"31","author":"Lacasta","year":"2017","journal-title":"Int. J. Geogr. Inf. Sci."},{"key":"ref_54","doi-asserted-by":"crossref","unstructured":"Latre, M.A., Lacasta, J., Mojica-Abrego, E., Nogueras-Iso, J., and Zarazaga-Soria, F.J. (2009). An Approach to Facilitate the Integration of Hydrological Data by means of Ontologies and Multilingual Thesauri. Advances in GIScience. Lecture Notes in Geoinformation and Cartography (LNG&C), Springer.","DOI":"10.1007\/978-3-642-00318-9_8"},{"key":"ref_55","unstructured":"Ingersoll, G.S., Morton, T.S., and Farris, A.L. (2012). Taming Text: How to Find, Organize, and Manipulate It, Manning."},{"key":"ref_56","unstructured":"Porter, M.F. (2021, November 26). Snowball: A Language for Stemming Algorithms. Available online: http:\/\/snowball.tartarus.org\/texts\/introduction.html."},{"key":"ref_57","unstructured":"Cardellino, C. (2021, November 26). Spanish Billion Words Corpus and Embeddings. Available online: https:\/\/crscardellino.ar\/SBWCE\/."},{"key":"ref_58","unstructured":"Che, W., Liu, Y., Wang, Y., Zheng, B., and Liu, T. (2018). Towards Better UD Parsing: Deep Contextualized Word Embeddings, Ensemble, and Treebank Concatenation. arXiv."},{"key":"ref_59","unstructured":"Hartigan, J.A. (1975). Clustering Algorithms, John Wiley & Sons."},{"key":"ref_60","unstructured":"Ester, M., Kriegel, H.P., Sander, J., and Xu, X. (1996, August 2\u20134). 
A density-based algorithm for discovering clusters in large spatial databases with noise. Proceedings of the Second International Conference on Knowledge Discovery and Data Mining, Portland, OR, USA."},{"key":"ref_61","first-page":"1379","article-title":"A comparative study of various clustering algorithms in data mining","volume":"2","author":"Verma","year":"2012","journal-title":"Int. J. Eng. Res. Appl."},{"key":"ref_62","doi-asserted-by":"crossref","first-page":"465","DOI":"10.1016\/0306-4573(86)90097-X","article-title":"Implementing agglomerative hierarchic clustering algorithms for use in document retrieval","volume":"22","author":"Voorhees","year":"1986","journal-title":"Inf. Process. Manag."},{"key":"ref_63","unstructured":"Rosenberg, A., and Hirschberg, J. (2007). V-measure: A conditional entropy-based external cluster evaluation measure. Proceedings of the 2007 Joint Conference on Empirical Methods in Natural Language Processing and Computational Natural Language Learning, Association for Computational Linguistics. Available online: https:\/\/aclanthology.org\/D07-1043\/."},{"key":"ref_64","first-page":"2837","article-title":"Information theoretic measures for clusterings comparison: Variants, properties, normalization and correction for chance","volume":"11","author":"Vinh","year":"2010","journal-title":"J. Mach. Learn. 
Res."}],"container-title":["ISPRS International Journal of Geo-Information"],"original-title":[],"language":"en","link":[{"URL":"https:\/\/www.mdpi.com\/2220-9964\/11\/2\/87\/pdf","content-type":"unspecified","content-version":"vor","intended-application":"similarity-checking"}],"deposited":{"date-parts":[[2025,10,10]],"date-time":"2025-10-10T22:08:29Z","timestamp":1760134109000},"score":1,"resource":{"primary":{"URL":"https:\/\/www.mdpi.com\/2220-9964\/11\/2\/87"}},"subtitle":[],"short-title":[],"issued":{"date-parts":[[2022,1,26]]},"references-count":64,"journal-issue":{"issue":"2","published-online":{"date-parts":[[2022,2]]}},"alternative-id":["ijgi11020087"],"URL":"https:\/\/doi.org\/10.3390\/ijgi11020087","relation":{},"ISSN":["2220-9964"],"issn-type":[{"type":"electronic","value":"2220-9964"}],"subject":[],"published":{"date-parts":[[2022,1,26]]}}}