{"status":"ok","message-type":"work","message-version":"1.0.0","message":{"indexed":{"date-parts":[[2026,3,24]],"date-time":"2026-03-24T13:14:31Z","timestamp":1774358071795,"version":"3.50.1"},"reference-count":31,"publisher":"MDPI AG","issue":"5","license":[{"start":{"date-parts":[[2020,5,6]],"date-time":"2020-05-06T00:00:00Z","timestamp":1588723200000},"content-version":"vor","delay-in-days":0,"URL":"https:\/\/creativecommons.org\/licenses\/by\/4.0\/"}],"content-domain":{"domain":[],"crossmark-restriction":false},"short-container-title":["IJGI"],"abstract":"<jats:p>One of the enduring issues of spatial origin-destination (OD) flow data analysis is the computational inefficiency or even the impossibility to handle large datasets. Despite the recent advancements in high performance computing (HPC) and the ready availability of powerful computing infrastructure, we argue that the best solutions are based on a thorough understanding of the fundamental properties of the data. This paper focuses on overcoming the computational challenge through data reduction that intelligently takes advantage of the heavy-tailed distributional property of most flow datasets. We specifically propose the classification technique of head\/tail breaks to this end. We test this approach with representative algorithms from three common method families, namely flowAMOEBA from flow clustering, Louvain from network community detection, and PageRank from network centrality algorithms. A variety of flow datasets are adopted for the experiments, including inter-city travel flows, cellphone call flows, and synthetic flows. We propose a standard evaluation framework to evaluate the applicability of not only the selected three algorithms, but any given method in a systematic way. The results prove that head\/tail breaks can significantly improve the computational capability and efficiency of flow data analyses while preserving result quality, on condition that the analysis emphasizes the \u201chead\u201d part of the dataset or the flows with high absolute values. We recommend considering this easy-to-implement data reduction technique before analyzing a large flow dataset.<\/jats:p>","DOI":"10.3390\/ijgi9050299","type":"journal-article","created":{"date-parts":[[2020,5,7]],"date-time":"2020-05-07T03:10:38Z","timestamp":1588821038000},"page":"299","update-policy":"https:\/\/doi.org\/10.3390\/mdpi_crossmark_policy","source":"Crossref","is-referenced-by-count":11,"title":["Boosting Computational Effectiveness in Big Spatial Flow Data Analysis with Intelligent Data Reduction"],"prefix":"10.3390","volume":"9","author":[{"given":"Ran","family":"Tao","sequence":"first","affiliation":[{"name":"School of Geosciences, University of South Florida, Tampa, FL 33620, USA"}]},{"ORCID":"https:\/\/orcid.org\/0000-0002-2166-3466","authenticated-orcid":false,"given":"Zhaoya","family":"Gong","sequence":"additional","affiliation":[{"name":"School of Geography, Earth and Environmental Sciences, University of Birmingham, Edgbaston B15 2TT, UK"}]},{"given":"Qiwei","family":"Ma","sequence":"additional","affiliation":[{"name":"School of Architecture, Tsinghua University, Beijing 100084, China"}]},{"ORCID":"https:\/\/orcid.org\/0000-0002-6651-8123","authenticated-orcid":false,"given":"Jean-Claude","family":"Thill","sequence":"additional","affiliation":[{"name":"Department of Geography and Earth Sciences, University of North Carolina at Charlotte, Charlotte, NC 28223, USA"}]}],"member":"1968","published-online":{"date-parts":[[2020,5,6]]},"reference":[{"key":"ref_1","unstructured":"Farmer, C., and Oshan, T. (2017). Spatial interaction. The Geographic Information Science & Technology Body of Knowledge, Association of American Geographers. [4th Quarter 2017 ed.]."},{"key":"ref_2","doi-asserted-by":"crossref","unstructured":"Tao, R., Depken, C., Thill, J.C., and Kashiha, M. (2017). flowHDBSCAN: A hierarchical and density-based spatial flow clustering method. Proceedings of the 3rd ACM SIGSPATIAL Workshop on Smart Cities and Urban Analytics, ACM.","DOI":"10.1145\/3152178.3152189"},{"key":"ref_3","doi-asserted-by":"crossref","unstructured":"Batty, M. (2013). The new Science of Cities, MIT Press.","DOI":"10.7551\/mitpress\/9399.001.0001"},{"key":"ref_4","doi-asserted-by":"crossref","first-page":"463","DOI":"10.1111\/tgis.12042","article-title":"Discovering Spatial Interaction Communities from Mobile Phone Data","volume":"17","author":"Gao","year":"2013","journal-title":"Trans. GIS"},{"key":"ref_5","doi-asserted-by":"crossref","first-page":"1","DOI":"10.1088\/1742-5468\/2008\/10\/P10008","article-title":"Fast unfolding of communities in large networks","volume":"2008","author":"Blondel","year":"2008","journal-title":"J. Stat. Mech. Theory Exp."},{"key":"ref_6","doi-asserted-by":"crossref","first-page":"823","DOI":"10.1080\/13658810802022822","article-title":"Ranking spaces for predicting human movement in an urban environment","volume":"23","author":"Jiang","year":"2009","journal-title":"Int. J. Geogr. Inf. Sci."},{"key":"ref_7","doi-asserted-by":"crossref","unstructured":"Chin, W.C.B., and Wen, T.H. (2015). Geographically modified PageRank algorithms: Identifying the spatial concentration of human movement in a geospatial network. PLoS ONE, 10.","DOI":"10.1371\/journal.pone.0139509"},{"key":"ref_8","doi-asserted-by":"crossref","unstructured":"Xing, W., and Ghorbani, A. (2004, January 21\u201321). Weighted pagerank algorithm. Proceedings of the Second Annual Conference on Communication Networks and Services Research, Fredericton, NB, Canada.","DOI":"10.1109\/DNSR.2004.1344743"},{"key":"ref_9","first-page":"1","article-title":"Scale-free networks are rare","volume":"10","author":"Broido","year":"2018","journal-title":"Nat. Commun."},{"key":"ref_10","doi-asserted-by":"crossref","first-page":"111","DOI":"10.1111\/gean.12161","article-title":"flowAMOEBA: Identifying Regions of Anomalous Spatial Interactions","volume":"51","author":"Tao","year":"2019","journal-title":"Geogr. Anal."},{"key":"ref_11","doi-asserted-by":"crossref","first-page":"661","DOI":"10.1137\/070710111","article-title":"Power-law distributions in empirical data","volume":"51","author":"Clauset","year":"2009","journal-title":"SIAM Rev."},{"key":"ref_12","doi-asserted-by":"crossref","first-page":"339","DOI":"10.1007\/s10110-003-0189-4","article-title":"Spatial interaction modelling","volume":"83","author":"Roy","year":"2003","journal-title":"Pap. Reg. Sci."},{"key":"ref_13","doi-asserted-by":"crossref","first-page":"482","DOI":"10.1080\/00330124.2012.700499","article-title":"Head\/Tail Breaks: A New Classification Scheme for Data with a Heavy-Tailed Distribution","volume":"65","author":"Jiang","year":"2013","journal-title":"Prof. Geogr."},{"key":"ref_14","doi-asserted-by":"crossref","first-page":"69","DOI":"10.1016\/j.cities.2014.11.013","article-title":"Head\/tail breaks for visualization of city structure and dynamics","volume":"43","author":"Jiang","year":"2015","journal-title":"Cities"},{"key":"ref_15","doi-asserted-by":"crossref","unstructured":"Jiang, B. (2019). A recursive definition of goodness of space for bridging the concepts of space and place for sustainability. Sustain. Switz., 11.","DOI":"10.3390\/su11154091"},{"key":"ref_16","doi-asserted-by":"crossref","first-page":"154","DOI":"10.1016\/j.physa.2015.02.029","article-title":"Defining least community as a homogeneous group in complex networks","volume":"428","author":"Jiang","year":"2015","journal-title":"Phys. Stat. Mech. Its Appl."},{"key":"ref_17","doi-asserted-by":"crossref","first-page":"281","DOI":"10.1016\/j.landurbplan.2017.05.008","article-title":"Understanding uneven urban expansion with natural cities using open data","volume":"177","author":"Long","year":"2018","journal-title":"Landsc. Urban Plan."},{"key":"ref_18","doi-asserted-by":"crossref","unstructured":"Gong, Z., Ma, Q., Kan, C., and Qi, Q. (2019). Classifying Street Spaces with Street View Images for a Spatial Indicator of Urban Functions. Sustainability, 11.","DOI":"10.3390\/su11226424"},{"key":"ref_19","doi-asserted-by":"crossref","first-page":"155","DOI":"10.1559\/152304087783875273","article-title":"Experiments in migration mapping by computer","volume":"14","author":"Tobler","year":"1987","journal-title":"Am. Cartogr."},{"key":"ref_20","doi-asserted-by":"crossref","first-page":"265","DOI":"10.1007\/s41019-016-0022-0","article-title":"Big Data Reduction Methods: A Survey","volume":"1","author":"Liew","year":"2016","journal-title":"Data Sci. Eng."},{"key":"ref_21","doi-asserted-by":"crossref","first-page":"551","DOI":"10.1007\/s10115-017-1059-8","article-title":"Recent advances in feature selection and its applications","volume":"53","author":"Li","year":"2017","journal-title":"Knowl. Inf. Syst."},{"key":"ref_22","doi-asserted-by":"crossref","first-page":"83","DOI":"10.1016\/j.knosys.2016.05.056","article-title":"Instance selection of linear complexity for big data","volume":"107","year":"2016","journal-title":"Knowl.-Based Syst."},{"key":"ref_23","doi-asserted-by":"crossref","first-page":"1401","DOI":"10.3233\/JIFS-169137","article-title":"Learning from examples with data reduction and stacked generalization","volume":"32","author":"Czarnowski","year":"2017","journal-title":"J. Intell. Fuzzy Syst."},{"key":"ref_24","doi-asserted-by":"crossref","first-page":"133","DOI":"10.1007\/s10462-010-9165-y","article-title":"A review of instance selection methods","volume":"34","author":"Kittler","year":"2010","journal-title":"Artif. Intell. Rev."},{"key":"ref_25","doi-asserted-by":"crossref","first-page":"189","DOI":"10.1111\/j.1538-4632.1992.tb00261.x","article-title":"The Analysis of Spatial Association by Use of Distance Statistics","volume":"24","author":"Getis","year":"1992","journal-title":"Geogr. Anal."},{"key":"ref_26","doi-asserted-by":"crossref","first-page":"286","DOI":"10.1111\/j.1538-4632.1995.tb00912.x","article-title":"Local Spatial Autocorrelation Statistics: Distributional Issues and an Application","volume":"27","author":"Ord","year":"1995","journal-title":"Geogr. Anal."},{"key":"ref_27","doi-asserted-by":"crossref","first-page":"1707","DOI":"10.1080\/13658816.2011.645477","article-title":"Developing a parallel computational implementation of AMOEBA","volume":"26","author":"Widener","year":"2012","journal-title":"Int. J. Geogr. Inf. Sci."},{"key":"ref_28","doi-asserted-by":"crossref","first-page":"327","DOI":"10.1111\/j.1538-4632.2006.00689.x","article-title":"Using AMOEBA to Create a Spatial Weights Matrix and Identify Spatial Clusters","volume":"38","author":"Aldstadt","year":"2006","journal-title":"Geogr. Anal."},{"key":"ref_29","doi-asserted-by":"crossref","unstructured":"Amdahl, G.M. (1967, January 18\u201320). Validity of the single processor approach to achieving large scale computing capabilities. Proceedings of the AFIPS Spring Joint Computer Conference, Atlantic City, NJ, USA.","DOI":"10.1145\/1465482.1465560"},{"key":"ref_30","first-page":"107","article-title":"The anatomy of a large-scale hypertextual Web search engine","volume":"30","author":"Page","year":"1998","journal-title":"Comput. Netw."},{"key":"ref_31","unstructured":"Zipf, G.K. (1932). Selected Studies of the Principle of Relative Frequency in Language, Harvard Univ. Press."}],"container-title":["ISPRS International Journal of Geo-Information"],"original-title":[],"language":"en","link":[{"URL":"https:\/\/www.mdpi.com\/2220-9964\/9\/5\/299\/pdf","content-type":"unspecified","content-version":"vor","intended-application":"similarity-checking"}],"deposited":{"date-parts":[[2025,10,11]],"date-time":"2025-10-11T09:26:00Z","timestamp":1760174760000},"score":1,"resource":{"primary":{"URL":"https:\/\/www.mdpi.com\/2220-9964\/9\/5\/299"}},"subtitle":[],"short-title":[],"issued":{"date-parts":[[2020,5,6]]},"references-count":31,"journal-issue":{"issue":"5","published-online":{"date-parts":[[2020,5]]}},"alternative-id":["ijgi9050299"],"URL":"https:\/\/doi.org\/10.3390\/ijgi9050299","relation":{},"ISSN":["2220-9964"],"issn-type":[{"value":"2220-9964","type":"electronic"}],"subject":[],"published":{"date-parts":[[2020,5,6]]}}}