{"status":"ok","message-type":"work","message-version":"1.0.0","message":{"indexed":{"date-parts":[[2026,4,24]],"date-time":"2026-04-24T17:14:28Z","timestamp":1777050868146,"version":"3.51.4"},"reference-count":39,"publisher":"Association for Computing Machinery (ACM)","issue":"1","license":[{"start":{"date-parts":[[2020,8,12]],"date-time":"2020-08-12T00:00:00Z","timestamp":1597190400000},"content-version":"vor","delay-in-days":0,"URL":"https:\/\/www.acm.org\/publications\/policies\/copyright_policy#Background"}],"funder":[{"DOI":"10.13039\/100000001","name":"National Science Foundation","doi-asserted-by":"publisher","award":["IIS-1838222,CNS-1924694"],"award-info":[{"award-number":["IIS-1838222,CNS-1924694"]}],"id":[{"id":"10.13039\/100000001","id-type":"DOI","asserted-by":"publisher"}]},{"DOI":"10.13039\/100005825","name":"National Institute of Food and Agriculture","doi-asserted-by":"publisher","award":["A1521"],"award-info":[{"award-number":["A1521"]}],"id":[{"id":"10.13039\/100005825","id-type":"DOI","asserted-by":"publisher"}]}],"content-domain":{"domain":["dl.acm.org"],"crossmark-restriction":true},"short-container-title":["ACM Trans. Spatial Algorithms Syst."],"published-print":{"date-parts":[[2021,3,31]]},"abstract":"<jats:p>This article explores the use of deep learning to choose an appropriate spatial partitioning technique for big data. The exponential increase in the volumes of spatial datasets resulted in the development of big spatial data frameworks. These systems need to partition the data across machines to be able to scale out the computation. Unfortunately, there is no current method to automatically choose an appropriate partitioning technique based on the input data distribution.<\/jats:p>\n          <jats:p>This article addresses this problem by using deep learning to train a model that captures the relationship between the data distribution and the quality of the partitioning techniques. 
We propose a solution that runs in two phases, training and application. The offline training phase generates synthetic data based on diverse distributions, partitions them using six different partitioning techniques, and measures their quality using four quality metrics. At the same time, it summarizes the datasets using a histogram and well-designed skewness measures. The data summaries and the quality metrics are then used to train a deep learning model. The second phase uses this model to predict the best partitioning technique given a new dataset that needs to be partitioned. We run an extensive experimental evaluation on big spatial data, and we experimentally show the applicability of the proposed technique. We show that the proposed model outperforms the baseline method in terms of accuracy for choosing the best partitioning technique by only analyzing the summary of the datasets.<\/jats:p>","DOI":"10.1145\/3402126","type":"journal-article","created":{"date-parts":[[2020,8,12]],"date-time":"2020-08-12T16:29:28Z","timestamp":1597249768000},"page":"1-37","update-policy":"https:\/\/doi.org\/10.1145\/crossmark-policy","source":"Crossref","is-referenced-by-count":14,"title":["Using Deep Learning for Big Spatial Data Partitioning"],"prefix":"10.1145","volume":"7","author":[{"given":"Tin","family":"Vu","sequence":"first","affiliation":[{"name":"University of California, Riverside"}]},{"given":"Alberto","family":"Belussi","sequence":"additional","affiliation":[{"name":"University of Verona, Verona VR, Italy"}]},{"given":"Sara","family":"Migliorini","sequence":"additional","affiliation":[{"name":"University of Verona, Verona VR, Italy"}]},{"ORCID":"https:\/\/orcid.org\/0000-0002-6584-1455","authenticated-orcid":false,"given":"Ahmed","family":"Eldawy","sequence":"additional","affiliation":[{"name":"University of California, 
Riverside"}]}],"member":"320","published-online":{"date-parts":[[2020,8,12]]},"reference":[{"key":"e_1_2_1_1_1","doi-asserted-by":"publisher","DOI":"10.1109\/ICDE.2000.839399"},{"key":"e_1_2_1_2_1","doi-asserted-by":"publisher","DOI":"10.14778\/2831360.2831361"},{"key":"e_1_2_1_3_1","doi-asserted-by":"publisher","DOI":"10.1145\/93597.98741"},{"key":"e_1_2_1_4_1","doi-asserted-by":"publisher","DOI":"10.1145\/1559845.1559929"},{"key":"e_1_2_1_5_1","doi-asserted-by":"publisher","DOI":"10.1145\/279339.279342"},{"key":"e_1_2_1_6_1","doi-asserted-by":"publisher","DOI":"10.1145\/3274895.3274923"},{"key":"e_1_2_1_7_1","volume-title":"Cost estimation of spatial join in SpatialHadoop. GeoInformatica","author":"Belussi Alberto","year":"2020"},{"key":"e_1_2_1_8_1","doi-asserted-by":"publisher","DOI":"10.3390\/ijgi9040201"},{"key":"e_1_2_1_9_1","doi-asserted-by":"publisher","DOI":"10.1145\/3080546.3080553"},{"key":"e_1_2_1_10_1","unstructured":"Graham Cormode. 2011. Sketch techniques for approximate query processing. In Foundations and Trends in Databases. NOW 7."},{"key":"e_1_2_1_11_1","doi-asserted-by":"publisher","DOI":"10.14778\/2824032.2824057"},{"key":"e_1_2_1_12_1","doi-asserted-by":"publisher","DOI":"10.1145\/2525314.2525349"},{"key":"e_1_2_1_13_1","volume-title":"Proceedings of the 31st IEEE International Conference on Data Engineering \u201915)","author":"Eldawy Ahmed"},{"key":"e_1_2_1_14_1","doi-asserted-by":"publisher","DOI":"10.1561\/1900000054"},{"key":"e_1_2_1_15_1","volume-title":"Spatial Join with Hadoop. 
Springer International Publishing","author":"Eldawy A."},{"key":"e_1_2_1_16_1","doi-asserted-by":"publisher","DOI":"10.1109\/ICDE.2016.7498274"},{"key":"e_1_2_1_17_1","doi-asserted-by":"publisher","DOI":"10.14778\/2733085.2733096"},{"key":"e_1_2_1_18_1","doi-asserted-by":"publisher","DOI":"10.1145\/3377000.3377005"},{"key":"e_1_2_1_19_1","volume-title":"Deep Learning with Keras","author":"\u00a0al Antonio Gulli"},{"key":"e_1_2_1_20_1","doi-asserted-by":"publisher","DOI":"10.1145\/253262.253274"},{"key":"e_1_2_1_21_1","volume-title":"Proceedings of the 2019 CHI Conference on Human Factors in Computing Systems (CHI\u201919)","author":"Hu Kevin Zeng"},{"key":"e_1_2_1_22_1","doi-asserted-by":"publisher","DOI":"10.1145\/3183713.3196909"},{"key":"e_1_2_1_23_1","doi-asserted-by":"publisher","DOI":"10.14778\/2733085.2733087"},{"key":"e_1_2_1_24_1","doi-asserted-by":"publisher","DOI":"10.14778\/2336664.2336674"},{"key":"e_1_2_1_25_1","doi-asserted-by":"publisher","DOI":"10.1093\/biomet\/37.1-2.17"},{"key":"e_1_2_1_26_1","volume-title":"Proceedings of the 23rd International Conference on Very Large Data Bases. Morgan Kaufmann","author":"Poosala Viswanath"},{"key":"e_1_2_1_27_1","unstructured":"Sebastian Raschka. 2018. Model Evaluation Model Selection and Algorithm\u00a0Selection in Machine Learning. arXiv:cs.LG\/1811.12808"},{"key":"e_1_2_1_28_1","volume-title":"GeoFlink: A framework for the real-time processing of spatial streams. 
arXiv preprint arXiv:2004.03352 1, 1","author":"Shaikh Salman Ahmed","year":"2020"},{"key":"e_1_2_1_29_1","doi-asserted-by":"publisher","DOI":"10.1109\/BigData47090.2019.9006498"},{"key":"e_1_2_1_30_1","doi-asserted-by":"publisher","DOI":"10.1145\/3267809.3275464"},{"key":"e_1_2_1_31_1","doi-asserted-by":"publisher","DOI":"10.14778\/3342263.3342635"},{"key":"e_1_2_1_32_1","doi-asserted-by":"crossref","unstructured":"M. Tang Y. Yu W. G. Aref A. R. Mahmood Q. M. Malluhi and M. Ouzzani. 2019. LocationSpark: In-memory distributed spatial query processing and optimization. CoRR.","DOI":"10.3389\/fdata.2020.00030"},{"key":"e_1_2_1_33_1","doi-asserted-by":"publisher","DOI":"10.1145\/2666310.2666365"},{"key":"e_1_2_1_34_1","doi-asserted-by":"publisher","DOI":"10.1145\/3299869.3300104"},{"key":"e_1_2_1_35_1","doi-asserted-by":"publisher","DOI":"10.1145\/3274895.3274984"},{"key":"e_1_2_1_36_1","volume-title":"Proceedings of the 1st ACM SIGSPATIAL International Workshop on Spatial Gems (SpatialGems","author":"Vu Tin","year":"2019"},{"key":"e_1_2_1_37_1","doi-asserted-by":"publisher","DOI":"10.1109\/JPROC.2015.2487976"},{"key":"e_1_2_1_38_1","doi-asserted-by":"publisher","DOI":"10.1145\/2882903.2915237"},{"key":"e_1_2_1_39_1","doi-asserted-by":"publisher","DOI":"10.1145\/2820783.2820860"}],"container-title":["ACM Transactions on Spatial Algorithms and 
Systems"],"original-title":[],"language":"en","link":[{"URL":"https:\/\/dl.acm.org\/doi\/10.1145\/3402126","content-type":"unspecified","content-version":"vor","intended-application":"text-mining"},{"URL":"https:\/\/dl.acm.org\/doi\/pdf\/10.1145\/3402126","content-type":"application\/pdf","content-version":"vor","intended-application":"syndication"},{"URL":"https:\/\/dl.acm.org\/doi\/pdf\/10.1145\/3402126","content-type":"unspecified","content-version":"vor","intended-application":"similarity-checking"}],"deposited":{"date-parts":[[2025,6,17]],"date-time":"2025-06-17T22:03:13Z","timestamp":1750197793000},"score":1,"resource":{"primary":{"URL":"https:\/\/dl.acm.org\/doi\/10.1145\/3402126"}},"subtitle":[],"short-title":[],"issued":{"date-parts":[[2020,8,12]]},"references-count":39,"journal-issue":{"issue":"1","published-print":{"date-parts":[[2021,3,31]]}},"alternative-id":["10.1145\/3402126"],"URL":"https:\/\/doi.org\/10.1145\/3402126","relation":{},"ISSN":["2374-0353","2374-0361"],"issn-type":[{"value":"2374-0353","type":"print"},{"value":"2374-0361","type":"electronic"}],"subject":[],"published":{"date-parts":[[2020,8,12]]},"assertion":[{"value":"2019-05-01","order":0,"name":"received","label":"Received","group":{"name":"publication_history","label":"Publication History"}},{"value":"2020-05-01","order":1,"name":"accepted","label":"Accepted","group":{"name":"publication_history","label":"Publication History"}},{"value":"2020-08-12","order":2,"name":"published","label":"Published","group":{"name":"publication_history","label":"Publication History"}}]}}