{"status":"ok","message-type":"work","message-version":"1.0.0","message":{"indexed":{"date-parts":[[2026,3,10]],"date-time":"2026-03-10T17:10:58Z","timestamp":1773162658706,"version":"3.50.1"},"reference-count":27,"publisher":"Frontiers Media SA","license":[{"start":{"date-parts":[[2023,11,20]],"date-time":"2023-11-20T00:00:00Z","timestamp":1700438400000},"content-version":"vor","delay-in-days":0,"URL":"https:\/\/creativecommons.org\/licenses\/by\/4.0\/"}],"content-domain":{"domain":["frontiersin.org"],"crossmark-restriction":true},"short-container-title":["Front. Big Data"],"abstract":"<jats:p>With the popularization of big data technology, agricultural data processing systems have become more intelligent. In this study, a data processing method for farmland environmental monitoring based on improved Spark components is designed. It introduces the FAST-Join (Join critical filtering sampling partition optimization) algorithm in the Spark component for equivalence association query optimization to improve the operating efficiency of the Spark component and cluster. The experimental results show that the amount of data written and read in Shuffle by Spark optimized by the FAST-join algorithm only accounts for 0.958 and 1.384% of the original data volume on average, and the calculation speed is 202.11% faster than the original. The average data processing time and occupied memory size of the Spark cluster are reduced by 128.22 and 76.75% compared with the originals. It also compared the cluster performance of the FAST-join and Equi-join algorithms. The Spark cluster optimized by the FAST-join algorithm reduced the processing time and occupied memory size by an average of 68.74 and 37.80% compared with the Equi-join algorithm, which shows that the FAST-join algorithm can effectively improve the efficiency of inter-data table querying and cluster computing.<\/jats:p>","DOI":"10.3389\/fdata.2023.1282352","type":"journal-article","created":{"date-parts":[[2023,11,24]],"date-time":"2023-11-24T10:36:21Z","timestamp":1700822181000},"update-policy":"https:\/\/doi.org\/10.3389\/crossmark-policy","source":"Crossref","is-referenced-by-count":3,"title":["Design of a data processing method for the farmland environmental monitoring based on improved Spark components"],"prefix":"10.3389","volume":"6","author":[{"given":"Ruipeng","family":"Tang","sequence":"first","affiliation":[],"role":[{"role":"author","vocabulary":"crossref"}]},{"given":"Narendra Kumar","family":"Aridas","sequence":"additional","affiliation":[],"role":[{"role":"author","vocabulary":"crossref"}]},{"given":"Mohamad Sofian Abu","family":"Talip","sequence":"additional","affiliation":[],"role":[{"role":"author","vocabulary":"crossref"}]}],"member":"1965","published-online":{"date-parts":[[2023,11,20]]},"reference":[{"key":"B1","doi-asserted-by":"crossref","DOI":"10.1007\/978-1-4842-6888-9","volume-title":"Monitoring Cloud-Native Applications: Lead Agile Operations Confidently Using Open Source Software","author":"Chakraborty","year":"2021"},{"key":"B2","first-page":"274","article-title":"\u201cResearch on the application of agricultural big data processing with hadoop and spark,\u201d","volume-title":"2019 IEEE International Conference on Artificial Intelligence and Computer Applications (I.C.A.I.C.A.), Dalian, China","author":"Cheng","year":"2019"},{"key":"B3","author":"Chunhui","year":"2015","journal-title":"Application research of big data in agricultural internet of things"},{"key":"B4","first-page":"137","article-title":"\u201cMapReduce: simplified data processing on large clusters,\u201d","author":"Dean","year":"2004","journal-title":"6th Symposium on Operating System Design and Implementation, San Francisco, U.S.A"},{"key":"B5","first-page":"429","article-title":"Big data environment for agricultural soil analysis from C.T. digital images","volume":"12","author":"Gabriel","year":"2016","journal-title":"Int. Conf. Semant. Comput."},{"key":"B6","article-title":"\u201cEquijoin optimization on Spark,\u201d","author":"Haoqiong","year":"2014","journal-title":"Journal of East China Normal University (Natural Science Edition)"},{"key":"B7","doi-asserted-by":"publisher","first-page":"2036","DOI":"10.1016\/j.matpr.2020.03.634","article-title":"Performance analysis of NoSQL and relational databases with MongoDB and MySQL","volume":"24","author":"Jose","year":"2020","journal-title":"Mater. Today"},{"key":"B8","first-page":"214","article-title":"Spark-based big data hybrid computing model","volume":"4","author":"Jun","year":"2015","journal-title":"Comput. Syst. Applic."},{"key":"B9","article-title":"I.O.T. data access system based on netty and kafka","author":"Kaicheng","year":"2020","journal-title":"Comput. Eng. Applic."},{"key":"B10","doi-asserted-by":"publisher","first-page":"6037","DOI":"10.32604\/cmc.2022.031626","article-title":"The impact of check bits on the performance of bloom filter","volume":"73","author":"Khan","year":"2022","journal-title":"Comput. Mater. Continua"},{"key":"B11","doi-asserted-by":"crossref","first-page":"22","DOI":"10.1109\/EITech.2016.7519585","article-title":"\u201cA profile-based Big data architecture for agricultural context,\u201d","volume-title":"2016 International Conference on Electrical and Information Technologies (ICEIT)","author":"Lamrhari","year":"2016"},{"key":"B12","author":"Leifeng","year":"2016","journal-title":"Research on key technologies of big data for agriculture"},{"key":"B13","doi-asserted-by":"publisher","first-page":"3175","DOI":"10.3390\/electronics10243175","article-title":"Smart manufacturing and tactile internet based on 5G in industry 4.0: Challenges, applications and new trends","volume":"10","author":"Mourtzis","year":"2021","journal-title":"Electronics"},{"key":"B14","doi-asserted-by":"crossref","first-page":"100","DOI":"10.1109\/FPL.2019.00025","article-title":"\u201cAccelerating the merge phase of sort-merge join,\u201d","volume-title":"2019 29th International Conference on Field Programmable Logic and Applications (FPL)","author":"Papaphilippou","year":"2019"},{"key":"B15","doi-asserted-by":"publisher","first-page":"133","DOI":"10.1007\/s11227-020-03253-7","article-title":"A Spark-based Apriori algorithm with reduced shuffle overhead","volume":"77","author":"Raj","year":"2021","journal-title":"J. Supercomput."},{"key":"B16","first-page":"17555","article-title":"Hash layers for large sparse models","volume":"34","author":"Roller","year":"2021","journal-title":"Adv. Neur. Inf. Proc. Syst."},{"key":"B17","doi-asserted-by":"crossref","first-page":"755","DOI":"10.1007\/978-3-030-37051-0_85","article-title":"\u201cRDD-Eclat: approaches to parallelize Eclat algorithm on spark RDD framework,\u201d","volume-title":"Second International Conference on Computer Networks and Communication Technologies: ICCNCT 2019","author":"Singh","year":"2020"},{"key":"B18","doi-asserted-by":"publisher","DOI":"10.1145\/3183713.3193546","article-title":"\u201cGeoFlux: hands-off data integration leveraging join key knowledge,\u201d","author":"Song","year":"2018","journal-title":"Proceedings of the 2018 International Conference on Management of Data"},{"key":"B19","first-page":"13","article-title":"Research on 5G - oriented big data analysis method system of I.O.T","volume":"5","author":"Tao","year":"2020","journal-title":"Design Technol. Posts Telecommun."},{"key":"B20","article-title":"Research and application of hierarchical clustering algorithm based on spark","author":"Weihua","year":"2020","journal-title":"Comput. Sci. Appl."},{"key":"B21","first-page":"1","article-title":"Agricultural big data and its application prospects","volume":"43","author":"Wensheng","year":"2015","journal-title":"Jiangsu Agric. Sci."},{"key":"B22","first-page":"173","article-title":"Architecture and platform construction of agricultural big data application","volume":"41","author":"Xiangbao","year":"2014","journal-title":"Guangdong Agric. Sci"},{"key":"B23","article-title":"\u201cGlobal transaction log of multi-master cloud database,\u201d","author":"Xiaoxian","year":"2020","journal-title":"Journal of East China Normal University (Natural Science Edition)"},{"key":"B24","first-page":"10","article-title":"The technological innovation of agricultural information service in the era of big data","volume":"16","author":"Xiufeng","year":"2014","journal-title":"China Agric. Sci. Technol. Guide J."},{"key":"B25","doi-asserted-by":"publisher","first-page":"242","DOI":"10.3969\/j.issn.1000-3428.2011.12.082","article-title":"Massive agricultural data resource management platform based on Hadoop","volume":"37","author":"Yang","year":"2011","journal-title":"Comput. Eng."},{"key":"B26","article-title":"Design of intelligent grading algorithm for jelly orange based on decision tree","author":"Zhichao","year":"2022","journal-title":"Chinese Sci. Technol. Periodic. Database Indust. A 7"},{"key":"B27","article-title":"Optimization of the equivalent join process of two tables based on spark","author":"Zidong","year":"2019","journal-title":"Appl. Res. Comput"}],"container-title":["Frontiers in Big Data"],"original-title":[],"link":[{"URL":"https:\/\/www.frontiersin.org\/articles\/10.3389\/fdata.2023.1282352\/full","content-type":"unspecified","content-version":"vor","intended-application":"similarity-checking"}],"deposited":{"date-parts":[[2023,11,24]],"date-time":"2023-11-24T10:36:47Z","timestamp":1700822207000},"score":1,"resource":{"primary":{"URL":"https:\/\/www.frontiersin.org\/articles\/10.3389\/fdata.2023.1282352\/full"}},"subtitle":[],"short-title":[],"issued":{"date-parts":[[2023,11,20]]},"references-count":27,"alternative-id":["10.3389\/fdata.2023.1282352"],"URL":"https:\/\/doi.org\/10.3389\/fdata.2023.1282352","relation":{},"ISSN":["2624-909X"],"issn-type":[{"value":"2624-909X","type":"electronic"}],"subject":[],"published":{"date-parts":[[2023,11,20]]},"article-number":"1282352"}}