{"status":"ok","message-type":"work","message-version":"1.0.0","message":{"indexed":{"date-parts":[[2026,4,7]],"date-time":"2026-04-07T13:43:49Z","timestamp":1775569429231,"version":"3.50.1"},"reference-count":59,"publisher":"MDPI AG","issue":"20","license":[{"start":{"date-parts":[[2019,10,22]],"date-time":"2019-10-22T00:00:00Z","timestamp":1571702400000},"content-version":"vor","delay-in-days":0,"URL":"https:\/\/creativecommons.org\/licenses\/by\/4.0\/"}],"content-domain":{"domain":[],"crossmark-restriction":false},"short-container-title":["Sensors"],"abstract":"<jats:p>Autonomous harvesting shows a promising prospect in the future development of the agriculture industry, while the vision system is one of the most challenging components in the autonomous harvesting technologies. This work proposes a multi-function network to perform the real-time detection and semantic segmentation of apples and branches in orchard environments by using the visual sensor. The developed detection and segmentation network utilises the atrous spatial pyramid pooling and the gate feature pyramid network to enhance feature extraction ability of the network. To improve the real-time computation performance of the network model, a lightweight backbone network based on the residual network architecture is developed. From the experimental results, the detection and segmentation network with ResNet-101 backbone outperformed on the detection and segmentation tasks, achieving an F1 score of 0.832 on the detection of apples and 87.6% and 77.2% on the semantic segmentation of apples and branches, respectively. The network model with lightweight backbone showed the best computation efficiency in the results. It achieved an F1 score of 0.827 on the detection of apples and 86.5% and 75.7% on the segmentation of apples and branches, respectively. The weights size and computation time of the network model with lightweight backbone were 12.8 M and 32 ms, respectively. 
The experimental results show that the detection and segmentation network can effectively perform the real-time detection and segmentation of apples and branches in orchards.<\/jats:p>","DOI":"10.3390\/s19204599","type":"journal-article","created":{"date-parts":[[2019,10,23]],"date-time":"2019-10-23T11:46:59Z","timestamp":1571831219000},"page":"4599","update-policy":"https:\/\/doi.org\/10.3390\/mdpi_crossmark_policy","source":"Crossref","is-referenced-by-count":118,"title":["Fruit Detection and Segmentation for Apple Harvesting Using Visual Sensor in Orchards"],"prefix":"10.3390","volume":"19","author":[{"given":"Hanwen","family":"Kang","sequence":"first","affiliation":[{"name":"Laboratory of Motion Generation and Analysis, Faculty of Engineering, Monash University, Clayton, VIC 3800, Australia"}]},{"given":"Chao","family":"Chen","sequence":"additional","affiliation":[{"name":"Laboratory of Motion Generation and Analysis, Faculty of Engineering, Monash University, Clayton, VIC 3800, Australia"}]}],"member":"1968","published-online":{"date-parts":[[2019,10,22]]},"reference":[{"key":"ref_1","unstructured":"ABARES (2018). Australian Vegetable Growing Farms: An Economic Survey, 2016\u201317 and 2017\u201318."},{"key":"ref_2","doi-asserted-by":"crossref","first-page":"70","DOI":"10.1016\/j.compag.2018.02.016","article-title":"Deep learning in agriculture: A survey","volume":"147","author":"Kamilaris","year":"2018","journal-title":"Comput. Electron. Agric."},{"key":"ref_3","doi-asserted-by":"crossref","first-page":"1123","DOI":"10.1002\/rob.21709","article-title":"Performance evaluation of a harvesting robot for sweet pepper","volume":"34","author":"Bac","year":"2017","journal-title":"J. Field Robot."},{"key":"ref_4","doi-asserted-by":"crossref","first-page":"633","DOI":"10.1016\/j.compag.2016.07.024","article-title":"Characterizing apple picking patterns for robotic harvesting","volume":"127","author":"Li","year":"2016","journal-title":"Comput. Electron. 
Agric."},{"key":"ref_5","doi-asserted-by":"crossref","unstructured":"Lin, G., Tang, Y., Zou, X., Xiong, J., and Li, J. (2019). Guava detection and pose estimation using a low-cost RGB-D sensor in the field. Sensors, 19.","DOI":"10.3390\/s19020428"},{"key":"ref_6","doi-asserted-by":"crossref","unstructured":"Vit, A., and Shani, G. (2018). Comparing RGB-D Sensors for Close Range Outdoor Agricultural Phenotyping. Sensors, 18.","DOI":"10.20944\/preprints201810.0664.v1"},{"key":"ref_7","doi-asserted-by":"crossref","first-page":"311","DOI":"10.1016\/j.compag.2016.06.022","article-title":"A review of key techniques of vision-based control for harvesting robot","volume":"127","author":"Zhao","year":"2016","journal-title":"Comput. Electron. Agric."},{"key":"ref_8","doi-asserted-by":"crossref","unstructured":"Khalid, S., Khalil, T., and Nasreen, S. (2014, August 27\u201329). A survey of feature selection and feature extraction techniques in machine learning. Proceedings of the 2014 Science and Information Conference, London, UK.","DOI":"10.1109\/SAI.2014.6918213"},{"key":"ref_9","first-page":"506","article-title":"PCA-SIFT: A more distinctive representation for local image descriptors","volume":"4","author":"Ke","year":"2004","journal-title":"CVPR (2)"},{"key":"ref_10","doi-asserted-by":"crossref","unstructured":"Pass, G., Zabih, R., and Miller, J. (1996, November 18\u201322). Comparing Images Using Color Coherence Vectors. Proceedings of the Fourth ACM International Conference on Multimedia, Boston, MA, USA.","DOI":"10.1145\/244130.244148"},{"key":"ref_11","doi-asserted-by":"crossref","first-page":"2037","DOI":"10.1109\/TPAMI.2006.244","article-title":"Face description with local binary patterns: Application to face recognition","volume":"28","author":"Ahonen","year":"2006","journal-title":"IEEE Trans. Pattern Anal. Mach. 
Intell."},{"key":"ref_12","doi-asserted-by":"crossref","first-page":"568","DOI":"10.1007\/s11119-012-9269-2","article-title":"Using colour features of cv. \u2018Gala\u2019 apple fruits in an orchard in image processing to predict yield","volume":"13","author":"Zhou","year":"2012","journal-title":"Precis. Agric."},{"key":"ref_13","doi-asserted-by":"crossref","first-page":"203","DOI":"10.1016\/j.biosystemseng.2013.12.008","article-title":"Automatic fruit recognition and counting from multiple images","volume":"118","author":"Song","year":"2014","journal-title":"Biosyst. Eng."},{"key":"ref_14","doi-asserted-by":"crossref","unstructured":"Luo, L., Tang, Y., Zou, X., Wang, C., Zhang, P., and Feng, W. (2016). Robust grape cluster detection in a vineyard by combining the AdaBoost framework and multiple color components. Sensors, 16.","DOI":"10.3390\/s16122098"},{"key":"ref_15","doi-asserted-by":"crossref","first-page":"1062","DOI":"10.1007\/s11119-018-9574-5","article-title":"Detection and counting of immature green citrus fruit based on the local binary patterns (lbp) feature using illumination-normalized images","volume":"19","author":"Wang","year":"2018","journal-title":"Precis. Agric."},{"key":"ref_16","doi-asserted-by":"crossref","unstructured":"Szegedy, C., Ioffe, S., Vanhoucke, V., and Alemi, A.A. (2017, February 4\u20139). Inception-v4, inception-resnet and the impact of residual connections on learning. 
Proceedings of the Thirty-First AAAI Conference on Artificial Intelligence, San Francisco, CA, USA.","DOI":"10.1609\/aaai.v31i1.11231"},{"key":"ref_17","doi-asserted-by":"crossref","first-page":"11","DOI":"10.1016\/j.neucom.2016.12.038","article-title":"A survey of deep neural network architectures and their applications","volume":"234","author":"Liu","year":"2017","journal-title":"Neurocomputing"},{"key":"ref_18","doi-asserted-by":"crossref","first-page":"87","DOI":"10.1007\/s13735-017-0141-z","article-title":"A review of semantic segmentation using deep neural networks","volume":"7","author":"Guo","year":"2018","journal-title":"Int. J. Multimed. Inf. Retr."},{"key":"ref_19","doi-asserted-by":"crossref","first-page":"146","DOI":"10.1016\/j.inffus.2017.10.006","article-title":"A survey on deep learning for big data","volume":"42","author":"Zhang","year":"2018","journal-title":"Inf. Fusion"},{"key":"ref_20","doi-asserted-by":"crossref","unstructured":"Girshick, R., Donahue, J., Darrell, T., and Malik, J. (2014, June 24\u201327). Rich feature hierarchies for accurate object detection and semantic segmentation. Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition, Columbus, OH, USA.","DOI":"10.1109\/CVPR.2014.81"},{"key":"ref_21","doi-asserted-by":"crossref","unstructured":"Girshick, R. (2015, December 7\u201313). Fast r-cnn. Proceedings of the IEEE International Conference on Computer Vision, Santiago, Chile.","DOI":"10.1109\/ICCV.2015.169"},{"key":"ref_22","unstructured":"Ren, S., He, K., Girshick, R., and Sun, J. (2015). Faster r-cnn: Towards real-time object detection with region proposal networks. Advances in Neural Information Processing Systems, MIT Press."},{"key":"ref_23","doi-asserted-by":"crossref","unstructured":"He, K., Gkioxari, G., Doll\u00e1r, P., and Girshick, R. (2017, October 22\u201329). Mask r-cnn. 
Proceedings of the IEEE International Conference on Computer Vision, Venice, Italy.","DOI":"10.1109\/ICCV.2017.322"},{"key":"ref_24","doi-asserted-by":"crossref","unstructured":"Sa, I., Ge, Z., Dayoub, F., Upcroft, B., Perez, T., and McCool, C. (2016). Deepfruits: A fruit detection system using deep neural networks. Sensors, 16.","DOI":"10.3390\/s16081222"},{"key":"ref_25","doi-asserted-by":"crossref","unstructured":"Bargoti, S., and Underwood, J. (2017, May 29\u2013June 3). Deep fruit detection in orchards. Proceedings of the 2017 IEEE International Conference on Robotics and Automation (ICRA), Singapore.","DOI":"10.1109\/ICRA.2017.7989417"},{"key":"ref_26","doi-asserted-by":"crossref","first-page":"104846","DOI":"10.1016\/j.compag.2019.06.001","article-title":"Fruit detection for strawberry harvesting robot in non-structural environment based on Mask-RCNN","volume":"163","author":"Yu","year":"2019","journal-title":"Comput. Electron. Agric."},{"key":"ref_27","doi-asserted-by":"crossref","unstructured":"Liu, W., Anguelov, D., Erhan, D., Szegedy, C., Reed, S., Fu, C.Y., and Berg, A.C. (2016). Ssd: Single shot multibox detector. European Conference on Computer Vision, Springer.","DOI":"10.1007\/978-3-319-46448-0_2"},{"key":"ref_28","unstructured":"Fu, C.Y., Liu, W., Ranga, A., Tyagi, A., and Berg, A.C. (2017). Dssd: Deconvolutional single shot detector. arXiv."},{"key":"ref_29","unstructured":"Redmon, J., Divvala, S., Girshick, R., and Farhadi, A. (2016, June 26\u2013July 1). You only look once: Unified, real-time object detection. Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition, Las Vegas, NV, USA."},{"key":"ref_30","doi-asserted-by":"crossref","unstructured":"Redmon, J., and Farhadi, A. (2017, July 21\u201326). YOLO9000: Better, faster, stronger. Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition, Honolulu, HI, USA.","DOI":"10.1109\/CVPR.2017.690"},{"key":"ref_31","unstructured":"Redmon, J., and Farhadi, A. (2018). 
Yolov3: An incremental improvement. arXiv."},{"key":"ref_32","doi-asserted-by":"crossref","first-page":"417","DOI":"10.1016\/j.compag.2019.01.012","article-title":"Apple detection during different growth stages in orchards using the improved YOLO-V3 model","volume":"157","author":"Tian","year":"2019","journal-title":"Comput. Electron. Agric."},{"key":"ref_33","doi-asserted-by":"crossref","first-page":"1107","DOI":"10.1007\/s11119-019-09642-0","article-title":"Deep learning for real-time fruit detection and orchard fruit load estimation: Benchmarking of \u2018MangoYOLO\u2019","volume":"20","author":"Koirala","year":"2019","journal-title":"Precis. Agric."},{"key":"ref_34","doi-asserted-by":"crossref","unstructured":"Milletari, F., Navab, N., and Ahmadi, S.A. (2016, October 25\u201328). V-net: Fully convolutional neural networks for volumetric medical image segmentation. Proceedings of the 2016 Fourth International Conference on 3D Vision (3DV), Stanford, CA, USA.","DOI":"10.1109\/3DV.2016.79"},{"key":"ref_35","doi-asserted-by":"crossref","first-page":"154","DOI":"10.1007\/s11263-013-0620-5","article-title":"Selective search for object recognition","volume":"104","author":"Uijlings","year":"2013","journal-title":"Int. J. Comput. Vis."},{"key":"ref_36","doi-asserted-by":"crossref","unstructured":"Long, J., Shelhamer, E., and Darrell, T. (2015, June 7\u201312). Fully convolutional networks for semantic segmentation. Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition, Boston, MA, USA.","DOI":"10.1109\/CVPR.2015.7298965"},{"key":"ref_37","doi-asserted-by":"crossref","first-page":"2481","DOI":"10.1109\/TPAMI.2016.2644615","article-title":"Segnet: A deep convolutional encoder-decoder architecture for image segmentation","volume":"39","author":"Badrinarayanan","year":"2017","journal-title":"IEEE Trans. Pattern Anal. Mach. 
Intell."},{"key":"ref_38","doi-asserted-by":"crossref","first-page":"834","DOI":"10.1109\/TPAMI.2017.2699184","article-title":"Deeplab: Semantic image segmentation with deep convolutional nets, atrous convolution, and fully connected crfs","volume":"40","author":"Chen","year":"2017","journal-title":"IEEE Trans. Pattern Anal. Mach. Intell."},{"key":"ref_39","unstructured":"Chen, L.C., Papandreou, G., Schroff, F., and Adam, H. (2017). Rethinking atrous convolution for semantic image segmentation. arXiv."},{"key":"ref_40","doi-asserted-by":"crossref","unstructured":"Chen, L.C., Zhu, Y., Papandreou, G., Schroff, F., and Adam, H. (2018, September 8\u201314). Encoder-decoder with atrous separable convolution for semantic image segmentation. Proceedings of the European Conference on Computer Vision (ECCV), Munich, Germany.","DOI":"10.1007\/978-3-030-01234-2_49"},{"key":"ref_41","doi-asserted-by":"crossref","unstructured":"Ronneberger, O., Fischer, P., and Brox, T. (2015, October 5\u20139). U-net: Convolutional networks for biomedical image segmentation. Proceedings of the International Conference on Medical Image Computing and Computer-Assisted Intervention, Munich, Germany.","DOI":"10.1007\/978-3-319-24574-4_28"},{"key":"ref_42","doi-asserted-by":"crossref","unstructured":"McCool, C., Sa, I., Dayoub, F., Lehnert, C., Perez, T., and Upcroft, B. (2016, May 16\u201321). Visual detection of occluded crop: For automated harvesting. Proceedings of the 2016 IEEE International Conference on Robotics and Automation (ICRA), Stockholm, Sweden.","DOI":"10.1109\/ICRA.2016.7487405"},{"key":"ref_43","doi-asserted-by":"crossref","first-page":"1039","DOI":"10.1002\/rob.21699","article-title":"Image segmentation for fruit detection and yield estimation in apple orchards","volume":"34","author":"Bargoti","year":"2017","journal-title":"J. 
Field Robot."},{"key":"ref_44","doi-asserted-by":"crossref","first-page":"053028","DOI":"10.1117\/1.JEI.26.5.053028","article-title":"DeepCotton: In-field cotton segmentation using deep fully convolutional network","volume":"26","author":"Li","year":"2017","journal-title":"J. Electron. Imaging"},{"key":"ref_45","doi-asserted-by":"crossref","unstructured":"Zeiler, M.D., and Fergus, R. (2014, September 6\u201312). Visualizing and understanding convolutional networks. Proceedings of the European Conference on Computer Vision, Zurich, Switzerland.","DOI":"10.1007\/978-3-319-10590-1_53"},{"key":"ref_46","unstructured":"Yao, J., Yu, Z., Yu, J., and Tao, D. (2019). Single Pixel Reconstruction for One-stage Instance Segmentation. arXiv."},{"key":"ref_47","doi-asserted-by":"crossref","first-page":"1735","DOI":"10.1162\/neco.1997.9.8.1735","article-title":"Long short-term memory","volume":"9","author":"Hochreiter","year":"1997","journal-title":"Neural Comput."},{"key":"ref_48","doi-asserted-by":"crossref","unstructured":"Cho, K., Van Merri\u00ebnboer, B., Gulcehre, C., Bahdanau, D., Bougares, F., Schwenk, H., and Bengio, Y. (2014). Learning phrase representations using RNN encoder-decoder for statistical machine translation. arXiv.","DOI":"10.3115\/v1\/D14-1179"},{"key":"ref_49","doi-asserted-by":"crossref","unstructured":"Zhao, H., Shi, J., Qi, X., Wang, X., and Jia, J. (2017, July 21\u201326). Pyramid scene parsing network. Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition, Honolulu, HI, USA.","DOI":"10.1109\/CVPR.2017.660"},{"key":"ref_50","unstructured":"He, K., Zhang, X., Ren, S., and Sun, J. (2016, June 26\u2013July 1). Deep residual learning for image recognition. Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition, Las Vegas, NV, USA."},{"key":"ref_51","doi-asserted-by":"crossref","unstructured":"Wu, Y., and He, K. (2018, September 8\u201314). Group normalization. 
Proceedings of the European Conference on Computer Vision (ECCV), Munich, Germany.","DOI":"10.1007\/978-3-030-01261-8_1"},{"key":"ref_52","unstructured":"Silberman, N., and Guadarrama, S. (2019, May 21). TensorFlow-Slim Image Classification Model Library. Available online: https:\/\/github.com\/tensorflow\/models\/tree\/master\/research\/slim."},{"key":"ref_53","unstructured":"(2019, January 17). Tensorflow-yolo-v3. Available online: https:\/\/github.com\/mystic123\/tensorflow-yolo-v3."},{"key":"ref_54","unstructured":"Ren, S., He, K., Girshick, R., and Sun, J. (2018, December 29). Py-Faster-Rcnn. Available online: https:\/\/github.com\/rbgirshick\/py-faster-rcnn."},{"key":"ref_55","unstructured":"(2018, March 14). TF Image Segmentation: Image Segmentation Framework. Available online: https:\/\/github.com\/warmspringwinds\/tf-image-segmentation."},{"key":"ref_56","doi-asserted-by":"crossref","unstructured":"Garcia-Garcia, A., Orts-Escolano, S., Oprea, S., Villena-Martinez, V., and Garcia-Rodriguez, J. (2017). A review on deep learning techniques applied to semantic segmentation. arXiv.","DOI":"10.1016\/j.asoc.2018.05.018"},{"key":"ref_57","doi-asserted-by":"crossref","first-page":"303","DOI":"10.1007\/s11263-009-0275-4","article-title":"The pascal visual object classes (voc) challenge","volume":"88","author":"Everingham","year":"2010","journal-title":"Int. J. Comput. Vis."},{"key":"ref_58","first-page":"37","article-title":"Evaluation: From precision, recall and F-measure to ROC, informedness, markedness and correlation","volume":"2","author":"Powers","year":"2011","journal-title":"J. Mach. Learn. Technol."},{"key":"ref_59","unstructured":"Wang, Q., and Zhang, Q. (2013, July 21\u201324). Three-dimensional reconstruction of a dormant tree using rgb-d cameras. 
Proceedings of the 2013 Kansas City, Kansas City, MO, USA."}],"container-title":["Sensors"],"original-title":[],"language":"en","link":[{"URL":"https:\/\/www.mdpi.com\/1424-8220\/19\/20\/4599\/pdf","content-type":"unspecified","content-version":"vor","intended-application":"similarity-checking"}],"deposited":{"date-parts":[[2025,10,11]],"date-time":"2025-10-11T13:28:28Z","timestamp":1760189308000},"score":1,"resource":{"primary":{"URL":"https:\/\/www.mdpi.com\/1424-8220\/19\/20\/4599"}},"subtitle":[],"short-title":[],"issued":{"date-parts":[[2019,10,22]]},"references-count":59,"journal-issue":{"issue":"20","published-online":{"date-parts":[[2019,10]]}},"alternative-id":["s19204599"],"URL":"https:\/\/doi.org\/10.3390\/s19204599","relation":{},"ISSN":["1424-8220"],"issn-type":[{"value":"1424-8220","type":"electronic"}],"subject":[],"published":{"date-parts":[[2019,10,22]]}}}