{"status":"ok","message-type":"work","message-version":"1.0.0","message":{"indexed":{"date-parts":[[2026,6,6]],"date-time":"2026-06-06T16:02:33Z","timestamp":1780761753912,"version":"3.54.1"},"reference-count":42,"publisher":"MDPI AG","issue":"10","license":[{"start":{"date-parts":[[2022,5,10]],"date-time":"2022-05-10T00:00:00Z","timestamp":1652140800000},"content-version":"vor","delay-in-days":0,"URL":"https:\/\/creativecommons.org\/licenses\/by\/4.0\/"}],"content-domain":{"domain":[],"crossmark-restriction":false},"short-container-title":["Sensors"],"abstract":"<jats:p>Carton detection is an important technique in the automatic logistics system and can be applied to many applications such as the stacking and unstacking of cartons and the unloading of cartons in the containers. However, there is no public large-scale carton dataset for the research community to train and evaluate the carton detection models up to now, which hinders the development of carton detection. In this article, we present a large-scale carton dataset named Stacked Carton Dataset (SCD) with the goal of advancing the state-of-the-art in carton detection. Images were collected from the Internet and several warehouses, and objects were labeled for precise localization using instance mask annotation. There were a total of 250,000 instance masks from 16,136 images. Naturally, a suite of benchmarks was established with several popular detectors and instance segmentation models. In addition, we designed a carton detector based on RetinaNet by embedding our proposed Offset Prediction between the Classification and Localization module (OPCL) and the Boundary Guided Supervision module (BGS). OPCL alleviates the imbalance problem between classification and localization quality, which boosts AP by 3.1\u223c4.7% on SCD at the model level, while BGS guides the detector to pay more attention to the boundary information of cartons and decouple repeated carton textures at the task level. To demonstrate the generalization of OPCL for other datasets, we conducted extensive experiments on MS COCO and PASCAL VOC. The improvements in AP on MS COCO and PASCAL VOC were 1.8\u223c2.2% and 3.4\u223c4.3%, respectively.<\/jats:p>","DOI":"10.3390\/s22103617","type":"journal-article","created":{"date-parts":[[2022,5,10]],"date-time":"2022-05-10T21:52:11Z","timestamp":1652219531000},"page":"3617","update-policy":"https:\/\/doi.org\/10.3390\/mdpi_crossmark_policy","source":"Crossref","is-referenced-by-count":15,"title":["SCD: A Stacked Carton Dataset for Detection and Segmentation"],"prefix":"10.3390","volume":"22","author":[{"ORCID":"https:\/\/orcid.org\/0000-0002-0248-8041","authenticated-orcid":false,"given":"Jinrong","family":"Yang","sequence":"first","affiliation":[{"name":"State Key Laboratory of Digital Manufacturing Equipment and Technology, Huazhong University of Science and Technology, Wuhan 430074, China"}],"role":[{"vocabulary":"crossref","role":"author"}]},{"given":"Shengkai","family":"Wu","sequence":"additional","affiliation":[{"name":"State Key Laboratory of Digital Manufacturing Equipment and Technology, Huazhong University of Science and Technology, Wuhan 430074, China"}],"role":[{"vocabulary":"crossref","role":"author"}]},{"given":"Lijun","family":"Gou","sequence":"additional","affiliation":[{"name":"State Key Laboratory of Digital Manufacturing Equipment and Technology, Huazhong University of Science and Technology, Wuhan 430074, China"}],"role":[{"vocabulary":"crossref","role":"author"}]},{"ORCID":"https:\/\/orcid.org\/0000-0003-1729-1695","authenticated-orcid":false,"given":"Hangcheng","family":"Yu","sequence":"additional","affiliation":[{"name":"State Key Laboratory of Digital Manufacturing Equipment and Technology, Huazhong University of Science and Technology, Wuhan 430074, China"}],"role":[{"vocabulary":"crossref","role":"author"}]},{"given":"Chenxi","family":"Lin","sequence":"additional","affiliation":[{"name":"State Key Laboratory of Digital Manufacturing Equipment and Technology, Huazhong University of Science and Technology, Wuhan 430074, China"}],"role":[{"vocabulary":"crossref","role":"author"}]},{"given":"Jiazhuo","family":"Wang","sequence":"additional","affiliation":[{"name":"State Key Laboratory of Digital Manufacturing Equipment and Technology, Huazhong University of Science and Technology, Wuhan 430074, China"}],"role":[{"vocabulary":"crossref","role":"author"}]},{"given":"Pan","family":"Wang","sequence":"additional","affiliation":[{"name":"State Key Laboratory of Digital Manufacturing Equipment and Technology, Huazhong University of Science and Technology, Wuhan 430074, China"}],"role":[{"vocabulary":"crossref","role":"author"}]},{"given":"Minxuan","family":"Li","sequence":"additional","affiliation":[{"name":"Faculty of Arts and Science, Queen\u2019s University, Kingston, ON K7L 3N6, Canada"}],"role":[{"vocabulary":"crossref","role":"author"}]},{"ORCID":"https:\/\/orcid.org\/0000-0001-9213-0416","authenticated-orcid":false,"given":"Xiaoping","family":"Li","sequence":"additional","affiliation":[{"name":"State Key Laboratory of Digital Manufacturing Equipment and Technology, Huazhong University of Science and Technology, Wuhan 430074, China"}],"role":[{"vocabulary":"crossref","role":"author"}]}],"member":"1968","published-online":{"date-parts":[[2022,5,10]]},"reference":[{"key":"ref_1","doi-asserted-by":"crossref","first-page":"357","DOI":"10.1016\/j.knosys.2006.08.004","article-title":"A dynamic logistics process knowledge-based system\u2014An RFID multi-agent approach","volume":"20","author":"Chow","year":"2007","journal-title":"Knowl.-Based Syst."},{"key":"ref_2","doi-asserted-by":"crossref","unstructured":"Scholz-Reiter, B., Echelmeyer, W., and Wellbrock, E. (2008, January 1\u20133). Development of a robot-based system for automated unloading of variable packages out of transport units and containers. Proceedings of the 2008 IEEE International Conference on Automation and Logistics, Qingdao, China.","DOI":"10.1109\/ICAL.2008.4636644"},{"key":"ref_3","doi-asserted-by":"crossref","unstructured":"Chiaravalli, D., Palli, G., Monica, R., Aleotti, J., and Rizzini, D.L. (2020, January 8\u201311). Integration of a Multi-Camera Vision System and Admittance Control for Robotic Industrial Depalletizing. Proceedings of the 2020 25th IEEE International Conference on Emerging Technologies and Factory Automation (ETFA), Vienna, Austria.","DOI":"10.1109\/ETFA46521.2020.9212020"},{"key":"ref_4","doi-asserted-by":"crossref","first-page":"39","DOI":"10.1016\/j.neucom.2020.01.085","article-title":"Recent advances in deep learning for object detection","volume":"396","author":"Wu","year":"2020","journal-title":"Neurocomputing"},{"key":"ref_5","doi-asserted-by":"crossref","first-page":"215","DOI":"10.1016\/j.neucom.2020.10.081","article-title":"Deep face recognition: A survey","volume":"429","author":"Wang","year":"2021","journal-title":"Neurocomputing"},{"key":"ref_6","doi-asserted-by":"crossref","first-page":"107272","DOI":"10.1016\/j.knosys.2021.107272","article-title":"CAM-guided Multi-Path Decoding U-Net with Triplet Feature Regularization for defect detection and segmentation","volume":"228","author":"Lin","year":"2021","journal-title":"Knowl.-Based Syst."},{"key":"ref_7","doi-asserted-by":"crossref","first-page":"106445","DOI":"10.1016\/j.knosys.2020.106445","article-title":"A descriptive framework for the field of deep learning applications in medical images","volume":"210","author":"Tian","year":"2020","journal-title":"Knowl.-Based Syst."},{"key":"ref_8","doi-asserted-by":"crossref","unstructured":"Lin, T.Y., Maire, M., Belongie, S., Hays, J., Perona, P., Ramanan, D., Doll\u00e1r, P., and Zitnick, C.L. (2014). Microsoft coco: Common objects in context. European Conference on Computer Vision, Springer.","DOI":"10.1007\/978-3-319-10602-1_48"},{"key":"ref_9","doi-asserted-by":"crossref","first-page":"303","DOI":"10.1007\/s11263-009-0275-4","article-title":"The pascal visual object classes (voc) challenge","volume":"88","author":"Everingham","year":"2010","journal-title":"Int. J. Comput. Vis."},{"key":"ref_10","doi-asserted-by":"crossref","unstructured":"Deng, J., Dong, W., Socher, R., Li, L.J., Li, K., and Fei-Fei, L. (2009, January 20\u201325). Imagenet: A large-scale hierarchical image database. Proceedings of the 2009 IEEE Conference on Computer Vision and Pattern Recognition, Miami, FL, USA.","DOI":"10.1109\/CVPR.2009.5206848"},{"key":"ref_11","doi-asserted-by":"crossref","first-page":"1956","DOI":"10.1007\/s11263-020-01316-z","article-title":"The Open Images Dataset V4: Unified Image Classification, Object Detection, and Visual Relationship Detection at Scale","volume":"128","author":"Kuznetsova","year":"2020","journal-title":"Int. J. Comput. Vis."},{"key":"ref_12","doi-asserted-by":"crossref","unstructured":"Yang, J., Shi, R., and Ni, B. (2021, January 13\u201316). Medmnist classification decathlon: A lightweight automl benchmark for medical image analysis. Proceedings of the 2021 IEEE 18th International Symposium on Biomedical Imaging (ISBI), Nice, France.","DOI":"10.1109\/ISBI48211.2021.9434062"},{"key":"ref_13","unstructured":"Angelova, A., Abu-Mostafam, Y., and Perona, P. (2005, January 20\u201326). Pruning training sets for learning of object categories. Proceedings of the 2005 IEEE Computer Society Conference on Computer Vision and Pattern Recognition (CVPR\u201905), San Diego, CA, USA."},{"key":"ref_14","unstructured":"Abdelfattah, R., Wang, X., and Wang, S. (December, January 30). TTPLA: An Aerial-Image Dataset for Detection and Segmentation of Transmission Towers and Power Lines. Proceedings of the Asian Conference on Computer Vision, Kyoto, Japan."},{"key":"ref_15","doi-asserted-by":"crossref","first-page":"1137","DOI":"10.1109\/TPAMI.2016.2577031","article-title":"Faster R-CNN: Towards Real-Time Object Detection with Region Proposal Networks","volume":"39","author":"Ren","year":"2017","journal-title":"IEEE Trans. Pattern Anal. Mach. Intell."},{"key":"ref_16","first-page":"2999","article-title":"Focal Loss for Dense Object Detection","volume":"PP","author":"Lin","year":"2017","journal-title":"IEEE Trans. Pattern Anal. Mach. Intell."},{"key":"ref_17","doi-asserted-by":"crossref","unstructured":"Tian, Z., Shen, C., Chen, H., and He, T. (2019, January 27\u201328). Fcos: Fully convolutional one-stage object detection. Proceedings of the IEEE International Conference on Computer Vision, Seoul, Korea.","DOI":"10.1109\/ICCV.2019.00972"},{"key":"ref_18","doi-asserted-by":"crossref","unstructured":"He, K., Gkioxari, G., Doll\u00e1r, P., and Girshick, R. (2017, January 22\u201329). Mask r-cnn. Proceedings of the IEEE International Conference on Computer Vision, Venice, Italy.","DOI":"10.1109\/ICCV.2017.322"},{"key":"ref_19","doi-asserted-by":"crossref","unstructured":"Carion, N., Massa, F., Synnaeve, G., Usunier, N., Kirillov, A., and Zagoruyko, S. (2020). End-to-end object detection with transformers. European Conference on Computer Vision, Springer.","DOI":"10.1007\/978-3-030-58452-8_13"},{"key":"ref_20","unstructured":"Ge, Z., Liu, S., Wang, F., Li, Z., and Sun, J. (2021). Yolox: Exceeding yolo series in 2021. arXiv."},{"key":"ref_21","unstructured":"Zhou, X., Wang, D., and Kr\u00e4henb\u00fchl, P. (2019). Objects as points. arXiv."},{"key":"ref_22","doi-asserted-by":"crossref","unstructured":"Chen, H., Sun, K., Tian, Z., Shen, C., Huang, Y., and Yan, Y. (2020, January 13\u201319). Blendmask: Top-down meets bottom-up for instance segmentation. Proceedings of the IEEE\/CVF Conference on Computer Vision and Pattern Recognition, Seattle, WA, USA.","DOI":"10.1109\/CVPR42600.2020.00860"},{"key":"ref_23","unstructured":"Wang, X., Zhang, R., Kong, T., Li, L., and Shen, C. (2020). SOLOv2: Dynamic, Faster and Stronger. arXiv."},{"key":"ref_24","first-page":"17721","article-title":"Solov2: Dynamic and fast instance segmentation","volume":"33","author":"Wang","year":"2020","journal-title":"Adv. Neural Inf. Process. Syst."},{"key":"ref_25","doi-asserted-by":"crossref","unstructured":"Chen, K., Pang, J., Wang, J., Xiong, Y., Li, X., Sun, S., Feng, W., Liu, Z., Shi, J., and Ouyang, W. (2019, January 15\u201320). Hybrid task cascade for instance segmentation. Proceedings of the IEEE\/CVF Conference on Computer Vision and Pattern Recognition, Long Beach, CA, USA.","DOI":"10.1109\/CVPR.2019.00511"},{"key":"ref_26","doi-asserted-by":"crossref","first-page":"103911","DOI":"10.1016\/j.imavis.2020.103911","article-title":"IoU-aware single-stage object detector for accurate localization","volume":"97","author":"Wu","year":"2020","journal-title":"Image Vis. Comput."},{"key":"ref_27","doi-asserted-by":"crossref","unstructured":"Jiang, B., Luo, R., Mao, J., Xiao, T., and Jiang, Y. (2018, January 8\u201314). Acquisition of localization confidence for accurate object detection. Proceedings of the European Conference on Computer Vision (ECCV), Munich, Germany.","DOI":"10.1007\/978-3-030-01264-9_48"},{"key":"ref_28","doi-asserted-by":"crossref","unstructured":"Zhang, H., Wang, Y., Dayoub, F., and S\u00fcnderhauf, N. (2020). Varifocalnet: An iou-aware dense object detector. arXiv.","DOI":"10.1109\/CVPR46437.2021.00841"},{"key":"ref_29","doi-asserted-by":"crossref","unstructured":"Rezatofighi, H., Tsoi, N., Gwak, J.Y., Sadeghian, A., Reid, I., and Savarese, S. (2020, January 23\u201328). Generalized Intersection Over Union: A Metric and a Loss for Bounding Box Regression. Proceedings of the 2019 IEEE\/CVF Conference on Computer Vision and Pattern Recognition (CVPR), Glasgow, UK.","DOI":"10.1109\/CVPR.2019.00075"},{"key":"ref_30","doi-asserted-by":"crossref","unstructured":"Liu, W., Anguelov, D., Erhan, D., Szegedy, C., and Berg, A.C. (2016, January 11\u201314). SSD: Single Shot MultiBox Detector. Proceedings of the European Conference on Computer Vision, Amsterdam, The Netherlands.","DOI":"10.1007\/978-3-319-46448-0_2"},{"key":"ref_31","doi-asserted-by":"crossref","unstructured":"Bodla, N., Singh, B., Chellappa, R., and Davis, L.S. (2017, January 22\u201329). Soft-NMS\u2013improving object detection with one line of code. Proceedings of the IEEE International Conference on Computer Vision, Venice, Italy.","DOI":"10.1109\/ICCV.2017.593"},{"key":"ref_32","unstructured":"Wu, S., Yang, J., Wang, X., and Li, X. (2019). Iou-balanced loss functions for single-stage object detection. arXiv."},{"key":"ref_33","doi-asserted-by":"crossref","unstructured":"Cao, Y., Chen, K., Loy, C.C., and Lin, D. (2020, January 13\u201319). Prime sample attention in object detection. Proceedings of the IEEE\/CVF Conference on Computer Vision and Pattern Recognition, Seattle, WA, USA.","DOI":"10.1109\/CVPR42600.2020.01160"},{"key":"ref_34","doi-asserted-by":"crossref","unstructured":"Bolya, D., Zhou, C., Xiao, F., and Lee, Y.J. (2019, January 27\u201328). Yolact: Real-time instance segmentation. Proceedings of the IEEE\/CVF International Conference on Computer Vision, Seoul, Korea.","DOI":"10.1109\/ICCV.2019.00925"},{"key":"ref_35","doi-asserted-by":"crossref","unstructured":"Tian, Z., Shen, C., and Chen, H. (2020). Conditional convolutions for instance segmentation. arXiv.","DOI":"10.1007\/978-3-030-58452-8_17"},{"key":"ref_36","doi-asserted-by":"crossref","unstructured":"Xie, E., Sun, P., Song, X., Wang, W., Liu, X., Liang, D., Shen, C., and Luo, P. (2020, January 13\u201319). Polarmask: Single shot instance segmentation with polar representation. Proceedings of the IEEE\/CVF Conference on Computer Vision and Pattern Recognition, Seattle, WA, USA.","DOI":"10.1109\/CVPR42600.2020.01221"},{"key":"ref_37","doi-asserted-by":"crossref","unstructured":"Peng, S., Jiang, W., Pi, H., Li, X., Bao, H., and Zhou, X. (2020, January 13\u201319). Deep snake for real-time instance segmentation. Proceedings of the IEEE\/CVF Conference on Computer Vision and Pattern Recognition, Seattle, WA, USA.","DOI":"10.1109\/CVPR42600.2020.00856"},{"key":"ref_38","doi-asserted-by":"crossref","unstructured":"Cordts, M., Omran, M., Ramos, S., Rehfeld, T., Enzweiler, M., Benenson, R., Franke, U., Roth, S., and Schiele, B. (2016, January 27\u201330). The cityscapes dataset for semantic urban scene understanding. Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition, Las Vegas, NV, USA.","DOI":"10.1109\/CVPR.2016.350"},{"key":"ref_39","unstructured":"Wada, K. (2018, April 08). labelme: Image Polygonal Annotation with Python. Available online: https:\/\/github.com\/wkentaro\/labelme."},{"key":"ref_40","doi-asserted-by":"crossref","unstructured":"Milletari, F., Navab, N., and Ahmadi, S.A. (2016, January 25\u201328). V-Net: Fully Convolutional Neural Networks for Volumetric Medical Image Segmentation. Proceedings of the 2016 Fourth International Conference on 3D Vision (3DV), Stanford, CA, USA.","DOI":"10.1109\/3DV.2016.79"},{"key":"ref_41","unstructured":"Chen, K., Wang, J., Pang, J., Cao, Y., Xiong, Y., Li, X., Sun, S., Feng, W., Liu, Z., and Xu, J. (2019). MMDetection: Open mmlab detection toolbox and benchmark. arXiv."},{"key":"ref_42","doi-asserted-by":"crossref","unstructured":"Cai, Z., and Vasconcelos, N. (2018, January 18\u201323). Cascade r-cnn: Delving into high quality object detection. Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition, Salt Lake City, UT, USA.","DOI":"10.1109\/CVPR.2018.00644"}],"container-title":["Sensors"],"original-title":[],"language":"en","link":[{"URL":"https:\/\/www.mdpi.com\/1424-8220\/22\/10\/3617\/pdf","content-type":"unspecified","content-version":"vor","intended-application":"similarity-checking"}],"deposited":{"date-parts":[[2025,10,10]],"date-time":"2025-10-10T23:08:40Z","timestamp":1760137720000},"score":1,"resource":{"primary":{"URL":"https:\/\/www.mdpi.com\/1424-8220\/22\/10\/3617"}},"subtitle":[],"short-title":[],"issued":{"date-parts":[[2022,5,10]]},"references-count":42,"journal-issue":{"issue":"10","published-online":{"date-parts":[[2022,5]]}},"alternative-id":["s22103617"],"URL":"https:\/\/doi.org\/10.3390\/s22103617","relation":{"has-preprint":[{"id-type":"doi","id":"10.20944\/preprints202203.0172.v1","asserted-by":"object"}]},"ISSN":["1424-8220"],"issn-type":[{"value":"1424-8220","type":"electronic"}],"subject":[],"published":{"date-parts":[[2022,5,10]]}}}