{"status":"ok","message-type":"work","message-version":"1.0.0","message":{"indexed":{"date-parts":[[2025,5,19]],"date-time":"2025-05-19T06:49:12Z","timestamp":1747637352080,"version":"3.37.3"},"reference-count":29,"publisher":"Springer Science and Business Media LLC","issue":"3","license":[{"start":{"date-parts":[[2023,5,11]],"date-time":"2023-05-11T00:00:00Z","timestamp":1683763200000},"content-version":"tdm","delay-in-days":0,"URL":"https:\/\/creativecommons.org\/licenses\/by\/4.0"},{"start":{"date-parts":[[2023,5,11]],"date-time":"2023-05-11T00:00:00Z","timestamp":1683763200000},"content-version":"vor","delay-in-days":0,"URL":"https:\/\/creativecommons.org\/licenses\/by\/4.0"}],"funder":[{"DOI":"10.13039\/100007691","name":"Universidade da Beira Interior","doi-asserted-by":"crossref","id":[{"id":"10.13039\/100007691","id-type":"DOI","asserted-by":"crossref"}]}],"content-domain":{"domain":["link.springer.com"],"crossmark-restriction":false},"short-container-title":["Pattern Anal Applic"],"published-print":{"date-parts":[[2023,8]]},"abstract":"<jats:title>Abstract<\/jats:title><jats:p>Object pose estimation has multiple important applications, such as robotic grasping and augmented reality. We present a new method to estimate the 6D pose of objects that improves upon the accuracy of current proposals and can still be used in real-time. Our method uses RGB-D data as input to segment objects and estimate their pose. It uses a neural network with multiple heads to identify the objects in the scene, generate the appropriate masks and estimate the values of the translation vectors and the quaternion that represents the objects\u2019 rotation. These heads leverage a pyramid architecture used during feature extraction and feature fusion. We conduct an empirical evaluation using the two most common datasets in the area, and compare against state-of-the-art approaches, illustrating the capabilities of MPF6D. Our method can be used in real-time with its low inference time and high accuracy.<\/jats:p>","DOI":"10.1007\/s10044-023-01165-9","type":"journal-article","created":{"date-parts":[[2023,5,11]],"date-time":"2023-05-11T19:02:30Z","timestamp":1683831750000},"page":"1363-1373","update-policy":"https:\/\/doi.org\/10.1007\/springer_crossmark_policy","source":"Crossref","is-referenced-by-count":1,"title":["MPF6D: masked pyramid fusion 6D pose estimation"],"prefix":"10.1007","volume":"26","author":[{"ORCID":"https:\/\/orcid.org\/0000-0001-7177-751X","authenticated-orcid":false,"given":"Nuno","family":"Pereira","sequence":"first","affiliation":[]},{"given":"Lu\u00eds A.","family":"Alexandre","sequence":"additional","affiliation":[]}],"member":"297","published-online":{"date-parts":[[2023,5,11]]},"reference":[{"key":"1165_CR1","doi-asserted-by":"publisher","unstructured":"Pereira N, Alexandre LA (2020) MaskedFusion: mask-based 6D object pose estimation. In: 19th IEEE international conference on machine learning and applications (ICMLA), pp 71\u201378. https:\/\/doi.org\/10.1109\/ICMLA51294.2020.00021","DOI":"10.1109\/ICMLA51294.2020.00021"},{"key":"1165_CR2","doi-asserted-by":"crossref","unstructured":"Wang C, Xu D, Zhu Y, Mart\u00edn-Mart\u00edn R, Lu C, Fei-Fei L, Savarese S (2019) Densefusion: 6D object pose estimation by iterative dense fusion. In: Proceedings of the IEEE conference on computer vision and pattern recognition, pp 3343\u20133352","DOI":"10.1109\/CVPR.2019.00346"},{"key":"1165_CR3","doi-asserted-by":"crossref","unstructured":"Ronneberger O, Fischer P, Brox T (2015) U-Net: convolutional networks for biomedical image segmentation. In: International conference on medical image computing and computer-assisted intervention, pp 234\u2013241. Springer","DOI":"10.1007\/978-3-319-24574-4_28"},{"issue":"12","key":"1165_CR4","doi-asserted-by":"publisher","first-page":"2481","DOI":"10.1109\/TPAMI.2016.2644615","volume":"39","author":"V Badrinarayanan","year":"2017","unstructured":"Badrinarayanan V, Kendall A, Cipolla R (2017) SegNet: a deep convolutional encoder-decoder architecture for image segmentation. IEEE Trans Pattern Anal Mach Intell 39(12):2481\u20132495","journal-title":"IEEE Trans Pattern Anal Mach Intell"},{"key":"1165_CR5","unstructured":"Krizhevsky A, Sutskever I, Hinton GE (2012) Imagenet classification with deep convolutional neural networks. In: Advances in neural information processing systems, pp 1097\u20131105"},{"key":"1165_CR6","unstructured":"Wu H, Zhang J, Huang K, Liang K, Yu Y (2019) FastFCN: Rethinking dilated convolution in the backbone for semantic segmentation. arXiv preprint arXiv:1903.11816"},{"key":"1165_CR7","doi-asserted-by":"crossref","unstructured":"Zhao H, Shi J, Qi X, Wang X, Jia J (2017) Pyramid scene parsing network. In: Proceedings of the IEEE conference on computer vision and pattern recognition, pp 2881\u20132890","DOI":"10.1109\/CVPR.2017.660"},{"key":"1165_CR8","doi-asserted-by":"crossref","unstructured":"Takikawa T, Acuna D, Jampani V, Fidler S (2019) Gated-SCNN: Gated shape CNNs for semantic segmentation. In: Proceedings of the IEEE\/CVF international conference on computer vision, pp 5229\u20135238","DOI":"10.1109\/ICCV.2019.00533"},{"key":"1165_CR9","unstructured":"Chen L-C, Papandreou G, Schroff F, Adam H (2017) Rethinking atrous convolution for semantic image segmentation. arXiv preprint arXiv:1706.05587"},{"key":"1165_CR10","doi-asserted-by":"crossref","unstructured":"Brachmann E, Krull A, Michel F, Gumhold S, Shotton J, Rother C (2014) Learning 6D object pose estimation using 3D object coordinates. In: European conference on computer vision, Springer, pp 536\u2013551","DOI":"10.1007\/978-3-319-10605-2_35"},{"key":"1165_CR11","doi-asserted-by":"crossref","unstructured":"Pavlakos G, Zhou X, Chan A, Derpanis KG, Daniilidis K (2017) 6-DoF object pose from semantic keypoints. In: 2017 IEEE international conference on robotics and automation (ICRA), IEEE, pp 2011\u20132018","DOI":"10.1109\/ICRA.2017.7989233"},{"key":"1165_CR12","doi-asserted-by":"crossref","unstructured":"Tekin B, Sinha SN, Fua P (2018) Real-time seamless single shot 6D object pose prediction. In: Proceedings of the IEEE conference on computer vision and pattern recognition, pp 292\u2013301","DOI":"10.1109\/CVPR.2018.00038"},{"key":"1165_CR13","unstructured":"Tremblay J, To T, Sundaralingam B, Xiang Y, Fox D, Birchfield S (2018) Deep object pose estimation for semantic robotic grasping of household objects. In: Conference on robot learning, pp 306\u2013316"},{"key":"1165_CR14","doi-asserted-by":"crossref","unstructured":"Peng S, Liu Y, Huang Q, Zhou X, Bao H (2019) PVNet: pixel-wise voting network for 6dof pose estimation. In: Proceedings of the IEEE conference on computer vision and pattern recognition, pp 4561\u20134570","DOI":"10.1109\/CVPR.2019.00469"},{"issue":"6","key":"1165_CR15","doi-asserted-by":"publisher","first-page":"381","DOI":"10.1145\/358669.358692","volume":"24","author":"MA Fischler","year":"1981","unstructured":"Fischler MA, Bolles RC (1981) Random sample consensus: a paradigm for model fitting with applications to image analysis and automated cartography. Commun ACM 24(6):381\u2013395","journal-title":"Commun ACM"},{"key":"1165_CR16","doi-asserted-by":"crossref","unstructured":"Kehl W, Milletari F, Tombari F, Ilic S, Navab N (2016) Deep learning of local RGB-D patches for 3D object detection and 6D pose estimation. In: European conference on computer vision, Springer, pp 205\u2013220","DOI":"10.1007\/978-3-319-46487-9_13"},{"key":"1165_CR17","doi-asserted-by":"publisher","first-page":"148","DOI":"10.1016\/j.patrec.2019.08.016","volume":"128","author":"D Li","year":"2019","unstructured":"Li D, Liu N, Guo Y, Wang X, Xu J (2019) 3D object recognition and pose estimation for random bin-picking using partition viewpoint feature histograms. Pattern Recogn Lett 128:148\u2013154","journal-title":"Pattern Recogn Lett"},{"key":"1165_CR18","doi-asserted-by":"crossref","unstructured":"Qi CR, Liu W, Wu C, Su H, Guibas LJ (2018) Frustum pointnets for 3D object detection from RGB-D data. In: Proceedings of the IEEE conference on computer vision and pattern recognition, pp 918\u2013927","DOI":"10.1109\/CVPR.2018.00102"},{"key":"1165_CR19","unstructured":"Qi CR, Su H, Mo K, Guibas LJ (2017) Pointnet: deep learning on point sets for 3D classification and segmentation. In: Proceedings of the IEEE conference on computer vision and pattern recognition, pp 652\u2013660"},{"key":"1165_CR20","doi-asserted-by":"crossref","unstructured":"Rusu RB, Bradski G, Thibaux R, Hsu J (2010) Fast 3D recognition and pose using the viewpoint feature histogram. In: Proceedings of the 23rd IEEE\/RSJ international conference on intelligent robots and systems (IROS), Taipei, Taiwan","DOI":"10.1109\/IROS.2010.5651280"},{"key":"1165_CR21","doi-asserted-by":"crossref","unstructured":"Zhou Y, Tuzel O (2018) VoxelNet: end-to-end learning for point cloud based 3D object detection. In: Proceedings of the IEEE conference on computer vision and pattern recognition, pp 4490\u20134499","DOI":"10.1109\/CVPR.2018.00472"},{"key":"1165_CR22","doi-asserted-by":"crossref","unstructured":"Kehl W, Manhardt F, Tombari F, Ilic S, Navab N (2017) SSD-6D: Making RGB-Based 3D detection and 6D pose estimation great again. In: Proceedings of the IEEE international conference on computer vision, pp 1521\u20131529","DOI":"10.1109\/ICCV.2017.169"},{"key":"1165_CR23","doi-asserted-by":"crossref","unstructured":"Li C, Bai J, Hager GD (2018) A unified framework for multi-view multi-class object pose estimation. In: Proceedings of the European conference on computer vision (ECCV), pp 254\u2013269","DOI":"10.1007\/978-3-030-01270-0_16"},{"key":"1165_CR24","doi-asserted-by":"crossref","unstructured":"Xiang Y, Schmidt T, Narayanan V, Fox D (2017) PoseCNN: a convolutional neural network for 6d object pose estimation in cluttered scenes. arXiv preprint arXiv:1711.00199","DOI":"10.15607\/RSS.2018.XIV.019"},{"key":"1165_CR25","doi-asserted-by":"crossref","unstructured":"Xu D, Anguelov D, Jain A (2018) Pointfusion: deep sensor fusion for 3D bounding box estimation. In: Proceedings of the IEEE conference on computer vision and pattern recognition, pp 244\u2013253","DOI":"10.1109\/CVPR.2018.00033"},{"key":"1165_CR26","doi-asserted-by":"crossref","unstructured":"He Y, Sun W, Huang H, Liu J, Fan H, Sun J (2020) PVN3D: a deep point-wise 3D keypoints voting network for 6DoF pose estimation. In: Proceedings of the IEEE\/CVF conference on computer vision and pattern recognition (CVPR)","DOI":"10.1109\/CVPR42600.2020.01165"},{"issue":"1","key":"1165_CR27","doi-asserted-by":"publisher","first-page":"119","DOI":"10.1109\/TPAMI.2017.2665623","volume":"40","author":"A Tejani","year":"2018","unstructured":"Tejani A, Kouskouridas R, Doumanoglou A, Tang D, Kim T (2018) Latent-class Hough forests for 6 DoF object pose estimation. IEEE Trans Pattern Anal Mach Intell 40(1):119\u2013132. https:\/\/doi.org\/10.1109\/TPAMI.2017.2665623","journal-title":"IEEE Trans Pattern Anal Mach Intell"},{"key":"1165_CR28","doi-asserted-by":"crossref","unstructured":"Hinterstoisser S, Holzer S, Cagniart C, Ilic S, Konolige K, Navab N, Lepetit V (2011) Multimodal templates for real-time detection of texture-less objects in heavily cluttered scenes. In: 2011 International conference on computer vision, IEEE, pp 858\u2013865","DOI":"10.1109\/ICCV.2011.6126326"},{"key":"1165_CR29","doi-asserted-by":"crossref","unstructured":"Calli B, Singh A, Walsman A, Srinivasa S, Abbeel P, Dollar AM (2015) The YCB object and model set: Towards common benchmarks for manipulation research. In: 2015 International conference on advanced robotics (ICAR), IEEE, pp 510\u2013517","DOI":"10.1109\/ICAR.2015.7251504"}],"container-title":["Pattern Analysis and Applications"],"original-title":[],"language":"en","link":[{"URL":"https:\/\/link.springer.com\/content\/pdf\/10.1007\/s10044-023-01165-9.pdf","content-type":"application\/pdf","content-version":"vor","intended-application":"text-mining"},{"URL":"https:\/\/link.springer.com\/article\/10.1007\/s10044-023-01165-9\/fulltext.html","content-type":"text\/html","content-version":"vor","intended-application":"text-mining"},{"URL":"https:\/\/link.springer.com\/content\/pdf\/10.1007\/s10044-023-01165-9.pdf","content-type":"application\/pdf","content-version":"vor","intended-application":"similarity-checking"}],"deposited":{"date-parts":[[2023,7,22]],"date-time":"2023-07-22T14:07:51Z","timestamp":1690034871000},"score":1,"resource":{"primary":{"URL":"https:\/\/link.springer.com\/10.1007\/s10044-023-01165-9"}},"subtitle":[],"short-title":[],"issued":{"date-parts":[[2023,5,11]]},"references-count":29,"journal-issue":{"issue":"3","published-print":{"date-parts":[[2023,8]]}},"alternative-id":["1165"],"URL":"https:\/\/doi.org\/10.1007\/s10044-023-01165-9","relation":{},"ISSN":["1433-7541","1433-755X"],"issn-type":[{"type":"print","value":"1433-7541"},{"type":"electronic","value":"1433-755X"}],"subject":[],"published":{"date-parts":[[2023,5,11]]},"assertion":[{"value":"10 February 2022","order":1,"name":"received","label":"Received","group":{"name":"ArticleHistory","label":"Article History"}},{"value":"19 April 2023","order":2,"name":"accepted","label":"Accepted","group":{"name":"ArticleHistory","label":"Article History"}},{"value":"11 May 2023","order":3,"name":"first_online","label":"First Online","group":{"name":"ArticleHistory","label":"Article History"}}]}}