{"status":"ok","message-type":"work","message-version":"1.0.0","message":{"indexed":{"date-parts":[[2026,2,28]],"date-time":"2026-02-28T18:34:35Z","timestamp":1772303675465,"version":"3.50.1"},"reference-count":46,"publisher":"Springer Science and Business Media LLC","issue":"4","license":[{"start":{"date-parts":[[2021,10,14]],"date-time":"2021-10-14T00:00:00Z","timestamp":1634169600000},"content-version":"tdm","delay-in-days":0,"URL":"https:\/\/creativecommons.org\/licenses\/by\/4.0"},{"start":{"date-parts":[[2021,10,14]],"date-time":"2021-10-14T00:00:00Z","timestamp":1634169600000},"content-version":"vor","delay-in-days":0,"URL":"https:\/\/creativecommons.org\/licenses\/by\/4.0"}],"funder":[{"DOI":"10.13039\/100007219","name":"Natural Science Foundation of Shanghai","doi-asserted-by":"publisher","award":["19ZR1435900"],"award-info":[{"award-number":["19ZR1435900"]}],"id":[{"id":"10.13039\/100007219","id-type":"DOI","asserted-by":"publisher"}]}],"content-domain":{"domain":["link.springer.com"],"crossmark-restriction":false},"short-container-title":["Neural Comput &amp; Applic"],"published-print":{"date-parts":[[2022,2]]},"abstract":"<jats:title>Abstract<\/jats:title><jats:p>The existing view-based 3D object classification and recognition methods ignore the inherent hierarchical correlation and distinguishability of views, making it difficult to further improve the classification accuracy. In order to solve this problem, this paper proposes an end-to-end multi-view dual attention network framework for high-precision recognition of 3D objects. On one hand, we obtain three feature layers of query, key, and value through the convolution layer. 
The spatial attention matrix is generated from the query and key, and it assigns a different importance to each feature in the value branch of the original feature space. This captures the prominent detail features in each view, generates the view space shape descriptor, and focuses on the parts of the view that discriminate between categories. On the other hand, a channel attention vector is obtained by compressing the channel information in different views, and the attention weight of each view feature is scaled to find the correlation between the target views and to focus on the views with important features. The two feature descriptors are integrated to generate a global shape descriptor of the 3D model, which responds more strongly to the distinguishing features of the object model and can be used for high-precision 3D object recognition. The proposed method achieves an overall accuracy of 96.6% and an average accuracy of 95.5% on the open-source ModelNet40 dataset, compiled by Princeton University, when using ResNet50 as the base CNN model. 
Compared with the existing deep learning methods, the experimental results demonstrate that the proposed method achieves state-of-the-art performance in the 3D object classification accuracy.<\/jats:p>","DOI":"10.1007\/s00521-021-06588-1","type":"journal-article","created":{"date-parts":[[2021,10,14]],"date-time":"2021-10-14T14:23:55Z","timestamp":1634221435000},"page":"3201-3212","update-policy":"https:\/\/doi.org\/10.1007\/springer_crossmark_policy","source":"Crossref","is-referenced-by-count":30,"title":["Multi-view dual attention network for 3D object recognition"],"prefix":"10.1007","volume":"34","author":[{"ORCID":"https:\/\/orcid.org\/0000-0002-8549-4710","authenticated-orcid":false,"given":"Wenju","family":"Wang","sequence":"first","affiliation":[],"role":[{"role":"author","vocabulary":"crossref"}]},{"ORCID":"https:\/\/orcid.org\/0000-0002-8563-5640","authenticated-orcid":false,"given":"Yu","family":"Cai","sequence":"additional","affiliation":[],"role":[{"role":"author","vocabulary":"crossref"}]},{"ORCID":"https:\/\/orcid.org\/0000-0002-6716-9009","authenticated-orcid":false,"given":"Tao","family":"Wang","sequence":"additional","affiliation":[],"role":[{"role":"author","vocabulary":"crossref"}]}],"member":"297","published-online":{"date-parts":[[2021,10,14]]},"reference":[{"key":"6588_CR1","doi-asserted-by":"crossref","unstructured":"Grenzd\u00f6rffer T, G\u00fcnther M, Hertzberg J (2020) YCB-M: a multi-camera RGB-D dataset for object recognition and 6doF pose estimation. In: IEEE international conference on robotics and automation (ICRA). IEEE, pp 3650\u20133656","DOI":"10.1109\/ICRA40945.2020.9197426"},{"key":"6588_CR2","doi-asserted-by":"publisher","first-page":"155","DOI":"10.1016\/j.isprsjprs.2020.11.011","volume":"171","author":"D Yu","year":"2021","unstructured":"Yu D, Ji S, Liu J et al (2021) Automatic 3D building reconstruction from multi-view aerial images with deep learning. 
ISPRS J Photogramm Remote Sens 171:155\u2013170","journal-title":"ISPRS J Photogramm Remote Sens"},{"key":"6588_CR3","doi-asserted-by":"crossref","unstructured":"K\u00e4stner L, Frasineanu V C, Lambrecht J (2020) A 3d-deep-learning-based augmented reality calibration method for robotic environments using depth sensor data. In: IEEE international conference on robotics and automation (ICRA). IEEE, pp 1135\u20131141","DOI":"10.1109\/ICRA40945.2020.9197155"},{"key":"6588_CR4","doi-asserted-by":"publisher","DOI":"10.1155\/2020\/1314598","author":"SB Adikari","year":"2020","unstructured":"Adikari SB, Ganegoda NC, Meegama RGN et al (2020) Applicability of a single depth sensor in real-time 3D clothes simulation: augmented reality virtual dressing room using kinect sensor. Adv Hum Comput Interact. https:\/\/doi.org\/10.1155\/2020\/1314598","journal-title":"Adv Hum Comput Interact"},{"key":"6588_CR5","doi-asserted-by":"publisher","first-page":"107220","DOI":"10.1016\/j.measurement.2019.107220","volume":"151","author":"Y Yang","year":"2020","unstructured":"Yang Y, Fang H, Fang Y et al (2020) Three-dimensional point cloud data subtle feature extraction algorithm for laser scanning measurement of large-scale irregular surface in reverse engineering. Measurement 151:107220","journal-title":"Measurement"},{"issue":"5","key":"6588_CR6","doi-asserted-by":"publisher","first-page":"1291","DOI":"10.3390\/s20051291","volume":"20","author":"CH Liu","year":"2020","unstructured":"Liu CH, Lee P, Chen YL et al (2020) Study of postural stability features by using kinect depth sensors to assess body joint coordination patterns. Sensors 20(5):1291","journal-title":"Sensors"},{"key":"6588_CR7","unstructured":"Simonyan K, Zisserman A (2014) Very deep convolutional networks for large-scale image recognition. arXiv preprint. https:\/\/arxiv.org\/abs\/1409.1556"},{"key":"6588_CR8","doi-asserted-by":"crossref","unstructured":"Szegedy C, Liu W, Jia Y et al (2015) Going deeper with convolutions. 
In: Proceedings of the IEEE conference on computer vision and pattern recognition, pp 1\u20139","DOI":"10.1109\/CVPR.2015.7298594"},{"key":"6588_CR9","doi-asserted-by":"crossref","unstructured":"He K, Zhang X, Ren S et al (2016) Deep residual learning for image recognition. In: Proceedings of the IEEE conference on computer vision and pattern recognition, pp 770\u2013778","DOI":"10.1109\/CVPR.2016.90"},{"key":"6588_CR10","doi-asserted-by":"crossref","unstructured":"Huang G, Liu Z, Van Der Maaten L et al (2017) Densely connected convolutional networks. In: Proceedings of the IEEE conference on computer vision and pattern recognition, pp 4700\u20134708","DOI":"10.1109\/CVPR.2017.243"},{"key":"6588_CR11","doi-asserted-by":"crossref","unstructured":"Su H, Maji S, Kalogerakis E et al (2015) Multi-view convolutional neural networks for 3d shape recognition. In: Proceedings of the IEEE international conference on computer vision, pp 945\u2013953","DOI":"10.1109\/ICCV.2015.114"},{"issue":"6","key":"6588_CR12","doi-asserted-by":"publisher","first-page":"549","DOI":"10.18280\/ts.360610","volume":"36","author":"E \u00d6zbay","year":"2019","unstructured":"\u00d6zbay E, \u00c7inar A (2019) A comparative study of object classification methods using 3D zernike moment on 3D point clouds. Traitement du Signal 36(6):549\u2013555","journal-title":"Traitement du Signal"},{"issue":"12","key":"6588_CR13","doi-asserted-by":"publisher","first-page":"4206","DOI":"10.3390\/s18124206","volume":"18","author":"Q Li","year":"2018","unstructured":"Li Q, Cheng X (2018) Comparison of different feature sets for Tls point cloud classification. Sensors 18(12):4206","journal-title":"Sensors"},{"issue":"5","key":"6588_CR14","doi-asserted-by":"publisher","first-page":"737","DOI":"10.1007\/s12541-019-00102-3","volume":"20","author":"C Chen","year":"2019","unstructured":"Chen C, Li X, Belkacem AN et al (2019) The mixed kernel function SVM-based point cloud classification. 
Int J Precis Eng Manuf 20(5):737\u2013747","journal-title":"Int J Precis Eng Manuf"},{"key":"6588_CR15","doi-asserted-by":"crossref","unstructured":"Maturana D, Scherer S (2015) Voxnet: a 3d convolutional neural network for real-time object recognition. In: IEEE\/RSJ international conference on intelligent robots and systems (IROS). IEEE, pp 922\u2013928","DOI":"10.1109\/IROS.2015.7353481"},{"key":"6588_CR16","unstructured":"Wu Z, Song S, Khosla A et al (2015) 3d shapenets: a deep representation for volumetric shapes. In: Proceedings of the IEEE conference on computer vision and pattern recognition, pp 1912\u20131920"},{"key":"6588_CR17","doi-asserted-by":"crossref","unstructured":"Riegler G, Osman Ulusoy A, Geiger A (2017) Octnet: learning deep 3d representations at high resolutions. In: Proceedings of the IEEE conference on computer vision and pattern recognition, pp 3577\u20133586","DOI":"10.1109\/CVPR.2017.701"},{"key":"6588_CR18","doi-asserted-by":"crossref","unstructured":"Le T, Duan Y (2018) Pointgrid: a deep network for 3d shape understanding. In: Proceedings of the IEEE conference on computer vision and pattern recognition, pp 9204\u20139214","DOI":"10.1109\/CVPR.2018.00959"},{"key":"6588_CR19","unstructured":"Qi C R, Su H, Mo K et al (2017) PointNet: deep learning on point sets for 3D classification and segmentation. In: 30th IEEE conference on computer vision and pattern recognition, CVPR, pp 77\u201385"},{"key":"6588_CR20","unstructured":"Qi C R, Yi L, Su H et al (2017) PointNet++: deep hierarchical feature learning on point sets in a metric space. In: 31st Annual conference on neural information processing systems, NIPS 2017, pp 5100\u20135109"},{"key":"6588_CR21","unstructured":"Achlioptas P, Diamanti O, Mitliagkas I et al (2018) Learning representations and generative models for 3d point clouds. 
In: International conference on machine learning, pp 40\u201349"},{"key":"6588_CR22","doi-asserted-by":"crossref","unstructured":"Joseph-Rivlin M, Zvirin A, Kimmel R (2019) Momen(e)t: flavor the moments in learning to classify shapes. In: Proceedings of the IEEE\/CVF international conference on computer vision workshops","DOI":"10.1109\/ICCVW.2019.00503"},{"key":"6588_CR23","doi-asserted-by":"crossref","unstructured":"Zhao H, Jiang L, Fu CW et al (2019) Pointweb: enhancing local neighborhood features for point cloud processing. In: Proceedings of the IEEE\/CVF conference on computer vision and pattern recognition, pp 5565\u20135573","DOI":"10.1109\/CVPR.2019.00571"},{"key":"6588_CR24","doi-asserted-by":"crossref","unstructured":"Lin H, Xiao Z, Tan Y et al (2019) Justlookup: one millisecond deep feature extraction for point clouds by lookup tables. In: 2019 IEEE international conference on multimedia and expo (ICME). IEEE, pp 326\u2013331","DOI":"10.1109\/ICME.2019.00064"},{"key":"6588_CR25","doi-asserted-by":"crossref","unstructured":"Yu T, Meng J, Yuan J (2018) Multi-view harmonized bilinear network for 3d object recognition. In: Proceedings of the IEEE conference on computer vision and pattern recognition, pp 186\u2013194","DOI":"10.1109\/CVPR.2018.00027"},{"key":"6588_CR26","unstructured":"Wang C, Pelillo M, Siddiqi K (2019) Dominant set clustering and pooling for multi-view 3d object recognition. arXiv preprint. https:\/\/arxiv.org\/abs\/1906.01592"},{"key":"6588_CR27","doi-asserted-by":"publisher","first-page":"180","DOI":"10.1016\/j.neunet.2020.08.021","volume":"132","author":"Y Xie","year":"2020","unstructured":"Xie Y, Zhang Y, Gong M et al (2020) MGAT: multi-view graph attention networks. Neural Netw 132:180\u2013189","journal-title":"Neural Netw"},{"key":"6588_CR28","doi-asserted-by":"crossref","unstructured":"Deng J, Dong W, Socher R et al (2009) Imagenet: a large-scale hierarchical image database. 
In: IEEE conference on computer vision and pattern recognition. IEEE, pp 248\u2013255","DOI":"10.1109\/CVPR.2009.5206848"},{"key":"6588_CR29","doi-asserted-by":"crossref","unstructured":"Jerhotova E et al (2011) Biomedical image volumes denoising via the wavelet transform. In: Applied biomedical engineering, pp 435\u2013458","DOI":"10.5772\/20256"},{"key":"6588_CR30","unstructured":"Jaderberg M, Simonyan K, Zisserman A (2015) Spatial transformer networks. In: Advances in neural information processing systems, pp 2017\u20132025"},{"key":"6588_CR31","doi-asserted-by":"crossref","unstructured":"Hu J, Shen L, Sun G (2018) Squeeze-and-excitation networks. In: Proceedings of the IEEE conference on computer vision and pattern recognition, pp 7132\u20137141","DOI":"10.1109\/CVPR.2018.00745"},{"key":"6588_CR32","doi-asserted-by":"crossref","unstructured":"Woo S, Park J, Lee JY et al (2018) Cbam: convolutional block attention module. In: Proceedings of the European conference on computer vision (ECCV), pp 3\u201319","DOI":"10.1007\/978-3-030-01234-2_1"},{"issue":"6","key":"6588_CR33","doi-asserted-by":"publisher","first-page":"311","DOI":"10.1145\/360825.360839","volume":"18","author":"BT Phong","year":"1975","unstructured":"Phong BT (1975) Illumination for computer generated pictures. Commun ACM 18(6):311\u2013317","journal-title":"Commun ACM"},{"key":"6588_CR34","doi-asserted-by":"crossref","unstructured":"Feng Y, Zhang Z, Zhao X et al (2018) GVCNN: group-view convolutional neural networks for 3d shape recognition. In: Proceedings of the IEEE conference on computer vision and pattern recognition, pp 264\u2013272","DOI":"10.1109\/CVPR.2018.00035"},{"key":"6588_CR35","doi-asserted-by":"crossref","unstructured":"Xie S, Girshick R, Doll\u00e1r P et al (2017) Aggregated residual transformations for deep neural networks. 
In: Proceedings of the IEEE conference on computer vision and pattern recognition, pp 1492\u20131500","DOI":"10.1109\/CVPR.2017.634"},{"key":"6588_CR36","unstructured":"The Princeton ModelNet. http:\/\/modelnet.cs.princeton.edu\/. Accessed 18 Jan 2021"},{"key":"6588_CR37","unstructured":"Kingma DP, Ba JL (2015) Adam: a method for stochastic optimization. In: International conference on learning representations, pp 1\u201313"},{"key":"6588_CR38","doi-asserted-by":"crossref","unstructured":"Liu JJ, Hou Q, Cheng MM et al (2020) Improving convolutional networks with self-calibrated convolutions. In: Proceedings of the IEEE\/CVF conference on computer vision and pattern recognition, pp 10096\u201310105","DOI":"10.1109\/CVPR42600.2020.01011"},{"key":"6588_CR39","doi-asserted-by":"crossref","unstructured":"Yang Z, Wang L (2019) Learning relationships for multi-view 3D object recognition. In: Proceedings of the IEEE international conference on computer vision, pp 7505\u20137514","DOI":"10.1109\/ICCV.2019.00760"},{"key":"6588_CR40","doi-asserted-by":"crossref","unstructured":"Lin T Y, RoyChowdhury A, Maji S (2015) Bilinear CNN models for fine-grained visual recognition. In: Proceedings of the IEEE international conference on computer vision, pp 1449\u20131457","DOI":"10.1109\/ICCV.2015.170"},{"key":"6588_CR41","doi-asserted-by":"crossref","unstructured":"Lin T Y, Maji S (2017) Improved bilinear pooling with CNNs. arXiv preprint. https:\/\/arxiv.org\/abs\/1707.06772","DOI":"10.5244\/C.31.117"},{"key":"6588_CR42","doi-asserted-by":"crossref","unstructured":"Ionescu C, Vantzos O, Sminchisescu C (2015) Matrix backpropagation for deep networks with structured layers. 
In: Proceedings of the IEEE international conference on computer vision, pp 2965\u20132973","DOI":"10.1109\/ICCV.2015.339"},{"key":"6588_CR43","first-page":"156","volume":"6","author":"M Kazhdan","year":"2003","unstructured":"Kazhdan M, Funkhouser T, Rusinkiewicz S (2003) Rotation invariant spherical harmonic representation of 3 d shape descriptors. Symposium on geometry processing 6:156\u2013164","journal-title":"Symposium on geometry processing"},{"key":"6588_CR44","doi-asserted-by":"crossref","unstructured":"Chen DY, Tian XP, Shen YT et al (2003) On visual similarity based 3D model retrieval. In: Computer graphics forum, vol 22, no 3. Blackwell Publishing, Inc, Oxford, pp 223\u2013232","DOI":"10.1111\/1467-8659.00669"},{"key":"6588_CR45","doi-asserted-by":"crossref","unstructured":"Cheraghian A, Petersson L (2019) 3dcapsule: extending the capsule architecture to classify 3d point clouds. In: IEEE winter conference on applications of computer vision (WACV). IEEE, pp 1194\u20131202","DOI":"10.1109\/WACV.2019.00132"},{"key":"6588_CR46","doi-asserted-by":"crossref","unstructured":"Qi CR, Su H, Nie\u00dfner M et al (2016) Volumetric and multi-view CNNs for object classification on 3d data. 
In: Proceedings of the IEEE conference on computer vision and pattern recognition, pp 5648\u20135656","DOI":"10.1109\/CVPR.2016.609"}],"container-title":["Neural Computing and Applications"],"original-title":[],"language":"en","link":[{"URL":"https:\/\/link.springer.com\/content\/pdf\/10.1007\/s00521-021-06588-1.pdf","content-type":"application\/pdf","content-version":"vor","intended-application":"text-mining"},{"URL":"https:\/\/link.springer.com\/article\/10.1007\/s00521-021-06588-1\/fulltext.html","content-type":"text\/html","content-version":"vor","intended-application":"text-mining"},{"URL":"https:\/\/link.springer.com\/content\/pdf\/10.1007\/s00521-021-06588-1.pdf","content-type":"application\/pdf","content-version":"vor","intended-application":"similarity-checking"}],"deposited":{"date-parts":[[2022,2,7]],"date-time":"2022-02-07T09:20:28Z","timestamp":1644225628000},"score":1,"resource":{"primary":{"URL":"https:\/\/link.springer.com\/10.1007\/s00521-021-06588-1"}},"subtitle":[],"short-title":[],"issued":{"date-parts":[[2021,10,14]]},"references-count":46,"journal-issue":{"issue":"4","published-print":{"date-parts":[[2022,2]]}},"alternative-id":["6588"],"URL":"https:\/\/doi.org\/10.1007\/s00521-021-06588-1","relation":{},"ISSN":["0941-0643","1433-3058"],"issn-type":[{"value":"0941-0643","type":"print"},{"value":"1433-3058","type":"electronic"}],"subject":[],"published":{"date-parts":[[2021,10,14]]},"assertion":[{"value":"6 February 2021","order":1,"name":"received","label":"Received","group":{"name":"ArticleHistory","label":"Article History"}},{"value":"24 September 2021","order":2,"name":"accepted","label":"Accepted","group":{"name":"ArticleHistory","label":"Article History"}},{"value":"14 October 2021","order":3,"name":"first_online","label":"First Online","group":{"name":"ArticleHistory","label":"Article History"}},{"order":1,"name":"Ethics","group":{"name":"EthicsHeading","label":"Declarations"}},{"value":"The authors declare no conflict of 
interest.","order":2,"name":"Ethics","group":{"name":"EthicsHeading","label":"Conflict of interest"}}]}}