{"status":"ok","message-type":"work","message-version":"1.0.0","message":{"indexed":{"date-parts":[[2025,10,11]],"date-time":"2025-10-11T02:00:16Z","timestamp":1760148016139,"version":"build-2065373602"},"reference-count":49,"publisher":"MDPI AG","issue":"3","license":[{"start":{"date-parts":[[2023,3,21]],"date-time":"2023-03-21T00:00:00Z","timestamp":1679356800000},"content-version":"vor","delay-in-days":0,"URL":"https:\/\/creativecommons.org\/licenses\/by\/4.0\/"}],"funder":[{"name":"National Natural Science Foundation of China","award":["62071260","62006131","LZ22F020001"],"award-info":[{"award-number":["62071260","62006131","LZ22F020001"]}]},{"name":"National Natural Science Foundation of Zhejiang Province","award":["62071260","62006131","LZ22F020001"],"award-info":[{"award-number":["62071260","62006131","LZ22F020001"]}]}],"content-domain":{"domain":[],"crossmark-restriction":false},"short-container-title":["Algorithms"],"abstract":"<jats:p>3D mesh as a complex data structure can provide effective shape representation for 3D objects, but due to the irregularity and disorder of the mesh data, it is difficult for convolutional neural networks to be directly applied to 3D mesh data processing. At the same time, the extensive use of convolutional kernels and pooling layers focusing on local features can cause the loss of spatial information and dependencies of low-level features. In this paper, we propose a self-attentive convolutional network MixFormer applied to 3D mesh models. By defining 3D convolutional kernels and vector self-attention mechanisms applicable to 3D mesh models, our neural network is able to learn 3D mesh model features. Combining the features of convolutional networks and transformer networks, the network can focus on both local detail features and long-range dependencies between features, thus achieving good learning results without stacking multiple layers and saving arithmetic overhead compared to pure transformer architectures. We conduct classification and semantic segmentation experiments on SHREC15, SCAPE, FAUST, MIT, and Adobe Fuse datasets. Experimental results show that the network can achieve 96.7% classification and better segmentation results by using fewer parameters and network layers.<\/jats:p>","DOI":"10.3390\/a16030171","type":"journal-article","created":{"date-parts":[[2023,3,22]],"date-time":"2023-03-22T06:00:01Z","timestamp":1679464801000},"page":"171","update-policy":"https:\/\/doi.org\/10.3390\/mdpi_crossmark_policy","source":"Crossref","is-referenced-by-count":0,"title":["MixFormer: A Self-Attentive Convolutional Network for 3D Mesh Object Recognition"],"prefix":"10.3390","volume":"16","author":[{"given":"Lingfeng","family":"Huang","sequence":"first","affiliation":[{"name":"Mobile Network Application Technology Laboratory, School of Information Science and Engineering, Ningbo University, 818 Fenghua Road, Ningbo 315211, China"}]},{"given":"Jieyu","family":"Zhao","sequence":"additional","affiliation":[{"name":"Mobile Network Application Technology Laboratory, School of Information Science and Engineering, Ningbo University, 818 Fenghua Road, Ningbo 315211, China"}]},{"given":"Yu","family":"Chen","sequence":"additional","affiliation":[{"name":"Mobile Network Application Technology Laboratory, School of Information Science and Engineering, Ningbo University, 818 Fenghua Road, Ningbo 315211, China"}]}],"member":"1968","published-online":{"date-parts":[[2023,3,21]]},"reference":[{"key":"ref_1","doi-asserted-by":"crossref","unstructured":"Hazard, C., Bhagat, A., Buddharaju, B.R., Liu, Z., Shao, Y., Lu, L., Omari, S., and Cui, H. (2022, January 19\u201320). Importance Is in Your Attention: Agent Importance Prediction for Autonomous Driving. Proceedings of the 2022 IEEE\/CVF Conference on Computer Vision and Pattern Recognition Workshops (CVPRW), New Orleans, LA, USA.","DOI":"10.1109\/CVPRW56347.2022.00284"},{"key":"ref_2","doi-asserted-by":"crossref","unstructured":"Klingner, M., Muller, K., Mirzaie, M., Breitenstein, J., Termohlen, J.-A., and Fingscheidt, T. (2022, January 19\u201320). On the Choice of Data for Efficient Training and Validation of End-to-End Driving Models. Proceedings of the 2022 IEEE\/CVF Conference on Computer Vision and Pattern Recognition Workshops (CVPRW), New Orleans, LA, USA.","DOI":"10.1109\/CVPRW56347.2022.00527"},{"key":"ref_3","doi-asserted-by":"crossref","unstructured":"Wang, J., Li, X., Sullivan, A., Abbott, L., and Chen, S. (2022, January 19\u201320). PointMotionNet: Point-Wise Motion Learning for Large-Scale LiDAR Point Clouds Sequences. Proceedings of the 2022 IEEE\/CVF Conference on Computer Vision and Pattern Recognition Workshops (CVPRW), New Orleans, LA, USA.","DOI":"10.1109\/CVPRW56347.2022.00488"},{"key":"ref_4","unstructured":"Grishchenko, I., Ablavatski, A., Kartynnik, Y., Raveendran, K., and Grundmann, M. (2020). Attention Mesh: High-Fidelity Face Mesh Prediction in Real-Time. arXiv."},{"key":"ref_5","doi-asserted-by":"crossref","unstructured":"Cohn, B.A., Maselli, A., Ofek, E., and Gonzalez-Franco, M. (2020, January 14\u201318). SnapMove: Movement Projection Mapping in Virtual Reality. Proceedings of the 2020 IEEE International Conference on Artificial Intelligence and Virtual Reality (AIVR), Utrecht, The Netherlands.","DOI":"10.1109\/AIVR50618.2020.00024"},{"key":"ref_6","doi-asserted-by":"crossref","first-page":"153","DOI":"10.1016\/j.cag.2021.07.003","article-title":"A Comprehensive Survey of LIDAR-Based 3D Object Detection Methods with Deep Learning for Autonomous Driving","volume":"99","author":"Zamanakos","year":"2021","journal-title":"Comput. Graph."},{"key":"ref_7","doi-asserted-by":"crossref","first-page":"1578","DOI":"10.1109\/TPAMI.2019.2954885","article-title":"Image-Based 3D Object Reconstruction: State-of-the-Art and Trends in the Deep Learning Era","volume":"43","author":"Han","year":"2021","journal-title":"IEEE Trans. Pattern Anal. Mach. Intell."},{"key":"ref_8","doi-asserted-by":"crossref","first-page":"4338","DOI":"10.1109\/TPAMI.2020.3005434","article-title":"Deep Learning for 3D Point Clouds: A Survey","volume":"43","author":"Guo","year":"2021","journal-title":"IEEE Trans. Pattern Anal. Mach. Intell."},{"key":"ref_9","doi-asserted-by":"crossref","unstructured":"Xie, S., Girshick, R., Dollar, P., Tu, Z., and He, K. (2017, January 21\u201326). Aggregated Residual Transformations for Deep Neural Networks. Proceedings of the 2017 IEEE Conference on Computer Vision and Pattern Recognition (CVPR), Honolulu, HI, USA.","DOI":"10.1109\/CVPR.2017.634"},{"key":"ref_10","doi-asserted-by":"crossref","unstructured":"He, K., Zhang, X., Ren, S., and Sun, J. (2016, January 27\u201330). Deep Residual Learning for Image Recognition. Proceedings of the 2016 IEEE Conference on Computer Vision and Pattern Recognition (CVPR), Las Vegas, NV, USA.","DOI":"10.1109\/CVPR.2016.90"},{"key":"ref_11","unstructured":"Tan, M., and Le, Q.V. (2020). EfficientNet: Rethinking Model Scaling for Convolutional Neural Networks. arXiv."},{"key":"ref_12","doi-asserted-by":"crossref","first-page":"84","DOI":"10.1145\/3065386","article-title":"ImageNet Classification with Deep Convolutional Neural Networks","volume":"60","author":"Krizhevsky","year":"2017","journal-title":"Commun. ACM"},{"key":"ref_13","doi-asserted-by":"crossref","first-page":"26","DOI":"10.1016\/j.patcog.2016.02.005","article-title":"Adaptive Noise Dictionary Construction via IRRPCA for Face Recognition","volume":"59","author":"Chen","year":"2016","journal-title":"Pattern Recognit."},{"key":"ref_14","doi-asserted-by":"crossref","first-page":"1553","DOI":"10.1109\/TCYB.2020.2991219","article-title":"Joint Optimal Transport with Convex Regularization for Robust Image Classification","volume":"52","author":"Qian","year":"2022","journal-title":"IEEE Trans. Cybern."},{"key":"ref_15","unstructured":"Milano, F., Loquercio, A., Rosinol, A., Scaramuzza, D., and Carlone, L. (2020, January 6\u201312). Primal-Dual Mesh Convolutional Neural Networks. Proceedings of the Advances in Neural Information Processing Systems, Red Hook, NY, USA."},{"key":"ref_16","first-page":"8279","article-title":"MeshNet: Mesh Neural Network for 3D Shape Representation","volume":"33","author":"Feng","year":"2019","journal-title":"Proc. AAAI Conf. Artif. Intell."},{"key":"ref_17","first-page":"1","article-title":"MeshCNN: A Network with an Edge","volume":"38","author":"Hanocka","year":"2019","journal-title":"ACM Trans. Graph."},{"key":"ref_18","doi-asserted-by":"crossref","unstructured":"Xu, H., Dong, M., and Zhong, Z. (2017, January 22\u201329). Directionally Convolutional Networks for 3D Shape Segmentation. Proceedings of the IEEE International Conference on Computer Vision (ICCV), Venice, Italy.","DOI":"10.1109\/ICCV.2017.294"},{"key":"ref_19","first-page":"3859","article-title":"Dynamic Routing between Capsules","volume":"30","author":"Sabour","year":"2017","journal-title":"Adv. Neural Inf. Process. Syst."},{"key":"ref_20","doi-asserted-by":"crossref","first-page":"187","DOI":"10.1007\/s41095-021-0229-5","article-title":"Pct: Point Cloud Transformer","volume":"7","author":"Guo","year":"2021","journal-title":"Comput. Vis. Media"},{"key":"ref_21","doi-asserted-by":"crossref","unstructured":"Zhao, H., Jiang, L., Jia, J., Torr, P.H., and Koltun, V. (2021, January 11\u201317). Point Transformer. Proceedings of the IEEE\/CVF International Conference on Computer Vision, Montreal, BC, Canada.","DOI":"10.1109\/ICCV48922.2021.01595"},{"key":"ref_22","doi-asserted-by":"crossref","unstructured":"Lin, K., Wang, L., and Liu, Z. (2021, January 20\u201325). End-to-End Human Pose and Mesh Reconstruction with Transformers. Proceedings of the IEEE\/CVF Conference on Computer Vision and Pattern Recognition, Nashville, TN, USA.","DOI":"10.1109\/CVPR46437.2021.00199"},{"key":"ref_23","unstructured":"Hua, W., Dai, Z., Liu, H., and Le, Q. (2022, January 17\u201323). Transformer Quality in Linear Time. Proceedings of the International Conference on Machine Learning, PMLR, Baltimore, MD, USA."},{"key":"ref_24","doi-asserted-by":"crossref","unstructured":"Li, Y., Mao, H., Girshick, R., and He, K. (2022). Exploring Plain Vision Transformer Backbones for Object Detection. arXiv.","DOI":"10.1007\/978-3-031-20077-9_17"},{"key":"ref_25","doi-asserted-by":"crossref","first-page":"1137","DOI":"10.1109\/TPAMI.2016.2577031","article-title":"Faster R-CNN: Towards Real-Time Object Detection with Region Proposal Networks","volume":"39","author":"Ren","year":"2017","journal-title":"IEEE Trans. Pattern Anal. Mach. Intell."},{"key":"ref_26","doi-asserted-by":"crossref","unstructured":"Fleet, D., Pajdla, T., Schiele, B., and Tuytelaars, T. (2014). Computer Vision\u2014ECCV 2014, Springer International Publishing. Lecture Notes in Computer Science.","DOI":"10.1007\/978-3-319-10602-1"},{"key":"ref_27","doi-asserted-by":"crossref","first-page":"1476","DOI":"10.1109\/TPAMI.2016.2601099","article-title":"Object Detection Networks on Convolutional Feature Maps","volume":"39","author":"Ren","year":"2017","journal-title":"IEEE Trans. Pattern Anal. Mach. Intell."},{"key":"ref_28","doi-asserted-by":"crossref","unstructured":"Girshick, R., Donahue, J., Darrell, T., and Malik, J. (2014, January 23\u201328). Rich Feature Hierarchies for Accurate Object Detection and Semantic Segmentation. Proceedings of the 2014 IEEE Conference on Computer Vision and Pattern Recognition, Columbus, OH, USA.","DOI":"10.1109\/CVPR.2014.81"},{"key":"ref_29","doi-asserted-by":"crossref","first-page":"303","DOI":"10.1007\/s11263-009-0275-4","article-title":"The Pascal Visual Object Classes (VOC) Challenge","volume":"88","author":"Everingham","year":"2010","journal-title":"Int. J. Comput. Vis."},{"key":"ref_30","doi-asserted-by":"crossref","unstructured":"Dai, J., He, K., and Sun, J. (2016, January 27\u201330). Instance-Aware Semantic Segmentation via Multi-Task Network Cascades. Proceedings of the 2016 IEEE Conference on Computer Vision and Pattern Recognition (CVPR), Las Vegas, NV, USA.","DOI":"10.1109\/CVPR.2016.343"},{"key":"ref_31","doi-asserted-by":"crossref","unstructured":"He, K., Gkioxari, G., Dollar, P., and Girshick, R. (2017, January 22\u201329). Mask R-CNN. Proceedings of the 2017 IEEE International Conference on Computer Vision (ICCV), Venice, Italy.","DOI":"10.1109\/ICCV.2017.322"},{"key":"ref_32","doi-asserted-by":"crossref","first-page":"297","DOI":"10.1007\/978-3-319-10584-0_20","article-title":"Simultaneous Detection and Segmentation","volume":"Volume 8695","author":"Fleet","year":"2014","journal-title":"Computer Vision\u2014ECCV 2014"},{"key":"ref_33","doi-asserted-by":"crossref","unstructured":"Hu, H., Gu, J., Zhang, Z., Dai, J., and Wei, Y. (2018, January 18\u201323). Relation Networks for Object Detection. Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition, Salt Lake City, UT, USA.","DOI":"10.1109\/CVPR.2018.00378"},{"key":"ref_34","doi-asserted-by":"crossref","unstructured":"Zhang, H., Wu, C., Zhang, Z., Zhu, Y., Lin, H., Zhang, Z., Sun, Y., He, T., Mueller, J., and Manmatha, R. (2022, January 19\u201320). ResNeSt: Split-Attention Networks. Proceedings of the 2022 IEEE\/CVF Conference on Computer Vision and Pattern Recognition Workshops (CVPRW), New Orleans, LA, USA.","DOI":"10.1109\/CVPRW56347.2022.00309"},{"key":"ref_35","unstructured":"Guyon, I., Luxburg, U.V., Bengio, S., Wallach, H., Fergus, R., Vishwanathan, S., and Garnett, R. (2017). Advances in Neural Information Processing Systems, Curran Associates, Inc."},{"key":"ref_36","first-page":"1877","article-title":"Language Models Are Few-Shot Learners","volume":"33","author":"Brown","year":"2020","journal-title":"Adv. Neural Inf. Process. Syst."},{"key":"ref_37","first-page":"9","article-title":"Language Models Are Unsupervised Multitask Learners","volume":"1","author":"Radford","year":"2019","journal-title":"OpenAI Blog"},{"key":"ref_38","unstructured":"Liu, Y., Ott, M., Goyal, N., Du, J., Joshi, M., Chen, D., Levy, O., Lewis, M., Zettlemoyer, L., and Stoyanov, V. (2019). Roberta: A Robustly Optimized Bert Pretraining Approach. arXiv."},{"key":"ref_39","unstructured":"Devlin, J., Chang, M.-W., Lee, K., and Toutanova, K. (2019). Proceedings of the NAACL-HLT, Association for Computational Linguistics."},{"key":"ref_40","first-page":"1","article-title":"Stand-Alone Self-Attention in Vision Models","volume":"32","author":"Ramachandran","year":"2019","journal-title":"Adv. Neural Inf. Process. Syst."},{"key":"ref_41","unstructured":"Bello, I., Zoph, B., Vaswani, A., Shlens, J., and Le, Q.V. (November, January 27). Attention Augmented Convolutional Networks. Proceedings of the IEEE\/CVF International Conference on Computer Vision, Seoul, Republic of Korea."},{"key":"ref_42","doi-asserted-by":"crossref","unstructured":"Zhao, H., Jia, J., and Koltun, V. (2020, January 13\u201319). Exploring Self-Attention for Image Recognition. Proceedings of the IEEE\/CVF Conference on Computer Vision and Pattern Recognition, Seattle, WA, USA.","DOI":"10.1109\/CVPR42600.2020.01009"},{"key":"ref_43","doi-asserted-by":"crossref","unstructured":"Wang, H., Zhu, Y., Green, B., Adam, H., Yuille, A., and Chen, L.-C. (2020, January 23\u201328). Axial-Deeplab: Stand-Alone Axial-Attention for Panoptic Segmentation. Proceedings of the European Conference on Computer Vision, Glasgow, UK.","DOI":"10.1007\/978-3-030-58548-8_7"},{"key":"ref_44","unstructured":"Dosovitskiy, A., Beyer, L., Kolesnikov, A., Weissenborn, D., Zhai, X., Unterthiner, T., Dehghani, M., Minderer, M., Heigold, G., and Gelly, S. (2020). An Image Is Worth 16x16 Words: Transformers for Image Recognition at Scale. arXiv."},{"key":"ref_45","doi-asserted-by":"crossref","unstructured":"Liu, Z., Lin, Y., Cao, Y., Hu, H., Wei, Y., Zhang, Z., Lin, S., and Guo, B. (2021, January 11\u201317). Swin Transformer: Hierarchical Vision Transformer Using Shifted Windows. Proceedings of the IEEE\/CVF International Conference on Computer Vision, Montreal, BC, Canada.","DOI":"10.1109\/ICCV48922.2021.00986"},{"key":"ref_46","doi-asserted-by":"crossref","first-page":"25","DOI":"10.1145\/3506694","article-title":"Subdivision-Based Mesh Convolution Networks","volume":"41","author":"Hu","year":"2022","journal-title":"ACM Trans. Graph."},{"key":"ref_47","unstructured":"Guyon, I., Luxburg, U.V., Bengio, S., Wallach, H., Fergus, R., Vishwanathan, S., and Garnett, R. (2017). Advances in Neural Information Processing Systems, Curran Associates, Inc."},{"key":"ref_48","doi-asserted-by":"crossref","first-page":"263","DOI":"10.1145\/3414685.3417806","article-title":"MeshWalker: Deep Mesh Understanding by Random Walks","volume":"39","author":"Lahav","year":"2020","journal-title":"ACM Trans. Graph."},{"key":"ref_49","doi-asserted-by":"crossref","first-page":"71","DOI":"10.1145\/3072959.3073616","article-title":"Convolutional Neural Networks on Surfaces via Seamless Toric Covers","volume":"36","author":"Maron","year":"2017","journal-title":"ACM Trans. Graph."}],"container-title":["Algorithms"],"original-title":[],"language":"en","link":[{"URL":"https:\/\/www.mdpi.com\/1999-4893\/16\/3\/171\/pdf","content-type":"unspecified","content-version":"vor","intended-application":"similarity-checking"}],"deposited":{"date-parts":[[2025,10,10]],"date-time":"2025-10-10T18:59:50Z","timestamp":1760122790000},"score":1,"resource":{"primary":{"URL":"https:\/\/www.mdpi.com\/1999-4893\/16\/3\/171"}},"subtitle":[],"short-title":[],"issued":{"date-parts":[[2023,3,21]]},"references-count":49,"journal-issue":{"issue":"3","published-online":{"date-parts":[[2023,3]]}},"alternative-id":["a16030171"],"URL":"https:\/\/doi.org\/10.3390\/a16030171","relation":{},"ISSN":["1999-4893"],"issn-type":[{"type":"electronic","value":"1999-4893"}],"subject":[],"published":{"date-parts":[[2023,3,21]]}}}