{"status":"ok","message-type":"work","message-version":"1.0.0","message":{"indexed":{"date-parts":[[2026,1,9]],"date-time":"2026-01-09T00:27:20Z","timestamp":1767918440089,"version":"3.49.0"},"reference-count":37,"publisher":"MDPI AG","issue":"15","license":[{"start":{"date-parts":[[2020,7,29]],"date-time":"2020-07-29T00:00:00Z","timestamp":1595980800000},"content-version":"vor","delay-in-days":0,"URL":"https:\/\/creativecommons.org\/licenses\/by\/4.0\/"}],"funder":[{"DOI":"10.13039\/501100001809","name":"National Natural Science Foundation of China","doi-asserted-by":"publisher","award":["51475251"],"award-info":[{"award-number":["51475251"]}],"id":[{"id":"10.13039\/501100001809","id-type":"DOI","asserted-by":"publisher"}]},{"DOI":"10.13039\/501100001809","name":"National Natural Science Foundation of China","doi-asserted-by":"publisher","award":["51705273"],"award-info":[{"award-number":["51705273"]}],"id":[{"id":"10.13039\/501100001809","id-type":"DOI","asserted-by":"publisher"}]},{"name":"Key Research & Development Programs of Shandong Province","award":["2017GGX203003"],"award-info":[{"award-number":["2017GGX203003"]}]}],"content-domain":{"domain":[],"crossmark-restriction":false},"short-container-title":["Sensors"],"abstract":"<jats:p>Monitoring the assembly process is a challenge in the manual assembly of mass customization production, in which the operator needs to change the assembly process according to different products. If an assembly error is not immediately detected during the assembly process of a product, it may lead to errors and loss of time and money in the subsequent assembly process, and will affect product quality. To monitor assembly process, this paper explored two methods: recognizing assembly action and recognizing parts from complicated assembled products. In assembly action recognition, an improved three-dimensional convolutional neural network (3D CNN) model with batch normalization is proposed to detect a missing assembly action. 
In parts recognition, a fully convolutional network (FCN) is employed to segment, recognize different parts from complicated assembled products to check the assembly sequence for missing or misaligned parts. An assembly actions data set and an assembly segmentation data set are created. The experimental results of assembly action recognition show that the 3D CNN model with batch normalization reduces computational complexity, improves training speed and speeds up the convergence of the model, while maintaining accuracy. Experimental results of FCN show that FCN-2S provides a higher pixel recognition accuracy than other FCNs.<\/jats:p>","DOI":"10.3390\/s20154208","type":"journal-article","created":{"date-parts":[[2020,7,29]],"date-time":"2020-07-29T07:31:45Z","timestamp":1596007905000},"page":"4208","update-policy":"https:\/\/doi.org\/10.3390\/mdpi_crossmark_policy","source":"Crossref","is-referenced-by-count":42,"title":["Monitoring of Assembly Process Using Deep Learning Technology"],"prefix":"10.3390","volume":"20","author":[{"ORCID":"https:\/\/orcid.org\/0000-0003-3185-1062","authenticated-orcid":false,"given":"Chengjun","family":"Chen","sequence":"first","affiliation":[{"name":"School of Mechanical and Automotive Engineering, Qingdao University of Technology, Qingdao 266520, China"},{"name":"Key Lab of Industrial Fluid Energy Conservation and Pollution Control, Ministry of Education, Qingdao University of Technology, Qingdao 266520, China"}]},{"given":"Chunlin","family":"Zhang","sequence":"additional","affiliation":[{"name":"School of Mechanical and Automotive Engineering, Qingdao University of Technology, Qingdao 266520, China"},{"name":"Key Lab of Industrial Fluid Energy Conservation and Pollution Control, Ministry of Education, Qingdao University of Technology, Qingdao 266520, China"}]},{"given":"Tiannuo","family":"Wang","sequence":"additional","affiliation":[{"name":"School of Mechanical and Automotive Engineering, Qingdao University of Technology, Qingdao 
266520, China"},{"name":"Key Lab of Industrial Fluid Energy Conservation and Pollution Control, Ministry of Education, Qingdao University of Technology, Qingdao 266520, China"}]},{"given":"Dongnian","family":"Li","sequence":"additional","affiliation":[{"name":"School of Mechanical and Automotive Engineering, Qingdao University of Technology, Qingdao 266520, China"},{"name":"Key Lab of Industrial Fluid Energy Conservation and Pollution Control, Ministry of Education, Qingdao University of Technology, Qingdao 266520, China"}]},{"given":"Yang","family":"Guo","sequence":"additional","affiliation":[{"name":"School of Mechanical and Automotive Engineering, Qingdao University of Technology, Qingdao 266520, China"},{"name":"Key Lab of Industrial Fluid Energy Conservation and Pollution Control, Ministry of Education, Qingdao University of Technology, Qingdao 266520, China"}]},{"given":"Zhengxu","family":"Zhao","sequence":"additional","affiliation":[{"name":"School of Mechanical and Automotive Engineering, Qingdao University of Technology, Qingdao 266520, China"},{"name":"Key Lab of Industrial Fluid Energy Conservation and Pollution Control, Ministry of Education, Qingdao University of Technology, Qingdao 266520, China"}]},{"given":"Jun","family":"Hong","sequence":"additional","affiliation":[{"name":"School of Mechanical Engineering, Xi\u2019an Jiaotong University, Xi\u2019an 711049, China"}]}],"member":"1968","published-online":{"date-parts":[[2020,7,29]]},"reference":[{"key":"ref_1","doi-asserted-by":"crossref","unstructured":"Bobick, A., and Davis, J. (1996, January 25\u201329). An appearance-based representation of action. 
Proceedings of the 13th International Conference on Pattern Recognition, Vienna, Austria.","DOI":"10.1109\/ICPR.1996.546039"},{"key":"ref_2","doi-asserted-by":"crossref","first-page":"249","DOI":"10.1016\/j.cviu.2006.07.013","article-title":"Free viewpoint action recognition using motion history volumes","volume":"104","author":"Weinland","year":"2006","journal-title":"Comput. Vis. Image Underst."},{"key":"ref_3","unstructured":"Dalal, N., and Triggs, B. (2005, January 20\u201325). Histograms of oriented gradients for human detection. Proceedings of the IEEE Computer Society Conference on Computer Vision and Pattern Recognition (CVPR\u201905), San Diego, CA, USA."},{"key":"ref_4","doi-asserted-by":"crossref","unstructured":"Chaudhry, R., Ravichandran, A., Hager, G., and Vidal, R. (2009, January 20\u201325). Histograms of oriented optical flow and Binet-Cauchy kernels on nonlinear dynamical systems for the recognition of human actions. Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition, Miami, FL, USA.","DOI":"10.1109\/CVPRW.2009.5206821"},{"key":"ref_5","doi-asserted-by":"crossref","unstructured":"Schuldt, C., Laptev, I., and Caputo, B. (2004, January 26). Recognizing human actions: A local SVM approach. Proceedings of the 17th International Conference on Pattern Recognition, Cambridge, UK.","DOI":"10.1109\/ICPR.2004.1334462"},{"key":"ref_6","doi-asserted-by":"crossref","first-page":"60","DOI":"10.1007\/s11263-012-0594-8","article-title":"Dense trajectories and motion boundary descriptors for action recognition","volume":"103","author":"Wang","year":"2013","journal-title":"Int. J. Comput. Vis."},{"key":"ref_7","doi-asserted-by":"crossref","first-page":"325","DOI":"10.1016\/j.jmsy.2020.04.018","article-title":"Repetitive assembly action recognition based on object detection and pose estimation","volume":"55","author":"Chen","year":"2020","journal-title":"J. Manuf. Syst."},{"key":"ref_8","unstructured":"Redmon, J., and Farhadi, A. (2018). 
Yolov3: An incremental improvement. arXiv."},{"key":"ref_9","doi-asserted-by":"crossref","unstructured":"Wei, S.E., Ramakrishna, V., Kanade, T., and Sheikh, Y. (2016, January 27\u201330). Convolutional pose machines. Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition, Las Vegas, NV, USA.","DOI":"10.1109\/CVPR.2016.511"},{"key":"ref_10","doi-asserted-by":"crossref","first-page":"98","DOI":"10.1016\/j.cad.2014.09.001","article-title":"A vision-based system for monitoring block assembly in shipbuilding","volume":"59","author":"Kim","year":"2015","journal-title":"Comput. Aided Des."},{"key":"ref_11","doi-asserted-by":"crossref","unstructured":"\u017didek, K., Hosovsky, A., Pite\u013e, J., and Bedn\u00e1r, S. (2019). Recognition of Assembly Parts by Convolutional Neural Networks. Advances in Manufacturing Engineering and Materials, Springer. Lecture Notes in Mechanical Engineering.","DOI":"10.1007\/978-3-319-99353-9_30"},{"key":"ref_12","doi-asserted-by":"crossref","unstructured":"Feichtenhofer, C., Pinz, A., and Zisserman, A. (2016, January 27\u201330). Convolutional two-stream network fusion for video action recognition. Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition (CVPR), Las Vegas, NV, USA.","DOI":"10.1109\/CVPR.2016.213"},{"key":"ref_13","doi-asserted-by":"crossref","unstructured":"Wang, L., Xiong, Y., Wang, Z., Qiao, Y., Lin, D., Tang, X., and Van Gool, L. (2016, January 11\u201314). Temporal segment networks: Towards good practices for deep action recognition. Proceedings of the European Conference on Computer Vision, Amsterdam, The Netherlands. Lecture Notes in Computer Science.","DOI":"10.1007\/978-3-319-46484-8_2"},{"key":"ref_14","doi-asserted-by":"crossref","unstructured":"Tran, D., Bourdev, L., Fergus, R., Torresani, L., and Paluri, M. (2015, January 7\u201313). Learning spatiotemporal features with 3d convolutional networks. 
Proceedings of the IEEE International Conference on Computer Vision (ICCV), Santiago, Chile.","DOI":"10.1109\/ICCV.2015.510"},{"key":"ref_15","doi-asserted-by":"crossref","first-page":"221","DOI":"10.1109\/TPAMI.2012.59","article-title":"3D convolutional neural networks for human action recognition","volume":"35","author":"Ji","year":"2013","journal-title":"IEEE Trans. Pattern Anal. Mach. Intell."},{"key":"ref_16","doi-asserted-by":"crossref","unstructured":"Du, W., Wang, Y., and Qiao, Y. (2017, January 22\u201329). RPAN: An end-to-end recurrent pose-attention network for action recognition in videos. Proceedings of the IEEE International Conference on Computer Vision (ICCV), Venice, Italy.","DOI":"10.1109\/ICCV.2017.402"},{"key":"ref_17","doi-asserted-by":"crossref","unstructured":"Donahue, J., Hendricks, L.A., Guadarrama, S., Rohrbach, M., Venugopalan, S., Darrell, T., and Saenko, K. (2015, January 7\u201312). Long-term recurrent convolutional networks for visual recognition and description. Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition (CVPR), Boston, MA, USA.","DOI":"10.1109\/CVPR.2015.7298878"},{"key":"ref_18","doi-asserted-by":"crossref","unstructured":"Xu, H., Das, A., and Saenko, K. (2017, January 22\u201329). R-C3D: Region Convolutional 3D Network for Temporal Activity Detection. Proceedings of the IEEE International Conference on Computer Vision (ICCV), Venice, Italy.","DOI":"10.1109\/ICCV.2017.617"},{"key":"ref_19","unstructured":"Soomro, K., Zamir, A.R., and Shah, M. (2012). Ucf101: A dataset of 101 human actions classes from videos in the wild. arXiv."},{"key":"ref_20","unstructured":"Ioffe, S., and Szegedy, C. (2015, January 11). Batch normalization: Accelerating deep network training by reducing internal covariate shift. 
Proceedings of the 32nd International Conference on Machine Learning, Lille, France."},{"key":"ref_21","doi-asserted-by":"crossref","unstructured":"Shotton, J., Fitzgibbon, A., Cook, M., Sharp, T., Finocchio, M., Moore, R., Kipman, A., and Blake, A. (2011, January 20\u201325). Real-time human pose recognition in parts from single depth images. Proceedings of the 24th IEEE Conference on Computer Vision and Pattern Recognition, Colorado Springs, CO, USA.","DOI":"10.1109\/CVPR.2011.5995316"},{"key":"ref_22","unstructured":"Joo, S.I., Weon, S.H., Hong, J.M., and Choi, H.I. (2013, January 22\u201325). Hand detection in depth images using features of depth difference. Proceedings of the International Conference on Image Processing, Computer Vision, and Pattern Recognition (IPCV). The Steering Committee of the World Congress in Computer Science, Computer Engineering and Applied Computing (World Comp), Las Vegas, NV, USA."},{"key":"ref_23","doi-asserted-by":"crossref","first-page":"640","DOI":"10.1109\/TPAMI.2016.2572683","article-title":"Fully convolutional networks for semantic segmentation","volume":"39","author":"Long","year":"2017","journal-title":"IEEE Trans. Pattern Anal. Mach. Intell."},{"key":"ref_24","doi-asserted-by":"crossref","unstructured":"Ronneberger, O., Fischer, P., and Brox, T. (2015, January 5\u20139). U-net: Convolutional networks for biomedical image segmentation. Proceedings of the International Conference on Medical Image Computing and Computer-Assisted Intervention, Munich, Germany.","DOI":"10.1007\/978-3-319-24574-4_28"},{"key":"ref_25","doi-asserted-by":"crossref","unstructured":"Zhao, H., Shi, J., Qi, X., Wang, X., and Jia, J. (2017, January 21\u201326). Pyramid scene parsing network. Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition, Honolulu, HI, USA.","DOI":"10.1109\/CVPR.2017.660"},{"key":"ref_26","doi-asserted-by":"crossref","unstructured":"Peng, C., Zhang, X., Yu, G., Luo, G., and Sun, J. 
(2017, January 21\u201326). Large kernel matters\u2014Improve semantic segmentation by global convolutional network. Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition, Honolulu, HI, USA.","DOI":"10.1109\/CVPR.2017.189"},{"key":"ref_27","doi-asserted-by":"crossref","first-page":"834","DOI":"10.1109\/TPAMI.2017.2699184","article-title":"Deeplab: Semantic image segmentation with deep convolutional nets, atrous convolution, and fully connected crfs","volume":"40","author":"Chen","year":"2017","journal-title":"IEEE Trans. Pattern Anal. Mach. Intell."},{"key":"ref_28","doi-asserted-by":"crossref","unstructured":"Li, X., Yang, Y., Zhao, Q., Shen, T., Lin, Z., and Liu, H. (2020, January 16\u201318). Spatial pyramid based graph reasoning for semantic segmentation. Proceedings of the IEEE\/CVF Conference on Computer Vision and Pattern Recognition, Seattle, WA, USA.","DOI":"10.1109\/CVPR42600.2020.00897"},{"key":"ref_29","doi-asserted-by":"crossref","unstructured":"Zhong, Z., Lin, Z.Q., Bidart, R., Hu, X., Daya, I.B., Li, Z., and Wong, A. (2020, January 16\u201318). Squeeze-and-attention networks for semantic segmentation. Proceedings of the IEEE\/CVF Conference on Computer Vision and Pattern Recognition, Seattle, WA, USA.","DOI":"10.1109\/CVPR42600.2020.01308"},{"key":"ref_30","unstructured":"Huang, Z., Wang, X., Huang, L., Huang, C., Wei, Y., and Liu, W. (November, January 27). Ccnet: Criss-cross attention for semantic segmentation. Proceedings of the IEEE International Conference on Computer Vision, Seoul, Korea."},{"key":"ref_31","unstructured":"Fu, J., Liu, J., Wang, Y., Zhou, J., Wang, C., and Lu, H. Stacked deconvolutional network for semantic segmentation. IEEE Trans. Image Process., 2019."},{"key":"ref_32","doi-asserted-by":"crossref","unstructured":"Artacho, B., and Savakis, A. (2019). Waterfall atrous spatial pooling architecture for efficient semantic segmentation. 
Sensors, 19.","DOI":"10.3390\/s19245361"},{"key":"ref_33","doi-asserted-by":"crossref","unstructured":"Sharma, S., Ball, J.E., Tang, B., Carruth, D.W., Doude, M., and Islam, M.A. (2019). Semantic segmentation with transfer learning for off-road autonomous driving. Sensors, 19.","DOI":"10.3390\/s19112577"},{"key":"ref_34","unstructured":"Glorot, X., Bordes, A., and Bengio, Y. (2019, January 16\u201318). Deep sparse rectifier neural networks. Proceedings of the Fourteenth International Conference on Artificial Intelligence and Statistics, Naha, Okinawa, Japan."},{"key":"ref_35","unstructured":"Simonyan, K., and Zisserman, A. (2014). Very deep convolutional networks for large-scale image recognition. arXiv."},{"key":"ref_36","unstructured":"Yosinski, J., Clune, J., Bengio, Y., and Lipson, H. (2014, January 8\u201313). How transferable are features in deep neural networks?. Proceedings of the 27th International Conference on Neural Information Processing Systems, Montreal, QC, Canada."},{"key":"ref_37","unstructured":"Kingma, D., and Ba, J. (2015, January 7\u20139). Adam: A method for stochastic optimization. 
Proceedings of the 3rd International Conference on Learning Representations, ICLR, San Diego, CA, USA."}],"container-title":["Sensors"],"original-title":[],"language":"en","link":[{"URL":"https:\/\/www.mdpi.com\/1424-8220\/20\/15\/4208\/pdf","content-type":"unspecified","content-version":"vor","intended-application":"similarity-checking"}],"deposited":{"date-parts":[[2025,10,11]],"date-time":"2025-10-11T09:52:32Z","timestamp":1760176352000},"score":1,"resource":{"primary":{"URL":"https:\/\/www.mdpi.com\/1424-8220\/20\/15\/4208"}},"subtitle":[],"short-title":[],"issued":{"date-parts":[[2020,7,29]]},"references-count":37,"journal-issue":{"issue":"15","published-online":{"date-parts":[[2020,8]]}},"alternative-id":["s20154208"],"URL":"https:\/\/doi.org\/10.3390\/s20154208","relation":{},"ISSN":["1424-8220"],"issn-type":[{"value":"1424-8220","type":"electronic"}],"subject":[],"published":{"date-parts":[[2020,7,29]]}}}