{"status":"ok","message-type":"work","message-version":"1.0.0","message":{"indexed":{"date-parts":[[2025,8,29]],"date-time":"2025-08-29T09:49:33Z","timestamp":1756460973723,"version":"3.41.0"},"reference-count":89,"publisher":"Association for Computing Machinery (ACM)","issue":"1s","license":[{"start":{"date-parts":[[2022,1,25]],"date-time":"2022-01-25T00:00:00Z","timestamp":1643068800000},"content-version":"vor","delay-in-days":0,"URL":"https:\/\/www.acm.org\/publications\/policies\/copyright_policy#Background"}],"funder":[{"DOI":"10.13039\/501100004663","name":"Ministry of Science and Technology (MOST) of Taiwan","doi-asserted-by":"crossref","award":["MOST-109-2223-E-009-002-MY3, MOST-110-2634-F-007-015, MOST-109-2218-E-002-015, MOST-109-2221-E-009-114-MY3, MOST-110-2218-E-A49-018, MOST-109-2327-B-010-005, MOST-109-2221-E-009-097 and MOST-109-2221-E-001-015"],"award-info":[{"award-number":["MOST-109-2223-E-009-002-MY3, MOST-110-2634-F-007-015, MOST-109-2218-E-002-015, MOST-109-2221-E-009-114-MY3, MOST-110-2218-E-A49-018, MOST-109-2327-B-010-005, MOST-109-2221-E-009-097 and MOST-109-2221-E-001-015"]}],"id":[{"id":"10.13039\/501100004663","id-type":"DOI","asserted-by":"crossref"}]},{"name":"Higher Education Sprout Project of the National Yang Ming Chiao Tung University and Ministry of Education (MOE), Taiwan"}],"content-domain":{"domain":["dl.acm.org"],"crossmark-restriction":true},"short-container-title":["ACM Trans. Multimedia Comput. Commun. Appl."],"published-print":{"date-parts":[[2022,2,28]]},"abstract":"<jats:p>\n            In the absence of vaccines or medicines to stop COVID-19, one of the effective methods to slow the spread of the coronavirus and reduce the overloading of healthcare is to wear a face mask. Nevertheless, to mandate the use of face masks or coverings in public areas, additional human resources are required, which is tedious and attention-intensive. To automate the monitoring process, one of the promising solutions is to leverage existing object detection models to detect the faces with or without masks. As such, security officers do not have to stare at the monitoring devices or crowds, and only have to deal with the alerts triggered by the detection of faces without masks. Existing object detection models usually focus on designing the CNN-based network architectures for extracting discriminative features. However, the size of training datasets of face mask detection is small, while the difference between faces with and without masks is subtle. Therefore, in this article, we propose a face mask detection framework that uses the context attention module to enable the effective attention of the feed-forward convolution neural network by adapting their attention maps\u2019 feature refinement. Moreover, we further propose an anchor-free detector with Triplet-Consistency Representation Learning by integrating the consistency loss and the triplet loss to deal with the small-scale training data and the similarity between masks and occlusions. Extensive experimental results show that our method outperforms the other state-of-the-art methods. The source code is released as a public download to improve public health at\n            <jats:ext-link xmlns:xlink=\"http:\/\/www.w3.org\/1999\/xlink\" xlink:href=\"https:\/\/github.com\/wei-1006\/MaskFaceDetection\">https:\/\/github.com\/wei-1006\/MaskFaceDetection<\/jats:ext-link>\n            .\n          <\/jats:p>","DOI":"10.1145\/3472623","type":"journal-article","created":{"date-parts":[[2022,1,25]],"date-time":"2022-01-25T15:06:00Z","timestamp":1643123160000},"page":"1-20","update-policy":"https:\/\/doi.org\/10.1145\/crossmark-policy","source":"Crossref","is-referenced-by-count":9,"title":["Mask or Non-Mask? Robust Face Mask Detector via Triplet-Consistency Representation Learning"],"prefix":"10.1145","volume":"18","author":[{"given":"Chun-Wei","family":"Yang","sequence":"first","affiliation":[{"name":"National Yang Ming Chiao Tung University, Hsinchu, Taiwan"}],"role":[{"role":"author","vocabulary":"crossref"}]},{"given":"Thanh Hai","family":"Phung","sequence":"additional","affiliation":[{"name":"National Yang Ming Chiao Tung University, Hsinchu, Taiwan"}],"role":[{"role":"author","vocabulary":"crossref"}]},{"given":"Hong-Han","family":"Shuai","sequence":"additional","affiliation":[{"name":"National Yang Ming Chiao Tung University, Hsinchu, Taiwan"}],"role":[{"role":"author","vocabulary":"crossref"}]},{"given":"Wen-Huang","family":"Cheng","sequence":"additional","affiliation":[{"name":"National Yang Ming Chiao Tung University and National Chung Hsing University, Hsinchu, Taiwan"}],"role":[{"role":"author","vocabulary":"crossref"}]}],"member":"320","published-online":{"date-parts":[[2022,1,25]]},"reference":[{"key":"e_1_3_2_2_2","unstructured":"Daniell Chiang. 2021. AIZOOTech. AIZOOTech\/FaceMaskDetection. https:\/\/github.com\/AIZOOTech\/FaceMaskDetection."},{"key":"e_1_3_2_3_2","doi-asserted-by":"publisher","DOI":"10.1109\/34.993558"},{"key":"e_1_3_2_4_2","article-title":"Yolov4: Optimal speed and accuracy of object detection","author":"Bochkovskiy Alexey","year":"2020","unstructured":"Alexey Bochkovskiy, Chien-Yao Wang, and Hong-Yuan Mark Liao. 2020. Yolov4: Optimal speed and accuracy of object detection. arXiv preprint arXiv:2004.10934 (2020).","journal-title":"arXiv preprint arXiv:2004.10934"},{"key":"e_1_3_2_5_2","article-title":"YOLOv4: Optimal speed and accuracy of object detection","author":"Bochkovskiy Alexey","year":"2020","unstructured":"Alexey Bochkovskiy, Chien-Yao Wang, and Hong-Yuan Mark Liao. 2020. YOLOv4: Optimal speed and accuracy of object detection. arXiv (2020).","journal-title":"arXiv"},{"key":"e_1_3_2_6_2","doi-asserted-by":"publisher","DOI":"10.1109\/ICCMC.2017.8282685"},{"key":"e_1_3_2_7_2","doi-asserted-by":"publisher","DOI":"10.1109\/CVPR.2018.00644"},{"key":"e_1_3_2_8_2","doi-asserted-by":"publisher","DOI":"10.1109\/CVPR42600.2020.01150"},{"key":"e_1_3_2_9_2","doi-asserted-by":"publisher","DOI":"10.1109\/CVPR.2017.143"},{"key":"e_1_3_2_10_2","article-title":"You only look one-level feature","author":"Chen Qiang","year":"2021","unstructured":"Qiang Chen, Yingming Wang, Tong Yang, Xiangyu Zhang, Jian Cheng, and Jian Sun. 2021. You only look one-level feature. arXiv preprint arXiv:2103.09460 (2021).","journal-title":"arXiv preprint arXiv:2103.09460"},{"key":"e_1_3_2_11_2","volume-title":"Neural Information Processing Systems (NeurIPS\u201920)","author":"Chen Yihong","year":"2020","unstructured":"Yihong Chen, Zheng Zhang, Yue Cao, Liwei Wang, Stephen Lin, and Han Hu. 2020. RepPoints V2: Verification meets regression for object detection. In Neural Information Processing Systems (NeurIPS\u201920)."},{"key":"e_1_3_2_12_2","doi-asserted-by":"publisher","DOI":"10.1007\/978-3-030-66665-1_6"},{"key":"e_1_3_2_13_2","doi-asserted-by":"publisher","DOI":"10.1109\/CVPR.2019.01132"},{"key":"e_1_3_2_14_2","doi-asserted-by":"publisher","DOI":"10.1109\/CVPR.2005.177"},{"key":"e_1_3_2_15_2","doi-asserted-by":"publisher","DOI":"10.1109\/ICCV.2019.00667"},{"key":"e_1_3_2_16_2","doi-asserted-by":"publisher","DOI":"10.5555\/3454287.3454351"},{"key":"e_1_3_2_17_2","doi-asserted-by":"publisher","DOI":"10.1109\/CVPR.2008.4587597"},{"key":"e_1_3_2_18_2","doi-asserted-by":"publisher","DOI":"10.1109\/TPAMI.2009.167"},{"key":"e_1_3_2_19_2","article-title":"RetinaMask: Learning to predict masks improves state-of-the-art single-shot detection for free","author":"Fu Cheng-Yang","year":"2019","unstructured":"Cheng-Yang Fu, Mykhailo Shvets, and Alexander C. Berg. 2019. RetinaMask: Learning to predict masks improves state-of-the-art single-shot detection for free. arXiv preprint arXiv:1901.03353 (2019).","journal-title":"arXiv preprint arXiv:1901.03353"},{"key":"e_1_3_2_20_2","doi-asserted-by":"publisher","DOI":"10.1109\/ICIIS51140.2020.9342737"},{"key":"e_1_3_2_21_2","doi-asserted-by":"publisher","DOI":"10.1109\/CVPR.2017.53"},{"key":"e_1_3_2_22_2","volume-title":"The International Conference on Learning Representations (ICLR\u201918)","author":"Gidaris Spyros","year":"2018","unstructured":"Spyros Gidaris, Praveer Singh, and Nikos Komodakis. 2018. Unsupervised representation learning by predicting image rotations. In The International Conference on Learning Representations (ICLR\u201918)."},{"key":"e_1_3_2_23_2","article-title":"Unsupervised representation learning by predicting image rotations","author":"Gidaris Spyros","year":"2018","unstructured":"Spyros Gidaris, Praveer Singh, and Nikos Komodakis. 2018. Unsupervised representation learning by predicting image rotations. arXiv preprint arXiv:1803.07728 (2018).","journal-title":"arXiv preprint arXiv:1803.07728"},{"key":"e_1_3_2_24_2","doi-asserted-by":"publisher","DOI":"10.1109\/ICCV.2015.169"},{"key":"e_1_3_2_25_2","doi-asserted-by":"publisher","DOI":"10.1109\/CVPR.2014.81"},{"key":"e_1_3_2_26_2","doi-asserted-by":"publisher","DOI":"10.5555\/2986459.2986509"},{"key":"e_1_3_2_27_2","unstructured":"Ross B. Girshick Pedro F. Felzenszwalb and David McAllester. 2012. Discriminatively trained deformable part models release 5. (2012)."},{"key":"e_1_3_2_28_2","volume-title":"Advances in Neural Information Processing Systems (NeurIPS\u201920)","author":"al. J. B. Grill et","year":"2020","unstructured":"J. B. Grill et al.2020. Bootstrap your own latent: A new approach to self-supervised learning. In Advances in Neural Information Processing Systems (NeurIPS\u201920). IEEE."},{"key":"e_1_3_2_29_2","doi-asserted-by":"publisher","DOI":"10.1109\/CVPR42600.2020.00975"},{"key":"e_1_3_2_30_2","doi-asserted-by":"publisher","DOI":"10.1109\/CVPR.2016.90"},{"key":"e_1_3_2_31_2","article-title":"Mobilenets: Efficient convolutional neural networks for mobile vision applications","author":"Howard Andrew G.","year":"2017","unstructured":"Andrew G. Howard, Menglong Zhu, Bo Chen, Dmitry Kalenichenko, Weijun Wang, Tobias Weyand, Marco Andreetto, and Hartwig Adam. 2017. Mobilenets: Efficient convolutional neural networks for mobile vision applications. arXiv preprint arXiv:1704.04861 (2017).","journal-title":"arXiv preprint arXiv:1704.04861"},{"key":"e_1_3_2_32_2","doi-asserted-by":"publisher","DOI":"10.1109\/CVPR.2018.00745"},{"key":"e_1_3_2_33_2","doi-asserted-by":"publisher","DOI":"10.5555\/3454287.3455252"},{"key":"e_1_3_2_34_2","doi-asserted-by":"publisher","DOI":"10.1109\/ICCSE.2010.5593578"},{"key":"e_1_3_2_35_2","unstructured":"Mingjie Jiang Xinqi Fan and Hong Yan. 2020. RetinaMask: A Face Mask Detector. (2020). arxiv:cs.CV\/2005.03950"},{"key":"e_1_3_2_36_2","doi-asserted-by":"publisher","DOI":"10.1109\/CICN49253.2020.9242625"},{"key":"e_1_3_2_37_2","article-title":"Adam: A method for stochastic optimization","author":"Kingma Diederik P.","year":"2014","unstructured":"Diederik P. Kingma and Jimmy Ba. 2014. Adam: A method for stochastic optimization. arXiv preprint arXiv:1412.6980 (2014).","journal-title":"arXiv preprint arXiv:1412.6980"},{"key":"e_1_3_2_38_2","doi-asserted-by":"publisher","DOI":"10.1109\/TIP.2020.3002345"},{"key":"e_1_3_2_39_2","doi-asserted-by":"publisher","DOI":"10.1007\/s10462-018-9650-2"},{"key":"e_1_3_2_40_2","doi-asserted-by":"publisher","DOI":"10.1007\/978-3-030-01264-9_45"},{"key":"e_1_3_2_41_2","doi-asserted-by":"publisher","DOI":"10.1109\/CVPR42600.2020.01392"},{"key":"e_1_3_2_42_2","doi-asserted-by":"publisher","DOI":"10.1038\/s41591-020-0843-2"},{"key":"e_1_3_2_43_2","doi-asserted-by":"publisher","DOI":"10.1016\/j.neucom.2016.08.056"},{"key":"e_1_3_2_44_2","doi-asserted-by":"publisher","DOI":"10.1109\/CVPR.2017.106"},{"key":"e_1_3_2_45_2","doi-asserted-by":"publisher","DOI":"10.1109\/ICCV.2017.324"},{"key":"e_1_3_2_46_2","doi-asserted-by":"publisher","DOI":"10.1109\/TMM.2019.2954747"},{"key":"e_1_3_2_47_2","doi-asserted-by":"publisher","DOI":"10.1007\/978-3-319-46448-0_2"},{"key":"e_1_3_2_48_2","article-title":"Object-centric learning with slot attention","volume":"33","author":"Locatello Francesco","year":"2020","unstructured":"Francesco Locatello, Dirk Weissenborn, Thomas Unterthiner, Aravindh Mahendran, Georg Heigold, Jakob Uszkoreit, Alexey Dosovitskiy, and Thomas Kipf. 2020. Object-centric learning with slot attention. Advances in Neural Information Processing Systems 33 (2020), 11525\u201311538.","journal-title":"Advances in Neural Information Processing Systems"},{"key":"e_1_3_2_49_2","doi-asserted-by":"publisher","DOI":"10.5555\/850924.851523"},{"key":"e_1_3_2_50_2","doi-asserted-by":"publisher","DOI":"10.1023\/B:VISI.0000029664.99615.94"},{"key":"e_1_3_2_51_2","first-page":"1","article-title":"A novel technique for automated concealed face detection in surveillance videos","author":"Mahmoud Hanan A. Hosni","year":"2020","unstructured":"Hanan A. Hosni Mahmoud and Hanan Abdullah Mengash. 2020. A novel technique for automated concealed face detection in surveillance videos. Personal and Ubiquitous Computing 25 (2020), 1\u201312.","journal-title":"Personal and Ubiquitous Computing"},{"key":"e_1_3_2_52_2","doi-asserted-by":"publisher","DOI":"10.1109\/TIP.2020.2977457"},{"key":"e_1_3_2_53_2","doi-asserted-by":"publisher","DOI":"10.1109\/CVPR42600.2020.00674"},{"key":"e_1_3_2_54_2","volume-title":"Proceedings of IEEE International Conference on Pattern Recognition (ICPR\u201906)","author":"Neubeck Alexander","year":"2006","unstructured":"Alexander Neubeck and Luc Van Gool. 2006. Deep residual learning for image recognition. In Proceedings of IEEE International Conference on Pattern Recognition (ICPR\u201906)."},{"key":"e_1_3_2_55_2","doi-asserted-by":"publisher","DOI":"10.1007\/978-3-319-46484-8_29"},{"key":"e_1_3_2_56_2","doi-asserted-by":"publisher","DOI":"10.1109\/CVPR.2017.395"},{"key":"e_1_3_2_57_2","doi-asserted-by":"publisher","DOI":"10.5555\/3454287.3455008"},{"key":"e_1_3_2_58_2","doi-asserted-by":"publisher","DOI":"10.1109\/CVPR.2016.91"},{"key":"e_1_3_2_59_2","doi-asserted-by":"publisher","DOI":"10.1109\/CVPR.2016.91"},{"key":"e_1_3_2_60_2","doi-asserted-by":"publisher","DOI":"10.1109\/CVPR.2017.690"},{"key":"e_1_3_2_61_2","article-title":"Yolov3: An incremental improvement","author":"Redmon Joseph","year":"2018","unstructured":"Joseph Redmon and Ali Farhadi. 2018. Yolov3: An incremental improvement. arXiv preprint arXiv:1804.02767 (2018).","journal-title":"arXiv preprint arXiv:1804.02767"},{"key":"e_1_3_2_62_2","doi-asserted-by":"publisher","DOI":"10.5555\/2969239.2969250"},{"key":"e_1_3_2_63_2","doi-asserted-by":"publisher","DOI":"10.1109\/TPAMI.2016.2577031"},{"key":"e_1_3_2_64_2","doi-asserted-by":"publisher","DOI":"10.1109\/CVPR.2015.7298682"},{"key":"e_1_3_2_65_2","doi-asserted-by":"publisher","DOI":"10.1109\/CVPR.2019.00584"},{"key":"e_1_3_2_66_2","doi-asserted-by":"publisher","DOI":"10.1109\/ICCV.2017.284"},{"key":"e_1_3_2_67_2","doi-asserted-by":"publisher","DOI":"10.1007\/978-3-030-01231-1_33"},{"key":"e_1_3_2_68_2","doi-asserted-by":"publisher","DOI":"10.1109\/CVPR.2015.7298907"},{"key":"e_1_3_2_69_2","doi-asserted-by":"publisher","DOI":"10.1109\/CVPR.2014.220"},{"key":"e_1_3_2_70_2","doi-asserted-by":"publisher","DOI":"10.1109\/CVPR42600.2020.01079"},{"key":"e_1_3_2_71_2","doi-asserted-by":"publisher","DOI":"10.1109\/ICCV.2019.00972"},{"key":"e_1_3_2_72_2","doi-asserted-by":"publisher","DOI":"10.1109\/ICCV.2011.6126456"},{"key":"e_1_3_2_73_2","doi-asserted-by":"publisher","DOI":"10.5555\/3295222.3295349"},{"key":"e_1_3_2_74_2","doi-asserted-by":"publisher","DOI":"10.1109\/CVPR.2001.990517"},{"key":"e_1_3_2_75_2","doi-asserted-by":"publisher","DOI":"10.1023\/B:VISI.0000013087.49260.fb"},{"key":"e_1_3_2_76_2","doi-asserted-by":"publisher","DOI":"10.1109\/CVPR.2017.683"},{"key":"e_1_3_2_77_2","article-title":"Deep high-resolution representation learning for visual recognition","author":"Wang Jingdong","year":"2019","unstructured":"Jingdong Wang, Ke Sun, Tianheng Cheng, Borui Jiang, Chaorui Deng, Yang Zhao, Dong Liu, Yadong Mu, Mingkui Tan, Xinggang Wang, Wenyu Liu, and Bin Xiao. 2019. Deep high-resolution representation learning for visual recognition. TPAMI 43, 10 (2019), 3349\u20133364.","journal-title":"TPAMI"},{"key":"e_1_3_2_78_2","doi-asserted-by":"publisher","DOI":"10.1109\/CVPR42600.2020.00417"},{"key":"e_1_3_2_79_2","doi-asserted-by":"publisher","DOI":"10.1007\/978-3-030-01234-2_1"},{"key":"e_1_3_2_80_2","doi-asserted-by":"publisher","DOI":"10.1007\/978-3-030-01231-1_29"},{"key":"e_1_3_2_81_2","doi-asserted-by":"publisher","DOI":"10.1007\/978-3-030-58545-7_34"},{"key":"e_1_3_2_82_2","doi-asserted-by":"publisher","DOI":"10.1109\/CVPR.2016.596"},{"key":"e_1_3_2_83_2","doi-asserted-by":"publisher","DOI":"10.1109\/ICCV.2019.00975"},{"key":"e_1_3_2_84_2","doi-asserted-by":"publisher","DOI":"10.1109\/CVPR42600.2020.00978"},{"key":"e_1_3_2_85_2","doi-asserted-by":"publisher","DOI":"10.1109\/TNNLS.2018.2876865"},{"key":"e_1_3_2_86_2","doi-asserted-by":"publisher","DOI":"10.1609\/aaai.v34i07.6999"},{"key":"e_1_3_2_87_2","doi-asserted-by":"publisher","DOI":"10.1109\/CVPR.2019.00094"},{"key":"e_1_3_2_88_2","article-title":"Recover canonical-view faces in the wild with deep neural networks","author":"Zhu Zhenyao","year":"2014","unstructured":"Zhenyao Zhu, Ping Luo, Xiaogang Wang, and Xiaoou Tang. 2014. Recover canonical-view faces in the wild with deep neural networks. arXiv preprint arXiv:1404.3543 (2014).","journal-title":"arXiv preprint arXiv:1404.3543"},{"key":"e_1_3_2_89_2","doi-asserted-by":"publisher","DOI":"10.1007\/978-3-319-10602-1_26"},{"key":"e_1_3_2_90_2","article-title":"Object detection in 20 years: A survey","author":"Zou Zhengxia","year":"2019","unstructured":"Zhengxia Zou, Zhenwei Shi, Yuhong Guo, and Jieping Ye. 2019. Object detection in 20 years: A survey. arXiv preprint arXiv:1905.05055 (2019).","journal-title":"arXiv preprint arXiv:1905.05055"}],"container-title":["ACM Transactions on Multimedia Computing, Communications, and Applications"],"original-title":[],"language":"en","link":[{"URL":"https:\/\/dl.acm.org\/doi\/10.1145\/3472623","content-type":"unspecified","content-version":"vor","intended-application":"text-mining"},{"URL":"https:\/\/dl.acm.org\/doi\/pdf\/10.1145\/3472623","content-type":"unspecified","content-version":"vor","intended-application":"similarity-checking"}],"deposited":{"date-parts":[[2025,6,17]],"date-time":"2025-06-17T20:17:24Z","timestamp":1750191444000},"score":1,"resource":{"primary":{"URL":"https:\/\/dl.acm.org\/doi\/10.1145\/3472623"}},"subtitle":[],"short-title":[],"issued":{"date-parts":[[2022,1,25]]},"references-count":89,"journal-issue":{"issue":"1s","published-print":{"date-parts":[[2022,2,28]]}},"alternative-id":["10.1145\/3472623"],"URL":"https:\/\/doi.org\/10.1145\/3472623","relation":{},"ISSN":["1551-6857","1551-6865"],"issn-type":[{"type":"print","value":"1551-6857"},{"type":"electronic","value":"1551-6865"}],"subject":[],"published":{"date-parts":[[2022,1,25]]},"assertion":[{"value":"2020-12-01","order":0,"name":"received","label":"Received","group":{"name":"publication_history","label":"Publication History"}},{"value":"2021-06-01","order":1,"name":"accepted","label":"Accepted","group":{"name":"publication_history","label":"Publication History"}},{"value":"2022-01-25","order":2,"name":"published","label":"Published","group":{"name":"publication_history","label":"Publication History"}}]}}