{"status":"ok","message-type":"work","message-version":"1.0.0","message":{"indexed":{"date-parts":[[2025,11,6]],"date-time":"2025-11-06T06:16:15Z","timestamp":1762409775215,"version":"build-2065373602"},"reference-count":28,"publisher":"MDPI AG","issue":"15","license":[{"start":{"date-parts":[[2020,8,2]],"date-time":"2020-08-02T00:00:00Z","timestamp":1596326400000},"content-version":"vor","delay-in-days":0,"URL":"https:\/\/creativecommons.org\/licenses\/by\/4.0\/"}],"funder":[{"DOI":"10.13039\/501100012166","name":"National Key Research and Development Program of China","doi-asserted-by":"publisher","award":["2017YFF0205004","2018YFC0114902-1,"],"award-info":[{"award-number":["2017YFF0205004","2018YFC0114902-1,"]}],"id":[{"id":"10.13039\/501100012166","id-type":"DOI","asserted-by":"publisher"}]},{"DOI":"10.13039\/501100001809","name":"National Natural Science Foundation of China","doi-asserted-by":"publisher","award":["11474259"],"award-info":[{"award-number":["11474259"]}],"id":[{"id":"10.13039\/501100001809","id-type":"DOI","asserted-by":"publisher"}]},{"name":"Zhejiang Provincial Education Department Research Grant Program","award":["Y201942513"],"award-info":[{"award-number":["Y201942513"]}]}],"content-domain":{"domain":[],"crossmark-restriction":false},"short-container-title":["Sensors"],"abstract":"<jats:p>Most sound imaging instruments are currently used as measurement tools which can provide quantitative data, however, a uniform method to directly and comprehensively evaluate the results of combining acoustic and optical images is not available. Therefore, in this study, we define a localization error index for sound imaging instruments, and propose an acoustic phase cloud map evaluation method based on an improved YOLOv4 algorithm to directly and objectively evaluate the sound source localization results of a sound imaging instrument. 
The evaluation method begins with image augmentation of the acoustic phase cloud maps obtained from different tests of a sound imaging instrument, producing the dataset required to train the convolutional network. Subsequently, we combine DenseNet with existing clustering algorithms to improve the YOLOv4 algorithm, making feature extraction easier when training the neural network. The trained neural network is then used to localize the target sound source and its pseudo-color map in the acoustic phase cloud map to obtain a pixel-level localization error. Finally, a standard chessboard grid is used to obtain the proportional relationship between the size of the acoustic phase cloud map and the actual physical distance, from which the true lateral and longitudinal positioning errors of the sound imaging instrument are obtained. Experimental results show that the mean average precision of the improved YOLOv4 algorithm in acoustic phase cloud map detection is 96.3%, the F1-score is 95.2%, and the detection speed is up to 34.6 fps. 
The improved algorithm can rapidly and accurately determine the positioning error of sound imaging instrument, which can be used to analyze and evaluate the positioning performance of sound imaging instrument.<\/jats:p>","DOI":"10.3390\/s20154314","type":"journal-article","created":{"date-parts":[[2020,8,3]],"date-time":"2020-08-03T06:16:47Z","timestamp":1596435407000},"page":"4314","update-policy":"https:\/\/doi.org\/10.3390\/mdpi_crossmark_policy","source":"Crossref","is-referenced-by-count":30,"title":["Study on the Evaluation Method of Sound Phase Cloud Maps Based on an Improved YOLOv4 Algorithm"],"prefix":"10.3390","volume":"20","author":[{"ORCID":"https:\/\/orcid.org\/0000-0001-8753-1637","authenticated-orcid":false,"given":"Qinfeng","family":"Zhu","sequence":"first","affiliation":[{"name":"Key Laboratory of Acoustics Research, China Jiliang University, Hangzhou 310018, China"}],"role":[{"role":"author","vocabulary":"crossref"}]},{"given":"Huifeng","family":"Zheng","sequence":"additional","affiliation":[{"name":"Key Laboratory of Acoustics Research, China Jiliang University, Hangzhou 310018, China"}],"role":[{"role":"author","vocabulary":"crossref"}]},{"given":"Yuebing","family":"Wang","sequence":"additional","affiliation":[{"name":"Key Laboratory of Acoustics Research, China Jiliang University, Hangzhou 310018, China"}],"role":[{"role":"author","vocabulary":"crossref"}]},{"given":"Yonggang","family":"Cao","sequence":"additional","affiliation":[{"name":"Key Laboratory of Acoustics Research, China Jiliang University, Hangzhou 310018, China"}],"role":[{"role":"author","vocabulary":"crossref"}]},{"given":"Shixu","family":"Guo","sequence":"additional","affiliation":[{"name":"Key Laboratory of Acoustics Research, China Jiliang University, Hangzhou 310018, China"}],"role":[{"role":"author","vocabulary":"crossref"}]}],"member":"1968","published-online":{"date-parts":[[2020,8,2]]},"reference":[{"key":"ref_1","unstructured":"Brandstein, M., and Silverman, H. 
(1997, January 21\u201324). A robust method for speech signal time-delay estimation in reverberant rooms. Proceedings of the 1997 IEEE International Conference on Acoustics, Speech, and Signal Processing, Washington, DC, USA."},{"key":"ref_2","first-page":"2286","article-title":"Sound Source Localization and Separation in Near Field","volume":"83","author":"Asono","year":"2000","journal-title":"Ieice Trans. Fundam. Electron. Commun. Comput. Sci."},{"key":"ref_3","first-page":"219","article-title":"An Acoustic Field Visualization System Based on Virtual Instruments","volume":"42","author":"Li","year":"2006","journal-title":"Comput. Eng. Appl."},{"key":"ref_4","unstructured":"Johnson, D.H., and Dudgeon, D.E. (1993). Array Signal Processing: Concepts and Techniques, Prentice Hall."},{"key":"ref_5","doi-asserted-by":"crossref","unstructured":"Brandstein, M.S., and Ward, D.B. (2001). Microphone Arrays: Signal Processing Techniques and Applications, Springer.","DOI":"10.1007\/978-3-662-04619-7"},{"key":"ref_6","doi-asserted-by":"crossref","first-page":"3271","DOI":"10.1121\/1.410949","article-title":"Free-field calibration and characterization of microphone systems","volume":"96","author":"Nedzelnitsky","year":"1994","journal-title":"J. Acoust. Soc. Am."},{"key":"ref_7","doi-asserted-by":"crossref","first-page":"168","DOI":"10.1088\/0026-1394\/45\/2\/006","article-title":"The influence of positional uncertainty in free-field microphone calibration","volume":"45","author":"Molares","year":"2008","journal-title":"Metrologia"},{"key":"ref_8","unstructured":"Bojarski, M., Del Testa, D., Dworakowski, D., Firner, B., Flepp, B., Goyal, P., Jackel, L.D., Monfort, M., M\u00fcller, U., and Zhang, J. (2016). End to End Learning for Self-Driving Cars. 
arXiv."},{"key":"ref_9","doi-asserted-by":"crossref","first-page":"1137","DOI":"10.1109\/TPAMI.2016.2577031","article-title":"Faster R-CNN: Towards Real-Time Object Detection with Region Proposal Networks","volume":"39","author":"Ren","year":"2015","journal-title":"IEEE Trans. Pattern Anal. Machine Intell."},{"key":"ref_10","unstructured":"Bochkovskiy, A., Wang, C.Y., and Liao, H.Y.M. (2020). YOLOv4: Optimal Speed and Accuracy of Object Detection. arXiv."},{"key":"ref_11","unstructured":"Zhang, Z. (1999, January 20\u201327). Flexible camera calibration by viewing a plane from unknown orientations. Proceedings of the 7th IEEE International Conference on Computer Vision, Kerkyra, Greece."},{"key":"ref_12","unstructured":"Redmon, J., and Farhadi, A. (2018). YOLOv3: An Incremental Improvement. arXiv."},{"key":"ref_13","doi-asserted-by":"crossref","unstructured":"Redmon, J., and Farhadi, A. (2017, January 21\u201326). YOLO9000: Better, Faster, Stronger. Proceedings of the 2017 IEEE Conference on Computer Vision and Pattern Recognition (CVPR), Honolulu, HI, USA.","DOI":"10.1109\/CVPR.2017.690"},{"key":"ref_14","doi-asserted-by":"crossref","unstructured":"Redmon, J., Divvala, S., Girshick, R., and Farhadi, A. (July, January 26). You Only Look Once: Unified, Real-Time Object Detection. Proceedings of the 2016 IEEE Conference on Computer Vision and Pattern Recognition (CVPR), Las Vegas, NV, USA.","DOI":"10.1109\/CVPR.2016.91"},{"key":"ref_15","unstructured":"David, A., and Vassilvitskii, S. (2007, January 7\u20139). K-Means++: The Advantages of Careful Seeding. Proceedings of the Eighteenth Annual ACM-SIAM Symposium on Discrete Algorithms, SODA 2007, New Orleans, LA, USA."},{"key":"ref_16","first-page":"4700","article-title":"Densely Connected Convolutional Networks","volume":"2","author":"Huang","year":"2017","journal-title":"Proc. IEEE Conf. Comput. Vis. 
Pattern Recognit."},{"key":"ref_17","doi-asserted-by":"crossref","first-page":"1904","DOI":"10.1109\/TPAMI.2015.2389824","article-title":"Spatial pyramid pooling in deep convolutional networks for visual recognition","volume":"37","author":"He","year":"2015","journal-title":"IEEE Trans. Pattern Anal. Mach. Intell."},{"key":"ref_18","unstructured":"Vinod, N., and Hinton, G.E. (2010, January 21\u201324). Rectified linear units improve restricted boltzmann machines. Proceedings of the International Conference on Machine Learning (ICML), Haifa, Israel."},{"key":"ref_19","doi-asserted-by":"crossref","unstructured":"Wang, C.-Y., Liao, H.-Y.M., Wu, Y.-H., Chen, P.-Y., Hsieh, J.-W., and Yeh, I.-H. (2020, January 14\u201319). CSPNet: A new backbone that can enhance learning capability of CNN. Proceedings of the IEEE\/CVF Conference on Computer Vision and Pattern Recognition Workshop (CVPR 2020), Seattle, WA, USA.","DOI":"10.1109\/CVPRW50498.2020.00203"},{"key":"ref_20","doi-asserted-by":"crossref","unstructured":"He, K., Zhang, X., Ren, S., and Sun, J. (July, January 26). Deep Residual Learning for Image Recognition. Proceedings of the 2016 IEEE\/CVF Conference on Computer Vision and Pattern Recognition (CVPR 2016), Las Vegas, NV, USA.","DOI":"10.1109\/CVPR.2016.90"},{"key":"ref_21","doi-asserted-by":"crossref","unstructured":"Liu, S., Qi, L., Qin, H., Shi, J., and Jia, J. (2018, January 18\u201323). Path Aggregation Network for Instance Segmentation. Proceedings of the 2018 IEEE\/CVF Conference on Computer Vision and Pattern Recognition (CVPR 2018), Salt Lake City, UT, USA.","DOI":"10.1109\/CVPR.2018.00913"},{"key":"ref_22","doi-asserted-by":"crossref","unstructured":"Lin, T.-Y., Dollar, P., Girshick, R., He, K., Hariharan, B., and Belongie, S. (2017, January 21\u201326). Feature Pyramid Networks for Object Detection. 
Proceedings of the 2017 IEEE\/CVF Conference on Computer Vision and Pattern Recognition (CVPR 2017); Institute of Electrical and Electronics Engineers (IEEE), Honolulu, HI, USA.","DOI":"10.1109\/CVPR.2017.106"},{"key":"ref_23","unstructured":"Misra, D. (2019). Mish: A Self Regularized Non-Monotonic Neural Activation Function. arXiv."},{"key":"ref_24","doi-asserted-by":"crossref","unstructured":"Lin, T.-Y., Goyal, P., Girshick, R., He, K., and Dollar, P. (2017, January 22\u201329). Focal Loss for Dense Object Detection. Proceedings of the 2017 IEEE International Conference on Computer Vision (ICCV), Venice, Italy.","DOI":"10.1109\/ICCV.2017.324"},{"key":"ref_25","doi-asserted-by":"crossref","unstructured":"Zheng, Z., Wang, P., Liu, W., Li, J., Ye, R., and Ren, D. (2020, January 7\u201312). Distance-IoU Loss: Faster and Better Learning for Bounding Box Regression. Proceedings of the AAAI Conference on Artificial Intelligence, New York, NY, USA.","DOI":"10.1609\/aaai.v34i07.6999"},{"key":"ref_26","doi-asserted-by":"crossref","unstructured":"Yu, J., Jiang, Y., Wang, Z., Cao, Z., and Huang, T. (2016, January 15\u201319). UnitBox: An Advanced Object Detection Network. Proceedings of the 24th ACM International Conference on Multimedia, Amsterdam, The Netherlands.","DOI":"10.1145\/2964284.2967274"},{"key":"ref_27","doi-asserted-by":"crossref","unstructured":"Szegedy, C., Vanhoucke, V., Ioffe, S., Shlens, J., and Wojna, Z. (July, January 26). Rethinking the Inception Architecture for Computer Vision. Proceedings of the 2016 IEEE\/CVF Conference on Computer Vision and Pattern Recognition (CVPR 2016), Las Vegas, NV, USA.","DOI":"10.1109\/CVPR.2016.308"},{"key":"ref_28","doi-asserted-by":"crossref","unstructured":"Liu, W., Anguelov, D., Erhan, D., Szegedy, C., Reed, S., Fu, C.-Y., and Berg, A.C. (2016, January 8\u201316). SSD: Single shot multibox detector. 
Proceedings of the European Conference on Computer Vision (ECCV), Amsterdam, The Netherlands.","DOI":"10.1007\/978-3-319-46448-0_2"}],"container-title":["Sensors"],"original-title":[],"language":"en","link":[{"URL":"https:\/\/www.mdpi.com\/1424-8220\/20\/15\/4314\/pdf","content-type":"unspecified","content-version":"vor","intended-application":"similarity-checking"}],"deposited":{"date-parts":[[2025,10,11]],"date-time":"2025-10-11T09:53:47Z","timestamp":1760176427000},"score":1,"resource":{"primary":{"URL":"https:\/\/www.mdpi.com\/1424-8220\/20\/15\/4314"}},"subtitle":[],"short-title":[],"issued":{"date-parts":[[2020,8,2]]},"references-count":28,"journal-issue":{"issue":"15","published-online":{"date-parts":[[2020,8]]}},"alternative-id":["s20154314"],"URL":"https:\/\/doi.org\/10.3390\/s20154314","relation":{},"ISSN":["1424-8220"],"issn-type":[{"type":"electronic","value":"1424-8220"}],"subject":[],"published":{"date-parts":[[2020,8,2]]}}}