{"status":"ok","message-type":"work","message-version":"1.0.0","message":{"indexed":{"date-parts":[[2025,4,19]],"date-time":"2025-04-19T04:06:08Z","timestamp":1745035568033,"version":"3.40.4"},"reference-count":41,"publisher":"Springer Science and Business Media LLC","issue":"2","license":[{"start":{"date-parts":[[2025,4,1]],"date-time":"2025-04-01T00:00:00Z","timestamp":1743465600000},"content-version":"tdm","delay-in-days":0,"URL":"https:\/\/creativecommons.org\/licenses\/by\/4.0"},{"start":{"date-parts":[[2025,4,2]],"date-time":"2025-04-02T00:00:00Z","timestamp":1743552000000},"content-version":"vor","delay-in-days":1,"URL":"https:\/\/creativecommons.org\/licenses\/by\/4.0"}],"funder":[{"DOI":"10.13039\/501100012165","name":"Key Technologies Research and Development Program","doi-asserted-by":"publisher","award":["2023YFB3001801"],"award-info":[{"award-number":["2023YFB3001801"]}],"id":[{"id":"10.13039\/501100012165","id-type":"DOI","asserted-by":"publisher"}]},{"DOI":"10.13039\/501100001809","name":"National Natural Science Foundation of China","doi-asserted-by":"publisher","award":["62322201","62072018","U23B2020","U22A2028"],"award-info":[{"award-number":["62322201","62072018","U23B2020","U22A2028"]}],"id":[{"id":"10.13039\/501100001809","id-type":"DOI","asserted-by":"publisher"}]}],"content-domain":{"domain":["link.springer.com"],"crossmark-restriction":false},"short-container-title":["CCF Trans. HPC"],"published-print":{"date-parts":[[2025,4]]},"abstract":"<jats:title>Abstract<\/jats:title>\n          <jats:p>Lock-related synchronization performance issues, such as overly large critical sections and improper lock usage, are inevitable in scientific computing. Even skilled programmers struggle with the complicated reports of existing lock behavior profilers, not to mention the scientists who make up most scientific computing programmers. 
Besides, ARM-based supercomputers are emerging on the TOP500 list, while ARM-supported lock behavior profiling tools haven\u2019t received the attention they deserve. Based on a <jats:bold>\u201cone step for all\u201d<\/jats:bold> workflow comprising problem identification, problem analysis and solution generation, this paper presents an end-to-end, fine-grained lock behavior profiling tool that supports both ARM and x86 architectures. Specifically, this paper introduces a priority function that quantifies the priority of distinct solutions, and users can adjust the weights of its metrics. Compared to existing work that relies on library interception and replacement or on x86-based analysis frameworks, its fine-grained analysis, highly usable reports, high portability and strong compatibility make it an efficient tool for scientific computing programmers to find and optimize lock-related performance bugs.<\/jats:p>","DOI":"10.1007\/s42514-024-00210-1","type":"journal-article","created":{"date-parts":[[2025,4,4]],"date-time":"2025-04-04T11:07:21Z","timestamp":1743764841000},"page":"100-113","update-policy":"https:\/\/doi.org\/10.1007\/springer_crossmark_policy","source":"Crossref","is-referenced-by-count":0,"title":["SyncNOVA: an end-to-end fine-grained profiling tool oN lOck behaVior detection and critical section 
diAgnosis"],"prefix":"10.1007","volume":"7","author":[{"given":"Wentao","family":"Feng","sequence":"first","affiliation":[]},{"given":"Shizhe","family":"Shang","sequence":"additional","affiliation":[]},{"given":"Pengfei","family":"Li","sequence":"additional","affiliation":[]},{"given":"Hailong","family":"Yang","sequence":"additional","affiliation":[]},{"ORCID":"https:\/\/orcid.org\/0000-0002-7186-0556","authenticated-orcid":false,"given":"Zhongzhi","family":"Luan","sequence":"additional","affiliation":[]},{"given":"Depei","family":"Qian","sequence":"additional","affiliation":[]}],"member":"297","published-online":{"date-parts":[[2025,4,2]]},"reference":[{"issue":"6","key":"210_CR1","doi-asserted-by":"publisher","first-page":"685","DOI":"10.1002\/cpe.1553","volume":"22","author":"L Adhianto","year":"2010","unstructured":"Adhianto, L., Banerjee, S., Fagan, M., Krentel, M., Marin, G., Mellor-Crummey, J., Tallent, N.R.: HPCTOOLKIT: tools for performance analysis of optimized parallel programs. Concurr. Comput. Pract. Exp. 22(6), 685\u2013701 (2010)","journal-title":"Concurr. Comput. Pract. Exp."},{"key":"210_CR2","doi-asserted-by":"crossref","unstructured":"Ahmed, A., Liscano, R., Azim, A., Chan, Y.-K., Sundaresan, V.: Identification of Java lock contention anti-patterns based on run-time performance data. In: Proceedings of the 5th ACM\/IEEE International Conference on Automation of Software Test (AST 2024), pp. 209\u2013213 (2024)","DOI":"10.1145\/3644032.3644466"},{"key":"210_CR3","doi-asserted-by":"crossref","unstructured":"Alam, M.M.U., Liu, T., Zeng, G., Muzahid, A.: Syncperf: categorizing, detecting, and diagnosing synchronization performance bugs. In: Proceedings of the Twelfth European Conference on Computer Systems, pp. 298\u2013313 (2017)","DOI":"10.1145\/3064176.3064186"},{"key":"210_CR4","doi-asserted-by":"crossref","unstructured":"Aldinucci, M., Danelutto, M., Kilpatrick, P., Torquati, M.: Fastflow: high-level and efficient streaming on multicore. 
In: Programming Multi-Core and Many-Core Computing Systems, pp. 261\u2013280 (2017)","DOI":"10.1002\/9781119332015.ch13"},{"issue":"3","key":"210_CR5","doi-asserted-by":"publisher","first-page":"1","DOI":"10.1145\/3275443","volume":"5","author":"A Amer","year":"2019","unstructured":"Amer, A., Lu, H., Balaji, P., Chabbi, M., Wei, Y., Hammond, J., Matsuoka, S.: Lock contention management in multithreaded mpi. ACM Trans. Parallel Comput. 5(3), 1\u201321 (2019)","journal-title":"ACM Trans. Parallel Comput."},{"key":"210_CR6","doi-asserted-by":"crossref","unstructured":"Bienia, C., Kumar, S., Singh, J.P., Li, K.: The PARSEC benchmark suite: characterization and architectural implications. In: Proceedings of the 17th International Conference on Parallel Architectures and Compilation Techniques, pp. 72\u201381 (2008)","DOI":"10.1145\/1454115.1454128"},{"key":"210_CR7","unstructured":"Breshears, C.P.: Using Intel Thread Profiler for Win32 Threads: Philosophy and Theory (2007)"},{"key":"210_CR8","unstructured":"Bruening, D., Amarasinghe, S.: Efficient, transparent, and comprehensive runtime code manipulation (2004)"},{"key":"210_CR9","unstructured":"Bryant, R., Hawkes, J.: Lockmeter: highly-informative instrumentation for spin locks in the linux\u00ae kernel. In: Proceedings of the 4th Annual Linux Showcase & Conference, vol. 4, pp. 17\u201317 (2000)"},{"key":"210_CR10","doi-asserted-by":"crossref","unstructured":"Chen, G., Stenstrom, P.: Critical lock analysis: diagnosing critical section bottlenecks in multithreaded applications. In: SC\u201912: Proceedings of the International Conference on High Performance Computing, Networking, Storage and Analysis, pp. 1\u201311. IEEE (2012)","DOI":"10.1109\/SC.2012.40"},{"key":"210_CR11","doi-asserted-by":"crossref","unstructured":"Curtsinger, C., Berger, E.D.: Coz: Finding code that counts with causal profiling. In: Proceedings of the 25th Symposium on Operating Systems Principles, pp. 
184\u2013197 (2015)","DOI":"10.1145\/2815400.2815409"},{"issue":"1","key":"210_CR12","doi-asserted-by":"publisher","first-page":"46","DOI":"10.1109\/99.660313","volume":"5","author":"L Dagum","year":"1998","unstructured":"Dagum, L., Menon, R.: OpenMP: an industry standard API for shared-memory programming. IEEE Comput. Sci. Eng. 5(1), 46\u201355 (1998)","journal-title":"IEEE Comput. Sci. Eng."},{"key":"210_CR13","unstructured":"David, F.: Continuous and Efficient Lock Profiling for Java on Multicore Architectures. PhD thesis, Universit\u00e9 Pierre et Marie Curie-Paris VI (2015)"},{"key":"210_CR14","doi-asserted-by":"crossref","unstructured":"Du\u00a0Bois, K., Eyerman, S., Sartor, J.B., Eeckhout, L.: Criticality stacks: identifying critical threads in parallel programs using synchronization behavior. In: Proceedings of the 40th Annual International Symposium on Computer Architecture, pp. 511\u2013522 (2013)","DOI":"10.1145\/2485922.2485966"},{"key":"210_CR15","unstructured":"GmbH: Jprofiler: The Award-Winning All-in-One Java Profiler. https:\/\/www.ej-technologies.com\/products\/jprofiler\/overview.html"},{"key":"210_CR16","doi-asserted-by":"crossref","unstructured":"Huang, Y., Cui, Z., Chen, L., Zhang, W., Bao, Y., Chen, M.: Halock: hardware-assisted lock contention detection in multithreaded applications. In: Proceedings of the 21st International Conference on Parallel Architectures and Compilation Techniques, pp. 253\u2013262 (2012)","DOI":"10.1145\/2370816.2370854"},{"key":"210_CR17","unstructured":"Hwang, P.Z.J.: IBM Thread and Monitor Dump Analyze for Java. (2014). https:\/\/www.ibm.com\/support\/pages\/node\/1108077?mhsrc=ibmsearch_a&mhq=IBM"},{"issue":"6","key":"210_CR18","doi-asserted-by":"publisher","first-page":"77","DOI":"10.1145\/2345156.2254075","volume":"47","author":"G Jin","year":"2012","unstructured":"Jin, G., Song, L., Shi, X., Scherpelz, J., Lu, S.: Understanding and detecting real-world performance bugs. ACM SIGPLAN Not. 
47(6), 77\u201388 (2012)","journal-title":"ACM SIGPLAN Not."},{"key":"210_CR19","doi-asserted-by":"crossref","unstructured":"Li, G., Chen, D., Lu, S., Musuvathi, M., Nath, S.: Sherlock: unsupervised synchronization-operation inference. In: Proceedings of the 26th ACM International Conference on Architectural Support for Programming Languages and Operating Systems, pp. 314\u2013328 (2021)","DOI":"10.1145\/3445814.3446754"},{"issue":"2","key":"210_CR20","doi-asserted-by":"publisher","first-page":"150","DOI":"10.1007\/s42514-022-00095-y","volume":"4","author":"K Lu","year":"2022","unstructured":"Lu, K., Wang, Y., Guo, Y., Huang, C., Liu, S., Wang, R., Fang, J., Tang, T., Chen, Z., Liu, B.: MT-3000: a heterogeneous multi-zone processor for HPC. CCF Trans. High Perfor. Comput. 4(2), 150\u2013164 (2022)","journal-title":"CCF Trans. High Perfor. Comput."},{"key":"210_CR21","unstructured":"Luk, C.-K., Cohn, R.S., Muth, R., Patil, H., Klauser, A., Lowney, G., Wallace, S., Reddi, V.J., Hazelwood, K.M.: Pin: building customized program analysis tools with dynamic instrumentation. In: ACM-SIGPLAN Symposium on Programming Language Design and Implementation (2005). https:\/\/api.semanticscholar.org\/CorpusID:6719639"},{"key":"210_CR22","doi-asserted-by":"crossref","unstructured":"Nair, R., Field, T.: Gapp: a fast profiler for detecting serialization bottlenecks in parallel Linux applications. In: Proceedings of the ACM\/SPEC International Conference on Performance Engineering, pp. 257\u2013264 (2020)","DOI":"10.1145\/3358960.3379136"},{"key":"210_CR23","unstructured":"Patel C.: NVIDIA Expands Support for Arm with HPC, AI, Visualization Containers on NGC. (2019). https:\/\/blogs.nvidia.com\/blog\/ngc-containers-arm\/"},{"key":"210_CR24","unstructured":"Oracle: Solaris Studio 12.2: Performance Analyzer. (2011). https:\/\/docs.oracle.com\/cd\/E18659_01\/pdf\/821-1379.pdf"},{"key":"210_CR25","unstructured":"Oracle, H.: A Heap\/CPU Profiling Tool (2017). 
https:\/\/docs.oracle.com\/javase\/8\/docs\/technotes\/samples\/hprof.html"},{"issue":"4","key":"210_CR26","first-page":"298","volume":"23","author":"C Pheatt","year":"2008","unstructured":"Pheatt, C.: Intel\u00ae threading building blocks. J. Comput. Sci. Coll. 23(4), 298\u2013298 (2008)","journal-title":"J. Comput. Sci. Coll."},{"key":"210_CR27","unstructured":"Xinhua.: Prototype of China\u2019s Tianhe-3 Complete. (2018). http:\/\/english.www.gov.cn\/news\/photos\/2018\/07\/27\/content_281476238364702.htm"},{"key":"210_CR28","unstructured":"Rapp, J.: \u201cDiagnosing lock contention with the concurrency visualizer\u201d. (2010). https:\/\/learn.microsoft.com\/en-us\/archive\/blogs\/visualizeparallel\/diagnosing-lock-contention-with-the-concurrency-visualizer"},{"key":"210_CR29","doi-asserted-by":"crossref","unstructured":"Rezazadeh, M., Ezzati-Jivan, N., Galea, E., Dagenais, M.R.: Multi-level execution trace based lock contention analysis. In: 2020 IEEE International Symposium on Software Reliability Engineering Workshops (ISSREW), pp. 177\u2013182. IEEE (2020)","DOI":"10.1109\/ISSREW51248.2020.00068"},{"key":"210_CR30","doi-asserted-by":"crossref","unstructured":"Roy, P., Liu, X.: StructSlim: a lightweight profiler to guide structure splitting. In: Proceedings of the 2016 International Symposium on Code Generation and Optimization, pp. 36\u201346 (2016)","DOI":"10.1145\/2854038.2854053"},{"key":"210_CR31","doi-asserted-by":"crossref","unstructured":"Tallent, N.R., Mellor-Crummey, J.M., Porterfield, A.: Analyzing lock contention in multithreaded applications. In: Proceedings of the 15th ACM SIGPLAN Symposium on Principles and Practice of Parallel Programming, pp. 269\u2013280 (2010)","DOI":"10.1145\/1693453.1693489"},{"key":"210_CR32","unstructured":"Strohmaier E., Dongarra J., Simon HD., Meuer M.: Top 500 list. (2022). 
https:\/\/www.top500.org"},{"key":"210_CR33","doi-asserted-by":"crossref","unstructured":"Wen, S., Liu, X., Byrne, J., Chabbi, M.: Watching for software inefficiencies with witch. In: Proceedings of the Twenty-Third International Conference on Architectural Support for Programming Languages and Operating Systems, pp. 332\u2013347 (2018)","DOI":"10.1145\/3173162.3177159"},{"key":"210_CR34","doi-asserted-by":"crossref","unstructured":"Weng, L., Hu, Y., Huang, P., Nieh, J., Yang, J.: Effective performance issue diagnosis with value-assisted cost profiling. In: Proceedings of the Eighteenth European Conference on Computer Systems, pp. 1\u201317 (2023)","DOI":"10.1145\/3552326.3587444"},{"key":"210_CR35","doi-asserted-by":"crossref","unstructured":"You, X., Yang, H., Luan, Z., Qian, D., Liu, X.: ZeroSpy: exploring software inefficiency with redundant zeros. In: SC20: International Conference for High Performance Computing, Networking, Storage and Analysis, pp. 1\u201314. IEEE (2020)","DOI":"10.1109\/SC41405.2020.00033"},{"key":"210_CR36","doi-asserted-by":"crossref","unstructured":"You, X., Yang, H., Lei, K., Luan, Z., Qian, D.: Vclinic: A portable and efficient framework for fine-grained value profilers. In: Proceedings of the 28th ACM International Conference on Architectural Support for Programming Languages and Operating Systems, Volume 2, pp. 892\u2013904 (2023)","DOI":"10.1145\/3575693.3576934"},{"key":"210_CR37","doi-asserted-by":"crossref","unstructured":"Yu, T., Pradel, M.: Syncprof: detecting, localizing, and optimizing synchronization bottlenecks. In: Proceedings of the 25th International Symposium on Software Testing and Analysis, pp. 
389\u2013400 (2016)","DOI":"10.1145\/2931037.2931070"},{"issue":"11","key":"210_CR38","doi-asserted-by":"publisher","first-page":"2489","DOI":"10.1109\/TPDS.2018.2840992","volume":"29","author":"C Yu","year":"2018","unstructured":"Yu, C., Roy, P., Bai, Y., Yang, H., Liu, X.: LWPTool: a lightweight profiler to guide data layout optimization. IEEE Trans. Parallel Distrib. Syst. 29(11), 2489\u20132502 (2018)","journal-title":"IEEE Trans. Parallel Distrib. Syst."},{"key":"210_CR39","doi-asserted-by":"crossref","unstructured":"Zhao, Q., Liu, X., Chabbi, M.: DrCCTProf: a fine-grained call path profiler for arm-based clusters. In: SC20: International Conference for High Performance Computing, Networking, Storage and Analysis, pp. 1\u201316. IEEE (2020)","DOI":"10.1109\/SC41405.2020.00034"},{"key":"210_CR40","doi-asserted-by":"crossref","unstructured":"Zheng, L., Liao, X., He, B., Wu, S., Jin, H.: On performance debugging of unnecessary lock contentions on multicore processors: a replay-based approach. In: 2015 IEEE\/ACM International Symposium on Code Generation and Optimization (CGO), pp. 56\u201367 (2015). IEEE","DOI":"10.1109\/CGO.2015.7054187"},{"key":"210_CR41","unstructured":"Zhou, F., Gan, Y., Ma, S., Wang, Y.: {wPerf}: Generic {Off-CPU} analysis to identify bottleneck waiting events. In: 13th USENIX Symposium on Operating Systems Design and Implementation (OSDI 18), pp. 
527\u2013543 (2018)"}],"container-title":["CCF Transactions on High Performance Computing"],"original-title":[],"language":"en","link":[{"URL":"https:\/\/link.springer.com\/content\/pdf\/10.1007\/s42514-024-00210-1.pdf","content-type":"application\/pdf","content-version":"vor","intended-application":"text-mining"},{"URL":"https:\/\/link.springer.com\/article\/10.1007\/s42514-024-00210-1\/fulltext.html","content-type":"text\/html","content-version":"vor","intended-application":"text-mining"},{"URL":"https:\/\/link.springer.com\/content\/pdf\/10.1007\/s42514-024-00210-1.pdf","content-type":"application\/pdf","content-version":"vor","intended-application":"similarity-checking"}],"deposited":{"date-parts":[[2025,4,18]],"date-time":"2025-04-18T09:54:57Z","timestamp":1744970097000},"score":1,"resource":{"primary":{"URL":"https:\/\/link.springer.com\/10.1007\/s42514-024-00210-1"}},"subtitle":[],"short-title":[],"issued":{"date-parts":[[2025,4]]},"references-count":41,"journal-issue":{"issue":"2","published-print":{"date-parts":[[2025,4]]}},"alternative-id":["210"],"URL":"https:\/\/doi.org\/10.1007\/s42514-024-00210-1","relation":{},"ISSN":["2524-4922","2524-4930"],"issn-type":[{"type":"print","value":"2524-4922"},{"type":"electronic","value":"2524-4930"}],"subject":[],"published":{"date-parts":[[2025,4]]},"assertion":[{"value":"8 August 2024","order":1,"name":"received","label":"Received","group":{"name":"ArticleHistory","label":"Article History"}},{"value":"22 November 2024","order":2,"name":"accepted","label":"Accepted","group":{"name":"ArticleHistory","label":"Article History"}},{"value":"2 April 2025","order":3,"name":"first_online","label":"First Online","group":{"name":"ArticleHistory","label":"Article History"}},{"order":1,"name":"Ethics","group":{"name":"EthicsHeading","label":"Declarations"}},{"value":"On behalf of all authors, the corresponding author states that there is no conflict of interest.","order":2,"name":"Ethics","group":{"name":"EthicsHeading","label":"Conflict of interest"}}]}}