{"status":"ok","message-type":"work","message-version":"1.0.0","message":{"indexed":{"date-parts":[[2025,6,19]],"date-time":"2025-06-19T04:20:35Z","timestamp":1750306835581,"version":"3.41.0"},"reference-count":44,"publisher":"Association for Computing Machinery (ACM)","issue":"4","license":[{"start":{"date-parts":[[2013,12,1]],"date-time":"2013-12-01T00:00:00Z","timestamp":1385856000000},"content-version":"vor","delay-in-days":0,"URL":"https:\/\/www.acm.org\/publications\/policies\/copyright_policy#Background"}],"funder":[{"DOI":"10.13039\/100000144","name":"Division of Computer and Network Systems","doi-asserted-by":"publisher","award":["CNS-0834664"],"award-info":[{"award-number":["CNS-0834664"]}],"id":[{"id":"10.13039\/100000144","id-type":"DOI","asserted-by":"publisher"}]}],"content-domain":{"domain":["dl.acm.org"],"crossmark-restriction":true},"short-container-title":["ACM Trans. Archit. Code Optim."],"published-print":{"date-parts":[[2013,12]]},"abstract":"<jats:p>Due to the importance of reliability and security, prior studies have proposed inlining metafunctions into applications for detecting bugs and security vulnerabilities. However, because these software techniques add frequent, fine-grained instrumentation to programs, they often incur large runtime overheads. In this work, we consider an automatic thread extraction technique for removing these fine-grained checks from a main application and scheduling them on helper threads. In this way, we can leverage the resources available on a CMP to reduce the latency and overhead of fine-grained checking codes.<\/jats:p>\n          <jats:p>\n            Our parallelization strategy extracts metafunctions from a single threaded application and executes them in\n            <jats:italic>customized<\/jats:italic>\n            helper threads\u2014threads constructed to mirror relevant fragments of the main program\u2019s behavior in order to keep communication and overhead low. To get good performance, we consider optimizations that reduce communication and balance work among many threads.\n          <\/jats:p>\n          <jats:p>We evaluate our parallelization strategy on Mudflap, a pointer-use checking tool in GCC. To show the benefits of our technique, we compare it to a manually parallelized version of Mudflap. We run our experiments on an architectural simulator with support for fast queueing operations. On a subset of SPECint 2000, our automatically parallelized code using static load balance is only 19% slower, on average, than the manually parallelized version on a simulated eight-core system. In addition, our automatically parallelized code using dynamic load balance is competitive, on average, to the manually parallelized version on a simulated eight-core system. Furthermore, all the applications except parser achieve better speedups with our automatic algorithms than with the manual approach. Also, our approach introduces very little overhead in the main program\u2014it is kept under 100%, which is more than a 5.3\u00d7 reduction compared to serial Mudflap.<\/jats:p>","DOI":"10.1145\/2541228.2541237","type":"journal-article","created":{"date-parts":[[2014,1,2]],"date-time":"2014-01-02T13:09:43Z","timestamp":1388668183000},"page":"1-26","update-policy":"https:\/\/doi.org\/10.1145\/crossmark-policy","source":"Crossref","is-referenced-by-count":1,"title":["Automatic parallelization of fine-grained metafunctions on a chip multiprocessor"],"prefix":"10.1145","volume":"10","author":[{"given":"Sanghoon","family":"Lee","sequence":"first","affiliation":[{"name":"North Carolina State University Department of Electrical and Computer Engineering, Raleigh, NC"}]},{"given":"James","family":"Tuck","sequence":"additional","affiliation":[{"name":"North Carolina State University Department of Electrical and Computer Engineering, Raleigh, NC"}]}],"member":"320","published-online":{"date-parts":[[2013,12]]},"reference":[{"key":"e_1_2_1_1_1","doi-asserted-by":"publisher","DOI":"10.1145\/178243.178446"},{"key":"e_1_2_1_2_1","doi-asserted-by":"publisher","DOI":"10.1145\/512529.512560"},{"key":"e_1_2_1_3_1","doi-asserted-by":"publisher","DOI":"10.1109\/HPCA.2005.18"},{"volume-title":"Proceedings of the 7th USENIX Security Symposium. 63--78","author":"Cowan C.","key":"e_1_2_1_4_1","unstructured":"Cowan , C. , Pu , C. , Maier , D. , Walpole , J. , Bakke , P. , Beattie , S. , Grier , A. , Wagle , P. , Zhang , Q. , and Hinton , H . 1998. StackGuard: Automatic adaptive detection and prevention of buffer-overflow attacks . In Proceedings of the 7th USENIX Security Symposium. 63--78 . Cowan, C., Pu, C., Maier, D., Walpole, J., Bakke, P., Beattie, S., Grier, A., Wagle, P., Zhang, Q., and Hinton, H. 1998. StackGuard: Automatic adaptive detection and prevention of buffer-overflow attacks. In Proceedings of the 7th USENIX Security Symposium. 63--78."},{"key":"e_1_2_1_5_1","volume-title":"Proceedings of the GCC Developers Summit.","author":"Eigler F.","year":"2003","unstructured":"Eigler , F. 2003 . Mudflap: Pointer use checking for C\/C&plus;&plus; . In Proceedings of the GCC Developers Summit. Eigler, F. 2003. Mudflap: Pointer use checking for C\/C&plus;&plus;. In Proceedings of the GCC Developers Summit."},{"key":"e_1_2_1_6_1","doi-asserted-by":"publisher","DOI":"10.1145\/945445.945468"},{"key":"e_1_2_1_7_1","doi-asserted-by":"publisher","DOI":"10.1145\/24039.24041"},{"key":"e_1_2_1_8_1","doi-asserted-by":"publisher","DOI":"10.1145\/512529.512539"},{"key":"e_1_2_1_9_1","doi-asserted-by":"publisher","DOI":"10.1145\/581339.581377"},{"key":"e_1_2_1_10_1","doi-asserted-by":"publisher","DOI":"10.1145\/2259016.2259034"},{"key":"e_1_2_1_11_1","unstructured":"Jablin T. Zhang Y. Jablin J. Huang J. Kim H. and August D. 2010. Liberty queues for epic architectures. In EPIC-8.  Jablin T. Zhang Y. Jablin J. Huang J. Kim H. and August D. 2010. Liberty queues for epic architectures. In EPIC-8."},{"key":"e_1_2_1_12_1","unstructured":"KAI-Intel Corporation. 2004. Intel Thread Checker. http:\/\/developer.intel.com\/software\/products\/threading\/tcwin.  KAI-Intel Corporation. 2004. Intel Thread Checker. http:\/\/developer.intel.com\/software\/products\/threading\/tcwin."},{"key":"e_1_2_1_13_1","doi-asserted-by":"publisher","DOI":"10.1109\/CGO.2009.18"},{"key":"e_1_2_1_14_1","doi-asserted-by":"publisher","DOI":"10.1145\/1168857.1168884"},{"volume-title":"Proceedings of the 17th International Symposium on High-Performance Computer Architecture.","author":"Lee S.","key":"e_1_2_1_15_1","unstructured":"Lee , S. , Tiwari , D. , Solihin , Y. , and Tuck , J . 2011. HAQu: Hardware Accelerated Queueing for fine-grained threading on a chip multiprocessor . In Proceedings of the 17th International Symposium on High-Performance Computer Architecture. Lee, S., Tiwari, D., Solihin, Y., and Tuck, J. 2011. HAQu: Hardware Accelerated Queueing for fine-grained threading on a chip multiprocessor. In Proceedings of the 17th International Symposium on High-Performance Computer Architecture."},{"key":"e_1_2_1_16_1","volume-title":"Lecture Notes in Computer Science","volume":"2029","author":"Loginov A.","unstructured":"Loginov , A. , Yong , S. H. , Horwitz , S. , and Reps , T . 2001. Debugging via run-time type checking . Lecture Notes in Computer Science , vol. 2029 . Loginov, A., Yong, S. H., Horwitz, S., and Reps, T. 2001. Debugging via run-time type checking. Lecture Notes in Computer Science, vol. 2029."},{"key":"e_1_2_1_17_1","doi-asserted-by":"publisher","DOI":"10.1145\/379240.379250"},{"key":"e_1_2_1_18_1","doi-asserted-by":"publisher","DOI":"10.5555\/1060289.1060297"},{"key":"e_1_2_1_19_1","doi-asserted-by":"publisher","DOI":"10.1145\/1542476.1542504"},{"key":"e_1_2_1_20_1","doi-asserted-by":"publisher","DOI":"10.1145\/1837855.1806657"},{"key":"e_1_2_1_21_1","doi-asserted-by":"publisher","DOI":"10.1145\/503272.503286"},{"volume-title":"Proceedings of the 12th Annual Network and Distributed System Security Symposium (NDSS).","author":"Newsome J.","key":"e_1_2_1_22_1","unstructured":"Newsome , J. and Song , D . 2005. Dynamic taint analysis for automatic detection, analysis, and signature generation of exploits on commodity software . In Proceedings of the 12th Annual Network and Distributed System Security Symposium (NDSS). Newsome, J. and Song, D. 2005. Dynamic taint analysis for automatic detection, analysis, and signature generation of exploits on commodity software. In Proceedings of the 12th Annual Network and Distributed System Security Symposium (NDSS)."},{"key":"e_1_2_1_23_1","doi-asserted-by":"publisher","DOI":"10.1145\/1353536.1346321"},{"key":"e_1_2_1_24_1","doi-asserted-by":"publisher","DOI":"10.1145\/605397.605417"},{"key":"e_1_2_1_25_1","doi-asserted-by":"publisher","DOI":"10.5555\/250900.250910"},{"volume-title":"Proceedings of the 2nd International Workshop on Automated and Algorithmic Debugging.","author":"Patil H.","key":"e_1_2_1_26_1","unstructured":"Patil , H. and Fischer , C. N . 1995. Efficient run-time monitoring using shadow processing . In Proceedings of the 2nd International Workshop on Automated and Algorithmic Debugging. Patil, H. and Fischer, C. N. 1995. Efficient run-time monitoring using shadow processing. In Proceedings of the 2nd International Workshop on Automated and Algorithmic Debugging."},{"key":"e_1_2_1_27_1","doi-asserted-by":"publisher","DOI":"10.1145\/362422.362468"},{"key":"e_1_2_1_28_1","doi-asserted-by":"publisher","DOI":"10.1109\/HPCA.2005.29"},{"key":"e_1_2_1_29_1","unstructured":"Rational Software. 2006. Purify. http:\/\/www.rational.com\/products\/purify_unix\/index.jsp.  Rational Software. 2006. Purify. http:\/\/www.rational.com\/products\/purify_unix\/index.jsp."},{"key":"e_1_2_1_30_1","volume-title":"et al","author":"Renau J.","year":"2004","unstructured":"Renau , J. et al . 2004 . SESC. http:\/\/sesc.sourceforge.net. Renau, J. et al. 2004. SESC. http:\/\/sesc.sourceforge.net."},{"volume-title":"Proceedings of the 7th International Symposium on High Performance Computer Architecture.","author":"Roth A.","key":"e_1_2_1_31_1","unstructured":"Roth , A. and Sohi , G . 2001. Speculative data-driven multithreading . In Proceedings of the 7th International Symposium on High Performance Computer Architecture. Roth, A. and Sohi, G. 2001. Speculative data-driven multithreading. In Proceedings of the 7th International Symposium on High Performance Computer Architecture."},{"key":"e_1_2_1_32_1","doi-asserted-by":"publisher","DOI":"10.1145\/265924.265927"},{"key":"e_1_2_1_33_1","unstructured":"Seward J. 2004. Valgrind an open-source memory debugger for x86-GNU\/Linux. http:\/\/valgrind.kde.org.  Seward J. 2004. Valgrind an open-source memory debugger for x86-GNU\/Linux. http:\/\/valgrind.kde.org."},{"volume-title":"Proceedings of the Annual Conference on USENIX Annual Technical Conference. USENIX Association","author":"Seward J.","key":"e_1_2_1_34_1","unstructured":"Seward , J. and Nethercote , N . 2005. Using Valgrind to detect undefined value errors with bit-precision . In Proceedings of the Annual Conference on USENIX Annual Technical Conference. USENIX Association , Berkeley, CA, 2--2. Seward, J. and Nethercote, N. 2005. Using Valgrind to detect undefined value errors with bit-precision. In Proceedings of the Annual Conference on USENIX Annual Technical Conference. USENIX Association, Berkeley, CA, 2--2."},{"key":"e_1_2_1_35_1","doi-asserted-by":"publisher","DOI":"10.1147\/rd.502.0261"},{"volume-title":"Proceedings of IBM Watson Conference on Interaction between Architecture, Circuits, and Compilers (P=ac2).","author":"Shetty R.","key":"e_1_2_1_36_1","unstructured":"Shetty , R. , Kharbutli , M. , Solihin , Y. , and Prvulovic , M . 2004. HeapMon: A low overhead, automatic, and programmable memory bug detector . In Proceedings of IBM Watson Conference on Interaction between Architecture, Circuits, and Compilers (P=ac2). Shetty, R., Kharbutli, M., Solihin, Y., and Prvulovic, M. 2004. HeapMon: A low overhead, automatic, and programmable memory bug detector. In Proceedings of IBM Watson Conference on Interaction between Architecture, Circuits, and Compilers (P=ac2)."},{"volume-title":"Proceedings of the Conference on Correct Hardware Design and Verification Methods.","author":"Stern U.","key":"e_1_2_1_37_1","unstructured":"Stern , U. and Dill , D. L . 1995. Automatic verification of the SCI cache coherence protocol . In Proceedings of the Conference on Correct Hardware Design and Verification Methods. Stern, U. and Dill, D. L. 1995. Automatic verification of the SCI cache coherence protocol. In Proceedings of the Conference on Correct Hardware Design and Verification Methods."},{"key":"e_1_2_1_38_1","unstructured":"Team T. G. 2008. GNU Compiler Collection. http:\/\/gcc.gnu.org.  Team T. G. 2008. GNU Compiler Collection. http:\/\/gcc.gnu.org."},{"volume-title":"Proceedings of the 24th International Parallel and Distributed Processing Symposium.","author":"Tiwari D.","key":"e_1_2_1_39_1","unstructured":"Tiwari , D. , Lee , S. , Tuck , J. , and Solihin , Y . 2010. Mmt: Exploiting fine-grained parallelism in dynamic memory management . In Proceedings of the 24th International Parallel and Distributed Processing Symposium. Tiwari, D., Lee, S., Tuck, J., and Solihin, Y. 2010. Mmt: Exploiting fine-grained parallelism in dynamic memory management. In Proceedings of the 24th International Parallel and Distributed Processing Symposium."},{"key":"e_1_2_1_40_1","unstructured":"Valgrind Developers. 2005. The Valgrind Quick Start Guide. http:\/\/valgrind.org\/docs\/manual\/quick-start.html.  Valgrind Developers. 2005. The Valgrind Quick Start Guide. http:\/\/valgrind.org\/docs\/manual\/quick-start.html."},{"key":"e_1_2_1_41_1","doi-asserted-by":"publisher","DOI":"10.1109\/MM.2004.75"},{"key":"e_1_2_1_42_1","doi-asserted-by":"publisher","DOI":"10.1145\/1542275.1542333"},{"key":"e_1_2_1_43_1","doi-asserted-by":"publisher","DOI":"10.1109\/MICRO.2004.3"},{"volume-title":"Proceedings of the 31st International Symposium on Computer Architecture.","author":"Zhou P.","key":"e_1_2_1_44_1","unstructured":"Zhou , P. , Qin , F. , Liu , W. , Zhou , Y. , and Torellas , J . 2004b. iWatcher: Efficient architectural support for software debugging . In Proceedings of the 31st International Symposium on Computer Architecture. Zhou, P., Qin, F., Liu, W., Zhou, Y., and Torellas, J. 2004b. iWatcher: Efficient architectural support for software debugging. In Proceedings of the 31st International Symposium on Computer Architecture."}],"container-title":["ACM Transactions on Architecture and Code Optimization"],"original-title":[],"language":"en","link":[{"URL":"https:\/\/dl.acm.org\/doi\/10.1145\/2541228.2541237","content-type":"unspecified","content-version":"vor","intended-application":"text-mining"},{"URL":"https:\/\/dl.acm.org\/doi\/pdf\/10.1145\/2541228.2541237","content-type":"unspecified","content-version":"vor","intended-application":"similarity-checking"}],"deposited":{"date-parts":[[2025,6,18]],"date-time":"2025-06-18T08:09:55Z","timestamp":1750234195000},"score":1,"resource":{"primary":{"URL":"https:\/\/dl.acm.org\/doi\/10.1145\/2541228.2541237"}},"subtitle":[],"short-title":[],"issued":{"date-parts":[[2013,12]]},"references-count":44,"journal-issue":{"issue":"4","published-print":{"date-parts":[[2013,12]]}},"alternative-id":["10.1145\/2541228.2541237"],"URL":"https:\/\/doi.org\/10.1145\/2541228.2541237","relation":{},"ISSN":["1544-3566","1544-3973"],"issn-type":[{"type":"print","value":"1544-3566"},{"type":"electronic","value":"1544-3973"}],"subject":[],"published":{"date-parts":[[2013,12]]},"assertion":[{"value":"2012-08-01","order":0,"name":"received","label":"Received","group":{"name":"publication_history","label":"Publication History"}},{"value":"2013-07-01","order":1,"name":"accepted","label":"Accepted","group":{"name":"publication_history","label":"Publication History"}},{"value":"2013-12-01","order":2,"name":"published","label":"Published","group":{"name":"publication_history","label":"Publication History"}}]}}