{"status":"ok","message-type":"work","message-version":"1.0.0","message":{"indexed":{"date-parts":[[2025,11,29]],"date-time":"2025-11-29T08:01:53Z","timestamp":1764403313541,"version":"3.41.0"},"reference-count":40,"publisher":"Association for Computing Machinery (ACM)","issue":"5","license":[{"start":{"date-parts":[[2024,9,4]],"date-time":"2024-09-04T00:00:00Z","timestamp":1725408000000},"content-version":"vor","delay-in-days":0,"URL":"https:\/\/www.acm.org\/publications\/policies\/copyright_policy#Background"}],"content-domain":{"domain":["dl.acm.org"],"crossmark-restriction":true},"short-container-title":["ACM Trans. Des. Autom. Electron. Syst."],"published-print":{"date-parts":[[2024,9,30]]},"abstract":"<jats:p>Typical embedded processors, such as Digital Signal Processors (DSPs), usually adopt Very Long Instruction Word (VLIW) architecture to improve computing efficiency. The performance of VLIW processors heavily relies on Instruction-Level Parallelism (ILP). Therefore, it is crucial to develop an efficient instruction scheduling algorithm to explore more ILP. While heuristic algorithms are widely used in modern compilers due to simple implementation and low computational cost, they have limitations in providing accurate solutions and are prone to local optima. On the other hand, exact algorithms can usually find the optimal solution, but their high time overhead makes them less suitable for large-scale problems. This article proposes a two-dimensional constrained dynamic programming (TDCDP) approach and a quantitative model for instruction scheduling. The TDCDP approach achieves near-optimal solutions within an acceptable time overhead. Furthermore, we integrate our TDCDP approach into mainstream compiler architecture, encompassing Pre- and Post-RA (register allocation) scheduling. We conduct a quantitative evaluation of TDCDP compared with four heuristic algorithms on a typical VLIW processor. Our approach achieves an efficiency improvement of up to 58.34% in final solutions compared with the heuristic algorithms. Additionally, the Post-RA Scheduling enhances programs with an average speedup of 14.04% than solely applying the Pre-RA Scheduling.<\/jats:p>","DOI":"10.1145\/3643135","type":"journal-article","created":{"date-parts":[[2024,1,25]],"date-time":"2024-01-25T12:38:57Z","timestamp":1706186337000},"page":"1-20","update-policy":"https:\/\/doi.org\/10.1145\/crossmark-policy","source":"Crossref","is-referenced-by-count":2,"title":["Optimizing VLIW Instruction Scheduling via a Two-Dimensional Constrained Dynamic Programming"],"prefix":"10.1145","volume":"29","author":[{"ORCID":"https:\/\/orcid.org\/0000-0002-5632-3509","authenticated-orcid":false,"given":"Can","family":"Deng","sequence":"first","affiliation":[{"name":"College of Computer, National University of Defense Technology, Changsha, China"},{"name":"Key Laboratory of Advanced Microprocessor Chips and Systems, National University of Defense Technology, Changsha, China"}]},{"ORCID":"https:\/\/orcid.org\/0000-0003-1552-8396","authenticated-orcid":false,"given":"Zhaoyun","family":"Chen","sequence":"additional","affiliation":[{"name":"College of Computer, National University of Defense Technology, Changsha, China"},{"name":"Key Laboratory of Advanced Microprocessor Chips and Systems, National University of Defense Technology, Changsha, China"}]},{"ORCID":"https:\/\/orcid.org\/0000-0001-5786-3171","authenticated-orcid":false,"given":"Yang","family":"Shi","sequence":"additional","affiliation":[{"name":"College of Computer, National University of Defense Technology, Changsha, China"},{"name":"Key Laboratory of Advanced Microprocessor Chips and Systems, National University of Defense Technology, Changsha, China"}]},{"ORCID":"https:\/\/orcid.org\/0009-0000-4026-2513","authenticated-orcid":false,"given":"Yimin","family":"Ma","sequence":"additional","affiliation":[{"name":"College of Computer, National University of Defense Technology, Changsha, China"},{"name":"Key Laboratory of Advanced Microprocessor Chips and Systems, National University of Defense Technology, Changsha, China"}]},{"ORCID":"https:\/\/orcid.org\/0000-0002-5875-3297","authenticated-orcid":false,"given":"Mei","family":"Wen","sequence":"additional","affiliation":[{"name":"College of Computer, National University of Defense Technology, Changsha, China"},{"name":"Key Laboratory of Advanced Microprocessor Chips and Systems, National University of Defense Technology, Changsha, China"}]},{"ORCID":"https:\/\/orcid.org\/0000-0002-9329-1411","authenticated-orcid":false,"given":"Lei","family":"Luo","sequence":"additional","affiliation":[{"name":"College of Computer, National University of Defense Technology, Changsha, China"},{"name":"Science and Technology on Parallel and Distributed Processing Laboratory, National University of Defense Technology, Changsha, China"}]}],"member":"320","published-online":{"date-parts":[[2024,9,4]]},"reference":[{"key":"e_1_3_2_2_2","doi-asserted-by":"publisher","DOI":"10.1145\/3349567.3357376"},{"key":"e_1_3_2_3_2","doi-asserted-by":"publisher","unstructured":"Sanjoy Chakraborty Apu Kumar Saha Sushmita Sharma Seyedali Mirjalili and Ratul Chakraborty. 2021. A novel enhanced whale optimization algorithm for global optimization. Comput. Ind. Eng. 153 (2021) 107086. DOI:10.1016\/J.CIE.2020.107086","DOI":"10.1016\/J.CIE.2020.107086"},{"key":"e_1_3_2_4_2","doi-asserted-by":"publisher","DOI":"10.1109\/MM.2013.129"},{"key":"e_1_3_2_5_2","doi-asserted-by":"publisher","unstructured":"Can Deng Zhaoyun Chen Yang Shi Xichang Kong and Mei Wen. 2022. Exploring ILP for VLIW architecture by quantified modeling and dynamic programming-based instruction scheduling. In 27th Asia and South Pacific Design Automation Conference (ASP-DAC\u201922). IEEE 256\u2013261. DOI:10.1109\/ASP-DAC52403.2022.9712500","DOI":"10.1109\/ASP-DAC52403.2022.9712500"},{"key":"e_1_3_2_6_2","doi-asserted-by":"publisher","DOI":"10.1109\/DAC18074.2021.9586283"},{"key":"e_1_3_2_7_2","doi-asserted-by":"publisher","DOI":"10.1145\/3548681"},{"key":"e_1_3_2_8_2","doi-asserted-by":"publisher","DOI":"10.1145\/339647.339682"},{"key":"e_1_3_2_9_2","doi-asserted-by":"publisher","unstructured":"E. Figielska. 2014. Algorithms using list scheduling and greedy strategies for scheduling in the flowshop with resource constraints. Zeszyty Naukowe Warszawskiej Wy\u017cszej Szko\u0142y Informatyki nr 11 (2014) 29\u201339. DOI:10.26348\/znwwsi.11.29","DOI":"10.26348\/znwwsi.11.29"},{"key":"e_1_3_2_10_2","doi-asserted-by":"publisher","DOI":"10.1007\/s11265-019-01493-2"},{"key":"e_1_3_2_11_2","doi-asserted-by":"publisher","unstructured":"Florian Giesemann Guillermo Pay\u00e1 Vay\u00e1 Lukas Gerlach Holger Blume Fabian Pflug and Gabriele von Voigt. 2017. Using a genetic algorithm approach to reduce register file pressure during instruction scheduling. In 2017 International Conference on Embedded Computer Systems: Architectures Modeling and Simulation (SAMOS\u201917). Yale N. Patt and S. K. Nandy (Eds.). IEEE 179\u2013187. DOI:10.1109\/SAMOS.2017.8344626","DOI":"10.1109\/SAMOS.2017.8344626"},{"key":"e_1_3_2_12_2","volume-title":"Computer Architecture, Fifth Edition: A Quantitative Approach (5th ed.)","author":"Hennessy John L.","year":"2011","unstructured":"John L. Hennessy and David A. Patterson. 2011. Computer Architecture, Fifth Edition: A Quantitative Approach (5th ed.). Morgan Kaufmann Publishers Inc., San Francisco, CA, USA."},{"key":"e_1_3_2_13_2","doi-asserted-by":"publisher","unstructured":"Wen-mei W. Hwu Scott A. Mahlke William Y. Chen Pohua P. Chang Nancy J. Warter Roger A. Bringmann Roland G. Ouellette Richard E. Hank Tokuzo Kiyohara Grant E. Haab John G. Holm and Daniel M. Lavery. 1993. The superblock: An effective technique for VLIW and superscalar compilation. J. Supercomput. 7 1-2 (1993) 229\u2013248. DOI:10.1007\/BF01205185","DOI":"10.1007\/BF01205185"},{"key":"e_1_3_2_14_2","first-page":"5","article-title":"Characteristics of the longest job for new retired workers: Findings from the New Beneficiary Survey","volume":"48","author":"Iams Howard","year":"1985","unstructured":"Howard Iams. 1985. Characteristics of the longest job for new retired workers: Findings from the New Beneficiary Survey. Social Security Bulletin 48 (041985), 5\u201321.","journal-title":"Social Security Bulletin"},{"key":"e_1_3_2_15_2","unstructured":"IBM. 2021. IBM SPSS Statistics 22. Retrieved from https:\/\/www.ibm.com\/spss"},{"key":"e_1_3_2_16_2","doi-asserted-by":"publisher","DOI":"10.1109\/ICASSP.1997.599856"},{"key":"e_1_3_2_17_2","doi-asserted-by":"publisher","DOI":"10.1007\/978-3-319-91734-4_27"},{"key":"e_1_3_2_18_2","doi-asserted-by":"publisher","DOI":"10.1145\/384197.384219"},{"key":"e_1_3_2_19_2","doi-asserted-by":"publisher","unstructured":"Christoph W. Ke\u00dfler and Andrzej Bednarski. 2006. Optimal integrated code generation for VLIW architectures. Concurr. Comput. Pract. Exp. 18 11 (2006) 1353\u20131390. DOI:10.1002\/CPE.1012","DOI":"10.1002\/CPE.1012"},{"key":"e_1_3_2_20_2","doi-asserted-by":"publisher","DOI":"10.1109\/CDC51059.2022.9992661"},{"key":"e_1_3_2_21_2","doi-asserted-by":"publisher","DOI":"10.1126\/science.220.4598.671"},{"key":"e_1_3_2_22_2","doi-asserted-by":"publisher","DOI":"10.1109\/SPDP.1995.530662"},{"key":"e_1_3_2_23_2","doi-asserted-by":"publisher","unstructured":"Chingren Lee Jenq Kuen Lee TingTing Hwang and Shi-Chun Tsai. 2003. Compiler optimization on VLIW instruction scheduling for low power. ACM Trans. Design Autom. Electr. Syst. 8 2 (2003) 252\u2013268. DOI:10.1145\/762488.762494","DOI":"10.1145\/762488.762494"},{"key":"e_1_3_2_24_2","doi-asserted-by":"publisher","unstructured":"Roberto Casta\u00f1eda Lozano and Christian Schulte. 2019. Survey on combinatorial register allocation and instruction scheduling. ACM Comput. Surv. 52 3 (2019) 62:1\u201362:50. DOI:10.1145\/3200920","DOI":"10.1145\/3200920"},{"key":"e_1_3_2_25_2","doi-asserted-by":"publisher","DOI":"10.1145\/2892208.2892237"},{"key":"e_1_3_2_26_2","doi-asserted-by":"publisher","DOI":"10.1145\/3200920"},{"key":"e_1_3_2_27_2","doi-asserted-by":"publisher","DOI":"10.5555\/1345485.1345486"},{"key":"e_1_3_2_28_2","unstructured":"GUROBI OPTIMIZATION. 2023. GUROBI Solver. Retrieved from http:\/\/www.gurobi.com\/"},{"key":"e_1_3_2_29_2","doi-asserted-by":"publisher","unstructured":"Prashant Singh Rawat Fabrice Rastello Aravind Sukumaran-Rajam Louis-No\u00ebl Pouchet Atanas Rountev and P. Sadayappan. 2018. Register optimizations for stencils on GPUs. In Proceedings of the 23rd ACM SIGPLAN Symposium on Principles and Practice of Parallel Programming (PPoPP\u201918). Andreas Krall and Thomas R. Gross (Eds.). ACM 168\u2013182. DOI:10.1145\/3178487.3178500","DOI":"10.1145\/3178487.3178500"},{"key":"e_1_3_2_30_2","doi-asserted-by":"publisher","DOI":"10.1145\/1124713.1124724"},{"key":"e_1_3_2_31_2","doi-asserted-by":"publisher","unstructured":"Ghassan Shobaki Vahl Scott Gordon Paul McHugh Theodore Dubois and Austin Kerbow. 2022. Register-pressure-aware instruction scheduling using ant colony optimization. ACM Trans. Archit. Code Optim. 19 2 (2022) 23:1\u201323:23. DOI:10.1145\/3505558","DOI":"10.1145\/3505558"},{"key":"e_1_3_2_32_2","doi-asserted-by":"publisher","unstructured":"Ghassan Shobaki Austin Kerbow and Stanislav Mekhanoshin. 2020. Optimizing occupancy and ILP on the GPU using a combinatorial approach. In 18th ACM\/IEEE International Symposium on Code Generation and Optimization (CGO\u201920). ACM 133\u2013144. DOI:10.1145\/3368826.3377918","DOI":"10.1145\/3368826.3377918"},{"key":"e_1_3_2_33_2","doi-asserted-by":"publisher","DOI":"10.1145\/2512432"},{"key":"e_1_3_2_34_2","doi-asserted-by":"publisher","DOI":"10.1109\/71.993206"},{"key":"e_1_3_2_35_2","doi-asserted-by":"publisher","unstructured":"Huijun Wang and Oliver Sinnen. 2018. List-scheduling versus cluster-scheduling. IEEE Trans. Parallel Distributed Syst. 29 8 (2018) 1736\u20131749. DOI:10.1109\/TPDS.2018.2808959","DOI":"10.1109\/TPDS.2018.2808959"},{"key":"e_1_3_2_36_2","doi-asserted-by":"publisher","DOI":"10.1007\/s42514-020-00057-2"},{"key":"e_1_3_2_37_2","doi-asserted-by":"publisher","DOI":"10.1145\/358438.349318"},{"key":"e_1_3_2_38_2","doi-asserted-by":"publisher","DOI":"10.1109\/ISSPA.2005.1580258"},{"key":"e_1_3_2_39_2","doi-asserted-by":"publisher","DOI":"10.1145\/2038698.2038707"},{"key":"e_1_3_2_40_2","doi-asserted-by":"publisher","DOI":"10.1109\/ISPA-BDCloud-SustainCom-SocialCom48970.2019.00089"},{"key":"e_1_3_2_41_2","doi-asserted-by":"publisher","DOI":"10.1109\/TCAD.2022.3197685"}],"container-title":["ACM Transactions on Design Automation of Electronic Systems"],"original-title":[],"language":"en","link":[{"URL":"https:\/\/dl.acm.org\/doi\/10.1145\/3643135","content-type":"unspecified","content-version":"vor","intended-application":"text-mining"},{"URL":"https:\/\/dl.acm.org\/doi\/pdf\/10.1145\/3643135","content-type":"unspecified","content-version":"vor","intended-application":"similarity-checking"}],"deposited":{"date-parts":[[2025,6,18]],"date-time":"2025-06-18T16:31:21Z","timestamp":1750264281000},"score":1,"resource":{"primary":{"URL":"https:\/\/dl.acm.org\/doi\/10.1145\/3643135"}},"subtitle":[],"short-title":[],"issued":{"date-parts":[[2024,9,4]]},"references-count":40,"journal-issue":{"issue":"5","published-print":{"date-parts":[[2024,9,30]]}},"alternative-id":["10.1145\/3643135"],"URL":"https:\/\/doi.org\/10.1145\/3643135","relation":{},"ISSN":["1084-4309","1557-7309"],"issn-type":[{"type":"print","value":"1084-4309"},{"type":"electronic","value":"1557-7309"}],"subject":[],"published":{"date-parts":[[2024,9,4]]},"assertion":[{"value":"2023-11-10","order":0,"name":"received","label":"Received","group":{"name":"publication_history","label":"Publication History"}},{"value":"2024-01-21","order":2,"name":"accepted","label":"Accepted","group":{"name":"publication_history","label":"Publication History"}},{"value":"2024-09-04","order":3,"name":"published","label":"Published","group":{"name":"publication_history","label":"Publication History"}}]}}