{"status":"ok","message-type":"work","message-version":"1.0.0","message":{"indexed":{"date-parts":[[2026,1,10]],"date-time":"2026-01-10T07:20:38Z","timestamp":1768029638338,"version":"3.49.0"},"reference-count":42,"publisher":"SAGE Publications","issue":"2","license":[{"start":{"date-parts":[[2015,6,2]],"date-time":"2015-06-02T00:00:00Z","timestamp":1433203200000},"content-version":"tdm","delay-in-days":0,"URL":"https:\/\/journals.sagepub.com\/page\/policies\/text-and-data-mining-license"}],"content-domain":{"domain":["journals.sagepub.com"],"crossmark-restriction":true},"short-container-title":["The International Journal of High Performance Computing Applications"],"published-print":{"date-parts":[[2016,5]]},"abstract":"<jats:p> The graphic processing unit (GPU) is becoming increasingly popular as a performance accelerator in various applications requiring high-performance parallel computing capability. In a central processing unit (CPU) or GPU hybrid system, software pipelining is a major task in order to deliver accelerated performance, where hiding CPU\u2013GPU communication overheads by splitting a large task into small units is the key challenge. In this paper, we carry out a systematic investigation into task partitioning in order to achieve maximum performance gain. We first validate the advantage of even partition strategy, and then propose the optimal scheduling, with detailed study into how to achieve optimal unit size (data granularity) in an analytical framework. Experiments on AMD and NVIDIA GPU platforms demonstrate that our approaches achieve around 31 \u2013 59% performance improvement using software pipelining. <\/jats:p>","DOI":"10.1177\/1094342015585845","type":"journal-article","created":{"date-parts":[[2015,6,4]],"date-time":"2015-06-04T00:14:41Z","timestamp":1433376881000},"page":"169-185","update-policy":"https:\/\/doi.org\/10.1177\/sage-journals-update-policy","source":"Crossref","is-referenced-by-count":9,"title":["Software pipelining for graphic processing unit acceleration: Partition, scheduling and granularity"],"prefix":"10.1177","volume":"30","author":[{"given":"Bozhong","family":"Liu","sequence":"first","affiliation":[{"name":"School of Information Security Engineering, Shanghai Jiao Tong University, China"}]},{"given":"Weidong","family":"Qiu","sequence":"additional","affiliation":[{"name":"School of Information Security Engineering, Shanghai Jiao Tong University, China"}]},{"given":"Lin","family":"Jiang","sequence":"additional","affiliation":[{"name":"School of Information Security Engineering, Shanghai Jiao Tong University, China"}]},{"given":"Zheng","family":"Gong","sequence":"additional","affiliation":[{"name":"School of Computer Science, South China Normal University, China"}]}],"member":"179","published-online":{"date-parts":[[2015,6,2]]},"reference":[{"key":"bibr1-1094342015585845","doi-asserted-by":"publisher","DOI":"10.1109\/71.476167"},{"key":"bibr2-1094342015585845","doi-asserted-by":"publisher","DOI":"10.1007\/978-3-642-27257-8_16"},{"key":"bibr3-1094342015585845","doi-asserted-by":"publisher","DOI":"10.1109\/IPDPSW.2013.236"},{"key":"bibr4-1094342015585845","doi-asserted-by":"publisher","DOI":"10.1145\/1015706.1015800"},{"key":"bibr5-1094342015585845","doi-asserted-by":"publisher","DOI":"10.1016\/j.parco.2012.03.004"},{"key":"bibr6-1094342015585845","doi-asserted-by":"publisher","DOI":"10.1007\/978-3-642-23397-5_40"},{"key":"bibr7-1094342015585845","doi-asserted-by":"publisher","DOI":"10.1007\/3-540-55895-0_462"},{"key":"bibr8-1094342015585845","first-page":"47","volume-title":"Proceedings of the 2004 ACM\/IEEE conference on supercomputing","author":"Fan Z","year":"2004"},{"key":"bibr9-1094342015585845","doi-asserted-by":"publisher","DOI":"10.1007\/s11390-009-9266-8"},{"key":"bibr10-1094342015585845","doi-asserted-by":"publisher","DOI":"10.1145\/75362.75411"},{"key":"bibr11-1094342015585845","doi-asserted-by":"publisher","DOI":"10.1016\/j.jpdc.2011.07.011"},{"key":"bibr12-1094342015585845","doi-asserted-by":"publisher","DOI":"10.1109\/71.544355"},{"key":"bibr13-1094342015585845","first-page":"69","author":"Guevara M","year":"2009","journal-title":"Workshop on programming models and emerging architectures"},{"key":"bibr14-1094342015585845","doi-asserted-by":"publisher","DOI":"10.1007\/s11390-012-1205-4"},{"key":"bibr15-1094342015585845","doi-asserted-by":"publisher","DOI":"10.1145\/1555754.1555775"},{"key":"bibr16-1094342015585845","doi-asserted-by":"publisher","DOI":"10.1145\/2145816.2145818"},{"key":"bibr17-1094342015585845","doi-asserted-by":"publisher","DOI":"10.15803\/ijnc.2.1_131"},{"key":"bibr18-1094342015585845","doi-asserted-by":"publisher","DOI":"10.1147\/rd.515.0503"},{"key":"bibr19-1094342015585845","doi-asserted-by":"publisher","DOI":"10.1109\/RTSS.2011.13"},{"key":"bibr20-1094342015585845","first-page":"401","volume-title":"2012 USENIX annual technical conference","author":"Kato S","year":"2012"},{"key":"bibr21-1094342015585845","doi-asserted-by":"publisher","DOI":"10.1145\/567532.567555"},{"key":"bibr22-1094342015585845","doi-asserted-by":"publisher","DOI":"10.1145\/53990.54022"},{"key":"bibr23-1094342015585845","doi-asserted-by":"publisher","DOI":"10.1109\/SPDP.1992.242742"},{"key":"bibr24-1094342015585845","doi-asserted-by":"publisher","DOI":"10.1016\/j.jcss.2012.05.004"},{"key":"bibr25-1094342015585845","doi-asserted-by":"publisher","DOI":"10.1145\/1669112.1669121"},{"key":"bibr26-1094342015585845","doi-asserted-by":"publisher","DOI":"10.1109\/ICSPC.2007.4728256"},{"key":"bibr27-1094342015585845","first-page":"1","volume-title":"2010 IEEE international symposium on parallel distributed processing, workshops and PhD forum (IPDPSW)","author":"Mei C","year":"2010"},{"key":"bibr28-1094342015585845","author":"Munshi A","year":"2008","journal-title":"SIGGRAPH, Tutorial"},{"key":"bibr29-1094342015585845","unstructured":"NVIDIA Corporation (2007) CUDA Programming Guide. June"},{"key":"bibr30-1094342015585845","doi-asserted-by":"publisher","DOI":"10.1007\/978-3-642-13858-4_5"},{"key":"bibr31-1094342015585845","doi-asserted-by":"publisher","DOI":"10.1109\/SC.2012.22"},{"key":"bibr32-1094342015585845","doi-asserted-by":"publisher","DOI":"10.1145\/1014192.802449"},{"key":"bibr33-1094342015585845","doi-asserted-by":"publisher","DOI":"10.1016\/j.jmgm.2010.06.010"},{"key":"bibr34-1094342015585845","doi-asserted-by":"publisher","DOI":"10.1007\/978-3-540-89740-8_2"},{"key":"bibr35-1094342015585845","doi-asserted-by":"publisher","DOI":"10.1109\/CGO.2009.20"},{"key":"bibr36-1094342015585845","doi-asserted-by":"publisher","DOI":"10.1109\/CCGrid.2014.16"},{"key":"bibr37-1094342015585845","doi-asserted-by":"publisher","DOI":"10.1109\/FPL.2008.4630035"},{"key":"bibr38-1094342015585845","doi-asserted-by":"publisher","DOI":"10.1007\/s11390-011-0184-1"},{"key":"bibr39-1094342015585845","doi-asserted-by":"publisher","DOI":"10.1109\/DATE.2010.5456975"},{"key":"bibr40-1094342015585845","doi-asserted-by":"publisher","DOI":"10.1109\/TPDS.2012.41"},{"key":"bibr41-1094342015585845","doi-asserted-by":"publisher","DOI":"10.1007\/978-3-642-03869-3_82"},{"key":"bibr42-1094342015585845","doi-asserted-by":"publisher","DOI":"10.1007\/978-3-540-76900-2_15"}],"container-title":["The International Journal of High Performance Computing Applications"],"original-title":[],"language":"en","link":[{"URL":"https:\/\/journals.sagepub.com\/doi\/pdf\/10.1177\/1094342015585845","content-type":"application\/pdf","content-version":"vor","intended-application":"text-mining"},{"URL":"https:\/\/journals.sagepub.com\/doi\/full-xml\/10.1177\/1094342015585845","content-type":"application\/xml","content-version":"vor","intended-application":"text-mining"},{"URL":"https:\/\/journals.sagepub.com\/doi\/pdf\/10.1177\/1094342015585845","content-type":"unspecified","content-version":"vor","intended-application":"similarity-checking"}],"deposited":{"date-parts":[[2025,3,3]],"date-time":"2025-03-03T23:19:53Z","timestamp":1741043993000},"score":1,"resource":{"primary":{"URL":"https:\/\/journals.sagepub.com\/doi\/10.1177\/1094342015585845"}},"subtitle":[],"short-title":[],"issued":{"date-parts":[[2015,6,2]]},"references-count":42,"journal-issue":{"issue":"2","published-print":{"date-parts":[[2016,5]]}},"alternative-id":["10.1177\/1094342015585845"],"URL":"https:\/\/doi.org\/10.1177\/1094342015585845","relation":{},"ISSN":["1094-3420","1741-2846"],"issn-type":[{"value":"1094-3420","type":"print"},{"value":"1741-2846","type":"electronic"}],"subject":[],"published":{"date-parts":[[2015,6,2]]}}}