{"status":"ok","message-type":"work","message-version":"1.0.0","message":{"indexed":{"date-parts":[[2026,3,10]],"date-time":"2026-03-10T04:18:35Z","timestamp":1773116315501,"version":"3.50.1"},"reference-count":32,"publisher":"Springer Science and Business Media LLC","issue":"1","license":[{"start":{"date-parts":[[2023,2,16]],"date-time":"2023-02-16T00:00:00Z","timestamp":1676505600000},"content-version":"tdm","delay-in-days":0,"URL":"https:\/\/creativecommons.org\/licenses\/by\/4.0"},{"start":{"date-parts":[[2023,2,16]],"date-time":"2023-02-16T00:00:00Z","timestamp":1676505600000},"content-version":"vor","delay-in-days":0,"URL":"https:\/\/creativecommons.org\/licenses\/by\/4.0"}],"content-domain":{"domain":["link.springer.com"],"crossmark-restriction":false},"short-container-title":["J Cloud Comp"],"abstract":"<jats:title>Abstract<\/jats:title>\n                  <jats:p>\n                    Recently, deep neural networks (DNNs) have shown great promise in many fields while their parameter sizes are rapidly expanding. To break through the computation and memory limitation of a single machine, pipeline model parallelism is proposed for large-scale DNN training by fully utilizing the computation and storage power of the distributed cluster. Cloud data centers can also provide sufficient computing, storage and bandwidth resources. However, most existing approaches apply layer-wise partitioning, which is difficult to obtain an even model partition result because of the large computational overhead discrepancy between DNN layers, resulting in degraded efficiency. To tackle this issue, we propose \u201cBi-Partition\u201d, a novel partitioning method based on bidirectional partitioning for forward propagation (FP) and backward propagation (BP), which improves the efficiency of the pipeline model parallelism system. 
By deliberately designing distinct cut positions for FP and BP of DNN training, workers in the pipeline get nearly equal computational loads, and the balanced pipeline fully utilizes the computing resources. Experiments on various DNN models and datasets validate the efficiency of our mechanism, e.g., the training efficiency achieving up to 1.9\n                    <jats:inline-formula>\n                      <jats:alternatives>\n                        <jats:tex-math>$$\\times$$<\/jats:tex-math>\n                        <mml:math xmlns:mml=\"http:\/\/www.w3.org\/1998\/Math\/MathML\">\n                          <mml:mo>\u00d7<\/mml:mo>\n                        <\/mml:math>\n                      <\/jats:alternatives>\n                    <\/jats:inline-formula>\n                    faster than the state-of-the-art method PipeDream.\n                  <\/jats:p>","DOI":"10.1186\/s13677-022-00382-7","type":"journal-article","created":{"date-parts":[[2023,2,16]],"date-time":"2023-02-16T03:02:52Z","timestamp":1676516572000},"update-policy":"https:\/\/doi.org\/10.1007\/springer_crossmark_policy","source":"Crossref","is-referenced-by-count":9,"title":["A bidirectional DNN partition mechanism for efficient pipeline parallel training in 
cloud"],"prefix":"10.1186","volume":"12","author":[{"given":"Lingyun","family":"Cui","sequence":"first","affiliation":[],"role":[{"role":"author","vocabulary":"crossref"}]},{"given":"Zhihao","family":"Qu","sequence":"additional","affiliation":[],"role":[{"role":"author","vocabulary":"crossref"}]},{"given":"Guomin","family":"Zhang","sequence":"additional","affiliation":[],"role":[{"role":"author","vocabulary":"crossref"}]},{"given":"Bin","family":"Tang","sequence":"additional","affiliation":[],"role":[{"role":"author","vocabulary":"crossref"}]},{"given":"Baoliu","family":"Ye","sequence":"additional","affiliation":[],"role":[{"role":"author","vocabulary":"crossref"}]}],"member":"297","published-online":{"date-parts":[[2023,2,16]]},"reference":[{"key":"382_CR1","unstructured":"Dosovitskiy A, Beyer L, Kolesnikov A, Weissenborn D, Zhai X, Unterthiner T, Dehghani M, Minderer M, Heigold G, Gelly S, et al (2021) An image is worth 16x16 words: Transformers for image recognition at scale. In: Proc of the ICLR. OpenReview.net, Austria"},{"key":"382_CR2","doi-asserted-by":"crossref","unstructured":"Liu Z, Lin Y, Cao Y, Hu H, Wei Y, Zhang Z, Lin S, Guo B (2021) Swin transformer: Hierarchical vision transformer using shifted windows. In: Proc. of the IEEE\/CVF ICCV. IEEE, Montreal, BC, Canada, p 10012\u201310022","DOI":"10.1109\/ICCV48922.2021.00986"},{"key":"382_CR3","unstructured":"Vaswani A, Shazeer N, Parmar N, Uszkoreit J, Jones L, Gomez AN, Kaiser \u0141, Polosukhin I (2017) Attention is all you need. In: Proc of the NeurIPS, vol 30. Curran Associates Inc.57, Long Beach, CA, USA"},{"key":"382_CR4","unstructured":"Wu Y, Schuster M, Chen Z, Le QV, Norouzi M, Macherey W, Krikun M, Cao Y, Gao Q, Macherey K, et\u00a0al (2016) Google\u2019s neural machine translation system: Bridging the gap between human and machine translation. 
preprint ArXiv:1609.08144"},{"key":"382_CR5","unstructured":"Shoeybi M, Patwary M, Puri R, LeGresley P, Casper J, Catanzaro B (2019) Megatron-LM: Training multi-billion parameter language models using model parallelism. preprint ArXiv:1909.08053"},{"key":"382_CR6","unstructured":"Fedus W, Zoph B, Shazeer N (2021) Switch transformers: Scaling to trillion parameter models with simple and efficient sparsity. preprint ArXiv:2101.03961"},{"issue":"2","key":"382_CR7","doi-asserted-by":"publisher","first-page":"670","DOI":"10.1109\/TGCN.2021.3067374","volume":"5","author":"Y Huang","year":"2021","unstructured":"Huang Y, Xu H, Gao H, Ma X, Hussain W (2021) Ssur: An approach to optimizing virtual machine allocation strategy based on user requirements for cloud data center. IEEE Trans Green Commun Netw 5(2):670\u2013681. https:\/\/doi.org\/10.1109\/TGCN.2021.3067374","journal-title":"IEEE Trans Green Commun Netw"},{"issue":"1","key":"382_CR8","doi-asserted-by":"publisher","first-page":"1","DOI":"10.1186\/s13677-022-00293-7","volume":"11","author":"I Mohamed","year":"2022","unstructured":"Mohamed I, Al-Mahdi H, Tahoun M, Nassar H (2022) Characterization of task response time in fog enabled networks using queueing theory under different virtualization modes. J Cloud Comput 11(1):1\u201317","journal-title":"J Cloud Comput"},{"issue":"4","key":"382_CR9","doi-asserted-by":"publisher","first-page":"2131","DOI":"10.1109\/COMST.2021.3106401","volume":"23","author":"Q Luo","year":"2021","unstructured":"Luo Q, Hu S, Li C, Li G, Shi W (2021) Resource scheduling in edge computing: A survey. IEEE Commun Surv Tutorials 23(4):2131\u20132165. 
https:\/\/doi.org\/10.1109\/COMST.2021.3106401","journal-title":"IEEE Commun Surv Tutorials"},{"issue":"1","key":"382_CR10","doi-asserted-by":"publisher","first-page":"1","DOI":"10.1186\/s13677-020-00201-x","volume":"9","author":"M Pang","year":"2020","unstructured":"Pang M, Wang L, Fang N (2020) A collaborative scheduling strategy for iov computing resources considering location privacy protection in mobile edge computing environment. J Cloud Comput 9(1):1\u201317","journal-title":"J Cloud Comput"},{"key":"382_CR11","first-page":"583","volume":"14","author":"M Li","year":"2014","unstructured":"Li M, Andersen DG, Park JW, Smola AJ, Ahmed A, Josifovski V, Long J, Shekita EJ, Su BY (2014) Scaling distributed machine learning with the parameter server. Proc OSDI 14:583\u2013598.\u00a0USENIX Association,\u00a0Broomfield, CO, USA","journal-title":"Proc OSDI"},{"key":"382_CR12","doi-asserted-by":"crossref","unstructured":"Gupta V, Choudhary D, Tang P, Wei X, Wang X, Huang Y, Kejariwal A, Ramchandran K, Mahoney MW (2021) Training recommender systems at scale: Communication-efficient model and data parallelism. In: Proc. of the SIGKDD. ACM, Singapore, p 2928\u20132936","DOI":"10.1145\/3447548.3467080"},{"key":"382_CR13","unstructured":"Rothchild D, Panda A, Ullah E, Ivkin N, Stoica I, Braverman V, Gonzalez J, Arora R (2020) FetchSGD: Communication-efficient federated learning with sketching. In: Proc. of the ICML. PMLR. Vienna, Austria, p 8253\u20138265"},{"key":"382_CR14","first-page":"13551","volume":"33","author":"CY Chen","year":"2020","unstructured":"Chen CY, Ni J, Lu S, Cui X, Chen PY, Sun X, Wang N, Venkataramani S, Srinivasan VV, Zhang W et al (2020) Scalecom: Scalable sparsified gradient compression for communication-efficient distributed training. 
Proc NeurIPS 33:13551\u201313563","journal-title":"Proc NeurIPS"},{"key":"382_CR15","doi-asserted-by":"publisher","DOI":"10.1017\/9781108955959","volume-title":"Edge Learning for Distributed Big Data Analytics: Theory, Algorithms, and System Design","author":"S Guo","year":"2022","unstructured":"Guo S, Qu Z (2022) Edge Learning for Distributed Big Data Analytics: Theory, Algorithms, and System Design. Cambridge University Press, United Kingdom"},{"issue":"7","key":"382_CR16","first-page":"1","volume":"54","author":"J Zhang","year":"2021","unstructured":"Zhang J, Qu Z, Chen C, Wang H, Zhan Y, Ye B, Guo S (2021) Edge learning: The enabling technology for distributed big data analytics in the edge. ACM Comput Surv (CSUR) 54(7):1\u201336","journal-title":"ACM Comput Surv (CSUR)"},{"issue":"12","key":"382_CR17","doi-asserted-by":"publisher","first-page":"4502","DOI":"10.1109\/TMC.2021.3083154","volume":"21","author":"Z Qu","year":"2021","unstructured":"Qu Z, Guo S, Wang H, Ye B, Wang Y, Zomaya A, Tang B (2021) Partial synchronization to accelerate federated learning over relay-assisted edge networks. IEEE Trans Mobile Comput 21(12):4502\u20134516","journal-title":"IEEE Trans Mobile Comput"},{"key":"382_CR18","unstructured":"Huang Y, Cheng Y, Bapna A, Firat O, Chen D, Chen M, Lee H, Ngiam J, Le QV, Wu Y, et al (2019) Gpipe: Efficient training of giant neural networks using pipeline parallelism. In: Proc of the NeurIPS, vol 32. Curran Associates Inc.57, Vancouver, BC, Canada"},{"key":"382_CR19","doi-asserted-by":"crossref","unstructured":"Li S, Hoefler T (2021) Chimera: efficiently training large-scale neural networks with bidirectional pipelines. In: Proc. of the SC. ACM, St. Louis, Missouri, USA, p 1\u201314","DOI":"10.1145\/3458817.3476145"},{"key":"382_CR20","doi-asserted-by":"crossref","unstructured":"Narayanan D, Harlap A, Phanishayee A, Seshadri V, Devanur NR, Ganger GR, Gibbons PB, Zaharia M (2019) PipeDream: generalized pipeline parallelism for DNN training. 
In: Proc. of the SOSP. ACM, Huntsville, ON, Canada, p 1\u201315","DOI":"10.1145\/3341301.3359646"},{"key":"382_CR21","unstructured":"Narayanan D, Phanishayee A, Shi K, Chen X, Zaharia M (2021) Memory-efficient pipeline-parallel dnn training. In: Proc of the 38th International Conference on Machine Learning, ICML 2021, 18-24 July 2021, Virtual Event, p 7937\u20137947"},{"key":"382_CR22","doi-asserted-by":"crossref","unstructured":"Fan S, Rong Y, Meng C, Cao Z, Wang S, Zheng Z, Wu C, Long G, Yang J, Xia L, et al (2021) DAPPLE: A pipelined data parallel approach for training large models. In: Proc. of the 26th ACM SIGPLAN Symposium on Principles and Practice of Parallel Programming, Virtual Event, Republic of Korea, p 431\u2013445","DOI":"10.1145\/3437801.3441593"},{"key":"382_CR23","unstructured":"Park JH, Yun G, Chang MY, Nguyen NT, Lee S, Choi J, Noh SH, Choi Yr (2020) HetPipe: Enabling large DNN training on (whimpy) heterogeneous GPU clusters through integration of pipelined model parallelism and data parallelism. In: 2020 USENIX Annual Technical Conference (USENIX ATC 20). USENIX Association, p 307\u2013321"},{"issue":"3","key":"382_CR24","doi-asserted-by":"publisher","first-page":"489","DOI":"10.1109\/TPDS.2021.3094364","volume":"33","author":"S Zhao","year":"2021","unstructured":"Zhao S, Li F, Chen X, Guan X, Jiang J, Huang D, Qing Y, Wang S, Wang P, Zhang G et al (2021) v pipe: A virtualized acceleration system for achieving efficient and scalable pipeline parallel dnn training. IEEE Trans Parallel Distrib Syst 33(3):489\u2013506","journal-title":"IEEE Trans Parallel Distrib Syst"},{"key":"382_CR25","unstructured":"Abadi M, Barham P, Chen J, Chen Z, Davis A, Dean J, Devin M, Ghemawat S, Irving G, Isard M, et al (2016) TensorFlow: a system for Large-Scale machine learning. In: 12th USENIX symposium on operating systems design and implementation (OSDI 16). 
USENIX Association, GA, USA, p 265\u2013283"},{"issue":"11","key":"382_CR26","doi-asserted-by":"publisher","first-page":"2278","DOI":"10.1109\/5.726791","volume":"86","author":"Y LeCun","year":"1998","unstructured":"LeCun Y, Bottou L, Bengio Y, Haffner P (1998) Gradient-based learning applied to document recognition. Proc IEEE 86(11):2278\u20132324","journal-title":"Proc IEEE"},{"key":"382_CR27","unstructured":"Simonyan K, Zisserman A (2014) Very deep convolutional networks for large-scale image recognition. preprint ArXiv:1409.1556"},{"key":"382_CR28","unstructured":"Krizhevsky A, Sutskever I, Hinton GE (2012) Imagenet classification with deep convolutional neural networks. In: Proc of the NeurIPS, vol 25. Curran Associates Inc.57, Lake Tahoe, Nevada, USA"},{"key":"382_CR29","doi-asserted-by":"crossref","unstructured":"He K, Zhang X, Ren S, Sun J (2016) Deep residual learning for image recognition. In: Proc. of the IEEE conference on computer vision and pattern recognition. pp 770\u2013778","DOI":"10.1109\/CVPR.2016.90"},{"key":"382_CR30","unstructured":"Vaswani A, Shazeer N, Parmar N, Uszkoreit J, Jones L, Gomez AN, Kaiser \u0141, Polosukhin I (2017) Attention is all you need. Advances in neural information processing systems, vol 30. Curran Associates Inc.57, Long Beach, CA, USA"},{"issue":"4","key":"382_CR31","doi-asserted-by":"publisher","first-page":"681","DOI":"10.1007\/s11023-020-09548-1","volume":"30","author":"L Floridi","year":"2020","unstructured":"Floridi L, Chiriatti M (2020) Gpt-3: Its nature, scope, limits, and consequences. Minds Mach 30(4):681\u2013694","journal-title":"Minds Mach"},{"key":"382_CR32","unstructured":"Devlin J, Chang MW, Lee K, Toutanova K (2018) Bert: Pre-training of deep bidirectional transformers for language understanding. 
arXiv preprint arXiv:1810.04805"}],"container-title":["Journal of Cloud Computing"],"original-title":[],"language":"en","link":[{"URL":"https:\/\/link.springer.com\/content\/pdf\/10.1186\/s13677-022-00382-7.pdf","content-type":"application\/pdf","content-version":"vor","intended-application":"text-mining"},{"URL":"https:\/\/link.springer.com\/article\/10.1186\/s13677-022-00382-7\/fulltext.html","content-type":"text\/html","content-version":"vor","intended-application":"text-mining"},{"URL":"https:\/\/link.springer.com\/content\/pdf\/10.1186\/s13677-022-00382-7.pdf","content-type":"application\/pdf","content-version":"vor","intended-application":"similarity-checking"}],"deposited":{"date-parts":[[2023,2,16]],"date-time":"2023-02-16T03:07:06Z","timestamp":1676516826000},"score":1,"resource":{"primary":{"URL":"https:\/\/journalofcloudcomputing.springeropen.com\/articles\/10.1186\/s13677-022-00382-7"}},"subtitle":[],"short-title":[],"issued":{"date-parts":[[2023,2,16]]},"references-count":32,"journal-issue":{"issue":"1","published-online":{"date-parts":[[2023,12]]}},"alternative-id":["382"],"URL":"https:\/\/doi.org\/10.1186\/s13677-022-00382-7","relation":{"has-preprint":[{"id-type":"doi","id":"10.21203\/rs.3.rs-2166516\/v1","asserted-by":"object"}]},"ISSN":["2192-113X"],"issn-type":[{"value":"2192-113X","type":"electronic"}],"subject":[],"published":{"date-parts":[[2023,2,16]]},"assertion":[{"value":"14 October 2022","order":1,"name":"received","label":"Received","group":{"name":"ArticleHistory","label":"Article History"}},{"value":"20 December 2022","order":2,"name":"accepted","label":"Accepted","group":{"name":"ArticleHistory","label":"Article History"}},{"value":"16 February 2023","order":3,"name":"first_online","label":"First Online","group":{"name":"ArticleHistory","label":"Article History"}},{"order":1,"name":"Ethics","group":{"name":"EthicsHeading","label":"Declarations"}},{"value":"Not 
applicable.","order":2,"name":"Ethics","group":{"name":"EthicsHeading","label":"Ethics approval and consent to participate"}},{"value":"Hohai University, Army Engineering University and Nanjing University.","order":3,"name":"Ethics","group":{"name":"EthicsHeading","label":"Competing interests"}}],"article-number":"22"}}