{"status":"ok","message-type":"work","message-version":"1.0.0","message":{"indexed":{"date-parts":[[2025,9,29]],"date-time":"2025-09-29T08:20:25Z","timestamp":1759134025919,"version":"3.41.0"},"publisher-location":"New York, NY, USA","reference-count":31,"publisher":"ACM","license":[{"start":{"date-parts":[[2017,9,25]],"date-time":"2017-09-25T00:00:00Z","timestamp":1506297600000},"content-version":"vor","delay-in-days":0,"URL":"https:\/\/www.acm.org\/publications\/policies\/copyright_policy#Background"}],"content-domain":{"domain":["dl.acm.org"],"crossmark-restriction":true},"short-container-title":[],"published-print":{"date-parts":[[2017,9,25]]},"DOI":"10.1145\/3127024.3127028","type":"proceedings-article","created":{"date-parts":[[2017,8,24]],"date-time":"2017-08-24T11:58:11Z","timestamp":1503575891000},"page":"1-11","update-policy":"https:\/\/doi.org\/10.1145\/crossmark-policy","source":"Crossref","is-referenced-by-count":7,"title":["Planning for performance"],"prefix":"10.1145","author":[{"given":"Bradley","family":"Morgan","sequence":"first","affiliation":[{"name":"Auburn University"}],"role":[{"role":"author","vocabulary":"crossref"}]},{"given":"Daniel J.","family":"Holmes","sequence":"additional","affiliation":[{"name":"The University of Edinburgh, Scotland, UK"}],"role":[{"role":"author","vocabulary":"crossref"}]},{"given":"Anthony","family":"Skjellum","sequence":"additional","affiliation":[{"name":"Auburn University and University of Tennessee"}],"role":[{"role":"author","vocabulary":"crossref"}]},{"given":"Purushotham","family":"Bangalore","sequence":"additional","affiliation":[{"name":"University of Alabama at Birmingham"}],"role":[{"role":"author","vocabulary":"crossref"}]},{"given":"Srinivas","family":"Sridharan","sequence":"additional","affiliation":[{"name":"Intel Corporation, Bangalore, 
India"}],"role":[{"role":"author","vocabulary":"crossref"}]}],"member":"320","published-online":{"date-parts":[[2017,9,25]]},"reference":[{"key":"e_1_3_2_1_1_1","unstructured":"Anonymous. 2017. Baidu DeepBench. https:\/\/github.com\/baidu-research\/DeepBench. (2017). Accessed: 2017-05-20."},{"key":"e_1_3_2_1_2_1","unstructured":"Anonymous. 2017. Baidu DeepBench on KNL-OPA systems. https:\/\/software.intel.com\/en-us\/articles\/intel-xeon-phi-delivers-competitive-performance-for-deep-learning-and-getting-better-fast. (2017). Accessed: 2017-05-20."},{"key":"e_1_3_2_1_3_1","unstructured":"Anonymous. 2017. Baidu TensorFlow. https:\/\/github.com\/baidu-research\/tensorflow-allreduce. (2017). Accessed: 2017-05-20."},{"key":"e_1_3_2_1_4_1","unstructured":"Anonymous. 2017. Intel Machine Learning Scalability Library (MLSL). https:\/\/github.com\/01org\/MLSL. (2017). Accessed: 2017-05-20."},{"key":"e_1_3_2_1_5_1","doi-asserted-by":"publisher","DOI":"10.1145\/3018743.3018769"},{"key":"e_1_3_2_1_6_1","doi-asserted-by":"publisher","DOI":"10.1016\/0167-8191(94)90022-1"},{"key":"e_1_3_2_1_7_1","doi-asserted-by":"publisher","DOI":"10.1109\/71.642949"},{"key":"e_1_3_2_1_8_1","doi-asserted-by":"publisher","DOI":"10.1145\/2966884.2966905"},{"key":"e_1_3_2_1_9_1","volume-title":"Petascale Tools Workshop.","author":"Compr\u00e9s Isaias A.","year":"2014","unstructured":"Isaias A. Compr\u00e9s. 2014. 
On-line Application-specific Tuning with the Periscope Tuning Framework and the MPI Tools Interface. (2014). Petascale Tools Workshop."},{"key":"e_1_3_2_1_10_1","volume-title":"Distributed deep learning using synchronous stochastic gradient descent. arXiv preprint arXiv:1602.06709","author":"Das Dipankar","year":"2016","unstructured":"Dipankar Das, Sasikanth Avancha, Dheevatsa Mudigere, Karthikeyan Vaidynathan, Srinivas Sridharan, Dhiraj Kalamkar, Bharat Kaul, and Pradeep Dubey. 2016. Distributed deep learning using synchronous stochastic gradient descent. arXiv preprint arXiv:1602.06709 (2016)."},{"volume-title":"Overlapping of communication and computation and early binding: Fundamental mechanisms for improving parallel performance on clusters of workstations","author":"Dimitrov Rossen Petkov","key":"e_1_3_2_1_11_1","unstructured":"Rossen Petkov Dimitrov. 2001. Overlapping of communication and computation and early binding: Fundamental mechanisms for improving parallel performance on clusters of workstations. Mississippi State University. Ph.D. dissertation, Dept. 
of Computer Science."},{"key":"e_1_3_2_1_12_1","doi-asserted-by":"publisher","DOI":"10.1145\/2802658.2802667"},{"key":"e_1_3_2_1_13_1","volume-title":"Polynomial-Time Construction of Optimal MPI Derived Datatype Trees. In 2016 IEEE International Parallel and Distributed Processing Symposium, IPDPS 2016","author":"Ganian Robert","year":"2016","unstructured":"Robert Ganian, Martin Kalany, Stefan Szeider, and Jesper Larsson Tr\u00e4ff. 2016. Polynomial-Time Construction of Optimal MPI Derived Datatype Trees. In 2016 IEEE International Parallel and Distributed Processing Symposium, IPDPS 2016, Chicago, IL, USA, May 23-27, 2016. 638--647."},{"volume-title":"Deep Learning","author":"Goodfellow Ian","key":"e_1_3_2_1_14_1","unstructured":"Ian Goodfellow, Yoshua Bengio, and Aaron Courville. 2016. Deep Learning. MIT Press. http:\/\/www.deeplearningbook.org."},{"key":"e_1_3_2_1_15_1","doi-asserted-by":"publisher","DOI":"10.1145\/2966884.2966890"},{"key":"e_1_3_2_1_16_1","doi-asserted-by":"publisher","DOI":"10.1145\/1362622.1362692"},{"key":"e_1_3_2_1_17_1","doi-asserted-by":"publisher","DOI":"10.5555\/2388996.2389129"},{"key":"e_1_3_2_1_18_1","doi-asserted-by":"publisher","DOI":"10.5555\/2388996.2389129"},{"key":"e_1_3_2_1_19_1","doi-asserted-by":"publisher","DOI":"10.1080\/17445760902894688"},{"key":"e_1_3_2_1_20_1","doi-asserted-by":"publisher","DOI":"10.1007\/978-3-319-43659-3_32"},{"key":"e_1_3_2_1_21_1","unstructured":"IBM Corporation. 1993. AIX Parallel Environment - Parallel Programming Subroutine Reference. (1993). 
References nonblocking collectives with conjugate gradient example."},{"key":"e_1_3_2_1_22_1","volume-title":"IBM Parallel Edition (PE) Version 1.3: Nonblocking collective communication subroutines. (Circa","author":"IBM Corporation","year":"1997","unstructured":"IBM Corporation. Circa 1997. IBM Parallel Edition (PE) Version 1.3: Nonblocking collective communication subroutines. (Circa 1997). URL = https:\/\/www.ibm.com\/support\/knowledgecenter\/SSFK3V_1.3.0\/com.ibm.cluster.pe.v1r3.pe500.doc\/am107_samples.htm."},{"key":"e_1_3_2_1_23_1","volume-title":"MPI: A Message-Passing Interface Standard, Version 1.1. Technical Report. MPI Forum","author":"Forum MPI","year":"1994","unstructured":"MPI Forum. 1994. MPI: A Message-Passing Interface Standard, Version 1.1. Technical Report. MPI Forum, Knoxville, TN, USA."},{"key":"e_1_3_2_1_24_1","volume-title":"MPI: A Message-Passing Interface Standard, Version 3.1. Technical Report. MPI Forum","author":"Forum MPI","year":"2015","unstructured":"MPI Forum. 2015. MPI: A Message-Passing Interface Standard, Version 3.1. Technical Report. 
MPI Forum, Knoxville, TN, USA."},{"key":"e_1_3_2_1_25_1","doi-asserted-by":"publisher","DOI":"10.1145\/2916026.2916028"},{"key":"e_1_3_2_1_26_1","volume-title":"Bangalore","author":"Skjellum Anthony","year":"2017","unstructured":"Anthony Skjellum, Daniel J. Holmes, and Purushotham V. Bangalore. 2017. Active Datatypes in MPI: Maximizing Performance and Generalizing Persistent Operations for MPI-4. (May 2017). Unpublished."},{"key":"e_1_3_2_1_27_1","doi-asserted-by":"publisher","DOI":"10.1109\/SC.2014.45"},{"key":"e_1_3_2_1_28_1","doi-asserted-by":"publisher","DOI":"10.1145\/2966884.2966904"},{"key":"e_1_3_2_1_29_1","volume-title":"Fully Distributed Algorithms for Irregular Gather and Scatter. CoRR abs\/1702.05967","author":"Tr\u00e4ff Jesper Larsson","year":"2017","unstructured":"Jesper Larsson Tr\u00e4ff. 2017. Practical, Linear-time, Fully Distributed Algorithms for Irregular Gather and Scatter. CoRR abs\/1702.05967 (2017). http:\/\/arxiv.org\/abs\/1702.05967"},{"key":"e_1_3_2_1_30_1","volume-title":"Sparse Collective Communication. CoRR abs\/1606.07676","author":"Tr\u00e4ff Jesper Larsson","year":"2016","unstructured":"Jesper Larsson Tr\u00e4ff, Alexandra Carpen-Amarie, Sascha Hunold, and Antoine Rougier. 2016. Message-Combining Algorithms for Isomorphic, Sparse Collective Communication. CoRR abs\/1606.07676 (2016). 
http:\/\/arxiv.org\/abs\/1606.07676"},{"key":"e_1_3_2_1_31_1","doi-asserted-by":"publisher","DOI":"10.1145\/2597652.2597662"}],"event":{"name":"EuroMPI\/USA '17: 24th European MPI Users' Group Meeting","sponsor":["Mellanox Mellanox Technologies","Intel Intel","SIGHPC ACM Special Interest Group on High Performance Computing, Special Interest Group on High Performance Computing"],"location":"Chicago Illinois","acronym":"EuroMPI\/USA '17"},"container-title":["Proceedings of the 24th European MPI Users' Group Meeting"],"original-title":[],"link":[{"URL":"https:\/\/dl.acm.org\/doi\/10.1145\/3127024.3127028","content-type":"unspecified","content-version":"vor","intended-application":"text-mining"},{"URL":"https:\/\/dl.acm.org\/doi\/pdf\/10.1145\/3127024.3127028","content-type":"unspecified","content-version":"vor","intended-application":"similarity-checking"}],"deposited":{"date-parts":[[2025,6,18]],"date-time":"2025-06-18T02:11:06Z","timestamp":1750212666000},"score":1,"resource":{"primary":{"URL":"https:\/\/dl.acm.org\/doi\/10.1145\/3127024.3127028"}},"subtitle":["persistent collective operations for MPI"],"short-title":[],"issued":{"date-parts":[[2017,9,25]]},"references-count":31,"alternative-id":["10.1145\/3127024.3127028","10.1145\/3127024"],"URL":"https:\/\/doi.org\/10.1145\/3127024.3127028","relation":{},"subject":[],"published":{"date-parts":[[2017,9,25]]},"assertion":[{"value":"2017-09-25","order":2,"name":"published","label":"Published","group":{"name":"publication_history","label":"Publication History"}}]}}