{"status":"ok","message-type":"work","message-version":"1.0.0","message":{"indexed":{"date-parts":[[2026,2,28]],"date-time":"2026-02-28T04:20:29Z","timestamp":1772252429146,"version":"3.50.1"},"reference-count":35,"publisher":"MDPI AG","issue":"4","license":[{"start":{"date-parts":[[2020,11,5]],"date-time":"2020-11-05T00:00:00Z","timestamp":1604534400000},"content-version":"vor","delay-in-days":0,"URL":"https:\/\/creativecommons.org\/licenses\/by\/4.0\/"}],"content-domain":{"domain":[],"crossmark-restriction":false},"short-container-title":["BDCC"],"abstract":"<jats:p>The new barrier mode in Apache Spark allows for embedding distributed deep learning training as a Spark stage to simplify the distributed training workflow. In Spark, a task in a stage does not depend on any other tasks in the same stage, and hence it can be scheduled independently. However, several algorithms require more sophisticated inter-task communications, similar to the MPI paradigm. By combining distributed message passing (using asynchronous network IO), OpenJDK\u2019s new auto-vectorization and Spark\u2019s barrier execution mode, we can add non-map\/reduce-based algorithms, such as Cannon\u2019s distributed matrix multiplication to Spark. We document an efficient distributed matrix multiplication using Cannon\u2019s algorithm, which significantly improves on the performance of the existing MLlib implementation. Used within a barrier task, the algorithm described herein results in an up to 24% performance increase on a 10,000 \u00d7 10,000 square matrix with a significantly lower memory footprint. Applications of efficient matrix multiplication include, among others, accelerating the training and implementation of deep convolutional neural network-based workloads, and thus such efficient algorithms can play a ground-breaking role in the faster and more efficient execution of even the most complicated machine learning tasks.<\/jats:p>","DOI":"10.3390\/bdcc4040032","type":"journal-article","created":{"date-parts":[[2020,11,5]],"date-time":"2020-11-05T19:38:41Z","timestamp":1604605121000},"page":"32","update-policy":"https:\/\/doi.org\/10.3390\/mdpi_crossmark_policy","source":"Crossref","is-referenced-by-count":6,"title":["JAMPI: Efficient Matrix Multiplication in Spark Using Barrier Execution Mode"],"prefix":"10.3390","volume":"4","author":[{"ORCID":"https:\/\/orcid.org\/0000-0001-9283-6865","authenticated-orcid":false,"given":"Tamas","family":"Foldi","sequence":"first","affiliation":[{"name":"Starschema Inc., Arlington, VA 22066, USA"}]},{"ORCID":"https:\/\/orcid.org\/0000-0003-3131-0864","authenticated-orcid":false,"given":"Chris","family":"von Csefalvay","sequence":"additional","affiliation":[{"name":"Starschema Inc., Arlington, VA 22066, USA"}]},{"ORCID":"https:\/\/orcid.org\/0000-0002-4282-5890","authenticated-orcid":false,"given":"Nicolas A.","family":"Perez","sequence":"additional","affiliation":[{"name":"Google Inc., Seattle, WA 98103, USA"}]}],"member":"1968","published-online":{"date-parts":[[2020,11,5]]},"reference":[{"key":"ref_1","doi-asserted-by":"crossref","first-page":"27","DOI":"10.1016\/j.neucom.2015.09.116","article-title":"Deep learning for visual understanding: A review","volume":"187","author":"Guo","year":"2016","journal-title":"Neurocomputing"},{"key":"ref_2","doi-asserted-by":"crossref","unstructured":"Voulodimos, A., Doulamis, N., Doulamis, A., and Protopapadakis, E. (2018). Deep learning for computer vision: A brief review. Comput. Intell. Neurosci., 2018.","DOI":"10.1155\/2018\/7068349"},{"key":"ref_3","doi-asserted-by":"crossref","first-page":"103","DOI":"10.1109\/TCBB.2014.2343960","article-title":"A deep learning network approach to ab initio protein secondary structure prediction","volume":"12","author":"Spencer","year":"2014","journal-title":"IEEE\/ACM Trans. Comput. Biol. Bioinform."},{"key":"ref_4","doi-asserted-by":"crossref","first-page":"831","DOI":"10.1038\/nbt.3300","article-title":"Predicting the sequence specificities of DNA-and RNA-binding proteins by deep learning","volume":"33","author":"Alipanahi","year":"2015","journal-title":"Nat. Biotechnol."},{"key":"ref_5","doi-asserted-by":"crossref","first-page":"e32","DOI":"10.1093\/nar\/gkv1025","article-title":"A deep learning framework for modeling structural features of RNA-binding protein targets","volume":"44","author":"Zhang","year":"2016","journal-title":"Nucleic Acids Res."},{"key":"ref_6","doi-asserted-by":"crossref","first-page":"212","DOI":"10.1016\/j.jpdc.2017.08.009","article-title":"Prediction of human protein subcellular localization using deep learning","volume":"117","author":"Wei","year":"2018","journal-title":"J. Parallel Distrib. Comput."},{"key":"ref_7","doi-asserted-by":"crossref","unstructured":"Deselaers, T., Hasan, S., Bender, O., and Ney, H. (2009, January 30\u201331). A deep learning approach to machine transliteration. Proceedings of the Fourth Workshop on Statistical Machine Translation. Association for Computational Linguistics, Athens, Greece.","DOI":"10.3115\/1626431.1626476"},{"key":"ref_8","unstructured":"Socher, R., Bengio, Y., and Manning, C. (2012). Deep learning for NLP. Tutor Abstr. ACL, Available online: https:\/\/nlp.stanford.edu\/courses\/NAACL2013\/."},{"key":"ref_9","doi-asserted-by":"crossref","first-page":"55","DOI":"10.1109\/MCI.2018.2840738","article-title":"Recent trends in deep learning based Natural Language Processing","volume":"13","author":"Young","year":"2018","journal-title":"IEEE Comput. Intell. Mag."},{"key":"ref_10","doi-asserted-by":"crossref","unstructured":"Otter, D.W., Medina, J.R., and Kalita, J.K. (2020). A Survey of the Usages of Deep Learning for Natural Language Processing. IEEE Trans. Neural Netw. Learn. Syst.","DOI":"10.1109\/TNNLS.2020.2979670"},{"key":"ref_11","doi-asserted-by":"crossref","unstructured":"Bar, Y., Diamant, I., Wolf, L., Lieberman, S., Konen, E., and Greenspan, H. (2015, January 16\u201319). Chest pathology detection using deep learning with non-medical training. Proceedings of the 2015 IEEE 12th International Symposium on Biomedical Imaging (ISBI), New York, NY, USA.","DOI":"10.1109\/ISBI.2015.7163871"},{"key":"ref_12","doi-asserted-by":"crossref","unstructured":"Havaei, M., Guizard, N., Larochelle, H., and Jodoin, P.M. (2016). Deep learning trends for focal brain pathology segmentation in MRI. Machine Learning for Health Informatics, Springer.","DOI":"10.1007\/978-3-319-50478-0_6"},{"key":"ref_13","unstructured":"Liu, Y., Gadepalli, K., Norouzi, M., Dahl, G.E., Kohlberger, T., Boyko, A., Venugopalan, S., Timofeev, A., Nelson, P.Q., and Corrado, G.S. (2017). Detecting cancer metastases on gigapixel pathology images. arXiv."},{"key":"ref_14","doi-asserted-by":"crossref","first-page":"1107","DOI":"10.1001\/jama.2018.11029","article-title":"Clinical implications and challenges of artificial intelligence and deep learning","volume":"320","author":"Stead","year":"2018","journal-title":"JAMA"},{"key":"ref_15","doi-asserted-by":"crossref","first-page":"1301","DOI":"10.1038\/s41591-019-0508-1","article-title":"Clinical-grade computational pathology using weakly supervised deep learning on whole slide images","volume":"25","author":"Campanella","year":"2019","journal-title":"Nat. Med."},{"key":"ref_16","doi-asserted-by":"crossref","first-page":"52","DOI":"10.1148\/radiol.2018180694","article-title":"Mammographic breast density assessment using deep learning: Clinical implementation","volume":"290","author":"Lehman","year":"2019","journal-title":"Radiology"},{"key":"ref_17","doi-asserted-by":"crossref","unstructured":"Du, M., Li, F., Zheng, G., and Srikumar, V. (November, January 30). Deeplog: Anomaly detection and diagnosis from system logs through deep learning. Proceedings of the 2017 ACM SIGSAC Conference on Computer and Communications Security, Dallas, TX, USA.","DOI":"10.1145\/3133956.3134015"},{"key":"ref_18","doi-asserted-by":"crossref","first-page":"41","DOI":"10.1109\/TETCI.2017.2772792","article-title":"A deep learning approach to network intrusion detection","volume":"2","author":"Shone","year":"2018","journal-title":"IEEE Trans. Emerg. Top. Comput. Intell."},{"key":"ref_19","doi-asserted-by":"crossref","unstructured":"Chalapathy, R., and Chawla, S. (2019). Deep learning for anomaly detection: A survey. arXiv.","DOI":"10.1145\/3394486.3406704"},{"key":"ref_20","doi-asserted-by":"crossref","unstructured":"Wang, H., Wang, N., and Yeung, D.Y. (2015, January 12\u201316). Collaborative deep learning for recommender systems. Proceedings of the 22nd ACM SIGKDD International Conference on Knowledge Discovery and Data Mining, Denver, CO, USA.","DOI":"10.1145\/2783258.2783273"},{"key":"ref_21","doi-asserted-by":"crossref","first-page":"1164","DOI":"10.1109\/TNNLS.2016.2514368","article-title":"On deep learning for trust-aware recommendations in social networks","volume":"28","author":"Deng","year":"2016","journal-title":"IEEE Trans. Neural Netw. Learn. Syst."},{"key":"ref_22","doi-asserted-by":"crossref","unstructured":"Karatzoglou, A., and Hidasi, B. (2017, January 27\u201331). Deep learning for recommender systems. Proceedings of the Eleventh ACM Conference on Recommender Systems, Como, Italy.","DOI":"10.1145\/3109859.3109933"},{"key":"ref_23","doi-asserted-by":"crossref","first-page":"1","DOI":"10.1007\/s10462-018-9654-y","article-title":"A review on deep learning for recommender systems: Challenges and remedies","volume":"52","author":"Batmaz","year":"2019","journal-title":"Artif. Intell. Rev."},{"key":"ref_24","unstructured":"Zadeh, R.B., Meng, X., Ulanov, A., Yavuz, B., Pu, L., Venkataraman, S., Sparks, E., Staple, A., and Zaharia, M. (2016). Matrix Computations and Optimization in Apache Spark. Proceedings of the 22nd ACM SIGKDD International Conference on Knowledge Discovery and Data Mining, ACM."},{"key":"ref_25","doi-asserted-by":"crossref","unstructured":"Gu, R., Tang, Y., Wang, Z., Wang, S., Yin, X., Yuan, C., and Huang, Y. (November, January 29). Efficient large scale distributed matrix computation with spark. Proceedings of the 2015 IEEE International Conference on Big Data, Santa Clara, CA, USA.","DOI":"10.1109\/BigData.2015.7364023"},{"key":"ref_26","unstructured":"Misra, C., Bhattacharya, S., and Ghosh, S.K. (2020). Stark: Fast and Scalable Strassen\u2019s Matrix Multiplication using Apache Spark. IEEE Trans. Big Data."},{"key":"ref_27","unstructured":"Coppersmith, D., and Winograd, S. (June, January 31). Matrix multiplication via arithmetic progressions. Proceedings of the Nineteenth Annual ACM Symposium on Theory of Computing, New York, NY, USA."},{"key":"ref_28","doi-asserted-by":"crossref","unstructured":"Williams, V.V. (2012, January 20\u201322). Multiplying matrices faster than Coppersmith-Winograd. Proceedings of the Forty-Fourth Annual ACM Symposium on Theory of Computing, New York, NY, USA.","DOI":"10.1145\/2213977.2214056"},{"key":"ref_29","unstructured":"Cannon, L.E. (1969). A Cellular Computer to Implement the Kalman Filter Algorithm. [Ph.D. Thesis, Montana State University-Bozeman, College of Engineering]."},{"key":"ref_30","doi-asserted-by":"crossref","unstructured":"Lee, H.J., Robertson, J.P., and Fortes, J.A. (1997, January 7\u201311). Generalized Cannon\u2019s algorithm for parallel matrix multiplication. Proceedings of the 11th International Conference on Supercomputing, Vienna, Austria.","DOI":"10.1145\/263580.263591"},{"key":"ref_31","doi-asserted-by":"crossref","unstructured":"Li, Y., and Li, H. (2012, January 19\u201322). Optimization of parallel I\/O for cannon\u2019s algorithm based on lustre. Proceedings of the 2012 11th International Symposium on Distributed Computing and Applications to Business, Engineering & Science, Guilin, China.","DOI":"10.1109\/DCABES.2012.61"},{"key":"ref_32","unstructured":"Chetlur, S., Woolley, C., Vandermersch, P., Cohen, J., Tran, J., Catanzaro, B., and Shelhamer, E. (2014). Cudnn: Efficient primitives for deep learning. arXiv."},{"key":"ref_33","unstructured":"Damji, J. (2018). Bay Area Apache Spark Meetup Summary at Databricks, HQ in San Francisco HQ, Databricks Inc."},{"key":"ref_34","unstructured":"Meng, X. (2020, November 02). Project Hydrogen: State-of-the-Art Deep Learning on Apache Spark. Available online: https:\/\/www.slideshare.net\/databricks\/project-hydrogen-stateoftheart-deep-learning-on-apache-spark."},{"key":"ref_35","doi-asserted-by":"crossref","unstructured":"Foldi, T., von Csefalvay, C., and Perez, N.A. (2020). JAMPI: Efficient matrix multiplication in Spark using Barrier Execution Mode. arXiv.","DOI":"10.20944\/preprints202007.0450.v1"}],"container-title":["Big Data and Cognitive Computing"],"original-title":[],"language":"en","link":[{"URL":"https:\/\/www.mdpi.com\/2504-2289\/4\/4\/32\/pdf","content-type":"unspecified","content-version":"vor","intended-application":"similarity-checking"}],"deposited":{"date-parts":[[2025,10,11]],"date-time":"2025-10-11T10:29:50Z","timestamp":1760178590000},"score":1,"resource":{"primary":{"URL":"https:\/\/www.mdpi.com\/2504-2289\/4\/4\/32"}},"subtitle":[],"short-title":[],"issued":{"date-parts":[[2020,11,5]]},"references-count":35,"journal-issue":{"issue":"4","published-online":{"date-parts":[[2020,12]]}},"alternative-id":["bdcc4040032"],"URL":"https:\/\/doi.org\/10.3390\/bdcc4040032","relation":{"has-preprint":[{"id-type":"doi","id":"10.20944\/preprints202007.0450.v1","asserted-by":"object"}]},"ISSN":["2504-2289"],"issn-type":[{"value":"2504-2289","type":"electronic"}],"subject":[],"published":{"date-parts":[[2020,11,5]]}}}