{"status":"ok","message-type":"work","message-version":"1.0.0","message":{"indexed":{"date-parts":[[2026,1,24]],"date-time":"2026-01-24T06:29:17Z","timestamp":1769236157814,"version":"3.49.0"},"reference-count":29,"publisher":"Springer Science and Business Media LLC","issue":"9","license":[{"start":{"date-parts":[[2025,6,20]],"date-time":"2025-06-20T00:00:00Z","timestamp":1750377600000},"content-version":"tdm","delay-in-days":0,"URL":"https:\/\/creativecommons.org\/licenses\/by\/4.0"},{"start":{"date-parts":[[2025,6,20]],"date-time":"2025-06-20T00:00:00Z","timestamp":1750377600000},"content-version":"vor","delay-in-days":0,"URL":"https:\/\/creativecommons.org\/licenses\/by\/4.0"}],"funder":[{"name":"Basic Energy Sciences, U.S. Department of Energy","award":["DE- AC02-05CH11231"],"award-info":[{"award-number":["DE- AC02-05CH11231"]}]},{"name":"Basic Energy Sciences, U.S. Department of Energy","award":["DE- AC02-05CH11231"],"award-info":[{"award-number":["DE- AC02-05CH11231"]}]},{"name":"Basic Energy Sciences, U.S. Department of Energy","award":["DE- AC02-05CH11231"],"award-info":[{"award-number":["DE- AC02-05CH11231"]}]},{"name":"Basic Energy Sciences, U.S. Department of Energy","award":["DE- AC02-05CH11231"],"award-info":[{"award-number":["DE- AC02-05CH11231"]}]},{"name":"Basic Energy Sciences, U.S. Department of Energy","award":["DE- AC02-05CH11231"],"award-info":[{"award-number":["DE- AC02-05CH11231"]}]},{"name":"Basic Energy Sciences, U.S. Department of Energy","award":["DE- AC02-05CH11231"],"award-info":[{"award-number":["DE- AC02-05CH11231"]}]},{"name":"Scientific User Facilities, U.S. Department of Energy","award":["107514"],"award-info":[{"award-number":["107514"]}]},{"name":"Scientific User Facilities, U.S. 
Department of Energy","award":["107514"],"award-info":[{"award-number":["107514"]}]},{"DOI":"10.13039\/100000002","name":"National Institutes of Health","doi-asserted-by":"publisher","award":["5R21GM129649-0"],"award-info":[{"award-number":["5R21GM129649-0"]}],"id":[{"id":"10.13039\/100000002","id-type":"DOI","asserted-by":"publisher"}]}],"content-domain":{"domain":["link.springer.com"],"crossmark-restriction":false},"short-container-title":["J Supercomput"],"abstract":"<jats:title>Abstract<\/jats:title>\n          <jats:p>MLExchange is a machine learning (ML) operations platform providing web user-interfaces (UIs) for data visualization and analysis pipelines at synchrotron facilities. Among these UIs is the segmentation app which helps synchrotron users utilize ML algorithms to automatically segment high-resolution scientific images with minimal manual annotation effort. In this work, we share code optimizations that significantly speed up the segmentation inference workflow of large data in short time. By optimizing the sequence of CPU-GPU data transfers and introducing CPU parallelization to key operations, we improve the per-device, per-image frame computational efficiency and observe close to 3<jats:inline-formula>\n              <jats:alternatives>\n                <jats:tex-math>$$\\times$$<\/jats:tex-math>\n                <mml:math xmlns:mml=\"http:\/\/www.w3.org\/1998\/Math\/MathML\">\n                  <mml:mo>\u00d7<\/mml:mo>\n                <\/mml:math>\n              <\/jats:alternatives>\n            <\/jats:inline-formula> speedup over the original segmentation inference workflow run time when utilizing a single GPU. 
Further adaptations enabling multi-GPU inference yield more than 40<jats:inline-formula>\n              <jats:alternatives>\n                <jats:tex-math>$$\\times$$<\/jats:tex-math>\n                <mml:math xmlns:mml=\"http:\/\/www.w3.org\/1998\/Math\/MathML\">\n                  <mml:mo>\u00d7<\/mml:mo>\n                <\/mml:math>\n              <\/jats:alternatives>\n            <\/jats:inline-formula> speedup with 100 GPUs compared to the optimized single GPU inference workflow. This acceleration of the segmentation inference workflow will provide MLExchange users with easy access to segmentation results with little wait time.<\/jats:p>","DOI":"10.1007\/s11227-025-07413-5","type":"journal-article","created":{"date-parts":[[2025,6,20]],"date-time":"2025-06-20T18:10:38Z","timestamp":1750443038000},"update-policy":"https:\/\/doi.org\/10.1007\/springer_crossmark_policy","source":"Crossref","is-referenced-by-count":3,"title":["Optimizing inference of segmentation on high-resolution images in MLExchange"],"prefix":"10.1007","volume":"81","author":[{"given":"Shizhao","family":"Lu","sequence":"first","affiliation":[]},{"given":"Tanny","family":"Chavez","sequence":"additional","affiliation":[]},{"given":"Wiebke","family":"Koepp","sequence":"additional","affiliation":[]},{"given":"Guanhua","family":"Hao","sequence":"additional","affiliation":[]},{"given":"Petrus H.","family":"Zwart","sequence":"additional","affiliation":[]},{"given":"Alexander","family":"Hexemer","sequence":"additional","affiliation":[]}],"member":"297","published-online":{"date-parts":[[2025,6,20]]},"reference":[{"issue":"7990","key":"7413_CR1","doi-asserted-by":"publisher","first-page":"86","DOI":"10.1038\/s41586-023-06734-w","volume":"624","author":"NJ Szymanski","year":"2023","unstructured":"Szymanski NJ, Rendy B, Fei Y, Kumar RE, He T, Milsted D, McDermott MJ, Gallant M, Cubuk ED, Merchant A, Kim H, Jain A, Bartel CJ, Persson K, Zeng Y, Ceder G (2023) An autonomous laboratory for the 
accelerated synthesis of novel materials. Nature 624(7990):86\u201391. https:\/\/doi.org\/10.1038\/s41586-023-06734-w","journal-title":"Nature"},{"issue":"7990","key":"7413_CR2","doi-asserted-by":"publisher","first-page":"80","DOI":"10.1038\/s41586-023-06735-9","volume":"624","author":"A Merchant","year":"2023","unstructured":"Merchant A, Batzner S, Schoenholz SS, Aykol M, Cheon G, Cubuk ED (2023) Scaling deep learning for materials discovery. Nature 624(7990):80\u201385. https:\/\/doi.org\/10.1038\/s41586-023-06735-9","journal-title":"Nature"},{"key":"7413_CR3","unstructured":"DOE: Frontiers in Artificial Intelligence for Science, Security and Technology (FASST). Accessed: 2024-10-27 (2024). https:\/\/www.energy.gov\/fasst"},{"key":"7413_CR4","doi-asserted-by":"publisher","unstructured":"Miller WL, Bard D, Boehnlein A, Fagnan K, Guok C, Lan\u00e7on E, Ramprakash SJ, Shankar M, Schwarz N, Brown BL (2023) Integrated Research Infrastructure Architecture Blueprint Activity (Final Report 2023). OSTI 1984466, US DOE Office of Science (SC). https:\/\/doi.org\/10.2172\/1984466","DOI":"10.2172\/1984466"},{"key":"7413_CR5","unstructured":"DOE: High Performance Data Facility: Supporting the Data Life Cycle. Accessed: 2024-10-27 (2024). https:\/\/hpdf.science\/"},{"key":"7413_CR6","unstructured":"Advanced Light Source (ALS): ALS-U Timeline and Facility Impacts. Accessed: 2024-09-26 (2024). https:\/\/als.lbl.gov\/als-u\/als-u-timeline\/"},{"key":"7413_CR7","doi-asserted-by":"publisher","unstructured":"Borland M, Abliz M, Arnold N, Berenc T, Blednykh A, Byrd J, Calvey J, Carter J, Carwardine J, Cease H, Conway Z, Decker G, Dooling J, Emery L, Fuerst J, Harkay K, Jain A, Jaski M, Kallakuri P, Kelly M, Kim S-H, Lill R, Lindberg R, Liu J, Liu Z, Nudell J, Preissner C, Sajaev V, Sereno N, Sun X, Sun Y, Veseli S, Wang J, Wienands U, Xiao A, Yao C (2018) The Upgrade of the Advanced Photon Source. In: Proceedings of the 9th International Particle Accelerator Conference, pp. 
2872\u20132877. https:\/\/doi.org\/10.18429\/JACOW-IPAC2018-THXGBD1","DOI":"10.18429\/JACOW-IPAC2018-THXGBD1"},{"key":"7413_CR8","doi-asserted-by":"publisher","first-page":"49","DOI":"10.1080\/08940886.2024.2391258","volume":"37","author":"DY Parkinson","year":"2024","unstructured":"Parkinson DY, Chavez T, Choudhary M, English D, Hao G, Hellert T, Leemann SC, Nemsak S, Rotenberg E, Taylor AL, Scholl A, White AA, Islegen-Wojdyla A, Zwart PH, Hexemer A (2024) AI@ALS workshop report: machine learning needs at the advanced light source. Synchrotron Radiat News 37:49\u201364. https:\/\/doi.org\/10.1080\/08940886.2024.2391258","journal-title":"Synchrotron Radiat News"},{"key":"7413_CR9","doi-asserted-by":"publisher","unstructured":"Zhao Z, Chavez T, Holman EA, Hao G, Green A, Krishnan H, McReynolds D, Pandolfi RJ, Roberts EJ, Zwart PH, Yanxon H, Schwarz N, Sankaranarayanan S, Kalinin SV, Mehta A, Campbell SI, Hexemer A (2022) MLExchange: A web-based platform enabling exchangeable machine learning workflows for scientific studies. In: Proceedings of the 4th Annual Workshop on Extreme-scale Experiment-in-the-Loop Computing (XLOOP), pp. 10\u201315. https:\/\/doi.org\/10.1109\/xloop56614.2022.00007","DOI":"10.1109\/xloop56614.2022.00007"},{"key":"7413_CR10","doi-asserted-by":"publisher","first-page":"392","DOI":"10.1107\/S1600576724001390","volume":"57","author":"EJ Roberts","year":"2024","unstructured":"Roberts EJ, Chavez T, Hexemer A, Zwart PH (2024) DLSIA: deep learning for scientific image analysis. J Appl Crystallogr 57:392\u2013402","journal-title":"J Appl Crystallogr"},{"key":"7413_CR11","unstructured":"Bluesky Collaboration: Tiled: API to structured data. Accessed: 2024-09-30 (2024). 
https:\/\/github.com\/bluesky\/tiled"},{"issue":"3","key":"7413_CR12","doi-asserted-by":"publisher","first-page":"19","DOI":"10.1080\/08940886.2019.1608121","volume":"32","author":"D Allan","year":"2019","unstructured":"Allan D, Caswell T, Campbell S, Rakitin M (2019) Bluesky\u2019s Ahead: a multi-facility collaboration for an a la carte software project for data acquisition and management. Synchrotron Radiat News 32(3):19\u201322. https:\/\/doi.org\/10.1080\/08940886.2019.1608121","journal-title":"Synchrotron Radiat News"},{"key":"7413_CR13","unstructured":"PrefectHQ: Prefect: workflow orchestration framework. Accessed: 2024-09-30 (2024). https:\/\/github.com\/PrefectHQ\/prefect"},{"issue":"9","key":"7413_CR14","doi-asserted-by":"publisher","first-page":"2901","DOI":"10.2352\/ei.2023.35.9.ipas-290","volume":"35","author":"G Hao","year":"2023","unstructured":"Hao G, Roberts EJ, Chavez T, Zhao Z, Holman EA, Yanxon H, Green A, Krishnan H, Ushizima D, McReynolds D, Schwarz N, Zwart PH, Hexemer A, Parkinson D (2023) Deploying machine learning based segmentation for scientific imaging analysis at synchrotron facilities. Electron Imaging 35(9):2901\u20132905. https:\/\/doi.org\/10.2352\/ei.2023.35.9.ipas-290","journal-title":"Electron Imaging"},{"key":"7413_CR15","unstructured":"Yanxon H, Roberts E, Parraga H, Weng J, Xu W, Ruett U, Hexemer A, Zwart P, Schwarz N (2023) Image segmentation using u-net architecture for powder x-ray diffraction images arXiv:2310.16186 [cs.LG]"},{"key":"7413_CR16","doi-asserted-by":"publisher","DOI":"10.1038\/s41524-023-00985-x","author":"Y Liu","year":"2023","unstructured":"Liu Y, Vasudevan RK, Kelley KP, Funakubo H, Ziatdinov M, Kalinin SV (2023) Learning the right channel in multimodal imaging: automated experiment in piezoresponse force microscopy. npj Comput Mater. 
https:\/\/doi.org\/10.1038\/s41524-023-00985-x","journal-title":"npj Comput Mater"},{"key":"7413_CR17","doi-asserted-by":"publisher","DOI":"10.1016\/j.elspec.2023.147381","volume":"267","author":"T Feggeler","year":"2023","unstructured":"Feggeler T, Levitan A, Marcus MA, Ohldag H, Shapiro DA (2023) Scanning transmission x-ray microscopy at the advanced light source. J Electron Spectrosc Relat Phenom 267:147381. https:\/\/doi.org\/10.1016\/j.elspec.2023.147381","journal-title":"J Electron Spectrosc Relat Phenom"},{"issue":"2","key":"7413_CR18","doi-asserted-by":"publisher","first-page":"254","DOI":"10.1073\/pnas.1715832114","volume":"115","author":"DM Pelt","year":"2017","unstructured":"Pelt DM, Sethian JA (2017) A mixed-scale dense convolutional neural network for image analysis. Proc Natl Acad Sci 115(2):254\u2013259. https:\/\/doi.org\/10.1073\/pnas.1715832114","journal-title":"Proc Natl Acad Sci"},{"key":"7413_CR19","doi-asserted-by":"publisher","unstructured":"Ronneberger O, Fischer P, Brox T (2015) U-Net: Convolutional Networks for Biomedical Image Segmentation. In: Medical Image Computing and Computer-Assisted Intervention (MICCAI), Part III. Lecture Notes in Computer Science, vol. 9351, pp. 234\u2013241. https:\/\/doi.org\/10.1007\/978-3-319-24574-4_28","DOI":"10.1007\/978-3-319-24574-4_28"},{"key":"7413_CR20","doi-asserted-by":"publisher","unstructured":"Huang H, Lin L, Tong R, Hu H, Zhang Q, Iwamoto Y, Han X, Chen Y-W, Wu J (2020) UNet 3+: A Full-Scale Connected UNet for Medical Image Segmentation. In: Proceedings of the IEEE International Conference on Acoustics, Speech and Signal Processing (ICASSP), pp. 1055\u20131059. https:\/\/doi.org\/10.1109\/icassp40776.2020.9053405","DOI":"10.1109\/icassp40776.2020.9053405"},{"key":"7413_CR21","doi-asserted-by":"publisher","unstructured":"Lam SK, Pitrou A, Seibert S (2015) Numba: a LLVM-based Python JIT compiler. In: Proceedings of the Second Workshop on the LLVM Compiler Infrastructure in HPC. SC15, pp. 1\u20136. 
https:\/\/doi.org\/10.1145\/2833157.2833162","DOI":"10.1145\/2833157.2833162"},{"key":"7413_CR22","unstructured":"NVIDIA: NVIDIA NSight Systems. Accessed: 2024-10-01 (2024). https:\/\/developer.nvidia.com\/nsight-systems"},{"key":"7413_CR23","unstructured":"PyTorch Core Team: torch.cuda.nvtx \u2013 PyTorch documentation. Accessed: 2025-05-01 (2024). https:\/\/pytorch.org\/docs\/stable\/cuda.html#nvidia-tools-extension-nvtx"},{"key":"7413_CR24","doi-asserted-by":"publisher","unstructured":"Li S, Zhao Y, Varma R, Salpekar O, Noordhuis P, Li T, Paszke A, Smith J, Vaughan B, Damania P, Chintala S (2020) PyTorch Distributed: Experiences on Accelerating Data Parallel Training. Proceedings of the VLDB Endowment 13(12), 3005\u20133018 https:\/\/doi.org\/10.14778\/3415478.3415530 arXiv:2006.15704 [cs.DC]","DOI":"10.14778\/3415478.3415530"},{"key":"7413_CR25","doi-asserted-by":"publisher","DOI":"10.1016\/j.simpa.2024.100696","volume":"21","author":"PH Zwart","year":"2024","unstructured":"Zwart PH (2024) Qlty: handling large tensors in scientific imaging deep-learning workflows. Softw Impacts 21:100696. https:\/\/doi.org\/10.1016\/j.simpa.2024.100696","journal-title":"Softw Impacts"},{"key":"7413_CR26","unstructured":"Kingma DP, Ba J (2017) Adam: A method for stochastic optimization. 3rd International Conference on Learning Representations, ICLR 2015, Conference Track Proceedings arXiv:1412.6980 [cs.LG]"},{"key":"7413_CR27","unstructured":"Agarap AF (2019) Deep Learning using Rectified Linear Units (ReLU) arXiv:1803.08375 [cs.NE]"},{"key":"7413_CR28","unstructured":"National Energy Research Scientific Computing Center (NERSC): NERSC Perlmutter Architecture. Accessed: 2024-09-24 (2024). 
https:\/\/docs.nersc.gov\/systems\/perlmutter\/architecture\/"},{"key":"7413_CR29","unstructured":"Recasens PG, Agullo F, Zhu Y, Wang C, Lee EK, Tardieu O, Torres J, Berral JL (2025) Mind the memory gap: Unveiling gpu bottlenecks in large-batch llm inference arXiv:2503.08311 [cs.DC]"}],"container-title":["The Journal of Supercomputing"],"original-title":[],"language":"en","link":[{"URL":"https:\/\/link.springer.com\/content\/pdf\/10.1007\/s11227-025-07413-5.pdf","content-type":"application\/pdf","content-version":"vor","intended-application":"text-mining"},{"URL":"https:\/\/link.springer.com\/article\/10.1007\/s11227-025-07413-5\/fulltext.html","content-type":"text\/html","content-version":"vor","intended-application":"text-mining"},{"URL":"https:\/\/link.springer.com\/content\/pdf\/10.1007\/s11227-025-07413-5.pdf","content-type":"application\/pdf","content-version":"vor","intended-application":"similarity-checking"}],"deposited":{"date-parts":[[2025,6,20]],"date-time":"2025-06-20T18:10:41Z","timestamp":1750443041000},"score":1,"resource":{"primary":{"URL":"https:\/\/link.springer.com\/10.1007\/s11227-025-07413-5"}},"subtitle":[],"short-title":[],"issued":{"date-parts":[[2025,6,20]]},"references-count":29,"journal-issue":{"issue":"9","published-online":{"date-parts":[[2025,6]]}},"alternative-id":["7413"],"URL":"https:\/\/doi.org\/10.1007\/s11227-025-07413-5","relation":{},"ISSN":["1573-0484"],"issn-type":[{"value":"1573-0484","type":"electronic"}],"subject":[],"published":{"date-parts":[[2025,6,20]]},"assertion":[{"value":"6 May 2025","order":1,"name":"accepted","label":"Accepted","group":{"name":"ArticleHistory","label":"Article History"}},{"value":"20 June 2025","order":2,"name":"first_online","label":"First Online","group":{"name":"ArticleHistory","label":"Article History"}}],"article-number":"1058"}}