{"status":"ok","message-type":"work","message-version":"1.0.0","message":{"indexed":{"date-parts":[[2025,11,1]],"date-time":"2025-11-01T09:36:32Z","timestamp":1761989792191,"version":"3.41.0"},"reference-count":54,"publisher":"Association for Computing Machinery (ACM)","issue":"1","license":[{"start":{"date-parts":[[2025,3,19]],"date-time":"2025-03-19T00:00:00Z","timestamp":1742342400000},"content-version":"vor","delay-in-days":0,"URL":"https:\/\/creativecommons.org\/licenses\/by\/4.0\/"}],"content-domain":{"domain":["dl.acm.org"],"crossmark-restriction":true},"short-container-title":["ACM Trans. Archit. Code Optim."],"published-print":{"date-parts":[[2025,3,31]]},"abstract":"<jats:p>The fast development of biomolecular structure determination has enabled the fine-grained study of objects in the micro-world, such as proteins and RNAs. The world is benefited. However, as the computational algorithms are constantly developed, the enrichment of features increases the algorithmic complexity and brings more computationally unfriendly modules. It calls for efficient solutions to leverage the rich and various hardware resources from the world\u2019s most state-of-the-art supercomputing systems, and to fully accelerate the performance of the applications. In this article, we present our efforts on porting and optimizing the 3D reconstruction of RELION, one of the most popular cryo-EM software for biomolecular structure determinations, by leveraging different resources of the latest generation of Sunway heterogeneous supercomputer. Several novel approaches are proposed to resolve different challenges faced by the complex algorithm, including a multi-level parallel scheme and operator optimizations to smartly map and scale RELION, efficient strategies to largely address the memory bottlenecks and improve data locality, lock-free writing solutions to minimize write-write conflicts, and pipelining approaches to obtain excellent computation and communication overlap. Combining all proposed optimizations, the computation time is greatly reduced to under 2 hours, achieving 11.9\u00d7 and 8.9\u00d7 speedups on two different datasets. The overall design scales to 131,072 cores, increasing parallel efficiency from 33% to 61% and from 46% to 70%, respectively. To the best of our knowledge, this is the first work that fully optimized and scaled the 3D reconstruction of RELION using the latest Sunway system.<\/jats:p>","DOI":"10.1145\/3701990","type":"journal-article","created":{"date-parts":[[2024,10,30]],"date-time":"2024-10-30T09:52:28Z","timestamp":1730281948000},"page":"1-25","update-policy":"https:\/\/doi.org\/10.1145\/crossmark-policy","source":"Crossref","is-referenced-by-count":2,"title":["Leveraging the Hardware Resources to Accelerate cryo-EM Reconstruction of RELION on the New Sunway Supercomputer"],"prefix":"10.1145","volume":"22","author":[{"ORCID":"https:\/\/orcid.org\/0009-0006-9461-8187","authenticated-orcid":false,"given":"Jingle","family":"Xu","sequence":"first","affiliation":[{"name":"Department of Computer Science and Technology, Tsinghua University, Beijing, China"}],"role":[{"role":"author","vocabulary":"crossref"}]},{"ORCID":"https:\/\/orcid.org\/0009-0000-8610-2945","authenticated-orcid":false,"given":"Jiayu","family":"Fu","sequence":"additional","affiliation":[{"name":"Department of Computer Science and Technology, Tsinghua University, Beijing, China"}],"role":[{"role":"author","vocabulary":"crossref"}]},{"ORCID":"https:\/\/orcid.org\/0000-0003-1297-4462","authenticated-orcid":false,"given":"Lin","family":"Gan","sequence":"additional","affiliation":[{"name":"Department of Computer Science and Technology, Tsinghua University, Beijing, China"}],"role":[{"role":"author","vocabulary":"crossref"}]},{"ORCID":"https:\/\/orcid.org\/0000-0002-6363-134X","authenticated-orcid":false,"given":"Yaojian","family":"Chen","sequence":"additional","affiliation":[{"name":"Department of Computer Science and Technology, Tsinghua University, Beijing, China"}],"role":[{"role":"author","vocabulary":"crossref"}]},{"ORCID":"https:\/\/orcid.org\/0009-0008-5947-7192","authenticated-orcid":false,"given":"Zhaoqi","family":"Sun","sequence":"additional","affiliation":[{"name":"School of Computer and Artificial Intelligence, Zhengzhou University, Zhengzhou, China"}],"role":[{"role":"author","vocabulary":"crossref"}]},{"ORCID":"https:\/\/orcid.org\/0009-0009-2932-3931","authenticated-orcid":false,"given":"Zhenchun","family":"Huang","sequence":"additional","affiliation":[{"name":"Department of Computer Science and Technology, Tsinghua University, Beijing, China and Zhejiang Lab, Hangzhou, China"}],"role":[{"role":"author","vocabulary":"crossref"}]},{"ORCID":"https:\/\/orcid.org\/0000-0002-8673-8254","authenticated-orcid":false,"given":"Guangwen","family":"Yang","sequence":"additional","affiliation":[{"name":"Department of Computer Science and Technology, Tsinghua University, Beijing, China and National Supercomputing Center in Wuxi, Wuxi, China"}],"role":[{"role":"author","vocabulary":"crossref"}]}],"member":"320","published-online":{"date-parts":[[2025,3,19]]},"reference":[{"key":"e_1_3_2_2_2","unstructured":"Aurora. Retrieved October 2024 from http:\/\/aurora.alcf.anl.gov"},{"key":"e_1_3_2_3_2","doi-asserted-by":"crossref","first-page":"517","DOI":"10.1109\/SC.2018.00043","volume-title":"Proceedings of the SC18: International Conference for High Performance Computing, Networking, Storage and Analysis","author":"Chen Bingwei","year":"2018","unstructured":"Bingwei Chen, Haohuan Fu, Yanwen Wei, Conghui He, Wenqiang Zhang, Yuxuan Li, Wubin Wan, Wei Zhang, Lin Gan, Zhenguo Zhang, Guangwen Yang, and Xiaofei Chen. 2018. Simulating the Wenchuan earthquake with accurate surface topography on Sunway TaihuLight. In Proceedings of the SC18: International Conference for High Performance Computing, Networking, Storage and Analysis. IEEE, Dallas, TX, USA, 517\u2013528."},{"key":"e_1_3_2_4_2","doi-asserted-by":"publisher","DOI":"10.1145\/3572848.3577529"},{"issue":"5124","key":"e_1_3_2_5_2","doi-asserted-by":"crossref","first-page":"130","DOI":"10.1038\/217130a0","article-title":"Reconstruction of three dimensional structures from electron micrographs","volume":"217","author":"Rosier D. J. De","year":"1968","unstructured":"D. J. De Rosier and A. Klug. 1968. Reconstruction of three dimensional structures from electron micrographs. Nature 217, 5124 (1968), 130\u2013134.","journal-title":"Nature"},{"key":"e_1_3_2_6_2","article-title":"Report on the Sunway TaihuLight system","volume":"20","author":"Dongarra Jack","year":"2016","unstructured":"Jack Dongarra. 2016. Report on the Sunway TaihuLight system. PDF). www. netlib. org. Retrieved June 20 (2016).","journal-title":"PDF). www. netlib. org. Retrieved June"},{"key":"e_1_3_2_7_2","first-page":"89","article-title":"TOP500 supercomputer sites","volume":"13","author":"Dongarra Jack J.","year":"1997","unstructured":"Jack J. Dongarra, Hans W. Meuer, Erich Strohmaier. 1997. TOP500 supercomputer sites. Supercomputer 13 (1997), 89\u2013111.","journal-title":"Supercomputer"},{"key":"e_1_3_2_8_2","doi-asserted-by":"crossref","first-page":"148","DOI":"10.1109\/SC.2018.00015","volume-title":"Proceedings of the SC18: International Conference for High Performance Computing, Networking, Storage and Analysis","author":"Duan Xiaohui","year":"2018","unstructured":"Xiaohui Duan, Ping Gao, Tingjian Zhang, Meng Zhang, Weiguo Liu, Wusheng Zhang, Wei Xue, Haohuan Fu, Lin Gan, Dexun Chen, Xiangxu Meng, and Guangwen Yang. 2018. Redesigning LAMMPS for peta-scale and hundred-billion-atom simulation on Sunway TaihuLight. In Proceedings of the SC18: International Conference for High Performance Computing, Networking, Storage and Analysis. IEEE, 148\u2013159."},{"issue":"6","key":"e_1_3_2_9_2","first-page":"496","article-title":"A pipeline approach to single-particle processing in RELION","volume":"73","author":"Fernandez-Leiro Rafael","year":"2017","unstructured":"Rafael Fernandez-Leiro and Sjors H. W. Scheres. 2017. A pipeline approach to single-particle processing in RELION. Acta Crystallographica 73, 6 (2017), 496.","journal-title":"Acta Crystallographica"},{"key":"e_1_3_2_10_2","unstructured":"Agner Fog. 2022. Instruction Tables: Lists of Instruction Latencies Throughputs and Micro-Operation Breakdowns for Intel AMD and VIA CPUs. Technical University of Denmark(2022). Retrieved from https:\/\/www.agner.org\/optimize\/instruction_tables.pdf"},{"key":"e_1_3_2_11_2","doi-asserted-by":"publisher","DOI":"10.1109\/JPROC.2004.840301"},{"key":"e_1_3_2_12_2","first-page":"1","volume-title":"Proceedings of the International Conference for High Performance Computing, Networking, Storage and Analysis","author":"Fu Haohuan","year":"2017","unstructured":"Haohuan Fu, Conghui He, Bingwei Chen, Zekun Yin, Zhenguo Zhang, Wenqiang Zhang, Tingjian Zhang, Wei Xue, Weiguo Liu, Wanwang Yin, Guangwen Yang, and Xiaofei Chen. 2017. 9-Pflops nonlinear earthquake simulation on Sunway TaihuLight: Enabling depiction of 18-Hz and 8-meter scenarios. In Proceedings of the International Conference for High Performance Computing, Networking, Storage and Analysis. 1\u201312."},{"key":"e_1_3_2_13_2","doi-asserted-by":"publisher","DOI":"10.1145\/3126908.3126909"},{"key":"e_1_3_2_14_2","doi-asserted-by":"publisher","DOI":"10.1016\/j.jocs.2020.101212"},{"key":"e_1_3_2_15_2","first-page":"62","volume-title":"Proceedings of the 2019 19th IEEE\/ACM International Symposium on Cluster, Cloud and Grid Computing","author":"Gan Lin","year":"2019","unstructured":"Lin Gan, Jingheng Xu, Xin Wang, Sihai Wu, Xiaohui Duan, Yuxuan Li, Haohuan Fu, and Guangwen Yang. 2019. Million-core-scalable simulation of the elastic migration algorithm on Sunway TaihuLight supercomputer. In Proceedings of the 2019 19th IEEE\/ACM International Symposium on Cluster, Cloud and Grid Computing. IEEE, 62\u201369."},{"issue":"5","key":"e_1_3_2_16_2","first-page":"355","article-title":"Deep neural network operator acceleration library optimization based on domestic many-core processor","volume":"49","author":"Gao Jie","year":"2022","unstructured":"Jie Gao, Sha Liu, Zeqiang Huang, Tianyu Zheng, Xin Liu, and Fengbin Qi. 2022. Deep neural network operator acceleration library optimization based on domestic many-core processor. Computer Science 49, 5 (2022), 355\u2013362.","journal-title":"Computer Science"},{"key":"e_1_3_2_17_2","first-page":"1","volume-title":"Proceedings of the International Conference for High Performance Computing, Networking, Storage and Analysis","author":"Gao Ping","year":"2021","unstructured":"Ping Gao, Xiaohui Duan, Jiaxu Guo, Jin Wang, Zhenya Song, Lizhen Cui, Xiangxu Meng, Xin Liu, Wusheng Zhang, Ming Ma, Guohui Li, Dexun Chen, Haohuan Fu, Wei Xue, Weiguo Liu, and Guangwen Yang. 2021. LMFF: Efficient and scalable layered materials force field on heterogeneous many-core processors. In Proceedings of the International Conference for High Performance Computing, Networking, Storage and Analysis. 1\u201314."},{"issue":"7607","key":"e_1_3_2_18_2","doi-asserted-by":"crossref","first-page":"347","DOI":"10.1038\/nature17964","article-title":"TRPV1 structures in nanodiscs reveal mechanisms of ligand and lipid action","volume":"534","author":"Gao Yuan","year":"2016","unstructured":"Yuan Gao, Erhu Cao, David Julius, and Yifan Cheng. 2016. TRPV1 structures in nanodiscs reveal mechanisms of ligand and lipid action. Nature 534, 7607 (2016), 347\u2013351.","journal-title":"Nature"},{"key":"e_1_3_2_19_2","doi-asserted-by":"publisher","DOI":"10.1109\/TCI.2021.3096491"},{"issue":"2","key":"e_1_3_2_20_2","doi-asserted-by":"crossref","first-page":"78","DOI":"10.1109\/MCSE.2018.021651341","article-title":"Stepping up to Summit","volume":"20","author":"Hines Jonathan","year":"2018","unstructured":"Jonathan Hines. 2018. Stepping up to Summit. Computing in Science & Engineering 20, 2 (2018), 78\u201382.","journal-title":"Computing in Science & Engineering"},{"key":"e_1_3_2_21_2","first-page":"42","volume-title":"Proceedings of the 2019 IEEE International Solid-State Circuits Conference","author":"Kahle James A.","year":"2019","unstructured":"James A. Kahle, Jaime Moreno, and Dan Dreps. 2019. 2.1 Summit and Sierra: Designing AI\/HPC supercomputers. In Proceedings of the 2019 IEEE International Solid-State Circuits Conference. IEEE, 42\u201343."},{"key":"e_1_3_2_22_2","doi-asserted-by":"publisher","DOI":"10.1042\/BCJ20210708"},{"key":"e_1_3_2_23_2","doi-asserted-by":"publisher","DOI":"10.7554\/eLife.18722"},{"key":"e_1_3_2_24_2","volume-title":"Leonardo: A Simulator4Earth","author":"Lanucara Piero","year":"2023","unstructured":"Piero Lanucara and Giorgio Amati. 2023. Leonardo: A Simulator4Earth. Technical Report. Copernicus Meetings."},{"key":"e_1_3_2_25_2","first-page":"540","volume-title":"Proceedings of the European Conference on Computer Vision","author":"Levy Axel","year":"2022","unstructured":"Axel Levy, Fr\u00e9d\u00e9ric Poitevin, Julien Martel, Youssef Nashed, Ariana Peck, Nina Miolane, Daniel Ratner, Mike Dunne, and Gordon Wetzstein. 2022. CryoAI: Amortized inference of poses for ab initio reconstruction of 3d molecular volumes from real Cryo-EM images. In Proceedings of the European Conference on Computer Vision. Springer, 540\u2013557."},{"issue":"7639","key":"e_1_3_2_26_2","doi-asserted-by":"crossref","first-page":"60","DOI":"10.1038\/nature20819","article-title":"Structure of a eukaryotic cyclic-nucleotide-gated channel","volume":"542","author":"Li Minghui","year":"2017","unstructured":"Minghui Li, Xiaoyuan Zhou, Shu Wang, Ioannis Michailidis, Ye Gong, Deyuan Su, Huan Li, Xueming Li, and Jian Yang. 2017. Structure of a eukaryotic cyclic-nucleotide-gated channel. Nature 542, 7639 (2017), 60\u201365.","journal-title":"Nature"},{"issue":"4","key":"e_1_3_2_27_2","first-page":"824","article-title":"Enabling large-scale simulation of CAM on the Sunway TaihuLight supercomputer","volume":"71","author":"Li Yuxuan","year":"2021","unstructured":"Yuxuan Li, Xiaohui Duan, Lin Gan, Wubing Wan, Yuhu Chen, Kai Xu, Jinzhe Yang, Weiguo Liu, Wei Xue, Haohuan Fu, and Guangwen Yang. 2021. Enabling large-scale simulation of CAM on the Sunway TaihuLight supercomputer. IEEE Transactions on Computers 71, 4 (2021), 824\u2013837.","journal-title":"IEEE Transactions on Computers"},{"key":"e_1_3_2_28_2","first-page":"706","volume-title":"Proceedings of the SC18: International Conference for High Performance Computing, Networking, Storage and Analysis","author":"Lin Heng","year":"2018","unstructured":"Heng Lin, Xiaowei Zhu, Bowen Yu, Xiongchao Tang, Wei Xue, Wenguang Chen, Lufei Zhang, Torsten Hoefler, Xiaosong Ma, Xin Liu, Weimin Zheng, and Jingfang Xu. 2018. Shentu: Processing multi-trillion edge graphs on millions of cores in seconds. In Proceedings of the SC18: International Conference for High Performance Computing, Networking, Storage and Analysis. IEEE, 706\u2013716."},{"key":"e_1_3_2_29_2","first-page":"192","volume-title":"Proceedings of the 27th ACM SIGPLAN Symposium on Principles and Practice of Parallel Programming","author":"Ma Zixuan","year":"2022","unstructured":"Zixuan Ma, Jiaao He, Jiezhong Qiu, Huanqi Cao, Yuanwei Wang, Zhenbo Sun, Liyan Zheng, Haojie Wang, Shizhi Tang, Tianyu Zheng, Junyang Lin, Guanyu Feng, Zeqiang Huang, Jie Gao, Aohan Zeng, Jianwei Zhang, Runxin Zhong, Tianhui Shi, Sha Liu, Weimin Zheng, Jie Tang, Hongxia Yang, Xin Liu, Jidong Zhai, and Wenguang Chen. 2022. BaGuaLu: Targeting brain scale pretrained models with over 37 million cores. In Proceedings of the 27th ACM SIGPLAN Symposium on Principles and Practice of Parallel Programming. 192\u2013204."},{"key":"e_1_3_2_30_2","article-title":"Memory bandwidth and machine balance in current high performance computers","volume":"2","author":"McCalpin John D","year":"1995","unstructured":"John D McCalpin. 1995. Memory bandwidth and machine balance in current high performance computers. IEEE Computer Society Technical Committee on Computer Architecture Newsletter 2 (1995), 19--25.","journal-title":"IEEE Computer Society Technical Committee on Computer Architecture Newsletter"},{"issue":"7832","key":"e_1_3_2_31_2","doi-asserted-by":"crossref","first-page":"152","DOI":"10.1038\/s41586-020-2829-0","article-title":"Single-particle cryo-EM at atomic resolution","volume":"587","author":"Nakane Takanori","year":"2020","unstructured":"Takanori Nakane, Abhay Kotecha, Andrija Sente, Greg McMullan, Simonas Masiulis, Patricia M. G. E. Brown, Ioana T. Grigoras, Lina Malinauskaite, Tomas Malinauskas, Jonas Miehling, Tomasz Ucha\u0144ski, Lingbo Yu, Dimple Karia, Evgeniya V. Pechnikova, Erwin de Jong, Jeroen Keizer, Maarten Bischoff, Jamie McCormack, Peter Tiemeijer, Steven W. Hardwick, Dimitri Y. Chirgadze, Garib Murshudov, A. Radu Aricescu, and Sjors H. W. Scheres. 2020. Single-particle cryo-EM at atomic resolution. Nature 587, 7832 (2020), 152\u2013156.","journal-title":"Nature"},{"key":"e_1_3_2_32_2","doi-asserted-by":"publisher","DOI":"10.1038\/nmeth.4169"},{"key":"e_1_3_2_33_2","first-page":"711","volume-title":"Proceedings of the 2019 IEEE International Conference on Bioinformatics and Biomedicine","author":"Qiao Liang","year":"2019","unstructured":"Liang Qiao, Hongkun Yu, Kunpeng Wang, Ruixin Sun, Wenlai Zhao, Haohuan Fu, and Guangwen Yang. 2019. Large-scale parallel design for cryo-EM structure determination on heterogeneous many-core architectures. In Proceedings of the 2019 IEEE International Conference on Bioinformatics and Biomedicine. IEEE, 711\u2013716."},{"issue":"3","key":"e_1_3_2_34_2","doi-asserted-by":"crossref","first-page":"519","DOI":"10.1016\/j.jsb.2012.09.006","article-title":"RELION: Implementation of a Bayesian approach to cryo-EM structure determination","volume":"180","author":"Scheres Sjors H. W.","year":"2012","unstructured":"Sjors H. W. Scheres. 2012. RELION: Implementation of a Bayesian approach to cryo-EM structure determination. Journal of Structural Biology 180, 3 (2012), 519\u2013530.","journal-title":"Journal of Structural Biology"},{"key":"e_1_3_2_35_2","doi-asserted-by":"publisher","DOI":"10.1109\/MSPEC.2022.9676353"},{"key":"e_1_3_2_36_2","doi-asserted-by":"publisher","DOI":"10.1038\/nprot.2008.156"},{"key":"e_1_3_2_37_2","first-page":"1","volume-title":"Proceedings of the International Conference for High Performance Computing, Networking, Storage and Analysis","author":"Shang Honghui","year":"2021","unstructured":"Honghui Shang, Xin Chen, Xingyu Gao, Rongfen Lin, Lifang Wang, Fang Li, Qian Xiao, Lei Xu, Qiang Sun, Leilei Zhu, Fei Wang, Yunquan Zhang, and Haifeng Song. 2021. TensorKMC: Kinetic monte carlo simulation of 50 trillion atoms driven by deep learning on a new generation of Sunway supercomputer. In Proceedings of the International Conference for High Performance Computing, Networking, Storage and Analysis. 1\u201314."},{"key":"e_1_3_2_38_2","first-page":"277","volume-title":"Proceedings of the 2024 IEEE International Parallel and Distributed Processing Symposium","author":"Song Zeyu","year":"2024","unstructured":"Zeyu Song, Lin Gan, Shengye Xiang, Yinuo Wang, Xiaohui Duan, and Guangwen Yang. 2024. Enabling high-performance physical based rendering on new Sunway supercomputer. In Proceedings of the 2024 IEEE International Parallel and Distributed Processing Symposium. IEEE, 277\u2013288."},{"key":"e_1_3_2_39_2","doi-asserted-by":"publisher","DOI":"10.1177\/1094342019832958"},{"key":"e_1_3_2_40_2","first-page":"075887","article-title":"GeRelion: GPU-enhanced parallel implementation of single particle cryo-EM image processing","author":"Su Huayou","year":"2016","unstructured":"Huayou Su, Wen Wen, Xiaoli Du, Xicheng Lu, Maofu Liao, and Dongsheng Li. 2016. GeRelion: GPU-enhanced parallel implementation of single particle cryo-EM image processing. Biorxiv (2016), 075887.","journal-title":"Biorxiv"},{"issue":"7873","key":"e_1_3_2_41_2","doi-asserted-by":"crossref","first-page":"603","DOI":"10.1038\/s41586-021-03803-w","article-title":"Cryo-EM structures of full-length Tetrahymena ribozyme at 3.1 \u00c5 resolution","volume":"596","author":"Su Zhaoming","year":"2021","unstructured":"Zhaoming Su, Kaiming Zhang, Kalli Kappel, Shanshan Li, Michael Z. Palo, Grigore D. Pintilie, Ramya Rangan, Bingnan Luo, Yuquan Wei, Rhiju Das, and Wah Chiu. 2021. Cryo-EM structures of full-length Tetrahymena ribozyme at 3.1 \u00c5 resolution. Nature 596, 7873 (2021), 603\u2013607.","journal-title":"Nature"},{"key":"e_1_3_2_42_2","doi-asserted-by":"publisher","DOI":"10.1145\/3581784.3613209"},{"key":"e_1_3_2_43_2","first-page":"847","volume-title":"Proceedings of the 2021 IEEE International Parallel and Distributed Processing Symposium","author":"Wang Zihao","year":"2021","unstructured":"Zihao Wang, Xiaohua Wan, Zhiyong Liu, Qianshuo Fan, Fa Zhang, and Guangming Tan. 2021. A multi-GPU design for large size cryo-EM 3D reconstruction. In Proceedings of the 2021 IEEE International Parallel and Distributed Processing Symposium. IEEE, 847\u2013858."},{"issue":"2","key":"e_1_3_2_44_2","doi-asserted-by":"crossref","first-page":"277","DOI":"10.1016\/j.cell.2016.06.055","article-title":"Seeing is believing in ribosome assembly","volume":"166","author":"Warner Jonathan R.","year":"2016","unstructured":"Jonathan R. Warner. 2016. Seeing is believing in ribosome assembly. Cell 166, 2 (2016), 277\u2013278.","journal-title":"Cell"},{"key":"e_1_3_2_45_2","first-page":"129","volume-title":"Proceedings of the 2022 IEEE Intl Conf on Parallel & Distributed Processing with Applications, Big Data & Cloud Computing, Sustainable Computing & Communications, Social Computing & Networking (ISPA\/BDCloud\/SocialCom\/SustainCom)","author":"Xu Jingle","year":"2022","unstructured":"Jingle Xu, Jiayu Fu, Lin Gan, Yaojian Chen, Zhenchun Huang, and Guangwen Yang. 2022. Accelerating cryo-EM reconstruction of RELION on the new Sunway supercomputer. In Proceedings of the 2022 IEEE Intl Conf on Parallel & Distributed Processing with Applications, Big Data & Cloud Computing, Sustainable Computing & Communications, Social Computing & Networking (ISPA\/BDCloud\/SocialCom\/SustainCom). IEEE, 129\u2013138."},{"key":"e_1_3_2_46_2","first-page":"57","volume-title":"SC\u201916: Proceedings of the International Conference for High Performance Computing, Networking, Storage and Analysis","author":"Yang Chao","year":"2016","unstructured":"Chao Yang, Wei Xue, Haohuan Fu, Hongtao You, Xinliang Wang, Yulong Ao, Fangfang Liu, Lin Gan, Ping Xu, Lanning Wang, Guangwen Yang, and Weimin Zheng. 2016. 10M-core scalable fully-implicit solver for nonhydrostatic atmospheric dynamics. In SC\u201916: Proceedings of the International Conference for High Performance Computing, Networking, Storage and Analysis. IEEE, 57\u201368."},{"key":"e_1_3_2_47_2","unstructured":"Lin Yao Ruihan Xu Zhifeng Gao Guolin Ke and Yuhang Wang. 2023. Boosted ab initio cryo-EM 3D reconstruction with ACE-EM. arxiv:2302.06091.Retrieved from https:\/\/arxiv.org\/abs\/2302.06091"},{"key":"e_1_3_2_48_2","first-page":"1","article-title":"swNEMO_v4.0: An ocean model NEMO for the next generation Sunway supercomputer","author":"Ye Yuejin","year":"2022","unstructured":"Yuejin Ye, Zhenya Song, Shengchang Zhou, Yao Liu, Qi Shu, Bingzhuo Wang, Weiguo Liu, Fangli Qiao, and Lanning Wang. 2022. swNEMO_v4.0: An ocean model NEMO for the next generation Sunway supercomputer. Geoscientific Model Development Discussions 15 (2022), 1\u201329.","journal-title":"Geoscientific Model Development Discussions"},{"issue":"7832","key":"e_1_3_2_49_2","doi-asserted-by":"crossref","first-page":"157","DOI":"10.1038\/s41586-020-2833-4","article-title":"Atomic-resolution protein structure determination by cryo-EM","volume":"587","author":"Yip Ka Man","year":"2020","unstructured":"Ka Man Yip, Niels Fischer, Elham Paknia, Ashwin Chari, and Holger Stark. 2020. Atomic-resolution protein structure determination by cryo-EM. Nature 587, 7832 (2020), 157\u2013161.","journal-title":"Nature"},{"issue":"3","key":"e_1_3_2_50_2","first-page":"1","article-title":"Accelerating the cryo-EM structure determination in RELION on GPU cluster","volume":"16","author":"You Xin","year":"2022","unstructured":"Xin You, Hailong Yang, Zhongzhi Luan, and Depei Qian. 2022. Accelerating the cryo-EM structure determination in RELION on GPU cluster. Frontiers of Computer Science 16, 3 (2022), 1\u201319.","journal-title":"Frontiers of Computer Science"},{"issue":"2","key":"e_1_3_2_51_2","doi-asserted-by":"crossref","first-page":"583","DOI":"10.1109\/TCBB.2019.2929171","article-title":"Improve the resolution and parallel performance of the three-dimensional refine algorithm in RELION using CUDA and MPI","volume":"18","author":"Zhang Jingrong","year":"2019","unstructured":"Jingrong Zhang, Zihao Wang, Zhiyong Liu, and Fa Zhang. 2019. Improve the resolution and parallel performance of the three-dimensional refine algorithm in RELION using CUDA and MPI. IEEE\/ACM Transactions on Computational Biology and Bioinformatics 18, 2 (2019), 583\u2013595.","journal-title":"IEEE\/ACM Transactions on Computational Biology and Bioinformatics"},{"key":"e_1_3_2_52_2","first-page":"1","volume-title":"Proceedings of the International Conference for High Performance Computing, Networking, Storage and Analysis","author":"Zhang Tingjian","year":"2019","unstructured":"Tingjian Zhang, Yuxuan Li, Ping Gao, Qi Shao, Mingshan Shao, Meng Zhang, Jinxiao Zhang, Xiaohui Duan, Zhao Liu, Lin Gan, Haohuan Fu, Wei Xue, Weiguo Liu, and Guangwen Yang. 2019. SW_GROMACS: Accelerate GROMACS on Sunway TaihuLight. In Proceedings of the International Conference for High Performance Computing, Networking, Storage and Analysis. 1\u201314."},{"key":"e_1_3_2_53_2","doi-asserted-by":"publisher","DOI":"10.1038\/s41592-020-01049-4"},{"key":"e_1_3_2_54_2","doi-asserted-by":"crossref","first-page":"e42166","DOI":"10.7554\/eLife.42166","article-title":"New tools for automated high-resolution cryo-EM structure determination in RELION-3","volume":"7","author":"Zivanov Jasenko","year":"2018","unstructured":"Jasenko Zivanov, Takanori Nakane, Bj\u00f6rn O Forsberg, Dari Kimanius, Wim J. H. Hagen, Erik Lindahl, and Sjors H. W. Scheres. 2018. New tools for automated high-resolution cryo-EM structure determination in RELION-3. Elife 7 (2018), e42166.","journal-title":"Elife"},{"key":"e_1_3_2_55_2","volume-title":"LUMI supercomputer for European researchers","author":"Zwinger Thomas","year":"2023","unstructured":"Thomas Zwinger, Jussi Heikonen, and Pekka Manninen. 2023. LUMI supercomputer for European researchers. Technical Report. Copernicus Meetings."}],"container-title":["ACM Transactions on Architecture and Code Optimization"],"original-title":[],"language":"en","link":[{"URL":"https:\/\/dl.acm.org\/doi\/10.1145\/3701990","content-type":"unspecified","content-version":"vor","intended-application":"text-mining"},{"URL":"https:\/\/dl.acm.org\/doi\/pdf\/10.1145\/3701990","content-type":"unspecified","content-version":"vor","intended-application":"similarity-checking"}],"deposited":{"date-parts":[[2025,6,19]],"date-time":"2025-06-19T01:57:16Z","timestamp":1750298236000},"score":1,"resource":{"primary":{"URL":"https:\/\/dl.acm.org\/doi\/10.1145\/3701990"}},"subtitle":[],"short-title":[],"issued":{"date-parts":[[2025,3,19]]},"references-count":54,"journal-issue":{"issue":"1","published-print":{"date-parts":[[2025,3,31]]}},"alternative-id":["10.1145\/3701990"],"URL":"https:\/\/doi.org\/10.1145\/3701990","relation":{},"ISSN":["1544-3566","1544-3973"],"issn-type":[{"type":"print","value":"1544-3566"},{"type":"electronic","value":"1544-3973"}],"subject":[],"published":{"date-parts":[[2025,3,19]]},"assertion":[{"value":"2023-08-16","order":0,"name":"received","label":"Received","group":{"name":"publication_history","label":"Publication History"}},{"value":"2024-09-24","order":2,"name":"accepted","label":"Accepted","group":{"name":"publication_history","label":"Publication History"}},{"value":"2025-03-19","order":3,"name":"published","label":"Published","group":{"name":"publication_history","label":"Publication History"}}]}}