{"status":"ok","message-type":"work","message-version":"1.0.0","message":{"indexed":{"date-parts":[[2026,6,13]],"date-time":"2026-06-13T09:08:46Z","timestamp":1781341726552,"version":"3.54.1"},"publisher-location":"New York, NY, USA","reference-count":44,"publisher":"ACM","license":[{"start":{"date-parts":[[2023,6,21]],"date-time":"2023-06-21T00:00:00Z","timestamp":1687305600000},"content-version":"vor","delay-in-days":0,"URL":"https:\/\/www.acm.org\/publications\/policies\/copyright_policy#Background"}],"content-domain":{"domain":["dl.acm.org"],"crossmark-restriction":true},"short-container-title":[],"published-print":{"date-parts":[[2023,6,21]]},"DOI":"10.1145\/3577193.3593735","type":"proceedings-article","created":{"date-parts":[[2023,6,20]],"date-time":"2023-06-20T18:47:05Z","timestamp":1687286825000},"page":"398-409","update-policy":"https:\/\/doi.org\/10.1145\/crossmark-policy","source":"Crossref","is-referenced-by-count":3,"title":["OpenFFT: An Adaptive Tuning Framework for 3D FFT on ARM Multicore CPUs"],"prefix":"10.1145","author":[{"ORCID":"https:\/\/orcid.org\/0000-0003-3459-7960","authenticated-orcid":false,"given":"Tun","family":"Chen","sequence":"first","affiliation":[{"name":"High Performance Computer Research Center, Institute of Computing Technology, Chinese Academy of Sciences, Beijing, China"},{"name":"University of Chinese Academy of Sciences, Beijing, China"}],"role":[{"vocabulary":"crossref","role":"author"}]},{"ORCID":"https:\/\/orcid.org\/0000-0002-9855-5367","authenticated-orcid":false,"given":"Haipeng","family":"Jia","sequence":"additional","affiliation":[{"name":"Institute of Computing Technology, Chinese Academy of Sciences, Beijing, China"}],"role":[{"vocabulary":"crossref","role":"author"}]},{"ORCID":"https:\/\/orcid.org\/0000-0001-7520-9640","authenticated-orcid":false,"given":"Yunquan","family":"Zhang","sequence":"additional","affiliation":[{"name":"Institute of Computing Technology, Chinese Academy of Sciences, Beijing, China"}],"role":[{"vocabulary":"crossref","role":"author"}]},{"ORCID":"https:\/\/orcid.org\/0000-0002-1013-1325","authenticated-orcid":false,"given":"Kun","family":"Li","sequence":"additional","affiliation":[{"name":"Institute of Computing Technology, Chinese Academy of Sciences, Beijing, China"}],"role":[{"vocabulary":"crossref","role":"author"}]},{"ORCID":"https:\/\/orcid.org\/0000-0002-6149-0627","authenticated-orcid":false,"given":"Zhihao","family":"Li","sequence":"additional","affiliation":[{"name":"Huawei Technologies Co., Ltd., Shenzhen, China"}],"role":[{"vocabulary":"crossref","role":"author"}]},{"ORCID":"https:\/\/orcid.org\/0009-0003-2432-6507","authenticated-orcid":false,"given":"Xiang","family":"Zhao","sequence":"additional","affiliation":[{"name":"Ocean University of China, Qingdao, China"}],"role":[{"vocabulary":"crossref","role":"author"}]},{"ORCID":"https:\/\/orcid.org\/0009-0003-2889-4499","authenticated-orcid":false,"given":"Jianyu","family":"Yao","sequence":"additional","affiliation":[{"name":"Institute of Computing Technology, Chinese Academy of Sciences, Beijing, China"}],"role":[{"vocabulary":"crossref","role":"author"}]},{"ORCID":"https:\/\/orcid.org\/0009-0000-2610-042X","authenticated-orcid":false,"given":"Chendi","family":"Li","sequence":"additional","affiliation":[{"name":"Institute of Computing Technology, Chinese Academy of Sciences, Beijing, China"}],"role":[{"vocabulary":"crossref","role":"author"}]}],"member":"320","published-online":{"date-parts":[[2023,6,21]]},"reference":[{"key":"e_1_3_2_1_1_1","doi-asserted-by":"publisher","DOI":"10.5555\/502981"},{"key":"e_1_3_2_1_2_1","doi-asserted-by":"publisher","DOI":"10.1145\/29873.29875"},{"key":"e_1_3_2_1_3_1","volume-title":"Joseph James Gebis, Parry Husbands, Kurt Keutzer, David A Patterson, William Lester Plishker, John Shalf, Samuel Webb Williams, et al.","author":"Asanovic Krste","year":"2006","unstructured":"Krste Asanovic , Ras Bodik , Bryan Christopher Catanzaro , Joseph James Gebis, Parry Husbands, Kurt Keutzer, David A Patterson, William Lester Plishker, John Shalf, Samuel Webb Williams, et al. 2006 . The landscape of parallel computing research: A view from berkeley. (2006). Krste Asanovic, Ras Bodik, Bryan Christopher Catanzaro, Joseph James Gebis, Parry Husbands, Kurt Keutzer, David A Patterson, William Lester Plishker, John Shalf, Samuel Webb Williams, et al. 2006. The landscape of parallel computing research: A view from berkeley. (2006)."},{"key":"e_1_3_2_1_4_1","doi-asserted-by":"publisher","DOI":"10.1007\/BF00162341"},{"key":"e_1_3_2_1_5_1","first-page":"3","article-title":"Dynamically Generating FFT Code","volume":"76","author":"Blake Anthony","year":"2014","unstructured":"Anthony Blake and Matt Hunter . 2014 . Dynamically Generating FFT Code . Journal of Signal Processing Systems 76 , 3 (Sept. 2014), 275--281. Anthony Blake and Matt Hunter. 2014. Dynamically Generating FFT Code. Journal of Signal Processing Systems 76, 3 (Sept. 2014), 275--281.","journal-title":"Journal of Signal Processing Systems"},{"key":"e_1_3_2_1_6_1","doi-asserted-by":"publisher","DOI":"10.1109\/TAU.1970.1162132"},{"key":"e_1_3_2_1_7_1","volume-title":"A Transpose-free Three-dimensional FFT Algorithm on ARM CPUs. In 2021 IEEE 23rd Int Conf on High Performance Computing & Communications","author":"Chen Tun","unstructured":"Tun Chen , Haipeng Jia , Zhihao Li , Chendi Li , and Yunquan Zhang . 2021. A Transpose-free Three-dimensional FFT Algorithm on ARM CPUs. In 2021 IEEE 23rd Int Conf on High Performance Computing & Communications ; 7th Int Conf on Data Science & Systems; 19th Int Conf on Smart City; 7th Int Conf on Dependability in Sensor, Cloud & Big Data Systems & Application (HPCC\/DSS\/SmartCity\/DependSys). IEEE , 1--8. Tun Chen, Haipeng Jia, Zhihao Li, Chendi Li, and Yunquan Zhang. 2021. A Transpose-free Three-dimensional FFT Algorithm on ARM CPUs. In 2021 IEEE 23rd Int Conf on High Performance Computing & Communications; 7th Int Conf on Data Science & Systems; 19th Int Conf on Smart City; 7th Int Conf on Dependability in Sensor, Cloud & Big Data Systems & Application (HPCC\/DSS\/SmartCity\/DependSys). IEEE, 1--8."},{"key":"e_1_3_2_1_8_1","doi-asserted-by":"publisher","DOI":"10.1090\/S0025-5718-1965-0178586-1"},{"key":"e_1_3_2_1_9_1","unstructured":"Apple Corporation. 2023. Fast Fourier Transforms | Apple Developer Documentation. https:\/\/developer.apple.com\/documentation\/accelerate\/data_packing_for_fourier_transforms  Apple Corporation. 2023. Fast Fourier Transforms | Apple Developer Documentation. https:\/\/developer.apple.com\/documentation\/accelerate\/data_packing_for_fourier_transforms"},{"key":"e_1_3_2_1_10_1","unstructured":"Arm Corporation. Sep 2022. Arm Performance Libraries Reference Guide. https:\/\/developer.arm.com\/documentation\/101004\/latest\/  Arm Corporation. Sep 2022. Arm Performance Libraries Reference Guide. https:\/\/developer.arm.com\/documentation\/101004\/latest\/"},{"key":"e_1_3_2_1_11_1","unstructured":"IBM Corporation. 2020. IBM Engineering and Scientific Subroutine Library. https:\/\/www.ibm.com\/docs\/en\/SSFHY8_6.2\/reference\/essl_reference_pdf.pdf  IBM Corporation. 2020. IBM Engineering and Scientific Subroutine Library. https:\/\/www.ibm.com\/docs\/en\/SSFHY8_6.2\/reference\/essl_reference_pdf.pdf"},{"key":"e_1_3_2_1_12_1","volume-title":"cuFFT Library User's Guide. (Dec","author":"NVIDIA Corporation","year":"2022","unstructured":"NVIDIA Corporation . Dec 2022. cuFFT Library User's Guide. (Dec 2022 ). https:\/\/docs.nvidia.com\/cuda\/pdf\/CUFFT_Library.pdf NVIDIA Corporation. Dec 2022. cuFFT Library User's Guide. (Dec 2022). https:\/\/docs.nvidia.com\/cuda\/pdf\/CUFFT_Library.pdf"},{"key":"e_1_3_2_1_13_1","first-page":"22","article-title":"Guest editors' introduction: The top 10 algorithms","volume":"2","author":"Dongarra Jack","year":"2000","unstructured":"Jack Dongarra and Francis Sullivan . 2000 . Guest editors' introduction: The top 10 algorithms . IEEE Annals of the History of Computing 2 , 01 (2000), 22 -- 23 . Jack Dongarra and Francis Sullivan. 2000. Guest editors' introduction: The top 10 algorithms. IEEE Annals of the History of Computing 2, 01 (2000), 22--23.","journal-title":"IEEE Annals of the History of Computing"},{"key":"e_1_3_2_1_14_1","doi-asserted-by":"publisher","DOI":"10.1049\/el:19840012"},{"key":"e_1_3_2_1_15_1","volume-title":"Proc. IEEE 106, 11 (Nov. 2018), 1935--1968. Conference Name: Proceedings of the IEEE.","author":"Franchetti Franz","unstructured":"Franz Franchetti , Tze Meng Low , Doru Thom Popovici , Richard M. Veras , Daniele G. Spampinato , Jeremy R. Johnson , Markus Puschel , James C. Hoe , and Jose M. F. Moura . 2018. SPIRAL: Extreme Performance Portability . Proc. IEEE 106, 11 (Nov. 2018), 1935--1968. Conference Name: Proceedings of the IEEE. Franz Franchetti, Tze Meng Low, Doru Thom Popovici, Richard M. Veras, Daniele G. Spampinato, Jeremy R. Johnson, Markus Puschel, James C. Hoe, and Jose M. F. Moura. 2018. SPIRAL: Extreme Performance Portability. Proc. IEEE 106, 11 (Nov. 2018), 1935--1968. Conference Name: Proceedings of the IEEE."},{"key":"e_1_3_2_1_16_1","first-page":"1520","volume-title":"Proceedings of the 1998 IEEE International Conference on Acoustics, Speech and Signal Processing, ICASSP '98 (Cat. No.98CH36181)","volume":"3","author":"Frigo M.","unstructured":"M. Frigo and S.G. Johnson . 1998. FFTW: an adaptive software architecture for the FFT . In Proceedings of the 1998 IEEE International Conference on Acoustics, Speech and Signal Processing, ICASSP '98 (Cat. No.98CH36181) , Vol. 3 . 1381--1384 vol.3. ISSN: 1520 - 6149 . M. Frigo and S.G. Johnson. 1998. FFTW: an adaptive software architecture for the FFT. In Proceedings of the 1998 IEEE International Conference on Acoustics, Speech and Signal Processing, ICASSP '98 (Cat. No.98CH36181), Vol. 3. 1381--1384 vol.3. ISSN: 1520-6149."},{"key":"e_1_3_2_1_17_1","doi-asserted-by":"publisher","DOI":"10.1109\/JPROC.2004.840301"},{"key":"e_1_3_2_1_18_1","first-page":"0272","volume-title":"40th Annual Symposium on Foundations of Computer Science (Cat. No.99CB37039)","author":"Frigo M.","unstructured":"M. Frigo , C.E. Leiserson , H. Prokop , and S. Ramachandran . 1999. Cache-oblivious algorithms . In 40th Annual Symposium on Foundations of Computer Science (Cat. No.99CB37039) . 285--297. ISSN: 0272 - 5428 . M. Frigo, C.E. Leiserson, H. Prokop, and S. Ramachandran. 1999. Cache-oblivious algorithms. In 40th Annual Symposium on Foundations of Computer Science (Cat. No.99CB37039). 285--297. ISSN: 0272-5428."},{"key":"e_1_3_2_1_19_1","doi-asserted-by":"publisher","DOI":"10.1145\/1356058.1356085"},{"key":"e_1_3_2_1_20_1","doi-asserted-by":"publisher","DOI":"10.1109\/MASSP.1984.1162257"},{"key":"e_1_3_2_1_21_1","volume-title":"Johnson and Matteo Frigo","author":"Steven","year":"2003","unstructured":"Steven G. Johnson and Matteo Frigo . 2003 . FFT Benchmark Results . http:\/\/www.fftw.org\/speed\/ Steven G. Johnson and Matteo Frigo. 2003. FFT Benchmark Results. http:\/\/www.fftw.org\/speed\/"},{"key":"e_1_3_2_1_22_1","doi-asserted-by":"publisher","DOI":"10.1109\/TSP.2006.882087"},{"key":"e_1_3_2_1_23_1","volume-title":"Parallel implementation of 3D FFT with volumetric decomposition schemes for efficient molecular dynamics simulations. Computer Physics Communications 200 (March","author":"Jung Jaewoon","year":"2016","unstructured":"Jaewoon Jung , Chigusa Kobayashi , Toshiyuki Imamura , and Yuji Sugita . 2016. Parallel implementation of 3D FFT with volumetric decomposition schemes for efficient molecular dynamics simulations. Computer Physics Communications 200 (March 2016 ), 57--65. Jaewoon Jung, Chigusa Kobayashi, Toshiyuki Imamura, and Yuji Sugita. 2016. Parallel implementation of 3D FFT with volumetric decomposition schemes for efficient molecular dynamics simulations. Computer Physics Communications 200 (March 2016), 57--65."},{"key":"e_1_3_2_1_24_1","doi-asserted-by":"crossref","first-page":"4","DOI":"10.1109\/TASSP.1977.1162973","article-title":"A prime factor FFT algorithm using high-speed convolution","volume":"25","author":"Kolba D.","year":"1977","unstructured":"D. Kolba and T. Parks . 1977 . A prime factor FFT algorithm using high-speed convolution . IEEE Transactions on Acoustics, Speech, and Signal Processing 25 , 4 (Aug. 1977), 281--294. D. Kolba and T. Parks. 1977. A prime factor FFT algorithm using high-speed convolution. IEEE Transactions on Acoustics, Speech, and Signal Processing 25, 4 (Aug. 1977), 281--294.","journal-title":"IEEE Transactions on Acoustics, Speech, and Signal Processing"},{"key":"e_1_3_2_1_25_1","volume-title":"Proceedings of the International Conference for High Performance Computing, Networking, Storage and Analysis (SC '19)","author":"Li Zhihao","year":"2019","unstructured":"Zhihao Li , Haipeng Jia , Yunquan Zhang , Tun Chen , Liang Yuan , Luning Cao , and Xiao Wang . 2019 . AutoFFT: a template-based FFT codes auto-generation framework for ARM and X86 CPUs . In Proceedings of the International Conference for High Performance Computing, Networking, Storage and Analysis (SC '19) . Association for Computing Machinery, New York, NY, USA, 1--15. Zhihao Li, Haipeng Jia, Yunquan Zhang, Tun Chen, Liang Yuan, Luning Cao, and Xiao Wang. 2019. AutoFFT: a template-based FFT codes auto-generation framework for ARM and X86 CPUs. In Proceedings of the International Conference for High Performance Computing, Networking, Storage and Analysis (SC '19). Association for Computing Machinery, New York, NY, USA, 1--15."},{"key":"e_1_3_2_1_26_1","first-page":"8","article-title":"Automatic Generation of High-Performance FFT Kernels on Arm and X86 CPUs","volume":"31","author":"Li Z.","year":"2020","unstructured":"Z. Li , H. Jia , Y. Zhang , T. Chen , L. Yuan , and R. Vuduc . 2020 . Automatic Generation of High-Performance FFT Kernels on Arm and X86 CPUs . IEEE Transactions on Parallel and Distributed Systems 31 , 8 (Aug. 2020), 1925--1941. Conference Name: IEEE Transactions on Parallel and Distributed Systems. Z. Li, H. Jia, Y. Zhang, T. Chen, L. Yuan, and R. Vuduc. 2020. Automatic Generation of High-Performance FFT Kernels on Arm and X86 CPUs. IEEE Transactions on Parallel and Distributed Systems 31, 8 (Aug. 2020), 1925--1941. Conference Name: IEEE Transactions on Parallel and Distributed Systems.","journal-title":"IEEE Transactions on Parallel and Distributed Systems"},{"key":"e_1_3_2_1_27_1","doi-asserted-by":"publisher","DOI":"10.1109\/HPCC\/SmartCity\/DSS.2019.00078"},{"key":"e_1_3_2_1_28_1","volume-title":"Mark Harman, Phillip Gibbs, Andre Van Es, Maria-Grazia Labate, Gerhard Swart, Marco Ciaizzo, Daniel Hayden, and Wallace Turner.","author":"McPherson Alistair M.","year":"2018","unstructured":"Alistair M. McPherson , Joe McMullin , Tim Stevenson , Peter Dewdney , Andrea Casson , Luca Stringhetti , Miles Deegan , Peter Hekman , Martin Austin , Mark Harman, Phillip Gibbs, Andre Van Es, Maria-Grazia Labate, Gerhard Swart, Marco Ciaizzo, Daniel Hayden, and Wallace Turner. 2018 . Square Kilometer Array project status report. In Ground-based and Airborne Telescopes VII, Vol. 10700 . International Society for Optics and Photonics , 107000Y. Alistair M. McPherson, Joe McMullin, Tim Stevenson, Peter Dewdney, Andrea Casson, Luca Stringhetti, Miles Deegan, Peter Hekman, Martin Austin, Mark Harman, Phillip Gibbs, Andre Van Es, Maria-Grazia Labate, Gerhard Swart, Marco Ciaizzo, Daniel Hayden, and Wallace Turner. 2018. Square Kilometer Array project status report. In Ground-based and Airborne Telescopes VII, Vol. 10700. International Society for Optics and Photonics, 107000Y."},{"key":"e_1_3_2_1_29_1","unstructured":"OpenMP Architecture Review Board. 2018. OpenMP Application Program Interface. Specification. https:\/\/www.openmp.org\/wp-content\/uploads\/OpenMP-API-Specification-5.0.pdf  OpenMP Architecture Review Board. 2018. OpenMP Application Program Interface. Specification. https:\/\/www.openmp.org\/wp-content\/uploads\/OpenMP-API-Specification-5.0.pdf"},{"key":"e_1_3_2_1_30_1","first-page":"4","article-title":"P3DFFT: A Framework for Parallel Computations of Fourier Transforms in Three Dimensions","volume":"34","author":"Pekurovsky Dmitry","year":"2012","unstructured":"Dmitry Pekurovsky . 2012 . P3DFFT: A Framework for Parallel Computations of Fourier Transforms in Three Dimensions . SIAM Journal on Scientific Computing 34 , 4 (Jan. 2012), C192--C209. Publisher: Society for Industrial and Applied Mathematics. Dmitry Pekurovsky. 2012. P3DFFT: A Framework for Parallel Computations of Fourier Transforms in Three Dimensions. SIAM Journal on Scientific Computing 34, 4 (Jan. 2012), C192--C209. Publisher: Society for Industrial and Applied Mathematics.","journal-title":"SIAM Journal on Scientific Computing"},{"key":"e_1_3_2_1_31_1","doi-asserted-by":"publisher","DOI":"10.1145\/2909437.2909451"},{"key":"e_1_3_2_1_32_1","doi-asserted-by":"publisher","DOI":"10.1109\/IPDPS.2018.00048"},{"key":"e_1_3_2_1_33_1","unstructured":"Arm Ne10 project. 2021. https:\/\/projectne10.github.io\/Ne10\/ original-date: 2011-12-13T07:47:21Z.  Arm Ne10 project. 2021. https:\/\/projectne10.github.io\/Ne10\/ original-date: 2011-12-13T07:47:21Z."},{"key":"e_1_3_2_1_34_1","volume-title":"Proc. IEEE 93, 2 (Feb. 2005), 232--275. Conference Name: Proceedings of the IEEE.","author":"Puschel M.","unstructured":"M. Puschel , J.M.F. Moura , J.R. Johnson , D. Padua , M.M. Veloso , B.W. Singer , Jianxin Xiong , F. Franchetti , A. Gacic , Y. Voronenko , K. Chen , R.W. Johnson , and N. Rizzolo . 2005. SPIRAL: Code Generation for DSP Transforms . Proc. IEEE 93, 2 (Feb. 2005), 232--275. Conference Name: Proceedings of the IEEE. M. Puschel, J.M.F. Moura, J.R. Johnson, D. Padua, M.M. Veloso, B.W. Singer, Jianxin Xiong, F. Franchetti, A. Gacic, Y. Voronenko, K. Chen, R.W. Johnson, and N. Rizzolo. 2005. SPIRAL: Code Generation for DSP Transforms. Proc. IEEE 93, 2 (Feb. 2005), 232--275. Conference Name: Proceedings of the IEEE."},{"key":"e_1_3_2_1_35_1","doi-asserted-by":"publisher","DOI":"10.1109\/PROC.1968.6477"},{"key":"e_1_3_2_1_36_1","volume-title":"Vectorizing the FFTs**This","author":"Swarztrauber Paul N.","unstructured":"Paul N. Swarztrauber . 1982. Vectorizing the FFTs**This chapter was written while the author was visiting the Scientific Computing Division at the National Bureau of Standards. In Parallel Computations, GARRY Rodrigue (Ed.). Academic Press , 51--83. Paul N. Swarztrauber. 1982. Vectorizing the FFTs**This chapter was written while the author was visiting the Scientific Computing Division at the National Bureau of Standards. In Parallel Computations, GARRY Rodrigue (Ed.). Academic Press, 51--83."},{"key":"e_1_3_2_1_37_1","first-page":"1537","article-title":"Implementation and Evaluation of Parallel FFT Using SIMD Instructions on Multi-core Processors. In Innovative architecture for future generation high-performance processors and systems (iwia 2007). 53--59","author":"Takahashi Daisuke","year":"2007","unstructured":"Daisuke Takahashi . 2007 . Implementation and Evaluation of Parallel FFT Using SIMD Instructions on Multi-core Processors. In Innovative architecture for future generation high-performance processors and systems (iwia 2007). 53--59 . ISSN : 1537 - 3223 . Daisuke Takahashi. 2007. Implementation and Evaluation of Parallel FFT Using SIMD Instructions on Multi-core Processors. In Innovative architecture for future generation high-performance processors and systems (iwia 2007). 53--59. ISSN: 1537-3223.","journal-title":"ISSN"},{"key":"e_1_3_2_1_38_1","volume-title":"FFTE: A Fast Fourier Transform Package","author":"Takahashi Daisuke","year":"2020","unstructured":"Daisuke Takahashi . Aug 2020 . FFTE: A Fast Fourier Transform Package . http:\/\/www.ffte.jp\/ Daisuke Takahashi. Aug 2020. FFTE: A Fast Fourier Transform Package. http:\/\/www.ffte.jp\/"},{"key":"e_1_3_2_1_39_1","volume-title":"The seventh workshop of the INRIA-Illinois-ANL Joint Laboratory on Petascale Computing","author":"Takahashi Daisuke","unstructured":"Daisuke Takahashi , Alex Yee , Torsten Hoefler , Camille Coti , Jeongnim Kim , and Franck Cappello . 2012. An implementation of parallel 3-D FFT with 1.5-d decomposition . In The seventh workshop of the INRIA-Illinois-ANL Joint Laboratory on Petascale Computing , Vol. 4 . 3. Daisuke Takahashi, Alex Yee, Torsten Hoefler, Camille Coti, Jeongnim Kim, and Franck Cappello. 2012. An implementation of parallel 3-D FFT with 1.5-d decomposition. In The seventh workshop of the INRIA-Illinois-ANL Joint Laboratory on Petascale Computing, Vol. 4. 3."},{"key":"e_1_3_2_1_40_1","doi-asserted-by":"crossref","unstructured":"Charles Van Loan. 1992. Computational frameworks for the fast Fourier transform. SIAM.  Charles Van Loan. 1992. Computational frameworks for the fast Fourier transform. SIAM.","DOI":"10.1137\/1.9781611970999"},{"key":"e_1_3_2_1_41_1","doi-asserted-by":"crossref","unstructured":"J. L. Vay A. Almgren J. Bell L. Ge D. P. Grote M. Hogan O. Kononenko R. Lehe A. Myers C. Ng J. Park R. Ryne O. Shapoval M. Thevenet and W. Zhang. 2018. Warp-X: A new exascale computing platform for beam-plasma simulations. Nuclear Instruments and Methods in Physics Research Section A: Accelerators Spectrometers Detectors and Associated Equipment 909 (Nov. 2018) 476--479.  J. L. Vay A. Almgren J. Bell L. Ge D. P. Grote M. Hogan O. Kononenko R. Lehe A. Myers C. Ng J. Park R. Ryne O. Shapoval M. Thevenet and W. Zhang. 2018. Warp-X: A new exascale computing platform for beam-plasma simulations. Nuclear Instruments and Methods in Physics Research Section A: Accelerators Spectrometers Detectors and Associated Equipment 909 (Nov. 2018) 476--479.","DOI":"10.1016\/j.nima.2018.01.035"},{"key":"e_1_3_2_1_42_1","doi-asserted-by":"publisher","DOI":"10.1007\/978-3-319-06486-4_7"},{"key":"e_1_3_2_1_43_1","volume-title":"Algorithms and Architectures for Parallel Processing (Lecture Notes in Computer Science)","author":"Wang Xiao","unstructured":"Xiao Wang , Haipeng Jia , Zhihao Li , and Yunquan Zhang . 2018. Implementation and Optimization of Multi-dimensional Real FFT on ARMv8 Platform . In Algorithms and Architectures for Parallel Processing (Lecture Notes in Computer Science) , Jaideep Vaidya and Jin Li (Eds.). Springer International Publishing , Cham , 338--353. Xiao Wang, Haipeng Jia, Zhihao Li, and Yunquan Zhang. 2018. Implementation and Optimization of Multi-dimensional Real FFT on ARMv8 Platform. In Algorithms and Architectures for Parallel Processing (Lecture Notes in Computer Science), Jaideep Vaidya and Jin Li (Eds.). Springer International Publishing, Cham, 338--353."},{"key":"e_1_3_2_1_44_1","doi-asserted-by":"publisher","DOI":"10.1109\/MM.2021.3085578"}],"event":{"name":"ICS '23: 37th International Conference on Supercomputing","location":"Orlando FL USA","acronym":"ICS '23","sponsor":["SIGARCH ACM Special Interest Group on Computer Architecture"]},"container-title":["Proceedings of the 37th International Conference on Supercomputing"],"original-title":[],"link":[{"URL":"https:\/\/dl.acm.org\/doi\/10.1145\/3577193.3593735","content-type":"unspecified","content-version":"vor","intended-application":"text-mining"}],"deposited":{"date-parts":[[2025,6,17]],"date-time":"2025-06-17T16:47:32Z","timestamp":1750178852000},"score":1,"resource":{"primary":{"URL":"https:\/\/dl.acm.org\/doi\/10.1145\/3577193.3593735"}},"subtitle":[],"short-title":[],"issued":{"date-parts":[[2023,6,21]]},"references-count":44,"alternative-id":["10.1145\/3577193.3593735","10.1145\/3577193"],"URL":"https:\/\/doi.org\/10.1145\/3577193.3593735","relation":{},"subject":[],"published":{"date-parts":[[2023,6,21]]},"assertion":[{"value":"2023-06-21","order":2,"name":"published","label":"Published","group":{"name":"publication_history","label":"Publication History"}}]}}