{"status":"ok","message-type":"work","message-version":"1.0.0","message":{"indexed":{"date-parts":[[2026,2,24]],"date-time":"2026-02-24T18:46:41Z","timestamp":1771958801247,"version":"3.50.1"},"reference-count":23,"publisher":"SAGE Publications","issue":"2","license":[{"start":{"date-parts":[[2012,5,21]],"date-time":"2012-05-21T00:00:00Z","timestamp":1337558400000},"content-version":"tdm","delay-in-days":0,"URL":"https:\/\/journals.sagepub.com\/page\/policies\/text-and-data-mining-license"}],"content-domain":{"domain":["journals.sagepub.com"],"crossmark-restriction":true},"short-container-title":["The International Journal of High Performance Computing Applications"],"published-print":{"date-parts":[[2013,5]]},"abstract":"<jats:p> We present a highly parallel implementation of the cross-correlation of time-series data using graphics processing units (GPUs), which is scalable to hundreds of independent inputs and suitable for the processing of signals from \u2018large-N\u2019 arrays of many radio antennas. The computational part of the algorithm, the X-engine, is implemented efficiently on NVIDIA\u2019s Fermi architecture, sustaining up to 79% of the peak single-precision floating-point throughput. We compare performance obtained for hardware- and software-managed caches, observing significantly better performance for the latter. The high performance reported involves use of a multi-level data tiling strategy in memory and use of a pipelined algorithm with simultaneous computation and transfer of data from host to device memory. The speed of code development, flexibility, and low cost of the GPU implementations compared with application-specific integrated circuit (ASIC) and field programmable gate array (FPGA) implementations have the potential to greatly shorten the cycle of correlator development and deployment, for cases where some power-consumption penalty can be tolerated. 
<\/jats:p>","DOI":"10.1177\/1094342012444794","type":"journal-article","created":{"date-parts":[[2012,5,22]],"date-time":"2012-05-22T01:08:07Z","timestamp":1337648887000},"page":"178-192","update-policy":"https:\/\/doi.org\/10.1177\/sage-journals-update-policy","source":"Crossref","is-referenced-by-count":44,"title":["Accelerating radio astronomy cross-correlation with graphics processing units"],"prefix":"10.1177","volume":"27","author":[{"given":"M.A.","family":"Clark","sequence":"first","affiliation":[{"name":"The work was done while the author was at Harvard-Smithsonian Center for Astrophysics, Cambridge, MA, USA"},{"name":"NVIDIA Corporation, CA, USA"}]},{"given":"PC La","family":"Plante","sequence":"additional","affiliation":[{"name":"Loyola University Maryland, Baltimore, MD, USA"}]},{"given":"L.J.","family":"Greenhill","sequence":"additional","affiliation":[{"name":"Harvard-Smithsonian Center for Astrophysics, Cambridge, MA, USA"}]}],"member":"179","published-online":{"date-parts":[[2012,5,21]]},"reference":[{"key":"bibr1-1094342012444794","volume-title":"NVIDIA CUDA C Programming Guide","author":"NVIDIA","year":"2010","edition":"3"},{"key":"bibr2-1094342012444794","doi-asserted-by":"publisher","DOI":"10.1007\/s10686-005-2861-y"},{"key":"bibr3-1094342012444794","volume-title":"ExaScale Computing Study: Technology Challenges in Achieving Exascale Systems","author":"Bergman K","year":"2008"},{"key":"bibr4-1094342012444794","unstructured":"Cornwell TJ, van Diepen G (n.d.) Scaling Mount Exaflop: from the pathfinders to the Square Kilometre Array. 
http:\/\/www.atnf.csiro.au\/people\/tim.cornwell\/publications\/MountExaflop.pdf."},{"key":"bibr5-1094342012444794","doi-asserted-by":"publisher","DOI":"10.1016\/j.cpc.2010.06.019"},{"key":"bibr6-1094342012444794","volume-title":"Matrix Computations","author":"Golub GH","year":"1996","edition":"3"},{"key":"bibr7-1094342012444794","volume-title":"11th Asian-Pacific Regional IAU Meeting 2011","volume":"1","author":"Greenhill L","year":"2012"},{"key":"bibr8-1094342012444794","doi-asserted-by":"publisher","DOI":"10.1007\/s10686-008-9114-9"},{"key":"bibr9-1094342012444794","unstructured":"Huang J-H (2010) Keynote presentation at the GPU Technology Conference by Jen Hsun Huang."},{"key":"bibr10-1094342012444794","doi-asserted-by":"publisher","DOI":"10.1088\/0004-6256\/140\/6\/2086"},{"key":"bibr11-1094342012444794","doi-asserted-by":"publisher","DOI":"10.1109\/JPROC.2009.2017564"},{"key":"bibr12-1094342012444794","unstructured":"Micikevicius P (2010) Async Memcpy Sharing bandwidth with Kernel? NVIDIA Forums, http:\/\/forums.nvidia.com\/index.php?showtopic=187442."},{"key":"bibr13-1094342012444794","doi-asserted-by":"publisher","DOI":"10.1177\/1094342010385729"},{"key":"bibr14-1094342012444794","doi-asserted-by":"publisher","DOI":"10.1145\/1542275.1542337"},{"key":"bibr15-1094342012444794","doi-asserted-by":"publisher","DOI":"10.1086\/593053"},{"key":"bibr16-1094342012444794","doi-asserted-by":"publisher","DOI":"10.1007\/s10686-005-2864-8"},{"key":"bibr17-1094342012444794","doi-asserted-by":"publisher","DOI":"10.1117\/12.786780"},{"key":"bibr18-1094342012444794","doi-asserted-by":"publisher","DOI":"10.1016\/j.newast.2007.12.005"},{"key":"bibr19-1094342012444794","doi-asserted-by":"publisher","DOI":"10.1103\/PhysRevD.82.103501"},{"key":"bibr20-1094342012444794","volume-title":"Interferometry and Synthesis in Radio Astronomy","author":"Thompson AR","year":"2004"},{"key":"bibr21-1094342012444794","unstructured":"Volkov V (2010) Better Performance at Lower Occupancy. 
Presented at the GPU Technology Conference. http:\/\/www.cs.berkeley.edu\/~volkov\/volkov10-GTC.pdf."},{"key":"bibr22-1094342012444794","doi-asserted-by":"publisher","DOI":"10.1086\/605334"},{"key":"bibr23-1094342012444794","unstructured":"Williams SW, Waterman A, Patterson DA (2008) Roofline: An Insightful Visual Performance Model for Floating-Point Programs and Multicore Architectures. Technical Report UCB:\/EECS-2008-134, EECS Department, University of California, Berkeley, CA, http:\/\/www.eecs.berkeley.edu\/Pubs\/TechRpts\/2008\/EECS-2008-134.html."}],"container-title":["The International Journal of High Performance Computing Applications"],"original-title":[],"language":"en","link":[{"URL":"https:\/\/journals.sagepub.com\/doi\/pdf\/10.1177\/1094342012444794","content-type":"application\/pdf","content-version":"vor","intended-application":"text-mining"},{"URL":"https:\/\/journals.sagepub.com\/doi\/full-xml\/10.1177\/1094342012444794","content-type":"application\/xml","content-version":"vor","intended-application":"text-mining"},{"URL":"https:\/\/journals.sagepub.com\/doi\/pdf\/10.1177\/1094342012444794","content-type":"unspecified","content-version":"vor","intended-application":"similarity-checking"}],"deposited":{"date-parts":[[2025,3,4]],"date-time":"2025-03-04T05:39:07Z","timestamp":1741066747000},"score":1,"resource":{"primary":{"URL":"https:\/\/journals.sagepub.com\/doi\/10.1177\/1094342012444794"}},"subtitle":[],"short-title":[],"issued":{"date-parts":[[2012,5,21]]},"references-count":23,"journal-issue":{"issue":"2","published-print":{"date-parts":[[2013,5]]}},"alternative-id":["10.1177\/1094342012444794"],"URL":"https:\/\/doi.org\/10.1177\/1094342012444794","relation":{},"ISSN":["1094-3420","1741-2846"],"issn-type":[{"value":"1094-3420","type":"print"},{"value":"1741-2846","type":"electronic"}],"subject":[],"published":{"date-parts":[[2012,5,21]]}}}