{"status":"ok","message-type":"work","message-version":"1.0.0","message":{"indexed":{"date-parts":[[2025,12,19]],"date-time":"2025-12-19T09:29:20Z","timestamp":1766136560996,"version":"3.41.0"},"reference-count":69,"publisher":"Association for Computing Machinery (ACM)","issue":"4","license":[{"start":{"date-parts":[[2015,1,9]],"date-time":"2015-01-09T00:00:00Z","timestamp":1420761600000},"content-version":"vor","delay-in-days":0,"URL":"https:\/\/www.acm.org\/publications\/policies\/copyright_policy#Background"}],"funder":[{"DOI":"10.13039\/501100011101","name":"Intel Collaborative Research Institute for Computational Intelligence","doi-asserted-by":"crossref","id":[{"id":"10.13039\/501100011101","id-type":"DOI","asserted-by":"crossref"}]},{"name":"Hasso-Plattner Institute"}],"content-domain":{"domain":["dl.acm.org"],"crossmark-restriction":true},"short-container-title":["ACM Trans. Archit. Code Optim."],"published-print":{"date-parts":[[2015,1,9]]},"abstract":"<jats:p>GP-SIMD, a novel hybrid general-purpose SIMD computer architecture, resolves the issue of data synchronization by in-memory computing through combining data storage and massively parallel processing. GP-SIMD employs a two-dimensional access memory with modified SRAM storage cells and a bit-serial processing unit per each memory row. An analytic performance model of the GP-SIMD architecture is presented, comparing it to associative processor and to conventional SIMD architectures. Cycle-accurate simulation of four workloads supports the analytical comparison. 
Assuming a moderate die area, GP-SIMD architecture outperforms both the associative processor and conventional SIMD coprocessor architectures by almost an order of magnitude while consuming less power.<\/jats:p>","DOI":"10.1145\/2686875","type":"journal-article","created":{"date-parts":[[2015,1,12]],"date-time":"2015-01-12T20:02:10Z","timestamp":1421092930000},"page":"1-26","update-policy":"https:\/\/doi.org\/10.1145\/crossmark-policy","source":"Crossref","is-referenced-by-count":26,"title":["GP-SIMD Processing-in-Memory"],"prefix":"10.1145","volume":"11","author":[{"given":"Amir","family":"Morad","sequence":"first","affiliation":[{"name":"Technion, Haifa, Israel"}]},{"given":"Leonid","family":"Yavits","sequence":"additional","affiliation":[{"name":"Technion, Haifa, Israel"}]},{"given":"Ran","family":"Ginosar","sequence":"additional","affiliation":[{"name":"Technion, Haifa, Israel"}]}],"member":"320","published-online":{"date-parts":[[2015,1,9]]},"reference":[{"volume":"5","volume-title":"Proceedings of the 1995 International Conference on Acoustics, Speech, and Signal Processing (ICASSP\u201995)","author":"Akerib A.","key":"e_1_2_1_1_1"},{"volume-title":"Proceedings of the 8th Israel Conference on Artificial Intelligence, Vision & Pattern Recognition. Elsevier.","author":"Akerib A. J.","key":"e_1_2_1_2_1"},{"key":"e_1_2_1_3_1","doi-asserted-by":"publisher","DOI":"10.1145\/773365.773369"},{"key":"e_1_2_1_4_1","unstructured":"AltiVec Engine. 2014. Homepage. Retrieved from http:\/\/www.freescale.com\/webapp\/sps\/site\/overview.jsp?code=DRPPCALTVC."},{"key":"e_1_2_1_5_1","unstructured":"ARM. 2014. NEON\u2122 General-Purpose SIMD Engine. Retrieved from http:\/\/www.arm.com\/products\/processors\/technologies\/neon.php."},
{"key":"e_1_2_1_6_1","doi-asserted-by":"publisher","DOI":"10.1109\/VLSIT.2012.6242496"},{"key":"e_1_2_1_7_1","doi-asserted-by":"publisher","DOI":"10.1109\/IEDM.2003.1269421"},{"key":"e_1_2_1_8_1","doi-asserted-by":"publisher","DOI":"10.1145\/1500175.1500260"},{"key":"e_1_2_1_9_1","doi-asserted-by":"publisher","DOI":"10.1145\/2024716.2024718"},{"key":"e_1_2_1_10_1","doi-asserted-by":"publisher","DOI":"10.1086\/260062"},{"key":"e_1_2_1_11_1","doi-asserted-by":"publisher","DOI":"10.1145\/1278480.1278667"},{"key":"e_1_2_1_12_1","doi-asserted-by":"publisher","DOI":"10.1145\/1054943.1054946"},{"key":"e_1_2_1_13_1","doi-asserted-by":"publisher","DOI":"10.1145\/268806.268810"},{"key":"e_1_2_1_14_1","doi-asserted-by":"publisher","DOI":"10.1109\/TC.2011.169"},{"volume-title":"Proceedings of the 2nd Symposium on the Frontiers of Massively Parallel Computation. IEEE.","year":"1988","author":"Cloud E. L.","key":"e_1_2_1_15_1"},{"key":"e_1_2_1_16_1","doi-asserted-by":"crossref","unstructured":"P. Dlugosch D. Brown P. Glendenning M. Leventhal and H. Noyes. 2014. An efficient and scalable semiconductor architecture for parallel automata processing. In IEEE Transactions on Parallel and Distributed Systems. 1--1.","DOI":"10.1109\/TPDS.2014.8"},{"key":"e_1_2_1_17_1","doi-asserted-by":"publisher","DOI":"10.1145\/514191.514197"},{"key":"e_1_2_1_18_1","doi-asserted-by":"publisher","DOI":"10.1145\/2408776.2408797"},{"key":"e_1_2_1_19_1","doi-asserted-by":"publisher","DOI":"10.1016\/0167-8191(89)90003-3"},{"key":"e_1_2_1_20_1","unstructured":"C. Foster. 1976. Content Addressable Parallel Processors. Van Nostrand Reinhold Company New York."},
{"key":"e_1_2_1_21_1","doi-asserted-by":"publisher","DOI":"10.1109\/2.375174"},{"key":"e_1_2_1_22_1","doi-asserted-by":"publisher","DOI":"10.1109\/MM.2006.41"},{"key":"e_1_2_1_23_1","unstructured":"N. Gunther S. Subramanyam and S. Parvu. 2011. A methodology for optimizing multithreaded system scalability on multi-cores. Retrieved from http:\/\/arxiv.org\/abs\/1105.4301."},{"key":"e_1_2_1_24_1","doi-asserted-by":"publisher","DOI":"10.1145\/331532.331589"},{"key":"e_1_2_1_25_1","doi-asserted-by":"publisher","DOI":"10.1109\/MM.2011.77"},{"key":"e_1_2_1_26_1","volume-title":"Computer Architecture: A Quantitative Approach","author":"Hennessy J.","year":"1996","edition":"2"},{"key":"e_1_2_1_27_1","doi-asserted-by":"publisher","DOI":"10.1109\/EIT.2009.5189662"},{"key":"e_1_2_1_28_1","doi-asserted-by":"publisher","DOI":"10.1109\/MC.2008.209"},{"key":"e_1_2_1_29_1","doi-asserted-by":"publisher","DOI":"10.1145\/1555815.1555775"},{"key":"e_1_2_1_30_1","unstructured":"IBM. 2005. PowerPC Vector\/SIMD Multimedia Extension. Retrieved from http:\/\/math-at-las.sourceforge.net\/devel\/assembly\/vector_simd_pem.ppc.2005AUG23.pdf."},{"key":"e_1_2_1_31_1","unstructured":"Intel. 2013. The Intel\u00ae Xeon Phi\u2122 Coprocessor. Retrieved from http:\/\/www.intel.com\/content\/www\/us\/en\/high-performance-computing\/high-performance-xeon-phi-coprocessor-brief.html."},
{"key":"e_1_2_1_32_1","doi-asserted-by":"publisher","DOI":"10.1109\/MM.2011.89"},{"key":"e_1_2_1_33_1","doi-asserted-by":"publisher","DOI":"10.5555\/518906.858153"},{"key":"e_1_2_1_34_1","doi-asserted-by":"publisher","DOI":"10.1109\/2.612252"},{"key":"e_1_2_1_35_1","unstructured":"S. Kumar. 2012. Smart Memory. Retrieved from http:\/\/www.hotchips.org\/wp-content\/uploads\/hc_archives\/hc22\/HC22.23.325-1-Kumar-Smart-Memory.pdf."},{"volume-title":"Proceedings of the IEEE International Workshop on Memory Technology, Design and Testing.","author":"Lipovski G.","key":"e_1_2_1_36_1"},{"volume-title":"Proceedings of the Workshop on Architectures and Languages for Throughput Applications (ALTA).","year":"2008","author":"Loh G.","key":"e_1_2_1_37_1"},{"key":"e_1_2_1_38_1","doi-asserted-by":"publisher","DOI":"10.1145\/1103900.1103933"},{"key":"e_1_2_1_39_1","first-page":"4","article-title":"Architectural considerations of a wafer scale processor","volume":"4","author":"Midwinter T.","year":"1988","journal-title":"IEE Colloquium on VLSI for Parallel Processing"},{"key":"e_1_2_1_40_1","doi-asserted-by":"publisher","DOI":"10.1109\/L-CA.2012.34"},{"key":"e_1_2_1_41_1","doi-asserted-by":"crossref","unstructured":"A. Morad et al. 2014. Convex optimization of resource allocation in asymmetric and heterogeneous SoC. Power and Timing Modeling Optimization and Simulation (PATMOS).","DOI":"10.1109\/PATMOS.2014.6951864"},{"key":"e_1_2_1_42_1","doi-asserted-by":"crossref","unstructured":"A. Morad et al. 2014. Efficient dense and sparse matrix multiplication on GP-SIMD. Power and Timing Modeling Optimization and Simulation (PATMOS).","DOI":"10.1109\/PATMOS.2014.6951895"},
{"key":"e_1_2_1_43_1","unstructured":"A. Morad et al. 2014. Optimization of asymmetric and heterogeneous SoC. Under review."},{"key":"e_1_2_1_44_1","doi-asserted-by":"publisher","DOI":"10.1109\/L-CA.2006.6"},{"key":"e_1_2_1_45_1","doi-asserted-by":"publisher","DOI":"10.1109\/JPROC.2008.917757"},{"key":"e_1_2_1_46_1","doi-asserted-by":"crossref","unstructured":"A. Pedram. 2013. Algorithm\/Architecture Codesign of Low Power and High Performance Linear Algebra Compute Fabrics. PhD dissertation University of Texas. Retrieved from http:\/\/repositories.lib.utexas.edu\/bitstream\/handle\/2152\/21364\/PEDRAM-DISSERTATION-2013.pdf?sequence=1.","DOI":"10.1109\/IPDPSW.2013.166"},{"key":"e_1_2_1_47_1","doi-asserted-by":"publisher","DOI":"10.5555\/320080.320082"},{"key":"e_1_2_1_48_1","doi-asserted-by":"publisher","DOI":"10.1109\/2.330039"},{"key":"e_1_2_1_49_1","doi-asserted-by":"publisher","DOI":"10.1109\/MM.2014.54"},{"key":"e_1_2_1_50_1","doi-asserted-by":"publisher","DOI":"10.1145\/2485922.2485939"},{"key":"e_1_2_1_51_1","unstructured":"M. Quinn. 1987. Designing Efficient Algorithms for Parallel Computers. McGraw-Hill 125."},
{"key":"e_1_2_1_52_1","doi-asserted-by":"publisher","DOI":"10.1145\/633642.803971"},{"key":"e_1_2_1_53_1","doi-asserted-by":"publisher","DOI":"10.1145\/1555754.1555801"},{"key":"e_1_2_1_54_1","doi-asserted-by":"publisher","DOI":"10.1145\/359327.359336"},{"key":"e_1_2_1_55_1","doi-asserted-by":"crossref","unstructured":"G. E. Sayre. 1976. STARAN: An associative approach to multiprocessor architecture. Computer Architecture. Springer Berlin.","DOI":"10.1007\/978-3-642-66400-7_9"},{"key":"e_1_2_1_56_1","doi-asserted-by":"publisher","DOI":"10.1109\/12.166599"},{"key":"e_1_2_1_57_1","doi-asserted-by":"publisher","DOI":"10.1109\/ISPASS.2005.1430559"},{"key":"e_1_2_1_58_1","doi-asserted-by":"publisher","DOI":"10.1109\/ICDAR.2005.251"},{"volume-title":"Proceedings of the ACM\/IEEE Conference on Supercomputing.","author":"Sterling T.","key":"e_1_2_1_59_1"},{"key":"e_1_2_1_60_1","doi-asserted-by":"publisher","DOI":"10.5555\/645609.663107"},{"key":"e_1_2_1_61_1","doi-asserted-by":"publisher","DOI":"10.1109\/2.74"},{"volume-title":"Proceedings of the 2008 ACM\/IEEE Conference on Supercomputing. IEEE Press.","author":"Volkov V.","key":"e_1_2_1_62_1"},{"key":"e_1_2_1_63_1","unstructured":"D. Wentzlaff et al. 2010. Core Count vs. Cache Size for Manycore Architectures in the Cloud. Technical Report. MIT-CSAIL-TR-2010-008 MIT."},{"key":"e_1_2_1_64_1","unstructured":"L. Yavits. 1994. Architecture and Design of Associative Processor for Image Processing and Computer Vision. MSc Thesis Technion -- Israel Institute of Technology. Retrieved from http:\/\/webee.technion.ac.il\/~ran\/papers\/LeonidYavitsMasterThesis1994.pdf."},
{"key":"e_1_2_1_65_1","article-title":"Computer architecture with associative processor replacing last level cache and SIMD accelerator","author":"Yavits L.","year":"2014","journal-title":"IEEE Transactions on Computers."},{"key":"e_1_2_1_66_1","doi-asserted-by":"publisher","DOI":"10.1016\/j.parco.2013.11.001"},{"key":"e_1_2_1_67_1","unstructured":"L. Yavits et al. 2014c. Thermal analysis of 3D associative processor. http:\/\/arxiv.org\/abs\/1307.3853v1"},{"key":"e_1_2_1_68_1","doi-asserted-by":"publisher","DOI":"10.1145\/2600212.2600213"},{"volume-title":"Proceedings of the 2011 IEEE 17th International Symposium on High Performance Computer Architecture (HPCA). 
IEEE.","author":"Zhang Y.","key":"e_1_2_1_69_1"}],"container-title":["ACM Transactions on Architecture and Code Optimization"],"original-title":[],"language":"en","link":[{"URL":"https:\/\/dl.acm.org\/doi\/10.1145\/2686875","content-type":"unspecified","content-version":"vor","intended-application":"text-mining"},{"URL":"https:\/\/dl.acm.org\/doi\/pdf\/10.1145\/2686875","content-type":"unspecified","content-version":"vor","intended-application":"similarity-checking"}],"deposited":{"date-parts":[[2025,6,18]],"date-time":"2025-06-18T06:12:13Z","timestamp":1750227133000},"score":1,"resource":{"primary":{"URL":"https:\/\/dl.acm.org\/doi\/10.1145\/2686875"}},"subtitle":[],"short-title":[],"issued":{"date-parts":[[2015,1,9]]},"references-count":69,"journal-issue":{"issue":"4","published-print":{"date-parts":[[2015,1,9]]}},"alternative-id":["10.1145\/2686875"],"URL":"https:\/\/doi.org\/10.1145\/2686875","relation":{},"ISSN":["1544-3566","1544-3973"],"issn-type":[{"type":"print","value":"1544-3566"},{"type":"electronic","value":"1544-3973"}],"subject":[],"published":{"date-parts":[[2015,1,9]]},"assertion":[{"value":"2014-05-01","order":0,"name":"received","label":"Received","group":{"name":"publication_history","label":"Publication History"}},{"value":"2014-10-01","order":1,"name":"accepted","label":"Accepted","group":{"name":"publication_history","label":"Publication History"}},{"value":"2015-01-09","order":2,"name":"published","label":"Published","group":{"name":"publication_history","label":"Publication History"}}]}}