{"status":"ok","message-type":"work","message-version":"1.0.0","message":{"indexed":{"date-parts":[[2025,10,28]],"date-time":"2025-10-28T10:47:36Z","timestamp":1761648456775,"version":"3.41.0"},"reference-count":64,"publisher":"Association for Computing Machinery (ACM)","issue":"3","license":[{"start":{"date-parts":[[2017,5,1]],"date-time":"2017-05-01T00:00:00Z","timestamp":1493596800000},"content-version":"vor","delay-in-days":0,"URL":"https:\/\/www.acm.org\/publications\/policies\/copyright_policy#Background"}],"content-domain":{"domain":["dl.acm.org"],"crossmark-restriction":true},"short-container-title":["J. Emerg. Technol. Comput. Syst."],"published-print":{"date-parts":[[2017,7,31]]},"abstract":"<jats:p>Big Data refers to the growing challenge of turning massive, often unstructured datasets into meaningful, organized, and actionable data. As datasets grow from petabytes to exabytes and beyond, it becomes increasingly difficult to run advanced analytics, especially Machine Learning (ML) applications, in a reasonable time and on a practical power budget using traditional architectures. Previous work has focused on accelerating analytics readily implemented as SQL queries on data-parallel platforms, generally using off-the-shelf CPUs and General Purpose Graphics Processing Units (GPGPUs) for computation or acceleration. However, these systems are general-purpose and still require a vast amount of data transfer between the storage devices and computing elements, thus limiting the system efficiency. As an alternative, this article presents a reconfigurable memory-centric advanced analytics accelerator that operates at the last level of memory and dramatically reduces energy required for data transfer. We functionally validate the framework using an FPGA-based hardware emulation platform and three representative applications: Na\u00efve Bayesian Classification, Convolutional Neural Networks, and k-Means Clustering. 
Results are compared with implementations on a modern CPU and workstation GPGPU. Finally, the use of in-memory dataset decompression to further reduce data transfer volume is investigated. With these techniques, the system achieves an average energy efficiency improvement of 74\u00d7 and 212\u00d7 over GPU and single-threaded CPU, respectively, while dataset compression is shown to improve overall efficiency by an additional 1.8\u00d7 on average.<\/jats:p>","DOI":"10.1145\/2997649","type":"journal-article","created":{"date-parts":[[2017,5,1]],"date-time":"2017-05-01T14:55:23Z","timestamp":1493650523000},"page":"1-24","update-policy":"https:\/\/doi.org\/10.1145\/crossmark-policy","source":"Crossref","is-referenced-by-count":5,"title":["Memory-Centric Reconfigurable Accelerator for Classification and Machine Learning Applications"],"prefix":"10.1145","volume":"13","author":[{"ORCID":"https:\/\/orcid.org\/0000-0002-2713-029X","authenticated-orcid":false,"given":"Robert","family":"Karam","sequence":"first","affiliation":[{"name":"University of Florida, Gainesville, FL"}]},{"given":"Somnath","family":"Paul","sequence":"additional","affiliation":[{"name":"Intel Labs, Hillsboro, OR"}]},{"given":"Ruchir","family":"Puri","sequence":"additional","affiliation":[{"name":"IBM T. J. Watson Research Lab, Yorktown Heights, NY"}]},{"given":"Swarup","family":"Bhunia","sequence":"additional","affiliation":[{"name":"University of Florida, Gainesville, FL"}]}],"member":"320","published-online":{"date-parts":[[2017,5]]},"reference":[{"volume-title":"Quartus II Subscription Edition. 
Retrieved","year":"2016","key":"e_1_2_1_1_1"},{"key":"e_1_2_1_2_1","doi-asserted-by":"publisher","DOI":"10.1109\/TPDS.2010.144"},{"key":"e_1_2_1_3_1","doi-asserted-by":"publisher","DOI":"10.1109\/FPL.2013.6645534"},{"key":"e_1_2_1_4_1","doi-asserted-by":"publisher","DOI":"10.1145\/1735688.1735706"},{"key":"e_1_2_1_5_1","doi-asserted-by":"publisher","DOI":"10.1109\/PERVASIVE.2015.7087201"},{"volume-title":"Kerry Hill, Jon Hiller, and others.","year":"2008","author":"Bergman Keren","key":"e_1_2_1_6_1"},{"volume-title":"Proceedings of the Python for Scientific Computing Conference (SciPy\u201910)","year":"2010","author":"Bergstra James","key":"e_1_2_1_7_1"},{"key":"e_1_2_1_8_1","doi-asserted-by":"publisher","DOI":"10.1145\/130385.130401"},{"key":"e_1_2_1_9_1","unstructured":"CACTI. Online. Retrieved from http:\/\/arch.cs.utah.edu\/cacti\/."},{"key":"e_1_2_1_10_1","doi-asserted-by":"publisher","DOI":"10.1109\/SC.2012.16"},{"key":"e_1_2_1_11_1","unstructured":"clock_gettime(3) - Linux man page. Retrieved from http:\/\/linux.die.net\/man\/3\/clock_gettime."},{"key":"e_1_2_1_12_1","doi-asserted-by":"publisher","DOI":"10.1145\/275107.275138"},{"key":"e_1_2_1_13_1","unstructured":"CUDA Profiling Tools Interface. 
Retrieved from https:\/\/developer.nvidia.com\/cuda-profiling-tools-interface."},{"key":"e_1_2_1_14_1","doi-asserted-by":"publisher","DOI":"10.1145\/1327452.1327492"},{"volume-title":"Pattern Classification and Scene Analysis","author":"Duda Richard O.","key":"e_1_2_1_15_1"},{"key":"e_1_2_1_16_1","doi-asserted-by":"publisher","DOI":"10.1007\/3-540-61730-2_13"},{"key":"e_1_2_1_17_1","doi-asserted-by":"publisher","DOI":"10.1145\/2408776.2408797"},{"volume-title":"Campbell","year":"2008","author":"Farivar Reza","key":"e_1_2_1_18_1"},{"key":"e_1_2_1_19_1","doi-asserted-by":"publisher","DOI":"10.1023\/A:1007465528199"},{"key":"e_1_2_1_20_1","doi-asserted-by":"publisher","DOI":"10.1145\/2621934.2621936"},{"key":"e_1_2_1_21_1","doi-asserted-by":"publisher","DOI":"10.1109\/MC.2012.358"},{"key":"e_1_2_1_22_1","doi-asserted-by":"publisher","DOI":"10.1109\/ASSCC.2007.4425784"},{"key":"e_1_2_1_23_1","doi-asserted-by":"publisher","DOI":"10.1145\/1007568.1007594"},{"key":"e_1_2_1_24_1","unstructured":"Alexander Gray. 2013. Analyzing Massive Datasets. Retrieved from http:\/\/www.skytree.net\/resources\/."},{"volume-title":"Proceedings of the Workshop on Near-Data Processing (WoNDP) (Held in Conjunction with MICRO-47","year":"2014","author":"Guo Qi","key":"e_1_2_1_25_1"},{"key":"e_1_2_1_26_1","doi-asserted-by":"publisher","DOI":"10.1109\/FCCM.2013.17"},{"volume-title":"Cowie","year":"2006","author":"Helmreich Stephen C.","key":"e_1_2_1_27_1"},{"key":"e_1_2_1_28_1","doi-asserted-by":"publisher","DOI":"10.1109\/JRPROC.1952.273898"},{"key":"e_1_2_1_29_1","unstructured":"Intel Core2 Quad Processor Q8200. 
Retrieved from http:\/\/ark.intel.com\/Products\/Spec\/SLG9S."},{"key":"e_1_2_1_30_1","doi-asserted-by":"publisher","DOI":"10.1145\/1536616.1536632"},{"key":"e_1_2_1_31_1","doi-asserted-by":"publisher","DOI":"10.1109\/TVLSI.2016.2555984"},{"key":"e_1_2_1_32_1","doi-asserted-by":"publisher","DOI":"10.1109\/JPROC.2015.2434888"},{"key":"e_1_2_1_33_1","doi-asserted-by":"publisher","DOI":"10.1109\/MWSCAS.2015.7282213"},{"key":"e_1_2_1_34_1","doi-asserted-by":"publisher","DOI":"10.1186\/1471-2105-9-200"},{"key":"e_1_2_1_35_1","first-page":"21","article-title":"Big data, analytics and the path from insights to value","volume":"52","author":"LaValle Steve","year":"2011","journal-title":"MIT Sloan Management Review"},{"key":"e_1_2_1_36_1","unstructured":"Yann Lecun and Corinna Cortes. 2016. The MNIST database of handwritten digits. Retrieved from http:\/\/yann.lecun.com\/exdb\/mnist\/."},{"volume-title":"Bridging the processor-memory performance gap with 3D IC technology","year":"2005","author":"Liu Christianto C.","key":"e_1_2_1_37_1"},{"volume":"1","volume-title":"Proceedings of the 5th Berkeley Symposium on Mathematical Statistics and Probability","author":"James","key":"e_1_2_1_38_1"},{"key":"e_1_2_1_39_1","doi-asserted-by":"publisher","DOI":"10.1109\/FPGA.1996.564808"},{"key":"e_1_2_1_40_1","doi-asserted-by":"publisher","DOI":"10.1109\/ISSCC.1997.585356"},{"key":"e_1_2_1_41_1","doi-asserted-by":"publisher","DOI":"10.1109\/FCCM.2015.59"},{"key":"e_1_2_1_42_1","doi-asserted-by":"publisher","DOI":"10.1145\/1365490.1365500"},{"key":"e_1_2_1_43_1","unstructured":"Nios II Processor: The World\u2019s Most Versatile Embedded Processor. 
Retrieved from http:\/\/www.altera.com\/devices\/processor\/nios2\/ni2-index.html."},{"key":"e_1_2_1_44_1","unstructured":"NVIDIA. 2016. CUDA Toolkit Documentation (7.5 ed.). NVIDIA. Retrieved from https:\/\/docs.nvidia.com\/cuda\/."},{"key":"e_1_2_1_45_1","doi-asserted-by":"publisher","DOI":"10.1109\/TNNLS.2012.2196446"},{"key":"e_1_2_1_46_1","doi-asserted-by":"publisher","DOI":"10.1109\/40.592312"},{"volume-title":"Proceedings of the 9th IEEE Conference on Nanotechnology (IEEE-NANO","year":"2009","author":"Paul S.","key":"e_1_2_1_47_1"},{"volume-title":"Proceedings of the Conference on Design, Automation & Test in Europe. European Design and Automation Association, 266","year":"2014","author":"Paul Somnath","key":"e_1_2_1_48_1"},{"key":"e_1_2_1_49_1","article-title":"MAHA: An energy-efficient malleable hardware accelerator for data-intensive applications","author":"Paul Somnath","year":"2014","journal-title":"IEEE Transactions on Very Large Scale Integration Systems. IEEE, 1005--1016."},{"key":"e_1_2_1_50_1","doi-asserted-by":"publisher","DOI":"10.1109\/ISCA.2014.6853195"},{"key":"e_1_2_1_51_1","unstructured":"QUADRO FOR DESKTOP WORKSTATIONS. 
Retrieved from http:\/\/www.nvidia.com\/object\/quadro-desktop-gpus.html."},{"key":"e_1_2_1_52_1","doi-asserted-by":"publisher","DOI":"10.1145\/2540708.2540718"},{"key":"e_1_2_1_53_1","doi-asserted-by":"publisher","DOI":"10.1145\/1723112.1723129"},{"key":"e_1_2_1_54_1","doi-asserted-by":"publisher","DOI":"10.1109\/12.859540"},{"key":"e_1_2_1_55_1","doi-asserted-by":"publisher","DOI":"10.1007\/3-540-48311-X_205"},{"key":"e_1_2_1_56_1","doi-asserted-by":"publisher","DOI":"10.1145\/2370816.2370874"},{"volume-title":"Redwood Shores","year":"2012","author":"Sun Helen","key":"e_1_2_1_57_1"},{"key":"e_1_2_1_58_1","doi-asserted-by":"publisher","DOI":"10.1145\/2688500.2688538"},{"volume-title":"Proceedings of the Design, Automation and Test in Europe Conference and Exhibition (DATE\u201914)","year":"2014","author":"Wang Yu","key":"e_1_2_1_59_1"},{"key":"e_1_2_1_60_1","doi-asserted-by":"publisher","DOI":"10.1109\/ICDEW.2011.5767669"},{"key":"e_1_2_1_61_1","doi-asserted-by":"publisher","DOI":"10.1145\/1531666.1531668"},{"key":"e_1_2_1_62_1","doi-asserted-by":"publisher","DOI":"10.1007\/978-3-642-10665-1_71"},{"key":"e_1_2_1_63_1","doi-asserted-by":"publisher","DOI":"10.1109\/TIT.1977.1055714"},{"key":"e_1_2_1_64_1","doi-asserted-by":"publisher","DOI":"10.1109\/TIT.1978.1055934"}],"container-title":["ACM Journal on Emerging Technologies in Computing 
Systems"],"original-title":[],"language":"en","link":[{"URL":"https:\/\/dl.acm.org\/doi\/10.1145\/2997649","content-type":"unspecified","content-version":"vor","intended-application":"text-mining"},{"URL":"https:\/\/dl.acm.org\/doi\/pdf\/10.1145\/2997649","content-type":"unspecified","content-version":"vor","intended-application":"similarity-checking"}],"deposited":{"date-parts":[[2025,6,18]],"date-time":"2025-06-18T03:50:32Z","timestamp":1750218632000},"score":1,"resource":{"primary":{"URL":"https:\/\/dl.acm.org\/doi\/10.1145\/2997649"}},"subtitle":[],"short-title":[],"issued":{"date-parts":[[2017,5]]},"references-count":64,"journal-issue":{"issue":"3","published-print":{"date-parts":[[2017,7,31]]}},"alternative-id":["10.1145\/2997649"],"URL":"https:\/\/doi.org\/10.1145\/2997649","relation":{},"ISSN":["1550-4832","1550-4840"],"issn-type":[{"type":"print","value":"1550-4832"},{"type":"electronic","value":"1550-4840"}],"subject":[],"published":{"date-parts":[[2017,5]]},"assertion":[{"value":"2016-03-01","order":0,"name":"received","label":"Received","group":{"name":"publication_history","label":"Publication History"}},{"value":"2016-09-01","order":1,"name":"accepted","label":"Accepted","group":{"name":"publication_history","label":"Publication History"}},{"value":"2017-05-01","order":2,"name":"published","label":"Published","group":{"name":"publication_history","label":"Publication History"}}]}}