{"status":"ok","message-type":"work","message-version":"1.0.0","message":{"indexed":{"date-parts":[[2026,4,1]],"date-time":"2026-04-01T11:17:29Z","timestamp":1775042249795,"version":"3.50.1"},"reference-count":43,"publisher":"Association for Computing Machinery (ACM)","issue":"4","license":[{"start":{"date-parts":[[2021,10,15]],"date-time":"2021-10-15T00:00:00Z","timestamp":1634256000000},"content-version":"vor","delay-in-days":0,"URL":"https:\/\/www.acm.org\/publications\/policies\/copyright_policy#Background"}],"content-domain":{"domain":["dl.acm.org"],"crossmark-restriction":true},"short-container-title":["ACM Trans. Parallel Comput."],"published-print":{"date-parts":[[2021,12,31]]},"abstract":"<jats:p>\n            In this work, we present a novel bitsliced high-performance Viterbi algorithm suitable for high-throughput and data-intensive communication. A new column-major data representation scheme coupled with the bitsliced architecture is employed in our proposed Viterbi decoder that enables the maximum utilization of the parallel processing units in modern parallel accelerators. With the help of the proposed alteration of the data scheme, instead of the conventional bit-by-bit operations, 32-bit chunks of data are processed by each processing unit. This means that a single bitsliced parallel Viterbi decoder is capable of decoding 32 different chunks of data simultaneously. Here, the Viterbi\u2019s Add-Compare-Select procedure is implemented with our proposed bitslicing technique, where it is shown that the bitsliced operations for the Viterbi internal functionalities are efficient in terms of their performance and complexity. We have achieved this level of high parallelism while keeping an acceptable bit error rate performance for our proposed methodology. Our suggested hard and soft-decision Viterbi decoder implementations on GPU platforms outperform the fastest previously proposed works by\n            <jats:inline-formula content-type=\"math\/tex\">\n              <jats:inline-graphic xmlns:xlink=\"http:\/\/www.w3.org\/1999\/xlink\" content-type=\"gif\" xlink:href=\"3470642-inline1.gif\"\/>\n            <\/jats:inline-formula>\n            and\n            <jats:inline-formula content-type=\"math\/tex\">\n              <jats:inline-graphic xmlns:xlink=\"http:\/\/www.w3.org\/1999\/xlink\" content-type=\"gif\" xlink:href=\"3470642-inline2.gif\"\/>\n            <\/jats:inline-formula>\n            , achieving 21.41 and 8.24 Gbps on Tesla V100, respectively.\n          <\/jats:p>","DOI":"10.1145\/3470642","type":"journal-article","created":{"date-parts":[[2021,10,17]],"date-time":"2021-10-17T01:38:50Z","timestamp":1634434730000},"page":"1-25","update-policy":"https:\/\/doi.org\/10.1145\/crossmark-policy","source":"Crossref","is-referenced-by-count":6,"title":["A High-throughput Parallel Viterbi Algorithm\u00a0via Bitslicing"],"prefix":"10.1145","volume":"8","author":[{"given":"Saleh Khalaj","family":"Monfared","sequence":"first","affiliation":[{"name":"Institute for Research in Fundamental Sciences (IPM), Tehran, Iran"}]},{"given":"Omid","family":"Hajihassani","sequence":"additional","affiliation":[{"name":"University of Alberta, Edmonton, Canada"}]},{"given":"Vahid","family":"Mohsseni","sequence":"additional","affiliation":[{"name":"Institute for Research in Fundamental Sciences (IPM), Tehran, Iran"}]},{"given":"Dara","family":"Rahmati","sequence":"additional","affiliation":[{"name":"Shahid Beheshti University (SBU), Tehran, Iran"}]},{"given":"Saeid","family":"Gorgin","sequence":"additional","affiliation":[{"name":"Iranian Research Organization for Science and Technology (IROST), Tehran, Iran"}]}],"member":"320","published-online":{"date-parts":[[2021,10,15]]},"reference":[{"key":"e_1_3_1_2_2","doi-asserted-by":"publisher","DOI":"10.1007\/s11227-017-2120-9"},{"key":"e_1_3_1_3_2","doi-asserted-by":"publisher","DOI":"10.1109\/TASL.2011.2125954"},{"key":"e_1_3_1_4_2","doi-asserted-by":"publisher","DOI":"10.1109\/TIT.2009.2021379"},{"key":"e_1_3_1_5_2","doi-asserted-by":"publisher","DOI":"10.5555\/1201104"},{"key":"e_1_3_1_6_2","doi-asserted-by":"publisher","DOI":"10.1109\/ICC.1993.397371"},{"key":"e_1_3_1_7_2","doi-asserted-by":"publisher","DOI":"10.5555\/647932.757246"},{"key":"e_1_3_1_8_2","doi-asserted-by":"publisher","DOI":"10.1109\/ISTC.2016.7593092"},{"key":"e_1_3_1_9_2","unstructured":"T. M. Synchronization. 2017. Synchronization and Channel Coding. Report Concerning Space Data System Standard . Informational Report CCSDS (2017)."},{"key":"e_1_3_1_10_2","first-page":"1168","volume-title":"Proceedings of the 10th IEEE International Conference on Electronics, Circuits and Systems","volume":"3","author":"Cheng Shun-Wen","year":"2003","unstructured":"Shun-Wen Cheng. 2003. A high-speed magnitude comparator with small transistor count. In Proceedings of the 10th IEEE International Conference on Electronics, Circuits and Systems (ICECS\u201903), Vol. 3. IEEE, 1168\u20131171."},{"key":"e_1_3_1_11_2","doi-asserted-by":"publisher","DOI":"10.1109\/GLOCOM.1990.116778"},{"key":"e_1_3_1_12_2","doi-asserted-by":"publisher","DOI":"10.1109\/26.31176"},{"key":"e_1_3_1_13_2","doi-asserted-by":"publisher","DOI":"10.1109\/MVT.2013.2295069"},{"issue":"20","key":"e_1_3_1_14_2","article-title":"Intel AVX: New frontiers in performance improvements and energy efficiency","volume":"19","author":"Firasta Nadeem","year":"2008","unstructured":"Nadeem Firasta, Mark Buxton, Paula Jinbo, Kaveh Nasri, and Shihjong Kuo. 2008. Intel AVX: New frontiers in performance improvements and energy efficiency. Intel white paper 19, 20 (2008).","journal-title":"Intel white paper"},{"key":"e_1_3_1_15_2","doi-asserted-by":"publisher","DOI":"10.1109\/PROC.1973.9030"},{"key":"e_1_3_1_16_2","doi-asserted-by":"publisher","DOI":"10.1109\/ICRAIE.2014.6909193"},{"key":"e_1_3_1_17_2","doi-asserted-by":"publisher","DOI":"10.1109\/TVLSI.2017.2766925"},{"key":"e_1_3_1_18_2","doi-asserted-by":"publisher","DOI":"10.1109\/GLOCOM.1989.64230"},{"key":"e_1_3_1_19_2","doi-asserted-by":"publisher","DOI":"10.1109\/TPDS.2019.2911278"},{"key":"e_1_3_1_20_2","doi-asserted-by":"publisher","DOI":"10.1109\/TCSI.2016.2610187"},{"key":"e_1_3_1_21_2","doi-asserted-by":"publisher","DOI":"10.1109\/GLOCOM.1990.116774"},{"key":"e_1_3_1_22_2","doi-asserted-by":"publisher","DOI":"10.1109\/TCOMM.2020.2966723"},{"key":"e_1_3_1_23_2","first-page":"1664","volume-title":"Proceedings of the IEEE Global Telecommunications Conference","volume":"3","author":"Lee Inkyu","year":"2000","unstructured":"Inkyu Lee and Jeff L. Sonntag. 2000. A new architecture for the fast Viterbi algorithm. In Proceedings of the IEEE Global Telecommunications Conference (Globecom\u201900), Vol. 3. IEEE, 1664\u20131668."},{"key":"e_1_3_1_24_2","doi-asserted-by":"publisher","DOI":"10.1002\/cpe.3093"},{"key":"e_1_3_1_25_2","doi-asserted-by":"publisher","DOI":"10.1109\/TCSI.2005.849106"},{"key":"e_1_3_1_26_2","unstructured":"Shu Lin and Marc Fossorier. 1999. Tail Biting Trellis Representation of Codes: Decoding and Construction . Technical Report to NASA."},{"key":"e_1_3_1_27_2","doi-asserted-by":"publisher","DOI":"10.1109\/79.410439"},{"key":"e_1_3_1_28_2","unstructured":"Alireza Mohammadidoost and Matin Hashemi. 2020. High-throughput and memory-efficient parallel viterbi decoder for convolutional codes on GPU. arXiv:2011.09337. Retrieved from https:\/\/arxiv.org\/abs\/2011.09337."},{"key":"e_1_3_1_29_2","doi-asserted-by":"publisher","DOI":"10.1145\/3409390.3409402"},{"key":"e_1_3_1_30_2","unstructured":"CUDA Nvidia. 2007. Compute unified device architecture programming guide. (2007)."},{"key":"e_1_3_1_31_2","doi-asserted-by":"publisher","DOI":"10.1109\/WCSP.2016.7752638"},{"key":"e_1_3_1_32_2","doi-asserted-by":"publisher","DOI":"10.1109\/TCSI.2013.2290831"},{"key":"e_1_3_1_33_2","doi-asserted-by":"publisher","DOI":"10.1109\/TCSI.2013.2290831"},{"key":"e_1_3_1_34_2","doi-asserted-by":"publisher","DOI":"10.3115\/1220355.1220379"},{"key":"e_1_3_1_35_2","doi-asserted-by":"publisher","DOI":"10.3115\/116580.116591"},{"key":"e_1_3_1_36_2","doi-asserted-by":"publisher","DOI":"10.1109\/ICDSP.2015.7251985"},{"key":"e_1_3_1_37_2","doi-asserted-by":"publisher","DOI":"10.5555\/2220077.2220227"},{"key":"e_1_3_1_38_2","doi-asserted-by":"publisher","DOI":"10.1109\/PIMRCW.2019.8880815"},{"key":"e_1_3_1_39_2","doi-asserted-by":"publisher","DOI":"10.1016\/j.bpj.2012.04.009"},{"key":"e_1_3_1_40_2","doi-asserted-by":"publisher","DOI":"10.1109\/TIT.1967.1054010"},{"key":"e_1_3_1_41_2","doi-asserted-by":"publisher","DOI":"10.1109\/TCOM.1971.1090700"},{"key":"e_1_3_1_42_2","doi-asserted-by":"publisher","DOI":"10.1109\/ICASSP40776.2020.9053453"},{"key":"e_1_3_1_43_2","doi-asserted-by":"publisher","DOI":"10.5555\/197029"},{"key":"e_1_3_1_44_2","doi-asserted-by":"publisher","DOI":"10.1109\/JSSC.2003.813250"}],"container-title":["ACM Transactions on Parallel Computing"],"original-title":[],"language":"en","link":[{"URL":"https:\/\/dl.acm.org\/doi\/10.1145\/3470642","content-type":"unspecified","content-version":"vor","intended-application":"text-mining"},{"URL":"https:\/\/dl.acm.org\/doi\/pdf\/10.1145\/3470642","content-type":"unspecified","content-version":"vor","intended-application":"similarity-checking"}],"deposited":{"date-parts":[[2025,6,17]],"date-time":"2025-06-17T20:18:55Z","timestamp":1750191535000},"score":1,"resource":{"primary":{"URL":"https:\/\/dl.acm.org\/doi\/10.1145\/3470642"}},"subtitle":[],"short-title":[],"issued":{"date-parts":[[2021,10,15]]},"references-count":43,"journal-issue":{"issue":"4","published-print":{"date-parts":[[2021,12,31]]}},"alternative-id":["10.1145\/3470642"],"URL":"https:\/\/doi.org\/10.1145\/3470642","relation":{},"ISSN":["2329-4949","2329-4957"],"issn-type":[{"value":"2329-4949","type":"print"},{"value":"2329-4957","type":"electronic"}],"subject":[],"published":{"date-parts":[[2021,10,15]]},"assertion":[{"value":"2020-05-01","order":0,"name":"received","label":"Received","group":{"name":"publication_history","label":"Publication History"}},{"value":"2021-04-01","order":1,"name":"accepted","label":"Accepted","group":{"name":"publication_history","label":"Publication History"}},{"value":"2021-10-15","order":2,"name":"published","label":"Published","group":{"name":"publication_history","label":"Publication History"}}]}}