{"status":"ok","message-type":"work","message-version":"1.0.0","message":{"indexed":{"date-parts":[[2025,10,10]],"date-time":"2025-10-10T01:15:42Z","timestamp":1760058942932,"version":"build-2065373602"},"reference-count":44,"publisher":"MDPI AG","issue":"5","license":[{"start":{"date-parts":[[2025,5,7]],"date-time":"2025-05-07T00:00:00Z","timestamp":1746576000000},"content-version":"vor","delay-in-days":0,"URL":"https:\/\/creativecommons.org\/licenses\/by\/4.0\/"}],"content-domain":{"domain":[],"crossmark-restriction":false},"short-container-title":["Computers"],"abstract":"<jats:p>This work presents two variants of an odd\u2013even sort algorithm that are implemented in a dataflow-based polymorphic computing architecture. The two odd\u2013even sort algorithms are the \u201cfully unrolled\u201d variant and the \u201ccompact\u201d variant. They are used as test kernels to evaluate the polymorphic computing architecture. Incidentally, these two odd\u2013even sort algorithm variants can be readily adapted to ASIC (Application-Specific Integrated Circuit) and FPGA (Field Programmable Gate Array) designs. Additionally, two methods of placing the sort algorithms\u2019 instructions in different configurations of the polymorphic computing architecture to achieve performance gains are furnished: a genetic-algorithm-based instruction placement method and a deterministic instruction placement method. Finally, a comparative study of the odd\u2013even sort algorithm in several configurations of the polymorphic computing architecture is presented. The results show that scaling up the number of processing cores in the polymorphic architecture to the maximum amount of instantaneously exploitable parallelism improves the speed of the sort algorithms. Additionally, the sort algorithms that were placed in the polymorphic computing architecture configurations by the genetic instruction placement algorithm generally performed better than when they were placed by the deterministic instruction placement algorithm.<\/jats:p>","DOI":"10.3390\/computers14050181","type":"journal-article","created":{"date-parts":[[2025,5,7]],"date-time":"2025-05-07T08:20:53Z","timestamp":1746606053000},"page":"181","update-policy":"https:\/\/doi.org\/10.3390\/mdpi_crossmark_policy","source":"Crossref","is-referenced-by-count":0,"title":["Parallel Sort Implementation and Evaluation in a Dataflow-Based Polymorphic Computing Architecture"],"prefix":"10.3390","volume":"14","author":[{"ORCID":"https:\/\/orcid.org\/0000-0003-2944-2462","authenticated-orcid":false,"given":"David","family":"Hentrich","sequence":"first","affiliation":[{"name":"Department of Electrical and Computer Engineering, Illinois Institute of Technology, Chicago, IL 60616, USA"}]},{"ORCID":"https:\/\/orcid.org\/0000-0002-2376-8325","authenticated-orcid":false,"given":"Erdal","family":"Oruklu","sequence":"additional","affiliation":[{"name":"Department of Electrical and Computer Engineering, Illinois Institute of Technology, Chicago, IL 60616, USA"}]},{"ORCID":"https:\/\/orcid.org\/0000-0002-2655-6950","authenticated-orcid":false,"given":"Jafar","family":"Saniie","sequence":"additional","affiliation":[{"name":"Department of Electrical and Computer Engineering, Illinois Institute of Technology, Chicago, IL 60616, USA"}]}],"member":"1968","published-online":{"date-parts":[[2025,5,7]]},"reference":[{"key":"ref_1","doi-asserted-by":"crossref","first-page":"358","DOI":"10.4236\/cs.2011.24049","article-title":"Polymorphic Computing: Definition, Trends, and a New Agent-Based Architecture","volume":"2","author":"Hentrich","year":"2011","journal-title":"Circuits Syst."},{"key":"ref_2","doi-asserted-by":"crossref","first-page":"44","DOI":"10.1109\/MC.2004.65","article-title":"Scaling to the End of Silicon with EDGE Architectures","volume":"37","author":"Burger","year":"2004","journal-title":"IEEE Comput."},{"key":"ref_3","unstructured":"McDonald, R., Burger, D., Keckler, S.W., Sankaralingam, K., and Nagarajan, R. (2005). TRIPS Processor Reference Manual, Department of Computer Sciences, The University of Texas at Austin. Technical Report TR-05-19."},{"key":"ref_4","doi-asserted-by":"crossref","unstructured":"Gratz, P., Kim, C., McDonald, R., Keckler, S.W., and Burger, D. (2006, January 1\u20134). Implementation and Evaluation of On-Chip Network Architectures. Proceedings of the IEEE International Conference on Computer Design (ICCD 2006), San Jose, CA, USA.","DOI":"10.1109\/ICCD.2006.4380859"},{"key":"ref_5","doi-asserted-by":"crossref","first-page":"41","DOI":"10.1109\/MM.2007.4378782","article-title":"On-Chip Interconnection Networks of the TRIPS Chip","volume":"27","author":"Gratz","year":"2007","journal-title":"IEEE Micro"},{"key":"ref_6","unstructured":"Gebhart, M., Maher, B.A., Coons, K.E., Diamond, J., Gratz, P., Marino, M., Ranganathan, N., Robatmili, B., Smith, A., and Burrill, J. (2008). An Evaluation of the TRIPS Computer System (Extended Technical Report), Computer Architecture and Technology Laboratory, Department of Computer Sciences, The University of Texas at Austin. Technical Report TR-08-31."},{"key":"ref_7","doi-asserted-by":"crossref","first-page":"25","DOI":"10.1109\/MM.2002.997877","article-title":"The Raw Microprocessor: A Computational Fabric for Software Circuits and General-Purpose Programs","volume":"22","author":"Taylor","year":"2002","journal-title":"IEEE Micro"},{"key":"ref_8","unstructured":"Michael, T. (2005). The Raw Prototype Design Document V5.02, Department of Electrical and Computer Science, Massachusetts Institue of Technology."},{"key":"ref_9","doi-asserted-by":"crossref","first-page":"44","DOI":"10.1038\/scientificamerican0899-60","article-title":"Raw Computation","volume":"281","author":"Agarwal","year":"1999","journal-title":"Sci. Am."},{"key":"ref_10","doi-asserted-by":"crossref","unstructured":"Lee, W., Barua, R., Frank, M., Srikrishna, D., Babb, J., Sarkar, V., and Amarasinghe, S. (1998, January 2\u20137). Space-Time Scheduling of Instruction-Level Prallelism on a Raw Machine. Proceedings of the Eighth International Conference on Architectural Support for Programming Languages and Operating Systems, San Jose, CA, USA.","DOI":"10.1145\/291069.291018"},{"key":"ref_11","unstructured":"Granacki, J., and Vahey, M. (2003). MONARCH: A Morphable Networked Micro-ARCHitecture, USC\/Information Sciences Institute and Raytheon. Technical Report."},{"key":"ref_12","unstructured":"Granacki, J.J. (2005). MONARCH: Next Generation SoC (Supercomputer on a Chip), USC\/Information Sciences Institute. Technical Report."},{"key":"ref_13","unstructured":"Vahey, M., Granacki, J., Lewins, L., Davidoff, D., Draper, J., Steele, C., Groves, G., Kramer, M., LaCoss, J., and Prager, K. (2006, January 19\u201321). MONARCH: A First Generation Polymorphic Computing Processor. Proceedings of the 10th Annual High Performance Embedded Computing Workshop 2006 (HPEC 2006), Lexington, MA, USA."},{"key":"ref_14","doi-asserted-by":"crossref","first-page":"1363","DOI":"10.1109\/TC.2004.104","article-title":"The MOLEN polymorphic processor","volume":"53","author":"Vassiliadis","year":"2004","journal-title":"IEEE Trans. Comput."},{"key":"ref_15","doi-asserted-by":"crossref","unstructured":"Hoozemans, J., van Straten, J., and Wong, S. (2017, January 16\u201318). Using a polymorphic VLIW processor to improve schedulability and performance for mixed-criticality systems. Proceedings of the 2017 IEEE 23rd International Conference on Embedded and Real-Time Computing Systems and Applications (RTCSA), Hsinchu, Taiwan.","DOI":"10.1109\/RTCSA.2017.8046315"},{"key":"ref_16","doi-asserted-by":"crossref","first-page":"2164","DOI":"10.1109\/TPDS.2018.2819670","article-title":"A Dataflow Processor as the Basis of a Tiled Polymorphic Computing Architecture with Fine-Grain Instruction Migration","volume":"29","author":"Hentrich","year":"2018","journal-title":"IEEE Trans. Parallel Distrib. Syst."},{"key":"ref_17","unstructured":"Hentrich, D.R. (2017). Operation Cell Data Processor Systems and Methods. (Application 15,844,810), U.S. Patent."},{"key":"ref_18","unstructured":"Hentrich, D. (2018). A Polymorphic Computing Architecture Based on a Dataflow Processor. [Ph.D. Thesis, Illinois Institute of Technology]."},{"key":"ref_19","doi-asserted-by":"crossref","unstructured":"Hentrich, D., Oruklu, E., and Saniie, J. (2020, January 10\u201321). Fine-Grained Instruction Placement in Polymorphic Computing Architectures. Proceedings of the 2020 IEEE International Symposium on Circuits and Systems (ISCAS), Virtual.","DOI":"10.1109\/ISCAS45731.2020.9181277"},{"key":"ref_20","doi-asserted-by":"crossref","unstructured":"Dennis, J.B., and Misunas, D.P. (1975, January 20\u201322). A Preliminary Architecture for a Basic Data-Flow Processor. Proceedings of the 2nd annual symposium on Computer Architecture (ISCA \u201975), New York, NY, USA.","DOI":"10.1145\/642089.642111"},{"key":"ref_21","unstructured":"Silc, J., Robic, B., and Ungerer, T. (1999). Processor Architecture: From Dataflow to Superscalar and Beyond, Springer."},{"key":"ref_22","unstructured":"Shiva, S.G. (1996). Pipelined and Parallel Computer Architectures, HarperCollins."},{"key":"ref_23","unstructured":"Hennessy, J.L., and Patterson, D.A. (2007). Computer Architecture: A Quantitative Approach, Morgan Kaufmann Publishers. [4th ed.]."},{"key":"ref_24","doi-asserted-by":"crossref","first-page":"8","DOI":"10.1145\/2465.214917","article-title":"Reduced Instruction Set Computers","volume":"28","author":"Patterson","year":"1985","journal-title":"Commun. ACM"},{"key":"ref_25","doi-asserted-by":"crossref","first-page":"28","DOI":"10.1109\/45.127642","article-title":"RISC-(reduced instruction set computers)","volume":"10","author":"Chow","year":"1991","journal-title":"IEEE Potentials"},{"key":"ref_26","doi-asserted-by":"crossref","first-page":"38","DOI":"10.1109\/MSPEC.1985.6370784","article-title":"Toward simpler, faster computers: By omitting unnecessary functions, designers of reduced-instruction-set computers increase system speed and hold down equipment costs","volume":"22","author":"Wallich","year":"1985","journal-title":"IEEE Spectr."},{"key":"ref_27","doi-asserted-by":"crossref","unstructured":"Batcher, K.E. (1968). Sorting Networks and Their Applications, ACM.","DOI":"10.1145\/1468075.1468121"},{"key":"ref_28","doi-asserted-by":"crossref","first-page":"2387","DOI":"10.1016\/j.comnet.2009.04.007","article-title":"Improved layout of the odd-even sorting network","volume":"53","year":"2009","journal-title":"Comput. Netw."},{"key":"ref_29","doi-asserted-by":"crossref","unstructured":"Lipu, A.R., Amin, R., Mondal, M.N.I., and Mamun, M.A. (2016, January 8\u201310). Exploiting parallelism for faster implementation of Bubble sort algorithm using FPGA. Proceedings of the 2016 2nd International Conference on Electrical, Computer Telecommunication Engineering (ICECTE), Rajshahi, Bangladesh.","DOI":"10.1109\/ICECTE.2016.7879576"},{"key":"ref_30","doi-asserted-by":"crossref","first-page":"345","DOI":"10.1016\/0167-8191(87)90032-9","article-title":"Implementation of bubble sort and the odd-even transposition sort on a rack of transputers","volume":"4","author":"Modi","year":"1987","journal-title":"Parallel Comput."},{"key":"ref_31","doi-asserted-by":"crossref","unstructured":"Korat, U.A., Yadav, P., and Shah, H. (2017, January 19\u201321). An efficient hardware implementation of vector-based odd-even merge sorting. Proceedings of the 2017 IEEE 8th Annual Ubiquitous Computing, Electronics and Mobile Communication Conference (UEMCON), New York, NY, USA.","DOI":"10.1109\/UEMCON.2017.8249010"},{"key":"ref_32","unstructured":"(2022, August 26). Odd-Even Sort. Available online: https:\/\/en.wikipedia.org\/wiki\/Odd-even_sort."},{"key":"ref_33","doi-asserted-by":"crossref","unstructured":"Amdahl, G.M. (1967, January 18\u201320). Validity of the single processor approach to achieving large scale computing capabilities. Proceedings of the Spring Joint Computer Conference, Atlantic City, NJ, USA.","DOI":"10.1145\/1465482.1465560"},{"key":"ref_34","unstructured":"Foster, I. (1995). Designing and Building Parallel Programs: Concepts and Tools for Parallel Software Engineering, Addison-Wesley."},{"key":"ref_35","doi-asserted-by":"crossref","unstructured":"Jmaa, Y.B., Ali, K.M., Duvivier, D., Jemaa, M.B., and Atitallah, R.B. (2017, January 17\u201321). An Efficient Hardware Implementation of TimSort and MergeSort Algorithms Using High Level Synthesis. Proceedings of the 2017 International Conference on High Performance Computing & Simulation (HPCS), Genoa, Italy.","DOI":"10.1109\/HPCS.2017.92"},{"key":"ref_36","doi-asserted-by":"crossref","first-page":"1310","DOI":"10.1109\/12.895849","article-title":"An optimal hardware-algorithm for sorting using a fixed-size parallel sorting device","volume":"49","author":"Olarlu","year":"2000","journal-title":"IEEE Trans. Comput."},{"key":"ref_37","doi-asserted-by":"crossref","unstructured":"Skliarova, I., Sklyarov, V., Mihhailov, D., and Sudnitson, A. (2012, January 25\u201328). Implementation of sorting algorithms in reconfigurable hardware. Proceedings of the 2012 16th IEEE Mediterranean Electrotechnical Conference, Yasmine Hammamet, Tunisia.","DOI":"10.1109\/MELCON.2012.6196391"},{"key":"ref_38","doi-asserted-by":"crossref","unstructured":"Mihhailov, D., Sklyarov, V., Skliarova, I., and Sudnitson, A. (2010, January 13\u201315). Parallel FPGA-Based Implementation of Recursive Sorting Algorithms. Proceedings of the 2010 International Conference on Reconfigurable Computing and FPGAs, Cancun, Mexico.","DOI":"10.1109\/ReConFig.2010.30"},{"key":"ref_39","doi-asserted-by":"crossref","first-page":"1601","DOI":"10.1109\/TVLSI.2019.2912554","article-title":"RTHS: A low-cost high-performance real-time hardware sorter, using a multidimensional sorting algorithm","volume":"27","author":"Norollah","year":"2019","journal-title":"IEEE Trans. Very Large Scale Integr. (VLSI) Syst."},{"key":"ref_40","doi-asserted-by":"crossref","first-page":"1076","DOI":"10.1109\/JSSC.2003.811982","article-title":"SORTCHIP: A VLSI implementation of a hardware algorithm for continuous data sorting","volume":"38","author":"Colavita","year":"2023","journal-title":"IEEE J. Solid-State Circuits"},{"key":"ref_41","first-page":"213","article-title":"A Comparative Study of Sorting Algorithms with FPGA Acceleration by High Level Synthesis","volume":"23","author":"Duvivier","year":"2019","journal-title":"Comput. Sist."},{"key":"ref_42","unstructured":"Alif, A.F., Islam, S.M.R., and Deb, P. (2019, January 11\u201312). Design and implementation of sorting algorithms based on FPGA. Proceedings of the 2019 International Conference on Computer, Communication, Chemical, Materials and Electronic Engineering (IC4ME2), Rajshahi, Bangladesh."},{"key":"ref_43","unstructured":"Jalilv, A.H., Banitaba, F.S., Estiri, S.N., Aygun, S., and Najafi, M.H. (2023). Sorting it out in Hardware: A State-of-the-Art Survey. arXiv."},{"key":"ref_44","unstructured":"Keahey, K., Anderson, J., Zhen, Z., Riteau, P., Ruth, P., Stanzione, D., Cevik, M., Colleran, J., Gunawi, H.S., and Hammock, C. (2020, January 15\u201317). Lessons Learned from the Chameleon Testbed. Proceedings of the 2020 USENIX Annual Technical Conference (USENIX ATC \u201920), Online."}],"container-title":["Computers"],"original-title":[],"language":"en","link":[{"URL":"https:\/\/www.mdpi.com\/2073-431X\/14\/5\/181\/pdf","content-type":"unspecified","content-version":"vor","intended-application":"similarity-checking"}],"deposited":{"date-parts":[[2025,10,9]],"date-time":"2025-10-09T17:28:39Z","timestamp":1760030919000},"score":1,"resource":{"primary":{"URL":"https:\/\/www.mdpi.com\/2073-431X\/14\/5\/181"}},"subtitle":[],"short-title":[],"issued":{"date-parts":[[2025,5,7]]},"references-count":44,"journal-issue":{"issue":"5","published-online":{"date-parts":[[2025,5]]}},"alternative-id":["computers14050181"],"URL":"https:\/\/doi.org\/10.3390\/computers14050181","relation":{},"ISSN":["2073-431X"],"issn-type":[{"type":"electronic","value":"2073-431X"}],"subject":[],"published":{"date-parts":[[2025,5,7]]}}}