{"status":"ok","message-type":"work","message-version":"1.0.0","message":{"indexed":{"date-parts":[[2026,2,14]],"date-time":"2026-02-14T11:28:38Z","timestamp":1771068518652,"version":"3.50.1"},"reference-count":42,"publisher":"Springer Science and Business Media LLC","issue":"1","license":[{"start":{"date-parts":[[2020,7,15]],"date-time":"2020-07-15T00:00:00Z","timestamp":1594771200000},"content-version":"tdm","delay-in-days":0,"URL":"https:\/\/creativecommons.org\/licenses\/by\/4.0"},{"start":{"date-parts":[[2020,7,15]],"date-time":"2020-07-15T00:00:00Z","timestamp":1594771200000},"content-version":"vor","delay-in-days":0,"URL":"https:\/\/creativecommons.org\/licenses\/by\/4.0"}],"content-domain":{"domain":["link.springer.com"],"crossmark-restriction":false},"short-container-title":["Soft Comput"],"published-print":{"date-parts":[[2021,1]]},"abstract":"<jats:title>Abstract<\/jats:title><jats:p>Nowadays, embedded systems are comprised of heterogeneous multi-core architectures, i.e., CPUs and GPUs. If the application is mapped to an appropriate processing core, then these architectures provide many performance benefits to applications. Typically, programmers map sequential applications to CPU and parallel applications to GPU. The task mapping becomes challenging because of the usage of evolving and complex CPU- and GPU-based architectures. This paper presents an approach to map the OpenCL application to heterogeneous multi-core architecture by determining the application suitability and processing capability. The classification is achieved by developing a machine learning-based device suitability classifier that predicts which processor has the highest computational compatibility to run OpenCL applications. In this paper, 20 distinct features are proposed that are extracted by using the developed LLVM-based static analyzer. 
In order to select the best subset of features, feature selection is performed by using both correlation analysis and the feature importance method. For the class imbalance problem, we use and compare synthetic minority over-sampling method with and without feature selection. Instead of hand-tuning the machine learning classifier, we use the tree-based pipeline optimization method to select the best classifier and its hyper-parameter. We then compare the optimized selected method with traditional algorithms, i.e., random forest, decision tree, Na\u00efve Bayes and KNN. We apply our novel approach on extensively used OpenCL benchmarks, i.e., AMD and Polybench. The dataset contains 653 training and 277 testing applications. We test the classification results using four performance metrics, i.e., <jats:italic>F<\/jats:italic>-measure, precision, recall and <jats:inline-formula><jats:alternatives><jats:tex-math>$$R^2$$<\/jats:tex-math><mml:math xmlns:mml=\"http:\/\/www.w3.org\/1998\/Math\/MathML\">\n                  <mml:msup>\n                    <mml:mi>R<\/mml:mi>\n                    <mml:mn>2<\/mml:mn>\n                  <\/mml:msup>\n                <\/mml:math><\/jats:alternatives><\/jats:inline-formula>. The optimized and reduced feature subset model achieved a high <jats:italic>F<\/jats:italic>-measure of 0.91 and <jats:inline-formula><jats:alternatives><jats:tex-math>$$R^2$$<\/jats:tex-math><mml:math xmlns:mml=\"http:\/\/www.w3.org\/1998\/Math\/MathML\">\n                  <mml:msup>\n                    <mml:mi>R<\/mml:mi>\n                    <mml:mn>2<\/mml:mn>\n                  <\/mml:msup>\n                <\/mml:math><\/jats:alternatives><\/jats:inline-formula> of 0.76. 
The proposed framework automatically distributes the workload based on the application requirement and processor compatibility.<\/jats:p>","DOI":"10.1007\/s00500-020-05152-8","type":"journal-article","created":{"date-parts":[[2020,7,15]],"date-time":"2020-07-15T10:04:38Z","timestamp":1594807478000},"page":"407-420","update-policy":"https:\/\/doi.org\/10.1007\/springer_crossmark_policy","source":"Crossref","is-referenced-by-count":24,"title":["A load balance multi-scheduling model for OpenCL kernel tasks in an integrated cluster"],"prefix":"10.1007","volume":"25","author":[{"ORCID":"https:\/\/orcid.org\/0000-0002-3933-4273","authenticated-orcid":false,"given":"Usman","family":"Ahmed","sequence":"first","affiliation":[],"role":[{"role":"author","vocabulary":"crossref"}]},{"ORCID":"https:\/\/orcid.org\/0000-0001-8768-9709","authenticated-orcid":false,"given":"Jerry Chun-Wei","family":"Lin","sequence":"additional","affiliation":[],"role":[{"role":"author","vocabulary":"crossref"}]},{"given":"Gautam","family":"Srivastava","sequence":"additional","affiliation":[],"role":[{"role":"author","vocabulary":"crossref"}]},{"given":"Muhammad","family":"Aleem","sequence":"additional","affiliation":[],"role":[{"role":"author","vocabulary":"crossref"}]}],"member":"297","published-online":{"date-parts":[[2020,7,15]]},"reference":[{"key":"5152_CR5","doi-asserted-by":"crossref","unstructured":"Ahmed U, Zafar L, Qayyum F, Islam MA (2018) Irony detector at semeval\u20142018 task 3: irony detection in English tweets using word graph. In: The international workshop on semantic evaluation, pp 581\u2013586","DOI":"10.18653\/v1\/S18-1095"},{"key":"5152_CR2","doi-asserted-by":"crossref","unstructured":"Ahmed U, Aleem M, Noman Khalid Y, Arshad Islam M, Azhar Iqbal M (2019a) RALB-HC: a resource-aware load balancer for heterogeneous cluster. 
In: Concurrency and computation: practice and experience, p e5606","DOI":"10.1002\/cpe.5606"},{"key":"5152_CR4","doi-asserted-by":"crossref","unstructured":"Ahmed U, Liaquat H, Ahmed L, Hussain SJ (2019b) Suggestion miner at semeval-2019 task 9: suggestion detection in online forum using word graph. In: The international workshop on semantic evaluation, pp 1242\u20131246","DOI":"10.18653\/v1\/S19-2218"},{"key":"5152_CR1","doi-asserted-by":"publisher","first-page":"6635","DOI":"10.1007\/s00500-019-04303-w","volume":"34","author":"U Ahmed","year":"2020","unstructured":"Ahmed U, Waqas H, Afzal MT (2020) Pre-production box-office success quotient forecasting. Soft Comput 34:6635\u20136653","journal-title":"Soft Comput"},{"key":"5152_CR6","doi-asserted-by":"crossref","unstructured":"Albayrak OE, Akturk I, Ozturk O (2012) Effective kernel mapping for OpenCL applications in heterogeneous platforms. In: The international conference on parallel processing workshops, pp 81\u201388","DOI":"10.1109\/ICPPW.2012.14"},{"key":"5152_CR7","doi-asserted-by":"crossref","unstructured":"Alizadeh NS, Momtazpour M (2020) Machine learning-based interference detection in GPGPU concurrent kernel execution. In: The international computer conference. Computer Society of Iran, pp 1\u20134","DOI":"10.1109\/CSICC49403.2020.9050074"},{"key":"5152_CR8","doi-asserted-by":"publisher","first-page":"81","DOI":"10.1109\/MCOM.2019.1800624","volume":"57","author":"M Aloqaily","year":"2019","unstructured":"Aloqaily M, Ridhawi IA, Salameh HB, Jararweh Y (2019) Data and service management in densely crowded environments: challenges, opportunities, and recent developments. 
IEEE Commun Mag 57:81\u201387","journal-title":"IEEE Commun Mag"},{"key":"5152_CR9","doi-asserted-by":"publisher","first-page":"165","DOI":"10.1007\/978-3-030-38557-6_8","volume-title":"Handbook of big data privacy","author":"M Amrollahi","year":"2020","unstructured":"Amrollahi M, Hadayeghparast S, Karimipour H, Derakhshan F, Srivastava G (2020) Enhancing network security via machine learning: opportunities and challenges. In: Choo K-KR, Dehghantanha A (eds) Handbook of big data privacy. Springer, Berlin, pp 165\u2013189"},{"issue":"2","key":"5152_CR10","doi-asserted-by":"publisher","first-page":"187","DOI":"10.1002\/cpe.1631","volume":"23","author":"C Augonnet","year":"2011","unstructured":"Augonnet C, Thibault S, Namyst R, Wacrenier PA (2011) Starpu: a unified platform for task scheduling on heterogeneous multicore architectures. Concurr Comput Pract Exp 23(2):187\u2013198","journal-title":"Concurr Comput Pract Exp"},{"key":"5152_CR11","doi-asserted-by":"crossref","unstructured":"Becchi M, Byna S, Cadambi S, Chakradhar S (2010) Data-aware scheduling of legacy kernels on heterogeneous platforms with distributed memory. In: ACM symposium on parallelism in algorithms and architectures, pp 82\u201391","DOI":"10.1145\/1810479.1810498"},{"key":"5152_CR12","doi-asserted-by":"crossref","unstructured":"Belviranli ME, Bhuyan LN, Gupta R (2013) A dynamic self-scheduling scheme for heterogeneous multiprocessor architectures. ACM Trans Archit Code Optim 9. Article no. 57","DOI":"10.1145\/2400682.2400716"},{"key":"5152_CR13","doi-asserted-by":"publisher","first-page":"886","DOI":"10.1007\/s11227-013-0870-6","volume":"65","author":"HJ Choi","year":"2013","unstructured":"Choi HJ, Son DO, Kang SG, Kim JM, Lee H-H, Kim CH (2013) An efficient scheduling scheme using estimated execution time for heterogeneous computing systems. 
J Supercomput 65:886\u2013902","journal-title":"J Supercomput"},{"key":"5152_CR14","doi-asserted-by":"crossref","unstructured":"Daraghmeh M, Al Ridhawi I, Aloqaily M, Jararweh Y, Agarwal A (2019) A power management approach to reduce energy consumption for edge computing servers. In: Fourth international conference on fog and mobile edge computing, pp 259\u2013264","DOI":"10.1109\/FMEC.2019.8795328"},{"key":"5152_CR15","doi-asserted-by":"publisher","first-page":"1750008","DOI":"10.1142\/S0129626417500086","volume":"27","author":"A Ghose","year":"2017","unstructured":"Ghose A, Dokara L, Dey S, Mitra P (2017) A framework for opencl task scheduling on heterogeneous multicores. Parallel Process Lett 27:1750008","journal-title":"Parallel Process Lett"},{"key":"5152_CR16","doi-asserted-by":"crossref","unstructured":"Ghose A, Dey S, Mitra P, Chaudhuri M (2016) Divergence aware automated partitioning of OpenCL workloads. In: The India software engineering conference, pp 131\u2013135","DOI":"10.1145\/2856636.2856639"},{"key":"5152_CR17","unstructured":"Gregg C, Brantley J, Hazelwood K (2010) Contention-aware scheduling of parallel code for heterogeneous systems. In: USENIX workshop on hot topics in parallelism, pp 1\u201310"},{"key":"5152_CR18","doi-asserted-by":"crossref","unstructured":"Grewe D, O\u2019Boyle MF (2011) A static task partitioning approach for heterogeneous systems using OpenCL. In: The international conference on compiler construction, pp 286\u2013305","DOI":"10.1007\/978-3-642-19861-8_16"},{"key":"5152_CR19","doi-asserted-by":"crossref","unstructured":"Grewe D, Wang Z, O\u2019Boyle MFP (2013) Portable mapping of data parallel programs to OpenCL for heterogeneous systems. 
In: IEEE\/ACM international symposium on code generation and optimization, pp 1\u201310","DOI":"10.1109\/CGO.2013.6494993"},{"key":"5152_CR20","doi-asserted-by":"crossref","unstructured":"Hechtman BA, Sorin DJ (2013) Exploring memory consistency for massively threaded throughput-oriented processors. In: Annual international symposium on computer architecture, pp 201\u2013212","DOI":"10.1145\/2508148.2485940"},{"key":"5152_CR21","doi-asserted-by":"crossref","unstructured":"Huchant P, Counilh MC, Barthou D (2016) Automatic OpenCL task adaptation for heterogeneous architectures. In: European conference on parallel processing, pp 684\u2013696","DOI":"10.1007\/978-3-319-43659-3_50"},{"key":"5152_CR22","doi-asserted-by":"publisher","first-page":"47379","DOI":"10.1109\/ACCESS.2019.2906863","volume":"7","author":"Z Iftikha","year":"2019","unstructured":"Iftikha Z, Jangsher S, Qureshi HK, Aloqaily M (2019) Resource efficient allocation and RRH placement for backhaul of moving small cells. IEEE Access 7:47379\u201347389","journal-title":"IEEE Access"},{"key":"5152_CR23","doi-asserted-by":"crossref","unstructured":"Ishtiaq A, Islam MA, Iqbal MA, Aleem M, Ahmed U (2019) Graph centrality based spam SMS detection. In: The international Bhurban conference on applied sciences and technology, pp 629\u2013633","DOI":"10.1109\/IBCAST.2019.8667174"},{"key":"5152_CR24","doi-asserted-by":"publisher","first-page":"5399","DOI":"10.1007\/s11227-018-2435-1","volume":"74","author":"YN Khalid","year":"2018","unstructured":"Khalid YN, Aleem M, Prodan R, Iqbal MA, Islam MA (2018) E-OSched: a load balancing scheduler for heterogeneous multicores. 
J Supercomput 74:5399\u20135431","journal-title":"J Supercomput"},{"key":"5152_CR25","doi-asserted-by":"publisher","first-page":"79","DOI":"10.1016\/j.jpdc.2019.05.015","volume":"132","author":"YN Khalid","year":"2019","unstructured":"Khalid YN, Aleem M, Ahmed U, Islam MA, Iqbal MA (2019) Troodon: a machine-learning based load-balancing application scheduler for CPU\u2013GPU system. J Parallel Distrib Comput 132:79\u201394","journal-title":"J Parallel Distrib Comput"},{"key":"5152_CR26","doi-asserted-by":"crossref","unstructured":"Khan NA, Latif MB, Pervaiz N, Baig M, Khatoon H, Baig MZ, Burney A (2019) Smart scheduler for CUDA programming in heterogeneous CPU\/GPU environment. In: The international conference on computer modeling and simulation, pp 250\u2013253","DOI":"10.1145\/3307363.3307377"},{"key":"5152_CR27","doi-asserted-by":"crossref","unstructured":"Kofler K, Grasso I, Cosenza B, Fahringer T (2013) An automatic input sensitive approach for heterogeneous task partitioning. In: The international conference on supercomputing, pp 149\u2013160","DOI":"10.1145\/2464996.2465007"},{"key":"5152_CR28","unstructured":"Krishna M, Wang Z, O\u2019Boyle MFP (2013) Smart, adaptive mapping of parallelism in the presence of external workload. In: IEEE\/ACM international symposium on code generation and optimization, pp 1\u201310"},{"key":"5152_CR29","first-page":"1","volume":"5","author":"C Lattner","year":"2008","unstructured":"Lattner C (2008) LLVM and clang: next generation compiler technology. BSD Conf 5:1\u201310","journal-title":"BSD Conf"},{"key":"5152_CR30","doi-asserted-by":"crossref","unstructured":"Lee VW, Kim C, Chhugani J, Deisher M, Kim D, Nguyen AD, Satish N, Smelyanskiy M, Chennupaty S, Hammarlund P, Singhal R, Dubey P (2010) Debunking the 100X GPU vs. CPU myth: an evaluation of throughput computing on CPU and GPU. 
In: Annual international symposium on computer architecture, pp 451\u2013460","DOI":"10.1145\/1816038.1816021"},{"key":"5152_CR31","doi-asserted-by":"crossref","unstructured":"Luk CK, Hong S, Kim H (2009) Qilin: exploiting parallelism on heterogeneous multiprocessors with adaptive mapping. In: Annual IEEE\/ACM international symposium on microarchitecture, pp 45\u201355","DOI":"10.1145\/1669112.1669121"},{"key":"5152_CR32","doi-asserted-by":"crossref","unstructured":"P\u00e9rez B, Bosque JL, Beivide R (2016) Simplifying programming and load balancing of data parallel applications on heterogeneous systems. In: Annual workshop on general purpose processing using graphics processing unit, pp 42\u201351","DOI":"10.1145\/2884045.2884051"},{"key":"5152_CR33","doi-asserted-by":"crossref","unstructured":"Ravi VT, Agrawal G (2010) A dynamic scheduling framework for emerging heterogeneous systems. In: IEEE international conference on high performance computing, pp 1\u201310","DOI":"10.1109\/HiPC.2011.6152724"},{"key":"5152_CR34","doi-asserted-by":"crossref","unstructured":"Ravi VT, Becchi M, Jiang W, Agrawal G, Chakradhar S (2012) Scheduling concurrent applications on a cluster of CPU\u2013GPU nodes. In: IEEE\/ACM international symposium on cluster, cloud and grid computing, pp 140\u2013147","DOI":"10.1109\/CCGrid.2012.78"},{"key":"5152_CR35","first-page":"776","volume":"8","author":"GT Reddy","year":"2020","unstructured":"Reddy GT, Reddy MPK, Lakshmanna K, Kaluri R, Rajput DS, Srivastava G, Baker T (2020) Analysis of dimensionality reduction techniques on big data. IEEE Access 8:776\u2013788","journal-title":"IEEE Access"},{"key":"5152_CR36","doi-asserted-by":"publisher","DOI":"10.1016\/j.iot.2019.100111","author":"J Sakhnini","year":"2019","unstructured":"Sakhnini J, Karimipour H, Dehghantanha A, Parizi RM, Srivastava G (2019) Security aspects of internet of things aided smart grids: a bibliometric survey. Internet Things. 
https:\/\/doi.org\/10.1016\/j.iot.2019.100111","journal-title":"Internet Things"},{"key":"5152_CR37","doi-asserted-by":"publisher","first-page":"66","DOI":"10.1109\/MCSE.2010.69","volume":"12","author":"JE Stone","year":"2019","unstructured":"Stone JE, Gohara D, Shi G (2019) Opencl: a parallel programming standard for heterogeneous computing systems. Comput. Sci. Eng. 12:66\u201373","journal-title":"Comput. Sci. Eng."},{"key":"5152_CR38","doi-asserted-by":"crossref","unstructured":"Taylor B, Marco VS, Wang Z (2017) Adaptive optimization for OpenCL programs on embedded heterogeneous systems. In: ACM SIGPLAN\/SIGBED conference on languages, compilers, and tools for embedded systems, pp 11\u201320","DOI":"10.1145\/3140582.3081040"},{"key":"5152_CR39","doi-asserted-by":"publisher","first-page":"5","DOI":"10.1007\/s10723-015-9340-0","volume":"14","author":"A Tchernykh","year":"2016","unstructured":"Tchernykh A, Lozano L, Schwiegelshohn U, Bouvry P, Pecero JE, Nesmachnow S, Drozdov AY (2016) Online bi-objective scheduling for IaaS clouds ensuring quality of service. J Grid Comput 14:5\u201322","journal-title":"J Grid Comput"},{"key":"5152_CR40","first-page":"4516","volume":"1","author":"N Tsog","year":"2019","unstructured":"Tsog N, Becker M, Bruhn F, Behnam M, Sj\u00f6din M (2019) Static allocation of parallel tasks to improve schedulability in CPU\u2013GPU heterogeneous real time systems. Annu Conf IEEE Ind Electron Soc 1:4516\u20134522","journal-title":"Annu Conf IEEE Ind Electron Soc"},{"key":"5152_CR41","doi-asserted-by":"crossref","unstructured":"Wen Y, O\u2019Boyle MF (2017) Merge or separate? Multi-job scheduling for OpenCL kernels on CPU\/GPU platforms. In: The general purpose GPUs, pp 22\u201331","DOI":"10.1145\/3038228.3038235"},{"key":"5152_CR42","doi-asserted-by":"crossref","unstructured":"Wen Y, Wang Z, O\u2019Boyle MFP (2014) Smart multi-task scheduling for OpenCL programs on CPU\/GPU heterogeneous platforms. 
In: The international conference on high performance computing, pp 1960\u20131965","DOI":"10.1109\/HiPC.2014.7116910"},{"key":"5152_CR43","unstructured":"Zafar L, Afzal MT, Ahmed U (2017) Exploiting polarity features for developing sentiment analysis tool. In: The international workshop on emotions, modality, sentiment analysis and the semantic web, pp 1\u201310"}],"container-title":["Soft Computing"],"original-title":[],"language":"en","link":[{"URL":"https:\/\/link.springer.com\/content\/pdf\/10.1007\/s00500-020-05152-8.pdf","content-type":"application\/pdf","content-version":"vor","intended-application":"text-mining"},{"URL":"https:\/\/link.springer.com\/article\/10.1007\/s00500-020-05152-8\/fulltext.html","content-type":"text\/html","content-version":"vor","intended-application":"text-mining"},{"URL":"https:\/\/link.springer.com\/content\/pdf\/10.1007\/s00500-020-05152-8.pdf","content-type":"application\/pdf","content-version":"vor","intended-application":"similarity-checking"}],"deposited":{"date-parts":[[2021,7,15]],"date-time":"2021-07-15T00:02:48Z","timestamp":1626307368000},"score":1,"resource":{"primary":{"URL":"https:\/\/link.springer.com\/10.1007\/s00500-020-05152-8"}},"subtitle":[],"short-title":[],"issued":{"date-parts":[[2020,7,15]]},"references-count":42,"journal-issue":{"issue":"1","published-print":{"date-parts":[[2021,1]]}},"alternative-id":["5152"],"URL":"https:\/\/doi.org\/10.1007\/s00500-020-05152-8","relation":{},"ISSN":["1432-7643","1433-7479"],"issn-type":[{"value":"1432-7643","type":"print"},{"value":"1433-7479","type":"electronic"}],"subject":[],"published":{"date-parts":[[2020,7,15]]},"assertion":[{"value":"15 July 2020","order":1,"name":"first_online","label":"First Online","group":{"name":"ArticleHistory","label":"Article History"}},{"order":1,"name":"Ethics","group":{"name":"EthicsHeading","label":"Compliance with ethical standards"}},{"value":"The authors declare that there are no conflicts of interest in this 
paper.","order":2,"name":"Ethics","group":{"name":"EthicsHeading","label":"Conflict of interest"}},{"value":"This article does not contain any studies with human participants performed by any of the authors.","order":3,"name":"Ethics","group":{"name":"EthicsHeading","label":"Ethical standard"}}]}}