{"status":"ok","message-type":"work","message-version":"1.0.0","message":{"indexed":{"date-parts":[[2025,10,12]],"date-time":"2025-10-12T03:34:15Z","timestamp":1760240055417,"version":"build-2065373602"},"reference-count":33,"publisher":"MDPI AG","issue":"1","license":[{"start":{"date-parts":[[2019,3,18]],"date-time":"2019-03-18T00:00:00Z","timestamp":1552867200000},"content-version":"vor","delay-in-days":0,"URL":"https:\/\/creativecommons.org\/licenses\/by\/4.0\/"}],"content-domain":{"domain":[],"crossmark-restriction":false},"short-container-title":["Informatics"],"abstract":"<jats:p>Distributed computing technologies allow a wide variety of tasks that use large amounts of data to be solved. Various paradigms and technologies are already widely used, but many of them are lacking when it comes to the optimization of resource usage. The aim of this paper is to present the optimization methods used to increase the efficiency of distributed implementations of a text-mining model utilizing information about the text-mining task extracted from the data and information about the current state of the distributed environment obtained from a computational node, and to improve the distribution of the task on the distributed infrastructure. Two optimization solutions are developed and implemented, both based on the prediction of the expected task duration on the existing infrastructure. The solutions are experimentally evaluated in a scenario where a distributed tree-based multi-label classifier is built based on two standard text data collections.<\/jats:p>","DOI":"10.3390\/informatics6010012","type":"journal-article","created":{"date-parts":[[2019,3,18]],"date-time":"2019-03-18T12:18:53Z","timestamp":1552911533000},"page":"12","update-policy":"https:\/\/doi.org\/10.3390\/mdpi_crossmark_policy","source":"Crossref","is-referenced-by-count":3,"title":["Improvement in the Efficiency of a Distributed Multi-Label Text Classification Algorithm Using Infrastructure and Task-Related Data"],"prefix":"10.3390","volume":"6","author":[{"ORCID":"https:\/\/orcid.org\/0000-0003-3019-8364","authenticated-orcid":false,"given":"Martin","family":"Sarnovsky","sequence":"first","affiliation":[{"name":"Department of Cybernetics and Artificial Intelligence, Technical University Ko\u0161ice, Letn\u00e1 9\/A, 040 01 Ko\u0161ice, Slovakia"}]},{"ORCID":"https:\/\/orcid.org\/0000-0002-1128-0701","authenticated-orcid":false,"given":"Marek","family":"Olejnik","sequence":"additional","affiliation":[{"name":"Department of Cybernetics and Artificial Intelligence, Technical University Ko\u0161ice, Letn\u00e1 9\/A, 040 01 Ko\u0161ice, Slovakia"}]}],"member":"1968","published-online":{"date-parts":[[2019,3,18]]},"reference":[{"key":"ref_1","unstructured":"Feldman, R., Feldman, R., and Dagan, I. (1995, January 20\u201321). Knowledge Discovery in Textual Databases (KDT). Proceedings of the The First International Conference on Knowledge Discovery and Data Mining, Montreal, QC, Canada."},{"key":"ref_2","first-page":"13","article-title":"The CRISP-DM model: The New Blueprint for Data Mining","volume":"5","author":"Shearer","year":"2000","journal-title":"J. Data Wareh."},{"key":"ref_3","first-page":"217","article-title":"A Comparative Study of Data Mining Process Models (KDD, CRISP-DM and SEMMA)","volume":"12","author":"Shafique","year":"2014","journal-title":"Innov. Space Sci. Res."},{"key":"ref_4","doi-asserted-by":"crossref","first-page":"1","DOI":"10.4018\/jdwm.2007070101","article-title":"Multi-Label Classification: An Overview","volume":"3","author":"Tsoumakas","year":"2007","journal-title":"Int. J. Data Wareh. Min."},{"key":"ref_5","doi-asserted-by":"crossref","unstructured":"Weinman, J.J., Lidaka, A., and Aggarwal, S. (2011). Large-scale machine learning. GPU Computing Gems Emerald Edition, Elsevier.","DOI":"10.1016\/B978-0-12-384988-5.00019-X"},{"key":"ref_6","first-page":"80","article-title":"A Framework for Learning from Distributed Data Using Sufficient Statistics and its Application to Learning Decision Trees","volume":"1","author":"Caragea","year":"2004","journal-title":"Int. J. Hybrid Intell. Syst."},{"key":"ref_7","doi-asserted-by":"crossref","unstructured":"Haldankar, A., and Bhowmick, K. (2016, January 19). A MapReduce based approach for classification. Proceedings of the 2016 Online International Conference on Green Engineering and Technologies (IC-GET), Coimbatore, India.","DOI":"10.1109\/GET.2016.7916756"},{"key":"ref_8","doi-asserted-by":"crossref","unstructured":"Shanahan, J., and Dai, L. (2017, January 3\u20137). Large Scale Distributed Data Science from scratch using Apache Spark 2.0. Proceedings of the 26th International Conference on World Wide Web Companion\u2014WWW \u201917 Companion, Perth, Australia.","DOI":"10.1145\/3041021.3051108"},{"key":"ref_9","first-page":"1426","article-title":"PLANET: Massively Parallel Learning of Tree Ensembles with MapReduce","volume":"2","author":"Panda","year":"2009","journal-title":"Learning"},{"key":"ref_10","first-page":"621","article-title":"Distributed Classification of Text Documents on Apache Spark Platform","volume":"Volume 9692","author":"Rutkowski","year":"2016","journal-title":"Artificial Intelligence and Soft Computing"},{"key":"ref_11","doi-asserted-by":"crossref","unstructured":"Abraham, A., Franke, K., and K\u00f6ppen, M. (2003). Decision Tree Induction from Distributed Heterogeneous Autonomous Data Sources. Intelligent Systems Design and Applications, Springer.","DOI":"10.1007\/978-3-540-44999-7"},{"key":"ref_12","doi-asserted-by":"crossref","unstructured":"Babbar, R., and Shoelkopf, B. (2017, January 6\u201310). DiSMEC\u2014Distributed Sparse Machines for Extreme Multi-label Classification. Proceedings of the Tenth ACM International Conference on Web Search and Data Mining-WSDM \u201917, Cambridge, UK.","DOI":"10.1145\/3018661.3018741"},{"key":"ref_13","unstructured":"Babbar, R., and Sch\u00f6lkopf, B. (arXiv, 2018). Adversarial Extreme Multi-label Classification, arXiv."},{"key":"ref_14","doi-asserted-by":"crossref","unstructured":"Zhang, W., Yan, J., Wang, X., and Zha, H. (2018, January 11\u201314). Deep Extreme Multi-label Learning. Proceedings of the 2018 ACM on International Conference on Multimedia Retrieval-ICMR \u201818, Yokohama, Japan.","DOI":"10.1145\/3206025.3206030"},{"key":"ref_15","unstructured":"Belyy, A., and Sholokhov, A. (arXiv, 2018). MEMOIR: Multi-class Extreme Classification with Inexact Margin, arXiv."},{"key":"ref_16","doi-asserted-by":"crossref","unstructured":"Sun, X., Xu, J., Jiang, C., Feng, J., Chen, S.-S., and He, F. (2016). Extreme Learning Machine for Multi-Label Classification. Entropy, 18.","DOI":"10.3390\/e18060225"},{"key":"ref_17","doi-asserted-by":"crossref","unstructured":"Sarnovsk\u00fd, M., Butka, P., Bedn\u00e1r, P., Babi\u010d, F., and Parali\u010d, J. (2015). Analytical platform based on Jbowl library providing text-mining services in distributed environment. In Proceedings of the Lecture Notes in Computer Science (including subseries Lecture Notes in Artificial Intelligence and Lecture Notes in Bioinformatics). Information and Communication Technology-EurAsia Conference, Springer.","DOI":"10.1007\/978-3-319-24315-3_32"},{"key":"ref_18","unstructured":"Gualtieri, M. (2019, January 02). The Forrester WaveTM: In-Memory Data Grids, Q3. Available online: https:\/\/www.forrester.com\/report\/The+Forrester+Wave+InMemory+Data+Grids+Q3+2015\/-\/E-RES120420."},{"key":"ref_19","doi-asserted-by":"crossref","unstructured":"Zhang, C., Li, F., and Jestes, J. Efficient parallel kNN joins for large data in MapReduce. Proceedings of the Proceedings of the 15th International Conference on Extending Database Technology-EDBT \u201912, Berlin, Germany, 26\u201330 March 2012.","DOI":"10.1145\/2247596.2247602"},{"key":"ref_20","doi-asserted-by":"crossref","unstructured":"Sarnovsky, M., and Ulbrik, Z. (2013, January 23\u201325). Cloud-based clustering of text documents using the GHSOM algorithm on the GridGain platform. Proceedings of the SACI 2013-8th IEEE International Symposium on Applied Computational Intelligence and Informatics, Timisoara, Romania.","DOI":"10.1109\/SACI.2013.6608988"},{"key":"ref_21","doi-asserted-by":"crossref","unstructured":"Anchalia, P.P., Koundinya, A.K., and Srinath , N.K. (2013, January 24\u201326). MapReduce Design of K-Means Clustering Algorithm. Proceedings of the 2013 International Conference on Information Science and Applications (ICISA), Pattaya, Thailand.","DOI":"10.1109\/ICISA.2013.6579448"},{"key":"ref_22","doi-asserted-by":"crossref","unstructured":"Zhao, W., Ma, H., and He, Q. (2009). Parallel K-means clustering based on MapReduce. Proceedings Lecture Notes in Computer Science, Springer. Including Subseries Lecture Notes in Artificial Intelligence and Lecture Notes in Bioinformatics.","DOI":"10.1007\/978-3-642-10665-1_71"},{"key":"ref_23","unstructured":"Amado, N., and Silva, O. (2018, January 10\u201314). Exploiting Parallelism in Decision Tree Induction. In Parallel and Distributed computing for Machine Learning. Proceedings of the Conjunction 14th European Conference on Machine Learning ECML\u201903 7th European Conference Principles and Practice of Knowledge Discovery in Databases PKDD\u201903, Dublin, Ireland."},{"key":"ref_24","doi-asserted-by":"crossref","first-page":"1","DOI":"10.1016\/j.future.2015.07.014","article-title":"Reliability-driven scheduling of time\/cost-constrained grid workflows","volume":"55","author":"Kianpisheh","year":"2016","journal-title":"Futur. Gener. Comput. Syst."},{"key":"ref_25","doi-asserted-by":"crossref","first-page":"162","DOI":"10.1007\/s10489-014-0640-z","article-title":"A novel approach to task assignment in a cooperative multi-agent design system","volume":"43","author":"Liu","year":"2015","journal-title":"Appl. Intell."},{"key":"ref_26","doi-asserted-by":"crossref","first-page":"702","DOI":"10.1134\/S106423071404008X","article-title":"Graph approach to job assignment in distributed real-time systems","volume":"53","author":"Gruzlikov","year":"2014","journal-title":"J. Comput. Syst. Sci. Int."},{"key":"ref_27","doi-asserted-by":"crossref","first-page":"415","DOI":"10.1007\/s10723-017-9410-6","article-title":"Adaptive Resource Allocation with Job Runtime Uncertainty","volume":"15","author":"Tchernykh","year":"2017","journal-title":"J. Grid Comput."},{"key":"ref_28","doi-asserted-by":"crossref","first-page":"1691","DOI":"10.1007\/s10586-016-0625-2","article-title":"MrHeter: Improving MapReduce performance in heterogeneous environments","volume":"19","author":"Zhang","year":"2016","journal-title":"Clust. Comput."},{"key":"ref_29","unstructured":"Younes Hamed, A. (2019, January 02). Task Allocation for Minimizing Cost of Distributed Computing Systems Using Genetic Algorithms. Available online: https:\/\/www.semanticscholar.org\/paper\/Task-Allocation-for-Minimizing-Cost-of-Distributed-Hamed\/1dc02df36cbd55539369def9d2eed47a90c346c4."},{"key":"ref_30","first-page":"667","article-title":"Assignment Problems","volume":"6","year":"2002","journal-title":"Handb. Appl. Optim. Part II Appl."},{"key":"ref_31","first-page":"1","article-title":"Transportation, Assignment, and Transshipment Problems","volume":"41","author":"Winston","year":"2003","journal-title":"Oper. Res. Appl. Algorithm."},{"key":"ref_32","unstructured":"Kawajir, L. (2019, January 02). Waechter Introduction to IPOPT: A tutorial for downloading, installing, and using IPOPT. Available online: https:\/\/www.coin-or.org\/Ipopt\/documentation\/."},{"key":"ref_33","doi-asserted-by":"crossref","unstructured":"Sarnovsky, M., and Kacur, T. (2012, January 24\u201326). Cloud-based classification of text documents using the Gridgain platform. Proceedings of the SACI 2012-7th IEEE International Symposium on Applied Computational Intelligence and Informatics, Timisoara, Romania.","DOI":"10.1109\/SACI.2012.6250009"}],"container-title":["Informatics"],"original-title":[],"language":"en","link":[{"URL":"https:\/\/www.mdpi.com\/2227-9709\/6\/1\/12\/pdf","content-type":"unspecified","content-version":"vor","intended-application":"similarity-checking"}],"deposited":{"date-parts":[[2025,10,11]],"date-time":"2025-10-11T12:38:43Z","timestamp":1760186323000},"score":1,"resource":{"primary":{"URL":"https:\/\/www.mdpi.com\/2227-9709\/6\/1\/12"}},"subtitle":[],"short-title":[],"issued":{"date-parts":[[2019,3,18]]},"references-count":33,"journal-issue":{"issue":"1","published-online":{"date-parts":[[2019,3]]}},"alternative-id":["informatics6010012"],"URL":"https:\/\/doi.org\/10.3390\/informatics6010012","relation":{},"ISSN":["2227-9709"],"issn-type":[{"type":"electronic","value":"2227-9709"}],"subject":[],"published":{"date-parts":[[2019,3,18]]}}}