{"status":"ok","message-type":"work","message-version":"1.0.0","message":{"indexed":{"date-parts":[[2025,11,2]],"date-time":"2025-11-02T10:54:36Z","timestamp":1762080876145,"version":"build-2065373602"},"reference-count":48,"publisher":"MDPI AG","issue":"1","license":[{"start":{"date-parts":[[2022,12,23]],"date-time":"2022-12-23T00:00:00Z","timestamp":1671753600000},"content-version":"vor","delay-in-days":0,"URL":"https:\/\/creativecommons.org\/licenses\/by\/4.0\/"}],"funder":[{"name":"European Union\u2019s Horizon 2020 research and innovation programme","award":["863059","P2-0103","J6-2581","J5-2554"],"award-info":[{"award-number":["863059","P2-0103","J6-2581","J5-2554"]}]},{"name":"Slovenian Research Agency (ARRS) core research programme Knowledge Technologies","award":["863059","P2-0103","J6-2581","J5-2554"],"award-info":[{"award-number":["863059","P2-0103","J6-2581","J5-2554"]}]},{"name":"Computer-assisted multilingual news discourse analysis with contextual embeddings","award":["863059","P2-0103","J6-2581","J5-2554"],"award-info":[{"award-number":["863059","P2-0103","J6-2581","J5-2554"]}]},{"name":"Quantitative and qualitative analysis of the unregulated corporate financial reporting","award":["863059","P2-0103","J6-2581","J5-2554"],"award-info":[{"award-number":["863059","P2-0103","J6-2581","J5-2554"]}]},{"name":"Ministry of Culture of Republic of Slovenia","award":["863059","P2-0103","J6-2581","J5-2554"],"award-info":[{"award-number":["863059","P2-0103","J6-2581","J5-2554"]}]}],"content-domain":{"domain":[],"crossmark-restriction":false},"short-container-title":["MAKE"],"abstract":"<jats:p>With the increasing amounts of available data, learning simultaneously from different types of inputs is becoming necessary to obtain robust and well-performing models. 
With the advent of representation learning in recent years, lower-dimensional vector-based representations have become available for both images and texts, while automating simultaneous learning from multiple modalities remains a challenging problem. This paper presents an AutoML (automated machine learning) approach to automated machine learning model configuration identification for data composed of two modalities: texts and images. The approach is based on the idea of representation evolution, the process of automatically amplifying heterogeneous representations across several modalities, optimized jointly with a collection of fast, well-regularized linear models. The proposed approach is benchmarked against 11 unimodal and multimodal (texts and images) approaches on four real-life benchmark datasets from different domains. It achieves competitive performance with minimal human effort and low computing requirements, enabling learning from multiple modalities in automated manner for a wider community of researchers.<\/jats:p>","DOI":"10.3390\/make5010001","type":"journal-article","created":{"date-parts":[[2022,12,23]],"date-time":"2022-12-23T03:55:21Z","timestamp":1671767721000},"page":"1-13","update-policy":"https:\/\/doi.org\/10.3390\/mdpi_crossmark_policy","source":"Crossref","is-referenced-by-count":3,"title":["Multimodal AutoML via Representation Evolution"],"prefix":"10.3390","volume":"5","author":[{"ORCID":"https:\/\/orcid.org\/0000-0002-9916-8756","authenticated-orcid":false,"given":"Bla\u017e","family":"\u0160krlj","sequence":"first","affiliation":[{"name":"Department of Knowledge Technologies, Jo\u017eef Stefan Institute, Jamova 39, 1000 Ljubljana, Slovenia"}],"role":[{"role":"author","vocabulary":"crossref"}]},{"given":"Matej","family":"Bevec","sequence":"additional","affiliation":[{"name":"Department of Knowledge Technologies, Jo\u017eef Stefan Institute, Jamova 39, 1000 Ljubljana, 
Slovenia"}],"role":[{"role":"author","vocabulary":"crossref"}]},{"given":"Nada","family":"Lavra\u010d","sequence":"additional","affiliation":[{"name":"Department of Knowledge Technologies, Jo\u017eef Stefan Institute, Jamova 39, 1000 Ljubljana, Slovenia"},{"name":"School of Engineering and Management, University of Nova Gorica, Glavni trg 8, 5271 Vipava, Slovenia"}],"role":[{"role":"author","vocabulary":"crossref"}]}],"member":"1968","published-online":{"date-parts":[[2022,12,23]]},"reference":[{"key":"ref_1","doi-asserted-by":"crossref","first-page":"106622","DOI":"10.1016\/j.knosys.2020.106622","article-title":"AutoML: A Survey of the State-of-the-Art","volume":"212","author":"He","year":"2021","journal-title":"Knowl.-Based Syst."},{"key":"ref_2","doi-asserted-by":"crossref","first-page":"41","DOI":"10.1109\/MCSE.2017.29","article-title":"The end of moore\u2019s law: A new beginning for information technology","volume":"19","author":"Theis","year":"2017","journal-title":"Comput. Sci. Eng."},{"key":"ref_3","doi-asserted-by":"crossref","first-page":"117","DOI":"10.1007\/s10462-013-9406-y","article-title":"Metalearning: A survey of trends and technologies","volume":"44","author":"Lemke","year":"2015","journal-title":"Artif. Intell. Rev."},{"key":"ref_4","doi-asserted-by":"crossref","unstructured":"Hutter, F., Kotthoff, L., and Vanschoren, J. (2019). Meta-Learning. Automated Machine Learning: Methods, Systems, Challenges, Springer International Publishing.","DOI":"10.1007\/978-3-030-05318-5"},{"key":"ref_5","doi-asserted-by":"crossref","first-page":"67","DOI":"10.1109\/4235.585893","article-title":"No free lunch theorems for optimization","volume":"1","author":"Wolpert","year":"1997","journal-title":"IEEE Trans. Evol. Comput."},{"key":"ref_6","doi-asserted-by":"crossref","unstructured":"Kotthoff, L., Thornton, C., Hoos, H.H., Hutter, F., and Leyton-Brown, K. (2019). Auto-WEKA: Automatic model selection and hyperparameter optimization in WEKA. 
Automated Machine Learning, Springer.","DOI":"10.1007\/978-3-030-05318-5_4"},{"key":"ref_7","doi-asserted-by":"crossref","unstructured":"Feurer, M., Klein, A., Eggensperger, K., Springenberg, J.T., Blum, M., and Hutter, F. (2019). Auto-sklearn: Efficient and robust automated machine learning. Automated Machine Learning, Springer.","DOI":"10.1007\/978-3-030-05318-5_6"},{"key":"ref_8","unstructured":"Olson, R.S., and Moore, J.H. (2016, January 20\u201322). TPOT: A tree-based pipeline optimization tool for automating machine learning. Proceedings of the Workshop on Automatic Machine Learning, New York, NY, USA."},{"key":"ref_9","doi-asserted-by":"crossref","unstructured":"Yang, C., Akimoto, Y., Kim, D.W., and Udell, M. (2019, January 4\u20138). OBOE: Collaborative filtering for AutoML model selection. Proceedings of the 25th ACM SIGKDD International Conference on Knowledge Discovery & Data Mining, Anchorage, AK, USA.","DOI":"10.1145\/3292500.3330909"},{"key":"ref_10","unstructured":"Wang, C., Wu, Q., Weimer, M., and Zhu, E.E. (2021, January 4\u20137). FLAML: A Fast and Lightweight AutoML Library. Proceedings of the 4th Conference on Machine Learning and Systems (MLSys 2021), San Jose, CA, USA."},{"key":"ref_11","doi-asserted-by":"crossref","first-page":"1495","DOI":"10.1007\/s10994-018-5735-z","article-title":"ML-Plan: Automated machine learning via hierarchical planning","volume":"107","author":"Mohr","year":"2018","journal-title":"Mach. Learn."},{"key":"ref_12","unstructured":"Thomas, J., Coors, S., and Bischl, B. (2018, January 14). Automatic Gradient Boosting. Proceedings of the International Workshop on Automatic Machine Learning at ICML, Stockholm, Sweden."},{"key":"ref_13","doi-asserted-by":"crossref","unstructured":"Gijsbers, P., and Vanschoren, J. (2021, January 13). GAMA: A General Automated Machine learning Assistant. 
Proceedings of the Joint European Conference on Machine Learning and Knowledge Discovery in Databases, Bilbao, Spain.","DOI":"10.1007\/978-3-030-67670-4_39"},{"key":"ref_14","unstructured":"Tan, M., and Le, Q. (2019, January 9\u201315). Efficientnet: Rethinking model scaling for convolutional neural networks. Proceedings of the International Conference on Machine Learning, PMLR, Long Beach, CA, USA."},{"key":"ref_15","doi-asserted-by":"crossref","unstructured":"Pareja, A., Domeniconi, G., Chen, J., Ma, T., Suzumura, T., Kanezashi, H., Kaler, T., Schardl, T.B., and Leiserson, C.E. (2020, January 7\u201312). EvolveGCN: Evolving Graph Convolutional Networks for Dynamic Graphs. Proceedings of the 34th AAAI Conference on Artificial Intelligence, New York, NY, USA.","DOI":"10.1609\/aaai.v34i04.5984"},{"key":"ref_16","doi-asserted-by":"crossref","unstructured":"Jin, H., Song, Q., and Hu, X. (2019, January 4\u20138). Auto-Keras: An efficient neural architecture search system. Proceedings of the 25th ACM SIGKDD International Conference on Knowledge Discovery & Data Mining, Anchorage, AK, USA.","DOI":"10.1145\/3292500.3330648"},{"key":"ref_17","doi-asserted-by":"crossref","unstructured":"Thornton, C., Hutter, F., Hoos, H.H., and Leyton-Brown, K. (2013, January 11\u201314). Auto-WEKA: Combined selection and hyperparameter optimization of classification algorithms. Proceedings of the 19th ACM SIGKDD International Conference on Knowledge Discovery and Data Mining, Chicago, IL, USA.","DOI":"10.1145\/2487575.2487629"},{"key":"ref_18","doi-asserted-by":"crossref","unstructured":"Zoph, B., Vasudevan, V., Shlens, J., and Le, Q.V. (2018). Learning Transferable Architectures for Scalable Image Recognition. arXiv.","DOI":"10.1109\/CVPR.2018.00907"},{"key":"ref_19","doi-asserted-by":"crossref","unstructured":"Elsken, T., Staffler, B., Metzen, J.H., and Hutter, F. (2020). Meta-Learning of Neural Architectures for Few-Shot Learning. 
arXiv.","DOI":"10.1109\/CVPR42600.2020.01238"},{"key":"ref_20","doi-asserted-by":"crossref","first-page":"964","DOI":"10.1007\/s10618-021-00737-9","article-title":"Dataset2vec: Learning dataset meta-features","volume":"35","author":"Jomaa","year":"2021","journal-title":"Data Min. Knowl. Discov."},{"key":"ref_21","doi-asserted-by":"crossref","unstructured":"Humm, B.G., and Zender, A. (2021, January 17\u201320). An ontology-based concept for meta automl. Proceedings of the IFIP International Conference on Artificial Intelligence Applications and Innovations, Hersonissos, Greece.","DOI":"10.1007\/978-3-030-79150-6_10"},{"key":"ref_22","unstructured":"Davis, L. (1991). Handbook of Genetic Algorithms, Van Nostrand Reinhold. [1st ed.]."},{"key":"ref_23","doi-asserted-by":"crossref","unstructured":"Doerr, B., Le, H.P., Makhmara, R., and Nguyen, T.D. (2017, January 15\u201319). Fast genetic algorithms. Proceedings of the Genetic and Evolutionary Computation Conference, Berlin, Germany.","DOI":"10.1145\/3071178.3071301"},{"key":"ref_24","doi-asserted-by":"crossref","first-page":"720","DOI":"10.1109\/TEVC.2017.2745715","article-title":"Standard steady state genetic algorithms can hillclimb faster than mutation-only evolutionary algorithms","volume":"22","author":"Corus","year":"2017","journal-title":"IEEE Trans. Evol. Comput."},{"key":"ref_25","doi-asserted-by":"crossref","first-page":"105903","DOI":"10.1016\/j.asoc.2019.105903","article-title":"Optimization strategies for Microgrid energy management systems by Genetic Algorithms","volume":"86","author":"Leonori","year":"2020","journal-title":"Appl. Soft Comput."},{"key":"ref_26","doi-asserted-by":"crossref","first-page":"106623","DOI":"10.1016\/j.ymssp.2020.106623","article-title":"Intelligent vehicle network system and smart city management based on genetic algorithms and image perception","volume":"141","author":"Li","year":"2020","journal-title":"Mech. Syst. 
Signal Process."},{"key":"ref_27","unstructured":"Shi, X., Mueller, J., Erickson, N., Li, M., and Smola, A. (2021, January 23). Multimodal AutoML on Structured Tables with Text Fields. Proceedings of the 8th ICML Workshop on Automated Machine Learning (AutoML), Virtual."},{"key":"ref_28","unstructured":"Shi, X., Mueller, J., Erickson, N., Li, M., and Smola, A.J. (2021, January 6\u201314). Benchmarking Multimodal AutoML for Tabular Data with Text Fields. Proceedings of the Neural Information Processing Systems Track on Datasets and Benchmarks, Online."},{"key":"ref_29","doi-asserted-by":"crossref","first-page":"96","DOI":"10.1109\/MSP.2017.2738401","article-title":"Deep Multimodal Learning: A Survey on Recent Advances and Trends","volume":"34","author":"Ramachandram","year":"2017","journal-title":"IEEE Signal Process. Mag."},{"key":"ref_30","doi-asserted-by":"crossref","first-page":"3","DOI":"10.1023\/A:1015059928466","article-title":"Evolution strategies\u2014A comprehensive introduction","volume":"1","author":"Beyer","year":"2002","journal-title":"Nat. Comput."},{"key":"ref_31","doi-asserted-by":"crossref","first-page":"989","DOI":"10.1007\/s10994-021-05968-x","article-title":"autoBOT: Evolving neuro-symbolic representations for explainable low resource text classification","volume":"110","author":"Martinc","year":"2021","journal-title":"Mach. Learn."},{"key":"ref_32","unstructured":"Radford, A., Kim, J.W., Hallacy, C., Ramesh, A., Goh, G., Agarwal, S., Sastry, G., Askell, A., Mishkin, P., and Clark, J. (2021, January 13\u201314). Learning transferable visual models from natural language supervision. Proceedings of the International Conference on Machine Learning, PMLR, Virtual."},{"key":"ref_33","unstructured":"Kaplenko, M. (2022, November 20). Multimodal Classification. Available online: https:\/\/github.com\/xkaple01\/multimodal-classification."},{"key":"ref_34","doi-asserted-by":"crossref","unstructured":"Reed, S., Akata, Z., Lee, H., and Schiele, B. 
(2016, January 27\u201330). Learning deep representations of fine-grained visual descriptions. Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition, Las Vegas, NV, USA.","DOI":"10.1109\/CVPR.2016.13"},{"key":"ref_35","unstructured":"Welinder, P., Branson, S., Mita, T., Wah, C., Schroff, F., Belongie, S., and Perona, P. (2010). Caltech-UCSD Birds 200, California Institute of Technology. Technical Report CNS-TR-2010-001."},{"key":"ref_36","doi-asserted-by":"crossref","unstructured":"Zlatkova, D., Nakov, P., and Koychev, I. (2019). Fact-checking meets fauxtography: Verifying claims about images. arXiv.","DOI":"10.18653\/v1\/D19-1216"},{"key":"ref_37","unstructured":"Nakamura, K., Levy, S., and Wang, W.Y. (2019). r\/fakeddit: A new multimodal benchmark dataset for fine-grained fake news detection. arXiv."},{"key":"ref_38","first-page":"2825","article-title":"Scikit-learn: Machine learning in Python","volume":"12","author":"Pedregosa","year":"2011","journal-title":"J. Mach. Learn. Res."},{"key":"ref_39","unstructured":"Paszke, A., Gross, S., Massa, F., Lerer, A., Bradbury, J., Chanan, G., Killeen, T., Lin, Z., Gimelshein, N., and Antiga, L. (2019). PyTorch: An Imperative Style, High-Performance Deep Learning Library. Advances in Neural Information Processing Systems 32 (NeurIPS 2019), Curran Associates, Inc."},{"key":"ref_40","unstructured":"Howard, A.G., Zhu, M., Chen, B., Kalenichenko, D., Wang, W., Weyand, T., Andreetto, M., and Adam, H. (2017). Mobilenets: Efficient convolutional neural networks for mobile vision applications. arXiv."},{"key":"ref_41","unstructured":"Kingma, D.P., and Ba, J. (2014). Adam: A method for stochastic optimization. arXiv."},{"key":"ref_42","doi-asserted-by":"crossref","unstructured":"He, K., Zhang, X., Ren, S., and Sun, J. (2016, January 27\u201330). Deep residual learning for image recognition. 
Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition, Las Vegas, NV, USA.","DOI":"10.1109\/CVPR.2016.90"},{"key":"ref_43","unstructured":"Devlin, J., Chang, M.W., Lee, K., and Toutanova, K. (2018). Bert: Pre-training of deep bidirectional transformers for language understanding. arXiv."},{"key":"ref_44","unstructured":"Loshchilov, I., and Hutter, F. (2017). Decoupled weight decay regularization. arXiv."},{"key":"ref_45","doi-asserted-by":"crossref","first-page":"250","DOI":"10.1093\/bioinformatics\/btz470","article-title":"Scaling tree-based automated machine learning to biomedical big data with a feature set selector","volume":"36","author":"Le","year":"2020","journal-title":"Bioinformatics"},{"key":"ref_46","doi-asserted-by":"crossref","unstructured":"Reimers, N., and Gurevych, I. (2019). Sentence-bert: Sentence embeddings using siamese bert-networks. arXiv.","DOI":"10.18653\/v1\/D19-1410"},{"key":"ref_47","first-page":"16857","article-title":"Mpnet: Masked and permuted pre-training for language understanding","volume":"33","author":"Song","year":"2020","journal-title":"Adv. Neural Inf. Process. Syst."},{"key":"ref_48","doi-asserted-by":"crossref","first-page":"861","DOI":"10.21105\/joss.00861","article-title":"UMAP: Uniform Manifold Approximation and Projection","volume":"3","author":"McInnes","year":"2018","journal-title":"J. 
Open Source Softw."}],"container-title":["Machine Learning and Knowledge Extraction"],"original-title":[],"language":"en","link":[{"URL":"https:\/\/www.mdpi.com\/2504-4990\/5\/1\/1\/pdf","content-type":"unspecified","content-version":"vor","intended-application":"similarity-checking"}],"deposited":{"date-parts":[[2025,10,11]],"date-time":"2025-10-11T01:48:43Z","timestamp":1760147323000},"score":1,"resource":{"primary":{"URL":"https:\/\/www.mdpi.com\/2504-4990\/5\/1\/1"}},"subtitle":[],"short-title":[],"issued":{"date-parts":[[2022,12,23]]},"references-count":48,"journal-issue":{"issue":"1","published-online":{"date-parts":[[2023,3]]}},"alternative-id":["make5010001"],"URL":"https:\/\/doi.org\/10.3390\/make5010001","relation":{},"ISSN":["2504-4990"],"issn-type":[{"type":"electronic","value":"2504-4990"}],"subject":[],"published":{"date-parts":[[2022,12,23]]}}}