{"status":"ok","message-type":"work","message-version":"1.0.0","message":{"indexed":{"date-parts":[[2026,6,24]],"date-time":"2026-06-24T15:41:56Z","timestamp":1782315716367,"version":"3.54.5"},"reference-count":34,"publisher":"MDPI AG","issue":"5","license":[{"start":{"date-parts":[[2023,2,21]],"date-time":"2023-02-21T00:00:00Z","timestamp":1676937600000},"content-version":"vor","delay-in-days":0,"URL":"https:\/\/creativecommons.org\/licenses\/by\/4.0\/"}],"funder":[{"name":"Centre for Priority Research Area Artificial Intelligence and Robotics of Warsaw University of Technology within the Excellence Initiative: Research University (IDUB) programme","award":["1820\/27\/Z01\/POB2\/2021"],"award-info":[{"award-number":["1820\/27\/Z01\/POB2\/2021"]}]},{"name":"Centre for Priority Research Area Artificial Intelligence and Robotics of Warsaw University of Technology within the Excellence Initiative: Research University (IDUB) programme","award":["POIR.01.01.01-00-0066\/22"],"award-info":[{"award-number":["POIR.01.01.01-00-0066\/22"]}]},{"name":"National Centre for Research and Development in Poland, Smart Growth Operational Program for 2014-2020, Digital Innovations","award":["1820\/27\/Z01\/POB2\/2021"],"award-info":[{"award-number":["1820\/27\/Z01\/POB2\/2021"]}]},{"name":"National Centre for Research and Development in Poland, Smart Growth Operational Program for 2014-2020, Digital Innovations","award":["POIR.01.01.01-00-0066\/22"],"award-info":[{"award-number":["POIR.01.01.01-00-0066\/22"]}]}],"content-domain":{"domain":[],"crossmark-restriction":false},"short-container-title":["Sensors"],"abstract":"<jats:p>Data processing in robotics is currently challenged by the effective building of multimodal and common representations. Tremendous volumes of raw data are available and their smart management is the core concept of multimodal learning in a new paradigm for data fusion. 
Although several techniques for building multimodal representations have been proven successful, they have not yet been analyzed and compared in a given production setting. This paper explored three of the most common techniques, (1) the late fusion, (2) the early fusion, and (3) the sketch, and compared them in classification tasks. Our paper explored different types of data (modalities) that could be gathered by sensors serving a wide range of sensor applications. Our experiments were conducted on Amazon Reviews, MovieLens25M, and MovieLens1M datasets. Their outcomes allowed us to confirm that the choice of fusion technique for building multimodal representation is crucial to obtain the highest possible model performance resulting from the proper modality combination. Consequently, we designed criteria for choosing this optimal data fusion technique.<\/jats:p>","DOI":"10.3390\/s23052381","type":"journal-article","created":{"date-parts":[[2023,2,22]],"date-time":"2023-02-22T02:08:34Z","timestamp":1677031714000},"page":"2381","update-policy":"https:\/\/doi.org\/10.3390\/mdpi_crossmark_policy","source":"Crossref","is-referenced-by-count":166,"title":["Effective Techniques for Multimodal Data Fusion: A Comparative Analysis"],"prefix":"10.3390","volume":"23","author":[{"ORCID":"https:\/\/orcid.org\/0000-0002-3096-9918","authenticated-orcid":false,"given":"Maciej","family":"Paw\u0142owski","sequence":"first","affiliation":[{"name":"Faculty of Mathematics and Information Science, Warsaw University of Technology, Koszykowa Street 75, 00-662 Warsaw, Poland"}],"role":[{"vocabulary":"crossref","role":"author"}]},{"ORCID":"https:\/\/orcid.org\/0000-0002-3407-7570","authenticated-orcid":false,"given":"Anna","family":"Wr\u00f3blewska","sequence":"additional","affiliation":[{"name":"Faculty of Mathematics and Information Science, Warsaw University of Technology, Koszykowa Street 75, 00-662 Warsaw, Poland"},{"name":"WeSub, Adama Branickiego Street 17, 02-972 Warsaw, 
Poland"}],"role":[{"vocabulary":"crossref","role":"author"}]},{"ORCID":"https:\/\/orcid.org\/0000-0001-5960-8131","authenticated-orcid":false,"given":"Sylwia","family":"Sysko-Roma\u0144czuk","sequence":"additional","affiliation":[{"name":"Faculty of Management, Warsaw University of Technology, Narbutta Street 85, 02-524 Warsaw, Poland"}],"role":[{"vocabulary":"crossref","role":"author"}]}],"member":"1968","published-online":{"date-parts":[[2023,2,21]]},"reference":[{"key":"ref_1","doi-asserted-by":"crossref","first-page":"65","DOI":"10.1109\/35.41402","article-title":"Integration of acoustic and visual speech signals using neural networks","volume":"27","author":"Yuhas","year":"1989","journal-title":"IEEE Commun. Mag."},{"key":"ref_2","doi-asserted-by":"crossref","first-page":"423","DOI":"10.1109\/TPAMI.2018.2798607","article-title":"Multimodal Machine Learning: A Survey and Taxonomy","volume":"41","author":"Baltrusaitis","year":"2019","journal-title":"IEEE Trans. Pattern Anal. Mach. Intell."},{"key":"ref_3","doi-asserted-by":"crossref","first-page":"15377","DOI":"10.1109\/ACCESS.2020.2968154","article-title":"A Review of Hashing Methods for Multimodal Retrieval","volume":"8","author":"Cao","year":"2020","journal-title":"IEEE Access"},{"key":"ref_4","doi-asserted-by":"crossref","first-page":"829","DOI":"10.1162\/neco_a_01273","article-title":"A Survey on Deep Learning for Multimodal Data Fusion","volume":"32","author":"Gao","year":"2020","journal-title":"Neural Comput."},{"key":"ref_5","doi-asserted-by":"crossref","first-page":"424","DOI":"10.1016\/j.inffus.2022.09.025","article-title":"Multimodal sentiment analysis: A systematic review of history, datasets, multimodal fusion methods, applications, challenges and future directions","volume":"91","author":"Gandhi","year":"2023","journal-title":"Inf. 
Fusion"},{"key":"ref_6","doi-asserted-by":"crossref","unstructured":"Tsanousa, A., Bektsis, E., Kyriakopoulos, C., Gonz\u00e1lez, A.G., Leturiondo, U., Gialampoukidis, I., Karakostas, A., Vrochidis, S., and Kompatsiaris, I. (2022). A Review of Multisensor Data Fusion Solutions in Smart Manufacturing: Systems and Trends. Sensors, 22.","DOI":"10.3390\/s22051734"},{"key":"ref_7","doi-asserted-by":"crossref","unstructured":"He, R., and McAuley, J. (2016, January 11\u201315). Ups and Downs: Modeling the Visual Evolution of Fashion Trends with One-Class Collaborative Filtering. Proceedings of the 25th International Conference on World Wide Web, Montr\u00e9al, QC, Canada.","DOI":"10.1145\/2872427.2883037"},{"key":"ref_8","doi-asserted-by":"crossref","first-page":"1","DOI":"10.1145\/2827872","article-title":"The MovieLens Datasets: History and Context","volume":"5","author":"Harper","year":"2016","journal-title":"ACM Trans. Interact. Intell. Syst."},{"key":"ref_9","unstructured":"Varshney, K. (2021, March 23). Trust in Machine Learning, Manning Publications, Shelter Island, Chapter 4 Data Sources and Biases, Section 4.1 Modalities. Available online: https:\/\/livebook.manning.com\/book\/trust-in-machine-learning\/chapter-4\/v-2\/."},{"key":"ref_10","doi-asserted-by":"crossref","first-page":"104042","DOI":"10.1016\/j.imavis.2020.104042","article-title":"Deep multimodal fusion for semantic image segmentation: A survey","volume":"105","author":"Zhang","year":"2021","journal-title":"Image Vis. Comput."},{"key":"ref_11","doi-asserted-by":"crossref","first-page":"197","DOI":"10.1016\/j.neucom.2020.05.087","article-title":"Multimodal multitask deep learning model for Alzheimer\u2019s disease progression detection based on time series data","volume":"412","author":"Abuhmed","year":"2020","journal-title":"Neurocomputing"},{"key":"ref_12","unstructured":"Jaiswal, M., Bara, C.P., Luo, Y., Burzo, M., Mihalcea, R., and Provost, E.M. (2020, January 11\u201316). 
MuSE: A Multimodal Dataset of Stressed Emotion. Proceedings of the Twelfth Language Resources and Evaluation Conference, Marseille, France."},{"key":"ref_13","doi-asserted-by":"crossref","unstructured":"Singh, A., Natarajan, V., Shah, M., Jiang, Y., Chen, X., Batra, D., Parikh, D., and Rohrbach, M. (2019, January 15\u201320). Towards VQA Models That Can Read. Proceedings of the IEEE\/CVF Conference on Computer Vision and Pattern Recognition, Long Beach, CA, USA.","DOI":"10.1109\/CVPR.2019.00851"},{"key":"ref_14","unstructured":"Rychalska, B., Basaj, D.B., Dabrowski, J., and Daniluk, M. (2020). I know why you like this movie: Interpretable Efficient Multimodal Recommender. arXiv."},{"key":"ref_15","doi-asserted-by":"crossref","first-page":"102316","DOI":"10.1016\/j.ipm.2020.102316","article-title":"A Comparative Study of Outfit Recommendation Methods with a Focus on Attention-based Fusion","volume":"57","author":"Laenen","year":"2020","journal-title":"Inf. Process. Manag."},{"key":"ref_16","first-page":"1","article-title":"Cornac: A Comparative Framework for Multimodal Recommender Systems","volume":"21","author":"Salah","year":"2020","journal-title":"J. Mach. Learn. Res."},{"key":"ref_17","unstructured":"Devlin, J., Chang, M.W., Lee, K., and Toutanova, K. (2019, January 2\u20137). BERT: Pre-training of Deep Bidirectional Transformers for Language Understanding. Proceedings of the NAACL, Minneapolis, MN, USA."},{"key":"ref_18","doi-asserted-by":"crossref","unstructured":"He, K., Zhang, X., Ren, S., and Sun, J. (2016, January 27\u201330). Deep Residual Learning for Image Recognition. Proceedings of the IEEE\/CVF Conference on Computer Vision and Pattern Recognition, Las Vegas, NV, USA.","DOI":"10.1109\/CVPR.2016.90"},{"key":"ref_19","doi-asserted-by":"crossref","first-page":"1798","DOI":"10.1109\/TPAMI.2013.50","article-title":"Representation Learning: A Review and New Perspectives","volume":"35","author":"Bengio","year":"2013","journal-title":"IEEE Trans. 
Pattern Anal. Mach. Intell."},{"key":"ref_20","first-page":"2949","article-title":"Multimodal learning with deep Boltzmann machines","volume":"15","author":"Srivastava","year":"2014","journal-title":"J. Mach. Learn. Res."},{"key":"ref_21","doi-asserted-by":"crossref","unstructured":"Frank, S., Bugliarello, E., and Elliott, D. (2021). Vision-and-Language or Vision-for-Language? On Cross-Modal Influence in Multimodal Transformers. arXiv.","DOI":"10.18653\/v1\/2021.emnlp-main.775"},{"key":"ref_22","doi-asserted-by":"crossref","unstructured":"Gallo, I., Calefati, A., and Nawaz, S. (2017, January 9\u201315). Multimodal Classification Fusion in Real-World Scenarios. Proceedings of the 2017 14th IAPR International Conference on Document Analysis and Recognition (ICDAR), Kyoto, Japan.","DOI":"10.1109\/ICDAR.2017.326"},{"key":"ref_23","doi-asserted-by":"crossref","unstructured":"Kiela, D., Grave, E., Joulin, A., and Mikolov, T. (2018). Efficient Large-Scale Multi-Modal Classification. arXiv.","DOI":"10.1609\/aaai.v32i1.11945"},{"key":"ref_24","doi-asserted-by":"crossref","first-page":"2939","DOI":"10.1007\/s00371-021-02166-7","article-title":"A survey on deep multimodal learning for computer vision: Advances, trends, applications, and datasets","volume":"38","author":"Bayoudh","year":"2022","journal-title":"Vis. Comput."},{"key":"ref_25","doi-asserted-by":"crossref","unstructured":"Dabrowski, J., Rychalska, B., Daniluk, M., Basaj, D., Goluchowski, K., Babel, P., Michalowski, A., and Jakubowski, A. (2020). An efficient manifold density estimator for all recommendation systems. arXiv.","DOI":"10.1007\/978-3-030-92273-3_27"},{"key":"ref_26","unstructured":"Wirojwatanakul, P., and Wangperawong, A. (2019). Multi-Label Product Categorization Using Multi-Modal Fusion Models. arXiv."},{"key":"ref_27","doi-asserted-by":"crossref","unstructured":"Rychalska, B., Basaj, D., Dabrowski, J., and Daniluk, M. (2021). Cleora: A Simple, Strong and Scalable Graph Embedding Scheme. 
arXiv.","DOI":"10.1007\/978-3-030-92273-3_28"},{"key":"ref_28","doi-asserted-by":"crossref","first-page":"131","DOI":"10.1007\/s41060-019-00185-1","article-title":"A benchmarking study of classification techniques for behavioral data","volume":"9","author":"Martens","year":"2020","journal-title":"Int. J. Data Sci. Anal."},{"key":"ref_29","doi-asserted-by":"crossref","unstructured":"Wang, A., Singh, A., Michael, J., Hill, F., Levy, O., and Bowman, S. (2018, January 1). GLUE: A Multi-Task Benchmark and Analysis Platform for Natural Language Understanding. Proceedings of the 2018 EMNLP Workshop BlackboxNLP: Analyzing and Interpreting Neural Networks for NLP, Brussels, Belgium.","DOI":"10.18653\/v1\/W18-5446"},{"key":"ref_30","unstructured":"Liang, P.P., Lyu, Y., Fan, X., Wu, Z., Cheng, Y., Wu, J., Chen, L.Y., Wu, P., Lee, M.A., and Zhu, Y. MultiBench: Multiscale Benchmarks for Multimodal Representation Learning. Proceedings of the Thirty-Fifth Conference on Neural Information Processing Systems Datasets and Benchmarks Track (Round 1), Available online: https:\/\/arxiv.org\/abs\/2107.07502."},{"key":"ref_31","doi-asserted-by":"crossref","first-page":"bbab569","DOI":"10.1093\/bib\/bbab569","article-title":"Multimodal deep learning for biomedical data fusion: A review","volume":"23","author":"Stahlschmidt","year":"2022","journal-title":"Briefings Bioinform."},{"key":"ref_32","doi-asserted-by":"crossref","first-page":"149","DOI":"10.1016\/j.inffus.2020.07.006","article-title":"Advances in multimodal data fusion in neuroimaging: Overview, challenges, and novel orientation","volume":"64","author":"Zhang","year":"2020","journal-title":"Inf. Fusion"},{"key":"ref_33","doi-asserted-by":"crossref","first-page":"1","DOI":"10.1145\/3543848","article-title":"Multimodal Classification: Current Landscape, Taxonomy and Future Directions","volume":"55","author":"Sleeman","year":"2023","journal-title":"ACM Comput. 
Surv."},{"key":"ref_34","unstructured":"Liang, P.P., Zadeh, A., and Morency, L.P. (2022). Foundations and Recent Trends in Multimodal Machine Learning: Principles, Challenges, and Open Questions. arXiv."}],"container-title":["Sensors"],"original-title":[],"language":"en","link":[{"URL":"https:\/\/www.mdpi.com\/1424-8220\/23\/5\/2381\/pdf","content-type":"unspecified","content-version":"vor","intended-application":"similarity-checking"}],"deposited":{"date-parts":[[2025,10,10]],"date-time":"2025-10-10T18:38:19Z","timestamp":1760121499000},"score":1,"resource":{"primary":{"URL":"https:\/\/www.mdpi.com\/1424-8220\/23\/5\/2381"}},"subtitle":[],"short-title":[],"issued":{"date-parts":[[2023,2,21]]},"references-count":34,"journal-issue":{"issue":"5","published-online":{"date-parts":[[2023,3]]}},"alternative-id":["s23052381"],"URL":"https:\/\/doi.org\/10.3390\/s23052381","relation":{},"ISSN":["1424-8220"],"issn-type":[{"value":"1424-8220","type":"electronic"}],"subject":[],"published":{"date-parts":[[2023,2,21]]}}}