{"status":"ok","message-type":"work","message-version":"1.0.0","message":{"indexed":{"date-parts":[[2026,4,24]],"date-time":"2026-04-24T15:18:53Z","timestamp":1777043933918,"version":"3.51.4"},"reference-count":24,"publisher":"Springer Science and Business Media LLC","issue":"1","license":[{"start":{"date-parts":[[2025,1,14]],"date-time":"2025-01-14T00:00:00Z","timestamp":1736812800000},"content-version":"tdm","delay-in-days":0,"URL":"https:\/\/creativecommons.org\/licenses\/by\/4.0"},{"start":{"date-parts":[[2025,1,14]],"date-time":"2025-01-14T00:00:00Z","timestamp":1736812800000},"content-version":"vor","delay-in-days":0,"URL":"https:\/\/creativecommons.org\/licenses\/by\/4.0"}],"content-domain":{"domain":["link.springer.com"],"crossmark-restriction":false},"short-container-title":["J Big Data"],"abstract":"<jats:title>Abstract<\/jats:title>\n          <jats:p>Recently, multimodal data analysis in medical domain has started receiving a great attention. Researchers from both computer science, and medicine are trying to develop models to handle multimodal medical data. However, most of the published work have targeted the homogeneous multimodal data. The collection and preparation of heterogeneous multimodal data is a complex and time-consuming task. Further, development of models to handle such heterogeneous multimodal data is another challenge. This study presents a cross modal transformer-based fusion approach for multimodal clinical data analysis using medical images and clinical data. The proposed approach leverages the image embedding layer to convert image into visual tokens, and another clinical embedding layer to convert clinical data into text tokens. Further, a cross-modal transformer module is employed to learn a holistic representation of imaging and clinical modalities. The proposed approach was tested for a multi-modal lung disease tuberculosis data set. 
Further, the results are compared with recent approaches proposed in the field of multimodal medical data analysis. The comparison shows that the proposed approach outperformed the other approaches considered in the study. Another advantage of this approach is that it is faster to analyze heterogeneous multimodal medical data in comparison to existing methods used in the study, which is very important if we do not have powerful machines for computation.<\/jats:p>","DOI":"10.1186\/s40537-024-01054-w","type":"journal-article","created":{"date-parts":[[2025,1,14]],"date-time":"2025-01-14T02:56:36Z","timestamp":1736823396000},"update-policy":"https:\/\/doi.org\/10.1007\/springer_crossmark_policy","source":"Crossref","is-referenced-by-count":19,"title":["Transformer enabled multi-modal medical diagnosis for tuberculosis classification"],"prefix":"10.1186","volume":"12","author":[{"given":"Sachin","family":"Kumar","sequence":"first","affiliation":[]},{"given":"Shivani","family":"Sharma","sequence":"additional","affiliation":[]},{"given":"Kassahun Tadesse","family":"Megra","sequence":"additional","affiliation":[]}],"member":"297","published-online":{"date-parts":[[2025,1,14]]},"reference":[{"key":"1054_CR1","doi-asserted-by":"publisher","first-page":"511","DOI":"10.1016\/j.measurement.2019.05.07","volume":"145","author":"AK Jaiswal","year":"2019","unstructured":"Jaiswal AK, Tiwari P, Kumar S, Gupta D, Khanna A, Rodrigues JJPC. Identifying pneumonia in chest x-rays: a deep learning approach. Measurement. 2019;145:511\u20138. https:\/\/doi.org\/10.1016\/j.measurement.2019.05.07.","journal-title":"Measurement"},{"issue":"1","key":"1054_CR2","doi-asserted-by":"publisher","first-page":"65","DOI":"10.1038\/s41746-021-00438-z","volume":"4","author":"Ravi Aggarwal","year":"2021","unstructured":"Aggarwal Ravi, Sounderajah Viknesh, Martin Guy, Ting Daniel SW, Karthikesalingam Alan, King Dominic, Ashrafian Hutan, Darzi Ara. 
Diagnostic accuracy of deep learning in medical imaging: a systematic review and meta-analysis. NPJ Digit Med. 2021;4(1):65.","journal-title":"NPJ Digit Med"},{"key":"1054_CR3","doi-asserted-by":"publisher","first-page":"92","DOI":"10.1016\/j.neucom.2020.04.157","volume":"444","author":"H Yu","year":"2021","unstructured":"Yu H, Yang LT, Zhang Q, Armstrong D, Deen MJ. Convolutional neural networks for medical image analysis: state-of-the-art, comparisons, improvement and perspectives. Neurocomputing. 2021;444:92\u2013110. https:\/\/doi.org\/10.1016\/j.neucom.2020.04.157.","journal-title":"Neurocomputing"},{"issue":"2","key":"1054_CR4","doi-asserted-by":"publisher","first-page":"361","DOI":"10.1093\/jamia\/ocw112","volume":"24","author":"Edward Choi","year":"2017","unstructured":"Choi Edward, Schuetz Andy, Stewart Walter F, Sun Jimeng. Using recurrent neural network models for early detection of heart failure onset. J Am Med Inform Assoc. 2017;24(2):361\u201370.","journal-title":"J Am Med Inform Assoc"},{"key":"1054_CR5","unstructured":"Joze HRV, Shaban A, Iuzzolino ML, Koishida K. Mmtm: multimodal transfer module for cnn fusion. In Proceedings of the IEEE\/CVF Conference on Computer Vision and Pattern Recognition (CVPR), 2020."},{"issue":"3","key":"1054_CR6","doi-asserted-by":"publisher","first-page":"252","DOI":"10.1136\/jamia.1995.0040252","volume":"4","author":"Casimir A Kulikowski","year":"1997","unstructured":"Kulikowski Casimir A. Medical imaging informatics: challenges of definition and integration. J Am Med Inform Assoc. 1997;4(3):252\u20133. https:\/\/doi.org\/10.1136\/jamia.1995.0040252.","journal-title":"J Am Med Inform Assoc"},{"issue":"6","key":"1054_CR7","doi-asserted-by":"publisher","first-page":"121","DOI":"10.1007\/s00138-021-01249-8","volume":"32","author":"Said Yacine Boulahia","year":"2021","unstructured":"Boulahia Said Yacine, Amamra Abdenour, Madi Mohamed Ridha, Daikh Said. 
Early, intermediate and late fusion strategies for robust deep learning-based multimodal action recognition. Mach Vis Appl. 2021;32(6):121.","journal-title":"Mach Vis Appl"},{"key":"1054_CR8","doi-asserted-by":"publisher","first-page":"2887","DOI":"10.1007\/s11042-020-08836-3","volume":"80","author":"YR Pandeya","year":"2021","unstructured":"Pandeya YR, Lee J. Deep learning-based late fusion of multimodal information for emotion classification of music video. Multimed Tools Appl. 2021;80:2887\u2013905.","journal-title":"Multimed Tools Appl"},{"key":"1054_CR9","doi-asserted-by":"publisher","first-page":"12113","DOI":"10.1109\/TPAMI.2023.3275156","volume":"45","author":"P Xu","year":"2023","unstructured":"Xu P, Zhu X, Clifton DA. Multimodal learning with transformers: a survey. IEEE Trans Pattern Anal Mach Intell. 2023;45:12113\u201332.","journal-title":"IEEE Trans Pattern Anal Mach Intell"},{"key":"1054_CR10","doi-asserted-by":"crossref","unstructured":"Xu Tao, Zhang Han, Huang Xiaolei, Zhang Shaoting, Metaxas Dimitris\u00a0N. Multimodal deep learning for cervical dysplasia diagnosis. In Medical Image Computing and Computer-Assisted Intervention\u2013MICCAI 2016: 19th International Conference, Athens, Greece, 2016, Proceedings, Part II 19, Springer. 2016. p 115\u2013123.","DOI":"10.1007\/978-3-319-46723-8_14"},{"issue":"1","key":"1054_CR11","doi-asserted-by":"publisher","first-page":"22147","DOI":"10.1038\/s41598-020-78888-w","volume":"10","author":"SC Huang","year":"2020","unstructured":"Huang SC, Pareek A, Zamanian R, Banerjee I, Lungren MP. Multimodal fusion with deep neural networks for leveraging ct imaging and electronic health record: a case-study in pulmonary embolism detection. Sci Rep. 
2020;10(1):22147.","journal-title":"Sci Rep"},{"key":"1054_CR12","doi-asserted-by":"publisher","DOI":"10.3389\/fonc.2021.788740","volume":"11","author":"S Schulz","year":"2021","unstructured":"Schulz S, Woerl AC, Jungmann F, Christina Glasner, Stenzel P, Strobl S, Fernandez A, Wagner DC, Haferkamp A, Mildenberger P, et al. Multimodal deep learning for prognosis prediction in renal cancer. Front Oncol. 2021;11: 788740.","journal-title":"Front Oncol"},{"issue":"1","key":"1054_CR13","doi-asserted-by":"publisher","first-page":"13505","DOI":"10.1038\/s41598-021-92799-4","volume":"11","author":"LA Vale-Silva","year":"2021","unstructured":"Vale-Silva LA, Rohr K. Long-term cancer survival prediction using multimodal deep learning. Sci Rep. 2021;11(1):13505.","journal-title":"Sci Rep."},{"issue":"1","key":"1054_CR14","doi-asserted-by":"publisher","first-page":"18800","DOI":"10.1038\/s41598-021-98408-8","volume":"11","author":"S Joo","year":"2021","unstructured":"Joo S, Ko ES, Kwon S, Jeon E, Jung H, Kim JY, Chung MJ, Im YH. Multimodal deep learning models for the prediction of pathologic response to neoadjuvant chemotherapy in breast cancer. Sci Rep. 2021;11(1):18800.","journal-title":"Sci Rep"},{"issue":"1","key":"1054_CR15","doi-asserted-by":"publisher","first-page":"19794","DOI":"10.1038\/s41598-023-47146-0","volume":"13","author":"SY Choi","year":"2023","unstructured":"Choi SY, Choi A, Baek SE, Ahn JY, Roh YH, Kim JH. Effect of multimodal diagnostic approach using deep learning-based automated detection algorithm for active pulmonary tuberculosis. Sci Rep. 2023;13(1):19794.","journal-title":"Sci Rep"},{"issue":"4","key":"1054_CR16","doi-asserted-by":"publisher","first-page":"351","DOI":"10.1038\/s42256-023-00633-5","volume":"5","author":"S Steyaert","year":"2023","unstructured":"Steyaert S, Pizurica M, Nagaraj D, Khandelwal P, Hernandez-Boussard T, Gentles AJ, Gevaert O. Multimodal data fusion for cancer biomarker discovery with deep learning. Nat Mach Intell. 
2023;5(4):351\u201362.","journal-title":"Nat Mach Intell"},{"key":"1054_CR17","doi-asserted-by":"publisher","DOI":"10.1016\/j.imu.2023.101367","volume":"42","author":"S Kumar","year":"2023","unstructured":"Kumar S, Ivanova O, Melyokhin A, Tiwari P. Deep-learning-enabled multimodal data fusion for lung disease classification. Inform Med Unlocked. 2023;42: 101367.","journal-title":"Inform Med Unlocked"},{"key":"1054_CR18","doi-asserted-by":"publisher","DOI":"10.1016\/j.compbiomed.2023.106712","volume":"157","author":"Y Zhang","year":"2023","unstructured":"Zhang Y, Xie F, Chen J. Tformer: a throughout fusion transformer for multi-modal skin lesion diagnosis. Comput Biol Med. 2023;157: 106712. https:\/\/doi.org\/10.1016\/j.compbiomed.2023.106712.","journal-title":"Comput Biol Med"},{"key":"1054_CR19","doi-asserted-by":"publisher","first-page":"1","DOI":"10.1038\/s41551-023-01045-x","volume":"7","author":"HY Zhou","year":"2023","unstructured":"Zhou HY, Yu Y, Wang C, Zhang S, Gao Y, Pan J, Shao J, Lu G, Zhang K, Li W. A transformer-based representation-learning model with unified processing of multimodal input for clinical diagnostics. Nat Biomed Eng. 2023;7:1\u201313.","journal-title":"Nat Biomed Eng"},{"key":"1054_CR20","unstructured":"Hayat N, Geras KJ, Shamout FE. Medfuse: multi-modal fusion with clinical time-series data and chest x-ray images. In Machine Learning for Healthcare Conference, PMLR. 2022. p 479\u2013503."},{"issue":"1","key":"1054_CR21","doi-asserted-by":"publisher","first-page":"10666","DOI":"10.1038\/s41598-023-37835-1","volume":"13","author":"Firas Khader","year":"2023","unstructured":"Khader Firas, Kather Jakob Nikolas, M\u00fcller-Franzes Gustav, Wang Tianci, Han Tianyu, Arasteh Soroosh Tayebi, Hamesch Karim, Bressem Keno, Haarburger Christoph, Stegmaier Johannes, et al. Medical transformer for multimodal survival prediction in intensive care: integration of imaging and non-imaging data. Sci Rep. 
2023;13(1):10666.","journal-title":"Sci Rep"},{"key":"1054_CR22","first-page":"8","volume":"6","author":"LR Soenksen","year":"2022","unstructured":"Soenksen LR, Ma Y, Zeng C, Boussioux LD, Villalobos Carballo K, Na L, Wiberg H, Li M, Fuentes I, Bertsimas D. Code for generating the haim multimodal dataset of mimic-iv clinical data and x-rays (version 1.0.1). PhysioNet. 2022;6:8.","journal-title":"PhysioNet"},{"key":"1054_CR23","doi-asserted-by":"publisher","unstructured":"Wu Y, Ma J, Huang X, Ling SH, Weidong\u00a0SS. Deepmmsa: A novel multimodal deep learning method for non-small cell lung cancer survival analysis. In 2021 IEEE International Conference on Systems, Man, and Cybernetics (SMC), 2021. p 1468\u20131472. https:\/\/doi.org\/10.1109\/SMC52423.2021.9658891.","DOI":"10.1109\/SMC52423.2021.9658891"},{"issue":"10","key":"1054_CR24","doi-asserted-by":"publisher","first-page":"125","DOI":"10.3390\/bdcc8100125","volume":"8","author":"S Kumar","year":"2024","unstructured":"Kumar S, Sharma S. An improved deep learning framework for multimodal medical data analysis. Big Data Cognit Comput. 
2024;8(10):125.","journal-title":"Big Data Cognit Comput"}],"container-title":["Journal of Big Data"],"original-title":[],"language":"en","link":[{"URL":"https:\/\/link.springer.com\/content\/pdf\/10.1186\/s40537-024-01054-w.pdf","content-type":"application\/pdf","content-version":"vor","intended-application":"text-mining"},{"URL":"https:\/\/link.springer.com\/article\/10.1186\/s40537-024-01054-w\/fulltext.html","content-type":"text\/html","content-version":"vor","intended-application":"text-mining"},{"URL":"https:\/\/link.springer.com\/content\/pdf\/10.1186\/s40537-024-01054-w.pdf","content-type":"application\/pdf","content-version":"vor","intended-application":"similarity-checking"}],"deposited":{"date-parts":[[2025,1,14]],"date-time":"2025-01-14T02:56:42Z","timestamp":1736823402000},"score":1,"resource":{"primary":{"URL":"https:\/\/journalofbigdata.springeropen.com\/articles\/10.1186\/s40537-024-01054-w"}},"subtitle":[],"short-title":[],"issued":{"date-parts":[[2025,1,14]]},"references-count":24,"journal-issue":{"issue":"1","published-online":{"date-parts":[[2025,12]]}},"alternative-id":["1054"],"URL":"https:\/\/doi.org\/10.1186\/s40537-024-01054-w","relation":{},"ISSN":["2196-1115"],"issn-type":[{"value":"2196-1115","type":"electronic"}],"subject":[],"published":{"date-parts":[[2025,1,14]]},"assertion":[{"value":"22 December 2023","order":1,"name":"received","label":"Received","group":{"name":"ArticleHistory","label":"Article History"}},{"value":"14 December 2024","order":2,"name":"accepted","label":"Accepted","group":{"name":"ArticleHistory","label":"Article History"}},{"value":"14 January 2025","order":3,"name":"first_online","label":"First Online","group":{"name":"ArticleHistory","label":"Article History"}},{"order":1,"name":"Ethics","group":{"name":"EthicsHeading","label":"Declarations"}},{"value":"Not Applicable","order":2,"name":"Ethics","group":{"name":"EthicsHeading","label":"Informed consent statement"}},{"value":"The author declares no competing 
interests regarding the publication of this article.","order":3,"name":"Ethics","group":{"name":"EthicsHeading","label":"Competing interests"}}],"article-number":"5"}}