{"status":"ok","message-type":"work","message-version":"1.0.0","message":{"indexed":{"date-parts":[[2026,4,6]],"date-time":"2026-04-06T11:07:25Z","timestamp":1775473645369,"version":"3.50.1"},"reference-count":24,"publisher":"Frontiers Media SA","license":[{"start":{"date-parts":[[2023,4,17]],"date-time":"2023-04-17T00:00:00Z","timestamp":1681689600000},"content-version":"vor","delay-in-days":0,"URL":"https:\/\/creativecommons.org\/licenses\/by\/4.0\/"}],"content-domain":{"domain":["frontiersin.org"],"crossmark-restriction":true},"short-container-title":["Front. Big Data"],"abstract":"<jats:p>Data integration is a well-motivated problem in the clinical data science domain. Availability of patient data, reference clinical cases, and datasets for research have the potential to advance the healthcare industry. However, the unstructured (text, audio, or video data) and heterogeneous nature of the data, the variety of data standards and formats, and patient privacy constraint make data interoperability and integration a challenge. The clinical text is further categorized into different semantic groups and may be stored in different files and formats. Even the same organization may store cases in different data structures, making data integration more challenging. With such inherent complexity, domain experts and domain knowledge are often necessary to perform data integration. However, expert human labor is time and cost prohibitive. To overcome the variability in the structure, format, and content of the different data sources, we map the text into common categories and compute similarity within those. In this paper, we present a method to categorize and merge clinical data by considering the underlying semantics behind the cases and use reference information about the cases to perform data integration. Evaluation shows that we were able to merge 88% of clinical data from five different sources.<\/jats:p>","DOI":"10.3389\/fdata.2023.1173038","type":"journal-article","created":{"date-parts":[[2023,4,17]],"date-time":"2023-04-17T05:38:35Z","timestamp":1681709915000},"update-policy":"https:\/\/doi.org\/10.3389\/crossmark-policy","source":"Crossref","is-referenced-by-count":6,"title":["Biomedical heterogeneous data categorization and schema mapping toward data integration"],"prefix":"10.3389","volume":"6","author":[{"given":"Priya","family":"Deshpande","sequence":"first","affiliation":[]},{"given":"Alexander","family":"Rasin","sequence":"additional","affiliation":[]},{"given":"Roselyne","family":"Tchoua","sequence":"additional","affiliation":[]},{"given":"Jacob","family":"Furst","sequence":"additional","affiliation":[]},{"given":"Daniela","family":"Raicu","sequence":"additional","affiliation":[]},{"given":"Michiel","family":"Schinkel","sequence":"additional","affiliation":[]},{"given":"Hari","family":"Trivedi","sequence":"additional","affiliation":[]},{"given":"Sameer","family":"Antani","sequence":"additional","affiliation":[]}],"member":"1965","published-online":{"date-parts":[[2023,4,17]]},"reference":[{"key":"B1","doi-asserted-by":"crossref","first-page":"1652","DOI":"10.1109\/FSKD.2011.6019867","article-title":"\u201cA study of density-grid based clustering algorithms on data streams,\u201d","volume-title":"2011 Eighth International Conference on Fuzzy Systems and Knowledge Discovery (FSKD), volume 3","author":"Amini","year":"2011"},{"key":"B2","doi-asserted-by":"publisher","first-page":"197","DOI":"10.1016\/0020-7373(89)90027-8","article-title":"Extensions to the cart algorithm","volume":"31","author":"Crawford","year":"1989","journal-title":"Int. J. Man Mach. Stud"},{"key":"B3","first-page":"10","article-title":"\u201cAn integrated database and smart search tool for medical knowledge extraction from radiology teaching files,\u201d","author":"Deshpande","year":"2017","journal-title":"Medical Informatics and Healthcare"},{"key":"B4","doi-asserted-by":"crossref","first-page":"195","DOI":"10.1109\/LSC.2018.8572185","article-title":"\u201cBig data integration case study for radiology data sources,\u201d","volume-title":"IEEE Life Sciences Conference (LSC 2018)","author":"Deshpande","year":"2018"},{"key":"B5","doi-asserted-by":"publisher","DOI":"10.5220\/0008166603720383","article-title":"\u201cMultimodal ranked search over integrated repository of radiology data source,\u201d","author":"Deshpande","year":"","journal-title":"2019 Knowledge Discovery and Information Retrieval (KDIR)"},{"key":"B6","doi-asserted-by":"publisher","first-page":"54","DOI":"10.3390\/data4020054","article-title":"Diis: a biomedical data access framework for aiding data driven research supporting fair principles","volume":"4","author":"Deshpande","year":"","journal-title":"Data"},{"key":"B7","doi-asserted-by":"publisher","first-page":"797","DOI":"10.1007\/s10278-020-00331-3","article-title":"Ontology-based radiology teaching file summarization, coverage, and integration","volume":"33","author":"Deshpande","year":"","journal-title":"J Digit Imaging"},{"key":"B8","first-page":"265","article-title":"\u201cEnhancing recall using data cleaning for biomedical big data,\u201d","volume-title":"2020 IEEE 33rd International Symposium on Computer-Based Medical Systems (CBMS)","author":"Deshpande","year":""},{"key":"B9","doi-asserted-by":"publisher","first-page":"1","DOI":"10.1186\/s13244-019-0798-3","article-title":"Impact of artificial intelligence on radiology: a euroaim survey among members of the european society of radiology","volume":"10","author":"Codari","year":"2019","journal-title":"Insights Imaging"},{"key":"B10","doi-asserted-by":"publisher","DOI":"10.2196\/17687","article-title":"What you need to know before implementing a clinical research data warehouse: comparative review of integrated data repositories in health care institutions","author":"Gagalova","year":"2020","journal-title":"JMIR Formative Res"},{"key":"B11","unstructured":"GroupM. M. I.\n          Mypacs tfs2017"},{"key":"B12","unstructured":"InternationalH. L. S.\n          Health Level Seven International2018"},{"key":"B13","doi-asserted-by":"publisher","first-page":"1","DOI":"10.1186\/s12874-020-01057-0","article-title":"The challenges in data integration-heterogeneity and complexity in clinical trials and patient registries of systemic lupus erythematosus","volume":"20","author":"Le Sueur","year":"2020","journal-title":"BMC Med. Res. Methodol"},{"key":"B14","doi-asserted-by":"publisher","DOI":"10.15265\/IY-2017-007","article-title":"Clinical data reuse or secondary use: current status and potential future progress","author":"Meystre","year":"2017","journal-title":"Yearbook Med. Inform"},{"key":"B15","doi-asserted-by":"publisher","first-page":"274","DOI":"10.1007\/s00357-014-9161-z","article-title":"Ward's hierarchical agglomerative clustering method: which algorithms implement ward's criterion?","volume":"31","author":"Murtagh","year":"2014","journal-title":"J. Classification"},{"key":"B16","unstructured":"NeutorgasseE.\n          Eurorad2017"},{"key":"B17","doi-asserted-by":"publisher","first-page":"2975","DOI":"10.37247\/PAENVR.1.2020.20","article-title":"Information is selection\u2013a review of basics shows substantial potential for improvement of digital information representation","volume":"17","author":"Orthuber","year":"2020","journal-title":"Int. J. Environ. Res. Public Health"},{"key":"B18","doi-asserted-by":"publisher","first-page":"1860","DOI":"10.1109\/JBHI.2020.2970807","article-title":"Short keynote paper: mainstreaming personalized healthcare-transforming healthcare through new era of artificial intelligence","volume":"24","author":"Paranjape","year":"2020","journal-title":"IEEE J. Biomed. Health Inform"},{"key":"B19","doi-asserted-by":"publisher","first-page":"79","DOI":"10.1016\/j.artmed.2010.02.003","article-title":"Classification integration and reclassification using constraint databases","volume":"49","author":"Revesz","year":"2010","journal-title":"Artif. Intell. Med"},{"key":"B20","first-page":"656","article-title":"\u201cUnsupervised clustering method with optimal estimation of the number of clusters: application to image segmentation,\u201d","volume-title":"Proceedings 15th International Conference on Pattern Recognition. ICPR-2000, Volume 1","author":"Rosenberger","year":"2000"},{"key":"B21","first-page":"439","article-title":"\u201cMerging heterogeneous clinical data to enable knowledge discovery,\u201d","volume-title":"PSB","author":"Seneviratne","year":"2019"},{"key":"B22","unstructured":"Snomedct Ontology2017"},{"key":"B23","first-page":"3","article-title":"Data integration: the current status and the way forward","volume":"41","author":"Stonebraker","year":"2018","journal-title":"IEEE Data Eng. Bull"},{"key":"B24","unstructured":"Hausdorff2020"}],"container-title":["Frontiers in Big Data"],"original-title":[],"link":[{"URL":"https:\/\/www.frontiersin.org\/articles\/10.3389\/fdata.2023.1173038\/full","content-type":"unspecified","content-version":"vor","intended-application":"similarity-checking"}],"deposited":{"date-parts":[[2023,4,17]],"date-time":"2023-04-17T05:38:44Z","timestamp":1681709924000},"score":1,"resource":{"primary":{"URL":"https:\/\/www.frontiersin.org\/articles\/10.3389\/fdata.2023.1173038\/full"}},"subtitle":[],"short-title":[],"issued":{"date-parts":[[2023,4,17]]},"references-count":24,"alternative-id":["10.3389\/fdata.2023.1173038"],"URL":"https:\/\/doi.org\/10.3389\/fdata.2023.1173038","relation":{},"ISSN":["2624-909X"],"issn-type":[{"value":"2624-909X","type":"electronic"}],"subject":[],"published":{"date-parts":[[2023,4,17]]},"article-number":"1173038"}}