{"status":"ok","message-type":"work","message-version":"1.0.0","message":{"indexed":{"date-parts":[[2025,11,28]],"date-time":"2025-11-28T11:01:03Z","timestamp":1764327663421,"version":"3.46.0"},"reference-count":67,"publisher":"Springer Science and Business Media LLC","issue":"1","license":[{"start":{"date-parts":[[2025,11,28]],"date-time":"2025-11-28T00:00:00Z","timestamp":1764288000000},"content-version":"tdm","delay-in-days":0,"URL":"https:\/\/creativecommons.org\/licenses\/by\/4.0"},{"start":{"date-parts":[[2025,11,28]],"date-time":"2025-11-28T00:00:00Z","timestamp":1764288000000},"content-version":"vor","delay-in-days":0,"URL":"https:\/\/creativecommons.org\/licenses\/by\/4.0"}],"funder":[{"DOI":"10.13039\/501100004837","name":"Ministerio de Ciencia e Innovaci\u00f3n","doi-asserted-by":"publisher","award":["PID2020-117191RB-I00 \/ AEI\/10.13039\/501100011033","PID2020-117191RB-I00 \/ AEI\/10.13039\/501100011033","PID2020-117191RB-I00 \/ AEI\/10.13039\/501100011033"],"award-info":[{"award-number":["PID2020-117191RB-I00 \/ AEI\/10.13039\/501100011033","PID2020-117191RB-I00 \/ AEI\/10.13039\/501100011033","PID2020-117191RB-I00 \/ AEI\/10.13039\/501100011033"]}],"id":[{"id":"10.13039\/501100004837","id-type":"DOI","asserted-by":"publisher"}]},{"name":"European Comission Horizon Europe programme","award":["GA.101135513.","GA.101135513.","GA.101135513."],"award-info":[{"award-number":["GA.101135513.","GA.101135513.","GA.101135513."]}]}],"content-domain":{"domain":["link.springer.com"],"crossmark-restriction":false},"short-container-title":["J Big Data"],"abstract":"<jats:title>Abstract<\/jats:title>\n                  <jats:p>In this paper, entity contextual pre-filtering is proposed to refine dataset relevance assessment and streamline data discovery. Heterogeneous Graph Neural Networks are used to exploit the local context embedded within graph-based schemas. The proposed pre-filtering approach is versatile and does not rely on any specific similarity metric, making it applicable to a wide range of data discovery methods. The proposed technique increases data discovery precision by reducing false positives and identifying significant data relationships. This method has been empirically validated across a variety of real-world datasets to improve data discovery efficiency and accuracy.<\/jats:p>","DOI":"10.1186\/s40537-025-01312-5","type":"journal-article","created":{"date-parts":[[2025,11,28]],"date-time":"2025-11-28T10:58:31Z","timestamp":1764327511000},"update-policy":"https:\/\/doi.org\/10.1007\/springer_crossmark_policy","source":"Crossref","is-referenced-by-count":0,"title":["Enhancing data discovery with contextual pre-filtering"],"prefix":"10.1186","volume":"12","author":[{"given":"Javier","family":"Flores","sequence":"first","affiliation":[]},{"given":"Sergi","family":"Nadal","sequence":"additional","affiliation":[]},{"given":"Oscar","family":"Romero","sequence":"additional","affiliation":[]}],"member":"297","published-online":{"date-parts":[[2025,11,28]]},"reference":[{"issue":"1","key":"1312_CR1","doi-asserted-by":"publisher","DOI":"10.1186\/s40537-022-00673-5","volume":"9","author":"S Nadal","year":"2022","unstructured":"Nadal S, Jovanovic P, Bilalli B, Romero O. Operationalizing and automating data governance. J Big Data. 2022;9(1):117.","journal-title":"J Big Data"},{"issue":"12","key":"1312_CR2","doi-asserted-by":"publisher","first-page":"1986","DOI":"10.14778\/3352063.3352116","volume":"12","author":"F Nargesian","year":"2019","unstructured":"Nargesian F, Zhu E, Miller RJ, Pu KQ, Arocena PC. Data lake management: challenges and opportunities. Proc VLDB Endow. 2019;12(12):1986\u20139.","journal-title":"Proc VLDB Endow"},{"issue":"12","key":"1312_CR3","doi-asserted-by":"publisher","first-page":"12571","DOI":"10.1109\/TKDE.2023.3270101","volume":"35","author":"R Hai","year":"2023","unstructured":"Hai R, Koutras C, Quix C, Jarke M. Data lakes: a survey of functions and systems. IEEE Trans Knowl Data Eng. 2023;35(12):12571\u201390.","journal-title":"IEEE Trans Knowl Data Eng"},{"key":"1312_CR4","doi-asserted-by":"crossref","unstructured":"Khalid H, Zim\u00e1nyi E. Using rule and goal based agents to create metadata profiles. In: ADBIS (Short Papers and Workshops).\u00a0vol. 1064 of Commun Comput Inf Sci. Springer; 2019. p. 365\u2013377.","DOI":"10.1007\/978-3-030-30278-8_37"},{"issue":"4","key":"1312_CR5","doi-asserted-by":"publisher","DOI":"10.1145\/3626521","volume":"56","author":"NW Paton","year":"2024","unstructured":"Paton NW, Chen J, Wu Z. Dataset discovery and exploration: a survey. ACM Comput Surv. 2024;56(4):102:1\u2013102:37.","journal-title":"ACM Comput Surv"},{"key":"1312_CR6","doi-asserted-by":"crossref","unstructured":"Bogatu A, Fernandes AAA, Paton NW, Konstantinou N. Dataset discovery in data lakes. In: ICDE. IEEE; 2020. p. 709\u2013720.","DOI":"10.1109\/ICDE48307.2020.00067"},{"key":"1312_CR7","doi-asserted-by":"crossref","unstructured":"Nargesian F, Asudeh A, Jagadish HV. Responsible data integration: next-generation challenges. In: SIGMOD Conf. ACM; 2022. p. 2458\u20132464.","DOI":"10.1145\/3514221.3522567"},{"key":"1312_CR8","unstructured":"Flores J, Nadal S, Romero O. Towards scalable data discovery. In: EDBT. OpenProceedings.org; 2021. p. 433\u2013438."},{"issue":"3","key":"1312_CR9","doi-asserted-by":"publisher","DOI":"10.1145\/2000824.2000825","volume":"36","author":"C Xiao","year":"2011","unstructured":"Xiao C, Wang W, Lin X, Yu JX, Wang G. Efficient similarity joins for near-duplicate detection. ACM Trans Database Syst. 2011;36(3):15:1\u201315:41.","journal-title":"ACM Trans Database Syst"},{"key":"1312_CR10","unstructured":"Mann W, Augsten N. PEL: Position-enhanced length filter for set similarity joins. In: Grundlagen von Datenbanken. CEUR Works Proc.\u00a0vol. 1313 of\u00a0CEUR-WS.org; 2014. p. 89\u201394."},{"key":"1312_CR11","doi-asserted-by":"crossref","unstructured":"Zhu E, Deng D, Nargesian F, Miller RJ. JOSIE: Overlap Set Similarity Search for Finding Joinable Tables in Data Lakes. In: SIGMOD Conference. ACM; 2019. p. 847\u2013864.","DOI":"10.1145\/3299869.3300065"},{"issue":"12","key":"1312_CR12","doi-asserted-by":"publisher","first-page":"1185","DOI":"10.14778\/2994509.2994534","volume":"9","author":"E Zhu","year":"2016","unstructured":"Zhu E, Nargesian F, Pu KQ, Miller RJ. LSH ensemble: internet-scale domain search. Proc VLDB Endow. 2016;9(12):1185\u201396.","journal-title":"Proc VLDB Endow"},{"issue":"1","key":"1312_CR13","doi-asserted-by":"publisher","DOI":"10.1145\/3588689","volume":"1","author":"A Khatiwada","year":"2023","unstructured":"Khatiwada A, Fan G, Shraga R, Chen Z, Gatterbauer W, Miller RJ, et al. SANTOS: Relationship-based Semantic Table Union Search. Proc ACM Manag Data. 2023;1(1):9:1\u20139:25.","journal-title":"Proc ACM Manag Data"},{"issue":"7","key":"1312_CR14","doi-asserted-by":"publisher","first-page":"813","DOI":"10.14778\/3192965.3192973","volume":"11","author":"F Nargesian","year":"2018","unstructured":"Nargesian F, Zhu E, Pu KQ. Table union search on open data. Proc VLDB Endow. 2018;11(7):813\u201325.","journal-title":"Proc VLDB Endow"},{"key":"1312_CR15","unstructured":"Cong T, Nargesian F, Jagadish HV. Pylon: Semantic Table Union Search in Data Lakes. CoRR. 2023;abs\/2301.04901."},{"key":"1312_CR16","doi-asserted-by":"crossref","unstructured":"Yakout M, Ganjam K, Chakrabarti K, Chaudhuri S. InfoGather: entity augmentation and attribute discovery by holistic matching with web tables. In: SIGMOD Conference. ACM; 2012. p. 97\u2013108.","DOI":"10.1145\/2213836.2213848"},{"key":"1312_CR17","doi-asserted-by":"crossref","unstructured":"Sarma AD, Fang L, Gupta N, Halevy AY, Lee H, Wu F, et\u00a0al. Finding related tables. In: SIGMOD Conf. ACM; 2012. p. 817\u2013828.","DOI":"10.1145\/2213836.2213962"},{"issue":"12","key":"1312_CR18","doi-asserted-by":"publisher","first-page":"2791","DOI":"10.14778\/3476311.3476346","volume":"14","author":"S Castelo","year":"2021","unstructured":"Castelo S, Rampin R, Santos ASR, Bessa A, Chirigati F, Freire J. Auctus: a dataset search engine for data discovery and augmentation. Proc VLDB Endow. 2021;14(12):2791\u20134.","journal-title":"Proc VLDB Endow"},{"key":"1312_CR19","unstructured":"Cong T, Gale J, Frantz J, Jagadish HV, Demiralp \u00c7. WarpGate: a semantic join discovery system for cloud data warehouses. In: CIDR. www.cidrdb.org; 2023. ."},{"issue":"10","key":"1312_CR20","doi-asserted-by":"publisher","first-page":"2458","DOI":"10.14778\/3603581.3603587","volume":"16","author":"Y Dong","year":"2023","unstructured":"Dong Y, Xiao C, Nozawa T, Enomoto M, Oyamada M. Deepjoin: joinable table discovery with pre-trained language models. Proc VLDB Endow. 2023;16(10):2458\u201370.","journal-title":"Proc VLDB Endow"},{"issue":"2","key":"1312_CR21","doi-asserted-by":"publisher","first-page":"281","DOI":"10.1007\/s00778-023-00806-z","volume":"33","author":"N Karpov","year":"2024","unstructured":"Karpov N, Zhang H, Zhang Q. Minjoin++: a fast algorithm for string similarity joins under edit distance. VLDB J. 2024;33(2):281\u201399.","journal-title":"VLDB J"},{"issue":"12","key":"1312_CR22","doi-asserted-by":"publisher","first-page":"1902","DOI":"10.14778\/3352063.3352095","volume":"12","author":"Y Zhang","year":"2019","unstructured":"Zhang Y, Ives ZG. Juneau: Data Lake Management for Jupyter. Proc VLDB Endow. 2019;12(12):1902\u20135.","journal-title":"Proc VLDB Endow"},{"issue":"8","key":"1312_CR23","doi-asserted-by":"publisher","first-page":"625","DOI":"10.14778\/2732296.2732299","volume":"7","author":"Y Jiang","year":"2014","unstructured":"Jiang Y, Li G, Feng J, Li W. String similarity joins: an experimental evaluation. Proc VLDB Endow. 2014;7(8):625\u201336.","journal-title":"Proc VLDB Endow"},{"issue":"9","key":"1312_CR24","doi-asserted-by":"publisher","first-page":"636","DOI":"10.14778\/2947618.2947620","volume":"9","author":"W Mann","year":"2016","unstructured":"Mann W, Augsten N, Bouros P. An empirical evaluation of set similarity join techniques. Proc VLDB Endow. 2016;9(9):636\u201347.","journal-title":"Proc VLDB Endow"},{"key":"1312_CR25","doi-asserted-by":"publisher","first-page":"69614","DOI":"10.1109\/ACCESS.2019.2914071","volume":"7","author":"S Hakak","year":"2019","unstructured":"Hakak S, Kamsin A, Shivakumara P, Gilkar GA, Khan WZ, Imran M. Exact string matching algorithms: survey, issues, and future research directions. IEEE Access. 2019;7:69614\u201337.","journal-title":"IEEE Access"},{"key":"1312_CR26","doi-asserted-by":"crossref","unstructured":"Jia L, Zhang L, Yu G, You J, Ding J, Li M. A survey on set similarity search and join. Int J Perfo Eng. 2018;14.","DOI":"10.23940\/ijpe.18.02.p6.245258"},{"key":"1312_CR27","doi-asserted-by":"crossref","unstructured":"Koutras C, Siachamis G, Ionescu A, Psarakis K, Brons J, Fragkoulis M, et\u00a0al. Valentine: evaluating matching techniques for dataset discovery. In: ICDE. IEEE; 2021. p. 468\u2013479.","DOI":"10.1109\/ICDE51399.2021.00047"},{"issue":"1","key":"1312_CR28","doi-asserted-by":"publisher","first-page":"64","DOI":"10.1145\/2627692.2627706","volume":"43","author":"S Wandelt","year":"2014","unstructured":"Wandelt S, Deng D, Gerdjikov S, Mishra S, Mitankin P, Patil M, et al. State-of-the-art in string similarity search and join. SIGMOD Rec. 2014;43(1):64\u201376.","journal-title":"SIGMOD Rec"},{"issue":"5","key":"1312_CR29","doi-asserted-by":"publisher","DOI":"10.1145\/2816813","volume":"62","author":"F Chierichetti","year":"2015","unstructured":"Chierichetti F, Kumar R. LSH-preserving functions and their applications. J ACM. 2015;62(5):33:1\u201333:25.","journal-title":"J ACM"},{"issue":"2","key":"1312_CR30","doi-asserted-by":"publisher","first-page":"350","DOI":"10.1137\/17M1127752","volume":"48","author":"F Chierichetti","year":"2019","unstructured":"Chierichetti F, Kumar R, Panconesi A, Terolli E. On the distortion of locality sensitive hashing. SIAM J Comput. 2019;48(2):350\u201372.","journal-title":"SIAM J Comput"},{"key":"1312_CR31","unstructured":"Usta A, Salihoglu S. To join or not to join: an analysis on the usefulness of joining tables in open government data portals. In: VLDB Workshops. vol. 3462 of CEUR Works Proc. CEUR-WS.org; 2023."},{"key":"1312_CR32","doi-asserted-by":"publisher","DOI":"10.1016\/j.websem.2022.100761","volume":"76","author":"J Liu","year":"2023","unstructured":"Liu J, Chabot Y, Troncy R, Huynh V, Labb\u00e9 T, Monnin P. From tabular data to knowledge graphs: a survey of semantic table interpretation tasks and methods. J Web Semant. 2023;76:100761.","journal-title":"J Web Semant"},{"key":"1312_CR33","unstructured":"Pan JZ, Razniewski S, Kalo J, Singhania S, Chen J, Dietze S, et\u00a0al. Large language models and knowledge graphs: opportunities and challenges. TGDK. 2023;1(1):2:1\u20132:38."},{"key":"1312_CR34","unstructured":"Hoseini S, Theissen-Lipp J, Quix C. Semantic data management in data lakes. CoRR. 2023;abs\/2310.15373."},{"issue":"9","key":"1312_CR35","doi-asserted-by":"publisher","DOI":"10.3390\/computation11090175","volume":"11","author":"NO Dorodnykh","year":"2023","unstructured":"Dorodnykh NO, Yurin AY. Knowledge graph engineering based on semantic annotation of tables. Computation. 2023;11(9):175.","journal-title":"Computation"},{"issue":"4","key":"1312_CR36","doi-asserted-by":"publisher","first-page":"849","DOI":"10.14778\/3636218.3636237","volume":"17","author":"T Cong","year":"2023","unstructured":"Cong T, Hulsebos M, Sun Z, Groth P, Jagadish HV. Observatory: characterizing embeddings of relational tables. Proc VLDB Endow. 2023;17(4):849\u201362.","journal-title":"Proc VLDB Endow"},{"key":"1312_CR37","doi-asserted-by":"crossref","unstructured":"Taha I, Lissandrini M, Simitsis A, Ioannidis YE. A study on efficient indexing for table search in data lakes. In: ICSC. IEEE; 2024. p. 245\u2013252.","DOI":"10.1109\/ICSC59802.2024.00046"},{"key":"1312_CR38","doi-asserted-by":"crossref","unstructured":"Li Y, Li J, Suhara Y, Wang J, Hirota W, Tan W. Deep entity matching: challenges and opportunities. ACM J Data Inf Qual. 2021;13(1):1:1\u20131:17.","DOI":"10.1145\/3431816"},{"key":"1312_CR39","doi-asserted-by":"crossref","unstructured":"Christophides V, Efthymiou V, Palpanas T, Papadakis G, Stefanidis K. An overview of end-to-end entity resolution for big data. ACM Comput Surv. 2021;53(6):127:1\u2013127:42.","DOI":"10.1145\/3418896"},{"issue":"1","key":"1312_CR40","doi-asserted-by":"publisher","first-page":"4","DOI":"10.1109\/TNNLS.2020.2978386","volume":"32","author":"Z Wu","year":"2021","unstructured":"Wu Z, Pan S, Chen F, Long G, Zhang C, Yu PS. A comprehensive survey on graph neural networks. IEEE Trans Neural Netw Learn Syst. 2021;32(1):4\u201324.","journal-title":"IEEE Trans Neural Netw Learn Syst"},{"key":"1312_CR41","doi-asserted-by":"crossref","unstructured":"Ma Y, Cheng G, Liang X, Wang Y, Zhou Y. Heterogeneous graph neural networks based on meta-path. In: ACAI. ACM; 2020. p. 14:1\u201314:4.","DOI":"10.1145\/3446132.3446146"},{"key":"1312_CR42","unstructured":"Lin B, Wang X, Dong Y, Huo C, Ren W, Xu C. Metapaths guided neighbors aggregated network for heterogeneous graph reasoning. CoRR. 2021;abs\/2103.06474."},{"key":"1312_CR43","unstructured":"Wang J, Shen HT, Song J, Ji J. Hashing for similarity search: A Survey. CoRR. 2014;abs\/1408.2927."},{"issue":"4","key":"1312_CR44","doi-asserted-by":"publisher","first-page":"334","DOI":"10.1007\/s007780100057","volume":"10","author":"E Rahm","year":"2001","unstructured":"Rahm E, Bernstein PA. A survey of approaches to automatic schema matching. VLDB J. 2001;10(4):334\u201350.","journal-title":"VLDB J"},{"issue":"9","key":"1312_CR45","doi-asserted-by":"publisher","first-page":"528","DOI":"10.14778\/2002938.2002939","volume":"4","author":"P Venetis","year":"2011","unstructured":"Venetis P, Halevy AY, Madhavan J, Pasca M, Shen W, Wu F, et al. Recovering semantics of tables on the Web. Proc VLDB Endow. 2011;4(9):528\u201338.","journal-title":"Proc VLDB Endow"},{"key":"1312_CR46","doi-asserted-by":"crossref","unstructured":"Dong Y, Takeoka K, Xiao C, Oyamada M. Efficient joinable table discovery in data lakes: a high-dimensional similarity-based approach. In: ICDE. IEEE; 2021. p. 456\u2013467.","DOI":"10.1109\/ICDE51399.2021.00046"},{"issue":"11","key":"1312_CR47","doi-asserted-by":"publisher","first-page":"1835","DOI":"10.14778\/3407790.3407793","volume":"13","author":"D Zhang","year":"2020","unstructured":"Zhang D, Suhara Y, Li J, Hulsebos M, Demiralp \u00c7, Tan W. Sato: contextual semantic type detection in tables. Proc VLDB Endow. 2020;13(11):1835\u201348.","journal-title":"Proc VLDB Endow"},{"key":"1312_CR48","doi-asserted-by":"crossref","unstructured":"Joulin A, Grave E, Bojanowski P, Mikolov T. Bag of tricks for efficient text classification. In: EACL (2). Association for Computational Linguistics; 2017. p. 427\u2013431.","DOI":"10.18653\/v1\/E17-2068"},{"key":"1312_CR49","doi-asserted-by":"crossref","unstructured":"Cappuzzo R, Papotti P, Thirumuruganathan S. Creating embeddings of heterogeneous relational datasets for data integration tasks. In: SIGMOD Conference. ACM; 2020. p. 1335\u20131349.","DOI":"10.1145\/3318464.3389742"},{"key":"1312_CR50","unstructured":"Sanh V, Debut L, Chaumond J, Wolf T. DistilBERT, a distilled version of BERT: smaller, faster, cheaper and lighter. CoRR. 2019;abs\/1910.01108."},{"issue":"8","key":"1312_CR51","doi-asserted-by":"publisher","first-page":"1684","DOI":"10.14778\/3529337.3529353","volume":"15","author":"M Esmailoghli","year":"2022","unstructured":"Esmailoghli M, Quian\u00e9-Ruiz J, Abedjan Z. MATE: multi-attribute table extraction. Proc VLDB Endow. 2022;15(8):1684\u201396.","journal-title":"Proc VLDB Endow"},{"issue":"7","key":"1312_CR52","doi-asserted-by":"publisher","first-page":"1726","DOI":"10.14778\/3587136.3587146","volume":"16","author":"G Fan","year":"2023","unstructured":"Fan G, Wang J, Li Y, Zhang D, Miller RJ. Semantics-aware dataset discovery from data lakes with contextualized column-based representation learning. Proc VLDB Endow. 2023;16(7):1726\u201339.","journal-title":"Proc VLDB Endow"},{"key":"1312_CR53","doi-asserted-by":"crossref","unstructured":"Liu J, Chai C, Luo Y, Lou Y, Feng J, Tang N. Feature augmentation with reinforcement learning. In: ICDE. IEEE; 2022. p. 3360\u20133372.","DOI":"10.1109\/ICDE53745.2022.00317"},{"issue":"9","key":"1312_CR54","doi-asserted-by":"publisher","first-page":"1373","DOI":"10.14778\/3397230.3397235","volume":"13","author":"N Chepurko","year":"2020","unstructured":"Chepurko N, Marcus R, Zgraggen E, Fernandez RC, Kraska T, Karger DR. Arda: automatic relational data augmentation for machine learning. Proc VLDB Endow. 2020;13(9):1373\u201387.","journal-title":"Proc VLDB Endow"},{"key":"1312_CR55","doi-asserted-by":"crossref","unstructured":"Ionescu A, Vasilev K, Buse F, Hai R, Katsifodimos A. AutoFeat: Transitive feature discovery over join paths. In: ICDE. IEEE; 2024. p. 1861\u20131873.","DOI":"10.1109\/ICDE60146.2024.00150"},{"key":"1312_CR56","unstructured":"Liang J, Lei C, Qin X, Zhang J, Katsifodimos A, Faloutsos C, et\u00a0al. FeatNavigator: Automatic feature augmentation on tabular data. CoRR. 2024;abs\/2406.09534."},{"key":"1312_CR57","unstructured":"Usta A, Liu C, Salihoglu S. Analysis of open government datasets from a data design and integration perspective. In: EDBT. OpenProceedings.org; 2024. p. 345\u2013358."},{"issue":"3","key":"1312_CR58","doi-asserted-by":"publisher","first-page":"399","DOI":"10.1007\/s11704-015-5900-5","volume":"10","author":"M Yu","year":"2016","unstructured":"Yu M, Li G, Deng D, Feng J. String similarity search and join: a survey. Front Comput Sci. 2016;10(3):399\u2013417.","journal-title":"Front Comput Sci"},{"key":"1312_CR59","unstructured":"Maynou M, Nadal S, Panadero R, Flores J, Romero O, Queralt A. FREYJA: Efficient join discovery in data lakes. CoRR. 2024;abs\/2412.06637."},{"key":"1312_CR60","doi-asserted-by":"crossref","unstructured":"Hulsebos M, Hu KZ, Bakker MA, Zgraggen E, Satyanarayan A, Kraska T, et\u00a0al. Sherlock: a deep learning approach to semantic data type detection. In: KDD. ACM; 2019. p. 1500\u20131508.","DOI":"10.1145\/3292500.3330993"},{"key":"1312_CR61","doi-asserted-by":"crossref","unstructured":"Chen J, Jim\u00e9nez-Ruiz E, Horrocks I, Sutton C. ColNet: Embedding the semantics of web tables for column type prediction. In: AAAI. AAAI Press; 2019. p. 29\u201336.","DOI":"10.1609\/aaai.v33i01.330129"},{"key":"1312_CR62","doi-asserted-by":"crossref","unstructured":"Bhagavatula CS, Noraset T, Downey D. TabEL: Entity linking in web tables. In: ISWC (1).\u00a0vol. 9366 of Lecture Notes in Comput Sci. Springer; 2015. p. 425\u2013441.","DOI":"10.1007\/978-3-319-25007-6_25"},{"key":"1312_CR63","unstructured":"Hulsebos M, Groth P, Demiralp \u00c7. AdaTyper: Adaptive semantic column type detection. CoRR. 2023;abs\/2311.13806."},{"key":"1312_CR64","doi-asserted-by":"publisher","first-page":"266","DOI":"10.1016\/j.neunet.2023.11.030","volume":"170","author":"X Fu","year":"2024","unstructured":"Fu X, King I. MECCH: metapath context convolution-based heterogeneous graph neural networks. Neural Netw. 2024;170:266\u201375.","journal-title":"Neural Netw"},{"key":"1312_CR65","unstructured":"Kotnis B, Nastase V. Analysis of the impact of negative sampling on link prediction in knowledge graphs. CoRR. 2017;abs\/1708.06816."},{"key":"1312_CR66","doi-asserted-by":"crossref","unstructured":"Yang Z, Ding M, Zhou C, Yang H, Zhou J, Tang J. Understanding negative sampling in graph representation learning. In: KDD. ACM; 2020. p. 1666\u20131676.","DOI":"10.1145\/3394486.3403218"},{"issue":"3","key":"1312_CR67","first-page":"793","volume":"15","author":"J Flores","year":"2024","unstructured":"Flores J, Rabbani K, Nadal S, G\u00f3mez C, Romero O, Jamin E, et al. Incremental schema integration for data wrangling via knowledge graphs. Semantic Web. 2024;15(3):793\u2013830.","journal-title":"Semantic Web"}],"container-title":["Journal of Big Data"],"original-title":[],"language":"en","link":[{"URL":"https:\/\/link.springer.com\/content\/pdf\/10.1186\/s40537-025-01312-5.pdf","content-type":"application\/pdf","content-version":"vor","intended-application":"text-mining"},{"URL":"https:\/\/link.springer.com\/article\/10.1186\/s40537-025-01312-5\/fulltext.html","content-type":"text\/html","content-version":"vor","intended-application":"text-mining"},{"URL":"https:\/\/link.springer.com\/content\/pdf\/10.1186\/s40537-025-01312-5.pdf","content-type":"application\/pdf","content-version":"vor","intended-application":"similarity-checking"}],"deposited":{"date-parts":[[2025,11,28]],"date-time":"2025-11-28T10:58:37Z","timestamp":1764327517000},"score":1,"resource":{"primary":{"URL":"https:\/\/journalofbigdata.springeropen.com\/articles\/10.1186\/s40537-025-01312-5"}},"subtitle":[],"short-title":[],"issued":{"date-parts":[[2025,11,28]]},"references-count":67,"journal-issue":{"issue":"1","published-online":{"date-parts":[[2025,12]]}},"alternative-id":["1312"],"URL":"https:\/\/doi.org\/10.1186\/s40537-025-01312-5","relation":{},"ISSN":["2196-1115"],"issn-type":[{"value":"2196-1115","type":"electronic"}],"subject":[],"published":{"date-parts":[[2025,11,28]]},"assertion":[{"value":"31 October 2024","order":1,"name":"received","label":"Received","group":{"name":"ArticleHistory","label":"Article History"}},{"value":"8 October 2025","order":2,"name":"accepted","label":"Accepted","group":{"name":"ArticleHistory","label":"Article History"}},{"value":"28 November 2025","order":3,"name":"first_online","label":"First Online","group":{"name":"ArticleHistory","label":"Article History"}},{"order":1,"name":"Ethics","group":{"name":"EthicsHeading","label":"Declarations"}},{"value":"Not applicable.","order":2,"name":"Ethics","group":{"name":"EthicsHeading","label":"Ethics approval and consent to participate"}},{"value":"Not applicable.","order":3,"name":"Ethics","group":{"name":"EthicsHeading","label":"Consent for publication"}},{"value":"The authors declare no competing interests.","order":4,"name":"Ethics","group":{"name":"EthicsHeading","label":"Competing interests"}}],"article-number":"262"}}