{"status":"ok","message-type":"work","message-version":"1.0.0","message":{"indexed":{"date-parts":[[2026,5,1]],"date-time":"2026-05-01T17:42:37Z","timestamp":1777657357007,"version":"3.51.4"},"reference-count":37,"publisher":"Association for Computing Machinery (ACM)","issue":"6","content-domain":{"domain":["dl.acm.org"],"crossmark-restriction":true},"short-container-title":["Proc. VLDB Endow."],"published-print":{"date-parts":[[2023,2]]},"abstract":"<jats:p>Understanding the semantics of tabular data is of great importance in various downstream applications, such as schema matching, data cleaning, and data integration. Column semantic type annotation is a critical task in the semantic understanding of tabular data. Despite the fact that various approaches have been proposed, they are challenged by the difficulties of handling wide tables and incorporating complex inter-table context information. Failure to handle wide tables limits the usage of column type annotation approaches, while failure to incorporate inter-table context harms the annotation quality. Existing methods either completely ignore these problems or propose ad-hoc solutions. In this paper, we propose Related tables Enhanced Column semantic type Annotation framework (RECA), which incorporates inter-table context information by finding and aligning schema-similar and topic-relevant tables based on a novel named entity schema. The design of RECA can naturally handle wide tables and incorporate useful inter-table context information to enhance the annotation quality. We conduct extensive experiments on two web table datasets to comprehensively evaluate the performance of RECA. 
Our results show that RECA achieves support-weighted F1 scores of 0.853 and 0.937 with macro average F1 scores of 0.674 and 0.783 on the two datasets respectively, which outperform the state-of-the-art methods.<\/jats:p>","DOI":"10.14778\/3583140.3583149","type":"journal-article","created":{"date-parts":[[2023,4,20]],"date-time":"2023-04-20T16:45:59Z","timestamp":1682009159000},"page":"1319-1331","update-policy":"https:\/\/doi.org\/10.1145\/crossmark-policy","source":"Crossref","is-referenced-by-count":15,"title":["RECA: Related Tables Enhanced Column Semantic Type Annotation Framework"],"prefix":"10.14778","volume":"16","author":[{"given":"Yushi","family":"Sun","sequence":"first","affiliation":[{"name":"HKUST, Hong Kong, China"}],"role":[{"role":"author","vocabulary":"crossref"}]},{"given":"Hao","family":"Xin","sequence":"additional","affiliation":[{"name":"HKUST, Hong Kong, China"}],"role":[{"role":"author","vocabulary":"crossref"}]},{"given":"Lei","family":"Chen","sequence":"additional","affiliation":[{"name":"HKUST, Hong Kong, China and HKUST(GZ), Guangzhou, China"}],"role":[{"role":"author","vocabulary":"crossref"}]}],"member":"320","published-online":{"date-parts":[[2023,4,20]]},"reference":[{"key":"e_1_2_1_1_1","unstructured":"2019. Sherlock. Retrieved Feb 1 2023 from https:\/\/github.com\/mitmedialab\/sherlock-project"},{"key":"e_1_2_1_2_1","unstructured":"2020. TaBERT. Retrieved Feb 1 2023 from https:\/\/github.com\/facebookresearch\/TaBERT"},{"key":"e_1_2_1_3_1","unstructured":"2021. TABBIE. Retrieved Feb 1 2023 from https:\/\/github.com\/SFIG611\/tabbie"},{"key":"e_1_2_1_4_1","unstructured":"2022. DODUO. 
Retrieved Feb 1 2023 from https:\/\/github.com\/megagonlabs\/doduo"},{"key":"e_1_2_1_5_1","doi-asserted-by":"publisher","DOI":"10.1145\/2746539"},{"key":"e_1_2_1_6_1","doi-asserted-by":"publisher","DOI":"10.1007\/978-3-030-62466-8_21"},{"key":"e_1_2_1_7_1","volume-title":"Proceedings of the Semantic Web Challenge on Tabular Data to Knowledge Graph Matching 3103","author":"Cutrona Vincenzo","year":"2022","unstructured":"Vincenzo Cutrona, Jiaoyan Chen, Vasilis Efthymiou, Oktie Hassanzadeh, Ernesto Jim\u00e9nez-Ruiz, Juan Sequeda, Kavitha Srinivas, Nora Abdelmageed, Madelon Hulsebos, Daniela Oliveira, et al. 2022. Results of SemTab 2021. Proceedings of the Semantic Web Challenge on Tabular Data to Knowledge Graph Matching 3103 (2022), 1--12."},{"key":"e_1_2_1_8_1","doi-asserted-by":"publisher","DOI":"10.14778\/3430915.3430921"},{"key":"e_1_2_1_9_1","doi-asserted-by":"publisher","DOI":"10.18653\/v1\/N19-1423"},{"key":"e_1_2_1_10_1","doi-asserted-by":"publisher","DOI":"10.1145\/3308558.3313629"},{"key":"e_1_2_1_11_1","doi-asserted-by":"publisher","DOI":"10.18653\/v1\/N19-1133"},{"key":"e_1_2_1_12_1","doi-asserted-by":"publisher","DOI":"10.18653\/v1\/2020.aclmain.398"},{"key":"e_1_2_1_13_1","volume-title":"spaCy 2: Natural language understanding with Bloom embeddings, convolutional neural networks and incremental parsing. To appear 7, 1","author":"Honnibal Matthew","year":"2017","unstructured":"Matthew Honnibal and Ines Montani. 2017. 
spaCy 2: Natural language understanding with Bloom embeddings, convolutional neural networks and incremental parsing. To appear 7, 1 (2017), 411--420."},{"key":"e_1_2_1_14_1","doi-asserted-by":"publisher","DOI":"10.1145\/3290605.3300892"},{"key":"e_1_2_1_15_1","doi-asserted-by":"publisher","DOI":"10.1145\/3292500.3330993"},{"key":"e_1_2_1_16_1","doi-asserted-by":"publisher","DOI":"10.18653\/v1\/2021.naacl-main.270"},{"key":"e_1_2_1_17_1","volume-title":"The distribution of the flora in the alpine zone. 1. New phytologist 11, 2","author":"Jaccard Paul","year":"1912","unstructured":"Paul Jaccard. 1912. The distribution of the flora in the alpine zone. 1. New phytologist 11, 2 (1912), 37--50."},{"key":"e_1_2_1_18_1","volume-title":"CEUR Workshop Proceedings","volume":"2775","author":"Jim\u00e9nez-Ruiz Ernesto","year":"2020","unstructured":"Ernesto Jim\u00e9nez-Ruiz, Oktie Hassanzadeh, Vasilis Efthymiou, Jiaoyan Chen, Kavitha Srinivas, and Vincenzo Cutrona. 2020. Results of semtab 2020. In CEUR Workshop Proceedings, Vol. 2775. 1--8."},{"key":"e_1_2_1_19_1","doi-asserted-by":"publisher","DOI":"10.1145\/1978942.1979444"},{"key":"e_1_2_1_20_1","volume-title":"Adam: A method for stochastic optimization. arXiv preprint","author":"Kingma Diederik P","year":"2014","unstructured":"Diederik P Kingma and Jimmy Ba. 2014. Adam: A method for stochastic optimization. arXiv preprint (2014), 5. arXiv:1412.6980"},{"key":"e_1_2_1_21_1","volume-title":"Generating Table Vector Representations. 
arXiv preprint","author":"Koleva Aneta","year":"2021","unstructured":"Aneta Koleva, Martin Ringsquandl, Mitchell Joblin, and Volker Tresp. 2021. Generating Table Vector Representations. arXiv preprint (2021), 5. arXiv:2110.15132"},{"key":"e_1_2_1_22_1","volume-title":"Testing statistical hypotheses","author":"Lehmann Erich Leo","unstructured":"Erich Leo Lehmann, Joseph P Romano, and George Casella. 2005. Testing statistical hypotheses. Vol. 3. Springer, New York, NY, USA."},{"key":"e_1_2_1_23_1","unstructured":"Frederic P Miller Agnes F Vandome and John McBrewster. 2009. Levenshtein distance: Information theory computer science string (computer science) string metric damerau? Levenshtein distance spell checker hamming distance."},{"key":"e_1_2_1_24_1","doi-asserted-by":"publisher","DOI":"10.14778\/3229863.3240491"},{"key":"e_1_2_1_25_1","volume-title":"Scikit-learn: Machine learning in Python. the Journal of machine Learning research 12","author":"Pedregosa Fabian","year":"2011","unstructured":"
Fabian Pedregosa, Ga\u00ebl Varoquaux, Alexandre Gramfort, Vincent Michel, Bertrand Thirion, Olivier Grisel, Mathieu Blondel, Peter Prettenhofer, Ron Weiss, Vincent Dubourg, et al. 2011. Scikit-learn: Machine learning in Python. the Journal of machine Learning research 12 (2011), 2825--2830."},{"key":"e_1_2_1_26_1","doi-asserted-by":"publisher","DOI":"10.1007\/978-3-319-46523-4_27"},{"key":"e_1_2_1_27_1","volume-title":"A survey of approaches to automatic schema matching. the VLDB Journal 10, 4","author":"Rahm Erhard","year":"2001","unstructured":"Erhard Rahm and Philip A Bernstein. 2001. A survey of approaches to automatic schema matching. the VLDB Journal 10, 4 (2001), 334--350."},{"key":"e_1_2_1_28_1","volume-title":"VLDB","author":"Raman Vijayshankar","unstructured":"Vijayshankar Raman and Joseph M Hellerstein. 2001. Potter's wheel: An interactive data cleaning system. In VLDB, Vol. 1. Morgan Kaufmann, San Francisco, CA, USA, 381--390."},{"key":"e_1_2_1_29_1","doi-asserted-by":"publisher","DOI":"10.1007\/978-3-319-18818-8_25"},{"key":"e_1_2_1_30_1","doi-asserted-by":"publisher","DOI":"10.1145\/3514221.3517906"},{"key":"e_1_2_1_31_1","unstructured":"Ashish Vaswani Noam Shazeer Niki Parmar Jakob Uszkoreit Llion Jones Aidan N Gomez \u0141ukasz Kaiser and Illia Polosukhin. 2017. Attention is all you need. In Advances in neural information processing systems. 
5998--6008."},{"key":"e_1_2_1_32_1","doi-asserted-by":"publisher","DOI":"10.14778\/2002938.2002939"},{"key":"e_1_2_1_33_1","doi-asserted-by":"publisher","DOI":"10.1145\/3442381.3450090"},{"key":"e_1_2_1_34_1","unstructured":"Thomas Wolf Lysandre Debut Victor Sanh Julien Chaumond Clement Delangue Anthony Moi Pierric Cistac Tim Rault R\u00e9mi Louf Morgan Funtowicz et al. 2019. Huggingface's transformers: State-of-the-art natural language processing. arXiv preprint (2019) 1. arXiv:1910.03771"},{"key":"e_1_2_1_35_1","doi-asserted-by":"publisher","DOI":"10.18653\/v1\/2020.acl-main.745"},{"key":"e_1_2_1_36_1","doi-asserted-by":"publisher","DOI":"10.14778\/3407790.3407793"},{"key":"e_1_2_1_37_1","doi-asserted-by":"publisher","DOI":"10.1145\/3459637.3482197"}],"container-title":["Proceedings of the VLDB 
Endowment"],"original-title":[],"language":"en","link":[{"URL":"https:\/\/dl.acm.org\/doi\/pdf\/10.14778\/3583140.3583149","content-type":"unspecified","content-version":"vor","intended-application":"similarity-checking"}],"deposited":{"date-parts":[[2023,4,20]],"date-time":"2023-04-20T16:53:18Z","timestamp":1682009598000},"score":1,"resource":{"primary":{"URL":"https:\/\/dl.acm.org\/doi\/10.14778\/3583140.3583149"}},"subtitle":[],"short-title":[],"issued":{"date-parts":[[2023,2]]},"references-count":37,"journal-issue":{"issue":"6","published-print":{"date-parts":[[2023,2]]}},"alternative-id":["10.14778\/3583140.3583149"],"URL":"https:\/\/doi.org\/10.14778\/3583140.3583149","relation":{},"ISSN":["2150-8097"],"issn-type":[{"value":"2150-8097","type":"print"}],"subject":[],"published":{"date-parts":[[2023,2]]},"assertion":[{"value":"2023-04-20","order":2,"name":"published","label":"Published","group":{"name":"publication_history","label":"Publication History"}}]}}