{"status":"ok","message-type":"work","message-version":"1.0.0","message":{"indexed":{"date-parts":[[2026,4,23]],"date-time":"2026-04-23T22:57:30Z","timestamp":1776985050288,"version":"3.51.4"},"reference-count":66,"publisher":"Association for Computing Machinery (ACM)","issue":"10","content-domain":{"domain":["dl.acm.org"],"crossmark-restriction":true},"short-container-title":["Proc. VLDB Endow."],"published-print":{"date-parts":[[2023,6]]},"abstract":"<jats:p>Due to the usefulness in data enrichment for data analysis tasks, joinable table discovery has become an important operation in data lake management. Existing approaches target equi-joins, the most common way of combining tables for creating a unified view, or semantic joins, which tolerate misspellings and different formats to deliver more join results. They are either exact solutions whose running time is linear in the sizes of query column and target table repository, or approximate solutions lacking precision. In this paper, we propose DeepJoin, a deep learning model for accurate and efficient joinable table discovery. Our solution is an embedding-based retrieval, which employs a pre-trained language model (PLM) and is designed as one framework serving both equi- and semantic (with a similarity condition on word embeddings) joins for textual attributes with fairly small cardinalities. We propose a set of contextualization options to transform column contents to a text sequence. The PLM reads the sequence and is fine-tuned to embed columns to vectors such that columns are expected to be joinable if they are close to each other in the vector space. Since the output of the PLM is fixed in length, the subsequent search procedure becomes independent of the column size. With a state-of-the-art approximate nearest neighbor search algorithm, the search time is sublinear in the repository size. To train the model, we devise the techniques for preparing training data as well as data augmentation. 
The experiments on real datasets demonstrate that by training on a small subset of a corpus, DeepJoin generalizes to large datasets and its precision consistently outperforms other approximate solutions'. DeepJoin is even more accurate than an exact solution to semantic joins when evaluated with labels from experts. Moreover, when equipped with a GPU, DeepJoin is up to two orders of magnitude faster than existing solutions.<\/jats:p>","DOI":"10.14778\/3603581.3603587","type":"journal-article","created":{"date-parts":[[2023,8,8]],"date-time":"2023-08-08T19:06:48Z","timestamp":1691521608000},"page":"2458-2470","update-policy":"https:\/\/doi.org\/10.1145\/crossmark-policy","source":"Crossref","is-referenced-by-count":42,"title":["DeepJoin: Joinable Table Discovery with Pre-Trained Language Models"],"prefix":"10.14778","volume":"16","author":[{"given":"Yuyang","family":"Dong","sequence":"first","affiliation":[{"name":"NEC Corporation"}],"role":[{"role":"author","vocabulary":"crossref"}]},{"given":"Chuan","family":"Xiao","sequence":"additional","affiliation":[{"name":"Osaka University and Nagoya University"}],"role":[{"role":"author","vocabulary":"crossref"}]},{"given":"Takuma","family":"Nozawa","sequence":"additional","affiliation":[{"name":"NEC Corporation"}],"role":[{"role":"author","vocabulary":"crossref"}]},{"given":"Masafumi","family":"Enomoto","sequence":"additional","affiliation":[{"name":"NEC Corporation"}],"role":[{"role":"author","vocabulary":"crossref"}]},{"given":"Masafumi","family":"Oyamada","sequence":"additional","affiliation":[{"name":"NEC Corporation"}],"role":[{"role":"author","vocabulary":"crossref"}]}],"member":"320","published-online":{"date-parts":[[2023,8,8]]},"reference":[{"key":"e_1_2_1_1_1","volume-title":"https:\/\/huggingface.co\/docs\/transformers\/index","author":"Hugging","year":"2022","unstructured":"Hugging face transformers. https:\/\/huggingface.co\/docs\/transformers\/index , 2022 . Hugging face transformers. 
https:\/\/huggingface.co\/docs\/transformers\/index, 2022."},{"key":"e_1_2_1_2_1","series-title":"Lecture Notes in Computer Science","doi-asserted-by":"crossref","first-page":"425","DOI":"10.1007\/978-3-319-25007-6_25","volume-title":"ISWC","author":"Bhagavatula C. S.","year":"2015","unstructured":"C. S. Bhagavatula , T. Noraset , and D. Downey . Tabel: Entity linking in web tables . In ISWC , volume 9366 of Lecture Notes in Computer Science , pages 425 -- 441 . Springer , 2015 . C. S. Bhagavatula, T. Noraset, and D. Downey. Tabel: Entity linking in web tables. In ISWC, volume 9366 of Lecture Notes in Computer Science, pages 425--441. Springer, 2015."},{"key":"e_1_2_1_3_1","unstructured":"C. S. Bhagavatula T. Noraset and D. Downey. Wikitables. http:\/\/websail-fe.cs.northwestern.edu\/TabEL\/ 2015.  C. S. Bhagavatula T. Noraset and D. Downey. Wikitables. http:\/\/websail-fe.cs.northwestern.edu\/TabEL\/ 2015."},{"key":"e_1_2_1_4_1","doi-asserted-by":"publisher","DOI":"10.14778\/3457390.3457403"},{"key":"e_1_2_1_5_1","doi-asserted-by":"publisher","DOI":"10.1109\/ICDE48307.2020.00067"},{"key":"e_1_2_1_6_1","first-page":"21","volume-title":"SEQUENCES","author":"Broder A. Z.","year":"1997","unstructured":"A. Z. Broder . On the resemblance and containment of documents . In SEQUENCES , pages 21 -- 29 . IEEE, 1997 . A. Z. Broder. On the resemblance and containment of documents. In SEQUENCES, pages 21--29. 
IEEE, 1997."},{"key":"e_1_2_1_7_1","doi-asserted-by":"publisher","DOI":"10.1109\/ICDE.2006.9"},{"key":"e_1_2_1_8_1","doi-asserted-by":"publisher","DOI":"10.14778\/3115404.3115411"},{"key":"e_1_2_1_9_1","doi-asserted-by":"publisher","DOI":"10.1145\/3397271.3401044"},{"key":"e_1_2_1_10_1","doi-asserted-by":"publisher","DOI":"10.14778\/3397230.3397235"},{"key":"e_1_2_1_11_1","doi-asserted-by":"publisher","DOI":"10.1145\/2959100.2959190"},{"key":"e_1_2_1_12_1","doi-asserted-by":"publisher","DOI":"10.14778\/3115404.3115413"},{"key":"e_1_2_1_13_1","doi-asserted-by":"publisher","DOI":"10.14778\/3430915.3430921"},{"key":"e_1_2_1_14_1","first-page":"4171","volume-title":"NAACL-HLT","author":"Devlin J.","year":"2019","unstructured":"J. Devlin , M. Chang , K. Lee , and K. Toutanova . BERT: pre-training of deep bidirectional transformers for language understanding . In NAACL-HLT , pages 4171 -- 4186 . ACL, 2019 . J. Devlin, M. Chang, K. Lee, and K. Toutanova. BERT: pre-training of deep bidirectional transformers for language understanding. In NAACL-HLT, pages 4171--4186. ACL, 2019."},{"key":"e_1_2_1_15_1","volume-title":"Morgan Kaufmann","author":"Doan A.","year":"2012","unstructured":"A. Doan , A. Y. Halevy , and Z. G. Ives . Principles of Data Integration . Morgan Kaufmann , 2012 . A. Doan, A. Y. Halevy, and Z. G. Ives. Principles of Data Integration. Morgan Kaufmann, 2012."},{"key":"e_1_2_1_16_1","first-page":"3267","volume-title":"SIGIR","author":"Dong Y.","year":"2022","unstructured":"Y. Dong and M. Oyamada . Table enrichment system for machine learning . In SIGIR , pages 3267 -- 3271 . ACM, 2022 . Y. Dong and M. Oyamada. Table enrichment system for machine learning. In SIGIR, pages 3267--3271. ACM, 2022."},{"key":"e_1_2_1_17_1","doi-asserted-by":"publisher","DOI":"10.1109\/ICDE51399.2021.00046"},{"key":"e_1_2_1_18_1","volume-title":"Faiss: Facebook ai similarity search. 
https:\/\/github.com\/facebookresearch\/faiss\/wiki\/Faiss-indexes","author":"AI","year":"2022","unstructured":"facebook AI research. Faiss: Facebook ai similarity search. https:\/\/github.com\/facebookresearch\/faiss\/wiki\/Faiss-indexes , 2022 . facebook AI research. Faiss: Facebook ai similarity search. https:\/\/github.com\/facebookresearch\/faiss\/wiki\/Faiss-indexes, 2022."},{"key":"e_1_2_1_19_1","volume-title":"fastText: Library for efficient text classification and representation learning. https:\/\/fasttext.cc\/","author":"Research Lab Facebook AI","year":"2015","unstructured":"Facebook AI Research Lab . fastText: Library for efficient text classification and representation learning. https:\/\/fasttext.cc\/ , 2015 . Facebook AI Research Lab. fastText: Library for efficient text classification and representation learning. https:\/\/fasttext.cc\/, 2015."},{"key":"e_1_2_1_20_1","volume-title":"https:\/\/github.com\/joke2k\/faker","author":"Faraglia D.","year":"2023","unstructured":"D. Faraglia . Faker. https:\/\/github.com\/joke2k\/faker , 2023 . D. Faraglia. Faker. https:\/\/github.com\/joke2k\/faker, 2023."},{"key":"e_1_2_1_21_1","first-page":"690","volume-title":"EDBT","author":"Flores J.","year":"2021","unstructured":"J. Flores , S. Nadal , and O. Romero . Effective and scalable data discovery with nextiajd . In EDBT , pages 690 -- 693 . OpenProceedings.org , 2021 . J. Flores, S. Nadal, and O. Romero. Effective and scalable data discovery with nextiajd. In EDBT, pages 690--693. OpenProceedings.org, 2021."},{"key":"e_1_2_1_22_1","first-page":"230","volume-title":"ICDM","author":"Ghasemi-Gol M.","year":"2019","unstructured":"M. Ghasemi-Gol , J. Pujara , and P. A. Szekely . Tabular cell classification using pre-trained cell embeddings . In ICDM , pages 230 -- 239 . IEEE, 2019 . M. Ghasemi-Gol, J. Pujara, and P. A. Szekely. Tabular cell classification using pre-trained cell embeddings. In ICDM, pages 230--239. 
IEEE, 2019."},{"key":"e_1_2_1_23_1","doi-asserted-by":"publisher","DOI":"10.14778\/2824032.2824036"},{"issue":"11","key":"e_1_2_1_24_1","first-page":"2368","article-title":"Auto-transform: Learning-to-transform by patterns","volume":"13","author":"He Y.","year":"2020","unstructured":"Y. He , Z. Jin , and S. Chaudhuri . Auto-transform: Learning-to-transform by patterns . PVLDB , 13 ( 11 ): 2368 -- 2381 , 2020 . Y. He, Z. Jin, and S. Chaudhuri. Auto-transform: Learning-to-transform by patterns. PVLDB, 13(11):2368--2381, 2020.","journal-title":"PVLDB"},{"key":"e_1_2_1_25_1","volume-title":"Efficient natural language response suggestion for smart reply. CoRR, abs\/1705.00652","author":"Henderson M. L.","year":"2017","unstructured":"M. L. Henderson , R. Al-Rfou , B. Strope , Y. Sung , L. Luk\u00e1cs , R. Guo , S. Kumar , B. Miklos , and R. Kurzweil . Efficient natural language response suggestion for smart reply. CoRR, abs\/1705.00652 , 2017 . M. L. Henderson, R. Al-Rfou, B. Strope, Y. Sung, L. Luk\u00e1cs, R. Guo, S. Kumar, B. Miklos, and R. Kurzweil. Efficient natural language response suggestion for smart reply. CoRR, abs\/1705.00652, 2017."},{"key":"e_1_2_1_26_1","first-page":"4320","volume-title":"ACL","author":"Herzig J.","year":"2020","unstructured":"J. Herzig , P. K. Nowak , T. M\u00fcller , F. Piccinno , and J. M. Eisenschlos . Tapas: Weakly supervised table parsing via pre-training . In ACL , pages 4320 -- 4333 . ACL, 2020 . J. Herzig, P. K. Nowak, T. M\u00fcller, F. Piccinno, and J. M. Eisenschlos. Tapas: Weakly supervised table parsing via pre-training. In ACL, pages 4320--4333. ACL, 2020."},{"key":"e_1_2_1_27_1","first-page":"2553","volume-title":"KDD","author":"Huang J.","year":"2020","unstructured":"J. Huang , A. Sharma , S. Sun , L. Xia , D. Zhang , P. Pronin , J. Padmanabhan , G. Ottaviano , and L. Yang . Embedding-based retrieval in facebook search . In KDD , pages 2553 -- 2561 . ACM, 2020 . J. Huang, A. Sharma, S. Sun, L. Xia, D. Zhang, P. Pronin, J. 
Padmanabhan, G. Ottaviano, and L. Yang. Embedding-based retrieval in facebook search. In KDD, pages 2553--2561. ACM, 2020."},{"key":"e_1_2_1_28_1","first-page":"1500","volume-title":"KDD","author":"Hulsebos M.","year":"2019","unstructured":"M. Hulsebos , K. Z. Hu , M. A. Bakker , E. Zgraggen , A. Satyanarayan , T. Kraska , \u00c7. Demiralp, and C. A. Hidalgo . Sherlock: A deep learning approach to semantic data type detection . In KDD , pages 1500 -- 1508 . ACM, 2019 . M. Hulsebos, K. Z. Hu, M. A. Bakker, E. Zgraggen, A. Satyanarayan, T. Kraska, \u00c7. Demiralp, and C. A. Hidalgo. Sherlock: A deep learning approach to semantic data type detection. In KDD, pages 1500--1508. ACM, 2019."},{"key":"e_1_2_1_29_1","first-page":"3446","volume-title":"NAACL-HLT","author":"Iida H.","year":"2021","unstructured":"H. Iida , D. Thai , V. Manjunatha , and M. Iyyer . TABBIE: pretrained representations of tabular data . In NAACL-HLT , pages 3446 -- 3456 . ACL, 2021 . H. Iida, D. Thai, V. Manjunatha, and M. Iyyer. TABBIE: pretrained representations of tabular data. In NAACL-HLT, pages 3446--3456. ACL, 2021."},{"issue":"7","key":"e_1_2_1_30_1","doi-asserted-by":"crossref","first-page":"184","DOI":"10.1108\/eb051463","article-title":"Estimating the recall performance of web search engines","volume":"49","author":"Peter C.","year":"1997","unstructured":"C. S. J. and W. Peter . Estimating the recall performance of web search engines . Aslib Proceedings , 49 ( 7 ): 184 -- 189 , Jan 1997 . C. S. J. and W. Peter. Estimating the recall performance of web search engines. Aslib Proceedings, 49(7):184--189, Jan 1997.","journal-title":"Aslib Proceedings"},{"key":"e_1_2_1_31_1","doi-asserted-by":"publisher","DOI":"10.1109\/TPAMI.2010.57"},{"key":"e_1_2_1_32_1","series-title":"Findings of ACL","first-page":"4163","volume-title":"EMNLP (Findings)","author":"Jiao X.","year":"2020","unstructured":"X. Jiao , Y. Yin , L. Shang , X. Jiang , X. Chen , L. Li , F. Wang , and Q. Liu . 
Tinybert: Distilling BERT for natural language understanding . In EMNLP (Findings) , volume EMNLP 2020 of Findings of ACL , pages 4163 -- 4174 . ACL , 2020 . X. Jiao, Y. Yin, L. Shang, X. Jiang, X. Chen, L. Li, F. Wang, and Q. Liu. Tinybert: Distilling BERT for natural language understanding. In EMNLP (Findings), volume EMNLP 2020 of Findings of ACL, pages 4163--4174. ACL, 2020."},{"key":"e_1_2_1_33_1","first-page":"6769","volume-title":"EMNLP","author":"Karpukhin V.","year":"2020","unstructured":"V. Karpukhin , B. Oguz , S. Min , P. S. H. Lewis , L. Wu , S. Edunov , D. Chen , and W. Yih . Dense passage retrieval for open-domain question answering . In EMNLP , pages 6769 -- 6781 . ACL, 2020 . V. Karpukhin, B. Oguz, S. Min, P. S. H. Lewis, L. Wu, S. Edunov, D. Chen, and W. Yih. Dense passage retrieval for open-domain question answering. In EMNLP, pages 6769--6781. ACL, 2020."},{"key":"e_1_2_1_34_1","doi-asserted-by":"publisher","DOI":"10.14778\/3574245.3574274"},{"key":"e_1_2_1_35_1","first-page":"2284","volume-title":"NAACL-HLT","author":"Koncel-Kedziorski R.","year":"2019","unstructured":"R. Koncel-Kedziorski , D. Bekal , Y. Luan , M. Lapata , and H. Hajishirzi . Text generation from knowledge graphs with graph transformers . In NAACL-HLT , pages 2284 -- 2293 . ACL, 2019 . R. Koncel-Kedziorski, D. Bekal, Y. Luan, M. Lapata, and H. Hajishirzi. Text generation from knowledge graphs with graph transformers. In NAACL-HLT, pages 2284--2293. ACL, 2019."},{"key":"e_1_2_1_36_1","first-page":"468","volume-title":"ICDE","author":"Koutras C.","year":"2021","unstructured":"C. Koutras , G. Siachamis , A. Ionescu , K. Psarakis , J. Brons , M. Fragkoulis , C. Lofi , A. Bonifati , and A. Katsifodimos . Valentine: Evaluating matching techniques for dataset discovery . In ICDE , pages 468 -- 479 . IEEE, 2021 . C. Koutras, G. Siachamis, A. Ionescu, K. Psarakis, J. Brons, M. Fragkoulis, C. Lofi, A. Bonifati, and A. Katsifodimos. 
Valentine: Evaluating matching techniques for dataset discovery. In ICDE, pages 468--479. IEEE, 2021."},{"key":"e_1_2_1_37_1","doi-asserted-by":"publisher","DOI":"10.14778\/3421424.3421431"},{"key":"e_1_2_1_38_1","doi-asserted-by":"publisher","DOI":"10.1109\/TPAMI.2018.2889473"},{"key":"e_1_2_1_39_1","doi-asserted-by":"publisher","DOI":"10.5555\/1394399"},{"key":"e_1_2_1_40_1","first-page":"3111","volume-title":"NIPS","author":"Mikolov T.","year":"2013","unstructured":"T. Mikolov , I. Sutskever , K. Chen , G. S. Corrado , and J. Dean . Distributed representations of words and phrases and their compositionality . In NIPS , pages 3111 -- 3119 , 2013 . T. Mikolov, I. Sutskever, K. Chen, G. S. Corrado, and J. Dean. Distributed representations of words and phrases and their compositionality. In NIPS, pages 3111--3119, 2013."},{"key":"e_1_2_1_41_1","doi-asserted-by":"publisher","DOI":"10.14778\/3574245.3574258"},{"key":"e_1_2_1_42_1","doi-asserted-by":"publisher","DOI":"10.1145\/3318464.3380605"},{"key":"e_1_2_1_43_1","doi-asserted-by":"publisher","DOI":"10.14778\/3192965.3192973"},{"key":"e_1_2_1_44_1","volume-title":"https:\/\/www.sbert.net\/docs\/package_reference\/losses.html","author":"Reimers Nils","year":"2019","unstructured":"Nils Reimers . Sentencetransformers.losses. https:\/\/www.sbert.net\/docs\/package_reference\/losses.html , 2019 . Nils Reimers. Sentencetransformers.losses. https:\/\/www.sbert.net\/docs\/package_reference\/losses.html, 2019."},{"key":"e_1_2_1_45_1","first-page":"4062","volume-title":"KDD","author":"Qin J.","year":"2021","unstructured":"J. Qin , W. Wang , C. Xiao , Y. Zhang , and Y. Wang . High-dimensional similarity query processing for data science . In KDD , pages 4062 -- 4063 . ACM, 2021 . J. Qin, W. Wang, C. Xiao, Y. Zhang, and Y. Wang. High-dimensional similarity query processing for data science. In KDD, pages 4062--4063. 
ACM, 2021."},{"key":"e_1_2_1_46_1","volume-title":"https:\/\/www.sbert.net\/","author":"Reimers N.","year":"2022","unstructured":"N. Reimers . Sentence bert. https:\/\/www.sbert.net\/ , 2022 . N. Reimers. Sentence bert. https:\/\/www.sbert.net\/, 2022."},{"key":"e_1_2_1_47_1","doi-asserted-by":"publisher","DOI":"10.18653\/v1\/D19-1410"},{"key":"e_1_2_1_48_1","unstructured":"D. Ritze O. Lehmberg R. Meusel C. Bizer and S. Zope. WDC web table corpus. http:\/\/webdatacommons.org\/webtables\/2015\/downloadInstructions.html 2015.  D. Ritze O. Lehmberg R. Meusel C. Bizer and S. Zope. WDC web table corpus. http:\/\/webdatacommons.org\/webtables\/2015\/downloadInstructions.html 2015."},{"key":"e_1_2_1_49_1","volume-title":"Distilbert, a distilled version of BERT: smaller, faster, cheaper and lighter. CoRR, abs\/1910.01108","author":"Sanh V.","year":"2019","unstructured":"V. Sanh , L. Debut , J. Chaumond , and T. Wolf . Distilbert, a distilled version of BERT: smaller, faster, cheaper and lighter. CoRR, abs\/1910.01108 , 2019 . V. Sanh, L. Debut, J. Chaumond, and T. Wolf. Distilbert, a distilled version of BERT: smaller, faster, cheaper and lighter. CoRR, abs\/1910.01108, 2019."},{"key":"e_1_2_1_50_1","first-page":"1678","volume-title":"SIGMOD","author":"Song J.","year":"2021","unstructured":"J. Song and Y. He . Auto-validate: Unsupervised data validation using data-domain patterns inferred from data lakes . In SIGMOD , pages 1678 -- 1691 . ACM, 2021 . J. Song and Y. He. Auto-validate: Unsupervised data validation using data-domain patterns inferred from data lakes. In SIGMOD, pages 1678--1691. ACM, 2021."},{"key":"e_1_2_1_51_1","volume-title":"NeurIPS","author":"Song K.","year":"2020","unstructured":"K. Song , X. Tan , T. Qin , J. Lu , and T. Liu . Mpnet: Masked and permuted pre-training for language understanding . In NeurIPS , 2020 . K. Song, X. Tan, T. Qin, J. Lu, and T. Liu. Mpnet: Masked and permuted pre-training for language understanding. 
In NeurIPS, 2020."},{"key":"e_1_2_1_52_1","first-page":"1493","volume-title":"SIGMOD","author":"Suhara Y.","year":"2022","unstructured":"Y. Suhara , J. Li , Y. Li , D. Zhang , \u00c7. Demiralp, C. Chen , and W. Tan . Annotating columns with pre-trained language models . In SIGMOD , pages 1493 -- 1503 . ACM, 2022 . Y. Suhara, J. Li, Y. Li, D. Zhang, \u00c7. Demiralp, C. Chen, and W. Tan. Annotating columns with pre-trained language models. In SIGMOD, pages 1493--1503. ACM, 2022."},{"key":"e_1_2_1_53_1","doi-asserted-by":"publisher","DOI":"10.14778\/3494124.3494149"},{"key":"e_1_2_1_54_1","doi-asserted-by":"publisher","DOI":"10.14778\/3457390.3457391"},{"key":"e_1_2_1_55_1","first-page":"5998","volume-title":"NIPS","author":"Vaswani A.","year":"2017","unstructured":"A. Vaswani , N. Shazeer , N. Parmar , J. Uszkoreit , L. Jones , A. N. Gomez , L. Kaiser , and I. Polosukhin . Attention is all you need . In NIPS , pages 5998 -- 6008 , 2017 . A. Vaswani, N. Shazeer, N. Parmar, J. Uszkoreit, L. Jones, A. N. Gomez, L. Kaiser, and I. Polosukhin. Attention is all you need. In NIPS, pages 5998--6008, 2017."},{"key":"e_1_2_1_56_1","first-page":"1472","volume-title":"SIGIR","author":"Wang F.","year":"2021","unstructured":"F. Wang , K. Sun , M. Chen , J. Pujara , and P. A. Szekely . Retrieving complex tables with multi-granular graph representation learning . In SIGIR , pages 1472 -- 1482 . ACM, 2021 . F. Wang, K. Sun, M. Chen, J. Pujara, and P. A. Szekely. Retrieving complex tables with multi-granular graph representation learning. In SIGIR, pages 1472--1482. ACM, 2021."},{"key":"e_1_2_1_57_1","first-page":"1780","volume-title":"KDD","author":"Wang Z.","year":"2021","unstructured":"Z. Wang , H. Dong , R. Jia , J. Li , Z. Fu , S. Han , and D. Zhang . TUTA: tree-based transformers for generally structured table pre-training . In KDD , pages 1780 -- 1790 . ACM, 2021 . Z. Wang, H. Dong, R. Jia, J. Li, Z. Fu, S. Han, and D. Zhang. 
TUTA: tree-based transformers for generally structured table pre-training. In KDD, pages 1780--1790. ACM, 2021."},{"key":"e_1_2_1_58_1","volume-title":"Chain of thought prompting elicits reasoning in large language models. CoRR, abs\/2201.11903","author":"Wei J.","year":"2022","unstructured":"J. Wei , X. Wang , D. Schuurmans , M. Bosma , E. H. Chi , Q. Le , and D. Zhou . Chain of thought prompting elicits reasoning in large language models. CoRR, abs\/2201.11903 , 2022 . J. Wei, X. Wang, D. Schuurmans, M. Bosma, E. H. Chi, Q. Le, and D. Zhou. Chain of thought prompting elicits reasoning in large language models. CoRR, abs\/2201.11903, 2022."},{"key":"e_1_2_1_59_1","doi-asserted-by":"publisher","DOI":"10.1145\/2000824.2000825"},{"key":"e_1_2_1_60_1","first-page":"8413","volume-title":"ACL","author":"Yin P.","year":"2020","unstructured":"P. Yin , G. Neubig , W. Yih , and S. Riedel . Tabert: Pretraining for joint understanding of textual and tabular data . In ACL , pages 8413 -- 8426 . ACL, 2020 . P. Yin, G. Neubig, W. Yih, and S. Riedel. Tabert: Pretraining for joint understanding of textual and tabular data. In ACL, pages 8413--8426. ACL, 2020."},{"key":"e_1_2_1_61_1","doi-asserted-by":"publisher","DOI":"10.14778\/3407790.3407793"},{"key":"e_1_2_1_62_1","first-page":"1029","volume-title":"SIGIR","author":"Zhang L.","year":"2019","unstructured":"L. Zhang , S. Zhang , and K. Balog . Table2vec: Neural word and entity embeddings for table population and retrieval . In SIGIR , pages 1029 -- 1032 . ACM, 2019 . L. Zhang, S. Zhang, and K. Balog. Table2vec: Neural word and entity embeddings for table population and retrieval. In SIGIR, pages 1029--1032. ACM, 2019."},{"key":"e_1_2_1_63_1","doi-asserted-by":"publisher","DOI":"10.1145\/3318464.3389726"},{"key":"e_1_2_1_64_1","first-page":"847","volume-title":"SIGMOD","author":"Zhu E.","year":"2019","unstructured":"E. Zhu , D. Deng , F. Nargesian , and R. J. Miller . 
JOSIE: overlap set similarity search for finding joinable tables in data lakes . In SIGMOD , pages 847 -- 864 . ACM, 2019 . E. Zhu, D. Deng, F. Nargesian, and R. J. Miller. JOSIE: overlap set similarity search for finding joinable tables in data lakes. In SIGMOD, pages 847--864. ACM, 2019."},{"key":"e_1_2_1_65_1","doi-asserted-by":"publisher","DOI":"10.14778\/3115404.3115409"},{"key":"e_1_2_1_66_1","doi-asserted-by":"publisher","DOI":"10.14778\/2994509.2994534"}],"container-title":["Proceedings of the VLDB Endowment"],"original-title":[],"language":"en","link":[{"URL":"https:\/\/dl.acm.org\/doi\/pdf\/10.14778\/3603581.3603587","content-type":"unspecified","content-version":"vor","intended-application":"similarity-checking"}],"deposited":{"date-parts":[[2023,8,8]],"date-time":"2023-08-08T19:08:20Z","timestamp":1691521700000},"score":1,"resource":{"primary":{"URL":"https:\/\/dl.acm.org\/doi\/10.14778\/3603581.3603587"}},"subtitle":[],"short-title":[],"issued":{"date-parts":[[2023,6]]},"references-count":66,"journal-issue":{"issue":"10","published-print":{"date-parts":[[2023,6]]}},"alternative-id":["10.14778\/3603581.3603587"],"URL":"https:\/\/doi.org\/10.14778\/3603581.3603587","relation":{},"ISSN":["2150-8097"],"issn-type":[{"value":"2150-8097","type":"print"}],"subject":[],"published":{"date-parts":[[2023,6]]},"assertion":[{"value":"2023-08-08","order":2,"name":"published","label":"Published","group":{"name":"publication_history","label":"Publication History"}}]}}