{"status":"ok","message-type":"work","message-version":"1.0.0","message":{"indexed":{"date-parts":[[2025,10,21]],"date-time":"2025-10-21T00:32:57Z","timestamp":1761006777921,"version":"build-2065373602"},"reference-count":39,"publisher":"MDPI AG","issue":"10","license":[{"start":{"date-parts":[[2025,10,20]],"date-time":"2025-10-20T00:00:00Z","timestamp":1760918400000},"content-version":"vor","delay-in-days":0,"URL":"https:\/\/creativecommons.org\/licenses\/by\/4.0\/"}],"funder":[{"DOI":"10.13039\/501100001809","name":"National Natural Science Foundation of China","doi-asserted-by":"publisher","award":["72301250"],"award-info":[{"award-number":["72301250"]}],"id":[{"id":"10.13039\/501100001809","id-type":"DOI","asserted-by":"publisher"}]}],"content-domain":{"domain":[],"crossmark-restriction":false},"short-container-title":["BDCC"],"abstract":"<jats:p>Tables, as a form of structured or semi-structured data, are widely found in documents, reports, and data manuals. Table-based question answering (TableQA) plays a key role in table document analysis and understanding. Existing approaches to TableQA can be broadly categorized into content-matching methods and end-to-end generation methods based on encoder\u2013decoder deep neural networks. Content-matching methods return one or more table cells as answers, thereby preserving the original data and making them more suitable for downstream tasks. End-to-end methods, especially those leveraging large language models (LLMs), have achieved strong performance on various benchmarks. However, the variability in LLM-generated expressions and their heavy reliance on prompt engineering limit their applicability where answer fidelity to the source table is critical. In this work, we propose CBCM (Cell-by-Cell semantic Matching), a fine-grained cell-level matching method that extends the traditional row- and column-matching paradigm to improve accuracy and applicability in TableQA. 
Furthermore, based on the public IM-TQA dataset, we construct a new benchmark, IM-TQA-X, specifically designed for the multi-row and multi-column cell recall task, a scenario underexplored in existing state-of-the-art content-matching methods. Experimental results show that CBCM improves overall accuracy by 2.5% over the latest row- and column-matching method RGCNRCI (Relational Graph Convolutional Networks based Row and Column Intersection), and boosts accuracy in the multi-row and multi-column recall task from 4.3% to 34%.<\/jats:p>","DOI":"10.3390\/bdcc9100265","type":"journal-article","created":{"date-parts":[[2025,10,20]],"date-time":"2025-10-20T09:23:34Z","timestamp":1760952214000},"page":"265","update-policy":"https:\/\/doi.org\/10.3390\/mdpi_crossmark_policy","source":"Crossref","is-referenced-by-count":0,"title":["Complex Table Question Answering with Multiple Cells Recall Based on Extended Cell Semantic Matching"],"prefix":"10.3390","volume":"9","author":[{"ORCID":"https:\/\/orcid.org\/0000-0002-6864-6268","authenticated-orcid":false,"given":"Hainan","family":"Chen","sequence":"first","affiliation":[{"name":"School of Computer Science and Technology, Zhejiang Sci-Tech University, Hangzhou 310018, China"}]},{"given":"Dongqi","family":"Shen","sequence":"additional","affiliation":[{"name":"School of Computer Science and Technology, Zhejiang Sci-Tech University, Hangzhou 310018, China"}]}],"member":"1968","published-online":{"date-parts":[[2025,10,20]]},"reference":[{"key":"ref_1","doi-asserted-by":"crossref","first-page":"133","DOI":"10.1186\/s40537-023-00808-2","article-title":"A systematic review on big data applications and scope for industrial processing and healthcare sectors","volume":"10","author":"Rahul","year":"2023","journal-title":"J. Big Data"},{"key":"ref_2","doi-asserted-by":"crossref","unstructured":"Zheng, Y., Wang, H., Dong, B., Wang, X., and Li, C. (2022, January 22\u201327). 
HIE-SQL: History Information Enhanced Network for Context-Dependent Text-to-SQL Semantic Parsing. Proceedings of the Association for Computational Linguistics, Dublin, Ireland.","DOI":"10.18653\/v1\/2022.findings-acl.236"},{"key":"ref_3","doi-asserted-by":"crossref","unstructured":"Hui, B., Geng, R., Wang, L., Qin, B., Li, Y., Li, B., Sun, J., and Li, Y. (2022, January 22\u201327). S2SQL: Injecting Syntax to Question-Schema Interaction Graph Encoder for Text-to-SQL Parsers. Proceedings of the Association for Computational Linguistics, Dublin, Ireland.","DOI":"10.18653\/v1\/2022.findings-acl.99"},{"key":"ref_4","doi-asserted-by":"crossref","first-page":"189","DOI":"10.5715\/jnlp.31.189","article-title":"A Table Question Alignment based Cell-Selection Method for Table-Text QA","volume":"31","author":"Wu","year":"2024","journal-title":"J. Nat. Lang. Process."},{"key":"ref_5","doi-asserted-by":"crossref","unstructured":"Khurana, U., Suneja, S., and Samulowitz, H. (2025, January 28). Table Retrieval using LLMs and Semantic Table Similarity. Proceedings of the Companion Proceedings of the ACM on Web Conference 2025, Sydney, NSW, Australia.","DOI":"10.1145\/3701716.3715558"},{"key":"ref_6","doi-asserted-by":"crossref","unstructured":"Wu, J., Xu, Y., Gao, Y., Lou, J.G., Karlsson, B., and Okumura, M. (2023, January 9\u201314). TACR: A Table Alignment-based Cell Selection Method for HybridQA. Proceedings of the Association for Computational Linguistics, Toronto, ON, Canada.","DOI":"10.18653\/v1\/2023.findings-acl.409"},{"key":"ref_7","doi-asserted-by":"crossref","first-page":"7478","DOI":"10.1109\/TNNLS.2022.3227717","article-title":"A survey of visual transformers","volume":"35","author":"Liu","year":"2023","journal-title":"IEEE Trans. Neural Netw. Learn. Syst."},{"key":"ref_8","doi-asserted-by":"crossref","unstructured":"Wang, L., Zhang, A., Wu, K., Sun, K., Li, Z., Wu, H., Zhang, M., and Wang, H. (2020, January 5\u201310). 
DuSQL: A Large-Scale and Pragmatic Chinese Text-to-SQL Dataset. Proceedings of the Association for Computational Linguistics, Online.","DOI":"10.18653\/v1\/2020.emnlp-main.562"},{"key":"ref_9","doi-asserted-by":"crossref","unstructured":"Yu, T., Zhang, R., Yang, K., Yasunaga, M., Wang, D., Li, Z., Ma, J., Li, I., Yao, Q., and Roman, S. (2018, January 15\u201320). Spider: A Large-Scale Human-Labeled Dataset for Complex and Cross-Domain Semantic Parsing and Text-to-SQL Task. Proceedings of the Association for Computational Linguistics, Melbourne, Australia.","DOI":"10.18653\/v1\/D18-1425"},{"key":"ref_10","doi-asserted-by":"crossref","unstructured":"Zheng, M., Hao, Y., Jiang, W., Lin, Z., Lyu, Y., She, Q., and Wang, W. (2023, January 9\u201314). IM-TQA: A Chinese Table Question Answering Dataset with Implicit and Multi-type Table Structures. Proceedings of the Association for Computational Linguistics, Toronto, ON, Canada.","DOI":"10.18653\/v1\/2023.acl-long.278"},{"key":"ref_11","doi-asserted-by":"crossref","unstructured":"Katsis, Y., Chemmengath, S., Kumar, V., Bharadwaj, S., Canim, M., Glass, M., Gliozzo, A., Pan, F., Sen, J., and Sankaranarayanan, K. (2022, January 22\u201327). AIT-QA: Question Answering Dataset over Complex Tables in the Airline Industry. Proceedings of the Association for Computational Linguistics, Dublin, Ireland.","DOI":"10.18653\/v1\/2022.naacl-industry.34"},{"key":"ref_12","doi-asserted-by":"crossref","unstructured":"Cheng, Z., Dong, H., Wang, Z., Jia, R., Guo, J., Gao, Y., Han, S., Lou, J.G., and Zhang, D. (2022, January 22\u201327). HiTab: A Hierarchical Table Dataset for Question Answering and Natural Language Generation. Proceedings of the Association for Computational Linguistics, Dublin, Ireland.","DOI":"10.18653\/v1\/2022.acl-long.78"},{"key":"ref_13","doi-asserted-by":"crossref","unstructured":"Shi, H., Xie, Y., Goncalves, L., Gao, S., and Zhao, J. (2024). WikiDT: Visual-Based Table Recognition and Question Answering Dataset. 
International Conference on Document Analysis and Recognition, Springer.","DOI":"10.1007\/978-3-031-70533-5_24"},{"key":"ref_14","doi-asserted-by":"crossref","first-page":"1430","DOI":"10.1038\/s41592-024-02353-z","article-title":"Transformers in single-cell omics: A review and new perspectives","volume":"21","author":"Hrovatin","year":"2024","journal-title":"Nat. Methods"},{"key":"ref_15","first-page":"1","article-title":"Dtt: An example-driven tabular transformer for joinability by leveraging large language models","volume":"2","author":"Rafiei","year":"2024","journal-title":"Proc. ACM Manag. Data"},{"key":"ref_16","doi-asserted-by":"crossref","unstructured":"Zhang, H., Wang, Y., Wang, S., Cao, X., Zhang, F., and Wang, Z. (2020, January 5\u201310). Table Fact Verification with Structure-Aware Transformer. Proceedings of the Association for Computational Linguistics, Online.","DOI":"10.18653\/v1\/2020.emnlp-main.126"},{"key":"ref_17","doi-asserted-by":"crossref","unstructured":"Yin, P., Neubig, G., Yih, W.T., and Riedel, S. (2020, January 5\u201310). TaBERT: Pretraining for Joint Understanding of Textual and Tabular Data. Proceedings of the Association for Computational Linguistics, Online.","DOI":"10.18653\/v1\/2020.acl-main.745"},{"key":"ref_18","doi-asserted-by":"crossref","unstructured":"Yang, J., Gupta, A., Upadhyay, S., He, L., Goel, R., and Paul, S. (2022, January 22\u201327). TableFormer: Robust Transformer Modeling for Table-Text Encoding. Proceedings of the Association for Computational Linguistics, Dublin, Ireland.","DOI":"10.18653\/v1\/2022.acl-long.40"},{"key":"ref_19","doi-asserted-by":"crossref","unstructured":"Eisenschlos, J., Gor, M., M\u00fcller, T., and Cohen, W. (2021, January 1\u20136). MATE: Multi-view Attention for Table Transformer Efficiency. 
Proceedings of the Association for Computational Linguistics, Online.","DOI":"10.18653\/v1\/2021.emnlp-main.600"},{"key":"ref_20","doi-asserted-by":"crossref","unstructured":"Glass, M., Canim, M., Gliozzo, A., Chemmengath, S., Kumar, V., Chakravarti, R., Sil, A., Pan, F., Bharadwaj, S., and Fauceglia, N.R. (2021, January 1\u20136). Capturing Row and Column Semantics in Transformer Based Question Answering over Tables. Proceedings of the Association for Computational Linguistics, Online.","DOI":"10.18653\/v1\/2021.naacl-main.96"},{"key":"ref_21","doi-asserted-by":"crossref","unstructured":"Zhu, F., Lei, W., Huang, Y., Wang, C., Zhang, S., Lv, J., Feng, F., and Chua, T.S. (2021, January 1\u20136). TAT-QA: A Question Answering Benchmark on a Hybrid of Tabular and Textual Content in Finance. Proceedings of the Association for Computational Linguistics, Online.","DOI":"10.18653\/v1\/2021.acl-long.254"},{"key":"ref_22","doi-asserted-by":"crossref","unstructured":"Zhao, Y., Nan, L., Qi, Z., Zhang, R., and Radev, D. (2022, January 22\u201327). ReasTAP: Injecting Table Reasoning Skills During Pre-training via Synthetic Reasoning Examples. Proceedings of the Association for Computational Linguistics, Dublin, Ireland.","DOI":"10.18653\/v1\/2022.emnlp-main.615"},{"key":"ref_23","unstructured":"Brown, T.B., Mann, B., Ryder, N., Subbiah, M., Kaplan, J., Dhariwal, P., Neelakantan, A., Shyam, P., Sastry, G., and Askell, A. (2020, January 6). Language models are few-shot learners. Proceedings of the 34th International Conference on Neural Information Processing Systems, Vancouver, BC, Canada."},{"key":"ref_24","doi-asserted-by":"crossref","first-page":"35","DOI":"10.1162\/tacl_a_00446","article-title":"FeTaQA: Free-form Table Question Answering","volume":"10","author":"Nan","year":"2022","journal-title":"Trans. Assoc. Comput. Linguist."},{"key":"ref_25","doi-asserted-by":"crossref","unstructured":"Guan, C., Huang, M., and Zhang, P. (2024, January 26\u201329). 
MFORT-QA: Multi-hop Few-shot Open Rich Table Question Answering. Proceedings of the 2024 10th International Conference on Computing and Artificial Intelligence, Bali, Indonesia.","DOI":"10.1145\/3669754.3669822"},{"key":"ref_26","first-page":"50117","article-title":"Toolqa: A dataset for LLM question answering with external tools","volume":"36","author":"Zhuang","year":"2023","journal-title":"Adv. Neural Inf. Process. Syst."},{"key":"ref_27","doi-asserted-by":"crossref","unstructured":"Kumar, V., Gupta, Y., Chemmengath, S., Sen, J., Chakrabarti, S., Bharadwaj, S., and Pan, F. (2023, January 9\u201314). Multi-Row, Multi-Span Distant Supervision For Table+Text Question Answering. Proceedings of the 61st Annual Meeting of the Association for Computational Linguistics, Toronto, ON, Canada.","DOI":"10.18653\/v1\/2023.acl-long.449"},{"key":"ref_28","doi-asserted-by":"crossref","first-page":"513","DOI":"10.1016\/0306-4573(88)90021-0","article-title":"Term-weighting approaches in automatic text retrieval","volume":"24","author":"Salton","year":"1988","journal-title":"Inf. Process. Manag."},{"key":"ref_29","unstructured":"Mikolov, T., Sutskever, I., Chen, K., Corrado, G., and Dean, J. (2013). Distributed representations of words and phrases and their compositionality. Adv. Neural Inf. Process. Syst., 26."},{"key":"ref_30","unstructured":"Devlin, J., Chang, M.W., Lee, K., and Toutanova, K. (August, January 28). BERT: Pre-training of Deep Bidirectional Transformers for Language Understanding. Proceedings of the Association for Computational Linguistics, Florence, Italy."},{"key":"ref_31","doi-asserted-by":"crossref","unstructured":"Sowmya, B., and Srinivasa, K. (2016, January 6\u20138). Large scale multi-label text classification of a hierarchical dataset using rocchio algorithm. 
Proceedings of the 2016 International Conference on Computation System and Information Technology for Sustainable Solutions (CSITSS), Bengaluru, India.","DOI":"10.1109\/CSITSS.2016.7779373"},{"key":"ref_32","doi-asserted-by":"crossref","first-page":"245","DOI":"10.1016\/j.eswa.2016.09.009","article-title":"Turning from TF-IDF to TF-IGM for term weighting in text classification","volume":"66","author":"Chen","year":"2016","journal-title":"Expert Syst. Appl."},{"key":"ref_33","unstructured":"Lodhi, H., Saunders, C., Shawe-Taylor, J., Cristianini, N., and Watkins, C. (2000, January 1). Text classification using string kernels. Proceedings of the 14th International Conference on Neural Information Processing Systems, Denver, CO, USA."},{"key":"ref_34","doi-asserted-by":"crossref","first-page":"10809","DOI":"10.1007\/s00521-018-3442-0","article-title":"Verbal aggression detection on Twitter comments: Convolutional neural network for short-text sentiment analysis","volume":"32","author":"Chen","year":"2020","journal-title":"Neural Comput. Appl."},{"key":"ref_35","unstructured":"Kowsari, K., Heidarysafa, M., Brown, D.E., Meimandi, K.J., and Barnes, L.E. (2018, January 9\u201311). RMDL: Random Multimodel Deep Learning for Classification. Proceedings of the 2nd International Conference on Information System and Data Mining, Lakeland, FL, USA."},{"key":"ref_36","doi-asserted-by":"crossref","first-page":"436","DOI":"10.1038\/nature14539","article-title":"Deep learning","volume":"521","author":"LeCun","year":"2015","journal-title":"Nature"},{"key":"ref_37","unstructured":"Zhang, Z., Han, X., Liu, Z., Jiang, X., Sun, M., and Liu, Q. (August, January 28). ERNIE: Enhanced Language Representation with Informative Entities. 
Proceedings of the 57th Annual Meeting of the Association for Computational Linguistics, Florence, Italy."},{"key":"ref_38","doi-asserted-by":"crossref","first-page":"18","DOI":"10.1109\/5254.708428","article-title":"Support vector machines","volume":"13","author":"Hearst","year":"1998","journal-title":"IEEE Intell. Syst. Their Appl."},{"key":"ref_39","doi-asserted-by":"crossref","first-page":"5","DOI":"10.1023\/A:1010933404324","article-title":"Random forests","volume":"45","author":"Breiman","year":"2001","journal-title":"Mach. Learn."}],"container-title":["Big Data and Cognitive Computing"],"original-title":[],"language":"en","link":[{"URL":"https:\/\/www.mdpi.com\/2504-2289\/9\/10\/265\/pdf","content-type":"unspecified","content-version":"vor","intended-application":"similarity-checking"}],"deposited":{"date-parts":[[2025,10,20]],"date-time":"2025-10-20T10:08:34Z","timestamp":1760954914000},"score":1,"resource":{"primary":{"URL":"https:\/\/www.mdpi.com\/2504-2289\/9\/10\/265"}},"subtitle":[],"short-title":[],"issued":{"date-parts":[[2025,10,20]]},"references-count":39,"journal-issue":{"issue":"10","published-online":{"date-parts":[[2025,10]]}},"alternative-id":["bdcc9100265"],"URL":"https:\/\/doi.org\/10.3390\/bdcc9100265","relation":{},"ISSN":["2504-2289"],"issn-type":[{"value":"2504-2289","type":"electronic"}],"subject":[],"published":{"date-parts":[[2025,10,20]]}}}