{"status":"ok","message-type":"work","message-version":"1.0.0","message":{"indexed":{"date-parts":[[2026,6,2]],"date-time":"2026-06-02T17:21:15Z","timestamp":1780420875349,"version":"3.54.1"},"reference-count":174,"publisher":"MDPI AG","issue":"12","license":[{"start":{"date-parts":[[2025,12,15]],"date-time":"2025-12-15T00:00:00Z","timestamp":1765756800000},"content-version":"vor","delay-in-days":0,"URL":"https:\/\/creativecommons.org\/licenses\/by\/4.0\/"}],"content-domain":{"domain":[],"crossmark-restriction":false},"short-container-title":["Algorithms"],"abstract":"<jats:p>In this work, we present a principled framework for the deployment of Large Language Models (LLMs) in enterprise big data management across digital governance, marketing, and accounting domains. Unlike conventional predictive applications, our approach integrates LLMs as auditable, sector-adaptive components that robustly and directly enhance data curation, lineage, and regulatory compliance. The study contributes (i) a systematic evaluation of seven LLM-enabled functions\u2014including schema mapping, entity resolution, and document extraction\u2014that directly improve data quality and operational governance; (ii) a distributed architecture that deploys Apache Spark orchestration with Markov Chain Monte Carlo sampling to achieve quantifiable uncertainty and reproducible audit trails; and (iii) a cross-sector analysis demonstrating robust semantic accuracy, compliance management, and explainable outputs suited to diverse assurance requirements. Empirical evaluations reveal that the proposed architecture persistently attains elevated mapping precision, resilient multimodal feature extraction, and consistent human supervision. These characteristics collectively reinforce the integrity, accountability, and transparency of information ecosystems, particularly within compliance-driven organizational settings.<\/jats:p>","DOI":"10.3390\/a18120791","type":"journal-article","created":{"date-parts":[[2025,12,16]],"date-time":"2025-12-16T08:46:52Z","timestamp":1765874812000},"page":"791","update-policy":"https:\/\/doi.org\/10.3390\/mdpi_crossmark_policy","source":"Crossref","is-referenced-by-count":5,"title":["LLM-Driven Big Data Management Across Digital Governance, Marketing, and Accounting: A Spark-Orchestrated Framework"],"prefix":"10.3390","volume":"18","author":[{"ORCID":"https:\/\/orcid.org\/0000-0002-4632-6511","authenticated-orcid":false,"given":"Aristeidis","family":"Karras","sequence":"first","affiliation":[{"name":"Computer Engineering and Informatics Department, University of Patras, 26504 Patras, Greece"}],"role":[{"vocabulary":"crossref","role":"author"}]},{"ORCID":"https:\/\/orcid.org\/0000-0002-0891-6780","authenticated-orcid":false,"given":"Leonidas","family":"Theodorakopoulos","sequence":"additional","affiliation":[{"name":"Department of Management Science and Technology, University of Patras, 26334 Patras, Greece"}],"role":[{"vocabulary":"crossref","role":"author"}]},{"ORCID":"https:\/\/orcid.org\/0000-0002-4253-7661","authenticated-orcid":false,"given":"Christos","family":"Karras","sequence":"additional","affiliation":[{"name":"Computer Engineering and Informatics Department, University of Patras, 26504 Patras, Greece"}],"role":[{"vocabulary":"crossref","role":"author"}]},{"ORCID":"https:\/\/orcid.org\/0009-0007-0008-547X","authenticated-orcid":false,"given":"George A.","family":"Krimpas","sequence":"additional","affiliation":[{"name":"Computer Engineering and Informatics Department, University of Patras, 26504 Patras, Greece"}],"role":[{"vocabulary":"crossref","role":"author"}]},{"ORCID":"https:\/\/orcid.org\/0000-0001-9413-8841","authenticated-orcid":false,"given":"Anastasios","family":"Giannaros","sequence":"additional","affiliation":[{"name":"Computer Engineering and Informatics Department, University of Patras, 26504 Patras, Greece"}],"role":[{"vocabulary":"crossref","role":"author"}]},{"ORCID":"https:\/\/orcid.org\/0009-0006-8837-2248","authenticated-orcid":false,"given":"Charalampos-Panagiotis","family":"Bakalis","sequence":"additional","affiliation":[{"name":"Department of Management Science and Technology, University of Patras, 26334 Patras, Greece"}],"role":[{"vocabulary":"crossref","role":"author"}]}],"member":"1968","published-online":{"date-parts":[[2025,12,15]]},"reference":[{"key":"ref_1","doi-asserted-by":"crossref","unstructured":"Zhang, M., Ji, Z., Luo, Z., Wu, Y., and Chai, C. (2024, January 13\u201316). Applications and Challenges for Large Language Models: From Data Management Perspective. Proceedings of the 2024 IEEE 40th International Conference on Data Engineering (ICDE), Utrecht, The Netherlands.","DOI":"10.1109\/ICDE60146.2024.00441"},{"key":"ref_2","doi-asserted-by":"crossref","unstructured":"Fan, W., Wu, P., Ding, Y., Ning, L., Wang, S., and Li, Q. (2025, January 19\u201323). Towards Retrieval-Augmented Large Language Models: Data Management and System Design. Proceedings of the 2025 IEEE 41st International Conference on Data Engineering (ICDE), Hong Kong.","DOI":"10.1109\/ICDE65448.2025.00341"},{"key":"ref_3","doi-asserted-by":"crossref","unstructured":"Trummer, I. (2023). From BERT to GPT-3 codex: Harnessing the potential of very large language models for data management. arXiv.","DOI":"10.14778\/3554821.3554896"},{"key":"ref_4","doi-asserted-by":"crossref","unstructured":"Karras, A., Theodorakopoulos, L., Karras, C., Theodoropoulou, A., Kalliampakou, I., and Kalogeratos, G. (2025). LLMs for Cybersecurity in the Big Data Era: A Comprehensive Review of Applications, Challenges, and Future Directions. Information, 16.","DOI":"10.3390\/info16110957"},{"key":"ref_5","doi-asserted-by":"crossref","unstructured":"Jadhav, A., and Mirza, V. (2025). Large Language Models in Equity Markets: Applications, Techniques, and Insights. Front. Artif. Intell., 8.","DOI":"10.3389\/frai.2025.1608365"},{"key":"ref_6","unstructured":"Kim, Y., Xu, X., McDuff, D., Breazeal, C., and Park, H.W. (2024). Health-llm: Large language models for health prediction via wearable sensor data. arXiv."},{"key":"ref_7","unstructured":"Fang, X., Xu, W., Tan, F.A., Zhang, J., Hu, Z., Qi, Y., Nickleach, S., Socolinsky, D., Sengamedu, S., and Faloutsos, C. (2024). Large Language Models (LLMs) on Tabular Data: Prediction, Generation, and Understanding\u2013A Survey. arXiv."},{"key":"ref_8","unstructured":"Su, J., Jiang, C., Jin, X., Qiao, Y., Xiao, T., Ma, H., Wei, R., Jing, Z., Xu, J., and Lin, J. (2024). Large language models for forecasting and anomaly detection: A systematic literature review. arXiv."},{"key":"ref_9","doi-asserted-by":"crossref","unstructured":"Carriero, A., Pettenuzzo, D., and Shekhar, S. (2024). Macroeconomic forecasting with large language models. arXiv.","DOI":"10.2139\/ssrn.4881094"},{"key":"ref_10","doi-asserted-by":"crossref","unstructured":"Im, J., Lee, J., Lee, S., and Kwon, H.Y. (2024). Data pipeline for real-time energy consumption data management and prediction. Front. Big Data, 7.","DOI":"10.3389\/fdata.2024.1308236"},{"key":"ref_11","doi-asserted-by":"crossref","first-page":"60","DOI":"10.1145\/3719207","article-title":"Llm4ts: Aligning pre-trained llms as data-efficient time-series forecasters","volume":"16","author":"Chang","year":"2025","journal-title":"ACM Trans. Intell. Syst. Technol."},{"key":"ref_12","doi-asserted-by":"crossref","unstructured":"Jacques-Silva, G., Kalyvianaki, E., Cohn-Gordon, K., Meguid, A., Nguyen, H., Ben-David, D., Nayak, C., Saravagi, V., Stasa, G., and Papagiannis, I. (2025, January 22\u201327). Unified Lineage System: Tracking Data Provenance at Scale. Proceedings of the Companion of the 2025 International Conference on Management of Data, Berlin, Germany.","DOI":"10.1145\/3722212.3724458"},{"key":"ref_13","doi-asserted-by":"crossref","unstructured":"Luo, H., Chuang, Y.S., Gong, Y., Zhang, T., Kim, Y., Wu, X., Fox, D., Meng, H., and Glass, J. (2023). Sail: Search-augmented instruction learning. arXiv.","DOI":"10.18653\/v1\/2023.findings-emnlp.242"},{"key":"ref_14","doi-asserted-by":"crossref","unstructured":"Mao, K., Dou, Z., Mo, F., Hou, J., Chen, H., and Qian, H. (2023). Large language models know your contextual search intent: A prompting framework for conversational search. arXiv.","DOI":"10.18653\/v1\/2023.findings-emnlp.86"},{"key":"ref_15","doi-asserted-by":"crossref","unstructured":"Ma, M.D., Wang, X., Kung, P.N., Brantingham, P.J., Peng, N., and Wang, W. (2024, January 20\u201327). STAR: Boosting Low-Resource Information Extraction by Structure-to-Text Data Generation with Large Language Models. Proceedings of the AAAI Conference on Artificial Intelligence, Vancouver, BC, Canada.","DOI":"10.1609\/aaai.v38i17.29839"},{"key":"ref_16","unstructured":"Zhang, J., Zhang, H., Chakravarti, R., Hu, Y., Ng, P., Katsifodimos, A., Rangwala, H., Karypis, G., and Halevy, A. (2025). CoddLLM: Empowering Large Language Models for Data Analytics. arXiv."},{"key":"ref_17","doi-asserted-by":"crossref","unstructured":"Li, X., Dou, Z., Zhou, Y., and Liu, F. (2024, January 14\u201318). Corpuslm: Towards a unified language model on corpus for knowledge-intensive tasks. Proceedings of the 47th International ACM SIGIR Conference on Research and Development in Information Retrieval, Washington, DC, USA.","DOI":"10.1145\/3626772.3657778"},{"key":"ref_18","unstructured":"Singh, S., and Vorster, L. (2024, January 5\u20136). LLM Supply Chain Provenance: A Blockchain-Based Approach. Proceedings of the International Conference on AI Research, Lisbon, Portugal."},{"key":"ref_19","doi-asserted-by":"crossref","unstructured":"Hoffmann, N., and Pour, N.E. (2024, January 8\u201312). A low overhead approach for automatically tracking provenance in machine learning workflows. Proceedings of the 2024 IEEE European Symposium on Security and Privacy Workshops (EuroS&PW), Vienna, Austria.","DOI":"10.1109\/EuroSPW61312.2024.00092"},{"key":"ref_20","doi-asserted-by":"crossref","unstructured":"Korolev, V., and Joshi, A. (2024, January 15\u201318). Crystalia: Flexible and Efficient Method for Large Dataset Lineage Tracking. Proceedings of the 2024 IEEE International Conference on Big Data (BigData), Washington, DC, USA.","DOI":"10.1109\/BigData62323.2024.10826067"},{"key":"ref_21","doi-asserted-by":"crossref","unstructured":"Padovani, G., Anantharaj, V., and Fiore, S. (2025). yProv4ML: Effortless Provenance Tracking for Machine Learning Systems. arXiv.","DOI":"10.2139\/ssrn.5226904"},{"key":"ref_22","doi-asserted-by":"crossref","unstructured":"Spoczynski, M., Melara, M.S., and Szyller, S. (2025). Atlas: A framework for ml lifecycle provenance & transparency. arXiv.","DOI":"10.1109\/EuroSPW67616.2025.00058"},{"key":"ref_23","unstructured":"Lu, L., An, J., Wang, Y., Kong, C., Liu, Z., Wang, S., Lin, H., Fang, M., Huang, Y., and Yang, E. (2024). From text to cql: Bridging natural language and corpus search engine. arXiv."},{"key":"ref_24","doi-asserted-by":"crossref","unstructured":"Xu, X., Yao, B., Dong, Y., Gabriel, S., Yu, H., Hendler, J., Ghassemi, M., Dey, A.K., and Wang, D. (2024, January 5\u20139). Mental-llm: Leveraging large language models for mental health prediction via online text data. Proceedings of the ACM on Interactive, Mobile, Wearable and Ubiquitous Technologies, Melbourne, Australia.","DOI":"10.1145\/3643540"},{"key":"ref_25","first-page":"32","article-title":"LLMs and Databases: A Synergistic Approach to Data Utilization","volume":"49","author":"Ozcan","year":"2025","journal-title":"IEEE Data Eng. Bull."},{"key":"ref_26","doi-asserted-by":"crossref","unstructured":"Liu, Z., He, X., Tian, Y., and Chawla, N.V. (2024, January 13\u201317). Can we soft prompt llms for graph learning tasks?. Proceedings of the Companion Proceedings of the ACM Web Conference, Singapore.","DOI":"10.1145\/3589335.3651476"},{"key":"ref_27","unstructured":"Lee, G., Yu, W., Shin, K., Cheng, W., and Chen, H. (March, January 25). Timecap: Learning to contextualize, augment, and predict time series events with large language model (LLM) agents. Proceedings of the AAAI Conference on Artificial Intelligence, Philadelphia, PA, USA."},{"key":"ref_28","unstructured":"Nako, P., and Jatowt, A. (2025). Navigating Tomorrow: Reliably Assessing Large Language Models Performance on Future Event Prediction. arXiv."},{"key":"ref_29","unstructured":"Soru, T., and Marshall, J. (2025). Leveraging Log Probabilities in Language Models to Forecast Future Events. arXiv."},{"key":"ref_30","doi-asserted-by":"crossref","unstructured":"Vadlapati, P. (2024). LML-DAP: Language Model Learning a Dataset for Data-Augmented Prediction. arXiv.","DOI":"10.34218\/IJCET_16_01_001"},{"key":"ref_31","unstructured":"Wu, Z., Zhao, Y., and Wang, H. (2025). Gradient-Based Model Fingerprinting for LLM Similarity Detection and Family Classification. arXiv."},{"key":"ref_32","doi-asserted-by":"crossref","unstructured":"Tang, N., Chen, M., Ning, Z., Bansal, A., Huang, Y., McMillan, C., and Li, T.J.J. (2024). A study on developer behaviors for validating and repairing llm-generated code using eye tracking and ide actions. arXiv.","DOI":"10.1109\/VL\/HCC60511.2024.00015"},{"key":"ref_33","doi-asserted-by":"crossref","unstructured":"Tang, N., Chen, M., Ning, Z., Bansal, A., Huang, Y., McMillan, C., and Li, T.J.J. (2024, January 2\u20136). Developer Behaviors in Validating and Repairing LLM-Generated Code Using IDE and Eye Tracking. Proceedings of the 2024 IEEE Symposium on Visual Languages and Human-Centric Computing (VL\/HCC), Liverpool, UK.","DOI":"10.1109\/VL\/HCC60511.2024.00015"},{"key":"ref_34","unstructured":"Li, L., Wang, P., Ren, K., Sun, T., and Qiu, X. (2023). Origin tracing and detecting of llms. arXiv."},{"key":"ref_35","doi-asserted-by":"crossref","unstructured":"Zhu, J., Xiao, M., Wang, Y., Zhai, F., Zhou, Y., and Zong, C. (2025). TROVE: A Challenge for Fine-Grained Text Provenance via Source Sentence Tracing and Relationship Classification. arXiv.","DOI":"10.18653\/v1\/2025.acl-long.577"},{"key":"ref_36","unstructured":"Nikolic, I., Baluta, T., and Saxena, P. (2025). Model Provenance Testing for Large Language Models. arXiv."},{"key":"ref_37","unstructured":"Chen, S., Kang, F., Yu, N., and Jia, R. (2024). FASTTRACK: Fast and Accurate Fact Tracing for LLMs. arXiv."},{"key":"ref_38","doi-asserted-by":"crossref","unstructured":"Wang, J., Crawl, D., Purawat, S., Nguyen, M., and Altintas, I. (November, January 29). Big data provenance: Challenges, state of the art and opportunities. Proceedings of the 2015 IEEE international conference on big data (Big Data), Santa Clara, CA, USA.","DOI":"10.1109\/BigData.2015.7364047"},{"key":"ref_39","unstructured":"Zhang, D., Zhoubian, S., Hu, Z., Yue, Y., Dong, Y., and Tang, J. (2024, January 10\u201315). Rest-mcts*: Llm self-training via process reward guided tree search. Proceedings of the Advances in Neural Information Processing Systems, Vancouver, BC, Canada."},{"key":"ref_40","doi-asserted-by":"crossref","unstructured":"Tang, X., Yang, X., Yao, Z., Wen, J., Zhou, X., Han, J., and Hu, S. (2025, January 5\u20137). DS-GCG: Enhancing LLM Jailbreaks with Token Suppression and Induction Dual-Strategy. Proceedings of the 2025 28th International Conference on Computer Supported Cooperative Work in Design (CSCWD), Compiegne, France.","DOI":"10.1109\/CSCWD64889.2025.11033375"},{"key":"ref_41","doi-asserted-by":"crossref","unstructured":"Michail, A., Clematide, S., and Sennrich, R. (2025). Examining Multilingual Embedding Models Cross-Lingually Through LLM-Generated Adversarial Examples. arXiv.","DOI":"10.18653\/v1\/2025.findings-emnlp.115"},{"key":"ref_42","doi-asserted-by":"crossref","unstructured":"Ziegler, I., K\u00f6ksal, A., Elliott, D., and Sch\u00fctze, H. (2024). Craft your dataset: Task-specific synthetic dataset generation through corpus retrieval and augmentation. arXiv.","DOI":"10.1162\/TACL.a.56"},{"key":"ref_43","doi-asserted-by":"crossref","unstructured":"Jia, P., Liu, Y., Zhao, X., Li, X., Hao, C., Wang, S., and Yin, D. (2023). Mill: Mutual verification with large language models for zero-shot query expansion. arXiv.","DOI":"10.18653\/v1\/2024.naacl-long.138"},{"key":"ref_44","unstructured":"Kong, X., Gunter, T., and Pang, R. (2024). Large language model-guided document selection. arXiv."},{"key":"ref_45","doi-asserted-by":"crossref","first-page":"104128","DOI":"10.1016\/j.ipm.2025.104128","article-title":"Traceable LLM-based validation of statements in knowledge graphs","volume":"62","author":"Adam","year":"2025","journal-title":"Inf. Process. Manag."},{"key":"ref_46","doi-asserted-by":"crossref","first-page":"e10365","DOI":"10.1002\/lrh2.10365","article-title":"Toward a common standard for data and specimen provenance in life sciences","volume":"8","author":"Wittner","year":"2024","journal-title":"Learn. Health Syst."},{"key":"ref_47","doi-asserted-by":"crossref","unstructured":"Vieira, M., de Oliveira, T., Cicco, L., de Oliveira, D., and Bedo, M. (2024, January 28\u201330). From Tracking Lineage to Enhancing Data Quality and Auditing: Adding Provenance Support to Data Warehouses with ProvETL. Proceedings of the 26th International Conference on Enterprise Information Systems (ICEIS 2024), Angers, France.","DOI":"10.5220\/0012634500003690"},{"key":"ref_48","doi-asserted-by":"crossref","unstructured":"Lau, G.K.R., Niu, X., Dao, H., Chen, J., Foo, C.S., and Low, B.K.H. (2024, January 12\u201316). Waterfall: Scalable framework for robust text watermarking and provenance for llms. Proceedings of the 2024 Conference on Empirical Methods in Natural Language Processing, Miami, FL, USA.","DOI":"10.18653\/v1\/2024.emnlp-main.1138"},{"key":"ref_49","doi-asserted-by":"crossref","unstructured":"Longpre, S., Mahari, R., Chen, A., Obeng-Marnu, N., Sileo, D., Brannon, W., Muennighoff, N., Khazam, N., Kabbara, J., and Perisetla, K. (2023). The Data Provenance Initiative: A Large Scale Audit of Dataset Licensing & Attribution in AI. arXiv.","DOI":"10.1038\/s42256-024-00878-8"},{"key":"ref_50","unstructured":"Hu, Y., Nguyen, T.P., Ghosh, S., and Razniewski, S. (August, January 27). Enabling LLM knowledge analysis via extensive materialization. Proceedings of the 63rd Annual Meeting of the Association for Computational Linguistics (Volume 1: Long Papers), Vienna, Austria."},{"key":"ref_51","doi-asserted-by":"crossref","unstructured":"Karras, C., Karras, A., Avlonitis, M., and Sioutas, S. (2022, January 17\u201320). An overview of mcmc methods: From theory to applications. Proceedings of the IFIP International Conference on Artificial Intelligence Applications and Innovations, Hersonissos, Greece.","DOI":"10.1007\/978-3-031-08341-9_26"},{"key":"ref_52","doi-asserted-by":"crossref","unstructured":"Karras, C., Theodorakopoulos, L., Karras, A., Krimpas, G.A., Bakalis, C.P., and Theodoropoulou, A. (2025). MCMC Methods: From Theory to Distributed Hamiltonian Monte Carlo over PySpark. Algorithms, 18.","DOI":"10.3390\/a18100661"},{"key":"ref_53","doi-asserted-by":"crossref","unstructured":"Amasiadi, N., Aslani-Gkotzamanidou, M., Theodorakopoulos, L., Theodoropoulou, A., Krimpas, G.A., Merkouris, C., and Karras, A. (2025). AI-Driven Bayesian Deep Learning for Lung Cancer Prediction: Precision Decision Support in Big Data Health Informatics. BioMedInformatics, 5.","DOI":"10.3390\/biomedinformatics5030039"},{"key":"ref_54","doi-asserted-by":"crossref","unstructured":"Karras, C., Karras, A., Avlonitis, M., Giannoukou, I., and Sioutas, S. (2022, January 17\u201320). Maximum likelihood estimators on mcmc sampling algorithms for decision making. Proceedings of the IFIP International Conference on Artificial Intelligence Applications and Innovations, Hersonissos, Greece.","DOI":"10.1007\/978-3-031-08341-9_28"},{"key":"ref_55","doi-asserted-by":"crossref","unstructured":"Vlachou, E., Karras, C., Karras, A., Tsolis, D., and Sioutas, S. (2023). EVCA classifier: A MCMC-based classifier for analyzing high-dimensional big data. Information, 14.","DOI":"10.3390\/info14080451"},{"key":"ref_56","doi-asserted-by":"crossref","unstructured":"Jayasri, P., Jeya, R., and Saritha, V. (2025, January 6\u20138). Service Provisioning and Data Management in IIoT using Improved Bi-LSTM: A Comprehensive Review. Proceedings of the 2025 3rd International Conference on Sustainable Computing and Data Communication Systems (ICSCDS), Erode, India.","DOI":"10.1109\/ICSCDS65426.2025.11167279"},{"key":"ref_57","doi-asserted-by":"crossref","first-page":"213","DOI":"10.34148\/teknika.v14i2.1229","article-title":"Fine-Hybrid: Integration of BM25 And Finetuned SBERT to Enhance Search Relevance","volume":"14","author":"Kodri","year":"2025","journal-title":"Teknika"},{"key":"ref_58","unstructured":"Yuksel, K.A., Gunduz, A., Anees, A.B., and Sawaf, H. (2025). Efficient Machine Translation Corpus Generation: Integrating Human-in-the-Loop Post-Editing with Large Language Models. arXiv."},{"key":"ref_59","doi-asserted-by":"crossref","first-page":"25074","DOI":"10.1609\/aaai.v39i23.34692","article-title":"Automatically Generating Numerous Context-Driven SFT Data for LLMs Across Diverse Granularity","volume":"Volume 39","author":"Quan","year":"2025","journal-title":"Proceedings of the AAAI Conference on Artificial Intelligence (AAAI-25)"},{"key":"ref_60","unstructured":"Zeighami, S., Lin, Y., Shankar, S., and Parameswaran, A. (2025). LLM-Powered Proactive Data Systems. arXiv."},{"key":"ref_61","doi-asserted-by":"crossref","unstructured":"Changala, R., Kaur, C., Satapathy, N.R., Vuyyuru, V.A., Santosh, K., and Valavan, M.P. (2024, January 26\u201327). Healthcare Data Management Optimization Using LSTM and GAN-Based Predictive Modeling: Towards Effective Health Service Delivery. Proceedings of the 2024 International Conference on Data Science and Network Security (ICDSNS), Tiptur, India.","DOI":"10.1109\/ICDSNS62112.2024.10690859"},{"key":"ref_62","doi-asserted-by":"crossref","first-page":"1652","DOI":"10.2166\/hydro.2025.100","article-title":"Towards HydroLLM: Approaches for building a domain-specific language model for hydrology","volume":"27","author":"Kizilkaya","year":"2025","journal-title":"J. Hydroinform."},{"key":"ref_63","unstructured":"Cohen, O.S., Malul, E., Meidan, Y., Mimran, D., Elovici, Y., and Shabtai, A. (2025). KubeGuard: LLM-Assisted Kubernetes Hardening via Configuration Files and Runtime Logs Analysis. arXiv."},{"key":"ref_64","doi-asserted-by":"crossref","unstructured":"Shan, R., and Shan, T. (2024, January 27\u201328). Enterprise LLMOps: Advancing Large Language Models Operations Practice. Proceedings of the 2024 IEEE Cloud Summit, Washington, DC, USA.","DOI":"10.1109\/Cloud-Summit61220.2024.00030"},{"key":"ref_65","doi-asserted-by":"crossref","unstructured":"Reed, C., Wynn, M., and Bown, R. (2025). Artificial Intelligence in Digital Marketing: Towards an Analytical Framework for Revealing and Mitigating Bias. Big Data Cogn. Comput., 9.","DOI":"10.3390\/bdcc9020040"},{"key":"ref_66","doi-asserted-by":"crossref","unstructured":"Li, W., Liu, W., Deng, M., Liu, X., and Feng, L. (2025). The impact of large language models on accounting and future application scenarios. J. Account. Lit.","DOI":"10.1108\/JAL-12-2024-0357"},{"key":"ref_67","unstructured":"Aghaei, R., Kiaei, A.A., Boush, M., Vahidi, J., Zavvar, M., Barzegar, Z., and Rofoosheh, M. (2025). Harnessing the Potential of Large Language Models in Modern Marketing Management: Applications, Future Directions, and Strategic Recommendations. arXiv."},{"key":"ref_68","doi-asserted-by":"crossref","unstructured":"Ao, S.I., Hurwitz, M., and Palade, V. (2025). Cognitive computing and business intelligence applications in accounting, finance and management. Big Data Cogn. Comput., 9.","DOI":"10.3390\/bdcc9030054"},{"key":"ref_69","unstructured":"Tavasoli, A., Sharbaf, M., and Madani, S.M. (2025). Responsible innovation: A strategic framework for financial LLM integration. arXiv."},{"key":"ref_70","doi-asserted-by":"crossref","unstructured":"Kerr, D., Smith, K.T., Smith, L.M., and Xu, T. (2025). A Review of AI and Its Impact on Management Accounting and Society. J. Risk Financ. Manag., 18.","DOI":"10.3390\/jrfm18060340"},{"key":"ref_71","doi-asserted-by":"crossref","unstructured":"Bedagkar, A., Mitra, S., Medicherla, R., Naik, R., and Pal, S. (May, January 27). LLM Driven Smart Assistant for Data Mapping. Proceedings of the 2025 IEEE\/ACM 47th International Conference on Software Engineering: Software Engineering in Practice (ICSE-SEIP), Ottawa, ON, Canada.","DOI":"10.1109\/ICSE-SEIP66354.2025.00022"},{"key":"ref_72","unstructured":"Sheetrit, E., Brief, M., Mishaeli, M., and Elisha, O. (2024). Rematch: Retrieval enhanced schema matching with llms. arXiv."},{"key":"ref_73","unstructured":"Wang, T., Chen, X., Lin, H., Han, X., Sun, L., Wang, H., and Zeng, Z. (2025, January 25\u201328). DBCopilot: Natural Language Querying over Massive Databases via Schema Routing. Proceedings of the 28th International Conference on Extending Database Technology (EDBT), Barcelona, Spain."},{"key":"ref_74","unstructured":"Ma, C., Chakrabarti, S., Khan, A., and Moln\u00e1r, B. (2025). Knowledge graph-based retrieval-augmented generation for schema matching. arXiv."},{"key":"ref_75","doi-asserted-by":"crossref","unstructured":"Gan, Y., Chen, X., Huang, Q., Purver, M., Woodward, J.R., Xie, J., and Huang, P. (2021). Towards robustness of text-to-SQL models against synonym substitution. arXiv.","DOI":"10.18653\/v1\/2021.acl-long.195"},{"key":"ref_76","unstructured":"Fu, S.D., and Chen, X. (2024). Compound Schema Registry. arXiv."},{"key":"ref_77","first-page":"2533322","article-title":"Monkuu: A LLM-powered natural language interface for geospatial databases with dynamic schema mapping","volume":"38","author":"Yu","year":"2025","journal-title":"Int. J. Geogr. Inf. Sci."},{"key":"ref_78","doi-asserted-by":"crossref","unstructured":"Wang, Y., Liu, P., and Yang, X. (2025). Linkalign: Scalable schema linking for real-world large-scale multi-database text-to-sql. arXiv.","DOI":"10.18653\/v1\/2025.emnlp-main.51"},{"key":"ref_79","doi-asserted-by":"crossref","first-page":"252","DOI":"10.1145\/3749170","article-title":"In-context Clustering-based Entity Resolution with Large Language Models: A Design Space Exploration","volume":"3","author":"Fu","year":"2025","journal-title":"Proc. ACM Manag. Data"},{"key":"ref_80","doi-asserted-by":"crossref","unstructured":"Bouabdelli, L.F., Abdelhedi, F., Hammoudi, S., and Hadjali, A. (2025, January 10\u201312). An Advanced Entity Resolution in Data Lakes: First Steps. Proceedings of the 14th International Conference on Data Science, Technology and Applications\u2014Volume 1: DATA, Bilbao, Spain.","DOI":"10.5220\/0013643200003967"},{"key":"ref_81","doi-asserted-by":"crossref","unstructured":"Saengsiripaiboon, S., Pacharawongsakda, E., and Jitkongchuen, D. (2025, January 1\u20133). Enhancing Efficiency in Entity Resolution Strategies for Batch Prompting. Proceedings of the 2025 6th International Conference on Big Data Analytics and Practices (IBDAP), Chiang Mai, Thailand.","DOI":"10.1109\/IBDAP65587.2025.11145850"},{"key":"ref_82","unstructured":"Wang, T., Chen, X., Lin, H., Chen, X., Han, X., Wang, H., Zeng, Z., and Sun, L. (2024). Match, compare, or select? an investigation of large language models for entity matching. arXiv."},{"key":"ref_83","doi-asserted-by":"crossref","unstructured":"Zhang, J., Fang, J., Zhang, C., Zhang, W., Ren, H., and Xu, L. (2025). Geographic Named Entity Matching and Evaluation Recommendation Using Multi-Objective Tasks: A Study Integrating a Large Language Model (LLM) and Retrieval-Augmented Generation (RAG). ISPRS Int. J. Geo-Inf., 14.","DOI":"10.3390\/ijgi14030095"},{"key":"ref_84","doi-asserted-by":"crossref","unstructured":"Abd El Aziz, R.A., Elzanfaly, D., and Farhan, M.S. (2024, January 1\u20132). Towards Semantic Layer for Enhancing Blocking Entity Resolution Accuracy in Big Data. Proceedings of the 2024 International Conference on Artificial Intelligence, Computer, Data Sciences and Applications (ACDSA), Victoria, Seychelles.","DOI":"10.1109\/ACDSA59508.2024.10467666"},{"key":"ref_85","unstructured":"Xu, L., Zhang, X., Duan, F., Wang, S., Weng, R., Wang, J., and Cai, X. (2025). FIRE: Flexible Integration of Data Quality Ratings for Effective Pre-Training. arXiv."},{"key":"ref_86","unstructured":"Asthana, S., Zhang, B., Mahindru, R., DeLuca, C., Gentile, A.L., and Gopisetty, S. (2025). Deploying Privacy Guardrails for LLMs: A Comparative Analysis of Real-World Applications. arXiv."},{"key":"ref_87","unstructured":"Asthana, S., Mahindru, R., Zhang, B., and Sanz, J. (2025). Adaptive PII Mitigation Framework for Large Language Models. arXiv."},{"key":"ref_88","unstructured":"Zhu, L., Yang, L., Li, C., Hu, S., Liu, L., and Yin, B. (2024). LegiLM: A Fine-Tuned Legal Language Model for Data Compliance. arXiv."},{"key":"ref_89","doi-asserted-by":"crossref","first-page":"4282","DOI":"10.1109\/TSE.2023.3288901","article-title":"NLP-Based Automated Compliance Checking of Data Processing Agreements Against GDPR","volume":"49","author":"Cejas","year":"2023","journal-title":"IEEE Trans. Softw. Eng."},{"key":"ref_90","unstructured":"Cory, T., Rieder, W., Kr\u00e4mer, J., Raschke, P., Herbke, P., and K\u00fcpper, A. (2025). Word-level Annotation of GDPR Transparency Compliance in Privacy Policies using Large Language Models. arXiv."},{"key":"ref_91","unstructured":"Garza, L., Elluri, L., Kotal, A., Piplai, A., Gupta, D., and Joshi, A. (2024). Privcomp-kg: Leveraging knowledge graph and large language models for privacy policy compliance verification. arXiv."},{"key":"ref_92","unstructured":"Berghaus, D., Berger, A., Hillebrand, L., Cvejoski, K., and Sifa, R. (2025). Multi-Modal Vision vs. Text-Based Parsing: Benchmarking LLM Strategies for Invoice Processing. arXiv."},{"key":"ref_93","unstructured":"Biswas, A., and Talukdar, W. (2024). Robustness of structured data extraction from in-plane rotated documents using multi-modal large language models (llm). arXiv."},{"key":"ref_94","unstructured":"Liu, J., Zeng, Y., H\u00f8jmark-Bertelsen, M., Gadeberg, M.N., Wang, H., and Wu, Q. (2024). Memory-Augmented Agent Training for Business Document Understanding. arXiv."},{"key":"ref_95","doi-asserted-by":"crossref","unstructured":"Abdellaif, O.H., Hassan, A.N., and Hamdi, A. (2024, January 13\u201314). Erpa: Efficient rpa model integrating ocr and llms for intelligent document processing. Proceedings of the 2024 International Mobile, Intelligent, and Ubiquitous Computing Conference (MIUCC), Cairo, Egypt.","DOI":"10.1109\/MIUCC62295.2024.10783599"},{"key":"ref_96","doi-asserted-by":"crossref","unstructured":"Soylu, A., Elves\u00e6ter, B., Turk, P., Roman, D., Corcho, O., Simperl, E., Konstantinidis, G., and Lech, T.C. (2019, January 18\u201320). Towards an ontology for public procurement based on the open contracting data standard. Proceedings of the Conference on e-Business, e-Services and e-Society, Trondheim, Norway.","DOI":"10.1007\/978-3-030-29374-1_19"},{"key":"ref_97","first-page":"i","article-title":"An ontology of E-commerce-mapping a relevant corpus of knowledge","volume":"10","author":"Ramaprasad","year":"2015","journal-title":"J. Theor. Appl. Electron. Commer. Res."},{"key":"ref_98","first-page":"417","article-title":"Lake data warehouse architecture for big data solutions","volume":"11","author":"Saddad","year":"2020","journal-title":"Int. J. Adv. Comput. Sci. Appl."},{"key":"ref_99","unstructured":"Mazumdar, D., Hughes, J., and Onofre, J. (2023). The data lakehouse: Data warehousing and more. arXiv."},{"key":"ref_100","first-page":"8783952","article-title":"Audit as You Go: A Smart Contract-Based Outsourced Data Integrity Auditing Scheme for Multiauditor Scenarios with One Person, One Vote","volume":"2022","author":"Li","year":"2022","journal-title":"Secur. Commun. Netw."},{"key":"ref_101","doi-asserted-by":"crossref","unstructured":"Francati, D., Ateniese, G., Faye, A., Milazzo, A.M., Perillo, A.M., Schiatti, L., and Giordano, G. (2021, January 7\u201311). Audita: A blockchain-based auditing framework for off-chain storage. Proceedings of the Ninth International Workshop on Security in Blockchain and Cloud Computing, Virtual Event.","DOI":"10.1145\/3457977.3460293"},{"key":"ref_102","unstructured":"Shi, Z., Bergers, J., Korsmit, K., and Zhao, Z. (2022). AUDITEM: Toward an automated and efficient data integrity verification model using blockchain. arXiv."},{"key":"ref_103","doi-asserted-by":"crossref","unstructured":"Roychowdhury, S., Krema, M., Mahammad, A., Moore, B., Mukherjee, A., and Prakashchandra, P. (2024, January 15\u201318). ERATTA: Extreme RAG for enterprise-Table To Answers with Large Language Models. Proceedings of the 2024 IEEE International Conference on Big Data (BigData), Washington, DC, USA.","DOI":"10.1109\/BigData62323.2024.10825910"},{"key":"ref_104","doi-asserted-by":"crossref","unstructured":"Hamayat, F., Ejaz, L., Danish, M., Nazir, A., Ahadian, P., and Ahmad, R.F. (2025, January 14\u201316). SEEBot: Leveraging Open-Source LLMs and RAG for Secure and Economical Enterprise Chatbots. Proceedings of the 2025 5th International Conference on Artificial Intelligence and Education (ICAIE), Suzhou, China.","DOI":"10.1109\/ICAIE64856.2025.11158323"},{"key":"ref_105","unstructured":"Di Profio, M., Zhong, M., Sripada, Y., and Jaspars, M. (2025). FlowETL: An Autonomous Example-Driven Pipeline for Data Engineering. arXiv."},{"key":"ref_106","doi-asserted-by":"crossref","first-page":"741","DOI":"10.32996\/jcsts.2025.7.3.81","article-title":"Beyond ETL: How AI Agents Are Building Self-Healing Data Pipelines","volume":"7","author":"Chakraborty","year":"2025","journal-title":"J. Comput. Sci. Technol. Stud."},{"key":"ref_107","doi-asserted-by":"crossref","first-page":"151","DOI":"10.1109\/MS.2023.3340256","article-title":"Design patterns for machine learning-based systems with humans in the loop","volume":"41","author":"Andersen","year":"2023","journal-title":"IEEE Softw."},{"key":"ref_108","doi-asserted-by":"crossref","unstructured":"Xin, D., Ma, L., Liu, J., Macke, S., Song, S., and Parameswaran, A. (2018). Helix: Accelerating human-in-the-loop machine learning. arXiv.","DOI":"10.1145\/3209889.3209897"},{"key":"ref_109","doi-asserted-by":"crossref","unstructured":"Pogiatzis, A., and Samakovitis, G. (2020). An event-driven serverless ETL pipeline on AWS. Appl. Sci., 11.","DOI":"10.3390\/app11010191"},{"key":"ref_110","unstructured":"Yin, W., Heinecke, S., Li, J., Keskar, N.S., Jones, M., Shi, S., Georgiev, S., Milich, K., Esposito, J., and Xiong, C. (2021). Combining data-driven supervision with human-in-the-loop feedback for entity resolution. arXiv."},{"key":"ref_111","unstructured":"Wang, J., Guo, B., and Chen, L. (2022). Human-in-the-loop machine learning: A macro-micro perspective. arXiv."},{"key":"ref_112","unstructured":"Wang, Z.J., Choi, D., Xu, S., and Yang, D. (2021). Putting humans in the natural language processing loop: A survey. arXiv."},{"key":"ref_113","doi-asserted-by":"crossref","unstructured":"Hardin, T., and Kotz, D. (2022, January 10\u201313). Amanuensis: Provenance, privacy, and permission in TEE-enabled blockchain data systems. Proceedings of the 2022 IEEE 42nd International Conference on Distributed Computing Systems (ICDCS), Bologna, Italy.","DOI":"10.1109\/ICDCS54860.2022.00023"},{"key":"ref_114","doi-asserted-by":"crossref","unstructured":"Ahmad, A., Saad, M., Bassiouni, M., and Mohaisen, A. (2018, January 5\u20137). Towards blockchain-driven, secure and transparent audit logs. Proceedings of the 15th EAI International Conference on Mobile and Ubiquitous Systems: Computing, Networking and Services, New York, NY, USA.","DOI":"10.1145\/3286978.3286985"},{"key":"ref_115","unstructured":"Amin, M.A., Tummala, H., Mohan, S., and Ray, I. (2023). Healthcare Policy Compliance: A Blockchain Smart Contract-Based Approach. arXiv."},{"key":"ref_116","doi-asserted-by":"crossref","unstructured":"Pattengale, N.D., and Hudson, C.M. (2020). Decentralized genomics audit logging via permissioned blockchain ledgering. BMC Med. Genom., 13.","DOI":"10.1186\/s12920-020-0720-3"},{"key":"ref_117","unstructured":"Lu, Y., and Wang, J. (2025). KARMA: Leveraging Multi-Agent LLMs for Automated Knowledge Graph Enrichment. arXiv."},{"key":"ref_118","unstructured":"Zhang, H., Si, J., Yan, G., Qi, B., Cai, P., Mao, S., Wang, D., and Shi, B. (2025). RAKG: Document-level Retrieval Augmented Knowledge Graph Construction. arXiv."},{"key":"ref_119","first-page":"e1179","article-title":"Critical data for critical care: A primer on leveraging electronic health record data for research from society of critical care medicine\u2019s panel on data sharing and harmonization","volume":"6","author":"Heavner","year":"2024","journal-title":"Crit. Care Explor."},{"key":"ref_120","unstructured":"Santos, A., Pena, E.H., Lopez, R., and Freire, J. (2025). Interactive Data Harmonization with LLM Agents: Opportunities and Challenges. arXiv."},{"key":"ref_121","doi-asserted-by":"crossref","first-page":"103964","DOI":"10.1016\/j.cose.2024.103964","article-title":"From cobit to iso 42001: Evaluating cybersecurity frameworks for opportunities, risks, and regulatory compliance in commercializing large language models","volume":"144","author":"McIntosh","year":"2024","journal-title":"Comput. Secur."},{"key":"ref_122","doi-asserted-by":"crossref","first-page":"1085","DOI":"10.1007\/s43681-023-00289-2","article-title":"Auditing large language models: A three-layered approach","volume":"4","author":"Schuett","year":"2024","journal-title":"AI Ethics"},{"key":"ref_123","doi-asserted-by":"crossref","unstructured":"Jernite, Y., Nguyen, H., Biderman, S., Rogers, A., Masoud, M., Danchev, V., Tan, S., Luccioni, A.S., Subramani, N., and Johnson, I. (2022, January 21\u201324). Data governance in the age of large-scale data-driven language technology. Proceedings of the 2022 ACM Conference on Fairness, Accountability, and Transparency, Seoul, Republic of Korea.","DOI":"10.1145\/3531146.3534637"},{"key":"ref_124","doi-asserted-by":"crossref","first-page":"lsaa065","DOI":"10.1093\/jlb\/lsaa065","article-title":"Policy-aware data lakes: A flexible approach to achieve legal interoperability for global research collaborations","volume":"7","author":"Thorogood","year":"2020","journal-title":"J. Law Biosci."},{"key":"ref_125","first-page":"216","article-title":"Bridging the global divide in AI regulation: A proposal for a contextual, coherent, and commensurable framework","volume":"33","author":"Park","year":"2023","journal-title":"Wash. Int. Law J."},{"key":"ref_126","unstructured":"Akbarfam, A.J., and Maleki, H. (2024). SOK: Blockchain for Provenance. arXiv."},{"key":"ref_127","doi-asserted-by":"crossref","unstructured":"Ozdayi, M.S., Kantarcioglu, M., and Malin, B. (2020). Leveraging blockchain for immutable logging and querying across multiple sites. BMC Med Genom., 13.","DOI":"10.1186\/s12920-020-0721-2"},{"key":"ref_128","doi-asserted-by":"crossref","unstructured":"Akbarfam, A.J., Heidaripour, M., Maleki, H., Dorai, G., and Agrawal, G. (2023, January 1\u20133). Forensiblock: A provenance-driven blockchain framework for data forensics and auditability. Proceedings of the 2023 5th IEEE International Conference on Trust, Privacy and Security in Intelligent Systems and Applications (TPS-ISA), Atlanta, GA, USA.","DOI":"10.1109\/TPS-ISA58951.2023.00025"},{"key":"ref_129","doi-asserted-by":"crossref","unstructured":"Shi, J., Firmansyah, E.A., Wang, Y., and Xu, W. (2025). Technological innovation and regulatory harmonization in Islamic finance: A systematic review and machine learning analysis (2000\u20132023). J. Islam. Account. Bus. Res.","DOI":"10.1108\/JIABR-01-2025-0026"},{"key":"ref_130","doi-asserted-by":"crossref","unstructured":"Boldt Sousa, T. (2022, January 6\u201310). Customer Data Platforms: A Pattern Language for Digital Marketing Optimization with First-Party Data. Proceedings of the 27th European Conference on Pattern Languages of Programs, Irsee, Germany.","DOI":"10.1145\/3551902.3551984"},{"key":"ref_131","doi-asserted-by":"crossref","first-page":"837","DOI":"10.32996\/jcsts.2025.7.8.98","article-title":"AI-Augmented Customer Data Platforms: Engineering for Scale, Speed, and Compliance","volume":"7","author":"Shivampeta","year":"2025","journal-title":"J. Comput. Sci. Technol. Stud."},{"key":"ref_132","doi-asserted-by":"crossref","unstructured":"Wen, Y., Li, W., Luo, J., Xiao, J., Jia, Y., and Wang, Z. (2024, January 27\u201329). Multi-domain Data Association Analysis: Research on Precise Customer Classification Based on LLM and GMM Models. Proceedings of the 4th Asia-Pacific Artificial Intelligence and Big Data Forum, Ganzhou, China.","DOI":"10.1145\/3718491.3718640"},{"key":"ref_133","doi-asserted-by":"crossref","unstructured":"Kasuga, A., and Yonetani, R. (2024, January 21\u201325). Cxsimulator: A user behavior simulation using llm embeddings for web-marketing campaign assessment. Proceedings of the 33rd ACM International Conference on Information and Knowledge Management, Boise, ID, USA.","DOI":"10.1145\/3627673.3679894"},{"key":"ref_134","doi-asserted-by":"crossref","unstructured":"Tan, Z., Zeng, Q., Tian, Y., Liu, Z., Yin, B., and Jiang, M. (2024). Democratizing large language models via personalized parameter-efficient fine-tuning. arXiv.","DOI":"10.18653\/v1\/2024.emnlp-main.372"},{"key":"ref_135","unstructured":"Zeldes, Y., Zait, A., Labzovsky, I., Karmon, D., and Farkash, E. (2025). ComMer: A Framework for Compressing and Merging User Data for Personalization. arXiv."},{"key":"ref_136","unstructured":"Embar, V., Shrivastava, R., Damodaran, V., Mehlinger, T., Hsiao, Y.C., and Raghunathan, K. (2025). LLM-Based Insight Extraction for Contact Center Analytics and Cost-Efficient Deployment. arXiv."},{"key":"ref_137","unstructured":"Gan, C., Yang, D., Hu, B., Liu, Z., Shen, Y., Zhang, Z., Gu, J., Zhou, J., and Zhang, G. (2023). Making large language models better knowledge miners for online marketing with progressive prompting augmentation. arXiv."},{"key":"ref_138","doi-asserted-by":"crossref","unstructured":"Jena, P.K., Dash, A.K., Maharana, D., and Palai, C. (2023, January 7\u20138). A Novel Invoice Automation System. Proceedings of the 2023 IEEE 5th International Conference on Cybernetics, Cognition and Machine Learning Applications (ICCCMLA), Hamburg, Germany.","DOI":"10.1109\/ICCCMLA58983.2023.10346749"},{"key":"ref_139","unstructured":"Akdo\u011fan, A., and Kurt, M. (2024). Exttnet: A deep learning algorithm for extracting table texts from invoice images. arXiv."},{"key":"ref_140","unstructured":"Aguda, T., Siddagangappa, S., Kochkina, E., Kaur, S., Wang, D., Smiley, C., and Shah, S. (2024). Large language models as financial data annotators: A study on effectiveness and efficiency. arXiv."},{"key":"ref_141","doi-asserted-by":"crossref","first-page":"100715","DOI":"10.1016\/j.accinf.2024.100715","article-title":"A scoping review of ChatGPT research in accounting and finance","volume":"55","author":"Dong","year":"2024","journal-title":"Int. J. Account. Inf. Syst."},{"key":"ref_142","unstructured":"Kim, S., Song, H., Seo, H., and Kim, H. (2025). Optimizing retrieval strategies for financial question answering documents in retrieval-augmented generation systems. arXiv."},{"key":"ref_143","doi-asserted-by":"crossref","unstructured":"Yue, C., Xu, X., Ma, X., Du, L., Ding, Z., Han, S., Zhang, D., and Zhang, Q. (2025, January 6\u201311). Extract Information from Hybrid Long Documents Leveraging LLMs: A Framework and Dataset. Proceedings of the ICASSP 2025\u20142025 IEEE International Conference on Acoustics, Speech and Signal Processing (ICASSP), Hyderabad, India.","DOI":"10.1109\/ICASSP49660.2025.10889045"},{"key":"ref_144","doi-asserted-by":"crossref","unstructured":"Rajpoot, P.K., and Parikh, A. (2023). Nearest Neighbor Search over Vectorized Lexico-Syntactic Patterns for Relation Extraction from Financial Documents. arXiv.","DOI":"10.18653\/v1\/2023.pandl-1.1"},{"key":"ref_145","doi-asserted-by":"crossref","first-page":"617","DOI":"10.3390\/make2040033","article-title":"Automatic electronic invoice classification using machine learning models","volume":"2","author":"Bardelli","year":"2020","journal-title":"Mach. Learn. Knowl. Extr."},{"key":"ref_146","unstructured":"Sarmah, B., Li, M., Lyu, J., Frank, S., Castellanos, N., Pasquali, S., and Mehta, D. (2024). How to choose a threshold for an evaluation metric for large language models. arXiv."},{"key":"ref_147","unstructured":"Cardei, M.A., Lamp, J., Derdzinski, M., and Bhatia, K. (2025). DexBench: Benchmarking LLMs for Personalized Decision Making in Diabetes Management. arXiv."},{"key":"ref_148","unstructured":"Kumar, A., and Lakkaraju, H. (2024). Manipulating large language models to increase product visibility. arXiv."},{"key":"ref_149","doi-asserted-by":"crossref","unstructured":"Zhu, J., Bazaz, S.A., Dutta, S., Anuraag, B., Haider, I., and Bandopadhyay, S. (2023, January 25\u201326). Talk to your data: Enhancing Business Intelligence and Inventory Management with LLM-Driven Semantic Parsing and Text-to-SQL for Database Querying. Proceedings of the 2023 4th International Conference on Data Analytics for Business and Industry (ICDABI), Bahrain.","DOI":"10.1109\/ICDABI60145.2023.10629374"},{"key":"ref_150","doi-asserted-by":"crossref","first-page":"1307","DOI":"10.30574\/wjarr.2023.18.1.0721","article-title":"Cross-sector AI framework for risk detection in national security, energy and financial networks","volume":"18","author":"Bamigbade","year":"2023","journal-title":"World J. Adv. Res. Rev."},{"key":"ref_151","doi-asserted-by":"crossref","unstructured":"Palaniappan, S., Mali, R., Cuomo, L., Vitale, M., Youssef, A., Madathil, A.P., Murugesan, M., Bettini, A., De Magistris, G., and Veneri, G. (2024, January 4\u20137). Enhancing Enterprise-Wide Information Retrieval through RAG Systems Techniques, Evaluation, and Scalable Deployment. Proceedings of the Abu Dhabi International Petroleum Exhibition and Conference (ADIPEC), Abu Dhabi, United Arab Emirates.","DOI":"10.2118\/222032-MS"},{"key":"ref_152","doi-asserted-by":"crossref","unstructured":"Purwar, A., and Balakrishnan, G. (2024). Evaluating the efficacy of open-source llms in enterprise-specific rag systems: A comparative study of performance and scalability. arXiv.","DOI":"10.1109\/INDICON63790.2024.10958508"},{"key":"ref_153","doi-asserted-by":"crossref","unstructured":"Sserunjogi, R., Ogenrwot, D., Niwamanya, N., Nsimbe, N., Bbaale, M., Ssempala, B., Mutabazi, N., Wabinyai, R.F., Okure, D., and Bainomugisha, E. (2025, January 20\u201321). Design and Evaluation of a Scalable Data Pipeline for AI-Driven Air Quality Monitoring in Low-Resource Settings. Proceedings of the International Conference on Software Engineering and Data Engineering, New Orleans, LA, USA.","DOI":"10.1007\/978-3-032-08649-5_14"},{"key":"ref_154","unstructured":"Rucco, C., Longo, A., and Saad, M. (2025). Efficient Data Ingestion in Cloud-based architecture: A Data Engineering Design Pattern Proposal. arXiv."},{"key":"ref_155","unstructured":"Wang, X., and Carey, M.J. (2019). An IDEA: An ingestion framework for data enrichment in AsterixDB. arXiv."},{"key":"ref_156","doi-asserted-by":"crossref","unstructured":"Singh, R., V, A., Mishra, S., and Singh, S.K. (2025, January 6\u201310). Streamlined Data Pipeline for Real-Time Threat Detection and Model Inference. Proceedings of the 2025 17th International Conference on COMmunication Systems and NETworks (COMSNETS), Bengaluru, India.","DOI":"10.1109\/COMSNETS63942.2025.10885573"},{"key":"ref_157","unstructured":"Jakubik, J., Weber, D., Hemmer, P., V\u00f6ssing, M., and Satzger, G. (2023, January 18\u201321). Improving the efficiency of human-in-the-loop systems: Adding artificial to human experts. Proceedings of the International Conference on Wirtschaftsinformatik, Paderborn, Germany."},{"key":"ref_158","doi-asserted-by":"crossref","unstructured":"Liao, Y.C., Streli, P., Li, Z., Gebhardt, C., and Holz, C. (May, January 26). Continual Human-in-the-Loop Optimization. Proceedings of the 2025 CHI Conference on Human Factors in Computing Systems, Yokohama, Japan.","DOI":"10.1145\/3706598.3713603"},{"key":"ref_159","doi-asserted-by":"crossref","unstructured":"Amaral, O., Abualhaija, S., and Briand, L. (2023, January 4\u20138). ML-Based Compliance Verification of Data Processing Agreements against GDPR. Proceedings of the 2023 IEEE 31st International Requirements Engineering Conference (RE), Hannover, Germany.","DOI":"10.1109\/RE57278.2023.00015"},{"key":"ref_160","doi-asserted-by":"crossref","unstructured":"Hassani, S., Sabetzadeh, M., Amyot, D., and Liao, J. (2024, January 24\u201328). Rethinking legal compliance automation: Opportunities with large language models. Proceedings of the 2024 IEEE 32nd International Requirements Engineering Conference (RE), Reykjavik, Iceland.","DOI":"10.1109\/RE59067.2024.00051"},{"key":"ref_161","doi-asserted-by":"crossref","unstructured":"Wang, M., Zhang, D.J., and Zhang, H. (2024). Large language models for market research: A data-augmentation approach. arXiv.","DOI":"10.2139\/ssrn.5057769"},{"key":"ref_162","unstructured":"Cao, H., Gu, H., and Guo, X. (2023). Feasibility of transfer learning: A mathematical framework. arXiv."},{"key":"ref_163","doi-asserted-by":"crossref","unstructured":"Witkowski, A., and Wodecki, A. (2024, January 5\u20136). A cross-disciplinary knowledge management framework for generative artificial intelligence in product management: A case study from the manufacturing sector. Proceedings of the European Conference on Knowledge Management, Veszprem, Hungary.","DOI":"10.34190\/eckm.25.1.2605"},{"key":"ref_164","doi-asserted-by":"crossref","unstructured":"Jeong, C. (2023). A study on the implementation of generative ai services using an enterprise data-based llm application architecture. arXiv.","DOI":"10.54364\/AAIML.2023.1191"},{"key":"ref_165","unstructured":"Agarwal, A., Chan, A., Chandel, S., Jang, J., Miller, S., Moghaddam, R.Z., Mohylevskyy, Y., Sundaresan, N., and Tufano, M. (2024). Copilot evaluation harness: Evaluating llm-guided software programming. arXiv."},{"key":"ref_166","unstructured":"Szymanski, A., Gebreegziabher, S.A., Anuyah, O., Metoyer, R.A., and Li, T.J.J. (2024). Comparing criteria development across domain experts, lay users, and models in large language model evaluation. arXiv."},{"key":"ref_167","doi-asserted-by":"crossref","first-page":"e081554","DOI":"10.1136\/bmj-2024-081554","article-title":"FUTURE-AI: International consensus guideline for trustworthy and deployable artificial intelligence in healthcare","volume":"388","author":"Lekadir","year":"2025","journal-title":"BMJ"},{"key":"ref_168","unstructured":"Tjondronegoro, D. (2025). TOAST framework: A multidimensional approach to ethical and sustainable ai integration in organizations. arXiv."},{"key":"ref_169","doi-asserted-by":"crossref","unstructured":"Bouchard, D., Chauhan, M.S., Skarbrevik, D., Bajaj, V., and Ahmad, Z. (2025). LangFair: A Python Package for Assessing Bias and Fairness in Large Language Model Use Cases. arXiv.","DOI":"10.21105\/joss.07570"},{"key":"ref_170","unstructured":"Brown, N.B. (2024). Enhancing trust in llms: Algorithms for comparing and interpreting llms. arXiv."},{"key":"ref_171","unstructured":"Zhong, H., Do, T., Jie, Y., Neuwirth, R.J., and Shen, H. (2025). Global AI Governance: Where the Challenge is the Solution-An Interdisciplinary, Multilateral, and Vertically Coordinated Approach. arXiv."},{"key":"ref_172","unstructured":"Natarajan, S., Mathur, S., Sidheekh, S., Stammer, W., and Kersting, K. (March, January 25). Human-in-the-loop or AI-in-the-loop? Automate or Collaborate?. Proceedings of the AAAI Conference on Artificial Intelligence, Philadelphia, PA, USA."},{"key":"ref_173","doi-asserted-by":"crossref","unstructured":"Wu, J., and He, J. (2024). Trustworthy Transfer Learning: A Survey. arXiv.","DOI":"10.1613\/jair.1.17602"},{"key":"ref_174","doi-asserted-by":"crossref","first-page":"143","DOI":"10.1038\/s41746-022-00690-x","article-title":"Moving towards vertically integrated artificial intelligence development","volume":"5","author":"Zhang","year":"2022","journal-title":"NPJ Digit. Med."}],"container-title":["Algorithms"],"original-title":[],"language":"en","link":[{"URL":"https:\/\/www.mdpi.com\/1999-4893\/18\/12\/791\/pdf","content-type":"unspecified","content-version":"vor","intended-application":"similarity-checking"}],"deposited":{"date-parts":[[2025,12,16]],"date-time":"2025-12-16T08:58:48Z","timestamp":1765875528000},"score":1,"resource":{"primary":{"URL":"https:\/\/www.mdpi.com\/1999-4893\/18\/12\/791"}},"subtitle":[],"short-title":[],"issued":{"date-parts":[[2025,12,15]]},"references-count":174,"journal-issue":{"issue":"12","published-online":{"date-parts":[[2025,12]]}},"alternative-id":["a18120791"],"URL":"https:\/\/doi.org\/10.3390\/a18120791","relation":{},"ISSN":["1999-4893"],"issn-type":[{"value":"1999-4893","type":"electronic"}],"subject":[],"published":{"date-parts":[[2025,12,15]]}}}