{"status":"ok","message-type":"work","message-version":"1.0.0","message":{"indexed":{"date-parts":[[2026,3,23]],"date-time":"2026-03-23T11:09:51Z","timestamp":1774264191919,"version":"3.50.1"},"reference-count":31,"publisher":"Springer Science and Business Media LLC","issue":"1","license":[{"start":{"date-parts":[[2026,1,28]],"date-time":"2026-01-28T00:00:00Z","timestamp":1769558400000},"content-version":"tdm","delay-in-days":0,"URL":"https:\/\/creativecommons.org\/licenses\/by\/4.0"},{"start":{"date-parts":[[2026,1,28]],"date-time":"2026-01-28T00:00:00Z","timestamp":1769558400000},"content-version":"vor","delay-in-days":0,"URL":"https:\/\/creativecommons.org\/licenses\/by\/4.0"}],"funder":[{"name":"National University Ireland, Galway"}],"content-domain":{"domain":["link.springer.com"],"crossmark-restriction":false},"short-container-title":["AI Ethics"],"published-print":{"date-parts":[[2026,2]]},"abstract":"<jats:title>Abstract<\/jats:title>\n                  <jats:p>\n                    Identifying hidden biases in AI documentation metadata (model, data, and dataspace cards) is essential for responsible AI; yet this domain remains largely unexplored. The proposed work evaluates four Transformer models (XLNet, DistilBERT, RoBERTa, and ELECTRA) for bias detection across publicly available, synthetic, and custom datasets. On the BABE news corpus, all models achieved 77\u201380% accuracy, with only ELECTRA exceeding 80% on every metric. To address the absence of publicly available AI-card datasets, we generated synthetic metadata for two use cases (\n                    <jats:italic>Customer Interaction and Customer Data Uploaded by Organisations<\/jats:italic>\n                    ) using ChatGPT. Models trained on this synthetic corpus displayed near-perfect scores, reflecting shared stylistic cues embedded in the generated text. To test real-world robustness, we curated a Hugging Face dataset by scraping documentation comments, filtering for bias-related keywords, and obtaining annotations from four independent labellers in a single-blind setting. Partial fine-tuning (zero-shot) evaluations of models trained only on BABE or synthetic data revealed substantial performance drops on this real-world set. To mitigate this cross-domain loss, we introduce a cascaded, full fine-tuning (few-shot) pipeline in which Transformer models are sequentially fine-tuned on BABE, synthetic text, and a subset of the Hugging Face corpus. Evaluation on the remaining portion achieved over 85% across all performance metrics, enhancing precision and generalisation. This study demonstrates the challenges of bias detection beyond controlled or synthetic data and highlights cascaded fine-tuning as a practical, low-resource strategy. Future directions include leveraging evidence fusion methods, integrating cross-attention with bias taxonomies, and adopting dual-encoder architectures to advance bias detection toward more in-depth, knowledge-guided reasoning.\n                  <\/jats:p>","DOI":"10.1007\/s43681-025-00975-3","type":"journal-article","created":{"date-parts":[[2026,1,28]],"date-time":"2026-01-28T11:32:55Z","timestamp":1769599975000},"update-policy":"https:\/\/doi.org\/10.1007\/springer_crossmark_policy","source":"Crossref","is-referenced-by-count":0,"title":["Investigating transformer models for textual bias detection in model, data, and dataspace cards"],"prefix":"10.1007","volume":"6","author":[{"given":"Andy","family":"Donald","sequence":"first","affiliation":[],"role":[{"role":"author","vocabulary":"crossref"}]},{"given":"Apostolos","family":"Galanopoulos","sequence":"additional","affiliation":[],"role":[{"role":"author","vocabulary":"crossref"}]},{"given":"Atul Kumar","family":"Ojha","sequence":"additional","affiliation":[],"role":[{"role":"author","vocabulary":"crossref"}]},{"given":"Edward","family":"Curry","sequence":"additional","affiliation":[],"role":[{"role":"author","vocabulary":"crossref"}]},{"given":"Emir","family":"Mu\u00f1oz","sequence":"additional","affiliation":[],"role":[{"role":"author","vocabulary":"crossref"}]},{"given":"Ihsan","family":"Ullah","sequence":"additional","affiliation":[],"role":[{"role":"author","vocabulary":"crossref"}]},{"given":"John P.","family":"McCrae","sequence":"additional","affiliation":[],"role":[{"role":"author","vocabulary":"crossref"}]},{"given":"Manan","family":"Kalra","sequence":"additional","affiliation":[],"role":[{"role":"author","vocabulary":"crossref"}]},{"given":"Sagar","family":"Saxena","sequence":"additional","affiliation":[],"role":[{"role":"author","vocabulary":"crossref"}]},{"given":"Talha","family":"Iqbal","sequence":"additional","affiliation":[],"role":[{"role":"author","vocabulary":"crossref"}]}],"member":"297","published-online":{"date-parts":[[2026,1,28]]},"reference":[{"key":"975_CR1","doi-asserted-by":"publisher","DOI":"10.1016\/j.eswa.2023.121542","volume":"237","author":"S Raza","year":"2024","unstructured":"Raza, S., Garg, M., Reji, D.J., Bashir, S.R., Ding, C.: Nbias: a natural language processing framework for bias identification in text. Expert Syst. Appl. 237, 121542 (2024)","journal-title":"Expert Syst. Appl."},{"issue":"11","key":"975_CR2","doi-asserted-by":"publisher","DOI":"10.1016\/j.patter.2024.101074","volume":"5","author":"H Cui","year":"2024","unstructured":"Cui, H., Yasseri, T.: Ai-enhanced collective intelligence. Patterns 5(11), 101074 (2024)","journal-title":"Patterns"},{"key":"975_CR3","doi-asserted-by":"publisher","first-page":"1","DOI":"10.1007\/s10462-025-11162-5","volume":"58","author":"NM Gardazi","year":"2025","unstructured":"Gardazi, N.M., et al.: Bert applications in natural language processing: a review. Artif. Intell. Rev. 58, 1\u201349 (2025)","journal-title":"Artif. Intell. Rev."},{"key":"975_CR4","doi-asserted-by":"publisher","first-page":"110194","DOI":"10.1109\/ACCESS.2025.3572211","volume":"13","author":"A Donald","year":"2025","unstructured":"Donald, A., et al.: A semantic approach for linked model, data, and dataspace cards. IEEE Access 13, 110194\u2013110207 (2025)","journal-title":"IEEE Access"},{"key":"975_CR5","doi-asserted-by":"crossref","unstructured":"Mitchell, M., et\u00a0al.: Model cards for model reporting. In: Proceedings of the Conference on Fairness, Accountability, and Transparency, pp. 220\u2013229 (2019)","DOI":"10.1145\/3287560.3287596"},{"key":"975_CR6","doi-asserted-by":"publisher","first-page":"3483","DOI":"10.3390\/app14083483","volume":"14","author":"RVK Bevara","year":"2024","unstructured":"Bevara, R.V.K., Mannuru, N.R., Karedla, S.P., Xiao, T.: Scaling implicit bias analysis across transformer-based language models through embedding association test and prompt engineering. Appl. Sci. 14, 3483 (2024)","journal-title":"Appl. Sci."},{"key":"975_CR7","doi-asserted-by":"publisher","first-page":"39","DOI":"10.1007\/s41060-022-00359-4","volume":"17","author":"S Raza","year":"2024","unstructured":"Raza, S., Reji, D.J., Ding, C.: Dbias: detecting biases and ensuring fairness in news articles. Int. J. Data Sci. Anal. 17, 39\u201359 (2024)","journal-title":"Int. J. Data Sci. Anal."},{"key":"975_CR8","doi-asserted-by":"publisher","first-page":"11804","DOI":"10.1007\/s10489-024-05747-w","volume":"54","author":"PV Dantas","year":"2024","unstructured":"Dantas, P.V., da Silva Jr, W.S., Cordeiro, L.C., Carvalho, C.B.: A comprehensive review of model compression techniques in machine learning. Appl. Intell. 54, 11804\u201311844 (2024)","journal-title":"Appl. Intell."},{"key":"975_CR9","doi-asserted-by":"publisher","first-page":"1097","DOI":"10.1162\/coli_a_00524","volume":"50","author":"IO Gallegos","year":"2024","unstructured":"Gallegos, I.O., et al.: Bias and fairness in large language models: a survey. Comput. Linguist. 50, 1097\u20131179 (2024)","journal-title":"Comput. Linguist."},{"key":"975_CR10","doi-asserted-by":"crossref","unstructured":"Cortiz, D.: Exploring transformers in emotion recognition: a comparison of bert, distillbert, roberta, xlnet and electra. arXiv preprint arXiv:2104.02041 (2021)","DOI":"10.1145\/3562007.3562051"},{"key":"975_CR11","unstructured":"Clark, K., Luong, M.-T., Le, Q.V. Manning, C.D.: Electra: pre-training text encoders as discriminators rather than generators. arXiv preprint arXiv:2003.10555 (2020)"},{"key":"975_CR12","doi-asserted-by":"crossref","unstructured":"Yang, Y., Duan, H., Abbasi, A., Lalor, J.P., Tam, K.Y.: Bias a-head? Analyzing bias in transformer-based language model attention heads, pp. 276\u2013290 (2025)","DOI":"10.18653\/v1\/2025.trustnlp-main.18"},{"key":"975_CR13","doi-asserted-by":"publisher","first-page":"3509","DOI":"10.3390\/electronics13173509","volume":"13","author":"M Goyal","year":"2024","unstructured":"Goyal, M., Mahmoud, Q.H.: A systematic review of synthetic data generation techniques using generative AI. Electronics 13, 3509 (2024)","journal-title":"Electronics"},{"key":"975_CR14","doi-asserted-by":"publisher","first-page":"384","DOI":"10.1016\/j.future.2024.02.023","volume":"155","author":"R Gonz\u00e1lez-Sendino","year":"2024","unstructured":"Gonz\u00e1lez-Sendino, R., Serrano, E., Bajo, J.: Mitigating bias in artificial intelligence: fair data generation via causal models for transparent and explainable decision-making. Futur. Gener. Comput. Syst. 155, 384\u2013401 (2024)","journal-title":"Futur. Gener. Comput. Syst."},{"key":"975_CR15","doi-asserted-by":"crossref","unstructured":"Long, L., et\u00a0al.: On llms-driven synthetic data generation, curation, and evaluation: a survey (2024). arXiv preprint arXiv:2406.15126","DOI":"10.18653\/v1\/2024.findings-acl.658"},{"key":"975_CR16","doi-asserted-by":"crossref","unstructured":"Splieth\u00f6ver, M., et\u00a0al.: Adaptive prompting: Ad-hoc prompt composition for social bias detection (2025). arXiv preprint arXiv:2502.06487","DOI":"10.18653\/v1\/2025.naacl-long.122"},{"issue":"2","key":"975_CR17","doi-asserted-by":"publisher","DOI":"10.1016\/j.im.2025.104103","volume":"62","author":"X Wei","year":"2025","unstructured":"Wei, X., Kumar, N., Zhang, H.: Addressing bias in generative ai: challenges and research opportunities in information management. Inf. Manag. 62(2), 104103 (2025)","journal-title":"Inf. Manag."},{"key":"975_CR18","unstructured":"Chen, B., Zhang, Z., Langren\u00e9, N., Zhu, S.: Unleashing the potential of prompt engineering in large language models: a comprehensive review. Patterns (2023)"},{"key":"975_CR19","doi-asserted-by":"publisher","first-page":"243","DOI":"10.1007\/s10462-024-10896-y","volume":"57","author":"Z Lin","year":"2024","unstructured":"Lin, Z., et al.: Towards trustworthy llms: a review on debiasing and dehallucinating in large language models. Artif. Intell. Rev. 57, 243 (2024)","journal-title":"Artif. Intell. Rev."},{"key":"975_CR20","unstructured":"Li, M., et\u00a0al.: Understanding and mitigating the bias inheritance in llm-based data augmentation on downstream tasks (2025). arXiv preprint arXiv:2502.04419"},{"key":"975_CR21","unstructured":"Yang, Z., et al.: Xlnet: generalized autoregressive pretraining for language understanding. Adv. Neural Inf. Process. Syst. 32 (2019)"},{"key":"975_CR22","doi-asserted-by":"publisher","DOI":"10.1111\/exsy.13701","volume":"41","author":"A Areshey","year":"2024","unstructured":"Areshey, A., Mathkour, H.: Exploring transformer models for sentiment classification: a comparison of bert, roberta, albert, distilbert, and xlnet. Expert. Syst. 41, e13701 (2024)","journal-title":"Expert. Syst."},{"issue":"3","key":"975_CR23","doi-asserted-by":"publisher","first-page":"1","DOI":"10.1007\/s13198-025-02713-8","volume":"16","author":"A Chauhan","year":"2025","unstructured":"Chauhan, A., Mohana, R.: Combining transfer and ensemble learning models for image and text aspect-based sentiment analysis. Int. J. Syst. Assur. Eng. Manag. 16(3), 1\u201319 (2025)","journal-title":"Int. J. Syst. Assur. Eng. Manag."},{"key":"975_CR24","unstructured":"Liu, Y., et\u00a0al.: Roberta: a robustly optimized bert pretraining approach (2019). arXiv preprint arXiv:1907.11692"},{"key":"975_CR25","unstructured":"Sanh, V., Debut, L., Chaumond, J., Wolf, T.: Distilbert, a distilled version of bert: smaller, faster, cheaper and lighter (2019). arXiv preprint arXiv:1910.01108"},{"key":"975_CR26","unstructured":"Multimodal Tokenizer. [Available Online] https:\/\/huggingface.co\/docs\/transformers\/main_classes\/tokenizer. Accessed 22 May 2025"},{"key":"975_CR27","unstructured":"Transformers. [Available Online] https:\/\/huggingface.co\/docs\/transformers\/en\/index. Accessed 22 May 2025"},{"key":"975_CR28","unstructured":"Babe - media bias dataset: Annotations by experts. [Available Online] https:\/\/www.kaggle.com\/datasets\/timospinde\/babe-media-bias-annotations-by-experts. Accessed 29 May 2024"},{"key":"975_CR29","doi-asserted-by":"crossref","unstructured":"Spinde, T., et\u00a0al.: Neural media bias detection using distant supervision with babe\u2013bias annotations by experts (2022). arXiv preprint arXiv:2209.14557","DOI":"10.18653\/v1\/2021.findings-emnlp.101"},{"key":"975_CR30","unstructured":"Nagar, K.L., et\u00a0al.: Bias detection using textual representation of multimedia contents 408\u2013416 (2023)"},{"key":"975_CR31","doi-asserted-by":"publisher","DOI":"10.1371\/journal.pone.0301276","volume":"19","author":"S Steinert","year":"2024","unstructured":"Steinert, S., et al.: A refined approach for evaluating small datasets via binary classification using machine learning. PLoS ONE 19, e0301276 (2024)","journal-title":"PLoS ONE"}],"container-title":["AI and Ethics"],"original-title":[],"language":"en","link":[{"URL":"https:\/\/link.springer.com\/content\/pdf\/10.1007\/s43681-025-00975-3.pdf","content-type":"application\/pdf","content-version":"vor","intended-application":"text-mining"},{"URL":"https:\/\/link.springer.com\/article\/10.1007\/s43681-025-00975-3","content-type":"text\/html","content-version":"vor","intended-application":"text-mining"},{"URL":"https:\/\/link.springer.com\/content\/pdf\/10.1007\/s43681-025-00975-3.pdf","content-type":"application\/pdf","content-version":"vor","intended-application":"similarity-checking"}],"deposited":{"date-parts":[[2026,3,23]],"date-time":"2026-03-23T10:26:37Z","timestamp":1774261597000},"score":1,"resource":{"primary":{"URL":"https:\/\/link.springer.com\/10.1007\/s43681-025-00975-3"}},"subtitle":[],"short-title":[],"issued":{"date-parts":[[2026,1,28]]},"references-count":31,"journal-issue":{"issue":"1","published-print":{"date-parts":[[2026,2]]}},"alternative-id":["975"],"URL":"https:\/\/doi.org\/10.1007\/s43681-025-00975-3","relation":{},"ISSN":["2730-5953","2730-5961"],"issn-type":[{"value":"2730-5953","type":"print"},{"value":"2730-5961","type":"electronic"}],"subject":[],"published":{"date-parts":[[2026,1,28]]},"assertion":[{"value":"4 September 2025","order":1,"name":"received","label":"Received","group":{"name":"ArticleHistory","label":"Article History"}},{"value":"26 December 2025","order":2,"name":"accepted","label":"Accepted","group":{"name":"ArticleHistory","label":"Article History"}},{"value":"28 January 2026","order":3,"name":"first_online","label":"First Online","group":{"name":"ArticleHistory","label":"Article History"}},{"order":1,"name":"Ethics","group":{"name":"EthicsHeading","label":"Declarations"}},{"value":"This study used only publicly available datasets (BABE corpus, Hugging Face documentation comments) and synthetically generated text, with no personal or sensitive information involved. Human annotation was carried out by independent labellers with informed consent, and no identifying data were collected. The research adheres to GDPR requirements and the EU Ethics Guidelines for Trustworthy AI.","order":2,"name":"Ethics","group":{"name":"EthicsHeading","label":"Ethical approval"}},{"value":"The authors declare no conflict of interest.","order":3,"name":"Ethics","group":{"name":"EthicsHeading","label":"Conflict of interest"}}],"article-number":"118"}}