{"status":"ok","message-type":"work","message-version":"1.0.0","message":{"indexed":{"date-parts":[[2026,6,12]],"date-time":"2026-06-12T23:23:30Z","timestamp":1781306610409,"version":"3.54.1"},"reference-count":43,"publisher":"Springer Science and Business Media LLC","issue":"4","license":[{"start":{"date-parts":[[2024,3,5]],"date-time":"2024-03-05T00:00:00Z","timestamp":1709596800000},"content-version":"tdm","delay-in-days":0,"URL":"https:\/\/creativecommons.org\/licenses\/by\/4.0"},{"start":{"date-parts":[[2024,3,5]],"date-time":"2024-03-05T00:00:00Z","timestamp":1709596800000},"content-version":"vor","delay-in-days":0,"URL":"https:\/\/creativecommons.org\/licenses\/by\/4.0"}],"funder":[{"name":"Karlsruher Institut f\u00fcr Technologie (KIT)"}],"content-domain":{"domain":["link.springer.com"],"crossmark-restriction":false},"short-container-title":["Bus Inf Syst Eng"],"published-print":{"date-parts":[[2024,8]]},"abstract":"<jats:title>Abstract<\/jats:title><jats:p>Data-centric artificial intelligence (data-centric AI) represents an emerging paradigm that emphasizes the importance of enhancing data systematically and at scale to build effective and efficient AI-based systems. The novel paradigm complements recent model-centric AI, which focuses on improving the performance of AI-based systems based on changes in the model using a fixed set of data. The objective of this article is to introduce practitioners and researchers from the field of Business and Information Systems Engineering (BISE) to data-centric AI. The paper defines relevant terms, provides key characteristics to contrast the paradigm of data-centric AI with the model-centric one, and introduces a framework to illustrate the different dimensions of data-centric AI. In addition, an overview of available tools for data-centric AI is presented and this novel paradigm is differentiated from related concepts. 
Finally, the paper discusses the longer-term implications of data-centric AI for the BISE community.<\/jats:p>","DOI":"10.1007\/s12599-024-00857-8","type":"journal-article","created":{"date-parts":[[2024,3,5]],"date-time":"2024-03-05T09:02:38Z","timestamp":1709629358000},"page":"507-515","update-policy":"https:\/\/doi.org\/10.1007\/springer_crossmark_policy","source":"Crossref","is-referenced-by-count":72,"title":["Data-Centric Artificial Intelligence"],"prefix":"10.1007","volume":"66","author":[{"given":"Johannes","family":"Jakubik","sequence":"first","affiliation":[],"role":[{"vocabulary":"crossref","role":"author"}]},{"given":"Michael","family":"V\u00f6ssing","sequence":"additional","affiliation":[],"role":[{"vocabulary":"crossref","role":"author"}]},{"given":"Niklas","family":"K\u00fchl","sequence":"additional","affiliation":[],"role":[{"vocabulary":"crossref","role":"author"}]},{"given":"Jannis","family":"Walk","sequence":"additional","affiliation":[],"role":[{"vocabulary":"crossref","role":"author"}]},{"given":"Gerhard","family":"Satzger","sequence":"additional","affiliation":[],"role":[{"vocabulary":"crossref","role":"author"}]}],"member":"297","published-online":{"date-parts":[[2024,3,5]]},"reference":[{"issue":"2","key":"857_CR1","first-page":"1","volume":"17","author":"A Abbasi","year":"2016","unstructured":"Abbasi A, Sarker S, Chiang RH (2016) Big data research in information systems: toward an inclusive research agenda. J Assoc Inf Syst 17(2):1\u201332","journal-title":"J Assoc Inf Syst"},{"key":"857_CR2","volume-title":"Data profiling","author":"Z Abedjan","year":"2022","unstructured":"Abedjan Z, Golab L, Naumann F, Papenbrock T (2022) Data profiling. Springer, Heidelberg"},{"key":"857_CR3","volume-title":"Introduction to machine learning","author":"E Alpaydin","year":"2020","unstructured":"Alpaydin E (2020) Introduction to machine learning. 
MIT Press, Cambridge"},{"key":"857_CR4","unstructured":"Amrani H (2021) Model-centric and data-centric AI for personalization in human activity recognition. Ph.D. thesis, University of Milano-Bicocca"},{"key":"857_CR5","doi-asserted-by":"crossref","unstructured":"Aramburu MJ, Berlanga R, Lanza-Cruz I (2023) A data quality multidimensional model for social media analysis. Bus Inf Syst Eng 1\u201323","DOI":"10.1007\/s12599-023-00840-9"},{"issue":"113","key":"857_CR6","first-page":"492","volume":"150","author":"B Baesens","year":"2021","unstructured":"Baesens B, H\u00f6ppner S, Verdonck T (2021) Data engineering for fraud detection. Decis Support Syst 150(113):492","journal-title":"Decis Support Syst"},{"key":"857_CR7","doi-asserted-by":"crossref","unstructured":"Baier L, Kellner V, K\u00fchl N, Satzger G (2021) Switching scheme: a novel approach for handling incremental concept drift in real-world data sets. In: Proceedings of the Hawaii international conference on systems sciences, pp 990\u20131000","DOI":"10.24251\/HICSS.2021.120"},{"key":"857_CR8","unstructured":"Biewald L (2020) Experiment tracking with weights and biases. https:\/\/www.wandb.com\/. Accessed 02 Dec 2022"},{"key":"857_CR9","unstructured":"Budach L, Feuerpfeil M, Ihde N, Nathansen A, Noack N, Patzlaff H, Naumann F, Harmouch H (2022) The effects of data quality on machine learning performance. arXiv:2207.14529"},{"issue":"4","key":"857_CR10","doi-asserted-by":"publisher","first-page":"1165","DOI":"10.2307\/41703503","volume":"36","author":"H Chen","year":"2012","unstructured":"Chen H, Chiang RH, Storey VC (2012) Business intelligence and analytics: from big data to big impact. MIS Q 36(4):1165\u20131188","journal-title":"MIS Q"},{"key":"857_CR11","doi-asserted-by":"crossref","unstructured":"Deng Y, Lyu F, Ren J, Chen YC, Yang P, Zhou Y, Zhang Y (2021) Fair: quality-aware federated learning with precise user incentive and model aggregation. 
In: Proceedings of IEEE conference on computer communications. IEEE, pp 1\u201310","DOI":"10.1109\/INFOCOM42981.2021.9488743"},{"key":"857_CR12","doi-asserted-by":"crossref","unstructured":"Fassnacht M, Benz C, Heinz D, Leimstoll J, Satzger G (2023) Barriers to data sharing among private sector organizations. In: Proceedings of the Hawaii international conference on system sciences (HICSS), pp 3695\u20133705","DOI":"10.24251\/HICSS.2023.453"},{"key":"857_CR13","doi-asserted-by":"crossref","unstructured":"Fiedler N, Bestmann M, Hendrich N (2019) Imagetagger: an open source online platform for collaborative image labeling. In: Proceedings of RoboCup 2018: robot world cup XXII. Springer, Heidelberg, pp 162\u2013169","DOI":"10.1007\/978-3-030-27544-0_13"},{"issue":"11","key":"857_CR14","doi-asserted-by":"publisher","first-page":"98","DOI":"10.1145\/3448247","volume":"64","author":"C Gr\u00f6ger","year":"2021","unstructured":"Gr\u00f6ger C (2021) There is no AI without data. Commun ACM 64(11):98\u2013108","journal-title":"Commun ACM"},{"issue":"2","key":"857_CR15","doi-asserted-by":"publisher","first-page":"388","DOI":"10.1080\/07421222.2018.1451951","volume":"35","author":"V Grover","year":"2018","unstructured":"Grover V, Chiang RH, Liang TP, Zhang D (2018) Creating strategic business value from big data analytics: a research framework. J Manag Inf Syst 35(2):388\u2013423","journal-title":"J Manag Inf Syst"},{"issue":"1","key":"857_CR16","first-page":"1","volume":"10","author":"V Gudivada","year":"2017","unstructured":"Gudivada V, Apon A, Ding J (2017) Data quality considerations for big data and machine learning: going beyond data cleaning and transformations. Int J Adv Softw 10(1):1\u201320","journal-title":"Int J Adv Softw"},{"key":"857_CR17","first-page":"171","volume":"3","author":"P Hemmer","year":"2022","unstructured":"Hemmer P, K\u00fchl N, Sch\u00f6ffer J (2022) DEAL: deep evidential active learning for image classification. 
Deep Learn Appl 3:171\u2013192","journal-title":"Deep Learn Appl"},{"key":"857_CR18","doi-asserted-by":"crossref","unstructured":"Hirt R, K\u00fchl N, Martin D, Satzger G (2023) Enabling inter-organizational analytics in business networks through meta machine learning. Inf Technol Manag (forthcoming)","DOI":"10.1007\/s10799-023-00399-7"},{"issue":"1","key":"857_CR19","doi-asserted-by":"publisher","first-page":"1","DOI":"10.1007\/s12525-023-00677-w","volume":"33","author":"J Holstein","year":"2023","unstructured":"Holstein J, Schemmer M, Jakubik J, V\u00f6ssing M, Satzger G (2023) Sanitizing data for analysis: designing systems for data understanding. Electron Market 33(1):1\u201318","journal-title":"Electron Market"},{"issue":"2","key":"857_CR20","doi-asserted-by":"publisher","first-page":"119","DOI":"10.1007\/s40708-016-0042-6","volume":"3","author":"A Holzinger","year":"2016","unstructured":"Holzinger A (2016) Interactive machine learning for health informatics: when do we need the human-in-the-loop? Brain Inform 3(2):119\u2013131","journal-title":"Brain Inform"},{"issue":"8","key":"857_CR21","doi-asserted-by":"publisher","first-page":"84","DOI":"10.1145\/3571724","volume":"66","author":"MH Jarrahi","year":"2023","unstructured":"Jarrahi MH, Memariani A, Guha S (2023) The principles of data-centric AI. Commun ACM 66(8):84\u201392","journal-title":"Commun ACM"},{"issue":"6245","key":"857_CR22","doi-asserted-by":"publisher","first-page":"255","DOI":"10.1126\/science.aaa8415","volume":"349","author":"MI Jordan","year":"2015","unstructured":"Jordan MI, Mitchell TM (2015) Machine learning: trends, perspectives, and prospects. Science 349(6245):255\u2013260","journal-title":"Science"},{"key":"857_CR23","unstructured":"Kaggle (2023) Kaggle competitions. https:\/\/www.kaggle.com\/competitions. 
Accessed 05 Jul 2023"},{"issue":"4","key":"857_CR24","doi-asserted-by":"publisher","first-page":"2235","DOI":"10.1007\/s12525-022-00598-0","volume":"32","author":"N K\u00fchl","year":"2022","unstructured":"K\u00fchl N, Schemmer M, Goutier M, Satzger G (2022) Artificial intelligence and machine learning. Electron Market 32(4):2235\u20132244","journal-title":"Electron Market"},{"issue":"3","key":"857_CR25","first-page":"735","volume":"21","author":"C Legner","year":"2020","unstructured":"Legner C, Pentek T, Otto B (2020) Accumulating design knowledge with reference models: insights from 12 years\u2019 research into data management. J Assoc Inf Syst 21(3):735\u2013770","journal-title":"J Assoc Inf Syst"},{"key":"857_CR26","unstructured":"Lin Q, Ye G, Wang J, Liu H (2022) RoboFlow: a data-centric workflow management system for developing AI-enhanced robots. In: Proceedings of the conference on robot learning. PMLR, pp 1789\u20131794"},{"key":"857_CR27","doi-asserted-by":"crossref","unstructured":"McInnes L, Healy J, Melville J (2018) UMAP: uniform manifold approximation and projection for dimension reduction. arXiv:1802.03426","DOI":"10.21105\/joss.00861"},{"key":"857_CR28","unstructured":"Ng A, Aroyo L, Coleman C, Diamos G, Reddi V, Vanschoren J, Wu C, S Z (2021) Data-centric AI workshop. https:\/\/datacentricai.org\/neurips21\/. Accessed 12 Feb 2022"},{"key":"857_CR29","unstructured":"Ng A, Laird D, He L (2022) Data-centric AI competition. https:\/\/https-deeplearning-ai.github.io\/data-centriccomp\/. Accessed 04 Dec 2022"},{"key":"857_CR30","unstructured":"Northcutt CG, Athalye A, Mueller J (2021) Pervasive label errors in test sets destabilize machine learning benchmarks. arXiv:2103.14749"},{"issue":"1","key":"857_CR31","first-page":"45","volume":"29","author":"B Otto","year":"2011","unstructured":"Otto B (2011) Organizing data governance: findings from the telecommunications industry and consequences for large service providers. 
Commun Assoc Inf Syst 29(1):45\u201366","journal-title":"Commun Assoc Inf Syst"},{"issue":"4","key":"857_CR32","doi-asserted-by":"publisher","first-page":"561","DOI":"10.1007\/s12525-019-00362-x","volume":"29","author":"B Otto","year":"2019","unstructured":"Otto B, Jarke M (2019) Designing a multi-sided data platform: findings from the international data spaces case. Electron Market 29(4):561\u2013580","journal-title":"Electron Market"},{"issue":"1","key":"857_CR33","first-page":"139","volume":"23","author":"E Parmiggiani","year":"2022","unstructured":"Parmiggiani E, \u00d8sterlie T, Almklov PG (2022) In the backrooms of data science. J Assoc Inf Syst 23(1):139\u2013164","journal-title":"J Assoc Inf Syst"},{"issue":"1","key":"857_CR34","first-page":"11","volume":"44","author":"C Renggli","year":"2021","unstructured":"Renggli C, Rimanic L, G\u00fcrel NM, Karlas B, Wu W, Zhang C (2021) A data quality-driven view of MLOps. IEEE Data Eng Bull 44(1):11\u201323","journal-title":"IEEE Data Eng Bull"},{"key":"857_CR35","doi-asserted-by":"crossref","unstructured":"Ronneberger O, Fischer P, Brox T (2015) U-Net: convolutional networks for biomedical image segmentation. In: Proceedings of the international conference on medical image computing and computer-assisted intervention, pp 234\u2013241","DOI":"10.1007\/978-3-319-24574-4_28"},{"key":"857_CR36","doi-asserted-by":"crossref","unstructured":"Sambasivan N, Kapania S, Highfill H, Akrong D, Paritosh P, Aroyo LM (2021) \u201cEveryone wants to do the model work, not the data work\u201d: data cascades in high-stakes AI. In: Proceedings of the CHI conference on human factors in computing systems, pp 1\u201315","DOI":"10.1145\/3411764.3445518"},{"issue":"3","key":"857_CR37","doi-asserted-by":"publisher","first-page":"229","DOI":"10.1080\/10580530.2022.2085825","volume":"40","author":"J Schneider","year":"2023","unstructured":"Schneider J, Abraham R, Meske C, Vom Brocke J (2023) Artificial intelligence governance for businesses. 
Inf Syst Manag 40(3):229\u2013249","journal-title":"Inf Syst Manag"},{"issue":"4","key":"857_CR38","first-page":"13","volume":"5","author":"C Shearer","year":"2000","unstructured":"Shearer C (2000) The CRISP-DM model: the new blueprint for data mining. J Data Warehous 5(4):13\u201322","journal-title":"J Data Warehous"},{"key":"857_CR39","unstructured":"Strickland E (2022) Andrew Ng: unbiggen AI. https:\/\/spectrum.ieee.org\/andrew-ng-data-centric-ai. Accessed 12 Dec 2022"},{"issue":"2","key":"857_CR40","first-page":"521","volume":"23","author":"P Toreini","year":"2022","unstructured":"Toreini P, Langner M, Maedche A, Morana S, Vogel T (2022) Designing attentive information dashboards. J Assoc Inf Syst 23(2):521\u2013552","journal-title":"J Assoc Inf Syst"},{"key":"857_CR41","unstructured":"Turban E (2011) Decision support and business intelligence systems. Pearson Education India"},{"issue":"4","key":"857_CR42","doi-asserted-by":"publisher","first-page":"791","DOI":"10.1007\/s00778-022-00775-9","volume":"32","author":"SE Whang","year":"2023","unstructured":"Whang SE, Roh Y, Song H, Lee JG (2023) Data collection and quality challenges in deep learning: a data-centric AI perspective. VLDB J 32(4):791\u2013813","journal-title":"VLDB J"},{"key":"857_CR43","doi-asserted-by":"publisher","first-page":"575","DOI":"10.1007\/s12599-019-00608-0","volume":"61","author":"R Zhang","year":"2019","unstructured":"Zhang R, Indulska M, Sadiq S (2019) Discovering data quality problems: the case of repurposed data. 
Bus Inf Syst Eng 61:575\u2013593","journal-title":"Bus Inf Syst Eng"}],"container-title":["Business & Information Systems Engineering"],"original-title":[],"language":"en","link":[{"URL":"https:\/\/link.springer.com\/content\/pdf\/10.1007\/s12599-024-00857-8.pdf","content-type":"application\/pdf","content-version":"vor","intended-application":"text-mining"},{"URL":"https:\/\/link.springer.com\/article\/10.1007\/s12599-024-00857-8\/fulltext.html","content-type":"text\/html","content-version":"vor","intended-application":"text-mining"},{"URL":"https:\/\/link.springer.com\/content\/pdf\/10.1007\/s12599-024-00857-8.pdf","content-type":"application\/pdf","content-version":"vor","intended-application":"similarity-checking"}],"deposited":{"date-parts":[[2024,8,31]],"date-time":"2024-08-31T09:24:21Z","timestamp":1725096261000},"score":1,"resource":{"primary":{"URL":"https:\/\/link.springer.com\/10.1007\/s12599-024-00857-8"}},"subtitle":[],"short-title":[],"issued":{"date-parts":[[2024,3,5]]},"references-count":43,"journal-issue":{"issue":"4","published-print":{"date-parts":[[2024,8]]}},"alternative-id":["857"],"URL":"https:\/\/doi.org\/10.1007\/s12599-024-00857-8","relation":{},"ISSN":["2363-7005","1867-0202"],"issn-type":[{"value":"2363-7005","type":"print"},{"value":"1867-0202","type":"electronic"}],"subject":[],"published":{"date-parts":[[2024,3,5]]},"assertion":[{"value":"5 March 2024","order":1,"name":"first_online","label":"First Online","group":{"name":"ArticleHistory","label":"Article History"}}]}}