{"status":"ok","message-type":"work","message-version":"1.0.0","message":{"indexed":{"date-parts":[[2025,12,15]],"date-time":"2025-12-15T19:54:38Z","timestamp":1765828478551,"version":"3.46.0"},"reference-count":51,"publisher":"Association for Computing Machinery (ACM)","issue":"4","content-domain":{"domain":["dl.acm.org"],"crossmark-restriction":true},"short-container-title":["J. Data and Information Quality"],"published-print":{"date-parts":[[2025,12,31]]},"abstract":"<jats:p>Collaboration is essential for scientific research. This is the foundation behind Open Science and the FAIR Principles, aimed at standardizing the development of scientific data sharing repositories. However, developing FAIR-compliant repositories can be a challenge, mostly due to managing a huge volume and variety of research data and metadata generated at a high velocity. We address these challenges by proposing BigFAIR, a novel FAIR-compliant architecture capable of managing this type of information in a massive scale. BigFAIR leverages existing local repositories, using separate infrastructures to handle scientific data and metadata. With this separation, it can reduce development and maintenance efforts, support data ownership, and increase flexibility. We define pipelines to demonstrate how BigFAIR answers queries in different contexts, introduce guidelines to support its instantiation, and propose a generic metadata warehouse model to support analytical query processing. We also demonstrate the applicability of BigFAIR through a case study in the context of two real-world datasets, detailing different types of queries and highlighting their importance to big data analytics.<\/jats:p>","DOI":"10.1145\/3774755","type":"journal-article","created":{"date-parts":[[2025,11,6]],"date-time":"2025-11-06T11:28:22Z","timestamp":1762428502000},"page":"1-28","update-policy":"https:\/\/doi.org\/10.1145\/crossmark-policy","source":"Crossref","is-referenced-by-count":1,"title":["The BigFAIR Architecture: Enabling Big Data Analytics in FAIR-compliant Repositories"],"prefix":"10.1145","volume":"17","author":[{"ORCID":"https:\/\/orcid.org\/0000-0003-3566-0415","authenticated-orcid":false,"given":"Jo\u00e3o Pedro de Carvalho","family":"Castro","sequence":"first","affiliation":[{"name":"Department of Computer Science, USP S\u00e3o Carlos","place":["S\u00e3o Carlos, Brazil"]},{"name":"Information Technology Board, Federal University of Minas Gerais","place":["S\u00e3o Carlos, Brazil"]}],"role":[{"role":"author","vocabulary":"crossref"}]},{"ORCID":"https:\/\/orcid.org\/0000-0002-2155-8076","authenticated-orcid":false,"given":"Lucas","family":"Medeiros Fran\u00e7a romero","sequence":"additional","affiliation":[{"name":"Department of Computer Science, USP S\u00e3o Carlos","place":["S\u00e3o Carlos, Brazil"]}],"role":[{"role":"author","vocabulary":"crossref"}]},{"ORCID":"https:\/\/orcid.org\/0000-0002-8297-9894","authenticated-orcid":false,"given":"Anderson","family":"Chaves Carniel","sequence":"additional","affiliation":[{"name":"Department of Computer Science, UFSCar","place":["S\u00e3o Carlos, Brazil"]}],"role":[{"role":"author","vocabulary":"crossref"}]},{"ORCID":"https:\/\/orcid.org\/0000-0002-7618-1405","authenticated-orcid":false,"given":"Cristina","family":"Dutra Aguiar","sequence":"additional","affiliation":[{"name":"Department of Computer Science, USP S\u00e3o Carlos","place":["S\u00e3o Carlos, Brazil"]}],"role":[{"role":"author","vocabulary":"crossref"}]}],"member":"320","published-online":{"date-parts":[[2025,12,4]]},"reference":[{"key":"e_1_3_1_2_2","doi-asserted-by":"publisher","DOI":"10.1186\/s40537-018-0149-0"},{"key":"e_1_3_1_3_2","doi-asserted-by":"publisher","DOI":"10.1145\/2723372.2742797"},{"key":"e_1_3_1_4_2","doi-asserted-by":"publisher","DOI":"10.3390\/electronics10050589"},{"key":"e_1_3_1_5_2","doi-asserted-by":"publisher","DOI":"10.1177\/18758789251324008"},{"key":"e_1_3_1_6_2","doi-asserted-by":"publisher","DOI":"10.1017\/9781316941386"},{"key":"e_1_3_1_7_2","doi-asserted-by":"publisher","DOI":"10.5220\/0011066800003179"},{"key":"e_1_3_1_8_2","doi-asserted-by":"publisher","DOI":"10.1109\/UBMK.2019.8907155"},{"key":"e_1_3_1_9_2","doi-asserted-by":"publisher","DOI":"10.5220\/0011045500003179"},{"key":"e_1_3_1_10_2","doi-asserted-by":"publisher","DOI":"10.1007\/978-3-031-15743-1_6"},{"key":"e_1_3_1_11_2","doi-asserted-by":"publisher","DOI":"10.1007\/978-3-031-94571-7_7"},{"key":"e_1_3_1_12_2","unstructured":"CDC. 2022. COVID-19 Case Surveillance Public Use Data. Retrieved July 1 2022 from https:\/\/data.cdc.gov\/Case-Surveillance\/COVID-19-Case-Surveillance-Public-Use-Data\/vbim-akqf"},{"key":"e_1_3_1_13_2","doi-asserted-by":"publisher","DOI":"10.1007\/s11036-013-0489-0"},{"key":"e_1_3_1_14_2","doi-asserted-by":"publisher","DOI":"10.48786\/edbt.2025.30"},{"key":"e_1_3_1_15_2","doi-asserted-by":"publisher","DOI":"10.3233\/SHTI210810"},{"key":"e_1_3_1_16_2","doi-asserted-by":"publisher","DOI":"10.1109\/BigData47090.2019.9006051"},{"key":"e_1_3_1_17_2","unstructured":"Dublin Core. 2025. The Dublin Core Metadata Initiative. Retrieved May 25 2025 from https:\/\/www.dublincore.org\/"},{"key":"e_1_3_1_18_2","volume-title":"Joint Proceedings of the Posters and Demos Track of the 12th International Conference on Semantic Systems - SEMANTiCS2016 and the 1st International Workshop on Semantic Change & Evolving Semantics (SuCCESS\u201916)","author":"Ehrlinger Lisa","year":"2016","unstructured":"Lisa Ehrlinger and Wolfram W\u00f6\u00df. 2016. Towards a definition of knowledge graphs. In Joint Proceedings of the Posters and Demos Track of the 12th International Conference on Semantic Systems - SEMANTiCS2016 and the 1st International Workshop on Semantic Change & Evolving Semantics (SuCCESS\u201916)."},{"key":"e_1_3_1_19_2","unstructured":"FAPESP. 2020. COVID-19 Data Sharing\/BR. Retrieved July 1 2022 from https:\/\/repositoriodatasharingfapesp.uspdigital.usp.br"},{"key":"e_1_3_1_20_2","volume-title":"Proceedings of the 7th Biennial Conference on Innovative Data Systems Research (CIDR\u201915)","author":"Fernandez Raul Castro","year":"2015","unstructured":"Raul Castro Fernandez, Peter R. Pietzuch, Jay Kreps, Neha Narkhede, Jun Rao, Joel Koshy, Dong Lin, Chris Riccomini, and Guozhang Wang. 2015. Liquid: Unifying nearline and offline big data integration. In Proceedings of the 7th Biennial Conference on Innovative Data Systems Research (CIDR\u201915)."},{"key":"e_1_3_1_21_2","doi-asserted-by":"publisher","DOI":"10.1186\/s13023-021-02004-y"},{"key":"e_1_3_1_22_2","doi-asserted-by":"publisher","DOI":"10.1162\/dint_r_00024"},{"key":"e_1_3_1_23_2","doi-asserted-by":"publisher","DOI":"10.1162\/dint_a_00028"},{"key":"e_1_3_1_24_2","volume-title":"The Data Warehouse Toolkit: The Definitive Guide to Dimensional Modeling","author":"Kimball Ralph","year":"2013","unstructured":"Ralph Kimball and Margy Ross. 2013. The Data Warehouse Toolkit: The Definitive Guide to Dimensional Modeling. John Wiley & Sons."},{"key":"e_1_3_1_25_2","unstructured":"Jay Kreps. 2014. Questioning the Lambda Architecture. Retrieved May 25 2025 from https:\/\/www.oreilly.com\/radar\/questioning-the-lambda-architecture\/"},{"key":"e_1_3_1_26_2","doi-asserted-by":"publisher","DOI":"10.1162\/dint_a_00034"},{"key":"e_1_3_1_27_2","doi-asserted-by":"publisher","DOI":"10.1109\/BigData.2017.8258548"},{"key":"e_1_3_1_28_2","doi-asserted-by":"publisher","DOI":"10.3390\/s23010468"},{"key":"e_1_3_1_29_2","doi-asserted-by":"publisher","DOI":"10.1016\/j.ijinfomgt.2019.04.003"},{"key":"e_1_3_1_30_2","doi-asserted-by":"publisher","DOI":"10.1109\/ESEM.2015.7321184"},{"key":"e_1_3_1_31_2","doi-asserted-by":"publisher","DOI":"10.1016\/j.future.2014.10.016"},{"key":"e_1_3_1_32_2","unstructured":"MatPlotLib. 2025. Visualization with Python. Retrieved May 25 2025 from https:\/\/matplotlib.org\/"},{"key":"e_1_3_1_33_2","volume-title":"IAP Input Into the UNESCO Open Science Recommendation","author":"Medeiros Claudia Bauzer","year":"2020","unstructured":"Claudia Bauzer Medeiros, Barth\u00e9l\u00e9my Rapha\u00ebl Darboux, Juan Armando S\u00e1nchez, Henrikki Tenkanen, Maria Luisa Meneghetti, Zabta Khan Shinwari, Jaime C Montoya, Ina Smith, Alexa T McCray, and Koen Vermeir. 2020. IAP Input Into the UNESCO Open Science Recommendation. Retrieved May 25, 2025 from https:\/\/www.interacademies.org\/sites\/default\/files\/2020-07\/Open_Science_0.pdf"},{"key":"e_1_3_1_34_2","doi-asserted-by":"publisher","DOI":"10.1016\/j.infsof.2017.06.001"},{"key":"e_1_3_1_35_2","doi-asserted-by":"publisher","DOI":"10.1007\/978-3-642-23798-0_22"},{"key":"e_1_3_1_36_2","doi-asserted-by":"publisher","DOI":"10.17226\/25116"},{"key":"e_1_3_1_37_2","doi-asserted-by":"publisher","DOI":"10.34133\/2019\/1671403"},{"key":"e_1_3_1_38_2","doi-asserted-by":"publisher","DOI":"10.24251\/HICSS.2017.719"},{"key":"e_1_3_1_39_2","doi-asserted-by":"publisher","DOI":"10.1007\/s10844-020-00608-7"},{"key":"e_1_3_1_40_2","doi-asserted-by":"publisher","DOI":"10.1038\/s41597-022-01619-5"},{"key":"e_1_3_1_41_2","doi-asserted-by":"publisher","DOI":"10.14778\/2831360.2831365"},{"key":"e_1_3_1_42_2","doi-asserted-by":"publisher","DOI":"10.1055\/s-0040-1713684"},{"key":"e_1_3_1_43_2","doi-asserted-by":"publisher","DOI":"10.1109\/5254.912382"},{"key":"e_1_3_1_44_2","doi-asserted-by":"publisher","DOI":"10.1007\/978-3-642-54655-6"},{"key":"e_1_3_1_45_2","doi-asserted-by":"publisher","DOI":"10.1080\/08874417.2022.2089775"},{"key":"e_1_3_1_46_2","doi-asserted-by":"publisher","DOI":"10.1093\/bioinformatics\/btac362"},{"key":"e_1_3_1_47_2","unstructured":"W3C. 2023. Data Catalog Vocabulary (DCAT) - Version 3. Retrieved May 27 2025 from https:\/\/www.w3.org\/TR\/vocab-dcat-3\/"},{"key":"e_1_3_1_48_2","doi-asserted-by":"publisher","DOI":"10.5555\/2717065"},{"key":"e_1_3_1_49_2","doi-asserted-by":"publisher","DOI":"10.1038\/sdata.2016.18"},{"key":"e_1_3_1_50_2","doi-asserted-by":"publisher","DOI":"10.1038\/s41597-019-0184-5"},{"key":"e_1_3_1_51_2","doi-asserted-by":"publisher","DOI":"10.48786\/edbt.2025.75"},{"key":"e_1_3_1_52_2","doi-asserted-by":"publisher","DOI":"10.1145\/3626203.3670558"}],"container-title":["Journal of Data and Information Quality"],"original-title":[],"language":"en","link":[{"URL":"https:\/\/dl.acm.org\/doi\/pdf\/10.1145\/3774755","content-type":"unspecified","content-version":"vor","intended-application":"similarity-checking"}],"deposited":{"date-parts":[[2025,12,4]],"date-time":"2025-12-04T14:26:03Z","timestamp":1764858363000},"score":1,"resource":{"primary":{"URL":"https:\/\/dl.acm.org\/doi\/10.1145\/3774755"}},"subtitle":[],"short-title":[],"issued":{"date-parts":[[2025,12,4]]},"references-count":51,"journal-issue":{"issue":"4","published-print":{"date-parts":[[2025,12,31]]}},"alternative-id":["10.1145\/3774755"],"URL":"https:\/\/doi.org\/10.1145\/3774755","relation":{},"ISSN":["1936-1955","1936-1963"],"issn-type":[{"type":"print","value":"1936-1955"},{"type":"electronic","value":"1936-1963"}],"subject":[],"published":{"date-parts":[[2025,12,4]]},"assertion":[{"value":"2025-01-15","order":0,"name":"received","label":"Received","group":{"name":"publication_history","label":"Publication History"}},{"value":"2025-10-21","order":2,"name":"accepted","label":"Accepted","group":{"name":"publication_history","label":"Publication History"}},{"value":"2025-12-04","order":3,"name":"published","label":"Published","group":{"name":"publication_history","label":"Publication History"}}]}}