{"status":"ok","message-type":"work","message-version":"1.0.0","message":{"indexed":{"date-parts":[[2026,3,25]],"date-time":"2026-03-25T10:17:25Z","timestamp":1774433845549,"version":"3.50.1"},"posted":{"date-parts":[[2026]]},"group-title":"SSRN","reference-count":69,"publisher":"Elsevier BV","content-domain":{"domain":[],"crossmark-restriction":false},"short-container-title":[],"abstract":"<jats:p>Context: The increasing digitalization of the healthcare sector generates massive volumes of heterogeneous data, a phenomenon known as Big Data, creating unprecedented opportunities to optimize processes such as the auditing of public accounts through Artificial Intelligence (AI). However, implementing effective AI systems requires a robust, scalable, and secure Big Data infrastructure, whose cost and complexity can be prohibitive. Objective: This article presents ALIAS (Architecture for Large-scale Intelligent Auditing of Healthcare Systems), a detailed and replicable technical blueprint for building a Big Data and AI platform. The goal is to provide a reference model based entirely on open-source technologies to democratize access to advanced analytics in the healthcare sector. Method: The architecture was developed and its feasibility investigated through a practical case study. The process included gathering requirements for healthcare auditing, selecting open-source technologies, and iterative implementation of the platform. Its effectiveness was evaluated through a real-world application. Results: The result is a complete blueprint implemented and tested at the Laboratory for Technological Innovation in Health (LAIS). The architecture comprises layers ranging from distributed storage and large-scale data processing to pipeline orchestration, data governance, and lifecycle management of AI models (MLOps). Its effectiveness is illustrated through a case study on intelligent classification of invoice items. 
Replicability is ensured through a public code repository with declarative manifests and lessons learned. Conclusion: ALIAS presents a robust and cost-effective solution for healthcare institutions to implement AI platforms, promoting technological sovereignty, transparency, and improved efficiency and quality of services.<\/jats:p>","DOI":"10.2139\/ssrn.6467244","type":"posted-content","created":{"date-parts":[[2026,3,25]],"date-time":"2026-03-25T07:37:42Z","timestamp":1774424262000},"source":"Crossref","is-referenced-by-count":0,"title":["The ALIAS Blueprint: A Replicable Architecture for Large-scale Intelligent Auditing of Healthcare Systems"],"prefix":"10.2139","author":[{"ORCID":"https:\/\/orcid.org\/0000-0003-4806-1315","authenticated-orcid":true,"given":"Helder","family":"Prado Santos","sequence":"first","affiliation":[]},{"ORCID":"https:\/\/orcid.org\/0000-0002-4811-1477","authenticated-orcid":true,"given":"Methanias","family":"Cola\u00e7o J\u00fanior","sequence":"additional","affiliation":[]},{"ORCID":"https:\/\/orcid.org\/0000-0003-3160-3384","authenticated-orcid":true,"given":"Raphael  Silva","family":"Fontes","sequence":"additional","affiliation":[]},{"ORCID":"https:\/\/orcid.org\/0009-0003-0345-6150","authenticated-orcid":true,"given":"Marianne  Dantas Farias","family":"Vieira","sequence":"additional","affiliation":[]},{"ORCID":"https:\/\/orcid.org\/0000-0002-9804-1964","authenticated-orcid":true,"given":"Gleyson Jos\u00e9 Pinheiro  Caldeir","family":"Silva","sequence":"additional","affiliation":[]},{"ORCID":"https:\/\/orcid.org\/0009-0009-6467-6409","authenticated-orcid":true,"given":"Mario  Luiz da Gama Rosa","family":"dos Reis","sequence":"additional","affiliation":[]},{"ORCID":"https:\/\/orcid.org\/0000-0001-5283-9228","authenticated-orcid":true,"given":"Guilherme  Medeiros","family":"Machado","sequence":"additional","affiliation":[]},{"ORCID":"https:\/\/orcid.org\/0000-0003-2690-1563","authenticated-orcid":true,"given":"Luiz","family":"Affonso 
Guedes","sequence":"additional","affiliation":[]},{"ORCID":"https:\/\/orcid.org\/0000-0001-6612-0580","authenticated-orcid":true,"given":"Cl\u00e1udia  Miranda","family":"Veloso","sequence":"additional","affiliation":[]},{"ORCID":"https:\/\/orcid.org\/0000-0002-7506-1401","authenticated-orcid":true,"given":"Susana","family":"Henriques","sequence":"additional","affiliation":[]},{"ORCID":"https:\/\/orcid.org\/0000-0002-9130-7723","authenticated-orcid":true,"given":"Jo\u00e3o  Paulo Q.","family":"Santos","sequence":"additional","affiliation":[]},{"ORCID":"https:\/\/orcid.org\/0000-0002-9216-8593","authenticated-orcid":true,"given":"Ricardo","family":"Valentim","sequence":"additional","affiliation":[]}],"member":"78","reference":[{"issue":"1","key":"ref1","first-page":"1","article-title":"Big data analytics in healthcare","volume":"26","author":"A Belle","year":"2015","journal-title":"Bio-medical materials and engineering"},{"key":"ref2","article-title":"Applications of artificial intelligence for auditing and classification of incongruent descriptions in public procurement","author":"W F Gomes","journal-title":"XVIII Brazilian Symposium on Information Systems, SBSI, ACM"},{"issue":"1","key":"ref3","doi-asserted-by":"crossref","first-page":"44","DOI":"10.1038\/s41591-018-0300-7","article-title":"High-performance medicine: the convergence of human and artificial intelligence","volume":"25","author":"E J Topol","year":"2019","journal-title":"Nature medicine"},{"key":"ref4","author":"R Carmo De Souza Cruz","year":"2022","journal-title":"An\u00e1lise do impacto do banco de pre\u00e7os em sa\u00fade (bps) para redu\u00e7\u00e3o das assimetrias de informa\u00e7\u00e3o dos pre\u00e7os de compras de \u00f3rteses, pr\u00f3tese e materiais especiais (opme), JMPHC | Journal of Management & Primary Health Care |"},{"key":"ref5","first-page":"116","volume":"36","author":"L Ribeiro","year":"2018","journal-title":"Reconhecimento de entidades nomeadas em itens de produto da nota 
fiscal eletr\u00f4nica"},{"key":"ref6","doi-asserted-by":"crossref","first-page":"107","DOI":"10.1145\/1327452.1327492","article-title":"Mapreduce: simplified data processing on large clusters","volume":"51","author":"J Dean","year":"2008","journal-title":"Communications of the ACM"},{"key":"ref7","doi-asserted-by":"crossref","DOI":"10.1016\/j.ijmedinf.2020.104203","article-title":"Scalpel3: A scalable open-source library for healthcare claims databases","volume":"141","author":"E Bacry","year":"2020","journal-title":"International Journal of Medical Informatics"},{"key":"ref8","doi-asserted-by":"crossref","first-page":"31756","DOI":"10.1109\/ACCESS.2023.3262138","article-title":"Machine learning operations (mlops): A survey and architecture","volume":"11","author":"D Kreuzberger","year":"2023","journal-title":"IEEE Access"},{"key":"ref9","author":"F Rouzbeh","journal-title":"Collaborative cloud computing framework for health data with open source technologies"},{"issue":"C","key":"ref10","doi-asserted-by":"crossref","first-page":"595","DOI":"10.1016\/j.future.2022.12.007","article-title":"A big data platform exploiting auditable tokenization to promote good practices inside local energy communities","volume":"141","author":"L Gagliardelli","year":"2023","journal-title":"Future Gener. Comput. Syst"},{"key":"ref11","author":"Y Fu","year":"2021","journal-title":"Real-time data infrastructure at uber"},{"key":"ref12","author":"S Jacobs","year":"2020","journal-title":"BAD to the bone: Big active data at its core"},{"issue":"2","key":"ref13","doi-asserted-by":"crossref","DOI":"10.1145\/2834118","article-title":"An efficient multidimensional big data fusion approach in machine-to-machine communication","volume":"15","author":"A Ahmad","year":"2016","journal-title":"ACM Trans. Embed. Comput. 
Syst"},{"issue":"1","key":"ref14","doi-asserted-by":"crossref","DOI":"10.1038\/s41746-020-00323-1","article-title":"The future of digital health with federated learning","volume":"3","author":"N Rieke","year":"2020","journal-title":"NPJ digital medicine"},{"issue":"10","key":"ref15","article-title":"Evaluating service-oriented and microservice architecture patterns to deploy eHealth applications in cloud computing environment","volume":"11","author":"H Calder\u00f3n-G\u00f3mez","year":"2021","journal-title":"Applied Sciences"},{"key":"ref16","first-page":"171","volume":"19","author":"M Chen","year":"2014","journal-title":"Big data: A survey"},{"key":"ref17","author":"K Suse","year":"2024"},{"key":"ref18","author":"Rancher Suse","year":"2024"},{"key":"ref19","year":"2024","journal-title":"Docker swarm doc"},{"key":"ref20","author":"Inc Github","year":"2024"},{"key":"ref21","year":"2024","journal-title":"The Gitea Authors, Gitea doc"},{"key":"ref22","year":"2024","journal-title":"Gluster Community, Glusterfs doc"},{"key":"ref23","year":"2024","journal-title":"OpenSearch Project, Opensearch doc"},{"key":"ref24","first-page":"307","article-title":"Ceph: A scalable, high-performance distributed file system","author":"S A Weil","year":"2006","journal-title":"Proceedings of the 7th symposium on Operating systems design and implementation"},{"key":"ref25","first-page":"1","author":"K Shvachko","year":"2010","journal-title":"The hadoop distributed file system, in: 2010 IEEE 26th symposium on mass storage systems and technologies (MSST)"},{"key":"ref26","author":"Elasticsearch Elastic","year":"2024"},{"key":"ref27","year":"2024","journal-title":"The Apache Software Foundation, Apache solr doc"},{"key":"ref28","doi-asserted-by":"crossref","first-page":"56","DOI":"10.1145\/2934664","article-title":"Apache spark: a unified engine for big data processing","volume":"59","author":"M Zaharia","year":"2016","journal-title":"Communications of the 
ACM"},{"key":"ref29","year":"2024","journal-title":"Dask Development Team, Dask doc"},{"key":"ref30","article-title":"Apache flink: Stream and batch processing in a single engine","author":"P Carbone","year":"2015","journal-title":"Bulletin of the IEEE Computer Society Technical Committee on Data Engineering"},{"key":"ref31","first-page":"561","article-title":"Ray: A distributed framework for emerging ai applications","volume":"18","author":"P Moritz","year":"2018","journal-title":"13th USENIX Symposium on Operating Systems Design and Implementation"},{"key":"ref32","article-title":"Apache Pulsar Documentation","journal-title":"The Apache Software Foundation, vers\u00e3o 4.0 LTS (2025)"},{"key":"ref33","year":"2024","journal-title":"The Apache Software Foundation, Apache airflow doc"},{"key":"ref34","year":"2024","journal-title":"Argo workflows doc"},{"key":"ref35","author":"Dagster Elementl","year":"2024"},{"key":"ref36","author":"Openmetadata Openmetadata","year":"2024"},{"key":"ref37","author":"Amundsen Project","year":"2024"},{"key":"ref38","year":"2024","journal-title":"DataHub Project, Datahub doc"},{"key":"ref39","year":"2024","journal-title":"The Apache Software Foundation, Apache atlas doc"},{"key":"ref40","first-page":"39","volume":"41","author":"M Zaharia","year":"2018","journal-title":"Accelerating the machine learning lifecycle with mlflow"},{"key":"ref41","year":"2024","journal-title":"The Kubeflow Authors, Kubeflow doc"},{"key":"ref42","year":"2024","journal-title":"Weights & Biases, Weights & biases doc"},{"key":"ref43","author":"Metaflow Netflix","year":"2024"},{"key":"ref44","author":"Project Jupyter","year":"2024"},{"key":"ref45","author":"Posit","year":"2024","journal-title":"Rstudio server doc"},{"key":"ref46","year":"2024","journal-title":"The Apache Software Foundation, Apache zeppelin doc"},{"key":"ref47","first-page":"1842","author":"D Sethi","year":"2019","journal-title":"IEEE 35th International Conference on Data Engineering 
(ICDE)"},{"key":"ref48","year":"2024","journal-title":"The Apache Software Foundation, Apache pinot doc"},{"key":"ref49","year":"2024","journal-title":"The Apache Software Foundation, Apache drill doc"},{"key":"ref50","author":"Apache Druid","year":"2024","journal-title":"Apache druid doc"},{"key":"ref51","author":"Apache Superset","year":"2024","journal-title":"Apache superset doc"},{"key":"ref52","author":"Metabase Metabase","year":"2024"},{"key":"ref53","author":"Redash Redash","year":"2024"},{"key":"ref54","author":"Grafana Labs","year":"2024"},{"key":"ref55","author":"Starrocks Project","year":"2025"},{"issue":"3","key":"ref56","doi-asserted-by":"crossref","first-page":"81","DOI":"10.1109\/MCC.2014.51","article-title":"Containers and cloud: From lxc to docker to kubernetes","volume":"1","author":"D Bernstein","year":"2014","journal-title":"IEEE Cloud Computing"},{"key":"ref57","first-page":"1","article-title":"Large-scale cluster management at google with borg","author":"A Verma","year":"2015","journal-title":"Proceedings of the tenth European conference on computer systems"},{"key":"ref58","first-page":"687","article-title":"Big data, fast data and data lake concepts","volume":"80","author":"N Miloslavskaya","year":"2016","journal-title":"Procedia Computer Science"},{"key":"ref59","doi-asserted-by":"crossref","DOI":"10.1007\/978-1-4842-2424-3","author":"D Vohra","year":"2016","journal-title":"Apache HBase Primer: A-Z of HBase concepts and features"},{"issue":"2","key":"ref60","doi-asserted-by":"crossref","first-page":"148","DOI":"10.14778\/3626292.3626298","article-title":"An Empirical Evaluation of Columnar Storage Formats","volume":"17","author":"X Zeng","year":"2023","journal-title":"Proceedings of the VLDB Endowment"},{"key":"ref61","author":"B Saha","year":"2021","journal-title":"Making data discovery and data governance easy with 
openmetadata"},{"key":"ref62","doi-asserted-by":"crossref","DOI":"10.5753\/sbcas_estendido.2023.231487","article-title":"Prova de conceito de um classificador de opmes em notas fiscais","author":"W Gomes","year":"2023","journal-title":"Anais Estendidos do XXIII Simp\u00f3sio Brasileiro de Computa\u00e7\u00e3o Aplicada \u00e0 Sa\u00fade (SBCAS 2023)"},{"key":"ref63","author":"W H Inmon","year":"2016","journal-title":"Data lake architecture: designing the data lake and avoiding the garbage dump"},{"key":"ref64","author":"K Morris","year":"2016","journal-title":"Infrastructure as code: managing servers in the cloud"},{"issue":"10","key":"ref65","first-page":"64","article-title":"Challenges in the design, deployment, and operation of large-scale distributed systems","volume":"64","author":"S Balasubramanian","year":"2021","journal-title":"Communications of the ACM"},{"issue":"1","key":"ref66","doi-asserted-by":"crossref","first-page":"71","DOI":"10.1145\/2723872.2723882","article-title":"An introduction to docker for reproducible research","volume":"49","author":"C Boettiger","year":"2015","journal-title":"ACM SIGOPS Operating Systems Review"},{"key":"ref67","doi-asserted-by":"crossref","first-page":"4171","DOI":"10.18653\/v1\/N19-1423","article-title":"Pre-training of deep bidirectional transformers for language understanding","volume":"1","author":"J Devlin","year":"2019","journal-title":"Proceedings of the 2019 Conference of the North American Chapter"},{"key":"ref68","first-page":"9459","article-title":"Retrieval-augmented generation for knowledge-intensive nlp tasks","volume":"33","author":"P Lewis","year":"2020"},{"key":"ref69","first-page":"250","article-title":"To the practitioners of mlops: A survey of tools for the machine learning lifecycle","author":"I Amlan","year":"2021","journal-title":"2021 IEEE International Conference on Cloud Engineering 
(IC2E)"}],"container-title":[],"original-title":[],"deposited":{"date-parts":[[2026,3,25]],"date-time":"2026-03-25T07:41:06Z","timestamp":1774424466000},"score":1,"resource":{"primary":{"URL":"https:\/\/www.ssrn.com\/abstract=6467244"}},"subtitle":[],"short-title":[],"issued":{"date-parts":[[2026]]},"references-count":69,"URL":"https:\/\/doi.org\/10.2139\/ssrn.6467244","relation":{},"subject":[],"published":{"date-parts":[[2026]]},"subtype":"preprint"}}