{"status":"ok","message-type":"work","message-version":"1.0.0","message":{"indexed":{"date-parts":[[2025,10,10]],"date-time":"2025-10-10T00:59:36Z","timestamp":1760057976994,"version":"build-2065373602"},"reference-count":65,"publisher":"MDPI AG","issue":"3","license":[{"start":{"date-parts":[[2025,3,5]],"date-time":"2025-03-05T00:00:00Z","timestamp":1741132800000},"content-version":"vor","delay-in-days":0,"URL":"https:\/\/creativecommons.org\/licenses\/by\/4.0\/"}],"content-domain":{"domain":[],"crossmark-restriction":false},"short-container-title":["Information"],"abstract":"<jats:p>The accelerating digitalization of the public and private sectors has made information technologies (IT) indispensable in modern life. As services shift to digital platforms and technologies expand across industries, the complexity of legal, regulatory, and technical requirement documentation is growing rapidly. This increase presents significant challenges in managing, gathering, and analyzing documents, as their dispersion across various repositories and formats hinders accessibility and efficient processing. This paper presents the development of an automated repository designed to streamline the collection, classification, and analysis of cybersecurity-related documents. By harnessing the capabilities of natural language processing (NLP) models\u2014specifically Generative Pre-Trained Transformer (GPT) technologies\u2014the system automates text ingestion, extraction, and summarization, providing users with visual tools and organized insights into large volumes of data. The repository facilitates the efficient management of evolving cybersecurity documentation, addressing issues of accessibility, complexity, and time constraints. This paper explores the potential applications of NLP in cybersecurity documentation management and highlights the advantages of integrating automated repositories equipped with visualization and search tools. By focusing on legal documents and technical guidelines from Portugal and the European Union (EU), this applied research seeks to enhance cybersecurity governance, streamline document retrieval, and deliver actionable insights to professionals. Ultimately, the goal is to develop a scalable, adaptable platform capable of extending beyond cybersecurity to serve other industries that rely on the effective management of complex documentation.<\/jats:p>","DOI":"10.3390\/info16030205","type":"journal-article","created":{"date-parts":[[2025,3,5]],"date-time":"2025-03-05T11:20:57Z","timestamp":1741173657000},"page":"205","update-policy":"https:\/\/doi.org\/10.3390\/mdpi_crossmark_policy","source":"Crossref","is-referenced-by-count":0,"title":["An Automated Repository for the Efficient Management of Complex Documentation"],"prefix":"10.3390","volume":"16","author":[{"given":"Jos\u00e9","family":"Frade","sequence":"first","affiliation":[{"name":"School of Technology and Management, Polytechnic University of Leiria, 2411-901 Leiria, Portugal"}]},{"ORCID":"https:\/\/orcid.org\/0000-0003-3448-6726","authenticated-orcid":false,"given":"M\u00e1rio","family":"Antunes","sequence":"additional","affiliation":[{"name":"School of Technology and Management, Polytechnic University of Leiria, 2411-901 Leiria, Portugal"},{"name":"INESC TEC, CRACS, 4200-465 Porto, Portugal"}]}],"member":"1968","published-online":{"date-parts":[[2025,3,5]]},"reference":[{"key":"ref_1","doi-asserted-by":"crossref","unstructured":"Floridi, L., and Cowls, J. (2022). A unified framework of five principles for AI in society. Machine Learning and the City: Applications in Architecture and Urban Design, Wiley.","DOI":"10.1002\/9781119815075.ch45"},{"key":"ref_2","doi-asserted-by":"crossref","first-page":"100167","DOI":"10.1016\/j.chbr.2022.100167","article-title":"Hacker types, motivations and strategies: A comprehensive framework","volume":"5","author":"Chng","year":"2022","journal-title":"Comput. Hum. Behav. Rep."},{"key":"ref_3","doi-asserted-by":"crossref","first-page":"543","DOI":"10.1016\/j.fmre.2021.08.009","article-title":"Managing privacy in the digital economy","volume":"1","author":"Wang","year":"2021","journal-title":"Fundam. Res."},{"key":"ref_4","doi-asserted-by":"crossref","first-page":"1299","DOI":"10.1007\/s11747-022-00845-y","article-title":"Digital technologies: Tensions in privacy and data","volume":"50","author":"Quach","year":"2022","journal-title":"J. Acad. Mark. Sci."},{"key":"ref_5","doi-asserted-by":"crossref","first-page":"427","DOI":"10.1016\/j.giq.2019.03.002","article-title":"Close encounters of the digital kind: A research agenda for the digitalization of public services","volume":"36","author":"Lindgren","year":"2019","journal-title":"Gov. Inf. Q."},{"key":"ref_6","doi-asserted-by":"crossref","unstructured":"Usman, M., Felderer, M., Unterkalmsteiner, M., Klotins, E., Mendez, D., and Al\u00e9groth, E. (2020, January 25\u201327). Compliance requirements in large-scale software development: An industrial case study. Proceedings of the Product-Focused Software Process Improvement: 21st International Conference, PROFES 2020, Turin, Italy. Proceedings 21.","DOI":"10.1007\/978-3-030-64148-1_24"},{"key":"ref_7","doi-asserted-by":"crossref","unstructured":"Kuty\u0142owski, M., Lauks-Dutka, A., and Yung, M. (2020, January 14\u201318). Gdpr\u2013challenges for reconciling legal rules with technical reality. Proceedings of the Computer Security\u2013ESORICS 2020: 25th European Symposium on Research in Computer Security, ESORICS 2020, Guildford, UK. Proceedings, Part I 25.","DOI":"10.1007\/978-3-030-58951-6_36"},{"key":"ref_8","doi-asserted-by":"crossref","unstructured":"Filipovska, E., Mladenovska, A., Bajrami, M., Dobreva, J., Hillman, V., Lameski, P., and Zdravevski, E. (2024, January 8\u201311). Benchmarking OpenAI\u2019s APIs and other Large Language Models for Repeatable and Efficient Question Answering Across Multiple Documents. Proceedings of the 2024 19th Conference on Computer Science and Intelligence Systems (FedCSIS), Belgrade, Serbia.","DOI":"10.15439\/2024F3979"},{"key":"ref_9","doi-asserted-by":"crossref","unstructured":"T\u00f6rnberg, P. (2023). How to use Large Language Models for Text Analysis. arXiv.","DOI":"10.4135\/9781529683707"},{"key":"ref_10","unstructured":"(2024, July 01). GPT-3.5 Turbo. Available online: https:\/\/platform.openai.com\/docs\/models#gpt-3-5-turbo."},{"key":"ref_11","unstructured":"(2024, July 01). GPT-4o. Available online: https:\/\/platform.openai.com\/docs\/models#gpt-4o."},{"key":"ref_12","first-page":"632","article-title":"A document classification using NLP and recurrent neural network","volume":"8","author":"Ghumade","year":"2019","journal-title":"Int. J. Eng. Adv. Technol."},{"key":"ref_13","doi-asserted-by":"crossref","first-page":"22","DOI":"10.1007\/s10916-024-02045-3","article-title":"The Breakthrough of Large Language Models Release for Medical Applications: 1-Year Timeline and Perspectives","volume":"48","author":"Cascella","year":"2024","journal-title":"J. Med. Syst."},{"key":"ref_14","doi-asserted-by":"crossref","unstructured":"Merchant, K., and Pande, Y. (2018, January 19\u201322). NLP Based Latent Semantic Analysis for Legal Text Summarization. Proceedings of the 2018 International Conference on Advances in Computing, Communications and Informatics (ICACCI), Bangalore, India.","DOI":"10.1109\/ICACCI.2018.8554831"},{"key":"ref_15","unstructured":"Feyisa, D.W., Berihun, H., Zewdu, A., Najimoghadam, M., and Zare, M. (2024). The future of document indexing: GPT and Donut revolutionize table of content processing. arXiv."},{"key":"ref_16","doi-asserted-by":"crossref","first-page":"100300","DOI":"10.1016\/j.dibe.2023.100300","article-title":"GPT models in construction industry: Opportunities, limitations, and a use case validation","volume":"17","author":"Saka","year":"2023","journal-title":"Dev. Built Environ."},{"key":"ref_17","doi-asserted-by":"crossref","unstructured":"Savelka, J., Agarwal, A., Bogart, C., Song, Y., and Sakr, M. (2023, January 7\u201312). Can Generative Pre-trained Transformers (GPT) Pass Assessments in Higher Education Programming Courses?. Proceedings of the 2023 Conference on Innovation and Technology in Computer Science Education V. 1. ACM, Turku, Finland. ITiCSE.","DOI":"10.1145\/3587102.3588792"},{"key":"ref_18","unstructured":"Liu, S., and Healey, C.G. (2023). Abstractive Summarization of Large Document Collections Using GPT. arXiv."},{"key":"ref_19","first-page":"7","article-title":"The Potential of GPT in Ottoman Studies: Computational Analysis of Evliya \u00c7elebi\u2019s Travelogue with NLP and Text Mining and Digital Edition with TEI","volume":"5","year":"2023","journal-title":"Culture"},{"key":"ref_20","unstructured":"Thippeswamy, B., Ramachandra, H., Rohan, S., Salam, R., and Pai, M. (2024, January 26\u201327). TextVerse: A Streamlit Web Application for Advanced Analysis of PDF and Image Files with and without Language Models. Proceedings of the 2024 Asia Pacific Conference on Innovation in Technology (APCIT), Mysuru, India."},{"key":"ref_21","doi-asserted-by":"crossref","first-page":"888","DOI":"10.1109\/ICACCS60874.2024.10716988","article-title":"Mining Mate: A Chat Bot for Navigating Mining Regulations Using LLM Models","volume":"Volume 1","author":"Vallabhaneni","year":"2024","journal-title":"Proceedings of the 2024 10th International Conference on Advanced Computing and Communication Systems (ICACCS)"},{"key":"ref_22","doi-asserted-by":"crossref","unstructured":"Lin, L.H.M., Ting, F.K., Chang, T.J., Wu, J.W., and Tsai, R.T.H. (2024, January 19\u201321). GPT4ESG: Streamlining Environment, Society, and Governance Analysis with Custom AI Models. Proceedings of the 2024 IEEE 4th International Conference on Electronic Communications, Internet of Things and Big Data (ICEIB), New Taipei, Taiwan.","DOI":"10.1109\/ICEIB61477.2024.10602567"},{"key":"ref_23","doi-asserted-by":"crossref","unstructured":"Feng, Y. (2024, January 25\u201327). Semantic textual similarity analysis of clinical text in the era of llm. Proceedings of the 2024 IEEE Conference on Artificial Intelligence (CAI), Singapore.","DOI":"10.1109\/CAI59869.2024.00227"},{"key":"ref_24","first-page":"1","article-title":"Leveraging Large Language Models for Document Analysis and Decision-Making in AI Chatbots","volume":"2","author":"Ibrahim","year":"2025","journal-title":"Adv. Sci. Technol. J."},{"key":"ref_25","unstructured":"Litaina, T., Soularidis, A., Bouchouras, G., Kotis, K., and Kavakli, E. (2024, January 26\u201327). Towards llm-based semantic analysis of historical legal documents. Proceedings of the SemDH2024: First International Workshop of Semantic Digital Humanities, co-located with ESWC2024, Hersonissos, Greece."},{"key":"ref_26","doi-asserted-by":"crossref","unstructured":"Bouzid, S., and Piron, L. (2024). Leveraging Generative AI in Short Document Indexing. Electronics, 13.","DOI":"10.3390\/electronics13173563"},{"key":"ref_27","doi-asserted-by":"crossref","unstructured":"Mao, Q., Dabrowski, A., Wei, F., Olson, E., Neary, R., Yang, J., Qin, H., and Huber-Fliflet, N. (2024, January 15\u201318). Comparative Analysis of LLM-Generated Event Timeline Summarization for Legal Investigations. Proceedings of the 2024 IEEE International Conference on Big Data (BigData), Washington, DC, USA.","DOI":"10.1109\/BigData62323.2024.10826063"},{"key":"ref_28","unstructured":"Merilehto, J. (2024). From PDFs to Structured Data: Utilizing LLM Analysis in Sports Database Management. arXiv."},{"key":"ref_29","doi-asserted-by":"crossref","unstructured":"Wiest, I.C., Lessmann, M.E., Wolf, F., Ferber, D., Van Treeck, M., Zhu, J., Ebert, M.P., Westphalen, C.B., Wermke, M., and Kather, J.N. (2024). Anonymizing medical documents with local, privacy preserving large language models: The LLM-Anonymizer. medRxiv.","DOI":"10.1101\/2024.06.11.24308355"},{"key":"ref_30","unstructured":"(2025, January 30). Assembleia da Rep\u00fablica. Available online: https:\/\/www.parlamento.pt."},{"key":"ref_31","unstructured":"(2025, January 31). Governo de Portugal, Available online: https:\/\/www.portugal.gov.pt\/pt\/gc24\/primeiro-ministro,organization=Rep\u00fablicaPortuguesa."},{"key":"ref_32","unstructured":"(2025, February 01). European Council. Available online: https:\/\/www.consilium.europa.eu\/en\/european-council."},{"key":"ref_33","unstructured":"(2025, February 01). European Commission, Official Website. Available online: https:\/\/commission.europa.eu\/index_en."},{"key":"ref_34","unstructured":"(2025, February 01). European Parliament. Available online: https:\/\/www.europarl.europa.eu\/portal\/en."},{"key":"ref_35","unstructured":"(2025, February 01). CNCS\u2014Centro Nacional de Ciberseguran\u00e7a, Available online: https:\/\/www.cncs.gov.pt."},{"key":"ref_36","unstructured":"(2025, February 01). ENISA. Available online: https:\/\/www.enisa.europa.eu."},{"key":"ref_37","unstructured":"(2025, February 01). ISO\u2014International Organization for Standardization. Available online: https:\/\/www.iso.org\/home.html."},{"key":"ref_38","unstructured":"(2018). Information Technology\u2014Security Techniques\u2014Information Security Management Systems\u2014Overview and Vocabulary (Standard No. ISO-27000)."},{"key":"ref_39","unstructured":"(2025, February 01). Official PCI Security Standards Council Site. Available online: https:\/\/www.pcisecuritystandards.org."},{"key":"ref_40","unstructured":"(2025, February 01). No Direito Portugu\u00eas, Qual a Diferen\u00e7a Entre Uma lei, Um Decreto-lei e uma Portaria?. Available online: https:\/\/ffms.pt\/pt-pt\/direitos-e-deveres\/no-direito-portugues-qual-diferenca-entre-uma-lei-um-decreto-lei-e-uma-portaria."},{"key":"ref_41","unstructured":"Infop\u00e9dia (2025, February 01). Regulamento\u2014Infop\u00e9dia. Available online: https:\/\/www.infopedia.pt\/apoio\/artigos\/$regulamento."},{"key":"ref_42","unstructured":"(2025, February 01). Tipos de Legisla\u00e7\u00e3o | Uni\u00e3o Europeia. Available online: https:\/\/european-union.europa.eu\/institutions-law-budget\/law\/types-legislation_pt."},{"key":"ref_43","unstructured":"Contribuidores dos Projetos da Wikimedia (2025, February 01). Norma t\u00e9cnica \u2013 Wikip\u00e9dia, a Enciclop\u00e9dia Livre. Available online: https:\/\/pt.wikipedia.org\/w\/index.php?title=Norma_t%C3%A9cnica&oldid=66145026."},{"key":"ref_44","unstructured":"(2025, February 01). | StandICT.eu 2026. Available online: https:\/\/standict.eu."},{"key":"ref_45","unstructured":"(2025, February 01). Cyber Policy Portal. Available online: https:\/\/cyberpolicyportal.org."},{"key":"ref_46","unstructured":"(2025, February 01). Publications. Available online: https:\/\/www.enisa.europa.eu\/publications#c3=2014&c3=2024&c3=false&c5=publicationDate&reversed=on&b_start=0."},{"key":"ref_47","unstructured":"(2025, February 01). National Cyber Security Strategies\u2014Interactive Map. Available online: https:\/\/tools.enisa.europa.eu\/topics\/national-cyber-security-strategies\/ncss-map\/national-cyber-security-strategies-interactive-map."},{"key":"ref_48","unstructured":"(2025, February 01). Country Wiki\u2014Octopus Cybercrime Community\u2014 www.coe.int. Available online: https:\/\/www.coe.int\/en\/web\/octopus\/country-wiki."},{"key":"ref_49","unstructured":"(2025, February 01). DataGuidance. Available online: https:\/\/www.dataguidance.com."},{"key":"ref_50","unstructured":"(2025, February 01). EU Law\u2014EUR-Lex. Available online: https:\/\/eur-lex.europa.eu\/homepage.html?locale=en."},{"key":"ref_51","unstructured":"(2025, February 01). CNCS\u2014Observat\u00f3rio de Ciberseguran\u00e7a, Available online: https:\/\/www.cncs.gov.pt\/pt\/observatorio."},{"key":"ref_52","unstructured":"(2025, February 01). CNCS\u2014Quadro Nacional, Available online: https:\/\/www.cncs.gov.pt\/pt\/quadro-nacional."},{"key":"ref_53","unstructured":"(2025, February 01). Di\u00e1rio da Rep\u00fablica. Available online: https:\/\/diariodarepublica.pt\/dr\/home."},{"key":"ref_54","unstructured":"(2025, February 01). MongoDB: The Developer Data Platform. Available online: https:\/\/www.mongodb.com."},{"key":"ref_55","unstructured":"(2025, February 01). OpenAI API. Available online: https:\/\/openai.com\/api."},{"key":"ref_56","unstructured":"(2025, February 01). Welcome to Flask\u2014Flask Documentation (3.0.x). Available online: https:\/\/flask.palletsprojects.com\/en\/stable\/changes\/#version-3-0-0."},{"key":"ref_57","unstructured":"(2025, February 01). Vuetify\u2014A Vue Component Framework. Available online: https:\/\/vuetifyjs.com\/en\/#installation."},{"key":"ref_58","unstructured":"(2025, February 01). Vis-Network. Available online: https:\/\/github.com\/visjs\/vis-network."},{"key":"ref_59","unstructured":"(2024, May 21). Chart.js. Available online: https:\/\/www.chartjs.org."},{"key":"ref_60","unstructured":"(2025, February 01). ChatGPT. Available online: https:\/\/openai.com\/chatgpt."},{"key":"ref_61","unstructured":"(2025, February 10). OpenAI. Available online: https:\/\/openai.com."},{"key":"ref_62","unstructured":"(2025, February 01). OpenAI Models. Available online: https:\/\/platform.openai.com\/docs\/models."},{"key":"ref_63","unstructured":"(2025, March 01). Pricing. Available online: https:\/\/openai.com\/api\/pricing\/."},{"key":"ref_64","unstructured":"SimFin (2025, February 01). Pdf-Crawler. Available online: https:\/\/github.com\/SimFin\/pdf-crawler\/tree\/master."},{"key":"ref_65","unstructured":"Openai (2025, February 01). Tiktoken. Available online: https:\/\/github.com\/openai\/tiktoken."}],"container-title":["Information"],"original-title":[],"language":"en","link":[{"URL":"https:\/\/www.mdpi.com\/2078-2489\/16\/3\/205\/pdf","content-type":"unspecified","content-version":"vor","intended-application":"similarity-checking"}],"deposited":{"date-parts":[[2025,10,9]],"date-time":"2025-10-09T16:47:48Z","timestamp":1760028468000},"score":1,"resource":{"primary":{"URL":"https:\/\/www.mdpi.com\/2078-2489\/16\/3\/205"}},"subtitle":[],"short-title":[],"issued":{"date-parts":[[2025,3,5]]},"references-count":65,"journal-issue":{"issue":"3","published-online":{"date-parts":[[2025,3]]}},"alternative-id":["info16030205"],"URL":"https:\/\/doi.org\/10.3390\/info16030205","relation":{},"ISSN":["2078-2489"],"issn-type":[{"type":"electronic","value":"2078-2489"}],"subject":[],"published":{"date-parts":[[2025,3,5]]}}}