{"status":"ok","message-type":"work","message-version":"1.0.0","message":{"indexed":{"date-parts":[[2026,3,31]],"date-time":"2026-03-31T15:11:06Z","timestamp":1774969866556,"version":"3.50.1"},"reference-count":41,"publisher":"MDPI AG","issue":"4","license":[{"start":{"date-parts":[[2026,3,31]],"date-time":"2026-03-31T00:00:00Z","timestamp":1774915200000},"content-version":"vor","delay-in-days":0,"URL":"https:\/\/creativecommons.org\/licenses\/by\/4.0\/"}],"content-domain":{"domain":[],"crossmark-restriction":false},"short-container-title":["Computers"],"abstract":"<jats:p>The rapid expansion of scientific publications indexed in multiple bibliographic databases has created new computational challenges for large-scale scientometric analysis. Differences in metadata schemas, identifier structures, and export formats across indexing systems such as Web of Science and Scopus introduce inconsistencies that may distort network-based bibliometric analyses. These issues affect duplicate detection, node identification, and network topology construction. This study proposes a reproducible computational pipeline for cross-database scientometric network construction. The framework formalizes the preprocessing workflow into explicit computational modules, including metadata harmonization, deterministic duplicate detection, sparse graph construction, normalization, and structural diagnostics. The proposed architecture separates preprocessing stages into reproducible algorithmic components, enabling transparent evaluation of methodological assumptions. Empirical evaluation using an interdisciplinary dataset of 317 publications (1990\u20132023) demonstrates that deterministic preprocessing significantly improves network stability and preserves clustering structure. Structural diagnostics based on modularity, Herfindahl\u2013Hirschman Index, Shannon entropy, and Gini coefficient provide multi-dimensional evaluation of network topology. 
Scalability experiments confirm near-linear computational growth under sparse graph construction. The principal contribution of this work lies in the formalization of a transparent and extensible computational architecture for reproducible scientometric analysis. The proposed pipeline supports reliable cross-database integration and enables scalable knowledge-mapping applications in interdisciplinary research domains.<\/jats:p>","DOI":"10.3390\/computers15040213","type":"journal-article","created":{"date-parts":[[2026,3,31]],"date-time":"2026-03-31T13:23:17Z","timestamp":1774963397000},"page":"213","update-policy":"https:\/\/doi.org\/10.3390\/mdpi_crossmark_policy","source":"Crossref","is-referenced-by-count":0,"title":["A Reproducible Computational Pipeline for Cross-Database Scientometric Network Construction: Architecture, Algorithms, and Structural Validation"],"prefix":"10.3390","volume":"15","author":[{"ORCID":"https:\/\/orcid.org\/0000-0002-2027-6958","authenticated-orcid":false,"given":"Denny","family":"Moreno-Castro","sequence":"first","affiliation":[{"name":"Facultad de Ciencias e Ingenier\u00eda, Universidad Estatal de Milagro (UNEMI), Milagro 091050, Ecuador"},{"name":"Programa de P\u00f3s-Gradua\u00e7\u00e3o em Ci\u00eancia, Tecnologia e Inova\u00e7\u00e3o Agropecu\u00e1ria (PPGCTIA), Universidade Federal Rural do Rio de Janeiro (UFRRJ), Serop\u00e9dica 23890-000, Rio de Janeiro, Brazil"}],"role":[{"role":"author","vocabulary":"crossref"}]},{"ORCID":"https:\/\/orcid.org\/0000-0003-0178-4604","authenticated-orcid":false,"given":"Omar Orlando","family":"Franco-Arias","sequence":"additional","affiliation":[{"name":"Facultad de Ciencias e Ingenier\u00eda, Universidad Estatal de Milagro (UNEMI), Milagro 091050, Ecuador"}],"role":[{"role":"author","vocabulary":"crossref"}]},{"ORCID":"https:\/\/orcid.org\/0000-0003-3744-8928","authenticated-orcid":false,"given":"C\u00edcero","family":"Pimenteira","sequence":"additional","affiliation":[{"name":"Programa de 
P\u00f3s-Gradua\u00e7\u00e3o em Ci\u00eancia, Tecnologia e Inova\u00e7\u00e3o Agropecu\u00e1ria (PPGCTIA), Universidade Federal Rural do Rio de Janeiro (UFRRJ), Serop\u00e9dica 23890-000, Rio de Janeiro, Brazil"}],"role":[{"role":"author","vocabulary":"crossref"}]},{"ORCID":"https:\/\/orcid.org\/0000-0003-2371-8253","authenticated-orcid":false,"given":"Nicol\u00e1s","family":"M\u00e1rquez","sequence":"additional","affiliation":[{"name":"Escuela de Ingenier\u00eda Comercial, Facultad de Econom\u00eda y Negocios, Universidad Santo Tom\u00e1s, Talca 3460000, Chile"}],"role":[{"role":"author","vocabulary":"crossref"}]},{"given":"Cristian","family":"Vidal-Silva","sequence":"additional","affiliation":[{"name":"Facultad de Ingenier\u00eda y Negocios, Universidad de Las Am\u00e9ricas, Manuel Montt 948, Providencia, Santiago 7500975, Chile"}],"role":[{"role":"author","vocabulary":"crossref"}]}],"member":"1968","published-online":{"date-parts":[[2026,3,31]]},"reference":[{"key":"ref_1","first-page":"959","article-title":"bibliometrix: An R-tool for comprehensive science mapping analysis","volume":"11","author":"Aria","year":"2017","journal-title":"J. Inf."},{"key":"ref_2","doi-asserted-by":"crossref","first-page":"1382","DOI":"10.1002\/asi.21525","article-title":"Science mapping software tools: Review, analysis, and cooperative study among tools","volume":"62","author":"Cobo","year":"2011","journal-title":"J. Am. Soc. Inf. Sci. Technol."},{"key":"ref_3","first-page":"212","article-title":"Selecting publication keywords for domain analysis in bibliometrics: A comparison of three methods","volume":"10","author":"Chen","year":"2016","journal-title":"J. Inf."},{"key":"ref_4","doi-asserted-by":"crossref","unstructured":"Gl\u00e4nzel, W., Moed, H.F., Schmoch, U., and Thelwall, M. (2019). Science Mapping and the Identification of Topics: Theoretical and Methodological Considerations. 
Springer Handbook of Science and Technology Indicators, Springer.","DOI":"10.1007\/978-3-030-02511-3"},{"key":"ref_5","doi-asserted-by":"crossref","first-page":"100766","DOI":"10.1016\/j.simpa.2025.100766","article-title":"PubMedMetaTool: Automated Metadata Extraction from PubMed Using Python for Bibliometric Analysis","volume":"24","author":"Souza","year":"2025","journal-title":"Softw. Impacts"},{"key":"ref_6","doi-asserted-by":"crossref","unstructured":"Li, X., Chiabrando, F., and Sammartano, G. (2026). Machine Learning and Deep Learning for Cultural Heritage Conservation: A Bibliometric and Task-Oriented Review. Remote Sens., 18.","DOI":"10.3390\/rs18040628"},{"key":"ref_7","doi-asserted-by":"crossref","first-page":"523","DOI":"10.1007\/s11192-009-0146-3","article-title":"Software survey: VOSviewer, a computer program for bibliometric mapping","volume":"84","author":"Waltman","year":"2010","journal-title":"Scientometrics"},{"key":"ref_8","doi-asserted-by":"crossref","unstructured":"Newman, M.E.J. (2010). Networks: An Introduction, Oxford University Press.","DOI":"10.1093\/acprof:oso\/9780199206650.003.0001"},{"key":"ref_9","doi-asserted-by":"crossref","unstructured":"Gl\u00e4nzel, W., Moed, H.F., Schmoch, U., and Thelwall, M. (2019). Creation and Analysis of Large-Scale Bibliometric Networks. Springer Handbook of Science and Technology Indicators, Springer.","DOI":"10.1007\/978-3-030-02511-3"},{"key":"ref_10","doi-asserted-by":"crossref","first-page":"1238","DOI":"10.1007\/s10489-020-02052-0","article-title":"Local Community Detection Algorithm Based on Local Modularity Density","volume":"52","author":"Guo","year":"2022","journal-title":"Appl. Intell."},{"key":"ref_11","doi-asserted-by":"crossref","first-page":"1217","DOI":"10.1007\/s12652-021-03374-8","article-title":"Gravity Algorithm for the Community Detection of Large-Scale Network","volume":"14","author":"Arasteh","year":"2023","journal-title":"J. Ambient. Intell. Humaniz. 
Comput."},{"key":"ref_12","doi-asserted-by":"crossref","unstructured":"Dehmer, M., Emmert-Streib, F., Chen, Z., Li, X., and Shi, Y. (2016). Application of Graph Entropy for Knowledge Discovery and Data Mining in Bibliometric Data. Mathematical Foundations and Applications of Graph Entropy, Wiley. Chapter 9.","DOI":"10.1002\/9783527693245"},{"key":"ref_13","doi-asserted-by":"crossref","first-page":"P10008","DOI":"10.1088\/1742-5468\/2008\/10\/P10008","article-title":"Fast unfolding of communities in large networks","volume":"2008","author":"Blondel","year":"2008","journal-title":"J. Stat. Mech. Theory Exp."},{"key":"ref_14","doi-asserted-by":"crossref","first-page":"558","DOI":"10.1177\/09610006211036734","article-title":"Application of graph theory in the library domain\u2013Building a faceted framework based on a literature review","volume":"54","year":"2022","journal-title":"J. Librariansh. Inf. Sci."},{"key":"ref_15","doi-asserted-by":"crossref","unstructured":"Conte, M.L., Boisvert, P., Barrison, P., Seifi, F., Landis-Lewis, Z., Flynn, A., and Friedman, C.P. (2024). Ten Simple Rules to Make Computable Knowledge Shareable and Reusable. PLoS Comput. Biol., 20.","DOI":"10.1371\/journal.pcbi.1012179"},{"key":"ref_16","doi-asserted-by":"crossref","first-page":"715","DOI":"10.5530\/jscires.20041144","article-title":"Exploring the Publication Metadata Fields in Web of Science, Scopus and Dimensions: Possibilities and Ease of doing Scientometric Analysis","volume":"13","author":"Singh","year":"2025","journal-title":"J. Scientometr. 
Res."},{"key":"ref_17","doi-asserted-by":"crossref","first-page":"5613","DOI":"10.1007\/s11192-022-04475-7","article-title":"Combining Web of Science and Scopus datasets in citation-based literature study","volume":"127","author":"Kumpulainen","year":"2022","journal-title":"Scientometrics"},{"key":"ref_18","doi-asserted-by":"crossref","first-page":"5191","DOI":"10.1007\/s11192-025-05415-x","article-title":"A comprehensive approach to preprocessing data for bibliometric analysis","volume":"130","author":"Nowakowska","year":"2025","journal-title":"Scientometrics"},{"key":"ref_19","doi-asserted-by":"crossref","first-page":"4573","DOI":"10.1007\/s11192-024-05076-2","article-title":"An Open-Source Tool for Merging Data from Multiple Citation Databases","volume":"129","year":"2024","journal-title":"Scientometrics"},{"key":"ref_20","doi-asserted-by":"crossref","unstructured":"Guillen-Aguinaga, M., Aguinaga-Ontoso, E., Guillen-Aguinaga, L., Guillen-Grima, F., and Aguinaga-Ontoso, I. (2025). Data Quality in the Age of AI: A Review of Governance, Ethics, and the FAIR Principles. Data, 10.","DOI":"10.20944\/preprints202509.1572.v1"},{"key":"ref_21","doi-asserted-by":"crossref","unstructured":"Marl\u00e9s-S\u00e1enz, E., G\u00f3mez-Luna, E., Guerrero, J.M., and Vasquez, J.C. (2025). Innovative Bibliometric Methodology: A New Big Data-Based Framework for Scientific Research. Energies, 18.","DOI":"10.3390\/en18102437"},{"key":"ref_22","doi-asserted-by":"crossref","first-page":"21923","DOI":"10.1007\/s00521-024-10371-3","article-title":"A Review of Multimodal-Based Emotion Recognition Techniques for Cyberbullying Detection in Online Social Media Platforms","volume":"36","author":"Wang","year":"2024","journal-title":"Neural Comput. 
Appl."},{"key":"ref_23","doi-asserted-by":"crossref","first-page":"9799","DOI":"10.1109\/ACCESS.2025.3649212","article-title":"Metadata Integration: A Systematic Review on Methods, Challenges and Applications Across Multiple Domains","volume":"14","author":"Oliveira","year":"2026","journal-title":"IEEE Access"},{"key":"ref_24","doi-asserted-by":"crossref","unstructured":"Doreian, P., Batagelj, V., and Ferligoj, A. (2019). Bibliometric Analyses of the Network Clustering Literature. Advances in Network Clustering and Blockmodeling, Wiley. Chapter 2.","DOI":"10.1002\/9781119483298"},{"key":"ref_25","doi-asserted-by":"crossref","first-page":"349","DOI":"10.1177\/0165551520962775","article-title":"Analysis of Direct Citation, Co-Citation and Bibliographic Coupling in Scientific Topic Identification","volume":"48","author":"Kleminski","year":"2022","journal-title":"J. Inf. Sci."},{"key":"ref_26","doi-asserted-by":"crossref","first-page":"31","DOI":"10.1162\/qss_a_00286","article-title":"Completeness Degree of Publication Metadata in Eight Free-Access Scholarly Databases","volume":"5","author":"Ortega","year":"2024","journal-title":"Quant. Sci. Stud."},{"key":"ref_27","doi-asserted-by":"crossref","unstructured":"Panagea, I.S., Dangol, A., Olijslagers, M., Diels, J., and Wyseure, G. (2025). A Database Schema for Standardized Data and Metadata Collection in Agricultural Experiments. Land, 14.","DOI":"10.3390\/land14091816"},{"key":"ref_28","first-page":"20250035","article-title":"Author Name Disambiguation in Scholarly Research: A Bibliometric Perspective","volume":"10","author":"Shamly","year":"2026","journal-title":"Open Inf. Sci."},{"key":"ref_29","doi-asserted-by":"crossref","first-page":"45","DOI":"10.31181\/sems31202535k","article-title":"Bibliometric Analysis: Comprehensive Insights into Tools, Techniques, Applications, and Solutions for Research Excellence","volume":"3","author":"Kumar","year":"2025","journal-title":"Spectr. Eng. Manag. 
Sci."},{"key":"ref_30","doi-asserted-by":"crossref","unstructured":"Arsalan, M.H., Mubin, O., Al Mahmud, A., Khan, I.A., and Hassan, A.J. (2025). Mapping Data-Driven Research Impact Science: The Role of Machine Learning and Artificial Intelligence. Metrics, 2.","DOI":"10.3390\/metrics2020005"},{"key":"ref_31","doi-asserted-by":"crossref","unstructured":"Ferrer-Serrano, M., Fuentelsaz, L., and Latorre-Mart\u00ednez, M.P. (J. Knowl. Econ., 2025). Knowledge Transfer and Networks: A Bibliometric Approach Through Performance Analysis, Science Mapping, and Dynamic Network Analysis, J. Knowl. Econ., in press.","DOI":"10.1007\/s13132-025-02814-6"},{"key":"ref_32","doi-asserted-by":"crossref","first-page":"240","DOI":"10.1002\/asi.24499","article-title":"Harmonizing and publishing heterogeneous premodern manuscript metadata as Linked Open Data","volume":"73","author":"Koho","year":"2021","journal-title":"J. Assoc. Inf. Sci. Technol."},{"key":"ref_33","doi-asserted-by":"crossref","first-page":"e52967","DOI":"10.2196\/52967","article-title":"Use of Metadata-Driven Approaches for Data Harmonization in the Medical Domain: Scoping Review","volume":"12","author":"Peng","year":"2024","journal-title":"JMIR Med. Inform."},{"key":"ref_34","doi-asserted-by":"crossref","unstructured":"Kasul, N., and Halicioglu, F.H. (2026). A Bibliometric Analysis of Collaboration in Building Information Modeling: Emerging Dynamics and Future Trends. Buildings, 16.","DOI":"10.3390\/buildings16050986"},{"key":"ref_35","doi-asserted-by":"crossref","first-page":"16","DOI":"10.1007\/s00799-025-00425-9","article-title":"Validating and Monitoring Bibliographic and Citation Data in OpenCitations Collections","volume":"26","author":"Heibi","year":"2025","journal-title":"Int. J. Digit. 
Libr."},{"key":"ref_36","doi-asserted-by":"crossref","first-page":"845","DOI":"10.1007\/s11192-026-05540-1","article-title":"Analysing the Coverage of the University of Bologna\u2019s Bibliographic and Citation Metadata in OpenCitations Collections","volume":"131","author":"Andreose","year":"2026","journal-title":"Scientometrics"},{"key":"ref_37","doi-asserted-by":"crossref","first-page":"765","DOI":"10.1007\/s00799-024-00398-1","article-title":"Author Name Disambiguation Literature Review with Consolidated Meta-Analytic Approach","volume":"25","author":"Rodrigues","year":"2024","journal-title":"Int. J. Digit. Libr."},{"key":"ref_38","first-page":"1145","article-title":"Handling Heterogeneous Data in Knowledge Graphs: A Survey","volume":"21","author":"Singh","year":"2022","journal-title":"J. Web Eng."},{"key":"ref_39","first-page":"1178","article-title":"Constructing Bibliometric Networks: A Comparison Between Full and Fractional Counting","volume":"10","author":"Waltman","year":"2016","journal-title":"J. Inf."},{"key":"ref_40","doi-asserted-by":"crossref","first-page":"955","DOI":"10.1016\/j.respol.2012.02.013","article-title":"Sustainability transitions: An emerging field of research and its prospects","volume":"41","author":"Markard","year":"2012","journal-title":"Res. Policy"},{"key":"ref_41","first-page":"1","article-title":"An agenda for sustainability transitions research: State of the art and future directions","volume":"31","author":"Geels","year":"2019","journal-title":"Environ. Innov. Soc. 
Transitions"}],"container-title":["Computers"],"original-title":[],"language":"en","link":[{"URL":"https:\/\/www.mdpi.com\/2073-431X\/15\/4\/213\/pdf","content-type":"unspecified","content-version":"vor","intended-application":"similarity-checking"}],"deposited":{"date-parts":[[2026,3,31]],"date-time":"2026-03-31T13:41:21Z","timestamp":1774964481000},"score":1,"resource":{"primary":{"URL":"https:\/\/www.mdpi.com\/2073-431X\/15\/4\/213"}},"subtitle":[],"short-title":[],"issued":{"date-parts":[[2026,3,31]]},"references-count":41,"journal-issue":{"issue":"4","published-online":{"date-parts":[[2026,4]]}},"alternative-id":["computers15040213"],"URL":"https:\/\/doi.org\/10.3390\/computers15040213","relation":{},"ISSN":["2073-431X"],"issn-type":[{"value":"2073-431X","type":"electronic"}],"subject":[],"published":{"date-parts":[[2026,3,31]]}}}