{"status":"ok","message-type":"work","message-version":"1.0.0","message":{"indexed":{"date-parts":[[2026,4,18]],"date-time":"2026-04-18T10:05:13Z","timestamp":1776506713346,"version":"3.51.2"},"reference-count":21,"publisher":"MDPI AG","issue":"2","license":[{"start":{"date-parts":[[2024,5,2]],"date-time":"2024-05-02T00:00:00Z","timestamp":1714608000000},"content-version":"vor","delay-in-days":0,"URL":"https:\/\/creativecommons.org\/licenses\/by\/4.0\/"}],"content-domain":{"domain":[],"crossmark-restriction":false},"short-container-title":["Software"],"abstract":"<jats:p>Document-oriented databases, a type of Not Only SQL (NoSQL) database, are gaining popularity owing to their flexibility in data handling and performance for large-scale data. MongoDB, a typical document-oriented database, is a database that stores data in the JSON format, where the upper field involves lower fields and fields with the same related parent. One feature of this document-oriented database is that data are dynamically stored in an arbitrary location without explicitly defining a schema in advance. This flexibility violates the above property and causes difficulties for application program readability and database maintenance. To address these issues, we propose a reconstruction support method for document structures in MongoDB. The method uses the strength of the Has-A relationship between the parent and child fields, as well as the similarity of field names in the MongoDB documents in natural language processing, to reconstruct the data structure in MongoDB. As a result, the method transforms the parent and child fields into more coherent data structures. 
We evaluated our methods using real-world data and demonstrated their effectiveness.<\/jats:p>","DOI":"10.3390\/software3020010","type":"journal-article","created":{"date-parts":[[2024,5,2]],"date-time":"2024-05-02T12:06:38Z","timestamp":1714651598000},"page":"206-225","update-policy":"https:\/\/doi.org\/10.3390\/mdpi_crossmark_policy","source":"Crossref","is-referenced-by-count":3,"title":["A MongoDB Document Reconstruction Support System Using Natural Language Processing"],"prefix":"10.3390","volume":"3","author":[{"given":"Kohei","family":"Hamaji","sequence":"first","affiliation":[{"name":"Honda Motor Co., Ltd., Haga 321-3321, Tochigi, Japan"}]},{"given":"Yukikazu","family":"Nakamoto","sequence":"additional","affiliation":[{"name":"Department of Information and Data Science, Notre Dame Seishin University, Okayama 700-8516, Okayama, Japan"}]}],"member":"1968","published-online":{"date-parts":[[2024,5,2]]},"reference":[{"key":"ref_1","first-page":"40","article-title":"A Survey on NoSQL Stores","volume":"51","author":"Davoudian","year":"2017","journal-title":"ACM Comput. Surv."},{"key":"ref_2","first-page":"29","article-title":"Development for DB schema reconstruction support tool using natural language processing","volume":"39","author":"Hamaji","year":"2022","journal-title":"Comput. Softw."},{"key":"ref_3","unstructured":"MongoDB (2024, February 02). MongoDB Manual. Available online: https:\/\/www.mongodb.com\/docs\/manual\/."},{"key":"ref_4","unstructured":"Bourhis, P., Reutter, J.L., Su\u00e1rez, F., and Vrgo\u010d, D. (2017, January 14\u201319). JSON: Data Model, Query Languages and Schema Specification. 
Proceedings of the 36th ACM SIGMOD-SIGACT-SIGAI Symposium on Principles of Database Systems, Chicago, IL, USA."},{"key":"ref_5","doi-asserted-by":"crossref","first-page":"7","DOI":"10.1007\/s40607-014-0009-9","article-title":"The Sketch Engine: Ten Years on","volume":"1","author":"Kilgarriff","year":"2014","journal-title":"Lexicography"},{"key":"ref_6","unstructured":"Sketch Engine Team (2024, February 02). CQL\u2014Corpus Query Language. Available online: https:\/\/www.sketchengine.eu\/documentation\/corpus-querying\/."},{"key":"ref_7","unstructured":"Rychl\u00fd, P. (2008, January 5\u20137). A Lexicographer-Friendly Association Score. Proceedings of the Recent Advances in Slavonic Natural Language Processing, Karlova Stud\u00e1nka, Czech Republic."},{"key":"ref_8","doi-asserted-by":"crossref","first-page":"135","DOI":"10.1162\/tacl_a_00051","article-title":"Enriching Word Vectors with Subword Information","volume":"5","author":"Bojanowski","year":"2017","journal-title":"Trans. Assoc. Comput. Linguist."},{"key":"ref_9","unstructured":"Mikolov, T., Chen, K., Corrado, G., and Dean, J. (2013, January 2\u20134). Efficient Estimation of Word Representations in Vector Space. Proceedings of the International Conference on Learning Representations, Scottsdale, AZ, USA."},{"key":"ref_10","first-page":"59","article-title":"A Survey on Density Based Clustering Algorithms for Mining Large Spatial Databases","volume":"31","author":"Parimala","year":"2011","journal-title":"Int. J. Adv. Sci. Technol."},{"key":"ref_11","unstructured":"Sketch Engine Team (2024, February 02). TenTen Corpus Family. Available online: https:\/\/www.sketchengine.eu\/documentation\/."},{"key":"ref_12","unstructured":"Su, T., and Dy, J. (2004, January 15\u201317). A deterministic method for initializing K-means clustering. Proceedings of the International Conference on Tools with Artificial Intelligence, Boca Raton, FL, USA."},{"key":"ref_13","unstructured":"MongoDB (2024, February 02). Compass. 
Available online: https:\/\/www.mongodb.com\/products\/tools\/compass\/."},{"key":"ref_14","unstructured":"NoSQL Manager Group (2024, February 02). How the NoSQL Manager Helps You to Work with MongoDB. Available online: https:\/\/www.mongodbmanager.com\/."},{"key":"ref_15","unstructured":"M\u00f6ller, M.L., Berton, N., Klettke, M., Scherzinger, S., and St\u00f6rl, U. (2019, January 4\u20138). jHound: Large-Scale Profiling of Open JSON Data. Proceedings of the 15th Conference on Database Systems for Business, Technology and Web, Dresden, Germany."},{"key":"ref_16","doi-asserted-by":"crossref","first-page":"922","DOI":"10.14778\/2777598.2777601","article-title":"Schema management for document stores","volume":"8","author":"Wang","year":"2015","journal-title":"Proc. VLDB Endow."},{"key":"ref_17","unstructured":"Klettke, M., St\u00f6rl, U., and Scherzinger, S. (2015, January 2\u20136). Schema extraction and structural outlier detection for JSON-based NoSQL Data Stores. Proceedings of the 16th Conference on Database Systems for Business, Technology and Web, Brussels, Belgium."},{"key":"ref_18","unstructured":"Ruiz, D.S., Morales, S.F., and Molina, J.G. (2015, January 19\u201322). Inferring Versioned Schemas from NoSQL Databases and Its Applications. Proceedings of the International Conference on Conceptual Modeling, Stockholm, Sweden."},{"key":"ref_19","unstructured":"Izquierdo, J.L.C., and Cabot, J. (2013, January 8\u201312). Discovering implicit schemas in JSON data. Proceedings of the International Conference on Web Engineering, Aalborg, Denmark."},{"key":"ref_20","unstructured":"Baaziz, M.A., Lahmar, H.B., Colazzo, D., Ghelli, G., and Sartiani, C. (2013, January 14\u201318). Schema Inference for Massive JSON Datasets. Proceedings of the Extending Database Technology, Venice, Italy."},{"key":"ref_21","first-page":"544","article-title":"Data Modeling Guidelines for NoSQL Document-Store Databases","volume":"9","author":"Imam","year":"2018","journal-title":"Int. J. Adv. 
Comput. Sci. Appl."}],"container-title":["Software"],"original-title":[],"language":"en","link":[{"URL":"https:\/\/www.mdpi.com\/2674-113X\/3\/2\/10\/pdf","content-type":"unspecified","content-version":"vor","intended-application":"similarity-checking"}],"deposited":{"date-parts":[[2025,10,10]],"date-time":"2025-10-10T14:38:56Z","timestamp":1760107136000},"score":1,"resource":{"primary":{"URL":"https:\/\/www.mdpi.com\/2674-113X\/3\/2\/10"}},"subtitle":[],"short-title":[],"issued":{"date-parts":[[2024,5,2]]},"references-count":21,"journal-issue":{"issue":"2","published-online":{"date-parts":[[2024,6]]}},"alternative-id":["software3020010"],"URL":"https:\/\/doi.org\/10.3390\/software3020010","relation":{},"ISSN":["2674-113X"],"issn-type":[{"value":"2674-113X","type":"electronic"}],"subject":[],"published":{"date-parts":[[2024,5,2]]}}}