{"status":"ok","message-type":"work","message-version":"1.0.0","message":{"indexed":{"date-parts":[[2025,10,11]],"date-time":"2025-10-11T00:33:36Z","timestamp":1760142816577,"version":"build-2065373602"},"reference-count":28,"publisher":"MDPI AG","issue":"1","license":[{"start":{"date-parts":[[2024,1,2]],"date-time":"2024-01-02T00:00:00Z","timestamp":1704153600000},"content-version":"vor","delay-in-days":0,"URL":"https:\/\/creativecommons.org\/licenses\/by\/4.0\/"}],"funder":[{"DOI":"10.13039\/501100001809","name":"National Natural Science Foundation of China","doi-asserted-by":"publisher","award":["42201505","622QN352","2021YFF070420304"],"award-info":[{"award-number":["42201505","622QN352","2021YFF070420304"]}],"id":[{"id":"10.13039\/501100001809","id-type":"DOI","asserted-by":"publisher"}]},{"name":"Natural Science Foundation of Hainan Province of China","award":["42201505","622QN352","2021YFF070420304"],"award-info":[{"award-number":["42201505","622QN352","2021YFF070420304"]}]},{"name":"National Key Research and Development Program of China","award":["42201505","622QN352","2021YFF070420304"],"award-info":[{"award-number":["42201505","622QN352","2021YFF070420304"]}]}],"content-domain":{"domain":[],"crossmark-restriction":false},"short-container-title":["MAKE"],"abstract":"<jats:p>Semantic folding theory (SFT) is an emerging cognitive science theory that aims to explain how the human brain processes and organizes semantic information. The distribution of text into semantic grids is key to SFT. We propose a sentence-level semantic division baseline with 100 grids (SSDB-100), the only dataset we are currently aware of that performs a relevant validation of the sentence-level SFT algorithm, to evaluate the validity of text distribution in semantic grids and divide it using classical division algorithms on SSDB-100. In this article, we describe the construction of SSDB-100. First, a semantic division questionnaire with broad coverage was generated by limiting the uncertainty range of the topics and corpus. Subsequently, through an expert survey, 11 human experts provided feedback. Finally, we analyzed and processed the feedback; the average consistency index for the used feedback was 0.856 after eliminating the invalid feedback. SSDB-100 has 100 semantic grids with clear distinctions between the grids, allowing the dataset to be extended using semantic methods.<\/jats:p>","DOI":"10.3390\/make6010003","type":"journal-article","created":{"date-parts":[[2024,1,2]],"date-time":"2024-01-02T10:36:59Z","timestamp":1704191819000},"page":"41-52","update-policy":"https:\/\/doi.org\/10.3390\/mdpi_crossmark_policy","source":"Crossref","is-referenced-by-count":0,"title":["An Evaluative Baseline for Sentence-Level Semantic Division"],"prefix":"10.3390","volume":"6","author":[{"given":"Kuangsheng","family":"Cai","sequence":"first","affiliation":[{"name":"Aerospace Information Research Institute, Chinese Academy of Sciences, Beijing 100094, China"},{"name":"School of Computer and Artificial Intelligence, Zhengzhou University, Zhengzhou 450001, China"}]},{"given":"Zugang","family":"Chen","sequence":"additional","affiliation":[{"name":"Aerospace Information Research Institute, Chinese Academy of Sciences, Beijing 100094, China"},{"name":"School of Computer and Artificial Intelligence, Zhengzhou University, Zhengzhou 450001, China"}]},{"given":"Hengliang","family":"Guo","sequence":"additional","affiliation":[{"name":"School of Computer and Artificial Intelligence, Zhengzhou University, Zhengzhou 450001, China"}]},{"ORCID":"https:\/\/orcid.org\/0000-0001-8651-9505","authenticated-orcid":false,"given":"Shaohua","family":"Wang","sequence":"additional","affiliation":[{"name":"Aerospace Information Research Institute, Chinese Academy of Sciences, Beijing 100094, China"}]},{"given":"Guoqing","family":"Li","sequence":"additional","affiliation":[{"name":"Aerospace Information Research Institute, Chinese Academy of Sciences, Beijing 100094, China"}]},{"given":"Jing","family":"Li","sequence":"additional","affiliation":[{"name":"Aerospace Information Research Institute, Chinese Academy of Sciences, Beijing 100094, China"}]},{"given":"Feng","family":"Chen","sequence":"additional","affiliation":[{"name":"School of Computer and Artificial Intelligence, Zhengzhou University, Zhengzhou 450001, China"}]},{"given":"Hang","family":"Feng","sequence":"additional","affiliation":[{"name":"School of Computer and Artificial Intelligence, Zhengzhou University, Zhengzhou 450001, China"}]}],"member":"1968","published-online":{"date-parts":[[2024,1,2]]},"reference":[{"key":"ref_1","doi-asserted-by":"crossref","first-page":"376","DOI":"10.1098\/rstb.2010.0223","article-title":"Unity and diversity in human language","volume":"366","author":"Fitch","year":"2011","journal-title":"Philos. Trans. R. Soc. Lond. B Biol. Sci."},{"key":"ref_2","doi-asserted-by":"crossref","first-page":"238","DOI":"10.1006\/brln.2000.2339","article-title":"Language Representation in the Human Brain: Evidence from Cortical Mapping","volume":"74","author":"Bhatnagar","year":"2000","journal-title":"Brain Lang."},{"key":"ref_3","doi-asserted-by":"crossref","first-page":"55","DOI":"10.1126\/science.aax0289","article-title":"The neurobiology of language beyond single-word processing","volume":"366","author":"Hagoort","year":"2019","journal-title":"Science"},{"key":"ref_4","doi-asserted-by":"crossref","first-page":"21","DOI":"10.1109\/MSPEC.2007.339647","article-title":"Why Can\u2019t a Computer be more Like a Brain?","volume":"44","author":"Hawkins","year":"2007","journal-title":"IEEE Spectr."},{"key":"ref_5","unstructured":"Hawkins, J., Ahmad, S., Purdy, S., and Lavin, A. (2023, November 08). Biological and Machine Intelligence. Release 0.4. 2016\u20132020. Available online: https:\/\/numenta.com\/resources\/biological-and-machine-intelligence\/."},{"key":"ref_6","unstructured":"Ahmad, S., and Hawkins, J. (2015). Properties of Sparse Distributed Representations and their Application to Hierarchical Temporal Memory. arXiv."},{"key":"ref_7","unstructured":"Purdy, S. (2016). Encoding Data for HTM Systems. arXiv."},{"key":"ref_8","unstructured":"Ahmad, S., and Hawkins, J. (2016). How do neurons operate on sparse distributed representations? A mathematical theory of sparsity, neurons and active dendrites. arXiv."},{"key":"ref_9","doi-asserted-by":"crossref","first-page":"111","DOI":"10.3389\/fncom.2017.00111","article-title":"The HTM Spatial Pooler-A Neocortical Algorithm for Online Sparse Distributed Coding","volume":"11","author":"Cui","year":"2017","journal-title":"Front. Comput. Neurosci."},{"key":"ref_10","unstructured":"Webber, F.D.S. (2015). Semantic Folding Theory\u2014White Paper, Cortical.io."},{"key":"ref_11","doi-asserted-by":"crossref","first-page":"5585238","DOI":"10.1155\/2021\/5585238","article-title":"Anomalous Behavior Detection Framework Using HTM-Based Semantic Folding Technique","volume":"2021","author":"Khan","year":"2021","journal-title":"Comput. Math. Methods Med."},{"key":"ref_12","doi-asserted-by":"crossref","first-page":"1","DOI":"10.1145\/3439726","article-title":"Deep Learning-based Text Classification: A Comprehensive Review","volume":"54","author":"Minaee","year":"2022","journal-title":"Acm Comput. Surv."},{"key":"ref_13","doi-asserted-by":"crossref","first-page":"157","DOI":"10.1017\/S0269888914000277","article-title":"A survey on text mining in social networks","volume":"30","author":"Irfan","year":"2015","journal-title":"Knowl. Eng. Rev."},{"key":"ref_14","doi-asserted-by":"crossref","first-page":"664","DOI":"10.1016\/j.neucom.2017.06.053","article-title":"A review of clustering techniques and developments","volume":"267","author":"Saxena","year":"2017","journal-title":"Neurocomputing"},{"key":"ref_15","doi-asserted-by":"crossref","unstructured":"Wang, D., Li, T., Zhu, S., and Ding, C. (2008, January 20\u201324). Multi-document summarization via sentence-level semantic analysis and symmetric matrix factorization. Proceedings of the 31st Annual International ACM SIGIR Conference on Research and Development in Information Retrieval, Singapore.","DOI":"10.1145\/1390334.1390387"},{"key":"ref_16","doi-asserted-by":"crossref","unstructured":"Zha, H. (2002, January 11\u201315). Generic summarization and keyphrase extraction using mutual reinforcement principle and sentence clustering. Proceedings of the 25th Annual International ACM SIGIR Conference on Research and Development in Information Retrieval, Tampere, Finland.","DOI":"10.1145\/564376.564398"},{"key":"ref_17","doi-asserted-by":"crossref","unstructured":"Geiss, J. (2009, January 4). Creating a Gold Standard for Sentence Clustering in Multi-Document Summarization. Proceedings of the ACL-IJCNLP 2009 Student Research Workshop, Suntec, Singapore.","DOI":"10.3115\/1667884.1667898"},{"key":"ref_18","unstructured":"(2023, November 08). Yelp Dataset. Available online: https:\/\/www.yelp.com\/dataset."},{"key":"ref_19","unstructured":"(2023, November 08). Large Movie Review Dataset. Available online: http:\/\/ai.stanford.edu\/~amaas\/data\/sentiment\/."},{"key":"ref_20","unstructured":"Richard, S., Perelygin, A., Wu, J., Chuang, J., Manning, C.D., Ng, A., and Potts, C. (2013, January 18\u201321). Recursive Deep Models for Semantic Compositionality Over a Sentiment Treebank. Proceedings of the 2013 Conference on Empirical Methods in Natural Language Processing, Seattle, WA, USA."},{"key":"ref_21","unstructured":"Kaggle (2023, November 08). Consumer Reviews of Amazon Products. Available online: https:\/\/www.kaggle.com\/datasets\/datafiniti\/consumer-reviews-of-amazon-products."},{"key":"ref_22","unstructured":"(2023, November 08). 20 Newsgroups. Available online: http:\/\/qwone.com\/~jason\/20Newsgroups\/."},{"key":"ref_23","unstructured":"Reuters (2023, November 08). Available online: https:\/\/martin-thoma.com\/nlp-reuters."},{"key":"ref_24","doi-asserted-by":"crossref","unstructured":"Seno, E.R., and Nunes, M.D. (2008, January 8\u201310). Some Experiments on Clustering Similar Sentences of Texts in Portuguese. Proceedings of the 8th International Conference on Computational Processing of the Portuguese Language, Aveiro, Portugal.","DOI":"10.1007\/978-3-540-85980-2_14"},{"key":"ref_25","unstructured":"(2023, November 08). Wikipedia Dataset. Available online: https:\/\/dumps.wikimedia.org\/."},{"key":"ref_26","unstructured":"(2023, November 08). English-Corpora. Available online: https:\/\/www.english-corpora.org\/."},{"key":"ref_27","doi-asserted-by":"crossref","first-page":"1","DOI":"10.1080\/01690969108406936","article-title":"Contextual correlates of semantic similarity","volume":"6","author":"Miller","year":"1991","journal-title":"Lang. Cognitive Proc."},{"key":"ref_28","unstructured":"Toral, A., Mu\u00f1oz, R., and Monachini, M. (2008, January 28\u201330). Named Entity WordNet. Proceedings of the International Conference on Language Resources and Evaluation, Marrakech, Morocco."}],"container-title":["Machine Learning and Knowledge Extraction"],"original-title":[],"language":"en","link":[{"URL":"https:\/\/www.mdpi.com\/2504-4990\/6\/1\/3\/pdf","content-type":"unspecified","content-version":"vor","intended-application":"similarity-checking"}],"deposited":{"date-parts":[[2025,10,10]],"date-time":"2025-10-10T13:38:30Z","timestamp":1760103510000},"score":1,"resource":{"primary":{"URL":"https:\/\/www.mdpi.com\/2504-4990\/6\/1\/3"}},"subtitle":[],"short-title":[],"issued":{"date-parts":[[2024,1,2]]},"references-count":28,"journal-issue":{"issue":"1","published-online":{"date-parts":[[2024,3]]}},"alternative-id":["make6010003"],"URL":"https:\/\/doi.org\/10.3390\/make6010003","relation":{},"ISSN":["2504-4990"],"issn-type":[{"type":"electronic","value":"2504-4990"}],"subject":[],"published":{"date-parts":[[2024,1,2]]}}}