{"status":"ok","message-type":"work","message-version":"1.0.0","message":{"indexed":{"date-parts":[[2026,3,16]],"date-time":"2026-03-16T10:17:56Z","timestamp":1773656276001,"version":"3.50.1"},"reference-count":24,"publisher":"MDPI AG","issue":"4","license":[{"start":{"date-parts":[[2018,12,13]],"date-time":"2018-12-13T00:00:00Z","timestamp":1544659200000},"content-version":"vor","delay-in-days":0,"URL":"https:\/\/creativecommons.org\/licenses\/by\/4.0\/"}],"content-domain":{"domain":[],"crossmark-restriction":false},"short-container-title":["Data"],"abstract":"<jats:p>Similar text fragments extraction from weakly formalized data is the task of natural language processing and intelligent data analysis and is used for solving the problem of automatic identification of connected knowledge fields. In order to search such common communities in Wikipedia, we propose to use as an additional stage a logical-algebraic model for similar collocations extraction. With Stanford Part-Of-Speech tagger and Stanford Universal Dependencies parser, we identify the grammatical characteristics of collocation words. With WordNet synsets, we choose their synonyms. Our dataset includes Wikipedia articles from different portals and projects. The experimental results show the frequencies of synonymous text fragments in Wikipedia articles that form common information spaces. The number of highly frequented synonymous collocations can obtain an indication of key common up-to-date Wikipedia communities.<\/jats:p>","DOI":"10.3390\/data3040066","type":"journal-article","created":{"date-parts":[[2018,12,14]],"date-time":"2018-12-14T03:58:17Z","timestamp":1544759897000},"page":"66","update-policy":"https:\/\/doi.org\/10.3390\/mdpi_crossmark_policy","source":"Crossref","is-referenced-by-count":1,"title":["Similar Text Fragments Extraction for Identifying Common Wikipedia Communities"],"prefix":"10.3390","volume":"3","author":[{"ORCID":"https:\/\/orcid.org\/0000-0001-6011-135X","authenticated-orcid":false,"given":"Svitlana","family":"Petrasova","sequence":"first","affiliation":[{"name":"Department of Intelligent Computer Systems, National Technical University \u201cKharkiv Polytechnic Institute\u201d, 61002 Kharkiv, Ukraine"}],"role":[{"role":"author","vocabulary":"crossref"}]},{"ORCID":"https:\/\/orcid.org\/0000-0002-9826-0286","authenticated-orcid":false,"given":"Nina","family":"Khairova","sequence":"additional","affiliation":[{"name":"Department of Intelligent Computer Systems, National Technical University \u201cKharkiv Polytechnic Institute\u201d, 61002 Kharkiv, Ukraine"}],"role":[{"role":"author","vocabulary":"crossref"}]},{"given":"W\u0142odzimierz","family":"Lewoniewski","sequence":"additional","affiliation":[{"name":"Department of Information Systems, Poznan University of Economics and Business, 61-875 Poznan, Poland"}],"role":[{"role":"author","vocabulary":"crossref"}]},{"ORCID":"https:\/\/orcid.org\/0000-0001-8318-3794","authenticated-orcid":false,"given":"Orken","family":"Mamyrbayev","sequence":"additional","affiliation":[{"name":"Institute of Information and Computational Technologies, Almaty 050010, Kazakhstan"}],"role":[{"role":"author","vocabulary":"crossref"}]},{"given":"Kuralay","family":"Mukhsina","sequence":"additional","affiliation":[{"name":"Department of Informatics, Al-Farabi Kazakh National University, Almaty 050040, Kazakhstan"}],"role":[{"role":"author","vocabulary":"crossref"}]}],"member":"1968","published-online":{"date-parts":[[2018,12,13]]},"reference":[{"key":"ref_1","unstructured":"(2018, November 30). Wikipedia Community. Available online: https:\/\/en.wikipedia.org\/wiki\/Wikipedia_community."},{"key":"ref_2","unstructured":"(2018, September 15). Research Fronts. Available online: https:\/\/clarivate.com.cn\/research_fronts_2017\/2017_research_front_en.pdf."},{"key":"ref_3","first-page":"89","article-title":"Scientometric databases and their quantitative indices (Part I. Comparative characteristic of scientometric databases)","volume":"8","author":"Chaikovsky","year":"2013","journal-title":"Bull. Natl. Acad. Sci. Ukraine"},{"key":"ref_4","doi-asserted-by":"crossref","first-page":"317","DOI":"10.1007\/s11192-010-0265-x","article-title":"Correlation between impact and collaboration","volume":"86","author":"Hsu","year":"2011","journal-title":"Scientometrics"},{"key":"ref_5","first-page":"210","article-title":"Bibliomertrics\u2014What and how we can evaluate in science","volume":"44","year":"2013","journal-title":"Large Syst. Manag."},{"key":"ref_6","doi-asserted-by":"crossref","first-page":"103","DOI":"10.1007\/s11192-016-1844-2","article-title":"Towards a new perspective on context based citation index of research articles","volume":"107","author":"Parvez","year":"2016","journal-title":"Scientometrics"},{"key":"ref_7","doi-asserted-by":"crossref","first-page":"183","DOI":"10.1007\/s11192-016-1950-1","article-title":"Predicting citation patterns: Defining and determining influence","volume":"108","author":"Brizan","year":"2016","journal-title":"Scientometrics"},{"key":"ref_8","doi-asserted-by":"crossref","first-page":"359","DOI":"10.3103\/S0147688215050068","article-title":"The study of systems and methods for scientometric analysis of scientific publications","volume":"42","author":"Shvets","year":"2015","journal-title":"Sci. Tech. Inf. Process."},{"key":"ref_9","doi-asserted-by":"crossref","first-page":"1759","DOI":"10.1002\/asi.22896","article-title":"Improving the accuracy of co-citation clustering using full text","volume":"64","author":"Boyack","year":"2013","journal-title":"J. Am. Soc. Inf. Sci. Technol."},{"key":"ref_10","unstructured":"Thijs, B., Gl\u00e4nzel, W., and Meyer, M. (2015, January 29). Using noun phrases extraction for the improvement of hybrid clustering with text- and citation-based components. The example of \u201cinformation System Research\u201d. Proceedings of the 1st Workshop on Mining Scientific Papers: Computational Linguistics and Bibliometrics, Istanbul, Turkey."},{"key":"ref_11","doi-asserted-by":"crossref","first-page":"191","DOI":"10.1007\/978-3-642-54930-4_19","article-title":"Paraphrase Collocations Extraction Based on Concept Expansion","volume":"Volume 278","author":"Wen","year":"2014","journal-title":"Knowledge Engineering and Management"},{"key":"ref_12","unstructured":"Wang, R., and Callison-Burch, C. (2011, January 24). Paraphrase Fragment Extraction from Monolingual Comparable Corpora. Proceedings of the 4th Workshop on Building and Using Comparable Corpora, Portland, OR, USA."},{"key":"ref_13","doi-asserted-by":"crossref","unstructured":"Lytras, M.D., Aljohani, N., Damiani, E., and Chui, K.T. (2018). Innovations, Developments, and Applications of Semantic Web and Information Systems, IGI Global.","DOI":"10.4018\/978-1-5225-5042-6"},{"key":"ref_14","unstructured":"Santanu, P., Pintu, L., and Sudip, K.N. (2014, January 6\u201312). Role of paraphrases in PB-SMT. Proceedings of the 15th International Conference on Computational Linguistics and Intelligent Text Processing, Kathmandu, Nepal."},{"key":"ref_15","doi-asserted-by":"crossref","unstructured":"Barzilay, R., and Elhadad, N. (2003, January 11\u201312). Sentence alignment for monolingual comparable corpora. Proceedings of the Conference on Empirical Methods in Natural Language Processing, Sapporo, Japan.","DOI":"10.3115\/1119355.1119359"},{"key":"ref_16","unstructured":"Nelken, R., and Shieber, S.M. (2006, January 3\u20137). Towards robust context-sensitive sentence alignment for monolingual corpora. Proceedings of the 11th Conference of the European Chapter of the Association for Computational Linguistics, Trento, Italy."},{"key":"ref_17","unstructured":"Coster, W., and Kauchak, D. (2011, January 19\u201324). Simple English Wikipedia: A new text simplification task. Proceedings of the 49th Annual Meeting of the Association for Computational Linguistics: Human Language Technologies, Portland, OR, USA."},{"key":"ref_18","unstructured":"Bott, S., and Saggion, H. (2011, January 24). An unsupervised alignment algorithm for text simplification corpus construction. Proceedings of the Workshop on Monolingual Text-To-Text Generation, Portland, OR, USA."},{"key":"ref_19","doi-asserted-by":"crossref","unstructured":"Petrasova, S., Khairova, N., and Lewoniewski, W. (2018, January 21\u201325). Building the semantic similarity model for social network data streams. Proceedings of the 2018 IEEE Second International Conference on Data Stream Mining & Processing, Lviv, Ukraine.","DOI":"10.1109\/DSMP.2018.8478480"},{"key":"ref_20","doi-asserted-by":"crossref","unstructured":"Khairova, N., Petrasova, S., Lewoniewski, W., Mamyrbayev, O., and Mukhsina, K. (2018, January 9\u201312). Automatic Extraction of Synonymous Collocation Pairs from a Text Corpus. Proceedings of the 2018 Federated Conference on Computer Science and Information Systems, Poznan, Poland.","DOI":"10.15439\/2018F186"},{"key":"ref_21","unstructured":"(2018, April 25). Wikipedia:WikiProject_Albums. Available online: https:\/\/en.wikipedia.org\/wiki\/Wikipedia:WikiProject_Albums."},{"key":"ref_22","unstructured":"(2018, April 15). Wikipedia:WikiProject_Film. Available online: https:\/\/en.wikipedia.org\/wiki\/Wikipedia:WikiProject_Film."},{"key":"ref_23","unstructured":"(2018, April 25). Wikipedia:WikiProject_Biography\/Politics_and_government. Available online: https:\/\/en.wikipedia.org\/wiki\/Wikipedia:WikiProject_Biography\/Politics_and_government."},{"key":"ref_24","unstructured":"(2018, April 25). Wikipedia:WikiProject_Biography\/Science_and_academia. Available online: https:\/\/en.wikipedia.org\/wiki\/Wikipedia:WikiProject_Biography\/Science_and_academia."}],"container-title":["Data"],"original-title":[],"language":"en","link":[{"URL":"https:\/\/www.mdpi.com\/2306-5729\/3\/4\/66\/pdf","content-type":"unspecified","content-version":"vor","intended-application":"similarity-checking"}],"deposited":{"date-parts":[[2025,10,11]],"date-time":"2025-10-11T15:33:43Z","timestamp":1760196823000},"score":1,"resource":{"primary":{"URL":"https:\/\/www.mdpi.com\/2306-5729\/3\/4\/66"}},"subtitle":[],"short-title":[],"issued":{"date-parts":[[2018,12,13]]},"references-count":24,"journal-issue":{"issue":"4","published-online":{"date-parts":[[2018,12]]}},"alternative-id":["data3040066"],"URL":"https:\/\/doi.org\/10.3390\/data3040066","relation":{},"ISSN":["2306-5729"],"issn-type":[{"value":"2306-5729","type":"electronic"}],"subject":[],"published":{"date-parts":[[2018,12,13]]}}}