{"status":"ok","message-type":"work","message-version":"1.0.0","message":{"indexed":{"date-parts":[[2026,5,27]],"date-time":"2026-05-27T17:45:10Z","timestamp":1779903910091,"version":"3.53.1"},"reference-count":45,"publisher":"MDPI AG","issue":"2","license":[{"start":{"date-parts":[[2020,4,15]],"date-time":"2020-04-15T00:00:00Z","timestamp":1586908800000},"content-version":"vor","delay-in-days":0,"URL":"https:\/\/creativecommons.org\/licenses\/by\/4.0\/"}],"funder":[{"name":"Ministry of Science of Colombia","award":["64366"],"award-info":[{"award-number":["64366"]}]}],"content-domain":{"domain":[],"crossmark-restriction":false},"short-container-title":["Computation"],"abstract":"<jats:p>This paper analyses the capabilities of different techniques to build a semantic representation of educational digital resources. Educational digital resources are modeled using the Learning Object Metadata (LOM) standard, and their semantic representations can be obtained from different LOM fields, such as the title and the description, in order to extract features\/characteristics from the digital resources. The feature extraction methods used in this paper are Best Matching 25 (BM25), Latent Semantic Analysis (LSA), Doc2Vec, and Latent Dirichlet Allocation (LDA). The features\/descriptors generated by these methods are tested on three types of educational digital resources (scientific publications, learning objects, and patents), on a paraphrase corpus, and in two use cases: an information retrieval context and an educational recommendation system. The quality of the features produced by each method is measured with unsupervised metrics: two similarity functions and entropy. In addition, the paper tests the techniques on the classification of paraphrases. 
The experiments show that the performance of the feature extraction methods varies considerably with the type of content and the metric: in some cases one method outperforms the others, and in other cases the reverse holds.<\/jats:p>","DOI":"10.3390\/computation8020030","type":"journal-article","created":{"date-parts":[[2020,4,16]],"date-time":"2020-04-16T05:15:41Z","timestamp":1587014141000},"page":"30","update-policy":"https:\/\/doi.org\/10.3390\/mdpi_crossmark_policy","source":"Crossref","is-referenced-by-count":42,"title":["Comparison and Evaluation of Different Methods for the Feature Extraction from Educational Contents"],"prefix":"10.3390","volume":"8","author":[{"ORCID":"https:\/\/orcid.org\/0000-0003-4194-6882","authenticated-orcid":false,"given":"Jose","family":"Aguilar","sequence":"first","affiliation":[{"name":"Escuela de Sistemas, Facultad de Ingenier\u00eda, Universidad de los Andes, M\u00e9rida 5101, Venezuela"},{"name":"GIDITIC, Universidad EAFIT, Carrera 49 No. 7 Sur 50, Medellin 050001, Colombia"}],"role":[{"vocabulary":"crossref","role":"author"}]},{"ORCID":"https:\/\/orcid.org\/0000-0002-8435-7394","authenticated-orcid":false,"given":"Camilo","family":"Salazar","sequence":"additional","affiliation":[{"name":"GIDITIC, Universidad EAFIT, Carrera 49 No. 7 Sur 50, Medellin 050001, Colombia"}],"role":[{"vocabulary":"crossref","role":"author"}]},{"given":"Henry","family":"Velasco","sequence":"additional","affiliation":[{"name":"LANTIA SAS, Medellin 050001, Colombia"}],"role":[{"vocabulary":"crossref","role":"author"}]},{"ORCID":"https:\/\/orcid.org\/0000-0002-0706-5881","authenticated-orcid":false,"given":"Julian","family":"Monsalve-Pulido","sequence":"additional","affiliation":[{"name":"GIDITIC, Universidad EAFIT, Carrera 49 No. 
7 Sur 50, Medellin 050001, Colombia"}],"role":[{"vocabulary":"crossref","role":"author"}]},{"ORCID":"https:\/\/orcid.org\/0000-0002-5506-0022","authenticated-orcid":false,"given":"Edwin","family":"Montoya","sequence":"additional","affiliation":[{"name":"GIDITIC, Universidad EAFIT, Carrera 49 No. 7 Sur 50, Medellin 050001, Colombia"}],"role":[{"vocabulary":"crossref","role":"author"}]}],"member":"1968","published-online":{"date-parts":[[2020,4,15]]},"reference":[{"key":"ref_1","first-page":"198","article-title":"Learning object evaluation: Computer-mediated collaboration and inter-rater reliability","volume":"25","author":"Vargo","year":"2003","journal-title":"Int. J. Comput. Appl."},{"key":"ref_2","doi-asserted-by":"crossref","unstructured":"Pacheco, F., Exposito, E., Aguilar, J., Gineste, M., and Baudoin, C. (2018, January 8\u201313). A novel statistical based feature extraction approach for the inner-class feature estimation using linear regression. Proceedings of the 2018 International Joint Conference on Neural Networks (IJCNN), Rio de Janeiro, Brazil.","DOI":"10.1109\/IJCNN.2018.8488992"},{"key":"ref_3","doi-asserted-by":"crossref","first-page":"639","DOI":"10.1109\/TLA.2018.8327424","article-title":"Knowledge Extraction System from Unstructured Documents","volume":"16","author":"Rodriguez","year":"2018","journal-title":"IEEE Latin Am. Trans."},{"key":"ref_4","unstructured":"Learning Technology Standards Committee of the IEEE (2020, March 26). IEEE P1484.12.2\/D1. Final Standard for Learning Technology\u2014Learning Object Metadata. Available online: http:\/\/www.dia.uniroma3.it\/~sciarro\/e-learning\/LOM_1484_12_1_v1_Final_Draft.pdf."},{"key":"ref_5","unstructured":"Fano, E., Karlgren, J., and Nivre, J. (2019, January 9\u201312). Uppsala University and Gavagai at CLEF Erisk: Comparing word embedding models. 
Proceedings of the Working Notes of CLEF 2019 Conference and Labs of the Evaluation Forum (CLEF 2019), Lugano, Switzerland."},{"key":"ref_6","doi-asserted-by":"crossref","unstructured":"Singh, A.K., and Shashi, M. (2019). Vectorization of Text Documents for Identifying Unifiable News Articles. Int. J. Adv. Comput. Sci. Appl., 10.","DOI":"10.14569\/IJACSA.2019.0100742"},{"key":"ref_7","unstructured":"Peng, H., Wang, J., and Shen, Q. (February, January 30). Improving Text Models with Latent Feature Vector Representations. Proceedings of the 2019 IEEE 13th International Conference on Semantic Computing (ICSC), Newport Beach, CA, USA."},{"key":"ref_8","unstructured":"Niu, L., Dai, X., Zhang, J., and Chen, J. (2015, January 24\u201325). Topic2Vec: Learning distributed representations of topics. Proceedings of the 2015 International Conference on Asian Language Processing (IALP), Suzhou, China."},{"key":"ref_9","unstructured":"Ritu, Z.S., Nowshin, N., Nahid, M.M.H., and Ismail, S. (2018, January 21\u201322). Performance Analysis of Different Word Embedding Models on Bangla Language. Proceedings of the 2018 International Conference on Bangla Speech and Language Processing (ICBSLP), Sylhet, Bangladesh."},{"key":"ref_10","doi-asserted-by":"crossref","unstructured":"Gorro, K., Ancheta, J.R., Capao, K., Oco, N., Roxas, R.E., Sabellano, M.J., Nonnecke, B., Mohanty, S., Crittenden, C., and Goldberg, K. (2017, January 5\u20137). Qualitative data analysis of disaster risk reduction suggestions assisted by topic modeling and word2vec. Proceedings of the 2017 International Conference on Asian Language Processing (IALP), Singapore.","DOI":"10.1109\/IALP.2017.8300601"},{"key":"ref_11","doi-asserted-by":"crossref","unstructured":"Kadhim, A.I. (2019, January 2\u20134). Term Weighting for Feature Extraction on Twitter: A Comparison Between BM25 and TF-IDF. 
Proceedings of the 2019 International Conference on Advanced Science and Engineering (ICOASE), Duhok, Iraq.","DOI":"10.1109\/ICOASE.2019.8723825"},{"key":"ref_12","doi-asserted-by":"crossref","unstructured":"Yang, J., Ward, J., Gharavi, E., Dawson, J., and Alvarado, R. (2019, January 26). Bi-directional Relevance Matching between Medical Corpora. Proceedings of the 2019 Systems and Information Engineering Design Symposium (SIEDS), Charlottesville, VA, USA.","DOI":"10.1109\/SIEDS.2019.8735639"},{"key":"ref_13","doi-asserted-by":"crossref","unstructured":"Bhoir, S., Ghorpade, T., and Mane, V. (2017, January 1\u20132). Comparative analysis of different word embedding models. Proceedings of the 2017 International Conference on Advances in Computing, Communication and Control (ICAC3), Mumbai, India.","DOI":"10.1109\/ICAC3.2017.8318770"},{"key":"ref_14","doi-asserted-by":"crossref","unstructured":"Hoque, M.T., Islam, A., Ahmed, E., Mamun, K.A., and Huda, M.N. (2019, January 7\u20139). Analyzing Performance of Different Machine Learning Approaches With Doc2vec for Classifying Sentiment of Bengali Natural Language. Proceedings of the 2019 International Conference on Electrical, Computer and Communication Engineering (ECCE), Cox\u2019s Bazar, Bangladesh.","DOI":"10.1109\/ECACE.2019.8679272"},{"key":"ref_15","doi-asserted-by":"crossref","unstructured":"Imaduddin, H., and Fauziati, S. (2019, January 13\u201315). Word Embedding Comparison for Indonesian Language Sentiment Analysis. Proceedings of the 2019 International Conference of Artificial Intelligence and Information Technology (ICAIIT), Yogyakarta, Indonesia.","DOI":"10.1109\/ICAIIT.2019.8834536"},{"key":"ref_16","unstructured":"Augustyniak, \u0141., Kajdanowicz, T., and Kazienko, P. (2019). Comprehensive Analysis of Aspect Term Extraction Methods using Various Text Embeddings. 
arXiv."},{"key":"ref_17","doi-asserted-by":"crossref","unstructured":"Liang, Q., Wu, P., and Huang, C. (2019, January 11\u201313). 
An Efficient Method for Text Classification Task. Proceedings of the 2019 International Conference on Big Data Engineering, Hong Kong, China.","DOI":"10.1145\/3341620.3341631"},{"key":"ref_18","unstructured":"Galke, L., Mai, F., Schelten, A., Brunsch, D., and Scherp, A. (2017). Comparing Titles vs. Full-text for Multi-Label Classification of Scientific Papers and News Articles. arXiv."},{"key":"ref_19","doi-asserted-by":"crossref","first-page":"491","DOI":"10.1017\/S1351324917000420","article-title":"Unsupervised learning of semantic representation for documents with the law of total probability","volume":"24","author":"Wei","year":"2018","journal-title":"Nat. Lang. Eng."},{"key":"ref_20","doi-asserted-by":"crossref","unstructured":"Gupta, S., and Varma, V. (2017, January 3\u20137). Scientific Article Recommendation by Using Distributed Representations of Text and Graph. Proceedings of the 26th International Conference on World Wide Web Companion, Perth, Australia.","DOI":"10.1145\/3041021.3053062"},{"key":"ref_21","unstructured":"Nandi, R.N., Zaman, M.A., Al Muntasir, T., Sumit, S.H., Sourov, T., and Rahman, M.J.U. (2018, January 21\u201322). Bangla News Recommendation Using doc2vec. Proceedings of the 2018 International Conference on Bangla Speech and Language Processing (ICBSLP), Sylhet, Bangladesh."},{"key":"ref_22","unstructured":"Wan, S., Dras, M., Dale, R., and Paris, C. (December, January 30). Using dependency-based features to take the \u2018para-farce\u2019 out of paraphrase. Proceedings of the Australasian Language Technology Workshop 2006, Sydney, Australia."},{"key":"ref_23","unstructured":"Fernando, S., and Stevenson, M. (2020, April 15). A semantic similarity approach to paraphrase detection. 
Available online: https:\/\/www.researchgate.net\/profile\/Samuel_Fernando\/publication\/228616213_A_Semantic_Similarity_Approach_to_Paraphrase_Detection\/links\/02e7e5204b323983fb000000\/A-Semantic-Similarity-Approach-to-Paraphrase-Detection.pdf."},{"key":"ref_24","unstructured":"Madnani, N., Tetreault, J., and Chodorow, M. (2012, January 3\u20138). Re-examining machine translation metrics for paraphrase identification. Proceedings of the 2012 Conference of the North American Chapter of the Association for Computational Linguistics: Human Language Technologies. Association for Computational Linguistics, Montreal, QC, Canada."},{"key":"ref_25","doi-asserted-by":"crossref","first-page":"119","DOI":"10.13053\/rcs-70-1-10","article-title":"Feature Analysis for Paraphrase Recognition and Textual Entailment","volume":"70","author":"Calvo","year":"2013","journal-title":"Res. Comput. Sci."},{"key":"ref_26","first-page":"517","article-title":"Dependency vs. constituent based syntactic n-grams in text similarity measures for paraphrase recognition","volume":"18","author":"Calvo","year":"2014","journal-title":"Comput. Sist."},{"key":"ref_27","doi-asserted-by":"crossref","unstructured":"Kenter, T., and De Rijke, M. (2015, January 19\u201323). Short text similarity with word embeddings. Proceedings of the 24th ACM International on Conference on Information and Knowledge Management, Melbourne, Australia.","DOI":"10.1145\/2806416.2806475"},{"key":"ref_28","unstructured":"Lee, J., and Cheah, Y.N. (2015, January 4\u20135). Semantic Relatedness Measure for Identifying Relevant Answers in Online Community Question Answering Services. Proceedings of the 9th International Conference on IT in Asia (CITA), Kuching, Sarawak, Malaysia."},{"key":"ref_29","doi-asserted-by":"crossref","unstructured":"Lee, J.C., and Cheah, Y.N. (2016, January 16\u201319). 
Paraphrase detection using semantic relatedness based on Synset Shortest Path in WordNet. 
Proceedings of the 2016 International Conference On Advanced Informatics: Concepts, Theory and Application (ICAICTA), George Town, Malaysia.","DOI":"10.1109\/ICAICTA.2016.7803127"},{"key":"ref_30","doi-asserted-by":"crossref","unstructured":"Mahajan, R.S., and Zaveri, M.A. (2017, January 14\u201316). Modeling Paraphrase Identification Using Supervised Learning Methods Against Various Datasets and Features. Proceedings of the 2017 IEEE International Conference on Computational Intelligence and Computing Research (ICCIC), Coimbatore, India.","DOI":"10.1109\/ICCIC.2017.8524379"},{"key":"ref_31","unstructured":"Mihalcea, R., Corley, C., and Strapparava, C. (2006, January 16\u201320). Corpus-based and knowledge-based measures of text semantic similarity. Proceedings of the National Conference on Artificial Intelligence, Boston, MA, USA."},{"key":"ref_32","doi-asserted-by":"crossref","unstructured":"Wu, Z., and Palmer, M. (1994, January 27\u201330). Verbs semantics and lexical selection. Proceedings of the 32nd annual meeting on Association for Computational Linguistics. Association for Computational Linguistics, Las Cruces, NM, USA.","DOI":"10.3115\/981732.981751"},{"key":"ref_33","unstructured":"Mandala, R., Takenobu, T., and Hozumi, T. (1998, January 16). The use of WordNet in information retrieval. Proceedings of the Workshop Usage of WordNet in Natural Language Processing Systems, Montreal, QC, Canada."},{"key":"ref_34","unstructured":"Mikolov, T., Chen, K., Corrado, G., and Dean, J. (2013). Efficient estimation of word representations in vector space. arXiv."},{"key":"ref_35","doi-asserted-by":"crossref","unstructured":"Wu, C., Potdar, V., and Chang, E. (2008). Latent semantic analysis\u2013the dynamics of semantics web services discovery. Advances in Web Semantics I, Springer.","DOI":"10.1007\/978-3-540-89784-2_14"},{"key":"ref_36","unstructured":"Seifi, S.T., and Ekhveh, A.A. (2019, January 23\u201325). 
Representing Unequal Data Series in Vector Space with Its Application in Bank Customer Clustering. Proceedings of the International Congress on High-Performance Computing and Big Data Analysis, Tehran, Iran."},{"key":"ref_37","doi-asserted-by":"crossref","unstructured":"Cleverdon, C. (1967). The Cranfield tests on index language devices. Aslib Proceedings, MCB UP Ltd.","DOI":"10.1108\/eb050097"},{"key":"ref_38","unstructured":"Bajaj, P., Campos, D., Craswell, N., Deng, L., Gao, J., Liu, X., Majumder, R., McNamara, A., Mitra, B., and Nguyen, T. (2016). MS MARCO: A human generated MAchine Reading COmprehension dataset. arXiv."},{"key":"ref_39","unstructured":"Nogueira, R., and Cho, K. (2019). Passage Re-ranking with BERT. arXiv."},{"key":"ref_40","unstructured":"Mitra, B., Rosset, C., Hawking, D., Craswell, N., Diaz, F., and Yilmaz, E. (2019). Incorporating query term independence assumption for efficient retrieval and ranking using deep neural networks. arXiv."},{"key":"ref_41","doi-asserted-by":"crossref","unstructured":"Rosset, C., Mitra, B., Xiong, C., Craswell, N., Song, X., and Tiwary, S. (2019). An Axiomatic Approach to Regularizing Neural Ranking Models. arXiv.","DOI":"10.1145\/3331184.3331296"},{"key":"ref_42","unstructured":"Nogueira, R., Yang, W., Cho, K., and Lin, J. (2019). Multi-stage document ranking with BERT. arXiv."},{"key":"ref_43","unstructured":"Padigela, H., Zamani, H., and Croft, W.B. (2019). Investigating the Successes and Failures of BERT for Passage Re-Ranking. arXiv."},{"key":"ref_44","doi-asserted-by":"crossref","first-page":"2207","DOI":"10.1007\/s10462-019-09731-6","article-title":"Applicability of LAMDA as classification model in the oil production","volume":"53","author":"Morales","year":"2019","journal-title":"Artif. Intell. Rev."},{"key":"ref_45","unstructured":"Waissman, J., Sarrate, R., Escobet, T., Aguilar, J., and Dahhou, B. (2000, January 19). Wastewater treatment process supervision by means of a fuzzy automaton model. 
Proceedings of the 2000 IEEE International Symposium on Intelligent Control, Rio, Patras, Greece."}],"container-title":["Computation"],"original-title":[],"language":"en","link":[{"URL":"https:\/\/www.mdpi.com\/2079-3197\/8\/2\/30\/pdf","content-type":"unspecified","content-version":"vor","intended-application":"similarity-checking"}],"deposited":{"date-parts":[[2025,10,13]],"date-time":"2025-10-13T13:30:49Z","timestamp":1760362249000},"score":1,"resource":{"primary":{"URL":"https:\/\/www.mdpi.com\/2079-3197\/8\/2\/30"}},"subtitle":[],"short-title":[],"issued":{"date-parts":[[2020,4,15]]},"references-count":45,"journal-issue":{"issue":"2","published-online":{"date-parts":[[2020,6]]}},"alternative-id":["computation8020030"],"URL":"https:\/\/doi.org\/10.3390\/computation8020030","relation":{},"ISSN":["2079-3197"],"issn-type":[{"value":"2079-3197","type":"electronic"}],"subject":[],"published":{"date-parts":[[2020,4,15]]}}}