{"status":"ok","message-type":"work","message-version":"1.0.0","message":{"indexed":{"date-parts":[[2025,10,12]],"date-time":"2025-10-12T02:36:35Z","timestamp":1760236595526,"version":"build-2065373602"},"reference-count":37,"publisher":"MDPI AG","issue":"12","license":[{"start":{"date-parts":[[2021,12,6]],"date-time":"2021-12-06T00:00:00Z","timestamp":1638748800000},"content-version":"vor","delay-in-days":0,"URL":"https:\/\/creativecommons.org\/licenses\/by\/4.0\/"}],"funder":[{"DOI":"10.13039\/100000006","name":"Office of Naval Research","doi-asserted-by":"publisher","award":["N00014-17-1-2300","N00014-20-1-2623"],"award-info":[{"award-number":["N00014-17-1-2300","N00014-20-1-2623"]}],"id":[{"id":"10.13039\/100000006","id-type":"DOI","asserted-by":"publisher"}]},{"DOI":"10.13039\/100005246","name":"Institute of Education Sciences","doi-asserted-by":"publisher","award":["R305A190050"],"award-info":[{"award-number":["R305A190050"]}],"id":[{"id":"10.13039\/100005246","id-type":"DOI","asserted-by":"publisher"}]},{"name":"Romanian National Authority for Scientific Research and Innovation, CNCS \u2013 UEFISCDI","award":["TE 70 PN-III-P1-1.1-TE-2019-2209"],"award-info":[{"award-number":["TE 70 PN-III-P1-1.1-TE-2019-2209"]}]}],"content-domain":{"domain":[],"crossmark-restriction":false},"short-container-title":["Computers"],"abstract":"<jats:p>Learning to paraphrase supports both writing ability and reading comprehension, particularly for less skilled learners. As such, educational tools that integrate automated evaluations of paraphrases can be used to provide timely feedback to enhance learner paraphrasing skills more efficiently and effectively. Paraphrase identification is a popular NLP classification task that involves establishing whether two sentences share a similar meaning. Paraphrase quality assessment is a slightly more complex task, in which pairs of sentences are evaluated in-depth across multiple dimensions. 
In this study, we focus on four dimensions: lexical, syntactical, semantic, and overall quality. Our study introduces and evaluates various machine learning models using handcrafted features combined with Extra Trees, Siamese neural networks using BiLSTM RNNs, and pretrained BERT-based models, together with transfer learning from a larger general paraphrase corpus, to estimate the quality of paraphrases across the four dimensions. Two datasets are considered for the tasks involving paraphrase quality: ULPC (User Language Paraphrase Corpus) containing 1998 paraphrases and a smaller dataset with 115 paraphrases based on children\u2019s inputs. The paraphrase identification dataset used for the transfer learning task is the MSRP dataset (Microsoft Research Paraphrase Corpus) containing 5801 paraphrases. On the ULPC dataset, our BERT model improves upon the previous baseline by at least 0.1 in F1-score across the four dimensions. When using fine-tuning from ULPC for the children dataset, both the BERT and Siamese neural network models improve upon their original scores by at least 0.11 F1-score. 
The results of these experiments suggest that transfer learning from generic paraphrase identification datasets can be successful, while also achieving comparable results in fewer epochs.<\/jats:p>","DOI":"10.3390\/computers10120166","type":"journal-article","created":{"date-parts":[[2021,12,6]],"date-time":"2021-12-06T22:18:42Z","timestamp":1638829122000},"page":"166","update-policy":"https:\/\/doi.org\/10.3390\/mdpi_crossmark_policy","source":"Crossref","is-referenced-by-count":8,"title":["Automated Paraphrase Quality Assessment Using Language Models and Transfer Learning"],"prefix":"10.3390","volume":"10","author":[{"given":"Bogdan","family":"Nicula","sequence":"first","affiliation":[{"name":"Department of Computer Science, University Politehnica of Bucharest, 313 Splaiul Independentei, 060042 Bucharest, Romania"}]},{"ORCID":"https:\/\/orcid.org\/0000-0002-4815-9227","authenticated-orcid":false,"given":"Mihai","family":"Dascalu","sequence":"additional","affiliation":[{"name":"Department of Computer Science, University Politehnica of Bucharest, 313 Splaiul Independentei, 060042 Bucharest, Romania"},{"name":"Academy of Romanian Scientists, Str. Ilfov, Nr. 3, 050044 Bucharest, Romania"}]},{"given":"Natalie N.","family":"Newton","sequence":"additional","affiliation":[{"name":"Department of Psychology, Arizona State University, P.O. Box 871104, Tempe, AZ 85287, USA"}]},{"given":"Ellen","family":"Orcutt","sequence":"additional","affiliation":[{"name":"Department of Educational Psychology, University of Minnesota, 56 East River Road, Minneapolis, MN 55455, USA"}]},{"given":"Danielle S.","family":"McNamara","sequence":"additional","affiliation":[{"name":"Department of Psychology, Arizona State University, P.O. Box 871104, Tempe, AZ 85287, USA"}]}],"member":"1968","published-online":{"date-parts":[[2021,12,6]]},"reference":[{"key":"ref_1","unstructured":"(2005, January 4). Automatically Constructing a Corpus of Sentential Paraphrases. 
Proceedings of the Third International Workshop on Paraphrasing (IWP2005), Jeju Island, Korea."},{"key":"ref_2","unstructured":"Ganitkevitch, J., Van Durme, B., and Callison-Burch, C. (2013, January 9\u201314). PPDB: The Paraphrase Database. Proceedings of the 2013 Conference of the North American Chapter of the Association for Computational Linguistics: Human Language Technologies, Atlanta, GA, USA."},{"key":"ref_3","doi-asserted-by":"crossref","first-page":"147","DOI":"10.2190\/1RU5-HDTJ-A5C8-JVWE","article-title":"Improving adolescent students\u2019 reading comprehension with iSTART","volume":"34","author":"McNamara","year":"2006","journal-title":"J. Educ. Comput. Res."},{"key":"ref_4","doi-asserted-by":"crossref","first-page":"1","DOI":"10.1207\/s15326950dp3801_1","article-title":"SERT: Self-explanation reading training","volume":"38","author":"McNamara","year":"2004","journal-title":"Discourse Process."},{"key":"ref_5","doi-asserted-by":"crossref","unstructured":"McNamara, D., Ozuru, Y., Best, R., and O\u2019Reilly, T. (2007). The 4Pronged Comprehension Strategy Framework. Reading Comprehension Strategies: Theories, Interventions, and Technologies, University of Memphis.","DOI":"10.4324\/9780203810033"},{"key":"ref_6","first-page":"1","article-title":"An Author\u2019s Guide to Mastering Academic Writing Skills: Discussion of a Medical Manuscript","volume":"5","year":"2021","journal-title":"J. Musculoskelet. Surg. Res."},{"key":"ref_7","doi-asserted-by":"crossref","first-page":"65","DOI":"10.1097\/00011363-200501000-00007","article-title":"Deep-level comprehension of science texts: The role of the reader and the text","volume":"25","author":"Best","year":"2005","journal-title":"Top. Lang. Disord."},{"key":"ref_8","unstructured":"Jurafsky, D., and Martin, J.H. (2009). 
Speech and Language Processing: An Introduction to Natural Language Processing, Computational Linguistics, and Speech Recognition, Pearson Prentice Hall."},{"key":"ref_9","doi-asserted-by":"crossref","unstructured":"Pang, B., Knight, K., and Marcu, D. (June, January 27). Syntax-based Alignment of Multiple Translations: Extracting Paraphrases and Generating New Sentences. Proceedings of the 2003 Human Language Technology Conference of the North American Chapter of the Association for Computational Linguistics, Edmonton, Canada.","DOI":"10.3115\/1073445.1073469"},{"key":"ref_10","doi-asserted-by":"crossref","unstructured":"Qian, L., Qiu, L., Zhang, W., Jiang, X., and Yu, Y. (2019, January 3\u20137). Exploring Diverse Expressions for Paraphrase Generation. Proceedings of the 2019 Conference on Empirical Methods in Natural Language Processing and the 9th International Joint Conference on Natural Language Processing (EMNLP-IJCNLP), Hong Kong, China.","DOI":"10.18653\/v1\/D19-1313"},{"key":"ref_11","unstructured":"Rus, V., Banjade, R., and Lintean, M. (2014, January 26\u201331). On Paraphrase Identification Corpora. Proceedings of the Ninth International Conference on Language Resources and Evaluation (LREC\u201914), Reykjavik, Iceland."},{"key":"ref_12","first-page":"32","article-title":"*SEM 2013 shared task: Semantic Textual Similarity","volume":"Volume 1","author":"Agirre","year":"2013","journal-title":"Main Conference and the Shared Task: Semantic Textual Similarity, Proceedings of the Second Joint Conference on Lexical and Computational Semantics (*SEM), Atlanta, GA, USA, 13\u201314 June 2013"},{"key":"ref_13","unstructured":"Iyer, S., Dandekar, N., and Csernai, K. (2021, October 26). First Quora Dataset Release: Question Pairs. Available online: https:\/\/quoradata.quora.com\/First-Quora-Dataset-Release-Question-Pairs."},{"key":"ref_14","unstructured":"Ji, Y., and Eisenstein, J. (2013, January 18\u201321). 
Discriminative Improvements to Distributional Sentence Similarity. Proceedings of the 2013 Conference on Empirical Methods in Natural Language Processing, Seattle, WA, USA."},{"key":"ref_15","doi-asserted-by":"crossref","unstructured":"Shen, D., Wang, G., Wang, W., Min, M.R., Su, Q., Zhang, Y., Li, C., Henao, R., and Carin, L. (2018). Baseline Needs More Love: On Simple Word-Embedding-Based Models and Associated Pooling Mechanisms. arXiv.","DOI":"10.18653\/v1\/P18-1041"},{"key":"ref_16","unstructured":"Devlin, J., Chang, M.W., Lee, K., and Toutanova, K. (2019). BERT: Pre-training of Deep Bidirectional Transformers for Language Understanding. arXiv."},{"key":"ref_17","doi-asserted-by":"crossref","unstructured":"Joshi, M., Chen, D., Liu, Y., Weld, D.S., Zettlemoyer, L., and Levy, O. (2020). SpanBERT: Improving Pre-training by Representing and Predicting Spans. arXiv.","DOI":"10.1162\/tacl_a_00300"},{"key":"ref_18","doi-asserted-by":"crossref","unstructured":"Jiao, X., Yin, Y., Shang, L., Jiang, X., Chen, X., Li, L., Wang, F., and Liu, Q. (2020). TinyBERT: Distilling BERT for Natural Language Understanding. arXiv.","DOI":"10.18653\/v1\/2020.findings-emnlp.372"},{"key":"ref_19","doi-asserted-by":"crossref","unstructured":"Pennington, J., Socher, R., and Manning, C. (2014, January 25\u201329). GloVe: Global Vectors for Word Representation. Proceedings of the 2014 Conference on Empirical Methods in Natural Language Processing (EMNLP), Doha, Qatar.","DOI":"10.3115\/v1\/D14-1162"},{"key":"ref_20","doi-asserted-by":"crossref","unstructured":"Yang, R., Zhang, J., Gao, X., Ji, F., and Chen, H. (2019). Simple and Effective Text Matching with Richer Alignment Features. arXiv.","DOI":"10.18653\/v1\/P19-1465"},{"key":"ref_21","unstructured":"McCarthy, P.M., and McNamara, D.S. (2012). The user-language paraphrase corpus. 
Cross-Disciplinary Advances in Applied Natural Language Processing: Issues and Approaches, IGI Global."},{"key":"ref_22","doi-asserted-by":"crossref","first-page":"1036","DOI":"10.1037\/a0032580","article-title":"Motivation and Performance in a Game-Based Intelligent Tutoring System","volume":"105","author":"Jackson","year":"2013","journal-title":"J. Educ. Psychol."},{"key":"ref_23","doi-asserted-by":"crossref","unstructured":"Nicula, B., Dascalu, M., Newton, N., Orcutt, E., and McNamara, D.S. (2021). Automated Paraphrase Quality Assessment Using Recurrent Neural Networks and Language Models. International Conference on Intelligent Tutoring Systems, Springer.","DOI":"10.1007\/978-3-030-80421-3_36"},{"key":"ref_24","doi-asserted-by":"crossref","first-page":"1735","DOI":"10.1162\/neco.1997.9.8.1735","article-title":"Long Short-Term Memory","volume":"9","author":"Hochreiter","year":"1997","journal-title":"Neural Comput."},{"key":"ref_25","doi-asserted-by":"crossref","unstructured":"Kalchbrenner, N., Grefenstette, E., and Blunsom, P. (2014). A Convolutional Neural Network for Modelling Sentences. arXiv.","DOI":"10.3115\/v1\/P14-1062"},{"key":"ref_26","first-page":"707","article-title":"Binary codes capable of correcting deletions, insertions, and reversals","volume":"10","author":"Levenshtein","year":"1965","journal-title":"Sov. Phys. Dokl."},{"key":"ref_27","unstructured":"Craig, S. (2018). Please ReaderBench This Text: A Multi-Dimensional Textual Complexity Assessment Framework. Tutoring and Intelligent Tutoring Systems, Nova Science Publishers, Inc."},{"key":"ref_28","first-page":"2825","article-title":"Scikit-learn: Machine Learning in Python","volume":"12","author":"Pedregosa","year":"2011","journal-title":"J. Mach. Learn. Res."},{"key":"ref_29","unstructured":"Mikolov, T., Chen, K., Corrado, G., and Dean, J. (2013). Efficient Estimation of Word Representations in Vector Space. 
arXiv."},{"key":"ref_30","doi-asserted-by":"crossref","unstructured":"Wolf, T., Debut, L., Sanh, V., Chaumond, J., Delangue, C., Moi, A., Cistac, P., Rault, T., Louf, R., and Funtowicz, M. (2020). Transformers: State-of-the-Art Natural Language Processing, EMNLP.","DOI":"10.18653\/v1\/2020.emnlp-demos.6"},{"key":"ref_31","unstructured":"Zhuang, F., Qi, Z., Duan, K., Xi, D., Zhu, Y., Zhu, H., Xiong, H., and He, Q. (2020). A Comprehensive Survey on Transfer Learning. arXiv."},{"key":"ref_32","unstructured":"Nicula, B. (2021, November 27). PASTEL: Paraphrasing Assessment Using Transfer Learning. Available online: https:\/\/github.com\/readerbench\/PASTEL."},{"key":"ref_33","unstructured":"Kingma, D.P., and Ba, J. (2017). Adam: A Method for Stochastic Optimization. arXiv."},{"key":"ref_34","unstructured":"Loshchilov, I., and Hutter, F. (2019). Decoupled Weight Decay Regularization. arXiv."},{"key":"ref_35","unstructured":"Ba, J.L., Kiros, J.R., and Hinton, G.E. (2016). Layer Normalization. arXiv."},{"key":"ref_36","unstructured":"Landauer, T.K., McNamara, D.S., Dennis, S., and Kintsch, W. (2013). 
Handbook of Latent Semantic Analysis, Psychology Press."},{"key":"ref_37","first-page":"49","article-title":"Analysis Methods in Neural Language Processing: A Survey","volume":"7","author":"Belinkov","year":"2018","journal-title":"CoRR"}],"container-title":["Computers"],"original-title":[],"language":"en","link":[{"URL":"https:\/\/www.mdpi.com\/2073-431X\/10\/12\/166\/pdf","content-type":"unspecified","content-version":"vor","intended-application":"similarity-checking"}],"deposited":{"date-parts":[[2025,10,11]],"date-time":"2025-10-11T07:40:12Z","timestamp":1760168412000},"score":1,"resource":{"primary":{"URL":"https:\/\/www.mdpi.com\/2073-431X\/10\/12\/166"}},"subtitle":[],"short-title":[],"issued":{"date-parts":[[2021,12,6]]},"references-count":37,"journal-issue":{"issue":"12","published-online":{"date-parts":[[2021,12]]}},"alternative-id":["computers10120166"],"URL":"https:\/\/doi.org\/10.3390\/computers10120166","relation":{},"ISSN":["2073-431X"],"issn-type":[{"type":"electronic","value":"2073-431X"}],"subject":[],"published":{"date-parts":[[2021,12,6]]}}}