{"status":"ok","message-type":"work","message-version":"1.0.0","message":{"indexed":{"date-parts":[[2026,2,21]],"date-time":"2026-02-21T18:06:15Z","timestamp":1771697175205,"version":"3.50.1"},"reference-count":41,"publisher":"MIT Press - Journals","issue":"4","content-domain":{"domain":[],"crossmark-restriction":false},"short-container-title":["Computational Linguistics"],"published-print":{"date-parts":[[2013,12]]},"abstract":"<jats:p> Although paraphrasing is the linguistic mechanism underlying many plagiarism cases, little attention has been paid to its analysis in the framework of automatic plagiarism detection. Therefore, state-of-the-art plagiarism detectors find it difficult to detect cases of paraphrase plagiarism. In this article, we analyze the relationship between paraphrasing and plagiarism, paying special attention to which paraphrase phenomena underlie acts of plagiarism and which of them are detected by plagiarism detection systems. With this aim in mind, we created the P4P corpus, a new resource that uses a paraphrase typology to annotate a subset of the PAN-PC-10 corpus for automatic plagiarism detection. The results of the Second International Competition on Plagiarism Detection were analyzed in the light of this annotation. <\/jats:p><jats:p> The presented experiments show that (i) more complex paraphrase phenomena and a high density of paraphrase mechanisms make plagiarism detection more difficult, (ii) lexical substitutions are the paraphrase mechanisms used the most when plagiarizing, and (iii) paraphrase mechanisms tend to shorten the plagiarized text. For the first time, the paraphrase mechanisms behind plagiarism have been analyzed, providing critical insights for the improvement of automatic plagiarism detection systems. 
<\/jats:p>","DOI":"10.1162\/coli_a_00153","type":"journal-article","created":{"date-parts":[[2013,1,3]],"date-time":"2013-01-03T17:12:23Z","timestamp":1357233143000},"page":"917-947","source":"Crossref","is-referenced-by-count":81,"title":["Plagiarism Meets Paraphrasing: Insights for the Next Generation in Automatic Plagiarism Detection"],"prefix":"10.1162","volume":"39","author":[{"given":"Alberto","family":"Barr\u00f3n-Cede\u00f1o","sequence":"first","affiliation":[{"name":"Universitat Polit\u00e8cnica de Catalunya"}]},{"given":"Marta","family":"Vila","sequence":"additional","affiliation":[{"name":"Universitat de Barcelona"}]},{"given":"M.","family":"Mart\u00ed","sequence":"additional","affiliation":[{"name":"Universitat de Barcelona"}]},{"given":"Paolo","family":"Rosso","sequence":"additional","affiliation":[{"name":"Universitat Polit\u00e8cnica de Val\u00e8ncia"}]}],"member":"281","reference":[{"key":"R3","first-page":"37","volume-title":"Proceedings of the 23rd International Conference on Computational Linguistics (COLING 2010)","author":"Barr\u00f3n-Cede\u00f1o Alberto","year":"2010"},{"key":"R4","unstructured":"Barzilay, Regina. 2003. Information Fusion for Multidocument Summarization: Paraphrasing and Generation. Ph.D. thesis, Columbia University, New York."},{"key":"R5","doi-asserted-by":"publisher","DOI":"10.3115\/1073445.1073448"},{"key":"R6","doi-asserted-by":"crossref","unstructured":"Barzilay, Regina and Kathleen R. McKeown. 2001. Extracting paraphrases from a parallel corpus. In Proceedings of the 39th Annual Meeting of the Association for Computational Linguistics (ACL 2001), pages 50\u201357, Toulouse.","DOI":"10.3115\/1073012.1073020"},{"key":"R7","doi-asserted-by":"publisher","DOI":"10.3115\/1034678.1034760"},{"key":"R8","unstructured":"Bhagat, Rahul. 2009. Learning Paraphrases from Text. Ph.D. thesis, University of Southern California, Los Angeles."},{"key":"R10","unstructured":"Cheung, Mei Ling Lisa. 2009. 
Merging Corpus Linguistics and Collaborative Knowledge Construction. Ph.D. thesis, University of Birmingham, Birmingham."},{"key":"R11","doi-asserted-by":"crossref","DOI":"10.1515\/9783112316009","volume-title":"Syntactic Structures.","author":"Chomsky Noam","year":"1957"},{"key":"R14","first-page":"1,678","volume-title":"Proceedings of the 3rd International Conference on Language Resources and Evaluation (LREC 2002)","author":"Clough Paul","year":"2002"},{"key":"R15","doi-asserted-by":"publisher","DOI":"10.1162\/coli.08-003-R1-07-044"},{"key":"R16","first-page":"3,450","volume-title":"Proceedings of the International Conference on Education and New Learning Technologies (EDULEARN'10)","author":"Comas Rub\u00e9n","year":"2010"},{"key":"R19","first-page":"9","volume-title":"Proceedings of the Third International Workshop on Paraphrasing (IWP 2005)","author":"Dolan William B.","year":"2005"},{"key":"R20","first-page":"47","volume-title":"Proceedings of the LREC Workshop on Building Lexical Resources from Semantically Annotated Corpora","author":"Dorr Bonnie J.","year":"2004"},{"key":"R21","unstructured":"Dras, Mark. 1999. Tree Adjoining Grammar and the Reluctant Paraphrasing of Text. Ph.D. thesis, Macquarie University, Sydney."},{"key":"R22","first-page":"51","volume":"46","author":"Dutrey Camille","year":"2011","journal-title":"Procesamiento del Lenguaje Natural"},{"key":"R23","first-page":"367","volume":"43","author":"Espa\u00f1a-Bonet Cristina","year":"2009","journal-title":"Procesamiento del Lenguaje Natural"},{"key":"R24","doi-asserted-by":"publisher","DOI":"10.2307\/356602"},{"key":"R25","unstructured":"Fujita, Atsushi. 2005. Automatic Generation of Syntactically Well-formed and Semantically Appropriate Paraphrases. Ph.D. 
thesis, Nara Institute of Science and Technology, Nara."},{"key":"R27","first-page":"10","volume-title":"Proceedings of the SEPLN 2009 Workshop on Uncovering Plagiarism, Authorship, and Social Software Misuse (PAN 2009)","author":"Grozea Cristian","year":"2009"},{"key":"R29","doi-asserted-by":"publisher","DOI":"10.1007\/978-3-642-12116-6_59"},{"key":"R30","doi-asserted-by":"publisher","DOI":"10.1177\/1461445603005002005"},{"key":"R32","doi-asserted-by":"publisher","DOI":"10.2307\/411155"},{"key":"R35","doi-asserted-by":"publisher","DOI":"10.1002\/(SICI)1097-0266(199606)17:6<441::AID-SMJ819>3.0.CO;2-G"},{"key":"R36","volume-title":"English Verb Classes and Alternations: A Preliminary Investigation.","author":"Levin Beth","year":"1993"},{"key":"R37","first-page":"281","volume-title":"Proceedings of the Fifth Berkeley Symposium on Mathematical Statistics and Probability","volume":"1","author":"MacQueen J. B.","year":"1967"},{"issue":"2","key":"R38","first-page":"15","volume":"16","author":"Martin Brian","year":"2004","journal-title":"Nexus (Newsletter of the Australian Sociological Association)"},{"issue":"8","key":"R39","first-page":"1,050","volume":"12","author":"Maurer Hermann","year":"2006","journal-title":"Journal of Universal Computer Science"},{"key":"R40","first-page":"3,143","volume-title":"Proceedings of the Seventh International Conference on Language Resources and Evaluation (LREC 2010)","author":"Max Aur\u00e9lien","year":"2010"},{"key":"R41","doi-asserted-by":"publisher","DOI":"10.1007\/s10579-009-9084-1"},{"key":"R42","first-page":"9","volume-title":"Dictionnaire Explicatif et Combinatoire du Fran\u00e7ais Contemporain. Recherches Lexico-s\u00e9mantiques III.","author":"Mel'\u010duk Igor A.","year":"1992"},{"key":"R43","volume-title":"La Paraphrase.
Mod\u00e9lisation de la Paraphrase Langagi\u00e8re.","author":"Mili\u0107evi\u0107 Jasmina","year":"2007"},{"issue":"1","key":"R47","first-page":"1","volume":"45","author":"Potthast Martin","year":"2011","journal-title":"Language Resources and Evaluation (LRE), Special Issue on Plagiarism and Authorship Analysis"},{"key":"R48","first-page":"997","volume-title":"Proceedings of the 23rd International Conference on Computational Linguistics (COLING 2010)","author":"Potthast Martin","year":"2010"},{"key":"R49","first-page":"1","author":"Potthast Martin","year":"2009","journal-title":"Proceedings of the SEPLN 2009 Workshop on Uncovering Plagiarism, Authorship, and Social Software Misuse (PAN 2009)"},{"key":"R50","doi-asserted-by":"publisher","DOI":"10.1162\/coli_a_00014"},{"key":"R52","unstructured":"Shimohata, Mitsuo. 2004. Acquiring Paraphrases from Corpora and Its Application to Machine Translation. Ph.D. thesis, Nara Institute of Science and Technology, Nara."},{"key":"R53","first-page":"38","author":"Stamatatos Efstathios","year":"2009","journal-title":"Proceedings of the SEPLN 2009 Workshop on Uncovering Plagiarism, Authorship, and Social Software Misuse (PAN 2009)"},{"key":"R54","first-page":"63","volume":"45","author":"Stein Benno","year":"2011","journal-title":"Language Resources and Evaluation (LRE), Special Issue on Plagiarism and Authorship Analysis"},{"key":"R55","doi-asserted-by":"publisher","DOI":"10.1145\/1988852.1988860"},{"key":"R56","first-page":"57","volume-title":"Language Typology and Syntactic Description. 
Grammatical Categories and the Lexicon","author":"Talmy Leonard","year":"1985"},{"key":"R57","first-page":"83","volume":"46","author":"Vila Marta","year":"2011","journal-title":"Procesamiento del Lenguaje Natural"}],"container-title":["Computational Linguistics"],"original-title":[],"language":"en","link":[{"URL":"https:\/\/www.mitpressjournals.org\/doi\/pdf\/10.1162\/COLI_a_00153","content-type":"unspecified","content-version":"vor","intended-application":"similarity-checking"}],"deposited":{"date-parts":[[2021,3,12]],"date-time":"2021-03-12T21:27:27Z","timestamp":1615584447000},"score":1,"resource":{"primary":{"URL":"https:\/\/direct.mit.edu\/coli\/article\/39\/4\/917-947\/1450"}},"subtitle":[],"short-title":[],"issued":{"date-parts":[[2013,12]]},"references-count":41,"journal-issue":{"issue":"4","published-print":{"date-parts":[[2013,12]]}},"alternative-id":["10.1162\/COLI_a_00153"],"URL":"https:\/\/doi.org\/10.1162\/coli_a_00153","relation":{},"ISSN":["0891-2017","1530-9312"],"issn-type":[{"value":"0891-2017","type":"print"},{"value":"1530-9312","type":"electronic"}],"subject":[],"published":{"date-parts":[[2013,12]]}}}