{"status":"ok","message-type":"work","message-version":"1.0.0","message":{"indexed":{"date-parts":[[2026,5,16]],"date-time":"2026-05-16T00:53:25Z","timestamp":1778892805618,"version":"3.51.4"},"reference-count":49,"publisher":"SAGE Publications","issue":"4","license":[{"start":{"date-parts":[[2018,8,13]],"date-time":"2018-08-13T00:00:00Z","timestamp":1534118400000},"content-version":"tdm","delay-in-days":0,"URL":"https:\/\/journals.sagepub.com\/page\/policies\/text-and-data-mining-license"}],"funder":[{"DOI":"10.13039\/501100006115","name":"Institute for Research in Fundamental Sciences","doi-asserted-by":"publisher","award":["CS1397-4-55"],"award-info":[{"award-number":["CS1397-4-55"]}],"id":[{"id":"10.13039\/501100006115","id-type":"DOI","asserted-by":"publisher"}]}],"content-domain":{"domain":["journals.sagepub.com"],"crossmark-restriction":true},"short-container-title":["Journal of Information Science"],"published-print":{"date-parts":[[2019,8]]},"abstract":"<jats:p>Fast and easy access to a wide range of documents in various languages, in conjunction with the wide availability of translation and editing tools, has led to the need to develop effective tools for detecting cross-lingual plagiarism. Given a suspicious document, cross-lingual plagiarism detection comprises two main subtasks: retrieving documents that are candidate sources for that document and analysing those candidates one by one to determine their similarity to the suspicious document. In this article, we examine the second subtask, also called the detailed analysis subtask, where the goal is to align plagiarised fragments from source and suspicious documents in different languages. Our proposed approach has two main steps: the first step tries to find candidate plagiarised fragments and focuses on high recall, followed by a more precise similarity analysis based on dynamic text alignment that will filter the results by finding alignments between the identified fragments. 
With these two steps, the proximity of the terms is considered at different levels of granularity. In both steps, our approach uses a dictionary to obtain translations of individual terms instead of using a machine translation system to convert longer passages from one language to another. We used a weighting scheme to distinguish among multiple translations of the terms. Experimental results show that our method outperforms the methods used by the systems that achieved the best results in the PAN-2012 and PAN-2014 competitions.<\/jats:p>","DOI":"10.1177\/0165551518787696","type":"journal-article","created":{"date-parts":[[2018,8,13]],"date-time":"2018-08-13T09:03:21Z","timestamp":1534151001000},"page":"443-459","update-policy":"https:\/\/doi.org\/10.1177\/sage-journals-update-policy","source":"Crossref","is-referenced-by-count":8,"title":["Cross-lingual text alignment for fine-grained plagiarism detection"],"prefix":"10.1177","volume":"45","author":[{"given":"Nava","family":"Ehsan","sequence":"first","affiliation":[{"name":"School of Electrical and Computer Engineering, College of Engineering, University of Tehran, Islamic Republic of Iran"}],"role":[{"role":"author","vocabulary":"crossref"}]},{"given":"Azadeh","family":"Shakery","sequence":"additional","affiliation":[{"name":"School of Electrical and Computer Engineering, College of Engineering, University of Tehran, Islamic Republic of Iran"},{"name":"School of Computer Science, Institute for Research in Fundamental Sciences (IPM), Islamic Republic of Iran"}],"role":[{"role":"author","vocabulary":"crossref"}]},{"given":"Frank Wm","family":"Tompa","sequence":"additional","affiliation":[{"name":"David R. 
Cheriton School of Computer Science, University of Waterloo, Canada"}],"role":[{"role":"author","vocabulary":"crossref"}]}],"member":"179","published-online":{"date-parts":[[2018,8,13]]},"reference":[{"key":"bibr1-0165551518787696","doi-asserted-by":"publisher","DOI":"10.1007\/s10579-009-9114-z"},{"key":"bibr2-0165551518787696","doi-asserted-by":"publisher","DOI":"10.1016\/j.knosys.2013.06.018"},{"key":"bibr3-0165551518787696","doi-asserted-by":"publisher","DOI":"10.1016\/j.ipm.2016.04.006"},{"key":"bibr4-0165551518787696","first-page":"59","volume-title":"Proceedings of the 2016 ACM symposium on document engineering","author":"Ehsan N"},{"key":"bibr5-0165551518787696","volume-title":"Mono-lingual paraphrased text reuse and plagiarism detection","author":"Nawab RMA","year":"2012"},{"key":"bibr6-0165551518787696","first-page":"997","volume-title":"Proceedings of the 23rd international conference on computational linguistics: posters","author":"Potthast M"},{"key":"bibr7-0165551518787696","unstructured":"Leilei K, Haoliang Q, Shuai W, et al. Approaches for candidate document retrieval and detailed comparison of plagiarism detection. 
In: Proceedings of the PAN, evaluation lab on uncovering plagiarism, authorship, and social software misuse, Rome, 17\u201320 September 2012."},{"key":"bibr8-0165551518787696","first-page":"402","volume-title":"Proceedings of the 6th international conference on experimental IR meets multilinguality, multimodality, and interaction (CLEF 2015)","author":"S\u00e1nchez-P\u00e9rez MA"},{"key":"bibr9-0165551518787696","volume-title":"Proceedings of the CLEF 2012 evaluation labs and workshop working notes papers","author":"Potthast M"},{"key":"bibr10-0165551518787696","volume-title":"Proceedings of the CLEF 2014 evaluation labs and workshop working notes papers","author":"Potthast M"},{"key":"bibr11-0165551518787696","doi-asserted-by":"publisher","DOI":"10.1016\/j.eswa.2012.12.082"},{"key":"bibr12-0165551518787696","first-page":"56","volume-title":"SEPLN 2009 Workshop on Uncovering Plagiarism, Authorship, and Social Software Misuse (PAN 09)","author":"Seaward L"},{"key":"bibr13-0165551518787696","first-page":"144","volume-title":"Proceedings of the 16th Annual Meeting of the Special Interest Group on Discourse and Dialogue (SIGDIAL)","author":"Rahman R"},{"key":"bibr14-0165551518787696","first-page":"241","volume-title":"Proceeding of the Business, Technologie und Web (BTW), Magdeburg","volume":"15","author":"Tschuggnall M","year":"2013"},{"key":"bibr15-0165551518787696","volume-title":"On the mono- and cross-language detection of text reuse and plagiarism","author":"Barr\u00f3n-Cede\u00f1o A.","year":"2012"},{"key":"bibr16-0165551518787696","first-page":"76","volume-title":"Proceedings of the 2003 ACM SIGMOD international conference on management of data","author":"Schleimer S"},{"key":"bibr17-0165551518787696","first-page":"571","volume-title":"Proceedings of the 31st international ACM SIGIR conference on research and development in information retrieval","author":"Seo 
J"},{"key":"bibr18-0165551518787696","doi-asserted-by":"publisher","DOI":"10.1007\/s10579-009-9112-1"},{"key":"bibr19-0165551518787696","doi-asserted-by":"publisher","DOI":"10.1002\/asi.21630"},{"key":"bibr20-0165551518787696","first-page":"10","volume-title":"Proceedings of the 3rd PAN workshop uncovering plagiarism, authorship and social software misuse","author":"Grozea C"},{"key":"bibr21-0165551518787696","first-page":"523","volume-title":"Proceedings of the 10th international conference on computational linguistics and intelligent text processing","author":"Barr\u00f3n-Cede\u00f1o A"},{"key":"bibr22-0165551518787696","doi-asserted-by":"publisher","DOI":"10.1007\/978-3-642-15998-5_4"},{"key":"bibr23-0165551518787696","first-page":"34","volume":"2","author":"Chen CY","year":"2010","journal-title":"J Comput"},{"key":"bibr24-0165551518787696","doi-asserted-by":"publisher","DOI":"10.1016\/j.knosys.2016.08.004"},{"key":"bibr25-0165551518787696","doi-asserted-by":"publisher","DOI":"10.1023\/B:INRT.0000009441.78971.be"},{"key":"bibr26-0165551518787696","doi-asserted-by":"publisher","DOI":"10.1016\/j.jalgor.2009.02.005"},{"key":"bibr27-0165551518787696","first-page":"18","volume-title":"Proceedings of the AAAI spring symposium on cross-language text and speech retrieval","volume":"15","author":"Dumais ST"},{"key":"bibr28-0165551518787696","first-page":"1473","volume-title":"Advances in neural information processing systems","author":"Vinokourov A","year":"2002"},{"key":"bibr29-0165551518787696","doi-asserted-by":"publisher","DOI":"10.1007\/978-3-540-78646-7_51"},{"key":"bibr30-0165551518787696","first-page":"1606","volume-title":"Proceedings of the 20th international joint conference on artificial intelligence (IJCAI)","volume":"7","author":"Gabrilovich E"},{"key":"bibr31-0165551518787696","first-page":"83","volume-title":"Proceedings of the 13th international conference on artificial intelligence: methodologies, systems, and applications (AIMSA 2008)","author":"Ceska 
Z"},{"key":"bibr32-0165551518787696","first-page":"415","volume-title":"Proceedings of the 15th conference of the European chapter of the association for computational linguistics (EACL 2017)","volume":"2","author":"Ferrero J"},{"key":"bibr33-0165551518787696","unstructured":"Thompson V. Detecting cross-lingual plagiarism using simulated word embeddings, 2017, https:\/\/arxiv.org\/abs\/1712.10190"},{"key":"bibr34-0165551518787696","doi-asserted-by":"publisher","DOI":"10.1016\/j.datak.2012.02.003"},{"key":"bibr35-0165551518787696","first-page":"51","volume-title":"Proceedings of the Student Research Workshop associated with Recent Advances in Natural Language Processing (RANLP)","author":"Danilova V"},{"key":"bibr36-0165551518787696","doi-asserted-by":"publisher","DOI":"10.1016\/j.csl.2006.09.003"},{"key":"bibr37-0165551518787696","doi-asserted-by":"publisher","DOI":"10.1016\/j.ipm.2015.12.004"},{"key":"bibr38-0165551518787696","volume-title":"Proceedings of the 5th international plagiarism conference","author":"Pataki M"},{"key":"bibr39-0165551518787696","first-page":"173","volume-title":"Proceedings of the 2011 7th international conference on natural language processing and knowledge engineering (NLP-KE)","author":"Anguita A"},{"key":"bibr40-0165551518787696","volume-title":"Proceedings of the CLEF 2010 labs and workshops, notebook papers","author":"Muhr M"},{"key":"bibr41-0165551518787696","doi-asserted-by":"publisher","DOI":"10.1017\/CBO9780511809071"},{"key":"bibr42-0165551518787696","volume-title":"Proceedings of the 10th international conference on information and knowledge management (CIKM\u201901)","author":"Zhai C"},{"key":"bibr43-0165551518787696","volume-title":"Foundations of statistical natural language processing","author":"Manning CD","year":"1999"},{"key":"bibr44-0165551518787696","doi-asserted-by":"publisher","DOI":"10.1145\/582415.582416"},{"key":"bibr45-0165551518787696","first-page":"1","volume-title":"Proceedings of the 2nd meeting of the North 
American chapter of the association for computational linguistics on language technologies","author":"Papineni K."},{"key":"bibr46-0165551518787696","first-page":"489","volume-title":"Proceedings of the 5th international conference on language resources and evaluation (LREC\u20192006)","author":"Ma X."},{"key":"bibr47-0165551518787696","first-page":"216","volume-title":"Proceedings of the 48th annual meeting of the association for computational linguistics","author":"Navigli R"},{"key":"bibr48-0165551518787696","doi-asserted-by":"publisher","DOI":"10.1162\/tacl_a_00179"},{"key":"bibr49-0165551518787696","unstructured":"Caumanns J. A fast and simple stemming algorithm for German words. Technical report, Free University of Berlin, Berlin, October 1999."}],"container-title":["Journal of Information Science"],"original-title":[],"language":"en","link":[{"URL":"https:\/\/journals.sagepub.com\/doi\/pdf\/10.1177\/0165551518787696","content-type":"application\/pdf","content-version":"vor","intended-application":"text-mining"},{"URL":"https:\/\/journals.sagepub.com\/doi\/full-xml\/10.1177\/0165551518787696","content-type":"application\/xml","content-version":"vor","intended-application":"text-mining"},{"URL":"https:\/\/journals.sagepub.com\/doi\/pdf\/10.1177\/0165551518787696","content-type":"unspecified","content-version":"vor","intended-application":"similarity-checking"}],"deposited":{"date-parts":[[2026,4,29]],"date-time":"2026-04-29T23:08:41Z","timestamp":1777504121000},"score":1,"resource":{"primary":{"URL":"https:\/\/journals.sagepub.com\/doi\/10.1177\/0165551518787696"}},"subtitle":[],"short-title":[],"issued":{"date-parts":[[2018,8,13]]},"references-count":49,"journal-issue":{"issue":"4","published-print":{"date-parts":[[2019,8]]}},"alternative-id":["10.1177\/0165551518787696"],"URL":"https:\/\/doi.org\/10.1177\/0165551518787696","relation":{},"ISSN":["0165-5515","1741-6485"],"issn-type":[{"value":"0165-5515","type":"print"},{"value":"1741-6485","type":"electronic"}
],"subject":[],"published":{"date-parts":[[2018,8,13]]}}}