{"status":"ok","message-type":"work","message-version":"1.0.0","message":{"indexed":{"date-parts":[[2025,3,14]],"date-time":"2025-03-14T08:40:18Z","timestamp":1741941618987,"version":"3.38.0"},"reference-count":33,"publisher":"China Science Publishing & Media Ltd.","issue":"2","license":[{"start":{"date-parts":[[2023,12,19]],"date-time":"2023-12-19T00:00:00Z","timestamp":1702944000000},"content-version":"vor","delay-in-days":0,"URL":"https:\/\/creativecommons.org\/licenses\/by\/4.0\/"}],"content-domain":{"domain":["direct.mit.edu"],"crossmark-restriction":true},"short-container-title":[],"published-print":{"date-parts":[[2024,5,1]]},"abstract":"<jats:title>ABSTRACT<\/jats:title>\n               <jats:p>Low-resource text plagiarism detection faces a significant challenge due to the limited availability of labeled data for training. This task requires the development of sophisticated algorithms capable of identifying similarities and differences in texts, particularly in the realm of semantic rewriting and translation-based plagiarism detection. In this paper, we present an enhanced attentive Siamese Long Short-Term Memory (LSTM) network designed for Tibetan-Chinese plagiarism detection. Our approach begins with the introduction of translation-based data augmentation, aimed at expanding the bilingual training dataset. Subsequently, we propose a pre-detection method leveraging abstract document vectors to enhance detection efficiency. Finally, we introduce an improved attentive Siamese LSTM network tailored for Tibetan-Chinese plagiarism detection. We conduct comprehensive experiments to showcase the effectiveness of our proposed plagiarism detection framework.<\/jats:p>","DOI":"10.1162\/dint_a_00242","type":"journal-article","created":{"date-parts":[[2023,12,19]],"date-time":"2023-12-19T12:36:15Z","timestamp":1702989375000},"page":"488-503","update-policy":"https:\/\/doi.org\/10.1162\/mitpressjournals.corrections.policy","source":"Crossref","is-referenced-by-count":0,"title":["Exploring Attentive Siamese LSTM for Low-Resource Text Plagiarism Detection"],"prefix":"10.3724","volume":"6","author":[{"given":"Wei","family":"Bao","sequence":"first","affiliation":[{"name":"China Electronics Standardization Institute, Beijing 100007, China"}]},{"given":"Jian","family":"Dong","sequence":"additional","affiliation":[{"name":"China Electronics Standardization Institute, Beijing 100007, China"},{"name":"Beihang University, Beijing 100191, China"}]},{"given":"Yang","family":"Xu","sequence":"additional","affiliation":[{"name":"China Electronics Standardization Institute, Beijing 100007, China"}]},{"given":"Yuanyuan","family":"Yang","sequence":"additional","affiliation":[{"name":"Bohai University, Jinzhou 121013, China"}]},{"given":"Xiaoke","family":"Qi","sequence":"additional","affiliation":[{"name":"School of Information Management for Law, China University of Political Science and Law, Beijing 102249, China"}]}],"member":"2026","published-online":{"date-parts":[[2024,5,1]]},"reference":[{"key":"2024071119544514100_ref1","first-page":"e012047","article-title":"Why articles are retracted: a retrospective cross-sectional study of retraction notices at biomed central","volume-title":"BMJ Open","author":"Moylan","year":"2016"},{"key":"2024071119544514100_ref2","first-page":"67","article-title":"Cross-language high similarity search using a conceptual thesaurus","volume-title":"Proceedings of the Third International Conference of the CLEF Initiative:Information Access Evaluation. Multilinguality, Multimodality, and Visual Analytics","author":"Gupta","year":"2012"},{"issue":"9","key":"2024071119544514100_ref3","doi-asserted-by":"crossref","first-page":"1134","DOI":"10.1111\/1471-0528.15689","article-title":"Plagiarism and data falsification are the most common reasons for retracted publications in obstetrics and gynaecology","volume":"126","author":"Chambers","year":"2019","journal-title":"BJOG"},{"key":"2024071119544514100_ref4","article-title":"Approaches for candidate document retrieval and detailed comparison of plagiarism detection","volume-title":"Proceedings of the Conference and Labs of the Evaluation Forum","author":"Kong","year":"2012"},{"key":"2024071119544514100_ref5","article-title":"Approaches for source retrieval and text alignment of plagiarism detection Notebook for PAN at CLEF 2013","volume-title":"Conference and Labs of the Evaluation Forum","author":"Kong","year":"2013"},{"key":"2024071119544514100_ref6","first-page":"1004","article-title":"A winning approach to text alignment for text reuse detection at pan 2014","volume-title":"Proceedings of the Conference and Labs of the Evaluation Forum","author":"Sanchez-Perez","year":"2014"},{"key":"2024071119544514100_ref7","doi-asserted-by":"crossref","first-page":"508","DOI":"10.1016\/j.neunet.2023.09.041","article-title":"Compnet: Complementary network for single-channel speech enhancement","volume":"168","author":"Fan","year":"2023","journal-title":"Journal of Neural Networks"},{"issue":"30","key":"2024071119544514100_ref8","first-page":"1159","article-title":"A blockchain-based flexible data auditing scheme for the cloud service","volume":"6","author":"Fan","year":"2021","journal-title":"Journal of Chinese Journal of Electronics"},{"key":"2024071119544514100_ref9","article-title":"Text Alignment Module in CoReMo 2.1 Plagiarism Detector Notebook for PAN at CLEF 2013","volume-title":"Proceedings of the Conference and Labs of the Evaluation Forum","author":"Torrej\u00f3n","year":"2013"},{"key":"2024071119544514100_ref10","article-title":"CoReMo system (contextual reference monotony) a fast, low cost and high performance plagiarism analyzer system: Lab report for pan at CLEF 2010","volume-title":"Proceedings of the Conference and Labs of the Evaluation Forum","author":"Rodr\u00edguez-Torrej\u00f3n","year":"2010"},{"issue":"12","key":"2024071119544514100_ref11","doi-asserted-by":"crossref","first-page":"2512","DOI":"10.1002\/asi.21630","article-title":"Plagiarism detection using stopword n-grams","volume":"62","author":"Stamatatos","year":"2011","journal-title":"Journal of the American Society for Information Science and Technology"},{"key":"2024071119544514100_ref12","first-page":"430","article-title":"Near similarity search and plagiarism analysis","volume-title":"Proceedings of the 29th Annual Conference of the Gesellschaft f\u00fcr Klassifikation","author":"Stein","year":"2005"},{"issue":"9","key":"2024071119544514100_ref13","doi-asserted-by":"crossref","first-page":"1385","DOI":"10.1109\/TNN.2009.2023394","article-title":"Multilayer som with tree-structured data for efficient document retrieval and plagiarism detection","volume":"20","author":"Chow","year":"2009","journal-title":"IEEE Transactions on Neural Networks"},{"key":"2024071119544514100_ref14","first-page":"471","article-title":"A coarse-to-fine framework to efficiently thwart Plagiarism","volume-title":"Journal of Pattern Recognition","author":"Zhang","year":"2011"},{"key":"2024071119544514100_ref15","first-page":"286","article-title":"Using structural information and citation evidence to detect significant plagiarism cases in scientific publications","volume-title":"Journal of the American Society for Information Science and Technology","author":"Alzahrani","year":"2012"},{"key":"2024071119544514100_ref16","doi-asserted-by":"crossref","first-page":"273","DOI":"10.1145\/1810617.1810671","article-title":"Citation based plagiarism detection: a new approach to identify plagiarized work language independently","volume-title":"Proceedings of the 21st ACM Conference on Hypertext and Hypermedia","author":"Gipp","year":"2010"},{"key":"2024071119544514100_ref17","first-page":"1527","article-title":"Citation-based plagiarism detection: Practicability on a large-scale scientific corpus","volume-title":"Journal of the Association for Information Science and Technology","author":"Gipp","year":"2014"},{"key":"2024071119544514100_ref18","doi-asserted-by":"crossref","first-page":"1576","DOI":"10.18653\/v1\/D15-1181","article-title":"Multi-perspective sentence similarity modeling with convolutional neural networks","volume-title":"Proceedings of the 2015 Conference on Empirical Methods in Natural Language Processing","author":"He","year":"2015"},{"key":"2024071119544514100_ref19","first-page":"1556","article-title":"Improved semantic representations from tree-structured long short-term memory networks","volume-title":"Proceedings of the 53rd Annual Meeting of the Association for Computational Linguistics and the 7th International Joint Conference on Natural Language Processing","author":"Tai","year":"2015"},{"key":"2024071119544514100_ref20","first-page":"3294","article-title":"Skip-thought vectors","volume-title":"Proceedings of the 28th International Conference on Neural Information Processing Systems","author":"Kiros","year":"2015"},{"key":"2024071119544514100_ref21","first-page":"2786","article-title":"Siamese recurrent architectures for learning sentence similarity","volume-title":"Proceedings of the AAAI Conference on Artificial Intelligence","author":"Mueller","year":"2016"},{"key":"2024071119544514100_ref22","first-page":"64","article-title":"Plagiarism detection system for the kurdish language","volume-title":"International Journal of Information Technology and Computer Science","author":"Karzan","year":"2017"},{"key":"2024071119544514100_ref23","first-page":"7412","article-title":"Measuring short text reuse for the urdu language","volume-title":"IEEE Access","author":"Sameen","year":"2017"},{"key":"2024071119544514100_ref24","first-page":"421","article-title":"Detecting near-duplicates in Russian documents through using fingerprint algorithm simhash","volume-title":"Procedia Computer Science","author":"Rezaeian","year":"2017"},{"key":"2024071119544514100_ref25","first-page":"012082","article-title":"Plagiarism detection for Indonesian language using winnowing with parallel processing","volume-title":"Journal of Physics: Conference Series","author":"Arifin","year":"2018"},{"key":"2024071119544514100_ref26","doi-asserted-by":"crossref","first-page":"73","DOI":"10.1023\/B:INRT.0000009441.78971.be","article-title":"Character n-gram tokenization for european language text retrieval","volume":"7","author":"McNamee","year":"2004","journal-title":"Journal of Information retrieval"},{"key":"2024071119544514100_ref27","first-page":"1","article-title":"On cross-lingual plagiarism analysis using a statistical model","volume-title":"Proceedings of the 2008 International Conference on Uncovering Plagiarism","author":"Barr\u00f3n-Cedeno","year":"2008"},{"issue":"2","key":"2024071119544514100_ref28","doi-asserted-by":"crossref","first-page":"232","DOI":"10.14569\/IJACSA.2020.0110231","article-title":"Cross-language plagiarism detection using word embedding and inverse document frequency (IDF)","volume":"11","author":"Aljuaid","year":"2020","journal-title":"International Journal of Advanced Computer Science and Applications"},{"issue":"1","key":"2024071119544514100_ref29","first-page":"81","article-title":"A deep learning based technique for plagiarism detection: a comparative study","volume":"9","author":"Mostafa","year":"2020","journal-title":"International Journal of Artificial Intelligence"},{"key":"2024071119544514100_ref30","first-page":"175","article-title":"The university of Sydney's machine translation system for WMT19","volume-title":"Proceedings of the Fourth Conference on Machine Translation","author":"Ding","year":"2019"},{"key":"2024071119544514100_ref31","article-title":"Harnessing indirect training data for end-to-end automatic speech translation: Tricks of the trade","volume-title":"Proceedings of the 16th International Conference on Spoken Language Translation","author":"Pino","year":"2019"},{"key":"2024071119544514100_ref32","first-page":"411","article-title":"Vega-MT: The JD explore academy translation system for wmt22","volume-title":"Proceedings of the Seventh Conference on Machine Translation","author":"Zan","year":"2022"},{"key":"2024071119544514100_ref33","first-page":"21","article-title":"Tibetan-Chinese neural machine translation based on syllable segmentation","volume-title":"Proceedings of the AMTA 2018 Workshop on Technologies for MT of Low Resource Languages (LoResMT 2018)","author":"Lai","year":"2018"}],"container-title":["Data Intelligence"],"original-title":[],"language":"en","link":[{"URL":"https:\/\/direct.mit.edu\/dint\/article-pdf\/6\/2\/488\/2458962\/dint_a_00242.pdf","content-type":"application\/pdf","content-version":"vor","intended-application":"syndication"},{"URL":"https:\/\/direct.mit.edu\/dint\/article-pdf\/6\/2\/488\/2458962\/dint_a_00242.pdf","content-type":"unspecified","content-version":"vor","intended-application":"similarity-checking"}],"deposited":{"date-parts":[[2025,3,14]],"date-time":"2025-03-14T07:41:15Z","timestamp":1741938075000},"score":1,"resource":{"primary":{"URL":"https:\/\/www.sciengine.com\/doi\/10.1162\/dint_a_00242"}},"subtitle":[],"short-title":[],"issued":{"date-parts":[[2024]]},"references-count":33,"journal-issue":{"issue":"2","published-print":{"date-parts":[[2024,5,1]]}},"URL":"https:\/\/doi.org\/10.1162\/dint_a_00242","relation":{},"ISSN":["2641-435X"],"issn-type":[{"type":"electronic","value":"2641-435X"}],"subject":[],"published-other":{"date-parts":[[2024]]},"published":{"date-parts":[[2024]]}}}