{"status":"ok","message-type":"work","message-version":"1.0.0","message":{"indexed":{"date-parts":[[2026,2,28]],"date-time":"2026-02-28T17:56:16Z","timestamp":1772301376090,"version":"3.50.1"},"reference-count":62,"publisher":"Cambridge University Press (CUP)","issue":"4","license":[{"start":{"date-parts":[[2020,4,24]],"date-time":"2020-04-24T00:00:00Z","timestamp":1587686400000},"content-version":"unspecified","delay-in-days":0,"URL":"https:\/\/www.cambridge.org\/core\/terms"}],"content-domain":{"domain":["cambridge.org"],"crossmark-restriction":true},"short-container-title":["Nat. Lang. Eng."],"published-print":{"date-parts":[[2021,7]]},"abstract":"<jats:title>Abstract<\/jats:title><jats:p>Detecting, whether a document contains sufficient new information to be deemed as <jats:italic>novel<\/jats:italic>, is of immense significance in this age of data duplication. Existing techniques for document-level novelty detection mostly perform at the lexical level and are unable to address the semantic-level redundancy. These techniques usually rely on handcrafted features extracted from the documents in a rule-based or traditional feature-based machine learning setup. Here, we present an effective approach based on neural attention mechanism to detect document-level novelty without any manual feature engineering. We contend that the simple alignment of texts between the source and target document(s) could identify the state of <jats:italic>novelty<\/jats:italic> of a target document. Our deep neural architecture elicits inference knowledge from a large-scale natural language inference dataset, which proves crucial to the novelty detection task. Our approach is effective and outperforms the standard baselines and recent work on document-level novelty detection by a margin of <jats:inline-formula><jats:alternatives><jats:inline-graphic xmlns:xlink=\"http:\/\/www.w3.org\/1999\/xlink\" mime-subtype=\"png\" xlink:href=\"S1351324920000194_inline1.png\"\/><jats:tex-math>\n$\\sim$\n<\/jats:tex-math><\/jats:alternatives><\/jats:inline-formula>3% in terms of accuracy.<\/jats:p>","DOI":"10.1017\/s1351324920000194","type":"journal-article","created":{"date-parts":[[2020,4,24]],"date-time":"2020-04-24T10:04:41Z","timestamp":1587722681000},"page":"427-454","update-policy":"https:\/\/doi.org\/10.1017\/policypage","source":"Crossref","is-referenced-by-count":11,"title":["Is your document novel? Let attention guide you. An attention-based model for document-level novelty detection"],"prefix":"10.1017","volume":"27","author":[{"given":"Tirthankar","family":"Ghosal","sequence":"first","affiliation":[],"role":[{"role":"author","vocabulary":"crossref"}]},{"given":"Vignesh","family":"Edithal","sequence":"additional","affiliation":[],"role":[{"role":"author","vocabulary":"crossref"}]},{"given":"Asif","family":"Ekbal","sequence":"additional","affiliation":[],"role":[{"role":"author","vocabulary":"crossref"}]},{"given":"Pushpak","family":"Bhattacharyya","sequence":"additional","affiliation":[],"role":[{"role":"author","vocabulary":"crossref"}]},{"given":"Srinivasa Satya Sameer Kumar","family":"Chivukula","sequence":"additional","affiliation":[],"role":[{"role":"author","vocabulary":"crossref"}]},{"given":"George","family":"Tsatsaronis","sequence":"additional","affiliation":[],"role":[{"role":"author","vocabulary":"crossref"}]}],"member":"56","published-online":{"date-parts":[[2020,4,24]]},"reference":[{"key":"S1351324920000194_ref41","doi-asserted-by":"publisher","DOI":"10.1109\/MCOM.2002.1039860"},{"key":"S1351324920000194_ref6","doi-asserted-by":"publisher","DOI":"10.3115\/1608810.1608812"},{"key":"S1351324920000194_ref57","doi-asserted-by":"publisher","DOI":"10.1145\/290941.290953"},{"key":"S1351324920000194_ref18","unstructured":"Collins-Thompson, K. , Ogilvie, P. , Zhang, Y. and Callan, J. (2002). Information filtering, novelty detection, and named-page finding. In TREC."},{"key":"S1351324920000194_ref58","doi-asserted-by":"publisher","DOI":"10.1145\/775047.775150"},{"key":"S1351324920000194_ref10","doi-asserted-by":"publisher","DOI":"10.1145\/860435.860495"},{"key":"S1351324920000194_ref14","doi-asserted-by":"publisher","DOI":"10.18653\/v1\/D18-2029"},{"key":"S1351324920000194_ref61","doi-asserted-by":"publisher","DOI":"10.1145\/1506250.1506256"},{"key":"S1351324920000194_ref22","unstructured":"Dasgupta, T. and Dey, L. (2016). Automatic scoring for innovativeness of textual ideas. In Workshops at the Thirtieth AAAI Conference on Artificial Intelligence."},{"key":"S1351324920000194_ref34","unstructured":"Lai, A. , Bisk, Y. and Hockenmaier, J. (2017). Natural language inference from multiple premises. In Proceedings of the Eighth International Joint Conference on Natural Language Processing, IJCNLP 2017, Taipei, Taiwan, November 27\u2013December 1, 2017 - Volume 1: Long Papers, pp. 100\u2013109."},{"key":"S1351324920000194_ref2","doi-asserted-by":"publisher","DOI":"10.1145\/354756.354843"},{"key":"S1351324920000194_ref51","doi-asserted-by":"publisher","DOI":"10.1049\/cp:19950597"},{"key":"S1351324920000194_ref24","unstructured":"Franz, M. , Ittycheriah, A. , McCarley, J.S. and Ward, T. (2001). First story detection: Combining similarity and novelty based approaches. In Topic Detection and Tracking Workshop Report, pp. 193\u2013206."},{"key":"S1351324920000194_ref12","doi-asserted-by":"publisher","DOI":"10.1145\/290941.291025"},{"key":"S1351324920000194_ref52","doi-asserted-by":"publisher","DOI":"10.1007\/BFb0033283"},{"key":"S1351324920000194_ref15","doi-asserted-by":"publisher","DOI":"10.1145\/2484028.2484094"},{"key":"S1351324920000194_ref25","doi-asserted-by":"publisher","DOI":"10.1145\/988672.988738"},{"key":"S1351324920000194_ref47","unstructured":"Soboroff, I. and Harman, D. (2003). Overview of the TREC 2003 novelty track. In Proceedings of The Twelfth Text REtrieval Conference, TREC 2003, Gaithersburg, Maryland, USA, November 18\u201321, 2003, pp. 38\u201353."},{"key":"S1351324920000194_ref28","unstructured":"Ghosal, T. , Salam, A. , Tiwary, S. , Ekbal, A. and Bhattacharyya, P. (2018b). TAP-DLND 1.0 : A corpus for document level novelty detection. In Proceedings of the Eleventh International Conference on Language Resources and Evaluation, LREC 2018, Miyazaki, Japan, May 7\u201312, 2018."},{"key":"S1351324920000194_ref5","unstructured":"Arora, S. , Liang, Y. and Ma, T. (2016). A simple but tough-to-beat baseline for sentence embeddings. In 5th International Conference on Learning Representations, 2017, Toulon, France, April 24\u201326, 2017, Conference Track Proceedings."},{"key":"S1351324920000194_ref4","doi-asserted-by":"publisher","DOI":"10.1145\/860435.860493"},{"key":"S1351324920000194_ref54","doi-asserted-by":"publisher","DOI":"10.1007\/s10115-010-0372-2"},{"key":"S1351324920000194_ref48","doi-asserted-by":"publisher","DOI":"10.3115\/1220575.1220589"},{"key":"S1351324920000194_ref40","doi-asserted-by":"publisher","DOI":"10.1045\/september2016-mishra"},{"key":"S1351324920000194_ref3","doi-asserted-by":"publisher","DOI":"10.1145\/290941.290954"},{"key":"S1351324920000194_ref32","doi-asserted-by":"publisher","DOI":"10.1109\/CCA.2002.1040189"},{"key":"S1351324920000194_ref9","doi-asserted-by":"publisher","DOI":"10.18653\/v1\/D15-1075"},{"key":"S1351324920000194_ref26","doi-asserted-by":"publisher","DOI":"10.3115\/1654758.1654762"},{"key":"S1351324920000194_ref17","doi-asserted-by":"publisher","DOI":"10.1145\/1390334.1390446"},{"key":"S1351324920000194_ref20","doi-asserted-by":"publisher","DOI":"10.2200\/S00509ED1V01Y201305HLT023"},{"key":"S1351324920000194_ref49","doi-asserted-by":"publisher","DOI":"10.3115\/1072133.1072182"},{"key":"S1351324920000194_ref1","unstructured":"Allan, J. , Gupta, R. and Khandelwal, V. (2001). Topic models for summarizing novelty. In ARDA Workshop on Language Modeling and Information Retrieval. Pittsburgh, Pennsylvania."},{"key":"S1351324920000194_ref33","doi-asserted-by":"publisher","DOI":"10.1007\/978-3-642-01307-2_7"},{"key":"S1351324920000194_ref46","unstructured":"Soboroff, I. (2004). Overview of the TREC 2004 novelty track. In Proceedings of the Thirteenth Text REtrieval Conference, TREC 2004, Gaithersburg, Maryland, USA, November 16\u201319, 2004."},{"key":"S1351324920000194_ref36","doi-asserted-by":"publisher","DOI":"10.18653\/v1\/D15-1067"},{"key":"S1351324920000194_ref16","doi-asserted-by":"publisher","DOI":"10.1145\/1935826.1935847"},{"key":"S1351324920000194_ref30","unstructured":"Harman, D. (2002). Overview of the TREC 2002 novelty track. In Proceedings of The Eleventh Text REtrieval Conference, TREC 2002, Gaithersburg, Maryland, USA, November 19\u201322, 2002."},{"key":"S1351324920000194_ref38","unstructured":"Lin, Z. , Feng, M. , Santos, C.N.d. , Yu, M. , Xiang, B. , Zhou, B. and Bengio, Y. (2017). A structured self-attentive sentence embedding. arXiv preprint arXiv:1703.03130."},{"key":"S1351324920000194_ref7","unstructured":"Bahdanau, D. , Cho, K. and Bengio, Y. (2014). Neural machine translation by jointly learning to align and translate. CoRR abs\/1409.0473."},{"key":"S1351324920000194_ref53","doi-asserted-by":"publisher","DOI":"10.1016\/j.ins.2010.02.020"},{"key":"S1351324920000194_ref27","unstructured":"Ghosal, T. , Edithal, V. , Ekbal, A. , Bhattacharyya, P. , Tsatsaronis, G. and Chivukula, S.S.S.K. (2018a). Novelty goes deep. A deep neural solution to document level novelty detection. In Proceedings of the 27th International Conference on Computational Linguistics, COLING 2018, Santa Fe, New Mexico, USA, August 20\u201326, 2018, pp. 2802\u20132813."},{"key":"S1351324920000194_ref31","doi-asserted-by":"publisher","DOI":"10.1007\/978-3-642-41230-1_5"},{"key":"S1351324920000194_ref23","doi-asserted-by":"publisher","DOI":"10.1037\/h0031619"},{"key":"S1351324920000194_ref50","doi-asserted-by":"publisher","DOI":"10.1016\/j.eswa.2009.12.075"},{"key":"S1351324920000194_ref11","unstructured":"Bysani, P. (2010). Detecting novelty in the context of progressive summarization. In Proceedings of the NAACL HLT 2010 Student Research Workshop. Association for Computational Linguistics, pp. 13\u201318."},{"key":"S1351324920000194_ref21","unstructured":"Dasgupta, D. and Forrest, S. (1996). Novelty detection in time series data using ideas from immunology. In Proceedings of the International Conference on Intelligent Systems, pp. 82\u201387."},{"key":"S1351324920000194_ref19","doi-asserted-by":"publisher","DOI":"10.18653\/v1\/D17-1070"},{"key":"S1351324920000194_ref39","unstructured":"Liu, Y. , Sun, C. , Lin, L. and Wang, X. (2016). Learning natural language inference using bidirectional LSTM model and inner-attention. CoRR abs\/1605.09090."},{"key":"S1351324920000194_ref59","unstructured":"Zhang, M. , Song, R. , Lin, C. , Ma, S. , Jiang, Z. , Jin, Y. , Liu, Y. , Zhao, L. and Ma, S. (2003). Expansion-based technologies in finding relevant and new information: Thu TREC 2002: Novelty track experiments. In NIST SPECIAL PUBLICATION SP (251), 586\u2013590."},{"key":"S1351324920000194_ref37","doi-asserted-by":"publisher","DOI":"10.1145\/1099554.1099734"},{"key":"S1351324920000194_ref44","unstructured":"Ru, L. , Zhao, L. , Zhang, M. and Ma, S. (2004). Improved feature selection and redundance computing-thuir at trec 2004 novelty track. In TREC."},{"key":"S1351324920000194_ref13","doi-asserted-by":"publisher","DOI":"10.1109\/ICNN.1997.614010"},{"key":"S1351324920000194_ref45","doi-asserted-by":"publisher","DOI":"10.3115\/1220575.1220665"},{"key":"S1351324920000194_ref55","doi-asserted-by":"publisher","DOI":"10.1109\/WI-IAT.2012.128"},{"key":"S1351324920000194_ref56","unstructured":"Wayne, C.L. (1997). Topic detection and tracking (tdt). In Workshop Held at the University of Maryland on, vol. 27. Citeseer, pp. 28."},{"key":"S1351324920000194_ref29","doi-asserted-by":"publisher","DOI":"10.1016\/S0954-1810(99)00022-9"},{"key":"S1351324920000194_ref8","unstructured":"Bentivogli, L. , Clark, P. , Dagan, I. and Giampiccolo, D. (2011). The seventh pascal recognizing textual entailment challenge. In TAC."},{"key":"S1351324920000194_ref60","doi-asserted-by":"publisher","DOI":"10.1145\/564376.564393"},{"key":"S1351324920000194_ref35","unstructured":"Le, Q. and Mikolov, T. (2014). Distributed representations of sentences and documents. In Proceedings of the 31st International Conference on Machine Learning (ICML-14), pp. 1188\u20131196."},{"key":"S1351324920000194_ref62","doi-asserted-by":"crossref","unstructured":"Zhao, P. and Lee, D.L. (2016). How much novelty is relevant? it depends on your curiosity. In 39th International ACM SIGIR Conference on Research and Development, Pisa, Italy, pp. 100.","DOI":"10.1145\/2911451.2911488"},{"key":"S1351324920000194_ref43","doi-asserted-by":"publisher","DOI":"10.18653\/v1\/D16-1244"},{"key":"S1351324920000194_ref42","unstructured":"Mou, L. , Men, R. , Li, G. , Xu, Y. , Zhang, L. , Yan, R. and Jin, Z. (2015). Recognizing entailment and contradiction by tree-based convolution. CoRR abs\/1512.08422."}],"container-title":["Natural Language Engineering"],"original-title":[],"language":"en","link":[{"URL":"https:\/\/www.cambridge.org\/core\/services\/aop-cambridge-core\/content\/view\/S1351324920000194","content-type":"unspecified","content-version":"vor","intended-application":"similarity-checking"}],"deposited":{"date-parts":[[2021,7,9]],"date-time":"2021-07-09T06:00:22Z","timestamp":1625810422000},"score":1,"resource":{"primary":{"URL":"https:\/\/www.cambridge.org\/core\/product\/identifier\/S1351324920000194\/type\/journal_article"}},"subtitle":[],"short-title":[],"issued":{"date-parts":[[2020,4,24]]},"references-count":62,"journal-issue":{"issue":"4","published-print":{"date-parts":[[2021,7]]}},"alternative-id":["S1351324920000194"],"URL":"https:\/\/doi.org\/10.1017\/s1351324920000194","relation":{},"ISSN":["1351-3249","1469-8110"],"issn-type":[{"value":"1351-3249","type":"print"},{"value":"1469-8110","type":"electronic"}],"subject":[],"published":{"date-parts":[[2020,4,24]]},"assertion":[{"value":"\u00a9 Cambridge University Press 2020","name":"copyright","label":"Copyright","group":{"name":"copyright_and_licensing","label":"Copyright and Licensing"}}]}}