{"status":"ok","message-type":"work","message-version":"1.0.0","message":{"indexed":{"date-parts":[[2025,12,21]],"date-time":"2025-12-21T10:03:23Z","timestamp":1766311403027,"version":"build-2065373602"},"reference-count":25,"publisher":"MDPI AG","issue":"4","license":[{"start":{"date-parts":[[2012,10,19]],"date-time":"2012-10-19T00:00:00Z","timestamp":1350604800000},"content-version":"vor","delay-in-days":0,"URL":"https:\/\/creativecommons.org\/licenses\/by\/3.0\/"}],"content-domain":{"domain":[],"crossmark-restriction":false},"short-container-title":["Algorithms"],"abstract":"<jats:p>We propose using side information to further inform anomaly detection algorithms of the semantic context of the text data they are analyzing, thereby considering both divergence from the statistical pattern seen in particular datasets and divergence seen from more general semantic expectations. Computational experiments show that our algorithm performs as expected on data that reflect real-world events with contextual ambiguity, while replicating conventional clustering on data that are either too specialized or generic to result in contextual information being actionable. 
These results suggest that our algorithm could potentially reduce false positive rates in existing anomaly detection systems.<\/jats:p>","DOI":"10.3390\/a5040469","type":"journal-article","created":{"date-parts":[[2012,10,22]],"date-time":"2012-10-22T05:32:03Z","timestamp":1350883923000},"page":"469-489","update-policy":"https:\/\/doi.org\/10.3390\/mdpi_crossmark_policy","source":"Crossref","is-referenced-by-count":21,"title":["Contextual Anomaly Detection in Text Data"],"prefix":"10.3390","volume":"5","author":[{"given":"Amogh","family":"Mahapatra","sequence":"first","affiliation":[{"name":"Department of Computer Science, University of Minnesota, 200 Union St SE, Minneapolis 55455, USA"}]},{"given":"Nisheeth","family":"Srivastava","sequence":"additional","affiliation":[{"name":"Department of Computer Science, University of Minnesota, 200 Union St SE, Minneapolis 55455, USA"}]},{"given":"Jaideep","family":"Srivastava","sequence":"additional","affiliation":[{"name":"Department of Computer Science, University of Minnesota, 200 Union St SE, Minneapolis 55455, USA"}]}],"member":"1968","published-online":{"date-parts":[[2012,10,19]]},"reference":[{"key":"ref_1","doi-asserted-by":"crossref","first-page":"1","DOI":"10.1145\/1541880.1541882","article-title":"Anomaly detection: A survey","volume":"41","author":"Chandola","year":"2009","journal-title":"ACM Comput. Surv."},{"key":"ref_2","doi-asserted-by":"crossref","unstructured":"Manevitz, L., and Yousef, M. (2000, January 24-28). Document Classification on Neural Networks Using Only Positive Examples. Proceedings of the 23rd Annual International ACM SIGIR Conference Research and Development in Information Retrieval, New Orleans, USA.","DOI":"10.1145\/345508.345608"},{"key":"ref_3","first-page":"139","article-title":"One-class SVMs for document classification","volume":"2","author":"Manevitz","year":"2002","journal-title":"J. Mach. 
Learning Res."},{"key":"ref_4","doi-asserted-by":"crossref","unstructured":"Srivastava, A., and Zane-Ulman, B. (2005, January 5-12). Discovering Recurring Anomalies in Text Reports Regarding Complex Space Systems. Proceedings of IEEE Aerospace Conference, Los Alamitos, CA, USA.","DOI":"10.1109\/AERO.2005.1559692"},{"key":"ref_5","unstructured":"Agovic, A., Shan, H., and Banerjee, A. (2010, January 5-6). Analyzing Aviation Safety Reports: From Topic Modeling to Scalable Multi-label Classification. Proceedings of the Conference on Intelligent Data Understanding, Mountain View, CA, USA."},{"key":"ref_6","unstructured":"Guthrie, D., Guthrie, L., Allison, B., and Wilks, Y. (2007, January 9-12). Unsupervised Anomaly Detection. Proceedings of the Twentieth International Joint Conference on Artificial Intelligence, Hyderabad, India."},{"key":"ref_7","unstructured":"Lin, D. (1998, January 24-27). An Information-Theoretic Definition of Similarity. Proceedings of the 15th International Conference on Machine Learning, Madison, WI, USA."},{"key":"ref_8","unstructured":"Resnik, P. (1995, January 20-25). Using Information Content to Evaluate Semantic Similarity in a Taxonomy. Proceedings of the 14th International Joint Conference on Artificial Intelligence, Montreal, CA, USA."},{"key":"ref_9","unstructured":"Jiang, J.J., and Conrath, D.W. Semantic Similarity Based on Corpus Statistics and Lexical Taxonomy. Proceedings of the International Conference on Research in Computational Linguistics, Taiwan."},{"key":"ref_10","unstructured":"Mangalath, P., Quesada, J., and Kintsch, W. (2004, January 5-7). Analogy-making as Predication Using Relational Information and LSA Vectors. Proceedings of the 26th Annual Meeting of the Cognitive Science Society, Chicago, USA."},{"key":"ref_11","doi-asserted-by":"crossref","first-page":"370","DOI":"10.1109\/TKDE.2007.48","article-title":"The google similarity distance","volume":"19","author":"Cilibrasi","year":"2007","journal-title":"IEEE Trans. 
Knowl. Data Eng."},{"key":"ref_12","doi-asserted-by":"crossref","unstructured":"Bollegala, D., Matsuo, Y., and Ishizuka, M. (2009, January 20-24). Measuring the Similarity between Implicit Semantic Relations from the Web. Proceedings of the 18th International Conference on World Wide Web, ACM, Madrid, Spain.","DOI":"10.1145\/1526709.1526797"},{"key":"ref_13","doi-asserted-by":"crossref","unstructured":"Liu, D., Hua, X., Yang, L., Wang, L., and Zhang, H. (2009, January 20-24). Tag Ranking. Proceedings of the 18th International Conference on The World Wide Web, Madrid, Spain.","DOI":"10.1145\/1526709.1526757"},{"key":"ref_14","doi-asserted-by":"crossref","unstructured":"Gligorov, R., Kate, W., Aleksovski, Z., and Harmelen, F. (2007, January 8\u201312). Using Google Distance to Weight Approximate Ontology Matches. Proceedings of the 16th International Conference on the World Wide Web, Banff, Alberta, Canada.","DOI":"10.1145\/1242572.1242676"},{"key":"ref_15","first-page":"993","article-title":"Latent Dirichlet allocation","volume":"3","author":"Blei","year":"2003","journal-title":"J. Mach. Learning Res."},{"key":"ref_16","unstructured":"Newman, D., Asuncion, A., Smyth, P., and Welling, M. (2008, January 8-11). Distributed Inference for Latent Dirichlet Allocation. Proceedings of NIPS 2008, Vancouver, Canada."},{"key":"ref_17","unstructured":"Topic Modelling toolbox. Available online:http:\/\/psiexp.ss.uci.edu\/research\/programsdata."},{"key":"ref_18","unstructured":"WordNet. Available online:http:\/\/wordnet.princeton.edu\/."},{"key":"ref_19","doi-asserted-by":"crossref","unstructured":"Pedersen, T., Patwardhan, S., and Michelizzi, J. (2004, January 25-29). WordNet: Similarity-measuring the Relatedness of Concepts. Proceedings of the 19th National Conference on Artificial Intelligence, San Jose CA, USA.","DOI":"10.3115\/1614025.1614037"},{"key":"ref_20","unstructured":"Frank, A., and Asuncion, A. UCI Machine Learning Repository. 
Available online:http:\/\/archive.ics.uci.edu\/ml."},{"key":"ref_21","doi-asserted-by":"crossref","unstructured":"Srivastava, N., and Srivastava, J. (2010, January 13-16). A hybrid-logic Approach Towards Fault Detection in Complex Cyber-Physical Systems. Proceedings of the Annual Conference of the Prognostics and Health Management Society, Portland, Oregon, USA.","DOI":"10.36001\/phmconf.2010.v2i1.1888"},{"key":"ref_22","unstructured":"Wagstaff, K., Rogers, S., and Schroedl, S. (July, January 28). Constrained K-Means Clustering With Background Knowledge. Proceedings of the International Conference on Machine Learning, Williamstown, MA, USA."},{"key":"ref_23","unstructured":"Sontag, D., and Roy, D. (2011). Complexity of inference in Latent Dirichlet Allocation, NIPS."},{"key":"ref_24","unstructured":"Petrovi, S., Osborne, M., and Lavrenko, V. (2010, January 1-6). Streaming First Story Detection with Application to Twitter. Proceedings of the Human Language Technologies: The 2010 Annual Conference of the North American Chapter of the Association for Computational Linguistics, LA, USA."},{"key":"ref_25","unstructured":"WordNet: Similarity. 
Available online:http:\/\/marimba.d.umn.edu\/."}],"container-title":["Algorithms"],"original-title":[],"language":"en","link":[{"URL":"https:\/\/www.mdpi.com\/1999-4893\/5\/4\/469\/pdf","content-type":"unspecified","content-version":"vor","intended-application":"similarity-checking"}],"deposited":{"date-parts":[[2025,10,11]],"date-time":"2025-10-11T21:52:57Z","timestamp":1760219577000},"score":1,"resource":{"primary":{"URL":"https:\/\/www.mdpi.com\/1999-4893\/5\/4\/469"}},"subtitle":[],"short-title":[],"issued":{"date-parts":[[2012,10,19]]},"references-count":25,"journal-issue":{"issue":"4","published-online":{"date-parts":[[2012,12]]}},"alternative-id":["a5040469"],"URL":"https:\/\/doi.org\/10.3390\/a5040469","relation":{},"ISSN":["1999-4893"],"issn-type":[{"type":"electronic","value":"1999-4893"}],"subject":[],"published":{"date-parts":[[2012,10,19]]}}}