{"status":"ok","message-type":"work","message-version":"1.0.0","message":{"indexed":{"date-parts":[[2026,4,1]],"date-time":"2026-04-01T23:28:15Z","timestamp":1775086095557,"version":"3.50.1"},"reference-count":37,"publisher":"Cambridge University Press (CUP)","issue":"4","license":[{"start":{"date-parts":[[2017,1,31]],"date-time":"2017-01-31T00:00:00Z","timestamp":1485820800000},"content-version":"unspecified","delay-in-days":0,"URL":"https:\/\/www.cambridge.org\/core\/terms"}],"content-domain":{"domain":[],"crossmark-restriction":false},"short-container-title":["Nat. Lang. Eng."],"published-print":{"date-parts":[[2017,7]]},"abstract":"<jats:title>Abstract<\/jats:title><jats:p>This paper presents a system developed for detecting sexual predators in online chat conversations using a two-stage classification and behavioral features. A sexual predator is defined as a person who tries to obtain sexual favors in a predatory manner, usually with underage people. The proposed approach uses several text categorization methods and empirical behavioral features developed especially for the task at hand. After investigating various approaches for solving the sexual predator identification problem, we have found that a two-stage classifier achieves the best results. In the first stage, we employ a Support Vector Machine classifier to distinguish conversations having suspicious content from safe online discussions. This is useful as most chat conversations in real life do not contain a sexual predator, therefore it can be viewed as a filtering phase that enables the actual detection of predators to be done only for suspicious chats that contain a sexual predator with a very high degree. In the second stage, we detect which of the users in a suspicious discussion is an actual predator using a Random Forest classifier. The system was tested on the corpus provided by the PAN 2012 workshop organizers and the results are encouraging because, as far as we know, our solution outperforms all previous approaches developed for solving this task.<\/jats:p>","DOI":"10.1017\/s1351324916000395","type":"journal-article","created":{"date-parts":[[2017,1,31]],"date-time":"2017-01-31T04:25:21Z","timestamp":1485836721000},"page":"589-616","source":"Crossref","is-referenced-by-count":12,"title":["Detecting sexual predators in chats using behavioral features and imbalanced learning"],"prefix":"10.1017","volume":"23","author":[{"given":"CLAUDIA","family":"CARDEI","sequence":"first","affiliation":[],"role":[{"role":"author","vocabulary":"crossref"}]},{"ORCID":"https:\/\/orcid.org\/0000-0002-7255-5537","authenticated-orcid":false,"given":"TRAIAN","family":"REBEDEA","sequence":"additional","affiliation":[],"role":[{"role":"author","vocabulary":"crossref"}]}],"member":"56","published-online":{"date-parts":[[2017,1,31]]},"reference":[{"key":"S1351324916000395_ref017","first-page":"445","volume-title":"Proceedings of 13th European Conference on Artificial Intelligence","author":"Kukar","year":"1998"},{"key":"S1351324916000395_ref005","doi-asserted-by":"publisher","DOI":"10.1007\/978-94-007-5070-8"},{"key":"S1351324916000395_ref034","doi-asserted-by":"publisher","DOI":"10.1016\/S0747-5632(01)00059-0"},{"key":"S1351324916000395_ref026","volume-title":"Linguistic inquiry and word count: LIWC 2001","author":"Pennebaker","year":"2001"},{"key":"S1351324916000395_ref002","first-page":"86","volume-title":"Proceedings of the Workshop on Computational Approaches to Deception Detection","author":"Bogdanova","year":"2012"},{"key":"S1351324916000395_ref019","doi-asserted-by":"publisher","DOI":"10.1300\/J070v16n02_02"},{"key":"S1351324916000395_ref028","unstructured":"Popescu M. , and Grozea C. 2012. Kernel methods and string kernels for authorship analysis. In Proceedings of CLEF 2012 (Online Working Notes\/Labs\/Workshop), CEUR-WS, Rome, Italy."},{"key":"S1351324916000395_ref027","unstructured":"Platt J. 1998. Sequential minimal optimization: a fast algorithm for training support vector machines. Technical Report msr-tr-98-14, Microsoft Research."},{"key":"S1351324916000395_ref024","unstructured":"Parapar J. , Losada D. E. , and Barreiro A. 2012. A learning-based approach for the identification of sexual predators in chat logs. In Proceedings of CLEF 2012 (Online Working Notes\/Labs\/Workshop), CEUR-WS, Rome, Italy."},{"key":"S1351324916000395_ref015","doi-asserted-by":"publisher","DOI":"10.1145\/1007730.1007737"},{"key":"S1351324916000395_ref007","doi-asserted-by":"crossref","first-page":"155","DOI":"10.1145\/312129.312220","volume-title":"Proceedings of the 5th ACM SIGKDD International Conference on Knowledge Discovery and Data Mining","author":"Domingos","year":"1999"},{"key":"S1351324916000395_ref025","unstructured":"Peersman C. , Vaassen F. , Van Asch V. , and Daelemans W. 2012. Conversation level constraints on pedophile detection in chat rooms. In Proceedings of CLEF 2012 (Online Working Notes\/Labs\/Workshop), CEUR-WS, Rome, Italy."},{"key":"S1351324916000395_ref020","unstructured":"Maloof M. A. 2003. Learning when data sets are imbalanced and when costs are unequal and unknown. In Proceedings of ICML-2003 Workshop on Learning from Imbalanced Data Sets II, vol. 2, Washington, DC."},{"key":"S1351324916000395_ref018","doi-asserted-by":"crossref","first-page":"539","DOI":"10.1109\/TSMCB.2008.2007853","article-title":"Exploratory undersampling for class-imbalance learning","volume":"39","author":"Liu","year":"2009","journal-title":"IEEE Transactions on Systems, Man, and Cybernetics, Part B: Cybernetics"},{"key":"S1351324916000395_ref029","doi-asserted-by":"publisher","DOI":"10.1016\/j.patcog.2007.04.009"},{"key":"S1351324916000395_ref023","unstructured":"Morris C. , and Hirst G. 2012. Identifying sexual predators by svm classification with lexical and behavioral features. In Proceedings of CLEF 2012 (Online Working Notes\/Labs\/Workshop), CEUR-WS, Rome, Italy."},{"key":"S1351324916000395_ref010","doi-asserted-by":"publisher","DOI":"10.1177\/1077559504271287"},{"key":"S1351324916000395_ref022","doi-asserted-by":"publisher","DOI":"10.1016\/j.amepre.2007.02.001"},{"key":"S1351324916000395_ref013","doi-asserted-by":"crossref","first-page":"1263","DOI":"10.1109\/TKDE.2008.239","article-title":"Learning from imbalanced data","volume":"21","author":"He","year":"2009","journal-title":"IEEE Transactions on Knowledge and Data Engineering"},{"key":"S1351324916000395_ref003","first-page":"110","volume-title":"Proceedings of the 3rd Workshop in Computational Approaches to Subjectivity and Sentiment Analysis","author":"Bogdanova","year":"2012"},{"key":"S1351324916000395_ref006a","volume-title":"Elements of Information Theory","author":"Cover","year":"2012"},{"key":"S1351324916000395_ref009","unstructured":"Escalante H. J. , Erro L. , Villesanor E. Villatoro-Tello A. Ju\u00e1 rez , and Montes-y G\u00f3mez M. 2013. Sexual predator detection in chats with chained classifiers. In Proceedings of the 4th Workshop on Computational Approaches to Subjectivity, Sentiment and Social Media Analysis, Association for Computational Linguistics, pp. 46\u201354, Atlanta, Georgia."},{"key":"S1351324916000395_ref014","unstructured":"Inches G. , and Crestani F. 2012. Overview of the international sexual predator identification competition at PAN-2012. In Proceedings of CLEF 2012 (Online Working Notes\/Labs\/Workshop), CEUR-WS, Rome, Italy."},{"key":"S1351324916000395_ref012","doi-asserted-by":"publisher","DOI":"10.1145\/1656274.1656278"},{"key":"S1351324916000395_ref008","unstructured":"Eriksson G. , and Karlgren J. 2012. Features for modelling characteristics of conversations. In Proceedings of CLEF 2012 (Online Working Notes\/Labs\/Workshop), CEUR-WS, Rome, Italy."},{"key":"S1351324916000395_ref004","doi-asserted-by":"publisher","DOI":"10.1023\/A:1010933404324"},{"key":"S1351324916000395_ref001","first-page":"993","article-title":"Latent dirichlet allocation","volume":"3","author":"Blei","year":"2003","journal-title":"The Journal of Machine Learning Research"},{"key":"S1351324916000395_ref011","doi-asserted-by":"publisher","DOI":"10.1037\/h0057532"},{"key":"S1351324916000395_ref006","first-page":"28","volume-title":"AAAI Fall Symposium on Communicative Action in Humans and Machines","author":"Core","year":"1997"},{"key":"S1351324916000395_ref036","doi-asserted-by":"publisher","DOI":"10.1037\/0003-066X.63.2.111"},{"key":"S1351324916000395_ref035","doi-asserted-by":"publisher","DOI":"10.1016\/j.jadohealth.2004.05.006"},{"key":"S1351324916000395_ref031","doi-asserted-by":"publisher","DOI":"10.1186\/s13388-014-0003-7"},{"key":"S1351324916000395_ref021","doi-asserted-by":"publisher","DOI":"10.1017\/CBO9780511809071"},{"key":"S1351324916000395_ref030","unstructured":"Vartapetiance A. , and Gillam L. 2012. Quite simple approaches for authorship attribution, intrinsic plagiarism detection and sexual predator identification. In Proceedings of the 6th PAN workshop at CLEF2012 on Uncovering Plagiarism, Authorship, and Social Software Misuse (PAN2012), Rome."},{"key":"S1351324916000395_ref033","doi-asserted-by":"publisher","DOI":"10.1016\/j.avb.2012.09.003"},{"key":"S1351324916000395_ref032","unstructured":"Villatoro-Tello E. , Ju\u00e1 rez-Gonz\u00e1 lez A. , Escalante H. J. , Montes-y G\u00f3mez M. , and Pineda L. V. 2012. A two-step approach for effective detection of misbehaving users in chats. In Proceedings of CLEF 2012 (Online Working Notes\/Labs\/Workshop), CEUR-WS, Rome, Italy."},{"key":"S1351324916000395_ref016","unstructured":"Kontostathis A. , Garron A. , Reynolds K. , West W. , and Edwards L. 2012. Identifying predators using ChatCoder 2.0. In Proceedings of CLEF 2012 (Online Working Notes\/Labs\/Workshop), CEUR-WS, Rome, Italy."}],"container-title":["Natural Language Engineering"],"original-title":[],"language":"en","link":[{"URL":"https:\/\/www.cambridge.org\/core\/services\/aop-cambridge-core\/content\/view\/S1351324916000395","content-type":"unspecified","content-version":"vor","intended-application":"similarity-checking"}],"deposited":{"date-parts":[[2019,4,17]],"date-time":"2019-04-17T16:35:43Z","timestamp":1555518943000},"score":1,"resource":{"primary":{"URL":"https:\/\/www.cambridge.org\/core\/product\/identifier\/S1351324916000395\/type\/journal_article"}},"subtitle":[],"short-title":[],"issued":{"date-parts":[[2017,1,31]]},"references-count":37,"journal-issue":{"issue":"4","published-print":{"date-parts":[[2017,7]]}},"alternative-id":["S1351324916000395"],"URL":"https:\/\/doi.org\/10.1017\/s1351324916000395","relation":{},"ISSN":["1351-3249","1469-8110"],"issn-type":[{"value":"1351-3249","type":"print"},{"value":"1469-8110","type":"electronic"}],"subject":[],"published":{"date-parts":[[2017,1,31]]}}}