{"status":"ok","message-type":"work","message-version":"1.0.0","message":{"indexed":{"date-parts":[[2026,2,7]],"date-time":"2026-02-07T10:54:38Z","timestamp":1770461678488,"version":"3.49.0"},"reference-count":47,"publisher":"Emerald","issue":"3","license":[{"start":{"date-parts":[[2021,11,29]],"date-time":"2021-11-29T00:00:00Z","timestamp":1638144000000},"content-version":"tdm","delay-in-days":0,"URL":"https:\/\/www.emerald.com\/insight\/site-policies"}],"content-domain":{"domain":[],"crossmark-restriction":false},"short-container-title":["DTA"],"published-print":{"date-parts":[[2022,6,22]]},"abstract":"<jats:sec><jats:title content-type=\"abstract-subheading\">Purpose<\/jats:title><jats:p>Twitter fake accounts refer to bot accounts created by third-party organizations to influence public opinion, commercial propaganda or impersonate others. The effective identification of bot accounts is conducive to accurately judge the disseminated information for the public. However, in actual fake account identification, it is expensive and inefficient to manually label Twitter accounts, and the labeled data are usually unbalanced in classes. To this end, the authors propose a novel framework to solve these problems.<\/jats:p><\/jats:sec><jats:sec><jats:title content-type=\"abstract-subheading\">Design\/methodology\/approach<\/jats:title><jats:p>In the proposed framework, the authors introduce the concept of semi-supervised self-training learning and apply it to the real Twitter account data set from Kaggle. Specifically, the authors first train the classifier in the initial small amount of labeled account data, then use the trained classifier to automatically label large-scale unlabeled account data. Next, iteratively select high confidence instances from unlabeled data to expand the labeled data. Finally, an expanded Twitter account training set is obtained. It is worth mentioning that the resampling technique is integrated into the self-training process, and the data class is balanced at the initial stage of the self-training iteration.<\/jats:p><\/jats:sec><jats:sec><jats:title content-type=\"abstract-subheading\">Findings<\/jats:title><jats:p>The proposed framework effectively improves labeling efficiency and reduces the influence of class imbalance. It shows excellent identification results on 6 different base classifiers, especially for the initial small-scale labeled Twitter accounts.<\/jats:p><\/jats:sec><jats:sec><jats:title content-type=\"abstract-subheading\">Originality\/value<\/jats:title><jats:p>This paper provides novel insights in identifying Twitter fake accounts. First, the authors take the lead in introducing a self-training method to automatically label Twitter accounts from the semi-supervised background. Second, the resampling technique is integrated into the self-training process to effectively reduce the influence of class imbalance on the identification effect.<\/jats:p><\/jats:sec>","DOI":"10.1108\/dta-07-2021-0196","type":"journal-article","created":{"date-parts":[[2021,11,26]],"date-time":"2021-11-26T03:57:27Z","timestamp":1637899047000},"page":"409-428","source":"Crossref","is-referenced-by-count":9,"title":["A novel semi-supervised self-training method based on resampling for Twitter fake account identification"],"prefix":"10.1108","volume":"56","author":[{"ORCID":"https:\/\/orcid.org\/0000-0001-9847-0358","authenticated-orcid":false,"given":"Ziming","family":"Zeng","sequence":"first","affiliation":[],"role":[{"role":"author","vocabulary":"crossref"}]},{"ORCID":"https:\/\/orcid.org\/0000-0002-5651-5260","authenticated-orcid":false,"given":"Tingting","family":"Li","sequence":"additional","affiliation":[],"role":[{"role":"author","vocabulary":"crossref"}]},{"ORCID":"https:\/\/orcid.org\/0000-0001-8760-9254","authenticated-orcid":false,"given":"Shouqiang","family":"Sun","sequence":"additional","affiliation":[],"role":[{"role":"author","vocabulary":"crossref"}]},{"ORCID":"https:\/\/orcid.org\/0000-0002-7555-8186","authenticated-orcid":false,"given":"Jingjing","family":"Sun","sequence":"additional","affiliation":[],"role":[{"role":"author","vocabulary":"crossref"}]},{"ORCID":"https:\/\/orcid.org\/0000-0001-7130-2004","authenticated-orcid":false,"given":"Jie","family":"Yin","sequence":"additional","affiliation":[],"role":[{"role":"author","vocabulary":"crossref"}]}],"member":"140","published-online":{"date-parts":[[2021,11,29]]},"reference":[{"issue":"1","key":"key2022062212532660300_ref001","doi-asserted-by":"publisher","first-page":"53","DOI":"10.1108\/OIR-02-2018-0065","article-title":"What the fake? Assessing the extent of networked political spamming and bots in the propagation of# fakenews on Twitter","volume":"43","year":"2019","journal-title":"Online Information Review"},{"key":"key2022062212532660300_ref002","first-page":"1","article-title":"Mixmatch: a holistic approach to semi-supervised learning","year":"2019","journal-title":"33rd Conference on Neural Information Processing Systems"},{"key":"key2022062212532660300_ref003","doi-asserted-by":"publisher","first-page":"32","DOI":"10.1016\/j.neucom.2013.05.059","article-title":"A method for resampling imbalanced datasets in binary classification tasks for real-world problems","volume":"135","year":"2014","journal-title":"Neurocomputing"},{"key":"key2022062212532660300_ref004","doi-asserted-by":"publisher","first-page":"817","DOI":"10.1109\/ICDM.2016.0096","article-title":"Debot: Twitter bot detection via warped correlation","year":"2016"},{"key":"key2022062212532660300_ref005","doi-asserted-by":"publisher","first-page":"47","DOI":"10.1145\/3292522.3326030","article-title":"Better safe than sorry: an adversarial approach to improve social bot detection","year":"2019"},{"issue":"6","key":"key2022062212532660300_ref006","doi-asserted-by":"publisher","DOI":"10.1016\/j.ipm.2020.102317","article-title":"SimilCatch: enhanced social spammers detection on Twitter using Markov random fields","volume":"57","year":"2020","journal-title":"Information Processing and Management"},{"key":"key2022062212532660300_ref007","doi-asserted-by":"publisher","first-page":"1","DOI":"10.1109\/PCCC.2016.7820655","article-title":"A support vector machine based naive Bayes algorithm for spam filtering","year":"2016"},{"key":"key2022062212532660300_ref008","doi-asserted-by":"publisher","first-page":"863","DOI":"10.1613\/jair.1.11192","article-title":"SMOTE for learning from imbalanced data: progress and challenges, marking the 15-year anniversary","volume":"61","year":"2018","journal-title":"Journal of Artificial Intelligence Research"},{"issue":"7","key":"key2022062212532660300_ref009","doi-asserted-by":"publisher","first-page":"96","DOI":"10.1145\/2818717","article-title":"The rise of social bots","volume":"59","year":"2016","journal-title":"Communications of the ACM"},{"key":"key2022062212532660300_ref010","doi-asserted-by":"publisher","first-page":"243","DOI":"10.1016\/j.eswa.2018.04.031","article-title":"Safety-aware graph-based semi-supervised learning","volume":"107","year":"2018","journal-title":"Expert Systems with Applications"},{"key":"key2022062212532660300_ref011","doi-asserted-by":"publisher","first-page":"33","DOI":"10.1016\/j.engappai.2014.11.001","article-title":"A combined negative selection algorithm\u2013particle swarm optimization for an email spam detection system","volume":"39","year":"2015","journal-title":"Engineering Applications of Artificial Intelligence"},{"issue":"1","key":"key2022062212532660300_ref012","doi-asserted-by":"publisher","first-page":"1","DOI":"10.1186\/s40537-019-0192-5","article-title":"Survey on deep learning with class imbalance","volume":"6","year":"2019","journal-title":"Journal of Big Data"},{"key":"key2022062212532660300_ref013","doi-asserted-by":"publisher","first-page":"312","DOI":"10.1016\/j.ins.2018.08.019","article-title":"Deep neural networks for bot detection","volume":"467","year":"2018","journal-title":"Information Sciences"},{"issue":"1","key":"key2022062212532660300_ref014","doi-asserted-by":"publisher","first-page":"110","DOI":"10.11772\/j.issn.1001-9081.2017071721","article-title":"Self-training method based on semi-supervised clustering and data editing","volume":"38","year":"2018","journal-title":"Computer Applications"},{"issue":"5","key":"key2022062212532660300_ref015","doi-asserted-by":"publisher","first-page":"465","DOI":"10.14188\/j.1671-8836.2019.05.007","article-title":"Improved naive Bayes self-training algorithm based on weighted K-nearest neighbor","volume":"65","year":"2019","journal-title":"Wuhan University Journal of Natural Sciences"},{"key":"key2022062212532660300_ref016","doi-asserted-by":"publisher","first-page":"1","DOI":"10.1007\/s12652-020-01971-7","article-title":"Divide-and-conquer ensemble self-training method based on probability difference","year":"2020","journal-title":"Journal of Ambient Intelligence and Humanized Computing"},{"issue":"39","key":"key2022062212532660300_ref017","doi-asserted-by":"publisher","first-page":"2822","DOI":"10.11772\/j.issn.1001-9081.2019040606","article-title":"Semi-supervised self-training PU learning based on novel spy technology","volume":"10","year":"2019","journal-title":"Journal of Computer Applications"},{"key":"key2022062212532660300_ref018","doi-asserted-by":"publisher","first-page":"105804","DOI":"10.1016\/j.knosys.2020.105804","article-title":"An effective framework based on local cores for self-labeled semi-supervised classification","year":"2020","journal-title":"Knowledge-Based Systems"},{"key":"key2022062212532660300_ref019","doi-asserted-by":"publisher","first-page":"1","DOI":"10.1109\/ICHI.2019.8904753","article-title":"Combining resampling and machine learning to improve sleep-wake detection of Fitbit wristbands","year":"2019"},{"issue":"2","key":"key2022062212532660300_ref020","doi-asserted-by":"publisher","first-page":"289","DOI":"10.13209\/j.0479-8023.2015.048","article-title":"A weibo bot-users indentification model based on random forest","volume":"52","year":"2015","journal-title":"Acta Scientiarum Naturalium Universitatis Pekinensis"},{"key":"key2022062212532660300_ref021","doi-asserted-by":"publisher","first-page":"45800","DOI":"10.1109\/ACCESS.2019.2904220","article-title":"Contrast pattern-based classification for bot detection on Twitter","volume":"7","year":"2019","journal-title":"IEEE Access"},{"issue":"6","key":"key2022062212532660300_ref022","doi-asserted-by":"publisher","first-page":"3212","DOI":"10.1007\/s10489-020-02014-6","article-title":"A co-training method based on entropy and multi-criteria","volume":"51","year":"2021","journal-title":"Applied Intelligence"},{"issue":"7","key":"key2022062212532660300_ref023","doi-asserted-by":"publisher","first-page":"1805","DOI":"10.16208\/j.issn1000-7024.2016.07.020","article-title":"Clustering-based under-sampling ensemble method for software defect prediction","volume":"37","year":"2016","journal-title":"Computer Engineering and Design"},{"key":"key2022062212532660300_ref024","doi-asserted-by":"publisher","first-page":"533","DOI":"10.1109\/ASONAM.2016.7752287","article-title":"A new approach to bot detection: striking the balance between precision and recall","year":"2016"},{"issue":"1","key":"key2022062212532660300_ref025","doi-asserted-by":"publisher","first-page":"28","DOI":"10.1177\/0165551516677911","article-title":"An ensemble scheme based on language function analysis and feature engineering for text genre classification","volume":"44","year":"2018","journal-title":"Journal of Information Science"},{"key":"key2022062212532660300_ref026","doi-asserted-by":"publisher","first-page":"167","DOI":"10.1007\/978-3-319-33625-1_16","article-title":"Exploring performance of instance selection methods in text sentiment classification","volume-title":"Artificial Intelligence Perspectives in Intelligent Systems","year":"2016"},{"issue":"1","key":"key2022062212532660300_ref027","doi-asserted-by":"publisher","first-page":"25","DOI":"10.1177\/0165551515613226","article-title":"A feature selection model based on genetic rank aggregation for text sentiment classification","volume":"43","year":"2017","journal-title":"Journal of Information Science"},{"issue":"4","key":"key2022062212532660300_ref028","doi-asserted-by":"publisher","first-page":"518","DOI":"10.21609\/jiki.v8i1.280","article-title":"Bot spammer detection in Twitter using tweet similarity and time interval entropy","volume":"105","year":"2015","journal-title":"Journal of Inorganic Biochemistry"},{"issue":"125","key":"key2022062212532660300_ref029","doi-asserted-by":"publisher","first-page":"1","DOI":"10.3389\/fphy.2020.00125","article-title":"Measuring bot and human behavioral dynamics","volume":"8","year":"2020","journal-title":"Frontiers in Physics"},{"issue":"3","key":"key2022062212532660300_ref030","doi-asserted-by":"publisher","first-page":"1","DOI":"10.1002\/widm.1301","article-title":"Hyperparameters and tuning strategies for random forest","volume":"9","year":"2019","journal-title":"Wiley Interdisciplinary Reviews: Data Mining and Knowledge Discovery"},{"issue":"1","key":"key2022062212532660300_ref031","doi-asserted-by":"publisher","first-page":"104","DOI":"10.1007\/s12083-019-00721-7","article-title":"Task offloading in mobile fog computing by classification and regression tree","volume":"13","year":"2020","journal-title":"Peer-to-Peer Networking and Applications"},{"issue":"7","key":"key2022062212532660300_ref032","doi-asserted-by":"publisher","first-page":"1941","DOI":"10.11772\/j.issn.1001-9081.2018010178","article-title":"Anomaly detection based on synthetic minority oversampling technique and deep belief network","volume":"38","year":"2018","journal-title":"Journal of Computer Applications"},{"key":"key2022062212532660300_ref033","doi-asserted-by":"publisher","first-page":"205","DOI":"10.1016\/j.ins.2016.08.077","article-title":"Medical decision support system for extremely imbalanced datasets","volume":"384","year":"2017","journal-title":"Information Sciences"},{"key":"key2022062212532660300_ref034","doi-asserted-by":"publisher","first-page":"3056","DOI":"10.1109\/ICCV.2015.350","article-title":"Tracking-by-segmentation with online gradient boosting decision tree","year":"2015"},{"issue":"6","key":"key2022062212532660300_ref035","doi-asserted-by":"publisher","first-page":"38","DOI":"10.1109\/MC.2016.183","article-title":"The DARPA Twitter bot challenge","volume":"49","year":"2016","journal-title":"Computer"},{"key":"key2022062212532660300_ref036","doi-asserted-by":"publisher","first-page":"179","DOI":"10.1016\/j.knosys.2017.06.023","article-title":"A combination of active learning and self-learning for named entity recognition on Twitter using conditional random fields","volume":"132","year":"2017","journal-title":"Knowledge-Based Systems"},{"key":"key2022062212532660300_ref037","doi-asserted-by":"publisher","first-page":"6540","DOI":"10.1109\/ACCESS.2018.2796018","article-title":"Using machine learning to detect fake identities: bots vs humans","volume":"6","year":"2018","journal-title":"IEEE Access"},{"key":"key2022062212532660300_ref038","doi-asserted-by":"publisher","first-page":"23","DOI":"10.1007\/978-3-319-59424-8_3","article-title":"Genetic algorithms based resampling for the classification of unbalanced datasets","year":"2017"},{"issue":"1","key":"key2022062212532660300_ref039","doi-asserted-by":"crossref","first-page":"280","DOI":"10.1609\/icwsm.v11i1.14871","article-title":"Online human-bot interactions: detection, estimation, and characterization","volume":"11","year":"2017","journal-title":"Proceedings of the International AAAI Conference on Web and Social Media"},{"issue":"4","key":"key2022062212532660300_ref040","doi-asserted-by":"publisher","first-page":"1315","DOI":"10.1109\/TCBB.2017.2712607","article-title":"A self-training subspace clustering algorithm under low-rank representation for cancer classification on gene expression data","volume":"15","year":"2017","journal-title":"IEEE\/ACM Transactions on Computational Biology and Bioinformatics"},{"issue":"30","key":"key2022062212532660300_ref041","doi-asserted-by":"publisher","first-page":"8461","DOI":"10.1364\/AO.56.008461","article-title":"Self-training-based spectral image reconstruction for art paintings with multispectral imaging","volume":"56","year":"2017","journal-title":"Applied Optics"},{"issue":"12","key":"key2022062212532660300_ref042","doi-asserted-by":"publisher","first-page":"5115","DOI":"10.1109\/JSEN.2018.2830743","article-title":"Vision-based human action classification using adaptive boosting algorithm","volume":"18","year":"2018","journal-title":"IEEE Sensors Journal"},{"key":"key2022062212532660300_ref043","first-page":"1476","article-title":"S4l: self-supervised semi-supervised learning","year":"2019"},{"key":"key2022062212532660300_ref044","doi-asserted-by":"publisher","first-page":"37","DOI":"10.1016\/j.neucom.2018.03.069","article-title":"A P-ADMM for sparse quadratic kernel-free least squares semi-supervised support vector machine","volume":"306","year":"2018","journal-title":"Neurocomputing"},{"key":"key2022062212532660300_ref045","doi-asserted-by":"publisher","first-page":"22","DOI":"10.1016\/j.knosys.2014.03.015","article-title":"Binary PSO with mutation operator for feature selection using decision tree applied to spam detection","volume":"64","year":"2014","journal-title":"Knowledge-Based Systems"},{"issue":"11","key":"key2022062212532660300_ref046","doi-asserted-by":"publisher","first-page":"15","DOI":"10.3969\/j.issn.1003-0077.2019.11.002","article-title":"Research progress of event summarization based on social media","volume":"33","year":"2019","journal-title":"Journal of Chinese Information Processing"},{"issue":"1","key":"key2022062212532660300_ref047","doi-asserted-by":"publisher","first-page":"19","DOI":"10.1007\/s10844-013-0254-7","article-title":"Cost-sensitive three-way email spam filtering","volume":"42","year":"2014","journal-title":"Journal of Intelligent Information Systems"}],"container-title":["Data Technologies and Applications"],"original-title":[],"language":"en","link":[{"URL":"https:\/\/www.emerald.com\/insight\/content\/doi\/10.1108\/DTA-07-2021-0196\/full\/xml","content-type":"application\/xml","content-version":"vor","intended-application":"text-mining"},{"URL":"https:\/\/www.emerald.com\/insight\/content\/doi\/10.1108\/DTA-07-2021-0196\/full\/html","content-type":"unspecified","content-version":"vor","intended-application":"similarity-checking"}],"deposited":{"date-parts":[[2025,7,24]],"date-time":"2025-07-24T23:15:15Z","timestamp":1753398915000},"score":1,"resource":{"primary":{"URL":"http:\/\/www.emerald.com\/dta\/article\/56\/3\/409-428\/510003"}},"subtitle":[],"short-title":[],"issued":{"date-parts":[[2021,11,29]]},"references-count":47,"journal-issue":{"issue":"3","published-online":{"date-parts":[[2021,11,29]]},"published-print":{"date-parts":[[2022,6,22]]}},"alternative-id":["10.1108\/DTA-07-2021-0196"],"URL":"https:\/\/doi.org\/10.1108\/dta-07-2021-0196","relation":{},"ISSN":["2514-9288"],"issn-type":[{"value":"2514-9288","type":"print"}],"subject":[],"published":{"date-parts":[[2021,11,29]]}}}