{"status":"ok","message-type":"work","message-version":"1.0.0","message":{"indexed":{"date-parts":[[2026,2,25]],"date-time":"2026-02-25T13:25:14Z","timestamp":1772025914573,"version":"3.50.1"},"reference-count":39,"publisher":"Emerald","issue":"4","license":[{"start":{"date-parts":[[2022,1,6]],"date-time":"2022-01-06T00:00:00Z","timestamp":1641427200000},"content-version":"tdm","delay-in-days":0,"URL":"https:\/\/www.emerald.com\/insight\/site-policies"}],"content-domain":{"domain":[],"crossmark-restriction":false},"short-container-title":["DTA"],"published-print":{"date-parts":[[2022,8,23]]},"abstract":"<jats:sec><jats:title content-type=\"abstract-subheading\">Purpose<\/jats:title><jats:p>The problem of choosing the utmost useful features from hundreds of features from time-series user click data arises in online advertising toward fraudulent publisher's classification. Selecting feature subsets is a key issue in such classification tasks. Practically, the use of filter approaches is common; however, they neglect the correlations amid features. Conversely, wrapper approaches could not be applied due to their complexities. Moreover, in particular, existing feature selection methods could not handle such data, which is one of the major causes of instability of feature selection.<\/jats:p><\/jats:sec><jats:sec><jats:title content-type=\"abstract-subheading\">Design\/methodology\/approach<\/jats:title><jats:p>To overcome such issues, a majority voting-based hybrid feature selection method, namely feature distillation and accumulated selection (FDAS), is proposed to investigate the optimal subset of relevant features for analyzing the publisher's fraudulent conduct. FDAS works in two phases: (1) feature distillation, where significant features from standard filter and wrapper feature selection methods are obtained using majority voting; (2) accumulated selection, where we enumerated an accumulated evaluation of relevant feature subset to search for an optimal feature subset using effective machine learning (ML) models.<\/jats:p><\/jats:sec><jats:sec><jats:title content-type=\"abstract-subheading\">Findings<\/jats:title><jats:p>Empirical results prove enhanced classification performance with proposed features in average precision, recall, f1-score and AUC in publisher identification and classification.<\/jats:p><\/jats:sec><jats:sec><jats:title content-type=\"abstract-subheading\">Originality\/value<\/jats:title><jats:p>The FDAS is evaluated on FDMA2012 user-click data and nine other benchmark datasets to gauge its generalizing characteristics, first, considering original features, second, with relevant feature subsets selected by feature selection (FS) methods, third, with optimal feature subset obtained by the proposed approach. ANOVA significance test is conducted to demonstrate significant differences between independent features.<\/jats:p><\/jats:sec>","DOI":"10.1108\/dta-09-2021-0233","type":"journal-article","created":{"date-parts":[[2022,1,5]],"date-time":"2022-01-05T07:07:15Z","timestamp":1641366435000},"page":"602-625","source":"Crossref","is-referenced-by-count":9,"title":["Feature distillation and accumulated selection for automated fraudulent publisher classification from user click data of online advertising"],"prefix":"10.1108","volume":"56","author":[{"ORCID":"https:\/\/orcid.org\/0000-0002-0338-1270","authenticated-orcid":false,"given":"Deepti","family":"Sisodia","sequence":"first","affiliation":[]},{"ORCID":"https:\/\/orcid.org\/0000-0001-9845-290X","authenticated-orcid":false,"given":"Dilip Singh","family":"Sisodia","sequence":"additional","affiliation":[]}],"member":"140","published-online":{"date-parts":[[2022,1,6]]},"reference":[{"key":"key2022082217163101000_ref001","first-page":"255","article-title":"KEEL data-mining software tool: data set repository, integration of algorithms and experimental analysis framework","volume-title":"Journal of Multiple-Valued Logic and Soft Computing","year":"2011"},{"issue":"1","key":"key2022082217163101000_ref002","doi-asserted-by":"publisher","first-page":"717","DOI":"10.1007\/s12652-020-02054-3","article-title":"Consensus and majority vote feature selection methods and a detection technique for web phishing","volume":"12","year":"2021","journal-title":"Journal of Ambient Intelligence and Humanized Computing"},{"key":"key2022082217163101000_ref003","first-page":"1","article-title":"Random forests for the detection of click fraud in online mobile advertising","year":"2012"},{"issue":"2","key":"key2022082217163101000_ref004","doi-asserted-by":"publisher","first-page":"477","DOI":"10.1007\/s10115-015-0827-6","article-title":"Learning from automatically labeled data: case study on click fraud prediction","volume":"46","year":"2016","journal-title":"Knowledge and Information Systems"},{"issue":"1","key":"key2022082217163101000_ref005","doi-asserted-by":"publisher","first-page":"5","DOI":"10.1023\/A:1010933404324","article-title":"Random forests","volume":"45","year":"2001","journal-title":"Machine Learning"},{"issue":"16","key":"key2022082217163101000_ref006","doi-asserted-by":"publisher","first-page":"6241","DOI":"10.1016\/j.eswa.2013.05.051","article-title":"Feature subset selection filter-wrapper based on low quality data","volume":"40","year":"2013","journal-title":"Expert Systems with Applications"},{"issue":"1","key":"key2022082217163101000_ref007","doi-asserted-by":"publisher","first-page":"16","DOI":"10.1016\/j.compeleceng.2013.11.024","article-title":"A survey on feature selection methods","volume":"40","year":"2014","journal-title":"Computers and Electrical Engineering"},{"key":"key2022082217163101000_ref008","unstructured":"Documentation \u2013 SciPy.org (n.d), available at: https:\/\/www.scipy.org\/docs.html (accessed 22 August 2020)."},{"issue":"3","key":"key2022082217163101000_ref009","doi-asserted-by":"publisher","first-page":"166","DOI":"10.1002\/bs.3830190303","article-title":"Simple voting systems and majority rule","volume":"19","year":"1974","journal-title":"Behavioral Science"},{"issue":"14","key":"key2022082217163101000_ref010","doi-asserted-by":"publisher","first-page":"6371","DOI":"10.1016\/j.eswa.2014.04.019","article-title":"MIFS-ND: a mutual information-based feature selection method","volume":"41","year":"2014","journal-title":"Expert Systems with Applications"},{"issue":"2","key":"key2022082217163101000_ref011","doi-asserted-by":"publisher","first-page":"1157","DOI":"10.1016\/j.aca.2011.07.027","article-title":"An introduction to variable and feature selection","volume":"3","year":"2003","journal-title":"Journal of Machine Learning Research: JMLR"},{"issue":"1-2","key":"key2022082217163101000_ref012","doi-asserted-by":"publisher","first-page":"273","DOI":"10.1016\/S0004-3702(97)00043-X","article-title":"Wrappers for feature subset selection","volume":"97","year":"1997","journal-title":"Artificial Intelligence"},{"issue":"6","key":"key2022082217163101000_ref013","first-page":"1","article-title":"Feature selection: a data perspective","volume":"50","year":"2017","journal-title":"ACM Computing Surveys (CSUR)"},{"key":"key2022082217163101000_ref014","doi-asserted-by":"publisher","first-page":"1","DOI":"10.1016\/j.patrec.2017.03.018","article-title":"A new feature selection method based on a validity index of feature subset","volume":"92","year":"2017","journal-title":"Pattern Recognition Letters"},{"key":"key2022082217163101000_ref015","unstructured":"NumPy Reference \u2013 NumPy v1.19 Manual (n.d), available at: https:\/\/numpy.org\/doc\/stable\/reference\/ (accessed 22 August 2020)."},{"issue":"1","key":"key2022082217163101000_ref016","doi-asserted-by":"publisher","first-page":"99","DOI":"10.1145\/2623330.2623718","article-title":"Detecting click fraud in online advertising: a data mining approach","volume":"15","year":"2014","journal-title":"The Journal of Machine Learning Research"},{"issue":"1","key":"key2022082217163101000_ref017","doi-asserted-by":"publisher","first-page":"1","DOI":"10.1186\/s13638-016-0623-3","article-title":"Ensemble-based multi-filter feature selection method for DDoS detection in cloud computing","volume":"2016","year":"2016","journal-title":"EURASIP Journal on Wireless Communications and Networking"},{"key":"key2022082217163101000_ref018","unstructured":"Pandas Documentation \u2013 Pandas 1.1.1 Documentation (n.d), available at: https:\/\/pandas.pydata.org\/docs\/ (accessed 22 August 2020)."},{"key":"key2022082217163101000_ref019","doi-asserted-by":"publisher","first-page":"370","DOI":"10.1007\/978-3-319-03844-5_38","article-title":"A novel ensemble learning-based approach for click fraud detection in mobile advertising","year":"2013"},{"key":"key2022082217163101000_ref020","first-page":"1","article-title":"Feature engineering for click fraud detection","year":"2012"},{"key":"key2022082217163101000_ref021","doi-asserted-by":"publisher","first-page":"1","DOI":"10.1016\/j.inffus.2018.09.013","article-title":"Machine learning algorithms for wireless sensor networks: a survey","volume":"49","year":"2019","journal-title":"Information Fusion"},{"issue":"4","key":"key2022082217163101000_ref022","first-page":"705","article-title":"Logistic regression diagnostics","volume":"9","year":"1981","journal-title":"Annals of Statistics"},{"key":"key2022082217163101000_ref023","doi-asserted-by":"publisher","first-page":"165","DOI":"10.1007\/0-387-25465-X","article-title":"Decision trees","volume-title":"Data Mining and Knowledge Discovery","year":"2005"},{"key":"key2022082217163101000_ref024","unstructured":"Scikit-Learn: Machine Learning in Python \u2013 Scikit-Learn 0.23.2 Documentation (n.d), available at: https:\/\/scikit-learn.org\/stable\/ (accessed 22 August 2020)."},{"issue":"2","key":"key2022082217163101000_ref025","doi-asserted-by":"publisher","first-page":"216","DOI":"10.1108\/DTA-04-2020-0093","article-title":"Gradient boosting learning for fraudulent publisher detection in online advertising","volume":"55","year":"2020","journal-title":"Data Technologies and Applications"},{"key":"key2022082217163101000_ref026","doi-asserted-by":"publisher","first-page":"1","DOI":"10.1080\/02564602.2021.1915892","article-title":"Data sampling strategies for click fraud detection using imbalanced user click data of online advertising: an empirical review","year":"2021","journal-title":"IETE Technical Review"},{"key":"key2022082217163101000_ref027","doi-asserted-by":"publisher","first-page":"1","DOI":"10.1016\/J.JESTCH.2021.05.015","article-title":"Quad division prototype selection-based k-nearest neighbor classifier for click fraud detection from highly skewed user click dataset","year":"2021","journal-title":"Engineering Science and Technology: An International Journal"},{"key":"key2022082217163101000_ref028","doi-asserted-by":"publisher","first-page":"2747","DOI":"10.1109\/ICPCSI.2017.8392219","article-title":"Performance evaluation of class balancing techniques for credit card fraud detection","year":"2018"},{"key":"key2022082217163101000_ref029","doi-asserted-by":"publisher","first-page":"162","DOI":"10.1109\/IC3.2015.7346672","article-title":"Prediction of click frauds in mobile advertising","year":"2015"},{"issue":"1","key":"key2022082217163101000_ref030","doi-asserted-by":"publisher","first-page":"168","DOI":"10.1016\/j.aci.2018.08.003","article-title":"Classification assessment methods","volume":"17","year":"2021","journal-title":"Applied Computing and Informatics"},{"key":"key2022082217163101000_ref031","doi-asserted-by":"publisher","DOI":"10.1016\/j.mlwa.2020.100016","article-title":"A hybrid and effective learning approach for click fraud detection","volume":"3","year":"2021","journal-title":"Machine Learning with Applications"},{"key":"key2022082217163101000_ref032","doi-asserted-by":"publisher","first-page":"189","DOI":"10.1016\/j.jbi.2018.07.014","article-title":"Relief-based feature selection: introduction and review","volume":"85","year":"2018","journal-title":"Journal of Biomedical Informatics"},{"key":"key2022082217163101000_ref033","first-page":"90","article-title":"Data mining approach to filter click-spam in mobile ad networks","year":"2015"},{"key":"key2022082217163101000_ref034","first-page":"1","article-title":"Hybrid models for click fraud detection in mobile advertising","year":"2012"},{"issue":"9","key":"key2022082217163101000_ref035","doi-asserted-by":"publisher","first-page":"2839","DOI":"10.1016\/j.patcog.2015.03.009","article-title":"Performance evaluation of classification algorithms by k-fold and leave-one-out cross validation","volume":"48","year":"2015","journal-title":"Pattern Recognition"},{"key":"key2022082217163101000_ref036","first-page":"419","article-title":"Click fraud detection on the advertiser side","year":"2014"},{"key":"key2022082217163101000_ref037","doi-asserted-by":"publisher","first-page":"429","DOI":"10.1109\/ICMLA.2007.35","article-title":"Enhanced recursive feature elimination Xue-Wen","year":"2007"},{"key":"key2022082217163101000_ref038","first-page":"412","article-title":"A comparative study on feature selection in text categorization","year":"1997"},{"key":"key2022082217163101000_ref039","doi-asserted-by":"publisher","first-page":"1360","DOI":"10.1109\/CompComm.2018.8780941","article-title":"A click fraud detection scheme based on cost sensitive BPNN and ABC in mobile advertising","year":"2018"}],"container-title":["Data Technologies and Applications"],"original-title":[],"language":"en","link":[{"URL":"https:\/\/www.emerald.com\/insight\/content\/doi\/10.1108\/DTA-09-2021-0233\/full\/xml","content-type":"application\/xml","content-version":"vor","intended-application":"text-mining"},{"URL":"https:\/\/www.emerald.com\/insight\/content\/doi\/10.1108\/DTA-09-2021-0233\/full\/html","content-type":"unspecified","content-version":"vor","intended-application":"similarity-checking"}],"deposited":{"date-parts":[[2025,7,24]],"date-time":"2025-07-24T23:15:23Z","timestamp":1753398923000},"score":1,"resource":{"primary":{"URL":"http:\/\/www.emerald.com\/dta\/article\/56\/4\/602-625\/45034"}},"subtitle":[],"short-title":[],"issued":{"date-parts":[[2022,1,6]]},"references-count":39,"journal-issue":{"issue":"4","published-online":{"date-parts":[[2022,1,6]]},"published-print":{"date-parts":[[2022,8,23]]}},"alternative-id":["10.1108\/DTA-09-2021-0233"],"URL":"https:\/\/doi.org\/10.1108\/dta-09-2021-0233","relation":{},"ISSN":["2514-9288"],"issn-type":[{"value":"2514-9288","type":"print"}],"subject":[],"published":{"date-parts":[[2022,1,6]]}}}