{"status":"ok","message-type":"work","message-version":"1.0.0","message":{"indexed":{"date-parts":[[2026,1,18]],"date-time":"2026-01-18T02:30:13Z","timestamp":1768703413282,"version":"3.49.0"},"reference-count":59,"publisher":"Association for Computing Machinery (ACM)","issue":"2","license":[{"start":{"date-parts":[[2019,4,18]],"date-time":"2019-04-18T00:00:00Z","timestamp":1555545600000},"content-version":"vor","delay-in-days":0,"URL":"https:\/\/www.acm.org\/publications\/policies\/copyright_policy#Background"}],"funder":[{"name":"the Microsoft Research Ph.D. program"},{"name":"the Google Faculty Research Awards program"},{"name":"the Criteo Faculty Research Award program"},{"name":"the Netherlands Institute for Sound and Vision, and the Netherlands Organisation for Scientific Research","award":["652.002.001, 612.001.551, and 652.001.003"],"award-info":[{"award-number":["652.002.001, 612.001.551, and 652.001.003"]}]},{"name":"Science and Technology Innovation Committee Foundation of Shenzhen","award":["ZDSYS201703031748284"],"award-info":[{"award-number":["ZDSYS201703031748284"]}]},{"name":"European Community's Seventh Framework Programme","award":["312827 (VOX-Pol)"],"award-info":[{"award-number":["312827 (VOX-Pol)"]}]},{"DOI":"10.13039\/501100001809","name":"National Natural Science Foundation of China","doi-asserted-by":"crossref","award":["91846111 and 91746209"],"award-info":[{"award-number":["91846111 and 91746209"]}],"id":[{"id":"10.13039\/501100001809","id-type":"DOI","asserted-by":"crossref"}]},{"name":"Ahold Delhaize, Amsterdam Data Science, and the Bloomberg Research Grant program"},{"name":"the China Scholarship Council"}],"content-domain":{"domain":["dl.acm.org"],"crossmark-restriction":true},"short-container-title":["ACM Trans. Knowl. Discov. 
Data"],"published-print":{"date-parts":[[2019,4,30]]},"abstract":"<jats:p>\n            Sparse Bayesian learning is a state-of-the-art supervised learning algorithm that can choose a subset of relevant samples from the input data and make reliable probabilistic predictions. However, in the presence of high-dimensional data with irrelevant features, traditional sparse Bayesian classifiers suffer from performance degradation and low efficiency due to the incapability of eliminating irrelevant features. To tackle this problem, we propose a novel sparse Bayesian embedded feature selection algorithm that adopts truncated Gaussian distributions as both sample and feature priors. The proposed algorithm, called probabilistic feature selection and classification vector machine (PFCVM\n            <jats:sub>LP<\/jats:sub>\n            ) is able to simultaneously select relevant features and samples for classification tasks. In order to derive the analytical solutions, Laplace approximation is applied to compute approximate posteriors and marginal likelihoods. Finally, parameters and hyperparameters are optimized by the type-II maximum likelihood method. Experiments on three datasets validate the performance of PFCVM\n            <jats:sub>LP<\/jats:sub>\n            along two dimensions: classification performance and effectiveness for feature selection. Finally, we analyze the generalization performance and derive a generalization error bound for PFCVM\n            <jats:sub>LP<\/jats:sub>\n            . 
By tightening the bound, the importance of feature selection is demonstrated.\n          <\/jats:p>","DOI":"10.1145\/3309541","type":"journal-article","created":{"date-parts":[[2019,4,19]],"date-time":"2019-04-19T16:56:23Z","timestamp":1555692983000},"page":"1-27","update-policy":"https:\/\/doi.org\/10.1145\/crossmark-policy","source":"Crossref","is-referenced-by-count":36,"title":["Probabilistic Feature Selection and Classification Vector Machine"],"prefix":"10.1145","volume":"13","author":[{"ORCID":"https:\/\/orcid.org\/0000-0003-2217-6202","authenticated-orcid":false,"given":"Bingbing","family":"Jiang","sequence":"first","affiliation":[{"name":"University of Science and Technology of China, Hefei, Anhui, China"}],"role":[{"role":"author","vocabulary":"crossref"}]},{"given":"Chang","family":"Li","sequence":"additional","affiliation":[{"name":"University of Amsterdam, Amsterdam, the Netherlands"}],"role":[{"role":"author","vocabulary":"crossref"}]},{"given":"Maarten De","family":"Rijke","sequence":"additional","affiliation":[{"name":"University of Amsterdam, Amsterdam, the Netherlands"}],"role":[{"role":"author","vocabulary":"crossref"}]},{"given":"Xin","family":"Yao","sequence":"additional","affiliation":[{"name":"Southern University of Science and Technology, Shenzhen, Guangdong, China"}],"role":[{"role":"author","vocabulary":"crossref"}]},{"ORCID":"https:\/\/orcid.org\/0000-0002-3918-384X","authenticated-orcid":false,"given":"Huanhuan","family":"Chen","sequence":"additional","affiliation":[{"name":"University of Science and Technology of China, Hefei, Anhui, China"}],"role":[{"role":"author","vocabulary":"crossref"}]}],"member":"320","published-online":{"date-parts":[[2019,4,18]]},"reference":[{"key":"e_1_2_1_1_1","doi-asserted-by":"publisher","DOI":"10.1073\/pnas.96.12.6745"},{"key":"e_1_2_1_2_1","volume-title":"Adaptive Control Processes: A Guided Tour","author":"Bellman Richard Ernest","unstructured":"Richard Ernest Bellman . 1961. 
Adaptive Control Processes: A Guided Tour . Princeton University Press , Princeton . Richard Ernest Bellman. 1961. Adaptive Control Processes: A Guided Tour. Princeton University Press, Princeton."},{"key":"e_1_2_1_3_1","volume-title":"Statistical Decision Theory and Bayesian Analysis (2nd. ed.)","author":"Berger James O.","unstructured":"James O. Berger . 1985. Statistical Decision Theory and Bayesian Analysis (2nd. ed.) . Springer . James O. Berger. 1985. Statistical Decision Theory and Bayesian Analysis (2nd. ed.). Springer."},{"key":"e_1_2_1_4_1","volume-title":"Pattern Recognition and Machine Learning","author":"Bishop Christopher M.","unstructured":"Christopher M. Bishop . 2006. Pattern Recognition and Machine Learning . Springer , New York . Christopher M. Bishop. 2006. Pattern Recognition and Machine Learning. Springer, New York."},{"key":"e_1_2_1_5_1","volume-title":"Proceedings of the 15th International Conference on Machine Learning.","volume":"98","author":"Paul","unstructured":"Paul S. Bradley and Olvi L. Mangasarian. 1998. Feature selection via concave minimization and support vector machines . In Proceedings of the 15th International Conference on Machine Learning. Vol. 98 , 82--90. Paul S. Bradley and Olvi L. Mangasarian. 1998. Feature selection via concave minimization and support vector machines. In Proceedings of the 15th International Conference on Machine Learning. Vol. 
98, 82--90."},{"key":"e_1_2_1_6_1","doi-asserted-by":"publisher","DOI":"10.1145\/1014052.1014063"},{"key":"e_1_2_1_7_1","doi-asserted-by":"publisher","DOI":"10.1145\/1961189.1961199"},{"key":"e_1_2_1_8_1","doi-asserted-by":"publisher","DOI":"10.1109\/TNN.2009.2014161"},{"key":"e_1_2_1_9_1","doi-asserted-by":"publisher","DOI":"10.1109\/TNNLS.2013.2275077"},{"key":"e_1_2_1_11_1","doi-asserted-by":"publisher","DOI":"10.1023\/A:1022627411411"},{"key":"e_1_2_1_12_1","doi-asserted-by":"publisher","DOI":"10.5555\/1248547.1248548"},{"key":"e_1_2_1_13_1","doi-asserted-by":"publisher","DOI":"10.1109\/NER.2013.6695876"},{"key":"e_1_2_1_14_1","volume-title":"Stork","author":"Duda Richard O.","year":"2012","unstructured":"Richard O. Duda , Peter E. Hart , and David G . Stork . 2012 . Pattern Classification. John Wiley & Sons. Richard O. Duda, Peter E. Hart, and David G. Stork. 2012. Pattern Classification. John Wiley & Sons."},{"key":"e_1_2_1_15_1","volume-title":"Proceedings of the 16th Conference on Neural Information Processing Systems. 383--390","author":"Anita","unstructured":"Anita C. Faul and Michael E. Tipping. 2002. Analysis of sparse Bayesian learning . In Proceedings of the 16th Conference on Neural Information Processing Systems. 383--390 . Anita C. Faul and Michael E. Tipping. 2002. Analysis of sparse Bayesian learning. In Proceedings of the 16th Conference on Neural Information Processing Systems. 383--390."},{"key":"e_1_2_1_16_1","doi-asserted-by":"crossref","unstructured":"T. R. Golub D. K. Slonim P. Tamayo C. Huard M. Gaasenbeek J. P. Mesirov H. Coller M. L. Loh J. R. Downing M. A. Caligiuri C. D. Bloomfield and E. S. Lander. 1999. Molecular classification of cancer: Class discovery and class prediction by gene expression monitoring. Science 286 5439 (1999) 531--537.  T. R. Golub D. K. Slonim P. Tamayo C. Huard M. Gaasenbeek J. P. Mesirov H. Coller M. L. Loh J. R. Downing M. A. Caligiuri C. D. Bloomfield and E. S. Lander. 1999. 
Molecular classification of cancer: Class discovery and class prediction by gene expression monitoring. Science 286 5439 (1999) 531--537.","DOI":"10.1126\/science.286.5439.531"},{"key":"e_1_2_1_17_1","volume-title":"CVX: Matlab software for disciplined convex programming.","author":"Grant Michael","year":"2008","unstructured":"Michael Grant , Stephen Boyd , and Yinyu Ye . 2008 . CVX: Matlab software for disciplined convex programming. Retrieved from http:\/\/cvxr.com\/cvx\/doc\/install.html. Michael Grant, Stephen Boyd, and Yinyu Ye. 2008. CVX: Matlab software for disciplined convex programming. Retrieved from http:\/\/cvxr.com\/cvx\/doc\/install.html."},{"key":"e_1_2_1_18_1","doi-asserted-by":"publisher","DOI":"10.1109\/TNNLS.2016.2544779"},{"key":"e_1_2_1_19_1","doi-asserted-by":"publisher","DOI":"10.1016\/j.ins.2014.08.048"},{"key":"e_1_2_1_20_1","volume-title":"Proceedings of the 18th Advances in Neural Information Processing Systems. 507--514","author":"He Xiaofei","year":"2005","unstructured":"Xiaofei He , Deng Cai , and Partha Niyogi . 2005 . Laplacian score for feature selection . In Proceedings of the 18th Advances in Neural Information Processing Systems. 507--514 . Xiaofei He, Deng Cai, and Partha Niyogi. 2005. Laplacian score for feature selection. In Proceedings of the 18th Advances in Neural Information Processing Systems. 507--514."},{"key":"e_1_2_1_21_1","article-title":"The minimum error minimax probability machine","author":"Huang Kaizhu","year":"2004","unstructured":"Kaizhu Huang , Haiqin Yang , Irwin King , Michael R Lyu , and Laiwan Chan . 2004 . The minimum error minimax probability machine . Journal of Machine Learning Research 5 , ( Oct. 2004), 1253--1286. Kaizhu Huang, Haiqin Yang, Irwin King, Michael R Lyu, and Laiwan Chan. 2004. The minimum error minimax probability machine. Journal of Machine Learning Research 5, (Oct. 
2004), 1253--1286.","journal-title":"Journal of Machine Learning Research 5"},{"key":"e_1_2_1_22_1","doi-asserted-by":"publisher","DOI":"10.1109\/TKDE.2017.2749574"},{"key":"e_1_2_1_23_1","doi-asserted-by":"publisher","DOI":"10.1007\/s10115-006-0040-8"},{"key":"e_1_2_1_24_1","doi-asserted-by":"publisher","DOI":"10.1109\/TPAMI.2005.127"},{"key":"e_1_2_1_25_1","doi-asserted-by":"publisher","DOI":"10.1145\/640075.640097"},{"key":"e_1_2_1_26_1","doi-asserted-by":"publisher","DOI":"10.1109\/TPAMI.2004.55"},{"key":"e_1_2_1_27_1","doi-asserted-by":"publisher","DOI":"10.1109\/TNN.2007.891191"},{"key":"e_1_2_1_28_1","doi-asserted-by":"publisher","DOI":"10.1162\/153244303321897726"},{"key":"e_1_2_1_29_1","doi-asserted-by":"publisher","DOI":"10.1109\/CIBD.2014.7011521"},{"key":"e_1_2_1_30_1","doi-asserted-by":"publisher","DOI":"10.1016\/j.neunet.2018.07.015"},{"key":"e_1_2_1_31_1","doi-asserted-by":"publisher","DOI":"10.1093\/bioinformatics\/18.10.1332"},{"key":"e_1_2_1_32_1","doi-asserted-by":"publisher","DOI":"10.1162\/neco.1992.4.3.415"},{"key":"e_1_2_1_33_1","doi-asserted-by":"publisher","DOI":"10.5555\/945365.964297"},{"key":"e_1_2_1_34_1","doi-asserted-by":"publisher","DOI":"10.1016\/j.patcog.2016.06.028"},{"key":"e_1_2_1_35_1","doi-asserted-by":"publisher","DOI":"10.1109\/TCYB.2013.2260736"},{"key":"e_1_2_1_36_1","unstructured":"D. J. Newman S. Hettich C. L. Blake and C. J. Merz. 1998. UCI repository of machine learning databases. Retrieved from http:\/\/www.ics.uci.edu\/ mlearn\/MLRepository.html.  D. J. Newman S. Hettich C. L. Blake and C. J. Merz. 1998. UCI repository of machine learning databases. Retrieved from http:\/\/www.ics.uci.edu\/ mlearn\/MLRepository.html."},{"key":"e_1_2_1_37_1","doi-asserted-by":"publisher","DOI":"10.1016\/j.patcog.2009.09.003"},{"key":"e_1_2_1_38_1","volume-title":"Proceedings of the Advances in Neural Information Processing Systems. 1813--1821","author":"Nie Feiping","unstructured":"Feiping Nie , Heng Huang , Xiao Cai , and Chris H. 
Ding . 2010. Efficient and robust feature selection via joint L2, 1-norms minimization . In Proceedings of the Advances in Neural Information Processing Systems. 1813--1821 . Feiping Nie, Heng Huang, Xiao Cai, and Chris H. Ding. 2010. Efficient and robust feature selection via joint L2, 1-norms minimization. In Proceedings of the Advances in Neural Information Processing Systems. 1813--1821."},{"key":"e_1_2_1_39_1","volume-title":"Proceedings of the 23rd AAAI Conference on Artificial Intelligence.","volume":"2","author":"Nie Feiping","year":"2008","unstructured":"Feiping Nie , Shiming Xiang , Yangqing Jia , Changshui Zhang , and Shuicheng Yan . 2008 . Trace ratio criterion for feature selection . In Proceedings of the 23rd AAAI Conference on Artificial Intelligence. Vol. 2 , 671--676. Feiping Nie, Shiming Xiang, Yangqing Jia, Changshui Zhang, and Shuicheng Yan. 2008. Trace ratio criterion for feature selection. In Proceedings of the 23rd AAAI Conference on Artificial Intelligence. Vol. 2, 671--676."},{"key":"e_1_2_1_40_1","doi-asserted-by":"publisher","DOI":"10.1007\/978-3-319-46227-1_28"},{"key":"e_1_2_1_41_1","doi-asserted-by":"publisher","DOI":"10.1109\/TPAMI.2005.159"},{"key":"e_1_2_1_42_1","doi-asserted-by":"publisher","DOI":"10.1109\/TNN.2010.2064787"},{"key":"e_1_2_1_43_1","doi-asserted-by":"publisher","DOI":"10.1093\/bioinformatics\/btg308"},{"key":"e_1_2_1_44_1","doi-asserted-by":"publisher","DOI":"10.5555\/2831071.2831072"},{"key":"e_1_2_1_45_1","doi-asserted-by":"publisher","DOI":"10.1162\/15324430152748236"},{"key":"e_1_2_1_46_1","doi-asserted-by":"publisher","DOI":"10.1007\/978-1-4757-3264-1"},{"key":"e_1_2_1_47_1","first-page":"360","article-title":"Understanding interobserver agreement: The kappa statistic","volume":"37","author":"Viera Anthony J.","year":"2005","unstructured":"Anthony J. Viera and Joanne M. Garrett . 2005 . Understanding interobserver agreement: The kappa statistic . Family Medicine 37 , 5 (2005), 360 -- 363 . Anthony J. 
Viera and Joanne M. Garrett. 2005. Understanding interobserver agreement: The kappa statistic. Family Medicine 37, 5 (2005), 360--363.","journal-title":"Family Medicine"},{"key":"e_1_2_1_48_1","doi-asserted-by":"publisher","DOI":"10.1109\/TKDE.2016.2616305"},{"key":"e_1_2_1_49_1","doi-asserted-by":"publisher","DOI":"10.1109\/TKDE.2015.2441716"},{"key":"e_1_2_1_50_1","doi-asserted-by":"publisher","DOI":"10.1073\/pnas.201162998"},{"key":"e_1_2_1_51_1","volume-title":"Proceedings of the Advances in Neural Information Processing Systems. 668--674","author":"Weston Jason","year":"2000","unstructured":"Jason Weston , Sayan Mukherjee , Olivier Chapelle , Massimiliano Pontil , Tomaso A. Poggio , and Vladimir Vapnik . 2000 . Feature selection for SVMs . In Proceedings of the Advances in Neural Information Processing Systems. 668--674 . Jason Weston, Sayan Mukherjee, Olivier Chapelle, Massimiliano Pontil, Tomaso A. Poggio, and Vladimir Vapnik. 2000. Feature selection for SVMs. In Proceedings of the Advances in Neural Information Processing Systems. 668--674."},{"key":"e_1_2_1_52_1","doi-asserted-by":"publisher","DOI":"10.1093\/bib\/bbm007"},{"key":"e_1_2_1_53_1","doi-asserted-by":"publisher","DOI":"10.1109\/TPAMI.2012.197"},{"key":"e_1_2_1_54_1","volume-title":"Proceedings of the 27th International Conference on Machine Learning. 1159--1166","author":"Wu Xindong","year":"2010","unstructured":"Xindong Wu , Kui Yu , Hao Wang , and Wei Ding . 2010 . Online streaming feature selection . In Proceedings of the 27th International Conference on Machine Learning. 1159--1166 . Xindong Wu, Kui Yu, Hao Wang, and Wei Ding. 2010. Online streaming feature selection. In Proceedings of the 27th International Conference on Machine Learning. 
1159--1166."},{"key":"e_1_2_1_55_1","doi-asserted-by":"publisher","DOI":"10.1145\/3070646"},{"key":"e_1_2_1_56_1","doi-asserted-by":"publisher","DOI":"10.1145\/2499907.2499909"},{"key":"e_1_2_1_57_1","doi-asserted-by":"publisher","DOI":"10.1145\/2700409"},{"key":"e_1_2_1_58_1","doi-asserted-by":"publisher","DOI":"10.1145\/2976744"},{"key":"e_1_2_1_59_1","doi-asserted-by":"publisher","DOI":"10.1109\/TAMD.2015.2431497"},{"key":"e_1_2_1_60_1","doi-asserted-by":"publisher","DOI":"10.1109\/TAFFC.2017.2712143"}],"container-title":["ACM Transactions on Knowledge Discovery from Data"],"original-title":[],"language":"en","link":[{"URL":"https:\/\/dl.acm.org\/doi\/10.1145\/3309541","content-type":"unspecified","content-version":"vor","intended-application":"text-mining"},{"URL":"https:\/\/dl.acm.org\/doi\/pdf\/10.1145\/3309541","content-type":"unspecified","content-version":"vor","intended-application":"similarity-checking"}],"deposited":{"date-parts":[[2025,6,18]],"date-time":"2025-06-18T17:49:22Z","timestamp":1750268962000},"score":1,"resource":{"primary":{"URL":"https:\/\/dl.acm.org\/doi\/10.1145\/3309541"}},"subtitle":[],"short-title":[],"issued":{"date-parts":[[2019,4,18]]},"references-count":59,"journal-issue":{"issue":"2","published-print":{"date-parts":[[2019,4,30]]}},"alternative-id":["10.1145\/3309541"],"URL":"https:\/\/doi.org\/10.1145\/3309541","relation":{},"ISSN":["1556-4681","1556-472X"],"issn-type":[{"value":"1556-4681","type":"print"},{"value":"1556-472X","type":"electronic"}],"subject":[],"published":{"date-parts":[[2019,4,18]]},"assertion":[{"value":"2017-11-01","order":0,"name":"received","label":"Received","group":{"name":"publication_history","label":"Publication History"}},{"value":"2019-01-01","order":1,"name":"accepted","label":"Accepted","group":{"name":"publication_history","label":"Publication History"}},{"value":"2019-04-18","order":2,"name":"published","label":"Published","group":{"name":"publication_history","label":"Publication History"}}]}}