{"status":"ok","message-type":"work","message-version":"1.0.0","message":{"indexed":{"date-parts":[[2025,9,28]],"date-time":"2025-09-28T20:26:44Z","timestamp":1759091204845,"version":"3.41.0"},"reference-count":36,"publisher":"Association for Computing Machinery (ACM)","issue":"1","license":[{"start":{"date-parts":[[2019,12,13]],"date-time":"2019-12-13T00:00:00Z","timestamp":1576195200000},"content-version":"vor","delay-in-days":0,"URL":"https:\/\/www.acm.org\/publications\/policies\/copyright_policy#Background"}],"content-domain":{"domain":["dl.acm.org"],"crossmark-restriction":true},"short-container-title":["ACM Trans. Knowl. Discov. Data"],"published-print":{"date-parts":[[2020,2,29]]},"abstract":"<jats:p>We examine a new form of smooth approximation to the zero one loss in which learning is performed using a reformulation of the widely used logistic function. Our approach is based on using the posterior mean of a novel generalized Beta-Bernoulli formulation. This leads to a generalized logistic function that approximates the zero one loss, but retains a probabilistic formulation conferring a number of useful properties. The approach is easily generalized to kernel logistic regression and easily integrated into methods for structured prediction. We present experiments in which we learn such models using an optimization method consisting of a combination of gradient descent and coordinate descent using localized grid search so as to escape from local minima. Our experiments indicate that optimization quality is improved when learning metaparameters are themselves optimized using a validation set. Our experiments show improved performance relative to widely used logistic and hinge loss methods on a wide variety of problems ranging from standard UC Irvine and libSVM evaluation datasets to product review predictions and a visual information extraction task. 
We observe that the approach is as follows: (1) more robust to outliers compared to the logistic and hinge losses; (2) outperforms comparable logistic and max margin models on larger scale benchmark problems; (3) when combined with Gaussian\u2013Laplacian mixture prior on parameters the kernelized version of our formulation yields sparser solutions than Support Vector Machine classifiers; and (4) when integrated into a probabilistic structured prediction technique our approach provides more accurate probabilities yielding improved inference and increasing information extraction performance.<\/jats:p>","DOI":"10.1145\/3365672","type":"journal-article","created":{"date-parts":[[2019,12,13]],"date-time":"2019-12-13T14:08:57Z","timestamp":1576246137000},"page":"1-28","update-policy":"https:\/\/doi.org\/10.1145\/crossmark-policy","source":"Crossref","is-referenced-by-count":2,"title":["A New Smooth Approximation to the Zero One Loss with a Probabilistic Interpretation"],"prefix":"10.1145","volume":"14","author":[{"given":"Md Kamrul","family":"Hasan","sequence":"first","affiliation":[{"name":"\u00c9cole Polytechnique Montr\u00e9al, QC, Canada"}]},{"given":"Christopher","family":"Pal","sequence":"additional","affiliation":[{"name":"Mila, \u00c9cole Polytechnique Montr\u00e9al, QC, Canada"}]}],"member":"320","published-online":{"date-parts":[[2019,12,13]]},"reference":[{"key":"e_1_2_1_1_1","unstructured":"K. Bache and M. Lichman. 2013. UCI Machine Learning Repository. Retrieved from http:\/\/archive.ics.uci.edu\/ml."},{"key":"e_1_2_1_2_1","doi-asserted-by":"publisher","DOI":"10.1145\/1143844.1143870"},{"key":"e_1_2_1_3_1","volume-title":"Proceedings of the 30th International Conference on Machine Learning (ICML\u201913)","author":"Cotter Andrew","year":"2013","unstructured":"Andrew Cotter, Shai Shalev-Shwartz, and Nati Srebro. 2013. 
Learning optimally sparse support vector machines. In Proceedings of the 30th International Conference on Machine Learning (ICML\u201913). 266--274."},{"key":"e_1_2_1_4_1","doi-asserted-by":"publisher","DOI":"10.1109\/TNNLS.2015.2468069"},{"key":"e_1_2_1_5_1","volume-title":"Proceedings of Neural Information Processing Systems (NIPS\u201908)","author":"Do Chuong B.","year":"2008","unstructured":"Chuong B. Do, Quoc Le, Choon Hui Teo, Olivier Chapelle, and Alex Smola. 2008. Tighter bounds for structured estimation. In Proceedings of Neural Information Processing Systems (NIPS\u201908)."},{"volume-title":"Proceedings of the 25th International Conference on Machine Learning. ACM","author":"Dredze M.","key":"e_1_2_1_6_1","unstructured":"M. Dredze, K. Crammer, and F. Pereira. 2008. Confidence-weighted linear classification. In Proceedings of the 25th International Conference on Machine Learning. ACM, New York, NY, 264--271."},{"key":"e_1_2_1_7_1","doi-asserted-by":"publisher","DOI":"10.1109\/TPAMI.2010.109"},{"key":"e_1_2_1_8_1","doi-asserted-by":"publisher","DOI":"10.1137\/120865094"},{"volume-title":"Proceedings of the 2012 Conference of the North American Chapter of the Association for Computational Linguistics: Human Language Technologies. 
Association for Computational Linguistics, 221--231","author":"Gimpel Kevin","key":"e_1_2_1_9_1","unstructured":"Kevin Gimpel and Noah A. Smith. 2012. Structured ramp loss minimization for machine translation. In Proceedings of the 2012 Conference of the North American Chapter of the Association for Computational Linguistics: Human Language Technologies. Association for Computational Linguistics, 221--231."},{"volume-title":"Proceedings of the AAAI Conference on Artificial Intelligence.","key":"e_1_2_1_10_1","unstructured":"Md. Kamrul Hasan and Chris Pal. 2014. Experiments on visual information extraction with the faces of Wikipedia. 
In Proceedings of the AAAI Conference on Artificial Intelligence."},{"key":"e_1_2_1_11_1","doi-asserted-by":"publisher","DOI":"10.1145\/1273496.1273539"},{"key":"e_1_2_1_12_1","doi-asserted-by":"publisher","DOI":"10.2307\/1910129"},{"key":"e_1_2_1_13_1","doi-asserted-by":"publisher","DOI":"10.1109\/TPAMI.2015.2456899"},{"key":"e_1_2_1_14_1","doi-asserted-by":"publisher","DOI":"10.5555\/3122009.3176825"},{"key":"e_1_2_1_15_1","doi-asserted-by":"publisher","DOI":"10.5555\/3122009.3176838"},{"key":"e_1_2_1_16_1","doi-asserted-by":"publisher","DOI":"10.1214\/aos\/1079120129"},{"key":"e_1_2_1_17_1","doi-asserted-by":"publisher","DOI":"10.1287\/opre.43.4.570"},{"key":"e_1_2_1_18_1","doi-asserted-by":"publisher","DOI":"10.5555\/3009657.3009730"},{"key":"e_1_2_1_19_1","doi-asserted-by":"publisher","DOI":"10.1117\/1.2819119"},{"key":"e_1_2_1_20_1","volume-title":"Proceedings of the 30th International Conference on Machine Learning (ICML\u201913)","author":"Nguyen Tan","year":"2013","unstructured":"Tan Nguyen and Scott Sanner. 2013. Algorithms for direct 0--1 loss optimization in binary classification. In Proceedings of the 30th International Conference on Machine Learning (ICML\u201913). 1085--1093."},{"key":"e_1_2_1_21_1","doi-asserted-by":"publisher","DOI":"10.1109\/TNN.2003.809399"},{"key":"e_1_2_1_22_1","doi-asserted-by":"publisher","DOI":"10.1109\/TPAMI.2011.149"},{"key":"e_1_2_1_23_1","doi-asserted-by":"publisher","DOI":"10.1198\/016214503000000639"},{"key":"e_1_2_1_24_1","doi-asserted-by":"publisher","DOI":"10.5555\/945365.964289"},{"key":"e_1_2_1_25_1","volume-title":"Williamson","author":"Rooyen Brendan Van","year":"2015","unstructured":"Brendan Van Rooyen, Aditya Menon, and Robert C. Williamson. 2015. 
Learning with symmetric label noise: The importance of being unhinged. In Proceedings of the Advances in Neural Information Processing Systems. 10--18."},{"volume-title":"The Nature of Statistical Learning Theory","author":"Vapnik Vladimir","key":"e_1_2_1_26_1","unstructured":"Vladimir Vapnik. 2000. The Nature of Statistical Learning Theory. Springer."},{"key":"e_1_2_1_27_1","unstructured":"Pascal Vincent. 2004. Mod\u00e8les \u00e0 noyaux \u00e0 structure locale. Citeseer."},{"key":"e_1_2_1_28_1","doi-asserted-by":"publisher","DOI":"10.1023\/A:1013955821559"},{"key":"e_1_2_1_29_1","doi-asserted-by":"publisher","DOI":"10.4108\/icst.collaboratecom.2012.250689"},{"key":"e_1_2_1_30_1","doi-asserted-by":"publisher","DOI":"10.1198\/016214507000000617"},{"key":"e_1_2_1_31_1","doi-asserted-by":"publisher","DOI":"10.1145\/2956556"},{"key":"e_1_2_1_32_1","doi-asserted-by":"publisher","DOI":"10.1162\/08997660360581958"},{"key":"e_1_2_1_33_1","unstructured":"Jingwei Zhang, Tongliang Liu, and Dacheng Tao. 2018. On the rates of convergence from surrogate risk minimizers to the Bayes optimal classifier. arXiv:1802.03688."},{"key":"e_1_2_1_34_1","doi-asserted-by":"publisher","DOI":"10.1023\/A:1011441423217"},{"key":"e_1_2_1_35_1","first-page":"1","article-title":"Smoothing multivariate performance measures","volume":"10","author":"Zhang Xinhua","year":"2011","unstructured":"Xinhua Zhang, Ankan Saha, and S. V. N. Vishwanathan. 2011. 
Smoothing multivariate performance measures. Journal of Machine Learning Research 10 (2011), 1--55.","journal-title":"Journal of Machine Learning Research"},{"key":"e_1_2_1_36_1","doi-asserted-by":"publisher","DOI":"10.1111\/j.1467-9868.2005.00503.x"}],"container-title":["ACM Transactions on Knowledge Discovery from Data"],"original-title":[],"language":"en","link":[{"URL":"https:\/\/dl.acm.org\/doi\/10.1145\/3365672","content-type":"unspecified","content-version":"vor","intended-application":"text-mining"},{"URL":"https:\/\/dl.acm.org\/doi\/pdf\/10.1145\/3365672","content-type":"unspecified","content-version":"vor","intended-application":"similarity-checking"}],"deposited":{"date-parts":[[2025,6,17]],"date-time":"2025-06-17T23:23:36Z","timestamp":1750202616000},"score":1,"resource":{"primary":{"URL":"https:\/\/dl.acm.org\/doi\/10.1145\/3365672"}},"subtitle":[],"short-title":[],"issued":{"date-parts":[[2019,12,13]]},"references-count":36,"journal-issue":{"issue":"1","published-print":{"date-parts":[[2020,2,29]]}},"alternative-id":["10.1145\/3365672"],"URL":"https:\/\/doi.org\/10.1145\/3365672","relation":{},"ISSN":["1556-4681","1556-472X"],"issn-type":[{"type":"print","value":"1556-4681"},{"type":"electronic","value":"1556-472X"}],"subject":[],"published":{"date-parts":[[2019,12,13]]},"assertion":[{"value":"2017-10-01","order":0,"name":"received","label":"Received","group":{"name":"publication_history","label":"Publication History"}},{"value":"2019-07-01","order":1,"name":"accepted","label":"Accepted","group":{"name":"publication_history","label":"Publication History"}},{"value":"2019-12-13","order":2,"name":"published","label":"Published","group":{"name":"publication_history","label":"Publication History"}}]}}