{"status":"ok","message-type":"work","message-version":"1.0.0","message":{"indexed":{"date-parts":[[2025,11,28]],"date-time":"2025-11-28T04:30:58Z","timestamp":1764304258103},"reference-count":28,"publisher":"Springer Science and Business Media LLC","issue":"1-2","license":[{"start":{"date-parts":[[2005,5,1]],"date-time":"2005-05-01T00:00:00Z","timestamp":1114905600000},"content-version":"tdm","delay-in-days":0,"URL":"http:\/\/www.springer.com\/tdm"}],"content-domain":{"domain":[],"crossmark-restriction":false},"short-container-title":["Mach Learn"],"published-print":{"date-parts":[[2005,5]]},"DOI":"10.1007\/s10994-005-0462-7","type":"journal-article","created":{"date-parts":[[2005,6,9]],"date-time":"2005-06-09T09:02:36Z","timestamp":1118307756000},"page":"55-76","source":"Crossref","is-referenced-by-count":21,"title":["PAC-Bayesian Compression Bounds on the Prediction Error of Learning Algorithms for Classification"],"prefix":"10.1007","volume":"59","author":[{"given":"Thore","family":"Graepel","sequence":"first","affiliation":[]},{"given":"Ralf","family":"Herbrich","sequence":"additional","affiliation":[]},{"given":"John","family":"Shawe-Taylor","sequence":"additional","affiliation":[]}],"member":"297","reference":[{"key":"462_CR1","doi-asserted-by":"crossref","unstructured":"Bartlett, P. & Shawe-Taylor, J. (1998). Generalization performance of support vector machines and other pattern classifiers. Advances in Kernel Methods\u2014Support Vector Learning (pp. 43\u201354). MIT Press.","DOI":"10.7551\/mitpress\/1130.003.0007"},{"key":"462_CR2","first-page":"335","volume":"2","author":"A. Cannon","year":"2002","unstructured":"Cannon, A., Ettinger, J. M., Hush, D., & Scovel, C. (2002). Machine learning with data dependent hypothesis classes. Journal of Machine Learning Research, 2, 335\u2013358.","journal-title":"Journal of Machine Learning Research"},{"key":"462_CR3","unstructured":"Cesa-Bianchi, N., Conconi, A., & Gentile, C. (2002). 
On the generalization ability of on-line learning algorithms. Advances in Neural Information Processing Systems (vol. 14). Cambridge, MA: MIT Press."},{"key":"462_CR4","first-page":"273","volume":"20","author":"C. Cortes","year":"1995","unstructured":"Cortes, C. & Vapnik, V. (1995). Support vector networks. Machine Learning, 20, 273\u2013297.","journal-title":"Machine Learning"},{"issue":"1","key":"462_CR5","doi-asserted-by":"crossref","first-page":"21","DOI":"10.1109\/TIT.1967.1053964","volume":"13","author":"T. M. Cover","year":"1967","unstructured":"Cover, T. M. & Hart, P. E. (1967). Nearest neighbor pattern classifications. IEEE Transactions on Information Theory, 13:1, 21\u201327.","journal-title":"IEEE Transactions on Information Theory"},{"key":"462_CR6","unstructured":"Cristianini, N. & Shawe-Taylor, J. (2000). An Introduction to Support Vector Machines. Cambridge, UK: Cambridge University Press."},{"key":"462_CR7","first-page":"1","volume":"27","author":"S. Floyd","year":"1995","unstructured":"Floyd, S. & Warmuth, M. (1995). Sample compression, learnability, and the Vapnik Chervonenkis dimension. Machine Learning, 27, 1\u201336.","journal-title":"Machine Learning"},{"key":"462_CR8","unstructured":"Graepel, T., Herbrich, R., & Shawe-Taylor, J. (2000). Generalisation error bounds for sparse linear classifiers. Proceedings of the Annual Conference on Computational Learning Theory (pp. 298\u2013303)."},{"key":"462_CR9","doi-asserted-by":"crossref","unstructured":"Herbrich, R. (2001). Learning Kernel Classifiers: Theory and Algorithms. MIT Press.","DOI":"10.7551\/mitpress\/4170.001.0001"},{"issue":"12","key":"462_CR10","doi-asserted-by":"crossref","first-page":"3140","DOI":"10.1109\/TIT.2002.805090","volume":"48","author":"R. Herbrich","year":"2002","unstructured":"Herbrich, R. & Graepel, T. (2002). A PAC-Bayesian margin bound for linear classifiers. 
IEEE Transactions on Information Theory, 48:12, 3140\u20133150.","journal-title":"IEEE Transactions on Information Theory"},{"key":"462_CR11","first-page":"175","volume":"3","author":"R. Herbrich","year":"2002","unstructured":"Herbrich, R. & Williamson, R. C. (2002). Algorithmic luckiness. Journal of Machine Learning Research, 3, 175\u2013212.","journal-title":"Journal of Machine Learning Research"},{"key":"462_CR12","doi-asserted-by":"crossref","first-page":"13","DOI":"10.1080\/01621459.1963.10500830","volume":"58","author":"W. Hoeffding","year":"1963","unstructured":"Hoeffding, W. (1963). Probability inequalities for sums of bounded random variables. Journal of the American Statistical Association, 58, 13\u201330.","journal-title":"Journal of the American Statistical Association"},{"key":"462_CR13","unstructured":"Langford, J. & Shawe-Taylor, J. (2003). PAC-Bayes and margins. Advances in Neural Information Processing Systems 15 (pp. 439\u2013446). Cambridge, MA: MIT Press."},{"key":"462_CR14","first-page":"285","volume":"2","author":"N. Littlestone","year":"1988","unstructured":"Littlestone, N. (1988). Learning quickly when irrelevant attributes abound: A new linear-threshold algorithm. Machine Learning, 2, 285\u2013318.","journal-title":"Machine Learning"},{"key":"462_CR15","doi-asserted-by":"crossref","unstructured":"Littlestone, N. (1989). From on-line to batch learning. Proceedings of the Second Annual Conference on Computational Learning Theory (pp. 269\u2013284).","DOI":"10.1016\/B978-0-08-094829-4.50022-2"},{"key":"462_CR16","unstructured":"Littlestone, N. & Warmuth, M. (1986). Relating data compression and learnability. Technical Report, University of California, Santa Cruz."},{"key":"462_CR17","unstructured":"Marchand, M. & Shawe-Taylor, J. (2001). Learning with the set covering machine. Proceedings of the Eighteenth International Conference on Machine Learning (ICML\u20192001) (pp. 345\u2013352). 
San Francisco, CA: Morgan Kaufmann."},{"key":"462_CR18","doi-asserted-by":"crossref","unstructured":"McAllester, D. A. (1998). Some PAC Bayesian theorems. Proceedings of the Annual Conference on Computational Learning Theory (pp. 230\u2013234). Madison, Wisconsin: ACM Press.","DOI":"10.1145\/279943.279989"},{"key":"462_CR19","doi-asserted-by":"crossref","unstructured":"McAllester, D. A. (1999). PAC-Bayesian model averaging. Proceedings of the Annual Conference on Computational Learning Theory (pp. 164\u2013170). Santa Cruz, USA.","DOI":"10.1145\/307400.307435"},{"key":"462_CR20","doi-asserted-by":"crossref","first-page":"465","DOI":"10.1016\/0005-1098(78)90005-5","volume":"14","author":"J. Rissanen","year":"1978","unstructured":"Rissanen, J. (1978). Modeling by shortest data description. Automatica, 14, 465\u2013471.","journal-title":"Automatica"},{"key":"462_CR21","unstructured":"Rosenblatt, F. (1962). Principles of Neurodynamics: Perceptron and Theory of Brain Mechanisms. Washington D.C.: Spartan\u2013Books"},{"issue":"5","key":"462_CR22","doi-asserted-by":"crossref","first-page":"1926","DOI":"10.1109\/18.705570","volume":"44","author":"J. Shawe-Taylor","year":"1998","unstructured":"Shawe-Taylor, J., Bartlett, P. L., Williamson, R. C., & Anthony, M. (1998). Structural risk minimization over data-dependent hierarchies. IEEE Transactions on Information Theory, 44:5, 1926\u20131940.","journal-title":"IEEE Transactions on Information Theory"},{"key":"462_CR23","doi-asserted-by":"crossref","unstructured":"Shawe-Taylor, J. & Williamson, R. C. (1997). A PAC analysis of a Bayesian estimator. Technical Report, Royal Holloway, University of London, NC2-TR-1997-013.","DOI":"10.1145\/267460.267466"},{"key":"462_CR24","first-page":"211","volume":"1","author":"M. Tipping","year":"2001","unstructured":"Tipping, M. (2001). Sparse Bayesian learning and the relevance vector machine. 
Journal of Machine Learning Research, 1, 211\u2013244.","journal-title":"Journal of Machine Learning Research"},{"key":"462_CR25","unstructured":"Vapnik, V. (1998). Statistical Learning Theory. New York: John Wiley and Sons."},{"key":"462_CR26","doi-asserted-by":"crossref","unstructured":"Vit\u00e1nyi, P. & Li, M. (1997). On prediction by data compression. Proceedings of the European Conference on Machine Learning (pp. 14\u201330).","DOI":"10.1007\/3-540-62858-4_69"},{"key":"462_CR27","unstructured":"Warmuth, M. (2003). Open problems: Compressing to VC dimension many points. Proceedings of the Annual Conference on Computational Learning Theory."},{"issue":"6","key":"462_CR28","first-page":"415","volume":"4","author":"A. D. Wyner","year":"1992","unstructured":"Wyner, A. D., Ziv, J., & Wyner, A. J. (1992). On the role of pattern matching in information theory. IEEE Transactions on Information Theory, 4:6, 415\u2013447.","journal-title":"IEEE Transactions on Information Theory"}],"container-title":["Machine 
Learning"],"original-title":[],"language":"en","link":[{"URL":"http:\/\/link.springer.com\/content\/pdf\/10.1007\/s10994-005-0462-7.pdf","content-type":"application\/pdf","content-version":"vor","intended-application":"text-mining"},{"URL":"http:\/\/link.springer.com\/article\/10.1007\/s10994-005-0462-7\/fulltext.html","content-type":"text\/html","content-version":"vor","intended-application":"text-mining"},{"URL":"http:\/\/link.springer.com\/content\/pdf\/10.1007\/s10994-005-0462-7","content-type":"unspecified","content-version":"vor","intended-application":"similarity-checking"}],"deposited":{"date-parts":[[2024,1,26]],"date-time":"2024-01-26T13:20:47Z","timestamp":1706275247000},"score":1,"resource":{"primary":{"URL":"http:\/\/link.springer.com\/10.1007\/s10994-005-0462-7"}},"subtitle":[],"short-title":[],"issued":{"date-parts":[[2005,5]]},"references-count":28,"journal-issue":{"issue":"1-2","published-print":{"date-parts":[[2005,5]]}},"alternative-id":["462"],"URL":"https:\/\/doi.org\/10.1007\/s10994-005-0462-7","relation":{},"ISSN":["0885-6125","1573-0565"],"issn-type":[{"type":"print","value":"0885-6125"},{"type":"electronic","value":"1573-0565"}],"subject":[],"published":{"date-parts":[[2005,5]]}}}