{"status":"ok","message-type":"work","message-version":"1.0.0","message":{"indexed":{"date-parts":[[2026,5,2]],"date-time":"2026-05-02T22:33:13Z","timestamp":1777761193547,"version":"3.51.4"},"reference-count":90,"publisher":"Emerald","issue":"4","content-domain":{"domain":[],"crossmark-restriction":false},"short-container-title":[],"published-print":{"date-parts":[[2014,2,18]]},"abstract":"<jats:p>Numerous voice, still image, audio, and video compression standards have been developed over the last 25 years, and significant advances in the state of the art have been achieved. However, in the more than 50 years since Shannon\u2019s seminal 1959 paper, no rate distortion bounds for voice and video have been forthcoming. In this volume, we present the first rate distortion bounds for voice and video that actually lower bound the operational rate distortion performance of the best-performing voice and video codecs. The bounds indicate that improvements in rate distortion performance of approximately 50% over the best-performing voice and video codecs are possible. 
Research directions to improve the new bounds are discussed.<\/jats:p>","DOI":"10.1561\/0100000061","type":"journal-article","created":{"date-parts":[[2014,2,18]],"date-time":"2014-02-18T08:25:05Z","timestamp":1392711905000},"page":"379-514","source":"Crossref","is-referenced-by-count":7,"title":["Rate Distortion Bounds for Voice and Video"],"prefix":"10.1108","volume":"10","author":[{"given":"Jerry D.","family":"Gibson","sequence":"first","affiliation":[{"name":"University of California , Santa Barbara"}],"role":[{"role":"author","vocabulary":"crossref"}]},{"given":"Jing","family":"Hu","sequence":"additional","affiliation":[{"name":"University of California , Santa Barbara"}],"role":[{"role":"author","vocabulary":"crossref"}]}],"member":"140","published-online":{"date-parts":[[2014,2,18]]},"reference":[{"key":"2026032712270286900_ref001","article-title":"Mandatory Speech Codec speech processing functions; Adaptive Multi-Rate (AMR) speech codec; Transcoding functions","author":"3GPP","year":"2011","journal-title":"TS 26.090, 3rd Generation Partnership Project (3GPP)"},{"key":"2026032712270286900_ref002","article-title":"Speech codec speech processing functions; Adaptive Multi-Rate - Wideband (AMR-WB) speech codec; Transcoding functions","author":"3GPP","year":"2011","journal-title":"TS 26.190, 3rd Generation Partnership Project (3GPP)"},{"key":"2026032712270286900_ref003","doi-asserted-by":"crossref","first-page":"637","DOI":"10.1121\/1.1912679","article-title":"Speech analysis and synthesis by linear prediction of the speech wave","volume":"50","author":"Atal","year":"1971","journal-title":"Journal of the Acoustic Society of America"},{"key":"2026032712270286900_ref004","doi-asserted-by":"crossref","first-page":"1973","DOI":"10.1002\/j.1538-7305.1970.tb04297.x","article-title":"Adaptive predictive coding of speech signals","author":"Atal","year":"1970","journal-title":"The Bell System technical 
journal"},{"key":"2026032712270286900_ref005","doi-asserted-by":"crossref","first-page":"620","DOI":"10.1109\/TSA.2002.804299","article-title":"The adaptive multirate wideband speech codec (AMR-WB)","volume":"10","author":"Bessette","year":"2002","journal-title":"IEEE Trans. on Speech and Audio Processing"},{"key":"2026032712270286900_ref006","volume-title":"Rate Distortion Theory","author":"Berger","year":"1971"},{"issue":"6","key":"2026032712270286900_ref007","doi-asserted-by":"crossref","first-page":"2693","DOI":"10.1109\/18.720552","article-title":"Lossy Source Coding","volume":"44","author":"Berger","year":"1998","journal-title":"IEEE Trans. on Information Theory"},{"key":"2026032712270286900_ref008","volume-title":"Source coding of composite sources","author":"Carter","year":"1984"},{"issue":"1","key":"2026032712270286900_ref009","doi-asserted-by":"crossref","first-page":"246","DOI":"10.1109\/76.554439","article-title":"A new rate control scheme using quadratic rate distortion model","volume":"7","author":"Chiang","year":"1997","journal-title":"IEEE Transactions on Circuits and Systems for Video Technology"},{"key":"2026032712270286900_ref010","volume-title":"Elements of Information Theory","author":"Cover","year":"1991"},{"key":"2026032712270286900_ref011","volume-title":"Elements of information theory","author":"Cover","year":"1991"},{"issue":"10","key":"2026032712270286900_ref012","doi-asserted-by":"crossref","first-page":"106","DOI":"10.1109\/MCOM.2009.5273816","article-title":"Itu-t coders for wideband, superwideband, and fullband speech communication [series editorial]","volume":"47","author":"Cox","year":"2009","journal-title":"Communications Magazine, IEEE"},{"issue":"5","key":"2026032712270286900_ref013","doi-asserted-by":"crossref","first-page":"517","DOI":"10.1109\/TCSVT.2007.894053","article-title":"Rate control for H.264 video with enhanced rate and distortion models","volume":"17","author":"Kwon","year":"2007","journal-title":"IEEE Transactions 
on Circuits and Systems for Video Technology"},{"issue":"4","key":"2026032712270286900_ref014","doi-asserted-by":"crossref","first-page":"655","DOI":"10.1109\/TCOM.1982.1095508","article-title":"Subjective Evaluation of Several Efficient Speech Coders","volume":"30","author":"Daumer","year":"1982","journal-title":"IEEE Trans. on Communications"},{"issue":"7","key":"2026032712270286900_ref015","doi-asserted-by":"crossref","first-page":"800","DOI":"10.1109\/PROC.1972.8779","article-title":"Rate-distortion theory and application","volume":"60","author":"Davisson","year":"1972","journal-title":"Proceedings of the IEEE"},{"key":"2026032712270286900_ref016","first-page":"452","volume-title":"IEEE Global Telecommunications Conference","author":"De","year":"1992"},{"key":"2026032712270286900_ref017","volume-title":"A class of composite sources and their ergodic and information theoretic properties","author":"Fontana","year":"1978"},{"key":"2026032712270286900_ref018","volume-title":"Information Theory and Reliable Communication","author":"Gallager","year":"1968"},{"key":"2026032712270286900_ref019","volume-title":"Communication of composite sources","author":"Garde","year":"1980"},{"issue":"4","key":"2026032712270286900_ref020","doi-asserted-by":"crossref","first-page":"30","DOI":"10.1109\/MCAS.2005.1550167","article-title":"Speech Coding Methods, Standards, and Applications","volume":"5","author":"Gibson","year":"2005","journal-title":"IEEE Circuits and Systems Magazine"},{"key":"2026032712270286900_ref021","volume-title":"Information Theory and Applications Workshop (ITA)","author":"Gibson","year":"2010"},{"key":"2026032712270286900_ref022","volume-title":"Information Theory and Applications Workshop","author":"Gibson","year":"2012"},{"key":"2026032712270286900_ref023","volume-title":"Digital compression for multimedia: principles and 
standards","author":"Gibson","year":"1998"},{"issue":"7","key":"2026032712270286900_ref024","doi-asserted-by":"crossref","first-page":"1140","DOI":"10.1109\/JSAC.1987.1146632","article-title":"The efficiency of motion-compensating prediction for hybrid coding of video sequences","volume":"SAC-5","author":"Girod","year":"1987","journal-title":"IEEE Journal on selected areas in communications"},{"key":"2026032712270286900_ref025","doi-asserted-by":"crossref","first-page":"604","DOI":"10.1109\/26.223785","article-title":"Motion-compensating prediction with fractional-pel accuracy","volume":"41","author":"Girod","year":"1993","journal-title":"IEEE Transactions on Communications"},{"issue":"4","key":"2026032712270286900_ref026","doi-asserted-by":"crossref","first-page":"412","DOI":"10.1109\/TIT.1970.1054470","article-title":"Information rates of autoregressive processes","volume":"16","author":"Gray","year":"1970","journal-title":"IEEE Trans. on Information Theory"},{"issue":"4","key":"2026032712270286900_ref027","doi-asserted-by":"crossref","first-page":"480","DOI":"10.1109\/TIT.1973.1055050","article-title":"A new class of lower bounds to information rates of stationary sources via conditional rate-distortion functions","volume":"IT-19","author":"Gray","year":"1973","journal-title":"IEEE Trans. on Information Theory"},{"issue":"4","key":"2026032712270286900_ref028","doi-asserted-by":"crossref","first-page":"480","DOI":"10.1109\/TIT.1973.1055050","article-title":"A new class of lower bounds to information rates of stationary sources via conditional rate-distortion functions","volume":"IT-19","author":"Gray","year":"1973","journal-title":"IEEE Tran. Inform. 
Theory"},{"key":"2026032712270286900_ref029","first-page":"353","volume-title":"Signal Processing III: Theories and Applications","author":"Brehm","year":"1986"},{"issue":"1","key":"2026032712270286900_ref030","doi-asserted-by":"crossref","first-page":"50","DOI":"10.1109\/TCOM.1971.1090601","article-title":"Image coding by linear transformation and block quantization","volume":"Com-19","author":"Habibi","year":"1971","journal-title":"IEEE Transactions on Communication Technology"},{"key":"2026032712270286900_ref031","doi-asserted-by":"crossref","DOI":"10.1109\/ALLERTON.2008.4797667","article-title":"New rate distortion bounds for natural videos based on a texture dependent correlation model in the spatial-temporal domain","author":"Hu","year":"2008","journal-title":"the 46th Annual Allerton Conference on Communication, Controls, and Computing"},{"key":"2026032712270286900_ref032","volume-title":"Information technology \u2013 generic coding of moving pictures and associated audio information: Systems","author":"ISO\/IEC 13818-1:2000","year":"2000"},{"key":"2026032712270286900_ref033","volume-title":"Information technology \u2013 coding of audio-visual objects \u2013 part 1: Systems","author":"ISO\/IEC 14496-1:2001","year":"2001"},{"key":"2026032712270286900_ref034","article-title":"Video coding for low bit rate communication","author":"ITU Recommendations","year":"2005","journal-title":"ITU-T rec. H.263"},{"key":"2026032712270286900_ref035","volume-title":"Advanced video coding for generic audiovisual services","author":"ITU-T and ISO\/IEC JTC 1","year":"2003"},{"key":"2026032712270286900_ref036","unstructured":"ITU-T and ISO\/IEC JTC 1\n          . H.265 : High efficiency video coding. http:\/\/www.itu.int\/rec\/T-REC-H.265-201304-I, Apr. 
2013."},{"key":"2026032712270286900_ref037","volume-title":"Software tools for speech and audio coding standardization","author":"ITU-T Recommendation G.191","year":"2010"},{"key":"2026032712270286900_ref038","volume-title":"Frame error robust narrow-band and wideband embedded variable bit-rate coding of speech and audio from 8-32 kbit\/s","author":"ITU-T Recommendation G.718","year":"2008"},{"key":"2026032712270286900_ref039","volume-title":"Transmission systems and media, digital systems and networks, Digital terminal equipments-Coding of analogue signals by pulse code modulation","author":"ITU-T Recommendation G.719, Series G","year":"2008"},{"key":"2026032712270286900_ref040","volume-title":"7 kHz Audio-Coding within 64 kbits\/s","author":"ITU-T Recommendation G.722","year":"1988"},{"key":"2026032712270286900_ref041","volume-title":"Low-complexity coding at 24 and 32 kbit\/s for hands-free operation in systems with low frame loss","author":"ITU-T Recommendation G.722.1","year":"2005"},{"key":"2026032712270286900_ref042","volume-title":"Wideband coding of speech at around 16 kbit\/s using Adaptive Multi-Rate Wideband (AMR-WB)","author":"ITU-T Recommendation G.722.2","year":"2003"},{"key":"2026032712270286900_ref043","volume-title":"40, 32, 24, 16 kbit\/s Adaptive Differential Pulse Code Modulation (ADPCM)","author":"ITU-T Recommendation G.726","year":"1990"},{"key":"2026032712270286900_ref044","volume-title":"5-, 4-, 3- and 2-bit\/sample embedded Adaptive Differential Pulse Code Modulation (ADPCM)","author":"ITU-T Recommendation G.727","year":"1990"},{"key":"2026032712270286900_ref045","volume-title":"Coding of speech at 8 kbit\/s using conjugate-structure algebraic-code-excited linear prediction (CS-ACELP)","author":"ITU-T Recommendation G.729","year":"2007"},{"key":"2026032712270286900_ref046","volume-title":"Objective measurement of active speech level","author":"ITU-T Recommendation 
P.56","year":"1993"},{"key":"2026032712270286900_ref047","volume-title":"Subjective performance assessment of telephone-band and wideband digital codecs","author":"ITU-T Recommendation P.830","year":"1996"},{"key":"2026032712270286900_ref048","volume-title":"Perceptual Evaluation of Speech Quality (PESQ), an objective method for end-to-end Speech Quality Assessment of Narrow-band telephone networks and Speech Codecs","author":"ITU-T Recommendation P.862","year":"2001"},{"key":"2026032712270286900_ref049","volume-title":"Perceptual Evaluation of Speech Quality (PESQ), an objective method for end-to-end Speech Quality Assessment of Narrow-band telephone networks and Speech Codecs","author":"ITU-T Recommendation P.862","year":"2001"},{"key":"2026032712270286900_ref050","volume-title":"Wideband extension to Recommendation P.862 for the assessment of wideband telephone networks and speech codecs","author":"ITU-T Recommendation P.862.2","year":"2007"},{"key":"2026032712270286900_ref051","volume-title":"Application guide for objective quality measurement based on Recommendations P.862, P.862.1 and P.862.2","author":"ITU-T Recommendation P.862.3","year":"2007"},{"issue":"6","key":"2026032712270286900_ref052","doi-asserted-by":"crossref","first-page":"697","DOI":"10.1109\/TIT.1977.1055796","article-title":"Coding isotropic images","volume":"IT-23","author":"O\u2019neal","year":"1977","journal-title":"IEEE Transactions on Information Theory"},{"key":"2026032712270286900_ref053","volume-title":"Digital Coding of Waveforms: Principles and Applications to Speech and Video","author":"Jayant","year":"1984"},{"issue":"16","key":"2026032712270286900_ref054","doi-asserted-by":"crossref","first-page":"1916","DOI":"10.1016\/j.comcom.2010.04.019","article-title":"Media coding for the next generation mobile system lte","volume":"33","author":"J\u00e4rvinen","year":"2010","journal-title":"Computer 
Communications"},{"issue":"2","key":"2026032712270286900_ref055","doi-asserted-by":"crossref","first-page":"195","DOI":"10.1016\/0165-1684(89)90050-9","article-title":"Itakura-saito clustering and rate distortion functions for a composite source model of speech","volume":"18","author":"Kalveram","year":"1989","journal-title":"Signal Processing"},{"key":"2026032712270286900_ref056","volume-title":"Signal Processing IV: Theories and Applications","author":"Kalveram","year":"1988"},{"key":"2026032712270286900_ref057","volume-title":"Speech coding and synthesis","author":"Kleijn","year":"1995"},{"issue":"4","key":"2026032712270286900_ref058","doi-asserted-by":"crossref","first-page":"102","DOI":"10.1109\/TIT.1956.1056823","article-title":"On the shannon theory of information transmission in the case of continuous signals","volume":"2","author":"Kolmogorov","year":"1956","journal-title":"IEEE Transactions on Information Theory"},{"key":"2026032712270286900_ref059","doi-asserted-by":"crossref","DOI":"10.1002\/0470870109","volume-title":"Digital Speech: Coding for Low Bit Rate Communication Systems","author":"Kondoz","year":"2004"},{"key":"2026032712270286900_ref060","volume-title":"Digital speech: coding for low bit rate communication systems","author":"Kondoz","year":"2005"},{"issue":"6","key":"2026032712270286900_ref061","doi-asserted-by":"crossref","first-page":"878","DOI":"10.1109\/76.867926","article-title":"Scalable rate control for MPEG-4 video","volume":"10","author":"Lee","year":"2000","journal-title":"IEEE Transactions on Circuits and Systems for Video Technology"},{"issue":"2","key":"2026032712270286900_ref062","doi-asserted-by":"crossref","first-page":"278","DOI":"10.1109\/TMM.2003.822792","article-title":"Providing adaptive QoS to layered video over wireless local area networks through real-time retry limit adaptation","volume":"6","author":"Li","year":"2004","journal-title":"IEEE Transactions on 
Multimedia"},{"issue":"7","key":"2026032712270286900_ref063","doi-asserted-by":"crossref","DOI":"10.1109\/TCSVT.2003.815169","article-title":"Introduction to the special issue on the H.264\/AVC video coding standard","volume":"13","author":"Luthra","year":"2003","journal-title":"IEEE Transactions on Circuits and Systems for Video Technology"},{"key":"2026032712270286900_ref064","first-page":"I\u2013281","article-title":"On the architecture of the cdma2000\u0151 variable-rate multimode wideband (VMR-WB) speech coding standard","author":"Jelinek","year":"2004","journal-title":"Proc. ICASSP"},{"key":"2026032712270286900_ref065","article-title":"Rate control on JVT standard","author":"Ma","year":"2002","journal-title":"Joint Video Team (JVT) of ISO\/IEC MPEG & ITU-T VCEG, JVT-D030"},{"issue":"1","key":"2026032712270286900_ref066","doi-asserted-by":"crossref","first-page":"152","DOI":"10.1109\/MSP.2012.2219672","article-title":"High efficiency video coding: The next frontier in video compression","volume":"30","author":"Ohm","year":"2013","journal-title":"IEEE Signal Processing Magazine"},{"issue":"6","key":"2026032712270286900_ref067","doi-asserted-by":"crossref","first-page":"23","DOI":"10.1109\/79.733495","article-title":"Rate-distortion methods for image and video compression","volume":"15","author":"Ortega","year":"1998","journal-title":"IEEE Signal Processing Magazine"},{"key":"2026032712270286900_ref068","doi-asserted-by":"crossref","first-page":"1105","DOI":"10.1002\/j.1538-7305.1973.tb02007.x","article-title":"Adaptive Quantization in Differential PCM Coding of Speech","volume":"52","author":"Cummiskey","year":"1973","journal-title":"The Bell System technical journal"},{"key":"2026032712270286900_ref069","article-title":"Perceptual criteria for image quality evaluation","author":"Pappas","year":"2000","journal-title":"Handbook of Image & Video Processing (A. 
Bovik eds.), Academic Press"},{"key":"2026032712270286900_ref070","first-page":"213","article-title":"Mutual information between a pair of stationary gaussian random processes","volume":"99","author":"Pinsker","year":"1954","journal-title":"Dokl. Akad. Nauk. USSR"},{"issue":"2","key":"2026032712270286900_ref071","doi-asserted-by":"crossref","first-page":"72","DOI":"10.1109\/MMUL.2013.24","article-title":"MPEG unified speech and audio coding","volume":"20","author":"Quackenbush","year":"2013","journal-title":"IEEE Multimedia"},{"key":"2026032712270286900_ref072","doi-asserted-by":"crossref","first-page":"835","DOI":"10.1109\/TCOM.1983.1095893","article-title":"Distributions of the two-dimensional DCT coefficients for images","volume":"31","author":"Reininger","year":"1983","journal-title":"IEEE Transactions on Communications"},{"key":"2026032712270286900_ref073","article-title":"Phonetically Switched Tree coding of speech with a G.727 Code Generator","author":"Ramadas","year":"2009","journal-title":"the 43rd Annual Asilomar Conference on Signals, Systems, and Computers"},{"issue":"1","key":"2026032712270286900_ref074","doi-asserted-by":"crossref","first-page":"172","DOI":"10.1109\/76.744284","article-title":"Rate control in DCT video coding for low-delay communications","volume":"9","author":"Ribas-Corbera","year":"1999","journal-title":"IEEE Transactions on Circuits and Systems for Video Technology"},{"key":"2026032712270286900_ref075","doi-asserted-by":"crossref","first-page":"379","DOI":"10.1002\/j.1538-7305.1948.tb01338.x","article-title":"A mathematical theory of communication","volume":"27","author":"Shannon","year":"1948","journal-title":"Bell Sys. Tech. Journal"},{"key":"2026032712270286900_ref076","first-page":"142","article-title":"Coding Theorems for a Discrete Source with a Fidelity Criterion","volume":"7","author":"Shannon","year":"1959","journal-title":"IRE Conv. 
Rec"},{"key":"2026032712270286900_ref077","first-page":"142","article-title":"Coding theorems for a discrete source with a fidelity criterion","volume":"4","author":"Shannon","year":"1959","journal-title":"IRE National Convention Record"},{"key":"2026032712270286900_ref078","first-page":"2657","article-title":"Study of DCT coefficient distributions","author":"Smoot","year":"1996","journal-title":"SPIE Symposium on Electronic Imaging, San Jose, CA"},{"issue":"6","key":"2026032712270286900_ref079","doi-asserted-by":"crossref","DOI":"10.1109\/49.848253","article-title":"Analysis of video transmission over lossy channels","volume":"18","author":"Stuhlmuller","year":"2000","journal-title":"IEEE Journal on Selected Areas in Communications"},{"issue":"12","key":"2026032712270286900_ref080","doi-asserted-by":"crossref","first-page":"3690","DOI":"10.1109\/TIP.2006.884921","article-title":"Analysis of superimposed oriented patterns","volume":"15","author":"Aach","year":"2006","journal-title":"IEEE Transactions on Image Processing"},{"key":"2026032712270286900_ref081","article-title":"Rate distortion theory for image and video coding","author":"Tziritas","year":"1995","journal-title":"International Conference on Digital Signal Processing, Cyprus"},{"key":"2026032712270286900_ref082","article-title":"Panel on new perspectives for information theory","author":"Verdu","year":"2011","journal-title":"Information Theory Workshop, Paraty, Brazil"},{"key":"2026032712270286900_ref083","volume-title":"The opus codec","author":"Vos"},{"key":"2026032712270286900_ref084","volume-title":"Speech Coding and Synthesis","author":"Kleijn","year":"1995"},{"key":"2026032712270286900_ref085","volume-title":"Some techniques in universal source coding and coding for composite sources","author":"Wallace","year":"1982"},{"key":"2026032712270286900_ref086","volume-title":"Proceedings, IEEE 
ICASSP","author":"Wang","year":"1992"},{"key":"2026032712270286900_ref087","first-page":"49","volume-title":"Proceedings, IEEE ICASSP","author":"Wang","year":"1989"},{"key":"2026032712270286900_ref088","first-page":"1041","volume-title":"The Handbook of Video Databases: Design and Applications","author":"Wang","year":"2003"},{"key":"2026032712270286900_ref089","doi-asserted-by":"crossref","first-page":"560","DOI":"10.1109\/TCSVT.2003.815165","article-title":"Overview of the H.264\/AVC video coding standard","volume":"13","author":"Wiegand","year":"2003","journal-title":"IEEE Transactions on Circuits and Systems for Video Technology"},{"issue":"5","key":"2026032712270286900_ref090","doi-asserted-by":"crossref","first-page":"530","DOI":"10.1109\/TCSVT.2007.894041","article-title":"Rate-distortion modeling for efficient H.264\/AVC encoding","volume":"17","author":"Tu","year":"2007","journal-title":"IEEE Transactions on Circuits and Systems for Video Technology"}],"container-title":["Foundations and Trends\u00ae in Communications and Information 
Theory"],"original-title":[],"language":"en","link":[{"URL":"https:\/\/www.emerald.com\/ftcit\/article-pdf\/10\/4\/379\/10865209\/0100000061en.pdf","content-type":"application\/pdf","content-version":"vor","intended-application":"syndication"},{"URL":"https:\/\/www.emerald.com\/ftcit\/article-pdf\/10\/4\/379\/10865209\/0100000061en.pdf","content-type":"unspecified","content-version":"vor","intended-application":"similarity-checking"}],"deposited":{"date-parts":[[2026,4,29]],"date-time":"2026-04-29T14:10:37Z","timestamp":1777471837000},"score":1,"resource":{"primary":{"URL":"https:\/\/www.emerald.com\/ftcit\/article\/10\/4\/379\/1319353\/Rate-Distortion-Bounds-for-Voice-and-Video"}},"subtitle":[],"short-title":[],"issued":{"date-parts":[[2014,2,18]]},"references-count":90,"journal-issue":{"issue":"4","published-print":{"date-parts":[[2014,2,18]]}},"URL":"https:\/\/doi.org\/10.1561\/0100000061","relation":{},"ISSN":["1567-2190","1567-2328"],"issn-type":[{"value":"1567-2190","type":"print"},{"value":"1567-2328","type":"electronic"}],"subject":[],"published":{"date-parts":[[2014,2,18]]}}}