{"status":"ok","message-type":"work","message-version":"1.0.0","message":{"indexed":{"date-parts":[[2026,4,13]],"date-time":"2026-04-13T18:11:16Z","timestamp":1776103876907,"version":"3.50.1"},"reference-count":28,"publisher":"Association for Computing Machinery (ACM)","issue":"4","license":[{"start":{"date-parts":[[2014,7,27]],"date-time":"2014-07-27T00:00:00Z","timestamp":1406419200000},"content-version":"vor","delay-in-days":0,"URL":"https:\/\/www.acm.org\/publications\/policies\/copyright_policy#Background"}],"funder":[{"DOI":"10.13039\/100007458","name":"Qatar Foundation","doi-asserted-by":"publisher","id":[{"id":"10.13039\/100007458","id-type":"DOI","asserted-by":"publisher"}]},{"DOI":"10.13039\/100000001","name":"National Science Foundation","doi-asserted-by":"publisher","award":["CGV-1111415, 1122374"],"award-info":[{"award-number":["CGV-1111415, 1122374"]}],"id":[{"id":"10.13039\/100000001","id-type":"DOI","asserted-by":"publisher"}]},{"DOI":"10.13039\/100006919","name":"Massachusetts Institute of Technology","doi-asserted-by":"publisher","id":[{"id":"10.13039\/100006919","id-type":"DOI","asserted-by":"publisher"}]},{"DOI":"10.13039\/100006112","name":"Microsoft Research","doi-asserted-by":"publisher","id":[{"id":"10.13039\/100006112","id-type":"DOI","asserted-by":"publisher"}]}],"content-domain":{"domain":["dl.acm.org"],"crossmark-restriction":true},"short-container-title":["ACM Trans. Graph."],"published-print":{"date-parts":[[2014,7,27]]},"abstract":"<jats:p>When sound hits an object, it causes small vibrations of the object's surface. We show how, using only high-speed video of the object, we can extract those minute vibrations and partially recover the sound that produced them, allowing us to turn everyday objects---a glass of water, a potted plant, a box of tissues, or a bag of chips---into visual microphones. We recover sounds from high-speed footage of a variety of objects with different properties, and use both real and simulated data to examine some of the factors that affect our ability to visually recover sound. We evaluate the quality of recovered sounds using intelligibility and SNR metrics and provide input and recovered audio samples for direct comparison. We also explore how to leverage the rolling shutter in regular consumer cameras to recover audio from standard frame-rate videos, and use the spatial resolution of our method to visualize how sound-related vibrations vary over an object's surface, which we can use to recover the vibration modes of an object.<\/jats:p>","DOI":"10.1145\/2601097.2601119","type":"journal-article","created":{"date-parts":[[2014,7,22]],"date-time":"2014-07-22T15:08:20Z","timestamp":1406041700000},"page":"1-10","update-policy":"https:\/\/doi.org\/10.1145\/crossmark-policy","source":"Crossref","is-referenced-by-count":310,"title":["The visual microphone"],"prefix":"10.1145","volume":"33","author":[{"given":"Abe","family":"Davis","sequence":"first","affiliation":[{"name":"MIT CSAIL"}]},{"given":"Michael","family":"Rubinstein","sequence":"additional","affiliation":[{"name":"Microsoft Research and MIT CSAIL"}]},{"given":"Neal","family":"Wadhwa","sequence":"additional","affiliation":[{"name":"MIT CSAIL"}]},{"given":"Gautham J.","family":"Mysore","sequence":"additional","affiliation":[{"name":"Adobe Research"}]},{"given":"Fr\u00e9do","family":"Durand","sequence":"additional","affiliation":[{"name":"MIT CSAIL"}]},{"given":"William T.","family":"Freeman","sequence":"additional","affiliation":[{"name":"MIT CSAIL"}]}],"member":"320","published-online":{"date-parts":[[2014,7,27]]},"reference":[{"key":"e_1_2_2_1_1","doi-asserted-by":"publisher","DOI":"10.1145\/1015706.1015718"},{"key":"e_1_2_2_2_1","first-page":"113","article-title":"Suppression of acoustic noise in speech using spectral subtraction. Acoustics, Speech and Signal Processing","volume":"27","author":"Boll S.","year":"1979","unstructured":"Boll , S. 1979 . Suppression of acoustic noise in speech using spectral subtraction. Acoustics, Speech and Signal Processing , IEEE Transactions on 27 , 2, 113 -- 120 . Boll, S. 1979. Suppression of acoustic noise in speech using spectral subtraction. Acoustics, Speech and Signal Processing, IEEE Transactions on 27, 2, 113--120.","journal-title":"IEEE Transactions on"},{"key":"e_1_2_2_3_1","volume-title":"Proceedings of the 32nd International Modal Analysis Conference (to appear).","author":"Chen J.","unstructured":"Chen , J. , Wadhwa , N. , Cha , Y.-J. , Durand , F. , Freeman , W. T. , and Buyukozturk , O . 2014. Structural modal identification through high speed camera video: Motion magnification . Proceedings of the 32nd International Modal Analysis Conference (to appear). Chen, J., Wadhwa, N., Cha, Y.-J., Durand, F., Freeman, W. T., and Buyukozturk, O. 2014. Structural modal identification through high speed camera video: Motion magnification. Proceedings of the 32nd International Modal Analysis Conference (to appear)."},{"key":"e_1_2_2_4_1","doi-asserted-by":"publisher","DOI":"10.1121\/1.1458024"},{"key":"e_1_2_2_5_1","doi-asserted-by":"publisher","DOI":"10.1109\/TPAMI.2012.218"},{"key":"e_1_2_2_6_1","volume-title":"Proc. DARPA Workshop on speech recognition, 93--99","author":"Fisher W. M.","unstructured":"Fisher , W. M. , Doddington , G. R. , and Goudie-Marshall , K. M . 1986. The darpa speech recognition research database: specifications and status . In Proc. DARPA Workshop on speech recognition, 93--99 . Fisher, W. M., Doddington, G. R., and Goudie-Marshall, K. M. 1986. The darpa speech recognition research database: specifications and status. In Proc. DARPA Workshop on speech recognition, 93--99."},{"key":"e_1_2_2_7_1","doi-asserted-by":"publisher","DOI":"10.1109\/TNN.2002.1031944"},{"key":"e_1_2_2_8_1","volume-title":"Computational Photography (ICCP), 2012 IEEE International Conference on, IEEE, 1--8.","author":"Grundmann M.","unstructured":"Grundmann , M. , Kwatra , V. , Castro , D. , and Essa , I . 2012. Calibration-free rolling shutter removal . In Computational Photography (ICCP), 2012 IEEE International Conference on, IEEE, 1--8. Grundmann, M., Kwatra, V., Castro, D., and Essa, I. 2012. Calibration-free rolling shutter removal. In Computational Photography (ICCP), 2012 IEEE International Conference on, IEEE, 1--8."},{"key":"e_1_2_2_9_1","first-page":"2819","article-title":"An effective quality evaluation protocol for speech enhancement algorithms","volume":"7","author":"Hansen J. H.","year":"1998","unstructured":"Hansen , J. H. , and Pellom , B. L. 1998 . An effective quality evaluation protocol for speech enhancement algorithms . In ICSLP , vol. 7 , 2819 -- 2822 . Hansen, J. H., and Pellom, B. L. 1998. An effective quality evaluation protocol for speech enhancement algorithms. In ICSLP, vol. 7, 2819--2822.","journal-title":"ICSLP"},{"key":"e_1_2_2_10_1","first-page":"317","article-title":"Adaptive interpolation of discrete-time signals that can be modeled as autoregressive processes. Acoustics, Speech and Signal Processing","volume":"34","author":"Janssen A.","year":"1986","unstructured":"Janssen , A. , Veldhuis , R. , and Vries , L. 1986 . Adaptive interpolation of discrete-time signals that can be modeled as autoregressive processes. Acoustics, Speech and Signal Processing , IEEE Transactions on 34 , 2, 317 -- 330 . Janssen, A., Veldhuis, R., and Vries, L. 1986. Adaptive interpolation of discrete-time signals that can be modeled as autoregressive processes. Acoustics, Speech and Signal Processing, IEEE Transactions on 34, 2, 317--330.","journal-title":"IEEE Transactions on"},{"key":"e_1_2_2_11_1","doi-asserted-by":"crossref","unstructured":"Jansson E. Molin N.-E. and Sundin H. 1970. Resonances of a violin body studied by hologram interferometry and acoustical methods. Physica scripta 2 6 243.  Jansson E. Molin N.-E. and Sundin H. 1970. Resonances of a violin body studied by hologram interferometry and acoustical methods. Physica scripta 2 6 243.","DOI":"10.1088\/0031-8949\/2\/6\/002"},{"key":"e_1_2_2_12_1","doi-asserted-by":"publisher","DOI":"10.1145\/1073204.1073223"},{"key":"e_1_2_2_13_1","doi-asserted-by":"publisher","DOI":"10.1109\/TPAMI.2004.1262177"},{"key":"e_1_2_2_14_1","doi-asserted-by":"publisher","DOI":"10.1145\/1531326.1531350"},{"key":"e_1_2_2_15_1","doi-asserted-by":"publisher","DOI":"10.4028\/www.scientific.net\/KEM.347.239"},{"key":"e_1_2_2_16_1","volume-title":"Image sensors and signal processing for digital still cameras","author":"Nakamura J.","unstructured":"Nakamura , J. 2005. Image sensors and signal processing for digital still cameras . CRC Press . Nakamura, J. 2005. Image sensors and signal processing for digital still cameras. CRC Press."},{"key":"e_1_2_2_17_1","doi-asserted-by":"publisher","DOI":"10.1109\/ICCV.2005.13"},{"key":"e_1_2_2_18_1","doi-asserted-by":"publisher","DOI":"10.1023\/A:1026553619983"},{"key":"e_1_2_2_19_1","doi-asserted-by":"publisher","DOI":"10.1016\/j.jvcir.2007.04.002"},{"key":"e_1_2_2_20_1","unstructured":"Quackenbush S. R. Barnwell T. P. and Clements M. A. 1988. Objective measures of speech quality. Prentice Hall Englewood Cliffs NJ.  Quackenbush S. R. Barnwell T. P. and Clements M. A. 1988. Objective measures of speech quality . Prentice Hall Englewood Cliffs NJ."},{"key":"e_1_2_2_21_1","doi-asserted-by":"publisher","DOI":"10.1016\/0022-460X(89)90705-0"},{"key":"e_1_2_2_23_1","doi-asserted-by":"publisher","DOI":"10.1109\/18.119725"},{"key":"e_1_2_2_24_1","doi-asserted-by":"publisher","DOI":"10.1006\/mssp.1998.1209"},{"key":"e_1_2_2_25_1","doi-asserted-by":"publisher","DOI":"10.1109\/TASL.2011.2114881"},{"key":"e_1_2_2_26_1","doi-asserted-by":"publisher","DOI":"10.1145\/2461912.2461966"},{"key":"e_1_2_2_27_1","volume-title":"Computational Photography (ICCP), 2014 IEEE International Conference on, IEEE.","author":"Wadhwa N.","unstructured":"Wadhwa , N. , Rubinstein , M. , Durand , F. , and Freeman , W. T . 2014. Riesz pyramid for fast phase-based video magnification . In Computational Photography (ICCP), 2014 IEEE International Conference on, IEEE. Wadhwa, N., Rubinstein, M., Durand, F., and Freeman, W. T. 2014. Riesz pyramid for fast phase-based video magnification. In Computational Photography (ICCP), 2014 IEEE International Conference on, IEEE."},{"key":"e_1_2_2_28_1","doi-asserted-by":"publisher","DOI":"10.1145\/2185520.2185561"},{"key":"e_1_2_2_29_1","doi-asserted-by":"publisher","DOI":"10.1364\/OE.17.021566"}],"container-title":["ACM Transactions on Graphics"],"original-title":[],"language":"en","link":[{"URL":"https:\/\/dl.acm.org\/doi\/10.1145\/2601097.2601119","content-type":"unspecified","content-version":"vor","intended-application":"text-mining"},{"URL":"https:\/\/dl.acm.org\/doi\/pdf\/10.1145\/2601097.2601119","content-type":"unspecified","content-version":"vor","intended-application":"similarity-checking"}],"deposited":{"date-parts":[[2025,6,18]],"date-time":"2025-06-18T07:19:10Z","timestamp":1750231150000},"score":1,"resource":{"primary":{"URL":"https:\/\/dl.acm.org\/doi\/10.1145\/2601097.2601119"}},"subtitle":["passive recovery of sound from video"],"short-title":[],"issued":{"date-parts":[[2014,7,27]]},"references-count":28,"journal-issue":{"issue":"4","published-print":{"date-parts":[[2014,7,27]]}},"alternative-id":["10.1145\/2601097.2601119"],"URL":"https:\/\/doi.org\/10.1145\/2601097.2601119","relation":{},"ISSN":["0730-0301","1557-7368"],"issn-type":[{"value":"0730-0301","type":"print"},{"value":"1557-7368","type":"electronic"}],"subject":[],"published":{"date-parts":[[2014,7,27]]},"assertion":[{"value":"2014-07-27","order":2,"name":"published","label":"Published","group":{"name":"publication_history","label":"Publication History"}}]}}