{"status":"ok","message-type":"work","message-version":"1.0.0","message":{"indexed":{"date-parts":[[2026,4,16]],"date-time":"2026-04-16T17:58:39Z","timestamp":1776362319275,"version":"3.51.2"},"reference-count":37,"publisher":"Association for Computing Machinery (ACM)","issue":"2","license":[{"start":{"date-parts":[[2006,7,1]],"date-time":"2006-07-01T00:00:00Z","timestamp":1151712000000},"content-version":"vor","delay-in-days":0,"URL":"https:\/\/www.acm.org\/publications\/policies\/copyright_policy#Background"}],"content-domain":{"domain":["dl.acm.org"],"crossmark-restriction":true},"short-container-title":["ACM Trans. Speech Lang. Process."],"published-print":{"date-parts":[[2006,7]]},"abstract":"<jats:p>The acoustic environment provides a rich source of information on the types of activity, communication modes, and people involved in many situations. It can be accurately classified using recordings from microphones commonly found in PDAs and other consumer devices. We describe a prototype HMM-based acoustic environment classifier incorporating an adaptive learning mechanism and a hierarchical classification model. Experimental results show that we can accurately classify a wide variety of everyday environments. 
We also show good results classifying single sounds, although classification accuracy is influenced by the granularity of the classification.<\/jats:p>","DOI":"10.1145\/1149290.1149292","type":"journal-article","created":{"date-parts":[[2006,10,18]],"date-time":"2006-10-18T18:11:32Z","timestamp":1161195092000},"page":"1-22","update-policy":"https:\/\/doi.org\/10.1145\/crossmark-policy","source":"Crossref","is-referenced-by-count":100,"title":["Acoustic environment classification"],"prefix":"10.1145","volume":"3","author":[{"given":"Ling","family":"Ma","sequence":"first","affiliation":[{"name":"University of York, York, UK"}],"role":[{"role":"author","vocabulary":"crossref"}]},{"given":"Ben","family":"Milner","sequence":"additional","affiliation":[{"name":"University of East Anglia, Norwich, UK"}],"role":[{"role":"author","vocabulary":"crossref"}]},{"given":"Dan","family":"Smith","sequence":"additional","affiliation":[{"name":"University of East Anglia, Norwich, UK"}],"role":[{"role":"author","vocabulary":"crossref"}]}],"member":"320","published-online":{"date-parts":[[2006,7]]},"reference":[{"key":"e_1_2_1_1_1","doi-asserted-by":"crossref","first-page":"271","DOI":"10.1007\/3-540-45479-9_29","article-title":"Semantic retrieval using audio analysis. In Proceedings of the Conference on Image and Video Retrieval","volume":"2383","author":"Bakker E. M.","year":"2002","unstructured":"Bakker , E. M. and Lew , M. S. 2002 . Semantic retrieval using audio analysis. In Proceedings of the Conference on Image and Video Retrieval . London UK. Lecture Notes in Computer Science , vol. 2383. 271 -- 277 . Bakker, E. M. and Lew, M. S. 2002. Semantic retrieval using audio analysis. In Proceedings of the Conference on Image and Video Retrieval. London UK. Lecture Notes in Computer Science, vol. 2383. 271--277.","journal-title":"London UK. Lecture Notes in Computer Science"},{"key":"e_1_2_1_2_1","doi-asserted-by":"crossref","unstructured":"Browne P. Czirjek C. Gurrin C. Jarina R. 
Lee H. Marlow S. McDonald K. Murphy N. O'Connor N. E. Smeaton A. F. and Ye J. 2003. Dublin City University video track experiments for TREC 2002. Browne P. Czirjek C. Gurrin C. Jarina R. Lee H. Marlow S. McDonald K. Murphy N. O'Connor N. E. Smeaton A. F. and Ye J. 2003. Dublin City University video track experiments for TREC 2002.","DOI":"10.6028\/NIST.SP.500-251.video-dublin"},{"key":"e_1_2_1_3_1","volume-title":"Proceedings of the ACM Multimedia Conference","author":"Cai R.","unstructured":"Cai , R. , Lu , L. , Zhang , H. J. , and Cai , L . -H. 2003. Using structure patterns of temporal and spectral feature in audio similarity measure . In Proceedings of the ACM Multimedia Conference . Berkeley, CA. (Nov.). 219--222. 10.1145\/957013.957056 Cai, R., Lu, L., Zhang, H. J., and Cai, L.-H. 2003. Using structure patterns of temporal and spectral feature in audio similarity measure. In Proceedings of the ACM Multimedia Conference. Berkeley, CA. (Nov.). 219--222. 10.1145\/957013.957056"},{"key":"e_1_2_1_4_1","volume-title":"Workshop on Perceptual User Interfaces. 37--42","author":"Clarkson B.","unstructured":"Clarkson , B. , Sawhney , N. , and Pentland , A . 1998. Auditory context awareness via wearable computing . Workshop on Perceptual User Interfaces. 37--42 . Clarkson, B., Sawhney, N., and Pentland, A. 1998. Auditory context awareness via wearable computing. Workshop on Perceptual User Interfaces. 37--42."},{"key":"e_1_2_1_5_1","unstructured":"Couvreur L. and Laniray M. 2004. Automatic noise recognition in urban environments based on artificial neural networks and hidden Markov models. Inter-noise2004. Prague Czech Republic. Couvreur L. and Laniray M. 2004. Automatic noise recognition in urban environments based on artificial neural networks and hidden Markov models. Inter-noise2004. Prague Czech Republic."},{"key":"e_1_2_1_6_1","unstructured":"Duda R. O. Hart P. E. and Stork D. G. 2001. Pattern Classification 2nd Ed. Wiley New York NY. Duda R. O. Hart P. E. 
and Stork D. G. 2001. Pattern Classification 2nd Ed. Wiley New York NY."},{"key":"e_1_2_1_7_1","doi-asserted-by":"publisher","DOI":"10.1007\/s005300050106"},{"key":"e_1_2_1_8_1","first-page":"3","article-title":"Automatic classification of environmental noise events by hidden Markov models","volume":"54","author":"Gaunard P.","year":"1998","unstructured":"Gaunard , P. , Mubikangiey , C. G. , Couvreur , C. , and Fontaine , V. 1998 . Automatic classification of environmental noise events by hidden Markov models . Appl. Acoustics 54 , 3 , 187. Gaunard, P., Mubikangiey, C. G., Couvreur, C., and Fontaine, V. 1998. Automatic classification of environmental noise events by hidden Markov models. Appl. Acoustics 54, 3, 187.","journal-title":"Appl. Acoustics"},{"key":"e_1_2_1_9_1","volume-title":"Proceedings of Computer-Supported Cooperative Work (CSCW)","author":"Hindus D.","unstructured":"Hindus , D. and Schmandt , C . 1992. Ubiquitous audio: Capturing spontaneous collaboration . In Proceedings of Computer-Supported Cooperative Work (CSCW) . Toronto, Canada (Nov.). 210--217. 10.1145\/143457.143481 Hindus, D. and Schmandt, C. 1992. Ubiquitous audio: Capturing spontaneous collaboration. In Proceedings of Computer-Supported Cooperative Work (CSCW). Toronto, Canada (Nov.). 210--217. 10.1145\/143457.143481"},{"key":"e_1_2_1_10_1","unstructured":"Huang X. Acero A. and Hon H. 2001. Spoken Language Processing. Prentice Hall Englewood Cliffs NJ. Huang X. Acero A. and Hon H. 2001. Spoken Language Processing. Prentice Hall Englewood Cliffs NJ."},{"key":"e_1_2_1_11_1","doi-asserted-by":"crossref","first-page":"171","DOI":"10.1006\/csla.1995.0010","article-title":"Maximum likelihood linear regression for speaker adaptation of continuous density hidden Markov models","volume":"9","author":"Leggetter C. J.","year":"1995","unstructured":"Leggetter , C. J. and Woodland , P. C. 1995 . Maximum likelihood linear regression for speaker adaptation of continuous density hidden Markov models . 
Comput. Speech Lang. 9 , 171 -- 185 . Leggetter, C. J. and Woodland, P. C. 1995. Maximum likelihood linear regression for speaker adaptation of continuous density hidden Markov models. Comput. Speech Lang. 9, 171--185.","journal-title":"Comput. Speech Lang."},{"key":"e_1_2_1_12_1","volume-title":"Proceedings of the ACM Multimedia Conference","author":"Liu L.","unstructured":"Liu , L. , Jiang , H. , and Zhang , H . -J. 2001. A robust audio classification and segmentation method . In Proceedings of the ACM Multimedia Conference . Ottawa, Canada. 203--211. 10.1145\/500141.500173 Liu, L., Jiang, H., and Zhang, H.-J. 2001. A robust audio classification and segmentation method. In Proceedings of the ACM Multimedia Conference. Ottawa, Canada. 203--211. 10.1145\/500141.500173"},{"key":"e_1_2_1_13_1","doi-asserted-by":"crossref","first-page":"504","DOI":"10.1109\/TSA.2002.804546","article-title":"Content analysis for audio classification and segmentation","volume":"10","author":"Liu L.","year":"2002","unstructured":"Liu , L. , Zhang , H.-J. , and Jiang , H. 2002 . Content analysis for audio classification and segmentation . IEEE Trans. Speech Audio Process. 10 , 7, 504 -- 516 . Liu, L., Zhang, H.-J., and Jiang, H. 2002. Content analysis for audio classification and segmentation. IEEE Trans. Speech Audio Process. 10, 7, 504--516.","journal-title":"IEEE Trans. Speech Audio Process."},{"key":"e_1_2_1_14_1","volume-title":"Proceedings of Eurospeech","author":"Ma L.","unstructured":"Ma , L. , Smith , D. J. , and Milner , B. P . 2003. Context awareness using environmental noise classification . In Proceedings of Eurospeech . Geneva, Switzerland, 2237--2240. Ma, L., Smith, D. J., and Milner, B. P. 2003. Context awareness using environmental noise classification. In Proceedings of Eurospeech. 
Geneva, Switzerland, 2237--2240."},{"key":"e_1_2_1_15_1","doi-asserted-by":"crossref","first-page":"360","DOI":"10.1007\/978-3-540-45227-0_36","article-title":"Environmental noise classification for context-aware applications. In Proceedings of the International Conference on Database and Expert Systems Applications (DEXA)","volume":"2736","author":"Ma L.","year":"2003","unstructured":"Ma , L. , Smith , D. J. , and Milner , B. P. 2003 . Environmental noise classification for context-aware applications. In Proceedings of the International Conference on Database and Expert Systems Applications (DEXA) . Lecture Notes in Computer Science , vol. 2736. 360 -- 370 . Ma, L., Smith, D. J., and Milner, B. P. 2003. Environmental noise classification for context-aware applications. In Proceedings of the International Conference on Database and Expert Systems Applications (DEXA). Lecture Notes in Computer Science, vol. 2736. 360--370.","journal-title":"Lecture Notes in Computer Science"},{"key":"e_1_2_1_16_1","volume-title":"Proceedings of Conference on Human Factors in Computing Systems (CHI'98)","author":"Mynatt E. D.","unstructured":"Mynatt , E. D. , Back , M. , Want , R. , Baer , M. , and Ellis , J. B . 1998. Designing audio aura . In Proceedings of Conference on Human Factors in Computing Systems (CHI'98) . 566--573. 10.1145\/274644.274720 Mynatt, E. D., Back, M., Want, R., Baer, M., and Ellis, J. B. 1998. Designing audio aura. In Proceedings of Conference on Human Factors in Computing Systems (CHI'98). 566--573. 10.1145\/274644.274720"},{"key":"e_1_2_1_17_1","volume-title":"Proceedings of EuroSpeech. 2157--2160","author":"Nishiura T.","unstructured":"Nishiura , T. , Nakamura , S. , Miki , K. , and Shikano , K . 2003. Environment sound source identification based on hidden Markov model for robust speech recognition . In Proceedings of EuroSpeech. 2157--2160 . Nishiura, T., Nakamura, S., Miki, K., and Shikano, K. 2003. 
Environment sound source identification based on hidden Markov model for robust speech recognition. In Proceedings of EuroSpeech. 2157--2160."},{"key":"e_1_2_1_18_1","volume-title":"110th Convention of Audio Engineering Society.","author":"Peltonen V. T. K.","unstructured":"Peltonen , V. T. K. , Eronen , A. J. , Parviainen , M. P. , and Klapuri , A. P . 2001. Recognition of everyday auditory environments: Potentials, latencies and cues , 110th Convention of Audio Engineering Society. Peltonen, V. T. K., Eronen, A. J., Parviainen, M. P., and Klapuri, A. P. 2001. Recognition of everyday auditory environments: Potentials, latencies and cues, 110th Convention of Audio Engineering Society."},{"key":"e_1_2_1_19_1","volume-title":"Proceedings of the International Conference on Acoustic, Speech, and Signal Processing","author":"Peltonen V.","unstructured":"Peltonen , V. , Tuomi , J. , Klapuri , A. , Huopaniemi , J. , and Sorsa , T . 2002. Computational auditory environment recognition . In Proceedings of the International Conference on Acoustic, Speech, and Signal Processing . Orlando, FL. Peltonen, V., Tuomi, J., Klapuri, A., Huopaniemi, J., and Sorsa, T. 2002. Computational auditory environment recognition. In Proceedings of the International Conference on Acoustic, Speech, and Signal Processing. Orlando, FL."},{"key":"e_1_2_1_20_1","doi-asserted-by":"crossref","unstructured":"Qu\u00e9not G. M. Moraru D. Besacier L. and Hulhem P. 2003. CLIPS at TREC-11: Experiments in video retrieval. TREC-2002. Qu\u00e9not G. M. Moraru D. Besacier L. and Hulhem P. 2003. CLIPS at TREC-11: Experiments in video retrieval. TREC-2002.","DOI":"10.6028\/NIST.SP.500-251.video-clips-imag"},{"key":"e_1_2_1_21_1","doi-asserted-by":"publisher","DOI":"10.1145\/355324.355327"},{"key":"e_1_2_1_22_1","unstructured":"Sawhney N. 1997. Situational awareness from environmental sounds. Tech. rep. for Modeling Adaptive Behavior (MAS 738). MIT Media Lab. Sawhney N. 1997. 
Situational awareness from environmental sounds. Tech. rep. for Modeling Adaptive Behavior (MAS 738). MIT Media Lab."},{"key":"e_1_2_1_23_1","volume-title":"Proceedings of the International Conference on Acoustics, Speech, and Signal Processing. 1331--1334","author":"Scheirer E.","unstructured":"Scheirer , E. and Slaney , M . 1997. Construction and evaluation of a robust multifeature speech\/music discriminator . In Proceedings of the International Conference on Acoustics, Speech, and Signal Processing. 1331--1334 . Scheirer, E. and Slaney, M. 1997. Construction and evaluation of a robust multifeature speech\/music discriminator. In Proceedings of the International Conference on Acoustics, Speech, and Signal Processing. 1331--1334."},{"key":"e_1_2_1_24_1","doi-asserted-by":"publisher","DOI":"10.1147\/sj.393.0660"},{"key":"e_1_2_1_25_1","doi-asserted-by":"crossref","first-page":"19","DOI":"10.1007\/3-540-45113-7_3","article-title":"TRECVID: Benchmarking the effectiveness of information retrieval tasks on digital video","volume":"2728","author":"Smeaton A. F.","year":"2003","unstructured":"Smeaton , A. F. and Over , P. 2003 . TRECVID: Benchmarking the effectiveness of information retrieval tasks on digital video . Lecture Notes in Computer Science , vol. 2728. 19 -- 27 . Smeaton, A. F. and Over, P. 2003. TRECVID: Benchmarking the effectiveness of information retrieval tasks on digital video. Lecture Notes in Computer Science, vol. 2728. 19--27.","journal-title":"Lecture Notes in Computer Science"},{"key":"e_1_2_1_26_1","doi-asserted-by":"publisher","DOI":"10.1007\/s00779-005-0045-4"},{"key":"e_1_2_1_27_1","volume-title":"Proceedings of the ACM Multimedia Conference. 393--340","author":"Srinivasen S.","year":"1999","unstructured":"Srinivasen , S. , Petkovic , D. , and Poncelon , D. B . 1999. Towards robust features for classifying audio in the CueVideo system . In Proceedings of the ACM Multimedia Conference. 393--340 . 
10.1145\/319463.319658 Srinivasen, S., Petkovic, D., and Poncelon, D. B. 1999. Towards robust features for classifying audio in the CueVideo system. In Proceedings of the ACM Multimedia Conference. 393--340. 10.1145\/319463.319658"},{"key":"e_1_2_1_28_1","volume-title":"International Semantic Web Conference. 138--141","author":"St\u00e4ger M.","year":"2004","unstructured":"St\u00e4ger , M. , Lukowitz , P. , and Tr\u00f6ster , G . 2004. Implementation and evaluation of a low-power sound-based user activity recognition system . International Semantic Web Conference. 138--141 . 10.1109\/ISWC.2004.25 St\u00e4ger, M., Lukowitz, P., and Tr\u00f6ster, G. 2004. Implementation and evaluation of a low-power sound-based user activity recognition system. International Semantic Web Conference. 138--141. 10.1109\/ISWC.2004.25"},{"key":"e_1_2_1_29_1","volume-title":"Using a PDA for audio capture. BSc Project","author":"Steward J.","unstructured":"Steward , J. 2005. Using a PDA for audio capture. BSc Project , University of East Anglia , Norwich, UK . Steward, J. 2005. Using a PDA for audio capture. BSc Project, University of East Anglia, Norwich, UK."},{"key":"e_1_2_1_30_1","volume-title":"Proceedings of the 4th International Conference on Computer and Information Technology (CIT '04)","author":"Toyoda Y.","unstructured":"Toyoda , Y. , Huang , J. , Ding , S. , and Liu , Y . 2004. Environmental sound recognition by multilayered neural networks . In Proceedings of the 4th International Conference on Computer and Information Technology (CIT '04) . 123--127. Toyoda, Y., Huang, J., Ding, S., and Liu, Y. 2004. Environmental sound recognition by multilayered neural networks. In Proceedings of the 4th International Conference on Computer and Information Technology (CIT '04). 
123--127."},{"key":"e_1_2_1_31_1","doi-asserted-by":"crossref","first-page":"293","DOI":"10.1109\/TSA.2002.800560","article-title":"Musical genre classification of audio signals","volume":"10","author":"Tzanetakis G.","year":"2002","unstructured":"Tzanetakis , G. and Cook , P. 2002 . Musical genre classification of audio signals . IEEE Trans. Speech Audio Process. 10 , 5, 293 -- 302 . Tzanetakis, G. and Cook, P. 2002. Musical genre classification of audio signals. IEEE Trans. Speech Audio Process. 10, 5, 293--302.","journal-title":"IEEE Trans. Speech Audio Process."},{"key":"e_1_2_1_32_1","volume-title":"Proceedings of the International Conference on Asian Digital Libraries. 279--289","author":"Vega V. S B.","year":"2003","unstructured":"Vega , V. S B. , Bressan , S. 2003 . Continuous naive bayesian classifications . In Proceedings of the International Conference on Asian Digital Libraries. 279--289 . Vega, V. S B., Bressan, S. 2003. Continuous naive bayesian classifications. In Proceedings of the International Conference on Asian Digital Libraries. 279--289."},{"key":"e_1_2_1_33_1","doi-asserted-by":"crossref","unstructured":"Vendrig J. den Hartog J. van Leeuwen D. Patras I. Raaijmakers S. van Rest J. Snoek C. and Worring M. 2003. TREC feature extraction by Active learning TREC-2002. Vendrig J. den Hartog J. van Leeuwen D. Patras I. Raaijmakers S. van Rest J. Snoek C. and Worring M. 2003. TREC feature extraction by Active learning TREC-2002.","DOI":"10.6028\/NIST.SP.500-251.video-amsterdam_isis"},{"key":"e_1_2_1_34_1","doi-asserted-by":"publisher","DOI":"10.1109\/93.556537"},{"key":"e_1_2_1_35_1","unstructured":"Wu L. Guo Y. Qiu X. Feng Z. Rong J. Jin W. Zhou D. Wang R. and Jin M. 2003. TRECVid 2003. TREC-2003. Wu L. Guo Y. Qiu X. Feng Z. Rong J. Jin W. Zhou D. Wang R. and Jin M. 2003. TRECVid 2003. TREC-2003."},{"key":"e_1_2_1_36_1","unstructured":"Young S. Evermann G. Kershaw D. Moore G. Odell J. Ollason D. Valtchev V. and Woodland P. 2001. The HTK Book 3.1. 
Cambridge University Engineering Department Cambridge UK. http:\/\/htk.eng.cam.ac.uk. Young S. Evermann G. Kershaw D. Moore G. Odell J. Ollason D. Valtchev V. and Woodland P. 2001. The HTK Book 3.1. Cambridge University Engineering Department Cambridge UK. http:\/\/htk.eng.cam.ac.uk."},{"key":"e_1_2_1_37_1","volume-title":"Proceedings of the ACM Conference on Computer and Communications Security. Alexandria, VA (Nov).","author":"Zhuang L.","unstructured":"Zhuang , L. , Zhou , F. , and Tyger , J. D . 2005. Keyboard acoustic emanations revisited . In Proceedings of the ACM Conference on Computer and Communications Security. Alexandria, VA (Nov). 10.1145\/1102120.1102169 Zhuang, L., Zhou, F., and Tyger, J. D. 2005. Keyboard acoustic emanations revisited. In Proceedings of the ACM Conference on Computer and Communications Security. Alexandria, VA (Nov). 10.1145\/1102120.1102169"}],"container-title":["ACM Transactions on Speech and Language Processing"],"original-title":[],"language":"en","link":[{"URL":"https:\/\/dl.acm.org\/doi\/10.1145\/1149290.1149292","content-type":"unspecified","content-version":"vor","intended-application":"text-mining"},{"URL":"https:\/\/dl.acm.org\/doi\/pdf\/10.1145\/1149290.1149292","content-type":"unspecified","content-version":"vor","intended-application":"similarity-checking"}],"deposited":{"date-parts":[[2025,6,18]],"date-time":"2025-06-18T16:31:13Z","timestamp":1750264273000},"score":1,"resource":{"primary":{"URL":"https:\/\/dl.acm.org\/doi\/10.1145\/1149290.1149292"}},"subtitle":[],"short-title":[],"issued":{"date-parts":[[2006,7]]},"references-count":37,"journal-issue":{"issue":"2","published-print":{"date-parts":[[2006,7]]}},"alternative-id":["10.1145\/1149290.1149292"],"URL":"https:\/\/doi.org\/10.1145\/1149290.1149292","relation":{},"ISSN":["1550-4875","1550-4883"],"issn-type":[{"value":"1550-4875","type":"print"},{"value":"1550-4883","type":"electronic"}],"subject":[],"published":{"date-parts":[[2006,7]]},"assertion":[{"value":"
2006-07-01","order":2,"name":"published","label":"Published","group":{"name":"publication_history","label":"Publication History"}}]}}