{"status":"ok","message-type":"work","message-version":"1.0.0","message":{"indexed":{"date-parts":[[2026,4,3]],"date-time":"2026-04-03T19:09:40Z","timestamp":1775243380245,"version":"3.50.1"},"reference-count":317,"publisher":"Emerald","issue":"4-5","content-domain":{"domain":[],"crossmark-restriction":false},"short-container-title":[],"published-print":{"date-parts":[[2012,7,23]]},"abstract":"<jats:p>Speech media, that is, digital audio and video containing spoken content, has blossomed in recent years. Large collections are accruing on the Internet as well as in private and enterprise settings. This growth has motivated extensive research on techniques and technologies that facilitate reliable indexing and retrieval. Spoken content retrieval (SCR) requires the combination of audio and speech processing technologies with methods from information retrieval (IR). SCR research initially investigated planned speech structured in document-like units, but has subsequently shifted focus to more informal spoken content produced spontaneously, outside of the studio and in conversational settings. This survey provides an overview of the field of SCR encompassing component technologies, the relationship of SCR to text IR and automatic speech recognition and user interaction issues. 
It is aimed at researchers with backgrounds in speech technology or IR who are seeking deeper insight on how these fields are integrated to support research and development, thus addressing the core challenges of SCR.<\/jats:p>","DOI":"10.1561\/1500000020","type":"journal-article","created":{"date-parts":[[2012,7,23]],"date-time":"2012-07-23T09:19:46Z","timestamp":1343035186000},"page":"235-422","source":"Crossref","is-referenced-by-count":39,"title":["Spoken Content Retrieval: A Survey of Techniques and Technologies"],"prefix":"10.1561","volume":"5","author":[{"given":"Martha","family":"Larson","sequence":"first","affiliation":[{"name":"Faculty of Electrical Engineering, Mathematics and Computer Science, Multimedia Information Retrieval Lab, Delft University of Technology , Delft,","place":["The Netherlands"]}]},{"given":"Gareth J. F.","family":"Jones","sequence":"additional","affiliation":[{"name":"Centre for Next Generation Localisation, School of Computing, Dublin City University , Dublin,","place":["Ireland"]}]}],"member":"140","published-online":{"date-parts":[[2012,7,23]]},"reference":[{"key":"2026040314323279400_ref001","first-page":"223","article-title":"Overview of the IR for spoken documents task in NTCIR-9 Workshop","volume-title":"Proceedings of the Nil Test Collection for IR Systems Workshop","author":"Akiba","year":"2011"},{"key":"2026040314323279400_ref002","first-page":"4873","article-title":"An audio indexing system for election video material","volume-title":"Proceedings of the IEEE International Conference on Acoustics, Speech, and Signal Processing","author":"Alberti","year":"2009"},{"key":"2026040314323279400_ref003","first-page":"323","volume-title":"Information Retrieval Techniques for Speech Applications","author":"Allan","year":"2002"},{"key":"2026040314323279400_ref004","volume-title":"The Kluwer International Series on Information 
Retrieval","author":"Allan","year":"2002"},{"issue":"1","key":"2026040314323279400_ref005","first-page":"103","article-title":"Robust techniques for organizing and retrieving spoken documents","volume":"2003","author":"Allan","year":"2003","journal-title":"EURASIP Journal on Advances in Signal Processing"},{"key":"2026040314323279400_ref006","first-page":"26","article-title":"Robust speaker segmentation for meetings: The ICSI-SRI spring 2005 diarization system","volume-title":"Proceedings of the NIST Machine Learning for Multimodal Interaction, Meeting Recognition Workshop","author":"Anguera","year":"2005"},{"key":"2026040314323279400_ref007","volume-title":"Contemporary Linguistics","author":"Archibald","year":"2001"},{"issue":"5","key":"2026040314323279400_ref008","doi-asserted-by":"crossref","first-page":"874","DOI":"10.1109\/TASL.2008.2012313","article-title":"Turkish broadcast news transcription and retrieval","volume":"17","author":"Arisoy","year":"2009","journal-title":"IEEE Transactions on Audio, Speech, and Language Processing"},{"key":"2026040314323279400_ref009","volume-title":"Proceedings of the ACM User Interface Software and Technology Conference","author":"Arons","year":"1993"},{"issue":"1","key":"2026040314323279400_ref010","doi-asserted-by":"crossref","first-page":"3","DOI":"10.1145\/244754.244758","article-title":"SpeechSkimmer: A system for interactively skimming recorded speech","volume":"4","author":"Arons","year":"1997","journal-title":"Transactions on Computer Human Interaction"},{"issue":"4","key":"2026040314323279400_ref011","doi-asserted-by":"crossref","first-page":"44","DOI":"10.1145\/191642.191653","article-title":"The future of speech and audio in the interface: A CHI \u201994 workshop","volume":"26","author":"Arons","year":"1994","journal-title":"SIGCHI Bulletin"},{"issue":"1","key":"2026040314323279400_ref012","doi-asserted-by":"crossref","first-page":"89","DOI":"10.1006\/csla.2001.0185","article-title":"An overview of decoding 
techniques for large vocabulary continuous speech recognition","volume":"16","author":"Aubert","year":"2002","journal-title":"Computer Speech & Language"},{"key":"2026040314323279400_ref013","first-page":"132","article-title":"Automatic language model adaptation for spoken document retrieval","volume-title":"Proceedings of the RIAO Conference on Content-Based Multimedia Information Access","author":"Auzanne","year":"2000"},{"key":"2026040314323279400_ref014","volume-title":"Modern Information Retrieval: The Concepts and Technology Behind Search","author":"Baeza-Yates","year":"2010"},{"key":"2026040314323279400_ref015","first-page":"1950","article-title":"Very-large-vocabulary Mandarin voice message file retrieval using speech queries","volume-title":"Proceedings of the International Conference on Spoken Language Processing","author":"Bai","year":"1996"},{"issue":"1","key":"2026040314323279400_ref016","doi-asserted-by":"crossref","first-page":"24","DOI":"10.1109\/TASSP.1975.1162650","article-title":"The DRAGON system \u2014 an overview","volume":"23","author":"Baker","year":"1975","journal-title":"IEEE Transactions on Acoustics, Speech and Signal Processing"},{"key":"2026040314323279400_ref017","doi-asserted-by":"crossref","DOI":"10.21437\/Interspeech.2006-15","article-title":"A TextTiling based approach to topic boundary detection in meetings","volume-title":"Proceedings of Interspeech","author":"Banerjee","year":"2006"},{"issue":"10","key":"2026040314323279400_ref018","doi-asserted-by":"crossref","first-page":"847","DOI":"10.1016\/j.specom.2008.05.008","article-title":"Recovering capitalization and punctuation marks for automatic speech recognition: Case study for Portuguese broadcast news","volume":"50","author":"Batista","year":"2008","journal-title":"Speech Communication"},{"issue":"3","key":"2026040314323279400_ref019","doi-asserted-by":"crossref","first-page":"431","DOI":"10.1016\/0306-4573(94)00057-A","article-title":"Combining the evidence of multiple query 
representations for information retrieval","volume":"31","author":"Belkin","year":"1995","journal-title":"Information Processing & Management"},{"key":"2026040314323279400_ref020","first-page":"V\/1021","article-title":"Automatic speech recognition and intrinsic speech variation","volume-title":"Proceedings of the IEEE International Conference on Acoustics, Speech, and Signal Processing","author":"Benzeguiba","year":"2006"},{"key":"2026040314323279400_ref021","doi-asserted-by":"crossref","first-page":"3","DOI":"10.1108\/14684521011054053","article-title":"Podcast search: User goals and retrieval technologies","volume":"34","author":"Besser","year":"2010","journal-title":"Online Information Review"},{"key":"2026040314323279400_ref022","doi-asserted-by":"crossref","first-page":"205","DOI":"10.1016\/0167-6393(96)00003-9","article-title":"Towards increasing speech recognition error rates","volume":"18","author":"Bourlard","year":"1996","journal-title":"Speech Communication"},{"key":"2026040314323279400_ref023","article-title":"Recognition and understanding of meetings overview of the European AMI and AMIDA projects","author":"Bourlard","year":"2008"},{"issue":"1-7","key":"2026040314323279400_ref024","doi-asserted-by":"crossref","first-page":"107","DOI":"10.1016\/S0169-7552(98)00110-X","article-title":"The anatomy of a large-scale hypertextual web search engine","volume":"30","author":"Brin","year":"1998","journal-title":"Computer Networks and ISDN Systems"},{"issue":"4","key":"2026040314323279400_ref025","doi-asserted-by":"crossref","first-page":"985","DOI":"10.1147\/sj.404.0985","article-title":"Toward speech as a knowledge resource","volume":"40","author":"Brown","year":"2001","journal-title":"IBM Systems Journal"},{"key":"2026040314323279400_ref026","first-page":"35","article-title":"Automatic content-based retrieval of broadcast news","volume-title":"Proceedings of the Annual ACM International Conference on 
Multimedia","author":"Brown","year":"1995"},{"key":"2026040314323279400_ref027","first-page":"307","article-title":"Openvocabulary speech indexing for voice and video mail retrieval","volume-title":"Proceedings of the ACM International Conference on Multimedia","author":"Brown","year":"1996"},{"key":"2026040314323279400_ref028","first-page":"47","article-title":"Video mail retrieval using voice: An overview of the Cambridge\/Olivetti retrieval system","volume-title":"Proceedings of the ACM Multimedia Workshop on Multimedia Database Management Systems","author":"Brown","year":"1994"},{"key":"2026040314323279400_ref029","first-page":"35","article-title":"Automatic content-based retrieval of broadcast news","volume-title":"Proceedings of the Third ACM International Conference on Multimedia","author":"Brown","year":"1995"},{"key":"2026040314323279400_ref030","first-page":"69","article-title":"Automatic query expansion using SMART: TREC 3","volume-title":"Proceedings of the Third Text Retrieval Conference","author":"Buckley","year":"1995"},{"key":"2026040314323279400_ref031","first-page":"339","article-title":"Spontaneous speech effects in large vocabulary speech recognition applications","volume-title":"Proceedings of the Workshop on Speech and Natural Language","author":"Butzberger","year":"1992"},{"key":"2026040314323279400_ref032","volume-title":"Information Retrieval: Implementing and Evaluating Search Engines","author":"B\u00fcttcher","year":"2010"},{"issue":"4","key":"2026040314323279400_ref033","doi-asserted-by":"crossref","first-page":"420","DOI":"10.1109\/TSA.2004.828702","article-title":"Automatic recognition of spontaneous speech for access to multilingual oral history archives","volume":"12","author":"Byrne","year":"2004","journal-title":"IEEE Transactions on Speech and Audio Processing, Special Issue on Spontaneous Speech 
Processing"},{"key":"2026040314323279400_ref034","doi-asserted-by":"crossref","first-page":"28","DOI":"10.1007\/11677482_3","volume-title":"Machine Learning for Multimodal Interaction","author":"Carletta","year":"2006"},{"key":"2026040314323279400_ref035","first-page":"93","article-title":"Multimodal indexing of digital audio-visual documents: A Case study for cultural heritage data","volume-title":"Proceedings of the International Workshop on Content-Based Multimedia Indexing","author":"Carmichael","year":"2008"},{"key":"2026040314323279400_ref036","doi-asserted-by":"crossref","DOI":"10.1002\/9780470756591","volume-title":"The Handbook of Language Variation and Change","author":"Chambers","year":"2004"},{"key":"2026040314323279400_ref037","first-page":"443","volume-title":"Proceedings of the Annual Meeting on Association for Computational Linguistics","author":"Chelba","year":"2005"},{"issue":"3","key":"2026040314323279400_ref038","doi-asserted-by":"crossref","first-page":"39","DOI":"10.1109\/MSP.2008.917992","article-title":"Retrieval and browsing of spoken content","volume":"25","author":"Chelba","year":"2008","journal-title":"IEEE Signal Processing Magazine"},{"issue":"3","key":"2026040314323279400_ref039","doi-asserted-by":"crossref","first-page":"458","DOI":"10.1016\/j.csl.2006.09.001","article-title":"Soft indexing of speech content for search in spoken documents","volume":"21","author":"Chelba","year":"2007","journal-title":"Computer Speech and Language"},{"issue":"1","key":"2026040314323279400_ref040","doi-asserted-by":"crossref","first-page":"9","DOI":"10.1016\/j.patrec.2005.06.010","article-title":"Exploring the use of latent topical information for statistical Chinese spoken document retrieval","volume":"27","author":"Chen","year":"2006","journal-title":"Pattern Recognition Letters"},{"issue":"5","key":"2026040314323279400_ref041","doi-asserted-by":"crossref","first-page":"303","DOI":"10.1109\/TSA.2002.802541","article-title":"Discriminating 
capabilities of syllablebased features and approaches of utilizing them for voice retrieval of speech information in Mandarin Chinese","volume":"10","author":"Chen","year":"2002","journal-title":"IEEE Transactions on Speech and Audio Processing"},{"key":"2026040314323279400_ref042","first-page":"1\/229","article-title":"The use of emphasis to automatically summarize a spoken discourse","volume-title":"Proceedings of the IEEE International Conference on Acoustics, Speech, and Signal Processing","author":"Chen","year":"1992"},{"issue":"4","key":"2026040314323279400_ref043","doi-asserted-by":"crossref","first-page":"359","DOI":"10.1006\/csla.1999.0128","article-title":"An empirical study of smoothing techniques for language modeling","volume":"13","author":"Chen","year":"1999","journal-title":"Computer Speech and Language"},{"key":"2026040314323279400_ref044","article-title":"Speaker, environment and channel change detection and clustering via the bayesian information criterion","volume-title":"Proceedings of the DARPA Broadcast News Transcription and Understanding Workshop","author":"Chen","year":"1998"},{"issue":"1","key":"2026040314323279400_ref045","doi-asserted-by":"crossref","first-page":"95","DOI":"10.1109\/TASL.2008.2005031","article-title":"A probabilistic generative framework for extractive broadcast news speech summarization","volume":"17","author":"Chen","year":"2009","journal-title":"IEEE Transactions on Audio, Speech, and Language Processing"},{"key":"2026040314323279400_ref046","article-title":"Improving the front-end of Kunststofzuiger","author":"Cheong","year":"2008"},{"issue":"1","key":"2026040314323279400_ref047","doi-asserted-by":"crossref","first-page":"1","DOI":"10.1145\/1658377.1658379","article-title":"Statistical lattice-based spoken document retrieval","volume":"28","author":"Chia","year":"2010","journal-title":"ACM Transactions on Information Systems"},{"key":"2026040314323279400_ref048","first-page":"26","article-title":"Advances in domain 
independent linear text segmentation","volume-title":"Proceedings of the North American Chapter of the Association for Computational Linguistics Conference","author":"Choi","year":"2000"},{"key":"2026040314323279400_ref049","doi-asserted-by":"crossref","first-page":"486","DOI":"10.1145\/1282280.1282351","article-title":"Merging storyboard strategies and automatic retrieval for improving interactive video search","volume-title":"Proceedings of the ACM International Conference on Image and Video Retrieval","author":"Christel","year":"2007"},{"key":"2026040314323279400_ref050","doi-asserted-by":"crossref","first-page":"3","DOI":"10.1007\/978-3-540-30120-2_1","volume-title":"Text, Speech and Dialogue","author":"Church","year":"2004"},{"key":"2026040314323279400_ref051","volume-title":"An Introduction to Phonetics and Phonology (Blackwell Textbooks in Linguistics)","author":"Clark","year":"2007"},{"key":"2026040314323279400_ref052","doi-asserted-by":"crossref","DOI":"10.1109\/HICSS.2001.926473","article-title":"Speech transcript analysis for automatic search","volume-title":"Proceedings of the Annual Hawaii International Conference on System Sciences, 2001","author":"Coden","year":"2001"},{"issue":"1","key":"2026040314323279400_ref053","doi-asserted-by":"crossref","first-page":"10","DOI":"10.1145\/584449.584454","article-title":"ACM SIGIR 2001 workshop \u201cInformation Retrieval Techniques for Speech Applications\u201d","volume":"36","author":"Coden","year":"2002","journal-title":"SIGIR Forum"},{"issue":"1","key":"2026040314323279400_ref054","doi-asserted-by":"crossref","first-page":"1","DOI":"10.1109\/89.365385","article-title":"The challenge of spoken language systems: Research directions for the nineties","volume":"3","author":"Cole","year":"1995","journal-title":"IEEE Transactions on Speech and Audio Processing"},{"issue":"3","key":"2026040314323279400_ref055","doi-asserted-by":"crossref","DOI":"10.1145\/2328967.2328972","article-title":"Sibyl, a factoid question 
answering system for spoken documents","volume":"30","author":"Comas","year":"2012","journal-title":"ACM Transactions on Information Systems"},{"issue":"7","key":"2026040314323279400_ref056","doi-asserted-by":"crossref","first-page":"881","DOI":"10.1002\/asi.20350","article-title":"Written versus spoken queries: A qualitative and quantitative comparative analysis","volume":"57","author":"Crestani","year":"2006","journal-title":"Journal of the American Society for Information Science and Technology"},{"key":"2026040314323279400_ref057","volume-title":"Search Engines: Information Retrieval in Practice","author":"Croft","year":"2009"},{"key":"2026040314323279400_ref058","doi-asserted-by":"crossref","first-page":"212","DOI":"10.3115\/1289189.1289199","article-title":"Speech in noisy environments (SPINE) adds new dimension to speech recognition R & D","volume-title":"Proceedings of the International Conference on Human Language Technology Research","author":"Crystal","year":"2002"},{"key":"2026040314323279400_ref059","first-page":"503","article-title":"Distributed meetings: A meeting capture and broadcasting system","volume-title":"Proceedings of the ACM International Conference on Multimedia","author":"Cutler","year":"2002"},{"key":"2026040314323279400_ref060","article-title":"A novel feature combination approach for spoken document classification with support vector machines","volume-title":"Proceedings of the ACM Special Interest Group on Information Retrieval (SIGIR) Multimedia Information Retrieval Workshop","author":"Dai","year":"2003"},{"issue":"1","key":"2026040314323279400_ref061","first-page":"3:1","article-title":"Access to recorded interviews: A research agenda","volume":"1","author":"de Jong","year":"2008","journal-title":"ACM Journal on Computing and Cultural Heritage"},{"key":"2026040314323279400_ref062","doi-asserted-by":"crossref","first-page":"226","DOI":"10.1007\/11930334_18","volume-title":"Semantic Multimedia","author":"de 
Jong","year":"2006"},{"issue":"3","key":"2026040314323279400_ref063","doi-asserted-by":"crossref","first-page":"365","DOI":"10.1109\/TCSVT.2007.890834","article-title":"Multimedia search without visual analysis: The value of linguistic and contextual information","volume":"17","author":"de Jong","year":"2007","journal-title":"IEEE Transactions on Circuits and Systems for Video Technology"},{"key":"2026040314323279400_ref064","article-title":"Improving information retrieval with latent semantic indexing","volume-title":"Proceedings of the 51st ASIS Annual Meeting","author":"Deerwester","year":"1988"},{"key":"2026040314323279400_ref065","doi-asserted-by":"crossref","first-page":"36","DOI":"10.1007\/3-540-45637-6_4","volume-title":"Information Retrieval Techniques for Speech Applications","author":"D\u00e9silets","year":"2002"},{"key":"2026040314323279400_ref066","first-page":"1334","article-title":"Topic segmentation algorithms for text summarization and passage retrieval: An exhaustive evaluation","volume-title":"Proceedings of the National Conference on Artificial Intelligence \u2014 Volume 2","author":"Dias","year":"2007"},{"key":"2026040314323279400_ref067","volume-title":"The Rise and Fall of Languages","author":"Dixon","year":"1998"},{"key":"2026040314323279400_ref068","first-page":"1\/221","article-title":"Understanding and improving speech recognition performance through the use of diagnostic tools","volume-title":"Proceedings of the IEEE International Conference on Acoustics, Speech, and Signal Processing","author":"Eide","year":"1995"},{"issue":"3","key":"2026040314323279400_ref069","doi-asserted-by":"crossref","first-page":"572","DOI":"10.1016\/j.patcog.2010.09.020","article-title":"Survey on speech emotion recognition: Features, classification schemes, and databases","volume":"44","author":"El Ayadi","year":"2011","journal-title":"Pattern 
Recognition"},{"key":"2026040314323279400_ref070","doi-asserted-by":"crossref","first-page":"410","DOI":"10.1007\/11671299_43","article-title":"Information retrieval from spoken documents","volume-title":"Proceedings of the International Conference on Intelligent Text Processing and Computational Linguistics","author":"Fap\u0161o","year":"2006"},{"issue":"1-2","key":"2026040314323279400_ref071","doi-asserted-by":"crossref","first-page":"37","DOI":"10.1016\/S0167-6393(00)00022-4","article-title":"A system for the retrieval of Italian broadcast news","volume":"32","author":"Federico","year":"2000","journal-title":"Speech Communication"},{"key":"2026040314323279400_ref072","article-title":"Phoneme-level indexing for fast and vocabularyindependent voice\/voice retrieval","volume-title":"Proceedings of the ESCA Workshop: Accessing Information in Spoken Audio","author":"Ferrieux","year":"1999"},{"key":"2026040314323279400_ref073","doi-asserted-by":"crossref","first-page":"347","DOI":"10.1109\/ASRU.1997.659110","article-title":"A post-processing system to yield reduced word error rates: Recogniser output voting error reduction (rover)","volume-title":"Proceedings of the IEEE Workshop on Automatic Speech Recognition and Understanding","author":"Fiscus","year":"1997"},{"key":"2026040314323279400_ref074","doi-asserted-by":"crossref","first-page":"373","DOI":"10.1007\/978-3-540-68585-2_36","volume-title":"Multimodal Technologies for Perception of Humans","author":"Fiscus","year":"2008"},{"key":"2026040314323279400_ref075","first-page":"45","article-title":"Results of the 2006 spoken term detection evaluation","volume-title":"Proceedings of the ACM Special Interest Group on Information Retrieval (SIGIR), Searching Spontaneous Conversational Speech Workshop","author":"Fiscus","year":"2007"},{"issue":"1","key":"2026040314323279400_ref076","doi-asserted-by":"crossref","first-page":"2","DOI":"10.1007\/s005300050106","article-title":"An overview of audio information 
retrieval","volume":"7","author":"Foote","year":"1999","journal-title":"Multimedia Systems"},{"key":"2026040314323279400_ref077","doi-asserted-by":"crossref","first-page":"2145","DOI":"10.21437\/Eurospeech.1995-513","article-title":"Talkerindependent keyword spotting for information retrieval","volume-title":"Proceedings of Eurospeech","author":"Foote","year":"1995"},{"key":"2026040314323279400_ref078","article-title":"Using term clouds to represent segment-level semantic content of podcasts","volume-title":"Proceedings of the ACM Special Interest Group on Information Retrieval (SIGIR), Searching Spontaneous Conversational Speech Workshop","author":"Fuller","year":"2008"},{"key":"2026040314323279400_ref079","doi-asserted-by":"crossref","first-page":"627","DOI":"10.1007\/978-3-540-49127-9_32","volume-title":"Springer Handbook of Speech Processing","author":"Furui","year":"2008"},{"issue":"4","key":"2026040314323279400_ref080","doi-asserted-by":"crossref","first-page":"401","DOI":"10.1109\/TSA.2004.828699","article-title":"Speech-to-text and speech-to-speech summarization of spontaneous speech","volume":"12","author":"Furui","year":"2004","journal-title":"IEEE Transactions on Speech and Audio Processing"},{"key":"2026040314323279400_ref081","volume-title":"The Application of Hidden Markov Models in Speech Recognition","author":"Gales","year":"2008"},{"key":"2026040314323279400_ref082","first-page":"1","article-title":"The TREC spoken document retrieval track: A success story","volume-title":"Proceedings of the RIAO Conference on Content-Based Multimedia Information Access","author":"Garofolo","year":"2000"},{"key":"2026040314323279400_ref083","first-page":"1","article-title":"Spoken document retrieval: 1998 evaluation and investigation of new metrics","volume-title":"Proceedings of the ESCA Workshop: Accessing Information in Spoken 
Audio","author":"Garofolo","year":"1999"},{"issue":"2","key":"2026040314323279400_ref084","doi-asserted-by":"crossref","first-page":"64","DOI":"10.1145\/328236.328148","article-title":"Transcribing broadcast news for audio and video indexing","volume":"13","author":"Gauvain","year":"2000","journal-title":"Communications of the ACM"},{"key":"2026040314323279400_ref085","first-page":"H\/471","article-title":"Application of large vocabulary continuous speech recognition to topic and speaker identification using telephone speech","volume-title":"Proceedings of the IEEE International Conference on Acoustics Speech, and Signal Processing","author":"Gillick","year":"1993"},{"key":"2026040314323279400_ref086","first-page":"2556","article-title":"Recent progress in the MIT spoken lecture processing project","volume-title":"Proceedings of Interspeech","author":"Glass","year":"2007"},{"key":"2026040314323279400_ref087","first-page":"168","article-title":"A system for retrieving speech documents","volume-title":"Proceedings of the International ACM Special Interest Group on Information Retrieval (SIGIR) Conference on Research and Development in Information Retrieval","author":"Glavitsch","year":"1992"},{"issue":"4","key":"2026040314323279400_ref088","doi-asserted-by":"crossref","first-page":"57","DOI":"10.1145\/190627.190645","article-title":"Metadata for integrating speech documents in a text retrieval system","volume":"23","author":"Glavitsch","year":"1994","journal-title":"SIGMOD Record"},{"key":"2026040314323279400_ref089","volume-title":"Information Retrieval: Searching in the 21st Century","author":"Goker","year":"2007"},{"key":"2026040314323279400_ref090","volume-title":"Speech and Audio Signal Processing: Processing and Perception of Speech and Music","author":"Gold","year":"1999"},{"issue":"4","key":"2026040314323279400_ref091","doi-asserted-by":"crossref","first-page":"287","DOI":"10.1007\/s00799-004-0101-0","article-title":"Accessing the spoken 
word","volume":"5","author":"Goldman","year":"2005","journal-title":"International Journal on Digital Libraries"},{"key":"2026040314323279400_ref092","first-page":"3073","article-title":"PodCastle: Recent advances of a spoken document retrieval service improved by anonymous user contributions","volume-title":"Proceedings of Interspeech","author":"Goto","year":"2011"},{"key":"2026040314323279400_ref093","first-page":"2397","article-title":"PodCastle: A Web 2.0 approach to speech recognition research","volume-title":"Proceedings of Interspeech","author":"Goto","year":"2007"},{"key":"2026040314323279400_ref094","doi-asserted-by":"crossref","first-page":"18","DOI":"10.1145\/544220.544224","article-title":"Supporting access to large digital oral history archives","volume-title":"Proceedings of the ACM\/IEEE-CS Joint Conference on Digital Libraries","author":"Gustman","year":"2002"},{"key":"2026040314323279400_ref095","first-page":"133","article-title":"Segment generation and clustering in the HTK Broadcast News Transcription System","volume-title":"Proceedings of the Broadcast News Transcription and Understanding Workshop","author":"Hain","year":"1998"},{"issue":"6","key":"2026040314323279400_ref096","doi-asserted-by":"crossref","first-page":"1173","DOI":"10.1109\/TSA.2005.852999","article-title":"Automatic transcription of conversational telephone speech","volume":"13","author":"Hain","year":"2005","journal-title":"IEEE Transactions on Speech and Audio Processing"},{"issue":"4","key":"2026040314323279400_ref097","doi-asserted-by":"crossref","first-page":"495","DOI":"10.1016\/j.csl.2005.07.005","article-title":"Beyond ASR 1-best: Using word confusion networks in spoken language understanding","volume":"20","author":"Hakkani-T\u00fcr","year":"2006","journal-title":"Computer Speech and Language"},{"key":"2026040314323279400_ref098","first-page":"1\/596","article-title":"A general algorithm for word graph matrix decomposition","volume-title":"Proceedings of the IEEE 
International Conference on Acoustics, Speech, and Signal Processing","author":"Hakkani-T\u00fcr","year":"2003"},{"issue":"1","key":"2026040314323279400_ref099","doi-asserted-by":"crossref","first-page":"143","DOI":"10.1109\/TMM.2004.840618","article-title":"Affective video content representation and modeling","volume":"7","author":"Hanjalic","year":"2005","journal-title":"IEEE Transactions on Multimedia"},{"issue":"5","key":"2026040314323279400_ref100","doi-asserted-by":"crossref","first-page":"712","DOI":"10.1109\/TSA.2005.852088","article-title":"SpeechFind: Advances in spoken document retrieval for a national gallery of the spoken word","volume":"13","author":"Hansen","year":"2005","journal-title":"IEEE Transactions on Speech and Audio Processing"},{"key":"2026040314323279400_ref101","first-page":"791","article-title":"Selection and ranking of text from highly imperfect transcripts for retrieval of video content","volume-title":"Proceedings of the International ACM Special Interest Group on Information Retrieval (SIGIR) Conference on Research and Development in Information Retrieval","author":"Haubold","year":"2007"},{"key":"2026040314323279400_ref102","first-page":"288","article-title":"Speech recognition in the Informedia Digital Video Library: Uses and limitations","volume-title":"Proceedings of the International Conference on Tools with Artificial Intelligence","author":"G. Hauptmann","year":"1995"},{"key":"2026040314323279400_ref103","doi-asserted-by":"crossref","first-page":"668","DOI":"10.1145\/1027527.1027681","article-title":"Successful approaches in the TREC video retrieval evaluations","volume-title":"Proceedings of the Annual ACM International Conference on Multimedia","author":"G. Hauptmann","year":"2004"},{"key":"2026040314323279400_ref104","first-page":"1\/195","article-title":"Indexing and search of multimodal information","volume-title":"Proceedings of the IEEE International Conference on Acoustics, Speech, and Signal Processing","author":"G. 
Hauptmann","year":"1997"},{"key":"2026040314323279400_ref105","first-page":"215","volume-title":"Intelligent Multimedia Information Retrieval","author":"G. Hauptmann","year":"1997"},{"key":"2026040314323279400_ref106","first-page":"9","article-title":"Multi-paragraph segmentation of expository text","volume-title":"Proceedings of the Annual Meeting on Association for Computational Linguistics","author":"A. Hearst","year":"1994"},{"key":"2026040314323279400_ref107","doi-asserted-by":"crossref","DOI":"10.1017\/CBO9781139644082","volume-title":"Search User Interfaces","author":"A. Hearst","year":"2009"},{"key":"2026040314323279400_ref108","first-page":"23","article-title":"Disclosing spoken culture: User interfaces for access to spoken word archives","volume-title":"Proceedings of the British HCI Group Annual Conference on Human Computer Interaction","author":"F. L. Heeren","year":"2008"},{"key":"2026040314323279400_ref109","first-page":"903","article-title":"Radio Oranje: Searching the Queen\u2019s speech(es)","volume-title":"Proceedings of the International ACM Special Interest Group on Information Retrieval (SIGIR) Conference on Research and Development in Information Retrieval","author":"F. L. Heeren","year":"2007"},{"key":"2026040314323279400_ref110","first-page":"2121","article-title":"New words: Implications for continuous speech recognition","volume-title":"Proceedings of Euro speech","author":"L. 
Hetherington","year":"1993"},{"key":"2026040314323279400_ref111","volume-title":"PhD thesis","author":"Hiemstra","year":"2001"},{"key":"2026040314323279400_ref112","first-page":"70","article-title":"Studying search and archiving in a real audio database","volume-title":"Working Notes of the AAAI Spring Symposium on Intelligent Integration and Use of Text, Image, Video and Audio Corpora","author":"Hirschberg","year":"1997"},{"key":"2026040314323279400_ref113","first-page":"117","article-title":"Finding information in audio: A new paradigm for audio browsing\/retrieval","volume-title":"Proceedings of the ESCA Workshop: Accessing Information in Spoken Audio","author":"Hirschberg","year":"1999"},{"key":"2026040314323279400_ref114","first-page":"IV\/73","article-title":"Open-vocabulary spoken utterance retrieval using confusion networks","volume-title":"Proceedings of the IEEE International Conference on Acoustics, Speech, and Signal Processing","author":"Hori","year":"2007"},{"key":"2026040314323279400_ref115","first-page":"1\/961","article-title":"Improved spoken document retrieval with dynamic key term lexicon and Probabilistic Latent Semantic Analysis (PLSA)","volume-title":"Proceedings of the IEEE International Conference on Acoustics, Speech, and Signal Processing","author":"Hsieh","year":"2006"},{"key":"2026040314323279400_ref116","first-page":"98","article-title":"Automatic topic segmentation and labeling in multiparty dialogue","volume-title":"IEEE Spoken Language Technology Workshop","author":"Hsueh","year":"2006"},{"key":"2026040314323279400_ref117","volume-title":"Spoken Language Processing: A Guide to Theory, Algorithm and System Development","author":"Huang","year":"2001"},{"key":"2026040314323279400_ref118","first-page":"924","article-title":"The majority wins: A method for combining speaker diarization systems","volume-title":"Proceedings of Interspeech","author":"A. H. 
Huijbregts","year":"2009"},{"key":"2026040314323279400_ref119","first-page":"59","article-title":"Recording, summarizing, and accessing meeting videos: An overview of the AMI project","volume-title":"Proceedings of the IEEE International Conference of Image Analysis and Processing Workshops","author":"Jaimes","year":"2007"},{"key":"2026040314323279400_ref120","first-page":"1\/279","article-title":"A system for unrestricted topic retrieval from radio news broadcasts","volume-title":"Proceedings of the IEEE International Conference on Acoustics, Speech, and Signal Processing","author":"A. James","year":"1996"},{"key":"2026040314323279400_ref121","volume-title":"PhD Thesis","author":"A. James","year":"1995"},{"key":"2026040314323279400_ref122","first-page":"1\/377","article-title":"A fast lattice-based approach to vocabulary independent wordspotting","volume-title":"Proceedings of the IEEE International Conference on Acoustics, Speech, and Signal Processing","author":"A. James","year":"1994"},{"key":"2026040314323279400_ref123","first-page":"1591","article-title":"Joke-o-Mat HD: Browsing sitcoms with human derived transcripts","volume-title":"Proceedings of the ACM International Conference on Multimedia","author":"Janin","year":"2010"},{"key":"2026040314323279400_ref124","volume-title":"Statistical Methods for Speech Recognition (Language, Speech, and Communication)","author":"Jelinek","year":"1998"},{"issue":"4","key":"2026040314323279400_ref125","doi-asserted-by":"crossref","first-page":"455","DOI":"10.1016\/j.specom.2004.12.004","article-title":"Confidence measures for speech recognition: A survey","volume":"45","author":"Jiang","year":"2005","journal-title":"Speech Communication"},{"key":"2026040314323279400_ref126","first-page":"1","article-title":"Automatic title generation for spoken broadcast news","volume-title":"Proceedings of the International Conference on Human Language Technology 
Research","author":"Jin","year":"2001"},{"key":"2026040314323279400_ref127","first-page":"117","article-title":"Spoken document retrieval for TREC-9 at Cambridge University","volume-title":"Proceedings of the Text REtrieval Conference","author":"E. Johnson","year":"2000"},{"key":"2026040314323279400_ref128","first-page":"118","article-title":"Exploring the incorporation of acoustic information into term weights for spoken document retrieval","volume-title":"Proceedings of the BCS Information Retrieval Specialist Group Colloquium on Information Retrieval Research","author":"J. F. Jones","year":"2000"},{"key":"2026040314323279400_ref129","volume-title":"Chapter Affect-Based Indexing for Multimedia Data","author":"J. F. Jones","year":"2012"},{"key":"2026040314323279400_ref130","first-page":"187","volume-title":"Research and Advanced Technology for Digital Libraries","author":"J. F. Jones","year":"2002"},{"key":"2026040314323279400_ref131","first-page":"1\/309","article-title":"Video mail retrieval: The effect of word spotting accuracy on precision","volume-title":"Proceedings of the IEEE International Conference on Acoustics, Speech, and Signal Processing","author":"J. F. Jones","year":"1995"},{"key":"2026040314323279400_ref132","first-page":"30","article-title":"Retrieving spoken documents by combining multiple index sources","volume-title":"Proceedings of the International ACM Special Interest Group on Information Retrieval (SIGIR) Conference on Research and Development in Information Retrieval","author":"J. F. Jones","year":"1996"},{"key":"2026040314323279400_ref133","article-title":"A critical review of state-of-the-art technologies for cross-language speech retrieval","volume-title":"Cross-Language Text and Speech Retrieval Papers from the 1997 AAAI Spring Symposium, Technical Report SS-97-05","author":"J. F. 
Jones","year":"1997"},{"key":"2026040314323279400_ref134","first-page":"553","volume-title":"Comparative Evaluation of Multilingual Information Access Systems","author":"J. F. Jones","year":"2004"},{"key":"2026040314323279400_ref135","article-title":"Examining the contributions of automatic speech transcriptions and metadata sources for searching spontaneous conversational speech","volume-title":"Proceedings of the ACM Special Interest Group on Information Retrieval (SIGIR) Searching Spontaneous Conversational Speech Workshop","author":"J. F. Jones","year":"2007"},{"key":"2026040314323279400_ref136","first-page":"283","article-title":"Improving retrieval on imperfect speech transcriptions (poster abstract)","volume-title":"Proceedings of the International ACM Special Interest Group on Information Retrieval (SIGIR) Conference on Research and Development in Information Retrieval","author":"Jourlin","year":"1999"},{"key":"2026040314323279400_ref137","doi-asserted-by":"crossref","first-page":"21","DOI":"10.1016\/S0167-6393(00)00021-2","article-title":"Spoken document representations for probabilistic retrieval","volume":"32","author":"Jourlin","year":"2000","journal-title":"Speech Communication"},{"key":"2026040314323279400_ref138","volume-title":"Elsevier Encyclopedia of Language and Linguistics","author":"H. 
Juang","year":"2005","edition":"Second Edition"},{"key":"2026040314323279400_ref139","volume-title":"Speech and Language Processing: An Introduction to Natural Language Processing, Computational Linguistics and Speech Recognition","author":"Jurafsky","year":"2008"},{"key":"2026040314323279400_ref140","first-page":"9","article-title":"Social summarization: Does social feedback improve access to speech data?","volume-title":"Proceedings of the ACM Conference on Computer Supported Cooperative Work","author":"Kalnikait\u0117","year":"2008"},{"key":"2026040314323279400_ref141","first-page":"83","article-title":"A critical assessment of spoken utterance retrieval through approximate lattice representations","volume-title":"Proceedings of the ACM International Conference on Multimedia Information Retrieval","author":"Kazemian","year":"2008"},{"key":"2026040314323279400_ref142","doi-asserted-by":"crossref","first-page":"827","DOI":"10.21437\/Eurospeech.1997-281","article-title":"Estimating confidence using word lattices","volume-title":"Proceedings of Eurospeech","author":"Kemp","year":"1997"},{"key":"2026040314323279400_ref143","first-page":"49","article-title":"The ambient spotlight: Queryless desktop search from meeting speech","volume-title":"Proceedings of the ACM Multimedia Searching Spontaneous Conversational Speech Workshop","author":"Kilgour","year":"2010"},{"key":"2026040314323279400_ref144","first-page":"173","article-title":"SpeechFind: Advances in Rich Content Based Spoken Document Retrieval","volume-title":"Information Science Reference","author":"Kim","year":"2009"},{"key":"2026040314323279400_ref145","first-page":"212","article-title":"Speaker segmentation for browsing recorded audio","volume-title":"Conference Companion on Human Factors in Computing Systems","author":"G. 
Kimber","year":"1995"},{"issue":"1","key":"2026040314323279400_ref146","doi-asserted-by":"crossref","first-page":"12","DOI":"10.1016\/j.specom.2009.08.009","article-title":"An overview of text-independent speaker recognition: From features to supervectors","volume":"52","author":"Kinnunen","year":"2010","journal-title":"Speech Communication"},{"issue":"1-2","key":"2026040314323279400_ref147","doi-asserted-by":"crossref","first-page":"21","DOI":"10.1016\/S0167-6393(00)00093-5","article-title":"Multilingual phone models for vocabulary-independent speech recognition tasks","volume":"35","author":"K\u00f6hler","year":"2001","journal-title":"Speech Communication"},{"issue":"5","key":"2026040314323279400_ref148","doi-asserted-by":"crossref","first-page":"61","DOI":"10.1109\/MSP.2005.1511824","article-title":"Content-based access to spoken audio","volume":"22","author":"Koumpis","year":"2005","journal-title":"IEEE Signal Processing Magazine"},{"issue":"1","key":"2026040314323279400_ref149","doi-asserted-by":"crossref","first-page":"1","DOI":"10.1145\/1075389.1075390","article-title":"Automatic summarization of voicemail messages using lexical and prosodic features","volume":"2","author":"Koumpis","year":"2005","journal-title":"ACM Transactions on Speech and Language Processing"},{"issue":"2","key":"2026040314323279400_ref150","article-title":"Rough\u2019n\u2019Ready: A meeting recorder and browser","volume":"1","author":"Kubala","year":"1999","journal-title":"ACM Computing Surveys"},{"key":"2026040314323279400_ref151","first-page":"350","article-title":"Speech-based retrieval using semantic co-occurrence filtering","volume-title":"Proceedings of the International Conference on Human Language Technology Research","author":"Kupiec","year":"1994"},{"issue":"1-2","key":"2026040314323279400_ref152","doi-asserted-by":"crossref","first-page":"29","DOI":"10.1016\/S0167-6393(01)00042-5","article-title":"Thematic indexing of spoken documents by using self-organizing 
maps","volume":"38","author":"Kurimo","year":"2002","journal-title":"Speech Communication"},{"key":"2026040314323279400_ref153","first-page":"1585","article-title":"An evaluation of a spoken document retrieval baseline system in Finnish","volume-title":"Proceedings of Interspeech","author":"Kurimo","year":"2004"},{"key":"2026040314323279400_ref154","doi-asserted-by":"crossref","first-page":"229","DOI":"10.1007\/11880561_19","article-title":"Using string comparison in context for improved relevance feedback in different text media","volume-title":"Proceedings of the String Processing and Information Retrieval Conference","author":"M. Lam-Adesina","year":"2006"},{"key":"2026040314323279400_ref155","first-page":"1217","article-title":"Using syllable-based indexing features and language models to improve German spoken document retrieval","volume-title":"Proceedings of Interspeech","author":"Larson","year":"2003"},{"key":"2026040314323279400_ref156","article-title":"Overview of MediaEval 2011 rich speech retrieval task and genre tagging task","volume-title":"Working Notes Proceedings of the MediaEval Workshop","author":"Larson","year":"2011"},{"key":"2026040314323279400_ref157","article-title":"Structured audio player: Supporting radio archive workflows with automatically generated structure metadata","volume-title":"Proceedings of the RIAO Conference on Large-scale Semantic Access to Content (Text, Image, Video and Sound)","author":"Larson","year":"2007"},{"key":"2026040314323279400_ref158","doi-asserted-by":"crossref","first-page":"906","DOI":"10.1007\/978-3-642-04447-2_119","volume-title":"Proceedings of the Cross-language Evaluation Forum Conference on Evaluating Systems for Multilingual and Multimodal Information Access","author":"Larson","year":"2009"},{"key":"2026040314323279400_ref159","first-page":"354","volume-title":"Multilingual Information Access Evaluation II. 
Multimedia Experiments","author":"Larson","year":"2010"},{"key":"2026040314323279400_ref160","doi-asserted-by":"crossref","DOI":"10.1109\/MMUL.2012.27","article-title":"The community and the crowd: Developing large-scale data collections for multimedia benchmarking","author":"Larson","year":"2012","journal-title":"IEEE Multimedia"},{"key":"2026040314323279400_ref161","doi-asserted-by":"crossref","first-page":"1","DOI":"10.1145\/1991996.1992047","article-title":"Automatic tagging and geotagging in video collections and communities","volume-title":"Proceedings of the 1st ACM International Conference on Multimedia Retrieval","author":"Larson","year":"2011"},{"key":"2026040314323279400_ref162","first-page":"755","volume-title":"Advances in Information Retrieval. Proceedings of the European Conference on IR Research","author":"Larson","year":"2009"},{"key":"2026040314323279400_ref163","doi-asserted-by":"crossref","DOI":"10.1017\/CBO9781139166621","volume-title":"Principles of Phonetics (Cambridge Textbooks in Linguistics)","author":"Laver","year":"1994"},{"key":"2026040314323279400_ref164","first-page":"120","article-title":"Relevance based language models","volume-title":"Proceedings of the International ACM Special Interest Group on Information Retrieval (SIGIR) Conference on Research and Development in Information Retrieval","author":"Lavrenko","year":"2001"},{"key":"2026040314323279400_ref165","article-title":"A Korean spoken document retrieval system for lecture search","volume-title":"Proceedings of the ACM Special Interest Group on Information Retrieval (SIGIR) Searching Spontaneous Conversational Speech Workshop","author":"Lee","year":"2008"},{"issue":"5","key":"2026040314323279400_ref166","doi-asserted-by":"crossref","first-page":"42","DOI":"10.1109\/MSP.2005.1511823","article-title":"Spoken document understanding and organization","volume":"22","author":"Lee","year":"2005","journal-title":"IEEE Signal Processing 
Magazine"},{"key":"2026040314323279400_ref167","first-page":"505","article-title":"Combining multiple subword representations for open-vocabulary spoken document retrieval","volume-title":"Proceedings of the IEEE International Conference on Acoustics, Speech, and Signal Processing","author":"Lee","year":"2005"},{"key":"2026040314323279400_ref168","first-page":"673","article-title":"One-sided measures for evaluating ranked retrieval effectiveness with spontaneous conversational speech","volume-title":"Proceedings of the International ACM Special Interest Group on Information Retrieval (SIGIR) Conference on Research and Development in Information Retrieval","author":"Liu","year":"2006"},{"issue":"5","key":"2026040314323279400_ref169","doi-asserted-by":"crossref","first-page":"1526","DOI":"10.1109\/TASL.2006.878255","article-title":"Enriching speech recognition with automatic detection of sentence boundaries and disfluencies","volume":"14","author":"Liu","year":"2006","journal-title":"IEEE Transactions on Audio, Speech, and Language Processing"},{"issue":"1","key":"2026040314323279400_ref170","doi-asserted-by":"crossref","first-page":"1","DOI":"10.1145\/964161.964162","article-title":"Cross-language spoken document retrieval using HMM-based retrieval model with multi-scale fusion","volume":"2","author":"Lo","year":"2003","journal-title":"ACM Transactions on Asian Language Information Processing"},{"key":"2026040314323279400_ref171","first-page":"431","article-title":"IFINDER: An MPEG-7-based retrieval system for distributed multimedia content","volume-title":"Proceedings of the ACM International Conference on Multimedia","author":"L\u00f6ffler","year":"2002"},{"key":"2026040314323279400_ref172","doi-asserted-by":"crossref","first-page":"31","DOI":"10.3115\/1289189.1289250","article-title":"Word and sub-word indexing approaches for reducing the effects of OOV queries on spoken audio","volume-title":"Proceedings of the International Conference on Human Language 
Technology Research","author":"Logan","year":"2002"},{"key":"2026040314323279400_ref173","first-page":"1997","article-title":"Confusion-based query expansion for OOV words in spoken document retrieval","volume-title":"Proceedings of Interspeech","author":"Logan","year":"2002"},{"issue":"5","key":"2026040314323279400_ref174","doi-asserted-by":"crossref","first-page":"899","DOI":"10.1109\/TMM.2005.854429","article-title":"Approaches to reduce the effects of OOV queries on indexed spoken audio","volume":"7","author":"Logan","year":"2005","journal-title":"IEEE Transactions on Multimedia"},{"key":"2026040314323279400_ref175","first-page":"25","article-title":"Minimum cut model for spoken lecture segmentation","volume-title":"Proceedings of the International Conference on Computational Linguistics and the Annual Meeting of the Association for Computational Linguistics","author":"Malioutov","year":"2006"},{"key":"2026040314323279400_ref176","first-page":"51","article-title":"Spoken document retrieval from call-center conversations","volume-title":"Proceedings of the International ACM Special Interest Group on Information Retrieval (SIGIR) Conference on Research and Development in Information Retrieval","author":"Mamou","year":"2006"},{"issue":"4","key":"2026040314323279400_ref177","doi-asserted-by":"crossref","first-page":"373","DOI":"10.1006\/csla.2000.0152","article-title":"Finding consensus among words: Lattice-based word error minimisation","volume":"14","author":"Mangu","year":"2000","journal-title":"Computer Speech and Language"},{"key":"2026040314323279400_ref178","doi-asserted-by":"crossref","DOI":"10.1017\/CBO9780511809071","volume-title":"Introduction to Information Retrieval","author":"D. 
Manning","year":"2008"},{"key":"2026040314323279400_ref179","article-title":"Automatic detection of well recognized words in automatic speech transcription","volume-title":"Proceedings of the International Conference on Language Resources and Evaluation","author":"Mauclair","year":"2006"},{"key":"2026040314323279400_ref180","volume-title":"Intelligent Multimedia Information Retrieval","author":"T. Maybury","year":"1997"},{"key":"2026040314323279400_ref181","first-page":"1\/385","article-title":"Approaches to topic identification on the switchboard corpus","volume-title":"Proceedings of the IEEE International Conference on Acoustics, Speech, and Signal Processing","author":"McDonough","year":"1994"},{"key":"2026040314323279400_ref182","first-page":"4893","article-title":"Improved lattice-based spoken document retrieval by directly learning from the evaluation measures","volume-title":"Proceedings of the IEEE International Conference on Acoustics, Speech, and Signal Processing","author":"Meng","year":"2009"},{"issue":"2","key":"2026040314323279400_ref183","doi-asserted-by":"crossref","first-page":"163","DOI":"10.1016\/j.csl.2003.09.003","article-title":"Mandarin-English Information (MEI): Investigating translingual speech retrieval","volume":"18","author":"Meng","year":"2004","journal-title":"Computer Speech and Language"},{"key":"2026040314323279400_ref184","first-page":"4885","article-title":"Efficient subword lattice retrieval for German spoken term detection","volume-title":"Proceedings of the IEEE International Conference on Acoustics, Speech, and Signal Processing","author":"Mertens","year":"2009"},{"key":"2026040314323279400_ref185","first-page":"2127","article-title":"Merging search spaces for spoken term detection","volume-title":"Proceedings of Interspeech","author":"Mertens","year":"2009"},{"key":"2026040314323279400_ref186","first-page":"472","article-title":"A Markov random field model for term dependencies","volume-title":"Proceedings of the 
International ACM Special Interest Group on Information Retrieval (SIGIR) Conference on Research and Development in Information Retrieval","author":"Metzler","year":"2005"},{"key":"2026040314323279400_ref187","doi-asserted-by":"crossref","first-page":"502","DOI":"10.1007\/978-3-540-31865-1_36","volume-title":"Advances in Information Retrieval","author":"Mishne","year":"2005"},{"key":"2026040314323279400_ref188","first-page":"297","article-title":"A similar content retrieval method for podcast episodes","volume-title":"IEEE Spoken Language Technology Workshop","author":"Mizuno","year":"2009"},{"key":"2026040314323279400_ref189","first-page":"IV\/93","article-title":"Castsearch \u2014 context based spoken document retrieval","volume-title":"Proceedings of the IEEE International Conference on Acoustics, Speech, and Signal Processing","author":"L. Molgaard","year":"2007"},{"key":"2026040314323279400_ref190","first-page":"1582","article-title":"Infolink: Analysis of Dutch broadcast news and cross-media browsing","volume-title":"IEEE International Conference on Multimedia and Expo","author":"Morang","year":"2005"},{"key":"2026040314323279400_ref191","first-page":"641","article-title":"Comparison of different phone-based spoken document retrieval methods with text and spoken queries","volume-title":"Proceedings of Interspeech","author":"Moreau","year":"2005"},{"key":"2026040314323279400_ref192","first-page":"1593","article-title":"Phonetic confusion based document expansion for spoken document retrieval","volume-title":"Proceedings of Interspeech","author":"Moreau","year":"2004"},{"issue":"4","key":"2026040314323279400_ref193","doi-asserted-by":"crossref","first-page":"58","DOI":"10.1109\/MC.2002.993772","article-title":"From multimedia retrieval to knowledge management","volume":"35","author":"J. 
Moreno","year":"2002","journal-title":"Computer"},{"key":"2026040314323279400_ref194","first-page":"493","article-title":"The effect of speech recognition accuracy rates on the usefulness and usability of webcast archives","volume-title":"Proceedings of the Special Interest Group on Computer-Human Interaction (SIGCHI) Conference on Human Factors in Computing Systems","author":"Munteanu","year":"2006"},{"issue":"1-2","key":"2026040314323279400_ref195","doi-asserted-by":"crossref","first-page":"61","DOI":"10.1016\/S0167-6393(00)00024-8","article-title":"Experiments in spoken document retrieval using phoneme n-grams","volume":"32","author":"Ng","year":"2000","journal-title":"Speech Communication"},{"key":"2026040314323279400_ref196","doi-asserted-by":"crossref","first-page":"1607","DOI":"10.21437\/Eurospeech.1997-460","article-title":"Subword unit representations for spoken document retrieval","volume-title":"Proceedings of Eurospeech","author":"Ng","year":"1997"},{"issue":"3","key":"2026040314323279400_ref197","doi-asserted-by":"crossref","first-page":"157","DOI":"10.1016\/S0167-6393(00)00008-X","article-title":"Subword-based approaches for spoken document retrieval","volume":"32","author":"Ng","year":"2000","journal-title":"Speech Communication"},{"issue":"6","key":"2026040314323279400_ref198","doi-asserted-by":"crossref","first-page":"453","DOI":"10.1016\/j.specom.2007.04.001","article-title":"Language-dependent state clustering for multilingual acoustic modelling","volume":"49","author":"Niesler","year":"2007","journal-title":"Speech Communication"},{"key":"2026040314323279400_ref199","volume-title":"The Spoken Term Detection (STD) 2006 Evaluation Plan","author":"","year":"2006"},{"key":"2026040314323279400_ref200","doi-asserted-by":"crossref","first-page":"485","DOI":"10.1007\/11846406_61","volume-title":"Text, Speech and 
Dialogue","author":"Nouza","year":"2006"},{"key":"2026040314323279400_ref201","doi-asserted-by":"crossref","first-page":"1355","DOI":"10.21437\/Eurospeech.1995-356","article-title":"The application of dynamic programming techniques to non-word based topic spotting","volume-title":"Proceedings of Eurospeech","author":"Nowell","year":"1995"},{"key":"2026040314323279400_ref202","article-title":"Speech-based information retrieval for digital libraries","volume-title":"Technical Report CS-TR-3778","author":"W. Oard","year":"1997"},{"issue":"5","key":"2026040314323279400_ref203","doi-asserted-by":"crossref","DOI":"10.1002\/bult.171","article-title":"User interface design for speech-based retrieval","volume":"26","author":"W. Oard","year":"2000","journal-title":"Bulletin of the American Society for Information Science and Technology"},{"key":"2026040314323279400_ref204","first-page":"41","article-title":"Building an information retrieval test collection for spontaneous conversational speech","volume-title":"Proceedings of the International ACM Special Interest Group on Information Retrieval (SIGIR) Conference on Research and Development in Information Retrieval","author":"W. Oard","year":"2004"},{"key":"2026040314323279400_ref205","doi-asserted-by":"crossref","first-page":"744","DOI":"10.1007\/978-3-540-74999-8_94","volume-title":"Evaluation of Multilingual and Multi-modal Information Retrieval","author":"W. Oard","year":"2007"},{"key":"2026040314323279400_ref206","doi-asserted-by":"crossref","DOI":"10.1109\/ISCAS.2006.1693029","article-title":"Fischlar-TRECVid-2004: Combined text- and image-based searching of video archives","volume-title":"Proceedings of the IEEE International Symposium on Circuits and Systems","author":"A. 
O\u2019Connor","year":"2006"},{"key":"2026040314323279400_ref207","first-page":"2617","article-title":"Automatic transcription for a Web 2.0 service to search podcasts","volume-title":"Proceedings of Interspeech","author":"Ogata","year":"2007"},{"key":"2026040314323279400_ref208","first-page":"2187","article-title":"Vocabulary independent discriminative term frequency estimation","volume-title":"Proceedings of Interspeech","author":"S. Olsson","year":"2008"},{"key":"2026040314323279400_ref209","first-page":"91","article-title":"Combining LVCSR and vocabulary-independent ranked utterance retrieval for robust speech search","volume-title":"Proceedings of the International ACM Special Interest Group on Information Retrieval (SIGIR) Conference on Research and Development in Information Retrieval","author":"S. Olsson","year":"2009"},{"key":"2026040314323279400_ref210","first-page":"182","article-title":"Phrase-based query degradation modeling for vocabulary-independent ranked utterance retrieval","volume-title":"Proceedings of Human Language Technologies Conference of the North American Chapter of the Association for Computational Linguistics","author":"S. Olsson","year":"2009"},{"issue":"6","key":"2026040314323279400_ref211","article-title":"Towards affordable disclosure of spoken heritage archives","volume":"10","author":"J. F. Ordelman","year":"2009","journal-title":"Journal of Digital Information, Special Issue on Information Access to Cultural Heritage"},{"key":"2026040314323279400_ref212","doi-asserted-by":"crossref","first-page":"258","DOI":"10.1007\/3-540-44805-5_34","article-title":"Speech recognition issues for Dutch spoken document retrieval","volume-title":"Proceedings of the International Conference on Text, Speech and Dialogue","author":"J. F. 
Ordelman","year":"2001"},{"key":"2026040314323279400_ref213","doi-asserted-by":"crossref","first-page":"373","DOI":"10.1007\/3-540-45681-3_31","volume-title":"Principles of Data Mining and Knowledge Discovery","author":"Paa\u00df","year":"2002"},{"issue":"2","key":"2026040314323279400_ref214","doi-asserted-by":"crossref","first-page":"75","DOI":"10.1145\/328236.328151","article-title":"Measurements in support of research accomplishments","volume":"43","author":"S. Pallett","year":"2000","journal-title":"Communications of the ACM"},{"issue":"6","key":"2026040314323279400_ref215","doi-asserted-by":"crossref","first-page":"1562","DOI":"10.1109\/TASL.2009.2037404","article-title":"Performance analysis for lattice-based speech indexing approaches using words and subword units","volume":"18","author":"Pan","year":"2010","journal-title":"IEEE Transactions on Speech and Audio Processing"},{"key":"2026040314323279400_ref216","first-page":"99","article-title":"Cross-language speech retrieval: Establishing a baseline performance","volume-title":"Proceedings of the International ACM Special Interest Group on Information Retrieval (SIGIR) Conference on Research and Development in Information Retrieval","author":"P\u00e1raic","year":"1997"},{"key":"2026040314323279400_ref217","doi-asserted-by":"crossref","first-page":"674","DOI":"10.1007\/978-3-540-85760-0_86","volume-title":"Advances in Multilingual and Multimodal Information Retrieval","author":"Pe\u010dina","year":"2008"},{"key":"2026040314323279400_ref218","first-page":"275","article-title":"A language modeling approach to information retrieval","volume-title":"Proceedings of the International ACM Special Interest Group on Information Retrieval (SIGIR) Conference on Research and Development in Information Retrieval","author":"M. 
Ponte","year":"1998"},{"key":"2026040314323279400_ref219","first-page":"1973","article-title":"Non-speech audio event detection","volume-title":"Proceedings of the IEEE International Conference on Acoustics, Speech, and Signal Processing","author":"Portelo","year":"2009"},{"issue":"6","key":"2026040314323279400_ref220","doi-asserted-by":"crossref","first-page":"603","DOI":"10.1109\/TSA.2003.818026","article-title":"Robust recognition of children\u2019s speech","volume":"11","author":"Potamianos","year":"2003","journal-title":"IEEE Transactions on Speech and Audio Processing"},{"key":"2026040314323279400_ref221","volume-title":"Fundamentals of Speech Recognition","author":"Rabiner","year":"1993"},{"issue":"2","key":"2026040314323279400_ref222","doi-asserted-by":"crossref","first-page":"257","DOI":"10.1109\/5.18626","article-title":"A tutorial on hidden Markov models and selected applications in speech recognition","volume":"77","author":"R. Rabiner","year":"1989","journal-title":"Proceedings of the IEEE"},{"issue":"4","key":"2026040314323279400_ref223","doi-asserted-by":"crossref","first-page":"501","DOI":"10.1109\/PROC.1976.10158","article-title":"Speech recognition by machine: A review","volume":"64","author":"R. Reddy","year":"1976","journal-title":"Proceedings of the IEEE"},{"key":"2026040314323279400_ref224","first-page":"301","article-title":"The ALERT system: Advanced broadcast speech recognition technology for selective dissemination of multimedia information","volume-title":"Proceedings of the IEEE Workshop on Automatic Speech Recognition and Understanding","author":"Rigoll","year":"2001"},{"issue":"4","key":"2026040314323279400_ref225","doi-asserted-by":"crossref","first-page":"359","DOI":"10.1108\/eb026866","article-title":"On term selection for query expansion","volume":"46","author":"E. 
Robertson","year":"1990","journal-title":"Journal of Documentation"},{"issue":"3","key":"2026040314323279400_ref226","doi-asserted-by":"crossref","first-page":"129","DOI":"10.1002\/asi.4630270302","article-title":"Relevance weighting of search terms","volume":"27","author":"E. Robertson","year":"1976","journal-title":"Journal of the American Society for Information Science"},{"key":"2026040314323279400_ref227","first-page":"232","article-title":"Some simple effective approximations to the 2-Poisson model for probabilistic weighted retrieval","volume-title":"Proceedings of the International ACM Special Interest Group on Information Retrieval (SIGIR) Conference on Research and Development in Information Retrieval","author":"E. Robertson","year":"1994"},{"key":"2026040314323279400_ref228","first-page":"109","article-title":"Okapi at TREC-3","volume-title":"Proceedings of the Text REtrieval Conference","author":"E. Robertson","year":"1996"},{"key":"2026040314323279400_ref229","first-page":"42","article-title":"Simple BM25 extension to multiple weighted fields","volume-title":"Proceedings of the International Conference on Information and Knowledge Management","author":"E. Robertson","year":"2004"},{"issue":"1","key":"2026040314323279400_ref230","first-page":"45","article-title":"Techniques for information retrieval from speech messages","volume":"4","author":"C. Rose","year":"1991","journal-title":"Lincoln Laboratory Journal"},{"key":"2026040314323279400_ref231","first-page":"1\/317","article-title":"Techniques for information retrieval from voice messages","volume-title":"Proceedings of the IEEE International Conference on Acoustics, Speech, and Signal Processing","author":"C. Rose","year":"1991"},{"key":"2026040314323279400_ref232","first-page":"1\/129","article-title":"A hidden Markov model based keyword recognition system","volume-title":"Proceedings of the IEEE International Conference on Acoustics, Speech, and Signal Processing","author":"C. 
Rose","year":"1990"},{"key":"2026040314323279400_ref233","first-page":"647","article-title":"The LIMSI QAst systems: Comparison between human and automatic rules generation for question-answering on speech transcriptions","volume-title":"Proceedings of the IEEE Workshop on Automatic Speech Recognition and Understanding","author":"Rosset","year":"2007"},{"key":"2026040314323279400_ref234","first-page":"105","article-title":"Automatically extracting highlights for TV baseball programs","volume-title":"Proceedings of the ACM International Conference on Multimedia","author":"Rui","year":"2000"},{"issue":"5","key":"2026040314323279400_ref235","doi-asserted-by":"crossref","first-page":"513","DOI":"10.1016\/0306-4573(88)90021-0","article-title":"Term-weighting approaches in automatic text retrieval","volume":"24","author":"Salton","year":"1988","journal-title":"Information Processing and Management"},{"key":"2026040314323279400_ref236","doi-asserted-by":"crossref","first-page":"397","DOI":"10.1007\/3-540-49653-X_24","article-title":"Mixing and merging for spoken document retrieval","volume-title":"Proceedings of the European Conference on Research and Advanced Technology for Digital Libraries","author":"Sanderson","year":"1998"},{"key":"2026040314323279400_ref237","first-page":"505","volume-title":"Advances in Information Retrieval. 
Proceedings of the European Conference on IR Research","author":"Sanderson","year":"2007"},{"key":"2026040314323279400_ref238","first-page":"129","article-title":"Lattice-based search for spoken utterance retrieval","volume-title":"Proceedings of the Human Language Technology Conference of the North American Chapter of the Association for Computational Linguistics","author":"Saraclar","year":"2004"},{"key":"2026040314323279400_ref239","first-page":"11\/875","article-title":"Confidence measures for spontaneous speech recognition","volume-title":"Proceedings of the IEEE International Conference on Acoustics, Speech, and Signal Processing","author":"Schaaf","year":"1997"},{"key":"2026040314323279400_ref240","doi-asserted-by":"crossref","first-page":"347","DOI":"10.3115\/1075812.1075897","article-title":"Assessing the retrieval effectiveness of a speech retrieval system by simulating recognition errors","volume-title":"Proceedings of the Workshop on Human Language Technology","author":"Sch\u00e4uble","year":"1994"},{"key":"2026040314323279400_ref241","first-page":"59","article-title":"First experiences with a system for content based retrieval of information from speech recordings","volume-title":"Proceedings of the IJCAI Workshop on Intelligent Multimedia Information Retrieval","author":"Sch\u00e4uble","year":"1995"},{"key":"2026040314323279400_ref242","first-page":"393","article-title":"The intelligent ear: A graphical interface to digital audio","volume-title":"Proceedings of the International Conference on Cybernetics and Society","author":"Schmandt","year":"1981"},{"key":"2026040314323279400_ref243","volume-title":"PhD thesis","author":"Schneider","year":"2011"},{"key":"2026040314323279400_ref244","first-page":"1\/577","article-title":"Speech emotion recognition combining acoustic features and linguistic information in a hybrid support vector machine-belief network architecture","volume-title":"Proceedings of the IEEE International Conference on Acoustics, Speech, 
and Signal Processing","author":"Schuller","year":"2004"},{"key":"2026040314323279400_ref245","first-page":"319","article-title":"Experiments in spoken document retrieval at CMU","volume-title":"Proceedings of the Text Retrieval Conference","author":"Siegler","year":"1998"},{"key":"2026040314323279400_ref246","first-page":"1\/505","article-title":"Improving the suitability of imperfect transcriptions for information retrieval from spoken documents","volume-title":"Proceedings of the IEEE International Conference on Acoustics, Speech, and Signal Processing","author":"Siegler","year":"1999"},{"key":"2026040314323279400_ref247","volume-title":"PhD thesis","author":"A. Siegler","year":"1999"},{"key":"2026040314323279400_ref248","first-page":"46","article-title":"Integration of metadata in spoken document search using position specific posterior lattices","volume-title":"Proceedings of the IEEE Spoken Language Technology Workshop","author":"Silva","year":"2006"},{"key":"2026040314323279400_ref249","first-page":"21","article-title":"Pivoted document length normalization","volume-title":"Proceedings of the International ACM Special Interest Group on Information Retrieval (SIGIR) Conference on Research and Development in Information Retrieval","author":"Singhal","year":"1996"},{"key":"2026040314323279400_ref250","first-page":"239","article-title":"AT&T at TREC-7","volume-title":"Proceedings of the Text REtrieval Conference","author":"Singhal","year":"1999"},{"key":"2026040314323279400_ref251","first-page":"34","article-title":"Document expansion for speech retrieval","volume-title":"Proceedings of the International ACM Special Interest Group on Information Retrieval (SIGIR) Conference on Research and Development in Information Retrieval","author":"Singhal","year":"1999"},{"key":"2026040314323279400_ref252","first-page":"53","article-title":"Fast vocabulary-independent audio search using path-based graph indexing","volume-title":"Proceedings of 
Interspeech","author":"Siohan","year":"2005"},{"key":"2026040314323279400_ref253","doi-asserted-by":"crossref","first-page":"305","DOI":"10.1145\/276675.276747","article-title":"A graphical interface for speech-based retrieval","volume-title":"Proceedings of the ACM Conference on Digital Libraries","author":"Slaughter","year":"1998"},{"key":"2026040314323279400_ref254","doi-asserted-by":"crossref","first-page":"429","DOI":"10.1007\/3-540-49653-X_26","volume-title":"Research and Advanced Technology for Digital Libraries","author":"F. Smeaton","year":"1998"},{"key":"2026040314323279400_ref255","first-page":"321","article-title":"Evaluation campaigns and TRECVid","volume-title":"Proceedings of the ACM International Workshop on Multimedia Information Retrieval","author":"F. Smeaton","year":"2006"},{"issue":"4","key":"2026040314323279400_ref256","doi-asserted-by":"crossref","first-page":"399","DOI":"10.1016\/0306-4573(95)00077-1","article-title":"Experiments in spoken document retrieval","volume":"32","author":"Sp\u00e4rck Jones","year":"1996","journal-title":"Information Processing and Management"},{"key":"2026040314323279400_ref257","first-page":"81","article-title":"Phonetic confusion matrix based spoken document retrieval","volume-title":"Proceedings of the International ACM Special Interest Group on Information Retrieval (SIGIR) Conference on Research and Development in Information Retrieval","author":"Srinivasan","year":"2000"},{"key":"2026040314323279400_ref258","first-page":"1069","article-title":"ASR satisficing: The effects of ASR accuracy on speech retrieval","volume-title":"Proceedings of Interspeech","author":"A. 
Stark","year":"2000"},{"issue":"3","key":"2026040314323279400_ref259","doi-asserted-by":"crossref","first-page":"339","DOI":"10.1162\/089120100561737","article-title":"Dialogue act modeling for automatic tagging and recognition of conversational speech","volume":"26","author":"Stolcke","year":"1999","journal-title":"Computational Linguistics"},{"key":"2026040314323279400_ref260","first-page":"61","article-title":"Combining words and speech prosody for automatic topic segmentation","volume-title":"Proceedings of DARPA Broadcast News Transcription and Understanding Workshop","author":"Stolcke","year":"1999"},{"issue":"2-4","key":"2026040314323279400_ref261","doi-asserted-by":"crossref","first-page":"225","DOI":"10.1016\/S0167-6393(99)00038-2","article-title":"Modeling pronunciation variation for ASR: A survey of the literature","volume":"29","author":"Strik","year":"1999","journal-title":"Speech Communication"},{"issue":"3","key":"2026040314323279400_ref262","doi-asserted-by":"crossref","DOI":"10.1145\/2328967.2328971","article-title":"Comparison of methods for language-dependent and language-independent Query-by-Example spoken term detection","volume":"30","author":"Tejedor","year":"2012","journal-title":"ACM Transactions on Information Systems"},{"issue":"11-12","key":"2026040314323279400_ref263","doi-asserted-by":"crossref","first-page":"980","DOI":"10.1016\/j.specom.2008.03.005","article-title":"A comparison of grapheme and phoneme-based units for Spanish spoken term detection","volume":"50","author":"Tejedor","year":"2008","journal-title":"Speech Communication"},{"issue":"1","key":"2026040314323279400_ref264","doi-asserted-by":"crossref","first-page":"346","DOI":"10.1109\/TASL.2006.872615","article-title":"Rapid yet accurate speech indexing using dynamic match lattice spotting","volume":"15","author":"Thambiratnam","year":"2007","journal-title":"IEEE Transactions on Audio, Speech, and Language 
Processing"},{"key":"2026040314323279400_ref265","article-title":"A study of users\u2019 perception of relevance of spoken documents","volume-title":"Technical Report TR-99-013","author":"Tombros","year":"1999"},{"issue":"5","key":"2026040314323279400_ref266","doi-asserted-by":"crossref","first-page":"1557","DOI":"10.1109\/TASL.2006.878256","article-title":"An overview of automatic speaker diarization systems","volume":"14","author":"E. Tranter","year":"2006","journal-title":"IEEE Transactions on Audio, Speech, and Language Processing"},{"key":"2026040314323279400_ref267","first-page":"230","article-title":"Automatic genre identification for content-based video categorization","volume-title":"Proceedings of the International Conference on Pattern Recognition","author":"T. Truong","year":"2000"},{"key":"2026040314323279400_ref268","first-page":"773","article-title":"Term clouds as surrogates for user generated speech","volume-title":"Proceedings of the International ACM Special Interest Group on Information Retrieval (SIGIR) Conference on Research and Development in Information Retrieval","author":"Tsagkias","year":"2008"},{"key":"2026040314323279400_ref269","doi-asserted-by":"crossref","first-page":"226","DOI":"10.1007\/978-3-540-85853-9_21","volume-title":"Machine Learning for Multimodal Interaction","author":"Tucker","year":"2008"},{"issue":"4","key":"2026040314323279400_ref270","doi-asserted-by":"crossref","DOI":"10.1109\/TASL.2008.916527","article-title":"Temporal compression of speech: An evaluation","volume":"16","author":"Tucker","year":"2008","journal-title":"IEEE Transactions on Audio, Speech, and Language Processing"},{"key":"2026040314323279400_ref271","first-page":"631","article-title":"Indexing confusion networks for morph-based spoken document retrieval","volume-title":"Proceedings of the International ACM Special Interest Group on Information Retrieval (SIGIR) Conference on Research and Development in Information Retrieval","author":"T. 
Turunen","year":"2007"},{"key":"2026040314323279400_ref272","first-page":"45","article-title":"Data-oriented methods for grapheme-to-phoneme conversion","volume-title":"Proceedings of the Conference of the European Chapter of the Association for Computational Linguistics","author":"van den Bosch","year":"1993"},{"key":"2026040314323279400_ref273","volume-title":"Information Retrieval","author":"J. van Rijsbergen","year":"1979"},{"issue":"6","key":"2026040314323279400_ref274","doi-asserted-by":"crossref","first-page":"1215","DOI":"10.1109\/TMM.2007.902882","article-title":"Speakers role recognition in multiparty audio recordings using social network analysis and duration distribution modeling","volume":"9","author":"Vinciarelli","year":"2007","journal-title":"IEEE Transactions on Multimedia"},{"key":"2026040314323279400_ref275","first-page":"567","article-title":"Retrieval from spoken documents using content and speaker information","volume-title":"Proceedings of the International Conference on Document Analysis and Recognition","author":"Viswanathan","year":"1999"},{"issue":"3","key":"2026040314323279400_ref276","doi-asserted-by":"crossref","first-page":"151","DOI":"10.1023\/A:1009980820262","article-title":"Fusion via a linear combination of scores","volume":"1","author":"C. Vogt","year":"1999","journal-title":"Information Retrieval"},{"key":"2026040314323279400_ref277","volume-title":"TREC: Experiment and Evaluation in Information Retrieval","author":"M. Voorhees","year":"2005"},{"issue":"2","key":"2026040314323279400_ref278","doi-asserted-by":"crossref","first-page":"42","DOI":"10.1145\/328236.328144","article-title":"Complementary video and audio analysis for broadcast news archives","volume":"43","author":"D. 
Wactlar","year":"2000","journal-title":"Communications of the ACM"},{"key":"2026040314323279400_ref279","volume-title":"Readings in Speech Recognition","author":"Waibel","year":"1990"},{"key":"2026040314323279400_ref280","volume-title":"PhD thesis","author":"Wang","year":"2009"},{"issue":"3","key":"2026040314323279400_ref281","doi-asserted-by":"crossref","DOI":"10.1145\/2328967.2328969","article-title":"Direct posterior confidence estimation for out-of-vocabulary spoken term detection","volume":"30","author":"Wang","year":"2012","journal-title":"ACM Transactions on Information Systems"},{"issue":"1-2","key":"2026040314323279400_ref282","doi-asserted-by":"crossref","first-page":"49","DOI":"10.1016\/S0167-6393(00)00023-6","article-title":"Experiments in syllable-based retrieval of broadcast news speech in Mandarin Chinese","volume":"32","author":"Wang","year":"2000","journal-title":"Speech Communication"},{"issue":"6-7","key":"2026040314323279400_ref283","doi-asserted-by":"crossref","first-page":"615","DOI":"10.1016\/S0167-8655(00)00026-X","article-title":"Mandarin spoken document retrieval based on syllable lattice matching","volume":"21","author":"Wang","year":"2000","journal-title":"Pattern Recognition Letters"},{"key":"2026040314323279400_ref284","first-page":"577","article-title":"Is word error rate a good indicator for spoken language understanding accuracy","volume-title":"Proceedings of the IEEE Workshop on Automatic Speech Recognition and Understanding","author":"Wang","year":"2003"},{"issue":"3","key":"2026040314323279400_ref285","doi-asserted-by":"crossref","first-page":"28","DOI":"10.1109\/MSP.2008.918411","article-title":"An introduction to voice search","volume":"25","author":"Wang","year":"2008","journal-title":"IEEE Signal Processing Magazine"},{"key":"2026040314323279400_ref286","first-page":"287","article-title":"Topic spotting using subword units","volume-title":"9. 
Aachener Kolloquium \u201cSignaltheorie\u201d Bild- und Sprachsignale","author":"Warnke","year":"1997"},{"key":"2026040314323279400_ref287","article-title":"Multilingual topic detection and tracking: Successful research enabled by corpora and evaluation","volume-title":"Proceedings of the International Conference on Language Resources and Evaluation","author":"L. Wayne","year":"2000"},{"key":"2026040314323279400_ref288","first-page":"20","article-title":"New techniques for open-vocabulary spoken document retrieval","volume-title":"Proceedings of the International ACM Special Interest Group on Information Retrieval (SIGIR) Conference on Research and Development in Information Retrieval","author":"Wechsler","year":"1998"},{"issue":"3","key":"2026040314323279400_ref289","doi-asserted-by":"crossref","first-page":"173","DOI":"10.1023\/A:1026512724855","article-title":"New approaches to spoken document retrieval","volume":"3","author":"Wechsler","year":"2000","journal-title":"Information Retrieval"},{"key":"2026040314323279400_ref290","first-page":"1\/297","article-title":"LVCSR log-likelihood ratio scoring for keyword spotting","volume-title":"Proceedings of the IEEE International Conference on Acoustics, Speech, and Signal Processing","author":"Weintraub","year":"1995"},{"key":"2026040314323279400_ref291","first-page":"16","article-title":"Effect of speaking style on LVCSR performance","volume-title":"Proceedings of the International Conference on Spoken Language Processing","author":"Weintraub","year":"1996"},{"key":"2026040314323279400_ref292","doi-asserted-by":"crossref","first-page":"12","DOI":"10.1007\/978-3-540-30568-2_2","volume-title":"Machine Learning for Multimodal Interaction","author":"Wellner","year":"2005"},{"key":"2026040314323279400_ref293","doi-asserted-by":"crossref","DOI":"10.1145\/1056808.1057082","article-title":"A meeting browser evaluation test","volume-title":"Computer-Human Interaction Extended Abstracts on Human Factors in Computing 
Systems","author":"Wellner","year":"2005"},{"issue":"3","key":"2026040314323279400_ref294","doi-asserted-by":"crossref","first-page":"288","DOI":"10.1109\/89.906002","article-title":"Confidence measures for large vocabulary continuous speech recognition","volume":"9","author":"Wessel","year":"2001","journal-title":"IEEE Transactions on Speech and Audio Processing"},{"key":"2026040314323279400_ref295","doi-asserted-by":"crossref","first-page":"744","DOI":"10.1007\/11878773_82","volume-title":"Accessing Multilingual Information Repositories","author":"W. White","year":"2006"},{"key":"2026040314323279400_ref296","first-page":"315","article-title":"Vocabulary independent speech recognition using particles","volume-title":"Proceedings of the IEEE Workshop on Automatic Speech Recognition and Understanding","author":"W. D. Whittaker","year":"2001"},{"key":"2026040314323279400_ref297","first-page":"275","article-title":"Scanmail: A voicemail interface that makes speech browsable readable and searchable","volume-title":"Proceedings of the Special Interest Group on Computer-Human Interaction (SIGCHI) Conference on Human Factors in Computing Systems","author":"Whittaker","year":"2002"},{"key":"2026040314323279400_ref298","first-page":"26","article-title":"SCAN: Designing and evaluating user interfaces to support retrieval from speech archives","volume-title":"Proceedings of the International ACM Special Interest Group on Information Retrieval (SIGIR) Conference on Research and Development in Information Retrieval","author":"Whittaker","year":"1999"},{"issue":"3","key":"2026040314323279400_ref299","doi-asserted-by":"crossref","first-page":"197","DOI":"10.1007\/s00779-007-0146-3","article-title":"Design and evaluation of systems to support interaction capture and retrieval","volume":"12","author":"Whittaker","year":"2008","journal-title":"Personal Ubiquitous Computing"},{"key":"2026040314323279400_ref300","first-page":"1\/161","article-title":"Segmentation of speech using 
speaker identification","volume-title":"Proceedings of the IEEE International Conference on Acoustics, Speech, and Signal Processing","author":"Wilcox","year":"1994"},{"key":"2026040314323279400_ref301","first-page":"25","article-title":"HMM-based wordspotting for voice editing and indexing","volume-title":"Proceedings of Eurospeech","author":"D. Wilcox","year":"1991"},{"key":"2026040314323279400_ref302","first-page":"3241","article-title":"Confidence measures for HMM-based speech recognition","volume-title":"Proceedings of the International Conference on Spoken Language Processing","author":"Willett","year":"1998"},{"key":"2026040314323279400_ref303","article-title":"Speech recognition and information retrieval: Experiments in retrieving spoken documents","volume-title":"Proceedings of the DARPA Speech Recognition Workshop","author":"J. Witbrock","year":"1997"},{"key":"2026040314323279400_ref304","first-page":"30","article-title":"Using words and phonetic strings for efficient information retrieval from imperfectly transcribed spoken documents","volume-title":"Proceedings of the ACM International Conference on Digital Libraries","author":"J. Witbrock","year":"1997"},{"key":"2026040314323279400_ref305","volume-title":"Managing Gigabytes: Compressing and Indexing Documents and Images","author":"H. Witten","year":"1999"},{"key":"2026040314323279400_ref306","first-page":"372","article-title":"Effects of out of vocabulary words in spoken document retrieval","volume-title":"Proceedings of the International ACM Special Interest Group on Information Retrieval (SIGIR) Conference on Research and Development in Information Retrieval","author":"C. 
Woodland","year":"2000"},{"key":"2026040314323279400_ref307","doi-asserted-by":"crossref","first-page":"2805","DOI":"10.21437\/Eurospeech.2003-747","article-title":"Spotting \u201cHot Spots\u201d in meetings: Human judgments and prosodic cues","volume-title":"Proceedings of Eurospeech","author":"Wrede","year":"2003"},{"issue":"1","key":"2026040314323279400_ref308","doi-asserted-by":"crossref","first-page":"6","DOI":"10.1109\/LSP.2008.2008490","article-title":"Speech-annotated photo retrieval using syllable-transformed patterns","volume":"16","author":"Wu","year":"2009","journal-title":"IEEE Signal Processing Letters"},{"key":"2026040314323279400_ref309","first-page":"693","article-title":"A fast-match approach for robust faster than real-time speaker diarization","volume-title":"Proceedings of the IEEE Workshop on Automatic Speech Recognition and Understanding","author":"Yan","year":"2007"},{"key":"2026040314323279400_ref310","first-page":"632","article-title":"VideoQA: question answering on news video","volume-title":"Proceedings of the ACM International Conference on Multimedia","author":"Yang","year":"2003"},{"key":"2026040314323279400_ref311","first-page":"H\/21","article-title":"Detecting misrecognitions and out-of-vocabulary words","volume-title":"Proceedings of the IEEE International Conference on Acoustics, Speech, and Signal Processing","author":"R. 
Young","year":"1994"},{"issue":"5","key":"2026040314323279400_ref312","doi-asserted-by":"crossref","first-page":"635","DOI":"10.1109\/TSA.2005.851881","article-title":"Vocabulary-independent indexing of spontaneous speech","volume":"13","author":"Yu","year":"2005","journal-title":"IEEE Transactions on Speech and Audio Processing"},{"key":"2026040314323279400_ref313","doi-asserted-by":"crossref","DOI":"10.1007\/978-1-4757-3339-6","volume-title":"Content-Based Audio Classification and Retrieval for Audiovisual Data Parsing","author":"Zhang","year":"2001"},{"key":"2026040314323279400_ref314","first-page":"67","article-title":"Heuristic approach for generic audio data segmentation and annotation","volume-title":"Proceedings of the ACM International Conference on Multimedia (Part 7)","author":"Zhang","year":"1999"},{"key":"2026040314323279400_ref315","first-page":"415","article-title":"Towards spoken-document retrieval for the internet: Lattice indexing for large-scale web-search architectures","volume-title":"Proceedings of the Human Language Technology Conference of the North American Chapter of the Association for Computational Linguistics","author":"Zhou","year":"2006"},{"key":"2026040314323279400_ref316","doi-asserted-by":"crossref","first-page":"396","DOI":"10.1007\/11965152_35","volume-title":"Machine Learning for Multimodal Interaction","author":"Zhu","year":"2006"},{"issue":"5","key":"2026040314323279400_ref317","doi-asserted-by":"crossref","first-page":"1490","DOI":"10.1109\/TASL.2006.882751","article-title":"Introduction to the special section on Rich Transcription","volume":"14","author":"Zweig","year":"2006","journal-title":"IEEE Transactions on Audio, Speech, and Language Processing"}],"container-title":["Foundations and Trends\u00ae in Information 
Retrieval"],"original-title":[],"language":"en","link":[{"URL":"https:\/\/www.emerald.com\/ftinr\/article-pdf\/5\/4-5\/235\/11085594\/1500000020en.pdf","content-type":"application\/pdf","content-version":"vor","intended-application":"syndication"},{"URL":"https:\/\/www.emerald.com\/ftinr\/article-pdf\/5\/4-5\/235\/11085594\/1500000020en.pdf","content-type":"unspecified","content-version":"vor","intended-application":"similarity-checking"}],"deposited":{"date-parts":[[2026,4,3]],"date-time":"2026-04-03T18:34:08Z","timestamp":1775241248000},"score":1,"resource":{"primary":{"URL":"https:\/\/www.emerald.com\/ftinr\/article\/5\/4-5\/235\/1328667\/Spoken-Content-Retrieval-A-Survey-of-Techniques"}},"subtitle":[],"short-title":[],"issued":{"date-parts":[[2012,7,23]]},"references-count":317,"journal-issue":{"issue":"4-5","published-print":{"date-parts":[[2012,7,23]]}},"URL":"https:\/\/doi.org\/10.1561\/1500000020","relation":{},"ISSN":["1554-0669","1554-0677"],"issn-type":[{"value":"1554-0669","type":"print"},{"value":"1554-0677","type":"electronic"}],"subject":[],"published":{"date-parts":[[2012,7,23]]}}}