{"status":"ok","message-type":"work","message-version":"1.0.0","message":{"indexed":{"date-parts":[[2026,6,10]],"date-time":"2026-06-10T15:43:59Z","timestamp":1781106239260,"version":"3.54.1"},"reference-count":13,"publisher":"IGI Global Scientific Publishing","issue":"2","content-domain":{"domain":[],"crossmark-restriction":false},"short-container-title":[],"published-print":{"date-parts":[[2010,4,1]]},"abstract":"<p>For large archives of audio media, just as with text archives, indexing is important for allowing quick and accurate searches. Similar to text archives, audio archives can use text for indexing. Generating this text requires using transcripts of the spoken portions of the audio. From them, an alignment can be made that allows users to search for specific content and immediately view the content at the position where the search terms were spoken. Although previous research has addressed this issue, the solutions align the transcripts only in real-time or greater. In this paper, the authors propose AutoCap. It is capable of producing accurate audio indexes in faster than real-time for archived audio and in real-time for live audio. In most cases it takes less than one quarter the original duration for archived audio. This paper discusses the architecture and evaluation of the AutoCap project as well as two of its applications.<\/p>","DOI":"10.4018\/jmdem.2010040101","type":"journal-article","created":{"date-parts":[[2010,6,30]],"date-time":"2010-06-30T17:10:24Z","timestamp":1277917824000},"page":"1-17","source":"Crossref","is-referenced-by-count":3,"title":["Fast Caption Alignment for Automatic Indexing of Audio"],"prefix":"10.4018","volume":"1","author":[{"given":"Allan","family":"Knight","sequence":"first","affiliation":[{"name":"University of California, Santa Barbara, USA"}],"role":[{"vocabulary":"crossref","role":"author"}]},{"given":"Kevin","family":"Almeroth","sequence":"additional","affiliation":[{"name":"University of California, Santa Barbara, USA"}],"role":[{"vocabulary":"crossref","role":"author"}]}],"member":"2432","reference":[{"key":"jmdem.2010040101-0","unstructured":"Carnegie Mellon University. (2004). Sphinx-4. Retrieved from http:\/\/cmusphinx.sourceforge.net\/sphinx4\/"},{"key":"jmdem.2010040101-1","unstructured":"Clarkson, P. (1999). Statistical Language Modeling Toolkit. Retrieved from http:\/\/www.speech.cs.cmu.edu\/SLM\/CMU-Cam_Toolkit_v2.tar.gz"},{"key":"jmdem.2010040101-2","doi-asserted-by":"crossref","unstructured":"Clarkson, P., & Rosenfeld, R. (1997). Statistical language modeling using the CMU-Cambridge Toolkit. In Proceedings of the European Conference on Speech Communication and Technology \u2013 Eurospeech (pp. 2707-2710).","DOI":"10.21437\/Eurospeech.1997-683"},{"key":"jmdem.2010040101-3","doi-asserted-by":"crossref","unstructured":"Hazen, T. J. (2006). Automatic alignment and error correction of human generated transcripts for long speech recordings. In Proceedings of the International Conference of the International Speech Communication Association \u2013 INTERSPEECH.","DOI":"10.21437\/Interspeech.2006-449"},{"key":"jmdem.2010040101-4","unstructured":"Huang, C. (2003). Automatic closed caption alignment based on speech recognition transcripts (Tech. Rep. No. 005). New York, New York: Columbia University."},{"key":"jmdem.2010040101-5","doi-asserted-by":"publisher","DOI":"10.1006\/csla.1993.1007"},{"key":"jmdem.2010040101-6","first-page":"108","article-title":"Automated closed-captioning using text alignment.","volume":"5307","author":"A. F.Martone","year":"2004","journal-title":"Proceedings of the Society for Photo-Instrumentation Engineers"},{"key":"jmdem.2010040101-7","doi-asserted-by":"crossref","unstructured":"Moreno, P. J., Joerg, C., Thong, J. M., & Van Glickman, O. (1998). A recursive algorithm for the forced alignment of very long audio segments. In Proceedings of the International Conference on Spoken Language Processing.","DOI":"10.21437\/ICSLP.1998-603"},{"key":"jmdem.2010040101-8","doi-asserted-by":"crossref","unstructured":"Placeway, P., & Lafferty, J. (1996). Cheating with imperfect transcripts. In Proceedings of the International Conference on Spoken Language Processing (pp. 2115-2118).","DOI":"10.1109\/ICSLP.1996.607220"},{"key":"jmdem.2010040101-9","doi-asserted-by":"crossref","unstructured":"Robert-Ribes, J., & Mukhtar, R. G. (1997). Automatic generation of hyperlinks between audio and transcript. In Proceedings of the Conference on Speech Communication and Technology \u2013 Eurospeech (pp. 903-906).","DOI":"10.21437\/Eurospeech.1997-300"},{"key":"jmdem.2010040101-10","unstructured":"Sun Microsystems. (2009). Java Speech API. Retrieved from http:\/\/java.sun.com\/products\/java-media\/speech\/"},{"key":"jmdem.2010040101-11","unstructured":"Sun Microsystems. (2009). Java SE Downloads. Retrieved from http:\/\/java.sun.com\/javase\/downloads\/index.jsp"},{"key":"jmdem.2010040101-12","unstructured":"The MPlayer Project. (2008). MPlayer. Retrieved from http:\/\/www.mplayerhq.hu\/design7\/news.html"}],"container-title":["International Journal of Multimedia Data Engineering and Management"],"original-title":[],"language":"ng","link":[{"URL":"https:\/\/www.igi-global.com\/viewtitle.aspx?TitleId=43745","content-type":"unspecified","content-version":"vor","intended-application":"similarity-checking"}],"deposited":{"date-parts":[[2023,6,1]],"date-time":"2023-06-01T16:53:56Z","timestamp":1685638436000},"score":1,"resource":{"primary":{"URL":"https:\/\/services.igi-global.com\/resolvedoi\/resolve.aspx?doi=10.4018\/jmdem.2010040101"}},"subtitle":[""],"short-title":[],"issued":{"date-parts":[[2010,4,1]]},"references-count":13,"journal-issue":{"issue":"2","published-print":{"date-parts":[[2010,4]]}},"URL":"https:\/\/doi.org\/10.4018\/jmdem.2010040101","relation":{},"ISSN":["1947-8534","1947-8542"],"issn-type":[{"value":"1947-8534","type":"print"},{"value":"1947-8542","type":"electronic"}],"subject":[],"published":{"date-parts":[[2010,4,1]]}}}