{"status":"ok","message-type":"work","message-version":"1.0.0","message":{"indexed":{"date-parts":[[2025,8,5]],"date-time":"2025-08-05T12:26:47Z","timestamp":1754396807332,"version":"3.41.0"},"reference-count":50,"publisher":"Association for Computing Machinery (ACM)","issue":"4","license":[{"start":{"date-parts":[[2016,11,19]],"date-time":"2016-11-19T00:00:00Z","timestamp":1479513600000},"content-version":"vor","delay-in-days":0,"URL":"https:\/\/www.acm.org\/publications\/policies\/copyright_policy#Background"}],"funder":[{"DOI":"10.13039\/501100001659","name":"German Research Foundation","doi-asserted-by":"crossref","award":["UbiCrypt (GRK 1817\/1)"],"award-info":[{"award-number":["UbiCrypt (GRK 1817\/1)"]}],"id":[{"id":"10.13039\/501100001659","id-type":"DOI","asserted-by":"crossref"}]}],"content-domain":{"domain":["dl.acm.org"],"crossmark-restriction":true},"short-container-title":["ACM Trans. Priv. Secur."],"published-print":{"date-parts":[[2017,2,3]]},"abstract":"<jats:p>A so-called completely automated public Turing test to tell computers and humans apart (CAPTCHA) represents a challenge-response test that is widely used on the Internet to distinguish human users from fraudulent computer programs, often referred to as bots. To enable access for visually impaired users, most Web sites utilize audio CAPTCHAs in addition to a conventional image-based scheme. Recent research has shown that most currently available audio CAPTCHAs are insecure, as they can be broken by means of machine learning at relatively low costs. Moreover, most audio CAPTCHAs suffer from low human success rates that arise from severe signal distortions.<\/jats:p>\n          <jats:p>This article proposes two different audio CAPTCHA schemes that systematically exploit differences between humans and computers in terms of auditory perception and language understanding, yielding a better trade-off between usability and security as compared to currently available schemes. 
Furthermore, we provide an elaborate analysis of Google\u2019s prominent reCAPTCHA that serves as a baseline setting when evaluating our proposed CAPTCHA designs.<\/jats:p>","DOI":"10.1145\/2856820","type":"journal-article","created":{"date-parts":[[2016,11,21]],"date-time":"2016-11-21T14:01:46Z","timestamp":1479736906000},"page":"1-31","update-policy":"https:\/\/doi.org\/10.1145\/crossmark-policy","source":"Crossref","is-referenced-by-count":6,"title":["Toward Improved Audio CAPTCHAs Based on Auditory Perception and Language Understanding"],"prefix":"10.1145","volume":"19","author":[{"ORCID":"https:\/\/orcid.org\/0000-0002-5403-5642","authenticated-orcid":false,"given":"Hendrik","family":"Meutzner","sequence":"first","affiliation":[{"name":"Ruhr-University Bochum, Bochum, Germany"}]},{"given":"Santosh","family":"Gupta","sequence":"additional","affiliation":[{"name":"Ruhr-University Bochum"}]},{"given":"Viet-Hung","family":"Nguyen","sequence":"additional","affiliation":[{"name":"Ruhr-University Bochum, Bochum, Germany"}]},{"given":"Thorsten","family":"Holz","sequence":"additional","affiliation":[{"name":"Ruhr-University Bochum, Bochum, Germany"}]},{"given":"Dorothea","family":"Kolossa","sequence":"additional","affiliation":[{"name":"Ruhr-University Bochum, Bochum, Germany"}]}],"member":"320","published-online":{"date-parts":[[2016,11,19]]},"reference":[{"key":"e_1_2_2_1_1","doi-asserted-by":"publisher","DOI":"10.1145\/1518701.1518983"},{"key":"e_1_2_2_2_1","volume-title":"Simon","author":"Bohr Sonja","year":"2008","unstructured":"Sonja Bohr, Andrea Shome, and Jonathan Z. Simon. 2008. Improving Auditory CAPTCHA Security. Technical Report. A.
James Clark School of Engineering, College Park, MD."},{"volume-title":"Auditory Scene Analysis: The Perceptual Organization of Sound","author":"Bregman Albert S.","key":"e_1_2_2_3_1","unstructured":"Albert S. Bregman. 1994. Auditory Scene Analysis: The Perceptual Organization of Sound. MIT Press, Cambridge, MA."},{"volume-title":"Proceedings of the USENIX Workshop on Offensive Technologies (WOOT\u201914)","author":"Bursztein Elie","key":"e_1_2_2_4_1","unstructured":"Elie Bursztein, Jonathan Aigrain, Angelika Moscicki, and John C. Mitchell. 2014. The end is nigh: Generic solving of text-based CAPTCHAs. In Proceedings of the USENIX Workshop on Offensive Technologies (WOOT\u201914)."},{"key":"e_1_2_2_5_1","doi-asserted-by":"publisher","DOI":"10.1109\/SP.2011.14"},{"key":"e_1_2_2_6_1","volume-title":"Proceedings of the USENIX Workshop on Offensive Technologies (WOOT\u201909)","author":"Bursztein Elie","year":"2009","unstructured":"Elie Bursztein and Steven Bethard. 2009. Decaptcha breaking 75% of eBay audio CAPTCHAs. In Proceedings of the USENIX Workshop on Offensive Technologies (WOOT\u201909)."},{"key":"e_1_2_2_7_1","doi-asserted-by":"publisher","DOI":"10.1145\/2046707.2046724"},{"key":"e_1_2_2_8_1","volume-title":"Retrieved","author":"Carnegie Mellon University","year":"2014","unstructured":"Carnegie Mellon University. 2014. The Carnegie Mellon University Pronouncing Dictionary, CMUdict (v. 0.7a).
Retrieved October 19, 2016, from http:\/\/www.speech.cs.cmu.edu\/cgi-bin\/cmudict."},{"key":"e_1_2_2_9_1","doi-asserted-by":"publisher","DOI":"10.1109\/ISCAS.1999.777511"},{"key":"e_1_2_2_10_1","doi-asserted-by":"publisher","DOI":"10.1007\/11427896_1"},{"key":"e_1_2_2_11_1","volume-title":"Retrieved","author":"CrowdFlower Inc.","year":"2015","unstructured":"CrowdFlower Inc. 2015. CrowdFlower Home Page. Retrieved October 19, 2016, from http:\/\/www.crowdflower.com."},{"key":"e_1_2_2_12_1","doi-asserted-by":"publisher","DOI":"10.5555\/1251375.1251396"},{"volume-title":"Speech Separation by Humans and Machines","author":"Divenyi Pierre","key":"e_1_2_2_13_1","unstructured":"Pierre Divenyi. 2005. Speech Separation by Humans and Machines. Springer."},{"volume-title":"Crowdsourcing for Speech Processing: Applications to Data Collection, Transcription and Assessment","author":"Eskenazi Maxine","key":"e_1_2_2_14_1","unstructured":"Maxine Eskenazi, Gina-Anne Levow, Helen Meng, Gabriel Parent, and David Suendermann. 2013. Crowdsourcing for Speech Processing: Applications to Data Collection, Transcription and Assessment. Wiley."},{"key":"e_1_2_2_15_1","doi-asserted-by":"publisher","DOI":"10.1109\/MMSP.2006.285353"},{"key":"e_1_2_2_16_1","doi-asserted-by":"publisher","DOI":"10.1561\/2000000004"},{"key":"e_1_2_2_17_1","unstructured":"Google Inc. 2015a. Google Web Search.
Available at http:\/\/www.google.com."},{"key":"e_1_2_2_18_1","volume-title":"Retrieved","author":"Google Inc.","year":"2015","unstructured":"Google Inc. 2015b. reCAPTCHA Home Page. Retrieved October 19, 2016, from http:\/\/www.recaptcha.net."},{"key":"e_1_2_2_19_1","doi-asserted-by":"publisher","DOI":"10.1109\/PROC.1978.10837"},{"key":"e_1_2_2_20_1","doi-asserted-by":"publisher","DOI":"10.1109\/ICASSP.1985.1168384"},{"key":"e_1_2_2_21_1","doi-asserted-by":"publisher","DOI":"10.1109\/WCRE.2008.35"},{"key":"e_1_2_2_22_1","doi-asserted-by":"publisher","DOI":"10.1109\/MSP.2012.2205597"},{"key":"e_1_2_2_23_1","doi-asserted-by":"publisher","DOI":"10.1038\/scientificamerican0611-44"},{"key":"e_1_2_2_24_1","volume-title":"Proceedings of the INTERSPEECH Conference.","author":"Kochanski Greg","year":"2002","unstructured":"Greg Kochanski, Daniel P. Lopresti, and Chilin Shih. 2002. A reverse Turing test using speech. In Proceedings of the INTERSPEECH Conference."},{"key":"e_1_2_2_25_1","volume-title":"Black","author":"Kominek John","year":"2003","unstructured":"John Kominek and Alan W. Black. 2003. CMU Arctic Databases for Speech Synthesis. Technical Report. Carnegie Mellon University, Pittsburgh, PA."},{"key":"e_1_2_2_26_1","unstructured":"R. Gary Leonard and George Doddington. 1993. TIDIGITS. Linguistic Data Consortium.
Available at https:\/\/catalog.ldc.upenn.edu\/ldc93s10."},{"key":"e_1_2_2_27_1","volume-title":"Proceedings of the International Conference on the Principles of Knowledge Representation and Reasoning.","author":"Levesque Hector J.","year":"2012","unstructured":"Hector J. Levesque, Ernest Davis, and Leora Morgenstern. 2012. The Winograd schema challenge. In Proceedings of the International Conference on the Principles of Knowledge Representation and Reasoning."},{"key":"e_1_2_2_28_1","doi-asserted-by":"publisher","DOI":"10.1162\/neco.1989.1.1.1"},{"key":"e_1_2_2_29_1","doi-asserted-by":"publisher","DOI":"10.1145\/2664243.2664262"},{"volume-title":"Speech Processing in the Auditory System","author":"Morgan Nelson","key":"e_1_2_2_30_1","unstructured":"Nelson Morgan, Herv\u00e9 Bourlard, and Hynek Hermansky. 2004. Automatic speech recognition: An auditory perspective. In Speech Processing in the Auditory System. Springer."},{"key":"e_1_2_2_31_1","volume-title":"Retrieved","author":"Ogden Charles K.","year":"1930","unstructured":"Charles K. Ogden. 1930. Ogden\u2019s Basic English. Retrieved October 19, 2016, from http:\/\/ogden.basic-english.org\/basiceng.html."},{"key":"e_1_2_2_32_1","volume-title":"Proceedings of the ISCA Tutorial and Research Workshop on Automatic Speech Recognition: Challenges for the New Millennium (ISCA ITRW ASR\u201900)","author":"Pearce David","year":"2000","unstructured":"David Pearce and Hans-G\u00fcnter Hirsch. 2000.
The Aurora experimental framework for the performance evaluation of speech recognition systems under noisy conditions. In Proceedings of the ISCA Tutorial and Research Workshop on Automatic Speech Recognition: Challenges for the New Millennium (ISCA ITRW ASR\u201900)."},{"volume-title":"Fundamentals of Speech Recognition","author":"Rabiner Lawrence","key":"e_1_2_2_33_1","unstructured":"Lawrence Rabiner and Biing-Hwang Juang. 1993. Fundamentals of Speech Recognition. Prentice Hall."},{"key":"e_1_2_2_34_1","volume-title":"Okuno","author":"Sano Shotaro","year":"2013","unstructured":"Shotaro Sano, Takuma Otsuka, and Hiroshi G. Okuno. 2013. Solving Google\u2019s continuous audio CAPTCHA with HMM-based automatic speech recognition. In Advances in Information and Computer Security. Lecture Notes in Computer Science, Vol. 8231. Springer, 36--52."},{"key":"e_1_2_2_36_1","doi-asserted-by":"publisher","DOI":"10.1016\/j.neunet.2014.09.003"},{"key":"e_1_2_2_37_1","doi-asserted-by":"crossref","unstructured":"Rituraj Soni and Devendra Tiwari. 2010. Improved CAPTCHA method. International Journal of Computer Applications 25 Article No.
17.","DOI":"10.5120\/451-754"},{"key":"e_1_2_2_38_1","volume-title":"Proceedings of the Symposium on Usable Privacy and Security (SOUPS\u201908)","author":"Tam Jennifer","year":"2008","unstructured":"Jennifer Tam, Jiri Simsa, David Huggins-Daines, Luis von Ahn, and Manuel Blum. 2008a. Improving audio CAPTCHAs. In Proceedings of the Symposium on Usable Privacy and Security (SOUPS\u201908)."},{"key":"e_1_2_2_39_1","volume-title":"Proceedings of Advances in Neural Information Processing Systems (NIPS\u201915)","author":"Tam Jennifer","year":"2008","unstructured":"Jennifer Tam, Jiri Simsa, Sean Hyde, and Luis von Ahn. 2008b. Breaking audio CAPTCHAs. In Proceedings of Advances in Neural Information Processing Systems (NIPS\u201915)."},{"key":"e_1_2_2_40_1","doi-asserted-by":"publisher","DOI":"10.2307\/25470707"},{"key":"e_1_2_2_41_1","doi-asserted-by":"publisher","DOI":"10.1016\/0167-6393(93)90095-3"},{"volume-title":"Techniques for Noise Robustness in Automatic Speech Recognition","author":"Virtanen Tuomas","key":"e_1_2_2_43_1","unstructured":"Tuomas Virtanen, Rita Singh, and Bhiksha Raj. 2012. Techniques for Noise Robustness in Automatic Speech Recognition. Wiley."},{"key":"e_1_2_2_44_1","volume-title":"CAPTCHA: Using hard AI problems for security. In Advances in Cryptology\u2014EUROCRYPT","author":"von Ahn Luis","year":"2003","unstructured":"Luis von Ahn, Manuel Blum, Nicholas J. Hopper, and John Langford. 2003. CAPTCHA: Using hard AI problems for security.
In Advances in Cryptology\u2014EUROCRYPT 2003. Lecture Notes in Computer Science, Vol. 2656. Springer, 294--311."},{"key":"e_1_2_2_45_1","doi-asserted-by":"publisher","DOI":"10.1145\/966389.966390"},{"volume-title":"Understanding Natural Language","author":"Winograd Terry","key":"e_1_2_2_46_1","unstructured":"Terry Winograd. 1972. Understanding Natural Language. Academic Press, Orlando, FL."},{"key":"e_1_2_2_47_1","volume-title":"Retrieved","author":"Research Wolfram","year":"2015","unstructured":"Wolfram Research. 2015. Wolfram Alpha. Retrieved October 19, 2016, from http:\/\/www.wolframalpha.com."},{"key":"e_1_2_2_48_1","volume-title":"Proceedings of the ISCA Workshops on Speech Synthesis (SSW\u201910)","author":"Wolters Maria K.","year":"2010","unstructured":"Maria K. Wolters, Karl Isaac, and Steve Renals. 2010. Evaluating speech synthesis intelligibility using Amazon Mechanical Turk. In Proceedings of the ISCA Workshops on Speech Synthesis (SSW\u201910)."},{"key":"e_1_2_2_49_1","doi-asserted-by":"publisher","DOI":"10.1109\/ACSAC.2007.47"},{"key":"e_1_2_2_50_1","doi-asserted-by":"publisher","DOI":"10.1145\/1455770.1455839"},{"volume-title":"The HTK Hidden Markov Model Toolkit: Design and Philosophy","author":"Young Steve J.","key":"e_1_2_2_51_1","unstructured":"Steve J. Young. 1994.
The HTK Hidden Markov Model Toolkit: Design and Philosophy. Entropic Cambridge Research Laboratory, Ltd., Cambridge, England."},{"key":"e_1_2_2_52_1","volume-title":"Proceedings of the ISCA Workshops on Speech Synthesis (SSW\u201907)","author":"Zen Heiga","year":"2007","unstructured":"Heiga Zen, Takashi Nose, Junichi Yamagishi, Shinji Sako, Takashi Masuko, Alan W. Black, and Keiichi Tokuda. 2007. The HMM-based speech synthesis system (HTS) version 2.0. In Proceedings of the ISCA Workshops on Speech Synthesis (SSW\u201907)."}],"container-title":["ACM Transactions on Privacy and Security"],"original-title":[],"language":"en","link":[{"URL":"https:\/\/dl.acm.org\/doi\/10.1145\/2856820","content-type":"unspecified","content-version":"vor","intended-application":"text-mining"},{"URL":"https:\/\/dl.acm.org\/doi\/pdf\/10.1145\/2856820","content-type":"unspecified","content-version":"vor","intended-application":"similarity-checking"}],"deposited":{"date-parts":[[2025,6,18]],"date-time":"2025-06-18T04:39:08Z","timestamp":1750221548000},"score":1,"resource":{"primary":{"URL":"https:\/\/dl.acm.org\/doi\/10.1145\/2856820"}},"subtitle":[],"short-title":[],"issued":{"date-parts":[[2016,11,19]]},"references-count":50,"journal-issue":{"issue":"4","published-print":{"date-parts":[[2017,2,3]]}},"alternative-id":["10.1145\/2856820"],"URL":"https:\/\/doi.org\/10.1145\/2856820","relation":{},"ISSN":["2471-2566","2471-2574"],"issn-type":[{"type":"print","value":"2471-2566"},{"type":"electronic","value":"2471-2574"}],"subject":[],"published":{"date-parts":[[2016,11,19]]},"assertion":[{"value":
"2015-11-01","order":0,"name":"received","label":"Received","group":{"name":"publication_history","label":"Publication History"}},{"value":"2016-09-01","order":1,"name":"accepted","label":"Accepted","group":{"name":"publication_history","label":"Publication History"}},{"value":"2016-11-19","order":2,"name":"published","label":"Published","group":{"name":"publication_history","label":"Publication History"}}]}}