{"status":"ok","message-type":"work","message-version":"1.0.0","message":{"indexed":{"date-parts":[[2025,10,25]],"date-time":"2025-10-25T12:42:28Z","timestamp":1761396148883,"version":"3.41.0"},"publisher-location":"New York, NY, USA","reference-count":76,"publisher":"ACM","license":[{"start":{"date-parts":[[2021,10,26]],"date-time":"2021-10-26T00:00:00Z","timestamp":1635206400000},"content-version":"vor","delay-in-days":0,"URL":"https:\/\/www.acm.org\/publications\/policies\/copyright_policy#Background"}],"funder":[{"DOI":"10.13039\/100006785","name":"Google PhD Fellowship","doi-asserted-by":"publisher","id":[{"id":"10.13039\/100006785","id-type":"DOI","asserted-by":"publisher"}]},{"name":"Second Language Testing Inc. (SLTI)"},{"name":"Infosys Center for AI"},{"name":"Center for Design and New Media at IIIT-Delhi"}],"content-domain":{"domain":["dl.acm.org"],"crossmark-restriction":true},"short-container-title":[],"published-print":{"date-parts":[[2021,10,26]]},"DOI":"10.1145\/3459637.3482395","type":"proceedings-article","created":{"date-parts":[[2021,10,30]],"date-time":"2021-10-30T18:33:14Z","timestamp":1635618794000},"page":"1681-1691","update-policy":"https:\/\/doi.org\/10.1145\/crossmark-policy","source":"Crossref","is-referenced-by-count":6,"title":["Speaker-Conditioned Hierarchical Modeling for Automated Speech Scoring"],"prefix":"10.1145","author":[{"given":"Yaman Kumar","family":"Singla","sequence":"first","affiliation":[{"name":"IIIT-Delhi, Adobe, & SUNY-Buffalo, Delhi, India"}]},{"given":"Avyakt","family":"Gupta","sequence":"additional","affiliation":[{"name":"IIIT-Delhi, Delhi, India"}]},{"given":"Shaurya","family":"Bagga","sequence":"additional","affiliation":[{"name":"IIIT-Delhi, Delhi, India"}]},{"given":"Changyou","family":"Chen","sequence":"additional","affiliation":[{"name":"SUNY-Buffalo, Buffalo, NY, USA"}]},{"given":"Balaji","family":"Krishnamurthy","sequence":"additional","affiliation":[{"name":"Adobe, Delhi, India"}]},{"given":"Rajiv Ratn","family":"Shah","sequence":"additional","affiliation":[{"name":"IIIT-Delhi, Delhi, India"}]}],"member":"320","published-online":{"date-parts":[[2021,10,30]]},"reference":[{"key":"e_1_3_2_2_1_1","doi-asserted-by":"publisher","DOI":"10.5555\/3045390.3045410"},{"volume-title":"Common Voice: A Massively-Multilingual Speech Corpus. In LREC.","year":"2020","author":"Ardila Rosana","key":"e_1_3_2_2_2_1"},{"volume-title":"wav2vec 2.0: A Framework for Self-Supervised Learning of Speech Representations. CoRR","year":"2020","author":"Baevski Alexei","key":"e_1_3_2_2_3_1"},{"key":"e_1_3_2_2_4_1","doi-asserted-by":"crossref","unstructured":"Pakhi Bamdev Manraj Singh Grover Yaman Kumar Singla Rajiv Ratn Shah Payman Vafaee and Mika Hama. To appear in 2021. Automated Speech Scoring System Under The Lens: Evaluating and interpreting the linguistic cues for language proficiency. International Journal of Artificial Intelligence in Education (To appear in 2021).","DOI":"10.1007\/s40593-022-00291-5"},{"volume-title":"Threats to score meaning in automated scoring. Validation of score meaning for the next generation of assessments: The use of response processes","year":"2017","author":"Bejar Isaac I","key":"e_1_3_2_2_5_1"},{"key":"e_1_3_2_2_6_1","unstructured":"Lukas Biewald. 2020. Experiment Tracking with Weights and Biases. https:\/\/www.wandb.com\/ Software available from wandb.com."},{"key":"e_1_3_2_2_7_1","unstructured":"Paul Boersma and David Weenink. 2009. Praat: doing phonetics by computer (Version 5.1.13). http:\/\/www.praat.org"},{"volume-title":"Language Education in Europe: The Common European Framework of Reference","author":"Broeder Peter","key":"e_1_3_2_2_8_1"},{"key":"e_1_3_2_2_9_1","doi-asserted-by":"publisher","DOI":"10.1109\/ICASSP.2018.8462562"},{"volume-title":"End-to-End Neural Network Based Automated Speech Scoring. In 2018 IEEE International Conference on Acoustics, Speech and Signal Processing (ICASSP). 6234--6238","year":"2018","author":"Chen Lei","key":"e_1_3_2_2_10_1"},{"key":"e_1_3_2_2_11_1","volume-title":"Proceedings of the 2019 Conference of the North American Chapter of the Association for Computational Linguistics: Human Language Technologies","volume":"1","author":"Devlin Jacob","year":"2019"},{"volume-title":"Personal VAD: Speaker-conditioned voice activity detection. arXiv preprint arXiv:1908.04284","year":"2019","author":"Ding Shaojin","key":"e_1_3_2_2_12_1"},{"key":"e_1_3_2_2_13_1","doi-asserted-by":"publisher","DOI":"10.18653\/v1\/K17-1017"},{"key":"e_1_3_2_2_14_1","doi-asserted-by":"publisher","DOI":"10.1016\/j.specom.2009.04.005"},{"key":"e_1_3_2_2_15_1","unstructured":"Educational Testing Association (ETA). 2019. A snapshot of the individuals who took the GRE revised general test."},{"key":"e_1_3_2_2_16_1","doi-asserted-by":"publisher","DOI":"10.1002\/ets2.12147"},{"key":"e_1_3_2_2_17_1","doi-asserted-by":"publisher","DOI":"10.18653\/v1\/2020.acl-main.700"},{"key":"e_1_3_2_2_18_1","doi-asserted-by":"publisher","DOI":"10.1109\/ICASSP.2017.7952261"},{"volume-title":"Multi-modal automated speech scoring using attention fusion. arXiv preprint arXiv:2005.08182","year":"2020","author":"Grover Manraj Singh","key":"e_1_3_2_2_19_1"},{"volume-title":"Teacher workload survey","year":"2016","author":"Higton John","key":"e_1_3_2_2_20_1"},{"key":"e_1_3_2_2_21_1","unstructured":"Matthew Honnibal and Ines Montani. 2017. spaCy 2: Natural language understanding with Bloom embeddings convolutional neural networks and incremental parsing. (2017). To appear."},{"key":"e_1_3_2_2_22_1","unstructured":"Wei-Ning Hsu Yu Zhang Ron J Weiss Heiga Zen Yonghui Wu Yuxuan Wang Yuan Cao Ye Jia Zhifeng Chen Jonathan Shen et al. 2018. Hierarchical generative modeling for controllable speech synthesis. arXiv preprint arXiv:1810.07217 (2018)."},{"key":"e_1_3_2_2_23_1","unstructured":"Thomas B. Fordham Institute. 2020. Ohio Public School Students. https:\/\/www.ohiobythenumbers.com\/."},{"key":"e_1_3_2_2_24_1","doi-asserted-by":"publisher","DOI":"10.1109\/IJCNN.2019.8852464"},{"volume-title":"Authorship Verification with Personalized Language Models. In International Conference on Text, Speech, and Dialogue. Springer, 248--256","year":"2020","author":"King Milton","key":"e_1_3_2_2_25_1"},{"volume-title":"Kingma and Jimmy Ba","year":"2014","author":"Diederik","key":"e_1_3_2_2_26_1"},{"key":"e_1_3_2_2_27_1","doi-asserted-by":"publisher","DOI":"10.1109\/ICASSP.2003.1202765"},{"volume-title":"Captum: A unified and generic model interpretability library for PyTorch. arxiv","year":"2020","author":"Kokhlikyan Narine","key":"e_1_3_2_2_28_1"},{"key":"e_1_3_2_2_29_1","doi-asserted-by":"publisher","DOI":"10.1609\/aaai.v33i01.33019662"},{"volume-title":"Di Jin, and Rajiv Ratn Shah.","year":"2020","author":"Kumar Yaman","key":"e_1_3_2_2_30_1"},{"key":"e_1_3_2_2_31_1","volume-title":"Retrieved April","volume":"28","author":"LaFlair Geoffrey T","year":"2019"},{"key":"e_1_3_2_2_32_1","unstructured":"Thi Le. 2020. Testing & Educational Support in the US. https:\/\/my.ibisworld.com\/us\/en\/industry\/61171\/key-statistics."},{"key":"e_1_3_2_2_33_1","doi-asserted-by":"publisher","DOI":"10.18653\/v1\/W16-0514"},{"volume-title":"Proceedings of the 27th International Conference on Computational Linguistics. 1099--1109","year":"2018","author":"Madnani Nitin","key":"e_1_3_2_2_34_1"},{"key":"e_1_3_2_2_35_1","unstructured":"Margaret Malone. 2000. Simulated Oral Proficiency Interviews: Recent Developments. ERIC Digest. (2000)."},{"key":"e_1_3_2_2_36_1","doi-asserted-by":"publisher","DOI":"10.1109\/ICASSP40776.2020.9053406"},{"volume-title":"Did the model understand the question? arXiv preprint arXiv:1805.05492","year":"2018","author":"Mudrakarta Pramod Kaushik","key":"e_1_3_2_2_37_1"},{"key":"e_1_3_2_2_38_1","unstructured":"Patrick O'Donnell. 2020. Computers are now grading essays on Ohio's state tests. https:\/\/www.cleveland.com\/metro\/2018\/03\/computers_are_now_grading_essays_on_ohios_state_tests_your_ch.html."},{"key":"e_1_3_2_2_39_1","unstructured":"OECD. 2014. Indicator D4: How much time do teachers spend teaching?"},{"volume-title":"Controlling personality-based stylistic variation with neural natural language generators. arXiv preprint arXiv:1805.08352","year":"2018","author":"Oraby Shereen","key":"e_1_3_2_2_40_1"},{"key":"e_1_3_2_2_41_1","doi-asserted-by":"publisher","DOI":"10.3115\/991566.991598"},{"volume-title":"Librispeech: An ASR corpus based on public domain audio books. 5206--5210. https:\/\/doi.org\/10.1109\/ICASSP.2015.7178964","year":"2015","author":"Panayotov Vassil","key":"e_1_3_2_2_42_1"},{"volume-title":"Changyou Chen, Junyi Jessy Li, and Rajiv Ratn Shah.","year":"2020","author":"Parekh Swapnil","key":"e_1_3_2_2_43_1"},{"key":"e_1_3_2_2_44_1","doi-asserted-by":"publisher","DOI":"10.5555\/3454287.3455008"},{"volume-title":"Rajiv Ratn Shah, Mika Hama, and Roger Zimmermann.","year":"2020","author":"Patil Rajaswa","key":"e_1_3_2_2_45_1"},{"volume-title":"Manning","year":"2014","author":"Pennington Jeffrey","key":"e_1_3_2_2_46_1"},{"volume-title":"The Kaldi Speech Recognition Toolkit. In IEEE 2011 Workshop on Automatic Speech Recognition and Understanding (Hilton Waikoloa Village","year":"2011","author":"Povey Daniel","key":"e_1_3_2_2_47_1"},{"key":"e_1_3_2_2_48_1","doi-asserted-by":"publisher","DOI":"10.1109\/ICASSP.2019.8683717"},{"key":"e_1_3_2_2_49_1","doi-asserted-by":"publisher","DOI":"10.1109\/SLT.2018.8639697"},{"key":"e_1_3_2_2_50_1","doi-asserted-by":"publisher","DOI":"10.1109\/EUSIPCO.2015.7362751"},{"key":"e_1_3_2_2_51_1","doi-asserted-by":"publisher","DOI":"10.1109\/SLT48900.2021.9383590"},{"key":"e_1_3_2_2_52_1","unstructured":"Education Testing Service. 2020. Education Testing Service EIN 21-0634479. https:\/\/www.causeiq.com\/organizations\/educational-testing-service 210634479\/."},{"volume-title":"Changyou Chen, and Rajiv Ratn Shah.","year":"2021","author":"Shah Jui","key":"e_1_3_2_2_53_1"},{"volume-title":"Not just a black box: Learning important features through propagating activation differences. arXiv preprint arXiv:1605.01713","year":"2016","author":"Shrikumar Avanti","key":"e_1_3_2_2_54_1"},{"key":"e_1_3_2_2_55_1","unstructured":"SLTI. 2020. Simulated Oral Proficiency Interview (SOPI) by SLTI. https:\/\/secondlanguagetesting.com\/products-%26-services#5eb17e51-2737-458f-96a1-9101d1e453e4."},{"volume-title":"Testing aptitude for second language learning. Language Testing and Assessment","year":"2017","author":"Smith Megan","key":"e_1_3_2_2_56_1"},{"key":"e_1_3_2_2_57_1","volume-title":"Testing aptitude for second language learning. Encyclopaedia of language and education","author":"Stansfield Charles","year":"2008","edition":"2"},{"key":"e_1_3_2_2_58_1","doi-asserted-by":"publisher","DOI":"10.1111\/j.1540-4781.1992.tb01093.x"},{"key":"e_1_3_2_2_59_1","doi-asserted-by":"publisher","DOI":"10.1016\/0346-251X(92)90045-5"},{"volume-title":"Test Development Handbook: Simulated Oral Proficiency Interview,(SOPI)","author":"Stansfield Charles W","key":"e_1_3_2_2_60_1"},{"key":"e_1_3_2_2_61_1","unstructured":"Valerie Strauss. 2020. How much do big education nonprofits pay their bosses? Quite a bit it turns out. https:\/\/www.washingtonpost.com\/news\/answer-sheet\/wp\/2015\/09\/30\/how-much-do-big-education-nonprofits-pay-their-bosses-quite-a-bit-it-turns-out\/."},{"key":"e_1_3_2_2_62_1","doi-asserted-by":"publisher","DOI":"10.5555\/3305890.3306024"},{"key":"e_1_3_2_2_63_1","doi-asserted-by":"publisher","DOI":"10.18653\/v1\/D16-1193"},{"volume-title":"SkipFlow: Incorporating Neural Coherence Features for End-to-End Automatic Text Scoring. (11","year":"2017","author":"Tay Yi","key":"e_1_3_2_2_64_1"},{"key":"e_1_3_2_2_65_1","unstructured":"TechNavio. 2020. Global Higher Education Testing and Assessment Market 2020--2024. https:\/\/www.researchandmarkets.com\/reports\/5136950\/global-higher-education-testing-and-assessment."},{"volume-title":"US Department of Education","year":"2016","author":"Thomas Susan","key":"e_1_3_2_2_66_1"},{"volume-title":"UTAH STATE BOARD OF EDUCATION 2018--19 FINGERTIP FACTS. https:\/\/www.ets.org\/s\/gre\/pdf\/gre_guide_table1a.pdf.","year":"2020","author":"USBE.","key":"e_1_3_2_2_67_1"},{"key":"e_1_3_2_2_68_1","doi-asserted-by":"publisher","DOI":"10.5555\/3295222.3295349"},{"key":"e_1_3_2_2_69_1","doi-asserted-by":"publisher","DOI":"10.1109\/SLT48900.2021.9383553"},{"key":"e_1_3_2_2_70_1","doi-asserted-by":"publisher","DOI":"10.18653\/v1\/D18-1090"},{"key":"e_1_3_2_2_71_1","doi-asserted-by":"publisher","DOI":"10.1016\/S0167-6393(99)00044-8"},{"volume-title":"Morgan Funtowicz, and Jamie Brew.","year":"2019","author":"Wolf Thomas","key":"e_1_3_2_2_72_1"},{"key":"e_1_3_2_2_73_1","doi-asserted-by":"crossref","unstructured":"Wenting Xiong Keelan Evanini Klaus Zechner and Lei Chen. 2013. Automated content scoring of spoken responses containing multiple parts with factual information. In Speech and Language Technology in Education.","DOI":"10.21437\/SLaTE.2013-24"},{"volume-title":"Handbook of automated scoring: Theory into practice","author":"Yan Duanli","key":"e_1_3_2_2_74_1"},{"key":"e_1_3_2_2_75_1","doi-asserted-by":"publisher","DOI":"10.1162\/tacl_a_00062"},{"key":"e_1_3_2_2_76_1","doi-asserted-by":"publisher","DOI":"10.1016\/j.specom.2009.04.009"}],"event":{"name":"CIKM '21: The 30th ACM International Conference on Information and Knowledge Management","sponsor":["SIGWEB ACM Special Interest Group on Hypertext, Hypermedia, and Web","SIGIR ACM Special Interest Group on Information Retrieval"],"location":"Virtual Event Queensland Australia","acronym":"CIKM '21"},"container-title":["Proceedings of the 30th ACM International Conference on Information & Knowledge Management"],"original-title":[],"link":[{"URL":"https:\/\/dl.acm.org\/doi\/10.1145\/3459637.3482395","content-type":"unspecified","content-version":"vor","intended-application":"text-mining"},{"URL":"https:\/\/dl.acm.org\/doi\/pdf\/10.1145\/3459637.3482395","content-type":"unspecified","content-version":"vor","intended-application":"similarity-checking"}],"deposited":{"date-parts":[[2025,6,17]],"date-time":"2025-06-17T20:12:23Z","timestamp":1750191143000},"score":1,"resource":{"primary":{"URL":"https:\/\/dl.acm.org\/doi\/10.1145\/3459637.3482395"}},"subtitle":[],"short-title":[],"issued":{"date-parts":[[2021,10,26]]},"references-count":76,"alternative-id":["10.1145\/3459637.3482395","10.1145\/3459637"],"URL":"https:\/\/doi.org\/10.1145\/3459637.3482395","relation":{},"subject":[],"published":{"date-parts":[[2021,10,26]]},"assertion":[{"value":"2021-10-30","order":2,"name":"published","label":"Published","group":{"name":"publication_history","label":"Publication History"}}]}}