{"status":"ok","message-type":"work","message-version":"1.0.0","message":{"indexed":{"date-parts":[[2026,3,4]],"date-time":"2026-03-04T02:48:24Z","timestamp":1772592504595,"version":"3.50.1"},"publisher-location":"New York, NY, USA","reference-count":35,"publisher":"ACM","license":[{"start":{"date-parts":[[2020,3,17]],"date-time":"2020-03-17T00:00:00Z","timestamp":1584403200000},"content-version":"vor","delay-in-days":0,"URL":"https:\/\/www.acm.org\/publications\/policies\/copyright_policy#Background"}],"content-domain":{"domain":["dl.acm.org"],"crossmark-restriction":true},"short-container-title":[],"published-print":{"date-parts":[[2020,3,17]]},"DOI":"10.1145\/3377325.3377515","type":"proceedings-article","created":{"date-parts":[[2020,3,4]],"date-time":"2020-03-04T23:14:49Z","timestamp":1583363689000},"page":"22-32","update-policy":"https:\/\/doi.org\/10.1145\/crossmark-policy","source":"Crossref","is-referenced-by-count":36,"title":["VASTA"],"prefix":"10.1145","author":[{"given":"Alborz Rezazadeh","family":"Sereshkeh","sequence":"first","affiliation":[{"name":"Samsung AI Centre Toronto"}],"role":[{"role":"author","vocabulary":"crossref"}]},{"given":"Gary","family":"Leung","sequence":"additional","affiliation":[{"name":"Samsung AI Centre Toronto"}],"role":[{"role":"author","vocabulary":"crossref"}]},{"given":"Krish","family":"Perumal","sequence":"additional","affiliation":[{"name":"Samsung AI Centre Toronto"}],"role":[{"role":"author","vocabulary":"crossref"}]},{"given":"Caleb","family":"Phillips","sequence":"additional","affiliation":[{"name":"Samsung AI Centre Toronto"}],"role":[{"role":"author","vocabulary":"crossref"}]},{"given":"Minfan","family":"Zhang","sequence":"additional","affiliation":[{"name":"Samsung AI Centre Toronto"}],"role":[{"role":"author","vocabulary":"crossref"}]},{"given":"Afsaneh","family":"Fazly","sequence":"additional","affiliation":[{"name":"Samsung AI Centre 
Toronto"}],"role":[{"role":"author","vocabulary":"crossref"}]},{"given":"Iqbal","family":"Mohomed","sequence":"additional","affiliation":[{"name":"Samsung AI Centre Toronto"}],"role":[{"role":"author","vocabulary":"crossref"}]}],"member":"320","published-online":{"date-parts":[[2020,3,17]]},"reference":[{"key":"e_1_3_2_1_1_1","doi-asserted-by":"crossref","unstructured":"Amos Azaria Jayant Krishnamurthy and Tom M Mitchell. 2016. Instructable Intelligent Personal Agent. In AAAI. 2681--2689.","DOI":"10.1609\/aaai.v30i1.10357"},{"key":"e_1_3_2_1_2_1","first-page":"72","article-title":"Autonomous Learning of User's Preferences Improved through User Feedback","volume":"396","author":"Aztiria Asier","year":"2008","unstructured":"Asier Aztiria, Juan Carlos Augusto, and Alberto Izaguirre. 2008. Autonomous Learning of User's Preferences Improved through User Feedback. BMI 396 (2008), 72--86.","journal-title":"BMI"},{"key":"e_1_3_2_1_3_1","doi-asserted-by":"publisher","DOI":"10.1145\/2380116.2380129"},{"key":"e_1_3_2_1_4_1","volume-title":"Towards zero-shot frame semantic parsing for domain scaling. arXiv preprint arXiv:1707.02363","author":"Bapna Ankur","year":"2017","unstructured":"Ankur Bapna, Gokhan Tur, Dilek Hakkani-Tur, and Larry Heck. 2017. Towards zero-shot frame semantic parsing for domain scaling. arXiv preprint arXiv:1707.02363 (2017)."},{"key":"e_1_3_2_1_5_1","volume-title":"Proceedings of the 2013 Conference on Empirical Methods in Natural Language Processing. 
1533--1544","author":"Berant Jonathan","year":"2013","unstructured":"Jonathan Berant, Andrew Chou, Roy Frostig, and Percy Liang. 2013. Semantic parsing on freebase from question-answer pairs. In Proceedings of the 2013 Conference on Empirical Methods in Natural Language Processing. 1533--1544."},{"key":"e_1_3_2_1_6_1","doi-asserted-by":"publisher","DOI":"10.1145\/3038912.3052562"},{"key":"e_1_3_2_1_7_1","volume-title":"Nan Hua, Nicole Limtiaco, Rhomni St. John, Noah Constant, Mario Guajardo-Cespedes, Steve Yuan, Chris Tar, Yun-Hsuan Sung, Brian Strope, and Ray Kurzweil.","author":"Cer Daniel","year":"2018","unstructured":"Daniel Cer, Yinfei Yang, Sheng yi Kong, Nan Hua, Nicole Limtiaco, Rhomni St. John, Noah Constant, Mario Guajardo-Cespedes, Steve Yuan, Chris Tar, Yun-Hsuan Sung, Brian Strope, and Ray Kurzweil. 2018. Universal Sentence Encoder, Vol. https:\/\/arxiv.org\/pdf\/1803.11175.pdf."},{"key":"e_1_3_2_1_8_1","volume-title":"Proceedings of LREC","volume":"6","author":"De Marneffe Marie-Catherine","year":"2006","unstructured":"Marie-Catherine De Marneffe, Bill MacCartney, Christopher D Manning, and others. 2006. Generating typed dependency parses from phrase structure parses. In Proceedings of LREC, Vol. 6. 
Genoa Italy, 449--454."},{"key":"e_1_3_2_1_9_1","doi-asserted-by":"publisher","DOI":"10.1145\/3126594.3126651"},{"key":"e_1_3_2_1_10_1","doi-asserted-by":"publisher","DOI":"10.1145\/2984511.2984581"},{"key":"e_1_3_2_1_11_1","doi-asserted-by":"publisher","DOI":"10.1145\/1753326.1753554"},{"key":"e_1_3_2_1_12_1","doi-asserted-by":"publisher","DOI":"10.1145\/6462.6502"},{"key":"e_1_3_2_1_13_1","doi-asserted-by":"publisher","DOI":"10.1145\/3332165.3347873"},{"key":"e_1_3_2_1_14_1","doi-asserted-by":"publisher","DOI":"10.21437\/Interspeech.2015-70"},{"key":"e_1_3_2_1_15_1","doi-asserted-by":"publisher","DOI":"10.1145\/1719970.1719973"},{"key":"e_1_3_2_1_16_1","doi-asserted-by":"publisher","DOI":"10.1145\/3025171.3025176"},{"key":"e_1_3_2_1_17_1","doi-asserted-by":"publisher","DOI":"10.1145\/3025453.3025483"},{"key":"e_1_3_2_1_18_1","volume-title":"PUMICE: A Multi-Modal Agent that Learns Concepts and Conditionals from Natural Language and Demonstrations. arXiv preprint arXiv.1909.00031","author":"Jia-Jun Li Toby","year":"2019","unstructured":"Toby Jia-Jun Li, Marissa Radensky, Justin Jia, Kirielle Singarajah, Tom M Mitchell, and Brad A Myers. 2019. PUMICE: A Multi-Modal Agent that Learns Concepts and Conditionals from Natural Language and Demonstrations. arXiv preprint arXiv.1909.00031 (2019)."},{"key":"e_1_3_2_1_19_1","volume-title":"Proceedings of the 16th Annual International Conference on Mobile Systems, Applications, and Services. ACM, 96--109","author":"Jia-Jun Li Toby","year":"2018","unstructured":"Toby Jia-Jun Li and Oriana Riva. 2018. KITE: Building conversational bots from mobile apps. In Proceedings of the 16th Annual International Conference on Mobile Systems, Applications, and Services. 
ACM, 96--109."},{"key":"e_1_3_2_1_20_1","volume-title":"Focal loss for dense object detection","author":"Lin Tsung-Yi","year":"2018","unstructured":"Tsung-Yi Lin, Priyal Goyal, Ross Girshick, Kaiming He, and Piotr Doll\u00e1r. 2018. Focal loss for dense object detection. IEEE transactions on pattern analysis and machine intelligence (2018)."},{"key":"e_1_3_2_1_21_1","volume-title":"Learning Design Semantics for Mobile Apps. In The 31st Annual ACM Symposium on User Interface Software and Technology. ACM, 569--579","author":"Liu Thomas F","year":"2018","unstructured":"Thomas F Liu, Mark Craft, Jason Situ, Ersin Yumer, Radomir Mech, and Ranjitha Kumar. 2018. Learning Design Semantics for Mobile Apps. In The 31st Annual ACM Symposium on User Interface Software and Technology. ACM, 569--579."},{"key":"e_1_3_2_1_22_1","doi-asserted-by":"publisher","DOI":"10.1145\/2493190.2493216"},{"key":"e_1_3_2_1_23_1","volume-title":"Manning","author":"Pennington Jeffrey","year":"2014","unstructured":"Jeffrey Pennington, Richard Socher, and Christopher D. Manning. 2014. 
GloVe: Global Vectors for Word Representation. In Empirical Methods in Natural Language Processing (EMNLP). 1532--1543. http:\/\/www.aclweb.org\/anthology\/D14-1162"},{"key":"e_1_3_2_1_24_1","volume-title":"Pixel data access: interprocess communication in the user interface for end-user programming and graphical macros","author":"Potter Richard Lee","unstructured":"Richard Lee Potter. 1999. Pixel data access: interprocess communication in the user interface for end-user programming and graphical macros. University of Maryland at College Park."},{"key":"e_1_3_2_1_25_1","doi-asserted-by":"publisher","DOI":"10.1080\/01621459.1971.10482356"},{"key":"e_1_3_2_1_26_1","volume-title":"Universal semantic parsing. arXiv preprint arXiv:1702.03196","author":"Reddy Siva","year":"2017","unstructured":"Siva Reddy, Oscar T\u00e4ckstr\u00f6m, Slav Petrov, Mark Steedman, and Mirella Lapata. 2017. Universal semantic parsing. arXiv preprint arXiv:1702.03196 (2017)."},{"key":"e_1_3_2_1_27_1","volume-title":"Yolov3: An incremental improvement. arXiv preprint arXiv:1804.02767","author":"Redmon Joseph","year":"2018","unstructured":"Joseph Redmon and Ali Farhadi. 2018. Yolov3: An incremental improvement. arXiv preprint arXiv:1804.02767 (2018)."},{"key":"e_1_3_2_1_28_1","volume-title":"Sikuli: Using GUI Screenshots for Search and Automation.","author":"Tsung-Hsiang Chang Tom Yeh","year":"2009","unstructured":"
Tom Yeh Tsung-Hsiang Chang Robert and C Miller. 2009. Sikuli: Using GUI Screenshots for Search and Automation. (2009)."},{"key":"e_1_3_2_1_29_1","doi-asserted-by":"publisher","DOI":"10.1145\/2700648.2811322"},{"key":"e_1_3_2_1_30_1","doi-asserted-by":"publisher","DOI":"10.18653\/v1\/P18-4018"},{"key":"e_1_3_2_1_31_1","first-page":"13","article-title":"Combinatory categorial grammar. Nontransformational Syntax: A Guide to Current Models. Blackwell","volume":"9","author":"Steedman Mark","year":"2009","unstructured":"Mark Steedman and Jason Baldridge. 2009. Combinatory categorial grammar. Nontransformational Syntax: A Guide to Current Models. Blackwell, Oxford 9 (2009), 13--67.","journal-title":"Oxford"},{"key":"e_1_3_2_1_32_1","volume-title":"Frame-semantic parsing with softmax-margin segmental rnns and a syntactic scaffold. arXiv preprint arXiv:1706.09528","author":"Swayamdipta Swabha","year":"2017","unstructured":"Swabha Swayamdipta, Sam Thomson, Chris Dyer, and Noah A Smith. 2017. Frame-semantic parsing with softmax-margin segmental rnns and a syntactic scaffold. arXiv preprint arXiv:1706.09528 (2017)."},{"key":"e_1_3_2_1_33_1","doi-asserted-by":"publisher","DOI":"10.1145\/3213344.3213353"},{"key":"e_1_3_2_1_34_1","doi-asserted-by":"publisher","DOI":"10.1109\/TPAMI.2007.1078"},{"key":"e_1_3_2_1_35_1","volume-title":"Proceedings of the 21st Conference on Uncertainty in Artificial Intelligence","author":"Zettlemoyer Luke S","year":"2005","unstructured":"Luke S Zettlemoyer and Michael Collins. 2005. 
Learning to map sentences to logical form: Structured classification with probabilistic categorial grammars. Proceedings of the 21st Conference on Uncertainty in Artificial Intelligence (2005)."}],"event":{"name":"IUI '20: 25th International Conference on Intelligent User Interfaces","location":"Cagliari Italy","acronym":"IUI '20","sponsor":["SIGAI ACM Special Interest Group on Artificial Intelligence","SIGCHI ACM Special Interest Group on Computer-Human Interaction"]},"container-title":["Proceedings of the 25th International Conference on Intelligent User Interfaces"],"original-title":[],"link":[{"URL":"https:\/\/dl.acm.org\/doi\/10.1145\/3377325.3377515","content-type":"unspecified","content-version":"vor","intended-application":"text-mining"},{"URL":"https:\/\/dl.acm.org\/doi\/pdf\/10.1145\/3377325.3377515","content-type":"unspecified","content-version":"vor","intended-application":"similarity-checking"}],"deposited":{"date-parts":[[2025,6,17]],"date-time":"2025-06-17T22:33:17Z","timestamp":1750199597000},"score":1,"resource":{"primary":{"URL":"https:\/\/dl.acm.org\/doi\/10.1145\/3377325.3377515"}},"subtitle":["a vision and language-assisted smartphone task automation system"],"short-title":[],"issued":{"date-parts":[[2020,3,17]]},"references-count":35,"alternative-id":["10.1145\/3377325.3377515","10.1145\/3377325"],"URL":"https:\/\/doi.org\/10.1145\/3377325.3377515","relation":{},"subject":[],"published":{"date-parts":[[2020,3,17]]},"assertion":[{"value":"2020-03-17","order":2,"name":"published","label":"Published","group":{"name":"publication_history","label":"Publication History"}}]}}