{"status":"ok","message-type":"work","message-version":"1.0.0","message":{"indexed":{"date-parts":[[2024,8,7]],"date-time":"2024-08-07T07:36:56Z","timestamp":1723016216585},"publisher-location":"California","reference-count":0,"publisher":"International Joint Conferences on Artificial Intelligence Organization","content-domain":{"domain":[],"crossmark-restriction":false},"short-container-title":[],"published-print":{"date-parts":[[2021,8]]},"abstract":"<jats:p>Transformer-based deep learning models have increasingly demonstrated high accuracy on many natural language processing (NLP) tasks. In this paper, we propose a compression-compilation co-design framework that can guarantee the identified model meets both resource and real-time specifications of mobile devices. Our framework applies a compiler-aware neural architecture optimization method (CANAO), which can generate the optimal compressed model that balances both accuracy and latency. We are able to achieve up to 7.8x speedup compared with TensorFlow-Lite with only minor accuracy loss. We present two types of BERT applications on mobile devices: Question Answering (QA) and Text Generation. Both can be executed in real-time with latency as low as 45ms. Videos for demonstrating the framework can be found on https:\/\/www.youtube.com\/watch?v=_WIRvK_2PZI<\/jats:p>","DOI":"10.24963\/ijcai.2021\/712","type":"proceedings-article","created":{"date-parts":[[2021,8,11]],"date-time":"2021-08-11T11:00:49Z","timestamp":1628679649000},"page":"5000-5003","source":"Crossref","is-referenced-by-count":2,"title":["A Compression-Compilation Framework for On-mobile Real-time BERT Applications"],"prefix":"10.24963","author":[{"given":"Wei","family":"Niu","sequence":"first","affiliation":[{"name":"William & Mary"}],"role":[{"role":"author","vocabulary":"crossref"}]},{"given":"Zhenglun","family":"Kong","sequence":"additional","affiliation":[{"name":"Northeastern University"}],"role":[{"role":"author","vocabulary":"crossref"}]},{"given":"Geng","family":"Yuan","sequence":"additional","affiliation":[{"name":"Northeastern University"}],"role":[{"role":"author","vocabulary":"crossref"}]},{"given":"Weiwen","family":"Jiang","sequence":"additional","affiliation":[{"name":"University of Notre Dame"}],"role":[{"role":"author","vocabulary":"crossref"}]},{"given":"Jiexiong","family":"Guan","sequence":"additional","affiliation":[{"name":"College of William and Mary"}],"role":[{"role":"author","vocabulary":"crossref"}]},{"given":"Caiwen","family":"Ding","sequence":"additional","affiliation":[{"name":"University of Connecticut"}],"role":[{"role":"author","vocabulary":"crossref"}]},{"given":"Pu","family":"Zhao","sequence":"additional","affiliation":[{"name":"Northeastern University"}],"role":[{"role":"author","vocabulary":"crossref"}]},{"given":"Sijia","family":"Liu","sequence":"additional","affiliation":[{"name":"MIT-IBM Watson AI Lab, IBM Research"}],"role":[{"role":"author","vocabulary":"crossref"}]},{"given":"Bin","family":"Ren","sequence":"additional","affiliation":[{"name":"William & Mary"}],"role":[{"role":"author","vocabulary":"crossref"}]},{"given":"Yanzhi","family":"Wang","sequence":"additional","affiliation":[{"name":"Northeastern University"}],"role":[{"role":"author","vocabulary":"crossref"}]}],"member":"10584","event":{"number":"30","sponsor":["International Joint Conferences on Artificial Intelligence Organization (IJCAI)"],"acronym":"IJCAI-2021","name":"Thirtieth International Joint Conference on Artificial Intelligence {IJCAI-21}","start":{"date-parts":[[2021,8,19]]},"theme":"Artificial Intelligence","location":"Montreal, Canada","end":{"date-parts":[[2021,8,27]]}},"container-title":["Proceedings of the Thirtieth International Joint Conference on Artificial Intelligence"],"original-title":[],"deposited":{"date-parts":[[2021,8,11]],"date-time":"2021-08-11T11:04:49Z","timestamp":1628679889000},"score":1,"resource":{"primary":{"URL":"https:\/\/www.ijcai.org\/proceedings\/2021\/712"}},"subtitle":[],"proceedings-subject":"Artificial Intelligence Research Articles","short-title":[],"issued":{"date-parts":[[2021,8]]},"references-count":0,"URL":"https:\/\/doi.org\/10.24963\/ijcai.2021\/712","relation":{},"subject":[],"published":{"date-parts":[[2021,8]]}}}