{"status":"ok","message-type":"work","message-version":"1.0.0","message":{"indexed":{"date-parts":[[2026,4,18]],"date-time":"2026-04-18T05:18:12Z","timestamp":1776489492725,"version":"3.51.2"},"publisher-location":"New York, NY, USA","reference-count":28,"publisher":"ACM","license":[{"start":{"date-parts":[[2022,10,10]],"date-time":"2022-10-10T00:00:00Z","timestamp":1665360000000},"content-version":"vor","delay-in-days":0,"URL":"https:\/\/www.acm.org\/publications\/policies\/copyright_policy#Background"}],"funder":[{"name":"Project of Key Laboratory of Intelligent Processing Technology for Digital Music (Zhejiang Conservatory of Music), Ministry of Culture and Tourism","award":["No.2022DMKLB001"],"award-info":[{"award-number":["No.2022DMKLB001"]}]},{"name":"the Key R&D Program of Zhejiang Province","award":["No.2022C03126"],"award-info":[{"award-number":["No.2022C03126"]}]},{"name":"the Key Project of Natural Science Foundation of Zhejiang Province","award":["No.LZ19F020002"],"award-info":[{"award-number":["No.LZ19F020002"]}]}],"content-domain":{"domain":["dl.acm.org"],"crossmark-restriction":true},"short-container-title":[],"published-print":{"date-parts":[[2022,10,10]]},"DOI":"10.1145\/3503161.3548368","type":"proceedings-article","created":{"date-parts":[[2022,10,10]],"date-time":"2022-10-10T15:43:01Z","timestamp":1665416581000},"page":"1057-1067","update-policy":"https:\/\/doi.org\/10.1145\/crossmark-policy","source":"Crossref","is-referenced-by-count":14,"title":["SongDriver: Real-time Music Accompaniment Generation without Logical Latency nor Exposure Bias"],"prefix":"10.1145","author":[{"given":"Zihao","family":"Wang","sequence":"first","affiliation":[{"name":"Zhejiang University, Hangzhou, China"}],"role":[{"role":"author","vocabulary":"crossref"}]},{"given":"Kejun","family":"Zhang","sequence":"additional","affiliation":[{"name":"Zhejiang University; Alibaba-Zhejiang University Joint Institute of Frontier Technologies, Hangzhou, 
China"}],"role":[{"role":"author","vocabulary":"crossref"}]},{"given":"Yuxing","family":"Wang","sequence":"additional","affiliation":[{"name":"Zhejiang University, Hangzhou, China"}],"role":[{"role":"author","vocabulary":"crossref"}]},{"given":"Chen","family":"Zhang","sequence":"additional","affiliation":[{"name":"Zhejiang University, Hangzhou, China"}],"role":[{"role":"author","vocabulary":"crossref"}]},{"given":"Qihao","family":"Liang","sequence":"additional","affiliation":[{"name":"Zhejiang University, Hangzhou, China"}],"role":[{"role":"author","vocabulary":"crossref"}]},{"given":"Pengfei","family":"Yu","sequence":"additional","affiliation":[{"name":"Jingchu University of Technology, Jingmen, China"}],"role":[{"role":"author","vocabulary":"crossref"}]},{"given":"Yongsheng","family":"Feng","sequence":"additional","affiliation":[{"name":"Shandong University, Weihai, China"}],"role":[{"role":"author","vocabulary":"crossref"}]},{"given":"Wenbo","family":"Liu","sequence":"additional","affiliation":[{"name":"Zhejiang University, Hangzhou, China"}],"role":[{"role":"author","vocabulary":"crossref"}]},{"given":"Yikai","family":"Wang","sequence":"additional","affiliation":[{"name":"Zhejiang University, Hangzhou, China"}],"role":[{"role":"author","vocabulary":"crossref"}]},{"given":"Yuntao","family":"Bao","sequence":"additional","affiliation":[{"name":"Zhejiang University, Hangzhou, China"}],"role":[{"role":"author","vocabulary":"crossref"}]},{"given":"Yiheng","family":"Yang","sequence":"additional","affiliation":[{"name":"Zhejiang University, Hangzhou, China"}],"role":[{"role":"author","vocabulary":"crossref"}]}],"member":"320","published-online":{"date-parts":[[2022,10,10]]},"reference":[{"key":"e_1_3_2_2_1_1","doi-asserted-by":"publisher","DOI":"10.1017\/S1355771899002071"},{"key":"e_1_3_2_2_2_1","volume-title":"Transformer-xl: Attentive language models beyond a fixed-length context. 
arXiv preprint arXiv:1901.02860","author":"Dai Zihang","year":"2019","unstructured":"Zihang Dai, Zhilin Yang, Yiming Yang, Jaime Carbonell, Quoc V Le, and Ruslan Salakhutdinov. 2019. Transformer-xl: Attentive language models beyond a fixed-length context. arXiv preprint arXiv:1901.02860 (2019)."},{"key":"e_1_3_2_2_3_1","first-page":"193","article-title":"An on-line algorithm for real-time accompaniment","volume":"84","author":"Dannenberg Roger B","year":"1984","unstructured":"Roger B Dannenberg. 1984. An on-line algorithm for real-time accompaniment. In ICMC, Vol. 84. 193--198.","journal-title":"ICMC"},{"key":"e_1_3_2_2_4_1","volume-title":"Yiting Ethan Li, Garrison W Cottrell, and Julian McAuley.","author":"Donahue Chris","year":"2019","unstructured":"Chris Donahue, Huanru Henry Mao, Yiting Ethan Li, Garrison W Cottrell, and Julian McAuley. 2019. LakhNES: Improving multi-instrumental music generation with cross-domain pre-training. arXiv preprint arXiv:1907.04868 (2019)."},{"key":"e_1_3_2_2_5_1","doi-asserted-by":"publisher","DOI":"10.1609\/aaai.v32i1.11312"},
{"key":"e_1_3_2_2_6_1","volume-title":"ICASSP 2020-2020 IEEE International Conference on Acoustics, Speech and Signal Processing (ICASSP)","author":"Garoufis Christos","unstructured":"Christos Garoufis, Athanasia Zlatintsi, and Petros Maragos. 2020. An LSTM-Based Dynamic Chord Progression Generation System for Interactive Music Performance. In ICASSP 2020-2020 IEEE International Conference on Acoustics, Speech and Signal Processing (ICASSP). IEEE, 4502--4506."},{"key":"e_1_3_2_2_7_1","doi-asserted-by":"publisher","DOI":"10.1145\/1178723.1178727"},{"key":"e_1_3_2_2_8_1","doi-asserted-by":"publisher","DOI":"10.1109\/ICDE.2019.00201"},{"key":"e_1_3_2_2_9_1","unstructured":"Lejaren Arthur Hiller and Leonard M Isaacson. 1979. Experimental Music; Composition with an electronic computer. Greenwood Publishing Group Inc."},{"key":"e_1_3_2_2_10_1","volume-title":"Compound Word Transformer: Learning to compose full-song music over dynamic directed hypergraphs. arXiv preprint arXiv:2101.02402","author":"Hsiao Wen-Yi","year":"2021","unstructured":"Wen-Yi Hsiao, Jen-Yu Liu, Yin-Cheng Yeh, and Yi-Hsuan Yang. 2021. Compound Word Transformer: Learning to compose full-song music over dynamic directed hypergraphs. arXiv preprint arXiv:2101.02402 (2021)."},
{"key":"e_1_3_2_2_11_1","volume-title":"Music transformer. arXiv preprint arXiv:1809.04281","author":"Anna Huang Cheng-Zhi","year":"2018","unstructured":"Cheng-Zhi Anna Huang, Ashish Vaswani, Jakob Uszkoreit, Noam Shazeer, Ian Simon, Curtis Hawthorne, Andrew M Dai, Matthew D Hoffman, Monica Dinculescu, and Douglas Eck. 2018. Music transformer. arXiv preprint arXiv:1809.04281 (2018)."},{"key":"e_1_3_2_2_12_1","doi-asserted-by":"publisher","DOI":"10.1609\/aaai.v34i01.5413"},{"key":"e_1_3_2_2_13_1","unstructured":"John Lafferty Andrew McCallum and Fernando CN Pereira. 2001. Conditional random fields: Probabilistic models for segmenting and labeling sequence data. (2001)."},{"key":"e_1_3_2_2_14_1","unstructured":"Tzu-Hsuan Lin. 2021. Real-time pop music accompaniment generation according to vocal melody by deep learning models. https:\/\/tw40210.github.io\/Real-time-pop-music-accompaniment-generation-according-to-vocal-melody-by-deep-learning-models_DEMO\/."},{"key":"e_1_3_2_2_15_1","doi-asserted-by":"publisher","DOI":"10.1609\/aaai.v32i1.12141"},{"key":"e_1_3_2_2_16_1","volume-title":"URL https:\/\/openai.com\/blog\/musenet","author":"Payne Christine","year":"2019","unstructured":"Christine Payne. 2019. MuseNet, 2019. URL https:\/\/openai.com\/blog\/musenet (2019)."},{"key":"e_1_3_2_2_17_1","unstructured":"Christopher Raphael. 2010. Music plus one and machine learning. In ICML."},
{"key":"e_1_3_2_2_18_1","doi-asserted-by":"publisher","DOI":"10.1145\/3394171.3413721"},{"key":"e_1_3_2_2_19_1","doi-asserted-by":"publisher","DOI":"10.18653\/v1\/2020.acl-main.350"},{"key":"e_1_3_2_2_20_1","doi-asserted-by":"publisher","DOI":"10.1145\/1357054.1357169"},{"key":"e_1_3_2_2_21_1","volume-title":"Performance rnn: Generating music with expressive timing and dynamics. Magenta Blog","author":"Simon Ian","year":"2017","unstructured":"Ian Simon and Sageev Oore. 2017. Performance rnn: Generating music with expressive timing and dynamics. Magenta Blog (2017)."},{"key":"e_1_3_2_2_22_1","volume-title":"Attention is all you need. Advances in neural information processing systems","author":"Vaswani Ashish","year":"2017","unstructured":"Ashish Vaswani, Noam Shazeer, Niki Parmar, Jakob Uszkoreit, Llion Jones, Aidan N Gomez, Lukasz Kaiser, and Illia Polosukhin. 2017. Attention is all you need. Advances in neural information processing systems, Vol. 30 (2017)."},{"key":"e_1_3_2_2_23_1","doi-asserted-by":"publisher","DOI":"10.1109\/VLSID.2012.57"},{"key":"e_1_3_2_2_24_1","doi-asserted-by":"publisher","DOI":"10.1007\/978-3-030-16667-0_12"},{"key":"e_1_3_2_2_25_1","volume-title":"MuseMorphose: Full-song and fine-grained music style transfer with just one Transformer VAE. arXiv e-prints","author":"Wu Shih-Lun","year":"2021","unstructured":"Shih-Lun Wu and Yi-Hsuan Yang. 2021. MuseMorphose: Full-song and fine-grained music style transfer with just one Transformer VAE. arXiv e-prints (2021), arXiv-2105."},
{"key":"e_1_3_2_2_26_1","volume-title":"MidiNet: A convolutional generative adversarial network for symbolic-domain music generation. arXiv preprint arXiv:1703.10847","author":"Yang Li-Chia","year":"2017","unstructured":"Li-Chia Yang, Szu-Yu Chou, and Yi-Hsuan Yang. 2017. MidiNet: A convolutional generative adversarial network for symbolic-domain music generation. arXiv preprint arXiv:1703.10847 (2017)."},{"key":"e_1_3_2_2_27_1","doi-asserted-by":"publisher","DOI":"10.1007\/s00521-018-3849-7"},{"key":"e_1_3_2_2_28_1","doi-asserted-by":"publisher","DOI":"10.1080\/09298215.2021.1873392"}],"event":{"name":"MM '22: The 30th ACM International Conference on Multimedia","location":"Lisboa Portugal","acronym":"MM '22","sponsor":["SIGMM ACM Special Interest Group on Multimedia"]},"container-title":["Proceedings of the 30th ACM International Conference on 
Multimedia"],"original-title":[],"link":[{"URL":"https:\/\/dl.acm.org\/doi\/10.1145\/3503161.3548368","content-type":"unspecified","content-version":"vor","intended-application":"text-mining"},{"URL":"https:\/\/dl.acm.org\/doi\/pdf\/10.1145\/3503161.3548368","content-type":"unspecified","content-version":"vor","intended-application":"similarity-checking"}],"deposited":{"date-parts":[[2025,6,17]],"date-time":"2025-06-17T19:00:44Z","timestamp":1750186844000},"score":1,"resource":{"primary":{"URL":"https:\/\/dl.acm.org\/doi\/10.1145\/3503161.3548368"}},"subtitle":[],"short-title":[],"issued":{"date-parts":[[2022,10,10]]},"references-count":28,"alternative-id":["10.1145\/3503161.3548368","10.1145\/3503161"],"URL":"https:\/\/doi.org\/10.1145\/3503161.3548368","relation":{},"subject":[],"published":{"date-parts":[[2022,10,10]]},"assertion":[{"value":"2022-10-10","order":2,"name":"published","label":"Published","group":{"name":"publication_history","label":"Publication History"}}]}}