{"status":"ok","message-type":"work","message-version":"1.0.0","message":{"indexed":{"date-parts":[[2026,1,29]],"date-time":"2026-01-29T20:05:05Z","timestamp":1769717105250,"version":"3.49.0"},"reference-count":13,"publisher":"SAGE Publications","issue":"5","content-domain":{"domain":[],"crossmark-restriction":false},"short-container-title":["IFS"],"published-print":{"date-parts":[[2023,11,4]]},"abstract":"<jats:p>In recent years, research on automatic music transcription has made significant progress, as deep learning techniques have demonstrated strong performance in complex data applications. Although the existing work is exciting, it all relies on specific domain knowledge to design model architectures and training modes for different tasks. At the same time, the noise introduced during data collection for automatic music transcription cannot be ignored, which makes the existing work unsatisfactory. To address the issues highlighted above, we propose an end-to-end framework based on the Transformer. Using an encoder-decoder structure, we convert the spectrogram of the collected piano audio directly to MIDI output. Further, to reduce the impact of environmental noise on transcription quality, we design a training mechanism that mixes in white noise to improve the robustness of the proposed model. 
Our experiments on the classic piano transcription datasets show that the proposed method can greatly improve the quality of automatic music transcription.<\/jats:p>","DOI":"10.3233\/jifs-233653","type":"journal-article","created":{"date-parts":[[2023,9,1]],"date-time":"2023-09-01T11:19:06Z","timestamp":1693567146000},"page":"8441-8448","source":"Crossref","is-referenced-by-count":0,"title":["Piano automatic transcription based on transformer"],"prefix":"10.1177","volume":"45","author":[{"given":"Yuan","family":"Wang","sequence":"first","affiliation":[{"name":"School of Music, NanJing XiaoZhuang University, Nanjing, China"}]}],"member":"179","reference":[{"issue":"3","key":"10.3233\/JIFS-233653_ref1","doi-asserted-by":"crossref","first-page":"407","DOI":"10.1007\/s10844-013-0258-3","article-title":"Automatic music transcription: challenges and future directions","volume":"41","author":"Benetos","year":"2013","journal-title":"Journal of Intelligent Information Systems"},{"issue":"6","key":"10.3233\/JIFS-233653_ref2","doi-asserted-by":"crossref","first-page":"1643","DOI":"10.1109\/TASL.2009.2038819","article-title":"Multipitch estimation of piano sounds using a new probabilistic spectral smoothness principle","volume":"18","author":"Emiya","year":"2009","journal-title":"IEEE Transactions on Audio, Speech, and Language Processing"},{"issue":"1","key":"10.3233\/JIFS-233653_ref6","doi-asserted-by":"crossref","first-page":"1","DOI":"10.1038\/s41467-020-19266-y","article-title":"State-of-the-art augmented NLP transformer models for direct and single-step retrosynthesis","volume":"11","author":"Tetko","year":"2020","journal-title":"Nature Communications"},{"issue":"6","key":"10.3233\/JIFS-233653_ref8","doi-asserted-by":"crossref","first-page":"830","DOI":"10.1093\/bioinformatics\/btaa880","article-title":"MolTrans: Molecular Interaction Transformer for drug\u2013target interaction 
prediction","volume":"37","author":"Huang","year":"2021","journal-title":"Bioinformatics"},{"issue":"10","key":"10.3233\/JIFS-233653_ref9","doi-asserted-by":"crossref","first-page":"1600","DOI":"10.1109\/TASLP.2015.2442411","article-title":"Combining spectral and temporal representations for multipitch estimation of polyphonic music","volume":"23","author":"Su","year":"2015","journal-title":"IEEE\/ACM Transactions on Audio, Speech, and Language Processing"},{"issue":"3","key":"10.3233\/JIFS-233653_ref10","doi-asserted-by":"crossref","first-page":"519","DOI":"10.1109\/TASL.2009.2029769","article-title":"Generative spectrogram factorization models for polyphonic piano transcription","volume":"18","author":"Peeling","year":"2009","journal-title":"IEEE Transactions on Audio, Speech, And Language Processing"},{"issue":"8","key":"10.3233\/JIFS-233653_ref11","doi-asserted-by":"crossref","first-page":"2121","DOI":"10.1109\/TASL.2010.2042119","article-title":"Multiple fundamental frequency estimation by modeling spectral peaks and non-peak regions","volume":"18","author":"Duan","year":"2010","journal-title":"IEEE Transactions on Audio, Speech, and Language Processing"},{"issue":"7","key":"10.3233\/JIFS-233653_ref13","doi-asserted-by":"crossref","first-page":"1405","DOI":"10.1109\/TMM.2017.2674603","article-title":"Instrument learning and sparse NMD for automatic polyphonic music transcription","volume":"19","author":"Rizzi","year":"2017","journal-title":"IEEE Transactions on Multimedia"},{"key":"10.3233\/JIFS-233653_ref16","doi-asserted-by":"crossref","first-page":"3707","DOI":"10.1109\/TASLP.2021.3121991","article-title":"High-resolution piano transcription with pedals by regressing onset and offset times","volume":"29","author":"Kong","year":"2021","journal-title":"IEEE\/ACM Transactions on Audio, Speech, and Language 
Processing"},{"issue":"1","key":"10.3233\/JIFS-233653_ref19","doi-asserted-by":"crossref","first-page":"446","DOI":"10.1121\/10.0001468","article-title":"Polyphonic pitch tracking with deep layered learning","volume":"148","author":"Elowsson","year":"2020","journal-title":"Journal of the Acoustical Society of America"},{"key":"10.3233\/JIFS-233653_ref22","doi-asserted-by":"crossref","first-page":"109134","DOI":"10.1016\/j.sigpro.2023.109134","article-title":"Polyphonic piano transcription based on graph convolutional network","volume":"212","author":"Zhe","year":"2023","journal-title":"Signal Processing"},{"issue":"1","key":"10.3233\/JIFS-233653_ref26","doi-asserted-by":"crossref","first-page":"6706","DOI":"10.1609\/aaai.v33i01.33016706","article-title":"Neural speech synthesis with transformer network","volume":"33","author":"Li","year":"2019","journal-title":"Proceedings of the AAAI Conference on Artificial Intelligence"},{"key":"10.3233\/JIFS-233653_ref27","doi-asserted-by":"crossref","first-page":"1","DOI":"10.1155\/2011\/347645","article-title":"Real-time audio transformer emulation for virtual tube amplifiers","volume":"2011","author":"Cauduro Dias de Paiva","year":"2011","journal-title":"EURASIP Journal on Advances in Signal Processing"}],"container-title":["Journal of Intelligent &amp; Fuzzy 
Systems"],"original-title":[],"link":[{"URL":"https:\/\/content.iospress.com\/download?id=10.3233\/JIFS-233653","content-type":"unspecified","content-version":"vor","intended-application":"similarity-checking"}],"deposited":{"date-parts":[[2026,1,29]],"date-time":"2026-01-29T08:55:37Z","timestamp":1769676937000},"score":1,"resource":{"primary":{"URL":"https:\/\/journals.sagepub.com\/doi\/full\/10.3233\/JIFS-233653"}},"subtitle":[],"short-title":[],"issued":{"date-parts":[[2023,11,4]]},"references-count":13,"journal-issue":{"issue":"5"},"URL":"https:\/\/doi.org\/10.3233\/jifs-233653","relation":{},"ISSN":["1064-1246","1875-8967"],"issn-type":[{"value":"1064-1246","type":"print"},{"value":"1875-8967","type":"electronic"}],"subject":[],"published":{"date-parts":[[2023,11,4]]}}}