{"status":"ok","message-type":"work","message-version":"1.0.0","message":{"indexed":{"date-parts":[[2026,4,19]],"date-time":"2026-04-19T08:10:16Z","timestamp":1776586216755,"version":"3.51.2"},"reference-count":39,"publisher":"Oxford University Press (OUP)","issue":"1","license":[{"start":{"date-parts":[[2023,12,7]],"date-time":"2023-12-07T00:00:00Z","timestamp":1701907200000},"content-version":"vor","delay-in-days":15,"URL":"https:\/\/creativecommons.org\/licenses\/by\/4.0\/"}],"funder":[{"name":"Research Program of Zhejiang Lab","award":["2021PE0AC04"],"award-info":[{"award-number":["2021PE0AC04"]}]},{"name":"Research Program of Zhejiang Lab","award":["2021PE0AC05"],"award-info":[{"award-number":["2021PE0AC05"]}]}],"content-domain":{"domain":[],"crossmark-restriction":false},"short-container-title":[],"published-print":{"date-parts":[[2023,11,22]]},"abstract":"<jats:title>Abstract<\/jats:title>\n               <jats:p>Genomic prediction (GP) uses single nucleotide polymorphisms (SNPs) to establish associations between markers and phenotypes. Selection of early individuals by genomic estimated breeding value shortens the generation interval and speeds up the breeding process. Recently, methods based on deep learning (DL) have gained great attention in the field of GP. In this study, we explore the application of Transformer-based structures to GP and develop a novel deep-learning model named GPformer. GPformer obtains a global view by gleaning beneficial information from all relevant SNPs regardless of the physical distance between SNPs. Comprehensive experimental results on five different crop datasets show that GPformer outperforms ridge regression-based linear unbiased prediction (RR-BLUP), support vector regression (SVR), light gradient boosting machine (LightGBM) and deep neural network genomic prediction (DNNGP) in terms of mean absolute error, Pearson\u2019s correlation coefficient and the proposed metric consistent index. Furthermore, we introduce a knowledge-guided module (KGM) to extract genome-wide association studies-based information, which is fused into GPformer as prior knowledge. KGM is very flexible and can be plugged into any DL network. Ablation studies of KGM on three datasets illustrate the efficiency of KGM adequately. Moreover, GPformer is robust and stable to hyperparameters and can generalize to each phenotype of every dataset, which is suitable for practical application scenarios.<\/jats:p>","DOI":"10.1093\/bib\/bbad438","type":"journal-article","created":{"date-parts":[[2023,12,7]],"date-time":"2023-12-07T06:37:57Z","timestamp":1701931077000},"source":"Crossref","is-referenced-by-count":34,"title":["A transformer-based genomic prediction method fused with knowledge-guided module"],"prefix":"10.1093","volume":"25","author":[{"ORCID":"https:\/\/orcid.org\/0009-0008-2407-5839","authenticated-orcid":false,"given":"Cuiling","family":"Wu","sequence":"first","affiliation":[{"name":"Institute of Intelligent Computing, Zhejiang Lab , Hangzhou 311121, China"}]},{"given":"Yiyi","family":"Zhang","sequence":"additional","affiliation":[{"name":"Institute of Intelligent Computing, Zhejiang Lab , Hangzhou 311121, China"}]},{"given":"Zhiwen","family":"Ying","sequence":"additional","affiliation":[{"name":"Institute of Intelligent Computing, Zhejiang Lab , Hangzhou 311121, China"}]},{"given":"Ling","family":"Li","sequence":"additional","affiliation":[{"name":"Institute of Intelligent Computing, Zhejiang Lab , Hangzhou 311121, China"}]},{"given":"Jun","family":"Wang","sequence":"additional","affiliation":[{"name":"Institute of Intelligent Computing, Zhejiang Lab , Hangzhou 311121, China"}]},{"given":"Hui","family":"Yu","sequence":"additional","affiliation":[{"name":"Northeast Institute of Geography and Agroecology, Chinese Academy of Sciences , Changchun 130012, China"}]},{"given":"Mengchen","family":"Zhang","sequence":"additional","affiliation":[{"name":"State Key Laboratory of Rice Biology, China National Rice Research Institute , Hangzhou 310006, China"}]},{"given":"Xianzhong","family":"Feng","sequence":"additional","affiliation":[{"name":"Institute of Intelligent Computing, Zhejiang Lab , Hangzhou 311121, China"},{"name":"Northeast Institute of Geography and Agroecology, Chinese Academy of Sciences , Changchun 130012, China"}]},{"given":"Xinghua","family":"Wei","sequence":"additional","affiliation":[{"name":"Institute of Intelligent Computing, Zhejiang Lab , Hangzhou 311121, China"},{"name":"State Key Laboratory of Rice Biology, China National Rice Research Institute , Hangzhou 310006, China"}]},{"given":"Xiaogang","family":"Xu","sequence":"additional","affiliation":[{"name":"School of Computer and Information Engineering, Zhejiang Gongshang University , Hangzhou 310018, China"}]}],"member":"286","published-online":{"date-parts":[[2023,12,6]]},"reference":[{"key":"2023120706071358500_ref1","doi-asserted-by":"crossref","first-page":"1819","DOI":"10.1093\/genetics\/157.4.1819","article-title":"Prediction of total genetic value using genome-wide dense marker maps","volume":"157","author":"Meuwissen","year":"2001","journal-title":"Genetics"},{"key":"2023120706071358500_ref2","doi-asserted-by":"crossref","first-page":"961","DOI":"10.1016\/j.tplants.2017.08.011","article-title":"Genomic selection in plant breeding: methods, models, and perspectives","volume":"22","author":"Crossa","year":"2017","journal-title":"Trends Plant Sci"},{"key":"2023120706071358500_ref3","doi-asserted-by":"crossref","first-page":"221","DOI":"10.3389\/fgene.2016.00221","article-title":"Genomic selection in the era of next generation sequencing for complex traits in plant breeding","volume":"7","author":"Bhat","year":"2016","journal-title":"Front Genet"},{"key":"2023120706071358500_ref4","doi-asserted-by":"crossref","first-page":"1697","DOI":"10.1007\/s00122-016-2733-z","article-title":"Genomic selection for wheat traits and trait stability","volume":"129","author":"Huang","year":"2016","journal-title":"Theor Appl Genet"},{"key":"2023120706071358500_ref5","doi-asserted-by":"crossref","first-page":"4414","DOI":"10.3168\/jds.2007-0980","article-title":"Efficient methods to compute genomic predictions","volume":"91","author":"VanRaden","year":"2008","journal-title":"J Dairy Sci"},{"key":"2023120706071358500_ref6","doi-asserted-by":"crossref","first-page":"409","DOI":"10.1111\/j.1439-0388.2011.00964.x","article-title":"Using the genomic relationship matrix to predict the accuracy of genomic selection","volume":"128","author":"Goddard","year":"2011","journal-title":"J Anim Breed Genet"},{"key":"2023120706071358500_ref7","doi-asserted-by":"crossref","first-page":"250","DOI":"10.3835\/plantgenome2011.08.0024","article-title":"Ridge regression and other kernels for genomic selection with R package rrBLUP","volume":"4","author":"Endelman","year":"2011","journal-title":"The plant genome"},{"key":"2023120706071358500_ref8","volume-title":"Proceedings of the Ninth International Workshop on Artificial Intelligence and Statistics"},{"key":"2023120706071358500_ref9","doi-asserted-by":"crossref","first-page":"969","DOI":"10.1534\/genetics.112.139014","article-title":"Back to basics for Bayesian model building in genomic selection","volume":"191","author":"K\u00e4rkk\u00e4inen","year":"2012","journal-title":"Genetics"},{"key":"2023120706071358500_ref10","doi-asserted-by":"crossref","first-page":"1","DOI":"10.1186\/s12711-020-00531-z","article-title":"Deep learning versus parametric and ensemble methods for genomic prediction of complex phenotypes","volume":"52","author":"Abdollahi-Arpanahi","year":"2020","journal-title":"Genetics Selection Evolution"},{"key":"2023120706071358500_ref11","article-title":"Lightgbm: a highly efficient gradient boosting decision tree","volume":"30","author":"Ke","year":"2017","journal-title":"Adv Neural Inf Process"},{"key":"2023120706071358500_ref12","doi-asserted-by":"crossref","first-page":"1065","DOI":"10.1007\/s00122-011-1648-y","article-title":"Application of support vector regression to genome-assisted prediction of quantitative traits","volume":"123","author":"Long","year":"2011","journal-title":"Theor Appl Genet"},{"key":"2023120706071358500_ref13","doi-asserted-by":"crossref","first-page":"153354","DOI":"10.1016\/j.jplph.2020.153354","article-title":"Machine learning approaches for crop improvement: leveraging phenotypic and genotypic big data","volume":"257","author":"Tong","year":"2021","journal-title":"J Plant Physiol"},{"key":"2023120706071358500_ref14","doi-asserted-by":"crossref","first-page":"170104","DOI":"10.3835\/plantgenome2017.11.0104","article-title":"Applications of machine learning methods to genomic selection in breeding wheat for rust resistance","volume":"11","author":"Gonz\u00e1lez-Camacho","year":"2018","journal-title":"Plant Genome"},{"key":"2023120706071358500_ref15","doi-asserted-by":"crossref","first-page":"1","DOI":"10.1186\/s13059-021-02492-y","article-title":"LightGBM: accelerated genomically designed crop breeding through ensemble learning","volume":"22","author":"Yan","year":"2021","journal-title":"Genome Biol"},{"key":"2023120706071358500_ref16"},{"key":"2023120706071358500_ref17","doi-asserted-by":"crossref","first-page":"1091","DOI":"10.3389\/fgene.2019.01091","article-title":"Phenotype prediction and genome-wide association study using deep convolutional neural network of soybean","volume":"10","author":"Liu","year":"2019","journal-title":"Front Genet"},{"key":"2023120706071358500_ref18","doi-asserted-by":"crossref","first-page":"279","DOI":"10.1016\/j.molp.2022.11.004","article-title":"DNNGP, a deep neural network-based method for genomic prediction using multi-omics data in plants","volume":"16","author":"Wang","year":"2023","journal-title":"Mol Plant"},{"key":"2023120706071358500_ref19","article-title":"Attention is all you need","volume":"30","author":"Vaswani","year":"2017","journal-title":"Adv Neural Inf Process Syst"},{"key":"2023120706071358500_ref20","first-page":"1877","article-title":"Language models are few-shot learners","volume":"33","author":"Brown","year":"2020","journal-title":"Adv Neural Inf Process Syst"},{"key":"2023120706071358500_ref21","volume-title":"Proceedings of the 2019 Conference of the North American Chapter of the Association for Computational Linguistics: Human Language Technologies"},{"key":"2023120706071358500_ref22","volume-title":"Proceedings of the AAAI Conference on Artificial Intelligence"},{"key":"2023120706071358500_ref23","first-page":"22419","article-title":"Autoformer: decomposition transformers with auto-correlation for long-term series forecasting","volume":"34","author":"Wu","year":"2021","journal-title":"Adv Neural Inf Process Syst"},{"key":"2023120706071358500_ref24","doi-asserted-by":"crossref","first-page":"2112","DOI":"10.1093\/bioinformatics\/btab083","article-title":"DNABERT: pre-trained bidirectional encoder representations from transformers model for DNA-language in genome","volume":"37","author":"Ji","year":"2021","journal-title":"Bioinformatics"},{"key":"2023120706071358500_ref25","doi-asserted-by":"crossref","first-page":"1196","DOI":"10.1038\/s41592-021-01252-x","article-title":"Effective gene expression prediction from sequence by integrating long-range interactions","volume":"18","author":"Avsec","year":"2021","journal-title":"Nat Methods"},{"key":"2023120706071358500_ref26","doi-asserted-by":"crossref","first-page":"395","DOI":"10.1038\/hdy.2015.113","article-title":"Genome-wide prediction models that incorporate de novo GWAS are a powerful new tool for tropical rice improvement","volume":"116","author":"Spindel","year":"2016","journal-title":"Heredity"},{"key":"2023120706071358500_ref27","doi-asserted-by":"crossref","first-page":"e93017","DOI":"10.1371\/journal.pone.0093017","article-title":"Improving the accuracy of whole genome prediction for complex traits using the results of genome wide association studies","volume":"9","author":"Zhang","year":"2014","journal-title":"PLoS One"},{"key":"2023120706071358500_ref28","doi-asserted-by":"crossref","first-page":"s13742-13015-10047-13748","DOI":"10.1186\/s13742-015-0047-8","article-title":"Second-generation PLINK: rising to the challenge of larger and richer datasets","volume":"4","author":"Chang","year":"2015","journal-title":"Gigascience"},{"key":"2023120706071358500_ref29","doi-asserted-by":"crossref","first-page":"2633","DOI":"10.1093\/bioinformatics\/btm308","article-title":"TASSEL: software for association mapping of complex traits in diverse samples","volume":"23","author":"Bradbury","year":"2007","journal-title":"Bioinformatics"},{"key":"2023120706071358500_ref30","doi-asserted-by":"crossref","first-page":"D1041","DOI":"10.1093\/nar\/gkm1022","article-title":"Panzea: an update on new content and features","volume":"36","author":"Canaran","year":"2007","journal-title":"Nucleic Acids Res"},{"key":"2023120706071358500_ref31","doi-asserted-by":"crossref","first-page":"1","DOI":"10.1186\/s12864-015-2245-2","article-title":"Genetic variation and association mapping for 12 agronomic traits in indica rice","volume":"16","author":"Lu","year":"2015","journal-title":"BMC Genomics"},{"key":"2023120706071358500_ref32","doi-asserted-by":"crossref","first-page":"637","DOI":"10.1104\/pp.105.063438","article-title":"The international Rice information system. A platform for meta-analysis of rice crop data","volume":"139","author":"McLaren","year":"2005","journal-title":"Plant Physiol"},{"key":"2023120706071358500_ref33","doi-asserted-by":"crossref","first-page":"1819","DOI":"10.1534\/g3.116.029637","article-title":"Genomic prediction of gene bank wheat landraces","volume":"6","author":"Crossa","year":"2016","journal-title":"G3: Genes, Genomes, Genetics"},{"key":"2023120706071358500_ref34","volume-title":"IEEE International Conference on Computer Vision"},{"key":"2023120706071358500_ref35","volume-title":"3rd International Conference on Learning Representations"},{"key":"2023120706071358500_ref36","article-title":"Pytorch: an imperative style, high-performance deep learning library","volume":"32","author":"Paszke","year":"2019","journal-title":"Adv Neural Inf Process Syst"},{"key":"2023120706071358500_ref38","volume-title":"8th International Conference on Learning Representations"},{"key":"2023120706071358500_ref39","doi-asserted-by":"crossref","first-page":"uhac225","DOI":"10.1093\/hr\/uhac225","article-title":"Application of machine learning to explore the genomic prediction accuracy of fall dormancy in autotetraploid alfalfa","volume":"10","author":"Zhang","year":"2023","journal-title":"Hortic Res"},{"key":"2023120706071358500_ref40","doi-asserted-by":"crossref","first-page":"859109","DOI":"10.3389\/fpls.2022.859109","article-title":"Genome-wide association study and genomic selection for proteinogenic methionine in soybean seeds","volume":"13","author":"Singer","year":"2022","journal-title":"Front Plant Sci"}],"container-title":["Briefings in Bioinformatics"],"original-title":[],"language":"en","link":[{"URL":"https:\/\/academic.oup.com\/bib\/article-pdf\/25\/1\/bbad438\/54036302\/bbad438.pdf","content-type":"application\/pdf","content-version":"vor","intended-application":"syndication"},{"URL":"https:\/\/academic.oup.com\/bib\/article-pdf\/25\/1\/bbad438\/54036302\/bbad438.pdf","content-type":"unspecified","content-version":"vor","intended-application":"similarity-checking"}],"deposited":{"date-parts":[[2023,12,7]],"date-time":"2023-12-07T06:38:15Z","timestamp":1701931095000},"score":1,"resource":{"primary":{"URL":"https:\/\/academic.oup.com\/bib\/article\/doi\/10.1093\/bib\/bbad438\/7459582"}},"subtitle":[],"short-title":[],"issued":{"date-parts":[[2023,11,22]]},"references-count":39,"journal-issue":{"issue":"1","published-print":{"date-parts":[[2023,11,22]]}},"URL":"https:\/\/doi.org\/10.1093\/bib\/bbad438","relation":{},"ISSN":["1467-5463","1477-4054"],"issn-type":[{"value":"1467-5463","type":"print"},{"value":"1477-4054","type":"electronic"}],"subject":[],"published-other":{"date-parts":[[2024,1,1]]},"published":{"date-parts":[[2023,11,22]]},"article-number":"bbad438"}}