{"status":"ok","message-type":"work","message-version":"1.0.0","message":{"indexed":{"date-parts":[[2025,7,30]],"date-time":"2025-07-30T15:51:39Z","timestamp":1753890699405,"version":"3.41.2"},"reference-count":44,"publisher":"Frontiers Media SA","license":[{"start":{"date-parts":[[2025,6,18]],"date-time":"2025-06-18T00:00:00Z","timestamp":1750204800000},"content-version":"vor","delay-in-days":0,"URL":"https:\/\/creativecommons.org\/licenses\/by\/4.0\/"}],"content-domain":{"domain":["frontiersin.org"],"crossmark-restriction":true},"short-container-title":["Front. Bioinform."],"abstract":"<jats:p>With the rapid development of high-density molecular marker chips and high-throughput sequencing technologies, genomic selection\/prediction (GS\/GP) has been widely applied in plant breeding. <jats:italic>Arabidopsis thaliana<\/jats:italic>, as a common model organism, provides important resources for dissecting genetic variation and evolutionary mechanisms of complex traits. Quantitative traits are typically influenced by multiple minor-effect genes, which are often functionally related and can be enriched within gene ontology (GO) pathways. However, optimizing marker subsets associated with these pathways to enhance GP performance remains challenging. In this study, we propose an improved GS framework called binGO-GS by integrating GO-based biological priors with a novel bin-based combinatorial SNP subset selection strategy. We evaluated the performance of binGO-GS on nine quantitative traits from two <jats:italic>A. thaliana<\/jats:italic> datasets, comprising nearly 1,000 samples and over 1.8 million SNPs. Compared with using either the full marker set or randomly selected markers with Genomic BLUP (GBLUP), binGO-GS achieved statistically significant improvements in prediction accuracy across all traits. Similar improvements were observed across six additional regression models when applying binGO-GS instead of the full marker set. Furthermore, the selected markers for identical or similar morphological traits exhibited consistent patterns in quantity and genomic distribution, supporting the polygenic model of complex quantitative traits driven by minor-effect genes. Taken together, binGO-GS offers a powerful and interpretable approach to enhance GS performance, providing a methodological reference for accelerating plant breeding and germplasm innovation.<\/jats:p>","DOI":"10.3389\/fbinf.2025.1607119","type":"journal-article","created":{"date-parts":[[2025,6,18]],"date-time":"2025-06-18T05:46:00Z","timestamp":1750225560000},"update-policy":"https:\/\/doi.org\/10.3389\/crossmark-policy","source":"Crossref","is-referenced-by-count":0,"title":["Enhancing genomic prediction in Arabidopsis thaliana with optimized SNP subset by leveraging gene ontology priors and bin-based combinatorial optimization"],"prefix":"10.3389","volume":"5","author":[{"given":"Qingfang","family":"Ba","sequence":"first","affiliation":[]},{"given":"Heng","family":"Zhou","sequence":"additional","affiliation":[]},{"given":"Zheming","family":"Yuan","sequence":"additional","affiliation":[]},{"given":"Zhijun","family":"Dai","sequence":"additional","affiliation":[]}],"member":"1965","published-online":{"date-parts":[[2025,6,18]]},"reference":[{"key":"B1","doi-asserted-by":"publisher","first-page":"12","DOI":"10.1186\/s12711-020-00531-z","article-title":"Deep learning versus parametric and ensemble methods for genomic prediction of complex phenotypes","volume":"52","author":"Abdollahi-Arpanahi","year":"2020","journal-title":"Genet. Sel. Evol."},{"key":"B2","doi-asserted-by":"publisher","first-page":"481","DOI":"10.1016\/j.cell.2016.05.063","article-title":"1,135 genomes reveal the global pattern of polymorphism in Arabidopsis thaliana","volume":"166","author":"Alonso-Blanco","year":"2016","journal-title":"Cell"},{"key":"B3","doi-asserted-by":"publisher","first-page":"95","DOI":"10.1007\/s11831-021-09569-8","article-title":"Emerging trends in machine learning to predict crop yield and study its influential factors: a survey","volume":"29","author":"Bali","year":"2022","journal-title":"Arch. Comput. Methods Eng."},{"key":"B4","doi-asserted-by":"publisher","first-page":"1082","DOI":"10.2135\/cropsci2006.11.0690","article-title":"Prospects for genomewide selection for quantitative traits in maize","volume":"47","author":"Bernardo","year":"2007","journal-title":"Crop Sci."},{"key":"B5","doi-asserted-by":"publisher","first-page":"855","DOI":"10.1111\/1755-0998.12357","article-title":"Genotyping-in-Thousands by sequencing (GT-seq): a cost effective SNP genotyping method based on custom amplicon sequencing","volume":"15","author":"Campbell","year":"2015","journal-title":"Mol. Ecol. Resour."},{"key":"B6","doi-asserted-by":"publisher","first-page":"961","DOI":"10.1016\/j.tplants.2017.08.011","article-title":"Genomic selection in plant breeding: methods, models, and perspectives","volume":"22","author":"Crossa","year":"2017","journal-title":"Trends Plant Sci."},{"key":"B7","doi-asserted-by":"publisher","first-page":"109","DOI":"10.1534\/g3.119.400812","article-title":"Influence of genetic interactions on polygenic prediction","volume":"10","author":"Dai","year":"2020","journal-title":"G3 (Bethesda)"},{"key":"B8","doi-asserted-by":"publisher","first-page":"1105","DOI":"10.1007\/s00726-014-1667-5","article-title":"A pipeline for improved QSAR analysis of peptides: physiochemical property parameter selection via BMSF, near-neighbor sample selection via semivariogram, and weighted SVR regression and prediction","volume":"46","author":"Dai","year":"2014","journal-title":"Amino Acids"},{"key":"B9","doi-asserted-by":"publisher","first-page":"1040","DOI":"10.1016\/j.jad.2021.09.001","article-title":"Improving depression prediction using a novel feature selection algorithm coupled with context-aware analysis","volume":"295","author":"Dai","year":"2021","journal-title":"J. Affect. Disord."},{"key":"B10","doi-asserted-by":"publisher","first-page":"609117","DOI":"10.3389\/fgene.2020.609117","article-title":"Prior biological knowledge improves genomic prediction of growth-related traits in Arabidopsis thaliana","volume":"11","author":"Farooq","year":"2021","journal-title":"Front. Genet."},{"key":"B11","doi-asserted-by":"publisher","first-page":"2289","DOI":"10.1534\/genetics.107.084285","article-title":"Reproducing kernel Hilbert spaces regression methods for genomic assisted prediction of quantitative traits","volume":"178","author":"Gianola","year":"2008","journal-title":"Genetics"},{"key":"B12","doi-asserted-by":"publisher","first-page":"217","DOI":"10.1016\/j.livsci.2014.05.036","article-title":"Machine learning methods and predictive ability metrics for genome-wide prediction of complex traits","volume":"166","author":"Gonz\u00e1lez-Recio","year":"2014","journal-title":"Livest. Sci."},{"key":"B13","doi-asserted-by":"publisher","first-page":"5","DOI":"10.1105\/tpc.16.00551","article-title":"easyGWAS: a cloud-based platform for comparing the results of genome-wide association studies","volume":"29","author":"Grimm","year":"2017","journal-title":"Plant Cell"},{"key":"B14","doi-asserted-by":"publisher","first-page":"688","DOI":"10.1016\/j.cj.2020.04.005","article-title":"Harness the power of genomic selection and the potential of germplasm in crop breeding for global food security in the era with rapid climate change","volume":"8","author":"He","year":"2020","journal-title":"Crop J."},{"key":"B15","doi-asserted-by":"publisher","first-page":"1","DOI":"10.2135\/cropsci2008.08.0512","article-title":"Genomic selection for crop improvement","volume":"49","author":"Heffner","year":"2009","journal-title":"Crop Sci."},{"key":"B16","doi-asserted-by":"publisher","first-page":"103","DOI":"10.1007\/s00122-025-04894-z","article-title":"EBMGP: a deep learning model for genomic prediction based on Elastic Net feature selection and bidirectional encoder representations from transformer's embedding and multi-head attention pooling","volume":"138","author":"Ji","year":"2025","journal-title":"Theor. Appl. Genet."},{"key":"B17","doi-asserted-by":"publisher","first-page":"456","DOI":"10.1038\/sj.hdy.6800306","article-title":"Genetics of quantitative traits in Arabidopsis thaliana","volume":"91","author":"Kearsey","year":"2003","journal-title":"Heredity"},{"key":"B18","doi-asserted-by":"publisher","first-page":"983","DOI":"10.2307\/2533558","article-title":"Small sample inference for fixed effects from restricted maximum likelihood","volume":"53","author":"Kenward","year":"1997","journal-title":"Biometrics"},{"key":"B19","doi-asserted-by":"publisher","first-page":"14035","DOI":"10.1073\/pnas.1210730109","article-title":"Genetic and environmental risk factors in congenital heart disease functionally converge in protein networks driving heart development","volume":"109","author":"Lage","year":"2012","journal-title":"Proc. Natl. Acad. Sci. U. S. A."},{"key":"B20","doi-asserted-by":"publisher","first-page":"832","DOI":"10.1038\/nature09410","article-title":"Hundreds of variants clustered in genomic loci and biological pathways affect human height","volume":"467","author":"Lango Allen","year":"2010","journal-title":"Nature"},{"key":"B21","doi-asserted-by":"publisher","first-page":"237","DOI":"10.3389\/fgene.2018.00237","article-title":"Genomic prediction of breeding values using a subset of SNPs identified by three machine learning methods","volume":"9","author":"Li","year":"2018","journal-title":"Front. Genet."},{"key":"B22","doi-asserted-by":"publisher","first-page":"736620","DOI":"10.1016\/j.aquaculture.2021.736620","article-title":"Genomic selection using a subset of SNPs identified by genome-wide association analysis for disease resistance traits in aquaculture species","volume":"539","author":"Luo","year":"2021","journal-title":"Aquaculture"},{"key":"B23","doi-asserted-by":"publisher","first-page":"520","DOI":"10.1105\/tpc.113.121913","article-title":"Machine learning-based differential network analysis: a study of stress-responsive transcriptomes in Arabidopsis","volume":"26","author":"Ma","year":"2014","journal-title":"Plant Cell"},{"key":"B24","doi-asserted-by":"publisher","first-page":"1190","DOI":"10.1126\/science.1222794","article-title":"Systematic localization of common disease-associated variation in regulatory DNA","volume":"337","author":"Maurano","year":"2012","journal-title":"Science"},{"key":"B25","doi-asserted-by":"publisher","first-page":"1819","DOI":"10.1093\/genetics\/157.4.1819","article-title":"Prediction of total genetic value using genome-wide dense marker maps","volume":"157","author":"Meuwissen","year":"2001","journal-title":"Genetics"},{"key":"B26","doi-asserted-by":"publisher","first-page":"246","DOI":"10.1038\/nature10989","article-title":"Sporadic autism exomes reveal a highly interconnected protein network of de novo mutations","volume":"485","author":"O'Roak","year":"2012","journal-title":"Nature"},{"key":"B27","doi-asserted-by":"publisher","first-page":"307","DOI":"10.3389\/fgene.2012.00307","article-title":"Inferring quantitative trait pathways associated with bull fertility from a genome-wide association study","volume":"3","author":"Pe\u00f1agaricano","year":"2013","journal-title":"Front. Genet."},{"key":"B28","doi-asserted-by":"publisher","first-page":"483","DOI":"10.1534\/genetics.114.164442","article-title":"Genome-wide regression and prediction with the BGLR statistical package","volume":"198","author":"P\u00e9rez","year":"2014","journal-title":"Genetics"},{"key":"B29","doi-asserted-by":"publisher","first-page":"611506","DOI":"10.3389\/fgene.2021.611506","article-title":"Feature selection stability and accuracy of prediction models for genomic prediction of residual feed intake in pigs using machine learning","volume":"12","author":"Piles","year":"2021","journal-title":"Front. Genet."},{"key":"B30","doi-asserted-by":"publisher","first-page":"559","DOI":"10.1086\/519795","article-title":"PLINK: a tool set for whole-genome association and population-based linkage analyses","volume":"81","author":"Purcell","year":"2007","journal-title":"Am. J. Hum. Genet."},{"key":"B31","doi-asserted-by":"publisher","first-page":"129","DOI":"10.1002\/aepp.13044","article-title":"Role of new plant breeding technologies for food security and sustainable agricultural development","volume":"42","author":"Qaim","year":"2020","journal-title":"Appl. Econ. Perspect."},{"key":"B32","doi-asserted-by":"publisher","first-page":"113222","DOI":"10.1016\/j.phytochem.2022.113222","article-title":"iPReditor-CMG: improving a predictive RNA editor for crop mitochondrial genomes using genomic sequence features and an optimal support vector machine","volume":"200","author":"Qin","year":"2022","journal-title":"Phytochemistry"},{"key":"B33","doi-asserted-by":"publisher","first-page":"1047","DOI":"10.1016\/j.molp.2017.06.008","article-title":"Crop breeding chips and genotyping platforms: progress, challenges, and perspectives","volume":"10","author":"Rasheed","year":"2017","journal-title":"Mol. Plant"},{"key":"B34","doi-asserted-by":"publisher","first-page":"550","DOI":"10.3389\/fpls.2017.00550","article-title":"Genomic selection for drought tolerance using genome-wide SNPs in maize","volume":"8","author":"Shikha","year":"2017","journal-title":"Front. Plant Sci."},{"key":"B35","doi-asserted-by":"publisher","first-page":"195","DOI":"10.9787\/PBB.2014.2.3.195","article-title":"High-throughput SNP genotyping to accelerate crop improvement","volume":"2","author":"Thomson","year":"2014","journal-title":"Plant Breed. Biotechnol."},{"key":"B36","doi-asserted-by":"publisher","first-page":"3416","DOI":"10.1073\/pnas.1709141115","article-title":"Adaptive diversification of growth allometry in the plant Arabidopsis thaliana","volume":"115","author":"Vasseur","year":"2018","journal-title":"Proc. Natl. Acad. Sci. U. S. A."},{"key":"B37","doi-asserted-by":"publisher","first-page":"279","DOI":"10.1016\/j.molp.2022.11.004","article-title":"DNNGP, a deep neural network-based method for genomic prediction using multi-omics data in plants","volume":"16","author":"Wang","year":"2023","journal-title":"Mol. Plant"},{"key":"B38","doi-asserted-by":"publisher","first-page":"67","DOI":"10.1109\/4235.585893","article-title":"No free lunch theorems for optimization","volume":"1","author":"Wolpert","year":"1997","journal-title":"IEEE Trans. Evol. Comput."},{"key":"B39","doi-asserted-by":"publisher","first-page":"101199","DOI":"10.1016\/j.xplc.2024.101199","article-title":"Metabolic marker-assisted genomic prediction improves hybrid breeding","volume":"6","author":"Xu","year":"2025","journal-title":"Plant Commun."},{"key":"B40","doi-asserted-by":"publisher","first-page":"76","DOI":"10.1016\/j.ajhg.2010.11.011","article-title":"GCTA: a tool for genome-wide complex trait analysis","volume":"88","author":"Yang","year":"2011","journal-title":"Am. J. Hum. Genet."},{"key":"B41","doi-asserted-by":"publisher","first-page":"537","DOI":"10.1111\/eva.13240","article-title":"Increased accuracy of genomic predictions for growth under chronic thermal stress in rainbow trout by prioritizing variants from GWAS using imputed sequence data","volume":"15","author":"Yoshida","year":"2022","journal-title":"Evol. Appl."},{"key":"B42","doi-asserted-by":"publisher","first-page":"141","DOI":"10.1093\/molbev\/msy203","article-title":"A polygenic genetic architecture of flowering time in the worldwide Arabidopsis thaliana population","volume":"36","author":"Zan","year":"2019","journal-title":"Mol. Biol. Evol."},{"key":"B43","doi-asserted-by":"publisher","first-page":"e93017","DOI":"10.1371\/journal.pone.0093017","article-title":"Improving the accuracy of whole genome prediction for complex traits using the results of genome wide association studies","volume":"9","author":"Zhang","year":"2014","journal-title":"PLoS One"},{"key":"B44","doi-asserted-by":"publisher","first-page":"821","DOI":"10.1038\/ng.2310","article-title":"Genome-wide efficient mixed-model analysis for association studies","volume":"44","author":"Zhou","year":"2012","journal-title":"Nat. Genet."}],"container-title":["Frontiers in Bioinformatics"],"original-title":[],"link":[{"URL":"https:\/\/www.frontiersin.org\/articles\/10.3389\/fbinf.2025.1607119\/full","content-type":"unspecified","content-version":"vor","intended-application":"similarity-checking"}],"deposited":{"date-parts":[[2025,6,18]],"date-time":"2025-06-18T05:46:01Z","timestamp":1750225561000},"score":1,"resource":{"primary":{"URL":"https:\/\/www.frontiersin.org\/articles\/10.3389\/fbinf.2025.1607119\/full"}},"subtitle":[],"short-title":[],"issued":{"date-parts":[[2025,6,18]]},"references-count":44,"alternative-id":["10.3389\/fbinf.2025.1607119"],"URL":"https:\/\/doi.org\/10.3389\/fbinf.2025.1607119","relation":{},"ISSN":["2673-7647"],"issn-type":[{"type":"electronic","value":"2673-7647"}],"subject":[],"published":{"date-parts":[[2025,6,18]]},"article-number":"1607119"}}