{"status":"ok","message-type":"work","message-version":"1.0.0","message":{"indexed":{"date-parts":[[2026,1,17]],"date-time":"2026-01-17T19:29:52Z","timestamp":1768678192231,"version":"3.49.0"},"reference-count":77,"publisher":"Springer Science and Business Media LLC","issue":"1","license":[{"start":{"date-parts":[[2022,3,21]],"date-time":"2022-03-21T00:00:00Z","timestamp":1647820800000},"content-version":"tdm","delay-in-days":0,"URL":"https:\/\/creativecommons.org\/licenses\/by\/4.0"},{"start":{"date-parts":[[2022,3,21]],"date-time":"2022-03-21T00:00:00Z","timestamp":1647820800000},"content-version":"vor","delay-in-days":0,"URL":"https:\/\/creativecommons.org\/licenses\/by\/4.0"}],"funder":[{"DOI":"10.13039\/501100003484","name":"Heinrich-Heine-Universit\u00e4t D\u00fcsseldorf","doi-asserted-by":"crossref","id":[{"id":"10.13039\/501100003484","id-type":"DOI","asserted-by":"crossref"}]}],"content-domain":{"domain":["link.springer.com"],"crossmark-restriction":false},"short-container-title":["BMC Bioinformatics"],"abstract":"<jats:title>Abstract<\/jats:title><jats:sec><jats:title>Background<\/jats:title><jats:p>Genetic risk scores (GRS) summarize genetic features such as single nucleotide polymorphisms (SNPs) in a single statistic with respect to a given trait. So far, GRS are typically built using generalized linear models or regularized extensions. However, these linear methods are usually not able to incorporate gene-gene interactions or non-linear SNP-response relationships. Tree-based statistical learning methods such as random forests and logic regression may be an alternative to such regularized-regression-based methods and are investigated in this article. 
Moreover, we consider modifications of random forests and logic regression for the construction of GRS.<\/jats:p><\/jats:sec><jats:sec><jats:title>Results<\/jats:title><jats:p>In an extensive simulation study and an application to a real data set from a German cohort study, we show that both tree-based approaches can outperform elastic net when constructing GRS for binary traits. Especially a modification of logic regression called logic bagging could induce comparatively high predictive power as measured by the area under the curve and the statistical power. Even when considering no epistatic interaction effects but only marginal genetic effects, the regularized regression method led in most cases to inferior results.<\/jats:p><\/jats:sec><jats:sec><jats:title>Conclusions<\/jats:title><jats:p>When constructing GRS, we recommend taking random forests and logic bagging into account, in particular, if it can be assumed that possibly unknown epistasis between SNPs is present. To develop the best possible prediction models, extensive joint hyperparameter optimizations should be conducted.<\/jats:p><\/jats:sec>","DOI":"10.1186\/s12859-022-04634-w","type":"journal-article","created":{"date-parts":[[2022,3,21]],"date-time":"2022-03-21T11:03:04Z","timestamp":1647860584000},"update-policy":"https:\/\/doi.org\/10.1007\/springer_crossmark_policy","source":"Crossref","is-referenced-by-count":16,"title":["Evaluation of tree-based statistical learning methods for constructing genetic risk 
scores"],"prefix":"10.1186","volume":"23","author":[{"ORCID":"https:\/\/orcid.org\/0000-0002-5327-8351","authenticated-orcid":false,"given":"Michael","family":"Lau","sequence":"first","affiliation":[],"role":[{"role":"author","vocabulary":"crossref"}]},{"given":"Claudia","family":"Wigmann","sequence":"additional","affiliation":[],"role":[{"role":"author","vocabulary":"crossref"}]},{"given":"Sara","family":"Kress","sequence":"additional","affiliation":[],"role":[{"role":"author","vocabulary":"crossref"}]},{"given":"Tamara","family":"Schikowski","sequence":"additional","affiliation":[],"role":[{"role":"author","vocabulary":"crossref"}]},{"given":"Holger","family":"Schwender","sequence":"additional","affiliation":[],"role":[{"role":"author","vocabulary":"crossref"}]}],"member":"297","published-online":{"date-parts":[[2022,3,21]]},"reference":[{"issue":"1","key":"4634_CR1","doi-asserted-by":"publisher","first-page":"59","DOI":"10.1111\/j.1749-6632.2010.05838.x","volume":"1212","author":"LK Billings","year":"2010","unstructured":"Billings LK, Florez JC. The genetics of type 2 diabetes: what have we learned from GWAS? Ann N Y Acad Sci. 2010;1212(1):59\u201377.","journal-title":"Ann N Y Acad Sci"},{"issue":"9","key":"4634_CR2","doi-asserted-by":"publisher","first-page":"2759","DOI":"10.1038\/s41596-020-0353-1","volume":"15","author":"SW Choi","year":"2020","unstructured":"Choi SW, Mak TSH, O\u2019Reilly PF. Tutorial: a guide to performing polygenic risk score analyses. Nat Protoc. 2020;15(9):2759\u201372.","journal-title":"Nat Protoc"},{"issue":"3","key":"4634_CR3","doi-asserted-by":"publisher","first-page":"1","DOI":"10.1371\/journal.pgen.1003348","volume":"9","author":"F Dudbridge","year":"2013","unstructured":"Dudbridge F. Power and predictive accuracy of polygenic risk scores. PLoS Genet. 
2013;9(3):1\u201317.","journal-title":"PLoS Genet"},{"issue":"9","key":"4634_CR4","doi-asserted-by":"publisher","first-page":"581","DOI":"10.1038\/s41576-018-0018-x","volume":"19","author":"A Torkamani","year":"2018","unstructured":"Torkamani A, Wineinger NE, Topol EJ. The personal and clinical utility of polygenic risk scores. Nat Rev Genet. 2018;19(9):581\u201390.","journal-title":"Nat Rev Genet"},{"issue":"1","key":"4634_CR5","doi-asserted-by":"publisher","first-page":"101","DOI":"10.1001\/jamapsychiatry.2020.3049","volume":"78","author":"NR Wray","year":"2021","unstructured":"Wray NR, Lin T, Austin J, McGrath JJ, Hickie IB, Murray GK, et al. From basic science to clinical application of polygenic risk scores: a primer. JAMA Psychiat. 2021;78(1):101\u20139.","journal-title":"JAMA Psychiat"},{"issue":"3","key":"4634_CR6","doi-asserted-by":"publisher","first-page":"432","DOI":"10.1016\/j.ajhg.2020.07.006","volume":"107","author":"M Thomas","year":"2020","unstructured":"Thomas M, Sakoda LC, Hoffmeister M, Rosenthal EA, Lee JK, van Duijnhoven FJB, et al. Genome-wide modeling of polygenic risk score in colorectal cancer risk. Am J Hum Genet. 2020;107(3):432\u201344.","journal-title":"Am J Hum Genet"},{"issue":"7","key":"4634_CR7","doi-asserted-by":"publisher","first-page":"643","DOI":"10.1002\/gepi.20509","volume":"34","author":"C Kooperberg","year":"2010","unstructured":"Kooperberg C, LeBlanc M, Obenchain V. Risk prediction using genome-wide association studies. Genet Epidemiol. 2010;34(7):643\u201352.","journal-title":"Genet Epidemiol"},{"key":"4634_CR8","doi-asserted-by":"crossref","unstructured":"Gilbert-Diamond D, Moore JH. Analysis of gene\u2013gene interactions. Curr Protocols Human Genet. 2011;70(1):1.14.1\u20131.14.12.","DOI":"10.1002\/0471142905.hg0114s70"},{"issue":"8","key":"4634_CR9","doi-asserted-by":"publisher","first-page":"157","DOI":"10.21037\/atm.2018.04.05","volume":"6","author":"MD Ritchie","year":"2018","unstructured":"Ritchie MD, Van Steen K. 
The search for gene-gene interactions in genome-wide association studies: challenges in abundance of methods, practical considerations, and biological interpretation. Ann Transl Med. 2018;6(8):157.","journal-title":"Ann Transl Med"},{"key":"4634_CR10","doi-asserted-by":"publisher","first-page":"138","DOI":"10.3389\/fgene.2013.00138","volume":"4","author":"R Che","year":"2013","unstructured":"Che R, Motsinger-Reif A. Evaluation of genetic risk score models in the presence of interaction and linkage disequilibrium. Front Genet. 2013;4:138.","journal-title":"Front Genet"},{"issue":"1","key":"4634_CR11","doi-asserted-by":"publisher","first-page":"55","DOI":"10.1186\/s12863-017-0519-1","volume":"18","author":"A H\u00fcls","year":"2017","unstructured":"H\u00fcls A, Ickstadt K, Schikowski T, Kr\u00e4mer U. Detection of gene-environment interactions in the presence of linkage disequilibrium and noise by using genetic risk scores with internal weights from elastic net regression. BMC Genet. 2017;18(1):55.","journal-title":"BMC Genet"},{"issue":"6","key":"4634_CR12","doi-asserted-by":"publisher","first-page":"764","DOI":"10.1006\/pmed.1996.0117","volume":"25","author":"R Ottman","year":"1996","unstructured":"Ottman R. Gene-environment interaction: definitions and study design. Prev Med. 1996;25(6):764\u201370.","journal-title":"Prev Med"},{"issue":"1","key":"4634_CR13","doi-asserted-by":"crossref","first-page":"267","DOI":"10.1111\/j.2517-6161.1996.tb02080.x","volume":"58","author":"R Tibshirani","year":"1996","unstructured":"Tibshirani R. Regression shrinkage and selection via the Lasso. J R Stat Soc Ser B (Methodol). 1996;58(1):267\u201388.","journal-title":"J R Stat Soc Ser B (Methodol)"},{"issue":"2","key":"4634_CR14","doi-asserted-by":"publisher","first-page":"301","DOI":"10.1111\/j.1467-9868.2005.00503.x","volume":"67","author":"H Zou","year":"2005","unstructured":"Zou H, Hastie T. Regularization and variable selection via the elastic net. 
J R Stat Soc Ser B (Stat Methodol). 2005;67(2):301\u201320.","journal-title":"J R Stat Soc Ser B (Stat Methodol)"},{"issue":"1","key":"4634_CR15","doi-asserted-by":"publisher","first-page":"21","DOI":"10.1016\/j.ajhg.2018.11.002","volume":"104","author":"N Mavaddat","year":"2019","unstructured":"Mavaddat N, Michailidou K, Dennis J, Lush M, Fachal L, Lee A, et al. Polygenic risk scores for prediction of breast cancer and breast cancer subtypes. Am J Human Genet. 2019;104(1):21\u201334.","journal-title":"Am J Human Genet"},{"issue":"1","key":"4634_CR16","doi-asserted-by":"publisher","first-page":"65","DOI":"10.1534\/genetics.119.302019","volume":"212","author":"F Priv\u00e9","year":"2019","unstructured":"Priv\u00e9 F, Aschard H, Blum MGB. Efficient implementation of penalized regression for genetic risk prediction. Genetics. 2019;212(1):65\u201374.","journal-title":"Genetics"},{"issue":"1","key":"4634_CR17","doi-asserted-by":"publisher","first-page":"5","DOI":"10.1023\/A:1010933404324","volume":"45","author":"L Breiman","year":"2001","unstructured":"Breiman L. Random forests. Mach Learn. 2001;45(1):5\u201332.","journal-title":"Mach Learn"},{"key":"4634_CR18","doi-asserted-by":"crossref","unstructured":"Fu H, Zhang Q, Qiu G. Random forest for image annotation. In: Computer Vision\u2014ECCV 2012. Berlin: Springer; 2012. p. 86\u201399.","DOI":"10.1007\/978-3-642-33783-3_7"},{"key":"4634_CR19","doi-asserted-by":"crossref","unstructured":"Elagamy MN, Stanier C, Sharp B. Stock market random forest-text mining system mining critical indicators of stock market movements. In: 2018 2nd international conference on natural language and speech processing (ICNLSP); 2018. p. 1\u20138.","DOI":"10.1109\/ICNLSP.2018.8374370"},{"issue":"3","key":"4634_CR20","doi-asserted-by":"publisher","first-page":"133","DOI":"10.3390\/ijgi8030133","volume":"8","author":"M Hao","year":"2019","unstructured":"Hao M, Jiang D, Ding F, Fu J, Chen S. 
Simulating spatio-temporal patterns of terrorism incidents on the Indochina Peninsula with GIS and the random forest method. ISPRS Int J Geo-Inf. 2019;8(3):133.","journal-title":"ISPRS Int J Geo-Inf"},{"key":"4634_CR21","volume-title":"Classification and regression trees","author":"L Breiman","year":"1984","unstructured":"Breiman L, Friedman J, Stone CJ, Olshen RA. Classification and regression trees. Boca Raton: CRC Press; 1984."},{"issue":"1","key":"4634_CR22","doi-asserted-by":"publisher","first-page":"164","DOI":"10.1186\/1471-2105-13-164","volume":"13","author":"SJ Winham","year":"2012","unstructured":"Winham SJ, Colby CL, Freimuth RR, Wang X, de Andrade M, Huebner M, et al. SNP interaction detection with Random Forests in high-dimensional genetic data. BMC Bioinform. 2012;13(1):164.","journal-title":"BMC Bioinform"},{"issue":"3","key":"4634_CR23","doi-asserted-by":"publisher","first-page":"475","DOI":"10.1198\/1061860032238","volume":"12","author":"I Ruczinski","year":"2003","unstructured":"Ruczinski I, Kooperberg C, LeBlanc M. Logic Regression. J Comput Graph Stat. 2003;12(3):475\u2013511.","journal-title":"J Comput Graph Stat"},{"issue":"1","key":"4634_CR24","doi-asserted-by":"publisher","first-page":"187","DOI":"10.1093\/biostatistics\/kxm024","volume":"9","author":"H Schwender","year":"2007","unstructured":"Schwender H, Ickstadt K. Identification of SNP interactions using logic regression. Biostatistics. 2007;9(1):187\u201398.","journal-title":"Biostatistics"},{"issue":"2","key":"4634_CR25","doi-asserted-by":"publisher","first-page":"157","DOI":"10.1002\/gepi.20042","volume":"28","author":"C Kooperberg","year":"2005","unstructured":"Kooperberg C, Ruczinski I. Identifying interacting SNPs using Monte Carlo logic regression. Genet Epidemiol. 
2005;28(2):157\u201370.","journal-title":"Genet Epidemiol"},{"issue":"10","key":"4634_CR26","doi-asserted-by":"publisher","first-page":"1","DOI":"10.1371\/journal.pone.0043035","volume":"7","author":"I Dinu","year":"2012","unstructured":"Dinu I, Mahasirimongkol S, Liu Q, Yanai H, Sharaf Eldin N, Kreiter E, et al. SNP-SNP interactions discovered by logic regression explain Crohn\u2019s disease genetics. PLoS ONE. 2012;7(10):1\u20136.","journal-title":"PLoS ONE"},{"issue":"10","key":"4634_CR27","doi-asserted-by":"publisher","first-page":"1639","DOI":"10.1007\/s00439-012-1194-y","volume":"131","author":"J Kruppa","year":"2012","unstructured":"Kruppa J, Ziegler A, K\u00f6nig IR. Risk estimation and risk prediction using machine-learning methods. Hum Genet. 2012;131(10):1639\u201354.","journal-title":"Hum Genet"},{"issue":"4","key":"4634_CR28","doi-asserted-by":"publisher","first-page":"1","DOI":"10.1371\/journal.pone.0093379","volume":"9","author":"V Botta","year":"2014","unstructured":"Botta V, Louppe G, Geurts P, Wehenkel L. Exploiting SNP correlations within random forest for genome-wide association studies. PLoS ONE. 2014;9(4):1\u201311.","journal-title":"PLoS ONE"},{"issue":"2","key":"4634_CR29","doi-asserted-by":"publisher","first-page":"125","DOI":"10.1002\/gepi.22279","volume":"44","author":"D Gola","year":"2020","unstructured":"Gola D, Erdmann J, M\u00fcller-Myhsok B, Schunkert H, K\u00f6nig IR. Polygenic risk scores outperform machine learning methods in predicting coronary artery disease status. Genet Epidemiol. 2020;44(2):125\u201338.","journal-title":"Genet Epidemiol"},{"issue":"4","key":"4634_CR30","doi-asserted-by":"publisher","first-page":"359","DOI":"10.1038\/s10038-020-00832-7","volume":"66","author":"A Badr\u00e9","year":"2021","unstructured":"Badr\u00e9 A, Zhang L, Muchero W, Reynolds JC, Pan C. Deep neural network improves the estimation of polygenic risk scores for breast cancer. J Hum Genet. 
2021;66(4):359\u201369.","journal-title":"J Hum Genet"},{"issue":"7","key":"4634_CR31","first-page":"268","volume":"2","author":"W Yoo","year":"2012","unstructured":"Yoo W, Ference BA, Cote ML, Schwartz A. A comparison of logistic regression, logic regression, classification tree, and random forests to identify effective gene-gene and gene-environmental interactions. Int J Appl Sci Technol. 2012;2(7):268.","journal-title":"Int J Appl Sci Technol"},{"key":"4634_CR32","doi-asserted-by":"publisher","DOI":"10.1007\/978-0-387-84858-7","volume-title":"The elements of statistical learning: data mining, inference, and prediction","author":"T Hastie","year":"2009","unstructured":"Hastie T, Tibshirani R, Friedman J. The elements of statistical learning: data mining, inference, and prediction. New York: Springer; 2009."},{"key":"4634_CR33","doi-asserted-by":"crossref","unstructured":"Li RH, Belford GG. Instability of decision tree classification algorithms. In: Proceedings of the eighth ACM SIGKDD international conference on knowledge discovery and data mining. New York: Association for Computing Machinery; 2002. p. 570\u2013575.","DOI":"10.1145\/775047.775131"},{"issue":"2","key":"4634_CR34","doi-asserted-by":"publisher","first-page":"123","DOI":"10.1007\/BF00058655","volume":"24","author":"L Breiman","year":"1996","unstructured":"Breiman L. Bagging predictors. Mach Learn. 1996;24(2):123\u201340.","journal-title":"Mach Learn"},{"issue":"1","key":"4634_CR35","doi-asserted-by":"publisher","first-page":"74","DOI":"10.3414\/ME00-01-0052","volume":"51","author":"JD Malley","year":"2012","unstructured":"Malley JD, Kruppa J, Dasgupta A, Malley KG, Ziegler A. Probability machines: consistent probability estimation using nonparametric learning machines. Methods Inf Med. 
2012;51(1):74\u201381.","journal-title":"Methods Inf Med"},{"issue":"3","key":"4634_CR36","doi-asserted-by":"publisher","first-page":"199","DOI":"10.1023\/A:1024099825458","volume":"52","author":"F Provost","year":"2003","unstructured":"Provost F, Domingos P. Tree induction for probability-based ranking. Mach Learn. 2003;52(3):199\u2013215.","journal-title":"Mach Learn"},{"issue":"11","key":"4634_CR37","doi-asserted-by":"publisher","first-page":"1","DOI":"10.18637\/jss.v036.i11","volume":"36","author":"MB Kursa","year":"2010","unstructured":"Kursa MB, Rudnicki WR. Feature selection with the Boruta package. J Stat Softw. 2010;36(11):1\u201313.","journal-title":"J Stat Softw"},{"issue":"4","key":"4634_CR38","doi-asserted-by":"publisher","first-page":"885","DOI":"10.1007\/s11634-016-0276-4","volume":"12","author":"S Janitza","year":"2018","unstructured":"Janitza S, Celik E, Boulesteix AL. A computationally fast variable importance test for random forests for high-dimensional data. Adv Data Anal Classif. 2018;12(4):885\u2013915.","journal-title":"Adv Data Anal Classif"},{"issue":"10","key":"4634_CR39","doi-asserted-by":"publisher","first-page":"1340","DOI":"10.1093\/bioinformatics\/btq134","volume":"26","author":"A Altmann","year":"2010","unstructured":"Altmann A, Tolo\u015fi L, Sander O, Lengauer T. Permutation importance: a corrected feature importance measure. Bioinformatics. 2010;26(10):1340\u20137.","journal-title":"Bioinformatics"},{"issue":"2","key":"4634_CR40","doi-asserted-by":"publisher","first-page":"492","DOI":"10.1093\/bib\/bbx124","volume":"20","author":"F Degenhardt","year":"2017","unstructured":"Degenhardt F, Seifert S, Szymczak S. Evaluation of variable selection methods for random forests and omics data sets. Brief Bioinform. 
2017;20(2):492\u2013503.","journal-title":"Brief Bioinform"},{"issue":"1","key":"4634_CR41","doi-asserted-by":"publisher","first-page":"1","DOI":"10.18637\/jss.v077.i01","volume":"77","author":"MN Wright","year":"2017","unstructured":"Wright MN, Ziegler A. ranger: a fast implementation of random forests for high dimensional data in C++ and R. J Stat Softw. 2017;77(1):1\u201317.","journal-title":"J Stat Softw"},{"issue":"4598","key":"4634_CR42","doi-asserted-by":"publisher","first-page":"671","DOI":"10.1126\/science.220.4598.671","volume":"220","author":"S Kirkpatrick","year":"1983","unstructured":"Kirkpatrick S, Gelatt CD, Vecchi MP. Optimization by simulated annealing. Science. 1983;220(4598):671\u201380.","journal-title":"Science"},{"key":"4634_CR43","unstructured":"Kooperberg C, Ruczinski I. LogicReg: Logic Regression; 2021. R package version 1.6.3."},{"key":"4634_CR44","unstructured":"Schwender H, Tietz T. logicFS: Identification of SNP Interactions; 2020. R package version 2.10.0."},{"issue":"1","key":"4634_CR45","doi-asserted-by":"publisher","first-page":"55","DOI":"10.1080\/00401706.1970.10488634","volume":"12","author":"AE Hoerl","year":"1970","unstructured":"Hoerl AE, Kennard RW. Ridge regression: biased estimation for nonorthogonal problems. Technometrics. 1970;12(1):55\u201367.","journal-title":"Technometrics"},{"issue":"1","key":"4634_CR46","doi-asserted-by":"publisher","first-page":"1","DOI":"10.18637\/jss.v033.i01","volume":"33","author":"J Friedman","year":"2010","unstructured":"Friedman J, Hastie T, Tibshirani R. Regularization paths for generalized linear models via coordinate descent. J Stat Softw. 2010;33(1):1\u201322.","journal-title":"J Stat Softw"},{"key":"4634_CR47","unstructured":"R Core Team. R: A Language and Environment for Statistical Computing. Vienna, Austria; 2020. Available from: https:\/\/www.R-project.org\/."},{"key":"4634_CR48","unstructured":"Schwender H, Fritsch A. 
scrime: Analysis of High-Dimensional Categorical Data Such as SNP Data; 2018. R package version 1.3.5."},{"issue":"1","key":"4634_CR49","doi-asserted-by":"publisher","first-page":"115","DOI":"10.1186\/s12863-017-0586-3","volume":"18","author":"A H\u00fcls","year":"2017","unstructured":"H\u00fcls A, Kr\u00e4mer U, Carlsten C, Schikowski T, Ickstadt K, Schwender H. Comparison of weighting approaches for genetic risk scores in gene-environment interaction studies. BMC Genet. 2017;18(1):115.","journal-title":"BMC Genet"},{"issue":"5","key":"4634_CR50","doi-asserted-by":"publisher","first-page":"396","DOI":"10.1002\/gepi.20488","volume":"34","author":"Q Li","year":"2010","unstructured":"Li Q, Fallin MD, Louis TA, Lasseter VK, McGrath JA, Avramopoulos D, et al. Detection of SNP-SNP interactions in trios of parents with schizophrenic children. Genet Epidemiol. 2010;34(5):396\u2013406.","journal-title":"Genet Epidemiol"},{"issue":"1","key":"4634_CR51","doi-asserted-by":"publisher","first-page":"9","DOI":"10.1186\/1471-2105-12-9","volume":"12","author":"D Pan","year":"2011","unstructured":"Pan D, Li Q, Jiang N, Liu A, Yu K. Robust joint analysis allowing for model uncertainty in two-stage genetic association studies. BMC Bioinform. 2011;12(1):9.","journal-title":"BMC Bioinform"},{"issue":"1","key":"4634_CR52","doi-asserted-by":"publisher","first-page":"29","DOI":"10.1148\/radiology.143.1.7063747","volume":"143","author":"JA Hanley","year":"1982","unstructured":"Hanley JA, McNeil BJ. The meaning and use of the area under a receiver operating characteristic (ROC) curve. Radiology. 1982;143(1):29\u201336.","journal-title":"Radiology"},{"key":"4634_CR53","doi-asserted-by":"crossref","unstructured":"Alberg AJ, Park JW, Hager BW, Brock MV, Diener-West M. The use of \u201coverall accuracy\u201d to evaluate the validity of screening or diagnostic tests. J Gen Internal Med. 
2004;19(5p1):460\u2013465.","DOI":"10.1111\/j.1525-1497.2004.30091.x"},{"issue":"1","key":"4634_CR54","doi-asserted-by":"publisher","first-page":"152","DOI":"10.1186\/1465-9921-6-152","volume":"6","author":"T Schikowski","year":"2005","unstructured":"Schikowski T, Sugiri D, Ranft U, Gehring U, Heinrich J, Wichmann HE, et al. Long-term air pollution exposure and living close to busy roads are associated with COPD in women. Respir Res. 2005;6(1):152.","journal-title":"Respir Res"},{"issue":"9919","key":"4634_CR55","doi-asserted-by":"publisher","first-page":"785","DOI":"10.1016\/S0140-6736(13)62158-3","volume":"383","author":"R Beelen","year":"2014","unstructured":"Beelen R, Raaschou-Nielsen O, Stafoggia M, Andersen ZJ, Weinmayr G, Hoffmann B, et al. Effects of long-term exposure to air pollution on natural-cause mortality: an analysis of 22 European cohorts within the multicentre ESCAPE project. Lancet. 2014;383(9919):785\u201395.","journal-title":"Lancet"},{"issue":"20","key":"4634_CR56","doi-asserted-by":"publisher","first-page":"11195","DOI":"10.1021\/es301948k","volume":"46","author":"M Eeftens","year":"2012","unstructured":"Eeftens M, Beelen R, de Hoogh K, Bellander T, Cesaroni G, Cirach M, et al. Development of land use regression models for PM2.5, PM2.5 absorbance, PM10 and PMcoarse in 20 European Study areas; results of the ESCAPE project. Environ Sci Technol. 2012;46(20):11195\u2013205.","journal-title":"Environ Sci Technol"},{"issue":"9","key":"4634_CR57","doi-asserted-by":"publisher","first-page":"1273","DOI":"10.1289\/ehp.0901689","volume":"118","author":"U Kr\u00e4mer","year":"2010","unstructured":"Kr\u00e4mer U, Herder C, Sugiri D, Strassburger K, Schikowski T, Ranft U, et al. Traffic-related air pollution and incident type 2 diabetes: results from the SALIA cohort study. Environ Health Perspect. 
2010;118(9):1273\u20139.","journal-title":"Environ Health Perspect"},{"key":"4634_CR58","doi-asserted-by":"publisher","first-page":"43","DOI":"10.1016\/j.envres.2016.09.028","volume":"152","author":"A H\u00fcls","year":"2017","unstructured":"H\u00fcls A, Kr\u00e4mer U, Herder C, Fehsel K, Luckhaus C, Stolz S, et al. Genetic susceptibility for air pollution-induced airway inflammation in the SALIA study. Environ Res. 2017;152:43\u201350.","journal-title":"Environ Res"},{"issue":"5","key":"4634_CR59","doi-asserted-by":"publisher","first-page":"453","DOI":"10.1136\/ard.61.5.453","volume":"61","author":"J Vanhoof","year":"2002","unstructured":"Vanhoof J, Declerck K, Geusens P. Prevalence of rheumatic diseases in a rheumatological outpatient practice. Ann Rheum Dis. 2002;61(5):453\u20135.","journal-title":"Ann Rheum Dis"},{"issue":"1","key":"4634_CR60","doi-asserted-by":"publisher","first-page":"21","DOI":"10.22631\/rr.2017.69997.1037","volume":"3","author":"M Jokar","year":"2018","unstructured":"Jokar M, Jokar M. Prevalence of inflammatory rheumatic diseases in a rheumatologic outpatient clinic: analysis of 12626 cases. Rheumatol Res. 2018;3(1):21\u20137.","journal-title":"Rheumatol Res"},{"key":"4634_CR61","doi-asserted-by":"crossref","unstructured":"Sangha O. Epidemiology of rheumatic diseases. Rheumatology. 2000;39(suppl\\_2):3\u201312.","DOI":"10.1093\/rheumatology\/39.suppl_2.3"},{"issue":"3","key":"4634_CR62","doi-asserted-by":"publisher","first-page":"139","DOI":"10.1093\/qjmed\/hcp165","volume":"103","author":"YW Song","year":"2009","unstructured":"Song YW, Kang EH. Autoantibodies in rheumatoid arthritis: rheumatoid factors and anticitrullinated protein antibodies. QJM Int J Med. 2009;103(3):139\u201346.","journal-title":"QJM Int J Med"},{"issue":"8","key":"4634_CR63","doi-asserted-by":"publisher","first-page":"597","DOI":"10.1007\/s00251-017-0987-5","volume":"69","author":"AS Kampstra","year":"2017","unstructured":"Kampstra AS, Toes RE. 
HLA class II and rheumatoid arthritis: the bumpy road of revelation. Immunogenetics. 2017;69(8):597\u2013603.","journal-title":"Immunogenetics"},{"issue":"5","key":"4634_CR64","doi-asserted-by":"publisher","first-page":"1","DOI":"10.1186\/ar2781","volume":"11","author":"A Clarke","year":"2009","unstructured":"Clarke A, Vyse TJ. Genetics of rheumatic disease. Arthritis Res Therapy. 2009;11(5):1\u20139.","journal-title":"Arthritis Res Therapy"},{"issue":"12","key":"4634_CR65","doi-asserted-by":"publisher","first-page":"1336","DOI":"10.1038\/ng.2462","volume":"44","author":"S Eyre","year":"2012","unstructured":"Eyre S, Bowes J, Diogo D, Lee A, Barton A, Martin P, et al. High-density genetic mapping identifies new susceptibility loci for rheumatoid arthritis. Nat Genet. 2012;44(12):1336\u201340.","journal-title":"Nat Genet"},{"issue":"3","key":"4634_CR66","doi-asserted-by":"publisher","first-page":"291","DOI":"10.1038\/ng.1076","volume":"44","author":"S Raychaudhuri","year":"2012","unstructured":"Raychaudhuri S, Sandor C, Stahl EA, Freudenberg J, Lee HS, Jia X, et al. Five amino acids in three HLA proteins explain most of the association between MHC and seropositive rheumatoid arthritis. Nat Genet. 2012;44(3):291\u20136.","journal-title":"Nat Genet"},{"key":"4634_CR67","doi-asserted-by":"publisher","first-page":"98","DOI":"10.1016\/j.ijsu.2018.01.046","volume":"52","author":"L Jiang","year":"2018","unstructured":"Jiang L, Jiang D, Han Y, Shi X, Ren C. Association of HLA-DPB1 polymorphisms with rheumatoid arthritis: a systemic review and meta-analysis. Int J Surg. 2018;52:98\u2013104.","journal-title":"Int J Surg"},{"issue":"2","key":"4634_CR68","doi-asserted-by":"publisher","first-page":"366","DOI":"10.1016\/j.ajhg.2016.06.019","volume":"99","author":"Y Okada","year":"2016","unstructured":"Okada Y, Suzuki A, Ikari K, Terao C, Kochi Y, Ohmura K, et al. Contribution of a non-classical HLA gene, HLA-DOA, to the risk of rheumatoid arthritis. Am J Human Genet. 
2016;99(2):366\u201374.","journal-title":"Am J Human Genet"},{"key":"4634_CR69","unstructured":"Purcell S, Chang C. PLINK 1.9; 2021. Available from: www.cog-genomics.org\/plink\/1.9\/."},{"key":"4634_CR70","doi-asserted-by":"publisher","first-page":"7","DOI":"10.1186\/s13742-015-0047-8","volume":"4","author":"CC Chang","year":"2015","unstructured":"Chang CC, Chow CC, Tellier LC, Vattikuti S, Purcell SM, Lee JJ. Second-generation PLINK: rising to the challenge of larger and richer datasets. GigaScience. 2015;4:7.","journal-title":"GigaScience"},{"issue":"3","key":"4634_CR71","doi-asserted-by":"publisher","first-page":"559","DOI":"10.1086\/519795","volume":"81","author":"S Purcell","year":"2007","unstructured":"Purcell S, Neale B, Todd-Brown K, Thomas L, Ferreira MA, Bender D, et al. PLINK: a tool set for whole-genome association and population-based linkage analyses. Am J Hum Genet. 2007;81(3):559\u201375.","journal-title":"Am J Hum Genet"},{"issue":"5","key":"4634_CR72","doi-asserted-by":"publisher","first-page":"558","DOI":"10.1136\/annrheumdis-2020-219065","volume":"80","author":"E Ha","year":"2021","unstructured":"Ha E, Bae SC, Kim K. Large-scale meta-analysis across East Asian and European populations updated genetic architecture and variant-driven biology of rheumatoid arthritis, identifying 11 novel susceptibility loci. Ann Rheum Dis. 2021;80(5):558\u201365.","journal-title":"Ann Rheum Dis"},{"issue":"5","key":"4634_CR73","doi-asserted-by":"publisher","first-page":"867","DOI":"10.1086\/516736","volume":"80","author":"H K\u00e4llberg","year":"2007","unstructured":"K\u00e4llberg H, Padyukov L, Plenge RM, R\u00f6nnelid J, Gregersen PK, van der Helm-van Mil AHM, et al. Gene-gene and gene-environment interactions involving HLA-DRB1, PTPN22, and smoking in two subsets of rheumatoid arthritis. Am J Human Genet. 
2007;80(5):867\u201375.","journal-title":"Am J Human Genet"},{"issue":"2","key":"4634_CR74","doi-asserted-by":"publisher","first-page":"405","DOI":"10.1016\/j.rdc.2012.04.002","volume":"38","author":"EW Karlson","year":"2012","unstructured":"Karlson EW, Deane K. Environmental and gene-environment interactions and risk of rheumatoid arthritis. Rheum Dis Clin. 2012;38(2):405\u201326.","journal-title":"Rheum Dis Clin"},{"issue":"2","key":"4634_CR75","doi-asserted-by":"publisher","first-page":"475","DOI":"10.3390\/cells9020475","volume":"9","author":"Y Ishikawa","year":"2020","unstructured":"Ishikawa Y, Terao C. The impact of cigarette smoking on risk of rheumatoid arthritis: a narrative review. Cells. 2020;9(2):475.","journal-title":"Cells"},{"key":"4634_CR76","doi-asserted-by":"publisher","first-page":"93","DOI":"10.1016\/j.eswa.2019.05.028","volume":"134","author":"JL Speiser","year":"2019","unstructured":"Speiser JL, Miller ME, Tooze J, Ip E. A comparison of random forest variable selection methods for classification prediction modeling. Expert Syst Appl. 2019;134:93\u2013101.","journal-title":"Expert Syst Appl"},{"key":"4634_CR77","doi-asserted-by":"publisher","first-page":"270","DOI":"10.3389\/fgene.2013.00270","volume":"4","author":"P Waldmann","year":"2013","unstructured":"Waldmann P, M\u00e9sz\u00e1ros G, Gredler B, F\u00fcrst C, S\u00f6lkner J. Evaluation of the lasso and the elastic net in genome-wide association studies. Front Genet. 
2013;4:270.","journal-title":"Front Genet"}],"container-title":["BMC Bioinformatics"],"original-title":[],"language":"en","link":[{"URL":"https:\/\/link.springer.com\/content\/pdf\/10.1186\/s12859-022-04634-w.pdf","content-type":"application\/pdf","content-version":"vor","intended-application":"text-mining"},{"URL":"https:\/\/link.springer.com\/article\/10.1186\/s12859-022-04634-w\/fulltext.html","content-type":"text\/html","content-version":"vor","intended-application":"text-mining"},{"URL":"https:\/\/link.springer.com\/content\/pdf\/10.1186\/s12859-022-04634-w.pdf","content-type":"application\/pdf","content-version":"vor","intended-application":"similarity-checking"}],"deposited":{"date-parts":[[2024,9,20]],"date-time":"2024-09-20T18:22:28Z","timestamp":1726856548000},"score":1,"resource":{"primary":{"URL":"https:\/\/bmcbioinformatics.biomedcentral.com\/articles\/10.1186\/s12859-022-04634-w"}},"subtitle":[],"short-title":[],"issued":{"date-parts":[[2022,3,21]]},"references-count":77,"journal-issue":{"issue":"1","published-online":{"date-parts":[[2022,12]]}},"alternative-id":["4634"],"URL":"https:\/\/doi.org\/10.1186\/s12859-022-04634-w","relation":{},"ISSN":["1471-2105"],"issn-type":[{"value":"1471-2105","type":"electronic"}],"subject":[],"published":{"date-parts":[[2022,3,21]]},"assertion":[{"value":"29 October 2021","order":1,"name":"received","label":"Received","group":{"name":"ArticleHistory","label":"Article History"}},{"value":"14 March 2022","order":2,"name":"accepted","label":"Accepted","group":{"name":"ArticleHistory","label":"Article History"}},{"value":"21 March 2022","order":3,"name":"first_online","label":"First Online","group":{"name":"ArticleHistory","label":"Article History"}},{"order":1,"name":"Ethics","group":{"name":"EthicsHeading","label":"Declarations"}},{"value":"The study was conducted in accordance with the Declaration of Helsinki. 
The SALIA cohort study has been approved by the Ethics Committees of the Ruhr-University Bochum and the Heinrich Heine University D\u00fcsseldorf. We received written informed consent from all participants.","order":2,"name":"Ethics","group":{"name":"EthicsHeading","label":"Ethics approval and consent to participate"}},{"value":"Not applicable.","order":3,"name":"Ethics","group":{"name":"EthicsHeading","label":"Consent for publication"}},{"value":"The authors declare that they have no competing interests.","order":4,"name":"Ethics","group":{"name":"EthicsHeading","label":"Competing interests"}}],"article-number":"97"}}