{"status":"ok","message-type":"work","message-version":"1.0.0","message":{"indexed":{"date-parts":[[2026,2,28]],"date-time":"2026-02-28T03:21:18Z","timestamp":1772248878441,"version":"3.50.1"},"reference-count":22,"publisher":"Oxford University Press (OUP)","issue":"6","license":[{"start":{"date-parts":[[2019,11,8]],"date-time":"2019-11-08T00:00:00Z","timestamp":1573171200000},"content-version":"vor","delay-in-days":0,"URL":"http:\/\/creativecommons.org\/licenses\/by\/4.0\/"}],"funder":[{"DOI":"10.13039\/100000002","name":"National Institutes of Health","doi-asserted-by":"publisher","award":["R01 LM010098"],"award-info":[{"award-number":["R01 LM010098"]}],"id":[{"id":"10.13039\/100000002","id-type":"DOI","asserted-by":"publisher"}]},{"DOI":"10.13039\/100000002","name":"NIH","doi-asserted-by":"publisher","id":[{"id":"10.13039\/100000002","id-type":"DOI","asserted-by":"publisher"}]}],"content-domain":{"domain":[],"crossmark-restriction":false},"short-container-title":[],"published-print":{"date-parts":[[2020,3,1]]},"abstract":"<jats:title>Abstract<\/jats:title>\n               <jats:sec>\n                  <jats:title>Motivation<\/jats:title>\n                  <jats:p>Selecting the optimal machine learning (ML) model for a given dataset is often challenging. Automated ML (AutoML) has emerged as a powerful tool for enabling the automatic selection of ML methods and parameter settings for the prediction of biomedical endpoints. Here, we apply the tree-based pipeline optimization tool (TPOT) to predict angiographic diagnoses of coronary artery disease (CAD). With TPOT, ML models are represented as expression trees and optimal pipelines discovered using a stochastic search method called genetic programing. 
We provide some guidelines for TPOT-based ML pipeline selection and optimization based on various clinical phenotypes and high-throughput metabolic profiles in the Angiography and Genes Study (ANGES).<\/jats:p>\n               <\/jats:sec>\n               <jats:sec>\n                  <jats:title>Results<\/jats:title>\n                  <jats:p>We analyzed nuclear magnetic resonance-derived lipoprotein and metabolite profiles in the ANGES cohort with a goal to identify the role of non-obstructive CAD patients in CAD diagnostics. We performed a comparative analysis of TPOT-generated ML pipelines with selected ML classifiers, optimized with a grid search approach, applied to two phenotypic CAD profiles. As a result, TPOT-generated ML pipelines outperformed grid search optimized models across multiple performance metrics including balanced accuracy and area under the precision-recall curve. With the selected models, we demonstrated that the phenotypic profile that distinguishes non-obstructive CAD patients from no CAD patients is associated with higher precision, suggesting a discrepancy in the underlying processes between these phenotypes.<\/jats:p>\n               <\/jats:sec>\n               <jats:sec>\n                  <jats:title>Availability and implementation<\/jats:title>\n                  <jats:p>TPOT is freely available via http:\/\/epistasislab.github.io\/tpot\/.<\/jats:p>\n               <\/jats:sec>\n               <jats:sec>\n                  <jats:title>Supplementary information<\/jats:title>\n                  <jats:p>Supplementary data are available at Bioinformatics online.<\/jats:p>\n               <\/jats:sec>","DOI":"10.1093\/bioinformatics\/btz796","type":"journal-article","created":{"date-parts":[[2019,10,31]],"date-time":"2019-10-31T04:25:20Z","timestamp":1572495920000},"page":"1772-1778","source":"Crossref","is-referenced-by-count":49,"title":["Model selection for metabolomics: predicting diagnosis of coronary artery disease using 
automated machine learning"],"prefix":"10.1093","volume":"36","author":[{"given":"Alena","family":"Orlenko","sequence":"first","affiliation":[{"name":"Department of Biostatistics, Epidemiology and Informatics, Institute for Biomedical Informatics, University of Pennsylvania , Philadelphia, PA, USA"}],"role":[{"role":"author","vocabulary":"crossref"}]},{"given":"Daniel","family":"Kofink","sequence":"additional","affiliation":[{"name":"Department of Cardiology, Division Heart and Lungs , Utrecht, The Netherlands"}],"role":[{"role":"author","vocabulary":"crossref"}]},{"given":"Leo-Pekka","family":"Lyytik\u00e4inen","sequence":"additional","affiliation":[{"name":"Department of Cardiology, Division Heart and Lungs , Utrecht, The Netherlands"}],"role":[{"role":"author","vocabulary":"crossref"}]},{"given":"Kjell","family":"Nikus","sequence":"additional","affiliation":[{"name":"Department of Clinical Chemistry, Fimlab Laboratories , Tampere, Finland"},{"name":"Department of Cardiology, Tampere University Hospital , Tampere, Finland"}],"role":[{"role":"author","vocabulary":"crossref"}]},{"given":"Pashupati","family":"Mishra","sequence":"additional","affiliation":[{"name":"Department of Clinical Chemistry, Fimlab Laboratories , Tampere, Finland"},{"name":"Department of Cardiology, Tampere University Hospital , Tampere, Finland"}],"role":[{"role":"author","vocabulary":"crossref"}]},{"given":"Pekka","family":"Kuukasj\u00e4rvi","sequence":"additional","affiliation":[{"name":"Department of Cardio-Thoracic Surgery, Heart Center, Tampere University Hospital , Tampere, Finland"}],"role":[{"role":"author","vocabulary":"crossref"}]},{"given":"Pekka J","family":"Karhunen","sequence":"additional","affiliation":[{"name":"Department of Forensic Medicine, Fimlab Laboratories , Tampere, Finland"}],"role":[{"role":"author","vocabulary":"crossref"}]},{"given":"Mika","family":"K\u00e4h\u00f6nen","sequence":"additional","affiliation":[{"name":"Department of Clinical Physiology, Tampere 
University Hospital , Tampere, Finland"}],"role":[{"role":"author","vocabulary":"crossref"}]},{"ORCID":"https:\/\/orcid.org\/0000-0001-7253-6048","authenticated-orcid":false,"given":"Jari O","family":"Laurikka","sequence":"additional","affiliation":[{"name":"Department of Cardio-Thoracic Surgery, Heart Center, Tampere University Hospital , Tampere, Finland"}],"role":[{"role":"author","vocabulary":"crossref"}]},{"given":"Terho","family":"Lehtim\u00e4ki","sequence":"additional","affiliation":[{"name":"Department of Clinical Chemistry, Fimlab Laboratories , Tampere, Finland"},{"name":"Department of Cardiology, Tampere University Hospital , Tampere, Finland"}],"role":[{"role":"author","vocabulary":"crossref"}]},{"given":"Folkert W","family":"Asselbergs","sequence":"additional","affiliation":[{"name":"Department of Cardiology, Division Heart and Lungs , Utrecht, The Netherlands"},{"name":"Health Data Research UK London, Institute for Health Informatics, University College London , London, UK"},{"name":"Institute of Cardiovascular Science, Faculty of Population Health Sciences, University College London , London, UK"}],"role":[{"role":"author","vocabulary":"crossref"}]},{"given":"Jason H","family":"Moore","sequence":"additional","affiliation":[{"name":"Department of Biostatistics, Epidemiology and Informatics, Institute for Biomedical Informatics, University of Pennsylvania , Philadelphia, PA, USA"}],"role":[{"role":"author","vocabulary":"crossref"}]}],"member":"286","published-online":{"date-parts":[[2019,11,8]]},"reference":[{"key":"2023060912023784400_btz796-B1","author":"Alaa","year":"2018"},{"key":"2023060912023784400_btz796-B2","first-page":"5938","article-title":"mlr: machine learning in R","volume":"17","author":"Bischl","year":"2016","journal-title":"J. Mach. Learn. Res"},{"key":"2023060912023784400_btz796-B3","first-page":"1","article-title":"Points of significance: statistics versus machine learning","author":"Bzdok","year":"2018","journal-title":"Nat. 
Methods"},{"key":"2023060912023784400_btz796-B5","first-page":"233","author":"Davis","year":"2006"},{"key":"2023060912023784400_btz796-B6","first-page":"3133","article-title":"Do we need hundreds of classifiers to solve real world classification problems?","volume":"15","author":"Fern\u00e1ndez-Delgado","year":"2014","journal-title":"J. Mach. Learn. Res"},{"key":"2023060912023784400_btz796-B7","first-page":"2962","author":"Feurer","year":"2015"},{"key":"2023060912023784400_btz796-B8","first-page":"2171","article-title":"DEAP: evolutionary algorithms made easy","volume":"13","author":"Fortin","year":"2012","journal-title":"J. Mach. Learn. Res"},{"key":"2023060912023784400_btz796-B9","doi-asserted-by":"crossref","first-page":"109","DOI":"10.4103\/HEARTVIEWS.HEARTVIEWS_106_17","article-title":"Risk factors for coronary artery disease: historical perspectives","volume":"18","author":"Hajar","year":"2017","journal-title":"Heart Views"},{"key":"2023060912023784400_btz796-B11","volume-title":"Genetic Programming: On the Programming of Computers by Means of Natural Selection.","author":"Koza","year":"1992"},{"key":"2023060912023784400_btz796-B12","doi-asserted-by":"crossref","first-page":"1","DOI":"10.18637\/jss.v028.i05","article-title":"Building predictive models in R using the caret package","volume":"28","author":"Kuhn","year":"2008","journal-title":"J. Stat. Softw"},{"key":"2023060912023784400_btz796-B13","doi-asserted-by":"crossref","first-page":"714","DOI":"10.1080\/00365510802172145","article-title":"Diagnostic performance of plasma high sensitive C-reactive protein in detecting three-vessel coronary artery disease: modification by apolipoprotein E genotype","volume":"68","author":"Mennander","year":"2008","journal-title":"Scand. J. Clin. Lab. 
Invest"},{"key":"2023060912023784400_btz796-B14","author":"Mosley","year":"2010"},{"key":"2023060912023784400_btz796-B15","first-page":"66","author":"Olson","year":"2016"},{"key":"2023060912023784400_btz796-B16","first-page":"192","article-title":"Data-driven advice for applying machine learning to bioinformatics problems","volume":"23","author":"Olson","year":"2018","journal-title":"Pac. Symp. Biocomput"},{"key":"2023060912023784400_btz796-B17","first-page":"460","article-title":"Considerations for automated machine learning in clinical metabolic profiling: altered homocysteine plasma concentration associated with metformin exposure","volume":"23","author":"Orlenko","year":"2018","journal-title":"Pac. Symp. Biocomput"},{"key":"2023060912023784400_btz796-B18","first-page":"2825","article-title":"Scikit-learn: machine learning in Python","volume":"12","author":"Pedregosa","year":"2011","journal-title":"J. Mach. Learn. Res"},{"key":"2023060912023784400_btz796-B19","first-page":"445","volume-title":"ICML","author":"Provost","year":"1998"},{"key":"2023060912023784400_btz796-B20","author":"Rubinsteyn","year":"2016"},{"key":"2023060912023784400_btz796-B21","doi-asserted-by":"crossref","first-page":"e0118432.","DOI":"10.1371\/journal.pone.0118432","article-title":"The precision-recall plot is more informative than the ROC plot when evaluating binary classifiers on imbalanced datasets","volume":"10","author":"Saito","year":"2015","journal-title":"PLoS One"},{"key":"2023060912023784400_btz796-B22","doi-asserted-by":"crossref","first-page":"192","DOI":"10.1161\/CIRCGENETICS.114.000216","article-title":"Quantitative serum nuclear magnetic resonance metabolomics in cardiovascular epidemiology and genetics","volume":"8","author":"Soininen","year":"2015","journal-title":"Circ. Cardiovasc. 
Genet"},{"key":"2023060912023784400_btz796-B23","author":"Thornton","year":"2013"},{"key":"2023060912023784400_btz796-B24","author":"Witten","year":"2016"}],"container-title":["Bioinformatics"],"original-title":[],"language":"en","link":[{"URL":"http:\/\/academic.oup.com\/bioinformatics\/advance-article-pdf\/doi\/10.1093\/bioinformatics\/btz796\/30959973\/btz796.pdf","content-type":"application\/pdf","content-version":"am","intended-application":"syndication"},{"URL":"https:\/\/academic.oup.com\/bioinformatics\/article-pdf\/36\/6\/1772\/50554377\/btz796.pdf","content-type":"application\/pdf","content-version":"vor","intended-application":"syndication"},{"URL":"https:\/\/academic.oup.com\/bioinformatics\/article-pdf\/36\/6\/1772\/50554377\/btz796.pdf","content-type":"unspecified","content-version":"vor","intended-application":"similarity-checking"}],"deposited":{"date-parts":[[2023,6,9]],"date-time":"2023-06-09T12:06:37Z","timestamp":1686312397000},"score":1,"resource":{"primary":{"URL":"https:\/\/academic.oup.com\/bioinformatics\/article\/36\/6\/1772\/5614811"}},"subtitle":[],"editor":[{"given":"Janet","family":"Kelso.","sequence":"additional","affiliation":[],"role":[{"role":"editor","vocabulary":"crossref"}]}],"short-title":[],"issued":{"date-parts":[[2019,11,8]]},"references-count":22,"journal-issue":{"issue":"6","published-print":{"date-parts":[[2020,3,1]]}},"URL":"https:\/\/doi.org\/10.1093\/bioinformatics\/btz796","relation":{},"ISSN":["1367-4803","1367-4811"],"issn-type":[{"value":"1367-4803","type":"print"},{"value":"1367-4811","type":"electronic"}],"subject":[],"published-other":{"date-parts":[[2020,3,15]]},"published":{"date-parts":[[2019,11,8]]}}}