{"status":"ok","message-type":"work","message-version":"1.0.0","message":{"indexed":{"date-parts":[[2026,2,26]],"date-time":"2026-02-26T20:34:01Z","timestamp":1772138041409,"version":"3.50.1"},"reference-count":48,"publisher":"Oxford University Press (OUP)","issue":"6","license":[{"start":{"date-parts":[[2020,10,18]],"date-time":"2020-10-18T00:00:00Z","timestamp":1602979200000},"content-version":"vor","delay-in-days":0,"URL":"http:\/\/creativecommons.org\/licenses\/by-nc\/4.0\/"}],"funder":[{"DOI":"10.13039\/100000002","name":"NIH","doi-asserted-by":"publisher","award":["1R01HL141813-01"],"award-info":[{"award-number":["1R01HL141813-01"]}],"id":[{"id":"10.13039\/100000002","id-type":"DOI","asserted-by":"publisher"}]},{"DOI":"10.13039\/100000001","name":"NSF","doi-asserted-by":"publisher","award":["1839332"],"award-info":[{"award-number":["1839332"]}],"id":[{"id":"10.13039\/100000001","id-type":"DOI","asserted-by":"publisher"}]}],"content-domain":{"domain":[],"crossmark-restriction":false},"short-container-title":[],"published-print":{"date-parts":[[2021,5,5]]},"abstract":"<jats:title>Abstract<\/jats:title>\n                  <jats:sec>\n                    <jats:title>Motivation<\/jats:title>\n                    <jats:p>There is growing interest in the biomedical research community to incorporate retrospective data, available in healthcare systems, to shed light on associations between different biomarkers. Understanding the association between various types of biomedical data, such as genetic, blood biomarkers, imaging, etc. can provide a holistic understanding of human diseases. To formally test a hypothesized association between two types of data in Electronic Health Records (EHRs), one requires a substantial sample size with both data modalities to achieve a reasonable power. Current association test methods only allow using data from individuals who have both data modalities. Hence, researchers cannot take advantage of much larger EHR samples that includes individuals with at least one of the data types, which limits the power of the association test.<\/jats:p>\n                  <\/jats:sec>\n                  <jats:sec>\n                    <jats:title>Results<\/jats:title>\n                    <jats:p>We present a new method called the Semi-paired Association Test (SAT) that makes use of both paired and unpaired data. In contrast to classical approaches, incorporating unpaired data allows SAT to produce better control of false discovery and to improve the power of the association test. We study the properties of the new test theoretically and empirically, through a series of simulations and by applying our method on real studies in the context of Chronic Obstructive Pulmonary Disease. We are able to identify an association between the high-dimensional characterization of Computed Tomography chest images and several blood biomarkers as well as the expression of dozens of genes involved in the immune system.<\/jats:p>\n                  <\/jats:sec>\n                  <jats:sec>\n                    <jats:title>Availability and implementation<\/jats:title>\n                    <jats:p>Code is available on https:\/\/github.com\/batmanlab\/Semi-paired-Association-Test.<\/jats:p>\n                  <\/jats:sec>\n                  <jats:sec>\n                    <jats:title>Supplementary information<\/jats:title>\n                    <jats:p>Supplementary data are available at Bioinformatics online.<\/jats:p>\n                  <\/jats:sec>","DOI":"10.1093\/bioinformatics\/btaa886","type":"journal-article","created":{"date-parts":[[2020,10,5]],"date-time":"2020-10-05T15:15:57Z","timestamp":1601910957000},"page":"785-792","source":"Crossref","is-referenced-by-count":1,"title":["Unpaired data empowers association tests"],"prefix":"10.1093","volume":"37","author":[{"given":"Mingming","family":"Gong","sequence":"first","affiliation":[{"name":"Department of Biomedical Informatics, University of Pittsburgh , Pittsburgh, PA 15206, USA"},{"name":"Department of Philosophy, Carnegie Mellon University , Pittsburgh, PA 15213, USA"},{"name":"School of Mathematics and Statistics, The University of Melbourne , Melbourne, VIC 3010, Australia"}],"role":[{"role":"author","vocabulary":"crossref"}]},{"given":"Peng","family":"Liu","sequence":"additional","affiliation":[{"name":"Department of Biomedical Informatics, University of Pittsburgh , Pittsburgh, PA 15206, USA"}],"role":[{"role":"author","vocabulary":"crossref"}]},{"given":"Frank C","family":"Sciurba","sequence":"additional","affiliation":[{"name":"Department of Biomedical Informatics, University of Pittsburgh , Pittsburgh, PA 15206, USA"}],"role":[{"role":"author","vocabulary":"crossref"}]},{"given":"Petar","family":"Stojanov","sequence":"additional","affiliation":[{"name":"Department of Philosophy, Carnegie Mellon University , Pittsburgh, PA 15213, USA"}],"role":[{"role":"author","vocabulary":"crossref"}]},{"given":"Dacheng","family":"Tao","sequence":"additional","affiliation":[{"name":"Australia School of Computer Science, The University of Sydney , Sydney, NSW 2006, Australia"}],"role":[{"role":"author","vocabulary":"crossref"}]},{"ORCID":"https:\/\/orcid.org\/0000-0002-5447-1014","authenticated-orcid":false,"given":"George C","family":"Tseng","sequence":"additional","affiliation":[{"name":"Department of Biomedical Informatics, University of Pittsburgh , Pittsburgh, PA 15206, USA"}],"role":[{"role":"author","vocabulary":"crossref"}]},{"given":"Kun","family":"Zhang","sequence":"additional","affiliation":[{"name":"Department of Philosophy, Carnegie Mellon University , Pittsburgh, PA 15213, USA"}],"role":[{"role":"author","vocabulary":"crossref"}]},{"given":"Kayhan","family":"Batmanghelich","sequence":"additional","affiliation":[{"name":"Department of Biomedical Informatics, University of Pittsburgh , Pittsburgh, PA 15206, USA"}],"role":[{"role":"author","vocabulary":"crossref"}]}],"member":"286","published-online":{"date-parts":[[2020,10,18]]},"reference":[{"key":"2023051705200485100_btaa886-B1","doi-asserted-by":"crossref","first-page":"881","DOI":"10.1126\/science.1156409","article-title":"Genetic mapping in human disease","volume":"322","author":"Altshuler","year":"2008","journal-title":"Science"},{"key":"2023051705200485100_btaa886-B2","doi-asserted-by":"crossref","first-page":"545","DOI":"10.1038\/nbt.2594","article-title":"viSNE enables visualization of high dimensional single-cell data and reveals phenotypic heterogeneity of leukemia","volume":"31","author":"Amir","year":"2013","journal-title":"Nat. Biotechnol"},{"key":"2023051705200485100_btaa886-B3","doi-asserted-by":"crossref","first-page":"129","DOI":"10.1093\/ije\/dys234","article-title":"The general population cohort in rural south-western Uganda: a platform for communicable and non-communicable disease studies","volume":"42","author":"Asiki","year":"2013","journal-title":"Int. J. Epidemiol"},{"key":"2023051705200485100_btaa886-B4","doi-asserted-by":"crossref","first-page":"316","DOI":"10.1165\/rcmb.2012-0230OC","article-title":"Peripheral blood mononuclear cell gene expression in chronic obstructive pulmonary disease","volume":"49","author":"Bahr","year":"2013","journal-title":"Am. J. Respir. Cell Mol. Biol"},{"key":"2023051705200485100_btaa886-B5","doi-asserted-by":"crossref","first-page":"127","DOI":"10.1186\/s12931-014-0127-9","article-title":"The association of plasma biomarkers with computed tomography-assessed emphysema phenotypes","volume":"15","author":"Carolan","year":"2014","journal-title":"Respir. Res"},{"key":"2023051705200485100_btaa886-B6","doi-asserted-by":"crossref","first-page":"783","DOI":"10.1016\/j.neuroimage.2004.12.036","article-title":"Preclinical detection of Alzheimer\u2019s disease: hippocampal shape and volume predict dementia onset in the elderly","volume":"25","author":"Csernansky","year":"2005","journal-title":"Neuroimage"},{"key":"2023051705200485100_btaa886-B7","doi-asserted-by":"crossref","first-page":"11406","DOI":"10.1073\/pnas.95.19.11406","article-title":"Hippocampal morphometry in schizophrenia by high dimensional brain mapping","volume":"95","author":"Csernansky","year":"1998","journal-title":"Proc. Natl. Acad. Sci. USA"},{"key":"2023051705200485100_btaa886-B8","doi-asserted-by":"crossref","first-page":"103","DOI":"10.1038\/nature10405","article-title":"Genetic variants in novel pathways influence blood pressure and cardiovascular disease risk","volume":"478","author":"Ehret","year":"2011","journal-title":"Nature"},{"key":"2023051705200485100_btaa886-B9","doi-asserted-by":"crossref","first-page":"2479","DOI":"10.1073\/pnas.1415603112","article-title":"Massively expedited genome-wide heritability analysis (MEGHA)","volume":"112","author":"Ge","year":"2015","journal-title":"Proc. Natl. Acad. Sci. USA"},{"key":"2023051705200485100_btaa886-B10","doi-asserted-by":"crossref","first-page":"13291","DOI":"10.1038\/ncomms13291","article-title":"Multidimensional heritability analysis of neuroanatomical shape","volume":"7","author":"Ge","year":"2016","journal-title":"Nat. Commun"},{"key":"2023051705200485100_btaa886-B11","doi-asserted-by":"crossref","first-page":"643","DOI":"10.1016\/j.media.2010.05.008","article-title":"Manifold modeling for brain population analysis","volume":"14","author":"Gerber","year":"2010","journal-title":"Med. Image Anal"},{"key":"2023051705200485100_btaa886-B12","doi-asserted-by":"crossref","first-page":"782","DOI":"10.1038\/nn.3708","article-title":"Large-scale genomics unveils the genetic architecture of psychiatric disorders","volume":"17","author":"Gratten","year":"2014","journal-title":"Nat. Neurosci"},{"key":"2023051705200485100_btaa886-B13","first-page":"585","author":"Gretton","year":"2008"},{"key":"2023051705200485100_btaa886-B14","first-page":"585","volume-title":"NIPS 20","author":"Gretton","year":"2008"},{"key":"2023051705200485100_btaa886-B15","doi-asserted-by":"crossref","first-page":"2989","DOI":"10.1093\/bioinformatics\/btv325","article-title":"Diffusion maps for high-dimensional single-cell analysis of differentiation data","volume":"31","author":"Haghverdi","year":"2015","journal-title":"Bioinformatics"},{"key":"2023051705200485100_btaa886-B16","doi-asserted-by":"crossref","first-page":"7377","DOI":"10.1073\/pnas.1510497113","article-title":"Linear mixed model for heritability estimation that explicitly addresses environmental variation","volume":"113","author":"Heckerman","year":"2016","journal-title":"Proc. Natl. Acad. Sci. USA"},{"key":"2023051705200485100_btaa886-B17","doi-asserted-by":"crossref","first-page":"812","DOI":"10.1111\/biom.12314","article-title":"Equivalence of kernel machine regression and kernel distance covariance for multidimensional phenotype association studies","volume":"71","author":"Hua","year":"2015","journal-title":"Biometrics"},{"key":"2023051705200485100_btaa886-B18","doi-asserted-by":"crossref","first-page":"276","DOI":"10.1176\/appi.ajp.163.2.276","article-title":"Basal ganglia shape alterations in bipolar disorder","volume":"163","author":"Hwang","year":"2006","journal-title":"Am. J. Psychiatry"},{"key":"2023051705200485100_btaa886-B19","doi-asserted-by":"crossref","first-page":"27","DOI":"10.1093\/nar\/28.1.27","article-title":"KEGG: kyoto encyclopedia of genes and genomes","volume":"28","author":"Kanehisa","year":"2000","journal-title":"Nucleic Acids Res"},{"key":"2023051705200485100_btaa886-B20","doi-asserted-by":"crossref","first-page":"924","DOI":"10.1186\/s12864-015-2170-4","article-title":"Integrative phenotyping framework (IPF): integrative clustering of multiple omics data identifies novel lung disease subphenotypes","volume":"16","author":"Kim","year":"2015","journal-title":"BMC Genomics"},{"key":"2023051705200485100_btaa886-B21","doi-asserted-by":"crossref","first-page":"386","DOI":"10.1016\/j.ajhg.2007.10.010","article-title":"A powerful and flexible multilocus association test for quantitative traits","volume":"82","author":"Kwee","year":"2008","journal-title":"Am. J. Hum. Genet"},{"key":"2023051705200485100_btaa886-B22","doi-asserted-by":"crossref","first-page":"1079","DOI":"10.1111\/j.1541-0420.2007.00799.x","article-title":"Semiparametric regression of multidimensional genetic pathway data: least-squares kernel machines and linear mixed models","volume":"63","author":"Liu","year":"2007","journal-title":"Biometrics"},{"key":"2023051705200485100_btaa886-B23","doi-asserted-by":"crossref","first-page":"686","DOI":"10.1002\/gepi.21663","article-title":"Multivariate phenotype association analysis by marker-set kernel machine regression","volume":"36","author":"Maity","year":"2012","journal-title":"Genet. Epidemiol"},{"key":"2023051705200485100_btaa886-B24","first-page":"474","author":"Mendoza","year":"2012"},{"key":"2023051705200485100_btaa886-B25","doi-asserted-by":"crossref","first-page":"9","DOI":"10.1186\/1479-5876-12-9","article-title":"Comparison of serum, EDTA plasma and P100 plasma for luminex-based biomarker multiplex assays in patients with chronic obstructive pulmonary disease in the SPIROMICS study","volume":"12","author":"O\u2019Neal","year":"2014","journal-title":"J. Transl. Med"},{"key":"2023051705200485100_btaa886-B26","doi-asserted-by":"crossref","first-page":"886","DOI":"10.1038\/nbt.1991","article-title":"Extracting a cellular hierarchy from high-dimensional cytometry data with spade","volume":"29","author":"Qiu","year":"2011","journal-title":"Nat. Biotechnol"},{"key":"2023051705200485100_btaa886-B27","first-page":"A6418","article-title":"TESRA (treatment of emphysema with a selective retinoid agonist) study results","volume":"183","author":"Rames","year":"2011","journal-title":"Am. J. Respir. Crit. Care Med"},{"key":"2023051705200485100_btaa886-B28","doi-asserted-by":"crossref","first-page":"32","DOI":"10.3109\/15412550903499522","article-title":"Genetic epidemiology of COPD (COPDgene) study design","volume":"7","author":"Regan","year":"2011","journal-title":"COPD J. Chronic Obstr. Pulm. Dis"},{"key":"2023051705200485100_btaa886-B29","doi-asserted-by":"crossref","first-page":"1319","DOI":"10.1378\/chest.106.5.1319","article-title":"An automated method to assess the distribution of low attenuation areas on chest CT scans in chronic pulmonary emphysema patients","volume":"106","author":"Sakai","year":"1994","journal-title":"Chest"},{"key":"2023051705200485100_btaa886-B30","doi-asserted-by":"crossref","first-page":"725","DOI":"10.1378\/chest.120.3.725","article-title":"CT assessment of subtypes of pulmonary emphysema in smokers","volume":"120","author":"Satoh","year":"2001","journal-title":"Chest"},{"key":"2023051705200485100_btaa886-B31","first-page":"170","author":"Schabdach","year":"2017"},{"key":"2023051705200485100_btaa886-B32","doi-asserted-by":"crossref","first-page":"109","DOI":"10.1159\/000312641","article-title":"Genomic similarity and kernel methods I: advancements by building on mathematical and statistical foundations","volume":"70","author":"Schaid","year":"2010","journal-title":"Hum. Hered"},{"key":"2023051705200485100_btaa886-B33","doi-asserted-by":"crossref","first-page":"132","DOI":"10.1159\/000312643","article-title":"Genomic similarity and kernel methods II: methods for genomic information","volume":"70","author":"Schaid","year":"2010","journal-title":"Hum. Hered"},{"key":"2023051705200485100_btaa886-B34","doi-asserted-by":"crossref","first-page":"W460","DOI":"10.2214\/AJR.12.10102","article-title":"Relationships between airflow obstruction and quantitative CT measurements of emphysema, air trapping, and airways in subjects with and without chronic obstructive pulmonary disease","volume":"201","author":"Schroeder","year":"2013","journal-title":"Am. J. Roentgenol"},{"key":"2023051705200485100_btaa886-B35","doi-asserted-by":"crossref","first-page":"2263","DOI":"10.1214\/13-AOS1140","article-title":"Equivalence of distance-based and RKHS-based statistics in hypothesis testing","volume":"41","author":"Sejdinovic","year":"2013","journal-title":"Ann. Stat"},{"key":"2023051705200485100_btaa886-B36","doi-asserted-by":"crossref","first-page":"771","DOI":"10.1093\/hmg\/ddg088","article-title":"Linkage disequilibrium patterns of the human genome across populations","volume":"12","author":"Shifman","year":"2003","journal-title":"Hum. Mol. Genet"},{"key":"2023051705200485100_btaa886-B37","doi-asserted-by":"crossref","first-page":"70","DOI":"10.1109\/TMI.2011.2164931","article-title":"Texture-based analysis of COPD: a data-driven approach","volume":"31","author":"Sorensen","year":"2012","journal-title":"IEEE Trans. Med. Imaging"},{"key":"2023051705200485100_btaa886-B38","first-page":"2389","article-title":"Universality, characteristic kernels and RKHS embedding of measures","volume":"12","author":"Sriperumbudur","year":"2011","journal-title":"J. Mach. Learn. Res"},{"key":"2023051705200485100_btaa886-B39","doi-asserted-by":"crossref","first-page":"2769","DOI":"10.1214\/009053607000000505","article-title":"Measuring and testing dependence by correlation of distances","volume":"35","author":"Sz\u00e9kely","year":"2007","journal-title":"Ann. Stat"},{"key":"2023051705200485100_btaa886-B40","doi-asserted-by":"crossref","first-page":"347","DOI":"10.1164\/rccm.201204-0596PP","article-title":"Global strategy for the diagnosis, management, and prevention of chronic obstructive pulmonary disease: gold executive summary","volume":"187","author":"Vestbo","year":"2013","journal-title":"Am. J. Respir. Crit. Care Med"},{"key":"2023051705200485100_btaa886-B41","doi-asserted-by":"crossref","first-page":"e41","DOI":"10.1371\/journal.pgen.0020041","article-title":"Assumption-free estimation of heritability from genome-wide identity-by-descent sharing between full siblings","volume":"2","author":"Visscher","year":"2006","journal-title":"PLoS Genet"},{"key":"2023051705200485100_btaa886-B42","doi-asserted-by":"crossref","first-page":"5","DOI":"10.1016\/j.ajhg.2017.06.005","article-title":"10 years of GWAS discovery: biology, function, and translation","volume":"101","author":"Visscher","year":"2017","journal-title":"Am. J. Hum. Genet"},{"key":"2023051705200485100_btaa886-B43","doi-asserted-by":"crossref","first-page":"1963","DOI":"10.1093\/bioinformatics\/btx103","article-title":"A generalized association test based on U statistics","volume":"33","author":"Wei","year":"2017","journal-title":"Bioinformatics"},{"key":"2023051705200485100_btaa886-B44","doi-asserted-by":"crossref","first-page":"1274","DOI":"10.1038\/ng.2797","article-title":"Discovery and refinement of loci associated with lipid levels","volume":"45","author":"Willer","year":"2013","journal-title":"Nat. Genet"},{"key":"2023051705200485100_btaa886-B45","doi-asserted-by":"crossref","first-page":"565","DOI":"10.1038\/ng.608","article-title":"Common SNPs explain a large proportion of the heritability for human height","volume":"42","author":"Yang","year":"2010","journal-title":"Nat. Genet"},{"key":"2023051705200485100_btaa886-B46","doi-asserted-by":"crossref","first-page":"76","DOI":"10.1016\/j.ajhg.2010.11.011","article-title":"GCTA: a tool for genome-wide complex trait analysis","volume":"88","author":"Yang","year":"2011","journal-title":"Am. J. Hum. Genet"},{"key":"2023051705200485100_btaa886-B47","first-page":"804","author":"Zhang","year":"2011"},{"key":"2023051705200485100_btaa886-B48","doi-asserted-by":"crossref","first-page":"941","DOI":"10.1164\/rccm.201302-0263OC","article-title":"Heritability of chronic obstructive pulmonary disease and related phenotypes in smokers","volume":"188","author":"Zhou","year":"2013","journal-title":"Am. J. Respir. Crit. Care Med"}],"container-title":["Bioinformatics"],"original-title":[],"language":"en","link":[{"URL":"http:\/\/academic.oup.com\/bioinformatics\/advance-article-pdf\/doi\/10.1093\/bioinformatics\/btaa886\/34752881\/btaa886.pdf","content-type":"application\/pdf","content-version":"am","intended-application":"syndication"},{"URL":"https:\/\/academic.oup.com\/bioinformatics\/article-pdf\/37\/6\/785\/50357666\/btaa886.pdf","content-type":"application\/pdf","content-version":"vor","intended-application":"syndication"},{"URL":"https:\/\/academic.oup.com\/bioinformatics\/article-pdf\/37\/6\/785\/50357666\/btaa886.pdf","content-type":"unspecified","content-version":"vor","intended-application":"similarity-checking"}],"deposited":{"date-parts":[[2023,5,17]],"date-time":"2023-05-17T01:21:16Z","timestamp":1684286476000},"score":1,"resource":{"primary":{"URL":"https:\/\/academic.oup.com\/bioinformatics\/article\/37\/6\/785\/5929693"}},"subtitle":[],"editor":[{"given":"Valencia","family":"Alfonso","sequence":"additional","affiliation":[],"role":[{"role":"editor","vocabulary":"crossref"}]}],"short-title":[],"issued":{"date-parts":[[2020,10,18]]},"references-count":48,"journal-issue":{"issue":"6","published-print":{"date-parts":[[2021,5,5]]}},"URL":"https:\/\/doi.org\/10.1093\/bioinformatics\/btaa886","relation":{"has-preprint":[{"id-type":"doi","id":"10.1101\/839159","asserted-by":"object"}]},"ISSN":["1367-4803","1367-4811"],"issn-type":[{"value":"1367-4803","type":"print"},{"value":"1367-4811","type":"electronic"}],"subject":[],"published-other":{"date-parts":[[2021,3,15]]},"published":{"date-parts":[[2020,10,18]]}}}