{"status":"ok","message-type":"work","message-version":"1.0.0","message":{"indexed":{"date-parts":[[2025,10,27]],"date-time":"2025-10-27T20:52:03Z","timestamp":1761598323827,"version":"3.37.3"},"reference-count":43,"publisher":"Oxford University Press (OUP)","issue":"19","license":[{"start":{"date-parts":[[2021,4,5]],"date-time":"2021-04-05T00:00:00Z","timestamp":1617580800000},"content-version":"vor","delay-in-days":0,"URL":"https:\/\/academic.oup.com\/journals\/pages\/open_access\/funder_policies\/chorus\/standard_publication_model"}],"funder":[{"DOI":"10.13039\/100000001","name":"National Science Foundation","doi-asserted-by":"publisher","award":["CCF-1553281","IIS-1812641"],"award-info":[{"award-number":["CCF-1553281","IIS-1812641"]}],"id":[{"id":"10.13039\/100000001","id-type":"DOI","asserted-by":"publisher"}]},{"name":"U.S. Department of Energy, Office of Science, Office of Advanced Scientific Computing Research, Mathematical Multifaceted Integrated Capability Centers program","award":["DE-SC0019393"],"award-info":[{"award-number":["DE-SC0019393"]}]}],"content-domain":{"domain":[],"crossmark-restriction":false},"short-container-title":[],"published-print":{"date-parts":[[2021,10,11]]},"abstract":"<jats:title>Abstract<\/jats:title>\n               <jats:sec>\n                  <jats:title>Motivation<\/jats:title>\n                  <jats:p>When learning to subtype complex disease based on next-generation sequencing data, the amount of available data is often limited. Recent works have tried to leverage data from other domains to design better predictors in the target domain of interest with varying degrees of success. But they are either limited to the cases requiring the outcome label correspondence across domains or cannot leverage the label information at all. Moreover, the existing methods cannot usually benefit from other information available a priori such as gene interaction networks.<\/jats:p>\n               <\/jats:sec>\n               <jats:sec>\n                  <jats:title>Results<\/jats:title>\n                  <jats:p>In this article, we develop a generative optimal Bayesian supervised domain adaptation (OBSDA) model that can integrate RNA sequencing (RNA-Seq) data from different domains along with their labels for improving prediction accuracy in the target domain. Our model can be applied in cases where different domains share the same labels or have different ones. OBSDA is based on a hierarchical Bayesian negative binomial model with parameter factorization, for which the optimal predictor can be derived by marginalization of likelihood over the posterior of the parameters. We first provide an efficient Gibbs sampler for parameter inference in OBSDA. Then, we leverage the gene-gene network prior information and construct an informed and flexible variational family to infer the posterior distributions of model parameters. Comprehensive experiments on real-world RNA-Seq data demonstrate the superior performance of OBSDA, in terms of accuracy in identifying cancer subtypes by utilizing data from different domains. Moreover, we show that by taking advantage of the prior network information we can further improve the performance.<\/jats:p>\n               <\/jats:sec>\n               <jats:sec>\n                  <jats:title>Availability and implementation<\/jats:title>\n                  <jats:p>The source code for implementations of OBSDA and SI-OBSDA are available at the following link. https:\/\/github.com\/SHBLK\/BSDA.<\/jats:p>\n               <\/jats:sec>\n               <jats:sec>\n                  <jats:title>Supplementary information<\/jats:title>\n                  <jats:p>Supplementary data are available at Bioinformatics online.<\/jats:p>\n               <\/jats:sec>","DOI":"10.1093\/bioinformatics\/btab228","type":"journal-article","created":{"date-parts":[[2021,4,3]],"date-time":"2021-04-03T03:14:39Z","timestamp":1617419679000},"page":"3212-3219","source":"Crossref","is-referenced-by-count":3,"title":["Optimal Bayesian supervised domain adaptation for RNA sequencing data"],"prefix":"10.1093","volume":"37","author":[{"ORCID":"https:\/\/orcid.org\/0000-0002-5015-8995","authenticated-orcid":false,"given":"Shahin","family":"Boluki","sequence":"first","affiliation":[{"name":"Department of Electrical & Computer Engineering, Texas A&M University , College Station, TX 77843, USA"}]},{"given":"Xiaoning","family":"Qian","sequence":"additional","affiliation":[{"name":"Department of Electrical & Computer Engineering, Texas A&M University , College Station, TX 77843, USA"},{"name":"TEES-AgriLife Center for Bioinformatics & Genomic Systems Engineering, Texas A&M University , College Station, TX 77843, USA"}]},{"given":"Edward R","family":"Dougherty","sequence":"additional","affiliation":[{"name":"Department of Electrical & Computer Engineering, Texas A&M University , College Station, TX 77843, USA"}]}],"member":"286","published-online":{"date-parts":[[2021,4,5]]},"reference":[{"key":"2023051701220811900_btab228-B1","doi-asserted-by":"crossref","first-page":"846","DOI":"10.1038\/nm.3915","article-title":"Toward understanding and exploiting tumor heterogeneity","volume":"21","author":"Alizadeh","year":"2015","journal-title":"Nat. Med"},{"key":"2023051701220811900_btab228-B2","doi-asserted-by":"crossref","first-page":"D525","DOI":"10.1093\/nar\/gkp878","article-title":"The intact molecular interaction database in 2010","volume":"38","author":"Aranda","year":"2010","journal-title":"Nucleic Acids Res"},{"key":"2023051701220811900_btab228-B3","doi-asserted-by":"crossref","first-page":"859","DOI":"10.1080\/01621459.2017.1285773","article-title":"Variational inference: a review for statisticians","volume":"112","author":"Blei","year":"2017","journal-title":"J. Am. Stat. Assoc"},{"key":"2023051701220811900_btab228-B0470162","doi-asserted-by":"publisher","first-page":"524","DOI":"10.1109\/TCBB.2017.2778715","article-title":"Constructing Pathway-Based Priors within a Gaussian Mixture Model for Bayesian Regression and Classification","volume":"16","author":"Boluki","journal-title":"IEEE\/ACM Transactions on Computational Biology and Bioinformatics"},{"key":"2023051701220811900_btab228-B4","doi-asserted-by":"crossref","first-page":"552","DOI":"10.1186\/s12859-017-1893-4","article-title":"Incorporating biological prior knowledge for Bayesian learning via maximal knowledge-driven information priors","volume":"18","author":"Boluki","year":"2017","journal-title":"BMC Bioinformatics"},{"key":"2023051701220811900_btab228-B5","doi-asserted-by":"crossref","first-page":"e49","DOI":"10.1093\/bioinformatics\/btl242","article-title":"Integrating structured biological data by kernel maximum mean discrepancy","volume":"22","author":"Borgwardt","year":"2006","journal-title":"Bioinformatics"},{"key":"2023051701220811900_btab228-B6","doi-asserted-by":"crossref","first-page":"81","DOI":"10.1080\/01621459.2017.1328358","article-title":"BNP-seq: Bayesian nonparametric differential expression analysis of sequencing count data","volume":"113","author":"Dadaneh","year":"2018","journal-title":"J. Am. Stat. Assoc"},{"first-page":"540","year":"2020","author":"Dadaneh","key":"2023051701220811900_btab228-B7"},{"first-page":"193","year":"2007","author":"Dai","key":"2023051701220811900_btab228-B8"},{"key":"2023051701220811900_btab228-B9","doi-asserted-by":"crossref","first-page":"1301","DOI":"10.1016\/j.patcog.2012.10.018","article-title":"Optimal classifiers with minimum expected error within a Bayesian framework. Part I: discrete and Gaussian models","volume":"46","author":"Dalton","year":"2013","journal-title":"Pattern Recogn"},{"key":"2023051701220811900_btab228-B10","doi-asserted-by":"crossref","DOI":"10.1117\/3.2540669","volume-title":"Optimal Bayesian Classification","author":"Dalton","year":"2020"},{"key":"2023051701220811900_btab228-B11","doi-asserted-by":"crossref","first-page":"497","DOI":"10.1186\/s12859-018-2465-y","article-title":"Application of transfer learning for cancer drug sensitivity prediction","volume":"19","author":"Dhruba","year":"2018","journal-title":"BMC Bioinformatics"},{"first-page":"255","year":"2018","author":"Frazier","key":"2023051701220811900_btab228-B12"},{"key":"2023051701220811900_btab228-B13","doi-asserted-by":"crossref","first-page":"1","DOI":"10.1038\/s41598-018-29990-7","article-title":"Searching the overlap between network modules with specific betweeness (S2B) and its application to cross-disease analysis","volume":"8","author":"Garcia-Vaquero","year":"2018","journal-title":"Sci. Rep"},{"key":"2023051701220811900_btab228-B14","first-page":"222","article-title":"Connecting the dots with landmarks: discriminatively learning domain-invariant features for unsupervised domain adaptation","author":"Gong","year":"2013","journal-title":"Int. Conf. Mach. Learn"},{"key":"2023051701220811900_btab228-B15","doi-asserted-by":"crossref","first-page":"82","DOI":"10.1186\/s13073-014-0082-6","article-title":"Modules, networks and systems medicine for understanding disease and aiding diagnosis","volume":"6","author":"Gustafsson","year":"2014","journal-title":"Genome Med"},{"key":"2023051701220811900_btab228-B16","first-page":"9115","article-title":"Bayesian multi-domain learning for cancer subtype discovery from next-generation sequencing count data","volume":"31","author":"Hajiramezanali","year":"2018","journal-title":"Adv. Neural Inf. Process. Syst"},{"key":"2023051701220811900_btab228-B17","doi-asserted-by":"crossref","first-page":"283","DOI":"10.1016\/j.cell.2018.03.042","article-title":"The Cancer Genome Atlas: creating lasting value beyond its data","volume":"173","author":"Hutter","year":"2018","journal-title":"Cell"},{"key":"2023051701220811900_btab228-B18","first-page":"745","article-title":"Clustered multi-task learning: a convex formulation","volume":"22","author":"Jacob","year":"2009","journal-title":"Adv. Neural Inf. Process. Syst"},{"key":"2023051701220811900_btab228-B19","doi-asserted-by":"crossref","first-page":"183","DOI":"10.1023\/A:1007665907178","article-title":"An introduction to variational methods for graphical models","volume":"37","author":"Jordan","year":"1999","journal-title":"Mach. Learn"},{"first-page":"521","year":"2011","author":"Kang","key":"2023051701220811900_btab228-B20"},{"key":"2023051701220811900_btab228-B21","doi-asserted-by":"crossref","first-page":"3724","DOI":"10.1109\/TSP.2018.2839583","article-title":"Optimal Bayesian transfer learning","volume":"66","author":"Karbalayghareh","year":"2018","journal-title":"IEEE Trans. Signal Process"},{"key":"2023051701220811900_btab228-B22","article-title":"Optimal Bayesian transfer learning for count data","author":"Karbalayghareh","year":"2019","journal-title":"IEEE\/ACM Trans. Comput. Biol. Bioinf,"},{"key":"2023051701220811900_btab228-B23","doi-asserted-by":"crossref","first-page":"67","DOI":"10.1038\/nature12113","article-title":"Integrated genomic characterization of endometrial carcinoma","volume":"497","author":"Kandoth","year":"2013","journal-title":"Nature"},{"key":"2023051701220811900_btab228-B24","first-page":"469","article-title":"Coupled generative adversarial networks","volume":"29","author":"Liu","year":"2016","journal-title":"Adv. Neural Inf. Process. Syst"},{"first-page":"97","year":"2015","author":"Long","key":"2023051701220811900_btab228-B25"},{"key":"2023051701220811900_btab228-B26","doi-asserted-by":"crossref","first-page":"550","DOI":"10.1186\/s13059-014-0550-8","article-title":"Moderated estimation of fold change and dispersion for RNA-seq data with DESeq2","volume":"15","author":"Love","year":"2014","journal-title":"Genome Biol"},{"key":"2023051701220811900_btab228-B27","doi-asserted-by":"crossref","first-page":"1257601","DOI":"10.1126\/science.1257601","article-title":"Uncovering disease-disease relationships through the incomplete interactome","volume":"347","author":"Menche","year":"2015","journal-title":"Science"},{"key":"2023051701220811900_btab228-B28","doi-asserted-by":"crossref","first-page":"1067","DOI":"10.1038\/s41592-018-0214-9","article-title":"Found in translation: a machine learning model for mouse-to-human inference","volume":"15","author":"Normand","year":"2018","journal-title":"Nat. Methods"},{"key":"2023051701220811900_btab228-B29","doi-asserted-by":"crossref","first-page":"1345","DOI":"10.1109\/TKDE.2009.191","article-title":"A survey on transfer learning","volume":"22","author":"Pan","year":"2010","journal-title":"IEEE Trans. Knowl. Data Eng"},{"key":"2023051701220811900_btab228-B30","doi-asserted-by":"crossref","first-page":"199","DOI":"10.1109\/TNN.2010.2091281","article-title":"Domain adaptation via transfer component analysis","volume":"22","author":"Pan","year":"2011","journal-title":"IEEE Trans. Neural Netw"},{"first-page":"1283","year":"2012","author":"Passos","key":"2023051701220811900_btab228-B31"},{"key":"2023051701220811900_btab228-B32","doi-asserted-by":"crossref","first-page":"53","DOI":"10.1109\/MSP.2014.2347059","article-title":"Visual domain adaptation: a survey of recent advances","volume":"32","author":"Patel","year":"2015","journal-title":"IEEE Signal Process. Mag"},{"first-page":"1321","year":"2009","author":"Rai","key":"2023051701220811900_btab228-B33"},{"key":"2023051701220811900_btab228-B34","doi-asserted-by":"crossref","first-page":"139","DOI":"10.1093\/bioinformatics\/btp616","article-title":"edgeR: a Bioconductor package for differential expression analysis of digital gene expression data","volume":"26","author":"Robinson","year":"2010","journal-title":"Bioinformatics"},{"key":"2023051701220811900_btab228-B35","doi-asserted-by":"crossref","first-page":"148","DOI":"10.1109\/JPROC.2015.2494218","article-title":"Taking the human out of the loop: a review of Bayesian optimization","volume":"104","author":"Shahriari","year":"2016","journal-title":"Proc. IEEE"},{"key":"2023051701220811900_btab228-B36","doi-asserted-by":"crossref","first-page":"D698","DOI":"10.1093\/nar\/gkq1116","article-title":"The BioGRID interaction database: 2011 update","volume":"39","author":"Stark","year":"2011","journal-title":"Nucleic Acids Res"},{"key":"2023051701220811900_btab228-B37","doi-asserted-by":"crossref","first-page":"334","DOI":"10.1214\/11-AOAS502","article-title":"Bayesian joint modeling of multiple gene networks and diverse genomic data to identify target genes of a transcription factor","volume":"6","author":"Wei","year":"2012","journal-title":"Ann. Appl. Stat"},{"key":"2023051701220811900_btab228-B38","doi-asserted-by":"crossref","first-page":"9","DOI":"10.1186\/s40537-016-0043-6","article-title":"A survey of transfer learning","volume":"3","author":"Weiss","year":"2016","journal-title":"J. Big Data"},{"year":"2018","author":"Yin","key":"2023051701220811900_btab228-B39"},{"first-page":"3320","year":"2014","author":"Yosinski","key":"2023051701220811900_btab228-B40"},{"key":"2023051701220811900_btab228-B41","doi-asserted-by":"crossref","first-page":"1065","DOI":"10.1214\/17-BA1070","article-title":"Nonparametric Bayesian negative binomial factor analysis","volume":"13","author":"Zhou","year":"2018","journal-title":"Bayesian Anal"},{"key":"2023051701220811900_btab228-B42","doi-asserted-by":"crossref","first-page":"307","DOI":"10.1109\/TPAMI.2013.211","article-title":"Negative binomial process count and mixture modeling","volume":"37","author":"Zhou","year":"2015","journal-title":"IEEE Trans. Pattern Anal. Mach. Intell"}],"container-title":["Bioinformatics"],"original-title":[],"language":"en","link":[{"URL":"http:\/\/academic.oup.com\/bioinformatics\/advance-article-pdf\/doi\/10.1093\/bioinformatics\/btab228\/37929130\/btab228.pdf","content-type":"application\/pdf","content-version":"am","intended-application":"syndication"},{"URL":"https:\/\/academic.oup.com\/bioinformatics\/article-pdf\/37\/19\/3212\/50338121\/btab228.pdf","content-type":"application\/pdf","content-version":"vor","intended-application":"syndication"},{"URL":"https:\/\/academic.oup.com\/bioinformatics\/article-pdf\/37\/19\/3212\/50338121\/btab228.pdf","content-type":"unspecified","content-version":"vor","intended-application":"similarity-checking"}],"deposited":{"date-parts":[[2023,5,17]],"date-time":"2023-05-17T01:23:33Z","timestamp":1684286613000},"score":1,"resource":{"primary":{"URL":"https:\/\/academic.oup.com\/bioinformatics\/article\/37\/19\/3212\/6211157"}},"subtitle":[],"editor":[{"given":"Jan","family":"Gorodkin","sequence":"additional","affiliation":[]}],"short-title":[],"issued":{"date-parts":[[2021,4,5]]},"references-count":43,"journal-issue":{"issue":"19","published-print":{"date-parts":[[2021,10,11]]}},"URL":"https:\/\/doi.org\/10.1093\/bioinformatics\/btab228","relation":{},"ISSN":["1367-4803","1367-4811"],"issn-type":[{"type":"print","value":"1367-4803"},{"type":"electronic","value":"1367-4811"}],"subject":[],"published-other":{"date-parts":[[2021,10,1]]},"published":{"date-parts":[[2021,4,5]]}}}