{"status":"ok","message-type":"work","message-version":"1.0.0","message":{"indexed":{"date-parts":[[2025,10,5]],"date-time":"2025-10-05T17:01:28Z","timestamp":1759683688300,"version":"3.41.0"},"reference-count":14,"publisher":"Association for Computing Machinery (ACM)","issue":"1","license":[{"start":{"date-parts":[[2014,8,25]],"date-time":"2014-08-25T00:00:00Z","timestamp":1408924800000},"content-version":"vor","delay-in-days":0,"URL":"https:\/\/www.acm.org\/publications\/policies\/copyright_policy#Background"}],"funder":[{"DOI":"10.13039\/100000054","name":"National Cancer Institute","doi-asserted-by":"publisher","award":["R01CA160736"],"award-info":[{"award-number":["R01CA160736"]}],"id":[{"id":"10.13039\/100000054","id-type":"DOI","asserted-by":"publisher"}]},{"DOI":"10.13039\/100000145","name":"Division of Information and Intelligent Systems","doi-asserted-by":"publisher","award":["IIS 0914861 and IIS 0915196"],"award-info":[{"award-number":["IIS 0914861 and IIS 0915196"]}],"id":[{"id":"10.13039\/100000145","id-type":"DOI","asserted-by":"publisher"}]}],"content-domain":{"domain":["dl.acm.org"],"crossmark-restriction":true},"short-container-title":["ACM Trans. Knowl. Discov. Data"],"published-print":{"date-parts":[[2014,10,28]]},"abstract":"<jats:p>Bayesian models are generally computed with Markov Chain Monte Carlo (MCMC) methods. The main disadvantage of MCMC methods is the large number of iterations they need to sample the posterior distributions of model parameters, especially for large datasets. On the other hand, variable selection remains a challenging problem due to its combinatorial search space, where Bayesian models are a promising solution. In this work, we study how to accelerate Bayesian model computation for variable selection in linear regression. We propose a fast Gibbs sampler algorithm, a widely used MCMC method that incorporates several optimizations. 
We use a Zellner prior for the regression coefficients, an improper prior on variance, and a conjugate prior Gaussian distribution, which enable dataset summarization in one pass, thus exploiting an augmented set of sufficient statistics. Thereafter, the algorithm iterates in main memory. Sufficient statistics are indexed with a sparse binary vector to efficiently compute matrix projections based on selected variables. Discovered variable subsets probabilities, selecting and discarding each variable, are stored on a hash table for fast retrieval in future iterations. We study how to integrate our algorithm into a Database Management System (DBMS), exploiting aggregate User-Defined Functions for parallel data summarization and stored procedures to manipulate matrices with arrays. An experimental evaluation with real datasets evaluates accuracy and time performance, comparing our DBMS-based algorithm with the R package. Our algorithm is shown to produce accurate results, scale linearly on dataset size, and run orders of magnitude faster than the R package.<\/jats:p>","DOI":"10.1145\/2629617","type":"journal-article","created":{"date-parts":[[2014,8,26]],"date-time":"2014-08-26T12:08:55Z","timestamp":1409054935000},"page":"1-14","update-policy":"https:\/\/doi.org\/10.1145\/crossmark-policy","source":"Crossref","is-referenced-by-count":4,"title":["Bayesian Variable Selection in Linear Regression in One Pass for Large Datasets"],"prefix":"10.1145","volume":"9","author":[{"given":"Carlos","family":"Ordonez","sequence":"first","affiliation":[{"name":"University of Houston"}]},{"given":"Carlos","family":"Garcia-Alvarado","sequence":"additional","affiliation":[{"name":"University of Houston"}]},{"given":"Veerabhadaran","family":"Baladandayuthapani","sequence":"additional","affiliation":[{"name":"UT MD Anderson Cancer 
Center"}]}],"member":"320","published-online":{"date-parts":[[2014,8,25]]},"reference":[{"key":"e_1_2_1_1_1","doi-asserted-by":"publisher","DOI":"10.1080\/01621459.1993.10476321"},{"key":"e_1_2_1_2_1","doi-asserted-by":"publisher","DOI":"10.1198\/106186005X47345"},{"key":"e_1_2_1_3_1","doi-asserted-by":"publisher","DOI":"10.1214\/12-BA716"},{"edition":"1","volume-title":"Applied Numerical Linear Algebra","author":"Demmel J. W.","key":"e_1_2_1_4_1"},{"key":"e_1_2_1_5_1","doi-asserted-by":"crossref","unstructured":"A. Gelman J. B. Carlin H. S. Stern and D. B. Rubin. 2003. Bayesian Data Analysis. Chapman and Hall\/CRC.","DOI":"10.1201\/9780429258480"},{"key":"e_1_2_1_6_1","doi-asserted-by":"publisher","DOI":"10.1080\/01621459.1993.10476353"},{"key":"e_1_2_1_7_1","first-page":"339","article-title":"Approaches for Bayesian variable selection","volume":"7","author":"George E. I.","year":"1997","journal-title":"Statistica Sinica"},{"key":"e_1_2_1_8_1","doi-asserted-by":"publisher","DOI":"10.1198\/jasa.2010.tm08177"},{"key":"e_1_2_1_9_1","doi-asserted-by":"publisher","DOI":"10.1198\/016214507000001337"},{"volume-title":"Bayesian Core: A Practical Approach to Computational Bayesian Statistics","year":"2007","author":"Marin J. 
M.","key":"e_1_2_1_10_1"},{"key":"e_1_2_1_11_1","doi-asserted-by":"publisher","DOI":"10.1109\/ICDM.2010.79"},{"key":"e_1_2_1_12_1","doi-asserted-by":"publisher","DOI":"10.1109\/TKDE.2010.44"},{"key":"e_1_2_1_13_1","doi-asserted-by":"publisher","DOI":"10.1145\/2396761.2398605"},{"key":"e_1_2_1_14_1","doi-asserted-by":"publisher","DOI":"10.1145\/775047.775049"}],"container-title":["ACM Transactions on Knowledge Discovery from Data"],"original-title":[],"language":"en","link":[{"URL":"https:\/\/dl.acm.org\/doi\/10.1145\/2629617","content-type":"unspecified","content-version":"vor","intended-application":"text-mining"},{"URL":"https:\/\/dl.acm.org\/doi\/pdf\/10.1145\/2629617","content-type":"unspecified","content-version":"vor","intended-application":"similarity-checking"}],"deposited":{"date-parts":[[2025,6,18]],"date-time":"2025-06-18T06:13:29Z","timestamp":1750227209000},"score":1,"resource":{"primary":{"URL":"https:\/\/dl.acm.org\/doi\/10.1145\/2629617"}},"subtitle":[],"short-title":[],"issued":{"date-parts":[[2014,8,25]]},"references-count":14,"journal-issue":{"issue":"1","published-print":{"date-parts":[[2014,10,28]]}},"alternative-id":["10.1145\/2629617"],"URL":"https:\/\/doi.org\/10.1145\/2629617","relation":{},"ISSN":["1556-4681","1556-472X"],"issn-type":[{"type":"print","value":"1556-4681"},{"type":"electronic","value":"1556-472X"}],"subject":[],"published":{"date-parts":[[2014,8,25]]},"assertion":[{"value":"2012-10-01","order":0,"name":"received","label":"Received","group":{"name":"publication_history","label":"Publication History"}},{"value":"2014-01-01","order":1,"name":"accepted","label":"Accepted","group":{"name":"publication_history","label":"Publication History"}},{"value":"2014-08-25","order":2,"name":"published","label":"Published","group":{"name":"publication_history","label":"Publication History"}}]}}