{"status":"ok","message-type":"work","message-version":"1.0.0","message":{"indexed":{"date-parts":[[2026,2,26]],"date-time":"2026-02-26T20:34:21Z","timestamp":1772138061526,"version":"3.50.1"},"reference-count":13,"publisher":"Oxford University Press (OUP)","issue":"7","license":[{"start":{"date-parts":[[2023,6,20]],"date-time":"2023-06-20T00:00:00Z","timestamp":1687219200000},"content-version":"vor","delay-in-days":0,"URL":"https:\/\/creativecommons.org\/licenses\/by\/4.0\/"}],"funder":[{"DOI":"10.13039\/100010269","name":"Wellcome Trust","doi-asserted-by":"publisher","award":["WT220788"],"award-info":[{"award-number":["WT220788"]}],"id":[{"id":"10.13039\/100010269","id-type":"DOI","asserted-by":"publisher"}]}],"content-domain":{"domain":[],"crossmark-restriction":false},"short-container-title":[],"published-print":{"date-parts":[[2023,7,1]]},"abstract":"<jats:title>Abstract<\/jats:title>\n                  <jats:sec>\n                    <jats:title>Motivation<\/jats:title>\n                    <jats:p>While many pipelines have been developed for calling genotypes using RNA-sequencing (RNA-Seq) data, they all have adapted DNA genotype callers that do not model biases specific to RNA-Seq such as allele-specific expression (ASE).<\/jats:p>\n                  <\/jats:sec>\n                  <jats:sec>\n                    <jats:title>Results<\/jats:title>\n                    <jats:p>Here, we present Bayesian beta-binomial mixture model (BBmix), a Bayesian beta-binomial mixture model that first learns the expected distribution of read counts for each genotype, and then deploys those learned parameters to call genotypes probabilistically. We benchmarked our model on a wide variety of datasets and showed that our method generally performed better than competitors, mainly due to an increase of up to 1.4% in the accuracy of heterozygous calls, which may have a big impact in reducing false positive rate in applications sensitive to genotyping error such as ASE. 
Moreover, BBmix can be easily incorporated into standard pipelines for calling genotypes. We further show that parameters are generally transferable within datasets, such that a single learning run of less than 1 h is sufficient to call genotypes in a large number of samples.<\/jats:p>\n                  <\/jats:sec>\n                  <jats:sec>\n                    <jats:title>Availability and implementation<\/jats:title>\n                    <jats:p>We implemented BBmix as an R package that is available for free under a GPL-2 licence at https:\/\/gitlab.com\/evigorito\/bbmix and https:\/\/cran.r-project.org\/package=bbmix with accompanying pipeline at https:\/\/gitlab.com\/evigorito\/bbmix_pipeline.<\/jats:p>\n                  <\/jats:sec>","DOI":"10.1093\/bioinformatics\/btad393","type":"journal-article","created":{"date-parts":[[2023,6,20]],"date-time":"2023-06-20T11:14:41Z","timestamp":1687259681000},"source":"Crossref","is-referenced-by-count":8,"title":["BBmix: a Bayesian beta-binomial mixture model for accurate genotyping from RNA-sequencing"],"prefix":"10.1093","volume":"39","author":[{"ORCID":"https:\/\/orcid.org\/0000-0001-6230-3849","authenticated-orcid":false,"given":"Elena","family":"Vigorito","sequence":"first","affiliation":[{"name":"MRC Biostatistics Unit, University of Cambridge , Cambridge CB2 0SR, United Kingdom"}]},{"ORCID":"https:\/\/orcid.org\/0000-0003-3316-2527","authenticated-orcid":false,"given":"Anne","family":"Barton","sequence":"additional","affiliation":[{"name":"Division of Musculoskeletal and Dermatological Sciences, University of Manchester , Manchester M13 9PL, United Kingdom"}]},{"given":"Costantino","family":"Pitzalis","sequence":"additional","affiliation":[{"name":"Centre for Experimental Medicine and Rheumatology, William Harvey Research Institute, Barts and The London School of Medicine and Dentistry, Queen Mary University of London , London EC1M 6BQ, United 
Kingdom"}]},{"ORCID":"https:\/\/orcid.org\/0000-0001-9365-5345","authenticated-orcid":false,"given":"Myles J","family":"Lewis","sequence":"additional","affiliation":[{"name":"Centre for Experimental Medicine and Rheumatology, William Harvey Research Institute, Barts and The London School of Medicine and Dentistry, Queen Mary University of London , London EC1M 6BQ, United Kingdom"}]},{"ORCID":"https:\/\/orcid.org\/0000-0001-9755-1703","authenticated-orcid":false,"given":"Chris","family":"Wallace","sequence":"additional","affiliation":[{"name":"MRC Biostatistics Unit, University of Cambridge , Cambridge CB2 0SR, United Kingdom"},{"name":"Cambridge Institute of Therapeutic Immunology & Infectious Disease (CITIID), Jeffrey Cheah Biomedical Centre, Cambridge Biomedical Campus, University of Cambridge , Cambridge CB2 0AW, United Kingdom"}]}],"member":"286","published-online":{"date-parts":[[2023,6,20]]},"reference":[{"key":"2023070406462088200_btad393-B1","doi-asserted-by":"crossref","first-page":"e0216838","DOI":"10.1371\/journal.pone.0216838","article-title":"Variant analysis pipeline for accurate detection of genomic variants from transcriptome sequencing data","volume":"14","author":"Adetunji","year":"2019","journal-title":"PLoS ONE"},{"key":"2023070406462088200_btad393-B2","author":"Akutagawa","year":"2022"},{"key":"2023070406462088200_btad393-B3","doi-asserted-by":"crossref","first-page":"44","DOI":"10.1186\/s40104-019-0359-0","article-title":"The GATK joint genotyping workflow is appropriate for calling variants in RNA-seq experiments","volume":"10","author":"Brouard","year":"2019","journal-title":"J Anim Sci Biotechnol"},{"key":"2023070406462088200_btad393-B4","doi-asserted-by":"crossref","first-page":"195","DOI":"10.1186\/s13059-015-0762-6","article-title":"Tools and best practices for data processing in allelic expression analysis","volume":"16","author":"Castel","year":"2015","journal-title":"Genome 
Biol"},{"key":"2023070406462088200_btad393-B5","author":"Garrison"},{"key":"2023070406462088200_btad393-B6","doi-asserted-by":"crossref","DOI":"10.3389\/fgene.2021.655707","article-title":"RNA-Seq data for reliable SNP detection and genotype calling: interest for coding variant characterization and cis-regulation analysis by allele-specific expression in livestock species","volume":"12","author":"Jehl","year":"2021","journal-title":"Front Genet"},{"key":"2023070406462088200_btad393-B7","doi-asserted-by":"crossref","first-page":"2455","DOI":"10.1016\/j.celrep.2019.07.091","article-title":"Molecular portraits of early rheumatoid arthritis identify clinical and treatment response phenotypes","volume":"28","author":"Lewis","year":"2019","journal-title":"Cell Rep"},{"key":"2023070406462088200_btad393-B8","doi-asserted-by":"crossref","first-page":"2987","DOI":"10.1093\/bioinformatics\/btr509","article-title":"A statistical framework for SNP calling, mutation discovery, association mapping and population genetical parameter estimation from sequencing data","volume":"27","author":"Li","year":"2011","journal-title":"Bioinformatics"},{"key":"2023070406462088200_btad393-B9","doi-asserted-by":"crossref","first-page":"56","DOI":"10.1038\/nature11632","article-title":"An integrated map of genetic variation from 1,092 human genomes","volume":"491","author":"Abecasis","year":"2012","journal-title":"Nature"},{"key":"2023070406462088200_btad393-B10","doi-asserted-by":"crossref","first-page":"e58815","DOI":"10.1371\/journal.pone.0058815","article-title":"Development of strategies for SNP detection in RNA-Seq data: application to lymphoblastoid cell lines and evaluation using 1000 genomes data","volume":"8","author":"Quinn","year":"2013","journal-title":"PLoS ONE"},{"key":"2023070406462088200_btad393-B11","doi-asserted-by":"crossref","first-page":"909","DOI":"10.1186\/s12864-018-5239-z","article-title":"Accuracy of RNAseq based SNP discovery and genotyping in Populus 
nigra","volume":"19","author":"Rogier","year":"2018","journal-title":"BMC Genomics"},{"key":"2023070406462088200_btad393-B12","author":"Stan Development Team","year":"2018"},{"key":"2023070406462088200_btad393-B13","doi-asserted-by":"crossref","first-page":"403","DOI":"10.1186\/s12859-021-04307-0","article-title":"A pipeline for RNA-seq based eQTL analysis with automated quality control procedures","volume":"22","author":"Wang","year":"2021","journal-title":"BMC Bioinformatics"}],"container-title":["Bioinformatics"],"original-title":[],"language":"en","link":[{"URL":"https:\/\/academic.oup.com\/bioinformatics\/advance-article-pdf\/doi\/10.1093\/bioinformatics\/btad393\/50657946\/btad393.pdf","content-type":"application\/pdf","content-version":"am","intended-application":"syndication"},{"URL":"https:\/\/academic.oup.com\/bioinformatics\/article-pdf\/39\/7\/btad393\/50791680\/btad393.pdf","content-type":"application\/pdf","content-version":"vor","intended-application":"syndication"},{"URL":"https:\/\/academic.oup.com\/bioinformatics\/article-pdf\/39\/7\/btad393\/50791680\/btad393.pdf","content-type":"unspecified","content-version":"vor","intended-application":"similarity-checking"}],"deposited":{"date-parts":[[2023,7,4]],"date-time":"2023-07-04T02:46:38Z","timestamp":1688438798000},"score":1,"resource":{"primary":{"URL":"https:\/\/academic.oup.com\/bioinformatics\/article\/doi\/10.1093\/bioinformatics\/btad393\/7203797"}},"subtitle":[],"editor":[{"given":"Christina","family":"Kendziorski","sequence":"additional","affiliation":[]}],"short-title":[],"issued":{"date-parts":[[2023,6,20]]},"references-count":13,"journal-issue":{"issue":"7","published-print":{"date-parts":[[2023,7,1]]}},"URL":"https:\/\/doi.org\/10.1093\/bioinformatics\/btad393","relation":{"has-preprint":[{"id-type":"doi","id":"10.1101\/2022.12.02.518817","asserted-by":"object"}]},"ISSN":["1367-4811"],"issn-type":[{"value":"1367-4811","type":"electronic"}],"subject":[],"published-other":{"date-parts":[[202
3,7,1]]},"published":{"date-parts":[[2023,6,20]]},"article-number":"btad393"}}