{"status":"ok","message-type":"work","message-version":"1.0.0","message":{"indexed":{"date-parts":[[2026,2,26]],"date-time":"2026-02-26T20:34:17Z","timestamp":1772138057391,"version":"3.50.1"},"reference-count":20,"publisher":"Oxford University Press (OUP)","issue":"1","license":[{"start":{"date-parts":[[2022,11,7]],"date-time":"2022-11-07T00:00:00Z","timestamp":1667779200000},"content-version":"vor","delay-in-days":0,"URL":"https:\/\/creativecommons.org\/licenses\/by\/4.0\/"}],"funder":[{"name":"University of Pittsburgh School of Medicine","award":["T32 5T32EB009403-13"],"award-info":[{"award-number":["T32 5T32EB009403-13"]}]},{"name":"National Institute of Health"},{"DOI":"10.13039\/100000002","name":"NIH","doi-asserted-by":"publisher","id":[{"id":"10.13039\/100000002","id-type":"DOI","asserted-by":"publisher"}]},{"DOI":"10.13039\/100000070","name":"National Institute of Biomedical Imaging and Bioengineering","doi-asserted-by":"publisher","id":[{"id":"10.13039\/100000070","id-type":"DOI","asserted-by":"publisher"}]},{"DOI":"10.13039\/100000070","name":"NIBIB","doi-asserted-by":"publisher","id":[{"id":"10.13039\/100000070","id-type":"DOI","asserted-by":"publisher"}]}],"content-domain":{"domain":[],"crossmark-restriction":false},"short-container-title":[],"published-print":{"date-parts":[[2023,1,1]]},"abstract":"<jats:title>Abstract<\/jats:title>\n                  <jats:sec>\n                    <jats:title>Motivation<\/jats:title>\n                    <jats:p>Single-cell RNA sequencing (scRNA-seq) continues to expand our knowledge by facilitating the study of transcriptional heterogeneity at the level of single cells. Despite this technology\u2019s utility and success in biomedical research, technical artifacts are present in scRNA-seq data. Doublets\/multiplets are a type of artifact that occurs when two or more cells are tagged by the same barcode, and therefore they appear as a single cell. 
Because this introduces non-existent transcriptional profiles, doublets can bias and mislead downstream analysis. To address this limitation, computational methods to annotate and remove doublets from scRNA-seq datasets are needed.<\/jats:p>\n                  <\/jats:sec>\n                  <jats:sec>\n                    <jats:title>Results<\/jats:title>\n                    <jats:p>We introduce vaeda (Variational Auto-Encoder for Doublet Annotation), a new approach for computational annotation of doublets in scRNA-seq data. Vaeda integrates a variational auto-encoder and Positive-Unlabeled learning to produce doublet scores and binary doublet calls. We apply vaeda, along with seven existing doublet annotation methods, to 16 benchmark datasets and find that vaeda performs competitively in terms of doublet scores and doublet calls. Notably, vaeda outperforms other python-based methods for doublet annotation. Altogether, vaeda is a robust and competitive method for scRNA-seq doublet annotation and may be of particular interest in the context of python-based workflows.<\/jats:p>\n                  <\/jats:sec>\n                  <jats:sec>\n                    <jats:title>Availability and implementation<\/jats:title>\n                    <jats:p>Vaeda is available at https:\/\/github.com\/kostkalab\/vaeda, and the version used for the results we present here is archived at zenodo (https:\/\/doi.org\/10.5281\/zenodo.7199783).<\/jats:p>\n                  <\/jats:sec>\n                  <jats:sec>\n                    <jats:title>Supplementary information<\/jats:title>\n                    <jats:p>Supplementary data are available at Bioinformatics online.<\/jats:p>\n                  <\/jats:sec>","DOI":"10.1093\/bioinformatics\/btac720","type":"journal-article","created":{"date-parts":[[2022,11,5]],"date-time":"2022-11-05T18:10:18Z","timestamp":1667671818000},"source":"Crossref","is-referenced-by-count":7,"title":["Vaeda computationally annotates doublets in 
single-cell RNA sequencing data"],"prefix":"10.1093","volume":"39","author":[{"given":"Hannah","family":"Schriever","sequence":"first","affiliation":[{"name":"Department of Developmental Biology, University of Pittsburgh , Pittsburgh, PA 15201, USA"},{"name":"Carnegie Mellon\u2014University of Pittsburgh Joint PhD Program, University of Pittsburgh , Pittsburgh, PA 15201, USA"}],"role":[{"role":"author","vocabulary":"crossref"}]},{"ORCID":"https:\/\/orcid.org\/0000-0002-1460-5487","authenticated-orcid":false,"given":"Dennis","family":"Kostka","sequence":"additional","affiliation":[{"name":"Department of Developmental Biology, University of Pittsburgh , Pittsburgh, PA 15201, USA"},{"name":"Department of Computational & Systems Biology and Center for Evolutionary Biology and Medicine, University of Pittsburgh , Pittsburgh, PA 15201, USA"}],"role":[{"role":"author","vocabulary":"crossref"}]}],"member":"286","published-online":{"date-parts":[[2022,11,7]]},"reference":[{"key":"2023010107541329500_btac720-B1","doi-asserted-by":"crossref","first-page":"1150","DOI":"10.1093\/bioinformatics\/btz698","article-title":"scds: computational annotation of doublets in single-cell RNA sequencing data","volume":"36","author":"Bais","year":"2020","journal-title":"Bioinformatics"},{"key":"2023010107541329500_btac720-B2","doi-asserted-by":"crossref","first-page":"95","DOI":"10.1016\/j.cels.2020.05.010","article-title":"Solo: doublet identification in single-cell RNA-seq via semi-supervised deep learning","volume":"11","author":"Bernstein","year":"2020","journal-title":"Cell Syst"},{"key":"2023010107541329500_btac720-B3","doi-asserted-by":"crossref","first-page":"979","DOI":"10.12688\/f1000research.73600.1","article-title":"Doublet identification in single-cell sequencing data using 
scDblFinder","volume":"10","author":"Germain","year":"2021","journal-title":"F1000Research"},{"key":"2023010107541329500_btac720-B4","doi-asserted-by":"crossref","first-page":"e1008625","DOI":"10.1371\/journal.pcbi.1008625","article-title":"mbkmeans: fast clustering for single cell data using mini-batch k-means","volume":"17","author":"Hicks","year":"2021","journal-title":"PLoS Comput. Biol"},{"key":"2023010107541329500_btac720-B5","doi-asserted-by":"crossref","first-page":"89","DOI":"10.1038\/nbt.4042","article-title":"Multiplexed droplet single-cell RNA-sequencing using natural genetic variation","volume":"36","author":"Kang","year":"2018","journal-title":"Nat. Biotechnol"},{"key":"2023010107541329500_btac720-B7","first-page":"179","author":"Liu","year":"2003"},{"key":"2023010107541329500_btac720-B8","doi-asserted-by":"crossref","first-page":"329","DOI":"10.1016\/j.cels.2019.03.003","article-title":"Doubletfinder: doublet detection in single-cell RNA sequencing data using artificial nearest neighbors","volume":"8","author":"McGinnis","year":"2019","journal-title":"Cell Syst"},{"key":"2023010107541329500_btac720-B9","doi-asserted-by":"crossref","first-page":"619","DOI":"10.1038\/s41592-019-0433-8","article-title":"Multi-seq: sample multiplexing for single-cell RNA sequencing using lipid-tagged indices","volume":"16","author":"McGinnis","year":"2019","journal-title":"Nat. Methods"},{"key":"2023010107541329500_btac720-B10","author":"McInnes","year":"2018"},{"key":"2023010107541329500_btac720-B11","doi-asserted-by":"crossref","first-page":"201","DOI":"10.1016\/j.patrec.2013.06.010","article-title":"A bagging SVM to learn from positive and unlabeled examples","volume":"37","author":"Mordelet","year":"2014","journal-title":"Patt. Recogn. 
Lett"},{"key":"2023010107541329500_btac720-B12","first-page":"166","author":"Satopaa","year":"2011"},{"key":"2023010107541329500_btac720-B13","doi-asserted-by":"crossref","first-page":"1","DOI":"10.1186\/s13059-018-1603-1","article-title":"Cell hashing with barcoded antibodies enables multiplexing and doublet detection for single cell genomics","volume":"19","author":"Stoeckius","year":"2018","journal-title":"Genome Biol"},{"key":"2023010107541329500_btac720-B14","doi-asserted-by":"crossref","first-page":"5233","DOI":"10.1038\/s41598-019-41695-z","article-title":"From Louvain to Leiden: guaranteeing well-connected communities","volume":"9","author":"Traag","year":"2019","journal-title":"Sci. Rep"},{"key":"2023010107541329500_btac720-B15","doi-asserted-by":"crossref","first-page":"15","DOI":"10.1186\/s13059-017-1382-0","article-title":"SCANPY: large-scale single-cell gene expression data analysis","volume":"19","author":"Wolf","year":"2018","journal-title":"Genome Biol"},{"key":"2023010107541329500_btac720-B16","doi-asserted-by":"crossref","first-page":"281","DOI":"10.1016\/j.cels.2018.11.005","article-title":"Scrublet: computational identification of cell doublets in single-cell transcriptomic data","volume":"8","author":"Wolock","year":"2019","journal-title":"Cell Syst"},{"key":"2023010107541329500_btac720-B17","doi-asserted-by":"crossref","first-page":"176","DOI":"10.1016\/j.cels.2020.11.008","article-title":"Benchmarking computational doublet-detection methods for single-cell RNA sequencing data","volume":"12","author":"Xi","year":"2021","journal-title":"Cell Syst"},{"key":"2023010107541329500_btac720-B18","doi-asserted-by":"crossref","first-page":"100699","DOI":"10.1016\/j.xpro.2021.100699","article-title":"Protocol for executing and benchmarking eight computational doublet-detection methods in single-cell RNA sequencing data analysis","volume":"2","author":"Xi","year":"2021","journal-title":"STAR 
Protoc"},{"key":"2023010107541329500_btac720-B19","doi-asserted-by":"crossref","first-page":"100311","DOI":"10.1016\/j.patter.2021.100311","article-title":"Emptynn: a neural network based on positive and unlabeled learning to remove cell-free droplets and recover lost cells in scRNA-seq data","volume":"2","author":"Yan","year":"2021","journal-title":"Patterns"},{"key":"2023010107541329500_btac720-B20","doi-asserted-by":"crossref","first-page":"14049","DOI":"10.1038\/ncomms14049","article-title":"Massively parallel digital transcriptional profiling of single cells","volume":"8","author":"Zheng","year":"2017","journal-title":"Nat. Commun"},{"key":"2023010107541329500_btac720-B21","doi-asserted-by":"crossref","first-page":"1317","DOI":"10.1038\/s41592-021-01286-1","article-title":"An analytical framework for interpretable and generalizable single-cell data analysis","volume":"18","author":"Zhou","year":"2021","journal-title":"Nat. Methods"}],"container-title":["Bioinformatics"],"original-title":[],"language":"en","link":[{"URL":"https:\/\/academic.oup.com\/bioinformatics\/advance-article-pdf\/doi\/10.1093\/bioinformatics\/btac720\/47104587\/btac720.pdf","content-type":"application\/pdf","content-version":"am","intended-application":"syndication"},{"URL":"https:\/\/academic.oup.com\/bioinformatics\/article-pdf\/39\/1\/btac720\/48448959\/btac720.pdf","content-type":"application\/pdf","content-version":"vor","intended-application":"syndication"},{"URL":"https:\/\/academic.oup.com\/bioinformatics\/article-pdf\/39\/1\/btac720\/48448959\/btac720.pdf","content-type":"unspecified","content-version":"vor","intended-application":"similarity-checking"}],"deposited":{"date-parts":[[2023,1,1]],"date-time":"2023-01-01T05:12:37Z","timestamp":1672549957000},"score":1,"resource":{"primary":{"URL":"https:\/\/academic.oup.com\/bioinformatics\/article\/doi\/10.1093\/bioinformatics\/btac720\/6808614"}},"subtitle":[],"editor":[{"given":"Anthony","family":"Mathelier","sequence":"additional",
"affiliation":[],"role":[{"role":"editor","vocabulary":"crossref"}]}],"short-title":[],"issued":{"date-parts":[[2022,11,7]]},"references-count":20,"journal-issue":{"issue":"1","published-online":{"date-parts":[[2022,11,7]]},"published-print":{"date-parts":[[2023,1,1]]}},"URL":"https:\/\/doi.org\/10.1093\/bioinformatics\/btac720","relation":{"has-preprint":[{"id-type":"doi","id":"10.1101\/2022.04.15.488440","asserted-by":"object"}]},"ISSN":["1367-4811"],"issn-type":[{"value":"1367-4811","type":"electronic"}],"subject":[],"published-other":{"date-parts":[[2023,1,1]]},"published":{"date-parts":[[2022,11,7]]},"article-number":"btac720"}}