{"status":"ok","message-type":"work","message-version":"1.0.0","message":{"indexed":{"date-parts":[[2026,3,27]],"date-time":"2026-03-27T19:55:15Z","timestamp":1774641315577,"version":"3.50.1"},"reference-count":42,"publisher":"Oxford University Press (OUP)","issue":"15","license":[{"start":{"date-parts":[[2022,6,13]],"date-time":"2022-06-13T00:00:00Z","timestamp":1655078400000},"content-version":"vor","delay-in-days":0,"URL":"https:\/\/creativecommons.org\/licenses\/by\/4.0\/"}],"funder":[{"DOI":"10.13039\/501100004189","name":"Max Planck Society","doi-asserted-by":"publisher","id":[{"id":"10.13039\/501100004189","id-type":"DOI","asserted-by":"publisher"}]}],"content-domain":{"domain":[],"crossmark-restriction":false},"short-container-title":[],"published-print":{"date-parts":[[2022,8,2]]},"abstract":"<jats:title>Abstract<\/jats:title>\n                  <jats:sec>\n                    <jats:title>Motivation<\/jats:title>\n                    <jats:p>Human ancient DNA (aDNA) studies have surged in recent years, revolutionizing the study of the human past. Typically, aDNA is preserved poorly, making such data prone to contamination from other human DNA. Therefore, it is important to rule out substantial contamination before proceeding to downstream analysis. As most aDNA samples can only be sequenced to low coverages (&lt;1\u00d7 average depth), computational methods that can robustly estimate contamination in the low coverage regime are needed. However, the ultra low-coverage regime (0.1\u00d7 and below) remains a challenging task for existing approaches.<\/jats:p>\n                  <\/jats:sec>\n                  <jats:sec>\n                    <jats:title>Results<\/jats:title>\n                    <jats:p>We present a new method to estimate contamination in aDNA for male modern humans. It utilizes a Li&amp;Stephens haplotype copying model for haploid X chromosomes, with mismatches modeled as errors or contamination. 
We assessed this new approach, hapCon, on simulated and down-sampled empirical aDNA data. Our experiments demonstrate that hapCon outperforms a commonly used tool for estimating male X contamination (ANGSD), with substantially lower variance and narrower confidence intervals, especially in the low coverage regime. We found that hapCon provides useful contamination estimates for coverages as low as 0.1\u00d7 for SNP capture data (1240k) and 0.02\u00d7 for whole genome sequencing data, substantially extending the coverage limit of previous male X chromosome-based contamination estimation methods. Our experiments demonstrate that hapCon has little bias for contamination up to 25\u201330% as long as the contaminating source is specified within continental genetic variation, and that its application range extends to human aDNA as old as \u223c45\u00a0000 and various global ancestries.<\/jats:p>\n                  <\/jats:sec>\n                  <jats:sec>\n                    <jats:title>Availability and implementation<\/jats:title>\n                    <jats:p>We make hapCon available as part of a python package (hapROH), which is available at the Python Package Index (https:\/\/pypi.org\/project\/hapROH) and can be installed via pip. The documentation provides example use cases as blueprints for custom applications (https:\/\/haproh.readthedocs.io\/en\/latest\/hapCon.html). The program can analyze either BAM files or pileup files produced with samtools. 
An implementation of our software (hapCon) using Python and C is deposited at https:\/\/github.com\/hyl317\/hapROH.<\/jats:p>\n                  <\/jats:sec>\n                  <jats:sec>\n                    <jats:title>Supplementary information<\/jats:title>\n                    <jats:p>Supplementary data are available at Bioinformatics online.<\/jats:p>\n                  <\/jats:sec>","DOI":"10.1093\/bioinformatics\/btac390","type":"journal-article","created":{"date-parts":[[2022,6,13]],"date-time":"2022-06-13T10:09:02Z","timestamp":1655114942000},"page":"3768-3777","source":"Crossref","is-referenced-by-count":35,"title":["hapCon: estimating contamination of ancient genomes by copying from reference haplotypes"],"prefix":"10.1093","volume":"38","author":[{"ORCID":"https:\/\/orcid.org\/0000-0002-2860-2543","authenticated-orcid":false,"given":"Yilei","family":"Huang","sequence":"first","affiliation":[{"name":"Department of Archaeogenetics, Max Planck Institute for Evolutionary Anthropology , 04103 Leipzig, Germany"}]},{"ORCID":"https:\/\/orcid.org\/0000-0002-4884-9682","authenticated-orcid":false,"given":"Harald","family":"Ringbauer","sequence":"additional","affiliation":[{"name":"Department of Archaeogenetics, Max Planck Institute for Evolutionary Anthropology , 04103 Leipzig, Germany"}]}],"member":"286","published-online":{"date-parts":[[2022,6,13]]},"reference":[{"key":"2023041405354859000_","author":"Ausmees","year":"2022"},{"key":"2023041405354859000_","doi-asserted-by":"crossref","first-page":"68","DOI":"10.1038\/nature15393","article-title":"A global reference for human genetic variation","volume":"526","author":"Auton","year":"2015","journal-title":"Nature"},{"key":"2023041405354859000_","first-page":"627","volume-title":"Pattern Recognition and Machine Learning (Information Science and 
Statistics)","author":"Bishop","year":"2006"},{"key":"2023041405354859000_","doi-asserted-by":"crossref","first-page":"1880","DOI":"10.1016\/j.ajhg.2021.08.005","article-title":"Fast two-stage phasing of large-scale sequence data","volume":"108","author":"Browning","year":"2021","journal-title":"Am. J. Hum. Genet"},{"key":"2023041405354859000_","doi-asserted-by":"crossref","first-page":"1190","DOI":"10.1137\/0916069","article-title":"A limited memory algorithm for bound constrained optimization","volume":"16","author":"Byrd","year":"1995","journal-title":"SIAM J. Sci. Comput"},{"key":"2023041405354859000_","doi-asserted-by":"crossref","first-page":"741","DOI":"10.1038\/s41586-020-2859-7","article-title":"High-depth African genomes inform human migration and health","volume":"586","author":"Choudhury","year":"2020","journal-title":"Nature"},{"key":"2023041405354859000_","doi-asserted-by":"crossref","first-page":"789","DOI":"10.1038\/nature02168","article-title":"The international hapmap project","volume":"426","year":"2003","journal-title":"Nature"},{"key":"2023041405354859000_","doi-asserted-by":"crossref","first-page":"369","DOI":"10.1038\/s41586-018-0094-2","article-title":"137 ancient human genomes from across the Eurasian steppes","volume":"557","author":"de Barros Damgaard","year":"2018","journal-title":"Nature"},{"key":"2023041405354859000_","doi-asserted-by":"crossref","first-page":"1","DOI":"10.1038\/s41467-019-13225-y","article-title":"Accurate, scalable and integrative haplotype estimation","volume":"10","author":"Delaneau","year":"2019","journal-title":"Nat. Commun"},{"key":"2023041405354859000_","doi-asserted-by":"crossref","first-page":"531","DOI":"10.1038\/s41588-022-01071-6","article-title":"Promoting the genomic revolution in Africa through the Nigerian 100k genome project","volume":"54","author":"Fatumo","year":"2022","journal-title":"Nat. 
Genet"},{"key":"2023041405354859000_","doi-asserted-by":"crossref","first-page":"103","DOI":"10.1038\/s41586-020-03053-2","article-title":"A genetic history of the pre-contact Caribbean","volume":"590","author":"Fernandes","year":"2021","journal-title":"Nature"},{"key":"2023041405354859000_","doi-asserted-by":"crossref","first-page":"445","DOI":"10.1038\/nature13810","article-title":"Genome sequence of a 45,000-year-old modern human from Western Siberia","volume":"514","author":"Fu","year":"2014","journal-title":"Nature"},{"key":"2023041405354859000_","doi-asserted-by":"crossref","first-page":"216","DOI":"10.1038\/nature14558","article-title":"An early modern human from Romania with a recent Neanderthal ancestor","volume":"524","author":"Fu","year":"2015","journal-title":"Nature"},{"key":"2023041405354859000_","doi-asserted-by":"crossref","first-page":"200","DOI":"10.1038\/nature17993","article-title":"The genetic history of ice age Europe","volume":"534","author":"Fu","year":"2016","journal-title":"Nature"},{"key":"2023041405354859000_","doi-asserted-by":"crossref","first-page":"1","DOI":"10.1038\/s41598-018-32083-0","article-title":"Ratio of mitochondrial to nuclear DNA affects contamination estimates in ancient DNA analysis","volume":"8","author":"Furtw\u00e4ngler","year":"2018","journal-title":"Sci. Rep"},{"key":"2023041405354859000_","doi-asserted-by":"crossref","first-page":"207","DOI":"10.1038\/nature14317","article-title":"Massive migration from the steppe was a source for Indo-European languages in Europe","volume":"522","author":"Haak","year":"2015","journal-title":"Nature"},{"key":"2023041405354859000_","doi-asserted-by":"crossref","first-page":"1","DOI":"10.1038\/s41598-020-75387-w","article-title":"Evaluating genotype imputation pipeline for ultra-low coverage ancient genomes","volume":"10","author":"Hui","year":"2020","journal-title":"Sci. 
Rep"},{"key":"2023041405354859000_","doi-asserted-by":"crossref","first-page":"409","DOI":"10.1038\/nature13673","article-title":"Ancient human genomes suggest three ancestral populations for present-day Europeans","volume":"513","author":"Lazaridis","year":"2014","journal-title":"Nature"},{"key":"2023041405354859000_","doi-asserted-by":"crossref","first-page":"2987","DOI":"10.1093\/bioinformatics\/btr509","article-title":"A statistical framework for SNP calling, mutation discovery, association mapping and population genetical parameter estimation from sequencing data","volume":"27","author":"Li","year":"2011","journal-title":"Bioinformatics"},{"key":"2023041405354859000_","doi-asserted-by":"crossref","first-page":"2213","DOI":"10.1093\/genetics\/165.4.2213","article-title":"Modeling linkage disequilibrium and identifying recombination hotspots using single-nucleotide polymorphism data","volume":"165","author":"Li","year":"2003","journal-title":"Genetics"},{"key":"2023041405354859000_","doi-asserted-by":"crossref","first-page":"2078","DOI":"10.1093\/bioinformatics\/btp352","article-title":"The sequence alignment\/map format and SAMtools","volume":"25","author":"Li","year":"2009","journal-title":"Bioinformatics"},{"key":"2023041405354859000_","doi-asserted-by":"crossref","first-page":"820","DOI":"10.1126\/science.aad2879","article-title":"Ancient Ethiopian genome reveals extensive Eurasian admixture in Eastern Africa","volume":"350","author":"Llorente","year":"2015","journal-title":"Science"},{"key":"2023041405354859000_","doi-asserted-by":"crossref","first-page":"1443","DOI":"10.1038\/ng.3679","article-title":"Reference-based phasing using the Haplotype Reference Consortium panel","volume":"48","author":"Loh","year":"2016","journal-title":"Nat. 
Genet"},{"key":"2023041405354859000_","doi-asserted-by":"crossref","first-page":"798","DOI":"10.1093\/bioinformatics\/bty735","article-title":"Haplotype matching in large cohorts using the Li and Stephens model","volume":"35","author":"Lunter","year":"2019","journal-title":"Bioinformatics"},{"key":"2023041405354859000_","doi-asserted-by":"crossref","first-page":"201","DOI":"10.1038\/nature18964","article-title":"The Simons genome diversity project: 300 genomes from 142 diverse populations","volume":"538","author":"Mallick","year":"2016","journal-title":"Nature"},{"key":"2023041405354859000_","doi-asserted-by":"crossref","first-page":"1","DOI":"10.1038\/s41467-020-14523-6","article-title":"Genetic history from the Middle Neolithic to present on the Mediterranean island of Sardinia","volume":"11","author":"Marcus","year":"2020","journal-title":"Nat. Commun"},{"key":"2023041405354859000_","doi-asserted-by":"crossref","first-page":"499","DOI":"10.1038\/nature16152","article-title":"Genome-wide patterns of selection in 230 ancient Eurasians","volume":"528","author":"Mathieson","year":"2015","journal-title":"Nature"},{"key":"2023041405354859000_","doi-asserted-by":"crossref","first-page":"828","DOI":"10.1093\/bioinformatics\/btz660","article-title":"A likelihood method for estimating present-day human contamination in ancient male samples using low-depth X-chromosome data","volume":"36","author":"Moreno-Mayar","year":"2020","journal-title":"Bioinformatics"},{"key":"2023041405354859000_","doi-asserted-by":"crossref","first-page":"1","DOI":"10.1186\/s13059-020-02111-2","article-title":"ContamLD: estimation of ancient nuclear DNA contamination using breakdown of linkage disequilibrium","volume":"21","author":"Nakatsuka","year":"2020","journal-title":"Genome Biol"},{"key":"2023041405354859000_","doi-asserted-by":"crossref","first-page":"1230","DOI":"10.1126\/science.aav4040","article-title":"The genomic history of the Iberian Peninsula over the past 8000 
years","volume":"363","author":"Olalde","year":"2019","journal-title":"Science"},{"key":"2023041405354859000_","author":"Peter","year":"2020"},{"key":"2023041405354859000_","doi-asserted-by":"crossref","first-page":"1","DOI":"10.1186\/s13059-020-02123-y","article-title":"AuthentiCT: a model of ancient DNA damage to estimate the proportion of present-day DNA contamination","volume":"21","author":"Peyr\u00e9gne","year":"2020","journal-title":"Genome Biol"},{"key":"2023041405354859000_","doi-asserted-by":"crossref","first-page":"1185","DOI":"10.1016\/j.cell.2018.10.027","article-title":"Reconstructing the deep population history of Central and South America","volume":"175","author":"Posth","year":"2018","journal-title":"Cell"},{"key":"2023041405354859000_","doi-asserted-by":"crossref","first-page":"e1005972","DOI":"10.1371\/journal.pgen.1005972","article-title":"Joint estimation of contamination, error and demography for nuclear DNA from ancient humans","volume":"12","author":"Racimo","year":"2016","journal-title":"PLoS Genet"},{"key":"2023041405354859000_","doi-asserted-by":"crossref","first-page":"94","DOI":"10.1126\/science.1211177","article-title":"An Aboriginal Australian genome reveals separate human dispersals into Asia","volume":"334","author":"Rasmussen","year":"2011","journal-title":"Science"},{"key":"2023041405354859000_","doi-asserted-by":"crossref","first-page":"1","DOI":"10.1186\/s13059-015-0776-0","article-title":"Schmutzi: estimation of contamination and endogenous mitochondrial consensus calling for ancient DNA","volume":"16","author":"Renaud","year":"2015","journal-title":"Genome Biol"},{"key":"2023041405354859000_","doi-asserted-by":"crossref","first-page":"1","DOI":"10.1038\/s41467-021-25289-w","article-title":"Parental relatedness through time revealed by runs of homozygosity in ancient DNA","volume":"12","author":"Ringbauer","year":"2021","journal-title":"Nat. 
Commun"},{"key":"2023041405354859000_","doi-asserted-by":"crossref","first-page":"120","DOI":"10.1038\/s41588-020-00756-0","article-title":"Efficient phasing and imputation of low-coverage sequencing data using large reference panels","volume":"53","author":"Rubinacci","year":"2021","journal-title":"Nat. Genet"},{"key":"2023041405354859000_","doi-asserted-by":"crossref","first-page":"659","DOI":"10.1126\/science.aao1807","article-title":"Ancient genomes show social and reproductive behavior of early upper Paleolithic foragers","volume":"358","author":"Sikora","year":"2017","journal-title":"Science"},{"key":"2023041405354859000_","doi-asserted-by":"crossref","first-page":"2229","DOI":"10.1073\/pnas.1318934111","article-title":"Separating endogenous ancient DNA from modern day contamination in a Siberian Neandertal","volume":"111","author":"Skoglund","year":"2014","journal-title":"Proc. Natl. Acad. Sci. USA"},{"key":"2023041405354859000_","doi-asserted-by":"crossref","first-page":"261","DOI":"10.1038\/s41592-019-0686-2","article-title":"SciPy 1.0: fundamental algorithms for scientific computing in Python","volume":"17","author":"Virtanen","year":"2020","journal-title":"Nat. Methods"},{"key":"2023041405354859000_","doi-asserted-by":"crossref","first-page":"550","DOI":"10.1145\/279232.279236","article-title":"Algorithm 778: L-BFGS-B: Fortran subroutines for large-scale bound-constrained optimization","volume":"23","author":"Zhu","year":"1997","journal-title":"ACM Trans. Math. 
Softw"}],"container-title":["Bioinformatics"],"original-title":[],"language":"en","link":[{"URL":"https:\/\/academic.oup.com\/bioinformatics\/advance-article-pdf\/doi\/10.1093\/bioinformatics\/btac390\/44181963\/btac390.pdf","content-type":"application\/pdf","content-version":"am","intended-application":"syndication"},{"URL":"https:\/\/academic.oup.com\/bioinformatics\/article-pdf\/38\/15\/3768\/49884336\/btac390.pdf","content-type":"application\/pdf","content-version":"vor","intended-application":"syndication"},{"URL":"https:\/\/academic.oup.com\/bioinformatics\/article-pdf\/38\/15\/3768\/49884336\/btac390.pdf","content-type":"unspecified","content-version":"vor","intended-application":"similarity-checking"}],"deposited":{"date-parts":[[2023,11,22]],"date-time":"2023-11-22T20:39:56Z","timestamp":1700685596000},"score":1,"resource":{"primary":{"URL":"https:\/\/academic.oup.com\/bioinformatics\/article\/38\/15\/3768\/6607584"}},"subtitle":[],"editor":[{"given":"Russell","family":"Schwartz","sequence":"additional","affiliation":[]}],"short-title":[],"issued":{"date-parts":[[2022,6,13]]},"references-count":42,"journal-issue":{"issue":"15","published-print":{"date-parts":[[2022,8,2]]}},"URL":"https:\/\/doi.org\/10.1093\/bioinformatics\/btac390","relation":{"has-preprint":[{"id-type":"doi","id":"10.1101\/2021.12.20.473429","asserted-by":"object"}]},"ISSN":["1367-4803","1367-4811"],"issn-type":[{"value":"1367-4803","type":"print"},{"value":"1367-4811","type":"electronic"}],"subject":[],"published-other":{"date-parts":[[2022,8,1]]},"published":{"date-parts":[[2022,6,13]]}}}