{"status":"ok","message-type":"work","message-version":"1.0.0","message":{"indexed":{"date-parts":[[2026,4,14]],"date-time":"2026-04-14T01:46:19Z","timestamp":1776131179689,"version":"3.50.1"},"reference-count":31,"publisher":"Oxford University Press (OUP)","issue":"7","license":[{"start":{"date-parts":[[2024,7,4]],"date-time":"2024-07-04T00:00:00Z","timestamp":1720051200000},"content-version":"vor","delay-in-days":3,"URL":"https:\/\/creativecommons.org\/licenses\/by\/4.0\/"}],"funder":[{"DOI":"10.13039\/501100004063","name":"Knut and Alice Wallenberg Foundation","doi-asserted-by":"publisher","award":["2021.0048"],"award-info":[{"award-number":["2021.0048"]}],"id":[{"id":"10.13039\/501100004063","id-type":"DOI","asserted-by":"publisher"}]},{"DOI":"10.13039\/501100004063","name":"Knut and Alice Wallenberg Foundation","doi-asserted-by":"publisher","award":["KAW 2022.0033"],"award-info":[{"award-number":["KAW 2022.0033"]}],"id":[{"id":"10.13039\/501100004063","id-type":"DOI","asserted-by":"publisher"}]}],"content-domain":{"domain":[],"crossmark-restriction":false},"short-container-title":[],"published-print":{"date-parts":[[2024,7,1]]},"abstract":"<jats:title>Abstract<\/jats:title>\n               <jats:sec>\n                  <jats:title>Motivation<\/jats:title>\n                  <jats:p>The alignment of sequencing reads is a critical step in the characterization of ancient genomes. However, reference bias and spurious mappings pose a significant challenge, particularly as cutting-edge wet lab methods generate datasets that push the boundaries of alignment tools. Reference bias occurs when reference alleles are favoured over alternative alleles during mapping, whereas spurious mappings stem from either contamination or when endogenous reads fail to align to their correct position. Previous work has shown that these phenomena are correlated with read length but a more thorough investigation of reference bias and spurious mappings for ancient DNA has been lacking. 
Here, we use a range of empirical and simulated palaeogenomic datasets to investigate the impacts of mapping tools, quality thresholds, and reference genome on mismatch rates across read lengths.<\/jats:p>\n               <\/jats:sec>\n               <jats:sec>\n                  <jats:title>Results<\/jats:title>\n                  <jats:p>For these analyses, we introduce AMBER, a new bioinformatics tool for assessing the quality of ancient DNA mapping directly from BAM-files and informing on reference bias, read length cut-offs and reference selection. AMBER rapidly and simultaneously computes the sequence read mapping bias in the form of the mismatch rates per read length, cytosine deamination profiles at both CpG and non-CpG sites, fragment length distributions, and genomic breadth and depth of coverage. Using AMBER, we find that mapping algorithms and quality threshold choices dictate reference bias and rates of spurious alignment at different read lengths in a predictable manner, suggesting that optimized mapping parameters for each read length will be a key step in alleviating reference bias and spurious mappings.<\/jats:p>\n               <\/jats:sec>\n               <jats:sec>\n                  <jats:title>Availability and implementation<\/jats:title>\n                  <jats:p>AMBER is available for noncommercial use on GitHub (https:\/\/github.com\/tvandervalk\/AMBER.git). 
Scripts used to generate and analyse simulated datasets are available on GitHub (https:\/\/github.com\/sdolenz\/refbias_scripts).<\/jats:p>\n               <\/jats:sec>","DOI":"10.1093\/bioinformatics\/btae436","type":"journal-article","created":{"date-parts":[[2024,7,2]],"date-time":"2024-07-02T23:46:00Z","timestamp":1719963960000},"source":"Crossref","is-referenced-by-count":24,"title":["Unravelling reference bias in ancient DNA datasets"],"prefix":"10.1093","volume":"40","author":[{"given":"Stephanie","family":"Dolenz","sequence":"first","affiliation":[{"name":"Centre for Palaeogenetics, Svante Arrhenius v\u00e4g 20C, Stockholm, SE-106 91, Sweden"},{"name":"Department of Geological Sciences, Stockholm University , Stockholm, SE-106 91, Sweden"}]},{"ORCID":"https:\/\/orcid.org\/0000-0001-6582-3452","authenticated-orcid":false,"given":"Tom","family":"van der Valk","sequence":"additional","affiliation":[{"name":"Centre for Palaeogenetics, Svante Arrhenius v\u00e4g 20C, Stockholm, SE-106 91, Sweden"},{"name":"Department of Bioinformatics and Genetics, Swedish Museum of Natural History , Stockholm, SE-114 18, Sweden"},{"name":"Science for Life Laboratory , Stockholm, SE-171 65, Sweden"}]},{"ORCID":"https:\/\/orcid.org\/0000-0002-2392-7090","authenticated-orcid":false,"given":"Chenyu","family":"Jin","sequence":"additional","affiliation":[{"name":"Centre for Palaeogenetics, Svante Arrhenius v\u00e4g 20C, Stockholm, SE-106 91, Sweden"},{"name":"Department of Bioinformatics and Genetics, Swedish Museum of Natural History , Stockholm, SE-114 18, Sweden"},{"name":"Department of Zoology, Stockholm University , Stockholm, SE-106 91, Sweden"}]},{"given":"Jonas","family":"Oppenheimer","sequence":"additional","affiliation":[{"name":"Department of Biomolecular Engineering, University of California Santa Cruz , Santa Cruz, CA, 95064, United States"}]},{"given":"Muhammad Bilal","family":"Sharif","sequence":"additional","affiliation":[{"name":"Centre for Palaeogenetics, Svante 
Arrhenius v\u00e4g 20C, Stockholm, SE-106 91, Sweden"},{"name":"Department of Zoology, Stockholm University , Stockholm, SE-106 91, Sweden"}]},{"given":"Ludovic","family":"Orlando","sequence":"additional","affiliation":[{"name":"Centre for Anthropobiology and Genomics of Toulouse (CAGT, CNRS UMR5288), University Paul Sabatier, Facult\u00e9 de Sant\u00e9, Toulouse, 31000, France"}]},{"given":"Beth","family":"Shapiro","sequence":"additional","affiliation":[{"name":"Department of Ecology and Evolutionary Biology, University of California Santa Cruz , Santa Cruz, CA, 95064, United States"},{"name":"Howard Hughes Medical Institute, University of California Santa Cruz , Santa Cruz, CA, 95064, United States"}]},{"given":"Love","family":"Dal\u00e9n","sequence":"additional","affiliation":[{"name":"Centre for Palaeogenetics, Svante Arrhenius v\u00e4g 20C, Stockholm, SE-106 91, Sweden"},{"name":"Department of Bioinformatics and Genetics, Swedish Museum of Natural History , Stockholm, SE-114 18, Sweden"},{"name":"Department of Zoology, Stockholm University , Stockholm, SE-106 91, Sweden"}]},{"ORCID":"https:\/\/orcid.org\/0000-0002-6449-0219","authenticated-orcid":false,"given":"Peter D","family":"Heintzman","sequence":"additional","affiliation":[{"name":"Centre for Palaeogenetics, Svante Arrhenius v\u00e4g 20C, Stockholm, SE-106 91, Sweden"},{"name":"Department of Geological Sciences, Stockholm University , Stockholm, SE-106 91, Sweden"}]}],"member":"286","published-online":{"date-parts":[[2024,7,3]]},"reference":[{"key":"2024071720562745300_btae436-B1","doi-asserted-by":"crossref","first-page":"14616","DOI":"10.1073\/pnas.0704665104","article-title":"Patterns of damage in genomic DNA sequences from a neandertal","volume":"104","author":"Briggs","year":"2007","journal-title":"Proc Natl Acad Sci USA"},{"key":"2024071720562745300_btae436-B2","doi-asserted-by":"crossref","first-page":"e87","DOI":"10.1093\/nar\/gkp1163","article-title":"Removal of deaminated cytosines and 
detection of in vivo methylation in ancient DNA","volume":"38","author":"Briggs","year":"2010","journal-title":"Nucleic Acids Res"},{"key":"2024071720562745300_btae436-B3","doi-asserted-by":"crossref","first-page":"i884","DOI":"10.1093\/bioinformatics\/bty560","article-title":"fastp: an ultra-fast all-in-one FASTQ preprocessor","volume":"34","author":"Chen","year":"2018","journal-title":"Bioinformatics"},{"key":"2024071720562745300_btae436-B4","author":"Fernandez-Guerra","year":"2023"},{"key":"2024071720562745300_btae436-B5","doi-asserted-by":"crossref","first-page":"844","DOI":"10.1186\/s12864-020-07229-y","article-title":"Competitive mapping allows for the identification and exclusion of human DNA contamination in ancient faunal genomic datasets","volume":"21","author":"Feuerborn","year":"2020","journal-title":"BMC Genomics"},{"key":"2024071720562745300_btae436-B6","doi-asserted-by":"crossref","first-page":"121","DOI":"10.1186\/s12915-018-0581-9","article-title":"Quantifying and reducing spurious alignments for the analysis of ultra-short ancient DNA sequences","volume":"16","author":"de Filippo","year":"2018","journal-title":"BMC Biol"},{"key":"2024071720562745300_btae436-B7","doi-asserted-by":"crossref","first-page":"710","DOI":"10.1126\/science.1188021","article-title":"A draft sequence of the Neandertal genome","volume":"328","author":"Green","year":"2010","journal-title":"Science"},{"key":"2024071720562745300_btae436-B8","doi-asserted-by":"crossref","first-page":"e1008302","DOI":"10.1371\/journal.pgen.1008302","article-title":"The presence and impact of reference bias on population genomic studies of prehistoric human populations","volume":"15","author":"G\u00fcnther","year":"2019","journal-title":"PLoS Genet"},{"key":"2024071720562745300_btae436-B9","doi-asserted-by":"crossref","first-page":"184","DOI":"10.1186\/1471-2105-14-184","article-title":"Benchmarking short sequence mapping tools","volume":"14","author":"Hatem","year":"2013","journal-title":"BMC 
Bioinformatics"},{"key":"2024071720562745300_btae436-B10","author":"Heger","year":"2014"},{"key":"2024071720562745300_btae436-B11","doi-asserted-by":"crossref","first-page":"90","DOI":"10.1109\/MCSE.2007.55","article-title":"Matplotlib: a 2D graphics environment","volume":"9","author":"Hunter","year":"2007","journal-title":"Comput Sci Eng"},{"key":"2024071720562745300_btae436-B12","doi-asserted-by":"crossref","first-page":"1682","DOI":"10.1093\/bioinformatics\/btt193","article-title":"mapDamage2.0: fast approximate Bayesian estimates of ancient DNA damage parameters","volume":"29","author":"J\u00f3nsson","year":"2013","journal-title":"Bioinformatics"},{"key":"2024071720562745300_btae436-B13","doi-asserted-by":"crossref","first-page":"283","DOI":"10.1038\/s41586-022-05453-y","article-title":"A 2-million-year-old ecosystem in Greenland uncovered by environmental DNA","volume":"612","author":"Kj\u00e6r","year":"2022","journal-title":"Nature"},{"key":"2024071720562745300_btae436-B14","doi-asserted-by":"crossref","first-page":"357","DOI":"10.1038\/nmeth.1923","article-title":"Fast gapped-read alignment with Bowtie 2","volume":"9","author":"Langmead","year":"2012","journal-title":"Nat Methods"},{"key":"2024071720562745300_btae436-B15","author":"Li","year":"2013"},{"key":"2024071720562745300_btae436-B16","doi-asserted-by":"crossref","first-page":"2078","DOI":"10.1093\/bioinformatics\/btp352","article-title":"The sequence alignment\/map format and SAMtools","volume":"25","author":"Li","year":"2009","journal-title":"Bioinformatics"},{"key":"2024071720562745300_btae436-B17","doi-asserted-by":"crossref","first-page":"1754","DOI":"10.1093\/bioinformatics\/btp324","article-title":"Fast and accurate short read alignment with Burrows\u2013Wheeler transform","volume":"25","author":"Li","year":"2009","journal-title":"Bioinformatics"},{"key":"2024071720562745300_btae436-B18","doi-asserted-by":"crossref","first-page":"250","DOI":"10.1186\/s13059-020-02160-7","article-title":"Removing 
reference bias and improving indel calling in ancient DNA data analysis by mapping to a sequence variation graph","volume":"21","author":"Martiniano","year":"2020","journal-title":"Genome Biol"},{"key":"2024071720562745300_btae436-B19","doi-asserted-by":"crossref","first-page":"470","DOI":"10.1186\/s12859-021-04375-2","article-title":"Detecting selection in low-coverage high-throughput sequencing data using principal component analysis","volume":"22","author":"Meisner","year":"2021","journal-title":"BMC Bioinformatics"},{"key":"2024071720562745300_btae436-B20","doi-asserted-by":"crossref","DOI":"10.1093\/bib\/bbab076","article-title":"Systematic benchmark of ancient DNA read mapping","volume":"22","author":"Oliva","year":"2021","journal-title":"Brief Bioinform"},{"key":"2024071720562745300_btae436-B21","doi-asserted-by":"crossref","first-page":"1","DOI":"10.1038\/s43586-020-00011-0","article-title":"Ancient DNA analysis","volume":"1","author":"Orlando","year":"2021","journal-title":"Nat Rev Methods Primers"},{"key":"2024071720562745300_btae436-B22","doi-asserted-by":"crossref","first-page":"454","DOI":"10.1101\/gr.163592.113","article-title":"Genome-wide nucleosome map and cytosine methylation levels of an ancient human genome","volume":"24","author":"Pedersen","year":"2014","journal-title":"Genome Res"},{"key":"2024071720562745300_btae436-B23","doi-asserted-by":"crossref","first-page":"242","DOI":"10.1186\/s13059-023-03083-9","article-title":"aMeta: an accurate and memory-efficient ancient metagenomic profiling workflow","volume":"24","author":"Pochon","year":"2023","journal-title":"Genome Biol"},{"key":"2024071720562745300_btae436-B24","doi-asserted-by":"crossref","first-page":"105","DOI":"10.3389\/fevo.2020.00105","article-title":"Assessing DNA sequence alignment methods for characterizing ancient genomes and methylomes","volume":"8","author":"Poullet","year":"2020","journal-title":"Front Ecol 
Evol"},{"key":"2024071720562745300_btae436-B25","doi-asserted-by":"crossref","first-page":"577","DOI":"10.1093\/bioinformatics\/btw670","article-title":"Gargammel: a sequence simulator for ancient DNA","volume":"33","author":"Renaud","year":"2017","journal-title":"Bioinformatics"},{"key":"2024071720562745300_btae436-B26","doi-asserted-by":"crossref","first-page":"e34131","DOI":"10.1371\/journal.pone.0034131","article-title":"Temporal patterns of nucleotide misincorporations and DNA fragmentation in ancient DNA","volume":"7","author":"Sawyer","year":"2012","journal-title":"PLoS One"},{"key":"2024071720562745300_btae436-B27","doi-asserted-by":"crossref","first-page":"178","DOI":"10.1186\/1471-2164-13-178","article-title":"Improving ancient DNA read mapping against modern reference genomes","volume":"13","author":"Schubert","year":"2012","journal-title":"BMC Genomics"},{"key":"2024071720562745300_btae436-B28","doi-asserted-by":"crossref","first-page":"2229","DOI":"10.1073\/pnas.1318934111","article-title":"Separating endogenous ancient DNA from modern day contamination in a Siberian Neandertal","volume":"111","author":"Skoglund","year":"2014","journal-title":"Proc Natl Acad Sci USA"},{"key":"2024071720562745300_btae436-B29","doi-asserted-by":"crossref","first-page":"265","DOI":"10.1038\/s41586-021-03224-9","article-title":"Million-year-old DNA sheds light on the genomic history of mammoths","volume":"591","author":"van der Valk","year":"2021","journal-title":"Nature"},{"key":"2024071720562745300_btae436-B30","doi-asserted-by":"crossref","DOI":"10.1126\/science.abf1667","article-title":"Unearthing Neanderthal population history using nuclear and mitochondrial DNA from cave sediments","volume":"372","author":"Vernot","year":"2021","journal-title":"Science"},{"key":"2024071720562745300_btae436-B95864","doi-asserted-by":"publisher","first-page":"390","DOI":"10.1002\/ece3.7056","article-title":"An efficient pipeline for ancient DNA mapping and recovery of endogenous 
ancient DNA from whole-genome sequencing data","volume":"11","author":"Xu","year":"2021","journal-title":"Ecology and Evolution"}],"container-title":["Bioinformatics"],"original-title":[],"language":"en","link":[{"URL":"https:\/\/academic.oup.com\/bioinformatics\/advance-article-pdf\/doi\/10.1093\/bioinformatics\/btae436\/58425539\/btae436.pdf","content-type":"application\/pdf","content-version":"am","intended-application":"syndication"},{"URL":"https:\/\/academic.oup.com\/bioinformatics\/advance-article-pdf\/doi\/10.1093\/bioinformatics\/btae436\/58576890\/btae436.pdf","content-type":"application\/pdf","content-version":"vor","intended-application":"syndication"},{"URL":"https:\/\/academic.oup.com\/bioinformatics\/advance-article-pdf\/doi\/10.1093\/bioinformatics\/btae436\/58576890\/btae436.pdf","content-type":"unspecified","content-version":"vor","intended-application":"similarity-checking"}],"deposited":{"date-parts":[[2024,7,17]],"date-time":"2024-07-17T21:22:45Z","timestamp":1721251365000},"score":1,"resource":{"primary":{"URL":"https:\/\/academic.oup.com\/bioinformatics\/article\/doi\/10.1093\/bioinformatics\/btae436\/7705522"}},"subtitle":[],"editor":[{"given":"Alfonso","family":"Valencia","sequence":"additional","affiliation":[]}],"short-title":[],"issued":{"date-parts":[[2024,7,1]]},"references-count":31,"journal-issue":{"issue":"7","published-print":{"date-parts":[[2024,7,1]]}},"URL":"https:\/\/doi.org\/10.1093\/bioinformatics\/btae436","relation":{},"ISSN":["1367-4811"],"issn-type":[{"value":"1367-4811","type":"electronic"}],"subject":[],"published-other":{"date-parts":[[2024,7]]},"published":{"date-parts":[[2024,7,1]]},"article-number":"btae436"}}