{"status":"ok","message-type":"work","message-version":"1.0.0","message":{"indexed":{"date-parts":[[2025,10,5]],"date-time":"2025-10-05T04:15:08Z","timestamp":1759637708566,"version":"3.37.3"},"reference-count":44,"publisher":"Oxford University Press (OUP)","issue":"24","license":[{"start":{"date-parts":[[2021,7,14]],"date-time":"2021-07-14T00:00:00Z","timestamp":1626220800000},"content-version":"vor","delay-in-days":0,"URL":"https:\/\/creativecommons.org\/licenses\/by\/4.0\/"}],"funder":[{"DOI":"10.13039\/501100002341","name":"Academy of Finland","doi-asserted-by":"publisher","award":["309048"],"award-info":[{"award-number":["309048"]}],"id":[{"id":"10.13039\/501100002341","id-type":"DOI","asserted-by":"publisher"}]},{"DOI":"10.13039\/501100023750","name":"Helsinki Institute for Information Technology","doi-asserted-by":"crossref","id":[{"id":"10.13039\/501100023750","id-type":"DOI","asserted-by":"crossref"}]}],"content-domain":{"domain":[],"crossmark-restriction":false},"short-container-title":[],"published-print":{"date-parts":[[2021,12,11]]},"abstract":"<jats:title>Abstract<\/jats:title>\n               <jats:sec>\n                  <jats:title>Motivation<\/jats:title>\n                  <jats:p>Variant calling workflows that utilize a single reference sequence are the de facto standard elementary genomic analysis routine for resequencing projects. Various ways to enhance the reference with pangenomic information have been proposed, but scalability combined with seamless integration to existing workflows remains a challenge.<\/jats:p>\n               <\/jats:sec>\n               <jats:sec>\n                  <jats:title>Results<\/jats:title>\n                  <jats:p>We present PanVC with founder sequences, a scalable and accurate variant calling workflow based on a multiple alignment of reference sequences. Scalability is achieved by removing duplicate parts up to a limit into a founder multiple alignment, that is then indexed using a hybrid scheme that exploits general purpose read aligners. Our implemented workflow uses GATK or BCFtools for variant calling, but the various steps of our workflow (e.g. vcf2multialign tool, founder reconstruction) can be of independent interest as a basis for creating novel pangenome analysis workflows beyond variant calling.<\/jats:p>\n               <\/jats:sec>\n               <jats:sec>\n                  <jats:title>Availability and implementation<\/jats:title>\n                  <jats:p>Our open access tools and instructions how to reproduce our experiments are available at the following address: https:\/\/github.com\/algbio\/panvc-founders.<\/jats:p>\n               <\/jats:sec>\n               <jats:sec>\n                  <jats:title>Supplementary information<\/jats:title>\n                  <jats:p>Supplementary data are available at Bioinformatics online.<\/jats:p>\n               <\/jats:sec>","DOI":"10.1093\/bioinformatics\/btab516","type":"journal-article","created":{"date-parts":[[2021,7,9]],"date-time":"2021-07-09T11:17:45Z","timestamp":1625829465000},"page":"4611-4619","source":"Crossref","is-referenced-by-count":10,"title":["Founder reconstruction enables scalable and seamless pangenomic analysis"],"prefix":"10.1093","volume":"37","author":[{"ORCID":"https:\/\/orcid.org\/0000-0002-8276-0585","authenticated-orcid":false,"given":"Tuukka","family":"Norri","sequence":"first","affiliation":[{"name":"Department of Computer Science, University of Helsinki , Helsinki 00014, Finland"}]},{"given":"Bastien","family":"Cazaux","sequence":"additional","affiliation":[{"name":"Department of Computer Science, University of Helsinki , Helsinki 00014, Finland"}]},{"given":"Saska","family":"D\u00f6nges","sequence":"additional","affiliation":[{"name":"Department of Computer Science, University of Helsinki , Helsinki 00014, Finland"}]},{"given":"Daniel","family":"Valenzuela","sequence":"additional","affiliation":[{"name":"Department of Computer Science, University of Helsinki , Helsinki 00014, Finland"}]},{"ORCID":"https:\/\/orcid.org\/0000-0003-4454-1493","authenticated-orcid":false,"given":"Veli","family":"M\u00e4kinen","sequence":"additional","affiliation":[{"name":"Department of Computer Science, University of Helsinki , Helsinki 00014, Finland"}]}],"member":"286","published-online":{"date-parts":[[2021,7,14]]},"reference":[{"key":"2023051607145158900_btab516-B1","first-page":"11.10.1","article-title":"From fastq data to high-confidence variant calls: the genome analysis toolkit best practices pipeline","volume":"43","author":"Auwera","year":"2013","journal-title":"Curr. Protoc. Bioinf"},{"key":"2023051607145158900_btab516-B2","doi-asserted-by":"crossref","first-page":"159","DOI":"10.1186\/s13059-019-1774-4","article-title":"Is it time to change the reference genome?","volume":"20","author":"Ballouz","year":"2019","journal-title":"Genome Biol"},{"key":"2023051607145158900_btab516-B3","doi-asserted-by":"crossref","first-page":"13","DOI":"10.1186\/s13059-015-0587-3","article-title":"Extending reference assembly models","volume":"16","author":"Church","year":"2015","journal-title":"Genome Biol"},{"key":"2023051607145158900_btab516-B4","first-page":"bbw089","article-title":"Computational pan-genomics: status, promises and challenges","volume":"19","year":"2016","journal-title":"Brief. Bioinf"},{"key":"2023051607145158900_btab516-B5","doi-asserted-by":"crossref","first-page":"2156","DOI":"10.1093\/bioinformatics\/btr330","article-title":"The variant call format and vcftools","volume":"27","author":"Danecek","year":"2011","journal-title":"Bioinformatics"},{"key":"2023051607145158900_btab516-B6","doi-asserted-by":"crossref","first-page":"giab008","DOI":"10.1093\/gigascience\/giab008","article-title":"Twelve years of samtools and bcftools","volume":"10","author":"Danecek","year":"2021","journal-title":"GigaScience"},{"key":"2023051607145158900_btab516-B7","doi-asserted-by":"crossref","first-page":"e109384","DOI":"10.1371\/journal.pone.0109384","article-title":"Indexes of large genome collections on a PC","volume":"9","author":"Danek","year":"2014","journal-title":"PLoS One"},{"key":"2023051607145158900_btab516-B8","doi-asserted-by":"crossref","first-page":"682","DOI":"10.1038\/ng.3257","article-title":"Improved genome inference in the mhc using a population reference graph","volume":"47","author":"Dilthey","year":"2015","journal-title":"Nat. Genet"},{"key":"2023051607145158900_btab516-B9","doi-asserted-by":"crossref","DOI":"10.1017\/CBO9780511790492","volume-title":"Biological Sequence Analysis: Probabilistic Models of Proteins and Nucleic Acids","author":"Durbin","year":"1998"},{"key":"2023051607145158900_btab516-B10","doi-asserted-by":"crossref","first-page":"157","DOI":"10.1101\/gr.210500.116","article-title":"A reference data set of 5.4 million phased human variants validated by genetic inheritance from sequencing a three-generation 17-member pedigree","volume":"27","author":"Eberle","year":"2017","journal-title":"Genome Res"},{"key":"2023051607145158900_btab516-B11","doi-asserted-by":"crossref","first-page":"1654","DOI":"10.1038\/ng.3964","article-title":"Graphtyper enables population-scale genotyping using pangenome graphs","volume":"49","author":"Eggertsson","year":"2017","journal-title":"Nat. Genet"},{"key":"2023051607145158900_btab516-B12","doi-asserted-by":"crossref","first-page":"5402","DOI":"10.1038\/s41467-019-13341-9","article-title":"Graphtyper2 enables population-scale genotyping of structural variation using pangenome graphs","volume":"10","author":"Eggertsson","year":"2019","journal-title":"Nat. Commun"},{"key":"2023051607145158900_btab516-B13","doi-asserted-by":"crossref","first-page":"20130137","DOI":"10.1098\/rsta.2013.0137","article-title":"Hybrid indexes for repetitive datasets","volume":"372","author":"Ferrada","year":"2014","journal-title":"Phil. Trans. R. Soc. A"},{"key":"2023051607145158900_btab516-B14","doi-asserted-by":"crossref","first-page":"12","DOI":"10.3389\/fbioe.2015.00012","article-title":"Searching and indexing genomic databases via kernelization","volume":"3","author":"Gagie","year":"2015","journal-title":"Front. Bioeng. Biotechnol"},{"key":"2023051607145158900_btab516-B15","doi-asserted-by":"crossref","first-page":"875","DOI":"10.1038\/nbt.4227","article-title":"Variation graph toolkit improves read mapping by representing genetic variation in the reference","volume":"36","author":"Garrison","year":"2018","journal-title":"Nat. Biotechnol"},{"key":"2023051607145158900_btab516-B16","doi-asserted-by":"crossref","first-page":"35","DOI":"10.1186\/s13059-020-1941-7","article-title":"Genotyping structural variants in pangenome graphs using the vg toolkit","volume":"21","author":"Hickey","year":"2020","journal-title":"Genome Biol"},{"key":"2023051607145158900_btab516-B17","doi-asserted-by":"crossref","first-page":"i361","DOI":"10.1093\/bioinformatics\/btt215","article-title":"Short read alignment with populations of genomes","volume":"29","author":"Huang","year":"2013","journal-title":"Bioinformatics"},{"key":"2023051607145158900_btab516-B18","doi-asserted-by":"crossref","first-page":"907","DOI":"10.1038\/s41587-019-0201-4","article-title":"Graph-based genome alignment and genotyping with hisat2 and hisat-genotype","volume":"37","author":"Kim","year":"2019","journal-title":"Nat. Biotechnol"},{"key":"2023051607145158900_btab516-B19","doi-asserted-by":"crossref","first-page":"357","DOI":"10.1038\/nmeth.1923","article-title":"Fast gapped-read alignment with bowtie 2","volume":"9","author":"Langmead","year":"2012","journal-title":"Nat. Methods"},{"key":"2023051607145158900_btab516-B20","doi-asserted-by":"crossref","first-page":"1754","DOI":"10.1093\/bioinformatics\/btp324","article-title":"Fast and accurate short read alignment with burrows\u2013wheeler transform","volume":"25","author":"Li","year":"2009","journal-title":"Bioinformatics"},{"volume-title":"Proc. BigData 2020, LNCS","year":"2020","author":"Maarala","key":"2023051607145158900_btab516-B21"},{"key":"2023051607145158900_btab516-B22","first-page":"222","volume-title":"Algorithms in Bioinformatics - 16th International Workshop, WABI 2016, Aarhus, Denmark, August 22\u201324, 2016. Proceedings, Volume 9838 of Lecture Notes in Computer Science","author":"Maciuca","year":"2016"},{"key":"2023051607145158900_btab516-B23","doi-asserted-by":"crossref","first-page":"281","DOI":"10.1089\/cmb.2009.0169","article-title":"Storage and retrieval of highly repetitive sequence collections","volume":"17","author":"M\u00e4kinen","year":"2010","journal-title":"J. Comput. Biol"},{"key":"2023051607145158900_btab516-B24","doi-asserted-by":"crossref","first-page":"201","DOI":"10.1038\/nature18964","article-title":"The simons genome diversity project: 300 genomes from 142 diverse populations","volume":"538","author":"Mallick","year":"2016","journal-title":"Nature"},{"key":"2023051607145158900_btab516-B25","doi-asserted-by":"crossref","first-page":"i142","DOI":"10.1093\/bioinformatics\/bty266","article-title":"Versatile genome assembly evaluation with QUAST-LG","volume":"34","author":"Mikheenko","year":"2018","journal-title":"Bioinformatics"},{"key":"2023051607145158900_btab516-B26","doi-asserted-by":"crossref","first-page":"33","DOI":"10.12688\/f1000research.29032.2","article-title":"Sustainable data analysis with snakemake","volume":"10","author":"M\u00f6lder","year":"2021","journal-title":"F1000Research"},{"key":"2023051607145158900_btab516-B27","doi-asserted-by":"crossref","first-page":"12","DOI":"10.1186\/s13015-019-0147-6","article-title":"Linear time minimum segmentation enables scalable founder reconstruction","volume":"14","author":"Norri","year":"2019","journal-title":"Algorithms Mol. Biol"},{"key":"2023051607145158900_btab516-B28","doi-asserted-by":"crossref","first-page":"665","DOI":"10.1101\/gr.214155.116","article-title":"Genome graphs and the evolution of genome inference","volume":"27","author":"Paten","year":"2017","journal-title":"Genome Res"},{"key":"2023051607145158900_btab516-B29","doi-asserted-by":"crossref","first-page":"161","DOI":"10.1038\/538161a","article-title":"Genomics is failing on diversity","volume":"538","author":"Popejoy","year":"2016","journal-title":"Nature"},{"key":"2023051607145158900_btab516-B30","doi-asserted-by":"crossref","first-page":"220","DOI":"10.1186\/s13059-018-1595-x","article-title":"Forge: prioritizing variants for graph genomes","volume":"19","author":"Pritt","year":"2018","journal-title":"Genome Biol"},{"key":"2023051607145158900_btab516-B31","doi-asserted-by":"crossref","first-page":"3499","DOI":"10.1093\/bioinformatics\/btu438","article-title":"Journaled string tree-a scalable data structure for analyzing thousands of similar genomes on your laptop","volume":"30","author":"Rahn","year":"2014","journal-title":"Bioinformatics"},{"key":"2023051607145158900_btab516-B32","doi-asserted-by":"crossref","first-page":"R98","DOI":"10.1186\/gb-2009-10-9-r98","article-title":"Simultaneous alignment of short reads against multiple genomes","volume":"10","author":"Schneeberger","year":"2009","journal-title":"Genome Biol"},{"key":"2023051607145158900_btab516-B33","doi-asserted-by":"crossref","first-page":"e0136771","DOI":"10.1371\/journal.pone.0136771","article-title":"Improving the power of structural variation detection by augmenting the reference","volume":"10","author":"Schr\u00f6der","year":"2015","journal-title":"PLoS One"},{"key":"2023051607145158900_btab516-B34","doi-asserted-by":"crossref","first-page":"375","DOI":"10.1109\/TCBB.2013.2297101","article-title":"Indexing graphs for path queries with applications in genome research","volume":"11","author":"Sir\u00e9n","year":"2014","journal-title":"IEEE\/ACM Trans. Comput. Biol. Bioinf"},{"key":"2023051607145158900_btab516-B35","doi-asserted-by":"crossref","first-page":"400","DOI":"10.1093\/bioinformatics\/btz575","article-title":"Haplotype-aware graph indexes","volume":"36","author":"Sir\u00e9n","year":"2020","journal-title":"Bioinformatics"},{"key":"2023051607145158900_btab516-B36","doi-asserted-by":"crossref","first-page":"1394","DOI":"10.1093\/bioinformatics\/btw753","article-title":"Edlib: a c\/c++ library for fast, exact sequence alignment using edit distance","volume":"33","author":"\u0160o\u0161i\u0107","year":"2017","journal-title":"Bioinformatics"},{"key":"2023051607145158900_btab516-B37","first-page":"42","article-title":"GNU parallel \u2013 the command-line power tool","volume":"36","author":"Tange","year":"2011","journal-title":"USENIX Mag"},{"key":"2023051607145158900_btab516-B38","doi-asserted-by":"crossref","first-page":"68","DOI":"10.1038\/nature15393","article-title":"A global reference for human genetic variation","volume":"526","year":"2015","journal-title":"Nature"},{"key":"2023051607145158900_btab516-B39","first-page":"277","volume-title":"Algorithms in Bioinformatics, Second International Workshop, WABI 2002, Rome, Italy, September 17-21, 2002, Proceedings, Volume 2452 of Lecture Notes in Computer Science","author":"Ukkonen","year":"2002"},{"first-page":"326","year":"2016","author":"Valenzuela","key":"2023051607145158900_btab516-B40"},{"key":"2023051607145158900_btab516-B41","first-page":"178129","article-title":"CHIC: a short read aligner for pan-genomic references","author":"Valenzuela","year":"2017","journal-title":"bioRxiv"},{"key":"2023051607145158900_btab516-B42","doi-asserted-by":"crossref","first-page":"123","DOI":"10.1186\/s12864-018-4465-8","article-title":"Towards pan-genome read alignment to improve variation calling","volume":"19","author":"Valenzuela","year":"2018","journal-title":"BMC Genomics"},{"key":"2023051607145158900_btab516-B43","doi-asserted-by":"crossref","first-page":"1534","DOI":"10.14778\/2536258.2536265","article-title":"RCSI: scalable similarity search in thousand (s) of genomes","volume":"6","author":"Wandelt","year":"2013","journal-title":"Proc. VLDB Endowment"},{"key":"2023051607145158900_btab516-B44","doi-asserted-by":"crossref","first-page":"1","DOI":"10.1038\/sdata.2016.25","article-title":"Extensive sequencing of seven human genomes to characterize benchmark reference materials","volume":"3","author":"Zook","year":"2016","journal-title":"Sci. Data"}],"container-title":["Bioinformatics"],"original-title":[],"language":"en","link":[{"URL":"http:\/\/academic.oup.com\/bioinformatics\/advance-article-pdf\/doi\/10.1093\/bioinformatics\/btab516\/40391795\/btab516.pdf","content-type":"application\/pdf","content-version":"am","intended-application":"syndication"},{"URL":"https:\/\/academic.oup.com\/bioinformatics\/article-pdf\/37\/24\/4611\/50334896\/btab516.pdf","content-type":"application\/pdf","content-version":"vor","intended-application":"syndication"},{"URL":"https:\/\/academic.oup.com\/bioinformatics\/article-pdf\/37\/24\/4611\/50334896\/btab516.pdf","content-type":"unspecified","content-version":"vor","intended-application":"similarity-checking"}],"deposited":{"date-parts":[[2023,5,16]],"date-time":"2023-05-16T07:45:59Z","timestamp":1684223159000},"score":1,"resource":{"primary":{"URL":"https:\/\/academic.oup.com\/bioinformatics\/article\/37\/24\/4611\/6321452"}},"subtitle":[],"editor":[{"given":"Can","family":"Alkan","sequence":"additional","affiliation":[]}],"short-title":[],"issued":{"date-parts":[[2021,7,14]]},"references-count":44,"journal-issue":{"issue":"24","published-print":{"date-parts":[[2021,12,11]]}},"URL":"https:\/\/doi.org\/10.1093\/bioinformatics\/btab516","relation":{},"ISSN":["1367-4803","1367-4811"],"issn-type":[{"type":"print","value":"1367-4803"},{"type":"electronic","value":"1367-4811"}],"subject":[],"published-other":{"date-parts":[[2021,12,15]]},"published":{"date-parts":[[2021,7,14]]}}}