{"status":"ok","message-type":"work","message-version":"1.0.0","message":{"indexed":{"date-parts":[[2026,2,13]],"date-time":"2026-02-13T23:25:36Z","timestamp":1771025136086,"version":"3.50.1"},"reference-count":45,"publisher":"Oxford University Press (OUP)","issue":"5","license":[{"start":{"date-parts":[[2024,4,27]],"date-time":"2024-04-27T00:00:00Z","timestamp":1714176000000},"content-version":"vor","delay-in-days":0,"URL":"https:\/\/creativecommons.org\/licenses\/by\/4.0\/"}],"funder":[{"name":"European Union\u2019s Horizon 2020 Research and Innovation Staff Exchange"}],"content-domain":{"domain":[],"crossmark-restriction":false},"short-container-title":[],"published-print":{"date-parts":[[2024,5,2]]},"abstract":"<jats:title>Abstract<\/jats:title>\n               <jats:sec>\n                  <jats:title>Motivation<\/jats:title>\n                  <jats:p>Bacterial genomes present more variability than human genomes, which requires important adjustments in computational tools that are developed for human data. In particular, bacteria exhibit a mosaic structure due to homologous recombinations, but this fact is not sufficiently captured by standard read mappers that align against linear reference genomes. The recent introduction of pangenomics provides some insights in that context, as a pangenome graph can represent the variability within a species. However, the concept of sequence-to-graph alignment that captures the presence of recombinations has not been previously investigated.<\/jats:p>\n               <\/jats:sec>\n               <jats:sec>\n                  <jats:title>Results<\/jats:title>\n                  <jats:p>In this paper, we present the extension of the notion of sequence-to-graph alignment to a variation graph that incorporates a recombination, so that the latter are explicitly represented and evaluated in an alignment. Moreover, we present a dynamic programming approach for the special case where there is at most a recombination\u2014we implement this case as RecGraph. From a modelling point of view, a recombination corresponds to identifying a new path of the variation graph, where the new arc is composed of two halves, each extracted from an original path, possibly joined by a new arc. Our experiments show that RecGraph accurately aligns simulated recombinant bacterial sequences that have at most a recombination, providing evidence for the presence of recombination events.<\/jats:p>\n               <\/jats:sec>\n               <jats:sec>\n                  <jats:title>Availability and implementation<\/jats:title>\n                  <jats:p>Our implementation is open source and available at https:\/\/github.com\/AlgoLab\/RecGraph.<\/jats:p>\n               <\/jats:sec>","DOI":"10.1093\/bioinformatics\/btae292","type":"journal-article","created":{"date-parts":[[2024,4,25]],"date-time":"2024-04-25T19:32:39Z","timestamp":1714073559000},"source":"Crossref","is-referenced-by-count":5,"title":["RecGraph: recombination-aware alignment of sequences to variation graphs"],"prefix":"10.1093","volume":"40","author":[{"given":"Jorge","family":"Avila Cartes","sequence":"first","affiliation":[{"name":"Department of Informatics, Systems and Communication, University of Milano \u2013 Bicocca . Viale Sarca 336 , Milano 20126, Italy"}]},{"ORCID":"https:\/\/orcid.org\/0000-0001-7289-4988","authenticated-orcid":false,"given":"Paola","family":"Bonizzoni","sequence":"additional","affiliation":[{"name":"Department of Informatics, Systems and Communication, University of Milano \u2013 Bicocca . Viale Sarca 336 , Milano 20126, Italy"}]},{"ORCID":"https:\/\/orcid.org\/0000-0002-6469-4887","authenticated-orcid":false,"given":"Simone","family":"Ciccolella","sequence":"additional","affiliation":[{"name":"Department of Informatics, Systems and Communication, University of Milano \u2013 Bicocca . Viale Sarca 336 , Milano 20126, Italy"}]},{"ORCID":"https:\/\/orcid.org\/0000-0001-5584-3089","authenticated-orcid":false,"given":"Gianluca","family":"Della Vedova","sequence":"additional","affiliation":[{"name":"Department of Informatics, Systems and Communication, University of Milano \u2013 Bicocca . Viale Sarca 336 , Milano 20126, Italy"}]},{"ORCID":"https:\/\/orcid.org\/0000-0001-8786-2276","authenticated-orcid":false,"given":"Luca","family":"Denti","sequence":"additional","affiliation":[{"name":"Department of Informatics, Systems and Communication, University of Milano \u2013 Bicocca . Viale Sarca 336 , Milano 20126, Italy"}]},{"ORCID":"https:\/\/orcid.org\/0000-0003-1885-500X","authenticated-orcid":false,"given":"Xavier","family":"Didelot","sequence":"additional","affiliation":[{"name":"Department of Statistics and School of Life Sciences, University of Warwick , Coventry CV4 7AL, United Kingdom"}]},{"ORCID":"https:\/\/orcid.org\/0009-0004-1710-5500","authenticated-orcid":false,"given":"Davide Cesare","family":"Monti","sequence":"additional","affiliation":[{"name":"Department of Informatics, Systems and Communication, University of Milano \u2013 Bicocca . Viale Sarca 336 , Milano 20126, Italy"}]},{"ORCID":"https:\/\/orcid.org\/0000-0002-8479-7592","authenticated-orcid":false,"given":"Yuri","family":"Pirola","sequence":"additional","affiliation":[{"name":"Department of Informatics, Systems and Communication, University of Milano \u2013 Bicocca . Viale Sarca 336 , Milano 20126, Italy"}]}],"member":"286","published-online":{"date-parts":[[2024,4,27]]},"reference":[{"key":"2024071814062781300_btae292-B1","doi-asserted-by":"crossref","first-page":"82","DOI":"10.1006\/jagm.1999.1063","article-title":"Pattern matching in hypertext","volume":"35","author":"Amir","year":"2000","journal-title":"J Algorithms"},{"key":"2024071814062781300_btae292-B2","doi-asserted-by":"crossref","first-page":"81","DOI":"10.1007\/s11047-022-09882-6","article-title":"Computational graph pangenomics: a tutorial on data structures and their applications","volume":"21","author":"Baaijens","year":"2022","journal-title":"Nat Comput"},{"key":"2024071814062781300_btae292-B3","first-page":"15","author":"Bonnet","year":"2023"},{"key":"2024071814062781300_btae292-B4","doi-asserted-by":"crossref","first-page":"267","DOI":"10.1186\/s13059-021-02473-1","article-title":"Pandora: nucleotide-resolution bacterial pan-genomics with reference graphs","volume":"22","author":"Colquhoun","year":"2021","journal-title":"Genome Biol"},{"key":"2024071814062781300_btae292-B5","first-page":"118","article-title":"Computational pan-genomics: status, promises and challenges","volume":"19","author":"Computational Pan-Genomics Consortium","year":"2018","journal-title":"Brief Bioinf"},{"key":"2024071814062781300_btae292-B6","doi-asserted-by":"crossref","first-page":"e11147","DOI":"10.1371\/journal.pone.0011147","article-title":"progressiveMauve: multiple genome alignment with gene gain, loss and rearrangement","volume":"5","author":"Darling","year":"2010","journal-title":"PLoS One"},{"key":"2024071814062781300_btae292-B7","doi-asserted-by":"crossref","first-page":"444","DOI":"10.1186\/s12859-018-2436-3","article-title":"ASGAL: aligning RNA-Seq data to a splicing graph to detect novel alternative splicing events","volume":"19","author":"Denti","year":"2018","journal-title":"BMC Bioinformatics"},{"key":"2024071814062781300_btae292-B8","doi-asserted-by":"crossref","first-page":"315","DOI":"10.1016\/j.tim.2010.04.002","article-title":"Impact of recombination on bacterial evolution","volume":"18","author":"Didelot","year":"2010","journal-title":"Trends Microbiol"},{"key":"2024071814062781300_btae292-B9","doi-asserted-by":"crossref","first-page":"1435","DOI":"10.1534\/genetics.110.120121","article-title":"Inference of homologous recombination in bacteria using whole-genome sequences","volume":"186","author":"Didelot","year":"2010","journal-title":"Genetics"},{"key":"2024071814062781300_btae292-B10","volume-title":"Graph Theory, volume 173 of Graduate Texts in Mathematics","author":"Diestel","year":"2005","edition":"3rd edn"},{"key":"2024071814062781300_btae292-B11","doi-asserted-by":"crossref","first-page":"e5","DOI":"10.1093\/nar\/gkx977","article-title":"panx: pan-genome analysis and exploration","volume":"46","author":"Ding","year":"2018","journal-title":"Nucleic Acids Res"},{"key":"2024071814062781300_btae292-B12","doi-asserted-by":"crossref","first-page":"675","DOI":"10.1093\/infdis\/jis734","article-title":"Recombinational switching of the Clostridium difficile S-layer and a novel glycosylation gene cluster revealed by large-scale whole-genome sequencing","volume":"207","author":"Dingle","year":"2012","journal-title":"J Infect Dis"},{"key":"2024071814062781300_btae292-B13","doi-asserted-by":"crossref","DOI":"10.1017\/CBO9780511790492","volume-title":"Biological Sequence Analysis: Probabilistic Models of Proteins and Nucleic Acids","author":"Durbin","year":"1998"},{"key":"2024071814062781300_btae292-B14","doi-asserted-by":"crossref","first-page":"2045","DOI":"10.1098\/rstb.2006.1925","article-title":"Mismatch induced speciation in Salmonella: model and data","volume":"361","author":"Falush","year":"2006","journal-title":"Philos Trans R Soc Lond B Biol Sci"},{"key":"2024071814062781300_btae292-B15","doi-asserted-by":"crossref","first-page":"476","DOI":"10.1126\/science.1127573","article-title":"Recombination and the nature of bacterial speciation","volume":"315","author":"Fraser","year":"2007","journal-title":"Science"},{"key":"2024071814062781300_btae292-B16","doi-asserted-by":"crossref","first-page":"2209","DOI":"10.1093\/bioinformatics\/btaa963","article-title":"abPOA: an SIMD-based C library for fast partial order alignment using adaptive band","volume":"37","author":"Gao","year":"2021","journal-title":"Bioinformatics"},{"key":"2024071814062781300_btae292-B17","doi-asserted-by":"crossref","first-page":"1154","DOI":"10.1101\/gr.255505.119","article-title":"Detection of simple and complex de novo mutations with multiple reference sequences","volume":"30","author":"Garimella","year":"2020","journal-title":"Genome Res"},{"key":"2024071814062781300_btae292-B18","doi-asserted-by":"crossref","first-page":"875","DOI":"10.1038\/nbt.4227","article-title":"Variation graph toolkit improves read mapping by representing genetic variation in the reference","volume":"36","author":"Garrison","year":"2018","journal-title":"Nat Biotechnol"},{"key":"2024071814062781300_btae292-B19","doi-asserted-by":"crossref","first-page":"705","DOI":"10.1016\/0022-2836(82)90398-9","article-title":"An improved algorithm for matching biological sequences","volume":"162","author":"Gotoh","year":"1982","journal-title":"J Mol Biol"},{"key":"2024071814062781300_btae292-B20","doi-asserted-by":"crossref","first-page":"1454","DOI":"10.1126\/science.1171908","article-title":"Hyper-recombination, diversity, and antibiotic resistance in Pneumococcus","volume":"324","author":"Hanage","year":"2009","journal-title":"Science"},{"key":"2024071814062781300_btae292-B21","doi-asserted-by":"crossref","first-page":"e02158\u201314","DOI":"10.1128\/mBio.02158-14","article-title":"Bacterial phylogenetic reconstruction from whole genomes is robust to recombination but demographic inference is not","volume":"5","author":"Hedge","year":"2014","journal-title":"mBio"},{"key":"2024071814062781300_btae292-B22","doi-asserted-by":"crossref","first-page":"341","DOI":"10.1145\/360825.360861","article-title":"A linear space algorithm for computing maximal common subsequences","volume":"18","author":"Hirschberg","year":"1975","journal-title":"Commun ACM"},{"key":"2024071814062781300_btae292-B23","doi-asserted-by":"crossref","first-page":"640","DOI":"10.1089\/cmb.2019.0066","article-title":"On the complexity of sequence-to-graph alignment","volume":"27","author":"Jain","year":"2020","journal-title":"J Comput Biol"},{"key":"2024071814062781300_btae292-B24","doi-asserted-by":"crossref","first-page":"970","DOI":"10.1038\/s41467-022-28196-w","article-title":"Structure and assembly of the S-layer in C. difficile","volume":"13","author":"Lanzoni-Mangutchi","year":"2022","journal-title":"Nat Commun"},{"key":"2024071814062781300_btae292-B25","doi-asserted-by":"crossref","first-page":"452","DOI":"10.1093\/bioinformatics\/18.3.452","article-title":"Multiple sequence alignment using partial order graphs","volume":"18","author":"Lee","year":"2002","journal-title":"Bioinformatics"},{"key":"2024071814062781300_btae292-B26","doi-asserted-by":"crossref","first-page":"2213","DOI":"10.1093\/genetics\/165.4.2213","article-title":"Modeling linkage disequilibrium and identifying recombination hotspots using single-nucleotide polymorphism data","volume":"165","author":"Li","year":"2003","journal-title":"Genetics"},{"key":"2024071814062781300_btae292-B27","doi-asserted-by":"crossref","first-page":"S15","DOI":"10.1186\/1471-2164-15-S6-S15","article-title":"Recombination-aware alignment of diploid individuals","volume":"15","author":"Makinen","year":"2014","journal-title":"BMC Genomics"},{"key":"2024071814062781300_btae292-B28","doi-asserted-by":"crossref","first-page":"456","DOI":"10.1093\/bioinformatics\/btaa777","article-title":"Fast gap-affine pairwise alignment using the wavefront algorithm","volume":"37","author":"Marco-Sola","year":"2021","journal-title":"Bioinformatics"},{"key":"2024071814062781300_btae292-B29","doi-asserted-by":"crossref","first-page":"455","DOI":"10.1016\/S0304-3975(99)00333-3","article-title":"Improved approximate pattern matching on hypertext","volume":"237","author":"Navarro","year":"2000","journal-title":"Theor Comput Sci"},{"key":"2024071814062781300_btae292-B30","doi-asserted-by":"crossref","first-page":"443","DOI":"10.1016\/0022-2836(70)90057-4","article-title":"A general method applicable to the search of similarities in the amino-acid sequence of two proteins","volume":"48","author":"Needleman","year":"1970","journal-title":"J Mol Biol"},{"key":"2024071814062781300_btae292-B31","first-page":"1477","article-title":"Bacterial recombination promotes the evolution of multi-drug-resistance in functionally diverse populations","volume":"279","author":"Perron","year":"2012","journal-title":"Proc Biol Sci"},{"key":"2024071814062781300_btae292-B32","author":"Rautiainen","year":"2017"},{"key":"2024071814062781300_btae292-B33","doi-asserted-by":"crossref","first-page":"253","DOI":"10.1186\/s13059-020-02157-2","article-title":"GraphAligner: rapid and versatile sequence-to-graph alignment","volume":"21","author":"Rautiainen","year":"2020","journal-title":"Genome Biol"},{"key":"2024071814062781300_btae292-B34","doi-asserted-by":"crossref","first-page":"3599","DOI":"10.1093\/bioinformatics\/btz162","article-title":"Bit-parallel sequence-to-graph alignment","volume":"35","author":"Rautiainen","year":"2019","journal-title":"Bioinformatics"},{"key":"2024071814062781300_btae292-B35","doi-asserted-by":"crossref","first-page":"23","DOI":"10.1109\/TCBB.2018.2831691","article-title":"Hardness of covering alignment: phase transition in post-sequence genomics","volume":"16","author":"Rizzi","year":"2019","journal-title":"IEEE\/ACM Trans Comput Biol Bioinform"},{"key":"2024071814062781300_btae292-B36","doi-asserted-by":"crossref","first-page":"1051","DOI":"10.1111\/mec.12162","article-title":"Progressive genome-wide introgression in agricultural Campylobacter coli","volume":"22","author":"Sheppard","year":"2013","journal-title":"Mol Ecol"},{"key":"2024071814062781300_btae292-B37","doi-asserted-by":"crossref","first-page":"539","DOI":"10.1038\/msb.2011.75","article-title":"Fast, scalable generation of high-quality protein multiple sequence alignments using clustal omega","volume":"7","author":"Sievers","year":"2011","journal-title":"Mol Syst Biol"},{"key":"2024071814062781300_btae292-B38","author":"Sir\u00e9n","year":"2021"},{"key":"2024071814062781300_btae292-B39","doi-asserted-by":"crossref","first-page":"abg8871","DOI":"10.1126\/science.abg8871","article-title":"Pangenomics enables genotyping of known structural variants in 5202 diverse genomes","volume":"374","author":"Sir\u00e9n","year":"2021","journal-title":"Science"},{"key":"2024071814062781300_btae292-B40","doi-asserted-by":"crossref","first-page":"747","DOI":"10.1089\/106652702761034172","article-title":"A novel approach to remote homology detection: jumping alignments","volume":"9","author":"Spang","year":"2002","journal-title":"J Comput Biol"},{"key":"2024071814062781300_btae292-B41","doi-asserted-by":"crossref","first-page":"113","DOI":"10.1016\/j.jda.2012.10.001","article-title":"Indexing hypertext","volume":"18","author":"Thachuk","year":"2013","journal-title":"J Discret Algorithms"},{"key":"2024071814062781300_btae292-B42","doi-asserted-by":"crossref","first-page":"1136","DOI":"10.1111\/j.1365-2958.2006.05172.x","article-title":"Sex and virulence in Escherichia coli: an evolutionary perspective","volume":"60","author":"Wirth","year":"2006","journal-title":"Mol Microbiol"},{"key":"2024071814062781300_btae292-B43","doi-asserted-by":"crossref","first-page":"1593","DOI":"10.1093\/molbev\/msu082","article-title":"Efficient inference of recombination hot regions in bacterial genomes","volume":"31","author":"Yahara","year":"2014","journal-title":"Mol Biol Evol"},{"key":"2024071814062781300_btae292-B44","author":"Zhang","year":"2022"},{"key":"2024071814062781300_btae292-B45","doi-asserted-by":"crossref","first-page":"110","DOI":"10.1186\/1471-2148-13-110","article-title":"Hypervariable antigen genes in malaria have ancient roots","volume":"13","author":"Zilversmit","year":"2013","journal-title":"BMC Evol Biol"}],"container-title":["Bioinformatics"],"original-title":[],"language":"en","link":[{"URL":"https:\/\/academic.oup.com\/bioinformatics\/advance-article-pdf\/doi\/10.1093\/bioinformatics\/btae292\/57342508\/btae292.pdf","content-type":"application\/pdf","content-version":"am","intended-application":"syndication"},{"URL":"https:\/\/academic.oup.com\/bioinformatics\/article-pdf\/40\/5\/btae292\/58585393\/btae292.pdf","content-type":"application\/pdf","content-version":"vor","intended-application":"syndication"},{"URL":"https:\/\/academic.oup.com\/bioinformatics\/article-pdf\/40\/5\/btae292\/58585393\/btae292.pdf","content-type":"unspecified","content-version":"vor","intended-application":"similarity-checking"}],"deposited":{"date-parts":[[2024,7,18]],"date-time":"2024-07-18T15:34:45Z","timestamp":1721316885000},"score":1,"resource":{"primary":{"URL":"https:\/\/academic.oup.com\/bioinformatics\/article\/doi\/10.1093\/bioinformatics\/btae292\/7658945"}},"subtitle":[],"editor":[{"given":"Alfonso","family":"Valencia","sequence":"additional","affiliation":[]}],"short-title":[],"issued":{"date-parts":[[2024,4,27]]},"references-count":45,"journal-issue":{"issue":"5","published-print":{"date-parts":[[2024,5,2]]}},"URL":"https:\/\/doi.org\/10.1093\/bioinformatics\/btae292","relation":{},"ISSN":["1367-4811"],"issn-type":[{"value":"1367-4811","type":"electronic"}],"subject":[],"published-other":{"date-parts":[[2024,5,1]]},"published":{"date-parts":[[2024,4,27]]},"article-number":"btae292"}}