{"status":"ok","message-type":"work","message-version":"1.0.0","message":{"indexed":{"date-parts":[[2025,2,21]],"date-time":"2025-02-21T10:53:56Z","timestamp":1740135236549,"version":"3.37.3"},"reference-count":41,"publisher":"Springer Science and Business Media LLC","issue":"1","license":[{"start":{"date-parts":[[2023,1,20]],"date-time":"2023-01-20T00:00:00Z","timestamp":1674172800000},"content-version":"tdm","delay-in-days":0,"URL":"https:\/\/creativecommons.org\/licenses\/by\/4.0"},{"start":{"date-parts":[[2023,1,20]],"date-time":"2023-01-20T00:00:00Z","timestamp":1674172800000},"content-version":"vor","delay-in-days":0,"URL":"https:\/\/creativecommons.org\/licenses\/by\/4.0"}],"funder":[{"DOI":"10.13039\/100000002","name":"National Institutes of Health","doi-asserted-by":"publisher","award":["UM1 HG008898","R24-OD-11173"],"award-info":[{"award-number":["UM1 HG008898","R24-OD-11173"]}],"id":[{"id":"10.13039\/100000002","id-type":"DOI","asserted-by":"publisher"}]},{"name":"DK RNA","award":["UW: W1207-B09"],"award-info":[{"award-number":["UW: W1207-B09"]}]}],"content-domain":{"domain":["link.springer.com"],"crossmark-restriction":false},"short-container-title":["BMC Bioinformatics"],"abstract":"<jats:title>Abstract<\/jats:title><jats:sec>\n                <jats:title>Background<\/jats:title>\n                <jats:p>Recent population studies are ever growing in number of samples to investigate the diversity of a population or species. These studies reveal new polymorphism that lead to important insights into the mechanisms of evolution, but are also important for the interpretation of these variations. Nevertheless, while the full catalog of variations across entire species remains unknown, we can predict which regions harbor additional not yet detected variations and investigate their properties, thereby enhancing the analysis for potentially missed variants.<\/jats:p>\n              <\/jats:sec><jats:sec>\n                <jats:title>Results<\/jats:title>\n                <jats:p>To achieve this we developed SVhound (<jats:ext-link xmlns:xlink=\"http:\/\/www.w3.org\/1999\/xlink\" ext-link-type=\"uri\" xlink:href=\"https:\/\/github.com\/lfpaulin\/SVhound\">https:\/\/github.com\/lfpaulin\/SVhound<\/jats:ext-link>), which based on a population level SVs dataset can predict regions that harbor unseen SV alleles. We tested SVhound using subsets of the 1000 genomes project data and showed that its correlation (average correlation of 2800 tests r\u2009=\u20090.7136) is high to the full data set. Next, we utilized SVhound to investigate potentially missed or understudied regions across 1KGP and CCDG. Lastly we also apply SVhound on a small and novel SV call set for rhesus macaque (<jats:italic>Macaca mulatta<\/jats:italic>) and discuss the impact and choice of parameters for SVhound.<\/jats:p>\n              <\/jats:sec><jats:sec>\n                <jats:title>Conclusions<\/jats:title>\n                <jats:p>SVhound is a unique method to identify potential regions that harbor hidden diversity in model and non model organisms and can also be potentially used to ensure high quality of SV call sets.<\/jats:p>\n              <\/jats:sec>","DOI":"10.1186\/s12859-022-05046-6","type":"journal-article","created":{"date-parts":[[2023,1,21]],"date-time":"2023-01-21T01:06:26Z","timestamp":1674263186000},"update-policy":"https:\/\/doi.org\/10.1007\/springer_crossmark_policy","source":"Crossref","is-referenced-by-count":0,"title":["SVhound: detection of regions that harbor yet undetected structural variation"],"prefix":"10.1186","volume":"24","author":[{"given":"Luis F.","family":"Paulin","sequence":"first","affiliation":[]},{"given":"Muthuswamy","family":"Raveendran","sequence":"additional","affiliation":[]},{"given":"R. Alan","family":"Harris","sequence":"additional","affiliation":[]},{"given":"Jeffrey","family":"Rogers","sequence":"additional","affiliation":[]},{"given":"Arndt","family":"von Haeseler","sequence":"additional","affiliation":[]},{"given":"Fritz J.","family":"Sedlazeck","sequence":"additional","affiliation":[]}],"member":"297","published-online":{"date-parts":[[2023,1,20]]},"reference":[{"key":"5046_CR1","doi-asserted-by":"publisher","first-page":"70","DOI":"10.1016\/j.cell.2019.02.032","volume":"177","author":"T Lappalainen","year":"2019","unstructured":"Lappalainen T, Scott AJ, Brandt M, Hall IM. Genomic analysis in the age of human genome sequencing. Cell. 2019;177:70\u201384. https:\/\/doi.org\/10.1016\/j.cell.2019.02.032.","journal-title":"Cell"},{"key":"5046_CR2","doi-asserted-by":"publisher","first-page":"333","DOI":"10.1038\/nrg.2016.49","volume":"17","author":"S Goodwin","year":"2016","unstructured":"Goodwin S, McPherson JD, McCombie WR. Coming of age: ten years of next-generation sequencing technologies. Nat Rev Genet. 2016;17:333\u201351.","journal-title":"Nat Rev Genet"},{"key":"5046_CR3","doi-asserted-by":"publisher","first-page":"75","DOI":"10.1038\/nature15394","volume":"526","author":"PH Sudmant","year":"2015","unstructured":"Sudmant PH, Rausch T, Gardner EJ, Handsaker RE, Abyzov A, Huddleston J, et al. An integrated map of structural variation in 2,504 human genomes. Nature. 2015;526:75\u201381.","journal-title":"Nature"},{"key":"5046_CR4","first-page":"733","volume":"6","author":"FJ Sedlazeck","year":"2020","unstructured":"Sedlazeck FJ, Yu B, Mansfield AJ, Chen H, Krasheninina O, Tin A, et al. Multiethnic catalog of structural variants and their translational impact for disease phenotypes across 19,652 genomes. Genomics bioRxiv. 2020;6:733.","journal-title":"Genomics bioRxiv"},{"key":"5046_CR5","doi-asserted-by":"publisher","first-page":"444","DOI":"10.1038\/s41586-020-2287-8","volume":"581","author":"RL Collins","year":"2020","unstructured":"Collins RL, Brand H, Karczewski KJ, Zhao X, Alf\u00f6ldi J, Francioli LC, et al. A structural variation reference for medical and population genetics. Nature. 2020;581:444\u201351.","journal-title":"Nature"},{"key":"5046_CR6","doi-asserted-by":"publisher","DOI":"10.1126\/science.abf7117","author":"P Ebert","year":"2021","unstructured":"Ebert P, Audano PA, Zhu Q, Rodriguez-Martin B, Porubsky D, Bonder MJ, et al. Haplotype-resolved diverse human genomes and integrated analysis of structural variation. Science. 2021. https:\/\/doi.org\/10.1126\/science.abf7117.","journal-title":"Science"},{"key":"5046_CR7","doi-asserted-by":"publisher","first-page":"663","DOI":"10.1016\/j.cell.2018.12.019","volume":"176","author":"PA Audano","year":"2019","unstructured":"Audano PA, Sulovari A, Graves-Lindsay TA, Cantsilieris S, Sorensen M, Welch AE, et al. Characterizing the major structural variant alleles of the human genome. Cell. 2019;176:663-75.e19.","journal-title":"Cell"},{"key":"5046_CR8","doi-asserted-by":"publisher","DOI":"10.1126\/science.abc6617","author":"WC Warren","year":"2020","unstructured":"Warren WC, Harris RA, Haukness M, Fiddes IT, Murali SC, Fernandes J, et al. Sequence diversity analyses of an improved rhesus macaque genome enhance its biomedical utility. Science. 2020. https:\/\/doi.org\/10.1126\/science.abc6617.","journal-title":"Science"},{"key":"5046_CR9","doi-asserted-by":"publisher","first-page":"246","DOI":"10.1186\/s13059-019-1828-7","volume":"20","author":"M Mahmoud","year":"2019","unstructured":"Mahmoud M, Gobet N, Cruz-D\u00e1valos DI, Mounier N, Dessimoz C, Sedlazeck FJ. Structural variant calling: the long and the short of it. Genome Biol. 2019;20:246.","journal-title":"Genome Biol"},{"key":"5046_CR10","doi-asserted-by":"publisher","first-page":"171","DOI":"10.1038\/s41576-019-0180-9","volume":"21","author":"SS Ho","year":"2020","unstructured":"Ho SS, Urban AE, Mills RE. Structural variation in the sequencing era. Nat Rev Genet. 2020;21:171\u201389.","journal-title":"Nat Rev Genet"},{"key":"5046_CR11","first-page":"508515","volume":"2018","author":"HJ Abel","year":"2018","unstructured":"Abel HJ, Larson DE, Chiang C, Das I, Kanchi KL, Layer RM, et al. Mapping and characterization of structural variation in 17,795 deeply sequenced human genomes. Genomics bioRxiv. 2018;2018:508515.","journal-title":"Genomics bioRxiv"},{"key":"5046_CR12","first-page":"203","volume":"590","author":"D Taliun","year":"2019","unstructured":"Taliun D, Harris DN, Kessler MD, Carlson J, Szpiech ZA, Torres R, et al. Sequencing of 53,831 diverse genomes from the NHLBI TOPMed Program. Genomics bioRxiv. 2019;590:203.","journal-title":"Genomics bioRxiv."},{"key":"5046_CR13","doi-asserted-by":"publisher","first-page":"419","DOI":"10.1002\/em.21943","volume":"56","author":"JR Lupski","year":"2015","unstructured":"Lupski JR. Structural variation mutagenesis of the human genome: Impact on disease and evolution. Environ Mol Mutagen. 2015;56:419\u201336.","journal-title":"Environ Mol Mutagen"},{"key":"5046_CR14","doi-asserted-by":"publisher","first-page":"1155","DOI":"10.1038\/s41587-019-0217-9","volume":"37","author":"AM Wenger","year":"2019","unstructured":"Wenger AM, Peluso P, Rowell WJ, Chang P-C, Hall RJ, Concepcion GT, et al. Accurate circular consensus long-read sequencing improves variant detection and assembly of a human genome. Nat Biotechnol. 2019;37:1155\u201362.","journal-title":"Nat Biotechnol"},{"key":"5046_CR15","doi-asserted-by":"publisher","DOI":"10.1126\/science.1098918","author":"J Sebat","year":"2004","unstructured":"Sebat J. Large-scale copy number polymorphism in the human genome. Science. 2004. https:\/\/doi.org\/10.1126\/science.1098918.","journal-title":"Science"},{"key":"5046_CR16","doi-asserted-by":"publisher","DOI":"10.1038\/s41587-020-0538-8","author":"JM Zook","year":"2020","unstructured":"Zook JM, Hansen NF, Olson ND, Chapman L, Mullikin JC, Xiao C, et al. A robust benchmark for detection of germline large deletions and insertions. Nat Biotechnol. 2020. https:\/\/doi.org\/10.1038\/s41587-020-0538-8.","journal-title":"Nat Biotechnol"},{"key":"5046_CR17","doi-asserted-by":"publisher","first-page":"329","DOI":"10.1038\/s41576-018-0003-4","volume":"19","author":"FJ Sedlazeck","year":"2018","unstructured":"Sedlazeck FJ, Lee H, Darby CA, Schatz MC. Piercing the dark matter: bioinformatics of long-range sequencing and mapping. Nat Rev Genet. 2018;19:329\u201346.","journal-title":"Nat Rev Genet"},{"key":"5046_CR18","doi-asserted-by":"publisher","first-page":"e1008742","DOI":"10.1371\/journal.pgen.1008742","volume":"16","author":"M Bras\u00f3-Vives","year":"2020","unstructured":"Bras\u00f3-Vives M, Povolotskaya IS, Hartas\u00e1nchez DA, Farr\u00e9 X, Fernandez-Callejo M, Raveendran M, et al. Copy number variants and fixed duplications among 198 rhesus macaques (Macaca mulatta). PLoS Genet. 2020;16:e1008742.","journal-title":"PLoS Genet"},{"key":"5046_CR19","doi-asserted-by":"publisher","DOI":"10.1093\/molbev\/msaa303","author":"GWC Thomas","year":"2020","unstructured":"Thomas GWC, Wang RJ, Nguyen J, Harris RA, Raveendran M, Rogers J, et al. Origins and long-term patterns of copy-number variation in rhesus macaques. Mol Biol Evol. 2020. https:\/\/doi.org\/10.1093\/molbev\/msaa303.","journal-title":"Mol Biol Evol"},{"key":"5046_CR20","doi-asserted-by":"publisher","first-page":"68","DOI":"10.1038\/nature15393","volume":"526","author":"A Auton","year":"2015","unstructured":"1000 Genomes Project Consortium, Auton A, Brooks LD, Durbin RM, Garrison EP, Kang HM, et al. A global reference for human genetic variation. Nature. 2015;526:68\u201374.","journal-title":"Nature"},{"key":"5046_CR21","doi-asserted-by":"publisher","first-page":"D419","DOI":"10.1093\/nar\/gky1038","volume":"47","author":"H Mi","year":"2019","unstructured":"Mi H, Muruganujan A, Ebert D, Huang X, Thomas PD. PANTHER version 14: more genomes, a new PANTHER GO-slim and improvements in enrichment analysis tools. Nucleic Acids Res. 2019;47:D419\u201326.","journal-title":"Nucleic Acids Res"},{"key":"5046_CR22","first-page":"Unit4.10","volume":"Chapter 4","author":"M Tarailo-Graovac","year":"2009","unstructured":"Tarailo-Graovac M, Chen N. Using RepeatMasker to identify repetitive elements in genomic sequences. Curr Protoc Bioinform. 2009;Chapter 4:Unit4.10.","journal-title":"Curr Protoc Bioinform"},{"key":"5046_CR23","doi-asserted-by":"publisher","first-page":"573","DOI":"10.1093\/nar\/27.2.573","volume":"27","author":"G Benson","year":"1999","unstructured":"Benson G. Tandem repeats finder: a program to analyze DNA sequences. Nucleic Acids Res. 1999;27:573\u201380.","journal-title":"Nucleic Acids Res"},{"key":"5046_CR24","doi-asserted-by":"publisher","first-page":"1003","DOI":"10.1126\/science.1072047","volume":"297","author":"JA Bailey","year":"2002","unstructured":"Bailey JA, Gu Z, Clark RA, Reinert K, Samonte RV, Schwartz S, et al. Recent segmental duplications in the human genome. Science. 2002;297:1003\u20137.","journal-title":"Science"},{"key":"5046_CR25","doi-asserted-by":"publisher","first-page":"160025","DOI":"10.1038\/sdata.2016.25","volume":"3","author":"JM Zook","year":"2016","unstructured":"Zook JM, Catoe D, McDaniel J, Vang L, Spies N, Sidow A, et al. Extensive sequencing of seven human genomes to characterize benchmark reference materials. Sci Data. 2016;3:160025.","journal-title":"Sci Data"},{"key":"5046_CR26","doi-asserted-by":"publisher","first-page":"1136","DOI":"10.1093\/gbe\/evz058","volume":"11","author":"Y-L Lin","year":"2019","unstructured":"Lin Y-L, Gokcumen O. Fine-scale characterization of genomic structural variation in the human genome reveals adaptive and biomedically relevant hotspots. Genome Biol Evol. 2019;11:1136\u201351.","journal-title":"Genome Biol Evol"},{"key":"5046_CR27","first-page":"1","volume":"5","author":"ER Havecker","year":"2004","unstructured":"Havecker ER, Gao X, Voytas DF. The diversity of LTR retrotransposons. Genome Biol BioMed Central. 2004;5:1\u20136.","journal-title":"Genome Biol BioMed Central"},{"key":"5046_CR28","doi-asserted-by":"publisher","first-page":"33","DOI":"10.3390\/genes5010033","volume":"5","author":"ME Aldrup-Macdonald","year":"2014","unstructured":"Aldrup-Macdonald ME, Sullivan BA. The past, present, and future of human centromere genomics. Genes. 2014;5:33\u201350.","journal-title":"Genes"},{"key":"5046_CR29","doi-asserted-by":"publisher","first-page":"33","DOI":"10.1186\/s12862-020-1595-9","volume":"20","author":"RA Harris","year":"2020","unstructured":"Harris RA, Raveendran M, Worley KC, Rogers J. Unusual sequence characteristics of human chromosome 19 are conserved across 11 nonhuman primates. BMC Evol Biol. 2020;20:33.","journal-title":"BMC Evol Biol"},{"key":"5046_CR30","doi-asserted-by":"publisher","first-page":"433","DOI":"10.1038\/s41587-020-0407-5","volume":"38","author":"T Gilpatrick","year":"2020","unstructured":"Gilpatrick T, Lee I, Graham JE, Raimondeau E, Bowen R, Heron A, et al. Targeted nanopore sequencing with Cas9-guided adapter ligation. Nat Biotechnol. 2020;38:433\u20138.","journal-title":"Nat Biotechnol"},{"key":"5046_CR31","doi-asserted-by":"publisher","first-page":"47","DOI":"10.3389\/fnins.2020.00047","volume":"14","author":"F Theunissen","year":"2020","unstructured":"Theunissen F, Flynn LL, Anderton RS, Mastaglia F, Pytte J, Jiang L, et al. Structural variants may be a source of missing heritability in sALS. Front Neurosci. 2020;14:47.","journal-title":"Front Neurosci"},{"key":"5046_CR32","doi-asserted-by":"publisher","DOI":"10.1101\/2021.02.06.430068v1.abstract","author":"M Byrska-Bishop","year":"2021","unstructured":"Byrska-Bishop M, Evani US, Zhao X, Basile AO, Abel HJ, Regier AA, et al. High coverage whole genome sequencing of the expanded 1000 Genomes Project cohort including 602 trios. Cold Spring Harbor Lab. 2021. https:\/\/doi.org\/10.1101\/2021.02.06.430068v1.abstract.","journal-title":"Cold Spring Harbor Lab"},{"key":"5046_CR33","doi-asserted-by":"publisher","DOI":"10.1038\/s41587-020-00746-x","author":"A Payne","year":"2021","unstructured":"Payne A, Holmes N, Clarke T, Munro R, Debebe BJ, Loose M. Readfish enables targeted nanopore sequencing of gigabase-sized genomes. Nat Biotechnol. 2021. https:\/\/doi.org\/10.1038\/s41587-020-00746-x.","journal-title":"Nat Biotechnol"},{"key":"5046_CR34","doi-asserted-by":"publisher","first-page":"87","DOI":"10.1016\/0040-5809(72)90035-4","volume":"3","author":"WJ Ewens","year":"1972","unstructured":"Ewens WJ. The sampling theory of selectively neutral alleles. Theor Popul Biol. 1972;3:87\u2013112.","journal-title":"Theor Popul Biol"},{"key":"5046_CR35","doi-asserted-by":"publisher","first-page":"1760","DOI":"10.1101\/gr.135350.111","volume":"22","author":"J Harrow","year":"2012","unstructured":"Harrow J, Frankish A, Gonzalez JM, Tapanari E, Diekhans M, Kokocinski F, et al. GENCODE: the reference human genome annotation for The ENCODE Project. Genome Res. 2012;22:1760\u201374.","journal-title":"Genome Res"},{"key":"5046_CR36","first-page":"Unit1.4","volume":"Chapter 1","author":"D Karolchik","year":"2009","unstructured":"Karolchik D, Hinrichs AS, Kent WJ. The UCSC Genome Browser. Curr Protoc Bioinform. 2009;Chapter 1:Unit1.4.","journal-title":"Curr Protoc Bioinform"},{"key":"5046_CR37","doi-asserted-by":"publisher","first-page":"D1005","DOI":"10.1093\/nar\/gky1120","volume":"47","author":"A Buniello","year":"2019","unstructured":"Buniello A, MacArthur JAL, Cerezo M, Harris LW, Hayhurst J, Malangone C, et al. The NHGRI-EBI GWAS Catalog of published genome-wide association studies, targeted arrays and summary statistics 2019. Nucleic Acids Res. 2019;47:D1005\u201312.","journal-title":"Nucleic Acids Res"},{"key":"5046_CR38","doi-asserted-by":"crossref","first-page":"D682","DOI":"10.1093\/nar\/gkz1138","volume":"48","author":"AD Yates","year":"2020","unstructured":"Yates AD, Achuthan P, Akanni W, Allen J, Allen J, Alvarez-Jarreta J, et al. Ensembl 2020. Nucleic Acids Res. 2020;48:D682\u20138.","journal-title":"Nucleic Acids Res"},{"key":"5046_CR39","doi-asserted-by":"publisher","first-page":"2938","DOI":"10.1093\/bioinformatics\/btx364","volume":"33","author":"JR Conway","year":"2017","unstructured":"Conway JR, Lex A, Gehlenborg N. UpSetR: an R package for the visualization of intersecting sets and their properties. Bioinformatics. 2017;33:2938\u201340.","journal-title":"Bioinformatics"},{"key":"5046_CR40","doi-asserted-by":"publisher","first-page":"1220","DOI":"10.1093\/bioinformatics\/btv710","volume":"32","author":"X Chen","year":"2016","unstructured":"Chen X, Schulz-Trieglaff O, Shaw R, Barnes B, Schlesinger F, K\u00e4llberg M, et al. Manta: rapid detection of structural variants and indels for germline and cancer sequencing applications. Bioinformatics. 2016;32:1220\u20132.","journal-title":"Bioinformatics"},{"key":"5046_CR41","doi-asserted-by":"publisher","DOI":"10.1101\/047266","author":"DC Jeffares","year":"2017","unstructured":"Jeffares DC, Jolly C, Hoti M, Speed D, Shaw L, Rallis C, et al. Transient structural variations have strong effects on quantitative traits and reproductive isolation in fission yeast. Nat Commun. 2017. https:\/\/doi.org\/10.1101\/047266.","journal-title":"Nat Commun"}],"container-title":["BMC Bioinformatics"],"original-title":[],"language":"en","link":[{"URL":"https:\/\/link.springer.com\/content\/pdf\/10.1186\/s12859-022-05046-6.pdf","content-type":"application\/pdf","content-version":"vor","intended-application":"text-mining"},{"URL":"https:\/\/link.springer.com\/article\/10.1186\/s12859-022-05046-6\/fulltext.html","content-type":"text\/html","content-version":"vor","intended-application":"text-mining"},{"URL":"https:\/\/link.springer.com\/content\/pdf\/10.1186\/s12859-022-05046-6.pdf","content-type":"application\/pdf","content-version":"vor","intended-application":"similarity-checking"}],"deposited":{"date-parts":[[2023,1,21]],"date-time":"2023-01-21T01:06:49Z","timestamp":1674263209000},"score":1,"resource":{"primary":{"URL":"https:\/\/bmcbioinformatics.biomedcentral.com\/articles\/10.1186\/s12859-022-05046-6"}},"subtitle":[],"short-title":[],"issued":{"date-parts":[[2023,1,20]]},"references-count":41,"journal-issue":{"issue":"1","published-online":{"date-parts":[[2023,12]]}},"alternative-id":["5046"],"URL":"https:\/\/doi.org\/10.1186\/s12859-022-05046-6","relation":{},"ISSN":["1471-2105"],"issn-type":[{"type":"electronic","value":"1471-2105"}],"subject":[],"published":{"date-parts":[[2023,1,20]]},"assertion":[{"value":"25 August 2022","order":1,"name":"received","label":"Received","group":{"name":"ArticleHistory","label":"Article History"}},{"value":"8 November 2022","order":2,"name":"accepted","label":"Accepted","group":{"name":"ArticleHistory","label":"Article History"}},{"value":"20 January 2023","order":3,"name":"first_online","label":"First Online","group":{"name":"ArticleHistory","label":"Article History"}},{"order":1,"name":"Ethics","group":{"name":"EthicsHeading","label":"Declarations"}},{"value":"Not applicable, human data is publicly available.","order":2,"name":"Ethics","group":{"name":"EthicsHeading","label":"Ethics approval and consent to participate"}},{"value":"Not applicable.","order":3,"name":"Ethics","group":{"name":"EthicsHeading","label":"Consent for publication"}},{"value":"FJS has received sponsored travel by Phase genomics, Oxford Nanopore and PacBio.","order":4,"name":"Ethics","group":{"name":"EthicsHeading","label":"Competing interests"}}],"article-number":"23"}}