{"status":"ok","message-type":"work","message-version":"1.0.0","message":{"indexed":{"date-parts":[[2026,1,10]],"date-time":"2026-01-10T03:33:56Z","timestamp":1768016036432,"version":"3.49.0"},"reference-count":42,"publisher":"Oxford University Press (OUP)","issue":"1","content-domain":{"domain":[],"crossmark-restriction":false},"short-container-title":[],"published-print":{"date-parts":[[2014,1,1]]},"abstract":"<jats:title>Abstract<\/jats:title>\n               <jats:p>Motivation: With high-throughput DNA sequencing costs dropping &lt;$1000 for human genomes, data storage, retrieval and analysis are the major bottlenecks in biological studies. To address the large-data challenges, we advocate a clean separation between the evidence collection and the inference in variant calling. We define and implement a Genome Query Language (GQL) that allows for the rapid collection of evidence needed for calling variants.<\/jats:p>\n               <jats:p>Results: We provide a number of cases to showcase the use of GQL for complex evidence collection, such as the evidence for large structural variations. Specifically, typical GQL queries can be written in 5\u201310 lines of high-level code and search large datasets (100 GB) in minutes. We also demonstrate its complementarity with other variant calling tools. Popular variant calling tools can achieve one order of magnitude speed-up by using GQL to retrieve evidence. Finally, we show how GQL can be used to query and compare multiple datasets. 
By separating the evidence and inference for variant calling, it frees all variant detection tools from the data intensive evidence collection and focuses on statistical inference.<\/jats:p>\n               <jats:p>Availability: GQL can be downloaded from http:\/\/cseweb.ucsd.edu\/~ckozanit\/gql.<\/jats:p>\n               <jats:p>Contact: \u00a0ckozanit@ucsd.edu or vbafna@cs.ucsd.edu<\/jats:p>\n               <jats:p>Supplementary information: \u00a0Supplementary data are available at Bioinformatics online.<\/jats:p>","DOI":"10.1093\/bioinformatics\/btt250","type":"journal-article","created":{"date-parts":[[2013,6,11]],"date-time":"2013-06-11T04:32:49Z","timestamp":1370925169000},"page":"1-8","source":"Crossref","is-referenced-by-count":37,"title":["Using Genome Query Language to uncover genetic variation"],"prefix":"10.1093","volume":"30","author":[{"given":"Christos","family":"Kozanitis","sequence":"first","affiliation":[{"name":"1 Computer Science and Engineering, University of California San Diego, 9500 Gilman Drive, San Diego, CA 92123 and 2Microsoft Research, 1065 La Avenida, Mountain View, CA 94043, USA"}],"role":[{"role":"author","vocabulary":"crossref"}]},{"given":"Andrew","family":"Heiberg","sequence":"additional","affiliation":[{"name":"1 Computer Science and Engineering, University of California San Diego, 9500 Gilman Drive, San Diego, CA 92123 and 2Microsoft Research, 1065 La Avenida, Mountain View, CA 94043, USA"}],"role":[{"role":"author","vocabulary":"crossref"}]},{"given":"George","family":"Varghese","sequence":"additional","affiliation":[{"name":"1 Computer Science and Engineering, University of California San Diego, 9500 Gilman Drive, San Diego, CA 92123 and 2Microsoft Research, 1065 La Avenida, Mountain View, CA 94043, USA"}],"role":[{"role":"author","vocabulary":"crossref"}]},{"given":"Vineet","family":"Bafna","sequence":"additional","affiliation":[{"name":"1 Computer Science and Engineering, University of California San Diego, 9500 Gilman 
Drive, San Diego, CA 92123 and 2Microsoft Research, 1065 La Avenida, Mountain View, CA 94043, USA"}],"role":[{"role":"author","vocabulary":"crossref"}]}],"member":"286","published-online":{"date-parts":[[2013,6,10]]},"reference":[{"key":"2023012710380780900_btt250-B1","doi-asserted-by":"crossref","first-page":"1061","DOI":"10.1038\/nature09534","article-title":"A map of human genome variation from population-scale sequencing","volume":"467","author":"1000 Genomes Project Consortium, et al.","year":"2010","journal-title":"Nature"},{"key":"2023012710380780900_btt250-B2","unstructured":"1000genomescloud\n          Using 1000 genomes data in the amazon web service cloud\n          2012\n          \n            http:\/\/www.1000genomes.org\/using-1000-genomes-data-amazon-web-service-cloud (4 June 2013, date last accessed)"},{"key":"2023012710380780900_btt250-B3","article-title":"Lossy compression of quality values via rate distortion theory","author":"Asnani","year":"2012","journal-title":"ArXiv e-prints"},{"key":"2023012710380780900_btt250-B4","doi-asserted-by":"crossref","first-page":"1691","DOI":"10.1093\/bioinformatics\/btr174","article-title":"BamTools: a C++ API and toolkit for analyzing and managing BAM files","volume":"27","author":"Barnett","year":"2011","journal-title":"Bioinformatics"},{"key":"2023012710380780900_btt250-B5","doi-asserted-by":"crossref","first-page":"2807","DOI":"10.1093\/bioinformatics\/btm390","article-title":"Optimization of primer design for the detection of variable genomic lesions in cancer","volume":"23","author":"Bashir","year":"2007","journal-title":"Bioinformatics"},{"key":"2023012710380780900_btt250-B6","doi-asserted-by":"crossref","first-page":"53","DOI":"10.1038\/nature07517","article-title":"Accurate whole human genome sequencing using reversible terminator chemistry","volume":"456","author":"Bentley","year":"2008","journal-title":"Nature"},{"key":"2023012710380780900_btt250-B7","unstructured":"Bison\n          Bison - GNU parser 
generator\n          1988\n          \n            http:\/\/www.gnu.org\/software\/bison\/ (4 June 2013, date last accessed)"},{"key":"2023012710380780900_btt250-B8","doi-asserted-by":"crossref","first-page":"677","DOI":"10.1038\/nmeth.1363","article-title":"BreakDancer: an algorithm for high-resolution mapping of genomic structural variation","volume":"6","author":"Chen","year":"2009","journal-title":"Nat. Methods"},{"key":"2023012710380780900_btt250-B9","doi-asserted-by":"crossref","first-page":"265","DOI":"10.1038\/nnano.2009.12","article-title":"Continuous base identification for single-molecule nanopore DNA sequencing","volume":"4","author":"Clarke","year":"2009","journal-title":"Nat. Nanotechnol."},{"key":"2023012710380780900_btt250-B10","doi-asserted-by":"crossref","first-page":"377","DOI":"10.1145\/362384.362685","article-title":"A relational model of data for large shared data banks","volume":"13","author":"Codd","year":"1970","journal-title":"Commun. ACM"},{"key":"2023012710380780900_btt250-B11","doi-asserted-by":"crossref","first-page":"75","DOI":"10.1038\/ng1697","article-title":"A high-resolution survey of deletion polymorphism in the human genome","volume":"38","author":"Conrad","year":"2006","journal-title":"Nat. 
Genet."},{"key":"2023012710380780900_btt250-B12","doi-asserted-by":"crossref","first-page":"1415","DOI":"10.1093\/bioinformatics\/bts173","article-title":"Large-scale compression of genomic sequence databases with the Burrows-Wheeler transform","volume":"28","author":"Cox","year":"2012","journal-title":"Bioinformatics"},{"key":"2023012710380780900_btt250-B13","doi-asserted-by":"crossref","first-page":"3423","DOI":"10.1093\/bioinformatics\/btr539","article-title":"Pybedtools: a flexible Python library for manipulating genomic datasets and annotations","volume":"27","author":"Dale","year":"2011","journal-title":"Bioinformatics"},{"key":"2023012710380780900_btt250-B14","doi-asserted-by":"crossref","first-page":"491","DOI":"10.1038\/ng.806","article-title":"A framework for variation discovery and genotyping using next-generation DNA sequencing data","volume":"43","author":"DePristo","year":"2011","journal-title":"Nat. Genet."},{"key":"2023012710380780900_btt250-B15","unstructured":"Flex\n          The Fast Lexical Analyzer\n          1990\n          \n            http:\/\/flex.sourceforge.net (4 June 2013, date last accessed)"},{"key":"2023012710380780900_btt250-B16","doi-asserted-by":"crossref","first-page":"261","DOI":"10.1016\/0022-2836(87)90689-9","article-title":"CpG islands in vertebrate genomes","volume":"196","author":"Gardiner-Garden","year":"1987","journal-title":"J. Mol. 
Biol."},{"key":"2023012710380780900_btt250-B17","unstructured":"gatk-pairend\n          Where does gatk get the mate pair info from bam files?\n          2012\n          \n            http:\/\/gatkforums.broadinstitute.org\/discussion\/1529\/where-does-gatk-get-the-mate-pair-info-from-bam-file (4 June 2013, date last accessed)"},{"key":"2023012710380780900_btt250-B18","doi-asserted-by":"crossref","first-page":"874","DOI":"10.1086\/319506","article-title":"Olfactory receptor-gene clusters, genomic-inversion polymorphisms, and common chromosome rearrangements","volume":"68","author":"Giglio","year":"2001","journal-title":"Am. J. Hum. Genet."},{"key":"2023012710380780900_btt250-B19","doi-asserted-by":"crossref","first-page":"R86","DOI":"10.1186\/gb-2010-11-8-r86","article-title":"Galaxy: a comprehensive approach for supporting accessible, reproducible, and transparent computational research in the life sciences","volume":"11","author":"Goecks","year":"2010","journal-title":"Genome Biol."},{"key":"2023012710380780900_btt250-B20","doi-asserted-by":"crossref","first-page":"1270","DOI":"10.1101\/gr.088633.108","article-title":"Combinatorial algorithms for structural variation detection in high-throughput sequenced genomes","volume":"19","author":"Hormozdiari","year":"2009","journal-title":"Genome Res."},{"key":"2023012710380780900_btt250-B21","doi-asserted-by":"crossref","first-page":"734","DOI":"10.1101\/gr.114819.110","article-title":"Efficient storage of high throughput DNA sequencing data using reference-based compression","volume":"21","author":"Hsi-Yang Fritz","year":"2011","journal-title":"Genome Res."},{"key":"2023012710380780900_btt250-B22","doi-asserted-by":"crossref","first-page":"e171","DOI":"10.1093\/nar\/gks754","article-title":"Compression of next-generation sequencing reads aided by highly efficient de novo assembly","volume":"40","author":"Jones","year":"2012","journal-title":"Nucleic Acids 
Res."},{"key":"2023012710380780900_btt250-B23","doi-asserted-by":"crossref","first-page":"56","DOI":"10.1038\/nature06862","article-title":"Mapping and sequencing of structural variation from eight human genomes","volume":"453","author":"Kidd","year":"2008","journal-title":"Nature"},{"key":"2023012710380780900_btt250-B24","doi-asserted-by":"crossref","first-page":"61","DOI":"10.1038\/nature11412","article-title":"Comprehensive molecular portraits of human breast tumours","volume":"490","author":"Koboldt","year":"2012","journal-title":"Nature"},{"key":"2023012710380780900_btt250-B25","doi-asserted-by":"crossref","first-page":"401","DOI":"10.1089\/cmb.2010.0253","article-title":"Compressing genomic sequence fragments using SlimGene","volume":"18","author":"Kozanitis","year":"2011","journal-title":"J. Comput. Biol."},{"key":"2023012710380780900_btt250-B26","doi-asserted-by":"crossref","first-page":"589","DOI":"10.1093\/bioinformatics\/btp698","article-title":"Fast and accurate long-read alignment with burrows-wheeler transform","volume":"26","author":"Li","year":"2010","journal-title":"Bioinformatics"},{"key":"2023012710380780900_btt250-B27","doi-asserted-by":"crossref","first-page":"1851","DOI":"10.1101\/gr.078212.108","article-title":"Mapping short DNA sequencing reads and calling variants using mapping quality scores","volume":"18","author":"Li","year":"2008","journal-title":"Genome Res."},{"key":"2023012710380780900_btt250-B28","doi-asserted-by":"crossref","first-page":"2078","DOI":"10.1093\/bioinformatics\/btp352","article-title":"The sequence alignment\/map format and SAMtools","volume":"25","author":"Li","year":"2009","journal-title":"Bioinformatics"},{"key":"2023012710380780900_btt250-B29","doi-asserted-by":"crossref","first-page":"693","DOI":"10.1007\/978-1-4419-5913-3_77","article-title":"Standardizing the next generation of bioinformatics software development with BioHDF (HDF5)","volume":"680","author":"Mason","year":"2010","journal-title":"Adv. Exp. Med. 
Biol."},{"key":"2023012710380780900_btt250-B30","doi-asserted-by":"crossref","first-page":"1297","DOI":"10.1101\/gr.107524.110","article-title":"The Genome Analysis Toolkit: a MapReduce framework for analyzing next-generation DNA sequencing data","volume":"20","author":"McKenna","year":"2010","journal-title":"Genome Res."},{"key":"2023012710380780900_btt250-B31","doi-asserted-by":"crossref","first-page":"8006","DOI":"10.1073\/pnas.0602318103","article-title":"Hotspots for copy number variation in chimpanzees and humans","volume":"103","author":"Perry","year":"2006","journal-title":"Proc. Natl Acad. Sci. USA"},{"key":"2023012710380780900_btt250-B32","doi-asserted-by":"crossref","first-page":"e27","DOI":"10.1093\/nar\/gks939","article-title":"NGC: lossless and lossy compression of aligned high-throughput sequencing data","volume":"41","author":"Popitsch","year":"2013","journal-title":"Nucleic Acids Res."},{"key":"2023012710380780900_btt250-B33","doi-asserted-by":"crossref","first-page":"841","DOI":"10.1093\/bioinformatics\/btq033","article-title":"BEDTools: a flexible suite of utilities for comparing genomic features","volume":"26","author":"Quinlan","year":"2010","journal-title":"Bioinformatics"},{"key":"2023012710380780900_btt250-B34","doi-asserted-by":"crossref","first-page":"407","DOI":"10.1146\/annurev.genom.7.080505.115618","article-title":"Structural variation of the human genome","volume":"7","author":"Sharp","year":"2006","journal-title":"Annu. Rev. Genomics Hum. 
Genet."},{"key":"2023012710380780900_btt250-B35","doi-asserted-by":"crossref","first-page":"i222","DOI":"10.1093\/bioinformatics\/btp208","article-title":"A geometric approach for classification and comparison of structural variants","volume":"25","author":"Sindi","year":"2009","journal-title":"Bioinformatics"},{"key":"2023012710380780900_btt250-B36","doi-asserted-by":"crossref","first-page":"e25598","DOI":"10.1371\/journal.pone.0025598","article-title":"A 32 kb critical region excluding Y402H in CFH mediates risk for age-related macular degeneration","volume":"6","author":"Sivakumaran","year":"2011","journal-title":"PLoS One"},{"key":"2023012710380780900_btt250-B37","doi-asserted-by":"crossref","first-page":"2265","DOI":"10.1093\/molbev\/msi222","article-title":"A novel gene family NBPF: intricate structure generated by gene duplications during primate evolution","volume":"22","author":"Vandepoele","year":"2005","journal-title":"Mol. Biol. Evol."},{"key":"2023012710380780900_btt250-B38","unstructured":"VCF Tools\n          Variant call format\n          2011\n          \n            http:\/\/vcftools.sourceforge.net\/specs.html (4 June 2013, date last accessed)"},{"key":"2023012710380780900_btt250-B39","doi-asserted-by":"crossref","first-page":"3662","DOI":"10.1182\/blood.V95.12.3662.012k12_3662_3668","article-title":"RHD gene deletion occurred in the Rhesus box","volume":"95","author":"Wagner","year":"2000","journal-title":"Blood"},{"key":"2023012710380780900_btt250-B40","doi-asserted-by":"crossref","first-page":"628","DOI":"10.1093\/bioinformatics\/btr689","article-title":"Transformations for the compression of FASTQ quality scores of next-generation sequencing data","volume":"28","author":"Wan","year":"2012","journal-title":"Bioinformatics"},{"key":"2023012710380780900_btt250-B41","doi-asserted-by":"crossref","first-page":"13","DOI":"10.1093\/nar\/gkm1000","article-title":"Database resources of the National Center for Biotechnology 
Information","volume":"36","author":"Wheeler","year":"2008","journal-title":"Nucleic Acids Res."},{"key":"2023012710380780900_btt250-B42","doi-asserted-by":"crossref","first-page":"23","DOI":"10.1186\/1748-7188-6-23","article-title":"ReCoil - an algorithm for compression of extremely large datasets of DNA data","volume":"6","author":"Yanovsky","year":"2011","journal-title":"Algorithms Mol. Biol."}],"container-title":["Bioinformatics"],"original-title":[],"language":"en","link":[{"URL":"https:\/\/academic.oup.com\/bioinformatics\/article-pdf\/30\/1\/1\/48913575\/bioinformatics_30_1_1.pdf","content-type":"application\/pdf","content-version":"vor","intended-application":"syndication"},{"URL":"https:\/\/academic.oup.com\/bioinformatics\/article-pdf\/30\/1\/1\/48913575\/bioinformatics_30_1_1.pdf","content-type":"unspecified","content-version":"vor","intended-application":"similarity-checking"}],"deposited":{"date-parts":[[2023,1,27]],"date-time":"2023-01-27T10:41:37Z","timestamp":1674816097000},"score":1,"resource":{"primary":{"URL":"https:\/\/academic.oup.com\/bioinformatics\/article\/30\/1\/1\/234445"}},"subtitle":[],"short-title":[],"issued":{"date-parts":[[2013,6,10]]},"references-count":42,"journal-issue":{"issue":"1","published-print":{"date-parts":[[2014,1,1]]}},"URL":"https:\/\/doi.org\/10.1093\/bioinformatics\/btt250","relation":{},"ISSN":["1367-4811","1367-4803"],"issn-type":[{"value":"1367-4811","type":"electronic"},{"value":"1367-4803","type":"print"}],"subject":[],"published-other":{"date-parts":[[2014,1,1]]},"published":{"date-parts":[[2013,6,10]]}}}