{"status":"ok","message-type":"work","message-version":"1.0.0","message":{"indexed":{"date-parts":[[2025,2,22]],"date-time":"2025-02-22T00:45:14Z","timestamp":1740185114038,"version":"3.37.3"},"reference-count":33,"publisher":"Oxford University Press (OUP)","issue":"6","license":[{"start":{"date-parts":[[2017,11,2]],"date-time":"2017-11-02T00:00:00Z","timestamp":1509580800000},"content-version":"vor","delay-in-days":0,"URL":"https:\/\/academic.oup.com\/journals\/pages\/about_us\/legal\/notices"}],"funder":[{"DOI":"10.13039\/100000002","name":"National Institutes of Health","doi-asserted-by":"publisher","award":["CA088164 and CA201358"],"award-info":[{"award-number":["CA088164 and CA201358"]}],"id":[{"id":"10.13039\/100000002","id-type":"DOI","asserted-by":"publisher"}]},{"DOI":"10.13039\/100008536","name":"Amazon Web Services","doi-asserted-by":"publisher","id":[{"id":"10.13039\/100008536","id-type":"DOI","asserted-by":"publisher"}]}],"content-domain":{"domain":[],"crossmark-restriction":false},"short-container-title":[],"published-print":{"date-parts":[[2018,3,15]]},"abstract":"<jats:title>Abstract<\/jats:title>\n               <jats:sec>\n                  <jats:title>Motivation<\/jats:title>\n                  <jats:p>As whole-genome tumor sequence and biological annotation datasets grow in size, number and content, there is an increasing basic science and clinical need for efficient and accurate data management and analysis software. With the emergence of increasingly sophisticated data stores, execution environments and machine learning algorithms, there is also a need for the integration of functionality across frameworks.<\/jats:p>\n               <\/jats:sec>\n               <jats:sec>\n                  <jats:title>Results<\/jats:title>\n                  <jats:p>We present orchid, a python based software package for the management, annotation and machine learning of cancer mutations. Building on technologies of parallel workflow execution, in-memory database storage and machine learning analytics, orchid efficiently handles millions of mutations and hundreds of features in an easy-to-use manner. We describe the implementation of orchid and demonstrate its ability to distinguish tissue of origin in 12 tumor types based on 339 features using a random forest classifier.<\/jats:p>\n               <\/jats:sec>\n               <jats:sec>\n                  <jats:title>Availability and implementation<\/jats:title>\n                  <jats:p>Orchid and our annotated tumor mutation database are freely available at https:\/\/github.com\/wittelab\/orchid. Software is implemented in python 2.7, and makes use of MySQL or MemSQL databases. Groovy 2.4.5 is optionally required for parallel workflow execution.<\/jats:p>\n               <\/jats:sec>\n               <jats:sec>\n                  <jats:title>Supplementary information<\/jats:title>\n                  <jats:p>Supplementary data are available at Bioinformatics online.<\/jats:p>\n               <\/jats:sec>","DOI":"10.1093\/bioinformatics\/btx709","type":"journal-article","created":{"date-parts":[[2017,10,31]],"date-time":"2017-10-31T20:13:25Z","timestamp":1509480805000},"page":"936-942","source":"Crossref","is-referenced-by-count":14,"title":["Orchid: a novel management, annotation and machine learning framework for analyzing cancer mutations"],"prefix":"10.1093","volume":"34","author":[{"given":"Clinton L","family":"Cario","sequence":"first","affiliation":[{"name":"Department of Epidemiology and Biostatistics, University of California, San Francisco, CA, USA"}]},{"given":"John S","family":"Witte","sequence":"additional","affiliation":[{"name":"Department of Epidemiology and Biostatistics, University of California, San Francisco, CA, USA"}]}],"member":"286","published-online":{"date-parts":[[2017,11,2]]},"reference":[{"key":"2023012712475039800_btx709-B1","doi-asserted-by":"crossref","first-page":"248","DOI":"10.1038\/nmeth0410-248","article-title":"A method and server for predicting damaging missense mutations","volume":"7","author":"Adzhubei","year":"2010","journal-title":"Nat. Methods"},{"key":"2023012712475039800_btx709-B2","doi-asserted-by":"crossref","first-page":"1","DOI":"10.7554\/eLife.05005","article-title":"Predicting effective microRNA target sites in mammalian mRNAs","volume":"4","author":"Agarwal","year":"2015","journal-title":"eLife"},{"key":"2023012712475039800_btx709-B3","doi-asserted-by":"crossref","first-page":"415","DOI":"10.1038\/nature12477","article-title":"Signatures of mutational processes in human cancer","volume":"500","author":"Alexandrov","year":"2013","journal-title":"Nature"},{"key":"2023012712475039800_btx709-B4","doi-asserted-by":"crossref","first-page":"6660","DOI":"10.1158\/0008-5472.CAN-09-1133","article-title":"Cancer-specific high-throughput annotation of somatic mutations: computational prediction of driver missense mutations","volume":"69","author":"Carter","year":"2009","journal-title":"Cancer Res"},{"key":"2023012712475039800_btx709-B5","doi-asserted-by":"crossref","first-page":"e46688","DOI":"10.1371\/journal.pone.0046688","article-title":"Predicting the functional effect of amino acid substitutions and indels","volume":"7","author":"Choi","year":"2012","journal-title":"PLoS One"},{"key":"2023012712475039800_btx709-B6","doi-asserted-by":"crossref","first-page":"80","DOI":"10.4161\/fly.19695","article-title":"A program for annotating and predicting the effects of single nucleotide polymorphisms, SnpEff","volume":"6","author":"Cingolani","year":"2012","journal-title":"Fly"},{"key":"2023012712475039800_btx709-B7","doi-asserted-by":"crossref","first-page":"1589","DOI":"10.1101\/gr.134635.111","article-title":"MuSiC: identifying mutational significance in cancer genomes","volume":"22","author":"Dees","year":"2012","journal-title":"Genome Res"},{"key":"2023012712475039800_btx709-B8","doi-asserted-by":"crossref","first-page":"215","DOI":"10.1038\/nmeth.1906","article-title":"ChromHMM: automating chromatin-state discovery and characterization","volume":"9","author":"Ernst","year":"2012","journal-title":"Nat. Methods"},{"key":"2023012712475039800_btx709-B9","doi-asserted-by":"crossref","first-page":"480","DOI":"10.1186\/s13059-014-0480-5","article-title":"FunSeq2: a framework for prioritizing noncoding regulatory variants in cancer","volume":"15","author":"Fu","year":"2014","journal-title":"Genome Biol"},{"key":"2023012712475039800_btx709-B10","doi-asserted-by":"crossref","first-page":"D109","DOI":"10.1093\/nar\/gkh023","article-title":"The microRNA registry","volume":"32","author":"Griffiths-Jones","year":"2004","journal-title":"Nucleic Acids Res"},{"key":"2023012712475039800_btx709-B11","doi-asserted-by":"crossref","first-page":"1","DOI":"10.1093\/nar\/gku1280","article-title":"Integrative analysis of public ChIP-Seq experiments reveals a complex multi-cell regulatory landscape","volume":"43","author":"Griffon","year":"2015","journal-title":"Nucleic Acids Res"},{"key":"2023012712475039800_btx709-B12","doi-asserted-by":"crossref","first-page":"473","DOI":"10.1038\/nmeth.1937","article-title":"Unsupervised pattern discovery in human chromatin structure through genomic segmentation","volume":"9","author":"Hoffman","year":"2012","journal-title":"Nat. Methods"},{"key":"2023012712475039800_btx709-B13","doi-asserted-by":"crossref","first-page":"27","DOI":"10.1093\/nar\/28.1.27","article-title":"KEGG: Kyoto Encyclopedia of Genes and Genomes","volume":"28","author":"Kanehisa","year":"2000","journal-title":"Nucleic Acids Res"},{"key":"2023012712475039800_btx709-B14","doi-asserted-by":"crossref","first-page":"D164","DOI":"10.1093\/nar\/gkv1002","article-title":"DbSUPER: a database of super-enhancers in mouse and human genome","volume":"44","author":"Khan","year":"2016","journal-title":"Nucleic Acids Res"},{"key":"2023012712475039800_btx709-B15","doi-asserted-by":"crossref","first-page":"310","DOI":"10.1038\/ng.2892","article-title":"A general framework for estimating the relative pathogenicity of human genetic variants","volume":"46","author":"Kircher","year":"2014","journal-title":"Nat. Genet"},{"key":"2023012712475039800_btx709-B16","doi-asserted-by":"crossref","first-page":"1073","DOI":"10.1038\/nprot.2009.86","article-title":"Predicting the effects of coding non-synonymous variants on protein function using the SIFT algorithm","volume":"4","author":"Kumar","year":"2009","journal-title":"Nat. Protoc"},{"key":"2023012712475039800_btx709-B17","doi-asserted-by":"crossref","first-page":"111","DOI":"10.1038\/ng.3658","article-title":"Unsupervised detection of cancer driver mutations with parsimony-guided learning","volume":"48","author":"Kumar","year":"2016","journal-title":"Nat. Genet"},{"key":"2023012712475039800_btx709-B18","doi-asserted-by":"crossref","first-page":"317","DOI":"10.1038\/nature14248","article-title":"Integrative analysis of 111 reference human epigenomes","volume":"518","author":"Kundaje","year":"2015","journal-title":"Nature"},{"key":"2023012712475039800_btx709-B19","doi-asserted-by":"crossref","first-page":"214","DOI":"10.1038\/nature12213","article-title":"Mutational heterogeneity in cancer and the search for new cancer-associated genes","volume":"499","author":"Lawrence","year":"2013","journal-title":"Nature"},{"key":"2023012712475039800_btx709-B20","doi-asserted-by":"crossref","first-page":"D896","DOI":"10.1093\/nar\/gkw1133","article-title":"The new NHGRI-EBI catalog of published Genome-Wide Association Studies (GWAS Catalog)","volume":"45","author":"MacArthur","year":"2017","journal-title":"Nucleic Acids Res"},{"key":"2023012712475039800_btx709-B21","doi-asserted-by":"crossref","first-page":"58","DOI":"10.1186\/s12920-015-0130-0","article-title":"TumorTracer: a method to identify the tissue of origin from the somatic mutations of a tumor specimen","volume":"8","author":"Marquard","year":"2015","journal-title":"BMC Med. Genomics"},{"key":"2023012712475039800_btx709-B22","doi-asserted-by":"crossref","first-page":"1428","DOI":"10.1016\/S0140-6736(11)61178-1","article-title":"Cancer of unknown primary site","volume":"379","author":"Pavlidis","year":"2012","journal-title":"Lancet"},{"key":"2023012712475039800_btx709-B23","first-page":"2825","article-title":"Scikit-Learn: machine learning in Python","volume":"12","author":"Pedregosa","year":"2012","journal-title":"J. Mach. Learn. Res"},{"key":"2023012712475039800_btx709-B24","doi-asserted-by":"crossref","first-page":"110","DOI":"10.1101\/gr.097857.109","article-title":"Detection of nonneutral substitution rates on mammalian phylogenies","volume":"20","author":"Pollard","year":"2010","journal-title":"Genome Res"},{"key":"2023012712475039800_btx709-B25","doi-asserted-by":"crossref","first-page":"761","DOI":"10.1093\/bioinformatics\/btu703","article-title":"DANN: a deep learning approach for annotating the pathogenicity of genetic variants","volume":"31","author":"Quang","year":"2015","journal-title":"Bioinformatics"},{"key":"2023012712475039800_btx709-B26","doi-asserted-by":"crossref","first-page":"e1002968","DOI":"10.1371\/journal.pcbi.1002968","article-title":"RFECS: a random-forest based algorithm for enhancer identification from chromatin state","volume":"9","author":"Rajagopal","year":"2013","journal-title":"PLoS Comput. Biol"},{"key":"2023012712475039800_btx709-B27","doi-asserted-by":"crossref","first-page":"5","DOI":"10.1186\/gm524","article-title":"Identifying driver mutations in sequenced cancer genomes: computational approaches to enable precision medicine","volume":"6","author":"Raphael","year":"2014","journal-title":"Genome Med"},{"key":"2023012712475039800_btx709-B28","doi-asserted-by":"crossref","first-page":"57","DOI":"10.1016\/j.cell.2015.11.050","article-title":"Cell-free DNA comprises an in vivo nucleosome footprint that informs its tissues-of-origin","volume":"164","author":"Snyder","year":"2016","journal-title":"Cell"},{"key":"2023012712475039800_btx709-B29","doi-asserted-by":"crossref","first-page":"15545","DOI":"10.1073\/pnas.0506580102","article-title":"Gene set enrichment analysis: a knowledge-based approach for interpreting genome-wide expression profiles","volume":"102","author":"Subramanian","year":"2005","journal-title":"Proc. Natl. Acad. Sci. USA"},{"key":"2023012712475039800_btx709-B30","doi-asserted-by":"crossref","first-page":"75","DOI":"10.1038\/nature11232","article-title":"The accessible chromatin landscape of the human genome","volume":"489","author":"Thurman","year":"2012","journal-title":"Nature"},{"key":"2023012712475039800_btx709-B31","doi-asserted-by":"crossref","first-page":"316","DOI":"10.1038\/nbt.3820","article-title":"Nextflow enables reproducible computational workflows","volume":"35","author":"Di Tommaso","year":"2017","journal-title":"Nat. Biotechnol"},{"key":"2023012712475039800_btx709-B32","doi-asserted-by":"crossref","first-page":"375","DOI":"10.1101\/gr.120477.111","article-title":"De novo discovery of mutated driver pathways in cancer","volume":"22","author":"Vandin","year":"2012","journal-title":"Genome Res"},{"key":"2023012712475039800_btx709-B33","doi-asserted-by":"crossref","first-page":"1546","DOI":"10.1126\/science.1235122","article-title":"Cancer genome landscapes","volume":"339","author":"Vogelstein","year":"2013","journal-title":"Science"}],"container-title":["Bioinformatics"],"original-title":[],"language":"en","link":[{"URL":"https:\/\/academic.oup.com\/bioinformatics\/article-pdf\/34\/6\/936\/48914768\/bioinformatics_34_6_936.pdf","content-type":"application\/pdf","content-version":"vor","intended-application":"syndication"},{"URL":"https:\/\/academic.oup.com\/bioinformatics\/article-pdf\/34\/6\/936\/48914768\/bioinformatics_34_6_936.pdf","content-type":"unspecified","content-version":"vor","intended-application":"similarity-checking"}],"deposited":{"date-parts":[[2023,1,27]],"date-time":"2023-01-27T13:40:42Z","timestamp":1674826842000},"score":1,"resource":{"primary":{"URL":"https:\/\/academic.oup.com\/bioinformatics\/article\/34\/6\/936\/4587584"}},"subtitle":[],"editor":[{"given":"John","family":"Hancock","sequence":"additional","affiliation":[]}],"short-title":[],"issued":{"date-parts":[[2017,11,2]]},"references-count":33,"journal-issue":{"issue":"6","published-print":{"date-parts":[[2018,3,15]]}},"URL":"https:\/\/doi.org\/10.1093\/bioinformatics\/btx709","relation":{},"ISSN":["1367-4803","1367-4811"],"issn-type":[{"type":"print","value":"1367-4803"},{"type":"electronic","value":"1367-4811"}],"subject":[],"published-other":{"date-parts":[[2018,3,15]]},"published":{"date-parts":[[2017,11,2]]}}}