{"status":"ok","message-type":"work","message-version":"1.0.0","message":{"indexed":{"date-parts":[[2026,4,4]],"date-time":"2026-04-04T05:51:15Z","timestamp":1775281875337,"version":"3.50.1"},"reference-count":30,"publisher":"Oxford University Press (OUP)","issue":"8","license":[{"start":{"date-parts":[[2017,1,21]],"date-time":"2017-01-21T00:00:00Z","timestamp":1484956800000},"content-version":"vor","delay-in-days":17,"URL":"https:\/\/academic.oup.com\/journals\/pages\/about_us\/legal\/notices"}],"funder":[{"DOI":"10.13039\/100004917","name":"Cancer Prevention and Research Institute of Texas","doi-asserted-by":"publisher","award":["TL1TR000371"],"award-info":[{"award-number":["TL1TR000371"]}],"id":[{"id":"10.13039\/100004917","id-type":"DOI","asserted-by":"publisher"}]},{"DOI":"10.13039\/100000002","name":"National Institutes of Health","doi-asserted-by":"publisher","id":[{"id":"10.13039\/100000002","id-type":"DOI","asserted-by":"publisher"}]}],"content-domain":{"domain":[],"crossmark-restriction":false},"short-container-title":[],"published-print":{"date-parts":[[2017,4,15]]},"abstract":"<jats:title>Abstract<\/jats:title>\n               <jats:sec>\n                  <jats:title>Motivation<\/jats:title>\n                  <jats:p>Bioinformatic analyses are becoming formidably more complex due to the increasing number of steps required to process the data, as well as the proliferation of methods that can be used in each step. To alleviate this difficulty, pipelines are commonly employed. However, pipelines are typically implemented to automate a specific analysis, and thus are difficult to use for exploratory analyses requiring systematic changes to the software or parameters used.<\/jats:p>\n               <\/jats:sec>\n               <jats:sec>\n                  <jats:title>Results<\/jats:title>\n                  <jats:p>To automate the development of pipelines, we have investigated expert systems. We created the Bioinformatics ExperT SYstem (BETSY) that includes a knowledge base where the capabilities of bioinformatics software is explicitly and formally encoded. BETSY is a backwards-chaining rule-based expert system comprised of a data model that can capture the richness of biological data, and an inference engine that reasons on the knowledge base to produce workflows. Currently, the knowledge base is populated with rules to analyze microarray and next generation sequencing data. We evaluated BETSY and found that it could generate workflows that reproduce and go beyond previously published bioinformatics results. Finally, a meta-investigation of the workflows generated from the knowledge base produced a quantitative measure of the technical burden imposed by each step of bioinformatics analyses, revealing the large number of steps devoted to the pre-processing of data. In sum, an expert system approach can facilitate exploratory bioinformatic analysis by automating the development of workflows, a task that requires significant domain expertise.<\/jats:p>\n               <\/jats:sec>\n               <jats:sec>\n                  <jats:title>Availability and Implementation<\/jats:title>\n                  <jats:p>https:\/\/github.com\/jefftc\/changlab<\/jats:p>\n               <\/jats:sec>","DOI":"10.1093\/bioinformatics\/btw817","type":"journal-article","created":{"date-parts":[[2016,12,22]],"date-time":"2016-12-22T04:06:32Z","timestamp":1482379592000},"page":"1210-1215","source":"Crossref","is-referenced-by-count":34,"title":["Planning bioinformatics workflows using an expert system"],"prefix":"10.1093","volume":"33","author":[{"given":"Xiaoling","family":"Chen","sequence":"first","affiliation":[{"name":"School of Biomedical Informatics, University of Texas Health Science Center at Houston, Houston, TX, USA"}]},{"given":"Jeffrey T","family":"Chang","sequence":"additional","affiliation":[{"name":"School of Biomedical Informatics, University of Texas Health Science Center at Houston, Houston, TX, USA"},{"name":"Department of Integrative Biology & Pharmacology, University of Texas Health Science Center at Houston, Houston, TX, USA"},{"name":"Department of Bioinformatics and Computational Biology, University of Texas MD Anderson Cancer Center, Houston, TX, USA"}]}],"member":"286","published-online":{"date-parts":[[2017,1,4]]},"reference":[{"key":"2023020205014036000_btw817-B1","doi-asserted-by":"crossref","first-page":"5","DOI":"10.1023\/A:1010933404324","article-title":"Random forests","volume":"45","author":"Breiman","year":"2001","journal-title":"J. Mach. Learn"},{"key":"2023020205014036000_btw817-B2","doi-asserted-by":"crossref","first-page":"151","DOI":"10.1038\/520151a","article-title":"Core services: reward bioinformaticians","volume":"520","author":"Chang","year":"2015","journal-title":"Nature"},{"key":"2023020205014036000_btw817-B3","doi-asserted-by":"crossref","first-page":"443","DOI":"10.1186\/1471-2105-12-443","article-title":"SIGNATURE: a workbench for gene expression signature analysis","volume":"12","author":"Chang","year":"2011","journal-title":"BMC Bioinformatics"},{"key":"2023020205014036000_btw817-B4","author":"Chang"},{"key":"2023020205014036000_btw817-B5","doi-asserted-by":"crossref","first-page":"331","DOI":"10.1145\/234286.1057820","volume-title":"History of Programming Languages\u2013II","author":"Colmerauer","year":"1996"},{"key":"2023020205014036000_btw817-B6","author":"Curcin","year":"2008"},{"key":"2023020205014036000_btw817-B7","doi-asserted-by":"crossref","first-page":"491","DOI":"10.1038\/ng.806","article-title":"A framework for variation discovery and genotyping using next-generation DNA sequencing data","volume":"43","author":"DePristo","year":"2011","journal-title":"Nat. Genet"},{"key":"2023020205014036000_btw817-B8","first-page":"111","article-title":"Data Wrangling: Making data useful again","volume":"48","author":"Endel","year":"2015","journal-title":"8th Vienna Int. Conf. Math. Modell"},{"key":"2023020205014036000_btw817-B9","first-page":"255","article-title":"Make\u2014a program for maintaining computer program","volume":"9","author":"Feldman","year":"1979","journal-title":"Software"},{"key":"2023020205014036000_btw817-B10","volume-title":"Jess in Action","author":"Friedman-Hill","year":"2003"},{"key":"2023020205014036000_btw817-B11","doi-asserted-by":"crossref","first-page":"6994","DOI":"10.1073\/pnas.0912708107","article-title":"A pathway-based classification of human breast cancer","volume":"107","author":"Gatza","year":"2010","journal-title":"Proc. Natl. Acad. Sci. U. S. A"},{"key":"2023020205014036000_btw817-B12","doi-asserted-by":"crossref","first-page":"389","DOI":"10.1080\/0952813X.2010.490962","article-title":"A semantic framework for automatic generation of computational workflows using distributed data and component catalogs","volume":"23","author":"Gil","year":"2011","journal-title":"J. Exp. Theor. Artif. Intell"},{"key":"2023020205014036000_btw817-B13","author":"Gil","year":"2013"},{"key":"2023020205014036000_btw817-B14","doi-asserted-by":"crossref","first-page":"R86","DOI":"10.1186\/gb-2010-11-8-r86","article-title":"Galaxy: a comprehensive approach for supporting accessible, reproducible, and transparent computational research in the life sciences","volume":"11","author":"Goecks","year":"2010","journal-title":"Genome Biol"},{"key":"2023020205014036000_btw817-B15","doi-asserted-by":"crossref","first-page":"531","DOI":"10.1126\/science.286.5439.531","article-title":"Molecular classification of cancer: class discovery and class prediction by gene expression monitoring","volume":"286","author":"Golub","year":"1999","journal-title":"Science"},{"key":"2023020205014036000_btw817-B16","doi-asserted-by":"crossref","first-page":"2778","DOI":"10.1093\/bioinformatics\/btq524","article-title":"Ruffus: a lightweight Python library for computational pipelines","volume":"26","author":"Goodstadt","year":"2010","journal-title":"Bioinformatics"},{"key":"2023020205014036000_btw817-B17","doi-asserted-by":"crossref","first-page":"1904","DOI":"10.1101\/gr.1363103","article-title":"Biopipe: a flexible framework for protocol-based bioinformatics analysis","volume":"13","author":"Hoon","year":"2003","journal-title":"Genome Res"},{"key":"2023020205014036000_btw817-B18","doi-asserted-by":"crossref","first-page":"149","DOI":"10.1038\/ng.295","article-title":"Repeatability of published microarray gene expression analyses","volume":"41","author":"Ioannidis","year":"2009","journal-title":"Nat. Genet"},{"key":"2023020205014036000_btw817-B19","doi-asserted-by":"crossref","first-page":"2520","DOI":"10.1093\/bioinformatics\/bts480","article-title":"Snakemake\u2013a scalable bioinformatics workflow engine","volume":"28","author":"Koster","year":"2012","journal-title":"Bioinformatics"},{"key":"2023020205014036000_btw817-B20","doi-asserted-by":"crossref","first-page":"e0151664","DOI":"10.1371\/journal.pone.0151664","article-title":"Evaluation of nine somatic variant callers for detection of somatic mutations in exome and targeted deep sequencing data","volume":"11","author":"Kroigard","year":"2016","journal-title":"PLoS One"},{"key":"2023020205014036000_btw817-B21","doi-asserted-by":"crossref","first-page":"733","DOI":"10.1038\/nrg2825","article-title":"Tackling the widespread and critical impact of batch effects in high-throughput data","volume":"11","author":"Leek","year":"2010","journal-title":"Nat. Rev. Genet"},{"key":"2023020205014036000_btw817-B22","doi-asserted-by":"crossref","first-page":"431","DOI":"10.1038\/ng1760","article-title":"The Oct4 and Nanog transcription network regulates pluripotency in mouse embryonic stem cells","volume":"38","author":"Loh","year":"2006","journal-title":"Nat. Genet"},{"key":"2023020205014036000_btw817-B23","author":"Lohr","year":"2014"},{"key":"2023020205014036000_btw817-B24","doi-asserted-by":"crossref","first-page":"1297","DOI":"10.1101\/gr.107524.110","article-title":"The Genome Analysis Toolkit: a MapReduce framework for analyzing next-generation DNA sequencing data","volume":"20","author":"McKenna","year":"2010","journal-title":"Genome Res"},{"key":"2023020205014036000_btw817-B25","doi-asserted-by":"crossref","first-page":"1565","DOI":"10.1038\/nbt1206-1565","article-title":"What is a support vector machine?","volume":"24","author":"Noble","year":"2006","journal-title":"Nat. Biotechnol"},{"key":"2023020205014036000_btw817-B26","doi-asserted-by":"crossref","first-page":"3045","DOI":"10.1093\/bioinformatics\/bth361","article-title":"Taverna: a tool for the composition and enactment of bioinformatics workflows","volume":"20","author":"Oinn","year":"2004","journal-title":"Bioinformatics"},{"key":"2023020205014036000_btw817-B27","doi-asserted-by":"crossref","first-page":"500","DOI":"10.1038\/ng0506-500","article-title":"GenePattern 2.0","volume":"38","author":"Reich","year":"2006","journal-title":"Nat. Genet"},{"key":"2023020205014036000_btw817-B28","author":"Russell","year":"2009"},{"key":"2023020205014036000_btw817-B29","doi-asserted-by":"crossref","first-page":"471","DOI":"10.1038\/ng1545","article-title":"Epistasis analysis with global transcriptional phenotypes","volume":"37","author":"Van Driessche","year":"2005","journal-title":"Nat. Genet"},{"key":"2023020205014036000_btw817-B30","doi-asserted-by":"crossref","first-page":"91","DOI":"10.1186\/gm495","article-title":"Detecting somatic point mutations in cancer genome sequencing data: a comparison of mutation callers","volume":"5","author":"Wang","year":"2013","journal-title":"Genome Med"}],"container-title":["Bioinformatics"],"original-title":[],"language":"en","link":[{"URL":"https:\/\/academic.oup.com\/bioinformatics\/article-pdf\/33\/8\/1210\/49038969\/bioinformatics_33_8_1210.pdf","content-type":"application\/pdf","content-version":"vor","intended-application":"syndication"},{"URL":"https:\/\/academic.oup.com\/bioinformatics\/article-pdf\/33\/8\/1210\/49038969\/bioinformatics_33_8_1210.pdf","content-type":"unspecified","content-version":"vor","intended-application":"similarity-checking"}],"deposited":{"date-parts":[[2023,2,2]],"date-time":"2023-02-02T05:05:17Z","timestamp":1675314317000},"score":1,"resource":{"primary":{"URL":"https:\/\/academic.oup.com\/bioinformatics\/article\/33\/8\/1210\/2801462"}},"subtitle":[],"editor":[{"given":"John","family":"Hancock","sequence":"additional","affiliation":[]}],"short-title":[],"issued":{"date-parts":[[2017,1,4]]},"references-count":30,"journal-issue":{"issue":"8","published-print":{"date-parts":[[2017,4,15]]}},"URL":"https:\/\/doi.org\/10.1093\/bioinformatics\/btw817","relation":{},"ISSN":["1367-4803","1367-4811"],"issn-type":[{"value":"1367-4803","type":"print"},{"value":"1367-4811","type":"electronic"}],"subject":[],"published-other":{"date-parts":[[2017,4,15]]},"published":{"date-parts":[[2017,1,4]]}}}