{"status":"ok","message-type":"work","message-version":"1.0.0","message":{"indexed":{"date-parts":[[2026,2,8]],"date-time":"2026-02-08T04:25:14Z","timestamp":1770524714182,"version":"3.49.0"},"reference-count":44,"publisher":"Springer Science and Business Media LLC","issue":"1","content-domain":{"domain":["link.springer.com"],"crossmark-restriction":false},"short-container-title":["BMC Bioinformatics"],"published-print":{"date-parts":[[2006,12]]},"abstract":"<jats:title>Abstract<\/jats:title>\n          <jats:sec>\n            <jats:title>Background<\/jats:title>\n            <jats:p>This article addresses the problem of interoperation of heterogeneous bioinformatics databases.<\/jats:p>\n          <\/jats:sec>\n          <jats:sec>\n            <jats:title>Results<\/jats:title>\n            <jats:p>We introduce BioWarehouse, an open source toolkit for constructing bioinformatics database warehouses using the MySQL and Oracle relational database managers. BioWarehouse integrates its component databases into a common representational framework within a single database management system, thus enabling multi-database queries using the Structured Query Language (SQL) but also facilitating a variety of database integration tasks such as comparative analysis and data mining. BioWarehouse currently supports the integration of a pathway-centric set of databases including ENZYME, KEGG, and BioCyc, and in addition the UniProt, GenBank, NCBI Taxonomy, and CMR databases, and the Gene Ontology. Loader tools, written in the C and JAVA languages, parse and load these databases into a relational database schema. The loaders also apply a degree of semantic normalization to their respective source data, decreasing semantic heterogeneity. The schema supports the following bioinformatics datatypes: chemical compounds, biochemical reactions, metabolic pathways, proteins, genes, nucleic acid sequences, features on protein and nucleic-acid sequences, organisms, organism taxonomies, and controlled vocabularies. As an application example, we applied BioWarehouse to determine the fraction of biochemically characterized enzyme activities for which no sequences exist in the public sequence databases. The answer is that no sequence exists for 36% of enzyme activities for which EC numbers have been assigned. These gaps in sequence data significantly limit the accuracy of genome annotation and metabolic pathway prediction, and are a barrier for metabolic engineering. Complex queries of this type provide examples of the value of the data warehousing approach to bioinformatics research.<\/jats:p>\n          <\/jats:sec>\n          <jats:sec>\n            <jats:title>Conclusion<\/jats:title>\n            <jats:p>BioWarehouse embodies significant progress on the database integration problem for bioinformatics.<\/jats:p>\n          <\/jats:sec>","DOI":"10.1186\/1471-2105-7-170","type":"journal-article","created":{"date-parts":[[2006,4,6]],"date-time":"2006-04-06T12:49:20Z","timestamp":1144327760000},"update-policy":"https:\/\/doi.org\/10.1007\/springer_crossmark_policy","source":"Crossref","is-referenced-by-count":96,"title":["BioWarehouse: a bioinformatics database warehouse toolkit"],"prefix":"10.1186","volume":"7","author":[{"given":"Thomas J","family":"Lee","sequence":"first","affiliation":[]},{"given":"Yannick","family":"Pouliot","sequence":"additional","affiliation":[]},{"given":"Valerie","family":"Wagner","sequence":"additional","affiliation":[]},{"given":"Priyanka","family":"Gupta","sequence":"additional","affiliation":[]},{"given":"David WJ","family":"Stringer-Calvert","sequence":"additional","affiliation":[]},{"given":"Jessica D","family":"Tenenbaum","sequence":"additional","affiliation":[]},{"given":"Peter D","family":"Karp","sequence":"additional","affiliation":[]}],"member":"297","published-online":{"date-parts":[[2006,3,23]]},"reference":[{"key":"909_CR1","unstructured":"Department of Energy. DOE white paper on bio-informatics1993. [http:\/\/www.gdb.org\/Dan\/DOE\/whitepaper\/contents.html]"},{"key":"909_CR2","volume-title":"Proc 1994 meeting on the interconnection of molecular biology databases","author":"P Karp","year":"1994","unstructured":"Karp P: Proc 1994 meeting on the interconnection of molecular biology databases.1994. [http:\/\/www.ai.sri.com\/pkarp\/mimbd\/94\/mimbd-94.html]"},{"key":"909_CR3","volume-title":"Proc 1995 meeting on the interconnection of molecular biology databases","author":"P Karp","year":"1994","unstructured":"Karp P: Proc 1995 meeting on the interconnection of molecular biology databases.1994. [http:\/\/www.ai.sri.com\/pkarp\/mimbd\/95\/abstracts.html]"},{"issue":"4","key":"909_CR4","doi-asserted-by":"publisher","first-page":"537","DOI":"10.1089\/cmb.1995.2.537","volume":"2","author":"V Markowitz","year":"1995","unstructured":"Markowitz V: Heterogeneous molecular biology databases. Journal of Computational Biology 1995, 2(4):537\u2013538.","journal-title":"Journal of Computational Biology"},{"issue":"4","key":"909_CR5","doi-asserted-by":"publisher","first-page":"557","DOI":"10.1089\/cmb.1995.2.557","volume":"2","author":"SB Davidson","year":"1995","unstructured":"Davidson SB, Overton C, Buneman P: Challenges in integrating biological data sources. Journal of Computational Biology 1995, 2(4):557\u2013572.","journal-title":"Journal of Computational Biology"},{"issue":"4","key":"909_CR6","doi-asserted-by":"publisher","first-page":"573","DOI":"10.1089\/cmb.1995.2.573","volume":"2","author":"P Karp","year":"1995","unstructured":"Karp P: A strategy for database interoperation. Journal of Computational Biology 1995, 2(4):573\u2013586.","journal-title":"Journal of Computational Biology"},{"issue":"3","key":"909_CR7","doi-asserted-by":"publisher","first-page":"173","DOI":"10.1089\/cmb.1994.1.173","volume":"1","author":"R Robbins","year":"1994","unstructured":"Robbins R: Report of the invitational DOE workshop on genome informatics, 26\u201327 April 1993; Genome informatics I: Community databases. Journal of Computational Biology 1994, 1(3):173\u2013190.","journal-title":"Journal of Computational Biology"},{"key":"909_CR8","unstructured":"PublicHouse overview[http:\/\/bioinformatics.ai.sri.com\/biowarehouse\/PublicHouseOverview.html]"},{"key":"909_CR9","doi-asserted-by":"publisher","first-page":"D3","DOI":"10.1093\/nar\/gkh143","volume":"32","author":"MY Galperin","year":"2004","unstructured":"Galperin MY: The molecular biology database collection: 2004 update. Nuc Acids Res 2004, 32: D3\u201322. 10.1093\/nar\/gkh143","journal-title":"Nuc Acids Res"},{"issue":"3","key":"909_CR10","doi-asserted-by":"publisher","first-page":"183","DOI":"10.1145\/96602.96604","volume":"22","author":"A Sheth","year":"1990","unstructured":"Sheth A, Larson J: Federated database systems for managing distributed heterogeneous and autonomous databases. ACM Computing Surveys 1990, 22(3):183\u2013236. 10.1145\/96602.96604","journal-title":"ACM Computing Surveys"},{"issue":"2","key":"909_CR11","doi-asserted-by":"publisher","first-page":"512","DOI":"10.1147\/sj.402.0512","volume":"40","author":"SB Davidson","year":"2001","unstructured":"Davidson SB, Tannen V, Crabtree J, Overton GC, Brunk BP, Stoeckert CJ Jr, Schug J: K2\/Kleisli and GUS: Experiments in integrated access to genomic data sources. IBM Systems Journal 2001, 40(2):512\u2013531.","journal-title":"IBM Systems Journal"},{"issue":"9","key":"909_CR12","doi-asserted-by":"publisher","first-page":"351","DOI":"10.1016\/S0167-7799(99)01342-6","volume":"17","author":"SY Chung","year":"1999","unstructured":"Chung SY, Wong L: Kleisli: A new tool for data integration in biology. Trends Biotechnol 1999, 17(9):351\u2013355. 10.1016\/S0167-7799(99)01342-6","journal-title":"Trends Biotechnol"},{"key":"909_CR13","first-page":"43","volume-title":"Proc Sixth International Conference on Intelligent Systems for Molecular Biology","author":"IM Chen","year":"1998","unstructured":"Chen IM, Kosky AS, Markowitz VM, Szeto E, Topaloglou T: Advanced query mechanisms for biological databases. In Proc Sixth International Conference on Intelligent Systems for Molecular Biology. Edited by: Glasgow J, Littlejohn T, Major F, Lathrop R, Sankoff D, Sensen C. Menlo Park, CA, AAAI Press; 1998:43\u201351."},{"issue":"2","key":"909_CR14","doi-asserted-by":"publisher","first-page":"184","DOI":"10.1093\/bioinformatics\/16.2.184","volume":"16","author":"R Stevens","year":"2000","unstructured":"Stevens R, Baker P, Bechhofer S, Ng G, Jacoby A, Paton NW, Goble CA, Brass A: TAMBIS: Transparent access to multiple bioinformatics information sources. Bioinformatics 2000, 16(2):184\u20135. 10.1093\/bioinformatics\/16.2.184","journal-title":"Bioinformatics"},{"key":"909_CR15","volume-title":"Proc 30th VLDB Conference","author":"R Shaker","year":"2004","unstructured":"Shaker R, Mork P, Brockenbrough JS, Donelson L, Tarczy-Hornoch P: The BioMediator system as a tool for integrating biologic databases on the Web. In Proc 30th VLDB Conference. Morgan Kaufmann; 2004."},{"key":"909_CR16","doi-asserted-by":"publisher","first-page":"489","DOI":"10.1147\/sj.402.0489","volume":"40","author":"LM Haas","year":"2001","unstructured":"Haas LM, Schwarz PM, Kodali P, Kotlar E, Rice JE, Swope WC: DiscoveryLink: A system for integrated access to life sciences data sources. IBM Systems Journal 2001, 40: 489\u2013511.","journal-title":"IBM Systems Journal"},{"issue":"1\u20132","key":"909_CR17","first-page":"92","volume":"13","author":"Martin David","year":"1999","unstructured":"David Martin, Adam Cheyer, Douglas Moran: The Open Agent Architecture: A Framework for Building Distributed Software Systems. Applied Artificial Intelligence 1999, 13(1\u20132):92\u2013128.","journal-title":"Applied Artificial Intelligence"},{"issue":"3","key":"909_CR18","doi-asserted-by":"publisher","first-page":"269","DOI":"10.1093\/bioinformatics\/16.3.269","volume":"16","author":"PD Karp","year":"2000","unstructured":"Karp PD: An ontology for biological function based on molecular interactions. Bioinformatics 2000, 16(3):269\u2013285. 10.1093\/bioinformatics\/16.3.269","journal-title":"Bioinformatics"},{"issue":"1","key":"909_CR19","doi-asserted-by":"publisher","first-page":"10","DOI":"10.1093\/nar\/28.1.10","volume":"28","author":"DL Wheeler","year":"2000","unstructured":"Wheeler DL, Chappey C, Lash AE, Leipe DD, Madden TL, Schuler GD, Tatusova TA, Rapp BA: Database resources of the National Center for Biotechnology Information. Nuc Acids Res 2000, 28(1):10\u201314. 10.1093\/nar\/28.1.10","journal-title":"Nuc Acids Res"},{"key":"909_CR20","unstructured":"Bairoch A, Apweiler R: The SWISS-PROT protein sequence database user manual. Release 39, May, 2000."},{"issue":"1","key":"909_CR21","doi-asserted-by":"publisher","first-page":"304","DOI":"10.1093\/nar\/28.1.304","volume":"28","author":"A Bairoch","year":"2000","unstructured":"Bairoch A: The ENZYME databank in 2000. Nuc Acids Res 2000, 28(1):304\u2013305. 10.1093\/nar\/28.1.304","journal-title":"Nuc Acids Res"},{"key":"909_CR22","doi-asserted-by":"publisher","first-page":"27","DOI":"10.1093\/nar\/28.1.27","volume":"28","author":"M Kanehisa","year":"2000","unstructured":"Kanehisa M, Goto S: KEGG: Kyoto Encyclopedia of Genes and Genomes. Nuc Acids Res 2000, 28: 27\u201330. 10.1093\/nar\/28.1.27","journal-title":"Nuc Acids Res"},{"key":"909_CR23","unstructured":"BioCyc Database Collection[http:\/\/BioCyc.org\/]"},{"key":"909_CR24","doi-asserted-by":"publisher","first-page":"D438","DOI":"10.1093\/nar\/gkh100","volume":"32","author":"CJ Krieger","year":"2004","unstructured":"Krieger CJ, Zhang P, Mueller LA, Wang A, Paley S, Arnaud M, Pick J, Rhee SY, Karp PD: MetaCyc: A multiorganism database of metabolic pathways and enzymes. Nuc Acids Res 2004, 32: D438\u201342. 10.1093\/nar\/gkh100","journal-title":"Nuc Acids Res"},{"issue":"1","key":"909_CR25","doi-asserted-by":"publisher","first-page":"56","DOI":"10.1093\/nar\/30.1.56","volume":"30","author":"PD Karp","year":"2002","unstructured":"Karp PD, Riley M, Saier M, Paulsen IT, Paley S, Pellegrini-Toole A: The EcoCyc database. Nuc Acids Res 2002, 30(1):56\u20138. 10.1093\/nar\/30.1.56","journal-title":"Nuc Acids Res"},{"issue":"1","key":"909_CR26","doi-asserted-by":"publisher","first-page":"15","DOI":"10.1093\/nar\/28.1.15","volume":"28","author":"DA Benson","year":"2000","unstructured":"Benson DA, Karsch-Mizrachi I, Lipman DJ, Ostell J, Rapp BA, Wheeler DL: GenBank. Nuc Acids Res 2000, 28(1):15\u201318. 10.1093\/nar\/28.1.15","journal-title":"Nuc Acids Res"},{"issue":"1","key":"909_CR27","doi-asserted-by":"publisher","first-page":"123","DOI":"10.1093\/nar\/29.1.123","volume":"29","author":"JD Peterson","year":"2001","unstructured":"Peterson JD, Umayam LA, Dickinson T, Hickey EK, White O: The Comprehensive Microbial Resource. Nuc Acids Res 2001, 29(1):123\u20135. 10.1093\/nar\/29.1.123","journal-title":"Nuc Acids Res"},{"key":"909_CR28","doi-asserted-by":"publisher","first-page":"25","DOI":"10.1038\/75556","volume":"25","author":"M Ashburner","year":"2000","unstructured":"Ashburner M, Ball CA, Blake JA, Botstein D, Butler H, Cherry JM, Davis AP, Dolinski K, Dwight SS, Eppig JT, Harris MA, Hill DP, Issel-Tarver L, Kasarskis A, Lewis S, Matese JC, Richardson JE, Ringwald M, Rubin GM, Sherlock G: Gene Ontology: Tool for the unification of biology. Nature Genetics 2000, 25: 25\u201329. 10.1038\/75556","journal-title":"Nature Genetics"},{"issue":"1","key":"909_CR29","doi-asserted-by":"publisher","first-page":"76","DOI":"10.1186\/1471-2105-5-76","volume":"5","author":"ML Green","year":"2004","unstructured":"Green ML, Karp PD: A Bayesian method for identifying missing enzymes in predicted metabolic pathway databases. BMC Bioinformatics 2004, 5(1):76. [http:\/\/www.biomedcentral.com\/1471\u20132105\/5\/76] 10.1186\/1471-2105-5-76","journal-title":"BMC Bioinformatics"},{"issue":"3","key":"909_CR30","doi-asserted-by":"publisher","first-page":"301","DOI":"10.1089\/153623103322452413","volume":"7","author":"D Segre","year":"2003","unstructured":"Segre D, Zucker J, Katz J, Lin X, D'Haeseleer P, Rindone WP, Kharchenko P, Nguyen DH, Wright JA, Church GM: From annotated genomes to metabolic flux models and kinetic parameter fitting. OMICS A Journal of Integrative Biology 2003, 7(3):301\u2013316. 10.1089\/153623103322452413","journal-title":"OMICS A Journal of Integrative Biology"},{"issue":"4","key":"909_CR31","doi-asserted-by":"publisher","first-page":"411","DOI":"10.1089\/153623103322637715","volume":"7","author":"TD Garvey","year":"2003","unstructured":"Garvey TD, Lincoln P, Pedersen CJ, Martin D, Johnson M: BioSPICE: Access to the most current computational tools for biologists. OMICS A Journal of Integrative Biology 2003, 7(4):411\u201320. 10.1089\/153623103322637715","journal-title":"OMICS A Journal of Integrative Biology"},{"key":"909_CR32","unstructured":"BioPAX[http:\/\/www.biopax.org\/]"},{"key":"909_CR33","doi-asserted-by":"publisher","first-page":"401.1","DOI":"10.1186\/gb-2004-5-8-401","volume":"5","author":"PD Karp","year":"2004","unstructured":"Karp PD: Call for an enzyme genomics initiative. Genome Biology 2004, 5: 401.1\u2013401.3. [http:\/\/genomebiology.com\/2004\/5\/8\/401] 10.1186\/gb-2004-5-8-401","journal-title":"Genome Biology"},{"key":"909_CR34","volume-title":"Enzyme Nomenclature, 1992: Recommendations of the Nomenclature Committee of the International Union of Biochemistry and Molecular Biology on the Nomenclature and Classification of Enzymes","author":"Webb C Edwin","year":"1992","unstructured":"Edwin WebbC: Enzyme Nomenclature, 1992: Recommendations of the Nomenclature Committee of the International Union of Biochemistry and Molecular Biology on the Nomenclature and Classification of Enzymes. Academic Press; 1992."},{"key":"909_CR35","unstructured":"ENZYME Database[http:\/\/www.chem.qmw.ac.uk\/iubmb\/enzyme\/]"},{"key":"909_CR36","unstructured":"Enzyme Genomics information[http:\/\/bioinformatics.ai.sri.com\/enzyme-genomics\/]"},{"key":"909_CR37","doi-asserted-by":"publisher","first-page":"97","DOI":"10.1006\/cbmr.1994.1011","volume":"27","author":"O Ritter","year":"1994","unstructured":"Ritter O, Kocab P, Senger M, Wolf D, Suhai S: Prototype implementation of the integrated genomic database. Computers and Biomedical Research 1994, 27: 97\u2013115. 10.1006\/cbmr.1994.1011","journal-title":"Computers and Biomedical Research"},{"key":"909_CR38","doi-asserted-by":"publisher","first-page":"57","DOI":"10.1007\/978-1-4615-2451-9_5","volume-title":"Computational Methods in Genome Research","author":"O Ritter","year":"1994","unstructured":"Ritter O: The integrated genomic database. In Computational Methods in Genome Research. Plenum, New York; 1994:57\u201373."},{"key":"909_CR39","first-page":"265","volume-title":"Bioinformatics Databases and Systems","author":"J Thierry-Mieg","year":"1999","unstructured":"Thierry-Mieg J, Thierry-Mieg D, Stein L: ACEDB: The ACE database manager. In Bioinformatics Databases and Systems. Kluwer Academic Publishers, Norwell MA; 1999:265\u201378."},{"key":"909_CR40","first-page":"213","volume-title":"Bioinformatics Databases and Systems","author":"P Carter","year":"1999","unstructured":"Carter P, Coupaye T, Kreil DP, Etzold T: SRS: Analyzing and using data from heterogeneous textual databanks. In Bioinformatics Databases and Systems. Kluwer Academic Publishers, Norwell, MA; 1999:213\u201332."},{"key":"909_CR41","unstructured":"GUS schema[http:\/\/www.gusdb.org\/cgi-bin\/schemaBrowser]"},{"key":"909_CR42","doi-asserted-by":"publisher","first-page":"34","DOI":"10.1186\/1471-2105-6-34","volume":"6","author":"SP Shah","year":"2005","unstructured":"Shah SP, Huang Y, Xu T, Yuen MMS, Ling J, Ouellette BFF: Atlas: a data warehouse for integrative bioinformatics. BMC Bioinformatics 2005, 6: 34. 10.1186\/1471-2105-6-34","journal-title":"BMC Bioinformatics"},{"key":"909_CR43","doi-asserted-by":"publisher","first-page":"160","DOI":"10.1101\/gr.1645104","volume":"14","author":"A Kasprzyk","year":"2004","unstructured":"Kasprzyk A, Keefe D, Smedley D, Darin London, William Spooner, Craig Melsopp, Martin Hammond, Philippe Rocca-Serra, Tony Cox, Ewan Birney: EnsMart: A Generic System for Fast and Flexible Access to Biological Data. Genome Research 2004, 14: 160\u20139. 10.1101\/gr.1645104","journal-title":"Genome Research"},{"key":"909_CR44","unstructured":"Biozon[http:\/\/biozon.org\/]"}],"container-title":["BMC Bioinformatics"],"original-title":[],"language":"en","link":[{"URL":"https:\/\/link.springer.com\/content\/pdf\/10.1186\/1471-2105-7-170.pdf","content-type":"application\/pdf","content-version":"vor","intended-application":"similarity-checking"}],"deposited":{"date-parts":[[2021,9,1]],"date-time":"2021-09-01T03:22:09Z","timestamp":1630466529000},"score":1,"resource":{"primary":{"URL":"https:\/\/bmcbioinformatics.biomedcentral.com\/articles\/10.1186\/1471-2105-7-170"}},"subtitle":[],"short-title":[],"issued":{"date-parts":[[2006,3,23]]},"references-count":44,"journal-issue":{"issue":"1","published-print":{"date-parts":[[2006,12]]}},"alternative-id":["909"],"URL":"https:\/\/doi.org\/10.1186\/1471-2105-7-170","relation":{},"ISSN":["1471-2105"],"issn-type":[{"value":"1471-2105","type":"electronic"}],"subject":[],"published":{"date-parts":[[2006,3,23]]},"assertion":[{"value":"9 August 2005","order":1,"name":"received","label":"Received","group":{"name":"ArticleHistory","label":"Article History"}},{"value":"23 March 2006","order":2,"name":"accepted","label":"Accepted","group":{"name":"ArticleHistory","label":"Article History"}},{"value":"23 March 2006","order":3,"name":"first_online","label":"First Online","group":{"name":"ArticleHistory","label":"Article History"}}],"article-number":"170"}}