{"status":"ok","message-type":"work","message-version":"1.0.0","message":{"indexed":{"date-parts":[[2024,5,25]],"date-time":"2024-05-25T07:11:36Z","timestamp":1716621096591},"reference-count":25,"publisher":"Springer Science and Business Media LLC","issue":"1","content-domain":{"domain":["link.springer.com"],"crossmark-restriction":false},"short-container-title":["BMC Bioinformatics"],"published-print":{"date-parts":[[2010,12]]},"abstract":"<jats:title>Abstract<\/jats:title><jats:sec><jats:title>Background<\/jats:title><jats:p>The Ensembl project produces updates to its comparative genomics resources with each of its several releases per year. During each release cycle approximately two weeks are allocated to generate all the genomic alignments and the protein homology predictions. The number of calculations required for this task grows approximately quadratically with the number of species. We currently support 50 species in Ensembl and we expect the number to continue to grow in the future.<\/jats:p><\/jats:sec><jats:sec><jats:title>Results<\/jats:title><jats:p>We present eHive, a new fault tolerant distributed processing system initially designed to support comparative genomic analysis, based on blackboard systems, network distributed autonomous agents, dataflow graphs and block-branch diagrams. In the eHive system a MySQL database serves as the central blackboard and the autonomous agent, a Perl script, queries the system and runs jobs as required. The system allows us to define dataflow and branching rules to suit all our production pipelines. We describe the implementation of three pipelines: (1) pairwise whole genome alignments, (2) multiple whole genome alignments and (3) gene trees with protein homology inference. 
Finally, we show the efficiency of the system in real case scenarios.<\/jats:p><\/jats:sec><jats:sec><jats:title>Conclusions<\/jats:title><jats:p>eHive allows us to produce computationally demanding results in a reliable and efficient way with minimal supervision and high throughput. Further documentation is available at:<jats:ext-link xmlns:xlink=\"http:\/\/www.w3.org\/1999\/xlink\" xlink:href=\"http:\/\/www.ensembl.org\/info\/docs\/eHive\/\" ext-link-type=\"uri\">http:\/\/www.ensembl.org\/info\/docs\/eHive\/<\/jats:ext-link>.<\/jats:p><\/jats:sec>","DOI":"10.1186\/1471-2105-11-240","type":"journal-article","created":{"date-parts":[[2010,5,11]],"date-time":"2010-05-11T18:17:51Z","timestamp":1273601871000},"update-policy":"http:\/\/dx.doi.org\/10.1007\/springer_crossmark_policy","source":"Crossref","is-referenced-by-count":36,"title":["eHive: An Artificial Intelligence workflow system for genomic analysis"],"prefix":"10.1186","volume":"11","author":[{"given":"Jessica","family":"Severin","sequence":"first","affiliation":[]},{"given":"Kathryn","family":"Beal","sequence":"additional","affiliation":[]},{"given":"Albert J","family":"Vilella","sequence":"additional","affiliation":[]},{"given":"Stephen","family":"Fitzgerald","sequence":"additional","affiliation":[]},{"given":"Michael","family":"Schuster","sequence":"additional","affiliation":[]},{"given":"Leo","family":"Gordon","sequence":"additional","affiliation":[]},{"given":"Abel","family":"Ureta-Vidal","sequence":"additional","affiliation":[]},{"given":"Paul","family":"Flicek","sequence":"additional","affiliation":[]},{"given":"Javier","family":"Herrero","sequence":"additional","affiliation":[]}],"member":"297","published-online":{"date-parts":[[2010,5,11]]},"reference":[{"key":"3697_CR1","doi-asserted-by":"publisher","first-page":"D690","DOI":"10.1093\/nar\/gkn828","volume":"37","author":"TJ Hubbard","year":"2009","unstructured":"Hubbard TJ, Aken BL, Ayling S, Ballester B, Beal K, Bragin E, Brent S, Chen Y, Clapham 
P, Clarke L, Coates G, Fairley S, Fitzgerald S, Fernandez-Banet J, Gordon L, Graf S, Haider S, Hammond M, Holland R, Howe K, Jenkinson A, Johnson N, Kahari A, Keefe D, Keenan S, Kinsella R, Kokocinski F, Kulesha E, Lawson D, Longden I, Megy K, Meidl P, Overduin B, Parker A, Pritchard B, Rios D, Schuster M, Slater G, Smedley D, Spooner W, Spudich G, Trevanion S, Vilella A, Vogel J, White S, Wilder S, Zadissa A, Birney E, Cunningham F, Curwen V, Durbin R, Fernandez-Suarez XM, Herrero J, Kasprzyk A, Proctor G, Smith J, Searle S, Flicek P: Ensembl 2009. Nucleic Acids Res 2009, 37: D690-D697. 10.1093\/nar\/gkn828","journal-title":"Nucleic Acids Res"},{"key":"3697_CR2","doi-asserted-by":"publisher","first-page":"22","DOI":"10.1186\/1471-2164-10-22","volume":"10","author":"D Smedley","year":"2009","unstructured":"Smedley D, Haider S, Ballester B, Holland R, London D, Thorisson G, Kasprzyk A: BioMart--biological queries made easy. BMC Genomics 2009, 10: 22. 10.1186\/1471-2164-10-22","journal-title":"BMC Genomics"},{"key":"3697_CR3","doi-asserted-by":"crossref","first-page":"25","DOI":"10.1145\/37401.37406","volume-title":"Proceedings of the 14th annual conference on Computer graphics and interactive techniques","author":"CW Reynolds","year":"1987","unstructured":"Reynolds CW: Flocks, herds and schools: A distributed behavioral model. Proceedings of the 14th annual conference on Computer graphics and interactive techniques 1987, 25\u201334."},{"key":"3697_CR4","first-page":"38","volume":"7","author":"HP Nii","year":"1986","unstructured":"Nii HP: The blackboard model of problem solving and the evolution of blackboard architectures. AI Magazine 1986, 7: 38\u201353.","journal-title":"AI Magazine"},{"key":"3697_CR5","doi-asserted-by":"publisher","first-page":"205","DOI":"10.1017\/S026988890000789X","volume":"11","author":"HS Nwana","year":"1996","unstructured":"Nwana HS: Software agents: An overview. Knowledge Engineering Review 1996, 11: 205\u2013244. 
10.1017\/S026988890000789X","journal-title":"Knowledge Engineering Review"},{"key":"3697_CR6","unstructured":"Platform LSF Family[http:\/\/www.platform.com\/Products\/platform-lsf-family]"},{"key":"3697_CR7","unstructured":"Sun Grid Engine[http:\/\/gridengine.sunsource.net\/]"},{"key":"3697_CR8","unstructured":"OpenPBS[http:\/\/www.openpbs.org]"},{"key":"3697_CR9","doi-asserted-by":"publisher","first-page":"167","DOI":"10.1038\/nature05805","volume":"447","author":"TS Mikkelsen","year":"2007","unstructured":"Mikkelsen TS, Wakefield MJ, Aken B, Amemiya CT, Chang JL, Duke S, Garber M, Gentles AJ, Goodstadt L, Heger A, Jurka J, Kamal M, Mauceli E, Searle SM, Sharpe T, Baker ML, Batzer MA, Benos PV, Belov K, Clamp M, Cook A, Cuff J, Das R, Davidow L, Deakin JE, Fazzari MJ, Glass JL, Grabherr M, Greally JM, Gu W, Hore TA, Huttley GA, Kleber M, Jirtle RL, Koina E, Lee JT, Mahony S, Marra MA, Miller RD, Nicholls RD, Oda M, Papenfuss AT, Parra ZE, Pollock DD, Ray DA, Schein JE, Speed TP, Thompson K, VandeBerg JL, Wade CM, Walker JA, Waters PD, Webber C, Weidman JR, Xie X, Zody MC, Broad Institute Genome Sequencing Platform, Broad Institute Whole Genome Assembly Team, Baldwin J, Abdouelleil A, Abdulkadir J, Abebe A, Abera B, Abreu J, Acer SC, Aftuck L, Alexander A, An P, Anderson E, Anderson S, Arachi H, Azer M, Bachantsang P, Barry A, Bayul T, Berlin A, Bessette D, Bloom T, Boguslavskiy L, Bonnet C, Boukhgalter B, Bourzgui I, Brown A, Cahill P, Channer S, Cheshatsang Y, Chuda L, Citroen M, Collymore A, Cooke P, Costello M, D'Aco K, Daza R, De Haan G, DeGray S, DeMaso C, Dhargay N, Dooley K, Dooley E, Doricent M, Dorje P, Dorjee K, Dupes A, Elong R, Falk J, Farina A, Faro S, Ferguson D, Fisher S, Foley CD, Franke A, Friedrich D, Gadbois L, Gearin G, Gearin CR, Giannoukos G, Goode T, Graham J, Grandbois E, Grewal S, Gyaltsen K, Hafez N, Hagos B, Hall J, Henson C, Hollinger A, Honan T, Huard MD, Hughes L, Hurhula B, Husby ME, Kamat A, Kanga B, Kashin S, Khazanovich D, Kisner 
P, Lance K, Lara M, Lee W, Lennon N, Letendre F, LeVine R, Lipovsky A, Liu X, Liu J, Liu S, Lokyitsang T, Lokyitsang Y, Lubonja R, Lui A, MacDonald P, Magnisalis V, Maru K, Matthews C, McCusker W, McDonough S, Mehta T, Meldrim J, Meneus L, Mihai O, Mihalev A, Mihova T, Mittelman R, Mlenga V, Montmayeur A, Mulrain L, Navidi A, Naylor J, Negash T, Nguyen T, Nguyen N, Nicol R, Norbu C, Norbu N, Novod N, O'Neill B, Osman S, Markiewicz E, Oyono OL, Patti C, Phunkhang P, Pierre F, Priest M, Raghuraman S, Rege F, Reyes R, Rise C, Rogov P, Ross K, Ryan E, Settipalli S, Shea T, Sherpa N, Shi L, Shih D, Sparrow T, Spaulding J, Stalker J, Stange-Thomann N, Stavropoulos S, Stone C, Strader C, Tesfaye S, Thomson T, Thoulutsang Y, Thoulutsang D, Topham K, Topping I, Tsamla T, Vassiliev H, Vo A, Wangchuk T, Wangdi T, Weiand M, Wilkinson J, Wilson A, Yadav S, Young G, Yu Q, Zembek L, Zhong D, Zimmer A, Zwirko Z, Jaffe DB, Alvarez P, Brockman W, Butler J, Chin C, Gnerre S, MacCallum I, Graves JA, Ponting CP, Breen M, Samollow PB, Lander ES, Lindblad-Toh K: Genome of the marsupial Monodelphis domestica reveals innovation in non-coding sequences. Nature 2007, 447: 167\u2013177. 10.1038\/nature05805","journal-title":"Nature"},{"key":"3697_CR10","doi-asserted-by":"publisher","first-page":"11484","DOI":"10.1073\/pnas.1932072100","volume":"100","author":"WJ Kent","year":"2003","unstructured":"Kent WJ, Baertsch R, Hinrichs A, Miller W, Haussler D: Evolution's cauldron: duplication, deletion, and rearrangement in the mouse and human genomes. Proc Natl Acad Sci USA 2003, 100: 11484\u201311489. 10.1073\/pnas.1932072100","journal-title":"Proc Natl Acad Sci USA"},{"key":"3697_CR11","doi-asserted-by":"publisher","first-page":"656","DOI":"10.1101\/gr.229202","volume":"12","author":"WJ Kent","year":"2002","unstructured":"Kent WJ: BLAT--the BLAST-like alignment tool. 
Genome Res 2002, 12: 656\u2013664.","journal-title":"Genome Res"},{"key":"3697_CR12","doi-asserted-by":"publisher","first-page":"221","DOI":"10.1007\/978-1-59745-514-5_14","volume":"395","author":"CN Dewey","year":"2007","unstructured":"Dewey CN: Aligning multiple whole genomes with Mercator and MAVID. Methods Mol Biol 2007, 395: 221\u2013236.","journal-title":"Methods Mol Biol"},{"key":"3697_CR13","doi-asserted-by":"publisher","first-page":"460","DOI":"10.1016\/j.cub.2006.01.050","volume":"16","author":"S Lall","year":"2006","unstructured":"Lall S, Gr\u00fcn D, Krek A, Chen K, Wang YL, Dewey CN, Sood P, Colombo T, Bray N, Macmenamin P, Kao HL, Gunsalus KC, Pachter L, Piano F, Rajewsky N: A genome-wide map of conserved microRNA targets in C. elegans. Curr Biol 2006, 16: 460\u2013471. 10.1016\/j.cub.2006.01.050","journal-title":"Curr Biol"},{"key":"3697_CR14","doi-asserted-by":"publisher","first-page":"1814","DOI":"10.1101\/gr.076554.108","volume":"18","author":"B Paten","year":"2008","unstructured":"Paten B, Herrero J, Beal K, Fitzgerald S, Birney E: Enredo and Pecan: Genome-wide mammalian consistency-based multiple alignment with paralogs. Genome Res 2008, 18: 1814\u20131828. 10.1101\/gr.076554.108","journal-title":"Genome Res"},{"key":"3697_CR15","doi-asserted-by":"publisher","first-page":"901","DOI":"10.1101\/gr.3577405","volume":"15","author":"GM Cooper","year":"2005","unstructured":"Cooper GM, Stone EA, Asimenos G, NISC Comparative Sequencing Program, Green ED, Batzoglou S, Sidow A: Distribution and intensity of constraint in mammalian genomic sequence. Genome Res 2005, 15: 901\u2013913. 10.1101\/gr.3577405","journal-title":"Genome Res"},{"key":"3697_CR16","doi-asserted-by":"publisher","first-page":"403","DOI":"10.1016\/S0022-2836(05)80360-2","volume":"215","author":"SF Altschul","year":"1990","unstructured":"Altschul SF, Gish W, Miller W, Myers EW, Lipman DJ: Basic local alignment search tool. 
J Mol Biol 1990, 215: 403\u2013410.","journal-title":"J Mol Biol"},{"key":"3697_CR17","doi-asserted-by":"publisher","first-page":"327","DOI":"10.1101\/gr.073585.107","volume":"19","author":"AJ Vilella","year":"2009","unstructured":"Vilella AJ, Severin J, Ureta-Vidal A, Heng L, Durbin R, Birney E: EnsemblCompara GeneTrees: Complete, duplication-aware phylogenetic trees in vertebrates. Genome Res 2009, 19: 327\u2013335. 10.1101\/gr.073585.107","journal-title":"Genome Res"},{"key":"3697_CR18","unstructured":"TreeSoft: Softwares for Phylogenetic Trees[http:\/\/treesoft.sourceforge.net\/treebest.shtml]"},{"key":"3697_CR19","doi-asserted-by":"publisher","first-page":"D735","DOI":"10.1093\/nar\/gkm1005","volume":"36","author":"J Ruan","year":"2008","unstructured":"Ruan J, Li H, Chen Z, Coghlan A, Coin LJ, Guo Y, H\u00e9rich\u00e9 JK, Hu Y, Kristiansen K, Li R, Liu T, Moses A, Qin J, Vang S, Vilella AJ, Ureta-Vidal A, Bolund L, Wang J, Durbin R: TreeFam: 2008 Update. Nucleic Acids Res 2008, 36: D735-D740. 10.1093\/nar\/gkm1005","journal-title":"Nucleic Acids Res"},{"key":"3697_CR20","doi-asserted-by":"publisher","first-page":"721","DOI":"10.1101\/gr.926603","volume":"13","author":"M Brudno","year":"2003","unstructured":"Brudno M, Do CB, Cooper GM, Kim MF, Davydov E, NISC Comparative Sequencing Program, Green ED, Sidow A, Batzoglou S: LAGAN and Multi-LAGAN: efficient tools for large-scale multiple alignment of genomic DNA. Genome Res 2003, 13: 721\u2013731. 10.1101\/gr.926603","journal-title":"Genome Res"},{"key":"3697_CR21","doi-asserted-by":"publisher","first-page":"1829","DOI":"10.1101\/gr.076521.108","volume":"18","author":"B Paten","year":"2008","unstructured":"Paten B, Herrero J, Fitzgerald S, Beal K, Flicek P, Holmes I, Birney E: Genome-wide nucleotide-level mammalian ancestor reconstruction. Genome Res 2008, 18: 1829\u20131843. 
10.1101\/gr.076521.108","journal-title":"Genome Res"},{"key":"3697_CR22","unstructured":"XBaya: A Graphical Workflow Composer for Web Services[http:\/\/www.extreme.indiana.edu\/xbaya\/]"},{"key":"3697_CR23","doi-asserted-by":"publisher","first-page":"1067","DOI":"10.1002\/cpe.993","volume":"18","author":"T Oinn","year":"2006","unstructured":"Oinn T, Greenwood M, Addis M, Alpdemir MN, Ferris J, Glover K, Goble C, Goderis A, Hull D, Marvin D, Li P, Lord P, Pocock MR, Senger M, Stevens R, Wipat A, Wroe C: Taverna: lessons in creating a workflow environment for the life sciences. Concurrency and Computation: Practice and Experience 2006, 18: 1067\u20131100. 10.1002\/cpe.993","journal-title":"Concurrency and Computation: Practice and Experience"},{"key":"3697_CR24","doi-asserted-by":"publisher","first-page":"W729","DOI":"10.1093\/nar\/gkl320","volume":"34","author":"D Hull","year":"2006","unstructured":"Hull D, Wolstencroft K, Stevens R, Goble C, Pocock MR, Li P, Oinn T: Taverna: a tool for building and running workflows of services. Nucleic Acids Res 2006, 34: W729-W732. 10.1093\/nar\/gkl320","journal-title":"Nucleic Acids Res"},{"key":"3697_CR25","volume-title":"IEEE Workshop on Scientific Workflows","author":"Y Zhao","year":"2007","unstructured":"Zhao Y, Hategan M, Clifford B, Foster I, Von Laszewski G, Raicu I, Stef-Praun T, Wilde M: Swift: Fast, reliable, loosely coupled parallel computation. 
IEEE Workshop on Scientific Workflows 2007."}],"container-title":["BMC Bioinformatics"],"original-title":[],"language":"en","link":[{"URL":"https:\/\/link.springer.com\/content\/pdf\/10.1186\/1471-2105-11-240.pdf","content-type":"application\/pdf","content-version":"vor","intended-application":"similarity-checking"}],"deposited":{"date-parts":[[2023,6,1]],"date-time":"2023-06-01T04:11:08Z","timestamp":1685592668000},"score":1,"resource":{"primary":{"URL":"https:\/\/bmcbioinformatics.biomedcentral.com\/articles\/10.1186\/1471-2105-11-240"}},"subtitle":[],"short-title":[],"issued":{"date-parts":[[2010,5,11]]},"references-count":25,"journal-issue":{"issue":"1","published-print":{"date-parts":[[2010,12]]}},"alternative-id":["3697"],"URL":"https:\/\/doi.org\/10.1186\/1471-2105-11-240","relation":{},"ISSN":["1471-2105"],"issn-type":[{"value":"1471-2105","type":"electronic"}],"subject":[],"published":{"date-parts":[[2010,5,11]]},"assertion":[{"value":"21 October 2009","order":1,"name":"received","label":"Received","group":{"name":"ArticleHistory","label":"Article History"}},{"value":"11 May 2010","order":2,"name":"accepted","label":"Accepted","group":{"name":"ArticleHistory","label":"Article History"}},{"value":"11 May 2010","order":3,"name":"first_online","label":"First Online","group":{"name":"ArticleHistory","label":"Article History"}}],"article-number":"240"}}