{"status":"ok","message-type":"work","message-version":"1.0.0","message":{"indexed":{"date-parts":[[2025,10,29]],"date-time":"2025-10-29T03:14:42Z","timestamp":1761707682827,"version":"3.38.0"},"reference-count":42,"publisher":"SAGE Publications","issue":"3","license":[{"start":{"date-parts":[[2003,8,1]],"date-time":"2003-08-01T00:00:00Z","timestamp":1059696000000},"content-version":"tdm","delay-in-days":0,"URL":"https:\/\/journals.sagepub.com\/page\/policies\/text-and-data-mining-license"}],"content-domain":{"domain":[],"crossmark-restriction":false},"short-container-title":["The International Journal of High Performance Computing Applications"],"published-print":{"date-parts":[[2003,8]]},"abstract":"<jats:p> With the emergence of distributed resources and grid technologies there is a need to provide higher level informatics infrastructures allowing scientists to easily create and execute meaningful data integration and analysis processes that take advantage of the distributed nature of the available resources. These resources typically include heterogeneous data sources, computational resources for task execution and various application-specific services. The effort of the high performance community has so far mainly focused on the delivery of low-level informatics infrastructures enabling the basic needs of grid applications. Such infrastructures are essential but do not directly help end-users in creating generic and re-usable applications. <\/jats:p><jats:p> In this paper, we present the Discovery Net architecture for building grid-based knowledge discovery applications. Our architecture enables the creation of high-level, re-usable and distributed application workflows that use a variety of common types of distributed resources. It is built on top of standard protocols and standard infrastructures such as Globus but also defines its own protocols such as the Discovery Process Mark-up Language for data flow management. 
We discuss an implementation of our architecture and evaluate it by building a real-time genome annotation environment on top. <\/jats:p>","DOI":"10.1177\/1094342003173003","type":"journal-article","created":{"date-parts":[[2004,5,27]],"date-time":"2004-05-27T12:40:00Z","timestamp":1085661600000},"page":"297-315","source":"Crossref","is-referenced-by-count":47,"title":["The Design of Discovery Net: Towards Open Grid Services for Knowledge Discovery"],"prefix":"10.1177","volume":"17","author":[{"given":"Salman","family":"AlSairafi","sequence":"first","affiliation":[]},{"given":"Filippia-Sofia","family":"Emmanouil","sequence":"additional","affiliation":[]},{"given":"Moustafa","family":"Ghanem","sequence":"additional","affiliation":[]},{"given":"Nikolaos","family":"Giannadakis","sequence":"additional","affiliation":[]},{"given":"Yike","family":"Guo","sequence":"additional","affiliation":[]},{"given":"Dimitrios","family":"Kalaitzopoulos","sequence":"additional","affiliation":[]},{"given":"Michelle","family":"Osmond","sequence":"additional","affiliation":[]},{"given":"Anthony","family":"Rowe","sequence":"additional","affiliation":[]},{"given":"Jameel","family":"Syed","sequence":"additional","affiliation":[]},{"given":"Patrick","family":"Wendel","sequence":"additional","affiliation":[{"name":"DEPARTMENT OF COMPUTING, IMPERIAL COLLEGE, 180 QUEEN\u2019S GATE, LONDON SW7 2BZ, UK"}]}],"member":"179","published-online":{"date-parts":[[2003,8,1]]},"reference":[{"key":"atypb1","doi-asserted-by":"crossref","unstructured":"Abramson, D., Sosic, R., Giddy, J., and Hall, B. 1995. Nimrod: A tool for performing parameterised simulations using distributed workstations. In HPDC, pp. 
112\u2013121.","DOI":"10.1109\/HPDC.1995.518701"},{"key":"atypb2","doi-asserted-by":"publisher","DOI":"10.1016\/S0022-2836(05)80360-2"},{"key":"atypb3","doi-asserted-by":"publisher","DOI":"10.1093\/nar\/25.17.3389"},{"key":"atypb4","doi-asserted-by":"crossref","unstructured":"Anderson, D.P., Cobb, J., Korpela, E., Lebofsky, M., and Werthimer, D. 2002. Seti@Home: an experiment in public-resource computing . Communications of the ACM, 45(11): 56\u201361 .","DOI":"10.1145\/581571.581573"},{"key":"atypb5","unstructured":"Avery, P., Foster, I., Gardner, R., Newman, H., and Szalay, A. 2001. An international virtual-data grid laboratory for data intensive science. http:\/\/www.griphyn.org."},{"key":"atypb6","doi-asserted-by":"publisher","DOI":"10.1093\/nar\/28.1.45"},{"key":"atypb7","unstructured":"Baru, C., Moore, R., Rajasekar, A., and Wan, M. 1998. The sdsc storage resource broker. In CASCON\u201998."},{"key":"atypb8","doi-asserted-by":"publisher","DOI":"10.1006\/jmbi.1997.0951"},{"key":"atypb9","doi-asserted-by":"crossref","unstructured":"Buyya, R., Abramson, D., and Giddy, J. 2000. Nimrod\/g: An architecture for a resource management and scheduling system in a global computational grid. In HPC ASIA\u20192000.","DOI":"10.1109\/HPC.2000.846563"},{"key":"atypb10","doi-asserted-by":"crossref","unstructured":"Cannataro, M., Talia, D., and Trunfio, P. 2002. Design of distributed data mining application of the knowledge grid . In Proceedings National Science Foundation Workshop on Next Generation Data Mining.","DOI":"10.1016\/S0167-739X(02)00088-2"},{"key":"atypb11","doi-asserted-by":"publisher","DOI":"10.1002\/(SICI)1096-9128(199809\/11)10:11\/13<1043::AID-CPE413>3.0.CO;2-6"},{"key":"atypb12","unstructured":"Chapman, P., Clinton, J., Khabaza, T., Reinartz, T., and Wirth, R. March 1999. The CRISP-DM process model. 
http:\/\/www.crisp-dm.org\/."},{"key":"atypb13","doi-asserted-by":"crossref","unstructured":"Chattratichat, J., Darlington, J., Guo, Y., Hedvall, S., Kohler, M., and Syed, J. 1999. An architecture for distributed enterprise data mining. In Proceedings of the 7th Conference on High Performance Computing and Networking Europe.","DOI":"10.1007\/BFb0100618"},{"key":"atypb14","doi-asserted-by":"crossref","unstructured":"Chattratichat, J., Guo, Y., and Syed, J. 1999. A visual language for internet-based data mining and data visualisation. In IEEE Symposium on Visual Languages, Tokyo, Japan.","DOI":"10.1109\/VL.1999.795876"},{"key":"atypb15","unstructured":"Curbera, F., Goland, Y., Klein, J., Leymann, F., Roller, D., Thatte, S., and Weerawarana, S. 2002. Business process execution language for web services, version 1.0."},{"key":"atypb16","doi-asserted-by":"crossref","unstructured":"Curcin, V., Ghanem, M., Guo, Y., Kohler, M., Rowe, A., Syed, J., and Wendel, P. 2002. Discovery net: Towards a grid of knowledge discovery. In Proceedings of the Eighth International Conference on Knowledge Discovery and Data Mining (KDD-2002).","DOI":"10.1145\/775107.775145"},{"key":"atypb17","doi-asserted-by":"publisher","DOI":"10.1073\/pnas.95.25.14863"},{"key":"atypb18","doi-asserted-by":"publisher","DOI":"10.1006\/jmbi.2000.3903"},{"key":"atypb19","doi-asserted-by":"publisher","DOI":"10.1016\/S0022-2836(02)00379-0"},{"key":"atypb20","unstructured":"Foster, I., and Kesselman, C. 1999. The globus toolkit. In The Grid: Blueprint for a New Computing Infrastructure, Chap. 11, pp. 259\u2013278. Morgan Kaufmann, San Francisco, CA."},{"key":"atypb21","doi-asserted-by":"crossref","unstructured":"Foster, I., Kesselman, C., Nick, J.M., and Tuecke, S. 2002. The physiology of the grid: an open grid services architecture for distributed systems integration. 
Technical report, http:\/\/www.globus.org\/research\/papers\/ogsa.pdf.","DOI":"10.1109\/MC.2002.1009167"},{"key":"atypb22","unstructured":"Grossman, R.L., Bailey, S.M., Sivakumar, H., and Turinsky, A.L. 1999. Papyrus: A system for data mining over local and wide-area clusters and super-clusters. In SC\u201999. ACM Press and IEEE Computer Society Press ."},{"key":"atypb23","unstructured":"Data Mining Group. 2003. PMML specification. http:\/\/www.dmg.org\/."},{"key":"atypb24","unstructured":"Guo, Y., and Sutiwaraphun, J. 2000. Distributed classification with knowledge probing. In H. Kargupta and P. Chan, editors, Advances in Distributed and Parallel Knowledge Discovery, chapter 4. AAAI Press ."},{"key":"atypb25","unstructured":"Henderson, R., and Tweten, D. 1995. Portable batch system: Requirement specification. Technical report, NASA Ames Research Center ."},{"key":"atypb26","unstructured":"Kensington discovery edition. 2003. http:\/\/www.inforsense.com."},{"key":"atypb27","doi-asserted-by":"publisher","DOI":"10.1093\/bioinformatics\/15.5.356"},{"key":"atypb28","doi-asserted-by":"crossref","unstructured":"Litzkow, M.J., Livny, M., and Mutka, M.W. 1988. Condor: A hunter of idle workstations . In 8th International Conference on Distributed Computing Systems, pp. 104\u2013111 , Washington, DC, USA, June. IEEE Computer Society Press.","DOI":"10.1109\/DCS.1988.12507"},{"key":"atypb29","doi-asserted-by":"publisher","DOI":"10.1093\/nar\/25.5.0955"},{"key":"atypb30","doi-asserted-by":"crossref","unstructured":"Maniatty, W., and Zaki, M.J. 2000. A requirements analysis for parallel KDD systems. In IPDPS Workshops, pp. 358\u2013365.","DOI":"10.1007\/3-540-45591-4_47"},{"key":"atypb31","unstructured":"O\u2019Donovan, C., Martin, M.J., Gattiker, A., Gasteiger, E., Bairoch, A., and Kanehisa, M. 2002. High-quality protein knowledge resource: Swiss-prot and trembl . 
Brief Bioinformatics, 16: 944\u2013945 ."},{"key":"atypb32","doi-asserted-by":"publisher","DOI":"10.1093\/nar\/27.1.29"},{"key":"atypb33","doi-asserted-by":"publisher","DOI":"10.1016\/S0168-9525(00)02024-2"},{"key":"atypb34","unstructured":"Romberg, M. 1999. The unicore architecture . In Proceeding of the 8th IEEE International Symposium on High Performance Distributed Computing."},{"key":"atypb35","unstructured":"Roth, M., and Schwarz, P. 1997. Don\u2019t scrap it, wrap it! a wrapper architecture for legacy data sources. In VLD 1997."},{"key":"atypb36","doi-asserted-by":"publisher","DOI":"10.1093\/bioinformatics\/16.10.944"},{"key":"atypb37","unstructured":"Smit, A., and Green, P. 2003. http:\/\/ftp.genome.washington.edu\/RM\/RepeatMasker.html."},{"key":"atypb38","doi-asserted-by":"crossref","unstructured":"Stein, L. 2002. Genome annotation: from sequence to biology . Nature Reviews Genetics, 2: 493\u2013503 .","DOI":"10.1038\/35080529"},{"key":"atypb39","unstructured":"Stolfo, S., Prodromidis, A.L., Tselepis, S., Lee, W., Fan, D.W., and Chan, P.K. 1997. JAM: Java agents for meta-learning over distributed databases. In D. Heckerman, H. Mannila, D. Pregibon, and R. Uthurusamy, editors, Proceedings of the Third International Conference on Knowledge Discovery and Data Mining (KDD-97), p. 74-74. AAAI Press ."},{"key":"atypb40","doi-asserted-by":"publisher","DOI":"10.1093\/bioinformatics\/17.9.849"},{"key":"atypb41","doi-asserted-by":"publisher","DOI":"10.1017\/S0956796899003585"},{"key":"atypb42","unstructured":"Zhou, S. 1992. LSF: load sharing in large-scale heterogeneous distributed systems . In Proceedings of the Workshop on Cluster Computing, Tallahassee, FL, December. 
Super-computing Computations Research Institute, Florida State University."}],"container-title":["The International Journal of High Performance Computing Applications"],"original-title":[],"language":"en","link":[{"URL":"https:\/\/journals.sagepub.com\/doi\/pdf\/10.1177\/1094342003173003","content-type":"application\/pdf","content-version":"vor","intended-application":"text-mining"},{"URL":"https:\/\/journals.sagepub.com\/doi\/pdf\/10.1177\/1094342003173003","content-type":"unspecified","content-version":"vor","intended-application":"similarity-checking"}],"deposited":{"date-parts":[[2025,3,3]],"date-time":"2025-03-03T07:19:12Z","timestamp":1740986352000},"score":1,"resource":{"primary":{"URL":"https:\/\/journals.sagepub.com\/doi\/10.1177\/1094342003173003"}},"subtitle":[],"short-title":[],"issued":{"date-parts":[[2003,8]]},"references-count":42,"journal-issue":{"issue":"3","published-print":{"date-parts":[[2003,8]]}},"alternative-id":["10.1177\/1094342003173003"],"URL":"https:\/\/doi.org\/10.1177\/1094342003173003","relation":{},"ISSN":["1094-3420","1741-2846"],"issn-type":[{"type":"print","value":"1094-3420"},{"type":"electronic","value":"1741-2846"}],"subject":[],"published":{"date-parts":[[2003,8]]}}}