{"status":"ok","message-type":"work","message-version":"1.0.0","message":{"indexed":{"date-parts":[[2025,2,21]],"date-time":"2025-02-21T11:41:24Z","timestamp":1740138084799,"version":"3.37.3"},"reference-count":46,"publisher":"Springer Science and Business Media LLC","issue":"3","license":[{"start":{"date-parts":[[2020,4,27]],"date-time":"2020-04-27T00:00:00Z","timestamp":1587945600000},"content-version":"tdm","delay-in-days":0,"URL":"https:\/\/creativecommons.org\/licenses\/by\/4.0"},{"start":{"date-parts":[[2020,4,27]],"date-time":"2020-04-27T00:00:00Z","timestamp":1587945600000},"content-version":"vor","delay-in-days":0,"URL":"https:\/\/creativecommons.org\/licenses\/by\/4.0"}],"funder":[{"DOI":"10.13039\/501100007914","name":"Brunel University","doi-asserted-by":"crossref","id":[{"id":"10.13039\/501100007914","id-type":"DOI","asserted-by":"crossref"}]}],"content-domain":{"domain":["link.springer.com"],"crossmark-restriction":false},"short-container-title":["J Grid Computing"],"published-print":{"date-parts":[[2020,9]]},"abstract":"<jats:title>Abstract<\/jats:title><jats:p>The generation of a feature matrix is the first step in conducting machine learning analyses on complex datasets, such as those containing DNA, RNA or protein sequences. These matrices contain information about each object, which has to be identified using complex algorithms that interrogate the data. They are normally generated by combining the results of running such algorithms across various datasets from different and distributed data sources. Thus, for non-computing experts, generating such matrices proves a barrier to employing machine learning techniques. Further, as datasets become larger, this barrier is compounded by the limitations of the single personal computer that investigators most often use to carry out such analyses. Here we propose a user-friendly system to generate feature matrices in a way that is flexible, scalable and extendable. 
Additionally, by making use of the Berkeley Open Infrastructure for Network Computing (BOINC) software, the process can be sped up using the distributed volunteer computing resources available in most institutions. The system combines the Grid and Cloud User Support Environment (gUSE) with the Web Services Parallel Grid Runtime and Developer Environment Portal (WS-PGRADE) to create workflow-based science gateways that allow users to submit work to distributed computing infrastructure. This report demonstrates the use of our proposed WS-PGRADE\/gUSE BOINC system to identify features to populate matrices from very large DNA sequence data repositories; however, we propose that this system could be used to analyse a wide variety of feature sets, including image, numerical and text data.<\/jats:p>","DOI":"10.1007\/s10723-020-09518-y","type":"journal-article","created":{"date-parts":[[2020,4,27]],"date-time":"2020-04-27T17:41:49Z","timestamp":1588009309000},"page":"507-527","update-policy":"https:\/\/doi.org\/10.1007\/springer_crossmark_policy","source":"Crossref","is-referenced-by-count":2,"title":["Design of a Flexible, User Friendly Feature Matrix Generation System and its Application on Biomedical Datasets"],"prefix":"10.1007","volume":"18","author":[{"given":"M.","family":"Ghorbani","sequence":"first","affiliation":[],"role":[{"role":"author","vocabulary":"crossref"}]},{"given":"S.","family":"Swift","sequence":"additional","affiliation":[],"role":[{"role":"author","vocabulary":"crossref"}]},{"given":"S. J. E.","family":"Taylor","sequence":"additional","affiliation":[],"role":[{"role":"author","vocabulary":"crossref"}]},{"given":"A. 
M.","family":"Payne","sequence":"additional","affiliation":[],"role":[{"role":"author","vocabulary":"crossref"}]}],"member":"297","published-online":{"date-parts":[[2020,4,27]]},"reference":[{"issue":"6245","key":"9518_CR1","doi-asserted-by":"crossref","first-page":"255","DOI":"10.1126\/science.aaa8415","volume":"349","author":"MI Jordan","year":"2015","unstructured":"Jordan, M.I., Mitchell, T.M.: Machine learning: trends, perspectives, and prospects. Science. 349(6245), 255\u2013260 (2015)","journal-title":"Science"},{"key":"9518_CR2","doi-asserted-by":"crossref","unstructured":"Q Zou, L Chen, T Huang, Z Zhang and Y Xu Machine Learning and Graph Analytics in Computational Biomedicine. Artificial Intelligence in Medicine 83, November, Page 1 and papers therein; (2017)","DOI":"10.1016\/j.artmed.2017.09.003"},{"key":"9518_CR3","doi-asserted-by":"crossref","unstructured":"I.H. Witten, E. Frank, M.A. Hall and C.J. Pal, Data Mining: Practical machine learning tools and techniques. (Morgan Kaufmann 2016)","DOI":"10.1016\/B978-0-12-804291-5.00010-6"},{"key":"9518_CR4","doi-asserted-by":"crossref","unstructured":"W. Cheng, G. Kasneci, T. Graepel, D. Stern and R. Herbrich Automated feature generation from structured knowledge. In Proceedings of the 20th ACM international conference on Information and knowledge management (pp. 1395\u20131404). ACM. (2011)","DOI":"10.1145\/2063576.2063779"},{"key":"9518_CR5","doi-asserted-by":"crossref","unstructured":"H. Paulheim and J. F\u00fcrnkranz. Unsupervised generation of data mining features from linked open data. In Proceedings of the 2nd international conference on web intelligence, mining and semantics (p. 31). ACM. (June 2012)","DOI":"10.1145\/2254129.2254168"},{"key":"9518_CR6","unstructured":"L. Friedman and S. Markovitch Recursive Feature Generation for Knowledge-based Learning. arXiv preprint arXiv:1802.00050. 
(2018)"},{"issue":"2","key":"9518_CR7","first-page":"427","volume":"10","author":"JA Menezes","year":"2017","unstructured":"Menezes, J.A., Cabral, G., Gomes, B.T.: Genetic algorithms for feature generation in the context of audio classification. World Academy of Science, Engineering and Technology, International Journal of Computer, Electrical, Automation, Control and Information Engineering. 10(2), 427\u2013430 (2017)","journal-title":"World Academy of Science, Engineering and Technology, International Journal of Computer, Electrical, Automation, Control and Information Engineering"},{"key":"9518_CR8","doi-asserted-by":"crossref","unstructured":"Afgan, E.; Baker, D.; van den Beek, M.; Blankenberg, D.; Bouvier, D.; \u010cech, M.; Chilton, J.; Clements, D.; Coraor, N.; Eberhard, C.; Gr\u00fcning, B.; Guerler, A.; Hillman-Jackson, J.; Von Kuster, G.; Rasche, E.; Soranzo, N.; Turaga, N.; Taylor, J.; Nekrutenko, A.; Goecks, J. (8 July 2016). \"The Galaxy platform for accessible, reproducible and collaborative biomedical analyses: 2016 update.\" Nucleic Acids Res. 44 (W1): W3\u2013W10","DOI":"10.1093\/nar\/gkw343"},{"key":"9518_CR9","doi-asserted-by":"crossref","unstructured":"Johannes K\u00f6ster and Sven Rahmann. \u201cSnakemake - A scalable bioinformatics workflow engine\u201d. Bioinformatics 2012","DOI":"10.1093\/bioinformatics\/bts480"},{"key":"9518_CR10","unstructured":"J Gray. Jim Gray on eScience: A transformed scientific method. In The Fourth Paradigm: Data-Intensive Scientific Discovery, Tony Hey, Stewart Tansley, and Kristin Tolle (Eds.). (Microsoft Research, 2009), pp. xix\u2013xxxiii"},{"key":"9518_CR11","unstructured":"Hey, T., Tansley, S., Tolle, K. (eds.): The Fourth Paradigm: Data-Intensive Scientific Discovery. Microsoft Research (2009)"},{"key":"9518_CR12","doi-asserted-by":"publisher","unstructured":"Kell D B and Oliver S G. Here is the evidence, now what is the hypothesis? The complementary roles of inductive and hypothesis-driven science in the post-genomic era. 
BioEssays 26, 1, DOI:https:\/\/doi.org\/10.1002\/bies.10385 (Jan. 2004)","DOI":"10.1002\/bies.10385"},{"issue":"4","key":"9518_CR13","doi-asserted-by":"crossref","first-page":"30","DOI":"10.1109\/MC.2008.122","volume":"41","author":"I Gorton","year":"2008","unstructured":"Gorton, I., Greenfield, P., Szalay, A., Williams, R.: Data-intensive computing in the 21st century. Computer. 41(4), 30\u201332 (2008)","journal-title":"Computer"},{"key":"9518_CR14","doi-asserted-by":"publisher","unstructured":"Deelman E, Vahi K, Rynge M, Juve G, Mayani R, and Ferreira da Silva R. Pegasus in the cloud: science automation through workflow technologies. IEEE Internet Comput. 20, 1, 70\u201376. DOI:https:\/\/doi.org\/10.1109\/MIC.2016.15 (Jan. 2016)","DOI":"10.1109\/MIC.2016.15"},{"key":"9518_CR15","doi-asserted-by":"publisher","first-page":"641","DOI":"10.1007\/s10723-016-9380","volume":"14","author":"P Kacsuk","year":"2016","unstructured":"Kacsuk, P., Kecskemeti, G., Kertesz, A., et al.: Infrastructure Aware Scientific Workflows and Infrastructure Aware Workflow Managers in Science Gateways. J Grid Computing. 14, 641 (2016). https:\/\/doi.org\/10.1007\/s10723-016-9380","journal-title":"J Grid Computing"},{"key":"9518_CR16","doi-asserted-by":"publisher","first-page":"743","DOI":"10.1007\/s10723-012-9246-z","volume":"10","author":"TA Wassenaar","year":"2012","unstructured":"Wassenaar, T.A., van Dijk, M., Loureiro-Ferreira, N., et al.: WeNMR: Structural Biology on the Grid. J Grid Computing. 10, 743 (2012). https:\/\/doi.org\/10.1007\/s10723-012-9246-z","journal-title":"J Grid Computing"},{"key":"9518_CR17","doi-asserted-by":"crossref","unstructured":"M. McLennan, R. Kennell, \"HUBzero: a platform for dissemination and collaboration in computational science and engineering,\" Computing in Science and Engineering 12(2), pp. 
48\u201352, March\/April, 2010","DOI":"10.1109\/MCSE.2010.41"},{"key":"9518_CR18","doi-asserted-by":"publisher","first-page":"601","DOI":"10.1007\/s10723-012-9240-5","volume":"10","author":"P Kacsuk","year":"2012","unstructured":"Kacsuk, P., Farkas, Z., Kozlovszky, M., et al.: WS-PGRADE\/gUSE Generic DCI Gateway Framework for a Large Variety of User Communities. J Grid Computing. 10, 601 (2012). https:\/\/doi.org\/10.1007\/s10723-012-9240-5","journal-title":"J Grid Computing"},{"key":"9518_CR19","doi-asserted-by":"publisher","unstructured":"Deelman, E.: Grids and clouds: making workflow applications work in heterogeneous distributed environments. International Journal of High Performance Computing Applications. 24(3), 284\u2013298 (Aug. 2010) https:\/\/doi.org\/10.1177\/10943420093564322010","DOI":"10.1177\/10943420093564322010"},{"key":"9518_CR20","doi-asserted-by":"publisher","unstructured":"Kacsuk P (Ed.). Science Gateways for Distributed Computing Infrastructures: Development Framework and Exploitation by Scientific User Communities. DOI:https:\/\/doi.org\/10.1007\/978-3-319-11268-8 (2014)","DOI":"10.1007\/978-3-319-11268-8"},{"key":"9518_CR21","doi-asserted-by":"publisher","unstructured":"Liew C S, Atkinson M P, Galea M, Ang T F, Martin P, and Van Hemert J I. Scientific workflows: moving across paradigms. ACM Comput. Surv. 49, 4, Article 66 DOI: https:\/\/doi.org\/10.1145\/3012429 (December 2016)","DOI":"10.1145\/3012429"},{"key":"9518_CR22","doi-asserted-by":"crossref","unstructured":"Kacsuk, P.: P-GRADE portal family for grid infrastructures. Concurrency and Computation: Practice and Experience Special Issue: IWPLS 2009. 23(3), 235\u2013245 (2011)","DOI":"10.1002\/cpe.1654"},{"key":"9518_CR23","doi-asserted-by":"publisher","unstructured":"Balasko, A.: Workflow Concept of WS-PGRADE\/gUSE. 
Science Gateways for Distributed Computing Infrastructures: Development Framework and Exploitation by Scientific User Communities, pp. 33\u201350 doi:https:\/\/doi.org\/10.1007\/978-3-319-11268-83 (2014)","DOI":"10.1007\/978-3-319-11268-83"},{"key":"9518_CR24","unstructured":"S.C. Shah Recent Advances in Mobile Grid and Cloud Computing. Intelligent Automation & Soft Computing, pp. 1\u201313. (2017)"},{"key":"9518_CR25","doi-asserted-by":"crossref","first-page":"219","DOI":"10.1016\/j.future.2006.05.008","volume":"23","author":"M Ellert","year":"2007","unstructured":"Ellert, M., et al.: Advanced resource connector middleware for lightweight computational grids. Futur. Gener. Comput. Syst. 23, 219\u2013240 (2007)","journal-title":"Futur. Gener. Comput. Syst."},{"issue":"2\u20134","key":"9518_CR26","doi-asserted-by":"crossref","first-page":"323","DOI":"10.1002\/cpe.938","volume":"17","author":"D Thain","year":"2005","unstructured":"Thain, D., Tannenbaum, T., Livny, M.: Distributed computing in practice: the Condor experience. Concurrency and computation: practice and experience. 17(2\u20134), 323\u2013356 (2005)","journal-title":"Concurrency and computation: practice and experience"},{"key":"9518_CR27","doi-asserted-by":"crossref","first-page":"2","DOI":"10.1007\/11577188_2","volume":"3779","author":"I Foster","year":"2005","unstructured":"Foster, I.: Globus toolkit version 4: software for service-oriented systems. IFIP international conference on network and parallel computing, Springer-Verlag LNCS. 3779, 2\u201313 (2005)","journal-title":"IFIP international conference on network and parallel computing, Springer-Verlag LNCS"},{"key":"9518_CR28","volume-title":"Public Computing: Reconnecting People to Science","author":"DP Anderson","year":"2003","unstructured":"Anderson, D.P.: Public Computing: Reconnecting People to Science. 
Conference on Shared Knowledge and the Web, Residencia de Estudiantes, Madrid, Spain (2003)"},{"key":"9518_CR29","doi-asserted-by":"publisher","unstructured":"Ardizzone, V., Barbera, R., Calanducci, A., et al.: The DECIDE science gateway. J Grid Comput. 10, 689\u2013707 (2012). https:\/\/doi.org\/10.1007\/s10723-012-9242-3","DOI":"10.1007\/s10723-012-9242-3"},{"key":"9518_CR30","doi-asserted-by":"publisher","first-page":"547","DOI":"10.1007\/s10723-015-9330-2","volume":"13","author":"A Costa","year":"2015","unstructured":"Costa, A., Massimino, P., Bandieramonte, M., et al.: An innovative science gateway for the Cherenkov telescope array. J Grid Comput. 13, 547 (2015). https:\/\/doi.org\/10.1007\/s10723-015-9330-2","journal-title":"J Grid Comput"},{"key":"9518_CR31","doi-asserted-by":"publisher","unstructured":"R. Grunzke, J. Kr\u00fcger, R. J\u00e4kel, et al.: Metadata Management in the moSGrid Science Gateway \u2013 Evaluation and the Expansion of Quantum Chemistry Support. J Grid Computing. doi:https:\/\/doi.org\/10.1007\/s10723-016-9362-2 (2016)","DOI":"10.1007\/s10723-016-9362-2"},{"key":"9518_CR32","doi-asserted-by":"publisher","first-page":"589","DOI":"10.1007\/s10723-016-9369-8","volume":"14","author":"S Gugnani","year":"2016","unstructured":"Gugnani, S., Blanco, C., Kiss, T., Terstyanszky, G.: Extending science gateway frameworks to support big data applications in the cloud. J Grid Computing. 14, 589\u2013601 (2016). 
https:\/\/doi.org\/10.1007\/s10723-016-9369-8","journal-title":"J Grid Computing"},{"issue":"4","key":"9518_CR33","doi-asserted-by":"crossref","first-page":"619","DOI":"10.1007\/s10723-016-9388-5","volume":"14","author":"Z Farkas","year":"2016","unstructured":"Farkas, Z., Kacsuk, P., Hajnal, \u00c1.: Enabling workflow-oriented science gateways to access multi-cloud systems. Journal of Grid Computing. 14(4), 619\u2013640 (2016)","journal-title":"Journal of Grid Computing"},{"key":"9518_CR34","unstructured":"C.M. Taylor BOINC user stats https:\/\/boincstats.com\/en\/stats\/-1\/user\/detail\/3531367\/overview accessed 9\/9\/2016"},{"key":"9518_CR35","doi-asserted-by":"publisher","first-page":"429","DOI":"10.1007\/s10723-015-9348-5","volume":"14","author":"AL Bazinet","year":"2016","unstructured":"Bazinet, A.L., Cummings, M.P.: Subdividing long-running, variable-length analyses into short, fixed-length BOINC workunits. J Grid Computing. 14, 429\u2013441 (2016). https:\/\/doi.org\/10.1007\/s10723-015-9348-5","journal-title":"J Grid Computing"},{"key":"9518_CR36","doi-asserted-by":"crossref","unstructured":"F. Gutierrez, D. Azevedo, M. Barreto and R. Zucoloto Support for bioinformatics applications through volunteer and scalable computing frameworks. In Cluster Computing (CLUSTER), 2014 IEEE International Conference (pp. 364\u2013370). IEEE. (2014)","DOI":"10.1109\/CLUSTER.2014.6968780"},{"issue":"D1","key":"9518_CR37","doi-asserted-by":"crossref","first-page":"D20","DOI":"10.1093\/nar\/gkv1352","volume":"44","author":"CE Cook","year":"2015","unstructured":"Cook, C.E., Bergman, M.T., Finn, R.D., Cochrane, G., Birney, E., Apweiler, R.: The European Bioinformatics Institute in 2016: data growth and integration. Nucleic Acids Res. 44(D1), D20\u2013D26 (2015)","journal-title":"Nucleic Acids Res."},{"key":"9518_CR38","doi-asserted-by":"crossref","unstructured":"M. 
Ghorbani, M. Themis, A. Payne Genome wide classification and characterisation of CpG sites in cancer and normal cells. Comput Biol Med. 1;68:57\u201366. doi: 10.1016\/j.compbiomed.2015.09.023. Epub 2015 Oct 23. (2015)","DOI":"10.1016\/j.compbiomed.2015.09.023"},{"key":"9518_CR39","unstructured":"BOINC 2017 https:\/\/boinc.berkeley.edu\/ accessed 12\/09\/2017"},{"issue":"6","key":"9518_CR40","doi-asserted-by":"crossref","first-page":"1442","DOI":"10.1016\/j.future.2012.03.013","volume":"29","author":"A Marosi","year":"2013","unstructured":"Marosi, A., Kov\u00e1cs, J., Kacsuk, P.: Towards a volunteer cloud system. Futur. Gener. Comput. Syst. 29(6), 1442\u20131451 (2013)","journal-title":"Futur. Gener. Comput. Syst."},{"issue":"4","key":"9518_CR41","doi-asserted-by":"crossref","first-page":"601","DOI":"10.1007\/s10723-012-9240-5","volume":"10","author":"P Kacsuk","year":"2012","unstructured":"Kacsuk, P., Farkas, Z., Kozlovszky, M., Hermann, G., Balasko, A., Karoczkai, K., Marton, I.: WS-PGRADE\/gUSE generic DCI gateway framework for a large variety of user communities. Journal of Grid Computing. 10(4), 601\u2013630 (2012)","journal-title":"Journal of Grid Computing"},{"key":"9518_CR42","doi-asserted-by":"crossref","unstructured":"C.B. Ries, C. Schroder and V. Grout Approach of a UML profile for Berkeley Open Infrastructure for network computing (BOINC), Computer Applications and Industrial Electronics (ICCAIE), 2011 IEEE International Conference, pp. 483. (2011)","DOI":"10.1109\/ICCAIE.2011.6162183"},{"issue":"1","key":"9518_CR43","doi-asserted-by":"crossref","first-page":"116","DOI":"10.1186\/1471-2105-10-116","volume":"10","author":"C Previti","year":"2009","unstructured":"Previti, C., Harari, O., Zwir, I., del Val, C.: Profile analysis and prediction of tissue-specific CpG island methylation classes. BMC Bioinformatics. 
10(1), 116 (2009)","journal-title":"BMC Bioinformatics"},{"key":"9518_CR44","doi-asserted-by":"crossref","first-page":"276","DOI":"10.1016\/S0168-9525(00)02024-2","volume":"16","author":"P Rice","year":"2000","unstructured":"Rice, P., Longden, I., Bleasby, A.: EMBOSS: the European molecular biology open software suite. Trends Genet. 16, 276\u2013277 (2000)","journal-title":"Trends Genet."},{"key":"9518_CR45","doi-asserted-by":"crossref","unstructured":"A.C. Marosi, Z. Balaton and P. Kacsuk GenWrapper: a generic wrapper for running legacy applications on desktop grids, Parallel & Distributed Processing, 2009. IPDPS 2009. IEEE International Symposium on IEEE, pp. 1. (2009)","DOI":"10.1109\/IPDPS.2009.5161136"},{"key":"9518_CR46","unstructured":"Jaspar 2017, http:\/\/jaspar.genereg.net\/ accessed 12\/09\/2017"}],"container-title":["Journal of Grid Computing"],"original-title":[],"language":"en","link":[{"URL":"https:\/\/link.springer.com\/content\/pdf\/10.1007\/s10723-020-09518-y.pdf","content-type":"application\/pdf","content-version":"vor","intended-application":"text-mining"},{"URL":"https:\/\/link.springer.com\/article\/10.1007\/s10723-020-09518-y\/fulltext.html","content-type":"text\/html","content-version":"vor","intended-application":"text-mining"},{"URL":"https:\/\/link.springer.com\/content\/pdf\/10.1007\/s10723-020-09518-y.pdf","content-type":"application\/pdf","content-version":"vor","intended-application":"similarity-checking"}],"deposited":{"date-parts":[[2023,9,30]],"date-time":"2023-09-30T10:46:55Z","timestamp":1696070815000},"score":1,"resource":{"primary":{"URL":"https:\/\/link.springer.com\/10.1007\/s10723-020-09518-y"}},"subtitle":[],"short-title":[],"issued":{"date-parts":[[2020,4,27]]},"references-count":46,"journal-issue":{"issue":"3","published-print":{"date-parts":[[2020,9]]}},"alternative-id":["9518"],"URL":"https:\/\/doi.org\/10.1007\/s10723-020-09518-y","relation":{},"ISSN":["1570-7873","1572-9184"],"issn-type":[{"type":"print","value":"1570-
7873"},{"type":"electronic","value":"1572-9184"}],"subject":[],"published":{"date-parts":[[2020,4,27]]},"assertion":[{"value":"26 June 2018","order":1,"name":"received","label":"Received","group":{"name":"ArticleHistory","label":"Article History"}},{"value":"23 March 2020","order":2,"name":"accepted","label":"Accepted","group":{"name":"ArticleHistory","label":"Article History"}},{"value":"27 April 2020","order":3,"name":"first_online","label":"First Online","group":{"name":"ArticleHistory","label":"Article History"}},{"order":1,"name":"Ethics","group":{"name":"EthicsHeading","label":"Compliance with Ethical Standards"}},{"value":"The authors declare that they have no competing interests.","order":2,"name":"Ethics","group":{"name":"EthicsHeading","label":"Competing Interests"}}]}}