{"status":"ok","message-type":"work","message-version":"1.0.0","message":{"indexed":{"date-parts":[[2025,10,12]],"date-time":"2025-10-12T03:59:32Z","timestamp":1760241572086,"version":"build-2065373602"},"reference-count":25,"publisher":"MDPI AG","issue":"6","license":[{"start":{"date-parts":[[2018,5,29]],"date-time":"2018-05-29T00:00:00Z","timestamp":1527552000000},"content-version":"vor","delay-in-days":0,"URL":"https:\/\/creativecommons.org\/licenses\/by\/4.0\/"}],"content-domain":{"domain":[],"crossmark-restriction":false},"short-container-title":["Information"],"abstract":"<jats:p>For a long time, data has been treated as a general problem because it just represents fractions of an event without any relevant purpose. However, the last decade has been just about information and how to get it. Seeking meaning in data and trying to solve scalability problems, many frameworks have been developed to improve data storage and its analysis. As a framework, Hadoop was presented as a powerful tool to deal with large amounts of data. However, it still causes doubts about how to deal with its deployment and if there is any reliable method to compare the performance of distinct Hadoop clusters. This paper presents a methodology based on benchmark analysis to guide the Hadoop cluster deployment. The experiments employed The Apache Hadoop and the Hadoop distributions of Cloudera, Hortonworks, and MapR, analyzing the architectures on local and on clouding\u2014using centralized and geographically distributed servers. The results show the methodology can be dynamically applied on a reliable comparison among different architectures. Additionally, the study suggests that the knowledge acquired can be used to improve the data analysis process by understanding the Hadoop architecture.<\/jats:p>","DOI":"10.3390\/info9060131","type":"journal-article","created":{"date-parts":[[2018,5,29]],"date-time":"2018-05-29T02:58:18Z","timestamp":1527562698000},"page":"131","update-policy":"https:\/\/doi.org\/10.3390\/mdpi_crossmark_policy","source":"Crossref","is-referenced-by-count":2,"title":["Hadoop Cluster Deployment: A Methodological Approach"],"prefix":"10.3390","volume":"9","author":[{"given":"Ronaldo Celso Messias","family":"Correia","sequence":"first","affiliation":[{"name":"Departamento de Matematica e Computa\u00e7\u00e3o, Sao Paulo State University\u2014UNESP, Presidente Prudente 19060-900, Brazil"}],"role":[{"role":"author","vocabulary":"crossref"}]},{"ORCID":"https:\/\/orcid.org\/0000-0001-8437-4349","authenticated-orcid":false,"given":"Gabriel","family":"Spadon","sequence":"additional","affiliation":[{"name":"Instituto de Ciencias Matematicas e Computacao, University of Sao Paulo\u2014USP, Sao Carlos 13566-590, Brazil"}],"role":[{"role":"author","vocabulary":"crossref"}]},{"given":"Pedro Henrique","family":"De Andrade Gomes","sequence":"additional","affiliation":[{"name":"Departamento de Matematica e Computa\u00e7\u00e3o, Sao Paulo State University\u2014UNESP, Presidente Prudente 19060-900, Brazil"}],"role":[{"role":"author","vocabulary":"crossref"}]},{"ORCID":"https:\/\/orcid.org\/0000-0002-9493-145X","authenticated-orcid":false,"given":"Danilo Medeiros","family":"Eler","sequence":"additional","affiliation":[{"name":"Departamento de Matematica e Computa\u00e7\u00e3o, Sao Paulo State University\u2014UNESP, Presidente Prudente 19060-900, Brazil"}],"role":[{"role":"author","vocabulary":"crossref"}]},{"ORCID":"https:\/\/orcid.org\/0000-0003-1248-528X","authenticated-orcid":false,"given":"Rog\u00e9rio Eduardo","family":"Garcia","sequence":"additional","affiliation":[{"name":"Departamento de Matematica e Computa\u00e7\u00e3o, Sao Paulo State University\u2014UNESP, Presidente Prudente 19060-900, Brazil"}],"role":[{"role":"author","vocabulary":"crossref"}]},{"given":"Celso","family":"Olivete Junior","sequence":"additional","affiliation":[{"name":"Departamento de Matematica e Computa\u00e7\u00e3o, Sao Paulo State University\u2014UNESP, Presidente Prudente 19060-900, Brazil"}],"role":[{"role":"author","vocabulary":"crossref"}]}],"member":"1968","published-online":{"date-parts":[[2018,5,29]]},"reference":[{"key":"ref_1","doi-asserted-by":"crossref","unstructured":"Sagiroglu, S., and Sinanc, D. (2013, January 20\u201324). Big data: A review. Proceedings of the 2013 International Conference on Collaboration Technologies and Systems (CTS), San Diego, CA, USA.","DOI":"10.1109\/CTS.2013.6567202"},{"key":"ref_2","first-page":"21","article-title":"Big data, analytics and the path from insights to value","volume":"52","author":"LaValle","year":"2013","journal-title":"MIT Sloan Manag. Rev."},{"key":"ref_3","doi-asserted-by":"crossref","unstructured":"Daniel, B.K. (2018). Reimaging Research Methodology as Data Science. Big Data Cogn. Comput., 2.","DOI":"10.3390\/bdcc2010004"},{"key":"ref_4","unstructured":"M\u00fcller, M.U., Rosenbach, M., and Schulz, T. (2013). Living by the Numbers: Big Data Knows What Your Future Holds, Spiegel Online."},{"key":"ref_5","doi-asserted-by":"crossref","unstructured":"Katal, A., Wazid, M., and Goudar, R. (2013, January 8\u201310). Big data: Issues, challenges, tools and Good practices. Proceedings of the 2013 Sixth International Conference on Contemporary Computing (IC3), Noida, India.","DOI":"10.1109\/IC3.2013.6612229"},{"key":"ref_6","doi-asserted-by":"crossref","unstructured":"Shvachko, K., Kuang, H., Radia, S., and Chansler, R. (2010, January 3\u20137). The Hadoop Distributed File System. Proceedings of the 2010 IEEE 26th Symposium on Mass Storage Systems and Technologies (MSST), Incline Village, NV, USA.","DOI":"10.1109\/MSST.2010.5496972"},{"key":"ref_7","doi-asserted-by":"crossref","unstructured":"Appuswamy, R., Gkantsidis, C., Narayanan, D., Hodson, O., and Rowstron, A. (2013, January 1\u20133). Scale-up vs. Scale-out for Hadoop: Time to rethink?. Proceedings of the 4th Annual Symposium on Cloud Computing, Santa Clara, CA, USA.","DOI":"10.1145\/2523616.2523629"},{"key":"ref_8","unstructured":"Souza, G.S., Correia, R.C.M., Garcia, R.E., and Olivete, C. (2015, January 17\u201320). Simulation and analysis applied on virtualization to build Hadoop clusters. Proceedings of the 2015 10th Iberian Conference on Information Systems and Technologies (CISTI), Aveiro, Portugal."},{"key":"ref_9","doi-asserted-by":"crossref","first-page":"10","DOI":"10.1147\/JRD.2013.2240732","article-title":"Big Data text-oriented benchmark creation for Hadoop","volume":"57","author":"Gattiker","year":"2013","journal-title":"IBM J. Res. Dev."},{"key":"ref_10","doi-asserted-by":"crossref","unstructured":"Saletore, V.A., Krishnan, K., Viswanathan, V., and Tolentino, M.E. (2013, January 22\u201324). HcBench: Methodology, development, and characterization of a customer usage representative big data\/Hadoop benchmark. Proceedings of the 2013 IEEE International Symposium on Workload Characterization (IISWC), Portland, OR, USA.","DOI":"10.1109\/IISWC.2013.6704672"},{"key":"ref_11","doi-asserted-by":"crossref","unstructured":"Huang, S., Huang, J., Dai, J., Xie, T., and Huang, B. (2011). The HiBench benchmark suite: Characterization of the MapReduce-based data analysis. New Frontiers in Information and Software as Services, Springer.","DOI":"10.1007\/978-3-642-19294-4_9"},{"key":"ref_12","unstructured":"Huang, S., Huang, J., Liu, Y., Yi, L., and Dai, J. (2010, January 5). Hibench: A representative and comprehensive hadoop benchmark suite. Proceedings of the ICDE Workshops, Long Beach, CA, USA."},{"key":"ref_13","unstructured":"Correia, R.C.M., Souza, G.S., Eler, D.M., Olivete, C., and Garcia, R.E. (2018, January 16\u201318). Teaching Distributed Systems Using Hadoop. Proceedings of the 15th International Conference on Information Technology\u2014New Generations, Las Vegas, NV, USA."},{"key":"ref_14","unstructured":"Henrique, G.J., and Kaster, D.D.S. (2013). Consultas por Similaridade em Big Data: Alternativas e Solu\u00e7\u00f5es, Faculdade Estadual de Londrina. Technical Report."},{"key":"ref_15","unstructured":"Rocha, F.D.G., and Senger, H.S. (2013). An\u00e1lise de Escalabilidade de Aplica\u00e7\u00f5es Hadoop\/MapReduce por meio de Simula\u00e7\u00e3o. [Master\u2019s Thesis, Universidade Federal de S\u00e3o Carlos]."},{"key":"ref_16","doi-asserted-by":"crossref","unstructured":"Khalid, A., Afzal, H., and Aftab, S. (2014, January 16\u201319). Balancing scalability, performance and fault tolerance for structured data (BSPF). Proceedings of the 16th International Conference on Advanced Communication Technology, Pyeongchang, Korea.","DOI":"10.1109\/ICACT.2014.6779058"},{"key":"ref_17","unstructured":"Coulouris, G.F., Dollimore, J., and Kindberg, T. (2005). Distributed Systems: Concepts and Design, Pearson Education."},{"key":"ref_18","unstructured":"Fox, A., and Brewer, E. (1999, January 30). Harvest, yield, and scalable tolerant systems. Proceedings of the Seventh Workshop on Hot Topics in Operating Systems, Rio Rico, AZ, USA."},{"key":"ref_19","doi-asserted-by":"crossref","unstructured":"Brewer, E.A. (2000, January 16\u201319). Towards robust distributed systems. Proceedings of the ACM Symposium on Principles of Distributed Computing, Portland, OR, USA.","DOI":"10.1145\/343477.343502"},{"key":"ref_20","doi-asserted-by":"crossref","first-page":"48","DOI":"10.1145\/1394127.1394128","article-title":"Base: An acid alternative","volume":"6","author":"Pritchett","year":"2008","journal-title":"Queue"},{"key":"ref_21","doi-asserted-by":"crossref","first-page":"23","DOI":"10.1109\/MC.2012.37","article-title":"CAP twelve years later: How the \u201crules\u201d have changed","volume":"45","author":"Brewer","year":"2012","journal-title":"Computer"},{"key":"ref_22","unstructured":"Hadoop, A. (2018, March 07). Apache Hadoop. Available online: http:\/\/hadoop.apache.org."},{"key":"ref_23","unstructured":"Goldman, A., Kon, F., Junior, F.P., Polato, I., and de F\u00e1tima Pereira, R. (2012, January 16\u201319). Apache Hadoop: Conceitos te\u00f3ricos e pr\u00e1ticos, evolu\u00e7ao e novas possibilidades. Proceedings of the XXXI Jornadas de Atualiza\u00e7oes em Informatica, Curitiba, Brazil."},{"key":"ref_24","doi-asserted-by":"crossref","unstructured":"Quick, L., Wilkinson, P., and Hardcastle, D. (2012, January 26\u201329). Using Pregel-like Large Scale Graph Processing Frameworks for Social Network Analysis. Proceedings of the 2012 IEEE\/ACM International Conference on Advances in Social Networks Analysis and Mining, Istanbul, Turkey.","DOI":"10.1109\/ASONAM.2012.254"},{"key":"ref_25","first-page":"21","article-title":"The hadoop distributed file system: Architecture and design","volume":"11","author":"Borthakur","year":"2007","journal-title":"Hadoop Proj. Webs."}],"container-title":["Information"],"original-title":[],"language":"en","link":[{"URL":"https:\/\/www.mdpi.com\/2078-2489\/9\/6\/131\/pdf","content-type":"unspecified","content-version":"vor","intended-application":"similarity-checking"}],"deposited":{"date-parts":[[2025,10,11]],"date-time":"2025-10-11T15:06:18Z","timestamp":1760195178000},"score":1,"resource":{"primary":{"URL":"https:\/\/www.mdpi.com\/2078-2489\/9\/6\/131"}},"subtitle":[],"short-title":[],"issued":{"date-parts":[[2018,5,29]]},"references-count":25,"journal-issue":{"issue":"6","published-online":{"date-parts":[[2018,6]]}},"alternative-id":["info9060131"],"URL":"https:\/\/doi.org\/10.3390\/info9060131","relation":{},"ISSN":["2078-2489"],"issn-type":[{"type":"electronic","value":"2078-2489"}],"subject":[],"published":{"date-parts":[[2018,5,29]]}}}