{"status":"ok","message-type":"work","message-version":"1.0.0","message":{"indexed":{"date-parts":[[2025,5,7]],"date-time":"2025-05-07T05:02:45Z","timestamp":1746594165858,"version":"3.37.3"},"reference-count":33,"publisher":"Springer Science and Business Media LLC","issue":"1","license":[{"start":{"date-parts":[[2021,12,1]],"date-time":"2021-12-01T00:00:00Z","timestamp":1638316800000},"content-version":"tdm","delay-in-days":0,"URL":"https:\/\/creativecommons.org\/licenses\/by\/4.0"},{"start":{"date-parts":[[2021,12,7]],"date-time":"2021-12-07T00:00:00Z","timestamp":1638835200000},"content-version":"vor","delay-in-days":6,"URL":"https:\/\/creativecommons.org\/licenses\/by\/4.0"}],"funder":[{"name":"psc-cuny","award":["62177-00 50"],"award-info":[{"award-number":["62177-00 50"]}]}],"content-domain":{"domain":["link.springer.com"],"crossmark-restriction":false},"short-container-title":["J Big Data"],"published-print":{"date-parts":[[2021,12]]},"abstract":"<jats:title>Abstract<\/jats:title><jats:p>Whole Slide Image (WSI) datasets are giga-pixel resolution, unstructured histopathology datasets that consist of extremely big files (each can be as large as multiple GBs in compressed format). These datasets have utility in a wide range of diagnostic and investigative pathology applications. However, the datasets present unique challenges: The size of the files, proprietary data formats, and lack of efficient parallel data access libraries limit the scalability of these applications. Commercial clouds provide dynamic, cost-effective, scalable infrastructure to process these datasets; however, we lack the tools and algorithms that will transfer\/transform them onto the cloud seamlessly, providing faster speeds and scalable formats. In this study, we present novel algorithms that transfer these datasets onto the cloud while at the same time transforming them into symmetric scalable formats. 
Our algorithms use intelligent file size distribution, and pipelining transfer and transformation tasks without introducing extra overhead to the underlying system. The algorithms, tested in the Amazon Web Services (AWS) cloud, outperform the widely used transfer tools and algorithms, and also outperform our previous work. The data access to the transformed datasets provides better performance compared to the related work. The transformed symmetric datasets are fed into three different analytics applications: a distributed implementation of a content-based image retrieval (CBIR) application for prostate carcinoma datasets, a deep convolutional neural network application for classification of breast cancer datasets, and to show that the algorithms can work with any spatial dataset, a Canny Edge Detection application on satellite image datasets. Although different in nature, all of the applications can easily work with our new symmetric data format and performance results show near-linear speed-ups as the number of processors increases.<\/jats:p>","DOI":"10.1186\/s40537-021-00546-3","type":"journal-article","created":{"date-parts":[[2021,12,7]],"date-time":"2021-12-07T09:02:52Z","timestamp":1638867772000},"update-policy":"https:\/\/doi.org\/10.1007\/springer_crossmark_policy","source":"Crossref","is-referenced-by-count":1,"title":["Performance-efficient distributed transfer and transformation of big spatial histopathology datasets in the cloud"],"prefix":"10.1186","volume":"8","author":[{"ORCID":"https:\/\/orcid.org\/0000-0001-9485-3714","authenticated-orcid":false,"given":"Esma","family":"Yildirim","sequence":"first","affiliation":[],"role":[{"role":"author","vocabulary":"crossref"}]}],"member":"297","published-online":{"date-parts":[[2021,12,7]]},"reference":[{"issue":"4","key":"546_CR1","doi-asserted-by":"publisher","first-page":"1049","DOI":"10.1109\/JBHI.2016.2580145","volume":"21","author":"E Yildirim","year":"2017","unstructured":"Yildirim E, Foran DJ. 
Parallel versus distributed data access for gigapixel-resolution histology images: challenges and opportunities. IEEE J Biomed Health Inform. 2017;21(4):1049\u201357.","journal-title":"IEEE J Biomed Health Inform"},{"issue":"1","key":"546_CR2","doi-asserted-by":"publisher","first-page":"388","DOI":"10.1016\/j.cmpb.2012.03.007","volume":"108","author":"G Bueno","year":"2012","unstructured":"Bueno G, Gonzalez R, D\u00e9niz O, Garc\u00eda-Rojo M, Gonzalez-Garcia J, Fern\u00e1ndez-Carrobles M, et al. A parallel solution for high resolution histological image analysis. Comput Methods Programs Biomed. 2012;108(1):388\u2013401.","journal-title":"Comput Methods Programs Biomed"},{"key":"546_CR3","first-page":"23","volume":"7","author":"N Farahani","year":"2015","unstructured":"Farahani N, Parwani AV, Pantanowitz L. Whole slide imaging in pathology: advantages, limitations, and emerging perspectives. Pathol Lab Med Int. 2015;7:23\u201333.","journal-title":"Pathol Lab Med Int"},{"key":"546_CR4","unstructured":"Openslide; 2021. Available from: https:\/\/openslide.org."},{"key":"546_CR5","doi-asserted-by":"crossref","unstructured":"Goode A, Gilbert B, Harkes J, Jukic D, Satyanarayanan M. OpenSlide: a vendor-neutral software foundation for digital pathology. J Pathol Inf. 2013;4.","DOI":"10.4103\/2153-3539.119005"},{"key":"546_CR6","doi-asserted-by":"crossref","unstructured":"Moore J, Linkert M, Blackburn C, Carroll M, Ferguson RK, Flynn H, et\u00a0al. OMERO and Bio-Formats 5: flexible access to large bioimaging datasets at scale. In: Medical Imaging 2015: Image Processing. vol. 9413. International Society for Optics and Photonics; 2015. p. 941307.","DOI":"10.1117\/12.2086370"},{"issue":"1\u201313","key":"546_CR7","first-page":"2","volume":"53","author":"D Borthakur","year":"2008","unstructured":"Borthakur D, et al. HDFS architecture guide. Hadoop Apache Project. 
2008;53(1\u201313):2.","journal-title":"Hadoop Apache Project"},{"key":"546_CR8","unstructured":"Amazon Simple Storage System; 2021. Available from: https:\/\/aws.amazon.com\/s3\/."},{"key":"546_CR9","unstructured":"Braam P. The Lustre storage architecture. arXiv preprint arXiv:190301955. 2019."},{"key":"546_CR10","unstructured":"Schmuck FB, Haskin RL. GPFS: a shared-disk file system for large computing clusters. In: FAST. vol.\u00a02; 2002."},{"key":"546_CR11","unstructured":"OpenCV library; 2021. Available from: https:\/\/opencv.org."},{"key":"546_CR12","doi-asserted-by":"crossref","unstructured":"Teodoro G, Kurc T, Kong J, Cooper L, Saltz J. Comparative performance analysis of Intel (R) Xeon Phi (TM), GPU, and CPU: a case study from microscopy image analysis. In: Parallel and Distributed Processing Symposium, 2014 IEEE 28th International. IEEE; 2014. p. 1063\u20131072.","DOI":"10.1109\/IPDPS.2014.111"},{"key":"546_CR13","doi-asserted-by":"crossref","unstructured":"Zerbe N, Hufnagl P, Schl\u00fcns K. Distributed computing in image analysis using open source frameworks and application to image sharpness assessment of histological whole slide images. In: Diagnostic pathology. vol.\u00a06. BioMed Central; 2011. p. S16.","DOI":"10.1186\/1746-1596-6-S1-S16"},{"key":"546_CR14","doi-asserted-by":"crossref","unstructured":"Aji A, Wang F, Vo H, Lee R, Liu Q, Zhang X, et\u00a0al. Hadoop-GIS: A high performance spatial data warehousing system over MapReduce. In: Proceedings of the VLDB Endowment International Conference on Very Large Data Bases. vol.\u00a06. NIH Public Access; 2013.","DOI":"10.14778\/2536222.2536227"},{"key":"546_CR15","unstructured":"Hadoop; 2021. Available from: https:\/\/hadoop.apache.org."},{"issue":"1","key":"546_CR16","doi-asserted-by":"publisher","first-page":"182","DOI":"10.1109\/TCC.2015.2457423","volume":"6","author":"R Chard","year":"2015","unstructured":"Chard R, Madduri R, Karonis NT, Chard K, Duffin KL, Ordo\u00f1ez CE, et al. 
Scalable pCT image reconstruction delivered as a cloud service. IEEE Trans Cloud Comput. 2015;6(1):182\u201395.","journal-title":"IEEE Trans Cloud Comput"},{"key":"546_CR17","doi-asserted-by":"crossref","unstructured":"Parsonson L, Grimm S, Bajwa A, Bourn L, Bai L. A cloud computing medical image analysis and collaboration platform. In: International Conference on Cloud Computing and Services Science. Springer; 2011. p. 207\u2013224.","DOI":"10.1007\/978-1-4614-2326-3_11"},{"key":"546_CR18","doi-asserted-by":"crossref","unstructured":"Kagadis GC, Kloukinas C, Moore K, Philbin J, Papadimitroulas P, Alexakos C, et\u00a0al. Cloud computing in medical imaging. Med Phys. 2013;40(7).","DOI":"10.1118\/1.4811272"},{"issue":"13","key":"546_CR19","doi-asserted-by":"publisher","first-page":"2266","DOI":"10.1002\/cpe.3274","volume":"26","author":"RK Madduri","year":"2014","unstructured":"Madduri RK, Sulakhe D, Lacinski L, Liu B, Rodriguez A, Chard K, et al. Experiences building Globus Genomics: a next-generation sequencing analysis service using Galaxy, Globus, and Amazon Web Services. Concurr Comput: Pract Exp. 2014;26(13):2266\u201379.","journal-title":"Concurr Comput: Pract Exp"},{"issue":"3","key":"546_CR20","doi-asserted-by":"publisher","first-page":"969","DOI":"10.1109\/JBHI.2018.2885214","volume":"23","author":"F Milletari","year":"2018","unstructured":"Milletari F, Frei J, Aboulatta M, Vivar G, Ahmadi SA. Cloud deployment of high-resolution medical image analysis with TOMAAT. IEEE J Biomed Health Inform. 2018;23(3):969\u201377.","journal-title":"IEEE J Biomed Health Inform"},{"issue":"1","key":"546_CR21","doi-asserted-by":"publisher","first-page":"367","DOI":"10.1109\/JBHI.2014.2361633","volume":"20","author":"TM Godinho","year":"2014","unstructured":"Godinho TM, Viana-Ferreira C, Silva LAB, Costa C. A routing mechanism for cloud outsourcing of medical imaging repositories. IEEE J Biomed Health Inform. 
2014;20(1):367\u201375.","journal-title":"IEEE J Biomed Health Inform"},{"issue":"1","key":"546_CR22","doi-asserted-by":"publisher","first-page":"238","DOI":"10.1109\/JBHI.2015.2496323","volume":"21","author":"BS Harvey","year":"2015","unstructured":"Harvey BS, Ji SY. Cloud-scale genomic signals processing for robust large-scale cancer genomic microarray data analysis. IEEE J Biomed Health Inform. 2015;21(1):238\u201345.","journal-title":"IEEE J Biomed Health Inform"},{"key":"546_CR23","doi-asserted-by":"crossref","unstructured":"Jackson KR, Ramakrishnan L, Muriki K, Canon S, Cholia S, Shalf J, et\u00a0al. Performance analysis of high performance computing applications on the amazon web services cloud. In: 2010 IEEE second international conference on cloud computing technology and science. IEEE; 2010. p. 159\u2013168.","DOI":"10.1109\/CloudCom.2010.69"},{"key":"546_CR24","unstructured":"Bremer E, Almeida J, Saltz J. Representing whole slide cancer image features with Hilbert curves. arXiv preprint arXiv:200506469. 2020."},{"issue":"1","key":"546_CR25","doi-asserted-by":"publisher","first-page":"287","DOI":"10.1186\/1471-2105-15-287","volume":"15","author":"X Qi","year":"2014","unstructured":"Qi X, Wang D, Rodero I, Diaz-Montes J, Gensure RH, Xing F, et al. Content-based histopathology image retrieval using CometCloud. BMC Bioinformatics. 2014;15(1):287.","journal-title":"BMC Bioinformatics"},{"key":"546_CR26","unstructured":"Spark; 2021. Available from: https:\/\/spark.apache.org."},{"key":"546_CR27","unstructured":"Image Processing and Analysis in Java; 2021. Available from: https:\/\/imagej.nih.gov\/ij\/index.html."},{"key":"546_CR28","unstructured":"AWS Elastic Load Balancing; 2021. Available from: https:\/\/aws.amazon.com\/ru\/elasticloadbalancing\/application-load-balancer\/."},{"key":"546_CR29","unstructured":"Breast Cancer Classification with Keras and Deep Learning; 2021. 
Available from: https:\/\/www.pyimagesearch.com\/2019\/02\/18\/breast-cancer-classification-with-keras-and-deep-learning\/."},{"key":"546_CR30","unstructured":"Breast Histopathology Images; 2021. Available from: https:\/\/www.kaggle.com\/paultimothymooney\/breast-histopathology-images."},{"key":"546_CR31","unstructured":"Natural Earth Dataset; 2021. Available from: https:\/\/www.naturalearthdata.com."},{"issue":"1","key":"546_CR32","doi-asserted-by":"publisher","first-page":"64","DOI":"10.1109\/MSP.2005.1407716","volume":"22","author":"A Koschan","year":"2005","unstructured":"Koschan A, Abidi M. Detection and classification of edges in color images. IEEE Signal Process Mag. 2005;22(1):64\u201373.","journal-title":"IEEE Signal Process Mag"},{"key":"546_CR33","doi-asserted-by":"publisher","DOI":"10.1007\/978-1-4471-6684-9","volume-title":"Digital image processing: an algorithmic introduction using Java","author":"W Burger","year":"2016","unstructured":"Burger W, Burge MJ. Digital image processing: an algorithmic introduction using Java. 
Berlin: Springer; 2016."}],"container-title":["Journal of Big Data"],"original-title":[],"language":"en","link":[{"URL":"https:\/\/link.springer.com\/content\/pdf\/10.1186\/s40537-021-00546-3.pdf","content-type":"application\/pdf","content-version":"vor","intended-application":"text-mining"},{"URL":"https:\/\/link.springer.com\/article\/10.1186\/s40537-021-00546-3\/fulltext.html","content-type":"text\/html","content-version":"vor","intended-application":"text-mining"},{"URL":"https:\/\/link.springer.com\/content\/pdf\/10.1186\/s40537-021-00546-3.pdf","content-type":"application\/pdf","content-version":"vor","intended-application":"similarity-checking"}],"deposited":{"date-parts":[[2021,12,7]],"date-time":"2021-12-07T09:23:39Z","timestamp":1638869019000},"score":1,"resource":{"primary":{"URL":"https:\/\/journalofbigdata.springeropen.com\/articles\/10.1186\/s40537-021-00546-3"}},"subtitle":[],"short-title":[],"issued":{"date-parts":[[2021,12]]},"references-count":33,"journal-issue":{"issue":"1","published-print":{"date-parts":[[2021,12]]}},"alternative-id":["546"],"URL":"https:\/\/doi.org\/10.1186\/s40537-021-00546-3","relation":{},"ISSN":["2196-1115"],"issn-type":[{"type":"electronic","value":"2196-1115"}],"subject":[],"published":{"date-parts":[[2021,12]]},"assertion":[{"value":"13 August 2021","order":1,"name":"received","label":"Received","group":{"name":"ArticleHistory","label":"Article History"}},{"value":"23 November 2021","order":2,"name":"accepted","label":"Accepted","group":{"name":"ArticleHistory","label":"Article History"}},{"value":"7 December 2021","order":3,"name":"first_online","label":"First Online","group":{"name":"ArticleHistory","label":"Article History"}},{"order":1,"name":"Ethics","group":{"name":"EthicsHeading","label":"Declarations"}},{"value":"The author declares that she has no competing interests.","order":2,"name":"Ethics","group":{"name":"EthicsHeading","label":"Competing interests"}}],"article-number":"155"}}