{"status":"ok","message-type":"work","message-version":"1.0.0","message":{"indexed":{"date-parts":[[2023,1,11]],"date-time":"2023-01-11T22:31:24Z","timestamp":1673476284395},"reference-count":20,"publisher":"Association for Computing Machinery (ACM)","issue":"1","content-domain":{"domain":[],"crossmark-restriction":false},"short-container-title":["Proc. VLDB Endow."],"published-print":{"date-parts":[[2008,8]]},"abstract":"<jats:p>\n            Sorting hierarchical data in external memory is necessary for a wide variety of applications including archiving scientific data and dealing with large XML datasets. The topic of sorting hierarchical data, however, has received little attention from the research community so far. In this paper we focus on sorting arbitrary hierarchical data that far exceed the size of physical memory. We propose HErMeS, an algorithm that generalizes the most widely-used techniques for sorting flat data in external memory. HErMeS efficiently exploits the hierarchical structure to minimize the number of disk accesses and optimize the use of available memory. We extract the theoretical bounds of the algorithm with respect to the structure of the hierarchical dataset. We then show how the algorithm can be used to support efficient archiving. We have conducted an experimental study using several workloads and comparing HErMeS to the state-of-the-art approaches. Our results show that our algorithm (\n            <jats:italic>a<\/jats:italic>\n            ) meets its theoretical expectations, (\n            <jats:italic>b<\/jats:italic>\n            ) allows for scalable database archiving, and (\n            <jats:italic>c<\/jats:italic>\n            ) outperforms the competition by a significant factor. These results, we believe, prove our technique to be a viable and scalable solution to the problem of sorting hierarchical data in external memory.\n          <\/jats:p>","DOI":"10.14778\/1453856.1453983","type":"journal-article","created":{"date-parts":[[2014,6,24]],"date-time":"2014-06-24T12:17:57Z","timestamp":1403612277000},"page":"1205-1216","source":"Crossref","is-referenced-by-count":8,"title":["Sorting hierarchical data in external memory for archiving"],"prefix":"10.14778","volume":"1","author":[{"given":"Ioannis","family":"Koltsidas","sequence":"first","affiliation":[{"name":"University of Edinburgh"}],"role":[{"role":"author","vocabulary":"crossref"}]},{"given":"Heiko","family":"M\u00fcller","sequence":"additional","affiliation":[{"name":"University of Edinburgh"}],"role":[{"role":"author","vocabulary":"crossref"}]},{"given":"Stratis D.","family":"Viglas","sequence":"additional","affiliation":[{"name":"University of Edinburgh"}],"role":[{"role":"author","vocabulary":"crossref"}]}],"member":"320","published-online":{"date-parts":[[2008,8]]},"reference":[{"key":"e_1_2_1_1_1","first-page":"2002","author":"Avila-Campillo I.","year":"2002","unstructured":"I. Avila-Campillo XMLTK: An XML Toolkit for Scalable XML Stream Processing. In PLANX 2002 , 2002 . I. Avila-Campillo et al. XMLTK: An XML Toolkit for Scalable XML Stream Processing. In PLANX 2002, 2002.","journal-title":"In PLANX"},{"key":"e_1_2_1_2_1","volume-title":"January","author":"Berglund A.","year":"2007","unstructured":"A. Berglund XML Path Language (XPath) 2.0 , January 2007 . A. Berglund et al. XML Path Language (XPath) 2.0, January 2007."},{"key":"e_1_2_1_3_1","volume-title":"January","author":"Boag S.","year":"2007","unstructured":"S. Boag XQuery 1.0: An XML Query Language , January 2007 . S. Boag et al. XQuery 1.0: An XML Query Language, January 2007."},{"key":"e_1_2_1_4_1","doi-asserted-by":"publisher","DOI":"10.1145\/371920.371984"},{"key":"e_1_2_1_5_1","doi-asserted-by":"publisher","DOI":"10.1145\/974750.974752"},{"key":"e_1_2_1_6_1","doi-asserted-by":"publisher","DOI":"10.1145\/235968.233366"},{"key":"e_1_2_1_7_1","doi-asserted-by":"publisher","DOI":"10.1109\/ICDE.2002.994696"},{"key":"e_1_2_1_8_1","volume-title":"The Universal Protein Resource (UniProt). Nucleic Acids Res., 35(Database Issue): D193--197","author":"Consortium","year":"2007","unstructured":"Consortium . The Universal Protein Resource (UniProt). Nucleic Acids Res., 35(Database Issue): D193--197 , 2007 . Consortium. The Universal Protein Resource (UniProt). Nucleic Acids Res., 35(Database Issue): D193--197, 2007."},{"key":"e_1_2_1_9_1","volume-title":"Introduction to Algorithms","author":"Cormen T. H.","year":"2001","unstructured":"T. H. Cormen Introduction to Algorithms ( Second Edition). MIT Press , 2001 . T. H. Cormen et al. Introduction to Algorithms (Second Edition). MIT Press, 2001."},{"key":"e_1_2_1_10_1","volume-title":"DAGS Symposium on Parallel Computation","author":"Vengroff D. E.","year":"1994","unstructured":"D. E. Vengroff . A Transparent Parallel I\/O Environment . In DAGS Symposium on Parallel Computation , 1994 . D. E. Vengroff. A Transparent Parallel I\/O Environment. In DAGS Symposium on Parallel Computation, 1994."},{"key":"e_1_2_1_11_1","volume-title":"EMBL Nucleotide Sequence Database","author":"European Bioinformatics Institute","year":"2007","unstructured":"European Bioinformatics Institute . EMBL Nucleotide Sequence Database , 2007 . European Bioinformatics Institute. EMBL Nucleotide Sequence Database, 2007."},{"key":"e_1_2_1_12_1","volume-title":"Swiss-Prot Protein Knowledgebase","author":"European Bioinformatics Institute","year":"2007","unstructured":"European Bioinformatics Institute . Swiss-Prot Protein Knowledgebase , 2007 . European Bioinformatics Institute. Swiss-Prot Protein Knowledgebase, 2007."},{"key":"e_1_2_1_13_1","doi-asserted-by":"publisher","DOI":"10.1145\/152610.152611"},{"key":"e_1_2_1_14_1","volume-title":"The Art of Computer Programming, Volume 3: Sorting and Searching","author":"Knuth D.","year":"1998","unstructured":"D. Knuth . The Art of Computer Programming, Volume 3: Sorting and Searching , 2 nd edition. Addison-Wesley , 1998 . D. Knuth. The Art of Computer Programming, Volume 3: Sorting and Searching, 2nd edition. Addison-Wesley, 1998.","edition":"2"},{"key":"e_1_2_1_15_1","doi-asserted-by":"publisher","DOI":"10.1145\/1376616.1376758"},{"key":"e_1_2_1_16_1","doi-asserted-by":"publisher","DOI":"10.5555\/977401.978071"},{"key":"e_1_2_1_17_1","volume-title":"Aggregation and Accumulation of XML Data","author":"Tufte K.","year":"2001","unstructured":"K. Tufte and D. Maier . Aggregation and Accumulation of XML Data . IEEE Data Eng. Bull ., 24(2), 2001 . K. Tufte and D. Maier. Aggregation and Accumulation of XML Data. IEEE Data Eng. Bull., 24(2), 2001."},{"key":"e_1_2_1_18_1","doi-asserted-by":"publisher","DOI":"10.1109\/ICDE.2003.1260818"},{"key":"e_1_2_1_19_1","volume-title":"ER","year":"2004","unstructured":"Wanxia Wei and Mengchi Liu and Shijun Li. Merging of XML Documents . In ER , 2004 . Wanxia Wei and Mengchi Liu and Shijun Li. Merging of XML Documents. In ER, 2004."},{"key":"e_1_2_1_20_1","doi-asserted-by":"publisher","DOI":"10.1109\/69.494169"}],"container-title":["Proceedings of the VLDB Endowment"],"original-title":[],"language":"en","link":[{"URL":"https:\/\/dl.acm.org\/doi\/pdf\/10.14778\/1453856.1453983","content-type":"unspecified","content-version":"vor","intended-application":"similarity-checking"}],"deposited":{"date-parts":[[2022,12,28]],"date-time":"2022-12-28T11:08:35Z","timestamp":1672225715000},"score":1,"resource":{"primary":{"URL":"https:\/\/dl.acm.org\/doi\/10.14778\/1453856.1453983"}},"subtitle":[],"short-title":[],"issued":{"date-parts":[[2008,8]]},"references-count":20,"journal-issue":{"issue":"1","published-print":{"date-parts":[[2008,8]]}},"alternative-id":["10.14778\/1453856.1453983"],"URL":"https:\/\/doi.org\/10.14778\/1453856.1453983","relation":{},"ISSN":["2150-8097"],"issn-type":[{"value":"2150-8097","type":"print"}],"subject":[],"published":{"date-parts":[[2008,8]]}}}