{"status":"ok","message-type":"work","message-version":"1.0.0","message":{"indexed":{"date-parts":[[2026,3,22]],"date-time":"2026-03-22T22:42:21Z","timestamp":1774219341075,"version":"3.50.1"},"reference-count":51,"publisher":"Association for Computing Machinery (ACM)","issue":"8","content-domain":{"domain":[],"crossmark-restriction":false},"short-container-title":["Proc. VLDB Endow."],"published-print":{"date-parts":[[2022,4]]},"abstract":"<jats:p>Random walk is widely used in many graph analysis tasks, especially the first-order random walk. However, as a simplification of real-world problems, the first-order random walk is poor at modeling higher-order structures in the data. Recently, second-order random walk-based applications (e.g., Node2vec, Second-order PageRank) have become attractive. Due to the complexity of the second-order random walk models and memory limitations, it is not scalable to run second-order random walk-based applications on a single machine. Existing disk-based graph systems are only friendly to the first-order random walk models and suffer from expensive disk I\/Os when executing the second-order random walks. This paper introduces an I\/O-efficient disk-based graph system for the scalable second-order random walk of large graphs, called GraSorw. First, to eliminate massive light vertex I\/Os, we develop a bi-block execution engine that converts random I\/Os into sequential I\/Os by applying a new triangular bi-block scheduling strategy, the bucket-based walk management, and the skewed walk storage. Second, to improve the I\/O utilization, we design a learning-based block loading model to leverage the advantages of the full-load and on-demand load methods. Finally, we conducted extensive experiments on six large real datasets as well as several synthetic datasets.. The empirical results demonstrate that the end-to-end time cost of popular tasks in GraSorw is reduced by more than one order of magnitude compared to the existing disk-based graph systems.<\/jats:p>","DOI":"10.14778\/3529337.3529346","type":"journal-article","created":{"date-parts":[[2022,6,22]],"date-time":"2022-06-22T22:23:05Z","timestamp":1655936585000},"page":"1619-1631","source":"Crossref","is-referenced-by-count":16,"title":["An I\/O-efficient disk-based graph system for scalable second-order random walk of large graphs"],"prefix":"10.14778","volume":"15","author":[{"given":"Hongzheng","family":"Li","sequence":"first","affiliation":[{"name":"BUPT"}]},{"given":"Yingxia","family":"Shao","sequence":"additional","affiliation":[{"name":"BUPT"}]},{"given":"Junping","family":"Du","sequence":"additional","affiliation":[{"name":"BUPT"}]},{"given":"Bin","family":"Cui","sequence":"additional","affiliation":[{"name":"Peking University (Qingdao), China"}]},{"given":"Lei","family":"Chen","sequence":"additional","affiliation":[{"name":"Hong Kong University of Science and Technology"}]}],"member":"320","published-online":{"date-parts":[[2022,6,22]]},"reference":[{"key":"e_1_2_1_1_1","unstructured":"April 17 2022. Crawlweb. http:\/\/webdatacommons.org\/hyperlinkgraph\/index.html  April 17 2022. Crawlweb. http:\/\/webdatacommons.org\/hyperlinkgraph\/index.html"},{"key":"e_1_2_1_2_1","unstructured":"April 17 2022. Friendster. https:\/\/snap.stanford.edu\/data\/com-Friendster.html  April 17 2022. Friendster. https:\/\/snap.stanford.edu\/data\/com-Friendster.html"},{"key":"e_1_2_1_3_1","unstructured":"April 17 2022. Graph500. https:\/\/graph500.org\/  April 17 2022. Graph500. https:\/\/graph500.org\/"},{"key":"e_1_2_1_4_1","unstructured":"April 17 2022. LiveJournal. https:\/\/snap.stanford.edu\/data\/soc-LiveJournal1.html  April 17 2022. LiveJournal. https:\/\/snap.stanford.edu\/data\/soc-LiveJournal1.html"},{"key":"e_1_2_1_5_1","unstructured":"April 17 2022. Twitter. https:\/\/old.datahub.io\/dataset\/twitter-social-graph-www2010  April 17 2022. Twitter. https:\/\/old.datahub.io\/dataset\/twitter-social-graph-www2010"},{"key":"e_1_2_1_6_1","volume-title":"UK200705","author":"April","year":"2022","unstructured":"April 17 , 2022 . UK200705 . http:\/\/law.di.unimi.it\/webdata\/uk-2007-05\/ April 17, 2022. UK200705. http:\/\/law.di.unimi.it\/webdata\/uk-2007-05\/"},{"key":"e_1_2_1_7_1","doi-asserted-by":"publisher","DOI":"10.5555\/645926.671873"},{"key":"e_1_2_1_8_1","doi-asserted-by":"publisher","DOI":"10.1371\/journal.pone.0246062"},{"key":"e_1_2_1_9_1","doi-asserted-by":"publisher","DOI":"10.1109\/LA-WEB.2012.19"},{"key":"e_1_2_1_10_1","doi-asserted-by":"publisher","DOI":"10.1007\/s12083-016-0457-0"},{"key":"e_1_2_1_11_1","doi-asserted-by":"publisher","DOI":"10.1145\/3097983.3098036"},{"key":"e_1_2_1_12_1","doi-asserted-by":"publisher","DOI":"10.1103\/PhysRevE.80.016105"},{"key":"e_1_2_1_13_1","doi-asserted-by":"publisher","DOI":"10.1080\/15427951.2005.10129104"},{"key":"e_1_2_1_14_1","volume-title":"PowerGraph: Distributed Graph-Parallel Computation on Natural Graphs. In 10th USENIX symposium on operating systems design and implementation. 17--30","author":"Gonzalez Joseph E","year":"2012","unstructured":"Joseph E Gonzalez , Yucheng Low , Haijie Gu , Danny Bickson , and Carlos Guestrin . 2012 . PowerGraph: Distributed Graph-Parallel Computation on Natural Graphs. In 10th USENIX symposium on operating systems design and implementation. 17--30 . Joseph E Gonzalez, Yucheng Low, Haijie Gu, Danny Bickson, and Carlos Guestrin. 2012. PowerGraph: Distributed Graph-Parallel Computation on Natural Graphs. In 10th USENIX symposium on operating systems design and implementation. 17--30."},{"key":"e_1_2_1_15_1","doi-asserted-by":"publisher","DOI":"10.1145\/2939672.2939754"},{"key":"e_1_2_1_16_1","doi-asserted-by":"publisher","DOI":"10.1016\/S1389-1286(99)00016-X"},{"key":"e_1_2_1_17_1","doi-asserted-by":"publisher","DOI":"10.1145\/775047.775126"},{"key":"e_1_2_1_18_1","volume-title":"Proceedings of the 2012 IEEE International Symposium on Workload Characterization. 111--112","author":"Kumar Sanjeev","year":"1998","unstructured":"Sanjeev Kumar . 1998 . The PageRank Citation Ranking: Bringing Order to the Web . In Proceedings of the 2012 IEEE International Symposium on Workload Characterization. 111--112 . Sanjeev Kumar. 1998. The PageRank Citation Ranking: Bringing Order to the Web. In Proceedings of the 2012 IEEE International Symposium on Workload Characterization. 111--112."},{"key":"e_1_2_1_19_1","doi-asserted-by":"publisher","DOI":"10.1145\/2507157.2507173"},{"key":"e_1_2_1_20_1","volume-title":"Proceedings of the 10th USENIX Symposium on Operating Systems Design and Implementation. 31--46","author":"Kyrola Aapo","year":"2012","unstructured":"Aapo Kyrola , Guy Blelloch , and Carlos Guestrin . 2012 . Graphchi: Large-scale graph computation on just a PC . In Proceedings of the 10th USENIX Symposium on Operating Systems Design and Implementation. 31--46 . Aapo Kyrola, Guy Blelloch, and Carlos Guestrin. 2012. Graphchi: Large-scale graph computation on just a PC. In Proceedings of the 10th USENIX Symposium on Operating Systems Design and Implementation. 31--46."},{"key":"e_1_2_1_21_1","volume-title":"Google's PageRank and Beyond","author":"Langville Amy N","unstructured":"Amy N Langville and Carl D Meyer . 2011. Google's PageRank and beyond . In Google's PageRank and Beyond . Princeton university press . Amy N Langville and Carl D Meyer. 2011. Google's PageRank and beyond. In Google's PageRank and Beyond. Princeton university press."},{"key":"e_1_2_1_22_1","volume-title":"arXiv preprint arXiv:2203.16123","author":"Li Hongzheng","year":"2022","unstructured":"Hongzheng Li , Yingxia Shao , Junping Du , Bin Cui , and Lei Chen . 2022. An I\/O- Efficient Disk-based Graph System for Scalable Second-Order Random Walk of Large Graphs . arXiv preprint arXiv:2203.16123 ( 2022 ). Hongzheng Li, Yingxia Shao, Junping Du, Bin Cui, and Lei Chen. 2022. An I\/O-Efficient Disk-based Graph System for Scalable Second-Order Random Walk of Large Graphs. arXiv preprint arXiv:2203.16123 (2022)."},{"key":"e_1_2_1_23_1","volume-title":"Second-Order CoSimRank for Similarity Measures in Social Networks. In 2019 IEEE International Conference on Communications. 1--6.","author":"Liao Xueting","year":"2019","unstructured":"Xueting Liao , Yubao Wu , and Xiaojun Cao . 2019 . Second-Order CoSimRank for Similarity Measures in Social Networks. In 2019 IEEE International Conference on Communications. 1--6. Xueting Liao, Yubao Wu, and Xiaojun Cao. 2019. Second-Order CoSimRank for Similarity Measures in Social Networks. In 2019 IEEE International Conference on Communications. 1--6."},{"key":"e_1_2_1_24_1","volume-title":"Proceedings of the 15th USENIX Conference on File and Storage Technologies. 285--299","author":"Liu Hang","unstructured":"Hang Liu and H. Howie Huang . 2017. Graphene: Fine-grained IO management for graph computing . In Proceedings of the 15th USENIX Conference on File and Storage Technologies. 285--299 . Hang Liu and H. Howie Huang. 2017. Graphene: Fine-grained IO management for graph computing. In Proceedings of the 15th USENIX Conference on File and Storage Technologies. 285--299."},{"key":"e_1_2_1_25_1","doi-asserted-by":"publisher","DOI":"10.1145\/2983323.2983713"},{"key":"e_1_2_1_26_1","volume-title":"Hellerstein","author":"Low Yucheng","year":"2012","unstructured":"Yucheng Low , Joseph Gonzalez , Aapo Kyrola , Danny Bickson , Carlos Guestrin , and Joseph M . Hellerstein . 2012 . Distributed GraphLab: A framework for machine learning and data mining in the cloud. In Proceedings of the VLDB Endowment . 716--727. Yucheng Low, Joseph Gonzalez, Aapo Kyrola, Danny Bickson, Carlos Guestrin, and Joseph M. Hellerstein. 2012. Distributed GraphLab: A framework for machine learning and data mining in the cloud. In Proceedings of the VLDB Endowment. 716--727."},{"key":"e_1_2_1_27_1","volume-title":"ServiceRank: Root Cause Identification of Anomaly in Large-Scale Microservice Architecture","author":"Ma Meng","year":"2021","unstructured":"Meng Ma , Weilan Lin , Disheng Pan , and Ping Wang . 2021. ServiceRank: Root Cause Identification of Anomaly in Large-Scale Microservice Architecture . IEEE Transactions on Dependable and Secure Computing 5971, c ( 2021 ), 1--15. Meng Ma, Weilan Lin, Disheng Pan, and Ping Wang. 2021. ServiceRank: Root Cause Identification of Anomaly in Large-Scale Microservice Architecture. IEEE Transactions on Dependable and Secure Computing 5971, c (2021), 1--15."},{"key":"e_1_2_1_28_1","doi-asserted-by":"publisher","DOI":"10.1145\/3064176.3064191"},{"key":"e_1_2_1_29_1","doi-asserted-by":"publisher","DOI":"10.1145\/322063.322075"},{"key":"e_1_2_1_30_1","doi-asserted-by":"publisher","DOI":"10.1145\/1807167.1807184"},{"key":"e_1_2_1_31_1","doi-asserted-by":"crossref","unstructured":"Kang Ning and Hon Wai Leong. 2006. Towards a Better Solution to the Shortest Common Supersequence Problem: A Post. In Computer and Computational Sciences International Multi-Symposiums on. 84--90.  Kang Ning and Hon Wai Leong. 2006. Towards a Better Solution to the Shortest Common Supersequence Problem: A Post. In Computer and Computational Sciences International Multi-Symposiums on . 84--90.","DOI":"10.1109\/IMSCCS.2006.136"},{"key":"e_1_2_1_32_1","doi-asserted-by":"publisher","DOI":"10.1007\/s41019-021-00155-3"},{"key":"e_1_2_1_33_1","doi-asserted-by":"publisher","DOI":"10.1145\/2623330.2623732"},{"key":"e_1_2_1_34_1","volume-title":"The network analysis of urban streets: A dual approach. Physica A: Statistical Mechanics and its Applications 369, 2","author":"Porta Sergio","year":"2006","unstructured":"Sergio Porta , Paolo Crucitti , and Vito Latora . 2006. The network analysis of urban streets: A dual approach. Physica A: Statistical Mechanics and its Applications 369, 2 ( 2006 ), 853--866. Sergio Porta, Paolo Crucitti, and Vito Latora. 2006. The network analysis of urban streets: A dual approach. Physica A: Statistical Mechanics and its Applications 369, 2 (2006), 853--866."},{"key":"e_1_2_1_35_1","doi-asserted-by":"publisher","DOI":"10.24963\/ijcai.2019\/456"},{"key":"e_1_2_1_36_1","doi-asserted-by":"publisher","DOI":"10.3115\/v1\/P14-1131"},{"key":"e_1_2_1_37_1","doi-asserted-by":"publisher","DOI":"10.1145\/2517349.2522740"},{"key":"e_1_2_1_38_1","doi-asserted-by":"publisher","DOI":"10.1016\/0304-3975(81)90075-X"},{"key":"e_1_2_1_39_1","doi-asserted-by":"publisher","DOI":"10.1007\/s00778-021-00669-2"},{"key":"e_1_2_1_40_1","doi-asserted-by":"publisher","DOI":"10.1145\/3318464.3380562"},{"key":"e_1_2_1_41_1","doi-asserted-by":"publisher","DOI":"10.14778\/3476249.3476257"},{"key":"e_1_2_1_42_1","doi-asserted-by":"publisher","DOI":"10.1145\/3178876.3186120"},{"key":"e_1_2_1_43_1","doi-asserted-by":"publisher","DOI":"10.1016\/0890-5401(89)90044-8"},{"key":"e_1_2_1_44_1","volume-title":"Proceedings of the 2016 USENIX Annual Technical Conference. 507--522","author":"Vora Keval","year":"2016","unstructured":"Keval Vora , Guoqing Xu , and Rajiv Gupta . 2016 . Load the edges you need: A generic I\/O optimization for disk-based graph processing . In Proceedings of the 2016 USENIX Annual Technical Conference. 507--522 . Keval Vora, Guoqing Xu, and Rajiv Gupta. 2016. Load the edges you need: A generic I\/O optimization for disk-based graph processing. In Proceedings of the 2016 USENIX Annual Technical Conference. 507--522."},{"key":"e_1_2_1_45_1","doi-asserted-by":"publisher","DOI":"10.1109\/CCGRID.2018.00076"},{"key":"e_1_2_1_46_1","volume-title":"Proceedings of the 2020 USENIX Annual Technical Conference. 559--571","author":"Wang Rui","unstructured":"Rui Wang , Yongkun Li , Hong Xie , Yinlong Xu , and John C.S. Lui . 2020. Graph-Walker: An I\/O-efficient and resource-friendly graph analytic system for fast and scalable random walks . In Proceedings of the 2020 USENIX Annual Technical Conference. 559--571 . Rui Wang, Yongkun Li, Hong Xie, Yinlong Xu, and John C.S. Lui. 2020. Graph-Walker: An I\/O-efficient and resource-friendly graph analytic system for fast and scalable random walks. In Proceedings of the 2020 USENIX Annual Technical Conference. 559--571."},{"key":"e_1_2_1_47_1","doi-asserted-by":"publisher","DOI":"10.14778\/3015270.3015272"},{"key":"e_1_2_1_48_1","doi-asserted-by":"publisher","DOI":"10.1007\/s41019-021-00154-4"},{"key":"e_1_2_1_49_1","doi-asserted-by":"publisher","DOI":"10.1145\/3341301.3359634"},{"key":"e_1_2_1_50_1","doi-asserted-by":"publisher","DOI":"10.1109\/ICDE51399.2021.00051"},{"key":"e_1_2_1_51_1","volume-title":"Proceedings of the 2015 USENIX Annual Technical Conference. 375--386","author":"Zhu Xiaowei","year":"2015","unstructured":"Xiaowei Zhu , Wentao Han , and Wenguang Chen . 2015 . Gridgraph: Large-scale graph processing on a single machine using 2-level hierarchical partitioning . In Proceedings of the 2015 USENIX Annual Technical Conference. 375--386 . Xiaowei Zhu, Wentao Han, and Wenguang Chen. 2015. Gridgraph: Large-scale graph processing on a single machine using 2-level hierarchical partitioning. In Proceedings of the 2015 USENIX Annual Technical Conference. 375--386."}],"container-title":["Proceedings of the VLDB Endowment"],"original-title":[],"language":"en","link":[{"URL":"https:\/\/dl.acm.org\/doi\/pdf\/10.14778\/3529337.3529346","content-type":"unspecified","content-version":"vor","intended-application":"similarity-checking"}],"deposited":{"date-parts":[[2022,12,28]],"date-time":"2022-12-28T09:46:35Z","timestamp":1672220795000},"score":1,"resource":{"primary":{"URL":"https:\/\/dl.acm.org\/doi\/10.14778\/3529337.3529346"}},"subtitle":[],"short-title":[],"issued":{"date-parts":[[2022,4]]},"references-count":51,"journal-issue":{"issue":"8","published-print":{"date-parts":[[2022,4]]}},"alternative-id":["10.14778\/3529337.3529346"],"URL":"https:\/\/doi.org\/10.14778\/3529337.3529346","relation":{},"ISSN":["2150-8097"],"issn-type":[{"value":"2150-8097","type":"print"}],"subject":[],"published":{"date-parts":[[2022,4]]}}}