{"status":"ok","message-type":"work","message-version":"1.0.0","message":{"indexed":{"date-parts":[[2025,10,12]],"date-time":"2025-10-12T04:13:32Z","timestamp":1760242412320,"version":"build-2065373602"},"reference-count":48,"publisher":"MDPI AG","issue":"3","license":[{"start":{"date-parts":[[2017,7,18]],"date-time":"2017-07-18T00:00:00Z","timestamp":1500336000000},"content-version":"vor","delay-in-days":0,"URL":"https:\/\/creativecommons.org\/licenses\/by\/4.0\/"}],"content-domain":{"domain":[],"crossmark-restriction":false},"short-container-title":["Informatics"],"abstract":"<jats:p>Graphs emerge naturally in many domains, such as social science, neuroscience, transportation engineering, and more. In many cases, such graphs have millions or billions of nodes and edges, and their sizes increase daily at a fast pace. How can researchers from various domains explore large graphs interactively and efficiently to find out what is \u2018important\u2019? How can multiple researchers explore a new graph dataset collectively and \u201chelp\u201d each other with their findings? In this article, we present Perseus-Hub, a large-scale graph mining tool that computes a set of graph properties in a distributed manner, performs ensemble, multi-view anomaly detection to highlight regions that are worth investigating, and provides users with uncluttered visualization and easy interaction with complex graph statistics. Perseus-Hub uses a Spark cluster to calculate various statistics of large-scale graphs efficiently, and aggregates the results in a summary on the master node to support interactive user exploration. In Perseus-Hub, the visualized distributions of graph statistics provide preliminary analysis to understand a graph. To perform a deeper analysis, users with little prior knowledge can leverage patterns (e.g., spikes in the power-law degree distribution) marked by other users or experts. Moreover, Perseus-Hub guides users to regions of interest by highlighting anomalous nodes and helps users establish a more comprehensive understanding about the graph at hand. We demonstrate our system through the case study on real, large-scale networks.<\/jats:p>","DOI":"10.3390\/informatics4030022","type":"journal-article","created":{"date-parts":[[2017,7,18]],"date-time":"2017-07-18T10:33:14Z","timestamp":1500373994000},"page":"22","update-policy":"https:\/\/doi.org\/10.3390\/mdpi_crossmark_policy","source":"Crossref","is-referenced-by-count":6,"title":["PERSEUS-HUB: Interactive and Collective Exploration of Large-Scale Graphs"],"prefix":"10.3390","volume":"4","author":[{"given":"Di","family":"Jin","sequence":"first","affiliation":[{"name":"Department of Computer Science and Engineering, University of Michigan, Ann Arbor, MI 48109, USA"}]},{"given":"Aristotelis","family":"Leventidis","sequence":"additional","affiliation":[{"name":"Department of Computer Science and Engineering, University of Michigan, Ann Arbor, MI 48109, USA"}]},{"given":"Haoming","family":"Shen","sequence":"additional","affiliation":[{"name":"Department of Computer Science and Engineering, University of Michigan, Ann Arbor, MI 48109, USA"}]},{"given":"Ruowang","family":"Zhang","sequence":"additional","affiliation":[{"name":"Department of Computer Science and Engineering, University of Michigan, Ann Arbor, MI 48109, USA"}]},{"ORCID":"https:\/\/orcid.org\/0000-0002-3312-7898","authenticated-orcid":false,"given":"Junyue","family":"Wu","sequence":"additional","affiliation":[{"name":"Department of Computer Science and Engineering, University of Michigan, Ann Arbor, MI 48109, USA"}]},{"given":"Danai","family":"Koutra","sequence":"additional","affiliation":[{"name":"Department of Computer Science and Engineering, University of Michigan, Ann Arbor, MI 48109, USA"}]}],"member":"1968","published-online":{"date-parts":[[2017,7,18]]},"reference":[{"key":"ref_1","unstructured":"Kuramochi, M., and Karypis, G. (December, January 29). Frequent Subgraph Discovery. Proceedings of the 2001 1st IEEE International Conference on Data Mining (ICDM), San Jose, CA, USA."},{"key":"ref_2","doi-asserted-by":"crossref","first-page":"119","DOI":"10.1002\/cem.908","article-title":"Multi-way analysis with applications in the chemical sciences, age smilde, Rasmus Bro and Paul Geladi, Wiley, Chichester, 2004, ISBN 0-471-98691-7, 381 pp","volume":"19","author":"Leardi","year":"2005","journal-title":"J. Chemometr."},{"key":"ref_3","doi-asserted-by":"crossref","unstructured":"Tong, H., and Faloutsos, C. (2006, January 20\u201323). Center-piece subgraphs: Problem definition and fast solutions. Proceedings of the 12th ACM SIGKDD International Conference on Knowledge Discovery and Data Mining (KDD \u201906), New York, NY, USA.","DOI":"10.1145\/1150402.1150448"},{"key":"ref_4","doi-asserted-by":"crossref","unstructured":"Sondhi, P., Sun, J., Tong, H., and Zhai, C. (2012, January 12\u201316). SympGraph: A framework for mining clinical notes through symptom relation graphs. Proceedings of the 18th ACM SIGKDD International Conference on Knowledge Discovery and Data Mining (KDD \u201912), Beijing, China.","DOI":"10.1145\/2339530.2339712"},{"key":"ref_5","doi-asserted-by":"crossref","unstructured":"Backstrom, L., Kumar, R., Marlow, C., Novak, J., and Tomkins, A. (2008, January 11\u201312). Preferential behavior in online groups. Proceedings of the International Conference on Web Search and Web Data Mining (WSDM \u201908), New York, NY, USA.","DOI":"10.1145\/1341531.1341549"},{"key":"ref_6","doi-asserted-by":"crossref","first-page":"590","DOI":"10.1016\/S0378-4371(02)00736-7","article-title":"Evolution of the social network of scientific collaborations","volume":"311","author":"Jeong","year":"2002","journal-title":"Physica A"},{"key":"ref_7","doi-asserted-by":"crossref","first-page":"2320","DOI":"10.1109\/TVCG.2014.2346444","article-title":"Glo-stix: Graph-level operations for specifying techniques and interactive exploration","volume":"20","author":"Stolper","year":"2014","journal-title":"IEEE Trans. Vis. Comput. Graph."},{"key":"ref_8","doi-asserted-by":"crossref","unstructured":"Chau, D.H., Kittur, A., Hong, J.I., and Faloutsos, C. (2011, January 21\u201324). Apolo: Making Sense of Large Network Data by Combining Rich User Interaction and Machine Learning. Proceedings of the 17th ACM International Conference on Knowledge Discovery and Data Mining (SIGKDD), San Diego, CA, USA.","DOI":"10.1145\/2020408.2020524"},{"key":"ref_9","doi-asserted-by":"crossref","first-page":"1924","DOI":"10.14778\/2824032.2824102","article-title":"Perseus: An Interactive Large-Scale Graph Mining and Visualization Tool","volume":"8","author":"Koutra","year":"2015","journal-title":"Proc. VLDB Endow."},{"key":"ref_10","unstructured":"Jin, D., Sethapakdi, T., Koutra, D., and Faloutsos, C. (, January July). PERSEUS3: Visualizing and Interactively Mining Large-Scale Graphs. Proceedings of the WOODSTOCK \u201997, El Paso, TX, USA."},{"key":"ref_11","doi-asserted-by":"crossref","unstructured":"Lee, J.Y., Kang, U., Koutra, D., and Faloutsos, C. (2013, January 13\u201317). Fast Anomaly Detection Despite the Duplicates. Proceedings of the 22nd International Conference on World Wide Web (WWW Companion Volume), Rio de Janeiro, Brazil.","DOI":"10.1145\/2487788.2487886"},{"key":"ref_12","doi-asserted-by":"crossref","unstructured":"Kriegel, H.P., Zimek, A., and Hubert, M.S. (2008, January 24\u201327). Angle-based outlier detection in high-dimensional data. Proceedings of the 14th ACM SIGKDD International Conference on Knowledge Discovery and Data Mining, Las Vegas, NV, USA.","DOI":"10.1145\/1401890.1401946"},{"key":"ref_13","doi-asserted-by":"crossref","unstructured":"Pienta, R., Kahng, M., Lin, Z., Vreeken, J., Talukdar, P., Abello, J., Parameswaran, G., and Chau, D.H. (2017, January 27\u201329). FACETS: Adaptive Local Exploration of Large Graphs. Proceedings of the 2017 SIAM International Conference on Data Mining. Society for Industrial and Applied Mathematics, Houston, TX, USA.","DOI":"10.1137\/1.9781611974973.67"},{"key":"ref_14","doi-asserted-by":"crossref","unstructured":"Wongsuphasawat, K., Qu, Z., Moritz, D., Chang, R., Ouk, F., Anand, A., Mackinlay, J., Howe, B., and Heer, J. (2017, January 6\u201311). Voyager 2: Augmenting Visual Analysis with Partial View Specifications. Proceedings of the 2017 CHI Conference on Human Factors in Computing Systems, Denver, CO, USA.","DOI":"10.1145\/3025453.3025768"},{"key":"ref_15","doi-asserted-by":"crossref","first-page":"985","DOI":"10.1111\/j.1467-8659.2012.03091.x","article-title":"Using signposts for navigation in large graphs","volume":"31","author":"May","year":"2012","journal-title":"Comput. Gr. Forum"},{"key":"ref_16","doi-asserted-by":"crossref","first-page":"1302","DOI":"10.1109\/TVCG.2007.70582","article-title":"NodeTrix: A hybrid visualization of social networks","volume":"13","author":"Henry","year":"2007","journal-title":"IEEE Trans. Vis. Comput. Graph."},{"key":"ref_17","doi-asserted-by":"crossref","first-page":"2080","DOI":"10.1109\/TVCG.2013.167","article-title":"Interactive exploration of implicit and explicit relations in faceted datasets","volume":"19","author":"Zhao","year":"2013","journal-title":"IEEE Trans. Vis. Comput. Graph."},{"key":"ref_18","doi-asserted-by":"crossref","first-page":"301","DOI":"10.1111\/cgf.12642","article-title":"Refinery: Visual exploration of large, heterogeneous networks through associative browsing","volume":"34","author":"Kairam","year":"2015","journal-title":"Comput. Gr. Forum"},{"key":"ref_19","doi-asserted-by":"crossref","unstructured":"Akoglu, L., Chau, D.H., Kang, U., Koutra, D., and Faloutsos, C. (2012, January 20\u201324). OPAvion: Mining and Visualization in Large Graphs. Proceedings of the 2012 ACM International Conference on Management of Data (SIGMOD), Scottsdale, AZ, USA.","DOI":"10.1145\/2213836.2213941"},{"key":"ref_20","doi-asserted-by":"crossref","unstructured":"Kang, U., Tsourakakis, C.E., and Faloutsos, C. (2009, January 6\u20139). PEGASUS: A Peta-Scale Graph Mining System\u2014Implementation and Observations. Proceedings of the 9th IEEE International Conference on Data Mining (ICDM), Miami, FL, USA.","DOI":"10.1109\/ICDM.2009.14"},{"key":"ref_21","doi-asserted-by":"crossref","unstructured":"Akoglu, L., McGlohon, M., and Faloutsos, C. (2010, January 21\u201324). OddBall: Spotting Anomalies in Weighted Graphs. Proceedings of the 14th Pacific-Asia Conference on Knowledge Discovery and Data Mining (PAKDD), Hyderabad, India.","DOI":"10.1007\/978-3-642-13672-6_40"},{"key":"ref_22","doi-asserted-by":"crossref","unstructured":"Kang, U., Lee, J.Y., Koutra, D., and Faloutsos, C. (2014, January 13\u201316). Net-Ray: Visualizing and Mining Web-Scale Graphs. Proceedings of the 18th Pacific-Asia Conference on Knowledge Discovery and Data Mining (PAKDD), Tainan, Taiwan.","DOI":"10.1007\/978-3-319-06608-0_29"},{"key":"ref_23","unstructured":"Dunne, C., and Shneiderman, B. (May, January 27). Motif Simplification: Improving Network Visualization Readability with Fan, Connector, and Clique Glyphs. Proceedings of the SIGCHI Conference on Human Factors in Computing Systems (CHI), Paris, France."},{"key":"ref_24","unstructured":"Nielsen, J. (2015, November 17). Website Response Times. Available online: http:\/\/www.nngroup.com\/articles\/website-response-times\/."},{"key":"ref_25","doi-asserted-by":"crossref","unstructured":"Mishra, C., and Koudas, N. (2009, January 24\u201326). Interactive query refinement. Proceedings of the 12th International Conference on Extending Database Technology (EDBT 2009), Saint Petersburg, Russia.","DOI":"10.1145\/1516360.1516459"},{"key":"ref_26","first-page":"1250","article-title":"SnapToQuery: Providing Interactive Feedback during Exploratory Query Specification","volume":"8","author":"Jiang","year":"2015","journal-title":"PVLDB"},{"key":"ref_27","unstructured":"\u00c7etintemel, U., Cherniack, M., DeBrabant, J., Diao, Y., Dimitriadou, K., Kalinin, A., Papaemmanouil, O., and Zdonik, S.B. (2013, January 6\u20139). Query Steering for Interactive Data Exploration. Proceedings of the Sixth Biennial Conference on Innovative Data Systems Research (CIDR 2013), Asilomar, CA, USA."},{"key":"ref_28","first-page":"3","article-title":"Query Recommendations for Interactive Database Exploration","volume":"Volume 5566","author":"Winslett","year":"2009","journal-title":"Proceedings of the 21st International Conference on Scientific and Statistical Database Management (SSDBM 2009)"},{"key":"ref_29","doi-asserted-by":"crossref","unstructured":"Goethals, B., Moens, S., and Vreeken, J. (2011, January 21\u201324). MIME: A Framework for Interactive Visual Pattern Mining. Proceedings of the 17th ACM International Conference on Knowledge Discovery and Data Mining (SIGKDD), San Diego, CA, USA.","DOI":"10.1145\/2020408.2020529"},{"key":"ref_30","doi-asserted-by":"crossref","first-page":"2182","DOI":"10.14778\/2831360.2831371","article-title":"SeeDB: Efficient Data-driven Visualization Recommendations to Support Visual Analytics","volume":"8","author":"Vartak","year":"2015","journal-title":"Proc. VLDB Endow."},{"key":"ref_31","doi-asserted-by":"crossref","unstructured":"Shahaf, D., Yang, J., Suen, C., Jacobs, J., Wang, H., and Leskovec, J. (2013, January 11\u201314). Information cartography: Creating zoomable, large-scale maps of information. Proceedings of the 19th ACM SIGKDD International Conference on Knowledge Discovery and Data Mining (KDD 2013), Chicago, IL, USA.","DOI":"10.1145\/2487575.2487690"},{"key":"ref_32","doi-asserted-by":"crossref","unstructured":"Chau, D.H., Akoglu, L., Vreeken, J., Tong, H., and Faloutsos, C. (2012, January 12\u201316). TOURVIZ: Interactive Visualization of Connection Pathways in Large Graphs. Proceedings of the 18th ACM International Conference on Knowledge Discovery and Data Mining (SIGKDD), Beijing, China.","DOI":"10.1145\/2339530.2339769"},{"key":"ref_33","unstructured":"Rodrigues, J.F., Tong, H., Traina, A.J.M., Faloutsos, C., and Leskovec, J. (2006, January 12\u201315). GMine: A System for Scalable, Interactive Graph Visualization and Mining. Proceedings of the 32nd International Conference on Very Large Data Bases, Seoul, Korea."},{"key":"ref_34","doi-asserted-by":"crossref","unstructured":"Khoa, N.L.D., and Chawla, S. (2010, January 21\u201324). Robust Outlier Detection Using Commute Time and Eigenspace Embedding. Proceedings of the 14th Pacific-Asia Conference on Knowledge Discovery and Data Mining (PAKDD), Hyderabad, India.","DOI":"10.1007\/978-3-642-13672-6_41"},{"key":"ref_35","doi-asserted-by":"crossref","first-page":"626","DOI":"10.1007\/s10618-014-0365-y","article-title":"Graph-based Anomaly Detection and Description: A Survey","volume":"29","author":"Akoglu","year":"2014","journal-title":"Data Min. Knowl. Discov. (DAMI)"},{"key":"ref_36","doi-asserted-by":"crossref","first-page":"223","DOI":"10.1002\/wics.1347","article-title":"Anomaly detection in dynamic networks: A survey","volume":"7","author":"Ranshous","year":"2015","journal-title":"WIREs Comput. Statist."},{"key":"ref_37","doi-asserted-by":"crossref","first-page":"93","DOI":"10.1145\/335191.335388","article-title":"LOF: Identifying density-based local outliers","volume":"Volume 29","author":"Breunig","year":"2000","journal-title":"Proceedings of the ACM SIGMOD 2000 International Conference on Management of Data"},{"key":"ref_38","doi-asserted-by":"crossref","unstructured":"Lee, J.Y., Kang, U., Koutra, D., and Faloutsos, C. (2013, January 13\u201317). Fast Outlier Detection Despite the Duplicates. Proceedings of the WWW 2013 Companion, Rio de Janeiro, Brazil.","DOI":"10.1145\/2487788.2487886"},{"key":"ref_39","doi-asserted-by":"crossref","unstructured":"Chakrabarti, D. (2004, January 20\u201324). Autopart: Parameter-free graph partitioning and outlier detection. Proceedings of the 8th European Conference on Principles of Data Mining and Knowledge Discovery, Pisa, Italy.","DOI":"10.1007\/978-3-540-30116-5_13"},{"key":"ref_40","unstructured":"Xu, X., Yuruk, N., Feng, Z., and Schweiger, T.A. (2007, January 12\u201315). Scan: A structural clustering algorithm for networks. Proceedings of the 13th ACM SIGKDD International Conference on Knowledge Discovery and Data Mining, San Jose, CA, USA."},{"key":"ref_41","doi-asserted-by":"crossref","unstructured":"Jiang, M., Cui, P., Beutel, A., Faloutsos, C., and Yang, S. (2014, January 24\u201327). Catchsync: catching synchronized behavior in large directed graphs. Proceedings of the 20th ACM SIGKDD International Conference on Knowledge Discovery and Data Mining, New York, NY, USA.","DOI":"10.1145\/2623330.2623632"},{"key":"ref_42","doi-asserted-by":"crossref","first-page":"251","DOI":"10.1145\/316194.316229","article-title":"On power-law relationships of the internet topology","volume":"Volume 29","author":"Faloutsos","year":"1999","journal-title":"ACM SIGCOMM Computer Communication Review"},{"key":"ref_43","doi-asserted-by":"crossref","unstructured":"Prakash, B.A., Sridharan, A., Seshadri, M., Machiraju, S., and Faloutsos, C. (2010, January 21\u201324). EigenSpokes: Surprising Patterns and Scalable Community Chipping in Large Graphs. Proceedings of the Pacific-Asia Conference on Knowledge Discovery and Data Mining (PAKDD), Hyderabad, India.","DOI":"10.1109\/ICDMW.2009.103"},{"key":"ref_44","unstructured":"Page, L., Brin, S., Motwani, R., and Winograd, T. (1998, January 14\u201318). The PageRank Citation Ranking: Bringing Order to the Web; Stanford Digital Library Technologies Project. Proceedings of the 7th International World Wide Web Conference, Brisbane, Australia."},{"key":"ref_45","doi-asserted-by":"crossref","unstructured":"Alemi, M., Haghighi, H., and Shahrivari, S. (2017). CCFinder: Using Spark to find clustering coefficient in big graphs. J. Supercomput., 1\u201328.","DOI":"10.1007\/s11227-017-2040-8"},{"key":"ref_46","doi-asserted-by":"crossref","unstructured":"Kriegel, H.P., Kroger, P., Schubert, E., and Zimek, A. (2011, January 28\u201330). Interpreting and unifying outlier scores. Proceedings of the 2011 SIAM International Conference on Data Mining, Phoenix, AZ, USA.","DOI":"10.1137\/1.9781611972818.2"},{"key":"ref_47","doi-asserted-by":"crossref","first-page":"472","DOI":"10.1016\/j.biopsych.2012.03.026","article-title":"Disrupted functional brain connectome in individuals at risk for Alzheimer\u2019s disease","volume":"73","author":"Wang","year":"2013","journal-title":"Biol. Psychiatry"},{"key":"ref_48","unstructured":"Leskovec, J. (2015, November 17). Stanford Large Network Dataset Collection. Available online: http:\/\/snap.stanford.edu\/data\/cit-HepTh.html."}],"container-title":["Informatics"],"original-title":[],"language":"en","link":[{"URL":"https:\/\/www.mdpi.com\/2227-9709\/4\/3\/22\/pdf","content-type":"unspecified","content-version":"vor","intended-application":"similarity-checking"}],"deposited":{"date-parts":[[2025,10,11]],"date-time":"2025-10-11T18:42:58Z","timestamp":1760208178000},"score":1,"resource":{"primary":{"URL":"https:\/\/www.mdpi.com\/2227-9709\/4\/3\/22"}},"subtitle":[],"short-title":[],"issued":{"date-parts":[[2017,7,18]]},"references-count":48,"journal-issue":{"issue":"3","published-online":{"date-parts":[[2017,9]]}},"alternative-id":["informatics4030022"],"URL":"https:\/\/doi.org\/10.3390\/informatics4030022","relation":{},"ISSN":["2227-9709"],"issn-type":[{"type":"electronic","value":"2227-9709"}],"subject":[],"published":{"date-parts":[[2017,7,18]]}}}