{"status":"ok","message-type":"work","message-version":"1.0.0","message":{"indexed":{"date-parts":[[2026,4,6]],"date-time":"2026-04-06T22:32:42Z","timestamp":1775514762874,"version":"3.50.1"},"reference-count":36,"publisher":"Association for Computing Machinery (ACM)","issue":"2","license":[{"start":{"date-parts":[[2007,8,1]],"date-time":"2007-08-01T00:00:00Z","timestamp":1185926400000},"content-version":"vor","delay-in-days":0,"URL":"https:\/\/www.acm.org\/publications\/policies\/copyright_policy#Background"}],"content-domain":{"domain":["dl.acm.org"],"crossmark-restriction":true},"short-container-title":["ACM Trans. Web"],"published-print":{"date-parts":[[2007,8]]},"abstract":"<jats:p>The understanding of the immense and intricate topological structure of the World Wide Web (WWW) is a major scientific and technological challenge. This has been recently tackled by characterizing the properties of its representative graphs, in which vertices and directed edges are identified with Web pages and hyperlinks, respectively. Data gathered in large-scale crawls have been analyzed by several groups resulting in a general picture of the WWW that encompasses many of the complex properties typical of rapidly evolving networks. In this article, we report a detailed statistical analysis of the topological properties of four different WWW graphs obtained with different crawlers. We find that, despite the very large size of the samples, the statistical measures characterizing these graphs differ quantitatively, and in some cases qualitatively, depending on the domain analyzed and the crawl used for gathering the data. This spurs the issue of the presence of sampling biases and structural differences of Web crawls that might induce properties not representative of the actual global underlying graph. In short, the stability of the widely accepted statistical description of the Web is called into question. In order to provide a more accurate characterization of the Web graph, we study statistical measures beyond the degree distribution, such as degree-degree correlation functions or the statistics of reciprocal connections. The latter appears to enclose the relevant correlations of the WWW graph and carry most of the topological information of the Web. The analysis of this quantity is also of major interest in relation to the navigability and searchability of the Web.<\/jats:p>","DOI":"10.1145\/1255438.1255442","type":"journal-article","created":{"date-parts":[[2007,9,14]],"date-time":"2007-09-14T13:44:55Z","timestamp":1189777495000},"page":"10","update-policy":"https:\/\/doi.org\/10.1145\/crossmark-policy","source":"Crossref","is-referenced-by-count":49,"title":["Decoding the structure of the WWW"],"prefix":"10.1145","volume":"1","author":[{"given":"M. \u00c1ngeles","family":"Serrano","sequence":"first","affiliation":[{"name":"Indiana University and Institute for Scientific Interchange, Turin, Italy"}]},{"given":"Ana","family":"Maguitman","sequence":"additional","affiliation":[{"name":"Universidad Nacional del Sur and CONICET, Blanca, Argentina"}]},{"given":"Mari\u00e1n","family":"Bogu\u00f1\u00e1","sequence":"additional","affiliation":[{"name":"Universitat de Barcelona, Barcelona, Spain"}]},{"given":"Santo","family":"Fortunato","sequence":"additional","affiliation":[{"name":"Indiana University and Institute for Scientific Interchange, Turin, Italy"}]},{"given":"Alessandro","family":"Vespignani","sequence":"additional","affiliation":[{"name":"Indiana University and Institute for Scientific Interchange, Turin, Italy"}]}],"member":"320","published-online":{"date-parts":[[2007,8]]},"reference":[{"key":"e_1_2_1_1_1","doi-asserted-by":"publisher","DOI":"10.1145\/383694.383707"},{"key":"e_1_2_1_2_1","volume-title":"-L","author":"Albert R.","year":"1999","unstructured":"Albert , R. , Jeong , H. , and Barab\u00e1si , A . -L . 1999 . Diameter of the World-Wide Web. Nature 401, 6749, 130--131. Albert, R., Jeong, H., and Barab\u00e1si, A.-L. 1999. Diameter of the World-Wide Web. Nature 401, 6749, 130--131."},{"key":"e_1_2_1_3_1","doi-asserted-by":"publisher","DOI":"10.1145\/383034.383035"},{"key":"e_1_2_1_4_1","volume-title":"Proceedings of the 26th International Conference on Very Large Data Bases (VLDB). 535--544","author":"Bar-Yossef Z.","unstructured":"Bar-Yossef , Z. , Berg , A. , Chien , S. , Fakcharoenphol , J. , and Weitz , D . 2000. Approximating aggregate queries about web pages via random walks . In Proceedings of the 26th International Conference on Very Large Data Bases (VLDB). 535--544 . Bar-Yossef, Z., Berg, A., Chien, S., Fakcharoenphol, J., and Weitz, D. 2000. Approximating aggregate queries about web pages via random walks. In Proceedings of the 26th International Conference on Very Large Data Bases (VLDB). 535--544."},{"key":"e_1_2_1_5_1","doi-asserted-by":"crossref","unstructured":"Barab\u00e1si A.-L. and Albert R. 1999. Emergence of scaling in random networks. Science 286 5439 509--512.  Barab\u00e1si A.-L. and Albert R. 1999. Emergence of scaling in random networks. Science 286 5439 509--512.","DOI":"10.1126\/science.286.5439.509"},{"key":"e_1_2_1_6_1","first-page":"1","article-title":"Scale-free characteristics of random networks: The topology of the World-Wide Web","volume":"281","author":"Barab\u00e1si A.-L.","year":"2000","unstructured":"Barab\u00e1si , A.-L. , Albert , R. , and Jeong , H. 2000 . Scale-free characteristics of random networks: The topology of the World-Wide Web . Physica A 281 , 1 - 4 , 69--77. Barab\u00e1si, A.-L., Albert, R., and Jeong, H. 2000. Scale-free characteristics of random networks: The topology of the World-Wide Web. Physica A 281, 1-4, 69--77.","journal-title":"Physica A"},{"key":"e_1_2_1_7_1","volume-title":"Ed. Algorithms and Models for the Web-Graph. Lecture Notes in Computer Science","volume":"3243","author":"Barrat A.","unstructured":"Barrat , A. , Barth\u00e9lemy , M. , and Vespignani , A . 2004. Traffic-driven model of the World Wide Web graph, Stephano Leonardi , Ed. Algorithms and Models for the Web-Graph. Lecture Notes in Computer Science , vol. 3243 . Springer, Berlin, Heidelburg, Germany, 56--67. Barrat, A., Barth\u00e9lemy, M., and Vespignani, A. 2004. Traffic-driven model of the World Wide Web graph, Stephano Leonardi, Ed. Algorithms and Models for the Web-Graph. Lecture Notes in Computer Science, vol. 3243. Springer, Berlin, Heidelburg, Germany, 56--67."},{"key":"e_1_2_1_8_1","doi-asserted-by":"crossref","first-page":"1","DOI":"10.1103\/PhysRevE.72.016106","article-title":"Generalized percolation in random directed networks","volume":"72","author":"Bogu\u00f1\u00e1 M.","year":"2005","unstructured":"Bogu\u00f1\u00e1 , M. and Serrano , M. A. 2005 . Generalized percolation in random directed networks . Phys. Rev. E 72 , 1 , 016106. Bogu\u00f1\u00e1, M. and Serrano, M. A. 2005. Generalized percolation in random directed networks. Phys. Rev. E 72, 1, 016106.","journal-title":"Phys. Rev. E"},{"key":"e_1_2_1_9_1","doi-asserted-by":"publisher","DOI":"10.1002\/spe.587"},{"key":"e_1_2_1_10_1","volume-title":"WWW 2004 Conference Proceedings. ACM","author":"Boldi P.","unstructured":"Boldi , P. and Vigna , S . 2004. The Webgraph framework i: Compression techniques . In WWW 2004 Conference Proceedings. ACM , New York, NY, 595--601. 10.1145\/988672.988752 Boldi, P. and Vigna, S. 2004. The Webgraph framework i: Compression techniques. In WWW 2004 Conference Proceedings. ACM, New York, NY, 595--601. 10.1145\/988672.988752"},{"key":"e_1_2_1_11_1","doi-asserted-by":"publisher","DOI":"10.1016\/S1389-1286(00)00083-9"},{"key":"e_1_2_1_12_1","volume-title":"Proceedings of the 26th International Conference on Very Large Databases","author":"Cho J.","unstructured":"Cho , J. and Garcia-Molina , H . 2000. The evolution of the Web and implications for an incremental crawler . In Proceedings of the 26th International Conference on Very Large Databases ( Cairo, Egypt). 200--209. Cho, J. and Garcia-Molina, H. 2000. The evolution of the Web and implications for an incremental crawler. In Proceedings of the 26th International Conference on Very Large Databases (Cairo, Egypt). 200--209."},{"key":"e_1_2_1_13_1","doi-asserted-by":"crossref","first-page":"21","DOI":"10.1103\/PhysRevLett.85.4626","article-title":"Resilience of the Internet to random breakdown","volume":"85","author":"Cohen R.","year":"2000","unstructured":"Cohen , R. , Erez , K. , ben Avraham , D. , and Havlin , S. 2000 . Resilience of the Internet to random breakdown . Phys. Rev. Lett. 85 , 21 , 4626. Cohen, R., Erez, K., ben Avraham, D., and Havlin, S. 2000. Resilience of the Internet to random breakdown. Phys. Rev. Lett. 85, 21, 4626.","journal-title":"Phys. Rev. Lett."},{"key":"e_1_2_1_14_1","doi-asserted-by":"publisher","DOI":"10.1002\/asi.20078"},{"key":"e_1_2_1_15_1","volume-title":"Proceedings of the 27th International Conference on Very Large Data Bases (VLDB). 69--78","author":"Dill S.","unstructured":"Dill , S. , Kumar , R. , McCurley , K. , Rajagopalan , S. , Sivakumar , D. , and Tomkins , A . 2001. Self-similarity in the Web . In Proceedings of the 27th International Conference on Very Large Data Bases (VLDB). 69--78 . Dill, S., Kumar, R., McCurley, K., Rajagopalan, S., Sivakumar, D., and Tomkins, A. 2001. Self-similarity in the Web. In Proceedings of the 27th International Conference on Very Large Data Bases (VLDB). 69--78."},{"key":"e_1_2_1_16_1","doi-asserted-by":"crossref","first-page":"239","DOI":"10.1140\/epjb\/e2004-00056-6","article-title":"Large scale properties of the","volume":"38","author":"Donato D.","year":"2004","unstructured":"Donato , D. , Laura , L. , Leonardi , S. , and Millozzi , S. 2004 . Large scale properties of the Webgraph. Eur. Phys. J. B 38 , 2, 239 -- 243 . Donato, D., Laura, L., Leonardi, S., and Millozzi, S. 2004. Large scale properties of the Webgraph. Eur. Phys. J. B 38, 2, 239--243.","journal-title":"Webgraph. Eur. Phys. J. B"},{"key":"e_1_2_1_17_1","volume-title":"Proceedings of the Eighth International Workshop on the Web and Databases (WebDB). 145--150","author":"Donato D.","unstructured":"Donato , D. , Leonardi , S. , Millozzi , S. , and Tsaparas , P . 2005. Mining the inner structure of the Web graph . In Proceedings of the Eighth International Workshop on the Web and Databases (WebDB). 145--150 . Donato, D., Leonardi, S., Millozzi, S., and Tsaparas, P. 2005. Mining the inner structure of the Web graph. In Proceedings of the Eighth International Workshop on the Web and Databases (WebDB). 145--150."},{"key":"e_1_2_1_18_1","unstructured":"Dorogovtsev S. N. and Mendes J. F. F. 2003. Evolution of Networks: From Biological Nets to the Internet and WWW. Oxford University Press Oxford U. K.   Dorogovtsev S. N. and Mendes J. F. F. 2003. Evolution of Networks: From Biological Nets to the Internet and WWW. Oxford University Press Oxford U. K."},{"key":"e_1_2_1_19_1","doi-asserted-by":"crossref","first-page":"5825","DOI":"10.1073\/pnas.032093399","article-title":"Curvature of co-links uncovers hidden thematic layers in the World Wide Web","volume":"99","author":"Eckmann J. P.","year":"2002","unstructured":"Eckmann , J. P. and Moses , E. 2002 . Curvature of co-links uncovers hidden thematic layers in the World Wide Web . Procc. Natl. Acad. Sci. 99 , 9, 5825 -- 5829 . Eckmann, J. P. and Moses, E. 2002. Curvature of co-links uncovers hidden thematic layers in the World Wide Web. Procc. Natl. Acad. Sci. 99, 9, 5825--5829.","journal-title":"Procc. Natl. Acad. Sci."},{"key":"e_1_2_1_20_1","volume-title":"Fourth Workshop on Algorithms and Models for the Web-Graph, Nov. 30 -- Dec. 1, Banff, Alta., (Canada).","author":"Fortunato S.","unstructured":"Fortunato , S. , Bogu\u00f1\u00e1 , M. , Flammini , A. , and Menczer , F . 2006. Approximating pagerank from in-degree. In cs.IR\/0511016 , presented at the Fourth Workshop on Algorithms and Models for the Web-Graph, Nov. 30 -- Dec. 1, Banff, Alta., (Canada). Fortunato, S., Bogu\u00f1\u00e1, M., Flammini, A., and Menczer, F. 2006. Approximating pagerank from in-degree. In cs.IR\/0511016, presented at the Fourth Workshop on Algorithms and Models for the Web-Graph, Nov. 30 -- Dec. 1, Banff, Alta., (Canada)."},{"key":"e_1_2_1_21_1","doi-asserted-by":"crossref","first-page":"26","DOI":"10.1103\/PhysRevLett.93.268701","article-title":"Patterns of link reciprocity in directed networks","volume":"93","author":"Garlaschelli D.","year":"2004","unstructured":"Garlaschelli , D. and Loffredo , M. I. 2004 . Patterns of link reciprocity in directed networks . Phys. Rev. Lett. 93 , 26 , 268701. Garlaschelli, D. and Loffredo, M. I. 2004. Patterns of link reciprocity in directed networks. Phys. Rev. Lett. 93, 26, 268701.","journal-title":"Phys. Rev. Lett."},{"key":"e_1_2_1_22_1","volume-title":"WWW 2005 Conference Proceedings","author":"Gulli A.","unstructured":"Gulli , A. and Signorini , A . 2005. The indexable Web is more than 11.5 billion pages . In WWW 2005 Conference Proceedings ( Chiba, Japan). ACM Press, New York, NY, 902--903. 10.1145\/1062745.1062789 Gulli, A. and Signorini, A. 2005. The indexable Web is more than 11.5 billion pages. In WWW 2005 Conference Proceedings (Chiba, Japan). ACM Press, New York, NY, 902--903. 10.1145\/1062745.1062789"},{"key":"e_1_2_1_23_1","volume-title":"WWW 2000 Conference Proceedings","author":"Henzinger M. R.","unstructured":"Henzinger , M. R. , Heydon , A. , Mitzenmacher , M. , and Najork , M . 2000. On near-uniform URL sampling . In WWW 2000 Conference Proceedings ( Amsterdam, The Netherlands). ACM Press, New York, NY, 295--308. Henzinger, M. R., Heydon, A., Mitzenmacher, M., and Najork, M. 2000. On near-uniform URL sampling. In WWW 2000 Conference Proceedings (Amsterdam, The Netherlands). ACM Press, New York, NY, 295--308."},{"key":"e_1_2_1_24_1","volume-title":"WWW 2000 Conference Proceedings","author":"Hirai J.","unstructured":"Hirai , J. , Raghavan , S. , Paepcke , A. , and Garcia-Molina , H . 2000. Webbase: A repository of Web pages . In WWW 2000 Conference Proceedings ( Amsterdam, The Netherlands). ACM Press, New York, NY, 277--293. Hirai, J., Raghavan, S., Paepcke, A., and Garcia-Molina, H. 2000. Webbase: A repository of Web pages. In WWW 2000 Conference Proceedings (Amsterdam, The Netherlands). ACM Press, New York, NY, 277--293."},{"key":"e_1_2_1_25_1","volume-title":"Proceedings of the 41th IEEE Symposium on Foundations of Computer Science (FOCS). 57--65","author":"Kumar R.","unstructured":"Kumar , R. , Raghavan , P. , Rajagopalan , S. , Sivakumar , D. , Tomkins , A. , and Upfal , E . 2000. Stochastic models for the Web graph . In Proceedings of the 41th IEEE Symposium on Foundations of Computer Science (FOCS). 57--65 . Kumar, R., Raghavan, P., Rajagopalan, S., Sivakumar, D., Tomkins, A., and Upfal, E. 2000. Stochastic models for the Web graph. In Proceedings of the 41th IEEE Symposium on Foundations of Computer Science (FOCS). 57--65."},{"key":"e_1_2_1_26_1","volume-title":"WWW 1999 Conference Proceedings","author":"Kumar R.","unstructured":"Kumar , R. , Raghavan , P. , Rajagopalan , S. , and Tomkins , A . 1999. Trawling emerging cyber-communities automatically . In WWW 1999 Conference Proceedings ( Toronto, Ont., Canada). ACM Press, New York, NY, 3--4. Kumar, R., Raghavan, P., Rajagopalan, S., and Tomkins, A. 1999. Trawling emerging cyber-communities automatically. In WWW 1999 Conference Proceedings (Toronto, Ont., Canada). ACM Press, New York, NY, 3--4."},{"key":"e_1_2_1_27_1","doi-asserted-by":"crossref","unstructured":"Lawrence S. and Giles C. L. 1998. Searching the world wide web. Science 280 5360 98--100.  Lawrence S. and Giles C. L. 1998. Searching the world wide web. Science 280 5360 98--100.","DOI":"10.1126\/science.280.5360.98"},{"key":"e_1_2_1_28_1","doi-asserted-by":"crossref","unstructured":"Lawrence S. and Giles C. L. 1999. Accessibility of information on the Web. Nature 400 6740 107--109. 10.1145\/333175.333181   Lawrence S. and Giles C. L. 1999. Accessibility of information on the Web. Nature 400 6740 107--109. 10.1145\/333175.333181","DOI":"10.1038\/21987"},{"key":"e_1_2_1_29_1","volume-title":"Proceedings of SIGCOMM06","author":"Mahadevan P.","unstructured":"Mahadevan , P. , Krioukov , D. , Fall , K. , and Vahdat , A . 2006. Systematic topology analysis and generation using degree correlations . In Proceedings of SIGCOMM06 ( Pisa, Italy). ACM Press, New York, NY. 10.1145\/1159913.1159930 Mahadevan, P., Krioukov, D., Fall, K., and Vahdat, A. 2006. Systematic topology analysis and generation using degree correlations. In Proceedings of SIGCOMM06 (Pisa, Italy). ACM Press, New York, NY. 10.1145\/1159913.1159930"},{"key":"e_1_2_1_30_1","doi-asserted-by":"crossref","first-page":"13","DOI":"10.1103\/PhysRevLett.88.138701","article-title":"Truncation of power law behavior in scale-free network models due to information filtering","volume":"88","author":"Mossa S.","year":"2002","unstructured":"Mossa , S. , Barth\u00e9lemy , M. , Stanley , H. E. , and Amaral , L. A. N. 2002 . Truncation of power law behavior in scale-free network models due to information filtering . Phys. Rev. Lett. 88 , 13 , 138701. Mossa, S., Barth\u00e9lemy, M., Stanley, H. E., and Amaral, L. A. N. 2002. Truncation of power law behavior in scale-free network models due to information filtering. Phys. Rev. Lett. 88, 13, 138701.","journal-title":"Phys. Rev. Lett."},{"key":"e_1_2_1_31_1","first-page":"20","article-title":"Assortative mixing in networks","volume":"89","author":"Newman M. E. J.","year":"2002","unstructured":"Newman , M. E. J. 2002 . Assortative mixing in networks . Phys. Rev. Lett. 89 , 20 , 208701. Newman, M. E. J. 2002. Assortative mixing in networks. Phys. Rev. Lett. 89, 20, 208701.","journal-title":"Phys. Rev. Lett."},{"key":"e_1_2_1_32_1","first-page":"25","article-title":"Dynamical and correlation properties of the","volume":"87","author":"Pastor-Satorras R.","year":"2001","unstructured":"Pastor-Satorras , R. , V\u00e1zquez , A. , and Vespignani , A. 2001 . Dynamical and correlation properties of the Internet. Phys. Rev. Lett. 87 , 25 , 258701. Pastor-Satorras, R., V\u00e1zquez, A., and Vespignani, A. 2001. Dynamical and correlation properties of the Internet. Phys. Rev. Lett. 87, 25, 258701.","journal-title":"Internet. Phys. Rev. Lett."},{"key":"e_1_2_1_33_1","doi-asserted-by":"crossref","first-page":"3200","DOI":"10.1103\/PhysRevLett.86.3200","article-title":"Epidemic spreading in scale-free networks","volume":"86","author":"Pastor-Satorras R.","year":"2001","unstructured":"Pastor-Satorras , R. and Vespignani , A. 2001 . Epidemic spreading in scale-free networks . Phys. Rev. Lett. 86 , 14, 3200 -- 3203 . Pastor-Satorras, R. and Vespignani, A. 2001. Epidemic spreading in scale-free networks. Phys. Rev. Lett. 86, 14, 3200--3203.","journal-title":"Phys. Rev. Lett."},{"key":"e_1_2_1_34_1","doi-asserted-by":"crossref","unstructured":"Pastor-Satorras R. and Vespignani A. 2004. Evolution and Structure of the Internet. A Statistical Physics Approach. Cambridge University Press Cambridge U. K.   Pastor-Satorras R. and Vespignani A. 2004. Evolution and Structure of the Internet. A Statistical Physics Approach. Cambridge University Press Cambridge U. K.","DOI":"10.1017\/CBO9780511610905"},{"key":"e_1_2_1_35_1","doi-asserted-by":"crossref","first-page":"5207","DOI":"10.1073\/pnas.032085699","article-title":"Winners don't take all: Characterizing the competition for links on the web","volume":"99","author":"Pennock D. M.","year":"2002","unstructured":"Pennock , D. M. , Flake , G. W. , Lawrence , S. , Glover , E. J. , and Giles , C. L. 2002 . Winners don't take all: Characterizing the competition for links on the web . Proc. Natl. Acad. Sci. 99 , 8, 5207 -- 5211 . Pennock, D. M., Flake, G. W., Lawrence, S., Glover, E. J., and Giles, C. L. 2002. Winners don't take all: Characterizing the competition for links on the web. Proc. Natl. Acad. Sci. 99, 8, 5207--5211.","journal-title":"Proc. Natl. Acad. Sci."},{"key":"e_1_2_1_36_1","volume-title":"Proceedings of the AAAI Fall Symposium on Using Uncertainty Within Computation. 121--128","author":"Rusmevichientong P.","unstructured":"Rusmevichientong , P. , Pennock , D. M. , Lawrence , S. , and Giles , C. L . 2001. Methods for sampling pages uniformly from the World Wide Web . In Proceedings of the AAAI Fall Symposium on Using Uncertainty Within Computation. 121--128 . Rusmevichientong, P., Pennock, D. M., Lawrence, S., and Giles, C. L. 2001. Methods for sampling pages uniformly from the World Wide Web. In Proceedings of the AAAI Fall Symposium on Using Uncertainty Within Computation. 121--128."}],"container-title":["ACM Transactions on the Web"],"original-title":[],"language":"en","link":[{"URL":"https:\/\/dl.acm.org\/doi\/10.1145\/1255438.1255442","content-type":"unspecified","content-version":"vor","intended-application":"text-mining"},{"URL":"https:\/\/dl.acm.org\/doi\/pdf\/10.1145\/1255438.1255442","content-type":"unspecified","content-version":"vor","intended-application":"similarity-checking"}],"deposited":{"date-parts":[[2025,6,18]],"date-time":"2025-06-18T20:22:28Z","timestamp":1750278148000},"score":1,"resource":{"primary":{"URL":"https:\/\/dl.acm.org\/doi\/10.1145\/1255438.1255442"}},"subtitle":["A comparative analysis of Web crawls"],"short-title":[],"issued":{"date-parts":[[2007,8]]},"references-count":36,"journal-issue":{"issue":"2","published-print":{"date-parts":[[2007,8]]}},"alternative-id":["10.1145\/1255438.1255442"],"URL":"https:\/\/doi.org\/10.1145\/1255438.1255442","relation":{},"ISSN":["1559-1131","1559-114X"],"issn-type":[{"value":"1559-1131","type":"print"},{"value":"1559-114X","type":"electronic"}],"subject":[],"published":{"date-parts":[[2007,8]]},"assertion":[{"value":"2007-08-01","order":2,"name":"published","label":"Published","group":{"name":"publication_history","label":"Publication History"}}]}}