{"status":"ok","message-type":"work","message-version":"1.0.0","message":{"indexed":{"date-parts":[[2026,3,31]],"date-time":"2026-03-31T09:35:21Z","timestamp":1774949721424,"version":"3.50.1"},"reference-count":148,"publisher":"Association for Computing Machinery (ACM)","issue":"2","license":[{"start":{"date-parts":[[2016,2,16]],"date-time":"2016-02-16T00:00:00Z","timestamp":1455580800000},"content-version":"vor","delay-in-days":0,"URL":"https:\/\/www.acm.org\/publications\/policies\/copyright_policy#Background"}],"content-domain":{"domain":["dl.acm.org"],"crossmark-restriction":true},"short-container-title":["ACM Trans. Inf. Syst."],"published-print":{"date-parts":[[2016,4,20]]},"abstract":"<jats:p>Several properties of information retrieval (IR) data, such as query frequency or document length, are widely considered to be approximately distributed as a power law. This common assumption aims to focus on specific characteristics of the empirical probability distribution of such data (e.g., its scale-free nature or its long\/fat tail). This assumption, however, may not be always true. Motivated by recent work in the statistical treatment of power law claims, we investigate two research questions: (i) To what extent do power law approximations hold for term frequency, document length, query frequency, query length, citation frequency, and syntactic unigram frequency? And (ii) what is the computational cost of replacing ad hoc power law approximations with more accurate distribution fitting? We study 23 TREC and 5 non-TREC datasets and compare the fit of power laws to 15 other standard probability distributions. We find that query frequency and 5 out of 24 term frequency distributions are best approximated by a power law. All remaining properties are better approximated by the Inverse Gaussian, Generalized Extreme Value, Negative Binomial, or Yule distribution. 
We also find the overhead of replacing power law approximations by more informed distribution fitting to be negligible, with potential gains to IR tasks like index compression or test collection generation for IR evaluation.<\/jats:p>","DOI":"10.1145\/2816815","type":"journal-article","created":{"date-parts":[[2016,2,22]],"date-time":"2016-02-22T13:07:16Z","timestamp":1456146436000},"page":"1-37","update-policy":"https:\/\/doi.org\/10.1145\/crossmark-policy","source":"Crossref","is-referenced-by-count":43,"title":["Power Law Distributions in Information Retrieval"],"prefix":"10.1145","volume":"34","author":[{"given":"Casper","family":"Petersen","sequence":"first","affiliation":[{"name":"University of Copenhagen, Denmark"}]},{"given":"Jakob Grue","family":"Simonsen","sequence":"additional","affiliation":[{"name":"University of Copenhagen, Denmark"}]},{"given":"Christina","family":"Lioma","sequence":"additional","affiliation":[{"name":"University of Copenhagen, Denmark"}]}],"member":"320","published-online":{"date-parts":[[2016,2,16]]},"reference":[{"key":"e_1_2_2_1_1","doi-asserted-by":"publisher","DOI":"10.1103\/PhysRevE.64.046135"},{"key":"e_1_2_2_2_1","doi-asserted-by":"publisher","DOI":"10.1109\/TAC.1974.1100705"},{"key":"e_1_2_2_3_1","doi-asserted-by":"publisher","DOI":"10.1145\/1390334.1390517"},{"key":"e_1_2_2_4_1","doi-asserted-by":"publisher","DOI":"10.1145\/1645953.1646055"},{"key":"e_1_2_2_5_1","doi-asserted-by":"publisher","DOI":"10.32614\/RJ-2011-016"},{"key":"e_1_2_2_6_1","volume-title":"Cox","author":"Asthana Harshvardhan","year":"2011","unstructured":"Harshvardhan Asthana , Ruoxun Fu , and Ingemar J . Cox . 2011 . On the feasibility of unstructured peer-to-peer information retrieval. In Advances in Information Retrieval Theory. Springer , 125--138. Harshvardhan Asthana, Ruoxun Fu, and Ingemar J. Cox. 2011. On the feasibility of unstructured peer-to-peer information retrieval. In Advances in Information Retrieval Theory. 
Springer, 125--138."},{"key":"e_1_2_2_7_1","doi-asserted-by":"publisher","DOI":"10.1145\/1571941.1572037"},{"key":"e_1_2_2_8_1","volume-title":"Word Frequency Distributions","author":"Baayen Harald","unstructured":"Harald Baayen . 2001. Word Frequency Distributions . Springer . Harald Baayen. 2001. Word Frequency Distributions. Springer."},{"key":"e_1_2_2_9_1","doi-asserted-by":"publisher","DOI":"10.1145\/2600428.2609509"},{"key":"e_1_2_2_10_1","first-page":"1","article-title":"Statistical string theory for courts: If the data don\u2019t fit","volume":"4","author":"Babbel David F.","year":"2009","unstructured":"David F. Babbel , Vincent J. Strickler , and Ricki S. Dolan . 2009 . Statistical string theory for courts: If the data don\u2019t fit . Legal Technology Risk Management 4 (2009), 1 . David F. Babbel, Vincent J. Strickler, and Ricki S. Dolan. 2009. Statistical string theory for courts: If the data don\u2019t fit. Legal Technology Risk Management 4 (2009), 1.","journal-title":"Legal Technology Risk Management"},{"key":"e_1_2_2_11_1","doi-asserted-by":"publisher","DOI":"10.1145\/1277741.1277775"},{"key":"e_1_2_2_12_1","volume-title":"Rodrigo Verschae, Carlos Castillo, and Carlos Hurtado.","author":"Baeza-Yates Ricardo","year":"2004","unstructured":"Ricardo Baeza-Yates , Javier Ruiz-del Solar , Rodrigo Verschae, Carlos Castillo, and Carlos Hurtado. 2004 . Content-based image retrieval and characterization on specific web collections. In Image and Video Retrieval. Springer , 189--198. Ricardo Baeza-Yates, Javier Ruiz-del Solar, Rodrigo Verschae, Carlos Castillo, and Carlos Hurtado. 2004. Content-based image retrieval and characterization on specific web collections. In Image and Video Retrieval. Springer, 189--198."},{"key":"e_1_2_2_13_1","volume-title":"String Processing and Information Retrieval","author":"Baeza-Yates Ricardo","unstructured":"Ricardo Baeza-Yates and Felipe Saint-Jean . 2003. A three level search engine index based in query log distribution . 
In String Processing and Information Retrieval . Springer , 56--65. Ricardo Baeza-Yates and Felipe Saint-Jean. 2003. A three level search engine index based in query log distribution. In String Processing and Information Retrieval. Springer, 56--65."},{"key":"e_1_2_2_14_1","doi-asserted-by":"publisher","DOI":"10.1145\/1281192.1281204"},{"key":"e_1_2_2_15_1","doi-asserted-by":"publisher","DOI":"10.1016\/S0378-4371(99)00291-5"},{"key":"e_1_2_2_16_1","doi-asserted-by":"publisher","DOI":"10.1140\/epjb\/e2007-00219-y"},{"key":"e_1_2_2_17_1","volume-title":"Probability: The Science of Uncertainty with Applications to Investments, Insurance, and Engineering.","author":"Bean Michael A.","year":"2001","unstructured":"Michael A. Bean. 2001. Probability: The Science of Uncertainty with Applications to Investments, Insurance, and Engineering. Vol. 6. American Mathematical Society."},{"key":"e_1_2_2_18_1","doi-asserted-by":"publisher","DOI":"10.1145\/1135777.1135955"},{"key":"e_1_2_2_19_1","unstructured":"Casper Beckman. 1999. Chinese character frequencies. http:\/\/casper.beckman.uiuc.edu\/~c-tsai4\/chinese\/charfreq.html. (1999). No longer available."},{"key":"e_1_2_2_20_1","volume-title":"Statistics of Extremes: Theory and Applications","author":"Beirlant Jan","unstructured":"Jan Beirlant, Yuri Goegebeur, Johan Segers, and Jozef Teugels. 2006. Statistics of Extremes: Theory and Applications. 
John Wiley & Sons."},{"key":"e_1_2_2_21_1","volume-title":"Proceedings of the 1st International Workshop on Adversarial Information Retrieval on the Web.","author":"Benczur Andras A.","year":"2005","unstructured":"Andras A. Benczur, Karoly Csalogany, Tamas Sarlos, and Mate Uher. 2005. SpamRank--Fully automatic link spam detection work in progress. In Proceedings of the 1st International Workshop on Adversarial Information Retrieval on the Web."},{"key":"e_1_2_2_22_1","doi-asserted-by":"publisher","DOI":"10.1145\/1458082.1458112"},{"key":"e_1_2_2_23_1","volume-title":"CoPhIR: A test collection for content-based image retrieval. arXiv preprint arXiv:0905.4627","author":"Bolettieri Paolo","year":"2009","unstructured":"Paolo Bolettieri, Andrea Esuli, Fabrizio Falchi, Claudio Lucchese, Raffaele Perego, Tommaso Piccioli, and Fausto Rabitti. 2009. CoPhIR: A test collection for content-based image retrieval. arXiv preprint arXiv:0905.4627 (2009)."},{"key":"e_1_2_2_24_1","doi-asserted-by":"publisher","DOI":"10.1002\/(SICI)1097-4571(199007)41:5<368::AID-ASI8>3.0.CO;2-C"},{"key":"e_1_2_2_25_1","volume-title":"Cox","author":"Box George E. P.","year":"1964","unstructured":"George E. P. Box and David R. Cox. 1964. An analysis of transformations. Journal of the Royal Statistical Society. 
Series B (Methodological) (1964), 211--252."},{"key":"e_1_2_2_26_1","doi-asserted-by":"publisher","DOI":"10.1109\/INFCOM.1999.749260"},{"key":"e_1_2_2_27_1","doi-asserted-by":"publisher","DOI":"10.1016\/S1389-1286(00)00083-9"},{"key":"e_1_2_2_28_1","volume-title":"Power laws & the new science of complexity management. Strategy+ Business 34","author":"Buchanan Mark","year":"2004","unstructured":"Mark Buchanan . 2004. Power laws & the new science of complexity management. Strategy+ Business 34 ( 2004 ), 1--8. Mark Buchanan. 2004. Power laws & the new science of complexity management. Strategy+ Business 34 (2004), 1--8."},{"key":"e_1_2_2_29_1","volume-title":"Anderson","author":"Burnham Kenneth P.","year":"2002","unstructured":"Kenneth P. Burnham and David R . Anderson . 2002 . Model Selection and Multimodel Inference: A Practical Information-Theoretic Approach. Springer . Kenneth P. Burnham and David R. Anderson. 2002. Model Selection and Multimodel Inference: A Practical Information-Theoretic Approach. Springer."},{"key":"e_1_2_2_30_1","doi-asserted-by":"publisher","DOI":"10.1063\/1.2137622"},{"key":"e_1_2_2_31_1","volume-title":"Proceedings of the 7th International Workshop on Finite-state Methods and Natural Language Processing","volume":"191","author":"Cantone Domenico","year":"2009","unstructured":"Domenico Cantone , Salvatore Cristofaro , Simone Faro , and Emanuele Giaquinta . 2009 . Finite state models for the generation of large corpora of natural language texts . In Proceedings of the 7th International Workshop on Finite-state Methods and Natural Language Processing , Vol. 191 . IOS Press, 175. Domenico Cantone, Salvatore Cristofaro, Simone Faro, and Emanuele Giaquinta. 2009. Finite state models for the generation of large corpora of natural language texts. In Proceedings of the 7th International Workshop on Finite-state Methods and Natural Language Processing, Vol. 191. 
IOS Press, 175."},{"key":"e_1_2_2_32_1","doi-asserted-by":"publisher","DOI":"10.1145\/1401890.1401995"},{"key":"e_1_2_2_33_1","doi-asserted-by":"publisher","DOI":"10.1145\/1132952.1132954"},{"key":"e_1_2_2_34_1","doi-asserted-by":"publisher","DOI":"10.1016\/j.ipm.2008.06.005"},{"key":"e_1_2_2_35_1","doi-asserted-by":"publisher","DOI":"10.1145\/1277741.1277855"},{"key":"e_1_2_2_36_1","doi-asserted-by":"publisher","DOI":"10.1016\/j.envsoft.2012.03.012"},{"key":"e_1_2_2_37_1","volume-title":"Are your data really pareto distributed? Physica A: Statistical Mechanics and its Applications 392, 23","author":"Cirillo Pasquale","year":"2013","unstructured":"Pasquale Cirillo . 2013. Are your data really pareto distributed? Physica A: Statistical Mechanics and its Applications 392, 23 ( 2013 ), 5947--5962. Pasquale Cirillo. 2013. Are your data really pareto distributed? Physica A: Statistical Mechanics and its Applications 392, 23 (2013), 5947--5962."},{"key":"e_1_2_2_38_1","doi-asserted-by":"publisher","DOI":"10.1177\/0022002702239512"},{"key":"e_1_2_2_39_1","doi-asserted-by":"publisher","DOI":"10.1093\/pan\/mpm004"},{"key":"e_1_2_2_40_1","doi-asserted-by":"publisher","DOI":"10.1137\/070710111"},{"key":"e_1_2_2_41_1","doi-asserted-by":"publisher","DOI":"10.1016\/j.ipm.2009.03.006"},{"key":"e_1_2_2_42_1","doi-asserted-by":"publisher","DOI":"10.1088\/1742-5468\/2004\/07\/P07003"},{"key":"e_1_2_2_43_1","volume-title":"Foreman","author":"Corder Gregory W.","year":"2009","unstructured":"Gregory W. Corder and Dale I . Foreman . 2009 . Nonparametric Statistics for Non-Statisticians: A Step-By-Step Approach. John Wiley & Sons . Gregory W. Corder and Dale I. Foreman. 2009. Nonparametric Statistics for Non-Statisticians: A Step-By-Step Approach. 
John Wiley & Sons."},{"key":"e_1_2_2_44_1","doi-asserted-by":"publisher","DOI":"10.1007\/s10791-011-9162-z"},{"key":"e_1_2_2_45_1","doi-asserted-by":"publisher","DOI":"10.1145\/1277741.1277784"},{"key":"e_1_2_2_46_1","doi-asserted-by":"publisher","DOI":"10.1023\/A:1010012224103"},{"key":"e_1_2_2_47_1","volume-title":"True reason for Zipf\u2019s law in language. Physica A: Statistical Mechanics and its Applications 358, 2","author":"Dahui Wang","year":"2005","unstructured":"Wang Dahui , Li Menghui , and Di Zengru . 2005. True reason for Zipf\u2019s law in language. Physica A: Statistical Mechanics and its Applications 358, 2 ( 2005 ), 545--550. Wang Dahui, Li Menghui, and Di Zengru. 2005. True reason for Zipf\u2019s law in language. Physica A: Statistical Mechanics and its Applications 358, 2 (2005), 545--550."},{"key":"e_1_2_2_48_1","volume-title":"MacKinnon","author":"Davidson Russell","year":"1981","unstructured":"Russell Davidson and James G . MacKinnon . 1981 . Several tests for model specification in the presence of alternative hypotheses. Econometrica : Journal of the Econometric Society ( 1981), 781--793. Russell Davidson and James G. MacKinnon. 1981. Several tests for model specification in the presence of alternative hypotheses. Econometrica: Journal of the Econometric Society (1981), 781--793."},{"key":"e_1_2_2_49_1","doi-asserted-by":"publisher","DOI":"10.1145\/1935826.1935858"},{"key":"e_1_2_2_50_1","volume-title":"Zipfs law, small world and Hungarian language. Alkalmazott Nyelvtudom\u00e1ny 1, 2","author":"Dominich Sandor","year":"2005","unstructured":"Sandor Dominich and Tamas Kiezer . 2005. Zipfs law, small world and Hungarian language. Alkalmazott Nyelvtudom\u00e1ny 1, 2 ( 2005 ), 5--24. In Hungarian . Sandor Dominich and Tamas Kiezer. 2005. Zipfs law, small world and Hungarian language. Alkalmazott Nyelvtudom\u00e1ny 1, 2 (2005), 5--24. In Hungarian."},{"key":"e_1_2_2_51_1","doi-asserted-by":"crossref","unstructured":"Joshua Drucker. 2007. 
Regional Dominance and Industrial Success: A Productivity-Based Analysis. ProQuest.  Joshua Drucker. 2007. Regional Dominance and Industrial Success: A Productivity-Based Analysis. ProQuest.","DOI":"10.2139\/ssrn.1356249"},{"key":"e_1_2_2_52_1","volume-title":"Gibrat\u2019s law for (all) cities. American Economic Review","author":"Eeckhout Jan","year":"2004","unstructured":"Jan Eeckhout . 2004. Gibrat\u2019s law for (all) cities. American Economic Review ( 2004 ), 1429--1451. Jan Eeckhout. 2004. Gibrat\u2019s law for (all) cities. American Economic Review (2004), 1429--1451."},{"key":"e_1_2_2_53_1","doi-asserted-by":"publisher","DOI":"10.1023\/A:1005634925734"},{"key":"e_1_2_2_54_1","doi-asserted-by":"publisher","DOI":"10.1371\/journal.pone.0009411"},{"key":"e_1_2_2_55_1","doi-asserted-by":"publisher","DOI":"10.1214\/aos\/1018031215"},{"key":"e_1_2_2_56_1","volume-title":"Statistical distributions","author":"Forbes Catherine","unstructured":"Catherine Forbes , Merran Evans , Nicholas Hastings , and Brian Peacock . 2011. Statistical distributions . John Wiley & Sons . Catherine Forbes, Merran Evans, Nicholas Hastings, and Brian Peacock. 2011. Statistical distributions. John Wiley & Sons."},{"key":"e_1_2_2_57_1","doi-asserted-by":"publisher","DOI":"10.1146\/annurev.economics.050708.142940"},{"key":"e_1_2_2_58_1","volume-title":"Proceedings of the 3rd Conference on Online Social Networks.","author":"Galuba Wojciech","year":"2010","unstructured":"Wojciech Galuba , Karl Aberer , Dipanjan Chakraborty , Zoran Despotovic , and Wolfgang Kellerer . 2010 . Outtweeting the Twitterers - Predicting information cascades in microblogs . In Proceedings of the 3rd Conference on Online Social Networks. Wojciech Galuba, Karl Aberer, Dipanjan Chakraborty, Zoran Despotovic, and Wolfgang Kellerer. 2010. Outtweeting the Twitterers - Predicting information cascades in microblogs. 
In Proceedings of the 3rd Conference on Online Social Networks."},{"key":"e_1_2_2_59_1","doi-asserted-by":"publisher","DOI":"10.1145\/2512938.2512946"},{"key":"e_1_2_2_60_1","volume-title":"ECIR 2011","author":"Gatterbauer Wolfgang","year":"2011","unstructured":"Wolfgang Gatterbauer . 2011 . Rules of thumb for information acquisition from large and redundant data. In Advances in Information Retrieval - 33rd European Conference on IR Research , ECIR 2011 , Dublin, Ireland, April 18--21 , 2011. 479--490. Wolfgang Gatterbauer. 2011. Rules of thumb for information acquisition from large and redundant data. In Advances in Information Retrieval - 33rd European Conference on IR Research, ECIR 2011, Dublin, Ireland, April 18--21, 2011. 479--490."},{"key":"e_1_2_2_61_1","volume-title":"WWW 2004 Workshop on the Weblogging ecosystem: Aggregation, Analysis and Dynamics","volume":"2004","author":"Glance Natalie","year":"2004","unstructured":"Natalie Glance , Matthew Hurst , and Takashi Tomokiyo . 2004 . Blogpulse: Automated trend discovery for weblogs . In WWW 2004 Workshop on the Weblogging ecosystem: Aggregation, Analysis and Dynamics , Vol. 2004 . ACM. Natalie Glance, Matthew Hurst, and Takashi Tomokiyo. 2004. Blogpulse: Automated trend discovery for weblogs. In WWW 2004 Workshop on the Weblogging ecosystem: Aggregation, Analysis and Dynamics, Vol. 2004. ACM."},{"key":"e_1_2_2_63_1","volume-title":"Operational Risk Toward Basel III: Best Practices and Issues in Modeling, Management, and Regulation","author":"Gregoriou Greg N.","unstructured":"Greg N. Gregoriou . 2009. Operational Risk Toward Basel III: Best Practices and Issues in Modeling, Management, and Regulation . Vol. 481 . John Wiley & Sons . Greg N. Gregoriou. 2009. Operational Risk Toward Basel III: Best Practices and Issues in Modeling, Management, and Regulation. Vol. 481. 
John Wiley & Sons."},{"key":"e_1_2_2_64_1","volume-title":"The Minimum Description Length Principle","author":"Gr\u00fcnwald Peter","unstructured":"Peter Gr\u00fcnwald . 2007. The Minimum Description Length Principle . MIT press . Peter Gr\u00fcnwald. 2007. The Minimum Description Length Principle. MIT press."},{"key":"e_1_2_2_65_1","doi-asserted-by":"publisher","DOI":"10.1023\/B:INRT.0000011206.23588.ab"},{"key":"e_1_2_2_66_1","doi-asserted-by":"publisher","DOI":"10.1145\/1835449.1835621"},{"key":"e_1_2_2_67_1","doi-asserted-by":"publisher","DOI":"10.1145\/1242572.1242602"},{"key":"e_1_2_2_68_1","doi-asserted-by":"publisher","DOI":"10.1287\/deca.1120.0260"},{"key":"e_1_2_2_69_1","doi-asserted-by":"publisher","DOI":"10.1007\/978-3-540-31865-1_48"},{"key":"e_1_2_2_70_1","volume-title":"Information Retrieval: Computational and Theoretical Aspects","author":"Heaps Harold S.","year":"1978","unstructured":"Harold S. Heaps . 1978 . Information Retrieval: Computational and Theoretical Aspects . Academic Press, Inc. , Orlando, FL, USA . Harold S. Heaps. 1978. Information Retrieval: Computational and Theoretical Aspects. Academic Press, Inc., Orlando, FL, USA."},{"key":"e_1_2_2_71_1","volume-title":"Advances in Information Retrieval","author":"Heesch Daniel","unstructured":"Daniel Heesch and Stefan R\u00fcger . 2004. NNk networks for content-based image retrieval . In Advances in Information Retrieval . Springer , 253--266. Daniel Heesch and Stefan R\u00fcger. 2004. NNk networks for content-based image retrieval. In Advances in Information Retrieval. Springer, 253--266."},{"key":"e_1_2_2_72_1","volume-title":"Negative Binomial Regression","author":"Hilbe Joseph","unstructured":"Joseph Hilbe . 2011. Negative Binomial Regression . Cambridge University Press . Joseph Hilbe. 2011. Negative Binomial Regression. 
Cambridge University Press."},{"key":"e_1_2_2_73_1","doi-asserted-by":"publisher","DOI":"10.1214\/aos\/1176343247"},{"key":"e_1_2_2_74_1","doi-asserted-by":"publisher","DOI":"10.1007\/11762256_31"},{"key":"e_1_2_2_75_1","volume-title":"Adamic","author":"Huberman Bernardo A.","year":"1999","unstructured":"Bernardo A. Huberman and Lada A . Adamic . 1999 . Evolutionary dynamics of the world wide web. arXiv Preprint Cond-Mat\/ 9901071 (1999). Bernardo A. Huberman and Lada A. Adamic. 1999. Evolutionary dynamics of the world wide web. arXiv Preprint Cond-Mat\/9901071 (1999)."},{"key":"e_1_2_2_76_1","doi-asserted-by":"publisher","DOI":"10.1093\/biomet\/76.2.297"},{"key":"e_1_2_2_77_1","doi-asserted-by":"publisher","DOI":"10.1038\/35036627"},{"key":"e_1_2_2_78_1","doi-asserted-by":"publisher","DOI":"10.1145\/1135777.1135986"},{"key":"e_1_2_2_79_1","doi-asserted-by":"publisher","DOI":"10.5555\/580760.823760"},{"key":"e_1_2_2_80_1","volume-title":"Continuous Multivariate Distributions","author":"Johnson Norman L.","unstructured":"Norman L. Johnson , Samuel Kotz , and Narayanaswamy Balakrishnan . 2002. Continuous Multivariate Distributions , Volume 1 , Models and Applications . Vol. 59. New York : John Wiley & Sons . Norman L. Johnson, Samuel Kotz, and Narayanaswamy Balakrishnan. 2002. Continuous Multivariate Distributions, Volume 1, Models and Applications. Vol. 59. New York: John Wiley & Sons."},{"key":"e_1_2_2_81_1","doi-asserted-by":"publisher","DOI":"10.1109\/TNET.2002.803905"},{"key":"e_1_2_2_82_1","volume-title":"Advances in Information Retrieval","author":"Kamps Jaap","unstructured":"Jaap Kamps and Marijn Koolen . 2008. The importance of link evidence in Wikipedia . In Advances in Information Retrieval . Springer , 270--282. Jaap Kamps and Marijn Koolen. 2008. The importance of link evidence in Wikipedia. In Advances in Information Retrieval. 
Springer, 270--282."},{"key":"e_1_2_2_83_1","doi-asserted-by":"publisher","DOI":"10.1145\/2556195.2559895"},{"key":"e_1_2_2_84_1","volume-title":"On the applicability of peer-to-peer data in music information retrieval research","author":"Koenigstein Noam","unstructured":"Noam Koenigstein , Yuval Shavitt , Ela Weinsberg , and Udi Weinsberg . 2010. On the applicability of peer-to-peer data in music information retrieval research . In International Society for Music Information Retrieval . 273--278. Noam Koenigstein, Yuval Shavitt, Ela Weinsberg, and Udi Weinsberg. 2010. On the applicability of peer-to-peer data in music information retrieval research. In International Society for Music Information Retrieval. 273--278."},{"key":"e_1_2_2_85_1","doi-asserted-by":"publisher","DOI":"10.5402\/2012\/872956"},{"key":"e_1_2_2_86_1","doi-asserted-by":"publisher","DOI":"10.1145\/1379092.1379123"},{"key":"e_1_2_2_87_1","doi-asserted-by":"publisher","DOI":"10.1145\/2380718.2380741"},{"key":"e_1_2_2_88_1","doi-asserted-by":"publisher","DOI":"10.1145\/1772690.1772751"},{"key":"e_1_2_2_89_1","volume-title":"Romano","author":"Lehmann Erich L.","year":"2006","unstructured":"Erich L. Lehmann and Joseph P . Romano . 2006 . Testing Statistical Hypotheses. Springer . Erich L. Lehmann and Joseph P. Romano. 2006. Testing Statistical Hypotheses. Springer."},{"key":"e_1_2_2_90_1","doi-asserted-by":"publisher","DOI":"10.1109\/TMM.2009.2012913"},{"key":"e_1_2_2_92_1","doi-asserted-by":"publisher","DOI":"10.5555\/1763653.1763667"},{"key":"e_1_2_2_93_1","doi-asserted-by":"publisher","DOI":"10.1016\/j.ipm.2006.12.005"},{"key":"e_1_2_2_94_1","volume-title":"Part of speech N-grams and information retrieval. Revue fran\u00e7aise De Linguistique Appliqu\u00e9e 13, 1","author":"Lioma Christina","year":"2008","unstructured":"Christina Lioma and Cornelis Joost van Rijsbergen . 2008. Part of speech N-grams and information retrieval. 
Revue fran\u00e7aise De Linguistique Appliqu\u00e9e 13, 1 ( 2008 ), 9--22. Christina Lioma and Cornelis Joost van Rijsbergen. 2008. Part of speech N-grams and information retrieval. Revue fran\u00e7aise De Linguistique Appliqu\u00e9e 13, 1 (2008), 9--22."},{"key":"e_1_2_2_95_1","doi-asserted-by":"publisher","DOI":"10.1145\/1089815.1089821"},{"key":"e_1_2_2_96_1","volume-title":"Chinese Computational Linguistics and Natural Language Processing Based on Naturally Annotated Big Data","author":"Liu Wuying","unstructured":"Wuying Liu , Lin Wang , and Mianzhu Yi. 2013. Power law for text categorization . In Chinese Computational Linguistics and Natural Language Processing Based on Naturally Annotated Big Data . Springer , 131--143. Wuying Liu, Lin Wang, and Mianzhu Yi. 2013. Power law for text categorization. In Chinese Computational Linguistics and Natural Language Processing Based on Naturally Annotated Big Data. Springer, 131--143."},{"key":"e_1_2_2_97_1","volume-title":"When Genius Failed: The Rise and Fall of Long-Term Capital Management","author":"Lowenstein Roger","unstructured":"Roger Lowenstein . 2000. When Genius Failed: The Rise and Fall of Long-Term Capital Management . Random House Trade Paperbacks . Roger Lowenstein. 2000. When Genius Failed: The Rise and Fall of Long-Term Capital Management. Random House Trade Paperbacks."},{"key":"e_1_2_2_98_1","doi-asserted-by":"publisher","DOI":"10.1147\/rd.22.0159"},{"key":"e_1_2_2_99_1","doi-asserted-by":"publisher","DOI":"10.1007\/978-3-642-12275-0_63"},{"key":"e_1_2_2_100_1","first-page":"661","article-title":"Some comments on CP","volume":"15","author":"Mallows Colin L.","year":"1973","unstructured":"Colin L. Mallows . 1973 . Some comments on CP . Technometrics 15 , 4 (1973), 661 -- 675 . Colin L. Mallows. 1973. Some comments on CP. Technometrics 15, 4 (1973), 661--675.","journal-title":"Technometrics"},{"key":"e_1_2_2_101_1","volume-title":"An informational theory of the statistical structure of language. 
Communication Theory 84","author":"Mandelbrot Benoit","year":"1953","unstructured":"Benoit Mandelbrot . 1953. An informational theory of the statistical structure of language. Communication Theory 84 ( 1953 ). Benoit Mandelbrot. 1953. An informational theory of the statistical structure of language. Communication Theory 84 (1953)."},{"key":"e_1_2_2_102_1","volume-title":"Introduction to Information Retrieval","author":"Manning Christopher D.","unstructured":"Christopher D. Manning , Prabhakar Raghavan , and Hinrich Sch\u00fctze . 2008. Introduction to Information Retrieval . Vol. 1 . Cambridge University Press . Christopher D. Manning, Prabhakar Raghavan, and Hinrich Sch\u00fctze. 2008. Introduction to Information Retrieval. Vol. 1. Cambridge University Press."},{"key":"e_1_2_2_103_1","volume-title":"Proceedings of the AMIA Annual Symposium","volume":"2013","author":"Mao Yuqing","year":"2013","unstructured":"Yuqing Mao and Zhiyong Lu . 2013 . Predicting clicks of PubMed articles . In Proceedings of the AMIA Annual Symposium , Vol. 2013 . American Medical Informatics Association, 947. Yuqing Mao and Zhiyong Lu. 2013. Predicting clicks of PubMed articles. In Proceedings of the AMIA Annual Symposium, Vol. 2013. American Medical Informatics Association, 947."},{"key":"e_1_2_2_104_1","volume-title":"International Encyclopedia of Education (3 ed.), Baker E","author":"Maydeu-Olivares Alberto","unstructured":"Alberto Maydeu-Olivares and Carlos Garca-Forero . 2010. Goodness-of-fit testing . In International Encyclopedia of Education (3 ed.), Baker E . Peterson, P. and B. McGaw (Eds.). Elsevier , 190--196. Alberto Maydeu-Olivares and Carlos Garca-Forero. 2010. Goodness-of-fit testing. In International Encyclopedia of Education (3 ed.), Baker E. Peterson, P. and B. McGaw (Eds.). 
Elsevier, 190--196."},{"key":"e_1_2_2_105_1","doi-asserted-by":"publisher","DOI":"10.1145\/505680.505683"},{"key":"e_1_2_2_106_1","volume-title":"Meerschaert and Hans-Peter Scheffler","author":"Mark","year":"2001","unstructured":"Mark M. Meerschaert and Hans-Peter Scheffler . 2001 . Limit Distributions for Sums of Independent Random vectors: Heavy Tails in Theory and Practice. Vol. 321 . John Wiley & Sons . Mark M. Meerschaert and Hans-Peter Scheffler. 2001. Limit Distributions for Sums of Independent Random vectors: Heavy Tails in Theory and Practice. Vol. 321. John Wiley & Sons."},{"key":"e_1_2_2_107_1","unstructured":"Edgar Meij and Maarten de Rijke. 2007. Using prior information derived from citations in literature search. In Recherche d\u2019Information et ses Applications.  Edgar Meij and Maarten de Rijke. 2007. Using prior information derived from citations in literature search. In Recherche d\u2019Information et ses Applications."},{"key":"e_1_2_2_108_1","volume-title":"Some effects of intermittent silence. American Journal of Psychology","author":"Miller George A.","year":"1957","unstructured":"George A. Miller . 1957. Some effects of intermittent silence. American Journal of Psychology ( 1957 ), 311--314. George A. Miller. 1957. Some effects of intermittent silence. American Journal of Psychology (1957), 311--314."},{"key":"e_1_2_2_109_1","doi-asserted-by":"publisher","DOI":"10.1002\/asi.v61:12"},{"key":"e_1_2_2_110_1","volume-title":"Proceedings of the 3rd Annual Workshop on the Weblogging Ecosystem.","author":"Mishne Gilad","year":"2006","unstructured":"Gilad Mishne and Natalie Glance . 2006 . Leave a reply: An analysis of weblog comments . In Proceedings of the 3rd Annual Workshop on the Weblogging Ecosystem. Gilad Mishne and Natalie Glance. 2006. Leave a reply: An analysis of weblog comments. 
In Proceedings of the 3rd Annual Workshop on the Weblogging Ecosystem."},{"key":"e_1_2_2_111_1","doi-asserted-by":"publisher","DOI":"10.1080\/15427951.2004.10129088"},{"key":"e_1_2_2_112_1","doi-asserted-by":"publisher","DOI":"10.1145\/1835449.1835619"},{"key":"e_1_2_2_113_1","doi-asserted-by":"publisher","DOI":"10.4018\/978-1-59140-557-3.ch076"},{"key":"e_1_2_2_114_1","doi-asserted-by":"publisher","DOI":"10.1080\/00107510500052444"},{"key":"e_1_2_2_115_1","volume-title":"Proceedings of the IEEE Global Telecommunications Conference (GLOBECOM\u201900)","author":"Christopher","unstructured":"Christopher R. Palmer and Greg Steffan. 2000. Generating network topologies that obey power laws . In Proceedings of the IEEE Global Telecommunications Conference (GLOBECOM\u201900) ,Vol. 1. IEEE, 434--438. Christopher R. Palmer and Greg Steffan. 2000. Generating network topologies that obey power laws. In Proceedings of the IEEE Global Telecommunications Conference (GLOBECOM\u201900),Vol. 1. IEEE, 434--438."},{"key":"e_1_2_2_116_1","doi-asserted-by":"publisher","DOI":"10.1145\/1146847.1146848"},{"key":"e_1_2_2_117_1","doi-asserted-by":"publisher","DOI":"10.1073\/pnas.032085699"},{"key":"e_1_2_2_118_1","doi-asserted-by":"publisher","DOI":"10.1016\/j.joi.2010.03.001"},{"key":"e_1_2_2_119_1","doi-asserted-by":"publisher","DOI":"10.1108\/07378831011026706"},{"key":"e_1_2_2_120_1","volume-title":"The two-parameter Poisson-Dirichlet distribution derived from a stable subordinator. Annals of Probability","author":"Pitman Jim","year":"1997","unstructured":"Jim Pitman and Marc Yor . 1997. The two-parameter Poisson-Dirichlet distribution derived from a stable subordinator. Annals of Probability ( 1997 ), 855--900. Jim Pitman and Marc Yor. 1997. The two-parameter Poisson-Dirichlet distribution derived from a stable subordinator. 
Annals of Probability (1997), 855--900."},{"key":"e_1_2_2_121_1","doi-asserted-by":"publisher","DOI":"10.1080\/10635150490522304"},{"key":"e_1_2_2_122_1","first-page":"77","article-title":"Extension of Zipf's law to word and character n-grams for English and Chinese","volume":"1","author":"Ha Le Quan","year":"2003","unstructured":"Le Quan Ha, Ji Ming, and Francis Jack Smith. 2003. Extension of Zipf's law to word and character n-grams for English and Chinese. Journal of Computational Linguistics and Chinese Language Processing 1, 77--102. Citeseer.","journal-title":"Journal of Computational Linguistics and Chinese Language Processing"},{"key":"e_1_2_2_123_1","volume-title":"Symposium on Networked Systems Design and Implementation. Usenix, San Francisco CA.","author":"Venugopalan","unstructured":"Venugopalan Ramasubramanian and Emin G\u00fcn Sirer. 2004. Beehive: Exploiting power law query distributions for O(1) lookup performance in peer to peer overlays. In Symposium on Networked Systems Design and Implementation. Usenix, San Francisco CA."},{"key":"e_1_2_2_124_1","doi-asserted-by":"publisher","DOI":"10.1007\/s100510050359"},{"key":"e_1_2_2_125_1","doi-asserted-by":"publisher","DOI":"10.1016\/S0378-4371(02)01507-8"},{"key":"e_1_2_2_126_1","doi-asserted-by":"publisher","DOI":"10.1081\/STA-120037438"},{"key":"e_1_2_2_127_1","volume-title":"Foster","author":"Ripeanu Matei","year":"2002","unstructured":"Matei Ripeanu and Ian T. Foster. 2002. 
Mapping the Gnutella network: Macroscopic properties of large-scale peer-to-peer systems. In IPTPS. Computing Research Repository, 85--93."},{"key":"e_1_2_2_128_1","doi-asserted-by":"publisher","DOI":"10.1037\/0033-295X.107.2.358"},{"key":"e_1_2_2_129_1","doi-asserted-by":"publisher","DOI":"10.1145\/1835804.1835890"},{"key":"e_1_2_2_130_1","volume-title":"Schunn and Dieter Wallach","author":"Christian","year":"2005","unstructured":"Christian D. Schunn and Dieter Wallach. 2005. Evaluating Goodness-of-Fit in Comparison of Models to Data. University of Saarland Press, Saarbrueken, 115--154."},{"key":"e_1_2_2_131_1","doi-asserted-by":"publisher","DOI":"10.1214\/aos\/1176344136"},{"key":"e_1_2_2_132_1","article-title":"On the proficient use of GEV distribution: A case study of subtropical monsoon region in India","volume":"8","author":"Shukla Ripunjai K.","year":"2010","unstructured":"Ripunjai K. Shukla, Mohan Trivedi, and Manoj Kumar. 2010. On the proficient use of GEV distribution: A case study of subtropical monsoon region in India. Annals of Computer Science Series 8, 1 (2010).","journal-title":"Annals of Computer Science Series"},{"key":"e_1_2_2_133_1","doi-asserted-by":"publisher","DOI":"10.1145\/1367497.1367542"},{"key":"e_1_2_2_134_1","volume-title":"On a class of skew distribution functions. Biometrika","author":"Simon Herbert A.","year":"1955","unstructured":"Herbert A. Simon. 1955. 
On a class of skew distribution functions. Biometrika (1955), 425--440."},{"key":"e_1_2_2_135_1","doi-asserted-by":"publisher","DOI":"10.1145\/564376.564475"},{"key":"e_1_2_2_136_1","doi-asserted-by":"publisher","DOI":"10.1108\/eb026526"},{"key":"e_1_2_2_137_1","doi-asserted-by":"publisher","DOI":"10.1162\/rest.91.3.648"},{"key":"e_1_2_2_138_1","doi-asserted-by":"publisher","DOI":"10.1109\/INFCOM.2003.1209237"},{"key":"e_1_2_2_139_1","volume-title":"Hinton","author":"Srivastava Nitish","year":"2013","unstructured":"Nitish Srivastava, Ruslan Salakhutdinov, and Geoffrey E. Hinton. 2013. Modeling documents with deep Boltzmann machines. In Proceedings of the 29th Conference on Uncertainty in Artificial Intelligence. AUAI Press, 616--625."},{"key":"e_1_2_2_140_1","doi-asserted-by":"publisher","DOI":"10.1007\/s13278-014-0174-8"},{"key":"e_1_2_2_141_1","doi-asserted-by":"publisher","DOI":"10.1145\/2505515.2507870"},{"key":"e_1_2_2_142_1","volume-title":"Algorithms and Models for the Web-Graph","author":"Volkovich Yana","unstructured":"Yana Volkovich, Nelly Litvak, and Debora Donato. 2007. Determining factors behind the PageRank log-log plot. In Algorithms and Models for the Web-Graph. Springer, 108--123."},{"key":"e_1_2_2_143_1","volume-title":"Likelihood ratio tests for model selection and non-nested hypotheses. Econometrica: Journal of the Econometric Society","author":"Vuong Quang H.","year":"1989","unstructured":"Quang H. Vuong. 1989. 
Likelihood ratio tests for model selection and non-nested hypotheses. Econometrica: Journal of the Econometric Society (1989), 307--333."},{"key":"e_1_2_2_144_1","doi-asserted-by":"publisher","DOI":"10.1145\/1031171.1031192"},{"key":"e_1_2_2_145_1","doi-asserted-by":"publisher","DOI":"10.1145\/860435.860455"},{"key":"e_1_2_2_146_1","doi-asserted-by":"publisher","DOI":"10.1016\/0031-3203(90)90072-S"},{"key":"e_1_2_2_147_1","doi-asserted-by":"publisher","DOI":"10.1145\/2009916.2009962"},{"key":"e_1_2_2_148_1","doi-asserted-by":"publisher","DOI":"10.1145\/1160633.1160685"},{"key":"e_1_2_2_149_1","doi-asserted-by":"publisher","DOI":"10.1145\/1367497.1367594"},{"key":"e_1_2_2_150_1","unstructured":"George K. Zipf. 1935. The Psycho-Biology of Language. Houghton Mifflin."}],"container-title":["ACM Transactions on Information 
Systems"],"original-title":[],"language":"en","link":[{"URL":"https:\/\/dl.acm.org\/doi\/10.1145\/2816815","content-type":"unspecified","content-version":"vor","intended-application":"text-mining"},{"URL":"https:\/\/dl.acm.org\/doi\/pdf\/10.1145\/2816815","content-type":"unspecified","content-version":"vor","intended-application":"similarity-checking"}],"deposited":{"date-parts":[[2025,6,18]],"date-time":"2025-06-18T05:48:19Z","timestamp":1750225699000},"score":1,"resource":{"primary":{"URL":"https:\/\/dl.acm.org\/doi\/10.1145\/2816815"}},"subtitle":[],"short-title":[],"issued":{"date-parts":[[2016,2,16]]},"references-count":148,"journal-issue":{"issue":"2","published-print":{"date-parts":[[2016,4,20]]}},"alternative-id":["10.1145\/2816815"],"URL":"https:\/\/doi.org\/10.1145\/2816815","relation":{},"ISSN":["1046-8188","1558-2868"],"issn-type":[{"value":"1046-8188","type":"print"},{"value":"1558-2868","type":"electronic"}],"subject":[],"published":{"date-parts":[[2016,2,16]]},"assertion":[{"value":"2014-10-01","order":0,"name":"received","label":"Received","group":{"name":"publication_history","label":"Publication History"}},{"value":"2015-08-01","order":1,"name":"accepted","label":"Accepted","group":{"name":"publication_history","label":"Publication History"}},{"value":"2016-02-16","order":2,"name":"published","label":"Published","group":{"name":"publication_history","label":"Publication History"}}]}}