{"status":"ok","message-type":"work","message-version":"1.0.0","message":{"indexed":{"date-parts":[[2025,10,5]],"date-time":"2025-10-05T04:33:25Z","timestamp":1759638805474,"version":"3.41.0"},"reference-count":49,"publisher":"Association for Computing Machinery (ACM)","issue":"1","license":[{"start":{"date-parts":[[2006,1,1]],"date-time":"2006-01-01T00:00:00Z","timestamp":1136073600000},"content-version":"vor","delay-in-days":0,"URL":"https:\/\/www.acm.org\/publications\/policies\/copyright_policy#Background"}],"content-domain":{"domain":["dl.acm.org"],"crossmark-restriction":true},"short-container-title":["J. ACM"],"published-print":{"date-parts":[[2006,1]]},"abstract":"<jats:p>\n            We consider the sequence comparison problem, also known as \u201c\n            <jats:italic>hidden<\/jats:italic>\n            \u201d pattern problem, where one searches for a given\n            <jats:italic>subsequence<\/jats:italic>\n            in a text (rather than a string understood as a sequence of consecutive symbols). A characteristic parameter is the number of occurrences of a given pattern\n            <jats:italic>w<\/jats:italic>\n            of length\n            <jats:italic>m<\/jats:italic>\n            as a subsequence in a random text of length\n            <jats:italic>n<\/jats:italic>\n            generated by a memoryless source. Spacings between letters of the pattern may either be constrained or not in order to define valid occurrences. We determine the mean and the variance of the number of occurrences, and establish a Gaussian limit law and large deviations. These results are obtained via combinatorics on words, formal language techniques, and methods of analytic combinatorics based on generating functions. The motivations to study this problem come from an attempt at finding a reliable threshold for intrusion detections, from textual data processing applications, and from molecular biology.\n          <\/jats:p>","DOI":"10.1145\/1120582.1120586","type":"journal-article","created":{"date-parts":[[2006,5,8]],"date-time":"2006-05-08T16:09:20Z","timestamp":1147104560000},"page":"147-183","update-policy":"https:\/\/doi.org\/10.1145\/crossmark-policy","source":"Crossref","is-referenced-by-count":24,"title":["Hidden word statistics"],"prefix":"10.1145","volume":"53","author":[{"given":"Philippe","family":"Flajolet","sequence":"first","affiliation":[{"name":"INRIA-Rocquencourt, Le Chesnay, France"}]},{"given":"Wojciech","family":"Szpankowski","sequence":"additional","affiliation":[{"name":"Purdue University, West Lafayette, Indiana"}]},{"given":"Brigitte","family":"Vall\u00e9e","sequence":"additional","affiliation":[{"name":"Universit\u00e9 de Caen, Caen Cedex, France, and INRIA-Rocquencourt, Le Chesnay, France"}]}],"member":"320","published-online":{"date-parts":[[2006,1]]},"reference":[{"volume-title":"Algebraic Geometry for Scientists and Engineers","author":"Abhyankar S.-S.","key":"e_1_2_1_1_1","unstructured":"Abhyankar , S.-S. 1990. Algebraic Geometry for Scientists and Engineers . American Mathematical Society Providence , R.I .]] Abhyankar, S.-S. 1990. Algebraic Geometry for Scientists and Engineers. American Mathematical Society Providence, R.I.]]"},{"volume-title":"Four Walls Eight Windows","author":"Aczel A.","key":"e_1_2_1_2_1","unstructured":"Aczel , A. 2000. The Mystery of the Aleph. Mathematics, the Kabbalah, and the Search for Infinity , Four Walls Eight Windows , New York .]] Aczel, A. 2000. The Mystery of the Aleph. Mathematics, the Kabbalah, and the Search for Infinity, Four Walls Eight Windows, New York.]]"},{"key":"e_1_2_1_3_1","doi-asserted-by":"publisher","DOI":"10.1006\/inco.2002.3143"},{"volume-title":"Proceedings of the 3rd IEEE International Conference on Data Mining (ICDM-03)","author":"Atallah M.","key":"e_1_2_1_4_1","unstructured":"Atallah , M. , Gwadera , R. , and Szpankowski , W . 2003. Reliable detection of episodes in event sequences . In Proceedings of the 3rd IEEE International Conference on Data Mining (ICDM-03) (Melbourne, FL), Nov. IEEE Computer Science Press, Los Alamitos, CA, 67--74.]] Atallah, M., Gwadera, R., and Szpankowski, W. 2003. Reliable detection of episodes in event sequences. In Proceedings of the 3rd IEEE International Conference on Data Mining (ICDM-03) (Melbourne, FL), Nov. IEEE Computer Science Press, Los Alamitos, CA, 67--74.]]"},{"key":"e_1_2_1_5_1","doi-asserted-by":"publisher","DOI":"10.1016\/0097-3165(73)90038-1"},{"key":"e_1_2_1_6_1","doi-asserted-by":"publisher","DOI":"10.1006\/eujc.1993.1030"},{"key":"e_1_2_1_7_1","doi-asserted-by":"publisher","DOI":"10.1016\/0097-3165(83)90012-2"},{"key":"e_1_2_1_8_1","volume-title":"Probability and Measure","author":"Billingsley P.","unstructured":"Billingsley , P. 1986. Probability and Measure , 2 nd Ed., John Wiley , New York .]] Billingsley, P. 1986. Probability and Measure, 2nd Ed., John Wiley, New York.]]","edition":"2"},{"key":"e_1_2_1_9_1","doi-asserted-by":"publisher","DOI":"10.1145\/303976.304008"},{"key":"e_1_2_1_10_1","first-page":"229","volume-title":"Proceedings of the Colloquium on Mathematics and Computer Science","author":"Bourdon J.","unstructured":"Bourdon , J. , and Vall\u00e9e , B . 2002. Generalized pattern matching statistics . In Proceedings of the Colloquium on Mathematics and Computer Science ( Versailles, France). B. Chauvin et al. Eds. Birkh\u00e4user Verlag , pp. 229 -- 245 .]] Bourdon, J., and Vall\u00e9e, B. 2002. Generalized pattern matching statistics. In Proceedings of the Colloquium on Mathematics and Computer Science (Versailles, France). B. Chauvin et al. Eds. Birkh\u00e4user Verlag, pp. 229--245.]]"},{"key":"e_1_2_1_11_1","doi-asserted-by":"publisher","DOI":"10.1007\/BF02679623"},{"key":"e_1_2_1_12_1","unstructured":"Crochemore M. and Rytter W. 1994. Text Algorithms Oxford University Press New York.]]   Crochemore M. and Rytter W. 1994. Text Algorithms Oxford University Press New York.]]"},{"key":"e_1_2_1_13_1","doi-asserted-by":"crossref","unstructured":"Das G. Fleischer R. Gasieniec L. Gunopulos D. and \n      K\u00e4rkk\u00e4inen J\n  . \n  1997\n  . Episode matching In Combinatorial Pattern Matching 8th \n  Annual Symposium Lecture Notes in Computer Science\n  . 12--27.]]   Das G. Fleischer R. Gasieniec L. Gunopulos D. and K\u00e4rkk\u00e4inen J. 1997. Episode matching In Combinatorial Pattern Matching 8th Annual Symposium Lecture Notes in Computer Science. 12--27.]]","DOI":"10.1007\/3-540-63220-4_46"},{"volume-title":"Large Deviations","author":"den Hollander F.","key":"e_1_2_1_14_1","unstructured":"den Hollander , F. 2000. Large Deviations . American Mathematical Society , Providence, RI .]] den Hollander, F. 2000. Large Deviations. American Mathematical Society, Providence, RI.]]"},{"key":"e_1_2_1_15_1","doi-asserted-by":"publisher","DOI":"10.1007\/BF00348756"},{"key":"e_1_2_1_16_1","unstructured":"Flajolet P. and Sedgewick R. 2005. Analytic Combinatorics In prep. (Available electronically at http:algo.inria.fr\/flajolet\/Publications.)]]   Flajolet P. and Sedgewick R. 2005. Analytic Combinatorics In prep. (Available electronically at http:algo.inria.fr\/flajolet\/Publications.)]]"},{"key":"e_1_2_1_17_1","doi-asserted-by":"publisher","DOI":"10.1016\/0012-365X(93)90364-Y"},{"key":"e_1_2_1_18_1","volume-title":"Deutscher Verlag der Wissenschaften","author":"Gantmacher F. R.","year":"1966","unstructured":"Gantmacher , F. R. 1986. Matrizentheorie . Deutscher Verlag der Wissenschaften , Berlin, Germany . (A translation of the Russian original Teoria Matriz, Nauka Moscow, Russia , 1966 .)]] Gantmacher, F. R. 1986. Matrizentheorie. Deutscher Verlag der Wissenschaften, Berlin, Germany. (A translation of the Russian original Teoria Matriz, Nauka Moscow, Russia, 1966.)]]"},{"key":"e_1_2_1_19_1","unstructured":"Gnedenko B. V. and Kolmogorov A. N. 1968. Limit Distributions for Sums of Independent Random Variables. Addison-Wesley Reading MA.]]  Gnedenko B. V. and Kolmogorov A. N. 1968. Limit Distributions for Sums of Independent Random Variables. Addison-Wesley Reading MA.]]"},{"key":"e_1_2_1_20_1","doi-asserted-by":"publisher","DOI":"10.1016\/0097-3165(81)90038-8"},{"key":"e_1_2_1_21_1","doi-asserted-by":"publisher","DOI":"10.1016\/0097-3165(81)90005-4"},{"key":"e_1_2_1_22_1","doi-asserted-by":"crossref","unstructured":"Guivarc'h Y. 2000. Marches al\u00e9atoires sur les groupes. Fascicule de probabilit\u00e9s. Publ. Inst. Rech. Math. Rennes.]]  Guivarc'h Y. 2000. Marches al\u00e9atoires sur les groupes. Fascicule de probabilit\u00e9s. Publ. Inst. Rech. Math. Rennes.]]","DOI":"10.1007\/978-3-0348-8968-1_19"},{"volume-title":"Proceedings of the 4th IEEE International Conference on Data Mining (ICDM-04)","author":"Gwadera R.","key":"e_1_2_1_23_1","unstructured":"Gwadera , R. , Atallah , M. , and Szpankowski , W . 2004. Detection of significant sets of episodes in event sequences . In Proceedings of the 4th IEEE International Conference on Data Mining (ICDM-04) (Brighton, UK, Nov.) IEEE Computer Science Press, Los Alamitos, CA, 3--10.]] Gwadera, R., Atallah, M., and Szpankowski, W. 2004. Detection of significant sets of episodes in event sequences. In Proceedings of the 4th IEEE International Conference on Data Mining (ICDM-04) (Brighton, UK, Nov.) IEEE Computer Science Press, Los Alamitos, CA, 3--10.]]"},{"volume-title":"Proceedings of the 2005 SIAM International Conference on Data Mining (SDM-05)","author":"Gwadera R.","key":"e_1_2_1_24_1","unstructured":"Gwadera , R. , Atallah , M. , and Szpankowski , W . 2005. Markov models for identification of significant episodes . In Proceedings of the 2005 SIAM International Conference on Data Mining (SDM-05) (Newport Beach, Cal. Apr.). SIAM, Philadelphia, PA.]] Gwadera, R., Atallah, M., and Szpankowski, W. 2005. Markov models for identification of significant episodes. In Proceedings of the 2005 SIAM International Conference on Data Mining (SDM-05) (Newport Beach, Cal. Apr.). SIAM, Philadelphia, PA.]]"},{"key":"e_1_2_1_25_1","doi-asserted-by":"publisher","DOI":"10.1214\/aoap\/1034968075"},{"key":"e_1_2_1_26_1","doi-asserted-by":"publisher","DOI":"10.1006\/eujc.1997.0179"},{"volume-title":"Gaussian Hilbert spaces","author":"Janson S.","key":"e_1_2_1_27_1","unstructured":"Janson , S. 1997. Gaussian Hilbert spaces , Cambridge University Press , Cambridge, MA .]] Janson, S. 1997. Gaussian Hilbert spaces, Cambridge University Press, Cambridge, MA.]]"},{"key":"e_1_2_1_28_1","doi-asserted-by":"publisher","DOI":"10.1002\/rsa.v24:3"},{"volume-title":"Perturbation theory for Linear Operators","author":"Kato T.","key":"e_1_2_1_29_1","unstructured":"Kato , T. 1980. Perturbation theory for Linear Operators . Springer-Verlag , New York .]] Kato, T. 1980. Perturbation theory for Linear Operators. Springer-Verlag, New York.]]"},{"key":"e_1_2_1_30_1","volume-title":"Fundamental Algorithms","author":"Knuth D. E.","unstructured":"Knuth , D. E. 1997. The Art of Computer Programming , Fundamental Algorithms , Vol. 1 , 3 rd Ed., Addison-Wesley , Reading, MA .]] Knuth, D. E. 1997. The Art of Computer Programming, Fundamental Algorithms, Vol. 1, 3rd Ed., Addison-Wesley, Reading, MA.]]","edition":"3"},{"key":"e_1_2_1_31_1","volume-title":"The Art of Computer Programming. Sorting and Searching","author":"Knuth D. E.","unstructured":"Knuth , D. E. 1998. The Art of Computer Programming. Sorting and Searching , Vol. 3 , 2 nd Ed., Addison-Wesley, Reading , MA .]] Knuth, D. E. 1998. The Art of Computer Programming. Sorting and Searching, Vol. 3, 2nd Ed., Addison-Wesley, Reading, MA.]]","edition":"2"},{"key":"e_1_2_1_32_1","doi-asserted-by":"publisher","DOI":"10.1016\/S0304-3975(97)88195-9"},{"volume-title":"Proceedings of the National Computer Security Conference. 11--21","author":"Kumar S.","key":"e_1_2_1_33_1","unstructured":"Kumar , S. , and Spafford , E. H . 1994. A pattern-matching model for intrusion detection . In Proceedings of the National Computer Security Conference. 11--21 .]] Kumar, S., and Spafford, E. H. 1994. A pattern-matching model for intrusion detection. In Proceedings of the National Computer Security Conference. 11--21.]]"},{"volume-title":"Encyclopedia of Mathematics and Its Applications","author":"Lothaire M.","key":"e_1_2_1_34_1","unstructured":"Lothaire , M. 2005. Applied Combinatorics on Words . In Encyclopedia of Mathematics and Its Applications , vol. 90 . Cambridge University Press , Cambridge, MA .]] Lothaire, M. 2005. Applied Combinatorics on Words. In Encyclopedia of Mathematics and Its Applications, vol. 90. Cambridge University Press, Cambridge, MA.]]"},{"key":"e_1_2_1_35_1","doi-asserted-by":"publisher","DOI":"10.1214\/ss\/1009212243"},{"volume-title":"Proceedings of the European Symposium on Algorithms. IEEE Computer Society Press","author":"Nicod\u00e8me P.","key":"e_1_2_1_36_1","unstructured":"Nicod\u00e8me , P. , Salvy , B. , and Flajolet , P . 1999. Motif statistics . In Proceedings of the European Symposium on Algorithms. IEEE Computer Society Press , Los Alamitos, CA, No. 1643. 194--211.]] Nicod\u00e8me, P., Salvy, B., and Flajolet, P. 1999. Motif statistics. In Proceedings of the European Symposium on Algorithms. IEEE Computer Society Press, Los Alamitos, CA, No. 1643. 194--211.]]"},{"volume-title":"Computational Molecular Biology: An Algorithmic Approach","author":"Pevzner P.","key":"e_1_2_1_37_1","unstructured":"Pevzner , P. 2000. Computational Molecular Biology: An Algorithmic Approach , MIT Press , Cambridge, MA .]] Pevzner, P. 2000. Computational Molecular Biology: An Algorithmic Approach, MIT Press, Cambridge, MA.]]"},{"volume-title":"Proceedings of the Compression and Complexity of Sequences, IEEE Computer Society Press","author":"R\u00e9gnier M.","key":"e_1_2_1_38_1","unstructured":"R\u00e9gnier , M. , and Szpankowski , W . 1997. On the approximate pattern occurrences in a text . In Proceedings of the Compression and Complexity of Sequences, IEEE Computer Society Press , Los Alamitos, CA, 253--264.]] R\u00e9gnier, M., and Szpankowski, W. 1997. On the approximate pattern occurrences in a text. In Proceedings of the Compression and Complexity of Sequences, IEEE Computer Society Press, Los Alamitos, CA, 253--264.]]"},{"key":"e_1_2_1_39_1","doi-asserted-by":"publisher","DOI":"10.1007\/PL00009244"},{"key":"e_1_2_1_40_1","doi-asserted-by":"publisher","DOI":"10.1006\/mben.2000.0151"},{"key":"e_1_2_1_41_1","unstructured":"Sedgewick R. and Flajolet P. 1995. An Introduction to the Analysis of Algorithms Addison-Wesley Reading MA.]]   Sedgewick R. and Flajolet P. 1995. An Introduction to the Analysis of Algorithms Addison-Wesley Reading MA.]]"},{"volume-title":"Probability Theory and Combinatorial Optimization","author":"Steele J. M.","key":"e_1_2_1_42_1","unstructured":"Steele , J. M. 1997. Probability Theory and Combinatorial Optimization . SIAM , Philadelphia, PA .]] Steele, J. M. 1997. Probability Theory and Combinatorial Optimization. SIAM, Philadelphia, PA.]]"},{"volume-title":"Average Case Analysis of Algorithms on Sequences","author":"Szpankowski W.","key":"e_1_2_1_43_1","unstructured":"Szpankowski , W. 2001. Average Case Analysis of Algorithms on Sequences . Wiley , New York .]] Szpankowski, W. 2001. Average Case Analysis of Algorithms on Sequences. Wiley, New York.]]"},{"key":"e_1_2_1_44_1","doi-asserted-by":"publisher","DOI":"10.1007\/BF02679622"},{"key":"e_1_2_1_45_1","doi-asserted-by":"publisher","DOI":"10.1016\/S0923-2508(99)00115-1"},{"volume-title":"Introduction to Computational Biology","author":"Waterman M.","key":"e_1_2_1_46_1","unstructured":"Waterman , M. 1995. Introduction to Computational Biology . Chapman and Hall , London, England .]] Waterman, M. 1995. Introduction to Computational Biology. Chapman and Hall, London, England.]]"},{"key":"e_1_2_1_47_1","doi-asserted-by":"publisher","DOI":"10.5555\/1297828.1297830"},{"key":"e_1_2_1_48_1","doi-asserted-by":"publisher","DOI":"10.1214\/ss\/1177010393"},{"key":"e_1_2_1_49_1","doi-asserted-by":"publisher","DOI":"10.1145\/135239.135244"}],"container-title":["Journal of the ACM"],"original-title":[],"language":"en","link":[{"URL":"https:\/\/dl.acm.org\/doi\/10.1145\/1120582.1120586","content-type":"unspecified","content-version":"vor","intended-application":"text-mining"},{"URL":"https:\/\/dl.acm.org\/doi\/pdf\/10.1145\/1120582.1120586","content-type":"unspecified","content-version":"vor","intended-application":"similarity-checking"}],"deposited":{"date-parts":[[2025,6,18]],"date-time":"2025-06-18T20:13:56Z","timestamp":1750277636000},"score":1,"resource":{"primary":{"URL":"https:\/\/dl.acm.org\/doi\/10.1145\/1120582.1120586"}},"subtitle":[],"short-title":[],"issued":{"date-parts":[[2006,1]]},"references-count":49,"journal-issue":{"issue":"1","published-print":{"date-parts":[[2006,1]]}},"alternative-id":["10.1145\/1120582.1120586"],"URL":"https:\/\/doi.org\/10.1145\/1120582.1120586","relation":{},"ISSN":["0004-5411","1557-735X"],"issn-type":[{"type":"print","value":"0004-5411"},{"type":"electronic","value":"1557-735X"}],"subject":[],"published":{"date-parts":[[2006,1]]},"assertion":[{"value":"2006-01-01","order":2,"name":"published","label":"Published","group":{"name":"publication_history","label":"Publication History"}}]}}