{"status":"ok","message-type":"work","message-version":"1.0.0","message":{"indexed":{"date-parts":[[2026,2,27]],"date-time":"2026-02-27T15:27:05Z","timestamp":1772206025305,"version":"3.50.1"},"reference-count":23,"publisher":"Association for Computing Machinery (ACM)","issue":"2","license":[{"start":{"date-parts":[[2007,8,1]],"date-time":"2007-08-01T00:00:00Z","timestamp":1185926400000},"content-version":"vor","delay-in-days":0,"URL":"https:\/\/www.acm.org\/publications\/policies\/copyright_policy#Background"}],"content-domain":{"domain":["dl.acm.org"],"crossmark-restriction":true},"short-container-title":["ACM Trans. Knowl. Discov. Data"],"published-print":{"date-parts":[[2007,8]]},"abstract":"<jats:p>We study a problem of mining frequently occurring periodic patterns with a gap requirement from sequences. Given a character sequence<jats:italic>S<\/jats:italic>of length<jats:italic>L<\/jats:italic>and a pattern<jats:italic>P<\/jats:italic>of length<jats:italic>l<\/jats:italic>, we consider<jats:italic>P<\/jats:italic>a frequently occurring pattern in<jats:italic>S<\/jats:italic>if the probability of<jats:italic>observing<\/jats:italic><jats:italic>P<\/jats:italic>given a randomly picked length-<jats:italic>l<\/jats:italic>subsequence of<jats:italic>S<\/jats:italic>exceeds a certain threshold. In many applications, particularly those related to bioinformatics, interesting patterns are<jats:italic>periodic<\/jats:italic>with a<jats:italic>gap requirement<\/jats:italic>. That is to say, the characters in<jats:italic>P<\/jats:italic>should match subsequences of<jats:italic>S<\/jats:italic>in such a way that the matching characters in<jats:italic>S<\/jats:italic>are separated by gaps of more or less the same size. We show the complexity of the mining problem and discuss why traditional mining algorithms are computationally infeasible. We propose practical algorithms for solving the problem and study their characteristics. We also present a case study in which we apply our algorithms on some DNA sequences. We discuss some interesting patterns obtained from the case study.<\/jats:p>","DOI":"10.1145\/1267066.1267068","type":"journal-article","created":{"date-parts":[[2007,9,14]],"date-time":"2007-09-14T13:44:55Z","timestamp":1189777495000},"page":"7","update-policy":"https:\/\/doi.org\/10.1145\/crossmark-policy","source":"Crossref","is-referenced-by-count":74,"title":["Mining periodic patterns with gap requirement from sequences"],"prefix":"10.1145","volume":"1","author":[{"given":"Minghua","family":"Zhang","sequence":"first","affiliation":[{"name":"National University of Singapore, Singapore"}]},{"given":"Ben","family":"Kao","sequence":"additional","affiliation":[{"name":"The University of Hong Kong, Hong Kong"}]},{"given":"David W.","family":"Cheung","sequence":"additional","affiliation":[{"name":"The University of Hong Kong, Hong Kong"}]},{"given":"Kevin Y.","family":"Yip","sequence":"additional","affiliation":[{"name":"Yale University, New Haven, CT"}]}],"member":"320","published-online":{"date-parts":[[2007,8]]},"reference":[{"key":"e_1_2_1_1_1","doi-asserted-by":"publisher","DOI":"10.1145\/170035.170072"},{"key":"e_1_2_1_2_1","doi-asserted-by":"publisher","DOI":"10.1016\/S0022-2836(05)80360-2"},{"key":"e_1_2_1_3_1","doi-asserted-by":"publisher","DOI":"10.1093\/nar\/20.suppl.2019"},{"key":"e_1_2_1_4_1","doi-asserted-by":"crossref","unstructured":"Bernardi G. Olofsson B. Filipski J. Zerial M. Salinas J. Cuny G. Meunier-Rotival M. and Rodier F. 1985. The mosaic genome of warm-blooded vertebrates. Science 228 4702 953--958. Bernardi G. Olofsson B. Filipski J. Zerial M. Salinas J. Cuny G. Meunier-Rotival M. and Rodier F. 1985. The mosaic genome of warm-blooded vertebrates. Science 228 4702 953--958.","DOI":"10.1126\/science.4001930"},{"key":"e_1_2_1_5_1","doi-asserted-by":"publisher","DOI":"10.1093\/bioinformatics\/14.6.498"},{"key":"e_1_2_1_6_1","doi-asserted-by":"publisher","DOI":"10.1093\/nar\/20.24.6441"},{"key":"e_1_2_1_7_1","volume-title":"Proceedings of the 15th International Conference on Data Engineering, (ICDE99)","author":"Han J.","unstructured":"Han , J. , Dong , G. , and Yin , Y . 1999. Efficient mining of partial periodic patterns in time series database . In Proceedings of the 15th International Conference on Data Engineering, (ICDE99) . 106--115. Han, J., Dong, G., and Yin, Y. 1999. Efficient mining of partial periodic patterns in time series database. In Proceedings of the 15th International Conference on Data Engineering, (ICDE99). 106--115."},{"key":"e_1_2_1_8_1","doi-asserted-by":"publisher","DOI":"10.1093\/bioinformatics\/15.3.187"},{"key":"e_1_2_1_10_1","volume-title":"Proceedings of the 8th International Conference on Intelligent Systems for Molecular (ISMB-00)","author":"Kurtz S.","unstructured":"Kurtz , S. , Ohlebusch , E. , Schleiermacher , C. , Stoye , J. , and Giegerich , R . 2000. Computation and visualization of degenerate repeats in complete genomes . In Proceedings of the 8th International Conference on Intelligent Systems for Molecular (ISMB-00) . Kurtz, S., Ohlebusch, E., Schleiermacher, C., Stoye, J., and Giegerich, R. 2000. Computation and visualization of degenerate repeats in complete genomes. In Proceedings of the 8th International Conference on Intelligent Systems for Molecular (ISMB-00)."},{"key":"e_1_2_1_11_1","doi-asserted-by":"publisher","DOI":"10.1023\/A:1009748302351"},{"key":"e_1_2_1_12_1","unstructured":"NCBI. The National Center for Biotechnology Information web site. http:\/\/www.ncbi.nlm.nih.gov. NCBI. The National Center for Biotechnology Information web site. http:\/\/www.ncbi.nlm.nih.gov."},{"key":"e_1_2_1_13_1","volume-title":"Proceedings of the 17th IEEE International Conference on Data Engineering (ICDE)","author":"Pei J.","unstructured":"Pei , J. , Han , J. , Mortazavi-Asl , B. , Pinto , H. , Chen , Q. , Dayal , U. , and Hsu , M . -C. 2001. Prefixspan: Mining sequential patterns by prefix-projected growth . In Proceedings of the 17th IEEE International Conference on Data Engineering (ICDE) ( Heidelberg, Germany). IEEE Computer Society Press, Los Alamitos, CA. Pei, J., Han, J., Mortazavi-Asl, B., Pinto, H., Chen, Q., Dayal, U., and Hsu, M.-C. 2001. Prefixspan: Mining sequential patterns by prefix-projected growth. In Proceedings of the 17th IEEE International Conference on Data Engineering (ICDE) (Heidelberg, Germany). IEEE Computer Society Press, Los Alamitos, CA."},{"key":"e_1_2_1_14_1","doi-asserted-by":"publisher","DOI":"10.1016\/S0955-0674(97)80009-9"},{"key":"e_1_2_1_15_1","doi-asserted-by":"crossref","first-page":"1","DOI":"10.1093\/bioinformatics\/14.1.55","article-title":"Combinatorial pattern discovery in biological sequences: the teiresias algorithm","volume":"14","author":"Rigoutsos I.","year":"1998","unstructured":"Rigoutsos , I. and Floratos , A. 1998 . Combinatorial pattern discovery in biological sequences: the teiresias algorithm . Bioinformatics 14 , 1 . Rigoutsos, I. and Floratos, A. 1998. Combinatorial pattern discovery in biological sequences: the teiresias algorithm. Bioinformatics 14, 1.","journal-title":"Bioinformatics"},{"key":"e_1_2_1_16_1","volume-title":"Proceedings of the 5th Conference on Extending Database Technology (EDBT)","author":"Srikant R.","unstructured":"Srikant , R. and Agrawal , R . 1996. Mining sequential patterns: Generalizations and performance improvements . In Proceedings of the 5th Conference on Extending Database Technology (EDBT) ( Avignion, France). Srikant, R. and Agrawal, R. 1996. Mining sequential patterns: Generalizations and performance improvements. In Proceedings of the 5th Conference on Extending Database Technology (EDBT) (Avignion, France)."},{"key":"e_1_2_1_17_1","doi-asserted-by":"publisher","DOI":"10.1007\/PL00006541"},{"key":"e_1_2_1_18_1","doi-asserted-by":"publisher","DOI":"10.1016\/S0378-4371(97)00510-4"},{"key":"e_1_2_1_19_1","doi-asserted-by":"publisher","DOI":"10.1128\/IAI.65.12.5017-5027.1997"},{"key":"e_1_2_1_20_1","doi-asserted-by":"publisher","DOI":"10.1006\/jmbi.1996.0341"},{"key":"e_1_2_1_21_1","doi-asserted-by":"publisher","DOI":"10.1145\/347090.347150"},{"key":"e_1_2_1_22_1","doi-asserted-by":"publisher","DOI":"10.1145\/288627.288643"},{"key":"e_1_2_1_23_1","doi-asserted-by":"publisher","DOI":"10.1145\/1066157.1066228"},{"key":"e_1_2_1_24_1","volume-title":"Proceedings of IC-AI'2001","author":"Zhang M.","unstructured":"Zhang , M. , Kao , B. , Yip , C. , and Cheung , D . 2001. A GSP-based efficient algorithm for mining frequent sequences . In Proceedings of IC-AI'2001 ( Las Vegas, NV). Zhang, M., Kao, B., Yip, C., and Cheung, D. 2001. A GSP-based efficient algorithm for mining frequent sequences. In Proceedings of IC-AI'2001 (Las Vegas, NV)."}],"container-title":["ACM Transactions on Knowledge Discovery from Data"],"original-title":[],"language":"en","link":[{"URL":"https:\/\/dl.acm.org\/doi\/10.1145\/1267066.1267068","content-type":"unspecified","content-version":"vor","intended-application":"text-mining"},{"URL":"https:\/\/dl.acm.org\/doi\/pdf\/10.1145\/1267066.1267068","content-type":"unspecified","content-version":"vor","intended-application":"similarity-checking"}],"deposited":{"date-parts":[[2025,6,18]],"date-time":"2025-06-18T14:52:14Z","timestamp":1750258334000},"score":1,"resource":{"primary":{"URL":"https:\/\/dl.acm.org\/doi\/10.1145\/1267066.1267068"}},"subtitle":[],"short-title":[],"issued":{"date-parts":[[2007,8]]},"references-count":23,"journal-issue":{"issue":"2","published-print":{"date-parts":[[2007,8]]}},"alternative-id":["10.1145\/1267066.1267068"],"URL":"https:\/\/doi.org\/10.1145\/1267066.1267068","relation":{},"ISSN":["1556-4681","1556-472X"],"issn-type":[{"value":"1556-4681","type":"print"},{"value":"1556-472X","type":"electronic"}],"subject":[],"published":{"date-parts":[[2007,8]]},"assertion":[{"value":"2007-08-01","order":2,"name":"published","label":"Published","group":{"name":"publication_history","label":"Publication History"}}]}}