{"status":"ok","message-type":"work","message-version":"1.0.0","message":{"indexed":{"date-parts":[[2022,6,1]],"date-time":"2022-06-01T16:13:46Z","timestamp":1654100026629},"reference-count":26,"publisher":"IGI Global","issue":"3","content-domain":{"domain":[],"crossmark-restriction":false},"short-container-title":[],"published-print":{"date-parts":[[2012,7,1]]},"abstract":"<p>Tandem repeats in DNA sequences are extremely relevant in biological phenomena and diagnostic tools. Computational programs that discover these tandem repeats generate a huge volume of data, which is often difficult to decipher without further organization. In this paper, the authors describe a new method for post-processing tandem repeats through clustering and classification. Their work presents multiple ways of expressing tandem repeats using the n-gram model with different clustering distance measures. Analysis of the clusters for the tandem repeats in the human genome shows that the method yields a well-defined grouping in which similarity among repeats is apparent. 
The authors\u2019 new, alignment-free method facilitates the analysis of the myriad of tandem repeats that occur in the human genome and they believe that this work will lead to new discoveries on the roles, origins, and significance of tandem repeats.<\/p>","DOI":"10.4018\/jkdb.2012070101","type":"journal-article","created":{"date-parts":[[2013,6,20]],"date-time":"2013-06-20T16:03:53Z","timestamp":1371744233000},"page":"1-21","source":"Crossref","is-referenced-by-count":0,"title":["Classification of Tandem Repeats in the Human Genome"],"prefix":"10.4018","volume":"3","author":[{"given":"Yupu","family":"Liang","sequence":"first","affiliation":[{"name":"Department of Computer Science, City University of New York, New York, NY, USA"}]},{"given":"Dina","family":"Sokol","sequence":"additional","affiliation":[{"name":"Department of Computer and Information Science, Brooklyn College of CUNY, Brooklyn, NY, USA"}]},{"given":"Sarah","family":"Zelikovitz","sequence":"additional","affiliation":[{"name":"Department of Computer Science, College of Staten Island of CUNY, Staten Island, NY, USA"}]},{"given":"Sarah Ita","family":"Levitan","sequence":"additional","affiliation":[{"name":"Department of Computer Science, Brooklyn College of CUNY, Brooklyn, NY, USA"}]}],"member":"2432","reference":[{"key":"jkdb.2012070101-0","doi-asserted-by":"publisher","DOI":"10.1093\/nar\/27.2.573"},{"key":"jkdb.2012070101-1","first-page":"44","article-title":"A new distance measure for comparing sequence profiles based on path lengths along an entropy surface.","volume":"2002","author":"G.Benson","year":"2002","journal-title":"ECCB"},{"key":"jkdb.2012070101-2","doi-asserted-by":"crossref","unstructured":"Berkhin, P. (2006). A Survey of Clustering Data Mining Techniques. In J. Kogan, C. K. Nicholas, & M. Teboulle, Grouping Multidimensional Data: Recent Advances in Clustering (pp. 25-72). 
Springer.","DOI":"10.1007\/3-540-28349-8_2"},{"key":"jkdb.2012070101-3","doi-asserted-by":"publisher","DOI":"10.1093\/molbev\/msp192"},{"key":"jkdb.2012070101-4","doi-asserted-by":"crossref","unstructured":"Galindo, H. L., McIver, L. J., Tae, H., McCormick, J. F., & Skinner et al., M. A. (2011, January 14). Sporadic breast cancer patients\u2019 germline DNA exhibit an AT-rich microsatellite signature. Genes, Chromosomes and Cancer, 50(4), pp. 275-283.","DOI":"10.1002\/gcc.20853"},{"key":"jkdb.2012070101-5","doi-asserted-by":"publisher","DOI":"10.1038\/nrg1691"},{"key":"jkdb.2012070101-6","doi-asserted-by":"publisher","DOI":"10.1093\/nar\/gkl1013"},{"key":"jkdb.2012070101-7","doi-asserted-by":"publisher","DOI":"10.1016\/S0304-3975(03)00423-7"},{"key":"jkdb.2012070101-8","first-page":"100","article-title":"Algorithm AS 136: A K-Means Clustering Algorithm.","author":"J. A.Hartigan","year":"1979","journal-title":"Journal of the Royal Statistical Society. Series A (General)"},{"key":"jkdb.2012070101-9","first-page":"1","article-title":"1992 William Allan Award Address.","author":"A. J.Jeffreys","year":"1993","journal-title":"American Journal of Human Genetics"},{"key":"jkdb.2012070101-10","author":"L.Kaufman","year":"2008","journal-title":"Finding Groups in Data: An Introduction to Cluster Analysis"},{"key":"jkdb.2012070101-11","doi-asserted-by":"publisher","DOI":"10.1093\/nar\/gkg617"},{"key":"jkdb.2012070101-12","doi-asserted-by":"publisher","DOI":"10.1093\/nar\/gkq127"},{"key":"jkdb.2012070101-13","doi-asserted-by":"publisher","DOI":"10.1093\/bioinformatics\/btq042"},{"key":"jkdb.2012070101-14","first-page":"351","article-title":"DNA structures, repeat expansions and human hereditary disorders.","author":"S.Mirkin","year":"2008","journal-title":"Current Opinion in Structural Biology"},{"key":"jkdb.2012070101-15","unstructured":"Peleg, D., & Moore, A. (2000). X-Means: Extending K-Means with an Efficient Estimate of the Number of Clusters. 
Proceedings of the 17th International Conference on Machine Learning, (pp. 727-734). San Francisco, CA."},{"key":"jkdb.2012070101-16","first-page":"13","article-title":"Tandem Repeats Discovery Service (TReaDS) applied to finding novel Cis-acting factors in Repeat Expansion Diseases.","author":"M.Pellegrini","year":"2012","journal-title":"BMC Bioinformatics"},{"key":"jkdb.2012070101-17","doi-asserted-by":"publisher","DOI":"10.1093\/bioinformatics\/btq033"},{"key":"jkdb.2012070101-18","unstructured":"Rao, S., Rodriguez, A., & Benson, G. (2005). Evaluating Distance Functions for Clustering Tandem Repeats. Genome Informatics, 3-12."},{"key":"jkdb.2012070101-19","doi-asserted-by":"publisher","DOI":"10.1016\/0377-0427(87)90125-7"},{"key":"jkdb.2012070101-20","article-title":"A Mathematical Theory of Communication.","author":"C.Shannon","year":"1948","journal-title":"The Bell System Technical Journal"},{"key":"jkdb.2012070101-21","doi-asserted-by":"crossref","unstructured":"Sokol, D., & Atagun, F. (2010). TReD - A Database for Tandem Repeats over the Edit Distance. Database: The Journal of Biological Databases and Curation.","DOI":"10.1093\/database\/baq003"},{"key":"jkdb.2012070101-22","doi-asserted-by":"publisher","DOI":"10.1093\/bioinformatics\/btl309"},{"key":"jkdb.2012070101-23","doi-asserted-by":"publisher","DOI":"10.1016\/j.patcog.2005.05.002"},{"key":"jkdb.2012070101-24","unstructured":"Wexler, Y., Yakhini, Z., Kashi, Y., & Geiger, D. (2004). Finding Approximate Tandem Repeats in Genomic Sequences. Proc. of the 8th Ann. Conf. on Res. in Comp. Biol. (RECOMB), 223-232. Xu, R., & Wunsch, D. (2005). Survey of Clustering Algorithms. IEEE Transactions on Neural Networks, 645-678."},{"key":"jkdb.2012070101-25","doi-asserted-by":"crossref","unstructured":"Xu, R., & Wunsch, D. (2010). Clustering Algorithms in Biomedical Research. 
IEEE Reviews in Biomedical Engineering, 120-154.","DOI":"10.1109\/RBME.2010.2083647"}],"container-title":["International Journal of Knowledge Discovery in Bioinformatics"],"original-title":[],"language":"ng","link":[{"URL":"https:\/\/www.igi-global.com\/viewtitle.aspx?TitleId=77808","content-type":"unspecified","content-version":"vor","intended-application":"similarity-checking"}],"deposited":{"date-parts":[[2022,6,1]],"date-time":"2022-06-01T15:35:01Z","timestamp":1654097701000},"score":1,"resource":{"primary":{"URL":"https:\/\/services.igi-global.com\/resolvedoi\/resolve.aspx?doi=10.4018\/jkdb.2012070101"}},"subtitle":[""],"short-title":[],"issued":{"date-parts":[[2012,7,1]]},"references-count":26,"journal-issue":{"issue":"3","published-print":{"date-parts":[[2012,7]]}},"URL":"https:\/\/doi.org\/10.4018\/jkdb.2012070101","relation":{},"ISSN":["1947-9115","1947-9123"],"issn-type":[{"value":"1947-9115","type":"print"},{"value":"1947-9123","type":"electronic"}],"subject":[],"published":{"date-parts":[[2012,7,1]]}}}