{"status":"ok","message-type":"work","message-version":"1.0.0","message":{"indexed":{"date-parts":[[2025,11,7]],"date-time":"2025-11-07T08:48:31Z","timestamp":1762505311202},"reference-count":25,"publisher":"Oxford University Press (OUP)","issue":"3-4","license":[{"start":{"date-parts":[[2024,5,10]],"date-time":"2024-05-10T00:00:00Z","timestamp":1715299200000},"content-version":"vor","delay-in-days":6005,"URL":"https:\/\/creativecommons.org\/licenses\/by-nc-sa\/3.0\/"}],"content-domain":{"domain":[],"crossmark-restriction":false},"short-container-title":[],"published-print":{"date-parts":[[2007,12,1]]},"abstract":"<jats:title>Abstract<\/jats:title>\n               <jats:p>Single nucleotide polymorphisms (SNPs) are genetic variations that determine the differences between any two unrelated individuals. Various population groups can be distinguished from each other using SNPs. For instance, the HapMap dataset has four population groups with about ten million SNPs. For more insights on human evolution, ethnic variation, and population assignment, we propose to find out which SNPs are significant in determining the population groups and then to classify different populations using these relevant SNPs as input features. In this study, we developed a modified t-test ranking measure and applied it to the HapMap genotype data. Firstly, we rank all SNPs in comparison with other feature importance measures including F-statistics and the informativeness for assignment. Secondly, we select different numbers of the most highly ranked SNPs as the input to a classifier, such as the support vector machine, so as to find the best feature subset corresponding to the best classification accuracy. Experimental results showed that the proposed method is very effective in finding SNPs that are significant in determining the population groups, with reduced computational burden and better classification accuracy.<\/jats:p>","DOI":"10.1016\/s1672-0229(08)60011-x","type":"journal-article","created":{"date-parts":[[2008,2,11]],"date-time":"2008-02-11T12:15:33Z","timestamp":1202732133000},"page":"242-249","source":"Crossref","is-referenced-by-count":97,"title":["A Modified T-Test Feature Selection Method and Its Application on the HapMap Genotype Data"],"prefix":"10.1093","volume":"5","author":[{"given":"Nina","family":"Zhou","sequence":"first","affiliation":[{"name":"School of Electrical and Electronic Engineering, Nanyang Technological University , Singapore"}],"role":[{"role":"author","vocabulary":"crossref"}]},{"given":"Lipo","family":"Wang","sequence":"additional","affiliation":[{"name":"School of Electrical and Electronic Engineering, Nanyang Technological University , Singapore"}],"role":[{"role":"author","vocabulary":"crossref"}]}],"member":"286","published-online":{"date-parts":[[2008,2,8]]},"reference":[{"key":"2024051008233074200_bib1","doi-asserted-by":"crossref","first-page":"i195","DOI":"10.1093\/bioinformatics\/bti1021","article-title":"Tag SNP selection in genotype data for maximizing SNP prediction accuracy","volume":"21","author":"Halperin","year":"2005","journal-title":"Bioinformatics"},{"key":"2024051008233074200_bib2","doi-asserted-by":"crossref","first-page":"1089","DOI":"10.1142\/S0219720005001521","article-title":"Effective algorithms for tag SNP selection","volume":"3","author":"Liu","year":"2005","journal-title":"J. Bioinform. Comput. Biol."},{"key":"2024051008233074200_bib3","doi-asserted-by":"crossref","first-page":"850","DOI":"10.1086\/425587","article-title":"Finding haplotype tagging SNPs by use of principal components analysis","volume":"75","author":"Liu","year":"2004","journal-title":"Am. J. Hum. Genet."},{"key":"2024051008233074200_bib4","first-page":"301","article-title":"Choosing SNPs using feature selection","author":"Phuong","year":"2005","journal-title":"Proc. IEEE Comput. Syst. Bioinform. Conf."},{"key":"2024051008233074200_bib5","doi-asserted-by":"crossref","first-page":"311","DOI":"10.1006\/geno.1995.9003","article-title":"A comparison of linkage disequilibrium measures for fine-scale mapping","volume":"29","author":"Devlin","year":"1995","journal-title":"Genomics"},{"key":"2024051008233074200_bib6","doi-asserted-by":"crossref","first-page":"1","DOI":"10.1086\/321275","article-title":"Linkage disequilibrium in humans: models and data","volume":"69","author":"Pritchard","year":"2001","journal-title":"Am. J. Hum. Genet."},{"key":"2024051008233074200_bib7","doi-asserted-by":"crossref","first-page":"1402","DOI":"10.1086\/380416","article-title":"Informativeness of genetic markers for inference of ancestry","volume":"73","author":"Rosenberg","year":"2003","journal-title":"Am. J. Hum. Genet."},{"key":"2024051008233074200_bib8","doi-asserted-by":"crossref","first-page":"1183","DOI":"10.1089\/cmb.2005.12.1183","article-title":"Algorithms for selecting informative marker panels for population assignment","volume":"12","author":"Rosenberg","year":"2005","journal-title":"J. Comput. Biol."},{"key":"2024051008233074200_bib9","doi-asserted-by":"crossref","first-page":"395","DOI":"10.2307\/2406450","article-title":"The interpretation of population structure by F-statistics with special regard to systems of mating","volume":"19","author":"Wright","year":"1965","journal-title":"Evolution"},{"key":"2024051008233074200_bib10","article-title":"Statistics: The Exploration and Analysis of Data","author":"Devore","year":"1997","edition":"third edition"},{"key":"2024051008233074200_bib11","article-title":"Statistical Learning Theory","author":"Vapnik","year":"1998"},{"key":"2024051008233074200_bib12","doi-asserted-by":"crossref","DOI":"10.1007\/b95439","article-title":"Support Vector Machines: Theory and Applications","author":"Wang","year":"2005"},{"key":"2024051008233074200_bib13","first-page":"1157","article-title":"An introduction to variable and feature selection","volume":"3","author":"Guyon","year":"2003","journal-title":"J. Mach. Learn. Res."},{"key":"2024051008233074200_bib14","article-title":"Data Mining with Computational Intelligence","author":"Wang","year":"2005"},{"key":"2024051008233074200_bib15","doi-asserted-by":"crossref","first-page":"6567","DOI":"10.1073\/pnas.082099299","article-title":"Diagnosis of multiple cancer types by shrunken centroids of gene expression","volume":"99","author":"Tibshirani","year":"2002","journal-title":"Proc. Natl. Acad. Sci. USA"},{"key":"2024051008233074200_bib16","doi-asserted-by":"crossref","first-page":"40","DOI":"10.1109\/TCBB.2007.1006","article-title":"Accurate cancer classification using expressions of very few genes","volume":"4","author":"Wang","year":"2007","journal-title":"IEEE\/ACM Trans. Comput. Biol. Bioinform."},{"key":"2024051008233074200_bib17","first-page":"53","article-title":"Improved gene selection for classification of microarrays","author":"Jaeger","year":"2003","journal-title":"Pac. Symp. Biocomput."},{"key":"2024051008233074200_bib18","doi-asserted-by":"crossref","first-page":"1578","DOI":"10.1093\/bioinformatics\/btg179","article-title":"RankGene: identification of diagnostic genes based on expression data","volume":"19","author":"Su","year":"2003","journal-title":"Bioinformatics"},{"key":"2024051008233074200_bib19","doi-asserted-by":"crossref","first-page":"1636","DOI":"10.1093\/bioinformatics\/btg210","article-title":"Comparison of statistical methods for classification of ovarian cancer using mass spectrometry data","volume":"19","author":"Wu","year":"2003","journal-title":"Bioinformatics"},{"key":"2024051008233074200_bib20","doi-asserted-by":"crossref","first-page":"68","DOI":"10.1186\/1471-2105-6-68","article-title":"Feature selection and nearest centroid classification for protein mass spectrometry","volume":"6","author":"Levner","year":"2005","journal-title":"BMC Bioinformatics"},{"key":"2024051008233074200_bib21","article-title":"The Research Methods Knowledge Base","author":"Trochim","year":"2001","edition":"second edition"},{"key":"2024051008233074200_bib22","doi-asserted-by":"crossref","first-page":"397","DOI":"10.1006\/jmbi.2001.4580","article-title":"A novel method of protein secondary structure prediction with high segment overlap measure: support vector machine approach","volume":"308","author":"Hua","year":"2001","journal-title":"J. Mol. Biol."},{"key":"2024051008233074200_bib23","doi-asserted-by":"crossref","first-page":"1667","DOI":"10.1109\/TPAMI.2002.1114861","article-title":"Input feature selection by mutual information based on Parzen window","volume":"24","author":"Kwak","year":"2002","journal-title":"IEEE Trans. Pattern Anal. Mach. Intell."},{"key":"2024051008233074200_bib24","article-title":"A practical guide to support vector classification","author":"Hsu","year":"2003"},{"key":"2024051008233074200_bib25","doi-asserted-by":"crossref","first-page":"110","DOI":"10.1109\/TNB.2006.875040","article-title":"An efficient semi-unsupervised gene selection method via spectral biclustering","volume":"5","author":"Liu","year":"2006","journal-title":"IEEE Trans. Nanobioscience"}],"container-title":["Genomics, Proteomics &amp; Bioinformatics"],"original-title":[],"language":"en","link":[{"URL":"https:\/\/api.elsevier.com\/content\/article\/PII:S167202290860011X?httpAccept=text\/xml","content-type":"text\/xml","content-version":"vor","intended-application":"text-mining"},{"URL":"https:\/\/api.elsevier.com\/content\/article\/PII:S167202290860011X?httpAccept=text\/plain","content-type":"text\/plain","content-version":"vor","intended-application":"text-mining"},{"URL":"https:\/\/academic.oup.com\/gpb\/article-pdf\/5\/3-4\/242\/57482909\/gpb_5_3-4_242.pdf","content-type":"application\/pdf","content-version":"vor","intended-application":"syndication"},{"URL":"https:\/\/academic.oup.com\/gpb\/article-pdf\/5\/3-4\/242\/57482909\/gpb_5_3-4_242.pdf","content-type":"unspecified","content-version":"vor","intended-application":"similarity-checking"}],"deposited":{"date-parts":[[2024,5,10]],"date-time":"2024-05-10T08:24:11Z","timestamp":1715329451000},"score":1,"resource":{"primary":{"URL":"https:\/\/academic.oup.com\/gpb\/article\/5\/3-4\/242\/7210662"}},"subtitle":[],"short-title":[],"issued":{"date-parts":[[2007,12,1]]},"references-count":25,"journal-issue":{"issue":"3-4","published-print":{"date-parts":[[2007,12,1]]}},"URL":"https:\/\/doi.org\/10.1016\/s1672-0229(08)60011-x","relation":{},"ISSN":["1672-0229","2210-3244"],"issn-type":[{"value":"1672-0229","type":"print"},{"value":"2210-3244","type":"electronic"}],"subject":[],"published-other":{"date-parts":[[2007,12]]},"published":{"date-parts":[[2007,12,1]]}}}