{"status":"ok","message-type":"work","message-version":"1.0.0","message":{"indexed":{"date-parts":[[2026,3,10]],"date-time":"2026-03-10T13:06:14Z","timestamp":1773147974786,"version":"3.50.1"},"reference-count":40,"publisher":"Oxford University Press (OUP)","issue":"12","content-domain":{"domain":[],"crossmark-restriction":false},"short-container-title":[],"published-print":{"date-parts":[[2007,6,15]]},"abstract":"<jats:title>Abstract<\/jats:title><jats:p>Motivation: The rapid accumulation of single amino acid polymorphisms (SAPs), also known as non-synonymous single nucleotide polymorphisms (nsSNPs), brings the opportunities and needs to understand and predict their disease association. Currently published attributes are limited, the detailed mechanisms governing the disease association of a SAP remain unclear and thus, further investigation of new attributes and improvement of the prediction are desired.<\/jats:p><jats:p>Results: A SAP dataset was compiled from the Swiss-Prot variant pages. We extracted and demonstrated the effectiveness of several new biologically informative attributes including the structural neighbor profiles that describe the SAP's microenvironment, nearby functional sites that measure the structure-based and sequence-based distances between the SAP site and its nearby functional sites, aggregation properties that measure the likelihood of protein aggregation and disordered regions that consider whether the SAP is located in structurally disordered regions. The new attributes provided insights into the mechanisms of the disease association of SAPs. We built a support vector machines (SVMs) classifier employing a carefully selected set of new and previously published attributes. Through a strict protein-level 5-fold cross-validation, we attained an overall accuracy of 82.61%, and an MCC of 0.60. 
Moreover, a web server was developed to provide a user-friendly interface for biologists.<\/jats:p><jats:p>Availability: The web server is available at http:\/\/sapred.cbi.pku.edu.cn\/<\/jats:p><jats:p>Contact: sapred@mail.cbi.pku.edu.cn<\/jats:p><jats:p>Supplementary information: Supplementary data are available at http:\/\/sapred.cbi.pku.edu.cn\/supp.do<\/jats:p>","DOI":"10.1093\/bioinformatics\/btm119","type":"journal-article","created":{"date-parts":[[2007,3,24]],"date-time":"2007-03-24T23:57:42Z","timestamp":1174780662000},"page":"1444-1450","source":"Crossref","is-referenced-by-count":48,"title":["Finding new structural and sequence attributes to predict possible disease association of single amino acid polymorphism (SAP)"],"prefix":"10.1093","volume":"23","author":[{"given":"Zhi-Qiang","family":"Ye","sequence":"first","affiliation":[{"name":"1 Center for Bioinformatics, National Laboratory of Protein Engineering and Plant Genetic Engineering, College of Life Sciences, Peking University, Beijing 100871, P. R. China and 2Bioinformatics Program, Department of Bioengineering, University of Illinois at Chicago, Chicago, IL 60607, USA"}]},{"given":"Shu-Qi","family":"Zhao","sequence":"additional","affiliation":[{"name":"1 Center for Bioinformatics, National Laboratory of Protein Engineering and Plant Genetic Engineering, College of Life Sciences, Peking University, Beijing 100871, P. R. China and 2Bioinformatics Program, Department of Bioengineering, University of Illinois at Chicago, Chicago, IL 60607, USA"}]},{"given":"Ge","family":"Gao","sequence":"additional","affiliation":[{"name":"1 Center for Bioinformatics, National Laboratory of Protein Engineering and Plant Genetic Engineering, College of Life Sciences, Peking University, Beijing 100871, P. R. 
China and 2Bioinformatics Program, Department of Bioengineering, University of Illinois at Chicago, Chicago, IL 60607, USA"}]},{"given":"Xiao-Qiao","family":"Liu","sequence":"additional","affiliation":[{"name":"1 Center for Bioinformatics, National Laboratory of Protein Engineering and Plant Genetic Engineering, College of Life Sciences, Peking University, Beijing 100871, P. R. China and 2Bioinformatics Program, Department of Bioengineering, University of Illinois at Chicago, Chicago, IL 60607, USA"}]},{"given":"Robert E.","family":"Langlois","sequence":"additional","affiliation":[{"name":"1 Center for Bioinformatics, National Laboratory of Protein Engineering and Plant Genetic Engineering, College of Life Sciences, Peking University, Beijing 100871, P. R. China and 2Bioinformatics Program, Department of Bioengineering, University of Illinois at Chicago, Chicago, IL 60607, USA"}]},{"given":"Hui","family":"Lu","sequence":"additional","affiliation":[{"name":"1 Center for Bioinformatics, National Laboratory of Protein Engineering and Plant Genetic Engineering, College of Life Sciences, Peking University, Beijing 100871, P. R. China and 2Bioinformatics Program, Department of Bioengineering, University of Illinois at Chicago, Chicago, IL 60607, USA"}]},{"given":"Liping","family":"Wei","sequence":"additional","affiliation":[{"name":"1 Center for Bioinformatics, National Laboratory of Protein Engineering and Plant Genetic Engineering, College of Life Sciences, Peking University, Beijing 100871, P. R. 
China and 2Bioinformatics Program, Department of Bioengineering, University of Illinois at Chicago, Chicago, IL 60607, USA"}]}],"member":"286","published-online":{"date-parts":[[2007,3,24]]},"reference":[{"key":"2023041105082933200_","doi-asserted-by":"crossref","first-page":"2185","DOI":"10.1093\/bioinformatics\/bti365","article-title":"Prediction of the phenotypic effects of non-synonymous single nucleotide polymorphisms using structural and evolutionary information","volume":"21","author":"Bao","year":"2005","journal-title":"Bioinformatics"},{"key":"2023041105082933200_","doi-asserted-by":"crossref","first-page":"W480","DOI":"10.1093\/nar\/gki372","article-title":"nsSNPAnalyzer: identifying disease-associated nonsynonymous single nucleotide polymorphisms","volume":"33","author":"Bao","year":"2005","journal-title":"Nucleic Acids Res"},{"key":"2023041105082933200_","doi-asserted-by":"crossref","first-page":"6486","DOI":"10.1093\/nar\/gki949","article-title":"Kernel-based machine learning protocol for predicting DNA-binding proteins","volume":"33","author":"Bhardwaj","year":"2005","journal-title":"Nucleic Acids Res"},{"key":"2023041105082933200_","doi-asserted-by":"crossref","first-page":"486","DOI":"10.1016\/j.jmb.2006.03.039","article-title":"Structural bioinformatics prediction of membrane-binding proteins","volume":"359","author":"Bhardwaj","year":"2006","journal-title":"J. Mol. Biol"},{"key":"2023041105082933200_","doi-asserted-by":"crossref","first-page":"178","DOI":"10.1002\/humu.20063","article-title":"Bayesian approach to discovering pathogenic SNPs in conserved protein domains","volume":"24","author":"Cai","year":"2004","journal-title":"Hum. 
Mutat"},{"key":"2023041105082933200_","article-title":"LIBSVM: a library for support vector machines","author":"Chang","year":"2001"},{"key":"2023041105082933200_","doi-asserted-by":"crossref","first-page":"16419","DOI":"10.1073\/pnas.212527999","article-title":"Studies of the aggregation of mutant proteins in vitro provide insights into the genetics of amyloid diseases","volume":"99","author":"Chiti","year":"2002","journal-title":"Proc. Natl Acad. Sci. USA"},{"key":"2023041105082933200_","doi-asserted-by":"crossref","first-page":"217","DOI":"10.1186\/1471-2105-7-217","article-title":"Predicting deleterious nsSNPs: an analysis of sequence and structural attributes","volume":"7","author":"Dobson","year":"2006","journal-title":"BMC Bioinformatics"},{"key":"2023041105082933200_","doi-asserted-by":"crossref","first-page":"805","DOI":"10.1038\/nbt0901-805","article-title":"The protein trinity \u2014 linking function and disorder","volume":"19","author":"Dunker","year":"2001","journal-title":"Nat. Biotechnol"},{"key":"2023041105082933200_","doi-asserted-by":"crossref","first-page":"1302","DOI":"10.1038\/nbt1012","article-title":"Prediction of sequence-dependent and mutational effects on the aggregation of peptides and proteins","volume":"22","author":"Fernandez-Escamilla","year":"2004","journal-title":"Nat. Biotechnol"},{"key":"2023041105082933200_","doi-asserted-by":"crossref","first-page":"811","DOI":"10.1002\/prot.20252","article-title":"Sequence-based prediction of pathological mutations","volume":"57","author":"Ferrer-Costa","year":"2004","journal-title":"Proteins"},{"key":"2023041105082933200_","doi-asserted-by":"crossref","first-page":"363","DOI":"10.1542\/pir.27.10.363","article-title":"Back to basics: primary immune deficiencies: windows into the immune system","volume":"27","author":"Fleisher","year":"2006","journal-title":"Pediatr. 
Rev"},{"key":"2023041105082933200_","doi-asserted-by":"crossref","first-page":"D516","DOI":"10.1093\/nar\/gkh111","article-title":"HGVbase: a curated resource describing human DNA variation and phenotype relationships","volume":"32","author":"Fredman","year":"2004","journal-title":"Nucleic Acids Res"},{"key":"2023041105082933200_","doi-asserted-by":"crossref","first-page":"789","DOI":"10.1038\/nature02168","article-title":"The International HapMap Project","volume":"426","author":"Gibbs","year":"2003","journal-title":"Nature"},{"key":"2023041105082933200_","article-title":"\u2018NACCESS\u2019","volume-title":"Computer Program.","author":"Hubbard","year":"1993"},{"key":"2023041105082933200_","doi-asserted-by":"crossref","first-page":"2814","DOI":"10.1093\/bioinformatics\/bti442","article-title":"LS-SNP: large-scale annotation of coding non-synonymous SNPs based on multiple information sources","volume":"21","author":"Karchin","year":"2005","journal-title":"Bioinformatics"},{"key":"2023041105082933200_","doi-asserted-by":"crossref","first-page":"45","DOI":"10.1002\/(SICI)1098-1004(200001)15:1<45::AID-HUMU10>3.0.CO;2-T","article-title":"Human gene mutation database-a biomedical information and research resource","volume":"15","author":"Krawczak","year":"2000","journal-title":"Hum. Mutat"},{"key":"2023041105082933200_","doi-asserted-by":"crossref","first-page":"2199","DOI":"10.1093\/bioinformatics\/btg297","article-title":"A comparative study of machine-learning methods to predict the effects of single nucleotide polymorphisms on protein function","volume":"19","author":"Krishnan","year":"2003","journal-title":"Bioinformatics"},{"key":"2023041105082933200_","doi-asserted-by":"crossref","first-page":"234","DOI":"10.1038\/85776","article-title":"Variation is the spice of life","volume":"27","author":"Kruglyak","year":"2001","journal-title":"Nat. 
Genet"},{"key":"2023041105082933200_","doi-asserted-by":"crossref","first-page":"319","DOI":"10.1504\/IJBRA.2006.007909","article-title":"Improved protein fold assignment using support vector machines","volume":"1","author":"Langlois","year":"2006","journal-title":"International Journal of Bioinformatics Research and Applications"},{"key":"2023041105082933200_","doi-asserted-by":"crossref","first-page":"918","DOI":"10.1002\/ajmg.b.30436","article-title":"Addiction molecular genetics: 639,401 SNP whole genome association identifies many \u201ccell adhesion\u201d genes","volume":"141","author":"Liu","year":"2006","journal-title":"Am. J. Med. Genet. B. Neuropsychiatr. Genet"},{"key":"2023041105082933200_","doi-asserted-by":"crossref","first-page":"442","DOI":"10.1016\/0005-2795(75)90109-9","article-title":"Comparison of the predicted and observed secondary structure of T4 phage lysozyme","volume":"405","author":"Matthew","year":"1985","journal-title":"Biochim. Biophys. Acta"},{"key":"2023041105082933200_","doi-asserted-by":"crossref","first-page":"777","DOI":"10.1006\/jmbi.1994.1334","article-title":"Satisfying hydrogen bonding potential in proteins","volume":"238","author":"McDonald","year":"1994","journal-title":"J. Mol. Biol"},{"key":"2023041105082933200_","doi-asserted-by":"crossref","first-page":"863","DOI":"10.1101\/gr.176601","article-title":"Predicting deleterious amino acid substitutions","volume":"11","author":"Ng","year":"2001","journal-title":"Genome Res"},{"key":"2023041105082933200_","doi-asserted-by":"crossref","first-page":"787","DOI":"10.1016\/0022-2836(84)90049-4","article-title":"An analysis of incorrectly folded protein models. Implications for structure predictions","volume":"177","author":"Novotny","year":"1984","journal-title":"J. Mol. 
Biol"},{"key":"2023041105082933200_","doi-asserted-by":"crossref","first-page":"R9","DOI":"10.1093\/hmg\/ddl044","article-title":"Influence of human genome polymorphism on gene expression","volume":"15","author":"Pastinen","year":"2006","journal-title":"Hum. Mol. Genet"},{"key":"2023041105082933200_","doi-asserted-by":"crossref","first-page":"3894","DOI":"10.1093\/nar\/gkf493","article-title":"Human non-synonymous SNPs: server and survey","volume":"30","author":"Ramensky","year":"2002","journal-title":"Nucleic Acids Res"},{"key":"2023041105082933200_","doi-asserted-by":"crossref","first-page":"457","DOI":"10.1038\/ng1133","article-title":"Quality and completeness of SNP databases","volume":"33","author":"Reich","year":"2003","journal-title":"Nat. Genet"},{"key":"2023041105082933200_","doi-asserted-by":"crossref","first-page":"311","DOI":"10.1093\/nar\/gkg070","article-title":"IMGT\/HLA and IMGT\/MHC: sequence databases for the study of the major histocompatibility complex","volume":"31","author":"Robinson","year":"2003","journal-title":"Nucleic Acids Res"},{"key":"2023041105082933200_","doi-asserted-by":"crossref","first-page":"779","DOI":"10.1006\/jmbi.1993.1626","article-title":"Comparative protein modelling by satisfaction of spatial restraints","volume":"234","author":"Sali","year":"1993","journal-title":"J. Mol. Biol"},{"key":"2023041105082933200_","doi-asserted-by":"crossref","first-page":"891","DOI":"10.1016\/S0022-2836(02)00813-6","article-title":"Evaluation of structural and evolutionary contributions to deleterious mutation prediction","volume":"322","author":"Saunders","year":"2002","journal-title":"J. Mol. Biol"},{"key":"2023041105082933200_","doi-asserted-by":"crossref","first-page":"415","DOI":"10.1016\/0022-2836(86)90165-8","article-title":"Information content of binding sites on nucleotide sequences","volume":"188","author":"Schneider","year":"1986","journal-title":"J. Mol. 
Biol"},{"key":"2023041105082933200_","doi-asserted-by":"crossref","first-page":"308","DOI":"10.1093\/nar\/29.1.308","article-title":"dbSNP: the NCBI database of genetic variation","volume":"29","author":"Sherry","year":"2001","journal-title":"Nucleic Acids Res"},{"key":"2023041105082933200_","doi-asserted-by":"crossref","first-page":"591","DOI":"10.1093\/hmg\/10.6.591","article-title":"Prediction of deleterious human alleles","volume":"10","author":"Sunyaev","year":"2001","journal-title":"Hum. Mol. Genet"},{"key":"2023041105082933200_","doi-asserted-by":"crossref","first-page":"137","DOI":"10.1093\/bioinformatics\/bth476","article-title":"DisProt: a database of protein disorder","volume":"21","author":"Vucetic","year":"2005","journal-title":"Bioinformatics"},{"key":"2023041105082933200_","doi-asserted-by":"crossref","first-page":"263","DOI":"10.1002\/humu.22","article-title":"SNPs, protein structure, and disease","volume":"17","author":"Wang","year":"2001","journal-title":"Hum. Mutat"},{"key":"2023041105082933200_","doi-asserted-by":"crossref","first-page":"119","DOI":"10.1142\/S0219720003000150","article-title":"Recognizing complex, asymmetric functional sites in protein structures using a Bayesian scoring function","volume":"1","author":"Wei","year":"2003","journal-title":"J. Bioinform. Comput. Biol"},{"key":"2023041105082933200_","volume-title":"Data Mining: Practical Machine Learning Tools and Techniques.","author":"Witten","year":"2005"},{"key":"2023041105082933200_","doi-asserted-by":"crossref","first-page":"464","DOI":"10.1002\/humu.20021","article-title":"The Swiss-Prot variant page and the ModSNP database: a resource for sequence and structure information on human protein variants","volume":"23","author":"Yip","year":"2004","journal-title":"Hum. 
Mutat"},{"key":"2023041105082933200_","doi-asserted-by":"crossref","first-page":"1263","DOI":"10.1016\/j.jmb.2005.12.025","article-title":"Identification and analysis of deleterious human SNPs","volume":"356","author":"Yue","year":"2006","journal-title":"J. Mol. Biol"}],"container-title":["Bioinformatics"],"original-title":[],"language":"en","link":[{"URL":"https:\/\/academic.oup.com\/bioinformatics\/article-pdf\/23\/12\/1444\/49812804\/bioinformatics_23_12_1444.pdf","content-type":"application\/pdf","content-version":"vor","intended-application":"syndication"},{"URL":"https:\/\/academic.oup.com\/bioinformatics\/article-pdf\/23\/12\/1444\/49812804\/bioinformatics_23_12_1444.pdf","content-type":"unspecified","content-version":"vor","intended-application":"similarity-checking"}],"deposited":{"date-parts":[[2024,2,12]],"date-time":"2024-02-12T10:27:31Z","timestamp":1707733651000},"score":1,"resource":{"primary":{"URL":"https:\/\/academic.oup.com\/bioinformatics\/article\/23\/12\/1444\/223624"}},"subtitle":[],"short-title":[],"issued":{"date-parts":[[2007,3,24]]},"references-count":40,"journal-issue":{"issue":"12","published-print":{"date-parts":[[2007,6,15]]}},"URL":"https:\/\/doi.org\/10.1093\/bioinformatics\/btm119","relation":{},"ISSN":["1367-4811","1367-4803"],"issn-type":[{"value":"1367-4811","type":"electronic"},{"value":"1367-4803","type":"print"}],"subject":[],"published-other":{"date-parts":[[2007,6,15]]},"published":{"date-parts":[[2007,3,24]]}}}