{"status":"ok","message-type":"work","message-version":"1.0.0","message":{"indexed":{"date-parts":[[2026,3,29]],"date-time":"2026-03-29T14:08:25Z","timestamp":1774793305925,"version":"3.50.1"},"reference-count":43,"publisher":"Oxford University Press (OUP)","issue":"8","license":[{"start":{"date-parts":[[2023,8,3]],"date-time":"2023-08-03T00:00:00Z","timestamp":1691020800000},"content-version":"vor","delay-in-days":2,"URL":"https:\/\/creativecommons.org\/licenses\/by\/4.0\/"}],"funder":[{"DOI":"10.13039\/501100012190","name":"Ministry of Science and Higher Education of the Russian Federation","doi-asserted-by":"publisher","id":[{"id":"10.13039\/501100012190","id-type":"DOI","asserted-by":"publisher"}]}],"content-domain":{"domain":[],"crossmark-restriction":false},"short-container-title":[],"published-print":{"date-parts":[[2023,8,1]]},"abstract":"<jats:title>Abstract<\/jats:title>\n               <jats:sec>\n                  <jats:title>Motivation<\/jats:title>\n                  <jats:p>Next Generation Sequencing technologies make it possible to detect rare genetic variants in individual patients. Currently, more than a dozen software and web services have been created to predict the pathogenicity of variants related with changing of amino acid residues. Despite considerable efforts in this area, at the moment there is no ideal method to classify pathogenic and harmless variants, and the assessment of the pathogenicity is often contradictory. In this article, we propose to use peptides structural formulas of proteins as an amino acid residues substitutions description, rather than a single-letter code. This allowed us to investigate the effectiveness of chemoinformatics approach to assess the pathogenicity of variants associated with amino acid substitutions.<\/jats:p>\n               <\/jats:sec>\n               <jats:sec>\n                  <jats:title>Results<\/jats:title>\n                  <jats:p>The structure-activity relationships analysis relying on protein-specific data and atom centric substructural multilevel neighborhoods of atoms (MNA) descriptors of molecular fragments appeared to be suitable for predicting the pathogenic effect of single amino acid variants. MNA-based Na\u00efve Bayes classifier algorithm, ClinVar and humsavar data were used for the creation of structure-activity relationships models for 10 proteins. The performance of the models was compared with 11 different predicting tools: 8 individual (SIFT 4G, Polyphen2 HDIV, MutationAssessor, PROVEAN, FATHMM, MVP, LIST-S2, MutPred) and 3 consensus (M-CAP, MetaSVM, MetaLR). The accuracy of MNA-based method varies for the proteins (AUC: 0.631\u20130.993; MCC: 0.191\u20130.891). It was similar for both the results of comparisons with the other individual predictors and third-party protein-specific predictors. For several proteins (BRCA1, BRCA2, COL1A2, and RYR1), the performance of the MNA-based method was outstanding, capable of capturing the pathogenic effect of structural changes in amino acid substitutions.<\/jats:p>\n               <\/jats:sec>\n               <jats:sec>\n                  <jats:title>Availability and implementation<\/jats:title>\n                  <jats:p>The datasets are available as supplemental data at Bioinformatics online. A python script to convert amino acid and nucleotide sequences from single-letter codes to SD files is available at https:\/\/github.com\/SmirnygaTotoshka\/SequenceToSDF. The authors provide trial licenses for MultiPASS software to interested readers upon request.<\/jats:p>\n               <\/jats:sec>","DOI":"10.1093\/bioinformatics\/btad484","type":"journal-article","created":{"date-parts":[[2023,8,2]],"date-time":"2023-08-02T12:08:31Z","timestamp":1690978111000},"source":"Crossref","is-referenced-by-count":7,"title":["Prediction of pathogenic single amino acid substitutions using molecular fragment descriptors"],"prefix":"10.1093","volume":"39","author":[{"ORCID":"https:\/\/orcid.org\/0000-0002-0160-8793","authenticated-orcid":false,"given":"Anton","family":"Zadorozhny","sequence":"first","affiliation":[{"name":"Department of Bioinformatics, Pirogov Russian National Research Medical University , Moscow 117513, Russia"}]},{"ORCID":"https:\/\/orcid.org\/0000-0002-7510-1602","authenticated-orcid":false,"given":"Anton","family":"Smirnov","sequence":"additional","affiliation":[{"name":"Department of Bioinformatics, Pirogov Russian National Research Medical University , Moscow 117513, Russia"}]},{"ORCID":"https:\/\/orcid.org\/0000-0002-0339-8478","authenticated-orcid":false,"given":"Dmitry","family":"Filimonov","sequence":"additional","affiliation":[{"name":"Department of Bioinformatics, Institute of Biomedical Chemistry , Moscow 119992, Russia"}]},{"ORCID":"https:\/\/orcid.org\/0000-0003-1757-8004","authenticated-orcid":false,"given":"Alexey","family":"Lagunin","sequence":"additional","affiliation":[{"name":"Department of Bioinformatics, Pirogov Russian National Research Medical University , Moscow 117513, Russia"},{"name":"Department of Bioinformatics, Institute of Biomedical Chemistry , Moscow 119992, Russia"}]}],"member":"286","published-online":{"date-parts":[[2023,8,3]]},"reference":[{"key":"2023081801564033800_btad484-B1","article-title":"Predicting functional effect of human missense mutations using PolyPhen-2","volume":"7","author":"Adzhubei","year":"2013","journal-title":"Hum Genet"},{"key":"2023081801564033800_btad484-B2","doi-asserted-by":"crossref","first-page":"4480","DOI":"10.1038\/s41598-018-22531-2","article-title":"Prediction and interpretation of deleterious coding variants in terms of protein structural stability","volume":"8","author":"Ancien","year":"2018","journal-title":"Sci Rep"},{"key":"2023081801564033800_btad484-B3","doi-asserted-by":"crossref","first-page":"1237","DOI":"10.1002\/humu.21047","article-title":"Functional annotations improve the predictive score of human disease-related mutations in proteins","volume":"30","author":"Calabrese","year":"2009","journal-title":"Hum Mutat"},{"key":"2023081801564033800_btad484-B4","doi-asserted-by":"crossref","first-page":"S6","DOI":"10.1186\/1471-2164-14-S3-S6","article-title":"WS-SNPs&GO: a web server for predicting the deleterious effect of human protein variants using functional annotation","volume":"14","author":"Capriotti","year":"2013","journal-title":"BMC Genomics"},{"key":"2023081801564033800_btad484-B5","doi-asserted-by":"crossref","first-page":"S3","DOI":"10.1186\/1471-2164-14-S3-S3","article-title":"Identifying Mendelian disease genes with the variant effect scoring tool","volume":"14","author":"Carter","year":"2013","journal-title":"BMC Genomics"},{"key":"2023081801564033800_btad484-B6","doi-asserted-by":"crossref","first-page":"e46688","DOI":"10.1371\/journal.pone.0046688","article-title":"Predicting the functional effect of amino acid substitutions and indels","volume":"7","author":"Choi","year":"2012","journal-title":"PLoS One"},{"key":"2023081801564033800_btad484-B7","doi-asserted-by":"crossref","first-page":"207","DOI":"10.1136\/amiajnl-2011-000309","article-title":"Utility of gene-specific algorithms for predicting pathogenicity of uncertain gene variants","volume":"19","author":"Crockett","year":"2012","journal-title":"J Am Med Inform Assoc"},{"key":"2023081801564033800_btad484-B10","doi-asserted-by":"crossref","first-page":"444","DOI":"10.1007\/s10593-014-1496-1","article-title":"Prediction of the biological activity spectra of organic compounds using the pass online web resource","volume":"50","author":"Filimonov","year":"2014","journal-title":"Chem Heterocycl Comp"},{"key":"2023081801564033800_btad484-B11","doi-asserted-by":"crossref","first-page":"81","DOI":"10.1007\/978-1-62703-342-8_6","article-title":"On exploring structure-activity relationships","volume":"993","author":"Guha","year":"2013","journal-title":"Methods Mol Biol"},{"key":"2023081801564033800_btad484-B12","doi-asserted-by":"crossref","first-page":"66","DOI":"10.1186\/s13321-020-00468-x","article-title":"Structure\u2013activity relationship-based chemical classification of highly imbalanced Tox21 datasets","volume":"12","author":"Idakwo","year":"2020","journal-title":"J Cheminform"},{"key":"2023081801564033800_btad484-B13","doi-asserted-by":"crossref","first-page":"1581","DOI":"10.1038\/ng.3703","article-title":"M-CAP eliminates a majority of variants of uncertain significance in clinical exomes at high sensitivity","volume":"48","author":"Jagadeesh","year":"2016","journal-title":"Nat Genet"},{"key":"2023081801564033800_btad484-B14","doi-asserted-by":"crossref","first-page":"423","DOI":"10.18097\/PBMC20176305423","article-title":"Application of molecular descriptors for recognition of phosphorylation sites in amino acid sequences","volume":"63","author":"Karasev","year":"2017","journal-title":"Biomed Khim"},{"key":"2023081801564033800_btad484-B15","doi-asserted-by":"crossref","first-page":"2","DOI":"10.1186\/s13040-017-0126-8","article-title":"Meta-analytic support vector machine for integrating multiple omics data","volume":"10","author":"Kim","year":"2017","journal-title":"BioData Min"},{"key":"2023081801564033800_btad484-B16","doi-asserted-by":"crossref","first-page":"747","DOI":"10.1093\/bioinformatics\/16.8.747","article-title":"PASS: prediction of activity spectra for biologically active substances","volume":"16","author":"Lagunin","year":"2000","journal-title":"Bioinformatics"},{"key":"2023081801564033800_btad484-B17","doi-asserted-by":"crossref","first-page":"2062","DOI":"10.1093\/bioinformatics\/btt322","article-title":"DIGEP-Pred: web service for in silico prediction of drug-induced gene expression profiles based on structural formula","volume":"29","author":"Lagunin","year":"2013","journal-title":"Bioinformatics"},{"key":"2023081801564033800_btad484-B18","doi-asserted-by":"crossref","first-page":"710","DOI":"10.1093\/bioinformatics\/btx678","article-title":"ROSC-Pred: web-service for rodent organ-specific carcinogenicity prediction","volume":"34","author":"Lagunin","year":"2018","journal-title":"Bioinformatics"},{"key":"2023081801564033800_btad484-B19","doi-asserted-by":"crossref","first-page":"1062","DOI":"10.1093\/nar\/gkx1153","article-title":"ClinVar: improving access to variant interpretations and supporting evidence","volume":"46","author":"Landrum","year":"2018","journal-title":"Nucleic Acids Res"},{"key":"2023081801564033800_btad484-B20","doi-asserted-by":"crossref","first-page":"2744","DOI":"10.1093\/bioinformatics\/btp528","article-title":"Automated inference of molecular mechanisms of disease from amino acid substitutions","volume":"25","author":"Li","year":"2009","journal-title":"Bioinformatics"},{"key":"2023081801564033800_btad484-B21","doi-asserted-by":"crossref","first-page":"103","DOI":"10.1186\/s13073-020-00803-9","article-title":"dbNSFP v4: a comprehensive database of transcript-specific functional predictions and annotations for human nonsynonymous and splice-site SNVs","volume":"12","author":"Liu","year":"2020","journal-title":"Genome Med"},{"key":"2023081801564033800_btad484-B22","doi-asserted-by":"crossref","first-page":"222","DOI":"10.1093\/nar\/gkx313","article-title":"PMut: a web-based tool for the annotation of pathological variants on proteins, 2017 update","volume":"45","author":"L\u00f3pez-Ferrando","year":"2017","journal-title":"Nucleic Acids Res"},{"key":"2023081801564033800_btad484-B36972098","doi-asserted-by":"crossref","first-page":"W154","DOI":"10.1093\/nar\/gkaa288","article-title":"LIST-S2: taxonomy based sorting of deleterious missense mutations across species","volume":"48","author":"Malhis","year":"2020","journal-title":"Nucleic Acids Res"},{"key":"2023081801564033800_btad484-B23","doi-asserted-by":"crossref","first-page":"D412","DOI":"10.1093\/nar\/gkaa913","article-title":"Pfam: the protein families database in 2021","volume":"49","author":"Mistry","year":"2021","journal-title":"Nucleic Acids Res"},{"key":"2023081801564033800_btad484-B24","doi-asserted-by":"crossref","first-page":"3525","DOI":"10.1039\/D0CS00098A","article-title":"QSAR without borders","volume":"49","author":"Muratov","year":"2020","journal-title":"Chem Soc Rev"},{"key":"2023081801564033800_btad484-B26","doi-asserted-by":"crossref","first-page":"e0117380","DOI":"10.1371\/journal.pone.0117380","article-title":"PON-P2: prediction method for fast and reliable identification of harmful variants","volume":"10","author":"Niroula","year":"2015","journal-title":"PLoS One"},{"key":"2023081801564033800_btad484-B27","doi-asserted-by":"crossref","first-page":"89","DOI":"10.1016\/B978-0-12-820519-8.00007-7","volume-title":"Translational and Applied Genomics, Clinical DNA Variant Interpretation","author":"\u00d6zkan","year":"2021"},{"key":"2023081801564033800_btad484-B28","first-page":"2825","article-title":"Scikit-learn: machine learning in python","volume":"12","author":"Pedregosa","year":"2011","journal-title":"JMLR"},{"key":"2023081801564033800_btad484-B29","doi-asserted-by":"crossref","first-page":"5918","DOI":"10.1038\/s41467-020-19669-x","article-title":"Inferring the molecular and phenotypic impact of amino acid variants with MutPred2","volume":"11","author":"Pejaver","year":"2020","journal-title":"Nat Commun"},{"key":"2023081801564033800_btad484-B30","doi-asserted-by":"crossref","first-page":"1349","DOI":"10.1021\/ci000383k","article-title":"Robustness of biological activity spectra predicting by computer program PASS for non-congeneric sets of chemical compounds","volume":"40","author":"Poroikov","year":"2000","journal-title":"J Chem Inf Comput Sci"},{"key":"2023081801564033800_btad484-B31","doi-asserted-by":"crossref","first-page":"510","DOI":"10.1038\/s41467-020-20847-0","article-title":"MVP predicts the pathogenicity of missense variants by deep learning","volume":"12","author":"Qi","year":"2021","journal-title":"Nat Commun"},{"key":"2023081801564033800_btad484-B32","doi-asserted-by":"crossref","first-page":"e118","DOI":"10.1093\/nar\/gkr407","article-title":"Predicting the functional impact of protein mutations: application to cancer genomics","volume":"39","author":"Reva","year":"2011","journal-title":"Nucleic Acids Res"},{"key":"2023081801564033800_btad484-B33","first-page":"405","article-title":"Standards and guidelines for the interpretation of sequence variants: a joint consensus recommendation of the American college of medical genetics and genomics and the association for molecular pathology","volume":"17","author":"Richards","year":"2015","journal-title":"Genet Med Off J Am College Med Genet"},{"key":"2023081801564033800_btad484-B34","doi-asserted-by":"crossref","first-page":"1013e24","DOI":"10.1002\/humu.23048","article-title":"The complementarity between protein-specific and general pathogenicity predictors for amino acid substitutions","volume":"37","author":"Riera","year":"2016","journal-title":"Hum Mutat"},{"key":"2023081801564033800_btad484-B35","doi-asserted-by":"crossref","first-page":"2046","DOI":"10.1093\/bioinformatics\/btv087","article-title":"SOMP: web server for in silico prediction of sites of metabolism for drug-like compounds","volume":"31","author":"Rudik","year":"2015","journal-title":"Bioinformatics"},{"key":"2023081801564033800_btad484-B36","doi-asserted-by":"crossref","first-page":"525","DOI":"10.1038\/s41598-020-80113-7","article-title":"Prediction of pharmacological activities from chemical structures with graph convolutional neural networks","volume":"11","author":"Sakai","year":"2021","journal-title":"Sci Rep"},{"key":"2023081801564033800_btad484-B37","doi-asserted-by":"crossref","first-page":"308","DOI":"10.1093\/nar\/29.1.308","article-title":"dbSNP: the NCBI database of genetic variation","volume":"29","author":"Sherry","year":"2001","journal-title":"Nucleic Acids Res"},{"key":"2023081801564033800_btad484-B38","doi-asserted-by":"crossref","first-page":"1504","DOI":"10.1093\/bioinformatics\/btt182","article-title":"Predicting the functional consequences of cancer-associated amino acid substitutions","volume":"29","author":"Shihab","year":"2013","journal-title":"Bioinformatics"},{"key":"2023081801564033800_btad484-B39","doi-asserted-by":"crossref","first-page":"447","DOI":"10.1111\/imm.13641","article-title":"TCR-Pred: a new web-application for prediction of epitope and MHC specificity for CDR3 TCR sequences using molecular fragment descriptors","volume":"169","author":"Smirnov","year":"2023","journal-title":"Immunology"},{"key":"2023081801564033800_btad484-B40","doi-asserted-by":"crossref","first-page":"480","DOI":"10.1093\/nar\/gkaa1100","article-title":"UniProt: the universal protein knowledgebase in 2021","volume":"49","author":"The UniProt Consortium","year":"2021","journal-title":"Nucleic Acids Res"},{"key":"2023081801564033800_btad484-B41","doi-asserted-by":"crossref","first-page":"1680","DOI":"10.1897\/01-198","article-title":"Structure-activity relationship approaches and applications","volume":"22","author":"Tong","year":"2003","journal-title":"Environ Toxicol Chem"},{"key":"2023081801564033800_btad484-B42","doi-asserted-by":"crossref","first-page":"2918","DOI":"10.1093\/bioinformatics\/btm437","article-title":"Accurate prediction of deleterious protein kinase polymorphisms","volume":"23","author":"Torkamani","year":"2007","journal-title":"Bioinformatics"},{"key":"2023081801564033800_btad484-B43","doi-asserted-by":"crossref","first-page":"1","DOI":"10.1038\/nprot.2015.123","article-title":"SIFT missense predictions for genomes","volume":"11","author":"Vaser","year":"2016","journal-title":"Nat Protoc"},{"key":"2023081801564033800_btad484-B44","doi-asserted-by":"crossref","first-page":"5","DOI":"10.3389\/fgene.2020.00005","article-title":"Pathogenic gene prediction algorithm based on heterogeneous information fusion","volume":"11","author":"Wang","year":"2020","journal-title":"Front Genet"},{"key":"2023081801564033800_btad484-B9231484","doi-asserted-by":"crossref","first-page":"4626","DOI":"10.1093\/bioinformatics\/btab529","article-title":"3Cnet: pathogenicity prediction of human variants using multitask learning with evolutionary constraints","volume":"37","author":"Won","year":"2021","journal-title":"Bioinformatics"}],"container-title":["Bioinformatics"],"original-title":[],"language":"en","link":[{"URL":"https:\/\/academic.oup.com\/bioinformatics\/advance-article-pdf\/doi\/10.1093\/bioinformatics\/btad484\/51032804\/btad484.pdf","content-type":"application\/pdf","content-version":"am","intended-application":"syndication"},{"URL":"https:\/\/academic.oup.com\/bioinformatics\/article-pdf\/39\/8\/btad484\/51142433\/btad484.pdf","content-type":"application\/pdf","content-version":"vor","intended-application":"syndication"},{"URL":"https:\/\/academic.oup.com\/bioinformatics\/article-pdf\/39\/8\/btad484\/51142433\/btad484.pdf","content-type":"unspecified","content-version":"vor","intended-application":"similarity-checking"}],"deposited":{"date-parts":[[2023,8,18]],"date-time":"2023-08-18T01:57:16Z","timestamp":1692323836000},"score":1,"resource":{"primary":{"URL":"https:\/\/academic.oup.com\/bioinformatics\/article\/doi\/10.1093\/bioinformatics\/btad484\/7236558"}},"subtitle":[],"editor":[{"given":"Valentina","family":"Boeva","sequence":"additional","affiliation":[]}],"short-title":[],"issued":{"date-parts":[[2023,8,1]]},"references-count":43,"journal-issue":{"issue":"8","published-print":{"date-parts":[[2023,8,1]]}},"URL":"https:\/\/doi.org\/10.1093\/bioinformatics\/btad484","relation":{},"ISSN":["1367-4811"],"issn-type":[{"value":"1367-4811","type":"electronic"}],"subject":[],"published-other":{"date-parts":[[2023,8,1]]},"published":{"date-parts":[[2023,8,1]]},"article-number":"btad484"}}