{"status":"ok","message-type":"work","message-version":"1.0.0","message":{"indexed":{"date-parts":[[2026,1,18]],"date-time":"2026-01-18T02:47:15Z","timestamp":1768704435180,"version":"3.49.0"},"reference-count":40,"publisher":"Oxford University Press (OUP)","issue":"14","content-domain":{"domain":[],"crossmark-restriction":false},"short-container-title":[],"published-print":{"date-parts":[[2014,7,15]]},"abstract":"<jats:title>Abstract<\/jats:title>\n               <jats:p>Motivation: Post-translational modifications (PTMs) are important steps in the maturation of proteins. Several models exist to predict specific PTMs, from manually detected patterns to machine learning methods. On one hand, the manual detection of patterns does not provide the most efficient classifiers and requires an important workload, and on the other hand, models built by machine learning methods are hard to interpret and do not increase biological knowledge. Therefore, we developed a novel method based on patterns discovery and decision trees to predict PTMs. The proposed algorithm builds a decision tree, by coupling the C4.5 algorithm with genetic algorithms, producing high-performance white box classifiers. Our method was tested on the initiator methionine cleavage (IMC) and N\u03b1-terminal acetylation (N-Ac), two of the most common PTMs.<\/jats:p>\n               <jats:p>Results: The resulting classifiers perform well when compared with existing models. On a set of eukaryotic proteins, they display a cross-validated Matthews correlation coefficient of 0.83 (IMC) and 0.65 (N-Ac). When used to predict potential substrates of N-terminal acetyltransferase B and N-terminal acetyltransferase C, our classifiers display better performance than the state of the art. Moreover, we present an analysis of the model predicting IMC for Homo sapiens proteins and demonstrate that we are able to extract experimentally known facts without prior knowledge. 
Those results validate the fact that our method produces white box models.<\/jats:p>\n               <jats:p>Availability and implementation: Predictors for IMC and N-Ac and all datasets are freely available at http:\/\/terminus.unige.ch\/ .<\/jats:p>\n               <jats:p>Contact: \u00a0jean-luc.falcone@unige.ch<\/jats:p>\n               <jats:p>Supplementary information: \u00a0Supplementary data are available at Bioinformatics online.<\/jats:p>","DOI":"10.1093\/bioinformatics\/btu165","type":"journal-article","created":{"date-parts":[[2014,3,29]],"date-time":"2014-03-29T01:45:49Z","timestamp":1396057549000},"page":"1974-1982","source":"Crossref","is-referenced-by-count":9,"title":["Motifs tree: a new method for predicting post-translational modifications"],"prefix":"10.1093","volume":"30","author":[{"given":"Christophe","family":"Charpilloz","sequence":"first","affiliation":[{"name":"1 Department of Computer Science, University of Geneva, 1227 Carouge and 2 Swiss Institute of Bioinformatics, Centre M\u00e9dical Universitaire, Geneva 4, Switzerland"},{"name":"1 Department of Computer Science, University of Geneva, 1227 Carouge and 2 Swiss Institute of Bioinformatics, Centre M\u00e9dical Universitaire, Geneva 4, Switzerland"}]},{"given":"Anne-Lise","family":"Veuthey","sequence":"additional","affiliation":[{"name":"1 Department of Computer Science, University of Geneva, 1227 Carouge and 2 Swiss Institute of Bioinformatics, Centre M\u00e9dical Universitaire, Geneva 4, Switzerland"}]},{"given":"Bastien","family":"Chopard","sequence":"additional","affiliation":[{"name":"1 Department of Computer Science, University of Geneva, 1227 Carouge and 2 Swiss Institute of Bioinformatics, Centre M\u00e9dical Universitaire, Geneva 4, Switzerland"},{"name":"1 Department of Computer Science, University of Geneva, 1227 Carouge and 2 Swiss Institute of Bioinformatics, Centre M\u00e9dical Universitaire, Geneva 4, 
Switzerland"}]},{"given":"Jean-Luc","family":"Falcone","sequence":"additional","affiliation":[{"name":"1 Department of Computer Science, University of Geneva, 1227 Carouge and 2 Swiss Institute of Bioinformatics, Centre M\u00e9dical Universitaire, Geneva 4, Switzerland"},{"name":"1 Department of Computer Science, University of Geneva, 1227 Carouge and 2 Swiss Institute of Bioinformatics, Centre M\u00e9dical Universitaire, Geneva 4, Switzerland"}]}],"member":"286","published-online":{"date-parts":[[2014,3,28]]},"reference":[{"key":"2023012711241927400_btu165-B1","volume-title":"Genetic Programming: An Introduction: on the Automatic Evolution of Computer Programs and its Applications","author":"Banzhaf","year":"1998"},{"key":"2023012711241927400_btu165-B2","doi-asserted-by":"crossref","DOI":"10.1007\/978-1-84882-260-3","volume-title":"Guide to Intelligent Data Analysis: How to Intelligently Make Sense of Real Data","author":"Berthold","year":"2010","edition":"1st edn"},{"key":"2023012711241927400_btu165-B3","doi-asserted-by":"crossref","DOI":"10.1074\/mcp.M111.015131","article-title":"Comparative large scale characterization of plant versus mammal proteins reveals similar and idiosyncratic N-\u03b1-acetylation features","volume":"11","author":"Bienvenut","year":"2012","journal-title":"Mol. Cell. Proteomics"},{"key":"2023012711241927400_btu165-B4","doi-asserted-by":"crossref","first-page":"1351","DOI":"10.1006\/jmbi.1999.3310","article-title":"Sequence and structure-based prediction of eukaryotic protein phosphorylation sites","volume":"294","author":"Blom","year":"1999","journal-title":"J. Mol. 
Biol."},{"key":"2023012711241927400_btu165-B5","doi-asserted-by":"crossref","first-page":"1626","DOI":"10.1002\/pmic.200300783","article-title":"N-terminal myristoylation predictions by ensembles of neural networks","volume":"4","author":"Bologna","year":"2004","journal-title":"Proteomics"},{"key":"2023012711241927400_btu165-B6","doi-asserted-by":"crossref","first-page":"162","DOI":"10.1016\/S0076-6879(96)66013-3","article-title":"Applying motif and profile searches","volume":"266","author":"Bork","year":"1996","journal-title":"Methods Enzymol."},{"key":"2023012711241927400_btu165-B7","doi-asserted-by":"crossref","first-page":"263","DOI":"10.1016\/S0968-0004(98)01227-4","article-title":"N-terminal processing: the methionine aminopeptidase and N-\u03b1-acetyl transferase families","volume":"23","author":"Bradshaw","year":"1998","journal-title":"Trends Biochem. Sci."},{"key":"2023012711241927400_btu165-B8","doi-asserted-by":"crossref","first-page":"3","DOI":"10.1016\/S0097-8485(96)80003-9","article-title":"A flexible motif search technique based on generalized profiles","volume":"20","author":"Bucher","year":"1996","journal-title":"Comput. Chem."},{"key":"2023012711241927400_btu165-B9","doi-asserted-by":"crossref","first-page":"2392","DOI":"10.1021\/bi00605a022","article-title":"Primary structures of N-terminal extra peptide segments linked to the variable and constant regions of immunoglobulin light chain precursors: implications on the organization and controlled expression of immunoglobulin genes","volume":"17","author":"Burstein","year":"1978","journal-title":"Biochemistry"},{"key":"2023012711241927400_btu165-B10","doi-asserted-by":"crossref","first-page":"862","DOI":"10.1016\/j.bbrc.2008.05.143","article-title":"Predicting N-terminal acetylation based on feature selection method","volume":"372","author":"Cai","year":"2008","journal-title":"Biochem. Biophys. Res. 
Commun."},{"key":"2023012711241927400_btu165-B11","doi-asserted-by":"crossref","first-page":"2249","DOI":"10.1093\/bioinformatics\/bts426","article-title":"Computational prediction of N-linked glycosylation incorporating structural properties and patterns","volume":"28","author":"Chuang","year":"2012","journal-title":"Bioinformatics"},{"key":"2023012711241927400_btu165-B12","doi-asserted-by":"crossref","first-page":"365","DOI":"10.1007\/978-1-60327-241-4_21","article-title":"Prediction of posttranslational modification of proteins from their amino acid sequence","volume-title":"Data Mining Techniques for the Life Sciences. Methods in Molecular Biology","author":"Eisenhaber","year":"2010"},{"key":"2023012711241927400_btu165-B13","doi-asserted-by":"crossref","first-page":"2336","DOI":"10.1074\/mcp.M600225-MCP200","article-title":"The proteomics of N-terminal methionine cleavage","volume":"5","author":"Frottin","year":"2006","journal-title":"Mol. Cell. Proteomics"},{"key":"2023012711241927400_btu165-B14","doi-asserted-by":"crossref","first-page":"7403","DOI":"10.1128\/MCB.23.20.7403-7414.2003","article-title":"The yeast n-\u03b1-acetyltransferase nata is quantitatively anchored to the ribosome and interacts with nascent polypeptides","volume":"23","author":"Gautschi","year":"2003","journal-title":"Mol. Cell. 
Biol."},{"key":"2023012711241927400_btu165-B15","volume-title":"Genetic Algorithms in Search, Optimization and Machine Learning","author":"Goldberg","year":"1989","edition":"1st edn"},{"key":"2023012711241927400_btu165-B16","doi-asserted-by":"crossref","first-page":"1091","DOI":"10.1093\/bioinformatics\/18.8.1091","article-title":"Probabilistic alignment of motifs with sequences","volume":"18","author":"Gonnet","year":"2002","journal-title":"Bioinformatics"},{"key":"2023012711241927400_btu165-B17"},{"key":"2023012711241927400_btu165-B18","doi-asserted-by":"crossref","first-page":"868","DOI":"10.1093\/glycob\/cwm050","article-title":"Netcglyc 1.0: prediction of mammalian c-mannosylation sites","volume":"17","author":"Julenius","year":"2007","journal-title":"Glycobiology"},{"key":"2023012711241927400_btu165-B19","doi-asserted-by":"crossref","first-page":"374","DOI":"10.1093\/nar\/28.1.374","article-title":"AAindex: amino acid index database","volume":"28","author":"Kawashima","year":"2000","journal-title":"Nucleic Acids Res."},{"key":"2023012711241927400_btu165-B20","doi-asserted-by":"crossref","first-page":"20667","DOI":"10.1016\/S0021-9258(19)36737-7","article-title":"Isolation and characterization of the methionine aminopeptidase from porcine liver responsible for the co-translational processing of proteins","volume":"267","author":"Kendall","year":"1992","journal-title":"J. Biol. Chem."},{"key":"2023012711241927400_btu165-B21","doi-asserted-by":"crossref","first-page":"105","DOI":"10.1016\/0022-2836(82)90515-0","article-title":"A simple method for displaying the hydropathic character of a protein","volume":"157","author":"Kyte","year":"1982","journal-title":"J. Mol. 
Biol."},{"key":"2023012711241927400_btu165-B22","doi-asserted-by":"crossref","first-page":"1269","DOI":"10.1093\/bioinformatics\/bti130","article-title":"NetAcet: prediction of N-terminal acetylation sites","volume":"21","author":"Lars","year":"2005","journal-title":"Bioinformatics"},{"key":"2023012711241927400_btu165-B23","doi-asserted-by":"crossref","first-page":"253","DOI":"10.1016\/S1672-0229(04)02032-7","article-title":"A novel method for N-terminal acetylation prediction","volume":"2","author":"Liu","year":"2004","journal-title":"Genomics Proteomics Bioinform."},{"key":"2023012711241927400_btu165-B24","doi-asserted-by":"crossref","first-page":"2809","DOI":"10.1002\/pmic.200701191","article-title":"Extent of N-terminal modifications in cytosolic proteins from eukaryotes","volume":"8","author":"Martinez","year":"2008","journal-title":"Proteomics"},{"key":"2023012711241927400_btu165-B25","doi-asserted-by":"crossref","first-page":"442","DOI":"10.1016\/0005-2795(75)90109-9","article-title":"Comparison of the predicted and observed secondary structure of T4 phage lysozyme","volume":"405","author":"Matthews","year":"1975","journal-title":"Biochim. Biophys. 
Acta"},{"key":"2023012711241927400_btu165-B26","doi-asserted-by":"crossref","first-page":"701","DOI":"10.1016\/j.biochi.2005.03.011","article-title":"Processed N-termini of mature proteins in higher eukaryotes and their major contribution to dynamic proteomics","volume":"87","author":"Meinnel","year":"2005","journal-title":"Biochimie"},{"key":"2023012711241927400_btu165-B27","doi-asserted-by":"crossref","first-page":"1404","DOI":"10.1021\/bi00678a010","article-title":"Acetylation of nascent polypeptide chains on rat liver polyribosomes \n              in vivo\n               and \n              in vitro","volume":"14","author":"Pestana","year":"1975","journal-title":"Biochemistry"},{"key":"2023012711241927400_btu165-B28","doi-asserted-by":"crossref","first-page":"reviews0006","DOI":"10.1186\/gb-2002-3-5-reviews0006","article-title":"The diversity of acetylated proteins","volume":"3","author":"Polevoda","year":"2002","journal-title":"Genome Biol."},{"key":"2023012711241927400_btu165-B29","doi-asserted-by":"crossref","first-page":"595","DOI":"10.1016\/S0022-2836(02)01269-X","article-title":"N-terminal acetyltransferases and sequence requirements for N-terminal acetylation of eukaryotic proteins","volume":"325","author":"Polevoda","year":"2003","journal-title":"J. Mol. Biol."},{"key":"2023012711241927400_btu165-B30","doi-asserted-by":"crossref","first-page":"492","DOI":"10.1002\/jcb.21418","article-title":"Yeast n-\u03b1-terminal acetyltransferases are associated with ribosomes","volume":"103","author":"Polevoda","year":"2008","journal-title":"J. Cell. 
Biochem."},{"key":"2023012711241927400_btu165-B31","doi-asserted-by":"crossref","first-page":"S2","DOI":"10.1186\/1753-6561-3-S6-S2","article-title":"A synopsis of eukaryotic n-\u03b1-terminal acetyltransferases: nomenclature, subunits and substrates","volume":"3","author":"Polevoda","year":"2009","journal-title":"BMC Proc."},{"key":"2023012711241927400_btu165-B32","volume-title":"C4.5: Programs for Machine Learning (Morgan Kaufmann Series in Machine Learning)","author":"Quinlan","year":"1992","edition":"1st edn"},{"key":"2023012711241927400_btu165-B33","doi-asserted-by":"crossref","first-page":"365","DOI":"10.1002\/prot.22555","article-title":"Identification, analysis, and prediction of protein ubiquitination sites","volume":"78","author":"Radivojac","year":"2010","journal-title":"Proteins"},{"key":"2023012711241927400_btu165-B34","volume-title":"Artificial Intelligence\u2014A Modern Approach","author":"Russell","year":"2010","edition":"3rd edn"},{"key":"2023012711241927400_btu165-B35","doi-asserted-by":"crossref","first-page":"365","DOI":"10.1074\/mcp.M800332-MCP200","article-title":"Predicting protein post-translational modifications using meta-analysis of proteome scale data sets","volume":"8","author":"Schwartz","year":"2009","journal-title":"Mol. Cell. Proteomics"},{"key":"2023012711241927400_btu165-B36","doi-asserted-by":"crossref","first-page":"325","DOI":"10.1042\/BJ20080658","article-title":"Identification of the human N-\u03b1-acetyltransferase complex b (hNatB): a complex important for cell-cycle progression","volume":"415","author":"Starheim","year":"2008","journal-title":"Biochem. J."},{"key":"2023012711241927400_btu165-B37","doi-asserted-by":"crossref","first-page":"3569","DOI":"10.1128\/MCB.01909-08","article-title":"Knockdown of human N-\u03b1-terminal acetyltransferase complex C leads to p53-dependent apoptosis and aberrant human Arl8b localization","volume":"29","author":"Starheim","year":"2009","journal-title":"Mol. Cell. 
Biol."},{"key":"2023012711241927400_btu165-B38","volume-title":"Posttranslational Modification of Proteins: Expanding Nature\u2019s Inventory","author":"Walsh","year":"2006"},{"key":"2023012711241927400_btu165-B39","doi-asserted-by":"crossref","first-page":"5588","DOI":"10.1021\/bi1005464","article-title":"Protein N-terminal processing: substrate specificity of \n              Escherichia coli\n               and human methionine aminopeptidases","volume":"49","author":"Xiao","year":"2010","journal-title":"Biochemistry"},{"key":"2023012711241927400_btu165-B40","doi-asserted-by":"crossref","first-page":"2946","DOI":"10.1039\/c2mb25185j","article-title":"Computational prediction and analysis of protein \u03b3-carboxylation sites based on a random forest method","volume":"8","author":"Zhang","year":"2012","journal-title":"Mol. Biosyst."}],"container-title":["Bioinformatics"],"original-title":[],"language":"en","link":[{"URL":"https:\/\/academic.oup.com\/bioinformatics\/article-pdf\/30\/14\/1974\/48924688\/bioinformatics_30_14_1974.pdf","content-type":"application\/pdf","content-version":"vor","intended-application":"syndication"},{"URL":"https:\/\/academic.oup.com\/bioinformatics\/article-pdf\/30\/14\/1974\/48924688\/bioinformatics_30_14_1974.pdf","content-type":"unspecified","content-version":"vor","intended-application":"similarity-checking"}],"deposited":{"date-parts":[[2023,1,27]],"date-time":"2023-01-27T11:55:15Z","timestamp":1674820515000},"score":1,"resource":{"primary":{"URL":"https:\/\/academic.oup.com\/bioinformatics\/article\/30\/14\/1974\/2390977"}},"subtitle":[],"short-title":[],"issued":{"date-parts":[[2014,3,28]]},"references-count":40,"journal-issue":{"issue":"14","published-print":{"date-parts":[[2014,7,15]]}},"URL":"https:\/\/doi.org\/10.1093\/bioinformatics\/btu165","relation":{},"ISSN":["1367-4811","1367-4803"],"issn-type":[{"value":"1367-4811","type":"electronic"},{"value":"1367-4803","type":"print"}],"subject":[],"published-other":{"date-parts":[[
2014,7,15]]},"published":{"date-parts":[[2014,3,28]]}}}