{"status":"ok","message-type":"work","message-version":"1.0.0","message":{"indexed":{"date-parts":[[2025,10,26]],"date-time":"2025-10-26T21:19:53Z","timestamp":1761513593844},"reference-count":31,"publisher":"Oxford University Press (OUP)","issue":"12","content-domain":{"domain":[],"crossmark-restriction":false},"short-container-title":[],"published-print":{"date-parts":[[2015,6,15]]},"abstract":"<jats:title>Abstract<\/jats:title><jats:p>Motivation: Cytochrome P450s are a family of enzymes responsible for the metabolism of approximately 90% of FDA-approved drugs. Medicinal chemists often want to know which atoms of a molecule\u2014its metabolized sites\u2014are oxidized by Cytochrome P450s in order to modify their metabolism. Consequently, there are several methods that use literature-derived, atom-resolution data to train models that can predict a molecule\u2019s sites of metabolism. There is, however, much more data available at a lower resolution, where the exact site of metabolism is not known, but the region of the molecule that is oxidized is known. Until now, no site-of-metabolism models made use of region-resolution data.<\/jats:p><jats:p>Results: Here, we describe XenoSite-Region, the first reported method for training site-of-metabolism models with region-resolution data. Our approach uses the Expectation Maximization algorithm to train a site-of-metabolism model. Region-resolution metabolism data was simulated from a large site-of-metabolism dataset, containing 2000 molecules with 3400 metabolized and 30\u2009000 un-metabolized sites and covering nine Cytochrome P450 isozymes. When training on the same molecules (but with only region-level information), we find that this approach yields models almost as accurate as models trained with atom-resolution data. Moreover, we find that atom-resolution trained models are more accurate when also trained with region-resolution data from additional molecules. 
Our approach, therefore, opens up a way to extend the applicable domain of site-of-metabolism models into larger regions of chemical space. This meets a critical need in drug development by tapping into underutilized data commonly available in most large drug companies.<\/jats:p><jats:p>Availability and implementation: The algorithm, data and a web server are available at http:\/\/swami.wustl.edu\/xregion.<\/jats:p><jats:p>Contact: \u00a0swamidass@wustl.edu<\/jats:p>","DOI":"10.1093\/bioinformatics\/btv100","type":"journal-article","created":{"date-parts":[[2015,2,20]],"date-time":"2015-02-20T22:35:15Z","timestamp":1424471715000},"page":"1966-1973","source":"Crossref","is-referenced-by-count":15,"title":["Extending P450 site-of-metabolism models with region-resolution data"],"prefix":"10.1093","volume":"31","author":[{"given":"Jed M.","family":"Zaretzki","sequence":"first","affiliation":[{"name":"Department of Pathology and Immunology, Washington University School of Medicine, St. Louis, MO 63130, USA"}],"role":[{"role":"author","vocabulary":"crossref"}]},{"given":"Michael R.","family":"Browning","sequence":"additional","affiliation":[{"name":"Department of Pathology and Immunology, Washington University School of Medicine, St. Louis, MO 63130, USA"}],"role":[{"role":"author","vocabulary":"crossref"}]},{"given":"Tyler B.","family":"Hughes","sequence":"additional","affiliation":[{"name":"Department of Pathology and Immunology, Washington University School of Medicine, St. Louis, MO 63130, USA"}],"role":[{"role":"author","vocabulary":"crossref"}]},{"given":"S. Joshua","family":"Swamidass","sequence":"additional","affiliation":[{"name":"Department of Pathology and Immunology, Washington University School of Medicine, St. 
Louis, MO 63130, USA"}],"role":[{"role":"author","vocabulary":"crossref"}]}],"member":"286","published-online":{"date-parts":[[2015,2,19]]},"reference":[{"key":"2023020115192207300_btv100-B1","doi-asserted-by":"crossref","first-page":"965","DOI":"10.1021\/ci600397p","article-title":"One- to four-dimensional kernels for small molecules and predictive regression of physical, chemical, and biological properties","volume":"47","author":"Azencott","year":"2007","journal-title":"J. Chem. Inf. Model."},{"key":"2023020115192207300_btv100-B2","volume-title":"Bioinformatics: The Machine Learning Approach","author":"Baldi","year":"2001"},{"key":"2023020115192207300_btv100-B3","doi-asserted-by":"crossref","first-page":"249","DOI":"10.1016\/j.drudis.2007.01.007","article-title":"Current and future trends in the application of HPLC-MS to metabolite-identification studies","volume":"12","author":"Castro-Perez","year":"2007","journal-title":"Drug Disc. Today"},{"key":"2023020115192207300_btv100-B4","doi-asserted-by":"crossref","first-page":"2101","DOI":"10.1002\/cbdv.200900078","article-title":"Probabilistic prediction of the human cyp3a4 and cyp2d6 metabolism sites","volume":"6","author":"Dapkunas","year":"2009","journal-title":"Chem. Biodivers."},{"key":"2023020115192207300_btv100-B5","doi-asserted-by":"crossref","first-page":"1","DOI":"10.1111\/j.2517-6161.1977.tb01600.x","article-title":"Maximum likelihood from incomplete data via the EM algorithm","volume":"39","author":"Dempster","year":"1977","journal-title":"J. R. Stat. Soc. B"},{"key":"2023020115192207300_btv100-B6","doi-asserted-by":"crossref","first-page":"291","DOI":"10.1002\/jms.3123","article-title":"Metfusion: integration of compound identification strategies","volume":"48","author":"Gerlich","year":"2013","journal-title":"J. 
Mass Spectrom."},{"key":"2023020115192207300_btv100-B7","doi-asserted-by":"crossref","first-page":"E101","DOI":"10.1208\/aapsj080112","article-title":"Cytochrome P450s and other enzymes in drug metabolism and toxicity","volume":"8","author":"Guengerich","year":"2006","journal-title":"AAPS J."},{"key":"2023020115192207300_btv100-B8","doi-asserted-by":"crossref","first-page":"2333","DOI":"10.1093\/bioinformatics\/bts437","article-title":"Metabolite identification and molecular fingerprint prediction through machine learning","volume":"28","author":"Heinonen","year":"2012","journal-title":"Bioinformatics"},{"key":"2023020115192207300_btv100-B9","doi-asserted-by":"crossref","first-page":"847","DOI":"10.2174\/138920008786485092","article-title":"High throughput ADME screening: practical considerations, impact on the portfolio and enabler of in silico ADME models","volume":"9","author":"Hop","year":"2008","journal-title":"Curr. Drug Metab."},{"key":"2023020115192207300_btv100-B10","doi-asserted-by":"crossref","first-page":"3352","DOI":"10.1021\/ci4004688","article-title":"Dr-predictor: incorporating flexible docking with specialized electronic reactivity and machine learning techniques to predict CYP-mediated sites of metabolism","volume":"53","author":"Huang","year":"2013","journal-title":"J. Chem. Inf. Model."},{"key":"2023020115192207300_btv100-B11","doi-asserted-by":"crossref","first-page":"359","DOI":"10.1137\/S1064827595287997","article-title":"A fast and highly quality multilevel scheme for partitioning irregular graphs","volume":"20","author":"Karypis","year":"1999","journal-title":"SIAM J. Sci. Comput."},{"key":"2023020115192207300_btv100-B12","first-page":"939","article-title":"Molgen-ms: evaluation of low resolution electron impact mass spectra with ms classification and exhaustive structure generation","volume":"15","author":"Kerber","year":"2001","journal-title":"Adv. 
Mass Spectrom."},{"key":"2023020115192207300_btv100-B13","doi-asserted-by":"crossref","first-page":"617","DOI":"10.1021\/ci200542m","article-title":"Computational prediction of metabolism: sites, products, SAR, p450 enzyme dynamics, and mechanisms","volume":"52","author":"Kirchmair","year":"2012","journal-title":"J. Chem. Inf. Model."},{"key":"2023020115192207300_btv100-B14","doi-asserted-by":"crossref","first-page":"3631","DOI":"10.1021\/jm030102a","article-title":"Modeling of human cytochrome p450-mediated drug metabolism using unsupervised machine learning approach","volume":"46","author":"Korolev","year":"2003","journal-title":"J. Med. Chem."},{"key":"2023020115192207300_btv100-B15","doi-asserted-by":"crossref","first-page":"41","DOI":"10.1002\/prot.340070105","article-title":"An expectation maximization (EM) algorithm for the identification and characterization of common sites in unaligned biopolymer sequences","volume":"7","author":"Lawrence","year":"1990","journal-title":"Proteins"},{"key":"2023020115192207300_btv100-B16","doi-asserted-by":"crossref","first-page":"1155","DOI":"10.1016\/S0140-6736(02)11203-7","article-title":"Clinical importance of the cytochromes p450","volume":"360","author":"Nebert","year":"2002","journal-title":"Lancet"},{"key":"2023020115192207300_btv100-B17","article-title":"Stardrop, version 4.3","author":"Optibrium Ltd","year":"2009"},{"key":"2023020115192207300_btv100-B18","doi-asserted-by":"crossref","first-page":"3417","DOI":"10.1021\/ac300304u","article-title":"Identifying the unknowns by aligning fragmentation trees","volume":"84","author":"Rasche","year":"2012","journal-title":"Anal. 
Chem."},{"key":"2023020115192207300_btv100-B19","doi-asserted-by":"crossref","first-page":"195","DOI":"10.1137\/1026034","article-title":"Mixture densities, maximum likelihood and the EM algorithm","volume":"26","author":"Redner","year":"1984","journal-title":"SIAM Rev."},{"key":"2023020115192207300_btv100-B20","doi-asserted-by":"crossref","first-page":"498","DOI":"10.1021\/ci400472j","article-title":"Metabolism site prediction based on xenobiotic structural formulae and pass prediction algorithm","volume":"54","author":"Rudik","year":"2014","journal-title":"J. Chem. Inf. Model."},{"key":"2023020115192207300_btv100-B21","doi-asserted-by":"crossref","first-page":"96","DOI":"10.1021\/ml100016x","article-title":"SMARTCyp: a 2D method for prediction of cytochrome P450-mediated drug metabolism","volume":"1","author":"Rydberg","year":"2010","journal-title":"ACS Med. Chem. Lett."},{"key":"2023020115192207300_btv100-B22","doi-asserted-by":"crossref","first-page":"1","DOI":"10.1186\/1758-2946-5-12","article-title":"Computational mass spectrometry for small molecules","volume":"5","author":"Scheubert","year":"2013","journal-title":"J. Cheminform."},{"key":"2023020115192207300_btv100-B23","article-title":"P450 SOM prediction, version 1.0","author":"Schr\u00f6dinger","year":"2011"},{"key":"2023020115192207300_btv100-B24","doi-asserted-by":"crossref","first-page":"21","DOI":"10.1016\/j.jchromb.2013.11.022","article-title":"Chemical and technical challenges in the analysis of central carbon metabolites by liquid-chromatography mass spectrometry","volume":"966","author":"Siegel","year":"2013","journal-title":"J. Chromatogr. B."},{"key":"2023020115192207300_btv100-B25","doi-asserted-by":"crossref","first-page":"644","DOI":"10.1016\/1044-0305(95)00291-K","article-title":"Chemical substructure identification by mass spectral library searching","volume":"6","author":"Stein","year":"1995","journal-title":"J. Am. Soc. 
Mass Spectrom."},{"key":"2023020115192207300_btv100-B26","doi-asserted-by":"crossref","first-page":"i359","DOI":"10.1093\/bioinformatics\/bti1055","article-title":"Kernels for small molecules and the prediction of mutagenicity, toxicity and anti-cancer activity","volume":"21","author":"Swamidass","year":"2005","journal-title":"Bioinformatics"},{"key":"2023020115192207300_btv100-B27","doi-asserted-by":"crossref","first-page":"148","DOI":"10.1186\/1471-2105-11-148","article-title":"In silico fragmentation for computer assisted identification of metabolite mass spectra","volume":"11","author":"Wolf","year":"2010","journal-title":"BMC Bioinformatics"},{"key":"2023020115192207300_btv100-B28","doi-asserted-by":"crossref","first-page":"1","DOI":"10.1016\/j.trac.2011.08.009","article-title":"Metabolite identification and quantitation in LC-MS\/MS-based metabolomics","volume":"32","author":"Xiao","year":"2012","journal-title":"TrAC Trends Anal. Chem."},{"key":"2023020115192207300_btv100-B29","doi-asserted-by":"crossref","first-page":"1667","DOI":"10.1021\/ci2000488","article-title":"RS-predictor: a new tool for predicting sites of cytochrome P450-mediated metabolism applied to CYP 3A4","volume":"51","author":"Zaretzki","year":"2011","journal-title":"J. Chem. Inf. Model."},{"key":"2023020115192207300_btv100-B30","doi-asserted-by":"crossref","first-page":"1637","DOI":"10.1021\/ci300009z","article-title":"Rs-predictor models augmented with smartcyp reactivities: robust metabolic regioselectivity predictions for nine CYP isozymes","volume":"52","author":"Zaretzki","year":"2012","journal-title":"J. Chem. Inf. Model."},{"key":"2023020115192207300_btv100-B31","doi-asserted-by":"crossref","first-page":"3373","DOI":"10.1021\/ci400518g","article-title":"Xenosite: accurately predicting CYP-mediated sites of metabolism with neural networks","volume":"53","author":"Zaretzki","year":"2013","journal-title":"J. Chem. Inf. 
Model."}],"container-title":["Bioinformatics"],"original-title":[],"language":"en","link":[{"URL":"https:\/\/academic.oup.com\/bioinformatics\/article-pdf\/31\/12\/1966\/49013803\/bioinformatics_31_12_1966.pdf","content-type":"application\/pdf","content-version":"vor","intended-application":"syndication"},{"URL":"https:\/\/academic.oup.com\/bioinformatics\/article-pdf\/31\/12\/1966\/49013803\/bioinformatics_31_12_1966.pdf","content-type":"unspecified","content-version":"vor","intended-application":"similarity-checking"}],"deposited":{"date-parts":[[2024,6,7]],"date-time":"2024-06-07T09:52:14Z","timestamp":1717753934000},"score":1,"resource":{"primary":{"URL":"https:\/\/academic.oup.com\/bioinformatics\/article\/31\/12\/1966\/214807"}},"subtitle":[],"short-title":[],"issued":{"date-parts":[[2015,2,19]]},"references-count":31,"journal-issue":{"issue":"12","published-print":{"date-parts":[[2015,6,15]]}},"URL":"https:\/\/doi.org\/10.1093\/bioinformatics\/btv100","relation":{},"ISSN":["1367-4811","1367-4803"],"issn-type":[{"value":"1367-4811","type":"electronic"},{"value":"1367-4803","type":"print"}],"subject":[],"published-other":{"date-parts":[[2015,6,15]]},"published":{"date-parts":[[2015,2,19]]}}}