{"status":"ok","message-type":"work","message-version":"1.0.0","message":{"indexed":{"date-parts":[[2026,4,15]],"date-time":"2026-04-15T20:35:36Z","timestamp":1776285336053,"version":"3.50.1"},"reference-count":66,"publisher":"Springer Science and Business Media LLC","issue":"1","license":[{"start":{"date-parts":[[2024,2,20]],"date-time":"2024-02-20T00:00:00Z","timestamp":1708387200000},"content-version":"tdm","delay-in-days":0,"URL":"https:\/\/creativecommons.org\/licenses\/by\/4.0"},{"start":{"date-parts":[[2024,2,20]],"date-time":"2024-02-20T00:00:00Z","timestamp":1708387200000},"content-version":"vor","delay-in-days":0,"URL":"https:\/\/creativecommons.org\/licenses\/by\/4.0"}],"content-domain":{"domain":["link.springer.com"],"crossmark-restriction":false},"short-container-title":["J Cheminform"],"abstract":"<jats:title>Abstract<\/jats:title><jats:p>The rapid increase of publicly available chemical structures and associated experimental data presents a valuable opportunity to build robust QSAR models for applications in different fields. However, the common concern is the quality of both the chemical structure information and associated experimental data. This is especially true when those data are collected from multiple sources as chemical substance mappings can contain many duplicate structures and molecular inconsistencies. Such issues can impact the resulting molecular descriptors and their mappings to experimental data and, subsequently, the quality of the derived models in terms of accuracy, repeatability, and reliability. Herein we describe the development of an automated workflow to standardize chemical structures according to a set of standard rules and generate two and\/or three-dimensional \u201cQSAR-ready\u201d forms prior to the calculation of molecular descriptors. The workflow was designed in the KNIME workflow environment and consists of three high-level steps. 
First, a structure encoding is read, and then the resulting in-memory representation is cross-referenced with any existing identifiers for consistency. Finally, the structure is standardized using a series of operations including desalting, stripping of stereochemistry (for two-dimensional structures), standardization of tautomers and nitro groups, valence correction, neutralization when possible, and then removal of duplicates. This workflow was initially developed to support collaborative QSAR modeling projects by ensuring consistency of the results from the different participants. It was then updated and generalized for other modeling applications. This included modification of the \u201cQSAR-ready\u201d workflow to generate \u201cMS-ready structures\u201d to support the generation of substance mappings and searches for software applications related to non-targeted analysis mass spectrometry. Both QSAR and MS-ready workflows are freely available in KNIME, via standalone versions on GitHub, and as docker container resources for the scientific community. <jats:italic>Scientific contribution<\/jats:italic>: This work pioneers an automated workflow in KNIME, systematically standardizing chemical structures to ensure their readiness for QSAR modeling and broader scientific applications. By addressing data quality concerns through desalting, stereochemistry stripping, and normalization, it improves the accuracy and reliability of the resulting molecular descriptors. 
The freely available resources in KNIME, GitHub, and docker containers democratize access, benefiting collaborative research and advancing diverse modeling endeavors in chemistry and mass spectrometry.<\/jats:p>","DOI":"10.1186\/s13321-024-00814-3","type":"journal-article","created":{"date-parts":[[2024,2,20]],"date-time":"2024-02-20T17:02:39Z","timestamp":1708448559000},"update-policy":"https:\/\/doi.org\/10.1007\/springer_crossmark_policy","source":"Crossref","is-referenced-by-count":44,"title":["Free and open-source QSAR-ready workflow for automated standardization of chemical structures in support of QSAR modeling"],"prefix":"10.1186","volume":"16","author":[{"given":"Kamel","family":"Mansouri","sequence":"first","affiliation":[]},{"given":"Jos\u00e9 T.","family":"Moreira-Filho","sequence":"additional","affiliation":[]},{"given":"Charles N.","family":"Lowe","sequence":"additional","affiliation":[]},{"given":"Nathaniel","family":"Charest","sequence":"additional","affiliation":[]},{"given":"Todd","family":"Martin","sequence":"additional","affiliation":[]},{"given":"Valery","family":"Tkachenko","sequence":"additional","affiliation":[]},{"given":"Richard","family":"Judson","sequence":"additional","affiliation":[]},{"given":"Mike","family":"Conway","sequence":"additional","affiliation":[]},{"given":"Nicole C.","family":"Kleinstreuer","sequence":"additional","affiliation":[]},{"given":"Antony J.","family":"Williams","sequence":"additional","affiliation":[]}],"member":"297","published-online":{"date-parts":[[2024,2,20]]},"reference":[{"key":"814_CR1","doi-asserted-by":"publisher","first-page":"1243","DOI":"10.1021\/acs.jcim.6b00129","volume":"56","author":"D Fourches","year":"2016","unstructured":"Fourches D, Muratov E, Tropsha A (2016) Trust, but verify II: a practical guide to chemogenomics data curation. J Chem Inf Model 56:1243\u20131252. 
https:\/\/doi.org\/10.1021\/acs.jcim.6b00129","journal-title":"J Chem Inf Model"},{"key":"814_CR2","doi-asserted-by":"publisher","first-page":"S10","DOI":"10.1186\/gb-2008-9-s2-s10","volume":"9","author":"B Alex","year":"2008","unstructured":"Alex B, Grover C, Haddow B et al (2008) Automating curation using a natural language processing pipeline. Genome Biol 9:S10. https:\/\/doi.org\/10.1186\/gb-2008-9-s2-s10","journal-title":"Genome Biol"},{"key":"814_CR3","doi-asserted-by":"publisher","first-page":"67","DOI":"10.1007\/s10822-010-9401-1","volume":"25","author":"D Cao","year":"2011","unstructured":"Cao D, Liang Y, Xu Q et al (2011) Toward better QSAR\/QSPR modeling: simultaneous outlier detection and variable selection using distribution of model features. J Comput Aided Mol Des 25:67\u201380","journal-title":"J Comput Aided Mol Des"},{"key":"814_CR4","doi-asserted-by":"publisher","first-page":"1023","DOI":"10.1289\/ehp.1510267","volume":"124","author":"K Mansouri","year":"2016","unstructured":"Mansouri K, Abdelaziz A, Rybacka A et al (2016) CERAPP: collaborative estrogen receptor activity prediction project. Environ Health Perspect 124:1023\u20131033. https:\/\/doi.org\/10.1289\/ehp.1510267","journal-title":"Environ Health Perspect"},{"key":"814_CR5","doi-asserted-by":"publisher","first-page":"911","DOI":"10.1080\/1062936X.2016.1253611","volume":"27","author":"K Mansouri","year":"2016","unstructured":"Mansouri K, Grulke CM, Richard AM et al (2016) An automated curation procedure for addressing chemical errors and inconsistencies in public datasets used in QSAR modelling. SAR QSAR Environ Res 27:911\u2013937. 
https:\/\/doi.org\/10.1080\/1062936X.2016.1253611","journal-title":"SAR QSAR Environ Res"},{"key":"814_CR6","doi-asserted-by":"publisher","first-page":"1189","DOI":"10.1021\/ci100176x","volume":"50","author":"D Fourches","year":"2010","unstructured":"Fourches D, Muratov E, Tropsha A (2010) Trust, but verify: on the importance of chemical structure curation in cheminformatics and QSAR modeling research. J Chem Inf Model 50:1189\u20131204","journal-title":"J Chem Inf Model"},{"key":"814_CR7","doi-asserted-by":"publisher","first-page":"685","DOI":"10.1016\/j.drudis.2012.02.013","volume":"17","author":"AJ Williams","year":"2012","unstructured":"Williams AJ, Ekins S, Tkachenko V (2012) Towards a gold standard: regarding quality in public domain chemistry databases and approaches to improving the situation. Drug Discov Today 17:685\u2013701. https:\/\/doi.org\/10.1016\/j.drudis.2012.02.013","journal-title":"Drug Discov Today"},{"key":"814_CR8","doi-asserted-by":"publisher","first-page":"747","DOI":"10.1016\/j.drudis.2011.07.007","volume":"16","author":"AJ Williams","year":"2011","unstructured":"Williams AJ, Ekins S (2011) A quality alert and call for improved curation of public chemistry databases. Drug Discov Today 16:747\u2013750. https:\/\/doi.org\/10.1016\/j.drudis.2011.07.007","journal-title":"Drug Discov Today"},{"key":"814_CR9","doi-asserted-by":"publisher","first-page":"465","DOI":"10.1021\/acs.chemrestox.2c00379","volume":"36","author":"CN Lowe","year":"2023","unstructured":"Lowe CN, Charest N, Ramsland C et al (2023) Transparency in modeling through careful application of OECD\u2019s QSAR\/QSPR principles via a curated water solubility data set. Chem Res Toxicol 36:465\u2013478. 
https:\/\/doi.org\/10.1021\/acs.chemrestox.2c00379","journal-title":"Chem Res Toxicol"},{"key":"814_CR10","doi-asserted-by":"publisher","first-page":"1337","DOI":"10.1002\/qsar.200810084","volume":"27","author":"D Young","year":"2008","unstructured":"Young D, Martin T, Venkatapathy R, Harten P (2008) Are the chemical structures in your QSAR correct? QSAR Comb Sci 27:1337\u20131345. https:\/\/doi.org\/10.1002\/qsar.200810084","journal-title":"QSAR Comb Sci"},{"key":"814_CR11","doi-asserted-by":"publisher","first-page":"30","DOI":"10.1186\/s13321-015-0072-8","volume":"7","author":"K Karapetyan","year":"2015","unstructured":"Karapetyan K, Batchelor C, Sharpe D et al (2015) The chemical validation and standardization platform (CVSP): large-scale automated validation of chemical structure datasets. J Cheminformatics 7:30. https:\/\/doi.org\/10.1186\/s13321-015-0072-8","journal-title":"J Cheminformatics"},{"key":"814_CR12","doi-asserted-by":"publisher","first-page":"1","DOI":"10.1186\/s13321-020-00456-1","volume":"12","author":"AP Bento","year":"2020","unstructured":"Bento AP, Hersey A, Felix E et al (2020) An open source chemical structure curation pipeline using RDKit. J Cheminform 12:1\u201316","journal-title":"J Cheminform"},{"key":"814_CR13","doi-asserted-by":"publisher","unstructured":"Cretu MT, Toniato A, Thakkar A, Debabeche A, Laino T, Vaucher AC (2023) Standardizing chemical compounds with\nlanguage models. ChemRxiv. https:\/\/doi.org\/10.26434\/chemrxiv-2022-14ztf-v2","DOI":"10.26434\/chemrxiv-2022-14ztf-v2"},{"key":"814_CR14","doi-asserted-by":"publisher","first-page":"36","DOI":"10.1186\/s13321-018-0293-8","volume":"10","author":"VD H\u00e4hnke","year":"2018","unstructured":"H\u00e4hnke VD, Kim S, Bolton EE (2018) PubChem chemical structure standardization. J Cheminform 10:36. https:\/\/doi.org\/10.1186\/s13321-018-0293-8","journal-title":"J Cheminform"},{"key":"814_CR15","unstructured":"Swain M (2023) MolVS: molecule validation and standardization. 
https:\/\/github.com\/mcs07\/MolVS. Accessed 8 Feb 2023"},{"key":"814_CR16","unstructured":"MolVS: molecule validation and standardization\u2014MolVS 0.1.1 documentation. https:\/\/molvs.readthedocs.io\/en\/latest\/. Accessed 11 Jan 2023"},{"key":"814_CR17","doi-asserted-by":"publisher","first-page":"28","DOI":"10.1186\/s13321-022-00606-7","volume":"14","author":"D Dolciami","year":"2022","unstructured":"Dolciami D, Villasclaras-Fernandez E, Kannas C et al (2022) CanSAR chemistry registration and standardization pipeline. J Cheminform 14:28. https:\/\/doi.org\/10.1186\/s13321-022-00606-7","journal-title":"J Cheminform"},{"key":"814_CR18","unstructured":"Jeliazkova N, Kochev N, Jeliazkov V (2016) Ambitcli-3.0.2. https:\/\/zenodo.org\/records\/173560"},{"key":"814_CR19","doi-asserted-by":"crossref","unstructured":"Berthold MR, Cebron N, Dill F et al (2008) KNIME: the konstanz information miner. In: Preisach C, Burkhardt H, Schmidt-Thieme L, Decker R (eds) Data analysis, machine learning and applications: proceedings of the 31st annual conference of the Gesellschaft f\u00fcr Klassifikation e.V., Albert-Ludwigs-Universit\u00e4t Freiburg, March 7\u20139, 2007. Springer, Berlin, pp 319\u2013326","DOI":"10.1007\/978-3-540-78246-9_38"},{"key":"814_CR20","unstructured":"Mansouri K (2016) OPERA: Command line application providing QSAR models predictions as well as\napplicability domain and accuracy assessment. Software GitHub repository. https:\/\/github.com\/kmansouri\/OPERA."},{"key":"814_CR21","doi-asserted-by":"publisher","first-page":"10","DOI":"10.1186\/s13321-018-0263-1","volume":"10","author":"K Mansouri","year":"2018","unstructured":"Mansouri K, Grulke CM, Judson RS, Williams AJ (2018) OPERA models for predicting physicochemical properties and environmental fate endpoints. J Cheminform 10:10. 
https:\/\/doi.org\/10.1186\/s13321-018-0263-1","journal-title":"J Cheminform"},{"key":"814_CR22","doi-asserted-by":"publisher","first-page":"45","DOI":"10.1186\/s13321-018-0299-2","volume":"10","author":"AD McEachran","year":"2018","unstructured":"McEachran AD, Mansouri K, Grulke C et al (2018) \u201cMS-Ready\u201d structures for non-targeted high-resolution mass spectrometry screening studies. J Cheminform 10:45. https:\/\/doi.org\/10.1186\/s13321-018-0299-2","journal-title":"J Cheminform"},{"key":"814_CR23","doi-asserted-by":"publisher","first-page":"100096","DOI":"10.1016\/j.comtox.2019.100096","volume":"12","author":"CM Grulke","year":"2019","unstructured":"Grulke CM, Williams AJ, Thillanadarajah I, Richard AM (2019) EPA\u2019s DSSTox database: history of development of a curated chemistry resource supporting computational toxicology research. Comput Toxicol 12:100096. https:\/\/doi.org\/10.1016\/j.comtox.2019.100096","journal-title":"Comput Toxicol"},{"key":"814_CR24","doi-asserted-by":"publisher","first-page":"61","DOI":"10.1186\/s13321-017-0247-6","volume":"9","author":"AJ Williams","year":"2017","unstructured":"Williams AJ, Grulke CM, Edwards J et al (2017) The CompTox chemistry dashboard: a community data resource for environmental chemistry. J Cheminform 9:61. https:\/\/doi.org\/10.1186\/s13321-017-0247-6","journal-title":"J Cheminform"},{"key":"814_CR25","doi-asserted-by":"publisher","first-page":"027002","DOI":"10.1289\/EHP5580","volume":"128","author":"K Mansouri","year":"2020","unstructured":"Mansouri K, Kleinstreuer N, Abdelaziz AM et al (2020) CoMPARA: collaborative modeling project for androgen receptor activity. Environ Health Perspect 128:027002. 
https:\/\/doi.org\/10.1289\/EHP5580","journal-title":"Environ Health Perspect"},{"key":"814_CR26","doi-asserted-by":"publisher","first-page":"47013","DOI":"10.1289\/EHP8495","volume":"129","author":"K Mansouri","year":"2021","unstructured":"Mansouri K, Karmaus AL, Fitzpatrick J et al (2021) CATMoS: collaborative acute toxicity modeling suite. Environ Health Perspect 129:47013. https:\/\/doi.org\/10.1289\/EHP8495","journal-title":"Environ Health Perspect"},{"key":"814_CR27","doi-asserted-by":"publisher","first-page":"104916","DOI":"10.1016\/j.tiv.2020.104916","volume":"67","author":"S Bell","year":"2020","unstructured":"Bell S, Abedini J, Ceger P et al (2020) An integrated chemical environment with tools for chemical safety testing. Toxicol Vitro Int J Publ Assoc BIBRA 67:104916. https:\/\/doi.org\/10.1016\/j.tiv.2020.104916","journal-title":"Toxicol Vitro Int J Publ Assoc BIBRA"},{"key":"814_CR28","doi-asserted-by":"publisher","first-page":"565","DOI":"10.1021\/acs.jcim.0c01273","volume":"61","author":"CN Lowe","year":"2021","unstructured":"Lowe CN, Williams AJ (2021) Enabling high-throughput searches for multiple chemical data using the US-EPA CompTox chemicals dashboard. J Chem Inf Model 61:565\u2013570. https:\/\/doi.org\/10.1021\/acs.jcim.0c01273","journal-title":"J Chem Inf Model"},{"key":"814_CR29","doi-asserted-by":"publisher","first-page":"92","DOI":"10.1186\/s13321-021-00571-7","volume":"13","author":"SS Kolmar","year":"2021","unstructured":"Kolmar SS, Grulke CM (2021) The effect of noise on the predictive limit of QSAR models. J Cheminform 13:92. https:\/\/doi.org\/10.1186\/s13321-021-00571-7","journal-title":"J Cheminform"},{"key":"814_CR30","doi-asserted-by":"publisher","first-page":"1798","DOI":"10.1109\/TPAMI.2013.50","volume":"35","author":"Y Bengio","year":"2013","unstructured":"Bengio Y, Courville A, Vincent P (2013) Representation learning: a review and new perspectives. IEEE Trans Pattern Anal Mach Intell 35:1798\u20131828. 
https:\/\/doi.org\/10.1109\/TPAMI.2013.50","journal-title":"IEEE Trans Pattern Anal Mach Intell"},{"key":"814_CR31","doi-asserted-by":"publisher","first-page":"1","DOI":"10.1021\/c160004a001","volume":"2","author":"WH Waldo","year":"1962","unstructured":"Waldo WH (1962) Searching two-dimensional structures by computer. J Chem Doc 2:1\u20132. https:\/\/doi.org\/10.1021\/c160004a001","journal-title":"J Chem Doc"},{"key":"814_CR32","unstructured":"Apodaca RL (2020) A guide to molecular standardization. http:\/\/depth-first.com\/articles\/2020\/07\/27\/a-guide-to-molecular-standardization\/. Accessed 11 Jan 2023"},{"key":"814_CR33","unstructured":"Anderson E, Veith G, Weininger D (1987) SMILES: a line notation and computerized interpreter for chemical\nstructures. https:\/\/api.semanticscholar.org\/CorpusID:64884759"},{"key":"814_CR34","doi-asserted-by":"publisher","DOI":"10.1021\/ci00007a012","author":"A Dalby","year":"1992","unstructured":"Dalby A, Nourse JG, Hounshell WD et al (1992) Description of several chemical structure file formats used by computer programs developed at molecular design limited. J Chem Inf Comput Sci. https:\/\/doi.org\/10.1021\/ci00007a012","journal-title":"J Chem Inf Comput Sci"},{"key":"814_CR35","unstructured":"James CA, Weininger D, Delany J (2008) Daylight theory manual. Chemical information systems, Aliso Viejo, CA, USA"},{"key":"814_CR36","unstructured":"Dassault Syst\u00e8mes (2020) CTfile formats. In: Dassault syst\u00e8mes. https:\/\/discover.3ds.com\/ctfile-documentation-request-form. Accessed 17 Aug 2023"},{"key":"814_CR37","doi-asserted-by":"publisher","first-page":"3781","DOI":"10.1021\/acs.jcim.0c00232","volume":"60","author":"CM Baker","year":"2020","unstructured":"Baker CM, Kidley NJ, Papachristos K et al (2020) Tautomer standardization in chemical databases: deriving business rules from quantum chemistry. J Chem Inf Model 60:3781\u20133791. 
https:\/\/doi.org\/10.1021\/acs.jcim.0c00232","journal-title":"J Chem Inf Model"},{"key":"814_CR38","doi-asserted-by":"publisher","first-page":"628","DOI":"10.1007\/BF01341936","volume":"76","author":"E H\u00fcckel","year":"1932","unstructured":"H\u00fcckel E (1932) Quantentheoretische Beitr\u00e4ge zum Benzolproblem. III. Quantentheoretische Beitr\u00e4ge zum Problem der aromatischen und unges\u00e4ttigten Verbindungen. Z Phys Ger 76:628\u2013648","journal-title":"Z Phys Ger"},{"key":"814_CR39","doi-asserted-by":"crossref","first-page":"129","DOI":"10.1002\/jlac.18661370202","volume":"137","author":"A Kekul\u00e9","year":"1866","unstructured":"Kekul\u00e9 A (1866) Untersuchungen \u00fcber aromatische Verbindungen. Liebigs Ann Chem 137:129\u2013136","journal-title":"Liebigs Ann Chem"},{"key":"814_CR40","doi-asserted-by":"publisher","first-page":"1253","DOI":"10.1021\/acs.jcim.9b01080","volume":"60","author":"DK Dhaked","year":"2020","unstructured":"Dhaked DK, Ihlenfeldt W-D, Patel H et al (2020) Toward a comprehensive treatment of tautomerism in chemoinformatics including in InChI V2. J Chem Inf Model 60:1253\u20131275. https:\/\/doi.org\/10.1021\/acs.jcim.9b01080","journal-title":"J Chem Inf Model"},{"key":"814_CR41","doi-asserted-by":"publisher","first-page":"521","DOI":"10.1007\/s10822-010-9346-4","volume":"24","author":"M Sitzmann","year":"2010","unstructured":"Sitzmann M, Ihlenfeldt W-D, Nicklaus MC (2010) Tautomerism in large databases. J Comput Aided Mol Des 24:521\u2013551. https:\/\/doi.org\/10.1007\/s10822-010-9346-4","journal-title":"J Comput Aided Mol Des"},{"key":"814_CR42","doi-asserted-by":"publisher","first-page":"149","DOI":"10.1016\/j.jbiotec.2017.07.028","volume":"261","author":"A Fillbrunn","year":"2017","unstructured":"Fillbrunn A, Dietz C, Pfeuffer J et al (2017) KNIME for reproducible cross-domain analysis of life science data. J Biotechnol 261:149\u2013156. 
https:\/\/doi.org\/10.1016\/j.jbiotec.2017.07.028","journal-title":"J Biotechnol"},{"key":"814_CR43","unstructured":"KNIME Server User Guide. https:\/\/docs.knime.com\/latest\/server_user_guide\/index.html#introduction. Accessed 16 May 2023"},{"key":"814_CR44","unstructured":"The KNIME Server REST API. In: KNIME. https:\/\/www.knime.com\/blog\/the-knime-server-rest-api. Accessed 16 May 2023"},{"key":"814_CR45","unstructured":"ChemAxon (2014) ChemAxon Standardizer\u2013Cheminformatics platforms and desktop applications. http:\/\/www.chemaxon.com\/products\/standardizer\/. Accessed 25 Nov 2014"},{"key":"814_CR46","unstructured":"Reusch W (2013) Examples of chemical reactions. http:\/\/www2.chemistry.msu.edu\/faculty\/reusch\/virttxtjml\/react2.htm. Accessed 25 Nov 2014"},{"key":"814_CR47","doi-asserted-by":"publisher","first-page":"876","DOI":"10.1021\/ja01146a537","volume":"73","author":"W von E. Doering","year":"1951","unstructured":"von E. Doering W, Detert FL (1951) Cycloheptatrienylium oxide. J Am Chem Soc 73:876\u2013877. https:\/\/doi.org\/10.1021\/ja01146a537","journal-title":"J Am Chem Soc"},{"key":"814_CR48","unstructured":"US EPA OCSPP (2023) EPA rebuilds endocrine disruptor screening program by soliciting public comment on new approach methodologies to screen for endocrine effects. https:\/\/www.epa.gov\/pesticides\/epa-rebuilds-endocrine-disruptor-screening-program-soliciting-public-comment-new. Accessed 3 May 2023"},{"key":"814_CR49","doi-asserted-by":"publisher","first-page":"5","DOI":"10.1093\/toxsci\/kfl103","volume":"95","author":"DJ Dix","year":"2007","unstructured":"Dix DJ, Houck KA, Martin MT et al (2007) The ToxCast program for prioritizing toxicity testing of environmental chemicals. Toxicol Sci 95:5\u201312. 
https:\/\/doi.org\/10.1093\/toxsci\/kfl103","journal-title":"Toxicol Sci"},{"key":"814_CR50","doi-asserted-by":"publisher","DOI":"10.1038\/srep05664","author":"R Huang","year":"2014","unstructured":"Huang R, Sakamuru S, Martin MT et al (2014) Profiling of the Tox21 10K compound library for agonists and antagonists of the estrogen receptor alpha signaling pathway. Sci Rep. https:\/\/doi.org\/10.1038\/srep05664","journal-title":"Sci Rep"},{"key":"814_CR51","doi-asserted-by":"publisher","first-page":"485","DOI":"10.1289\/ehp.0901392","volume":"118","author":"RS Judson","year":"2010","unstructured":"Judson RS, Houck KA, Kavlock RJ et al (2010) In vitro screening of environmental chemicals for targeted testing prioritization: the ToxCast project. Environ Health Perspect 118:485\u2013492. https:\/\/doi.org\/10.1289\/ehp.0901392","journal-title":"Environ Health Perspect"},{"key":"814_CR52","doi-asserted-by":"publisher","first-page":"137","DOI":"10.1093\/toxsci\/kfv168","volume":"148","author":"RS Judson","year":"2015","unstructured":"Judson RS, Magpantay FM, Chickarmane V et al (2015) Integrated model of chemical perturbations of a biological pathway using 18 in vitro high-throughput screening assays for the estrogen receptor. Toxicol Sci 148:137\u2013154. https:\/\/doi.org\/10.1093\/toxsci\/kfv168","journal-title":"Toxicol Sci"},{"key":"814_CR53","doi-asserted-by":"publisher","first-page":"946","DOI":"10.1021\/acs.chemrestox.6b00347","volume":"30","author":"NC Kleinstreuer","year":"2017","unstructured":"Kleinstreuer NC, Ceger P, Watt ED et al (2017) Development and validation of a computational model for androgen receptor activity. Chem Res Toxicol 30:946\u2013964. 
https:\/\/doi.org\/10.1021\/acs.chemrestox.6b00347","journal-title":"Chem Res Toxicol"},{"key":"814_CR54","doi-asserted-by":"publisher","first-page":"71","DOI":"10.1016\/j.comtox.2017.10.003","volume":"6","author":"JA Leonard","year":"2018","unstructured":"Leonard JA, Stevens C, Mansouri K et al (2018) A workflow for identifying metabolically active chemicals to complement in vitro toxicity screening. Comput Toxicol 6:71\u201383. https:\/\/doi.org\/10.1016\/j.comtox.2017.10.003","journal-title":"Comput Toxicol"},{"key":"814_CR55","doi-asserted-by":"publisher","first-page":"1410","DOI":"10.1021\/acs.chemrestox.6b00079","volume":"29","author":"CL Pinto","year":"2016","unstructured":"Pinto CL, Mansouri K, Judson R, Browne P (2016) Prediction of estrogenic bioactivity of environmental chemical metabolites. Chem Res Toxicol 29:1410\u20131427. https:\/\/doi.org\/10.1021\/acs.chemrestox.6b00079","journal-title":"Chem Res Toxicol"},{"key":"814_CR56","unstructured":"US EPA (2023) Availability of new approach methodologies (NAMs) in the endocrine disruptor screening program (EDSP). https:\/\/www.regulations.gov\/document\/EPA-HQ-OPP-2021-0756-0002. Accessed 31 July 2023"},{"key":"814_CR57","doi-asserted-by":"publisher","first-page":"183","DOI":"10.1016\/j.yrtph.2018.01.022","volume":"94","author":"J Strickland","year":"2018","unstructured":"Strickland J, Clippinger AJ, Brown J et al (2018) Status of acute systemic toxicity testing requirements and data uses by U.S. regulatory agencies. Regul Toxicol Pharmacol 94:183\u2013196. https:\/\/doi.org\/10.1016\/j.yrtph.2018.01.022","journal-title":"Regul Toxicol Pharmacol"},{"key":"814_CR58","doi-asserted-by":"publisher","first-page":"21","DOI":"10.1016\/j.comtox.2018.08.002","volume":"8","author":"NC Kleinstreuer","year":"2018","unstructured":"Kleinstreuer NC, Karmaus AL, Mansouri K et al (2018) Predictive models for acute oral systemic toxicity: a workshop to bridge the gap from research to regulation. Comput Toxicol 8:21\u201324. 
https:\/\/doi.org\/10.1016\/j.comtox.2018.08.002","journal-title":"Comput Toxicol"},{"key":"814_CR59","doi-asserted-by":"publisher","DOI":"10.1093\/toxsci\/kfac042","author":"AL Karmaus","year":"2022","unstructured":"Karmaus AL, Mansouri K, To KT et al (2022) Evaluation of variability across rat acute oral systemic toxicity studies. Toxicol Sci Off J Soc Toxicol. https:\/\/doi.org\/10.1093\/toxsci\/kfac042","journal-title":"Toxicol Sci Off J Soc Toxicol"},{"key":"814_CR60","unstructured":"OECD (2007) Guidance document on the validation of (quantitative) structure\u2013activity relationship [(Q)SAR] models. Guid doc valid quant struct-act relatsh QSAR models"},{"key":"814_CR61","doi-asserted-by":"publisher","first-page":"27","DOI":"10.1016\/S0027-5107(01)00289-5","volume":"499","author":"AM Richard","year":"2002","unstructured":"Richard AM, Williams CR (2002) Distributed structure-searchable toxicity (DSSTox) public database network: a proposal. Mutat Res 499:27\u201352","journal-title":"Mutat Res"},{"key":"814_CR62","unstructured":"PrecisionFDA\u2014overview. https:\/\/precision.fda.gov\/. Accessed 16 May 2023"},{"key":"814_CR63","doi-asserted-by":"publisher","first-page":"411","DOI":"10.1038\/s41370-017-0012-y","volume":"28","author":"JR Sobus","year":"2018","unstructured":"Sobus JR, Wambaugh JF, Isaacs KK et al (2018) Integrating tools for non-targeted analysis research and chemical safety evaluations at the US EPA. J Expo Sci Environ Epidemiol 28:411\u2013426. https:\/\/doi.org\/10.1038\/s41370-017-0012-y","journal-title":"J Expo Sci Environ Epidemiol"},{"key":"814_CR64","doi-asserted-by":"publisher","first-page":"3","DOI":"10.1186\/s13321-016-0115-9","volume":"8","author":"C Ruttkies","year":"2016","unstructured":"Ruttkies C, Schymanski EL, Wolf S et al (2016) MetFrag relaunched: incorporating strategies beyond in silico fragmentation. J Cheminform 8:3. 
https:\/\/doi.org\/10.1186\/s13321-016-0115-9","journal-title":"J Cheminform"},{"key":"814_CR65","unstructured":"Business Intelligence and Analytics Software | Tableau. https:\/\/www.tableau.com\/. Accessed 2 Feb 2024"},{"key":"814_CR66","unstructured":"Qlik Data Integration, Data Quality, and Analytics Solutions. In: Qlik. https:\/\/www.qlik.com\/us. Accessed 2 Feb 2024"}],"container-title":["Journal of Cheminformatics"],"original-title":[],"language":"en","link":[{"URL":"https:\/\/link.springer.com\/content\/pdf\/10.1186\/s13321-024-00814-3.pdf","content-type":"application\/pdf","content-version":"vor","intended-application":"text-mining"},{"URL":"https:\/\/link.springer.com\/article\/10.1186\/s13321-024-00814-3\/fulltext.html","content-type":"text\/html","content-version":"vor","intended-application":"text-mining"},{"URL":"https:\/\/link.springer.com\/content\/pdf\/10.1186\/s13321-024-00814-3.pdf","content-type":"application\/pdf","content-version":"vor","intended-application":"similarity-checking"}],"deposited":{"date-parts":[[2024,2,23]],"date-time":"2024-02-23T21:26:48Z","timestamp":1708723608000},"score":1,"resource":{"primary":{"URL":"https:\/\/jcheminf.biomedcentral.com\/articles\/10.1186\/s13321-024-00814-3"}},"subtitle":[],"short-title":[],"issued":{"date-parts":[[2024,2,20]]},"references-count":66,"journal-issue":{"issue":"1","published-online":{"date-parts":[[2024,12]]}},"alternative-id":["814"],"URL":"https:\/\/doi.org\/10.1186\/s13321-024-00814-3","relation":{},"ISSN":["1758-2946"],"issn-type":[{"value":"1758-2946","type":"electronic"}],"subject":[],"published":{"date-parts":[[2024,2,20]]},"assertion":[{"value":"29 November 2023","order":1,"name":"received","label":"Received","group":{"name":"ArticleHistory","label":"Article History"}},{"value":"10 February 2024","order":2,"name":"accepted","label":"Accepted","group":{"name":"ArticleHistory","label":"Article History"}},{"value":"20 February 2024","order":3,"name":"first_online","label":"First 
Online","group":{"name":"ArticleHistory","label":"Article History"}},{"order":1,"name":"Ethics","group":{"name":"EthicsHeading","label":"Declarations"}},{"value":"Not applicable.","order":2,"name":"Ethics","group":{"name":"EthicsHeading","label":"Ethics approval and consent to participate"}},{"value":"All authors gave consent for publication.","order":3,"name":"Ethics","group":{"name":"EthicsHeading","label":"Consent for publication"}},{"value":"The authors have no competing interests.","order":4,"name":"Ethics","group":{"name":"EthicsHeading","label":"Competing interests"}}],"article-number":"19"}}