{"status":"ok","message-type":"work","message-version":"1.0.0","message":{"indexed":{"date-parts":[[2025,7,30]],"date-time":"2025-07-30T11:49:51Z","timestamp":1753876191207,"version":"3.41.2"},"reference-count":28,"publisher":"Oxford University Press (OUP)","license":[{"start":{"date-parts":[[2020,12,11]],"date-time":"2020-12-11T00:00:00Z","timestamp":1607644800000},"content-version":"vor","delay-in-days":345,"URL":"http:\/\/creativecommons.org\/licenses\/by\/4.0\/"}],"funder":[{"DOI":"10.13039\/501100005739","name":"Universidad Nacional Aut\u00f3noma de M\u00e9xico","doi-asserted-by":"publisher","id":[{"id":"10.13039\/501100005739","id-type":"DOI","asserted-by":"publisher"}]},{"DOI":"10.13039\/100000057","name":"National Institute of General Medical Sciences","doi-asserted-by":"publisher","award":["1RO1GM131643-01A1"],"award-info":[{"award-number":["1RO1GM131643-01A1"]}],"id":[{"id":"10.13039\/100000057","id-type":"DOI","asserted-by":"publisher"}]},{"DOI":"10.13039\/501100003141","name":"Consejo Nacional de Ciencia y Tecnolog\u00eda","doi-asserted-by":"publisher","award":["386128"],"award-info":[{"award-number":["386128"]}],"id":[{"id":"10.13039\/501100003141","id-type":"DOI","asserted-by":"publisher"}]}],"content-domain":{"domain":[],"crossmark-restriction":false},"short-container-title":[],"published-print":{"date-parts":[[2020,12,11]]},"abstract":"<jats:title>Abstract<\/jats:title>\n               <jats:p>Transcription factors (TFs) play a main role in transcriptional regulation of bacteria, as they regulate transcription of the genetic information encoded in DNA. Thus, the curation of the properties of these regulatory proteins is essential for a better understanding of transcriptional regulation. However, traditional manual curation of article collections to compile descriptions of TF properties takes significant time and effort due to the overwhelming amount of biomedical literature, which increases every day. The development of automatic approaches for knowledge extraction to assist curation is therefore critical. Here, we show an effective approach for knowledge extraction to assist curation of summaries describing bacterial TF properties based on an automatic text summarization strategy. We were able to recover automatically a median 77% of the knowledge contained in manual summaries describing properties of 177 TFs of Escherichia coli K-12 by processing 5961 scientific articles. For 71% of the TFs, our approach extracted new knowledge that can be used to expand manual descriptions. Furthermore, as we trained our predictive model with manual summaries of E. coli, we also generated summaries for 185 TFs of Salmonella enterica serovar Typhimurium from 3498 articles. According to the manual curation of 10 of these Salmonella typhimurium summaries, 96% of their sentences contained relevant knowledge. Our results demonstrate the feasibility to assist manual curation to expand manual summaries with new knowledge automatically extracted and to create new summaries of bacteria for which these curation efforts do not exist.<\/jats:p>\n               <jats:p>Database URL: The automatic summaries of the TFs of E. coli and Salmonella and the automatic summarizer are available in GitHub (https:\/\/github.com\/laigen-unam\/tf-properties-summarizer.git).<\/jats:p>","DOI":"10.1093\/database\/baaa109","type":"journal-article","created":{"date-parts":[[2020,11,27]],"date-time":"2020-11-27T04:14:14Z","timestamp":1606450454000},"source":"Crossref","is-referenced-by-count":2,"title":["Knowledge extraction for assisted curation of summaries of bacterial transcription factor properties"],"prefix":"10.1093","volume":"2020","author":[{"given":"Carlos-Francisco","family":"M\u00e9ndez-Cruz","sequence":"first","affiliation":[{"name":"Centro de Ciencias Gen\u00f3micas, Universidad Nacional Aut\u00f3noma de M\u00e9xico, Av. Universidad s\/n, Colonia Chamilpa, Cuernavaca 62100, Morelos, Mexico"}]},{"given":"Antonio","family":"Blanchet","sequence":"additional","affiliation":[{"name":"Centro de Ciencias Gen\u00f3micas, Universidad Nacional Aut\u00f3noma de M\u00e9xico, Av. Universidad s\/n, Colonia Chamilpa, Cuernavaca 62100, Morelos, Mexico"}]},{"given":"Alan","family":"God\u00ednez","sequence":"additional","affiliation":[{"name":"Centro de Ciencias Gen\u00f3micas, Universidad Nacional Aut\u00f3noma de M\u00e9xico, Av. Universidad s\/n, Colonia Chamilpa, Cuernavaca 62100, Morelos, Mexico"}]},{"given":"Ignacio","family":"Arroyo-Fern\u00e1ndez","sequence":"additional","affiliation":[{"name":"Centro de Ciencias Gen\u00f3micas, Universidad Nacional Aut\u00f3noma de M\u00e9xico, Av. Universidad s\/n, Colonia Chamilpa, Cuernavaca 62100, Morelos, Mexico"},{"name":"Divisi\u00f3n de Posgrado, Universidad Tecnol\u00f3gica de la Mixteca, Carretera a Acatlima Km. 2.5, Huajuapan de Le\u00f3n, 69000, Oaxaca, Mexico"}]},{"given":"Socorro","family":"Gama-Castro","sequence":"additional","affiliation":[{"name":"Centro de Ciencias Gen\u00f3micas, Universidad Nacional Aut\u00f3noma de M\u00e9xico, Av. Universidad s\/n, Colonia Chamilpa, Cuernavaca 62100, Morelos, Mexico"}]},{"given":"Sara Berenice","family":"Mart\u00ednez-Luna","sequence":"additional","affiliation":[{"name":"Centro de Ciencias Gen\u00f3micas, Universidad Nacional Aut\u00f3noma de M\u00e9xico, Av. Universidad s\/n, Colonia Chamilpa, Cuernavaca 62100, Morelos, Mexico"}]},{"given":"Cristian","family":"Gonz\u00e1lez-Col\u00edn","sequence":"additional","affiliation":[{"name":"Centro de Ciencias Gen\u00f3micas, Universidad Nacional Aut\u00f3noma de M\u00e9xico, Av. Universidad s\/n, Colonia Chamilpa, Cuernavaca 62100, Morelos, Mexico"}]},{"ORCID":"https:\/\/orcid.org\/0000-0001-8780-7664","authenticated-orcid":false,"given":"Julio","family":"Collado-Vides","sequence":"additional","affiliation":[{"name":"Centro de Ciencias Gen\u00f3micas, Universidad Nacional Aut\u00f3noma de M\u00e9xico, Av. Universidad s\/n, Colonia Chamilpa, Cuernavaca 62100, Morelos, Mexico"},{"name":"Department of Biomedical Engineering, Boston University, 44 Cummington Mall, Room 403, Boston, 02215 MA, USA"}]}],"member":"286","published-online":{"date-parts":[[2020,12,11]]},"reference":[{"key":"2020121110533135900_R1","doi-asserted-by":"publisher","first-page":"D133","DOI":"10.1093\/nar\/g-kv1156","article-title":"RegulonDB version 9.0: high-level integration of gene regulation, coexpression, motif clustering and beyond","volume":"44","author":"Gama-Castro","year":"2015","journal-title":"Nucleic Acids Res."},{"key":"2020121110533135900_R2","doi-asserted-by":"publisher","first-page":"D212","DOI":"10.1093\/nar\/gky1077","article-title":"RegulonDB v 10.5: tackling challenges to unify classic and high throughput knowledge of gene regulation in E. coli k-12","volume":"47","author":"Santos-Zavaleta","year":"2018","journal-title":"Nucleic Acids Res."},{"key":"2020121110533135900_R3","doi-asserted-by":"publisher","first-page":"D543","DOI":"10.1093\/nar\/gkw1003","article-title":"The EcoCyc database: reflecting new knowledge about Escherichia coli K-12","volume":"45","author":"Keseler","year":"2017","journal-title":"Nucleic Acids Res."},{"key":"2020121110533135900_R4","doi-asserted-by":"publisher","DOI":"10.1186\/gb-2012-13-3-r24","article-title":"The transcription factor encyclopedia","volume":"13","author":"Yusuf","year":"2012","journal-title":"Genome Biol."},{"key":"2020121110533135900_R5","doi-asserted-by":"publisher","first-page":"213","DOI":"10.1093\/bfgp\/elu015","article-title":"Event-based text mining for biology and functional genomics","volume":"14","author":"Ananiadou","year":"2014","journal-title":"Briefings Funct. Genomics"},{"key":"2020121110533135900_R6","doi-asserted-by":"publisher","first-page":"157","DOI":"10.1016\/j.artmed.2004.07.017","article-title":"Summarization from medical documents: a survey","volume":"33","author":"Afantenos","year":"2005","journal-title":"Artif. Intell. Med."},{"key":"2020121110533135900_R7","doi-asserted-by":"publisher","first-page":"457","DOI":"10.1016\/j.jbi.2014.06.009","article-title":"Text summarization in the biomedical domain: a systematic review of recent research","volume":"52","author":"Mishra","year":"2014","journal-title":"J. Biomed. Inf."},{"key":"2020121110533135900_R8","first-page":"1","volume-title":"Automatic Text Summarization","author":"Sparck Jones","year":"1999"},{"key":"2020121110533135900_R9","doi-asserted-by":"publisher","first-page":"277","DOI":"10.1016\/j.jbi.2011.01.004","article-title":"AskHERMES: an online question answering system for complex clinical questions","volume":"44","author":"Cao","year":"2011","journal-title":"J. Biomed. Inf."},{"key":"2020121110533135900_R10","doi-asserted-by":"publisher","DOI":"10.1186\/1471-2105-12-S2-S5","article-title":"Automatic classification of sentences to support evidence based medicine","volume":"12","author":"Kim","year":"2011","journal-title":"BMC Bioinf."},{"key":"2020121110533135900_R11","doi-asserted-by":"publisher","first-page":"372","DOI":"10.1109\/BIBM.2011.72","article-title":"Automatic summarization of results from clinical trials","author":"Summerscales","year":"2011"},{"key":"2020121110533135900_R12","first-page":"31","article-title":"Using machine learning for medical document summarization","volume":"4","author":"Sarkar","year":"2011","journal-title":"Int J Database Theory Appl."},{"key":"2020121110533135900_R13","doi-asserted-by":"publisher","DOI":"10.1093\/database\/bax070","article-title":"First steps in automatic summarization of transcription factor properties for RegulonDB: classification of sentences about structural domains and regulated processes","volume":"2017","author":"M\u00e9ndez-Cruz","year":"2017","journal-title":"Database"},{"key":"2020121110533135900_R14","doi-asserted-by":"publisher","DOI":"10.1186\/2041-1480-3-3","article-title":"Biolemmatizer: a lemmatization tool for morphological processing of biomedical text","volume":"3","author":"Liu","year":"2012","journal-title":"J Biomed Semantics"},{"key":"2020121110533135900_R15","doi-asserted-by":"publisher","first-page":"55","DOI":"10.3115\/v1\/P14-5010","article-title":"The Stanford CoreNLP Natural Language Processing Toolkit","author":"Manning","year":"2014"},{"key":"2020121110533135900_R16","first-page":"179","article-title":"Addressing the curse of imbalanced training sets: one-sided selection","author":"Kubat","year":"1997"},{"key":"2020121110533135900_R17","doi-asserted-by":"crossref","DOI":"10.1017\/CBO9780511809071","volume-title":"Introduction to Information Retrieval","author":"Manning","year":"2008"},{"key":"2020121110533135900_R18","first-page":"559","article-title":"Imbalanced-learn: a python toolbox to tackle the curse of imbalanced datasets in machine learning","volume":"18","author":"Lema\u00eetre","year":"2017","journal-title":"J. Mach. Learn. Res."},{"key":"2020121110533135900_R19","first-page":"769","article-title":"Two modifications of CNN","volume":"6","author":"Tomek","year":"1976","journal-title":"IEEE Trans Syst Man Cybern."},{"key":"2020121110533135900_R20","doi-asserted-by":"publisher","first-page":"225","DOI":"10.1007\/s10994-013-5422-z","article-title":"An instance level analysis of data complexity","volume":"95","author":"Smith","year":"2013","journal-title":"Mach Learn"},{"key":"2020121110533135900_R21","article-title":"Evaluation measures for models assessment over imbalanced datasets","volume":"3","author":"Bekkar","year":"2013","journal-title":"J. Inf. Eng. Appl."},{"key":"2020121110533135900_R22","doi-asserted-by":"publisher","first-page":"146","DOI":"10.1007\/3-540-62858-4_79","volume-title":"Machine Learning: ECML-97. ECML 1997. Lecture Notes in Computer Science (Lecture Notes in Artificial Intelligence)","author":"Kubat","year":"1997"},{"key":"2020121110533135900_R23","doi-asserted-by":"publisher","DOI":"10.1186\/s12859-015-0784-9","article-title":"Joint use of over-and under-sampling techniques and cross-validation for the development and assessment of prediction models","volume":"16, 363","author":"Blagus","year":"2015","journal-title":"BMC Bioinf."},{"key":"2020121110533135900_R24","doi-asserted-by":"publisher","first-page":"211","DOI":"10.1007\/BF02288367","article-title":"The approximation of one matrix by another of lower rank","volume":"1","author":"Eckart","year":"1936","journal-title":"Psychometrika"},{"key":"2020121110533135900_R25","doi-asserted-by":"publisher","first-page":"273","DOI":"10.1007\/BF00994018","article-title":"Support-vector networks","volume":"20","author":"Cortes","year":"1995","journal-title":"Mach Learn."},{"key":"2020121110533135900_R26","doi-asserted-by":"publisher","first-page":"1565","DOI":"10.1038\/nbt1206-1565","article-title":"What is a support vector machine?","volume":"24","author":"Noble","year":"2006","journal-title":"Nat. Biotechnol."},{"key":"2020121110533135900_R27","doi-asserted-by":"publisher","first-page":"354","DOI":"10.1093\/comjnl\/26.4.354","article-title":"A survey of recent advances in hierarchical clustering algorithms","volume":"26","author":"Murtagh","year":"1983","journal-title":"Comput. J."},{"key":"2020121110533135900_R28","first-page":"74","article-title":"ROUGE: a package for automatic evaluation of summaries","author":"Lin","year":"2004"}],"container-title":["Database"],"original-title":[],"language":"en","link":[{"URL":"http:\/\/academic.oup.com\/database\/article-pdf\/doi\/10.1093\/database\/baaa109\/34862531\/baaa109.pdf","content-type":"application\/pdf","content-version":"vor","intended-application":"syndication"},{"URL":"http:\/\/academic.oup.com\/database\/article-pdf\/doi\/10.1093\/database\/baaa109\/34862531\/baaa109.pdf","content-type":"unspecified","content-version":"vor","intended-application":"similarity-checking"}],"deposited":{"date-parts":[[2020,12,12]],"date-time":"2020-12-12T07:14:38Z","timestamp":1607757278000},"score":1,"resource":{"primary":{"URL":"https:\/\/academic.oup.com\/database\/article\/doi\/10.1093\/database\/baaa109\/6029376"}},"subtitle":[],"short-title":[],"issued":{"date-parts":[[2020,1,1]]},"references-count":28,"URL":"https:\/\/doi.org\/10.1093\/database\/baaa109","relation":{},"ISSN":["1758-0463"],"issn-type":[{"type":"electronic","value":"1758-0463"}],"subject":[],"published-other":{"date-parts":[[2020,1,1]]},"published":{"date-parts":[[2020,1,1]]},"article-number":"baaa109"}}