{"status":"ok","message-type":"work","message-version":"1.0.0","message":{"indexed":{"date-parts":[[2026,3,3]],"date-time":"2026-03-03T11:50:54Z","timestamp":1772538654041,"version":"3.50.1"},"reference-count":35,"publisher":"Oxford University Press (OUP)","issue":"11","license":[{"start":{"date-parts":[[2024,10,16]],"date-time":"2024-10-16T00:00:00Z","timestamp":1729036800000},"content-version":"vor","delay-in-days":0,"URL":"https:\/\/creativecommons.org\/licenses\/by\/4.0\/"}],"funder":[{"DOI":"10.13039\/501100009708","name":"Novo Nordisk Foundation","doi-asserted-by":"publisher","award":["NNF14CC0001"],"award-info":[{"award-number":["NNF14CC0001"]}],"id":[{"id":"10.13039\/501100009708","id-type":"DOI","asserted-by":"publisher"}]},{"DOI":"10.13039\/501100009708","name":"Novo Nordisk Foundation","doi-asserted-by":"publisher","award":["NFF17OC0027594"],"award-info":[{"award-number":["NFF17OC0027594"]}],"id":[{"id":"10.13039\/501100009708","id-type":"DOI","asserted-by":"publisher"}]},{"name":"Marie Sklodowska-Curie","award":["101023676"],"award-info":[{"award-number":["101023676"]}]},{"DOI":"10.13039\/501100009708","name":"Novo Nordisk Foundation","doi-asserted-by":"publisher","award":["NNF20SA0035590"],"award-info":[{"award-number":["NNF20SA0035590"]}],"id":[{"id":"10.13039\/501100009708","id-type":"DOI","asserted-by":"publisher"}]}],"content-domain":{"domain":[],"crossmark-restriction":false},"short-container-title":[],"published-print":{"date-parts":[[2024,11,1]]},"abstract":"<jats:title>Abstract<\/jats:title>\n                  <jats:sec>\n                    <jats:title>Motivation<\/jats:title>\n                    <jats:p>Despite lifestyle factors (LSFs) being increasingly acknowledged in shaping individual health trajectories, particularly in chronic diseases, they have still not been systematically described in the biomedical literature. This is in part because no named entity recognition (NER) system exists, which can comprehensively detect all types of LSFs in text. The task is challenging due to their inherent diversity, lack of a comprehensive LSF classification for dictionary-based NER, and lack of a corpus for deep learning-based NER.<\/jats:p>\n                  <\/jats:sec>\n                  <jats:sec>\n                    <jats:title>Results<\/jats:title>\n                    <jats:p>We present a novel lifestyle factor ontology (LSFO), which we used to develop a dictionary-based system for recognition and normalization of LSFs. Additionally, we introduce a manually annotated corpus for LSFs (LSF200) suitable for training and evaluation of NER systems, and use it to train a transformer-based system. Evaluating the performance of both NER systems on the corpus revealed an F-score of 64% for the dictionary-based system and 76% for the transformer-based system. Large-scale application of these systems on PubMed abstracts and PMC Open Access articles identified over 300 million mentions of LSF in the biomedical literature.<\/jats:p>\n                  <\/jats:sec>\n                  <jats:sec>\n                    <jats:title>Availability and implementation<\/jats:title>\n                    <jats:p>LSFO, the annotated LSF200 corpus, and the detected LSFs in PubMed and PMC-OA articles using both NER systems, are available under open licenses via the following GitHub repository: https:\/\/github.com\/EsmaeilNourani\/LSFO-expansion. This repository contains links to two associated GitHub repositories and a Zenodo project related to the study. LSFO is also available at BioPortal: https:\/\/bioportal.bioontology.org\/ontologies\/LSFO.<\/jats:p>\n                  <\/jats:sec>","DOI":"10.1093\/bioinformatics\/btae613","type":"journal-article","created":{"date-parts":[[2024,10,15]],"date-time":"2024-10-15T15:31:25Z","timestamp":1729006285000},"source":"Crossref","is-referenced-by-count":2,"title":["Lifestyle factors in the biomedical literature: an ontology and comprehensive resources for named entity recognition"],"prefix":"10.1093","volume":"40","author":[{"ORCID":"https:\/\/orcid.org\/0000-0003-1933-2550","authenticated-orcid":false,"given":"Esmaeil","family":"Nourani","sequence":"first","affiliation":[{"name":"Novo Nordisk Foundation Center for Protein Research, University of Copenhagen, Copenhagen 2200, Denmark"},{"name":"Faculty of Information Technology and Computer Engineering, Azarbaijan Shahid Madani University , Tabriz,","place":["Iran"]}]},{"ORCID":"https:\/\/orcid.org\/0000-0002-8953-3561","authenticated-orcid":false,"given":"Mikaela","family":"Koutrouli","sequence":"additional","affiliation":[{"name":"Novo Nordisk Foundation Center for Protein Research, University of Copenhagen, Copenhagen 2200, Denmark"}]},{"given":"Yijia","family":"Xie","sequence":"additional","affiliation":[{"name":"Novo Nordisk Foundation Center for Protein Research, University of Copenhagen, Copenhagen 2200, Denmark"}]},{"given":"Danai","family":"Vagiaki","sequence":"additional","affiliation":[{"name":"Novo Nordisk Foundation Center for Protein Research, University of Copenhagen, Copenhagen 2200, Denmark"}]},{"given":"Sampo","family":"Pyysalo","sequence":"additional","affiliation":[{"name":"TurkuNLP Group, Department of Computing, Faculty of Technology, University of Turku , Turku 20014,","place":["Finland"]}]},{"ORCID":"https:\/\/orcid.org\/0000-0003-3611-5726","authenticated-orcid":false,"given":"Katerina","family":"Nastou","sequence":"additional","affiliation":[{"name":"Novo Nordisk Foundation Center for Protein Research, University of Copenhagen, Copenhagen 2200, Denmark"}]},{"ORCID":"https:\/\/orcid.org\/0000-0003-0316-5866","authenticated-orcid":false,"given":"S\u00f8ren","family":"Brunak","sequence":"additional","affiliation":[{"name":"Novo Nordisk Foundation Center for Protein Research, University of Copenhagen, Copenhagen 2200, Denmark"}]},{"ORCID":"https:\/\/orcid.org\/0000-0001-7885-715X","authenticated-orcid":false,"given":"Lars Juhl","family":"Jensen","sequence":"additional","affiliation":[{"name":"Novo Nordisk Foundation Center for Protein Research, University of Copenhagen, Copenhagen 2200, Denmark"}]}],"member":"286","published-online":{"date-parts":[[2024,10,16]]},"reference":[{"key":"2024110805205325900_btae613-B1","doi-asserted-by":"crossref","first-page":"D1305","DOI":"10.1093\/nar\/gkad1051","article-title":"The DO-KB knowledgebase: a 20-year journey developing the disease open science ecosystem","volume":"52","author":"Baron","year":"2024","journal-title":"Nucleic Acids Res"},{"key":"2024110805205325900_btae613-B2","author":"Brown"},{"key":"2024110805205325900_btae613-B3","doi-asserted-by":"crossref","first-page":"43","DOI":"10.1186\/2041-1480-4-43","article-title":"The environment ontology: Contextualising biological and biomedical entities","volume":"4","author":"Buttigieg","year":"2013","journal-title":"J Biomed Semant"},{"key":"2024110805205325900_btae613-B4","doi-asserted-by":"crossref","first-page":"73","DOI":"10.1007\/978-1-4939-9089-4_5","volume-title":"Bioinformatics and Drug Discovery, Methods in Molecular Biology","author":"Cook","year":"2019"},{"key":"2024110805205325900_btae613-B5","doi-asserted-by":"crossref","first-page":"23","DOI":"10.1038\/s41538-018-0032-6","article-title":"FoodOn: a harmonized food ontology to increase global food traceability, quality control and data integration","volume":"2","author":"Dooley","year":"2018","journal-title":"NPJ Sci Food"},{"key":"2024110805205325900_btae613-B6","doi-asserted-by":"crossref","first-page":"9537","DOI":"10.1038\/s41598-023-31531-w","article-title":"Lifestyle factors and clinical severity of Parkinson\u2019s disease","volume":"13","author":"Gabbert","year":"2023","journal-title":"Sci Rep"},{"key":"2024110805205325900_btae613-B7","doi-asserted-by":"crossref","first-page":"169","DOI":"10.1177\/1559827619834527","article-title":"Precision medicine in lifestyle medicine: the way of the future?","volume":"14","author":"Gray","year":"2020","journal-title":"Am J Lifestyle Med"},{"key":"2024110805205325900_btae613-B8","author":"Grootendorst"},{"key":"2024110805205325900_btae613-B9","doi-asserted-by":"crossref","first-page":"S1","DOI":"10.1186\/2041-1480-2-S5-S1","article-title":"Ontology design patterns to disambiguate relations between genes and gene products in GENIA","volume":"2","author":"Hoehndorf","year":"2011","journal-title":"J Biomed Sem"},{"key":"2024110805205325900_btae613-B10","doi-asserted-by":"crossref","first-page":"2219","DOI":"10.1093\/bib\/bbaa054","article-title":"Biomedical named entity recognition and linking datasets: Survey and our recent development","volume":"21","author":"Huang","year":"2020","journal-title":"Brief Bioinform"},{"key":"2024110805205325900_btae613-B11","author":"Jensen"},{"key":"2024110805205325900_btae613-B12","doi-asserted-by":"crossref","first-page":"2152","DOI":"10.1053\/j.gastro.2018.02.021","article-title":"Determining risk of colorectal cancer and starting age of screening based on lifestyle, environmental, and genetic factors","volume":"154","author":"Jeon","year":"2018","journal-title":"Gastroenterology"},{"key":"2024110805205325900_btae613-B13","first-page":"1","author":"Kim","year":"2009"},{"key":"2024110805205325900_btae613-B14","doi-asserted-by":"crossref","first-page":"167","DOI":"10.3233\/SW-140134","article-title":"DBpedia\u2014a large-scale, multilingual knowledge base extracted from Wikipedia","volume":"6","author":"Lehmann","year":"2015","journal-title":"Semantic Web"},{"key":"2024110805205325900_btae613-B15","author":"Lewis P, Ott M, Du J"},{"key":"2024110805205325900_btae613-B16","doi-asserted-by":"crossref","first-page":"btad369","DOI":"10.1093\/bioinformatics\/btad369","article-title":"S1000: a better taxonomic name corpus for biomedical information extraction","volume":"39","author":"Luoma","year":"2023","journal-title":"Bioinformatics"},{"key":"2024110805205325900_btae613-B17"},{"key":"2024110805205325900_btae613-B18","volume":"2023","year":"2023","journal-title":"Database"},{"key":"2024110805205325900_btae613-B19"},{"key":"2024110805205325900_btae613-B20"},{"key":"2024110805205325900_btae613-B21","first-page":"D908","article-title":"Exposome-Explorer 2.0: an update incorporating candidate dietary biomarkers and dietary associations with cancer risk","volume":"48","author":"Neveu","year":"2020","journal-title":"Nucleic Acids Res"},{"key":"2024110805205325900_btae613-B22","doi-asserted-by":"crossref","first-page":"760","DOI":"10.1001\/jamainternmed.2020.0618","article-title":"Association of healthy lifestyle with years lived without major chronic diseases","volume":"180","author":"Nyberg","year":"2020","journal-title":"JAMA Intern Med"},{"key":"2024110805205325900_btae613-B23","doi-asserted-by":"crossref","first-page":"e65390","DOI":"10.1371\/journal.pone.0065390","article-title":"The SPECIES and ORGANISMS resources for fast and accurate identification of taxonomic names in text","volume":"8","author":"Pafilis","year":"2013","journal-title":"PLoS One"},{"key":"2024110805205325900_btae613-B24","doi-asserted-by":"crossref","first-page":"673","DOI":"10.3389\/fcell.2020.00673","article-title":"Named entity recognition and relation detection for biomedical information extraction","volume":"8","author":"Perera","year":"2020","journal-title":"Front Cell Dev Biol"},{"key":"2024110805205325900_btae613-B25","doi-asserted-by":"crossref","first-page":"S2","DOI":"10.1186\/1471-2105-13-S11-S2","article-title":"Overview of the ID, EPI and REL tasks of BioNLP shared task 2011","volume":"13","author":"Pyysalo","year":"2012","journal-title":"BMC Bioinform"},{"key":"2024110805205325900_btae613-B26","doi-asserted-by":"crossref","first-page":"bbab282","DOI":"10.1093\/bib\/bbab282","article-title":"Deep learning methods for biomedical named entity recognition: a survey and qualitative comparison","volume":"22","author":"Song","year":"2021","journal-title":"Brief Bioinform"},{"key":"2024110805205325900_btae613-B27"},{"key":"2024110805205325900_btae613-B28"},{"key":"2024110805205325900_btae613-B29","doi-asserted-by":"crossref","first-page":"472","DOI":"10.1186\/s12967-020-02658-5","article-title":"Precision medicine in the era of artificial intelligence: Implications in chronic disease management","volume":"18","author":"Subramanian","year":"2020","journal-title":"JTransl Med"},{"key":"2024110805205325900_btae613-B30","doi-asserted-by":"crossref","first-page":"2438","DOI":"10.1038\/s41591-023-02502-5","article-title":"Second international consensus report on gaps and opportunities for the clinical translation of precision diabetes medicine","volume":"29","author":"Tobias","year":"2023","journal-title":"Nat Med"},{"key":"2024110805205325900_btae613-B31"},{"key":"2024110805205325900_btae613-B32","doi-asserted-by":"crossref","first-page":"W541","DOI":"10.1093\/nar\/gkr469","article-title":"BioPortal: enhanced functionality via new web services from the National Center for Biomedical Ontology to access and use ontologies in software applications","volume":"39","author":"Whetzel","year":"2011","journal-title":"Nucleic Acids Res"},{"key":"2024110805205325900_btae613-B33","author":"World Health Organization. Non communicable diseases. WHO 2023."},{"key":"2024110805205325900_btae613-B34","doi-asserted-by":"crossref","first-page":"31","DOI":"10.1186\/2041-1480-5-31","article-title":"CSEO\u2014the cigarette smoke exposure ontology","volume":"5","author":"Younesi","year":"2014","journal-title":"J Biomed Semant"},{"key":"2024110805205325900_btae613-B35","doi-asserted-by":"crossref","first-page":"286","DOI":"10.1038\/s41576-023-00674-x","article-title":"The transition from genomics to phenomics in personalized population health","volume":"25","author":"Yurkovich","year":"2024","journal-title":"Nat Rev Genet"}],"container-title":["Bioinformatics"],"original-title":[],"language":"en","link":[{"URL":"https:\/\/academic.oup.com\/bioinformatics\/advance-article-pdf\/doi\/10.1093\/bioinformatics\/btae613\/59809580\/btae613.pdf","content-type":"application\/pdf","content-version":"am","intended-application":"syndication"},{"URL":"https:\/\/academic.oup.com\/bioinformatics\/article-pdf\/40\/11\/btae613\/60530399\/btae613.pdf","content-type":"application\/pdf","content-version":"vor","intended-application":"syndication"},{"URL":"https:\/\/academic.oup.com\/bioinformatics\/article-pdf\/40\/11\/btae613\/60530399\/btae613.pdf","content-type":"unspecified","content-version":"vor","intended-application":"similarity-checking"}],"deposited":{"date-parts":[[2024,11,8]],"date-time":"2024-11-08T00:21:16Z","timestamp":1731025276000},"score":1,"resource":{"primary":{"URL":"https:\/\/academic.oup.com\/bioinformatics\/article\/doi\/10.1093\/bioinformatics\/btae613\/7824054"}},"subtitle":[],"editor":[{"given":"Zhiyong","family":"Lu","sequence":"additional","affiliation":[]}],"short-title":[],"issued":{"date-parts":[[2024,10,16]]},"references-count":35,"journal-issue":{"issue":"11","published-print":{"date-parts":[[2024,11,1]]}},"URL":"https:\/\/doi.org\/10.1093\/bioinformatics\/btae613","relation":{"has-preprint":[{"id-type":"doi","id":"10.1101\/2024.06.13.598816","asserted-by":"object"}]},"ISSN":["1367-4811"],"issn-type":[{"value":"1367-4811","type":"electronic"}],"subject":[],"published-other":{"date-parts":[[2024,11]]},"published":{"date-parts":[[2024,10,16]]},"article-number":"btae613"}}