{"status":"ok","message-type":"work","message-version":"1.0.0","message":{"indexed":{"date-parts":[[2026,3,12]],"date-time":"2026-03-12T18:36:18Z","timestamp":1773340578208,"version":"3.50.1"},"reference-count":31,"publisher":"Oxford University Press (OUP)","license":[{"start":{"date-parts":[[2024,9,12]],"date-time":"2024-09-12T00:00:00Z","timestamp":1726099200000},"content-version":"vor","delay-in-days":255,"URL":"https:\/\/creativecommons.org\/licenses\/by\/4.0\/"}],"funder":[{"name":"H2020 Marie Sklodowska-Curie Actions","award":["101023676"],"award-info":[{"award-number":["101023676"]}]},{"DOI":"10.13039\/501100002341","name":"Research Council of Finland","doi-asserted-by":"publisher","award":["332844"],"award-info":[{"award-number":["332844"]}],"id":[{"id":"10.13039\/501100002341","id-type":"DOI","asserted-by":"publisher"}]},{"DOI":"10.13039\/501100009708","name":"Novo Nordisk Fonden","doi-asserted-by":"publisher","award":["NNF14CC0001"],"award-info":[{"award-number":["NNF14CC0001"]}],"id":[{"id":"10.13039\/501100009708","id-type":"DOI","asserted-by":"publisher"}]}],"content-domain":{"domain":[],"crossmark-restriction":false},"short-container-title":[],"published-print":{"date-parts":[[2024,9,12]]},"abstract":"<jats:title>Abstract<\/jats:title>\n                  <jats:p>In the field of biomedical text mining, the ability to extract relations from the literature is 
crucial for advancing both theoretical research and practical applications. There is a notable shortage of corpora designed to enhance the extraction of multiple types of relations, particularly focusing on proteins and protein-containing entities such as complexes and families, as well as chemicals. In this work, we present RegulaTome, a corpus that overcomes the limitations of several existing biomedical relation extraction (RE) corpora, many of which concentrate on single-type relations at the sentence level. RegulaTome stands out by offering 16\u2009961 relations annotated in &gt;2500 documents, making it the most extensive dataset of its kind to date. This corpus is specifically designed to cover a broader spectrum of &gt;40 relation types beyond those traditionally explored, setting a new benchmark in the complexity and depth of biomedical RE tasks. Our corpus both broadens the scope of detected relations and allows for achieving noteworthy accuracy in RE. A transformer-based model trained on this corpus has demonstrated a promising F1-score (66.6%) for a task of this complexity, underscoring the effectiveness of our approach in accurately identifying and categorizing a wide array of biological relations. This achievement highlights RegulaTome\u2019s potential to significantly contribute to the development of more sophisticated, efficient, and accurate RE systems to tackle biomedical tasks. 
Finally, a run of the trained RE system on all PubMed abstracts and PMC Open Access full-text documents resulted in &gt;18 million relations, extracted from the entire biomedical literature.<\/jats:p>","DOI":"10.1093\/database\/baae095","type":"journal-article","created":{"date-parts":[[2024,8,16]],"date-time":"2024-08-16T10:14:59Z","timestamp":1723803299000},"source":"Crossref","is-referenced-by-count":5,"title":["RegulaTome: a corpus of typed, directed, and signed relations between biomedical entities in the scientific literature"],"prefix":"10.1093","volume":"2024","author":[{"ORCID":"https:\/\/orcid.org\/0000-0003-3611-5726","authenticated-orcid":false,"given":"Katerina","family":"Nastou","sequence":"first","affiliation":[{"name":"Novo Nordisk Foundation Center for Protein Research, University of Copenhagen , Blegdamsvej 3, Copenhagen 2200, Denmark"}],"role":[{"role":"author","vocabulary":"crossref"}]},{"ORCID":"https:\/\/orcid.org\/0000-0002-5555-2828","authenticated-orcid":false,"given":"Farrokh","family":"Mehryary","sequence":"additional","affiliation":[{"name":"TurkuNLP Group, Department of Computing, University of Turku , Vesilinnantie 5, Turku 20014, Finland"}],"role":[{"role":"author","vocabulary":"crossref"}]},{"given":"Tomoko","family":"Ohta","sequence":"additional","affiliation":[{"name":"Textimi , 1-37-13 Kitazawa, Tokyo, Setagaya-ku 155-0031, Japan"}],"role":[{"role":"author","vocabulary":"crossref"}]},{"given":"Jouni","family":"Luoma","sequence":"additional","affiliation":[{"name":"TurkuNLP Group, Department of Computing, University of Turku , Vesilinnantie 5, Turku 20014, Finland"}],"role":[{"role":"author","vocabulary":"crossref"}]},{"given":"Sampo","family":"Pyysalo","sequence":"additional","affiliation":[{"name":"TurkuNLP Group, Department of Computing, University of Turku , Vesilinnantie 5, Turku 20014, Finland"}],"role":[{"role":"author","vocabulary":"crossref"}]},{"given":"Lars 
Juhl","family":"Jensen","sequence":"additional","affiliation":[{"name":"Novo Nordisk Foundation Center for Protein Research, University of Copenhagen , Blegdamsvej 3, Copenhagen 2200, Denmark"}],"role":[{"role":"author","vocabulary":"crossref"}]}],"member":"286","published-online":{"date-parts":[[2024,9,12]]},"reference":[{"key":"2024091606260541300_R1","doi-asserted-by":"crossref","DOI":"10.1016\/j.websem.2022.100756","article-title":"Comparison of biomedical relationship extraction methods and models for knowledge graph creation","volume":"75","author":"Milosevic","year":"2023","journal-title":"J Web Semant"},{"key":"2024091606260541300_R2","doi-asserted-by":"crossref","first-page":"D638","DOI":"10.1093\/nar\/gkac1000","article-title":"The STRING database in 2023: protein\u2013protein association networks and functional enrichment analyses for any sequenced genome of interest","volume":"51","author":"Szklarczyk","year":"2023","journal-title":"Nucleic Acids Res"},{"key":"2024091606260541300_R3","doi-asserted-by":"crossref","DOI":"10.1093\/database\/baw043","article-title":"BRONCO: biomedical entity relation oncology corpus for extracting gene-variant-disease-drug relations","volume":"2016","author":"Lee","year":"2016","journal-title":"Database"},{"key":"2024091606260541300_R4","doi-asserted-by":"crossref","first-page":"1234","DOI":"10.1093\/bioinformatics\/btz682","article-title":"BioBERT: a pre-trained biomedical language representation model for biomedical text mining","volume":"36","author":"Lee","year":"2019","journal-title":"Bioinformatics"},{"key":"2024091606260541300_R5","first-page":"146","article-title":"Pretrained language models for biomedical and clinical tasks: understanding and extending the state-of-the-art","author":"Lewis","year":"2020"},{"key":"2024091606260541300_R6","doi-asserted-by":"crossref","first-page":"139","DOI":"10.1016\/j.artmed.2004.07.016","article-title":"Comparative experiments on learning information extractors for proteins and 
their interactions","volume":"33","author":"Bunescu","year":"2005","journal-title":"Artif Intell Med"},{"key":"2024091606260541300_R7","doi-asserted-by":"crossref","first-page":"914","DOI":"10.1016\/j.jbi.2013.07.011","article-title":"The DDI corpus: an annotated corpus with pharmacological substances and drug\u2013drug interactions","volume":"46","author":"Herrero-Zazo","year":"2013","journal-title":"J Biomed Informat"},{"key":"2024091606260541300_R8","doi-asserted-by":"crossref","DOI":"10.1093\/database\/baad080","article-title":"Overview of DrugProt task at BioCreative VII: data and methods for large-scale text mining and knowledge graph generation of heterogenous chemical\u2013protein relations","volume":"2023","author":"Miranda-Escalada","year":"2023","journal-title":"Database"},{"key":"2024091606260541300_R9","doi-asserted-by":"crossref","first-page":"1","DOI":"10.1186\/1471-2105-8-50","article-title":"BioInfer: a corpus for information extraction in the biomedical domain","volume":"8","author":"Pyysalo","year":"2007","journal-title":"BMC Bioinf"},{"key":"2024091606260541300_R10","doi-asserted-by":"crossref","first-page":"1","DOI":"10.1186\/gb-2008-9-s2-s1","article-title":"Overview of the protein-protein interaction annotation extraction task of BioCreative II","volume":"9","author":"Krallinger","year":"2008","journal-title":"Genome Biol"},{"key":"2024091606260541300_R11","first-page":"1","article-title":"BioCreative V CDR task corpus: a resource for chemical disease relation extraction","volume":"2016","author":"Li","year":"2016","journal-title":"Database"},{"key":"2024091606260541300_R12","doi-asserted-by":"crossref","first-page":"408","DOI":"10.1093\/bioinformatics\/btq667","article-title":"Toward an automatic method for extracting cancer- and other disease-related point mutations from the biomedical 
literature","volume":"27","author":"Doughty","year":"2011","journal-title":"Bioinformatics"},{"key":"2024091606260541300_R13","doi-asserted-by":"crossref","DOI":"10.1093\/bib\/bbac282","article-title":"BioRED: a rich biomedical relation extraction dataset","volume":"23","author":"Luo","year":"2022","journal-title":"Brief Bioinf"},{"key":"2024091606260541300_R14","doi-asserted-by":"crossref","DOI":"10.1093\/nargab\/lqab062","article-title":"RENET2: high-performance full-text gene\u2013disease relation extraction with iterative training data expansion","volume":"3","author":"Su","year":"2021","journal-title":"NAR Genomics Bioinform"},{"key":"2024091606260541300_R15","first-page":"1","article-title":"Overview of BioNLP\u201909 shared task on event extraction","author":"Kim","year":"2009"},{"key":"2024091606260541300_R16","first-page":"19","article-title":"Event extraction for post-translational modifications","author":"Ohta","year":"2010"},{"key":"2024091606260541300_R17","doi-asserted-by":"crossref","first-page":"1","DOI":"10.1186\/1471-2105-13-S11-S2","article-title":"Overview of the ID, EPI and REL tasks of BioNLP shared task 2011","volume":"13","author":"Pyysalo","year":"2012","journal-title":"BMC Bioinf"},{"key":"2024091606260541300_R18","article-title":"The Gene Ontology knowledgebase in 2023","volume":"224","author":"Aleksander","year":"2023","journal-title":"Genetics"},{"key":"2024091606260541300_R19","doi-asserted-by":"crossref","first-page":"25","DOI":"10.1038\/75556","article-title":"Gene Ontology: tool for the unification of biology","volume":"25","author":"Ashburner","year":"2000","journal-title":"Nat Genet"},{"key":"2024091606260541300_R20","doi-asserted-by":"publisher","DOI":"10.1101\/2023.12.10.570999","article-title":"STRING-ing together protein complexes: extracting physical protein interactions from the 
literature","author":"Mehryary","year":"2023","journal-title":"bioRxiv"},{"key":"2024091606260541300_R21","doi-asserted-by":"crossref","first-page":"187","DOI":"10.1002\/pro.3978","article-title":"The BioGRID database: a comprehensive biomedical resource of curated protein, genetic, and chemical interactions","volume":"30","author":"Oughtred","year":"2021","journal-title":"Protein Sci"},{"key":"2024091606260541300_R22","doi-asserted-by":"crossref","first-page":"D358","DOI":"10.1093\/nar\/gkt1115","article-title":"The MIntAct project\u2014IntAct as a common curation platform for 11 molecular interaction databases","volume":"42","author":"Orchard","year":"2014","journal-title":"Nucleic Acids Res"},{"key":"2024091606260541300_R23","doi-asserted-by":"crossref","first-page":"D857","DOI":"10.1093\/nar\/gkr930","article-title":"MINT, the molecular interaction database: 2012 update","volume":"40","author":"Licata","year":"2012","journal-title":"Nucleic Acids Res"},{"key":"2024091606260541300_R24","doi-asserted-by":"crossref","first-page":"D687","DOI":"10.1093\/nar\/gkab1028","article-title":"The Reactome pathway knowledgebase 2022","volume":"50","author":"Gillespie","year":"2022","journal-title":"Nucleic Acids Res"},{"key":"2024091606260541300_R25","doi-asserted-by":"crossref","first-page":"D418","DOI":"10.1093\/nar\/gkac993","article-title":"InterPro in 2022","volume":"51","author":"Paysan-Lafosse","year":"2023","journal-title":"Nucleic Acids Res"},{"key":"2024091606260541300_R26","first-page":"102","article-title":"brat: a web-based tool for NLP-assisted text annotation","author":"Stenetorp","year":"2012"},{"key":"2024091606260541300_R27","first-page":"73","article-title":"Deep learning with minimal training data: TurkuNLP entry in the BioNLP shared task 2016","author":"Mehryary","year":"2016"},{"key":"2024091606260541300_R28","doi-asserted-by":"crossref","first-page":"1","DOI":"10.1186\/gb-2008-9-s2-s2","article-title":"Overview of BioCreative II gene mention 
recognition","volume":"9","author":"Smith","year":"2008","journal-title":"Genome Biol"},{"key":"2024091606260541300_R29","doi-asserted-by":"crossref","first-page":"3533","DOI":"10.1093\/bioinformatics\/btz070","article-title":"PMC text mining subset in BioC: about three million full-text articles and growing","volume":"35","author":"Comeau","year":"2019","journal-title":"Bioinformatics"},{"key":"2024091606260541300_R30","article-title":"One tagger, many uses: illustrating the power of ontologies in dictionary-based named entity recognition","author":"Jensen","year":"2016","journal-title":"bioRxiv"},{"key":"2024091606260541300_R31","doi-asserted-by":"crossref","first-page":"D933","DOI":"10.1093\/nar\/gkac958","article-title":"Ensembl 2023","volume":"51","author":"Martin","year":"2022","journal-title":"Nucleic Acids Res"}],"container-title":["Database"],"original-title":[],"language":"en","link":[{"URL":"https:\/\/academic.oup.com\/database\/article-pdf\/doi\/10.1093\/database\/baae095\/59133755\/baae095.pdf","content-type":"application\/pdf","content-version":"vor","intended-application":"syndication"},{"URL":"https:\/\/academic.oup.com\/database\/article-pdf\/doi\/10.1093\/database\/baae095\/59133755\/baae095.pdf","content-type":"unspecified","content-version":"vor","intended-application":"similarity-checking"}],"deposited":{"date-parts":[[2024,9,16]],"date-time":"2024-09-16T02:26:43Z","timestamp":1726453603000},"score":1,"resource":{"primary":{"URL":"https:\/\/academic.oup.com\/database\/article\/doi\/10.1093\/database\/baae095\/7756349"}},"subtitle":[],"short-title":[],"issued":{"date-parts":[[2024]]},"references-count":31,"URL":"https:\/\/doi.org\/10.1093\/database\/baae095","relation":{"has-preprint":[{"id-type":"doi","id":"10.1101\/2024.04.30.591824","asserted-by":"object"}]},"ISSN":["1758-0463"],"issn-type":[{"value":"1758-0463","type":"electronic"}],"subject":[],"published-other":{"date-parts":[[2024]]},"published":{"date-parts":[[2024]]},"article-number":"baae095"}}