{"status":"ok","message-type":"work","message-version":"1.0.0","message":{"indexed":{"date-parts":[[2025,10,12]],"date-time":"2025-10-12T03:40:16Z","timestamp":1760240416066,"version":"build-2065373602"},"reference-count":50,"publisher":"MDPI AG","issue":"6","license":[{"start":{"date-parts":[[2019,6,8]],"date-time":"2019-06-08T00:00:00Z","timestamp":1559952000000},"content-version":"vor","delay-in-days":0,"URL":"https:\/\/creativecommons.org\/licenses\/by\/4.0\/"}],"content-domain":{"domain":[],"crossmark-restriction":false},"short-container-title":["Information"],"abstract":"<jats:p>Text information extraction is an important natural language processing (NLP) task, which aims to automatically identify, extract, and represent information from text. In this context, event extraction plays a relevant role, allowing actions, agents, objects, places, and time periods to be identified and represented. The extracted information can be represented by specialized ontologies, supporting knowledge-based reasoning and inference processes. In this work, we will describe, in detail, our proposal for event extraction from Portuguese documents. The proposed approach is based on a pipeline of specialized natural language processing tools; namely, a part-of-speech tagger, a named entities recognizer, a dependency parser, semantic role labeling, and a knowledge extraction module. The architecture is language-independent, but its modules are language-dependent and can be built using adequate AI (i.e., rule-based or machine learning) methodologies. The developed system was evaluated with a corpus of Portuguese texts and the obtained results are presented and analysed. The current limitations and future work are discussed in detail.<\/jats:p>","DOI":"10.3390\/info10060205","type":"journal-article","created":{"date-parts":[[2019,6,10]],"date-time":"2019-06-10T03:16:51Z","timestamp":1560136611000},"page":"205","update-policy":"https:\/\/doi.org\/10.3390\/mdpi_crossmark_policy","source":"Crossref","is-referenced-by-count":5,"title":["Event Extraction and Representation: A Case Study for the Portuguese Language"],"prefix":"10.3390","volume":"10","author":[{"ORCID":"https:\/\/orcid.org\/0000-0002-5086-059X","authenticated-orcid":false,"given":"Paulo","family":"Quaresma","sequence":"first","affiliation":[{"name":"Informatics Department, University of \u00c9vora, 7000-671 \u00c9vora, Portugal"}]},{"ORCID":"https:\/\/orcid.org\/0000-0002-0793-0003","authenticated-orcid":false,"given":"V\u00edtor Beires","family":"Nogueira","sequence":"additional","affiliation":[{"name":"Informatics Department, University of \u00c9vora, 7000-671 \u00c9vora, Portugal"}]},{"ORCID":"https:\/\/orcid.org\/0000-0002-6166-2038","authenticated-orcid":false,"given":"Kashyap","family":"Raiyani","sequence":"additional","affiliation":[{"name":"Informatics Department, University of \u00c9vora, 7000-671 \u00c9vora, Portugal"}]},{"ORCID":"https:\/\/orcid.org\/0000-0002-1290-0239","authenticated-orcid":false,"given":"Roy","family":"Bayot","sequence":"additional","affiliation":[{"name":"Informatics Department, University of \u00c9vora, 7000-671 \u00c9vora, Portugal"}]}],"member":"1968","published-online":{"date-parts":[[2019,6,8]]},"reference":[{"key":"ref_1","doi-asserted-by":"crossref","first-page":"12","DOI":"10.1016\/j.dss.2016.02.006","article-title":"A Survey of Event Extraction Methods from Text for Decision Support Systems","volume":"85","author":"Hogenboom","year":"2016","journal-title":"Decis. Support Syst."},{"key":"ref_2","doi-asserted-by":"crossref","unstructured":"Guarino, N., Oberle, D., and Staab, S. (2009). What Is an Ontology?. Handbook on Ontologies, Springer.","DOI":"10.1007\/978-3-540-92673-3_0"},{"key":"ref_3","doi-asserted-by":"crossref","first-page":"444","DOI":"10.1016\/j.future.2018.11.035","article-title":"Extreme events management using multimedia social networks","volume":"94","author":"Amato","year":"2019","journal-title":"Future Gener. Comput. Syst."},{"key":"ref_4","doi-asserted-by":"crossref","first-page":"17803","DOI":"10.1007\/s11042-017-5556-2","article-title":"Multimedia Summarization Using Social Media Content","volume":"77","author":"Amato","year":"2018","journal-title":"Multimed. Tools Appl."},{"key":"ref_5","unstructured":"(2019, May 06). International Conference on the Computational Processing of Portuguese Language. Available online: http:\/\/www.propor.org\/."},{"key":"ref_6","doi-asserted-by":"crossref","first-page":"553","DOI":"10.1007\/s13173-013-0116-8","article-title":"A review on Relation Extraction with an eye on Portuguese","volume":"19","author":"Bonamigo","year":"2013","journal-title":"J. Braz. Comput. Soc."},{"key":"ref_7","unstructured":"Calzolari, N., Choukri, K., Declerck, T., Dogan, M.U., Maegaard, B., Mariani, J., Odijk, J., and Piperidis, S. (2012, January 23\u201325). Propbank-Br: A Brazilian Treebank annotated with semantic role labels. Proceedings of the Eighth International Conference on Language Resources and Evaluation, LREC 2012, Istanbul, Turkey."},{"key":"ref_8","unstructured":"(2019, May 06). Agatha an Intelligent Open Source Analysis System. Available online: http:\/\/www.agatha-osi.com\/."},{"key":"ref_9","unstructured":"Raiyani, K., Gon\u00e7alves, T., Quaresma, P., and Nogueira, V.B. (2018, January 10\u201314). Multi-Language Neural Network Model with Advance Preprocessor for Gender Classification over Social Media: Notebook for PAN at CLEF 2018. Proceedings of the Working Notes of CLEF 2018-Conference and Labs of the Evaluation Forum, Avignon, France."},{"key":"ref_10","unstructured":"Raiyani, K., Gon\u00e7alves, T., Quaresma, P., and Nogueira, V.B. (2018, January 20\u201321). Fully Connected Neural Network with Advance Preprocessor to Identify Aggression over Facebook and Twitter. Proceedings of the First Workshop on Trolling, Aggression and Cyberbullying (TRAC-2018). Association for Computational Linguistics, Santa Fe, NM, USA."},{"key":"ref_11","doi-asserted-by":"crossref","unstructured":"Raiyani, K., Gon\u00e7alves, T., Quaresma, P., and Nogueira, V.B. (2019, January 6\u20137). Vista.ue at SemEval-2019 Task 5: Single Multilingual Hate Speech Detection Model. Proceedings of the 13th International Workshop on Semantic Evaluation (SemEval-2019), Minneapolis, MN, USA.","DOI":"10.18653\/v1\/S19-2094"},{"key":"ref_12","unstructured":"Raiyani, K., and Quaresma, P. (2019, January 17\u201321). Keyword & Machine Learning Based Japanese Statute Law Retrieval and Entailment Task at COLIEE-2019. Proceedings of the Competition on Legal Information Retrieval and Entailment Workshop (COLIEE 2019) in association with the 17th International Conference on Artificial Intelligence and Law 2019 (ICAIL 2019), Montr\u00e9al, QC, Canada."},{"key":"ref_13","unstructured":"Mitamura, T., Liu, Z., and Hovy, E.H. (2017, January 13\u201314). Events Detection, Coreference and Sequencing: What\u2019s next? Overview of the TAC KBP 2017 Event Track. Proceedings of the 2017 Text Analysis Conference, TAC 2017, Gaithersburg, MD, USA."},{"key":"ref_14","unstructured":"Bazzan, A.L., and Pichara, K. (2014). Extraction of Relation Descriptors for Portuguese Using Conditional Random Fields. Advances in Artificial Intelligence\u2014IBERAMIA 2014, Springer."},{"key":"ref_15","unstructured":"Bonamigo, T.L., and Vieira, R. (2013, January 24\u201330). A Model for Information Extraction in Portuguese Based on Text Patterns. Proceedings of the 14th International Conference on Computational Linguistics and Intelligent Text Processing-Volume 2, Samos, Greece."},{"key":"ref_16","unstructured":"Doddington, G., Mitchell, A., Przybocki, M., Ramshaw, L., Strassel, S., and Weischedel, R. (2004, January 26\u201328). The Automatic Content Extraction (ACE) Program Tasks, Data, and Evaluation. Proceedings of the Fourth International Conference on Language Resources and Evaluation (LREC-2004), Lisbon, Portugal."},{"key":"ref_17","unstructured":"Matsumoto, Y., Sproat, R.W., Wong, K.F., and Zhang, M. (2006). Building Document Graphs for Multiple News Articles Summarization: An Event-Based Approach. Computer Processing of Oriental Languages. Beyond the Orient: The Research Challenges Ahead, Springer."},{"key":"ref_18","unstructured":"Ahn, D. The Stages of Event Extraction. Proceedings of the Workshop on Annotating and Reasoning about Time and Events."},{"key":"ref_19","doi-asserted-by":"crossref","unstructured":"Halpin, H., and Moore, J.D. (2006, January 17\u201318). Event Extraction in a Plot Advice Agent. Proceedings of the 21st International Conference on Computational Linguistics and the 44th Annual Meeting of the Association for Computational Linguistics, Sydney, Australia.","DOI":"10.3115\/1220175.1220283"},{"key":"ref_20","unstructured":"Xu, F., Uszkoreit, H., and Li, H. (2006, January 16\u201317). Automatic Event and Relation Detection with Seeds of Varying Complexity. Proceedings of the AAAI Workshop Event Extraction and Synthesis, Boston, MA, USA."},{"key":"ref_21","doi-asserted-by":"crossref","unstructured":"Sakaki, T., Okazaki, M., and Matsuo, Y. (2010, January 26\u201330). Earthquake Shakes Twitter Users: Real-time Event Detection by Social Sensors. Proceedings of the 19th International Conference on World Wide Web, Raleigh, NC, USA.","DOI":"10.1145\/1772690.1772777"},{"key":"ref_22","unstructured":"Benson, E., Haghighi, A., and Barzilay, R. (2011, January 19\u201324). Event Discovery in Social Media Feeds. Proceedings of the 49th Annual Meeting of the Association for Computational Linguistics: Human Language Technologies-Volume 1, Portland, OR, USA."},{"key":"ref_23","doi-asserted-by":"crossref","unstructured":"Ritter, A., Etzioni, O., and Clark, S. (2012, January 12\u201316). Open Domain Event Extraction from Twitter. Proceedings of the 18th ACM SIGKDD International Conference on Knowledge Discovery and Data Mining, Beijing, China.","DOI":"10.1145\/2339530.2339704"},{"key":"ref_24","unstructured":"Zhao, X., Jiang, J., He, J., Song, Y., Achananuparp, P., Xin, W., Jing, Z., Jing, J., Yang, H., and Achananuparp, S.P. (2011, January 19\u201324). Topical keyphrase extraction from twitter. Proceedings of the Annual Meeting of the Association for Computational Linguistics (ACL-2011), Portland, OR, USA."},{"key":"ref_25","doi-asserted-by":"crossref","first-page":"443","DOI":"10.1016\/j.procs.2016.09.069","article-title":"Real World City Event Extraction from Twitter Data Streams","volume":"98","author":"Zhou","year":"2016","journal-title":"Proced. Comput. Sci."},{"key":"ref_26","doi-asserted-by":"crossref","unstructured":"Zong, B., Wu, Y., Song, J., Singh, A.K., Cam, H., Han, J., and Yan, X. (2014, January 24\u201327). Towards Scalable Critical Alert Mining. Proceedings of the 20th ACM SIGKDD International Conference on Knowledge Discovery and Data Mining, New York, NY, USA.","DOI":"10.1145\/2623330.2623729"},{"key":"ref_27","unstructured":"(2019, May 06). EU Vocabularies. Available online: https:\/\/publications.europa.eu\/en\/web\/eu-vocabularies."},{"key":"ref_28","unstructured":"Carreras, X., Chao, I., Padr\u00f3, L., and Padro, M. (2004, January 26\u201328). FreeLing: An open-source suite of language analyzers. Proceedings of the 4th International Conference on Language Resources and Evaluation (LREC\u201904), Lisbon, Portugal."},{"key":"ref_29","unstructured":"(2019, May 06). Polyglot a natural language pipeline that supports massive multilingual applications. Available online: https:\/\/pypi.org\/project\/polyglot\/."},{"key":"ref_30","unstructured":"(2019, May 06). Compact Language Detector 2. Available online: https:\/\/github.com\/CLD2Owners\/cld2."},{"key":"ref_31","unstructured":"Brants, T. (May, January 29). TnT: A statistical part-of-speech tagger. Proceedings of the Sixth Conference on Applied Natural Language Processing, Seattle, WA, USA."},{"key":"ref_32","doi-asserted-by":"crossref","unstructured":"Carreras, X., M\u00e0rquez, L., and Padr\u00f3, L. (June, January 31). A simple named entity extractor using AdaBoost. Proceedings of the Seventh Conference on Natural Language Learning at HLT-NAACL 2003, Edmonton, Canada.","DOI":"10.3115\/1119176.1119197"},{"key":"ref_33","unstructured":"(2019, May 06). Portuguese Universal Propositions. Available online: https:\/\/github.com\/System-T\/UniversalPropositions\/tree\/master\/UP_Portuguese-Bosque."},{"key":"ref_34","unstructured":"(2019, May 06). FreeLing 4.1 User Manual. Available online: https:\/\/talp-upc.gitbook.io\/freeling-4-1-user-manual\/v\/master\/tagsets\/tagset-pt."},{"key":"ref_35","unstructured":"(2019, May 06). Automated Event Extraction Model for Multiple Linked Portuguese Documents. Available online: https:\/\/github.com\/kraiyani\/Automated-Event-Extraction-Model-for-Multiple-Linked-Portuguese-Documents\/blob\/master\/Universal_to_eagle_tagset.xlsx."},{"key":"ref_36","unstructured":"(2019, May 06). Training and Development Dataset for Automated Event Extraction Model for Multiple Linked Portuguese Documents. Available online: https:\/\/github.com\/kraiyani\/Automated-Event-Extraction-Model-for-Multiple-Linked-Portuguese-Documents."},{"key":"ref_37","unstructured":"Raiyani, K., Gon\u00e7alves, T., Quaresma, P., and Nogueira, V.B. (2019, January 14). Automated Event Extraction Model for Linked Portuguese Documents. Proceedings of Text2Story\u2014Second Workshop on Narrative Extraction From Texts co-located with 41th European Conference on Information Retrieval (ECIR 2019), Cologne, Germany."},{"key":"ref_38","unstructured":"Guarino, N., and Giaretta, P. (1995). Ontologies and knowledge bases: Towards a terminological clarification. Towards Very Large Knowledge Bases: Knowledge Building and Knowledge Sharing, IOS Press."},{"key":"ref_39","doi-asserted-by":"crossref","first-page":"128","DOI":"10.1016\/j.websem.2011.03.003","article-title":"Design and use of the Simple Event Model (SEM)","volume":"9","author":"Segers","year":"2011","journal-title":"Web Semant. Sci. Serv. Agents World Wide Web"},{"key":"ref_40","unstructured":"(2019, May 06). IATE (Interactive Terminology for Europe). Available online: https:\/\/iate.europa.eu\/home."},{"key":"ref_41","unstructured":"(2019, May 06). Protege. Available online: https:\/\/protege.stanford.edu\/."},{"key":"ref_42","unstructured":"(2019, May 06). GraphDB. Available online: http:\/\/graphdb.ontotext.com\/."},{"key":"ref_43","unstructured":"(2019, May 06). EU Vocabularies, Thesauri, 1216 Criminal Law. Available online: https:\/\/publications.europa.eu\/en\/web\/eu-vocabularies\/th-concept-scheme\/-\/resource\/eurovoc\/100180?target=Browse."},{"key":"ref_44","unstructured":"(2019, May 06). Levenshtein Distance. Available online: https:\/\/en.wikipedia.org\/wiki\/Levenshtein_distance."},{"key":"ref_45","unstructured":"(2019, May 06). Development Dataset of Automated Event Extraction Model for Multiple Linked Portuguese Documents. Available online: https:\/\/github.com\/kraiyani\/Automated-Event-Extraction-Model-for-Multiple-Linked-Portuguese-Documents\/blob\/master\/pt_devel.txt."},{"key":"ref_46","unstructured":"(2019, May 06). Validation Dataset of Automated Event Extraction Model for Multiple Linked Portuguese Documents. Available online: https:\/\/github.com\/kraiyani\/Automated-Event-Extraction-Model-for-Multiple-Linked-Portuguese-Documents\/blob\/master\/pt_train.txt."},{"key":"ref_47","unstructured":"(2019, May 06). PortLEX Project, PropBank.Br Dataset. Available online: http:\/\/www.nilc.icmc.usp.br\/portlex\/index.php\/en\/projects\/propbankbringl."},{"key":"ref_48","doi-asserted-by":"crossref","unstructured":"Gamallo, P., Garcia, M., Pineiro, C., Martinez-Casta\u00f1o, R., and Pichel, J.C. (2018, January 15\u201318). LinguaKit: A Big Data-based multilingual tool for linguistic analysis and information extraction. Proceedings of the 2018 Fifth International Conference on Social Networks Analysis, Management and Security (SNAMS), Valencia, Spain.","DOI":"10.1109\/SNAMS.2018.8554689"},{"key":"ref_49","unstructured":"Cardoso, N. (2012, January 21\u201327). Rembrandt\u2014A named-entity recognition framework. Proceedings of the Eighth International Conference on Language Resources and Evaluation (LREC-2012), Istanbul, Turkey."},{"key":"ref_50","doi-asserted-by":"crossref","first-page":"169","DOI":"10.1016\/j.eswa.2019.03.048","article-title":"Framework for syntactic string similarity measures","volume":"129","author":"Gali","year":"2019","journal-title":"Expert Syst. Appl."}],"container-title":["Information"],"original-title":[],"language":"en","link":[{"URL":"https:\/\/www.mdpi.com\/2078-2489\/10\/6\/205\/pdf","content-type":"unspecified","content-version":"vor","intended-application":"similarity-checking"}],"deposited":{"date-parts":[[2025,10,11]],"date-time":"2025-10-11T12:57:03Z","timestamp":1760187423000},"score":1,"resource":{"primary":{"URL":"https:\/\/www.mdpi.com\/2078-2489\/10\/6\/205"}},"subtitle":[],"short-title":[],"issued":{"date-parts":[[2019,6,8]]},"references-count":50,"journal-issue":{"issue":"6","published-online":{"date-parts":[[2019,6]]}},"alternative-id":["info10060205"],"URL":"https:\/\/doi.org\/10.3390\/info10060205","relation":{},"ISSN":["2078-2489"],"issn-type":[{"type":"electronic","value":"2078-2489"}],"subject":[],"published":{"date-parts":[[2019,6,8]]}}}