{"status":"ok","message-type":"work","message-version":"1.0.0","message":{"indexed":{"date-parts":[[2025,10,11]],"date-time":"2025-10-11T01:45:47Z","timestamp":1760147147483,"version":"build-2065373602"},"reference-count":53,"publisher":"MDPI AG","issue":"1","license":[{"start":{"date-parts":[[2023,1,13]],"date-time":"2023-01-13T00:00:00Z","timestamp":1673568000000},"content-version":"vor","delay-in-days":0,"URL":"https:\/\/creativecommons.org\/licenses\/by\/4.0\/"}],"funder":[{"DOI":"10.13039\/501100003566","name":"Ministry of Oceans and Fisheries, Republic of Korea","doi-asserted-by":"publisher","award":["201803932"],"award-info":[{"award-number":["201803932"]}],"id":[{"id":"10.13039\/501100003566","id-type":"DOI","asserted-by":"publisher"}]},{"name":"School of Electrical Engineering and Informatics, Institut Teknologi Bandung","award":["201803932"],"award-info":[{"award-number":["201803932"]}]},{"name":"School of Electrical Engineering, Telkom University","award":["201803932"],"award-info":[{"award-number":["201803932"]}]},{"name":"Faculty of Engineering and Technology, Sampoerna University","award":["201803932"],"award-info":[{"award-number":["201803932"]}]}],"content-domain":{"domain":[],"crossmark-restriction":false},"short-container-title":["Informatics"],"abstract":"<jats:p>To pursue a healthy lifestyle, people are increasingly concerned about their food ingredients. Recently, it has become a common practice to use an online recipe to select the ingredients that match an individual\u2019s meal plan and healthy diet preference. The information from online recipes can be extracted and used to develop various food-related applications. Named entity recognition (NER) is often used to extract such information. However, the problem in building an NER system lies in the massive amount of data needed to train the classifier, especially on a specific domain, such as food. There are food NER datasets available, but they are still quite limited. 
Thus, we proposed an iterative self-training approach called semi-supervised multi-model prediction technique (SMPT) to construct a food ingredient NER dataset. SMPT is a deep ensemble learning model that employs the concept of self-training and uses multiple pre-trained language models in the iterative data labeling process, with a voting mechanism used as the final decision to determine the entity\u2019s label. Utilizing the SMPT, we have created a new annotated dataset of ingredient entities obtained from the Allrecipes website named FINER. Finally, this study aims to use the FINER dataset as an alternative resource to support food computing research and development.<\/jats:p>","DOI":"10.3390\/informatics10010010","type":"journal-article","created":{"date-parts":[[2023,1,16]],"date-time":"2023-01-16T01:31:15Z","timestamp":1673832675000},"page":"10","update-policy":"https:\/\/doi.org\/10.3390\/mdpi_crossmark_policy","source":"Crossref","is-referenced-by-count":5,"title":["SMPT: A Semi-Supervised Multi-Model Prediction Technique for Food Ingredient Named Entity Recognition (FINER) Dataset Construction"],"prefix":"10.3390","volume":"10","author":[{"ORCID":"https:\/\/orcid.org\/0000-0003-2618-3039","authenticated-orcid":false,"given":"Kokoy Siti","family":"Komariah","sequence":"first","affiliation":[{"name":"Department of Artificial Intelligence Convergence, Pukyong National University, Busan 48513, Republic of Korea"}]},{"ORCID":"https:\/\/orcid.org\/0000-0003-2839-8172","authenticated-orcid":false,"given":"Ariana Tulus","family":"Purnomo","sequence":"additional","affiliation":[{"name":"Department of Information System, Faculty of Engineering and Technology, Sampoerna University, L\u2019Avenue Building, Jl. Raya Pasar Minggu No. 
Kav 16, Pancoran, Jakarta Selatan 12780, Indonesia"},{"name":"Department of Electronic and Computer Engineering, National Taiwan University of Science and Technology, Taipei 10607, Taiwan"}]},{"ORCID":"https:\/\/orcid.org\/0000-0001-5400-0995","authenticated-orcid":false,"given":"Ardianto","family":"Satriawan","sequence":"additional","affiliation":[{"name":"School of Electrical Engineering and Informatics, Institut Teknologi Bandung, Jl. Ganesha No. 10, Bandung 40132, Indonesia"}]},{"ORCID":"https:\/\/orcid.org\/0000-0003-3413-3875","authenticated-orcid":false,"given":"Muhammad Ogin","family":"Hasanuddin","sequence":"additional","affiliation":[{"name":"School of Electrical Engineering and Informatics, Institut Teknologi Bandung, Jl. Ganesha No. 10, Bandung 40132, Indonesia"}]},{"given":"Casi","family":"Setianingsih","sequence":"additional","affiliation":[{"name":"School of Electrical Engineering, Telkom University, Jl. Telekomunikasi No. 1, Bandung 40257, Indonesia"}]},{"ORCID":"https:\/\/orcid.org\/0000-0003-0102-0313","authenticated-orcid":false,"given":"Bong-Kee","family":"Sin","sequence":"additional","affiliation":[{"name":"Department of Artificial Intelligence Convergence, Pukyong National University, Busan 48513, Republic of Korea"},{"name":"Division of Computer Engineering, Pukyong National University, Busan 48513, Republic of Korea"}]}],"member":"1968","published-online":{"date-parts":[[2023,1,13]]},"reference":[{"key":"ref_1","doi-asserted-by":"crossref","first-page":"624","DOI":"10.7861\/clinmedicine.10-6-624","article-title":"Malnutrition: Causes and consequences","volume":"10","author":"Saunders","year":"2010","journal-title":"Clin. Med."},{"key":"ref_2","doi-asserted-by":"crossref","unstructured":"Kalra, J., Batra, D., Diwan, N., and Bagler, G. (2020, January 20\u201324). Nutritional profile estimation in cooking recipes. 
Proceedings of the 2020 IEEE 36th International Conference on Data Engineering Workshops (ICDEW), Dallas, TX, USA.","DOI":"10.1109\/ICDEW49219.2020.000-3"},{"key":"ref_3","doi-asserted-by":"crossref","unstructured":"Syed, M.H., and Chung, S.T. (2021). MenuNER: Domain-adapted BERT based NER approach for a domain with limited dataset and its application to food menu domain. Appl. Sci., 11.","DOI":"10.3390\/app11136007"},{"key":"ref_4","doi-asserted-by":"crossref","unstructured":"Pellegrini, C., Ozsoy, E., Wintergerst, M., and Groh, G. (2021, January 11\u201313). Exploiting Food Embeddings for Ingredient Substitution. Proceedings of the 14th International Joint Conference on Biomedical Engineering Systems and Technologies\u2014HEALTHINF, Vienna, Austria.","DOI":"10.5220\/0010202000670077"},{"key":"ref_5","first-page":"92","article-title":"A survey on food computing","volume":"52","author":"Min","year":"2019","journal-title":"ACM Comput. Surv. (CSUR)"},{"key":"ref_6","doi-asserted-by":"crossref","first-page":"baz121","DOI":"10.1093\/database\/baz121","article-title":"FoodBase corpus: A new resource of annotated food entities","volume":"2019","author":"Popovski","year":"2019","journal-title":"Database"},{"key":"ref_7","unstructured":"Krishnan, V., and Ganapathy, V. (2021, February 04). Named Entity Recognition. Stanford Lecture CS229. Available online: http:\/\/cs229.stanford.edu\/2005\/KrishnanGanapathy-NamedEntityRecognition.pdf."},{"key":"ref_8","unstructured":"Komariah, K.S., and Shin, B.K. (2020, January 21\u201323). Nutrition-Based Food Recommendation System for Prediabetic Person. Proceedings of the Korea Software Congress 2020 (KSC 2020), Seoul, Republic of Korea."},{"key":"ref_9","doi-asserted-by":"crossref","first-page":"187","DOI":"10.1109\/TPAMI.2019.2927476","article-title":"Recipe1m+: A dataset for learning cross-modal embeddings for cooking recipes and food images","volume":"43","author":"Marin","year":"2019","journal-title":"IEEE Trans. Pattern Anal. 
Mach. Intell."},{"key":"ref_10","doi-asserted-by":"crossref","unstructured":"Bie\u0144, M., Gilski, M., Maciejewska, M., Taisner, W., Wisniewski, D., and Lawrynowicz, A. (2020, January 15\u201318). RecipeNLG: A cooking recipes dataset for semi-structured text generation. Proceedings of the 13th International Conference on Natural Language Generation, Dublin, Ireland.","DOI":"10.18653\/v1\/2020.inlg-1.4"},{"key":"ref_11","doi-asserted-by":"crossref","first-page":"baaa077","DOI":"10.1093\/database\/baaa077","article-title":"Recipedb: A resource for exploring recipes","volume":"2020","author":"Batra","year":"2020","journal-title":"Database"},{"key":"ref_12","unstructured":"Wr\u00f3blewska, A., Kaliska, A., Paw\u0142owski, M., Wi\u015bniewski, D., Sosnowski, W., and \u0141awrynowicz, A. (2022). TASTEset\u2013Recipe Dataset and Food Entities Recognition Benchmark. arXiv."},{"key":"ref_13","doi-asserted-by":"crossref","first-page":"31586","DOI":"10.1109\/ACCESS.2020.2973502","article-title":"A survey of named-entity recognition methods for food information extraction","volume":"8","author":"Popovski","year":"2020","journal-title":"IEEE Access"},{"key":"ref_14","doi-asserted-by":"crossref","unstructured":"Boushehri, S.S., Qasim, A.B., Waibel, D., Schmich, F., and Marr, C. (2021). Systematic comparison of incomplete-supervision approaches for biomedical imaging classification. bioRxiv.","DOI":"10.21203\/rs.3.rs-798207\/v1"},{"key":"ref_15","first-page":"3833","article-title":"Rethinking pre-training and self-training","volume":"33","author":"Zoph","year":"2020","journal-title":"Adv. Neural Inf. Process. Syst."},{"key":"ref_16","doi-asserted-by":"crossref","first-page":"1","DOI":"10.5391\/IJFIS.2017.17.1.1","article-title":"Deep neural network self-training based on unsupervised learning and dropout","volume":"17","author":"Lee","year":"2017","journal-title":"Int. J. Fuzzy Log. Intell. Syst."},{"key":"ref_17","unstructured":"(2022, March 05). spaCy. 
Available online: https:\/\/spacy.io\/."},{"key":"ref_18","unstructured":"Devlin, J., Chang, M.W., Lee, K., and Toutanova, K. (2018). Bert: Pre-training of deep bidirectional transformers for language understanding. arXiv."},{"key":"ref_19","unstructured":"Sanh, V., Debut, L., Chaumond, J., and Wolf, T. (2019). DistilBERT, a distilled version of BERT: Smaller, faster, cheaper and lighter. arXiv."},{"key":"ref_20","unstructured":"Komariah, K.S., Purnomo, A.T., and Sin, B.K. (2022, April 07). FINER: Food Ingredient NER Dataset. Available online: https:\/\/doi.org\/10.6084\/m9.figshare.20222361.v3."},{"key":"ref_21","unstructured":"Yadav, V., and Bethard, S. (2019). A survey on recent advances in named entity recognition from deep learning models. arXiv."},{"key":"ref_22","doi-asserted-by":"crossref","first-page":"50","DOI":"10.1109\/TKDE.2020.2981314","article-title":"A survey on deep learning for named entity recognition","volume":"34","author":"Li","year":"2020","journal-title":"IEEE Trans. Knowl. Data Eng."},{"key":"ref_23","doi-asserted-by":"crossref","first-page":"21","DOI":"10.1016\/j.cosrev.2018.06.001","article-title":"Recent named entity recognition and classification techniques: A systematic review","volume":"29","author":"Goyal","year":"2018","journal-title":"Comput. Sci. Rev."},{"key":"ref_24","doi-asserted-by":"crossref","unstructured":"Cenikj, G., Popovski, G., Stojanov, R., Seljak, B.K., and Eftimov, T. (2020, January 10\u201313). BuTTER: BidirecTional LSTM for Food Named-Entity Recognition. Proceedings of the 2020 IEEE International Conference on Big Data (Big Data), Atlanta, GA, USA.","DOI":"10.1109\/BigData50022.2020.9378151"},{"key":"ref_25","unstructured":"(2022, December 14). Allrecipes. Available online: https:\/\/www.allrecipes.com\/."},{"key":"ref_26","unstructured":"(2022, December 14). Food. Available online: https:\/\/www.food.com\/."},{"key":"ref_27","unstructured":"(2022, December 14). Tarla Dalal Indian Recipes. 
Available online: https:\/\/www.tarladalal.com\/."},{"key":"ref_28","unstructured":"(2022, December 14). The Spruce Eats. Available online: https:\/\/www.thespruceeats.com\/."},{"key":"ref_29","unstructured":"(2022, December 14). Epicurious. Available online: https:\/\/www.epicurious.com\/."},{"key":"ref_30","unstructured":"(2022, December 14). Food Network. Available online: https:\/\/www.foodnetwork.com\/."},{"key":"ref_31","unstructured":"(2022, December 14). Taste. Available online: https:\/\/www.taste.com.au\/."},{"key":"ref_32","doi-asserted-by":"crossref","unstructured":"Diwan, N., Batra, D., and Bagler, G. (2020, January 20\u201324). A named entity based approach to model recipes. Proceedings of the 2020 IEEE 36th International Conference on Data Engineering Workshops (ICDEW), Dallas, TX, USA.","DOI":"10.1109\/ICDEW49219.2020.000-2"},{"key":"ref_33","doi-asserted-by":"crossref","first-page":"59684","DOI":"10.1109\/ACCESS.2020.2981361","article-title":"Construction of machine-labeled data for improving named entity recognition by transfer learning","volume":"8","author":"Kim","year":"2020","journal-title":"IEEE Access"},{"key":"ref_34","doi-asserted-by":"crossref","unstructured":"Du, J., Grave, E., Gunel, B., Chaudhary, V., Celebi, O., Auli, M., Stoyanov, V., and Conneau, A. (2020). Self-training improves pre-training for natural language understanding. arXiv.","DOI":"10.18653\/v1\/2021.naacl-main.426"},{"key":"ref_35","unstructured":"Komariah, K.S., and Sin, B.K. (2021, January 23\u201325). BERT Pre-trained Models for Data Augmentation in Twitter Medical Named-Entity Recognition. Proceedings of the Korea Computer Congress 2021 (KCC 2021), Jeju, Republic of Korea."},{"key":"ref_36","doi-asserted-by":"crossref","unstructured":"Arslan, Y., Allix, K., Veiber, L., Lothritz, C., Bissyand\u00e9, T.F., Klein, J., and Goujon, A. (2021). 
A comparison of pre-trained language models for multi-class text classification in the financial domain. Companion Proceedings of the Web Conference 2021 (WWW \u201921), Association for Computing Machinery.","DOI":"10.1145\/3442442.3451375"},{"key":"ref_37","doi-asserted-by":"crossref","first-page":"e28229","DOI":"10.2196\/28229","article-title":"A fine-tuned bidirectional encoder representations from transformers model for food named-entity recognition: Algorithm development and validation","volume":"23","author":"Stojanov","year":"2021","journal-title":"J. Med. Internet Res."},{"key":"ref_38","unstructured":"Bird, S., Klein, E., and Loper, E. (2009). Natural Language Processing with Python: Analyzing Text with the Natural Language Toolkit, O\u2019Reilly Media, Inc."},{"key":"ref_39","unstructured":"Nakayama, H., Kubo, T., Kamura, J., Taniguchi, Y., and Liang, X. (2022, February 11). doccano: Text Annotation Tool for Human. Available online: https:\/\/github.com\/doccano\/doccano."},{"key":"ref_40","doi-asserted-by":"crossref","unstructured":"Partalidou, E., Spyromitros-Xioufis, E., Doropoulos, S., Vologiannidis, S., and Diamantaras, K. (2019, January 14\u201317). Design and implementation of an open source Greek POS Tagger and Entity Recognizer using spaCy. Proceedings of the IEEE\/WIC\/ACM International Conference on Web Intelligence, Thessaloniki, Greece.","DOI":"10.1145\/3350546.3352543"},{"key":"ref_41","unstructured":"Thickstun, J. (2021). The Transformer Model in Equations, University of Washington."},{"key":"ref_42","unstructured":"Vaswani, A., Shazeer, N., Parmar, N., Uszkoreit, J., Jones, L., Gomez, A.N., Kaiser, \u0141., and Polosukhin, I. (2017). Attention is all you need. arXiv."},{"key":"ref_43","unstructured":"Hinton, G., Vinyals, O., and Dean, J. (2015). Distilling the knowledge in a neural network. arXiv, 2."},{"key":"ref_44","unstructured":"Zheng, A. (2015). 
Evaluating Machine Learning Models: A Beginner\u2019s Guide to Key Concepts and Pitfalls, O\u2019Reilly Media."},{"key":"ref_45","unstructured":"Grandini, M., Bagli, E., and Visani, G. (2020). Metrics for multi-class classification: An overview. arXiv."},{"key":"ref_46","doi-asserted-by":"crossref","first-page":"267","DOI":"10.1561\/2200000013","article-title":"An introduction to conditional random fields","volume":"4","author":"Sutton","year":"2012","journal-title":"Found. Trends Mach. Learn."},{"key":"ref_47","doi-asserted-by":"crossref","first-page":"1181","DOI":"10.1016\/j.procs.2020.03.431","article-title":"Named entity recognition using conditional random fields","volume":"167","author":"Patil","year":"2020","journal-title":"Procedia Comput. Sci."},{"key":"ref_48","doi-asserted-by":"crossref","unstructured":"Komariah, K.S., and Shin, B.K. (February, January 31). Medical Entity Recognition in Twitter using Conditional Random Fields. Proceedings of the 2021 International Conference on Electronics, Information, and Communication (ICEIC), Jeju, Republic of Korea.","DOI":"10.1109\/ICEIC51217.2021.9369799"},{"key":"ref_49","doi-asserted-by":"crossref","first-page":"882","DOI":"10.1587\/transinf.2016EDP7179","article-title":"LSTM-CRF models for named entity recognition","volume":"100","author":"Lee","year":"2017","journal-title":"IEICE Trans. Inf. Syst."},{"key":"ref_50","doi-asserted-by":"crossref","first-page":"357","DOI":"10.1162\/tacl_a_00104","article-title":"Named entity recognition with bidirectional LSTM-CNNs","volume":"4","author":"Chiu","year":"2016","journal-title":"Trans. Assoc. Comput. Linguist."},{"key":"ref_51","unstructured":"Panchendrarajan, R., and Amaresan, A. (2018, January 1\u20133). Bidirectional LSTM-CRF for named entity recognition. Proceedings of the 32nd Pacific Asia Conference on Language, Information and Computation, Hong Kong, China."},{"key":"ref_52","doi-asserted-by":"crossref","unstructured":"Ma, X., and Hovy, E. (2016). 
End-to-end sequence labeling via bi-directional lstm-cnns-crf. arXiv.","DOI":"10.18653\/v1\/P16-1101"},{"key":"ref_53","doi-asserted-by":"crossref","unstructured":"Lample, G., Ballesteros, M., Subramanian, S., Kawakami, K., and Dyer, C. (2016). Neural architectures for named entity recognition. arXiv.","DOI":"10.18653\/v1\/N16-1030"}],"container-title":["Informatics"],"original-title":[],"language":"en","link":[{"URL":"https:\/\/www.mdpi.com\/2227-9709\/10\/1\/10\/pdf","content-type":"unspecified","content-version":"vor","intended-application":"similarity-checking"}],"deposited":{"date-parts":[[2025,10,10]],"date-time":"2025-10-10T18:05:25Z","timestamp":1760119525000},"score":1,"resource":{"primary":{"URL":"https:\/\/www.mdpi.com\/2227-9709\/10\/1\/10"}},"subtitle":[],"short-title":[],"issued":{"date-parts":[[2023,1,13]]},"references-count":53,"journal-issue":{"issue":"1","published-online":{"date-parts":[[2023,3]]}},"alternative-id":["informatics10010010"],"URL":"https:\/\/doi.org\/10.3390\/informatics10010010","relation":{},"ISSN":["2227-9709"],"issn-type":[{"type":"electronic","value":"2227-9709"}],"subject":[],"published":{"date-parts":[[2023,1,13]]}}}