{"status":"ok","message-type":"work","message-version":"1.0.0","message":{"indexed":{"date-parts":[[2026,3,7]],"date-time":"2026-03-07T15:35:36Z","timestamp":1772897736775,"version":"3.50.1"},"reference-count":64,"publisher":"MDPI AG","issue":"5","license":[{"start":{"date-parts":[[2023,3,4]],"date-time":"2023-03-04T00:00:00Z","timestamp":1677888000000},"content-version":"vor","delay-in-days":0,"URL":"https:\/\/creativecommons.org\/licenses\/by\/4.0\/"}],"content-domain":{"domain":[],"crossmark-restriction":false},"short-container-title":["Sensors"],"abstract":"<jats:p>Identifying failure modes is an important task to improve the design and reliability of a product and can also serve as a key input in sensor selection for predictive maintenance. Failure mode acquisition typically relies on experts or simulations which require significant computing resources. With the recent advances in Natural Language Processing (NLP), efforts have been made to automate this process. However, it is not only time consuming, but extremely challenging to obtain maintenance records that list failure modes. Unsupervised learning methods such as topic modeling, clustering, and community detection are promising approaches for automatic processing of maintenance records to identify failure modes. However, the nascent state of NLP tools combined with incompleteness and inaccuracies of typical maintenance records pose significant technical challenges. As a step towards addressing these challenges, this paper proposes a framework in which online active learning is used to identify failure modes from maintenance records. Active learning provides a semi-supervised machine learning approach, allowing for a human in the training stage of the model. The hypothesis of this paper is that the use of a human to annotate part of the data and train a machine learning model to annotate the rest is more efficient than training unsupervised learning models. Results demonstrate that the model is trained with annotating less than ten percent of the total available data. The framework is able to achieve ninety percent (90%) accuracy in the identification of failure modes in test cases with an F-1 score of 0.89. This paper also demonstrates the effectiveness of the proposed framework with both qualitative and quantitative measures.<\/jats:p>","DOI":"10.3390\/s23052818","type":"journal-article","created":{"date-parts":[[2023,3,6]],"date-time":"2023-03-06T02:28:34Z","timestamp":1678069714000},"page":"2818","update-policy":"https:\/\/doi.org\/10.3390\/mdpi_crossmark_policy","source":"Crossref","is-referenced-by-count":9,"title":["Leveraging Active Learning for Failure Mode Acquisition"],"prefix":"10.3390","volume":"23","author":[{"ORCID":"https:\/\/orcid.org\/0000-0003-2718-0630","authenticated-orcid":false,"given":"Amol","family":"Kulkarni","sequence":"first","affiliation":[{"name":"Department of Industrial and Manufacturing Engineering, The Pennsylvania State University, University Park, State College, PA 16802, USA"}]},{"given":"Janis","family":"Terpenny","sequence":"additional","affiliation":[{"name":"Department of Systems Engineering and Operations Research, George Mason University, Fairfax, VA 22030, USA"}]},{"given":"Vittaldas","family":"Prabhu","sequence":"additional","affiliation":[{"name":"Department of Industrial and Manufacturing Engineering, The Pennsylvania State University, University Park, State College, PA 16802, USA"}]}],"member":"1968","published-online":{"date-parts":[[2023,3,4]]},"reference":[{"key":"ref_1","unstructured":"Haldar, A., and Mahadevan, S. (2000). Reliability Assessment Using Stochastic Finite Element Analysis, John Wiley & Sons."},{"key":"ref_2","doi-asserted-by":"crossref","first-page":"109","DOI":"10.1016\/j.anucene.2013.01.020","article-title":"Prediction of impact induced failure modes in reinforced concrete slabs through nonlinear transient dynamic finite element simulation","volume":"56","author":"Trivedi","year":"2013","journal-title":"Ann. Nucl. Energy"},{"key":"ref_3","doi-asserted-by":"crossref","first-page":"375","DOI":"10.1163\/156855498X00379","article-title":"The microdroplet test: Experimental and finite element analysis of the dependance of failure mode on droplet shape","volume":"6","author":"Hodzic","year":"1998","journal-title":"Compos. Interfaces"},{"key":"ref_4","doi-asserted-by":"crossref","first-page":"1549","DOI":"10.1007\/s11666-015-0317-0","article-title":"Finite Element Analysis and Failure Mode Characterization of Pyramidal Fin Arrays Produced by Masked Cold Gas Dynamic Spray","volume":"24","author":"Cormier","year":"2015","journal-title":"J. Therm. Spray Technol."},{"key":"ref_5","doi-asserted-by":"crossref","first-page":"249","DOI":"10.1007\/s10845-019-01466-z","article-title":"A data-driven approach for constructing the component-failure mode matrix for FMEA","volume":"31","author":"Xu","year":"2019","journal-title":"J. Intell. Manuf."},{"key":"ref_6","unstructured":"GlobalData (2022). Mining capital expenditure to rise by 22% across leading miners in 2022. Min. Technol., Available online: https:\/\/www.mining-technology.com\/comment\/mining-capital-expenditure\/."},{"key":"ref_7","first-page":"1","article-title":"Why autonomous assets are good for reliability\u2014The impact of \u2018operator-related component\u2019 failures on heavy mobile equipment reliability","volume":"9","author":"Hodkiewicz","year":"2017","journal-title":"Annu. Conf. PHM Soc."},{"key":"ref_8","doi-asserted-by":"crossref","first-page":"e13477","DOI":"10.1002\/acm2.13477","article-title":"Topic modeling of maintenance logs for linac failure modes and trends identification","volume":"23","author":"Yun","year":"2021","journal-title":"J. Appl. Clin. Med. Phys."},{"key":"ref_9","doi-asserted-by":"crossref","first-page":"80","DOI":"10.1016\/j.compind.2015.09.005","article-title":"Natural language processing for aviation safety reports: From classification to interactive analysis","volume":"78","author":"Tanguy","year":"2016","journal-title":"Comput. Ind."},{"key":"ref_10","doi-asserted-by":"crossref","first-page":"105","DOI":"10.1016\/j.trc.2017.12.018","article-title":"Using structural topic modeling to identify latent topics and trends in aviation incident reports","volume":"87","author":"Kuhn","year":"2018","journal-title":"Transp. Res. Part C Emerg. Technol."},{"key":"ref_11","doi-asserted-by":"crossref","first-page":"425","DOI":"10.1108\/JAMR-02-2017-0024","article-title":"Knowledge management of automobile system failures through development of failure knowledge ontology from maintenance experience","volume":"14","author":"James","year":"2017","journal-title":"J. Adv. Manag. Res."},{"key":"ref_12","doi-asserted-by":"crossref","first-page":"1074","DOI":"10.1115\/1.3439009","article-title":"The Failure-Experience Matrix\u2014A Useful Design Tool","volume":"98","author":"Collins","year":"1976","journal-title":"J. Eng. Ind."},{"key":"ref_13","unstructured":"Arunajadai, S.G., Stone, R.B., and Tumer, I.Y. (2002). Volume 4: 14th International Conference on Design Theory and Methodology, Integrated Systems Design, and Engineering Design and Culture, ASME."},{"key":"ref_14","doi-asserted-by":"crossref","first-page":"364","DOI":"10.4338\/ACI-2014-10-RA-0088","article-title":"A Preliminary Study of Clinical Abbreviation Disambiguation in Real Time","volume":"6","author":"Wu","year":"2015","journal-title":"Appl. Clin. Inform."},{"key":"ref_15","unstructured":"Wani, M.F., and Jan, M. (2006). Volume 1: Advanced Energy Systems, Advanced Materials, Aerospace, Automation and Robotics, Noise Control and Acoustics, and Systems Engineering, ASME."},{"key":"ref_16","first-page":"63","article-title":"Research on automatic generation of software failure modes","volume":"12","author":"Meng","year":"2018","journal-title":"J. Front. Comput. Sci. Technol."},{"key":"ref_17","unstructured":"Chen, L., and Nayak, R. (2007, January 1). A case study of failure mode analysis with text mining methods. Proceedings of the 2nd International Workshop on Integrating Artificial Intelligence and Data Mining (AIDM 2007), Gold Coast, Australia."},{"key":"ref_18","doi-asserted-by":"crossref","first-page":"7235","DOI":"10.1016\/j.eswa.2015.04.036","article-title":"Clustering and visualization of failure modes using an evolving tree","volume":"42","author":"Chang","year":"2015","journal-title":"Expert Syst. Appl."},{"key":"ref_19","doi-asserted-by":"crossref","first-page":"87","DOI":"10.1007\/s10115-014-0806-3","article-title":"A data- and ontology-driven text mining-based construction of reliability model to analyze and predict component failures","volume":"46","author":"Rajpathak","year":"2015","journal-title":"Knowl. Inf. Syst."},{"key":"ref_20","doi-asserted-by":"crossref","first-page":"613","DOI":"10.1145\/361219.361220","article-title":"A vector space model for automatic indexing","volume":"18","author":"Salton","year":"1975","journal-title":"Commun. ACM"},{"key":"ref_21","doi-asserted-by":"crossref","first-page":"011005","DOI":"10.1115\/1.4044506","article-title":"A Framework Based on K-Means Clustering and Topic Modeling for Analyzing Unstructured Manufacturing Capability Data","volume":"20","author":"Sabbagh","year":"2019","journal-title":"J. Comput. Inf. Sci. Eng."},{"key":"ref_22","first-page":"17","article-title":"A correlated topic model of Science","volume":"1","author":"Blei","year":"2007","journal-title":"Ann. Appl. Stat."},{"key":"ref_23","unstructured":"Alharthi, H., Inkpen, D., and Szpakowicz, S. (2023, January 17). Unsupervised Topic Modelling in a Book Recommender System for New Users. Available online: http:\/\/ceur-ws.org."},{"key":"ref_24","doi-asserted-by":"crossref","first-page":"244","DOI":"10.1016\/j.cose.2017.03.007","article-title":"Analyzing research trends in personal information privacy using topic modeling","volume":"67","author":"Choi","year":"2017","journal-title":"Comput. Secur."},{"key":"ref_25","first-page":"993","article-title":"Latent dirichlet allocation","volume":"3","author":"Blei","year":"2003","journal-title":"J. Mach. Learn. Res."},{"key":"ref_26","doi-asserted-by":"crossref","first-page":"761","DOI":"10.1016\/j.procs.2017.08.166","article-title":"Combining IR and LDA Topic Modeling for Filtering Microblogs","volume":"112","author":"Hajjem","year":"2017","journal-title":"Procedia Comput. Sci."},{"key":"ref_27","doi-asserted-by":"crossref","first-page":"232","DOI":"10.1016\/j.eswa.2016.03.045","article-title":"Ensemble of keyword extraction methods and classifiers in text classification","volume":"57","author":"Onan","year":"2016","journal-title":"Expert Syst. Appl."},{"key":"ref_28","doi-asserted-by":"crossref","unstructured":"Tong, Z., and Zhang, H. (2016). A Text Mining Research Based on LDA Topic Modelling. Comput. Sci. Inf. Technol., 201\u2013210.","DOI":"10.5121\/csit.2016.60616"},{"key":"ref_29","doi-asserted-by":"crossref","first-page":"105","DOI":"10.1016\/j.artint.2017.06.004","article-title":"Latent tree models for hierarchical topic detection","volume":"250","author":"Chen","year":"2017","journal-title":"Artif. Intell."},{"key":"ref_30","doi-asserted-by":"crossref","first-page":"5643","DOI":"10.1109\/TNNLS.2018.2808332","article-title":"Latent Topic Text Representation Learning on Statistical Manifolds","volume":"29","author":"Jiang","year":"2018","journal-title":"IEEE Trans. Neural Netw. Learn. Syst."},{"key":"ref_31","doi-asserted-by":"crossref","first-page":"21","DOI":"10.1016\/j.eswa.2016.03.011","article-title":"A step forward for Topic Detection in Twitter: An FCA-based approach","volume":"57","author":"Castellanos","year":"2016","journal-title":"Expert Syst. Appl."},{"key":"ref_32","doi-asserted-by":"crossref","unstructured":"Waheeb, S.A., Khan, N.A., and Shang, X. (2022). Topic Modeling and Sentiment Analysis of Online Education in the COVID-19 Era Using Social Networks Based Datasets. Electronics, 11.","DOI":"10.3390\/electronics11050715"},{"key":"ref_33","doi-asserted-by":"crossref","first-page":"1","DOI":"10.4018\/JOEUC.294580","article-title":"A BERT-Based Hybrid Short Text Classification Model Incorporating CNN and Attention-Based BiGRU","volume":"33","author":"Bao","year":"2021","journal-title":"J. Organ. End User Comput."},{"key":"ref_34","doi-asserted-by":"crossref","unstructured":"Settles, B. (2012). Active Learning, Springer.","DOI":"10.1007\/978-3-031-01560-1"},{"key":"ref_35","doi-asserted-by":"crossref","first-page":"319","DOI":"10.1007\/BF00116828","article-title":"Queries and Concept Learning","volume":"2","author":"Angluin","year":"1988","journal-title":"Mach. Learn."},{"key":"ref_36","doi-asserted-by":"crossref","unstructured":"Angluin, D. (2001). Queries Revisited, Springer.","DOI":"10.1007\/3-540-45650-3_3"},{"key":"ref_37","doi-asserted-by":"crossref","first-page":"201","DOI":"10.1007\/BF00993277","article-title":"Improving generalization with active learning","volume":"15","author":"Cohn","year":"1994","journal-title":"Mach. Learn."},{"key":"ref_38","first-page":"8","article-title":"Query learning can work poorly when a human oracle is used","volume":"8","author":"Baum","year":"1992","journal-title":"Int. Jt. Conf. Neural Netw."},{"key":"ref_39","doi-asserted-by":"crossref","unstructured":"Hanneke, S. (2007, January 20\u201324). A bound on the label complexity of agnostic active learning. Proceedings of the 24th International Conference on Machine Learning, Corvalis, OR, USA.","DOI":"10.1145\/1273496.1273541"},{"key":"ref_40","unstructured":"Lewis, D.D. (1994). SIGIR \u201994, Springer."},{"key":"ref_41","first-page":"45","article-title":"Support vector machine active learning with applications to text classification","volume":"2","author":"Tong","year":"2001","journal-title":"J. Mach. Learn. Res."},{"key":"ref_42","doi-asserted-by":"crossref","unstructured":"Hoi, S.C.H., Jin, R., and Lyu, M.R. (2006, January 23\u201326). Large-scale text categorization by batch mode active learning. Proceedings of the 15th International Conference on World Wide Web, Edinburgh, Scotland.","DOI":"10.1145\/1135777.1135870"},{"key":"ref_43","doi-asserted-by":"crossref","first-page":"103","DOI":"10.1023\/A:1007692713085","article-title":"Text Classification from Labeled and Unlabeled Documents using EM","volume":"39","author":"Nigam","year":"2000","journal-title":"Mach. Learn."},{"key":"ref_44","unstructured":"Dagan, I., and Engelson, S.P. (1995). Machine Learning Proceedings 1995, Elsevier."},{"key":"ref_45","doi-asserted-by":"crossref","unstructured":"Kim, D., Lee, S., and Kim, D. (2021). An Applicable Predictive Maintenance Framework for the Absence of Run-to-Failure Data. Appl. Sci., 11.","DOI":"10.3390\/app11115180"},{"key":"ref_46","doi-asserted-by":"crossref","first-page":"9022","DOI":"10.1109\/ACCESS.2019.2890979","article-title":"An Active Learning Method Based on Uncertainty and Complexity for Gearbox Fault Diagnosis","volume":"7","author":"Chen","year":"2019","journal-title":"IEEE Access"},{"key":"ref_47","doi-asserted-by":"crossref","first-page":"106294","DOI":"10.1016\/j.ymssp.2019.106294","article-title":"Probabilistic active learning: An online framework for structural health monitoring","volume":"134","author":"Bull","year":"2019","journal-title":"Mech. Syst. Signal Process."},{"key":"ref_48","doi-asserted-by":"crossref","first-page":"42","DOI":"10.1016\/j.mfglet.2020.11.001","article-title":"Technical language processing: Unlocking maintenance knowledge","volume":"27","author":"Brundage","year":"2020","journal-title":"Manuf. Lett."},{"key":"ref_49","doi-asserted-by":"crossref","first-page":"129","DOI":"10.1613\/jair.295","article-title":"Active Learning with Statistical Models","volume":"4","author":"Cohn","year":"1996","journal-title":"J. Artif. Intell. Res."},{"key":"ref_50","unstructured":"Cai, H., Zheng, V., and Chang, K.C.-C. (2017). Active Learning for Graph Embedding. arXiv."},{"key":"ref_51","doi-asserted-by":"crossref","unstructured":"Aodha, O., Campbell, N., Kautz, J., and Brostow, G.J. (2014, January 23\u201328). Hierarchical Subquery Evaluation for Active Learning on a Graph. Proceedings of the 2014 IEEE Conference on Computer Vision and Pattern Recognition, Columbus, OH, USA.","DOI":"10.1109\/CVPR.2014.79"},{"key":"ref_52","doi-asserted-by":"crossref","unstructured":"Banerjee, S., Ramanathan, K., and Gupta, A. (2007, January 23\u201327). Clustering short texts using wikipedia. Proceedings of the 30th Annual International ACM SIGIR Conference on Research and Development in Information Retrieval, Amsterdam, The Netherlands.","DOI":"10.1145\/1277741.1277909"},{"key":"ref_53","doi-asserted-by":"crossref","unstructured":"Yin, J., and Wang, J. (2016, January 16\u201320). A model-based approach for text clustering with outlier detection. Proceedings of the 2016 IEEE 32nd International Conference on Data Engineering (ICDE), Helsinki, Finland.","DOI":"10.1109\/ICDE.2016.7498276"},{"key":"ref_54","doi-asserted-by":"crossref","unstructured":"Hadifar, A., Sterckx, L., Demeester, T., and Develder, C. (2019, January 2). A Self-Training Approach for Short Text Clustering. Proceedings of the 4th Workshop on Representation Learning for NLP (RepL4NLP-2019), Florence, Italy.","DOI":"10.18653\/v1\/W19-4322"},{"key":"ref_55","doi-asserted-by":"crossref","first-page":"2928","DOI":"10.1109\/TKDE.2014.2313872","article-title":"BTM: Topic Modeling over Short Texts","volume":"26","author":"Cheng","year":"2014","journal-title":"IEEE Trans. Knowl. Data Eng."},{"key":"ref_56","doi-asserted-by":"crossref","first-page":"2444","DOI":"10.1016\/j.neucom.2017.11.019","article-title":"Corpus-based topic diffusion for short text clustering","volume":"275","author":"Zheng","year":"2018","journal-title":"Neurocomputing"},{"key":"ref_57","doi-asserted-by":"crossref","first-page":"76","DOI":"10.1016\/j.knosys.2016.03.027","article-title":"Improving short text classification by learning vector representations of both words and hidden topics","volume":"102","author":"Zhang","year":"2016","journal-title":"Knowl.-Based Syst."},{"key":"ref_58","doi-asserted-by":"crossref","first-page":"1","DOI":"10.1016\/j.knosys.2018.08.011","article-title":"Experimental explorations on short text topic mining between LDA and NMF based Schemes","volume":"163","author":"Chen","year":"2018","journal-title":"Knowl.-Based Syst."},{"key":"ref_59","doi-asserted-by":"crossref","unstructured":"Yin, J., and Wang, J. (2014, January 24\u201327). A dirichlet multinomial mixture model-based approach for short text clustering. Proceedings of the 20th ACM SIGKDD International Conference on Knowledge Discovery and Data Mining, New York, NY, USA.","DOI":"10.1145\/2623330.2623715"},{"key":"ref_60","first-page":"7882396","article-title":"Railway Fault Text Clustering Method Using an Improved Dirichlet Multinomial Mixture Model","volume":"2022","author":"Yang","year":"2022","journal-title":"Math. Probl. Eng."},{"key":"ref_61","unstructured":"Ho, M.T. (2015). A Shared Reliability Database for Mobile Mining Equipment, University of Western Australia."},{"key":"ref_62","unstructured":"Prognostics Data Library (2022, July 26). Excavator Maintenance Work Orders. Available online: https:\/\/prognosticsdl.systemhealthlab.com\/dataset\/excavator-maintenance-work-orders."},{"key":"ref_63","doi-asserted-by":"crossref","first-page":"146","DOI":"10.1108\/JQME-04-2015-0013","article-title":"Cleaning historical maintenance work order data for reliability analysis","volume":"22","author":"Hodkiewicz","year":"2016","journal-title":"J. Qual. Maint. Eng."},{"key":"ref_64","unstructured":"Danka, T., and Horvath, P. (2018). modAL: A modular active learning framework for Python. arXiv."}],"container-title":["Sensors"],"original-title":[],"language":"en","link":[{"URL":"https:\/\/www.mdpi.com\/1424-8220\/23\/5\/2818\/pdf","content-type":"unspecified","content-version":"vor","intended-application":"similarity-checking"}],"deposited":{"date-parts":[[2025,10,10]],"date-time":"2025-10-10T18:47:51Z","timestamp":1760122071000},"score":1,"resource":{"primary":{"URL":"https:\/\/www.mdpi.com\/1424-8220\/23\/5\/2818"}},"subtitle":[],"short-title":[],"issued":{"date-parts":[[2023,3,4]]},"references-count":64,"journal-issue":{"issue":"5","published-online":{"date-parts":[[2023,3]]}},"alternative-id":["s23052818"],"URL":"https:\/\/doi.org\/10.3390\/s23052818","relation":{},"ISSN":["1424-8220"],"issn-type":[{"value":"1424-8220","type":"electronic"}],"subject":[],"published":{"date-parts":[[2023,3,4]]}}}