{"status":"ok","message-type":"work","message-version":"1.0.0","message":{"indexed":{"date-parts":[[2026,5,12]],"date-time":"2026-05-12T08:43:40Z","timestamp":1778575420886,"version":"3.51.4"},"reference-count":20,"publisher":"MDPI AG","issue":"18","license":[{"start":{"date-parts":[[2021,9,17]],"date-time":"2021-09-17T00:00:00Z","timestamp":1631836800000},"content-version":"vor","delay-in-days":0,"URL":"https:\/\/creativecommons.org\/licenses\/by\/4.0\/"}],"funder":[{"DOI":"10.13039\/501100002628","name":"Incheon National University","doi-asserted-by":"publisher","award":["2020"],"award-info":[{"award-number":["2020"]}],"id":[{"id":"10.13039\/501100002628","id-type":"DOI","asserted-by":"publisher"}]},{"DOI":"10.13039\/501100003725","name":"National Research Foundation of Korea","doi-asserted-by":"publisher","award":["NRF-2020R1A2C1007917"],"award-info":[{"award-number":["NRF-2020R1A2C1007917"]}],"id":[{"id":"10.13039\/501100003725","id-type":"DOI","asserted-by":"publisher"}]}],"content-domain":{"domain":[],"crossmark-restriction":false},"short-container-title":["Sensors"],"abstract":"<jats:p>Automatic meter infrastructure (AMI) systems using remote metering are being widely used to utilize water resources efficiently and minimize non-revenue water. We propose a convolutional neural network-long short-term memory network (CNN-LSTM)-based solution that can predict faulty remote water meter reading (RWMR) devices by analyzing approximately 2,850,000 AMI data collected from 2762 customers over 360 days in a small-sized city in South Korea. The AMI data used in this study is a challenging, highly unbalanced real-world dataset with limited features. First, we perform extensive preprocessing steps and extract meaningful features for handling this challenging dataset with limited features. Next, we select important features that have a higher influence on the classifier using a recursive feature elimination method. 
Finally, we apply the CNN-LSTM model for predicting faulty RWMR devices. We also propose an efficient training method for ML models to learn the unbalanced real-world AMI dataset. A cost-effective threshold for evaluating the performance of ML models is proposed by considering the mispredictions of ML models as well as the cost. Our experimental results show that an F-measure of 0.82 and MCC of 0.83 are obtained when the CNN-LSTM model is used for prediction.<\/jats:p>","DOI":"10.3390\/s21186229","type":"journal-article","created":{"date-parts":[[2021,9,22]],"date-time":"2021-09-22T03:47:35Z","timestamp":1632282455000},"page":"6229","update-policy":"https:\/\/doi.org\/10.3390\/mdpi_crossmark_policy","source":"Crossref","is-referenced-by-count":8,"title":["A Cost-Effective CNN-LSTM-Based Solution for Predicting Faulty Remote Water Meter Reading Devices in AMI Systems"],"prefix":"10.3390","volume":"21","author":[{"given":"Jaeseung","family":"Lee","sequence":"first","affiliation":[{"name":"Department of Computer Science and Engineering, Incheon National University, Incheon 22012, Korea"}],"role":[{"role":"author","vocabulary":"crossref"}]},{"given":"Woojin","family":"Choi","sequence":"additional","affiliation":[{"name":"Department of Computer Science and Engineering, Incheon National University, Incheon 22012, Korea"}],"role":[{"role":"author","vocabulary":"crossref"}]},{"ORCID":"https:\/\/orcid.org\/0000-0002-7172-5039","authenticated-orcid":false,"given":"Jibum","family":"Kim","sequence":"additional","affiliation":[{"name":"Department of Computer Science and Engineering, Incheon National University, Incheon 22012, Korea"}],"role":[{"role":"author","vocabulary":"crossref"}]}],"member":"1968","published-online":{"date-parts":[[2021,9,17]]},"reference":[{"key":"ref_1","doi-asserted-by":"crossref","unstructured":"Kermany, E., Mazzawi, H., Baras, D., Naveh, Y., and Michaelis, H. (2013, January 11\u201314). 
Analysis of advanced meter infrastructure data of water consumption in apartment buildings. Proceedings of the ACM SIGKDD International Conference on Knowledge Discovery and Data Mining, Chicago, IL, USA.","DOI":"10.1145\/2487575.2488193"},{"key":"ref_2","doi-asserted-by":"crossref","first-page":"28","DOI":"10.1016\/j.watres.2016.05.016","article-title":"Burst detection in district metering areas using a data driven clustering algorithm","volume":"100","author":"Wu","year":"2016","journal-title":"Water Res."},{"key":"ref_3","doi-asserted-by":"crossref","first-page":"542","DOI":"10.1109\/TII.2016.2619180","article-title":"A fault detection method for hard disk drives based on mixture of Gaussians and non-parametric statistics","volume":"13","author":"Queiroz","year":"2017","journal-title":"IEEE Trans. Ind. Inform."},{"key":"ref_4","unstructured":"Lu, S., Luo, B., Patel, T., Yao, Y., Tiwari, D., and Shi, W. (2020, January 24\u201327). Making disk failure predictions SMARTer!. Proceedings of the USENIX Conference on File and Storage Technologies (FAST 20), Santa Clara, CA, USA."},{"key":"ref_5","doi-asserted-by":"crossref","first-page":"939","DOI":"10.2478\/v10006-012-0070-1","article-title":"Data-driven models for fault detection using kernel PCA: A water distribution system case study","volume":"22","author":"Nowicki","year":"2012","journal-title":"Int. J. Appl. Math. Comput. Sci."},{"key":"ref_6","doi-asserted-by":"crossref","first-page":"13:1","DOI":"10.1147\/JRD.2010.2092173","article-title":"Analytics-driven asset management","volume":"55","author":"Hampapur","year":"2011","journal-title":"IBM J. Res. Dev."},{"key":"ref_7","doi-asserted-by":"crossref","first-page":"389","DOI":"10.1023\/A:1012487302797","article-title":"Gene selection for cancer classification using support vector machines","volume":"46","author":"Guyon","year":"2002","journal-title":"J. Mach. 
Learn."},{"key":"ref_8","first-page":"18","article-title":"Classification and regression by randomForest","volume":"2","author":"Liaw","year":"2002","journal-title":"R News."},{"key":"ref_9","doi-asserted-by":"crossref","unstructured":"Lima, F.D., Amaral, G.M., Leite, L.G., Gomes, J.P., and Machado, J.D. (2017, January 2\u20135). Predicting failures in hard drives with LSTM networks. Proceedings of the 2017 Brazilian Conference on Intelligent Systems (BRACIS), Uberlandia, Brazil.","DOI":"10.1109\/BRACIS.2017.72"},{"key":"ref_10","doi-asserted-by":"crossref","first-page":"1735","DOI":"10.1162\/neco.1997.9.8.1735","article-title":"Long short-term memory","volume":"9","author":"Hochreiter","year":"1997","journal-title":"Neural Comput."},{"key":"ref_11","doi-asserted-by":"crossref","unstructured":"Sainath, T.N., Vinyals, O., Senior, A., and Sak, H.S. (2015, January 19\u201324). Convolutional, long short-term memory, fully connected deep neural networks. Proceedings of the IEEE International Conference on Acoustics, Speech and Signal Processing (ICASSP), South Brisbane, Australia.","DOI":"10.1109\/ICASSP.2015.7178838"},{"key":"ref_12","doi-asserted-by":"crossref","first-page":"1263","DOI":"10.1109\/TKDE.2008.239","article-title":"Learning from imbalanced data","volume":"21","author":"He","year":"2009","journal-title":"IEEE Trans. Knowl. Data Eng."},{"key":"ref_13","doi-asserted-by":"crossref","unstructured":"Davis, J., and Goadrich, M. (2006, January 25\u201329). The relationship between Precision-Recall and ROC curves. Proceedings of the 23rd International Conference on Machine Learning, Pittsburgh, PA, USA.","DOI":"10.1145\/1143844.1143874"},{"key":"ref_14","doi-asserted-by":"crossref","first-page":"296","DOI":"10.1197\/jamia.M1733","article-title":"Agreement, the F-measure, and reliability in information retrieval","volume":"12","author":"Hripcsak","year":"2005","journal-title":"J. Am. Med. Inform. 
Assoc."},{"key":"ref_15","doi-asserted-by":"crossref","unstructured":"Boughorbel, S., Jarray, F., and El-Anbari, M. (2017). Optimal classifier for imbalanced data using matthews correlation coefficient metric. PLoS ONE, 12.","DOI":"10.1371\/journal.pone.0177678"},{"key":"ref_16","doi-asserted-by":"crossref","first-page":"861","DOI":"10.1016\/j.patrec.2005.10.010","article-title":"An introduction to ROC analysis","volume":"27","author":"Fawcett","year":"2006","journal-title":"Pattern Recognit. Lett."},{"key":"ref_17","first-page":"783","article-title":"Machine learning methods for predicting failures in hard drives: A multiple instance application","volume":"6","author":"Murray","year":"2005","journal-title":"J. Mach. Learn. Res."},{"key":"ref_18","doi-asserted-by":"crossref","first-page":"419","DOI":"10.1109\/TII.2013.2264060","article-title":"A two-step parametric method for failure prediction in hard disk drives","volume":"10","author":"Wang","year":"2014","journal-title":"IEEE Trans. Ind. Inform."},{"key":"ref_19","first-page":"2825","article-title":"Scikit-learn: Machine learning in Python","volume":"12","author":"Pedregosa","year":"2011","journal-title":"J. Mach. Learn. Res."},{"key":"ref_20","unstructured":"Abadi, M., Barham, P., Chen, J., Chen, Z., Davis, A., Dean, J., Devin, M., Ghemawat, S., Irving, G., and Isard, M. (2016, January 2\u20134). Tensorflow: A system for large-scale machine learning. 
Proceedings of the USENIX Symposium on Operating Systems Design and Implementation (OSDI), Savannah, GA, USA."}],"container-title":["Sensors"],"original-title":[],"language":"en","link":[{"URL":"https:\/\/www.mdpi.com\/1424-8220\/21\/18\/6229\/pdf","content-type":"unspecified","content-version":"vor","intended-application":"similarity-checking"}],"deposited":{"date-parts":[[2025,10,11]],"date-time":"2025-10-11T07:01:01Z","timestamp":1760166061000},"score":1,"resource":{"primary":{"URL":"https:\/\/www.mdpi.com\/1424-8220\/21\/18\/6229"}},"subtitle":[],"short-title":[],"issued":{"date-parts":[[2021,9,17]]},"references-count":20,"journal-issue":{"issue":"18","published-online":{"date-parts":[[2021,9]]}},"alternative-id":["s21186229"],"URL":"https:\/\/doi.org\/10.3390\/s21186229","relation":{},"ISSN":["1424-8220"],"issn-type":[{"value":"1424-8220","type":"electronic"}],"subject":[],"published":{"date-parts":[[2021,9,17]]}}}