{"status":"ok","message-type":"work","message-version":"1.0.0","message":{"indexed":{"date-parts":[[2026,1,13]],"date-time":"2026-01-13T07:28:57Z","timestamp":1768289337167,"version":"3.49.0"},"reference-count":34,"publisher":"Springer Science and Business Media LLC","issue":"1","license":[{"start":{"date-parts":[[2024,5,16]],"date-time":"2024-05-16T00:00:00Z","timestamp":1715817600000},"content-version":"tdm","delay-in-days":0,"URL":"https:\/\/creativecommons.org\/licenses\/by\/4.0"},{"start":{"date-parts":[[2024,5,16]],"date-time":"2024-05-16T00:00:00Z","timestamp":1715817600000},"content-version":"vor","delay-in-days":0,"URL":"https:\/\/creativecommons.org\/licenses\/by\/4.0"}],"content-domain":{"domain":["link.springer.com"],"crossmark-restriction":false},"short-container-title":["BMC Med Inform Decis Mak"],"abstract":"<jats:title>Abstract<\/jats:title><jats:sec>\n                <jats:title>Background<\/jats:title>\n                <jats:p>Medical records are a valuable source for understanding patient health conditions. Doctors often use these records to assess health without solely depending on time-consuming and complex examinations. However, these records may not always be directly relevant to a patient\u2019s current health issue. For instance, information about common colds may not be relevant to a more specific health condition. While experienced doctors can effectively navigate through unnecessary details in medical records, this excess information presents a challenge for machine learning models in predicting diseases electronically. To address this, we have developed \u2018al-BERT\u2019, a new disease prediction model that leverages the BERT framework. This model is designed to identify crucial information from medical records and use it to predict diseases. \u2018al-BERT\u2019 operates on the principle that the structure of sentences in diagnostic records is similar to regular linguistic patterns. 
However, just as stuttering in speech can introduce \u2018noise\u2019 or irrelevant information, similar issues can arise in written records, complicating model training. To overcome this, \u2018al-BERT\u2019 incorporates a semi-supervised layer that filters out irrelevant data from patient visitation records. This process aims to refine the data, resulting in more reliable indicators for disease correlations\u00a0and enhancing the model\u2019s predictive accuracy and utility in medical diagnostics.<\/jats:p>\n              <\/jats:sec><jats:sec>\n                <jats:title>Method<\/jats:title>\n                <jats:p>To discern noise diseases within patient records, especially those resembling influenza-like illnesses, our approach employs a customized semi-supervised learning algorithm equipped with a focused attention mechanism. This mechanism is specifically calibrated to enhance the model\u2019s sensitivity to chronic conditions while concurrently distilling salient features from patient records, thereby augmenting the predictive accuracy and utility of the model in clinical settings. We evaluate the performance of al-BERT using real-world health insurance data provided by Taiwan\u2019s National Health Insurance.<\/jats:p>\n              <\/jats:sec><jats:sec>\n                <jats:title>Result<\/jats:title>\n                <jats:p>In our study, we evaluated our model against two others: one based on BERT that uses complete disease records, and another variant that includes extra filtering techniques. Our findings show that models incorporating filtering mechanisms typically perform better than those using the entire, unfiltered dataset. Our approach resulted in improved outcomes across several key measures: AUC-ROC (an indicator of a model\u2019s ability to distinguish between classes), precision (the accuracy of positive predictions), recall (the model\u2019s ability to find all relevant cases), and overall accuracy. 
Most notably, our model showed a 15% improvement in recall compared to the current best-performing method in the field of disease prediction.<\/jats:p>\n              <\/jats:sec><jats:sec>\n                <jats:title>Conclusion<\/jats:title>\n                <jats:p>The conducted ablation study affirms the advantages of our attention mechanism and underscores the crucial role of the selection module within al-BERT.<\/jats:p>\n              <\/jats:sec>","DOI":"10.1186\/s12911-024-02528-w","type":"journal-article","created":{"date-parts":[[2024,5,16]],"date-time":"2024-05-16T05:01:32Z","timestamp":1715835692000},"update-policy":"https:\/\/doi.org\/10.1007\/springer_crossmark_policy","source":"Crossref","is-referenced-by-count":8,"title":["al-BERT: a semi-supervised denoising technique for disease prediction"],"prefix":"10.1186","volume":"24","author":[{"given":"Yun-Chien","family":"Tseng","sequence":"first","affiliation":[]},{"given":"Chuan-Wei","family":"Kuo","sequence":"additional","affiliation":[]},{"given":"Wen-Chih","family":"Peng","sequence":"additional","affiliation":[]},{"given":"Chih-Chieh","family":"Hung","sequence":"additional","affiliation":[]}],"member":"297","published-online":{"date-parts":[[2024,5,16]]},"reference":[{"issue":"1","key":"2528_CR1","doi-asserted-by":"publisher","first-page":"1","DOI":"10.1038\/s41746-021-00455-y","volume":"4","author":"L Rasmy","year":"2021","unstructured":"Rasmy L, Xiang Y, Xie Z, Tao C, Zhi D. Med-BERT: pretrained contextualized embeddings on large-scale structured electronic health records for disease prediction. NPJ Digit Med. 2021;4(1):1\u201313.","journal-title":"NPJ Digit Med."},{"issue":"10","key":"2528_CR2","doi-asserted-by":"publisher","first-page":"108103","DOI":"10.1103\/PhysRevLett.90.108103","volume":"90","author":"ACC Yang","year":"2003","unstructured":"Yang ACC, Hseu SS, Yien HW, Goldberger AL, Peng CK. Linguistic analysis of the human heartbeat using frequency and rank order statistics. Phys Rev Lett. 
2003;90(10):108103.","journal-title":"Phys Rev Lett."},{"issue":"2","key":"2528_CR3","doi-asserted-by":"publisher","first-page":"207","DOI":"10.3201\/eid1302.060557","volume":"13","author":"N Marsden-Haug","year":"2007","unstructured":"Marsden-Haug N, Foster VB, Gould PL, Elbert E, Wang H, Pavlin JA. Code-based syndromic surveillance for influenzalike illness by International Classification of Diseases, Ninth Revision. Emerg Infect Dis. 2007;13(2):207.","journal-title":"Emerg Infect Dis."},{"key":"2528_CR4","doi-asserted-by":"publisher","first-page":"103256","DOI":"10.1016\/j.jbi.2019.103256","volume":"97","author":"A Ashfaq","year":"2019","unstructured":"Ashfaq A, Sant\u2019Anna A, Lingman M, Nowaczyk S. Readmission prediction using deep learning on electronic health records. J Biomed Inform. 2019;97:103256.","journal-title":"J Biomed Inform."},{"issue":"15","key":"2528_CR5","doi-asserted-by":"publisher","first-page":"1688","DOI":"10.1001\/jama.2011.1515","volume":"306","author":"D Kansagara","year":"2011","unstructured":"Kansagara D, Englander H, Salanitro A, Kagen D, Theobald C, Freeman M, et al. Risk prediction models for hospital readmission: a systematic review. JAMA. 2011;306(15):1688\u201398.","journal-title":"JAMA."},{"issue":"2","key":"2528_CR6","doi-asserted-by":"publisher","first-page":"105","DOI":"10.1177\/0951484817696212","volume":"30","author":"A Awad","year":"2017","unstructured":"Awad A, Bader-El-Den M, McNicholas J. Patient length of stay and mortality prediction: a survey. Health Serv Manag Res. 2017;30(2):105\u201320.","journal-title":"Health Serv Manag Res."},{"key":"2528_CR7","doi-asserted-by":"crossref","unstructured":"Dan T, Li Y, Zhu Z, Chen X, Quan W, Hu Y, Tao G, Zhu L, Zhu J, Jin Y, Li L. Machine learning to predict ICU admission, ICU mortality and survivors\u2019 length of stay among COVID-19 patients: toward optimal allocation of ICU resources. In: 2020 IEEE international conference on bioinformatics and biomedicine (BIBM). 
IEEE; 2021. p. 555\u201361.","DOI":"10.2139\/ssrn.3631305"},{"key":"2528_CR8","doi-asserted-by":"crossref","unstructured":"Shang J, Ma T, Xiao C, Sun J. Pre-training of graph augmented transformers for medication recommendation. In: 28th International Joint Conference on Artificial Intelligence, IJCAI. International Joint Conferences on Artificial Intelligence; 2019. p. 5953\u20139.","DOI":"10.24963\/ijcai.2019\/825"},{"key":"2528_CR9","doi-asserted-by":"crossref","unstructured":"Yang C, Xiao C, Glass L, Sun J. Change matters: Medication change prediction with recurrent residual networks. In: 30th International Joint Conference on Artificial Intelligence, IJCAI. International Joint Conferences on Artificial Intelligence; 2021. p. 3728\u201334.","DOI":"10.24963\/ijcai.2021\/513"},{"key":"2528_CR10","doi-asserted-by":"crossref","unstructured":"Fu T, Xiao C, Qian C, Glass LM, Sun J. Probabilistic and dynamic molecule-disease interaction modeling for drug discovery. In: Proceedings of the 27th ACM SIGKDD conference on knowledge discovery & data mining. 2021. p. 404\u201314.","DOI":"10.1145\/3447548.3467286"},{"key":"2528_CR11","doi-asserted-by":"crossref","unstructured":"Lin X, Quan Z, Wang ZJ, Ma T, Zeng X. KGNN: Knowledge Graph Neural Network for Drug-Drug Interaction Prediction. In: IJCAI, vol. 380. 2020. p. 2739\u201345.","DOI":"10.24963\/ijcai.2020\/380"},{"key":"2528_CR12","doi-asserted-by":"crossref","unstructured":"Huang K, Xiao C, Hoang T, Glass L, Sun J. Caster: Predicting drug interactions with chemical substructure representation. In: Proceedings of the AAAI conference on artificial intelligence, vol. 34, No.\u00a001. 2020. p. 702\u20139.","DOI":"10.1609\/aaai.v34i01.5412"},{"key":"2528_CR13","doi-asserted-by":"crossref","unstructured":"Cui L, Biswal S, Glass LM, Lever G, Sun J, Xiao C. CONAN: complementary pattern augmentation for rare disease detection. In: Proceedings of the AAAI conference on artificial intelligence, vol. 34, No. 01. 2020. p. 
614\u201321.","DOI":"10.1609\/aaai.v34i01.5401"},{"key":"2528_CR14","unstructured":"Choi E, Bahadori MT, Schuetz A, Stewart WF, Sun J. Doctor ai: Predicting clinical events via recurrent neural networks. In: Machine learning for healthcare conference. PMLR; 2016. p. 301\u201318."},{"key":"2528_CR15","unstructured":"Choi E, Bahadori MT, Sun J, Kulas J, Schuetz A, Stewart W. Retain: An interpretable predictive model for healthcare using reverse time attention mechanism. Adv Neural Inf Process Syst. 2016;29. https:\/\/proceedings.neurips.cc\/paper_files\/paper\/2016."},{"key":"2528_CR16","doi-asserted-by":"crossref","unstructured":"Ma F, Chitta R, Zhou J, You Q, Sun T, Gao J. Dipole: Diagnosis prediction in healthcare via attention-based bidirectional recurrent neural networks. In: Proceedings of the 23rd ACM SIGKDD international conference on knowledge discovery and data mining. 2017. p. 1903\u201311.","DOI":"10.1145\/3097983.3098088"},{"issue":"9","key":"2528_CR17","doi-asserted-by":"publisher","first-page":"2207","DOI":"10.1109\/TMI.2022.3159264","volume":"41","author":"S Zheng","year":"2022","unstructured":"Zheng S, Zhu Z, Liu Z, Guo Z, Liu Y, Yang Y, et al. Multi-modal graph learning for disease prediction. IEEE Trans Med Imaging. 2022;41(9):2207\u201316.","journal-title":"IEEE Trans Med Imaging."},{"key":"2528_CR18","doi-asserted-by":"crossref","unstructured":"Cui S, Luo J, Ye M, Wang J, Wang T, Ma F. Medskim: Denoised health risk prediction via skimming medical claims data. In: 2022 IEEE International Conference on Data Mining (ICDM). IEEE; 2022. p. 81\u201390.","DOI":"10.1109\/ICDM54844.2022.00018"},{"key":"2528_CR19","doi-asserted-by":"crossref","unstructured":"Choi E, Xu Z, Li Y, Dusenberry M, Flores G, Xue E, Dai A. Learning the graphical structure of electronic health records with graph convolutional transformer. In: Proceedings of the AAAI conference on artificial intelligence, vol. 34, No. 01. 2020. p. 
606\u201313.","DOI":"10.1609\/aaai.v34i01.5400"},{"key":"2528_CR20","unstructured":"Vaswani A, Shazeer N, Parmar N, Uszkoreit J, Jones L, Gomez AN, et al. Attention is all you need. Adv Neural Inf Process Syst. 2017;30. https:\/\/papers.nips.cc\/paper_files\/paper\/2017."},{"key":"2528_CR21","doi-asserted-by":"crossref","unstructured":"Choi E, Bahadori MT, Searles E, Coffey C, Thompson M, Bost J, Tejedor-Sojo J, Sun J. Multi-layer representation learning for medical concepts. In: Proceedings of the 22nd ACM SIGKDD international conference on knowledge discovery and data mining. 2016. p. 1495\u2013504.","DOI":"10.1145\/2939672.2939823"},{"key":"2528_CR22","unstructured":"Mikolov T, Chen K, Corrado G, Dean J. Efficient estimation of word representations in vector space. 2013. arXiv preprint arXiv:1301.3781."},{"key":"2528_CR23","doi-asserted-by":"crossref","unstructured":"Beam AL, Kompa B, Schmaltz A, Fried I, Weber G, Palmer N, Shi X, Cai T, Kohane IS. Clinical concept embeddings learned from massive sources of multimodal medical data. In: Pacific Symposium on Biocomputing 2020. 2019. p. 295\u2013306.","DOI":"10.1142\/9789811215636_0027"},{"issue":"1","key":"2528_CR24","doi-asserted-by":"publisher","first-page":"210","DOI":"10.1186\/s12859-022-04751-6","volume":"23","author":"S Raza","year":"2022","unstructured":"Raza S, Schwartz B, Rosella LC. CoQUAD: a COVID-19 question answering dataset system, facilitating research, benchmarking, and practice. BMC Bioinformatics. 2022;23(1):210.","journal-title":"BMC Bioinformatics."},{"issue":"4","key":"2528_CR25","doi-asserted-by":"publisher","first-page":"357","DOI":"10.1370\/afm.983","volume":"7","author":"JM Valderas","year":"2009","unstructured":"Valderas JM, Starfield B, Sibbald B, Salisbury C, Roland M. Defining comorbidity: implications for understanding health and health services. Ann Fam Med. 
2009;7(4):357\u201363.","journal-title":"Ann Fam Med."},{"issue":"3\u20134","key":"2528_CR26","doi-asserted-by":"publisher","first-page":"473","DOI":"10.1016\/S0378-4371(03)00622-8","volume":"329","author":"ACC Yang","year":"2003","unstructured":"Yang ACC, Peng CK, Yien HW, Goldberger AL. Information categorization approach to literary authorship disputes. Phys A Stat Mech Appl. 2003;329(3\u20134):473\u201383.","journal-title":"Phys A Stat Mech Appl."},{"key":"2528_CR27","unstructured":"Yang ACC. Comorbidity Analysis Platform. 2021. https:\/\/dmc.nycu.edu.tw\/comorbidity\/index.php."},{"key":"2528_CR28","unstructured":"Devlin J, Chang MW, Lee K, Toutanova K. Bert: Pre-training of deep bidirectional transformers for language understanding. 2018. arXiv preprint arXiv:1810.04805."},{"key":"2528_CR29","doi-asserted-by":"crossref","unstructured":"Johnson AE, Pollard TJ, Shen L, Lehman LwH, Feng M, Ghassemi M, et\u00a0al. MIMIC-III, a freely accessible critical care database. Sci Data. 2016;3(1):1\u20139.","DOI":"10.1038\/sdata.2016.35"},{"key":"2528_CR30","doi-asserted-by":"crossref","unstructured":"Liu WC, Hung CC, Peng WC. Exploring Graph Neural Network in Administrative Medical Dataset. In: Proceedings - 2022 International Conference on Technologies and Applications of Artificial Intelligence, TAAI. 2022. p. 107\u201312.","DOI":"10.1109\/TAAI57707.2022.00028"},{"issue":"8","key":"2528_CR31","doi-asserted-by":"publisher","first-page":"1735","DOI":"10.1162\/neco.1997.9.8.1735","volume":"9","author":"S Hochreiter","year":"1997","unstructured":"Hochreiter S, Schmidhuber J. Long short-term memory. Neural Comput. 1997;9(8):1735\u201380.","journal-title":"Neural Comput."},{"issue":"1","key":"2528_CR32","doi-asserted-by":"publisher","first-page":"5979","DOI":"10.1038\/s41598-022-09954-8","volume":"12","author":"SA Hicks","year":"2022","unstructured":"Hicks SA, Str\u00fcmke I, Thambawita V, Hammou M, Riegler MA, Halvorsen P, et al. 
On evaluation metrics for medical applications of artificial intelligence. Sci Rep. 2022;12(1):5979.","journal-title":"Sci Rep."},{"key":"2528_CR33","doi-asserted-by":"publisher","unstructured":"Vig J. A multiscale visualization of attention in the transformer model. Florence: Association for Computational Linguistics; 2019. https:\/\/doi.org\/10.18653\/v1\/P19-3007. https:\/\/www.aclweb.org\/anthology\/P19-3007.","DOI":"10.18653\/v1\/P19-3007"},{"key":"2528_CR34","doi-asserted-by":"publisher","DOI":"10.1089\/wound.2013.0478","volume-title":"ICD-9-CM to ICD-10-CM codes: what? why? how?","author":"DJ Cartwright","year":"2013","unstructured":"Cartwright DJ. ICD-9-CM to ICD-10-CM codes: what? why? how? New Rochelle: Mary Ann Liebert, Inc.; 2013."}],"container-title":["BMC Medical Informatics and Decision Making"],"original-title":[],"language":"en","link":[{"URL":"https:\/\/link.springer.com\/content\/pdf\/10.1186\/s12911-024-02528-w.pdf","content-type":"application\/pdf","content-version":"vor","intended-application":"text-mining"},{"URL":"https:\/\/link.springer.com\/article\/10.1186\/s12911-024-02528-w\/fulltext.html","content-type":"text\/html","content-version":"vor","intended-application":"text-mining"},{"URL":"https:\/\/link.springer.com\/content\/pdf\/10.1186\/s12911-024-02528-w.pdf","content-type":"application\/pdf","content-version":"vor","intended-application":"similarity-checking"}],"deposited":{"date-parts":[[2024,5,16]],"date-time":"2024-05-16T05:02:36Z","timestamp":1715835756000},"score":1,"resource":{"primary":{"URL":"https:\/\/bmcmedinformdecismak.biomedcentral.com\/articles\/10.1186\/s12911-024-02528-w"}},"subtitle":[],"short-title":[],"issued":{"date-parts":[[2024,5,16]]},"references-count":34,"journal-issue":{"issue":"1","published-online":{"date-parts":[[2024,12]]}},"alternative-id":["2528"],"URL":"https:\/\/doi.org\/10.1186\/s12911-024-02528-w","relation":{},"ISSN":["1472-6947"],"issn-type":[{"value":"1472-6947","type":"electronic"}],"subject":[],
"published":{"date-parts":[[2024,5,16]]},"assertion":[{"value":"31 January 2024","order":1,"name":"received","label":"Received","group":{"name":"ArticleHistory","label":"Article History"}},{"value":"6 May 2024","order":2,"name":"accepted","label":"Accepted","group":{"name":"ArticleHistory","label":"Article History"}},{"value":"16 May 2024","order":3,"name":"first_online","label":"First Online","group":{"name":"ArticleHistory","label":"Article History"}},{"order":1,"name":"Ethics","group":{"name":"EthicsHeading","label":"Declarations"}},{"value":"There are two datasets in our experiments. NHI-CD belongs to the National Health Insurance Research Database, Taiwan. This study is based in part on data from the National Health Insurance Research Database provided by the National Health Insurance Administration, Ministry of Health and Welfare and managed by National Health Research Institutes in Taiwan. The interpretation and conclusions contained herein do not represent those of National Health Insurance Administration, Ministry of Health and Welfare or National Health Research Institutes. MIMIC-III is available on PhysioNet; only credentialed users who sign the DUA can access the files. For more information: Johnson, A., Pollard, T., & Mark, R. (2016). MIMIC-III Clinical Database (version 1.4). PhysioNet.","order":2,"name":"Ethics","group":{"name":"EthicsHeading","label":"Ethics approval and consent to participate"}},{"value":"Not applicable.","order":3,"name":"Ethics","group":{"name":"EthicsHeading","label":"Consent for publication"}},{"value":"The authors declare no competing interests.","order":4,"name":"Ethics","group":{"name":"EthicsHeading","label":"Competing interests"}}],"article-number":"127"}}