{"status":"ok","message-type":"work","message-version":"1.0.0","message":{"indexed":{"date-parts":[[2025,10,12]],"date-time":"2025-10-12T03:31:35Z","timestamp":1760239895712,"version":"build-2065373602"},"reference-count":55,"publisher":"MDPI AG","issue":"1","license":[{"start":{"date-parts":[[2019,1,11]],"date-time":"2019-01-11T00:00:00Z","timestamp":1547164800000},"content-version":"vor","delay-in-days":0,"URL":"https:\/\/creativecommons.org\/licenses\/by\/4.0\/"}],"content-domain":{"domain":[],"crossmark-restriction":false},"short-container-title":["Information"],"abstract":"<jats:p>Health is an individual\u2019s most precious asset, and healthcare is one of the vehicles for preserving it. The Indian government\u2019s spending on the healthcare system is relatively low (1.2% of GDP). Consequently, secondary and tertiary government healthcare centers in India (which are presumed to have above-average ratings) are always crowded. In tertiary healthcare centers, such as the All India Institute of Medical Sciences (AIIMS), patients are often unable to describe their problems clearly enough to the healthcare center\u2019s reception staff to be directed to the correct healthcare department. In this paper, we propose a system that scans a patient\u2019s prescriptions, referral letters and medical diagnostic reports, processes the input using OCR (Optical Character Recognition) engines coupled with image processing tools, and directs the patient to the most relevant department. We have implemented and tested parts of this system: a patient enters their symptoms and\/or provisional diagnosis, and the system suggests a department based on this input. Our system suggests the correct department 70.19% of the time. On further investigation, we found that one particular department of the hospital was over-represented in the data. After we removed this department from the data, the performance of the system improved to 92.7%. 
Our system presently makes its suggestions using a random forest algorithm trained on two information repositories: symptoms and disease data, and a functional description of each medical department. We expect the performance of the system to improve further once we train it on medicine information and diagnostic imaging data and incorporate the complete medical history of the patient.<\/jats:p>","DOI":"10.3390\/info10010025","type":"journal-article","created":{"date-parts":[[2019,1,11]],"date-time":"2019-01-11T11:36:42Z","timestamp":1547206602000},"page":"25","update-policy":"https:\/\/doi.org\/10.3390\/mdpi_crossmark_policy","source":"Crossref","is-referenced-by-count":5,"title":["Identifying a Medical Department Based on Unstructured Data: A Big Data Application in Healthcare"],"prefix":"10.3390","volume":"10","author":[{"given":"Veena","family":"Bansal","sequence":"first","affiliation":[{"name":"Indian Institute of Technology Bhilai, Raipur-492015, India"}]},{"given":"Abhishek","family":"Poddar","sequence":"additional","affiliation":[{"name":"Indian Institute of Technology Kanpur, Kanpur-208016, India"}]},{"given":"R.","family":"Ghosh-Roy","sequence":"additional","affiliation":[{"name":"IBM United Kingdom Limited, London SE1 9PZ, UK"}]}],"member":"1968","published-online":{"date-parts":[[2019,1,11]]},"reference":[{"key":"ref_1","doi-asserted-by":"crossref","first-page":"1813","DOI":"10.1016\/S0140-6736(16)31467-2","article-title":"Measuring the health-related Sustainable Development Goals in 188 countries: A baseline analysis from the Global Burden of Disease Study 2015","volume":"388","author":"Murray","year":"2016","journal-title":"Lancet"},{"key":"ref_2","unstructured":"(2016, April 12). World Bank Report. 
Available online: http:\/\/data.worldbank.org\/indicator\/SH.XPD.TOTL.ZS."},{"key":"ref_3","first-page":"31","article-title":"Knowledge Management in ESMDA: Expert System for Medical Diagnostic Assistance","volume":"10","author":"Naser","year":"2010","journal-title":"ICGST-AIML J."},{"key":"ref_4","doi-asserted-by":"crossref","first-page":"793","DOI":"10.1016\/j.ijmedinf.2011.08.001","article-title":"Artificial intelligence techniques applied to the development of a decision\u2014Support system for diagnosing celiac disease","volume":"80","author":"Hummel","year":"2011","journal-title":"Int. J. Med. Inf."},{"key":"ref_5","doi-asserted-by":"crossref","unstructured":"Rahaman, S., and Hossain, M.S. (2013, January 17\u201318). A belief rule based clinical decision support system to assess suspicion of heart failure from signs, symptoms and risk factors. Proceedings of the International Conference on Informatics, Electronics and Vision (ICIEV), Dhaka, Bangladesh.","DOI":"10.1109\/ICIEV.2013.6572668"},{"key":"ref_6","first-page":"934","article-title":"Data Mining Model to Predict Fosamax Adverse Events","volume":"3","author":"Ibrahim","year":"2014","journal-title":"Int. J. Comput. Inf. Technol."},{"key":"ref_7","unstructured":"Northwestern University, Centre for Genetic Medicine, and University of Maryland School of Medicine Institute for Genome Sciences Doid-Non-Classified.obo, Format-Version: 1.2, Available online: http:\/\/www.disease-ontology.org\/."},{"key":"ref_8","first-page":"17","article-title":"A Neuro Fuzzy Expert System for Heart Disease Diagnosis","volume":"2","author":"Ephzibah","year":"2012","journal-title":"Comput. Sci. Eng."},{"key":"ref_9","first-page":"84","article-title":"Improving the Prediction Rate of Diabetes using Fuzzy Expert System","volume":"7","author":"Jain","year":"2015","journal-title":"J. Inf. Technol. Comput. 
Sci."},{"key":"ref_10","doi-asserted-by":"crossref","first-page":"35","DOI":"10.1016\/j.ijmedinf.2017.02.014","article-title":"A web-based clinical decision support system for gestational diabetes: Automatic diet prescription and detection of insulin needs","volume":"102","author":"Rigla","year":"2017","journal-title":"Int. J. Med. Inform."},{"key":"ref_11","doi-asserted-by":"crossref","first-page":"776","DOI":"10.1109\/51.473274","article-title":"An expert system for monitoring psychiatric treatment","volume":"15","author":"Goethe","year":"1995","journal-title":"IEEE Eng. Med. Biol."},{"key":"ref_12","doi-asserted-by":"crossref","first-page":"1","DOI":"10.1016\/j.ijmedinf.2016.06.007","article-title":"Using machine learning to support healthcare professionals in making preauthorisation decisions","volume":"94","author":"Santana","year":"2016","journal-title":"Int. J. Med. Inform."},{"key":"ref_13","doi-asserted-by":"crossref","first-page":"203","DOI":"10.1016\/j.knosys.2015.04.012","article-title":"Supporting healthcare management decisions via robust clustering of event logs","volume":"84","author":"Delias","year":"2015","journal-title":"Knowl.-Based Syst."},{"key":"ref_14","doi-asserted-by":"crossref","first-page":"89","DOI":"10.1016\/S0957-4174(94)E0001-B","article-title":"An Expert System for Homeopathic Glaucoma Treatment (SEHO)","volume":"8","author":"Perez","year":"1995","journal-title":"Expert Syst. Appl."},{"key":"ref_15","unstructured":"McAndrew, P.D., Potash, D.L., Higgins, B., Wayand, J., and Held, J. (1996). Expert System for Providing Interactive Assistance in Solving Problems Such as Health Care Management. (5,517,405), U.S. Patent."},{"key":"ref_16","unstructured":"Davenport, T.H. (2014). Big Data at Work: Dispelling the Myths, Uncovering the Opportunities, Harvard Business Review Press."},{"key":"ref_17","doi-asserted-by":"crossref","unstructured":"Aruna Sri, P.S.G., and Anusha, M. (2016). Big Data Survey. Indones. J. Electr. Eng. Inform. 
IJEEI, 74\u201380.","DOI":"10.11591\/ijeei.v4i1.195"},{"key":"ref_18","doi-asserted-by":"crossref","unstructured":"Schultz, T. (2013). Turning healthcare challenges into big data opportunities: A use-case review across the pharmaceutical development lifecycle. Bull. Assoc. Inf. Sci. Technol.","DOI":"10.1002\/bult.2013.1720390508"},{"key":"ref_19","doi-asserted-by":"crossref","first-page":"1533","DOI":"10.1016\/j.protcy.2014.10.175","article-title":"Towards a Big Data Framework for the Prevention and Control of HIV\/AIDS, TB and Silicosis in the Mining Industry","volume":"16","author":"Jokonya","year":"2014","journal-title":"Procedia Technol."},{"key":"ref_20","doi-asserted-by":"crossref","first-page":"203","DOI":"10.1016\/j.procs.2015.04.069","article-title":"Predictive Methodology for Diabetic Data Analysis in Big Data","volume":"50","author":"Kumar","year":"2015","journal-title":"Procedia Comput. Sci."},{"key":"ref_21","doi-asserted-by":"crossref","first-page":"24","DOI":"10.1016\/j.pec.2005.10.006","article-title":"Patients using the Internet to obtain health information: How this affects the patient-health professional relationship","volume":"63","author":"McMullan","year":"2006","journal-title":"Patient Educ. Couns."},{"key":"ref_22","first-page":"280","article-title":"Managing patient demand: A qualitative study of appointment making in general practice","volume":"51","author":"Gallagher","year":"2001","journal-title":"Br. J. Gen. Pract."},{"key":"ref_23","unstructured":"Busemann, S., Schmeier, S., and Arens, R.G. (May, January 29). Message classification in the call center. Proceedings of the Sixth Conference on Applied Natural Language Processing, Seattle, WA, USA."},{"key":"ref_24","doi-asserted-by":"crossref","first-page":"513","DOI":"10.1016\/0306-4573(88)90021-0","article-title":"Term Weighting Approaches in Automatic Text Retrieval","volume":"24","author":"Salton","year":"1988","journal-title":"Inf. Process. 
Manag."},{"key":"ref_25","doi-asserted-by":"crossref","first-page":"1","DOI":"10.1145\/505282.505283","article-title":"Machine Learning in Automated Text Categorization","volume":"34","author":"Sebastiani","year":"2002","journal-title":"Comput. Surv."},{"key":"ref_26","unstructured":"Jing, L., Huang, H., and Shi, H. (2002, January 4\u20135). Improved feature selection approach TFIDF in text mining. Proceedings of the International Conference on Machine Learning and Cybernetics, Beijing, China."},{"key":"ref_27","doi-asserted-by":"crossref","first-page":"81","DOI":"10.1007\/978-3-540-45219-5_7","article-title":"Sebastiani, Supervised Term Weighting for Automated Text Categorization","volume":"Volume 138","author":"Sirmakessis","year":"2004","journal-title":"Text Mining and Its Applications. Studies in Fuzziness and Soft Computing"},{"key":"ref_28","doi-asserted-by":"crossref","first-page":"660","DOI":"10.1109\/21.97458","article-title":"A survey of decision tree classifier methodology","volume":"21","author":"Safavian","year":"1991","journal-title":"IEEE Trans. Syst. Man Cybern."},{"key":"ref_29","doi-asserted-by":"crossref","first-page":"3","DOI":"10.1109\/101.8118","article-title":"Artificial neural networks","volume":"4","author":"Hopfield","year":"1988","journal-title":"IEEE Circuits Devices Mag."},{"key":"ref_30","doi-asserted-by":"crossref","first-page":"1189","DOI":"10.1214\/aos\/1013203451","article-title":"Greedy function approximation: A gradient boosting machine","volume":"29","author":"Friedman","year":"2001","journal-title":"Ann. Stat."},{"key":"ref_31","doi-asserted-by":"crossref","first-page":"5","DOI":"10.1023\/A:1010933404324","article-title":"Random forests","volume":"45","author":"Breiman","year":"2001","journal-title":"Mach. Learn."},{"key":"ref_32","unstructured":"Gopal, M. (2018). 
Applied Machine Learning, McGraw Hill."},{"key":"ref_33","doi-asserted-by":"crossref","first-page":"1109","DOI":"10.1016\/j.neuroimage.2010.12.066","article-title":"Semi-supervised pattern classification of medical images: Application to mild cognitive impairment (MCI)","volume":"55","author":"Filipovych","year":"2011","journal-title":"NeuroImage"},{"key":"ref_34","doi-asserted-by":"crossref","first-page":"100","DOI":"10.1016\/j.chemolab.2006.08.012","article-title":"Using hard and soft models for classification of medical images","volume":"88","author":"Kucheryavski","year":"2007","journal-title":"Chemom. Intell. Lab. Syst."},{"key":"ref_35","unstructured":"Antonie, L., Zaiane, O.R., and Coman, A. (2001, January 26). Application of Data Mining Techniques for Medical Image Classification. Proceedings of the Second International Conference on Multimedia Data Mining in Conjunction with ACM SIGKDD Conference, San Francisco, CA, USA."},{"key":"ref_36","doi-asserted-by":"crossref","first-page":"500","DOI":"10.1109\/3468.852443","article-title":"Integrating knowledge sources in Devanagari text recognition system","volume":"30","author":"Bansal","year":"2000","journal-title":"IEEE Trans. Syst. Man Cybern.-Part A Syst. Hum."},{"key":"ref_37","doi-asserted-by":"crossref","unstructured":"Ciregan, D., Meier, U., and Schmidhuber, J. (2012, January 16\u201321). Multi-column deep neural networks for image classification. Proceedings of the 25th IEEE Conference on Computer Vision and Pattern Recognition (CVPR), Washington, DC, USA.","DOI":"10.1109\/CVPR.2012.6248110"},{"key":"ref_38","first-page":"208","article-title":"Handwritten Character Recognition Using Multiresolution Technique and Euclidean Distance Metric","volume":"3","author":"Patel","year":"2012","journal-title":"J. Signal Inf. Process."},{"key":"ref_39","unstructured":"Ullman, J., and Rajaraman, A. (2019, January 11). Mining of Massive Datasets. 
Available online: http:\/\/infolab.stanford.edu\/~ullman\/mmds\/book.pdf."},{"key":"ref_40","unstructured":"Lovins, J.B. (1968). Development of a Stemming Algorithm, Mechanical Translation and Computational Linguistics, Defense Technical Information Center. 11(1 and 2)."},{"key":"ref_41","first-page":"1157","article-title":"An Introduction to Feature Extraction","volume":"3","author":"Guyon","year":"2003","journal-title":"J. Mach. Learn. Res."},{"key":"ref_42","doi-asserted-by":"crossref","unstructured":"Ripley, B.D. (1996). Pattern Recognition and Neural Networks, Cambridge University Press.","DOI":"10.1017\/CBO9780511812651"},{"key":"ref_43","first-page":"1","article-title":"Survey on Multiclass Classification Methods","volume":"19","author":"Aly","year":"2005","journal-title":"Neural Netw."},{"key":"ref_44","unstructured":"Click, C., Malohlava, M., Candel, A., Roark, H., and Parmar, V. (2015). Gradient Boosting Machine with H2O, H2O.ai, Inc. Available online: https:\/\/h2o-release.s3.amazonaws.com\/h2o\/master\/3157\/docs-website\/h2o-docs\/booklets\/GBM_Vignette.pdf."},{"key":"ref_45","unstructured":"Candel, A., Parmar, V., Ledell, E., and Arora, A. (2019, January 10). Deep Learning with H2O. Available online: http:\/\/h2o.ai\/resources."},{"key":"ref_46","unstructured":"(2019, January 11). H2O, (10 January 2016). Available online: http:\/\/docs.h2o.ai\/h2o\/latest-stable\/h2o-docs\/data-science\/drf.html."},{"key":"ref_47","unstructured":"Collier, A.B. (2019, January 11). Making Sense of Logarithmic Loss. Available online: https:\/\/datawookie.netlify.com\/blog\/2015\/12\/making-sense-of-logarithmic-loss\/."},{"key":"ref_48","unstructured":"Henderson, R. (2019, January 11). Available online: http:\/\/www.netdoctor.co.uk\/health-services\/nhs\/a4502\/a-to-z-of-hospital-departments\/."},{"key":"ref_49","unstructured":"(2016, January 10). Mayoclinic. 
Available online: http:\/\/www.mayoclinic.org\/departments-centers\/index."},{"key":"ref_50","unstructured":"Kalman, B.L., and Kwasny, S.C. (1992, January 7\u201311). Why tanh: Choosing a sigmoidal function. Proceedings of the International Joint Conference on Neural Networks, Baltimore, MD, USA."},{"key":"ref_51","doi-asserted-by":"crossref","first-page":"947","DOI":"10.1038\/35016072","article-title":"Digital selection and analogue amplification coexist in a cortex-inspired silicon circuit","volume":"405","author":"Hahnloser","year":"2000","journal-title":"Nature"},{"key":"ref_52","unstructured":"Goodfellow, I.J., Warde-Farley, D., Mirza, M., Courville, A., and Bengio, Y. (2013, January 16\u201321). Maxout networks. Proceedings of the 30th International Conference on Machine Learning, Atlanta, GA, USA."},{"key":"ref_53","first-page":"1929","article-title":"Dropout: A Simple Way to Prevent Neural Networks from Overfitting","volume":"15","author":"Srivastava","year":"2014","journal-title":"J. Mach. Learn. Res."},{"key":"ref_54","doi-asserted-by":"crossref","first-page":"73","DOI":"10.7326\/M14-2423","article-title":"TRIPOD: A New Reporting Baseline for Developing and Interpreting Prediction Models","volume":"162","author":"Collins","year":"2015","journal-title":"Ann. Intern. Med."},{"key":"ref_55","doi-asserted-by":"crossref","first-page":"217","DOI":"10.1080\/01431160412331269698","article-title":"Random forest classifier for remote sensing classification","volume":"26","author":"Pal","year":"2005","journal-title":"Int. J. 
Remote Sens."}],"container-title":["Information"],"original-title":[],"language":"en","link":[{"URL":"https:\/\/www.mdpi.com\/2078-2489\/10\/1\/25\/pdf","content-type":"unspecified","content-version":"vor","intended-application":"similarity-checking"}],"deposited":{"date-parts":[[2025,10,11]],"date-time":"2025-10-11T12:25:26Z","timestamp":1760185526000},"score":1,"resource":{"primary":{"URL":"https:\/\/www.mdpi.com\/2078-2489\/10\/1\/25"}},"subtitle":[],"short-title":[],"issued":{"date-parts":[[2019,1,11]]},"references-count":55,"journal-issue":{"issue":"1","published-online":{"date-parts":[[2019,1]]}},"alternative-id":["info10010025"],"URL":"https:\/\/doi.org\/10.3390\/info10010025","relation":{},"ISSN":["2078-2489"],"issn-type":[{"type":"electronic","value":"2078-2489"}],"subject":[],"published":{"date-parts":[[2019,1,11]]}}}