{"status":"ok","message-type":"work","message-version":"1.0.0","message":{"indexed":{"date-parts":[[2025,3,12]],"date-time":"2025-03-12T04:25:10Z","timestamp":1741753510048,"version":"3.38.0"},"reference-count":25,"publisher":"SAGE Publications","issue":"1","content-domain":{"domain":[],"crossmark-restriction":false},"short-container-title":["IDT"],"published-print":{"date-parts":[[2024,2,20]]},"abstract":"<jats:p>A promoter is a short stretch of DNA (100\u20131,000 bp) where RNA polymerase starts to transcribe a gene. A DNA (deoxyribonucleic acid) base pair is a fundamental unit of DNA structure and represents the pairing of two complementary nucleotide bases within the DNA double helix. The four DNA nucleotide bases are adenine (A), thymine (T), cytosine (C), and guanine (G). DNA base pairs are the building blocks of the DNA molecule, and their complementary pairing is central to the storage and transmission of genetic information in all living organisms. Normally, a promoter is found at the 5\u2032 end of the transcription initiation site or immediately upstream. Numerous human disorders, particularly diabetes, cancer, and Huntington\u2019s disease, have been shown to have DNA promoters as their root cause. The scientific community has long been interested in learning crucial information about protein-coding genes. Finding the promoters is therefore the first step in finding genes in DNA sequences. Consequently, identifying promoters has emerged as an intriguing challenge that has caught the interest of numerous researchers in the field of bioinformatics. We propose Gaussian Decision Boundary Estimation in machine learning models to detect transcription start sites (promoters) in the DNA sequences of a common bacterium, Escherichia coli. 
The best features are identified through a score-based function that selects the nucleotides directly responsible for promoter recognition, in order to maximise the models\u2019 performance. The support-vector-machine model based on Gaussian Decision Boundary Estimation is trained on these features and finds the best hyperplane that separates the data into different classes. In this study, promoter regions were identified with a high accuracy of 99.9%, which is better than other state-of-the-art algorithms. Another major emphasis of this paper is the comparison of machine learning classification models to identify the one that most accurately predicts promoters in DNA sequences. It provides analysis for further biological research as well as precision medicine.<\/jats:p>","DOI":"10.3233\/idt-230283","type":"journal-article","created":{"date-parts":[[2023,12,29]],"date-time":"2023-12-29T16:54:21Z","timestamp":1703868861000},"page":"613-631","source":"Crossref","is-referenced-by-count":0,"title":["Classifying promoters by interpreting the hidden information of DNA sequences for disease prediction in clinical laboratories using Gaussian decision boundary estimation"],"prefix":"10.1177","volume":"18","author":[{"given":"Pradeepa","family":"S","sequence":"first","affiliation":[{"name":"Department of CSE, SASTRA Deemed University, Thanjavur, Tamil Nadu, India"}]},{"given":"Niveda","family":"Gaspar","sequence":"additional","affiliation":[{"name":"Department of CSE, SASTRA Deemed University, Thanjavur, Tamil Nadu, India"}]},{"given":"Vimal","family":"Shanmuganathan","sequence":"additional","affiliation":[{"name":"Department of Artificial Intelligence and Data Science, Deep Learning Lab, Ramco Institute of Technology, Rajapalayam, Tamil Nadu, India"}]},{"given":"Subbulakshmi","family":"P","sequence":"additional","affiliation":[{"name":"School of Computer Science and Engineering, VIT Chennai Campus, Tamil Nadu, 
India"}]},{"given":"Ahmed","family":"Alkhayyat","sequence":"additional","affiliation":[{"name":"Department of Computer Technical Engineering, College of Technical Engineering, The Islamic University, Najaf, Iraq"}]},{"given":"Kaliappan","family":"M","sequence":"additional","affiliation":[{"name":"Department of Artificial Intelligence and Data Science, Deep Learning Lab, Ramco Institute of Technology, Rajapalayam, Tamil Nadu, India"}]}],"member":"179","reference":[{"issue":"1","key":"10.3233\/IDT-230283_ref1","doi-asserted-by":"publisher","first-page":"33","DOI":"10.1093\/bioinformatics\/btx579","article-title":"iPromoter-2L: a two-layer predictor for identifying promoters and their types by multi-window-based PseKNC","volume":"34","author":"Liu","year":"2018","journal-title":"Bioinformatics."},{"issue":"4356","key":"10.3233\/IDT-230283_ref2","doi-asserted-by":"publisher","first-page":"737","DOI":"10.1038\/171737a0","article-title":"Molecular structure of nucleic acids: a structure for deoxyribose nucleic acid","volume":"171","author":"Watson","year":"1953","journal-title":"Nature."},{"issue":"6822","key":"10.3233\/IDT-230283_ref3","doi-asserted-by":"crossref","first-page":"824","DOI":"10.1038\/35057000","article-title":"Guide to the draft human genome","volume":"409","author":"Wolfsberg","year":"2001","journal-title":"Nature."},{"issue":"5175","key":"10.3233\/IDT-230283_ref4","doi-asserted-by":"publisher","first-page":"43","DOI":"10.1038\/221043a0","article-title":"Factor stimulating transcription by RNA polymerase","volume":"221","author":"Burgess","year":"1969","journal-title":"Nature."},{"issue":"5193","key":"10.3233\/IDT-230283_ref5","doi-asserted-by":"publisher","first-page":"537","DOI":"10.1038\/222537a0","article-title":"Cyclic re-use of the RNA polymerase sigma 
factor","volume":"222","author":"Travers","year":"1969","journal-title":"Nature."},{"issue":"9","key":"10.3233\/IDT-230283_ref6","doi-asserted-by":"publisher","first-page":"861","DOI":"10.1101\/gr.7.9.861","article-title":"Eukaryotic promoter recognition","volume":"7","author":"Fickett","year":"1997","journal-title":"Genome Research."},{"issue":"3-4","key":"10.3233\/IDT-230283_ref7","doi-asserted-by":"publisher","first-page":"191","DOI":"10.1016\/S0097-8485(99)00015-7","article-title":"The biology of eukaryotic promoter prediction\u00a0\u2013 a review","volume":"23","author":"Pedersen","year":"1999","journal-title":"Computers & Chemistry."},{"key":"10.3233\/IDT-230283_ref8","doi-asserted-by":"publisher","DOI":"10.1109\/INDICON49873.2020.9342360"},{"issue":"5","key":"10.3233\/IDT-230283_ref9","doi-asserted-by":"publisher","first-page":"582","DOI":"10.1093\/bioinformatics\/btl670","article-title":"Analysis of E. coli promoter recognition problem in dinucleotide feature space","volume":"23","author":"Rani","year":"2007","journal-title":"Bioinformatics."},{"issue":"1","key":"10.3233\/IDT-230283_ref10","doi-asserted-by":"publisher","first-page":"22","DOI":"10.1093\/bib\/4.1.22","article-title":"The state of the art of mammalian promoter recognition","volume":"4","author":"Werner","year":"2003","journal-title":"Briefings in Bioinformatics."},{"issue":"3","key":"10.3233\/IDT-230283_ref11","doi-asserted-by":"publisher","first-page":"403","DOI":"10.1016\/S0022-2836(05)80360-2","article-title":"Basic local alignment search tool","volume":"215","author":"Altschul","year":"1990","journal-title":"Journal of molecular biology."},{"key":"10.3233\/IDT-230283_ref12","unstructured":"Towell GG, Shavlik JW, Noordewier MO. Refinement of approximate domain theories by knowledge-based neural networks. In Proceedings of the eighth National conference on Artificial intelligence-Volume 2; 1990 Jul 29 (pp. 
861-866)."},{"key":"10.3233\/IDT-230283_ref13","doi-asserted-by":"publisher","first-page":"41","DOI":"10.2165\/00822942-200403010-00006","article-title":"Neural networks for protein classification","author":"Weinert","year":"2004","journal-title":"Applied Bioinformatics."},{"issue":"2","key":"10.3233\/IDT-230283_ref14","doi-asserted-by":"crossref","first-page":"142","DOI":"10.1093\/bioinformatics\/bti771","article-title":"Improved prediction of bacterial transcription start sites","volume":"22","author":"Gordon","year":"2006","journal-title":"Bioinformatics."},{"key":"10.3233\/IDT-230283_ref15","doi-asserted-by":"publisher","DOI":"10.1007\/978-3-540-85984-0_115"},{"issue":"2","key":"10.3233\/IDT-230283_ref16","doi-asserted-by":"crossref","first-page":"310","DOI":"10.1101\/gr.6991408","article-title":"Generic eukaryotic core promoter prediction using structural features of DNA","volume":"18","author":"Abeel","year":"2008","journal-title":"Genome research."},{"key":"10.3233\/IDT-230283_ref17","doi-asserted-by":"crossref","unstructured":"Zhang YJ. A novel promoter prediction method inspiring by biological immune principles. In 2009 WRI Global Congress on Intelligent Systems 2009 May 19 (Vol. 1, pp. 569-573). 
IEEE.","DOI":"10.1109\/GCIS.2009.369"},{"issue":"1","key":"10.3233\/IDT-230283_ref18","doi-asserted-by":"publisher","first-page":"57","DOI":"10.1016\/J.ESWA.2007.","article-title":"A new method to forecast of Escherichia coli promoter gene sequences: Integrating feature selection and Fuzzy-AIRS classifier system","volume":"36","author":"Polat","year":"2009","journal-title":"Expert Systems with Applications."},{"issue":"11","key":"10.3233\/IDT-230283_ref20","first-page":"3000","article-title":"Promoter prediction in DNA sequences of escherichia coli using machine learning algorithms","volume":"8","author":"Anveshrithaa","year":"2019","journal-title":"International Journal of Scientific and Technology Research."},{"issue":"5","key":"10.3233\/IDT-230283_ref22","doi-asserted-by":"publisher","first-page":"1020","DOI":"10.2174\/2213275912666190417150421","article-title":"Machine Learning Based Predictive Action on Categorical Non-Sequential Data","volume":"13","author":"Kallimani","year":"2020","journal-title":"Recent Advances in Computer Science and Communications (Formerly: Recent Patents on Computer Science)."},{"key":"10.3233\/IDT-230283_ref23","doi-asserted-by":"publisher","DOI":"10.1007\/978-3-319-33383-0_5"},{"key":"10.3233\/IDT-230283_ref24","first-page":"2825","article-title":"Scikit-learn: Machine learning in Python","volume":"12","author":"Pedregosa","year":"2011","journal-title":"The Journal of Machine Learning Research."},{"issue":"4","key":"10.3233\/IDT-230283_ref25","doi-asserted-by":"publisher","first-page":"407","DOI":"10.4103\/aca.ACA_94_19","article-title":"Application of student\u2019s t-test, analysis of variance, and covariance","volume":"22","author":"Mishra","year":"2019","journal-title":"Annals of Cardiac Anaesthesia."},{"issue":"2","key":"10.3233\/IDT-230283_ref26","doi-asserted-by":"publisher","first-page":"121","DOI":"10.1023\/A:1009715923555","article-title":"A tutorial on support vector machines for pattern 
recognition","volume":"2","author":"Burges","year":"1998","journal-title":"Data Mining and Knowledge Discovery."},{"key":"10.3233\/IDT-230283_ref27","unstructured":"Han J, Pei J, Tong H. Data mining: concepts and techniques. Morgan Kaufmann. 2022 Jul 2."}],"container-title":["Intelligent Decision Technologies"],"original-title":[],"link":[{"URL":"https:\/\/content.iospress.com\/download?id=10.3233\/IDT-230283","content-type":"unspecified","content-version":"vor","intended-application":"similarity-checking"}],"deposited":{"date-parts":[[2025,3,11]],"date-time":"2025-03-11T08:12:06Z","timestamp":1741680726000},"score":1,"resource":{"primary":{"URL":"https:\/\/journals.sagepub.com\/doi\/full\/10.3233\/IDT-230283"}},"subtitle":[],"short-title":[],"issued":{"date-parts":[[2024,2,20]]},"references-count":25,"journal-issue":{"issue":"1"},"URL":"https:\/\/doi.org\/10.3233\/idt-230283","relation":{},"ISSN":["1872-4981","1875-8843"],"issn-type":[{"type":"print","value":"1872-4981"},{"type":"electronic","value":"1875-8843"}],"subject":[],"published":{"date-parts":[[2024,2,20]]}}}