{"status":"ok","message-type":"work","message-version":"1.0.0","message":{"indexed":{"date-parts":[[2026,2,27]],"date-time":"2026-02-27T04:32:30Z","timestamp":1772166750461,"version":"3.50.1"},"reference-count":31,"publisher":"Springer Science and Business Media LLC","issue":"1","license":[{"start":{"date-parts":[[2021,1,29]],"date-time":"2021-01-29T00:00:00Z","timestamp":1611878400000},"content-version":"tdm","delay-in-days":0,"URL":"https:\/\/creativecommons.org\/licenses\/by\/4.0"},{"start":{"date-parts":[[2021,1,29]],"date-time":"2021-01-29T00:00:00Z","timestamp":1611878400000},"content-version":"vor","delay-in-days":0,"URL":"https:\/\/creativecommons.org\/licenses\/by\/4.0"}],"funder":[{"DOI":"10.13039\/501100005981","name":"Direktorat Jenderal Pendidikan Tinggi","doi-asserted-by":"publisher","award":["07.1\/LP\/UG\/III\/2020 March,26 2020"],"award-info":[{"award-number":["07.1\/LP\/UG\/III\/2020 March,26 2020"]}],"id":[{"id":"10.13039\/501100005981","id-type":"DOI","asserted-by":"publisher"}]}],"content-domain":{"domain":["link.springer.com"],"crossmark-restriction":false},"short-container-title":["J Big Data"],"published-print":{"date-parts":[[2021,12]]},"abstract":"<jats:title>Abstract<\/jats:title>\n                  <jats:sec>\n                    <jats:title>Background<\/jats:title>\n                    <jats:p>Stemming has long been used in data pre-processing to retrieve information by tracking affixed words back into their root. In an Indonesian setting, existing stemming methods have been observed, and the existing stemming methods are proven to result in high accuracy level. However, there are not many stemming methods for non-formal Indonesian text processing. This study introduces a new stemming method to solve problems in the non-formal Indonesian text data pre-processing. Furthermore, this study aims to improve the accuracy of text classifier models by strengthening stemming method. 
Using the Support Vector Machine algorithm, a text classifier model is developed, and its accuracy is checked. The experimental evaluation was done by testing 550 datasets in Indonesian using two different stemming methods.<\/jats:p>\n                  <\/jats:sec>\n                  <jats:sec>\n                    <jats:title>Findings<\/jats:title>\n                    <jats:p>The results show that the text classifier model achieves higher accuracy with the proposed stemming method than with the existing methods, with scores of 0.85 and 0.73, respectively. These results indicate that the proposed stemming method produces a classifier model with a small error rate, so it predicts the class of an object more accurately.<\/jats:p>\n                  <\/jats:sec>\n                  <jats:sec>\n                    <jats:title>Conclusion<\/jats:title>\n                    <jats:p>The existing Indonesian stemming methods are still oriented towards formal Indonesian sentences and are therefore of limited use for non-formal Indonesian sentences. This limitation motivates developing a corpus that normalizes non-formal Indonesian into formal Indonesian, to be used as a better stemming method. Using the corpus as a stemming method can improve the accuracy of the classifier model. 
In the future, the proposed corpus and stemming methods can be used for various purposes including text clustering, summarizing, detecting hate speech, and other text processing applications in Indonesian.<\/jats:p>\n                  <\/jats:sec>","DOI":"10.1186\/s40537-021-00413-1","type":"journal-article","created":{"date-parts":[[2021,1,29]],"date-time":"2021-01-29T12:03:17Z","timestamp":1611921797000},"update-policy":"https:\/\/doi.org\/10.1007\/springer_crossmark_policy","source":"Crossref","is-referenced-by-count":24,"title":["Improving the accuracy of text classification using stemming method, a case of non-formal Indonesian conversation"],"prefix":"10.1186","volume":"8","author":[{"ORCID":"https:\/\/orcid.org\/0000-0002-5058-4580","authenticated-orcid":false,"family":"Rianto","sequence":"first","affiliation":[]},{"given":"Achmad Benny","family":"Mutiara","sequence":"additional","affiliation":[]},{"given":"Eri Prasetyo","family":"Wibowo","sequence":"additional","affiliation":[]},{"given":"Paulus Insap","family":"Santosa","sequence":"additional","affiliation":[]}],"member":"297","published-online":{"date-parts":[[2021,1,29]]},"reference":[{"key":"413_CR1","unstructured":"Kemdikbud: Kamus Besar Bahasa Indonesia (2016). https:\/\/kbbi.kemdikbud.go.id\/ Accessed 2020-03-20."},{"key":"413_CR2","doi-asserted-by":"publisher","unstructured":"Utami E, Hartanto AD, Adi S, Setya Putra RB, Raharjo S. Formal and Non-Formal Indonesian Word Usage Frequency in Twitter Profile Using Non-Formal Affix Rule. In: 2019 1st International Conference on Cybernetics and Intelligent System (ICORIS), 2019; vol. 1, pp. 173\u2013176. https:\/\/doi.org\/10.1109\/ICORIS.2019.8874908.","DOI":"10.1109\/ICORIS.2019.8874908"},{"key":"413_CR3","doi-asserted-by":"publisher","unstructured":"Putra RBS, Utami E. Non-formal affixed word stemming in Indonesian language. In: 2018 International Conference on Information and Communications Technology (ICOIACT), 2018; pp. 531\u2013536. IEEE, Yogyakarta. 
https:\/\/doi.org\/10.1109\/ICOIACT.2018.8350735.","DOI":"10.1109\/ICOIACT.2018.8350735"},{"key":"413_CR4","doi-asserted-by":"publisher","unstructured":"Hidayatullah AF. Language tweet characteristics of Indonesian citizens. In: 2015 International Conference on Science and Technology (TICST) 2015. https:\/\/doi.org\/10.1109\/TICST.2015.7369393.","DOI":"10.1109\/TICST.2015.7369393"},{"key":"413_CR5","doi-asserted-by":"publisher","unstructured":"Setya Putra RB, Utami E, Raharjo S. Accuracy Measurement on Indonesian Non-formal Affixed Word Stemming With Levenhstein. In: 2019 International Conference on Information and Communications Technology (ICOIACT), 2019; pp. 486\u2013490. https:\/\/doi.org\/10.1109\/ICOIACT46704.2019.8938423.","DOI":"10.1109\/ICOIACT46704.2019.8938423"},{"key":"413_CR6","unstructured":"Waridah E. Pedoman Umum Ejaan Bahasa Indonesia. 5. RuangKata, Bandung, Indonesia, 2019."},{"key":"413_CR7","doi-asserted-by":"crossref","unstructured":"Nugraheni E. Indonesian twitter data pre-processing for the emotion recognition. In: 2019 International Seminar on Research of Information Technology and Intelligent Systems (ISRITI), 2019; p. 58\u201363.","DOI":"10.1109\/ISRITI48646.2019.9034653"},{"key":"413_CR8","doi-asserted-by":"crossref","unstructured":"Patel H, Patel B. Stemmatizer\u2014Stemmer-based Lemmatizer for Gujarati Text. In: Rathore VS, Worring M, Mishra DK, Joshi A, Maheshwari S, editors. Emerging Trends in Expert Applications and Security. Springer, Singapore, 2019; vol. 841, p. 667\u2013674. Series Title: Advances in Intelligent Systems and Computing.","DOI":"10.1007\/978-981-13-2285-3_78"},{"key":"413_CR9","doi-asserted-by":"publisher","unstructured":"Kong X, Yang J. Indonesian Corpus Constructing and Text Processing for Speech Synthesis. In: 2018 International Conference on Asian Language Processing (IALP), 2018; p. 193\u2013196. IEEE, Bandung, Indonesia. 
https:\/\/doi.org\/10.1109\/IALP.2018.8629122.","DOI":"10.1109\/IALP.2018.8629122"},{"key":"413_CR10","doi-asserted-by":"publisher","unstructured":"Yuwana RS, Suryawati E, Pardede HF. On Empirical Evaluation of Deep Architectures for Indonesian POS Tagging Problem. In: 2018 International Conference on Computer, Control, Informatics and Its Applications (IC3INA), 2018; p. 204\u2013208. https:\/\/doi.org\/10.1109\/IC3INA.2018.8629531","DOI":"10.1109\/IC3INA.2018.8629531"},{"key":"413_CR11","doi-asserted-by":"publisher","unstructured":"Yuwana RS, Yuliani AR, Pardede HF. On part of speech tagger for Indonesian language. In: 2017 2nd International Conferences on Information Technology, Information Systems and Electrical Engineering (ICITISEE), 2017; p. 369\u2013372. https:\/\/doi.org\/10.1109\/ICITISEE.2017.8285530.","DOI":"10.1109\/ICITISEE.2017.8285530"},{"key":"413_CR12","doi-asserted-by":"publisher","first-page":"102","DOI":"10.1016\/j.softx.2019.01.011","volume":"9","author":"DA Kwary","year":"2019","unstructured":"Kwary DA. A corpus platform of Indonesian academic language. SoftwareX. 2019;9:102\u20136. https:\/\/doi.org\/10.1016\/j.softx.2019.01.011.","journal-title":"SoftwareX"},{"key":"413_CR13","doi-asserted-by":"publisher","first-page":"229","DOI":"10.1016\/j.procs.2016.04.054","volume":"81","author":"MA Jiwanggi","year":"2016","unstructured":"Jiwanggi MA, Adriani M. Topic Summarization of Microblog Document in Bahasa Indonesia using the Phrase Reinforcement Algorithm. Procedia Computer Sci. 2016;81:229\u201336. https:\/\/doi.org\/10.1016\/j.procs.2016.04.054.","journal-title":"Procedia Computer Sci"},{"key":"413_CR14","doi-asserted-by":"publisher","first-page":"719","DOI":"10.1016\/j.procs.2018.08.213","volume":"135","author":"AAS Gunawan","year":"2018","unstructured":"Gunawan AAS, Mulyono PR, Budiharto W. Indonesian question answering system for solving arithmetic word problems on intelligent humanoid robot. Procedia Computer Sci. 2018;135:719\u201326. 
https:\/\/doi.org\/10.1016\/j.procs.2018.08.213.","journal-title":"Procedia Computer Sci"},{"key":"413_CR15","doi-asserted-by":"crossref","unstructured":"Ario Utomo MR, Sibaroni Y. Text classification of british english and american english using support vector machine. In: 2019 7th International Conference on Information and Communication Technology (ICoICT), 2019; p. 1\u20136.","DOI":"10.1109\/ICoICT.2019.8835256"},{"key":"413_CR16","doi-asserted-by":"crossref","unstructured":"Ouyang J. Research on english text information filtering algorithm based on svm. In: 2020 IEEE International Conference on Power, Intelligent Computing and Systems (ICPICS), 2020; p. 1001\u20131004.","DOI":"10.1109\/ICPICS50287.2020.9202016"},{"key":"413_CR17","volume-title":"Data Classification Algorithms and Application","author":"CC Aggarwal","year":"2015","unstructured":"Aggarwal CC. Data Classification Algorithms and Application. : CRC Press; 2015."},{"key":"413_CR18","doi-asserted-by":"publisher","unstructured":"Sebastian D, Nugraha KA. Text Normalization for Indonesian Abbreviated Word Using Crowdsourcing Method. In: 2019 International Conference on Information and Communications Technology (ICOIACT), 2019; p. 529\u2013532. IEEE, Yogyakarta, Indonesia. https:\/\/doi.org\/10.1109\/ICOIACT46704.2019.8938463.","DOI":"10.1109\/ICOIACT46704.2019.8938463"},{"key":"413_CR19","doi-asserted-by":"publisher","first-page":"553","DOI":"10.1016\/j.procs.2019.11.155","volume":"161","author":"D Gunawan","year":"2019","unstructured":"Gunawan D, Saniyah Z, Hizriadi A. Normalization of abbreviation and acronym on Microtext in Bahasa Indonesia by using dictionary-based and longest common subsequence (LCS). Procedia Computer Sci. 2019;161:553\u20139. https:\/\/doi.org\/10.1016\/j.procs.2019.11.155.","journal-title":"Procedia Computer Sci"},{"key":"413_CR20","doi-asserted-by":"publisher","unstructured":"Agarwal A, Gupta B, Bhatt G, Mittal A. 
Construction of a semi-automated model for faq retrieval via short message service. In: Proceedings of the 7th Forum for Information Retrieval Evaluation. FIRE \u201915, 2015; pp. 35\u201338. Association for Computing Machinery, New York, NY, USA. https:\/\/doi.org\/10.1145\/2838706.2838717.","DOI":"10.1145\/2838706.2838717"},{"key":"413_CR21","doi-asserted-by":"publisher","first-page":"66536","DOI":"10.1109\/ACCESS.2019.2917159","volume":"7","author":"A Agarwal","year":"2019","unstructured":"Agarwal A, Toshniwal D. Face off: Travel habits, road conditions and traffic city characteristics bared using twitter. IEEE Access. 2019;7:66536\u201352.","journal-title":"IEEE Access"},{"key":"413_CR22","doi-asserted-by":"crossref","unstructured":"Agarwal A, Toshniwal D. Identifying leadership characteristics from social media data during natural hazards using personality traits. Scientific Reports, 2020.","DOI":"10.1038\/s41598-020-59086-0"},{"key":"413_CR23","doi-asserted-by":"publisher","unstructured":"Lin N, Fu S, Jiang S, Chen C, Xiao L, Zhu G. Learning Indonesian Frequently Used Vocabulary from Large-Scale News. In: 2018 International Conference on Asian Language Processing (IALP), 2018; p. 234\u2013239. https:\/\/doi.org\/10.1109\/IALP.2018.8629227.","DOI":"10.1109\/IALP.2018.8629227"},{"key":"413_CR24","doi-asserted-by":"publisher","unstructured":"Nugraheni E. Indonesian Twitter Data Pre-processing for the Emotion Recognition. In: 2019 International Seminar on Research of Information Technology and Intelligent Systems (ISRITI), 2019; p. 58\u201363. https:\/\/doi.org\/10.1109\/ISRITI48646.2019.9034653.","DOI":"10.1109\/ISRITI48646.2019.9034653"},{"key":"413_CR25","doi-asserted-by":"publisher","unstructured":"Hasanah U, Astuti T, Wahyudi R, Rifai Z, Pambudi RA. An Experimental Study of Text Preprocessing Techniques for Automatic Short Answer Grading in Indonesian. 
In: 2018 3rd International Conference on Information Technology, Information System and Electrical Engineering (ICITISEE), 2018; p. 230\u2013234. https:\/\/doi.org\/10.1109\/ICITISEE.2018.8720957.","DOI":"10.1109\/ICITISEE.2018.8720957"},{"key":"413_CR26","doi-asserted-by":"publisher","unstructured":"Rahman T, Agustin FEM, Rozy NF. Normalization of Unstructured Indonesian Tweet Text For Presidential Candidates Sentiment Analysis. In: 2019 7th International Conference on Cyber and IT Service Management (CITSM), 2019; p. 1\u20136. IEEE, Jakarta, Indonesia. https:\/\/doi.org\/10.1109\/CITSM47753.2019.8965324. https:\/\/ieeexplore.ieee.org\/document\/8965324\/ Accessed 2020-05-23.","DOI":"10.1109\/CITSM47753.2019.8965324"},{"key":"413_CR27","doi-asserted-by":"publisher","unstructured":"Neforawati I, Pratama MO, Satyawan W. Indonesian Lyrics Classification using Feature Level Fusion. In: 2019 2nd International Conference of Computer and Informatics Engineering (IC2IE), 2019; p. 6\u201311. https:\/\/doi.org\/10.1109\/IC2IE47452.2019.8940826.","DOI":"10.1109\/IC2IE47452.2019.8940826"},{"key":"413_CR28","doi-asserted-by":"publisher","first-page":"147","DOI":"10.1016\/j.knosys.2019.05.025","volume":"180","author":"J Singh","year":"2019","unstructured":"Singh J, Gupta V. A novel unsupervised corpus-based stemming technique using lexicon and corpus statistics. Knowledge-Based Systems. 2019;180:147\u201362. https:\/\/doi.org\/10.1016\/j.knosys.2019.05.025.","journal-title":"Knowledge-Based Systems"},{"key":"413_CR29","unstructured":"Rianto Mutiara AB, Wibowo EP, Santosa PI. Improving stemming techniques for non-formal indonesian sentences using incorbiz. ICIC Express Letter (On Press) 2021."},{"key":"413_CR30","doi-asserted-by":"publisher","unstructured":"Obaid HS, Dheyab SA, Sabry SS. The Impact of Data Pre-Processing Techniques and Dimensionality Reduction on the Accuracy of Machine Learning. 
In: 2019 9th Annual Information Technology, Electromechanical Engineering and Microelectronics Conference (IEMECON),2019; p. 279\u2013283. https:\/\/doi.org\/10.1109\/IEMECONX.2019.8877011.","DOI":"10.1109\/IEMECONX.2019.8877011"},{"issue":"9","key":"413_CR31","doi-asserted-by":"publisher","first-page":"2080","DOI":"10.1080\/03610926.2019.1568485","volume":"49","author":"G Zeng","year":"2020","unstructured":"Zeng G. On the confusion matrix in credit scoring and its analytical properties. Communications in Statistics - Theory and Methods. 2020;49(9):2080\u201393. https:\/\/doi.org\/10.1080\/03610926.2019.1568485.","journal-title":"Communications in Statistics - Theory and Methods"}],"container-title":["Journal of Big Data"],"original-title":[],"language":"en","link":[{"URL":"http:\/\/link.springer.com\/content\/pdf\/10.1186\/s40537-021-00413-1.pdf","content-type":"application\/pdf","content-version":"vor","intended-application":"text-mining"},{"URL":"http:\/\/link.springer.com\/article\/10.1186\/s40537-021-00413-1\/fulltext.html","content-type":"text\/html","content-version":"vor","intended-application":"text-mining"},{"URL":"http:\/\/link.springer.com\/content\/pdf\/10.1186\/s40537-021-00413-1.pdf","content-type":"application\/pdf","content-version":"vor","intended-application":"similarity-checking"}],"deposited":{"date-parts":[[2021,1,29]],"date-time":"2021-01-29T12:04:10Z","timestamp":1611921850000},"score":1,"resource":{"primary":{"URL":"https:\/\/journalofbigdata.springeropen.com\/articles\/10.1186\/s40537-021-00413-1"}},"subtitle":[],"short-title":[],"issued":{"date-parts":[[2021,1,29]]},"references-count":31,"journal-issue":{"issue":"1","published-print":{"date-parts":[[2021,12]]}},"alternative-id":["413"],"URL":"https:\/\/doi.org\/10.1186\/s40537-021-00413-1","relation":{"has-preprint":[{"id-type":"doi","id":"10.21203\/rs.3.rs-41431\/v3","asserted-by":"object"},{"id-type":"doi","id":"10.21203\/rs.3.rs-41431\/v1","asserted-by":"object"},{"id-type":"doi","id":"
10.21203\/rs.3.rs-41431\/v2","asserted-by":"object"}]},"ISSN":["2196-1115"],"issn-type":[{"value":"2196-1115","type":"electronic"}],"subject":[],"published":{"date-parts":[[2021,1,29]]},"assertion":[{"value":"30 July 2020","order":1,"name":"received","label":"Received","group":{"name":"ArticleHistory","label":"Article History"}},{"value":"16 January 2021","order":2,"name":"accepted","label":"Accepted","group":{"name":"ArticleHistory","label":"Article History"}},{"value":"29 January 2021","order":3,"name":"first_online","label":"First Online","group":{"name":"ArticleHistory","label":"Article History"}},{"value":"Not applicable.","order":1,"name":"Ethics","group":{"name":"EthicsHeading","label":"Ethics approval and consent to participate"}},{"value":"Not applicable.","order":2,"name":"Ethics","group":{"name":"EthicsHeading","label":"Consent for publication"}},{"value":"The authors declare that they have no competing interests.","order":3,"name":"Ethics","group":{"name":"EthicsHeading","label":"Competing interests"}}],"article-number":"26"}}