{"status":"ok","message-type":"work","message-version":"1.0.0","message":{"indexed":{"date-parts":[[2026,1,8]],"date-time":"2026-01-08T00:47:07Z","timestamp":1767833227678,"version":"3.49.0"},"reference-count":50,"publisher":"Association for Computing Machinery (ACM)","issue":"4","license":[{"start":{"date-parts":[[2021,11,30]],"date-time":"2021-11-30T00:00:00Z","timestamp":1638230400000},"content-version":"vor","delay-in-days":0,"URL":"https:\/\/www.acm.org\/publications\/policies\/copyright_policy#Background"}],"content-domain":{"domain":["dl.acm.org"],"crossmark-restriction":true},"short-container-title":["ACM\/IMS Trans. Data Sci."],"published-print":{"date-parts":[[2021,11,30]]},"abstract":"<jats:p>\n            Class imbalance can adversely affect the performance of machine learning for prediction and classification. One approach to address the class imbalance problem is synthetic minority oversampling. Oversampling approaches can be broadly categorized as either being structural or statistical in nature. Structural approaches generally have the advantage of identifying and oversampling those minority data points that best facilitate class separation, while statistical approaches model the underlying distribution from which the minority samples can be drawn. In this article, we formulate a distance-based approach that generates samples by both modeling the underlying minority class distribution and by geometrically considering those borderline samples entangled in the majority class. 
We demonstrate the efficacy of our approach operating on the Class-Imbalance data set from UCI by comparing its mean accuracy, AUC and\n            <jats:italic>F<\/jats:italic>\n            <jats:sub>1<\/jats:sub>\n            -score performance against both statistical and structural synthetic minority oversampling methods.\n          <\/jats:p>","DOI":"10.1145\/3510834","type":"journal-article","created":{"date-parts":[[2022,2,4]],"date-time":"2022-02-04T22:52:48Z","timestamp":1644015168000},"page":"1-18","update-policy":"https:\/\/doi.org\/10.1145\/crossmark-policy","source":"Crossref","is-referenced-by-count":6,"title":["Distance-based Probabilistic Data Augmentation for Synthetic Minority Oversampling"],"prefix":"10.1145","volume":"2","author":[{"ORCID":"https:\/\/orcid.org\/0000-0002-7980-1104","authenticated-orcid":false,"given":"Joel","family":"Goodman","sequence":"first","affiliation":[{"name":"US Naval Research Laboratory, Washington, DC"}]},{"ORCID":"https:\/\/orcid.org\/0000-0002-7287-4415","authenticated-orcid":false,"given":"Sharham","family":"Sarkani","sequence":"additional","affiliation":[{"name":"The George Washington University, Washington, DC"}]},{"ORCID":"https:\/\/orcid.org\/0000-0002-4584-4018","authenticated-orcid":false,"given":"Thomas","family":"Mazzuchi","sequence":"additional","affiliation":[{"name":"The George Washington University, Washington, DC"}]}],"member":"320","published-online":{"date-parts":[[2022,5,24]]},"reference":[{"key":"e_1_3_1_2_2","doi-asserted-by":"publisher","DOI":"10.1109\/TNNLS.2018.2878400"},{"key":"e_1_3_1_3_2","doi-asserted-by":"publisher","DOI":"10.1109\/TKDE.2012.232"},{"key":"e_1_3_1_4_2","doi-asserted-by":"publisher","DOI":"10.1145\/1007730.1007735"},{"key":"e_1_3_1_5_2","volume-title":"Pattern Recognition and Machine Learning","author":"Bishop Christopher M.","year":"2007","unstructured":"Christopher M. Bishop. 2007. Pattern Recognition and Machine Learning. 
Springer Information Science and Statistics Series."},{"key":"e_1_3_1_6_2","doi-asserted-by":"publisher","DOI":"10.1007\/978-3-642-01307-2_43"},{"key":"e_1_3_1_7_2","doi-asserted-by":"publisher","DOI":"10.1109\/TKDE.2013.37"},{"key":"e_1_3_1_8_2","doi-asserted-by":"publisher","DOI":"10.5555\/1622407.1622416"},{"key":"e_1_3_1_9_2","doi-asserted-by":"publisher","DOI":"10.1145\/2939672.2939785"},{"key":"e_1_3_1_10_2","doi-asserted-by":"publisher","DOI":"10.1016\/j.patrec.2016.06.009"},{"key":"e_1_3_1_11_2","doi-asserted-by":"publisher","DOI":"10.1109\/TKDE.2014.2324567"},{"key":"e_1_3_1_12_2","doi-asserted-by":"publisher","DOI":"10.1016\/j.neunet.2015.06.005"},{"key":"e_1_3_1_13_2","first-page":"1","article-title":"Maximum likelihood from incomplete data via the EM algorithm","author":"Dempster Arthur P.","year":"1977","unstructured":"Arthur P. Dempster, Nan M. Laird, and Donald B. Rubin. 1977. Maximum likelihood from incomplete data via the EM algorithm. Journal of the Royal Statistical Society 39, 1 (1977), 1\u201338.","journal-title":"Journal of the Royal Statistical 
Society"},{"key":"e_1_3_1_14_2","doi-asserted-by":"publisher","DOI":"10.1016\/j.patrec.2015.05.008"},{"key":"e_1_3_1_15_2","doi-asserted-by":"publisher","DOI":"10.1023\/A:1009700419189"},{"key":"e_1_3_1_16_2","doi-asserted-by":"publisher","DOI":"10.1093\/comjnl\/41.8.578"},{"key":"e_1_3_1_17_2","doi-asserted-by":"publisher","DOI":"10.1016\/j.neucom.2014.02.006"},{"key":"e_1_3_1_18_2","doi-asserted-by":"publisher","DOI":"10.5555\/3086952"},{"key":"e_1_3_1_19_2","doi-asserted-by":"publisher","DOI":"10.1109\/ICACCI.2017.8125820"},{"key":"e_1_3_1_20_2","doi-asserted-by":"publisher","DOI":"10.1145\/1007730.1007736"},{"key":"e_1_3_1_21_2","doi-asserted-by":"publisher","DOI":"10.1109\/IJCNN.2008.4633969"},{"key":"e_1_3_1_22_2","doi-asserted-by":"publisher","DOI":"10.1016\/j.eswa.2016.12.035"},{"key":"e_1_3_1_23_2","doi-asserted-by":"publisher","DOI":"10.1007\/11538059_91"},{"key":"e_1_3_1_24_2","doi-asserted-by":"publisher","DOI":"10.1109\/TKDE.2008.239"},{"key":"e_1_3_1_25_2","unstructured":"Robert C. Holte Liane E. Acker and Bruce W. Porter. 1989. Concept learning and the problem of small disjuncts. In Proceedings of the 11th International Joint Conference on Artificial Intelligence . 813\u2013818."},{"key":"e_1_3_1_26_2","doi-asserted-by":"publisher","DOI":"10.1109\/WCSE.2009.756"},{"key":"e_1_3_1_27_2","first-page":"1","volume-title":"Proceedings of the 2017 International Conference on Information, Communication, Instrumentation and Control.","author":"Jain Anju","year":"2017","unstructured":"Anju Jain, Saroj Ratnoo, and Dinesh Kumar. 2017. Addressing class imbalance problem in medical diagnosis: A genetic algorithm approach. In Proceedings of the 2017 International Conference on Information, Communication, Instrumentation and Control.1\u20138."},{"key":"e_1_3_1_28_2","volume-title":"Data Science at the Command Line: Facing the Future with Time-Tested Tools","author":"Janssens Jeroen","year":"2014","unstructured":"Jeroen Janssens. 2014. 
Data Science at the Command Line: Facing the Future with Time-Tested Tools. O\u2019Reilly Media."},{"key":"e_1_3_1_29_2","doi-asserted-by":"publisher","DOI":"10.1007\/978-3-642-17534-3_19"},{"key":"e_1_3_1_30_2","unstructured":"J. Alcal\u00e1-Fdez A. Fernandez J. Luengo J. Derrac S. Garc\u00eda L. S\u00e1nchez and F. Herrera. 2020. Supervised Classification Library. (2020). Retrieved 2020-7-11 from https:\/\/sci2s.ugr.es\/keel\/datasets.php."},{"key":"e_1_3_1_31_2","doi-asserted-by":"publisher","DOI":"10.1109\/ACCESS.2020.2991231"},{"key":"e_1_3_1_32_2","doi-asserted-by":"publisher","DOI":"10.1016\/j.neucom.2019.06.100"},{"issue":"5","key":"e_1_3_1_33_2","article-title":"A survey on addressing high-class imbalance in big data","volume":"42","author":"Leevy Joffrey L.","year":"2018","unstructured":"Joffrey L. Leevy, Taghi M. Khoshgoftaar, Richard A. Bauder, and Naeem Seliya. 2018. A survey on addressing high-class imbalance in big data. Springer Journal of Big Data 42, 5 (2018), 1\u201330.","journal-title":"Springer Journal of Big Data"},{"key":"e_1_3_1_34_2","doi-asserted-by":"publisher","DOI":"10.1186\/s12911-019-0938-1"},{"key":"e_1_3_1_35_2","doi-asserted-by":"publisher","DOI":"10.1145\/3313778"},{"key":"e_1_3_1_36_2","volume-title":"Aspects of Multivariate Statistical Theory","author":"Muirhead Robb J.","year":"2005","unstructured":"Robb J. Muirhead. 2005. Aspects of Multivariate Statistical Theory. Wiley, New York."},{"key":"e_1_3_1_37_2","doi-asserted-by":"publisher","DOI":"10.1016\/j.eswa.2015.10.031"},{"key":"e_1_3_1_38_2","first-page":"56","article-title":"On estimation of the probability density function and mode","author":"Parzen Emmanuel","year":"1962","unstructured":"Emmanuel Parzen. 1962. On estimation of the probability density function and mode. 
The Annals of Mathematical Statistics 3, 33 (1962), 56\u201365.","journal-title":"The Annals of Mathematical Statistics"},{"key":"e_1_3_1_39_2","first-page":"554","article-title":"The infinite gaussian mixture model","author":"Rasmussen Carl E.","year":"2000","unstructured":"Carl E. Rasmussen. 2000. The infinite gaussian mixture model. In Proceedings of the Advances in Neural Information Processing Systems.554\u2013600.","journal-title":"Proceedings of the Advances in Neural Information Processing Systems."},{"key":"e_1_3_1_40_2","doi-asserted-by":"publisher","DOI":"10.1007\/978-3-319-10840-7_8"},{"key":"e_1_3_1_41_2","unstructured":"Saravanan Jaichandaran. 2020. Standard Classification Library Banana Data Set. (2020). Retrieved 2020-6-25 from https:\/\/www.kaggle.com\/saranchandar\/standard-classification-with-banana-dataset."},{"key":"e_1_3_1_42_2","doi-asserted-by":"publisher","DOI":"10.1109\/TSMCA.2009.2029559"},{"key":"e_1_3_1_43_2","doi-asserted-by":"publisher","DOI":"10.1016\/C2010-0-67023-4"},{"key":"e_1_3_1_44_2","doi-asserted-by":"publisher","DOI":"10.1145\/3360646"},{"key":"e_1_3_1_45_2","doi-asserted-by":"publisher","DOI":"10.1109\/TKDE.2016.2609424"},{"key":"e_1_3_1_46_2","doi-asserted-by":"publisher","DOI":"10.1016\/j.apm.2011.11.053"},{"key":"e_1_3_1_47_2","doi-asserted-by":"publisher","DOI":"10.1109\/TIT.2009.2016060"},{"key":"e_1_3_1_48_2","doi-asserted-by":"publisher","DOI":"10.1109\/TKDE.2011.207"},{"key":"e_1_3_1_49_2","doi-asserted-by":"publisher","DOI":"10.1016\/j.inffus.2013.12.003"},{"key":"e_1_3_1_50_2","doi-asserted-by":"publisher","DOI":"10.1145\/3291801.3291812"},{"key":"e_1_3_1_51_2","doi-asserted-by":"publisher","DOI":"10.1109\/INFOMAN.2018.8392662"}],"container-title":["ACM\/IMS Transactions on Data 
Science"],"original-title":[],"language":"en","link":[{"URL":"https:\/\/dl.acm.org\/doi\/10.1145\/3510834","content-type":"unspecified","content-version":"vor","intended-application":"text-mining"},{"URL":"https:\/\/dl.acm.org\/doi\/pdf\/10.1145\/3510834","content-type":"unspecified","content-version":"vor","intended-application":"similarity-checking"}],"deposited":{"date-parts":[[2025,6,17]],"date-time":"2025-06-17T19:02:11Z","timestamp":1750186931000},"score":1,"resource":{"primary":{"URL":"https:\/\/dl.acm.org\/doi\/10.1145\/3510834"}},"subtitle":[],"short-title":[],"issued":{"date-parts":[[2021,11,30]]},"references-count":50,"journal-issue":{"issue":"4","published-print":{"date-parts":[[2021,11,30]]}},"alternative-id":["10.1145\/3510834"],"URL":"https:\/\/doi.org\/10.1145\/3510834","relation":{},"ISSN":["2691-1922"],"issn-type":[{"value":"2691-1922","type":"print"}],"subject":[],"published":{"date-parts":[[2021,11,30]]},"assertion":[{"value":"2021-01-01","order":0,"name":"received","label":"Received","group":{"name":"publication_history","label":"Publication History"}},{"value":"2022-01-01","order":1,"name":"accepted","label":"Accepted","group":{"name":"publication_history","label":"Publication History"}},{"value":"2022-05-24","order":2,"name":"published","label":"Published","group":{"name":"publication_history","label":"Publication History"}}]}}