Combining Oversampling and Undersampling to Handle Class Imbalance and Class Overlapping in Bank Marketing Data Classification
DOI:
https://doi.org/10.31598/jurnalresistor.v7i1.1515
Keywords:
bank marketing, class imbalance, class overlapping, oversampling, undersampling
Abstract
Class imbalance occurs in many kinds of datasets, including bank marketing datasets, and can degrade classification performance. SMOTE is a common remedy, but the synthetic samples it generates can cause class overlapping, which in turn interferes with classification. This study addresses the problem by combining SMOTE with three undersampling methods: ENN, NCL, and TomekLink. Logistic Regression is used as the classifier, and performance is evaluated using the sensitivity, specificity, and g-means of the model. On the bank marketing dataset, the SMOTE-ENN combination gives the best results, with sensitivity, specificity, and g-means of 94.05%, 83.22%, and 88.47%, respectively. On the credit card fraud dataset, the methods give nearly uniform results, with sensitivity, specificity, and g-means of around 88.62%, 97.59%, and 93.00%. On the cerebral stroke dataset, SMOTE-ENN gives the highest sensitivity (80.1%), SMOTE-NCL the highest specificity (75.62%), and SMOTE alone the highest g-means (77.03%).
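The three evaluation measures named in the abstract can be sketched as follows. This is a minimal illustration, not the authors' code: it assumes the standard definitions (sensitivity as the true positive rate, specificity as the true negative rate, and g-means as the geometric mean of the two), and the confusion-matrix counts and function names are hypothetical. Only the percentages in the final check are taken from the abstract.

```python
import math


def sensitivity(tp: int, fn: int) -> float:
    """True positive rate: share of positive (minority) cases that are detected."""
    return tp / (tp + fn)


def specificity(tn: int, fp: int) -> float:
    """True negative rate: share of negative (majority) cases that are detected."""
    return tn / (tn + fp)


def g_means(sens: float, spec: float) -> float:
    """Geometric mean of sensitivity and specificity (assumed definition)."""
    return math.sqrt(sens * spec)


# Hypothetical confusion-matrix counts, chosen only for illustration.
tp, fn, tn, fp = 90, 10, 800, 200
sens = sensitivity(tp, fn)
spec = specificity(tn, fp)
print(f"sensitivity={sens:.4f} specificity={spec:.4f} g-means={g_means(sens, spec):.4f}")

# Consistency check against the rates reported in the abstract:
# under the assumed definition, the reported g-means should follow
# from the reported sensitivity and specificity.
bank_marketing = g_means(0.9405, 0.8322)     # abstract reports 88.47%
credit_card_fraud = g_means(0.8862, 0.9759)  # abstract reports 93.00%
print(f"bank marketing g-means:    {bank_marketing:.4f}")
print(f"credit card fraud g-means: {credit_card_fraud:.4f}")
```

Under this definition, the g-means values reported for the bank marketing and credit card fraud datasets agree with their reported sensitivity and specificity to within rounding. Because g-means drops sharply when either rate is low, it penalizes a model that favors the majority class.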
License
Copyright (c) 2024 Anak Agung Gde Wahyu Sukma Erlangga, I Gede Aris Gunadi, I Made Gede Sunarya
This work is licensed under a Creative Commons Attribution-NonCommercial-ShareAlike 4.0 International License.
Copyright in each article belongs to the author.
- The authors acknowledge RESISTOR Journal as the publisher of first publication, under the Attribution-NonCommercial-ShareAlike 4.0 International (CC BY-NC-SA 4.0) License.
- Authors may enter into separate arrangements for the non-exclusive distribution of the version of the manuscript published in this journal (e.g. depositing it in the author's institutional repository or publishing it in a book), provided they acknowledge that the manuscript was first published in RESISTOR Journal.