Penerapan Random Forest Dan Borderline SMOTE Untuk Prediksi Risiko Drop Out Mahasiswa

Authors

  • Christian Bautista, Universitas Multi Data Palembang
  • Daniel Udjulawa, Universitas Multi Data Palembang

DOI:

https://doi.org/10.31598/sintechjournal.v8i3.2000

Keywords:

drop out, prediction, random forest, borderline-SMOTE, machine learning

Abstract

Education plays a key role in developing qualified, knowledgeable, and competitive human resources. However, one of the main challenges in higher education is student dropout, which can affect academic quality and institutional accreditation. This study aims to predict the risk of student dropout using the Random Forest algorithm optimized through Grid Search, with Borderline-SMOTE applied to address class imbalance. The dataset comes from the Open University Learning Analytics Dataset (OULAD), which includes students' demographic, academic, and online learning activity data. The research stages comprise data cleaning, feature transformation and normalization, 5-fold cross-validation, and determination of the optimal parameters (n_estimators and max_depth) using Grid Search. The evaluation results show that the models with and without Borderline-SMOTE perform similarly, with accuracies of 79.1% and 78.3%, respectively, indicating that data balancing does not significantly improve model performance on this dataset. Feature importance analysis reveals that the score attribute and total VLE clicks are the most influential factors in dropout risk. The model is expected to serve as an early warning system that helps universities identify at-risk students early.
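The workflow described in the abstract can be illustrated with a minimal sketch using scikit-learn and imbalanced-learn. This is not the authors' code: the file name oulad_features.csv, the column names score, total_clicks, and dropout, and the parameter-grid values are hypothetical placeholders chosen only to mirror the stages named above (normalization, Borderline-SMOTE oversampling, 5-fold cross-validation, and Grid Search over n_estimators and max_depth).

```python
# Minimal sketch of the pipeline described in the abstract, not the authors' exact code.
# Assumptions: a cleaned OULAD feature table "oulad_features.csv" with numeric columns
# (e.g. "score", "total_clicks") and a binary target column "dropout"; file name,
# column names, and grid values are illustrative.
import pandas as pd
from sklearn.ensemble import RandomForestClassifier
from sklearn.model_selection import GridSearchCV, StratifiedKFold, train_test_split
from sklearn.preprocessing import MinMaxScaler
from sklearn.metrics import accuracy_score, classification_report
from imblearn.over_sampling import BorderlineSMOTE
from imblearn.pipeline import Pipeline  # pipeline variant that accepts resamplers

# Load the cleaned dataset (demographic, academic, and VLE activity features).
df = pd.read_csv("oulad_features.csv")
X = df.drop(columns=["dropout"])
y = df["dropout"]

X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.2, stratify=y, random_state=42
)

# Normalization -> Borderline-SMOTE -> Random Forest.
# Inside cross-validation, the oversampling is applied only to the training folds.
pipe = Pipeline(steps=[
    ("scale", MinMaxScaler()),
    ("smote", BorderlineSMOTE(random_state=42)),
    ("rf", RandomForestClassifier(random_state=42)),
])

# Grid Search over the two hyperparameters named in the abstract,
# evaluated with 5-fold stratified cross-validation.
param_grid = {
    "rf__n_estimators": [100, 200, 300],  # illustrative values
    "rf__max_depth": [10, 20, None],
}
cv = StratifiedKFold(n_splits=5, shuffle=True, random_state=42)
search = GridSearchCV(pipe, param_grid, cv=cv, scoring="accuracy", n_jobs=-1)
search.fit(X_train, y_train)

y_pred = search.predict(X_test)
print("Best parameters:", search.best_params_)
print("Test accuracy:", accuracy_score(y_test, y_pred))
print(classification_report(y_test, y_pred))

# Feature importances from the tuned forest (e.g. score, total VLE clicks).
best_rf = search.best_estimator_.named_steps["rf"]
importances = pd.Series(best_rf.feature_importances_, index=X.columns)
print(importances.sort_values(ascending=False).head(10))
```

Running the same search with the "smote" step omitted gives the unbalanced baseline, which the abstract reports as performing comparably to the balanced model.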

Published

2025-12-31

How to Cite

Bautista, C., & Udjulawa, D. (2025). Penerapan Random Forest Dan Borderline SMOTE Untuk Prediksi Risiko Drop Out Mahasiswa. SINTECH (Science and Information Technology) Journal, 8(3), 200–210. https://doi.org/10.31598/sintechjournal.v8i3.2000
