Rank Aggregation Based on Feature Filter Ranking in Cross-Project Software Defects

Rudy Herteno; Mohammad Reza Faisal; Radityo Adi Nugroho; Friska Abadi; Setyo Wahyu Saputro

doi:10.31598/sintechjournal.v8i1.1763

Authors

Rudy Herteno, Lambung Mangkurat University
Mohammad Reza Faisal, Lambung Mangkurat University
Radityo Adi Nugroho, Lambung Mangkurat University
Friska Abadi, Lambung Mangkurat University
Setyo Wahyu Saputro, Lambung Mangkurat University

Keywords:

Cross-Project Defect Prediction, Feature Filter Ranking, Feature Selection, Machine Learning, Software Development, Software Defect Prediction

Abstract

Software defects are a significant challenge in software engineering because they can cause severe damage if they are detected only during system execution. This research focuses on Cross-Project Defect Prediction (CPDP), which uses historical data from other projects to improve defect prediction for a target project. However, CPDP is often constrained by mismatched data distributions and by irrelevant, high-dimensional features. To address this, we propose an approach that applies Feature Filter Ranking to reduce dimensionality and handle imbalanced data, combined with Borda aggregation and six classification algorithms: KNN, Random Forest, Decision Tree, Logistic Regression, SVM, and Gradient Boosting. Experimental results show that 5 features on the NASA MDP dataset, 15 features on PROMISE, and 5 features on RELINK give the best performance. With KNN, the AUC is 0.6600 on NASA MDP, 0.7000 on PROMISE, and 0.7167 on RELINK. Comparing the averages across all classification algorithms indicates that KNN is more effective at improving software defect identification in terms of AUC. These results indicate that integrating CPDP with Feature Filter Ranking, Synthetic Data Vault, and Borda Aggregation helps to overcome high data dimensionality and class imbalance, thereby improving software defect prediction.
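As a rough illustration of the Borda aggregation step named in the abstract, the sketch below merges several feature rankings into one and keeps the top features. It is a minimal sketch under stated assumptions, not the authors' implementation: the function name `borda_aggregate`, the metric names, and the three input rankings are all hypothetical, and the abstract does not say which filter methods produced the rankings or how ties were broken.

```python
from collections import defaultdict


def borda_aggregate(rankings, top_k=None):
    """Aggregate feature rankings (best feature first) with a Borda count.

    Hypothetical helper for illustration only; assumes every ranking
    orders the same set of features.
    """
    scores = defaultdict(int)
    for ranking in rankings:
        n = len(ranking)
        for position, feature in enumerate(ranking):
            # The top-ranked feature earns n - 1 points, the last earns 0.
            scores[feature] += n - 1 - position
    # Sort by descending total score; ties are broken alphabetically
    # (an assumption made here so the output is deterministic).
    ordered = sorted(scores, key=lambda f: (-scores[f], f))
    return ordered[:top_k] if top_k is not None else ordered


# Made-up rankings standing in for the output of three filter methods.
rankings = [
    ["loc", "cyclomatic", "halstead", "comments", "branches"],
    ["cyclomatic", "loc", "branches", "halstead", "comments"],
    ["loc", "halstead", "cyclomatic", "branches", "comments"],
]

selected = borda_aggregate(rankings, top_k=3)
print(selected)
```

In a pipeline like the one the abstract describes, the selected feature subset would then be passed to a classifier such as KNN and evaluated by AUC on the target project.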
